v9fs leaks memory if the file server responds with Rerror to a Twalk
message. The patch fixes the leak.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@hera.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Yesterday, I got the following error with 2.6.16.13 during a file copy from
a smb filesystem over a wireless link. I guess there was some error on the
wireless link, which in turn caused an error condition for the smb
filesystem.
In the log, smb_file_read reports error=4294966784 (0xfffffe00), which also
shows up in the slab dumps, and also is -ERESTARTSYS. Error code 27499
corresponds to 0x6b6b, so the rq_errno field seems to be the only one being
set after freeing the slab.
In smb_add_request (which is the only place in smbfs where I found
ERESTARTSYS), I found the following:
if (!timeleft || signal_pending(current)) {
/*
* On timeout or on interrupt we want to try and remove the
* request from the recvq/xmitq.
*/
smb_lock_server(server);
if (!(req->rq_flags & SMB_REQ_RECEIVED)) {
list_del_init(&req->rq_queue);
smb_rput(req);
}
smb_unlock_server(server);
}
[...]
if (signal_pending(current))
req->rq_errno = -ERESTARTSYS;
I guess that some codepath like smbiod_flush() caused the request to be
removed from the queue, and smb_rput(req) be called, without
SMB_REQ_RECEIVED being set. This violates an asumption made by the quoted
code.
Then, the above code calls smb_rput(req) again, the req gets freed, and
req->rq_errno = -ERESTARTSYS writes into the already freed slab. As
list_del_init doesn't cause an error if called multiple times, that does
cause the observed behaviour (freed slab with rq_errno=-ERESTARTSYS).
If this observation is correct, the following patch should fix it.
I wonder why the smb code uses list_del_init everywhere - using list_del
instead would catch such situations by poisoning the next and prev
pointers.
May 4 23:29:21 knautsch kernel: [17180085.456000] ipw2200: Firmware error detected. Restarting.
May 4 23:29:21 knautsch kernel: [17180085.456000] ipw2200: Sysfs 'error' log captured.
May 4 23:33:02 knautsch kernel: [17180306.316000] ipw2200: Firmware error detected. Restarting.
May 4 23:33:02 knautsch kernel: [17180306.316000] ipw2200: Sysfs 'error' log already exists.
May 4 23:33:02 knautsch kernel: [17180306.968000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:34:18 knautsch kernel: [17180383.256000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:34:18 knautsch kernel: [17180383.284000] SMB connection re-established (-5)
May 4 23:37:19 knautsch kernel: [17180563.956000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:40:09 knautsch kernel: [17180733.636000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:40:26 knautsch kernel: [17180750.700000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:43:02 knautsch kernel: [17180907.304000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:43:08 knautsch kernel: [17180912.324000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:43:34 knautsch kernel: [17180938.416000] smb_errno: class Unknown, code 27499 from command 0x6b
May 4 23:43:34 knautsch kernel: [17180938.416000] Slab corruption: start=c4ebe09c, len=244
May 4 23:43:34 knautsch kernel: [17180938.416000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:43:34 knautsch kernel: [17180938.416000] Last user: [<e087b903>](smb_rput+0x53/0x90 [smbfs])
May 4 23:43:34 knautsch kernel: [17180938.416000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
May 4 23:43:34 knautsch kernel: [17180938.416000] 0f0: 00 fe ff ff
May 4 23:43:34 knautsch kernel: [17180938.416000] Next obj: start=c4ebe19c, len=244
May 4 23:43:34 knautsch kernel: [17180938.416000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:43:34 knautsch kernel: [17180938.416000] Last user: [<00000000>](_stext+0x3feffde0/0x30)
May 4 23:43:34 knautsch kernel: [17180938.416000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:43:34 knautsch kernel: [17180938.416000] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:43:34 knautsch kernel: [17180938.460000] SMB connection re-established (-5)
May 4 23:43:42 knautsch kernel: [17180946.292000] ipw2200: Firmware error detected. Restarting.
May 4 23:43:42 knautsch kernel: [17180946.292000] ipw2200: Sysfs 'error' log already exists.
May 4 23:45:04 knautsch kernel: [17181028.752000] ipw2200: Firmware error detected. Restarting.
May 4 23:45:04 knautsch kernel: [17181028.752000] ipw2200: Sysfs 'error' log already exists.
May 4 23:45:05 knautsch kernel: [17181029.868000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:45:36 knautsch kernel: [17181060.984000] smb_errno: class Unknown, code 27499 from command 0x6b
May 4 23:45:36 knautsch kernel: [17181060.984000] Slab corruption: start=c4ebe09c, len=244
May 4 23:45:36 knautsch kernel: [17181060.984000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:45:36 knautsch kernel: [17181060.984000] Last user: [<e087b903>](smb_rput+0x53/0x90 [smbfs])
May 4 23:45:36 knautsch kernel: [17181060.984000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
May 4 23:45:36 knautsch kernel: [17181060.984000] 0f0: 00 fe ff ff
May 4 23:45:36 knautsch kernel: [17181060.984000] Next obj: start=c4ebe19c, len=244
May 4 23:45:36 knautsch kernel: [17181060.984000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:45:36 knautsch kernel: [17181060.984000] Last user: [<00000000>](_stext+0x3feffde0/0x30)
May 4 23:45:36 knautsch kernel: [17181060.984000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:45:36 knautsch kernel: [17181060.984000] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:45:36 knautsch kernel: [17181061.024000] SMB connection re-established (-5)
May 4 23:46:17 knautsch kernel: [17181102.132000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:47:46 knautsch kernel: [17181190.468000] smb_errno: class Unknown, code 27499 from command 0x6b
May 4 23:47:46 knautsch kernel: [17181190.468000] Slab corruption: start=c4ebe09c, len=244
May 4 23:47:46 knautsch kernel: [17181190.468000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:47:46 knautsch kernel: [17181190.468000] Last user: [<e087b903>](smb_rput+0x53/0x90 [smbfs])
May 4 23:47:46 knautsch kernel: [17181190.468000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
May 4 23:47:46 knautsch kernel: [17181190.468000] 0f0: 00 fe ff ff
May 4 23:47:46 knautsch kernel: [17181190.468000] Next obj: start=c4ebe19c, len=244
May 4 23:47:46 knautsch kernel: [17181190.468000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:47:46 knautsch kernel: [17181190.468000] Last user: [<00000000>](_stext+0x3feffde0/0x30)
May 4 23:47:46 knautsch kernel: [17181190.468000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:47:46 knautsch kernel: [17181190.468000] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:47:46 knautsch kernel: [17181190.492000] SMB connection re-established (-5)
May 4 23:49:20 knautsch kernel: [17181284.828000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:49:39 knautsch kernel: [17181303.896000] smb_file_read: //some_file validation failed, error=4294966784
Signed-off-by: Jan Niehusmann <jan@gondor.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark Moseley reported that a chroot environment on a SMB share can be left
via "cd ..\\". Similar to CVE-2006-1863 issue with cifs, this fix is for
smbfs.
Steven French <sfrench@us.ibm.com> wrote:
Looks fine to me. This should catch the slash on lookup or equivalent,
which will be all obvious paths of interest.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes two problems.
First, the comparison of entries in the waitq.c was incorrect.
Second, the NFY_NONE check was incorrect. The test of whether the dentry
is mounted if ineffective, for example, if an expire fails then we could
wait forever on a non existant expire. The bug was identified by Jeff
Moyer.
The patch changes autofs4 to wait on expires only as this is all that's
needed. If there is no existing wait when autofs4_wait is call with a type
of NFY_NONE it delays until either a wait appears or the the expire flag is
cleared.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure to clear the driverfs_dev pointer when we do del_gendisk() (on
disk removal), so that other users that may still have a ref to the disk
won't try to use the stale pointer.
Also move the KOBJ_REMOVE uevent handler up, so that the uevent still
has access to the driverfs_dev data.
This all should hopefully fix the problems with MMC umounts after device
removals that caused commit 56cf6504fc and
its reversal (1a2acc9e92).
Original problem reported by Todd Blumer and others.
Acked-by: Greg KH <gregkh@suse.de>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Erik Mouw <erik@harddisk-recovery.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: Todd Blumer <todd@sdgsystems.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attributes logic for immutable was wrong so that there was
not way to remove this attribute once set. This fixes the
bug.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The gh_owner field shouldn't be set or reset outside the glock code.
These were left over from when recursive locking was allowed. It
isn't any more, so they are not needed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The original code ordered the blocks allocated in the build_height
routine backwards causing excessive disk seeks during a read of the
metadata. This patch reverses the order to try and reduce disk seeks.
Example: A five level metadata tree, I = Inode, P = Pointers, D = Data
You need to read the blocks in the order:
I P5 P4 P3 P2 P1 D
in order to read a single data block. The new code now orders the blocks
in this way. The old code used to order them as:
I P1 P2 P3 P4 P5 D
requiring two extra seeks on average. Note that for files which are
grown by gradual extension rather than by truncate or by llseek/write
at a large offset, this doesn't apply. In the case of writing to a
file linearly, this routine will only be called upon to extend the
height of the tree by one block at a time, so the ordering is
determined by when its called rather than by the internals of the
routine itself. Optimising that part of the ordering is a much
harder problem.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
It is insane to be giving lease_init() the task of freeing the lock it is
supposed to initialise, given that the lock is not guaranteed to be
allocated on the stack. This causes lockups in fcntl_setlease().
Problem diagnosed by Daniel Hokka Zakrisson <daniel@hozac.com>
Also fix a slab leak in __setlease() due to an uninitialised return value.
Problem diagnosed by Björn Steinbrink.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Daniel Hokka Zakrisson <daniel@hozac.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds readpages support (and also corrects a small bug in
the readpage error path at the same time). Hopefully this will
improve performance by allowing GFS to submit larger lumps of
I/O at a time.
In order to simplify the setting of BH_Boundary, it currently gets
set when we hit the end of a indirect pointer block. There is
always a boundary at this point with the current allocation code.
It doesn't get all the boundaries right though, so there is still
room for improvement in this.
See comments in fs/gfs2/ops_address.c for further information about
readpages with GFS2.
Signed-off-by: Steven Whitehouse
Well, I managed to track down the bug in gfs2 that was causing
my grief. Below is a patch for the problem. Please incorporate
as you see fit. Or should I say: as you see git.
The problem was basically that you never set d_ops for the root
inode, so the wrong hash algorithm was being used. But only for
the root directory. Turns out that if I used subdirectories, it
used the proper hash and my files were found just fine.
Signed-off-by: Robert S Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
nr_segs may not be > UIO_MAXIOV, however it may be equal to. This makes
the behaviour identical to the real sys_vmsplice(). The other foov
syscalls also agree that this is the way to go.
Signed-off-by: Jens Axboe <axboe@suse.de>
This can happen quite easily, if several processes are trying to splice
the same file at the same time. It's not a failure, it just means someone
raced with us in allocating this file page. So just dump the allocated
page and relookup the original.
Signed-off-by: Jens Axboe <axboe@suse.de>
Nick says that the current construct isn't safe. This goes back to the
original, but sets PIPE_BUF_FLAG_LRU on user pages as well as they all
seem to be on the LRU in the first place.
Signed-off-by: Jens Axboe <axboe@suse.de>
Looking at generic_file_buffered_write(), we need to unlock_page() if
prepare write fails and it isn't due to racing with truncate().
Also trim the size if ->prepare_write() fails, if we have to.
Signed-off-by: Jens Axboe <axboe@suse.de>
Some places in ext3 multiple block allocation code (in 2.6.17-rc3) don't
handle the little endian well. This was resulting in *wrong* block numbers
being assigned to in-memory block variables and then stored on disk
eventually. The following patch has been verified to fix an ext3
filesystem failure when run ltp test on a 64 bit machine.
Signed-off-by; Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In dlm_grant_after_purge() we were holding a hash table read_lock while
calling put_rsb() which potentially removes the rsb from the hash table,
taking the same lock in write. Fix this by flagging rsb's ahead of time
that have been purged. Then iteratively read_lock the hash table, find a
flagged rsb, unlock, process rsb.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As pointed out by Wendy Cheng, the logic in GFS2's writepage() function
wasn't quite right with respect to invalidating pages when a file has been
truncated. This patch fixes that.
CC: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly
to know whether we need to fiddle with page LRU state after stealing it,
however for some origins we just don't know if the page is on the LRU
list or not.
So remove PIPE_BUF_FLAG_LRU and do this check/add manually in pipe_to_file()
instead.
Signed-off-by: Jens Axboe <axboe@suse.de>
We need to use the minium of {len, PAGE_SIZE-off}, not {len, PAGE_SIZE}-off.
The latter doesn't make any sense, and could cause us to attempt negative
length transfers...
Signed-off-by: Jens Axboe <axboe@suse.de>
If SPLICE_F_GIFT is set, the user is basically giving this pages away to
the kernel. That means we can steal them for eg page cache uses instead
of copying it.
The data must be properly page aligned and also a multiple of the page size
in length.
Signed-off-by: Jens Axboe <axboe@suse.de>
The pipe ->map() method uses kmap() to virtually map the pages, which
is both slow and has known scalability issues on SMP. This patch enables
atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
instead.
lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
are results from that on a UP machine with highmem (1.5GiB of RAM), running
first a UP kernel, SMP kernel, and SMP kernel patched.
Vanilla-UP:
Pipe bandwidth: 1622.28 MB/sec
Pipe bandwidth: 1610.59 MB/sec
Pipe bandwidth: 1608.30 MB/sec
Pipe latency: 7.3275 microseconds
Pipe latency: 7.2995 microseconds
Pipe latency: 7.3097 microseconds
Vanilla-SMP:
Pipe bandwidth: 1382.19 MB/sec
Pipe bandwidth: 1317.27 MB/sec
Pipe bandwidth: 1355.61 MB/sec
Pipe latency: 9.6402 microseconds
Pipe latency: 9.6696 microseconds
Pipe latency: 9.6153 microseconds
Patched-SMP:
Pipe bandwidth: 1578.70 MB/sec
Pipe bandwidth: 1579.95 MB/sec
Pipe bandwidth: 1578.63 MB/sec
Pipe latency: 9.1654 microseconds
Pipe latency: 9.2266 microseconds
Pipe latency: 9.1527 microseconds
Signed-off-by: Jens Axboe <axboe@suse.de>
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.
Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share a most of the pipe buffer ops between pipe.c and splice.c
Signed-off-by: Jens Axboe <axboe@suse.de>
Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.
- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.
And as a cleanup on that:
- Make the bottom fall-through logic a little less convoluted. Also make
the steal path hold an extra reference to the page, so we don't have
to differentiate between stolen and non-stolen at the end.
Signed-off-by: Jens Axboe <axboe@suse.de>
- Check that page has suitable count for stealing in the regular pipes.
- pipe_to_file() assumes that the page is locked on succesful steal, so
do that in the pipe steal hook
- Missing unlock_page() in add_to_page_cache() failure.
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 unused functions
- remove the following global function that was both unused and
unimplemented:
- super.c: gfs2_do_upgrade()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Expose the current recovery state in sysfs to help in debugging.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When a node is removed from a lockspace configuration, close our
connection to it, clearing any remaining messages for it.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Lockspaces created from user space should be forcibly freed without
requiring any further user space interaction.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Despite my earlier careful search, there was a recursive lock left
in the deallocation code. This removes it. It also should speed up
deallocation be reducing the number of locking operations which take
place by using two "try lock" operations on the two locks involved in
inode deallocation which allows us to grab the locks out of order
(compared with NFS which grabs the inode lock first and the iopen
lock later). It is ok for us to fail while doing this since if it
does fail it means that someone else is still using the inode and
thus it wouldn't be possible to deallocate anyway.
This fixes the bug reported to me by Rob Kenna.
Cc: Rob Kenna <rkenna@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Wire up *at syscalls.
This patch has been tested on ppc64 (using glibc's testsuite, both 32bit
and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system
available, but I expect no problems).
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use the new find_get_pages_contig() to potentially look up the entire
splice range in one single call. This speeds up generic_file_splice_read()
quite a bit.
Signed-off-by: Jens Axboe <axboe@suse.de>
This saves the journal recovery result and makes it visible through sysfs.
User space needs to know if the node actually recovered the journal or
tried and gave up.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch changes the last user of recursive locking so that
it no longer needs this feature and removes it from the glock
layer. This makes the glock code a lot simpler and easier to
understand. Its also a prerequsite to adding support for the
AOP_TRUNCATED_PAGE return code (or at least it is if you don't
want your brain to melt in the process)
I've left in a couple of checks just in case there is some place
else in the code which is still using this feature that I didn't
spot yet, but they can probably be removed long term.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch addresses a flaw in LSM, where there is no mediation of readv()
and writev() in for 32-bit compatible apps using a 64-bit kernel.
This bug was discovered and fixed initially in the native readv/writev
code [1], but was not fixed in the compat code. Thanks to Al for spotting
this one.
[1] http://lwn.net/Articles/154282/
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All modifications of ->i_flags in inodes that might be visible to
somebody else must be under ->i_mutex. That patch fixes ext3 ioctl()
setting S_APPEND and friends.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sbi->s_group_desc is an array of pointers to buffer_head. memcpy() of
buffer size from address of buffer_head is a bad idea - it will generate
junk in any case, may oops if buffer_head is close to the end of slab
page and next page is not mapped and isn't what was intended there.
IOW, ->b_data is missing in that call. Fortunately, result doesn't go
into the primary on-disk data structures, so only backup ones get crap
written to them; that had allowed this bug to remain unnoticed until
now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>