Otherwise, we're leaking the payload memory.
CC: Stable Kernel <stable@vger.kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch fixes a locking issue in the rename code by ensuring that we hold
the per sb rename lock over both directory and "other" renames which involve
different parent directories.
At the same time, this moved the (only called from one place) function
gfs2_ok_to_move into the file that its called from, so we can mark it
static. This should make a code a bit easier to follow.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Peter Staubach <staubach@redhat.com>
The commit c5dec1c303 introduced
__bio_copy_iov() to add bounce support to blk_rq_map_user_iov.
__bio_copy_iov() uses bio->bv_len to copy data for READ commands after
the completion but it doesn't work with a request that partially
completed. SCSI always completes a PC request as a whole but seems
some don't.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The commit 68154e90c9 introduced
bio_copy_kern() to add bounce support to blk_rq_map_kern.
bio_copy_kern() uses bio->bv_len to copy data for READ commands after
the completion but it doesn't work with a request that partially
completed. SCSI always completes a PC request as a whole but seems
some don't.
This patch fixes bio_copy_kern to handle the above case. As
bio_copy_user does, bio_copy_kern uses struct bio_map_data to store
struct bio_vec.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reported-by: Nix <nix@esperi.org.uk>
Tested-by: Nix <nix@esperi.org.uk>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Reported by Milan Broz <mbroz@redhat.com>, commit 18ce3751 inadvertently
made submit_bh() discard the barrier bit for a WRITE_SYNC request. Fix
that up.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently, we don't check the version in the SPNEGO upcall response
even though one is provided. Jeff and Q have made the corresponding
change to the Samba client (cifs.upcall).
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The ocfs2_stack_driver_request() function failed to increment the
refcount of an already-active stack. It only did the increment on the
first reference. Whoops.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Tested-by: Marcos Matsunaga <marcos.matsunaga@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
The TNC mutex is unlocked prematurely when reading leaf nodes
with non-hashed keys. This is unsafe because the node may be
moved by garbage collection and the eraseblock unmapped, although
that has never actually happened during stress testing.
This patch fixes the flaw by detecting the race and retrying with
the TNC mutex locked.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Leaf-nodes that have a hashed key are stored in the
leaf-node-cache (LNC) which is protected by the TNC
mutex. Consequently, when reading a leaf node with
a hashed key (i.e. directory entries, xattr entries)
the TNC mutex is always required.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
seq_read() has a subtle bug - we want the first loop there to go
until at least one *non-empty* record had fit entirely into buffer.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
d_add_ci was lifted 1:1 from ntfs. Change ntfs to use the common
version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As pointed out during review d_add_ci argument order should match d_add,
so switch the dentry and inode arguments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Ouch, if number taken from IDA is too big, the intent was to signal an
error, not check for overflow and still do overflowing addition.
One still needs 2^28 proc entries to notice this.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch lets the files using linux/version.h match the files that
#include it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It appears that configfs_rmdir() can protect configfs_detach_prep() retries with
less calls to {spin,mutex}_{lock,unlock}, and a cleaner code.
This patch does not change any behavior, except that it removes two useless
lock/unlock pairs having nothing inside to protect and providing a useless
barrier.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
We were setting i_blocks based on allocation before the extent insert, which
is wrong as the value is a calculation based on ip_clusters which gets
updated as a result of the insert. This patch moves the line in question
to just after the call to ocfs2_insert_extent().
Without this fix, inline directories were temporarily having an i_blocks
value of zero immediately after expansion to extents.
Reported-and-tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
When we fail to insert extent in ocfs2_expand_inline_dir(), we should go to
out_commit, not out.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
This fixes a bug introduced with 539d8264093560b917ee3afe4c7f74e5da09d6a5:
[PATCH 2/2] ocfs2: Fix race between mount and recovery
ocfs2_mark_dead_nodes() was reading journal inodes while holding the
spinlock protecting our in-memory recovery state. The fix is very simple -
the disk state is protected by a cluster lock that's already held, so we
just move the spinlock down past the read.
Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ocfs2/cluster/netdebug.c: fix warning
fs/ocfs2/cluster/netdebug.c:154: warning: format '%lu' expects
type 'long unsigned int', but argument 17 has type 'suseconds_t'
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Commit 0f475b2abe (ocfs2/net: Silence build
warnings) made sense as far as it fixed compile warnings, but it was not
required that it made the functions global.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Always allow truncations to zero, even if budgeting thinks there
is no space. UBIFS reserves some space for deletions anyway.
Otherwise, the following happans:
1. create a file, and write as much as possible there, until ENOSPC
2. truncate the file, which fails with ENOSPC, which is not good.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
After commit a97c9bf33f (fix cramfs
making duplicate entries in inode cache) in kernel 2.6.14, named-pipe
on cramfs does not work properly.
It seems the commit make all named-pipe on cramfs share their inode
(and named-pipe buffer).
Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup
back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino
== 1 immediately.
Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org> [2.6.14 and later]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP
process, only the task leader of the process is affected, all other
sibling LWP threads didn't receive the setting. The problem was that the
iterator used in sys_setpriority() only iteartes over one task for each
process, ignoring all other sibling thread.
Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
each thread of a process. Convert 4 call sites in {set/get}priority and
ioprio_{set/get}.
Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case the binfmt_misc binary handler is registered *before* the e.g.
script one (when for example being compiled as a module) the following
situation may occur:
1. user launches a script, whose interpreter is a misc binary;
2. the load_misc_binary sets the misc_bang and returns -ENOEVEC,
since the binary is a script;
3. the load_script_binary loads one and calls for search_binary_hander
to run the interpreter;
4. the load_misc_binary is called again, but refuses to load the
binary due to misc_bang bit set.
The fix is to move the misc_bang setting lower - prior to the actual
call to the search_binary_handler.
Caused by the commit 3a2e7f47 (binfmt_misc.c: avoid potential kernel
stack overflow)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: <stable@kernel.org> [2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was another FAT BKL conversion deadlock reported by Bart
Trojanowski due to the BKL being used as a recursive lock by FAT, which
was missed because it only triggers with 'sync' (or 'dirsync') mounts.
The recursion worked for the BKL, but after the conversion to lock_super
(which uses a mutex), it just deadlocks.
Thanks to Bart for debugging this and testing the fix. The lock
debugging information from the original report:
=============================================
[ INFO: possible recursive locking detected ]
2.6.27-rc3-bisect-00448-ga7f5aaf #16
---------------------------------------------
mv/4020 is trying to acquire lock:
(&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20
but task is already holding lock:
(&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20
other info that might help us debug this:
3 locks held by mv/4020:
#0: (&sb->s_type->i_mutex_key#9/1){--..}, at: [<c01b2336>] do_unlinkat+0x66/0x140
#1: (&sb->s_type->i_mutex_key#9){--..}, at: [<c01b0954>] vfs_unlink+0x84/0x110
#2: (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20
stack backtrace:
Pid: 4020, comm: mv Not tainted 2.6.27-rc3-bisect-00448-ga7f5aaf #16
[<c014e694>] validate_chain+0x984/0xea0
[<c0108d70>] ? native_sched_clock+0x0/0xf0
[<c014ee9c>] __lock_acquire+0x2ec/0x9b0
[<c014f5cf>] lock_acquire+0x6f/0x90
[<c01a90fe>] ? lock_super+0x1e/0x20
[<c044e5fd>] mutex_lock_nested+0xad/0x300
[<c01a90fe>] ? lock_super+0x1e/0x20
[<c01a90fe>] ? lock_super+0x1e/0x20
[<c01a90fe>] lock_super+0x1e/0x20
[<f8b3a700>] fat_write_inode+0x60/0x2b0 [fat]
[<c0450878>] ? _spin_unlock_irqrestore+0x48/0x80
[<f8b3a953>] ? fat_sync_inode+0x3/0x20 [fat]
[<f8b3a962>] fat_sync_inode+0x12/0x20 [fat]
[<f8b37c7e>] fat_remove_entries+0xbe/0x120 [fat]
[<f8b422ef>] vfat_unlink+0x5f/0x90 [vfat]
[<f8b42290>] ? vfat_unlink+0x0/0x90 [vfat]
[<c01b0968>] vfs_unlink+0x98/0x110
[<c01b2400>] do_unlinkat+0x130/0x140
[<c016a8f5>] ? audit_syscall_entry+0x105/0x150
[<c01b253b>] sys_unlinkat+0x3b/0x40
[<c01040d3>] sysenter_do_call+0x12/0x3f
=======================
where the deadlock is due to the nesting of lock_super from vfat_unlink
to fat_write_inode:
- do_unlinkat
- vfs_unlink
- vfat_unlink
* lock_super
- fat_remove_entries
- fat_sync_inode
- fat_write_inode
* lock_super
and the fix is to simply remove the use of lock_super() in fat_write_inode.
The lock_super() there had been just an automatic conversion of the
kernel lock to the superblock lock, but no locking was actually needed
there, since the code in fat_write_inode already protected all relevant
accesses with a spinlock (sbi->inode_hash_lock to be exact). The only
code inside the BKL (and thus the superblock lock) was accesses tp local
variables or calls to functions that have long been SMP-safe (i.e.
sb_bread, mark_buffe_dirty and brlese).
Bart reports:
"Looks good. I ran 10 parallel processes creating 1M files truncating
them, writing to them again and then deleting them. This patch fixes
the issue I ran into.
Signed-off-by: Bart Trojanowski <bart@jukie.net>"
Reported-and-tested-by: Bart Trojanowski <bart@jukie.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Properly handle MSKRB5 by passing sec=mskrb5 to the upcall so that the
spengo blob can be generated appropriately. Also, make
decode_negTokenInit prefer whichever mechanism is first in the list.
Needed for some NetApp servers, and possibly some older
versions of Windows which treat the two KRB5 mechanisms differently.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_setup_session references pSesInfo->server several times. That
pointer shouldn't change during the life of the function so grab it
once and store it in a local var. This makes the code look a little
cleaner too.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
I case we failed to allocate memory for inode when creating it, we did not
properly free block already allocated for this inode. Move memory allocation
before the block allocation which fixes this issue (thanks for the idea go to
Ingo Oeser <ioe-lkml@rameria.de>). Also remove a few superfluous
initializations already done in udf_alloc_inode().
Reviewed-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Jan Kara <jack@suse.cz>
A memory allocation inside alloc_mutex must not recurse back into the
filesystem itself because that leads to lock inversion between iprune_mutex and
alloc_mutex (and thus to deadlocks - see traces below). alloc_mutex is actually
needed only to update allocation statistics in the superblock so we can drop it
before we start allocating memory for the inode.
tar D ffff81015b9c8c90 0 6614 6612
ffff8100d5a21a20 0000000000000086 0000000000000000 00000000ffff0000
ffff81015b9c8c90 ffff81015b8f0cd0 ffff81015b9c8ee0 0000000000000000
0000000000000003 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffff803c1d8a>] __mutex_lock_slowpath+0x64/0x9b
[<ffffffff803c1bef>] mutex_lock+0xa/0xb
[<ffffffff8027f8c2>] shrink_icache_memory+0x38/0x200
[<ffffffff80257742>] shrink_slab+0xe3/0x15b
[<ffffffff802579db>] try_to_free_pages+0x221/0x30d
[<ffffffff8025657e>] isolate_pages_global+0x0/0x31
[<ffffffff8025324b>] __alloc_pages_internal+0x252/0x3ab
[<ffffffff8026b08b>] cache_alloc_refill+0x22e/0x47b
[<ffffffff8026ae37>] kmem_cache_alloc+0x3b/0x61
[<ffffffff8026b15b>] cache_alloc_refill+0x2fe/0x47b
[<ffffffff8026b34e>] __kmalloc+0x76/0x9c
[<ffffffffa00751f2>] :udf:udf_new_inode+0x202/0x2e2
[<ffffffffa007ae5e>] :udf:udf_create+0x2f/0x16d
[<ffffffffa0078f27>] :udf:udf_lookup+0xa6/0xad
...
kswapd0 D ffff81015b9d9270 0 125 2
ffff81015b903c28 0000000000000046 ffffffff8028cbb0 00000000fffffffb
ffff81015b9d9270 ffff81015b8f0cd0 ffff81015b9d94c0 000000000271b490
ffffe2000271b458 ffffe2000271b420 ffffe20002728dc8 ffffe20002728d90
Call Trace:
[<ffffffff8028cbb0>] __set_page_dirty+0xeb/0xf5
[<ffffffff8025403a>] get_dirty_limits+0x1d/0x22f
[<ffffffff803c1d8a>] __mutex_lock_slowpath+0x64/0x9b
[<ffffffff803c1bef>] mutex_lock+0xa/0xb
[<ffffffffa0073f58>] :udf:udf_bitmap_free_blocks+0x47/0x1eb
[<ffffffffa007df31>] :udf:udf_discard_prealloc+0xc6/0x172
[<ffffffffa007875a>] :udf:udf_clear_inode+0x1e/0x48
[<ffffffff8027f121>] clear_inode+0x6d/0xc4
[<ffffffff8027f7f2>] dispose_list+0x56/0xee
[<ffffffff8027fa5a>] shrink_icache_memory+0x1d0/0x200
[<ffffffff80257742>] shrink_slab+0xe3/0x15b
[<ffffffff80257e93>] kswapd+0x346/0x447
...
Reported-by: Tibor Tajti <tibor.tajti@gmail.com>
Reviewed-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Jan Kara <jack@suse.cz>
A fuzzed fileystem image failed with OMFS when the extent count was
used in a loop without being checked against the max number of extents.
It also provoked a signed division for an array index that was checked
as if unsigned, leading to index by -1.
omfsck will be updated to fix these cases, in the meantime bail out
gracefully.
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
write_cache_pages() uses i_mapping->writeback_index to pick up where it
left off the last time a given inode was found by pdflush or
balance_dirty_pages (or anyone else who sets wbc->range_cyclic)
alloc_inode() should set it to a sane value so that writeback doesn't
start in the middle of a file. It is somewhat difficult to notice the bug
since write_cache_pages will loop around to the start of the file and the
elevator helps hide the resulting seeks.
For whatever reason, Btrfs hits this often. Unpatched, untarring 30
copies of the linux kernel in series runs at 47MB/s on a single sata
drive. With this fix, it jumps to 62MB/s.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xattr code has not been tested for a while and there were
serveral bugs. One of them is using wrong inode in
'ubifs_jnl_change_xattr()'. The other is a deadlock in
'ubifs_setxattr()': the i_mutex is locked in
'cap_inode_need_killpriv()' path, so deadlock happens when
'ubifs_setxattr()' tries to lock it again.
Thanks to Zoltan Sogor for finding these bugs.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
In looking at network named pipe support on cifs, I noticed that
Dave Howell's iget patch:
iget: stop CIFS from using iget() and read_inode()
broke mounts to IPC$ (the interprocess communication share), and don't
handle the error case (when getting info on the root inode fails).
Thanks to Gunter who noted a typo in a debug line in the original
version of this patch.
CC: David Howells <dhowells@redhat.com>
CC: Gunter Kukkukk <linux@kukkukk.com>
CC: Stable Kernel <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The patches that are intended to introduce copy-on-write credentials for 2.6.28
require abstraction of access to some fields of the task structure,
particularly for the case of one task accessing another's credentials where RCU
will have to be observed.
Introduced here are trivial no-op versions of the desired accessors for current
and other tasks so that other subsystems can start to be converted over more
easily.
Wrappers are introduced into a new header (linux/cred.h) for UID/GID,
EUID/EGID, SUID/SGID, FSUID/FSGID, cap_effective and current's subscribed
user_struct. These wrappers are macros because the ordering between header
files mitigates against making them inline functions.
linux/cred.h is #included from linux/sched.h.
Further, XFS is modified such that it no longer defines and uses parameterised
versions of current_fs[ug]id(), thus getting rid of the namespace collision
otherwise incurred.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Add a dlm_ prefix to the struct names in config.c. This resolves a
conflict with struct node in particular, when include/linux/node.h
happens to be included.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>
A couple of unlikely error conditions were missing a kfree on the error
exit path.
Reported-by: Juha Leppanen <juha_motorsportcom@luukku.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Commit d70b67c8bc fixed VFS and
it never calls FS lookup function in deleted directories now.
We may remove corresponding UBIFS check.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes a problem whereby simultaneous unlink, rmdir,
rename and link operations (e.g. rm -fR *) from multiple nodes
on the same GFS2 file system can cause kernel panics, hangs,
and/or memory corruption. It also gets rid of all the non-rgrp
calls to gfs2_glock_nq_m.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch is intended to fix the issues reported in bz #457798. Instead
of having the metafs as a separate filesystem, it becomes a second root
of gfs2. As a result it will appear as type gfs2 in /proc/mounts, but it
is still possible (for backwards compatibility purposes) to mount it as
type gfs2meta. A new mount flag "meta" is introduced so that its possible
to tell the two cases apart in /proc/mounts.
As a result it becomes possible to mount type gfs2 with -o meta and
get the same result as mounting type gfs2meta. So it is possible to
mount just the metafs on its own. Currently if you do this, its then
impossible to mount the "normal" root of the gfs2 filesystem without
first unmounting the metafs root. I'm not sure if thats a feature or
a bug :-)
Either way, this is a great improvement on the previous scheme and I've
verified that it works ok with bind mounts on both the "normal" root
and the metafs root in various combinations.
There were also a bunch of functions in super.c which didn't belong there,
so this moves them into ops_fstype.c where they can be static. Hopefully
the mount/umount sequence is now more obvious as a result.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Due to an incorrect iterator, some glocks were being missed from the
glock dumps obtained via debugfs. This patch fixes the problem and
ensures that we don't miss any glocks in future.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>