This fix the uninitialized bs when we try to replace a xattr entry in
ibody with the new value which require more than free space.
This situation only happens we format ext3/4 with inode size more than 128 and
we have put xattr entries both in ibody and block. The consequences about
this bug is we will lost the xattr block which pointed by i_file_acl with all
xattr entires in it. We will alloc a new xattr block and put that large value
entry in it. The old xattr block will become orphan block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updating the current transaction's t_state is protected by j_state_lock. We
need to do the same when updating the t_state to T_COMMIT.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bdevname() fills the buffer that it is given as a parameter, so calling
strcpy() or snprintf() on the returned value is redundant (and probably not
guaranteed to work - I don't think strcpy and snprintf support overlapping
buffers.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prior to 2.6.26 fuse only supported single page write requests. In theory all
fuse filesystem should be able support bigger than 4k writes, as there's
nothing in the API to prevent it. Unfortunately there's a known case in
NTFS-3G where big writes cause filesystem corruption. There could also be
other filesystems, where the lack of testing with big write requests would
result in bugs.
To prevent such problems on a kernel upgrade, disable big writes by default,
but let filesystems set a flag to turn it on.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Szabolcs Szakacsits <szaka@ntfs-3g.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When mm destruction happens, we should pass mm_update_next_owner() the old mm.
But unfortunately new mm is passed in exec_mmap().
Thus, kernel panic is possible when a multi-threaded process uses exec().
Also, the owner member comment description is wrong. mm->owner does not
necessarily point to the thread group leader.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: "Paul Menage" <menage@google.com>
Cc: "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix an oops with a corrupted hfs+ image.
See http://bugzilla.kernel.org/show_bug.cgi?id=10548 for details.
Problem is that we call hfs_btree_open() from hfsplus_fill_super() to set
HFSPLUS_SB(sb).[ext_tree|cat_tree] Both trees are still NULL at this moment.
If hfs_btree_open() fails for any reason it calls iput() on the page, which
gets to hfsplus_releasepage() which tries to access HFSPLUS_SB(sb).* which is
still NULL and oopses while dereferencing it.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is currently no way to query the bounding set of another task. As there
appears to be no security reason not to, and as Michael Kerrisk points out the
following valid reasons to do so exist:
* consistency (I can see all of the other per-thread/process sets in
/proc/.../status)
* debugging -- I could imagine that it would make the job of debugging an
application that uses capabilities a little simpler.
this patch adds the bounding set to /proc/self/status right after the
effective set.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes, vfs_quota_off() is called on a partially set up super block (for
example when fill_super() fails for some reason). In such cases we cannot
call ->sync_fs() because it can Oops because of not properly filled in super
block. So in case we find there's not quota to turn off, we just skip
everything and return which fixes the above problem.
[akpm@linux-foundation.org: fxi tpyo]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dget(dentry->d_parent) --> dget_parent(dentry)
unlock_parent() is racy and unnecessary. Replace single caller with
unlock_dir().
There are several other suspect uses of ->d_parent in ecryptfs...
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's no reason for the _kern in hppfs_kern.c, so move it to hppfs.c.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hppfs tidying and fixes noticed during hch's get_inode work -
style fixes
a copy_to_user got its return value checked
hppfs_write no longer fiddles file->f_pos because it gets and
returns pos in its arguments
hppfs_delete_inode dputs the underlyng procfs dentry stored in
its private data and mntputs the vfsmnt stashed in s_fs_info
hppfs_put_super no longer needs to mntput the s_fs_info, so it
no longer needs to exist
hppfs_readlink and hppfs_follow_link were doing a bunch of stuff
with a struct file which they didn't use
there is now a ->permission which calls generic_permission
get_inode was always returning 0 for some reason - it now
returns an inode if nothing bad happened
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It acts exactly like a regular 'cond_resched()', but will not get
optimized away when CONFIG_PREEMPT is set.
Normal kernel code is already preemptable in the presense of
CONFIG_PREEMPT, so cond_resched() is optimized away (see commit
02b67cc3ba "sched: do not do
cond_resched() when CONFIG_PREEMPT").
But when wanting to conditionally reschedule while holding a lock, you
need to use "cond_sched_lock(lock)", and the new function is the BKL
equivalent of that.
Also make fs/locks.c use it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cifs_demultiplex_thread can exit under several conditions:
1) if it's signaled
2) if there's a problem with session setup
3) if kthread_stop is called on it
The first two are problems. If kthread_stop is called on the thread,
there is no guarantee that it will still be up. We need to have the
thread stay up until kthread_stop is called on it.
One option would be to not even try to tear things down until after
kthread_stop is called. However, in the case where there is a problem
setting up the session, there's no real reason to try continuing the
loop.
This patch allows the thread to clean up and prepare for exit under all
three conditions, but it has the thread go to sleep until kthread_stop
is called. This allows us to simplify the shutdown code somewhat since
we can be reasonably sure that the thread won't exit after being
signaled but before kthread_stop is called.
It also removes the places where the thread itself set the tsk variable
since it appeared that it could have a potential race where the thread
might never be shut down.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When creating a directory on a CIFS share without POSIX extensions,
and the given mode has no write bits set, set the ATTR_READONLY bit.
When creating a file, set ATTR_READONLY if the create mode has no write
bits set and we're not using unix extensions.
There are some comments about this being problematic due to the VFS
splitting creates into 2 parts. I'm not sure what that's actually
talking about, but I'm assuming that it has something to do with how
mknod is implemented. In the simple case where we have no unix
extensions and we're just creating a regular file, there's no reason
we can't set ATTR_READONLY.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Clean up cifs_setattr a bit by adding a local inode pointer, and
changing all of the direntry->d_inode references to it. This also adds a
bit of micro-optimization. d_inode shouldn't change over the life of
this function, so we only need to dereference it once.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch cleans up cifs_find_tcp_session so it become
less indented. Also the error of skipping IPv6 matched
addresses fixed.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Remember to close the files if copy_to_user() failed.
Spotted by dm.n9107@gmail.com.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: DM <dm.n9107@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix fs/bio.c kernel-doc parameter warning:
Warning(linux-2.6.25-git14//fs/bio.c:972): No description found for parameter 'reading'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When UDF filesystem is mounted with noadinicb mount option, it
happens that we extend an empty directory with a block. A code in
udf_add_entry() didn't count with this possibility and used
uninitialized data leading to memory and filesystem corruption.
Add a check whether file already has some extents before operating
on them.
Signed-off-by: Jan Kara <jack@suse.cz>
generic_file_splice_write() duplicates remove_suid() just because it
doesn't hold i_mutex. But it grabs i_mutex inside splice_from_pipe()
anyway, so this is rather pointless.
Move locking to generic_file_splice_write() and call remove_suid() and
__splice_from_pipe() instead.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Was a holdover from the old kernel_thread based cifsd
code. We needed to know that the thread had set the task variable
before proceeding. Now that kthread_run returns the new task, this
doesn't appear to be needed anymore.
As best I can tell, this sleep was intended to try to prevent
cifs_umount from freeing the cifsSesInfo struct before cifsd had
exited. Now that cifsd is using the kthread API, we know that
when kthread_stop returns that cifsd has exited, so I don't
think this is needed any longer.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Christop Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Commit 33dcdac2df ("kill ->put_inode")
removed the final use of i_op->put_inode, but left the now totally
unused "op" variable in iput().
Get rid of it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered.
As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides. We do have BKL around the first STORE,
but check in locks_remove_posix() is outside of BKL and for a good
reason - we don't want BKL on common path of close(2).
Solution is to hold ->file_lock around fcheck() in there; that orders
us wrt removal from descriptor table that preceded locks_remove_posix()
on close path and we either come first (in which case eviction will be
handled by the close side) or we'll see the effect of close and do
eviction ourselves. Note that even though it's read-only access,
we do need ->file_lock here - rcu_read_lock() won't be enough to
order the things.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.
(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)
Also remove a very misleading comment above the defintion of
struct super_operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- remove affs_put_inode, so preallocations aren't discared unnecessarily
often.
- remove affs_drop_inode, it's called with a spinlock held, so it can't
use a mutex.
- make i_opencnt atomic
- avoid direct b_count manipulations
- a few allocation failure fixes, so that these are more gracefully
handled now.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In this unfortunate case, proc_mkdir_mode wrapper can't be used anymore and
this is no way to reuse proc_create_data due to nlinks assignment. So,
copy the code from proc_mkdir and assign PDE->data at the appropriate
moment.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding the ability to get a physical address from point() in addition
to virtual address. This physical address is required for XIP of
userspace code from flash.
Signed-off-by: Jared Hulbert <jaredeh@gmail.com>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Nicolas Pitre <nico@cam.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
To support NFS export, we need to know the parent inode of directories.
Rather than growing the jffs2_inode_cache structure, share space with
the nlink field -- which was always set to 1 for directories anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
* if luser with root sets it to something that is not a multiple of
BITS_PER_LONG, the system is screwed.
* if it gets decreased at the wrong time, we can get expand_files()
returning success and _not_ increasing the size of table as asked.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) none of the callers even looks at inode or file returned by anon_inode_getfd()
b) any caller that would try to look at those would be racy, since by the time
it returns we might have raced with close() from another thread and that
file would be pining for fjords.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We don't actually care about nlink; we only care whether the inode in
question is unlinked or not.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Here are some more places where path_{get,put}() can be used instead of
dput()/mntput() pair. Besides that it fixes a bug in autofs4_mount_busy()
where mntput() was called before dput().
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Moyer has identified a case where the autofs4 function
root.c:try_to_fill_dentry() can return -EBUSY when it should return 0.
Jeff's description of the way this happens is:
"automount starts an expire for directory d. after the callout to the daemon,
but before the rmdir, another process tries to walk into the same directory.
It puts itself onto the waitq, pending the expiration.
When the expire finishes, the second process is woken up. In
try_to_fill_dentry, it does this check:
status = d_invalidate(dentry);
if (status != -EBUSY)
return -EAGAIN;
And status is EBUSY. The dentry still has a non-zero d_inode, and the
flags do not contain LOOKUP_CONTINUE or LOOKUP_DIRECTORY
So, we fall through and return -EBUSY to the caller."
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Moyer has identified a race in due to an execution order dependency
in the autofs4 function root.c:try_to_fill_dentry().
Jeff's description of this race is:
"P1 does a lookup of /mount/submount/foo. Since the VFS can't find an entry
for "foo" under /mount/submount, it calls into the autofs4 kernel module to
allocate a new dentry, D1. The kernel creates a new waitq for this lookup and
calls the daemon to perform the mount.
The daemon performs a mkdir of the "foo" directory under /mount/submount,
which ends up creating a *new* dentry, D2.
Then, P2 does a lookup of /mount/submount/foo. The VFS path walking logic
finds a dentry in the dcache, D2, and calls the revalidate function with this.
In the autofs4 revalidate code, we then trigger a mount, since the dentry is
an empty directory that isn't a mountpoint, and so set DCACHE_AUTOFS_PENDING
and call into the wait code to trigger the mount.
The wait code finds our existing waitq entry (since it is keyed off of the
directory name) and adds itself to the list of waiters.
After the daemon finishes the mount, it calls back into the kernel to release
the waiters. When this happens, P1 is woken up and goes about clearing the
DCACHE_AUTOFS_PENDING flag, but it does this in D1! So, given that P1 in our
case is a program that will immediately try to access a file under
/mount/submount/foo, we end up finding the dentry D2 which still has the
pending flag set, and we set out to wait for a mount *again*!
So, one way to address this is to re-do the lookup at the end of
try_to_fill_dentry, and to clear the pending flag on the hashed dentry. This
seems a sane approach to me."
And Jeff's patch does this.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Catch invalid dentry when calculating its path.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>