Delete the unused variable "err" in v9fs_vfs_getattr()
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Allow requests for security.* and trusted.* xattr name spaces
to pass through to server.
The new files are 99% cut and paste from fs/9p/xattr_user.c with the
namespaces changed. It has the intended effect in superficial testing.
I do not know much detail about how these namespaces are used, but passing
them through to the server, which can decide whether to handle them or not,
seems reasonable.
I want to support a use case where an ext4 file system is mounted via 9P,
then re-exported via samba to windows clients in a cluster. Windows wants
to store xattrs such as security.NTACL. This works when ext4 directly
backs samba, but not when 9P is inserted. This use case is documented here:
http://code.google.com/p/diod/issues/detail?id=95
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.
Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).
This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.
We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.
Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
... otherwise the path we'd built isn't worth much. Don't accept such fids
obtained from paths unless dentry is still alived by the end of the work.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The following set of operations on a NFS client and server will cause
server# mkdir a
client# cd a
server# mv a a.bak
client# sleep 30 # (or whatever the dir attrcache timeout is)
client# stat .
stat: cannot stat `.': Stale NFS file handle
Obviously, we should not be getting an ESTALE error back there since the
inode still exists on the server. The problem is that the lookup code
will call d_revalidate on the dentry that "." refers to, because NFS has
FS_REVAL_DOT set.
nfs_lookup_revalidate will see that the parent directory has changed and
will try to reverify the dentry by redoing a LOOKUP. That of course
fails, so the lookup code returns ESTALE.
The problem here is that d_revalidate is really a bad fit for this case.
What we really want to know at this point is whether the inode is still
good or not, but we don't really care what name it goes by or whether
the dcache is still valid.
Add a new d_op->d_weak_revalidate operation and have complete_walk call
that instead of d_revalidate. The intent there is to allow for a
"weaker" d_revalidate that just checks to see whether the inode is still
good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
special casing.
[AV: changed method name, added note in porting, fixed confusion re
having it possibly called from RCU mode (it won't be)]
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix up the ->page_mkwrite handler to provide stable page writes if necessary.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modify v9fs_get_fsgid_for_create to return a kgid and modify all of
the variables that hold the result of v9fs_get_fsgid_for_create to be
of type kgid_t.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Change struct v9fs_session_info and the code that popluates it to use
kuids and kgids. When parsing the 9p mount options convert the
dfltuid, dflutgid, and the session uid from the current user namespace
into kuids and kgids. Modify V9FS_DEFUID and V9FS_DEFGUID to be kuid
and kgid values.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Change struct 9p_fid and it's associated functions to
use kuid_t's instead of uid_t.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
9p has thre strucrtures that can encode inode stat information. Modify
all of those structures to contain kuid_t and kgid_t values. Modify
he wire encoders and decoders of those structures to use 'u' and 'g' instead of
'd' in the format string where uids and gids are present.
This results in all kuid and kgid conversion to and from on the wire values
being performed by the same code in protocol.c where the client is known
at the time of the conversion.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Modify the p9_client_rpc format specifiers of every function that
directly transmits a uid or a gid from 'd' to 'u' or 'g' as
appropriate.
Modify those same functions to take kuid_t and kgid_t parameters
instead of uid_t and gid_t parameters.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Return EEXISTS if requested file already exists, without this patch open
call will always succeed even if the file exists and user specified
O_CREAT|O_EXCL.
Following test code can be used to verify this patch. Without this patch
executing following test code on 9p mount will result in printing 'test case
failed' always.
main()
{
int fd;
/* first create the file */
fd = open("./file", O_CREAT|O_WRONLY);
if (fd < 0) {
perror("open");
return -1;
}
close(fd);
/* Now opening same file with O_CREAT|O_EXCL should fail */
fd = open("./file", O_CREAT|O_EXCL);
if (fd < 0 && errno == EEXIST)
printf("test case pass\n");
else
printf("test case failed\n");
close(fd);
return 0;
}
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
We do the truncate via setattr request, hence don't pass the O_TRUNC flag in
open request. Without this patch we end up sending zero sized write request
to server when we try to truncate. Some servers (VirtFS) were not handling that
properly.
Reported-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
... is really excessive. First of all, ->readdir() is serialized by
file->f_path.dentry->d_inode->i_mutex; playing with file->f_path.dentry->d_lock
is not buying you anything. Moreover, rdir->mutex is pointless for exactly
the same reason - you'll never see contention on it.
While we are at it, there's no point in having rdir->buf a pointer -
you have it point just past the end of rdir, so it might as well be a flex
array (and no, it's not a gccism).
Absolutely untested patch follows:
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().
Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.
Now device drivers can implement this method and obtain nonlinear vma support.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.
Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- Pass the user namespace the uid and gid values in the xattr are stored
in into posix_acl_from_xattr.
- Pass the user namespace kuid and kgid values should be converted into
when storing uid and gid values in an xattr in posix_acl_to_xattr.
- Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
pass in &init_user_ns.
In the short term this change is not strictly needed but it makes the
code clearer. In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.
Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
These are generally very small strings. We don't need an entire 4k
allocation for each. Instead, just free and reallocate them on an
as-needed basis.
Note: This patch is untested since I don't have a 9p server available at
the moment. It's mainly something I noticed while doing some
getname/putname cleanup work.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Reading a symlink longer than the given buffer, a p9_debug use would
try to print the link name (not NUL-terminated) using a %s format.
Use %.*s instead, and replace the strncpy+strnlen with functionally
equivalent strlen+memcpy.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: v9fs-developer@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called. They could also be passed to the
compare function.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just pass struct file *. Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int. Next: saner prototypes for parts in
namei.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change of calling conventions:
old new
NULL 1
file 0
ERR_PTR(-ve) -ve
Caller *knows* that struct file *; no need to return it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.
[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create. No functionality is changed.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In *all* callers we have a dentry of child of that directory.
Just use ->d_parent of that one, for fsck sake...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>