cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget
Rather than allocating an inode and filling it out, have
cifs_get_inode_info fill out a cifs_fattr and call cifs_iget. This means
a pretty hefty reorganization of cifs_get_inode_info.
For the readdir codepath, add a couple of new functions for filling out
cifs_fattr's from different FindFile response infolevels.
Finally, remove cifs_new_inode since there are no more callers.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls
When there's an open filehandle, SET_FILE_INFO is apparently preferred
over SET_PATH_INFO. Add a new variant that sets a FILE_UNIX_INFO_BASIC
infolevel via SET_FILE_INFO and switch cifs_setattr_unix to use the
new call when there's an open filehandle available.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO
The SET_FILE_INFO variant will need to do the same thing here. Break
this code out into a separate function that both variants can call.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo
...in preparation of adding a SET_FILE_INFO variant.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: add pid of initiating process to spnego upcall info
This will allow the upcall to poke in /proc/<pid>/environ and get
the value of the $KRB5CCNAME env var for the process.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
In the 'ubifs_recover_leb()' function, when we find corrupted
empty space, we dump 8K starting from the offset where the last
node ends. This is OK if the corrupted empty space is somewhere
near that offset. But if the corruption is far at the end of the
LEB, we will dump all 0xFF bytes and complitely ignore the
interesting data. This is observed on a PPC ("kilauea") with
NOR flash.
This patch changes the behavior and teaches UBIFS to print only
interesting data. I.e., now we find where corruption starts and
start dumping from that offset.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
recovery.c has 'is_empty()' helper and it is better to use
this helper instead of re-implementing it in several places.
This patch does this and removes some amount of unneeded code.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
This patch fixes few minor things I've spotted while going through
code:
1. Better document return codes
2. If 'ubifs_scan_a_node()' returns some thing we do not expect,
treat this as an error.
3. Try to do recovery only when 'ubifs_scan()' returns %-EUCLEAN,
not on any error.
4. If empty space starts at a non-aligned address, print a message.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
In case of corruptions, dump 8192 bytes instead of 4096. The
largest node is 4096+ bytes, so it is better to see a node
boundary, which is not always possible when only 4096 bytes
are printed.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
Commit 5fd29d6ccb ("printk: clean up
handling of log-levels and newlines") changed printk semantics. printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.
<level> is now included in the output on each additional use.
Remove all uses of multiple KERN_<level>s in formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1c8542c7bb replaced kmalloc() with memdup_user() in the write()
function but also dropped the kfree(temp). The memdup_user() function
allocates memory which is never freed.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix various silly problems wrt mnt_namespace.h:
- exit_mnt_ns() isn't used, remove it
- done that, sched.h and nsproxy.h inclusions aren't needed
- mount.h inclusion was need for vfsmount_lock, but no longer
- remove mnt_namespace.h inclusion from files which don't use anything
from mnt_namespace.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following test script triggers a deadlock on ext2 filesystem:
while true; do quotaon /dev/hda >&/dev/null; usleep $RANDOM; done &
while true; do quotaoff /dev/hda >&/dev/null; usleep $RANDOM; done &
I found there is a potential deadlock between quotaon and quotaoff (or
quotasync). Basically, all of quotactl operations need to be protected by
dqonoff_mutex. vfs_quota_off and vfs_quota_sync also call sb->s_op->quota_write
that needs to grab the i_mutex of the quota file. But in vfs_quota_on_inode
(called from quotaon operation), the current code tries to grab the i_mutex of
the quota file first before getting quonoff_mutex.
Reverse the order in which we take locks in vfs_quota_on_inode().
Jan Kara: Changed changelog to be more readable, made lockdep happy with
I_MUTEX_QUOTA.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
do_execve() and ptrace_attach() return -EINTR if
mutex_lock_interruptible(->cred_guard_mutex) fails.
This is not right, change the code to return ERESTARTNOINTR.
Perhaps we should also change proc_pid_attr_write().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I run many ffsb test cases on JBODs (typically 13/12 disks). Comparing
with kernel 2.6.30, 2.6.31-rc1 has about 16% regression with
ffsb_create_4k. The sub test case creates files continuously for 10
minitues and every file is 1MB.
Bisect located below patch.
5cee5815d1 is first bad commit
commit 5cee5815d1
Author: Jan Kara <jack@suse.cz>
Date: Mon Apr 27 16:43:51 2009 +0200
vfs: Make sys_sync() use fsync_super() (version 4)
It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.
As a matter of fact, ffsb calls sys_sync in the end to make sure all data
is flushed to disks and the flushing is counted into the result. vmstat
shows ffsb is blocked when syncing for a long time. With 2.6.30, ffsb is
blocked for a short time.
I checked the patch and did experiments to recover the original methods.
Eventually, the root cause is the patch deletes the calling to
wakeup_pdflush when syncing, so only ffsb is blocked on disk I/O.
wakeup_pdflush could ask pdflush to write back pages with ffsb at the
same time.
[akpm@linux-foundation.org: restore comment too]
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
UBIFS uses a bdi device per volume, but does not care to hand out unique
names to each of them. This causes an error when trying to mount more
than one volumes. Append the UBI volume and device ID to avoid that.
[Amended a bit by Artem Bityutskiy]
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When debugging is enabled and an unclean file-system is mounter,
the following assertion is triggered:
UBIFS assert failed in ubifs_tnc_start_commit at 805 (pid 1081)
Call Trace:
[cfaffbd0] [c0006cf8] show_stack+0x44/0x16c (unreliable)
[cfaffc10] [c011b738] ubifs_tnc_start_commit+0xbb8/0xd18
[cfaffc90] [c0112670] do_commit+0x150/0xa44
[cfaffd10] [c0125234] ubifs_rcvry_gc_commit+0xd8/0x544
[cfaffd60] [c0100e9c] ubifs_fill_super+0xe78/0x15f8
[cfaffdf0] [c0102118] ubifs_get_sb+0x20c/0x320
[cfaffe70] [c007f764] vfs_kern_mount+0x58/0xe0
[cfaffe90] [c007f83c] do_kern_mount+0x40/0xf8
[cfaffeb0] [c0095c24] do_mount+0x550/0x758
[cfafff10] [c0095ebc] sys_mount+0x90/0xe0
[cfafff40] [c000ed4c] ret_from_syscall+0x0/0x3c
The reason is that we initialize 'c->min_leb_idx' early, and do
not re-calculate it after journal replay.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch adds the following minor optimization:
1. If write-buffer does not use the timer, indicate it with the
wbuf->no_timer variable, instead of using the wbuf->softlimit
variable. This is better because wbuf->softlimit is of ktime_t
type, and the ktime_to_ns function contains 64-bit multiplication.
2. Do not call the 'hrtimer_cancel()' function for write-buffers
which do not use timers.
3. Do not cancel the timer in 'ubifs_put_super()' because the
synchronization function does this.
This patch also removes a confusing comment.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
1. Make the I/O debugging message print the journal head number.
2. Add prints to timer functions.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Fix the following warning:
fs/ubifs/io.c: In function 'ubifs_wbuf_init':
fs/ubifs/io.c:860: warning: integer overflow in expression
And limit maximum hrtimer delta to ULONG_MAX because the
argument is 'unsigned long'.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This fixes a bug that checkpoint count gets wrong on errors when
deleting a series of checkpoints.
The count error is persistent since the checkpoint count is stored on
disk. Some userland programs refer to the count via ioctl, and this
bugfix is needed to prevent malfunction of such programs.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
This will fix the following false positive of recursive locking which
lockdep has detected:
=============================================
[ INFO: possible recursive locking detected ]
2.6.30-nilfs #42
---------------------------------------------
nilfs_cleanerd/10607 is trying to acquire lock:
(&bmap->b_sem){++++-.}, at: [<e0d025b7>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2]
but task is already holding lock:
(&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2]
other info that might help us debug this:
2 locks held by nilfs_cleanerd/10607:
#0: (&nilfs->ns_segctor_sem){++++.+}, at: [<e0d0d75a>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
#1: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2]
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Leandro Lucarella gave me a report that nilfs gets stuck after its
write function fails.
The problem turned out to be caused by bugs which leave writeback flag
on pages. This fixes the problem by ensuring to clear the writeback
flag in error path.
Reported-by: Leandro Lucarella <llucax@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
The following error code handling in nilfs_segctor_write() function
wrongly converted negative error codes to a truth value (i.e. 1):
err = unlikely(err) ? : res;
which originaly meant to be
err = err ? : res;
This mis-conversion caused that write or sync functions receive the
unexpected error code. This fixes the bug by removing the unlikely
directive.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
nfsd_open() gets an unrefcounted pointer to the current process's effective
credentials at the top of the function, then calls nfsd_setuser() via
fh_verify() - which may replace and destroy the current process's effective
credentials - and then passes the unrefcounted pointer to dentry_open() - but
the credentials may have been destroyed by this point.
Instead, the value from current_cred() should be passed directly to
dentry_open() as one of its arguments, rather than being cached in a variable.
Possibly fh_verify() should return the creds to use.
This is a regression introduced by
745ca2475a "CRED: Pass credentials through
dentry_open()".
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-and-Verified-By: Steve Dickson <steved@redhat.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Make an error msg look nicer by inserting a space between number and word.
Signed-off-by: Hu Tao <hu.taoo@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
worker memory is already freed on one fail path in btrfs_start_workers,
but is still dereferenced. Switch the dereference and kfree.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The btrfs attr patches unconditionally inherited the inode flags field
without honoring nodatacow and nodatasum. This fix makes sure
we properly record the nodatacow/sum mount options in new inodes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The new backref format has restriction on type of backref item. If a tree
block isn't referenced by its owner tree, full backrefs must be used for the
pointers in it. When a tree block loses its owner tree's reference, backrefs
for the pointers in it should be updated to full backrefs. Current
btrfs_drop_snapshot misses the code that updates backrefs, so it's unsafe for
general use.
This patch adds backrefs update code to btrfs_drop_snapshot. It isn't a
problem in the restricted form btrfs_drop_snapshot is used today, but for
general snapshot deletion this update is required.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Using Eric Sandeen's xfstest for fallocate, you can easily trigger a ENOSPC
panic on btrfs. This is because we do not account for data we may use when
doing the fallocate. This patch fixes the problem by properly reserving space,
and then just freeing it when we are done. The reservation stuff was made with
delalloc in mind, so its a little crude for this case, but it keeps the box
from panicing.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The per-user inotify_devs value is incremented each time a new file is
allocated, but never decremented. This led to inotify_init failing after a
limited number of calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
cifs: add new cifs_iget function and convert unix codepath to use it
In order to unify some codepaths, introduce a common cifs_fattr struct
for storing inode attributes. The different codepaths (unix, legacy,
normal, etc...) can fill out this struct with inode info. It can then be
passed as an arg to a common set of routines to get and update inodes.
Add a new cifs_iget function that uses iget5_locked to identify inodes.
This will compare inodes based on the uniqueid value in a cifs_fattr
struct.
Rather than filling out an already-created inode, have
cifs_get_inode_info_unix instead fill out cifs_fattr and hand that off
to cifs_iget. cifs_iget can then properly look for hardlinked inodes.
On the readdir side, add a new cifs_readdir_lookup function that spawns
populated dentries. Redefine FILE_UNIX_INFO so that it's basically a
FILE_UNIX_BASIC_INFO that has a few fields wrapped around it. This
allows us to more easily use the same function for filling out the fattr
as the non-readdir codepath.
With this, we should then have proper hardlink detection and can
eventually get rid of some nasty CIFS-specific hacks for handing them.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Check before use it.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch restores stacking ability to the block layer integrity
infrastructure by creating a set of dedicated bip slabs. Each bip slab
has an embedded bio_vec array at the end. This cuts down on memory
allocations and also simplifies the code compared to the original bvec
version. Only the largest bip slab is backed by a mempool. The pool is
contained in the bio_set so stacking drivers can ensure forward
progress.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.(none)>
Maximum file size for hostfs mounts defaults to 2GB, so bigger files cannot be
read/written through hostfs. This patch initializes the maximum file size to
MAX_LFS_SIZE.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13531
Signed-off-by: Wolfgang Illmeyer <wolfgang@illmeyer.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly. However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode. This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.
The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.
This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
from ext2_iget(), as ext2 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.
Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With ELF, at generating coredump, some more headers other than used
vmas are added.
When max_map_count == 65536, a core generated by following kinds of
code can be unreadable because the number of ELF's program header is
written in 16bit in Ehdr (please see elf.h) and the number overflows.
==
... = mmap(); (munmap, mprotect, etc...)
if (failed)
abort();
==
This can happen in mmap/munmap/mprotect/etc...which calls split_vma().
I think 65536 is not safe as _default_ and reduce it to 65530 is good
for avoiding unexpected corrupted core.
Anyway, max_map_count can be enlarged by sysctl if a user is brave..
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jakub Jelinek <jakub@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the eventfd interface to de-couple the eventfd memory context, from
the file pointer instance.
Without such change, there is no clean way to racely free handle the
POLLHUP event sent when the last instance of the file* goes away. Also,
now the internal eventfd APIs are using the eventfd context instead of the
file*.
This patch is required by KVM's IRQfd code, which is still under
development.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't unlock on vfs_rejected_lock path in afs_do_setlk, since the lock
is unlocked after abort_attempt label.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add notification messages that allow the filesystem to invalidate VFS
caches.
Two notifications are added:
1) inode invalidation
- invalidate cached attributes
- invalidate a range of pages in the page cache (this is optional)
2) dentry invalidation
- try to invalidate a subtree in the dentry cache
Care must be taken while accessing the 'struct super_block' for the
mount, as it can go away while an invalidation is in progress. To
prevent this, introduce a rw-semaphore, that is taken for read during
the invalidation and taken for write in the ->kill_sb callback.
Cc: Csaba Henk <csaba@gluster.com>
Cc: Anand Avati <avati@zresearch.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
This patch lets filesystems handle masking the file mode on creation.
This is needed if filesystem is using ACLs.
- The CREATE, MKDIR and MKNOD requests are extended with a "umask"
parameter.
- A new FUSE_DONT_MASK flag is added to the INIT request/reply. With
this the filesystem may request that the create mode is not masked.
CC: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>