Every time I have to deal with btrfs_cont_expand() I stare at it for 20 minutes
trying to remember what exactly it does and why the hell we need it. So add a
comment to save future-Josef some time. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
mark_inode_dirty() will call btrfs_dirty_inode(), which will take care of
updating the inode. This makes setsize a little cleaner since we don't have to
start a transaction and update the inode there; we can just call
mark_inode_dirty().
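A minimal sketch of the resulting shape (the exact call sequence here is an
assumption, not the verbatim patch):

    /* btrfs_setsize(), simplified */
    truncate_setsize(inode, newsize);
    /* ->dirty_inode == btrfs_dirty_inode starts its own transaction
     * and updates the on-disk inode for us */
    mark_inode_dirty(inode);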
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
We don't need an orphan item when expanding files; we only need one when
truncating. So only add the orphan item in btrfs_truncate() instead of in
btrfs_setsize(). Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
This fixes a problem where, if truncate fails, the inode will still be on the
in-memory orphan list. That will make us complain when the inode gets
destroyed, because it's still on the orphan list. So if we fail, just remove
the inode from the in-memory list and carry on.
Signed-off-by: Josef Bacik <josef@redhat.com>
If we cannot truncate an inode for some reason we will never delete the orphan
item associated with that inode, which means that we will loop forever in
btrfs_orphan_cleanup. Instead of doing this, just return the error so we fail
to mount. It sucks, but hey, it's better than hanging. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Now that we can handle having errors in the truncate path, let's make sure we
return errors instead of doing BUG_ON() and such.
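The pattern, roughly (illustrative, not the verbatim diff):

    ret = btrfs_truncate_inode_items(trans, root, inode,
                                     inode->i_size, BTRFS_EXTENT_DATA_KEY);
    if (ret)    /* this used to be BUG_ON(ret) */
        goto out;

Thanks,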
Signed-off-by: Josef Bacik <josef@redhat.com>
->truncate() is going away; instead all of the work needs to be done in
->setattr(). So this converts us over. It's fairly straightforward: just get
rid of our .truncate inode operation and call btrfs_truncate() directly from
btrfs_setsize(). This works out better for us since truncate can technically
return ENOSPC, and before we had no way of letting anybody know.
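Roughly (a sketch with the error handling trimmed; not the verbatim patch):

    static int btrfs_setattr(struct dentry *dentry, struct iattr *attr)
    {
        struct inode *inode = dentry->d_inode;
        int err = inode_change_ok(inode, attr);

        if (err)
            return err;
        if (attr->ia_valid & ATTR_SIZE) {
            err = btrfs_setsize(inode, attr->ia_size);
            if (err)
                return err;    /* e.g. ENOSPC from btrfs_truncate() */
        }
        setattr_copy(inode, attr);
        mark_inode_dirty(inode);
        return 0;
    }

Thanks,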
Signed-off-by: Josef Bacik <josef@redhat.com>
Since we alloc/free free space entries a whole lot, let's use a slab to keep
track of them. This makes some of my tests slightly faster.
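Setup is the usual kmem_cache_create() call (the exact flags are an
assumption):

    btrfs_free_space_cachep = kmem_cache_create("btrfs_free_space",
                sizeof(struct btrfs_free_space), 0,
                SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
    if (!btrfs_free_space_cachep)
        return -ENOMEM;

with kmem_cache_alloc()/kmem_cache_free() replacing the old kmalloc()/kfree()
pairs. Thanks,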
Signed-off-by: Josef Bacik <josef@redhat.com>
We track delayed allocation per inode via two counters: outstanding_extents
and reserved_extents. outstanding_extents is already an atomic_t, but
reserved_extents is not and is protected by a spinlock. So convert it to an
atomic_t and, instead of using a spinlock, use atomic_cmpxchg() when releasing
delalloc bytes. This makes our inode 72 bytes smaller and reduces locking
overhead (albeit it was minimal to begin with).
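The release side becomes the usual cmpxchg loop, something like this (a
sketch; the computation of 'to_free' is elided):

    int old, new;

    do {
        old = atomic_read(&BTRFS_I(inode)->reserved_extents);
        new = old - to_free;
    } while (atomic_cmpxchg(&BTRFS_I(inode)->reserved_extents,
                            old, new) != old);

Thanks,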
Signed-off-by: Josef Bacik <josef@redhat.com>
Really we don't need to memset the pages array at all, since we know how many
pages we're going to use in the array and pass that around. So don't memset;
just trust that we're not idiots and pass num_pages around properly.
Signed-off-by: Josef Bacik <josef@redhat.com>
Our aio_write function is huge and kind of hard to follow at times. So this
patch fixes that by breaking the buffered and direct write paths out into
separate functions, so it's a little clearer what's going on. I've also fixed
some incorrect types we had and added the ability to handle getting an error
back from btrfs_set_extent_delalloc. Tested this with xfstests and everything
came out fine.
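The top level ends up as a simple dispatch; the helper names and parameters
below are assumptions:

    if (unlikely(file->f_flags & O_DIRECT))
        num_written = __btrfs_direct_write(iocb, iov, nr_segs,
                                           pos, count);
    else
        num_written = __btrfs_buffered_write(file, iov, nr_segs,
                                             pos, count);

Thanks,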
Signed-off-by: Josef Bacik <josef@redhat.com>
We must not use 'dummy' for the index: after the first index is read,
READ32(dummy) will overwrite dummy itself!
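An illustration of the bug (READ32()/READ_BUF() as in fs/nfsd/nfs4xdr.c,
where READ32(x) roughly expands to x = ntohl(*p++)):

    READ_BUF(4 * dummy);    /* bounds-checks the whole array - sufficient */
    for (i = 0; i < dummy; i++)
        READ32(dummy);      /* clobbers the loop bound on the first pass */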
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: Trond points out READ_BUF alone is sufficient.]
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Replace EXTRA_CFLAGS with ccflags-y. And change ntfs-objs to ntfs-y
for cleaner conditional inclusion.
Signed-off-by: matt mooney <mfm@muteddisk.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
We don't have proper reference counting for this yet, so we run into
cases where the device is pulled and we OOPS on flushing the fs data.
This happens even though the dirty inodes have already been
migrated to the default_backing_dev_info.
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.
Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.
Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.
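A sketch of the resulting split (bioset_integrity_create() is the explicit
call; the surrounding logic is an assumption):

    struct bio_set *bs;

    bs = bioset_create(pool_size, front_pad);
    if (!bs)
        return NULL;
    /* only metadevices that pass integrity metadata through pay for
     * the integrity mempool */
    if (want_integrity && bioset_integrity_create(bs, pool_size)) {
        bioset_free(bs);
        return NULL;
    }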
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
'write_op' was still used, even though it is always WRITE_SYNC now.
Add plugging around the cases where it submits IO, and flush them
before we end up waiting for that IO.
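The pattern, using the new plugging API:

    struct blk_plug plug;

    blk_start_plug(&plug);
    /* submit the batch of WRITE_SYNC bios here */
    blk_finish_plug(&plug);    /* flush before waiting on the IO */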
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
'write_op' was still used, even though it is always WRITE_SYNC now.
Add plugging around the cases where it submits IO, and flush them
before we end up waiting for that IO.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
It used WRITE_SYNC_PLUG before and potentially submits a batch
of IO, so let's enable plugging for this case.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
When allocating a new inode, we need to make sure i_sync_tid and
i_datasync_tid are initialized. Otherwise, one or both of these two values
could be left at zero, which could potentially result in a BUG_ON in
jbd2_journal_commit_transaction. (This could happen by having
journal->commit_request getting set to zero, which would wake up the
kjournald process even though there is no running transaction, which then
triggers the J_ASSERT(journal->j_running_transaction != NULL) assertion.)
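A minimal sketch of the fix (whether the tid really comes from the current
handle's transaction is an assumption):

    struct ext4_inode_info *ei = EXT4_I(inode);
    tid_t tid = handle->h_transaction->t_tid;

    ei->i_sync_tid = tid;
    ei->i_datasync_tid = tid;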
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The last remaining instances of ->get_sb() can be converted to ->mount()
now - nothing in them uses the new vfsmount anymore.
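The conversion is mechanical; for a block-based filesystem it looks roughly
like this ('foo' is hypothetical):

    static struct dentry *foo_mount(struct file_system_type *fs_type,
            int flags, const char *dev_name, void *data)
    {
        return mount_bdev(fs_type, flags, dev_name, data, foo_fill_super);
    }

    static struct file_system_type foo_fs_type = {
        .owner   = THIS_MODULE,
        .name    = "foo",
        .mount   = foo_mount,
        .kill_sb = kill_block_super,
    };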
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) ->show_devname(m, mnt) - what to put into devname columns in mounts,
mountinfo and mountstats
b) ->show_path(m, mnt) - what to put into relative path column in mountinfo
Leaving those NULL gives old behaviour. NFS switched to using those.
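For example (hypothetical 'foo' names; the string shown is made up):

    static int foo_show_devname(struct seq_file *m, struct vfsmount *mnt)
    {
        /* what to print in the devname column instead of mnt_devname */
        return seq_puts(m, "foo:/volume");
    }

    static const struct super_operations foo_super_ops = {
        .show_devname = foo_show_devname,
    };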
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
part 3: now we have everything needed to get nfs_path() just from a dentry -
just follow to the (disconnected) root and pick up the rest of the path
there.
Start killing propagation of struct vfsmount * on the paths that
used to bring it to nfs_path().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
part 2: make sure that disconnected roots have corresponding mnt_devname
values stashed into them.
Have nfs*_get_root() stuff a copy of devname into ->d_fsdata of the
found root, provided that it is disconnected.
Have ->d_release() free it when the dentry goes away.
Have the places where NFS uses ->d_fsdata for sillyrename (and that
can *never* happen to a disconnected root - the dentry will be attached
to its parent) free old devname copies if they find them.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
According to RFC 5661:
    ca_maxresponsesize_cached:
        Like ca_maxresponsesize, but the maximum size of a reply that
        will be stored in the reply cache (Section 2.10.6.1). For each
        channel, the server MAY decrease this value, but MUST NOT
        increase it.
The latest kernel (2.6.38-rc8) may increase this value, ignoring the
request's ca_maxresponsesize_cached value. We should not ignore it.
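In other words (the field and variable names here are assumptions, not the
actual nfsd code):

    /* the server may lower, but must never raise, the client's value */
    ca->maxresp_cached = min(ca->maxresp_cached, server_limit);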
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It turns out that while a maximum of 8 partitions may be what people
"should" have had, you can actually fit up to 18 entries(*) in a sector.
And some people clearly were taking advantage of that, like Michael
Cree, who had ten partitions on one of his OSF disks.
(*) The OSF partition data starts at byte offset 64 in the first sector,
and the array of 16-byte partition entries starts at offset 148 in
the on-disk partition structure.
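That leaves (512 - 64 - 148) / 16 = 18.75, i.e. room for 18 complete
16-byte entries in a 512-byte sector.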
Reported-by: Michael Cree <mcree@orcon.net.nz>
Cc: stable@kernel.org (v2.6.38)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iprune_sem is continuously giving us lockdep warnings because we do take it in
read mode in the reclaim path, but we're also doing non-NOFS allocations under
it taken in write mode.
Taking a deeper look at it, I think it's fixable quite trivially:
- for invalidate_inodes we do not need iprune_sem at all. We have an active
reference on the superblock, so the filesystem is not going away until it
has finished.
- for evict_inodes we do need it, to make sure prune_icache has done its
work before we tear down the superblock. But there is no reason to
hold it over the actual reclaim operation - it's enough to cycle through
it after the actual reclaim to make sure we wait for any pending
prune_icache to complete. We just have to remove the WARN_ON for
otherwise busy inodes as they can actually happen now.
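The cycle at the end of evict_inodes() is then roughly just:

    /* wait for any prune_icache() that is still in flight */
    down_write(&iprune_sem);
    up_write(&iprune_sem);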
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When debugging is enabled, we allocate a buffer of PEB size for
various debugging purposes. However, now all users of this buffer
are gone and we can safely remove it and save 128KiB or more RAM.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dbg_scan_orphans()', dynamically allocate it when needed. The intent
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
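The shape of the change, roughly (the error message text is an assumption):

    buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
    if (!buf) {
        ubifs_err("cannot allocate memory to scan orphans");
        return -ENOMEM;
    }
    /* ... scan using 'buf' instead of c->dbg->buf ... */
    vfree(buf);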
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dump_lpt_leb()', dynamically allocate it when needed. The intent
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dbg_check_ltab_lnum()', dynamically allocate it when needed. The
intent is to get rid of the pre-allocated 'c->dbg->buf' buffer and
save 128KiB of RAM (or more if PEB size is larger). Indeed,
currently we allocate this memory even if the user never enables
any self-check, which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'scan_check_cb()', dynamically allocate it when needed. The intent
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dbg_dump_leb()', dynamically allocate it when needed. The intent
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Handle the rare case where a directory metadata block is uncompressed and
corrupted, leading to a kernel oops in directory scanning (memcpy).
Normally corruption is detected at the decompression stage and dealt with
then; however, this will not happen if:
- metadata isn't compressed (users can optionally request no metadata
compression), or
- the compressed metadata block was larger than the original, in which
case the uncompressed version was used, or
- the data was corrupt after decompression
This patch fixes this by adding some sanity checks against known maximum
values.
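The checks are simple bounds tests before the copy, along the lines of
(a sketch; 'length' is illustrative, SQUASHFS_METADATA_SIZE is the on-disk
cap):

    /* reject anything larger than a metadata block can legally be */
    if (length > SQUASHFS_METADATA_SIZE)
        return -EIO;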
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
The new vfs locking scheme introduced in 2.6.38 breaks NFS sillyrename
because the latter relies on being able to determine the parent
directory of the dentry in the ->d_iput() callback in order to send the
appropriate unlink rpc call.
Looking at the code that cares about races with dput(), there doesn't
seem to be anything that specifically uses d_parent as a test for
whether or not there is a race:
- __d_lookup_rcu() and __d_lookup() both test for d_hashed() after d_parent
- shrink_dcache_for_umount() is safe since nothing else can rearrange
the dentries in that super block.
- have_submounts(), select_parent() and d_genocide() can test for a
deletion if we set the DCACHE_DISCONNECTED flag when the dentry
is removed from the parent's d_subdirs list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org (2.6.38, needs commit c826cb7dfc "dcache.c: create
helper function for duplicated functionality")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This creates a helper function for the "try to ascend into the parent
directory" case, which was written out in triplicate before. With all
the locking and subtle sequence number stuff, we really don't want to
duplicate that kind of code.
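A sketch of what such a helper looks like (the exact checks are an
assumption; see fs/dcache.c for the real thing):

    static struct dentry *try_to_ascend(struct dentry *old, int locked,
                                        unsigned seq)
    {
        struct dentry *new = old->d_parent;

        rcu_read_lock();
        spin_unlock(&old->d_lock);
        spin_lock(&new->d_lock);
        /* a concurrent rename or deletion may mean 'new' is no longer
         * our parent; give up and let the caller restart */
        if (new != old->d_parent ||
            (!locked && read_seqretry(&rename_lock, seq))) {
            spin_unlock(&new->d_lock);
            new = NULL;
        }
        rcu_read_unlock();
        return new;
    }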
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pull the handling of current->total_link_count into
__do_follow_link()
* put the common "do ->put_link() if needed and path_put() the link"
stuff into a helper (put_link(nd, link, cookie)); see the sketch after
this list
* rename __do_follow_link() to follow_link(), while we are at it
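The helper is roughly:

    static inline void put_link(struct nameidata *nd, struct path *link,
                                void *cookie)
    {
        struct inode *inode = link->dentry->d_inode;

        if (!IS_ERR(cookie) && inode->i_op->put_link)
            inode->i_op->put_link(link->dentry, nd, cookie);
        path_put(link);
    }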
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The last remaining place (resolution of nested symlinks) is converted
to the same kind of loop we have in path_lookupat() and
path_openat().
Note that we still *do* have a recursion in pathname resolution;
can't avoid it, really. However, it's strictly for nested symlinks
now - i.e. ones in the middle of a pathname.
link_path_walk() has lost the tail now - it always walks everything
except the last component.
do_follow_link() renamed to nested_symlink() and moved down.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that link_path_walk() is called without LOOKUP_PARENT
only from do_follow_link(), we can simplify the checks in
last component handling. First of all, checking if we'd
arrived at a directory is not needed - the caller will check
it anyway. And LOOKUP_FOLLOW is guaranteed to be there,
since we only get to that place with nd->depth > 0.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
new helper: walk_component(). Handles everything except symlinks;
returns negative on error, 0 on success and 1 on symlinks we decided
to follow. Drops out of RCU mode on such symlinks.
link_path_walk() and do_last() switched to using that.
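Callers end up with the obvious pattern (a sketch of the contract, not
verbatim code):

    err = walk_component(nd, &next, &this, type, LOOKUP_FOLLOW);
    if (err < 0)
        return err;
    if (err) {
        /* 1: a symlink we decided to follow; RCU mode already dropped */
        err = nested_symlink(&next, nd);
        if (err)
            return err;
    }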
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>