commit 8268f5a741 ("deny partial write for loop dev fd") tried to fix the
loop device partial read information leak problem. But it changed the
semantics of read behavior. When we read beyond the end of the device we
should get 0 bytes, which is normal behavior, we should not just return
-EIO
Instead of returning -EIO, zero out the bio to avoid information leak in
case of partail read.
Signed-off-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Dmitry Monakhov <dmonakhov@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
floppy driver does not call add_disk() on all the drives hence we don't take
gendisk reference on request queue for these drives. Don't call put_disk()
with disk->queue set, otherwise we try to put the reference we never took.
Reported-and-tested-by: Dirk Gouders <gouders@et.bocholt.fh-gelsenkirchen.de>
Signed-off-by: Vivek Goyal<vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
add_disk() takes gendisk reference on request queue. If driver failed during
initialization and never called add_disk() then that extra reference is not
taken. That reference is put in put_disk(). floppy driver allocates the
disk, allocates queue, sets disk->queue and then relizes that floppy
controller is not present. It tries to tear down everything and tries to
put a reference down in put_disk() which was never taken.
In such error cases cleanup disk->queue before calling put_disk() so that
we never try to put down a reference which was never taken in first place.
Reported-and-tested-by: Suresh Jayaraman <sjayaraman@suse.com>
Tested-by: Dirk Gouders <gouders@et.bocholt.fh-gelsenkirchen.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Removed the following:
* irrelevant argument 'barrier' of mtip_hw_submit_io()
* unused member 'eh_active' of struct driver_data
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The rbd_client structure uses a kref to arrange for cleaning up and
freeing an instance when its last reference is dropped. The cleanup
routine is rbd_client_release(), and one of the things it does is
delete the rbd_client from rbd_client_list. It acquires node_lock
to do so, but the way it is done is still not safe.
The problem is that when attempting to reuse an existing rbd_client,
the structure found might already be in the process of getting
destroyed and cleaned up.
Here's the scenario, with "CLIENT" representing an existing
rbd_client that's involved in the race:
Thread on CPU A | Thread on CPU B
--------------- | ---------------
rbd_put_client(CLIENT) | rbd_get_client()
kref_put() | (acquires node_lock)
kref->refcount becomes 0 | __rbd_client_find() returns CLIENT
calls rbd_client_release() | kref_get(&CLIENT->kref);
| (releases node_lock)
(acquires node_lock) |
deletes CLIENT from list | ...and starts using CLIENT...
(releases node_lock) |
and frees CLIENT | <-- but CLIENT gets freed here
Fix this by having rbd_put_client() acquire node_lock. The result
could still be improved, but at least it avoids this problem.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
If an existing rbd client is found to be suitable for use in
rbd_get_client(), the rbd_options structure is not being
freed as it should. Fix that.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
The type of 'make_request_fn' changed in 5a7bbad27a ("block: remove
support for bio remapping from ->make_request"), but the merge of the
nvme driver didn't take that into account, and as a result the driver
would compile with a warning:
drivers/block/nvme.c: In function 'nvme_alloc_ns':
drivers/block/nvme.c:1336:2: warning: passing argument 2 of 'blk_queue_make_request' from incompatible pointer type [enabled by default]
include/linux/blkdev.h:830:13: note: expected 'void (*)(struct request_queue *, struct bio *)' but argument is of type 'int (*)(struct request_queue *, struct bio *)'
It's benign, but the warning is annoying.
Reported-by: Stephen Rothwell <sfr@canb.auug.org>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a wrapper around scsi_cmd_ioctl that takes a block device.
The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter points out that it's an int, not a bool:
pcd.c:427: if (verbose > 1)
pcd.c:433: if (verbose > 1)
pcd.c:437: if (verbose < 2)
pcd.c:506:#define DBMSG(msg) ((verbose>1)?(msg):NULL)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
New rbd device structures get initialized in rbd_add(). Many of
the fields rely on being initially zero-filled. However we lockdep
was noticing that the rw_semaphore embedded in the header field
was not getting properly initialized. Fix that.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Delete the vq and flush any pending requests from the block queue on the
freeze callback to prepare for hibernation.
Re-create the vq in the restore callback to resume normal function.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The probe and PM restore functions will share this code.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix a theoretical race related to config work
handler: a config interrupt might happen
after we flush config work but before we
reset the device. It will then cause the
config work to run during or after reset.
Two problems with this:
- if this runs after device is gone we will get use after free
- access of config while reset is in progress is racy
(as layout is changing).
As a solution
1. flush after reset when we know there will be no more interrupts
2. add a flag to disable config access before reset
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Remove wrapper functions. This makes the allocation type explicit in
all callers; I used GPF_KERNEL where it seemed obvious, left it at
GFP_ATOMIC otherwise.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The number of submission & completion queues should be set by calling
Set Features, not Get Features.
Reported-by: Kwok Kong <Kwok.Kong@idt.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
QUEUE_FLAG_* are flags (other than QUEUE_FLAG_DEFAULT), so they cannot
be ORed together. Set the queue flags using queue_flag_set_unlocked().
Reported-by: Donald Wood <donald.e.wood@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
By using the iod->nents field (the same way other I/O paths do), we can
avoid recalculating the number of sg entries at unmap time, and make
nvme_unmap_user_pages() easier to call.
Also, use the 'write' parameter instead of assuming DMA_FROM_DEVICE.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
We were always mapping as DMA_FROM_DEVICE then unmapping with
DMA_TO_DEVICE which was clearly not correct. Follow the same pattern as
nvme_submit_io() and key off the bottom bit of the opcode to determine
whether this is a read or a write.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
IO_TIMEOUT is a little too generic and might be used by other parts of
the kernel in the future.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The new merged data structure is called nvme_iod. This improves performance
for mid-sized I/Os (in the 16k range) since we save a memory allocation.
It is also a slightly simpler interface to use.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The queue is only needed for some rare occasions, and it's more consistent
to pass the device around.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Upcoming patches require calling get_nvmeq when we don't have a namespace.
Some callers already have the device in a local variable anyway.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Instead of encoding the handler type in the bottom two bits of the
per-completion context pointer, store the handler function as well
as the context pointer. This gives us more flexibility and the code
is clearer. It comes at the cost of an extra 8k of memory per queue,
but this feels like a reasonable price to pay.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Drivers shouldn't use NO_IRQ. Microblaze and PPC
define NO_IRQ as 0 and this reference will be removed
in near future.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Reviewed-by: Ryan Mallon <rmallon@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
CC: Rob Herring <rob.herring@calxeda.com>
The 'name', 'owner', and 'mod_name' members are redundant with the
identically named fields in the 'driver' sub-structure. Rather than
switching each instance to specify these fields explicitly, introduce
a macro to simplify this.
Eliminate further redundancy by allowing the drvname argument to
DEFINE_XENBUS_DRIVER() to be blank (in which case the first entry from
the ID table will be used for .driver.name).
Also eliminate the questionable xenbus_register_{back,front}end()
wrappers - their sole remaining purpose was the checking of the
'owner' field, proper setting of which shouldn't be an issue anymore
when the macro gets used.
v2: Restore DRV_NAME for the driver name in xen-pciback.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Earlier, rebuild monitoring was done in the context of probe. Now the service
thread takes the responsibility of rebuild monitoring, and probe returns good
status.
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
both callers of device_get_devnode() are only interested in lower 16bits
and nobody tries to return anything wider than 16bit anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
kill_bdev as well, so brd doesn't have to open code it. Reduce
buffer_head.h requirement accordingly.
Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving. The small comment replacing it says enough.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.
The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
[v1: Seperated the drivers/block/cciss_scsi.c out of this patch]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The only user left for blk_insert_request() is sx8 and it can be
trivially switched to use blk_execute_rq_nowait() - special requests
aren't included in io stat and sx8 doesn't use block layer tagging.
Switch sx8 and kill blk_insert_requeset().
This patch doesn't introduce any functional difference.
Only compile tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The old PowerMac swim3 driver has some "interesting" locking issues,
using a private lock and failing to lock the queue before completing
requests, which triggered WARN_ONs among others.
This rips out the private lock, makes everything operate under the
block queue lock, and generally makes things simpler.
We used to also share a queue between the two possible instances which
was problematic since we might pick the wrong controller in some cases,
so make the queue and the current request per-instance and use
queuedata to point to our private data which is a lot cleaner.
We still share the queue lock but then, it's nearly impossible to actually
use 2 swim3's simultaneously: one would need to have a Wallstreet
PowerBook, the only machine afaik with two of these on the motherboard,
and populate both hotswap bays with a floppy drive (the machine ships
only with one), so nobody cares...
While at it, add a little fix to clear up stale interrupts when loading
the driver or plugging a floppy drive in a bay.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move some forward declarations into header files and adjust includes.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
This doesn't interact with resizing well, since it doesn't set the
size of the device to the size at the snapshot. It's also an expensive
operation to be synchronous. Rollback can still be done with the
userspace rbd tool.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
The below patch fixes some typos in various parts of the kernel, as well as fixes some comments.
Please let me know if I missed anything, and I will try to get it changed and resent.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
discard_alignment is not relevant to the loop driver since it is
supposed to be set as a workaround for the old sector 63 alignments.
So set it to zero rather than block size.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We weren't filling in the transfer length of the
flush cache command (it transfers 4 bytes of zeroes).
Firmware didn't seem to be bothered by this, but it
should be fixed.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
IRQF_SHARED is required for older controllers that don't support MSI(X)
and which may end up sharing an interrupt.
Also remove deprecated IRQF_DISABLED.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The loop driver does not support discard if encryption is enabled,
fix the comment.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We recently introduce new continue in the loop which make gcc complain.
In theory if MTIP_FLAG_SVC_THD_ACTIVE_BIT is set, we could hit continue
over and over until eventually we time out of the loop. In that case
"active" should be set as true, but right now it's uninitialized.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* queue ncq commands when a non-ncq is in progress or error handling is active
* merge variables 'internal_cmd_in_progress' and 'eh_active' into new variable 'flags'
* get rid of read/write semaphore 'internal_sem'
* new service thread to issue queued commands
* use macros from ata.h for command codes
* return ENOTTY for BLKFLSBUF ioctl
* style changes
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As of dfaa2ef68e, loop devices support
discard request now. We could just issue a discard request, and
the loop driver will punch the hole for us, so we don't need to touch
the internals of loop device and punch the hole ourselves, Thanks.
V0->V1: rebased on devel/for-jens-3.3
Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Part of the blkdev_issue_discard(xx) operation is that it can also
issue a secure discard operation that will permanantly remove the
sectors in question. We advertise that we can support that via the
'discard-secure' attribute and on the request, if the 'secure' bit
is set, we will attempt to pass in REQ_DISCARD | REQ_SECURE.
CC: Li Dongyang <lidongyang@novell.com>
[v1: Used 'flag' instead of 'secure:1' bit]
[v2: Use 'reserved' uint8_t instead of adding a new value]
[v3: Check for nseg when mapping instead of operation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In a union type structure to deal with the overlapping
attributes in a easier manner.
Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Smatch has a new check for Rosenberg type information leaks where structs
are copied to the user with uninitialized stack data in them. i In this
case, the pg_write_hdr struct has a hole in it.
struct pg_write_hdr {
char magic; /* 0 1 */
char func; /* 1 1 */
/* XXX 2 bytes hole, try to pack */
int dlen; /* 4 4 */
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tim Waugh <tim@cyberelk.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A long time ago, probably in 2002, one of the distros, or maybe more than
one, loaded block drivers prior to loading the SCSI mid layer. This meant
that the cciss driver, being a block driver, could not engage the SCSI mid
layer at init time without panicking, and relied on being poked by a
userland program after the system was up (and the SCSI mid layer was
therefore present) to engage the SCSI mid layer.
This is no longer the case, and cciss can safely rely on the SCSI mid
layer being present at init time and engage the SCSI mid layer straight
away. This means that users will see their tape drives and medium
changers at driver load time without need for a script in /etc/rc.d that
does this:
for x in /proc/driver/cciss/cciss*
do
echo "engage scsi" > $x
done
However, if no tape drives or medium changers are detected, the SCSI mid
layer will not be engaged. If a tape drive or medium change is later
hot-added to the system it will then be necessary to use the above script
or similar for the device(s) to be acceesible.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>