MTD internal API presently uses 32-bit values to represent
device size. This patch updates them to 64-bits but leaves
the external API unchanged. Extending the external API
is a separate issue for several reasons. First, no one
needs it at the moment. Secondly, whether the implementation
is done with IOCTLs, sysfs or both is still debated. Thirdly
external API changes require the internal API to be accepted
first.
Note that although the MTD API will be able to support 64-bit
device sizes, existing drivers do not and are not required
to do so, although NAND base has been updated.
In general, changing from 32-bit to 64-bit values cause little
or no changes to the majority of the code with the following
exceptions:
- printk message formats
- division and modulus of 64-bit values
- NAND base support
- 32-bit local variables used by mtdpart and mtdconcat
- naughtily assuming one structure maps to another
in MEMERASE ioctl
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There's no point in having too short SG_IO timeouts, since if the
command does end up timing out, we'll end up through the reset sequence
that is several seconds long in order to abort the command that timed
out.
As a result, shorter timeouts than a few seconds simply do not make
sense, as the recovery would be longer than the timeout itself.
Add a BLK_MIN_SG_TIMEOUT to match the existign BLK_DEFAULT_SG_TIMEOUT.
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make sure all FMODE_ constants are documents, and ensure a coherent
style for the already existing comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Update FMODE_NDELAY before each ioctl call so that we can kill the
magic FMODE_NDELAY_NOW. It would be even better to do this directly
in setfl(), but for that we'd need to have FMODE_NDELAY for all files,
not just block special files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Due to a wrong safety check in af_can.c it was not possible to filter
for SFF frames with a specific CAN identifier without getting the
same selected CAN identifier from a received EFF frame also.
This fix has a minimum (but user visible) impact on the CAN filter
API and therefore the CAN version is set to a new date.
Indeed the 'old' API is still working as-is. But when now setting
CAN_(EFF|RTR)_FLAG in can_filter.can_mask you might get less traffic
than before - but still the stuff that you expected to get for your
defined filter ...
Thanks to Kurt Van Dijck for pointing at this issue and for the review.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
devices.
When stacking devices (LVM over MD over SCSI) some of the request queue
parameters are not set up correctly in some cases by default, namely
max_segment_size and and seg_boundary mask.
If you create MD device over SCSI, these attributes are zeroed.
Problem become when there is over this mapping next device-mapper mapping
- queue attributes are set in DM this way:
request_queue max_segment_size seg_boundary_mask
SCSI 65536 0xffffffff
MD RAID1 0 0
LVM 65536 -1 (64bit)
Unfortunately bio_add_page (resp. bio_phys_segments) calculates number of
physical segments according to these parameters.
During the generic_make_request() is segment cout recalculated and can
increase bio->bi_phys_segments count over the allowed limit. (After
bio_clone() in stack operation.)
Thi is specially problem in CCISS driver, where it produce OOPS here
BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
(MAXSEGENTRIES is 31 by default.)
Sometimes even this command is enough to cause oops:
dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10
This command generates bios with 250 sectors, allocated in 32 4k-pages
(last page uses only 1024 bytes).
For LVM layer, it allocates bio with 31 segments (still OK for CCISS),
unfortunatelly on lower layer it is recalculated to 32 segments and this
violates CCISS restriction and triggers BUG_ON().
The patch tries to fix it by:
* initializing attributes above in queue request constructor
blk_queue_make_request()
* make sure that blk_queue_stack_limits() inherits setting
(DM uses its own function to set the limits because it
blk_queue_stack_limits() was introduced later. It should probably switch
to use generic stack limit function too.)
* sets the default seg_boundary value in one place (blkdev.h)
* use this mask as default in DM (instead of -1, which differs in 64bit)
Bugs related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=471639http://bugzilla.kernel.org/show_bug.cgi?id=8672
Signed-off-by: Milan Broz <mbroz@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
both start the timeout timer. Barrier code dequeues the original
barrier request but doesn't passes the request itself to lower level
driver, only broken down proxy requests; however, as the original
barrier code goes through the same dequeue path and timeout timer is
started on it. If barrier sequence takes long enough, this timer
expires but the low level driver has no idea about this request and
oops follows.
Timeout timer shouldn't have been started on the original barrier
request as it never goes through actual IO. This patch unexports
elv_dequeue_request(), which has no external user anyway, and makes it
operate on elevator proper w/o adding the timer and make
blkdev_dequeue_request() call elv_dequeue_request() and add timer.
Internal users which don't pass the request to driver - barrier code
and end_that_request_last() - are converted to use
elv_dequeue_request().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The previous patch from Alan Cox ("nfsd: fix vm overcommit crash",
commit 731572d39f) fixed the problem where
knfsd crashes on exported shmemfs objects and strict overcommit is set.
But the patch forgot supporting the case when CONFIG_SECURITY is
disabled.
This patch copies a part of his fix which is mainly for detecting a bug
earlier.
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It seems that on some nVidia controllers using AltStatus register
can be unreliable so default to Status register if the PCI device
is in Compatibility Mode. In order to achieve this:
* Add ide_pci_is_in_compatibility_mode() inline helper to <linux/ide.h>.
* Add IDE_HFLAG_BROKEN_ALTSTATUS host flag and set it in amd74xx host
driver for nVidia controllers in Compatibility Mode.
* Teach actual_try_to_identify() and drive_is_ready() about the new flag.
This fixes the regression caused by removal of CONFIG_IDEPCI_SHARE_IRQ
config option in 2.6.25 and using AltStatus register unconditionally when
available (kernel.org bugs #11659 and #10216).
[ Moreover for CONFIG_IDEPCI_SHARE_IRQ=y (which is what most people
and distributions use) it never worked correctly. ]
Thanks to Remy LABENE and Lars Winterfeld for help with debugging the problem.
More info at:
http://bugzilla.kernel.org/show_bug.cgi?id=11659http://bugzilla.kernel.org/show_bug.cgi?id=10216
Reported-by: Remy LABENE <remy.labene@free.fr>
Tested-by: Remy LABENE <remy.labene@free.fr>
Tested-by: Lars Winterfeld <lars.winterfeld@tu-ilmenau.de>
Acked-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2nd part of the fixes needed for
http://bugzilla.kernel.org/show_bug.cgi?id=11796.
When the idr tree is either grown or shrunk, then the update to the number
of layers and the top pointer were not atomic. This race caused crashes.
The attached patch fixes that by replicating the layers counter in each
layer, thus idr_find doesn't need idp->layers anymore.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Clement Calmels <cboulte@gmail.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It has been thought that the per-user file descriptors limit would also
limit the resources that a normal user can request via the epoll
interface. Vegard Nossum reported a very simple program (a modified
version attached) that can make a normal user to request a pretty large
amount of kernel memory, well within the its maximum number of fds. To
solve such problem, default limits are now imposed, and /proc based
configuration has been introduced. A new directory has been created,
named /proc/sys/fs/epoll/ and inside there, there are two configuration
points:
max_user_instances = Maximum number of devices - per user
max_user_watches = Maximum number of "watched" fds - per user
The current default for "max_user_watches" limits the memory used by epoll
to store "watches", to 1/32 of the amount of the low RAM. As example, a
256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
That should be enough to not break existing heavy epoll users. The
default value for "max_user_instances" is set to 128, that should be
enough too.
This also changes the userspace, because a new error code can now come out
from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already
listed, so that should be ok.
[akpm@linux-foundation.org: use get_current_user()]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some recent Seagate harddrives have firmware bug which causes FLUSH
CACHE to timeout under certain circumstances if NCQ is being used.
This can be worked around by disabling NCQ and fixed by updating the
firmware. Implement ATA_HORKAGE_FIRMWARE_UPDATE and blacklist these
devices.
The wiki page has been updated to contain information on this issue.
http://ata.wiki.kernel.org/index.php/Known_issues
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
All architectures now use the generic compat_sys_ptrace, as should every
new architecture that needs 32bit compat (if we'll ever get another).
Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
kill a comment about __ARCH_SYS_PTRACE that was added after
__ARCH_SYS_PTRACE was already gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Same as for hotplug_cpu - we want static notifier_block in there in meminitdata,
to avoid false positives whenever it's used.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7ff93f8b ("mlx4_core: Multiple port type support") introduced
support for different port types. As part of that support, SET_PORT
is invoked to set the port type during driver startup. However, as a
side-effect, for IB ports the invocation of this command also sets the
port's capability mask to zero (losing the default value set by FW).
To fix this, get the default ib port capabilities (via a MAD_IFC Port
Info query) during driver startup, and save them for use in the
mlx4_SET_PORT command when setting the port-type to Infiniband.
This patch fixes problems with subnet manager (SM) failover such as
<https://bugs.openfabrics.org/show_bug.cgi?id=1183>, which occurred
because the IsTrapSupported bit in the capability mask was zeroed.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With aliasing VIPT cache support, the ARM implementation of
clear_user_page() and copy_user_page() sets up a temporary kernel space
mapping such that we have the same cache colour as the userspace page.
This avoids having to consider any userspace aliases from this operation.
However, when highmem is enabled, kmap_atomic() have to setup mappings.
The copy_user_highpage() and clear_user_highpage() call these functions
before delegating the copies to copy_user_page() and clear_user_page().
The effect of this is that each of the *_user_highpage() functions setup
their own kmap mapping, followed by the *_user_page() functions setting
up another mapping. This is rather wasteful.
Thankfully, copy_user_highpage() can be overriden by architectures by
defining __HAVE_ARCH_COPY_USER_HIGHPAGE. However, replacement of
clear_user_highpage() is more difficult because its inline definition
is not conditional. It seems that you're expected to define
__HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE and provide a replacement
__alloc_zeroed_user_highpage() implementation instead.
The allocation itself is fine, so we don't want to override that. What
we really want to do is to override clear_user_highpage() with our own
version which doesn't kmap_atomic() unnecessarily.
Other VIPT architectures (PARISC and SH) would also like to override
this function as well.
Acked-by: Hugh Dickins <hugh@veritas.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When entryinfo was a standalone parameter to functions, it used to be
"const void *". Put the const back in.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
After adding a node into the machine, top cpuset's mems isn't updated.
By reviewing the code, we found that the update function
cpuset_track_online_nodes()
was invoked after node_states[N_ONLINE] changes. It is wrong because
N_ONLINE just means node has pgdat, and if node has/added memory, we use
N_HIGH_MEMORY. So, We should invoke the update function after
node_states[N_HIGH_MEMORY] changes, just like its commit says.
This patch fixes it. And we use notifier of memory hotplug instead of
direct calling of cpuset_track_online_nodes().
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Menage <menage@google.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new accept4() system call. The addition of this system call
matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
inotify_init1(), epoll_create1(), pipe2()) which added new system calls
that differed from analogous traditional system calls in adding a flags
argument that can be used to access additional functionality.
The accept4() system call is exactly the same as accept(), except that
it adds a flags bit-mask argument. Two flags are initially implemented.
(Most of the new system calls in 2.6.27 also had both of these flags.)
SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
for the new file descriptor returned by accept4(). This is a useful
security feature to avoid leaking information in a multithreaded
program where one thread is doing an accept() at the same time as
another thread is doing a fork() plus exec(). More details here:
http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
Ulrich Drepper).
The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
to be enabled on the new open file description created by accept4().
(This flag is merely a convenience, saving the use of additional calls
fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.
Here's a test program. Works on x86-32. Should work on x86-64, but
I (mtk) don't have a system to hand to test with.
It tests accept4() with each of the four possible combinations of
SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
that the appropriate flags are set on the file descriptor/open file
description returned by accept4().
I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
and it passes according to my test program.
/* test_accept4.c
Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
<mtk.manpages@gmail.com>
Licensed under the GNU GPLv2 or later.
*/
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#define PORT_NUM 33333
#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
/**********************************************************************/
/* The following is what we need until glibc gets a wrapper for
accept4() */
/* Flags for socket(), socketpair(), accept4() */
#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC O_CLOEXEC
#endif
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK O_NONBLOCK
#endif
#ifdef __x86_64__
#define SYS_accept4 288
#elif __i386__
#define USE_SOCKETCALL 1
#define SYS_ACCEPT4 18
#else
#error "Sorry -- don't know the syscall # on this architecture"
#endif
static int
accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
{
printf("Calling accept4(): flags = %x", flags);
if (flags != 0) {
printf(" (");
if (flags & SOCK_CLOEXEC)
printf("SOCK_CLOEXEC");
if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
printf(" ");
if (flags & SOCK_NONBLOCK)
printf("SOCK_NONBLOCK");
printf(")");
}
printf("\n");
#if USE_SOCKETCALL
long args[6];
args[0] = fd;
args[1] = (long) sockaddr;
args[2] = (long) addrlen;
args[3] = flags;
return syscall(SYS_socketcall, SYS_ACCEPT4, args);
#else
return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
#endif
}
/**********************************************************************/
static int
do_test(int lfd, struct sockaddr_in *conn_addr,
int closeonexec_flag, int nonblock_flag)
{
int connfd, acceptfd;
int fdf, flf, fdf_pass, flf_pass;
struct sockaddr_in claddr;
socklen_t addrlen;
printf("=======================================\n");
connfd = socket(AF_INET, SOCK_STREAM, 0);
if (connfd == -1)
die("socket");
if (connect(connfd, (struct sockaddr *) conn_addr,
sizeof(struct sockaddr_in)) == -1)
die("connect");
addrlen = sizeof(struct sockaddr_in);
acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
closeonexec_flag | nonblock_flag);
if (acceptfd == -1) {
perror("accept4()");
close(connfd);
return 0;
}
fdf = fcntl(acceptfd, F_GETFD);
if (fdf == -1)
die("fcntl:F_GETFD");
fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
((closeonexec_flag & SOCK_CLOEXEC) != 0);
printf("Close-on-exec flag is %sset (%s); ",
(fdf & FD_CLOEXEC) ? "" : "not ",
fdf_pass ? "OK" : "failed");
flf = fcntl(acceptfd, F_GETFL);
if (flf == -1)
die("fcntl:F_GETFD");
flf_pass = ((flf & O_NONBLOCK) != 0) ==
((nonblock_flag & SOCK_NONBLOCK) !=0);
printf("nonblock flag is %sset (%s)\n",
(flf & O_NONBLOCK) ? "" : "not ",
flf_pass ? "OK" : "failed");
close(acceptfd);
close(connfd);
printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
return fdf_pass && flf_pass;
}
static int
create_listening_socket(int port_num)
{
struct sockaddr_in svaddr;
int lfd;
int optval;
memset(&svaddr, 0, sizeof(struct sockaddr_in));
svaddr.sin_family = AF_INET;
svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
svaddr.sin_port = htons(port_num);
lfd = socket(AF_INET, SOCK_STREAM, 0);
if (lfd == -1)
die("socket");
optval = 1;
if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
sizeof(optval)) == -1)
die("setsockopt");
if (bind(lfd, (struct sockaddr *) &svaddr,
sizeof(struct sockaddr_in)) == -1)
die("bind");
if (listen(lfd, 5) == -1)
die("listen");
return lfd;
}
int
main(int argc, char *argv[])
{
struct sockaddr_in conn_addr;
int lfd;
int port_num;
int passed;
passed = 1;
port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;
memset(&conn_addr, 0, sizeof(struct sockaddr_in));
conn_addr.sin_family = AF_INET;
conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
conn_addr.sin_port = htons(port_num);
lfd = create_listening_socket(port_num);
if (!do_test(lfd, &conn_addr, 0, 0))
passed = 0;
if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
passed = 0;
if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
passed = 0;
if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
passed = 0;
close(lfd);
exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
}
[mtk.manpages@gmail.com: rewrote changelog, updated test program]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make add_partition() return pointer to the new hd_struct on success
and ERR_PTR() value on failure. This change will be used to fix md
autodetection bug.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Inotify watch removals suck violently.
To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex. That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount. We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.
Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done. Cleanup is just deactivate_super().
However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords. That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.
We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable. OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e. that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().
So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong. We had
to drop ih->mutex before we could grab ->s_umount. So the watch
could've been gone already.
That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer. If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address. That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that. Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.
So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check. If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.
And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A common reason for device drivers to implement their own printk macros
is the lack of a printk prefix with the standard pr_xyz macros.
Introduce a pr_fmt() macro that is applied for every pr_xyz macro to the
format string.
The most common use of the pr_fmt macro would be to add the name of the
device driver to all pr_xyz messages in a source file.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix this warning:
net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used
net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used
this is a lockdep macro problem in the !LOCKDEP case.
We cannot convert it to an inline because the macro works on multiple types,
but we can mark the parameter used.
[ also clean up a misaligned tab in sock_lock_init_class_and_name() ]
[ also remove #ifdefs from around af_family_clock_key strings - which
were certainly added to get rid of the ugly build warnings. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch (as1155) fixes a bug in usbcore. When interfaces are
deleted, either because the device was disconnected or because of a
configuration change, the extra attribute files and child endpoint
devices may get left behind. This is because the core removes them
before calling device_del(). But during device_del(), after the
driver is unbound the core will reinstall altsetting 0 and recreate
those extra attributes and children.
The patch prevents this by adding a flag to record when the interface
is in the midst of being unregistered. When the flag is set, the
attribute files and child devices will not be created.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@kernel.org> [2.6.27, 2.6.26, 2.6.25]
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Explain this SLAB_DESTROY_BY_RCU thing...
[hugh@veritas.com: add a pointer to comment in mm/slab.c]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
On macbooks there are specific keys for the user-space functions Expose
and Dashboard, which currently has no counterpart in input.h. This patch
adds KEY_SCALE and KEY_DASHBOARD, and maps the keyboard accordingly.
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
C2port implements a two wire serial communication protocol (bit
banging) designed to enable in-system programming, debugging, and
boundary-scan testing on low pin-count Silicon Labs devices.
Currently this code supports only flash programming through sysfs
interface but extensions shoud be easy to add.
Signed-off-by: Rodolfo Giometti <giometti@linux.it>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for the RTC provided by the Wolfson Microelectronics
WM8350.
This driver was originally written by Graeme Gregory and Liam Girdwood,
though it has been modified since then to update it to current mainline
coding standards and for API completeness.
[akpm@linux-foundation.org: s/schedule_timeout_interruptible/schedule_timeout_uninterruptible/ to prevent bogus timeout when signal_pending()]
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Liam Girdwood <linux@wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It mistakenly assumes that a static local in an inlined function is a
kernel-wide singleton. It also has no callers, so let's remove it.
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SH7723 has SCIFA. This module is similer SCI register map, but it has FIFO.
So this patch adds new type(PORT_SCIFA) and change some type checking.
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Impact: enable/disable ring buffer recording API added
Several kernel developers have requested that there be a way to stop
recording into the ring buffers with a simple switch that can also
be enabled from userspace. This patch addes a new kernel API to the
ring buffers called:
tracing_on()
tracing_off()
When tracing_off() is called, all ring buffers will not be able to record
into their buffers.
tracing_on() will enable the ring buffers again.
These two act like an on/off switch. That is, there is no counting of the
number of times tracing_off or tracing_on has been called.
A new file is added to the debugfs/tracing directory called
tracing_on
This allows for userspace applications to also flip the switch.
echo 0 > debugfs/tracing/tracing_on
disables the tracing.
echo 1 > /debugfs/tracing/tracing_on
enables it.
Note, this does not disable or enable any tracers. It only sets or clears
a flag that needs to be set in order for the ring buffers to write to
their buffers. It is a global flag, and affects all ring buffers.
The buffers start out with tracing_on enabled.
There are now three flags that control recording into the buffers:
tracing_on: which affects all ring buffer tracers.
buffer->record_disabled: which affects an allocated buffer, which may be set
if an anomaly is detected, and tracing is disabled.
cpu_buffer->record_disabled: which is set by tracing_stop() or if an
anomaly is detected. tracing_start can not reenable this if
an anomaly occurred.
The userspace debugfs/tracing/tracing_enabled is implemented with
tracing_stop() but the user space code can not enable it if the kernel
called tracing_stop().
Userspace can enable the tracing_on even if the kernel disabled it.
It is just a switch used to stop tracing if a condition was hit.
tracing_on is not for protecting critical areas in the kernel nor is
it for stopping tracing if an anomaly occurred. This is because userspace
can reenable it at any time.
Side effect: With this patch, I discovered a dead variable in ftrace.c
called tracing_on. This patch removes it.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix hang/crash on ia64 under high load
This is ugly, but the simplest patch by far.
Unlike other similar routines, account_group_exec_runtime() could be
called "implicitly" from within scheduler after exit_notify(). This
means we can race with the parent doing release_task(), we can't just
check ->signal != NULL.
Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock)
before __cleanup_signal() to make sure ->signal can't be freed under
task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care
about the case when tsk changes cpu/rq under us, this should be OK.
Thanks to Ingo who nacked my previous buggy patch.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Doug Chapman <doug.chapman@hp.com>
This fixes compilation of the SSB DMA-API code on non-PCI platforms.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverts the following three commits which convert libata to
use block layer tagging.
43a49cbdf3e013e13bf62fca5ccf97
Although using block layer tagging is the right direction, due to the
tight coupling among tag number, data structure allocation and
hardware command slot allocation, libata doesn't work correctly with
the current conversion.
The biggest problem is guaranteeing that tag 0 is always used for
non-NCQ commands. Due to the way blk-tag is implemented and how SCSI
starts and finishes requests, such guarantee can't be made. I'm not
sure whether this would actually break any low level driver but it
doesn't look like a good idea to break such assumption given the
frailty of ATA controllers.
So, for the time being, keep using the old dumb in-libata qc
allocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axobe <jens.axboe@oracle.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: preserve user-modified affinities on interrupts
Kumar Galak noticed that commit
1840475676 (genirq: Expose default irq
affinity mask (take 3))
overrides an already set affinity setting across a free /
request_irq(). Happens e.g. with ifdown/ifup of a network device.
Change the logic to mark the affinities as set and keep them
intact. This also fixes the unlocked access to irq_desc in
irq_select_affinity() when called from irq_affinity_proc_write()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Clean up based on feedback from Andrew Morton and others:
- change to inline functions instead of macros
- add __init to bootmem method
- add a missing debug check
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, all existing users of cnt32_to_63() are fine since the CPU
architectures where it is used don't do read access reordering, and user
mode preemption is disabled already. It is nevertheless a good idea to
better elaborate usage requirements wrt preemption, and use an explicit
memory barrier on SMP to avoid different CPUs accessing the counter
value in the wrong order. On UP a simple compiler barrier is
sufficient.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an ACPI graphics device supports backlight brightness functions (cmp. with
latest ACPI spec Appendix B), let the ACPI video driver control backlight and
switch backlight control off in vendor specific ACPI drivers (asus_acpi,
thinkpad_acpi, eeepc, fujitsu_laptop, msi_laptop, sony_laptop, acer-wmi).
Currently it is possible to load above drivers and let both poke on the
brightness HW registers, the video and vendor specific ACPI drivers -> bad.
This patch provides the basic support to check for BIOS capabilities before
driver loading time. Driver specific modifications are in separate follow up
patches.
"acpi_backlight=vendor"
Prever vendor driver over ACPI driver for backlight.
"acpi_backlight=video" (default)
Prever ACPI driver over vendor driver for backlight.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Tune SD_MC_INIT the same way as SD_CPU_INIT:
unset SD_BALANCE_NEWIDLE, and set SD_WAKE_BALANCE.
This improves vmark by 5%:
vmark 132102 125968 125497 messages/sec avg 127855.66 .984
vmark 139404 131719 131272 messages/sec avg 134131.66 1.033
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
# *DOCUMENTATION*
__scm_destroy() walks the list of file descriptors in the scm_fp_list
pointed to by the scm_cookie argument.
Those, in turn, can close sockets and invoke __scm_destroy() again.
There is nothing which limits how deeply this can occur.
The idea for how to fix this is from Linus. Basically, we do all of
the fput()s at the top level by collecting all of the scm_fp_list
objects hit by an fput(). Inside of the initial __scm_destroy() we
keep running the list until it is empty.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the hrtimer_add_expires_ns() function. It should take a 'u64 ns' argument,
but rather takes an 'unsigned long ns' argument - which might only be 32-bits.
On FRV, this results in the kernel locking up because hrtimer_forward() passes
the result of a 64-bit multiplication to this function, for which the compiler
discards the top 32-bits - something that didn't happen when ktime_add_ns() was
called directly.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>