When a DNS resolver key is instantiated with an error indication, attempts to
read that key will result in an oops because user_read() is expecting there to
be a payload - and there isn't one [CVE-2011-1076].
Give the DNS resolver key its own read handler that returns the error cached in
key->type_data.x[0] as an error rather than crashing.
Also make the kenter() at the beginning of dns_resolver_instantiate() limit the
amount of data it prints, since the data is not necessarily NUL-terminated.
The buggy code was added in:
commit 4a2d789267
Author: Wang Lei <wang840925@gmail.com>
Date: Wed Aug 11 09:37:58 2010 +0100
Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2]
This can trivially be reproduced by any user with the following program
compiled with -lkeyutils:
#include <stdlib.h>
#include <keyutils.h>
#include <err.h>
static char payload[] = "#dnserror=6";
int main()
{
key_serial_t key;
key = add_key("dns_resolver", "a", payload, sizeof(payload),
KEY_SPEC_SESSION_KEYRING);
if (key == -1)
err(1, "add_key");
if (keyctl_read(key, NULL, 0) == -1)
err(1, "read_key");
return 0;
}
What should happen is that keyctl_read() reports error 6 (ENXIO) to the user:
dns-break: read_key: No such device or address
but instead the kernel oopses.
This cannot be reproduced with the 'keyutils add' or 'keyutils padd' commands
as both of those cut the data down below the NUL termination that must be
included in the data. Without this dns_resolver_instantiate() will return
-EINVAL and the key will not be instantiated such that it can be read.
The oops looks like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff811b99f7>] user_read+0x4f/0x8f
PGD 3bdf8067 PUD 385b9067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
CPU 0
Modules linked in:
Pid: 2150, comm: dns-break Not tainted 2.6.38-rc7-cachefs+ #468 /DG965RY
RIP: 0010:[<ffffffff811b99f7>] [<ffffffff811b99f7>] user_read+0x4f/0x8f
RSP: 0018:ffff88003bf47f08 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88003b5ea378 RCX: ffffffff81972368
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003b5ea378
RBP: ffff88003bf47f28 R08: ffff88003be56620 R09: 0000000000000000
R10: 0000000000000395 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffa1
FS: 00007feab5751700(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000003de40000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dns-break (pid: 2150, threadinfo ffff88003bf46000, task ffff88003be56090)
Stack:
ffff88003b5ea378 ffff88003b5ea3a0 0000000000000000 0000000000000000
ffff88003bf47f68 ffffffff811b708e ffff88003c442bc8 0000000000000000
00000000004005a0 00007fffba368060 0000000000000000 0000000000000000
Call Trace:
[<ffffffff811b708e>] keyctl_read_key+0xac/0xcf
[<ffffffff811b7c07>] sys_keyctl+0x75/0xb6
[<ffffffff81001f7b>] system_call_fastpath+0x16/0x1b
Code: 75 1f 48 83 7b 28 00 75 18 c6 05 58 2b fb 00 01 be bb 00 00 00 48 c7 c7 76 1c 75 81 e8 13 c2 e9 ff 4c 8b b3 e0 00 00 00 4d 85 ed <41> 0f b7 5e 10 74 2d 4d 85 e4 74 28 e8 98 79 ee ff 49 39 dd 48
RIP [<ffffffff811b99f7>] user_read+0x4f/0x8f
RSP <ffff88003bf47f08>
CR2: 0000000000000010
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
cc: Wang Lei <wang840925@gmail.com>
Signed-off-by: James Morris <jmorris@namei.org>
Clean up entries in 00-INDEX: drop files that have been removed.
Reported-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the "log_buf_len" description to use [KMG] syntax for the
buffer size.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The '[KMG]' suffix is commonly described after a number of kernel
parameter values documentation. Explicitly state its semantics.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the PCI ID to support the internal temperature sensor of the
AMD "Llano" and "Brazos" processor families.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: stable@kernel.org # ca86828: x86, AMD, PCI: Add AMD northbridge PCI device
Cc: stable@kernel.org
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'. The former is the more prominent one. The latter is
mostly used by workqueue and in a few other odd places. Unify the
spelling to 'freezable'.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
On systems where the temperature sensor is actually used, the BIOS is
likely to have locked the alarm registers. In that case, all writes
through the corresponding sysfs files would be silently ignored.
To prevent this, detect the locks and make the affected sysfs files
read-only.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: stable@kernel.org
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
The documentation lists standard numbers and chip names in excruciating
detail, but that's all it does. To help mere mortals in deciding
whether to enable this driver, mention what this sensor is for and in
which systems it might be found.
Also add a link to the actual JC 42.4 specification.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: stable@kernel.org
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
A few wrong usages of drm_device, which should be drm_driver.
Signed-off-by: Xiao Jiang <jgq516@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit 9830fcd6f6.
The ARM dt support has not been merged yet; this documentation update
was premature.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
-I (include path) should be specified for host builds.
This one was overlooked somehow. Fixes
https://bugzilla.kernel.org/show_bug.cgi?id=25902
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-by: Alexey Salmin <alexey.salmin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
noswapaccount couldn't be used to control memsw for both on/off cases so
we have added swapaccount[=0|1] parameter. This way we can turn the
feature in two ways noswapaccount resp. swapaccount=0. We have kept the
original noswapaccount but I think we should remove it after some time as
it just makes more command line parameters without any advantages and also
the code to handle parameters is uglier if we want both parameters.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Requested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was some confusion at LCA as to why the sysctl tcp_ecn took one
of three values when it was documented as a Boolean. This patch fixes
the documentation.
Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Version 15 of schedstats was introduced in:
67aa0f767af4: sched: remove unused fields from struct rq
and removed three unused counters in sched_yield(). Update
the documentation.
Signed-off-by: Javi Merino <cibervicho@gmail.com>
Cc: henrix@sapo.pt
Cc: rdunlap@xenotime.net
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1296515496-8229-1-git-send-email-cibervicho@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
32 and 64 bit powerpc support has been merged for a while now, but
the booting-without-of.txt document still describes 32 bit as not
supporting multiplatform, which is no longer true. This patch fixes
the documentation.
Also remove references to powerpc-specific details outside of section
I in preparation to add details for other architectures.
v3: cleaned up a lot more powerpc-isms and updated text to reflect current
usage conventions.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The device tree is used by more than just PowerPC. Make the documentation
directory available to all.
v2: reorganized files while moving to create arch and driver specific
directories.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
In ntfs_mft_record_alloc() when mapping the new extent mft record with
map_extent_mft_record() we overwrite @m with the return value and on
error, we then try to use the old @m but that is no longer there as @m
now contains an error code instead so we crash when dereferencing the
error code as if it were a pointer.
The simple fix is to use a temporary variable to store the return value
thus preserving the original @m for later use. This is a backport from
the commercial Tuxera-NTFS driver and is well tested...
Thanks go to Julia Lawall for pointing this out (whilst I had fixed it
in the commercial driver I had failed to fix it in the Linux kernel).
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The bonding documentation used to provide configuration
details and examples for initscripts and sysconfig only.
This patch describe the third possible configuration:
/etc/network/interfaces.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to a chip bug (errata 50.2.6.3 & 50.3.5.3 in
"AT91SAM9263 Preliminary 6249H-ATARM-27-Jul-09") the contents of mailbox
0 may be send under certain conditions (even if disabled or in rx mode).
The workaround in the errata suggests not to use the mailbox and load it
with an unused identifier.
This patch implements the second part of the workaround. A sysfs entry
"mb0_id" is introduced. While the interface is down it can be used to
configure the can_id of mailbox 0. The default value id 0x7ff.
In order to use an extended can_id add the CAN_EFF_FLAG (0x80000000U)
to the can_id. Example:
- standard id 0x7ff:
echo 0x7ff > /sys/class/net/can0/mb0_id
- extended id 0x1fffffff:
echo 0x9fffffff > /sys/class/net/can0/mb0_id
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
For the Documentation-part:
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Fix move of drivers/serial/ to drivers/tty/, where it broke
one of the docbook files:
docproc: drivers/serial/serial_core.c: No such file or directory
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
BugLink: http://bugs.launchpad.net/bugs/701271
This new model, named "asus", is identical to the "hp_laptop" model,
except for the location of the internal mic, which is at pin 0x1a.
It is used for Asus K52JU and Lenovo G560.
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
All architectures are finally converted. Remove the cruft.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Michal Simek <monstr@monstr.eu>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Rusty Russell wrote:
> Ah, it will appear as /dev/hwrng. It's a weirdness of Linux that our actual
> hardware number generators are not wired up to /dev/random...
Reflected this in the documentation, thanks :-)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
PROT_EXEC seems to be completely unnecessary (as the lguest binary
never executes there), and will allow it to work with SELinux (and
more importantly, PaX :-) as they can/do forbid writable and
executable mappings.
Also, map PROT_NONE guard pages at start and end of guest memory for extra
paranoia.
I changed the length check to addr + size > guest_limit because >= is wrong
(addr of 0, size of getpagesize() with a guest_limit of getpagesize() would
false positive).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Renamed has_new to is_new.
Drivers can use the is_new field to determine if a new value was specified
for a control. The v4l2_ctrl_handler_setup() must always set this to 1 since
the setup has to force a full update of all controls.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch adds basic support for LM94 to the LM93 driver. LM94 specific
sensors and features are not supported.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
This patch is only for RFC purpose of ASoC documentation updates which
match with current ASoC codes with documents. Mostly modify features
are modified to be sync with changes after multi-component patches.
Signed-off-by: Seungwhan Youn <sw.youn@samsung.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes. On the
other hand always commiting the transaction is a bad idea for fast-path
uses of fallocate like for example in recent Samba versions. Given
that block allocation is a data plane operation anyway change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.
This also includes moving the code around for a few of the filesystems,
and remove the already unnedded S_ISDIR checks given that we only wire
up fallocate for regular files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself. follow_automount() will then
do the addition.
This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer. The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.
To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2. One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.
d_automount() should also add the the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used. The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.
This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call
d_manage() directly. d_manage() needs a parameter to indicate that it is in
RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
-ECHILD instead).
autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
accesses it and otherwise request dropping back to ref-walk mode.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk. The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).
The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory. This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.
The ->d_manage() dentry operation:
int (*d_manage)(struct path *path, bool mounting_here);
takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.
It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.
->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep. However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.
Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.
follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).
A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()). The new follow_down() calls d_manage() as appropriate. It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage(). follow_down()
ignores automount points so that it can be used to mount on them.
__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself. That would allow the autofs
daemon to continue on in rcu-walk mode.
Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every tranist from that directory will cause d_manage() to be
invoked. It can always be set again when necessary.
==========================
WHAT THIS MEANS FOR AUTOFS
==========================
Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.
autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it. This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.
The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:
mkdir S ffffffff8014e05a 0 32580 24956
Call Trace:
[<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
[<ffffffff80127f7d>] avc_has_perm+0x46/0x58
[<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
[<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
[<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
[<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
[<ffffffff80057a2f>] lookup_create+0x46/0x80
[<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4
versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:
automount D ffffffff8014e05a 0 32581 1 32561
Call Trace:
[<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
[<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
[<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
[<ffffffff800e6d55>] do_rmdir+0x77/0xde
[<ffffffff8005d229>] tracesys+0x71/0xe0
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
which means that the system is deadlocked.
This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automouter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().
Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation. The operation is keyed off a new
dentry flag (DCACHE_NEED_AUTOMOUNT).
This also makes it easier to add an AT_ flag to suppress terminal segment
automount during pathwalk and removes the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.
The ->d_automount() dentry operation:
struct vfsmount *(*d_automount)(struct path *mountpoint);
takes a pointer to the directory to be mounted upon, which is expected to
provide sufficient data to determine what should be mounted. If successful, it
should return the vfsmount struct it creates (which it should also have added
to the namespace using do_add_mount() or similar). If there's a collision with
another automount attempt, NULL should be returned. If the directory specified
by the parameter should be used directly rather than being mounted upon,
-EISDIR should be returned. In any other case, an error code should be
returned.
The ->d_automount() operation is called with no locks held and may sleep. At
this point the pathwalk algorithm will be in ref-walk mode.
Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
added to handle mountpoints. It will return -EREMOTE if the automount flag was
set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
symlinks or mountpoints, -EISDIR if the walk point should be used without
mounting and 0 if successful. The path will be updated to point to the mounted
filesystem if a successful automount took place.
__follow_mount() is replaced by follow_managed() which is more generic
(especially with the patch that adds ->d_manage()). This handles transits from
directories during pathwalk, including automounting and skipping over
mountpoints (and holding processes with the next patch).
__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
automount point with nothing mounted on it.
follow_dotdot*() does not handle automounts as you don't want to trigger them
whilst following "..".
I've also extracted the mount/don't-mount logic from autofs4 and included it
here. It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname. If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.
I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
inodes as automount points. This flag is automatically propagated to the
dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could
save AFS a private flag bit apiece, but is not strictly necessary. It would be
preferable to do the propagation in d_set_d_op(), but that doesn't normally
have access to the inode.
[AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
that, rather than just returning with ungrabbed *path]
Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
LIO target is a full featured in-kernel target framework with the
following feature set:
High-performance, non-blocking, multithreaded architecture with SIMD
support.
Advanced SCSI feature set:
* Persistent Reservations (PRs)
* Asymmetric Logical Unit Assignment (ALUA)
* Protocol and intra-nexus multiplexing, load-balancing and failover (MC/S)
* Full Error Recovery (ERL=0,1,2)
* Active/active task migration and session continuation (ERL=2)
* Thin LUN provisioning (UNMAP and WRITE_SAMExx)
Multiprotocol target plugins
Storage media independence:
* Virtualization of all storage media; transparent mapping of IO to LUNs
* No hard limits on number of LUNs per Target; maximum LUN size ~750 TB
* Backstores: SATA, SAS, SCSI, BluRay, DVD, FLASH, USB, ramdisk, etc.
Standards compliance:
* Full compliance with IETF (RFC 3720)
* Full implementation of SPC-4 PRs and ALUA
Significant code cleanups done by Christoph Hellwig.
[jejb: fix up for new block bdev exclusive interface. Minor fixes from
Randy Dunlap and Dan Carpenter.]
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Subjct: Revert memory cgroup dirty_ratio Documentation.
The commit ece72400c2 adds documentation
for memcg's dirty ratio. But the function is not implemented yet.
Remove the documentation for avoiding confusing users.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We'd like to be able to oom_score_adj a process up/down as it
enters/leaves the foreground. Currently, it is not possible to oom_adj
down without CAP_SYS_RESOURCE. This patch allows a task to decrease its
oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
or its inherited value at fork. Assuming the thread that has forked it
has oom_score_adj of 0, each process could decrease it back from 0 upon
activation unless a CAP_SYS_RESOURCE thread elevated it to something
higher.
Alternative considered:
* a setuid binary
* a daemon with CAP_SYS_RESOURCE
Since you don't wan't all processes to be able to reduce their oom_adj, a
setuid or daemon implementation would be complex. The alternatives also
have much higher overhead.
This patch updated from original patch based on feedback from David
Rientjes.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently there is no way to find whether a process has locked its pages
in memory or not. And which of the memory regions are locked in memory.
Add a new field "Locked" to export this information via the smaps file.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 0fdae42d36, which
wasn't really supposed to go in, and causes lots of annoying warnings.
Quoth Andrew:
"Complete brainfart - I meant to drop that patch ages ago."
Quoth Greg:
"Ick, yeah, that patch isn't ok to go in as-is, all of the callers
need to be fixed up first, which is what I thought we had agreed on..."
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is the skeleton for the DM target that will be
the bridge from DM to MD (initially RAID456 and later RAID1). It
provides a way to use device-mapper interfaces to the MD RAID456
drivers.
As with all device-mapper targets, the nominal public interfaces are the
constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO
and STATUSTYPE_TABLE). The CTR table looks like the following:
1: <s> <l> raid \
2: <raid_type> <#raid_params> <raid_params> \
3: <#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>
Line 1 contains the standard first three arguments to any device-mapper
target - the start, length, and target type fields. The target type in
this case is "raid".
Line 2 contains the arguments that define the particular raid
type/personality/level, the required arguments for that raid type, and
any optional arguments. Possible raid types include: raid4, raid5_la,
raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc. (again, raid1 is
planned for the future.) The list of required and optional parameters
is the same for all the current raid types. The required parameters are
positional, while the optional parameters are given as key/value pairs.
The possible parameters are as follows:
<chunk_size> Chunk size in sectors.
[[no]sync] Force/Prevent RAID initialization
[rebuild <idx>] Rebuild the drive indicated by the index
[daemon_sleep <ms>] Time between bitmap daemon work to clear bits
[min_recovery_rate <kB/sec/disk>] Throttle RAID initialization
[max_recovery_rate <kB/sec/disk>] Throttle RAID initialization
[max_write_behind <value>] See '-write-behind=' (man mdadm)
[stripe_cache <sectors>] Stripe cache size for higher RAIDs
Line 3 contains the list of devices that compose the array in
metadata/data device pairs. If the metadata is stored separately, a '-'
is given for the metadata device position. If a drive has failed or is
missing at creation time, a '-' can be given for both the metadata and
data drives for a given position.
Examples:
# RAID4 - 4 data drives, 1 parity
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)
0 1960893648 raid \
raid4 1 2048 \
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
# RAID4 - 4 data drives, 1 parity (no metadata devices)
# Chunk size of 1MiB, force RAID initialization,
# min recovery rate at 20 kiB/sec/disk
0 1960893648 raid \
raid4 4 2048 min_recovery_rate 20 sync\
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
Performing a 'dmsetup table' should display the CTR table used to
construct the mapping (with possible reordering of optional
parameters).
Performing a 'dmsetup status' will yield information on the state and
health of the array. The output is as follows:
1: <s> <l> raid \
2: <raid_type> <#devices> <1 health char for each dev> <resync_ratio>
Line 1 is standard DM output. Line 2 is best shown by example:
0 1960893648 raid raid4 5 AAAAA 2/490221568
Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with recovery.
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds generic multikey handling to be used
in following patch for Loop-AES mode compatibility.
This patch extends mapping table to optional keycount and
implements generic multi-key capability.
With more keys defined the <key> string is divided into
several <keycount> sections and these are used for tfms.
The tfm is used according to sector offset
(sector 0->tfm[0], sector 1->tfm[1], sector N->tfm[N modulo keycount])
(only power of two values supported for keycount here).
Because of tfms per-cpu allocation, this mode can be take
a lot of memory on large smp systems.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Max Vozeler <max@hinterhof.net>
This integrates the XZ decompression code to the x86 pre-boot code.
mkpiggy.c is updated to reserve about 32 KiB more buffer safety margin for
kernel decompression. It is done unconditionally for all decompressors to
keep the code simpler.
The XZ decompressor needs around 30 KiB of heap, so the heap size is
increased to 32 KiB on both x86-32 and x86-64.
Documentation/x86/boot.txt is updated to list the XZ magic number.
With the x86 BCJ filter in XZ, XZ-compressed x86 kernel tends to be a few
percent smaller than the equivalent LZMA-compressed kernel.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>