Add support for extracting LZ4-compressed kernel images, as well as
LZ4-compressed ramdisk images in the kernel boot process.
Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for LZ4 decompression in the Linux Kernel. LZ4 Decompression
APIs for kernel are based on LZ4 implementation by Yann Collet.
Benchmark Results(PATCH v3)
Compiler: Linaro ARM gcc 4.6.2
1. ARMv7, 1.5GHz based board
Kernel: linux 3.4
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.7MB 20.1MB/s, 25.2MB/s(UA)
LZ4 7.3MB 29.1MB/s, 45.6MB/s(UA)
2. ARMv7, 1.7GHz based board
Kernel: linux 3.7
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.0MB 34.1MB/s, 52.2MB/s(UA)
LZ4 6.5MB 86.7MB/s
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch set is for adding support for LZ4-compressed Kernel. LZ4 is a
very fast lossless compression algorithm and it also features an extremely
fast decoder [1].
But we have five of decompressors already and one question which does
arise, however, is that of where do we stop adding new ones? This issue
had been discussed and came to the conclusion [2].
Russell King said that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
If we have a replacement one for one of these, then it should do exactly
that: replace it.
The benchmark shows that an 8% increase in image size vs a 66% increase
in decompression speed compared to LZO(which has been known as the
fastest decompressor in the Kernel). Therefore the "fast but may not be
small" compression title has clearly been taken by LZ4 [3].
[1] http://code.google.com/p/lz4/
[2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157
[3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347
LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository: http://code.google.com/p/lz4/
Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Yann Collet <yann.collet.73@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some architectures need __c[lt]z[sd]i2() for __builtin_c[lt]z[ll] and
that causes a build failure. They can be implemented using the
fls()/__ffs() and overridden by linking arch-specific versions may not
be implemented yet.
This is required by "lib: add lz4 compressor module".
Reference: https://lkml.org/lkml/2013/4/18/603
Signed-off-by: Chanho Min <chanho.min@lge.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Herbert Xu <herbert@gondor.hengli.com.au>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rebased/reworked a patch contributed by Rob Herring that uses
NEON intrinsics to perform the RAID-6 syndrome calculations.
It uses the existing unroll.awk code to generate several
unrolled versions of which the best performing one is selected
at boot time.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: hpa@linux.intel.com
The hard/softlockup and hung-task entries take up 6 lines
of screen real-estate when enabled. I bet folks don't
mess with these _that_ often, so move them in a group
down a level.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Same deal, take the printk-related things and hide them in a menu.
This takes another 4 items out of the top-level menu.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Original posting:
http://lkml.kernel.org/r/20121214184208.D9E5804D@kernel.stglabs.ibm.com
There are quite a few of these, and we want to make sure that
there is one-stop-shopping for lock debugging.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Original Post:
http://lkml.kernel.org/r/20121214184207.6E00DDEC@kernel.stglabs.ibm.com
Again, trying to come up with some common themes of the stuff in
the kernel hacking menu... There are quite a few options to
tweak compilation in some way, or perform extra compile-time
checks. Give them their own menu.
The diff here looks a bit funny... makes it look like I'm
moving debugfs even though I'm actually moving the options on
either side of it.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Original posting:
http://lkml.kernel.org/r/20121214184206.FC11422F@kernel.stglabs.ibm.com
These runtime tests are great, except that there are a lot of them,
and they are very rarely needed. Give them their own menu so that
only the folks who need them will have to go looking for them.
Note that there are some other runtime tests that are not in here,
like for RCU or locking. This menu should only be used for tests
that do not have a more appropriate home.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Original posting:
http://lkml.kernel.org/r/20121214184203.37E6C724@kernel.stglabs.ibm.com
There are a *LOT* of memory debugging options. They are just scattered
all over the "Kernel Hacking" menu. Sure, "memory debugging" is a very
vague term and it's going to be hard to make absolute rules about what
goes in here, but this has to be better than what we had before.
This does, however, leave out the architecture-specific memory
debugging options (like x86's DEBUG_SET_MODULE_RONX). There would need
to be some substantial changes to move those in here. Kconfig can not
easily mix arch-specific and generic options together: it really
requires a file per-architecture, and I think having an
arch/foo/Kconfig.debug-memory might be taking things a bit too far
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Original posting:
http://lkml.kernel.org/r/20121214184202.F54094D9@kernel.stglabs.ibm.com
Several architectures have similar stack debugging config options.
They all pretty much do the same thing, some with slightly
differing help text.
This patch changes the architectures to instead enable a Kconfig
boolean, and then use that boolean in the generic Kconfig.debug
to present the actual menu option. This removes a bunch of
duplication and adds consistency across arches.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [for tile]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We print a dump stack after idr_remove warning. This is useful to find
the faulty piece of code. Let's do the same for ida_remove, as it would
be equally useful there.
[akpm@linux-foundation.org: convert the open-coded printk+dump_stack into WARN()]
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__this_cpu_write doesn't need to be protected by spinlock, AS we are doing
per cpu write with preempt disabled. And another reason to remove
__this_cpu_write outside of spinlock: __percpu_counter_sum is not an
accurate counter.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add functionality to serialize the output from dump_stack() to avoid
mangling of the output when dump_stack is called simultaneously from
multiple cpus.
[akpm@linux-foundation.org: fix comment indenting, avoid inclusion of <asm/> files - use <linux/> where possiblem fix uniprocessor build (__dump_stack undefined), remove unneeded ifdef around smp.h inclusion]
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reported-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to avoid making code that deals with printing both, IPv4 and
IPv6 addresses, unnecessary complicated as for example ...
if (sa.sa_family == AF_INET6)
printk("... %pI6 ...", ..sin6_addr);
else
printk("... %pI4 ...", ..sin_addr.s_addr);
... it would be better to introduce a format specifier that can deal
with those kind of situations internally; just as we have a "struct
sockaddr" for generic mapping into "struct sockaddr_in" or "struct
sockaddr_in6" as e.g. done in "union sctp_addr". Then, we could
reduce the above statement into something like:
printk("... %pIS ..", &sockaddr);
In case our pointer is NULL, pointer() then deals with that already at
an earlier point in time internally. While we're at it, support for both
%piS/%pIS, where 'S' stands for sockaddr, comes (almost) for free.
Additionally to that, postfix specifiers 'p', 'f' and 's' are supported
as suggested and initially implemented in 2009 by Joe Perches [1].
Handling of those additional specifiers orientate on the initial RFC that
was proposed. Also we support IPv6 compressed format specified by 'c' and
various other IPv4 extensions as stated in the documentation part.
Likely, there are many other areas than just SCTP in the kernel to make
use of this extension as well.
[1] http://patchwork.ozlabs.org/patch/31480/
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
CC: Joe Perches <joe@perches.com>
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Several drivers need font support independent of CONFIG_VT, cfr. commit
9cbce8d7e1dae0744ca4f68d62aa7de18196b6f4, "console/font: Refactor font
support code selection logic").
Hence move the fonts and their support logic from drivers/video/console/ to
its own library directory lib/fonts/.
This also allows to limit processing of drivers/video/console/Makefile to
CONFIG_VT=y again.
[Kevin Hilman <khilman@linaro.org>: Update arch/arm/boot/compressed/Makefile]
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
When CONFIG_PROVE_LOCKING is not enabled, more tests are
expected to pass unexpectedly, but there no tests that should
start to fail that pass with CONFIG_PROVE_LOCKING enabled.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113151.4001.77963.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Injects EDEADLK conditions at pseudo-random interval, with
exponential backoff up to UINT_MAX (to ensure that every lock
operation still completes in a reasonable time).
This way we can test the wound slowpath even for ww mutex users
where contention is never expected, and the ww deadlock
avoidance algorithm is only needed for correctness against
malicious userspace. An example would be protecting kernel
modesetting properties, which thanks to single-threaded X isn't
really expected to contend, ever.
I've looked into using the CONFIG_FAULT_INJECTION
infrastructure, but decided against it for two reasons:
- EDEADLK handling is mandatory for ww mutex users and should
never affect the outcome of a syscall. This is in contrast to -ENOMEM
injection. So fine configurability isn't required.
- The fault injection framework only allows to set a simple
probability for failure. Now the probability that a ww mutex acquire
stage with N locks will never complete (due to too many injected
EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock
operations for the completely uncontended case would be O(exp(N)).
The per-acuiqire ctx exponential backoff solution choosen here only
results in O(log N) overhead due to injection and so O(log N * N)
lock operations. This way we can fail with high probability (and so
have good test coverage even for fancy backoff and lock acquisition
paths) without running into patalogical cases.
Note that EDEADLK will only ever be injected when we managed to
acquire the lock. This prevents any behaviour changes for users
which rely on the EALREADY semantics.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Wound/wait mutexes are used when other multiple lock
acquisitions of a similar type can be done in an arbitrary
order. The deadlock handling used here is called wait/wound in
the RDBMS literature: The older tasks waits until it can acquire
the contended lock. The younger tasks needs to back off and drop
all the locks it is currently holding, i.e. the younger task is
wounded.
For full documentation please read Documentation/ww-mutex-design.txt.
References: https://lwn.net/Articles/548909/
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
FAULT_INJECTION_STACKTRACE_FILTER selects FRAME_POINTER but
that symbol is not available for MIPS.
Fixes the following problem on a randconfig:
warning: (LOCKDEP && FAULT_INJECTION_STACKTRACE_FILTER && LATENCYTOP &&
KMEMCHECK) selects FRAME_POINTER which has unmet direct dependencies
(DEBUG_KERNEL && (CRIS || M68K || FRV || UML || AVR32 || SUPERH || BLACKFIN ||
MN10300 || METAG) || ARCH_WANT_FRAME_POINTERS)
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Acked-by: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5441/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When building a kernel using 'make -s', I expect to see an empty output,
except for build warnings and errors. The build_OID_registry code
always prints one line when run, which is not helpful to most people
building the kernels, and which makes it harder to automatically
check for build warnings.
Let's just remove the one line output.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Correct spelling typo in printk within various drivers.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
percpu-refcount was incorrectly using preempt_disable/enable() for RCU
critical sections against call_rcu(). 6a24474da8 ("percpu-refcount:
consistently use plain (non-sched) RCU") fixed it by converting the
preepmtion operations with rcu_read_[un]lock() citing that there isn't
any advantage in using sched-RCU over using the usual one; however,
rcu_read_[un]lock() for the preemptible RCU implementation -
CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly
more expensive than preempt_disable/enable().
In a contrived microbench which repeats the followings,
- percpu_ref_get()
- copy 32 bytes of data into percpu buffer
- percpu_put_get()
- copy 32 bytes of data into percpu buffer
rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by
about 15% when compared to using sched-RCU.
As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications. Convert to RCU-sched.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Implement percpu_tryget() which stops giving out references once the
percpu_ref is visible as killed. Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on subset of cpus
for a while after percpu_ref_kill() returns.
For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.
While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().
v2: Patch description rephrased to emphasize that tryget() may
continue to succeed on some CPUs after kill() returns as suggested
by Kent.
v3: Function comment in percpu_ref_kill_and_confirm() updated warning
people to not depend on the implied RCU grace period from the
confirm callback as it's an implementation detail.
Signed-off-by: Tejun Heo <tj@kernel.org>
Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
Normally, percpu_ref_init() initializes and percpu_ref_kill()
initiates destruction which completes asynchronously. The
asynchronous destruction can be problematic in init failure path where
the caller wants to destroy half-constructed object - distinguishing
half-constructed objects from the usual release method can be painful
for complex objects.
This patch implements percpu_ref_cancel_init() which synchronously
destroys the percpu_ref without invoking release. To avoid
unintentional misuses, the function requires the ref to have finished
percpu_ref_init() but never used and triggers WARN otherwise.
v2: Explain the weird name and usage restriction in the function
comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
Two small changes.
* Unlike most init functions, percpu_ref_init() allocates memory and
may fail. Let's mark it with __must_check in case the caller
forgets.
* percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
dereference @ref->pcpu_count, which can be misleading. The pointer
is guaranteed to be valid and visible and can't change underneath
the function. Drop ACCESS_ONCE().
Signed-off-by: Tejun Heo <tj@kernel.org>
* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
postfix for types and the type is gonna be used for a different type
of callback too.
* Add @ARG to function comments.
* Drop unnecessary and unaligned indentation from percpu_ref_init()
function comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
For 'while' looping, need stop when 'nbytes == 0', or will cause issue.
('nbytes' is size_t which is always bigger or equal than zero).
The related warning: (with EXTRA_CFLAGS=-W)
lib/mpi/mpicoder.c:40:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unlike kobject_set_name(), the kset_create_and_add() interface does not
provide a way to use format strings, so make sure that the interface
cannot be abused accidentally. It looks like all current callers use
static strings, so there's no existing flaw.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since we have at least one user of this function outside of CONFIG_NET
scope, we have to provide this function independently. The proposed
solution is to move it under lib/net_utils.c with corresponding
configuration variable and select wherever it is needed.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
lib/crc-t10dif.c:42:1-3: WARNING: PTR_RET can be used
Use PTR_RET rather than if(IS_ERR(...)) + PTR_ERR
Generated by: coccinelle/api/ptr_ret.cocci
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The cmpxchg() was just to ensure the debug check didn't race, which was
a bit excessive. The caller is supposed to do the appropriate
synchronization, which means percpu_ref_kill() can just do a simple
store.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This implements a refcount with similar semantics to
atomic_get()/atomic_dec_and_test() - but percpu.
It also implements two stage shutdown, as we need it to tear down the
percpu counts. Before dropping the initial refcount, you must call
percpu_ref_kill(); this puts the refcount in "shutting down mode" and
switches back to a single atomic refcount with the appropriate
barriers (synchronize_rcu()).
It's also legal to call percpu_ref_kill() multiple times - it only
returns true once, so callers don't have to reimplement shutdown
synchronization.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Tejun Heo <tj@kernel.org>
debugfs currently lack the ability to create attributes
that set/get atomic_t values.
This patch adds support for this through a new
debugfs_create_atomic_t() function.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The umul_ppmm() macro for parisc uses the xmpyu assembler statement
which does calculation via a floating point register.
But usage of floating point registers inside the Linux kernel are not
allowed and gcc will stop compilation due to the -mdisable-fpregs
compiler option.
Fix this by disabling the umul_ppmm() and udiv_qrnnd() macros. The
mpilib will then use the generic built-in implementations instead.
Signed-off-by: Helge Deller <deller@gmx.de>
There is a race between klist_remove and klist_release. klist_remove
uses a local var waiter saved on stack. When klist_release calls
wake_up_process(waiter->process) to wake up the waiter, waiter might run
immediately and reuse the stack. Then, klist_release calls
list_del(&waiter->list) to change previous
wait data and cause prior waiter thread corrupt.
The patch fixes it against kernel 3.9.
Signed-off-by: wang, biao <biao.wang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When CRC T10 DIF is calculated using the crypto transform framework, we
wrap the crc_t10dif function call to utilize it. This allows us to
take advantage of any accelerated CRC T10 DIF transform that is
plugged into the crypto framework.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!
That function is only present with CONFIG_NET. Turns out that
crypto/algif_skcipher.c also uses that outside net, but it actually
needs sockets anyway.
In addition, commit 6d4f0139d6 added
CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
that function and revert that commit too.
socket.h already includes uio.h, so no callers need updating; trying
only broke things fo x86_64 randconfig (thanks Fengguang!).
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
- make warning smp-safe
- result of atomic _unless_zero functions should be checked by caller
to avoid use-after-free error
- trivial whitespace fix.
Link: https://lkml.org/lkml/2013/4/12/391
Tested: compile x86, boot machine and run xfstests
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
[ Removed line-break, changed to use WARN_ON_ONCE() - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change explicit "signed long" declarations into plain "long" as suggested
by Peter Hurley.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change fixes a race condition where a reader might determine it
needs to block, but by the time it acquires the wait_lock the rwsem has
active readers and no queued waiters.
In this situation the reader can run in parallel with the existing
active readers; it does not need to block until the active readers
complete.
Thanks to Peter Hurley for noticing this possible race.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we decide to wake up readers, we must first grant them as many read
locks as necessary, and then actually wake up all these readers. But in
order to know how many read shares to grant, we must first count the
readers at the head of the queue. This might take a while if there are
many readers, and we want to be protected against a writer stealing the
lock while we're counting. To that end, we grant the first reader lock
before counting how many more readers are queued.
We also require some adjustments to the wake_type semantics.
RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be
RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as
nobody could steal it while we hold the wait_lock. This doesn't make
sense once we implement fastpath write lock stealing, so we now use
RWSEM_WAKE_ANY in that case.
Similarly, when rwsem_down_write_failed found that a read lock was
active, it would use RWSEM_WAKE_READ_OWNED which signalled that new
readers could be woken without checking first that the rwsem was
available. We can't do that anymore since the existing readers might
release their read locks, and a writer could steal the lock before we
wake up additional readers. So, we have to use a new RWSEM_WAKE_READERS
value to indicate we only want to wake readers, but we don't currently
hold any read lock.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is mostly for cleanup value:
- We don't need several gotos to handle the case where the first
waiter is a writer. Two simple tests will do (and generate very
similar code).
- In the remainder of the function, we know the first waiter is a reader,
so we don't have to double check that. We can use do..while loops
to iterate over the readers to wake (generates slightly better code).
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can skip the initial trylock in rwsem_down_write_failed() if there
are known active lockers already, thus saving one likely-to-fail
cmpxchg.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>