GFP_NOIO is often used for idr_alloc() inside preloaded section as the
allocation mask doesn't really matter. If the idr tree needs to be
expanded, idr_alloc() first tries to allocate using the specified
allocation mask and if it fails falls back to the preloaded buffer. This
order prevents non-preloading idr_alloc() users from taking advantage of
preloading ones by using the preload buffer without filling it, shifting
the burden of allocation to the preload users.
Unfortunately, this allowed/expected-to-fail kmem_cache allocation ends up
generating a spurious slab lowmem warning before the request is satisfied
from the preload buffer.
This patch makes idr_layer_alloc() add __GFP_NOWARN to the first
kmem_cache attempt and retry kmem_cache without __GFP_NOWARN after
allocation from the preload buffer fails, so that the lowmem warning is
generated unless suppressed by the original @gfp_mask.
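Roughly, the resulting allocation order in idr_layer_alloc() looks like
the following sketch (simplified; take_from_preload_buffer() stands in
for the actual per-cpu preload buffer code):

    /* first attempt: don't warn, failure here is expected/allowed */
    new = kmem_cache_zalloc(idr_layer_cache, gfp_mask | __GFP_NOWARN);
    if (!new && !in_interrupt())
            new = take_from_preload_buffer();       /* per-cpu buffer */
    if (!new)
            /* retry w/o __GFP_NOWARN so the lowmem warning fires
             * unless the caller's @gfp_mask suppresses it */
            new = kmem_cache_zalloc(idr_layer_cache, gfp_mask);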
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Teigland <teigland@redhat.com>
Tested-by: David Teigland <teigland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 5dc49c75a2 ("decompressors: make the default XZ_DEC_* config
match the selected architecture") added
    default y if POWERPC
to lib/xz/Kconfig. But there is no Kconfig symbol POWERPC. The most
general Kconfig symbol for the powerpc architecture is PPC. So let's
use that.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that all in-kernel users are converted to use the new alloc
interface, mark the old interface deprecated. We should be able to
remove these in a few releases.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix new kernel-doc warnings in idr:
Warning(include/linux/idr.h:113): No description found for parameter 'idr'
Warning(include/linux/idr.h:113): Excess function parameter 'idp' description in 'idr_find'
Warning(lib/idr.c:232): Excess function parameter 'id' description in 'sub_alloc'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
idr_find(), idr_remove() and idr_replace() used to silently ignore the
sign bit and perform lookup with the rest of the bits. The weird behavior
has been changed such that negative IDs are treated as invalid. As the
behavior change was subtle, WARN_ON_ONCE() was added in the hope of
determining who's calling idr functions with negative IDs so that they can
be examined for problems.
Up until now, both reported cases involve ID numbers coming directly from
userland and getting fed into idr_find(), and the warnings seem to cause
more problems than they help. Drop the WARN_ON_ONCE()s.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: <markus@trippelsdorf.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add [!]METAG to a couple of Kconfig dependencies in lib/Kconfig.debug.
Don't allow stack utilization instrumentation on metag, and allow
building with frame pointers.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Paul E. McKenney" <paul.mckenney@linaro.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Add the missing Kconfig option KDB_CONTINUE_CATASTROPHIC, whose absence
left a dead ifdef block in kernel/debug/kdb/kdb_main.c:73-75.
The code using KDB_CONTINUE_CATASTROPHIC was originally introduced in
commit '5d5314d6795f3c1c0f415348ff8c51f7de042b77' by Jason Wessel.
This patchset ("kdb: core for kgdb back end (1 of 2)")
added platform independent part of kdb to the linux kernel.
The kernel option, however, even though it had the same options and
behaviour on all supported architectures, was part of the x86 and
ia64 patchsets of KDB and was therefore never pulled into the mainline
kernel tree.
I actually took the Kconfig originally written by
Keith Owens <kaos@sgi.com> (2003-06-20 according to the KDB changelog)
and changed it to reflect the correct behaviour,
as the KDUMP patchset is not part of the kernel and the expected
functionality is missing from it.
Signed-off-by: Robert Obermeier <obbi89@googlemail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
I'm not sure why, but the hlist for each entry iterators were conceived
differently from the list ones. While the list ones are nice and elegant:

    list_for_each_entry(pos, head, member)

the hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
do they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
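For illustration, a typical conversion looks like the following (sketch;
my_obj, head and use() are placeholders):

    struct my_obj {
            struct hlist_node node;
            int val;
    };
    struct my_obj *obj;

    /* old interface: a spare struct hlist_node *pos cursor was needed
     *
     *      struct hlist_node *pos;
     *      hlist_for_each_entry(obj, pos, head, node)
     *              use(obj);
     */

    /* new interface: same shape as list_for_each_entry() */
    hlist_for_each_entry(obj, head, node)
            use(obj);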
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host, for_each_host_safe, for_each_mesh_entry;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kfifo_alloc() and kfifo_init() to allocate at least the requested
number of elements. Since the kfifo operates on powers of two, the
requested size will be rounded up to the next power of two.
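For example (sketch; a plain byte fifo):

    struct kfifo fifo;
    int ret;

    /* request room for 100 elements; the size is rounded up to the
     * next power of two, so 128 slots are actually allocated */
    ret = kfifo_alloc(&fifo, 100, GFP_KERNEL);
    if (ret)
            return ret;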
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Until recently, when a negative ID was specified, idr functions used to
ignore the sign bit and proceed with the operation using the rest of the
bits, which is bizarre and error-prone. The behavior recently got changed
so that negative IDs are treated as invalid but we're triggering
WARN_ON_ONCE() on negative IDs just in case somebody was depending on the
sign bit being ignored, so that those can be detected and fixed easily.
We only need this for a while. Explain why WARN_ON_ONCE()s are there and
that they can be removed later.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While idr lookup isn't a particularly heavy operation, it still is too
substantial to use in hot paths without worrying about the performance
implications. With recent changes, each idr_layer covers 256 slots,
which should be enough to cover most use cases with a single idr_layer,
making a lookup hint very attractive.
This patch adds idr->hint which points to the idr_layer which
allocated an ID most recently and the fast path lookup becomes
    if (look up target's prefix matches that of the hinted layer)
            return hint->ary[ID's offset in the leaf layer];
which can be inlined.
idr->hint is set to the leaf node on idr_fill_slot() and cleared from
free_layer().
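Roughly, the inlined fast path then reads (simplified sketch of the
resulting idr_find()):

    static inline void *idr_find(struct idr *idr, int id)
    {
            struct idr_layer *hint = rcu_dereference_raw(idr->hint);

            /* does the hinted leaf layer cover this ID's prefix? */
            if (hint && (id & ~IDR_MASK) == hint->prefix)
                    return rcu_dereference_raw(hint->ary[id & IDR_MASK]);

            return idr_find_slowpath(idr, id);      /* full tree walk */
    }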
[andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a field which carries the prefix of ID the idr_layer covers. This
will be used to implement lookup hint.
This patch doesn't make use of the new field and doesn't introduce any
behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, idr_layer->bitmap is declared as an unsigned long, which
restricts the number of bits an idr_layer can contain. All bitops can handle
arbitrary positive integer bit number and there's no reason for this
restriction.
Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single
unsigned long.
* idr_layer->bitmap is now an array. '&' dropped from params to
bitops.
* Replaced "== IDR_FULL" tests with bitmap_full() and removed
IDR_FULL.
* Replaced find_next_bit() on ~bitmap with find_next_zero_bit().
* Replaced "bitmap = 0" with bitmap_clear().
This patch doesn't (or at least shouldn't) introduce any behavior
changes.
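Roughly, the conversion looks like the following (illustrative sketch,
not the full patch; find_free_slot() is a made-up helper):

    struct idr_layer {
            DECLARE_BITMAP(bitmap, IDR_SIZE); /* was: unsigned long bitmap; */
            /* ... other fields unchanged ... */
    };

    static int find_free_slot(struct idr_layer *p, int n)
    {
            if (bitmap_full(p->bitmap, IDR_SIZE))   /* was: == IDR_FULL */
                    return -ENOSPC;
            return find_next_zero_bit(p->bitmap, IDR_SIZE, n);
    }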
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MAX_IDR_MASK is another weirdness in the idr interface. As idr covers
the whole positive integer range, it's defined as 0x7fffffff or INT_MAX.
Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
They basically mask off the sign bit and operate on the rest, so if
the caller, by accident, passes in a negative number, the sign bit
will be masked off and the remaining part will be used as if that was
the input, which is worse than crashing.
The constant is visible in idr.h and there are several users in the
kernel.
* drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()
Basically used to test if adap->nr is a negative number which isn't
-1 and returns -EINVAL if so. idr_alloc() already has negative
@start checking (w/ WARN_ON_ONCE), so this can go away.
* drivers/infiniband/core/cm.c:cm_alloc_id()
drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()
Used to wrap cyclic @start. Can be replaced with max(next, 0).
Note that this type of cyclic allocation using idr is buggy. These
are prone to spurious -ENOSPC failure after the first wraparound.
* fs/super.c:get_anon_bdev()
The ID allocated from ida is masked off before being tested whether
it's inside valid range. ida allocated ID can never be a negative
number and the masking is unnecessary.
Update idr_*() functions to fail with -EINVAL when negative @id is
specified and update other MAX_IDR_MASK users as described above.
This leaves MAX_IDR_MASK without any user, remove it and relocate
other MAX_IDR_* constants to lib/idr.c.
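To see why the masking was dangerous, consider (illustrative values):

    int id = -2;                    /* buggy caller */
    int masked = id & MAX_IDR_MASK; /* == 0x7ffffffe, silently "valid" */

    /* old: idr_find(idr, -2) quietly looked up slot 0x7ffffffe;
     * new: idr_find(), idr_remove() and idr_replace() fail with
     * -EINVAL (or NULL) for any id < 0 */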
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Wolfram Sang <wolfram@the-dreams.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most functions in idr fail to deal with the high bits when the idr
tree grows to the maximum height.
* idr_get_empty_slot() stops growing idr tree once the depth reaches
MAX_IDR_LEVEL - 1, which is one depth shallower than necessary to
cover the whole range. The function doesn't even notice that it
didn't grow the tree enough and ends up allocating the wrong ID
given sufficiently high @starting_id.
For example, on 64 bit, if the starting id is 0x7fffff01,
idr_get_empty_slot() will grow the tree 5 layers deep, which covers
only 30 bits, and then proceed to allocate as if bit 30 weren't
specified. It ends up allocating 0x3fffff01 without bit 30 but still
returns 0x7fffff01.
* __idr_remove_all() will not remove anything if the tree is fully
grown.
* idr_find() can't find anything if the tree is fully grown.
* idr_for_each() and idr_get_next() can't iterate anything if the tree
is fully grown.
Fix it by introducing idr_max() which returns the maximum possible ID
given the depth of tree and replacing the id limit checks in all
affected places.
As the idr_layer pointer array pa[] needs to be 1 larger than the
maximum depth, enlarge pa[] arrays by one.
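The new helper is small; roughly:

    static int idr_max(int layers)
    {
            int bits = min_t(int, layers * IDR_BITS, MAX_IDR_SHIFT);

            return (1 << bits) - 1;
    }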
While this plugs the discovered issues, the whole code base is
horrible and in desperate need of a rewrite. It's fragile as hell.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current idr interface is very cumbersome.
* For all allocations, two function calls - idr_pre_get() and
idr_get_new*() - should be made.
* idr_pre_get() doesn't guarantee that the following idr_get_new*()
will not fail from memory shortage. If idr_get_new*() returns
-EAGAIN, the caller is expected to retry pre_get and allocation.
* idr_get_new*() can't enforce an upper limit. An upper limit can only
be enforced by allocating and then freeing if above the limit.
* idr_layer buffer is unnecessarily per-idr. Each idr ends up keeping
around MAX_IDR_FREE idr_layers. The memory consumed per idr is
under two pages but it makes it difficult to make idr_layer larger.
This patch implements the following new set of allocation functions.
* idr_preload[_end]() - Similar to radix tree preloading but doesn't fail.
The first idr_alloc() inside preload section can be treated as if it
were called with @gfp_mask used for idr_preload().
* idr_alloc() - Allocate an ID w/ lower and upper limits. Takes
@gfp_flags and can be used w/o preloading. When used inside
preloaded section, the allocation mask of preloading can be assumed.
If idr_alloc() can be called from a context which allows sufficiently
relaxed @gfp_mask, it can be used by itself. If, for example,
idr_alloc() is called inside spinlock protected region, preloading can
be used like the following.
    idr_preload(GFP_KERNEL);
    spin_lock(lock);

    id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);

    spin_unlock(lock);
    idr_preload_end();
    if (id < 0)
            error;
which is much simpler and less error-prone than the idr_pre_get() and
idr_get_new*() loop.
The new interface uses a per-cpu idr_layer buffer and thus the number of
idrs in the system doesn't affect the amount of memory used for
preloading.
idr_layer_alloc() is introduced to handle idr_layer allocations for
both old and new ID allocation paths. This is a bit hairy now but the
new interface is expected to replace the old and the internal
implementation eventually will become simpler.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move slot filling to idr_fill_slot() from idr_get_new_above_int() and
make idr_get_new_above() directly call it. idr_get_new_above_int() is
no longer needed and removed.
This will be used to implement a new ID allocation interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate
exception conditions internally. The return value is later translated
to errno values using _idr_rc_to_errno().
This is confusing. Drop the custom ones and consistently use -EAGAIN
for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for
"ran out of ID space".
Due to the weird memory preloading mechanism, id[ra]_get_new*() return
-EAGAIN on memory shortage, so we need to substitute -ENOMEM w/
-EAGAIN on those interface functions. They'll eventually be cleaned
up and the translations will go away.
This patch doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Move idr_for_each_entry() definition next to other idr related
definitions.
* Make id[r|a]_get_new() inline wrappers of id[r|a]_get_new_above().
This changes the implementation of idr_get_new() but the new
implementation is trivial. This patch doesn't introduce any
functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was only one legitimate use of idr_remove_all() and a lot more
incorrect uses (or lack thereof). Now that idr_destroy() implies
idr_remove_all() and all the in-kernel users updated not to use it,
there's no reason to keep it around. Mark it deprecated so that we can
later unexport it.
idr_remove_all() is made an inline function calling __idr_remove_all()
to avoid triggering deprecated warning on EXPORT_SYMBOL().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
idr is silly in quite a few ways, one of which is how it's supposed to
be destroyed - idr_destroy() doesn't release IDs and doesn't even whine
if the idr isn't empty. If the caller forgets idr_remove_all(), it
simply leaks memory.
Even ida gets this wrong and leaks memory on destruction. There is
absolutely no reason not to call idr_remove_all() from idr_destroy().
Nobody is abusing idr_destroy() for shrinking free layer buffer and
continues to use idr after idr_destroy(), so it's safe to do remove_all
from destroy.
In the whole kernel, there is only one place where idr_remove_all() is
legitimately used without a following idr_destroy(), while there are quite
a few places where the caller forgets either idr_remove_all() or
idr_destroy(), leaking memory.
This patch makes idr_destroy() call idr_remove_all() and updates the
function description accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The iteration logic of idr_get_next() is borrowed mostly verbatim from
idr_for_each(). It walks down the tree looking for the slot matching
the current ID. If the matching slot is not found, the ID is
incremented by the distance of single slot at the given level and
repeats.
The implementation assumes that during the whole iteration id is aligned
to the layer boundaries of the level closest to the leaf, which is true
for all iterations starting from zero or an existing element and thus is
fine for idr_for_each().
However, idr_get_next() may be given any point and if the starting id
hits in the middle of a non-existent layer, increment to the next layer
will end up skipping the same offset into it. For example, an IDR with
IDs filled between [64, 127] would look like the following.
    [  0  64 ... ]
     /----/   |
     |        |
    NULL   [ 64 ... 127 ]
If idr_get_next() is called with 63 as the starting point, it will try
to follow down the pointer from 0. As it is NULL, it will then try to
proceed to the next slot in the same level by adding the slot distance
at that level, which is 64, making the next try 127. It goes around the
loop and finds and returns 127 skipping [64, 126].
Note that this bug also triggers in idr_for_each_entry() loop which
deletes during iteration as deletions can make layers go away leaving
the iteration with unaligned ID into missing layers.
Fix it by ensuring proceeding to the next slot doesn't carry over the
unaligned offset - ie. use round_up(id + 1, slot_distance) instead of
id += slot_distance.
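In the walk this becomes roughly (sketch; 1 << n is the slot distance
at the current level):

    if (!p) {
            /* advance to the next slot boundary instead of carrying
             * the unaligned offset over */
            id = round_up(id + 1, 1 << n);  /* was: id += 1 << n; */
            continue;
    }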
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Teigland <teigland@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For better code reuse, use the newly added page iterator to iterate
through the pages. The offset and length within the page are still
calculated by the mapping iterator, as well as the actual mapping. Idea
from Tejun Heo.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an iterator to walk through a scatter list a page at a time starting
at a specific page offset. As opposed to the mapping iterator this is
meant to be small, performing well even in simple loops like collecting
all pages on the scatterlist into an array or setting up an iommu table
based on the pages' DMA address.
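Usage looks roughly like this (sketch; sgt is a populated sg_table, and
sg_page_iter_page() stands for reading the iterator's current page):

    struct sg_page_iter piter;

    /* walk every page of the scatterlist, starting at page offset 0 */
    for_each_sg_page(sgt->sgl, &piter, sgt->nents, 0) {
            struct page *page = sg_page_iter_page(&piter);

            /* e.g. store page in an array or program an iommu entry */
    }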
Signed-off-by: Imre Deak <imre.deak@intel.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A misplaced #endif causes link errors related to the pcim_*() functions.
This is because the pcim_*() functions depend on the CONFIG_PCI option,
not on the CONFIG_HAS_IOPORT option. Therefore, when CONFIG_PCI is
enabled and CONFIG_HAS_IOPORT is not, link errors such as the following
result:
drivers/ata/libata-sff.c:3233: undefined reference to `pcim_iomap_regions'
drivers/ata/libata-sff.c:3238: undefined reference to `pcim_iomap_table'
drivers/built-in.o: In function `ata_pci_sff_init_host':
drivers/ata/libata-sff.c:2318: undefined reference to `pcim_iomap_regions'
drivers/ata/libata-sff.c:2329: undefined reference to `pcim_iomap_table'
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Greg KH <greg@kroah.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the default XZ_DEC_* config symbols to match the configured
architecture. It is perfectly legitimate to support multiple XZ BCJ
filters for different architectures (e.g.: to mount foreign squashfs/xz
compressed filesystems), it is however more natural not to select them all
by default, but only the one matching the configured architecture.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Acked-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the XZ_DEC_* dependency on CONFIG_EXPERT as recommended by Lasse
Collin.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Acked-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Group all architecture-specific BCJ filter configuration symbols under an
if XZ_BCJ / endif statement.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Acked-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
match_number() has return values of -ENOMEM, -EINVAL and -ERANGE, so the
documented return values of all the functions calling match_number()
should include these values. Fix up the comments to reflect the correct
values.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the %pa format specifier for printing a phys_addr_t type and its
derivative types (such as resource_size_t), since the physical address
size on some platforms can vary based on build options, regardless of
the native integer type.
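For example (sketch):

    phys_addr_t phys = 0x40000000;
    resource_size_t sz = 0x1000;

    /* %pa takes a pointer to the value, so it prints correctly
     * regardless of how wide phys_addr_t is on this build */
    pr_info("region at %pa, size %pa\n", &phys, &sz);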
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: Rob Landley <rob@landley.net>
Cc: George Spelvin <linux@horizon.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Depending on CONFIG_EXPERT doesn't really make sense, and hides the
option unintentionally. Remove the superfluous "default n" pointed out
by Ingo as well.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit updates the kernel LZO code to the current upstream version,
which features a significant speed improvement: benchmarking the Calgary
and Silesia test corpora typically shows doubled performance in both
compression and decompression on modern i386/x86_64/powerpc machines.
Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Rename the source file to match the function name and thereby
also make room for a possible future even slightly faster
"non-safe" decompressor version.
Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
We (the Linux Kernel Performance project) found a regression
introduced by commit:
  5a505085f0 ("mm/rmap: Convert the struct anon_vma::mutex to an rwsem")
which converted all anon_vma::mutex locks to rwsem write locks.
The semantics are the same, but the behavioral difference is
quite huge in some cases. After investigating it we found the
root cause: mutexes support lock stealing while rwsems don't.
Here is the link for the detailed regression report:
https://lkml.org/lkml/2013/1/29/84
Ingo suggested adding write lock stealing to rwsems:
"I think we should allow lock-steal between rwsem writers - that
will not hurt fairness as most rwsem fairness concerns relate to
reader vs. writer fairness"
And here is the rwsem-spinlock version.
With this patch, we roughly doubled performance on one test box with the
following aim7 workfile:

    FILESIZE: 1M
    POOLSIZE: 10M
    10 fork_test

    /usr/bin/time output w/o patch          /usr/bin/time output with patch
    Percent of CPU this job got: 369%       Percent of CPU this job got: 537%
    Voluntary context switches: 640595016   Voluntary context switches: 157915561

We got a 45% increase in CPU usage and saved about 3/4 of the voluntary
context switches.
Reported-by: LKP project <lkp@linux.intel.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Alex Shi <alex.shi@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/1359716356-23865-1-git-send-email-yuanhan.liu@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To make the lockdep selftest work on RT we need to convert the
spinlock tests to a raw spinlock. Otherwise we cannot run the irq
context checks. For mainline this is purely annotational, as spinlocks
are mapped to raw_spinlocks anyway.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Link: http://lkml.kernel.org/r/1334559716-18447-2-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 5a505085f0 ("mm/rmap: Convert the struct anon_vma::mutex
to an rwsem") changed struct anon_vma::mutex to an rwsem, which
caused aim7 fork_test performance to drop by 50%.
Yuanhan Liu did the following excellent analysis:
https://lkml.org/lkml/2013/1/29/84
and found that the regression is caused by strict, serialized,
FIFO sequential write-ownership of rwsems. Ingo suggested
implementing opportunistic lock-stealing for the front writer
task in the waitqueue.
Yuanhan Liu implemented lock-stealing for spinlock-rwsems,
which indeed recovered much of the regression - confirming
the analysis that the main factor in the regression was the
FIFO writer-fairness of rwsems.
In this patch we allow lock-stealing to happen when the first
waiter is also a writer. With that change in place the
aim7 fork_test performance is fully recovered on my
Intel NHM EP, NHM EX, SNB EP 2S and 4S test-machines.
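Conceptually the stealing looks like this (illustrative sketch only,
with a made-up lock word; not the kernel's struct rw_semaphore
internals):

    #define WRITER_LOCKED   1L

    struct sketch_rwsem {
            atomic_long_t count;    /* 0 == free */
    };

    /* a queued writer may grab the lock the moment it is free,
     * instead of strictly waiting for FIFO order */
    static bool writer_try_steal(struct sketch_rwsem *sem)
    {
            return atomic_long_cmpxchg(&sem->count, 0, WRITER_LOCKED) == 0;
    }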
Reported-by: lkp@linux.intel.com
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: paul.gortmaker@windriver.com
Link: https://lkml.org/lkml/2013/1/29/84
Link: http://lkml.kernel.org/r/1360069915-31619-1-git-send-email-alex.shi@intel.com
[ Small stylistic fixes, updated changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arches can have more efficient implementations of these routines.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Remove the MIN, MAX and ABS macros that duplicate the kernel's native
implementations.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
In the existing use case, copying the decoded data in
pkcs_1_v1_5_decode_emsa is unnecessary; it is enough to get a pointer to
the message. Remove the copying and the extra buffer allocation.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
digsig_verify_rsa() does not free kmalloc'ed buffer returned by
mpi_get_buffer().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
Normal boot path on a system with iommu support: the swiotlb buffer is
allocated early, then we try to initialize the iommu; if the Intel or
AMD iommu sets up properly, the swiotlb buffer is freed.
The early allocation is done with bootmem and can panic when we try to
use kdump with memory only above 4G, or with memmap= limiting memory to
under 4G, for example memmap=4095M$1M to remove memory under 4G.
According to Eric, add _nopanic version and no_iotlb_memory to fail
map single later if swiotlb is still needed.
-v2: don't pass nopanic, and use -ENOMEM return value according to Eric.
panic early instead of using swiotlb_full to panic...according to Eric/Konrad.
-v3: make swiotlb_init not panic; this affects:
arm64, ia64, powerpc, tile, unicore32, x86.
-v4: cleanup swiotlb_init by removing swiotlb_init_with_default_size.
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-36-git-send-email-yinghai@kernel.org
Reviewed-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Cc: linux-mips@linux-mips.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Tiny RCU has historically omitted RCU CPU stall warnings in order to
reduce memory requirements, however, lack of these warnings caused
Thomas Gleixner some debugging pain recently. Therefore, this commit
adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This keeps
the memory footprint small, while still enabling CPU stall warnings
in kernels built to enable them.
Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON
config variable to simplify #if expressions.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The RCU-related debugging Kconfig options are in two different places,
and consume too much screen real estate. This commit therefore
consolidates them into their own menu.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The ERR_PTR() and IS_ERR() macros used by the devm_ioremap_resource()
function are defined in the linux/err.h header. On ARM this seems to be
pulled in by one of the other headers but the build fails at least on
OpenRISC.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The devm_request_and_ioremap() function is very useful and helps avoid a
whole lot of boilerplate. However, one issue that keeps popping up is
its lack of a specific error code to determine which of the steps that
it performs failed. Furthermore, while the function gives an example and
suggests what error code to return on failure, a wide variety of error
codes are used throughout the tree.
In an attempt to fix these problems, this patch adds a new function that
drivers can transition to. The devm_ioremap_resource() returns a pointer
to the remapped I/O memory on success or an ERR_PTR() encoded error code
on failure. Callers can check for failure using IS_ERR() and determine
its cause by extracting the error code using PTR_ERR().
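A typical caller then looks like this (sketch; foo_probe() is
hypothetical):

    static int foo_probe(struct platform_device *pdev)
    {
            struct resource *res;
            void __iomem *base;

            res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
            base = devm_ioremap_resource(&pdev->dev, res);
            if (IS_ERR(base))
                    return PTR_ERR(base);

            /* ... use base ... */
            return 0;
    }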
devm_request_and_ioremap() is implemented as a wrapper around the new
API and returns NULL on failure as before. This ensures that backwards
compatibility is maintained until all users have been converted to the
new API, at which point the old devm_request_and_ioremap() function
should be removed.
A semantic patch is included which can be used to convert from the old
devm_request_and_ioremap() API to the new devm_ioremap_resource() API.
Some non-trivial cases may require manual intervention, though.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix up all callers as they were before, making one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The option allows you to remove TTY and compile without errors. This
saves space on systems that won't support TTY interfaces anyway.
bloat-o-meter output is below.
The bulk of this patch consists of Kconfig changes adding "depends on
TTY" to various serial devices and similar drivers that require the TTY
layer. Ideally, these dependencies would occur on a common intermediate
symbol such as SERIO, but most drivers "select SERIO" rather than
"depends on SERIO", and "select" does not respect dependencies.
bloat-o-meter output comparing our previous minimal config to the new
minimal config with TTY removed. The list is filtered with awk
'$3 != "-"' to omit removed entries, as the list was very long.
add/remove: 0/226 grow/shrink: 2/14 up/down: 6/-35356 (-35350)
function                            old     new   delta
chr_dev_init                        166     170      +4
allow_signal                         80      82      +2
static.__warned                     143     142      -1
disallow_signal                      63      62      -1
__set_special_pids                   95      94      -1
unregister_console                  126     121      -5
start_kernel                        546     541      -5
register_console                    593     588      -5
copy_from_user                       45      40      -5
sys_setsid                          128     120      -8
sys_vhangup                          32      19     -13
do_exit                            1543    1526     -17
bitmap_zero                          60      40     -20
arch_local_irq_save                 137     117     -20
release_task                        674     652     -22
static.spin_unlock_irqrestore       308     260     -48
Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Reviewed-by: Jamey Sharp <jamey@minilop.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jianpeng Ma noted that dynamic-debug is silent about many query errors,
so add pr_err()s to explain those errors, and tweak a few others. Also
parse flags first, so that match-spec errors are slightly clearer.
CC: Jianpeng Ma <majianpeng@gmail.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce print_hex_dump_debug(), which can be dynamically controlled,
similar to pr_debug. Also, make print_hex_dump_bytes() dynamically
controlled.
Implement only the 'p' flag (_DPRINTK_FLAGS_PRINT) to keep it simple,
since a hex dump prints multiple lines and a long prefix would impact
readability.
To provide line/file etc. information, use pr_debug or similar
before/after print_hex_dump_debug().
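For example (sketch):

    /* 16 bytes per line, single-byte groups, offset prefix, with an
     * ASCII column; emitted only when enabled via dynamic debug */
    print_hex_dump_debug("mydrv: ", DUMP_PREFIX_OFFSET, 16, 1,
                         buf, len, true);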
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>