plists are used with spinlocks and raw_spinlocks. Change the plist
debugging to handle both types.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
kernel_lock.c emits a warning because a raw spinlock function is used
with a spinlock. Convert BKL to raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
The name space hierarchy for the internal lock functions is now a bit
backwards. raw_spin* functions map to _spin* which use __spin*, while
we would like to have _raw_spin* and __raw_spin*.
_raw_spin* is already used by lock debugging, so rename those funtions
to do_raw_spin* to free up the _raw_spin* name space.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Now that the raw_spin name space is freed up, we can implement
raw_spinlock and the related functions which are used to annotate the
locks which are not converted to sleeping spinlocks in preempt-rt.
A side effect is that only such locks can be used with the low level
lock fsunctions which circumvent lockdep.
For !rt spin_* functions are mapped to the raw_spin* implementations.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Name space cleanup for rwlock functions. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Not strictly necessary for -rt as -rt does not have non sleeping
rwlocks, but it's odd to not have a consistent naming convention.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Further name space cleanup. No functional change
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.
Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Add a flag to let a platform ioremap memory regions marked as reserved.
This flag will be used later by the Nintendo Wii support code to allow
ioremapping the I/O region sitting between MEM1 and MEM2 and marked
as reserved RAM in the patch "wii: use both mem1 and mem2 as ram".
This will no longer be needed when proper discontig memory support
for 32-bit PowerPC is added to the kernel.
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
As shown by the previous patch (6698e3472: "tty: Fix BKL taken under a
spinlock bug introduced in the BKL split") the BKL removal is prone to
some subtle issues, where removing the BKL in one place may in fact make
a previously nested BKL call the new outer call, and then prone to nasty
deadlocks with other spinlocks.
In general, we should never take the BKL while we're holding a spinlock,
so let's just add a "might_sleep()" to it (even though the BKL doesn't
technically sleep - at least not yet), and we'll get nice warnings the
next time this kind of problem happens during BKL removal.
Acked-and-Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix some typos and punctuation in comments
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The RCU_CPU_STALL_DETECTOR costs almost nothing and has located
some bugs that might otherwise have been difficult to track
down. Make it be default for the TREE RCU implementations.
The vmlinux size impact is limited (on 64-bit x86 defconfig):
text data bss dec hex filename
8440248 1260076 995588 10695912 a334e8 vmlinux.before
8440774 1260060 995588 10696422 a336e6 vmlinux.after
+526 bytes - acceptable default cost.
For RAM starved systems, TINY_RCU does not support CPU-stall detection
and is much smaller, but then again it is a uniprocessor...
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846162906-git-send-email->
[ v2: added image size calculations to the changelog ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don't delete pending pages from the page-store tracking tree, but rather send
them for another write as they've presumably been updated.
Signed-off-by: David Howells <dhowells@redhat.com>
__fscache_write_page() attempts to load the radix tree preallocation pool for
the CPU it is on before calling radix_tree_insert(), as the insertion must be
done inside a pair of spinlocks.
Use of the preallocation pool, however, is contingent on the radix tree being
initialised without __GFP_WAIT specified. __fscache_acquire_cookie() was
passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT.
The solution is to AND out __GFP_WAIT.
Additionally, the banner comment to radix_tree_preload() is altered to make
note of this prerequisite. Possibly there should be a WARN_ON() too.
Without this fix, I have seen the following recursive deadlock caused by
radix_tree_insert() attempting to allocate memory inside the spinlocked
region, which resulted in FS-Cache being called back into to release memory -
which required the spinlock already held.
=============================================
[ INFO: possible recursive locking detected ]
2.6.32-rc6-cachefs #24
---------------------------------------------
nfsiod/7916 is trying to acquire lock:
(&cookie->lock){+.+.-.}, at: [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
but task is already holding lock:
(&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
other info that might help us debug this:
5 locks held by nfsiod/7916:
#0: (nfsiod){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
#1: (&task->u.tk_work#2){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
#2: (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
#3: (&object->lock#2){+.+.-.}, at: [<ffffffffa0076b07>] __fscache_write_page+0x197/0x3f3 [fscache]
#4: (&cookie->stores_lock){+.+...}, at: [<ffffffffa0076b0f>] __fscache_write_page+0x19f/0x3f3 [fscache]
stack backtrace:
Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24
Call Trace:
[<ffffffff8105ac7f>] __lock_acquire+0x1649/0x16e3
[<ffffffff81059ded>] ? __lock_acquire+0x7b7/0x16e3
[<ffffffff8100e27d>] ? dump_trace+0x248/0x257
[<ffffffff8105ad70>] lock_acquire+0x57/0x6d
[<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
[<ffffffff8135467c>] _spin_lock+0x2c/0x3b
[<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
[<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
[<ffffffffa0077eb7>] ? __fscache_check_page_write+0x0/0x71 [fscache]
[<ffffffffa00b4755>] nfs_fscache_release_page+0x86/0xc4 [nfs]
[<ffffffffa00907f0>] nfs_release_page+0x3c/0x41 [nfs]
[<ffffffff81087ffb>] try_to_release_page+0x32/0x3b
[<ffffffff81092c2b>] shrink_page_list+0x316/0x4ac
[<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
[<ffffffff8135451b>] ? _spin_unlock_irq+0x2b/0x31
[<ffffffff81093153>] shrink_inactive_list+0x392/0x67c
[<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
[<ffffffff810934ca>] shrink_list+0x8d/0x8f
[<ffffffff81093744>] shrink_zone+0x278/0x33c
[<ffffffff81052c70>] ? ktime_get_ts+0xad/0xba
[<ffffffff8109453b>] try_to_free_pages+0x22e/0x392
[<ffffffff8109184c>] ? isolate_pages_global+0x0/0x212
[<ffffffff8108e16b>] __alloc_pages_nodemask+0x3dc/0x5cf
[<ffffffff810ae24a>] cache_alloc_refill+0x34d/0x6c1
[<ffffffff811bcf74>] ? radix_tree_node_alloc+0x52/0x5c
[<ffffffff810ae929>] kmem_cache_alloc+0xb2/0x118
[<ffffffff811bcf74>] radix_tree_node_alloc+0x52/0x5c
[<ffffffff811bcfd5>] radix_tree_insert+0x57/0x19c
[<ffffffffa0076b53>] __fscache_write_page+0x1e3/0x3f3 [fscache]
[<ffffffffa00b4248>] __nfs_readpage_to_fscache+0x58/0x11e [nfs]
[<ffffffffa009bb77>] nfs_readpage_release+0x34/0x9b [nfs]
[<ffffffffa009c0d9>] nfs_readpage_release_full+0x32/0x4b [nfs]
[<ffffffffa0006cff>] rpc_release_calldata+0x12/0x14 [sunrpc]
[<ffffffffa0006e2d>] rpc_free_task+0x59/0x61 [sunrpc]
[<ffffffffa0006f03>] rpc_async_release+0x10/0x12 [sunrpc]
[<ffffffff810482e5>] worker_thread+0x1ef/0x2e2
[<ffffffff81048290>] ? worker_thread+0x19a/0x2e2
[<ffffffff81352433>] ? thread_return+0x3e/0x101
[<ffffffffa0006ef3>] ? rpc_async_release+0x0/0x12 [sunrpc]
[<ffffffff8104bff5>] ? autoremove_wake_function+0x0/0x34
[<ffffffff81058d25>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff810480f6>] ? worker_thread+0x0/0x2e2
[<ffffffff8104bd21>] kthread+0x7a/0x82
[<ffffffff8100beda>] child_rip+0xa/0x20
[<ffffffff8100b87c>] ? restore_args+0x0/0x30
[<ffffffff8104c2b9>] ? add_wait_queue+0x15/0x44
[<ffffffff8104bca7>] ? kthread+0x0/0x82
[<ffffffff8100bed0>] ? child_rip+0x0/0x20
Signed-off-by: David Howells <dhowells@redhat.com>
Doing the strcmp return value as
signed char __res = *cs - *ct;
is wrong for two reasons. The subtraction can overflow because __res
doesn't use a type big enough. Moreover the compared bytes should be
interpreted as unsigned char as specified by POSIX.
The same problem is fixed in strncmp.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add debugobject support to track the life time of work_structs.
While at it, remove duplicate definition of
INIT_DELAYED_WORK_ON_STACK().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
POWERPC doesn't expect it to be used.
This fixes the linux-next build failure reported by
Stephen Rothwell:
lib/swiotlb.c: In function 'setup_io_tlb_npages':
lib/swiotlb.c:114: error: 'swiotlb' undeclared (first use in this function)
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: peterz@infradead.org
LKML-Reference: <20091112000258F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that the sys_sysctl is now a compatibility wrapper around
/proc/sys we can remove much of sysctl_check and reduce it
to a few remaining sanity checks. This completely decouples
it from the binary sysctl system call.
Little things like ensuring that the sysctl has not already
been registered are all that remain.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. It doesn't
work for the majority since nowadays we have more than 4GB
memory so we must use swiotlb instead of nommu.
The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier from bootmem allocator. Chris explained the issue in
detail:
http://marc.info/?l=linux-kernel&m=125657444317079&w=2
The current x86 IOMMU initialization sequence is too complicated
and handling the above issue makes it more hacky.
This patch changes x86 IOMMU initialization sequence to handle
the above issue cleanly.
The new x86 IOMMU initialization sequence are:
1. we initialize the swiotlb (and setting swiotlb to 1) in the case
of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by
the boot option, we finish here.
2. we call the detection functions of all the IOMMUs
3. the detection function sets x86_init.iommu.iommu_init to the
IOMMU initialization function (so we can avoid calling the
initialization functions of all the IOMMUs needlessly).
4. if the IOMMU initialization function doesn't need to swiotlb
then sets swiotlb to zero (e.g. the initialization is
sucessful).
5. if we find that swiotlb is set to zero, we free swiotlb
resource.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This enables us to avoid printing swiotlb memory info when we
initialize swiotlb. After swiotlb initialization, we could find
that we don't need swiotlb.
This patch removes the code to print swiotlb memory info in
swiotlb_init() and exports the function to do that.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
Cc: tony.luck@intel.com
Cc: benh@kernel.crashing.org
LKML-Reference: <1257849980-22640-9-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: merge up conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
swiotlb_free() function frees all allocated memory for swiotlb.
We need to initialize swiotlb before IOMMU initialization (x86
and powerpc needs to allocate memory from bootmem allocator). If
IOMMU initialization is successful, we need to free swiotlb
resource (don't want to waste 64MB).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-8-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: build fix for the !CONFIG_SWIOTLB case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jesse accidentally applied v1 [1] of the patchset instead of v2 [2]. This
is the diff between v1 and v2.
The changes in this patch are:
- tidied vsprintf stack buffer to shrink and compute size more
accurately
- use %pR for decoding and %pr for "raw" (with type and flags) instead
of adding %pRt and %pRf
[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This adds support for printing struct resource type and flag information.
For example, "%pRt" looks like "[mem 0x80080000000-0x8008001ffff 64bit pref]",
and "%pRf" looks like "[mem 0xff5e2000-0xff5e2007 pref flags 0x1]".
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Print addresses (IO port numbers and memory addresses) in hex, but print
others (IRQs and DMA channels) in decimal. Only print the end if it's
different from the start.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The leading "0x" consumes field width, so leave space for it in addition to
the 4 or 8 hex digits. This means we'll print "0x0000-0x01df" rather than
"0x00-0x1df", for example.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When do_csum gets unaligned data, we really need to treat
the first byte as an even byte, not an odd byte, because
we swap the two halves later.
Found by Mike's checksum-selftest module.
Reported-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Mike Frysinger suggested that do_csum should be optional
so that an architecture can use the generic checksum code
but still provide an optimized fast-path for the most
critical function.
This can mean an implementation using inline assembly,
or in case of Alpha one using 64-bit arithmetic in C.
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The use of 'unsigned long' variables in the 32-bit part of do_csum()
is confusing at best, and potentially broken for long input on 64-bit
machines.
This changes the code to use 'unsigned int' instead, which makes
the code behave in the same (correct) way on both 32 and 64 bit
machines.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When PAE is enabled in the kernel configuration the size of
phys_addr_t differs from the size of a void pointer. The gcc
prints a warning about that in dma-debug code.
This patch fixes the warning by converting the output to
unsigned long long instead of a pointer.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
We don't need an explicit PPC64 in the DEBUG_PREEMPT dependancies as all
PPC platforms now support TRACE_IRQFLAGS_SUPPORT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Today I got:
[39648.224782] Registered led device: iwl-phy0::TX
[40676.545099] __ratelimit: 246 callbacks suppressed
[40676.545103] abcdef[23675]: segfault at 0 ...
as you can see the ratelimit message contains a function prefix.
Since this is always __ratelimit, this wont help much.
This patch changes __ratelimit and printk_ratelimit to print the
function name that calls ratelimit.
This will pinpoint the responsible function, as long as not several
different places call ratelimit with the same ratelimit state at
the same time. In that case we catch only one random function that
calls ratelimit after the wait period.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200910231458.11832.borntraeger@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Also increase the maximum possible kmemleak early log entries since
2000 are not sufficient on s390.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When using %*s, sscanf should honor conversion specifiers immediately
following the %*s. For example, the following code should find the
position of the end of the string "hello".
int end;
char buf[] = "hello world";
sscanf(buf, "%*s%n", &end);
printf("%d\n", end);
Ideally, sscanf would advance the fmt and str pointers the same as it
would without the *, but the code for that is rather complicated and is
not included in the patch.
Signed-off-by: Andy Spencer <andy753421@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we are calling the bkl tracepoint callbacks just before the
bkl lock/unlock operations, ie the tracepoint call is not inside a
lock_kernel() function but inside a lock_kernel() macro. Hence the
bkl trace event header must be included from smp_lock.h. This raises
some nasty circular header dependencies:
linux/smp_lock.h -> trace/events/bkl.h -> trace/define_trace.h
-> trace/ftrace.h -> linux/ftrace_event.h -> linux/hardirq.h
-> linux/smp_lock.h
This results in incomplete event declarations, spurious event
definitions and other kind of funny behaviours.
This is hardly fixable without ugly workarounds. So instead, we push
the file name, line number and function name as lock_kernel()
parameters, so that we only deal with the trace event header from
lib/kernel_lock.c
This adds two parameters to lock_kernel() and unlock_kernel() but
it should be fine wrt to performances because this pair dos not seem
to be called in fast paths.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
If the lzma/gzip decompressors are called with insufficient input data
(len > 0 & fill = NULL), they will attempt to call the fill function to
obtain more data, leading to a kernel oops.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add two events lock_kernel and unlock_kernel() to trace the bkl uses.
This opens the door for userspace tools to perform statistics about
the callsites that use it, dependencies with other locks (by pairing
the trace with lock events), use with recursivity and so on...
The {__reacquire,release}_kernel_lock() events are not traced because
these are called from schedule, thus the sched events are sufficient
to trace them.
Example of a trace:
hald-addon-stor-4152 [000] 165.875501: unlock_kernel: depth: 0, fs/block_dev.c:1358 __blkdev_put()
hald-addon-stor-4152 [000] 167.832974: lock_kernel: depth: 0, fs/block_dev.c:1167 __blkdev_get()
How to get the callsites that acquire it recursively:
cd /debug/tracing/events/bkl
echo "lock_depth > 0" > filter
firefox-4951 [001] 206.276967: unlock_kernel: depth: 1, fs/reiserfs/super.c:575 reiserfs_dirty_inode()
You can also filter by file and/or line.
v2: Use of FILTER_PTR_STRING attribute for files and lines fields to
make them traceable.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Jens Rosenboom noticed that a possibly unaligned const char*
is cast to a const struct in6_addr *.
Avoid this at the cost of a struct in6_addr copy on the stack.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Decouple kernel.h from ratelimit.h: the global declaration of
printk's ratelimit_state is not needed, and it leads to messy
circular dependencies due to ratelimit.h's (new) adding of a
spinlock_types.h include.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add kerneldoc annotations for function formals of type struct flex_array
and gfp_t which are currently lacking.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FLEX_ARRAY_INIT(element_size, total_nr_elements) cannot determine if
either parameter is valid, so flex arrays which are statically allocated
with this interface can easily become corrupted or reference beyond its
allocated memory.
This removes FLEX_ARRAY_INIT() as a struct flex_array initializer since no
initializer may perform the required checking. Instead, the array is now
defined with a new interface:
DEFINE_FLEX_ARRAY(name, element_size, total_nr_elements)
This may be prefixed with `static' for file scope.
This interface includes compile-time checking of the parameters to ensure
they are valid. Since the validity of both element_size and
total_nr_elements depend on FLEX_ARRAY_BASE_SIZE and FLEX_ARRAY_PART_SIZE,
the kernel build will fail if either of these predefined values changes
such that the array parameters are no longer valid.
Since BUILD_BUG_ON() requires compile time constants, several of the
static inline functions that were once local to lib/flex_array.c had to be
moved to include/linux/flex_array.h.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new function to the flex_array API:
int flex_array_shrink(struct flex_array *fa)
This function will free all unused second-level pages. Since elements are
now poisoned if they are not allocated with __GFP_ZERO, it's possible to
identify parts that consist solely of unused elements.
flex_array_shrink() returns the number of pages freed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Newly initialized flex_array's and/or flex_array_part's are now poisoned
with a new poison value, FLEX_ARRAY_FREE. It's value is similar to
POISON_FREE used in the various slab allocators, but is different to
distinguish between flex array's poisoned kmem and slab allocator poisoned
kmem.
This will allow us to identify flex_array_part's that only contain free
elements (and free them with an addition to the flex_array API). This
could also be extended in the future to identify `get' uses on elements
that have not been `put'.
If __GFP_ZERO is passed for a part's gfp mask, the poisoning is avoided.
These elements are considered to be in-use since they have been
initialized.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new function to the flex_array API:
int flex_array_clear(struct flex_array *fa,
unsigned int element_nr)
This function will zero the element at element_nr in the flex_array.
Although this is equivalent to using flex_array_put() and passing a
pointer to zero'd memory, flex_array_clear() does not require such a
pointer to memory that would most likely need to be allocated on the
caller's stack which could be significantly large depending on
element_size.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>