It fixes an accounting bug where we would continue accumulating runtime
even though the bandwidth control is disabled. This would lead to very long
throttle periods once bandwidth control gets turned on again.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
CC kernel/rcuclassic.o
kernel/rcuclassic.c: In function '__rcu_process_callbacks':
kernel/rcuclassic.c:561: error: 'flags' undeclared (first use in this function)
kernel/rcuclassic.c:561: error: (Each undeclared identifier is reported only once
kernel/rcuclassic.c:561: error: for each function it appears in.)
Declare missing variable flags.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Given that the rcp->lock is now acquired from call_rcu(), which can be
invoked from irq-disable regions, all acquisitions need to disable irqs.
The following patch fixes this.
Although I don't have any reason to believe that this is the cause of
Yinghai's oops, it does need to be fixed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the redundant definition of ACCESS_ONCE() from rcupreempt.c in
favor of the one in compiler.h. Also merge the comment header from
rcupreempt.c's definition into that in compiler.h.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since f82b217e35 lockdep can output spurious
warnings related to hwirqs due to hardirq_off shrinkage from int to bit-sized
flag. Guard it with double negation to fix the warning.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If CONFIG_PROVE_LOCKING not defined, then no dependency information
is available.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
David reported that his Niagra spend a little too much time in
tg_shares_up(), which considering he has a large cpu count makes sense.
So scale the ratelimit value with the number of cpus like we do for
other controls as well.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the initialization of the RCU trace module, if
rcupreempt_debugfs_init() fails, we never free the the trace buffer.
This patch frees the trace buffer in case the debugfs fails.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
m68k fails to build with these functions inlined in completion.h. Move
them out of line into sched.c and export them to avoid this problem.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ftrace depends on some processor state that we destroyed during kexec and
restored by restore_processor_state(). So save_processor_state() and
restore_processor_state() are moved into machine_kexec() and ftrace is
restored after restore_processor_state().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform. For example in kexec
jump, it is used for data and stack too.
[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/kexec.c: In function 'kernel_kexec':
kernel/kexec.c:1506: warning: value computed is not used
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch simplifies the locking and memory-barrier usage in the Classic
RCU grace-period-detection mechanism, incorporating Lai Jiangshan's
feedback from the earlier version (http://lkml.org/lkml/2008/8/1/400
and http://lkml.org/lkml/2008/8/3/43). Passed 10 hours of
rcutorture concurrent with CPUs being put online and taken offline on
a 128-hardware-thread Power machine. My apologies to whoever in the
Eastern Hemisphere was planning to use this machine over the Western
Hemisphere night, but it was sitting idle and...
So this is ready for tip/core/rcu.
This patch is in preparation for moving to a hierarchical
algorithm to allow the very large SMP machines -- requested by some
people at OLS, and there seem to have been a few recent patches in the
4096-CPU direction as well. The general idea is to move to a much more
conservative concurrency design, then apply a hierarchy to reduce
contention on the global lock by a few orders of magnitude (larger
machines would see greater reductions). The reason for taking a
conservative approach is that this code isn't on any fast path.
Prototype in progress.
This patch is against the linux-tip git tree (tip/core/rcu). If you
wish to test this against 2.6.26, use the following set of patches:
http://www.rdrop.com/users/paulmck/patches/2.6.26-ljsimp-1.patchhttp://www.rdrop.com/users/paulmck/patches/2.6.26-ljsimpfix-3.patch
The first patch combines commits 5127bed588
and 3cac97cbb1 from Lai Jiangshan
<laijs@cn.fujitsu.com>, and the second patch contains my changes.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
One small change needed to keep from flooding the console when one
CPU notices that another is AWOL. Unless I am missing something subtle.
Otherwise the cleanups look good!
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we hot-unplug a cpu and rebuild the sched-domain, all cpus will be
detatched. Alex observed the case where a runqueue was stealing bandwidth
from an already disabled runqueue to satisfy its own needs.
Stop this by skipping over already disabled runqueues.
Reported-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags
the target process if that is not the current process and it is trying to
change its own flags in a different way at the same time.
__capable() is using neither atomic ops nor locking to protect t->flags. This
patch removes __capable() and introduces has_capability() that doesn't set
PF_SUPERPRIV on the process being queried.
This patch further splits security_ptrace() in two:
(1) security_ptrace_may_access(). This passes judgement on whether one
process may access another only (PTRACE_MODE_ATTACH for ptrace() and
PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
current is the parent.
(2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only,
and takes only a pointer to the parent process. current is the child.
In Smack and commoncap, this uses has_capability() to determine whether
the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
This does not set PF_SUPERPRIV.
Two of the instances of __capable() actually only act on current, and so have
been changed to calls to capable().
Of the places that were using __capable():
(1) The OOM killer calls __capable() thrice when weighing the killability of a
process. All of these now use has_capability().
(2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
whether the parent was allowed to trace any process. As mentioned above,
these have been split. For PTRACE_ATTACH and /proc, capable() is now
used, and for PTRACE_TRACEME, has_capability() is used.
(3) cap_safe_nice() only ever saw current, so now uses capable().
(4) smack_setprocattr() rejected accesses to tasks other than current just
after calling __capable(), so the order of these two tests have been
switched and capable() is used instead.
(5) In smack_file_send_sigiotask(), we need to allow privileged processes to
receive SIGIO on files they're manipulating.
(6) In smack_task_wait(), we let a process wait for a privileged process,
whether or not the process doing the waiting is privileged.
I've tested this with the LTP SELinux and syscalls testscripts.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>
This is an updated version of my previous cpuset patch on top of
the latest mainline git.
The patch fixes CPU hotplug handling issues in the current cpusets code.
Namely circular locking in rebuild_sched_domains() and unsafe access to
the cpu_online_map in the cpuset cpu hotplug handler.
This version includes changes suggested by Paul Jackson (naming, comments,
style, etc). I also got rid of the separate workqueue thread because it is
now safe to call get_online_cpus() from workqueue callbacks.
Here are some more details:
rebuild_sched_domains() is the only way to rebuild sched domains
correctly based on the current cpuset settings. What this means
is that we need to be able to call it from different contexts,
like cpu hotplug for example.
Also latest scheduler code in -tip now calls rebuild_sched_domains()
directly from functions like arch_reinit_sched_domains().
In order to support that properly we need to rework cpuset locking
rules to avoid circular dependencies, which is what this patch does.
New lock nesting rules are explained in the comments.
We can now safely call rebuild_sched_domains() from virtually any
context. The only requirement is that it needs to be called under
get_online_cpus(). This allows cpu hotplug handlers and the scheduler
to call rebuild_sched_domains() directly.
The rest of the cpuset code now offloads sched domains rebuilds to
a workqueue (async_rebuild_sched_domains()).
This version of the patch addresses comments from the previous review.
I fixed all miss-formated comments and trailing spaces.
I also factored out the code that builds domain masks and split up CPU and
memory hotplug handling. This was needed to simplify locking, to avoid unsafe
access to the cpu_online_map from mem hotplug handler, and in general to make
things cleaner.
The patch passes moderate testing (building kernel with -j 16, creating &
removing domains and bringing cpus off/online at the same time) on the
quad-core2 based machine.
It passes lockdep checks, even with preemptable RCU enabled.
This time I also tested in with suspend/resume path and everything is working
as expected.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Acked-by: Paul Jackson <pj@sgi.com>
Cc: menage@google.com
Cc: a.p.zijlstra@chello.nl
Cc: vegard.nossum@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fix:
kernel/built-in.o: In function `lockdep_stats_show':
lockdep_proc.c:(.text+0x3cb2f): undefined reference to `lockdep_count_forward_deps'
kernel/built-in.o: In function `l_show':
lockdep_proc.c:(.text+0x3d02b): undefined reference to `lockdep_count_forward_deps'
lockdep_proc.c:(.text+0x3d047): undefined reference to `lockdep_count_backward_deps'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch /proc/irq/*/smp_affinity , /proc/irq/default_smp_affinity to
seq_files.
cat(1) reads with 1024 chunks by default, with high enough NR_CPUS, there
will be -EINVAL.
As side effect, there are now two less users of the ->read_proc interface.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s390 doesn't support the additional_cpus kernel parameter anymore since a
long time. So we better update the code and documentation to reflect
that.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Nick Piggin (1):
> > generic-ipi: fix stack and rcu interaction bug in
> > smp_call_function_mask()
>
> I'm still not 100% sure that I have this patch right... I might have seen
> a lockup trace implicating the smp call function path... which may have
> been due to some other problem or a different bug in the new call function
> code, but if some more people can take a look at it before merging?
OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
data properly in the alloc && wait case. Secondly, I wasn't resetting
CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
wait).
After those fixes, the patch boots and runs with the kmalloc commented
out (so it always executes the slowpath).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The kernel has this really nice facility where if you put "initcall_debug"
on the kernel commandline, it'll print which function it's going to
execute just before calling an initcall, and then after the call completes
it will
1) print if it had an error code
2) checks for a few simple bugs (like leaving irqs off)
and
3) print how long the init call took in milliseconds.
While trying to optimize the boot speed of my laptop, I have been loving
number 3 to figure out what to optimize... ... and then I wished that
the same thing was done for module loading.
This patch makes the module loader use this exact same functionality; it's
a logical extension in my view (since modules are just sort of late
binding initcalls anyway) and so far I've found it quite useful in finding
where things are too slow in my boot.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we enable DEBUG_LOCK_ALLOC but do not enable PROVE_LOCKING and or
LOCK_STAT, lock_alloc() and lock_release() turn into nops, even though
we should be doing hlock checking (check=1).
This causes a false warning and a lockdep self-disable.
Rectify this.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mark Langsdorf reported:
> One of my co-workers noticed that the powernow-k8
> driver no longer restarts when a CPU core is
> hot-disabled and then hot-enabled on AMD quad-core
> systems.
>
> The following comands work fine on 2.6.26 and fail
> on 2.6.27-rc1:
>
> echo 0 > /sys/devices/system/cpu/cpu3/online
> echo 1 > /sys/devices/system/cpu/cpu3/online
> find /sys -name cpufreq
>
> For 2.6.26, the find will return a cpufreq
> directory for each processor. In 2.6.27-rc1,
> the cpu3 directory is missing.
>
> After digging through the code, the following
> logic is failing when the core is hot-enabled
> at runtime. The code works during the boot
> sequence.
>
> cpumask_t = current->cpus_allowed;
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> if (smp_processor_id() != cpu)
> return -ENODEV;
So set the CPU active before calling the CPU_ONLINE notifier chain,
there are a handful of notifiers that use set_cpus_allowed().
This fix also solves the problem with x86-microcode. I've sent
alternative patches for microcode, but as this "rely on
set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)"
assumption seems to be more broad than what we thought, perhaps this fix
should be applied.
With this patch we define that by the moment CPU_ONLINE is being sent,
a 'cpu' is online and ready for tasks to be migrated onto it.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Reported-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:
> Found a OOPS on a big SMP box during an overnight reboot test with
> upstream git.
>
> Suresh and I looked at the oops and looks like the root cause is in
> generic_smp_call_function_interrupt() and smp_call_function_mask() with
> wait parameter.
>
> The actual oops looked like
>
> [ 11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
> [ 11.277815] IP: [<ffff8802ffffffff>] 0xffff8802ffffffff
> [ 11.278155] PGD 202063 PUD 0
> [ 11.278576] Oops: 0010 [1] SMP
> [ 11.279006] CPU 5
> [ 11.279336] Modules linked in:
> [ 11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f #290
> [ 11.280039] RIP: 0010:[<ffff8802ffffffff>] [<ffff8802ffffffff>] 0xffff8802ffffffff
> [ 11.280692] RSP: 0018:ffff88027f1f7f70 EFLAGS: 00010086
> [ 11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
> [ 11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
> [ 11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
> [ 11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
> [ 11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
> [ 11.282502] FS: 0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
> [ 11.283096] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
> [ 11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
> [ 11.284936] Stack: ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
> [ 11.285802] ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
> [ 11.286599] ffffffff8020bdd6 ffff88027f1f3e40 <EOI> ffff88027f1f3ef8 0000000000000000
> [ 11.287120] Call Trace:
> [ 11.287768] <IRQ> [<ffffffff80250963>] ? generic_smp_call_function_interrupt+0x61/0x12c
> [ 11.288354] [<ffffffff8021adb5>] smp_call_function_interrupt+0x17/0x27
> [ 11.288744] [<ffffffff8020bdd6>] call_function_interrupt+0x66/0x70
> [ 11.289030] <EOI> [<ffffffff8024ab3b>] ? clockevents_notify+0x19/0x73
> [ 11.289380] [<ffffffff803b9b75>] ? acpi_idle_enter_simple+0x18b/0x1fa
> [ 11.289760] [<ffffffff803b9b6b>] ? acpi_idle_enter_simple+0x181/0x1fa
> [ 11.290051] [<ffffffff8053aeca>] ? cpuidle_idle_call+0x70/0xa2
> [ 11.290338] [<ffffffff80209f61>] ? cpu_idle+0x5f/0x7d
> [ 11.290723] [<ffffffff8060224a>] ? start_secondary+0x14d/0x152
> [ 11.291010]
> [ 11.291287]
> [ 11.291654] Code: Bad RIP value.
> [ 11.292041] RIP [<ffff8802ffffffff>] 0xffff8802ffffffff
> [ 11.292380] RSP <ffff88027f1f7f70>
> [ 11.292741] CR2: ffff8802ffffffff
> [ 11.310951] ---[ end trace 137c54d525305f1c ]---
>
> The problem is with the following sequence of events:
>
> - CPU A calls smp_call_function_mask() for CPU B with wait parameter
> - CPU A sets up the call_function_data on the stack and does an rcu add to
> call_function_queue
> - CPU A waits until the WAIT flag is cleared
> - CPU B gets the call function interrupt and starts going through the
> call_function_queue
> - CPU C also gets some other call function interrupt and starts going through
> the call_function_queue
> - CPU C, which is also going through the call_function_queue, starts referencing
> CPU A's stack, as that element is still in call_function_queue
> - CPU B finishes the function call that CPU A set up and as there are no other
> references to it, rcu deletes the call_function_data (which was from CPU A
> stack)
> - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
> - CPU A which was waiting on the flag continues executing and the stack
> contents change
>
> - CPU C is still in rcu_read section accessing the CPU A's stack sees
> inconsistent call_funation_data and can try to execute
> function with some random pointer, causing stack corruption for A
> (by clearing the bits in mask field) and oops.
Nice debugging work.
I'd suggest something like the attached (boot tested) patch as the simple
fix for now.
I expect the benefits from the less synchronized, multiple-in-flight-data
global queue will still outweigh the costs of dynamic allocations. But
if worst comes to worst then we just go back to a globally synchronous
one-at-a-time implementation, but that would be pretty sad!
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Defer commit 6d299f1b53 to the next release.
Testing of the tip/sched/clock tree revealed a mysql+oltp regression
which bisection eventually traced back to this commit in mainline.
Pertinent test results: Three run sysbench averages, throughput units
in read/write requests/sec.
clients 1 2 4 8 16 32 64
6e0534f 9646 17876 34774 33868 32230 30767 29441
2.6.26.1 9112 17936 34652 33383 31929 30665 29232
6d299f1 9112 14637 28370 33339 32038 30762 29204
Note: subsequent commits hide the majority of this regression until you
apply the clock fixes, at which time it reemerges at full magnitude.
We cannot see anything bad about the change itself so we defer it to the
next release until this problem is fully analysed.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this is a diagnostic patch for Classic RCU.
The approach is to record a timestamp at the beginning
of the grace period (in rcu_start_batch()), then have
rcu_check_callbacks() complain if:
1. it is running on a CPU that has holding up grace periods for
a long time (say one second). This will identify the culprit
assuming that the culprit has not disabled hardware irqs,
instruction execution, or some such.
2. it is running on a CPU that is not holding up grace periods,
but grace periods have been held up for an even longer time
(say two seconds).
It is enabled via the default-off CONFIG_DEBUG_RCU_STALL kernel parameter.
Rather than exponential backoff, it backs off to once per 30 seconds.
My feeling upon thinking on it was that if you have stalled RCU grace
periods for that long, a few extra printk() messages are probably the
least of your worries...
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Witbrodt <dawitbro@sbcglobal.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the names were too generic:
drivers/uio/uio.c:87: error: expected identifier or '(' before 'do'
drivers/uio/uio.c:87: error: expected identifier or '(' before 'while'
drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Solve this by marking the classes as unused and not printing information
about the unused classes.
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Expose the new lock protection lock.
This can be used to annotate places where we take multiple locks of the
same class and avoid deadlocks by always taking another (top-level) lock
first.
NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Fri, 2008-08-01 at 16:26 -0700, Linus Torvalds wrote:
> On Fri, 1 Aug 2008, David Miller wrote:
> >
> > Taking more than a few locks of the same class at once is bad
> > news and it's better to find an alternative method.
>
> It's not always wrong.
>
> If you can guarantee that anybody that takes more than one lock of a
> particular class will always take a single top-level lock _first_, then
> that's all good. You can obviously screw up and take the same lock _twice_
> (which will deadlock), but at least you cannot get into ABBA situations.
>
> So maybe the right thing to do is to just teach lockdep about "lock
> protection locks". That would have solved the multi-queue issues for
> networking too - all the actual network drivers would still have taken
> just their single queue lock, but the one case that needs to take all of
> them would have taken a separate top-level lock first.
>
> Never mind that the multi-queue locks were always taken in the same order:
> it's never wrong to just have some top-level serialization, and anybody
> who needs to take <n> locks might as well do <n+1>, because they sure as
> hell aren't going to be on _any_ fastpaths.
>
> So the simplest solution really sounds like just teaching lockdep about
> that one special case. It's not "nesting" exactly, although it's obviously
> related to it.
Do as Linus suggested. The lock protection lock is called nest_lock.
Note that we still have the MAX_LOCK_DEPTH (48) limit to consider, so anything
that spills that it still up shit creek.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Most the free-standing lock_acquire() usages look remarkably similar, sweep
them into a new helper.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of using a per-rq lock class, use the regular nesting operations.
However, take extra care with double_lock_balance() as it can release the
already held rq->lock (and therefore change its nesting class).
So what can happen is:
spin_lock(rq->lock); // this rq subclass 0
double_lock_balance(rq, other_rq);
// release rq
// acquire other_rq->lock subclass 0
// acquire rq->lock subclass 1
spin_unlock(other_rq->lock);
leaving you with rq->lock in subclass 1
So a subsequent double_lock_balance() call can try to nest a subclass 1
lock while already holding a subclass 1 lock.
Fix this by introducing double_unlock_balance() which releases the other
rq's lock, but also re-sets the subclass for this rq's lock to 0.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this can be used to reset a held lock's subclass, for arbitrary-depth
iterated data structures such as trees or lists which have per-node
locks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some arch's can't handle sched_clock() being called too early - delay
this until sched_clock_init() has been called.
Reported-by: Bill Gatliff <bgat@billgatliff.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
CC: Russell King - ARM Linux <linux@arm.linux.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A documentation cleanup patch. With a minor tweak to clarify units for
kbs.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: mark gross <mgross@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_order() takes byte-sized input, not a page-granular one.
Irrespective of this fix I'm inclined to believe that this doesn't work
right anyway - bitmap_allocate_region() has an implicit assumption of
'pos' being suitable for 'order', which this function doesn't seem to
enforce (and since it's being called with a byte-granular value there's no
reason to believe that the callers would make sure device_addr is passed
accordingly - it's also not documented that way).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dmitry Baryshkov <dbaryshkov@gmail.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>