Impact: prevent false positive WARN_ON() in clockevents_program_event()
clock_was_set() changes the base->offset of CLOCK_REALTIME and
enforces the reprogramming of the clockevent device to expire timers
which are based on CLOCK_REALTIME. If the clock change is large enough
then the subtraction of the timer expiry value and base->offset can
become negative which triggers the warning in
clockevents_program_event().
Check the subtraction result and set a negative value to 0.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: fix CPU hotplug hang on Power6 testbox
On architectures that support offlining all cpus (at least powerpc/pseries),
hot-unpluging the tick_do_timer_cpu can result in a system hang.
This comes from the fact that if the cpu going down happens to be the
cpu doing the tick, then as the tick_do_timer_cpu handover happens after the
cpu is dead (via the CPU_DEAD notification), we're left without ticks,
jiffies are frozen and any task relying on timers (msleep, ...) is stuck.
That's particularly the case for the cpu looping in __cpu_die() waiting
for the dying cpu to be dead.
This patch addresses this by having the tick_do_timer_cpu handover happen
earlier during the CPU_DYING notification. For this, a new clockevent
notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered
in hrtimer_cpu_notify().
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: avoid timer IRQ hanging slow systems
While using the function graph tracer on a virtualized system, the
hrtimer_interrupt can hang the system on an infinite loop.
This can be caused in several situations:
- the hardware is very slow and HZ is set too high
- something intrusive is slowing the system down (tracing under emulation)
... and the next clock events to program are always before the current time.
This patch implements a reasonable compromise: if such a situation is
detected, we share the CPUs time in 1/4 to process the hrtimer interrupts.
This is enough to let the system running without serious starvation.
It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ
with function graph tracer launched. On both cases, the clock events were
increased until about 25 ms periodic ticks, which means 40 HZ.
So we change a hard to debug hang into a warning message and a system that
still manages to limp along.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clean up the comments
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix rare runtime deadlock
There are a few sites that do:
spin_lock_irq(&foo)
hrtimer_start(&bar)
__run_hrtimer(&bar)
func()
spin_lock(&foo)
which obviously deadlocks. In order to avoid this, never call __run_hrtimer()
from hrtimer_start*() context, but instead defer this to softirq context.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
No need for a smp function call, which is likely to run on the same
CPU anyway. We can just call hrtimers_peek_ahead() in the interrupts
disabled section of migrate_hrtimers().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
kernel/hrtimer.c: In function 'hrtimer_cpu_notify':
kernel/hrtimer.c:1574: warning: unused variable 'dcpu'
Introduced by commit 37810659ea
("hrtimer: removing all ur callback modes, fix hotplug") from the
timers. dcpu is only used if CONFIG_HOTPLUG_CPU is set.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Provide a peek ahead function that assumes irqs disabled, allows for micro
optimizations.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
<linux/irq.h> can be removed and should be, because:
- hrtimer doesn't use any irq feature.
- <linux/irq.h> shouldn't be include from generic code.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this warning:
kernel/hrtimer.c: In function ‘hrtimer_cpu_notify’:
kernel/hrtimer.c:1574: warning: unused variable ‘dcpu’
is caused because 'dcpu' is only used in the CONFIG_HOTPLUG_CPU case.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Ingo, this addition fixes the hotplug issue on my machine
And because we're all human...
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix hrtimer locking (reported by lockdep) in the CPU hotplug case
This addition fixes the hotplug locking issue on my machine
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, move all hrtimer processing into hardirq context
This is an attempt at removing some of the hrtimer complexity by
reducing the number of callback modes to 1.
This means that all hrtimer callback functions will be ran from HARD-irq
context.
I went through all the 30 odd hrtimer callback functions in the kernel
and saw only one that I'm not quite sure of, which is the one in
net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.
Furthermore, the hrtimer core now calls callbacks directly with IRQs
disabled in case you try to enqueue an expired timer. If this timer is a
periodic timer (which should use hrtimer_forward() to advance its time)
then it might be possible to end up in an inf. recursive loop due to the
fact that hrtimer_forward() doesn't round up to the next timer
granularity, and therefore keeps on calling the callback - obviously
this needs a fix.
Aside from that, this seems to compile and actually boot on my dual core
test box - although I'm sure there are some bugs in, me not hitting any
makes me certain :-)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix incorrect locking triggered during hotplug-intense stress-tests
While migrating the the CB_IRQSAFE_UNLOCKED timers during a cpu-offline,
we queue them on the cb_pending list, so that they won't go
stale.
Thus, when the callbacks of the timers run from the softirq context,
they could run into potential deadlocks, since these callbacks
assume that they're running with irq's disabled, thereby annoying
lockdep!
Fix this by emulating hardirq context while running these callbacks from
the hrtimer softirq.
=================================
[ INFO: inconsistent lock state ]
2.6.27 #2
--------------------------------
inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes:
(&rq->lock){++..}, at: [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
{in-hardirq-W} state was registered at:
[<c014103c>] __lock_acquire+0x549/0x121e
[<c0107890>] native_sched_clock+0x88/0x99
[<c013aa12>] clocksource_get_next+0x39/0x3f
[<c0139abc>] update_wall_time+0x616/0x7df
[<c0141d6b>] lock_acquire+0x5a/0x74
[<c0121724>] scheduler_tick+0x3a/0x18d
[<c047ed45>] _spin_lock+0x1c/0x45
[<c0121724>] scheduler_tick+0x3a/0x18d
[<c0121724>] scheduler_tick+0x3a/0x18d
[<c012c436>] update_process_times+0x3a/0x44
[<c013c044>] tick_periodic+0x63/0x6d
[<c013c062>] tick_handle_periodic+0x14/0x5e
[<c010568c>] timer_interrupt+0x44/0x4a
[<c0150c9f>] handle_IRQ_event+0x13/0x3d
[<c0151c14>] handle_level_irq+0x79/0xbd
[<c0105634>] do_IRQ+0x69/0x7d
[<c01041e4>] common_interrupt+0x28/0x30
[<c047007b>] aac_probe_one+0x1a3/0x3f3
[<c047ec2d>] _spin_unlock_irqrestore+0x36/0x39
[<c01512b4>] setup_irq+0x1be/0x1f9
[<c065d70b>] start_kernel+0x259/0x2c5
[<ffffffff>] 0xffffffff
irq event stamp: 50102
hardirqs last enabled at (50102): [<c047ebf4>] _spin_unlock_irq+0x20/0x23
hardirqs last disabled at (50101): [<c047edc2>] _spin_lock_irq+0xa/0x4b
softirqs last enabled at (50088): [<c0128ba6>] do_softirq+0x37/0x4d
softirqs last disabled at (50099): [<c0128ba6>] do_softirq+0x37/0x4d
other info that might help us debug this:
no locks held by ksoftirqd/0/4.
stack backtrace:
Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2
[<c013f6cb>] print_usage_bug+0x13e/0x147
[<c013fef5>] mark_lock+0x493/0x797
[<c01410b1>] __lock_acquire+0x5be/0x121e
[<c0141d6b>] lock_acquire+0x5a/0x74
[<c011db84>] sched_rt_period_timer+0x9e/0x1fc
[<c047ed45>] _spin_lock+0x1c/0x45
[<c011db84>] sched_rt_period_timer+0x9e/0x1fc
[<c011db84>] sched_rt_period_timer+0x9e/0x1fc
[<c01210fd>] finish_task_switch+0x41/0xbd
[<c0107890>] native_sched_clock+0x88/0x99
[<c011dae6>] sched_rt_period_timer+0x0/0x1fc
[<c0136dda>] run_hrtimer_pending+0x54/0xe5
[<c011dae6>] sched_rt_period_timer+0x0/0x1fc
[<c0128afb>] __do_softirq+0x7b/0xef
[<c0128ba6>] do_softirq+0x37/0x4d
[<c0128c12>] ksoftirqd+0x56/0xc5
[<c0128bbc>] ksoftirqd+0x0/0xc5
[<c0134649>] kthread+0x38/0x5d
[<c0134611>] kthread+0x0/0x5d
[<c0104477>] kernel_thread_helper+0x7/0x10
=======================
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's a small race/chance that, while hrtimers are enabled globally,
they're later not enabled when we're calling the hrtimer_interrupt() function,
which then BUG_ON()'s for that. This patch closes that race/gap.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Impact: per CPU hrtimers can be migrated from a dead CPU
The hrtimer code has no knowledge about per CPU timers, but we need to
prevent the migration of such timers and warn when such a timer is
active at migration time.
Explicitely mark the timers as per CPU and use a more understandable
mode descriptor for the interrupts safe unlocked callback mode, which
is used by hrtimer_sleeper and the scheduler code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: during migration active hrtimers can be seen as inactive
The migration code removes the hrtimers from the queues of the dead
CPU and sets the state temporary to INACTIVE. The enqueue code sets it
to ACTIVE/PENDING again.
Prevent that the wrong state can be seen by using a separate migration
state bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: Stale timers after a CPU went offline.
commit 37bb6cb409
hrtimer: unlock hrtimer_wakeup
changed the hrtimer sleeper callback mode to CB_IRQSAFE_NO_SOFTIRQ due
to locking problems. A result of this change is that when enqueue is
called for an already expired hrtimer the callback function is not
longer called directly from the enqueue code. The normal callers have
been fixed in the code, but the migration code which moves hrtimers
from a dead CPU to a live CPU was not made aware of this.
This can be fixed by checking the timer state after the call to
enqueue in the migration code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: hrtimers which are on the pending list are not migrated at cpu
offline and can be stale forever
Add the pending list migration when CONFIG_HIGH_RES_TIMERS is enabled
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra noticed this 8 months ago and I just noticed
it again.
hrtimer_clock_base::get_softirq_time() is currently unused
in the entire tree. In fact, looking at the logs, it appears
as if it was never used. Remove it.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As part of going idle, we already look at the time of the next timer event to determine
which C-state to select etc.
This patch adds functionality that causes the timers that are past their
soft expire time, to fire at this time, before we calculate the next wakeup
time. This functionality will thus avoid wakeups by running timers before
going idle rather than specially waking up for it.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
This patch makes the nanosleep() system call use the per process
slack value; with this users are able to externally control existing
applications to reduce the wakeup rate.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
this patch adds a _range version of hrtimer_start() so that range timers
can be created; the hrtimer_start() function is just a wrapper around this.
In addition, hrtimer_start_expires() will now preserve existing ranges.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
this patch turns hrtimers into range timers; they have 2 expire points
1) the soft expire point
2) the hard expire point
the kernel will do it's regular best effort attempt to get the timer run
at the hard expire point. However, if some other time fires after the soft
expire point, the kernel now has the freedom to fire this timer at this point,
and thus grouping the events and preventing a power-expensive wakeup in the
future.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
In order to be able to do range hrtimers we need to use accessor functions
to the "expire" member of the hrtimer struct.
This patch converts kernel/* to these accessors.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
This patch adds a schedule_hrtimeout() function, to be used by select() and
poll() in a later patch. This function works similar to schedule_timeout()
in most ways, but takes a timespec rather than jiffies.
With a lot of contributions/fixes from Thomas
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the comment to explain why the double lock in migrate_timers()
can't deadlock.
Change the code to use spinlock_irq() instead of local_irq_disable()
+ spin_lock().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Due to a possible deadlock, the waking of the softirq was pushed outside
of the hrtimer base locks. See commit 0c96c5979a
Unfortunately this allows the task to migrate after setting up the softirq
and raising it. Since softirqs run a queue that is per-cpu we may raise the
softirq on the wrong CPU and this will keep the queued softirq task from
running.
To solve this issue, this patch disables preemption around the releasing
of the hrtimer lock and raising of the softirq.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The variables dns and inc are not used, remove them.
Signed-off-by: Carlos R. Mafra <crmafra@gmail.com>
Cc: tglx@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
hres_timers_resume() warns if there appears to be more than one cpu
online. This warning makes sense when the suspend/resume mechanism
offlines all cpus but one during the suspend/resume process.
However, Xen suspend does not need to offline the other cpus; it
merely keeps them tied up in stop_machine() while the virtual machine
is suspended. The warning hres_timers_resume issues is therefore
spurious.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
As git-grep shows, open_softirq() is always called with the last argument
being NULL
block/blk-core.c: open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
kernel/hrtimer.c: open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL);
kernel/rcuclassic.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/rcupreempt.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
kernel/softirq.c: open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
kernel/softirq.c: open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
This observation has already been made by Matthew Wilcox in June 2002
(http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html)
"I notice that none of the current softirq routines use the data element
passed to them."
and the situation hasn't changed since them. So it appears we can safely
remove that extra argument to save 128 (54) bytes of kernel data (text).
Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The helper function hrtimer_callback_running() is used in
kernel/hrtimer.c as well as in the updated net/can/bcm.c which now
supports hrtimers. Moving the helper function to hrtimer.h removes the
duplicate definition in the C-files.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
hrtimers have now dynamic users in the network code. Put them under
debugobjects surveillance as well.
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The scheduler hrtimer bits in 2.6.25 introduced a circular lock
dependency in a rare code path:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.25-sched-devel.git-x86-latest.git #19
-------------------------------------------------------
X/2980 is trying to acquire lock:
(&rq->rq_lock_key#2){++..}, at: [<ffffffff80230146>] task_rq_lock+0x56/0xa0
but task is already holding lock:
(&cpu_base->lock){++..}, at: [<ffffffff80257ae1>] lock_hrtimer_base+0x31/0x60
which lock already depends on the new lock.
The scenario which leads to this is:
posix-timer signal is delivered
-> posix-timer is rearmed
timer is already expired in hrtimer_enqueue()
-> softirq is raised
To prevent this we need to move the raise of the softirq out of the
base->lock protected code path.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
When using hrtimer with timer->cb_mode == HRTIMER_CB_SOFTIRQ
in some cases the clockevent is not programmed.
This happens, if:
- a timer is rearmed while it's state is HRTIMER_STATE_CALLBACK
- hrtimer_reprogram() returns -ETIME, when it is called after
CALLBACK is finished. This occurs if the new timer->expires
is in the past when CALLBACK is done.
In this case, the timer needs to be removed from the tree and put
onto the pending list again.
The patch is against 2.6.22.5, but AFAICS, it is relevant
for 2.6.25 also (in run_hrtimer_pending()).
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The previous optimization did not take the case into account where a
clock provides its own softirq_get_time() function.
Check for the availablitiy of the clock get time function first and
then check if we need to retrieve the time for both clocks via
hrtimer_softirq_gettime() to avoid a double evaluation of time in that
case as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It seems that hrtimer_run_queues() is calling hrtimer_get_softirq_time() more
often than it needs to. This can cause frequent contention on systems with
large numbers of processors/cores.
With this patch, hrtimer_run_queues only calls hrtimer_get_softirq_time() if
there is a pending timer in one of the hrtimer bases, and only once.
This also combines hrtimer_run_queues() and the inline run_hrtimer_queue()
into one function.
[ tglx@linutronix.de: coding style ]
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In order to avoid the false positive from lockdep, each per-cpu base->lock has
the separate lock class and migrate_hrtimers() uses double_spin_lock().
This is overcomplicated: except for migrate_hrtimers() we never take 2 locks
at once, and migrate_hrtimers() can use spin_lock_nested().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Convert all the nanosleep related users of restart_block to the
new nanosleep specific restart_block fields.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
A CLOCK_REALTIME timer, which has an absolute expiry time less than
the clock realtime offset calls with a negative delta into the clock
events code and triggers the WARN_ON() there.
This is a false positive and needs to be prevented. Check the result
of timer->expires - timer->base->offset right away and return -ETIME
right away.
Thanks to Frans Pop, who reported the problem and tested the fixes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Frans Pop <elendil@planet.nl>