sched: Fix race against ptrace_freeze_traced()

There is apparently one site that violates the rule that only current
and ttwu() will modify task->state, namely ptrace_{,un}freeze_traced()
will change task->state for a remote task.

Oleg explains:

  "TASK_TRACED/TASK_STOPPED was always protected by siglock. In
particular, ttwu(__TASK_TRACED) must be always called with siglock
held. That is why ptrace_freeze_traced() assumes it can safely do
s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock)."

This breaks the ordering scheme introduced by commit:

  dbfb089d360b ("sched: Fix loadavg accounting race")

Specifically, the reload not matching no longer implies we don't have
to block.
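
One possible interleaving, for illustration (not taken verbatim from the
report):

	__schedule()				ptrace_freeze_traced()
	  prev_state = prev->state;		/* TASK_TRACED */
						task->state = __TASK_TRACED;
	  LOCK rq->lock
	  smp_mb__after_spinlock();
	  if (prev_state == prev->state)	/* mismatch, so ... */
		deactivate_task();		/* ... skipped: the task fails to block */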

Simplify things by noting that what we need is a LOAD->STORE ordering
and this can be provided by a control dependency.

So replace:

	prev_state = prev->state;
	raw_spin_lock(&rq->lock);
	smp_mb__after_spinlock(); /* SMP-MB */
	if (... && prev_state && prev_state == prev->state)
		deactivate_task();

with:

	prev_state = prev->state;
	if (... && prev_state) /* CTRL-DEP */
		deactivate_task();

Since that already implies the 'prev->state' load must be complete
before allowing the 'prev->on_rq = 0' store to become visible.
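
As a generic illustration (not scheduler code), a control dependency
orders a load against a later dependent store:

	x = READ_ONCE(*a);		/* LOAD */
	if (x)				/* control dependency */
		WRITE_ONCE(*b, 1);	/* STORE: ordered after the load of *a */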

Fixes: dbfb089d360b ("sched: Fix loadavg accounting race")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
Change-Id: Iccb651f3757ed543e8f104bc16cded57674caf78
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
---
 kernel/sched/core.c | 24 ++++++++++++++----------

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3626,9 +3626,6 @@ static void __sched notrace __schedule(bool preempt)
 	local_irq_disable();
 	rcu_note_context_switch(preempt);
 
-	/* See deactivate_task() below. */
-	prev_state = prev->state;
-
 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
@@ -3649,11 +3646,16 @@ static void __sched notrace __schedule(bool preempt)
 	update_rq_clock(rq);
 
 	switch_count = &prev->nivcsw;
+
 	/*
-	 * We must re-load prev->state in case ttwu_remote() changed it
-	 * before we acquired rq->lock.
+	 * We must load prev->state once (task_struct::state is volatile), such
+	 * that:
+	 *
+	 *  - we form a control dependency vs deactivate_task() below.
+	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
 	 */
-	if (!preempt && prev_state && prev_state == prev->state) {
+	prev_state = prev->state;
+	if (!preempt && prev_state) {
 		if (unlikely(signal_pending_state(prev_state, prev))) {
 			prev->state = TASK_RUNNING;
 		} else {
@@ -3667,10 +3669,12 @@ static void __sched notrace __schedule(bool preempt)
 
 			/*
 			 * __schedule()			ttwu()
-			 * prev_state = prev->state;	if (READ_ONCE(p->on_rq) && ...)
-			 *   LOCK rq->lock		  goto out;
-			 *   smp_mb__after_spinlock();	  smp_acquire__after_ctrl_dep();
-			 *   p->on_rq = 0;		  p->state = TASK_WAKING;
+			 * prev_state = prev->state;	if (p->on_rq && ...)
+			 *   if (prev_state)		  goto out;
+			 *     p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
+			 *				  p->state = TASK_WAKING
+			 *
+			 * Where __schedule() and ttwu() have matching control dependencies.
 			 *
 			 * After this, schedule() must not care about p->state any more.
 			 */
