xen: sched: close potential races when switching scheduler to CPUs
In short, the point is to make sure that the actual switch
of scheduler and the remapping of the scheduler's runqueue
lock occur in the same critical section, protected by the
"old" scheduler's lock (and not, e.g., in the free_pdata
hook, as is the case now for Credit2 and RTDS).
Not doing so is (at least) racy. For instance, if we switch
CPU X from Credit2 to Credit, we do:
 schedule_cpu_switch(X, csched2 --> csched):
     //scheduler[X] is csched2
     //schedule_lock[X] is csched2_lock
     csched_alloc_pdata(X)
     csched_init_pdata(X)
     pcpu_schedule_lock(X) ----> takes csched2_lock
     scheduler[X] = csched
     pcpu_schedule_unlock(X) --> unlocks csched2_lock
     [1]
     csched2_free_pdata(X)
         pcpu_schedule_lock(X) --> takes csched2_lock
         schedule_lock[X] = csched_lock
         spin_unlock(csched2_lock)
While, if we switch CPU X from Credit to Credit2, we do:
 schedule_cpu_switch(X, csched --> csched2):
     //scheduler[X] is csched
     //schedule_lock[X] is csched_lock
     csched2_alloc_pdata(X)
     csched2_init_pdata(X)
         pcpu_schedule_lock(X) --> takes csched_lock
         schedule_lock[X] = csched2_lock
         spin_unlock(csched_lock)
     [2]
     pcpu_schedule_lock(X) ----> takes csched2_lock
     scheduler[X] = csched2
     pcpu_schedule_unlock(X) --> unlocks csched2_lock
     csched_free_pdata(X)
And if we switch CPU X from RTDS to Credit2, we do:
 schedule_cpu_switch(X, RTDS --> csched2):
     //scheduler[X] is rtds
     //schedule_lock[X] is rtds_lock
     csched2_alloc_pdata(X)
     csched2_init_pdata(X)
         pcpu_schedule_lock(X) --> takes rtds_lock
         schedule_lock[X] = csched2_lock
         spin_unlock(rtds_lock)
     pcpu_schedule_lock(X) ----> takes csched2_lock
     scheduler[X] = csched2
     pcpu_schedule_unlock(X) --> unlocks csched2_lock
     rtds_free_pdata(X)
         spin_lock(rtds_lock)
         ASSERT(schedule_lock[X] == rtds_lock) [3]
         schedule_lock[X] = DEFAULT_SCHEDULE_LOCK [4]
         spin_unlock(rtds_lock)
So, the first problem is that, if anything related to
scheduling, and involving CPU X, happens at [1] or [2], we:
 - take csched2_lock,
 - operate on Credit1 functions and data structures,
which is no good!
The second problem is that the ASSERT at [3] triggers, and
the third is that, at [4], we screw up the lock remapping
we have just done for ourselves in csched2_init_pdata()!
The first problem arises because there is a window during
which the lock is already the new one, but the scheduler is
still the old one. The other two arise because we let
schedulers mess with the lock (re)mapping done by others.
This patch, therefore, introduces a new hook in the scheduler
interface, called switch_sched, meant to be used when
switching scheduler on a CPU, and implements it for the
various schedulers, so that things are done in the proper
order and under the protection of the best suited (set of)
lock(s). Adding the hook (as compared to keeping the logic
in generic code) is necessary because different schedulers
may have different locking schemes.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Robert VanVossen <robert.vanvossen@dornerworks.com>