xen/schedule: Fix races in vcpu migration
The current sequence to initiate vcpu migration is inefficent and error-prone:
- The initiator sets VPF_migraging with the lock held, then drops the
lock and calls vcpu_sleep_nosync(), which immediately grabs the lock
again
- A number of places unnecessarily check for v->pause_flags in between
those two
- Every call to vcpu_migrate() must be prefaced with
vcpu_sleep_nosync() or introduce a race condition; this code
duplication is error-prone
- In the event that v->is_running is true at the beginning of
vcpu_migrate(), it's almost certain that vcpu_migrate() will end up
being called in context_switch() as well; we might as well simply
let it run there and save the duplicated effort (which will be
non-negligible).
The result is that Credit1 has several races which result in runqueue
<-> v->processor invariants being violated (triggering ASSERTs in
debug builds and strange bugs in production builds).
Instead, introduce vcpu_migrate_start() to initiate the process.
vcpu_migrate_start() is called with the scheduling lock held. It not
only sets VPF_migrating, but also calls vcpu_sleep_nosync_locked()
(which will automatically do nothing if there's nothing to do).
Rename vcpu_migrate() to vcpu_migrate_finish(). Check for v->is_running and
pause_flags & VPF_migrating at the top and return if appropriate.
Then the way to initiate migration is consistently:
* Grab lock
* vcpu_migrate_start()
* Release lock
* vcpu_migrate_finish()
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Release-acked-by: Juergen Gross <jgross@suse.com>