xen/smp: Speed up on_selected_cpus()

cpumask_weight() is an incredibly expensive way to find if no bits are set,
made worse by the fact that the calculation is performed with the global
call_lock held.

This appears to be a missing optimisation from c/s 433f14699d48 ("x86: Clean
up smp_call_function handling.") in 2011, which dropped the logic requiring
the count of CPUs.

Switch to using cpumask_empty() instead, which will short-circuit as soon as
it finds any set bit in the cpumask.

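To illustrate the cost difference, here is a minimal sketch using a hypothetical toy mask type. toy_cpumask_t, toy_set_cpu(), toy_weight() and toy_empty() are illustrative names, not Xen's cpumask API, and NR_CPUS = 256 is an arbitrary assumption. The weight function must visit every word and count every set bit. The empty check returns at the first non-zero word, so it does no counting at all.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy cpumask: NR_CPUS bits packed into unsigned longs.
 * This is NOT Xen's definition; the size is chosen only for illustration. */
#define NR_CPUS 256
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct {
    unsigned long bits[MASK_LONGS];
} toy_cpumask_t;

/* Mark one CPU as present in the mask. */
static void toy_set_cpu(toy_cpumask_t *m, unsigned int cpu)
{
    m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

/* Counts every set bit: always walks all MASK_LONGS words,
 * and does extra work per set bit within each word. */
static unsigned int toy_weight(const toy_cpumask_t *m)
{
    unsigned int count = 0;

    for (size_t i = 0; i < MASK_LONGS; i++) {
        unsigned long w = m->bits[i];

        while (w) {
            w &= w - 1; /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}

/* Short-circuits: returns false at the first non-zero word. */
static bool toy_empty(const toy_cpumask_t *m)
{
    for (size_t i = 0; i < MASK_LONGS; i++)
        if (m->bits[i])
            return false;
    return true;
}
```

Both predicates agree on whether the mask is empty, so `toy_weight(&m) == 0` and `toy_empty(&m)` are interchangeable for this test. Only the cost differs, and in on_selected_cpus() that cost is paid while call_lock is held.
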
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>