x86/tlb: use Xen L0 assisted TLB flush when available
Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
This greatly increases the performance of TLB flushes when running
with a high amount of vCPUs as a Xen guest, and is specially important
when running in shim mode.
The following figures are from a PV guest running `make -j32 xen` in
shim mode with 32 vCPUs and HAP.
Using x2APIC and ALLBUT shorthand:
real 4m35.973s
user 4m35.110s
sys 36m24.117s
Using L0 assisted flush:
real 1m2.596s
user 4m34.818s
sys 5m16.374s
The implementation adds a new hook to hypervisor_ops so other
enlightenments can also implement such assisted flush just by filling
the hook.
Note that the Xen implementation completely ignores the dirty CPU mask
and the linear address passed in, and always performs a global TLB
flush on all vCPUs. This is a limitation of the hypercall provided by
Xen. Also note that local TLB flushes are not performed using the
assisted TLB flush, only remote ones.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>