x86/pv: Rewrite segment context switching from scratch
There are multiple bugs with the existing implementation.
On AMD CPUs prior to Zen2, loading a NUL segment selector doesn't clear the
segment base, which is a problem for 64bit code which typically expects to use
a NUL %fs/%gs selector.
On a context switch from any PV vcpu, to a 64bit PV vcpu with an %fs/%gs
selector which faults, the fixup logic loads NUL, and the guest is entered at
the failsafe callback with the stale base.
Alternatively, a PV context switch sequence of 64 (NUL, non-zero base) =>
32 (NUL) => 64 (NUL, zero base) will similarly cause Xen to enter the guest
with a stale base.
Both of these corner cases manifest as state corruption in the final vcpu.
However, damage is limited to to 64bit code expecting to use Thread Local
Storage with a base pointer of 0, which doesn't occur by default.
The context switch logic is extremely complicated, and is attempting to
optimise away loading a NUL selector (which is fast), or writing a 64bit base
of 0 (which is rare). Furthermore, it fails to respect Linux's ABI with
userspace, which manifests as userspace state corruption as far as Linux is
concerned.
Always restore all selector and base state, in all cases.
Leave a large comment explaining hardware behaviour, and the new ABI
expectations. Update the comments in the public headers.
Drop all "segment preloading" to handle the AMD corner case. It was never
anything but a waste of time for %ds/%es, and isn't needed now that %fs/%gs
bases are unconditionally written for 64bit PV guests. In load_segments(),
store the result of is_pv_32bit_vcpu() as it is an expensive predicate now,
and not used in a way which impacts speculative safety.
Reported-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Sarah Newman <srn@prgmr.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/pv: Fix assertions in svm_load_segs()
OSSTest has shown an assertion failure:
http://logs.test-lab.xenproject.org/osstest/logs/153906/test-xtf-amd64-amd64-1/serial-rimava1.log
This is because we pass a non-NUL selector into svm_load_segs(), which is
something we must not do, as this path does not load the attributes/limits
from the GDT/LDT.
Drop the {fs,gs}_sel parameters from svm_load_segs() and use 0 instead. This
is acceptable even for non-zero NUL segments, as it is how the IRET
instruction behaves in all CPUs.
Only use the svm_load_segs() path when both FS and GS are NUL, which is the
common case when scheduling a 64bit vcpu with 64bit userspace in context.
Fixes: ad0fd291c5 ("x86/pv: Rewrite segment context switching from scratch")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit:
ad0fd291c5e79191c2e3c70e43dded569f11a450
master date: 2020-09-07 11:32:34 +0100
master commit:
1e2d3be2e516e6f415ca6029f651b76a8563a27c
master date: 2020-09-08 16:46:31 +0100