x86/pv: Split pv_hypercall() in two
authorAndrew Cooper <andrew.cooper3@citrix.com>
Mon, 4 Oct 2021 18:11:45 +0000 (19:11 +0100)
committerAndrew Cooper <andrew.cooper3@citrix.com>
Tue, 12 Oct 2021 15:56:11 +0000 (16:56 +0100)
commiteb7518b89be6df874ec9b2bafadeaaa7c874ea6b
tree5db187001ee888caf0c7bac4fdcf94f3627082ce
parent2faeb4213d9b412836fe80e5685bfcccc51feb92
x86/pv: Split pv_hypercall() in two

The is_pv_32bit_vcpu() conditionals hide four lfences, with two taken on any
individual path through the function.  There is very little code common
between compat and native, and context-dependent conditionals predict very
badly for a period of time after context switch.

Move do_entry_int82() from pv/traps.c into pv/hypercall.c, allowing
_pv_hypercall() to be static and forced inline.  The delta is:

  add/remove: 0/0 grow/shrink: 1/1 up/down: 300/-282 (18)
  Function                                     old     new   delta
  do_entry_int82                                50     350    +300
  pv_hypercall                                 579     297    -282

which is tiny, but the perf implications are large:

  Guest | Naples | Milan  | SKX    | CFL-R  |
  ------+--------+--------+--------+--------+
  pv64  |  17.4% |  15.5% |   2.6% |   4.5% |
  pv32  |   1.9% |  10.9% |   1.4% |   2.5% |

These are percentage improvements in raw TSC detlas for a xen_version
hypercall, with obvious outliers excluded.  Therefore, it is an idealised best
case improvement.

The pv64 path uses `syscall`, while the pv32 path uses `int $0x82` so
necessarily has higher overhead.  Therefore, dropping the lfences is less over
an overall improvement.

I don't know why the Naples pv32 improvement is so small, but I've double
checked the numbers and they're consistent.  There's presumably something
we're doing which is a large overhead in the pipeline.

On the Intel side, both systems are writing to MSR_SPEC_CTRL on
entry/exit (SKX using the retrofitted microcode implementation, CFL-R using
the hardware implementation), while SKX is suffering further from XPTI for
Meltdown protection.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/arch/x86/pv/hypercall.c
xen/arch/x86/pv/traps.c