vmx/hap: optimize CR4 trapping
There are a bunch of bits in CR4 that the guest should be allowed to
set directly, without requiring Xen intervention. This is already done
today by passing guest writes through into the CR4 used when running
in non-root mode, but only after taking an expensive vmexit to do so.
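In VMX terms, the relevant VMCS fields are the CR4 guest/host mask and
the CR4 read shadow. A MOV to CR4 causes a vmexit only if, for some bit
set in the mask, the written value differs from the read shadow. Bits
clear in the mask are guest-owned: they are loaded straight into the
CR4 used in non-root mode, and reads of them return the real CR4. The
sketch below models those architectural rules in plain C; the
`cr4_vmcs_model` struct and the helper names are hypothetical and are
not Xen's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A few architectural CR4 bits (Intel SDM Vol. 3A, "Control Registers"). */
#define X86_CR4_PAE   (UINT64_C(1) << 5)
#define X86_CR4_PGE   (UINT64_C(1) << 7)
#define X86_CR4_VMXE  (UINT64_C(1) << 13)

/*
 * Hypothetical model of the two VMCS fields that control CR4 trapping,
 * plus the CR4 value actually in use while the guest runs.
 */
struct cr4_vmcs_model {
    uint64_t host_mask;   /* CR4 guest/host mask: 1 = host-owned bit */
    uint64_t read_shadow; /* value the guest sees for host-owned bits */
    uint64_t guest_cr4;   /* real CR4 in effect in non-root mode */
};

/* MOV to CR4 exits iff a host-owned bit differs from the read shadow. */
static bool cr4_write_causes_exit(const struct cr4_vmcs_model *m, uint64_t val)
{
    return ((val ^ m->read_shadow) & m->host_mask) != 0;
}

/*
 * Model a guest MOV to CR4. Returns true if it would cause a vmexit.
 * Otherwise the write completes in hardware: guest-owned bits are loaded
 * from val and host-owned bits of the real CR4 are left unmodified.
 */
static bool guest_write_cr4(struct cr4_vmcs_model *m, uint64_t val)
{
    if (cr4_write_causes_exit(m, val))
        return true;
    m->guest_cr4 = (m->guest_cr4 & m->host_mask) | (val & ~m->host_mask);
    return false;
}

/* MOV from CR4: host-owned bits come from the read shadow, the rest from CR4. */
static uint64_t guest_read_cr4(const struct cr4_vmcs_model *m)
{
    return (m->read_shadow & m->host_mask) | (m->guest_cr4 & ~m->host_mask);
}
```

For example, with only VMXE and PAE host-owned, a guest clearing and
re-setting PGE completes without any exit, while an attempt to set
VMXE still traps to the hypervisor.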
xenalyze reports the following when running a PV guest in shim mode:
  CR_ACCESS   3885950  6.41s  17.04%   3957 cyc { 2361| 3378| 7920}
    cr4       3885940  6.41s  17.04%   3957 cyc { 2361| 3378| 7920}
    cr3             1  0.00s   0.00%   3480 cyc { 3480| 3480| 3480}
      *[ 0]         1  0.00s   0.00%   3480 cyc { 3480| 3480| 3480}
    cr0             7  0.00s   0.00%   7112 cyc { 3248| 5960|17480}
    clts            2  0.00s   0.00%   4588 cyc { 3456| 5720| 5720}
After this change, the same output turns into:
  CR_ACCESS        12  0.00s   0.00%   9972 cyc { 3680|11024|24032}
    cr4             2  0.00s   0.00%  17528 cyc {11024|24032|24032}
    cr3             1  0.00s   0.00%   3680 cyc { 3680| 3680| 3680}
      *[ 0]         1  0.00s   0.00%   3680 cyc { 3680| 3680| 3680}
    cr0             7  0.00s   0.00%   9209 cyc { 4184| 7848|17488}
    clts            2  0.00s   0.00%   8232 cyc { 5352|11112|11112}
Note that this optimized trapping is currently only applied to guests
running with HAP on Intel hardware. With shadow paging, more CR4 bits
need to be unconditionally trapped, which makes this approach unlikely
to yield any significant performance improvement.
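A minimal sketch of how the host mask could be chosen per paging mode,
assuming (hypothetically) that the HAP case owns a fixed set of bits
plus every bit the guest is not allowed to set, and that the shadow
case keeps trapping every CR4 write as before. `CR4_ALWAYS_HOST_OWNED`
and `compute_cr4_host_mask()` are illustrative names only, not Xen's
actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A few architectural CR4 bits (Intel SDM Vol. 3A, "Control Registers"). */
#define X86_CR4_PSE   (UINT64_C(1) << 4)
#define X86_CR4_PAE   (UINT64_C(1) << 5)
#define X86_CR4_MCE   (UINT64_C(1) << 6)
#define X86_CR4_PGE   (UINT64_C(1) << 7)
#define X86_CR4_VMXE  (UINT64_C(1) << 13)
#define X86_CR4_SMEP  (UINT64_C(1) << 20)

/* Hypothetical set of bits the hypervisor always keeps for itself. */
#define CR4_ALWAYS_HOST_OWNED (X86_CR4_VMXE | X86_CR4_MCE)

/*
 * Hypothetical policy: with HAP, own only the fixed set plus any bit the
 * guest is not allowed to set; without HAP (shadow paging), keep trapping
 * every CR4 write, i.e. set every bit of the guest/host mask.
 */
static uint64_t compute_cr4_host_mask(bool hap, uint64_t guest_valid_bits)
{
    if (!hap)
        return ~UINT64_C(0);
    return CR4_ALWAYS_HOST_OWNED | ~guest_valid_bits;
}

/* VMX rule for MOV to CR4: exit iff a host-owned bit differs from the shadow. */
static bool cr4_write_causes_exit(uint64_t host_mask, uint64_t read_shadow,
                                  uint64_t val)
{
    return ((val ^ read_shadow) & host_mask) != 0;
}
```

With such a policy, toggling a guest-owned bit like PGE exits under
shadow paging but not under HAP, while bits outside the guest's valid
set (SMEP in this made-up example) remain host-owned either way.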
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>