x86/hvm: re-work viridian APIC assist code
It appears there is a case where Windows enables the APIC assist
enlightenment[1] but does not use it. This scenario is perfectly valid
according to the documentation, but causes the state machine in Xen to
become confused leading to a domain_crash() such as the following:
(XEN) d4: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0
build: 1db0
(XEN) d4: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
(XEN) d4v0: VIRIDIAN VP_ASSIST_PAGE: enabled: 1 pfn: 3fffe
(XEN) domain_crash called from viridian.c:452
(XEN) Domain 4 (vcpu#0) crashed on cpu#1:
The following sequence of events is an example of how this can happen:
- On return to guest vlapic_has_pending_irq() finds a bit set in the IRR.
- vlapic_ack_pending_irq() calls viridian_start_apic_assist() which latches
the vector, sets the bit in the ISR and clears it from the IRR.
- The guest then processes the interrupt but EOIs it normally, therefore
clearing the bit in the ISR.
- On next return to guest vlapic_has_pending_irq() calls
viridian_complete_apic_assist(), which discovers the assist bit still set
in the shared page and therefore leaves the latched vector in place, but
also finds another bit set in the IRR.
- vlapic_ack_pending_irq() is then called but, because the ISR is was
cleared by the EOI, another call is made to viridian_start_apic_assist()
and this then calls domain_crash() because it finds the latched vector
has not been cleared.
Having re-visited the code I also conclude that Xen's implementation of the
enlightenment is currently wrong and we are not properly following the
specification.
The specification says:
"The hypervisor sets the \93No EOI required\94 bit when it injects a virtual
interrupt if the following conditions are satisfied:
- The virtual interrupt is edge-triggered, and
- There are no lower priority interrupts pending.
If, at a later time, a lower priority interrupt is requested, the
hypervisor clears the \93No EOI required\94 such that a subsequent EOI causes
an intercept.
In case of nested interrupts, the EOI intercept is avoided only for the
highest priority interrupt. This is necessary since no count is maintained
for the number of EOIs performed by the OS. Therefore only the first EOI
can be avoided and since the first EOI clears the \93No EOI Required\94 bit,
the next EOI generates an intercept."
Thus it is quite legitimate to set the "No EOI required" bit and then
subsequently take a higher priority interrupt without clearing the bit.
Thus the avoided EOI will then relate to that subsequent interrupt rather
than the highest priority interrupt when the bit was set. Hence latching
the vector when setting the bit is not entirely useful and somewhat
misleading.
This patch re-works the APIC assist code to simply track when the "No EOI
required" bit is set and test if it has been cleared by the guest (i.e.
'completing' the APIC assist), thus indicating a 'missed EOI'. Missed EOIs
need to be dealt with in two places:
- In vlapic_has_pending_irq(), to avoid comparing the IRR against a stale
ISR, and
- In vlapic_EOI_set() because a missed EOI for a higher priority vector
should be dealt with before the actual EOI for the lower priority
vector.
Furthermore, because the guest is at liberty to ignore the "No EOI required"
bit (which lead the crash detailed above) vlapic_EOI_set() must also make
sure the bit is cleared to avoid confusing the state machine.
Lastly the previous code did not properly emulate an EOI if a missed EOI
was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
the ISR. The new code instead calls vlapic_EOI_set().
[1] See section 10.3.5 of Microsoft's "Hypervisor Top Level Functional
Specification v5.0b".
NOTE: The changes to the save/restore code are safe because the layout
of struct hvm_viridian_vcpu_context is unchanged and the new
interpretation of the (previously so named) vp_assist_vector field
as the boolean pending flag maintains the correct semantics.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>