x86/emul: Optimise decode_register() somewhat
The positions of GPRs inside struct cpu_user_regs doesn't follow any
particular order, so as compiled, decode_register() becomes a jump table to 16
blocks which calculate the appropriate offset, at a total of 207 bytes.
Instead, pre-compute the offsets at build time and use pointer arithmetic to
calculate the result. By observation, most callers in x86_emulate() inline
and constant-propagate the highbyte_regs value of 0.
The splitting of the general and legacy byte-op cases means that we will now
hit an ASSERT if any code path tries to use the legacy byte-op encoding with a
REX prefix.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>