x86/boot: Simplify pagetable manipulation loops
For __page_tables_{start,end} and L3 bootmap initialisation, the logic is
unnecessarily complicated by its use of the LOOP instruction, which forces
an off-by-8 memory address because of LOOP's termination condition.
Rewrite both loops for improved clarity and speed.
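As an illustration only, the two loop shapes can be modelled in C. This is
a sketch, not the actual assembly: the function names, the PAGE_PRESENT
value, and the assumption that the loop adds an offset to every present
entry are all hypothetical stand-ins chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_PRESENT 0x1u   /* assumed: bit 0 marks a present entry */

/*
 * Model of a LOOP-style walk.  LOOP decrements the counter and exits when
 * it hits 0, so the counter runs n..1 and never 0.  Entry 0 is therefore
 * only reachable as base - 8 + cnt * 8, i.e. tbl[cnt - 1] here.
 */
static void relocate_loop_style(uint64_t *tbl, uint32_t n, uint32_t delta)
{
    for ( uint32_t cnt = n; cnt != 0; cnt-- )
        if ( tbl[cnt - 1] & PAGE_PRESENT )
            tbl[cnt - 1] += delta;
}

/*
 * Model of a CMP/JB-style walk: advance a pointer from start until it
 * reaches end.  No displacement fixup is needed.  Requires start < end.
 */
static void relocate_ptr_style(uint64_t *start, uint64_t *end, uint32_t delta)
{
    const uint64_t present = PAGE_PRESENT;  /* constant kept in a "register" */
    uint64_t *p = start;

    do {
        if ( *p & present )
            *p += delta;
        p++;
    } while ( p < end );
}

/* Returns 1 if both models leave identical tables behind. */
static int models_agree(void)
{
    uint64_t a[6] = { 0x1001, 0, 0x2003, 0x3000, 0x4001, 0 };
    uint64_t b[6];

    memcpy(b, a, sizeof(a));
    relocate_loop_style(a, 6, 0x100000);
    relocate_ptr_style(b, b + 6, 0x100000);

    return memcmp(a, b, sizeof(a)) == 0;
}
```

Both models visit the same entries; only the pointer-style walk avoids the
"- 1" (in bytes, "- 8") adjustment on every access.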
Misc notes:
* TEST $IMM, MEM can't macrofuse. The loop has 0x1200 iterations, so pull
the $_PAGE_PRESENT constant out into a spare register to turn the TEST into
its %REG, MEM form, which can macrofuse.
* Avoid the use of %fs-relative references. %esi-relative is the more common
form in the code, and doesn't suffer an address generation overhead.
* Avoid LOOP. CMP/JB isn't microcoded and is faster to execute in all cases.
* Unroll the trivial 4-iteration loop, as even compilers would. The
generated code size is a fraction larger, but this is init and the asm is
far easier to follow.
* Reposition the l2=>l1 bootmap construction so the asm reads in pagetable
level order.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>