Tamas K Lengyel [Wed, 28 Sep 2016 23:23:05 +0000 (16:23 -0700)]
arm/mem_access: don't reinject stage 2 access exceptions
The only way a guest may trigger a stage 2 access violation is if mem_access is
or was in use, so reinjecting these exceptions into the guest is never required.
Requested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Wei Liu [Wed, 28 Sep 2016 18:51:08 +0000 (19:51 +0100)]
libxc: use PRI_xen_pfn in xc_dom_load_acpi
This fixes compilation on ARM.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Wei Liu [Wed, 28 Sep 2016 15:38:19 +0000 (16:38 +0100)]
libxc: fix out of range shift in populate_acpi_pages
unsigned int is only 4 bytes long and "4" is treated as int, so the shift
would overflow.
Use an unsigned long, and calculate the number of bits to shift before
shifting once, instead of shifting twice.
Caught by clang compilation test.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Dario Faggioli [Wed, 28 Sep 2016 15:04:30 +0000 (16:04 +0100)]
libxc: improve error handling of xc Credit1 and Credit2 helpers
In fact, libxc wrappers should, on error, set errno and
return -1.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Wed, 28 Sep 2016 15:04:30 +0000 (16:04 +0100)]
xen: libxc: allow setting the ratelimit value online
The main purpose of the patch is to provide the xen-libxc
plumbing necessary to change the value of the
ratelimit_us parameter online, for Credit2 (as is
already possible for Credit1).
While there:
- mention in the Xen logs when rate limiting was enabled
and is being disabled (and vice-versa);
- fix csched2_sys_cntl() which was always returning
-EINVAL in the XEN_SYSCTL_SCHEDOP_putinfo case.
And also:
- fix style of an if in csched_sys_cntl();
- fix the style of the switch in csched2_sys_cntl();
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:06 +0000 (09:22 -0400)]
libxc/xc_dom_core: Copy ACPI tables to guest space
Load ACPI modules into guest space
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:05 +0000 (09:22 -0400)]
libxl/acpi: Build ACPI tables for HVMlite guests
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:04 +0000 (09:22 -0400)]
libxl: Initialize domain build info before calling libxl__domain_make
libxl__domain_make() may want to use b_info so we should set defaults
a little earlier.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:03 +0000 (09:22 -0400)]
libxl/pvhv2: Include APIC page in MMIO hole for PVHv2 guests
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:02 +0000 (09:22 -0400)]
libxl/acpi: Add ACPI e820 entry
Add entry for ACPI tables created for PVHv2 guests to e820 map.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:01 +0000 (09:22 -0400)]
libxc/libxl: Allow multiple ACPI modules
Provide the ability to load multiple ACPI modules. This feature is needed
by PVHv2 guests and will be used in subsequent patches.
We assume that PVHv2 guests do not load their ACPI modules specified
in the configuration file. We can extend support for that in the future
if desired.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:00 +0000 (09:22 -0400)]
libacpi: Build DSDT for PVH guests
PVH guests require a DSDT with only the ACPI INFO (Xen-specific) and Processor
objects. We separate the ASL ACPI INFO definition into dsdt_acpi_info.asl so
that it can be included in the ASLs for both HVM and PVHv2.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:59 +0000 (09:21 -0400)]
x86: Allow LAPIC-only emulation_flags for HVM guests
PVHv2 guests may request LAPIC emulation (and nothing else).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:58 +0000 (09:21 -0400)]
acpi: Move ACPI code to tools/libacpi
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:57 +0000 (09:21 -0400)]
acpi/hvmloader: Include file/paths adjustments
In preparation for moving the ACPI sources into the generally available
libacpi:
1. Pass IOAPIC/LAPIC/PCI mask values via struct acpi_config
2. Modify include files search paths to point to acpi directory
3. Macro-ise include file for build.c that defines various
utilities used by that file. Users of libacpi will be expected
to define this macro when compiling build.c
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:56 +0000 (09:21 -0400)]
acpi/hvmloader: Link ACPI object files directly
ACPI sources will be available to various components, which will build
them according to their own rules. ACPI's Makefile will only generate
the necessary source files.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:55 +0000 (09:21 -0400)]
acpi/hvmloader: Translate all addresses when assigning addresses in ACPI tables
Non-hvmloader users may be building tables in virtual address space
and therefore we need to make sure that values that end up in tables
are physical addresses.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:54 +0000 (09:21 -0400)]
acpi/hvmloader: Replace mem_alloc() and virt_to_phys() with memory ops
Components that wish to use ACPI builder will need to provide their own
mem_alloc() and virt_to_phys() routines. Pointers to these routines will
be passed to the builder as memory ops.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:53 +0000 (09:21 -0400)]
acpi/hvmloader: Build WAET optionally
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:52 +0000 (09:21 -0400)]
acpi/hvmloader: Make providing IOAPIC in MADT optional
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:51 +0000 (09:21 -0400)]
acpi/hvmloader: Set TIS header address in hvmloader
Users other than hvmloader may provide the TIS address as a virtual address.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:50 +0000 (09:21 -0400)]
acpi/hvmloader: Collect processor and NUMA info in hvmloader
No need for the ACPI code to rely on the hvm_info variable.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:49 +0000 (09:21 -0400)]
acpi: Re-license ACPI builder files from GPLv2 to LGPLv2.1
The ACPI builder is currently distributed under the GPLv2 license.
We plan to make the builder available to components other than
hvmloader (which is also GPLv2). Some of these components (such as
libxl) may be distributed under LGPL-2.1 so that they can be used by
non-GPLv2 callers. But this will not be possible if we incorporate
the ACPI builder into those other components.
To avoid this problem we are relicensing the sources in the ACPI builder
directory to the GNU Lesser General Public License (LGPL) version 2.1.
The gpl/mk_dsdt_asl.sh file will remain GPL-only pending permission to
relicense from Lenovo, due to commit
801d469ad ("[HVM] ACPI support
patch 3 of 4: ACPI _PRT table.")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Daniel Kiper <dkiper@net-space.pl>
Acked-by: Stefan Berger <stefanb@us.ibm.com>
Acked-by: Kouya Shimura <kouya@jp.fujitsu.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [for Oracle, VirtualIron and Sun contributions]
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:48 +0000 (09:21 -0400)]
acpi: Prevent GPL-only code from seeping into non-GPL binaries
Some code (specifically, that introduced by commit
801d469ad ("[HVM] ACPI
support patch 3 of 4: ACPI _PRT table.")) has only been licensed under
GPLv2. We want to prevent this code from showing up in non-GPL
binaries, which might become possible after we make the ACPI builder
code available to users other than hvmloader.
There are two pieces that we need to be careful about:
(1) A small chunk of code in dsdt.asl that implements the _PIC method
(2) A chunk of the ASL generator in mk_dsdt.c that describes the PCI
interrupt routing.
This code will now be generated by a GPL-only script, which will be
invoked only when the ACPI builder's Makefile is called with the GPL
variable set.
We also strip the license header from generated ASL files to prevent
inadvertent use of those files under an incorrect license.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:47 +0000 (09:21 -0400)]
acpi: Extract acpi info description into a separate ASL file
This code will be needed by PVH guests which don't want to use the full DSDT.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Wed, 28 Sep 2016 11:12:13 +0000 (12:12 +0100)]
Config.mk: update mini-os commit
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Wed, 28 Sep 2016 08:31:58 +0000 (10:31 +0200)]
pvgrub: use printk() instead of grub_printf()
grub_printf() is supporting only a very limited number of formats.
Especially some error messages suffer from that, e.g. %lx won't work.
Switch to use printk() for error messages instead.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Wed, 28 Sep 2016 04:02:44 +0000 (06:02 +0200)]
pvgrub: fix crash when booting kernel with p2m list outside kernel mapping
When trying to boot a kernel with the p2m list not mapped by the
initial kernel mapping, pvgrub can fail because it keeps some page
tables mapped.
Unmapping the additional page tables created for the special p2m mapping
avoids this failure.
Reported-by: Sven Koehler <sven.koehler@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Konrad Rzeszutek Wilk [Sun, 11 Sep 2016 00:41:24 +0000 (20:41 -0400)]
livepatch: arm[32,64],x86: NOP test-case
The test-case is quite simple - we NOP the 'xen_minor_version'.
The number of NOPs depends on the architecture.
On x86 the function is 11 bytes long:
55 push %rbp <- NOP
48 89 e5 mov %rsp,%rbp <- NOP
b8 04 00 00 00 mov $0x4,%eax <- NOP
5d pop %rbp <- NOP
c3 retq
We can NOP everything but the last instruction (so 10 bytes).
On ARM64 it's 8 bytes:
52800100 mov w0, #0x8 <- NOP
d65f03c0 ret
We can NOP the first instruction.
While on ARM32 there are 24 bytes:
e52db004 push {fp} <- NOP
e28db000 add fp, sp, #0 <- NOP
e3a00008 mov r0, #8 <- NOP
e24bd000 sub sp, fp, #0 <- NOP
e49db004 pop {fp} <- NOP
e12fff1e bx lr
And we can NOP instructions 1 through 5.
Granted, this code may differ per compiler!
Hence anybody who runs this test-case should
verify that the assumptions made here are correct.
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 23 Sep 2016 15:25:12 +0000 (11:25 -0400)]
livepatch, arm[32|64]: Share arch_livepatch_revert
It is exactly the same in both platforms.
No functional change.
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 23 Sep 2016 00:15:09 +0000 (20:15 -0400)]
livepatch: Initial ARM32 support.
The patch piggybacks on "livepatch: Initial ARM64 support", which
brings in all of the necessary livepatch infrastructure pieces.
This patch adds three major pieces:
1) ELF relocations. ARM32 uses SHT_REL instead of SHT_RELA, which
means the addend had to be extracted from within the
instruction. This required parsing BL/BLX, B/BL<cond>,
MOVT, and MOVW instructions.
The code was written from scratch using the ARM ELF manual
(and the ARM Architecture Reference Manual)
2) Inserting a trampoline. We use the B (branch to address)
instruction, which uses an offset based on the PC value: PC + imm32.
Because we insert the branch at the start of the old function
we have to account for the instruction already being fetched
and subtract 8 from the delta (new_addr - old_addr). See
ARM DDI 0406C.c, A2.3 (pg 45) and A8.8.18 (pg 334,335).
3) Allows the test-cases to be built under ARM32.
The "livepatch: tests: Make them compile under ARM64" patch
put the right infrastructure in place and we piggyback on it.
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com> [for non-ARM parts]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 20:04:55 +0000 (16:04 -0400)]
livepatch: tests: Make them compile under ARM64
We need to do two things:
1) Wrap the platform-specific objcopy parameters in defines
The input and output parameters for $(OBJCOPY) are different
based on the platforms. As such provide them in the
OBJCOPY_MAGIC define and use that.
2) The alternatives code is a bit different (it exists only under ARM64
and x86), and there are no exceptions under ARM at all.
We use the LIVEPATCH_FEATURE CPU feature id for ARM, similar to
how it is done on x86.
We are not yet attempting to build them under ARM32 so
that is still ifdefed out.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 16:41:28 +0000 (12:41 -0400)]
livepatch: x86, ARM, alternative: Expose FEATURE_LIVEPATCH
To use as a common way of testing alternative patching for
livepatches. Both architectures have this FEATURE and the
test-cases can piggyback on that.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 13 Sep 2016 17:15:07 +0000 (13:15 -0400)]
livepatch/arm/x86: Check payload for unwelcome symbols.
Certain platforms, such as ARM [32|64], add extra mapping symbols
such as $x (for ARM64 instructions), or, more interesting to
this patch, $t (for Thumb instructions). These symbols are supposed
to help the final linker make any adjustments (such as
adding a veneer). But more importantly - we do not compile Xen
with any Thumb instructions (which are variable length) - so
if we find these mapping symbols we should disallow such a payload.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Sat, 13 Aug 2016 03:08:32 +0000 (23:08 -0400)]
livepatch: ARM 32|64: Ignore mapping symbols: $[d,a,x]
Those symbols are used to help final linkers replace instructions.
The ARM ELF specification mandates that they be present
to denote the start of certain CPU features. There are two
variants - a short and a long format.
Either way - we can ignore these symbols.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> [x86 bits]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 16:37:50 +0000 (12:37 -0400)]
livepatch: ARM/x86: Check displacement of old_addr and new_addr
If the distance is too big we are in trouble - our relocation
distance can be clipped, or still have a valid width but
cause an overflow of the distance.
The maximum displacement for an unconditional branch/jump varies
by architecture: ARM32 is +/- 32MB, ARM64 is +/- 128MB, while x86
with 32-bit relocations is +/- 2GB.
Note: On x86 we could use the 64-bit jmpq instruction which
would provide much bigger displacement to do a jump, but we would
still have issues with the new function not being able to reach
any of the old functions (as all the relocations would assume 32-bit
displacement). And it "furthermore would require a register or
memory location to load/store the address to." (From Jan.)
On ARM the conditional branch supports an even smaller displacement,
but fortunately we are not using it.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 16:24:09 +0000 (12:24 -0400)]
livepatch: Initial ARM64 support.
As compared to x86, the VA of the hypervisor .text
is locked down - we cannot modify the running pagetables
to unset the .ro flag. We borrow the same idea that
alternative patching uses - which is to vmap the entire
.text region and use the alternative virtual address
for patching.
Since we are doing vmap we may fail, hence
arch_livepatch_quiesce was changed (see "x86,arm:
Change arch_livepatch_quiesce() declaration") to return
an error value which will be bubbled up in payload->rc and
provided to the user (along with messages in the ring buffer).
The livepatch virtual address space (where the new functions
are) needs to be close to the hypervisor virtual address
so that the trampoline can reach it. As such we re-use
the BOOT_RELOC_VIRT_START which is not used after bootup
(alternatively we can also use the space after the _end to
FIXMAP_ADDR(0), but that may be too small).
The ELF relocation engine was initially coded from
"ELF for the ARM 64-bit Architecture (AArch64)"
(http://infocenter.arm.com/help/topic/com.arm.doc.ihi0056b/IHI0056B_aaelf64.pdf),
but after a while of trying to write the correct bit shifting
and masking from scratch I ended up borrowing from Linux:
the 'reloc_insn_imm' function (Linux v4.7, arch/arm64/kernel/module.c;
see commit
257cb251925f854da435cbf79b140984413871ac "arm64: Loadable modules").
And while at it - we also utilize code from Linux to construct
the right branch instruction (see "arm64/insn: introduce
aarch64_insn_gen_{nop|branch_imm}() helper functions").
In the livepatch payload loading code we tweak the #ifdef to
only exclude ARM_32. The exceptions are not part of ARM 32/64 hence
they are still behind the #ifdef.
We also expand the MAINTAINERS file to include the arm64 and arm32
platform specific livepatch file.
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com> [non-arm parts]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Edgar E. Iglesias [Fri, 23 Sep 2016 18:53:22 +0000 (20:53 +0200)]
xen/arm: Map mmio-sram nodes as un-cached memory
Map mmio-sram nodes as un-cached memory. If the node
has the no-memory-wc property set, we map it as device memory.
The DTS bindings for mmio-sram nodes can be found in the
Linux tree under Documentation/devicetree/bindings/sram/sram.txt.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Edgar E. Iglesias [Fri, 23 Sep 2016 18:53:21 +0000 (20:53 +0200)]
xen/arm: domain_build: Plumb for different mapping attributes
Add plumbing for passing around mapping attributes.
Nodes that don't specifically state their type will inherit
their type from their parent.
This is in preparation for allowing us to differentiate the attributes
for specific device nodes.
We still use the same DEVICE mappings for all nodes so this
patch has no functional change.
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Edgar E. Iglesias [Fri, 23 Sep 2016 18:53:20 +0000 (20:53 +0200)]
xen/device-tree: Make dt_match_node match props
Make dt_match_node match for a single existing property.
We only search for the existence of the property, not
for specific values.
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Edgar E. Iglesias [Fri, 23 Sep 2016 18:53:19 +0000 (20:53 +0200)]
xen/device-tree: Add __DT_MATCH macros without braces
Add __DT_MATCH macros without braces to allow the creation
of match descriptors with multiple combined match options.
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Edgar E. Iglesias [Wed, 28 Sep 2016 01:27:10 +0000 (18:27 -0700)]
xen/arm: Rename and generalize un/map_regions_rw_cache
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
Rename and generalize un/map_regions_rw_cache into
un/map_regions_p2mt. The new functions take the mapping
attributes as an argument.
No functional change.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Edgar E. Iglesias [Fri, 23 Sep 2016 18:53:17 +0000 (20:53 +0200)]
xen/arm: p2m: Add support for normal non-cacheable memory
Add support for describing normal non-cacheable memory.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Edgar E. Iglesias [Fri, 23 Sep 2016 18:53:16 +0000 (20:53 +0200)]
xen/arm: p2m: Rename p2m_mmio_direct_nc -> p2m_mmio_direct_dev
Rename p2m_mmio_direct_nc to p2m_mmio_direct_dev to better
express that we are mapping device memory. This will allow us
to use p2m_mmio_direct_nc for Normal Non-Cached mappings.
No functional change.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 15 Sep 2016 11:28:39 +0000 (12:28 +0100)]
xen/arm: p2m: Export p2m_*_lock helpers
Earlier patches exported the p2m interface (p2m_get_entry and
p2m_set_entry) to allow splitting xen/arch/arm/p2m.c. Those functions
require the callers to lock the p2m, so we need to export p2m_*_lock
helpers.
All helpers but p2m_write_unlock are moved to
xen/include/asm-arm/p2m.h to allow inlining. The helper
p2m_write_unlock is kept in p2m.c because it depends on a static
function.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:38 +0000 (12:28 +0100)]
xen/arm: p2m: Do not handle shattering in p2m_create_table
The helper p2m_create_table is only called to create a brand new table.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Wed, 28 Sep 2016 01:05:14 +0000 (18:05 -0700)]
xen/arm: p2m: Re-implement p2m_set_mem_access using p2m_{set,get}_entry
The function p2m_set_mem_access can be re-implemented using the generic
functions p2m_get_entry and __p2m_set_entry.
Also the function apply_p2m_changes is dropped completely as it is not
used anymore.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:36 +0000 (12:28 +0100)]
xen/arm: p2m: Re-implement p2m_insert_mapping using p2m_set_entry
The function p2m_insert_mapping can be re-implemented using the generic
function p2m_set_entry.
Note that the mapping is no longer reverted if Xen fails to insert a
mapping. That behaviour was added to ensure MMIO regions are not kept
half-mapped in case of failure and to follow the x86 counterpart. It
was removed on the x86 side by commit
c3c756bd "x86/p2m: use large pages for MMIO
mappings" and I think we should let the caller take care of it.
Finally, drop the INSERT operation in apply_* as nobody is using it
anymore. Note that the functions could have been dropped in one go at
the end; however, I find it easier to drop the operations one by one,
avoiding a big deletion in the patch that converts the last operation.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:35 +0000 (12:28 +0100)]
xen/arm: p2m: Re-implement p2m_remove_mapping using p2m_set_entry
The function p2m_remove_mapping can be re-implemented using the generic
function p2m_set_entry.
Also drop the REMOVE operation in apply_* as nobody is using it anymore.
Note that the functions could have been dropped in one go at the end;
however, I find it easier to drop the operations one by one, avoiding a
big deletion in the patch that converts the last operation.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:34 +0000 (12:28 +0100)]
xen/arm: p2m: Re-implement relinquish_p2m_mapping using p2m_{get,set}_entry
The function relinquish_p2m_mapping can be re-implemented using
p2m_{get,set}_entry by iterating over the range mapped and using the
mapping order given by the callee.
Given that the preemption point was chosen arbitrarily, it is now done
every 512 iterations, meaning that Xen may check more often whether the
function should be preempted when there are no mappings.
Finally, drop the RELINQUISH operation in apply_* as nobody is using it
anymore. Note that the functions could have been dropped in one go at
the end; however, I find it easier to drop the operations one by one,
avoiding a big deletion in the patch that removes the last operation.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:33 +0000 (12:28 +0100)]
xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry
The ARM architecture mandates the use of a break-before-make sequence
when changing translation entries if the page table is shared between
multiple CPUs, whenever a valid entry is replaced by another valid entry
(see D4.7.1 in ARM DDI 0487A.j for more details).
The break-before-make sequence can be divided in the following steps:
1) Invalidate the old entry in the page table
2) Issue a TLB invalidation instruction for the address associated
to this entry
3) Write the new entry
The current P2M code implemented in apply_one_level does not respect
this sequence and may break coherency on some processors.
Adapting the current implementation to use the break-before-make
sequence would imply some code duplication and more TLB invalidations
than necessary. For instance, if we are replacing a 4KB page and the
current mapping in the P2M is using a 1GB superpage, the following steps
will happen:
1) Shatter the 1GB superpage into a series of 2MB superpages
2) Shatter the 2MB superpage into a series of 4KB pages
3) Replace the 4KB page
As the current implementation shatters while descending and installing
the mapping, Xen would need to issue 3 TLB invalidation instructions,
which is clearly inefficient.
Furthermore, all the operations which modify the page table use
the same skeleton. It is more complicated to maintain different code
paths than to have a generic function that sets an entry and takes care
of the break-before-make sequence.
The new implementation is based on the x86 EPT one which, I think,
fits quite well for the break-before-make sequence whilst keeping
the code simple.
The main function of the new implementation is __p2m_set_entry. It will
only work on mappings that are aligned to a block entry in the page table
(i.e. 1GB, 2MB, 4KB when using a 4KB granularity).
Another function, p2m_set_entry, is provided to break down a region
into mappings that are aligned to a block entry, or to 4KB when memaccess
is enabled.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:32 +0000 (12:28 +0100)]
xen/arm: p2m: Introduce a helper to check if an entry is a superpage
Use the level and the entry to know whether an entry is a superpage.
A superpage can only happen below level 3.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:31 +0000 (12:28 +0100)]
xen/arm: p2m: Make p2m_{valid,table,mapping} helpers inline
Those helpers are very small and often used. Let the compiler know they
can be inlined.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:30 +0000 (12:28 +0100)]
xen/arm: p2m: Re-implement p2m_cache_flush using p2m_get_entry
The function p2m_cache_flush can be re-implemented using the generic
function p2m_get_entry by iterating over the range and using the mapping
order given by the callee.
As in the current implementation, no preemption is implemented, although
a comment in the current code claimed it. As the function is called by
a DOMCTL with a region of 1GB maximum, I think preemption can be
left unimplemented for now.
Finally, drop the CACHEFLUSH operation in apply_one_level as nobody is
using it anymore. Note that the function could have been dropped in one
go at the end; however, I find it easier to drop the operations one by
one, avoiding a big deletion in the patch that converts the last operation.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:29 +0000 (12:28 +0100)]
xen/arm: p2m: Replace all usage of __p2m_lookup with p2m_get_entry
__p2m_lookup is just a wrapper to p2m_get_entry.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:28 +0000 (12:28 +0100)]
xen/arm: p2m: Introduce p2m_get_entry and use it to implement __p2m_lookup
Currently, for a given GFN, the function __p2m_lookup will only return
the associated MFN and the p2m type of the mapping.
In some cases we need the order of the mapping and the memaccess
permission. Rather than providing a separate function for this purpose,
it is better to implement a generic function returning all the
information.
To avoid passing dummy parameters, a caller that does not need a
specific piece of information can pass NULL instead.
The list of information retrieved is based on the x86 version. All
of it will be used in follow-up patches.
It might have been possible to extend __p2m_lookup; however, I chose to
reimplement it from scratch to allow sharing some helpers with the
function that will update the P2M (it will be added in a follow-up patch).
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:27 +0000 (12:28 +0100)]
xen/arm: p2m: Introduce p2m_get_root_pointer and use it in __p2m_lookup
Mapping the root table is always done the same way. To avoid duplicating
the code in a later patch, move it into a separate helper.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:26 +0000 (12:28 +0100)]
xen/arm: p2m: Move the lookup helpers at the top of the file
This will be used later in functions that will be defined earlier in the
file.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:25 +0000 (12:28 +0100)]
xen/arm: p2m: Change the type of level_shifts from paddr_t to uint8_t
The level shift can be encoded in 8 bits, so it is not necessary to
use paddr_t (i.e. 64-bit).
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:24 +0000 (12:28 +0100)]
xen/arm: p2m: Invalidate the TLBs when write unlocking the p2m
Sometimes the invalidation of the TLBs can be deferred until the p2m is
unlocked. This is for instance the case when multiple mappings are
removed. In other cases, such as shattering a superpage, an immediate
flush is required.
Keep track whether a flush is needed directly in the p2m_domain structure
to allow serializing multiple changes. The TLBs will be invalidated when
write unlocking the p2m if necessary.
Also a new helper, p2m_flush_sync, has been introduced to force a
synchronous TLB invalidation.
Finally, replace the call to p2m_flush_tlb by p2m_flush_tlb_sync in
apply_p2m_changes.
Note this patch is not useful today; however, follow-up patches will take
advantage of it.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:23 +0000 (12:28 +0100)]
xen/arm: traps: Check the P2M before injecting a data/instruction abort
A data/instruction abort may have occurred if another CPU was playing
with the stage-2 page table when following the break-before-make
sequence (see D4.7.1 in ARM DDI 0487A.j). Rather than directly injecting
the fault to the guest, we need to check whether the mapping exists. If
it does, return to the guest to replay the instruction.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:22 +0000 (12:28 +0100)]
xen/arm: traps: Move MMIO emulation code in a separate helper
Currently, a stage-2 translation fault will most likely be an access to
an emulated region, and all the checks are pre-sanity checks for MMIO
emulation. A follow-up patch will handle a new case that could lead to a
stage-2 translation fault. To improve the clarity of the code and of the
changes, the current implementation is moved into a separate helper.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:21 +0000 (12:28 +0100)]
xen/arm: p2m: Add a back pointer to domain in p2m_domain
The back pointer will be useful later to get the domain when we only
have the p2m in hand.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:20 +0000 (12:28 +0100)]
xen/arm: p2m: Use typesafe gfn in p2m_mem_access_radix_set
p2m_mem_access_radix_set expects a gfn as parameter. Rename the
parameter 'pfn' to 'gfn' to match its content, and use the typesafe gfn
to avoid possible misuse.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:19 +0000 (12:28 +0100)]
xen/arm: p2m: Rename parameter in p2m_{remove,write}_pte...
to make the usage clear, i.e. it is used to indicate whether Xen needs
to clean the entry after writing to the page table.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:18 +0000 (12:28 +0100)]
xen/arm: p2m: Store in p2m_domain whether we need to clean the entry
Each entry in the page table has to be cleaned when the IOMMU does not
support coherent walks. Rather than querying this every time the page
table is updated, it is possible to do it only once, when the p2m is
initialized. This is because the value can never change (Xen would be in
big trouble otherwise).
With this change, the initialization of the IOMMU for a given domain has
to be done earlier in order to know whether the page table entries need
to be cleaned. It is fine to move the call earlier because it has no
dependency.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Thu, 15 Sep 2016 11:28:17 +0000 (12:28 +0100)]
xen/arm: do_trap_instr_abort_guest: Move the IPA computation out of the switch
A follow-up patch will add more cases to the switch that will require
the IPA, so move the computation out of the switch.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Roger Pau Monne [Tue, 27 Sep 2016 08:13:17 +0000 (10:13 +0200)]
tools/configure: fix --with-system-{ovmf/seabios}
Currently configure code doesn't define {SEABIOS/OVMF}_PATH when
--with-system-{ovmf/seabios} is used. Fix this by making sure those
defines are always set if the internal {ovmf/seabios}_path variables are
also set.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Suggested-by: Wei Liu <wei.liu2@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
[ wei: run autogen.sh ]
Acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Mon, 26 Sep 2016 16:44:08 +0000 (18:44 +0200)]
libs/gnttab: fix build of gnttab_unimp.c
Fix the definition of the xengnttab_grant_copy function so it's in line
with the prototypes in xengnttab.h.
This unbreaks the tools build on systems that don't have a gnttab device.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tamas K Lengyel [Mon, 26 Sep 2016 16:04:11 +0000 (18:04 +0200)]
x86/vm_event: allow overwriting Xen's i-cache used for emulation
When emulating instructions Xen's emulator maintains a small i-cache fetched
from the guest memory. This patch extends the vm_event interface to allow
overwriting this i-cache via a buffer returned in the vm_event response.
When responding to a SOFTWARE_BREAKPOINT event (INT3), the monitor
subscriber normally has to remove the INT3 from memory, single-step, then
place the INT3 back to allow the guest to continue execution. This
routine, however, is susceptible to a race condition on multi-vCPU
guests. By allowing the subscriber to return the i-cache to be used for
emulation, it can side-step the problem by returning a clean buffer
without the INT3 present.
As part of this patch we rename hvm_mem_access_emulate_one to
hvm_emulate_one_vm_event to better reflect that it is used in various vm_event
scenarios now, not just in response to mem_access events.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Mon, 26 Sep 2016 14:28:21 +0000 (14:28 +0000)]
x86/svm: Drop the set_segment_register() macro
Replace its sole user with a single piece of inline assembly which is more
flexible about its register constraints, rather than forcing the use of %ax.
While editing this area, reflow the comment to remove trailing whitespace and
use fewer lines.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Emanuel Czirai [Mon, 26 Sep 2016 15:28:09 +0000 (17:28 +0200)]
x86/AMD: apply erratum 665 workaround
AMD F12h machines have an erratum which can cause DIV/IDIV to behave
unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
there is no BIOS update containing that workaround so let's do it
ourselves unconditionally. It is simple enough.
[ Borislav: Wrote commit message. ]
Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
[Linux commit:
d1992996753132e2dafe955cccb2fb0714d3cfc4]
Make applicable to Xen.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 26 Sep 2016 15:27:34 +0000 (17:27 +0200)]
x86/HVM: correct segment register loading during task switch
Instead of #NP, #SS needs to be raised for a non-present %ss
descriptor.
Don't lose the low two selector bits on null selector loads.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 26 Sep 2016 15:27:06 +0000 (17:27 +0200)]
x86emul: don't allow null selector for LTR
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 26 Sep 2016 15:21:36 +0000 (17:21 +0200)]
x86emul: correct loading of %ss
- Instead of #NP, #SS needs to be raised for non-present descriptors.
- Loading a null selector is fine in 64-bit mode at CPL != 3, as long
as RPL == CPL.
- Don't lose the low two selector bits on null selector loads (also
applies to %ds, %es, %fs, %gs, and LDTR).
Since we need CPL earlier now, also switch to using get_cpl() (instead
of open coding it).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 26 Sep 2016 15:20:36 +0000 (17:20 +0200)]
VMX: don't bypass vmx_update_secondary_exec_control()
While putting together another patch modifying the secondary exec
controls I noticed that vmx_vcpu_update_vmfunc_ve() does a raw VMWRITE
instead of going through the designated function. I assume that is not
how it should be.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Konrad Rzeszutek Wilk [Wed, 21 Sep 2016 12:53:38 +0000 (08:53 -0400)]
xen-livepatch: Print the header _after_ the first livepatch hypercall
That way we only print the header once we are sure the
hypervisor has been compiled with Xen Livepatching.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 21 Sep 2016 12:53:04 +0000 (08:53 -0400)]
xen-livepatch: Remove the 'test' part
As it has evolved a bit and is more of a test tool.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reported-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 7 Sep 2016 15:57:05 +0000 (11:57 -0400)]
bug/x86/arm: Align bug_frames sections.
Most of the WARN_ON or BUG_ON sections are properly aligned on
x86. However, on ARM, and in x86 assembly, the macros don't include
any alignment information - hence the entries end up with the default
byte granularity.
On ARM32 it is paramount that the alignment is word-size (4);
otherwise, if one tries to use (uint32_t*) accesses (such
as livepatch ELF relocations), we get a Data Abort.
Enforcing the proper alignment of bug_frames across all
architectures, in both C and assembly, makes them all the same.
Furthermore on x86 the bloat-o-meter detects that with this
change:
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
function old new delta
On ARM32:
add/remove: 1/0 grow/shrink: 0/1 up/down: 384/-288 (96)
function old new delta
gnttab_unpopulate_status_frames - 384 +384
do_grant_table_op 10808 10520 -288
And ARM64:
add/remove: 1/2 grow/shrink: 0/1 up/down: 4164/-4236 (-72)
function old new delta
gnttab_map_grant_ref - 4164 +4164
do_grant_table_op 9892 9836 -56
grant_map_exists 300 - -300
__gnttab_map_grant_ref 3880 - -3880
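The fix boils down to an alignment directive per emitted frame. A pseudo-assembly sketch of the idea follows; the exact section layout and fields are illustrative, not Xen's actual macros:

```
    /* Without an explicit directive, each bug_frame entry lands at byte
     * granularity; adding one per entry makes every field word-aligned
     * on all architectures (mandatory on ARM32). */
    .pushsection .bug_frames.0, "a", %progbits
    .balign 4                 /* word-size (4-byte) alignment */
    .long 0                   /* illustrative frame fields ... */
    .long 0
    .popsection
```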
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com> [x86 parts]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 22 Aug 2016 15:20:03 +0000 (11:20 -0400)]
xen/arm32: Add an helper to invalidate all instruction caches
This is similar to commit
fb9d877a9c0f3d4d15db8f6e0c5506ea641862c6
"xen/arm64: Add an helper to invalidate all instruction caches"
except it is on ARM32 side.
When we are flushing the cache, we most likely also want
to flush the branch predictor too. Hence we add this.
We also need to follow this with dsb()/isb(), which are
memory barriers.
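A sketch of what such a helper might look like follows. This is illustrative only (the helper name and surrounding definitions are assumptions); the coprocessor operations are ICIALLU (invalidate all i-caches) and BPIALL (invalidate branch predictor), followed by the barriers:

```
static inline void invalidate_icache(void)
{
    asm volatile ("mcr p15, 0, %0, c7, c5, 0" : : "r" (0)); /* ICIALLU */
    asm volatile ("mcr p15, 0, %0, c7, c5, 6" : : "r" (0)); /* BPIALL  */
    dsb();  /* ensure completion of the invalidation */
    isb();  /* synchronize the instruction stream    */
}
```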
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 12 Aug 2016 19:27:58 +0000 (15:27 -0400)]
livepatch: Move test-cases to their own sub-directory in test.
So they can be shared with ARM64 (but not yet, so they
are only built on x86).
No functional change.
We also need to tweak the MAINTAINERS and .gitignore file.
Also we need to update SUBDIRS to include the new 'test'
directory so 'cscope' can show the example livepatches.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com> [arm change]
Acked-by: Jan Beulich <jbeulich@suse.com> [for directory]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 9 Sep 2016 18:41:05 +0000 (14:41 -0400)]
arm: poison initmem when it is freed.
The current byte sequence is '0xcc' which makes sense on x86,
but on ARM it is:
cccccccc stclgt 12, cr12, [ip], {204} ; 0xcc
Picking something more ARM applicable such as:
efefefef svc 0x00efefef
Creates a nice crash if one executes that code:
(XEN) CPU1: Unexpected Trap: Supervisor Call
But unfortunately that may not be a good choice either as in the future
we may want to implement support for it.
Julien suggested that we use a 4-byte instruction instead
of trying to work with one byte. To make sure nothing goes bad
we also require that __init_[begin|end] be aligned properly.
As such, on ARM32 we use the udf instruction (see A8.8.247
in ARM DDI 0406C.c) and on ARM64 the AARCH64_BREAK_FAULT
instruction (aka the brk instruction).
We don't have to worry about Thumb code, so this instruction
is safe to execute.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 6 Sep 2016 20:28:23 +0000 (16:28 -0400)]
livepatch: Reject payloads with .alternative or .ex_table if support is not built-in.
If the payload has the sections mentioned but the hypervisor
does not support some of them (say, on ARM, the .ex_table) - then
instead of ignoring them, it should forbid loading such a payload.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 15:12:51 +0000 (11:12 -0400)]
arm/x86/common: Add HAS_[ALTERNATIVE|EX_TABLE]
x86 implements all of them by default - we just
add two extra HAS_ variables to be declared in autoconf.h.
ARM64 only has alternatives, while ARM32 has neither.
And while at it change the livepatch common code that
would benefit from this.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com> [relevant parts]
Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 13 Sep 2016 16:45:14 +0000 (12:45 -0400)]
arm64: s/ALTERNATIVE/HAS_ALTERNATIVE/
No functional change. We resist the temptation to move
the entries in the Kconfig file to be more in alphabetical
order as the "arm/x86/common: Add HAS_[ALTERNATIVE|EX_TABLE]"
will move one of the entries to common file.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ross Lagerwall [Fri, 16 Sep 2016 13:02:05 +0000 (09:02 -0400)]
livepatch: Add .livepatch.hooks functions and test-case
Add hook functions which run during patch apply and patch revert.
Hook functions are used by livepatch payloads to manipulate data
structures during patching, etc.
One use case is the XSA91. As Martin mentions it:
"If we have shadow variables, we also need an unload hook to garbage
collect all the variables introduced by a hotpatch to prevent memory
leaks. Potentially, we also want to pre-reserve memory for static or
existing dynamic objects in the load-hook instead of on the fly.
For testing and debugging, various applications are possible.
In general, the hooks provide flexibility when having to deal with
unforeseen cases, but their application should be rarely required (<
10%)."
Furthermore include a test-case for it.
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 16:20:27 +0000 (12:20 -0400)]
livepatch: Drop _jmp from arch_livepatch_[apply,revert]_jmp
With "livepatch: NOP if func->new_addr is zero." that name
makes no more sense as we also NOP now.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 9 Sep 2016 17:00:31 +0000 (13:00 -0400)]
livepatch: NOP if func->new_addr is zero.
The NOP functionality will NOP any of the code at
the 'old_addr' or at 'name' if the 'new_addr' is zero.
The purpose of this is to NOP out calls, such as:
e8 <4-byte-offset>
(a 5-byte insn), or on ARM a 4-byte branch insn.
We need the EIP of where we need to NOP, and that can
be provided via `old_addr` or `name`.
If `old_addr` is provided, we will NOP 'new_size'
bytes at that location.
The amount is up to 31 instructions if desired (which is
the size of the opaque member). If there is a need to NOP
more than that: a) more 'struct livepatch_func' structures need
to be present, b) we have to implement a variable-size
buffer (in the future), or c) make the first bytes an unconditional
branch skipping the code to be disabled (provided, of course,
there are no branch targets in the middle).
While at it, also unify the code on the x86 patching side so
it is a bit simpler (instead of two separate writes,
just make it one memcpy).
And introduce a general livepatch_insn_len inline function
whose result depends on the platform-specific instruction size
(for an unconditional branch). As such we also rename
PATCH_INSN_SIZE to ARCH_PATCH_INSN_SIZE.
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 6 Sep 2016 16:45:50 +0000 (12:45 -0400)]
livepatch: Add limit of 2MB to payload .bss sections.
The initial patch:
11ff40fa7bb5fdcc69a58d0fec49c904ffca4793
"xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op" caps the
size of the binary at 2MB. We follow that in capping the size
of the .BSSes to be at maximum 2MB.
We also bubble up the payload limit and this one in one #define
called LIVEPATCH_MAX_SIZE to make it easier to find these
arbitrary limits.
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 13 Sep 2016 16:02:20 +0000 (12:02 -0400)]
livepatch: Disallow applying after a revert
In general this is unhealthy - as the payload's .bss (definitely)
or .data (maybe) will be modified once the payload is running.
Doing a revert and then re-applying the payload with a non-pristine
.bss or .data can lead to unforeseen consequences (.bss sections are
assumed to always contain zero values, but now they may have different ones).
There is one exception - if the payload contains only one .data section
- the .livepatch.funcs one - then it is OK to re-apply after a revert.
We detect this rather simply (check that there is exactly one RW section
and that its name is .livepatch.funcs) - but the payload may have many
other RW sections that are not used at all (such as .bss or .data
sections with zero length). To not count those, we also ignore sections
whose sh_size is zero.
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Joao Martins [Fri, 23 Sep 2016 16:26:19 +0000 (18:26 +0200)]
x86/time: extend "tsc" param with "stable:socket"
Extend the "tsc" boot parameter to further relax TSC restrictions and
allow it to be used on machines that guarantee reliable TSC across
sockets. This is up to board manufacturers and there's no way for the OS
to probe this property, therefore the user needs to explicitly set this
option.
Also make one style adjustment that is to remove the unnecessary
parenthesis around clearing TSC_RELIABLE.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Joao Martins [Fri, 23 Sep 2016 16:25:49 +0000 (18:25 +0200)]
x86/time: implement PVCLOCK_TSC_STABLE_BIT
This patch proposes relying on host TSC synchronization and
passthrough to the guest, when running on a TSC-safe platform. On
time_calibration we retrieve the platform time in ns and the counter
read by the clocksource that was used to compute system time. We can
guarantee that, on a platform with a constant and reliable TSC, the
time read on vCPU B right after A is greater, independently of the vCPU
calibration error. Since pvclock time infos are monotonic as seen by any
vCPU, set PVCLOCK_TSC_STABLE_BIT, which then enables the usage of the
VDSO on Linux. IIUC, this is similar to how it's implemented on KVM.
Also add a comment regarding this bit changing, noting that guests are
expected to check it on every read.
I should note that I've yet to see time going backwards in a long-running
test of 2 weeks (on a dual-socket machine), plus a few other
tests I did on older platforms.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Joao Martins [Fri, 23 Sep 2016 16:25:19 +0000 (18:25 +0200)]
x86/time: implement tsc as clocksource
Recent x86/time changes improved a lot of the monotonicity of Xen
timekeeping, making it much harder to observe time going backwards.
However, the platform timer can't be expected to be perfectly in sync
with the TSC, so get_s_time isn't guaranteed to always return
monotonically increasing values across CPUs. This is the case on some
of the boxes I am testing with, where I sometimes observe ~100 warps (of
very few nanoseconds each) after a few hours.
This patch introduces support for using TSC as platform time source
which is the highest resolution time and most performant to get.
Though there are also several problems associated with its usage, and
there isn't a complete (and architecturally defined) guarantee that
all machines will provide a reliable and monotonic TSC in all cases (I
believe Intel to be the only vendor that can guarantee that?). For this
reason it's not used unless the administrator changes the "clocksource"
boot option to "tsc". Initializing the TSC clocksource requires all CPUs
to be up so that the tsc reliability checks can be performed.
init_xen_time is called before all CPUs are up, so for example we would
start with HPET (or ACPI, PIT) at boot time, and switch later to TSC.
The switch then happens in the verify_tsc_reliability initcall that is
invoked when all CPUs are up.
When attempting to initialize the TSC we also check for time warps and
for invariant TSC. Note that while we deem reliable a CONSTANT_TSC
with no deep C-states, that might not always be the case, so we're
conservative and allow the TSC to be used as platform timer only with
invariant TSC. Additionally we check that CPU hotplug isn't meant to be
performed on the host, which is the case when max vcpus and
num_present_cpu are the same. This is because a newly hotplugged CPU
may not satisfy the condition of having all TSCs synchronized - so
while the tsc clocksource is in use we allow offlining CPUs but not
onlining any back. Finally we prevent the TSC from being used as
clocksource on multiple sockets because it isn't guaranteed to be
invariant there. Further relaxation of this last requirement is added in
a separate patch, such that vendors providing such a guarantee can use
the TSC as clocksource. In case any of these conditions is not met, we
keep the clocksource that was previously initialized in init_xen_time.
Since
b64438c7c ("x86/time: use correct (local) time stamp in
constant-TSC calibration fast path") updates to cpu time use local
stamps, which means the platform timer is only used to seed the initial
cpu time. We further introduce a new rendezvous function
(nop_rendezvous) which doesn't require synchronization between master
and slave CPUs: it just reads the calibration_rendezvous struct and
writes the stime and stamp down to the cpu_calibration struct to be
used later on. With clocksource=tsc there is no need to be in sync with
another clocksource, so we reseed the local/master stamps with values
of the TSC and update the platform time stamps accordingly. Time
calibration is set to 1 sec after we switch to TSC; these stamps are
thus reseeded to also ensure monotonically increasing return values
right after the point we switch to TSC. This removes the possibility of
inconsistent readings in this short period (i.e. until the calibration
fires).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Joao Martins [Fri, 23 Sep 2016 16:24:49 +0000 (18:24 +0200)]
x86/time: refactor read_platform_stime()
To allow the caller to fetch the last read from the clocksource which
was used to calculate system_time. This is a prerequisite for a
subsequent patch that will use this last read.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Joao Martins [Fri, 23 Sep 2016 16:24:24 +0000 (18:24 +0200)]
x86/time: refactor init_platform_time()
And accommodate platform time source initialization in
try_platform_time(). This is a preparatory patch for deferring
TSC clocksource initialization to the stage where all CPUs are
up (the verify_tsc_reliability init call).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Fri, 23 Sep 2016 16:23:47 +0000 (18:23 +0200)]
acpi: Makefile should better tolerate interrupts
Intermediate stages of building a target should be made with
temporary files that are copied to the final target at the end.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 23 Sep 2016 16:23:00 +0000 (18:23 +0200)]
x86emul: move x86_emulate() common epilogue code
Only code movement, no functional change.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Wed, 21 Sep 2016 13:13:44 +0000 (14:13 +0100)]
misc/arm: Correctly name bit in the booting document
SCTLR_EL3.HCR does not exist in the documentation (see D7.2.80 in ARM
DDI 0487A.j). It was meant to be SCTLR_EL3.HCE.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 21 Sep 2016 14:52:12 +0000 (15:52 +0100)]
xen/arm64: Add missing synchronization barrier in invalidate_cache
The invalidation of the instructions cache requires barriers to ensure
the completion of the invalidation before continuing (see B2.3.4 in ARM
DDI 0487A.j).
This was overlooked in commit
fb9d877 "xen/arm64: Add an helper to
invalidate all instruction caches".
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Konrad Rzeszutek Wilk [Thu, 8 Sep 2016 09:11:38 +0000 (05:11 -0400)]
livepatch/tests: Move the .name value to .rodata
Right now the contents of 'name' are all located in
the .data section. We want them in the .rodata section,
so change their type to be const.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>