dgit.raspbian.org Git

x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling acpi_parse_dmar()

pci_conf_read8() needs the PCI MMCFG mapping to work on systems with
multiple PCI segments, such as HPE Superdome-Flex.

Move acpi_mmcfg_init() call in acpi_boot_init() before calling
acpi_parse_dmar() so that when pci_conf_read8() is called in
acpi_parse_dev_scope(), we already have the mapping set up.

mmio_ro_ranges initialization is also moved ahead, as it is the only
dependency of pci_mmcfg_arch_enable() that needs to be moved. The code
between the old and new call sites was also checked to ensure we
don't break anything.

Furthermore MMCFG will continue to not work this early (or
more precisely not at all until Dom0 boot has progressed far
enough) if the range(s) isn't/aren't marked reserved in E820.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Tested-by: Gopalasetty, Manoj <manoj.gopalasetty@hpe.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

VMX: make vmx_read_guest_msr() cope with callers not checking its return value

It took till the 4.5 backports of the L1TF prereqs that gcc 8.2 finally
noticed that the vPMU callers, not checking the function's return value,
may consume uninitialized data. Guard against this by storing zero on
the error path.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
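
The idea of storing zero on the error path can be sketched as follows.
This is a minimal illustration, not the actual Xen code: the structure,
the function name demo_read_guest_msr() and the choice of -ESRCH as the
error value are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a per-vCPU guest MSR list. */
struct demo_msr_entry {
    uint32_t index;
    uint64_t data;
};

/*
 * Look up `msr` in `list`. On success store the value and return 0.
 * On failure return -ESRCH and still store 0, so a caller that ignores
 * the return value never reads an uninitialised variable.
 */
int demo_read_guest_msr(const struct demo_msr_entry *list, size_t nr,
                        uint32_t msr, uint64_t *val)
{
    assert(val != NULL);

    for (size_t i = 0; i < nr; i++) {
        if (list[i].index == msr) {
            *val = list[i].data;
            return 0;
        }
    }

    *val = 0; /* error path: give the caller a defined value */
    return -ESRCH;
}
```

A caller that does check the return value is unaffected; one that does
not simply sees 0 instead of stack garbage.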

xenforeignmemory: fix fd leakage in error path

b49ef5d3 (xenforeignmemory: work around bug in older privcmd) added an
error path but forgot to close fd there.

Spotted by Coverity.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
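
The class of bug fixed here can be sketched as follows. The names
demo_open_checked() and demo_failing_probe() are hypothetical and the
probe stands in for whatever check the real code performs after opening
the device; only open(), close() and the general pattern are taken as
given.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Demo state: remembers which descriptor the probe was handed. */
int demo_last_probed_fd = -1;

/* Hypothetical probe that always fails, standing in for a failed check. */
int demo_failing_probe(int fd)
{
    demo_last_probed_fd = fd;
    return -1;
}

/*
 * Open `path` and run `probe` on the descriptor. If the probe fails,
 * close the descriptor before returning so that it does not leak.
 */
int demo_open_checked(const char *path, int (*probe)(int fd))
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    if (probe(fd) < 0) {
        close(fd); /* easy to forget when an error path is added later */
        return -1;
    }

    return fd;
}
```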

rombios: remove packed attribute for pushad_regs_t

The structure already has explicit padding.

Removing the attribute silences a clang 6 warning:

tcgbios.c:1519:34: error: taking address of packed member 'u' of class or structure 'pushad_regs_t' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member]
&regs->u.r32.edx);
^~~~~~~~~~~~~~~

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
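
Why the attribute is redundant when padding is explicit can be seen in a
small sketch. The two structures below are invented for the example and
are not the layout of pushad_regs_t; the packed attribute itself is a
GCC/clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with padding spelled out by hand. */
struct demo_regs_plain {
    uint16_t lo;
    uint16_t pad;   /* explicit padding keeps `value` 4-byte aligned */
    uint32_t value;
};

/* Same members, but with the packed attribute. */
struct demo_regs_packed {
    uint16_t lo;
    uint16_t pad;
    uint32_t value;
} __attribute__((packed));

/* With explicit padding the attribute changes neither size nor offsets. */
_Static_assert(sizeof(struct demo_regs_plain) ==
               sizeof(struct demo_regs_packed), "same size");
_Static_assert(offsetof(struct demo_regs_plain, value) ==
               offsetof(struct demo_regs_packed, value), "same offset");

/* Taking a member's address is free of the warning in the plain form. */
uint32_t *demo_value_ptr(struct demo_regs_plain *r)
{
    return &r->value;
}
```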

xen: is_hvm_{domain,vcpu} should evaluate to false when !CONFIG_HVM

Turn them into static inline functions which evaluate to false when
CONFIG_HVM is not set. ARM won't be broken because ARM guests are set
to PV type in the hypervisor.

But ARM has plans to switch to HVM guest type inside the hypervisor, so
preemptively introduce CONFIG_HVM for ARM here.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
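
A minimal sketch of the pattern, assuming a cut-down domain structure;
demo_is_hvm_domain() and the enum are hypothetical names and this is not
the exact form used in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down domain type for illustration only. */
enum demo_guest_type { demo_guest_pv, demo_guest_hvm };

struct demo_domain {
    enum demo_guest_type guest_type;
};

/*
 * When CONFIG_HVM is not defined the function is a compile-time
 * constant false, so code guarded by it can be eliminated as dead.
 */
static inline bool demo_is_hvm_domain(const struct demo_domain *d)
{
#ifdef CONFIG_HVM
    return d->guest_type == demo_guest_hvm;
#else
    (void)d;
    return false;
#endif
}
```

Because the function is static inline, callers need no #ifdef of their
own; the compiler can drop the guarded branch.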

xen/xsm: Rename CONFIG_XSM_POLICY to CONFIG_XSM_FLASK_POLICY

The embedded policy is specifically a flask policy, so update the
infrastructure to reflect this.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

xen/xsm: Rename CONFIG_FLASK_* to CONFIG_XSM_FLASK_*

Flask is one single XSM module, and another is about to be introduced.
Properly namespace the symbols for clarity.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

x86/svm: Fixes to OS Visible Workaround handling

OSVW data is technically per-cpu, but it is the firmware's responsibility to
make it equivalent on each cpu.  A guest's OSVW data is sourced from global
data in Xen, clearly making it per-domain data rather than per-vcpu data.

Move the data from struct arch_svm_struct to struct svm_domain, and call
svm_guest_osvw_init() from svm_domain_initialise() instead of
svm_vcpu_initialise().

In svm_guest_osvw_init(), reading osvw_length and osvw_status must be done
under the osvw_lock to avoid observing mismatched values.  The guest's view of
osvw_length also needs clipping at 64 as we only offer one status register (To
date, 5 is the maximum index defined AFAICT).  Avoid opencoding max().

Drop svm_handle_osvw() as it is shorter and simpler to implement the
functionality inline in svm_msr_{read,write}_intercept().  As the OSVW MSRs
are a contiguous block, we can access them as an array for simplicity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
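
The clipping and the array-style MSR access can be sketched as below.
The structure and helper names are hypothetical, locking is left out,
and the code is only a model of the two ideas, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_AMD_OSVW_ID_LENGTH 0xc0010140u
#define MSR_AMD_OSVW_STATUS    0xc0010141u

/* Hypothetical per-domain copy of the OSVW state. */
struct demo_osvw {
    uint64_t raw[2]; /* [0] = length, [1] = status */
};

/* One 64-bit status register can describe at most 64 errata bits. */
uint64_t demo_clip_osvw_length(uint64_t host_length)
{
    return host_length < 64 ? host_length : 64;
}

/* The two MSRs are adjacent, so the MSR number can index the array. */
uint64_t demo_osvw_read(const struct demo_osvw *o, uint32_t msr)
{
    assert(msr >= MSR_AMD_OSVW_ID_LENGTH && msr <= MSR_AMD_OSVW_STATUS);
    return o->raw[msr - MSR_AMD_OSVW_ID_LENGTH];
}
```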

xen/pt: io.c contains HVM only code

We also need to make it x86 only because ARM will define CONFIG_HVM at
some point.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/vpmu: put HVM only code under CONFIG_HVM

Change u32 to uint32_t while at it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86: provide stub for memory_type_changed

Jan indicated that for PV guests the memory type is not changed, for
HVM guests memory_type_changed is needed for EPT's effective memory
type calculation. This means memory_type_changed is HVM only.

Provide a stub to minimise code churn.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/hvm: provide hvm_hap_supported

And replace direct accesses in non-HVM subsystems to
hvm_funcs.hap_supported with the new function, to avoid accessing an
internal data structure of another subsystem directly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86: enclose hvm_op and dm_op in CONFIG_HVM in relevant tables

A PV guest (Dom0) needs to be able to use these two hypercalls in order to
serve HVM guests. But if xen doesn't support HVM at all there is no
point in exposing them to PV guests.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Revert "x86/hvm: remove default ioreq server"

This reverts commit 629856eae2a7f766f1f024a06ad3abf1fd4b9d37,
which breaks at least one of the qemu builds.

VT-d/dmar: iommu mem leak fix

Release memory allocated for drhd iommu in error path.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

build: remove tboot make targets

The tboot targets are woefully out of date. These should really be
retired because setting up tboot is more complex than the build process
for it.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Christopher Clark <christopher.clark6@baesystems.com>

x86/hvm: remove default ioreq server

My recent patch [1] to qemu-xen-traditional removes the last use of the
'default' ioreq server in Xen. (This is a catch-all ioreq server that is
used if no explicitly registered I/O range is targeted).

This patch can be applied once that patch is committed, to remove the
(>100 lines of) redundant code in Xen.

NOTE: The removal of the special case for HVM_PARAM_DM_DOMAIN in
      hvm_allow_set_param() is not directly related to removal of
      default ioreq servers. It could have been cleaned up at any time
      after commit 9a422c03 "x86/hvm: stop passing explicit domid to
      hvm_create_ioreq_server()". It is now added to the new
      deprecated sets introduced by this patch.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg00270.html

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/nestedhvm: provide some stubs for p2m code

Make two functions static inline so that they can be referenced in p2m
code. Check nestedhvm is enabled before calling
nestedhvm_vmcx_flushtlb (which also has a side effect of not issuing
unnecessary IPIs for non-nested case).

While moving, reformat code and use proper boolean.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm/shadow: split out HVM only code

Move the code previously enclosed in CONFIG_HVM into its own file.

Note that although some code explicitly checks is_hvm_*, which hints it
can be used for PV too, I can't find a code path that would be the
case.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>

x86/mm/shadow: make it build with !CONFIG_HVM

Enclose HVM only emulation code under CONFIG_HVM. Add some BUG()s to
catch any issue.

Note that although some code checks is_hvm_*, which hints it can be
called for PV as well, I can't find such paths.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>

x86/vm_event: put vm_event_fill_regs under CONFIG_HVM

Ideally the HVM specific part of VM event should be moved into hvm/ at
some point, but this will do for now.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

x86/mem_access: put HVM only function under CONFIG_HVM

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

x86: guard HAS_VPCI with CONFIG_HVM

VPCI is only useful for PVH / HVM guests. Ideally CONFIG_HVM should
imply !PV_SHIM_EXCLUSIVE, but we still want to build PV_SHIM_EXCLUSIVE
with CONFIG_HVM at this stage because a lot of things are still
entangled.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xenforeignmemory: work around bug in older privcmd

Versions of Linux privcmd prior to commit dc9eab6fd94d ("return -ENOTTY
for unimplemented IOCTLs") will return -EINVAL rather than the conventional
-ENOTTY for unimplemented codes. This breaks the error path in
libxenforeignmemory resource mapping, which only translates ENOTTY into
EOPNOTSUPP to inform callers of the need to use an alternative (legacy)
mechanism.

This patch adds a new 'unimplemented' [1] ioctl code into the local
privcmd header which is then used to probe for the appropriate errno to
translate in the resource mapping error path.

[1] this is a code that has, so far, never been used in any version of
privcmd and will be added to future versions of the header in the
linux source, to make sure it stays unimplemented.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
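
The probing approach can be sketched as follows. The callback type and
all demo_* names are hypothetical; the two fake callbacks stand in for
issuing the never-implemented ioctl code against an older and a newer
privcmd respectively.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback standing in for the "unimplemented" ioctl. */
typedef int (*demo_ioctl_fn)(void);

/* Older privcmd behaviour: unknown codes fail with EINVAL. */
int demo_old_privcmd(void) { errno = EINVAL; return -1; }

/* Newer privcmd behaviour: unknown codes fail with ENOTTY. */
int demo_new_privcmd(void) { errno = ENOTTY; return -1; }

/*
 * Issue a code that is known to be unimplemented and record which errno
 * this driver uses to say "no such ioctl".
 */
int demo_probe_unimplemented_errno(demo_ioctl_fn unimplemented)
{
    int rc = unimplemented();

    assert(rc < 0); /* the probe code must never succeed */
    (void)rc;
    return errno;
}

/* Translate only the probed errno into EOPNOTSUPP; pass others through. */
int demo_translate_errno(int err, int unimplemented_errno)
{
    return err == unimplemented_errno ? EOPNOTSUPP : err;
}
```

The probe runs once; afterwards the error path compares against the
recorded value instead of a hard-coded ENOTTY.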

tools: building IPXE should be determined by CONFIG_IPXE

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

QEMU_TAG update

xen/arm: p2m: Introduce a new variable removing_mapping in __p2m_set_entry

This is making the code slightly easier to understand.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: p2m: Rename ret to mfn in p2m_lookup

Cosmetic change to make clearer what is returned ('ret' is a bit
too generic).

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: guest_walk: Use lpae_is_mapping to simplify the code

!lpae_is_page(pte, level) && !lpae_is_superpage(pte, level) is
equivalent to !lpae_is_mapping(pte, level).

At the same time drop lpae_is_page(pte, level) that is now unused.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
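
The equivalence can be checked exhaustively on a simplified model. The
structure below keeps only a valid bit and a table bit, and the demo_*
helpers are assumptions that mirror the description above rather than
copies of the real lpae_* helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a page-table entry: only the two bits that matter. */
struct demo_pte {
    bool valid;
    bool table; /* at level 3 this bit marks a page, not a table */
};

bool demo_is_mapping(struct demo_pte pte, unsigned int level)
{
    if (!pte.valid)
        return false;
    return level == 3 ? pte.table : !pte.table;
}

bool demo_is_superpage(struct demo_pte pte, unsigned int level)
{
    return level < 3 && demo_is_mapping(pte, level);
}

bool demo_is_page(struct demo_pte pte, unsigned int level)
{
    return level == 3 && pte.valid && pte.table;
}

/* Returns true if both formulations agree for every modelled input. */
bool demo_forms_agree(void)
{
    for (unsigned int level = 0; level <= 3; level++)
        for (int v = 0; v <= 1; v++)
            for (int t = 0; t <= 1; t++) {
                struct demo_pte pte = { v != 0, t != 0 };
                bool longhand = !demo_is_page(pte, level) &&
                                !demo_is_superpage(pte, level);

                if (longhand != !demo_is_mapping(pte, level))
                    return false;
            }
    return true;
}
```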

xen/arm: Rename lpae_valid to lpae_is_valid

This will help to keep the naming consistent across all lpae helpers.

No functional change intended.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Rework lpae_table

Currently, lpae_table can only work on entry from any level other than
3. Make it work with any level by extending the prototype to pass the
level.

At the same time, rename the function to lpae_is_table so naming stays
consistent across all lpae_* helpers.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Rework lpae_mapping

Currently, lpae_mapping can only work on entry from any level other than
3. Make it work with any level by extending the prototype to pass the
level.

At the same time, rename the function to lpae_is_mapping so naming stays
consistent across lpae_* helpers.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: p2m: Limit call to mem access code use in get_page_from_gva

Mem access has only an impact on the hardware translation between a
guest virtual address and the machine physical address. So it is not
necessary to fall back to mem access for all the other cases (e.g. when it
is not possible to acquire the page behind the MFN).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: p2m: Reduce the locking section in get_page_from_gva

The p2m lock is only necessary to prevent gvirt_to_maddr failing when
break-before-make sequence is used in the P2M update concurrently on
another pCPU. So reduce the locking section.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: cpregs: Fix typo in the documentation of TTBCR

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: cpregs: Allow HSR_CPREG* to receive more than 1 parameter

At the moment, HSR_CPREG is expected to receive only the co-processor
register name in parameter. Because the name is actually a define, this
may have been expanded by a previous macro.

Rather than imposing the use of _HSR_CPREG* in such cases, allow
HSR_CPREG to receive more than 1 parameter.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
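
The preprocessor behaviour behind this change can be sketched as below.
All DEMO_* macros, and the bit layout of the encoding, are invented for
the example; only the variadic forwarding technique is the point.

```c
#include <assert.h>

/* Hypothetical encoding of five coprocessor fields into one word. */
#define DEMO_ENCODE_(cp, op1, crn, crm, op2) \
    (((cp) << 16) | ((op1) << 12) | ((crn) << 8) | ((crm) << 4) | (op2))

/*
 * Variadic front end: it accepts either an unexpanded register name or
 * the five fields that the name has already been expanded to.
 */
#define DEMO_ENCODE(...) DEMO_ENCODE_(__VA_ARGS__)

/* A register name is itself a define that expands to five fields. */
#define DEMO_REG 15, 0, 2, 0, 2

/* An outer macro expands its argument before DEMO_ENCODE sees it. */
#define DEMO_CASE(reg) DEMO_ENCODE(reg)

unsigned int demo_direct(void)  { return DEMO_ENCODE(DEMO_REG); }
unsigned int demo_wrapped(void) { return DEMO_CASE(DEMO_REG); }
```

Had DEMO_ENCODE been declared with a single named parameter, the use
through DEMO_CASE would fail to compile, because by then the name has
already turned into five comma-separated arguments.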

xen/arm: rename acpi_make_chosen_node to make_chosen_node

acpi_make_chosen_node is actually generic and can be reused. Rename it
to make_chosen_node and make it available to non-ACPI builds.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>

xen/arm: move evtchn_allocate call out of make_hypervisor_node

In the case of domUs, evtchn_irq is allocated by arch_domain_create and
set to GUEST_EVTCHN_PPI.

To make make_hypervisor_node more reusable, move the call to
evtchn_allocate out of make_hypervisor_node, to the dom0 specific caller
(handle_node).

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>

xen/arm: move a few DT related defines to public/device_tree_defs.h

Move a few constants defined by libxl_arm.c to
xen/include/public/device_tree_defs.h, so that they can be used from Xen
and libxl. Prepend GUEST_ to avoid conflicts.

Move the DT_IRQ_TYPE* definitions from libxl_arm.c to
public/device_tree_defs.h. Use them in Xen where appropriate.

Re-define the existing Xen internal IRQ_TYPEs as DT_IRQ_TYPEs: they
already happen to be the same, let's make it clear.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
CC: ian.jackson@eu.citrix.com

xen/arm: do not pass dt_host to make_memory_node and make_hypervisor_node

In order to make make_memory_node and make_hypervisor_node more
reusable, do not pass them dt_host. As they only use it to calculate
addrcells and sizecells, pass addrcells and sizecells directly.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>

xen/arm: drivers: scif: Remove unused #define-s

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>

x86/oprofile: put SVM only code under CONFIG_HVM

The code snippet in question is to detect NMI held by SVM until STGI
is called. When Xen doesn't even support HVM guests there is no need
to check svm_stgi_label.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mtrr: move is_var_mtrr_overlapped

Move it to x86 generic code. While at it, use proper boolean type and
fix some cosmetic issues.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/amd: skip OSVW function calls if !CONFIG_HVM

The two functions are not needed when HVM is not supported in the
hypervisor.

Note that using hvm_enabled won't work because early_microcode_init
gets to cpu_request_microcode before hvm_enabled is set in presmp init
call stage.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/vmce: enclose HVM load / save code in CONFIG_HVM

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/pt: add HVM check to XEN_DOMCTL_unbind_pt_irq

Its counterpart is HVM only. Add the check to help dead code
elimination to figure out the call to pt_irq_destroy_bind is not
needed when HVM is not enabled.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: don't reference hvm_funcs directly

It is generally not a good idea to reference the internal data
structure of another subsystem directly. Introduce a wrapper
function for the invlpg hook.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/vvmx: make get_shadow_eptp static function

Its callers live within the same file.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: HVM_FEP should depend on HVM

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

xen: fix building !CONFIG_LOCK_PROFILE

The init function shouldn't be built or called at all when
!CONFIG_LOCK_PROFILE.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

libxl_qmp: Disable beautify for QMP generated cmd

There is no need for it.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl_qmp: Simplify qmp_response_type() prototype

Remove the libxl__qmp_handler* argument so the function can be reused
later in a different context.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl_json: libxl__json_object_to_json

Allow generating a JSON string from a libxl__json_object,
useful for debugging.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl_json: Enable yajl_allow_trailing_garbage

This allows parsing a string that is not NUL-terminated. With that
option disabled, YAJL v2 would look ahead on completion to find out if
there is more to parse.

YAJL v1 doesn't have this behavior.

Any function that allocates a yajl_handle via this function either parses
a NUL-terminated string, or provides a proper length. So change the
default and allow garbage (like a different JSON document) after the end
of the data to parse.

This is important for the QMP client, as there could be more than one
message to parse, and YAJL would consider the next message to be garbage
and throw an error.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl_dm: Add libxl__qemu_qmp_path()

... which generates the path to a QMP socket that libxl uses.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl_json: constify libxl__json_object_to_yajl_gen arguments

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_qmp: Remove unused yajl_ctx from handler

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: Add libxl__prepare_sockaddr_un() helper

There are going to be a few more users that want to use a UNIX socket.
This helper prepares the `struct sockaddr_un` and checks that the path
isn't too long.

Also start to use it in libxl_qmp.c.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
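
A helper of this kind might look roughly like the sketch below. The name
demo_prepare_sockaddr_un(), its signature and the -ENAMETOOLONG return
value are assumptions for illustration, not the libxl interface.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill in `un` for `path`. Returns 0 on success, or -ENAMETOOLONG if
 * the path plus its terminating NUL does not fit in sun_path.
 */
int demo_prepare_sockaddr_un(struct sockaddr_un *un, const char *path)
{
    size_t len = strlen(path);

    assert(un != NULL);

    if (len >= sizeof(un->sun_path))
        return -ENAMETOOLONG;

    memset(un, 0, sizeof(*un));
    un->sun_family = AF_UNIX;
    memcpy(un->sun_path, path, len + 1); /* includes the NUL */
    return 0;
}
```

Checking the length up front avoids a silently truncated socket path,
which would otherwise connect to, or create, the wrong file.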

libxl_qmp: Move struct sockaddr_un variable to qmp_open()

This variable is only used once, no need to keep it in the handler.

Also fix coding style (remove space after sizeof).
And allow strncpy to use all the space in sun_path.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

tools: fix uninstall: tests/x86_emulator, Linux hotplug

Fixing top-level "make uninstall":

tools/tests/x86_emulator is missing an uninstall target, which causes
failure. Trivial to add one since it installs nothing, so do that.

Linux hotplug uninstall returns success but doesn't actually remove what
it installed. The Makefile variables are obfuscating incorrect logic, so
strip them out and match existing code for xen-watchdog which does work.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

libgnttab: Add support for Linux dma-buf

Add support for Linux grant device driver extension which allows
converting existing dma-buf's into an array of grant references
and vice versa. This is only implemented for Linux as other OSes
have no Linux dma-buf support.

Bump gnttab library minor version to 3.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

MAINTAINERS: add myself as a reviewer for x86 patches

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

automation: build with debian unstable

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>

tools/tests: fix an xs-test.c issue

The ret variable can be used uninitialised when iters is 0. Initialise
ret at the beginning to fix this issue.

Reported-by: Steven Haigh <netwiz@crc.id.au>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/kdd: work around gcc 8.1 bug

Gcc 8.1 has a bug that causes kdd to fail to build. Rewrite the code to
work around that bug.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86827

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xenpmd: make 32 bit gcc 8.1 non-debug build work

32 bit gcc 8.1 non-debug build yields:

xenpmd.c:354:23: error: '%02x' directive output may be truncated writing between 2 and 8 bytes into a region of size 3 [-Werror=format-truncation=]
     snprintf(val, 3, "%02x",
                       ^~~~
xenpmd.c:354:22: note: directive argument in the range [40, 2147483778]
     snprintf(val, 3, "%02x",
                      ^~~~~~
xenpmd.c:354:5: note: 'snprintf' output between 3 and 9 bytes into a destination of size 3
     snprintf(val, 3, "%02x",
     ^~~~~~~~~~~~~~~~~~~~~~~~
              (unsigned int)(9*4 +
              ~~~~~~~~~~~~~~~~~~~~
                             strlen(info->model_number) +
                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                             strlen(info->serial_number) +
                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                             strlen(info->battery_type) +
                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                             strlen(info->oem_info) + 4));
                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All info->* used in calculation are 32 bytes long, and the parsing
code makes sure they are null-terminated, so the end result of the
expression won't exceed 255, which fits into 3 bytes in hexadecimal
format (two digits plus the terminating NUL).

Add an assertion to make gcc happy.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
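
The bound and the assertion can be sketched as follows.
demo_format_length() and DEMO_FIELD_SIZE are hypothetical; the
arithmetic mirrors the expression quoted in the compiler output above,
with each string field assumed to be NUL-terminated within 32 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_FIELD_SIZE 32 /* each string field, including its NUL */

/*
 * Format the total length as two hex digits. Each field holds at most
 * DEMO_FIELD_SIZE - 1 characters, so the sum is at most
 * 9*4 + 4*31 + 4 = 164. The assertion states a bound of 255 explicitly.
 */
void demo_format_length(char val[3], const char *model, const char *serial,
                        const char *type, const char *oem)
{
    unsigned int len = (unsigned int)(9 * 4 + strlen(model) +
                                      strlen(serial) + strlen(type) +
                                      strlen(oem) + 4);

    assert(len < 256);
    snprintf(val, 3, "%02x", len);
}
```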

tools: update ipxe changeset

This placates gcc 8.1. The commit comes from ipxe master branch as of
July 25, 2018.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl/arm: Fix build on arm64 + acpi w/ gcc 8.2

Add zero-padding to #defined ACPI table strings that are copied.
Provides sufficient characters to satisfy the length required to
fully populate the destination and prevent array-bounds warnings.
Add BUILD_BUG_ON sizeof checks for compile-time length checking.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
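
The padding technique can be sketched as below. The structure, the
DEMO_* macro names and the field widths are assumptions for the example,
and _Static_assert stands in for BUILD_BUG_ON.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table header with fixed-width ID fields. */
struct demo_header {
    char oem_id[6];
    char oem_table_id[8];
};

/*
 * Pad each literal with explicit "\0" so that, together with the
 * implicit terminator, sizeof(literal) equals the field width.
 */
#define DEMO_OEM_ID       "Xen\0\0"     /* 3 + 2 + 1 = 6 bytes */
#define DEMO_OEM_TABLE_ID "ARM\0\0\0\0" /* 3 + 4 + 1 = 8 bytes */

_Static_assert(sizeof(DEMO_OEM_ID) ==
               sizeof(((struct demo_header *)0)->oem_id),
               "OEM ID literal must fill the field");
_Static_assert(sizeof(DEMO_OEM_TABLE_ID) ==
               sizeof(((struct demo_header *)0)->oem_table_id),
               "OEM table ID literal must fill the field");

void demo_fill_header(struct demo_header *h)
{
    /* Copies exactly the field width and never reads past the literal. */
    memcpy(h->oem_id, DEMO_OEM_ID, sizeof(h->oem_id));
    memcpy(h->oem_table_id, DEMO_OEM_TABLE_ID, sizeof(h->oem_table_id));
}
```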

x86/mmcfg: rename pt_pci_init() and call it from acpi_mmcfg_init()

Given what pt_pci_init() actually does, rename it properly and move its
declaration to pci.h. Move the only call into acpi_mmcfg_init().

No functional change.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Tested-by: Gopalasetty, Manoj <manoj.gopalasetty@hpe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

libxc: copy back the result of XEN_DOMCTL_createdomain

Fixes the ARM guest boot breakage introduced by 54ed251dc7.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

rangeset: make inquiry functions tolerate NULL inputs

Rather than special casing the ->iomem_caps check in x86's
get_page_from_l1e() for the dom_xen case, let's be more tolerant in
general, along the lines of rangeset_is_empty(): A never allocated
rangeset can't possibly contain or overlap any range.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
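
The NULL-tolerant behaviour can be sketched on a toy rangeset. The
array-backed structure and the demo_* functions are hypothetical and
far simpler than the real rangeset implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal rangeset: an array of inclusive ranges. */
struct demo_range {
    unsigned long s, e;
};

struct demo_rangeset {
    const struct demo_range *ranges;
    size_t nr;
};

/* True if [s, e] lies entirely inside one stored range. */
bool demo_rangeset_contains(const struct demo_rangeset *r,
                            unsigned long s, unsigned long e)
{
    assert(s <= e);

    if (r == NULL) /* never allocated: contains nothing */
        return false;

    for (size_t i = 0; i < r->nr; i++)
        if (r->ranges[i].s <= s && e <= r->ranges[i].e)
            return true;
    return false;
}

/* True if [s, e] shares at least one value with a stored range. */
bool demo_rangeset_overlaps(const struct demo_rangeset *r,
                            unsigned long s, unsigned long e)
{
    assert(s <= e);

    if (r == NULL) /* never allocated: overlaps nothing */
        return false;

    for (size_t i = 0; i < r->nr; i++)
        if (r->ranges[i].s <= e && s <= r->ranges[i].e)
            return true;
    return false;
}
```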

dom0/pvh: change the order of the MMCFG initialization

So it's done before the iommu is initialized. This is required in
order to be able to fetch the MMCFG regions from the domain struct.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86: remove page.h and processor.h inclusion from asm_defns.h

Subsequent changes require this (too wide anyway imo) dependency to be
dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/HVM: correct an inverted check in hvm_load()

Clearly we want to put a vCPU to sleep if it is _not_ already down.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: make arch_set_info_guest() match comments in load_segments()

For both fs_base and gs_base_user, there are comments saying "This can
only be non-zero if selector is NULL." While save_segments() ensures
this, so far arch_set_info_guest() didn't. Make behavior consistent
(attaching comments identical to those in save_segments()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/setup: Avoid OoB E820 lookup when calculating the L1TF safe address

A number of corner cases (most obviously, no-real-mode and no Multiboot memory
map) can end up with e820_raw.nr_map being 0, at which point the L1TF
calculation will underflow.

Spotted by Coverity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
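
The underflow and its guard can be sketched as follows. The map
structure and demo_last_entry_end() are hypothetical and only model the
"index the last entry" step, not the L1TF calculation itself.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_E820_MAX 8

struct demo_e820_entry {
    uint64_t addr, size;
};

struct demo_e820_map {
    unsigned int nr_map;
    struct demo_e820_entry map[DEMO_E820_MAX];
};

/*
 * Return the end address of the last entry, or 0 for an empty map.
 * Without the nr_map check, `nr_map - 1` wraps to UINT_MAX when nr_map
 * is 0 and the array access lands far out of bounds.
 */
uint64_t demo_last_entry_end(const struct demo_e820_map *m)
{
    if (m->nr_map == 0)
        return 0;

    assert(m->nr_map <= DEMO_E820_MAX);

    const struct demo_e820_entry *last = &m->map[m->nr_map - 1];

    return last->addr + last->size;
}
```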

libxl: fix ARM build after 54ed251dc7

Commit "tools: Rework xc_domain_create() to take a full
xen_domctl_createdomain" failed to replace one further instance of
xc_config in libxl__arch_domain_save_config().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86/mmcfg: remove redundant code in pci_mmcfg_reject_broken()

No functional change.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

gnttab/ARM: properly implement gnttab_create_status_page()

Prevent the "BUG_ON(page_get_owner(pg) != d)" in
gnttab_unpopulate_status_frames() from triggering.

Reported-by: 王磊 <lei19.wang@samsung.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

x86/hvm/emulate: make sure rep I/O emulation does not cross GFN boundaries

When emulating a rep I/O operation it is possible that the ioreq will
describe a single operation that spans multiple GFNs. This is fine as long
as all those GFNs fall within an MMIO region covered by a single device
model, but unfortunately the higher levels of the emulation code do not
guarantee that. This is something that should almost certainly be fixed,
but in the meantime this patch makes sure that MMIO is truncated at GFN
boundaries and hence the appropriate device model is re-evaluated for each
target GFN.

NOTE: This patch does not deal with the case of a single MMIO operation
spanning a GFN boundary. That is more complex to deal with and is
deferred to a subsequent patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Convert calculations to be 32-bit only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
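
Truncating a rep operation at a page (GFN) boundary can be sketched as
below. demo_clip_reps() and the 4 KiB page size are assumptions; the
function only models the arithmetic and is not the emulator code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/*
 * Return how many of `reps` accesses of `size` bytes, starting at
 * `addr`, stay inside the page that contains `addr`. `df` selects
 * descending addresses. The result is 0 when the first access already
 * straddles the page boundary.
 */
unsigned int demo_clip_reps(uint64_t addr, unsigned int size,
                            unsigned int reps, bool df)
{
    unsigned int offset = (unsigned int)(addr & (DEMO_PAGE_SIZE - 1));
    unsigned int fit;

    assert(size > 0 && size <= DEMO_PAGE_SIZE);

    if (offset + size > DEMO_PAGE_SIZE)
        fit = 0;
    else if (df)
        fit = offset / size + 1;                /* walk down to page start */
    else
        fit = (DEMO_PAGE_SIZE - offset) / size; /* walk up to page end */

    return reps < fit ? reps : fit;
}
```

The caller would issue the clipped count, then re-evaluate the target
for the remaining repetitions on the next page. A result of 0
corresponds to the single-operation case that the NOTE above defers.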

xen/evtchn: Pass max_evtchn_port into evtchn_init()

... rather than setting it up once domain_create() has completed. This
involves constructing a default value for dom0.

No practical change in functionality.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

xen/domctl: Merge set_max_evtchn into createdomain

set_max_evtchn is somewhat weird.  It was introduced with the event_fifo work,
but has never been used.  Still, it is a bound on resources consumed by the
event channel infrastructure, and should be part of createdomain, rather than
editable after the fact.

Drop XEN_DOMCTL_set_max_evtchn completely (including XSM hooks and libxc
wrappers), and retain the functionality in XEN_DOMCTL_createdomain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

tools: Rework xc_domain_create() to take a full xen_domctl_createdomain

In future patches, the structure will be extended with further information,
and this is far cleaner than adding extra parameters.

The python stubs are the only user which passes NULL for the existing config
option (which is actually the arch substructure). Therefore, the #ifdefary
moves to compensate.

For libxl, pass the full config object down into
libxl__arch_domain_{prepare,save}_config(), as there are in practice arch
specific settings in the common part of the structure (flags s3_integrity and
oos_off specifically).

No practical change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

tools/ocaml: Pass a full domctl_create_config into stub_xc_domain_create()

The underlying C function is about to make the same change, and the structure
is going to gain extra fields.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

x86/pv: Use xmemdup() for cpuidmasks, rather than opencoding it

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/ptwr: Misc cleanup to ptwr_emulated_update()

All but one user wants mfn as mfn_t, so switch its type. offset is only ever
used when multiplied by 8, so fold that into its initial calculation. Fold all
the pointer arithmetic on pl1e together, to avoid needless casts.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86: write to correct variable in parse_pv_l1tf()

Apparently a copy-and-paste mistake.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/hvm/ioreq: MMIO range checking completely ignores direction flag

hvm_select_ioreq_server() is used to route an ioreq to the appropriate
ioreq server. For MMIO this is done by comparing the range of the ioreq
to the ranges registered by the device models of each ioreq server.
Unfortunately the calculation of the range of the ioreq completely ignores
the direction flag and thus may calculate the wrong range for comparison.
Thus the ioreq may either be routed to the wrong server or erroneously
terminated by null_ops.
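
A minimal sketch of a direction-aware range calculation follows, in Python for
illustration only; it is not the code from the patch. The IoReq class and the
helper names are hypothetical. It assumes that addr names the first access and
that, with the direction flag set, each repetition moves to a lower address,
as for x86 string instructions.

```python
from dataclasses import dataclass


@dataclass
class IoReq:
    """Hypothetical stand-in for the ioreq fields that matter here."""
    addr: int    # address of the first access
    size: int    # bytes per access
    count: int   # number of repetitions
    df: bool     # direction flag: True means addresses decrease


def mmio_first_byte(p):
    # With the direction flag set, later repetitions sit below addr.
    return p.addr - (p.count - 1) * p.size if p.df else p.addr


def mmio_last_byte(p):
    # With the direction flag set, the highest byte belongs to the first access.
    return p.addr + p.size - 1 if p.df else p.addr + p.count * p.size - 1


def in_range(p, start, end):
    """True if the whole request lies inside the inclusive range [start, end]."""
    return start <= mmio_first_byte(p) and mmio_last_byte(p) <= end
```

With a range registered at 0x1000-0x1fff, a four-repetition, 4-byte request at
0x1000 fits when df is clear, but reaches down to 0xff4 when df is set and so
must not be matched against that range.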

NOTE: The patch also fixes whitespace in the switch statement to make it
style compliant.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

libs/foreignmemory: Avoid printing an error for ENOTSUPP

Resource mapping is not supported on Arm and results in an error message
at every guest boot:

xenforeignmemory: error: ioctl failed: Operation not supported

Hide the error message when errno is ENOTSUPP.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: start pvqemu when 9pfs is requested

PV 9pfs requires the PV backend in QEMU. Make sure that libxl knows it.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

pygrub: fix package version

Make the version in setup.py agree with PYGRUB_VER.

Signed-off-by: Simon Rowe <simon.rowe@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

xen: fix stale PVH comment

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

xl.conf: Add global affinity masks

XSA-273 involves one hyperthread being able to use Spectre-like
techniques to "spy" on another thread.  The details are somewhat
complicated, but the upshot is that after all Xen-based mitigations
have been applied:

* PV guests cannot spy on sibling threads
* HVM guests can spy on sibling threads

(NB that for purposes of this vulnerability, PVH and HVM guests are
identical.  Whenever this comment refers to 'HVM', this includes PVH.)

There are many possible mitigations to this, including disabling
hyperthreading entirely.  But another solution would be:

* Specify some cores as PV-only, others as PV or HVM
* Allow HVM guests to only run on thread 0 of the "HVM-or-PV" cores
* Allow PV guests to run on the above cores, as well as any thread of the PV-only cores.

For example, suppose you had 16 threads across 8 cores (0-7).  You
could specify 0-3 as PV-only, and 4-7 as HVM-or-PV.  Then you'd set
the affinity of the HVM guests as follows (binary representation):

0000000010101010

And the affinity of the PV guests as follows:

1111111110101010

In order to make this easy, this patch introduces three "global affinity
masks", placed in xl.conf:

    vm.cpumask
    vm.hvm.cpumask
    vm.pv.cpumask

These are parsed just like the 'cpus' and 'cpus_soft' options in the
per-domain xl configuration files.  The resulting mask is AND-ed with
whatever mask results at the end of the xl configuration file.
`vm.cpumask` would be applied to all guest types, `vm.hvm.cpumask`
would be applied to HVM and PVH guest types, and `vm.pv.cpumask`
would be applied to PV guest types.

The idea would be that to implement the above mask across all your
VMs, you'd simply add the following two lines to the configuration
file:

    vm.hvm.cpumask=8,10,12,14
    vm.pv.cpumask=0-8,10,12,14

See xl.conf manpage for details.
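
The masking rule above can be sketched as follows, in Python for illustration
only; this is not xl's implementation. All function names are hypothetical,
and parse_cpulist() handles only the plain numbers and ranges used in the
example, not the full syntax that the 'cpus' option accepts.

```python
def parse_cpulist(spec):
    """Parse a simple list such as "0-8,10,12,14" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def global_masks_for(guest_type, conf):
    """Pick the global masks that apply to one guest type ("pv", "hvm", "pvh")."""
    names = ["vm.cpumask"]
    if guest_type in ("hvm", "pvh"):
        names.append("vm.hvm.cpumask")
    elif guest_type == "pv":
        names.append("vm.pv.cpumask")
    # Masks that are absent from the configuration impose no restriction.
    return [parse_cpulist(conf[n]) for n in names if n in conf]


def effective_affinity(domain_cpus, guest_type, conf):
    """AND the per-domain CPU set with every applicable global mask."""
    result = set(domain_cpus)
    for mask in global_masks_for(guest_type, conf):
        result &= mask
    return result


def as_bits(cpus, ncpus):
    """Render a CPU set with CPU 0 leftmost, as in the example above."""
    return "".join("1" if i in cpus else "0" for i in range(ncpus))
```

For the two-line configuration above, a guest whose own configuration allows
all 16 threads ends up restricted to threads 8, 10, 12 and 14 if it is HVM or
PVH, and to threads 0-8, 10, 12 and 14 if it is PV.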

This is part of XSA-273 / CVE-2018-3646.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>

x86: Make "spec-ctrl=no" a global disable of all mitigations

In order to have a simple and easy to remember means to suppress all the
more or less recent workarounds for hardware vulnerabilities, force
settings not controlled by "spec-ctrl=" also to their original defaults,
unless they've been forced to specific values already by earlier command
line options.

This is part of XSA-273.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/spec-ctrl: Introduce an option to control L1D_FLUSH for HVM HAP guests

This mitigation requires up-to-date microcode. It is enabled by default on
affected hardware if available, and is used for HVM guests.

The default for SMT/Hyperthreading is far more complicated to reason about,
not least because we don't know if the user is going to want to run any HVM
guests to begin with. If an explicit default isn't given, nag the user to
perform a risk assessment and choose an explicit default, and leave other
configuration to the toolstack.

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/msr: Virtualise MSR_FLUSH_CMD for guests

Guests (outside of the nested virt case, which isn't supported yet) don't need
L1D_FLUSH for their L1TF mitigations, but offering/emulating MSR_FLUSH_CMD is
easy and doesn't pose an issue for Xen.

The MSR is offered to HVM guests only. PV guests attempting to use it would
trap for emulation, and the L1D cache would fill long before the return to
guest context. As such, PV guests can't make any use of the L1D_FLUSH
functionality.

This is part of XSA-273 / CVE-2018-3646.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/spec-ctrl: CPUID/MSR definitions for L1D_FLUSH

This is part of XSA-273 / CVE-2018-3646.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/pv: Force a guest into shadow mode when it writes an L1TF-vulnerable PTE

See the comment in shadow.h for an explanation of L1TF and the safety
consideration of the PTEs.

In the case that CONFIG_SHADOW_PAGING isn't compiled in, crash the domain
instead. This allows well-behaved PV guests to function, while preventing
L1TF from being exploited. (Note: PV guest kernels which haven't been updated
with L1TF mitigations will likely be crashed as soon as they try paging a
piece of userspace out to disk.)

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/mm: Plumbing to allow any PTE update to fail with -ERESTART

Switching to shadow mode is performed in tasklet context.  To facilitate this,
we schedule the tasklet, then create a hypercall continuation to allow the
switch to take place.

As a consequence, the x86 mm code needs to cope with an L1e operation being
continuable.  do_mmu{,ext}_op() may no longer assert that a continuation
doesn't happen on the final iteration.
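
The consequence for batched operations can be sketched as below, in Python for
illustration only. This shows the general restartable-batch pattern, not Xen's
do_mmu_update(); run_batch(), apply_op and the ERESTART value used here are
assumptions made for the sketch.

```python
ERESTART = 85  # sentinel value chosen for this sketch


def run_batch(ops, start, apply_op):
    """Apply ops[start:] in order and return (rc, done).

    rc is 0 when every operation has completed, -ERESTART when the caller
    must re-invoke the batch starting at index `done`, and any other
    negative value on a hard error.
    """
    done = start
    for op in ops[start:]:
        rc = apply_op(op)
        if rc == -ERESTART:
            # The operation has not taken effect yet. This can happen on
            # the final entry too, so a restart there must be allowed.
            return -ERESTART, done
        if rc:
            return rc, done
        done += 1
    return 0, done
```

A caller keeps re-invoking run_batch() with the returned `done` index until rc
is no longer -ERESTART, so an operation that asks for a restart is retried
rather than skipped.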

To handle the arguments correctly on continuation, compat_update_va_mapping*()
may no longer call into their non-compat counterparts.  Move the compat
functions into mm.c rather than exporting __do_update_va_mapping() and
{get,put}_pg_owner(), and fix an unsigned long/int inconsistency with
compat_update_va_mapping_otherdomain().

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/shadow: Infrastructure to force a PV guest into shadow mode

To mitigate L1TF, we cannot alter an architecturally-legitimate PTE a PV guest
chooses to write, but we can force the PV domain into shadow mode so Xen
controls the PTEs which are reachable by the CPU pagewalk.

Introduce a new shadow mode, PG_SH_forced, and a tasklet to perform the
transition. Later patches will introduce the logic to enable this mode at the
appropriate time.

To simplify vcpu cleanup, make tasklet_kill() idempotent with respect to
tasklet_init(), which involves adding a helper to check for an uninitialised
list head.

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>