
drivers/acpi: Drop "ERST table was not found" message

ERST isn't a mandatory table, and also isn't very common to find. The message
is unnecessary noise during boot. Furthermore, it is redundant with the list
of found ACPI tables printed just ahead.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/vpmu: Drop "VPMU: disabled" message

Printing "$foo disabled" is unnecessary noise during boot. All other VPMU
settings emit a message, so this doesn't result in any ambiguity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

tools/libs: put common Makefile parts into new libs.mk

The Makefiles below tools/libs have a lot in common. Put those common
parts into a new libs.mk and include that from the specific Makefiles.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>

vpci: honor read-only devices

Don't allow the hardware domain write access to the PCI config space of
devices marked as read-only.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

sysctl / libxl: report whether IOMMU/HAP page table sharing is supported

This patch defines a new bit reported in the hw_cap field of struct
xen_sysctl_physinfo to indicate whether the platform supports sharing of
HAP page tables (i.e. the P2M) with the IOMMU. This informs the toolstack
whether the domain needs extra memory to store discrete IOMMU page tables
or not.

NOTE: This patch makes sure iommu_hap_pt_shared is clear if HAP is not
supported or the IOMMU is disabled, and defines it to false if
!CONFIG_HVM.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <julien.grall@arm.com>

use is_iommu_enabled() where appropriate...

...rather than testing the global iommu_enabled flag and ops pointer.

Now that there is a per-domain flag indicating whether the domain is
permitted to use the IOMMU (which determines whether the ops pointer will
be set), many tests of the global iommu_enabled flag and ops pointer can
be translated into tests of the per-domain flag. Some of the other tests of
purely the global iommu_enabled flag can also be translated into tests of
the per-domain flag.

NOTE: The comment in iommu_share_p2m_table() is also fixed; need_iommu()
      disappeared some time ago. Also, whilst the style of the 'if' in
      flask_iommu_resource_use_perm() is fixed, I have not translated any
      instances of u32 into uint32_t to keep consistency. IMO such a
      translation would be better done globally for the source module in
      a separate patch.
      The change to the definition of iommu_call() is to keep the PV shim
      build happy. Without this change it will fail to compile with errors
      of the form:

iommu.c:361:32: error: unused variable ‘hd’ [-Werror=unused-variable]
     const struct domain_iommu *hd = dom_iommu(d);
                                     ^~

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

domain: introduce XEN_DOMCTL_CDF_iommu flag

This patch introduces a common domain creation flag to determine whether
the domain is permitted to make use of the IOMMU. Currently the flag is
always set for both dom0 and any domU created by libxl if the IOMMU is
globally enabled (i.e. iommu_enabled == 1). sanitise_domain_config() is
modified to reject the flag if !iommu_enabled.

A new helper function, is_iommu_enabled(), is added to test the flag and
iommu_domain_init() will return immediately if !is_iommu_enabled(). This is
slightly different to the previous behaviour based on !iommu_enabled where
the call to arch_iommu_domain_init() was made regardless, however it appears
that this call was only necessary to initialize the dt_devices list for ARM
such that iommu_release_dt_devices() can be called unconditionally by
domain_relinquish_resources(). Adding a simple check of is_iommu_enabled()
into iommu_release_dt_devices() keeps this unconditional call working.

No functional change should be observed with this patch applied.

Subsequent patches will allow the toolstack to control whether use of the
IOMMU is enabled for a domain.

NOTE: The introduction of the is_iommu_enabled() helper function might
      seem excessive but its use is expected to increase with subsequent
      patches. Also, having iommu_domain_init() bail before calling
      arch_iommu_domain_init() is not strictly necessary, but I think the
      consequent addition of the call to is_iommu_enabled() in
      iommu_release_dt_devices() makes the code clearer.
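
The flow described above can be sketched roughly as follows. This is a
minimal illustration and not the actual Xen code: the ex_ prefixed
names, the bit value of the flag and the arch_initialised field are all
assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value standing in for XEN_DOMCTL_CDF_iommu. */
#define EX_CDF_IOMMU (1u << 3)

/* Stand-in for the global iommu_enabled flag. */
static bool ex_iommu_enabled;

/* Minimal stand-in for struct domain. */
struct ex_domain {
    uint32_t options;        /* domain creation flags */
    bool arch_initialised;   /* set once arch IOMMU set-up has run */
};

/* Per-domain test, in the spirit of is_iommu_enabled(). */
static bool ex_is_iommu_enabled(const struct ex_domain *d)
{
    assert(d);
    return (d->options & EX_CDF_IOMMU) != 0;
}

/* Reject the flag when the IOMMU is globally disabled. */
static int ex_sanitise_domain_config(uint32_t flags)
{
    if ((flags & EX_CDF_IOMMU) && !ex_iommu_enabled)
        return -EINVAL;

    return 0;
}

/* Return immediately for domains not permitted to use the IOMMU. */
static int ex_iommu_domain_init(struct ex_domain *d)
{
    if (!ex_is_iommu_enabled(d))
        return 0;

    d->arch_initialised = true;   /* placeholder for the arch hook */
    return 0;
}
```

With the global flag clear, ex_sanitise_domain_config(EX_CDF_IOMMU)
fails with -EINVAL, and a domain created without the flag leaves
ex_iommu_domain_init() without any arch set-up having been done.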

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

sched: populate cpupool0 only after all cpus are up

Simplify cpupool initialization by populating cpupool0 with cpus only
after all cpus are up. This avoids having to call the cpu notifier
directly for cpu 0.

With that in place there is no need to create cpupool0 earlier, so
do that just before assigning the cpus. Initialize free cpus with all
online cpus at that time in order to be able to add the cpu notifier
late, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

spinlocks: print lock profile info in panic()

Print the lock profile data when the system crashes and add some more
information for each lock data (lock address, cpu holding the lock).
While at it use the PRI_stime format specifier for printing time data.

This is especially beneficial for watchdog triggered crashes in case
of deadlocks.

In order to have the cpu holding the lock available let the
lock profile config option select DEBUG_LOCKS.

As printing the lock profile data will make use of locking, too, we
need to disable spinlock debugging before calling
spinlock_profile_printall() from panic().

While at it remove a superfluous #ifdef CONFIG_LOCK_PROFILE and rename
CONFIG_LOCK_PROFILE to CONFIG_DEBUG_LOCK_PROFILE.

Also move the .lockprofile.data section to init area in linker scripts
as the data is no longer needed after boot.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen: add new CONFIG_DEBUG_LOCKS option

Instead of enabling lock debugging for debug builds only, add a
dedicated Kconfig option for that purpose which defaults to DEBUG.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

spinlocks: in debug builds store cpu holding the lock

Add the cpu currently holding the lock to struct lock_debug. This makes
analysis of locking errors easier and it can be tested whether the
correct cpu is releasing a lock again.
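
As an illustration of the idea, a single-threaded sketch might look like
the following. The struct layout and all ex_ names are assumptions for
the example; the real struct lock_debug and the atomic operations of a
real spinlock are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_NO_CPU (-1)

/* Simplified stand-in for a spinlock carrying debug data. */
struct ex_lock {
    bool locked;
    int holder_cpu;   /* CPU currently holding the lock, or EX_NO_CPU */
};

static void ex_lock_init(struct ex_lock *l)
{
    l->locked = false;
    l->holder_cpu = EX_NO_CPU;
}

/* Record the acquiring CPU (no atomics, this is only a sketch). */
static void ex_lock_acquire(struct ex_lock *l, int cpu)
{
    assert(!l->locked);
    l->locked = true;
    l->holder_cpu = cpu;
}

/* Return true only if the releasing CPU is the one holding the lock. */
static bool ex_lock_release(struct ex_lock *l, int cpu)
{
    bool ok = l->locked && l->holder_cpu == cpu;

    if (ok) {
        l->locked = false;
        l->holder_cpu = EX_NO_CPU;
    }

    return ok;
}
```

A release attempted by a different CPU is reported as a failure and
leaves the lock state untouched, which is the kind of error the stored
CPU number makes detectable.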

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/PCI: read MSI-X table entry count early

Rather than doing this every time we set up interrupts for a device
anew (and then in two distinct places) fill this invariant field
right after allocating struct arch_msix.

While at it also obtain the MSI-X capability structure position just
once, in msix_capability_init(), rather than in each caller.

Furthermore take the opportunity and eliminate the multi_msix_capable()
alias of msix_table_size().
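
For reference, the entry count is derived from the Table Size field of
the MSI-X Message Control register, which holds the number of entries
minus one. A small sketch, using hypothetical ex_ names rather than the
actual Xen definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Table Size field of the MSI-X Message Control register (bits 10:0). */
#define EX_MSIX_FLAGS_QSIZE 0x07ffu

/* The field encodes N - 1, so add one to obtain the entry count. */
static unsigned int ex_msix_table_size(uint16_t control)
{
    return (control & EX_MSIX_FLAGS_QSIZE) + 1u;
}

/* Stand-in for the per-device MSI-X state. */
struct ex_arch_msix {
    unsigned int nr_entries;
};

/* Fill the invariant field once, right after allocation. */
static void ex_msix_init(struct ex_arch_msix *msix, uint16_t control)
{
    assert(msix);
    msix->nr_entries = ex_msix_table_size(control);
}
```

Because the value cannot change during the lifetime of the device, it
only needs to be read once rather than at every interrupt set-up.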

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

AMD/IOMMU: let callers of amd_iommu_alloc_intremap_table() handle errors

Additional users of the function will want to handle errors more
gracefully. Remove the BUG_ON()s and make the current caller panic()
instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

AMD/IOMMU: introduce a "valid" flag for IVRS mappings

For us to no longer blindly allocate interrupt remapping tables for
everything the ACPI tables name, we can't use struct ivrs_mappings'
intremap_table field anymore to also have the meaning of "this entry
is valid". Add a separate boolean field instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

AMD/IOMMU: don't free shared IRT multiple times

Calling amd_iommu_free_intremap_table() for every IVRS entry is correct
only in per-device-IRT mode. Use a NULL 2nd argument to indicate that
the shared table should be freed, and call the function exactly once in
shared mode.
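
A rough sketch of the calling convention follows. The types, the call
counter and the ex_ names are made up for illustration, and malloc/free
stand in for whatever allocator the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for one IVRS mapping entry. */
struct ex_mapping {
    void *intremap_table;
};

static void *ex_shared_table;        /* the single shared IRT, if any */
static unsigned int ex_free_calls;   /* counts calls, for illustration */

/* A NULL mapping means "free the shared table". */
static void ex_free_intremap_table(const void *iommu, struct ex_mapping *m)
{
    void **table = m ? &m->intremap_table : &ex_shared_table;

    (void)iommu;                     /* unused in this sketch */
    free(*table);
    *table = NULL;
    ex_free_calls++;
}

static void ex_free_all(struct ex_mapping *maps, size_t n, bool per_device)
{
    size_t i;

    assert(maps || n == 0);

    for (i = 0; i < n; i++) {
        if (per_device)
            ex_free_intremap_table(NULL, &maps[i]);
        else
            maps[i].intremap_table = NULL;   /* only an alias */
    }

    if (!per_device)
        ex_free_intremap_table(NULL, NULL);  /* exactly once */
}
```

In shared mode every entry merely aliases the one table, so freeing per
entry would free the same memory repeatedly; the sketch drops the
aliases and frees the table a single time.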

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

livepatch: always print XENLOG_ERR information (ARM, ELF)

This complements commit [1] for the ARM and livepatch_elf files.

[1] 4470efeae4 livepatch: always print XENLOG_ERR information

Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>

microcode: pass a patch pointer to apply_microcode()

apply_microcode() always loads the cached ucode patch, which forces
a patch to be stored before being loaded. Make apply_microcode()
accept a patch pointer to remove this limitation, so that a patch
can be stored after it has been loaded successfully.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

microcode/amd: call svm_host_osvw_init() in common code

Introduce a vendor hook, .end_update_percpu, for svm_host_osvw_init().
The hook function is called on each cpu after loading an update.
It is a preparation for splitting out apply_microcode() from
cpu_request_microcode().

Note that svm_host_osvw_init() should be called regardless of the
result of loading an update.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

microcode: remove pointless 'cpu' parameter

Some callbacks in microcode_ops or related functions take a cpu
id parameter. But at the current call sites, the cpu id parameter is
always equal to the current cpu id. Some of them even use an assertion
to guarantee this. Remove this redundant 'cpu' parameter.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

microcode: remove struct ucode_cpu_info

Remove the per-cpu cache field in struct ucode_cpu_info since it has
been replaced by a global cache. That would leave only one field in
ucode_cpu_info, so remove the struct altogether and store the
remaining field (cpu signature) in the per-cpu area.

The cpu status notifier is also removed. It was used to free the "mc"
field to avoid memory leak.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

microcode: clean up microcode_resume_cpu

Previously, a per-cpu ucode cache was maintained. Each CPU had its own
update cache and there might be multiple versions of microcode. Thus
microcode_resume_cpu tried its best to update microcode by loading
every update cache until a load succeeded.

But now the cache struct is simplified a lot and only a single ucode is
cached. A single invocation of ->apply_microcode() loads the cache and
updates the microcode.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

microcode: introduce a global cache of ucode patch

to replace the current per-cpu cache 'uci->mc'.

With the assumption that all CPUs in the system have the same signature
(family, model, stepping and 'pf'), a microcode update that matches one
cpu should match the others. Having differing microcode revisions on
cpus would make the system unstable and should be avoided. Hence, caching
one microcode update is good enough for all cases.

Introduce a global variable, microcode_cache, to store the newest
matching microcode update. Whenever we get a new valid microcode update,
its revision id is compared against that of the cached microcode update
to determine whether the "microcode_cache" needs to be replaced. And
this global cache is loaded to cpu in apply_microcode().

All operations on the cache are protected by 'microcode_mutex'.

Note that I deliberately avoid touching the old per-cpu cache ('uci->mc')
as I am going to remove it completely in the following patches. We copy
everything to create the new cache blob to avoid reusing some buffers
previously allocated for the old per-cpu cache. It is not so efficient,
but it is already corrected by a patch later in this series.
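
The replacement rule can be sketched as below. This is only an
illustration under simplifying assumptions: the struct, the ex_ names
and the single-word signature are hypothetical, the cache stores a
pointer instead of copying the blob, and the caller is assumed to hold
the mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a microcode update. */
struct ex_patch {
    uint32_t signature;
    uint32_t revision;
};

/* Newest matching update seen so far, or NULL. */
static const struct ex_patch *ex_microcode_cache;

/*
 * Replace the cache if the update matches the CPU signature and is
 * newer than the cached one. Returns true if the cache was replaced.
 */
static bool ex_save_patch(const struct ex_patch *patch, uint32_t cpu_sig)
{
    assert(patch);

    if (patch->signature != cpu_sig)
        return false;
    if (ex_microcode_cache &&
        patch->revision <= ex_microcode_cache->revision)
        return false;

    ex_microcode_cache = patch;
    return true;
}
```

An update with a lower or equal revision, or with a different
signature, leaves the cache as it was.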

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

microcode/amd: distinguish old and mismatched ucode in microcode_fits()

Sometimes, a ucode with a level lower than or equal to the current CPU's
patch level is useful. For example, to work around a broken BIOS which
only loads ucode for the BSP, when the BSP parses a ucode blob during
bootup, it is better to save a ucode with lower or equal level for the APs.

No functional change is made in this patch. But following patch would
handle "old ucode" and "mismatched ucode" separately.
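
One way to picture the distinction is a three-way result instead of a
boolean. The enum and function names below are hypothetical and the
comparison is reduced to a signature and a revision number; the real
matching logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical three-way result of checking an update against a CPU. */
enum ex_match_result {
    EX_OLD_UCODE,   /* matches the CPU, but is not newer */
    EX_NEW_UCODE,   /* matches the CPU and is newer */
    EX_MIS_UCODE    /* does not match the CPU at all */
};

static enum ex_match_result ex_update_match(uint32_t patch_sig,
                                            uint32_t patch_rev,
                                            uint32_t cpu_sig,
                                            uint32_t cpu_rev)
{
    if (patch_sig != cpu_sig)
        return EX_MIS_UCODE;

    return patch_rev > cpu_rev ? EX_NEW_UCODE : EX_OLD_UCODE;
}

/* An old but matching update may still be worth keeping for the APs. */
static bool ex_worth_caching(enum ex_match_result r)
{
    switch (r) {
    case EX_NEW_UCODE:
    case EX_OLD_UCODE:
        return true;
    case EX_MIS_UCODE:
        return false;
    }

    assert(!"unknown match result");
    return false;
}
```

Keeping "old" and "mismatched" apart lets a caller keep a matching
update of equal or lower level while still discarding one that does not
fit the CPU.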

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

microcode/intel: extend microcode_update_match()

to a more generic function, so that it can be used alone to check
an update against the CPU signature and current update revision.

Since enum microcode_match_result will also be used in common code
(i.e. microcode.c), it has been placed in the common header. Also
constify the parameter of microcode_sanity_check() so that it
can be called by microcode_update_match().

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

public/xen.h: update the comment explaining 'Wallclock time'

Since commit 0629adfd80e "Actually set a HVM domain's time offset when it
sets the RTC", the comment in the public header has been misleading, since
it claims that wallclock time is only updated by control software.
Moreover, the comments stating that wc_sec and wc_nsec are seconds and
nanoseconds (respectively) in UTC since the Unix epoch are bogus. Their
values are adjusted by the domain's time_offset_seconds value, which is
updated by a guest write to the emulated RTC and hence the wallclock
timezone is under guest control.

This patch attempts to bring the comment in line with reality whilst
keeping it reasonably short.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

Update my MAINTAINERS entries

My Citrix email address will expire shortly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

debugtrace: fix Arm build

Add missing #includes.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen/arm: setup: Relocate the Device-Tree later on in the boot

At the moment, the Device-Tree is relocated into xenheap while setting
up the memory subsystem. This is actually not necessary because the
early mapping is still present and we don't require the virtual address
to be stable until unflattening the Device-Tree.

So the relocation can safely be moved after the memory subsystem is
fully set up. This has the nice advantage of making the relocation
common and letting the xenheap allocator decide where to put it.

Lastly, the device-tree is not going to be used on ACPI systems. So
there is no need to relocate it and it can just be discarded.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: bootfd: Fix indentation in process_multiboot_node()

One line in process_multiboot_node() is using hard tab rather than soft
tab. So fix it!

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

scripts/add_maintainers.pl: Add logic to use V entry

Add logic to use the V section entry in THE REST for identifying xen trees.

Specifically:
* Move check until after the MAINTAINERS file has been read
* Add get_xen_maintainers_file_version() for check
* Remove top_of_tree as not needed any more
* Fail with extended error message when used out of tree

Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

MAINTAINERS: Add V section entry to allow identification of Xen file

This change provides sufficient information to allow get_maintainer.pl /
add_maintainers.pl scripts to be run on xen sister repositories such as
mini-os.git, osstest.git, etc

A suggested template for sister repositories of Xen is

========================================================
This file follows the same conventions as outlined in
xen.git:MAINTAINERS. Please refer to the file in xen.git
for more information.

THE REST
M:      MAINTAINER1 <maintainer1@email.com>
M:      MAINTAINER2 <maintainer2@email.com>
L:      xen-devel@lists.xenproject.org
S:      Supported
F:      *
F:      */
V:      xen-maintainers-1
========================================================

Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

scripts/add_maintainers.pl: Remove hardcoding

Instead of using a hardcoded location, inherit the
location from $0

Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

debugtrace: add entry when entry count is wrapping

The debugtrace entry count is a 32 bit variable, so it can wrap when
lots of trace entries are being produced. Making it wider would result
in a waste of buffer space as the printed count value would consume
more bytes when not wrapping.

So instead of letting the count value grow to huge values, let it wrap
and add a wrap counter printed in this situation. This will keep the
needed buffer space at today's value without losing a way to sort all
entries in case multiple trace buffers are involved.

Note that the wrap message will be printed before the first trace
entry in case output is switched to console early. This is on purpose
in order to enable a future support of debugtrace to console without
any allocated buffer.
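
A small sketch of the scheme, with hypothetical ex_ names: the 32 bit
sequence number is allowed to wrap, a separate counter records how often
that happened, and both together give a sort key when analysing the
output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the debugtrace sequence state. */
struct ex_trace {
    uint32_t count;   /* per-entry sequence number, allowed to wrap */
    uint32_t wraps;   /* number of times count has wrapped */
};

/*
 * Advance the sequence number and store it in *seq. Returns true when
 * the count wrapped, i.e. when a wrap entry should be emitted.
 */
static bool ex_next_seq(struct ex_trace *t, uint32_t *seq)
{
    bool wrapped = false;

    assert(t && seq);

    if (++t->count == 0) {
        t->wraps++;
        wrapped = true;
    }

    *seq = t->count;
    return wrapped;
}

/* Combine wrap counter and sequence number into one sortable key. */
static uint64_t ex_sort_key(uint32_t wraps, uint32_t seq)
{
    return ((uint64_t)wraps << 32) | seq;
}
```

Entries written after a wrap then sort after all entries written
before it, although each printed sequence number stays 32 bits wide.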

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

debugtrace: add per-cpu buffer option

debugtrace normally writes trace entries into a single trace
buffer. There are cases where this is not optimal, e.g. when hunting
a bug which requires writing lots of trace entries and one cpu is
stuck. This will result in other cpus filling the trace buffer and
finally overwriting the interesting trace entries of the hanging cpu.

In order to be able to debug such situations add the capability to use
per-cpu trace buffers. This can be selected by specifying the
debugtrace boot parameter with the modifier "cpu:", like:

debugtrace=cpu:16

At the same time switch the parsing function to accept size modifiers
(e.g. 4M or 1G).

Printing out the trace entries is done for each buffer in order to
minimize the effort needed during printing. As each entry is prefixed
with its sequence number sorting the entries can easily be done when
analyzing them.
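
Parsing such a parameter could look roughly like the sketch below. It is
not the Xen parser: the function name is hypothetical, and treating a
bare number as a plain count without any implied unit is an assumption
of this example which may differ from what Xen does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "[cpu:]<number>[K|M|G]". On success store whether per-cpu
 * buffers were requested and the scaled size, and return true.
 */
static bool ex_parse_debugtrace(const char *s, bool *per_cpu,
                                unsigned long long *size)
{
    char *end;
    unsigned long long val;

    assert(s && per_cpu && size);

    *per_cpu = strncmp(s, "cpu:", 4) == 0;
    if (*per_cpu)
        s += 4;

    val = strtoull(s, &end, 0);
    if (end == s)
        return false;            /* no digits found */

    switch (*end) {
    case 'G': case 'g':
        val <<= 10;
        /* fall through */
    case 'M': case 'm':
        val <<= 10;
        /* fall through */
    case 'K': case 'k':
        val <<= 10;
        end++;
        break;
    }

    if (*end != '\0')
        return false;            /* trailing garbage */

    *size = val;
    return true;
}
```

The input "cpu:16" selects per-cpu buffers with a value of 16, while
"4M" keeps the single buffer and scales the value to 4 * 1024 * 1024.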

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

sysctl: report shadow paging capability

Report whether shadow paging is supported by the hypervisor, since it
can be disabled at build time.

Reuse and tweak LIBXL_HAVE_PHYSINFO_CAP_HAP as it hasn't appeared in a
released version of Xen yet.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/msr: Fix 'plaform' typo

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

sysctl/libxl: choose a sane default for HAP

Current libxl code will always enable Hardware Assisted Paging (HAP),
expecting that the hypervisor will fall back to shadow if HAP is not
available. With the changes to DOMCTL_createdomain that's not the case
any longer, and the hypervisor will raise an error if HAP is not
available instead of silently falling back to shadow.

In order to keep the previous functionality report whether HAP is
available or not in XEN_SYSCTL_physinfo, so that the toolstack can
select a sane default if there's no explicit user selection of whether
HAP should be used.

Note that on ARM hardware HAP capability is always reported since it's
a required feature in order to run Xen.

Fixes: d0c0ba7d3de ('x86/hvm/domain: remove the 'hap_enabled' flag')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

x86/shadow: fold p2m page accounting into sh_min_allocation()

This is to make the function live up to the promise its name makes. And
it simplifies all callers.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>

tools/ocaml: abi check: #include on x86 only. Spotted by Gitlab CI

Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86emul: fix test harness and fuzzer build dependencies

Commit fd35f32b4b ("tools/x86emul: Use struct cpuid_policy in the
userspace test harnesses") didn't account for the dependencies of
cpuid-autogen.h to potentially change between incremental builds. In
particular the harness has a "run" goal which is supposed to be usable
independently of the rest of the tools sub-tree building, and both the
harness and the fuzzer code are also supposed to be buildable
independently. Therefore a re-build of the generated header needs to be
triggered first, which is achieved by introducing a new top-level target
pattern (for just the "run" part for now).

Further cpuid.o did not have any dependencies added for it.

Finally, while at it, add a "run" target to the cpu-policy test harness.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86/IRQ: make 'i' debug output more tabular again

Since the affinity values are no longer of uniform width, move them
further to the right such that as much of the output as possible comes
out aligned with one another.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

ioreq: fix hvm_all_ioreq_servers_add_vcpu fail path cleanup

The loop in FOR_EACH_IOREQ_SERVER is backwards hence the cleanup on
failure needs to be done forwards.
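
The direction issue can be shown with a small stand-alone sketch. The
array, the failure injection and all ex_ names are invented for the
example; only the iteration order mirrors the description above.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_MAX_SERVERS 4

static bool ex_added[EX_MAX_SERVERS];   /* which ids had the vcpu added */
static int ex_fail_id = -1;             /* id at which to inject a failure */

static int ex_add_one(int id)
{
    if (id == ex_fail_id)
        return -1;

    ex_added[id] = true;
    return 0;
}

static void ex_remove_one(int id)
{
    assert(ex_added[id]);               /* only undo what was done */
    ex_added[id] = false;
}

/* Iterate from the highest id down, as the real loop macro does. */
static int ex_add_all(void)
{
    for (int id = EX_MAX_SERVERS - 1; id >= 0; id--) {
        if (ex_add_one(id) == 0)
            continue;

        /* Already handled ids are the higher ones: walk forwards. */
        for (int undo = id + 1; undo < EX_MAX_SERVERS; undo++)
            ex_remove_one(undo);

        return -1;
    }

    return 0;
}
```

Walking backwards from the failing id instead would touch ids that were
never processed and leave the processed ones in place.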

Fixes: 97a5a3e30161 ('x86/hvm/ioreq: maintain an array of ioreq servers rather than a list')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

tools/ocaml: Fix build error with CentOS 7

gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28) complains:

  xenctrl_stubs.c: In function 'stub_xc_domain_create':
  xenctrl_stubs.c:216:28: error: 'val' may be used uninitialized
                          in this function [-Werror=maybe-uninitialized]
     cfg.arch.emulation_flags = ocaml_list_to_c_bitmap
                              ^
  xenctrl_stubs.c:198:12: error: 'val' may be used uninitialized
                          in this function [-Werror=maybe-uninitialized]
    cfg.flags = ocaml_list_to_c_bitmap
              ^
  cc1: all warnings being treated as errors

GCC doesn't point at the correct piece of code, but the diagnostic text is
correct, and can occur when the list is empty. Initialise val to 0.
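
The pattern behind the diagnostic can be illustrated without the OCaml
runtime. The list type and function below are plain C stand-ins with
hypothetical names, not the actual stub code.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C stand-in for a list of flag indices. */
struct ex_node {
    unsigned int bit;
    const struct ex_node *next;
};

/* Fold a list of bit indices into a bitmap. */
static unsigned int ex_list_to_bitmap(const struct ex_node *list)
{
    unsigned int val = 0;   /* well-defined result for an empty list */

    for (; list; list = list->next) {
        assert(list->bit < 32);
        val |= 1u << list->bit;
    }

    return val;
}
```

Without the initialiser the loop body never runs for an empty list and
the function would return an indeterminate value.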

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/ocaml: abi: Use formal conversion and check in more places

Now we have a caller for ocaml_list_to_c_bitmap.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/ocaml: Add missing CDF_* values

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

tools/ocaml: abi-check: Check properly.

Fix a broken regexp which would mention `$/' when it ought to have
mentioned `$'.  The result would be that it would match lines like
    type some_ocaml_type = Thing | Other_Thing
but ignore everything but the type name, giving wrong answers.

Check that we check mentioned types.  Otherwise if we fail to spot
some suitable thing in the ocaml, we would just omit checking this
type !

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/ocaml: Reformat domain_create_flag

This will allow us to apply the abi checker soon.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/ocaml: abi-check: Cope with multiple conversions of same type

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/ocaml: abi-check: Improve output and error messages

In the generated C, add some comments saying where we found the ocaml
type. This helps with debugging. (I considered emitting #line
directives but decided this would be more confusing than helpful.)

Improve two dies.

Use better-named filehandles (perl prints their names when it dies).

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/ocaml: abi handling: Provide ocaml->C conversion/check

No users of this yet so no overall change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/ocaml: abi-check: Add comments

Provide interface documentation for this script.

Explain why we check .ml not .mli.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

xen/domctl: Drop guest suffix from XEN_DOMCTL_CDF_hvm

The suffix is redundant, and dropping it helps to simplify the Ocaml/C
ABI checking.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/ocaml: Introduce xenctrl ABI build-time checks

c/s f089fddd941 broke the Ocaml ABI by renumbering
XEN_SYSCTL_PHYSCAP_directio without adjusting the Ocaml
physinfo_cap_flag enumeration.

Add build machinery which will check the ABI correspondence.

This will result in a compile time failure whenever constants get
renumbered/added without a compatible adjustment to the Ocaml ABI.
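
The general technique can be sketched with C11 static assertions. The
constants, the enum mirroring the OCaml constructor order and the macro
are all hypothetical; the real machinery generates its checks from the
OCaml sources.

```c
#include <assert.h>

/* Hypothetical C-side constants (bit masks). */
#define EX_PHYSCAP_hvm      (1u << 0)
#define EX_PHYSCAP_pv       (1u << 1)
#define EX_PHYSCAP_directio (1u << 2)

/* Mirrors the order of the (hypothetical) OCaml constructors. */
enum ex_ml_cap { EX_CAP_HVM, EX_CAP_PV, EX_CAP_DIRECTIO };

/* Fail the build if a C constant and an OCaml index drift apart. */
#define EX_ABI_CHECK(c_const, ml_index) \
    _Static_assert((c_const) == (1u << (ml_index)), \
                   #c_const " does not match the OCaml enumeration")

EX_ABI_CHECK(EX_PHYSCAP_hvm, EX_CAP_HVM);
EX_ABI_CHECK(EX_PHYSCAP_pv, EX_CAP_PV);
EX_ABI_CHECK(EX_PHYSCAP_directio, EX_CAP_DIRECTIO);

/* Runtime conversion from constructor index to C bit mask. */
static unsigned int ex_ml_cap_to_c(enum ex_ml_cap cap)
{
    assert((unsigned int)cap < 32);
    return 1u << cap;
}
```

Inserting a new constant on the C side without adjusting the enum makes
one of the checks fail at compile time instead of silently shifting the
meaning of the OCaml values.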

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <Andrew.Cooper3@citrix.com>

tools/ocaml: Add missing CAP_PV

c/s f089fddd941 broke the Ocaml ABI by renumbering XEN_SYSCTL_PHYSCAP_directio
without adjusting the Ocaml physinfo_cap_flag enumeration. Fix this by
inserting CAP_PV between CAP_HVM and CAP_DirectIO.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/ocaml: Add missing X86_EMU_VPCI

This was missing from x86_arch_emulation_flags.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com>

x86/boot: Improve code generation from bootsym()

The code generation for bootsym() is atrocious, and unnecessarily complicated.
Given the appropriate physical address, all we need is to construct a virtual
address of the appropriate type.

  add/remove: 0/0 grow/shrink: 0/9 up/down: 0/-4256 (-4256)
  Function                                     old     new   delta
  kexec_reserve_area.constprop                 165     159      -6
  reset_videomode_after_s3                     231      70    -161
  identify_cpu                                1341    1176    -165
  parse_acpi_sleep                             408     240    -168
  early_init_intel                             632     440    -192
  __cpu_up                                    1983    1682    -301
  do_platform_op                              6469    5526    -943
  compat_platform_op                          6433    5482    -951
  __start_xen                                12939   11570   -1369
  Total: Before=3341298, After=3337042, chg -0.13%

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/cpuid: Fix build with CentOS 6 following c/s 7479151106

GCC of a CentOS 6 vintage complains:

cpuid.c: In function 'parse_xen_cpuid':
cpuid.c:32: error: 'mid' may be used uninitialized in this function

This can't occur in practice because the while() loop is guaranteed to be
entered, but initialise mid to work around the issue.

Spotted by Gitlab CI.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/cpuid: Fix handling of the CPUID.7[0].eax levelling MSR

7a0 is an integer field, not a mask - taking the logical and of the hardware
and policy values results in nonsense. Instead, take the policy value
directly.
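
A tiny worked example shows why the AND is wrong for an integer field.
The function names are hypothetical, and the values 2 and 1 are made up
to stand for a hardware and a policy value of the field.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: treats the integer field as if it were a mask. */
static uint32_t ex_level_7a0_buggy(uint32_t hw, uint32_t policy)
{
    return hw & policy;
}

/* New behaviour: take the policy value directly. */
static uint32_t ex_level_7a0_fixed(uint32_t hw, uint32_t policy)
{
    (void)hw;   /* the hardware value is not combined in */
    return policy;
}

/* Hardware value 2 (binary 10) and policy value 1 (binary 01). */
static void ex_show_difference(void)
{
    assert(ex_level_7a0_buggy(2, 1) == 0);   /* matches neither input */
    assert(ex_level_7a0_fixed(2, 1) == 1);
}
```

The AND of 2 and 1 is 0, which is neither of the two values, whereas a
mask-style combination would only make sense for independent bits.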

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@cirtrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

xen: refactor debugtrace data

As a preparation for per-cpu buffers do a little refactoring of the
debugtrace data: put the needed buffer admin data into the buffer as
it will be needed for each buffer. In order not to limit buffer size
switch the related fields from unsigned int to unsigned long, as on
huge machines with RAM in the TB range it might be interesting to
support buffers >4GB.

While at it switch debugtrace_send_to_console and debugtrace_used to
bool and delete an empty line.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

xen: move debugtrace coding to common/debugtrace.c

Instead of living in drivers/char/console.c move the debugtrace
related coding to a new file common/debugtrace.c

No functional change, code movement only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen: fix debugtrace clearing

After dumping the debugtrace buffer it is cleared. This results in some
entries not being printed in case the buffer is dumped again before
having wrapped.

While at it remove the trailing zero byte in the buffer as it is no
longer needed. Commit b5e6e1ee8da59f introduced passing the number of
chars to be printed in the related interfaces, so the trailing 0 byte
is no longer required.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

sysctl: report existing physcaps on Arm

Current physcaps in XEN_SYSCTL_physinfo are only used by x86, albeit
the capabilities themselves are not x86 specific.

This patch adds support for also reporting the current capabilities on
Arm hardware. Note that on Arm PHYSCAP_hvm is always reported, and
setting PHYSCAP_directio has been moved to common code since the same
logic to set it is used by x86 and Arm.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

xen/arm32: head: Don't setup the fixmap on secondary CPUs

setup_fixmap() will setup the fixmap in the boot page tables in order to
use earlyprintk and also update the register r11 holding the address to
the UART.

However, secondary CPUs are not using earlyprintk between turning the
MMU on and switching to the runtime page table. So setting up the
fixmap in the boot page tables is pointless.

This means most of setup_fixmap() is not necessary for the secondary
CPUs. The update of the UART address is now moved out of setup_fixmap()
and duplicated in the boot CPU and secondary CPUs boot paths.
Additionally, the call to setup_fixmap() is removed from the secondary
CPUs boot path.

Lastly, take the opportunity to replace load from literal pool with the
new macro mov_w.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Move assembly switch to the runtime PT in secondary CPUs path

The assembly switch to the runtime PT is only necessary for the
secondary CPUs. So move the code in the secondary CPUs path.

This is definitely not compliant with the Arm Arm, as we are
switching between two different sets of page-tables without turning off
the MMU. However, turning off the MMU is impossible here as the ID map
may clash with other mappings in the runtime page-tables. This will
require more rework to avoid the problem. So for now add a TODO in the
code.

Finally, the code currently assumes that r5 will be properly set to 0
beforehand. This is done by create_page_tables(), which is called quite
early in the boot process. There is a risk this may be overlooked in the
future, therefore breaking secondary CPUs boot. Instead, set r5 to 0
just before using it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Document enable_mmu()

Document the behavior and the main registers usage within enable_mmu().

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Document create_pages_tables()

Document the behavior and the main registers usage within the function.
Note that r6 is now only used within the function, so it does not need
to be part of the common register.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Rework and document zero_bss()

On secondary CPUs, zero_bss() will be a NOP because BSS only needs to be
zeroed once at boot. So the call in the secondary CPUs path can be
removed.

Lastly, document the behavior and the main registers usage within the
function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Rework and document check_cpu_mode()

A branch in the success case can be avoided by inverting the branch
condition. At the same time, remove a pointless comment as Xen can only
run at Hypervisor Mode.

Lastly, document the behavior and the main registers usage within the
function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Introduce distinct paths for the boot CPU and secondary CPUs

The boot code is currently quite difficult to go through because of the
lack of documentation and a number of indirections to avoid executing
some paths on either the boot CPU or secondary CPUs.

In an attempt to make the boot code easier to follow, each part of the
boot is now in a separate function. Furthermore, the paths for the boot
CPU and secondary CPUs are now distinct and for now will call each
function.

Follow-ups will remove unnecessary calls and do further improvement
(such as adding documentation and reshuffling).

Note that the switch from using the ID mapping to the runtime mapping
is duplicated for each path. This is because in the future we will need
to stay longer in the ID mapping for the boot CPU.

Lastly, it is now required to save lr in cpu_init() because the
function will call other functions and therefore clobber lr.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Introduce print_reg

At the moment, the user should save r14/lr if it cares about it.

Follow-up patches will introduce more uses of putn in places where lr
should be preserved.

Furthermore, any user of putn should also move the value to register r0
if it was stored in a different register.

For convenience, a new macro is introduced to print a given register.
The macro takes care of moving the value to r0 and also preserves lr.

Lastly, the new macro is used to replace all the call sites of putn.
This will simplify rework/review later on.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Rework UART initialization on boot CPU

Anything executed after the label common_start can be executed on all
CPUs. However most of the instructions executed between the label
common_start and init_uart are not executed on the boot CPU.

The only instructions executed are to lookup the CPUID so it can be
printed on the console (if earlyprintk is enabled). Printing the CPUID
is not entirely useful to have for the boot CPU and requires a
conditional branch to bypass unused instructions.

Furthermore, the function init_uart is only called for boot CPU
requiring another conditional branch. This makes the code a bit tricky
to follow.

The UART initialization is now moved before the label common_start. This
now requires a slightly altered print for the boot CPU and setting the
early UART base address in each of the two paths (boot CPU and
secondary CPUs).

This has the nice effect of removing a couple of conditional branches in
the code.

After this rework, the CPUID is only used at the very beginning of the
secondary CPUs boot path. So there is no need to "reserve" x24 for the
CPUID.

Lastly, take the opportunity to replace load from literal pool with the
new macro mov_w.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Don't clobber r14/lr in the macro PRINT

The current implementation of the macro PRINT will clobber r14/lr. This
means the user should save r14 if it cares about it.

Follow-up patches will introduce more uses of PRINT in places where lr
should be preserved. Rather than requiring all users to preserve lr,
the macro PRINT is modified to save and restore it.

While the comment states r3 will be clobbered, this is not the case. So
PRINT will use r3 to preserve lr.

Lastly, take the opportunity to move the comment on top of PRINT and use
PRINT in init_uart. Both changes will be helpful in a follow-up patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Mark the end of subroutines with ENDPROC

putn() and puts() are two subroutines. Add ENDPROC for the benefit of
static analysis tools and the reader.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Add a macro to move an immediate constant into a 32-bit register

The current boot code is using the pattern ldr rX, =... to move an
immediate constant into a 32-bit register.

This pattern implies loading the immediate constant from a literal pool,
meaning a memory access will be performed.

The memory access can be avoided by using movw/movt instructions.

A new macro is introduced to move an immediate constant into a 32-bit
register without a memory load. Follow-up patches will make use of it.
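
As an illustration only, the following C sketch models how a movw/movt pair
builds a 32-bit constant from two 16-bit immediates. The function names are
hypothetical; the actual macro is written in assembly and is not reproduced
here.

```c
#include <stdint.h>

/* Low half of the constant, as encoded in the movw immediate. */
static uint16_t imm_lo16(uint32_t imm)
{
    return (uint16_t)(imm & 0xffffu);
}

/* High half of the constant, as encoded in the movt immediate. */
static uint16_t imm_hi16(uint32_t imm)
{
    return (uint16_t)(imm >> 16);
}

/* Model of "movw reg, #lo16" followed by "movt reg, #hi16". */
static uint32_t model_movw_movt(uint32_t imm)
{
    /* movw writes the low 16 bits and zeroes the upper 16 bits. */
    uint32_t reg = imm_lo16(imm);

    /* movt writes the upper 16 bits and leaves the low 16 bits alone. */
    reg = (reg & 0xffffu) | ((uint32_t)imm_hi16(imm) << 16);

    return reg;
}
```

Both immediates are encoded in the instructions themselves, so no data load
from a literal pool is needed.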

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm64: head: Fix typo in the documentation on top of init_uart()

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm64: head: Introduce a macro to get a PC-relative address of a symbol

Arm64 provides instructions to load a PC-relative address, but with some
limitations:
   - adr is able to cope with +/-1MB
   - adrp is able to cope with +/-4GB but relative to a 4KB page
     address

Because of that, the code requires 2 instructions to load any Xen
symbol. To make the code more obvious, a new macro adr_l is
introduced.
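
The two-instruction sequence (adrp for the 4KB page, then an add of the low
12 bits) can be modelled in C as below. This is a sketch of the address
arithmetic only; the function names are hypothetical and the range limits of
the real instructions are not checked.

```c
#include <stdint.h>

#define PAGE_MASK_4K UINT64_C(0xfff)

/* Model of adrp: the 4KB page base of target, derived from the page of pc. */
static uint64_t model_adrp(uint64_t pc, uint64_t target)
{
    uint64_t pc_page = pc & ~PAGE_MASK_4K;
    /* Page delta as encoded in the instruction (unsigned wrap-around). */
    uint64_t page_delta = (target & ~PAGE_MASK_4K) - pc_page;

    return pc_page + page_delta;
}

/* Model of the pair: adrp, then add the low 12 bits of the symbol. */
static uint64_t model_adr_l(uint64_t pc, uint64_t target)
{
    return model_adrp(pc, target) + (target & PAGE_MASK_4K);
}
```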

The new macro is used to replace a couple of open-coded use in
efi_xen_start.

The macro is copied from Linux 5.2-rc4.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm64: head: Setup TTBR_EL2 in enable_mmu() and add missing isb

At the moment, TTBR_EL2 is set up in create_page_tables(). This is fine
as it is called by every CPU.

However, such an assumption may not hold in the future. To make changes
easier, the TTBR_EL2 is now set up in enable_mmu().

Take the opportunity to add the missing isb() to ensure the TTBR_EL2 is
seen before the MMU is turned on.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm64: head: Rework and document launch()

Boot CPU and secondary CPUs will use different entry points to C code. At
the moment, the decision on which entry to use is taken within launch().

In order to avoid a branch for the decision and make the code clearer,
launch() is reworked to take in parameters the entry point and its
arguments.

Lastly, document the behavior and the main registers usage within the
function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: lpae: Allow more LPAE helpers to be used in assembly

A follow-up patch will need to use *_table_offset() and *_MASK helpers
from assembly. This can be achieved by using _AT() macro to remove the type
when called from assembly.
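
A sketch of the _AT() pattern, as commonly defined in Linux and Xen headers:
the cast is applied only when compiling C and dropped when __ASSEMBLY__ is
defined. The EXAMPLE_* constants and example_align_down() are made up for
illustration and are not the LPAE helpers touched by this patch.

```c
#include <stdint.h>

#ifdef __ASSEMBLY__
#define _AT(T, X) X             /* assemblers do not understand C casts */
#else
#define _AT(T, X) ((T)(X))      /* C code keeps the intended type */
#endif

/* Hypothetical constants using the pattern. */
#define EXAMPLE_SHIFT 21
#define EXAMPLE_SIZE  (_AT(uint64_t, 1) << EXAMPLE_SHIFT)
#define EXAMPLE_MASK  (~(EXAMPLE_SIZE - 1))

/* Hypothetical C user of the mask. */
static uint64_t example_align_down(uint64_t addr)
{
    return addr & EXAMPLE_MASK;
}
```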

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

x86/cpuid: Extend the cpuid= option to support all named features

For gen-cpuid.py, fix a comment describing self.names, and generate the
reverse mapping in self.values. Write out INIT_FEATURE_NAMES which maps a
string name to a bit position.

For parse_cpuid(), use cmdline_strcmp() and perform a binary search over
INIT_FEATURE_NAMES. A tweak to cmdline_strcmp() is needed to break at equals
signs as well.
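
A rough C sketch of the lookup, assuming a table sorted by name. The table
contents, bit positions, and the opt_strcmp() stand-in are all hypothetical;
they are not the generated INIT_FEATURE_NAMES nor the real cmdline_strcmp().

```c
#include <stddef.h>

struct feature_name {
    const char *name;
    unsigned int bit;
};

/* Hypothetical table, sorted by name, with made-up bit positions. */
static const struct feature_name example_features[] = {
    { "avx", 0 }, { "smep", 1 }, { "xsave", 2 },
};

/* Compare an option with a name, ending the option at '=', ',' or NUL. */
static int opt_strcmp(const char *opt, const char *name)
{
    for ( ; ; opt++, name++ )
    {
        unsigned char o = (unsigned char)*opt;
        unsigned char n = (unsigned char)*name;

        if ( o == '=' || o == ',' )
            o = '\0';
        if ( o != n )
            return o < n ? -1 : 1;
        if ( o == '\0' )
            return 0;
    }
}

/* Binary search; returns the bit position, or -1 if the name is unknown. */
static int lookup_feature(const char *opt)
{
    size_t lo = 0;
    size_t hi = sizeof(example_features) / sizeof(example_features[0]);

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;
        int res = opt_strcmp(opt, example_features[mid].name);

        if ( res == 0 )
            return (int)example_features[mid].bit;
        if ( res < 0 )
            hi = mid;
        else
            lo = mid + 1;
    }

    return -1;
}
```

Stopping the comparison at '=' lets an option such as "smep=0" match the
table entry "smep" without copying the name out first.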

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/apic: do not initialize LDR and DFR for bigsmp

Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
bigsmp APIC implementation uses physical destination mode, but it
nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
multiple bits being set.

This does not cause a functional problem because LDR and DFR are ignored
when physical destination mode is active, but it triggered a problem on a
32-bit KVM guest which jumps into a kdump kernel.

The multiple bits set unearthed a bug in the KVM APIC implementation. The
code which creates the logical destination map for VCPUs ignores the
disabled state of the APIC and ends up overwriting an existing valid entry
and as a result, APIC calibration hangs in the guest during kdump
initialization.

Remove the bogus LDR/DFR initialization.

This is not intended to work around the KVM APIC bug. The LDR/DFR
initialization is wrong on its own.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bandan Das <bsd@redhat.com>
[Linux commit bae3a8d3308ee69a7dbdf145911b18dfda8ade0d]

Drop init_apic_ldr_x2apic_phys() at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/apic: include the LDR when clearing out APIC registers

Although APIC initialization will typically clear out the LDR before
setting it, the APIC cleanup code should reset the LDR.

This was discovered with a 32-bit KVM guest jumping into a kdump
kernel. The stale bits in the LDR triggered a bug in the KVM APIC
implementation which caused the destination mapping for VCPUs to be
corrupted.

Note that this isn't intended to paper over the KVM APIC bug. The kernel
has to clear the LDR when resetting the APIC registers except when X2APIC
is enabled.

Signed-off-by: Bandan Das <bsd@redhat.com>
[Linux commit 558682b5291937a70748d36fd9ba757fb25b99ae]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: drop CONFIG_X86_MCE_THERMAL

There's no point having this if it's not exposed through Kconfig.

Take the liberty and also drop an unnecessary "return" in context.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/mwait-idle: add support for Jacobsville

Jacobsville uses the same C-states as Denverton.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[Linux commit 04b1d5d098491244f506c4265cc95b87210eef2f]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/xstate: make use_xsave non-init

LLVM code generation can attempt to load from a variable in the next
condition of an expression under certain circumstances, thus
attempting to load use_xsave regardless of the value of the bsp
variable, which leads to a page fault when the init section has
already been unmapped.

Fix this by making use_xsave non-init, thus preventing the page fault;
use __read_mostly instead. The LLVM bug with the discussion about this
issue can be found at:

https://bugs.llvm.org/show_bug.cgi?id=39707

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

Revert "x86/shim: Refresh pvshim_defconfig"

This reverts commit 32b1d62887d01f85f0c1d2e0103f69f74e1f6fa3 and its fixup
060f4eee0fb408b316548775ab921e16b7acd0e0, which are still causing build and
test problems.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/AMD: Fix handling of x87 exception pointers on Fam17h hardware

AMD Pre-Fam17h CPUs "optimise" {F,}X{SAVE,RSTOR} by not saving/restoring
FOP/FIP/FDP if an x87 exception isn't pending.  This causes an information
leak, CVE-2006-1056, which is worked around by several OSes, including Xen.  AMD
Fam17h CPUs no longer have this leak, and advertise so in a CPUID bit.

Introduce the RSTR_FP_ERR_PTRS feature, as specified by AMD, and expose to all
guests by default.  While adjusting libxl's cpuid table, add CLZERO which
looks to have been omitted previously.

Also introduce an X86_BUG bit to trigger the (F)XRSTOR workaround, and set it
on AMD hardware where RSTR_FP_ERR_PTRS is not advertised.  Optimise the
conditions for the workaround paths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/feature: Generalise synth and introduce a bug word

Future changes are going to want to use cpu_bug_* in a manner similar to
Linux. Introduce one bug word, and generalise the calculation of
NCAPINTS.
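
One way such a layout could look, sketched in C. The word counts below are
made up, and the exact macro definitions are an assumption modelled on the
description above rather than a copy of the patch.

```c
/* Assumed, illustrative word counts (not Xen's real values). */
#define FSCAPINTS    4   /* words of real featureset bits */
#define X86_NR_SYNTH 1   /* words of synthetic feature bits */
#define X86_NR_BUG   1   /* words of bug bits */

/* Total number of 32-bit capability words. */
#define NCAPINTS (FSCAPINTS + X86_NR_SYNTH + X86_NR_BUG)

/* Synthetic bits follow the featureset; bug bits follow the synthetic ones. */
#define X86_SYNTH(x) (FSCAPINTS * 32 + (x))
#define X86_BUG(x)   ((FSCAPINTS + X86_NR_SYNTH) * 32 + (x))

/* Index of the 32-bit word holding a given feature number. */
static unsigned int cap_word(unsigned int feature)
{
    return feature / 32;
}

/* Bit position of a given feature number within its word. */
static unsigned int cap_bit(unsigned int feature)
{
    return feature % 32;
}
```

Deriving NCAPINTS from the three counts means adding another synthetic or bug
word only requires bumping one constant.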

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/vtd: Drop struct intel_iommu

The sole remaining member of struct intel_iommu is the drhd backpointer. Move
this into struct vtd_iommu, replacing the 'intel' pointer.

This removes one dynamic memory allocation per IOMMU on the system.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/vtd: Drop struct iommu_flush

It is unclear why this abstraction exists, but iommu_get_flush() returns
possibly NULL and every user unconditionally dereferences the result.  In
practice, I can't spot a path where iommu is NULL, so I think it is mostly
dead.

Move the two function pointers into struct vtd_iommu (using a flush prefix),
and delete iommu_get_flush().  Furthermore, there is no need to pass the IOMMU
pointer to the callbacks via a void pointer, so change the parameter to be
correctly typed as struct vtd_iommu.  Clean up bool_t to bool in surrounding
context.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/vtd: Drop struct ir_ctrl

It is unclear why this abstraction exists, but iommu_ir_ctrl() returns
possibly NULL and every user unconditionally dereferences the result. In
practice, I can't spot a path where iommu is NULL, so I think it is mostly
dead.

Move the fields into struct vtd_iommu, and delete iommu_ir_ctrl().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/vtd: Drop struct qi_ctrl

It is unclear why this abstraction exists, but iommu_qi_ctrl() returns
possibly NULL and every user unconditionally dereferences the result. In
practice, I can't spot a path where iommu is NULL, so I think it is mostly
dead.

Move the sole member into struct vtd_iommu, and delete iommu_qi_ctrl().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/vtd: Rename struct iommu to vtd_iommu

VT-d's local struct iommu is an overly-generic name, for a structure which in
practice maps 1-to-1 with the real IOMMUs in the system.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

VT-d/ATS: tidy device_in_domain()

Use appropriate types. Drop unnecessary casts. Check for failures which
can (at least in theory because of non-obvious breakage elsewhere)
occur, instead of ones which really can't (map_domain_page() won't
return NULL).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86: remove sched-if.h includes from various sources

xen/sched-if.h is included in multiple sources where it isn't directly
needed. Remove those #include statements.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/cpu-policy: work around bogus warning in test harness

Despite %.12s properly limiting the number of characters read from
ident[], gcc 9 (at least up to 9.2.0) warns about the strings not
being nul-terminated:

test-cpu-policy.c:64:18: error: '%.12s' directive argument is not a nul-terminated string [-Werror=format-overflow=]
   64 |             fail("  Test '%.12s', expected vendor %u, got %u\n",
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test-cpu-policy.c:20:12: note: in definition of macro 'fail'
   20 |     printf(fmt, ##__VA_ARGS__);                 \
      |            ^~~
test-cpu-policy.c:64:27: note: format string is defined here
   64 |             fail("  Test '%.12s', expected vendor %u, got %u\n",
      |                           ^~~~~
test-cpu-policy.c:44:7: note: referenced argument declared here
   44 |     } tests[] = {
      |       ^~~~~

The issue was reported against gcc in their bugzilla (bug 91667).

Re-order array entries, oddly enough suppressing the warning.
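
For reference, a small C sketch showing that a precision on %s is enough to
bound the read, so the source array needs no terminator. The array and
helper below are hypothetical and are not taken from test-cpu-policy.c.

```c
#include <stdio.h>
#include <string.h>

/* Exactly 12 bytes: the initialiser fills the array, leaving no NUL. */
static const char example_ident[12] = "GenuineIntel";

/* Format the identifier with a precision and compare with expected. */
static int ident_is(const char *expected)
{
    char buf[16];

    /* "%.12s" reads at most 12 bytes, so no terminator is required. */
    snprintf(buf, sizeof(buf), "%.12s", example_ident);

    return strcmp(buf, expected) == 0;
}
```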

Reported-by: Christopher Clark <christopher.w.clark@gmail.com>
Reported-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

p2m/ept: add _subtree suffix to ept_invalidate_emt

So that the name implies the function is used to walk the page table
pointer passed as parameter. Drop the parent_ prefix from the level
parameter, since the level passed is the one matching the EPT entry
passed in the mfn parameter.

While there also change bool_t to bool and add an assert to make sure
no level 0 entries (ie: 4K EPT leaf entries) are passed as parameters.

No functional change intended.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

VT-d: avoid PCI device lookup

The two uses of pci_get_pdev_by_domain() lack proper locking, but are
also only used to get hold of a NUMA node ID. Calculate and store the
node ID earlier on and remove the lookups (in lieu of fixing the
locking).

While doing this it became apparent that iommu_alloc()'s use of
alloc_pgtable_maddr() would occur before RHSAs would have been parsed:
iommu_alloc() gets called from the DRHD parsing routine, which - on
spec conforming platforms - happens strictly before RHSA parsing. Defer
the allocation until after all ACPI table parsing has finished,
establishing the node ID there first.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

VT-d: tidy <X>_to_<Y>() functions

Drop iommu_to_drhd() altogether - there's no need for a loop here, the
corresponding DRHD is a field in struct intel_iommu.

Constify drhd_to_rhsa()'s parameter and adjust style.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/shadow: don't enable shadow mode with too small a shadow allocation (part 2)

Commit 2634b997af ("x86/shadow: don't enable shadow mode with too small
a shadow allocation") was incomplete: The adjustment done there to
shadow_enable() is also needed in shadow_one_bit_enable(). The (new)
problem report was (apparently) a failed PV guest migration followed by
another migration attempt for that same guest. Disabling log-dirty mode
after the first one had left a couple of shadow pages allocated (perhaps
something that also wants fixing), and hence the second enabling of
log-dirty mode wouldn't have allocated anything further.

Reported-by: James Wang <jnwang@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>