dgit.raspbian.org Git

Manual merge of version 4.17.0-1+rpi1 and 4.17.1+2-gb773c48e36-1 to produce 4.17.1+2-gb773c48e36-1+rpi1

Commit patch queue (exported by git-debrebase)

[git-debrebase make-patches: export and commit patches]

Declare fast forward / record previous work

[git-debrebase pseudomerge: quick]

xen/arch/x86: make objdump output user locale agnostic

The objdump output is fed to grep, so make sure it doesn't change with
different user locales and break the grep parsing.
This problem was identified while updating xen in Debian and the fix is
needed for generating reproducible builds in varying environments.
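
A minimal sketch of the idea, assuming the output is pinned by forcing the
"C" locale on the child process (the message above does not say which
mechanism the patch uses). The helper name is hypothetical.

```python
import os

def locale_neutral_env(base=None):
    """Return a copy of the environment with the locale pinned to "C".

    Assumption: forcing LC_ALL=C is enough to keep a tool's messages and
    headings stable, so that later text matching does not depend on the
    user's locale.
    """
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and the other LC_* variables
    return env
```

The resulting dictionary can be passed as the `env` argument of
`subprocess.run()` when invoking a tool whose output is parsed.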

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>

give meaningful error message if qemu device model is unavailable

There's no sense in switching to the qemu-xen-traditional device model
if that one is not enabled in the first place. This way we'll
have a chance later to print a message suggesting to install the
missing qemu package if we *actually* need qemu for the device model.

docs: set date to SOURCE_DATE_EPOCH if available

Use the solution described in [1] to replace the call to the 'date'
command with a version that uses SOURCE_DATE_EPOCH if available. This
is needed for reproducible builds.

[1] https://reproducible-builds.org/docs/source-date-epoch/
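
For illustration only, the same fallback logic can be sketched in Python
(the patch itself replaces a call to the 'date' command; the function name
below is hypothetical): use SOURCE_DATE_EPOCH when it is set, otherwise
fall back to the current time, and always format in UTC.

```python
import datetime
import os
import time

def build_date(environ=None):
    """Return the build timestamp as an aware UTC datetime.

    SOURCE_DATE_EPOCH (seconds since the Unix epoch) wins when present;
    otherwise the current time is used, which is not reproducible.
    """
    environ = os.environ if environ is None else environ
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    return datetime.datetime.fromtimestamp(epoch, tz=datetime.timezone.utc)
```

Calling `build_date().strftime("%Y-%m-%d")` then gives the same string on
every rebuild as long as SOURCE_DATE_EPOCH is exported by the build system.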

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
[Hans van Kranenburg]
Note: this patch has been submitted upstream but not committed yet. We
expect it to be accepted. In the meantime we carry it here rather than
wait, because I want to have the reproducible build work completed.

tools: don't build/ship xenmon

This is something that hasn't been touched (except for making it Python
3 compatible, which failed) since 2007. Don't build or ship it.

    -# xenmon
      File "/usr/sbin/xenmon", line 680
        stop_cmd = "/usr/bin/pkill -INT -z global xenbaked"
    TabError: inconsistent use of tabs and spaces in indentation

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

tools/xl/bash-completion: also complete 'xen'

We have the `xen` alias for xl in Debian, since in the past it was a
command that could execute either xl or xm.

Now, it always does xl, so, complete the same stuff for it as we have
for xl.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
[git-debrebase split: mixed commit: upstream part]

pygrub: Specify -rpath LIBEXEC_LIB when building fsimage.so

If LIBEXEC_LIB is not on the default linker search path, the python
fsimage.so module fails to find libfsimage.so.

Add the relevant directory to the rpath explicitly.

(This situation occurs in the Debian package, where
--with-libexec-libdir is used to put each Xen version's libraries and
utilities in their own directory, to allow them to be coinstalled.)

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

pygrub: Set sys.path

We install libfsimage in a non-standard path for Reasons.
(See debian/rules.)

This patch was originally part of `tools-pygrub-prefix.diff'
(eg commit 51657319be54) and included changes to the Makefile to
change the installation arrangements (we do that part in the rules now
since that is a lot less prone to conflicts when we update) and to
shared library rpath (which is now done in a separate patch).

(Commit message rewritten by Ian Jackson.)

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
squash! pygrub: Set sys.path and rpath

hotplug-common: Do not adjust LD_LIBRARY_PATH

This is in the upstream script because on non-Debian systems, the
default install locations in /usr/local/lib might not be on the linker
path, and as a result the hotplug scripts would break.

A reason we might need it in Debian is our multiple version
coinstallation scheme. However, the hotplug scripts all call the
utilities via the wrappers, and the binaries are configured to load
from the right place anyway.

This setting is an annoyance because it requires libdir, which is an
arch-specific path but comes from a file we want to put in
xen-utils-common, an arch:all package.

So drop this setting.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

sysconfig.xencommons.in: Strip and debianize

Strip all options that are for stuff we don't ship, which is 1)
xenstored as stubdom and 2) the new options for oom score and open file
descriptor limit, which would not have any effect, because we're
shipping different init scripts... :|

It seems useful to give the user the option to revert to xenstored
instead of the default oxenstored if they really want.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
Acked-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

t/h/L/vif-common.sh: disable handle_iptable

Also see Debian bug #894013. The current attempt at providing
anti-spoofing rules results in a situation that does not have any
effect. Also note that forwarding bridged traffic to iptables is not
enabled by default, and that for openvswitch users it does not make any
sense.

So, stop cluttering the live iptables ruleset.

This functionality seems to have been introduced before 2004 and has
not received any attention since.

It would be nice to have a proper discussion upstream about how Xen
could provide some anti mac/ip spoofing in the dom0. It does not seem to
be a trivial thing to do, since it requires having quite some knowledge
about what the domU is allowed to do or not (e.g. a domU can be a
router...).

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

docs/man/xen-vbd-interface.7: Provide properly-formatted NAME section

This manpage was omitted from
docs/man: Provide properly-formatted NAME sections
because I was previously building with markdown not installed.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

shim: Provide separate install-shim target

When building on a 32-bit userland, the user wants to build 32-bit
tools and a 64-bit hypervisor.  This involves setting XEN_TARGET_ARCH
to different values for the tools build and the hypervisor build.

So the user must invoke the tools build and the hypervisor build
separately.

However, although the shim is done by the tools/firmware Makefile, its
bitness needs to be the same as the hypervisor, not the same as the
tools.  When run with XEN_TARGET_ARCH=x86_32, it is skipped, which is
wrong.

So the user must invoke the shim build separately.  This can be done
with
   make -C tools/firmware/xen-dir XEN_TARGET_ARCH=x86_64

However, tools/firmware/xen-dir has no `install' target.  The
installation of all `firmware' is done in tools/firmware/Makefile.  It
might be possible to fix this, but it is not trivial.  For example,
the definitions of INST_DIR and DEBG_DIR would need to be copied, as
would an appropriate $(INSTALL_DIR) call.

For now, provide an `install-shim' target in tools/firmware/Makefile.

This has to be called from `install' of course.  We can't make it
a dependency of `install' because it might be run before `all' has
completed.  We could make it depend on a `shim' target but such
a target is nearly impossible to write because everything is done by
the inflexible subdir-$@ machinery.

The overall result of this patch is that existing make invocations
work as before.  But additionally, the user can say
  make -C tools/firmware install-shim XEN_TARGET_ARCH=x86_64
to install the shim.  The user must have built it already.
Unlike the build rune, this install-rune is properly conditional
so it is OK to call on ARM.

What a mess.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

config/Tools.mk.in: Respect caller's CONFIG_PV_SHIM

This makes it easier to disable the shim build. (In Debian we need to
build the shim separately because it needs different compiler flags).

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
[ Hans: adjust from tools/firmware/Makefile to config/Tools.mk.in to
follow changes that happened in 8845155c83 ("pvshim: make PV shim build
selectable from configure") ]
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

.gitignore: Add configure output which we always delete and regenerate

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

autoconf: Provide libexec_libdir_suffix

This is going to be used to put libfsimage.so into a path containing
the multiarch triplet.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

tools-libfsimage-prefix.diff

\o/

Do not build the instruction emulator

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

Remove static solaris support from pygrub

Patch-Name: tools-pygrub-remove-static-solaris-support

Gbp-Pq: Topic misc
Gbp-Pq: Name tools-pygrub-remove-static-solaris-support

Do not ship COPYING into /usr/include

This is not wanted in Debian. COPYING ends up in
/usr/share/doc/xen-*copyright.

Patch-Name: tools-include-no-COPYING.diff

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

config-prefix.diff

Patch-Name: config-prefix.diff

Gbp-Pq: Topic prefix-abiname
Gbp-Pq: Name config-prefix.diff

Display Debian package version in hypervisor log

During hypervisor boot, disable the banner and nicely display the xen
version as well as the Maintainer address from debian/control.

For this to work the DEB_VERSION and DEB_MAINTAINER variables need to
be set by debian/rules.

Original patch by Bastian Blank <waldi@debian.org>
Modified by
Hans van Kranenburg <hans@knorrie.org>
Maximilian Engelhardt <maxi@daemonizer.de>

Delete configure output

These autogenerated files are not useful in Debian; dh_autoreconf will
regenerate them.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

Delete config.sub and config.guess

dh_autoreconf will provide these back.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

debian/changelog: finish 4.17.1+2-gb773c48e36-1

Update changelog for new upstream 4.17.1+2-gb773c48e36

[git-debrebase changelog: new upstream 4.17.1+2-gb773c48e36]

Update to upstream 4.17.1+2-gb773c48e36

[git-debrebase anchor: new upstream 4.17.1+2-gb773c48e36, merge]

update Xen version to 4.17.2-pre

x86/amd: fix legacy setting of SSBD on AMD Family 17h

The current logic to set SSBD on AMD Family 17h and Hygon Family 18h
processors requires that the setting of SSBD is coordinated at a core
level, as the setting is shared between threads.  Logic was introduced
to keep track of how many threads require SSBD active in order to
coordinate it, such logic relies on using a per-core counter of
threads that have SSBD active.

Given the current logic, it's possible for a guest to under or
overflow the thread counter, because each write to VIRT_SPEC_CTRL.SSBD
by the guest gets propagated to the helper that does the per-core
active accounting.  Overflowing the counter is not so much of an
issue, as this would just make SSBD sticky.

Underflowing however is more problematic: on non-debug Xen builds a
guest can perform empty writes to VIRT_SPEC_CTRL that would cause the
counter to underflow and thus the value gets saturated to the max
value of unsigned int.  At which point attempts from any thread to
set VIRT_SPEC_CTRL.SSBD won't get propagated to the hardware anymore,
because the logic will see that the counter is greater than 1 and
assume that SSBD is already active, effectively losing the setting
of SSBD and the protection it provides.

Fix this by introducing a per-CPU variable that keeps track of whether
the current thread has legacy SSBD active or not, and thus only
attempt to propagate the value to the hardware once the thread
selected value changes.
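
The failure mode and the fix can be sketched with a toy model. This is not
Xen's code: the class and method names are hypothetical, and the 32-bit
wrap is emulated by hand to stand in for an unsigned counter.

```python
class Core:
    """Toy model of one core whose SSBD setting is shared between threads."""

    def __init__(self):
        self.count = 0        # threads with SSBD active (unsigned in C)
        self.hw_ssbd = False  # what the hardware is currently set to

    def account(self, enable):
        # Per-core accounting: hardware is touched only on 0 <-> 1 edges.
        if enable:
            self.count = (self.count + 1) % 2**32
            if self.count == 1:
                self.hw_ssbd = True
        else:
            self.count = (self.count - 1) % 2**32  # emulate unsigned wrap
            if self.count == 0:
                self.hw_ssbd = False


class Thread:
    def __init__(self, core):
        self.core = core
        self.active = False   # per-thread state, as introduced by the fix

    def write_unfiltered(self, enable):
        # Old behaviour: every guest write reaches the accounting helper.
        self.core.account(enable)

    def write_filtered(self, enable):
        # Fixed behaviour: propagate only when this thread's value changes.
        if enable != self.active:
            self.active = enable
            self.core.account(enable)
```

In this model two empty writes through `write_unfiltered()` wrap the
counter, and a later enable no longer reaches the hardware; the same
sequence through `write_filtered()` leaves the counter untouched.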

This is XSA-431 / CVE-2022-42336

Fixes: b2030e6730a2 ('amd/virt_ssbd: set SSBD at vCPU context switch')
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: eda98ea870803ea204a1928519b3f21ec6a679b6
master date: 2023-05-16 17:17:24 +0200

update Xen version to 4.17.1

x86/shadow: restore dropped check in sh_unshadow_for_p2m_change()

As a result of 241702e064604dbb3e0d9b731aa8f45be448243b the
mfn_valid() check in sh_unshadow_for_p2m_change() was lost.  That
allows sh_remove_shadows() to be called with gfns that have no backing
page, causing an ASSERT to trigger in debug builds or dereferencing an
arbitrary pointer partially under guest control in non-debug builds:

RIP:    e008:[<ffff82d0402dcf2c>] sh_remove_shadows+0x19f/0x722
RFLAGS: 0000000000010246   CONTEXT: hypervisor (d0v2)
[...]
Xen call trace:
   [<ffff82d0402dcf2c>] R sh_remove_shadows+0x19f/0x722
   [<ffff82d0402e28f4>] F arch/x86/mm/shadow/hvm.c#sh_unshadow_for_p2m_change+0xab/0x2b7
   [<ffff82d040311931>] F arch/x86/mm/p2m-pt.c#write_p2m_entry+0x19b/0x4d3
   [<ffff82d0403131b2>] F arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x67b/0xa8e
   [<ffff82d040302c92>] F p2m_set_entry+0xcc/0x149
   [<ffff82d040305a50>] F unmap_mmio_regions+0x17b/0x2c9
   [<ffff82d040241e5e>] F do_domctl+0x11f3/0x195e
   [<ffff82d0402c7e10>] F hvm_hypercall+0x5b1/0xa2d
   [<ffff82d0402adc72>] F vmx_vmexit_handler+0x130f/0x1cd5
   [<ffff82d040203602>] F vmx_asm_vmexit_handler+0xf2/0x210

****************************************
Panic on CPU 1:
Assertion 'mfn_valid(gmfn)' failed at arch/x86/mm/shadow/common.c:2203
****************************************

Fix this by restoring the mfn_valid() check in
sh_unshadow_for_p2m_change(), unifying it with the rest of the checks
that are done at the start of the function.

This is XSA-430 / CVE-2022-42335

Fixes: 241702e064 ('x86/shadow: slightly consolidate sh_unshadow_for_p2m_change() (part II)')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f6c3cb21628f7bed73cb992da400f6b36630f290
master date: 2023-04-25 15:44:54 +0200

automation: Remove installation of packages from test scripts

Now, when these packages are already installed in the respective
containers, we can remove them from the test scripts.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 72cfe1c3ad1fae95f4f0ac51dbdd6838264fdd7f
master date: 2022-12-09 14:55:33 -0800

xen/ELF: Fix ELF32 PRI formatters

It is rude to hide width formatting inside a PRI* macro, doubly so when it's
only in one bitness of the macro.

However it's fully buggy when all the users use %#"PRI because then it expands
to the common trap of %#08x which does not do what the author intends.

Switch the 32bit ELF PRI formatters to use plain integer PRI's, just like on
the 64bit side already. No practical change.
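
The %#08x trap is easy to reproduce outside C. Python's printf-style
formatting follows the same rule for this conversion, so it is used here
purely as an illustration: the "0x" prefix added by the '#' flag counts
towards the field width.

```python
value = 0x1234

# Width hidden next to the '#' flag: the "0x" prefix is counted inside the
# width of 8, so only six hex digits are printed.
hidden_width = "%#08x" % value

# Prefix written literally, width applied to the digits only. This is
# usually what an author reaching for "8 hex digits" intended.
explicit_width = "0x%08x" % value

# Plain formatter with no width at all.
plain = "%#x" % value
```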

Fixes: 7597fabca76e ("livepatch: Include sizes when an mismatch occurs")
Fixes: 380b229634f8 ("xsplice: Implement payload loading")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: cfa2bb82c01f0c656804cedd8f44eb2a99a2b5bc
master date: 2023-04-19 15:55:29 +0100

x86/livepatch: Fix livepatch application when CET is active

Right now, trying to apply a livepatch on any system with CET shstk (AMD Zen3
or later, Intel Tiger Lake or Sapphire Rapids and later) fails as follows:

  (XEN) livepatch: lp: Verifying enabled expectations for all functions
  (XEN) common/livepatch.c:1591: livepatch: lp: timeout is 30000000ns
  (XEN) common/livepatch.c:1703: livepatch: lp: CPU28 - IPIing the other 127 CPUs
  (XEN) livepatch: lp: Applying 1 functions
  (XEN) hi_func: Hi! (called 1 times)
  (XEN) Hook executing.
  (XEN) Assertion 'local_irq_is_enabled() || cpumask_subset(mask, cpumask_of(cpu))' failed at arch/x86/smp.c:265
  (XEN) *** DOUBLE FAULT ***
  <many double faults>

The assertion failure is from a global (system wide) TLB flush initiated by
modify_xen_mappings().  I'm not entirely sure when this broke, and I'm not
sure exactly what causes the #DF's, but it doesn't really matter either
because they highlight a latent bug that I'd overlooked with the CET-SS vs
patching work in the first place.

While we're careful to arrange for the patching CPU to avoid encountering
non-shstk memory with transient shstk perms, other CPUs can pick these
mappings up too if they need to re-walk for uarch reasons.

Another bug is that for livepatching, we only disable CET if shadow stacks are
in use.  Running on Intel CET systems when Xen is only using CET-IBT will
crash in arch_livepatch_quiesce() when trying to clear CR0.WP with CR4.CET
still active.

Also, we never went and cleared the dirty bits on .rodata.  This would
matter (for the same reason it matters on .text - it becomes a valid target
for WRSS), but we never actually patch .rodata anyway.

Therefore rework how we do patching for both alternatives and livepatches.

Introduce modify_xen_mappings_lite() with a purpose similar to
modify_xen_mappings(), but stripped down to the bare minimum as it's used in
weird contexts.  Leave all complexity to the caller to handle.

Instead of patching by clearing CR0.WP (and having to jump through some
fragile hoops to disable CET in order to do this), just transiently relax the
permissions on .text via l2_identmap[].

Note that neither alternatives nor livepatching edit .rodata, so we don't need
to relax those permissions at this juncture.

The perms are relaxed globally, but this is safe enough.  Alternatives run
before we boot APs, and Livepatching runs in a quiesced state where the other
CPUs are not doing anything interesting.

This approach is far more robust.

Fixes: 48cdc15a424f ("x86/alternatives: Clear CR4.CET when clearing CR0.WP")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 8676092a0f16ca6ad188d3fb270784a2caecf542
master date: 2023-04-18 20:20:26 +0100

x86/hvm: Disallow disabling paging in 64bit mode

The Long Mode consistency checks exist to "ensure that the processor does not
enter an undefined mode or state that results in unpredictable behavior".  APM
Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
preventing the OS from trying to exit Long mode while in 64bit mode.  This
could leave the CPU in Protected Mode with an %rip above the 4G boundary.

Experimentally, AMD CPUs really do permit this state transition.  An OS which
tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
to be going on behind the scenes ought to result in sane continued execution.

Furthermore, right from the very outset, the APM Vol2 14.7 "Leaving Long Mode"
section instructs people to switch to a compatibility mode segment first
before clearing CR0.PG, which does clear out the upper bits in %rip.  This is
further backed up by Vol2 Figure 1-6 "Operating Modes of the AMD64
Architecture".

Either way, this appears to have been a genuine oversight in the AMD64 spec.

Intel, on the other hand, rejects this state transition with #GP.

Between revision 71 (Nov 2019) and 72 (May 2020) of SDM Vol3, a footnote to
4.1.2 "Paging-Mode Enable" was altered from

  If CR4.PCIDE= 1, an attempt to clear CR0.PG causes a general-protection
  exception (#GP); software should clear CR4.PCIDE before attempting to
  disable paging.

to

  If the logical processor is in 64-bit mode or if CR4.PCIDE= 1, an attempt to
  clear CR0.PG causes a general-protection exception (#GP). Software should
  transition to compatibility mode and clear CR4.PCIDE before attempting to
  disable paging.

which acknowledges this corner case, but there doesn't appear to be any other
discussion even in the relevant Long Mode sections.

So it appears that Intel spotted and addressed the corner case in IA-32e mode,
but were 15 years late to document it.

Xen was written to the AMD spec, and misses the check.  Follow the Intel
behaviour, because it is more sensible and avoids hitting a VMEntry failure.
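
As a rough sketch of the rule being adopted (a toy model with hypothetical
names, not Xen's CR0 handling): a write that clears CR0.PG is refused when
the vCPU is currently executing in 64bit mode, and is allowed from
compatibility mode.

```python
CR0_PG = 1 << 31  # paging enable bit in CR0

def cr0_write_allowed(old_cr0, new_cr0, in_64bit_mode):
    """Return False where the Intel behaviour would raise #GP.

    Assumption: in_64bit_mode is True only when long mode is active and
    the current code segment is a 64bit one (not compatibility mode).
    """
    clearing_pg = bool(old_cr0 & CR0_PG) and not (new_cr0 & CR0_PG)
    return not (clearing_pg and in_64bit_mode)
```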

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 18c128ba66e6308744850aca96dbffd18f91c29b
master date: 2023-04-14 18:18:20 +0100

x86emul: pull permission check ahead for REP INS/OUTS

Based on observations on a fair range of hardware from both primary
vendors even zero-iteration-count instances of these insns perform the
port related permission checking first.
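
The ordering change can be sketched as follows. This is a toy model with
hypothetical names, not the emulator's code: the permission check is done
before the zero-iteration early exit rather than after it.

```python
def rep_ins(count, port_allowed):
    """Toy model of REP INS: returns the number of iterations performed."""
    # Permission check first, even when there is nothing to transfer.
    if not port_allowed:
        raise PermissionError("I/O port access denied")
    # Only then take the zero-iteration shortcut.
    if count == 0:
        return 0
    return count
```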

Fixes: fe300600464c ("x86: Fix emulation of REP prefix")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f41c88a6fca59f99a2eb5e7ed3d90ab7bca08b1b
master date: 2023-03-30 13:07:16 +0200

tools/xenstore: fix quota check in transaction_fix_domains()

Today when finalizing a transaction, the node quota is checked to ensure
it is not exceeded after the transaction. This check is always done,
even if the transaction is being performed by a privileged connection,
or if there were no nodes created in the transaction.

Correct that by checking quota only if:
- the transaction is being performed by an unprivileged guest, and
- at least one node was created in the transaction
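
The two conditions above amount to a simple predicate. A minimal sketch,
with a hypothetical function name and arguments (xenstored itself is
written in C):

```python
def must_check_node_quota(privileged, nodes_created):
    """Return True only when the quota check is actually needed."""
    # Privileged connections are exempt, and a transaction that created
    # no nodes cannot have pushed the domain over its quota.
    return (not privileged) and nodes_created > 0
```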

Reported-by: Julien Grall <julien@xen.org>
Fixes: f2bebf72c4d5 ("xenstore: rework of transaction handling")
Signed-off-by: Juergen Gross <jgross@suse.com>
master commit: f6b801c36bd5e4ab22a9f80c8d57121b62b139af
master date: 2023-03-29 22:02:36 +0100

CI: Remove llvm-8 from the Debian Stretch container

For similar reasons to c/s a6b1e2b80fe20. While this container is still
build-able for now, all the other problems with explicitly-versioned compilers
remain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7a298375721636290a57f31bb0f7c2a5a38956a4)

automation: Remove non-debug x86_32 build jobs

In the interest of having fewer jobs, we remove the x86_32 build jobs
that do release builds. A debug build is very likely to be enough to find
32bit build issues.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 7b66792ea7f77fb9e587e1e9c530a7c869eecba1)

automation: Remove CentOS 7.2 containers and builds

We already have a container which tracks the latest CentOS 7, so there
is no need for this one as well.

Also, 7.2 has an outdated root certificate which prevents connections to
websites which use Let's Encrypt.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit ba512629f76dfddb39ea9133ee51cdd9e392a927)

automation: Switch arm32 cross builds to run on arm64

Due to the limited x86 CI resources slowing down the whole pipeline,
switch the arm32 cross builds to be executed on arm64 which is much more
capable. For that, rename the existing debian container dockerfile
from unstable-arm32-gcc to unstable-arm64v8-arm32-gcc and use
arm64v8/debian:unstable as an image. Note, that we cannot use the same
container name as we have to keep the backwards compatibility.
Take the opportunity to remove extra empty line at the end of a file.

Modify the tag of .arm32-cross-build-tmpl to arm64 and update the build
jobs accordingly.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit a35fccc8df93de7154dba87db6e7bcf391e9d51c)

CI: Drop automation/configs/

Having 3 extra hypervisor builds on the end of a full build is deeply
confusing to debug if one of them fails, because the .config file presented in
the artefacts is not the one which caused a build failure. Also, the log
tends to be truncated in the UI.

PV-only is tested as part of PV-Shim in a full build anyway, so doesn't need
repeating. HVM-only and neither appear frequently in randconfig, so drop all
the logic here to simplify things.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7b20009a812f26e74bdbde2ab96165376b3dad34)

Commit patch queue (exported by git-debrebase)

[git-debrebase make-patches: export and commit patches]

ns16550: correct name/value pair parsing for PCI port/bridge

First of all these were inverted: "bridge=" caused the port coordinates
to be established, while "port=" controlled the bridge coordinates. And
then the error messages being identical also wasn't helpful. While
correcting this also move both case blocks close together.

Fixes: 97fd49a7e074 ("ns16550: add support for UART parameters to be specifed with name-value pairs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e692b22230b411d762ac9e278a398e28df474eae
master date: 2023-03-29 14:55:37 +0200

vpci/msix: handle accesses adjacent to the MSI-X table

The handling of the MSI-X table accesses by Xen requires that any
pages part of the MSI-X related tables are not mapped into the domain
physmap.  As a result, any device registers in the same pages as the
start or the end of the MSIX or PBA tables are not currently
accessible, as the accesses are just dropped.

Note the spec forbids such placing of registers, as the MSIX and PBA
tables must be 4K isolated from any other registers:

"If a Base Address register that maps address space for the MSI-X
Table or MSI-X PBA also maps other usable address space that is not
associated with MSI-X structures, locations (e.g., for CSRs) used in
the other address space must not share any naturally aligned 4-KB
address range with one where either MSI-X structure resides."

Yet the 'Intel Wi-Fi 6 AX201' device on one of my boxes has registers
in the same page as the MSIX tables, and thus won't work on a PVH dom0
without this fix.

In order to cope with this behavior, pass through any accesses that fall
on the same page as the MSIX tables (but don't fall in between) to the
underlying hardware.  Such forwarding also takes care of the PBA
accesses, so it allows removing the code doing this handling in
msix_{read,write}.  Note that as a result accesses to the PBA array
are no longer limited to 4 and 8 byte sizes, there's no access size
restriction for PBA accesses documented in the specification.
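
The classification of an access can be sketched like this. It is a toy
model under stated assumptions: a single table, 4K pages, and hypothetical
names; it does not reflect the actual vPCI data structures.

```python
PAGE_SIZE = 4096

def classify_access(addr, table_start, table_size):
    """Classify an access relative to one MSI-X table.

    Returns "emulate" for accesses inside the table, "passthrough" for
    accesses on a page shared with the start or end of the table, and
    "not-trapped" for everything else.
    """
    table_end = table_start + table_size
    if table_start <= addr < table_end:
        return "emulate"
    first_page = table_start & ~(PAGE_SIZE - 1)
    last_page_end = (table_end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    if first_page <= addr < last_page_end:
        return "passthrough"
    return "not-trapped"
```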

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

vpci/msix: restore PBA access length and alignment restrictions

Accesses to the PBA array have the same length and alignment
limitations as accesses to the MSI-X table:

"For all accesses to MSI-X Table and MSI-X PBA fields, software must
use aligned full DWORD or aligned full QWORD transactions; otherwise,
the result is undefined."

Introduce such length and alignment checks into the handling of PBA
accesses for vPCI.  This was a mistake of mine for not reading the
specification correctly.

Note that accesses must now be aligned, and hence there's no longer a
need to check that the end of the access falls into the PBA region as
both the access and the region addresses must be aligned.
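
The restriction quoted above (aligned full DWORD or aligned full QWORD)
reduces to a short check. A minimal sketch with a hypothetical function
name:

```python
def pba_access_ok(addr, length):
    """Accept only naturally aligned 4-byte or 8-byte accesses."""
    return length in (4, 8) and addr % length == 0
```

Because an accepted access is naturally aligned and no wider than 8 bytes,
it cannot straddle the end of a region whose bounds are aligned in the same
way, which is why the separate end-of-access check becomes unnecessary.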

Fixes: b177892d2d ('vpci/msix: handle accesses adjacent to the MSI-X table')
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b177892d2d0e8a31122c218989f43130aeba5282
master date: 2023-03-28 14:20:35 +0200
master commit: 7a502b4fbc339e9d3d3d45fb37f09da06bc3081c
master date: 2023-03-29 14:56:33 +0200

include: don't mention stub headers more than once in a make rule

When !GRANT_TABLE and !PV_SHIM headers-n contains grant_table.h twice,
causing make to complain "target '...' given more than once in the same
rule" for the rule generating the stub headers. We don't need duplicate
entries in headers-n anywhere, so zap them (by using $(sort ...)) right
where the final value of the variable is constructed.
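
GNU make's $(sort ...) both orders its arguments and removes duplicates,
which is the property relied on here. The equivalent operation, shown in
Python purely for illustration (the helper name is hypothetical):

```python
def dedup_sorted(words):
    """Mimic make's $(sort ...): lexical order, duplicates removed."""
    return sorted(set(words))
```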

Fixes: 6bec713f871f ("include/compat: produce stubs for headers not otherwise generated")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 231ab79704cbb5b9be7700287c3b185225d34f1b
master date: 2023-03-28 14:20:16 +0200

x86/ucode: Fix error paths control_thread_fn()

These two early exits skipped re-enabling the watchdog, restoring the NMI
callback, and clearing the nmi_patch global pointer. Always execute the tail
of the function on the way out.
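
The shape of the fix, sketched in Python with hypothetical names (the real
function is C and restores more state than shown): cleanup that must run
on every exit path is placed where early exits cannot skip it.

```python
def control_thread(steps, log):
    """Run each step in order; stop at the first failure.

    The cleanup in the finally block runs on success and on early exit
    alike, standing in for the common tail of the real function.
    """
    log.append("watchdog disabled")
    try:
        for step in steps:
            if not step():
                return False  # early exit still reaches the cleanup
        return True
    finally:
        log.append("watchdog enabled")
```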

Fixes: 8dd4dfa92d62 ("x86/microcode: Synchronize late microcode loading")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: fc2e1f3aad602a66c14b8285a1bd38a82f8fd02d
master date: 2023-03-28 11:57:56 +0100

x86/vmx: Don't spuriously crash the domain when INIT is received

In VMX operation, the handling of INIT IPIs is changed. Instead of the CPU
resetting, the next VMEntry fails with EXIT_REASON_INIT. From the TXT spec,
the intent of this behaviour is so that an entity which cares can scrub
secrets from RAM before participating in an orderly shutdown.

Right now, Xen's behaviour is that when an INIT arrives, the HVM VM which
schedules next is killed (citing an unknown VMExit), *and* we ignore the INIT
and continue blindly onwards anyway.

This patch addresses only the first of these two problems by ignoring the INIT
and continuing without crashing the VM in question.

The second wants addressing too, just as soon as we've figured out something
better to do...

Discovered as collateral damage from when an AP triple faults on S3 resume on
Intel TigerLake platforms.

Link: https://github.com/QubesOS/qubes-issues/issues/7283
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: b1f11273d5a774cc88a3685c96c2e7cf6385e3b6
master date: 2023-03-24 22:49:58 +0000

x86/shadow: Fix build with no PG_log_dirty

Gitlab Randconfig found:

  arch/x86/mm/shadow/common.c: In function 'shadow_prealloc':
  arch/x86/mm/shadow/common.c:1023:18: error: implicit declaration of function
      'paging_logdirty_levels'; did you mean 'paging_log_dirty_init'? [-Werror=implicit-function-declaration]
   1023 |         count += paging_logdirty_levels();
        |                  ^~~~~~~~~~~~~~~~~~~~~~
        |                  paging_log_dirty_init
  arch/x86/mm/shadow/common.c:1023:18: error: nested extern declaration of 'paging_logdirty_levels' [-Werror=nested-externs]

The '#if PG_log_dirty' expression is currently SHADOW_PAGING && !HVM &&
PV_SHIM_EXCLUSIVE.  Move the declaration outside.

Fixes: 33fb3a661223 ("x86/shadow: account for log-dirty mode when pre-allocating")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6d14cb105b1c54ad7b4228d858ae85aa8a672bbd
master date: 2023-03-24 12:16:31 +0000

x86/nospec: Fix evaluate_nospec() code generation under Clang

It turns out that evaluate_nospec() code generation is not safe under Clang.
Given:

  void eval_nospec_test(int x)
  {
      if ( evaluate_nospec(x) )
          asm volatile ("nop #true" ::: "memory");
      else
          asm volatile ("nop #false" ::: "memory");
  }

Clang emits:

  <eval_nospec_test>:
         0f ae e8                lfence
         85 ff                   test   %edi,%edi
         74 02                   je     <eval_nospec_test+0x9>
         90                      nop
         c3                      ret
         90                      nop
         c3                      ret

which is not safe because the lfence has been hoisted above the conditional
jump.  Clang concludes that both barrier_nospec_true()'s have identical side
effects and can safely be merged.

Clang can be persuaded that the side effects are different if there are
different comments in the asm blocks.  This is fragile, but no more fragile
than other aspects of this construct.

Introduce barrier_nospec_false() with a separate internal comment to prevent
Clang merging it with barrier_nospec_true() despite the otherwise-identical
content.  The generated code now becomes:

  <eval_nospec_test>:
         85 ff                   test   %edi,%edi
         74 05                   je     <eval_nospec_test+0x9>
         0f ae e8                lfence
         90                      nop
         c3                      ret
         0f ae e8                lfence
         90                      nop
         c3                      ret

which has the correct number of lfence's, and in the correct place.

Link: https://github.com/llvm/llvm-project/issues/55084
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: bc3c133841435829ba5c0a48427e2a77633502ab
master date: 2023-03-24 12:16:31 +0000

x86/shadow: fix and improve sh_page_has_multiple_shadows()

While no caller currently invokes the function without first making sure
there is at least one shadow [1], we'd better eliminate UB here:
find_first_set_bit() requires input to be non-zero to return a well-
defined result.

Further, using find_first_set_bit() isn't very efficient in the first
place for the intended purpose.
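
One common way to ask "is more than one bit set?" without any bit scan is
sketched below. The message above does not spell out the replacement, so
this is an assumption, and the function name is hypothetical. The
expression is well defined for a zero input.

```python
def has_multiple_bits(mask):
    """Return True when at least two bits are set in mask."""
    # mask & (mask - 1) clears the lowest set bit; anything left over
    # means a second bit was set.
    return mask != 0 and (mask & (mask - 1)) != 0
```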

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[1] The function has exactly two uses, and both are from OOS code, which
    is HVM-only. For HVM (but not for PV) sh_mfn_is_a_page_table(),
    guarding the call to sh_unsync(), guarantees at least one shadow.
    Hence even if sh_page_has_multiple_shadows() returned a bogus value
    when invoked for a PV domain, the subsequent is_hvm_vcpu() and
    oos_active checks (the former being redundant with the latter) will
    compensate. (Arguably that oos_active check should come first, for
    both clarity and efficiency reasons.)
master commit: 2896224a4e294652c33f487b603d20bd30955f21
master date: 2023-03-24 11:07:08 +0100

VT-d: fix iommu=no-igfx if the IOMMU scope contains fake device(s)

If the scope for IGD's IOMMU contains an additional device that doesn't
actually exist, iommu=no-igfx would not disable that IOMMU. In this
particular case (Thinkpad x230) it included 00:02.1, but there is no
such device on this platform. Consider only existing devices for the
"gfx only" check as well as the establishing of IGD DRHD address
(underlying is_igd_drhd(), which is used to determine applicability of
two workarounds).

Fixes: 2d7f191b392e ("VT-d: generalize and correct "iommu=no-igfx" handling")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 49de6749baa8d0addc3048defd4ef3e85cb135e9
master date: 2023-03-23 09:16:41 +0100

AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

An earlier change with the same title (commit 1ba66a870eba) altered only
the path where x2apic_phys was already set to false (perhaps from the
command line). The same of course needs applying when the variable
wasn't modified yet from its initial value.

Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 0d2686f6b66b4b1b3c72c3525083b0ce02830054
master date: 2023-03-21 09:23:25 +0100

Declare fast forward / record previous work

[git-debrebase pseudomerge: quick]

debian/changelog: finish 4.17.0+74-g3eac216e6e-1