xen.git
2 years agopygrub: Specify -rpath LIBEXEC_LIB when building fsimage.so
Ian Jackson [Fri, 22 Feb 2019 12:24:35 +0000 (12:24 +0000)]
pygrub: Specify -rpath LIBEXEC_LIB when building fsimage.so

If LIBEXEC_LIB is not on the default linker search path, the python
fsimage.so module fails to find libfsimage.so.

Add the relevant directory to the rpath explicitly.

(This situation occurs in the Debian package, where
--with-libexec-libdir is used to put each Xen version's libraries and
utilities in their own directory, to allow them to be coinstalled.)

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agopygrub: Set sys.path
Bastian Blank [Sat, 5 Jul 2014 09:47:01 +0000 (11:47 +0200)]
pygrub: Set sys.path

We install libfsimage in a non-standard path for Reasons.
(See debian/rules.)

This patch was originally part of `tools-pygrub-prefix.diff'
(eg commit 51657319be54) and included changes to the Makefile to
change the installation arrangements (we do that part in the rules now
since that is a lot less prone to conflicts when we update) and to
shared library rpath (which is now done in a separate patch).

(Commit message rewritten by Ian Jackson.)

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
squash! pygrub: Set sys.path and rpath

2 years agohotplug-common: Do not adjust LD_LIBRARY_PATH
Ian Jackson [Thu, 21 Feb 2019 16:05:40 +0000 (16:05 +0000)]
hotplug-common: Do not adjust LD_LIBRARY_PATH

This is in the upstream script because on non-Debian systems, the
default install locations in /usr/local/lib might not be on the linker
path, and as a result the hotplug scripts would break.

A reason we might need it in Debian is our multiple version
coinstallation scheme.  However, the hotplug scripts all call the
utilities via the wrappers, and the binaries are configured to load
from the right place anyway.

This setting is an annoyance because it requires libdir, which is an
arch-specific path but comes from a file we want to put in
xen-utils-common, an arch:all package.

So drop this setting.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agosysconfig.xencommons.in: Strip and debianize
Hans van Kranenburg [Sat, 9 Feb 2019 16:27:26 +0000 (17:27 +0100)]
sysconfig.xencommons.in: Strip and debianize

Strip all options that are for stuff we don't ship, which is 1)
xenstored as stubdom and 2) the new options for oom score and open file
descriptor limit, which would not have any effect, because we're
shipping different init scripts... :|

It seems useful to give the user the option to revert to xenstored
instead of the default oxenstored if they really want.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
Acked-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
2 years agot/h/L/vif-common.sh: disable handle_iptable
Hans van Kranenburg [Thu, 3 Jan 2019 23:35:45 +0000 (00:35 +0100)]
t/h/L/vif-common.sh: disable handle_iptable

Also see Debian bug #894013. The current attempt at providing
anti-spoofing rules results in a situation that does not have any
effect. Also note that forwarding bridged traffic to iptables is not
enabled by default, and that for openvswitch users it does not make any
sense.

So, stop cluttering the live iptables ruleset.

This functionality seems to be introduced before 2004 and since then it
has never got some additional love.

It would be nice to have a proper discussion upstream about how Xen
could provide some anti mac/ip spoofing in the dom0. It does not seem to
be a trivial thing to do, since it requires having quite some knowledge
about what the domU is allowed to do or not (e.g. a domU can be a
router...).

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
2 years agodocs/man/xen-vbd-interface.7: Provide properly-formatted NAME section
Ian Jackson [Fri, 12 Oct 2018 16:56:56 +0000 (17:56 +0100)]
docs/man/xen-vbd-interface.7: Provide properly-formatted NAME section

This manpage was omitted from
   docs/man: Provide properly-formatted NAME sections
because I was previously building with markdown not installed.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agoshim: Provide separate install-shim target
Ian Jackson [Fri, 12 Oct 2018 17:17:10 +0000 (17:17 +0000)]
shim: Provide separate install-shim target

When building on a 32-bit userland, the user wants to build 32-bit
tools and a 64-bit hypervisor.  This involves setting XEN_TARGET_ARCH
to different values for the tools build and the hypervisor build.

So the user must invoke the tools build and the hypervisor build
separately.

However, although the shim is done by the tools/firmware Makefile, its
bitness needs to be the same as the hypervisor, not the same as the
tools.  When run with XEN_TARGET_ARCH=x86_32, it it skipped, which is
wrong.

So the user must invoke the shim build separately.  This can be done
with
   make -C tools/firmware/xen-dir XEN_TARGET_ARCH=x86_64

However, tools/firmware/xen-dir has no `install' target.  The
installation of all `firmware' is done in tools/firmware/Makefile.  It
might be possible to fix this, but it is not trivial.  For example,
the definitions of INST_DIR and DEBG_DIR would need to be copied, as
would an appropriate $(INSTALL_DIR) call.

For now, provide an `install-shim' target in tools/firmware/Makefile.

This has to be called from `install' of course.  We can't make it
a dependency of `install' because it might be run before `all' has
completed.  We could make it depend on a `shim' target but such
a target is nearly impossible to write because everything is done by
the inflexible subdir-$@ machinery.

The overally result of this patch is that existing make invocations
work as before.  But additionally, the user can say
  make -C tools/firmware install-shim XEN_TARGET_ARCH=x86_64
to install the shim.  The user must have built it already.
Unlike the build rune, this install-rune is properly conditional
so it is OK to call on ARM.

What a mess.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
2 years agoconfig/Tools.mk.in: Respect caller's CONFIG_PV_SHIM
Ian Jackson [Fri, 12 Oct 2018 16:00:16 +0000 (16:00 +0000)]
config/Tools.mk.in: Respect caller's CONFIG_PV_SHIM

This makes it easier to disable the shim build.  (In Debian we need to
build the shim separately because it needs different compiler flags).

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
[ Hans: adjust from tools/firmware/Makefile to config/Tools.mk.in to
follow changes that happened in 8845155c83 ("pvshim: make PV shim build
selectable from configure") ]
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
2 years ago.gitignore: Add configure output which we always delete and regenerate
Ian Jackson [Fri, 5 Oct 2018 17:05:48 +0000 (18:05 +0100)]
.gitignore: Add configure output which we always delete and regenerate

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agoautoconf: Provide libexec_libdir_suffix
Ian Jackson [Wed, 3 Oct 2018 15:25:58 +0000 (16:25 +0100)]
autoconf: Provide libexec_libdir_suffix

This is going to be used to put libfsimage.so into a path containing
the multiarch triplet.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agotools-libfsimage-prefix.diff
Hans van Kranenburg [Mon, 25 May 2020 15:08:18 +0000 (17:08 +0200)]
tools-libfsimage-prefix.diff

\o/

2 years agoDo not build the instruction emulator
Ian Jackson [Thu, 20 Sep 2018 17:10:14 +0000 (18:10 +0100)]
Do not build the instruction emulator

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agoRemove static solaris support from pygrub
Bastian Blank [Sat, 5 Jul 2014 09:47:29 +0000 (11:47 +0200)]
Remove static solaris support from pygrub

Patch-Name: tools-pygrub-remove-static-solaris-support

Gbp-Pq: Topic misc
Gbp-Pq: Name tools-pygrub-remove-static-solaris-support

2 years agoDo not ship COPYING into /usr/include
Bastian Blank [Sat, 5 Jul 2014 09:47:30 +0000 (11:47 +0200)]
Do not ship COPYING into /usr/include

This is not wanted in Debian.  COPYING ends up in
/usr/share/doc/xen-*copyright.

Patch-Name: tools-include-no-COPYING.diff

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agoconfig-prefix.diff
Bastian Blank [Sat, 5 Jul 2014 09:46:45 +0000 (11:46 +0200)]
config-prefix.diff

Patch-Name: config-prefix.diff

Gbp-Pq: Topic prefix-abiname
Gbp-Pq: Name config-prefix.diff

2 years agoDisplay Debian package version in hypervisor log
Bastian Blank [Sat, 5 Jul 2014 09:46:43 +0000 (11:46 +0200)]
Display Debian package version in hypervisor log

During hypervisor boot, disable the banner and nicely display the xen
version as well as the Maintainer address from debian/control.

For this to work the DEB_VERSION and DEB_MAINTAINER variables needs to
be set by debian/rules.

Original patch by Bastian Blank <waldi@debian.org>
Modified by
Hans van Kranenburg <hans@knorrie.org>
Maximilian Engelhardt <maxi@daemonizer.de>

2 years agoDelete configure output
Ian Jackson [Wed, 19 Sep 2018 15:53:22 +0000 (16:53 +0100)]
Delete configure output

These autogenerated files are not useful in Debian; dh_autoreconf will
regenerate them.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agoDelete config.sub and config.guess
Ian Jackson [Wed, 19 Sep 2018 15:45:49 +0000 (16:45 +0100)]
Delete config.sub and config.guess

dh_autoreconf will provide these back.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
2 years agodebian/changelog: finish 4.17.1+2-gb773c48e36-1
Maximilian Engelhardt [Thu, 18 May 2023 19:27:07 +0000 (21:27 +0200)]
debian/changelog: finish 4.17.1+2-gb773c48e36-1

2 years agoUpdate changelog for new upstream 4.17.1+2-gb773c48e36
Maximilian Engelhardt [Thu, 18 May 2023 19:24:46 +0000 (21:24 +0200)]
Update changelog for new upstream 4.17.1+2-gb773c48e36

[git-debrebase changelog: new upstream 4.17.1+2-gb773c48e36]

2 years agoUpdate to upstream 4.17.1+2-gb773c48e36
Maximilian Engelhardt [Thu, 18 May 2023 19:24:46 +0000 (21:24 +0200)]
Update to upstream 4.17.1+2-gb773c48e36

[git-debrebase anchor: new upstream 4.17.1+2-gb773c48e36, merge]

2 years agoupdate Xen version to 4.17.2-pre
Jan Beulich [Tue, 16 May 2023 15:23:29 +0000 (17:23 +0200)]
update Xen version to 4.17.2-pre

2 years agox86/amd: fix legacy setting of SSBD on AMD Family 17h
Roger Pau Monné [Tue, 16 May 2023 15:22:35 +0000 (17:22 +0200)]
x86/amd: fix legacy setting of SSBD on AMD Family 17h

The current logic to set SSBD on AMD Family 17h and Hygon Family 18h
processors requires that the setting of SSBD is coordinated at a core
level, as the setting is shared between threads.  Logic was introduced
to keep track of how many threads require SSBD active in order to
coordinate it, such logic relies on using a per-core counter of
threads that have SSBD active.

Given the current logic, it's possible for a guest to under or
overflow the thread counter, because each write to VIRT_SPEC_CTRL.SSBD
by the guest gets propagated to the helper that does the per-core
active accounting.  Overflowing the counter is not so much of an
issue, as this would just make SSBD sticky.

Underflowing however is more problematic: on non-debug Xen builds a
guest can perform empty writes to VIRT_SPEC_CTRL that would cause the
counter to underflow and thus the value gets saturated to the max
value of unsigned int.  At which points attempts from any thread to
set VIRT_SPEC_CTRL.SSBD won't get propagated to the hardware anymore,
because the logic will see that the counter is greater than 1 and
assume that SSBD is already active, effectively loosing the setting
of SSBD and the protection it provides.

Fix this by introducing a per-CPU variable that keeps track of whether
the current thread has legacy SSBD active or not, and thus only
attempt to propagate the value to the hardware once the thread
selected value changes.

This is XSA-431 / CVE-2022-42336

Fixes: b2030e6730a2 ('amd/virt_ssbd: set SSBD at vCPU context switch')
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: eda98ea870803ea204a1928519b3f21ec6a679b6
master date: 2023-05-16 17:17:24 +0200

2 years agoupdate Xen version to 4.17.1
Jan Beulich [Thu, 27 Apr 2023 12:53:19 +0000 (14:53 +0200)]
update Xen version to 4.17.1

2 years agox86/shadow: restore dropped check in sh_unshadow_for_p2m_change()
Roger Pau Monné [Tue, 25 Apr 2023 13:47:44 +0000 (15:47 +0200)]
x86/shadow: restore dropped check in sh_unshadow_for_p2m_change()

As a result of 241702e064604dbb3e0d9b731aa8f45be448243b the
mfn_valid() check in sh_unshadow_for_p2m_change() was lost.  That
allows sh_remove_shadows() to be called with gfns that have no backing
page, causing an ASSERT to trigger in debug builds or dereferencing an
arbitrary pointer partially under guest control in non-debug builds:

RIP:    e008:[<ffff82d0402dcf2c>] sh_remove_shadows+0x19f/0x722
RFLAGS: 0000000000010246   CONTEXT: hypervisor (d0v2)
[...]
Xen call trace:
   [<ffff82d0402dcf2c>] R sh_remove_shadows+0x19f/0x722
   [<ffff82d0402e28f4>] F arch/x86/mm/shadow/hvm.c#sh_unshadow_for_p2m_change+0xab/0x2b7
   [<ffff82d040311931>] F arch/x86/mm/p2m-pt.c#write_p2m_entry+0x19b/0x4d3
   [<ffff82d0403131b2>] F arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x67b/0xa8e
   [<ffff82d040302c92>] F p2m_set_entry+0xcc/0x149
   [<ffff82d040305a50>] F unmap_mmio_regions+0x17b/0x2c9
   [<ffff82d040241e5e>] F do_domctl+0x11f3/0x195e
   [<ffff82d0402c7e10>] F hvm_hypercall+0x5b1/0xa2d
   [<ffff82d0402adc72>] F vmx_vmexit_handler+0x130f/0x1cd5
   [<ffff82d040203602>] F vmx_asm_vmexit_handler+0xf2/0x210

****************************************
Panic on CPU 1:
Assertion 'mfn_valid(gmfn)' failed at arch/x86/mm/shadow/common.c:2203
****************************************

Fix this by restoring the mfn_valid() check in
sh_unshadow_for_p2m_change(), unifying it with the rest of the checks
that are done at the start of the function.

This is XSA-430 / CVE-2022-42335

Fixes: 241702e064 ('x86/shadow: slightly consolidate sh_unshadow_for_p2m_change() (part II)')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f6c3cb21628f7bed73cb992da400f6b36630f290
master date: 2023-04-25 15:44:54 +0200

2 years agoautomation: Remove installation of packages from test scripts
Michal Orzel [Tue, 25 Apr 2023 07:12:48 +0000 (09:12 +0200)]
automation: Remove installation of packages from test scripts

Now, when these packages are already installed in the respective
containers, we can remove them from the test scripts.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 72cfe1c3ad1fae95f4f0ac51dbdd6838264fdd7f
master date: 2022-12-09 14:55:33 -0800

2 years agoxen/ELF: Fix ELF32 PRI formatters
Andrew Cooper [Mon, 24 Apr 2023 11:02:19 +0000 (13:02 +0200)]
xen/ELF: Fix ELF32 PRI formatters

It is rude to hide width formatting inside a PRI* macro, doubly so when it's
only in one bitness of the macro.

However its fully buggy when all the users use %#"PRI because then it expands
to the common trap of %#08x which does not do what the author intends.

Switch the 32bit ELF PRI formatters to use plain integer PRI's, just like on
the 64bit side already.  No practical change.

Fixes: 7597fabca76e ("livepatch: Include sizes when an mismatch occurs")
Fixes: 380b229634f8 ("xsplice: Implement payload loading")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: cfa2bb82c01f0c656804cedd8f44eb2a99a2b5bc
master date: 2023-04-19 15:55:29 +0100

2 years agox86/livepatch: Fix livepatch application when CET is active
Andrew Cooper [Mon, 24 Apr 2023 11:01:23 +0000 (13:01 +0200)]
x86/livepatch: Fix livepatch application when CET is active

Right now, trying to apply a livepatch on any system with CET shstk (AMD Zen3
or later, Intel Tiger Lake or Sapphire Rapids and later) fails as follows:

  (XEN) livepatch: lp: Verifying enabled expectations for all functions
  (XEN) common/livepatch.c:1591: livepatch: lp: timeout is 30000000ns
  (XEN) common/livepatch.c:1703: livepatch: lp: CPU28 - IPIing the other 127 CPUs
  (XEN) livepatch: lp: Applying 1 functions
  (XEN) hi_func: Hi! (called 1 times)
  (XEN) Hook executing.
  (XEN) Assertion 'local_irq_is_enabled() || cpumask_subset(mask, cpumask_of(cpu))' failed at arch/x86/smp.c:265
  (XEN) *** DOUBLE FAULT ***
  <many double faults>

The assertion failure is from a global (system wide) TLB flush initiated by
modify_xen_mappings().  I'm not entirely sure when this broke, and I'm not
sure exactly what causes the #DF's, but it doesn't really matter either
because they highlight a latent bug that I'd overlooked with the CET-SS vs
patching work the first place.

While we're careful to arrange for the patching CPU to avoid encountering
non-shstk memory with transient shstk perms, other CPUs can pick these
mappings up too if they need to re-walk for uarch reasons.

Another bug is that for livepatching, we only disable CET if shadow stacks are
in use.  Running on Intel CET systems when Xen is only using CET-IBT will
crash in arch_livepatch_quiesce() when trying to clear CR0.WP with CR4.CET
still active.

Also, we never went and cleared the dirty bits on .rodata.  This would
matter (for the same reason it matters on .text - it becomes a valid target
for WRSS), but we never actually patch .rodata anyway.

Therefore rework how we do patching for both alternatives and livepatches.

Introduce modify_xen_mappings_lite() with a purpose similar to
modify_xen_mappings(), but stripped down to the bare minimum as it's used in
weird contexts.  Leave all complexity to the caller to handle.

Instead of patching by clearing CR0.WP (and having to jump through some
fragile hoops to disable CET in order to do this), just transiently relax the
permissions on .text via l2_identmap[].

Note that neither alternatives nor livepatching edit .rodata, so we don't need
to relax those permissions at this juncture.

The perms are relaxed globally, but this is safe enough.  Alternatives run
before we boot APs, and Livepatching runs in a quiesced state where the other
CPUs are not doing anything interesting.

This approach is far more robust.

Fixes: 48cdc15a424f ("x86/alternatives: Clear CR4.CET when clearing CR0.WP")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 8676092a0f16ca6ad188d3fb270784a2caecf542
master date: 2023-04-18 20:20:26 +0100

2 years agox86/hvm: Disallow disabling paging in 64bit mode
Andrew Cooper [Mon, 24 Apr 2023 11:00:44 +0000 (13:00 +0200)]
x86/hvm: Disallow disabling paging in 64bit mode

The Long Mode consistency checks exist to "ensure that the processor does not
enter an undefined mode or state that results in unpredictable behavior".  APM
Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
preventing the OS from trying to exit Long mode while in 64bit mode.  This
could leave the CPU in Protected Mode with an %rip above the 4G boundary.

Experimentally, AMD CPUs really do permit this state transition.  An OS which
tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
to be going on behind the scenes ought to result in sane continued execution.

Furthermore, right from the very outset, the APM Vol2 14.7 "Leaving Long Mode"
section instructs peoples to switch to a compatibility mode segment first
before clearing CR0.PG, which does clear out the upper bits in %rip.  This is
further backed up by Vol2 Figure 1-6 "Operating Modes of the AMD64
Architecture".

Either way, this appears to have been a genuine oversight in the AMD64 spec.

Intel, on the other hand, rejects this state transition with #GP.

Between revision 71 (Nov 2019) and 72 (May 2020) of SDM Vol3, a footnote to
4.1.2 "Paging-Mode Enable" was altered from

  If CR4.PCIDE= 1, an attempt to clear CR0.PG causes a general-protection
  exception (#GP); software should clear CR4.PCIDE before attempting to
  disable paging.

to

  If the logical processor is in 64-bit mode or if CR4.PCIDE= 1, an attempt to
  clear CR0.PG causes a general-protection exception (#GP). Software should
  transition to compatibility mode and clear CR4.PCIDE before attempting to
  disable paging.

which acknowledges this corner case, but there doesn't appear to be any other
discussion even in the relevant Long Mode sections.

So it appears that Intel spotted and addressed the corner case in IA-32e mode,
but were 15 years late to document it.

Xen was written to the AMD spec, and misses the check.  Follow the Intel
behaviour, because it is more sensible and avoids hitting a VMEntry failure.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 18c128ba66e6308744850aca96dbffd18f91c29b
master date: 2023-04-14 18:18:20 +0100

2 years agox86emul: pull permission check ahead for REP INS/OUTS
Jan Beulich [Mon, 24 Apr 2023 10:59:39 +0000 (12:59 +0200)]
x86emul: pull permission check ahead for REP INS/OUTS

Based on observations on a fair range of hardware from both primary
vendors even zero-iteration-count instances of these insns perform the
port related permission checking first.

Fixes: fe300600464c ("x86: Fix emulation of REP prefix")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f41c88a6fca59f99a2eb5e7ed3d90ab7bca08b1b
master date: 2023-03-30 13:07:16 +0200

2 years agotools/xenstore: fix quota check in transaction_fix_domains()
Juergen Gross [Mon, 24 Apr 2023 10:59:13 +0000 (12:59 +0200)]
tools/xenstore: fix quota check in transaction_fix_domains()

Today when finalizing a transaction the number of node quota is checked
to not being exceeded after the transaction. This check is always done,
even if the transaction is being performed by a privileged connection,
or if there were no nodes created in the transaction.

Correct that by checking quota only if:
- the transaction is being performed by an unprivileged guest, and
- at least one node was created in the transaction

Reported-by: Julien Grall <julien@xen.org>
Fixes: f2bebf72c4d5 ("xenstore: rework of transaction handling")
Signed-off-by: Juergen Gross <jgross@suse.com>
master commit: f6b801c36bd5e4ab22a9f80c8d57121b62b139af
master date: 2023-03-29 22:02:36 +0100

2 years agoCI: Remove llvm-8 from the Debian Stretch container
Andrew Cooper [Fri, 24 Mar 2023 17:59:56 +0000 (17:59 +0000)]
CI: Remove llvm-8 from the Debian Stretch container

For similar reasons to c/s a6b1e2b80fe20.  While this container is still
build-able for now, all the other problems with explicitly-versioned compilers
remain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7a298375721636290a57f31bb0f7c2a5a38956a4)

2 years agoautomation: Remove non-debug x86_32 build jobs
Anthony PERARD [Fri, 24 Feb 2023 17:29:15 +0000 (17:29 +0000)]
automation: Remove non-debug x86_32 build jobs

In the interest of having less jobs, we remove the x86_32 build jobs
that do release build. Debug build is very likely to be enough to find
32bit build issues.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 7b66792ea7f77fb9e587e1e9c530a7c869eecba1)

2 years agoautomation: Remove CentOS 7.2 containers and builds
Anthony PERARD [Tue, 21 Feb 2023 16:55:36 +0000 (16:55 +0000)]
automation: Remove CentOS 7.2 containers and builds

We already have a container which track the latest CentOS 7, no need
for this one as well.

Also, 7.2 have outdated root certificate which prevent connection to
website which use Let's Encrypt.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit ba512629f76dfddb39ea9133ee51cdd9e392a927)

2 years agoautomation: Switch arm32 cross builds to run on arm64
Michal Orzel [Tue, 14 Feb 2023 15:38:38 +0000 (16:38 +0100)]
automation: Switch arm32 cross builds to run on arm64

Due to the limited x86 CI resources slowing down the whole pipeline,
switch the arm32 cross builds to be executed on arm64 which is much more
capable. For that, rename the existing debian container dockerfile
from unstable-arm32-gcc to unstable-arm64v8-arm32-gcc and use
arm64v8/debian:unstable as an image. Note, that we cannot use the same
container name as we have to keep the backwards compatibility.
Take the opportunity to remove extra empty line at the end of a file.

Modify the tag of .arm32-cross-build-tmpl to arm64 and update the build
jobs accordingly.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit a35fccc8df93de7154dba87db6e7bcf391e9d51c)

2 years agoCI: Drop automation/configs/
Andrew Cooper [Thu, 29 Dec 2022 15:39:13 +0000 (15:39 +0000)]
CI: Drop automation/configs/

Having 3 extra hypervisor builds on the end of a full build is deeply
confusing to debug if one of them fails, because the .config file presented in
the artefacts is not the one which caused a build failure.  Also, the log
tends to be truncated in the UI.

PV-only is tested as part of PV-Shim in a full build anyway, so doesn't need
repeating.  HVM-only and neither appear frequently in randconfig, so drop all
the logic here to simplify things.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7b20009a812f26e74bdbde2ab96165376b3dad34)

2 years agons16550: correct name/value pair parsing for PCI port/bridge
Jan Beulich [Fri, 31 Mar 2023 06:35:15 +0000 (08:35 +0200)]
ns16550: correct name/value pair parsing for PCI port/bridge

First of all these were inverted: "bridge=" caused the port coordinates
to be established, while "port=" controlled the bridge coordinates. And
then the error messages being identical also wasn't helpful. While
correcting this also move both case blocks close together.

Fixes: 97fd49a7e074 ("ns16550: add support for UART parameters to be specifed with name-value pairs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e692b22230b411d762ac9e278a398e28df474eae
master date: 2023-03-29 14:55:37 +0200

2 years agovpci/msix: handle accesses adjacent to the MSI-X table
Roger Pau Monné [Fri, 31 Mar 2023 06:34:26 +0000 (08:34 +0200)]
vpci/msix: handle accesses adjacent to the MSI-X table

The handling of the MSI-X table accesses by Xen requires that any
pages part of the MSI-X related tables are not mapped into the domain
physmap.  As a result, any device registers in the same pages as the
start or the end of the MSIX or PBA tables is not currently
accessible, as the accesses are just dropped.

Note the spec forbids such placing of registers, as the MSIX and PBA
tables must be 4K isolated from any other registers:

"If a Base Address register that maps address space for the MSI-X
Table or MSI-X PBA also maps other usable address space that is not
associated with MSI-X structures, locations (e.g., for CSRs) used in
the other address space must not share any naturally aligned 4-KB
address range with one where either MSI-X structure resides."

Yet the 'Intel Wi-Fi 6 AX201' device on one of my boxes has registers
in the same page as the MSIX tables, and thus won't work on a PVH dom0
without this fix.

In order to cope with the behavior passthrough any accesses that fall
on the same page as the MSIX tables (but don't fall in between) to the
underlying hardware.  Such forwarding also takes care of the PBA
accesses, so it allows to remove the code doing this handling in
msix_{read,write}.  Note that as a result accesses to the PBA array
are no longer limited to 4 and 8 byte sizes, there's no access size
restriction for PBA accesses documented in the specification.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
vpci/msix: restore PBA access length and alignment restrictions

Accesses to the PBA array have the same length and alignment
limitations as accesses to the MSI-X table:

"For all accesses to MSI-X Table and MSI-X PBA fields, software must
use aligned full DWORD or aligned full QWORD transactions; otherwise,
the result is undefined."

Introduce such length and alignment checks into the handling of PBA
accesses for vPCI.  This was a mistake of mine for not reading the
specification correctly.

Note that accesses must now be aligned, and hence there's no longer a
need to check that the end of the access falls into the PBA region as
both the access and the region addresses must be aligned.

Fixes: b177892d2d ('vpci/msix: handle accesses adjacent to the MSI-X table')
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b177892d2d0e8a31122c218989f43130aeba5282
master date: 2023-03-28 14:20:35 +0200
master commit: 7a502b4fbc339e9d3d3d45fb37f09da06bc3081c
master date: 2023-03-29 14:56:33 +0200

2 years agoinclude: don't mention stub headers more than once in a make rule
Jan Beulich [Fri, 31 Mar 2023 06:34:04 +0000 (08:34 +0200)]
include: don't mention stub headers more than once in a make rule

When !GRANT_TABLE and !PV_SHIM headers-n contains grant_table.h twice,
causing make to complain "target '...' given more than once in the same
rule" for the rule generating the stub headers. We don't need duplicate
entries in headers-n anywhere, so zap them (by using $(sort ...)) right
where the final value of the variable is constructed.

Fixes: 6bec713f871f ("include/compat: produce stubs for headers not otherwise generated")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 231ab79704cbb5b9be7700287c3b185225d34f1b
master date: 2023-03-28 14:20:16 +0200

2 years agox86/ucode: Fix error paths control_thread_fn()
Andrew Cooper [Fri, 31 Mar 2023 06:33:28 +0000 (08:33 +0200)]
x86/ucode: Fix error paths control_thread_fn()

These two early exits skipped re-enabling the watchdog, restoring the NMI
callback, and clearing the nmi_patch global pointer.  Always execute the tail
of the function on the way out.

Fixes: 8dd4dfa92d62 ("x86/microcode: Synchronize late microcode loading")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: fc2e1f3aad602a66c14b8285a1bd38a82f8fd02d
master date: 2023-03-28 11:57:56 +0100

2 years agox86/vmx: Don't spuriously crash the domain when INIT is received
Andrew Cooper [Fri, 31 Mar 2023 06:32:57 +0000 (08:32 +0200)]
x86/vmx: Don't spuriously crash the domain when INIT is received

In VMX operation, the handling of INIT IPIs is changed.  Instead of the CPU
resetting, the next VMEntry fails with EXIT_REASON_INIT.  From the TXT spec,
the intent of this behaviour is so that an entity which cares can scrub
secrets from RAM before participating in an orderly shutdown.

Right now, Xen's behaviour is that when an INIT arrives, the HVM VM which
schedules next is killed (citing an unknown VMExit), *and* we ignore the INIT
and continue blindly onwards anyway.

This patch addresses only the first of these two problems by ignoring the INIT
and continuing without crashing the VM in question.

The second wants addressing too, just as soon as we've figured out something
better to do...

Discovered as collateral damage from when an AP triple faults on S3 resume on
Intel TigerLake platforms.

Link: https://github.com/QubesOS/qubes-issues/issues/7283
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: b1f11273d5a774cc88a3685c96c2e7cf6385e3b6
master date: 2023-03-24 22:49:58 +0000

2 years agox86/shadow: Fix build with no PG_log_dirty
Andrew Cooper [Fri, 31 Mar 2023 06:32:11 +0000 (08:32 +0200)]
x86/shadow: Fix build with no PG_log_dirty

Gitlab Randconfig found:

  arch/x86/mm/shadow/common.c: In function 'shadow_prealloc':
  arch/x86/mm/shadow/common.c:1023:18: error: implicit declaration of function
      'paging_logdirty_levels'; did you mean 'paging_log_dirty_init'? [-Werror=implicit-function-declaration]
   1023 |         count += paging_logdirty_levels();
        |                  ^~~~~~~~~~~~~~~~~~~~~~
        |                  paging_log_dirty_init
  arch/x86/mm/shadow/common.c:1023:18: error: nested extern declaration of 'paging_logdirty_levels' [-Werror=nested-externs]

The '#if PG_log_dirty' expression is currently SHADOW_PAGING && !HVM &&
PV_SHIM_EXCLUSIVE.  Move the declaration outside.

Fixes: 33fb3a661223 ("x86/shadow: account for log-dirty mode when pre-allocating")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6d14cb105b1c54ad7b4228d858ae85aa8a672bbd
master date: 2023-03-24 12:16:31 +0000

2 years agox86/nospec: Fix evaluate_nospec() code generation under Clang
Andrew Cooper [Fri, 31 Mar 2023 06:31:20 +0000 (08:31 +0200)]
x86/nospec: Fix evaluate_nospec() code generation under Clang

It turns out that evaluate_nospec() code generation is not safe under Clang.
Given:

  void eval_nospec_test(int x)
  {
      if ( evaluate_nospec(x) )
          asm volatile ("nop #true" ::: "memory");
      else
          asm volatile ("nop #false" ::: "memory");
  }

Clang emits:

  <eval_nospec_test>:
         0f ae e8                lfence
         85 ff                   test   %edi,%edi
         74 02                   je     <eval_nospec_test+0x9>
         90                      nop
         c3                      ret
         90                      nop
         c3                      ret

which is not safe because the lfence has been hoisted above the conditional
jump.  Clang concludes that both barrier_nospec_true()'s have identical side
effects and can safely be merged.

Clang can be persuaded that the side effects are different if there are
different comments in the asm blocks.  This is fragile, but no more fragile
that other aspects of this construct.

Introduce barrier_nospec_false() with a separate internal comment to prevent
Clang merging it with barrier_nospec_true() despite the otherwise-identical
content.  The generated code now becomes:

  <eval_nospec_test>:
         85 ff                   test   %edi,%edi
         74 05                   je     <eval_nospec_test+0x9>
         0f ae e8                lfence
         90                      nop
         c3                      ret
         0f ae e8                lfence
         90                      nop
         c3                      ret

which has the correct number of lfence's, and in the correct place.

Link: https://github.com/llvm/llvm-project/issues/55084
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: bc3c133841435829ba5c0a48427e2a77633502ab
master date: 2023-03-24 12:16:31 +0000

2 years agox86/shadow: fix and improve sh_page_has_multiple_shadows()
Jan Beulich [Fri, 31 Mar 2023 06:30:41 +0000 (08:30 +0200)]
x86/shadow: fix and improve sh_page_has_multiple_shadows()

While no caller currently invokes the function without first making sure
there is at least one shadow [1], we'd better eliminate UB here:
find_first_set_bit() requires input to be non-zero to return a well-
defined result.

Further, using find_first_set_bit() isn't very efficient in the first
place for the intended purpose.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[1] The function has exactly two uses, and both are from OOS code, which
    is HVM-only. For HVM (but not for PV) sh_mfn_is_a_page_table(),
    guarding the call to sh_unsync(), guarantees at least one shadow.
    Hence even if sh_page_has_multiple_shadows() returned a bogus value
    when invoked for a PV domain, the subsequent is_hvm_vcpu() and
    oos_active checks (the former being redundant with the latter) will
    compensate. (Arguably that oos_active check should come first, for
    both clarity and efficiency reasons.)
master commit: 2896224a4e294652c33f487b603d20bd30955f21
master date: 2023-03-24 11:07:08 +0100

2 years agoVT-d: fix iommu=no-igfx if the IOMMU scope contains fake device(s)
Marek Marczykowski-Górecki [Fri, 31 Mar 2023 06:29:55 +0000 (08:29 +0200)]
VT-d: fix iommu=no-igfx if the IOMMU scope contains fake device(s)

If the scope for IGD's IOMMU contains additional device that doesn't
actually exist, iommu=no-igfx would not disable that IOMMU. In this
particular case (Thinkpad x230) it included 00:02.1, but there is no
such device on this platform. Consider only existing devices for the
"gfx only" check as well as the establishing of IGD DRHD address
(underlying is_igd_drhd(), which is used to determine applicability of
two workarounds).

Fixes: 2d7f191b392e ("VT-d: generalize and correct "iommu=no-igfx" handling")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 49de6749baa8d0addc3048defd4ef3e85cb135e9
master date: 2023-03-23 09:16:41 +0100

2 years agoAMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
Jan Beulich [Fri, 31 Mar 2023 06:28:56 +0000 (08:28 +0200)]
AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

An earlier change with the same title (commit 1ba66a870eba) altered only
the path where x2apic_phys was already set to false (perhaps from the
command line). The same of course needs applying when the variable
wasn't modified yet from its initial value.

Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 0d2686f6b66b4b1b3c72c3525083b0ce02830054
master date: 2023-03-21 09:23:25 +0100

2 years agodebian/changelog: finish 4.17.0+74-g3eac216e6e-1
Maximilian Engelhardt [Thu, 23 Mar 2023 21:23:53 +0000 (22:23 +0100)]
debian/changelog: finish 4.17.0+74-g3eac216e6e-1

2 years agoUpdate changelog for new upstream 4.17.0+74-g3eac216e6e
Maximilian Engelhardt [Thu, 23 Mar 2023 21:21:38 +0000 (22:21 +0100)]
Update changelog for new upstream 4.17.0+74-g3eac216e6e

[git-debrebase changelog: new upstream 4.17.0+74-g3eac216e6e]

2 years agoUpdate to upstream 4.17.0+74-g3eac216e6e
Maximilian Engelhardt [Thu, 23 Mar 2023 21:21:38 +0000 (22:21 +0100)]
Update to upstream 4.17.0+74-g3eac216e6e

[git-debrebase anchor: new upstream 4.17.0+74-g3eac216e6e, merge]

2 years agolibacpi: fix PCI hotplug AML
David Woodhouse [Tue, 21 Mar 2023 12:47:52 +0000 (13:47 +0100)]
libacpi: fix PCI hotplug AML

The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).

The AML that Xen gives to a guest gets the operand order for the odd-
numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
instead.

As far as I can tell, this was the wrong way round in Xen from the
moment that PCI hotplug was first introduced in commit 83d82e6f35a8:

+                    ShiftRight (0x4, \_GPE.PH00, Local1)
+                    Return (Local1) /* IN status as the _STA */

Or maybe there's bizarre AML operand ordering going on there, like
Intel's wrong-way-round assembler, and it only broke later when it was
changed to being generated?

Either way, it's definitely wrong now, and instrumenting a Linux guest
shows that it correctly sees _STA being 0x00 in function 0 of an empty
slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
an adapter is present in every slot in /sys/bus/pci/slots/*

Quite why Linux wants to look for function 1 being physically present
when function 0 isn't... I don't want to think about right now.

Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b190af7d3e90f58da5f58044b8dea7261b8b483d
master date: 2023-03-20 17:12:34 +0100

2 years agobunzip: work around gcc13 warning
Jan Beulich [Tue, 21 Mar 2023 12:47:11 +0000 (13:47 +0100)]
bunzip: work around gcc13 warning

While provable that length[0] is always initialized (because symCount
cannot be zero), upcoming gcc13 fails to recognize this and warns about
the unconditional use of the value immediately following the loop.

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511.

Reported-by: Martin Liška <martin.liska@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 402195e56de0aacf97e05c80ed367d464ca6938b
master date: 2023-03-14 10:45:28 +0100

2 years agoVT-d: constrain IGD check
Jan Beulich [Tue, 21 Mar 2023 12:46:39 +0000 (13:46 +0100)]
VT-d: constrain IGD check

Marking a DRHD as controlling an IGD isn't very sensible without
checking that at the very least it's a graphics device that lives at
0000:00:02.0. Re-use the reading of the class-code to control both the
clearing of "gfx_only" and the setting of "igd_drhd_address".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: f8c4317295fa1cde1a81779b7e362651c084efb8
master date: 2023-03-14 10:44:08 +0100

2 years agox86/altp2m: help gcc13 to avoid it emitting a warning
Jan Beulich [Tue, 21 Mar 2023 12:45:47 +0000 (13:45 +0100)]
x86/altp2m: help gcc13 to avoid it emitting a warning

Switches of altp2m-s always expect a valid altp2m to be in place (and
indeed altp2m_vcpu_initialise() sets the active one to be at index 0).
The compiler, however, cannot know that, and hence it cannot eliminate
p2m_get_altp2m()'s case of returnin (literal) NULL. If then the compiler
decides to special case that code path in the caller, the dereference in
instances of

    atomic_dec(&p2m_get_altp2m(v)->active_vcpus);

can, to the code generator, appear to be NULL dereferences, leading to

In function 'atomic_dec',
    inlined from '...' at ...:
./arch/x86/include/asm/atomic.h:182:5: error: array subscript 0 is outside array bounds of 'int[0]' [-Werror=array-bounds=]

Aid the compiler by adding a BUG_ON() checking the return value of the
problematic p2m_get_altp2m(). Since with the use of the local variable
the 2nd p2m_get_altp2m() each will look questionable at the first glance
(Why is the local variable not used here?), open-code the only relevant
piece of p2m_get_altp2m() there.

To avoid repeatedly doing these transformations, and also to limit how
"bad" the open-coding really is, convert the entire operation to an
inline helper, used by all three instances (and accepting the redundant
BUG_ON(idx >= MAX_ALTP2M) in two of the three cases).

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: be62b1fc2aa7375d553603fca07299da765a89fe
master date: 2023-03-13 15:16:21 +0100

2 years agocore-parking: fix build with gcc12 and NR_CPUS=1
Jan Beulich [Tue, 21 Mar 2023 12:44:53 +0000 (13:44 +0100)]
core-parking: fix build with gcc12 and NR_CPUS=1

Gcc12 takes issue with core_parking_remove()'s

    for ( ; i < cur_idle_nums; ++i )
        core_parking_cpunum[i] = core_parking_cpunum[i + 1];

complaining that the right hand side array access is past the bounds of
1. Clearly the compiler can't know that cur_idle_nums can only ever be
zero in this case (as the sole CPU cannot be parked).

Arrange for core_parking.c's contents to not be needed altogether, and
then disable its building when NR_CPUS == 1.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 4b0422f70feb4b1cd04598ffde805fc224f3812e
master date: 2023-03-13 15:15:42 +0100

2 years agox86/spec-ctrl: Add BHI controls to userspace components
Andrew Cooper [Tue, 21 Mar 2023 12:44:27 +0000 (13:44 +0100)]
x86/spec-ctrl: Add BHI controls to userspace components

This was an oversight when adding the Xen parts.

Fixes: cea9ae062295 ("x86/spec-ctrl: Enumeration for new Intel BHI controls")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 9276e832aef60437da13d91e66fc259fd94d6f91
master date: 2023-03-13 11:26:26 +0000

2 years agotools/xenmon: Fix xenmon.py for with python3.x
Bernhard Kaindl [Tue, 21 Mar 2023 12:44:04 +0000 (13:44 +0100)]
tools/xenmon: Fix xenmon.py for with python3.x

Fixes for Py3:
* class Delayed(): file not defined; also an error for pylint -E.  Inherit
  object instead for Py2 compatibility.  Fix DomainInfo() too.
* Inconsistent use of tabs and spaces for indentation (in one block)

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3a59443c1d5ae0677a792c660ccd3796ce036732
master date: 2023-02-06 10:22:12 +0000

2 years agotools/python: change 's#' size type for Python >= 3.10
Marek Marczykowski-Górecki [Tue, 21 Mar 2023 12:43:44 +0000 (13:43 +0100)]
tools/python: change 's#' size type for Python >= 3.10

Python < 3.10 by default uses 'int' type for data+size string types
(s#), unless PY_SSIZE_T_CLEAN is defined - in which case it uses
Py_ssize_t. The former behavior was removed in Python 3.10 and now it's
required to define PY_SSIZE_T_CLEAN before including Python.h, and using
Py_ssize_t for the length argument. The PY_SSIZE_T_CLEAN behavior is
supported since Python 2.5.

Adjust bindings accordingly.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 897257ba49d0a6ddcf084960fd792ccce9c40f94
master date: 2023-02-06 08:50:13 +0100

2 years agox86/vmx: implement Notify VM Exit
Roger Pau Monné [Tue, 21 Mar 2023 12:42:43 +0000 (13:42 +0100)]
x86/vmx: implement Notify VM Exit

Under certain conditions guests can get the CPU stuck in an unbounded
loop without the possibility of an interrupt window to occur on
instruction boundary.  This was the case with the scenarios described
in XSA-156.

Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
if no interrupt window occurs for a specified amount of time.  Note
that using the Notify VM Exit avoids having to trap #AC and #DB
exceptions, as Xen is guaranteed to get a VM Exit even if the guest
puts the CPU in a loop without an interrupt window, as such disable
the intercepts if the feature is available and enabled.

Setting the notify VM exit window to 0 is safe because there's a
threshold added by the hardware in order to have a sane window value.

Note the handling of EXIT_REASON_NOTIFY in the nested virtualization
case is passed to L0, and hence a nested guest being able to trigger a
notify VM exit with an invalid context would be able to crash the L1
hypervisor (by L0 destroying the domain).  Since we don't expose VM
Notify support to L1 it should already enable the required
protections in order to prevent VM Notify from triggering in the first
place.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
x86/vmx: Partially revert "x86/vmx: implement Notify VM Exit"

The original patch tried to do two things - implement VMNotify, and
re-optimise VT-x to not intercept #DB/#AC by default.

The second part is buggy in multiple ways.  Both GDBSX and Introspection need
to conditionally intercept #DB, which was not accounted for.  Also, #DB
interception has nothing at all to do with cpu_has_monitor_trap_flag.

Revert the second half, leaving #DB/#AC intercepted unilaterally, but with
VMNotify active by default when available.

Fixes: 573279cde1c4 ("x86/vmx: implement Notify VM Exit")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 573279cde1c4e752d4df34bc65ffafa17573148e
master date: 2022-12-19 11:24:14 +0100
master commit: 5f08bc9404c7cfa8131e262c7dbcb4d96c752686
master date: 2023-01-20 19:39:32 +0000

2 years agox86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI
Roger Pau Monné [Tue, 21 Mar 2023 12:41:49 +0000 (13:41 +0100)]
x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI

Introduce a small helper to OR VMX_INTR_SHADOW_NMI in
GUEST_INTERRUPTIBILITY_INFO in order to help dealing with the NMI
unblocked by IRET case.  Replace the existing usage in handling
EXIT_REASON_EXCEPTION_NMI and also add such handling to EPT violations
and page-modification log-full events.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: d329b37d12132164c3894d0b6284be72576ef950
master date: 2022-12-19 11:23:34 +0100

2 years agox86/vmx: implement VMExit based guest Bus Lock detection
Roger Pau Monné [Tue, 21 Mar 2023 12:40:41 +0000 (13:40 +0100)]
x86/vmx: implement VMExit based guest Bus Lock detection

Add support for enabling guest Bus Lock Detection on Intel systems.
Such detection works by triggering a vmexit, which ought to be enough
of a pause to prevent a guest from abusing of the Bus Lock.

Add an extra Xen perf counter to track the number of Bus Locks detected.
This is done because Bus Locks can also be reported by setting the bit
26 in the exit reason field, so also account for those.

Note EXIT_REASON_BUS_LOCK VMExits will always have bit 26 set in
exit_reason, and hence the performance counter doesn't need to be
increased for EXIT_REASON_BUS_LOCK handling.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: f7d07619d2ae0382e2922e287fbfbb27722f3f0b
master date: 2022-12-19 11:22:43 +0100

2 years agox86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path
Andrew Cooper [Fri, 10 Feb 2023 21:11:14 +0000 (21:11 +0000)]
x86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path

As stated (correctly) by the comment next to SPEC_CTRL_ENTRY_FROM_PV, between
the two hunks visible in the patch, RET's are not safe prior to this point.

CR4_PV32_RESTORE hides a CALL/RET pair in certain configurations (PV32
compiled in, SMEP or SMAP active), and the RET can be attacked with one of
several known speculative issues.

Furthermore, CR4_PV32_RESTORE also hides a reference to the cr4_pv32_mask
global variable, which is not safe when XPTI is active before restoring Xen's
full pagetables.

This crash has gone unnoticed because it is only AMD CPUs which permit the
SYSCALL instruction in compatibility mode, and these are not vulnerable to
Meltdown so don't activate XPTI by default.

This is XSA-429 / CVE-2022-42331

Fixes: 5e7962901131 ("x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point")
Fixes: 5784de3e2067 ("x86: Meltdown band-aid against malicious 64-bit PV guests")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit df5b055b12116d9e63ced59ae5389e69a2a3de48)

2 years agox86/HVM: serialize pinned cache attribute list manipulation
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: serialize pinned cache attribute list manipulation

While the RCU variants of list insertion and removal allow lockless list
traversal (with RCU just read-locked), insertions and removals still
need serializing amongst themselves. To keep things simple, use the
domain lock for this purpose.

This is CVE-2022-42334 / part of XSA-428.

Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 829ec245cf66560e3b50d140ccb3168e7fb7c945)

2 years agox86/HVM: bound number of pinned cache attribute regions
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: bound number of pinned cache attribute regions

This is exposed via DMOP, i.e. to potentially not fully privileged
device models. With that we may not permit registration of an (almost)
unbounded amount of such regions.

This is CVE-2022-42333 / part of XSA-428.

Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit a5e768640f786b681063f4e08af45d0c4e91debf)

2 years agox86/shadow: account for log-dirty mode when pre-allocating
Jan Beulich [Tue, 21 Mar 2023 11:58:50 +0000 (11:58 +0000)]
x86/shadow: account for log-dirty mode when pre-allocating

Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there won't be any risk of just made shadows or
shadows being acted upon can disappear under our feet. The amount of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).

Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)

It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.

This is CVE-2022-42332 / XSA-427.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
(cherry picked from commit 91767a71061035ae42be93de495cd976f863a41a)

2 years agox86/ucode/AMD: late load the patch on every logical thread
Sergey Dyasli [Fri, 3 Mar 2023 07:03:43 +0000 (08:03 +0100)]
x86/ucode/AMD: late load the patch on every logical thread

Currently late ucode loading is performed only on the first core of CPU
siblings.  But according to the latest recommendation from AMD, late
ucode loading should happen on every logical thread/core on AMD CPUs.

To achieve that, introduce is_cpu_primary() helper which will consider
every logical cpu as "primary" when running on AMD CPUs.  Also include
Hygon in the check for future-proofing.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f1315e48a03a42f78f9b03c0a384165baf02acae
master date: 2023-02-28 14:51:28 +0100

2 years agolibs/guest: Fix leak on realloc failure in backup_ptes()
Edwin Török [Fri, 3 Mar 2023 07:03:19 +0000 (08:03 +0100)]
libs/guest: Fix leak on realloc failure in backup_ptes()

From `man 2 realloc`:

  If realloc() fails, the original block is left untouched; it is not freed or moved.

Found using GCC -fanalyzer:

  |  184 |         backup->entries = realloc(backup->entries,
  |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  |      |         |               | |
  |      |         |               | (91) when ‘realloc’ fails
  |      |         |               (92) ‘old_ptes.entries’ leaks here; was allocated at (44)
  |      |         (90) ...to here

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 275d13184cfa52ebe4336ed66526ce93716adbe0
master date: 2023-02-27 15:51:23 +0000

2 years agolibs/guest: Fix resource leaks in xc_core_arch_map_p2m_tree_rw()
Andrew Cooper [Fri, 3 Mar 2023 07:02:59 +0000 (08:02 +0100)]
libs/guest: Fix resource leaks in xc_core_arch_map_p2m_tree_rw()

Edwin, with the help of GCC's -fanalyzer, identified that p2m_frame_list_list
gets leaked.  What fanalyzer can't see is that the live_p2m_frame_list_list
and live_p2m_frame_list foreign mappings are leaked too.

Rework the logic so the out path is executed unconditionally, which cleans up
all the intermediate allocations/mappings appropriately.

Fixes: bd7a29c3d0b9 ("tools/libs/ctrl: fix xc_core_arch_map_p2m() to support linear p2m table")
Reported-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 1868d7f22660c8980bd0a7e53f044467e8b63bb5
master date: 2023-02-27 15:51:23 +0000

2 years agotools: Use PKG_CONFIG_FILE instead of PKG_CONFIG variable
Bertrand Marquis [Fri, 3 Mar 2023 07:02:30 +0000 (08:02 +0100)]
tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG variable

Replace PKG_CONFIG variable name with PKG_CONFIG_FILE for the name of
the pkg-config file.
This is preventing a conflict in some build systems where PKG_CONFIG
actually contains the path to the pkg-config executable to use, as the
default assignment in libs.mk is using a weak assignment (?=).

This problem has been found when trying to build the latest version of
Xen tools using buildroot.

Fixes: d400dc5729e4 ("tools: tweak tools/libs/libs.mk for being able to support libxenctrl")
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: b97e2fe7b9e1f4706693552697239ac2b71efee4
master date: 2023-02-24 17:44:29 +0000

2 years agoxen: Fix Clang -Wunicode diagnostic when building asm-macros
Andrew Cooper [Fri, 3 Mar 2023 07:01:21 +0000 (08:01 +0100)]
xen: Fix Clang -Wunicode diagnostic when building asm-macros

While trying to work around a different Clang-IAS bug (parent changeset), I
stumbled onto:

  In file included from arch/x86/asm-macros.c:3:
  ./arch/x86/include/asm/spec_ctrl_asm.h:144:19: error: \u used with
  no following hex digits; treating as '\' followed by identifier [-Werror,-Wunicode]
  .L\@_fill_rsb_loop\uniq:
                    ^

It turns out that Clang -E is sensitive to the file extension of the source
file it is processing.  Furthermore, C explicitly permits the use of \u
escapes in identifier names, so the diagnostic would be reasonable in
principle if we trying to compile the result.

asm-macros should really have been .S from the outset, as it is ultimately
generating assembly, not C.  Rename it, which causes Clang not to complain.

We need to introduce rules for generating a .i file from .S, and substituting
c_flags for a_flags lets us drop the now-redundant -D__ASSEMBLY__.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 53f0d02040b1df08f0589f162790ca376e1c2040
master date: 2023-02-24 17:44:29 +0000

2 years agoxen: Work around Clang-IAS macro \@ expansion bug
Andrew Cooper [Fri, 3 Mar 2023 07:00:04 +0000 (08:00 +0100)]
xen: Work around Clang-IAS macro \@ expansion bug

https://github.com/llvm/llvm-project/issues/60792

It turns out that Clang-IAS does not expand \@ uniquely in a translaition
unit, and the XSA-426 change tickles this bug:

  <instantiation>:4:1: error: invalid symbol redefinition
  .L1_fill_rsb_loop:
  ^
  make[3]: *** [Rules.mk:247: arch/x86/acpi/cpu_idle.o] Error 1

Extend DO_OVERWRITE_RSB with an optional parameter so C callers can mix %= in
too, which Clang does seem to expand properly.

Fixes: 63305e5392ec ("x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a2adacff0b91cc7b977abb209dc419a2ef15963f
master date: 2023-02-24 17:44:29 +0000

2 years agox86: perform mem_sharing teardown before paging teardown
Tamas K Lengyel [Fri, 3 Mar 2023 06:59:08 +0000 (07:59 +0100)]
x86: perform mem_sharing teardown before paging teardown

An assert failure has been observed in p2m_teardown when performing vm
forking and then destroying the forked VM (p2m-basic.c:173). The assert
checks whether the domain's shared pages counter is 0. According to the
patch that originally added the assert (7bedbbb5c31) the p2m_teardown
should only happen after mem_sharing already relinquished all shared pages.

In this patch we flip the order in which relinquish ops are called to avoid
tripping the assert. Conceptually sharing being torn down makes sense to
happen before paging is torn down.

Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 2869349f0cb3a89dcbf1f1b30371f58df6309312
master date: 2023-02-23 12:35:48 +0100

2 years agox86/ucode/AMD: apply the patch early on every logical thread
Sergey Dyasli [Fri, 3 Mar 2023 06:58:35 +0000 (07:58 +0100)]
x86/ucode/AMD: apply the patch early on every logical thread

The original issue has been reported on AMD Bulldozer-based CPUs where
ucode loading loses the LWP feature bit in order to gain the IBPB bit.
LWP disabling is per-SMT/CMT core modification and needs to happen on
each sibling thread despite the shared microcode engine. Otherwise,
logical CPUs will end up with different cpuid capabilities.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
Guests running under Xen happen to be not affected because of levelling
logic for the feature masking/override MSRs which causes the LWP bit to
fall out and hides the issue. The latest recommendation from AMD, after
discussing this bug, is to load ucode on every logical CPU.

In Linux kernel this issue has been addressed by e7ad18d1169c
("x86/microcode/AMD: Apply the patch early on every logical thread").
Follow the same approach in Xen.

Introduce SAME_UCODE match result and use it for early AMD ucode
loading. Take this opportunity and move opt_ucode_allow_same out of
compare_revisions() to the relevant callers and also modify the warning
message based on it. Intel's side of things is modified for consistency
but provides no functional change.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f4ef8a41b80831db2136bdaff9f946a1a4b051e7
master date: 2023-02-21 15:08:05 +0100

2 years agobuild: make FILE symbol paths consistent
Ross Lagerwall [Fri, 3 Mar 2023 06:58:12 +0000 (07:58 +0100)]
build: make FILE symbol paths consistent

The FILE symbols in out-of-tree builds may be either a relative path to
the object dir or an absolute path depending on how the build is
invoked. Fix the paths for C files so that they are consistent with
in-tree builds - the path is relative to the "xen" directory (e.g.
common/irq.c).

This fixes livepatch builds when the original Xen build was out-of-tree
since livepatch-build always does in-tree builds. Note that this doesn't
fix the behaviour for Clang < 6 which always embeds full paths.

Fixes: 7115fa562fe7 ("build: adding out-of-tree support to the xen build")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 5b9bb91abba7c983def3b4bef71ab08ad360a242
master date: 2023-02-15 16:13:49 +0100

2 years agocredit2: respect credit2_runqueue=all when arranging runqueues
Marek Marczykowski-Górecki [Fri, 3 Mar 2023 06:57:39 +0000 (07:57 +0100)]
credit2: respect credit2_runqueue=all when arranging runqueues

Documentation for credit2_runqueue=all says it should create one queue
for all pCPUs on the host. But since introduction
sched_credit2_max_cpus_runqueue, it actually created separate runqueue
per socket, even if the CPUs count is below
sched_credit2_max_cpus_runqueue.

Adjust the condition to skip syblink check in case of
credit2_runqueue=all.

Fixes: 8e2aa76dc167 ("xen: credit2: limit the max number of CPUs in a runqueue")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 1f5747ee929fbbcae58d7234c6c38a77495d0cfe
master date: 2023-02-15 16:12:42 +0100

2 years agox86/shskt: Disable CET-SS on parts susceptible to fractured updates
Andrew Cooper [Fri, 3 Mar 2023 06:56:16 +0000 (07:56 +0100)]
x86/shskt: Disable CET-SS on parts susceptible to fractured updates

Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack
Token".

Architecturally, an event delivery which starts in CPL<3 and switches shadow
stack will first validate the Supervisor Shadow Stack Token (setting the busy
bit), then pushes CS/LIP/SSP.  One example of this is an NMI interrupting Xen.

Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc
between setting the busy bit and completing the event injection renders the
action non-restartable, because when it comes time to restart, the busy bit is
found to be already set.

This is far more easily encountered under virt, yet it is not the fault of the
hypervisor, nor the fault of the guest kernel.  The fault lies somewhere
between the architectural specification, and the uarch behaviour.

Intel have allocated CPUID.7[1].ecx[18] CET_SSS to enumerate that supervisor
shadow stacks are safe to use.  Because of how Xen lays out its shadow stacks,
fracturing is not expected to be a problem on native.

Detect this case on boot and default to not using shstk if virtualised.
Specifying `cet=shstk` on the command line will override this heuristic and
enable shadow stacks irrespective.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 01e7477d1b081cff4288ff9f51ec59ee94c03ee0
master date: 2023-02-09 18:26:17 +0000

2 years agox86/cpuid: Infrastructure for leaves 7:1{ecx,edx}
Andrew Cooper [Fri, 3 Mar 2023 06:55:54 +0000 (07:55 +0100)]
x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}

We don't actually need ecx yet, but adding it in now will reduce the amount to
which leaf 7 is out of order in a featureset.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b4a23bf6293aadecfd03bf9e83974443e2eac9cb
master date: 2023-02-09 18:26:17 +0000

2 years agolibs/util: Fix parallel build between flex/bison and CC rules
Anthony PERARD [Fri, 3 Mar 2023 06:55:24 +0000 (07:55 +0100)]
libs/util: Fix parallel build between flex/bison and CC rules

flex/bison generate two targets, and when those targets are
prerequisite of other rules they are considered independently by make.

We can have a situation where the .c file is out-of-date but not the
.h, git checkout for example. In this case, if a rule only have the .h
file as prerequiste, make will procced and start to build the object.
In parallel, another target can have the .c file as prerequisite and
make will find out it need re-generating and do so, changing the .h at
the same time. This parallel task breaks the first one.

To avoid this scenario, we put both the header and the source as
prerequisite for all object even if they only need the header.

Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: bf652a50fb3bb3b1b3d93db6fb79bc28f978fe75
master date: 2023-02-09 18:26:17 +0000

2 years agodebian/changelog: finish 4.17.0+46-gaaf74a532c-1
Hans van Kranenburg [Fri, 24 Feb 2023 17:08:07 +0000 (18:08 +0100)]
debian/changelog: finish 4.17.0+46-gaaf74a532c-1

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
2 years agodebian/changelog: Remove duplicate 'Note that'
Hans van Kranenburg [Fri, 10 Feb 2023 12:59:21 +0000 (13:59 +0100)]
debian/changelog: Remove duplicate 'Note that'

This was already included in the changelog for 4.17.0-1 :(

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
2 years agodebian/changelog: Fix bug number typo.
Hans van Kranenburg [Fri, 10 Feb 2023 12:57:58 +0000 (13:57 +0100)]
debian/changelog: Fix bug number typo.

00:30 < Maxi[m]> Knorrie: I just noticed, the "(Closes: #102983)" from
   our changelog is missing a 0 at the end.
00:30 -zwiebelbot:#debian-xen- Debian#102983:
   quantlib_0.1.9-1(unstable): please add build-depends -
   https://bugs.debian.org/102983
00:31 < Maxi[m]> The correct bug number is #1029830

Oops. We will have to set it do done manually.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
2 years agodebian/shuffle-boot-files: fix typo
Hans van Kranenburg [Sat, 4 Feb 2023 16:57:23 +0000 (17:57 +0100)]
debian/shuffle-boot-files: fix typo

The tree picture changed, but I didn't correct the names in the text.
:-)

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
2 years agoUpdate changelog for new upstream 4.17.0+46-gaaf74a532c
Hans van Kranenburg [Fri, 24 Feb 2023 17:06:42 +0000 (18:06 +0100)]
Update changelog for new upstream 4.17.0+46-gaaf74a532c

[git-debrebase changelog: new upstream 4.17.0+46-gaaf74a532c]

2 years agoUpdate to upstream 4.17.0+46-gaaf74a532c
Hans van Kranenburg [Fri, 24 Feb 2023 17:06:42 +0000 (18:06 +0100)]
Update to upstream 4.17.0+46-gaaf74a532c

[git-debrebase anchor: new upstream 4.17.0+46-gaaf74a532c, merge]

2 years agod/changelog: finish 4.17.0+24-g2f8851c37f-2
Hans van Kranenburg [Mon, 6 Feb 2023 13:41:15 +0000 (14:41 +0100)]
d/changelog: finish 4.17.0+24-g2f8851c37f-2

2 years agochangelog: Prepare for upload to experimental
Ian Jackson [Sun, 5 Feb 2023 13:08:06 +0000 (13:08 +0000)]
changelog: Prepare for upload to experimental

2 years agoautomation: Remove clang-8 from Debian unstable container
Anthony PERARD [Tue, 21 Feb 2023 16:55:38 +0000 (16:55 +0000)]
automation: Remove clang-8 from Debian unstable container

First, apt complain that it isn't the right way to add keys anymore,
but hopefully that's just a warning.

Second, we can't install clang-8:
The following packages have unmet dependencies:
 clang-8 : Depends: libstdc++-8-dev but it is not installable
           Depends: libgcc-8-dev but it is not installable
           Depends: libobjc-8-dev but it is not installable
           Recommends: llvm-8-dev but it is not going to be installed
           Recommends: libomp-8-dev but it is not going to be installed
 libllvm8 : Depends: libffi7 (>= 3.3~20180313) but it is not installable
E: Unable to correct problems, you have held broken packages.

clang on Debian unstable is now version 14.0.6.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit a6b1e2b80fe2053b1c9c9843fb086a668513ea36)

2 years agox86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions
Andrew Cooper [Thu, 8 Sep 2022 20:27:58 +0000 (21:27 +0100)]
x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions

This is XSA-426 / CVE-2022-27672

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 63305e5392ec2d17b85e7996a97462744425db80)

2 years agotools/ocaml/libs: Fix memory/resource leaks with caml_alloc_custom()
Andrew Cooper [Wed, 1 Feb 2023 11:27:42 +0000 (11:27 +0000)]
tools/ocaml/libs: Fix memory/resource leaks with caml_alloc_custom()

All caml_alloc_*() functions can throw exceptions, and longjump out of
context.  If this happens, we leak the xch/xce handle.

Reorder the logic to allocate the the Ocaml object first.

Fixes: 8b3c06a3e545 ("tools/ocaml/xenctrl: OCaml 5 support, fix use-after-free")
Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d69ccf52ad467ccc22029172a8e61dc621187889)

2 years agotools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released
Andrew Cooper [Tue, 31 Jan 2023 17:19:30 +0000 (17:19 +0000)]
tools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released

The intf->{addr,len} references in the xc_map_foreign_range() call are unsafe.
From the manual:

  https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code

"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

More than what the manual says, the intf pointer is (potentially) invalidated
by caml_enter_blocking_section() if another thread happens to perform garbage
collection at just the right (wrong) moment.

Rewrite the logic.  There's no need to stash data in the Ocaml object until
the success path at the very end.

Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9e7c74e6f9fd2e44df1212643b80af9032b45b07)

2 years agotools/ocaml/xc: Fix binding for xc_domain_assign_device()
Edwin Török [Thu, 12 Jan 2023 11:38:38 +0000 (11:38 +0000)]
tools/ocaml/xc: Fix binding for xc_domain_assign_device()

The patch adding this binding was plain broken, and unreviewed.  It modified
the C stub to add a 4th parameter without an equivalent adjustment in the
Ocaml side of the bindings.

In 64bit builds, this causes us to dereference whatever dead value is in %rcx
when trying to interpret the rflags parameter.

This has gone unnoticed because Xapi doesn't use this binding (it has its
own), but unbreak the binding by passing RDM_RELAXED unconditionally for
now (matching the libxl default behaviour).

Fixes: 9b34056cb4 ("tools: extend xc_assign_device() to support rdm reservation policy")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 4250683842104f02996428f93927a035c8e19266)

2 years agotools/ocaml/evtchn: Don't reference Custom objects with the GC lock released
Edwin Török [Thu, 12 Jan 2023 17:48:29 +0000 (17:48 +0000)]
tools/ocaml/evtchn: Don't reference Custom objects with the GC lock released

The modification to the _H() macro for Ocaml 5 support introduced a subtle
bug.  From the manual:

  https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code

"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

Previously, the value was a naked C pointer, so dereferencing it wasn't
"accessing any Ocaml data", but the fix to avoid naked C pointers added a
layer of indirection through an Ocaml Custom object, meaning that the common
pattern of using _H() in a blocking section is unsafe.

In order to fix:

 * Drop the _H() macro and replace it with a static inline xce_of_val().
 * Opencode the assignment into Data_custom_val() in the two constructors.
 * Rename "value xce" parameters to "value xce_val" so we can consistently
   have "xenevtchn_handle *xce" on the stack, and obtain the pointer with the
   GC lock still held.

Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 2636d8ff7a670c4d2485757dbe966e36c259a960)

2 years agotools/ocaml/libs: Allocate the correct amount of memory for Abstract_tag
Andrew Cooper [Tue, 31 Jan 2023 10:59:42 +0000 (10:59 +0000)]
tools/ocaml/libs: Allocate the correct amount of memory for Abstract_tag

caml_alloc() takes units of Wsize (word size), not bytes.  As a consequence,
we're allocating 4 or 8 times too much memory.

Ocaml has a helper, Wsize_bsize(), but it truncates cases which aren't an
exact multiple.  Use a BUILD_BUG_ON() to cover the potential for truncation,
as there's no rounding-up form of the helper.

Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Fixes: d3e649277a13 ("ocaml: add mmap bindings implementation.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 36eb2de31b6ecb8787698fb1a701bd708c8971b2)

2 years agotools/ocaml/libs: Don't declare stubs as taking void
Edwin Török [Thu, 12 Jan 2023 11:28:29 +0000 (11:28 +0000)]
tools/ocaml/libs: Don't declare stubs as taking void

There is no such thing as an Ocaml function (C stub or otherwise) taking no
parameters.  In the absence of any other parameters, unit is still passed.

This doesn't explode with any ABI we care about, but would malfunction for an
ABI environment such as stdcall.

Fixes: c3afd398ba7f ("ocaml: Add XS bindings.")
Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit ff8b560be80b9211c303d74df7e4b3921d2bb8ca)

2 years agotools/oxenstored: validate config file before live update
Edwin Török [Tue, 11 May 2021 15:56:50 +0000 (15:56 +0000)]
tools/oxenstored: validate config file before live update

The configuration file can contain typos or various errors that could prevent
live update from succeeding (e.g. a flag only valid on a different version).
Unknown entries in the config file would be ignored on startup normally,
add a strict --config-test that live-update can use to check that the config file
is valid *for the new binary*.

For compatibility with running old code during live update recognize
--live --help as an equivalent to --config-test.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit e6f07052ce4a0f0b7d4dc522d87465efb2d9ee86)

2 years agotools/ocaml/xb: Drop Xs_ring.write
Edwin Török [Fri, 16 Dec 2022 18:25:20 +0000 (18:25 +0000)]
tools/ocaml/xb: Drop Xs_ring.write

This function is unusued (only Xs_ring.write_substring is used), and the
bytes/string conversion here is backwards: the C stub implements the bytes
version and then we use a Bytes.unsafe_of_string to convert a string into
bytes.

However the operation here really is read-only: we read from the string and
write it to the ring, so the C stub should implement the read-only string
version, and if needed we could use Bytes.unsafe_to_string to be able to send
'bytes'. However that is not necessary as the 'bytes' version is dropped above.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 01f139215e678c2dc7d4bb3f9f2777069bb1b091)

2 years agotools/ocaml/xb,mmap: Use Data_abstract_val wrapper
Edwin Török [Fri, 16 Dec 2022 18:25:10 +0000 (18:25 +0000)]
tools/ocaml/xb,mmap: Use Data_abstract_val wrapper

This is not strictly necessary since it is essentially a no-op currently: a
cast to void * and value *, even in OCaml 5.0.

However it does make it clearer that what we have here is not a regular OCaml
value, but one allocated with Abstract_tag or Custom_tag, and follows the
example from the manual more closely:
https://v2.ocaml.org/manual/intfc.html#ss:c-outside-head

It also makes it clearer that these modules have been reviewed for
compat with OCaml 5.0.

We cannot use OCaml finalizers here, because we want exact control over when
to unmap these pages from remote domains.

No functional change.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d2ccc637111d6dbcf808aaffeec7a46f0b1e1c81)

2 years agotools/ocaml/xenctrl: Use larger chunksize in domain_getinfolist
Edwin Török [Tue, 1 Nov 2022 17:59:17 +0000 (17:59 +0000)]
tools/ocaml/xenctrl: Use larger chunksize in domain_getinfolist

domain_getinfolist() is quadratic with the number of domains, because of the
behaviour of the underlying hypercall.  Nevertheless, getting domain info in
blocks of 1024 is far more efficient than blocks of 2.

In a scalability testing scenario with ~1000 VMs, a combination of this and
the previous change takes xenopsd's wallclock time in domain_getinfolist()
down from 88% to 0.02%

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 95db09b1b154fb72fad861815ceae1f3fa49fc4e)

2 years agotools/ocaml/xenctrl: Make domain_getinfolist tail recursive
Edwin Török [Tue, 1 Nov 2022 17:59:16 +0000 (17:59 +0000)]
tools/ocaml/xenctrl: Make domain_getinfolist tail recursive

domain_getinfolist() is quadratic with the number of domains, because of the
behaviour of the underlying hypercall.  xenopsd was further observed to be
wasting excessive quantites of time manipulating the list of already-obtained
domains.

Implement a tail recursive `rev_concat` equivalent to `concat |> rev`, and use
it instead of calling `@` multiple times.

An incidental benefit is that the list of domains will now be in domid order,
instead of having pairs of 2 domains changing direction every time.

In a scalability testing scenario with ~1000 VMs, a combination of this and
the subsequent change takes xenopsd's wallclock time in domain_getinfolist()
down from 88% to 0.02%

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit c3b6be714c64aa62b56d0bce96f4b6a10b5c2078)

2 years agolibxl: fix guest kexec - skip cpuid policy
Jason Andryuk [Tue, 7 Feb 2023 16:01:49 +0000 (17:01 +0100)]
libxl: fix guest kexec - skip cpuid policy

When a domain performs a kexec (soft reset), libxl__build_pre() is
called with the existing domid.  Calling libxl__cpuid_legacy() on the
existing domain fails since the cpuid policy has already been set, and
the guest isn't rebuilt and doesn't kexec.

xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM

During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
issue.  Before commit 34990446ca91, the libxl__cpuid_legacy() failure
would have been ignored, so kexec would continue.

Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 1e454c2b5b1172e0fc7457e411ebaba61db8fc87
master date: 2023-01-26 10:58:23 +0100

2 years agons16550: fix an incorrect assignment to uart->io_size
Ayan Kumar Halder [Tue, 7 Feb 2023 16:00:47 +0000 (17:00 +0100)]
ns16550: fix an incorrect assignment to uart->io_size

uart->io_size represents the size in bytes. Thus, when serial_port.bit_width
is assigned to it, it should be converted to size in bytes.

Fixes: 17b516196c ("ns16550: add ACPI support for ARM only")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 352c89f72ddb67b8d9d4e492203f8c77f85c8df1
master date: 2023-01-24 16:54:38 +0100