xen.git
3 years agoshim: Provide separate install-shim target
Ian Jackson [Fri, 12 Oct 2018 17:17:10 +0000 (17:17 +0000)]
shim: Provide separate install-shim target

When building on a 32-bit userland, the user wants to build 32-bit
tools and a 64-bit hypervisor.  This involves setting XEN_TARGET_ARCH
to different values for the tools build and the hypervisor build.

So the user must invoke the tools build and the hypervisor build
separately.

However, although the shim is done by the tools/firmware Makefile, its
bitness needs to be the same as the hypervisor, not the same as the
tools.  When run with XEN_TARGET_ARCH=x86_32, it it skipped, which is
wrong.

So the user must invoke the shim build separately.  This can be done
with
   make -C tools/firmware/xen-dir XEN_TARGET_ARCH=x86_64

However, tools/firmware/xen-dir has no `install' target.  The
installation of all `firmware' is done in tools/firmware/Makefile.  It
might be possible to fix this, but it is not trivial.  For example,
the definitions of INST_DIR and DEBG_DIR would need to be copied, as
would an appropriate $(INSTALL_DIR) call.

For now, provide an `install-shim' target in tools/firmware/Makefile.

This has to be called from `install' of course.  We can't make it
a dependency of `install' because it might be run before `all' has
completed.  We could make it depend on a `shim' target but such
a target is nearly impossible to write because everything is done by
the inflexible subdir-$@ machinery.

The overally result of this patch is that existing make invocations
work as before.  But additionally, the user can say
  make -C tools/firmware install-shim XEN_TARGET_ARCH=x86_64
to install the shim.  The user must have built it already.
Unlike the build rune, this install-rune is properly conditional
so it is OK to call on ARM.

What a mess.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
3 years agoconfig/Tools.mk.in: Respect caller's CONFIG_PV_SHIM
Ian Jackson [Fri, 12 Oct 2018 16:00:16 +0000 (16:00 +0000)]
config/Tools.mk.in: Respect caller's CONFIG_PV_SHIM

This makes it easier to disable the shim build.  (In Debian we need to
build the shim separately because it needs different compiler flags).

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
[ Hans: adjust from tools/firmware/Makefile to config/Tools.mk.in to
follow changes that happened in 8845155c83 ("pvshim: make PV shim build
selectable from configure") ]
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
3 years ago.gitignore: Add configure output which we always delete and regenerate
Ian Jackson [Fri, 5 Oct 2018 17:05:48 +0000 (18:05 +0100)]
.gitignore: Add configure output which we always delete and regenerate

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
3 years agoautoconf: Provide libexec_libdir_suffix
Ian Jackson [Wed, 3 Oct 2018 15:25:58 +0000 (16:25 +0100)]
autoconf: Provide libexec_libdir_suffix

This is going to be used to put libfsimage.so into a path containing
the multiarch triplet.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
3 years agotools-libfsimage-prefix.diff
Hans van Kranenburg [Mon, 25 May 2020 15:08:18 +0000 (17:08 +0200)]
tools-libfsimage-prefix.diff

\o/

3 years agoDo not build the instruction emulator
Ian Jackson [Thu, 20 Sep 2018 17:10:14 +0000 (18:10 +0100)]
Do not build the instruction emulator

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
3 years agoRemove static solaris support from pygrub
Bastian Blank [Sat, 5 Jul 2014 09:47:29 +0000 (11:47 +0200)]
Remove static solaris support from pygrub

Patch-Name: tools-pygrub-remove-static-solaris-support

Gbp-Pq: Topic misc
Gbp-Pq: Name tools-pygrub-remove-static-solaris-support

3 years agoDo not ship COPYING into /usr/include
Bastian Blank [Sat, 5 Jul 2014 09:47:30 +0000 (11:47 +0200)]
Do not ship COPYING into /usr/include

This is not wanted in Debian.  COPYING ends up in
/usr/share/doc/xen-*copyright.

Patch-Name: tools-include-no-COPYING.diff

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
3 years agoconfig-prefix.diff
Bastian Blank [Sat, 5 Jul 2014 09:46:45 +0000 (11:46 +0200)]
config-prefix.diff

Patch-Name: config-prefix.diff

Gbp-Pq: Topic prefix-abiname
Gbp-Pq: Name config-prefix.diff

3 years agoDisplay Debian package version in hypervisor log
Bastian Blank [Sat, 5 Jul 2014 09:46:43 +0000 (11:46 +0200)]
Display Debian package version in hypervisor log

During hypervisor boot, disable the banner and nicely display the xen
version as well as the Maintainer address from debian/control.

For this to work the SOURCE_BASE_DIR variable needs to be set by the
build system to the top directory, i.e. where the debian folder is.

Original patch by Bastian Blank <waldi@debian.org>
Modified by
Hans van Kranenburg <hans@knorrie.org>
Maximilian Engelhardt <maxi@daemonizer.de>

3 years agoDelete configure output
Ian Jackson [Wed, 19 Sep 2018 15:53:22 +0000 (16:53 +0100)]
Delete configure output

These autogenerated files are not useful in Debian; dh_autoreconf will
regenerate them.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
3 years agoDelete config.sub and config.guess
Ian Jackson [Wed, 19 Sep 2018 15:45:49 +0000 (16:45 +0100)]
Delete config.sub and config.guess

dh_autoreconf will provide these back.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
3 years agodebian/changelog: finish 4.17.0~rc4-1~exp1
Maximilian Engelhardt [Wed, 7 Dec 2022 19:57:41 +0000 (20:57 +0100)]
debian/changelog: finish 4.17.0~rc4-1~exp1

3 years agodebian: update debian/control for xen-4.17 build
Maximilian Engelhardt [Fri, 2 Dec 2022 20:19:47 +0000 (21:19 +0100)]
debian: update debian/control for xen-4.17 build

replace 4.16 with 4.17

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agoUpdate changelog for new upstream 4.17.0~rc4
Maximilian Engelhardt [Fri, 2 Dec 2022 19:22:48 +0000 (20:22 +0100)]
Update changelog for new upstream 4.17.0~rc4

[git-debrebase changelog: new upstream 4.17.0~rc4]

3 years agoUpdate to upstream 4.17.0~rc4
Maximilian Engelhardt [Fri, 2 Dec 2022 19:22:48 +0000 (20:22 +0100)]
Update to upstream 4.17.0~rc4

[git-debrebase anchor: new upstream 4.17.0~rc4, merge]

3 years agodebian/changelog: finish 4.16.2+90-g0d39a6d1ae-1
Hans van Kranenburg [Wed, 16 Nov 2022 12:15:07 +0000 (13:15 +0100)]
debian/changelog: finish 4.16.2+90-g0d39a6d1ae-1

3 years agoTurn off debug by default
Julien Grall [Thu, 1 Dec 2022 13:54:04 +0000 (13:54 +0000)]
Turn off debug by default

Signed-off-by: Julien Grall <jgrall@amazon.com>
3 years agoSUPPORT.md: Set Releases Notes link
Julien Grall [Thu, 1 Dec 2022 13:50:31 +0000 (13:50 +0000)]
SUPPORT.md: Set Releases Notes link

Signed-off-by: Julien Grall <jgrall@amazon.com>
3 years agoConfig.mk: switch to named tags (for stable branch)
Julien Grall [Thu, 1 Dec 2022 11:58:50 +0000 (11:58 +0000)]
Config.mk: switch to named tags (for stable branch)

3 years agodocs/misc/arm: Update references to Linux kernel docs
Michal Orzel [Fri, 18 Nov 2022 11:45:54 +0000 (12:45 +0100)]
docs/misc/arm: Update references to Linux kernel docs

Some time ago, Linux switched the format of docs to ReST and the format
of device-tree bindings to json-schema.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoChangelog: Add __ro_after_init and CET
Andrew Cooper [Thu, 24 Nov 2022 02:50:40 +0000 (10:50 +0800)]
Changelog: Add __ro_after_init and CET

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agoCHANGELOG: Add missing entries for work during the 4.17 release
Henry Wang [Thu, 24 Nov 2022 02:50:39 +0000 (10:50 +0800)]
CHANGELOG: Add missing entries for work during the 4.17 release

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agoxen/flask: Wire up XEN_DOMCTL_{get,set}_paging_mempool_size
Andrew Cooper [Mon, 21 Nov 2022 12:46:39 +0000 (12:46 +0000)]
xen/flask: Wire up XEN_DOMCTL_{get,set}_paging_mempool_size

These were overlooked in the original patch, and noticed by OSSTest which does
run some Flask tests.

Fixes: 22b20bd98c02 ("xen: Introduce non-broken hypercalls for the paging mempool size")
Suggested-by: Daniel Smith <dpsmith@apertussolutions.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agotools/libxl: Fixes to libxl__domain_set_paging_mempool_size()
Andrew Cooper [Fri, 18 Nov 2022 16:53:45 +0000 (16:53 +0000)]
tools/libxl: Fixes to libxl__domain_set_paging_mempool_size()

The error message accidentally printed the bytes value as if it were kB.

Furthermore, both b_info.shadow_memkb and shadow_mem are uint64_t, meaning
there is a risk of overflow if the user specified a stupidly large value in
the vm.cfg file.  Check and reject such a condition.

Fixes: 7c3bbd940dd8 ("xen/arm, libxl: Revert XEN_DOMCTL_shadow_op; use p2m mempool hypercalls")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agolibs/light: Propagate libxl__arch_domain_create() return code
Anthony PERARD [Mon, 21 Nov 2022 11:23:01 +0000 (12:23 +0100)]
libs/light: Propagate libxl__arch_domain_create() return code

Commit 34990446ca91 started to overwrite the `rc` value from
libxl__arch_domain_create(), thus error aren't propagated anymore.

Check `rc` value before doing the next thing.

Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoefifb: ignore frame buffer with invalid configuration
Roger Pau Monné [Mon, 21 Nov 2022 11:21:51 +0000 (12:21 +0100)]
efifb: ignore frame buffer with invalid configuration

On one of my boxes when the HDMI cable is not plugged in the
FrameBufferBase of the EFI_GRAPHICS_OUTPUT_PROTOCOL_MODE structure is
set to 0 by the firmware (while some of the other fields looking
plausible).

Such (bogus address) ends up mapped in vesa_init(), and since it
overlaps with a RAM region the whole system goes down pretty badly,
see:

(XEN) vesafb: framebuffer at 0x0000000000000000, mapped to 0xffff82c000201000, using 35209k, total 35209k
(XEN) vesafb: mode is 0x37557x32, linelength=960, font 8x16
(XEN) vesafb: Truecolor: size=8:8:8:8, shift=24:0:8:16
(XEN) (XEN) (XEN) (XEN) (XEN) (XEN) (XEN) (XEN) �ERROR: Class:0; Subclass:0; Operation: 0
ERROR: No ConOut
ERROR: No ConIn

Do like Linux and prevent using the EFI Frame Buffer if the base
address is 0.  This is inline with the logic in Linuxes
fb_base_is_valid() function at drivers/video/fbdev/efifb.c v6.0.9.

See also Linux commit 133bb070e94ab41d750c6f2160c8843e46f11b78 for
further reference.

Also prevent using Frame Buffers that have a 0 height or width, as
those are also invalid.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen/arm: Correct the p2m pool size calculations
Andrew Cooper [Thu, 20 Oct 2022 11:14:30 +0000 (12:14 +0100)]
xen/arm: Correct the p2m pool size calculations

Allocating or freeing p2m pages doesn't alter the size of the mempool; only
the split between free and used pages.

Right now, the hypercalls operate on the free subset of the pool, meaning that
XEN_DOMCTL_get_paging_mempool_size varies with time as the guest shuffles its
physmap, and XEN_DOMCTL_set_paging_mempool_size ignores the used subset of the
pool and lets the guest grow unbounded.

This fixes test-pagign-mempool on ARM so that the behaviour matches x86.

This is part of XSA-409 / CVE-2022-33747.

Fixes: cbea5a1149ca ("xen/arm: Allocate and free P2M pages from the P2M pool")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen/arm, libxl: Revert XEN_DOMCTL_shadow_op; use p2m mempool hypercalls
Andrew Cooper [Tue, 25 Oct 2022 14:27:05 +0000 (15:27 +0100)]
xen/arm, libxl: Revert XEN_DOMCTL_shadow_op; use p2m mempool hypercalls

This reverts most of commit cf2a68d2ffbc3ce95e01449d46180bddb10d24a0, and bits
of cbea5a1149ca7fd4b7cdbfa3ec2e4f109b601ff7.

First of all, with ARM borrowing x86's implementation, the logic to set the
pool size should have been common, not duplicated.  Introduce
libxl__domain_set_paging_mempool_size() as a shared implementation, and use it
from the ARM and x86 paths.  It is left as an exercise to the reader to judge
how libxl/xl can reasonably function without the ability to query the pool
size...

Remove ARM's p2m_domctl() infrastructure now the functioanlity has been
replaced with a working and unit tested interface.

This is part of XSA-409 / CVE-2022-33747.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agotools/tests: Unit test for paging mempool size
Andrew Cooper [Thu, 20 Oct 2022 11:13:46 +0000 (12:13 +0100)]
tools/tests: Unit test for paging mempool size

Exercise some basic functionality of the new
xc_{get,set}_paging_mempool_size() hypercalls.

This passes on x86, but fails currently on ARM.  ARM will be fixed up in
future patches.

This is part of XSA-409 / CVE-2022-33747.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen: Introduce non-broken hypercalls for the paging mempool size
Andrew Cooper [Fri, 21 Oct 2022 13:13:00 +0000 (14:13 +0100)]
xen: Introduce non-broken hypercalls for the paging mempool size

The existing XEN_DOMCTL_SHADOW_OP_{GET,SET}_ALLOCATION have problems:

 * All set_allocation() flavours have an overflow-before-widen bug when
   calculating "sc->mb << (20 - PAGE_SHIFT)".
 * All flavours have a granularity of 1M.  This was tolerable when the size of
   the pool could only be set at the same granularity, but is broken now that
   ARM has a 16-page stopgap allocation in use.
 * All get_allocation() flavours round up, and in particular turn 0 into 1,
   meaning the get op returns junk before a successful set op.
 * The x86 flavours reject the hypercalls before the VM has vCPUs allocated,
   despite the pool size being a domain property.
 * Even the hypercall names are long-obsolete.

Implement a better interface, which can be first used to unit test the
behaviour, and subsequently correct a broken implementation.  The old
interface will be retired in due course.

The unit of bytes (as opposed pages) is a deliberate API/ABI improvement to
more easily support multiple page granularities.

This is part of XSA-409 / CVE-2022-33747.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agox86/hvm: Revert per-domain APIC acceleration support
Andrew Cooper [Mon, 14 Nov 2022 21:47:59 +0000 (21:47 +0000)]
x86/hvm: Revert per-domain APIC acceleration support

I was really hoping to avoid this, but its now too late in the 4.17 freeze and
we still don't have working fixes.

The in-Xen calculations for assistance capabilities are buggy.  For the
avoidance of doubt, the original intention was to be able to control every
aspect of a APIC acceleration so we could comprehensively test Xen's support,
as it has proved to be buggy time and time again.

Even after a protracted discussion on what the new API ought to mean, attempts
to apply it to the existing logic have been unsuccessful, proving that the
API/ABI is too complicated for most people to reason about.

This reverts most of:
  2ce11ce249a3981bac50914c6a90f681ad7a4222
  6b2b9b3405092c3ad38d7342988a584b8efa674c

leaving in place the non-APIC specific changes (minimal as they are).

This takes us back to the behaviour of Xen 4.16 where APIC acceleration is
configured on a per system basis.

This work will be revisted in due course.

Fixes: 2ce11ce249a3 ("x86/HVM: allow per-domain usage of hardware virtualized APIC")
Fixes: 6b2b9b340509 ("x86: report Interrupt Controller Virtualization capabilities")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen: Used SPDX identifier in public headers
Anthony PERARD [Thu, 3 Nov 2022 11:52:04 +0000 (11:52 +0000)]
xen: Used SPDX identifier in public headers

The text of the licence has been check to be the same as the one at
https://spdx.org/licenses/MIT.html, except we don't have "(including
the next paragraph)".

Mecanical change done with a script.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoRework COPYING installed in /usr/include/xen/, due to several licences
Anthony PERARD [Thu, 3 Nov 2022 11:52:03 +0000 (11:52 +0000)]
Rework COPYING installed in /usr/include/xen/, due to several licences

The notice in the COPYING file in "xen/include/public/COPYING" doesn't
really apply to the files that ultimately are been install at
"/usr/include/xen". The issue are headers in the "sys/" subdirectory
that comes from other projects such as Linux or FreeBSD.

The main issue is that there are two headers that have a different
licence than the MIT licence:

- xen-sys/Linux/gntalloc.h (installed as "sys/gntalloc.h") is public
  domain.
- xen-sys/FreeBSD/gntdev.h (installed as "sys/gntdev.h") is BSD-2.

To clarify this, we'll install a COPYING file with a different notice.

Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen: Add licence header to device_tree_defs.h
Anthony PERARD [Thu, 3 Nov 2022 11:52:02 +0000 (11:52 +0000)]
xen: Add licence header to device_tree_defs.h

This header have been created by moving code from other part of the
project and miss a licence header. The original source code was some
version of GPL or LGPL but we intend to have the public header to be
MIT so they can be included easily in other projects.

Part of device_tree_defs.h were moved from libxl_arm.c which is
LGPL-2.1-only. And part were moved from device_tree.h that is
GPL-2.0-only.

Part of the original code were added by Julien Grall @ Linaro in
commits c3ba52a84dd8 and 405c167f0ec9 and 886f34045bf0. The other part
were added by Ian Campbell @ Citrix, with commit 0c64527e7fc9.

Resolves: xen-project/xen#35
Fixes: 1c898a9fec7e ("xen/arm: move a few DT related defines to public/device_tree_defs.h")
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> [Citrix relicensing]
Acked-by: Grant Likely <grant.likely@linaro.org> [Linaro relicensing]
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agotools/include/xen-foreign: Add SPDX identifier to generated headers
Anthony PERARD [Thu, 3 Nov 2022 11:52:01 +0000 (11:52 +0000)]
tools/include/xen-foreign: Add SPDX identifier to generated headers

The headers install in "/usr/include/xen/foreign/" are missing a
licence header. This patch adds a SPDX identifier to clarify that
the MIT licence is used.

The script now check that the licence of the input file is also MIT,
by checking for the presence of the SPDX identifier.

Also add information about which files are used to generate the
headers.

Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen: Used SPDX identifier in some public headers
Anthony PERARD [Thu, 3 Nov 2022 11:52:00 +0000 (11:52 +0000)]
xen: Used SPDX identifier in some public headers

The script "tools/include/xen-foreign/mkheader.py" is going to do a
sanity check on the licences of these headers. To ease this, we will
replace the verbatim copy of the MIT licence by its SPDX identifier
equivalent.

The text of the licence has been check to be the same as the one at
https://spdx.org/licenses/MIT.html, except we don't have "(including
the next paragraph)". The text is also the same as the one in
"xen/include/public/COPYING".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen: Add licence information to public/errno.h
Anthony PERARD [Thu, 3 Nov 2022 11:51:59 +0000 (11:51 +0000)]
xen: Add licence information to public/errno.h

Fixes: 81f559e97974 ("make error codes a formal part of the ABI")
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agox86/spec-ctrl: Fill in whitepaper URL
Andrew Cooper [Mon, 14 Nov 2022 21:41:08 +0000 (21:41 +0000)]
x86/spec-ctrl: Fill in whitepaper URL

... now that we a link available.

Fixes: 9deaf2d932f0 ("x86/spec-ctrl: Enable Zen2 chickenbit")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoUpdate changelog for new upstream 4.16.2+90-g0d39a6d1ae
Hans van Kranenburg [Wed, 16 Nov 2022 11:50:33 +0000 (12:50 +0100)]
Update changelog for new upstream 4.16.2+90-g0d39a6d1ae

[git-debrebase changelog: new upstream 4.16.2+90-g0d39a6d1ae]

3 years agoUpdate to upstream 4.16.2+90-g0d39a6d1ae
Hans van Kranenburg [Wed, 16 Nov 2022 11:50:32 +0000 (12:50 +0100)]
Update to upstream 4.16.2+90-g0d39a6d1ae

[git-debrebase anchor: new upstream 4.16.2+90-g0d39a6d1ae, merge]

3 years agoIntroduce CC-BY-4.0 license under LICENSES/
Stefano Stabellini [Mon, 14 Nov 2022 22:36:04 +0000 (14:36 -0800)]
Introduce CC-BY-4.0 license under LICENSES/

We use CC-BY-4.0 for many of the documents under docs/ so we should have
a copy of the license.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoamd: remove VIRT_SC_MSR_HVM synthetic feature
Roger Pau Monne [Tue, 15 Nov 2022 13:26:56 +0000 (14:26 +0100)]
amd: remove VIRT_SC_MSR_HVM synthetic feature

With the previous bugfix, X86_FEATURE_VIRT_SC_MSR_HVM is no longer
needed and can be replaced with an __initdata variable.  This also
leaves asm/cpufeatures.h as it was in 4.16 which will simplify
backports.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Rewrite commit message.  Move amd_virt_spec_ctrl into __initdata.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoamd/virt_ssbd: set SSBD at vCPU context switch
Roger Pau Monne [Tue, 15 Nov 2022 13:26:55 +0000 (14:26 +0100)]
amd/virt_ssbd: set SSBD at vCPU context switch

This fixes an issue with running C code in a GIF=0 region, that's
problematic when using UBSAN or other instrumentation techniques.

The current logic for AMD SSBD context switches it on every
vm{entry,exit} if the Xen and guest selections don't match.  This is
expensive when not using SPEC_CTRL, and hence should be avoided as
much as possible.

When SSBD is not being set from SPEC_CTRL on AMD don't context switch
at vm{entry,exit} and instead only context switch SSBD when switching
vCPUs.  This has the side effect of running Xen code with the guest
selection of SSBD, the documentation is updated to note this behavior.
Also note that then when `ssbd` is selected on the command line guest
SSBD selection will not have an effect, and the hypervisor will run
with SSBD unconditionally enabled when not using SPEC_CTRL itself.

As a result of no longer running the code to set SSBD in a GIF=0
region the locking of amd_set_legacy_ssbd() can be done using normal
spinlocks, and some more checks can be added to assure it works as
intended.

Finally it's also worth noticing that since the guest SSBD selection
is no longer set on vmentry the VIRT_SPEC_MSR handling needs to
propagate the value to the hardware as part of handling the wrmsr.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Extend the msrs->virt_spec_ctrl context switching comment.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoxen/arm: vGICv3: Restore the interrupt state correctly
Ayan Kumar Halder [Thu, 27 Oct 2022 19:09:13 +0000 (20:09 +0100)]
xen/arm: vGICv3: Restore the interrupt state correctly

As "spin_lock_irqsave(&v->arch.vgic.lock, flags)" saves the current interrupt
state in "flags", "spin_unlock_irqrestore(&v->arch.vgic.lock, flags)" should be
used to restore the saved interrupt state.

Fixes: fe7fa1332dabd9ce4 ("ARM: vGICv3: handle virtual LPI pending and property tables")
Signed-off-by: Ayan Kumar Halder <ayankuma@amd.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoCHANGELOG: update link for RELEASE-4.16.0
Henry Wang [Mon, 14 Nov 2022 11:01:47 +0000 (12:01 +0100)]
CHANGELOG: update link for RELEASE-4.16.0

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
3 years agoCHANGELOG: update link for RELEASE-4.16.0
Henry Wang [Mon, 14 Nov 2022 10:58:17 +0000 (11:58 +0100)]
CHANGELOG: update link for RELEASE-4.16.0

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
3 years agoAdd SPDX to CODING_STYLE
Stefano Stabellini [Thu, 13 Oct 2022 00:56:47 +0000 (17:56 -0700)]
Add SPDX to CODING_STYLE

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoRemove extra copies of licenses and license headers
Stefano Stabellini [Thu, 13 Oct 2022 00:56:46 +0000 (17:56 -0700)]
Remove extra copies of licenses and license headers

Remove the extra copy of the GPL license and license copyright headers
from CONTRIBUTING and the top-level COPYING.

Mention of the LICENSES/ directory and also mention the SPDX tag.

SPDX support is still in progress and COPYING files in subdirectories
still need to be updated.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoAdd licenses under LICENSES
Stefano Stabellini [Thu, 13 Oct 2022 00:56:45 +0000 (17:56 -0700)]
Add licenses under LICENSES

Add the individual licenses under a new top-level directory named
"LICENSES". Each license file includes its related SPDX tags.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoarm: fix Kconfig symbol dependency on arm features
Luca Fancellu [Wed, 9 Nov 2022 14:04:20 +0000 (14:04 +0000)]
arm: fix Kconfig symbol dependency on arm features

The commit 3c2a14ea81c7 is introducing some unsupported arm features
that by default are disabled and are used for the cpufeature.c code.

As they are disabled by default, a typo in the Kconfig symbol they
depend on has landed in the codebase unnoticed, instead of depending
on ARM64 which does not exist, fix the code to depend on ARM_64 that
is the intended symbol.

Fixes: 3c2a14ea81c7 ("arm: Define kconfig symbols used by arm64 cpufeatures")
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agokexec: restore hypercall 1st arg's type
Jan Beulich [Mon, 7 Nov 2022 15:09:13 +0000 (16:09 +0100)]
kexec: restore hypercall 1st arg's type

This reverts a small part of 7e21b25059ed ("xen: harmonize return types
of hypercall handlers"). The change from "unsigned long" to "unsigned
int" for the native handler function meant that previously invalid
values became valid. While perhaps not a significant issue, strictly
speaking that's still a change to the ABI. Don't go as far as restoring
the compat entry point's type though: That one can't have values passed
which don't fit in 32 bits.

Note that as a side effect this fixes the invocation of
hypercall_create_continuation(), which by mistake wasn't adjusted by the
earlier change.

Also take the opportunity and correct the respective comment in the
public header. (The way it was it really supports that it probably was
pointless to use "long", but that's the way the hypercall was
introduced.)

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoRevert "x86/HVM: also dump stacks from show_execution_state()"
Roger Pau Monne [Fri, 4 Nov 2022 14:43:37 +0000 (15:43 +0100)]
Revert "x86/HVM: also dump stacks from show_execution_state()"

This reverts commit adb715db698bc8ec3b88c24eb88b21e9da5b6c07.

The dumping of stacks for HVM guests is problematic, since it requires
taking the p2m lock in order to walk the guest page tables and the p2m.

The suggested solution to the issue is to introduce and use a lockless p2m
walker, that relies on being executed with interrupts disabled in order to
prevent any p2m pages from being freed while doing the walk.

Note that modifications of p2m entries are already done atomically in order
to prevent the hardware walker from seeing partially updated values.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen/sched: migrate timers to correct cpus after suspend
Juergen Gross [Wed, 9 Nov 2022 10:00:04 +0000 (11:00 +0100)]
xen/sched: migrate timers to correct cpus after suspend

Today all timers are migrated to cpu 0 when the system is being
suspended. They are not migrated back after resuming the system again.

This results (at least) to visible problems with the credit scheduler,
as the timer isn't handled on the cpu it was expected to occur, which
will result in an ASSERT() triggering. Other more subtle problems, like
uninterrupted elongated time slices, are probable. The least effect
will be worse performance on cpu 0 resulting from most scheduling
related timer interrupts happening there after suspend/resume.

Add migrating the scheduling related timers of a specific cpu from cpu
0 back to its original cpu when that cpu has gone up when resuming the
system.

Fixes: 0763cd268789 ("xen/sched: don't disable scheduler on cpus during suspend")
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
master commit: 37f82facd62f720fdcec104f72f86b8c6c214820
master date: 2022-11-04 09:03:23 +0100

3 years agotools/xenstore: call remove_domid_from_perm() for special nodes
Juergen Gross [Wed, 9 Nov 2022 09:59:42 +0000 (10:59 +0100)]
tools/xenstore: call remove_domid_from_perm() for special nodes

When destroying a domain, any stale permissions of the domain must be
removed from the special nodes "@...", too. This was not done in the
fix for XSA-322.

Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
master commit: 0751a75e3996cf6efd3925a90b4776660d8df2bc
master date: 2022-11-02 12:08:22 +0100

3 years agotools/hotplug: fix systemd unit dependencies
Juergen Gross [Wed, 9 Nov 2022 09:48:59 +0000 (10:48 +0100)]
tools/hotplug: fix systemd unit dependencies

Commit 1283af6465cd ("tools/xenstore: remove XEN_LIB_STORED and
XENSTORED_ROOTDIR") removed the systemd file var-lib-xenstored.mount
without removing dependencies to this file.

Fixes: 1283af6465cd ("tools/xenstore: remove XEN_LIB_STORED and XENSTORED_ROOTDIR")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agotools/ocaml/xenstored/store.ml: fix build error
Edwin Török [Wed, 9 Nov 2022 09:48:33 +0000 (10:48 +0100)]
tools/ocaml/xenstored/store.ml: fix build error

Building with Dune in release mode fails with:
```
File "ocaml/xenstored/store.ml", line 464, characters 13-32:
Warning 18: this type-based record disambiguation is not principal.
File "ocaml/xenstored/store.ml", line 1:
Error: Some fatal warnings were triggered (1 occurrences)
```

This is a warning to help keep the code futureproof, quoting from its
documentation:
> Check information path during type-checking, to make sure that all types are
> derived in a principal way. When using labelled arguments and/or polymorphic
> methods, this flag is required to ensure future versions of the compiler will
> be able to infer types correctly, even if internal algorithms change. All
> programs accepted in -principal mode are also accepted in the default mode with
> equivalent types, but different binary signatures, and this may slow down type
> checking; yet it is a good idea to use it once before publishing source code.

Fixes: db471408edd46 "tools/ocaml/xenstored: Fix quota bypass on domain shutdown"
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agox86/spec-ctrl: Mitigate IBPB not flushing the RSB/RAS
Andrew Cooper [Tue, 14 Jun 2022 15:18:36 +0000 (16:18 +0100)]
x86/spec-ctrl: Mitigate IBPB not flushing the RSB/RAS

Introduce spec_ctrl_new_guest_context() to encapsulate all logic pertaining to
using MSR_PRED_CMD for a new guest context, even if it only has one user
presently.

Introduce X86_BUG_IBPB_NO_RET, and use it extend spec_ctrl_new_guest_context()
with a manual fixup for hardware which mis-implements IBPB.

This is part of XSA-422 / CVE-2022-23824.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/spec-ctrl: Enumeration for IBPB_RET
Andrew Cooper [Tue, 14 Jun 2022 15:18:36 +0000 (16:18 +0100)]
x86/spec-ctrl: Enumeration for IBPB_RET

The IBPB_RET bit indicates that the CPU's implementation of MSR_PRED_CMD.IBPB
does flush the RSB/RAS too.

This is part of XSA-422 / CVE-2022-23824.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/arm: add iounmap after initrd has been loaded in domain_build
Wei Chen [Fri, 4 Nov 2022 10:07:32 +0000 (18:07 +0800)]
xen/arm: add iounmap after initrd has been loaded in domain_build

domain_build use ioremap_wc to map a new non-cacheable virtual
address for initrd. After Xen copy initrd from this address to
guest, this new allocated virtual address has not been unmapped.

So in this patch, we add an iounmap to the end of domain_build,
after Xen loaded initrd to guest memory.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoxen/arm: vGICv3: Emulate properly 32-bit access on GICR_PENDBASER
Ayan Kumar Halder [Thu, 27 Oct 2022 18:55:55 +0000 (19:55 +0100)]
xen/arm: vGICv3: Emulate properly 32-bit access on GICR_PENDBASER

If a guest is running in 32 bit mode and it tries to access
"GICR_PENDBASER + 4" mmio reg, it will be trapped to Xen. vreg_reg64_extract()
will return the value stored "v->arch.vgic.rdist_pendbase + 4".
This will be stored in a 64bit cpu register.
So now we have the top 32 bits of GICR_PENDBASER (a 64 bit MMIO register) stored
in the lower 32 bits of the 64bit cpu register.

This 64bit cpu register is then modified bitwise with a mask (ie
GICR_PENDBASER_PTZ, it clears the 62nd bit). But the PTZ (which is bit 30 in the
64 bit cpu register) is not cleared as expected by the specification.

The correct thing to do here is to store the value of
"v->arch.vgic.rdist_pendbase" in a temporary 64 bit variable. This variable is
then modified bitwise with GICR_PENDBASER_PTZ mask. It is then passed to
vreg_reg64_extract() which will extract 32 bits from the given offset.

Also, we have removed spin_lock_irqsave()/spin_unlock_irqrestore() to protect
v->arch.vgic.rdist_pendbase in __vgic_v3_rdistr_rd_mmio_read(). The reason
being v->arch.vgic.rdist_pendbase is now being read in an atomic manner.

Similarly in __vgic_v3_rdistr_rd_mmio_write(), we have used read_atomic(),
write_atomic() to read/write v->arch.vgic.rdist_pendbase.

Fixes: fe7fa1332dabd9ce4 ("ARM: vGICv3: handle virtual LPI pending and property tables")
Signed-off-by: Ayan Kumar Halder <ayankuma@amd.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agox86/spec-ctrl: Mitigate IBPB not flushing the RSB/RAS
Andrew Cooper [Tue, 14 Jun 2022 15:18:36 +0000 (16:18 +0100)]
x86/spec-ctrl: Mitigate IBPB not flushing the RSB/RAS

Introduce spec_ctrl_new_guest_context() to encapsulate all logic pertaining to
using MSR_PRED_CMD for a new guest context, even if it only has one user
presently.

Introduce X86_BUG_IBPB_NO_RET, and use it extend spec_ctrl_new_guest_context()
with a manual fixup for hardware which mis-implements IBPB.

This is part of XSA-422 / CVE-2022-23824.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 2b27967fb89d7904a1571a2fb963b1c9cac548db)

3 years agox86/spec-ctrl: Enumeration for IBPB_RET
Andrew Cooper [Tue, 14 Jun 2022 15:18:36 +0000 (16:18 +0100)]
x86/spec-ctrl: Enumeration for IBPB_RET

The IBPB_RET bit indicates that the CPU's implementation of MSR_PRED_CMD.IBPB
does flush the RSB/RAS too.

This is part of XSA-422 / CVE-2022-23824.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 24496558e650535bdbd22cc04731e82276cd1b3f)

3 years agoxen/sched: migrate timers to correct cpus after suspend
Juergen Gross [Fri, 4 Nov 2022 08:03:23 +0000 (09:03 +0100)]
xen/sched: migrate timers to correct cpus after suspend

Today all timers are migrated to cpu 0 when the system is being
suspended. They are not migrated back after resuming the system again.

This results (at least) to visible problems with the credit scheduler,
as the timer isn't handled on the cpu it was expected to occur, which
will result in an ASSERT() triggering. Other more subtle problems, like
uninterrupted elongated time slices, are probable. The least effect
will be worse performance on cpu 0 resulting from most scheduling
related timer interrupts happening there after suspend/resume.

Add migrating the scheduling related timers of a specific cpu from cpu
0 back to its original cpu when that cpu has gone up when resuming the
system.

Fixes: 0763cd268789 ("xen/sched: don't disable scheduler on cpus during suspend")
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoxen: fix generated code for calling hypercall handlers
Juergen Gross [Fri, 4 Nov 2022 07:54:57 +0000 (08:54 +0100)]
xen: fix generated code for calling hypercall handlers

The code generated for the call_handlers_*() macros needs to avoid
undefined behavior when multiple handlers share the same priority.
The issue is the hypercall number being unverified fed into the macros
and then used to set a mask via "mask = 1ULL << <hypercall-number>".

Avoid a shift amount of more than 63 by setting mask to zero in case
the hypercall number is too large.

Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agodrivers/char: suspend handling in XHCI console driver
Marek Marczykowski-Górecki [Fri, 4 Nov 2022 07:53:20 +0000 (08:53 +0100)]
drivers/char: suspend handling in XHCI console driver

Similar to the EHCI driver - save/restore relevant BAR and command
register, re-configure DbC on resume and stop/start timer.
On resume trigger sending anything that was queued in the meantime.
Save full BAR value, instead of just the address part, to ease restoring
on resume.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agoIOMMU/VT-d: wire common device reserved memory API
Marek Marczykowski-Górecki [Thu, 3 Nov 2022 08:12:11 +0000 (09:12 +0100)]
IOMMU/VT-d: wire common device reserved memory API

Re-use rmrr= parameter handling code to handle common device reserved
memory.

Move MAX_USER_RMRR_PAGES limit enforcement to apply only to
user-configured ranges, but not those from internal callers.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agotools/xenstore: call remove_domid_from_perm() for special nodes
Juergen Gross [Wed, 2 Nov 2022 11:08:22 +0000 (12:08 +0100)]
tools/xenstore: call remove_domid_from_perm() for special nodes

When destroying a domain, any stale permissions of the domain must be
removed from the special nodes "@...", too. This was not done in the
fix for XSA-322.

Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agotools/xenstore: remove XEN_LIB_STORED and XENSTORED_ROOTDIR
Juergen Gross [Wed, 2 Nov 2022 11:07:57 +0000 (12:07 +0100)]
tools/xenstore: remove XEN_LIB_STORED and XENSTORED_ROOTDIR

XEN_LIB_STORED is serving no real purpose, as it is a mount point for
a tmpfs, so it can be replaced easily by XEN_RUN_STORED.

XENSTORED_ROOTDIR is basically unused already, there is just a single
reference in xs_daemon_rootdir() with a fallback to XEN_LIB_STORED,
and a .gdbinit file setting it.

Remove the .gdbinit file, as it is not known having been used since
ages, and make xs_daemon_rootdir() an alias of xs_daemon_rundir().

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agohvm/msr: load VIRT_SPEC_CTRL
Roger Pau Monné [Wed, 2 Nov 2022 11:06:37 +0000 (12:06 +0100)]
hvm/msr: load VIRT_SPEC_CTRL

Add MSR_VIRT_SPEC_CTRL to the list of MSRs handled by
hvm_load_cpu_msrs(), or else it would be lost.

Fixes: 8ffd5496f4 ('amd/msr: implement VIRT_SPEC_CTRL for HVM guests on top of SPEC_CTRL')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
3 years agotools/xenstore: harden transaction finalization against errors
Juergen Gross [Tue, 13 Sep 2022 05:35:14 +0000 (07:35 +0200)]
tools/xenstore: harden transaction finalization against errors

When finalizing a transaction, any error occurring after checking for
conflicts will result in the transaction being performed only
partially today. Additionally accounting data will not be updated at
the end of the transaction, which might result in further problems
later.

Avoid those problems by multiple modifications:

- free any transaction specific nodes which don't need to be committed
  as they haven't been written during the transaction as soon as their
  generation count has been verified, this will reduce the risk of
  out-of-memory situations

- store the transaction specific node name in struct accessed_node in
  order to avoid the need to allocate additional memory for it when
  finalizing the transaction

- don't stop the transaction finalization when hitting an error
  condition, but try to continue to handle all modified nodes

- in case of a detected error do the accounting update as needed and
  call the data base checking only after that

- if writing a node in a transaction is failing (e.g. due to a failed
  quota check), fail the transaction, as prior changes to struct
  accessed_node can't easily be undone in that case

This is part of XSA-421 / CVE-2022-42326.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Tested-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 2dd823ca7237e7fb90c890642d6a3b357a26fcff)

3 years agotools/xenstore: fix deleting node in transaction
Juergen Gross [Tue, 13 Sep 2022 05:35:13 +0000 (07:35 +0200)]
tools/xenstore: fix deleting node in transaction

In case a node has been created in a transaction and it is later
deleted in the same transaction, the transaction will be terminated
with an error.

As this error is encountered only when handling the deleted node at
transaction finalization, the transaction will have been performed
partially and without updating the accounting information. This will
enable a malicious guest to create arbitrary number of nodes.

This is part of XSA-421 / CVE-2022-42325.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 13ac37f1416cae88d97f7baf6cf2a827edb9a187)

3 years agotools/ocaml: Ensure packet size is never negative
Edwin Török [Wed, 12 Oct 2022 18:13:05 +0000 (19:13 +0100)]
tools/ocaml: Ensure packet size is never negative

Integers in Ocaml have 63 or 31 bits of signed precision.

On 64-bit builds of Ocaml, this is fine because a C uint32_t always fits
within a 63-bit signed integer.

In 32-bit builds of Ocaml, this goes wrong.  The C uint32_t is truncated
first (loses the top bit), then has a unsigned/signed mismatch.

A "negative" value (i.e. a packet on the ring of between 1G and 2G in size)
will trigger an exception later in Bytes.make in xb.ml, and because the packet
is not removed from the ring, the exception re-triggers on every subsequent
query, creating a livelock.

Fix both the source of the exception in Xb, and as defence in depth, mark the
domain as bad for any Invalid_argument exceptions to avoid the risk of
livelock.

This is XSA-420 / CVE-2022-42324.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit ae34df4d82636f4c82700b447ea2c93b9f82b3f3)

3 years agotools/ocaml/xenstored: Fix quota bypass on domain shutdown
Edwin Török [Wed, 12 Oct 2022 18:13:06 +0000 (19:13 +0100)]
tools/ocaml/xenstored: Fix quota bypass on domain shutdown

XSA-322 fixed a domid reuse vulnerability by assigning Dom0 as the owner of
any nodes left after a domain is shutdown (e.g. outside its /local/domain/N
tree).

However Dom0 has no quota on purpose, so this opened up another potential
attack vector. Avoid it by deleting these nodes instead of assigning them to
Dom0.

This is part of XSA-419 / CVE-2022-42323.

Fixes: c46eff921209 ("tools/ocaml/xenstored: clean up permissions for dead domains")
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit db471408edd46af403b8bd44d180a928ad7fbb80)

3 years agodocs: enhance xenstore.txt with permissions description
Juergen Gross [Tue, 13 Sep 2022 05:35:13 +0000 (07:35 +0200)]
docs: enhance xenstore.txt with permissions description

The permission scheme of Xenstore nodes is not really covered by
docs/misc/xenstore.txt, other than referring to the Xen wiki.

Add a paragraph explaining the permissions of nodes, and especially
mentioning removal of nodes when a domain has been removed from
Xenstore.

This is part of XSA-419.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit d084d2c6dff7044956ebdf83a259ad6081a1d921)

3 years agotools/xenstore: make the internal memory data base the default
Juergen Gross [Tue, 13 Sep 2022 05:35:13 +0000 (07:35 +0200)]
tools/xenstore: make the internal memory data base the default

Having a file backed data base has the only advantage of being capable
to dump the contents of it while Xenstore is running, and potentially
using less swap space in case the data base can't be kept in memory.

It has the major disadvantage of a huge performance overhead: switching
to keep the data base in memory only speeds up live update of xenstored
with 120000 nodes from 20 minutes to 11 seconds. A complete tree walk
of this configuration will be reduced from 7 seconds to 280 msecs
(measured by "xenstore-control check").

So make the internal memory data base the default and enhance the
"--internal-db" command line parameter to take an optional parameter
allowing to switch the internal data base back to the file based one.

This is part of XSA-419.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit d174fefa90487ddd25ebc618028f67b2e8a1f795)

3 years agotools/xenstore: remove nodes owned by destroyed domain
Juergen Gross [Tue, 13 Sep 2022 05:35:12 +0000 (07:35 +0200)]
tools/xenstore: remove nodes owned by destroyed domain

In case a domain is removed from Xenstore, remove all nodes owned by
it per default.

This tackles the problem that nodes might be created by a domain
outside its home path in Xenstore, leading to Xenstore hogging more
and more memory. Domain quota don't work in this case if the guest is
rebooting in between.

Since XSA-322 ownership of such stale nodes is transferred to dom0,
which is helping against unintended access, but not against OOM of
Xenstore.

As a fallback for weird cases add a Xenstore start parameter for
keeping today's way to handle stale nodes, adding the risk of Xenstore
hitting an OOM situation.

This is part of XSA-419 / CVE-2022-42322.

Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 755d3f9debf8879448211fffb018f556136f6a79)

3 years agotools/xenstore: use treewalk for creating node records
Juergen Gross [Tue, 13 Sep 2022 05:35:12 +0000 (07:35 +0200)]
tools/xenstore: use treewalk for creating node records

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when creating the node records during a live update.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 297ac246a5d8ed656b349641288f3402dcc0251e)

3 years agotools/xenstore: use treewalk for deleting nodes
Juergen Gross [Tue, 13 Sep 2022 05:35:12 +0000 (07:35 +0200)]
tools/xenstore: use treewalk for deleting nodes

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when deleting a sub-tree of nodes.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit ea16962053a6849a6e7cada549ba7f8c586d85c6)

3 years agotools/xenstore: use treewalk for check_store()
Juergen Gross [Tue, 13 Sep 2022 05:35:12 +0000 (07:35 +0200)]
tools/xenstore: use treewalk for check_store()

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when checking the store for inconsistencies.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit a07cc0ec60612f414bedf2bafb26ec38d2602e95)

3 years agotools/xenstore: simplify check_store()
Juergen Gross [Tue, 13 Sep 2022 05:35:12 +0000 (07:35 +0200)]
tools/xenstore: simplify check_store()

check_store() is using a hash table for storing all node names it has
found via walking the tree. Additionally it using another hash table
for all children of a node to detect duplicate child names.

Simplify that by dropping the second hash table as the first one is
already holding all the needed information.

This is part of XSA-418 / CVE-2022-42321.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 70f719f52a220bc5bc987e4dd28e14a7039a176b)

3 years agotools/xenstore: add generic treewalk function
Juergen Gross [Tue, 13 Sep 2022 05:35:11 +0000 (07:35 +0200)]
tools/xenstore: add generic treewalk function

Add a generic function to walk the complete node tree. It will start
at "/" and descend recursively into each child, calling a function
specified by the caller. Depending on the return value of the user
specified function the walk will be aborted, continued, or the current
child will be skipped by not descending into its children.

This is part of XSA-418 / CVE-2022-42321.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 0d7c5d19bc27492360196e7dad2b227908564fff)

3 years agotools/xenstore: don't let remove_child_entry() call corrupt()
Juergen Gross [Tue, 13 Sep 2022 05:35:11 +0000 (07:35 +0200)]
tools/xenstore: don't let remove_child_entry() call corrupt()

In case of write_node() returning an error, remove_child_entry() will
call corrupt() today. This could result in an endless recursion, as
remove_child_entry() is called by corrupt(), too:

corrupt()
  check_store()
    check_store_()
      remove_child_entry()

Fix that by letting remove_child_entry() return an error instead and
let the caller decide what to do.

This is part of XSA-418 / CVE-2022-42321.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 0c00c51f3bc8206c7f9cf87d014650157bee2bf4)

3 years agotools/xenstore: remove recursion from construct_node()
Juergen Gross [Tue, 13 Sep 2022 05:35:11 +0000 (07:35 +0200)]
tools/xenstore: remove recursion from construct_node()

In order to reduce stack usage due to recursion, switch
construct_node() to use a loop instead.

This is part of XSA-418 / CVE-2022-42321.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit da8ee25d02a5447ba39a9800ee2a710ae1f54222)

3 years agotools/xenstore: fix checking node permissions
Juergen Gross [Tue, 13 Sep 2022 05:35:10 +0000 (07:35 +0200)]
tools/xenstore: fix checking node permissions

Today chk_domain_generation() is being used to check whether a node
permission entry is still valid or whether it is referring to a domain
no longer existing. This is done by comparing the node's and the
domain's generation count.

In case no struct domain is existing for a checked domain, but the
domain itself is valid, chk_domain_generation() assumes it is being
called due to the first node created for a new domain and it will
return success.

This might be wrong in case the checked permission is related to an
old domain, which has just been replaced with a new domain using the
same domid.

Fix that by letting chk_domain_generation() fail in case a struct
domain isn't found. In order to cover the case of the first node for
a new domain try to allocate the needed struct domain explicitly when
processing the related SET_PERMS command. In case a referenced domain
isn't existing, flag the related permission to be ignored right away.

This is XSA-417 / CVE-2022-42320.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit ab128218225d3542596ca3a02aee80d55494bef8)

3 years agotools/xenstore: don't use conn->in as context for temporary allocations
Juergen Gross [Tue, 13 Sep 2022 05:35:10 +0000 (07:35 +0200)]
tools/xenstore: don't use conn->in as context for temporary allocations

Using the struct buffered data pointer of the current processed request
for temporary data allocations has a major drawback: the used area (and
with that the temporary data) is freed only after the response of the
request has been written to the ring page or has been read via the
socket. This can happen much later in case a guest isn't reading its
responses fast enough.

As the temporary data can be safely freed after creating the response,
add a temporary context for that purpose and use that for allocating
the temporary memory, as it was already the case before commit
cc0612464896 ("xenstore: add small default data buffer to internal
struct").

Some sub-functions need to gain the "const" attribute for the talloc
context.

This is XSA-416 / CVE-2022-42319.

Fixes: cc0612464896 ("xenstore: add small default data buffer to internal struct")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 2a587de219cc0765330fbf9fac6827bfaf29e29b)

3 years agoSUPPORT.md: clarify support of untrusted driver domains with oxenstored
Juergen Gross [Thu, 29 Sep 2022 11:07:35 +0000 (13:07 +0200)]
SUPPORT.md: clarify support of untrusted driver domains with oxenstored

Add a support statement for the scope of support regarding different
Xenstore variants. Especially oxenstored does not (yet) have security
support of untrusted driver domains, as those might drive oxenstored
out of memory by creating lots of watch events for the guests they are
servicing.

Add a statement regarding Live Update support of oxenstored.

This is part of XSA-326.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit c7bc20d8d123851a468402bbfc9e3330efff21ec)

3 years agotools/ocaml: Limit maximum in-flight requests / outstanding replies
Edwin Török [Wed, 12 Oct 2022 18:13:04 +0000 (19:13 +0100)]
tools/ocaml: Limit maximum in-flight requests / outstanding replies

Introduce a limit on the number of outstanding reply packets in the xenbus
queue.  This limits the number of in-flight requests: when the output queue is
full we'll stop processing inputs until the output queue has room again.

To avoid a busy loop on the Unix socket we only add it to the watched input
file descriptor set if we'd be able to call `input` on it.  Even though Dom0
is trusted and exempt from quotas a flood of events might cause a backlog
where events are produced faster than daemons in Dom0 can consume them, which
could lead to an unbounded queue size and OOM.

Therefore the xenbus queue limit must apply to all connections, Dom0 is not
exempt from it, although if everything works correctly it will eventually
catch up.

This prevents a malicious guest from sending more commands while it has
outstanding watch events or command replies in its input ring.  However if it
can cause the generation of watch events by other means (e.g. by Dom0, or
another cooperative guest) and stop reading its own ring then watch events
would've queued up without limit.

The xenstore protocol doesn't have a back-pressure mechanism, and doesn't
allow dropping watch events.  In fact, dropping watch events is known to break
some pieces of normal functionality.  This leaves little choice to safely
implement the xenstore protocol without exposing the xenstore daemon to
out-of-memory attacks.

Implement the fix as pipes with bounded buffers:
* Use a bounded buffer for watch events
* The watch structure will have a bounded receiving pipe of watch events
* The source will have an "overflow" pipe of pending watch events it couldn't
  deliver

Items are queued up on one end and are sent as far along the pipe as possible:

  source domain -> watch -> xenbus of target -> xenstore ring/socket of target

If the pipe is "full" at any point then back-pressure is applied and we prevent
more items from being queued up.  For the source domain this means that we'll
stop accepting new commands as long as its pipe buffer is not empty.

Before we try to enqueue an item we first check whether it is possible to send
it further down the pipe, by attempting to recursively flush the pipes. This
ensures that we retain the order of events as much as possible.

We might break causality of watch events if the target domain's queue is full
and we need to start using the watch's queue.  This is a breaking change in
the xenstore protocol, but only for domains which are not processing their
incoming ring as expected.

When a watch is deleted its entire pending queue is dropped (no code is needed
for that, because it is part of the 'watch' type).

There is a cache of watches that have pending events that we attempt to flush
at every cycle if possible.

Introduce 3 limits here:
* quota-maxwatchevents on watch event destination: when this is hit the
  source will not be allowed to queue up more watch events.
* quota-maxoustanding which is the number of responses not read from the ring:
  once exceeded, no more inputs are processed until all outstanding replies
  are consumed by the client.
* overflow queue on the watch event source: all watches that cannot be stored
  on destination are queued up here, a single command can trigger multiple
  watches (e.g. due to recursion).

The overflow queue currently doesn't have an upper bound, it is difficult to
accurately calculate one as it depends on whether you are Dom0 and how many
watches each path has registered and how many watch events you can trigger
with a single command (e.g. a commit).  However these events were already
using memory, this just moves them elsewhere, and as long as we correctly
block a domain it shouldn't result in unbounded memory usage.

Note that Dom0 is not excluded from these checks, it is important that Dom0 is
especially not excluded when it is the source, since there are many ways in
which a guest could trigger Dom0 to send it watch events.

This should protect against malicious frontends as long as the backend follows
the PV xenstore protocol and only exposes paths needed by the frontend, and
changes those paths at most once as a reaction to guest events, or protocol
state.

The queue limits are per watch, and per domain-pair, so even if one
communication channel would be "blocked", others would keep working, and the
domain itself won't get blocked as long as it doesn't overflow the queue of
watch events.

Similarly a malicious backend could cause the frontend to get blocked, but
this watch queue protects the frontend as well as long as it follows the PV
protocol.  (Although note that protection against malicious backends is only a
best effort at the moment)

This is part of XSA-326 / CVE-2022-42318.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9284ae0c40fb5b9606947eaaec23dc71d0540e96)

3 years agotools/ocaml/xb: Add BoundedQueue
Edwin Török [Wed, 12 Oct 2022 18:13:03 +0000 (19:13 +0100)]
tools/ocaml/xb: Add BoundedQueue

Ensures we cannot store more than [capacity] elements in a [Queue].  Replacing
all Queue with this module will then ensure at compile time that all Queues
are correctly bound checked.

Each element in the queue has a class with its own limits.  This, in a
subsequent change, will ensure that command responses can proceed during a
flood of watch events.

No functional change.

This is part of XSA-326.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 19171fb5d888b4467a7073e8febc5e05540956e9)

3 years agotools/ocaml: Change Xb.input to return Packet.t option
Edwin Török [Wed, 12 Oct 2022 18:13:02 +0000 (19:13 +0100)]
tools/ocaml: Change Xb.input to return Packet.t option

The queue here would only ever hold at most one element.  This will simplify
follow-up patches.

This is part of XSA-326.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit c0a86a462721008eca5ff733660de094d3c34bc7)

3 years agotools/ocaml/libs/xb: hide type of Xb.t
Edwin Török [Fri, 29 Jul 2022 17:53:29 +0000 (18:53 +0100)]
tools/ocaml/libs/xb: hide type of Xb.t

Hiding the type will make it easier to change the implementation
in the future without breaking code that relies on it.

No functional change.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 7ade30a1451734d041363c750a65d322e25b47ba)

3 years agotools/ocaml: GC parameter tuning
Edwin Török [Wed, 12 Oct 2022 18:13:07 +0000 (19:13 +0100)]
tools/ocaml: GC parameter tuning

By default the OCaml garbage collector would return memory to the OS only
after unused memory is 5x live memory.  Tweak this to 120% instead, which
would match the major GC speed.

This is part of XSA-326.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 4a8bacff20b857ca0d628ef5525877ade11f2a42)

3 years agotools/ocaml/xenstored: Check for maxrequests before performing operations
Edwin Török [Thu, 28 Jul 2022 16:08:15 +0000 (17:08 +0100)]
tools/ocaml/xenstored: Check for maxrequests before performing operations

Previously we'd perform the operation, record the updated tree in the
transaction record, then try to insert a watchop path and the reply packet.

If we exceeded max requests we would've returned EQUOTA, but still:
* have performed the operation on the transaction's tree
* have recorded the watchop, making this queue effectively unbounded

It is better if we check whether we'd have room to store the operation before
performing the transaction, and raise EQUOTA there.  Then the transaction
record won't grow.

This is part of XSA-326 / CVE-2022-42317.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 329f4d1a6535c6c5a34025ca0d03fc5c7228fcff)

3 years agotools/ocaml/xenstored: Synchronise defaults with oxenstore.conf.in
Edwin Török [Wed, 12 Oct 2022 18:13:01 +0000 (19:13 +0100)]
tools/ocaml/xenstored: Synchronise defaults with oxenstore.conf.in

We currently have 2 different set of defaults in upstream Xen git tree:
* defined in the source code, only used if there is no config file
* defined in the oxenstored.conf.in upstream Xen

An oxenstored.conf file is not mandatory, and if missing, maxrequests in
particular has an unsafe default.

Resync the defaults from oxenstored.conf.in into the source code.

This is part of XSA-326 / CVE-2022-42316.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 84734955d4bf629ba459a74773afcde50a52236f)

3 years agotools/xenstore: add control command for setting and showing quota
Juergen Gross [Tue, 13 Sep 2022 05:35:10 +0000 (07:35 +0200)]
tools/xenstore: add control command for setting and showing quota

Add a xenstore-control command "quota" to:
- show current quota settings
- change quota settings
- show current quota related values of a domain

Note that in the case the new quota is lower than existing one,
Xenstored may continue to handle requests from a domain exceeding the
new limit (depends on which one has been broken) and the amount of
resource used will not change. However the domain will not be able to
create more resource (associated to the quota) until it is back to below
the limit.

This is part of XSA-326.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 9c484bef83496b683b0087e3bd2a560da4aa37af)

3 years agotools/xenstore: add exports for quota variables
Juergen Gross [Tue, 13 Sep 2022 05:35:10 +0000 (07:35 +0200)]
tools/xenstore: add exports for quota variables

Some quota variables are not exported via header files.

This is part of XSA-326.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 1da16d5990b5f7752657fca3e948f735177ea9ad)

3 years agotools/xenstore: add memory accounting for nodes
Juergen Gross [Tue, 13 Sep 2022 05:35:10 +0000 (07:35 +0200)]
tools/xenstore: add memory accounting for nodes

Add the memory accounting for Xenstore nodes. In order to make this
not too complicated allow for some sloppiness when writing nodes. Any
hard quota violation will result in no further requests to be accepted.

This is part of XSA-326 / CVE-2022-42315.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 00e9e32d022be1afc144b75acdaeba8393e63315)

3 years agotools/xenstore: add memory accounting for watches
Juergen Gross [Tue, 13 Sep 2022 05:35:10 +0000 (07:35 +0200)]
tools/xenstore: add memory accounting for watches

Add the memory accounting for registered watches.

When a socket connection is destroyed, the associated watches are
removed, too. In order to keep memory accounting correct the watches
must be removed explicitly via a call of conn_delete_all_watches() from
destroy_conn().

This is part of XSA-326 / CVE-2022-42315.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 7f9978a2cc37aaffab2fb09593bc598c0712a69b)

3 years agotools/xenstore: add memory accounting for responses
Juergen Gross [Tue, 13 Sep 2022 05:35:09 +0000 (07:35 +0200)]
tools/xenstore: add memory accounting for responses

Add the memory accounting for queued responses.

In case adding a watch event for a guest is causing the hard memory
quota of that guest to be violated, the event is dropped. This will
ensure that it is impossible to drive another guest past its memory
quota by generating insane amounts of events for that guest. This is
especially important for protecting driver domains from that attack
vector.

This is part of XSA-326 / CVE-2022-42315.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit f6d00133643a524d2138c9e3f192bbde719050ba)

3 years agotools/xenstore: add infrastructure to keep track of per domain memory usage
Juergen Gross [Tue, 13 Sep 2022 05:35:09 +0000 (07:35 +0200)]
tools/xenstore: add infrastructure to keep track of per domain memory usage

The amount of memory a domain can consume in Xenstore is limited by
various quota today, but even with sane quota a domain can still
consume rather large memory quantities.

Add the infrastructure for keeping track of the amount of memory a
domain is consuming in Xenstore. Note that this is only the memory a
domain has direct control over, so any internal administration data
needed by Xenstore only is not being accounted for.

There are two quotas defined: a soft quota which will result in a
warning issued via syslog() when it is exceeded, and a hard quota
resulting in a stop of accepting further requests or watch events as
long as the hard quota would be violated by accepting those.

Setting any of those quotas to 0 will disable it.

As default values use 2MB per domain for the soft limit (this basically
covers the allowed case to create 1000 nodes needing 2kB each), and
2.5MB for the hard limit.

This is part of XSA-326.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 0d4a8ec7a93faedbe54fd197db146de628459e77)