xen.git
4 years agodebian/changelog: finish 4.14.3+32-g9de3671772-1
Hans van Kranenburg [Sat, 27 Nov 2021 14:45:55 +0000 (15:45 +0100)]
debian/changelog: finish 4.14.3+32-g9de3671772-1

4 years agox86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be determined
Jan Beulich [Tue, 5 Jan 2021 12:11:04 +0000 (13:11 +0100)]
x86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be determined

We can be more tolerant as long as the data collected from FACS is only
needed to enter S3. A prior change already added suitable checking to
acpi_enter_sleep().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 16ca5b3f873f17f4fbdaecf46c133e1aa3d623b2)

4 years agox86/ACPI: fix S3 wakeup vector mapping
Jan Beulich [Tue, 5 Jan 2021 12:09:55 +0000 (13:09 +0100)]
x86/ACPI: fix S3 wakeup vector mapping

Use of __acpi_map_table() here was at least close to an abuse already
before, but it will now consistently return NULL here. Drop the layering
violation and use set_fixmap() directly. Re-use of the ACPI fixmap area
is hopefully going to remain "fine" for the time being.

Add checks to acpi_enter_sleep(): The vector now needs to be contained
within a single page, but the ACPI spec requires 64-byte alignment of
FACS anyway. Also bail if no wakeup vector was determined in the first
place, in part as preparation for a subsequent relaxation change.

Fixes: 1c4aa69ca1e1 ("xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8)

4 years agox86/DMI: fix table mapping when one lives above 1Mb
Jan Beulich [Tue, 24 Nov 2020 10:26:34 +0000 (11:26 +0100)]
x86/DMI: fix table mapping when one lives above 1Mb

Use of __acpi_map_table() is kind of an abuse here, and doesn't work
anymore for the majority of cases if any of the tables lives outside the
low first Mb. Keep this (ab)use only prior to reaching SYS_STATE_boot,
primarily to avoid needing to audit whether any of the calls here can
happen this early in the first place; quite likely this isn't necessary
at all - at least dmi_scan_machine() gets called late enough.

For the "normal" case, call __vmap() directly, despite effectively
duplicating acpi_os_map_memory(). There's one difference though: We
shouldn't need to establish UC- mappings, WP or r/o WB mappings ought to
be fine, as the tables are going to live in either RAM or ROM. Short of
having PAGE_HYPERVISOR_WP and wanting to map the tables r/o anyway, use
the latter of the two options. The r/o mapping implies some
constification of code elsewhere in the file. For code touched anyway
also switch to void (where possible) or uint8_t.

Fixes: 1c4aa69ca1e1 ("xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit f390941a92f102ebbbbce1b54be206a602187fd7)

4 years agox86/ACPI: fix mapping of FACS
Jan Beulich [Tue, 24 Nov 2020 10:26:02 +0000 (11:26 +0100)]
x86/ACPI: fix mapping of FACS

acpi_fadt_parse_sleep_info() runs when the system is already in
SYS_STATE_boot. Hence its direct call to __acpi_map_table() won't work
anymore. This call should probably have been replaced long ago already,
as the layering violation hasn't been necessary for quite some time.

Fixes: 1c4aa69ca1e1 ("xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 8b6d55c1261820bb9db8d867ce9ee77397d05203)

4 years agodocs: set date to SOURCE_DATE_EPOCH if available
Maximilian Engelhardt [Fri, 18 Dec 2020 20:42:35 +0000 (21:42 +0100)]
docs: set date to SOURCE_DATE_EPOCH if available

Use the solution described in [1] to replace the call to the 'date'
command with a version that uses SOURCE_DATE_EPOCH if available. This
is needed for reproducible builds.

[1] https://reproducible-builds.org/docs/source-date-epoch/

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
[Hans van Kranenburg]
Note: this patch is submitted upstream but not committed yet. We
expect that it gets in. Otherwise, we don't wait and already have it
here because I want to have the reproducible build work completed.

4 years agodocs: use predictable ordering in generated documentation
Maximilian Engelhardt [Fri, 18 Dec 2020 20:42:34 +0000 (21:42 +0100)]
docs: use predictable ordering in generated documentation

When the seq number is equal, sort by the title to get predictable
output ordering. This is useful for reproducible builds.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit e18dadc5b709290b8038a1cacb52bc3b3b69cf21)

4 years agox86/EFI: don't insert timestamp when SOURCE_DATE_EPOCH is defined
Maximilian Engelhardt [Tue, 22 Dec 2020 07:59:14 +0000 (08:59 +0100)]
x86/EFI: don't insert timestamp when SOURCE_DATE_EPOCH is defined

By default a timestamp gets added to the xen efi binary. Unfortunately
ld doesn't seem to provide a way to set a custom date, like from
SOURCE_DATE_EPOCH, so set a zero value for the timestamp (option
--no-insert-timestamp) if SOURCE_DATE_EPOCH is defined. This makes
reproducible builds possible.

This is an alternative to the patch suggested in [1]. This patch only
omits the timestamp when SOURCE_DATE_EPOCH is defined.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg02161.html

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit ee41b5c450032ae7f2531e18cd0a73bf5fb48803)

4 years agoxen: don't have timestamp inserted in config.gz
Frédéric Pierret (fepitre) [Wed, 4 Nov 2020 08:24:40 +0000 (09:24 +0100)]
xen: don't have timestamp inserted in config.gz

This is for improving reproducible builds.

Signed-off-by: Frédéric Pierret (fepitre) <frederic.pierret@qubes-os.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 5816d327e44ab37ae08730f4c54a80835998f31f)

4 years agofix spelling errors
Diederik de Haas [Fri, 4 Dec 2020 07:28:21 +0000 (08:28 +0100)]
fix spelling errors

Only spelling errors; no functional changes.

In docs/misc/dump-core-format.txt there are a few more instances of
'informations'. I'll leave that up to someone who can properly determine
how those sentences should be constructed.

Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit ba6e78f0db820fbeea4df41fde4655020ca05928)

4 years agoxen/arm: traps: Don't panic when receiving an unknown debug trap
Julien Grall [Thu, 5 Nov 2020 22:31:06 +0000 (22:31 +0000)]
xen/arm: traps: Don't panic when receiving an unknown debug trap

Even if debug trap are only meant for debugging purpose, it is quite
harsh to crash Xen if one of the trap sent by the guest is not handled.

So switch from a panic() to a printk().

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 957708c2d1ae25d7375abd5e5e70c3043d64f1f1)

4 years agoxen/arm: acpi: add BAD_MADT_GICC_ENTRY() macro
Julien Grall [Wed, 30 Sep 2020 11:25:04 +0000 (12:25 +0100)]
xen/arm: acpi: add BAD_MADT_GICC_ENTRY() macro

Imported from Linux commit b6cfb277378ef831c0fa84bcff5049307294adc6:

    The BAD_MADT_ENTRY() macro is designed to work for all of the subtables
    of the MADT.  In the ACPI 5.1 version of the spec, the struct for the
    GICC subtable (struct acpi_madt_generic_interrupt) is 76 bytes long; in
    ACPI 6.0, the struct is 80 bytes long.  But, there is only one definition
    in ACPICA for this struct -- and that is the 6.0 version.  Hence, when
    BAD_MADT_ENTRY() compares the struct size to the length in the GICC
    subtable, it fails if 5.1 structs are in use, and there are systems in
    the wild that have them.

    This patch adds the BAD_MADT_GICC_ENTRY() that checks the GICC subtable
    only, accounting for the difference in specification versions that are
    possible.  The BAD_MADT_ENTRY() will continue to work as is for all other
    MADT subtables.

    This code is being added to an arm64 header file since that is currently
    the only architecture using the GICC subtable of the MADT.  As a GIC is
    specific to ARM, it is also unlikely the subtable will be used elsewhere.

Fixes: aeb823bbacc2 ("ACPICA: ACPI 6.0: Add changes for FADT table.")
Signed-off-by: Al Stone <al.stone@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
    [catalin.marinas@arm.com: extra brackets around macro arguments]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
(cherry picked from commit 7056f2f89f03f2f804ac7e776c7b2b000cd716cd)

4 years agoxen/arm: Introduce fw_unreserved_regions() and use it
Julien Grall [Sat, 26 Sep 2020 20:30:14 +0000 (21:30 +0100)]
xen/arm: Introduce fw_unreserved_regions() and use it

Since commit 6e3e77120378 "xen/arm: setup: Relocate the Device-Tree
later on in the boot", the device-tree will not be kept mapped when
using ACPI.

However, a few places are calling dt_unreserved_regions() which expects
a valid DT. This will lead to a crash.

As the DT should not be used for ACPI (other than for detecting the
modules), a new function fw_unreserved_regions() is introduced.

It will behave the same way on DT system. On ACPI system, it will
unreserve the whole region.

Take the opportunity to clarify that bootinfo.reserved_mem is only used
when booting using Device-Tree.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 9c2bc0f24b2ba7082df408b3c33ec9a86bf20cf0)

4 years agoxen/arm: Check if the platform is not using ACPI before initializing Dom0less
Julien Grall [Sat, 26 Sep 2020 20:16:55 +0000 (21:16 +0100)]
xen/arm: Check if the platform is not using ACPI before initializing Dom0less

Dom0less requires a device-tree. However, since commit 6e3e77120378
"xen/arm: setup: Relocate the Device-Tree later on in the boot", the
device-tree will not get unflatten when using ACPI.

This will lead to a crash during boot.

Given the complexity to setup dom0less with ACPI (for instance how to
assign device?), we should skip any code related to Dom0less when using
ACPI.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
(cherry picked from commit dac867bf9adc1562a4cf9db5f89726597af13ef8)

4 years agoxen/arm: acpi: The fixmap area should always be cleared during failure/unmap
Julien Grall [Sat, 26 Sep 2020 18:53:27 +0000 (19:53 +0100)]
xen/arm: acpi: The fixmap area should always be cleared during failure/unmap

Commit 022387ee1ad3 "xen/arm: mm: Don't open-code Xen PT update in
{set, clear}_fixmap()" enforced that each set_fixmap() should be
paired with a clear_fixmap(). Any failure to follow the model would
result to a platform crash.

Unfortunately, the use of fixmap in the ACPI code was overlooked as it
is calling set_fixmap() but not clear_fixmap().

The function __acpi_os_map_table() is reworked so:
    - We know before the mapping whether the fixmap region is big
    enough for the mapping.
    - It will fail if the fixmap is already in use. This is not a
    change of behavior but clarifying the current expectation to avoid
    hitting a BUG().

The function __acpi_os_unmap_table() will now call clear_fixmap().

Reported-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 4d625ff3c3a939dc270b03654337568c30c5ab6e)

4 years agoxen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()
Julien Grall [Sat, 26 Sep 2020 16:44:29 +0000 (17:44 +0100)]
xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()

The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
while the __acpi_os_{un,}map_memory() are meant to be arch-specific.

Currently, the former are still containing x86 specific code.

To avoid this rather strange split, the generic helpers are reworked so
they are arch-agnostic. This requires the introduction of a new helper
__acpi_os_unmap_memory() that will undo any mapping done by
__acpi_os_map_memory().

Currently, the arch-helper for unmap is basically a no-op so it only
returns whether the mapping was arch specific. But this will change
in the future.

Note that the x86 version of acpi_os_map_memory() was already able to
able the 1MB region. Hence why there is no addition of new code.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
(cherry picked from commit 1c4aa69ca1e1fad20b2158051eb152276d1eb973)

4 years agoxen/arm: acpi: Don't fail if SPCR table is absent
Elliott Mitchell [Wed, 21 Oct 2020 22:12:53 +0000 (15:12 -0700)]
xen/arm: acpi: Don't fail if SPCR table is absent

Absence of a SPCR table likely means the console is a framebuffer.  In
such case acpi_iomem_deny_access() should NOT fail.

Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 861f0c110976fa8879b7bf63d9478b6be83d4ab6)

4 years agotools/python: Pass linker to Python build process
Elliott Mitchell [Mon, 12 Oct 2020 01:11:39 +0000 (18:11 -0700)]
tools/python: Pass linker to Python build process

Unexpectedly the environment variable which needs to be passed is
$LDSHARED and not $LD.  Otherwise Python may find the build `ld` instead
of the host `ld`.

Replace $(LDFLAGS) with $(SHLIB_LDFLAGS) as Python needs shared objects
it can load at runtime, not executables.

This uses $(CC) instead of $(LD) since Python distutils appends $CFLAGS
to $LDFLAGS which breaks many linkers.

Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
(cherry picked from commit 17d192e0238d6c714e9f04593b59597b7090be38)

[ Hans van Kranenburg ]
Fixed cherry-pick conflict because we have LIBEXEC_LIB=$(LIBEXEC_LIB) in
between in the same lines. The line wrap mess makes it a bit hard to
follow.

4 years agoxen/rpi4: implement watchdog-based reset
Stefano Stabellini [Fri, 2 Oct 2020 20:47:17 +0000 (13:47 -0700)]
xen/rpi4: implement watchdog-based reset

The preferred method to reboot RPi4 is PSCI. If it is not available,
touching the watchdog is required to be able to reboot the board.

The implementation is based on
drivers/watchdog/bcm2835_wdt.c:__bcm2835_restart in Linux v5.9-rc7.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Tested-by: Roman Shaposhnik <roman@zededa.com>
CC: roman@zededa.com
(cherry picked from commit 25849c8b16f2a5b7fcd0a823e80a5f1b590291f9)

4 years agot/h/L/vif-common.sh: fix handle_iptable return value
Hans van Kranenburg [Thu, 26 Nov 2020 15:06:03 +0000 (16:06 +0100)]
t/h/L/vif-common.sh: fix handle_iptable return value

A return statement without explicit value will return the value of the
last command executed before this line with return was encountered.

This is not what we want. return 0.

Closes: #955994
Fixes: 2e0814f971dd ("vif-common: disable handle_iptable")
Reported-by: Samuel Thibault <sthibault@debian.org>
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
4 years agotools: Partially revert "Cross-compilation fixes."
Elliott Mitchell [Sat, 18 Jul 2020 03:31:21 +0000 (20:31 -0700)]
tools: Partially revert "Cross-compilation fixes."

This partially reverts commit 16504669c5cbb8b195d20412aadc838da5c428f7.

Doesn't look like much of 16504669c5cbb8b195d20412aadc838da5c428f7
actually remains due to passage of time.

Of the 3, both Python and pygrub appear to mostly be building just fine
cross-compiling.  The OCAML portion is being troublesome, this is going
to cause bug reports elsewhere soon.  The OCAML portion though can
already be disabled by setting OCAML_TOOLS=n and shouldn't have this
extra form of disabling.

Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
(cherry picked from commit 69953e2856382274749b617125cc98ce38198463)

4 years agotools: don't build/ship xenmon
Hans van Kranenburg [Sat, 5 Sep 2020 20:43:19 +0000 (22:43 +0200)]
tools: don't build/ship xenmon

It can't run with Python 3, and I'm not going to fix it.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
4 years agotools/xl/bash-completion: also complete 'xen'
Hans van Kranenburg [Sun, 10 Feb 2019 17:26:45 +0000 (18:26 +0100)]
tools/xl/bash-completion: also complete 'xen'

We have the `xen` alias for xl in Debian, since in the past it was a
command that could execute either xl or xm.

Now, it always does xl, so, complete the same stuff for it as we have
for xl.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
[git-debrebase split: mixed commit: upstream part]

4 years agopygrub: Specify -rpath LIBEXEC_LIB when building fsimage.so
Ian Jackson [Fri, 22 Feb 2019 12:24:35 +0000 (12:24 +0000)]
pygrub: Specify -rpath LIBEXEC_LIB when building fsimage.so

If LIBEXEC_LIB is not on the default linker search path, the python
fsimage.so module fails to find libfsimage.so.

Add the relevant directory to the rpath explicitly.

(This situation occurs in the Debian package, where
--with-libexec-libdir is used to put each Xen version's libraries and
utilities in their own directory, to allow them to be coinstalled.)

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agopygrub: Set sys.path
Bastian Blank [Sat, 5 Jul 2014 09:47:01 +0000 (11:47 +0200)]
pygrub: Set sys.path

We install libfsimage in a non-standard path for Reasons.
(See debian/rules.)

This patch was originally part of `tools-pygrub-prefix.diff'
(eg commit 51657319be54) and included changes to the Makefile to
change the installation arrangements (we do that part in the rules now
since that is a lot less prone to conflicts when we update) and to
shared library rpath (which is now done in a separate patch).

(Commit message rewritten by Ian Jackson.)

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
squash! pygrub: Set sys.path and rpath

4 years agohotplug-common: Do not adjust LD_LIBRARY_PATH
Ian Jackson [Thu, 21 Feb 2019 16:05:40 +0000 (16:05 +0000)]
hotplug-common: Do not adjust LD_LIBRARY_PATH

This is in the upstream script because on non-Debian systems, the
default install locations in /usr/local/lib might not be on the linker
path, and as a result the hotplug scripts would break.

A reason we might need it in Debian is our multiple version
coinstallation scheme.  However, the hotplug scripts all call the
utilities via the wrappers, and the binaries are configured to load
from the right place anyway.

This setting is an annoyance because it requires libdir, which is an
arch-specific path but comes from a file we want to put in
xen-utils-common, an arch:all package.

So drop this setting.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agosysconfig.xencommons.in: Strip and debianize
Hans van Kranenburg [Sat, 9 Feb 2019 16:27:26 +0000 (17:27 +0100)]
sysconfig.xencommons.in: Strip and debianize

Strip all options that are for stuff we don't ship, which is 1)
xenstored as stubdom and 2) xenbackendd, which seems to be dead code
anyway. [1]

It seems useful to give the user the option to revert to xenstored
instead of the default oxenstored if they really want.

[1] https://lists.xen.org/archives/html/xen-devel/2015-07/msg04427.html

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
Acked-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
4 years agovif-common: disable handle_iptable
Hans van Kranenburg [Thu, 3 Jan 2019 23:35:45 +0000 (00:35 +0100)]
vif-common: disable handle_iptable

Also see Debian bug #894013. The current attempt at providing
anti-spoofing rules results in a situation that does not have any
effect. Also note that forwarding bridged traffic to iptables is not
enabled by default, and that for openvswitch users it does not make any
sense.

So, stop cluttering the live iptables ruleset.

This functionality seems to be introduced before 2004 and since then it
has never got some additional love.

It would be nice to have a proper discussion upstream about how Xen
could provide some anti mac/ip spoofing in the dom0. It does not seem to
be a trivial thing to do, since it requires having quite some knowledge
about what the domU is allowed to do or not (e.g. a domU can be a
router...).

4 years agoFix empty fields in first hypervisor log line
Hans van Kranenburg [Thu, 3 Jan 2019 21:03:06 +0000 (22:03 +0100)]
Fix empty fields in first hypervisor log line

Instead of:

    (XEN) Xen version 4.11.1 (Debian )
    (@)
    (gcc (Debian 8.2.0-13) 8.2.0) debug=n
    Thu Jan  3 19:08:37 UTC 2019

I'd like to see:

    (XEN) Xen version 4.11.1 (Debian 4.11.1-1~)
    (pkg-xen-devel@lists.alioth.debian.org)
    (gcc (Debian 8.2.0-13) 8.2.0) debug=n
    Thu Jan  3 22:44:00 CET 2019

The substitution was broken since the great packaging refactoring,
because the directory in which the build is done changed.

Also, use the Maintainer address from debian/control instead of the most
recent changelog entry. If someone wants to use the address to ask a
question, they will end up at the team mailing list, which is better
than an individual person.

4 years agodocs/man/xen-vbd-interface.7: Provide properly-formatted NAME section
Ian Jackson [Fri, 12 Oct 2018 16:56:56 +0000 (17:56 +0100)]
docs/man/xen-vbd-interface.7: Provide properly-formatted NAME section

This manpage was omitted from
   docs/man: Provide properly-formatted NAME sections
because I was previously building with markdown not installed.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoshim: Provide separate install-shim target
Ian Jackson [Fri, 12 Oct 2018 17:17:10 +0000 (17:17 +0000)]
shim: Provide separate install-shim target

When building on a 32-bit userland, the user wants to build 32-bit
tools and a 64-bit hypervisor.  This involves setting XEN_TARGET_ARCH
to different values for the tools build and the hypervisor build.

So the user must invoke the tools build and the hypervisor build
separately.

However, although the shim is done by the tools/firmware Makefile, its
bitness needs to be the same as the hypervisor, not the same as the
tools.  When run with XEN_TARGET_ARCH=x86_32, it it skipped, which is
wrong.

So the user must invoke the shim build separately.  This can be done
with
   make -C tools/firmware/xen-dir XEN_TARGET_ARCH=x86_64

However, tools/firmware/xen-dir has no `install' target.  The
installation of all `firmware' is done in tools/firmware/Makefile.  It
might be possible to fix this, but it is not trivial.  For example,
the definitions of INST_DIR and DEBG_DIR would need to be copied, as
would an appropriate $(INSTALL_DIR) call.

For now, provide an `install-shim' target in tools/firmware/Makefile.

This has to be called from `install' of course.  We can't make it
a dependency of `install' because it might be run before `all' has
completed.  We could make it depend on a `shim' target but such
a target is nearly impossible to write because everything is done by
the inflexible subdir-$@ machinery.

The overally result of this patch is that existing make invocations
work as before.  But additionally, the user can say
  make -C tools/firmware install-shim XEN_TARGET_ARCH=x86_64
to install the shim.  The user must have built it already.
Unlike the build rune, this install-rune is properly conditional
so it is OK to call on ARM.

What a mess.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
4 years agotools/firmware/Makefile: CONFIG_PV_SHIM: enable only on x86_64
Ian Jackson [Fri, 12 Oct 2018 17:56:04 +0000 (17:56 +0000)]
tools/firmware/Makefile: CONFIG_PV_SHIM: enable only on x86_64

Previously this was *dis*abled for x86_*32*.  But if someone should
run some of this Makefile on ARM, say, it ought not to be built
either.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
4 years agotools/firmware/Makfile: Respect caller's CONFIG_PV_SHIM
Ian Jackson [Fri, 12 Oct 2018 16:00:16 +0000 (16:00 +0000)]
tools/firmware/Makfile: Respect caller's CONFIG_PV_SHIM

This makes it easier to disable the shim build.  (In Debian we need to
build the shim separately because it needs different compiler flags
and a different XEN_COMPILE_ARCH.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
4 years agoRevert "pvshim: make PV shim build selectable from configure"
Hans van Kranenburg [Sat, 21 Nov 2020 23:40:58 +0000 (00:40 +0100)]
Revert "pvshim: make PV shim build selectable from configure"

This reverts commit 8845155c831c59e867ee3dd31ee63e0cc6c7dcf2.

This upstream change changes stuff that breaks our very fragile mess
that builds the shim when it needs to, and doesn't when it should not.

The result is that it's missing in the end for the i386 build... :|

    dh_install: warning: Cannot find (any matches for)
    "usr/lib/debug/usr/lib/xen-*/boot/*" (tried in ., debian/tmp)

    dh_install: warning: xen-utils-4.14 missing files:
    usr/lib/debug/usr/lib/xen-*/boot/*
    dh_install: error: missing files, aborting

4 years ago.gitignore: Add configure output which we always delete and regenerate
Ian Jackson [Fri, 5 Oct 2018 17:05:48 +0000 (18:05 +0100)]
.gitignore: Add configure output which we always delete and regenerate

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoautoconf: Provide libexec_libdir_suffix
Ian Jackson [Wed, 3 Oct 2018 15:25:58 +0000 (16:25 +0100)]
autoconf: Provide libexec_libdir_suffix

This is going to be used to put libfsimage.so into a path containing
the multiarch triplet.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agotools-libfsimage-prefix.diff
Hans van Kranenburg [Mon, 25 May 2020 15:08:18 +0000 (17:08 +0200)]
tools-libfsimage-prefix.diff

\o/

4 years agoDo not build the instruction emulator
Ian Jackson [Thu, 20 Sep 2018 17:10:14 +0000 (18:10 +0100)]
Do not build the instruction emulator

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agotools/tests/x86_emulator: Pass -no-pie -fno-pic to gcc on x86_32
Ian Jackson [Tue, 1 Nov 2016 16:20:27 +0000 (16:20 +0000)]
tools/tests/x86_emulator: Pass -no-pie -fno-pic to gcc on x86_32

The current build fails with GCC6 on Debian sid i386 (unstable):

 /tmp/ccqjaueF.s: Assembler messages:
 /tmp/ccqjaueF.s:3713: Error: missing or invalid displacement expression `vmovd_to_reg_len@GOT'

This is due to the combination of GCC6, and Debian's decision to
enable some hardening flags by default (to try to make runtime
addresses less predictable):
  https://wiki.debian.org/Hardening/PIEByDefaultTransition

This is of no benefit for the x86 instruction emulator test, which is
a rebuild of the emulator code for testing purposes only.  So pass
options to disable this.

These options will be no-ops if they are the same as the compiler
default.

On amd64, the -fno-pic breaks the build in a different way.  So do
this only on i386.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Gbp-Pq: Topic misc
Gbp-Pq: Name toolstestsx86_emulator-pass--no-pie--fno.patch

4 years agoRemove static solaris support from pygrub
Bastian Blank [Sat, 5 Jul 2014 09:47:29 +0000 (11:47 +0200)]
Remove static solaris support from pygrub

Patch-Name: tools-pygrub-remove-static-solaris-support

Gbp-Pq: Topic misc
Gbp-Pq: Name tools-pygrub-remove-static-solaris-support

4 years agoDo not ship COPYING into /usr/include
Bastian Blank [Sat, 5 Jul 2014 09:47:30 +0000 (11:47 +0200)]
Do not ship COPYING into /usr/include

This is not wanted in Debian.  COPYING ends up in
/usr/share/doc/xen-*copyright.

Patch-Name: tools-include-no-COPYING.diff

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoconfig-prefix.diff
Bastian Blank [Sat, 5 Jul 2014 09:46:45 +0000 (11:46 +0200)]
config-prefix.diff

Patch-Name: config-prefix.diff

Gbp-Pq: Topic prefix-abiname
Gbp-Pq: Name config-prefix.diff

4 years agoversion
Bastian Blank [Sat, 5 Jul 2014 09:46:43 +0000 (11:46 +0200)]
version

4 years agoDelete configure output
Ian Jackson [Wed, 19 Sep 2018 15:53:22 +0000 (16:53 +0100)]
Delete configure output

These autogenerated files are not useful in Debian; dh_autoreconf will
regenerate them.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoDelete config.sub and config.guess
Ian Jackson [Wed, 19 Sep 2018 15:45:49 +0000 (16:45 +0100)]
Delete config.sub and config.guess

dh_autoreconf will provide these back.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoUpdate changelog for new upstream 4.14.3+32-g9de3671772
Hans van Kranenburg [Sat, 27 Nov 2021 14:09:47 +0000 (15:09 +0100)]
Update changelog for new upstream 4.14.3+32-g9de3671772

[git-debrebase changelog: new upstream 4.14.3+32-g9de3671772]

4 years agoUpdate to upstream 4.14.3+32-g9de3671772
Hans van Kranenburg [Sat, 27 Nov 2021 14:09:47 +0000 (15:09 +0100)]
Update to upstream 4.14.3+32-g9de3671772

[git-debrebase anchor: new upstream 4.14.3+32-g9de3671772, merge]

4 years agodebian/changelog: finish 4.14.3-1
Hans van Kranenburg [Mon, 13 Sep 2021 11:38:58 +0000 (13:38 +0200)]
debian/changelog: finish 4.14.3-1

4 years agox86/P2M: deal with partial success of p2m_set_entry()
Jan Beulich [Tue, 23 Nov 2021 12:30:09 +0000 (13:30 +0100)]
x86/P2M: deal with partial success of p2m_set_entry()

M2P and PoD stats need to remain in sync with P2M; if an update succeeds
only partially, respective adjustments need to be made. If updates get
made before the call, they may also need undoing upon complete failure
(i.e. including the single-page case).

Log-dirty state would better also be kept in sync.

Note that the change to set_typed_p2m_entry() may not be strictly
necessary (due to the order restriction enforced near the top of the
function), but is being kept here to be on the safe side.

This is CVE-2021-28705 and CVE-2021-28709 / XSA-389.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 74a11c43fd7e074b1f77631b446dd2115eacb9e8
master date: 2021-11-22 12:27:30 +0000

4 years agox86/PoD: handle intermediate page orders in p2m_pod_cache_add()
Jan Beulich [Tue, 23 Nov 2021 12:29:54 +0000 (13:29 +0100)]
x86/PoD: handle intermediate page orders in p2m_pod_cache_add()

p2m_pod_decrease_reservation() may pass pages to the function which
aren't 4k, 2M, or 1G. Handle all intermediate orders as well, to avoid
hitting the BUG() at the switch() statement's "default" case.

This is CVE-2021-28708 / part of XSA-388.

Fixes: 3c352011c0d3 ("x86/PoD: shorten certain operations on higher order ranges")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 8ec13f68e0b026863d23e7f44f252d06478bc809
master date: 2021-11-22 12:27:30 +0000

4 years agox86/PoD: deal with misaligned GFNs
Jan Beulich [Tue, 23 Nov 2021 12:29:41 +0000 (13:29 +0100)]
x86/PoD: deal with misaligned GFNs

Users of XENMEM_decrease_reservation and XENMEM_populate_physmap aren't
required to pass in order-aligned GFN values. (While I consider this
bogus, I don't think we can fix this there, as that might break existing
code, e.g Linux'es swiotlb, which - while affecting PV only - until
recently had been enforcing only page alignment on the original
allocation.) Only non-PoD code paths (guest_physmap_{add,remove}_page(),
p2m_set_entry()) look to be dealing with this properly (in part by being
implemented inefficiently, handling every 4k page separately).

Introduce wrappers taking care of splitting the incoming request into
aligned chunks, without putting much effort in trying to determine the
largest possible chunk at every iteration.

Also "handle" p2m_set_entry() failure for non-order-0 requests by
crashing the domain in one more place. Alongside putting a log message
there, also add one to the other similar path.

Note regarding locking: This is left in the actual worker functions on
the assumption that callers aren't guaranteed atomicity wrt acting on
multiple pages at a time. For mis-aligned GFNs gfn_lock() wouldn't have
locked the correct GFN range anyway, if it didn't simply resolve to
p2m_lock(), and for well-behaved callers there continues to be only a
single iteration, i.e. behavior is unchanged for them. (FTAOD pulling
out just pod_lock() into p2m_pod_decrease_reservation() would result in
a lock order violation.)

This is CVE-2021-28704 and CVE-2021-28707 / part of XSA-388.

Fixes: 3c352011c0d3 ("x86/PoD: shorten certain operations on higher order ranges")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 182c737b9ba540ebceb1433f3940fbed6eac4ea9
master date: 2021-11-22 12:27:30 +0000

4 years agoxen/page_alloc: Harden assign_pages()
Julien Grall [Tue, 23 Nov 2021 12:29:09 +0000 (13:29 +0100)]
xen/page_alloc: Harden assign_pages()

domain_tot_pages() and d->max_pages are 32-bit values. While the order
should always be quite small, it would still be possible to overflow
if domain_tot_pages() is near to (2^32 - 1).

As this code may be called by a guest via XENMEM_increase_reservation
and XENMEM_populate_physmap, we want to make sure the guest is not going
to be able to allocate more than it is allowed.

Rework the allocation check to avoid any possible overflow. While the
check domain_tot_pages() < d->max_pages should technically not be
necessary, it is probably best to have it to catch any possible
inconsistencies in the future.

This is CVE-2021-28706 / part of XSA-385.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 143501861d48e1bfef495849fd68584baac05849
master date: 2021-11-22 11:11:05 +0000

4 years agopublic/gnttab: relax v2 recommendation
Jan Beulich [Fri, 19 Nov 2021 08:42:10 +0000 (09:42 +0100)]
public/gnttab: relax v2 recommendation

With there being a way to disable v2 support, telling new guests to use
v2 exclusively is not a good suggestion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
master commit: 2d72d2784eb71d8532bfbd6462d261739c9e82e4
master date: 2021-11-16 17:34:06 +0100

4 years agox86/APIC: avoid iommu_supports_x2apic() on error path
Jan Beulich [Fri, 19 Nov 2021 08:41:41 +0000 (09:41 +0100)]
x86/APIC: avoid iommu_supports_x2apic() on error path

The value it returns may change from true to false in case
iommu_enable_x2apic() fails and, as a side effect, clears iommu_intremap
(as can happen at least on AMD). Latch the return value from the first
invocation to replace the second one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 0f50d1696b3c13cbf0b18fec817fc291d5a30a31
master date: 2021-11-04 14:44:43 +0100

4 years agox86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing
Jan Beulich [Fri, 19 Nov 2021 08:41:09 +0000 (09:41 +0100)]
x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing

x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
mode (physical vs clustered) depends on iommu_intremap, that variable
needs to be set to off as soon as we know we can't / won't enable
interrupt remapping, i.e. in particular when parsing of the respective
ACPI tables failed. Move the turning off of iommu_intremap from AMD
specific code into acpi_iommu_init(), accompanying it by clearing of
iommu_enable.

Take the opportunity and also fully skip ACPI table parsing logic on
VT-d when both "iommu=off" and "iommu=no-intremap" are in effect anyway,
like was already the case for AMD.

The tag below only references the commit uncovering a pre-existing
anomaly.

Fixes: d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 46c4061cd2bf69e8039021af615c2bdb94e50088
master date: 2021-11-04 14:44:01 +0100

4 years agox86/xstate: reset cached register values on resume
Marek Marczykowski-Górecki [Fri, 19 Nov 2021 08:40:44 +0000 (09:40 +0100)]
x86/xstate: reset cached register values on resume

set_xcr0() and set_msr_xss() use cached value to avoid setting the
register to the same value over and over. But suspend/resume implicitly
reset the registers and since percpu areas are not deallocated on
suspend anymore, the cache gets stale.
Reset the cache on resume, to ensure the next write will really hit the
hardware. Choose value 0, as it will never be a legitimate write to
those registers - and so, will force write (and cache update).

Note the cache is used io get_xcr0() and get_msr_xss() too, but:
- set_xcr0() is called few lines below in xstate_init(), so it will
  update the cache with appropriate value
- get_msr_xss() is not used anywhere - and thus not before any
  set_msr_xss() that will fill the cache

Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f7f4a523927fa4c7598e4647a16bc3e3cf8009d0
master date: 2021-11-04 14:42:37 +0100

4 years agox86/traps: Fix typo in do_entry_CP()
Andrew Cooper [Fri, 19 Nov 2021 08:40:19 +0000 (09:40 +0100)]
x86/traps: Fix typo in do_entry_CP()

The call to debugger_trap_entry() should pass the correct vector.  The
break-for-gdbsx logic is in practice unreachable because PV guests can't
generate #CP, but it will interfere with anyone inserting custom debugging
into debugger_trap_entry().

Fixes: 5ad05b9c2490 ("x86/traps: Implement #CP handler and extend #PF for shadow stacks")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 512863ed238d7390f74d43f0ba298b1dfa8f4803
master date: 2021-11-03 19:13:17 +0000

4 years agox86/shstk: Fix use of shadow stacks with XPTI active
Andrew Cooper [Fri, 19 Nov 2021 08:39:46 +0000 (09:39 +0100)]
x86/shstk: Fix use of shadow stacks with XPTI active

The call to setup_cpu_root_pgt(0) in smp_prepare_cpus() is too early.  It
clones the BSP's stack while the .data mapping is still in use, causing all
mappings to be fully read read/write (and with no guard pages either).  This
ultimately causes #DF when trying to enter the dom0 kernel for the first time.

Defer setting up BSPs XPTI pagetable until reinit_bsp_stack() after we've set
up proper shadow stack permissions.

Fixes: 60016604739b ("x86/shstk: Rework the stack layout to support shadow stacks")
Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b2851580b1f2ff121737a37cb25a370d7692ae3b
master date: 2021-11-03 13:08:42 +0000

4 years agoupdate system time immediately when VCPUOP_register_vcpu_info
Dongli Zhang [Fri, 19 Nov 2021 08:39:09 +0000 (09:39 +0100)]
update system time immediately when VCPUOP_register_vcpu_info

The guest may access the pv vcpu_time_info immediately after
VCPUOP_register_vcpu_info. This is to borrow the idea of
VCPUOP_register_vcpu_time_memory_area, where the
force_update_vcpu_system_time() is called immediately when the new memory
area is registered.

Otherwise, we may observe clock drift at the VM side if the VM accesses
the clocksource immediately after VCPUOP_register_vcpu_info().

Reference: https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg00571.html
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b67f09721f136cc3a9afcb6a82466d1bd27aa6c0
master date: 2021-11-03 10:19:06 +0100

4 years agox86/paging: restrict physical address width reported to guests
Jan Beulich [Fri, 19 Nov 2021 08:38:42 +0000 (09:38 +0100)]
x86/paging: restrict physical address width reported to guests

Modern hardware may report more than 48 bits of physical address width.
For paging-external guests our P2M implementation does not cope with
larger values. Telling the guest of more available bits means misleading
it into perhaps trying to actually put some page there (like was e.g.
intermediately done in OVMF for the shared info page).

While there also convert the PV check to a paging-external one (which in
our current code base are synonyms of one another anyway).

Fixes: 5dbd60e16a1f ("x86/shadow: Correct guest behaviour when creating PTEs above maxphysaddr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: b7635526acffbe4ad8ad16fd92812c57742e54c2
master date: 2021-10-19 10:08:30 +0200

4 years agox86/AMD: make HT range dynamic for Fam17 and up
Jan Beulich [Fri, 19 Nov 2021 08:38:09 +0000 (09:38 +0100)]
x86/AMD: make HT range dynamic for Fam17 and up

At the time of d838ac2539cf ("x86: don't allow Dom0 access to the HT
address range") documentation correctly stated that the range was
completely fixed. For Fam17 and newer, it lives at the top of physical
address space, though.

To correctly determine the top of physical address space, we need to
account for their physical address reduction, hence the calculation of
paddr_bits also gets adjusted.

While for paddr_bits < 40 the HT range is completely hidden, there's no
need to suppress the range insertion in that case: It'll just have no
real meaning.

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: d6e38eea2d806c53d976603717aebf6e5de30a1e
master date: 2021-10-19 10:04:13 +0200

4 years agox86emul: de-duplicate scatters to the same linear address
Jan Beulich [Fri, 19 Nov 2021 08:37:37 +0000 (09:37 +0100)]
x86emul: de-duplicate scatters to the same linear address

The SDM specifically allows for earlier writes to fully overlapping
ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
would crash it if varying data was written to the same address. Detect
overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
be quite a bit more difficult. To maintain proper faulting behavior,
instead of dropping earlier write instances of fully overlapping slots
altogether, write the data of the final of these slots multiple times.
(We also can't pull ahead the [single] write of the data of the last of
the slots, clearing all involved slots' op_mask bits together, as this
would yield incorrect results if there were intervening partially
overlapping ones.)

Note that due to cache slot use being linear address based, there's no
similar issue with multiple writes to the same physical address (mapped
through different linear addresses).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: a8cddbac5051020bb4a59a7f0ea27500c51063fb
master date: 2021-10-19 10:02:39 +0200

4 years agox86/HVM: correct cleanup after failed viridian_vcpu_init()
Jan Beulich [Fri, 19 Nov 2021 08:37:10 +0000 (09:37 +0100)]
x86/HVM: correct cleanup after failed viridian_vcpu_init()

This happens after nestedhvm_vcpu_initialise(), so its effects also need
to be undone.

Fixes: 40a4a9d72d16 ("viridian: add init hooks")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 66675056c6e59b8a8b651a29ef53c63e9e04f58d
master date: 2021-10-18 14:21:17 +0200

4 years agobuild: fix dependencies in arch/x86/boot
Anthony PERARD [Fri, 19 Nov 2021 08:36:36 +0000 (09:36 +0100)]
build: fix dependencies in arch/x86/boot

Temporary fix the list of headers that cmdline.c and reloc.c depends
on, until the next time the list is out of sync again.

Also, add the linker script to the list.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 2f5f0a1b77161993c16c4cc243467d75e5b7633b
master date: 2021-10-14 12:35:42 +0200

4 years agox86/PV32: fix physdev_op_compat handling
Jan Beulich [Fri, 15 Oct 2021 09:20:04 +0000 (11:20 +0200)]
x86/PV32: fix physdev_op_compat handling

The conversion of the original code failed to recognize that the 32-bit
compat variant of this (sorry, two different meanings of "compat" here)
needs to continue to invoke the compat handler, not the native one.
Arrange for this by adding yet another #define.

Affected functions (having existed prior to the introduction of the new
hypercall) are PHYSDEVOP_set_iobitmap and PHYSDEVOP_apic_{read,write}.
For all others the operand struct layout doesn't differ.

Fixes: 1252e2823117 ("x86/pv: Export pv_hypercall_table[] rather than working around it in several ways")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 834cb8761051f7d87816785c0d99fe9bd5f0ce30
master date: 2021-10-12 11:55:42 +0200

4 years agoAMD/IOMMU: consider hidden devices when flushing device I/O TLBs
Jan Beulich [Fri, 15 Oct 2021 09:19:41 +0000 (11:19 +0200)]
AMD/IOMMU: consider hidden devices when flushing device I/O TLBs

Hidden devices are associated with DomXEN but usable by the
hardware domain. Hence they need flushing as well when all devices are
to have flushes invoked.

While there drop a redundant ATS-enabled check and constify the first
parameter of the involved function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
master commit: 036432e8b27e1ef21e0f0204ba9b0e3972a031c2
master date: 2021-10-12 11:54:34 +0200

4 years agox86/HVM: fix xsm_op for 32-bit guests
Jan Beulich [Fri, 15 Oct 2021 09:19:11 +0000 (11:19 +0200)]
x86/HVM: fix xsm_op for 32-bit guests

Like for PV, 32-bit guests need to invoke the compat handler, not the
native one.

Fixes: db984809d61b ("hvm: wire up domctl and xsm hypercalls")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: b6b672e8a925ff4b71a1a67bc7d213ef445af74f
master date: 2021-10-11 10:58:44 +0200

4 years agox86/build: suppress EFI-related tool chain checks upon local $(MAKE) recursion
Jan Beulich [Fri, 15 Oct 2021 09:18:51 +0000 (11:18 +0200)]
x86/build: suppress EFI-related tool chain checks upon local $(MAKE) recursion

The xen-syms and xen.efi linking steps are serialized only when the
intermediate note.o file is necessary. Otherwise both may run in
parallel. This in turn means that the compiler / linker invocations to
create efi/check.o / efi/check.efi may also happen twice in parallel.
Obviously it's a bad idea to have multiple producers of the same output
race with one another - every once in a while one may e.g. observe

objdump: efi/check.efi: file format not recognized

We don't need this EFI related checking to occur when producing the
intermediate symbol and relocation table objects, and we have an easy
way of suppressing it: Simply pass in "efi-y=", overriding the
assignments done in the Makefile and thus forcing the tool chain checks
to be bypassed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 24b0ce9a5da2e648cde818055a085bcbcf24ecb0
master date: 2021-10-11 10:58:17 +0200

4 years agopci: fix handling of PCI bridges with subordinate bus number 0xff
Igor Druzhinin [Fri, 15 Oct 2021 09:18:19 +0000 (11:18 +0200)]
pci: fix handling of PCI bridges with subordinate bus number 0xff

Bus number 0xff is valid according to the PCI spec. Using u8 typed sub_bus
and assigning 0xff to it will result in the following loop getting stuck.

    for ( ; sec_bus <= sub_bus; sec_bus++ ) {...}

Just change its type to unsigned int similarly to what is already done in
dmar_scope_add_buses().

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
master commit: 9c3b9800e2019c93ab22da69e4a0b22d6fb059ec
master date: 2021-09-28 16:04:50 +0200

4 years agoVT-d: PCI segment numbers are up to 16 bits wide
Jan Beulich [Fri, 15 Oct 2021 09:17:56 +0000 (11:17 +0200)]
VT-d: PCI segment numbers are up to 16 bits wide

We shouldn't silently truncate respective values.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: a3dd33e63c2c8c51102f223e6efe2e3d3bdcbec1
master date: 2021-09-20 10:25:03 +0200

4 years agoVT-d: consider hidden devices when unmapping
Jan Beulich [Fri, 15 Oct 2021 09:17:32 +0000 (11:17 +0200)]
VT-d: consider hidden devices when unmapping

Whether to clear an IOMMU's bit in the domain's bitmap should depend on
all devices the domain can control. For the hardware domain this
includes hidden devices, which are associated with DomXEN.

While touching related logic
- convert the "current device" exclusion check to a simple pointer
  comparison,
- convert "found" to "bool",
- adjust style and correct a typo in an existing comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 75bfe6ec4844f83b300b9807bceaed1e2fe23270
master date: 2021-09-20 10:24:27 +0200

4 years agox86: quote section names when defining them in linker script
Roger Pau Monné [Fri, 15 Oct 2021 09:16:41 +0000 (11:16 +0200)]
x86: quote section names when defining them in linker script

LLVM ld seems to require section names to be quoted at both definition
and when referencing them for a match to happen, or else we get the
following errors:

ld: error: xen.lds:45: undefined section ".text"
ld: error: xen.lds:69: undefined section ".rodata"
ld: error: xen.lds:104: undefined section ".note.gnu.build-id"
[...]

The original fix for GNU ld 2.37 only quoted the section name when
referencing it in the ADDR function. Fix by also quoting the section
names when declaring them.

Fixes: 58ad654ebce7 ("x86: work around build issue with GNU ld 2.37")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6254920587c33bcc7ab884e6c9a11cfc0d5867ab
master date: 2021-09-15 11:02:21 +0200

4 years agotools/libacpi: Use 64-byte alignment for FACS
Kevin Stefanov [Fri, 15 Oct 2021 09:16:09 +0000 (11:16 +0200)]
tools/libacpi: Use 64-byte alignment for FACS

The spec requires 64-byte alignment, not 16.

Signed-off-by: Kevin Stefanov <kevin.stefanov@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c76cfada1cfad05aaf64ce3ad305c5467650e782
master date: 2021-09-10 13:27:08 +0100

4 years agox86/spec-ctrl: Print all AMD speculative hints/features
Andrew Cooper [Fri, 15 Oct 2021 09:15:39 +0000 (11:15 +0200)]
x86/spec-ctrl: Print all AMD speculative hints/features

We already print Intel features that aren't yet implemented/used, so be
consistent on AMD too.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 3d189f16a11d5a209fb47fa3635847608d43745c
master date: 2021-09-09 12:36:17 +0100

4 years agox86/amd: Use newer SSBD mechanisms if they exist
Andrew Cooper [Fri, 15 Oct 2021 09:15:14 +0000 (11:15 +0200)]
x86/amd: Use newer SSBD mechanisms if they exist

The opencoded legacy Memory Disambiguation logic in init_amd() neglected
Fam19h for the Zen3 microarchitecture.  Further more, all Zen2 based system
have the architectural MSR_SPEC_CTRL and the SSBD bit within it, so shouldn't
be using MSR_AMD64_LS_CFG.

Implement the algorithm given in AMD's SSBD whitepaper, and leave a
printk_once() behind in the case that no controls can be found.

This now means that a user explicitly choosing `spec-ctrl=ssbd` will properly
turn off Memory Disambiguation on Fam19h/Zen3 systems.

This still remains a single system-wide setting (for now), and is not context
switched between vCPUs.  As such, it doesn't interact with Intel's use of
MSR_SPEC_CTRL and default_xen_spec_ctrl (yet).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 2a4e6c4e4bea2e0bb720418c331ee28ff9c7632e
master date: 2021-09-08 14:16:19 +0100

4 years agox86/amd: Enumeration for speculative features/hints
Andrew Cooper [Fri, 15 Oct 2021 09:14:46 +0000 (11:14 +0200)]
x86/amd: Enumeration for speculative features/hints

There is a step change in speculation protections between the Zen1 and Zen2
microarchitectures.

Zen1 and older have no special support.  Control bits in non-architectural
MSRs are used to make lfence be dispatch-serialising (Spectre v1), and to
disable Memory Disambiguation (Speculative Store Bypass).  IBPB was
retrofitted in a microcode update, and software methods are required for
Spectre v2 protections.

Because the bit controlling Memory Disambiguation is model specific,
hypervisors are expected to expose a MSR_VIRT_SPEC_CTRL interface which
abstracts the model specific details.

Zen2 and later implement the MSR_SPEC_CTRL interface in hardware, and
virtualise the interface for HVM guests to use.  A number of hint bits are
specified too to help guide OS software to the most efficient mitigation
strategy.

Zen3 introduced a new feature, Predictive Store Forwarding, along with a
control to disable it in sensitive code.

Add CPUID and VMCB details for all the new functionality.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 747424c664bb164a04e7a9f2ffbf02d4a1630d7d
master date: 2021-09-08 14:16:19 +0100

4 years agox86/spec-ctrl: Split the "Hardware features" diagnostic line
Andrew Cooper [Fri, 15 Oct 2021 09:14:16 +0000 (11:14 +0200)]
x86/spec-ctrl: Split the "Hardware features" diagnostic line

Separate the read-only hints from the features requiring active actions on
Xen's behalf.

Also take the opportunity split the IBRS/IBPB and IBPB mess.  More features
with overlapping enumeration are on the way, and and it is not useful to split
them like this.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 565ebcda976c05b0c6191510d5e32b621a2b1867
master date: 2021-09-08 14:16:19 +0100

4 years agobuild: set policy filename on make command line
Anthony PERARD [Fri, 15 Oct 2021 09:13:39 +0000 (11:13 +0200)]
build: set policy filename on make command line

In order to avoid flask/Makefile.common calling `make xenversion`, we
override POLICY_FILENAME with the value we are going to use anyway.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: c81e7efe2146c8f381fbdbb037b9d46866a6451e
master date: 2021-09-08 14:40:00 +0200

4 years agoupdate Xen version to 4.14.4-pre
Jan Beulich [Fri, 15 Oct 2021 09:11:34 +0000 (11:11 +0200)]
update Xen version to 4.14.4-pre

4 years agoVT-d: fix deassign of device with RMRR
Jan Beulich [Fri, 1 Oct 2021 13:05:42 +0000 (15:05 +0200)]
VT-d: fix deassign of device with RMRR

Ignoring a specific error code here was not meant to short circuit
deassign to _just_ the unmapping of RMRRs. This bug was previously
hidden by the bogus (potentially indefinite) looping in
pci_release_devices(), until f591755823a7 ("IOMMU/PCI: don't let domain
cleanup continue when device de-assignment failed") fixed that loop.

This is CVE-2021-28702 / XSA-386.

Fixes: 8b99f4400b69 ("VT-d: fix RMRR related error handling")
Reported-by: Ivan Kardykov <kardykov@tabit.pro>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Ivan Kardykov <kardykov@tabit.pro>
(cherry picked from commit 24ebe875a77833696bbe5c9372e9e1590a7e7101)

4 years agoUpdate changelog for new upstream 4.14.3
Hans van Kranenburg [Mon, 13 Sep 2021 09:51:20 +0000 (11:51 +0200)]
Update changelog for new upstream 4.14.3

[git-debrebase changelog: new upstream 4.14.3]

4 years agoUpdate to upstream 4.14.3
Hans van Kranenburg [Mon, 13 Sep 2021 09:51:20 +0000 (11:51 +0200)]
Update to upstream 4.14.3

[git-debrebase anchor: new upstream 4.14.3, merge]

4 years agod/changelog: finish 4.14.2+25-gb6a8c4f72d-2
Hans van Kranenburg [Fri, 30 Jul 2021 14:59:02 +0000 (16:59 +0200)]
d/changelog: finish 4.14.2+25-gb6a8c4f72d-2

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
4 years agoAdd debian/README.Debian.security
Hans van Kranenburg [Fri, 30 Jul 2021 14:57:44 +0000 (16:57 +0200)]
Add debian/README.Debian.security

Add a note about the end of upstream security support, since the date is
already known right now.

Install it into the xen-hypervisor-common package.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
4 years agodebian/changelog: finish 4.14.2+25-gb6a8c4f72d-1
Hans van Kranenburg [Sun, 11 Jul 2021 13:02:08 +0000 (15:02 +0200)]
debian/changelog: finish 4.14.2+25-gb6a8c4f72d-1

4 years agoupdate Xen version to 4.14.3
Jan Beulich [Fri, 10 Sep 2021 12:30:40 +0000 (14:30 +0200)]
update Xen version to 4.14.3

4 years agognttab: deal with status frame mapping race
Jan Beulich [Wed, 8 Sep 2021 12:53:04 +0000 (14:53 +0200)]
gnttab: deal with status frame mapping race

Once gnttab_map_frame() drops the grant table lock, the MFN it reports
back to its caller is free to other manipulation. In particular
gnttab_unpopulate_status_frames() might free it, by a racing request on
another CPU, thus resulting in a reference to a deallocated page getting
added to a domain's P2M.

Obtain a page reference in gnttab_map_frame() to prevent freeing of the
page until xenmem_add_to_physmap_one() has actually completed its acting
on the page. Do so uniformly, even if only strictly required for v2
status pages, to avoid extra conditionals (which then would all need to
be kept in sync going forward).

This is CVE-2021-28701 / XSA-384.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
master commit: eb6bbf7b30da5bae87932514d54d0e3c68b23757
master date: 2021-09-08 14:37:45 +0200

4 years agox86/p2m-pt: fix p2m_flags_to_access()
Jan Beulich [Wed, 8 Sep 2021 12:52:13 +0000 (14:52 +0200)]
x86/p2m-pt: fix p2m_flags_to_access()

The initial if() was inverted, invalidating all output from this
function. Which in turn means the mirroring of P2M mappings into the
IOMMU didn't always work as intended: Mappings may have got updated when
there was no need to. There would not have been too few (un)mappings;
what saves us is that alongside the flags comparison MFNs also get
compared, with non-present entries always having an MFN of 0 or
INVALID_MFN while present entries always have MFNs different from these
two (0 in the table also meant to cover INVALID_MFN):

OLD NEW
P W access MFN P W access MFN
0 0 r 0 0 0 n 0
0 1 rw 0 0 1 n 0
1 0 n non-0 1 0 r non-0
1 1 n non-0 1 1 rw non-0

present <-> non-present transitions are fine because the MFNs differ.
present -> present transitions as well as non-present -> non-present
ones are potentially causing too many map/unmap operations, but never
too few, because in that case old (bogus) and new access differ.

Fixes: d1bb6c97c31e ("IOMMU: also pass p2m_access_t to p2m_get_iommu_flags())
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e70a9a043a5ce6d4025420f729bc473f711bf5d1
master date: 2021-09-07 14:24:49 +0200

4 years agox86/P2M: relax guarding of MMIO entries
Jan Beulich [Wed, 8 Sep 2021 12:51:48 +0000 (14:51 +0200)]
x86/P2M: relax guarding of MMIO entries

One of the changes comprising the fixes for XSA-378 disallows replacing
MMIO mappings by code paths not intended for this purpose. At least in
the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
this is too strict. Generally short-circuit requests establishing the
same kind of mapping (mfn, type), but allow permissions to differ.

While there, also add a log message to the other domain_crash()
invocation that did prevent PVH Dom0 from coming up after the XSA-378
changes.

Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 111469cc7b3f586c2335e70205320ed3c828b89e
master date: 2021-09-07 09:39:38 +0200

4 years agox86/PVH: de-duplicate mappings for first Mb of Dom0 memory
Jan Beulich [Wed, 8 Sep 2021 12:51:24 +0000 (14:51 +0200)]
x86/PVH: de-duplicate mappings for first Mb of Dom0 memory

One of the changes comprising the fixes for XSA-378 disallows replacing
MMIO mappings by code paths not intended for this purpose. This means we
need to be more careful about the mappings put in place in this range -
mappings should be created exactly once:
- iommu_hwdom_init() comes first; it should avoid the first Mb,
- pvh_populate_p2m() should insert identity mappings only into ranges
  not populated as RAM,
- pvh_setup_acpi() should again avoid the first Mb, which was already
  dealt with at that point.

Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6b4f6a31ace125d658a581e8d10809e4fccdc272
master date: 2021-08-31 17:43:36 +0200

4 years agognttab: avoid triggering assertion in radix_tree_ulong_to_ptr()
Jan Beulich [Wed, 8 Sep 2021 12:50:39 +0000 (14:50 +0200)]
gnttab: avoid triggering assertion in radix_tree_ulong_to_ptr()

Relevant quotes from the C11 standard:

"Except where explicitly stated otherwise, for the purposes of this
 subclause unnamed members of objects of structure and union type do not
 participate in initialization. Unnamed members of structure objects
 have indeterminate value even after initialization."

"If there are fewer initializers in a brace-enclosed list than there are
 elements or members of an aggregate, [...], the remainder of the
 aggregate shall be initialized implicitly the same as objects that have
 static storage duration."

"If an object that has static or thread storage duration is not
 initialized explicitly, then:
 [...]
 — if it is an aggregate, every member is initialized (recursively)
   according to these rules, and any padding is initialized to zero
   bits;
 [...]"

"A bit-field declaration with no declarator, but only a colon and a
 width, indicates an unnamed bit-field." Footnote: "An unnamed bit-field
 structure member is useful for padding to conform to externally imposed
 layouts."

"There may be unnamed padding within a structure object, but not at its
 beginning."

Which makes me conclude:
- Whether an unnamed bit-field member is an unnamed member or padding is
  unclear, and hence also whether the last quote above would render the
  big endian case of the structure declaration invalid.
- Whether the number of members of an aggregate includes unnamed ones is
  also not really clear.
- The initializer in map_grant_ref() initializes all fields of the "cnt"
  sub-structure of the union, so assuming the second quote above applies
  here (indirectly), the compiler isn't required to implicitly
  initialize the rest (i.e. in particular any padding) like would happen
  for static storage duration objects.

Gcc 7.4.1 can be observed (apparently in debug builds only) to translate
aforementioned initializer to a read-modify-write operation of a stack
variable, leaving unchanged the top two bits of whatever was previously
in that stack slot. Clearly if either of the two bits were set,
radix_tree_ulong_to_ptr()'s assertion would trigger.

Therefore, to be on the safe side, add an explicit padding field for the
non-big-endian-bitfields case and give a dummy name to both padding
fields.

Fixes: 9781b51efde2 ("gnttab: replace mapkind()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: b6da9d0414d69c2682214ee3ecf9816fcac500d0
master date: 2021-08-27 10:54:46 +0200

4 years agotools/firmware/ovmf: Use OvmfXen platform file is exist
Anthony PERARD [Tue, 1 Jun 2021 10:28:03 +0000 (11:28 +0100)]
tools/firmware/ovmf: Use OvmfXen platform file is exist

A platform introduced in EDK II named OvmfXen is now the one to use for
Xen instead of OvmfX64. It comes with PVH support.

Also, the Xen support in OvmfX64 is deprecated,
    "deprecation notice: *dynamic* multi-VMM (QEMU vs. Xen) support in OvmfPkg"
    https://edk2.groups.io/g/devel/message/75498

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
(cherry picked from commit aad7b5c11d51d57659978e04702ac970906894e8)
(cherry picked from commit 7988ef515a5eabe74bb5468c8c692e03ee9db8bc)

4 years agoAMD/IOMMU: don't leave page table mapped when unmapping ...
Jan Beulich [Wed, 25 Aug 2021 13:11:37 +0000 (15:11 +0200)]
AMD/IOMMU: don't leave page table mapped when unmapping ...

... an already not mapped page. With all other exit paths doing the
unmap, I have no idea how I managed to miss that aspect at the time.

Fixes: ad591454f069 ("AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
master commit: 3cfec6a6aa7a7bf68f8e19e21f450c2febe9acb4
master date: 2021-08-20 12:30:35 +0200

4 years agoxen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
Juergen Gross [Wed, 25 Aug 2021 13:11:24 +0000 (15:11 +0200)]
xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume

With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.

Fix that by letting get_cpu_idle_time() deal with this case. Drop a
redundant check in exchange.

Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
master commit: 5293470a77ad980dce2af9b7e6c3f11eeebf1b64
master date: 2021-08-19 13:38:31 +0200

4 years agoVT-d: Tylersburg errata apply to further steppings
Jan Beulich [Wed, 25 Aug 2021 13:11:11 +0000 (15:11 +0200)]
VT-d: Tylersburg errata apply to further steppings

While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
spec update, X58's also mentions B2, and searching the internet suggests
systems with this stepping are actually in use. Even worse, for X58
erratum #69 is marked applicable even to C2. Split the check to cover
all applicable steppings and to also report applicable errata numbers in
the log message. The splitting requires using the DMI port instead of
the System Management Registers device, but that's then in line (also
revision checking wise) with the spec updates.

Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 517a90d1ca09ce00e50d46ac25566cc3bd2eb34d
master date: 2021-08-18 09:44:14 +0200

4 years agox86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}
Andrew Cooper [Wed, 25 Aug 2021 13:10:58 +0000 (15:10 +0200)]
x86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}

This was a clear oversight in the original CET work.  The BUGFRAME_run_fn and
BUGFRAME_warn paths update regs->rip without an equivalent adjustment to the
shadow stack, causing IRET to suffer #CP because of the mismatch.

One subtle, and therefore fragile, aspect of extable_shstk_fixup() was that it
required regs->rip to have its old value as a cross-check that the right word
in the shadow stack was being edited.

Rework extable_shstk_fixup() into fixup_exception_return() which takes
ownership of the update to both the regular and shadow stacks, ensuring that
the regs->rip update is ordered correctly.

Use the new fixup_exception_return() for BUGFRAME_run_fn and BUGFRAME_warn to
ensure that the shadow stack is updated too.

Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/cet: Fix build on newer versions of GCC

Some versions of GCC complain with:

  traps.c:405:22: error: 'get_shstk_bottom' defined but not used [-Werror=unused-function]
   static unsigned long get_shstk_bottom(unsigned long sp)
                        ^~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Change #ifdef to if ( IS_ENABLED(...) ) to make the sole user of
get_shstk_bottom() visible to the compiler.

Fixes: 35727551c070 ("x86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Compile-tested-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
master commit: 35727551c0703493a2240e967cffc3063b13d49c
master date: 2021-08-16 16:03:20 +0100
master commit: 54c9736382e0d558a6acd820e44185e020131c48
master date: 2021-08-17 12:55:48 +0100

4 years agocredit2: avoid picking a spurious idle unit when caps are used
Dario Faggioli [Wed, 25 Aug 2021 13:10:45 +0000 (15:10 +0200)]
credit2: avoid picking a spurious idle unit when caps are used

Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not fix completely the problem of potentially
selecting a scheduling unit that will then not be able to run.

In fact, in case caps are used and the unit we are currently looking
at, during the runqueue scan, does not have enough budget for being run,
we should continue looking instead than giving up and picking the idle
unit.

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 0f742839ae57e10687e7a573070c37430f31068c
master date: 2021-08-10 09:29:10 +0200

4 years agoxen/lib: Fix strcmp() and strncmp()
Jane Malalane [Wed, 25 Aug 2021 13:10:32 +0000 (15:10 +0200)]
xen/lib: Fix strcmp() and strncmp()

The C standard requires that each character be compared as unsigned
char. Xen's current behaviour compares as signed char, which changes
the answer when chars with a value greater than 0x7f are used.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
master commit: 3747a2bb67daa5a8baeff6cda57dc98a5ef79c3e
master date: 2021-07-30 10:52:46 +0100

4 years agox86/hvm: Propagate real error information up through hvm_load()
Andrew Cooper [Wed, 25 Aug 2021 13:10:18 +0000 (15:10 +0200)]
x86/hvm: Propagate real error information up through hvm_load()

hvm_load() is currently a mix of -errno and -1 style error handling, which
aliases -EPERM.  This leads to the following confusing diagnostics:

From userspace:
  xc: info: Restoring domain
  xc: error: Unable to restore HVM context (1 = Operation not permitted): Internal error
  xc: error: Restore failed (1 = Operation not permitted): Internal error
  xc_domain_restore: [1] Restore failed (1 = Operation not permitted)

From Xen:
  (XEN) HVM10.0 restore: inconsistent xsave state (feat=0x2ff accum=0x21f xcr0=0x7 bv=0x3 err=-22)
  (XEN) HVM10 restore: failed to load entry 16/0

The actual error was a bad backport, but the -EINVAL got converted to -EPERM
on the way out of the hypercall.

The overwhelming majority of *_load() handlers already use -errno consistenty.
Fix up the rest to be consistent, and fix a few other errors noticed along the
way.

 * Failures of hvm_load_entry() indicate a truncated record or other bad data
   size.  Use -ENODATA.
 * Don't use {g,}dprintk().  Omitting diagnostics in release builds is rude,
   and almost everything uses unconditional printk()'s.
 * Switch some errors for more appropriate ones.

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 96e5ad4c476e70688295b3cfb537847a3351d6fd
master date: 2021-07-19 14:34:38 +0100

4 years agoxen/arm: Restrict the amount of memory that dom0less domU and dom0 can allocate
Julien Grall [Wed, 25 Aug 2021 13:08:29 +0000 (15:08 +0200)]
xen/arm: Restrict the amount of memory that dom0less domU and dom0 can allocate

Currently, both dom0less domUs and dom0 can allocate an "unlimited"
amount of memory because d->max_pages is set to ~0U.

In particular, the former are meant to be unprivileged. Therefore the
memory they could allocate should be bounded. As the domain are not yet
officially aware of Xen (we don't expose advertise it in the DT, yet
the hypercalls are accessible), they should not need to allocate more
than the initial amount. So cap set d->max_pages directly the amount of
memory we are meant to allocate.

Take the opportunity to also restrict the memory for dom0 as the
domain is direct mapped (e.g. MFN == GFN) and therefore cannot
allocate outside of the pre-allocated region.

This is CVE-2021-28700 / XSA-383.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: c08d68cd2aacbc7cb56e73ada241bfe4639bbc68
master date: 2021-08-25 14:19:31 +0200