Bertrand Marquis [Thu, 17 Dec 2020 15:38:03 +0000 (15:38 +0000)]
xen/arm: Add arm64 ID registers definitions
Add coprocessor registers definitions for all ID registers trapped
through the TID3 bit of HCR.
Those are the ones that will be emulated in Xen to only publish to guests
the features that are supported by Xen and that are accessible to
guests.
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Bertrand Marquis [Thu, 17 Dec 2020 15:38:02 +0000 (15:38 +0000)]
xen/arm: Add ID registers and complete cpuinfo
Add definition and entries in cpuinfo for ID registers introduced in
newer Arm Architecture reference manual:
- ID_PFR2: processor feature register 2
- ID_DFR1: debug feature register 1
- ID_MMFR4 and ID_MMFR5: Memory model feature registers 4 and 5
- ID_ISA6: ISA Feature register 6
Add more bitfield definitions in PFR fields of cpuinfo.
Add MVFR2 register definition for aarch32.
Add MVFRx_EL1 defines for aarch32.
Add mvfr values in cpuinfo.
Add some register definitions for arm64 in sysregs as some are not
always known by compilers.
Initialize the new values added in cpuinfo in identify_cpu during init.
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Bertrand Marquis [Thu, 17 Dec 2020 15:38:01 +0000 (15:38 +0000)]
xen/arm: Use READ_SYSREG instead of 32/64 versions
Modify identify_cpu function to use READ_SYSREG instead of READ_SYSREG32
or READ_SYSREG64.
All aarch32 specific registers (for example ID_PFR0_EL1) are 64bit when
accessed from aarch64 with upper bits read as 0, so it is right to
access them as 64bit registers on a 64bit platform.
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 31 Dec 2020 16:55:20 +0000 (16:55 +0000)]
x86/p2m: Fix paging_gva_to_gfn() for nested virt
nestedhap_walk_L1_p2m() takes guest physical addresses, not frame numbers.
This means the l2 input is off-by-PAGE_SHIFT, as is the l1 value eventually
returned to the caller.
Delete the misleading comment as well.
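The address versus frame number mix-up can be sketched as follows. This is a
stand-alone illustration, not the Xen code: walk_takes_gpa() is a hypothetical
stand-in for a walker such as nestedhap_walk_L1_p2m(), the two conversion
helpers are simplified to plain integers, and PAGE_SHIFT is assumed to be 12.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Hypothetical helpers for illustration; plain integers, not Xen's types. */
static inline uint64_t gfn_to_gaddr(uint64_t gfn)
{
    return gfn << PAGE_SHIFT;
}

static inline uint64_t gaddr_to_gfn(uint64_t gaddr)
{
    return gaddr >> PAGE_SHIFT;
}

/* Stand-in for a walker that takes and returns addresses (identity map). */
static uint64_t walk_takes_gpa(uint64_t l2_gpa)
{
    return l2_gpa;
}

/* The caller holds a frame number: convert on the way in and on the way out. */
static uint64_t translate_gfn(uint64_t l2_gfn)
{
    uint64_t l1_gpa = walk_takes_gpa(gfn_to_gaddr(l2_gfn));

    return gaddr_to_gfn(l1_gpa);
}
```

Passing l2_gfn straight to the walker, or returning l1_gpa unshifted, would
give the off-by-PAGE_SHIFT behaviour described above.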
Fixes: bab2bd8e222de ("xen/nested_p2m: Don't walk EPT tables with a regular PT walker")
Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Roger Pau Monné [Mon, 4 Jan 2021 09:03:23 +0000 (10:03 +0100)]
x86/p2m: fix p2m_add_foreign error path
One of the error paths in p2m_add_foreign could call put_page with a
NULL page, thus triggering a fault.
Split the checks into two different if statements, so the appropriate
error path can be taken.
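A minimal sketch of the split, assuming a simplified page structure and a
hypothetical add_foreign() helper (neither is the Xen code): the NULL case
skips the put entirely, while a later failure still drops the reference.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in for a reference-counted page. */
struct page {
    int refcount;
};

static int puts_done;

static void put_page(struct page *pg)
{
    pg->refcount--;   /* would fault if pg were NULL */
    puts_done++;
}

/* Two separate checks, so each failure takes the matching error path. */
static int add_foreign(struct page *pg, int type_ok)
{
    int rc = -EINVAL;

    if ( !pg )
        goto out;         /* no reference was taken, nothing to put */

    if ( !type_ok )
        goto put_one;     /* reference was taken, drop it */

    rc = 0;

 put_one:
    put_page(pg);
 out:
    return rc;
}
```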
Fixes: 173ae325026bd ('x86/p2m: tidy p2m_add_foreign() a little')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Wed, 30 Dec 2020 17:34:46 +0000 (18:34 +0100)]
xen: remove the usage of the P ar option
It's not part of the POSIX standard [0] and as such non-GNU ar
implementations don't usually have it.
It's not relevant for the use case here anyway, as the archive file is
recreated every time due to the rm invocation before the ar call. No
file name matching should happen so matching using the full path name
or a relative one should yield the same result.
This fixes the build on FreeBSD.
While there also drop the s option, as ar will already generate a
symbol table by default when creating the archive.
[0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 30 Dec 2020 19:26:14 +0000 (19:26 +0000)]
x86/svm: Clean up MSR_K8_VM_CR definitions
Drop the unused shift number, and reposition the constants into the cleaned-up
section. Rename VM_CR_SVM_DISABLE to be closer to its APM definition.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Tue, 29 Dec 2020 17:51:23 +0000 (17:51 +0000)]
x86/hpet: Fix return value of hpet_setup()
hpet_setup() is idempotent if the rate has already been calculated, and
returns the cached value. However, this only works correctly when the return
statements are identical.
Use a sensibly named local variable, rather than a dead one with a bad name.
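The idempotency requirement can be sketched like this. All names here are
hypothetical and the rate value is arbitrary; the point is only that the
cached path and the first-call path return the same variable.

```c
#include <stdint.h>

static uint64_t cached_rate;        /* 0 means "not calculated yet" */
static unsigned int calibrations;   /* counts the expensive path */

/* Hypothetical calibration step; the value is arbitrary. */
static uint64_t calibrate(void)
{
    calibrations++;
    return 14318180;
}

static uint64_t setup_rate(void)
{
    if ( cached_rate )
        return cached_rate;         /* cached path */

    cached_rate = calibrate();

    return cached_rate;             /* same expression as the cached path */
}
```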
Fixes: a60bb68219 ("x86/time: reduce rounding errors in calculations")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 28 Sep 2020 17:14:53 +0000 (18:14 +0100)]
xen/domain: Introduce domain_teardown()
There is no common equivalent of domain_relinquish_resources(), which has
caused various pieces of common cleanup to live in inappropriate
places.
Perhaps most obviously, evtchn_destroy() is called for every continuation of
domain_relinquish_resources(), which can easily be thousands of times.
Create domain_teardown() to be a new top level facility, and call it from the
appropriate positions in domain_kill() and domain_create()'s error path. The
intention is for this to supersede domain_relinquish_resources() in due course.
No change in behaviour yet.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 28 Sep 2020 15:47:58 +0000 (16:47 +0100)]
xen/domain: Reorder trivial initialisation in early domain_create()
This improves the robustness of the error paths.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Maximilian Engelhardt [Fri, 18 Dec 2020 20:42:34 +0000 (21:42 +0100)]
docs: use predictable ordering in generated documentation
When the seq number is equal, sort by the title to get predictable
output ordering. This is useful for reproducible builds.
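The tie-break can be sketched as a two-key comparison. The documentation
generator itself is not written in C; the struct and function names below are
assumptions made purely to illustrate the ordering rule.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical document entry. */
struct doc {
    int seq;
    const char *title;
};

/* Order by seq first; on equal seq fall back to the title. */
static int doc_cmp(const void *a, const void *b)
{
    const struct doc *x = a, *y = b;

    if ( x->seq != y->seq )
        return (x->seq > y->seq) - (x->seq < y->seq);

    return strcmp(x->title, y->title);
}

static void sort_docs(struct doc *docs, size_t n)
{
    qsort(docs, n, sizeof(*docs), doc_cmp);
}
```

Without the title fallback, entries with equal seq could come out in any
order, since qsort() is not guaranteed to be stable.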
Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 22 Dec 2020 11:01:12 +0000 (12:01 +0100)]
x86/mm: p2m_add_foreign() is HVM-only
This is also the case for xenmem_add_to_physmap_one(), which is the only
caller of the function. Move xenmem_add_to_physmap_one() next to
p2m_add_foreign(), allowing the latter to become static at the same time.
While moving, adjust indentation of the body of the main switch().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 22 Dec 2020 08:00:03 +0000 (09:00 +0100)]
x86/Intel: insert Tiger Lake model numbers
Both match prior generation processors as far as LBR and C-state MSRs
go (SDM rev 073). The if_pschange_mc erratum, according to the spec
update, is not applicable.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Maximilian Engelhardt [Tue, 22 Dec 2020 07:59:14 +0000 (08:59 +0100)]
x86/EFI: don't insert timestamp when SOURCE_DATE_EPOCH is defined
By default a timestamp gets added to the xen efi binary. Unfortunately
ld doesn't seem to provide a way to set a custom date, like from
SOURCE_DATE_EPOCH, so set a zero value for the timestamp (option
--no-insert-timestamp) if SOURCE_DATE_EPOCH is defined. This makes
reproducible builds possible.
This is an alternative to the patch suggested in [1]. This patch only
omits the timestamp when SOURCE_DATE_EPOCH is defined.
[1] https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg02161.html
Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 22 Dec 2020 07:57:19 +0000 (08:57 +0100)]
x86: verify function type (and maybe attribute) in switch_stack_and_jump()
It is imperative that the functions passed here take no arguments,
return no values, and don't return in the first place. While the type
can be checked uniformly, the attribute check is limited to gcc 9 and
newer (no clang support for this so far afaict).
Note that I didn't want to have the "true" fallback "implementation" of
__builtin_has_attribute(..., __noreturn__) generally available, as
"true" may not be a suitable fallback in other cases.
Note further that the noreturn addition to startup_cpu_idle_loop()'s
declaration requires adding unreachable() to Arm's
switch_stack_and_jump(), or else the build would break. I suppose this
should have been there already.
For vmx_asm_do_vmentry() along with adding the attribute, also restrict
its scope.
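A compile-time type check of this kind can be sketched with GNU C builtins.
The macro names are hypothetical and this is not necessarily how the patch
implements it; the noreturn attribute check is left out because it needs
__builtin_has_attribute(), which is limited to newer gcc as noted above.

```c
/* Sketch only: relies on GNU extensions available in gcc and clang. */
#define IS_VOID_VOID_FN(fn) \
    __builtin_types_compatible_p(__typeof__(fn), void (void))

/* Breaks the build when fn is not of type void fn(void). */
#define CHECK_JUMP_TARGET(fn) \
    _Static_assert(IS_VOID_VOID_FN(fn), #fn " must be void fn(void)")

/* Hypothetical jump targets; a real one would additionally never return. */
static void idle_loop(void)
{
}

static int takes_arg(int x)
{
    return x;
}

CHECK_JUMP_TARGET(idle_loop);       /* compiles */
/* CHECK_JUMP_TARGET(takes_arg);       would fail to compile */
```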
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
Julien Grall [Fri, 18 Dec 2020 13:30:54 +0000 (13:30 +0000)]
xen: Rework WARN_ON() to return whether a warning was triggered
So far, our implementation of WARN_ON() cannot be used in the following
situation:
if ( WARN_ON() )
...
This is because WARN_ON() doesn't return whether a warning has been
triggered. Such construction can be handy if you want to print more
information and also dump the stack trace.
Therefore, rework the WARN_ON() implementation to return whether a
warning was triggered. The idea was borrowed from Linux.
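A minimal user-space sketch of such a macro, assuming GNU statement
expressions. The warnings counter and the fprintf() call stand in for the real
warning and stack dump, and check_len() is a hypothetical caller; this is not
the Xen implementation.

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned int warnings;   /* stand-in for the real warning machinery */

/* Evaluates cond once, warns if it is true, and yields the result. */
#define WARN_ON(cond) ({                                                \
    bool ret_warn_on_ = !!(cond);                                       \
                                                                        \
    if ( ret_warn_on_ )                                                 \
    {                                                                   \
        warnings++;                                                     \
        fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__);      \
    }                                                                   \
    ret_warn_on_;                                                       \
})

/* Hypothetical caller: warn and bail out in a single statement. */
static int check_len(int len)
{
    if ( WARN_ON(len < 0) )
        return -1;

    return len;
}
```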
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 21 Dec 2020 14:52:26 +0000 (14:52 +0000)]
x86/shadow: Fix build with !CONFIG_SHADOW_PAGING
Implement a stub for shadow_vcpu_teardown()
Fixes: d162f36848c4 ("xen/x86: Fix memory leak in vcpu_create() error path")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 28 Sep 2020 14:25:44 +0000 (15:25 +0100)]
xen/x86: Fix memory leak in vcpu_create() error path
Various paths in vcpu_create() end up calling paging_update_paging_modes(),
which eventually allocates a monitor pagetable if one doesn't exist.
However, an error in vcpu_create() results in the vcpu being cleaned up
locally, and not put onto the domain's vcpu list. Therefore, the monitor
table is not freed by {hap,shadow}_teardown()'s loop. This is caught by
assertions later that we've successfully freed the entire hap/shadow memory
pool.
The per-vcpu loops in domain teardown logic are conceptually wrong, but exist
due to insufficient existing structure in the existing logic.
Break paging_vcpu_teardown() out of paging_teardown(), with mirrored breakouts
in the hap/shadow code, and use it from arch_vcpu_create()'s error path. This
fixes the memory leak.
The new {hap,shadow}_vcpu_teardown() must be idempotent, and are written to be
as tolerant as possible, with the minimum number of safety checks possible.
In particular, drop the mfn_valid() check - if these fields are junk, then Xen
is going to explode anyway.
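Idempotent teardown of a per-vcpu resource can be sketched as below. The
structure and function are hypothetical, with malloc()/free() standing in for
the paging pool; the only check is whether the resource exists at all.

```c
#include <stdlib.h>

/* Hypothetical per-vcpu state holding one allocated table. */
struct vcpu_sketch {
    void *monitor_table;
};

/* Safe to call on a partially constructed vcpu, and more than once. */
static void vcpu_teardown(struct vcpu_sketch *v)
{
    if ( !v->monitor_table )
        return;                   /* nothing allocated, or already freed */

    free(v->monitor_table);
    v->monitor_table = NULL;      /* makes a second call a no-op */
}
```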
Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 18 Dec 2020 23:30:04 +0000 (23:30 +0000)]
xen/Kconfig: Correct the NR_CPUS description
The description "physical CPUs" is especially wrong, as it implies the number
of sockets, which tops out at 8 on all but the very biggest servers.
NR_CPUS is the number of logical entities the scheduler can use.
Reported-by: hanetzer@startmail.com
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 18 Dec 2020 17:53:13 +0000 (17:53 +0000)]
Revert "x86/mm: p2m_add_foreign() is HVM-only"
This reverts commit 8009c33b5179536e2ecce54462fe4cd069060f77.
It breaks the PV-Shim build.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 18 Dec 2020 12:29:14 +0000 (13:29 +0100)]
x86/mm: p2m_add_foreign() is HVM-only
This is also the case for xenmem_add_to_physmap_one(), which is the only
caller of the function. Move xenmem_add_to_physmap_one() next to
p2m_add_foreign(), allowing the latter to become static at the same time.
While moving, adjust indentation of the body of the main switch().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 18 Dec 2020 12:28:30 +0000 (13:28 +0100)]
x86/p2m: tidy p2m_add_foreign() a little
Drop a bogus ASSERT() - we don't typically assert incoming domain
pointers to be non-NULL, and there's no particular reason to do so here.
Replace the open-coded DOMID_SELF check by use of
rcu_lock_remote_domain_by_id(), at the same time covering the request
being made with the current domain's actual ID.
Move the "both domains same" check into just the path where it really
is meaningful.
Swap the order of the two puts, such that
- the p2m lock isn't needlessly held across put_page(),
- a separate put_page() on an error path can be avoided,
- they're inverse to the order of the respective gets.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 18 Dec 2020 12:25:40 +0000 (13:25 +0100)]
lib: move sort code
Build this code into an archive, partly paralleling bsearch().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:23:42 +0000 (13:23 +0100)]
lib: move bsearch code
Convert this code to an inline function (backed by an instance in an
archive in case the compiler decides against inlining), which results
in not having it in x86 final binaries. This saves a little bit of dead
code.
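An inline binary search of this shape might look as follows. The name
bsearch_sketch() and the cmp_int() comparator are assumptions for
illustration; the signature mirrors the standard bsearch(), but this is not
the code that was moved.

```c
#include <stddef.h>

/* Generic binary search over a sorted array; NULL when key is absent. */
static inline void *bsearch_sketch(const void *key, const void *base,
                                   size_t num, size_t size,
                                   int (*cmp)(const void *, const void *))
{
    size_t lo = 0, hi = num;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;
        const char *elem = (const char *)base + mid * size;
        int res = cmp(key, elem);

        if ( res == 0 )
            return (void *)elem;

        if ( res < 0 )
            hi = mid;         /* key sorts before elem */
        else
            lo = mid + 1;     /* key sorts after elem */
    }

    return NULL;
}

/* Hypothetical comparator for int elements. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;

    return (x > y) - (x < y);
}
```

With the comparator visible at the call site, the compiler can inline both
functions and avoid the indirect call.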
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:22:54 +0000 (13:22 +0100)]
lib: move rbtree code
Build this code into an archive, which results in not linking it into
x86 final binaries. This saves about 1.5k of dead code.
While moving the source file, take the opportunity and drop the
pointless EXPORT_SYMBOL() and an instance of trailing whitespace.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:22:10 +0000 (13:22 +0100)]
lib: move init_constructors()
... into its own CU, for being unrelated to other things in
common/lib.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:21:25 +0000 (13:21 +0100)]
lib: move parse_size_and_unit()
... into its own CU, to build it into an archive.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:20:42 +0000 (13:20 +0100)]
lib: move list sorting code
Build the source file always, as by putting it into an archive it still
won't be linked into final binaries when not needed. This way possible
build breakage will be easier to notice, and it's more consistent with
us unconditionally building other library kind of code (e.g. sort() or
bsearch()).
While moving the source file, take the opportunity and drop the
pointless EXPORT_SYMBOL() and an unnecessary #include.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Fri, 18 Dec 2020 12:17:57 +0000 (13:17 +0100)]
lib: collect library files in an archive
In order to (subsequently) drop odd things like CONFIG_NEEDS_LIST_SORT
just to avoid bloating binaries when only some arch-es and/or
configurations need generic library routines, combine objects under lib/
into an archive, which the linker then can pick the necessary objects
out of.
Note that we can't use thin archives just yet, until we've raised the
minimum required binutils version suitably.
Note further that --start-group / --end-group get put in place right
away to allow for symbol resolution across all archives, once we gain
multiple ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Stefano Stabellini [Tue, 24 Nov 2020 21:33:14 +0000 (13:33 -0800)]
automation: add domU creation to dom0 alpine linux test
Add a trivial Busybox based domU.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Tue, 24 Nov 2020 21:22:17 +0000 (13:22 -0800)]
automation: use the tests-artifacts kernel for qemu-smoke-arm64-gcc
Use the tests-artifacts kernel, instead of the Debian kernel, for the
qemu-smoke-arm64-gcc job.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Tue, 24 Nov 2020 21:15:51 +0000 (13:15 -0800)]
automation: create an alpine linux arm64 test job
Create a test job that starts Xen and Dom0 on QEMU based on the alpine
linux rootfs. Use the Linux kernel and rootfs from the tests-artifacts
containers. Add the Xen tools binaries from the Alpine Linux build job.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Tue, 24 Nov 2020 21:13:50 +0000 (13:13 -0800)]
automation: make available the tests artifacts to the pipeline
In order to make available the pre-built binaries of the
automation/tests-artifacts containers to the gitlab-ci pipeline we need
to export them as gitlab artifacts.
To do that, we create two "fake" jobs that simply export the required
binaries as artifacts and do nothing else.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Tue, 24 Nov 2020 21:08:20 +0000 (13:08 -0800)]
automation: add tests artifacts
Some tests (soon to come) will require pre-built binaries to run, such
as the Linux kernel binary. We don't want to rebuild the Linux kernel
for each gitlab-ci run: these builds should not be added to the current
list of build jobs.
Instead, create additional containers that today are built and uploaded
manually, but could be re-built automatically. The containers build the
required binaries during the "docker build" step and store them inside
the container itself.
gitlab-ci will be able to fetch these pre-built binaries during the
regular test runs, saving cycles.
Add two tests artifacts containers:
- one to build the Linux kernel ARM64
- one to create an Alpine Linux ARM64 rootfs for Dom0
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 20 Nov 2020 17:56:25 +0000 (09:56 -0800)]
automation: add alpine linux x86 build jobs
Allow failure for these jobs. Currently they fail because hvmloader
doesn't build with musl. The failures don't block the pipeline.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 20 Nov 2020 17:54:01 +0000 (09:54 -0800)]
automation: add alpine linux 3.12 x86 build container
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Wed, 18 Nov 2020 01:07:43 +0000 (17:07 -0800)]
automation: add alpine linux arm64 build test
Based on the arm64 3.12 build container
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Wed, 18 Nov 2020 01:03:55 +0000 (17:03 -0800)]
automation: add alpine linux 3.12 arm64 build container
The build container will be used for a new Alpine Linux 3.12 arm64 build
test.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 20 Nov 2020 03:20:15 +0000 (19:20 -0800)]
automation: special configure flags for musl-based systems
QEMU upstream builds with warnings when libc is musl:
#warning redirecting incorrect #include <sys/signal.h> to <signal.h>
Disable -Werror by passing --disable-werror to the QEMUU config script
if libc is musl.
hvmloader doesn't build on musl systems today. Disable any guest
firmware build.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 13 Nov 2020 23:22:41 +0000 (15:22 -0800)]
automation: add dom0less to the QEMU aarch64 smoke test
Add a trivial dom0less test:
- fetch the Debian arm64 kernel and use it as dom0/U kernel
- use busybox-static to create a trivial dom0/U ramdisk
- use ImageBuilder to generate the uboot boot script automatically
- install and use u-boot from the Debian package to start the test
- binaries are loaded from uboot via tftp
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 13 Nov 2020 02:30:33 +0000 (18:30 -0800)]
automation: add a QEMU aarch64 smoke test
Use QEMU to start Xen (just the hypervisor) up until it stops because
there is no dom0 kernel to boot.
It is based on the existing build job unstable-arm64v8.
Also use make -j$(nproc) to build Xen.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Thu, 17 Dec 2020 15:50:21 +0000 (16:50 +0100)]
xen/hypfs: add new enter() and exit() per node callbacks
In order to better support resource allocation and locking for dynamic
hypfs nodes add enter() and exit() callbacks to struct hypfs_funcs.
The enter() callback is called when entering a node during hypfs user
actions (traversing, reading or writing it), while the exit() callback
is called when leaving a node (accessing another node at the same or a
higher directory level, or when returning to the user).
For avoiding recursion this requires a parent pointer in each node.
Let the enter() callback return the entry address which is stored as
the last accessed node in order to be able to use a template entry for
that purpose in case of dynamic entries.
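The callback pairing can be sketched with a reduced node structure. The type
and function names below are hypothetical and much simpler than struct
hypfs_funcs; the sketch only shows enter() returning the entry to remember and
the parent pointer allowing a loop instead of recursion when leaving.

```c
#include <stddef.h>

struct node;

/* Hypothetical per-node callbacks. */
struct node_funcs {
    const struct node *(*enter)(const struct node *n);
    void (*exit)(const struct node *n);
};

struct node {
    const char *name;
    const struct node *parent;      /* NULL for the root */
    const struct node_funcs *funcs;
};

static int depth;   /* number of nodes currently entered */

static const struct node *count_enter(const struct node *n)
{
    depth++;
    return n;       /* a dynamic node could return a template entry here */
}

static void count_exit(const struct node *n)
{
    (void)n;
    depth--;
}

static const struct node_funcs counting = { count_enter, count_exit };

/* Leave every entered node, walking parent pointers from the last one. */
static void leave_all(const struct node *last)
{
    for ( ; last; last = last->parent )
        last->funcs->exit(last);
}
```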
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Thu, 17 Dec 2020 15:49:49 +0000 (16:49 +0100)]
xen/hypfs: switch write function handles to const
The node specific write functions take a void user address handle as
parameter. As a write won't change the user memory use a const_void
handle instead.
This requires a new macro for casting a guest handle to a const type.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Thu, 17 Dec 2020 15:49:11 +0000 (16:49 +0100)]
xen/cpupool: support moving domain between cpupools with different granularity
When moving a domain between cpupools with different scheduling
granularity the sched_units of the domain need to be adjusted.
Do that by allocating new sched_units and throwing away the old ones
in sched_move_domain().
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Juergen Gross [Tue, 15 Dec 2020 16:35:41 +0000 (17:35 +0100)]
tools/xenstore: remove unused cruft from xenstored_domain.c
domain->remote_port and restore_existing_connections() are useless and
can be removed.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Juergen Gross [Tue, 15 Dec 2020 16:35:40 +0000 (17:35 +0100)]
tools/xenstore: make set_tdb_key() non-static
set_tdb_key() can be used by destroy_node(), too. So remove the static
attribute and move it to xenstored_core.c.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Juergen Gross [Tue, 15 Dec 2020 16:35:39 +0000 (17:35 +0100)]
tools/xenstore: switch barf[_perror]() to use syslog()
When xenstored crashes due to an unrecoverable condition it is calling
either barf() or barf_perror() to issue a message and then exit().
Make sure the message is visible somewhere by using syslog()
additionally to xprintf(), as the latter will be visible only with
tracing active.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Luca Fancellu [Thu, 10 Dec 2020 10:42:58 +0000 (10:42 +0000)]
xen/arm: Add workaround for Cortex-A53 erratum #843419
On the Cortex A53, when executing in AArch64 state, a load or store instruction
which uses the result of an ADRP instruction as a base register, or which uses
a base register written by an instruction immediately after an ADRP to the
same register, might access an incorrect address.
The workaround is to enable the linker flag --fix-cortex-a53-843419
if present, to check and fix the affected sequence. Otherwise print a warning
that Xen may be susceptible to this erratum.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Wei Liu [Wed, 16 Dec 2020 17:48:04 +0000 (17:48 +0000)]
Revert patches that break libxl API
This patch reverts eight patches from staging.
The offending patch is the one that introduced libxl_pci_bdf (last one
in the list). The rest depend on that patch so they are also reverted.
8bf0fab14256 "libxl / libxlu: support 'xl pci-attach/detach' by name"
e1141654c374 "docs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING"
93c16ae47baf "xl: support naming of assignable devices"
5ab684cb3e4d "libxl: introduce libxl_pci_bdf_assignable_add/remove/list/list_free(), ..."
66c2fbc6e82b "libxl: convert internal functions in libxl_pci.c..."
f73c5dd56d78 "docs/man: modify xl(1) in preparation for naming of assignable devices"
96ed6ff29741 "libxlu: introduce xlu_pci_parse_spec_string()"
929f23114061 "libxl: introduce 'libxl_pci_bdf' in the idl..."
Signed-off-by: Wei Liu <wl@xen.org>
Jan Beulich [Wed, 16 Dec 2020 15:44:18 +0000 (16:44 +0100)]
x86/p2m: set_shared_p2m_entry() is MEM_SHARING-only
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Wed, 16 Dec 2020 15:43:32 +0000 (16:43 +0100)]
livepatch: adjust a stale comment
As of
005de45c887e ("xen: do live patching only from main idle loop")
the comment ahead of livepatch_do_action() has been stale.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Wed, 16 Dec 2020 15:42:50 +0000 (16:42 +0100)]
x86/PV: avoid double stack reset during schedule tail handling
Invoking check_wakeup_from_wait() from assembly allows the new
continue_pv_domain() to replace the prior continue_nonidle_domain() as
the tail hook, eliminating an extra reset_stack_and_jump().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Jan Beulich [Wed, 16 Dec 2020 15:41:46 +0000 (16:41 +0100)]
x86: clobber registers in switch_stack_and_jump() when !LIVEPATCH
In order to have the same effect on registers as a call to
check_for_livepatch_work() may have, clobber all call-clobbered
registers in debug builds.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:33 +0000 (19:30 +0000)]
libxl / libxlu: support 'xl pci-attach/detach' by name
This patch adds a 'name' field into the idl for 'libxl_device_pci' and
libxlu_pci_parse_spec_string() is modified to parse the new 'name'
parameter of PCI_SPEC_STRING detailed in the updated documentation in
xl-pci-configuration(5).
If the 'name' field is non-NULL then both libxl_device_pci_add() and
libxl_device_pci_remove() will use it to look up the device BDF in
the list of assignable devices.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:32 +0000 (19:30 +0000)]
docs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING
Since assignable devices can be named, a subsequent patch will support use
of a PCI_SPEC_STRING containing a 'name' parameter instead of a 'bdf'. In
this case the name will be used to look up the 'bdf' in the list of assignable
(or assigned) devices.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:31 +0000 (19:30 +0000)]
xl: support naming of assignable devices
This patch converts libxl to use libxl_pci_bdf_assignable_add/remove/list/
list_free() rather than libxl_device_pci_assignable_add/remove/list/
list_free(), which then allows naming of assignable devices to be supported.
With this patch applied 'xl pci-assignable-add' will take an optional '--name'
parameter, 'xl pci-assignable-remove' can be passed either a BDF or a name and
'xl pci-assignable-list' will take an optional '--show-names' flag which
determines whether names are displayed in its output.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:30 +0000 (19:30 +0000)]
libxl: introduce libxl_pci_bdf_assignable_add/remove/list/list_free(), ...
which support naming and use 'libxl_pci_bdf' rather than 'libxl_device_pci',
as replacements for libxl_device_pci_assignable_add/remove/list/list_free().
libxl_pci_bdf_assignable_add() takes a 'name' parameter which is stored in
xenstore and facilitates two additional functions added by this patch:
libxl_pci_bdf_assignable_name2bdf() and libxl_pci_bdf_assignable_bdf2name().
Currently there are no callers of these two functions. They will be added in
a subsequent patch.
libxl_device_pci_assignable_add/remove/list/list_free() are left in place
for compatibility but are re-implemented in terms of the newly introduced
functions.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:29 +0000 (19:30 +0000)]
libxl: convert internal functions in libxl_pci.c...
... to use 'libxl_pci_bdf' where appropriate.
No API change.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:28 +0000 (19:30 +0000)]
docs/man: modify xl(1) in preparation for naming of assignable devices
A subsequent patch will introduce code to allow a name to be specified to
'xl pci-assignable-add' such that the assignable device may be referred to
by that name in subsequent operations.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:27 +0000 (19:30 +0000)]
libxlu: introduce xlu_pci_parse_spec_string()
This patch largely re-writes the code to parse a PCI_SPEC_STRING and enters
it via the newly introduced function. The new parser also deals with 'bdf'
and 'vslot' as non-positional parameters, as per the documentation in
xl-pci-configuration(5).
The existing xlu_pci_parse_bdf() function remains, but now strictly parses
BDF values. Some existing callers of xlu_pci_parse_bdf() are
modified to call xlu_pci_parse_spec_string() as per the documentation in xl(1).
NOTE: Usage text in xl_cmdtable.c and error messages are also modified
appropriately.
As a side-effect this patch also fixes a bug where using '*' to specify
all functions would lead to an assertion failure at the end of
xlu_pci_parse_bdf().
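Non-positional parsing can be sketched as a lookup by key in a comma-separated
KEY=VALUE list. The spec_get() helper is hypothetical and far simpler than
xlu_pci_parse_spec_string(); it only illustrates that 'bdf' and 'vslot' may
appear in any order.

```c
#include <string.h>

/*
 * Copy the value of "key" from a comma-separated KEY=VALUE list into out.
 * Returns 0 on success, -1 if the key is absent or the value does not fit.
 */
static int spec_get(const char *spec, const char *key,
                    char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = spec;

    while ( *p )
    {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if ( len > klen && !strncmp(p, key, klen) && p[klen] == '=' )
        {
            size_t vlen = len - klen - 1;

            if ( vlen >= outlen )
                return -1;

            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }

        p += len;          /* skip this field ... */
        if ( *p == ',' )
            p++;           /* ... and its separator */
    }

    return -1;
}
```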
Fixes: d25cc3ec93eb ("libxl: workaround gcc 10.2 maybe-uninitialized warning")
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:26 +0000 (19:30 +0000)]
libxl: introduce 'libxl_pci_bdf' in the idl...
... and use in 'libxl_device_pci'
This patch is preparatory work for restricting the type passed to functions
that only require BDF information, rather than passing a 'libxl_device_pci'
structure which is only partially filled. In this patch only the minimal
mechanical changes necessary to deal with the structural changes are made.
Subsequent patches will adjust the code to make better use of the new type.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Paul Durrant [Tue, 8 Dec 2020 19:30:25 +0000 (19:30 +0000)]
docs/man: fix xl(1) documentation for 'pci' operations
Currently the documentation completely fails to mention the existence of
PCI_SPEC_STRING. This patch tidies things up, specifically clarifying that
'pci-assignable-add/remove' take <BDF> arguments whereas 'pci-attach/detach'
take <PCI_SPEC_STRING> arguments (which will be enforced in a subsequent
patch).
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:24 +0000 (19:30 +0000)]
docs/man: improve documentation of PCI_SPEC_STRING...
... and prepare for adding support for non-positional parsing of 'bdf' and
'vslot' in a subsequent patch.
Also document 'BDF' as a first-class parameter type and fix the documentation
to state that the default value of 'rdm_policy' is actually 'strict', not
'relaxed', as can be seen in libxl__device_pci_setdefault().
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:23 +0000 (19:30 +0000)]
docs/man: extract documentation of PCI_SPEC_STRING from the xl.cfg manpage...
... and put it into a new xl-pci-configuration(5) manpage, akin to the
xl-network-configuration(5) and xl-disk-configuration(5) manpages.
This patch moves the content of the section verbatim. A subsequent patch
will improve the documentation, once it is in its new location.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:22 +0000 (19:30 +0000)]
libxl: use COMPARE_PCI() macro in is_pci_in_array()...
... rather than an open-coded equivalent.
This patch tidies up the is_pci_in_array() function, making it take a single
'libxl_device_pci' argument rather than separate domain, bus, device and
function arguments. The already-available COMPARE_PCI() macro can then be
used and it is also modified to return 'bool' rather than 'int'.
The patch also modifies libxl_pci_assignable() to use is_pci_in_array() rather
than a separate open-coded equivalent, and also modifies it to return a
'bool' rather than an 'int'.
NOTE: The COMPARE_PCI() macro is also fixed to include the 'domain' in its
comparison, which should always have been the case.
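As a rough sketch of the idea (not the code in libxl_pci.c), a comparison that includes the domain and yields a boolean could be written as follows. 'struct pci_dev', PCI_EQUAL() and pci_in_array() are hypothetical stand-ins for 'libxl_device_pci', COMPARE_PCI() and is_pci_in_array().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of interest in 'libxl_device_pci'. */
struct pci_dev {
    unsigned int domain, bus, dev, func;
};

/* All four components must match; omitting 'domain' would be a bug. */
#define PCI_EQUAL(a, b) ((a)->domain == (b)->domain && \
                         (a)->bus == (b)->bus &&       \
                         (a)->dev == (b)->dev &&       \
                         (a)->func == (b)->func)

/* Take a single device argument instead of four separate integers. */
bool pci_in_array(const struct pci_dev *pcis, size_t num,
                  const struct pci_dev *pci)
{
    for (size_t i = 0; i < num; i++)
        if (PCI_EQUAL(pci, &pcis[i]))
            return true;
    return false;
}
```

With the domain included, two devices sharing the same bus, device and function in different PCI segments are no longer treated as equal.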
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:21 +0000 (19:30 +0000)]
libxl: add libxl_device_pci_assignable_list_free()...
... to be used by callers of libxl_device_pci_assignable_list().
Currently there is no API for callers of libxl_device_pci_assignable_list()
to free the list. The xl function pciassignable_list() calls
libxl_device_pci_dispose() on each element of the returned list, but
libxl_pci_assignable() in libxl_pci.c does not. Neither does the implementation
of libxl_device_pci_assignable_list() call libxl_device_pci_init().
This patch adds the new API function, makes sure it is used everywhere and
also modifies libxl_device_pci_assignable_list() to initialize list
entries rather than just zeroing them.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:20 +0000 (19:30 +0000)]
libxl: make sure callers of libxl_device_pci_list() free the list after use
A previous patch introduced libxl_device_pci_list_free() which should be used
by callers of libxl_device_pci_list() to properly dispose of the exported
'libxl_device_pci' types and then free the memory holding them. Whilst all
current callers do ensure the memory is freed, only the code in xl's
pcilist() function actually calls libxl_device_pci_dispose(). As it stands
this laxity does not lead to any memory leaks, but the simple addition of
e.g. a 'string' into the idl definition of 'libxl_device_pci' would lead
to leaks.
This patch makes sure all callers of libxl_device_pci_list() can call
libxl_device_pci_list_free() by keeping copies of 'libxl_device_pci'
structures inline in 'pci_add_state' and 'pci_remove_state' (and also making
sure these are properly disposed at the end of the operations) rather
than keeping pointers to the structures returned by libxl_device_pci_list().
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:19 +0000 (19:30 +0000)]
libxl: remove get_all_assigned_devices() from libxl_pci.c
Use of this function is a very inefficient way to check whether a device
has already been assigned.
This patch adds code that saves the domain id in xenstore at the point of
assignment, and removes it again when the device is de-assigned (or the
domain is destroyed). It is then straightforward to check whether a device
has been assigned by checking whether a device has a saved domain id.
NOTE: To facilitate the xenstore check it is necessary to move the
pci_info_xs_read() earlier in libxl_pci.c. To keep related functions
together, the rest of the pci_info_xs_XXX() functions are moved too.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:18 +0000 (19:30 +0000)]
libxl: remove unnecessary check from libxl__device_pci_add()
The code currently checks explicitly whether the device is already assigned,
but this is actually unnecessary as assigned devices do not form part of
the list returned by libxl_device_pci_assignable_list() and hence the
libxl_pci_assignable() test would have already failed.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:17 +0000 (19:30 +0000)]
libxl: generalise 'driver_path' xenstore access functions in libxl_pci.c
For the purposes of re-binding a device to its previous driver
libxl__device_pci_assignable_add() writes the driver path into xenstore.
This path is then read back in libxl__device_pci_assignable_remove().
The functions that support this writing to and reading from xenstore are
currently dedicated for this purpose and hence the node name 'driver_path'
is hard-coded. This patch generalizes these utility functions and passes
'driver_path' as an argument. Subsequent patches will invoke them to
access other nodes.
NOTE: Because functions will have a broader use (other than storing a
driver path in lieu of pciback) the base xenstore path is also
changed from '/libxl/pciback' to '/libxl/pci'.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:16 +0000 (19:30 +0000)]
libxl: stop using aodev->device_config in libxl__device_pci_add()...
... to hold a pointer to the device.
There is already a 'pci' field in 'pci_add_state' so simply use that from
the start. This also allows the 'pci' (#3) argument to be dropped from
do_pci_add().
NOTE: This patch also changes the type of the 'pci_domid' field in
'pci_add_state' from 'int' to 'libxl_domid' which is more appropriate
given what the field is used for.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:15 +0000 (19:30 +0000)]
libxl: remove extraneous arguments to do_pci_remove() in libxl_pci.c
Both 'domid' and 'pci' are available in 'pci_remove_state' so there is no
need to also pass them as separate arguments.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:14 +0000 (19:30 +0000)]
libxl: s/detatched/detached in libxl_pci.c
Simple spelling correction. Purely cosmetic fix.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:13 +0000 (19:30 +0000)]
libxl: add/recover 'rdm_policy' to/from PCI backend in xenstore
Other parameters, such as 'msitranslate' and 'permissive' are dealt with
but 'rdm_policy' appears to have been completely missed.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:12 +0000 (19:30 +0000)]
libxl: Make sure devices added by pci-attach are reflected in the config
Currently libxl__device_pci_add_xenstore() is broken in that it does not
update the domain's configuration for the first device added (which causes
creation of the overall backend area in xenstore). This can be easily observed
by running 'xl list -l' after adding a single device: the device will be
missing.
This patch fixes the problem and adds a DEBUG log line to allow easy
verification that the domain configuration is being modified. Also, the use
of libxl__device_generic_add() is dropped as it leads to a confusing situation
where only partial backend information is written under the xenstore
'/libxl' path. For LIBXL__DEVICE_KIND_PCI devices the only definitive
information in xenstore is under '/local/domain/0/backend' (the '0' being
hard-coded).
NOTE: This patch includes a whitespace fix in add_pcis_done().
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:11 +0000 (19:30 +0000)]
libxl: make libxl__device_list() work correctly for LIBXL__DEVICE_KIND_PCI...
... devices.
Currently there is an assumption built into libxl__device_list() that device
backends are fully enumerated under the '/libxl' path in xenstore. This is
not the case for PCI backend devices, which are only properly enumerated
under '/local/domain/0/backend'.
This patch adds a new get_path() method to libxl__device_type to allow a
backend implementation (such as PCI) to specify the xenstore path where
devices are enumerated and modifies libxl__device_list() to use this method
if it is available. Also, if the get_num() method is defined then the
from_xenstore() method expects to be passed the backend path without the device
number concatenated, so this issue is also rectified.
Having made libxl__device_list() work correctly, this patch removes the
open-coded libxl_device_pci_list() in favour of an evaluation of the
LIBXL_DEFINE_DEVICE_LIST() macro. This has the side-effect of also defining
libxl_device_pci_list_free() which will be used in subsequent patches.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:10 +0000 (19:30 +0000)]
xl: s/pcidev/pci where possible
To improve naming consistency, replace occurrences of 'pcidev' with 'pci'.
The only remaining use of the term should be in relation to
'libxl_domain_config' where there are fields named 'pcidevs' and 'num_pcidevs'.
Purely cosmetic. No functional change.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 8 Dec 2020 19:30:09 +0000 (19:30 +0000)]
libxl: s/pcidev/pci and remove DEFINE_DEVICE_TYPE_STRUCT_X
The seemingly arbitrary use of 'pci' and 'pcidev' in the code in libxl_pci.c
is confusing and also compromises use of some macros used for other device
types. Indeed it seems that DEFINE_DEVICE_TYPE_STRUCT_X exists solely because
of this duality.
This patch purges use of 'pcidev' from the libxl internal code, but
unfortunately the 'pcidevs' and 'num_pcidevs' fields in 'libxl_domain_config'
are part of the API and need to be retained to avoid breaking callers,
particularly libvirt.
DEFINE_DEVICE_TYPE_STRUCT_X is still removed to avoid the special case in
libxl_pci.c but DEFINE_DEVICE_TYPE_STRUCT is given an extra 'array' argument
which is used to identify the fields in 'libxl_domain_config' relating to
the device type.
NOTE: Some of the more gross formatting errors (such as lack of spaces after
keywords) that came into context have been fixed in libxl_pci.c.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Olaf Hering [Wed, 9 Dec 2020 15:54:50 +0000 (16:54 +0100)]
tools: remove unused ORDER_LONG
There are no users left, xenpaging has its own variant.
The last user was removed with commit
11d0044a168994de85b9b328452292852aedc871
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
Olaf Hering [Wed, 9 Dec 2020 15:54:49 +0000 (16:54 +0100)]
tools: allocate bitmaps in units of unsigned long
Allocate enough memory so that the returned pointer can be safely
accessed as an array of unsigned long.
The actual bitmap size in units of bytes, as returned by bitmap_size,
remains unchanged.
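A minimal sketch of this allocation scheme, assuming hypothetical helper names (bitmap_bytes(), bitmap_longs() and bitmap_alloc_longs() are not the tools' actual functions), could be:

```c
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Size of the bitmap in bytes as seen by callers; rounding does not change it. */
size_t bitmap_bytes(size_t nr_bits)
{
    return (nr_bits + 7) / 8;
}

/* Number of unsigned longs needed to cover nr_bits. */
size_t bitmap_longs(size_t nr_bits)
{
    return (nr_bits + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/*
 * Allocate a zeroed bitmap whose backing store is a whole number of
 * unsigned longs, so word-sized accesses never run past the allocation.
 */
unsigned long *bitmap_alloc_longs(size_t nr_bits)
{
    return calloc(bitmap_longs(nr_bits), sizeof(unsigned long));
}
```

For a 9-bit bitmap the reported size stays at 2 bytes, while the allocation covers one full unsigned long.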
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Tue, 15 Dec 2020 15:04:11 +0000 (16:04 +0100)]
tools/xenstore: rework path length check
The different fixed limits for absolute and relative path lengths of
Xenstore nodes make it possible to create per-domain nodes via
absolute paths which are not accessible using relative paths, as the
two limits differ by 1024 characters.
Instead of these odd limits, use only one limit, which applies to the
relative path length of per-domain nodes and to the absolute path
length of all other nodes. This means the path length check is
applied to the path after removing a possible leading
"/local/domain/<n>/" with <n> being a domain id.
There has been a request to be able to limit path lengths even
further, so an additional quota is added which can be applied to path
lengths. It is XENSTORE_REL_PATH_MAX (2048) by default, but can be
set to lower values. This is done via the new "-M" or "--path-max"
option when invoking xenstored.
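The check described here can be sketched as follows. This is an illustration under assumptions, not the xenstored code: path_checked_part() and path_length_ok() are hypothetical names, and whether the real comparison is inclusive of the limit is not stated in this message.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return the part of 'path' that the length limit applies to: the
 * remainder after a leading "/local/domain/N/" (N being decimal digits),
 * or the whole path if there is no such prefix.
 */
const char *path_checked_part(const char *path)
{
    static const char prefix[] = "/local/domain/";
    const char *p;

    if (strncmp(path, prefix, sizeof(prefix) - 1) != 0)
        return path;

    p = path + sizeof(prefix) - 1;
    if (!isdigit((unsigned char)*p))
        return path;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p != '/')
        return path;

    return p + 1;
}

/* Apply a single limit (assumed inclusive) to the relevant part of the path. */
bool path_length_ok(const char *path, size_t max)
{
    return strlen(path_checked_part(path)) <= max;
}
```

Because the domid prefix is stripped before measuring, a per-domain node gets the same verdict whether it is addressed relatively or via its absolute path.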
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Tue, 8 Dec 2020 10:30:26 +0000 (11:30 +0100)]
MAINTAINERS: add me as maintainer for tools/xenstore/
I have been the major contributor for C Xenstore the past few years.
Add me as a maintainer for tools/xenstore/.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Elliott Mitchell [Tue, 15 Dec 2020 02:35:32 +0000 (18:35 -0800)]
examples: Add PVH example to config example list
Somewhat helpful to actually install the example configurations.
Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Tue, 15 Dec 2020 12:47:45 +0000 (13:47 +0100)]
x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables
While indeed unnecessary for pv_ro_page_fault(), pv_map_ldt_shadow_page()
may run when guest user mode is active, and hence may need to switch to
the kernel page tables in order to retrieve an LDT page mapping.
Fixes: 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
Jan Beulich [Tue, 15 Dec 2020 12:46:37 +0000 (13:46 +0100)]
evtchn/FIFO: re-order and synchronize (with) map_control_block()
For evtchn_fifo_set_pending()'s check of the control block having been
set to be effective, ordering of respective reads and writes needs to be
ensured: The control block pointer needs to be recorded strictly after
the setting of all the queue heads, and it needs checking strictly
before any uses of them (this latter aspect was already guaranteed).
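The ordering requirement can be illustrated with a generic publish/consume sketch. Xen uses its own barrier primitives rather than C11 atomics, so the release/acquire pair below is only a stand-in for the ordering being described, and all names ('struct vcpu_state', publish_control_block(), read_head(), NR_QUEUES) are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_QUEUES 4   /* hypothetical number of queues */

struct control_block {
    unsigned int ready;
};

struct vcpu_state {
    unsigned int head[NR_QUEUES];
    _Atomic(struct control_block *) cb;   /* NULL until published */
};

void vcpu_state_init(struct vcpu_state *v)
{
    for (size_t i = 0; i < NR_QUEUES; i++)
        v->head[i] = 0;
    atomic_init(&v->cb, NULL);
}

/* Writer: set every queue head first, then publish the pointer. */
void publish_control_block(struct vcpu_state *v, struct control_block *cb,
                           const unsigned int *heads)
{
    for (size_t i = 0; i < NR_QUEUES; i++)
        v->head[i] = heads[i];

    /* Release: the head stores above are visible before the pointer is. */
    atomic_store_explicit(&v->cb, cb, memory_order_release);
}

/* Reader: check the pointer strictly before using any queue head. */
int read_head(struct vcpu_state *v, size_t q, unsigned int *out)
{
    struct control_block *cb =
        atomic_load_explicit(&v->cb, memory_order_acquire);

    if (cb == NULL)
        return -1;        /* control block not set up yet */

    *out = v->head[q];    /* ordered after the acquire load */
    return 0;
}
```

A reader that observes a non-NULL pointer is thereby guaranteed to observe the queue heads that were written before it was published.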
This is XSA-358 / CVE-2020-29570.
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Tue, 15 Dec 2020 12:42:51 +0000 (13:42 +0100)]
evtchn/FIFO: add 2nd smp_rmb() to evtchn_fifo_word_from_port()
In addition to add_page_to_event_array(), the function also needs to
synchronize with evtchn_fifo_init_control() setting both d->evtchn_fifo
and (subsequently) d->evtchn_port_ops.
This is XSA-359 / CVE-2020-29571.
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Roger Pau Monné [Tue, 15 Dec 2020 12:42:16 +0000 (13:42 +0100)]
x86/irq: fix infinite loop in irq_move_cleanup_interrupt
If Xen enters irq_move_cleanup_interrupt with a dynamic vector below
IRQ_MOVE_CLEANUP_VECTOR pending in IRR (0x20 or 0x21) that's also
designated for a cleanup it will enter a loop where
irq_move_cleanup_interrupt continuously sends a cleanup IPI (vector
0x22) to itself while waiting for the vector with lower priority to be
injected - which will never happen because IRQ_MOVE_CLEANUP_VECTOR
takes precedence and it's always injected first.
Fix this by making sure vectors below IRQ_MOVE_CLEANUP_VECTOR are
marked as used and thus not available for APs. Also add some logic to
assert and prevent irq_move_cleanup_interrupt from entering such an
infinite loop, albeit that should never happen given the current code.
This is XSA-356 / CVE-2020-29567.
Fixes: 3fba06ba9f8 ('x86/IRQ: re-use legacy vector ranges on APs')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 15 Dec 2020 12:41:23 +0000 (13:41 +0100)]
x86: avoid calling {svm,vmx}_do_resume()
These functions follow the following path: hvm_do_resume() ->
handle_hvm_io_completion() -> hvm_wait_for_io() ->
wait_on_xen_event_channel() -> do_softirq() -> schedule() ->
sched_context_switch() -> continue_running() and hence may
recursively invoke themselves. If this ends up happening a couple of
times, a stack overflow would result.
Prevent this by also resetting the stack at the
->arch.ctxt_switch->tail() invocations (in both places for consistency)
and thus jumping to the functions instead of calling them.
This is XSA-348 / CVE-2020-29566.
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Tue, 15 Dec 2020 12:41:09 +0000 (13:41 +0100)]
x86: fold guest_idle_loop() into idle_loop()
The latter can easily be made to cover both cases. This is in preparation
for using idle_loop directly for populating idle_csw.tail.
Take the liberty and also adjust indentation / spacing in involved code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Tue, 15 Dec 2020 12:40:27 +0000 (13:40 +0100)]
x86: replace reset_stack_and_jump_nolp()
Move the necessary check into check_for_livepatch_work(), rather than
mostly duplicating reset_stack_and_jump() for this purpose. This is to
prevent an inflation of reset_stack_and_jump() flavors.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Edwin Török [Tue, 15 Dec 2020 12:37:33 +0000 (13:37 +0100)]
tools/ocaml/xenstored: only Dom0 can change node owner
Otherwise we can give quota away to another domain, either causing it to run
out of quota, or in case of Dom0 use unbounded amounts of memory and bypass
the quota system entirely.
This was fixed in the C version of xenstored in 2006 (c/s db34d2aaa5f5,
predating the XSA process by 5 years).
It was also fixed in the mirage version of xenstore in 2012, with a unit test
demonstrating the vulnerability:
https://github.com/mirage/ocaml-xenstore/commit/6b91f3ac46b885d0530a51d57a9b3a57d64923a7
https://github.com/mirage/ocaml-xenstore/commit/22ee5417c90b8fda905c38de0d534506152eace6
but possibly without realising that the vulnerability still affected the
in-tree oxenstored (added c/s f44af660412 in 2010).
This is XSA-352.
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Tue, 15 Dec 2020 12:37:14 +0000 (13:37 +0100)]
tools/ocaml/xenstored: delete watch from trie too when resetting watches
c/s f8c72b526129 "oxenstored: implement XS_RESET_WATCHES" from Xen 4.6
introduced reset watches support in oxenstored by mirroring the change
in cxenstored.
However the OCaml version has some additional data structures to
optimize watch firing, and just resetting the watches in one of the data
structures creates a security bug where a malicious guest kernel can
exceed its watch quota, driving oxenstored into OOM:
* create watches
* reset watches (this still keeps the watches lingering in another data
structure, using memory)
* create some more watches
* loop until oxenstored dies
The guest kernel doesn't necessarily have to be malicious to trigger
this:
* if control/platform-feature-xs_reset_watches is set
* the guest kexecs (e.g. because it crashes)
* on boot more watches are set up
* this will slowly "leak" memory for watches in oxenstored, driving it
towards OOM.
This is XSA-330.
Fixes: f8c72b526129 ("oxenstored: implement XS_RESET_WATCHES")
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Harsha Shamsundara Havanur [Tue, 15 Dec 2020 12:36:45 +0000 (13:36 +0100)]
tools/xenstore: Preserve bad client until they are destroyed
XenStored will kill any connection that it thinks has misbehaved.
This currently happens in two places:
* In `handle_input()` if the sanity check on the ring and the message
fails.
* In `handle_output()` when failing to write the response in the ring.
As the domain structure is a child of the connection, XenStored will
destroy its view of the domain when killing the connection. This will
result in sending @releaseDomain event to all the watchers.
As the watch event doesn't carry which domain has been released,
the watcher (such as XenStored) will generally go through the list of
registered domains and check if one of them is shutting down/dying.
In the case of a client misbehaving, the domain will likely be
running, so no action will be performed.
When the domain is effectively destroyed, XenStored will not be aware of
the domain anymore. So the watch event is not going to be sent.
As a consequence, the watchers of the event will not release mappings
they may have on the domain. This will result in a zombie domain.
In order to send the @releaseDomain event at the correct time, we want
to keep the domain structure until the domain is effectively
shutting-down/dying.
We also want to keep the connection around so we could possibly revive
the connection in the future.
A new flag 'is_ignored' is added to mark whether a connection should be
ignored when checking if there is work to do. Additionally any
transactions, watches, buffers associated with the connection will be
freed as you can't do much with them (restarting the connection will
likely need a reset).
As a side note, when the device model was running in a stubdomain, a
guest would have been able to introduce a use-after-free because there
are two parents for a guest connection.
This is XSA-325.
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Juergen Gross [Tue, 15 Dec 2020 12:36:42 +0000 (13:36 +0100)]
tools/xenstore: drop watch event messages exceeding maximum size
By setting a watch with a very large tag it is possible to trick
xenstored into sending watch event messages exceeding the maximum allowed
payload size. This might in turn lead to a crash of xenstored as the
resulting error can cause dereferencing a NULL pointer in case there
is no active request being handled by the guest the watch event is
being sent to.
Fix that by just dropping such watch events. Additionally modify the
error handling to test that the pointer is not NULL before dereferencing
it.
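As a sketch of the size check only (not the xenstored implementation), the event payload can be measured before it is built and the event dropped if it would not fit. The helper name build_watch_event() is hypothetical, and PAYLOAD_MAX is a local constant assumed here to be 4096 for illustration.

```c
#include <stddef.h>
#include <string.h>

#define PAYLOAD_MAX 4096   /* assumed maximum payload size, for illustration */

/*
 * Build the payload of a watch event ("path\0token\0") in 'buf', which
 * must hold PAYLOAD_MAX bytes. Returns 1 if the event was built, or 0 if
 * it was dropped because it would exceed PAYLOAD_MAX.
 */
int build_watch_event(const char *path, const char *token,
                      char *buf, size_t *len)
{
    size_t plen = strlen(path) + 1;    /* include the terminating NUL */
    size_t tlen = strlen(token) + 1;

    /* Written to avoid overflow when adding the two lengths. */
    if (plen > PAYLOAD_MAX || tlen > PAYLOAD_MAX - plen)
        return 0;                      /* too large: drop the event */

    memcpy(buf, path, plen);
    memcpy(buf + plen, token, tlen);
    *len = plen + tlen;
    return 1;
}
```

Dropping the oversized event up front means the later error path, which may have no active request to attach an error to, is never reached for this case.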
This is XSA-324.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Edwin Török [Tue, 15 Dec 2020 12:36:39 +0000 (13:36 +0100)]
tools/ocaml/xenstored: Fix path length validation
Currently, oxenstored checks the length of paths against 1024, then
prepends "/local/domain/$DOMID/" to relative paths. This allows a domU
to create paths which can't subsequently be read by anyone, even dom0.
This also interferes with listing directories, etc.
Define a new oxenstored.conf entry: quota-path-max, defaulting to 1024
as before. For paths that begin with "/local/domain/$DOMID/" check the
relative path length against this quota. For all other paths check the
entire path length.
This ensures that if the domid changes (and thus the length of a prefix
changes) a path that used to be valid stays valid (e.g. after a
live-migration). It also ensures that regardless of how the client tries
to access a path (domid-relative or absolute) it will get consistent
results, since the limit is always applied on the final canonicalized
path.
Delete the unused Domain.get_path to avoid it being confused with
Connection.get_path (which differs by a trailing slash only).
Rewrite Util.path_validate to apply the appropriate length restriction
based on whether the path is relative or not. Remove the check for
connection_path being absolute, because it is not guest controlled data.
This is part of XSA-323.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Edwin Török [Tue, 15 Dec 2020 12:36:04 +0000 (13:36 +0100)]
tools/ocaml/xenstored: clean up permissions for dead domains
Domain IDs are prone to wrapping (15 bits), and with a sufficient number
of VMs in a reboot loop it is possible to trigger it. Xenstore entries
may linger after a domain dies, until a toolstack cleans it up. During
this time there is a window where a wrapped domid could access these
xenstore keys (that belonged to another VM).
To prevent this do a cleanup when a domain dies:
* walk the entire xenstore tree and update permissions for all nodes
* if the dead domain had an ACL entry: remove it
* if the dead domain was the owner: change the owner to Dom0
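The per-node part of this cleanup might be sketched as below. oxenstored is written in OCaml; this C rendering is only an illustration, with hypothetical types and names ('struct node_perms', scrub_dead_domain(), MAX_PERMS) and the assumption that the first entry of a node's permission list is its owner.

```c
#include <stddef.h>

#define MAX_PERMS 8   /* hypothetical upper bound for this sketch */

struct perm {
    unsigned int domid;
    char access;      /* e.g. 'n', 'r', 'w' or 'b' */
};

struct node_perms {
    struct perm p[MAX_PERMS];   /* p[0] is assumed to be the owner */
    size_t num;
};

/* Remove every trace of 'dead' from one node's permissions. */
void scrub_dead_domain(struct node_perms *np, unsigned int dead)
{
    size_t out = 1;

    if (np->num == 0)
        return;

    /* A node owned by the dead domain is handed over to Dom0. */
    if (np->p[0].domid == dead)
        np->p[0].domid = 0;

    /* ACL entries naming the dead domain are dropped; others are kept. */
    for (size_t i = 1; i < np->num; i++)
        if (np->p[i].domid != dead)
            np->p[out++] = np->p[i];

    np->num = out;
}

/* Walk all nodes; a flat array stands in for the xenstore tree here. */
void scrub_tree(struct node_perms *nodes, size_t n, unsigned int dead)
{
    for (size_t i = 0; i < n; i++)
        scrub_dead_domain(&nodes[i], dead);
}
```

Note that the node itself is left in place; only its permissions change.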
This is done without quota checks or a transaction. Quota checks would
be a no-op (either the domain is dead, or it is Dom0 where they are not
enforced). Transactions are not needed, because this is all done
atomically by oxenstored's single thread.
The xenstore entries owned by the dead domain are not deleted, because
that could confuse a toolstack / backends that are still bound to it
(or generate unexpected watch events). It is the responsibility of a
toolstack to remove the xenstore entries themselves.
This is part of XSA-322.
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Juergen Gross [Tue, 15 Dec 2020 12:36:01 +0000 (13:36 +0100)]
tools/xenstore: revoke access rights for removed domains
Access rights of Xenstore nodes are per domid. Unfortunately existing
granted access rights are not removed when a domain is being destroyed.
This means that a new domain created with the same domid will inherit
the access rights to Xenstore nodes from the previous domain(s) with
the same domid.
This can be avoided by adding a generation counter to each domain.
The generation counter of the domain is set to the global generation
counter when a domain structure is being allocated. When reading or
writing a node all permissions of domains which are younger than the
node itself are dropped. This is done by flagging the related entry
as invalid in order to avoid modifying permissions in a way the user
could detect.
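A much simplified model of the generation check is sketched below. The types and helpers are hypothetical, and two details are assumptions made for the sketch rather than statements about xenstored: the global counter is bumped on every node write, and a permission entry for a domid with no known domain is treated as invalid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct domain {
    unsigned int domid;
    uint64_t generation;   /* global counter value at allocation time */
};

struct perm {
    unsigned int domid;
    bool ignore;           /* entry flagged invalid rather than removed */
};

struct node {
    uint64_t generation;   /* global counter value at last write */
    struct perm perms[4];
    size_t num_perms;
};

uint64_t global_generation;

void domain_alloc(struct domain *d, unsigned int domid)
{
    d->domid = domid;
    d->generation = global_generation;
}

/* Assumption: each node write consumes one generation value. */
void node_write(struct node *n)
{
    n->generation = global_generation++;
}

/* Flag permissions of domains that are younger than the node. */
void node_adjust_perms(struct node *n, const struct domain *doms, size_t nd)
{
    for (size_t i = 0; i < n->num_perms; i++) {
        const struct domain *d = NULL;

        for (size_t j = 0; j < nd; j++)
            if (doms[j].domid == n->perms[i].domid)
                d = &doms[j];

        /* Assumption: an unknown domid is treated like a too-young domain. */
        if (d == NULL || d->generation > n->generation)
            n->perms[i].ignore = true;
    }
}
```

Flagging the entry instead of deleting it leaves the stored permission list unchanged, in line with the goal of not modifying permissions in a way the user could detect.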
A special case has to be considered: for a new domain the first
Xenstore entries are already written before the domain is officially
introduced in Xenstore. In order not to drop the permissions for the
new domain a domain struct is allocated even before introduction if
the hypervisor is aware of the domain. This requires adding another
bool "introduced" to struct domain in xenstored. In order to avoid
additional padding holes convert the shutdown flag to bool, too.
As verifying permissions has a runtime cost, add a new
quota for limiting the number of permissions an unprivileged domain
can set for a node. The default for that new quota is 5.
This is part of XSA-322.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <julien@amazon.com>
Edwin Török [Tue, 15 Dec 2020 12:35:19 +0000 (13:35 +0100)]
tools/ocaml/xenstored: add xenstored.conf flag to turn off watch permission checks
There are flags to turn off quotas and the permission system, so add one
that turns off the newly introduced watch permission checks as well.
This is part of XSA-115.
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Tue, 15 Dec 2020 12:35:16 +0000 (13:35 +0100)]
tools/ocaml/xenstored: avoid watch events for nodes without access
Today watch events are sent regardless of the access rights of the
node the event is sent for. This enables any guest to e.g. set up a
watch for "/" in order to have a detailed record of all Xenstore
modifications.
Modify that by sending only watch events for nodes that the watcher
has a chance to see otherwise (either via direct reads or by querying
the children of a node). This includes cases where the visibility of
a node for a watcher is changing (permissions being removed).
Permissions for nodes are looked up either in the old (pre
transaction/command) or current trees (post transaction). If
permissions are changed multiple times in a transaction only the final
version is checked, because considering a transaction atomic the
individual permission changes would not be noticeable to an outside
observer.
Two trees are only needed for set_perms: here we can either notice the
node disappearing (if we lose permission), appearing
(if we gain permission), or changing (if we preserve permission).
RM needs to only look at the old tree: in the new tree the node would be
gone, or could have different permissions if it was recreated (the
recreation would get its own watch fired).
Inside a tree we look up the watch path's parent, and then the watch path
child itself. This gets us 4 sets of permissions in worst case, and if
any of these allows a watch, then we permit it to fire. The
permission lookups are done without logging the failures, otherwise we'd
get confusing errors about permission denied for some paths, but a watch
still firing. The actual result is logged in xenstored-access log:
'w event ...' as usual if watch was fired
'w notfired...' if the watch was not fired, together with path and
permission set to help in troubleshooting
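The decision described in this paragraph can be modelled roughly as follows. oxenstored is written in OCaml, so this C fragment is only an illustration of the logic; 'struct perms', 'struct lookup', watch_may_fire() and watch_log_tag() are hypothetical, and each permission set is reduced to a single "watcher can read" flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* One permission set, reduced to what matters here; NULL means "no node". */
struct perms {
    bool can_read;
};

/* The two lookups done inside one tree for a given watch path. */
struct lookup {
    const struct perms *parent;
    const struct perms *child;
};

bool allows(const struct perms *p)
{
    return p != NULL && p->can_read;
}

/* Up to four permission sets are consulted; any one of them suffices. */
bool watch_may_fire(const struct lookup *old_tree,
                    const struct lookup *new_tree)
{
    return allows(old_tree->parent) || allows(old_tree->child) ||
           allows(new_tree->parent) || allows(new_tree->child);
}

/* Tag used in the access log for the outcome. */
const char *watch_log_tag(bool fired)
{
    return fired ? "w event" : "w notfired";
}
```

A path that exists in neither tree yields only absent children, so the outcome then depends on the parent lookups alone.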
Adding a watch bypasses permission checks and always fires the watch
once immediately. This is consistent with the specification, and no
information is gained (the watch is fired both if the path exists or
doesn't, and both if you have or don't have access, i.e. it reflects the
path a domain gave it back to that domain).
There are some semantic changes here:
* Write+rm in a single transaction of the same path is unobservable
now via watches: both before and after a transaction the path
doesn't exist, thus both tree lookups come up with the empty
permission set, and no one, not even Dom0, can see this. This is
consistent with transaction atomicity though.
* Similar to the above, if we temporarily grant and then revoke permission
on a path, any watches fired in between are ignored as well
* There is a new log event (w notfired) which shows the permission set
of the path, and the path.
* Watches on paths that a domain doesn't have access to are now not
seen, which is the purpose of the security fix.
This is part of XSA-115.
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Tue, 15 Dec 2020 12:35:13 +0000 (13:35 +0100)]
tools/ocaml/xenstored: introduce permissions for special watches
The special watches "@introduceDomain" and "@releaseDomain" should be
allowed for privileged callers only, as they allow information to be
gained about the presence of other guests on the host. So send watch events for
those watches via privileged connections only.
Start to address this by treating the special watches as regular nodes
in the tree, which gives them normal semantics for permissions. A later
change will restrict the handling, so that they can't be listed, etc.
This is part of XSA-115.
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>