Andrew Cooper [Mon, 13 Aug 2018 16:26:21 +0000 (17:26 +0100)]
x86/vtx: Fix the checking for unknown/invalid MSR_DEBUGCTL bits
The VPMU_MODE_OFF early-exit in vpmu_do_wrmsr() introduced by c/s
11fe998e56 bypasses all reserved bit checking in the general case. As a
result, a guest can enable BTS when it shouldn't be permitted to, and
lock up the entire host.
With vPMU active (not a security supported configuration, but useful for
debugging), the reserved bit checking is broken, caused by the original
BTS changeset 1a8aa75ed.
From a correctness standpoint, it is not possible to have two different
pieces of code responsible for different parts of value checking, if
there isn't an accumulation of bits which have been checked. A
practical upshot of this is that a guest can set any value it
wishes (usually resulting in a vmentry failure for bad guest state).
Therefore, fix this by implementing all the reserved bit checking in the
main MSR_DEBUGCTL block, and removing all handling of DEBUGCTL from the
vPMU MSR logic.
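As a rough illustration of keeping all reserved-bit checking in one place, the sketch below starts with every bit reserved and clears only the bits the guest may set. The function name, the `bts_allowed` flag and the chosen subset of bits are assumptions made for illustration; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of MSR_DEBUGCTL bits; the names are hypothetical. */
#define SKETCH_DEBUGCTL_LBR   (1ULL << 0)
#define SKETCH_DEBUGCTL_BTF   (1ULL << 1)
#define SKETCH_DEBUGCTL_TR    (1ULL << 6)
#define SKETCH_DEBUGCTL_BTS   (1ULL << 7)
#define SKETCH_DEBUGCTL_BTINT (1ULL << 8)

/*
 * Start with every bit reserved, then clear the bits this guest may set.
 * One block owns the whole check, so no bit goes unexamined.
 */
bool debugctl_write_ok(uint64_t val, bool bts_allowed)
{
    uint64_t rsvd = ~(SKETCH_DEBUGCTL_LBR | SKETCH_DEBUGCTL_BTF);

    if ( bts_allowed )
        rsvd &= ~(SKETCH_DEBUGCTL_TR | SKETCH_DEBUGCTL_BTS |
                  SKETCH_DEBUGCTL_BTINT);

    return !(val & rsvd);
}
```

With a single accumulated mask, a bit that no code path has explicitly permitted is rejected by default.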
This is XSA-269.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Stefano Stabellini [Mon, 13 Aug 2018 16:25:51 +0000 (17:25 +0100)]
ARM: disable grant table v2
It was never expected to work; the implementation is incomplete.
As a side effect, it also prevents guests from triggering a
"BUG_ON(page_get_owner(pg) != d)" in gnttab_unpopulate_status_frames().
This is XSA-268.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 9 Aug 2018 16:22:17 +0000 (17:22 +0100)]
x86/spec-ctrl: Yet more fixes for xpti= parsing
As it currently stands, 'xpti=dom0' is indistinguishable from the default
value, which means it will be overridden by ARCH_CAPABILITIES_RDCL_NO on fixed
hardware.
Switch opt_xpti to use -1 as a default like all our other related options, and
clobber it as soon as we have a string to parse.
In addition, 'xpti' alone should be interpreted in its positive boolean form,
rather than resulting in a parse error.
(XEN) parameter "xpti" has invalid value "", rc=-22!
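The parsing behaviour described above can be sketched roughly as follows. This is a simplified model, not the patch itself: the function name, the bit values and the accepted spellings are assumptions, and only single tokens are handled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit assignments for the two halves of the option. */
#define SKETCH_XPTI_DOM0 (1 << 0)
#define SKETCH_XPTI_DOMU (1 << 1)

/*
 * *opt starts at -1 ("not specified").  Returns 0 on success, or -22
 * (-EINVAL) for an unrecognised value.
 */
int parse_xpti_sketch(const char *s, int *opt)
{
    /* Clobber the default as soon as there is a string to parse. */
    if ( *opt < 0 )
        *opt = 0;

    /* A bare "xpti" arrives as an empty string: treat it as "true". */
    if ( !*s || !strcmp(s, "1") || !strcmp(s, "true") )
        *opt = SKETCH_XPTI_DOM0 | SKETCH_XPTI_DOMU;
    else if ( !strcmp(s, "0") || !strcmp(s, "false") )
        *opt = 0;
    else if ( !strcmp(s, "dom0") )
        *opt |= SKETCH_XPTI_DOM0;
    else if ( !strcmp(s, "domu") )
        *opt |= SKETCH_XPTI_DOMU;
    else
        return -22;

    return 0;
}
```

Because the default is -1, an explicit "dom0" leaves a non-negative value behind and can no longer be mistaken for "nothing was specified".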
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Thu, 9 Aug 2018 09:59:41 +0000 (10:59 +0100)]
tools/libxenctrl: use new xenforeignmemory API to seed grant table
A previous patch added support for priv-mapping guest resources directly
(rather than having to foreign-map, which requires P2M modification for
HVM guests).
This patch makes use of the new API to seed the guest grant table unless
the underlying infrastructure (i.e. privcmd) doesn't support it, in which
case the old scheme is used.
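The "new API, else old scheme" logic can be pictured with the sketch below. The callback type and all function names are hypothetical stand-ins, and signalling lack of support with -EOPNOTSUPP is an assumption; the real libxenctrl code is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the two seeding schemes. */
typedef int (*seed_fn_sketch)(void *ctx);

/*
 * Try the resource-mapping path first.  Fall back to the legacy path only
 * when the new one reports that the operation is unsupported; any other
 * failure is returned to the caller unchanged.
 */
int seed_grant_table_sketch(seed_fn_sketch via_resource,
                            seed_fn_sketch via_foreign_map, void *ctx)
{
    int rc = via_resource(ctx);

    if ( rc == -EOPNOTSUPP )
        rc = via_foreign_map(ctx);

    return rc;
}

/* Example callbacks, for demonstration only. */
int sketch_unsupported(void *ctx) { (void)ctx; return -EOPNOTSUPP; }
int sketch_ok(void *ctx) { *(int *)ctx += 1; return 0; }
int sketch_fail(void *ctx) { (void)ctx; return -EINVAL; }
```

Falling back only on "unsupported" keeps genuine failures of the new path visible instead of masking them behind the old scheme.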
NOTE: The call to xc_dom_gnttab_hvm_seed() in hvm_build_set_params() was
actually unnecessary, as the grant table has already been seeded
by a prior call to xc_dom_gnttab_init() made by libxl__build_dom().
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 9 Aug 2018 09:59:40 +0000 (10:59 +0100)]
common: add a new mappable resource type: XENMEM_resource_grant_table
This patch allows grant table frames to be mapped using the
XENMEM_acquire_resource memory op.
NOTE: This patch expands the on-stack mfn_list array in acquire_resource()
but it is still small enough to remain on-stack.
NOTE: This patch also removes a bogus comment above the
grant_to_status_frames() function.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[Rebase over "Explicitly default to gnttab v1 during domain creation"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 8 Aug 2018 14:54:30 +0000 (15:54 +0100)]
common/gnttab: Explicitly default to gnttab v1 during domain creation
For reasons which appear to be exclusively down to poor review of the grant
table v2 code, a grant table's version field wasn't initialised during
creation.
A number of problems (including XSAs) have occurred in the past when trying
to use a grant table which hasn't been properly set up, and various areas of
the code cope with v0 by defaulting to v1.
In particular, the toolstack using GNTTABOP_setup_table to be able to fill in
the store/console grants has a side effect of switching to v1.
In hindsight however, this "fixup if we see 0" logic is very poor, with a
substantial degree of risk. Explicitly default to grant table v1 during
domain create, and let the rest of the code work safely in the knowledge that
the version is sensibly set.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 6 Aug 2018 09:11:00 +0000 (09:11 +0000)]
x86/vlapic: Bugfixes and improvements to vlapic_{read,write}()
Firstly, there is no 'offset' boundary check on the non-32-bit write path
before the call to vlapic_read_aligned(), which allows an attacker to read
beyond the end of vlapic->regs->data[], which is only 1024 bytes long.
However, as the backing memory is a domheap page, and misaligned accesses get
chunked down to single bytes across page boundaries, I can't spot any
XSA-worthy problems which occur from the overrun.
On real hardware, bad accesses don't instantly crash the machine. Their
behaviour is undefined, but the domain_crash() prohibits sensible testing.
Behave more like other x86 MMIO and terminate bad accesses with appropriate
defaults.
While making these changes, clean up and simplify the smaller-access
handling. In particular, avoid pointer based mechanisms for 1/2-byte reads so
as to avoid forcing the value to be spilled to the stack.
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-175 (-175)
function                                     old     new   delta
vlapic_read                                  211     142     -69
vlapic_write                                 304     198    -106
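A minimal sketch of the two ideas above, bounds-checking the offset and extracting sub-word values by shift and mask rather than through a pointer, might look like this. The names are hypothetical, and returning all-ones for a bad read is an assumed default, not something stated in the message.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_REGS_SIZE 1024u

uint32_t mmio_read_sketch(const uint8_t *regs, unsigned int offset,
                          unsigned int size)
{
    unsigned int base = offset & ~3u, shift = (offset & 3) * 8;
    uint32_t val;

    /* Reject odd sizes, overruns, and accesses straddling a 32-bit word. */
    if ( (size != 1 && size != 2 && size != 4) ||
         offset >= SKETCH_REGS_SIZE || (offset & 3) + size > 4 )
        return UINT32_MAX;

    /* One aligned 32-bit read, assembled in little-endian order. */
    val = regs[base] | ((uint32_t)regs[base + 1] << 8) |
          ((uint32_t)regs[base + 2] << 16) | ((uint32_t)regs[base + 3] << 24);

    /* Extract the requested bytes by value: shift, then mask. */
    val >>= shift;
    if ( size < 4 )
        val &= (1u << (size * 8)) - 1;

    return val;
}
```

Working on the value directly means the compiler never has to take its address, which is what forces a spill to the stack in the pointer-based form.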
Finally, there are a plethora of read/write functions in the vlapic namespace,
so rename these to vlapic_mmio_{read,write}() to make their purpose more
clear.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Wei Liu [Tue, 7 Aug 2018 10:00:50 +0000 (11:00 +0100)]
x86: move arch_evtchn_inject to x86 common code
It is not specific to HVM. It just so happens that PV doesn't need
special handling.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 7 Aug 2018 10:00:45 +0000 (11:00 +0100)]
x86: add missing "inline" keyword
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 7 Aug 2018 10:00:44 +0000 (11:00 +0100)]
x86: put compat.o and x86_64/compat.o under CONFIG_PV
They contain code for compat hypercall for PV guests.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 3 Aug 2018 15:40:31 +0000 (17:40 +0200)]
drop {,acpi_}reserve_bootmem()
Both are entirely unused (to be fair, reserve_bootmem() has a use inside
an "#if 0" section in x86's mpparse.c, but if we were to re-enable that
code, it would need doing differently anyway).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Alexandru Isaila [Fri, 3 Aug 2018 15:39:31 +0000 (17:39 +0200)]
x86/hvm: Drop hvm_sr_handlers initializer
This initializer is flawed and only sets .name of array entry 0
to a non-NULL string.
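The C behaviour behind this can be shown with a small stand-alone sketch. The struct, array and constant names are hypothetical; only the language rule is the point.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CODE_MAX 3

struct sr_handler_sketch {
    const char *name;
};

/* One braced initializer covers element 0 only; the rest are zero-filled. */
struct sr_handler_sketch handlers_sketch[SKETCH_CODE_MAX + 1] = {
    { "<?>" },
};

/* Count the entries whose .name ended up non-NULL. */
size_t named_entries_sketch(void)
{
    size_t i, n = 0;

    for ( i = 0; i <= SKETCH_CODE_MAX; i++ )
        n += handlers_sketch[i].name != NULL;

    return n;
}
```

Since the initializer names only entry 0, every other entry still has a NULL .name, so it does not deliver what it appears to promise.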
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Fri, 3 Aug 2018 14:46:49 +0000 (09:46 -0500)]
automation: ensure created files are not owned by root
By default the container runs as the root user and since the source tree
is bind mounted into the container, any file created is owned by the
root user, which harms ergonomics when working outside of the container
environment. This maps the root user within the container to the uid of
the user outside of the container so files are not owned by root.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Fri, 3 Aug 2018 14:46:48 +0000 (09:46 -0500)]
automation: remove dead code from containerize
This is more dead code.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Fri, 3 Aug 2018 14:46:47 +0000 (09:46 -0500)]
automation: drop container name from containerize
This existed for scripting support in a totally unrelated project, and
I failed to remove it when I copied this script, so remove it now.
Build containers for Xen are best as ephemeral
environments and should just utilize Docker's default container naming
behavior.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Fri, 3 Aug 2018 14:46:46 +0000 (09:46 -0500)]
automation: standardize containerize env names
Standardize all the environment variable names that the containerize
script uses to start with CONTAINER_.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Stefano Stabellini [Tue, 31 Jul 2018 15:24:01 +0000 (08:24 -0700)]
xen: specify support for EXPERT and DEBUG Kconfig options
Add a clear statement about them, reflecting the current security
support status of Kconfig options (no changes to current policies).
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
CC: George.Dunlap@eu.citrix.com
CC: Ian.Jackson@eu.citrix.com
CC: jbeulich@suse.com
CC: andrew.cooper3@citrix.com
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
---
Changes in v7:
- talk about EXPERT and DEBUG rather than CONFIG_EXPERT and CONFIG_DEBUG
Stefano Stabellini [Tue, 31 Jul 2018 15:23:01 +0000 (08:23 -0700)]
xen: add cloc target
Add a Xen build target to count the lines of code of the source files
built. Uses `cloc' to do the job.
With Xen on ARM taking off in embedded, IoT, and automotive, we are
seeing more and more uses of Xen in constrained environments. Users and
system integrators want the smallest Xen and Dom0 configurations. Some
of these deployments require certifications, where you definitely want
the smallest lines of code count. I provided this patch to give us the
lines of code count for that purpose.
Use the .o.d files to account for all the built source files. Generate a
list for the `cloc' utility and invoke `cloc'.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
CC: jbeulich@suse.com
CC: andrew.cooper3@citrix.com
---
Changes in v4:
- use grep regex to get multiple source files from .d files
Changes in v3:
- remove build as dependency for the cloc target
Changes in v2:
- change implementation to use .o.d to find built source files
Stefano Stabellini [Tue, 31 Jul 2018 15:22:01 +0000 (08:22 -0700)]
xen: add per-platform defaults for NR_CPUS
Add specific per-platform defaults for NR_CPUS. Note that the order of
the defaults matters: they need to go first, otherwise the generic
defaults will be applied.
This is done so that Xen builds customized for a specific hardware
platform can have the right NR_CPUS number.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: JBeulich@suse.com
CC: andrew.cooper3@citrix.com
---
Changes in v6:
- remove useless additional default for ALL
Stefano Stabellini [Tue, 31 Jul 2018 15:21:01 +0000 (08:21 -0700)]
arm: add ALL_PLAT, QEMU, Rcar3 and MPSoC configs
Add a "Platform Support" choice with four kconfig options: QEMU, RCAR3,
MPSOC and ALL_PLAT. They enable the required options for their hardware
platform. ALL_PLAT enables all available platforms and it's the default.
It doesn't automatically select any of the related drivers, otherwise
they cannot be disabled. ALL_PLAT is implemented by using hidden options
with default values depending on ALL_PLAT.
In the case of the MPSOC that has a platform file under
arch/arm/platforms/, build the file if MPSOC.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
CC: artem_mygaiev@epam.com
CC: volodymyr_babchuk@epam.com
---
Changes in v8:
- remove QEMU_PLATFORM and RCAR3_PLATFORM that are currently unused
- remove selects from ALL
- rename ALL to ALL_PLAT
- introduce ALL64_PLAT and ALL32_PLAT
Changes in v5:
- turn platform support into a choice
- add ALL
Changes in v4:
- fix GICv3/GICV3
- default y to all options
- build xilinx-zynqmp if MPSOC
Stefano Stabellini [Tue, 31 Jul 2018 15:20:01 +0000 (08:20 -0700)]
arm: add a tiny kconfig configuration
Add a tiny kconfig configuration. Enable only the credit scheduler.
It only carries non-default options (use make menuconfig or make
olddefconfig to produce a complete .config file).
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v7:
- remove NULL because it is still experimental
Stefano Stabellini [Tue, 31 Jul 2018 15:19:01 +0000 (08:19 -0700)]
arm: make it possible to disable the SMMU driver
Introduce a Kconfig option for the ARM SMMUv1 and SMMUv2 driver.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
CC: jbeulich@suse.com
---
Changes in v3:
- rename SMMUv2 to ARM_SMMU
- improve help message
- use if ARM
Changes in v2:
- rename HAS_SMMUv2 to SMMUv2
- move SMMUv2 to xen/drivers/passthrough/Kconfig
Stefano Stabellini [Tue, 31 Jul 2018 15:18:01 +0000 (08:18 -0700)]
make it possible to enable/disable UART drivers
All the UART drivers are silent options. Add one-line descriptions so
that they can be de/selected via menuconfig.
Add an x86 dependency to HAS_EHCI: EHCI PCI has not been used on ARM. In
fact, it depends on PCI, and moreover we have drivers for several
embedded UARTs for various ARM boards.
NS16550 remains not selectable on x86.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v4:
- improve commit message
- remove prompt for HAS_EHCI
Changes in v3:
- NS16550 prompt if ARM
Changes in v2:
- make HAS_EHCI depend on x86
Stefano Stabellini [Tue, 31 Jul 2018 15:17:01 +0000 (08:17 -0700)]
Make MEM_ACCESS configurable
Select MEM_ACCESS_ALWAYS_ON on x86 to mark that MEM_ACCESS is not
configurable on x86. Avoid selecting it on ARM.
Rename HAS_MEM_ACCESS to MEM_ACCESS everywhere. Add a prompt and a
description to MEM_ACCESS in xen/common/Kconfig.
The result is that the user-visible option is MEM_ACCESS, and it is
configurable only on ARM (disabled by default). At the moment the
arch-specific mem_access code remains enabled on ARM, even with
MEM_ACCESS=y.
The purpose is to reduce code size. The option doesn't depend on EXPERT
because it would be nice to security-support configurations without
MEM_ACCESS and a non-expert should be able to disable it.
Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: dgdegra@tycho.nsa.gov
CC: andrew.cooper3@citrix.com
CC: George.Dunlap@eu.citrix.com
CC: ian.jackson@eu.citrix.com
CC: jbeulich@suse.com
CC: julien.grall@arm.com
CC: konrad.wilk@oracle.com
CC: sstabellini@kernel.org
CC: tim@xen.org
CC: wei.liu2@citrix.com
---
Changes in v5:
- change MEM_ACCESS_ALWAYS_ON to bool
- change default for MEM_ACCESS, default y if MEM_ACCESS_ALWAYS_ON
Changes in v4:
- remove HAS_MEM_ACCESS
- move MEM_ACCESS_ALWAYS_ON to common
- combine default and bool to def_bool
Changes in v3:
- keep HAS_MEM_ACCESS to mark that an arch can do MEM_ACCESS
- introduce MEM_ACCESS_ALWAYS_ON
- the main MEM_ACCESS option is in xen/common/Kconfig
Changes in v2:
- patch added
Stefano Stabellini [Tue, 31 Jul 2018 15:16:01 +0000 (08:16 -0700)]
arm: rename HAS_GICV3 to GICV3
HAS_GICV3 has become selectable by the user. To mark the change, rename
the option from HAS_GICV3 to GICV3.
Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v3:
- no changes
Changes in v2:
- patch added
Stefano Stabellini [Tue, 31 Jul 2018 15:15:01 +0000 (08:15 -0700)]
arm: make it possible to disable HAS_GICV3
Today it is a silent option. This patch adds a one line description and
makes it optional.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: George.Dunlap@eu.citrix.com
CC: Ian.Jackson@eu.citrix.com
CC: jbeulich@suse.com
CC: andrew.cooper3@citrix.com
---
Changes in v3:
- remove any changes to MEM_ACCESS
- update commit message
Changes in v2:
- make HAS_GICv3 depend on ARM_64
- remove modifications to ARM_HDLCD kconfig, it has been removed
George Dunlap [Thu, 2 Aug 2018 10:12:43 +0000 (12:12 +0200)]
x86/altp2m: make sure EPTP_INDEX is up-to-date when enabling #VE
vmx_vmexit_handler() assumes that if
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set, that the value in
EPTP_INDEX is valid. Unfortunately, the function which sets this bit
(vmx_vcpu_update_vmfunc_ve) doesn't actually set EPTP_INDEX; it will
only be set the next time vmx_vcpu_update_eptp() is called.
This means that if a vcpu makes a vmexit between these two points, the
EPTP_INDEX it reads will be invalid. The first time this race happens
for a domain, EPTP_INDEX will most likely be zero, which is the index
for the "host" p2m -- and thus is often correct. But the second time
this race happens, the value will typically be INVALID_ALTP2M, which
will hit the following BUG:
BUG_ON(idx >= MAX_ALTP2M);
Worse, if for some reason the current altp2m was *not* `0` during this
window (say, because a toolstack changed the VM to a different view),
then the accounting of active vcpus for an altp2m will be thrown off.
Fix this by always updating EPTP_INDEX to the current altp2m index
when enabling #VE.
Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 2 Aug 2018 10:12:07 +0000 (12:12 +0200)]
x86/cpuidle: replace a pointless NULL check
The address of an array slot can't be NULL. Instead add a bounds check
to make sure the array indexing is valid (the check is against 2 since
slot zero of the array - corresponding to C0 - is of no interest here).
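To make the point concrete, here is a sketch of replacing a NULL test on a slot address with an index check. The structures and the function name are hypothetical and much simpler than the real cpuidle data.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_STATES 8

struct cx_sketch { unsigned int type; };

struct power_sketch {
    unsigned int count;                         /* number of valid slots */
    struct cx_sketch states[SKETCH_MAX_STATES];
};

/*
 * &p->states[idx] can never be NULL, so validate the index instead.
 * Slot 0 (C0) is of no interest, which is why a usable slot only exists
 * once count is at least 2.
 */
const struct cx_sketch *get_state_sketch(const struct power_sketch *p,
                                         unsigned int idx)
{
    if ( idx == 0 || idx >= p->count )
        return NULL;

    return &p->states[idx];
}
```

A test such as `if ( !&p->states[idx] )` is always false, so it protects nothing; the comparison against the count is what actually keeps the access in range.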
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Thu, 2 Aug 2018 10:11:03 +0000 (12:11 +0200)]
vtd: cleanup vtd_set_hwdom_mapping after ia64 removal
Remove the handling for different page sizes now that ia64 is gone.
No functional change.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Wed, 24 Jan 2018 16:59:42 +0000 (16:59 +0000)]
xen: Remove domain_crash_synchronous() completely
domain_crash_synchronous() is unsafe to use in general as it may leave
spinlocks held, temporary memory allocated, etc.
With domain_crash_synchronous() removed from the ARM code in 4.11, take the
opportunity to remove the infrastructure completely by opencoding the softirq
loop in the remaining callsites, all of which are destined for deletion.
None of these sites are at risk of having a pending ioreq to qemu, which means
that the vcpu_end_shutdown_deferral() isn't necessary.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Tue, 6 Feb 2018 12:01:08 +0000 (12:01 +0000)]
x86/vmx: Avoid using domain_crash_synchronous() in vmx_vmentry_failure()
There is no need for the synchronous variant, as the vmentry failure path can
just return to processing softirqs.
This is in aid of trying to remove domain_crash_synchronous() from the
codebase.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Wed, 1 Aug 2018 11:47:50 +0000 (12:47 +0100)]
x86/vmx: Avoid hitting BUG_ON() after EPTP-related domain_crash()
If the EPTP pointer can't be located in the altp2m list, the domain
is (legitimately) crashed.
Under those circumstances, execution will continue and is guaranteed to hit the
BUG_ON(idx >= MAX_ALTP2M) (unfortunately, just out of context).
Return from vmx_vmexit_handler() after the domain_crash(), which also has the
side effect of reentering the scheduler more promptly.
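The control-flow fix can be sketched as below. All names are hypothetical, the crash is modelled as a flag, and assert() stands in for the BUG_ON(); the value used for the maximum is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ALTP2M 10u

struct domain_sketch {
    bool crashed;
    uint64_t eptp[SKETCH_MAX_ALTP2M];
};

/* Returns the altp2m index for eptp, or -1 after marking the domain crashed. */
int lookup_altp2m_idx_sketch(struct domain_sketch *d, uint64_t eptp)
{
    unsigned int idx;

    for ( idx = 0; idx < SKETCH_MAX_ALTP2M; idx++ )
        if ( d->eptp[idx] == eptp )
            break;

    if ( idx == SKETCH_MAX_ALTP2M )
    {
        d->crashed = true;   /* stands in for domain_crash() */
        return -1;           /* return here rather than fall through */
    }

    assert(idx < SKETCH_MAX_ALTP2M);   /* stands in for the BUG_ON() */
    return (int)idx;
}
```

Without the early return, the not-found case would reach the assertion with idx equal to the maximum and take the whole host down instead of just the guest.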
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Marek Marczykowski-Górecki [Tue, 31 Jul 2018 20:19:05 +0000 (22:19 +0200)]
tools/gdbsx: use inttypes.h instead of custom macros
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: fix up patch ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Marek Marczykowski-Górecki [Tue, 31 Jul 2018 02:30:42 +0000 (04:30 +0200)]
tools/gdbsx: fix 'g' packet response for 64bit guests
gdb 8.0 fixed bounds checking for 'g' packet (commit
9dc193c3be85aafa60ceff57d3b0430af607b4ce "Check for truncated
registers in process_g_packet"). This revealed that gdbsx did
not properly format the 'g' packet - segment registers and eflags are
expected to be 32-bit fields in the response (according to
gdb/features/i386/64bit-core.xml in gdb sources). Specific error is:
Truncated register 26 in remote 'g' packet
instead of silently truncating part of register.
Additionally, it looks like segment registers of 64bit guests were never
reported correctly, because of type mismatch.
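For illustration, a layout with 64-bit general registers and 32-bit eflags and segment fields could be modelled as follows. The struct is a hypothetical sketch assuming sixteen general-purpose registers plus rip ahead of the 32-bit fields; it is not gdbsx's actual data structure, and the packed attribute is GCC/Clang syntax.

```c
#include <assert.h>
#include <stdint.h>

struct g_packet_sketch {
    uint64_t gpr[16];   /* sixteen general-purpose registers */
    uint64_t rip;
    uint32_t eflags;    /* 32-bit field */
    uint32_t seg[6];    /* six segment registers, 32 bits each */
} __attribute__((packed));

/* 17 * 8 + 7 * 4 = 164 bytes; each byte becomes two hex digits on the wire. */
_Static_assert(sizeof(struct g_packet_sketch) == 164, "unexpected layout");

/* Narrow a 64-bit saved value to the 32-bit field the packet carries. */
uint32_t narrow_reg_sketch(uint64_t saved)
{
    return (uint32_t)saved;
}
```

If the seven narrow fields were sent as 64-bit values instead, this part of the reply would be 192 bytes rather than 164, which shifts every later register and matches the kind of truncation gdb complains about.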
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Simon Gaiser [Tue, 31 Jul 2018 02:56:54 +0000 (04:56 +0200)]
xenstore-client: Add option for raw in-/output
Parsing/generating the escape sequences used by xenstore-client is
non-trivial. So make scripting (for use in stubdom) easier by adding a raw
option.
[added man page entries, factor out expand_buffer]
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Marek Marczykowski-Górecki [Tue, 31 Jul 2018 02:56:53 +0000 (04:56 +0200)]
docs: add xenstore-read and xenstore-write man pages
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Marek Marczykowski-Górecki [Tue, 31 Jul 2018 03:15:32 +0000 (05:15 +0200)]
xenconsole: add option to avoid escape sequences in log
Add --replace-escape option to xenconsoled, which replaces ESC with
'.' in console output written to log file. This makes it slightly safer
to do tail -f on a console output of untrusted guest.
The pty output is unaffected by this option.
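The substitution itself is simple; a sketch, assuming a hypothetical helper applied to each chunk before it is written to the log file:

```c
#include <assert.h>
#include <stddef.h>

/* Replace every ESC byte (0x1b) in buf with '.', in place. */
void replace_escape_sketch(char *buf, size_t len)
{
    size_t i;

    for ( i = 0; i < len; i++ )
        if ( buf[i] == '\033' )
            buf[i] = '.';
}
```

With ESC neutralised, a terminal control sequence in the log shows up as harmless literal text such as ".[31m" rather than being interpreted by the terminal running tail -f.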
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: move variables into a narrower scope ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Wed, 1 Aug 2018 09:03:07 +0000 (10:03 +0100)]
xen: clean up altp2m op comment
Delete trailing spaces and refer to XSM instead of an internal
function in the public header.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Tue, 31 Jul 2018 14:17:21 +0000 (15:17 +0100)]
hvm/altp2m: Clarify the proper way to extend the altp2m interface
The altp2m functionality was originally envisioned to be used in
several different configurations, one of which was a single in-guest
agent that had full operational control of altp2m. This required the
single hypercall to be an HVMOP rather than a DOMCTL, since HVM guests
are not allowed to make DOMCTLs. Access to this HVMOP is controlled
by a per-domain HVM_PARAM, and defaults to 'off'.
Exposing the altp2m functionality to the guest was controversial at
the time, but was ultimately accepted. The fact that altp2m is an
HVMOP rather than a DOMCTL has caused some problems, however, for
those moving forward trying to extend the interface: Extending the
interface even for the 'external' use case now means extending an
HVMOP, which implicitly extends the surface of attack for the
'internal' use case as well. The result has been that every addition
to this interface has also been controversial.
Settle the controversy once and for all by documenting 1) the purpose
of the altp2m interface, and 2) how to extend it. In particular:
* Specify that the fully in-guest agent is a target use case
* Specify that all extensions to altp2m functionality should be subops
of the HVMOP hypercall
* Specify that new subops should be enabled in ALTP2M_mixed mode by
default, but that this mode has not been evaluated for safety.
Hopefully this will allow the altp2m interface to be developed further
without unnecessary controversy.
Further discussion:
As far as I can tell there are three possible solutions to this
controversy.
A. Remove the 'internal' functionality as a target by converting the
current HVMOP into a DOMCTL.
B. Have two hypercalls -- an HVMOP which contains functionality
expected to be used by the 'internal' agent, and a DOMCTL for
functionality which is expected to be used only by the 'external'
agent.
C. Agree to add all new subops to the current hypercall (HVMOP), even
if we're not sure if they should be exposed to the guest.
I think A is a terrible idea. Having a single in-guest agent is a
reasonable design choice, and apparently it was even implemented at
some point; we should make it straightforward for someone in the
future to pick up the work if they want to.
I think B is also a bad idea. The people extending it at the moment
are primarily concerned with the 'external' use case. There is nobody
around to represent whether new functionality should end up in the
HVMOP or the DOMCTL, which means that by default it will end up in the
DOMCTL. If it is discovered, afterwards, that the new operations
*would* be safe and useful for the 'internal' use case, then we will
either have to duplicate them inside the HVMOP (which would be
terrible) or move the operation from the DOMCTL to the HVMOP (which
would make coding an agent against several versions a mess).
It just makes more sense to have all the altp2m operations in a single
place, and a simple way to control whether they're available to the
'internal' use case or not. As such, I am proposing 'C'.
Even within that, we have several options as far as what to do with
the current interface:
C1: Audit the current subops and make a blacklist of subops not
suitable for exposure to the guest. Future subops should be on the
blacklist unless they have been evaluated as safe for exposure.
C2: Don't blacklist the current subops, but require that all future
subops be blacklisted unless they have been evaluated as safe for
exposure.
C3: Don't blacklist current or future subops for the present; just
document that they need to be evaluated (and some potentially
blacklisted) before being exposed to a guest in a safety-critical
environment.
C1 would be ideal, but there's nobody at present to do the work.
Given that, C3 has been seen as the best solution in discussion.
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 31 Jul 2018 15:12:35 +0000 (17:12 +0200)]
x86/xstate: correct logging in handle_xsetbv()
Correct a disagreement between text and logged value.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Norbert Manthey [Tue, 31 Jul 2018 15:11:36 +0000 (17:11 +0200)]
memory: fix label syntax
When compiling this file with gcc, the compiler happily accepts the
sequence of a label followed by an attribute. However, this sequence does
not follow the gcc documentation. Hence, other compilers might stumble
upon this statement.
To be able to compile Xen with goto-cc (the compiler of the CPROVER tool
suite), the missing semicolon is added in this commit.
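A small sketch of the accepted form, with the attribute on the label followed by a semicolon. The function is hypothetical and unrelated to the Xen code being patched; the attribute spelling is GCC/Clang syntax.

```c
#include <assert.h>

int clamp_nonneg_sketch(int x)
{
    int rc = x;

    if ( x < 0 )
    {
        rc = 0;
        goto out;
    }

 out: __attribute__((unused));   /* attribute, then an empty statement */
    return rc;
}
```

The semicolon turns the labelled construct into an ordinary empty statement, so a parser does not have to guess whether the attribute belongs to the label or to whatever follows it.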
Reported-by: Elizabeth Polgreen <polgreen@amazon.de>
Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 31 Jul 2018 08:25:36 +0000 (10:25 +0200)]
iommu: remove unneeded return from iommu_hwdom_init
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 31 Jul 2018 08:25:06 +0000 (10:25 +0200)]
x86/efi: split compiler vs linker support
So that an ELF binary with support for EFI services will be built when
the compiler supports the MS ABI, regardless of the linker support for
PE.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Roger Pau Monné [Tue, 31 Jul 2018 08:24:22 +0000 (10:24 +0200)]
x86/efi: move the logic to detect PE build support
So that it can be used by other components apart from the efi specific
code. By moving the detection code, creating a dummy efi/disabled file
can be avoided.
This is required so that the conditional used to define the efi symbol
in the linker script can be removed and instead the definition of the
efi symbol can be guarded using the preprocessor.
The motivation behind this change is to be able to build Xen using lld
(the LLVM linker), which at least in version 6.0.0 doesn't work
properly with a DEFINED being used in a conditional expression:
ld -melf_x86_64_fbsd -T xen.lds -N prelink.o --build-id=sha1 \
/root/src/xen/xen/common/symbols-dummy.o -o /root/src/xen/xen/.xen-syms.0
ld: error: xen.lds:233: symbol not found: efi
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Roger Pau Monné [Tue, 31 Jul 2018 08:23:37 +0000 (10:23 +0200)]
xen/compiler: introduce a define for weak symbols
And replace the open-coded versions already in tree. No functional
change.
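As a sketch of the idea, a single macro can wrap the compiler's weak-symbol attribute so that call sites no longer spell it out. The macro and function names below are hypothetical, and the attribute is GCC/Clang syntax.

```c
#include <assert.h>

/* Hypothetical wrapper for the open-coded attribute. */
#define sketch_weak __attribute__((__weak__))

/* Default implementation; a strong definition elsewhere would override it. */
sketch_weak int arch_hook_sketch(void)
{
    return 0;
}

int call_hook_sketch(void)
{
    return arch_hook_sketch();
}
```

Keeping the attribute behind one define gives a single place to adjust if another toolchain needs a different spelling.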
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Doug Goldstein [Sun, 29 Jul 2018 21:53:16 +0000 (16:53 -0500)]
ci: enable builds with CentOS 7.x
Add the CentOS 7.x images to be used for build testing.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Sun, 29 Jul 2018 21:53:15 +0000 (16:53 -0500)]
automation: add CentOS 7.x image
This image will always track the latest CentOS 7.x release. Add this
container to containerize for easy access.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Fri, 27 Jul 2018 14:05:48 +0000 (15:05 +0100)]
libxl_qmp: Add a warning to not trust QEMU
... even if it is not the case for the current code.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Fri, 27 Jul 2018 14:05:47 +0000 (15:05 +0100)]
libxl_qmp: Move the buffer realloc to the same scope level as read
In qmp_next(), the inner loop should only try to parse messages from
QMP, if there is more than one.
The handling of the receive buffer ('incomplete'), should be done at the
same scope level as read(). It doesn't need to be handled more than once
after a read.
Before this patch, when one message was handled, the inner loop would
restart by adding the 'buffer' into 'incomplete' (after reallocation).
Since 'rd' was not reset, the buffer would be strcat'ed a second time.
After that, the stream from the QMP server would have a syntax error, and
the parser would throw errors.
This is unlikely to happen as the receive buffer is very large. And
receiving two messages in a row is unlikely. In the current case, this
could be an event and a response to a command.
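The buffer handling described above can be sketched as follows. This is a minimal illustration, not the libxl code: struct rx_state, rx_append() and rx_next_message() are hypothetical names, and the "\r\n" message terminator is an assumption made for the example. The point is that the append happens once per read(), while the message extraction may run several times on the accumulated data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for received but not yet parsed data. */
struct rx_state {
    char *incomplete;   /* NUL-terminated pending bytes, or NULL */
    size_t size;        /* number of pending bytes, excluding the NUL */
};

/* Append one freshly read chunk. Call exactly once per successful read(). */
static int rx_append(struct rx_state *st, const char *buf, size_t rd)
{
    char *p = realloc(st->incomplete, st->size + rd + 1);

    if (!p)
        return -1;
    memcpy(p + st->size, buf, rd);
    st->size += rd;
    p[st->size] = '\0';
    st->incomplete = p;
    return 0;
}

/*
 * Pop one complete message (terminated by "\r\n" in this sketch).
 * Returns a malloc()ed copy, or NULL if no complete message is pending.
 * May be called repeatedly after a single rx_append().
 */
static char *rx_next_message(struct rx_state *st)
{
    char *end, *msg;
    size_t len, rest;

    if (!st->incomplete)
        return NULL;
    end = strstr(st->incomplete, "\r\n");
    if (!end)
        return NULL;
    len = (size_t)(end - st->incomplete);
    msg = malloc(len + 1);
    if (!msg)
        return NULL;
    memcpy(msg, st->incomplete, len);
    msg[len] = '\0';
    /* Shift the remaining bytes (and the trailing NUL) to the front. */
    rest = st->size - (len + 2);
    memmove(st->incomplete, end + 2, rest + 1);
    st->size = rest;
    return msg;
}
```

With this split, two messages arriving in one read() are both returned by successive rx_next_message() calls without the chunk ever being appended twice.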
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Fri, 27 Jul 2018 14:05:46 +0000 (15:05 +0100)]
libxl_json: fix build with DEBUG_ANSWER
Also replace LIBXL__LOG_DEBUG by XTL_DEBUG, because it's shorter and
more often used in libxl.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Fri, 27 Jul 2018 14:05:45 +0000 (15:05 +0100)]
libxl_qmp: Fix use of DEBUG_RECEIVED
This patch fixes a compilation error, seen with #define DEBUG_RECEIVED, in
the macro DEBUG_REPORT_RECEIVED.
error: field precision specifier ‘.*’ expects argument of type ‘int’, but argument 9 has type ‘ssize_t {aka long int}’
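For illustration, one common way to satisfy the compiler here is to convert the length explicitly, since the '.*' precision of a printf-style format is taken as an int. The sketch below is not the libxl macro; format_received() is a hypothetical helper written only for this example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Format the first 'len' bytes of 'buf' into 'out'.
 * "%.*s" takes its precision as an int, so an ssize_t length (as
 * returned by read()) has to be converted explicitly.
 */
static int format_received(char *out, size_t outlen,
                           const char *buf, ssize_t len)
{
    if (len < 0)
        len = 0;    /* treat a read() error as "nothing received" */
    return snprintf(out, outlen, "received: '%.*s'", (int)len, buf);
}
```

The cast is only safe when the length is known to fit in an int, which holds for a bounded receive buffer.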
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Fri, 27 Jul 2018 14:05:44 +0000 (15:05 +0100)]
libxl_qmp: Documentation of the logic of the QMP client
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Fri, 27 Jul 2018 14:05:43 +0000 (15:05 +0100)]
libxl_event: Fix DEBUG prints
The libxl__log() call was missing the domid.
The macro DBG is using LIBXL__LOG which relies on a "gc". Add a GC where
needed.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 23 Oct 2017 15:40:57 +0000 (16:40 +0100)]
automation: introduce a script for build test
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Wei Liu [Mon, 23 Jul 2018 16:57:34 +0000 (17:57 +0100)]
automation: add debian unstable images
This will get us the latest toolchain available in Debian.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Juergen Gross [Wed, 25 Jul 2018 14:50:40 +0000 (16:50 +0200)]
tools/helpers: don't hardcode domain type for dom0 and xenstore domain
Today when setting up a minimal domain configuration file for dom0 and
eventually xenstore-domain the domain type is hardcoded as PV. Change
that by asking the hypervisor for the correct type.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Wed, 25 Jul 2018 14:38:23 +0000 (15:38 +0100)]
Config.mk: update OVMF changeset
Simply catching up with upstream.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Mon, 23 Jul 2018 16:00:32 +0000 (18:00 +0200)]
docs: use the make wildcard function instead of find
The regexp used with find in order to list the man pages doesn't work
with FreeBSD find, so use a wildcard instead. No functional change.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Mon, 23 Jul 2018 08:04:46 +0000 (09:04 +0100)]
automation: build with 32 bit stretch
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Juergen Gross [Tue, 10 Jul 2018 08:31:51 +0000 (10:31 +0200)]
xen: correct DEFCONFIG_LIST Kconfig item
The default value of DEFCONFIG_LIST is wrong: it should be the value of
the configured ARCH_DEFCONFIG item, not the string "$ARCH_DEFCONFIG".
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:40 +0000 (19:07 +0300)]
libxl: add LIBXL_HAVE_EXTENDED_VKB define
LIBXL_HAVE_EXTENDED_VKB define indicates that libxl_device_vkb structure has
extended fields.
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:39 +0000 (19:07 +0300)]
libxl: vkb add extended parameters
Parse the following extended parameters and add them to xenstore:
* feature-disable-keyboard
* feature-disable-pointer
* feature-abs-pointer
* feature-multi-touch
* feature-raw-pointer
* width
* height
* multi-touch-width
* multi-touch-height
* multi-touch-num-contacts
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:38 +0000 (19:07 +0300)]
docs: add vkb device to xl.cfg and xl
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:37 +0000 (19:07 +0300)]
xl: add vkb config parser and CLI
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:36 +0000 (19:07 +0300)]
libxl: vkb add list and info functions
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:35 +0000 (19:07 +0300)]
libxl: add backend type and id to vkb
New field backend_type is added to vkb device in order to have QEMU and user
space backend simultaneously. Each vkb backend shall read appropriate XS entry
and service only its own frontends. Id is a string field which is used by the
backend to identify the frontend.
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:34 +0000 (19:07 +0300)]
libxl: move vkb device to libxl_vkb.c
Logically it is better to move vkb to a separate file, as the vkb device is
used not only by vfb and console.
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Tue, 24 Jul 2018 13:55:39 +0000 (15:55 +0200)]
x86/pvh: change the order of the iommu initialization for Dom0
The iommu initialization will also create MMIO mappings in the Dom0
p2m, so the paging memory pool needs to be allocated or else iommu
initialization will fail.
Move the call to init the iommu after the Dom0 p2m has been setup in
order to solve this.
Note that issues caused by this wrong ordering have only been seen
when using shadow paging.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Tue, 24 Jul 2018 13:55:07 +0000 (15:55 +0200)]
x86/tboot: avoid recursive fault in early boot panic with tboot
If panic is called before init_idle_domain on a tboot-launched system,
then Xen recursively faults in write_ptbase as seen below.
(XEN) [<ffff82d080286690>] write_ptbase+0/0x10
(XEN) [<ffff82d0802c4c3b>] tboot_shutdown+0x6b/0x260
(XEN) [<ffff82d08029ddac>] machine_restart+0xac/0x2d0
(XEN) [<ffff82d080286690>] write_ptbase+0/0x10
(XEN) [<ffff82d0802446c1>] panic+0x111/0x120
(XEN) [<ffff82d0802a51c1>] do_general_protection+0x171/0x1f0
(XEN) [<ffff82d080287a82>] mm.c#virt_to_xen_l2e+0x12/0x1c0
(XEN) [<ffff82d080354720>] x86_64/entry.S#handle_exception_saved+0x66/0xa4
(XEN) [<ffff82d080286690>] write_ptbase+0/0x10
(XEN) [<ffff82d0802c4c3b>] tboot_shutdown+0x6b/0x260
(XEN) [<ffff82d08029ddac>] machine_restart+0xac/0x2d0
(XEN) [<ffff82d0802446c1>] panic+0x111/0x120
(XEN) [<ffff82d0803c11a0>] setup.c#bootstrap_map+0/0x11a
(XEN) [<ffff82d0803b82a0>] flask_op.c#parse_flask_param+0/0xb0
(XEN) [<ffff82d0803c11a0>] setup.c#bootstrap_map+0/0x11a
(XEN) [<ffff82d0803b6f6c>] xsm_multiboot_init+0x7c/0xb0
(XEN) [<ffff82d0803c34bb>] __start_xen+0x1d2b/0x2da0
(XEN) [<ffff82d0802000f3>] __high_start+0x53/0x60
idle_vcpu[0] is still poisoned with INVALID_VCPU, so write_ptbase faults
dereferencing the pointer. This fault calls panic and recurses through
the same code path.
If tboot_shutdown is called while idle_vcpu[0] == INVALID_VCPU, then we
are still operating with the initial page tables. Therefore changing
page tables with write_ptbase is unnecessary.
An easy way to reproduce this is to use tboot to launch an XSM-enabled
Xen without an XSM policy.
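The reasoning above amounts to guarding the page table switch on the poison value. The following is a standalone sketch with stand-in definitions, not the actual patch: struct vcpu, INVALID_VCPU, idle_vcpu and write_ptbase() are modelled locally here, and shutdown_switch_page_tables() is a hypothetical name for the relevant part of the shutdown path.

```c
#include <stddef.h>

/* Local stand-ins for the Xen definitions, for illustration only. */
struct vcpu { int id; };

static struct vcpu poison;
#define INVALID_VCPU (&poison)

/* Before init_idle_domain runs, the idle vcpu pointer is still poisoned. */
static struct vcpu *idle_vcpu[1] = { INVALID_VCPU };

static int ptbase_switches;

/* Model of write_ptbase(): only counts how often it was called. */
static void write_ptbase(struct vcpu *v)
{
    (void)v;
    ptbase_switches++;
}

/*
 * Switch to the idle page tables only if the idle vcpu exists.
 * While the pointer is still poisoned, the initial page tables are in
 * use, so there is nothing to switch and nothing to dereference.
 */
static void shutdown_switch_page_tables(void)
{
    if (idle_vcpu[0] != INVALID_VCPU)
        write_ptbase(idle_vcpu[0]);
}
```

Skipping the call early in boot avoids the fault, and therefore the recursive panic, without changing behaviour once the idle domain is set up.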
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 24 Jul 2018 13:54:18 +0000 (15:54 +0200)]
x86/vhpet: add support for level triggered interrupts
Level triggered interrupts are not an optional feature of HPET, and
must be implemented in order to comply with the HPET specification.
Implement them by adding a callback to the timer which sets the
interrupt bit in the general interrupt status register. Further
interrupts (in case of periodic mode) will not be injected until the
bit is cleared.
In order to reset the interrupts when the status bit is cleared Xen must
also detect accesses to such register.
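A simplified model of the behaviour described above is sketched below. It is not the Xen vHPET code: struct hpet_model, timer_fired() and status_write() are hypothetical names, and the write-1-to-clear handling of the status register is stated here as an assumption about the register semantics.

```c
#include <stdint.h>

/* Toy model: one status register and a count of injected interrupts. */
struct hpet_model {
    uint64_t isr;           /* general interrupt status register */
    unsigned int injected;  /* interrupts delivered to the guest */
};

/* Timer callback for a level triggered timer 'tn'. */
static void timer_fired(struct hpet_model *h, unsigned int tn)
{
    uint64_t bit = UINT64_C(1) << tn;

    if (h->isr & bit)
        return;             /* still pending: no further injection */
    h->isr |= bit;
    h->injected++;
}

/* Guest write to the status register, assumed to be write-1-to-clear. */
static void status_write(struct hpet_model *h, uint64_t val)
{
    h->isr &= ~val;
}
```

In this model a periodic timer that fires twice before the guest acknowledges the interrupt results in a single injection, and injections resume once the guest clears the bit.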
While there convert tn and i in hpet_write to unsigned.
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 24 Jul 2018 13:52:47 +0000 (15:52 +0200)]
x86/vpt: add support for level interrupts
Level triggered interrupts will be asserted regardless of whether the
interrupt is masked, and thus the callback will also be executed.
Add a new 'level' parameter to create_periodic_time in order to create
level triggered timers. None of the current users of vpt are switched
to use level triggered interrupts yet.
Note that periodic level triggered interrupts are not supported. This
is because level triggered interrupts always require a deassert of the
IO-APIC pin, which should be done by the caller of vpt at which point
the caller should also reset the timer if required.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 23 Jul 2018 13:29:27 +0000 (14:29 +0100)]
x86/spec-ctrl: Fix the parsing of xpti= on fixed Intel hardware
The calls to xpti_init_default() in parse_xpti() are buggy. The CPUID data
hasn't been fetched that early, and boot_cpu_has(X86_FEATURE_ARCH_CAPS) will
always evaluate false.
As a result, the default case won't disable XPTI on Intel hardware which
advertises ARCH_CAPABILITIES_RDCL_NO.
Simplify parse_xpti() to solely the setting of opt_xpti according to the
passed string, and have init_speculation_mitigations() call
xpti_init_default() if appropriate. Drop the force parameter, and pass caps
instead, to avoid redundant re-reading of MSR_ARCH_CAPS.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 20 Jul 2018 15:43:49 +0000 (15:43 +0000)]
x86/svm: Drop the suggestion of Long Mode Segment Limit support
Because of a bug in 2010, LMSL support isn't available to guests.
c/s f2c608444 noticed but avoided fixing the issue for migration reasons. In
addition to migration problems, changes to the segmentation logic for
emulation would be needed before the feature could be enabled.
This feature is entirely unused by operating systems (probably owing to its
semantics which only cover half the segment registers), and no one has
commented on its absence from Xen. As supporting it would involve a large
amount of effort, it seems better to remove the code entirely.
If someone finds a valid usecase, we can resurrect the code and
implement the remaining parts, but I doubt anyone will.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 20 Jul 2018 15:42:04 +0000 (15:42 +0000)]
x86/hvm: Disallow unknown MSR_EFER bits
It turns out that nothing ever prevented HVM guests from trying to set unknown
EFER bits. Generally, this results in a vmentry failure.
For Intel hardware, all implemented bits are covered by the checks.
For AMD hardware, the only EFER bit which isn't covered by the checks is TCE
(which AFAICT is specific to AMD Fam15/16 hardware). We never advertise TCE
in CPUID, but it isn't a security problem to have TCE unexpectedly enabled in
guest context.
Disallow the setting of bits outside of the EFER_KNOWN_MASK, which prevents
any vmentry failures for guests, yielding #GP instead.
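The check described above is a plain reserved-bit test. A minimal sketch follows; the EFER bit positions are the architectural ones, but EXAMPLE_EFER_KNOWN_MASK and efer_write_ok() are illustrative names, and the set of bits in Xen's real EFER_KNOWN_MASK may differ from the one assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFER bit positions. */
#define EFER_SCE   (UINT64_C(1) << 0)   /* SYSCALL enable */
#define EFER_LME   (UINT64_C(1) << 8)   /* long mode enable */
#define EFER_LMA   (UINT64_C(1) << 10)  /* long mode active */
#define EFER_NX    (UINT64_C(1) << 11)  /* no-execute enable */
#define EFER_SVME  (UINT64_C(1) << 12)  /* SVM enable (AMD) */

/* Illustrative mask only; not necessarily what Xen uses. */
#define EXAMPLE_EFER_KNOWN_MASK \
    (EFER_SCE | EFER_LME | EFER_LMA | EFER_NX | EFER_SVME)

/* A guest write is acceptable only if no bit outside the mask is set. */
static bool efer_write_ok(uint64_t val)
{
    return !(val & ~EXAMPLE_EFER_KNOWN_MASK);
}
```

A caller would raise #GP in the guest when efer_write_ok() returns false, instead of letting the value reach hardware and fail the vmentry.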
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Mon, 23 Jul 2018 10:26:49 +0000 (11:26 +0100)]
ocaml: remove undefined behaviour in systemd_stubs.c
Clang complains:
systemd_stubs.c:51:8: error: shifting a negative signed value is undefined [-Werror,-Wshift-negative-value]
ret = Val_int(-1U);
^~~~~~~~~~~~
Since sd_notify_fd has a signature of unit -> unit, we simply change
the return value to Val_unit.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Mon, 23 Jul 2018 10:26:48 +0000 (11:26 +0100)]
tools/gdbsx: fix build with clang 3.8
Currently building gdbsx with clang 3.8 gives the following errors:
xg_main.c:783:17: error: 'aligned' attribute ignored when parsing type [-Werror,-Wignored-attributes]
iop->uva = (uint64_aligned_t)((unsigned long)tobuf);
^~~~~~~~~~~~~~~~
/builds/liuw/xen/tools/debugger/gdbsx/xg/../../../../tools/include/xen/arch-x86/xen-x86_32.h:105:50: note: expanded from macro 'uint64_aligned_t'
^~~~~~~~~~
xg_main.c:816:17: error: 'aligned' attribute ignored when parsing type [-Werror,-Wignored-attributes]
iop->uva = (uint64_aligned_t)((unsigned long)frombuf);
^~~~~~~~~~~~~~~~
/builds/liuw/xen/tools/debugger/gdbsx/xg/../../../../tools/include/xen/arch-x86/xen-x86_32.h:105:50: note: expanded from macro 'uint64_aligned_t'
According to https://bugs.llvm.org/show_bug.cgi?id=11071, this issue has
been fixed in clang. But we're not going to get that in 3.8.
Explicitly disable that warning to fix the build.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Razvan Cojocaru [Thu, 28 Jun 2018 07:54:01 +0000 (10:54 +0300)]
xen/altp2m: set access_required properly for all altp2ms
For the hostp2m, access_required starts off as 0, then it can be
set with xc_domain_set_access_required(). However, all the altp2ms
set it to 1 on init, and ignore both the hostp2m and the hypercall.
This patch sets access_required to the value from the hostp2m
on altp2m init, and propagates the values received via hypercall
to all the active altp2ms, when applicable.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Mon, 19 Mar 2018 15:33:32 +0000 (15:33 +0000)]
xen/gnttab: Export opt_max_{grant,maptrack}_frames
This is to facilitate the values being passed in via domain_create(), at which
point the dom0 construction code needs to know them.
While cleaning up, drop the DEFAULT_* defines, which are only used immediately
adjacent in a context which makes it obvious that they are the defaults, and
drop the (unused) logic to allow a per-arch override of
DEFAULT_MAX_NR_GRANT_FRAMES.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 16 Mar 2018 19:16:45 +0000 (19:16 +0000)]
xen/gnttab: Remove replace_grant_supported()
It is identical on all architectures, and removing it is a better overall
result than fixing it up to have a proper boolean return value.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Zhenzhong Duan [Fri, 20 Jul 2018 09:29:34 +0000 (02:29 -0700)]
x86/physdev: Remove redundant assignment in allocate_and_map_msi_pirq()
No functional change.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Thu, 12 Jul 2018 13:53:06 +0000 (08:53 -0500)]
scripts: add helper script to use Docker containers
This adds a script that can be used to do builds easily within the
defined containers under the automation directory. These containers live
in the public GitLab registry under the xen-project namespace. The
script can be executed a number of ways but the default is to drop you
at a bash shell within a Debian Stretch container at the top level of
the source tree.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Fri, 20 Jul 2018 08:32:42 +0000 (10:32 +0200)]
lzma: fix tools build
Remove local definition of MIN and instead include the kernel.h header
for the hypervisor build. Fixes the following error on the tools build:
In file included from xc_dom_decompress_unsafe_lzma.c:8:0:
../../xen/common/unlzma.c:33:0: error: "MIN" redefined [-Werror]
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
^
In file included from xc_private.h:43:0,
from xg_private.h:29,
from xc_dom_decompress_unsafe_lzma.c:5:
/home/osstest/build.125458.build-amd64/xen/stubdom/libxc-x86_64/../../tools/include/xen-tools/libs.h:21:0: note: this is the location of the previous definition
#define MIN(x, y) ((x) < (y) ? (x) : (y))
^
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 18 Jul 2018 11:22:55 +0000 (12:22 +0100)]
x86/xstate: Make errors in xstate calculations more obvious by crashing the domain
If xcr0_max exceeds xfeature_mask, then something is broken with the CPUID
policy derivation or auditing logic. If hardware rejects new_bv, then
something is broken with Xen's xstate logic.
In both cases, crash the domain with an obvious error message, to help
highlight the issues.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 18 Jul 2018 10:56:44 +0000 (11:56 +0100)]
x86/xstate: Use a guest's CPUID policy, rather than allowing all features
It turns out that Xen has never enforced that a domain remain within the
xstate features advertised in CPUID.
The check of new_bv against xfeature_mask ensures that a domain stays within
the set of features that Xen has enabled in hardware (and therefore isn't a
security problem), but this does mean that attempts to level a guest for
migration safety might not be effective if the guest ignores CPUID.
Check the CPUID policy in validate_xstate() (for incoming migration) and in
handle_xsetbv() (for guest XSETBV instructions). This subsumes the PKRU check
for PV guests in handle_xsetbv() (and also demonstrates that I should have
spotted this problem while reviewing c/s fbf9971241f).
For migration, this is correct despite the current (mis)ordering of data
because d->arch.cpuid is the applicable max policy.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 29 Jun 2018 13:05:52 +0000 (13:05 +0000)]
libx86: Introduce lib/x86/msr.h and share msr_policy with userspace
To facilitate the shared Xen and toolstack code in libx86, struct msr_policy
needs to be available in the same way as struct cpuid_policy.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 21 Jun 2018 14:35:48 +0000 (15:35 +0100)]
libx86: introduce a libx86 shared library
Move x86_cpuid_lookup_deep_deps() into the shared library, removing the
individual copies from the hypervisor and libxc respectively.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 21 Jun 2018 14:35:46 +0000 (16:35 +0200)]
libx86: Share struct cpuid_policy with userspace
Both Xen and the toolstack have need of the same logic when it comes to
manipulation and checking of the CPUID and MSR values offered to guests. To
that end, libx86 is being introduced to allow Xen and the toolstack to share a
single implementation, rather than duplicating the logic.
No functional change.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 21 Jun 2018 14:35:46 +0000 (16:35 +0200)]
libx86: generate cpuid-autogen.h in the libx86 include dir
This avoids all users needing to opencode local generation of the file.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 28 Jun 2018 11:00:44 +0000 (11:00 +0000)]
libx86: Introduce lib/x86/cpuid.h
Begin to untangle the header dependency tangle by moving definition of
struct cpuid_leaf out of x86_emulate.h into the new cpuid.h.
Additionally, plumb the header through to libxc. This is technically a
redundant include at this point, but it helps build-test the later changes,
and will be used eventually.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 28 May 2018 14:19:05 +0000 (14:19 +0000)]
x86/vmx: Don't clobber %dr6 while debugging state is lazy
c/s 4f36452b63 introduced a write to %dr6 in the #DB intercept case, but the
guest's debug registers may be lazy at this point, in which case the guest's
later attempt to read %dr6 will discard this value and use the older stale
value.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 19 Jul 2018 10:33:38 +0000 (04:33 -0600)]
cpumask: tidy {,z}alloc_cpumask_var()
Drop unnecessary casts and use bool in favor of bool_t.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 10:32:43 +0000 (04:32 -0600)]
x86: command line option to avoid use of secondary hyper-threads
Shared resources (L1 cache and TLB in particular) present a risk of
information leak via side channels. Provide a means to avoid use of
hyperthreads in such cases.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 10:32:06 +0000 (04:32 -0600)]
x86: possibly bring up all CPUs even if not all are supposed to be used
Reportedly Intel CPUs which can't broadcast #MC to all targeted
cores/threads because some have CR4.MCE clear will shut down. Therefore
we want to keep CR4.MCE enabled when offlining a CPU, and we need to
bring up all CPUs in order to be able to set CR4.MCE in the first place.
The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
altogether right here.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 10:31:07 +0000 (04:31 -0600)]
x86: distinguish CPU offlining from CPU removal
In order to be able to service #MC on offlined CPUs, the GDT, IDT,
stack, and per-CPU data (which includes the TSS) need to be kept
allocated. They should only be freed upon CPU removal (which we
currently don't support, so some code is becoming effectively dead for
the moment).
Note that for now park_offline_cpus doesn't get set to true anywhere -
this is going to be the subject of a subsequent patch.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 09:54:45 +0000 (11:54 +0200)]
VMX: fix vmx_{find,del}_msr() build
Older gcc at -O2 (and perhaps higher) does not recognize that apparently
uninitialized variables aren't really uninitialized. Pull out the
assignments used by two of the three case blocks and make them
initializers of the variables, as I think I had suggested during review.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Christopher Clark [Wed, 18 Jul 2018 22:22:17 +0000 (15:22 -0700)]
tools/xentop : replace use of deprecated vwprintw
gcc-8.1 complains:
| xentop.c: In function 'print':
| xentop.c:304:4: error: 'vwprintw' is deprecated [-Werror=deprecated-declarations]
| vwprintw(stdscr, (curses_str_t)fmt, args);
| ^~~~~~~~
vw_printw (note the underscore) is a non-deprecated alternative.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 07:42:42 +0000 (09:42 +0200)]
x86/AMD: distinguish compute units from hyper-threads
Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
the difference is not very big. The most relevant change (requiring some
code restructuring) is that the topoext feature no longer means there is
a valid CU ID.
Take the opportunity and convert wrongly plain int variables in
set_cpu_sibling_map() to unsigned int.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 07:41:55 +0000 (09:41 +0200)]
cpupools: fix state when downing a CPU failed
While I've run into the issue with further patches in place which no
longer guarantee the per-CPU area to start out as all zeros, the
CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
of schedule_cpu_switch() will trigger the "c != old_pool" assertion
there.
Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
should not happen before cpu_disable_scheduler()). Clearing it in
CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
piece of code twice. Since the field's value shouldn't matter while the
CPU is offline, simply clear it (implicitly) for CPU_ONLINE and
CPU_DOWN_FAILED, but only for other than the suspend/resume case (which
gets specially handled in cpupool_cpu_remove()).
By adjusting the conditional in cpupool_cpu_add() CPU_DOWN_FAILED
handling in the suspend case should now also be handled better.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Thu, 19 Jul 2018 07:41:08 +0000 (09:41 +0200)]
x86: allow producing .i or .s for multiply compiled files
Since the generic pattern rules don't match those, explicit rules need
to be put in place for this to work.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 07:40:19 +0000 (09:40 +0200)]
x86/HVM: add wrapper for hvm_funcs.set_tsc_offset()
It's used in quite a few places, and hence doing so eases subsequent
adjustment to how these (indirect) calls are carried out.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>