xen.git
7 years agox86/HVM: correct an inverted check in hvm_load()
Jan Beulich [Fri, 17 Aug 2018 11:52:20 +0000 (13:52 +0200)]
x86/HVM: correct an inverted check in hvm_load()

Clearly we want to put a vCPU to sleep if it is _not_ already down.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86: make arch_set_info_guest() match comments in load_segments()
Jan Beulich [Fri, 17 Aug 2018 11:51:27 +0000 (13:51 +0200)]
x86: make arch_set_info_guest() match comments in load_segments()

For both fs_base and gs_base_user, there are comments saying "This can
only be non-zero if selector is NULL." While save_segments() ensures
this, so far arch_set_info_guest() didn't. Make behavior consistent
(attaching comments identical to those in save_segments()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/setup: Avoid OoB E820 lookup when calculating the L1TF safe address
Andrew Cooper [Thu, 16 Aug 2018 15:26:22 +0000 (16:26 +0100)]
x86/setup: Avoid OoB E820 lookup when calculating the L1TF safe address

A number of corner cases (most obviously, no-real-mode and no Multiboot memory
map) can end up with e820_raw.nr_map being 0, at which point the L1TF
calculation will underflow.
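
The underflow can be illustrated with a small model. This is only a sketch in Python that emulates C's 32-bit unsigned arithmetic; the helper names and the guard are assumptions made for illustration, not the code in the patch.

```python
U32_MASK = 0xFFFFFFFF  # emulate a C 'unsigned int'


def last_entry_index_unchecked(nr_map):
    """Index of the last map entry, computed as unsigned nr_map - 1."""
    return (nr_map - 1) & U32_MASK


def last_entry_index(nr_map):
    """Guarded variant (hypothetical): return None when the map is empty."""
    if nr_map == 0:
        return None
    return nr_map - 1
```

With an empty map the unchecked variant wraps to the largest unsigned value, so any array lookup using it would be far out of bounds.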

Spotted by Coverity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: fix ARM build after 54ed251dc7
Jan Beulich [Thu, 16 Aug 2018 06:49:29 +0000 (00:49 -0600)]
libxl: fix ARM build after 54ed251dc7

Commit "tools: Rework xc_domain_create() to take a full
xen_domctl_createdomain" failed to replace one further instance of
xc_config in libxl__arch_domain_save_config().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agox86/mmcfg: remove redundant code in pci_mmcfg_reject_broken()
Zhenzhong Duan [Thu, 16 Aug 2018 07:31:57 +0000 (09:31 +0200)]
x86/mmcfg: remove redundant code in pci_mmcfg_reject_broken()

No functional change.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agognttab/ARM: properly implement gnttab_create_status_page()
Jan Beulich [Thu, 16 Aug 2018 07:30:59 +0000 (09:30 +0200)]
gnttab/ARM: properly implement gnttab_create_status_page()

Prevent the "BUG_ON(page_get_owner(pg) != d)" in
gnttab_unpopulate_status_frames() from triggering.

Reported-by: 王磊 <lei19.wang@samsung.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86/hvm/emulate: make sure rep I/O emulation does not cross GFN boundaries
Paul Durrant [Thu, 16 Aug 2018 07:27:30 +0000 (09:27 +0200)]
x86/hvm/emulate: make sure rep I/O emulation does not cross GFN boundaries

When emulating a rep I/O operation it is possible that the ioreq will
describe a single operation that spans multiple GFNs. This is fine as long
as all those GFNs fall within an MMIO region covered by a single device
model, but unfortunately the higher levels of the emulation code do not
guarantee that. This is something that should almost certainly be fixed,
but in the meantime this patch makes sure that MMIO is truncated at GFN
boundaries and hence the appropriate device model is re-evaluated for each
target GFN.

NOTE: This patch does not deal with the case of a single MMIO operation
      spanning a GFN boundary. That is more complex to deal with and is
      deferred to a subsequent patch.
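
The truncation idea can be sketched as follows. This is a simplified Python model, not the emulation code itself: the function name, the 4 KiB page size, and the treatment of the direction flag are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed 4 KiB guest frames


def clip_reps_to_gfn(gpa, bytes_per_rep, reps, df=False):
    """Return how many whole repetitions stay inside the GFN containing gpa.

    gpa is the address of the first access. With df set, each later
    access moves towards lower addresses.
    """
    offset = gpa % PAGE_SIZE
    if offset + bytes_per_rep > PAGE_SIZE:
        # A single access straddling the boundary is out of scope here,
        # mirroring the NOTE in the commit message.
        raise ValueError("single access crosses a GFN boundary")
    if df:
        # Accesses start at offset, offset - size, ... and must stay >= 0.
        room = offset // bytes_per_rep + 1
    else:
        # Accesses start at offset, offset + size, ... and must end in-page.
        room = (PAGE_SIZE - offset) // bytes_per_rep
    return min(reps, room)
```

The caller would issue the clipped number of repetitions, then repeat the lookup for the next GFN so that the device model is chosen afresh.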

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Convert calculations to be 32-bit only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/evtchn: Pass max_evtchn_port into evtchn_init()
Andrew Cooper [Fri, 16 Mar 2018 18:27:24 +0000 (18:27 +0000)]
xen/evtchn: Pass max_evtchn_port into evtchn_init()

... rather than setting it up once domain_create() has completed.  This
involves constructing a default value for dom0.

No practical change in functionality.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/domctl: Merge set_max_evtchn into createdomain
Andrew Cooper [Tue, 27 Feb 2018 17:39:37 +0000 (17:39 +0000)]
xen/domctl: Merge set_max_evtchn into createdomain

set_max_evtchn is somewhat weird.  It was introduced with the event_fifo work,
but has never been used.  Still, it is a bound on the resources consumed by the
event channel infrastructure, and should be part of createdomain, rather than
editable after the fact.

Drop XEN_DOMCTL_set_max_evtchn completely (including XSM hooks and libxc
wrappers), and retain the functionality in XEN_DOMCTL_createdomain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agotools: Rework xc_domain_create() to take a full xen_domctl_createdomain
Andrew Cooper [Fri, 9 Mar 2018 14:38:35 +0000 (14:38 +0000)]
tools: Rework xc_domain_create() to take a full xen_domctl_createdomain

In future patches, the structure will be extended with further information,
and this is far cleaner than adding extra parameters.

The python stubs are the only user which passes NULL for the existing config
option (which is actually the arch substructure).  Therefore, the #ifdefary
moves to compensate.

For libxl, pass the full config object down into
libxl__arch_domain_{prepare,save}_config(), as there are in practice arch
specific settings in the common part of the structure (flags s3_integrity and
oos_off specifically).

No practical change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/ocaml: Pass a full domctl_create_config into stub_xc_domain_create()
Andrew Cooper [Mon, 12 Mar 2018 10:40:33 +0000 (10:40 +0000)]
tools/ocaml: Pass a full domctl_create_config into stub_xc_domain_create()

The underlying C function is about to make the same change, and the structure
is going to gain extra fields.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
7 years agox86/pv: Use xmemdup() for cpuidmasks, rather than opencoding it
Andrew Cooper [Wed, 15 Aug 2018 09:53:53 +0000 (10:53 +0100)]
x86/pv: Use xmemdup() for cpuidmasks, rather than opencoding it

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/ptwr: Misc cleanup to ptwr_emulated_update()
Andrew Cooper [Fri, 10 Aug 2018 17:05:24 +0000 (18:05 +0100)]
x86/ptwr: Misc cleanup to ptwr_emulated_update()

All but one user wants mfn as mfn_t, so switch its type.  offset is only ever
used when multiplied by 8, so fold that into its initial calculation.  Fold all
the pointer arithmetic on pl1e together, to avoid needless casts.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: write to correct variable in parse_pv_l1tf()
Jan Beulich [Wed, 15 Aug 2018 12:15:30 +0000 (14:15 +0200)]
x86: write to correct variable in parse_pv_l1tf()

Apparently a copy-and-paste mistake.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/hvm/ioreq: MMIO range checking completely ignores direction flag
Paul Durrant [Wed, 15 Aug 2018 12:14:06 +0000 (14:14 +0200)]
x86/hvm/ioreq: MMIO range checking completely ignores direction flag

hvm_select_ioreq_server() is used to route an ioreq to the appropriate
ioreq server. For MMIO this is done by comparing the range of the ioreq
to the ranges registered by the device models of each ioreq server.
Unfortunately the calculation of the range of the ioreq completely ignores
the direction flag and thus may calculate the wrong range for comparison.
Thus the ioreq may either be routed to the wrong server or erroneously
terminated by null_ops.

NOTE: The patch also fixes whitespace in the switch statement to make it
      style compliant.
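
How the direction flag changes the range can be shown with a short model. This Python sketch is an assumption-laden illustration (the function name and parameters are hypothetical), not the code in hvm_select_ioreq_server().

```python
def ioreq_range(addr, size, count, df=False):
    """Return the inclusive (first, last) byte range touched by a rep access.

    addr is the address of the first repetition. With df set, each
    subsequent repetition moves down by size bytes.
    """
    if df:
        first = addr - (count - 1) * size
    else:
        first = addr
    last = first + size * count - 1
    return first, last
```

For four 4-byte repetitions starting at 0x1000, the range extends upwards to 0x100f without the flag, but downwards to 0xff4 with it, so ignoring the flag compares the wrong range against the registered ones.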

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agolibs/foreignmemory: Avoid printing an error for ENOTSUPP
Julien Grall [Mon, 13 Aug 2018 17:33:25 +0000 (18:33 +0100)]
libs/foreignmemory: Avoid printing an error for ENOTSUPP

Resource mapping is not supported on Arm and results in an error message
at every guest boot:

xenforeignmemory: error: ioctl failed: Operation not supported

Hide the error message when errno is ENOTSUPP.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: start pvqemu when 9pfs is requested
Stefano Stabellini [Tue, 14 Aug 2018 22:13:09 +0000 (15:13 -0700)]
libxl: start pvqemu when 9pfs is requested

PV 9pfs requires the PV backend in QEMU. Make sure that libxl knows it.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agopygrub: fix package version
Simon Rowe [Wed, 15 Aug 2018 08:08:07 +0000 (09:08 +0100)]
pygrub: fix package version

Make the version in setup.py agree with PYGRUB_VER.

Signed-off-by: Simon Rowe <simon.rowe@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen: fix stale PVH comment
Roger Pau Monne [Tue, 14 Aug 2018 14:03:24 +0000 (16:03 +0200)]
xen: fix stale PVH comment

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxl.conf: Add global affinity masks
Wei Liu [Tue, 7 Aug 2018 14:35:34 +0000 (15:35 +0100)]
xl.conf: Add global affinity masks

XSA-273 involves one hyperthread being able to use Spectre-like
techniques to "spy" on another thread.  The details are somewhat
complicated, but the upshot is that after all Xen-based mitigations
have been applied:

* PV guests cannot spy on sibling threads
* HVM guests can spy on sibling threads

(NB that for purposes of this vulnerability, PVH and HVM guests are
identical.  Whenever this comment refers to 'HVM', this includes PVH.)

There are many possible mitigations to this, including disabling
hyperthreading entirely.  But another solution would be:

* Specify some cores as PV-only, others as PV or HVM
* Allow HVM guests to only run on thread 0 of the "HVM-or-PV" cores
* Allow PV guests to run on the above cores, as well as any thread of the PV-only cores.

For example, suppose you had 16 threads across 8 cores (0-7).  You
could specify 0-3 as PV-only, and 4-7 as HVM-or-PV.  Then you'd set
the affinity of the HVM guests as follows (binary representation):

0000000010101010

And the affinity of the PV guests as follows:

1111111110101010

In order to make this easy, this patch introduces three "global affinity
masks", placed in xl.conf:

    vm.cpumask
    vm.hvm.cpumask
    vm.pv.cpumask

These are parsed just like the 'cpus' and 'cpus_soft' options in the
per-domain xl configuration files.  The resulting mask is AND-ed with
whatever mask results at the end of the xl configuration file.
`vm.cpumask` would be applied to all guest types, `vm.hvm.cpumask`
would be applied to HVM and PVH guest types, and `vm.pv.cpumask`
would be applied to PV guest types.

The idea would be that to implement the above mask across all your
VMs, you'd simply add the following two lines to the configuration
file:

    vm.hvm.cpumask=8,10,12,14
    vm.pv.cpumask=0-8,10,12,14
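
The AND-ing of masks for the 16-thread example can be sketched as below. This is a Python illustration only: the parser is a hypothetical stand-in that handles just plain CPU numbers and inclusive ranges, not the full syntax accepted by xl.

```python
def parse_cpu_list(spec):
    """Parse a comma-separated list of CPU numbers and ranges into a set.

    Only plain numbers ("8") and inclusive ranges ("0-3") are handled.
    """
    cpus = set()
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            lo, hi = item.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(item))
    return cpus


def apply_global_mask(domain_cpus, global_spec):
    """AND the per-domain CPU set with a global mask given as a list string."""
    return domain_cpus & parse_cpu_list(global_spec)


def to_bits(cpus, nr_cpus):
    """Render a CPU set with CPU 0 leftmost, as in the commit message."""
    return "".join("1" if cpu in cpus else "0" for cpu in range(nr_cpus))
```

A guest whose own configuration allows all 16 threads ends up with exactly the two binary masks shown earlier, while a guest restricted to CPUs 0, 1, 8 and 9 would be left with CPU 8 only under the HVM mask.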

See xl.conf manpage for details.

This is part of XSA-273 / CVE-2018-3646.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: Make "spec-ctrl=no" a global disable of all mitigations
Jan Beulich [Mon, 13 Aug 2018 11:07:23 +0000 (05:07 -0600)]
x86: Make "spec-ctrl=no" a global disable of all mitigations

In order to have a simple and easy to remember means to suppress all the
more or less recent workarounds for hardware vulnerabilities, force
settings not controlled by "spec-ctrl=" also to their original defaults,
unless they've been forced to specific values already by earlier command
line options.

This is part of XSA-273.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/spec-ctrl: Introduce an option to control L1D_FLUSH for HVM HAP guests
Andrew Cooper [Tue, 29 May 2018 17:44:16 +0000 (18:44 +0100)]
x86/spec-ctrl: Introduce an option to control L1D_FLUSH for HVM HAP guests

This mitigation requires up-to-date microcode, and is enabled by default on
affected hardware if available, and is used for HVM guests.

The default for SMT/Hyperthreading is far more complicated to reason about,
not least because we don't know if the user is going to want to run any HVM
guests to begin with.  If an explicit default isn't given, nag the user to
perform a risk assessment and choose an explicit default, and leave other
configuration to the toolstack.

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/msr: Virtualise MSR_FLUSH_CMD for guests
Andrew Cooper [Fri, 13 Apr 2018 15:34:01 +0000 (15:34 +0000)]
x86/msr: Virtualise MSR_FLUSH_CMD for guests

Guests (outside of the nested virt case, which isn't supported yet) don't need
L1D_FLUSH for their L1TF mitigations, but offering/emulating MSR_FLUSH_CMD is
easy and doesn't pose an issue for Xen.

The MSR is offered to HVM guests only.  PV guests attempting to use it would
trap for emulation, and the L1D cache would fill long before the return to
guest context.  As such, PV guests can't make any use of the L1D_FLUSH
functionality.

This is part of XSA-273 / CVE-2018-3646.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/spec-ctrl: CPUID/MSR definitions for L1D_FLUSH
Andrew Cooper [Wed, 28 Mar 2018 14:21:39 +0000 (15:21 +0100)]
x86/spec-ctrl: CPUID/MSR definitions for L1D_FLUSH

This is part of XSA-273 / CVE-2018-3646.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Force a guest into shadow mode when it writes an L1TF-vulnerable PTE
Juergen Gross [Mon, 23 Jul 2018 06:11:40 +0000 (08:11 +0200)]
x86/pv: Force a guest into shadow mode when it writes an L1TF-vulnerable PTE

See the comment in shadow.h for an explanation of L1TF and the safety
consideration of the PTEs.

In the case that CONFIG_SHADOW_PAGING isn't compiled in, crash the domain
instead.  This allows well-behaved PV guests to function, while preventing
L1TF from being exploited.  (Note: PV guest kernels which haven't been updated
with L1TF mitigations will likely be crashed as soon as they try paging a
piece of userspace out to disk.)

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mm: Plumbing to allow any PTE update to fail with -ERESTART
Andrew Cooper [Mon, 23 Jul 2018 06:11:40 +0000 (08:11 +0200)]
x86/mm: Plumbing to allow any PTE update to fail with -ERESTART

Switching to shadow mode is performed in tasklet context.  To facilitate this,
we schedule the tasklet, then create a hypercall continuation to allow the
switch to take place.

As a consequence, the x86 mm code needs to cope with an L1e operation being
continuable.  do_mmu{,ext}_op() may no longer assert that a continuation
doesn't happen on the final iteration.

To handle the arguments correctly on continuation, compat_update_va_mapping*()
may no longer call into their non-compat counterparts.  Move the compat
functions into mm.c rather than exporting __do_update_va_mapping() and
{get,put}_pg_owner(), and fix an unsigned long/int inconsistency with
compat_update_va_mapping_otherdomain().

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/shadow: Infrastructure to force a PV guest into shadow mode
Juergen Gross [Mon, 23 Jul 2018 06:11:40 +0000 (07:11 +0100)]
x86/shadow: Infrastructure to force a PV guest into shadow mode

To mitigate L1TF, we cannot alter an architecturally-legitimate PTE a PV guest
chooses to write, but we can force the PV domain into shadow mode so Xen
controls the PTEs which are reachable by the CPU pagewalk.

Introduce new shadow mode, PG_SH_forced, and a tasklet to perform the
transition.  Later patches will introduce the logic to enable this mode at the
appropriate time.

To simplify vcpu cleanup, make tasklet_kill() idempotent with respect to
tasklet_init(), which involves adding a helper to check for an uninitialised
list head.

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/spec-ctrl: Introduce an option to control L1TF mitigation for PV guests
Andrew Cooper [Mon, 23 Jul 2018 13:46:10 +0000 (13:46 +0000)]
x86/spec-ctrl: Introduce an option to control L1TF mitigation for PV guests

Shadowing a PV guest is only available when shadow paging is compiled in.
When shadow paging isn't available, guests can be crashed instead as
mitigation from Xen's point of view.

Ideally, dom0 would also be potentially-shadowed-by-default, but dom0 has
never been shadowed before, and there are some stability issues under
investigation.

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/spec-ctrl: Calculate safe PTE addresses for L1TF mitigations
Andrew Cooper [Wed, 25 Jul 2018 12:10:19 +0000 (12:10 +0000)]
x86/spec-ctrl: Calculate safe PTE addresses for L1TF mitigations

Safe PTE addresses for L1TF mitigations are ones which are within the L1D
address width (may be wider than reported in CPUID), and above the highest
cacheable RAM/NVDIMM/BAR/etc.

All logic here is best-effort heuristics, which should in practice be fine for
most hardware.  Future work will see about disentangling the SRAT handling
further, as well as having L0 pass this information down to lower levels when
virtualised.

This is part of XSA-273 / CVE-2018-3620.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agotools/oxenstored: Make evaluation order explicit
Christian Lindig [Mon, 13 Aug 2018 16:26:56 +0000 (17:26 +0100)]
tools/oxenstored: Make evaluation order explicit

In Store.path_write(), Path.apply_modify() updates the node_created
reference and both the value of apply_modify() and node_created are
returned by path_write().

At least with OCaml 4.06.1 this leads to the value of node_created being
returned *before* it is updated by apply_modify().  This in turn leads
to the quota for a domain not being updated in Store.write().  Hence, a
guest can create an unlimited number of entries in xenstore.

The fix is to make evaluation order explicit.

This is XSA-272.

Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Rob Hoes <rob.hoes@citrix.com>
7 years agox86/vtx: Fix the checking for unknown/invalid MSR_DEBUGCTL bits
Andrew Cooper [Mon, 13 Aug 2018 16:26:21 +0000 (17:26 +0100)]
x86/vtx: Fix the checking for unknown/invalid MSR_DEBUGCTL bits

The VPMU_MODE_OFF early-exit in vpmu_do_wrmsr() introduced by c/s
11fe998e56 bypasses all reserved bit checking in the general case.  As a
result, a guest can enable BTS when it shouldn't be permitted to, and
lock up the entire host.

With vPMU active (not a security supported configuration, but useful for
debugging), the reserved bit checking is broken, caused by the original
BTS changeset 1a8aa75ed.

From a correctness standpoint, it is not possible to have two different
pieces of code responsible for different parts of value checking, if
there isn't an accumulation of bits which have been checked.  A
practical upshot of this is that a guest can set any value it
wishes (usually resulting in a vmentry failure for bad guest state).

Therefore, fix this by implementing all the reserved bit checking in the
main MSR_DEBUGCTL block, and removing all handling of DEBUGCTL from the
vPMU MSR logic.

This is XSA-269.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoARM: disable grant table v2
Stefano Stabellini [Mon, 13 Aug 2018 16:25:51 +0000 (17:25 +0100)]
ARM: disable grant table v2

It was never expected to work; the implementation is incomplete.

As a side effect, it also prevents guests from triggering a
"BUG_ON(page_get_owner(pg) != d)" in gnttab_unpopulate_status_frames().

This is XSA-268.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/spec-ctrl: Yet more fixes for xpti= parsing
Andrew Cooper [Thu, 9 Aug 2018 16:22:17 +0000 (17:22 +0100)]
x86/spec-ctrl: Yet more fixes for xpti= parsing

As it currently stands, 'xpti=dom0' is indistinguishable from the default
value, which means it will be overridden by ARCH_CAPABILITIES_RDCL_NO on fixed
hardware.

Switch opt_xpti to use -1 as a default like all our other related options, and
clobber it as soon as we have a string to parse.

In addition, 'xpti' alone should be interpreted in its positive boolean form,
rather than resulting in a parse error.

  (XEN) parameter "xpti" has invalid value "", rc=-22!
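
The role of the -1 default can be sketched with a small model. This Python sketch is hypothetical throughout: the flag names, the accepted tokens ("dom0" comes from the text above, "domu" is assumed), and the default chosen on unfixed hardware are illustrative assumptions, not Xen's parser.

```python
UNSET = -1          # sentinel: nothing was given on the command line
XPTI_DOM0 = 1 << 0  # hypothetical flag bits
XPTI_DOMU = 1 << 1


def parse_xpti(arg):
    """arg is None if the option is absent, "" for a bare 'xpti',
    otherwise the text after 'xpti='."""
    if arg is None:
        return UNSET
    if arg == "":
        # Bare option: interpret as the positive boolean form.
        return XPTI_DOM0 | XPTI_DOMU
    opt = 0  # clobber the default as soon as there is a string to parse
    for token in arg.split(","):
        if token == "dom0":
            opt |= XPTI_DOM0
        elif token == "domu":
            opt |= XPTI_DOMU
        else:
            raise ValueError("unrecognised token: " + token)
    return opt


def effective_xpti(opt, rdcl_no):
    """Only an untouched default may be overridden by the hardware check."""
    if opt == UNSET:
        return 0 if rdcl_no else (XPTI_DOM0 | XPTI_DOMU)
    return opt
```

Because an explicit choice never equals the sentinel, an explicit 'xpti=dom0' survives on fixed hardware, whereas the untouched default is still overridden.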

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agotools/libxenctrl: use new xenforeignmemory API to seed grant table
Paul Durrant [Thu, 9 Aug 2018 09:59:41 +0000 (10:59 +0100)]
tools/libxenctrl: use new xenforeignmemory API to seed grant table

A previous patch added support for priv-mapping guest resources directly
(rather than having to foreign-map, which requires P2M modification for
HVM guests).

This patch makes use of the new API to seed the guest grant table unless
the underlying infrastructure (i.e. privcmd) doesn't support it, in which
case the old scheme is used.

NOTE: The call to xc_dom_gnttab_hvm_seed() in hvm_build_set_params() was
      actually unnecessary, as the grant table has already been seeded
      by a prior call to xc_dom_gnttab_init() made by libxl__build_dom().

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agocommon: add a new mappable resource type: XENMEM_resource_grant_table
Paul Durrant [Thu, 9 Aug 2018 09:59:40 +0000 (10:59 +0100)]
common: add a new mappable resource type: XENMEM_resource_grant_table

This patch allows grant table frames to be mapped using the
XENMEM_acquire_resource memory op.

NOTE: This patch expands the on-stack mfn_list array in acquire_resource()
      but it is still small enough to remain on-stack.

NOTE: This patch also removes a bogus comment above the
      grant_to_status_frames() function.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[Rebase over "Explicitly default to gnttab v1 during domain creation"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agocommon/gnttab: Explicitly default to gnttab v1 during domain creation
Andrew Cooper [Wed, 8 Aug 2018 14:54:30 +0000 (15:54 +0100)]
common/gnttab: Explicitly default to gnttab v1 during domain creation

For reasons which appear to be exclusively down to poor review of the grant
table v2 code, a grant table's version field wasn't initialised during
creation.

A number of problems (including XSAs) have occurred in the past trying
to use a grant table which hasn't been properly set up, and various areas of
the code cope with v0 by defaulting to v1.

In particular, the toolstack using GNTTABOP_setup_table to be able to fill in
the store/console grants has a side effect of switching to v1.

In hindsight however, this "fixup if we see 0" is very poor, with a
substantial degree of risk.  Explicitly default to grant table v1 during
domain create, and let the rest of the code work safely in the knowledge that
the version is sensibly set.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86/vlapic: Bugfixes and improvements to vlapic_{read,write}()
Andrew Cooper [Mon, 6 Aug 2018 09:11:00 +0000 (09:11 +0000)]
x86/vlapic: Bugfixes and improvements to vlapic_{read,write}()

Firstly, there is no 'offset' boundary check on the non-32-bit write path
before the call to vlapic_read_aligned(), which allows an attacker to read
beyond the end of vlapic->regs->data[], which is only 1024 bytes long.

However, as the backing memory is a domheap page, and misaligned accesses get
chunked down to single bytes across page boundaries, I can't spot any
XSA-worthy problems which occur from the overrun.

On real hardware, bad accesses don't instantly crash the machine.  Their
behaviour is undefined, but the domain_crash() prohibits sensible testing.
Behave more like other x86 MMIO and terminate bad accesses with appropriate
defaults.

While making these changes, clean up and simplify the smaller-access
handling.  In particular, avoid pointer based mechanisms for 1/2-byte reads so
as to avoid forcing the value to be spilled to the stack.

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-175 (-175)
  function                                     old     new   delta
  vlapic_read                                  211     142     -69
  vlapic_write                                 304     198    -106

Finally, there are a plethora of read/write functions in the vlapic namespace,
so rename these to vlapic_mmio_{read,write}() to make their purpose more
clear.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86: move arch_evtchn_inject to x86 common code
Wei Liu [Tue, 7 Aug 2018 10:00:50 +0000 (11:00 +0100)]
x86: move arch_evtchn_inject to x86 common code

It is not specific to HVM. It just so happens that PV doesn't need
special handling.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: add missing "inline" keyword
Wei Liu [Tue, 7 Aug 2018 10:00:45 +0000 (11:00 +0100)]
x86: add missing "inline" keyword

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: put compat.o and x86_64/compat.o under CONFIG_PV
Wei Liu [Tue, 7 Aug 2018 10:00:44 +0000 (11:00 +0100)]
x86: put compat.o and x86_64/compat.o under CONFIG_PV

They contain code for compat hypercall for PV guests.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agodrop {,acpi_}reserve_bootmem()
Jan Beulich [Fri, 3 Aug 2018 15:40:31 +0000 (17:40 +0200)]
drop {,acpi_}reserve_bootmem()

Both are entirely unused (to be fair, reserve_bootmem() has a use inside
an "#if 0" section in x86's mpparse.c, but if we were to re-enable that
code, it would need doing differently anyway).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/hvm: Drop hvm_sr_handlers initializer
Alexandru Isaila [Fri, 3 Aug 2018 15:39:31 +0000 (17:39 +0200)]
x86/hvm: Drop hvm_sr_handlers initializer

This initializer is flawed and only sets .name of array entry 0
to a non-NULL string.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoautomation: ensure created are not owned as root
Doug Goldstein [Fri, 3 Aug 2018 14:46:49 +0000 (09:46 -0500)]
automation: ensure created are not owned as root

By default the container runs as the root user and since the source tree
is bind mounted into the container, any file is created and owned by the
root user which harms ergonomics when working outside of the container
environment. This maps the root user within the container to the uid of
the user outside of the container so files are not owned by root.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agoautomation: remove dead code from containerize
Doug Goldstein [Fri, 3 Aug 2018 14:46:48 +0000 (09:46 -0500)]
automation: remove dead code from containerize

This is more dead code.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agoautomation: drop container name from containerize
Doug Goldstein [Fri, 3 Aug 2018 14:46:47 +0000 (09:46 -0500)]
automation: drop container name from containerize

This existed to support scripting for a totally unrelated project, and I
failed to remove it when I copied this script, so remove it now. Build
containers for Xen are best as ephemeral environments and should just
utilize Docker's default container naming behavior.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agoautomation: standardize containerize env names
Doug Goldstein [Fri, 3 Aug 2018 14:46:46 +0000 (09:46 -0500)]
automation: standardize containerize env names

Standardized all the environment variable names that the containerize
script uses to start with CONTAINER_

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen: specify support for EXPERT and DEBUG Kconfig options
Stefano Stabellini [Tue, 31 Jul 2018 15:24:01 +0000 (08:24 -0700)]
xen: specify support for EXPERT and DEBUG Kconfig options

Add a clear statement about them, reflecting the current security
support status of Kconfig options (no changes to current policies).

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
CC: George.Dunlap@eu.citrix.com
CC: Ian.Jackson@eu.citrix.com
CC: jbeulich@suse.com
CC: andrew.cooper3@citrix.com
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
---
Changes in v7:
- talk about EXPERT and DEBUG rather than CONFIG_EXPERT and CONFIG_DEBUG

7 years agoxen: add cloc target
Stefano Stabellini [Tue, 31 Jul 2018 15:23:01 +0000 (08:23 -0700)]
xen: add cloc target

Add a Xen build target to count the lines of code of the source files
built. Uses `cloc' to do the job.

With Xen on ARM taking off in embedded, IoT, and automotive, we are
seeing more and more uses of Xen in constrained environments. Users and
system integrators want the smallest Xen and Dom0 configurations. Some
of these deployments require certifications, where you definitely want
the smallest lines of code count. I provided this patch to give us the
lines of code count for that purpose.

Use the .o.d files to account for all the built source files. Generate a
list for the `cloc' utility and invoke `cloc'.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
CC: jbeulich@suse.com
CC: andrew.cooper3@citrix.com
---
Changes in v4:
- use grep regex to get multiple source files from .d files

Changes in v3:
- remove build as dependency for the cloc target

Changes in v2:
- change implementation to use .o.d to find built source files

7 years agoxen: add per-platform defaults for NR_CPUS
Stefano Stabellini [Tue, 31 Jul 2018 15:22:01 +0000 (08:22 -0700)]
xen: add per-platform defaults for NR_CPUS

Add specific per-platform defaults for NR_CPUS. Note that the order of
the defaults matters: they need to go first, otherwise the generic
defaults will be applied.

This is done so that Xen builds customized for a specific hardware
platform can have the right NR_CPUS number.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: JBeulich@suse.com
CC: andrew.cooper3@citrix.com
---

Changes in v6:
- remove useless additional default for ALL

7 years agoarm: add ALL_PLAT, QEMU, Rcar3 and MPSoC configs
Stefano Stabellini [Tue, 31 Jul 2018 15:21:01 +0000 (08:21 -0700)]
arm: add ALL_PLAT, QEMU, Rcar3 and MPSoC configs

Add a "Platform Support" choice with four kconfig options: QEMU, RCAR3,
MPSOC and ALL_PLAT. They enable the required options for their hardware
platform. ALL_PLAT enables all available platforms and it's the default.
It doesn't automatically select any of the related drivers, otherwise
they cannot be disabled. ALL_PLAT is implemented by using hidden options
with default values depending on ALL_PLAT.

In the case of the MPSOC that has a platform file under
arch/arm/platforms/, build the file if MPSOC.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
CC: artem_mygaiev@epam.com
CC: volodymyr_babchuk@epam.com
---
Changes in v8:
- remove QEMU_PLATFORM and RCAR3_PLATFORM that are currently unused
- remove selects from ALL
- rename ALL to ALL_PLAT
- introduce ALL64_PLAT and ALL32_PLAT

Changes in v5:
- turn platform support into a choice
- add ALL

Changes in v4:
- fix GICv3/GICV3
- default y to all options
- build xilinx-zynqmp if MPSOC

7 years agoarm: add a tiny kconfig configuration
Stefano Stabellini [Tue, 31 Jul 2018 15:20:01 +0000 (08:20 -0700)]
arm: add a tiny kconfig configuration

Add a tiny kconfig configuration. Enable only the credit scheduler.
It only carries non-default options (use make menuconfig or make
olddefconfig to produce a complete .config file).

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v7:
- remove NULL because it is still experimental

7 years agoarm: make it possible to disable the SMMU driver
Stefano Stabellini [Tue, 31 Jul 2018 15:19:01 +0000 (08:19 -0700)]
arm: make it possible to disable the SMMU driver

Introduce a Kconfig option for the ARM SMMUv1 and SMMUv2 driver.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
CC: jbeulich@suse.com
---
Changes in v3:
- rename SMMUv2 to ARM_SMMU
- improve help message
- use if ARM

Changes in v2:
- rename HAS_SMMUv2 to SMMUv2
- move SMMUv2 to xen/drivers/passthrough/Kconfig

7 years agomake it possible to enable/disable UART drivers
Stefano Stabellini [Tue, 31 Jul 2018 15:18:01 +0000 (08:18 -0700)]
make it possible to enable/disable UART drivers

All the UART drivers are silent options. Add one-line descriptions so
that they can be de/selected via menuconfig.

Add an x86 dependency to HAS_EHCI: EHCI PCI has not been used on ARM. In
fact, it depends on PCI, and moreover we have drivers for several
embedded UARTs for various ARM boards.

NS16550 remains not selectable on x86.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v4:
- improve commit message
- remove prompt for HAS_EHCI

Changes in v3:
- NS16550 prompt if ARM

Changes in v2:
- make HAS_EHCI depend on x86

7 years agoMake MEM_ACCESS configurable
Stefano Stabellini [Tue, 31 Jul 2018 15:17:01 +0000 (08:17 -0700)]
Make MEM_ACCESS configurable

Select MEM_ACCESS_ALWAYS_ON on x86 to mark that MEM_ACCESS is not
configurable on x86. Avoid selecting it on ARM.
Rename HAS_MEM_ACCESS to MEM_ACCESS everywhere. Add a prompt and a
description to MEM_ACCESS in xen/common/Kconfig.

The result is that the user-visible option is MEM_ACCESS, and it is
configurable only on ARM (disabled by default). At the moment the
arch-specific mem_access code remains enabled on ARM, even with
MEM_ACCESS=n.

The purpose is to reduce code size. The option doesn't depend on EXPERT
because it would be nice to security-support configurations without
MEM_ACCESS, and a non-expert should be able to disable it.

Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: dgdegra@tycho.nsa.gov
CC: andrew.cooper3@citrix.com
CC: George.Dunlap@eu.citrix.com
CC: ian.jackson@eu.citrix.com
CC: jbeulich@suse.com
CC: julien.grall@arm.com
CC: konrad.wilk@oracle.com
CC: sstabellini@kernel.org
CC: tim@xen.org
CC: wei.liu2@citrix.com
---
Changes in v5:
- change MEM_ACCESS_ALWAYS_ON to bool
- change default for MEM_ACCESS, default y if MEM_ACCESS_ALWAYS_ON

Changes in v4:
- remove HAS_MEM_ACCESS
- move MEM_ACCESS_ALWAYS_ON to common
- combine default and bool to def_bool

Changes in v3:
- keep HAS_MEM_ACCESS to mark that an arch can do MEM_ACCESS
- introduce MEM_ACCESS_ALWAYS_ON
- the main MEM_ACCESS option is in xen/common/Kconfig

Changes in v2:
- patch added

7 years agoarm: rename HAS_GICV3 to GICV3
Stefano Stabellini [Tue, 31 Jul 2018 15:16:01 +0000 (08:16 -0700)]
arm: rename HAS_GICV3 to GICV3

HAS_GICV3 has become selectable by the user. To mark the change, rename
the option from HAS_GICV3 to GICV3.

Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v3:
- no changes

Changes in v2:
- patch added

7 years agoarm: make it possible to disable HAS_GICV3
Stefano Stabellini [Tue, 31 Jul 2018 15:15:01 +0000 (08:15 -0700)]
arm: make it possible to disable HAS_GICV3

Today it is a silent option. This patch adds a one line description and
makes it optional.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: George.Dunlap@eu.citrix.com
CC: Ian.Jackson@eu.citrix.com
CC: jbeulich@suse.com
CC: andrew.cooper3@citrix.com
---
Changes in v3:
- remove any changes to MEM_ACCESS
- update commit message

Changes in v2:
- make HAS_GICv3 depend on ARM_64
- remove modifications to ARM_HDLCD kconfig, it has been removed

7 years agox86/altp2m: make sure EPTP_INDEX is up-to-date when enabling #VE
George Dunlap [Thu, 2 Aug 2018 10:12:43 +0000 (12:12 +0200)]
x86/altp2m: make sure EPTP_INDEX is up-to-date when enabling #VE

vmx_vmexit_handler() assumes that if
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set, that the value in
EPTP_INDEX is valid.  Unfortunately, the function which sets this bit
(vmx_vcpu_update_vmfunc_ve) doesn't actually set EPTP_INDEX; it will
only be set the next time vmx_vcpu_update_eptp() is called.

This means that if a vcpu makes a vmexit between these two points, the
EPTP_INDEX it reads will be invalid.  The first time this race happens
for a domain, EPTP_INDEX will most likely be zero, which is the index
for the "host" p2m -- and thus is often correct.  But the second time
this race happens, the value will typically be INVALID_ALTP2M, which
will hit the following BUG:

    BUG_ON(idx >= MAX_ALTP2M);

Worse, if for some reason the current altp2m was *not* `0` during this
window (say, because a toolstack changed the VM to a different view),
then the accounting of active vcpus for an altp2m will be thrown off.

Fix this by always updating EPTP_INDEX to the current altp2m index
when enabling #VE.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/cpuidle: replace a pointless NULL check
Jan Beulich [Thu, 2 Aug 2018 10:12:07 +0000 (12:12 +0200)]
x86/cpuidle: replace a pointless NULL check

The address of an array slot can't be NULL. Instead add a bounds check
to make sure the array indexing is valid (the check is against 2 since
slot zero of the array - corresponding to C0 - is of no interest here).
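
As a rough illustration of why the NULL check was pointless, the sketch
below uses hypothetical names (struct cx_state, struct power_info,
pick_deepest, MAX_STATES); it is not the Xen code, only a minimal model of
the idea that an array slot's address is never NULL and that the index is
what needs validating.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the actual Xen structures. */
#define MAX_STATES 8

struct cx_state { unsigned int type; };

struct power_info {
    unsigned int count;                 /* number of valid slots, including C0 */
    struct cx_state states[MAX_STATES]; /* slot 0 corresponds to C0 */
};

/*
 * Return the deepest usable C-state, or NULL if there is none.
 * &p->states[i] is an address inside the array and can never be NULL,
 * so the only meaningful validation is a bounds check on the index.
 * The lower bound is 2 because slot 0 (C0) is of no interest here.
 */
static struct cx_state *pick_deepest(struct power_info *p)
{
    if (p->count < 2 || p->count > MAX_STATES)
        return NULL;
    return &p->states[p->count - 1];
}
```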

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agovtd: cleanup vtd_set_hwdom_mapping after ia64 removal
Roger Pau Monné [Thu, 2 Aug 2018 10:11:03 +0000 (12:11 +0200)]
vtd: cleanup vtd_set_hwdom_mapping after ia64 removal

Remove the handling for different page sizes now that ia64 is gone.

No functional change.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agoxen: Remove domain_crash_synchronous() completely
Andrew Cooper [Wed, 24 Jan 2018 16:59:42 +0000 (16:59 +0000)]
xen: Remove domain_crash_synchronous() completely

domain_crash_synchronous() is unsafe to use in general as it may leave
spinlocks held, temporary memory allocated, etc.

With domain_crash_synchronous() removed from the ARM code in 4.11, take the
opportunity to remove the infrastructure completely by opencoding the softirq
loop in the remaining callsites, all of which are destined for deletion.

None of these sites are at risk of having a pending ioreq to qemu, which means
that the vcpu_end_shutdown_deferral() isn't necessary.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86/vmx: Avoid using domain_crash_synchronous() in vmx_vmentry_failure()
Andrew Cooper [Tue, 6 Feb 2018 12:01:08 +0000 (12:01 +0000)]
x86/vmx: Avoid using domain_crash_synchronous() in vmx_vmentry_failure()

There is no need for the synchronous variant, as the vmentry failure path can
just return to processing softirqs.

This is in aid of trying to remove domain_crash_synchronous() from the
codebase.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/vmx: Avoid hitting BUG_ON() after EPTP-related domain_crash()
Andrew Cooper [Wed, 1 Aug 2018 11:47:50 +0000 (12:47 +0100)]
x86/vmx: Avoid hitting BUG_ON() after EPTP-related domain_crash()

If the EPTP pointer can't be located in the altp2m list, the domain
is (legitimately) crashed.

Under those circumstances, execution will continue and is guaranteed to hit the
BUG_ON(idx >= MAX_ALTP2M) (unfortunately, just out of context).

Return from vmx_vmexit_handler() after the domain_crash(), which also has the
side effect of reentering the scheduler more promptly.
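
The control-flow change can be sketched as follows. All names here
(MAX_ALTP2M_EXAMPLE, struct domain_example, find_index, handle_exit) are
hypothetical stand-ins, and the assert() merely plays the role of the
BUG_ON(); this is a simplified model, not the vmx_vmexit_handler() code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ALTP2M_EXAMPLE 10   /* hypothetical bound, for illustration only */

struct domain_example { bool crashed; };

/* Hypothetical lookup: returns MAX_ALTP2M_EXAMPLE when eptp is not in the list. */
static unsigned int find_index(const unsigned long *list, unsigned long eptp)
{
    unsigned int i;

    for (i = 0; i < MAX_ALTP2M_EXAMPLE; i++)
        if (list[i] == eptp)
            break;
    return i;
}

/* Returns the index on success, or -1 after marking the domain as crashed. */
static int handle_exit(struct domain_example *d, const unsigned long *list,
                       unsigned long eptp)
{
    unsigned int idx = find_index(list, eptp);

    if (idx >= MAX_ALTP2M_EXAMPLE) {
        d->crashed = true;  /* stands in for domain_crash() */
        return -1;          /* return now; falling through would trip the check below */
    }

    assert(idx < MAX_ALTP2M_EXAMPLE);  /* stands in for BUG_ON(idx >= MAX_ALTP2M) */
    return (int)idx;
}
```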

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agotools/gdbsx: use inttypes.h instead of custom macros
Marek Marczykowski-Górecki [Tue, 31 Jul 2018 20:19:05 +0000 (22:19 +0200)]
tools/gdbsx: use inttypes.h instead of custom macros

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: fix up patch ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/gdbsx: fix 'g' packet response for 64bit guests
Marek Marczykowski-Górecki [Tue, 31 Jul 2018 02:30:42 +0000 (04:30 +0200)]
tools/gdbsx: fix 'g' packet response for 64bit guests

gdb 8.0 fixed bounds checking for 'g' packet (commit
9dc193c3be85aafa60ceff57d3b0430af607b4ce "Check for truncated
registers in process_g_packet"). This revealed that gdbsx did
not properly format the 'g' packet - segment registers and eflags are
expected to be 32-bit fields in the response (according to
gdb/features/i386/64bit-core.xml in gdb sources). The specific error is:

    Truncated register 26 in remote 'g' packet

instead of silently truncating part of the register.

Additionally, it looks like segment registers of 64bit guests were never
reported correctly, because of type mismatch.
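
To make the field-width point concrete, here is a minimal sketch of
hex-encoding registers of different widths for a 'g' reply. The helper
names (put_reg_hex, put_example) are hypothetical and the two-register
layout is only an excerpt for illustration; it is not the gdbsx code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write `width` bytes of `val` to `out` as hex, least significant byte
 * first (x86 is little-endian). Returns the number of characters written.
 * `out` must have room for 2 * width characters; width must be <= 8.
 */
static size_t put_reg_hex(char *out, uint64_t val, size_t width)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < width; i++) {
        uint8_t byte = (uint8_t)(val >> (8 * i));

        out[2 * i]     = digits[byte >> 4];
        out[2 * i + 1] = digits[byte & 0xf];
    }
    return 2 * width;
}

/* Hypothetical excerpt: one 64-bit general register, then one 32-bit segment register. */
static size_t put_example(char *out, uint64_t rax, uint32_t cs)
{
    size_t n = 0;

    n += put_reg_hex(out + n, rax, 8);  /* 64-bit field: 16 hex digits */
    n += put_reg_hex(out + n, cs, 4);   /* 32-bit field: 8 hex digits, not 16 */
    out[n] = '\0';
    return n;
}
```

Emitting 16 digits for a field the receiver expects to be 8 digits wide
shifts every later register in the packet, which is the kind of mismatch
the stricter bounds check exposes.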

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxenstore-client: Add option for raw in-/output
Simon Gaiser [Tue, 31 Jul 2018 02:56:54 +0000 (04:56 +0200)]
xenstore-client: Add option for raw in-/output

Parsing/generating the escape sequences used by xenstore-client is
non-trivial. So make scripting (for use in stubdom) easier by adding a raw
option.

[added man page entries, factor out expand_buffer]
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agodocs: add xenstore-read and xenstore-write man pages
Marek Marczykowski-Górecki [Tue, 31 Jul 2018 02:56:53 +0000 (04:56 +0200)]
docs: add xenstore-read and xenstore-write man pages

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxenconsole: add option to avoid escape sequences in log
Marek Marczykowski-Górecki [Tue, 31 Jul 2018 03:15:32 +0000 (05:15 +0200)]
xenconsole: add option to avoid escape sequences in log

Add --replace-escape option to xenconsoled, which replaces ESC with
'.' in console output written to log file. This makes it slightly safer
to do tail -f on a console output of untrusted guest.
The pty output is unaffected by this option.
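
The substitution itself is simple; a minimal sketch, assuming it is
applied to a buffer just before it is written to the log, could look like
this. The function name replace_escape is hypothetical and this is not
the xenconsoled implementation.

```c
#include <stddef.h>

/* Replace every ESC byte (0x1b) in buf with '.'; returns how many were replaced. */
static size_t replace_escape(char *buf, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\033') {
            buf[i] = '.';
            n++;
        }
    }
    return n;
}
```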

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: move variables into a narrower scope ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen: clean up altp2m op comment
Wei Liu [Wed, 1 Aug 2018 09:03:07 +0000 (10:03 +0100)]
xen: clean up altp2m op comment

Delete trailing spaces and refer to XSM instead of an internal
function in the public header.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agohvm/altp2m: Clarify the proper way to extend the altp2m interface
George Dunlap [Tue, 31 Jul 2018 14:17:21 +0000 (15:17 +0100)]
hvm/altp2m: Clarify the proper way to extend the altp2m interface

The altp2m functionality was originally envisioned to be used in
several different configurations, one of which was a single in-guest
agent that had full operational control of altp2m.  This required the
single hypercall to be an HVMOP rather than a DOMCTL, since HVM guests
are not allowed to make DOMCTLs.  Access to this HVMOP is controlled
by a per-domain HVM_PARAM, and defaults to 'off'.

Exposing the altp2m functionality to the guest was controversial at
the time, but was ultimately accepted.  The fact that altp2m is an
HVMOP rather than a DOMCTL has caused some problems, however, for
those moving forward trying to extend the interface: Extending the
interface even for the 'external' use case now means extending an
HVMOP, which implicitly extends the surface of attack for the
'internal' use case as well.  The result has been that every addition
to this interface has also been controversial.

Settle the controversy once and for all by documenting 1) the purpose
of the altp2m interface, and 2) how to extend it.  In particular:

* Specify that the fully in-guest agent is a target use case

* Specify that all extensions to altp2m functionality should be subops
  of the HVMOP hypercall

* Specify that new subops should be enabled in ALTP2M_mixed mode by
  default, but that this mode has not been evaluated for safety.

Hopefully this will allow the altp2m interface to be developed further
without unnecessary controversy.

Further discussion:

As far as I can tell there are three possible solutions to this
controversy.

A. Remove the 'internal' functionality as a target by converting the
current HVMOP into a DOMCTL.

B. Have two hypercalls -- an HVMOP which contains functionality
expected to be used by the 'internal' agent, and a DOMCTL for
functionality which is expected to be used only by the 'external'
agent.

C. Agree to add all new subops to the current hypercall (HVMOP), even
if we're not sure if they should be exposed to the guest.

I think A is a terrible idea.  Having a single in-guest agent is a
reasonable design choice, and apparently it was even implemented at
some point; we should make it straightforward for someone in the
future to pick up the work if they want to.

I think B is also a bad idea.  The people extending it at the moment
are primarily concerned with the 'external' use case.  There is nobody
around to represent whether new functionality should end up in the
HVMOP or the DOMCTL, which means that by default it will end up in the
DOMCTL.  If it is discovered, afterwards, that the new operations
*would* be safe and useful for the 'internal' use case, then we will
either have to duplicate them inside the HVMOP (which would be
terrible) or move the operation from the DOMCTL to the HVMOP (which
would make coding an agent against several versions a mess).

It just makes more sense to have all the altp2m operations in a single
place, and a simple way to control whether they're available to the
'internal' use case or not.  As such, I am proposing 'C'.

Even within that, we have several options as far as what to do with
the current interface:

C1: Audit the current subops and make a blacklist of subops not
suitable for exposure to the guest.  Future subops should be on the
blacklist unless they have been evaluated as safe for exposure.

C2: Don't blacklist the current subops, but require that all future
subops be blacklisted unless they have been evaluated as safe for
exposure.

C3: Don't blacklist current or future subops for the present; just
document that they need to be evaluated (and some potentially
blacklisted) before being exposed to a guest in a safety-critical
environment.

C1 would be ideal, but there's nobody at present to do the work.
Given that, C3 has been seen as the best solution in discussion.

Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
7 years agox86/xstate: correct logging in handle_xsetbv()
Jan Beulich [Tue, 31 Jul 2018 15:12:35 +0000 (17:12 +0200)]
x86/xstate: correct logging in handle_xsetbv()

Correct a disagreement between text and logged value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agomemory: fix label syntax
Norbert Manthey [Tue, 31 Jul 2018 15:11:36 +0000 (17:11 +0200)]
memory: fix label syntax

When compiling this file with gcc, the compiler happily accepts the
sequence of a label followed by an attribute. However, this sequence does
not follow the gcc documentation. Hence, other compilers might stumble
upon this statement.

To be able to compile Xen with goto-cc (the compiler of the CPROVER tool
suite), the missing semicolon is added in this commit.
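
For illustration, a label carrying an attribute and followed by the
semicolon looks like the sketch below. The function, the label name and
the STOP_ON_NEGATIVE macro are hypothetical; the commit message does not
show the actual label or attribute used in the Xen source.

```c
/*
 * Sum the positive entries of v. The label is only jumped to when
 * STOP_ON_NEGATIVE is defined, so it is marked unused to avoid a warning
 * in the other configuration. The attribute is followed by a semicolon,
 * so the label applies to an empty statement.
 */
static int sum_positive(const int *v, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++) {
#ifdef STOP_ON_NEGATIVE
        if (v[i] < 0)
            goto out;
#endif
        if (v[i] > 0)
            total += v[i];
    }

 out: __attribute__((unused));
    return total;
}
```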

Reported-by: Elizabeth Polgreen <polgreen@amazon.de>
Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoiommu: remove unneeded return from iommu_hwdom_init
Roger Pau Monné [Tue, 31 Jul 2018 08:25:36 +0000 (10:25 +0200)]
iommu: remove unneeded return from iommu_hwdom_init

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/efi: split compiler vs linker support
Roger Pau Monné [Tue, 31 Jul 2018 08:25:06 +0000 (10:25 +0200)]
x86/efi: split compiler vs linker support

So that an ELF binary with support for EFI services will be built when
the compiler supports the MS ABI, regardless of the linker support for
PE.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
7 years agox86/efi: move the logic to detect PE build support
Roger Pau Monné [Tue, 31 Jul 2018 08:24:22 +0000 (10:24 +0200)]
x86/efi: move the logic to detect PE build support

So that it can be used by other components apart from the efi specific
code. By moving the detection code, creating a dummy efi/disabled file
can be avoided.

This is required so that the conditional used to define the efi symbol
in the linker script can be removed and instead the definition of the
efi symbol can be guarded using the preprocessor.

The motivation behind this change is to be able to build Xen using lld
(the LLVM linker), which at least in version 6.0.0 doesn't work
properly with a DEFINED being used in a conditional expression:

ld    -melf_x86_64_fbsd  -T xen.lds -N prelink.o --build-id=sha1 \
    /root/src/xen/xen/common/symbols-dummy.o -o /root/src/xen/xen/.xen-syms.0
ld: error: xen.lds:233: symbol not found: efi

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
7 years agoxen/compiler: introduce a define for weak symbols
Roger Pau Monné [Tue, 31 Jul 2018 08:23:37 +0000 (10:23 +0200)]
xen/compiler: introduce a define for weak symbols

And replace the open-coded versions already in tree. No functional
change.
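
A define of this kind typically wraps the GCC/Clang weak attribute. The
sketch below uses a hypothetical macro name (EXAMPLE_WEAK) and a
hypothetical hook function; the actual name introduced by the commit is
not given in the message.

```c
/* Hypothetical macro name; the define introduced by the commit may differ. */
#define EXAMPLE_WEAK __attribute__((__weak__))

/*
 * Default implementation. A non-weak definition of the same symbol in
 * another object file would take precedence at link time.
 */
EXAMPLE_WEAK int example_arch_hook(void)
{
    return 0;
}
```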

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
7 years agoci: enable builds with CentOS 7.x
Doug Goldstein [Sun, 29 Jul 2018 21:53:16 +0000 (16:53 -0500)]
ci: enable builds with CentOS 7.x

Add the CentOS 7.x images to be used for build testing.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agoautomation: add CentOS 7.x image
Doug Goldstein [Sun, 29 Jul 2018 21:53:15 +0000 (16:53 -0500)]
automation: add CentOS 7.x image

This image will always track the latest CentOS 7.x release. Add this
container to containerize for easy access.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl_qmp: Add a warning to not trust QEMU
Anthony PERARD [Fri, 27 Jul 2018 14:05:48 +0000 (15:05 +0100)]
libxl_qmp: Add a warning to not trust QEMU

... even if it is not the case for the current code.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl_qmp: Move the buffer realloc to the same scope level as read
Anthony PERARD [Fri, 27 Jul 2018 14:05:47 +0000 (15:05 +0100)]
libxl_qmp: Move the buffer realloc to the same scope level as read

In qmp_next(), the inner loop should only try to parse messages from
QMP, if there is more than one.

The handling of the receive buffer ('incomplete') should be done at the
same scope level as read(). It doesn't need to be handled more than once
after a read.

Before this patch, when one message was handled, the inner loop would
restart by adding the 'buffer' into 'incomplete' (after reallocation).
Since 'rd' was not reset, the buffer would be strcat'ed a second time.
After that, the stream from the QMP server would have a syntax error, and
the parser would throw errors.

This is unlikely to happen as the receive buffer is very large. And
receiving two messages in a row is unlikely. In the current case, this
could be an event and a response to a command.
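
The intended structure (append once per read, then loop only over
message extraction) can be sketched as below. The names struct rxbuf,
rx_append and rx_pop are hypothetical, and the sketch assumes messages
are delimited by "\r\n"; it is a simplified model, not the qmp_next()
code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical receive buffer; names are illustrative, not libxl's. */
struct rxbuf { char *data; size_t len; };

/* Append one freshly read chunk. Meant to be called exactly once per read(). */
static int rx_append(struct rxbuf *rx, const char *chunk, size_t n)
{
    char *p = realloc(rx->data, rx->len + n + 1);

    if (!p)
        return -1;
    memcpy(p + rx->len, chunk, n);
    rx->data = p;
    rx->len += n;
    rx->data[rx->len] = '\0';
    return 0;
}

/*
 * Pop one "\r\n"-terminated message into out (outsz must be > 0).
 * Returns 1 if a message was extracted, 0 if nothing complete is buffered.
 * This is the inner loop's only job: it never re-appends the chunk.
 */
static int rx_pop(struct rxbuf *rx, char *out, size_t outsz)
{
    char *end = rx->data ? strstr(rx->data, "\r\n") : NULL;
    size_t msglen;

    if (!end)
        return 0;
    msglen = (size_t)(end - rx->data);
    if (msglen >= outsz)
        msglen = outsz - 1;             /* truncate rather than overflow */
    memcpy(out, rx->data, msglen);
    out[msglen] = '\0';
    /* Shift the remainder, including its terminating NUL, to the front. */
    rx->len -= (size_t)(end + 2 - rx->data);
    memmove(rx->data, end + 2, rx->len + 1);
    return 1;
}
```

A caller would run rx_append() right after each successful read() and
then call rx_pop() in a loop until it returns 0.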

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl_json: fix build with DEBUG_ANSWER
Anthony PERARD [Fri, 27 Jul 2018 14:05:46 +0000 (15:05 +0100)]
libxl_json: fix build with DEBUG_ANSWER

Also replace LIBXL__LOG_DEBUG by XTL_DEBUG, because it's shorter and
more often used in libxl.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl_qmp: Fix use of DEBUG_RECEIVED
Anthony PERARD [Fri, 27 Jul 2018 14:05:45 +0000 (15:05 +0100)]
libxl_qmp: Fix use of DEBUG_RECEIVED

This patch fixes a compilation error with #define DEBUG_RECEIVED in the
macro DEBUG_REPORT_RECEIVED.

  error: field precision specifier ‘.*’ expects argument of type ‘int’, but argument 9 has type ‘ssize_t {aka long int}’
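
The underlying rule is that the precision argument of "%.*s" must be an
int. One common way to satisfy it is a cast, as in the sketch below; the
commit message does not say which approach the patch takes, and the
function name format_received is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>

/* Format at most `len` bytes of `buf` into `out`; `len` is ssize_t, as returned by read(). */
static int format_received(char *out, size_t outsz, const char *buf, ssize_t len)
{
    /* "%.*s" takes its precision as an int, so the ssize_t is cast explicitly. */
    return snprintf(out, outsz, "received: '%.*s'", (int)len, buf);
}
```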

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl_qmp: Documentation of the logic of the QMP client
Anthony PERARD [Fri, 27 Jul 2018 14:05:44 +0000 (15:05 +0100)]
libxl_qmp: Documentation of the logic of the QMP client

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl_event: Fix DEBUG prints
Anthony PERARD [Fri, 27 Jul 2018 14:05:43 +0000 (15:05 +0100)]
libxl_event: Fix DEBUG prints

The libxl__log() call was missing the domid.

The macro DBG is using LIBXL__LOG, which relies on a "gc". Add a GC where
needed.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agoautomation: introduce a script for build test
Wei Liu [Mon, 23 Oct 2017 15:40:57 +0000 (16:40 +0100)]
automation: introduce a script for build test

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
7 years agoautomation: add debian unstable images
Wei Liu [Mon, 23 Jul 2018 16:57:34 +0000 (17:57 +0100)]
automation: add debian unstable images

This will get us the latest toolchain available in Debian.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
7 years agotools/helpers: don't hardcode domain type for dom0 and xenstore domain
Juergen Gross [Wed, 25 Jul 2018 14:50:40 +0000 (16:50 +0200)]
tools/helpers: don't hardcode domain type for dom0 and xenstore domain

Today when setting up a minimal domain configuration file for dom0 and
eventually xenstore-domain the domain type is hardcoded as PV. Change
that by asking the hypervisor for the correct type.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoConfig.mk: update OVMF changeset
Anthony PERARD [Wed, 25 Jul 2018 14:38:23 +0000 (15:38 +0100)]
Config.mk: update OVMF changeset

Simply catching up with upstream.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agodocs: use the make wildcard function instead of find
Roger Pau Monne [Mon, 23 Jul 2018 16:00:32 +0000 (18:00 +0200)]
docs: use the make wildcard function instead of find

The regexp used with find in order to list the man pages doesn't work
with FreeBSD find, so use a wildcard instead. No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoautomation: build with 32 bit stretch
Wei Liu [Mon, 23 Jul 2018 08:04:46 +0000 (09:04 +0100)]
automation: build with 32 bit stretch

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
7 years agoxen: correct DEFCONFIG_LIST Kconfig item
Juergen Gross [Tue, 10 Jul 2018 08:31:51 +0000 (10:31 +0200)]
xen: correct DEFCONFIG_LIST Kconfig item

The default value of DEFCONFIG_LIST is wrong: it should be the value of
the configured ARCH_DEFCONFIG item, not the string "$ARCH_DEFCONFIG".

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
7 years agolibxl: add LIBXL_HAVE_EXTENDED_VKB define
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:40 +0000 (19:07 +0300)]
libxl: add LIBXL_HAVE_EXTENDED_VKB define

LIBXL_HAVE_EXTENDED_VKB define indicates that libxl_device_vkb structure has
extended fields.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: vkb add extended parameters
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:39 +0000 (19:07 +0300)]
libxl: vkb add extended parameters

Parse the following extended parameters and add them to xenstore:
* feature-disable-keyboard
* feature-disable-pointer
* feature-abs-pointer
* feature-multi-touch
* feature-raw-pointer
* width
* height
* multi-touch-width
* multi-touch-height
* multi-touch-num-contacts

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agodocs: add vkb device to xl.cfg and xl
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:38 +0000 (19:07 +0300)]
docs: add vkb device to xl.cfg and xl

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxl: add vkb config parser and CLI
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:37 +0000 (19:07 +0300)]
xl: add vkb config parser and CLI

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: vkb add list and info functions
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:36 +0000 (19:07 +0300)]
libxl: vkb add list and info functions

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: add backend type and id to vkb
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:35 +0000 (19:07 +0300)]
libxl: add backend type and id to vkb

New field backend_type is added to vkb device in order to have QEMU and user
space backends simultaneously. Each vkb backend shall read the appropriate XS
entry and service only its own frontends. Id is a string field which is used
by the backend to identify the frontend.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: move vkb device to libxl_vkb.c
Oleksandr Grytsov [Tue, 17 Jul 2018 16:07:34 +0000 (19:07 +0300)]
libxl: move vkb device to libxl_vkb.c

Logically it is better to move vkb to a separate file, as the vkb device is
used not only by vfb and console.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/pvh: change the order of the iommu initialization for Dom0
Roger Pau Monné [Tue, 24 Jul 2018 13:55:39 +0000 (15:55 +0200)]
x86/pvh: change the order of the iommu initialization for Dom0

The iommu initialization will also create MMIO mappings in the Dom0
p2m, so the paging memory pool needs to be allocated or else iommu
initialization will fail.

Move the call to init the iommu after the Dom0 p2m has been setup in
order to solve this.

Note that issues caused by this wrong ordering have only been seen
when using shadow paging.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/tboot: avoid recursive fault in early boot panic with tboot
Jason Andryuk [Tue, 24 Jul 2018 13:55:07 +0000 (15:55 +0200)]
x86/tboot: avoid recursive fault in early boot panic with tboot

If panic is called before init_idle_domain on a tboot-launched system,
then Xen recursively faults in write_ptbase as seen below.

(XEN)    [<ffff82d080286690>] write_ptbase+0/0x10
(XEN)    [<ffff82d0802c4c3b>] tboot_shutdown+0x6b/0x260
(XEN)    [<ffff82d08029ddac>] machine_restart+0xac/0x2d0
(XEN)    [<ffff82d080286690>] write_ptbase+0/0x10
(XEN)    [<ffff82d0802446c1>] panic+0x111/0x120
(XEN)    [<ffff82d0802a51c1>] do_general_protection+0x171/0x1f0
(XEN)    [<ffff82d080287a82>] mm.c#virt_to_xen_l2e+0x12/0x1c0
(XEN)    [<ffff82d080354720>] x86_64/entry.S#handle_exception_saved+0x66/0xa4
(XEN)    [<ffff82d080286690>] write_ptbase+0/0x10
(XEN)    [<ffff82d0802c4c3b>] tboot_shutdown+0x6b/0x260
(XEN)    [<ffff82d08029ddac>] machine_restart+0xac/0x2d0
(XEN)    [<ffff82d0802446c1>] panic+0x111/0x120
(XEN)    [<ffff82d0803c11a0>] setup.c#bootstrap_map+0/0x11a
(XEN)    [<ffff82d0803b82a0>] flask_op.c#parse_flask_param+0/0xb0
(XEN)    [<ffff82d0803c11a0>] setup.c#bootstrap_map+0/0x11a
(XEN)    [<ffff82d0803b6f6c>] xsm_multiboot_init+0x7c/0xb0
(XEN)    [<ffff82d0803c34bb>] __start_xen+0x1d2b/0x2da0
(XEN)    [<ffff82d0802000f3>] __high_start+0x53/0x60

idle_vcpu[0] is still poisoned with INVALID_VCPU, so write_ptbase faults
dereferencing the pointer.  This fault calls panic and recurses through
the same code path.

If tboot_shutdown is called while idle_vcpu[0] == INVALID_VCPU, then we
are still operating with the initial page tables.  Therefore changing
page tables with write_ptbase is unnecessary.
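
A minimal sketch of the guard described above, written as a standalone model
rather than the actual patch. struct vcpu, the poison object behind
INVALID_VCPU, the ptbase_switches counter and shutdown_switch_tables() are all
stand-ins invented for this example; only the names idle_vcpu, INVALID_VCPU and
write_ptbase come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for Xen's struct vcpu; not the real definition. */
struct vcpu { unsigned long cr3; };

/* Stand-in poison value marking an idle vCPU slot that is not set up yet. */
static struct vcpu poison_vcpu;
#define INVALID_VCPU (&poison_vcpu)

static struct vcpu *idle_vcpu[1] = { INVALID_VCPU };
static unsigned int ptbase_switches; /* counts page table switches (model only) */

/* Model of write_ptbase(): it uses the vCPU, so a poisoned pointer is fatal. */
static void write_ptbase(struct vcpu *v)
{
    assert(v != INVALID_VCPU);
    (void)v->cr3;
    ptbase_switches++;
}

/*
 * Hypothetical helper for the shutdown path: skip the switch while
 * idle_vcpu[0] is still poisoned, because the initial page tables are
 * still in use. Returns true if the page tables were switched.
 */
static bool shutdown_switch_tables(void)
{
    if ( idle_vcpu[0] == INVALID_VCPU )
        return false;
    write_ptbase(idle_vcpu[0]);
    return true;
}
```

With the guard in place, an early call leaves the page tables alone instead of
faulting, and a call made after idle_vcpu[0] has been set up switches them as
before.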

An easy way to reproduce this is to use tboot to launch an XSM-enabled
Xen without an XSM policy.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/vhpet: add support for level triggered interrupts
Roger Pau Monné [Tue, 24 Jul 2018 13:54:18 +0000 (15:54 +0200)]
x86/vhpet: add support for level triggered interrupts

Level triggered interrupts are not an optional feature of HPET, and
must be implemented in order to comply with the HPET specification.

Implement them by adding a callback to the timer which sets the
interrupt bit in the general interrupt status register. Further
interrupts (in case of periodic mode) will not be injected until the
bit is cleared.

In order to reset the interrupts when the status bit is cleared, Xen must
also detect accesses to this register.
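
The behaviour can be modelled roughly as follows. This is an illustrative
sketch, not the vhpet implementation: struct hpet_model, timer_fired() and
isr_write() are hypothetical names, and the model assumes that the guest
clears a status bit by writing 1 to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TIMERS 3u

/* Hypothetical, heavily reduced model of the emulated HPET state. */
struct hpet_model {
    uint64_t isr;                      /* general interrupt status register */
    uint32_t level_mask;               /* bit n set: timer n is level triggered */
    unsigned int injected[NUM_TIMERS]; /* interrupts injected per timer */
};

/*
 * Timer expiry callback. For a level triggered timer the status bit is set,
 * and no further interrupt is injected while it remains set. Returns true
 * if an interrupt was injected.
 */
static bool timer_fired(struct hpet_model *h, unsigned int tn)
{
    assert(tn < NUM_TIMERS);
    if ( h->level_mask & (1u << tn) )
    {
        if ( h->isr & (1ull << tn) )
            return false;              /* still pending: do not inject again */
        h->isr |= 1ull << tn;
    }
    h->injected[tn]++;
    return true;
}

/* Guest write to the status register: bits written as 1 are cleared. */
static void isr_write(struct hpet_model *h, uint64_t val)
{
    h->isr &= ~val;
}
```

In the model, a level triggered timer that fires twice in a row injects only
once until isr_write() clears its bit, while an edge triggered timer injects on
every expiry and never touches the status register.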

While there, convert tn and i in hpet_write to unsigned.

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>