xen.git
9 years agobuild: convert debug to Kconfig
Doug Goldstein [Wed, 8 Jun 2016 12:04:30 +0000 (14:04 +0200)]
build: convert debug to Kconfig

Enabling debug disables NDEBUG, which results in more debug prints.
Xen has a number of debugging options, so place the debug option under
a menu of debugging options to group them all together.
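
As a rough illustration of the NDEBUG relationship described above, here
is a minimal standalone sketch.  It is not Xen code: toy_dprintf,
TOY_DEBUG_BUILD and toy_debug_prints_enabled are hypothetical names used
only for this example.

```c
#include <stdio.h>

#ifndef NDEBUG
/* Debug build: NDEBUG is not defined, so debug prints are compiled in. */
#define TOY_DEBUG_BUILD 1
#define toy_dprintf(...) fprintf(stderr, __VA_ARGS__)
#else
/* Non-debug build: NDEBUG is defined, so debug prints compile to nothing. */
#define TOY_DEBUG_BUILD 0
#define toy_dprintf(...) ((void)0)
#endif

/* Reports whether this translation unit was built with debug prints. */
int toy_debug_prints_enabled(void)
{
    toy_dprintf("debug prints are enabled\n");
    return TOY_DEBUG_BUILD;
}
```

Building the same file with -DNDEBUG would make toy_dprintf() expand to a
no-op, which is the behaviour the debug option toggles.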

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/boot: do not create unwind tables
Daniel Kiper [Wed, 8 Jun 2016 12:01:53 +0000 (14:01 +0200)]
x86/boot: do not create unwind tables

This way the .eh_frame section is not included in *.lnk and *.bin files.
Hence the final size of e.g. reloc.bin is reduced from 408 bytes to
272 bytes, and the file contains only used code and data.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoserial: fix incorrect length of strncmp for dtuart
Jiandi An [Wed, 8 Jun 2016 09:10:23 +0000 (11:10 +0200)]
serial: fix incorrect length of strncmp for dtuart

In serial_parse_handler(), length of strncmp for dtuart should have been
6, not 5.
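
To show why the length matters, here is a minimal standalone sketch.  It
is not the code from serial_parse_handler(); matches_dtuart and
matches_dtuart_short are hypothetical helpers.  Comparing only 5
characters checks just "dtuar", so other strings with that prefix are
accepted too.

```c
#include <string.h>

/* Hypothetical helper: compares all 6 characters of "dtuart". */
int matches_dtuart(const char *conf)
{
    return strncmp(conf, "dtuart", 6) == 0;
}

/* Hypothetical helper using the old length: only "dtuar" is compared. */
int matches_dtuart_short(const char *conf)
{
    return strncmp(conf, "dtuart", 5) == 0;
}
```

With these helpers, "dtuarx" is rejected by matches_dtuart() but
accepted by matches_dtuart_short().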

Signed-off-by: Jiandi An <anjiandi@codeaurora.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoRevert "x86/hvm: add support for pcommit instruction"
Haozhong Zhang [Wed, 8 Jun 2016 09:09:54 +0000 (11:09 +0200)]
Revert "x86/hvm: add support for pcommit instruction"

This reverts commit cfacce340608be5f94ce0c8f424487b63c3d5399.

Platforms supporting Intel NVDIMM are now required to provide
persistency once pmem stores are accepted by the memory subsystem.
This is usually achieved by a platform-level feature known as ADR
(Asynchronous DRAM Refresh) that flushes any memory subsystem write
pending queues on power loss/shutdown. Therefore, the pcommit
instruction, which has not yet shipped on any product (and will not),
is no longer needed and is deprecated.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agox86/mce: handle reserved domain ID in XEN_MC_msrinject
Haozhong Zhang [Wed, 8 Jun 2016 09:08:55 +0000 (11:08 +0200)]
x86/mce: handle reserved domain ID in XEN_MC_msrinject

Commit 26646f3 "x86/mce: translate passed-in GPA to host machine
address" and commit 4ddf474 "tools/xen-mceinj: Pass in GPA when
injecting through MSR_MCI_ADDR" forgot to consider reserved domain
IDs and mistakenly add the MC_MSRINJ_F_GPADDR flag for them, which in
turn causes the bug reported at
http://lists.xenproject.org/archives/html/xen-devel/2016-05/msg02640.html.

This patch removes the MC_MSRINJ_F_GPADDR flag and checks for this when
injecting to reserved domain IDs other than DOMID_SELF, and treats the
passed-in address as a host machine address.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christoph Egger <chegger@amazon.de>
9 years agoOpen Xen 4.8-unstable
Ian Jackson [Tue, 7 Jun 2016 13:49:31 +0000 (14:49 +0100)]
Open Xen 4.8-unstable

* Change version number in README and xen/Makefile to `4.8-unstable'.
* Set `debug ?= y'.
* Set QEMU_UPSTREAM_REVISION to track qemu-xen.git `master'.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
9 years agolibfsimage: replace deprecated readdir_r() with readdir()
Chris Patterson [Fri, 3 Jun 2016 16:50:10 +0000 (12:50 -0400)]
libfsimage: replace deprecated readdir_r() with readdir()

Replace the usage of readdir_r() with readdir() to address a
compilation error under glibc due to the deprecation of readdir_r
for their next release (2.24) [1, 2].

Add new error checking on readdir(), and fail if error occurs.
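
The error checking mentioned above can be sketched as follows.  This is
a standalone illustration, not the libfsimage code; count_dir_entries is
a hypothetical function.  Because readdir() returns NULL both at end of
directory and on error, errno is cleared before each call and inspected
afterwards.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Count the entries in a directory; returns -1 if any step fails. */
int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    int count = 0;
    int failed;

    if (dir == NULL)
        return -1;

    for (;;) {
        errno = 0;          /* so a NULL return can be told apart below */
        ent = readdir(dir);
        if (ent == NULL)
            break;
        count++;
    }

    /* NULL with errno still 0 means end of directory; otherwise an error. */
    failed = (errno != 0);
    closedir(dir);

    return failed ? -1 : count;
}
```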

--

From the GNU libc manual [3]:
"
 It is expected that future versions of POSIX will obsolete readdir_r and
 mandate the level of thread safety for readdir which is provided by the
 GNU C Library and other implementations today.
"

There is a filed bug in the Austin Group Defect Tracker [4]  in which 'dalias'
proposes (in comment 0001632) that:
"
   I would like to propose an alternate solution. For readdir, replace the text:
    "The readdir() function need not be thread-safe."
   with:
    "If multiple threads call the readdir() function with the same directory
    stream argument and without synchronization to preclude simultaneous
    access, then the behavior is undefined."

   With this change, the clunky readdir_r function is no longer needed or
   useful, and should probably be deprecated. As the only reasonable way
   to meet the implementation requirements for readdir is to have the dirent
   buffer in the DIR structure, this change should not require any change to
   existing implementations.
"

[1] https://sourceware.org/ml/libc-alpha/2016-02/msg00093.html
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=19056
[3] https://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html
[4] http://austingroupbugs.net/view.php?id=696

Signed-off-by: Chris Patterson <pattersonc@ainfosec.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: replace deprecated readdir_r() with readdir()
Chris Patterson [Fri, 3 Jun 2016 16:50:09 +0000 (12:50 -0400)]
libxl: replace deprecated readdir_r() with readdir()

Replace the usage of readdir_r() with readdir() to address a
compilation error under glibc due to the deprecation of readdir_r
for their next release (2.24) [1, 2].

Remove code specific to usage of readdir_r which is no longer required,
such as zalloc_dirent().

--

From the GNU libc manual [3]:
"
 It is expected that future versions of POSIX will obsolete readdir_r and
 mandate the level of thread safety for readdir which is provided by the
 GNU C Library and other implementations today.
"

There is a filed bug in the Austin Group Defect Tracker [4]  in which 'dalias'
proposes (in comment 0001632) that:
"
   I would like to propose an alternate solution. For readdir, replace the text:
    "The readdir() function need not be thread-safe."
   with:
    "If multiple threads call the readdir() function with the same directory
    stream argument and without synchronization to preclude simultaneous
    access, then the behavior is undefined."

   With this change, the clunky readdir_r function is no longer needed or
   useful, and should probably be deprecated. As the only reasonable way
   to meet the implementation requirements for readdir is to have the dirent
   buffer in the DIR structure, this change should not require any change to
   existing implementations.
"

[1] https://sourceware.org/ml/libc-alpha/2016-02/msg00093.html
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=19056
[3] https://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html
[4] http://austingroupbugs.net/view.php?id=696

Signed-off-by: Chris Patterson <pattersonc@ainfosec.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: Feature Levelling feature document
Andrew Cooper [Fri, 3 Jun 2016 15:21:46 +0000 (16:21 +0100)]
docs: Feature Levelling feature document

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/cpuid: Calculate a guests xfeature_mask from its featureset
Andrew Cooper [Thu, 2 Jun 2016 11:08:42 +0000 (12:08 +0100)]
x86/cpuid: Calculate a guests xfeature_mask from its featureset

libxc currently performs the xstate calculation for guests, and provides the
information to Xen to be used when satisfying CPUID traps.  (There is further
work planned to improve this arrangement, but the worst a buggy toolstack can
do is make junk appear in the cpuid leaves for the guest.)

dom0 however has no policy constructed for it, and certain fields filter
straight through from hardware.

Linux queries CPUID.0xD[0].{EAX/EDX} alone to choose a setting for %xcr0, which
is a valid action to take, but features such as MPX and PKRU are not supported
for PV guests.  As a result, Linux, using leaked hardware information, fails
to set %xcr0 on newer Skylake hardware with PKRU support, and crashes.

As an interim solution, dynamically calculate the correct xfeature_mask and
xstate_size to report to the guest for CPUID.0xD[0] queries.  This ensures that
domains don't see leaked hardware values, even when no cpuid policy is
provided.

Similarly, CPUID.0xD[1].{ECX/EDX} represents the applicable settings for MSR_XSS.
As Xen doesn't yet support any XSS states in guests, unconditionally zero
them.
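
A minimal sketch of deriving an xfeature_mask from a featureset, under
stated assumptions: struct guest_features and guest_xfeature_mask are
hypothetical and much simpler than Xen's real featureset handling.  Only
the XCR0 bit positions are architectural.  A guest whose featureset
lacks MPX and PKU never sees the corresponding state bits, whatever the
hardware supports.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits (architectural bit positions). */
#define XSTATE_FP       (1ULL << 0)
#define XSTATE_SSE      (1ULL << 1)
#define XSTATE_YMM      (1ULL << 2)
#define XSTATE_BNDREGS  (1ULL << 3)
#define XSTATE_BNDCSR   (1ULL << 4)
#define XSTATE_OPMASK   (1ULL << 5)
#define XSTATE_ZMM      (1ULL << 6)
#define XSTATE_HI_ZMM   (1ULL << 7)
#define XSTATE_PKRU     (1ULL << 9)

/* Hypothetical, simplified stand-in for a guest's featureset. */
struct guest_features {
    bool avx;
    bool mpx;
    bool avx512f;
    bool pku;
};

/* Build the mask from what the guest may use, not from hardware. */
uint64_t guest_xfeature_mask(const struct guest_features *f)
{
    uint64_t mask = XSTATE_FP | XSTATE_SSE;   /* x87 and SSE state */

    if (f->avx)
        mask |= XSTATE_YMM;
    if (f->mpx)
        mask |= XSTATE_BNDREGS | XSTATE_BNDCSR;
    if (f->avx512f)
        mask |= XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM;
    if (f->pku)
        mask |= XSTATE_PKRU;

    return mask;
}
```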

Reported-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Luwei Kang <luwei.kang@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agoVMX: relax incoming BNDCFGS check
Jan Beulich [Fri, 3 Jun 2016 13:28:10 +0000 (15:28 +0200)]
VMX: relax incoming BNDCFGS check

Accepting zero here even when !cpu_has_mpx makes the restore side
symmetric to the save logic (which avoids saving the value if zero),
i.e. makes either side independent of the logic on the other side.
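
The symmetry can be sketched as below.  This is an illustration only:
bndcfgs_should_save and bndcfgs_restore_ok are hypothetical names, and
any further validity checks the real code applies to non-zero values are
deliberately omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Save side: a zero value is simply not saved. */
bool bndcfgs_should_save(uint64_t val)
{
    return val != 0;
}

/*
 * Restore side: zero is always acceptable, even without MPX, so a
 * stream that does carry a zero value restores cleanly.  In this
 * sketch a non-zero value is only accepted when MPX is available.
 */
bool bndcfgs_restore_ok(bool cpu_has_mpx, uint64_t val)
{
    return val == 0 || cpu_has_mpx;
}
```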

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: Don't free p2m->root in p2m_teardown() before it has been allocated
Andrew Cooper [Thu, 2 Jun 2016 13:19:00 +0000 (14:19 +0100)]
xen/arm: Don't free p2m->root in p2m_teardown() before it has been allocated

If p2m_init() didn't complete successfully (e.g. due to VMID
exhaustion), p2m_teardown() is called and unconditionally tries to free
p2m->root before it has been allocated.  free_domheap_pages() doesn't
tolerate NULL pointers.
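
The shape of such a fix can be sketched with a toy model.  None of this
is the Xen code: struct toy_p2m, toy_free_pages and toy_p2m_teardown are
hypothetical, with toy_free_pages standing in for a free routine that
must not be handed NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cut-down p2m: only the root allocation is modelled. */
struct toy_p2m {
    void *root;
};

int toy_pages_freed;

/* Stand-in for a free routine that does not tolerate NULL. */
void toy_free_pages(void *pg)
{
    assert(pg != NULL);
    free(pg);
    toy_pages_freed++;
}

/* Teardown that is safe to call when initialisation never got as far
 * as allocating the root. */
void toy_p2m_teardown(struct toy_p2m *p2m)
{
    if (p2m->root != NULL)
        toy_free_pages(p2m->root);
    p2m->root = NULL;
}
```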

This is XSA-181

Reported-by: Aaron Cornelius <Aaron.Cornelius@dornerworks.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
9 years agotmem: Move bulk of tmem control functions in its own file.
Konrad Rzeszutek Wilk [Mon, 16 May 2016 02:47:01 +0000 (22:47 -0400)]
tmem: Move bulk of tmem control functions in its own file.

The functionality that is related to migration is left inside
tmem.c. The control operations that are in tmem_control, all with
the XEN_SYSCTL_TMEM_OP prefix, are:

DESTROY, FLUSH, FREEZE, THAW, LIST, QUERY_FREEABLE_MB,
SAVE_GET_CLIENT_CAP, SAVE_GET_CLIENT_FLAGS,
SAVE_GET_CLIENT_WEIGHT, SAVE_GET_MAXPOOLS,
SAVE_GET_POOL_FLAGS, SAVE_GET_POOL_NPAGES,
SAVE_GET_POOL_UUID, SAVE_GET_VERSION,
SET_CAP, SET_COMPRESS, SET_WEIGHT

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotmem: Move global_ individual variables in a global structure.
Konrad Rzeszutek Wilk [Mon, 16 May 2016 02:45:56 +0000 (22:45 -0400)]
tmem: Move global_ individual variables in a global structure.

Put them all in one structure to make it easier to
figure out what can be removed. The structure is called
'tmem_global' as it will be eventually non-static.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotmem: Wrap atomic_t in struct tmem_statistics as well.
Konrad Rzeszutek Wilk [Mon, 16 May 2016 02:44:35 +0000 (22:44 -0400)]
tmem: Wrap atomic_t in struct tmem_statistics as well.

The macros atomic_inc_and_max and atomic_dec_and_assert now also
go through 'stats' to access them. Access to pool->pgp_count had
to be open-coded, as the macros would no longer work for it.
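
A sketch of why a macro of this kind stops working for pool->pgp_count
once it prefixes its argument with a structure name.  This is not the
tmem code: it uses C11 atomics, and struct toy_statistics, toy_stats and
toy_atomic_inc_and_max are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical cut-down statistics structure. */
struct toy_statistics {
    atomic_int pgp_count;
    int pgp_count_max;
};

struct toy_statistics toy_stats;

/*
 * Increment toy_stats.<c> and record its high-water mark in
 * toy_stats.<c>_max.  The argument must be a bare member name:
 * passing "pool->pgp_count" would expand to
 * "toy_stats.pool->pgp_count", which names nothing, so such a
 * counter has to be updated with open-coded statements instead.
 * Note the max update is not atomic as a whole in this sketch.
 */
#define toy_atomic_inc_and_max(c) do {                      \
    int now_ = atomic_fetch_add(&toy_stats.c, 1) + 1;       \
    if (now_ > toy_stats.c##_max)                           \
        toy_stats.c##_max = now_;                           \
} while (0)

void toy_pgp_inc(void)
{
    toy_atomic_inc_and_max(pgp_count);
}

void toy_pgp_dec(void)
{
    atomic_fetch_sub(&toy_stats.pgp_count, 1);
}
```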

No functional change.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotmem: Move global stats in a tmem_statistics structure
Konrad Rzeszutek Wilk [Sun, 15 May 2016 19:27:50 +0000 (15:27 -0400)]
tmem: Move global stats in a tmem_statistics structure

And adjust the macro: atomic_inc_and_max to update the structure.

Sadly one entry: pool->pgp_count cannot use this macro anymore
so unroll the macro for this instance.

No functional change. The structure is named 'tmem_stats' as it will
eventually be non-local.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen: Rename of xSplice to livepatch.
Konrad Rzeszutek Wilk [Thu, 2 Jun 2016 00:14:47 +0000 (20:14 -0400)]
xen: Rename of xSplice to livepatch.

Specifically:

s/\.xsplice/\.livepatch/
s/XSPLICE/LIVEPATCH/
s/xsplice/livepatch/
s/livepatch_patch_func/livepatch_func/
s/xSplice/Xen Live Patch/
s/livepatching/livepatch/
s/arch_livepatch_enter/arch_livepatch_quiesce/
s/arch_livepatch_exit/arch_livepatch_revive/

And then modify some of the function arguments
to have two more characters.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Document ~/serial/ correctly
Ian Jackson [Thu, 2 Jun 2016 15:10:32 +0000 (16:10 +0100)]
libxl: Document ~/serial/ correctly

xenstore-paths.markdown talked about ~/device/serial/, but that's not
used.

(It is very wrong for this value, which contains a driver domain
filesystem path, to be in the guest's area of xenstore.  However, it
is only ever created by libxl and read by xenconsoled.  When it is
created, it inherits the read-only permissions of /local/domain/DOMID.
So there is no security bug.)

This is a followup to XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Cleanup: use libxl__backendpath_parse_domid in libxl__device_disk_from_xs_be
Ian Jackson [Thu, 2 Jun 2016 15:10:31 +0000 (16:10 +0100)]
libxl: Cleanup: use libxl__backendpath_parse_domid in libxl__device_disk_from_xs_be

Rather than an open-coded sscanf.  No functional change with correct
input.
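
For illustration, parsing a domid out of a backend path with a single
checked helper might look like the sketch below.  This is not the libxl
implementation: parse_backend_domid is a hypothetical name, and the
"/local/domain/<domid>/..." layout is assumed for the example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: extract the backend domid from a path of the
 * assumed form "/local/domain/<domid>/...".
 * Returns 0 on success, -1 if the path is malformed.
 */
int parse_backend_domid(const char *be_path, unsigned int *domid_out)
{
    unsigned int domid;
    char delim;

    /* Require a number followed by '/', so "/local/domain/5x" fails. */
    if (sscanf(be_path, "/local/domain/%u%c", &domid, &delim) != 2 ||
        delim != '/')
        return -1;

    *domid_out = domid;
    return 0;
}
```

Keeping the check in one helper means every caller rejects malformed
paths the same way, rather than each open-coding its own sscanf.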

This is a followup to XSA-175 and XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Cleanup: Have libxl__alloc_vdev use /libxl
Ian Jackson [Thu, 2 Jun 2016 15:10:30 +0000 (16:10 +0100)]
libxl: Cleanup: Have libxl__alloc_vdev use /libxl

When allocating a vdev for a new disk, look in /libxl/device, rather
than the frontends directory in xenstore.

This is more in line with the other parts of libxl, which ought not to
trust frontends.  In this case, though, there is no security bug prior
to this patch because the frontend is the toolstack domain itself.

If libxl__alloc_vdev were ever changed to take a frontend domain
argument, this patch would fix a latent security bug.

This is a followup to XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend for vusb
Ian Jackson [Thu, 5 May 2016 15:17:26 +0000 (16:17 +0100)]
libxl: Do not trust backend for vusb

Read the type from /libxl, rather than the backend.  (We still trust
the backend for details such as the number of ports, etc.; these are
not a security problem.)

In getinfo, use the computed frontend path, and the incoming domid,
rather than needlessly reading these values from the backend.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: New patch following rebase.

9 years agolibxl: Do not trust backend in channel list
Ian Jackson [Wed, 4 May 2016 15:59:38 +0000 (16:59 +0100)]
libxl: Do not trust backend in channel list

Read the name from /libxl/device.  Pass the /libxl path to
libxl__device_channel_from_xenstore.

This removes the final route by which READ_LIBXLDEV might receive a
backend path.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: Remove be_path variable which is now no longer used.

9 years agolibxl: Do not trust backend for nic in list
Ian Jackson [Wed, 4 May 2016 15:23:57 +0000 (16:23 +0100)]
libxl: Do not trust backend for nic in list

libxl_device_nic_list should use the /libxl path to search for
devices, and for obtaining the device information.

The "type" parameter was always "vif".  Abolish it.  (In any case,
paths in /libxl/device are named after the frontend type which is
constant, not the backend type which might in future vary.)

Abolish a redundant store to pnic->backend_domid.  Before this commit,
that store was not needed because libxl_device_nic_init (called by
libxl__device_nic_from_xenstore) would zero it.  Now it overwrites the
correct backend domid with zero; so remove it.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend for nic in devid_to_device
Ian Jackson [Wed, 4 May 2016 15:20:05 +0000 (16:20 +0100)]
libxl: Do not trust backend for nic in devid_to_device

libxl_devid_to_device_nic should read the information it needs from
the /libxl/device path, not the backend.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend in nic getinfo
Ian Jackson [Tue, 3 May 2016 15:35:21 +0000 (16:35 +0100)]
libxl: Do not trust backend in nic getinfo

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Have READ_LIBXLDEV use libxl_path rather than be_path
Ian Jackson [Tue, 3 May 2016 14:40:18 +0000 (15:40 +0100)]
libxl: Have READ_LIBXLDEV use libxl_path rather than be_path

Fix the just-introduced bug in this macro: now it reads the
trustworthy libxl_path.  Change the variable name in the two functions
(nic and channel) which use it.

Shuffling the bump in the carpet along, we now introduce three new
bugs: the three call sites pass a backend path where a frontend path
is expected.

No functional change.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Rename READ_BACKEND to READ_LIBXLDEV
Ian Jackson [Wed, 4 May 2016 15:07:02 +0000 (16:07 +0100)]
libxl: Rename READ_BACKEND to READ_LIBXLDEV

We are going to want to change all the functions that use READ_BACKEND
to get untrustworthy information from the backend, to use trustworthy
information from /libxl.

This will involve replacing READ_BACKEND, which reads from be_path,
with a similar macro READ_LIBXLDEV, which reads from libxl_path.

The macro name change generates a lot of clutter in the diff.  So we
break it out into this separate patch.  Here, we rename the macro, but
the implementation does not really match the new name.

So, another way to look at this is that we have transformed the bug:
 * All of the backends use READ_BACKEND, which is unsafe
into the new bug:
 * READ_LIBXLDEV actually reads be_path, which is unsafe.

There is no functional change as yet.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Rename libxl__device_{nic,channel}_from_xs_be to _from_xenstore
Ian Jackson [Wed, 4 May 2016 15:18:36 +0000 (16:18 +0100)]
libxl: Rename libxl__device_{nic,channel}_from_xs_be to _from_xenstore

We are going to change these functions to expect, and be passed, a
/libxl path.  So it is wrong that they are called _from_xs_be.

Neither function reads anything which isn't found in both places, so
we can and will change the call sites later.

The only remaining function in libxl called *_from_xs_be relates to
PCI devices, for which the backend domain is hardcoded to 0 throughout
libxl_pci.c.

No functional change.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend for channel in getinfo
Ian Jackson [Wed, 4 May 2016 14:57:10 +0000 (15:57 +0100)]
libxl: Do not trust backend for channel in getinfo

Do not read the frontend path out of the backend.  We have it in our
hand.  Likewise the guest (frontend) domid was one of our parameters (!)

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend for cdrom insert
Ian Jackson [Fri, 29 Apr 2016 18:13:17 +0000 (19:13 +0100)]
libxl: Do not trust backend for cdrom insert

Use the /libxl path where appropriate.  Rename `path' variable to
`be_path' to make sure we caught all the occurrences.

Specifically, when checking that the device still exists, check the
`frontend' value in /libxl, rather than anything in the backend
directory.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend for disk in getinfo
Ian Jackson [Fri, 29 Apr 2016 18:10:45 +0000 (19:10 +0100)]
libxl: Do not trust backend for disk in getinfo

Do not read the frontend path out of the backend.  We have it in our
hand.  Likewise the guest (frontend) domid was one of our parameters (!)

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend for disk; fix driver domain disks list
Ian Jackson [Fri, 29 Apr 2016 17:29:45 +0000 (18:29 +0100)]
libxl: Do not trust backend for disk; fix driver domain disks list

Rework libxl__device_disk_from_xs_be (which takes a backend path) into
libxl__device_disk_from_xenstore (which takes a libxl path).

libxl__device_disk_from_xenstore now finds the backend path itself,
although it doesn't use it any more for most of its functions.  We
rename the variable from be_path to backend_path to make sure we
didn't miss any cases.

All the data collection is now done by reading from the copy in
/libxl.

libxl_device_disk_list and its helper libxl__append_disk_list (which
used to be libxl__append_disk_list_of_type) need extensive rework,
because they now need to specify the /libxl path rather than the
backend path.

To do that they enumerate disks by looking in the appropriate area in
/libxl.  Previously they scanned various of the backend directories in
dom0 (which was broken for driver domains).  It is no longer necessary
to enumerate the various disk backends, because they all use the same
paths in /devices.  libxl__device_disk_from_xenstore will parse the
type out of the backend path, for itself.  (Indeed, it did so before -
the now-gone type parameter to libxl__append_disk_list_of_type wasn't
used other than to construct the directory to list.)

Finally, remove a redundant store to pdisk->backend_domid in
libxl__append_disk_list[_of_type].  Even before this commit, that
store was not needed because libxl_device_disk_init (called by
libxl__device_disk_from_xenstore) would zero it.  Now it overwrites
the correct backend domid with zero; so remove it.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: Also fix up COLO reads, following rebase

9 years agolibxl: Do not trust backend for disk eject vdev
Ian Jackson [Fri, 29 Apr 2016 15:23:35 +0000 (16:23 +0100)]
libxl: Do not trust backend for disk eject vdev

For disk eject, use configured vdev from /libxl, not backend.

The backend directory is writeable by driver domains.  This means that
a malicious driver domain could cause libxl to see a wrong vdev,
confusing the user or the toolstack.

Use the vdev from the /libxl space, rather than the backend.

For convenience, we read the vdev from the /libxl space into the evg
during setup and copy it on each event, rather than reading it afresh
each time (which would in any case involve generating or saving a copy
of the relevant /libxl path).

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: cdrom eject and insert: write to /libxl
Ian Jackson [Fri, 29 Apr 2016 18:15:13 +0000 (19:15 +0100)]
libxl: cdrom eject and insert: write to /libxl

Copy the new type and params values to /libxl, so that the information
in /libxl is kept up to date.

This is needed so that we can return this trustworthy information,
rather than trusting the backend-writeable parts of xenstore.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend for vtpm in getinfo (uuid)
Ian Jackson [Fri, 29 Apr 2016 15:57:14 +0000 (16:57 +0100)]
libxl: Do not trust backend for vtpm in getinfo (uuid)

Use uuid from /libxl, rather than from backend.  I think the backend
is not supposed to change the uuid, since it seems to be set by libxl
during setup.

If in fact the backend is supposed to be able to change the uuid, this
patch needs to be dropped and replaced by a patch which makes the vtpm
uuid lookup tolerate bad or missing data.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend for vtpm in getinfo (except uuid)
Ian Jackson [Fri, 29 Apr 2016 16:18:44 +0000 (17:18 +0100)]
libxl: Do not trust backend for vtpm in getinfo (except uuid)

* Do not check the backend for existence.  We have already read the
  /libxl path so know that the vtpm exists (or is supposed to); if the
  backend doesn't exist then that must be the backend's doing.
* Get the frontend path from the /libxl directory.
* The frontend domid is the guest domid, and does not need to be read
  from xenstore (!)

We still attempt to read the uuid from the backend.  This will be
fixed in the next patch.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust backend in libxl__device_exists
Ian Jackson [Wed, 4 May 2016 14:04:35 +0000 (15:04 +0100)]
libxl: Do not trust backend in libxl__device_exists

To determine whether a device is supposed to exist, look in /libxl,
rather than the backend.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Make copy of every xs backend in /libxl in _generic_add
Ian Jackson [Fri, 29 Apr 2016 15:19:28 +0000 (16:19 +0100)]
libxl: Make copy of every xs backend in /libxl in _generic_add

We want to stop libxl trustingly reading information from the backend
directory (since this is, of course, writeable by the backend, which
might be a semi-trusted driver domain).

In principle it is wrong in current libxl for anything to try to
divine virtual device configuration from xenstore: the JSON domain
config ought to supply that, and xenstore should only tell us which
devices actually exist.

However:

Firstly, there are several existing places where configuration
information is retrieved from xenstore rather than JSON.  We do not
want to reengineer this in a security patch.

Secondly, we want to make a security patch which can be backported to
versions of libxl without the JSON configuration machinery.

So we take the expedient approach of keeping a copy of the
configuration somewhere we trust, namely /libxl.  This is obviously
fairly low-risk, although it does write significantly more keys in
xenstore.

In this patch we make this change in libxl__device_generic_add.  This
is responsible for actually writing the vast majority of device
information to xenstore.  There are a few loose ends which will be
dealt with in a moment.

Likewise, changes to readers to use the new location will appear in
further patches.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend for vusb
Ian Jackson [Thu, 5 May 2016 15:17:18 +0000 (16:17 +0100)]
libxl: Do not trust frontend for vusb

Do not use the frontend directory for enumerating the vusb devices;
since the frontend could delete them, this could result in devices
being lost and not torn down, etc.  Instead, use the /libxl directory
for enumeration.  So:

* Replace vusb_be_from_xs_fe with vusb_be_from_xs_libxl
* Change the call sites
* Change various places that use the dompath to use libxl_dom_path
* Rename some `path' variables appropriately (to spot any missed updates)
* Parse backend domid out of backend path rather than reading it from
  the frontend (several places)

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v3: Whitespace adjustment to parameter list indentation
v2: New patch, following rebase

9 years agolibxl: Do not trust frontend for channel in getinfo
Ian Jackson [Tue, 3 May 2016 16:24:32 +0000 (17:24 +0100)]
libxl: Do not trust frontend for channel in getinfo

libxl_device_channel_getinfo needs to examine devices without trusting
frontend-controlled data.  So:

* Use /libxl to find the backend path.
* Parse the backend path to find the backend domid, rather than
  reading it from the frontend.
* Tolerate FRONTEND/tty vanishing.

Note that there is a strange off-by-one error in the computation of
both fe_path and libxl_path in libxl_device_channel_getinfo: the
incoming channel->devid, which is copied to channelinfo->devid, has +1
applied to calculate the frontend path (and, after this patch, the
libxl path).  I.e., the devid passed to libxl_device_channel_getinfo
must be one less than the actual devid for the device being asked
about.

This is actually a bug which mirrors a bug in
libxl__append_channel_list, which fills in the devids of the channel
devices it finds with sequentially increasing numbers starting at 0.

In the usual case channels have real devids starting at 1 (because
there is the console, which is devid 0, but not a channel).  So these
bugs usually cancel out.
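
The cancellation can be worked through with a toy model.  The three
functions below are hypothetical and only mirror the arithmetic
described above; they are not libxl code.

```c
/* Listing side: devids are filled in by position, starting at 0. */
int toy_listed_devid(int position)
{
    return position;
}

/* Getinfo side: +1 is applied to the incoming devid to build paths. */
int toy_getinfo_lookup_devid(int incoming_devid)
{
    return incoming_devid + 1;
}

/* Real devid of the channel at a list position, assuming channels are
 * numbered contiguously from first_real_devid. */
int toy_real_devid(int position, int first_real_devid)
{
    return first_real_devid + position;
}
```

When the first channel has real devid 1, position 0 is listed as devid 0
and looked up as devid 1, which is the right device.  If the first
channel had any other devid, the two errors would no longer cancel.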

We do not address this problem at this time.  This bug does not have
any security implications.

This patch is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend for channel in list
Ian Jackson [Tue, 3 May 2016 16:01:56 +0000 (17:01 +0100)]
libxl: Do not trust frontend for channel in list

libxl_device_channel_list should not trust frontend-provided data.

So it needs to iterate using the /libxl paths, and read the backend
path out of /libxl.

However, it also filters out pure "consoles", which are channels
without a "name".  But the name was stored only in the frontend
directory, which the frontend can delete.

So store the name in the backend too.  (Ideally we would store it in
/libxl, where the backend can't write to it either, but
libxl__device_console_add does not currently have access to the xenstore
transaction used by libxl__device_generic_add.  Protection against the
backend will come later, in XSA-178.)

Because the libxl paths are defined to be in terms of the frontend
device types, not the backend device types, it is no longer correct
for libxl__append_channel_list to take a type argument.  Abolish this
(with no functional effect).

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend for nic in getinfo
Ian Jackson [Tue, 3 May 2016 15:31:07 +0000 (16:31 +0100)]
libxl: Do not trust frontend for nic in getinfo

libxl_device_nic_getinfo needs to examine devices without trusting
frontend-controlled data.  So:

* Use /libxl to find the backend path.
* Parse the backend path to find the backend domid, rather than
  reading it from the frontend.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend for nic in libxl_devid_to_device_nic
Ian Jackson [Tue, 3 May 2016 14:52:53 +0000 (15:52 +0100)]
libxl: Do not trust frontend for nic in libxl_devid_to_device_nic

Find the backend by reading the pointer in /libxl rather than in the
guest's frontend area.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend for vtpm in getinfo
Ian Jackson [Tue, 3 May 2016 15:00:20 +0000 (16:00 +0100)]
libxl: Do not trust frontend for vtpm in getinfo

libxl_device_vtpm_getinfo needs to examine devices without trusting
frontend-controlled data.  So:

* Use /libxl to find the backend path.
* Parse the backend path to find the backend domid, rather than
  reading it from the frontend.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend for vtpm list
Ian Jackson [Tue, 3 May 2016 14:58:32 +0000 (15:58 +0100)]
libxl: Do not trust frontend for vtpm list

libxl_device_vtpm_list needs to enumerate and identify devices without
trusting frontend-controlled data.  So:

* Use the /libxl path to enumerate vtpms.
* Use the /libxl path to find the corresponding backends.
* Parse the backend path to find the backend domid.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend for disk in getinfo
Ian Jackson [Fri, 29 Apr 2016 18:21:51 +0000 (19:21 +0100)]
libxl: Do not trust frontend for disk in getinfo

* Rename the frontend variable to `fe_path' to check we caught them all
* Read the backend path from /libxl, rather than from the frontend
* Parse the backend domid from the backend path, rather than reading it
  from the frontend (and add the appropriate error path and initialisation)

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend for disk eject event
Ian Jackson [Wed, 27 Apr 2016 15:08:49 +0000 (16:08 +0100)]
libxl: Do not trust frontend for disk eject event

Use the /libxl path for interpreting disk eject watch events: do not
read the backend path out of the frontend.  Instead, use the version
in /libxl.  That avoids us relying on the guest-modifiable
$frontend/backend pointer.

To implement this we store the path
  /libxl/$guest/device/vbd/$devid/backend
in the evgen structure.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend in libxl__device_nextid
Ian Jackson [Wed, 4 May 2016 14:30:32 +0000 (15:30 +0100)]
libxl: Do not trust frontend in libxl__device_nextid

When selecting the devid for a new device, we should look in
/libxl/device for existing devices, not in the frontend area.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Do not trust frontend in libxl__devices_destroy
Ian Jackson [Tue, 3 May 2016 17:39:36 +0000 (18:39 +0100)]
libxl: Do not trust frontend in libxl__devices_destroy

We need to enumerate the devices we have provided to a domain, without
trusting the guest-writeable (or, at least, guest-deletable) frontend
paths.

Instead, enumerate via, and read the backend path from, /libxl.

The console /libxl path is regular, so the special case for console 0
is not relevant any more: /libxl/GUEST/device/console/0 will be found,
and then libxl__device_destroy will DTRT to the right frontend path.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Provide libxl__backendpath_parse_domid
Ian Jackson [Wed, 27 Apr 2016 15:34:19 +0000 (16:34 +0100)]
libxl: Provide libxl__backendpath_parse_domid

Multiple places in libxl need to figure out the backend domid of a
device.  This can be discovered easily by looking at the backend path,
which always starts /local/domain/$backend_domid/.
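
As a rough illustration of the parsing described above, here is a
Python stand-in (not the C helper itself).  The function name mirrors
the commit title, but the error handling and the example paths are
assumptions.

```python
import re

# The backend path is expected to start with /local/domain/<domid>/.
_BACKEND_RE = re.compile(r"^/local/domain/([0-9]+)/")


def backendpath_parse_domid(be_path):
    """Return the backend domid encoded in be_path, or raise ValueError."""
    m = _BACKEND_RE.match(be_path)
    if m is None:
        raise ValueError("not a backend path: %r" % (be_path,))
    return int(m.group(1))


print(backendpath_parse_domid("/local/domain/0/backend/vbd/5/51712"))
```

Because the domid is taken from the path itself, no value has to be
read from the guest-writable frontend area.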

There are no call sites yet.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Record backend/frontend paths in /libxl/$DOMID
Ian Jackson [Wed, 27 Apr 2016 14:50:01 +0000 (15:50 +0100)]
libxl: Record backend/frontend paths in /libxl/$DOMID

This gives us a record of all the backends we have set up for a
domain, which is separate from the frontends in
  /local/domain/$DOMID/device.

In particular:

1. A guest has write permission for the frontend path:
  /local/domain/$DOMID/device/$KIND/$DEVID
which means that the guest can completely delete the frontend.
(They can't recreate it because they don't have write permission
on the containing directory.)

2. A guest has write permission for the backend path recorded in the
frontend, ie, it can write to
  /local/domain/$DOMID/device/$KIND/$DEVID/backend
which means that the guest can break the association between
frontend and backend.

So we can't rely on iterating over the frontends to find all the
backends, or examining a frontend to discover how a device is
configured.

So, have libxl__device_generic_add record the frontend and backend
paths in /libxl/$DOMID/device, and have libxl__device_destroy remove
them again.
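
A small model of the idea, using a dict as a hypothetical stand-in for
xenstore.  The key names under /libxl ("frontend", "backend") and the
backend path layout are assumptions made for illustration; only the
overall scheme comes from the description above.

```python
store = {}  # hypothetical flat model of xenstore: path -> value


def device_generic_add(domid, be_domid, kind, devid):
    """Record a device in the frontend, backend and /libxl areas."""
    fe = f"/local/domain/{domid}/device/{kind}/{devid}"
    be = f"/local/domain/{be_domid}/backend/{kind}/{domid}/{devid}"
    lx = f"/libxl/{domid}/device/{kind}/{devid}"
    store[f"{fe}/backend"] = be    # guest-writable pointer
    store[f"{be}/frontend"] = fe
    store[f"{lx}/frontend"] = fe   # toolstack-owned record
    store[f"{lx}/backend"] = be


def guest_deletes_frontend(domid, kind, devid):
    """Model a guest removing its whole frontend directory."""
    prefix = f"/local/domain/{domid}/device/{kind}/{devid}/"
    for path in [p for p in store if p.startswith(prefix)]:
        del store[path]


def backends_via_frontend(domid):
    """Enumerate backends by trusting the frontend area."""
    prefix = f"/local/domain/{domid}/device/"
    return sorted(v for p, v in store.items()
                  if p.startswith(prefix) and p.endswith("/backend"))


def backends_via_libxl(domid):
    """Enumerate backends from the /libxl record instead."""
    prefix = f"/libxl/{domid}/device/"
    return sorted(v for p, v in store.items()
                  if p.startswith(prefix) and p.endswith("/backend"))


device_generic_add(domid=5, be_domid=0, kind="vbd", devid=51712)
guest_deletes_frontend(5, "vbd", 51712)
print(backends_via_frontend(5))  # the frontend view has lost the device
print(backends_via_libxl(5))     # the /libxl record still finds it
```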

Create the containing directory /libxl/GUEST/device in
libxl__domain_make.  The already existing xs_rm in devices_destroy_cb
will take care of removing it.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: Correct actual path computation (!)
v3: Correct actual path computation - really this time (!)

9 years agoAMD IOMMU: remove currently non-functioning guest iommu feature
Suravee Suthikulpanit [Thu, 2 Jun 2016 11:12:35 +0000 (13:12 +0200)]
AMD IOMMU: remove currently non-functioning guest iommu feature

The guest IOMMU feature is currently not functioning. However,
the current guest_iommu_init() causes an issue when it tries to
register an mmio handler, because it is currently called via the
following code path:

  arch/x86/domain.c: arch_domain_create()
    |- drivers/passthrough/iommu.c: iommu_domain_init()
      |- drivers/passthrough/amd/pci_amd_iommu.c: amd_iommu_domain_init();
        |- drivers/passthrough/amd/iommu_guest.c: guest_iommu_init()

At this point, the hvm_domain_initialised() has not been called.
So register_mmio_handler() in guest_iommu_init() silently fails.

This patch removes the guest IOMMU feature for now until we can properly
support it.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/HVM: don't calculate XSTATE area sizes in software
Jan Beulich [Thu, 2 Jun 2016 07:41:07 +0000 (09:41 +0200)]
x86/HVM: don't calculate XSTATE area sizes in software

Use hardware output instead, bringing HVM behavior in line with PV one
in this regard.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86: flush high xstate CPUID sub-leaves to zero
Jan Beulich [Thu, 2 Jun 2016 07:40:08 +0000 (09:40 +0200)]
x86: flush high xstate CPUID sub-leaves to zero

In line with other recent changes, these should be fully white listed,
requiring us to zero them until they obtain a meaning we support.

Without XSAVE support, all xstate sub-leaves should be zero.

Also move away from checking host XSAVE support - we really ought to
consider the guest flag for that purpose.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: keep PoD target adjustment by memory fudge after reload_domain_config()
Vitaly Kuznetsov [Wed, 3 Feb 2016 15:53:03 +0000 (16:53 +0100)]
libxl: keep PoD target adjustment by memory fudge after reload_domain_config()

Commit 56fb5fd623 ("libxl: adjust PoD target by memory fudge, too")
introduced target_memkb adjustment for HVM PoD domains on create,
wherein the value it wrote to target is always 1MiB lower than the
actual target_memkb.  Unfortunately, on reboot, it is this value which
is read *unmodified* to feed into the next domain creation; from which
1MiB is subtracted *again*.  This means that any guest which reboots
with memory < maxmem will have its memory target decreased by 1MiB on
every boot.

This patch makes it so that when reading target on reboot, we adjust the
value we read *up* by 1MiB, so that the domain will be built with the
appropriate amount of memory and the target will remain the same after
reboot.
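
The drift can be illustrated with a few lines of arithmetic.  This is a
sketch of the behaviour described above, not libxl code; the function
names and the 2 GiB starting target are hypothetical.

```python
FUDGE_KB = 1024  # 1 MiB expressed in KiB


def create(target_memkb):
    """Value written to the target node on create: 1 MiB below the input."""
    return target_memkb - FUDGE_KB


def reboot_before_patch(stored_target):
    """Stored value is fed back unmodified, so the fudge is applied again."""
    return create(stored_target)


def reboot_after_patch(stored_target):
    """Stored value is adjusted up by 1 MiB before being fed back."""
    return create(stored_target + FUDGE_KB)


def after_reboots(reboot, target_memkb, n):
    """Stored target after one create followed by n reboots."""
    stored = create(target_memkb)
    for _ in range(n):
        stored = reboot(stored)
    return stored


start = 2 * 1024 * 1024  # hypothetical 2 GiB target, in KiB
print(start - after_reboots(reboot_before_patch, start, 3))  # KiB lost
print(start - after_reboots(reboot_after_patch, start, 3))
```

Without the adjustment the stored target falls by a further 1 MiB per
reboot; with it, the stored value stays at the initial offset.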

This is still not quite a complete fix, as the 1MiB offset is only
subtracted when creating or rebooting; it is not subtracted when 'xl
set-memory' is called.  But it will prevent any situations where memory
is continually increased or decreased.  A better fix will have to wait
until after the release.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: warn the user that we cannot route SPIs to Dom0 on ACPI
Stefano Stabellini [Wed, 1 Jun 2016 14:38:51 +0000 (15:38 +0100)]
xen/arm: warn the user that we cannot route SPIs to Dom0 on ACPI

as a consequence of 9d77b3c01d1261ce17c10097a1b393f2893ca657 being
reverted.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
9 years agoxen/arm: arm64: Remove MPIDR multiprocessing extensions check
Wei Chen [Tue, 31 May 2016 02:54:14 +0000 (10:54 +0800)]
xen/arm: arm64: Remove MPIDR multiprocessing extensions check

In AArch32, MPIDR bit 31 is defined as multiprocessing extensions bit.
But in AArch64, this bit is always RES1. So the value check for this
bit is no longer necessary in AArch64.

Signed-off-by: Wei Chen <Wei.Chen@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoxen:arm: arm64: Add correct MPIDR_HWID_MASK value for ARM64
Wei Chen [Tue, 31 May 2016 02:54:13 +0000 (10:54 +0800)]
xen:arm: arm64: Add correct MPIDR_HWID_MASK value for ARM64

Currently, MPIDR_HWID_MASK is using the bit definition of AArch32
MPIDR register. But from D7.2.67 of ARM ARM (DDI 0487A.i) we can see
there are 4 levels of affinity on AArch64 whilst AArch32 has only 3.
So, this value is not correct when Xen is running on AArch64.

Now, we use the value 0xff00ffffff for this macro on AArch64. But
neither this value nor its bitwise inverse can be used in a mov
instruction with the encoding of {imm16:shift} or {imms:immr}. So we
have to use ldr to load the bitwise inverse into a register.
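
A short sketch of what the wider mask changes.  The AArch32 mask value
below is an assumption (three 8-bit affinity fields), and the MPIDR
value is made up for illustration; only 0xff00ffffff comes from the
text above.

```python
AARCH32_HWID_MASK = 0x00FFFFFF    # assumed: Aff2, Aff1, Aff0 only
AARCH64_HWID_MASK = 0xFF00FFFFFF  # value quoted above: adds bits [39:32]


def affinity(mpidr, level):
    """Extract affinity field `level` (0..3) from a 64-bit MPIDR value."""
    shift = (0, 8, 16, 32)[level]
    return (mpidr >> shift) & 0xFF


# Hypothetical MPIDR_EL1 value: Aff3=2, Aff2=0, Aff1=1, Aff0=3, bit 31 set.
mpidr = (2 << 32) | (1 << 31) | (1 << 8) | 3

hwid64 = mpidr & AARCH64_HWID_MASK
hwid32 = mpidr & AARCH32_HWID_MASK
inverse = ~AARCH64_HWID_MASK & 0xFFFFFFFFFFFFFFFF  # 64-bit bitwise inverse

print(hex(hwid64), hex(hwid32), hex(inverse))
```

With the narrower mask the level 3 affinity is lost, so two CPUs that
differ only in that field would end up with the same hardware ID.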

The details of mov immediate encoding are listed in C4.2.5 of ARM ARM
(DDI 0487A.i).

Signed-off-by: Wei Chen <Wei.Chen@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoxen/arm: Make AFFINITY_MASK generate correct mask for level3
Wei Chen [Tue, 31 May 2016 02:54:12 +0000 (10:54 +0800)]
xen/arm: Make AFFINITY_MASK generate correct mask for level3

The original affinity shift bits algorithm in AFFINITY_MASK is buggy:
it could not generate the correct affinity shift bits for level3.
The macro MPIDR_LEVEL_SHIFT can calculate level3 affinity shift bits
correctly. We use this macro in AFFINITY_MASK to generate correct
mask for level3.
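
To make the level3 problem concrete, here is a sketch under stated
assumptions: the old computation is assumed to be level * 8, and
MPIDR_LEVEL_SHIFT is assumed to have the Linux-style shape
((1 << level) >> 1) << 3.  Neither definition is quoted in this message,
so treat both as illustrative.

```python
MPIDR_LEVEL_BITS = 8
MASK64 = 0xFFFFFFFFFFFFFFFF


def naive_shift(level):
    """Assumed shape of the old computation: level * 8."""
    return level * MPIDR_LEVEL_BITS


def level_shift(level):
    """Assumed shape of MPIDR_LEVEL_SHIFT: 0, 8, 16, 32 for levels 0..3."""
    return ((1 << level) >> 1) << 3


def affinity_mask(level, shift_fn):
    """Mask keeping the affinity fields at `level` and above."""
    return ~((1 << shift_fn(level)) - 1) & MASK64


for level in range(4):
    print(level, naive_shift(level), level_shift(level))
```

The two computations agree for levels 0 to 2 and differ only at level
3, where the affinity field sits at bit 32 rather than bit 24.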

Signed-off-by: Wei Chen <Wei.Chen@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoxen/arm: Change the variable type of cpu_logical_map to register_t
Wei Chen [Tue, 31 May 2016 02:54:11 +0000 (10:54 +0800)]
xen/arm: Change the variable type of cpu_logical_map to register_t

The cpu_logical_map is used to store CPU hardware ID from MPIDR_EL1 or
from CPU node of DT. Currently, cpu_logical_map uses u32 as
its variable type. This works properly while Xen is running on ARM32,
because the hardware ID is 32 bits wide. When Xen is running on ARM64, the
hardware ID expands to 64 bits and cpu_logical_map will overflow.

Changing the variable type of cpu_logical_map to register_t lets
cpu_logical_map store hardware IDs correctly on both ARM32 and ARM64.
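
A tiny model of the truncation, with made-up hardware IDs.  The helper
names are hypothetical; the masks simply imitate 32-bit and 64-bit
storage.

```python
def store_u32(hwid):
    """Model storing a hardware ID in a 32-bit variable."""
    return hwid & 0xFFFFFFFF


def store_register_t_64(hwid):
    """Model storing it in a 64-bit register-sized variable."""
    return hwid & 0xFFFFFFFFFFFFFFFF


# Hypothetical AArch64 hardware IDs that differ only in bits [39:32].
cpu_a = (0 << 32) | 0x000001
cpu_b = (1 << 32) | 0x000001

print(store_u32(cpu_a) == store_u32(cpu_b))
print(store_register_t_64(cpu_a) == store_register_t_64(cpu_b))
```

In the 32-bit model the two distinct CPUs collapse to the same stored
value; in the 64-bit model they remain distinct.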

Signed-off-by: Wei Chen <Wei.Chen@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoRevert "arm/acpi: Configure SPI interrupt type and route to Dom0 dynamically"
Stefano Stabellini [Wed, 1 Jun 2016 09:54:45 +0000 (10:54 +0100)]
Revert "arm/acpi: Configure SPI interrupt type and route to Dom0 dynamically"

This reverts commit 9d77b3c01d1261ce17c10097a1b393f2893ca657.

The commit is causing a dead loop inside the spinlock function.
Spinlocks in Xen are not recursive. Re-acquiring a spinlock that was
already taken by the calling cpu leads to deadlock. This happens
whenever dom0 writes to GICD regs ISENABLER/ICENABLER.

DOM0 writes GICD_ISENABLER/GICD_ICENABLER
  vgic_v3_distr_common_mmio_write()
    vgic_lock_rank()  -->  acquiring first time
      vgic_enable_irqs()
        route_irq_to_guest()
          gic_route_irq_to_guest()
            vgic_get_target_vcpu()
              vgic_lock_rank()  -->  attempting to acquire the held lock
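
The pattern in the trace above can be reproduced in miniature with any
non-recursive lock.  This sketch uses Python's threading.Lock as a
stand-in for a Xen spinlock; the function names are hypothetical and
only echo the shape of the call chain.

```python
import threading

rank_lock = threading.Lock()  # non-reentrant, like a Xen spinlock


def get_target_vcpu():
    """Inner function that takes the rank lock itself (hypothetical name)."""
    # A real spinlock would spin forever here; use a non-blocking attempt
    # so the sketch reports the problem instead of hanging.
    if not rank_lock.acquire(blocking=False):
        return "would deadlock"
    try:
        return "ok"
    finally:
        rank_lock.release()


def distr_mmio_write():
    """Outer function that already holds the rank lock when calling in."""
    with rank_lock:
        return get_target_vcpu()


print(distr_mmio_write())
print(get_target_vcpu())
```

The inner function is fine when called on its own and only fails when
reached from a caller that already holds the same lock.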

Reported-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoxen/Makefile: quote HOSTCC and HOSTCXX args
Chris Patterson [Tue, 31 May 2016 15:13:52 +0000 (11:13 -0400)]
xen/Makefile: quote HOSTCC and HOSTCXX args

In some cross-compilation environments, the CC/CXX variables may
expand out to more than one argument (to include things
like --sysroot=...).  Quote these to safely pass along.

Signed-off-by: Chris Patterson <pattersonc@ainfosec.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild: fix assembler instruction tests again
Jan Beulich [Tue, 31 May 2016 16:14:22 +0000 (18:14 +0200)]
build: fix assembler instruction tests again

Commit 7fb252bd41 ("build/xen: fix assembler instruction tests") added
$(AFLAGS) here, which results in all of those tests now failing.
Certain items need to be removed for things to work again.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: Document the behavior of XENMAPSPACE_dev_mmio
Julien Grall [Fri, 27 May 2016 16:06:21 +0000 (17:06 +0100)]
xen/arm: Document the behavior of XENMAPSPACE_dev_mmio

Currently, XENMAPSPACE_dev_mmio maps MMIO regions using one of the most
restrictive memory attributes (Device-nGnRE).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agoxen: XENMEM_add_to_physmap_batch: Mark 'foreign_id' as reserved for dev_mmio
Julien Grall [Fri, 27 May 2016 16:06:20 +0000 (17:06 +0100)]
xen: XENMEM_add_to_physmap_batch: Mark 'foreign_id' as reserved for dev_mmio

The field 'foreign_id' is not used when the space is dev_mmio. As the
space is not yet part of the stable ABI, the field is marked as reserved
for future use.

The value should always be 0; other values will return -EOPNOTSUPP.
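
The rule amounts to a single check, sketched below.  The function name
and the symbolic space constant are hypothetical; this is not the
hypervisor's code.

```python
import errno

XENMAPSPACE_DEV_MMIO = "dev_mmio"  # symbolic stand-in, not the ABI value


def check_foreign_id(space, foreign_id):
    """Return 0 if acceptable, or a negative errno value."""
    if space == XENMAPSPACE_DEV_MMIO and foreign_id != 0:
        return -errno.EOPNOTSUPP
    return 0


print(check_foreign_id(XENMAPSPACE_DEV_MMIO, 0))
print(check_foreign_id(XENMAPSPACE_DEV_MMIO, 7) == -errno.EOPNOTSUPP)
```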

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agoRevert "x86/mce: handle reserved domain ID in XEN_MC_msrinject"
Wei Liu [Fri, 27 May 2016 16:16:36 +0000 (17:16 +0100)]
Revert "x86/mce: handle reserved domain ID in XEN_MC_msrinject"

This reverts commit 55dc7f61260f4becc6c5e52a8155a6b8741c03cc.

The get_maintainer.pl script showed Jan as the maintainer so I pushed
this patch. But in fact according to MAINTAINERS file, he's not.  Revert
this patch and wait until a maintainer acks it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/mce: handle reserved domain ID in XEN_MC_msrinject
Haozhong Zhang [Fri, 27 May 2016 13:30:33 +0000 (21:30 +0800)]
x86/mce: handle reserved domain ID in XEN_MC_msrinject

Commit 26646f3 "x86/mce: translate passed-in GPA to host machine
address" and commit 4ddf474 "tools/xen-mceinj: Pass in GPA when
injecting through MSR_MCI_ADDR" forgot to consider reserved domain
IDs and mistakenly added the MC_MSRINJ_F_GPADDR flag for them, which in
turn causes the bug reported at
http://lists.xenproject.org/archives/html/xen-devel/2016-05/msg02640.html.

This patch removes the MC_MSRINJ_F_GPADDR flag and checks this when
injecting to reserved domain IDs except DOMID_SELF, and treats the
passed-in address as a host machine address.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoConfig.mk: update qemu-xen tag
Anthony PERARD [Fri, 27 May 2016 15:00:54 +0000 (16:00 +0100)]
Config.mk: update qemu-xen tag

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/compat: correct SMEP/SMAP NOPs patching
Jan Beulich [Thu, 26 May 2016 16:26:24 +0000 (17:26 +0100)]
x86/compat: correct SMEP/SMAP NOPs patching

Correct the number of single byte NOPs we want to be replaced in case
neither SMEP nor SMAP are available.

Also simplify the expression adding these NOPs - at that location .
equals .Lcr4_orig, and removing that part of the expression fixes a
bogus ".space or fill with negative value, ignored" warning by very old
gas (which actually is what made me look at those constructs again).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/psr: make opt_psr persistent
Chao Peng [Thu, 26 May 2016 02:03:13 +0000 (10:03 +0800)]
x86/psr: make opt_psr persistent

opt_psr is now not only used at booting time but also at runtime.
More specifically, it is used to check CDP switch in psr_cpu_init()
which can potentially be called in CPU hotplug case.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: set XEN_QEMU_CONSOLE_LIMIT for QEMU
Wei Liu [Thu, 26 May 2016 15:11:42 +0000 (16:11 +0100)]
libxl: set XEN_QEMU_CONSOLE_LIMIT for QEMU

XSA-180 provides a patch to QEMU to bodge a QEMU logging issue. We
explicitly set the limit in libxl for 4.7.

Introduce a function for setting the environment variable and call it in
the right places.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: update xl manpage about {block,network}-attach command
Wei Liu [Mon, 23 May 2016 11:07:20 +0000 (12:07 +0100)]
docs: update xl manpage about {block,network}-attach command

State that only attaching PV interfaces is supported.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: Don't call setup_virtual_regions multiple time
Julien Grall [Wed, 25 May 2016 13:14:06 +0000 (14:14 +0100)]
xen/arm: Don't call setup_virtual_regions multiple time

The commit 2aa925be84293b44ad587ed117184ace61b41dd6 "arm/x86: Use struct
virtual_region to do bug, symbol, and (x86) exception tables lookup."
has introduced virtual_region. The call to initialize those regions is
made in init_traps which is called during each CPU bring up.

This results in the same region being registered multiple times, and Xen
crashes when an address is looked up.

This can be fixed by moving the call to setup_virtual_regions directly into
start_xen.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reported-by: Chenxia Zhao <chenxiao.zhao@gmail.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxl: use xstrdup in cpurange_parse
Wei Liu [Wed, 25 May 2016 13:23:56 +0000 (14:23 +0100)]
xl: use xstrdup in cpurange_parse

This ensures buf is always valid when it is passed to strtok_r.

CID: 1291936

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoRTDS: fix another instance of the 'read NOW()' race
Dario Faggioli [Wed, 25 May 2016 12:33:57 +0000 (14:33 +0200)]
RTDS: fix another instance of the 'read NOW()' race

which was overlooked in 779511f4bf5ae ("sched: avoid
races on time values read from NOW()").

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: drop stray const from function return type
Jan Beulich [Tue, 24 May 2016 15:28:19 +0000 (09:28 -0600)]
libxl: drop stray const from function return type

Some compiler versions warn about this, causing the build to fail due
to -Werror.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools: bump library version numbers
Wei Liu [Wed, 18 May 2016 13:35:40 +0000 (14:35 +0100)]
tools: bump library version numbers

The following libraries are checked:
1. libxc, version number bumped
2. libxl, version number bumped
3. libxlu, no development in 4.7 cycle, but depends on libxl, version
   number bumped
4. libs/*, new in 4.7 cycle, version numbers not bumped
5. libxenstore, no development in 4.7 cycle, version number not bumped
6. libxenstat, no development in 4.7 cycle, version number not bumped
7. libvchan, depends on xengnttab library, API changed, version number
   bumped

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: Avoid advertising about device_model_user config option
Anthony PERARD [Tue, 24 May 2016 14:45:36 +0000 (15:45 +0100)]
libxl: Avoid advertising about device_model_user config option

Running QEMU as non-root user is not ready yet, so replace the warning
with a debug message and remove the option from the man page.

Also improve the doc to include more potential issues with running QEMU
as non-root.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoConfig.mk: non-debug build by default
Wei Liu [Mon, 23 May 2016 16:17:57 +0000 (17:17 +0100)]
Config.mk: non-debug build by default

Set debug ?= n in preparation for late RCs and eventual release.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: nic type defaults to vif in hotplug for hvm guest
Wei Liu [Fri, 20 May 2016 17:16:05 +0000 (18:16 +0100)]
libxl: nic type defaults to vif in hotplug for hvm guest

We don't support plugging an emulated nic into an HVM guest.

The "update_json" flag is only set when doing hotplug, so use that as an
indicator in libxl__device_nic_add. The new hotplug flag to _setdefault
function should be false in all other locations.

This then requires saving the nic type in the JSON file, because we don't
want the receiving end to recalculate the nic type.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agosched: avoid races on time values read from NOW()
Dario Faggioli [Mon, 23 May 2016 12:39:51 +0000 (14:39 +0200)]
sched: avoid races on time values read from NOW()

or (even in cases where there is no race, e.g., outside
of Credit2) avoid using a time sample which may be rather
old, and hence stale.

In fact, we should only sample NOW() from _inside_
the critical region within which the value we read is
used. If we don't, in case we have to spin for a while
before entering the region, when actually using it:

 1) we will use something that, at the very least, is
    not really "now", because of the spinning,

 2) if someone else sampled NOW() during a critical
    region protected by the lock we are spinning on,
    and if we compare the two samples when we get
    inside our region, our one will be 'earlier',
    even if we actually arrived later, which is a
    race.
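
Case 2) can be replayed deterministically with a fake clock.  This is a
sketch, not scheduler code: the counter stands in for NOW(), the lock
for the runqueue lock, and the other CPU's critical region is modelled
by a plain function call made while "we" would be spinning.

```python
import itertools
import threading

clock = itertools.count(1)    # fake monotonic clock standing in for NOW()
runq_lock = threading.Lock()  # stands in for the runqueue lock
last_update = 0               # timestamp protected by runq_lock


def update_inside_lock():
    """Another CPU samples the clock inside its critical region."""
    global last_update
    with runq_lock:
        last_update = next(clock)


def delta_with_early_sample():
    """Racy pattern: sample before taking the lock."""
    now = next(clock)
    update_inside_lock()      # the other CPU wins the lock while we spin
    with runq_lock:
        return now - last_update


def delta_with_late_sample():
    """Fixed pattern: sample inside the critical region."""
    update_inside_lock()
    with runq_lock:
        now = next(clock)
        return now - last_update


print(delta_with_early_sample())
print(delta_with_late_sample())
```

A negative delta is the "time went backwards" symptom; sampling inside
the region keeps the delta non-negative.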

In Credit2, we see an instance of 2), in runq_tickle(),
when it is called by csched2_context_saved() as it samples
NOW() before acquiring the runq lock. This makes things
look like the time went backwards, and it confuses the
algorithm (there's even a d2printk() about it, which would
trigger all the time, if enabled).

In RTDS, something similar happens in repl_timer_handler(),
and there's another instance in schedule() (in generic code),
so fix these cases too.

While there, improve csched2_vcpu_wake() and rt_vcpu_wake()
a little as well (removing a pointless initialization, and
moving the sampling a bit closer to its use). These two hunks
entail no further functional changes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs/feature: Tweaks to the feature document template
Andrew Cooper [Tue, 26 Apr 2016 17:34:34 +0000 (18:34 +0100)]
docs/feature: Tweaks to the feature document template

During review of the migration feature doc, some changes were made which
should have been reflected in the template.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs/xsplice: Fix syntax when compiling to pdf with pandoc
Andrew Cooper [Sun, 15 May 2016 13:09:30 +0000 (14:09 +0100)]
docs/xsplice: Fix syntax when compiling to pdf with pandoc

Pandoc (version 1.12.4.2 from Debian Jessie) complains at the embedded \n in
the signature checking paragraph.

  /usr/bin/pandoc --number-sections --toc --standalone misc/xsplice.markdown
    --output pdf/misc/xsplice.pdf
  ! Undefined control sequence.
  l.1085 appended\textasciitilde{}\n

Surround the string in backticks to make it verbatim text.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs/build: Avoid using multi-target pattern rules
Andrew Cooper [Mon, 25 Apr 2016 12:55:12 +0000 (12:55 +0000)]
docs/build: Avoid using multi-target pattern rules

Multi-target non-pattern rules and multi-target pattern rules behave rather
differently.  From `Pattern Intro':

  Pattern rules may have more than one target.  Unlike normal rules, this does
  not act as many different rules with the same prerequisites and commands.
  If a pattern rule has multiple targets, `make' knows that the rule's
  commands are responsible for making all of the targets.  The commands are
  executed only once to make all the targets.

The intended use of the multi-target pattern rules was to avoid repeating the
identical recipe multiple times.  The issue can be demonstrated with the
generation of documentation from pandoc source.

  ./xen.git$ touch docs/features/template.pandoc
  ./xen.git$ make -C docs/
  # Regenerates html/features/template.html
  ./xen.git$ make -C docs/
  # Regenerates txt/features/template.txt
  ./xen.git$ make -C docs/
  # Regenerates pdf/features/template.pdf

To work around this, there need to be three distinct rules, so the execution
of one recipe doesn't short-circuit the others.  To avoid copy&paste
duplication, introduce a metarule, and evaluate it for each document target.

As $(PANDOC) is used to generate documentation from different source types,
the metarule can be extended to also encompass the rule to create pdfs from
markdown.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoRevert "Config.mk: update ovmf changeset"
Wei Liu [Mon, 23 May 2016 10:11:39 +0000 (11:11 +0100)]
Revert "Config.mk: update ovmf changeset"

This reverts commit 1542efcea893df874b13b1ea78101e1ff6a55c41. It fails
consistently on our Debian HVM test when the VM has more than 4G of
memory.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: p2m: Release the p2m lock before undoing the mappings
Julien Grall [Fri, 20 May 2016 13:37:42 +0000 (14:37 +0100)]
xen/arm: p2m: Release the p2m lock before undoing the mappings

Since commit 4b25423a "arch/arm: unmap partially-mapped memory regions",
Xen has been undoing the P2M mappings when an error occurred during
insertion or memory allocation.

This is done by calling apply_p2m_changes recursively.  However, the
second call is made with the p2m lock taken, which will result in a
deadlock for the current processor.

The p2m lock is there to protect against 2 threads concurrently
modifying the page tables. However, it does not guarantee the ordering
of the changes. I.e. if 2 threads request changes on regions that
overlap, then the result is undefined.

Therefore it is fine to move the recursive call to undo the changes
after the lock is released.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Wei Chen <Wei.Chen@arm.com>
9 years agoxen/arm: p2m: apply_p2m_changes: Do not undo more than necessary
Julien Grall [Fri, 20 May 2016 13:37:41 +0000 (14:37 +0100)]
xen/arm: p2m: apply_p2m_changes: Do not undo more than necessary

Since commit 4b25423a "arch/arm: unmap partially-mapped memory regions",
Xen has been undoing the P2M mappings when an error occurred during
insertion or memory allocation.

The function apply_p2m_changes can work with a region not aligned to a
block size (2MB, 1G) or page size (4K). The mapping will be done by
splitting the region into a set of regions aligned to the sizes supported
by the page table.

The mapping of a region could fail when it is not possible to allocate
memory for an intermediate table (i.e. a new one, or when shattering a block).

When the mapping is undone, the end of the region is computed using the
base address of the current region and the size of the failing level.
However the failing level may not be the leaf one, therefore unrelated
entries will be removed.

Fix it by removing the mapping from the start address up to the last
region that has been successfully mapped.
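
One possible reading of the description, sketched with made-up
addresses.  The function names, the level numbering and the failure
scenario are all assumptions; the point is only how the end of the undo
range differs between the two computations.

```python
LEVEL_SIZE = {1: 1 << 30, 2: 2 << 20, 3: 4 << 10}  # 1G, 2MB, 4K


def undo_range_before(start, addr, failing_level):
    """Old end: base of the current region plus the failing level's size."""
    return (start, addr + LEVEL_SIZE[failing_level])


def undo_range_after(start, addr):
    """New end: stop at the last region that was successfully mapped."""
    return (start, addr)


# Hypothetical failure: mapping 4K pages from `start`, and allocating an
# intermediate table fails at level 2 when the walk reaches `addr`.
start = 0x40000000
addr = 0x40003000  # pages in [start, addr) were mapped successfully

print([hex(x) for x in undo_range_before(start, addr, 2)])
print([hex(x) for x in undo_range_after(start, addr)])
```

In this scenario the old computation would reach 2MB past the failing
address, covering entries that this call never created.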

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
9 years ago libxl: consolidate casting to xc psr type to a function
Wei Liu [Wed, 18 May 2016 15:19:45 +0000 (16:19 +0100)]
libxl: consolidate casting to xc psr type to a function

In commit 31268fea (libxl: fix passing the type argument to xc_psr_*),
casting to xc psr type was done at each call site.

This patch consolidates the casting into a single function to avoid
casting at each conversion point, which keeps each call site more type
safe.
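
The pattern can be sketched as follows. All identifiers here are
hypothetical (they are not the real libxl or libxc names), and the
sketch assumes the two enumerations share numeric values:

```c
#include <assert.h>

/* Hypothetical stand-ins for the libxl-side and xc-side enum types. */
typedef enum {
    DEMO_LIBXL_PSR_TYPE_A = 1,
    DEMO_LIBXL_PSR_TYPE_B = 2
} demo_libxl_psr_type;

typedef enum {
    DEMO_XC_PSR_TYPE_A = 1,
    DEMO_XC_PSR_TYPE_B = 2
} demo_xc_psr_type;

/* The single cast below is only valid while the values stay in sync. */
static_assert((int)DEMO_LIBXL_PSR_TYPE_A == (int)DEMO_XC_PSR_TYPE_A,
              "enum values must match");
static_assert((int)DEMO_LIBXL_PSR_TYPE_B == (int)DEMO_XC_PSR_TYPE_B,
              "enum values must match");

/* The one place where the cast lives. */
static inline demo_xc_psr_type demo_psr_type_to_xc(demo_libxl_psr_type type)
{
    return (demo_xc_psr_type)type;
}

/* Hypothetical xc-side consumer that takes the xc enum type. */
static int demo_xc_call(demo_xc_psr_type type)
{
    return (int)type;
}
```

A call site then reads demo_xc_call(demo_psr_type_to_xc(type)), so the
compiler still sees the libxl enum type at the conversion point instead
of an open-coded cast.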

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago xenalyze: fix a spurious newline
Dario Faggioli [Thu, 19 May 2016 04:04:55 +0000 (06:04 +0200)]
xenalyze: fix a spurious newline

in dump mode, when tracing context switches.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago x86emul: suppress writeback upon unsuccessful MMX/SSE/AVX insn emulation
Jan Beulich [Thu, 19 May 2016 10:06:33 +0000 (12:06 +0200)]
x86emul: suppress writeback upon unsuccessful MMX/SSE/AVX insn emulation

This in particular prevents updating guest IP when handling the retry
needed to forward the memory access to qemu.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago xen/nested_p2m: Don't walk EPT tables with a regular PT walker
Andrew Cooper [Tue, 12 Apr 2016 18:57:35 +0000 (19:57 +0100)]
xen/nested_p2m: Don't walk EPT tables with a regular PT walker

hostmode->p2m_ga_to_gfn() is a plain PT walker, and is not appropriate for a
general L1 p2m walk.  It is fine for AMD as NPT shares the same format as
normal pagetables.  For Intel EPT however, it is wrong.

The translation ends up correct (as the formats are sufficiently similar), but
the control bits in the lower 12 bits differ in meaning.  A plain PT walker sets
A/D bits (bits 5 and 6) as it walks, but in EPT tables, these are the top bit
of EMT (caching type) and IPAT respectively.  This in turn causes problems when the EPT
tables are subsequently used.
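
The bit overlap can be illustrated with a small sketch. The macro and
function names are made up for this example; the bit positions are the
architectural ones (A = bit 5 and D = bit 6 in normal pagetables, EMT =
bits 5:3 and IPAT = bit 6 in EPT entries):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Normal x86 pagetable entry control bits. */
#define PTE_ACCESSED  (UINT64_C(1) << 5)
#define PTE_DIRTY     (UINT64_C(1) << 6)

/* EPT entry fields occupying the same positions. */
#define EPT_EMT_SHIFT 3
#define EPT_EMT_MASK  (UINT64_C(7) << EPT_EMT_SHIFT)
#define EPT_IPAT      (UINT64_C(1) << 6)

static_assert((PTE_ACCESSED & EPT_EMT_MASK) != 0, "A bit lies inside EMT");
static_assert(PTE_DIRTY == EPT_IPAT, "D bit is IPAT in an EPT entry");

static unsigned int ept_emt(uint64_t e) { return (e & EPT_EMT_MASK) >> EPT_EMT_SHIFT; }
static bool ept_ipat(uint64_t e)        { return (e & EPT_IPAT) != 0; }

/* What a plain PT walker does to every entry it traverses. */
static uint64_t plain_walker_touch(uint64_t entry, bool is_write)
{
    entry |= PTE_ACCESSED;
    if (is_write)
        entry |= PTE_DIRTY;
    return entry;
}
```

For an EPT entry with EMT 0 (uncacheable) and IPAT clear, a walk for a
write turns EMT into 4 (write-through) and sets IPAT, so the caching
behaviour changes even though the translated address does not.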

Replace hostmode->p2m_ga_to_gfn() with nestedhap_walk_L1_p2m() in
paging_gva_to_gfn(), which is the correct function for the task.  This
involves making nestedhap_walk_L1_p2m() non-static, and adding
vmx_vmcs_enter/exit() pairs to nvmx_hap_walk_L1_p2m() as it is now reachable
from contexts other than v == current.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago QEMU_TAG update
Ian Jackson [Wed, 18 May 2016 14:46:13 +0000 (15:46 +0100)]
QEMU_TAG update

9 years ago Config.mk: update qemu-xen tag
Anthony PERARD [Wed, 18 May 2016 11:35:22 +0000 (12:35 +0100)]
Config.mk: update qemu-xen tag

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago Config.mk: update ovmf changeset
Wei Liu [Wed, 18 May 2016 10:48:25 +0000 (11:48 +0100)]
Config.mk: update ovmf changeset

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
9 years ago xen/device-tree: Do not remap IRQs for secondary IRQ controllers
Edgar E. Iglesias [Tue, 17 May 2016 12:15:50 +0000 (14:15 +0200)]
xen/device-tree: Do not remap IRQs for secondary IRQ controllers

Do not remap IRQs connected to secondary interrupt controllers.
These IRQs have no meaning to us until they connect to the
primary controller.

Secondary IRQ controllers will at some point connect to the
primary controller (possibly via other IRQ controllers). We
map the IRQs at that last connection point.

Reviewed-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
9 years ago x86/cpuid: Avoid unconditionally clobbering ITSC for guests
Andrew Cooper [Fri, 13 May 2016 18:38:41 +0000 (19:38 +0100)]
x86/cpuid: Avoid unconditionally clobbering ITSC for guests

In general, Invariant TSC is not a feature which can be advertised to guests,
because it cannot be guaranteed across migration.  domain_cpuid() goes so far as
to deliberately clobber the feature flag under a number of circumstances.

Because ITSC is absent from the static {pv,hvm}_featureset masks, c/s b648feff
"xen/x86: Improvements to in-hypervisor cpuid sanity checks" caused ITSC to be
unconditionally masked out.

As an interim solution, include the host's idea of ITSC along with the static
{pv,hvm}_featureset when restricting the guest's view of features.  This causes
the hardware domain, and VMs explicitly configured with ITSC and no-migrate to
be offered ITSC (subject to hardware availability).
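
A minimal sketch of the masking step only, under the assumption that
the restriction is a bitwise AND against an allowed mask. The function
name and parameters are hypothetical; the ITSC position (CPUID leaf
0x80000007, EDX bit 8) is architectural:

```c
#include <assert.h>
#include <stdint.h>

/* Invariant TSC is reported in CPUID leaf 0x80000007, EDX bit 8. */
#define CPUID_E7D_ITSC (UINT32_C(1) << 8)
static_assert(CPUID_E7D_ITSC == 0x100, "ITSC is EDX bit 8");

/*
 * requested:   features the toolstack asked for (hypothetical input)
 * static_mask: static {pv,hvm} featureset word, which lacks ITSC
 * host_e7d:    what the host hardware reports for the same word
 */
static uint32_t restrict_e7d(uint32_t requested, uint32_t static_mask,
                             uint32_t host_e7d)
{
    /* Allow the static features plus ITSC if the host has it. */
    uint32_t allowed = static_mask | (host_e7d & CPUID_E7D_ITSC);

    return requested & allowed;
}
```

ITSC then survives the restriction only when it was both requested and
present on the host. Any later clobbering, such as the one in
domain_cpuid(), is outside this sketch.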

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago x86: make SMEP/SMAP suppression tolerate NMI/MCE at the "wrong" time
Jan Beulich [Tue, 17 May 2016 14:42:15 +0000 (16:42 +0200)]
x86: make SMEP/SMAP suppression tolerate NMI/MCE at the "wrong" time

There is one instruction boundary where any kind of interruption would
break the assumptions cr4_pv32_restore's debug mode checking makes on
the correlation between the CR4 register value and its in-memory cache.
Correct this (see the code comment) even in non-debug mode, or else
a subsequent cr4_pv32_restore would also be misguided into thinking the
features are enabled when they really aren't.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago x86: refine debugging of SMEP/SMAP fix
Jan Beulich [Tue, 17 May 2016 14:41:35 +0000 (16:41 +0200)]
x86: refine debugging of SMEP/SMAP fix

Instead of just latching cr4_pv32_mask into %rdx, correct the wrong
value found in %cr4 (to avoid triggering another BUG). The value left
in %rdx should be sufficient for deducing cr4_pv32_mask from the
register dump.

Also, use XEN_CR4_PV32_BITS in one more place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago x86/mm: fully honor PS bits in guest page table walks
Jan Beulich [Tue, 17 May 2016 12:41:14 +0000 (14:41 +0200)]
x86/mm: fully honor PS bits in guest page table walks

In L4 entries the PS bit is currently unconditionally reserved (and hence
should, when set, always result in a reserved bit page fault), and in L3
entries it is reserved on hardware not supporting 1GB pages (and hence
should, when set, similarly cause a reserved bit page fault on such hardware).
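
The rule can be written down as a small predicate. This is a sketch
with hypothetical names, not the code of the guest walker, and it only
covers levels 2 to 4 of a 4-level walk (bit 7 of an L1 entry is PAT,
not PS):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PS (UINT64_C(1) << 7)   /* page size bit in L2..L4 entries */

/* Is PS a reserved bit for an entry at this level on this hardware? */
static bool ps_is_reserved(unsigned int level, bool has_1gb_pages)
{
    assert(level >= 2 && level <= 4);

    switch (level) {
    case 4:  return true;            /* always reserved in L4 entries */
    case 3:  return !has_1gb_pages;  /* reserved without 1GB page support */
    default: return false;           /* L2: PS selects a 2MB mapping */
    }
}

/* Should the walk raise a reserved bit page fault because of PS? */
static bool ps_causes_rsvd_fault(uint64_t entry, unsigned int level,
                                 bool has_1gb_pages)
{
    return (entry & PTE_PS) && ps_is_reserved(level, has_1gb_pages);
}
```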

This is CVE-2016-4480 / XSA-176.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago xen/arm: mm: fix nr_second calculation in setup_frametable_mappings
Peng Fan [Thu, 12 May 2016 13:03:08 +0000 (21:03 +0800)]
xen/arm: mm: fix nr_second calculation in setup_frametable_mappings

On ARM64, "frametable_size >> SECOND_SHIFT" computes the number
of second level entries, not the number of second level pages.

"ROUNDUP(frametable_size, FIRST_SIZE) >> FIRST_SHIFT" which computes
the number of first level entries (i.e. the number of second level pages),
is the correct expression to use.
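
A worked comparison of the two expressions, as a sketch. The shift
values assume a 4KB granule (one second level entry covers 2MB, one
first level entry covers 1GB), and the function names are made up for
the example:

```c
#include <assert.h>
#include <stdint.h>

#define SECOND_SHIFT 21                          /* 2MB per second level entry */
#define FIRST_SHIFT  30                          /* 1GB per first level entry */
#define FIRST_SIZE   (UINT64_C(1) << FIRST_SHIFT)

/* Round x up to a multiple of the power-of-two alignment a. */
static uint64_t roundup_pow2(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Old expression: counts second level ENTRIES. */
static uint64_t nr_second_old(uint64_t frametable_size)
{
    return frametable_size >> SECOND_SHIFT;
}

/* Fixed expression: counts first level entries, which is the number
 * of second level PAGES that must be allocated. */
static uint64_t nr_second_fixed(uint64_t frametable_size)
{
    return roundup_pow2(frametable_size, FIRST_SIZE) >> FIRST_SHIFT;
}
```

For a 32MB frametable the old expression yields 16 while a single
second level page is enough; for a 3GB frametable it yields 1536
instead of 3.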

Signed-off-by: Peng Fan <van.freenix@gmail.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>