xen.git
Tamas K Lengyel [Mon, 20 Apr 2015 15:06:18 +0000 (17:06 +0200)]
xen/arm: Data abort exception (R/W) mem_access events

This patch enables storing, setting, checking and delivering LPAE R/W mem_events.
As the LPAE PTEs lack enough available software-programmable bits,
we store the permissions in a Radix tree. The tree is only looked at if
mem_access_enabled is turned on.
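
As a rough illustration of the side-table idea described above (not the actual Xen code), the sketch below uses a Python dict as a stand-in for the radix tree. The class name P2MAccessStore, the permission strings and the method names are hypothetical; the only behaviour taken from the commit message is that the table is consulted only when mem_access_enabled is set.

```python
class P2MAccessStore:
    """Side table mapping guest frame numbers (gfn) to access permissions."""

    def __init__(self, default_access="rwx"):
        self.default_access = default_access
        self.mem_access_enabled = False
        self._tree = {}  # stand-in for the radix tree, keyed by gfn

    def set_access(self, gfn, access):
        # Store a permission string such as "r", "rw" or "rwx" for one frame.
        self._tree[gfn] = access

    def get_access(self, gfn):
        # The side table is only looked at when mem_access is enabled.
        if not self.mem_access_enabled:
            return self.default_access
        return self._tree.get(gfn, self.default_access)

    def violates(self, gfn, kind):
        """Return True if an access of `kind` ('r' or 'w') is not permitted."""
        return kind not in self.get_access(gfn)
```

With the flag off, every lookup falls back to the default access, so the cost of the side table is only paid by domains that actually use mem_access.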

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tamas K Lengyel [Mon, 20 Apr 2015 15:06:17 +0000 (17:06 +0200)]
xen/arm: Allow hypervisor access to mem_access protected pages

The hypervisor may use the MMU to verify that the given guest has read/write
access to a given page during hypercalls. As we may have custom mem_access
permissions set on these pages, we do a software-based type check in case
the MMU-based approach fails, but only if mem_access_enabled is set.

These memory accesses are not forwarded to the mem_event listener. Accesses
performed by the hypervisor are currently not part of the mem_access scheme.
This is consistent with the behaviour on the x86 side.

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tamas K Lengyel [Mon, 20 Apr 2015 15:06:16 +0000 (17:06 +0200)]
xen/arm: groundwork for mem_access support on ARM

Add necessary changes for page table construction routines to pass
the default access information and hypercall continuation mask. Also,
define necessary functions and data fields to be used later by mem_access.

The p2m_access_t info will be stored in a Radix tree as the PTE lacks
enough software programmable bits, thus in this patch we add the radix-tree
construction/destruction portions. The tree itself will be used later
by mem_access.

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 21 Apr 2015 10:27:59 +0000 (11:27 +0100)]
Config.mk: Fix (and, effectively, update) QEMU_TAG

In 952944f7 "QEMU_TAG update" my tag update script mangled the
machinery which sets QEMU_TRADITIONAL_REVISION, by replacing the first
assignment to QEMU_TRADITIONAL_REVISION it found rather than the one
which ought to have been replaced.

The result was that:
 * From that commit on, QEMU_TAG was no longer honoured although
   QEMU_TRADITIONAL_REVISION still was
 * That particular update to QEMU_TRADITIONAL_REVISION's default
   value was effective
 * The next attempt to update QEMU_TRADITIONAL_REVISION, in
   1fc3aeb3 "libxl: use new QEMU xenstore protocol" was totally
   ineffective.

Fix this by restoring the transfer from QEMU_TAG.  The effects are:
 * Once more, honour QEMU_TAG.
 * Belatedly apply the qemu-trad change part of "libxl: use new QEMU
   xenstore protocol".

(I have also fixed my script to not do this again.)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 21 Apr 2015 07:06:00 +0000 (09:06 +0200)]
sysctl: make XEN_SYSCTL_numainfo a little more efficient

A number of changes to XEN_SYSCTL_numainfo interface:

* Make sysctl NUMA topology query use fewer copies by combining some
  fields into a single structure and copying distances for each node
  in a single copy.
* NULL meminfo and distance handles are a request for maximum number
  of nodes (num_nodes). If those handles are valid and num_nodes is
  smaller than the number of nodes in the system then -ENOBUFS is
  returned (and the correct num_nodes is provided).
* Instead of using max_node_index for passing number of nodes keep this
  value in num_nodes: almost all uses of max_node_index required adding
  or subtracting one to eventually get to number of nodes anyway.
* Replace INVALID_NUMAINFO_ID with XEN_INVALID_MEM_SZ and add
  XEN_INVALID_NODE_DIST.
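
The calling convention in the list above can be sketched as follows. This is a toy model, not the hypercall itself: the function name numainfo, the list-based buffers and the flattened num_nodes x num_nodes distance layout are assumptions made for illustration.

```python
import errno


def numainfo(sys_memsize, sys_distance, num_nodes, meminfo=None, distance=None):
    """Toy model of the query; returns (status, num_nodes_out)."""
    actual = len(sys_memsize)

    # NULL handles: the caller only wants to learn the number of nodes.
    if meminfo is None and distance is None:
        return 0, actual

    # Valid handles but too small: report failure plus the correct count.
    if num_nodes < actual:
        return -errno.ENOBUFS, actual

    if meminfo is not None:
        meminfo[:actual] = sys_memsize
    if distance is not None:
        # One copy per node: a whole row of distances at a time.
        for node in range(actual):
            start = node * num_nodes
            distance[start:start + actual] = sys_distance[node]
    return 0, actual
```

A caller would typically make one call with no buffers to learn num_nodes, allocate, and then call again with buffers of that size.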

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 21 Apr 2015 07:05:26 +0000 (09:05 +0200)]
x86/domctl: don't allow a toolstack domain to pause itself

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Tue, 21 Apr 2015 07:04:45 +0000 (09:04 +0200)]
x86/domctl: cleanup

 * latch curr/currd once at start
 * drop redundant "ret = 0" and braces
 * use "copyback = 1" when appropriate
 * move break statements inside case-specific braced scopes
 * don't bother checking for NULL before calling xfree()
 * eliminate trailing whitespace
 * Xen style corrections

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 21 Apr 2015 07:03:15 +0000 (09:03 +0200)]
domctl/sysctl: don't leak hypervisor stack to toolstacks

This is CVE-2015-3340 / XSA-132.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ross Lagerwall [Fri, 17 Apr 2015 08:44:48 +0000 (10:44 +0200)]
x86/efi: Reserve SMBIOS table region when EFI booting

Some EFI firmware implementations may place the SMBIOS table in RAM
marked as BootServicesData, which Xen does not consider as reserved.
When dom0 tries to access the SMBIOS, the region is not contained in the
initial P2M and it crashes with a page fault. To fix this, reserve the
SMBIOS region.

Also, fix the memcmp checks for existence of the SMBIOS.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Rafał Wojdyła [Fri, 17 Apr 2015 08:44:29 +0000 (10:44 +0200)]
public/grant_table.h: fix description of GNTTABOP_map_grant_ref

Error code is not returned in the <handle> field of the
gnttab_map_grant_ref structure but in the <status> field only.

Signed-off-by: Rafał Wojdyła <omeg@invisiblethingslab.com>
Liang Li [Fri, 17 Apr 2015 08:42:13 +0000 (10:42 +0200)]
VMX: replace some plain numbers

... making the code better document itself. No functional change
intended.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Emil Condrea [Wed, 15 Apr 2015 18:00:14 +0000 (21:00 +0300)]
vtpmmgr: execute deep quote in locality 0

Enables deep quote execution for vtpmmgr which cannot be started
using locality 2. Flags are used to request additional data to be
present when executing quote. They are interpreted as a bitmask of:
 * VTPM_QUOTE_FLAGS_HASH_UUID
 * VTPM_QUOTE_FLAGS_VTPM_MEASUREMENTS
 * VTPM_QUOTE_FLAGS_GROUP_INFO
 * VTPM_QUOTE_FLAGS_GROUP_PUBKEY

The externData param for TPM_Quote is calculated as:
externData = SHA1 (
       extraInfoFlags
       requestData
       [SHA1 (
          [SHA1 (UUIDs if requested)]
          [SHA1 (vTPM measurements if requested)]
          [SHA1 (vTPM group update policy if requested)]
          [SHA1 (vTPM group public key if requested)]
       ) if flags !=0 ]
)

The response param pcrValues is an array containing requested hashes used
for externData calculation: UUIDs, vTPM measurements, vTPM group update
policy, group public key. At the end of these hashes the PCR values are
appended.
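
The nested hashing above can be sketched with hashlib. Several details are assumptions made only for illustration: the numeric flag values, encoding extraInfoFlags as a 4-byte big-endian integer, mapping VTPM_QUOTE_FLAGS_GROUP_INFO to the group update policy hash, and passing each input as an opaque byte string.

```python
import hashlib
import struct

# Hypothetical bit assignments; the real values live in the vtpmmgr headers.
VTPM_QUOTE_FLAGS_HASH_UUID = 1 << 0
VTPM_QUOTE_FLAGS_VTPM_MEASUREMENTS = 1 << 1
VTPM_QUOTE_FLAGS_GROUP_INFO = 1 << 2
VTPM_QUOTE_FLAGS_GROUP_PUBKEY = 1 << 3


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def extern_data(flags, request_data, uuids=b"", measurements=b"",
                group_policy=b"", group_pubkey=b""):
    """Return (externData, list of the requested inner hashes, in order)."""
    inner = []
    if flags & VTPM_QUOTE_FLAGS_HASH_UUID:
        inner.append(sha1(uuids))
    if flags & VTPM_QUOTE_FLAGS_VTPM_MEASUREMENTS:
        inner.append(sha1(measurements))
    if flags & VTPM_QUOTE_FLAGS_GROUP_INFO:
        inner.append(sha1(group_policy))
    if flags & VTPM_QUOTE_FLAGS_GROUP_PUBKEY:
        inner.append(sha1(group_pubkey))

    outer = struct.pack(">I", flags) + request_data
    if flags != 0:
        # The hash over the requested hashes is only present for flags != 0.
        outer += sha1(b"".join(inner))
    return sha1(outer), inner
```

The returned list corresponds to the hashes that the response places ahead of the PCR values, so a verifier can recompute externData from them.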

Signed-off-by: Emil Condrea <emilcondrea@gmail.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Emil Condrea [Wed, 15 Apr 2015 18:00:13 +0000 (21:00 +0300)]
vtpm: deep quote flags

Currently, the flags are not interpreted by vTPM. They are just
packed and sent to vtpmmgr.

Signed-off-by: Emil Condrea <emilcondrea@gmail.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tamas K Lengyel [Thu, 9 Apr 2015 14:32:53 +0000 (16:32 +0200)]
xen/vm_event: Add RESUME option to vm_event_op domctl

Thus far mem_access and mem_sharing memops had been able to signal
to Xen to start pulling responses off the corresponding rings. In this patch
we retire these memops and add them as an option to the vm_event_op domctl.

The vm_event_op domctl suboptions are the same for each ring thus we
consolidate them into XEN_VM_EVENT_ENABLE/DISABLE/RESUME.

As part of this patch in libxc we also rename the mem_access_enable/disable
functions to monitor_enable/disable and move them into xc_monitor.c.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Tamas K Lengyel [Thu, 9 Apr 2015 14:32:52 +0000 (16:32 +0200)]
xen/xsm: Split vm_event_op into three separate labels

The XSM label vm_event_op has been used to control the three memops
controlling mem_access, mem_paging and mem_sharing. While these systems
rely on vm_event, these are not vm_event operations themselves. Thus,
in this patch we introduce three separate labels for each of these memops.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Tim Deegan <tim@xen.org>
Tamas K Lengyel [Thu, 9 Apr 2015 14:32:51 +0000 (16:32 +0200)]
xen/vm_event: Relocate memop checks

The memop handler function for paging/sharing responsible for calling XSM
doesn't really have anything to do with vm_event, thus in this patch we
relocate it into mem_paging_memop and mem_sharing_memop. This has already
been the approach in mem_access_memop, so in this patch we just make it
consistent.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Tamas K Lengyel [Thu, 9 Apr 2015 14:32:50 +0000 (16:32 +0200)]
xen/vm_event: Decouple vm_event and mem_access.

The vm_event subsystem has been artificially tied to the presence of mem_access.
While mem_access does depend on vm_event, vm_event is an entirely independent
subsystem that can be used for arbitrary function-offloading to helper apps in
domains. This patch removes the dependency that mem_access needs to be supported
in order to enable vm_event.

A new vm_event_resume function is introduced which pulls all responses off the
given ring and delegates handling to appropriate helper functions (if
necessary). By default, vm_event_resume just pulls the response from the ring
and unpauses the corresponding vCPU. This approach reduces code duplication
and presents a single point of entry for the entire vm_event subsystem's
response handling mechanism.
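
The single-entry-point pattern described above might look roughly like this. It is a simplified model, not the hypervisor function: the ring is a deque of dicts, and the "reason" and "vcpu_id" keys, the vcpus table and the handlers mapping are all hypothetical.

```python
from collections import deque


def vm_event_resume(ring, vcpus, handlers):
    """Drain every response from the ring; return how many were handled."""
    handled = 0
    while ring:
        rsp = ring.popleft()

        # Delegate to a subsystem-specific helper only if one is registered.
        handler = handlers.get(rsp["reason"])
        if handler is not None:
            handler(rsp)

        # Default behaviour: unpause the vCPU the response refers to.
        vcpus[rsp["vcpu_id"]]["paused"] = False
        handled += 1
    return handled
```

Because every ring is drained by the same loop, a subsystem only has to supply its helper rather than duplicate the pull-and-unpause logic.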

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Tim Deegan <tim@xen.org>
Tamas K Lengyel [Thu, 9 Apr 2015 14:32:49 +0000 (16:32 +0200)]
xen/vm_event: Deprecate VM_EVENT_FLAG_DUMMY flag

There are no use-cases for this flag.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Tim Deegan <tim@xen.org>
Tamas K Lengyel [Thu, 9 Apr 2015 14:32:48 +0000 (16:32 +0200)]
xen: Introduce monitor_op domctl

In preparation for allowing introspection of ARM and PV domains, the old
control interface via the hvm_op hypercall is retired. A new control mechanism
is introduced via the domctl hypercall: monitor_op.

This patch aims to establish a base API on which future applications can
build.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Tim Deegan <tim@xen.org>
Wei Liu [Wed, 8 Apr 2015 16:08:22 +0000 (17:08 +0100)]
libxenstat: qmp_read fix and cleanup

The second argument of poll(2) is the number of file descriptors. POLLIN
is defined as 1 so it happens to work. Also reduce the size of array to
one as there is only one file descriptor.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Charles Arnold <carnold@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 8 Apr 2015 16:08:21 +0000 (17:08 +0100)]
libxenstat: always free qmp_stats

Originally qmp_stats was only freed in the failure path and leaked in the
success path.

Instead of wiring up the success path, rearrange the code a bit to
always free qmp_stats before checking if info is NULL.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Charles Arnold <carnold@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 8 Apr 2015 16:08:20 +0000 (17:08 +0100)]
libxenstat: YAJL_GET_STRING may return NULL

Passing NULL to strcmp can cause a segmentation fault. Continue in that
case.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Charles Arnold <carnold@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 8 Apr 2015 16:08:19 +0000 (17:08 +0100)]
libxenstat: reuse xc_handle open in xenstat_init

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Charles Arnold <carnold@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 8 Apr 2015 16:05:24 +0000 (17:05 +0100)]
libxl: check return value of libxl_vcpu_setaffinity

That function can fail.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Edgar E. Iglesias [Fri, 10 Apr 2015 06:21:10 +0000 (16:21 +1000)]
xen/arm: Don't write to GICH_MISR

GICH_MISR is read-only in GICv2.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Tue, 14 Apr 2015 15:25:49 +0000 (16:25 +0100)]
README: Reference some more comprehensive docs from the Quick-start

The quick-start is not terribly comprehensive for beginners.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Wei Liu [Tue, 31 Mar 2015 12:26:11 +0000 (13:26 +0100)]
xenstore: document xs_set_permissions

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:34 +0000 (16:02 -0400)]
libxl/vcpu-set - allow to decrease vcpu count on overcommitted guests (v5)

We have a check to warn the user if they are overcommitting.
But the check only checks the host's CPU count and does
not take into account the case when the user is trying to fix
the overcommit. That is - they want to limit the number of
online VCPUs.

This fix allows the user to offline vCPUs without any
warnings when they are running an overcommitted guest.
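
One way to express the intended behaviour is shown below. This is a guess at the shape of the check, not the code from the patch: the function name and all four parameters are hypothetical.

```python
def vcpuset_allowed(requested, currently_online, host_cpus, ignore_host=False):
    """Return True if the vCPU count change should go ahead without a warning."""
    if ignore_host:
        return True
    # Only an increase beyond the host's CPU count counts as overcommitting;
    # lowering the count of an already overcommitted guest is allowed.
    if requested > host_cpus and requested > currently_online:
        return False
    return True
```

For a host with 4 CPUs and a guest running 8 vCPUs, a request for 6 passes under this rule, while a request for 10 is still refused unless ignore_host is set.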

Also fix the extra space in the message.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:33 +0000 (16:02 -0400)]
libxl/vcpuset: Remove useless limit on max_vcpus.

The check is superfluous. If the 'max_vcpus' (argument
value) is greater than pCPU and --ignore-host has not
been supplied we would print a warning and return
and not call this code.

If the --ignore-host parameter had been used we would
never end up in this condition and enforce 'max_vcpus'.

The only time it would be invoked is if max_vcpus < host_cpu
in which case it would set max_vcpus to max_vcpus.

In short - it is dead code.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:32 +0000 (16:02 -0400)]
libxl/vcpuset: Return error value if failed.

The function does not return any values at all. Convert the
internal libxl errors (ERROR_FAIL, ..., etc) to 1.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:31 +0000 (16:02 -0400)]
libxl/vcpuset: Print error if libxl_set_vcpuonline returns ERROR_DOMAIN_NOTFOUND

Instead of just printing a generic error.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:29 +0000 (16:02 -0400)]
libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.

There is no sense in trying to online (or offline) CPUs when the size of
cpumap is greater than the maximum number of VCPUs the guest can go to.

As such fail the operation if the count of CPUs to online is greater
than what the guest started with. For the offline case we do not
check (as the bits are unset in the cpumap) and let it go through.
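
A minimal sketch of the check described above, assuming the cpumap is modelled as a Python integer bitmap in which a set bit means "bring this vCPU online". The function name and parameters are hypothetical.

```python
def cpumap_fits(cpumap_bits: int, max_vcpus: int) -> bool:
    """Return True if the number of vCPUs to online does not exceed max_vcpus."""
    # Only set bits are counted; cleared bits (offline requests) always pass.
    to_online = bin(cpumap_bits).count("1")
    return to_online <= max_vcpus
```

An all-zero map, which offlines everything it covers, passes regardless of max_vcpus, matching the statement that the offline case is not checked.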

We coalesce some of the underlying libxl_set_vcpuonline code
together which was duplicated in QMP and XenStore codepaths.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:28 +0000 (16:02 -0400)]
libxl: Add ERROR_DOMAIN_NOTFOUND for libxl_domain_info when it cannot find the domain

And use that for all of its callers in the tree.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 7 Apr 2015 13:05:28 +0000 (14:05 +0100)]
libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof

Some operating systems (including Linux and FreeBSD[1]) signal not
(only) POLLIN when a reading pipe reaches EOF, but POLLHUP (with or
without POLLIN).  This is permitted[2].  The implications are that in
the general case it is not possible to determine whether POLLHUP
indicates an error or simply eof without attempting a read.

Datacopiers mishandle this, because they always treat POLLHUP
exceptionally (either reporting it via callback_pollhup, or treating
it as an error).  Datacopiers reading from pipes on such OSs can fail
(perhaps leaving some data unprocessed) rather than completing
successfully.

[1] http://www.greenend.org.uk/rjk/tech/poll.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html

Distinguishing POLLHUP is needed for pty fds, but most callers in
libxl do not care about POLLHUP except as an error or eof condition.

So change the datacopier semantics so that if callback_pollhup is not
specified we treat POLLHUP almost like POLLIN.  The difference is that
if we get HUP from poll, but EWOULDBLOCK from read, we must signal an
error rather than attempting the read again.
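
The new semantics can be modelled as a small decision function. This is a sketch rather than the libxl datacopier: the function name, the string results and the read_chunk callable are hypothetical, and EWOULDBLOCK is represented by Python's BlockingIOError.

```python
import select


def would_block():
    """Stand-in for a read that fails with EWOULDBLOCK."""
    raise BlockingIOError()


def datacopier_step(revents, read_chunk, callback_pollhup=None):
    """Return (outcome, data); outcome is data, eof, error, pollhup or idle."""
    hup = bool(revents & select.POLLHUP)

    # Callers that care about POLLHUP (e.g. for pty fds) still get a callback.
    if hup and callback_pollhup is not None:
        callback_pollhup()
        return "pollhup", b""

    # Without a callback, POLLHUP is treated almost like POLLIN.
    if not revents & (select.POLLIN | select.POLLHUP):
        return "idle", b""

    try:
        data = read_chunk()
    except BlockingIOError:
        # HUP from poll but EWOULDBLOCK from read: error, do not retry.
        return ("error", b"") if hup else ("idle", b"")

    return ("eof", b"") if data == b"" else ("data", data)
```

The only asymmetry between POLLIN and POLLHUP is in the EWOULDBLOCK branch, which keeps the copier from spinning on an fd that reports hangup but never yields data or eof.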

This fixes the problem which 7e9ec50b0535 was aimed at.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 7 Apr 2015 13:05:27 +0000 (14:05 +0100)]
libxl: datacopier: Avoid eof/POLLHUP race

When the bootloader exits, several things change, all at once:
 (a) The master pty fd (held by libxl) starts to signal POLLHUP
    and maybe also POLLIN.
 (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
    which will be handled by the libxl child process code).
 (c) reads on the master pty fd start to return EOF

From the point of view of the datacopier these might happen in any
order.

(c) can be detected only after a previous POLLIN without POLLHUP and
that previous POLLIN would be associated with data which was read,
which must therefore have ended up in the dc's buffer.  But nothing
stops the dc from writing that data into the output fd and reporting
eof before it calls poll again.

This race is unlikely.  But nevertheless it should be fixed.

We solve the race with a poll of the reading fd, to double-check, when
we detect eof via read.  (This is only necessary if the caller has
specified callback_pollhup, as otherwise POLLHUP|POLLIN - and,
presumably, POLLIN followed perhaps by POLLHUP|POLLIN, is to be
treated as eof anyway.)

With a testing patch supplied by me, Roger Pau Monné has reproduced
the failure on FreeBSD and verified that this patch fixes the problem.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Tue, 14 Apr 2015 14:51:18 +0000 (16:51 +0200)]
x86/vMSI-X: add valid bits for read acceleration

Again because Xen doesn't get to see all guest writes, it shouldn't
serve reads from its cache before having seen a write to the respective
address.
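
The idea of gating cached reads on a per-entry valid bit can be sketched as follows. The class and method names are hypothetical, and returning None stands in for "fall back to the slow path" rather than for anything in the Xen code.

```python
class MsixReadCache:
    """Cache of values written by the guest, with one valid bit per entry."""

    def __init__(self, nr_entries):
        self.values = [0] * nr_entries
        self.valid = 0  # bitmap: bit i set once a write to entry i was seen

    def write(self, idx, value):
        self.values[idx] = value
        self.valid |= 1 << idx

    def read(self, idx):
        """Return the cached value, or None if no write has been seen yet."""
        if self.valid & (1 << idx):
            return self.values[idx]
        return None
```

Until a write to an entry has been observed, the cache has nothing trustworthy to return for it, so reads of that entry are not accelerated.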

Also use DECLARE_BITMAP() in a related field declaration instead of
open coding it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 14 Apr 2015 14:50:35 +0000 (16:50 +0200)]
x86/vMSI-X: honor all mask requests

Commit 74fd0036de ("x86: properly handle MSI-X unmask operation from
guests") didn't go far enough: it fixed an issue with unmasking, but
left an issue with masking in place: Due to the (late) point in time
when qemu requests the hypervisor to set up MSI-X interrupts (which is
where the MMIO intercept gets put in place), the hypervisor doesn't
see all guest writes, and hence shouldn't make assumptions on the state
the virtual MSI-X resources are in. Bypassing the rest of the logic on
a guest mask operation leads to

[00:04.0] pci_msix_write: Error: Can't update msix entry 1 since MSI-X is already enabled.

which surprisingly enough doesn't lead to the device not working
anymore (I didn't dig in deep enough to figure out why that is). But it
does prevent the IRQ from being migrated inside the guest, i.e. all
interrupts will always arrive in vCPU 0.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 14 Apr 2015 13:29:19 +0000 (15:29 +0200)]
x86: use real assert frames for ASSERT_INTERRUPTS_{EN,DIS}ABLED

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 14 Apr 2015 13:07:24 +0000 (15:07 +0200)]
x86: infrastructure to create BUG_FRAMES in asm code

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Don Slutz [Tue, 14 Apr 2015 13:03:27 +0000 (15:03 +0200)]
x86: set regs->entry_vector for early_page_fault

This changes:

(XEN) Early fatal page fault at e008:ffff82d080164252 (cr2=0000000000000000, ec=0000)
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080164252>] arch_domain_create+0x3e/0x4ef
...
(XEN) Xen call trace:
(XEN)    [<ffff82d080164252>] arch_domain_create+0x3e/0x4ef
(XEN)    [<ffff82d080105262>] domain_create+0x384/0x556
(XEN)    [<ffff82d0802a0de4>] scheduler_init+0x1c4/0x244
(XEN)    [<ffff82d0802be359>] __start_xen+0x1d0e/0x22a1
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x58
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL TRAP: vector = 0 (divide error)
(XEN) [error_code=0000] , IN INTERRUPT CONTEXT
(XEN) ****************************************
...

to:

(XEN) Early fatal page fault at e008:ffff82d080164252 (cr2=0000000000000000, ec=0000)
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080164252>] arch_domain_create+0x3e/0x4ef
...
(XEN) Xen call trace:
(XEN)    [<ffff82d080164252>] arch_domain_create+0x3e/0x4ef
(XEN)    [<ffff82d080105262>] domain_create+0x384/0x556
(XEN)    [<ffff82d0802a0de4>] scheduler_init+0x1c4/0x244
(XEN)    [<ffff82d0802be359>] __start_xen+0x1d0e/0x22a1
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x58
(XEN)
(XEN) Faulting linear address: 0000000000000000
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000083a1a6063 ffffffffffffffff
(XEN)  L3[0x000] = 000000083a1a5063 ffffffffffffffff
(XEN)  L2[0x000] = 000000083a1a4063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL TRAP: vector = 14 (page fault)
(XEN) [error_code=0000] , IN INTERRUPT CONTEXT
(XEN) ****************************************
...

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
David Vrabel [Tue, 14 Apr 2015 13:02:32 +0000 (15:02 +0200)]
x86/mtrr: include asm/atomic.h

asm/atomic.h is needed but only included indirectly via
asm/spinlock.h.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
David Vrabel [Tue, 14 Apr 2015 13:02:10 +0000 (15:02 +0200)]
x86/hvm: don't include asm/spinlock.h

asm/spinlock.h should not be included directly.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Tue, 14 Apr 2015 13:01:14 +0000 (15:01 +0200)]
Revert "x86/hvm: wait for at least one ioreq server to be enabled"

We don't need this workaround anymore since we have fixed the toolstack
interlock problem that affects stubdom.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Chao Peng [Tue, 14 Apr 2015 13:00:44 +0000 (15:00 +0200)]
x86: clean up psr boot parameter parsing

Change type of opt_psr from bool to int so more psr features can fit.

Introduce a new routine to parse bool parameter so that both cmt and
future psr features like cat can use it.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell [Tue, 14 Apr 2015 13:00:28 +0000 (15:00 +0200)]
docs: efi: given some hint about the dom0 command line

Suggested-by: Carlos Gustavo Ramirez Rodriguez <carlosgrr@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Tue, 14 Apr 2015 12:59:53 +0000 (14:59 +0200)]
x86/traps: identify the vcpu in context when dumping registers

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 14 Apr 2015 12:59:37 +0000 (14:59 +0200)]
x86/cpuidle: identify a legitimate fallthrough case

to appease the Missing Break checker.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-id: 1291938

Dario Faggioli [Tue, 14 Apr 2015 12:58:52 +0000 (14:58 +0200)]
sched_credit2: more info when dumping

more specifically, for each runqueue, print what pCPUs
belong to it, which ones are idle and which ones have
been tickled.

While there, also convert the whole file to use
keyhandler_scratch for printing cpumask-s.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Dario Faggioli [Tue, 14 Apr 2015 12:56:13 +0000 (14:56 +0200)]
rework locking for dump of scheduler info (debug-key r)

such as it is taken care of by the various schedulers, rather
than happening in schedule.c. In fact, it is the schedulers
that know better which locks are necessary for the specific
dumping operations.

While there, fix a few style issues (indentation, trailing
whitespace, parentheses and blank line after var declarations)

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Andrew Cooper [Fri, 10 Apr 2015 15:26:18 +0000 (11:26 -0400)]
VTd/dmar: Tweak how the DMAR table is clobbered

Instead of clobbering DMAR -> XMAR and back, clobber to RMAD. This
means that changing the signature does not alter the checksum, which allows
the clobbering/unclobbering to be performed atomically and idempotently, which
is an advantage on the kexec path which can reenter acpi_dmar_reinstate().
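
The checksum argument can be checked with a toy table. An ACPI table is valid when all of its bytes sum to zero modulo 256, and RMAD is a permutation of the bytes of DMAR, so the sum is unchanged. In the sketch below the table layout is simplified: the checksum byte is placed at the end rather than at its real header offset, and the helper names are hypothetical.

```python
def acpi_checksum_ok(table: bytes) -> bool:
    """A table is valid when all of its bytes sum to 0 modulo 256."""
    return sum(table) % 256 == 0


def make_table(signature: bytes, payload: bytes) -> bytearray:
    """Build a toy table: 4-byte signature, payload, trailing checksum byte."""
    table = bytearray(signature + payload + b"\x00")
    table[-1] = (-sum(table)) % 256
    return table


def set_signature(table: bytearray, signature: bytes) -> None:
    """Overwrite only the 4 signature bytes, leaving the checksum alone."""
    table[0:4] = signature
```

Swapping the signature to RMAD and back never passes through a state with a bad checksum, whereas XMAR changes the byte sum and would need the checksum to be updated alongside it.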

This DMAR clobbering was introduced by
83904107a33c9badc34ecdd1f8ca0f9271e5e370 which claims that the dom0 VT-d
driver was capable of playing with the IOMMU(s) while Xen was also using
them. An alternative approach might be to leave the DMAR table alone
and sprinkle some iomem_deny_access() around to forcibly prevent dom0
from playing but this is simpler.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix>
CC: Yang Zhang <yang.z.zhang@intel>
Acked-by: Kevin Tian <kevin.tian@intel>
10 years agotools/hvmloader: Don't perform AML hotplug debugging in production
Andrew Cooper [Mon, 30 Mar 2015 14:20:19 +0000 (15:20 +0100)]
tools/hvmloader: Don't perform AML hotplug debugging in production

It causes a number of vmexits and a moderate quantity of qemu logging, which can
safely be avoided when not specifically debugging a PCI hotplug issue.

As mk_dsdt is a build system tool, pass 'debug' as a command line parameter
rather than "hardcoding" it via the compilation of mk_dsdt itself.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agox86/smp: Allocate pcpu stacks on their local numa node
Andrew Cooper [Tue, 7 Apr 2015 17:26:19 +0000 (18:26 +0100)]
x86/smp: Allocate pcpu stacks on their local numa node

Previously, all pcpu stacks tended to be allocated on node 0.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
10 years agox86/link: Introduce and use __bss_end
Andrew Cooper [Tue, 7 Apr 2015 17:26:18 +0000 (18:26 +0100)]
x86/link: Introduce and use __bss_end

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86/smp: Clean up use of memflags in cpu_smpboot_alloc()
Andrew Cooper [Tue, 7 Apr 2015 17:26:17 +0000 (18:26 +0100)]
x86/smp: Clean up use of memflags in cpu_smpboot_alloc()

Hoist MEMF_node(cpu_to_node(cpu)) to the start of the function, and avoid
passing (potentially bogus) memflags if node information is not available.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86/numa: Correct the extern of cpu_to_node
Andrew Cooper [Tue, 7 Apr 2015 17:26:16 +0000 (18:26 +0100)]
x86/numa: Correct the extern of cpu_to_node

This was missed by c/s 54ce2db "x86/numa: adjust datatypes for node and pxm"
which changed the array definition in numa.c

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86/link: Discard the alternatives ".discard" sections
Andrew Cooper [Tue, 7 Apr 2015 17:26:15 +0000 (18:26 +0100)]
x86/link: Discard the alternatives ".discard" sections

This appears to have been missed when porting the alternatives framework from
Linux, and saves us a section which is otherwise loaded into memory.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86/dom0: Don't allow dom0_max_vcpus to be zero
Boris Ostrovsky [Thu, 9 Apr 2015 20:38:43 +0000 (16:38 -0400)]
x86/dom0: Don't allow dom0_max_vcpus to be zero

In case dom0_max_vcpus is incorrectly specified on the boot line, make sure
we will still boot.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agox86/hvm: Fix the unknown nested vmexit reason 80000021 bug
Liang Li [Tue, 7 Apr 2015 13:27:02 +0000 (21:27 +0800)]
x86/hvm: Fix the unknown nested vmexit reason 80000021 bug

This bug will be triggered when an NMI happens in the L2 guest. The current
code handles the NMI incorrectly. According to Intel SDM 31.7.1.2
(Resuming Guest Software after Handling an Exception), if bit 31 of the
IDT-vectoring information field is set, and the virtual NMIs VM-execution
control is 1, while bits 10:8 in the IDT-vectoring information field are
2, bit 3 in the interruptibility-state field should be cleared to prevent
the next VM entry from failing.
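
The condition described above can be sketched as plain bit manipulation.
This is an illustration only: the constant and function names are
hypothetical and do not come from the Xen sources; only the bit positions
are taken from the text.

```python
IDT_VECTORING_VALID = 1 << 31   # bit 31: the field holds valid information
IDT_VECTORING_TYPE_SHIFT = 8    # bits 10:8: interruption type
IDT_VECTORING_TYPE_MASK = 0x7
INTR_TYPE_NMI = 2               # type value 2 in bits 10:8
BLOCKING_BY_NMI = 1 << 3        # bit 3 of the interruptibility-state field


def fixup_interruptibility(idt_vectoring_info: int,
                           interruptibility: int,
                           virtual_nmis: bool) -> int:
    """Return the interruptibility state to use for the next VM entry."""
    intr_type = ((idt_vectoring_info >> IDT_VECTORING_TYPE_SHIFT)
                 & IDT_VECTORING_TYPE_MASK)
    if ((idt_vectoring_info & IDT_VECTORING_VALID)
            and virtual_nmis
            and intr_type == INTR_TYPE_NMI):
        # Clear bit 3 so the pending NMI can be re-injected.
        interruptibility &= ~BLOCKING_BY_NMI
    return interruptibility
```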

Signed-off-by: Liang Li <liang.z.li@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agolibxl: use new QEMU xenstore protocol
Wei Liu [Thu, 9 Apr 2015 18:49:25 +0000 (19:49 +0100)]
libxl: use new QEMU xenstore protocol

Originally both QEMU traditional and QEMU upstream used hardcoded
/local/domain/0 paths. This patch changes the protocol to use
/local/domain/$dm_domid path.

For QEMU traditional and upstream without stubdom, $dm_domid is 0 so
the path is in fact still /local/domain/0.

For QEMU traditional stubdom, this is an incompatible protocol change.
However QEMU traditional is shipped with Xen so we are allowed to make
such a change.  This change requires the corresponding QEMU traditional
changeset.

There is no compatibility issue with QEMU upstream stubdom, because QEMU
upstream stubdom doesn't exist yet.

Watch /local/domain/$dm_domid/device-model/$domid/state, wait until
state turns "running" then unpause guest.
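
As a rough sketch of the path scheme described above (the helper names are
hypothetical, not libxl functions), the watched node and the unpause
condition look like this:

```python
def dm_state_path(dm_domid: int, domid: int) -> str:
    """Xenstore node holding the device model state for guest `domid`.

    `dm_domid` is the domain running the device model: 0 when QEMU runs
    in dom0, or the stubdom's domid otherwise.
    """
    return f"/local/domain/{dm_domid}/device-model/{domid}/state"


def guest_may_unpause(state: str) -> bool:
    """The guest is unpaused only once the device model reports "running"."""
    return state == "running"
```

With `dm_domid` equal to 0 the result is the old hardcoded
/local/domain/0 path, which is why the non-stubdom case is unaffected.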

LIBXL_STUBDOM_START_TIMEOUT is the timeout used to wait for the stubdom to be
ready. My test on a very old machine (Core 2 6400) showed that it might
need more than 20s before the stubdom is ready to serve DomU.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86/hvm: factor out and rename vm_event related functions
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:58 +0000 (22:06 +0100)]
x86/hvm: factor out and rename vm_event related functions

To avoid growing hvm.c these functions can be stored separately. Minor style
changes are applied to the logic in the file.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agotools/tests: Clean-up tools/tests/xen-access
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:57 +0000 (22:06 +0100)]
tools/tests: Clean-up tools/tests/xen-access

The spin-lock implementation in the xen-access test program is implemented
in a fashion that is actually incomplete. The x86 assembly that guarantees that
the lock is held by only one thread lacks the "lock;" instruction.

However, the spin-lock is not actually necessary in xen-access as it is not
multithreaded. The presence of the faulty implementation of the lock in a non-
multithreaded environment is unnecessarily complicated for developers who are
trying to follow this code as a guide in implementing their own applications.
Thus, removing it from the code improves the clarity on the behavior of the
system.

Also convert functions that always return 0 to return void, and make
the teardown function actually return an error code on error.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen: Rename mem_event to vm_event
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:56 +0000 (22:06 +0100)]
xen: Rename mem_event to vm_event

In this patch we mechanically rename mem_event to vm_event. This patch
introduces no logic changes to the code. Using the name vm_event better
describes the intended use of this subsystem, which is not limited to memory
events. It can be used for off-loading the decision making logic into helper
applications when encountering various events during a VM's execution.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years agoxen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:55 +0000 (22:06 +0100)]
xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup

The only use-case of the mem_event_op structure had been in mem_paging,
thus renaming the structure mem_paging_op and relocating its associated
functions clarifies its actual usage.

As part of this fix-up we also convert the gfn's in the toolstack to be
explicitly 64-bit wide and clean the code a bit.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoxen/mem_event: Cleanup mem_event names in rings, functions and domctls
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:54 +0000 (22:06 +0100)]
xen/mem_event: Cleanup mem_event names in rings, functions and domctls

The name of one of the mem_event rings still implies it is used only
for memory accesses, which is no longer the case. It is also used to
deliver various HVM events, thus the name "monitor" is more appropriate
in this setting.

A couple of functions incorrectly labeled as part of mem_event are also renamed
to reflect that they belong to mem_access.

The mem_event subop definitions are also shortened to be more meaningful.

The tool side changes are only mechanical renaming to match these new names.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoxen/mem_event: Cleanup of mem_event structures
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:53 +0000 (22:06 +0100)]
xen/mem_event: Cleanup of mem_event structures

The public mem_event structures used to communicate with helper applications via
shared rings have been used in different settings. However, the variable names
within this structure have not reflected this fact, resulting in the reuse of
variables to mean different things under different scenarios.

This patch remedies the issue by clearly defining the structure members based on
the actual context within which the structure is used.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agohvmloader: fix build error `invalid digit "8" in octal constant'
Wen Congyang [Wed, 8 Apr 2015 01:49:26 +0000 (01:49 +0000)]
hvmloader: fix build error `invalid digit "8" in octal constant'

commit b9245b75 introduces a building error:
make[1]: Entering directory `/root/work/xen/tools/firmware/hvmloader'
gcc   -O1 -fno-omit-frame-pointer -m32 -march=i686 -g -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes -Wdeclaration-after-statement   -O0 -g3 -D__XEN_TOOLS__ -MMD -MF .smbios.o.d -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -fno-optimize-sibling-calls -mno-tls-direct-seg-refs  -Werror -fno-stack-protector -fno-exceptions -fno-builtin -msoft-float -I/root/work/xen/tools/firmware/hvmloader/../../../tools/include -DENABLE_ROMBIOS -DENABLE_SEABIOS -D__SMBIOS_DATE__="04/08/2015"  -c -o smbios.o smbios.c
smbios.c:384:46: error: invalid digit "8" in octal constant
smbios.c:792:46: error: invalid digit "8" in octal constant
make[1]: *** [smbios.o] Error 1

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years agoxentop: fix potential memory leak
Charles Arnold [Thu, 2 Apr 2015 15:42:02 +0000 (09:42 -0600)]
xentop: fix potential memory leak

On a read failure the qstats buffer is not freed.

Signed-off-by: Charles Arnold <carnold@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoRevert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"
Ian Jackson [Thu, 2 Apr 2015 14:32:22 +0000 (15:32 +0100)]
Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"

The bootloader code is relying on detecting POLLHUP, and 7e9ec50b
breaks that.  7e9ec50b, when handling a pty master, violates the
specification of the datacopier interface (as defined).

When the bootloader exits, several things change, all at once:
 (a) The master pty fd (held by libxl) starts to signal POLLHUP
    and maybe also POLLIN.
 (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
    which will be handled by the libxl child process code).
 (c) reads on the master pty fd start to return EOF

From the point of view of the datacopier these might happen in any
order.  I think there is a latent bug with (c), which I will discuss
later in this email.

In a recent bug report from a FreeBSD installation, the datacopier
gets told about (a) before (b).  But 7e9ec50b filters the POLLHUP out,
so that the dc signals eof rather than hup.  As a result in
bootloader_copyfail we take the error path.

This reverts commit 7e9ec50b0535bf2630da9d279a060775817d136d.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
10 years agohvmloader: add knob for fixed VGABIOS date string
Olaf Hering [Wed, 1 Apr 2015 13:28:35 +0000 (13:28 +0000)]
hvmloader: add knob for fixed VGABIOS date string

To allow reproducible builds of hvmloader introduce a make variable
VGABIOS_REL_DATE="dd Mon yyyy" to provide a fixed date string. Without
this change the hvmloader binary changes with every rebuild.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agohvmloader: add knob for fixed SMBIOS date string
Olaf Hering [Wed, 1 Apr 2015 13:28:34 +0000 (13:28 +0000)]
hvmloader: add knob for fixed SMBIOS date string

To allow reproducible builds of hvmloader introduce a make variable
SMBIOS_REL_DATE=mm/dd/yyyy to provide a fixed date string. Without this
change the hvmloader binary changes with every rebuild.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
10 years agoINSTALL: mention variables for reproducible builds
Olaf Hering [Wed, 1 Apr 2015 13:28:33 +0000 (13:28 +0000)]
INSTALL: mention variables for reproducible builds

Mention two variables introduced by commit ac977f5 ("use more fixed
strings to build the hypervisor").

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/hotplug: introduce XENSTORED_ARGS= in sysconfig file.
Olaf Hering [Wed, 1 Apr 2015 13:28:32 +0000 (13:28 +0000)]
tools/hotplug: introduce XENSTORED_ARGS= in sysconfig file.

It is already used in the runlevel script and the service file.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
10 years agoxen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts
Julien Grall [Wed, 1 Apr 2015 16:21:47 +0000 (17:21 +0100)]
xen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts

GICD_TYPER.ITLinesNumber can encode up to 1024 interrupts. However,
IRQs 1020-1023 are reserved for special purposes.

The result is used by the callers of gic_number_lines in order to check
the validity of an IRQ.
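
A small sketch of the clamp this implies, assuming the field value N decodes
to 32(N + 1) lines. The function names mirror the text but the bodies are
illustrative, not the Xen implementation.

```python
GIC_MAX_PHYSICAL_IRQS = 1020  # IRQs 1020-1023 are reserved


def gic_number_lines(it_lines_number: int) -> int:
    """Decode the 5-bit ITLinesNumber field into a usable line count."""
    return min(32 * (it_lines_number + 1), GIC_MAX_PHYSICAL_IRQS)


def is_valid_irq(irq: int, it_lines_number: int) -> bool:
    """Hypothetical caller-side check against the decoded line count."""
    return 0 <= irq < gic_number_lines(it_lines_number)
```

Only the largest field value (31) is affected by the clamp; every smaller
value decodes to fewer than 1020 lines.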

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Frediano Ziglio <frediano.ziglio@huawei.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
10 years agoxen/arm: vgic: Correctly calculate GICD_TYPER.ITLinesNumber
Julien Grall [Wed, 1 Apr 2015 16:21:46 +0000 (17:21 +0100)]
xen/arm: vgic: Correctly calculate GICD_TYPER.ITLinesNumber

GICD_TYPER.ITLinesNumber (N) encodes the number of interrupt lines as 32(N + 1).

As the number of SPIs supported by the domain may not be a multiple of
32, we have to round up the number before using it.

At the same time remove the mask GICD_TYPE_LINES which is pointless.
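
The rounding can be sketched as follows, assuming 32 private interrupts
(SGIs and PPIs) precede the SPIs. The function name and the constant are
assumptions for illustration; this is not the code from the patch.

```python
NR_PRIVATE_IRQS = 32  # assumed: SGIs + PPIs occupy the first 32 lines


def it_lines_number(nr_spis: int) -> int:
    """Smallest field value N such that 32 * (N + 1) covers every line."""
    total_lines = NR_PRIVATE_IRQS + nr_spis
    rounded_up_blocks = -(-total_lines // 32)  # ceiling division
    return rounded_up_blocks - 1
```

For example, a domain with 33 SPIs has 65 lines in total, which needs three
blocks of 32, so the field value is 2.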

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: gic_route_irq_to_guest: Honor the priority given in parameter
Julien Grall [Wed, 1 Apr 2015 16:21:45 +0000 (17:21 +0100)]
xen/arm: gic_route_irq_to_guest: Honor the priority given in parameter

The priority is already hardcoded in route_irq_to_guest and therefore
can't be controlled by the guest.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: gic: Add sanity checks gic_route_irq_to_guest
Julien Grall [Wed, 1 Apr 2015 16:21:44 +0000 (17:21 +0100)]
xen/arm: gic: Add sanity checks gic_route_irq_to_guest

With the addition of interrupt assignment to guest, we need to make sure
the guest can't blow up the interrupt management in Xen.

Before associating the IRQ to a vIRQ we need to make sure:
    - the vIRQ is not already associated to another IRQ
    - the guest didn't enable the vIRQ

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: route_irq_to_guest: Check validity of the IRQ
Julien Grall [Wed, 1 Apr 2015 16:21:43 +0000 (17:21 +0100)]
xen/arm: route_irq_to_guest: Check validity of the IRQ

Currently Xen only supports SPIs routing for guest, add a function
is_assignable_irq to check if we can assign a given IRQ to the guest.

Secondly, make sure the vIRQ is not greater than the number of IRQs
configured in the vGIC and it's an SPI.

Thirdly, when the IRQ is already assigned to the domain, check the user
is not asking to use a different vIRQ than the one already bound.

Finally, desc->arch.type which contains the IRQ type (i.e. level/edge) must
be correctly configured beforehand. The misconfiguration can happen when:
    - the device has been blacklisted for the current platform
    - the IRQ has not been described in the device tree

Also, use XENLOG_G_ERR in the error message within the function as it will
be later called from a guest.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Allow virq != irq
Julien Grall [Wed, 1 Apr 2015 16:21:42 +0000 (17:21 +0100)]
xen/arm: Allow virq != irq

Currently, Xen is assuming that the virtual IRQ will always be the same
as IRQ.

Modify route_guest_irq to take the virtual IRQ as a parameter, which allows
Xen to assign a different IRQ number. Also store the vIRQ in the desc
action to easily retrieve the IRQ target when we need to inject the
interrupt.

As DOM0 will get most of the devices, the vIRQ is equal to the IRQ in that case.

At the same time modify the behavior of irq_get_domain. The function now
requires that the irq_desc belongs to an IRQ assigned to a guest.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen: Extend DOMCTL createdomain to support arch configuration
Julien Grall [Wed, 1 Apr 2015 16:21:41 +0000 (17:21 +0100)]
xen: Extend DOMCTL createdomain to support arch configuration

On ARM the virtual GIC may differ between each guest (emulated GIC version,
number of SPIs...). This information is already known at the domain creation
and can never change.

For now only the gic_version is set. In the long run, there will be more
parameters such as the number of SPIs. All will be required to be set at the
same time.

A new arch-specific structure arch_domainconfig has been created. As the x86
one doesn't have any specific configuration for now, a dummy structure
(C-spec compliant) has been created.

Some external tools (qemu, xenstore) may be required to create a domain.
Rather than asking them to take care of the arch-specific domain
configuration, let the current function (xc_domain_create) choose a
default configuration and introduce a new one (xc_domain_create_config).

This patch also drops the previously introduced DOMCTL arm_configure_domain
in Xen 4.5, as it has been made useless.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
10 years agoMAINTAINERS: move drivers/passthrough/device_tree.c in "DEVICE TREE"
Julien Grall [Wed, 1 Apr 2015 16:21:40 +0000 (17:21 +0100)]
MAINTAINERS: move drivers/passthrough/device_tree.c in "DEVICE TREE"

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Keir Fraser <keir@xen.org>
10 years agoxen/arm: Introduce xen, passthrough property
Julien Grall [Wed, 1 Apr 2015 16:21:39 +0000 (17:21 +0100)]
xen/arm: Introduce xen, passthrough property

When a device is marked for passthrough (via the new property
"xen,passthrough"), dom0 must not access the device (i.e. not
load a driver for it), but should be able to manage the MMIO/interrupt
of the passthrough device.

The latter part will allow the toolstack to map MMIO/IRQ when a device
is passed through to a guest.

The property "xen,passthrough" will be translated as 'status="disabled"'
in the device tree to avoid DOM0 using the device. We assume that DOM0 is
able to cope with this property (already the case for Linux, and
required by ePAPR).

Rework the function map_device (renamed into handle_device) to:

* For a given device node:
    - Give permission to manage IRQ/MMIO for this device
    - Retrieve the IRQ configuration (i.e edge/level) from the device
    tree
* When the device is not marked for guest passthrough:
    - Assign the device to the guest if it's protected by an IOMMU
    - Map the IRQs and MMIOs regions to the guest

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Map disabled device in DOM0
Julien Grall [Wed, 1 Apr 2015 16:21:38 +0000 (17:21 +0100)]
xen/arm: Map disabled device in DOM0

The check to avoid mapping disabled devices in DOM0 was added in
anticipation of the device passthrough. But, a brand new property will
be added later to mark devices which will be passed through.

Also, remove the memory type check as we already skipped them earlier in
the function via skip_matches.

Furthermore, some platforms (such as the OMAP) may try to poke a device even
if the property "status" is set to "disabled".

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
10 years agoxen/arm: vgic: Introduce a function to initialize pending_irq
Julien Grall [Wed, 1 Apr 2015 16:21:37 +0000 (17:21 +0100)]
xen/arm: vgic: Introduce a function to initialize pending_irq

The structure pending_irq is initialized in the same way in 2 different
places. Introduce vgic_init_pending_irq to avoid code duplication.

Also move the setting of the irq field into this function as we need to
initialize it once rather than every time an IRQ is injected to the guest.

Finally, use unsigned int for the "irq" field to be consistent with the
virq variable

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/dts: Use unsigned int for MMIO and IRQ index
Julien Grall [Wed, 1 Apr 2015 16:21:36 +0000 (17:21 +0100)]
xen/dts: Use unsigned int for MMIO and IRQ index

There is no reason to use signed integer for an index.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/dts: Allow only IRQ translation that are mapped to main GIC
Julien Grall [Wed, 1 Apr 2015 16:21:35 +0000 (17:21 +0100)]
xen/dts: Allow only IRQ translation that are mapped to main GIC

Xen is only able to handle one GIC controller. Some platforms may contain
other interrupt controllers.

Make sure to only translate IRQs mapped to the GIC handled by Xen.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Divide GIC initialization in 2 parts
Julien Grall [Wed, 1 Apr 2015 16:21:34 +0000 (17:21 +0100)]
xen/arm: Divide GIC initialization in 2 parts

Currently the function to translate IRQ from the device tree is set
unconditionally to be able to retrieve serial/timer IRQ before the
GIC has been initialized.

It assumes that the xlate function won't ever change. We may also need to
have the primary interrupt controller very early.

Rework the gic initialization in 2 parts:
    - gic_preinit: Get the interrupt controller device tree node and set
    up GIC and xlate callbacks
    - gic_init: Initialize the interrupt controller and the boot CPU
    interrupts.

The former function will be called just after the IRQ subsystem has been
initialized.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Frediano Ziglio <frediano.ziglio@huawei.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
10 years agodomctl: don't allow a toolstack domain to call domain_pause() on itself
Andrew Cooper [Wed, 1 Apr 2015 09:08:33 +0000 (10:08 +0100)]
domctl: don't allow a toolstack domain to call domain_pause() on itself

These DOMCTL subops were accidentally declared safe for disaggregation
in the wake of XSA-77.

This is XSA-127 / CVE-2015-2751.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoLimit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)
Konrad Rzeszutek Wilk [Wed, 19 Nov 2014 17:57:11 +0000 (12:57 -0500)]
Limit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)

For large BARs, said hypercall can take quite a while. As such
we can require that the caller MUST break up the request
into smaller chunks.

Another approach is to add preemption to it - whether we do the
preemption using hypercall_create_continuation or returning
EAGAIN to userspace (and have it re-invoke the call) - either
way the issue we cannot easily solve is that in 'map_mmio_regions'
if we encounter an error we MUST call 'unmap_mmio_regions' for the
whole BAR region.

Since the preemption would re-use input fields such as nr_mfns,
first_gfn, first_mfn - we would lose the original values -
and only undo what was done in the current round (i.e. ignoring
anything that was done prior to earlier preemptions).

Unless we re-used the return value as 'EAGAIN|nr_mfns_done<<10' but
that puts a limit (since the return value is a long) on the amount
of nr_mfns that can be provided.

This patch sidesteps this problem by:
 - Setting a hard limit of nr_mfns having to be 64 or less.
 - Toolstack adjusts correspondingly to the nr_mfn limit.
 - If there is an error when adding, the toolstack will call the
   remove operation to remove the whole region.
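
The toolstack-side behaviour listed above can be sketched as a chunking loop
with whole-region rollback. This is only an illustration under assumptions:
map_chunk and unmap_region are hypothetical callbacks standing in for the
add and remove hypercalls, and the argument names follow the text rather
than any real libxc signature.

```python
MAX_MFNS_PER_CALL = 64  # hard limit on nr_mfns per hypercall


def map_region(first_gfn, first_mfn, nr_mfns, map_chunk, unmap_region):
    """Map a region in chunks of at most 64; undo everything on failure."""
    if nr_mfns == 0:
        return 0  # nothing to do, skip the hypercall entirely

    done = 0
    while done < nr_mfns:
        nr = min(MAX_MFNS_PER_CALL, nr_mfns - done)
        rc = map_chunk(first_gfn + done, first_mfn + done, nr)
        if rc != 0:
            # The original start and length are still known here, so the
            # whole region can be removed, not just the current chunk.
            unmap_region(first_gfn, first_mfn, nr_mfns)
            return rc
        done += nr
    return 0
```

Keeping the loop in the caller is what preserves first_gfn, first_mfn and
nr_mfns across iterations, which is the information a preempted hypercall
would have lost.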

This hypercall needs to be broken down because, for large BARs, it can take
more than the guest's (usually the initial domain's) time-slice. This has
the negative result that the guest is locked out for a long
duration and is unable to act on any pending events.

We also augment the code to return zero if nr_mfns is zero instead
of trying the hypercall.

This is XSA-125 / CVE-2015-2752.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Tue, 31 Mar 2015 16:29:48 +0000 (17:29 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years agoxentop: add support for qdisks
Charles Arnold [Tue, 24 Mar 2015 02:55:08 +0000 (20:55 -0600)]
xentop: add support for qdisks

Now that Xen uses qdisks by default and qemu does not write out
statistics to sysfs this patch queries the QMP for disk statistics.

This patch depends on libyajl for parsing statistics returned from
QMP. The runtime requires libyajl 2.0.3 or newer for required bug
fixes in yajl_tree_parse().

Libxl is modified to create a new socket dedicated for the use of
libxenstat for querying the block statistics using QMP.

The current APIs remain unchanged. It works within the existing
framework of libxenstat.

Signed-off-by: Charles Arnold <carnold@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: cleanup some misuse of 'cpumap' as parameter
Dario Faggioli [Thu, 26 Mar 2015 08:55:04 +0000 (09:55 +0100)]
libxl: cleanup some misuse of 'cpumap' as parameter

in favour of the more generic 'bitmap', which is better
since these are generic libxl_bitmap_* functions.

Also fix a typo, and remove a stale (and wrong) comment.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years agolibxl: automatically set soft affinity after vnuma info
Dario Faggioli [Thu, 26 Mar 2015 08:54:57 +0000 (09:54 +0100)]
libxl: automatically set soft affinity after vnuma info

More specifically, vcpus are assigned to a vnode, which in
turn is associated with a pnode. If a vcpu does not have any
soft affinity, automatically build up one, matching the pcpus
of the said pnode.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years agolibxl: check whether vcpu affinity and vnuma info match
Dario Faggioli [Thu, 26 Mar 2015 08:54:48 +0000 (09:54 +0100)]
libxl: check whether vcpu affinity and vnuma info match

More specifically, vcpus are assigned to a vnode, which in
turn is associated with a pnode. If a vcpu also has, in its
(hard or soft) affinity, some pcpus that are not part of the
said pnode, print a warning to the user.
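
As a rough illustration of the check described above (not the libxl
implementation), the test reduces to asking whether the affinity has any
bit set outside the pnode's pcpus. The names pcpu_mask_t,
affinity_within_pnode and warn_on_mismatch are hypothetical, and a 64-bit
mask stands in for a libxl bitmap.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a bitmap: one bit per pcpu, at most 64 pcpus. */
typedef uint64_t pcpu_mask_t;

/* True if every pcpu in the affinity belongs to the pnode. */
static bool affinity_within_pnode(pcpu_mask_t affinity, pcpu_mask_t pnode_cpus)
{
    return (affinity & ~pnode_cpus) == 0;
}

/*
 * Check both the hard and the soft affinity of one vcpu against the pcpus
 * of the pnode its vnode maps to. Returns the number of warnings printed.
 */
static int warn_on_mismatch(unsigned int vcpu, pcpu_mask_t hard,
                            pcpu_mask_t soft, pcpu_mask_t pnode_cpus)
{
    int warnings = 0;

    if (!affinity_within_pnode(hard, pnode_cpus)) {
        fprintf(stderr, "vcpu %u: hard affinity and vnuma info mismatch\n", vcpu);
        warnings++;
    }
    if (!affinity_within_pnode(soft, pnode_cpus)) {
        fprintf(stderr, "vcpu %u: soft affinity and vnuma info mismatch\n", vcpu);
        warnings++;
    }
    return warnings;
}
```

The check only warns; in this sketch, as in the description, the
configuration is not rejected.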

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years ago QEMU_TAG update
Ian Jackson [Tue, 31 Mar 2015 15:29:19 +0000 (16:29 +0100)]
QEMU_TAG update

10 years ago xen/passthrough: Support a single iommu_domain per xen domain per SMMU
Robbie VanVossen [Tue, 24 Mar 2015 20:48:19 +0000 (16:48 -0400)]
xen/passthrough: Support a single iommu_domain per xen domain per SMMU

If multiple devices are being passed through to the same domain and they
share a single SMMU, then they only require a single iommu_domain.

In arm_smmu_assign_dev, before a new iommu_domain is created,
xen_domain->contexts is checked for any iommu_domain that is already
assigned to a device that uses the same SMMU as the current device. If one
is found, the device is attached to that iommu_domain. If none is
found, a new iommu_domain is created just like before.

The arm_smmu_deassign_dev function assumed that there is a single
device per iommu_domain. This meant that when the first device was
deassigned, the iommu_domain was freed, and when another device was
deassigned a crash occurred in Xen.

To fix this, a reference counter was added to the iommu_domain struct.
When an arm_smmu_xen_device references an iommu_domain, the
iommu_domain's ref is incremented. When that reference is removed, the
iommu_domain's ref is decremented. The iommu_domain is only freed
when the ref reaches 0.
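
The lookup-then-refcount scheme described above might look roughly like
this. It is a self-contained sketch, not the SMMU driver code: every type
and function name here (sk_smmu, sk_iommu_domain, sk_xen_domain, sk_assign,
sk_deassign) is hypothetical, and a plain singly linked list stands in for
xen_domain->contexts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures. */
struct sk_smmu { int id; };

struct sk_iommu_domain {
    struct sk_smmu *smmu;          /* SMMU this domain lives on */
    unsigned int ref;              /* number of devices attached */
    struct sk_iommu_domain *next;
};

struct sk_xen_domain {
    struct sk_iommu_domain *contexts;  /* one entry per SMMU in use */
};

/* Reuse the iommu_domain for this SMMU if one exists, else create one. */
static struct sk_iommu_domain *sk_assign(struct sk_xen_domain *xd,
                                         struct sk_smmu *smmu)
{
    struct sk_iommu_domain *d;

    for (d = xd->contexts; d != NULL; d = d->next) {
        if (d->smmu == smmu) {
            d->ref++;
            return d;
        }
    }

    d = calloc(1, sizeof(*d));
    if (d == NULL)
        return NULL;
    d->smmu = smmu;
    d->ref = 1;
    d->next = xd->contexts;
    xd->contexts = d;
    return d;
}

/* Drop one reference; unlink and free only when the last one goes away. */
static void sk_deassign(struct sk_xen_domain *xd, struct sk_iommu_domain *d)
{
    assert(d->ref > 0);
    if (--d->ref > 0)
        return;

    for (struct sk_iommu_domain **pp = &xd->contexts; *pp != NULL;
         pp = &(*pp)->next) {
        if (*pp == d) {
            *pp = d->next;
            break;
        }
    }
    free(d);
}
```

With two devices on the same SMMU, the first sk_deassign leaves the
iommu_domain in place with ref 1, and only the second one frees it, which
avoids the double free described above.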

Signed-off-by: Robbie VanVossen <robert.vanvossen@dornerworks.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen: arm: always omit guest user stack in vcpu_show_execution_state
Ian Campbell [Mon, 30 Mar 2015 11:12:35 +0000 (12:12 +0100)]
xen: arm: always omit guest user stack in vcpu_show_execution_state

Using !usr_mode(regs) only catches arm32 usr mode and not arm64 user
mode. Switch to psr_mode_is_user instead.
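
To illustrate the difference, here is a minimal sketch of the two checks on
a raw PSR value. The sk_ prefixed names are hypothetical and do not
reproduce the Xen macros; the mode encodings are the architectural ones
(0x10 for AArch32 User mode, 0x00 for AArch64 EL0t).

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PSR_MODE_MASK 0x1fu
#define SK_PSR_MODE_USR  0x10u  /* AArch32 User mode */
#define SK_PSR_MODE_EL0T 0x00u  /* AArch64 EL0t */

/* Matches only a 32-bit guest running in usr mode. */
static bool sk_usr_mode(uint32_t cpsr)
{
    return (cpsr & SK_PSR_MODE_MASK) == SK_PSR_MODE_USR;
}

/* Matches user mode in either execution state. */
static bool sk_psr_mode_is_user(uint32_t cpsr)
{
    return sk_usr_mode(cpsr) ||
           (cpsr & SK_PSR_MODE_MASK) == SK_PSR_MODE_EL0T;
}
```

A 64-bit guest at EL0 has mode bits 0x00, so sk_usr_mode() alone reports it
as not being in user mode, while sk_psr_mode_is_user() catches both cases.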

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: Allow traps from 32 bit userspace on 64 bit hypervisors again
Ian Campbell [Mon, 30 Mar 2015 11:12:34 +0000 (12:12 +0100)]
xen: arm: Allow traps from 32 bit userspace on 64 bit hypervisors again

This removes the unconditional #undef injected in response to such
traps, which was added by the fixes for CVE-2014-5147 / XSA-102 in
c0020e099702 "xen: arm: Handle traps from 32-bit userspace on 64-bit
kernel as undef". We now handle such traps correctly.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: Dump guest state when invalid trap state is detected
Ian Campbell [Mon, 30 Mar 2015 11:12:33 +0000 (12:12 +0100)]
xen: arm: Dump guest state when invalid trap state is detected

This is done by adding GUEST_BUG_ON locally to traps.c.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: handle remaining traps from userspace
Ian Campbell [Mon, 30 Mar 2015 11:12:32 +0000 (12:12 +0100)]
xen: arm: handle remaining traps from userspace

CP14 dbg and general CP register access are both handled with
unconditional injection of #undef from their respective handlers, so
allow these even from 32-bit userspace on a 64-bit kernel.

SMC32 and HVC32 should only come from a guest in AArch32 mode and
SMC64 and HVC64 should only come from a guest in AArch64 mode. Add
appropriate BUG_ONs to all cases.
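
The invariant behind those checks can be written down as a small predicate.
This is only a sketch with hypothetical names (sk_trap,
sk_trap_matches_mode); the real code asserts the condition directly with
BUG_ON in each case of the trap handler.

```c
#include <stdbool.h>

/* Hypothetical labels for the four trap classes discussed above. */
enum sk_trap { SK_SMC32, SK_HVC32, SK_SMC64, SK_HVC64 };

/*
 * True if the trap class is consistent with the guest's execution state:
 * the 32-bit variants only from AArch32, the 64-bit variants only from
 * AArch64. A handler would treat a false result as a bug.
 */
static bool sk_trap_matches_mode(enum sk_trap t, bool guest_is_aarch32)
{
    switch (t) {
    case SK_SMC32:
    case SK_HVC32:
        return guest_is_aarch32;
    case SK_SMC64:
    case SK_HVC64:
        return !guest_is_aarch32;
    }
    return false;
}
```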

After this, bad_trap is no longer used.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: correctly handle sysreg accesses from userspace
Ian Campbell [Mon, 30 Mar 2015 11:12:31 +0000 (12:12 +0100)]
xen: arm: correctly handle sysreg accesses from userspace

Previously we implemented all registers as RAZ/WI even if they
shouldn't be accessible to userspace.

It is not entirely clear whether attempts to access *_EL1 registers
from EL0 will trap to EL1 or EL2. Be conservative and handle them by
injecting an undef.

PMUSERENR_EL0 and MDCCSR_EL0 are R/O to EL0. MDCCSR_EL0 was previously
not handled at all.

Other PM*_EL0 registers are accessible at EL0 only if PMUSERENR_EL0.EN
is set. Since we emulate that register as RAZ/WI, we know that bit
cannot be set.
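
The three cases above could be sketched as follows. This is an illustration
under stated assumptions, not the traps.c code: all sk_ names are
hypothetical, and the sketch assumes that a disallowed EL0 access is
answered with an undef injection while an allowed access is emulated as
RAZ/WI (reads return zero, writes are ignored).

```c
#include <stdbool.h>
#include <stdint.h>

enum sk_result { SK_HANDLED, SK_INJECT_UNDEF };

/* Hypothetical description of one trapped sysreg access. */
struct sk_access {
    bool from_el0;   /* access came from guest userspace */
    bool is_read;    /* read (MRS) as opposed to write (MSR) */
    uint64_t *reg;   /* guest register that receives a read result */
};

/* RAZ/WI: reads return zero, writes are silently dropped. */
static enum sk_result sk_raz_wi(struct sk_access *a)
{
    if (a->is_read)
        *a->reg = 0;
    return SK_HANDLED;
}

/* *_EL1 registers: never accessible from EL0. */
static enum sk_result sk_handle_el1_only(struct sk_access *a)
{
    if (a->from_el0)
        return SK_INJECT_UNDEF;
    return sk_raz_wi(a);
}

/* PMUSERENR_EL0 / MDCCSR_EL0 style: read-only when accessed from EL0. */
static enum sk_result sk_handle_ro_el0(struct sk_access *a)
{
    if (a->from_el0 && !a->is_read)
        return SK_INJECT_UNDEF;
    return sk_raz_wi(a);
}

/*
 * Other PM*_EL0 registers: gated by PMUSERENR_EL0.EN, which always reads
 * as zero here, so an EL0 access can never be permitted.
 */
static enum sk_result sk_handle_pm_el0(struct sk_access *a)
{
    if (a->from_el0)
        return SK_INJECT_UNDEF;
    return sk_raz_wi(a);
}
```

Because the enable bit is emulated as RAZ/WI, sk_handle_pm_el0 needs no
runtime check of that bit and behaves the same as the EL1-only case.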

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>