Boris Ostrovsky [Thu, 9 Jul 2015 11:36:15 +0000 (13:36 +0200)]
x86/VPMU: make vpmu not HVM-specific
The vpmu structure will be used for both HVM and PV guests. Move it from
hvm_vcpu to arch_vcpu.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:34:29 +0000 (13:34 +0200)]
x86/VPMU: add public xenpmu.h
Add pmu.h header files and move to them various macros and structures that
will be shared between the hypervisor and PV guests.
Move MSR banks out of architectural PMU structures to allow for larger sizes
in the future. The banks are allocated immediately after the context and
PMU structures store offsets to them.
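To illustrate the offset-based layout, here is a minimal C sketch under stated assumptions: the structure and field names (pmu_ctxt, counters, ctrls) are hypothetical and the real public header may differ. One allocation holds the context header followed by the MSR banks, and the header records where each bank starts.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical context header: instead of embedding fixed-size MSR arrays,
 * it records byte offsets (from the start of the header) of two banks that
 * are allocated immediately after it in the same block. */
struct pmu_ctxt {
    uint32_t counters;     /* offset of the counter bank */
    uint32_t ctrls;        /* offset of the control bank */
    uint32_t nr_counters;  /* entries in each bank */
    uint64_t global_ctrl;  /* example of a fixed architectural field */
};

static struct pmu_ctxt *pmu_ctxt_alloc(uint32_t nr_counters)
{
    size_t bank = nr_counters * sizeof(uint64_t);
    struct pmu_ctxt *c = calloc(1, sizeof(*c) + 2 * bank);

    if ( !c )
        return NULL;
    c->nr_counters = nr_counters;
    c->counters = sizeof(*c);          /* first bank right after the header */
    c->ctrls = sizeof(*c) + bank;      /* second bank right after the first */
    return c;
}

/* Turn a stored offset back into a pointer to the bank's first entry. */
static uint64_t *pmu_bank(struct pmu_ctxt *c, uint32_t off)
{
    return (uint64_t *)((char *)c + off);
}
```

Because the header only stores offsets, a bank can grow later by changing nr_counters and the offsets, without changing the header layout. Here sizeof(struct pmu_ctxt) is a multiple of 8, so both banks stay 8-byte aligned.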
While making these updates, also:
* Remove unused vpmu_domain() macro from vpmu.h
* Convert msraddr_to_bitpos() into an inline and make it a little faster by
realizing that all of Intel's PMU-related MSRs are in the lower MSR range.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:27:52 +0000 (13:27 +0200)]
common/symbols: export hypervisor symbols to privileged guest
Export Xen's symbols as {<address><type><name>} triplets via the new
XENPF_get_symbol hypercall.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Chao Peng [Thu, 9 Jul 2015 08:54:15 +0000 (16:54 +0800)]
docs: add xl-psr.markdown
Add a document introducing the basic concepts and terms of the PSR family
of technologies and the xl interfaces.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chao Peng [Thu, 9 Jul 2015 08:54:14 +0000 (16:54 +0800)]
tools: add tools support for Intel CAT
These are the xc/xl changes to support Intel Cache Allocation
Technology (CAT).
'xl psr-hwinfo' is updated to show CAT info, and two new commands
for CAT are introduced:
- xl psr-cat-cbm-set [-s socket] <domain> <cbm>
Set cache capacity bitmasks (CBM) for a domain.
- xl psr-cat-show <domain>
Show CAT domain information.
Examples:
[root@vmm-psr vmm]# xl psr-hwinfo --cat
Cache Allocation Technology (CAT):
Socket ID : 0
L3 Cache : 12288KB
Maximum COS : 15
CBM length : 12
Default CBM : 0xfff
[root@vmm-psr vmm]# xl psr-cat-cbm-set 0 0xff
[root@vmm-psr vmm]# xl psr-cat-show
Socket ID : 0
L3 Cache : 12288KB
Default CBM : 0xfff
ID NAME CBM
0 Domain-0 0xff
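As a hedged illustration of what a valid value for `xl psr-cat-cbm-set` looks like, the sketch below checks a CBM against the reported CBM length. It assumes the Intel SDM rule that a CBM must be a non-empty, contiguous run of set bits; cbm_is_valid() is a hypothetical helper, not the libxc or hypervisor function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: accept a CBM only if it is non-zero, fits within
 * cbm_len bits, and its set bits form a single contiguous run. */
static bool cbm_is_valid(uint64_t cbm, unsigned int cbm_len)
{
    if ( cbm == 0 || cbm_len == 0 || cbm_len > 63 )
        return false;
    if ( cbm & ~((1ULL << cbm_len) - 1) )   /* bits beyond the CBM length */
        return false;

    /* Strip trailing zeros; a contiguous run then has the form 2^k - 1. */
    while ( !(cbm & 1) )
        cbm >>= 1;
    return (cbm & (cbm + 1)) == 0;
}
```

With the CBM length of 12 shown above, 0xff and 0xfff are accepted, while 0xf0f (not contiguous) and 0x1000 (beyond 12 bits) are rejected.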
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chao Peng [Thu, 9 Jul 2015 08:54:13 +0000 (16:54 +0800)]
tools/libxl: introduce some socket helpers
Add libxl_socket_bitmap_alloc() to allow allocating a socket-specific
libxl_bitmap (as already exists for cpu/node bitmaps).
An internal function, libxl__count_physical_sockets(), is introduced
alongside it to get the socket count when the bitmap size is not specified.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chao Peng [Thu, 9 Jul 2015 08:54:12 +0000 (16:54 +0800)]
tools/libxl: add command to show PSR hardware info
Add a dedicated command to show hardware information.
[root@vmm-psr]xl psr-hwinfo
Cache Monitoring Technology (CMT):
Enabled : 1
Total RMID : 63
Supported monitor types:
cache-occupancy
total-mem-bandwidth
local-mem-bandwidth
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chao Peng [Thu, 9 Jul 2015 08:54:11 +0000 (16:54 +0800)]
tools/libxl: minor name changes for CMT commands
Use "-" instead of "_" for monitor types.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Wed, 8 Jul 2015 15:31:01 +0000 (17:31 +0200)]
sched: factor code that moves a vcpu to a new pcpu in a function
No functional change intended.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Dario Faggioli [Wed, 8 Jul 2015 15:30:25 +0000 (17:30 +0200)]
sched: factor the code for taking two runq locks in a function
No functional change intended.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Razvan Cojocaru [Wed, 8 Jul 2015 15:28:42 +0000 (17:28 +0200)]
vm_event: rename MEM_ACCESS_EMULATE and MEM_ACCESS_EMULATE_NOWRITE
By their naming, placement and bit shift convention, MEM_ACCESS_EMULATE
and MEM_ACCESS_EMULATE_NOWRITE could be taken to be mem_access
event-specific flags (instead of generally applicable vm_event flags).
This patch renames them to VM_EVENT_FLAG_EMULATE and
VM_EVENT_FLAG_EMULATE_NOWRITE respectively, and uses bit shifts
following the rest of the VM_EVENT_FLAG_ constants.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tlengyel@novetta.com>
Dario Faggioli [Tue, 7 Jul 2015 16:44:18 +0000 (18:44 +0200)]
docs: get rid of the SEDF scheduler
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Tue, 7 Jul 2015 16:44:11 +0000 (18:44 +0200)]
xl: get rid of the SEDF scheduler
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Tue, 7 Jul 2015 16:44:02 +0000 (18:44 +0200)]
xen: kill sched_sedf.c
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Dario Faggioli [Tue, 7 Jul 2015 16:43:55 +0000 (18:43 +0200)]
xen: get rid of the SEDF scheduler
more specifically, of all the symbols and references
to it.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Tue, 7 Jul 2015 16:43:47 +0000 (18:43 +0200)]
libxc: get rid of the SEDF scheduler
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Tue, 7 Jul 2015 16:43:40 +0000 (18:43 +0200)]
tools: python: get rid of the SEDF scheduler bindings
as it is going away from libxc, so these won't build any
longer.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Tue, 7 Jul 2015 16:43:32 +0000 (18:43 +0200)]
libxl: get rid of the SEDF scheduler
only the interface is left in place, for backward
compile-time compatibility, but every attempt to
use it will fail with an error.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Ian Campbell [Tue, 7 Jul 2015 15:40:32 +0000 (16:40 +0100)]
tools: Do not add top-level tools dir to include path
Instead switch to an explicit -include $(XEN_ROOT)/tools/config.h to
pick up config.h.
Most places already do this; fix up the rest.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 7 Jul 2015 15:40:31 +0000 (16:40 +0100)]
tools: Link in-tree libvchan and libblktapctl users against lib....so
As with other in-tree users, avoid -L + -l.
This avoids any confusion with versions of these libraries already
installed on the system and the possibility of linking against them by
mistake.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 7 Jul 2015 15:40:30 +0000 (16:40 +0100)]
tools: libxl: Log on more error paths on domain create failure
The setdefault functions do not generally log why they didn't like
things.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 7 Jul 2015 15:40:29 +0000 (16:40 +0100)]
libxc: Correct log message in xc_map_foreign_bulk
Things are confusing enough as it is without using another function's
name here.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 7 Jul 2015 15:40:28 +0000 (16:40 +0100)]
tools: libxc: fix "alocated" typo in comment
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Tue, 7 Jul 2015 16:22:32 +0000 (17:22 +0100)]
xen/arm: Rename XEN_DOMCTL_CONFIG_GIC_DEFAULT to XEN_DOMCTL_CONFIG_GIC_NATIVE
This better reflects that we emulate for the guest the same GIC version
as the hardware.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jennifer Herbert [Tue, 7 Jul 2015 16:38:59 +0000 (16:38 +0000)]
libxc: Prevent NULL pointer dereference in stdiostream_vmessage()
Unlikely as it may seem, localtime_r could fail, which would result in a
NULL pointer dereference. In this case, log the errno (instead of the
date/time) and continue logging, as the message is still useful.
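A minimal sketch of the fallback, assuming a POSIX system; log_prefix() is a hypothetical helper, not the actual stdiostream_vmessage() code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: print a timestamp prefix or, if localtime_r()
 * fails, the errno text instead, so the caller can still log the message
 * rather than dereferencing a NULL result. */
static void log_prefix(FILE *f, time_t now)
{
    struct tm tm;
    char buf[64];

    if ( localtime_r(&now, &tm) == NULL )
        fprintf(f, "[localtime_r failed: %s] ", strerror(errno));
    else if ( strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm) > 0 )
        fprintf(f, "%s ", buf);
}
```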
Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 15 Jun 2015 13:50:42 +0000 (14:50 +0100)]
xl: Sane handling of extra config file arguments
Various xl sub-commands take additional parameters containing = as
additional config fragments.
The handling of these config fragments has a number of bugs:
1. Use of a static 1024-byte buffer. (If truncation would occur,
with semi-trusted input, a security risk arises due to quotes
being lost.)
2. Mishandling of the return value from snprintf, so that if
truncation occurs, the to-write pointer is updated with the
wanted-to-write length, resulting in stack corruption. (This is
XSA-137.)
3. Clone-and-hack of the code for constructing the appended
config file.
These are fixed here, by introducing a new function
`string_realloc_append' and using it everywhere. The `extra_info'
buffers are replaced by pointers, which start off NULL and are
explicitly freed on all return paths.
The separate variable which will become dom_info.extra_config is
abolished (which involves moving the clearing of dom_info).
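The sketch below shows the general shape of a realloc-based append helper. It is an illustration under assumptions, not the code in xl: str_append_realloc() is a hypothetical name, and it returns -1 on overflow or allocation failure, whereas xl may simply abort. Because the buffer grows to fit, there is no fixed-size truncation and no snprintf() return value to mishandle.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: append `more` to the heap string *accumulate, which
 * is either NULL or a malloc'd NUL-terminated string. Returns 0 on success,
 * -1 on size overflow or allocation failure (leaving *accumulate intact). */
static int str_append_realloc(char **accumulate, const char *more)
{
    size_t oldlen = *accumulate ? strlen(*accumulate) : 0;
    size_t morelen = strlen(more);
    char *p;

    if ( morelen > SIZE_MAX - oldlen - 1 )
        return -1;
    p = realloc(*accumulate, oldlen + morelen + 1);  /* realloc(NULL, n) mallocs */
    if ( !p )
        return -1;
    memcpy(p + oldlen, more, morelen + 1);           /* copies the NUL too */
    *accumulate = p;
    return 0;
}
```

The caller starts with `char *extra = NULL;`, appends each fragment, and calls `free(extra)` on every return path.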
Additional bugs I observe, not fixed here:
4. The functions which now call string_realloc_append use ad-hoc
error returns, with multiple calls to `return'. This currently
necessitates multiple new calls to `free'.
5. Many of the paths in xl call exit(-rc) where rc is a libxl status
code. This is a ridiculous exit status `convention'.
6. The loops for handling extra config data are clone-and-hacks.
7. Once the extra config buffer is accumulated, it must be combined
with the appropriate main config file. The code to do this
combining is clone-and-hacked too.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Anthony PERARD [Tue, 7 Jul 2015 15:09:13 +0000 (16:09 +0100)]
libxl: Increase device model startup timeout to 1min.
On a busy host, QEMU may take more than 10s to load and start.
This is likely due to a bug in Linux where the I/O subsystem sometimes
produces high latency under load, resulting in QEMU taking a long time
to load each of its dynamic libraries.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 1 Jun 2015 10:32:23 +0000 (11:32 +0100)]
tools: libxl: allow permissive qemu-upstream pci passthrough.
Since XSA-131, qemu-xen restricts access to PCI cfg space by default. To
allow local configuration, the existing libxl_device_pci "permissive" flag
needs to be plumbed through via the new QMP property added by the XSA-131
patches.
Versions of QEMU prior to XSA-131 did not support this permissive
property, so we only pass it if it is true. Older versions only
supported permissive mode.
qemu-xen-traditional already supports the permissive mode setting via
xenstore.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Mon, 6 Jul 2015 15:52:30 +0000 (16:52 +0100)]
libxl: Provide doc comments for AO_GC and STATE_AO_GC
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
George Dunlap [Mon, 6 Jul 2015 10:51:40 +0000 (11:51 +0100)]
tools: Add a block-tap script for setting up tapdisks via tap-ctl
The blocktap library isn't really necessary; all the necessary functionality
is available via the tap-ctl binary.
To use:
script=block-tap,vdev=[whatever],target=vhd:/path/to/file.vhd
script=block-tap,vdev=[whatever],target=aio:/path/to/file.raw
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 6 Jul 2015 10:51:43 +0000 (11:51 +0100)]
libxl: Add more logging to hotplug script path
This was useful in tracking down bugs.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 6 Jul 2015 10:51:39 +0000 (11:51 +0100)]
libxl: Remove linux udev rules
They are no longer needed, having been replaced by a daemon for
driver domains which will run scripts as necessary.
Worse yet, they seem to be broken for script-based block devices, such
as block-iscsi. This wouldn't matter so much if they were never run
by default; but if you run block-attach without having created a
domain, then the appropriate node to disable running udev scripts will
not have been written yet, and the attach will silently fail.
Rather than try to sort out that issue, just remove them entirely.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 6 Jul 2015 10:51:38 +0000 (11:51 +0100)]
libxl: Make local_initiate_attach more rational
There are a lot of paths through
libxl__device_disk_local_initiate_attach(), but they all really boil
down to one thing: Can we just access the file directly, or do we need
to attach it?
The requirements for direct access are fairly simple:
* Is this local (as opposed to a driver domain)?
* Is this a raw format (as opposed to cooked)?
* Does this have no scripts associated with it?
If it meets all those requirements, we can access it directly;
otherwise we need to attach it.
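The decision reduces to a single predicate. A minimal sketch, with a hypothetical, simplified disk_info structure standing in for libxl's real disk configuration:

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a disk for this decision only. */
struct disk_info {
    bool backend_is_local;   /* false when served by a driver domain */
    bool format_is_raw;      /* false for "cooked" formats such as qcow2 */
    bool has_script;         /* a hotplug script is configured */
};

/* Direct access only when all three requirements hold; otherwise attach. */
static bool can_access_directly(const struct disk_info *d)
{
    return d->backend_is_local && d->format_is_raw && !d->has_script;
}
```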
This fixes a bug where bootloader execution fails for disks with
hotplug scripts.
This should fix a theoretical bug when using a qdisk backend in a
driver domain. (Not tested.)
Based on a patch by Roger Pau Monne <roger.pau@citrix.com>.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 6 Jul 2015 13:47:40 +0000 (14:47 +0100)]
libxc: fix PV vNUMA guest memory allocation
In 415b58c1 (tools/libxc: Batch memory allocations for PV guests) the
number of super pages is calculated with the number of total pages. That
is wrong. It breaks PV guest vNUMA. The correct number of super pages
should be derived from the number of pages within that virtual NUMA
node.
Also change the name and type of the super page variable to match the
naming convention and type of the normal page variable, and make the
adjustments necessary for the code to compile.
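A small arithmetic sketch of the corrected per-node calculation, assuming 4 KiB base pages and 2 MiB superpages (512 pages each); the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed: 4 KiB base pages, 2 MiB superpages => 512 pages per superpage. */
#define SUPERPAGE_SHIFT    9
#define SUPERPAGE_NR_PAGES (1ULL << SUPERPAGE_SHIFT)

/* Whole superpages that fit in one vNUMA node, derived from that node's
 * own page count (not the guest-wide total). */
static uint64_t node_superpages(uint64_t node_pages)
{
    return node_pages >> SUPERPAGE_SHIFT;
}

/* Base pages left over after the node's superpages are allocated. */
static uint64_t node_leftover_pages(uint64_t node_pages)
{
    return node_pages & (SUPERPAGE_NR_PAGES - 1);
}
```

For a guest with two 1 GiB vNUMA nodes (262144 pages each), each node should get 262144 >> 9 = 512 superpages; deriving the count from the 524288 total pages would instead give 1024 for every node.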
Reported-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-and-Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 6 Jul 2015 13:17:19 +0000 (14:17 +0100)]
libxc: remove trailing newline in xc_dom_panic format string
xc_dom_panic prints more information after user supplied strings, so
don't print a newline.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 11 May 2015 11:55:36 +0000 (12:55 +0100)]
tools/xenconsoled: Use XC_PAGE_SIZE rather than getpagesize()
Linux may not use the same page granularity as Xen. This would result in
a domain crash because xenconsoled would try to map more pages than
required.
As the console page size will always be equal to the Xen page size, use
XC_PAGE_SIZE.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 11 May 2015 11:55:35 +0000 (12:55 +0100)]
tools/xenstored: Use XC_PAGE_SIZE rather than getpagesize()
Linux may not use the same page granularity as Xen. This would result in
a domain crash because xenstored would try to map more pages than
required.
As the xenstore page size will always be equal to the Xen page size, use
XC_PAGE_SIZE.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 1 Jul 2015 14:43:07 +0000 (15:43 +0100)]
xen: earlycpio: Pull in latest linux earlycpio.[ch]
AFAICT our current version does not correspond to any version in the
Linux history. This commit resynchronises to the state in Linux commit
598bae70c2a8e35c8d39b610cca2b32afcf047af.
Differences from upstream: find_cpio_data is __init, printk instead of
pr_*.
This appears to fix Debian bug #785187. "Appears" because my test box
happens to be AMD and the issue is that the (valid) cpio generated by
the Intel ucode is not liked by the old Xen code. I've tested by
hacking the hypervisor to look for the Intel path.
Reported-by: Stephan Seitz <stse+debianbugs@fsing.rootsland.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stephan Seitz <stse+debianbugs@fsing.rootsland.net>
Cc: 785187@bugs.debian.org
Acked-by: Jan Beulich <jbeulich@suse.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:18 +0000 (09:46 +0100)]
xen: arm: consolidate mmio and irq mapping to dom0
The code in the callbacks for dt_for_each_irq_map and
dt_for_each_range is very similar to the code in handle_device for
each non-pci device.
In fact the only major difference is that the irq callback needs to
call irq_set_spi_type in the PCI case. Refactor into a
map_dt_irq_to_domain callback which does the irq_set_spi_type and then
calls map_irq_to_domain which is also used from handle_device.
For MMIO, map_range_to_domain can already be used directly from
handle_device too. Note that the uses of PAGE_MASK in the
handle_device code here were unnecessary (and already removed from the
map_range_to_domain variant).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:17 +0000 (09:46 +0100)]
xen: arm: Import of_bus PCI entry from Linux (as a dt_bus entry)
This provides specific handlers for the PCI bus relating to matching
and translating. It's mostly similar to the defaults but includes some
additional error checks and other PCI specific bits.
There are some subtle differences in how the generic code vs. the pci
specific code here will handle buggy DTs (i.e. #*-cells which are not
as required by the pci bindings). This will mean we tolerate such
device trees better.
I say "buggy", but from reading "PCI Bus Binding to Open Firmware" it's
actually not clear to me that these properties are mandatory when the
device_type is "pci": e.g. the text says "The value of "#address-cells"
for PCI Bus Nodes is 3." and not "A PCI Bus Node must contain a
#address-cells property containing 3", iow the #address-cells might
validly be implicit rather than an actual property. Maybe that
interpretation is bogus, but with this patch we are able to cope with DTs
written by people who do read it like that.
It also gets us the ability to parse the flags (cacheability),
although at the moment we only check them for validity rather than use
them.
Functions/types renamed and reindented (because apparently we do
that for these).
Needs a selection of IORESOURCE_* defines, which I've taken from Linux
and have included locally for now until we figure out where else they
might be needed.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:16 +0000 (09:46 +0100)]
xen: arm: map child MMIO and IRQs to dom0 for PCI bus DT nodes.
This uses the dt_for_each_{irq_map,range} helpers to map the interrupt
and child MMIO regions to dom0. Since PCI busses are enumerable these
resources may not be otherwise described in the DT (although they can
be).
Although PCI is the only bus we handle this way the code should be
generic enough to apply to similar buses in the future.
This replaces the xgene specific mapping. Tested on Mustang and on a
model with a PCI virtio controller.
This patch doesn't stop recursing when it finds such a node, since
double mapping these resources if they do happen to be described is
(or should be) harmless.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:15 +0000 (09:46 +0100)]
xen: arm: drop redundant extra call to vgic_reserve_virq
This is only needed if we are giving the IRQ to dom0 (as opposed to
setting it up for passthrough due to xen,passthrough property). There
is already a call to vgic_reserve_virq inside the if ( need_mapping ),
so drop this one.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:14 +0000 (09:46 +0100)]
xen: dt: add dt_for_each_range helper
This function iterates over a node's ranges property and calls a
callback for each region. For now it only supplies the MMIO range (in
terms of CPU addresses, i.e. already translated).
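A simplified sketch of walking a ranges property with a callback. It assumes the cells have already been converted to host byte order, that parent addresses are already CPU addresses (no further translation up the tree), and that the parent address and size each fit in at most two cells. for_each_range(), range_cb and sum_sizes() are hypothetical, not the dt_for_each_range() signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: receives one (CPU address, size) region. */
typedef int (*range_cb)(uint64_t addr, uint64_t size, void *data);

/* Combine n 32-bit cells (n <= 2 assumed) into one number. */
static uint64_t read_cells(const uint32_t *cells, unsigned int n)
{
    uint64_t v = 0;

    while ( n-- )
        v = (v << 32) | *cells++;
    return v;
}

/* Each entry is <child-addr parent-addr size>; the child address is
 * skipped. Stops early and returns the callback's error if non-zero. */
static int for_each_range(const uint32_t *prop, size_t ncells,
                          unsigned int child_ac, unsigned int parent_ac,
                          unsigned int sc, range_cb cb, void *data)
{
    size_t entry = child_ac + parent_ac + sc;
    size_t i;

    for ( i = 0; i + entry <= ncells; i += entry )
    {
        uint64_t addr = read_cells(prop + i + child_ac, parent_ac);
        uint64_t size = read_cells(prop + i + child_ac + parent_ac, sc);
        int rc = cb(addr, size, data);

        if ( rc )
            return rc;
    }
    return 0;
}

/* Example callback: accumulate the total size of all regions. */
static int sum_sizes(uint64_t addr, uint64_t size, void *data)
{
    (void)addr;
    *(uint64_t *)data += size;
    return 0;
}
```

For a PCI-style node, child_ac would be 3 (phys.hi carries the space and flags), with 2 parent address cells and 2 size cells.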
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:13 +0000 (09:46 +0100)]
xen: dt: add dt_for_each_irq_map helper
This function iterates over a node's interrupt-map property and calls a
callback for each interrupt. For now it only supplies the translated
IRQ since my use case has no need of e.g. child unit address. These
can be added as needed by any future users.
This follows much the same logic as dt_irq_map_raw when parsing the
interrupt-map, but doesn't walk up the tree doing the actual
translation and it iterates over all entries instead of just looking
for the first match.
I looked into refactoring dt_irq_map_raw but I couldn't find a way
which I was confident in, plus I was reluctant to diverge from the
Linux roots of this function any further.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 11:51:54 +0000 (12:51 +0100)]
tools: Rerun autogen.sh with Jessie version of autoconf
I have upgraded the box which I use to do committing (and hence run
autogen.sh on) from Debian Wheezy to Jessie, resulting in an upgrade
from autoconf 2.69-1 to 2.69-8. To avoid noise from this transition
when the next configure.ac change occurs, regenerate those files now.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Chao Peng [Tue, 7 Jul 2015 13:47:53 +0000 (15:47 +0200)]
xsm: add CAT related policies
Add XSM policies for Cache Allocation Technology (CAT) related hypercalls
to restrict their use to the control domain only.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Chao Peng [Tue, 7 Jul 2015 13:47:18 +0000 (15:47 +0200)]
x86: add scheduling support for Intel CAT
On context switch, write the domain's Class of Service (COS) to MSR
IA32_PQR_ASSOC, to tell the hardware to use the new COS.
For performance reasons, the COS mask for the current CPU is also cached
in a per-CPU variable.
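A sketch of the caching idea, under assumptions: write_pqr_assoc() is a hypothetical stand-in for the real MSR write that only counts calls, and the cache simply holds the last COS written on each CPU. The real IA32_PQR_ASSOC MSR carries the COS in bits 63:32 and the CMT RMID in its low bits, so real code must preserve the RMID when updating the COS.

```c
#include <stdint.h>

#define NR_CPUS 8   /* assumed small CPU count for the sketch */

/* Stand-in for writing MSR IA32_PQR_ASSOC; it only counts writes. */
static unsigned int pqr_writes;

static void write_pqr_assoc(unsigned int cpu, uint32_t cos)
{
    (void)cpu;
    (void)cos;
    pqr_writes++;
}

/* Models a per-CPU variable: last COS written on each CPU (reset: 0). */
static uint32_t cpu_cos[NR_CPUS];

/* Hypothetical context-switch hook: only touch the MSR when the incoming
 * domain's COS differs from what this CPU already has loaded. */
static void switch_cos(unsigned int cpu, uint32_t next_cos)
{
    if ( cpu_cos[cpu] != next_cos )
    {
        write_pqr_assoc(cpu, next_cos);
        cpu_cos[cpu] = next_cos;
    }
}
```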
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Chao Peng [Tue, 7 Jul 2015 13:46:39 +0000 (15:46 +0200)]
x86: dynamically get/set CBM for a domain
For CAT, COS is maintained in the hypervisor only, while CBM is exposed
to user space directly to allow getting/setting a domain's cache capacity.
For each specified CBM, the hypervisor will either use an existing COS
which has the same CBM or allocate a new one if no matching CBM is found.
If the allocation fails because not enough COS are available, an error is
returned. Getting/setting always operates on a specified socket; on
multi-socket systems the interface may need to be called several times.
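A minimal sketch of the find-or-allocate step with reference counting, assuming a single socket with 16 COS (matching "Maximum COS : 15" in the xl example) and COS 0 permanently holding the default CBM. All names are hypothetical. For simplicity it does not reuse a slot held only by the domain being updated, so it may report -ENOSPC slightly earlier than strictly necessary.

```c
#include <errno.h>
#include <stdint.h>

#define NR_COS 16   /* assumed: COS 0..15 on this socket */

/* Hypothetical per-socket COS-to-CBM table entry. */
struct cos_cbm_info {
    uint64_t cbm;
    unsigned int ref;   /* domains using this COS (not tracked for COS 0) */
};

static struct cos_cbm_info cos_map[NR_COS];

/* Reuse a COS already holding `cbm`, else claim an unused slot.
 * COS 0 holds the default CBM and is never reassigned. */
static int find_or_alloc_cos(uint64_t cbm)
{
    int cos, free_cos = -1;

    for ( cos = 0; cos < NR_COS; cos++ )
    {
        if ( cos_map[cos].cbm == cbm && (cos == 0 || cos_map[cos].ref) )
            return cos;
        if ( cos != 0 && !cos_map[cos].ref && free_cos < 0 )
            free_cos = cos;
    }
    if ( free_cos < 0 )
        return -ENOSPC;            /* not enough COS available */
    cos_map[free_cos].cbm = cbm;
    return free_cos;
}

/* Switch a domain's COS (for this socket) to one matching `cbm`. */
static int set_domain_cbm(unsigned int *dom_cos, uint64_t cbm)
{
    int cos = find_or_alloc_cos(cbm);

    if ( cos < 0 )
        return cos;
    if ( cos != 0 )
        cos_map[cos].ref++;        /* take the new reference first... */
    if ( *dom_cos != 0 )
        cos_map[*dom_cos].ref--;   /* ...then drop the old one */
    *dom_cos = cos;
    return 0;
}
```

Two domains setting the same CBM share one COS (its reference count becomes 2), and a COS becomes reusable once its count drops back to 0.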
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Chao Peng [Tue, 7 Jul 2015 13:46:00 +0000 (15:46 +0200)]
x86: expose CBM length and COS number information
General CAT information, such as the maximum COS and the CBM length, is
exposed to user space by a SYSCTL hypercall, to help user space construct
the CBM.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Chao Peng [Tue, 7 Jul 2015 13:45:08 +0000 (15:45 +0200)]
x86: add COS information for each domain
In Xen's implementation, the CAT enforcement granularity is per domain.
Because the CBM length and the number of COS may differ between sockets,
each domain has a COS ID for each socket. A domain gets COS=0 by default,
and at runtime its COS is allocated dynamically when the user specifies
a CBM for the domain.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Chao Peng [Tue, 7 Jul 2015 13:44:24 +0000 (15:44 +0200)]
x86: maintain COS to CBM mapping for each socket
For each socket, a COS-to-CBM mapping structure is maintained. The
mapping is indexed by COS and the value is the corresponding CBM.
Different VMs may use the same CBM, so a reference count is used to
indicate whether the CBM is available.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Chao Peng [Tue, 7 Jul 2015 13:43:33 +0000 (15:43 +0200)]
x86: detect and initialize Intel CAT feature
Detect the Intel Cache Allocation Technology (CAT) feature and store the
cpuid information for later use. Currently only L3 cache allocation is
supported. The L3 CAT features may vary among sockets, so per-socket
feature information is stored. The initialization can happen either at
boot time or when CPUs are hot-plugged after booting.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Use -1 as notifier priority. Fix typos.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Chao Peng [Tue, 7 Jul 2015 13:42:49 +0000 (15:42 +0200)]
x86: add socket_cpumask
Maintain socket_cpumask which contains all the HT and core siblings
in the same socket.
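A sketch of the bookkeeping, assuming at most 64 CPUs so a uint64_t can stand in for cpumask_t; the helper names are hypothetical. Each CPU, as it comes up, is added to the mask of the socket reported by its topology information, so the mask ends up holding all core and HT siblings in that socket.

```c
#include <stdint.h>

#define NR_SOCKETS 4   /* assumed; at most 64 CPUs so a uint64_t fits */

/* Hypothetical stand-in for a per-socket cpumask_t array. */
static uint64_t socket_cpumask[NR_SOCKETS];

/* On CPU bring-up: record the CPU in the mask of the socket its topology
 * reports, so each mask gathers all cores and HT siblings of that socket. */
static void socket_mask_add(unsigned int cpu, unsigned int socket)
{
    socket_cpumask[socket] |= 1ULL << cpu;
}

/* On CPU teardown: drop it again. */
static void socket_mask_remove(unsigned int cpu, unsigned int socket)
{
    socket_cpumask[socket] &= ~(1ULL << cpu);
}
```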
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 7 Jul 2015 12:40:04 +0000 (14:40 +0200)]
x86/hvm: make sure emulation is retried if domain is shutting down
The addition of commit 2df1aa01 "x86/hvm: remove hvm_io_pending() check
in hvmemul_do_io()" causes a problem in migration because I/O that was
caught by the test of vcpu_start_shutdown_deferral() in
hvm_send_assist_req() is now considered completed rather than requiring
a retry.
This patch fixes the problem by having hvm_send_assist_req() return
X86EMUL_RETRY rather than X86EMUL_OKAY if the
vcpu_start_shutdown_deferral() test fails and then making sure that
the emulation state is reset if the domain is found to be shutting
down.
Reported-by: Don Slutz <don.slutz@gmail.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 7 Jul 2015 12:39:40 +0000 (14:39 +0200)]
x86/hvmloader: improve error handling for xenbus interactions
Consume and ignore all XS_DEBUG packets, and pass the response type back to
the caller of xenbus_recv() so the caller can take appropriate action if an
unexpected reply was received.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 7 Jul 2015 12:39:27 +0000 (14:39 +0200)]
x86/hvmloader: avoid data corruption with xenstore reads/writes
The functions ring_read() and ring_write() have logic to try and deal with
partial reads and writes.
However, in all cases where the "while (len)" loop executed twice, data
corruption would occur as the second memcpy() starts from the beginning of
"data" again, rather than from where it got to.
This bug manifested itself as protocol corruption when a reply header crossed
the first wrap of the response ring. However, similar corruption would also
occur if hvmloader observed xenstored performing partial writes of the block
in question, or if hvmloader had to wait for xenstored to make space in either
ring.
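A simplified sketch of a wrap-aware ring write, assuming a hypothetical 16-byte ring with free-running producer/consumer indices (the real xenstore ring is larger, and the real functions wait for space instead of returning early). The essential point is that the source pointer advances by the bytes already copied, so a second pass continues where the first stopped.

```c
#include <stddef.h>
#include <string.h>

#define RING_SIZE 16   /* hypothetical; xenstore rings are larger */

struct ring {
    char buf[RING_SIZE];
    unsigned int prod;   /* free-running producer index */
    unsigned int cons;   /* free-running consumer index */
};

/* Copy as much of `data` as fits, in two chunks if the write wraps past
 * the end of the buffer. Returns the number of bytes written. */
static size_t ring_write_some(struct ring *r, const char *data, size_t len)
{
    size_t done = 0;

    while ( done < len )
    {
        size_t space = RING_SIZE - (r->prod - r->cons);
        size_t off = r->prod % RING_SIZE;
        size_t chunk = len - done;

        if ( space == 0 )
            break;                      /* real code would wait here */
        if ( chunk > space )
            chunk = space;
        if ( chunk > RING_SIZE - off )
            chunk = RING_SIZE - off;    /* stop at the wrap point */

        memcpy(r->buf + off, data + done, chunk);  /* resume at data + done */
        r->prod += chunk;
        done += chunk;
    }
    return done;
}
```

Starting with both indices at 10, a 10-byte write lands as "abcdef" at the end of the buffer and "ghij" at the start.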
Reported-by: Adam Kucia <djexit@o2.pl>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Tue, 7 Jul 2015 12:30:06 +0000 (14:30 +0200)]
credit1: properly deal with pCPUs not in any cpupool
Ideally, the pCPUs that are 'free', i.e., not assigned
to any cpupool, should not be considered by the scheduler
for load balancing or anything. In Credit1, we fail at
this, because of how we use cpupool_scheduler_cpumask().
In fact, for a free pCPU, cpupool_scheduler_cpumask()
returns a pointer to cpupool_free_cpus, and hence, near
the top of csched_load_balance():
if ( unlikely(!cpumask_test_cpu(cpu, online)) )
goto out;
is false (the pCPU _is_ free!), and we therefore do not
jump to the end right away, as we should. This causes
the following splat when resuming from ACPI S3 with
pCPUs not assigned to any pool:
(XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
(XEN) ... ... ...
(XEN) Xen call trace:
(XEN) [<ffff82d080122eaa>] csched_load_balance+0x213/0x794
(XEN) [<ffff82d08012374c>] csched_schedule+0x321/0x452
(XEN) [<ffff82d08012c85e>] schedule+0x12a/0x63c
(XEN) [<ffff82d08012fa09>] __do_softirq+0x82/0x8d
(XEN) [<ffff82d08012fa61>] do_softirq+0x13/0x15
(XEN) [<ffff82d080164780>] idle_loop+0x5b/0x6b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 8:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************
The cure is:
* use cpupool_online_cpumask(), as a better guard against the
case when the cpu is being offlined;
* explicitly check whether the cpu is free.
SEDF is in a similar situation, so fix it too.
Still in Credit1, we must make sure that free (or offline)
CPUs are not considered "ticklable". Not doing so would impair
the load balancing algorithm, making the scheduler think that
it is possible to 'ask' the pCPU to pick up some work, while
in reality, that will never happen! Evidence of such behavior
is shown in this trace:
Name CPU list
Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0.112998198 | ||.|| -|x||-|- d0v0 runstate_change d0v4 offline->runnable
] 0.112998198 | ||.|| -|x||-|- d0v0 22006(2:2:6) 1 [ f ]
] 0.112999612 | ||.|| -|x||-|- d0v0 28004(2:8:4) 2 [ 0 4 ]
0.113003387 | ||.|| -||||-|x d32767v15 runstate_continue d32767v15 running->running
where "22006(2:2:6) 1 [ f ]" means that pCPU 15, which is
free from any pool, is tickled.
The cure, in this case, is to filter out the free pCPUs,
within __runq_tickle().
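Filtering the tickle candidates amounts to a mask intersection. A sketch with the same kind of hypothetical 64-bit masks (not the actual __runq_tickle() code), checked against the pool from the trace above (pCPUs 0-14) with pCPU 15 free:

```c
#include <stdint.h>

typedef uint64_t cpumask64_t;  /* hypothetical stand-in for cpumask_t */

/*
 * Keep only idle pCPUs that are online in the pool and not free: a free
 * pCPU never picks up work from this pool, so tickling it is wasted and
 * misleads the load balancing logic.
 */
static cpumask64_t ticklable_idlers(cpumask64_t idlers, cpumask64_t pool_online,
                                    cpumask64_t free_cpus)
{
    return idlers & pool_online & ~free_cpus;
}
```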
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Dario Faggioli [Tue, 7 Jul 2015 12:29:39 +0000 (14:29 +0200)]
x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
In fact, when a pCPU goes down, we want to clear its
bit in the correct cpupool's valid mask, rather than
always in cpupool0's one.
Before this commit, all the pCPUs in the non-default
pool(s) will be considered immediately valid, during
system resume, even the ones that have not been brought
up yet. As a result, the (Credit1) scheduler will attempt
to run its load balancing logic on them, causing the
following Oops:
# xl cpupool-cpu-remove Pool-0 8-15
# xl cpupool-create name=\"Pool-1\"
# xl cpupool-cpu-add Pool-1 8-15
--> suspend
--> resume
(XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 8
(XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
(XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
(XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
(XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
(XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
(XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) ... ... ...
(XEN) Xen call trace:
(XEN) [<ffff82d080123078>] csched_schedule+0x4be/0xb97
(XEN) [<ffff82d08012c732>] schedule+0x12a/0x63c
(XEN) [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
(XEN) [<ffff82d08012f920>] do_softirq+0x13/0x15
(XEN) [<ffff82d080164791>] idle_loop+0x5b/0x6b
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 8:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************
The reason why the error is a #GP fault is that, without
this commit, we try to access the per-cpu area of a not
yet allocated and initialized pCPU.
In fact, %rax, which is used as the pointer, is
80007d2f7fccb780, and we also have this:
#define INVALID_PERCPU_AREA (0x8000000000000000L - (long)__per_cpu_start)
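On x86-64, an access through a non-canonical address raises #GP rather than a page fault. A minimal check, assuming 48-bit virtual addresses (4-level paging), where bits 63..47 must all be equal; `is_canonical()` is a hypothetical helper, not a Xen function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical (48-bit VA): bits 63..47 are all zero or all one. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

%rax above has bit 63 set but bit 47 clear, so it fails this check, while Xen's own text addresses (such as the RIP) pass it.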
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Dario Faggioli [Tue, 7 Jul 2015 12:28:35 +0000 (14:28 +0200)]
sched: avoid dumping duplicate information
When dumping scheduling information (debug key 'r'), what
we print as 'Idle cpupool' is pretty much the same as what
we print immediately after as 'Cpupool0'. In fact, if there
are no pCPUs outside of any cpupool, it is exactly the
same.
If there are free pCPUs, there is some valuable information,
but still a lot of duplication:
(XEN) Online Cpus: 0-15
(XEN) Free Cpus: 8
(XEN) Idle cpupool:
(XEN) Scheduler: SMP Credit Scheduler (credit)
(XEN) info:
(XEN) ncpus = 13
(XEN) master = 0
(XEN) credit = 3900
(XEN) credit balance = 45
(XEN) weight = 1280
(XEN) runq_sort = 11820
(XEN) default-weight = 256
(XEN) tslice = 30ms
(XEN) ratelimit = 1000us
(XEN) credits per msec = 10
(XEN) ticks per tslice = 3
(XEN) migration delay = 0us
(XEN) idlers: 00000000,00006d29
(XEN) active vcpus:
(XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
(XEN) 2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)}
(XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)}
(XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)}
(XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)}
(XEN) CPU[08] sort=5791, sibling=00000000,00000300, core=00000000,0000ff00
(XEN) run: [32767.8] pri=-64 flags=0 cpu=8
(XEN) Cpupool 0:
(XEN) Cpus: 0-5,10-15
(XEN) Scheduler: SMP Credit Scheduler (credit)
(XEN) info:
(XEN) ncpus = 13
(XEN) master = 0
(XEN) credit = 3900
(XEN) credit balance = 45
(XEN) weight = 1280
(XEN) runq_sort = 11820
(XEN) default-weight = 256
(XEN) tslice = 30ms
(XEN) ratelimit = 1000us
(XEN) credits per msec = 10
(XEN) ticks per tslice = 3
(XEN) migration delay = 0us
(XEN) idlers: 00000000,00006d29
(XEN) active vcpus:
(XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
(XEN) 2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)}
(XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)}
(XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)}
(XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)}
(XEN) CPU[00] sort=11801, sibling=00000000,00000003, core=00000000,000000ff
(XEN) run: [32767.0] pri=-64 flags=0 cpu=0
... ... ...
(XEN) CPU[15] sort=11820, sibling=00000000,0000c000, core=00000000,0000ff00
(XEN) run: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)}
(XEN) 1: [32767.15] pri=-64 flags=0 cpu=15
(XEN) Cpupool 1:
(XEN) Cpus: 6-7,9
(XEN) Scheduler: SMP RTDS Scheduler (rtds)
(XEN) CPU[06]
(XEN) CPU[07]
(XEN) CPU[09]
With this change, we get rid of the redundancy, and retain
only the information about the free pCPUs.
(While there, turn a loop index variable from `int' to
`unsigned int' in schedule_dump().)
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Andrew Cooper [Tue, 7 Jul 2015 12:28:00 +0000 (14:28 +0200)]
x86: calculate PV CR4 masks at boot
... rather than every time a guest sets CR4.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 7 Jul 2015 09:37:26 +0000 (11:37 +0200)]
x86/p2m-ept: don't unmap the EPT pagetable while it is still in use
The call to iommu_pte_flush() between the two hunks uses &ept_entry->epte
which is a pointer into the mapped page.
It is eventually passed to the `clflush` instruction, which will suffer a pagefault
if the virtual mapping has fallen out of the TLB.
(XEN) ----[ Xen-4.5.0-xs102594-d x86_64 debug=y Not tainted ]----
(XEN) CPU: 7
(XEN) RIP:    e008:[<ffff82d0801572f0>] cacheline_flush+0x4/0x9
<snip>
(XEN) Xen call trace:
(XEN) [<ffff82d0801572f0>] cacheline_flush+0x4/0x9
(XEN) [<ffff82d08014ffff>] __iommu_flush_cache+0x4a/0x6a
(XEN) [<ffff82d0801532e2>] iommu_pte_flush+0x2b/0xd5
(XEN) [<ffff82d0801f909a>] ept_set_entry+0x4bc/0x61f
(XEN) [<ffff82d0801f0c25>] p2m_set_entry+0xd1/0x112
(XEN) [<ffff82d0801f25b1>] clear_mmio_p2m_entry+0x1a0/0x200
(XEN) [<ffff82d0801f4aac>] unmap_mmio_regions+0x49/0x73
(XEN) [<ffff82d080106292>] do_domctl+0x15bd/0x1edb
(XEN) [<ffff82d080234fcb>] syscall_enter+0xeb/0x145
(XEN)
(XEN) Pagetable walk from ffff820040004ae0:
(XEN)  L4[0x104] = 00000008668a5063 ffffffffffffffff
(XEN)  L3[0x001] = 00000008668a3063 ffffffffffffffff
(XEN)  L2[0x000] = 000000086689c063 ffffffffffffffff
(XEN)  L1[0x004] = 000000056f078063 000000000007f678
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 7:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff820040004ae0
(XEN) ****************************************
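The ordering rule can be modelled in a few lines. This is a hypothetical sketch, not the ept_set_entry() code: `demo_map()`/`demo_unmap()` stand in for mapping the EPT page, `demo_flush_entry()` stands in for iommu_pte_flush(), and the assert marks where the real code could fault:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a page table that must be "mapped" to be touched. */
struct demo_table {
    uint64_t epte[512];
    int mapped;
};

static uint64_t *demo_map(struct demo_table *t) { t->mapped = 1; return t->epte; }
static void demo_unmap(struct demo_table *t) { t->mapped = 0; }

/* Stands in for iommu_pte_flush() -> clflush on the entry's address. */
static void demo_flush_entry(const struct demo_table *t, const uint64_t *pte)
{
    assert(t->mapped && "flushing through a stale mapping would fault");
    (void)pte;
}

static void demo_set_entry(struct demo_table *t, size_t idx, uint64_t val)
{
    uint64_t *table = demo_map(t);

    table[idx] = val;
    demo_flush_entry(t, &table[idx]);  /* still mapped: safe */
    demo_unmap(t);                     /* unmap only after the last use */
}
```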
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 7 Jul 2015 09:36:55 +0000 (11:36 +0200)]
x86/traps: move early pagefault static data into __initdata
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Jul 2015 08:39:52 +0000 (10:39 +0200)]
x86/nHVM: generic hook adjustments
Some of the generic hooks were unused altogether - drop them.
Some of the hooks were used only to handle calls from the specific
vendor's code (SVM) - drop them too.
Several more hooks were pointlessly implemented as out-of-line
functions, when most (all?) other HVM hooks use inline ones - make
them inlines. None of them are implemented by only one of SVM or VMX,
so also drop the conditionals. Funnily, nhvm_vmcx_hap_enabled(), having
return type bool_t, nevertheless returned -EOPNOTSUPP.
nhvm_vmcx_guest_intercepts_trap() and its hook and implementations are
being made to return bool_t, as they should have been from the beginning
(its sole caller only checks for a non-zero result).
Finally, make static whatever can, as a result, be static.
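Why returning -EOPNOTSUPP from a bool_t hook is wrong can be shown in isolation. Assuming bool_t is a plain char typedef (as in Xen at the time), a non-zero error code such as -EOPNOTSUPP converts to a non-zero, i.e. true, value. `hap_enabled_buggy()` is a hypothetical function:

```c
#include <errno.h>

/* Assumption: mirrors Xen's bool_t of the time, a plain char typedef. */
typedef char bool_t;

/* The removed pattern: an error code returned through a boolean. */
static bool_t hap_enabled_buggy(void)
{
    return -EOPNOTSUPP;   /* non-zero after conversion, so reads as "true" */
}
```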
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Feng Wu [Tue, 7 Jul 2015 08:39:25 +0000 (10:39 +0200)]
x86: add helper macro for X86_FEATURE_CX16 feature detection
Add macro cpu_has_cx16 to detect X86_FEATURE_CX16 feature.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Jul 2015 08:34:13 +0000 (10:34 +0200)]
x86: drop is_pv_32on64_domain()
... as being identical to is_pv_32bit_domain() after the x86-32
removal.
In a few cases this includes no longer open-coding is_pv_32bit_vcpu().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 7 Jul 2015 08:32:59 +0000 (10:32 +0200)]
minor shared/vcpu info adjustments
- remove vcpu_info from xlat.lst (it isn't and can't be checked)
- drop pointless (redundant) casts
- make dummy_vcpu_info static
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 7 Jul 2015 08:30:12 +0000 (10:30 +0200)]
gnttab: clean up gnttab_set_version()
- drop pointless nr_grant_entries() check from loop over reserved
entries (adding suitable BUILD_BUG_ON()s to validate that)
- adjust types
- rename d to currd
- formatting
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 7 Jul 2015 08:29:35 +0000 (10:29 +0200)]
gnttab: don't silently truncate frame numbers in gnttab_set_version()
On a v2 -> v1 transition frame numbers previously stored in a 64-bit
field have to fit into a 32-bit one.
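A sketch of the kind of check this implies, using a hypothetical helper and an assumed -EINVAL error code (the actual code in gnttab_set_version() may differ):

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow a frame number from a v2 (64-bit) field to a
 * v1 (32-bit) one, refusing rather than silently dropping the high bits.
 */
static int demo_frame_to_v1(uint64_t frame, uint32_t *out)
{
    if (frame != (uint32_t)frame)
        return -EINVAL;           /* assumed error code */
    *out = (uint32_t)frame;
    return 0;
}
```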
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 7 Jul 2015 08:28:25 +0000 (10:28 +0200)]
gnttab: fix out of range shift count
Commit 213f145114 ("gnttab: fix/adjust gnttab_transfer()") wasn't
careful enough in this regard.
Coverity ID: 1306859
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jennifer Herbert [Wed, 1 Jul 2015 17:37:11 +0000 (17:37 +0000)]
libxc: Fix misleading use of strncpy code in build_hvm_info()
hvm_info->signature is not a string but a 64-bit field, and is not
NUL-terminated. The use of strncpy to populate it is inappropriate and
potentially misleading. A cursory glance might have you thinking someone
had miscounted the length of the string literal, not realising it was
intentionally cropping off the NUL terminator.
Also, since we wish to initialise all of hvm_info->signature, and
certainly no more, the use of sizeof is safer.
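A sketch of the memcpy()/sizeof approach, assuming an 8-byte signature field; `struct demo_info` is a hypothetical stand-in loosely modelled on hvm_info_table:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 8-byte, non-NUL-terminated signature, then more fields. */
struct demo_info {
    char     signature[8];
    uint32_t length;
};

static void demo_set_signature(struct demo_info *info)
{
    /* "HVM INFO" is 8 characters: copy exactly sizeof(signature) bytes,
     * deliberately leaving out the literal's terminating NUL. */
    memcpy(info->signature, "HVM INFO", sizeof(info->signature));
}
```

Tying the length to sizeof means the copy can neither stop short of the field nor spill into the neighbouring member.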
Coverity-ID: 1198710
Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jennifer Herbert [Wed, 1 Jul 2015 17:37:09 +0000 (17:37 +0000)]
libxc: Prevent dereferencing NULL pointers returned from xc_dom_allocate()
The return from xc_dom_allocate is not checked for a NULL value.
This patch fixes this, causing it to return from the function with an error.
Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 3 Jul 2015 15:33:45 +0000 (16:33 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Julien Grall [Fri, 26 Jun 2015 15:43:09 +0000 (16:43 +0100)]
xen/arm: Remove unused field eoi_cpu in arch_irq_desc
This field has been set but not used since Xen 4.5. Slim down Xen by
about 4K by removing it.
Also fix comment coding style.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 3 Jul 2015 11:42:40 +0000 (12:42 +0100)]
xl: xl -N create -d sends json output to stdout, not stderr
domain config output goes to:
                     before    after
  xl create          nowhere   nowhere
  xl create -d       stderr    stderr
  xl -N create       stdout    stdout
  xl -N create -d    stderr    stdout
It is not sensible that adding -d would cause different output on
stdout. And that -N would produce less debug output is hardly
surprising in general and not really a problem in this case.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: New patch in this version of the mini-series.
Ian Jackson [Fri, 26 Jun 2015 14:19:46 +0000 (15:19 +0100)]
xl: Change output from xl -N create to be more useful
Currently, xl -N create produces:
    {
        "domid": null,
        "config": {
            "c_info": {
                "type": "pv",
                [etc]
    }
The domid is always NULL (as the domain has not been created at this
stage).
This is annoying if you want to take this output and use it for some
actually useful purpose like domain creation: either it needs to be
massaged, or the consuming tool needs to be taught to look inside
the json object for the `config' element (which IMO makes no sense as
an interface).
We would like to be able to pass libxl json configs around sensibly.
In the future maybe xl will grow an option to create a domain from a
json config, and this is currently something I want to be able to have
a test tool do.
Note that this change is NOT BACKWARDS COMPATIBLE. But it would only
adversely affect anyone who uses `xl -N create' and then saves and
processes the JSON. (The output from xl list et al is not changed; it
normally needs the domid.) Such a user should probably already
have complained about the infelicitous output. If they haven't, it
would be simple enough for them to bookend the output so as to provide
compatible output.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Euan Harris <euan.harris@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Print json output to correct filehandle
(Using newly introduced flush_stream.)
Ian Jackson [Fri, 3 Jul 2015 11:36:20 +0000 (12:36 +0100)]
xl: Break out flush_stream
We are going to want to reuse this. Adjust the code slightly to
detect right away call sites that pass something other than stdout or
stderr.
No resulting functional change.
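A sketch of such a helper, assuming it only needs the stream's name for perror(); the structure is illustrative rather than xl's exact code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Name of a stream for error messages; NULL for anything unexpected. */
static const char *stream_name(FILE *fh)
{
    if (fh == stdout)
        return "stdout";
    if (fh == stderr)
        return "stderr";
    return NULL;
}

/* Flush @fh, exiting on any write error; abort on an unexpected stream. */
static void flush_stream(FILE *fh)
{
    const char *name = stream_name(fh);

    if (!name)
        abort();            /* a bad call site is caught immediately */

    if (ferror(fh) || fflush(fh)) {
        perror(name);
        exit(EXIT_FAILURE);
    }
}
```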
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: New patch in this version of the mini-series
Ian Campbell [Fri, 26 Jun 2015 11:06:09 +0000 (12:06 +0100)]
stubdom: vtpmmgr: Correctly format size_t with %z when printing.
Also contains a fix from Thomas Leonard (to use %u for "4 + 32", not
%lu) previously posted as part of "mini-os: enable compiler check for
printk format types", but with mini-os now having been split into a
separate repo, most of that change has been applied there.
This fixes the 32-bit build with updated mini-os which includes format
string checking.
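A small illustration of the format rule with a hypothetical helper: %zu matches size_t on every build, whereas %lu only matches where size_t happens to be unsigned long (typically not on 32-bit x86, where it is unsigned int):

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t portably: the 'z' length modifier matches size_t. */
static int describe_buffer(char *out, size_t outlen, size_t used)
{
    return snprintf(out, outlen, "used %zu bytes", used);
}
```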
Signed-off-by: Thomas Leonard <talex5@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-By: Samuel Thibault <samuel.thibault@ens-lyon.org>
[ ijc -- Updated MINIOS_UPSTREAM_REVISION ]
Chen Baozi [Tue, 30 Jun 2015 08:00:22 +0000 (16:00 +0800)]
xen/arm64: increase MAX_VIRT_CPUS to 128 on arm64
After we have increased the size of GICR in address space for guest
and made use of both AFF0 and AFF1 in (v)MPIDR, we are now able to
support up to 4096 vCPUs in theory. However, it will cost 512M
address space for GICR region, which is unnecessarily big at the
moment. Considering the max CPU number that GIC-500 can support and
the old value of MAX_VIRT_CPUS before commit aa25a61, we increase
its value to 128.
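The 512M figure can be checked with a little arithmetic, assuming each GICv3 redistributor occupies two 64KB frames (RD_base and SGI_base). `GICR_PER_VCPU` and `gicr_region_size()` are hypothetical names:

```c
#include <stdint.h>

#define SZ_64K        0x10000ULL
/* Assumption: one GICv3 redistributor = RD_base + SGI_base = 2 x 64KB. */
#define GICR_PER_VCPU (2 * SZ_64K)

/* Guest address space needed by the GICR region for @nr_vcpus vCPUs. */
static uint64_t gicr_region_size(uint64_t nr_vcpus)
{
    return nr_vcpus * GICR_PER_VCPU;
}
```

Under that assumption, 4096 vCPUs need 512MB, while 128 vCPUs need only 16MB.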
Signed-off-by: Chen Baozi <baozich@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chen Baozi [Tue, 30 Jun 2015 08:00:21 +0000 (16:00 +0800)]
xen/arm: make domain_max_vcpus return value from vgic_ops
Each vGIC driver supports different maximum numbers of vCPU. For
example, GICv2 is limited to 8 vCPUs, while GICv3 can support up
to 4096 vCPUs if we use both AFF0 and AFF1. Thus, domain_max_vcpus
should depend on not only MAX_VIRT_CPUS but also the version
of vGIC that the guest uses.
Since evtchn_init would call domain_max_vcpus to allocate poll_mask
when the vgic_ops haven't been initialised yet, we make it return
MAX_VIRT_CPUS at that time. On ARM32, event channel doesn't need
to allocate the poll_mask because MAX_VIRT_CPUS < BITS_PER_LONG,
while allocating more memory (2 unsigned longs rather than 1) only
for poll_mask on arm64 with GICv2 does not look expensive.
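A minimal sketch of the resulting behaviour, using hypothetical trimmed-down structures rather than Xen's struct domain and vGIC ops; the per-version limits (8 for GICv2, 4096 for GICv3) are the ones quoted above:

```c
#include <stddef.h>

#define MAX_VIRT_CPUS 128u   /* arm64 value after this series */

/* Hypothetical stand-ins for the vGIC ops and the domain. */
struct demo_vgic_ops {
    unsigned int max_vcpus;  /* 8 for GICv2, 4096 for GICv3 with AFF0+AFF1 */
};

struct demo_domain {
    const struct demo_vgic_ops *vgic_ops;  /* NULL until the vGIC is set up */
};

static unsigned int demo_domain_max_vcpus(const struct demo_domain *d)
{
    /* Early callers (e.g. evtchn_init) run before the vGIC exists. */
    if (d->vgic_ops == NULL)
        return MAX_VIRT_CPUS;

    return d->vgic_ops->max_vcpus < MAX_VIRT_CPUS ? d->vgic_ops->max_vcpus
                                                  : MAX_VIRT_CPUS;
}
```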
We didn't keep it in the old static inline form because it would break
compilation when accessing a member of struct domain:
In file included from xen/include/xen/domain.h:6:0,
from xen/include/xen/sched.h:10,
from arm64/asm-offsets.c:10:
xen/include/asm/domain.h: In function ‘domain_max_vcpus’:
xen/include/asm/domain.h:266:10: error: dereferencing pointer to incomplete type
if (d->arch.vgic.version == GIC_V2)
^
Signed-off-by: Chen Baozi <baozich@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chen Baozi [Tue, 30 Jun 2015 08:00:20 +0000 (16:00 +0800)]
xen/arm: Set 'reg' of cpu node for dom0 to match MPIDR's affinity
According to ARM CPUs bindings, the reg field should match the MPIDR's
affinity bits. We will use AFF0 and AFF1 when constructing the reg value
of the guest at the moment, for it is enough for the current max vcpu
number.
Signed-off-by: Chen Baozi <baozich@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
[ ijc -- use PRIx64 to format mpidr_aff in node name, fixing 32-bit
build ]
Chen Baozi [Tue, 30 Jun 2015 08:00:19 +0000 (16:00 +0800)]
tools/libxl: Set 'reg' of cpu node equal to MPIDR affinity for domU
According to ARM CPUs bindings, the reg field should match the MPIDR's
affinity bits. We will use AFF0 and AFF1 when constructing the reg value
of the guest at the moment, for it is enough for the current max vcpu
number.
Signed-off-by: Chen Baozi <baozich@gmail.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chen Baozi [Tue, 30 Jun 2015 08:00:18 +0000 (16:00 +0800)]
xen/arm: Use AFF1 when translating ICC_SGI1R_EL1 to cpumask
The old unsigned long type of vcpu_mask can only express 64 CPUs at
most, which might not be enough for a guest using vGICv3. We
introduce a new struct sgi_target for the target cpu list of SGI, which
holds the affinity path information (only level 1 at the moment). For
GICv2 that has no affinity level, we can just set the corresponding
fields to be 0.
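A sketch of how such a target description can be decoded and expanded. The struct and helpers are hypothetical; the field positions (TargetList in bits [15:0], Aff1 in bits [23:16] of ICC_SGI1R_EL1) are assumed from the GICv3 register layout, and the vCPU numbering assumes the 16-per-cluster AFF0/AFF1 mapping from this series. For GICv2, aff1 is simply 0:

```c
#include <stdint.h>

/* Hypothetical target description: one AFF1 value plus a 16-bit AFF0 list. */
struct demo_sgi_target {
    uint8_t  aff1;
    uint16_t list;
};

/* Assumed ICC_SGI1R_EL1 fields: TargetList[15:0], Aff1[23:16]. */
static struct demo_sgi_target demo_decode_sgi1r(uint64_t sgir)
{
    struct demo_sgi_target t = {
        .aff1 = (uint8_t)((sgir >> 16) & 0xff),
        .list = (uint16_t)(sgir & 0xffff),
    };
    return t;
}

/* Expand to vCPU IDs, assuming vcpuid = aff1 * 16 + aff0. Returns count. */
static unsigned int demo_target_vcpus(struct demo_sgi_target t,
                                      unsigned int *ids, unsigned int max)
{
    unsigned int n = 0;

    for (unsigned int aff0 = 0; aff0 < 16; aff0++)
        if (((t.list >> aff0) & 1u) && n < max)
            ids[n++] = t.aff1 * 16u + aff0;
    return n;
}
```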
Signed-off-by: Chen Baozi <baozich@gmail.com>
Chen Baozi [Tue, 30 Jun 2015 08:00:17 +0000 (16:00 +0800)]
xen/arm: Use the new functions for vCPUID/vaffinity transformation
There are 3 places to change:
* Initialise vMPIDR value in vcpu_initialise()
* Find the vCPU from vMPIDR affinity information when accessing GICD
registers in vGIC
* Find the vCPU from vMPIDR affinity information when booting with vPSCI
in vGIC
- Both PSCI 0.1 and PSCI 0.2 are modified to respect the MPIDR likewise.
Signed-off-by: Chen Baozi <baozich@gmail.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chen Baozi [Tue, 30 Jun 2015 08:00:16 +0000 (16:00 +0800)]
xen/arm: Add functions of mapping between vCPUID and virtual affinity
GICv3 restricts that the maximum number of CPUs in affinity 0 (one
cluster) is 16. (See the note of 'Bits[15:0]' in '5.7.29 ICC_SGI0R_EL1
ICC_SGI1R_EL1 and ICC_ASGI1R_EL1, GICv3 Architecture Specification')
That is to say, the upper 4 bits of affinity 0 are unused. The current
implementation considers AFF0 to be equal to the vCPUID, which puts all
vCPUs in one cluster, limiting their number to 16. If we would like to
support more than 16 vCPUs in one guest, we need to make use
of AFF1. Considering the unused upper 4 bits, we need to create a pair
of functions mapping the vCPUID and virtual affinity.
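A minimal sketch of such a pair, assuming 4 bits of AFF0 and 8 bits of AFF1 (12 bits in total, so up to 4096 vCPUs). The names are hypothetical; only the MPIDR field positions (AFF0 at bit 0, AFF1 at bit 8) are architectural:

```c
#include <assert.h>
#include <stdint.h>

#define AFF_SHIFT(level) ((level) * 8u)  /* MPIDR: AFF0 at bit 0, AFF1 at bit 8 */

/* vCPU ID -> virtual affinity: low 4 bits in AFF0, the next 8 in AFF1. */
static uint64_t demo_vcpuid_to_vaffinity(unsigned int vcpuid)
{
    assert(vcpuid < (1u << 12));       /* 4 bits AFF0 + 8 bits AFF1 */
    return ((uint64_t)(vcpuid & 0xfu) << AFF_SHIFT(0)) |
           ((uint64_t)((vcpuid >> 4) & 0xffu) << AFF_SHIFT(1));
}

/* Inverse mapping; AFF0 values above 15 are never generated. */
static unsigned int demo_vaffinity_to_vcpuid(uint64_t vaff)
{
    unsigned int aff0 = (unsigned int)((vaff >> AFF_SHIFT(0)) & 0xffu);
    unsigned int aff1 = (unsigned int)((vaff >> AFF_SHIFT(1)) & 0xffu);

    return (aff1 << 4) | (aff0 & 0xfu);
}
```

For example, vCPU 17 lands in cluster AFF1=1 at AFF0=1, giving an affinity value of 0x101.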
Signed-off-by: Chen Baozi <baozich@gmail.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chen Baozi [Tue, 30 Jun 2015 08:00:15 +0000 (16:00 +0800)]
xen/arm: gic-v3: Increase the size of GICR in address space for guest
Currently it only supports up to 8 vCPUs. Increase the region to hold
up to 128 vCPUs, which is the maximum number that GIC-500 supports.
Signed-off-by: Chen Baozi <baozich@gmail.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Wed, 1 Jul 2015 14:03:14 +0000 (16:03 +0200)]
libxl: unset info->numa_placement upon successful placement
so that, if the same config is reused later, the following
two (good) things happen:
- we do not trip over warnings because node and/or vcpu
soft affinity now exist (as a consequence of the
successful placement), but numa_placement is still
true;
- we end up always using the results of the original
execution of the placement algorithm, rather than
re-running it at each re-use of the same config,
which is what most users expect and want.
This fixes the bug reported here:
http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg04454.html
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Dario Faggioli [Wed, 1 Jul 2015 14:03:07 +0000 (16:03 +0200)]
libxl: turn NUMA placement misconfigs into warnings
instead of errors. More specifically, in libxl,
b_info->numa_autoplacement is meant as a way to
disable automatic NUMA placement, if one does not
want it to happen. It is, however, useful for
consistency checking as well, i.e., to ensure that
the user provided configuration (such as, for instance,
vcpu hard or soft affinity) and NUMA placement itself
will not clash.
However, right now, if such a clash happens we abort
domain creation and error out, which is too much! It
is, in fact, enough to inform the user/caller that NUMA
placement won't be performed, with a WARN, and that's
what this commit does.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Wed, 1 Jul 2015 11:01:11 +0000 (12:01 +0100)]
xen/arm: Merge gicv_setup with vgic_domain_init
Currently, it's hard to decide whether a part of the domain
initialization should live in gicv_setup (part of the GIC
driver) or domain_init (part of the vGIC driver).
The code to initialize the domain for a specific vGIC version is always
the same no matter the version of the GIC.
Move all the domain initialization code for the vGIC into the respective
domain_init callback of each vGIC driver.
New structures have been introduced to store HW information per vGIC.
Each vGIC HW structure contains a boolean to indicate if the current GIC is
able to support this specific version of virtual GIC.
Helpers have been introduced in order to help the GIC correctly set up
the vGIC. The GIC will have to call them to announce support for this
specific version.
Also drop fields that become unnecessary in each global state.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 1 Jul 2015 11:01:10 +0000 (12:01 +0100)]
xen/arm: gic-{v2, hip04}: Remove hbase from the global state
The driver only needs to know the base address of the hypervisor
register during the GIC initialization (see {gicv2,hip04}_init).
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
Julien Grall [Wed, 1 Jul 2015 11:01:09 +0000 (12:01 +0100)]
xen/arm: gic: Allow the base address to be 0
0 is a valid physical address, and dt_device_get_address would return
an error if a problem happened during retrieval.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
Julien Grall [Wed, 1 Jul 2015 11:01:08 +0000 (12:01 +0100)]
xen/arm: gic-{v2, hip04}: Use SZ_64K rather than our custom value
It's not easy to understand PAGE_SIZE * 0x10 and PAGE_SIZE * 16 at
first glance.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
Julien Grall [Wed, 1 Jul 2015 11:01:07 +0000 (12:01 +0100)]
xen/arm: gic-{v2, hip04}: Remove redundant check in {gicv2, hip04gic}_init
There is a global check for page alignment later within the same function.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 1 Jul 2015 11:01:06 +0000 (12:01 +0100)]
xen/arm: gic-v3: Rework the messages printed at initialization
- Print all the redistributor regions rather than only the first
one...
- Add # in the format to print 0x for hexadecimal. It's easier to
differentiate from decimal
- Re-order information printed
- Drop print of the virtual addresses. It makes the log more
difficult to read and doesn't improve the user debugging experience (the
value can't be used as is).
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 1 Jul 2015 11:01:05 +0000 (12:01 +0100)]
xen/arm: gic-v3: Use the domain redistributor information to make the DT node
It's not necessary to get the redistributor information from the
hardware DT again. We already have it stored in the gic_info and
the domain.
Use the latter to be consistent with the rest of the function.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 1 Jul 2015 11:01:04 +0000 (12:01 +0100)]
xen/arm: gic-v3: Fix the distributor region to 64kB
On GICv3, the default size of the distributor region is 64kB (see 5.3
in PRD03-GENC-010745 24.0). This region can be extended to provide an
implementation defined set of pages containing additional aliases for MSI.
However, the GICv3 driver only accesses registers within the default
distributor region.
Furthermore, our vGIC driver implementation doesn't support the extended
distributor. Therefore there is no reason to expose it to DOM0.
Finally drop the field dbase_size which is not useful anymore.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 1 Jul 2015 11:01:03 +0000 (12:01 +0100)]
xen/arm: vGIC: Check return of the domain_init callback
The domain_init callback can return an error. Check it and propagate the
error if necessary.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 1 Jul 2015 11:01:02 +0000 (12:01 +0100)]
xen/arm: gic: Rename make_dt_node into make_hwdom_dt_node
Making it clear that the callback is only used to make the device tree node
for the hardware domain.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 1 Jul 2015 11:01:00 +0000 (12:01 +0100)]
xen/arm: Gate GICv3 change with HAS_GICV3 rather than CONFIG_ARM_64...
for clarity and it will be easier to understand some follow-up patches.
Also gate gic_v3 structure with HAS_GICV3.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 17 Jun 2015 13:58:27 +0000 (14:58 +0100)]
xen/arm: Find automatically the gnttab region for DOM0
Currently, the grant table region is hardcoded per-platform. When a new
board comes up, we have to check the spec in order to find a free space
in the memory layout. Depending on the platform, this may be tedious.
A good candidate for the gnttab region is the one used by the Xen binary, as
that region will never be mapped into DOM0's address space: MMIO is mapped 1:1
and the RAM will be either:
- direct mapped: 1:1 mapping is used => no problem
- non direct mapped: Xen always relocates itself as high as possible
(limited to 4GB on ARM32) and the RAM banks are filled from the first
one. It's very unlikely that the gnttab region will overlap with the
RAM. However, for safety, a check may be necessary when we re-enable
the option.
Furthermore, there is plenty of space to contain a big gnttab; the default
size is 32 frames (i.e. 128KB) but it can be changed via a command line option.
It's not possible to use the whole region used by Xen, as some part of
the binary will be freed after Xen boots and can be used by DOM0 and other
guests. A sensible choice is the text section, as it always resides in
memory, is never mapped to the guest, and is big enough (~300KB
on ARM64). It could be extended later to use other contiguous sections
such as data...
Note that on ARM64, the grant table region may be above 4GB (Xen is
relocated to the highest address), so using a 32-bit DOM0 with short page
tables may not work. However, I don't think this is a big deal, as devices
may not work and/or the RAM is too high due to the 1:1 mapping.
This patch also drops the platforms thunderx and xilinx-zynqmp, which became
empty once the hardcoded DOM0 grant table region was dropped.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Euan Harris [Thu, 2 Jul 2015 10:30:05 +0000 (11:30 +0100)]
libxl: doc: Fix nonexistent error code in libxl_event_check example
Fix example code in comment. libxl_event_check() can return
ERROR_NOT_READY; LIBXL_NOT_READY does not exist.
Signed-off-by: Euan Harris <euan.harris@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>