Andrew Cooper [Mon, 27 Jan 2014 16:25:24 +0000 (16:25 +0000)]
tools/libxc: goto correct label on error paths
Both of these "goto finish;" statements are actually errors, and need to "goto
out;" instead, which will correctly destroy the domain and return an error,
rather than trying to finish the migration (and in at least one scenario,
return success).
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Tue, 4 Feb 2014 08:22:12 +0000 (09:22 +0100)]
x86/domctl: don't ignore errors from vmce_restore_vcpu()
What started out as a simple cleanup patch (eliminating the redundant
check of domctl->cmd before setting "copyback", which as a result
turned the "ext_vcpucontext_out" label useless) revealed a bug in the
handling of XEN_DOMCTL_set_ext_vcpucontext.
Fix this, retaining the cleanup, and at once dropping a stale comment
and an accompanying formatting issue.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 3 Feb 2014 08:31:03 +0000 (09:31 +0100)]
QEMU_UPSTREAM_REVISION -> master again
Ian Jackson [Fri, 31 Jan 2014 11:21:55 +0000 (11:21 +0000)]
Merge branch 'master' into staging
Ian Jackson [Tue, 28 Jan 2014 15:48:55 +0000 (16:48 +0100)]
Update QEMU_TAG and QEMU_UPSTREAM_REVISION for 4.4.0-rc3
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit a96bbe5fd79ea8ac6b40e90965f84aab839d3391)
Ian Jackson [Thu, 30 Jan 2014 03:47:11 +0000 (03:47 +0000)]
Update QEMU_TAG and QEMU_UPSTREAM_REVISION for 4.4.0-rc3
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Stefano Stabellini [Tue, 28 Jan 2014 15:48:55 +0000 (16:48 +0100)]
Update QEMU_UPSTREAM_REVISION
Switch back to master.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Olaf Hering [Tue, 28 Jan 2014 12:33:57 +0000 (13:33 +0100)]
blkif.h: enhance comments related to the discard feature
Also fix the name of the discard-alignment property, add the missing 'n'.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Tue, 28 Jan 2014 12:31:28 +0000 (13:31 +0100)]
xen/unlz4: always set an error return code on failures
"ret", being set to -1 early on, gets cleared by the first invocation
of lz4_decompress()/lz4_decompress_unknownoutputsize(), and hence
subsequent failures wouldn't be noticed by the caller without setting
it back to -1 right after those calls.
Linux commit: 2a1d689c9ba42a6066540fb221b6ecbd6298b728
Reported-by: Matthew Daley <mattjd@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
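The "ret" handling described in the unlz4 fix above can be sketched as follows. This is only an illustration under stated assumptions: decode_chunk() is a hypothetical stand-in for lz4_decompress(), and unpack() with its length bookkeeping is invented for the example rather than taken from the Xen or Linux sources.

```c
#include <stddef.h>

/* Hypothetical stand-in for lz4_decompress(): 0 on success, negative on
 * failure. Here a non-positive chunk length counts as corrupt input. */
static int decode_chunk(int len)
{
    return len > 0 ? 0 : -1;
}

/* Decode n chunks whose lengths must add up to exactly 'total'. */
int unpack(const int *chunks, size_t n, int total)
{
    int ret = -1;                 /* pessimistic default */
    size_t i;

    for (i = 0; i < n; i++) {
        ret = decode_chunk(chunks[i]);
        if (ret < 0)
            goto out;
        ret = -1;                 /* re-arm: the call above cleared it */

        total -= chunks[i];
        if (total < 0)            /* error path that never assigns ret */
            goto out;
    }

    if (total != 0)               /* trailing garbage or short input */
        goto out;

    ret = 0;                      /* only now report success */
out:
    return ret;
}
```

Without the "ret = -1" re-arm line, the "total < 0" path in this sketch would return 0 after any earlier successful chunk, which is the kind of silent failure the commit message describes.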
Andrew Cooper [Fri, 24 Jan 2014 18:28:11 +0000 (18:28 +0000)]
minios: Correct HYPERVISOR_physdev_op()
A physdev_op is a two argument hypercall, taking a command parameter and an
optional pointer to a structure.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 21 Jan 2014 18:45:31 +0000 (18:45 +0000)]
xenstore: xs_suspend_evtchn_port: always free portstr
If portstr!=NULL but plen==0 this function would leak portstr.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
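A minimal sketch of the leak pattern in the xs_suspend_evtchn_port fix above, assuming a hypothetical read_value() helper that returns a malloc()ed string and its length. The names parse_port() and read_value() are illustrative and are not the xenstore API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader: returns a heap copy of 'stored' and its length. */
static char *read_value(const char *stored, unsigned int *len)
{
    size_t n = strlen(stored);
    char *s = malloc(n + 1);

    if (s) {
        memcpy(s, stored, n + 1);
        *len = (unsigned int)n;
    }
    return s;
}

/* Returns the parsed port, or -1 if the value is missing or empty. */
int parse_port(const char *stored)
{
    unsigned int plen = 0;
    int port = -1;
    char *portstr = read_value(stored, &plen);

    if (portstr && plen)
        port = atoi(portstr);

    /* Free on every path, including portstr != NULL with plen == 0.
     * free(NULL) is a no-op, so no extra check is needed. */
    free(portstr);
    return port;
}
```

Keeping the free() outside the "portstr && plen" condition is what covers the empty-string case.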
Ian Jackson [Tue, 21 Jan 2014 18:45:30 +0000 (18:45 +0000)]
xl: Free optdata_begin when saving domain config
This makes valgrind a bit happier.
It is also Coverity-CID: 1055903
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Tue, 21 Jan 2014 18:45:29 +0000 (18:45 +0000)]
libxl: events: Pass correct nfds to poll
libxl_event.c:eventloop_iteration would pass the allocated pollfds
array size, rather than the used size, to poll (and to
afterpoll_internal).
The effect is that if the number of fds to poll on reduces, libxl will
poll on stale entries. Because of the way the return value from poll
is processed these stale entries are often harmless because any events
coming back from poll are ignored by libxl. However, it could cause
malfunctions:
It could result in unwanted SIGTTIN/SIGTTOU/SIGPIPE, for example, if
the fd has been reused to refer to an object which can generate those
signals. Alternatively, it could result in libxl spinning if the
stale entry refers to an fd which happens now to be ready for the
previously-requested operation.
I have tested this with a localhost migration and inspected the strace
output.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
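The distinction between allocated and used array size in the poll fix above can be sketched like this. The struct fd_table and fd_table_poll() names are assumptions made for the example; only poll() itself is a real interface, and this is not the libxl event loop code.

```c
#include <poll.h>

/* Hypothetical bookkeeping for a growable pollfd array. */
struct fd_table {
    struct pollfd *fds;   /* backing storage */
    nfds_t allocated;     /* capacity of fds[] */
    nfds_t used;          /* number of live entries at the front */
};

/* Poll only the live entries. Passing t->allocated instead would also
 * hand any stale slots beyond t->used to the kernel. */
int fd_table_poll(struct fd_table *t, int timeout_ms)
{
    return poll(t->fds, t->used, timeout_ms);
}
```

If a stale slot past "used" still names a descriptor that happens to be ready, polling with the allocated size reports an event that nothing will consume, which matches the spinning scenario in the commit message.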
Fabio Fantoni [Tue, 21 Jan 2014 13:51:08 +0000 (14:51 +0100)]
tools/hotplug: fix bug on xendomains using xl
Make rdname function work with xl.
The rdname function does not support the json output of xl commands, and this
causes problems when using xl. For example, on domU autostart the check for
whether domUs are already running (because they have been restored) does not
succeed and the domain is created in any case, causing xl create to fail.
Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Pranavkumar Sawargaonkar [Mon, 27 Jan 2014 11:34:48 +0000 (17:04 +0530)]
xen: arm: platforms: Adding reset support for xgene arm64 platform.
This patch adds a reset support for xgene arm64 platform.
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 24 Jan 2014 08:01:21 +0000 (08:01 +0000)]
libxc/unlz4: always set an error return code on failures
"ret", being set to -1 early on, gets cleared by the first invocation
of lz4_decompress()/lz4_decompress_unknownoutputsize(), and hence
subsequent failures wouldn't be noticed by the caller without setting
it back to -1 right after those calls.
Linux commit: 2a1d689c9ba42a6066540fb221b6ecbd6298b728
Reported-by: Matthew Daley <mattjd@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Mike Neilsen [Wed, 22 Jan 2014 17:41:11 +0000 (11:41 -0600)]
mini-os: Fix stubdom build failures on gcc 4.8
This is a fix for bug 35:
http://bugs.xenproject.org/xen/bug/35
This bug report describes several format string mismatches which prevent
building the stubdom target in Xen 4.3 and Xen 4.4-rc2 on gcc 4.8. This is a
copy of Alex Sharp's original patch with the following modifications:
* Andrew Cooper's recommendation applied to extras/mini-os/xenbus/xenbus.c to
avoid stack corruption
* Samuel Thibault's recommendation to make "fun" an unsigned int rather than an
unsigned long in pcifront_physical_to_virtual and related functions
(extras/mini-os/include/pcifront.h and extras/mini-os/pcifront.c)
Tested on x86_64 gcc Ubuntu/Linaro 4.8.1-10ubuntu9.
Coverity-IDs: 1055807 1055808 1055809 1055810
Signed-off-by: Mike Neilsen <mneilsen@acm.org>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 24 Jan 2014 12:41:36 +0000 (13:41 +0100)]
x86: PHYSDEVOP_{prepare,release}_msix are privileged
Yet this wasn't being enforced.
This is XSA-87.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 Jan 2014 09:19:53 +0000 (10:19 +0100)]
Revert "x86/viridian: Time Reference Count MSR"
This mostly reverts commit e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b.
hvm_get_guest_time() is not a suitable time source for this MSR, as
it resets across migration.
Conflicts:
xen/arch/x86/hvm/viridian.c
xen/include/asm-x86/perfc_defn.h
Andrew Cooper [Thu, 23 Jan 2014 12:55:42 +0000 (13:55 +0100)]
x86/irq: avoid use-after-free on error path in pirq_guest_bind()
This is XSA-83.
Coverity-ID: 1146952
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 23 Jan 2014 09:30:08 +0000 (10:30 +0100)]
x86: don't drop guest visible state updates when 64-bit PV guest is in user mode
Since 64-bit PV uses separate kernel and user mode page tables, kernel
addresses (as usually provided via VCPUOP_register_runstate_memory_area
and possibly via VCPUOP_register_vcpu_time_memory_area) aren't
necessarily accessible when the respective updating occurs. Add logic
for toggle_guest_mode() to take care of this (if necessary) the next
time the vCPU switches to kernel mode.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 23 Jan 2014 09:29:12 +0000 (10:29 +0100)]
unmodified_drivers: make usbfront build conditional
Commit 0dcfb88fb8 ("unmodified_drivers: enable build of usbfront
driver") results in the PV drivers no longer building against older
(pre-2.6.35) Linux versions. That's because usbfront.h includes
headers from drivers/usb/core/, which is generally unavailable when
building out-of-tree modules.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Yang Zhang [Thu, 23 Jan 2014 09:27:34 +0000 (10:27 +0100)]
Nested VMX: prohibit virtual vmentry/vmexit during IO emulation
Sometimes, L0 needs to decode L2's instruction to handle IO access directly,
and L0 may get X86EMUL_RETRY when handling this IO request. If a virtual
vmexit is pending at the same time (for example, an interrupt pending
injection to L1), the hypervisor will switch the VCPU context from L2 to L1.
We are then already in L1's context, but because of the X86EMUL_RETRY the
hypervisor will retry handling the IO request later, and unfortunately the
retry will happen in L1's context, which causes the problem. The fix is to
disallow virtual vmexit/vmentry while there is a pending IO request.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Jan Beulich [Mon, 20 Jan 2014 08:50:20 +0000 (09:50 +0100)]
compat/memory: fix build with old gcc
struct xen_add_to_physmap_batch's size field being uint16_t causes old
compiler versions to warn about the pointless range check done inside
compat_handle_okay().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Mon, 20 Jan 2014 08:49:20 +0000 (09:49 +0100)]
common/memory: Fix ABI breakage for XENMEM_add_to_physmap
caused by c/s 4be86bb194e25e46b6cbee900601bfee76e8090a
In public/memory.h, struct xen_add_to_physmap has 'space' as an unsigned int,
but struct xen_add_to_physmap_batch has 'space' as a uint16_t.
By defining xenmem_add_to_physmap_one() with space defined as uint16_t, the
now-common xenmem_add_to_physmap() implicitly truncates xatp->space from
unsigned int to uint16_t, which changes the space switch()'d upon.
This wouldn't be noticed with any upstream code (of which I am aware), but was
discovered because of the XenServer support for legacy Windows PV drivers,
which make XENMEM_add_to_physmap hypercalls using spaces with the top bit set.
The current Windows PV drivers don't do this any more, but we 'fix' Xen to
support running VMs with out-of-date tools.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Ack: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
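The implicit truncation described in the XENMEM_add_to_physmap fix above can be reproduced in a few lines. This is a sketch under assumptions: the enum, the space numbers 0 and 1, and all function names are hypothetical and only model a common entry point that forwards an unsigned int to a helper.

```c
#include <stdint.h>

enum hit { HIT_SPACE0, HIT_SPACE1, HIT_UNKNOWN };

/* Helper declared with a 16-bit parameter: callers passing an
 * unsigned int have the upper 16 bits silently discarded. */
static enum hit one_narrow(uint16_t space)
{
    switch (space) {
    case 0:  return HIT_SPACE0;
    case 1:  return HIT_SPACE1;
    default: return HIT_UNKNOWN;
    }
}

/* Same helper with the full-width parameter type. */
static enum hit one_wide(unsigned int space)
{
    switch (space) {
    case 0:  return HIT_SPACE0;
    case 1:  return HIT_SPACE1;
    default: return HIT_UNKNOWN;
    }
}

/* Hypothetical common entry points taking the public 'unsigned int'. */
enum hit add_to_physmap_narrow(unsigned int space)
{
    return one_narrow(space);   /* implicit unsigned int -> uint16_t */
}

enum hit add_to_physmap_wide(unsigned int space)
{
    return one_wide(space);     /* value preserved */
}
```

With a space value that has the top bit set, such as 0x80000001, the narrow variant switches on 1 while the wide variant falls through to the default case, so the two paths take different branches for the same guest-supplied value.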
Andrew Cooper [Mon, 20 Jan 2014 08:48:11 +0000 (09:48 +0100)]
common/sysctl: Don't leak status in SYSCTL_page_offline_op
In addition, 'copyback' should be cleared even in the error case.
Also fix the indentation of the arguments to copy_to_guest() to help clarify
that the 'ret = -EFAULT' is not part of the condition.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Fri, 17 Jan 2014 15:02:00 +0000 (16:02 +0100)]
MAINTAINERS: remove Linux sections
The LINUX (PV_OPS) section was out-dated and it's better to only have
this information in one place (the Linux MAINTAINERS file).
The LINUX (XCP) section was for an external project that hasn't been
maintained for years.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Yang Zhang [Fri, 17 Jan 2014 15:00:21 +0000 (16:00 +0100)]
nested EPT: fixing wrong handling for L2 guest's direct mmio access
An L2 guest may access a physical device directly (nested VT-d). For such
access, the shadow EPT table should point to the device's MMIO. But in the
current logic, L0 doesn't distinguish whether the MMIO comes from qemu or
from a physical device when building the shadow EPT table. This is wrong.
This patch sets up the correct shadow EPT table for such MMIO ranges.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Frediano Ziglio [Fri, 17 Jan 2014 14:58:27 +0000 (15:58 +0100)]
mce: fix race condition in mctelem_xchg_head
The function (mctelem_xchg_head()) used to exchange mce telemetry
list heads is racy. It may write to the head twice, with the second
write linking to an element in the wrong state.
Consider two threads: T1 inserting on the committed list, and T2
trying to consume it.
1. T1 starts inserting an element (A), sets prev pointer (mcte_prev).
2. T1 is interrupted after the cmpxchg succeeded.
3. T2 gets the list and changes element A and updates the commit list
head.
4. T1 resumes, reads pointer to prev again and compares it with the result
from the cmpxchg, which succeeded, but in the meantime prev changed
in memory.
5. T1 thinks the cmpxchg failed and goes around the loop again,
linking head to A again.
To solve the race use a temporary variable for the prev pointer.
*linkp (which points to a field in the element) must be updated before
the cmpxchg() as after a successful cmpxchg the element might be
immediately removed and reinitialized.
The wmb() prior to the cmpxchgptr() call is not necessary since it is
already a full memory barrier. This wmb() is thus removed.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
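The shape of the mctelem_xchg_head fix above (link the element first, then decide success from a local copy of the old head rather than by re-reading the element) can be sketched with C11 atomics. This is an assumption-laden illustration: struct node and push() are hypothetical, and atomic_compare_exchange_weak() is swapped in for Xen's cmpxchgptr().

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list element; 'next' plays the role of *linkp. */
struct node {
    struct node *next;
    int val;
};

/* Lock-free push onto a singly linked list head. */
void push(_Atomic(struct node *) *head, struct node *n)
{
    /* Local temporary holding the head value we expect to replace. */
    struct node *old = atomic_load(head);

    do {
        /* Link before publishing: once the exchange succeeds another
         * thread may immediately remove and reinitialise 'n'. */
        n->next = old;
        /* Success is judged from the exchange result and the local
         * 'old', never by reading n->next back from memory. On
         * failure 'old' is refreshed with the current head. */
    } while (!atomic_compare_exchange_weak(head, &old, n));
}
```

Because the loop condition never dereferences n after the exchange, a consumer that rewrites the element in the window described in steps 2 to 4 cannot make the producer believe the exchange failed.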
Ian Campbell [Tue, 14 Jan 2014 17:32:54 +0000 (17:32 +0000)]
xen: arm: correct guest PSCI handling on 64-bit hypervisor.
Using ->rN truncates the 64-bit registers to 32-bits, which on X-gene chops
off the top bit of the entry address for PSCI_UP.
Follow the pattern established in do_trap_hypercall.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
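The effect of reading a 64-bit register through a 32-bit view, as in the PSCI fix above, can be shown with a small sketch. The struct regs64 layout, the function names and the example address are assumptions for illustration and do not reflect Xen's actual register structures.

```c
#include <stdint.h>

/* Hypothetical register file holding one 64-bit guest register. */
struct regs64 {
    uint64_t x1;
};

/* Problem pattern: a 32-bit view drops bits 63:32 of the value. */
uint64_t entry_via_32bit_view(const struct regs64 *r)
{
    return (uint32_t)r->x1;
}

/* Fixed pattern: read the register at its full width. */
uint64_t entry_via_64bit_view(const struct regs64 *r)
{
    return r->x1;
}
```

For an illustrative entry address above 4GiB such as 0x4000080000, the 32-bit view yields 0x00080000, so the vCPU would be started at the wrong address.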
Ian Jackson [Mon, 13 Jan 2014 18:15:37 +0000 (18:15 +0000)]
xl: Always use "fast" migration resume protocol
As Ian Campbell writes in http://bugs.xenproject.org/xen/bug/30:
There are two mechanisms by which a suspend can be aborted and the
original domain resumed.
The older method is that the toolstack resets a bunch of state (see
tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then
restarts the domain. The domain will see HYPERVISOR_suspend return 0
and will continue without any realisation that it is actually
running in the original domain and not in a new one. This method is
supposed to be implemented by libxl_domain_resume(suspend_cancel=0)
but it is not.
The other method is newer and in this case the toolstack arranges
that HYPERVISOR_suspend returns SUSPEND_CANCEL and restarts it. The
domain will observe this and realise that it has been restarted in
the same domain and will behave accordingly. This method is
implemented, correctly AFAIK, by
libxl_domain_resume(suspend_cancel=1).
Attempting to use the old method without doing all of the work simply
causes the guest to crash. Implementing the work required for old
method, or for checking that domains actually support the new method,
is not feasible at this stage of the 4.4 release.
So, always use the new method, without regard to the declarations of
support by the guest. This is a strict improvement: guests which do
in fact support the new method will work, whereas ones which don't are
no worse off.
There are two call sites of libxl_domain_resume that need fixing, both
in the migration error path.
With this change I observe a correct and successful resumption of a
Debian wheezy guest with a Linux 3.4.70 kernel after a migration
attempt which I arranged to fail by nobbling the block hotplug script.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: konrad.wilk@oracle.com
CC: David Vrabel <david.vrabel@citrix.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Wei Liu [Mon, 13 Jan 2014 11:52:28 +0000 (11:52 +0000)]
libxl: disallow PCI device assignment for HVM guest when PoD is enabled
This replicates a Xend behavior, see ec789523749 ("xend: Dis-allow
device assignment if PoD is enabled.").
This change is restricted to HVM guest, as only HVM is relevant in the
counterpart in Xend. We're late in release cycle so the change should
only do what's necessary. Probably we can revisit it if we need to do
the same thing for PV guest in the future.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Tue, 14 Jan 2014 13:36:55 +0000 (13:36 +0000)]
xen/arm: p2m: Correctly flush TLB in create_p2m_entries
The p2m is shared between VCPUs for each domain. Currently Xen only flushes
the TLB on the local PCPU. This could result in a mismatch between the mapping
in the p2m and the TLBs.
Flush TLB entries used by this domain on every PCPU. The flush can also be
moved out of the loop because:
- ALLOCATE: only called for dom0 RAM allocation, so the flush is never called
- INSERT: if valid = 1 that would mean we have replaced a
page that already belongs to the domain. A VCPU can write on the wrong page.
This can happen for dom0 with the 1:1 mapping because the mapping is not
removed from the p2m.
- REMOVE: except for grant-table (replace_grant_host_mapping), each
call to guest_physmap_remove_page is protected by the callers via a
get_page -> .... -> guest_physmap_remove_page -> ... -> put_page. So
the page can't be allocated for another domain until the last put_page.
- RELINQUISH : the domain is not running anymore so we don't care...
Also avoid leaking a foreign page if the function INSERTs a new mapping
on top of a foreign mapping.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Thu, 9 Jan 2014 16:58:03 +0000 (16:58 +0000)]
xen/arm: correct flush_tlb_mask behaviour
On ARM, flush_tlb_mask is used in the common code:
- alloc_heap_pages: the flush is only called if the newly allocated
page was used by a domain before. So we only need to flush the non-secure
non-hyp inner-shareable TLBs.
- common/grant-table.c: every call to flush_tlb_mask is used with
the current domain. A TLB flush by current VMID inner-shareable is enough.
The current code only flushes the hypervisor TLB on the current PCPU. For now,
flush non-secure non-hyp TLBs on every PCPU.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Anil Madhavapeddy [Sat, 11 Jan 2014 23:33:25 +0000 (23:33 +0000)]
libxl: ocaml: guard x86-specific functions behind an ifdef
The various cpuid functions are not available on ARM, so this
makes them raise an OCaml exception. Omitting the functions
completely results in a link failure in oxenstored due to the
missing symbols, so this is preferable to the much bigger patch
that would result from adding conditional compilation into the
OCaml interfaces.
With this patch, oxenstored can successfully start a domain on
Xen/ARM.
Signed-off-by: Anil Madhavapeddy <anil@recoil.org>
Acked-by: David Scott <dave.scott@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Don Slutz [Fri, 10 Jan 2014 21:57:00 +0000 (16:57 -0500)]
xg_main: If XEN_DOMCTL_gdbsx_guestmemio fails then force error.
Without this gdb does not report an error.
With this patch and using a 1G hvm domU:
(gdb) x/1xh 0x6ae9168b
0x6ae9168b: Cannot access memory at address 0x6ae9168b
Drop output of iop->remain because it most likely will be zero.
This leads to a strange message:
ERROR: failed to read 0 bytes. errno:14 rc:-1
Add address to write error because it may be the only message
displayed.
Note: currently XEN_DOMCTL_gdbsx_guestmemio does not change 'iop' on
error and so iop->remain will be zero.
Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Don Slutz [Fri, 10 Jan 2014 21:56:59 +0000 (16:56 -0500)]
xg_read_mem: Report on error.
I had coded this with XGERR, but gdb will try to read memory without
a direct request from the user. So the error message can be confusing.
Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Jan Beulich [Tue, 14 Jan 2014 15:19:08 +0000 (16:19 +0100)]
update Xen version to 4.4-rc2
Ian Campbell [Tue, 7 Jan 2014 15:52:29 +0000 (15:52 +0000)]
Revert "tools: libxc: flush data cache after loading images into guest memory"
This reverts commit a0035ecc0d82c1d4dcd5e429e2fcc3192d89747a.
Even with this fix there is a period between the flush and the unmap where
the processor may speculate data into the cache. The solution is to map this
region uncached or to use the HCR.DC bit to mark all guest accesses cached.
89eb02c2204a "xen: arm: force guest memory accesses to cacheable when MMU is
disabled" has arranged to do the latter.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 8 Jan 2014 14:09:01 +0000 (14:09 +0000)]
xen: arm: force guest memory accesses to cacheable when MMU is disabled
On ARM guest OSes are started with MMU and Caches disabled (as they are on
native) however caching is enabled in the domain running the builder and
therefore we must ensure cache consistency.
The existing solution to this problem (a0035ecc0d82 "tools: libxc: flush data
cache after loading images into guest memory") is to flush the caches after
loading the various blobs into guest RAM. However this approach has two
shortcomings:
- The cache flush primitives available to userspace on arm32 are not
sufficient for our needs.
- There is a race between the cache flush and the unmap of the guest page
where the processor might speculatively dirty the cache line again.
(of these the second is the more fundamental)
This patch makes use of the hardware functionality to force all accesses
made from guest mode to be cached (the HCR.DC == default cached bit). This
means that we don't need to worry about the domain builder's writes being
cached because the guest's "uncached" accesses will actually be cached.
Unfortunately the use of HCR.DC is incompatible with the guest enabling its
MMU (SCTLR.M bit). Therefore we must trap accesses to the SCTLR so that we can
detect when this happens and disable HCR.DC. This is done with the HCR.TVM
(trap virtual memory controls) bit which also causes various other registers
to be trapped, all of which can be passed straight through to the underlying
register. Once the guest has enabled its MMU we no longer need to trap so
there is no ongoing overhead. In my tests Linux makes about half a dozen
accesses to these registers before the MMU is enabled, I would expect other
OSes to behave similarly (the sequence of writes needed to setup the MMU is
pretty obvious).
Apart from this unfortunate need to trap these accesses this approach is
incompatible with guests which attempt to do DMA operations with their MMU
disabled. In practice this means guests with passthrough which we do not yet
support. Since a typical guest (including dom0) does not access devices which
require DMA until after it is fully up and running with paging enabled the
main risk is to in-guest firmware which does DMA i.e. running EFI in a guest,
with a disk passed through and booting from that disk. Since we know that dom0
is not using any such firmware and we do not support device passthrough to
guests yet we can live with this restriction. Once passthrough is implemented
this will need to be revisited.
The patch includes a couple of seemingly unrelated but necessary changes:
- HSR_SYSREG_CRN_MASK was incorrectly defined, which happened to be benign
with the existing set of system register we handled, but broke with the new
ones introduced here.
- The defines used to decode the HSR system register fields were named the
same as the register. This breaks the accessor macros. This had gone
unnoticed because the handling of the existing trapped registers did not
require accessing the underlying hardware register. Rename those constants
with an HSR_SYSREG prefix (in line with HSR_CP32/64 for 32-bit registers).
This patch has survived thousands of boot loops on a Midway system.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Julien Grall [Fri, 10 Jan 2014 03:27:55 +0000 (03:27 +0000)]
xen/arm: Scrub heap pages during boot
Scrubbing of heap pages was disabled because it was slow on the models. Now
that Xen supports real hardware, it's possible to enable scrubbing by default.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Rob Hoes [Fri, 10 Jan 2014 13:52:04 +0000 (13:52 +0000)]
libxl: ocaml: use 'for_app_registration' in osevent callbacks
This allows the application to pass a token to libxl in the fd/timeout
registration callbacks, which it receives back in modification or
deregistration callbacks.
It turns out that this is essential for timeout handling, in order to
identify which timeout to change on a modify event.
Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
Acked-by: David Scott <dave.scott@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
David Vrabel [Fri, 10 Jan 2014 16:46:33 +0000 (17:46 +0100)]
x86: map portion of kexec crash area that is within the direct map area
Commit 7113a45451a9f656deeff070e47672043ed83664 (kexec/x86: do not map
crash kernel area) causes fatal page faults when loading a crash
image. The attempt to zero the first control page allocated from the
crash region will fault as the VA returned by map_domain_page() has no
mapping.
The fault will occur on non-debug builds of Xen when the crash area is
below 5 TiB (which will be most systems).
The assumption that the crash area mapping was not used is incorrect.
map_domain_page() is used when loading an image and building the
image's page tables to temporarily map the crash area, thus the
mapping is required if the crash area is in the direct map area.
Reintroduce the mapping, but only the portions of the crash area that
are within the direct map area.
Reported-by: Don Slutz <dslutz@verizon.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
This is really just a band aid - kexec shouldn't rely on the crash area
being always mapped when in the direct mapping range (and it didn't use
to in its previous form). That's primarily because map_domain_page()
(needed when the area is outside the direct mapping range) may be
unusable when wanting to kexec due to a crash, but also because in the
case of PFN compression the kexec range (if specified on the command
line) could fall into a hole between used memory ranges (while we're
currently only ignoring memory at the top of the physical address
space, it's pretty clear that sooner or later we will want that
selection to become more sophisticated in order to maximize the memory
made use of).
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 10 Jan 2014 16:45:01 +0000 (17:45 +0100)]
dbg_rw_guest_mem: need to call put_gfn in error path
Using a 1G hvm domU (in grub) and gdbsx:
(gdb) set arch i8086
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB. Attempting to continue with the default i8086 settings.
The target architecture is assumed to be i8086
(gdb) target remote localhost:9999
Remote debugging using localhost:9999
Remote debugging from host 127.0.0.1
0x0000d475 in ?? ()
(gdb) x/1xh 0x6ae9168b
Will reproduce this bug.
With a debug=y build you will get:
Assertion '!preempt_count()' failed at preempt.c:37
For a debug=n build you will get a dom0 VCPU hung (at some point) in:
[ffff82c4c0126eec] _write_lock+0x3c/0x50
ffff82c4c01e43a0 __get_gfn_type_access+0x150/0x230
ffff82c4c0158885 dbg_rw_mem+0x115/0x360
ffff82c4c0158fc8 arch_do_domctl+0x4b8/0x22f0
ffff82c4c01709ed get_page+0x2d/0x100
ffff82c4c01031aa do_domctl+0x2ba/0x11e0
ffff82c4c0179662 do_mmuext_op+0x8d2/0x1b20
ffff82c4c0183598 __update_vcpu_system_time+0x288/0x340
ffff82c4c015c719 continue_nonidle_domain+0x9/0x30
ffff82c4c012938b add_entry+0x4b/0xb0
ffff82c4c02223f9 syscall_enter+0xa9/0xae
And gdb output:
(gdb) x/1xh 0x6ae9168b
0x6ae9168b: 0x3024
(gdb) x/1xh 0x6ae9168b
0x6ae9168b: Ignoring packet error, continuing...
Reply contains invalid hex digit 116
The 1st one worked because the p2m.lock is recursive and the PCPU
had not yet changed.
crash reports (for example):
crash> mm_rwlock_t 0xffff83083f913010
struct mm_rwlock_t {
lock = {
raw = {
lock = 2147483647
},
debug = {<No data fields>}
},
unlock_level = 0,
recurse_count = 1,
locker = 1,
locker_function = 0xffff82c4c022c640 <__func__.13514> "__get_gfn_type_access"
}
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
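The dbg_rw_guest_mem fix above comes down to releasing on the error path what was acquired on entry. A minimal sketch, assuming a hypothetical counter in place of the p2m lock and invented names get_ref(), put_ref() and rw_mem(), none of which are the Xen functions:

```c
/* Hypothetical stand-in for the recursive lock's hold count. */
static int held;

static void get_ref(void) { held++; }   /* models the get side */
static void put_ref(void) { held--; }   /* models the put side */

/* Returns 0 on success, -1 if the page cannot be accessed. */
int rw_mem(int page_is_ram)
{
    int rc = 0;

    get_ref();

    if (!page_is_ram) {
        rc = -1;
        goto out;            /* error path still reaches put_ref() */
    }

    /* ... the actual copy would happen here ... */

out:
    put_ref();
    return rc;
}

/* Exposed so callers can check that nothing is left held. */
int refs_held(void)
{
    return held;
}
```

Returning directly from the error branch instead of jumping to "out" would leave the count at one, which corresponds to the recurse_count = 1 visible in the crash output above.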
Stefan Bader [Wed, 8 Jan 2014 17:26:59 +0000 (18:26 +0100)]
libxl: Auto-assign NIC devids in initiate_domain_create
This will change initiate_domain_create to walk through NIC definitions
and automatically assign devids to those which have not been assigned one.
The devids are needed later in domcreate_launch_dm (for HVM domains
using emulated NICs). The command string for starting the device-model
has those ids as part of its arguments.
Assignment of devids in the hotplug case is handled by libxl_device_nic_add
but that would be called too late in the startup case.
I also moved the call to libxl__device_nic_setdefault here as this seems
to be the only path leading there and avoids doing the loop a third time.
The two loops are trying to handle a case where the caller sets some devids
(not sure that should be valid) but leaves some unset.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
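The two-loop approach mentioned in the NIC devid commit above can be sketched as follows, assuming that an unset devid is represented by a negative value. The function name assign_devids() and the plain int array are hypothetical simplifications, not the libxl data structures.

```c
/* Assign ids to entries that have none (negative value), without
 * colliding with ids the caller already set. */
void assign_devids(int *devid, int n)
{
    int last = -1;
    int i;

    /* First loop: find the highest id the caller supplied. */
    for (i = 0; i < n; i++)
        if (devid[i] > last)
            last = devid[i];

    /* Second loop: hand out the following ids to unset entries. */
    for (i = 0; i < n; i++)
        if (devid[i] < 0)
            devid[i] = ++last;
}
```

Scanning for the maximum first is what lets a mix of preset and unset ids coexist: with the input { -1, 3, -1 } the unset entries receive 4 and 5 rather than 0 and 1.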
Wei Liu [Thu, 9 Jan 2014 11:48:13 +0000 (11:48 +0000)]
docs/man/xl.cfg.pod.5: document global VNC options for VFB device
Update xl.cfg to reflect change in 706d4ab74 "xl: create VFB for PV
guest when VNC is specified".
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Tue, 7 Jan 2014 10:04:23 +0000 (10:04 +0000)]
tools/libxc: Correct read_exact() error messages
The errors have been incorrectly identifying their function since c/s
861aef6e1558bebad8fc60c1c723f0706fd3ed87 which did a lot of error handling
cleanup.
Use __func__ to ensure the name remains correct in the future.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Julien Grall [Mon, 6 Jan 2014 16:36:18 +0000 (16:36 +0000)]
xen/dts: Don't translate invalid address
ePAPR specifies that if the property "ranges" doesn't exist in a bus node:
"it is assumed that no mapping exists between children of node and the parent
address space".
Modify dt_number_of_address to check if the list of ranges is valid. Return
0 (i.e. there are zero ranges) if the list is not valid.
This patch has been tested on the Arndale where the bug can occur with the
'/hdmi' node.
Reported-by: <tsahee@gmx.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Don Slutz [Wed, 8 Jan 2014 00:25:44 +0000 (19:25 -0500)]
gdbsx: Add Emacs local variables to source files.
These 2 files are changed in this patch set. So add the allowed
"Emacs local variables" from CODING_STYLE.
Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Tue, 17 Dec 2013 22:53:45 +0000 (22:53 +0000)]
xl: create VFB for PV guest when VNC is specified
This replicates a Xend behavior. When you specify 'vnc=1' and there's no
'vfb=[]' in a PV guest's config file, xl parses all top level VNC options and
creates a VFB for you.
Fixes bug #25.
http://bugs.xenproject.org/xen/bug/25
Reported-by: Konrad Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Rob Hoes [Thu, 12 Dec 2013 16:36:50 +0000 (16:36 +0000)]
libxl: ocaml: use int64 for timeval fields in the timeout_register callback
The original code works fine on 64-bit, but on 32-bit, the OCaml int (which is
1 bit smaller than the C int) is likely to overflow.
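The arithmetic can be sketched in C. This assumes the usual OCaml representation in which one bit of the machine word is used as a tag, so a 32-bit OCaml int holds 31 bits; the macro and function names are made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 31-bit signed integer covers [-2^30, 2^30 - 1]. */
#define OCAML_INT31_MAX ((int64_t)0x3FFFFFFF)
#define OCAML_INT31_MIN (-OCAML_INT31_MAX - 1)

/* True if v survives conversion to a 31-bit OCaml int unchanged. */
static bool fits_in_ocaml_int31(int64_t v)
{
    return v >= OCAML_INT31_MIN && v <= OCAML_INT31_MAX;
}
```

If the seconds field carries an absolute time (an assumption about the caller), a value such as 1389000000 is already above 2^30 - 1 = 1073741823, whereas a microseconds field below 1000000 always fits.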
Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Anthony PERARD [Wed, 8 Jan 2014 08:17:55 +0000 (09:17 +0100)]
firmware: change level-triggered GPE event to an edge one for qemu-xen
This should help to reduce a CPU hotplug race window where a cpu hotplug
event will not be seen by the OS.
When hotplugging more than one vcpu, some of those vcpus might not be
seen as plugged by the guest.
This is what is currently happening:
1. hw adds cpu, sets GPE.2 bit and sends SCI
2. OSPM gets SCI, reads GPE00.sts and masks GPE.2 bit in GPE00.en
3. OSPM executes _L02 (level-triggered event associated with cpu hotplug)
4. hw adds second cpu and sets GPE.2 bit but SCI is not asserted
since GPE00.en masks event
5. OSPM resets GPE.2 bit in GPE00.sts and unmasks it in GPE00.en
As a result, the event for step 4 is lost because step 5 clears it, and the OS
will not see the added second cpu.
ACPI 5.0 spec: 5.6.4 General-Purpose Event Handling
defines GPE event handling as follows:
1. Disables the interrupt source (GPEx_BLK EN bit).
2. If an edge event, clears the status bit.
3. Performs one of the following:
* Dispatches to an ACPI-aware device driver.
* Queues the matching control method for execution.
* Manages a wake event using device _PRW objects.
4. If a level event, clears the status bit.
5. Enables the interrupt source.
So, by using edge-triggered General-Purpose Event instead of a
level-triggered GPE, OSPM is less likely to clear the status bit of the
addition of the second CPU. On step 5, QEMU will resend an interrupt if
the status bit is set.
This description also applies to PCI hotplug since the same steps are
followed by QEMU, so we also change the GPE event type for PCI hotplug.
This does not apply to qemu-xen-traditional because it does not resend
an interrupt if necessary as a result of step 5.
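The race can be modelled with a small single-threaded simulation. This is only a sketch under simplifying assumptions: struct gpe and the helper names are invented, one status/enable bit is modelled, and the "resend on enable" behaviour is the qemu-xen behaviour described above.

```c
#include <stdbool.h>

/* One GPE bit: status latch, enable mask, and a count of SCIs raised. */
struct gpe { bool sts; bool en; int sci; };

/* Hardware side: latch the status bit, raise an SCI only if enabled. */
static void hw_raise(struct gpe *g)
{
    g->sts = true;
    if (g->en)
        g->sci++;
}

/* Step 5: re-enable the source; resend the interrupt if status is set. */
static void ospm_enable(struct gpe *g)
{
    g->en = true;
    if (g->sts)
        g->sci++;
}

/* Two CPUs are added, the second one while the control method runs.
 * Returns the number of SCIs the OS ends up seeing. */
static int hotplug_two_cpus(bool edge)
{
    struct gpe g = { false, true, 0 };

    hw_raise(&g);            /* first CPU: status set, SCI sent */
    g.en = false;            /* step 1: disable the interrupt source */
    if (edge)
        g.sts = false;       /* step 2: edge event clears status early */
    hw_raise(&g);            /* step 3: second CPU arrives mid-method */
    if (!edge)
        g.sts = false;       /* step 4: level event clears status late */
    ospm_enable(&g);         /* step 5 */
    return g.sci;
}
```

In this model hotplug_two_cpus(false) yields a single SCI, because the late clear wipes out the second event, while hotplug_two_cpus(true) yields two.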
Patch and description inspired by SeaBIOS's commit:
Replace level gpe event with edge gpe event for hot-plug handlers
9c6635bd48d39a1d17d0a73df6e577ef6bd0037c
from Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Don Slutz [Wed, 8 Jan 2014 08:15:03 +0000 (09:15 +0100)]
hvm_save_one: return correct data
It is possible that hvm_sr_handlers[typecode].save does not use all
the provided room. Also it can use variable sized records. In both
cases, using:
instance * hvm_sr_handlers[typecode].size
does not select the correct instance. Add code to search for the
correct instance.
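A sketch of such a search, assuming each saved record is preceded by a small header carrying typecode, instance and payload length. The struct layout and find_instance() are illustrative and are not the hypervisor's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified record header (assumed layout for this sketch). */
struct rec_desc {
    uint16_t typecode;
    uint16_t instance;
    uint32_t length;     /* payload bytes following this header */
};

/* Walk the records in buf; return the payload offset of the record with
 * the requested instance, or -1 if it is absent or the data is truncated. */
static long find_instance(const uint8_t *buf, size_t len, uint16_t instance)
{
    size_t off = 0;

    while (len - off >= sizeof(struct rec_desc)) {
        struct rec_desc d;

        memcpy(&d, buf + off, sizeof(d));   /* avoid unaligned reads */
        off += sizeof(d);
        if (d.length > len - off)
            return -1;                      /* record overruns the buffer */
        if (d.instance == instance)
            return (long)off;
        off += d.length;                    /* skip this record's payload */
    }
    return -1;
}
```

Because each step advances by the length actually stored in the header, records that are shorter than the maximum or of varying size are still located correctly, which a fixed stride cannot guarantee.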
Signed-off-by: Don Slutz <dslutz@verizon.com>
Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 8 Jan 2014 08:06:07 +0000 (09:06 +0100)]
compat wrapper for XENMEM_add_to_physmap_batch
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 8 Jan 2014 08:04:48 +0000 (09:04 +0100)]
rename XENMEM_add_to_physmap_{range => batch}
The use of "range" here wasn't really correct - there are no ranges
involved. As the comment in the public header already correctly said,
all this is about is batching of XENMEM_add_to_physmap calls (with
the addition of having a way to specify a foreign domain for
XENMAPSPACE_gmfn_foreign).
Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Bob Liu [Thu, 12 Dec 2013 11:05:15 +0000 (19:05 +0800)]
tmem: check the return value of copy to guest
Use function copy_to_guest_offset/copy_to_guest directly and check their return
value.
This also fixes CID 1132754 and 1132755:
"Unchecked return value
If the function returns an error value, the error value may be mistaken for a
normal value. In tmem_copy_to_client_buf_offset: Value returned from a function
is not checked for errors before being used (CWE-252)"
And CID 1055125, 1055126, 1055127, 1055128, 1055129, 1055130:
"Unchecked return value
If the function returns an error value, the error value may be mistaken for a
normal value. In <functions changed>: Value returned from a function is not
checked for errors before being used (CWE-252)"
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:14 +0000 (19:05 +0800)]
tmem: cleanup: rm unused tmem_freeze_all()
Nobody uses tmem_freeze_all() so remove it.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:13 +0000 (19:05 +0800)]
tmem: cleanup: rename tmem_relinquish_npages()
Rename tmem_relinquish_npages() to tmem_flush_npages() to
distinguish it from tmem_relinquish_pages().
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:12 +0000 (19:05 +0800)]
tmem: refactor function tmem_ensure_avail_pages()
tmem_ensure_avail_pages() doesn't return a value, which is incorrect because
the caller needs to confirm whether there is enough memory.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:11 +0000 (19:05 +0800)]
tmem: cleanup: drop useless functions from header file
There are several one-line functions in tmem_xen.h which are useless; this patch
embeds them into tmem.c directly.
Also modify void *tmem in struct domain to struct client *tmem_client in order
to make things more straightforward.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Bob Liu [Thu, 12 Dec 2013 11:05:10 +0000 (19:05 +0800)]
tmem: cleanup: __tmem_alloc_page: drop unneeded parameters
The two parameters of __tmem_alloc_page() can be reduced.
tmem_called_from_tmem() was also dropped by this patch.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:09 +0000 (19:05 +0800)]
tmem: cleanup: refactor the alloc/free path
There are two allocation paths for each persistent and ephemeral pool.
This patch tries to refactor those allocate/free functions with better names and
a more readable call layering. It also adds more comments.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:08 +0000 (19:05 +0800)]
tmem: cleanup: drop tmem_lock_all
tmem_lock_all is used for debug only, remove it from upstream to make
tmem source code more readable and easier to maintain.
And no_evict is meaningless without tmem_lock_all, this patch removes it
also.
This also fixes CID 1055654 Thread deadlock
[ Two threads will be stuck waiting forever if each holds a lock the other needs to acquire.
In alloc_heap_pages: Threads may try to acquire two locks in different orders, potentially
causing deadlock (CWE-833)]
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:07 +0000 (19:05 +0800)]
tmem: cleanup: rm useless EXPORT/FORWARD define
It's meaningless to define EXPORT/FORWARD and nobody uses them.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:06 +0000 (19:05 +0800)]
tmem: drop unneeded is_ephemeral() and is_private()
Can use !is_persistent() and !is_shared() to replace them directly.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:05 +0000 (19:05 +0800)]
tmem: cleanup: reorg function do_tmem_put()
Reorg code logic of do_tmem_put() to make it more readable and clean.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:04 +0000 (19:05 +0800)]
tmem: cleanup: drop useless parameters from put/get page
Tmem only takes a page as a unit, so parameters tmem_offset, pfn_offset and len in
do_tmem_put/get() are meaningless. All of the callers are using the same
values (tmem_offset=0, pfn_offset=0, len=PAGE_SIZE).
This patch simplifies tmem by ignoring those useless parameters and using the
default values directly.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:03 +0000 (19:05 +0800)]
tmem: cleanup: drop useless function 'tmem_copy_page'
Use memcpy directly.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:02 +0000 (19:05 +0800)]
tmem: cleanup: drop some debug code
"SENTINELS" and "DECL_CYC_COUNTER" are hacky debugging code and are not
suitable for upstream code.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Thu, 12 Dec 2013 11:05:01 +0000 (19:05 +0800)]
tmem: cleanup: drop unused sub command
TMEM_READ/TMEM_WRITE/TMEM_XCHG/TMEM_NEW_PAGE are never used, drop them to make
things simple and clean.
To be clear - we are a bit lucky here - as none of the other implementors
of the tmem API are using it (Windows GPLPV code, SLES11, Linux upstream).
The spec says that the operations can return an error code (-ENOSYS for
example) so we are OK doing that.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Wed, 8 Jan 2014 07:44:23 +0000 (08:44 +0100)]
evtchn/fifo: don't corrupt queues if an old tail is linked
An event may still be the tail of a queue even if the queue is now
empty (an 'old tail' event). There is logic to handle the case when
this old tail event needs to be added to the now empty queue (by
checking for q->tail == port).
However, this does not cover all cases.
1. An old tail may be re-added simultaneously with another event.
LINKED is set on the old tail, and the other CPU may misinterpret
this as the old tail still being valid and set LINK instead of
HEAD. All events on this queue will then be lost.
2. If the old tail event on queue A is moved to a different queue B
(by changing its VCPU or priority), the event may then be linked
onto queue B. When another event is linked onto queue A it will
check the old tail, see that it is linked (but on queue B) and
overwrite the LINK field, corrupting both queues.
When an event is linked, save the vcpu id and priority of the queue it
is being linked onto. Use this when linking an event to check if it
is an unlinked old tail event. If it is an old tail event, the old
queue is empty and old_q->tail is invalidated to ensure adding another
event to old_q will update HEAD. The tail is invalidated by setting
it to 0 since the event 0 is never linked.
The old_q->lock is held while setting LINKED to avoid the race with
the test of LINKED in evtchn_fifo_set_link().
Since an event channel may move queues after old_q->lock is acquired,
we must check that we have the correct lock and retry if not. Since
changing VCPUs or priority is expected to be rare events that are
serialized in the guest, we try at most 3 times before dropping the
event. This prevents a malicious guest from repeatedly adjusting
priority to prevent another domain from acquiring old_q->lock.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Wed, 8 Jan 2014 07:43:36 +0000 (08:43 +0100)]
evtchn/fifo: initialize priority when events are bound
Event channel ports that are reused or that were not in the initial
bucket would have a non-default priority.
Add an init evtchn_port_op hook and use this to set the priority when
an event channel is bound.
Within this new evtchn_fifo_init() call, also check if the event is
already on a queue and print a warning, as this event may have its
first event delivered on a queue with the wrong VCPU or priority.
The guest is expected to prevent this (if it cares) by not unbinding
events that are still linked.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 7 Jan 2014 15:01:14 +0000 (16:01 +0100)]
IOMMU: make page table deallocation preemptible
This too can take an arbitrary amount of time.
In fact, the bulk of the work is being moved to a tasklet, as handling
the necessary preemption logic in line seems close to impossible given
that the teardown may also be invoked on error paths.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Ian Campbell [Tue, 7 Jan 2014 14:32:45 +0000 (14:32 +0000)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Fri, 20 Dec 2013 15:08:08 +0000 (15:08 +0000)]
xen: arm: context switch the aux memory attribute registers
We appear to have somehow missed these. Linux doesn't actually use them and
none of the processors I've looked at actually define any bits in them (so
they are UNK/SBZP) but it is good form to context switch them anyway.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Suravee Suthikulpanit [Tue, 7 Jan 2014 14:09:42 +0000 (15:09 +0100)]
AMD/IOMMU: fix infinite loop due to ivrs_bdf_entries larger than 16-bit value
Certain AMD systems could have up to 0x10000 ivrs_bdf_entries.
However, the loop variable (bdf) is declared as u16, which causes an
infinite loop when parsing the IOMMU event log with an IO_PAGE_FAULT event.
This patch changes the variable to u32 instead.
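The wrap-around can be reproduced in isolation. The following is a sketch with invented function names, not the IOMMU code; it only shows why a 16-bit counter can never reach a bound of 0x10000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a 16-bit counter wraps back to 0 before reaching
 * `entries`, i.e. if "bdf < entries" could never become false. */
static bool u16_counter_wraps(uint32_t entries)
{
    uint16_t bdf = 0;

    while (bdf < entries) {
        bdf++;
        if (bdf == 0)        /* 0xFFFF + 1 wrapped around to 0 */
            return true;
    }
    return false;
}

/* With a 32-bit counter the loop ends after exactly `entries` steps. */
static uint32_t u32_counter_steps(uint32_t entries)
{
    uint32_t bdf, steps = 0;

    for (bdf = 0; bdf < entries; bdf++)
        steps++;
    return steps;
}
```

For entries = 0x10000 the 16-bit counter wraps, while for any bound up to 0xFFFF it does not, which matches the commit's observation that only systems with the full 0x10000 entries are affected.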
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 7 Jan 2014 13:59:31 +0000 (14:59 +0100)]
VTD/DMAR: free() correct pointer on error from acpi_parse_one_atsr()
Free the allocated structure rather than the ACPI table ATS entry.
On further analysis, there is another memory leak. acpi_parse_dev_scope()
could allocate scope->devices, and return with -ENOMEM. All callers of
acpi_parse_dev_scope() would then free the underlying structure, losing the
pointer.
These errors can only actually be reached through acpi_parse_dev_scope()
(which passes type = DMAR_TYPE), but I am quite surprised Coverity didn't spot
it.
Coverity-ID: 1146949
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 7 Jan 2014 13:58:35 +0000 (14:58 +0100)]
AMD/microcode: avoid use-after-free for the microcode buffer
It is possible to free the mc_old buffer and then store it for use in the case
of resume.
This keeps the old semantics of being able to return an error even after a
successful microcode application.
Coverity-ID: 1146953
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Tue, 7 Jan 2014 13:57:15 +0000 (14:57 +0100)]
AMD/iommu_detect: don't leak iommu structure on error paths
Tweak the logic slightly to return the real errors from
get_iommu_{,msi_}capabilities(), which at the moment is no functional change.
Coverity-ID: 1146950
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Campbell [Tue, 7 Jan 2014 13:50:35 +0000 (13:50 +0000)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Julien Grall [Tue, 24 Dec 2013 11:28:47 +0000 (11:28 +0000)]
xen: driver/char: fix const declaration of DT compatible list
The data type for DT compatible list should be:
const char * const[] __initconst
Fix every serial driver which supports device tree.
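For reference, a sketch of what the doubly-const declaration means in plain C. The Xen-specific __initconst section attribute is left out so the fragment compiles standalone; the array contents, the NULL terminator and demo_is_compatible() are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* First const: the characters are read-only.
 * Second const: the pointers stored in the array are read-only too,
 * so the whole table can live in read-only data. */
static const char * const demo_compat[] = {
    "ns16550",
    "ns16550a",
    NULL                 /* terminator, assumed for this sketch */
};

/* Returns 1 if name matches an entry of the table, 0 otherwise. */
static int demo_is_compatible(const char *name)
{
    size_t i;

    for (i = 0; demo_compat[i] != NULL; i++)
        if (strcmp(demo_compat[i], name) == 0)
            return 1;
    return 0;
}
```

With only the first const, the array elements would still be writable pointers, which conflicts with placing the table in a constant section.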
Spotted-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tsahee Zidenberg [Sun, 22 Dec 2013 10:59:57 +0000 (12:59 +0200)]
ns16550: support ns16550a
Ns16550a devices are Ns16550 devices with additional capabilities.
Declare that Xen is compatible with this device, so that unmodified
device trees can be used.
Signed-off-by: Tsahee Zidenberg <tsahee@gmx.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Tsahee Zidenberg [Sun, 22 Dec 2013 11:01:31 +0000 (13:01 +0200)]
xen/dts: specific bad cell count error
Specify in the error message whether the bad cell count is in the device or its parent.
Signed-off-by: Tsahee Zidenberg <tsahee@gmx.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Ian Jackson [Tue, 17 Dec 2013 18:35:18 +0000 (18:35 +0000)]
libxc: Document xenctrl.h event channel calls
Provide semantic documentation for how the libxc calls relate to the
hypervisor interface, and how they are to be used.
Also document the bug (present at least in Linux 3.12) that setting
the evtchn fd to nonblocking doesn't in fact make xc_evtchn_pending
nonblocking, and describe the appropriate workaround.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Tue, 17 Dec 2013 18:35:17 +0000 (18:35 +0000)]
docs: Document event-channel-based suspend protocol
Document the event channel protocol in xenstore-paths.markdown, in the
section for ~/device/suspend/event-channel.
Protocol reverse-engineered from commentary and commit messages of
4539594d46f9 Add facility to get notification of domain suspend ...
17636f47a474 Teach xc_save to use event-channel-based ...
and implementations in
xc_save (current version)
libxl (current version)
linux-2.6.18-xen (mercurial 1241:2993033a77ca)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Ian Jackson [Tue, 17 Dec 2013 18:35:16 +0000 (18:35 +0000)]
xen: Document that EVTCHNOP_bind_interdomain signals
EVTCHNOP_bind_interdomain signals the event channel. Document this.
Also explain the usual use pattern.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Ian Jackson [Tue, 17 Dec 2013 18:35:15 +0000 (18:35 +0000)]
xen: Document XEN_DOMCTL_subscribe
Arguably this domctl is misnamed. But, for now, document its actual
behaviour (reverse-engineered from the code and found in the commit
message for 4539594d46f9) under its actual name.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
CC: Jan Beulich <JBeulich@suse.com>
Julien Grall [Tue, 17 Dec 2013 14:28:19 +0000 (14:28 +0000)]
xen/arm: Allow ballooning working with 1:1 memory mapping
Without an IOMMU, dom0 must have a 1:1 memory mapping for all of
its guest physical addresses. When the balloon decides to give back a
page to the kernel, this page must have the same address as previously.
Otherwise, we will lose the 1:1 mapping and will break DMA-capable
devices.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Yang Zhang [Tue, 7 Jan 2014 13:30:47 +0000 (14:30 +0100)]
VMX: Eliminate cr3 save/loading exiting when UG enabled
With the unrestricted guest (UG) feature, no vmexit should be
triggered when the guest accesses cr3 in non-paging mode. This
patch clears the cr3 save/load exiting bits in the VMCS control field to
eliminate cr3 access vmexits on UG-capable hardware.
A previous patch (commit c9efe34c119418a5ac776e5d91aeefcce4576518)
did the same thing, but it caused guests to fail to boot on non-UG
hardware, as reported by Jan, and it was reverted
(commit 1e2bf05ec37cf04b0e01585eae524509179f165e).
This patch incorporates the fix, and guests work well on both
UG and non-UG platforms with it.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Yang Zhang [Tue, 7 Jan 2014 13:30:21 +0000 (14:30 +0100)]
VMX,apicv: Set "NMI-window exiting" for NMI
Enable NMI-window exiting if an interrupt is blocked by NMI on an
apicv-enabled platform.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Jan Beulich [Tue, 7 Jan 2014 13:21:48 +0000 (14:21 +0100)]
IOMMU: make page table population preemptible
Since this can take an arbitrary amount of time, the rooting domctl as
well as all involved code must become aware of this requiring a
continuation.
The subject domain's rel_mem_list is being (ab)used for this, in a way
similar to and compatible with broken page offlining.
Further, operations get slightly re-ordered in assign_device(): IOMMU
page tables now get set up _before_ the first device gets assigned, at
once closing a small timing window in which the guest may already see
the device but wouldn't be able to access it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Jan Beulich [Fri, 20 Dec 2013 11:02:06 +0000 (12:02 +0100)]
fix XENMEM_add_to_physmap_range preemption handling
Just like for all other hypercalls we shouldn't be modifying the input
structure - all of the fields are, even if not explicitly documented,
just inputs (the one OUT one really refers to the memory pointed to by
that handle rather than the handle itself).
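The principle can be sketched as follows: the resume position is carried outside the caller's structure, which stays const throughout. This is a simplified illustration with hypothetical names (demo_batch, demo_batch_run) and does not reproduce Xen's actual continuation mechanism.

```c
#include <stddef.h>

/* Caller-supplied arguments; treated strictly as inputs. */
struct demo_batch {
    const int *items;
    size_t size;
};

/* Process at most `budget` items starting at `start`, accumulating into
 * *sum. Returns the index to resume from (== in->size when finished).
 * The input structure is never written, so a restart sees it unchanged. */
static size_t demo_batch_run(const struct demo_batch *in, size_t start,
                             size_t budget, long *sum)
{
    size_t i = start, done;

    for (done = 0; i < in->size && done < budget; done++, i++)
        *sum += in->items[i];
    return i;
}
```

A caller simply loops, feeding the returned index back in as the next start, until the index equals the batch size.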
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 20 Dec 2013 11:01:44 +0000 (12:01 +0100)]
move XENMEM_add_to_physmap_range handling framework to common code
There's really nothing architecture specific here; the
architecture specific handling is limited to
xenmem_add_to_physmap_one().
This further eliminates the erroneous bailing from
xenmem_add_to_physmap_range() if xenmem_add_to_physmap_one() fails.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 20 Dec 2013 11:01:09 +0000 (12:01 +0100)]
fix XENMEM_add_to_physmap preemption handling
Just like for all other hypercalls we shouldn't be modifying the input
structure - all of the fields are, even if not explicitly documented,
just inputs.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 20 Dec 2013 11:00:15 +0000 (12:00 +0100)]
move XENMEM_add_to_physmap handling framework to common code
There's really nothing architecture specific here; the
architecture specific handling is limited to
xenmem_add_to_physmap_one().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Yang Zhang [Fri, 20 Dec 2013 10:57:14 +0000 (11:57 +0100)]
Nested VMX: Setup the virtual NMI exiting info
When injecting a virtual NMI exit into L1, the hypervisor needs to set up the
virtual VMCS with the right value, which is missing in current Xen.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Ian Campbell [Fri, 20 Dec 2013 09:53:14 +0000 (09:53 +0000)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Julien Grall [Fri, 20 Dec 2013 01:41:20 +0000 (01:41 +0000)]
xen/arm: p2m: Don't create new table when the mapping is removed
When Xen is removing/relinquishing a mapping, it will create second/third level
tables if they don't exist.
A non-existent table means the address range was never mapped, so Xen can safely
skip it.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Matthew Daley [Sat, 30 Nov 2013 00:20:04 +0000 (13:20 +1300)]
xenstore: sanity check incoming message body lengths
This is for the client-side receiving messages from xenstored, so there
is no security impact, unlike XSA-72.
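A sketch of this kind of check: the length advertised in a message header is validated before it is used to size an allocation. The header layout, DEMO_PAYLOAD_MAX and demo_alloc_body() are assumptions for the example, not the xenstore client code.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PAYLOAD_MAX 4096u   /* assumed upper bound for this sketch */

/* Simplified wire header: message type and body length in bytes. */
struct demo_msg_hdr {
    uint32_t type;
    uint32_t len;
};

/* Allocate a zeroed buffer for the body plus a trailing NUL, but only
 * if the advertised length is within bounds. Returns NULL otherwise. */
static char *demo_alloc_body(const struct demo_msg_hdr *hdr)
{
    if (hdr->len > DEMO_PAYLOAD_MAX)
        return NULL;
    return calloc(1, (size_t)hdr->len + 1);
}
```

Rejecting the header before allocating avoids both oversized allocations and the integer overflow that len + 1 could otherwise cause in 32-bit arithmetic.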
Coverity-ID: 1055449
Coverity-ID: 1056028
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Thu, 19 Dec 2013 16:45:03 +0000 (16:45 +0000)]
tools/libx: xl uptime doesn't require argument
The current behavior is:
42sh> xl uptime
'xl uptime' requires at least 1 argument.
Usage: xl [-v] uptime [-s] [Domain]
The normal behavior should be to list the uptime of each domain when there are
no parameters.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>