Jan Beulich [Thu, 11 Dec 2014 16:14:07 +0000 (17:14 +0100)]
have architectures specify the number of PIRQs a hardware domain gets
The current value of nr_static_irqs + 256 is often too small for larger
systems. Make it dependent on CPU count and number of IO-APIC pins on
x86, and (until it obtains PCI support) simply NR_IRQS on ARM.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Jan Beulich [Thu, 11 Dec 2014 16:13:04 +0000 (17:13 +0100)]
lock down hypercall continuation encoding masks
Andrew validly points out that even if these masks aren't a formal part
of the hypercall interface, we aren't free to change them: A guest
suspended for migration in the middle of a continuation would fail to
work if resumed on a hypervisor using a different value. Hence add
respective comments to their definitions.
Additionally, to help future extensibility as well as in the spirit of
reducing undefined behavior as much as possible, refuse hypercalls made
with the respective bits non-zero when the respective sub-ops don't
make use of those bits.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Jan Beulich [Thu, 11 Dec 2014 11:24:05 +0000 (12:24 +0100)]
VMX: don't allow PVH to reach handle_mmio()
PVH guests accessing I/O ports via string ops is not supported yet.
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 26 Nov 2014 17:28:18 +0000 (17:28 +0000)]
libxl: events: Document and enforce actual callbacks restriction
libxl_event_register_callbacks cannot reasonably be called while libxl
is busy (has outstanding operations and/or enabled events).
This is because the previous spec implied (although not entirely
clearly) that event hooks would not be called for existing fd and
timeout interests. There is thus no way to reliably ensure that libxl
would get told about fds and timeouts which it became interested in
beforehand.
So there have to be no such fds or timeouts, which means that the
callbacks must only be registered or changed when the ctx is idle.
Document this restriction, and enforce it with a pair of asserts.
(It would be nicer, perhaps, to say that the application may not call
libxl_osevent_register_hooks other than right after creating the ctx.
But there are existing callers, including libvirt, who do it later -
even after doing major operations such as domain creation.)
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 26 Nov 2014 17:27:27 +0000 (17:27 +0000)]
libxl: events: Deregister evtchn fd when not needed
We want to have no fd events registered when we are idle.
In this patch, deal with the evtchn fd:
* Defer setup of the evtchn handle to the first use.
* Defer registration of the evtchn fd; register as needed on use.
* When cancelling an evtchn wait, or when wait setup fails, check
whether there are now no evtchn waits and if so deregister the fd.
* On libxl teardown, the evtchn fd should therefore be unregistered.
assert that this is the case.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Do not bother putting evtchn_fd in the ctx; instead, get it
from xc_evtchn_fd when we need it. (Cosmetic.)
Do not register the evtchn fd multiple times: check it's not
registered before we call libxl__ev_fd_register. (Bugfix.)
Ian Jackson [Thu, 27 Nov 2014 18:04:29 +0000 (18:04 +0000)]
libxl: events: Tear down SIGCHLD machinery on ctx destruction
We want to have no fd events registered when we are idle.
Also, we should put back the default SIGCHLD handler. So:
* In libxl_ctx_free, use libxl_childproc_setmode to set the mode to
the default, which is libxl_sigchld_owner_libxl (ie `libxl owns
SIGCHLD only when it has active children').
But of course there are no active children at libxl teardown so
this results in libxl__sigchld_notneeded: the ctx loses its
interest in SIGCHLD (unsetting the SIGCHLD handler if we were the
last ctx) and deregisters the per-ctx selfpipe fd.
* assert that this is the case: ie that we are no longer interested
in the selfpipe.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Thu, 27 Nov 2014 18:03:03 +0000 (18:03 +0000)]
libxl: events: Deregister, don't just modify, sigchld pipe fd
We want to have no fd events registered when we are idle. This
implies that we must be able to deregister our interest in the sigchld
self-pipe fd, not just modify to request no events.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 26 Nov 2014 16:44:52 +0000 (16:44 +0000)]
libxl: events: Deregister xenstore watch fd when not needed
We want to have no fd events registered when we are idle.
In this patch, deal with the xenstore watch fd:
* Track the total number of active watches.
* When deregistering a watch, or when watch registration fails, check
whether there are now no watches and if so deregister the fd.
* On libxl teardown, the watch fd should therefore be unregistered.
assert that this is the case.
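The bookkeeping in the list above can be sketched roughly as follows. This is an illustration under assumptions, not libxl code: the class, its method names, and the two callbacks are hypothetical stand-ins for the real fd registration hooks.

```python
class WatchFdTracker:
    """Keep an fd registered only while at least one watch is active."""

    def __init__(self, register_fd, deregister_fd):
        self._register_fd = register_fd      # hypothetical hook
        self._deregister_fd = deregister_fd  # hypothetical hook
        self.nwatches = 0
        self.fd_registered = False

    def add_watch(self, setup):
        # Register the fd lazily, on first use.
        if not self.fd_registered:
            self._register_fd()
            self.fd_registered = True
        try:
            setup()
        except Exception:
            # Watch registration failed: drop the fd if nothing else needs it.
            self._maybe_deregister()
            raise
        self.nwatches += 1

    def remove_watch(self):
        assert self.nwatches > 0
        self.nwatches -= 1
        self._maybe_deregister()

    def _maybe_deregister(self):
        if self.nwatches == 0 and self.fd_registered:
            self._deregister_fd()
            self.fd_registered = False

    def teardown(self):
        # By teardown time every watch should be gone, and the fd with it.
        assert not self.fd_registered
```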
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 26 Nov 2014 16:17:49 +0000 (16:17 +0000)]
libxl: events: Assert that libxl_ctx_free is not called from a hook
No-one in their right mind would do this, and if they did everything
would definitely collapse. Arrange that if this happens, we crash
ASAP.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Fri, 5 Dec 2014 10:49:47 +0000 (11:49 +0100)]
tools/xenstore: fix link error with libsystemd
Linking fails with undefined reference to the used systemd functions.
Move LDFLAGS after the object files to fix the failure.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Thu, 4 Dec 2014 19:26:55 +0000 (19:26 +0000)]
xen/arm: Correct the opcode for BUG_INSTR on arm32
A 0 was dropped when the arm32 BUG instruction opcode was added in commit
3e802c6ca1fb9a9549258c2855a57cad483f3cbd "xen/arm: Correctly support WARN_ON".
As a result the opcode encodes a valid instruction (mcreq 0, 3, r0, cr15, cr0, {7}),
which prevents BUG/WARN_ON and co. from working.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Fri, 5 Dec 2014 17:03:07 +0000 (12:03 -0500)]
flask/policy: Example policy updates for migration
The example XSM policy was missing permission for dom0_t to migrate
domains; add these permissions.
Reported-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Mon, 8 Dec 2014 13:45:46 +0000 (14:45 +0100)]
switch to write-biased r/w locks
This is to improve fairness: A permanent flow of read acquires can
otherwise lock out eventual writers indefinitely.
This is CVE-2014-9065 / XSA-114.
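To illustrate what "write-biased" means here, the following is a simplified single-threaded model, not the hypervisor's implementation. The flag value and all names are assumptions made for the example; the point is only that a writer announces itself first, which stops new readers, and then waits for existing readers to drain.

```python
WRITE_FLAG = 1 << 31  # hypothetical "writer present or pending" bit


class WriteBiasedLockModel:
    """Model of a lock word: a reader count plus a write flag."""

    def __init__(self):
        self.word = 0

    def try_read_lock(self):
        # New readers are refused as soon as a writer has announced itself,
        # even while earlier readers still hold the lock.
        if self.word & WRITE_FLAG:
            return False
        self.word += 1
        return True

    def read_unlock(self):
        assert self.word & ~WRITE_FLAG, "no reader holds the lock"
        self.word -= 1

    def announce_writer(self):
        # Step 1 of a write acquire: claim the flag unless a writer has it.
        if self.word & WRITE_FLAG:
            return False
        self.word |= WRITE_FLAG
        return True

    def writer_may_enter(self):
        # Step 2: the writer proceeds once existing readers have drained.
        return self.word == WRITE_FLAG

    def write_unlock(self):
        assert self.word == WRITE_FLAG
        self.word = 0
```

Because the flag is set before the readers drain, a steady stream of read acquires can no longer keep the reader count above zero forever.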
Signed-off-by: Keir Fraser <keir@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Don Dugger [Fri, 5 Dec 2014 11:46:15 +0000 (12:46 +0100)]
VT-d: decouple SandyBridge quirk from VTd timeout
Currently the quirk code for SandyBridge uses the VTd timeout value when
writing to an IGD register. This is the wrong timeout to use and, at
1000 msec., is also much too large. This patch changes the quirk code
to use a timeout that is specific to the IGD device and allows the user
control of the timeout.
Boolean settings for the boot parameter `snb_igd_quirk' keep their current
meaning, enabling or disabling the quirk code with a timeout of 1000 msec.
In addition specifying `snb_igd_quirk=default' will enable the code and
set the timeout to the theoretical maximum of 670 msec. For finer control,
specifying `snb_igd_quirk=n', where `n' is a decimal number, will enable
the code and set the timeout to `n' msec.
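The accepted values described above can be sketched as a small parser. This is an illustration only, not the hypervisor's parsing code: the function name is hypothetical, and the set of boolean spellings is an assumption (the real parser may accept a different set).

```python
LEGACY_TIMEOUT_MS = 1000   # boolean "enable" keeps the old 1000 msec timeout
DEFAULT_TIMEOUT_MS = 670   # the theoretical maximum named above

# Assumed boolean spellings; the real parser may accept a different set.
TRUE_WORDS = {"yes", "on", "true", "enable"}
FALSE_WORDS = {"no", "off", "false", "disable"}


def parse_snb_igd_quirk(value):
    """Return the timeout in msec, or None if the quirk is disabled."""
    if value in TRUE_WORDS:
        return LEGACY_TIMEOUT_MS
    if value in FALSE_WORDS:
        return None
    if value == "default":
        return DEFAULT_TIMEOUT_MS
    # Anything else is taken as a decimal number of milliseconds.
    return int(value, 10)
```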
Signed-off-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Update documentation.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 2 Dec 2014 15:11:30 +0000 (15:11 +0000)]
systemd: use pkg-config to determine systemd library availability
AC_CHECK_LIB fails on Debian Jessie since the ld flag it generates is
incorrect, even in the event systemd library is available. Use
PKG_CHECK_MODULES instead.
Tested on Debian Jessie and Arch Linux.
Reported-by: Mark Pryor <tlviewer@yahoo.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Cc: Mark Pryor <tlviewer@yahoo.com>
[ ijc -- reran autogen.sh as requested ]
Julien Grall [Fri, 28 Nov 2014 15:17:06 +0000 (15:17 +0000)]
xen/arm: Handle platforms with edge-triggered virtual timer
Some platforms (such as Xgene and ARMv8 models) use an edge-triggered interrupt
for the virtual timer. Even if the timer output signal is masked in the
context switch, the GIC will keep track of any interrupts raised
while IRQs are disabled. As soon as IRQs are re-enabled, the virtual
timer interrupt will be injected into Xen.
If an idle VCPU is scheduled next then the interrupt handler doesn't
expect to receive the IRQ and will crash:
(XEN) [<0000000000228388>] _spin_lock_irqsave+0x28/0x94 (PC)
(XEN) [<0000000000228380>] _spin_lock_irqsave+0x20/0x94 (LR)
(XEN) [<0000000000250510>] vgic_vcpu_inject_irq+0x40/0x1b0
(XEN) [<000000000024bcd0>] vtimer_interrupt+0x4c/0x54
(XEN) [<0000000000247010>] do_IRQ+0x1a4/0x220
(XEN) [<0000000000244864>] gic_interrupt+0x50/0xec
(XEN) [<000000000024fbac>] do_trap_irq+0x20/0x2c
(XEN) [<0000000000255240>] hyp_irq+0x5c/0x60
(XEN) [<0000000000241084>] context_switch+0xb8/0xc4
(XEN) [<000000000022482c>] schedule+0x684/0x6d0
(XEN) [<000000000022785c>] __do_softirq+0xcc/0xe8
(XEN) [<00000000002278d4>] do_softirq+0x14/0x1c
(XEN) [<0000000000240fac>] idle_loop+0x134/0x154
(XEN) [<000000000024c160>] start_secondary+0x14c/0x15c
(XEN) [<0000000000000001>] 0000000000000001
The proper solution is to context switch the virtual interrupt state at
the GIC level. This would also avoid masking the output signal which
requires specific handling in the guest OS and more complex code in Xen
to deal with EOIs, and so is desirable for that reason too.
Sadly, this solution requires some refactoring which would not be
suitable for a freeze exception for the Xen 4.5 release.
For now implement a temporary solution which ignores the virtual timer
interrupt when the idle VCPU is running.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- tweaked some wording in the comment ]
Boris Ostrovsky [Tue, 25 Nov 2014 16:11:50 +0000 (11:11 -0500)]
pygrub: Fix regression from c/s d1b93ea, attempt 2
c/s d1b93ea causes substantial functional regressions in pygrub's ability to
parse bootloader configuration files.
c/s d1b93ea itself changed an interface which previously used exclusively
integers, to using strings in the case of a grub configuration with explicit
default set, along with changing the code calling the interface to require a
string. The default value for "default" remained as an integer.
As a result, any Extlinux or Lilo configuration (which drives this interface
exclusively with integers), or Grub configuration which doesn't explicitly
declare a default will die with an AttributeError when attempting to call
"self.cf.default.isdigit()" where "default" is an integer.
Sadly, this AttributeError gets swallowed by the blanket ignore in the loop
which searches partitions for valid bootloader configurations, causing the
issue to be reported as "Unable to find partition containing kernel".
We should explicitly check the type of "default" in image_index() and process it
appropriately.
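A minimal sketch of such a type check, written as a standalone function rather than pygrub's actual method. The signature, the title lookup, and the fallback to index 0 are assumptions made for illustration.

```python
def image_index(default, titles):
    """Map a configured default (int or str) to an index into titles."""
    if isinstance(default, int):
        # Extlinux, Lilo, and grub configs without an explicit default.
        return default
    if default.isdigit():
        # A grub default given as a numeric string, e.g. "2".
        return int(default)
    # Otherwise treat it as a menu entry title; the fallback to the
    # first entry is an assumption of this sketch.
    try:
        return titles.index(default)
    except ValueError:
        return 0
```

Checking for an integer first means isdigit() is only ever called on a string, which avoids the AttributeError described above.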
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Vitaly Kuznetsov [Tue, 2 Dec 2014 15:18:08 +0000 (16:18 +0100)]
libxc: check in xc_get_tot_pages() that the proper domain is reported
XEN_DOMCTL_getdomaininfo, which is used by xc_domain_getinfo(), has a
strange interface: it reports the first domain which has domid >= requested
domid, so all callers are supposed to check that the proper domain(s) was
queried by checking domid. xc_get_tot_pages() doesn't do that. In case the
requested domain was destroyed it will report the first domain with domid >
requested domid, which is misleading as there is no way xc_get_tot_pages()
callers can figure out that they got tot_pages for some other domain.
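The pitfall and the check can be modelled in a few lines. This is a toy model, not libxc: the dictionary of domains, both function names, and the -1 error return are assumptions for illustration.

```python
def getdomaininfo(domains, first_domid):
    """Mimic the described lookup: first domain with domid >= first_domid."""
    for domid in sorted(domains):
        if domid >= first_domid:
            return {"domid": domid, "tot_pages": domains[domid]}
    return None


def get_tot_pages(domains, domid):
    """Return tot_pages for exactly domid, or -1 if that domain is gone."""
    info = getdomaininfo(domains, domid)
    # Without the domid comparison, a destroyed domain would silently
    # yield the page count of the next higher domid.
    if info is None or info["domid"] != domid:
        return -1
    return info["tot_pages"]
```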
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 27 Nov 2014 12:34:34 +0000 (12:34 +0000)]
python/xs: Correct the indirection of the NULL xshandle() check
The code now matches its comment, and will actually catch the case of a
bad xs handle.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-ID: 1055948
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Xen Coverity Team <coverity@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 27 Nov 2014 12:34:33 +0000 (12:34 +0000)]
python/xc: Fix multiple issues in pyxc_readconsolering()
Don't leak a 16k allocation if PyArg_ParseTupleAndKeywords() or the first
xc_readconsolering() fails. It is trivial to run through the process's memory
by repeatedly passing junk parameters to this function.
In the case that the call to xc_readconsolering() in the while loop fails,
reinstate str before breaking out, to avoid passing a spurious pointer to free().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-IDs: 1054984 1055906
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Xen Coverity Team <coverity@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel Kiper [Tue, 2 Dec 2014 15:16:30 +0000 (16:16 +0100)]
gitignore: group tools/hotplug files in one place
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Daniel Kiper [Tue, 2 Dec 2014 15:16:29 +0000 (16:16 +0100)]
gitignore: ignore some files generated by configure
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Olaf Hering [Tue, 2 Dec 2014 15:39:23 +0000 (16:39 +0100)]
tools/hotplug: update systemd dependency to use service instead of socket
Since commit 4542ae340d75bd6319e3fcd94e6c9336e210aeef ("tools/hotplug:
systemd xenstored dependencies") all service files use the .socket unit
as startup dependency. While this happens to work for boot it fails for
shutdown because a .socket does not seem to enforce ordering. When
xendomains.service runs during shutdown then systemd will stop
xenstored.service at the same time.
Change all "xenstored.socket" to "xenstored.service" to let systemd know
that xenstored has to be shut down after everything else.
Reported-by: Mark Pryor <tlviewer@yahoo.com>
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 3 Dec 2014 10:41:38 +0000 (10:41 +0000)]
libxl: expose #define to 4.5 and above
In e3abab74 (libxl: un-constify return value of libxl_basename), the
macro was exposed to releases < 4.5. However only new code is able to
make use of that macro so it should be exposed to releases >= 4.5.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Wed, 3 Dec 2014 08:52:34 +0000 (09:52 +0100)]
INSTALL: fix typo in xendomains.service name
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 3 Dec 2014 17:03:59 +0000 (17:03 +0000)]
Merge branch 'master' of xenbits.xen.org:/home/xen/git/xen into staging
Konrad Rzeszutek Wilk [Wed, 3 Dec 2014 15:31:22 +0000 (10:31 -0500)]
Xen-4.5.0-rc3: Update tag for qemu-xen.
QEMU-traditional has nothing new, so stay at rc1 there.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Wei Liu [Mon, 1 Dec 2014 11:31:13 +0000 (11:31 +0000)]
xl: fix two memory leaks
Free strings returned by libxl_basename after use.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- s/basename/kernel_basename in parse_config_data to avoid
shadowing basename(3). ]
Euan Harris [Mon, 1 Dec 2014 14:27:06 +0000 (14:27 +0000)]
libxl: Don't dereference null new_name pointer in libxl_domain_rename()
libxl__domain_rename() unconditionally dereferences its new_name
parameter, to check whether it is an empty string. Add a check to
avoid a segfault if new_name is null.
Signed-off-by: Euan Harris <euan.harris@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Dec 2014 11:31:12 +0000 (11:31 +0000)]
libxl: un-constify return value of libxl_basename
The string returned is malloc'ed but marked as "const".
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chunyan Liu [Fri, 28 Nov 2014 05:55:22 +0000 (13:55 +0800)]
missing chunk of HVM direct kernel boot patch
Found by Stefano, this chunk of the patch was never applied to
xen-unstable (commit 11dffa2359e8a2629490c14c029c7c7c777b3e47),
see http://marc.info/?l=qemu-devel&m=140471493425353&w=2.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Razvan Cojocaru [Fri, 28 Nov 2014 12:26:48 +0000 (14:26 +0200)]
xenstore: Clarify xs_open() semantics
Added to the xs_open() comments in xenstore.h. The text has been
taken almost verbatim from a xen-devel email by Ian Campbell,
and confirmed as accurate by Ian Jackson.
Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
M A Young [Tue, 2 Dec 2014 13:48:54 +0000 (13:48 +0000)]
xl: fix migration failure with xl migrate --debug
Migrations with xl migrate --debug will fail because debugging
information from the receiving process is written to the stdout
channel. This channel is also used for status messages so the
migration will fail as the sending process receives an unexpected
message. This patch moves the debugging information to the stderr
channel.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Euan Harris [Mon, 1 Dec 2014 10:47:33 +0000 (10:47 +0000)]
libxl: libxl_domain_info: fix typo in error message
Signed-off-by: Euan Harris <euan.harris@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 2 Dec 2014 11:48:01 +0000 (12:48 +0100)]
x86/HVM: prevent infinite VM entry retries
This reverts the VMX side of commit 28b4baac ("x86/HVM: don't crash
guest upon problems occurring in user mode") and gets SVM in line with
the resulting VMX behavior. This is because Andrew validly says
"A failed vmentry is overwhelmingly likely to be caused by corrupt
VMC[SB] state. As a result, injecting a fault and retrying the
vmentry is likely to fail in the same way."
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Zhigang Wang [Tue, 18 Nov 2014 20:57:08 +0000 (15:57 -0500)]
set pv guest default video_memkb to 0
Before this patch, pv guest video_memkb is -1, which is an invalid value.
This makes the xenstore 'memory/target' calculation wrong:
memory/target = info->target_memkb - info->video_memkb
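As a worked example of the arithmetic, with an illustrative guest size that is not taken from the commit: subtracting -1 adds one kilobyte to the target, while the new default of 0 leaves it unchanged.

```python
def memory_target_kb(target_memkb, video_memkb):
    # Same subtraction as in the formula above.
    return target_memkb - video_memkb


target_memkb = 1048576  # illustrative 1 GiB guest

wrong = memory_target_kb(target_memkb, -1)  # old pv default
right = memory_target_kb(target_memkb, 0)   # new pv default

print(wrong, right)
```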
Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Chunyan Liu [Wed, 19 Nov 2014 06:34:11 +0000 (14:34 +0800)]
fix rename: xenstore not fully updated
libxl__domain_rename only updates /local/domain/<domid>/name;
/vm/<uuid>/name in xenstore is not updated. Add code in
libxl__domain_rename to update /vm/<uuid>/name too.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Chunyan Liu [Wed, 19 Nov 2014 06:34:10 +0000 (14:34 +0800)]
remove domain field in xenstore backend dir
Remove the unusual 'domain' field under backend directory. The
affected are backend/console, backend/vfb, backend/vkbd.
The correct way to obtain a domain's name is via
libxl_domid_to_name(), or by reading from /local/domain/$DOMID/name
for toolstacks not using libxl.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- added second paragraph to commit message ]
Andrew Cooper [Wed, 26 Nov 2014 15:09:40 +0000 (15:09 +0000)]
tools/oxenstored: Fix | vs & error in fd event handling
This makes fields 0 and 1 true more often than they should be, resulting
in problems when handling events.
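The class of bug is easy to show in isolation. The sketch below is not the oxenstored code; the flag values and function names are hypothetical. Testing a bit with | instead of & yields a true result whenever the flag itself is non-zero, regardless of which events were actually reported.

```python
FLAG_READ = 0x1   # hypothetical event bits
FLAG_WRITE = 0x4


def is_set_buggy(revents, flag):
    # OR-ing in the flag makes the result non-zero for any non-zero flag.
    return (revents | flag) != 0


def is_set_fixed(revents, flag):
    # AND-ing keeps only the bit being tested.
    return (revents & flag) != 0
```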
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Dave Scott <Dave.Scott@eu.citrix.com>
CC: Zheng Li <zheng.li3@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Zheng Li <dev@zheng.li>
Reviewed-by: David Scott <dave.scott@citrix.com>
Olaf Hering [Thu, 27 Nov 2014 09:26:26 +0000 (10:26 +0100)]
INSTALL: correct EXTRA_CFLAGS handling
The already documented configure patch was not applied.
Adjust documentation to describe existing behaviour.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Wei Liu [Tue, 25 Nov 2014 10:59:47 +0000 (10:59 +0000)]
libxl: allow copying between bitmaps of different sizes
When parsing bitmap objects, the JSON parser will create a libxl_bitmap map of
the smallest size needed.
This can cause problems when a saved image file specifies CPU affinity. For
example, if 'vcpu_hard_affinity' in the saved image has only the first CPU
specified, just a single byte will be allocated and libxl_bitmap->size will be
set to 1.
This will result in an assertion failure in
libxl_set_vcpuaffinity()->libxl_bitmap_copy(), since the destination bitmap is
created for the maximum number of CPUs.
We could allocate that bitmap with the same size as the source; however, it is
later passed to xc_vcpu_setaffinity(), which expects it to be sized to the max
number of CPUs.
To fix this issue, introduce an internal function to allow copying between
bitmaps of different sizes. Note that this function is only used in
libxl_set_vcpuaffinity at the moment. Though the NUMA placement logic invokes
libxl_bitmap_copy as well, there's no need to replace those invocations. The
NUMA placement logic comes into effect when no vcpu / node pinning is provided,
so it always operates on bitmaps of the same size (that is, the size of the
maximum number of cpus / nodes).
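A rough sketch of a size-tolerant copy, using bytearrays in place of libxl_bitmap. The function name and the choice to zero the uncovered tail of the destination are assumptions of this example, not a description of the internal libxl function.

```python
def bitmap_copy_partial(dst, src):
    """Copy src into dst (both bytearrays), tolerating different sizes."""
    n = min(len(dst), len(src))
    # Copy the bytes both bitmaps have in common; dst keeps its length.
    dst[:n] = src[:n]
    # Clear whatever part of dst the source did not cover.
    for i in range(n, len(dst)):
        dst[i] = 0
```

With a one-byte source and a destination sized for the maximum number of CPUs, this copies the single byte and leaves the remaining CPUs cleared instead of tripping a size assertion.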
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 27 Nov 2014 13:04:44 +0000 (14:04 +0100)]
docs/commandline: Refresh document for 4.5
* Add options whose code has been committed (without a patch to this file).
* Remove options which have been deleted (without a patch to this file).
* Tweak some formatting for consistency.
* Nuke some trailing whitespace.
I believe this document now identifies the exact set of command line options
accepted by Xen, for both x86 and ARM architectures.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Daniel De Graaf [Thu, 27 Nov 2014 13:04:23 +0000 (14:04 +0100)]
xsm/flask: add two missing domctls
Reported-by: Michael Young <m.a.young@durham.ac.uk>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Thu, 27 Nov 2014 13:03:23 +0000 (14:03 +0100)]
x86/PVH: properly disable vLAPIC
Rather than guarding higher level operations (like vPMU initialization
as suggested by Boris in
http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg02278.html)
mark the vLAPIC hardware disabled for PVH guests and prevent it from
getting moved out of this state.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Boris Ostrovsky [Thu, 27 Nov 2014 13:02:45 +0000 (14:02 +0100)]
x86: disable VPMU for PVH guests
Currently when VPMU is enabled on a system both HVM and PVH VCPUs will
initialize their VPMUs, including setting up vpmu_ops. As a result even
though VPMU will not work for PVH guests (APIC is not supported there),
the guest may decide to perform a write to a PMU MSR. This will cause a
call to is_vlapic_lvtpc_enabled() which will crash the hypervisor, e.g.:
(XEN) Xen call trace:
(XEN) [<ffff82d0801ca06f>] is_vlapic_lvtpc_enabled+0x13/0x22
(XEN) [<ffff82d0801e2a15>] core2_vpmu_do_wrmsr+0x415/0x589
(XEN) [<ffff82d0801cedaa>] vpmu_do_wrmsr+0x2a/0x33
(XEN) [<ffff82d0801dd648>] vmx_msr_write_intercept+0x268/0x557
(XEN) [<ffff82d0801bcd2e>] hvm_msr_write_intercept+0x36c/0x39b
(XEN) [<ffff82d0801e0a0e>] vmx_vmexit_handler+0x1082/0x185b
(XEN) [<ffff82d0801e74c1>] vmx_asm_vmexit_handler+0x41/0xc0
If we prevent VPMU from being initialized on PVH guests we will avoid
those accesses.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Thu, 27 Nov 2014 13:01:40 +0000 (14:01 +0100)]
x86/HVM: confine internally handled MMIO to solitary regions
While it is generally wrong to cross region boundaries when dealing
with MMIO accesses of repeated string instructions (currently only
MOVS) as that would do things a guest doesn't expect (leaving aside
that none of these regions would normally be accessed with repeated
string instructions in the first place), this is even more of a problem
for all virtual MSI-X page accesses (both msixtbl_{read,write}() can be
made to dereference NULL "entry" pointers this way) as well as undersized
(1- or 2-byte) LAPIC writes (causing vlapic_read_aligned() to access
space beyond the one memory page set up for holding LAPIC register
values).
Since those functions validly assume to be called only with addresses
their respective checking functions indicated to be okay, it is generic
code that needs to be fixed to clip the repetition count.
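Clipping a repetition count to a region boundary can be sketched as below. This is a simplified illustration, not the hypervisor code: the function name and parameters are hypothetical, the region is taken as a half-open address range, and only accesses that move upwards through memory are considered.

```python
def clip_rep_count(addr, size, count, region_start, region_end):
    """Largest repetition count <= count that stays inside the region.

    The region is the half-open range [region_start, region_end); accesses
    are assumed to move upwards through memory, size bytes at a time.
    """
    if addr < region_start or addr + size > region_end:
        # Even the first access would fall outside the region.
        return 0
    fits = (region_end - addr) // size
    return min(count, fits)
```

A handler that was validated for one region then never sees an address beyond it, whatever count the guest requested.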
To be on the safe side (and consistent), also do the same for buffered
I/O intercepts, even if their only client (stdvga) doesn't put the
hypervisor at risk (i.e. "only" guest misbehavior would result).
This is CVE-2014-8867 / XSA-112.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 27 Nov 2014 13:00:23 +0000 (14:00 +0100)]
x86: limit checks in hypercall_xlat_continuation() to actual arguments
HVM/PVH guests can otherwise trigger the final BUG_ON() in that
function by entering 64-bit mode, setting the high halves of affected
registers to non-zero values, leaving 64-bit mode, and issuing a
hypercall that might get preempted and hence become subject to
continuation argument translation (HYPERVISOR_memory_op being the only
one possible for HVM, PVH also having the option of using
HYPERVISOR_mmuext_op). This issue got introduced when HVM code was
switched to use compat_memory_op() - neither that nor
hypercall_xlat_continuation() were originally intended to be used by
other than PV guests (which can't enter 64-bit mode and hence have no
way to alter the high halves of 64-bit registers).
This is CVE-2014-8866 / XSA-111.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Ian Campbell [Tue, 25 Nov 2014 16:24:19 +0000 (16:24 +0000)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Jan Beulich [Tue, 25 Nov 2014 16:21:52 +0000 (17:21 +0100)]
vNUMA: rename interface structures
No-one (including me) paid attention during review that these
structures don't adhere to the naming requirements of the public
interface: Consistently use xen_ prefixes at least for all new
additions.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Campbell [Thu, 20 Nov 2014 15:48:47 +0000 (15:48 +0000)]
libxc: don't leak buffer containing the uncompressed PV kernel
The libxc xc_dom_* infrastructure uses a very simple malloc memory pool which
is freed by xc_dom_release. However the various xc_try_*_decode routines (other
than the gzip one) just use plain malloc/realloc and therefore the buffer ends
up leaked.
The memory pool currently supports mmap'd buffers as well as directly
allocated buffers, however the try decode routines make use of realloc and do
not fit well into this model. Introduce a concept of an external memory block
to the memory pool and provide an interface to register such memory.
The mmap_ptr and mmap_len fields of the memblock tracking struct lose their
mmap_ prefix since they are now also used for external memory blocks.
We are only seeing this now because the gzip decoder doesn't leak and it's only
relatively recently that kernels in the wild have switched to better
compression.
This is https://bugs.debian.org/767295
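The external memory block idea can be sketched as follows. This is a minimal illustration only: the struct and function names (mem_pool, pool_register_external, pool_release) are hypothetical and are not the real xc_dom interface. It shows a pool that takes ownership of a buffer allocated elsewhere, for instance one grown with realloc by a decompressor, so that a single release call frees it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified pool; not the real xc_dom structures. */
struct mem_block {
    struct mem_block *next;
    void *ptr;                 /* buffer owned by the pool */
    size_t len;
};

struct mem_pool {
    struct mem_block *head;
};

/* Hand an externally allocated buffer to the pool so release frees it. */
static int pool_register_external(struct mem_pool *pool, void *ptr, size_t len)
{
    struct mem_block *b;

    assert(pool != NULL && ptr != NULL);
    b = malloc(sizeof(*b));
    if (b == NULL)
        return -1;
    b->ptr = ptr;
    b->len = len;
    b->next = pool->head;
    pool->head = b;
    return 0;
}

/* Free every tracked buffer; returns the number of blocks released. */
static size_t pool_release(struct mem_pool *pool)
{
    size_t n = 0;

    while (pool->head != NULL) {
        struct mem_block *b = pool->head;

        pool->head = b->next;
        free(b->ptr);
        free(b);
        n++;
    }
    return n;
}
```

The decode routine stays free to call realloc as often as it likes; only the final pointer is registered.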
Reported by: Gedalya <gedalya@gedalya.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Wed, 19 Nov 2014 15:28:15 +0000 (15:28 +0000)]
xen: arm: Support the other 4 PCI buses on Xgene
Currently we only establish specific mappings for pcie0, which is
used on the Mustang platform. However at least McDivitt uses pcie3.
So wire up all the others, based on whether the corresponding DT node
is marked as available.
This results in no change for Mustang.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Stefano Stabellini [Fri, 21 Nov 2014 14:31:30 +0000 (14:31 +0000)]
xen/arm: clear UIE on hypervisor entry
UIE being set can cause maintenance interrupts to occur when Xen writes
to one or more LR registers. The effect is a busy loop around the
interrupt handler in Xen
(http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
Tested-by: Julien Grall <julien.grall@linaro.org>
Release-acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Julien Grall [Thu, 20 Nov 2014 17:36:03 +0000 (17:36 +0000)]
scripts/get_maintainer.pl: Correctly CC the maintainers
The current script sets $email_remove_duplicates to 1 by default. On a
complex patch (see [1]), this results in some maintainers being omitted,
seemingly at random.
This is because the script will:
1) Get the list of maintainers of the file (incidentally all the
maintainers in "THE REST" role are added). If the email address already
exists in the global list, skip it. => The role will be lost
2) Filter the list to remove the entry with "THE REST" role
So if a maintainer is marked with "THE REST" role on the first file and
is actually an x86 maintainer on a later file, the script will only retain
the "THE REST" role. During the filtering step, this maintainer will
therefore be dropped.
This patch fixes this by setting $email_remove_duplicates to 0 by default.
The new behavior of the script will be:
1) Append the list of maintainers for every file
2) Filter the list to remove the entry with "THE REST" role
3) Remove duplicated email addresses
Example:
Patch: https://patches.linaro.org/41083/
Before the patch:
Daniel De Graaf <dgdegra@tycho.nsa.gov>
Ian Jackson <ian.jackson@eu.citrix.com>
Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Wei Liu <wei.liu2@citrix.com>
George Dunlap <george.dunlap@eu.citrix.com>
xen-devel@lists.xen.org
After the patch:
Daniel De Graaf <dgdegra@tycho.nsa.gov>
Ian Jackson <ian.jackson@eu.citrix.com>
Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Wei Liu <wei.liu2@citrix.com>
Stefano Stabellini <stefano.stabellini@citrix.com>
Tim Deegan <tim@xen.org>
Keir Fraser <keir@xen.org>
Jan Beulich <jbeulich@suse.com>
George Dunlap <george.dunlap@eu.citrix.com>
xen-devel@lists.xen.org
[1] http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00060.html
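The effect of the new ordering (filter first, deduplicate last) can be sketched in C as below. The script itself is Perl; the struct entry type, the function name and the sample addresses in the usage note are made up for illustration and do not come from get_maintainer.pl.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One (address, role) pair as collected per file; illustrative only. */
struct entry {
    const char *email;
    const char *role;
};

/*
 * Drop every "THE REST" entry, then drop repeated addresses, keeping
 * the first surviving occurrence. Compacts the array in place and
 * returns the new length.
 */
static size_t filter_then_dedupe(struct entry *list, size_t n)
{
    size_t out = 0;
    size_t i, j;

    for (i = 0; i < n; i++) {
        bool dup = false;

        if (strcmp(list[i].role, "THE REST") == 0)
            continue;                      /* filtering step */
        for (j = 0; j < out; j++) {
            if (strcmp(list[j].email, list[i].email) == 0) {
                dup = true;                /* already kept */
                break;
            }
        }
        if (!dup)
            list[out++] = list[i];
    }
    return out;
}
```

With input (a, "THE REST"), (b, "ARM"), (a, "X86"), address a survives through its X86 entry, whereas deduplicating before filtering would have dropped it.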
Signed-off-by: Julien Grall <julien.grall@linaro.org>
CC: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
George Dunlap [Wed, 12 Nov 2014 17:31:33 +0000 (17:31 +0000)]
xl: Return proper error codes for block-attach and block-detach
Return proper error codes on failure so that scripts can tell whether
the command completed properly or not.
This is not a proper fix, since it fails to call
libxl_device_disk_dispose() on the error path. But a proper fix
requires some refactoring, so given where we are in the release
process, it's better to have a fix that is simple and obvious, and do
the refactoring once the next development window opens up.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 25 Nov 2014 09:08:57 +0000 (10:08 +0100)]
x86/HVM: don't crash guest upon problems occurring in user mode
This extends commit 5283b310 ("x86/HVM: only kill guest when unknown VM
exit occurred in guest kernel mode") to a few more cases, including the
failed VM entry one for which XSA-110 had to be issued.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Tue, 25 Nov 2014 09:08:04 +0000 (10:08 +0100)]
x86: don't ignore foreigndom input on various MMUEXT ops
Instead properly fail requests that shouldn't be issued on foreign
domains or - for MMUEXT_{CLEAR,COPY}_PAGE - extend the existing
operation to work that way.
In the course of doing this the need to always clear "okay" even when
wanting an error code other than -EINVAL became unwieldy, so the
respective logic is being adjusted at once, together with a little
other related cleanup.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Tue, 25 Nov 2014 09:07:09 +0000 (10:07 +0100)]
x86: tighten page table owner checking in do_mmu_update()
MMU_MACHPHYS_UPDATE, not manipulating page tables, shouldn't ignore
a bad page table domain being specified.
Also pt_owner can't be NULL when reaching the "out" label, so the
respective check can be dropped.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Tue, 25 Nov 2014 09:05:29 +0000 (10:05 +0100)]
x86/cpuidle: don't count C1 multiple times
Commit 4ca6f9f0 ("x86/cpuidle: publish new states only after fully
initializing them") resulted in the state counter being incremented
for C1 despite that state using a fixed table entry (and the statically
initialized counter value already accounting for it and C0).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 24 Nov 2014 16:07:16 +0000 (17:07 +0100)]
dpci: add 'masked' as a gate for hvm_dirq_assist to process
Commit f6dd295381f4b6a66acddacf46bca8940586c8d8 "dpci: replace tasklet
with softirq" used 'masked' as a two-bit state mechanism
(STATE_SCHED, STATE_RUN) to communicate between 'raise_softirq_for' and
'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be
re-scheduled.
However it ignored that 'pt_irq_guest_eoi' was not adhering to the proper
dialogue: it was not using locked cmpxchg or test_bit operations and
ended up setting 'state' to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci' was still on a
per-cpu list, causing list corruption.
The code would trigger the following path causing list corruption:
\- timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel, which sets state to 0.
        pirq_dpci is still on dpci_list.
\- dpci_softirq
    while (!list_empty(&our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
        raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
    [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
        ends the loop, exits.
\- dpci_softirq
    while (!list_empty)
        list_del, but ->next already has LIST_POISON1 and we blow up.
An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had a fatal bug: it would miss interrupts that are to be scheduled!
This patch brings back the 'masked' boolean which is used as a
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes, it clears it.
The 'state' is left as a separate mechanism for 'raise_softirq' and
'softirq_dpci' to communicate the state of the 'struct hvm_dirq_pirq'.
However, since we now have two separate mechanisms, we have to deal with
a cancellation racing with an outstanding interrupt being serviced:
'pt_irq_destroy_bind' is called while 'hvm_dirq_assist' is just about to
service the interrupt.
The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and
the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer' hitting an ASSERT.
By clearing 'masked' in 'pt_irq_destroy_bind' we take care of that
scenario by preventing 'hvm_dirq_assist' from calling 'set_timer'.
In 'pt_irq_create_bind', in the error cases, we could be seeing
a softirq scheduled right away and being serviced (though stuck at
the spinlock). 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset'
to change the 'state' (as the state is 'STATE_RUN', not 'STATE_SCHED').
'pt_irq_create_bind' continues on with setting '->flag=0' and unlocks the lock.
'hvm_dirq_assist' grabs the lock and continues on. Since 'flag = 0' and
'digl_list' is empty, it thunders through 'hvm_dirq_assist' not doing
anything until it hits 'set_timer' which is undefined for MSI. Adding
'masked=0' for the MSI case fixes that.
The legacy interrupt one does not need it as there is no chance of
do_IRQ being called at that point.
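The role of 'masked' as a gate can be modelled roughly as below. This is a user-space sketch using C11 atomics, not the hypervisor code: struct pirq_model and the model_* functions are hypothetical stand-ins for 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and 'pt_irq_destroy_bind', and locking and timers are left out entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-PIRQ gate; not the real structure. */
struct pirq_model {
    atomic_bool masked;        /* set when an interrupt is pending */
    unsigned int injected;     /* times the assist path did real work */
};

/* Interrupt arrival: mark the PIRQ as needing service. */
static void model_do_irq(struct pirq_model *p)
{
    atomic_store(&p->masked, true);
}

/*
 * Assist path: act only if 'masked' was set, clearing it atomically.
 * Returns true when work was done (this is where a timer would be set).
 */
static bool model_dirq_assist(struct pirq_model *p)
{
    if (!atomic_exchange(&p->masked, false))
        return false;
    p->injected++;
    return true;
}

/* Teardown: clearing 'masked' keeps a late assist run from doing work. */
static void model_destroy_bind(struct pirq_model *p)
{
    atomic_store(&p->masked, false);
}
```

If teardown runs between interrupt arrival and the assist call, the assist call finds 'masked' clear and returns without touching the timer, which is the scenario described above.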
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 20 Nov 2014 16:38:46 +0000 (17:38 +0100)]
x86/mm: fix a reference counting error in MMU_MACHPHYS_UPDATE
Any domain which can pass the XSM check against a translated guest can cause a
page reference to be leaked.
While shuffling the order of checks, drop the quite-pointless MEM_LOG(). This
brings the check in line with similar checks in the vicinity.
Discovered while reviewing the XSA-109/110 followup series.
This is XSA-113.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Thu, 20 Nov 2014 15:22:00 +0000 (15:22 +0000)]
docs/commandline: Fix formatting issues
For 'dom0_max_vcpus' and 'hvm_debug', markdown was interpreting the text as
regular text, and reflowing it as a regular paragraph, leading to a single
line as output. Reformat them as code blocks inside blockquote blocks, which
causes them to take their precise whitespace layout.
For 'psr', the bullet point was incorrectly delineated from paragraph text,
causing it to be reflowed. Alter the formatting to include the CMT-specific
options as sub-bullets of the overall CMT resource.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Release-acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Wed, 19 Nov 2014 15:28:14 +0000 (15:28 +0000)]
xen: arm: correct specific mappings for PCIE0 on X-Gene
The region assigned to PCIE0, according to the docs, is 0x0e000000000 to
0x10000000000. They make no distinction between PCI CFG and PCI IO mem within
this range (in fact, I'm not sure that isn't up to the driver).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Wed, 19 Nov 2014 15:28:13 +0000 (15:28 +0000)]
xen: arm: correct off by one in xgene-storm's map_one_mmio
The callers pass the end as the pfn immediately *after* the last page to be
mapped, therefore adding one is incorrect and causes an additional page to be
mapped.
At the same time correct the printing of the mfn values, zero-padding them to
16 digits as for a paddr when they are frame numbers is just confusing.
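The arithmetic behind the off-by-one, as a small sketch. The function name is hypothetical; it only shows that with an exclusive end frame number the page count is the plain difference.

```c
#include <assert.h>

/* Pages in the half-open frame range [start, end): end is NOT mapped. */
static unsigned long nr_pages_exclusive_end(unsigned long start,
                                            unsigned long end)
{
    assert(end >= start);
    return end - start;        /* adding one would map an extra page */
}
```

For example, start 0x1f000 and end 0x1f010 cover 0x10 pages, the last of which is frame 0x1f00f.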
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Wed, 19 Nov 2014 15:28:12 +0000 (15:28 +0000)]
xen: arm: Drop EARLY_PRINTK_BAUD from entries which don't set ..._INIT_UART
EARLY_PRINTK_BAUD doesn't do anything unless EARLY_PRINTK_INIT_UART is set.
Furthermore only the pl011 driver implements the init routine at all, so the
entries which use any other UART driver and specified a BAUD were doubly wrong.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Wed, 19 Nov 2014 15:28:11 +0000 (15:28 +0000)]
xen: arm: Add earlyprintk for McDivitt.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Wed, 19 Nov 2014 10:42:18 +0000 (10:42 +0000)]
docs: workaround markdown parser error in xen-command-line.markdown
Some versions of markdown (specifically the one in Debian Wheezy, currently
used to generate
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
confused by nested lists in the middle of multi-paragraph parent list entries
as seen in the com1,com2 entry.
The effect is that the "Default" section of all following entries is replaced
by some sort of hash or checksum (at least, a string of 32 random seeming hex
digits).
Work around this issue by making the descriptions of the DPS options a nested
list, moving the existing nested list describing the options for S into a third
level list. This seems to avoid the issue, and is arguably better formatting in
its own right (at least it's not a regression IMHO).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Clark Laughlin [Wed, 12 Nov 2014 15:38:48 +0000 (09:38 -0600)]
mkdeb: correctly map package architectures for x86 and ARM
mkdeb previously set the package architecture to be 'amd64' for anything other than
XEN_TARGET_ARCH=x86_32. This patch attempts to correctly map the architecture
from XEN_TARGET_ARCH to the Debian architecture names for x86 and ARM
architectures.
Signed-off-by: Clark Laughlin <clark.laughlin@linaro.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Euan Harris [Tue, 18 Nov 2014 17:07:41 +0000 (17:07 +0000)]
libxl: Document device parameter of libxl_device_<type>_add functions
The device parameter of libxl_device_<type>_add is an in/out parameter.
Unspecified fields are filled in with appropriate values for the created
device when the function returns. Document this behaviour.
Signed-off-by: Euan Harris <euan.harris@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 17 Nov 2014 12:10:34 +0000 (12:10 +0000)]
libxl: remove existence check for PCI device hotplug
The existence check is to make sure a device is not added to a guest
multiple times.
PCI device backend path has different rules from vif, disk etc. For
example:
/local/domain/0/backend/pci/9/0/dev-1/0000:03:10.1
/local/domain/0/backend/pci/9/0/key-1/0000:03:10.1
/local/domain/0/backend/pci/9/0/dev-2/0000:03:10.2
/local/domain/0/backend/pci/9/0/key-2/0000:03:10.2
The devid for PCI devices is hardcoded to 0. libxl__device_exists only
checks up to /local/.../9/0 so it always returns true even when the device is
assignable.
Remove invocation of libxl__device_exists. We're sure at this point that
the PCI device is assignable (hence no xenstore entry or JSON entry).
The check is done beforehand. For HVM guests it's done by calling
xc_test_assign_device and for PV guests it's done by calling
pciback_dev_is_assigned.
Reported-by: Li, Liang Z <liang.z.li@intel.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 5 Nov 2014 14:32:47 +0000 (14:32 +0000)]
libxl: CODING_STYLE: Discuss existing style problems
Document that:
- the existing code is not all conforming yet
- code should conform
- we will sometimes accept patches with nonconforming elements if
they don't make matters worse.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 5 Nov 2014 14:26:59 +0000 (14:26 +0000)]
libxl: CODING_STYLE: Mention function out parameters
We seem to use both `_r' and `_out'. Document both.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 5 Nov 2014 14:26:30 +0000 (14:26 +0000)]
libxl: CODING_STYLE: Deprecate `error' for out blocks
We should have only one name for this and `out' is more prevalent.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 5 Nov 2014 14:25:03 +0000 (14:25 +0000)]
libxl: CODING_STYLE: Much new material
Discuss:
Memory allocation
Conventional variable names
Convenience macros
Error handling
Idempotent data structure construction/destruction
Asynchronous/long-running operations
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Tue, 18 Nov 2014 13:16:23 +0000 (14:16 +0100)]
x86emul: enforce privilege level restrictions when loading CS
Privilege level checks were basically missing for the CS case, the
only check that was done (RPL == DPL for nonconforming segments)
was solely covering a single special case (return to non-conforming
segment).
Additionally in long mode the L bit set requires the D bit to be clear,
as was recently pointed out for KVM by Nadav Amit
<namit@cs.technion.ac.il>.
Finally we also need to force the loaded selector's RPL to CPL (at
least as long as lret/retf emulation doesn't support privilege level
changes).
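Two of the rules above are simple enough to sketch in isolation. The helper names are hypothetical and this is not the emulator's code; it covers only the L/D attribute rule for long mode and the forcing of the selector's RPL (its low two bits) to CPL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In long mode, a code segment with L set must have D clear. */
static bool cs_l_d_combination_ok(bool long_mode, bool l_bit, bool d_bit)
{
    return !(long_mode && l_bit && d_bit);
}

/* Replace the selector's RPL (bits 1:0) with the current privilege level. */
static uint16_t force_rpl_to_cpl(uint16_t sel, unsigned int cpl)
{
    assert(cpl <= 3);
    return (uint16_t)((sel & ~3u) | cpl);
}
```

For instance, selector 0x0013 loaded at CPL 0 becomes 0x0010.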
This is CVE-2014-8595 / XSA-110.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 18 Nov 2014 13:15:21 +0000 (14:15 +0100)]
x86: don't allow page table updates on non-PV page tables in do_mmu_update()
paging_write_guest_entry() and paging_cmpxchg_guest_entry() aren't
consistently supported for non-PV guests (they'd deref NULL for PVH or
non-HAP HVM ones). Don't allow respective MMU_* operations on the
page tables of such domains.
This is CVE-2014-8594 / XSA-109.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Mon, 17 Nov 2014 14:07:03 +0000 (15:07 +0100)]
EFI: allow retry of ExitBootServices() call
The specification is kind of vague under what conditions
ExitBootServices() may legitimately fail, requiring the OS loader to
retry:
"If MapKey value is incorrect, ExitBootServices() returns
EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must
be called again. Firmware implementation may choose to do a partial
shutdown of the boot services during the first call to
ExitBootServices(). EFI OS loader should not make calls to any boot
service function other then GetMemoryMap() after the first call to
ExitBootServices()."
While our code guarantees the map key to be valid, there are systems
where a firmware internal notification sent while processing
ExitBootServices() reportedly results in changes to the memory map.
In that case, make a best effort second try: Avoid any boot service
calls other than the two named above, with the possible exception of
error paths. Those aren't a problem, since if we end up needing to
retry, we're hosed when something goes wrong as much as if we didn't
make the retry attempt.
For x86, a minimal adjustment to efi_arch_process_memory_map() is
needed for it to cope with potentially being called a second time.
For arm64, while efi_process_memory_map_bootinfo() is easy to verify
that it can safely be called more than once without violating spec
constraints, it's not so obvious for fdt_add_uefi_nodes(), hence a
step by step approach:
- deletion of memory nodes and memory reserve map entries: the 2nd pass
shouldn't find any as the 1st one deleted them all,
- a "chosen" node should be found as it got added in the 1st pass,
- the various "linux,uefi-*" nodes all got added during the 1st pass
and hence only their contents may get updated.
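The retry logic can be sketched against a mock firmware. Everything here is hypothetical scaffolding (struct mock_fw and the mock_* functions stand in for GetMemoryMap() and ExitBootServices()); the point is only the shape of the loop: fetch a fresh map key, try to exit, and make one more attempt if the key went stale in between.

```c
#include <stdbool.h>

/* Mock firmware; stands in for the real boot services. */
struct mock_fw {
    unsigned int current_key;  /* key matching the current memory map */
    bool notify_pending;       /* an internal notification will change
                                  the map during the next exit call */
    unsigned int exit_calls;
};

/* Stand-in for GetMemoryMap(): returns the current map key. */
static unsigned int mock_get_memory_map(struct mock_fw *fw)
{
    return fw->current_key;
}

/* Stand-in for ExitBootServices(): false models EFI_INVALID_PARAMETER. */
static bool mock_exit_boot_services(struct mock_fw *fw, unsigned int key)
{
    fw->exit_calls++;
    if (fw->notify_pending) {
        fw->notify_pending = false;
        fw->current_key++;     /* map changed, caller's key is stale */
    }
    return key == fw->current_key;
}

/* Best-effort exit: at most two attempts, each with a fresh key. */
static bool exit_boot_services_with_retry(struct mock_fw *fw)
{
    unsigned int attempt;

    for (attempt = 0; attempt < 2; attempt++) {
        unsigned int key = mock_get_memory_map(fw);

        if (mock_exit_boot_services(fw, key))
            return true;
        /* From here on only the two calls above may be used. */
    }
    return false;
}
```

In the real code the memory map processing between the two calls is what has to tolerate being run a second time, which is what the per-architecture notes above are about.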
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roy Franz <roy.franz@linaro.org>
Release-acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Mon, 17 Nov 2014 14:05:53 +0000 (15:05 +0100)]
x86: (allow to) override LIST_POISON*
Having these point into space not controlled by the hypervisor provides
an unnecessary attack surface. Allow architectures to override them and
utilize that override to make them non-canonical addresses (thus
causing #GP rather than #PF when dereferenced).
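What "non-canonical" means can be shown with a small check. The sketch assumes 48-bit virtual addresses, where an address is canonical only if bits 63..47 are all equal; the function name and the sample values in the usage note are illustrative and are not the constants chosen by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical under 48-bit addressing: bits 63..47 all zero or all one. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;         /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

A low value such as 0x00100100 is canonical and could be mapped by a guest, while a value such as 0xdead000000000100 is not canonical and faults with #GP on dereference regardless of any mapping.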
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Wei Liu [Wed, 12 Nov 2014 10:39:31 +0000 (10:39 +0000)]
libxl: add missing action in DEFINE_DEVICE_ADD
... otherwise when device add operation fails, the error message looks
like "libxl: error: libxl.c:1897:device_addrm_aocomplete: unable to (null)
device", which is not very helpful.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Emil Condrea [Thu, 30 Oct 2014 13:05:30 +0000 (15:05 +0200)]
vTPM: Fix Atmel timeout bug.
Some versions of Atmel TPMs provide invalid values for TPM_CAP_PROP_TIS_TIMEOUT query.
Because timeouts are invalid, every other command after tpm_get_timeouts will fail.
It is a known issue and it was fixed recently in linux kernel tpm_tis.c on 2014-07-29.
This patch does not allow timeouts to be less than standard values.
I tested it on a Dell Latitude E5520 and after making the changes I was able to start vtpmmgr-stubdom.
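The clamping idea amounts to taking the larger of the reported and the standard value. A minimal sketch, with a hypothetical helper name; the numbers in the usage note are placeholders, not the TIS defaults.

```c
#include <assert.h>

/* Use the value reported by the TPM unless it is below the minimum. */
static unsigned long clamp_timeout(unsigned long reported,
                                   unsigned long minimum)
{
    assert(minimum > 0);
    return reported < minimum ? minimum : reported;
}
```

A TPM reporting 0 against a minimum of 750 ends up with 750, while a sane report of 2000 is kept as is.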
Signed-off-by: Emil Condrea <emilcondrea@gmail.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Wei Liu [Wed, 12 Nov 2014 11:05:58 +0000 (11:05 +0000)]
xl: correct test condition on libxl_domain_info
The `if' statement considered return value 0 from libxl_domain_info an
error, while 0 actually means success.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Tue, 14 Oct 2014 06:55:23 +0000 (08:55 +0200)]
tools/hotplug: use configure --sysconfdir result
... instead of hardcoding values and guessing where the config files may
be. Also use the result of --with-sysconfig-leaf-dir.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Juergen Gross [Wed, 12 Nov 2014 11:39:58 +0000 (12:39 +0100)]
adjust number of domains in cpupools when destroying domain
Commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a (move domain to
cpupool0 before destroying it) introduced an error in the accounting
of cpupools regarding the number of domains. The number of domains
is not adjusted when a domain is moved to cpupool0 in kill_domain().
Correct this by introducing a cpupool function doing the move
instead of open coding it by calling sched_move_domain().
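The accounting that the dedicated move function has to keep consistent looks roughly like this. The types and the function name are simplified, hypothetical stand-ins; the real function also has to take locks and call into the scheduler.

```c
#include <assert.h>

/* Simplified stand-ins for the cpupool and domain structures. */
struct pool {
    int id;
    unsigned int n_dom;        /* number of domains in this pool */
};

struct dom {
    struct pool *pool;
};

/* Move a domain and keep both pools' domain counts in step. */
static void pool_move_domain(struct dom *d, struct pool *to)
{
    if (d->pool == to)
        return;
    assert(d->pool->n_dom > 0);
    d->pool->n_dom--;
    to->n_dom++;
    d->pool = to;
}
```

Open coding only the move itself leaves the source pool's count one too high.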
Reported-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Reviewed-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Konrad Rzeszutek Wilk [Wed, 12 Nov 2014 11:38:08 +0000 (12:38 +0100)]
dpci: replace tasklet with softirq
The existing tasklet mechanism has a single global spinlock that is
taken every time the global list is touched. And we use this lock quite
a lot - when we call do_tasklet_work which is called via a softirq and
from the idle loop. We take the lock on any operation on the
tasklet_list.
The problem we are facing is that there are quite a lot of tasklets
scheduled. The most common one that is invoked is the one injecting the
VIRQ_TIMER in the guest. Guests are not insane and don't set the
one-shot or periodic clocks to be in sub 1ms intervals (causing said
tasklet to be scheduled for such small intervals).
The problem appears when PCI passthrough devices are used over many
sockets and we have a mix of heavy-interrupt guests and idle guests.
The idle guests end up seeing 1/10 of their RUNNING timeslice eaten by
the hypervisor (and 40% steal time).
The mechanism by which we inject PCI interrupts is by hvm_do_IRQ_dpci
which schedules the hvm_dirq_assist tasklet every time an interrupt is
received. The callchain is:
_asm_vmexit_handler
 -> vmx_vmexit_handler
    -> vmx_do_extint
        -> do_IRQ
            -> __do_IRQ_guest
                -> hvm_do_IRQ_dpci
                   tasklet_schedule(&dpci->dirq_tasklet);
                   [takes lock to put the tasklet on]
[later on the schedule_tail is invoked which is 'vmx_do_resume']
vmx_do_resume
 -> vmx_asm_do_vmentry
    -> call vmx_intr_assist
        -> vmx_process_softirqs
            -> do_softirq
               [executes the tasklet function, takes the
                lock again]
Meanwhile other CPUs might be sitting in an idle loop and be invoked to
deliver a VIRQ_TIMER, which also ends up taking the lock twice: first
to schedule the v->arch.hvm_vcpu.assert_evtchn_irq_tasklet (accounted
to the guest's BLOCKED_state); then to execute it - which is accounted
for in the guest's RUNTIME_state.
The end result is that on an 8-socket machine with PCI passthrough,
where four sockets are busy with interrupts, and the other sockets have
idle guests - we end up with the idle guests having around 40% steal
time and 1/10 of their timeslice (3ms out of 30 ms) being tied up taking
the lock. The latency of the PCI interrupts delivered to the guest is also
hindered.
With this patch the problem disappears completely. That is achieved by
removing the lock for the PCI passthrough use-case (the 'hvm_dirq_assist'
case) by not using tasklets at all.
The patch is simple - instead of scheduling a tasklet we schedule our
own softirq - HVM_DPCI_SOFTIRQ, which will take care of running
'hvm_dirq_assist'. The information we need on each CPU is which
'struct hvm_pirq_dpci' structure the 'hvm_dirq_assist' needs to run on.
That is simply solved by threading the 'struct hvm_pirq_dpci' through a
linked list. The rule of only running one 'hvm_dirq_assist' for only
one 'hvm_pirq_dpci' is also preserved by having 'schedule_dpci_for'
ignore any subsequent calls for a domain which has already been
scheduled.
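The "schedule once, then run from a list" rule can be modelled in user space with C11 atomics. This is a single-threaded sketch under simplifying assumptions: struct pirq_item, struct cpu_queue, schedule_item and run_queue are hypothetical names, and interrupt disabling, per-CPU data and domain ref-counting are all omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define STATE_SCHED 1u         /* item is queued, waiting to run */
#define STATE_RUN   2u         /* item's handler is executing */

/* Hypothetical stand-in for 'struct hvm_pirq_dpci'. */
struct pirq_item {
    atomic_uint state;
    struct pirq_item *next;
    unsigned int handled;      /* times the handler ran */
};

/* Hypothetical stand-in for the per-CPU list. */
struct cpu_queue {
    struct pirq_item *head;
};

/* Queue the item unless it is already scheduled; true if queued. */
static bool schedule_item(struct cpu_queue *q, struct pirq_item *it)
{
    unsigned int old = atomic_fetch_or(&it->state, STATE_SCHED);

    if (old & STATE_SCHED)
        return false;          /* subsequent call is ignored */
    it->next = q->head;
    q->head = it;
    return true;
}

/* Softirq body: drain the list, running each item's handler once. */
static unsigned int run_queue(struct cpu_queue *q)
{
    unsigned int n = 0;

    while (q->head != NULL) {
        struct pirq_item *it = q->head;

        q->head = it->next;
        it->next = NULL;
        atomic_fetch_or(&it->state, STATE_RUN);
        atomic_fetch_and(&it->state, ~STATE_SCHED);
        it->handled++;         /* stands in for 'hvm_dirq_assist' */
        atomic_fetch_and(&it->state, ~STATE_RUN);
        n++;
    }
    return n;
}
```

No global lock is involved: the only shared word is the per-item state, and the list is private to the CPU that drains it.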
== Code details ==
Most of the code complexity comes from the '->dom' field in the
'hvm_pirq_dpci' structure. We use it for ref-counting and as such it
MUST be valid as long as STATE_SCHED bit is set. Whoever clears the
STATE_SCHED bit does the ref-counting and can also reset the '->dom'
field.
To compound the complexity, there are multiple points where the
'hvm_pirq_dpci' structure is reset or re-used. Initially (first time
the domain uses the pirq), the 'hvm_pirq_dpci->dom' field is set to
NULL as it is allocated. On subsequent calls in to 'pt_irq_create_bind'
the ->dom is whatever it had last time.
As this is the initial call (which QEMU ends up calling when the guest
writes a vector value in the MSI field) we MUST set the '->dom' to
the proper structure (otherwise we cannot do proper ref-counting).
The mechanism to tear it down is more complex as there are three ways
it can be executed. To make it simpler everything revolves around
'pt_pirq_softirq_active'. If it returns -EAGAIN that means there is an
outstanding softirq that needs to finish running before we can continue
tearing down. With that in mind:
a) pci_clean_dpci_irq. This gets called when the guest is being
destroyed. We end up calling 'pt_pirq_softirq_active' to see if it
is OK to continue the destruction.
The scenarios in which the 'struct pirq' (and subsequently the
'hvm_pirq_dpci') gets destroyed is when:
- guest did not use the pirq at all after setup.
- guest did use pirq, but decided to mask and left it in that state.
- guest did use pirq, but crashed.
In all of those scenarios we end up calling 'pt_pirq_softirq_active'
to check if the softirq is still active. Read below on the
'pt_pirq_softirq_active' loop.
b) pt_irq_destroy_bind (guest disables the MSI). We double-check that
the softirq has run by piggy-backing on the existing
'pirq_cleanup_check' mechanism which calls 'pt_pirq_cleanup_check'.
We add the extra call to 'pt_pirq_softirq_active' in
'pt_pirq_cleanup_check'.
NOTE: Guests that use event channels unbind first the event channel
from PIRQs, so the 'pt_pirq_cleanup_check' won't be called as 'event'
is set to zero. In that case we either clean it up via the a) or c)
mechanism.
There is an extra scenario regardless of 'event' being set or not:
the guest did 'pt_irq_destroy_bind' while an interrupt was triggered
and softirq was scheduled (but had not been run). It is OK to still
run the softirq as hvm_dirq_assist won't do anything (as the flags
are set to zero). However we will try to deschedule the softirq if
we can (by clearing the STATE_SCHED bit and us doing the
ref-counting).
c) pt_irq_create_bind (not a typo). The scenarios are:
- guest disables the MSI and then enables it (rmmod and modprobe in
a loop). We call 'pt_pirq_reset' which checks to see if the
softirq has been scheduled. Imagine the 'b)' with interrupts in
flight and c) getting called in a loop.
We will spin up on 'pt_pirq_is_active' (at the start of the
'pt_irq_create_bind') with the event_lock spinlock dropped and waiting
(cpu_relax). We cannot call 'process_pending_softirqs' as it might
result in a dead-lock. hvm_dirq_assist will be executed and then the
softirq will clear 'state' which signals that we can re-use the
'hvm_pirq_dpci' structure. In case this softirq is scheduled on a
remote CPU the softirq will run on it as the semantics behind a
softirq is that it will execute within the guest interruption.
- we hit one of the error paths in 'pt_irq_create_bind' while an
interrupt was triggered and softirq was scheduled.
If the softirq is in STATE_RUN that means it is executing and we should
let it continue on. We can clear the '->dom' field as the softirq has
stashed it beforehand. If the softirq is STATE_SCHED and we are
successful in clearing it, we do the ref-counting and clear the '->dom'
field. Otherwise we let the softirq continue on and the '->dom' field
is left intact. The clearing of the '->dom' is left to a), b) or again
c) case.
Note that in both cases the 'flags' variable is cleared so
hvm_dirq_assist won't actually do anything.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Konrad Rzeszutek Wilk [Wed, 12 Nov 2014 11:37:10 +0000 (12:37 +0100)]
dpci: move from an hvm_irq_dpci (and struct domain) to an hvm_dirq_dpci model
When an interrupt for a PCI (or PCIe) passthrough device is to be sent
to a guest, we find the appropriate 'hvm_dirq_dpci' structure for the
interrupt (PIRQ), set a bit (masked), and schedule a tasklet.
Then the 'hvm_dirq_assist' tasklet gets called with the 'struct domain'
from where it iterates over the radix-tree of 'hvm_dirq_dpci' (from
zero to the number of PIRQs allocated) which are masked to the guest
and calls each 'hvm_pirq_assist'. If the PIRQ has a bit set (masked) it
figures out how to inject the PIRQ to the guest.
This is inefficient and not fair as:
- We iterate starting at PIRQ 0 and up every time. That means the PCIe
devices that have lower PIRQs get to be called first.
- If we have many PCIe devices passed in with many PIRQs and if most
of the time only the highest numbered PIRQ gets an interrupt (as the
initial ones are for control) we end up iterating over many PIRQs.
But we could do better - the 'hvm_dirq_dpci' has the field for
'struct domain', which we can use instead of having to pass in the
'struct domain'.
As such this patch moves the tasklet to the 'struct hvm_dirq_dpci' and
sets the 'dom' field to the domain. We also double-check that the
'->dom' is not reset before using it.
We have to be careful with this as that means we MUST have 'dom' set
before pirq_guest_bind() is called. As such we add the
'pirq_dpci->dom = d;' to cover for such cases.
The mechanism to tear it down is more complex as there are two ways it
can be executed:
a) pci_clean_dpci_irq. This gets called when the guest is being
destroyed. We end up calling 'tasklet_kill'.
The scenarios in which the 'struct pirq' (and subsequently the
'hvm_pirq_dpci') gets destroyed are:
- guest did not use the pirq at all after setup.
- guest did use pirq, but decided to mask and left it in that
state.
- guest did use pirq, but crashed.
In all of those scenarios we end up calling 'tasklet_kill' which
will spin on the tasklet if it is running.
b) pt_irq_destroy_bind (guest disables the MSI). We double-check that
the softirq has run by piggy-backing on the existing
'pirq_cleanup_check' mechanism which calls 'pt_pirq_cleanup_check'.
We add the extra call to 'pt_pirq_softirq_active' in
'pt_pirq_cleanup_check'.
NOTE: Guests that use event channels unbind first the event channel
from PIRQs, so the 'pt_pirq_cleanup_check' won't be called as event
is set to zero. In that case we clean it up via the a)
mechanism. It is OK to re-use the tasklet when 'pt_irq_create_bind'
is called afterwards.
There is an extra scenario regardless of event being set or not:
the guest did 'pt_irq_destroy_bind' while an interrupt was
triggered and tasklet was scheduled (but had not been run). It is
OK to still run the tasklet as hvm_dirq_assist won't do anything
(as the flags are set to zero). As such we can exit out of
hvm_dirq_assist without doing anything.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Meng Xu [Wed, 12 Nov 2014 11:36:04 +0000 (12:36 +0100)]
sched_rt: serialize vcpu data access
Fix the following two issues in rtds scheduler:
1) The runq queue lock is not grabbed when rt_update_deadline is
called in rt_alloc_vdata function, which may cause a race condition;
Solution: Move call to rt_update_deadline from _alloc to _insert;
Note: rt_alloc_vdata does not need to grab the runq lock, because only one
cpu will allocate the rt_vcpu; before the rt_vcpu is inserted into the
runq, no more than one cpu operates on the rt_vcpu.
2) rt_vcpu_remove should grab the runq lock before removing the vcpu
from runq; otherwise, a race condition may happen.
Solution: Add lock in rt_vcpu_remove().
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Meng Xu [Wed, 12 Nov 2014 11:34:49 +0000 (12:34 +0100)]
sched_rt: sanity check input and serialization
Sanity check input params in rt_dom_cntl();
Serialize rt_dom_cntl() against the global lock.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
M A Young [Tue, 11 Nov 2014 20:28:38 +0000 (20:28 +0000)]
fix commit xen/arm: Add support for GICv3 for domU
The build of xen-4.5.0-rc2 fails if XSM_ENABLE=y due to an inconsistency
in commit fda1614 "xen/arm: Add support for GICv3 for domU" which uses
XEN_DOMCTL_configure_domain in xen/xsm/flask/hooks.c and
xen/xsm/flask/policy/access_vectors but XEN_DOMCTL_arm_configure_domain
elsewhere.
Michael Young
In fda1614 ("xen/arm: Add support for GICv3 for domU")
XEN_DOMCTL_configure_domain is used in xen/xsm/flask/hooks.c and
xen/xsm/flask/policy/access_vectors but XEN_DOMCTL_arm_configure_domain
is used elsewhere.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk [Tue, 11 Nov 2014 14:40:20 +0000 (09:40 -0500)]
Xen 4.5.0-rc2: Update tag for QEMU upstream tree....
QEMU traditional can stay at rc1 since there are no changes in it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Thu, 6 Nov 2014 10:41:28 +0000 (10:41 +0000)]
pvgrub: ignore NUL
When using pvgrub in graphical mode with vnc, the grub timeout doesn't
work: the countdown doesn't even start. With a serial terminal the
problem doesn't occur and the countdown works as expected.
It turns out that the problem is that when using a graphical terminal,
checkkey () returns 0 instead of -1 when there is no activity on the
mouse or keyboard. As a consequence grub thinks that the user typed
something and interrupts the count down.
To fix the issue simply ignore keystrokes returning 0, that is the NUL
character anyway. Add a patch to grub.patches to do that.
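A minimal sketch of the check, assuming the countdown loop polls checkkey () and gets an int back; the helper name key_interrupts_countdown is hypothetical and is not the actual grub patch:

```c
/*
 * Decide whether a value returned by a checkkey()-style poll should
 * interrupt the boot countdown (hypothetical helper).
 *   -1 : no key pending
 *    0 : NUL, which the graphical terminal reports when idle
 */
static int key_interrupts_countdown(int key)
{
    return key != -1 && key != 0;
}
```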
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Julien Grall [Wed, 5 Nov 2014 13:04:22 +0000 (13:04 +0000)]
xen/arm: Add support for GICv3 for domU
The vGIC will emulate the same version as the hardware. The toolstack has
to retrieve the version of the vGIC in order to be able to create the
corresponding device tree node.
A new DOMCTL has been introduced for ARM to configure the domain. For now
it only allows the toolstack to retrieve the version of vGIC.
This DOMCTL will be extended later to let the user choose the version of the
emulated GIC.
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 6 Nov 2014 13:59:43 +0000 (13:59 +0000)]
tools: libxl: do not overrun input buffer in libxl__parse_mac
Valgrind reports:
==7971== Invalid read of size 1
==7971== at 0x40877BE: libxl__parse_mac (libxl_internal.c:288)
==7971== by 0x405C5F8: libxl__device_nic_from_xs_be (libxl.c:3405)
==7971== by 0x4065542: libxl__append_nic_list_of_type (libxl.c:3484)
==7971== by 0x4065542: libxl_device_nic_list (libxl.c:3504)
==7971== by 0x406F561: libxl_retrieve_domain_configuration (libxl.c:6661)
==7971== by 0x805671C: reload_domain_config (xl_cmdimpl.c:2037)
==7971== by 0x8057F30: handle_domain_death (xl_cmdimpl.c:2116)
==7971== by 0x8057F30: create_domain (xl_cmdimpl.c:2580)
==7971== by 0x805B4B2: main_create (xl_cmdimpl.c:4652)
==7971== by 0x804EAB2: main (xl.c:378)
This is because on the final iteration the tok += 3 skips over the terminating
NUL to the next byte, and then *tok reads it. Fix this by using endptr as the
iterator.
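The idea of iterating with endptr can be sketched like this. It is an illustration only, not the libxl code: parse_mac_sketch is a hypothetical name, and the function advances strictly from where strtoul stopped, so it never steps over the terminating NUL.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "aa:bb:cc:dd:ee:ff" into mac[]. Returns 0 on success, -1 on error. */
static int parse_mac_sketch(const char *s, uint8_t mac[6])
{
    const char *tok = s;
    char *endptr = NULL;
    int i;

    for (i = 0; i < 6; i++) {
        unsigned long v;

        /* strtoul would skip whitespace and accept a sign; reject those. */
        if (!isxdigit((unsigned char)*tok))
            return -1;

        v = strtoul(tok, &endptr, 16);
        if (endptr == tok || endptr - tok > 2 || v > 0xff)
            return -1;
        mac[i] = (uint8_t)v;

        if (i < 5) {
            /* Only step past a separator that is really there. */
            if (*endptr != ':')
                return -1;
            tok = endptr + 1;
        }
    }

    /* The last octet must be followed directly by the terminator. */
    return *endptr == '\0' ? 0 : -1;
}
```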
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Thu, 6 Nov 2014 13:00:31 +0000 (13:00 +0000)]
tools: libxl: do not leak diskpath during local disk attach
libxl__device_disk_local_initiate_attach is assigning dls->diskpath with a
strdup of the device path. This is then passed to the callback, e.g.
parse_bootloader_result but bootloader_cleanup will not free it.
Since the callback is within the scope of the (e)gc and therefore doesn't need
to be malloc'd, a gc'd alloc will do. All other assignments to this field use
the gc.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767295
Reported-by: Gedalya <gedalya@gedalya.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Fri, 24 Oct 2014 09:58:33 +0000 (10:58 +0100)]
xen: arm: propagate gic's #address-cells property to dom0.
The interrupt-map property requires that the interrupt-parent node
must have both #address-cells and #interrupt-cells properties (see
ePAPR 2.4.3.1). Therefore propagate the property if it is present.
We must propagate (rather than invent our own value) since this value
is used to size fields within other properties within the tree.
ePAPR strictly speaking requires that the interrupt-parent node
always has these properties. However reality has diverged from this
and implementations will recursively search parents for #*-cells
properties. Hence we only copy if it is present.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Thu, 11 Sep 2014 15:21:29 +0000 (16:21 +0100)]
xen: arm: configure correct dom0_gnttab_start/size
Vexpress is currently failing to boot for me with:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at arch/arm/mm/ioremap.c:301 __arm_ioremap_pfn_caller+0x118/0x1a4()
CPU: 0 PID: 1 Comm: swapper Tainted: G W 3.16.0-arm-native+ #276
[<c0011e9c>] (unwind_backtrace) from [<c0010758>] (show_stack+0x10/0x14)
[<c0010758>] (show_stack) from [<c001a3ec>] (warn_slowpath_common+0x5c/0x7c)
[<c001a3ec>] (warn_slowpath_common) from [<c001a4c8>] (warn_slowpath_null+0x18/0x20)
[<c001a4c8>] (warn_slowpath_null) from [<c001488c>] (__arm_ioremap_pfn_caller+0x118/0x1a4)
[<c001488c>] (__arm_ioremap_pfn_caller) from [<c00149a0>] (__arm_ioremap+0x14/0x20)
[<c00149a0>] (__arm_ioremap) from [<c01d103c>] (gnttab_setup_auto_xlat_frames+0x30/0xdc)
[<c01d103c>] (gnttab_setup_auto_xlat_frames) from [<c0495324>] (xen_guest_init+0x19c/0x2d4)
[<c0495324>] (xen_guest_init) from [<c0492c6c>] (do_one_initcall+0xfc/0x1a4)
[<c0492c6c>] (do_one_initcall) from [<c0492d6c>] (kernel_init_freeable+0x58/0x1b4)
[<c0492d6c>] (kernel_init_freeable) from [<c039611c>] (kernel_init+0x8/0xe4)
[<c039611c>] (kernel_init) from [<c000de58>] (ret_from_fork+0x14/0x3c)
---[ end trace 3406ff24bd97382f ]---
xen:grant_table: Failed to ioremap gnttab share frames (addr=0x00000000b0000000)!
which is:
/*
* Don't allow RAM to be mapped - this causes problems with ARMv6+
*/
if (WARN_ON(pfn_valid(pfn)))
return NULL;
This makes sense since the gnttab defaults to 0xb0000000 and my dom0
is being allocated a 1:1 mapping at 0xa0000000-0xc0000000.
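The overlap can be checked with a simple half-open range test. This is a sketch only: ranges_overlap is a hypothetical helper, and the grant table size in the usage note is an assumed value chosen purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [a_start, a_start + a_size) intersects [b_start, b_start + b_size). */
static bool ranges_overlap(uint64_t a_start, uint64_t a_size,
                           uint64_t b_start, uint64_t b_size)
{
    return a_start < b_start + b_size && b_start < a_start + a_size;
}
```

With the address from the error message (0xb0000000), an assumed size of 0x20000, and dom0 RAM at 0xa0000000-0xc0000000, ranges_overlap(0xb0000000, 0x20000, 0xa0000000, 0x20000000) is true, which is why pfn_valid() succeeds and the ioremap is refused.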
I suspect this broke around the time we stopped forcing dom0 memory to be
allocated as low as possible which happened to prevent the default dom0_gnttab
region overlapping RAM.
This patch specifies an explicit dom0_gnttab base which is explicitly unused
according to the FVP model docs (although it corresponds to CS5 this isn't
wired up to anything).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Markus Hauschild [Wed, 29 Oct 2014 11:24:32 +0000 (12:24 +0100)]
xentop: Dynamically expand some columns
Allow certain xentop columns to automatically expand as the amount
of data reported gets larger. The columns allowed to auto expand are:
NETTX(k), NETRX(k), VBD_RD, VBD_WR, VBD_RSECT, VBD_WSECT
If the -f option is used to allow full length VM names, those names will
also be aligned based on the longest name in the NAME column.
The default minimum width of all columns remains unchanged.
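One way to express the expansion rule, as a sketch rather than the xentop implementation (column_width is a hypothetical helper): the width is the larger of the default minimum and the number of characters the value needs.

```c
#include <stdio.h>

/* Width needed to print 'value', but never narrower than 'min_width'. */
static int column_width(int min_width, unsigned long long value)
{
    /* snprintf with a zero-sized buffer only reports the required length. */
    int needed = snprintf(NULL, 0, "%llu", value);

    return needed > min_width ? needed : min_width;
}
```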
Signed-off-by: Markus Hauschild <Markus.Hauschild@rz.uni-regensburg.de>
Signed-off-by: Charles Arnold <carnold@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 4 Nov 2014 12:15:58 +0000 (13:15 +0100)]
... as being more like a hypervisor extension into the guest than a
part of the tool stack.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Razvan Cojocaru [Tue, 4 Nov 2014 12:13:55 +0000 (13:13 +0100)]
x86: disable emulate.c REP optimization if introspection is active
Emulation for REP instructions is optimized to perform a single
write for all repeats in the current page if possible. However,
this interferes with a memory introspection application's ability
to detect suspect behaviour, since it will cause only one
mem_event to be sent per page touched.
This patch disables the optimization, gated on introspection
being active for the domain.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Roy Franz [Tue, 4 Nov 2014 12:13:26 +0000 (13:13 +0100)]
EFI: ignore EFI commandline, skip console setup when booted from GRUB
Update EFI code to completely ignore the EFI commandline when booted from GRUB.
Previously it was parsed for EFI boot specific options, but these aren't used
when booted from GRUB.
Don't do EFI console or video configuration when booted by GRUB. The EFI boot
code does some console and video initialization to support native EFI boot from
the EFI boot manager or EFI shell. This initialization should not be done when
booted using GRUB.
Update EFI documentation to indicate that it describes EFI native boot, and
does not apply at all when Xen is booted using GRUB.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Willy Tarreau [Tue, 4 Nov 2014 12:09:09 +0000 (13:09 +0100)]
lzo: check for length overrun in variable length encoding
This fix ensures that we never meet an integer overflow while adding
255 while parsing a variable length encoding. It works differently from
commit 504f70b6 ("lzo: properly check for overruns") because instead of
ensuring that we don't overrun the input, which is tricky to guarantee
due to many assumptions in the code, it simply checks that the cumulated
number of 255 read cannot overflow by bounding this number.
The MAX_255_COUNT is the maximum number of times we can add 255 to a base
count without overflowing an integer. The multiply will overflow when
multiplying 255 by more than MAXINT/255. The sum will overflow earlier
depending on the base count. Since the base count is taken from a u8
and a few bits, it is safe to assume that it will always be lower than
or equal to 2*255, thus we can always prevent any overflow by accepting
two less 255 steps.
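The bound can be sketched as below. This is an illustration under stated assumptions, not the patched decompressor: it uses size_t as the integer type, read_run_length and its parameters are hypothetical, and the caller is assumed to pass a small constant 'base' such that base plus the final byte stays at or below 2*255.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest number of 255 steps that still leaves room for 2*255 more. */
#define MAX_255_COUNT ((SIZE_MAX / 255) - 2)

/*
 * Decode a variable-length count: each zero byte adds 255, the first
 * non-zero byte terminates the run and is added as well.
 * Returns 0 on success, -1 on input overrun or when the count of
 * zero bytes could overflow.
 */
static int read_run_length(const uint8_t *ip, const uint8_t *ip_end,
                           size_t base, size_t *len_out,
                           const uint8_t **next)
{
    const uint8_t *start = ip;
    size_t zeros;

    while (ip < ip_end && *ip == 0)
        ip++;
    if (ip == ip_end)
        return -1;              /* ran out of input */

    zeros = (size_t)(ip - start);
    if (zeros > MAX_255_COUNT)
        return -1;              /* zeros * 255 + base could overflow */

    *len_out = base + zeros * 255 + *ip++;
    *next = ip;
    return 0;
}
```

Bounding the number of 255 steps once, after the loop, keeps the per-byte loop free of extra checks beyond the input bound.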
This patch also reduces the CPU overhead and actually increases performance
by 1.1% compared to the initial code, while the previous fix costs 3.1%
(measured on x86_64).
The fix needs to be backported to all currently supported stable kernels.
Reported-by: Willem Pinckaers <willem@lekkertech.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
[original Linux commit: 72cf9012]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Willy Tarreau [Tue, 4 Nov 2014 12:08:32 +0000 (13:08 +0100)]
Revert "lzo: properly check for overruns"
This reverts commit 504f70b6 ("lzo: properly check for overruns").
As analysed by Willem Pinckaers, this fix is still incomplete on
certain rare corner cases, and it is easier to restart from the
original code.
Reported-by: Willem Pinckaers <willem@lekkertech.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
[original Linux commit: af958a38]
Signed-off-by: Jan Beulich <jbeulich@suse.com>