xen.git
13 years ago xl/libxl: add iomem support
Matthew Fioravante [Fri, 5 Oct 2012 14:12:04 +0000 (15:12 +0100)]
xl/libxl: add iomem support

This patch adds a new option to Xen config files for
directly mapping hardware I/O memory into a VM.

Signed-off-by: Matthew Fioravante <matthew.fioravante@jhuapl.edu>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago libxl_json: Use libxl alloc function.
Anthony PERARD [Fri, 5 Oct 2012 13:34:30 +0000 (14:34 +0100)]
libxl_json: Use libxl alloc function.

This patch makes use of the libxl allocation API and the GC and removes the
check for allocation failure.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago libxl: Have flexarray using the GC
Anthony PERARD [Fri, 5 Oct 2012 13:34:30 +0000 (14:34 +0100)]
libxl: Have flexarray using the GC

This patch makes the flexarray function libxl__gc aware.

It also updates every function that uses a flexarray to pass the gc and removes
every memory allocation check and free.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago libxl: Move gc_is_real to libxl_internal.h.
Anthony PERARD [Fri, 5 Oct 2012 13:34:29 +0000 (14:34 +0100)]
libxl: Move gc_is_real to libxl_internal.h.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago libxl/qemu-xen: use cache=writeback for IDE and SCSI
Stefano Stabellini [Fri, 5 Oct 2012 13:34:28 +0000 (14:34 +0100)]
libxl/qemu-xen: use cache=writeback for IDE and SCSI

Change caching mode from writethrough to writeback for upstream QEMU.

After a lengthy discussion, we came up with the conclusion that
WRITEBACK is OK for IDE.
See: http://marc.info/?l=xen-devel&m=133311527009773

Given that the same reasons apply to SCSI as well, change to writeback
for SCSI too.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago libxl: make devid a type so it is initialized properly
Matthew Fioravante [Fri, 5 Oct 2012 13:34:27 +0000 (14:34 +0100)]
libxl: make devid a type so it is initialized properly

Previously device ids in libxl were treated as integers meaning they
were being initialized to 0, which is a valid device id. This patch
makes devid its own type in libxl and initializes it to -1, an invalid
value.

This fixes a bug where running xl DEV-attach multiple times would
repeatedly try to reattach device 0 instead of generating a new
device id.
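
The failure mode described above can be sketched as follows. This is a
minimal illustration, not libxl code: devid_t, DEVID_UNSET, struct device,
device_init(), next_free_devid() and attach() are all hypothetical names
chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the actual libxl types. */
typedef int devid_t;
#define DEVID_UNSET (-1)   /* invalid value, so "unset" differs from device 0 */

struct device {
    devid_t devid;
};

/* Initialize to the invalid value instead of relying on zero-fill. */
static void device_init(struct device *dev)
{
    dev->devid = DEVID_UNSET;
}

/* Pick the lowest id that does not appear in 'used'. */
static devid_t next_free_devid(const devid_t *used, size_t n)
{
    devid_t candidate = 0;

    for (;;) {
        size_t i;

        for (i = 0; i < n; i++)
            if (used[i] == candidate)
                break;
        if (i == n)
            return candidate;
        candidate++;
    }
}

/* Only generate a new id when the caller did not choose one. */
static devid_t attach(struct device *dev, const devid_t *used, size_t n)
{
    if (dev->devid == DEVID_UNSET)
        dev->devid = next_free_devid(used, n);
    return dev->devid;
}
```

With device 0 already attached, a device prepared with device_init() gets
id 1, whereas a zero-filled struct device is indistinguishable from an
explicit request for id 0 and would be attached as device 0 again.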

Signed-off-by: Matthew Fioravante <matthew.fioravante@jhuapl.edu>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago x86/MCE: implement recoverscan for AMD
Christoph Egger [Fri, 5 Oct 2012 12:32:02 +0000 (14:32 +0200)]
x86/MCE: implement recoverscan for AMD

Implement recoverable_scan() for AMD.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86: add sanity check and comments for vMCE injection
Liu, Jinsong [Fri, 5 Oct 2012 12:30:21 +0000 (14:30 +0200)]
x86: add sanity check and comments for vMCE injection

Add a sanity check for the input vcpu so that a malicious value does
not return 0. Add comments, since vcpu=-1 (broadcast) is somewhat
implicit to the code reader.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Suggested-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago fix inclusion style in public/domctl.h
Jan Beulich [Thu, 4 Oct 2012 15:11:25 +0000 (17:11 +0200)]
fix inclusion style in public/domctl.h

Public headers should include one another only via self-relative
include directives (violated by 25955:07d0d5b3a005).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86/nested-svm: Update the paging mode on VMRUN and VMEXIT emulation.
Tim Deegan [Thu, 4 Oct 2012 13:20:50 +0000 (14:20 +0100)]
x86/nested-svm: Update the paging mode on VMRUN and VMEXIT emulation.

This allows Xen to walk the l1 hypervisor's shadow pagetable
correctly.  Not needed for hap-on-hap guests because they are handled
at lookup time.  Problem found with 64bit Win7 and 32bit XPMode where Win7
switches back and forth between long mode and PAE legacy pagetables.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
[Adjusted to update in all cases where the l1 vmm uses shadows]
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago VT-d: make remap_entry_to_msi_msg() return consistent message
Jan Beulich [Thu, 4 Oct 2012 07:28:25 +0000 (09:28 +0200)]
VT-d: make remap_entry_to_msi_msg() return consistent message

During debugging of another problem I found that in x2APIC mode, the
destination field of the low address value wasn't passed back
correctly. While this is benign in most cases (as the value isn't being
used anywhere), it can be confusing (and misleading) when printing the
value read or when comparing it to the one previously passed into the
inverse function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: consolidate frame state manipulation functions
Jan Beulich [Thu, 4 Oct 2012 07:05:24 +0000 (09:05 +0200)]
x86: consolidate frame state manipulation functions

Rather than doing this in multiple places, have a single central
function (decode_register()) to be used by all other code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: get the MWAIT idle driver in sync with the ACPI one
Jan Beulich [Thu, 4 Oct 2012 07:03:06 +0000 (09:03 +0200)]
x86: get the MWAIT idle driver in sync with the ACPI one

.. with respect to behavior when there is no HPET broadcast support
(for using the PIT broadcast instead, it requires explicitly enabling
CPU idle management).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago MAINTAINERS: Move and fix up VTPM entry
Keir Fraser [Wed, 3 Oct 2012 11:59:30 +0000 (12:59 +0100)]
MAINTAINERS: Move and fix up VTPM entry

Signed-off-by: Keir Fraser <keir@xen.org>
13 years ago MAINTAINERS: Matthew Fioravante now maintains VTPM
Matthew Fioravante [Wed, 3 Oct 2012 10:13:54 +0000 (11:13 +0100)]
MAINTAINERS: Matthew Fioravante now maintains VTPM

Signed-off-by: Matthew Fioravante <matthew.fioravante@jhuapl.edu>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago trace: trace hypercalls inside a multicall
David Vrabel [Wed, 3 Oct 2012 10:11:35 +0000 (11:11 +0100)]
trace: trace hypercalls inside a multicall

Add a trace record for every hypercall inside a multicall.  These use
a new event ID (with a different sub-class) so they may be filtered
out if only the calls into the hypervisor are of interest.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago trace: improve usefulness of hypercall trace record
David Vrabel [Wed, 3 Oct 2012 10:11:06 +0000 (11:11 +0100)]
trace: improve usefulness of hypercall trace record

Trace hypercalls using a more useful trace record format.

The EIP field is removed (it was always somewhere in the hypercall
page) and selected hypercall arguments are included (e.g., the number
of calls in a multicall, and the number of PTE updates in an
mmu_update etc.).  12 bits in the first extra word are used to indicate
which arguments are present in the record and what size they are (32
or 64-bit).
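
One way to spend 12 bits on "present and what size" is two bits per
argument for up to six arguments. The sketch below assumes exactly that
and places the field in the low bits of the word; the names (arg_kind,
set_arg, get_arg, arg_words) and the bit positions are hypothetical and
may not match the layout in Xen's trace headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: 2 bits per argument, 6 arguments, 12 bits total. */
enum arg_kind { ARG_ABSENT = 0, ARG_32 = 1, ARG_64 = 2 };

#define MAX_ARGS     6u
#define ARG_SHIFT(i) (2u * (i))

/* Record that argument i is present with the given size. */
static uint32_t set_arg(uint32_t flags, unsigned int i, enum arg_kind k)
{
    flags &= ~(3u << ARG_SHIFT(i));
    return flags | ((uint32_t)k << ARG_SHIFT(i));
}

/* Look up whether argument i is present and how wide it is. */
static enum arg_kind get_arg(uint32_t flags, unsigned int i)
{
    return (enum arg_kind)((flags >> ARG_SHIFT(i)) & 3u);
}

/* Number of 32-bit words the recorded arguments occupy in the record. */
static unsigned int arg_words(uint32_t flags)
{
    unsigned int words = 0;

    for (unsigned int i = 0; i < MAX_ARGS; i++) {
        enum arg_kind k = get_arg(flags, i);

        if (k == ARG_32)
            words += 1;
        else if (k == ARG_64)
            words += 2;
    }
    return words;
}
```

A consumer can then walk the record without knowing the hypercall: it
reads the flags, and for each argument either skips it, reads one word,
or reads two.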

This is an incompatible record format so a new event ID is used so
tools can distinguish between the two formats.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago trace: allow for different sub-classes of TRC_PV_* tracepoints
David Vrabel [Wed, 3 Oct 2012 10:10:33 +0000 (11:10 +0100)]
trace: allow for different sub-classes of TRC_PV_* tracepoints

We want to add additional sub-classes for TRC_PV tracepoints and to be
able to only capture these new sub-classes.  This cannot currently be
done as the existing tracepoints all use a sub-class of 0xf.

So, redefine the PV events to use a new sub-class.  All the current
tracepoints are tracing entry points to the hypervisor so the
sub-class is named TRC_PV_ENTRY.

This change does not affect xenalyze as that only looks at the main
class and the event number and does not use the sub-class field.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago x86/Intel: add further support for Ivy Bridge CPU models
Jan Beulich [Tue, 2 Oct 2012 10:14:00 +0000 (12:14 +0200)]
x86/Intel: add further support for Ivy Bridge CPU models

And some initial Haswell ones at once.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Nakajima, Jun" <jun.nakajima@intel.com>
13 years ago Register Linux PV-on-HVM drivers product number.
Paul Durrant [Mon, 1 Oct 2012 19:06:31 +0000 (20:06 +0100)]
Register Linux PV-on-HVM drivers product number.

This is already in use despite never being registered.
See XEN_IOPORT_LINUX_PRODNUM in include/xen/platform_pci.h

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago Add a new pvdrivers header to serve as the register of product numbers.
Paul Durrant [Mon, 1 Oct 2012 19:05:33 +0000 (20:05 +0100)]
Add a new pvdrivers header to serve as the register of product numbers.

These product numbers are used by the QEMU blacklisting protocol in
traditional QEMU and are currently coded directly into the xenstore.c
source module. Since there are now multiple QEMUs this information
should be pulled into a public header to avoid duplication/conflict.
hvm-emulated-unplug.markdown has also been adjusted to reference the
new header.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago xen: Remove sched_credit_default_yield option
George Dunlap [Mon, 1 Oct 2012 19:03:19 +0000 (20:03 +0100)]
xen: Remove sched_credit_default_yield option

The sched_credit_default_yield option was added when the behavior of
"SCHEDOP_yield" was changed in 4.1, to allow any users who had
problems to revert to the old behavior.  The new behavior has been in
Xen.org xen since 4.1, and in XenServer even longer, and there is no
evidence of anyone having trouble with it.  Remove the option.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago xen/console: introduce a 'w' debug-key that dumps the console ring
Matt Wilson [Mon, 1 Oct 2012 19:02:45 +0000 (20:02 +0100)]
xen/console: introduce a 'w' debug-key that dumps the console ring

This patch adds a new 'w' debug-key, chosen from the limited remaining
keys only due to its proximity to 'q', that dumps the console ring to
configured console devices. It's useful for tracking down how an
unresponsive system got into a broken state via serial console.

Signed-off-by: Matt Wilson <msw@amazon.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago hvmloader: Add 64 bits big bar support
Xiantao Zhang [Mon, 1 Oct 2012 19:01:55 +0000 (20:01 +0100)]
hvmloader: Add 64 bits big bar support

Currently it is assumed that PCI device BARs are accessed below 4G. A
device whose BAR size is larger than 4G must be accessed at addresses
above 4G.  This patch enables 64-bit big BAR support in
hvmloader.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years ago docs: initial documentation for xenstore paths
Ian Campbell [Mon, 1 Oct 2012 16:54:11 +0000 (17:54 +0100)]
docs: initial documentation for xenstore paths

This is based upon my inspection of a system with a single PV domain
and a single HVM domain running and is therefore very incomplete.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago docs: Document scheduler-related Xen command-line options
George Dunlap [Mon, 1 Oct 2012 16:49:01 +0000 (17:49 +0100)]
docs: Document scheduler-related Xen command-line options

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago x86: replace literal numbers
Jan Beulich [Fri, 28 Sep 2012 08:59:41 +0000 (10:59 +0200)]
x86: replace literal numbers

In various cases, 256 was being used instead of NR_VECTORS or a derived
ARRAY_SIZE() expression. In one case (guest_has_trap_callback()), a
wrong (unrelated) constant was used instead of NR_VECTORS.
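
As a small illustration of the pattern, the loop below derives its bound
from the array itself rather than repeating the literal. NR_VECTORS is
assumed to be 256 here (matching the literal named above), and
vector_irq[] and reset_vector_table() are hypothetical names, not code
from the tree.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS    256   /* assumed value for this sketch */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-vector table, sized by the named constant. */
static int vector_irq[NR_VECTORS];

/*
 * Reset every slot. The bound follows the array declaration, so it
 * stays correct if the table is ever resized.
 */
static size_t reset_vector_table(void)
{
    size_t i;

    for (i = 0; i < ARRAY_SIZE(vector_irq); i++)
        vector_irq[i] = -1;
    return i;
}
```

Writing "i < 256" instead would compile just as well, but it silently
goes stale when the table size changes and gives a reader no hint as to
which quantity the 256 refers to.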

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago Revert 25960:6bf8b882df8f (x86: default-disable MWAIT-based idle driver ...)
Jan Beulich [Fri, 28 Sep 2012 07:36:32 +0000 (09:36 +0200)]
Revert 25960:6bf8b882df8f (x86: default-disable MWAIT-based idle driver ...)

The problem this was working around should be resolved with c/s
25961:6a5812129094 (x86/HPET: don't disable interrupt delivery right
after setting it up).

13 years ago x86/ucode: fix Intel case of resume handling on boot CPU
Jan Beulich [Fri, 28 Sep 2012 07:28:11 +0000 (09:28 +0200)]
x86/ucode: fix Intel case of resume handling on boot CPU

Checking the stored version doesn't tell us anything about the need to
apply the update (during resume, what is stored doesn't necessarily
match what is loaded).

Note that the check can be removed altogether because once switched to
use what was read from the CPU (uci->cpu_sig.rev, as used in the
subsequent pr_debug()), it would become redundant with the checks that
lead to microcode_update_match() returning the indication that an
update should be applied.

Note further that this was not an issue on APs since they start with
uci->mc.mc_intel being NULL.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Ben Guthro <ben@guthro.net>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: remove further code applicable to 32-bit CPUs only
Jan Beulich [Fri, 28 Sep 2012 07:26:46 +0000 (09:26 +0200)]
x86: remove further code applicable to 32-bit CPUs only

On the AMD side, anything prior to family 0xf can now be ignored, as
well as very low model numbers of family 6 on the Intel side.

Apart from that, there were several made up CPU features that turned
out entirely unused throughout the tree.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86/HPET: don't needlessly set up channels for broadcast
Jan Beulich [Fri, 28 Sep 2012 07:25:42 +0000 (09:25 +0200)]
x86/HPET: don't needlessly set up channels for broadcast

When there are more FSB delivery capable HPET channels than CPU cores
(or threads), we can simply use a dedicated channel per CPU. This
avoids wasting the resources to handle the excess channels (including
the pointless triggering of the respective interrupt on each
wraparound) as well as the ping-pong of the interrupts' affinities
(when getting assigned to different CPUs).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86/IRQ: fix valid-old-vector checks in __assign_irq_vector()
Jan Beulich [Fri, 28 Sep 2012 07:23:34 +0000 (09:23 +0200)]
x86/IRQ: fix valid-old-vector checks in __assign_irq_vector()

There are two greater-than-zero checks for the old vector retrieved,
which don't work when a negative value got stashed into the respective
arch_irq_desc field. The effect of this was that for interrupts that
are intended to get their affinity adjusted the first time before the
first interrupt occurs, the affinity change would fail, because the
original vector assignment would have caused the move_in_progress flag
to get set (which causes subsequent re-assignments to fail until it
gets cleared, which only happens from the ->ack() actor, i.e. when an
interrupt actually occurred).

This addresses a problem introduced in c/s 23816:7f357e1ef60a (by
changing IRQ_VECTOR_UNASSIGNED from 0 to -1).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86/HPET: don't disable interrupt delivery right after setting it up
Jan Beulich [Fri, 28 Sep 2012 07:22:14 +0000 (09:22 +0200)]
x86/HPET: don't disable interrupt delivery right after setting it up

We shouldn't clear HPET_TN_FSB right after we (indirectly, via
request_irq()) enabled it for the channels we intend to use for
broadcasts.

This fixes a regression introduced by c/s 25103:0b0e42dc4f0a.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: default-disable MWAIT-based idle driver for CPUs without ARAT
Jan Beulich [Wed, 26 Sep 2012 15:11:39 +0000 (17:11 +0200)]
x86: default-disable MWAIT-based idle driver for CPUs without ARAT

Without ARAT, and apparently only when using HPET broadcast mode as
replacement, CPUs occasionally fail to wake up, causing the system to
(transiently) hang. Until the reason is understood, disable the driver
on such systems.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: Expose TSC adjust to HVM guest
Liu, Jinsong [Wed, 26 Sep 2012 10:14:30 +0000 (12:14 +0200)]
x86: Expose TSC adjust to HVM guest

Intel's latest SDM (17.13.3) describes a new MSR: CPUID.7.0.EBX[1]=1
indicates that the TSC_ADJUST MSR (0x3b) is supported.

This patch exposes it to HVM guests.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86: Save/restore TSC adjust during HVM guest migration
Liu, Jinsong [Wed, 26 Sep 2012 10:13:38 +0000 (12:13 +0200)]
x86: Save/restore TSC adjust during HVM guest migration

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86: Implement TSC adjust feature for HVM guest
Liu, Jinsong [Wed, 26 Sep 2012 10:12:42 +0000 (12:12 +0200)]
x86: Implement TSC adjust feature for HVM guest

IA32_TSC_ADJUST MSR is maintained separately for each logical
processor. A logical processor maintains and uses the IA32_TSC_ADJUST
MSR as follows:
1). On RESET, the value of the IA32_TSC_ADJUST MSR is 0;
2). If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds
    (or subtracts) value X from the TSC, the logical processor also
    adds (or subtracts) value X from the IA32_TSC_ADJUST MSR;
3). If an execution of WRMSR to the IA32_TSC_ADJUST MSR adds (or
    subtracts) value X from that MSR, the logical processor also adds
    (or subtracts) value X from the TSC.

This patch provides TSC adjust support for HVM guests, so that a guest
OS which synchronizes the TSC behaves as expected.
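
The three rules above can be modelled in a few lines. This is a toy
model of one logical processor, not the hypervisor implementation;
struct lcpu and the function names are hypothetical, and the model also
assumes the TSC itself restarts from 0 on RESET.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one logical processor's TSC state. */
struct lcpu {
    uint64_t tsc;         /* IA32_TIME_STAMP_COUNTER */
    uint64_t tsc_adjust;  /* IA32_TSC_ADJUST */
};

/* Rule 1: on RESET, IA32_TSC_ADJUST is 0. */
static void lcpu_reset(struct lcpu *c)
{
    c->tsc = 0;
    c->tsc_adjust = 0;
}

/* Time passing advances the TSC only. */
static void lcpu_tick(struct lcpu *c, uint64_t cycles)
{
    c->tsc += cycles;
}

/* Rule 2: a write to the TSC applies the same delta to TSC_ADJUST. */
static void wrmsr_tsc(struct lcpu *c, uint64_t val)
{
    uint64_t delta = val - c->tsc;  /* wraps, so it covers subtraction too */

    c->tsc = val;
    c->tsc_adjust += delta;
}

/* Rule 3: a write to TSC_ADJUST applies the same delta to the TSC. */
static void wrmsr_tsc_adjust(struct lcpu *c, uint64_t val)
{
    uint64_t delta = val - c->tsc_adjust;

    c->tsc_adjust = val;
    c->tsc += delta;
}
```

In this model tsc - tsc_adjust always equals the cycles elapsed since
RESET, so writing 0 to TSC_ADJUST undoes all earlier software writes to
the TSC.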

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86/vMCE: Add AMD support
Christoph Egger [Wed, 26 Sep 2012 10:07:42 +0000 (12:07 +0200)]
x86/vMCE: Add AMD support

Add vMCE support for AMD. Add vmce namespace to Intel specific vMCE MSR
functions. Move vMCE prototypes from mce.h to vmce.h.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
- fix inverted return values from vmce_amd_{rd,wr}msr()
- remove bogus printk()-s from those functions

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86: vMCE save and restore
Liu, Jinsong [Wed, 26 Sep 2012 10:05:55 +0000 (12:05 +0200)]
x86: vMCE save and restore

This patch provides vMCE save/restore during migration.
1. MCG_CAP is well-defined. However, to allow for future cap extensions,
   we keep the save/restore logic that Jan implemented in c/s 24887;
2. MCi_CTL2 is initialized by the guest OS when booting, so it needs to
   be saved/restored, otherwise the guest would be surprised;
3. Other MSRs do not need save/restore since they are either
   error-related and pointless to save/restore, or uniform across all
   vMCE platforms;

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
- fix handling of partial data in XEN_DOMCTL_set_ext_vcpucontext
- fix adjustment of xen_domctl_ext_vcpucontext

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86: vMCE injection
Liu, Jinsong [Wed, 26 Sep 2012 10:05:10 +0000 (12:05 +0200)]
x86: vMCE injection

In our tests of MCE with a Win8 guest, we found a bug: no matter which
SRAO/SRAR error Xen injects into the Win8 guest, it always reboots.

The root cause is that the current Xen vMCE logic injects vMCE# only to
vcpu0, which is not correct for Intel MCE (on Intel architecture,
hardware generates MCE# to all CPUs).

This patch fixes the vMCE injection bug by injecting vMCE# to all vcpus
on Intel platforms.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
- increase flexibility by making the new second argument of
  inject_vmce() a VCPU ID rather than just a boolean

Acked-by: Christoph Egger <Christoph.Egger@amd.com> (on just this change)
- fix condition evaluation order in inject_vmce()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86: vMCE emulation
Liu, Jinsong [Wed, 26 Sep 2012 10:04:00 +0000 (12:04 +0200)]
x86: vMCE emulation

This patch provides virtual MCE support to guests. It emulates a simple
and clean MCE MSR interface for the guest, faking caps where needed
and masking caps where unnecessary:
1. Providing a well-defined MCG_CAP to the guest, filtering out
   unnecessary caps and providing only the caps the guest needs;
2. Disabling MCG_CTL to avoid model-specific behavior;
3. Sticking MCi_CTL at all 1's for the guest to avoid model-specific
   behavior;
4. Enabling the CMCI cap but never actually injecting CMCI into the
   guest, to prevent periodic polling;
5. Masking the MSCOD field of MCi_STATUS to avoid model-specific
   behavior;
6. Keeping natural semantics by using per-vcpu instead of per-domain
   variables;
7. Using bank1 and reserving bank0 to work around the 'bank0 quirk' of
   some very old processors;
8. Cleaning up some vMCE# injection logic which was shared by Intel and
   AMD but is useless under the new vMCE implementation;
9. Keeping compatibility with the old Xen version which has been
   backported to SLES11 SP2, so that old vMCE is not blocked when
   migrating to the new vMCE;

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
- make printing consistent (and non-exploitable)
- fix return values of intel_mce_{rd,wr}msr() for out of range banks
- miscellaneous cleanup

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86: check remote MMIO remap permissions
Daniel De Graaf [Wed, 26 Sep 2012 09:56:07 +0000 (11:56 +0200)]
x86: check remote MMIO remap permissions

When a domain is mapping pages from a different pg_owner domain, the
iomem_access checks are currently only applied to the pg_owner domain,
potentially allowing a domain with a more restrictive iomem_access
policy to have the pages mapped into its page tables. To catch this,
also check the owner of the page tables. The current domain does not
need to be checked because the ability to manipulate a domain's page
tables implies full access to the target domain, so checking that
domain's permission is sufficient.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86: slightly improve stack trace on debug builds
Jan Beulich [Wed, 26 Sep 2012 09:53:38 +0000 (11:53 +0200)]
x86: slightly improve stack trace on debug builds

As was rather obvious from crashes recently happening in stage testing,
the debug hypervisor, in that special case, has a drawback compared to
the non-debug one: When a call through a bad pointer happens, there's
no frame, and the top level (and frequently most important for
analysis) stack entry would get skipped:

(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<0000000000000000>] ???
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000008   rbx: 0000000000000001   rcx: 0000000000000003
(XEN) rdx: 0000003db54eb700   rsi: 7fffffffffffffff   rdi: 0000000000000001
(XEN) rbp: ffff8302357e7ee0   rsp: ffff8302357e7e58   r8:  0000000000000000
(XEN) r9:  000000000000003e   r10: ffff8302357e7f18   r11: ffff8302357e7f18
(XEN) r12: ffff8302357ee340   r13: ffff82c480263980   r14: ffff8302357ee3d0
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000000bf473000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8302357e7e58:
(XEN)    ffff82c4801a3d05 ffff8302357eca70 0000000800000020 ffff82c4802ead60
(XEN)    0000000000000001 ffff8302357e7ea0 ffff82c48016bf07 0000000000000000
(XEN)    0000000000000000 ffff8302357e7ee0 fffff830fffff830 0000000000000046
(XEN)    ffff8302357e7f18 ffff82c480263980 ffff8302357e7f18 0000000000000000
(XEN)    0000000000000000 ffff8302357e7f10 ffff82c48015c2be 8302357dc0000fff
...
(XEN) Xen call trace:
(XEN)    [<0000000000000000>] ???
(XEN)    [<ffff82c48015c2be>] idle_loop+0x6c/0x7a
(XEN)
(XEN) Pagetable walk from 0000000000000000:

Since the bad pointer is being printed anyway (as part of the register
state), replace it with the top of stack value in such a case.

With the introduction of is_active_kernel_text(), use it also at the
(few) other suitable places (I intentionally didn't replace the use in
xen/arch/arm/mm.c - while it would be functionally correct, the
dependency on system_state wouldn't be from an abstract perspective).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: clean up interrupt stub generation
Jan Beulich [Wed, 26 Sep 2012 09:52:03 +0000 (11:52 +0200)]
x86: clean up interrupt stub generation

Apart from moving some code that is only used here from the header file
to the actual source one, this also
- moves interrupt[] into .init.data,
- prevents generating (unused) stubs for vectors below
  FIRST_DYNAMIC_VECTOR, and
- shortens and sanitizes the names of the stubs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: slightly streamline __prepare_to_wait() inline assembly
Jan Beulich [Wed, 26 Sep 2012 09:51:27 +0000 (11:51 +0200)]
x86: slightly streamline __prepare_to_wait() inline assembly

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: use compiler visible "add" instead of inline assembly "or" in get_cpu_info()
Jan Beulich [Wed, 26 Sep 2012 09:49:56 +0000 (11:49 +0200)]
x86: use compiler visible "add" instead of inline assembly "or" in get_cpu_info()

This follows the same idea as the previous patch, just that the effect
is much more visible here: With a half-way [dr]ecent gcc this reduced
.text size by over 12k for me.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago x86: enhance rsp-relative calculations
Jan Beulich [Wed, 26 Sep 2012 09:48:21 +0000 (11:48 +0200)]
x86: enhance rsp-relative calculations

The use of "or" in GET_CPUINFO_FIELD so far wasn't ideal, as it doesn't
lend itself to folding this operation with a possibly subsequent one
(e.g. the well known mov+add=lea conversion). Split out the sub-
operations, and shorten assembly code slightly with this.
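
The arithmetic behind splitting an "or" into sub-operations can be
sketched in C. For a power-of-two stack size, or-ing in the low bits
gives the same address as masking them off and then adding them back,
and the "add" form is one a compiler or assembler can merge with a
following constant offset. The value of STACK_SIZE and the function
names below are assumptions made for the illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE 0x8000u   /* hypothetical power-of-two stack size */

/* Single "or": set all low bits to reach the last byte of the stack. */
static uintptr_t stack_end_via_or(uintptr_t sp)
{
    return sp | (STACK_SIZE - 1);
}

/*
 * Split form: "and" to the stack base, then "add" the offset. The add
 * can be folded with a later displacement (the mov+add=lea idea).
 */
static uintptr_t stack_end_via_and_add(uintptr_t sp)
{
    return (sp & ~(uintptr_t)(STACK_SIZE - 1)) + (STACK_SIZE - 1);
}
```

Both functions return the same address for any sp; only the second
exposes an addition that can absorb a further constant, such as the
offset of a field within a structure at the top of the stack.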

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years ago docs: network network diagrams for the wiki (figs)
Ian Jackson [Tue, 25 Sep 2012 17:45:04 +0000 (18:45 +0100)]
docs: network network diagrams for the wiki (figs)

Add the figs in hg as well as git.  Sorry (again)!

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago docs: network network diagrams for the wiki (Makefile)
Ian Jackson [Tue, 25 Sep 2012 17:39:39 +0000 (18:39 +0100)]
docs: network network diagrams for the wiki (Makefile)

Add the Makefile in hg as well as git.  Sorry.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago docs: network diagrams for the wiki
Ian Jackson [Tue, 25 Sep 2012 17:31:20 +0000 (18:31 +0100)]
docs: network diagrams for the wiki

We provide two new diagrams
  docs/figs/network-{bridge,basic}.fig
which are converted to pngs by the Makefiles and intended for
consumption by http://wiki.xen.org/wiki/Xen_Networking.

This is perhaps not the ideal location for this source code but we
don't have a better one.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago tools: bump SONAMEs for changes during 4.2 development cycle.
Ian Campbell [Tue, 25 Sep 2012 12:40:00 +0000 (13:40 +0100)]
tools: bump SONAMEs for changes during 4.2 development cycle.

We mostly did this as we went along, only a couple of minor number
bumps were missed http://marc.info/?l=xen-devel&m=134366054929255&w=2:
 - Bumped libxl from 1.0.0 -> 1.0.1
 - Bumped libxenstore from 3.0.1 -> 3.0.2

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago xl: resume the domain on suspend failure
Bastian Blank [Tue, 25 Sep 2012 10:03:51 +0000 (11:03 +0100)]
xl: resume the domain on suspend failure

The MUST macro calls exit(3) on failure but we need to cleanup and
resume.

Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago pygrub: always append --args
Olaf Hering [Tue, 25 Sep 2012 10:03:51 +0000 (11:03 +0100)]
pygrub: always append --args

If a bootloader entry in menu.lst has no additional kernel command line
options listed and the domU.cfg has 'bootargs="--args=something"' the
additional arguments from the config file are not passed to the kernel.
The reason for that incorrect behaviour is that run_grub appends arg
only if the parsed config file has arguments listed.

Fix this by appending args from image section and the config file separately.
To avoid adding to a NoneType initialize grubcfg['args'] to an empty string.
This does not change behaviour but simplifies the code which appends the
string.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago x86/S3: add cache flush on secondary CPUs before going to sleep
Ben Guthro [Tue, 25 Sep 2012 06:38:14 +0000 (08:38 +0200)]
x86/S3: add cache flush on secondary CPUs before going to sleep

Secondary CPUs, between doing their final memory writes (particularly
updating cpu_initialized) and getting a subsequent INIT, may not write
back all modified data. The INIT itself then causes those modifications
to be lost, so in the cpu_initialized case the CPU would find itself
already initialized, (intentionally) entering an infinite loop instead
of actually coming online.

Signed-off-by: Ben Guthro <ben@guthro.net>
Make acpi_dead_idle() call default_dead_idle() rather than duplicating
the logic there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years agox86: fix MWAIT-based idle driver for CPUs without ARAT
Jan Beulich [Tue, 25 Sep 2012 06:36:33 +0000 (08:36 +0200)]
x86: fix MWAIT-based idle driver for CPUs without ARAT

lapic_timer_{on,off} need to get initialized in this case. This in turn
requires getting HPET broadcast setup to be carried out earlier (and
hence preventing double initialization there).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
13 years agox86: enable VIA CPU support
Jan Beulich [Fri, 21 Sep 2012 15:02:46 +0000 (17:02 +0200)]
x86: enable VIA CPU support

Newer VIA CPUs have both 64-bit and VMX support. Enable them to be
recognized for these purposes, at once stripping off any 32-bit CPU
only bits from the respective CPU support file, and adding 64-bit ones
found in recent Linux.

This particularly implies untying the VMX == Intel assumption in a few
places.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agox86: eliminate code affecting only 64-bit-incapable CPUs
Jan Beulich [Fri, 21 Sep 2012 13:20:21 +0000 (15:20 +0200)]
x86: eliminate code affecting only 64-bit-incapable CPUs

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agoprintk: prefer %#x et al over 0x%x
Jan Beulich [Fri, 21 Sep 2012 12:25:12 +0000 (14:25 +0200)]
printk: prefer %#x et al over 0x%x

Performance is not an issue with printk(), so let the function do
minimally more work and instead save a byte per affected format
specifier.
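
As a rough illustration of the trade-off, the sketch below compares the two
spellings using the standard C library's snprintf(), not Xen's own printk()
formatter. The helper name hex_styles_agree() is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if "0x%x" and "%#x" render v identically
 * under the standard C library, 0 otherwise.
 */
int hex_styles_agree(unsigned int v)
{
    char old_style[16], new_style[16];

    /* Prefix written out literally in the format string. */
    snprintf(old_style, sizeof(old_style), "0x%x", v);
    /* Prefix requested through the '#' flag. */
    snprintf(new_style, sizeof(new_style), "%#x", v);

    return strcmp(old_style, new_style) == 0;
}
```

The format string "%#x" is one byte shorter than "0x%x", which is the saving
the message refers to. In ISO C the '#' flag only adds the prefix to nonzero
values, so the two forms differ for zero; whether Xen's formatter behaves the
same way is not stated here.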

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agox86: introduce MWAIT-based, ACPI-less CPU idle driver
Jan Beulich [Fri, 21 Sep 2012 11:47:18 +0000 (13:47 +0200)]
x86: introduce MWAIT-based, ACPI-less CPU idle driver

This is a port of Linux's intel-idle driver serving the same purpose.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agocpuidle: remove unused latency_ticks member
Jan Beulich [Fri, 21 Sep 2012 11:45:08 +0000 (13:45 +0200)]
cpuidle: remove unused latency_ticks member

... and code used only for initializing it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agointroduce guest_handle_for_field()
Jan Beulich [Thu, 20 Sep 2012 11:31:19 +0000 (13:31 +0200)]
introduce guest_handle_for_field()

This helper turns a field of a GUEST_HANDLE into a GUEST_HANDLE.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
13 years agoACPI: move tables.c fully into .init.*
Jan Beulich [Thu, 20 Sep 2012 07:22:55 +0000 (09:22 +0200)]
ACPI: move tables.c fully into .init.*

The only non-init item was the space reserved for the initial tables,
but we can as well dynamically allocate that array.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agox86: tighten checks in XEN_DOMCTL_memory_mapping handler
Jan Beulich [Thu, 20 Sep 2012 07:21:53 +0000 (09:21 +0200)]
x86: tighten checks in XEN_DOMCTL_memory_mapping handler

Properly checking the MFN implies knowing the physical address width
supported by the platform, so to obtain this consistently the
respective code gets moved out of the MTRR subdir.

Btw., the model specific workaround in that code is likely unnecessary
- I believe those CPU models don't support 64-bit mode. But I wasn't
able to formally verify this, so I preferred to retain that code for
now.

But the domctl code here was also lacking other error checks (as was,
looking at it again from that angle, the XEN_DOMCTL_ioport_mapping one).
Besides adding the missing checks, printing is also added for the case
where revoking access permissions didn't work (as that may have
implications for the host operator, e.g. wanting to not pass through
affected devices to another guest until the one previously using them
did actually die).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agox86/IO-APIC: streamline level ack/end handling
Jan Beulich [Thu, 20 Sep 2012 07:20:30 +0000 (09:20 +0200)]
x86/IO-APIC: streamline level ack/end handling

Rather than evaluating "ioapic_ack_new" on each invocation, and
considering that the two methods really have almost no code in common,
split the handlers.

While at it, also move ioapic_ack_{new,forced} into .init.data
(eliminating the single non-__init reference to the former).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agotmem: bump pool version to 1 to fix restore issue when tmem enabled
Zhenzhong Duan [Wed, 19 Sep 2012 15:38:47 +0000 (17:38 +0200)]
tmem: bump pool version to 1 to fix restore issue when tmem enabled

Restore fails when tmem is enabled both in hypervisor and guest. This
is due to spec version mismatch when restoring a pool.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
13 years agox86: remove open-coded IO-APIC RTE reads/writes
Jan Beulich [Wed, 19 Sep 2012 07:30:50 +0000 (09:30 +0200)]
x86: remove open-coded IO-APIC RTE reads/writes

This improves readability, not least by doing away with a couple of
ugly casts.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agox86: properly check XEN_DOMCTL_ioport_mapping arguments for invalid range
Jan Beulich [Wed, 19 Sep 2012 07:27:55 +0000 (09:27 +0200)]
x86: properly check XEN_DOMCTL_ioport_mapping arguments for invalid range

In particular, the case of "np" being a very large value wasn't handled
correctly. The range start checks also were off by one (except that in
practice, when "np" is properly range checked, this would still have
been caught by the range end checks).
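
A minimal sketch of an overflow-safe range check of this kind, assuming a
64K x86 I/O port space. The function name, the MAX_IOPORTS constant and the
decision to reject np == 0 are assumptions for illustration, not the actual
Xen code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_IOPORTS 0x10000u   /* x86 I/O port space: ports 0..0xffff */

/*
 * Hypothetical check that [first, first + np) lies inside the port space.
 * A naive "first + np > MAX_IOPORTS" test can wrap around when np is very
 * large, so np is bounded first and the start is compared against what is
 * left.
 */
bool ioport_range_ok(uint32_t first, uint32_t np)
{
    if (np == 0 || np > MAX_IOPORTS)
        return false;
    return first <= MAX_IOPORTS - np;
}
```

With 32-bit arithmetic, first = 0x10 and np = 0xfffffff8 sum to 8, which a
naive end check would accept; bounding np first avoids that.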

Also, is a GFN wrap in XEN_DOMCTL_memory_mapping really okay?

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agox86/ACPI: fix error indication from acpi_parse_madt_lapic_entries()
Jan Beulich [Wed, 19 Sep 2012 07:26:26 +0000 (09:26 +0200)]
x86/ACPI: fix error indication from acpi_parse_madt_lapic_entries()

If the legacy APIC invocation of acpi_table_parse_madt() succeeds but
the x2APIC counterpart fails, this is regarded as failure by the
function, yet its return value would indicate success.
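
The intended behaviour can be sketched as follows. The helper
combine_parse_results() is hypothetical and only models how two parse results
(a non-negative count or a negative error code each) should be folded into
one return value.

```c
/*
 * Hypothetical helper: fold the results of the legacy APIC pass (count) and
 * the x2APIC pass (x2count) into a single status. A failure in either pass
 * must surface as a negative return value.
 */
int combine_parse_results(int count, int x2count)
{
    if (count < 0)
        return count;      /* legacy pass failed */
    if (x2count < 0)
        return x2count;    /* x2APIC pass failed although legacy succeeded */
    return 0;
}
```

Returning the first result unconditionally on the failure path is the kind
of mistake described above: it reports success when only the second pass
failed.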

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agoxsm/flask: add domain relabel support
Daniel De Graaf [Mon, 17 Sep 2012 20:12:21 +0000 (21:12 +0100)]
xsm/flask: add domain relabel support

This adds the ability to change a domain's XSM label after creation.
The new label will be used for all future access checks; however,
existing event channels and memory mappings will remain valid even if
their creation would be denied by the new label.

With appropriate security policy and hooks in the domain builder, this
can be used to create domains that the domain builder does not have
access to after building. It can also be used to allow a domain to
drop privileges - for example, prior to launching a user-supplied
kernel loaded by a pv-grub stubdom.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Committed-by: Keir Fraser <keir@xen.org>
13 years agoxsm/flask: remove unneeded create_sid field
Daniel De Graaf [Mon, 17 Sep 2012 20:10:39 +0000 (21:10 +0100)]
xsm/flask: remove unneeded create_sid field

This field was only used to populate the ssid of dom0, which can be
handled explicitly in the domain creation hook. This also removes the
unnecessary permission check on the creation of dom0.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Committed-by: Keir Fraser <keir@xen.org>
13 years agoxsm/flask: remove inherited class attributes
Daniel De Graaf [Mon, 17 Sep 2012 20:10:07 +0000 (21:10 +0100)]
xsm/flask: remove inherited class attributes

The ability to declare common permission blocks shared across multiple
classes is not currently used in Xen. Currently, support for this
feature is broken in the header generation scripts, and it is not
expected that this feature will be used in the future, so remove the
dead code.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Committed-by: Keir Fraser <keir@xen.org>
13 years agoxen: add virtual x2apic support for apicv
Jiongxi Li [Mon, 17 Sep 2012 20:06:02 +0000 (21:06 +0100)]
xen: add virtual x2apic support for apicv

Basically, to benefit from apicv, we need to clear the MSR bitmap for
the corresponding x2apic MSRs:
  0x800 - 0x8ff: no read intercept for apicv register virtualization
  TPR,EOI,SELF-IPI: no write intercept for virtual interrupt
    delivery
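
The list above can be sketched with a simplified intercept bitmap. The
structure layout and all function names below are assumptions made for
illustration and do not mirror Xen's VMX code. The MSR numbers are the
architectural x2APIC ones (0x800 plus the xAPIC register offset divided by
16).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSR_X2APIC_FIRST    0x800u
#define MSR_X2APIC_LAST     0x8ffu
#define MSR_X2APIC_TPR      0x808u
#define MSR_X2APIC_EOI      0x80bu
#define MSR_X2APIC_SELF_IPI 0x83fu

/* Simplified, hypothetical bitmap: one bit per MSR in 0..0x1fff,
 * bit set = the access is intercepted. */
struct msr_bitmap_sketch {
    uint8_t read[0x2000 / 8];
    uint8_t write[0x2000 / 8];
};

static void clear_intercept(uint8_t *map, unsigned int msr)
{
    map[msr / 8] &= (uint8_t)~(1u << (msr % 8));
}

bool is_intercepted(const uint8_t *map, unsigned int msr)
{
    return (map[msr / 8] >> (msr % 8)) & 1u;
}

void setup_x2apic_bitmap(struct msr_bitmap_sketch *b, bool virt_intr_delivery)
{
    unsigned int msr;

    memset(b, 0xff, sizeof(*b));   /* start by intercepting everything */

    /* Register virtualization: no read intercept for 0x800 - 0x8ff. */
    for (msr = MSR_X2APIC_FIRST; msr <= MSR_X2APIC_LAST; msr++)
        clear_intercept(b->read, msr);

    /* Virtual interrupt delivery: no write intercept for these three. */
    if (virt_intr_delivery) {
        clear_intercept(b->write, MSR_X2APIC_TPR);
        clear_intercept(b->write, MSR_X2APIC_EOI);
        clear_intercept(b->write, MSR_X2APIC_SELF_IPI);
    }
}
```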

Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years agoxen: enable Virtual-interrupt delivery
Jiongxi Li [Mon, 17 Sep 2012 20:05:11 +0000 (21:05 +0100)]
xen: enable Virtual-interrupt delivery

Virtual interrupt delivery relieves Xen of injecting vAPIC interrupts
manually; injection is fully taken care of by the hardware. This needs
some special awareness in the existing interrupt injection path:
 - For a pending interrupt from the vLAPIC, instead of direct injection,
   we may need to update architecture-specific indicators before
   resuming the guest.
 - Before returning to the guest, RVI should be updated if there are any
   pending IRRs.
 - The EOI exit bitmap controls whether an EOI write should cause a
   VM-Exit. If set, a trap-like induced EOI VM-Exit is triggered. The
   approach here is to manipulate the EOI exit bitmap based on the value
   of TMR. A level-triggered irq requires a hook in the vLAPIC EOI write,
   so that the vIOAPIC EOI is triggered and emulated.

Signed-off-by: Gang Wei <gang.wei@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years agoxen: enable APIC-Register Virtualization
Jiongxi Li [Mon, 17 Sep 2012 20:04:08 +0000 (21:04 +0100)]
xen: enable APIC-Register Virtualization

Add APIC register virtualization support
 - APIC read doesn't cause VM-Exit
 - APIC write becomes trap-like

Signed-off-by: Gang Wei <gang.wei@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
13 years agoMCE: use new common mce handler on AMD CPUs
Christoph Egger [Mon, 17 Sep 2012 16:57:24 +0000 (17:57 +0100)]
MCE: use new common mce handler on AMD CPUs

Factor common machine check handler out of intel specific code
and move it into common files.
Replace old common mce handler with new one and use it on AMD CPUs.
No functional changes on Intel side.
While here, fix some whitespace nits and comments.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Keir Fraser <keir@xen.org>
13 years agomem_event: fix regression affecting CR3, CR4 memory events
Steven Maresca [Mon, 17 Sep 2012 16:55:12 +0000 (17:55 +0100)]
mem_event: fix regression affecting CR3, CR4 memory events

This is a patch repairing a regression in code previously functional
in 4.1.x. It appears that, during some refactoring work, calls to
hvm_memory_event_cr3 and hvm_memory_event_cr4 were lost.

These functions were originally called in mov_to_cr() of vmx.c, but
the commit http://xenbits.xen.org/hg/xen-unstable.hg/rev/1276926e3795
abstracted the original code into generic functions up a level in
hvm.c, dropping these calls in the process.

Signed-off-by: Steven Maresca <steve@zentific.com>
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Keir Fraser <keir@xen.org>
13 years agoExtra check in grant table code for mapping of shared frame
Andres Lagar-Cavilla [Mon, 17 Sep 2012 16:51:57 +0000 (17:51 +0100)]
Extra check in grant table code for mapping of shared frame

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Committed-by: Keir Fraser <keir@xen.org>
13 years agotools: drop ia64 only foreign structs from headers
Ian Campbell [Mon, 17 Sep 2012 10:17:05 +0000 (11:17 +0100)]
tools: drop ia64 only foreign structs from headers

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years ago.*ignore: drop ia64 entries
Ian Campbell [Mon, 17 Sep 2012 10:17:04 +0000 (11:17 +0100)]
.*ignore: drop ia64 entries

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agoFix libxenstore memory leak when USE_PTHREAD is not defined
Andres Lagar-Cavilla [Mon, 17 Sep 2012 10:17:03 +0000 (11:17 +0100)]
Fix libxenstore memory leak when USE_PTHREAD is not defined

Redefine usage of pthread_cleanup_push and _pop, to explicitly call free for
heap objects in error paths.

By the way, set a suitable errno value for an error path that had none.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agoxl: Remove global domid and enable -Wshadow
Ian Campbell [Mon, 17 Sep 2012 10:17:02 +0000 (11:17 +0100)]
xl: Remove global domid and enable -Wshadow

Lots of functions loop over a list of domains and others take a domid as
a parameter, shadowing the global one and leading to all sorts of
confusion.

Therefore remove the global domid and explicitly pass it around as
necessary.

Adds a domid to the parameters for many functions and switches many
others from taking a char * domain specifier to taking a domid, pushing
the domid lookup to the toplevel.

Replaces some open-coded domain_qualifier_to_domid error checking with
find_domain.
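
The shape of the change can be sketched as below. The function names and the
last_paused variable are placeholders invented for this example, not xl's
real code.

```c
#include <stdint.h>

/*
 * Before (sketch): a file-scope variable hidden by a parameter of the same
 * name, which -Wshadow reports:
 *
 *     static uint32_t domid;
 *     static void pause_domain(uint32_t domid);
 *
 * After: no global. The top level resolves the domid once and passes it down.
 */
static uint32_t last_paused = UINT32_MAX;   /* stand-in for a real side effect */

static void pause_domain(uint32_t domid)
{
    last_paused = domid;
}

/* Hypothetical top-level command that receives an already-resolved domid. */
int main_pause_sketch(uint32_t domid)
{
    pause_domain(domid);
    return 0;
}
```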

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- annotate find_domain() with warn_unused_result and fix the
         handful of errors. ]
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agoxl: prepare to enable Wshadow
Ian Campbell [Mon, 17 Sep 2012 10:17:01 +0000 (11:17 +0100)]
xl: prepare to enable Wshadow

Takes care of everything other than the global domid clashes.

Avoid shadowing global functions:
  - stime(2)
  - time(2)

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agolibxl: Enable -Wshadow.
Ian Campbell [Mon, 17 Sep 2012 10:17:00 +0000 (11:17 +0100)]
libxl: Enable -Wshadow.

It was convenient to invent $(CFLAGS_LIBXL) to do this.

Various renamings to avoid shadowing standard functions:
  - index(3)
  - listen(2)
  - link(2)
  - abort(3)
  - abs(3)

Reduced the scope of some variables to avoid conflicts.

Change to libxc is due to the nested hypercall buf macros in
set_xen_guest_handle (used in libxl) using the same local private vars.

Build tested only.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agoxl: free libxl context, logger and lockfile using atexit handler
Ian Campbell [Mon, 17 Sep 2012 10:16:59 +0000 (11:16 +0100)]
xl: free libxl context, logger and lockfile using atexit handler

xl frequently just calls exit(3), especially on error. Try to clean
up some of our global state to make tools like valgrind more useful.
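
A minimal sketch of the pattern, assuming a single heap-allocated global.
The names lockfile_path, init_globals() and cleanup_globals() and the
example path are illustrative only; atexit() is the standard C facility.

```c
#include <stdio.h>
#include <stdlib.h>

static char *lockfile_path;      /* stand-in for xl's global state */
static int cleanup_registered;

/* Runs at normal process termination, including after exit(3). */
void cleanup_globals(void)
{
    free(lockfile_path);         /* free(NULL) is a no-op */
    lockfile_path = NULL;
}

int init_globals(void)
{
    /* Register the handler once, before any state is allocated. */
    if (!cleanup_registered) {
        if (atexit(cleanup_globals) != 0)
            return -1;
        cleanup_registered = 1;
    }

    lockfile_path = malloc(32);
    if (lockfile_path == NULL)
        return -1;
    snprintf(lockfile_path, 32, "/tmp/example.lock");
    return 0;
}
```

Because the handler also runs when an error path calls exit(3), a leak
checker no longer reports this global state as lost on every early exit.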

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agoxenpm: make argument parsing and error handling more consistent
Jan Beulich [Mon, 17 Sep 2012 08:09:59 +0000 (10:09 +0200)]
xenpm: make argument parsing and error handling more consistent

Specifically, what values are or aren't accepted as CPU identifier, and
how the values get interpreted should be consistent across sub-commands
(intended behavior now: non-negative values are okay, and along with
omitting the argument, specifying "all" will also be accepted).

For error handling, error messages should get consistently issued to
stderr, and the tool should now (hopefully) produce an exit code of
zero only in the (partial) success case (there may still be a small
number of questionable cases).
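
The CPU identifier rules described above could look roughly like this. The
function parse_cpuid() and the two sentinel values are hypothetical; this is
not xenpm's actual parser.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { CPU_ALL = -1, CPU_INVALID = -2 };   /* hypothetical sentinels */

/*
 * Hypothetical parser: a missing argument or "all" selects every CPU, a
 * non-negative decimal number selects one CPU, anything else is an error.
 */
int parse_cpuid(const char *arg)
{
    char *end;
    long v;

    if (arg == NULL || strcmp(arg, "all") == 0)
        return CPU_ALL;

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || v < 0 || v > INT_MAX)
        return CPU_INVALID;

    return (int)v;
}
```

A caller following the error-handling rule would print a message to stderr
and exit non-zero when CPU_INVALID comes back.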

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agohvmloader: Do not zero the wallclock fields in shared-info.
Keir Fraser [Fri, 14 Sep 2012 18:47:57 +0000 (19:47 +0100)]
hvmloader: Do not zero the wallclock fields in shared-info.

These fields need to be valid at all times. Hypervisor ensures this
even across 32/64-bit guest transitions.

This fixes a bug where wallclock time is incorrect for booting 32-bit
HVM guests.

This should be backported to Xen 4.1 and 4.2.

Signed-off-by: Keir Fraser <keir@xen.org>
Tested-and-Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agox86/hvm: mark save/restore registration code __init
Jan Beulich [Fri, 14 Sep 2012 12:30:23 +0000 (14:30 +0200)]
x86/hvm: mark save/restore registration code __init

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agox86/hvm: constify static data where possible
Jan Beulich [Fri, 14 Sep 2012 12:28:59 +0000 (14:28 +0200)]
x86/hvm: constify static data where possible

In a few cases this also extends to making them static in the first
place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agox86/hvm: don't use indirect calls without need
Jan Beulich [Fri, 14 Sep 2012 12:25:22 +0000 (14:25 +0200)]
x86/hvm: don't use indirect calls without need

Direct calls perform better, so we should prefer them and use indirect
ones only when there indeed is a need for indirection.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agoVT-d: use msi_compose_msg()
Jan Beulich [Fri, 14 Sep 2012 12:20:08 +0000 (14:20 +0200)]
VT-d: use msi_compose_msg()

... instead of open coding it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agoamd iommu: use base platform MSI implementation
Jan Beulich [Fri, 14 Sep 2012 12:17:26 +0000 (14:17 +0200)]
amd iommu: use base platform MSI implementation

Given that here, other than for VT-d, the MSI interface gets surfaced
through a normal PCI device, the code should use as much as possible of
the "normal" MSI support code.

Further, the code can (and should) follow the "normal" MSI code in
distinguishing the maskable and non-maskable cases at the IRQ
controller level rather than checking the respective flag in the
individual actors.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Keir Fraser <keir@xen.org>
13 years agolibxl: Tolerate xl config files missing trailing newline
Ian Jackson [Fri, 14 Sep 2012 09:25:15 +0000 (10:25 +0100)]
libxl: Tolerate xl config files missing trailing newline

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agolibxl: Fix missing dependency in api check rule
Ian Jackson [Fri, 14 Sep 2012 09:02:52 +0000 (10:02 +0100)]
libxl: Fix missing dependency in api check rule

Without this, the api check cpp run might happen before the various
autogenerated files which are #included by libxl.h are ready.

We need to remove the api-ok file from AUTOINCS to avoid a circular
dependency.  Instead, we list it explicitly as a dependency of the
object files.  The result is that the api check is the last thing to
be done before make considers the preparation done and can start work
on compiling .c files into .o's.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agodocs: flesh out xl.cfg documentation, correct typos, reorganize
Matt Wilson [Fri, 14 Sep 2012 09:02:51 +0000 (10:02 +0100)]
docs: flesh out xl.cfg documentation, correct typos, reorganize

Some highlights:
 * Correct some markup errors:
       Around line 663:
           '=item' outside of any '=over'
       Around line 671:
           You forgot a '=back' before '=head3'
 * Add documentation for msitranslate, power_mgmt, acpi_s3, acpi_s4,
   gfx_passthru, nomigrate, etc.
 * Reorganize items in "unclassified" sections like cpuid,
   gfx_passthru to where they belong
 * Correct link L<> references so they can be resolved within the
   document
 * Remove placeholders for deprecated options device_model and vif2
 * Remove placeholder for "sched" and "node", as these are options for
   cpupool configuration. Perhaps cpupool configuration deserves
   a section in this document.
 * Rename "global" options to "general"
 * Add section headers to group general VM options.

Signed-off-by: Matt Wilson <msw@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agoxentop.c: Change curses painting behavior to avoid flicker
Jason McCarver [Fri, 14 Sep 2012 09:02:51 +0000 (10:02 +0100)]
xentop.c: Change curses painting behavior to avoid flicker

Currently, xentop calls clear() before drawing the screen and calling
refresh().  This causes the entire screen to be repainted from scratch
on each call to refresh().  It is inefficient and causes visible flicker
when using xentop.

This patch fixes this by calling erase() instead of clear() which overwrites
the current screen with blanks instead.  The screen is then drawn as usual
in the top() function and refresh() is called.  This method allows curses
to only repaint the characters that have changed since the last call
to refresh(), thus avoiding the flicker and sending fewer characters to
the terminal.

In the event the screen becomes corrupted, this patch accepts a CTRL-L
keystroke from the user which will call clear() and force a repaint of
the entire screen.

Signed-off-by: Jason McCarver <slam@parasite.cc>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agoxl: do not leak cpupool names.
Ian Campbell [Fri, 14 Sep 2012 09:02:50 +0000 (10:02 +0100)]
xl: do not leak cpupool names.

Valgrind reports:
==3076== 7 bytes in 1 blocks are definitely lost in loss record 1 of 1
==3076==    at 0x402458C: malloc (vg_replace_malloc.c:270)
==3076==    by 0x406F86D: libxl_cpupoolid_to_name (libxl_utils.c:102)
==3076==    by 0x8058742: parse_config_data (xl_cmdimpl.c:639)
==3076==    by 0x805BD56: create_domain (xl_cmdimpl.c:1838)
==3076==    by 0x805DAED: main_create (xl_cmdimpl.c:3903)
==3076==    by 0x804D39D: main (xl.c:285)

And indeed there are several places where xl uses
libxl_cpupoolid_to_name as a boolean to test if the pool name is
valid and leaks the name if it is. Introduce an is_valid helper and
use that instead.
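
The leak pattern and the helper-based fix can be sketched as follows. Both
functions are stand-ins written for this example. pool_id_to_name() only
mimics a lookup that returns a malloc'd string, and pool_id_is_valid() is
not the real libxl helper.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup: returns a malloc'd name the caller must free,
 * or NULL if the pool id is unknown. */
char *pool_id_to_name(unsigned int poolid)
{
    static const char *const names[] = { "Pool-0", "Pool-1" };
    char *copy;

    if (poolid >= sizeof(names) / sizeof(names[0]))
        return NULL;

    copy = malloc(strlen(names[poolid]) + 1);
    if (copy != NULL)
        strcpy(copy, names[poolid]);
    return copy;
}

/*
 * Leaky use:   if (pool_id_to_name(id)) ...   (the name is never freed)
 * Helper form: the name is looked up, tested, and freed in one place.
 */
bool pool_id_is_valid(unsigned int poolid)
{
    char *name = pool_id_to_name(poolid);
    bool ok = (name != NULL);

    free(name);
    return ok;
}
```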

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agoxl: error if vif backend!=0 is used with run_hotplug_scripts
Roger Pau Monne [Fri, 14 Sep 2012 09:02:49 +0000 (10:02 +0100)]
xl: error if vif backend!=0 is used with run_hotplug_scripts

Print an error and exit if backend!=0 is used in conjunction with
run_hotplug_scripts. Currently libxl can only execute hotplug scripts
from the toolstack domain (the same domain xl is running from).

Added a description of this issue and a workaround to
xl-network-configuration.

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agolibxl: fix usage of backend parameter and run_hotplug_scripts
Roger Pau Monne [Fri, 14 Sep 2012 09:02:48 +0000 (10:02 +0100)]
libxl: fix usage of backend parameter and run_hotplug_scripts

vif interfaces allow the user to specify the domain that should run
the backend (also known as driver domain) using the 'backend'
parameter. This is not compatible with run_hotplug_scripts=1, since
libxl can only run the hotplug scripts from Domain 0.

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agolibfsimage: add ext4 support for CentOS 5.x
Roger Pau Monne [Fri, 14 Sep 2012 09:02:47 +0000 (10:02 +0100)]
libfsimage: add ext4 support for CentOS 5.x

CentOS 5.x forked e2fs ext4 support into a different package called
e4fs, and so headers and library names changed from ext2fs to ext4fs.
Check if ext4fs/ext2fs.h and -lext4fs work, and use that instead of
ext2fs to build libfsimage. This patch assumes that if the ext4fs
library is present it should always be used instead of ext2fs.

This patch includes a rework of the ext2fs check, a new ext4fs check
and a minor modification in libfsimage to use the correct library.

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
13 years agolibxl: handle errors from xc_sharing_* info functions
Ian Campbell [Fri, 14 Sep 2012 09:02:46 +0000 (10:02 +0100)]
libxl: handle errors from xc_sharing_* info functions

On a 32 bit hypervisor xl info currently reports:
sharing_freed_memory   : 72057594037927935
sharing_used_memory    : 72057594037927935

Eat the ENOSYS and turn it into 0. Log and propagate other errors.
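
A sketch of that policy, assuming the underlying call returns either a
non-negative value or a negative errno. The function name and signature are
hypothetical, not the libxl code.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical wrapper: raw is either a non-negative value or -errno.
 * -ENOSYS (feature not implemented) is reported as 0; any other error is
 * returned to the caller, which is expected to log and propagate it.
 */
int sharing_value_or_zero(long long raw, uint64_t *out)
{
    if (raw >= 0) {
        *out = (uint64_t)raw;
        return 0;
    }
    if (raw == -ENOSYS) {
        *out = 0;
        return 0;
    }
    return -1;
}
```

As an aside, 72057594037927935 is 2^56 - 1, which is what a small negative
errno becomes when reinterpreted as an unsigned 64-bit value and divided by
256. That the bogus figure arose this way is an inference, not something the
message states.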

I don't have a 32 bit system handy, so tested on x86_64 with a libxc
hacked to return -ENOSYS and -EINVAL.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>