xen.git
14 years agoNested VMX: Add APIs for nestedhvm_ops.
Eddie Dong [Thu, 9 Jun 2011 08:24:09 +0000 (16:24 +0800)]
Nested VMX: Add APIs for nestedhvm_ops.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agoNested VMX: Add data structure for nestedvmx
Eddie Dong [Thu, 9 Jun 2011 08:24:09 +0000 (16:24 +0800)]
Nested VMX: Add data structure for nestedvmx

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/hvm: Move IDT_VECTORING processing code out of intr_assist.
Eddie Dong [Thu, 9 Jun 2011 08:24:09 +0000 (16:24 +0800)]
x86/hvm: Move IDT_VECTORING processing code out of intr_assist.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/hvm: extend nhvm_vmcx_guest_intercepts_trap to include errcode
Eddie Dong [Thu, 9 Jun 2011 08:24:09 +0000 (16:24 +0800)]
x86/hvm: extend nhvm_vmcx_guest_intercepts_trap to include errcode
to assist decision of TRAP_page_fault in VMX.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: Move check for non-translated guests in one layer
Tim Deegan [Wed, 15 Jun 2011 11:02:07 +0000 (12:02 +0100)]
x86/mm/p2m: Move check for non-translated guests in one layer
so that direct callers of gfn_to_mfn_type_p2m() can operate safely
on PV domains.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86: Pass through ERMS CPUID feature for HVM and PV guests
Yang, Wei [Tue, 14 Jun 2011 12:13:18 +0000 (13:13 +0100)]
x86: Pass through ERMS CPUID feature for HVM and PV guests

This patch exposes ERMS feature to HVM and PV guests.

With ERMS (Enhanced REP MOVSB/STOSB), the fast-string REP MOVSB/STOSB
instructions attempt to move as much of the data as possible with
larger-size loads/stores.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
14 years agoIOMMU VTD BUG: disable Extended Interrupt Mode when disabling Interrupt Remapping
Andrew Cooper [Tue, 14 Jun 2011 12:04:09 +0000 (13:04 +0100)]
IOMMU VTD BUG: disable Extended Interrupt Mode when disabling Interrupt Remapping

Experimental evidence shows that Extended Interrupt Mode remains in
effect even after Interrupt Remapping is disabled in each DMAR Global
Command Register.  A consequence of this is that when we switch from
x2apic mode back to xapic mode, and disable interrupt remapping for
the kdump kernel, interrupts passing through the IO APICs are in
x2apic format as opposed to xapic.  This causes a triple fault in the
kexec kernel.

As EIM is explicitly set up each time Interrupt Remapping is enabled,
it is safe for us to clobber this when tearing down.

Also, change the header definition of IRTA_REG_EIME_SHIFT.  It caused
verbose and error-prone code, and was only used in 1 place before.  We
now have IRTA_EIME which is the specific bit in the register.
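
A minimal sketch of the single-bit form described above. It assumes EIME is bit 11 of the Interrupt Remapping Table Address register (as in the VT-d specification); IRTA_EIME is the name from this message, while irta_clear_eime() is a hypothetical helper, not the code from the patch.

```c
#include <stdint.h>

/* Assumption: EIME is bit 11 of the IRTA register (VT-d spec). */
#define IRTA_EIME (UINT64_C(1) << 11)

/* Hypothetical helper: value to write back to IRTA on teardown,
 * with Extended Interrupt Mode cleared and all other bits kept. */
uint64_t irta_clear_eime(uint64_t irta)
{
    return irta & ~IRTA_EIME;
}
```

A named bit mask lets the caller write "irta & ~IRTA_EIME" directly instead of building the mask from a shift count at each use.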

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
14 years agox86/apic: record local APIC state on boot
Andrew Cooper [Tue, 14 Jun 2011 12:02:00 +0000 (13:02 +0100)]
x86/apic: record local APIC state on boot

Xen does not store the boot local APIC state which leads to problems
when shutting down for a kexec jump.  This patch records the boot
state so we can return to the boot state when kexecing.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Acked-by: Jan Beulich <jbeulich@novell.com>
14 years agox86/kexec: nmi_shootdown_cpus() should leave irqs disabled
Keir Fraser [Tue, 14 Jun 2011 11:49:41 +0000 (12:49 +0100)]
x86/kexec: nmi_shootdown_cpus() should leave irqs disabled

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86/apic: fix potential Protection Fault during shutdown
Andrew Cooper [Tue, 14 Jun 2011 11:47:45 +0000 (12:47 +0100)]
x86/apic: fix potential Protection Fault during shutdown

This is a rare case, but if the BIOS is set to uniprocessor, and Xen
is booted with 'lapic x2apic', Xen will switch into x2apic mode, which
will cause a protection fault when disabling the local APIC.  This
leads to a general protection fault as this code is also in the fault
handler.

When x2apic mode is enabled, the only transition which does
not result in a protection fault is to clear both the EN and EXTD
bits, which is safe to do in all cases, even if you are in xapic
mode rather than x2apic mode.
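
As a sketch of that transition: in the IA32_APIC_BASE MSR, EXTD is bit 10 and EN is bit 11 (per the Intel SDM). The macro and function names below are hypothetical and only compute the new MSR value; the actual MSR read and write are left out.

```c
#include <stdint.h>

/* IA32_APIC_BASE MSR bits (Intel SDM). Names are illustrative. */
#define APIC_BASE_EXTD (UINT64_C(1) << 10)  /* x2apic mode enable */
#define APIC_BASE_EN   (UINT64_C(1) << 11)  /* APIC global enable */

/* Hypothetical helper: clear EN and EXTD together, which is valid
 * whether the APIC is currently in xapic or x2apic mode. */
uint64_t apic_base_disabled(uint64_t msr)
{
    return msr & ~(APIC_BASE_EN | APIC_BASE_EXTD);
}
```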

The Linux code from which this is derived is protected by an
if ( ! x2apic_mode ...) clause which is how they get away with it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
14 years agox86/amd: Eliminate cache flushing when entering C3 on select AMD processors
Mark Langsdorf [Tue, 14 Jun 2011 11:46:29 +0000 (12:46 +0100)]
x86/amd: Eliminate cache flushing when entering C3 on select AMD processors

AMD Fam15h processors have a shared cache. It does not need
to be flushed when entering C3, and doing so reduces
performance. Modify acpi_processor_power_init_bm_check to
prevent these processors from flushing when entering C3.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
14 years agox86/hvm: Make DRNG feature visible in CPUID
Yang, Wei [Tue, 14 Jun 2011 11:44:48 +0000 (12:44 +0100)]
x86/hvm: Make DRNG feature visible in CPUID

This patch exposes DRNG feature to HVM guests.

The RDRAND instruction can provide software with sequences of
random numbers generated from white noise.
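
For illustration, a guest could test for the feature as sketched below. RDRAND is reported in CPUID leaf 1, ECX bit 30; the function takes the ECX value as a parameter so the sketch stays independent of how CPUID is executed. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:ECX bit 30 advertises RDRAND. */
#define CPUID1_ECX_RDRAND (UINT32_C(1) << 30)

/* Hypothetical helper: true if the given leaf-1 ECX value has RDRAND set. */
bool has_rdrand(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & CPUID1_ECX_RDRAND) != 0;
}
```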

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
14 years agox86_32: Fix build: Define machine_to_phys_mapping_valid
Keir Fraser [Fri, 10 Jun 2011 12:51:39 +0000 (13:51 +0100)]
x86_32: Fix build: Define machine_to_phys_mapping_valid

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86/vmx: Small fixes to MSR_IA32_VMX_PROCBASED_CTLS feature probing.
Keir Fraser [Fri, 10 Jun 2011 07:32:47 +0000 (08:32 +0100)]
x86/vmx: Small fixes to MSR_IA32_VMX_PROCBASED_CTLS feature probing.

Should check for VIRTUAL_INTR_PENDING as we unconditionally make use
of it. Also check for CR8 exiting unconditionally on x86/64, as this
is of use to nestedvmx, and every 64-bit cpu should support it.

Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agoxenpaging: update machine_to_phys_mapping[] during page deallocation
Keir Fraser [Fri, 10 Jun 2011 07:19:07 +0000 (08:19 +0100)]
xenpaging: update machine_to_phys_mapping[] during page deallocation

The machine_to_phys_mapping[] array needs updating during page
deallocation.  If that page is allocated again, a call to
get_gpfn_from_mfn() will still return an old gfn from another guest.
This will cause trouble because this gfn number has no or different
meaning in the context of the current guest.

This happens when the entire guest ram is paged-out before
xen_vga_populate_vram() runs.  Then XENMEM_populate_physmap is called
with gfn 0xff000.  A new page is allocated with alloc_domheap_pages.
This new page does not have a gfn yet.  However, in
guest_physmap_add_entry() the passed mfn still maps to an old gfn
(perhaps from another old guest).  This old gfn is in paged-out state
in this guest's context and has no mfn anymore.  As a result, the
ASSERT() triggers because p2m_is_ram() is true for p2m_ram_paging*
types.  If the machine_to_phys_mapping[] array is updated properly,
both loops in guest_physmap_add_entry() turn into no-ops for the new
page and the mfn/gfn mapping will be done at the end of the function.

If XENMEM_add_to_physmap is used with XENMAPSPACE_gmfn,
get_gpfn_from_mfn() will return an apparently valid gfn.  As a
result, guest_physmap_remove_page() is called.  The ASSERT in
p2m_remove_page triggers because the passed mfn does not match the old
mfn for the passed gfn.
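
A toy model of the fix, under stated assumptions: the table is a small static array rather than Xen's real machine_to_phys_mapping[], and toy_free_page() is a hypothetical stand-in for the deallocation path. The point is only that freeing a page resets its entry, so a later lookup cannot return a stale gfn.

```c
#define NR_FRAMES 16UL
#define INVALID_M2P_ENTRY (~0UL)

/* Toy machine-to-physical table: index is the mfn, value is the gfn. */
static unsigned long m2p[NR_FRAMES];

void set_gpfn_from_mfn(unsigned long mfn, unsigned long gfn)
{
    if (mfn < NR_FRAMES)
        m2p[mfn] = gfn;
}

unsigned long get_gpfn_from_mfn(unsigned long mfn)
{
    return mfn < NR_FRAMES ? m2p[mfn] : INVALID_M2P_ENTRY;
}

/* Hypothetical deallocation step: drop the stale mfn -> gfn link. */
void toy_free_page(unsigned long mfn)
{
    set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
}
```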

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years agox86: Disable set_gpfn_from_mfn until m2p table is allocated.
Keir Fraser [Fri, 10 Jun 2011 07:18:33 +0000 (08:18 +0100)]
x86: Disable set_gpfn_from_mfn until m2p table is allocated.

This is a prerequisite for calling set_gpfn_from_mfn() unconditionally
from free_heap_pages().

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86: Fix argument checking in (privileged) function cpu_add().
Keir Fraser [Fri, 10 Jun 2011 07:08:44 +0000 (08:08 +0100)]
x86: Fix argument checking in (privileged) function cpu_add().

Thanks to John McDermott <john.mcdermott@nrl.navy.mil> for spotting.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86/hvm: add SMEP support to HVM guest
Tim Deegan [Mon, 6 Jun 2011 12:46:48 +0000 (13:46 +0100)]
x86/hvm: add SMEP support to HVM guest

New Intel CPUs support SMEP (Supervisor Mode Execution Protection). SMEP
prevents software operating with CPL < 3 (supervisor mode) from fetching
instructions from any linear address with a valid translation for which the U/S
flag (bit 2) is 1 in every paging-structure entry controlling the translation
for the linear address.
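
The rule above can be sketched as a predicate. This is a simplified model, assuming SMEP is enabled and the translation is valid; the function name and the flat array of paging-structure entries are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_USER (UINT64_C(1) << 2)  /* U/S flag, bit 2 */

/* True if SMEP would refuse an instruction fetch made at privilege
 * level cpl, given every paging-structure entry that controls the
 * translation of the target linear address. */
bool smep_blocks_fetch(unsigned int cpl, const uint64_t *entries, size_t n)
{
    if (cpl == 3)
        return false;              /* user-mode fetches are unaffected */
    for (size_t i = 0; i < n; i++)
        if (!(entries[i] & PTE_USER))
            return false;          /* some level is supervisor-only */
    return n > 0;                  /* U/S = 1 at every level */
}
```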

This patch adds SMEP support to HVM guest.

Signed-off-by: Yang Wei <wei.y.yang@intel.com>
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
Signed-off-by: Li Xin <xin.li@intel.com>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agohvmloader: add missing emacs local variables.
Ian Campbell [Wed, 8 Jun 2011 12:40:46 +0000 (13:40 +0100)]
hvmloader: add missing emacs local variables.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: reduce unnecessary e820 reservations for SeaBIOS
Ian Campbell [Wed, 8 Jun 2011 12:40:23 +0000 (13:40 +0100)]
hvmloader: reduce unnecessary e820 reservations for SeaBIOS

SeaBIOS will reserve memory in the e820 as necessary, including BIOS
data structures such as the EBDA, any tables it creates or copies into
place etc.  Therefore arrange that the memory map provided by
hvmloader to SeaBIOS reserves only things which HVMloader has created.

Since ROMBIOS is more tightly coupled with hvmloader we retain the
ability to reserve BIOS regions in the hvmloader produced e820 and use
that from the ROMBIOS backend.

The code for this could probably have been simpler but the existing
code avoids overlapping e820 areas and so the new code does the same
(many guest OSes sanitize the e820 map to handle this, but I wouldn't
trust that all do, so I didn't take the risk)

For ROMBIOS the resulting e820 map as seen by the guest is the same
except the reserved regions at 0x9e000-0x9fc00,0x9fc00-0xa0000 are
merged into a single region 0x9e000-0xa0000 (Linux guests sanitize the
e820 to look like this anyway).

For SeaBIOS the result is that the lowmem reserved region is from
0x9f000-0xa0000 rather than 0x9e000-0xa0000 which correctly reflects
SeaBIOS's actual usage.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agonestedsvm: Support Decodeassist
Christoph Egger [Wed, 8 Jun 2011 12:39:31 +0000 (13:39 +0100)]
nestedsvm: Support Decodeassist

Offer l1 guest to use decode assist if available in hardware.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
14 years agoFix 32-bit build after p2m series
Tim Deegan [Mon, 6 Jun 2011 08:56:08 +0000 (09:56 +0100)]
Fix 32-bit build after p2m series

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agoMerge
Tim Deegan [Mon, 6 Jun 2011 08:49:57 +0000 (09:49 +0100)]
Merge

14 years agox86: Enable Supervisor Mode Execution Protection (SMEP)
Keir Fraser [Fri, 3 Jun 2011 20:39:00 +0000 (21:39 +0100)]
x86: Enable Supervisor Mode Execution Protection (SMEP)

New Intel CPUs support SMEP (Supervisor Mode Execution
Protection). SMEP prevents software operating with CPL < 3 (supervisor
mode) from fetching instructions from any linear address with a valid
translation for which the U/S flag (bit 2) is 1 in every
paging-structure entry controlling the translation for the linear
address.

This patch enables SMEP in Xen to protect Xen hypervisor from
executing pv guest instructions, whose translation paging-structure
entries' U/S flags are all set.

Signed-off-by: Yang Wei <wei.y.yang@intel.com>
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
Signed-off-by: Li Xin <xin.li@intel.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agolibxc: Don't refer to meaningless 'word offsets' in xc_cpufeature.h
Keir Fraser [Fri, 3 Jun 2011 16:27:01 +0000 (17:27 +0100)]
libxc: Don't refer to meaningless 'word offsets' in xc_cpufeature.h

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agohvmloader: reinstate datastructure shared with DSDT in SeaBIOS case.
Ian Campbell [Fri, 3 Jun 2011 16:22:51 +0000 (17:22 +0100)]
hvmloader: reinstate datastructure shared with DSDT in SeaBIOS case.

I mistakenly thought that the "struct bios_info" was a ROMBIOS
specific data structure and so caused it to be populated only in the
ROMBIOS case.

However it turns out that the majority of the struct's fields are
actually referenced from the ACPI DSDT and hence are needed for
SeaBIOS too.

While in principle it might have been possible to continue to mix
ROMBIOS and ACPI bits in this datastructure this is, evidently,
confusing but also leads to header file dependencies from
ROMBIOS->hvmloader which I had been hoping to avoid so as to head-off
future accidental re-entanglement of ROMBIOS and hvmloader.

So instead I have split the ACPI parts into a new "struct acpi_info"
which is defined entirely within the acpi building code in hvmloader
and which comes with a big comment pointing to the DSDT interaction.

This new ACPI info is placed at 0x9F000 which is available under both
ROMBIOS and SeaBIOS. This address is in a reserved region of the E820
and is just above the ROMBIOS stack.

The resulting "struct rombios_info" is hardly worthy of its own
structure but keep it anyway.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: allocate ACPI tables as we go
Ian Campbell [Fri, 3 Jun 2011 16:21:57 +0000 (17:21 +0100)]
hvmloader: allocate ACPI tables as we go

Rather than building the tables twice, once purely to figure out the
size, just allocate each individual table as we go.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: removed unused/incorrect define.
Ian Campbell [Fri, 3 Jun 2011 16:21:40 +0000 (17:21 +0100)]
hvmloader: removed unused/incorrect define.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: reduce minimum allocation alignment from 1024 bytes to 16.
Ian Campbell [Fri, 3 Jun 2011 16:21:28 +0000 (17:21 +0100)]
hvmloader: reduce minimum allocation alignment from 1024 bytes to 16.

1024 bytes create a lot of wastage when the majority of allocations
are of BIOS table data structures which are generally happy with much
lower alignment. I conservatively chose 16 bytes.

Most callers pass 0 for the alignment anyway, for the rombios high
code allocation I kept it 1024 byte aligned since it was the only case
that didn't seem obviously ok with a smaller alignment.
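
A minimal sketch of the rounding involved, assuming power-of-two alignments and that a request of 0 (or anything below the minimum) falls back to 16 bytes. The names are hypothetical and this is not hvmloader's allocator.

```c
#include <stdint.h>

#define MIN_ALIGN 16u  /* minimum alignment chosen in this change */

/* Round addr up to the requested alignment (a power of two),
 * never using less than MIN_ALIGN. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    if (align < MIN_ALIGN)
        align = MIN_ALIGN;
    return (addr + align - 1) & ~(align - 1);
}
```

With a 1024-byte minimum, an allocation starting at 0x1001 would be pushed to 0x1400; with 16 bytes it lands at 0x1010.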

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agotools/hotplug: Fix hotplug hook script arrangements not to always fail
Ian Jackson [Fri, 3 Jun 2011 14:04:30 +0000 (15:04 +0100)]
tools/hotplug: Fix hotplug hook script arrangements not to always fail

The new feature introduced in 23401:a44b12ee2fd3 was broken; it in
general always fails, at least if there are no hotplug scripts.

If there are no hooks, call_hooks ends up running this:
  [ -x ".....*.hook" ] && . "..... *.hook"

This does not directly trigger set -e and sigerr.  However, it is the
last command executed in call_hooks.  So the return status of
call_hooks is an error, and thus a sigerr happens when call_hooks
returns.

The bug affects xl and xm.  However xl does not detect failure of the
hotplug script.

Change the script to use if...then rather than &&, as the latter has
very confusing and undesirable semantics.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: disks: rename disk param "unpluggable" to "removable"
Ian Jackson [Thu, 2 Jun 2011 17:46:35 +0000 (18:46 +0100)]
libxl: disks: rename disk param "unpluggable" to "removable"

This property corresponds to what is called "removable" in xenstore,
and is the conventional meaning of "removable": ie, the _media_ can be
removed even as the _device_ remains present.

"unpluggable" is a misleading name for this.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
14 years agolibxl: disks: expose new "script" parameter for external block scripts
Ian Jackson [Thu, 2 Jun 2011 17:46:34 +0000 (18:46 +0100)]
libxl: disks: expose new "script" parameter for external block scripts

This is not currently implemented.  Applications setting it to
anything but NULL will cause an error.  Code to set it from xl
configuration files will appear later in this series.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Committed-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
14 years agolibxl: make libxl_ctx_free tolerate NULL ctx argument
Ian Jackson [Thu, 2 Jun 2011 17:46:33 +0000 (18:46 +0100)]
libxl: make libxl_ctx_free tolerate NULL ctx argument

This is purely for convenience (eg, when debugging).

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
14 years agolibxl: provide TOSTRING in libxl_internal.h and libxlu_internal.h
Ian Jackson [Thu, 2 Jun 2011 17:46:32 +0000 (18:46 +0100)]
libxl: provide TOSTRING in libxl_internal.h and libxlu_internal.h

Provide a copy of the standard TOSTRING macro in libxlu_internal.h,
for the benefit of patches later in this series.

Also, move TOSTRING to libxl_internal.h from a .c file for the
benefit of future other callers in libxl proper.

(These cannot be combined because libxlu cannot include
libxl_internal.h and libxl should not include libxlu_internal.h.)
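
For reference, the usual two-level form of the macro looks like the sketch below. EXAMPLE_LIMIT and limit_text() are hypothetical names used only to show the expansion.

```c
/* The inner macro stringifies its argument as written; the outer one
 * forces macro expansion of the argument first. */
#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)

#define EXAMPLE_LIMIT 64  /* hypothetical constant */

const char *limit_text(void)
{
    /* Adjacent string literals are concatenated by the compiler. */
    return "limit is " TOSTRING(EXAMPLE_LIMIT);
}
```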

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
Committed-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
14 years agolibxl: add missing copyright notices to autogenerated files
Ian Jackson [Thu, 2 Jun 2011 17:46:32 +0000 (18:46 +0100)]
libxl: add missing copyright notices to autogenerated files

Copyright notices in libxlu_cfg_[ly].[ly] end up in the .[ch] files,
copied there by flex and bison.  Regenerate those files (flex 2.5.35
and bison 2.3, from Debian lenny i386).

No manual edits in this patch.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Committed-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
14 years agolibxl: add missing copyright notices to some files
Ian Jackson [Thu, 2 Jun 2011 17:46:31 +0000 (18:46 +0100)]
libxl: add missing copyright notices to some files

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Committed-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
14 years agolibxl: fix wrong mask of function number
Yang Zhang [Thu, 2 Jun 2011 16:42:03 +0000 (17:42 +0100)]
libxl: fix wrong mask of function number

Function number is 3 bits. So the mask should be 0x7 instead of 0x3.
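
To illustrate, assuming the conventional PCI devfn encoding (slot in bits 7:3, function in bits 2:0); the macro names follow common usage and are not taken from this patch.

```c
/* Conventional devfn layout: 5-bit slot, 3-bit function. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Build a devfn from slot and function numbers. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
```

With a mask of 0x3, function 5 of slot 2 (devfn 0x15) would be read back as function 1.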

Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agoxl: print sxp on dry-run of create.
Ian Campbell [Thu, 2 Jun 2011 16:36:02 +0000 (17:36 +0100)]
xl: print sxp on dry-run of create.

The help text for xm create's --dry-run says "Dry run - prints the
resulting configuration in SXP but does not create the domain." so
update xl implementation to match. At least the xendomains initscript
relies on this (for better or worse).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Carsten Schiers <carsten@schiers.de>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: flask xsm support
Machon Gregory [Thu, 2 Jun 2011 16:32:18 +0000 (17:32 +0100)]
libxl: flask xsm support

Adds support for assigning a label to domains, obtaining and setting the
current enforcing mode, and loading a policy with xl command and libxl
header when the Flask XSM is in use. Adheres to the changes made by the
patch to remove exposure of libxenctrl/libxenstore headers via libxl.h.

Signed-Off-By: Machon Gregory <mbgrego@tycho.ncsc.mil>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: check return values of read/write
Ian Campbell [Thu, 2 Jun 2011 16:26:10 +0000 (17:26 +0100)]
libxl: check return values of read/write

Some distros enable -D_FORTIFY_SOURCE=2 by default
(https://wiki.ubuntu.com/CompilerFlags) which adds the warn_unused_result
attribute to several functions including read(2) and write(2)

Although we don't really care about errors reading or writing the libxl spawn
fd, catch them anyway to keep this warning quiet.
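
A generic sketch of checking write(2)'s result. The helper name is hypothetical and this is a stricter pattern than the patch needs: it retries on EINTR and on short writes rather than only consuming the return value.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes to fd. Returns 0 on success, -1 on error
 * (with errno set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```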

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: avoid build warning when _libxl_types.h does not initially exist.
Ian Campbell [Thu, 2 Jun 2011 16:25:27 +0000 (17:25 +0100)]
libxl: avoid build warning when _libxl_types.h does not initially exist.

Olaf Hering reports:
if ! cmp _libxl_paths.h.2.tmp _libxl_paths.h; then mv -f _libxl_paths.h.2.tmp _libxl_paths.h; fi
cmp: _libxl_paths.h: No such file or directory

Use "cmp -s" to silence the error. cmp returns 2 in this case and so the mv
does occur.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: remove stray "atoi" in debug code.
Ian Campbell [Thu, 2 Jun 2011 16:24:41 +0000 (17:24 +0100)]
libxl: remove stray "atoi" in debug code.

I switched from atoi to strtol but failed to actually remove it...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxc: Simplify and clean up xc_cpufeature.h
Keir Fraser [Thu, 2 Jun 2011 14:01:04 +0000 (15:01 +0100)]
libxc: Simplify and clean up xc_cpufeature.h

 * Remove Linux-private defns with no direct relation to CPUID
 * Remove word offsets into Linux-defined cpu_caps array
 * Hard tabs -> soft tabs

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86: Hide CPUID leaf 7 from PV guests.
Keir Fraser [Thu, 2 Jun 2011 13:39:50 +0000 (14:39 +0100)]
x86: Hide CPUID leaf 7 from PV guests.

Except for the whitelisted FSGSBASE feature.
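
A sketch of the whitelist idea: FSGSBASE is reported in CPUID leaf 7 (subleaf 0), EBX bit 0, so masking the host value with that single bit hides everything else in the register. The names are hypothetical and the sketch covers EBX only.

```c
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 0 advertises FSGSBASE. */
#define LEAF7_EBX_FSGSBASE (UINT32_C(1) << 0)

/* Hypothetical filter: EBX value a PV guest would see for leaf 7. */
uint32_t pv_leaf7_ebx(uint32_t host_ebx)
{
    return host_ebx & LEAF7_EBX_FSGSBASE;
}
```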

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86: Replace ad-hoc bitmaskof() macro with single cpufeat_mask() defn.
Keir Fraser [Thu, 2 Jun 2011 13:34:34 +0000 (14:34 +0100)]
x86: Replace ad-hoc bitmaskof() macro with single cpufeat_mask() defn.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86/mm/shadow: emulated writes are always guest-originated actions
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/shadow: emulated writes are always guest-originated actions
and never happen with the paging lock held.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm: simplify log-dirty page allocation.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm: simplify log-dirty page allocation.

Now that the log-dirty code is covered by the same lock as shadow and
hap activity, we no longer need to avoid doing allocs and frees with
the lock held.  Simplify the code accordingly.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm: merge the shadow, hap and log-dirty locks into a single paging lock.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm: merge the shadow, hap and log-dirty locks into a single paging lock.

This will allow us to simplify the locking around calls between
hap/shadow and log-dirty code.  Many log-dirty paths already need the
shadow or HAP lock so it shouldn't increase contention that much.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm: Make MM locks recursive.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm: Make MM locks recursive.

This replaces a lot of open coded 'if (!locked) {lock()}' instances
by making the mm locks recursive locks, but only allowing them
to be taken recursively in the places that they were before.
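
A single-threaded model of the bookkeeping only, with hypothetical names: it has no atomics or spinning, and returns false where a real lock would spin or report a bug. It shows two entry points, one that refuses recursion and one that permits it.

```c
#include <stdbool.h>

typedef struct {
    int owner;           /* CPU holding the lock, -1 when free */
    unsigned int depth;  /* recursion depth, 0 when free */
} mm_lock_t;

/* Recursive entry point: the current owner may take the lock again. */
bool mm_lock_recursive(mm_lock_t *l, int cpu)
{
    if (l->depth > 0 && l->owner != cpu)
        return false;    /* held by another CPU */
    l->owner = cpu;
    l->depth++;
    return true;
}

/* Plain entry point: recursion is not allowed here. */
bool mm_lock(mm_lock_t *l, int cpu)
{
    if (l->depth > 0)
        return false;    /* already held, even if by this CPU */
    return mm_lock_recursive(l, cpu);
}

void mm_unlock(mm_lock_t *l)
{
    if (l->depth > 0 && --l->depth == 0)
        l->owner = -1;
}
```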

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm: dedup the various copies of the shadow lock functions
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm: dedup the various copies of the shadow lock functions

Define the lock and unlock functions once, and list all the locks in one
place so (a) it's obvious what the locking discipline is and (b) none of
the locks are visible to non-mm code.  Automatically enforce that these
locks never get taken in the wrong order.
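
One way such ordering enforcement can work, as a sketch under assumptions: each lock is given a numeric level, and a lock may only be taken if its level is higher than the most recently taken one. The structure and names are hypothetical and do not describe the mechanism used in the patch.

```c
#include <stdbool.h>

#define MAX_HELD 8

/* Levels of the locks currently held, in acquisition order. */
typedef struct {
    int levels[MAX_HELD];
    int n;
} held_locks_t;

/* Take a lock of the given level; refuse if that would violate
 * the ordering (level not strictly above the last one taken). */
bool take_ordered(held_locks_t *h, int level)
{
    if (h->n > 0 && level <= h->levels[h->n - 1])
        return false;
    if (h->n == MAX_HELD)
        return false;
    h->levels[h->n++] = level;
    return true;
}

/* Release the most recently taken lock. */
void release_last(held_locks_t *h)
{
    if (h->n > 0)
        h->n--;
}
```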

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: Fix locking discipline around log-dirty teardown.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: Fix locking discipline around log-dirty teardown.

It's not safe to call paging_free_log_dirty_page with the
log-dirty lock held.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: Move p2m code in HVMOP_[gs]et_mem_access into p2m.c
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: Move p2m code in HVMOP_[gs]et_mem_access into p2m.c

It uses p2m internals like the p2m lock and function pointers so belongs
behind the p2m interface.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: Fix locking discipline around p2m lookups.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: Fix locking discipline around p2m lookups.

All gfn_to_mfn* functions except _query() might take the p2m lock,
so can't be called with a p2m, shadow, hap or log_dirty lock held.
The remaining offender is the memory sharing code, which calls
_unshare() from inside the pagetable walker!  Fixing that is too big
for a cleanup patch like this one.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: Fix locking discipline around p2m updates.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: Fix locking discipline around p2m updates.

Direct callers of the p2m setting functions must hold the p2m lock.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: Remove recursive-locking code from set_shared_p2m_entry().
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: Remove recursive-locking code from set_shared_p2m_entry().

It should no longer be needed now that the shr_lock discipline is fixed.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm: Fix memory-sharing code's locking discipline.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm: Fix memory-sharing code's locking discipline.

memshr_audit is sometimes called with the shr_lock held.  Make it so for
every call.

Move the unsharing loop in p2m_teardown out of the p2m_lock to avoid
deadlocks.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: paging_p2m_ga_to_gfn() doesn't need so many arguments
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: paging_p2m_ga_to_gfn() doesn't need so many arguments

It has only one caller and is always called with p2m == hostp2m and mode
== hostmode.  Also, since it's only called from nested HAP code, remove
the check of paging_mode_hap().  Then rename it to reflect its new
interface.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: Make p2m interfaces take struct domain arguments.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: Make p2m interfaces take struct domain arguments.

As part of the nested HVM patch series, many p2m functions were changed
to take pointers to p2m tables rather than to domains.  This patch
reverses that for almost all of them, which:
 - gets rid of a lot of "p2m_get_hostp2m(d)" in code which really
   shouldn't have to know anything about how gfns become mfns.
 - ties sharing and paging interfaces to a domain, which is
   what they actually act on, rather than a particular p2m table.

In developing this patch it became clear that memory-sharing and nested
HVM are unlikely to work well together.  I haven't tried to fix that
here beyond adding some assertions around suspect paths (as this patch
is big enough with just the interface changes)

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: merge gfn_to_mfn_unshare with other gfn_to_mfn paths.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: merge gfn_to_mfn_unshare with other gfn_to_mfn paths.

gfn_to_mfn_unshare() had its own function despite all other lookup types
being handled in one place. Merge it into _gfn_to_mfn_type(), so that it
gets the benefit of broken-page protection, for example, and tidy its
interfaces up to fit.

The unsharing code still has a lot of bugs, e.g.
 - failure to alloc for unshare on a foreign lookup still BUG()s,
 - at least one race condition in unshare-and-retry
 - p2m_* lookup types should probably be flags, not enum
but it's cleaner and will make later p2m cleanups easier.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: hide the current-domain fast-path inside the p2m-pt code.
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: hide the current-domain fast-path inside the p2m-pt code.

The other implementations of the p2m interface don't have this, and
it will go away entirely when 32-bit builds go away, so take it out
of the interface.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: little fixes and tidying up
Tim Deegan [Thu, 2 Jun 2011 12:16:52 +0000 (13:16 +0100)]
x86/mm/p2m: little fixes and tidying up

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86/mm/p2m: Mark internal functions static
Tim Deegan [Thu, 2 Jun 2011 12:16:51 +0000 (13:16 +0100)]
x86/mm/p2m: Mark internal functions static

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agohvmloader: add code to generate a $PIR table.
Ian Campbell [Wed, 1 Jun 2011 15:50:16 +0000 (16:50 +0100)]
hvmloader: add code to generate a $PIR table.

Does not replace the table hardcoded in ROMBIOS (it ain't broke) but
is used for SeaBIOS.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: allow the possibility to allocate the size of smbios table
Ian Campbell [Wed, 1 Jun 2011 15:49:23 +0000 (16:49 +0100)]
hvmloader: allow the possibility to allocate the size of smbios table
we actually need.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: further support for SeaBIOS
Ian Campbell [Wed, 1 Jun 2011 15:49:03 +0000 (16:49 +0100)]
hvmloader: further support for SeaBIOS

Build the various BIOS tables and arrange for them to be passed to
SeaBIOS. We define a simple data structure at a known
physical address for this purpose.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: smbios: allow the entry point data structure to be located separately.
Ian Campbell [Wed, 1 Jun 2011 15:48:29 +0000 (16:48 +0100)]
hvmloader: smbios: allow the entry point data structure to be located separately.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: allow create_mp_tables() to allocate the table
Ian Campbell [Wed, 1 Jun 2011 15:47:50 +0000 (16:47 +0100)]
hvmloader: allow create_mp_tables() to allocate the table

Will be used by SeaBIOS.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: return MPFPS from create_mp_tables()
Ian Campbell [Wed, 1 Jun 2011 15:47:27 +0000 (16:47 +0100)]
hvmloader: return MPFPS from create_mp_tables()

This is the hook which the mptables hang off, so it is useful to know.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: refactor BIOS info setup
Ian Campbell [Wed, 1 Jun 2011 15:47:04 +0000 (16:47 +0100)]
hvmloader: refactor BIOS info setup

Currently we have ->bios_high_setup, which is called relatively early
and returns a cookie which is passed to ->bios_info_setup which runs
towards the end and creates the BIOS info, incorporating the cookie
which (in the case of ROMBIOS) happens to be the BIOS's high load
address. This is rather ROMBIOS-specific.

Refactor to have ->bios_info_setup which is called early and prepares
the bios_info, ->bios_relocate which does any necessary relocation
(updating the BIOS info as necessary) and ->bios_info_finish which
finalises the info (e.g. by calculating the checksum).
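
The three-stage flow described above can be sketched as a table of
function pointers. This is only an illustration: the hook names come
from the commit message, but the struct layouts, the demo_* functions,
the load address and run_bios_hooks() are hypothetical and are not
taken from hvmloader itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical info block; the real layout is not given in the commit. */
struct bios_info {
    uint32_t bios_high_addr; /* updated by the relocate hook */
    uint8_t  checksum;       /* makes all bytes sum to 0 (mod 256) */
    uint8_t  pad[3];         /* explicit padding so every byte is defined */
};

/* Hypothetical per-BIOS hook table, named after the hooks in the commit. */
struct bios_config {
    void (*bios_info_setup)(struct bios_info *bi);  /* early: prepare info */
    void (*bios_relocate)(struct bios_info *bi);    /* relocate, update info */
    void (*bios_info_finish)(struct bios_info *bi); /* late: finalise info */
};

static void demo_info_setup(struct bios_info *bi)
{
    bi->bios_high_addr = 0;
    bi->checksum = 0;
    bi->pad[0] = bi->pad[1] = bi->pad[2] = 0;
}

static void demo_relocate(struct bios_info *bi)
{
    bi->bios_high_addr = 0xfc000000u; /* made-up load address */
}

static void demo_info_finish(struct bios_info *bi)
{
    const uint8_t *p = (const uint8_t *)bi;
    uint8_t sum = 0;
    size_t i;

    bi->checksum = 0;
    for (i = 0; i < sizeof(*bi); i++)
        sum = (uint8_t)(sum + p[i]);
    bi->checksum = (uint8_t)(0u - sum); /* whole block now sums to zero */
}

static const struct bios_config demo_bios = {
    demo_info_setup, demo_relocate, demo_info_finish
};

/* Run the hooks in the order the commit describes; a BIOS may omit any. */
static void run_bios_hooks(const struct bios_config *cfg, struct bios_info *bi)
{
    assert(cfg != NULL && bi != NULL);
    if (cfg->bios_info_setup)
        cfg->bios_info_setup(bi);
    if (cfg->bios_relocate)
        cfg->bios_relocate(bi);
    if (cfg->bios_info_finish)
        cfg->bios_info_finish(bi);
}
```

Because the checksum is computed in the last hook, the relocate step is
free to change the info block without having to know about it.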

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: Add a simple "scratch allocator"
Ian Campbell [Wed, 1 Jun 2011 15:46:28 +0000 (16:46 +0100)]
hvmloader: Add a simple "scratch allocator"

Returns memory which is passed to the subsequent BIOS but can be
reused once the contents is consumed. An example of this would be a
BIOS table which the BIOS consumes by copying rather than simply
referencing.

Users which need a temporary scratch buffer for internal use can use
scratch_start, which follows these allocations.
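
A minimal sketch of such an allocator, assuming a simple bump pointer.
Only the name scratch_start comes from the commit message; the
scratch_alloc() signature, the starting address and the use of plain
integers for addresses (so the sketch runs outside a guest) are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Next free scratch address; the initial value here is made up. */
static uint32_t scratch_start = 0x00010000u;

/*
 * Hand out 'size' bytes aligned to 'align' (a power of two) and advance
 * scratch_start past them. Nothing is ever freed individually: the
 * region is simply reused once the BIOS has consumed its contents.
 */
static uint32_t scratch_alloc(uint32_t size, uint32_t align)
{
    uint32_t base;

    assert(align != 0 && (align & (align - 1)) == 0);
    base = (scratch_start + align - 1) & ~(align - 1);
    scratch_start = base + size;
    return base;
}
```

After any number of allocations, scratch_start still marks the first
address not handed out, which is what a caller wanting a temporary
buffer for its own use would start from.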

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: make SMBIOS initialisation more general.
Ian Campbell [Wed, 1 Jun 2011 15:45:05 +0000 (16:45 +0100)]
hvmloader: make SMBIOS initialisation more general.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: make ACPI initialisation hook more general.
Ian Campbell [Wed, 1 Jun 2011 15:44:31 +0000 (16:44 +0100)]
hvmloader: make ACPI initialisation hook more general.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: call SMP setup from common code again.
Ian Campbell [Wed, 1 Jun 2011 15:43:52 +0000 (16:43 +0100)]
hvmloader: call SMP setup from common code again.

Previous refactoring was premature.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: setup APICs in a common function again.
Ian Campbell [Wed, 1 Jun 2011 15:43:29 +0000 (16:43 +0100)]
hvmloader: setup APICs in a common function again.

Previous refactoring was premature.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agomerge
Keir Fraser [Wed, 1 Jun 2011 15:42:39 +0000 (16:42 +0100)]
merge

14 years agohvmloader: setup PCI bus in a common function again.
Ian Campbell [Wed, 1 Jun 2011 15:41:42 +0000 (16:41 +0100)]
hvmloader: setup PCI bus in a common function again.

Previous refactoring was premature.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: enable PCI_COMMAND_IO on primary VGA device
Ian Campbell [Wed, 1 Jun 2011 15:40:54 +0000 (16:40 +0100)]
hvmloader: enable PCI_COMMAND_IO on primary VGA device

There is an implicit assumption in the PCI spec that the primary VGA
device (e.g. something with class==VGA) will have I/O enabled in order
to make the standard VGA I/O registers (e.g. at 0x3xx) available, even
though the device has no explicit I/O BARS.

The qemu device model for the Cirrus VGA card does not actually
enforce this but SeaBIOS looks for a VGA device with I/O enabled
before running the VGA ROM. Coreboot has similar behaviour and I
verified on a physical Cirrus GD 5446 that the BIOS had enabled I/O
cycles.

The thread at
http://www.seabios.org/pipermail/seabios/2011-May/001804.html
contains more info.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agohvmloader: allow per-BIOS decision on loading option ROMS
Ian Campbell [Wed, 1 Jun 2011 15:39:55 +0000 (16:39 +0100)]
hvmloader: allow per-BIOS decision on loading option ROMS

SeaBIOS has functionality to load ROMs from the PCI device directly,
it makes sense to use this when it is available.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agoxsm: fixed missing header compile error
Machon Gregory [Wed, 1 Jun 2011 15:12:29 +0000 (16:12 +0100)]
xsm: fixed missing header compile error

Fixes compile error caused by changeset 23363 by including xenoprof.h
header.

Signed-off-by: Machon Gregory <mbgrego@tycho.ncsc.mil>
14 years agox86/mm: mem-paging and mem-sharing only work with HAP
Tim Deegan [Wed, 1 Jun 2011 10:11:43 +0000 (11:11 +0100)]
x86/mm: mem-paging and mem-sharing only work with HAP
so don't let the tools shoot themselves in the foot.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86: Fix spurious_page_fault() for 1GB superpages.
Keir Fraser [Tue, 31 May 2011 12:57:45 +0000 (13:57 +0100)]
x86: Fix spurious_page_fault() for 1GB superpages.

From: Xin Li <xin.li@intel.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agonestedsvm: fix tlb_control
Christoph Egger [Tue, 31 May 2011 12:55:50 +0000 (13:55 +0100)]
nestedsvm: fix tlb_control

On VMRUN emulation evaluate the virtual tlb_control only to match
hw behaviour. Deal with l1 guests which use flush-by-asid w/o
checking cpuid bits or fill tlb_control with random data.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
14 years agox86: cpufreq init cleanup
Liu, Jinsong [Tue, 31 May 2011 12:53:54 +0000 (13:53 +0100)]
x86: cpufreq init cleanup

c/s 20325 changed the AMD cpufreq init logic.  Before that, AMD CPUs
started cpufreq init only once all CPUs were ready.  c/s 20325 changed
this to a per-CPU add, but left the code inelegant.

This patch does a little cleanup work.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
14 years agonestedhvm: Fix wrong memory size of nested shadow_io_bitmap
Keir Fraser [Tue, 31 May 2011 12:52:42 +0000 (13:52 +0100)]
nestedhvm: Fix wrong memory size of nested shadow_io_bitmap
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
While there, simplify and tidy the code.
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agoHVM/SVM: enable tsc scaling ratio for SVM
Wei Huang [Sat, 28 May 2011 07:58:08 +0000 (08:58 +0100)]
HVM/SVM: enable tsc scaling ratio for SVM

Future AMD CPUs support TSC scaling. It allows guests to have a
different TSC frequency from host system using this formula: guest_tsc
= host_tsc * tsc_ratio + vmcb_offset. The tsc_ratio is a 64-bit MSR that
contains a fixed-point number in 8.32 format (8 bits for the integer part
and 32 bits for the fractional part). For instance 0x00000003_80000000
means tsc_ratio=3.5.

This patch enables TSC scaling ratio for SVM. With it, guest VMs don't
need to take a #VMEXIT to calculate a translated TSC value when
running under TSC emulation mode. This can substantially reduce the
rdtsc overhead.
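
As a worked illustration of the formula above, the following sketch
applies an 8.32 fixed-point ratio in software. The function name
scale_tsc() is hypothetical, and the truncation of the fractional
product is an assumption; the hardware's exact rounding is not stated
in this commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * guest_tsc = host_tsc * tsc_ratio + vmcb_offset, with tsc_ratio in 8.32
 * fixed point (integer part in bits 39:32, fraction in bits 31:0).
 * The 64x40-bit product is split so no 128-bit type is needed; the result
 * wraps modulo 2^64 like a real counter.
 */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t tsc_ratio,
                          uint64_t vmcb_offset)
{
    uint64_t r_int, r_frac, host_hi, host_lo, frac_part;

    assert((tsc_ratio >> 40) == 0); /* assume bits above 8.32 are zero */

    r_int   = tsc_ratio >> 32;
    r_frac  = tsc_ratio & 0xffffffffu;
    host_hi = host_tsc >> 32;
    host_lo = host_tsc & 0xffffffffu;

    /* (host_tsc * r_frac) >> 32, computed exactly from the two halves */
    frac_part = host_hi * r_frac + ((host_lo * r_frac) >> 32);

    return host_tsc * r_int + frac_part + vmcb_offset;
}
```

With the example ratio 0x00000003_80000000 (3.5) and a zero offset, a
host TSC of 1000 scales to 3000 + 500 = 3500.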

Signed-off-by: Wei Huang <wei.huang2@amd.com>
14 years agox86/intel: Fix CPUID leaf 7 detection
Yang, Wei [Sat, 28 May 2011 07:57:12 +0000 (08:57 +0100)]
x86/intel: Fix CPUID leaf 7 detection

Must set subleaf to 0 (input ECX==0).

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
14 years agomem_event: Revert pointless, unrelated, and broken (on i386) change in 23434:ef410f262299
Keir Fraser [Sat, 28 May 2011 07:33:54 +0000 (08:33 +0100)]
mem_event: Revert pointless, unrelated, and broken (on i386) change in 23434:ef410f262299

vcpu_pause() is nestable in the hypervisor, hence checking for
already-paused is not required.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agomem_event: Allow memory access listener to perform single step execution.
Aravindh Puthiyaparambil [Fri, 27 May 2011 17:44:26 +0000 (18:44 +0100)]
mem_event: Allow memory access listener to perform single step execution.

Add a new memory event that handles single step. This allows the
memory access listener to handle instructions that modify data within
the execution page.  This can be enabled in the listener by doing:

    xc_set_hvm_param(xch, domain_id, HVM_PARAM_MEMORY_EVENT_SINGLE_STEP,
                     HVMPME_mode_sync)

Now the listener can start single stepping by:

    xc_domain_debug_control(xch, domain_id,
                            XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON, vcpu_id)

And stop single stepping by:

    xc_domain_debug_control(xch, domain_id,
                            XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF, vcpu_id)

Signed-off-by: Aravindh Puthiyaparambil <aravindh@virtuata.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agoFix the filename of the archive produced by 'make deb'
Tim Deegan [Fri, 27 May 2011 17:41:12 +0000 (18:41 +0100)]
Fix the filename of the archive produced by 'make deb'

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agoClean up stdarg handling a little. Fix for NetBSD.
Keir Fraser [Fri, 27 May 2011 14:49:24 +0000 (15:49 +0100)]
Clean up stdarg handling a little. Fix for NetBSD.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agoxen/x86: Add -Wnested-externs to CFLAGS
Tim Deegan [Fri, 27 May 2011 07:56:47 +0000 (08:56 +0100)]
xen/x86: Add -Wnested-externs to CFLAGS

This will catch any new extern declarations that happen actually
inside function bodies.  Unfortunately there's no equivalent
warning for extern declarations at root level in .c files.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agoxen: remove more declarations from C files.
Tim Deegan [Fri, 27 May 2011 07:56:12 +0000 (08:56 +0100)]
xen: remove more declarations from C files.

This patch moves some more, mostly data, extern declarations into
header files.   I haven't been as strict as I was with functions;
in particular there are a number of declarations of assembler labels
that are only used in one place.  I've also left a few compat-mode
tricks, and all the magic in symbols.c

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agolibxl: use preferred syntax for network device creation with upstream qemu
Ian Campbell [Thu, 26 May 2011 16:16:47 +0000 (17:16 +0100)]
libxl: use preferred syntax for network device creation with upstream qemu

Markus Armbruster points out in <m3r582pzc1.fsf@blackfin.pond.sub.org>
on qemu-devel that this is the preferred syntax going forward. Using it avoids
needlessly instantiating a qemu "vlan" and instead creates a simple host
endpoint and device.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: Add 'e820_host' option to config file.
Konrad Rzeszutek Wilk [Thu, 26 May 2011 13:56:37 +0000 (09:56 -0400)]
libxl: Add 'e820_host' option to config file.

.. which will be removed once the auto-ballooning of guests
with PCI devices works. During testing of the patches which provide
a host E820 in a PV guest, certain inconsistencies were found with
guests. When launching a RHEL5 or SLES11 PV guest with 4GB and a PCI device,
the kernel would report 4GB, but have 1.5G "used". What happened was that
the P2M entries that fall within the E820 I/O holes would never be used and
were just wasted. The mechanism to go around this is to shrink the size of the guest
before launch (say memory=2048, maxmem=4096) and then balloon back to 4096M
after start. For PVOPS type kernels it would detect the E820 I/O holes and
deflate by the correct amount but would not inflate back to 4GB.
Manually inflating makes it work.

The fix in the future for guests where the memory amount flows over the
PCI hole, is to launch the guest with decreased amount right up to the cusp
of where the E820 PCI hole starts. Also increase the 'maxmem' by the delta
and then when the guest has launched, balloon up to the delta number.

This will require some careful surgery so for right now this parameter
will guard against unsuspecting users seeing their PV guests memory "vanish."

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxl: Convert E820_UNUSABLE and E820_RAM to E820_UNUSABLE as appropriate.
Konrad Rzeszutek Wilk [Thu, 26 May 2011 13:56:30 +0000 (09:56 -0400)]
libxl: Convert E820_UNUSABLE and E820_RAM to E820_UNUSABLE as appropriate.

Most machines have, after the RAM regions in the E820, a couple of
E820_RESERVED regions, along with E820_ACPI and E820_NVS. On some Intel
machines, the E820 looks like Swiss cheese:

(XEN) Initial Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009d000 (usable)
(XEN)  000000000009d000 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000009cf66000 (usable)
(XEN)  000000009cf66000 - 000000009d102000 (ACPI NVS)
(XEN)  000000009d102000 - 000000009f6bd000 (usable)  <--
(XEN)  000000009f6bd000 - 000000009f6bf000 (reserved)
(XEN)  000000009f6bf000 - 000000009f714000 (usable)  <--
(XEN)  000000009f714000 - 000000009f7bf000 (ACPI NVS)
(XEN)  000000009f7bf000 - 000000009f7e0000 (usable)  <--
(XEN)  000000009f7e0000 - 000000009f7ff000 (ACPI data)
(XEN)  000000009f7ff000 - 000000009f800000 (usable)  <--
(XEN)  000000009f800000 - 00000000a0000000 (reserved)
(XEN)  00000000a0000000 - 00000000b0000000 (reserved)
(XEN)  00000000fc000000 - 00000000fd000000 (reserved)
(XEN)  00000000ffe00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000160000000 (usable)

Which means we have to pay attention to the E820_RAM that are
between the E820_[ACPI,NVS,RESERVED]. If we remove those
E820_RAM (b/c the amount of memory passed to the guest
is less than where those E820 regions reside) from the E820, the
Linux kernel interprets those "gaps" as PCI I/O space.
This is what we are currently doing.

This can be disastrous if we pass in an Intel IGD card which tries
to use the first available PCI I/O space - and ends up
using the MFNs which are actually RAM instead of being the
PCI I/O space.

To make this work, we convert all E820_RAM that are above
the 'target_kb' (those that overlap the 'target_kb'
are truncated appropriately) to be E820_UNUSABLE. We also limit this
alteration up to 4GB. This means that an E820 for a guest
from this (target_kb=1024, maxmem=2048):

[    0.000000] Set 405658 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
[    0.000000]  Xen: 0000000040000000 - 000000009cf66000 (unusable)
[    0.000000]  Xen: 000000009cf66000 - 000000009d102000 (ACPI NVS)
[    0.000000]  Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
[    0.000000]  Xen: 000000009f714000 - 000000009f7bf000 (ACPI NVS)
[    0.000000]  Xen: 000000009f7e0000 - 000000009f7ff000 (ACPI data)
[    0.000000]  Xen: 000000009f800000 - 00000000b0000000 (reserved)
[    0.000000]  Xen: 00000000fc000000 - 00000000fd000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 0000000140800000 (usable)

Will look as so:

[    0.000000] Set 395880 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
[    0.000000]  Xen: 0000000040000000 - 000000009cf66000 (unusable)
[    0.000000]  Xen: 000000009cf66000 - 000000009d102000 (ACPI NVS)
[    0.000000]  Xen: 000000009d102000 - 000000009f6bd000 (unusable)
[    0.000000]  Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
[    0.000000]  Xen: 000000009f6bf000 - 000000009f714000 (unusable)
[    0.000000]  Xen: 000000009f714000 - 000000009f7bf000 (ACPI NVS)
[    0.000000]  Xen: 000000009f7bf000 - 000000009f7e0000 (unusable)
[    0.000000]  Xen: 000000009f7e0000 - 000000009f7ff000 (ACPI data)
[    0.000000]  Xen: 000000009f7ff000 - 000000009f800000 (unusable)
[    0.000000]  Xen: 000000009f800000 - 00000000b0000000 (reserved)
[    0.000000]  Xen: 00000000fc000000 - 00000000fd000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 0000000140800000 (usable)
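
The conversion described above can be sketched as follows. This is not
the libxl code: the function name e820_ram_to_unusable(), the in-place
array interface and the choice to leave alone any entry extending past
4GB are assumptions made for illustration. The type values are the
standard E820 ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define E820_ACPI     3
#define E820_NVS      4
#define E820_UNUSABLE 5

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Mark RAM above ram_end (the guest's target memory) as unusable so the
 * guest kernel does not mistake the resulting gaps for PCI I/O space.
 * A RAM entry straddling ram_end is split in two. Entries reaching past
 * 4GB are left untouched. Returns the new entry count.
 */
static size_t e820_ram_to_unusable(struct e820entry *map, size_t nr,
                                   size_t max_nr, uint64_t ram_end)
{
    const uint64_t limit = 1ULL << 32;
    size_t i;

    assert(nr <= max_nr);
    for (i = 0; i < nr; i++) {
        uint64_t start = map[i].addr;
        uint64_t end = start + map[i].size;

        if (map[i].type != E820_RAM || end > limit || end <= ram_end)
            continue;

        if (start >= ram_end) {
            map[i].type = E820_UNUSABLE;      /* wholly above the target */
            continue;
        }

        /* Straddles ram_end: keep the low part, add an unusable tail. */
        assert(nr < max_nr);
        memmove(&map[i + 2], &map[i + 1], (nr - i - 1) * sizeof(*map));
        map[i].size = ram_end - start;
        map[i + 1].addr = ram_end;
        map[i + 1].size = end - ram_end;
        map[i + 1].type = E820_UNUSABLE;
        nr++;
        i++;                                  /* skip the entry just added */
    }
    return nr;
}
```

Run against a map shaped like the host E820 above with ram_end at 1GB,
the large usable region is split at 0x40000000 and the small usable
islands between the ACPI and reserved regions become unusable, while
the region above 4GB stays usable.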

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxl: Add support for passing in the host's E820 for PCI passthrough
Konrad Rzeszutek Wilk [Thu, 26 May 2011 13:56:26 +0000 (09:56 -0400)]
libxl: Add support for passing in the host's E820 for PCI passthrough

The code that populates the E820 is triggered when the guest
configuration has "pci=['<BDF>,..']", the guest is a PV guest, and
b_info->u.pv.e820_host is set.

do_domain_create calls libxl__e820_alloc when
it notices that the guest is PV, has at least one PCI device, and has
the e820_host flag set.

libxl__e820_alloc calls xc_get_machine_memory_map to retrieve the system's
E820. Then the E820 is sanitized to weed out E820 entries below 16MB, and to
remove any E820_RAM or E820_UNUSED regions as the guest does not need to
know about them. The guest only needs the E820_ACPI, E820_NVS, E820_RESERVED to
get an idea of where the PCI I/O space is. Mostly. The Linux kernel assumes that any
gap in the E820 is PCI I/O space, which means that if we pass
the guest 2GB, and the E820_ACPI and its friends start at 3GB, the
gap between 2GB and 3GB will be considered as PCI I/O space. To guard against
that we also create an E820_UNUSABLE between the region of 'target_kb'
(called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] region.
Lastly, the xc_domain_set_memory_map is called to install the new E820.
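
A rough sketch of the sanitizing pass described above, under stated
assumptions: the function name e820_sanitize() and its interface are
hypothetical, the input is assumed sorted by address, "below 16MB" is
read as "ending at or below 16MB", and E820_UNUSABLE stands in for the
E820_UNUSED regions mentioned in the text. It is not the
libxl__e820_alloc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define E820_ACPI     3
#define E820_NVS      4
#define E820_UNUSABLE 5

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Copy only the ACPI/NVS/RESERVED entries above 16MB from the host map,
 * and cover the gap between ram_end and the first kept entry with an
 * E820_UNUSABLE region so the guest does not treat it as PCI I/O space.
 * Returns the number of entries written to dst.
 */
static size_t e820_sanitize(const struct e820entry *src, size_t nr,
                            struct e820entry *dst, size_t max_dst,
                            uint64_t ram_end)
{
    const uint64_t low_limit = 16ULL << 20;
    size_t i, n = 0;

    for (i = 0; i < nr; i++) {
        uint64_t end = src[i].addr + src[i].size;

        if (end <= low_limit)
            continue;                         /* entry below 16MB */
        if (src[i].type == E820_RAM || src[i].type == E820_UNUSABLE)
            continue;                         /* guest need not know */

        assert(n + 2 <= max_dst);
        if (n == 0 && src[i].addr > ram_end) {
            dst[n].addr = ram_end;            /* guard region */
            dst[n].size = src[i].addr - ram_end;
            dst[n].type = E820_UNUSABLE;
            n++;
        }
        dst[n++] = src[i];
    }
    return n;
}
```

The resulting array is what would then be handed to
xc_domain_set_memory_map; that call is left out here so the sketch
stays self-contained.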

When tested with another PV guest (NetBSD 5.1) the modified E820 gave
it no trouble. The code has also been tested with older "classic" Xen Linux
and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid,
Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).

Memory that is slack or for balloon (so 'maxmem' in guest configuration)
is put behind the machine E820. Which in most cases is after the 4GB.

The reason for doing the fetching of the E820 using the hypercall in
the toolstack (instead of the guest doing it) is that when a guest
would do a hypercall to 'XENMEM_machine_memory_map' it would
retrieve an E820 with I/O range caps added in. Meaning that the
region after 4GB up to end of possible memory would be marked as unusable
and the kernel would not have any space to allocate a balloon
region.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxl: fix build failure (unused variables) for non-Linux platforms
Christoph Egger [Thu, 26 May 2011 14:55:22 +0000 (15:55 +0100)]
libxl: fix build failure (unused variables) for non-Linux platforms

Move variable definitions into Linux-specific sections where they are
actually used. Fixes warning about unused variables.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agotools/libfsimage: build fix (ctype macros applied to char)
Christoph Egger [Thu, 26 May 2011 14:43:22 +0000 (15:43 +0100)]
tools/libfsimage: build fix (ctype macros applied to char)

Fix warning: array subscript has type 'char'
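
For illustration, the usual way to silence this warning is to cast the
argument to unsigned char before handing it to a ctype macro, since a
plain char may be negative. The helper below is a hypothetical example
and is not taken from libfsimage.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Count leading whitespace in s. The cast matters: passing a negative
 * char value to isspace() is undefined behaviour, and on platforms where
 * the ctype macros index a table it triggers
 * "array subscript has type 'char'".
 */
static size_t count_leading_spaces(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (isspace((unsigned char)s[n]))
        n++;
    return n;
}
```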

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agotools: Enable superpages for HVM domains by default
George Dunlap [Thu, 26 May 2011 14:27:34 +0000 (15:27 +0100)]
tools: Enable superpages for HVM domains by default

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agotools: Introduce "allocate-only" page type for migration
George Dunlap [Thu, 26 May 2011 14:27:33 +0000 (15:27 +0100)]
tools: Introduce "allocate-only" page type for migration

To detect presence of superpages on the receiver side, we need
to have strings of sequential pfns sent across on the first iteration
through the memory.  However, as we go through the memory, more and
more of it will be marked dirty, making it wasteful to send those pages.

This patch introduces a new PFINFO type, "XALLOC".  Like PFINFO_XTAB, it
indicates that there is no corresponding page present in the subsequent
page buffer.  However, unlike PFINFO_XTAB, it contains a pfn which should be
allocated.

This new type is only used for migration; but it's placed in
xen/public/domctl.h so that the value isn't reused.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>