Ian Campbell [Thu, 22 Sep 2011 17:37:06 +0000 (18:37 +0100)]
tools: fix install of lomount
$(BIN) went away in 23124:e3d4c34b14a3.
Also there are no *.so, *.a or *.rpm built in here
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 22 Sep 2011 17:35:30 +0000 (18:35 +0100)]
x86: ucode-amd: Don't warn when no ucode is available for a CPU revision
This patch originally comes from the Linux mainline kernel (2.6.33),
find below the patch details:
From: Andreas Herrmann <herrmann.der.user@googlemail.com>
There is no point in warning when there is no ucode available
for a specific CPU revision. Currently the container-file, which
provides the AMD ucode patches for OS load, contains only a few
ucode patches.
It's already clearly indicated by the printed patch_level
whenever new ucode was available and an update happened. So the
warning message is of no help but rather annoying on systems
with many CPUs.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:34:27 +0000 (18:34 +0100)]
XZ: Fix incorrect XZ_BUF_ERROR
From: Lasse Collin <lasse.collin@tukaani.org>
xz_dec_run() could incorrectly return XZ_BUF_ERROR if all of the
following was true:
- The caller knows how many bytes of output to expect and only
  provides that much output space.
- When the last output bytes are decoded, the caller-provided input
  buffer ends right before the LZMA2 end of payload marker. So LZMA2
  won't provide more output anymore, but it won't know it yet and
  thus won't return XZ_STREAM_END yet.
- A BCJ filter is in use and it hasn't left any unfiltered bytes in
  the temp buffer. This can happen with any BCJ filter, but in practice
  it's more likely with filters other than the x86 BCJ.
This fixes <https://bugzilla.redhat.com/show_bug.cgi?id=735408>
where Squashfs thinks that a valid file system is corrupt.
This also fixes a similar bug in single-call mode where the
uncompressed size of a block using BCJ + LZMA2 was 0 bytes and caller
provided no output space. Many empty .xz files don't contain any
blocks and thus don't trigger this bug.
This also tweaks a closely related detail: xz_dec_bcj_run() could call
xz_dec_lzma2_run() to decode into temp buffer when it was known to be
useless. This was harmless although it wasted a minuscule number of
CPU cycles.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:33:48 +0000 (18:33 +0100)]
XZ decompressor: Fix decoding of empty LZMA2 streams
From: Lasse Collin <lasse.collin@tukaani.org>
The old code considered valid empty LZMA2 streams to be corrupt.
Note that a typical empty .xz file has no LZMA2 data at all,
and thus most .xz files having no uncompressed data are handled
correctly even without this fix.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:32:34 +0000 (18:32 +0100)]
VT-d: fix off-by-one error in RMRR validation
(base_addr,end_addr) is an inclusive range, and hence there shouldn't
be a subtraction of 1 in the second invocation of page_is_ram_type().
For RMRRs covering a single page that actually resulted in the
immediately preceding page to get checked (which could have resulted
in a false warning).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
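As a rough illustration of the inclusive-range point above, the following sketch computes the first and last page frame covered by a range whose end address is itself part of the range. The helper names and the 4 KiB page size are assumptions made for this example; this is not the Xen code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Hypothetical helpers for illustration only. */

/* Frame number of the first page touched by the inclusive range [base, end]. */
uint64_t first_pfn(uint64_t base, uint64_t end)
{
    assert(base <= end);
    return base >> PAGE_SHIFT;
}

/*
 * Frame number of the last page touched by the inclusive range [base, end].
 * Because end belongs to the range, no "- 1" is applied to the result.
 */
uint64_t last_pfn(uint64_t base, uint64_t end)
{
    assert(base <= end);
    return end >> PAGE_SHIFT;
}
```

For a single-page range such as 0x1000..0x1fff both helpers yield frame 1; subtracting 1 from the last frame would instead select frame 0, the page immediately preceding the range.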
Jan Beulich [Thu, 22 Sep 2011 17:31:44 +0000 (18:31 +0100)]
VT-d: eliminate a mis-use of pcidevs_lock
dma_pte_clear_one() shouldn't acquire this global lock for the purpose
of processing a per-domain list. Furthermore the function a few lines
earlier has a comment stating that acquiring pcidevs_lock isn't
necessary here (whether that's really correct is another question).
Use the domain's mapping_lock instead to protect the mapped_rmrrs list.
Fold domain_rmrr_mapped() into its sole caller so that the otherwise
implicit dependency on pcidevs_lock there becomes more obvious (see
the comment there).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:31:02 +0000 (18:31 +0100)]
x86: IO-APIC code has no dependency on PCI
The IRQ handling code requires pcidevs_lock to be held only for MSI
interrupts.
As the handling of those has now been fully moved into msi.c (i.e. while
applying fine without it, the patch needs to be applied after the one
titled "x86: split MSI IRQ chip"), io_apic.c no longer needs to
include PCI headers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:29:19 +0000 (18:29 +0100)]
PCI multi-seg: config space accessor adjustments
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:28:38 +0000 (18:28 +0100)]
PCI multi-seg: Pass-through adjustments
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:28:03 +0000 (18:28 +0100)]
PCI multi-seg: AMD-IOMMU specific adjustments
There are two places here where it is entirely unclear to me where the
necessary PCI segment number should be taken from (as IVMD descriptors
don't have such, only IVHD ones do). AMD confirmed that for the time
being it is acceptable to imply that only segment 0 exists.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:27:26 +0000 (18:27 +0100)]
PCI multi-seg: VT-d specific adjustments
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Sep 2011 17:26:54 +0000 (18:26 +0100)]
PCI multi-seg: adjust domctl interface
Again, a couple of directly related functions at once get adjusted to
account for the segment number.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Sat, 17 Sep 2011 23:26:52 +0000 (00:26 +0100)]
x86: split MSI IRQ chip
With the .end() accessor having become optional and noting that
several of the accessors' behavior really depends on the result of
msi_maskable_irq(), this splits the MSI IRQ chip type into two - one
for the maskable ones, and the other for the (MSI only) non-maskable
ones.
At once the implementation of those methods gets moved from io_apic.c
to msi.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Sat, 17 Sep 2011 23:25:57 +0000 (00:25 +0100)]
pass struct irq_desc * to all other IRQ accessors
This is again because the descriptor is generally more useful (with
the IRQ number being accessible in it if necessary) and going forward
will hopefully allow to remove all direct accesses to the IRQ
descriptor array, in turn making it possible to make this some other,
more efficient data structure.
This additionally makes the .end() accessor optional, noting that in a
number of cases the functions were empty.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
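A minimal sketch of what an optional .end() accessor could look like from the caller's side, with accessors taking the descriptor rather than the IRQ number. All type and function names here are hypothetical and much simpler than the real Xen structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; the real Xen structures differ. */
struct irq_desc_sketch;

struct irq_chip_sketch {
    void (*ack)(struct irq_desc_sketch *desc);
    void (*end)(struct irq_desc_sketch *desc);  /* optional: may be NULL */
};

struct irq_desc_sketch {
    unsigned int irq;    /* the IRQ number stays reachable via the descriptor */
    unsigned int acked;  /* counters used only to observe the calls */
    unsigned int ended;
    const struct irq_chip_sketch *chip;
};

void sketch_ack(struct irq_desc_sketch *desc) { desc->acked++; }
void sketch_end(struct irq_desc_sketch *desc) { desc->ended++; }

/* Caller side: invoke .end() only if the chip provides one. */
void handle_irq_sketch(struct irq_desc_sketch *desc)
{
    assert(desc != NULL && desc->chip != NULL);
    desc->chip->ack(desc);
    if (desc->chip->end != NULL)
        desc->chip->end(desc);
}
```

A chip whose .end() would have been an empty function can then simply leave the pointer NULL.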
Jan Beulich [Sat, 17 Sep 2011 23:24:37 +0000 (00:24 +0100)]
pass struct irq_desc * to set_affinity() IRQ accessors
This is because the descriptor is generally more useful (with the IRQ
number being accessible in it if necessary) and going forward will
hopefully allow to remove all direct accesses to the IRQ descriptor
array, in turn making it possible to make this some other, more
efficient data structure.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Sat, 17 Sep 2011 23:22:57 +0000 (00:22 +0100)]
convert more literal uses of cpumask_t to pointers
This is particularly relevant as the number of CPUs to be supported
increases (as recently happened for the default thereof).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Keir Fraser [Sat, 17 Sep 2011 23:21:23 +0000 (00:21 +0100)]
MAINTAINERS: Add Jan Beulich for EFI, x86, ACPI.
Signed-off-by: Keir Fraser <keir@xen.org>
Jan Beulich [Sat, 17 Sep 2011 23:12:19 +0000 (00:12 +0100)]
PCI multi-seg: add new physdevop-s
The new PHYSDEVOP_pci_device_add is intended to be extensible, with a
first extension (to pass the proximity domain of a device) added right
away.
A couple of directly related functions at once get adjusted to account
for the segment number.
Should we deprecate the PHYSDEVOP_manage_pci_* sub-hypercalls?
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Sat, 17 Sep 2011 23:10:03 +0000 (00:10 +0100)]
PCI multi-seg: introduce notion of PCI segments
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Shan Haitao [Sat, 17 Sep 2011 23:01:58 +0000 (00:01 +0100)]
Fix PV CPUID virtualization of XSave
The patch will fix XSave CPUID virtualization for PV guests. The XSave
area size returned by CPUID leaf D is changed dynamically depending on
the XCR0. Tools/libxc only assigns a static value. The fix will adjust
xsave area size during runtime.
Note: This fix is already in HVM cpuid virtualization. And Dom0 is not
affected, either.
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
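To illustrate why a static value is not sufficient, the sketch below derives the XSave area size from the XCR0 bits that are enabled. It hard-codes the architectural layout for x87/SSE and AVX only (512-byte legacy region, 64-byte header, 256-byte YMM component at offset 576); real code would read component offsets and sizes from the CPUID leaf 0xD sub-leaves. The function and macro names are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>

/* XCR0 feature bits (names chosen for this sketch). */
#define SKETCH_XSTATE_FP   (UINT64_C(1) << 0)
#define SKETCH_XSTATE_SSE  (UINT64_C(1) << 1)
#define SKETCH_XSTATE_YMM  (UINT64_C(1) << 2)

#define XSAVE_LEGACY_AND_HEADER 576u  /* 512-byte legacy area + 64-byte header */
#define YMM_OFFSET              576u  /* AVX state starts right after the header */
#define YMM_SIZE                256u

/* Size of the XSave area needed for the features enabled in xcr0. */
uint32_t xsave_size_for_xcr0(uint64_t xcr0)
{
    uint32_t size = XSAVE_LEGACY_AND_HEADER;

    assert(xcr0 & SKETCH_XSTATE_FP);  /* bit 0 of XCR0 is always set */

    if (xcr0 & SKETCH_XSTATE_YMM)
        size = YMM_OFFSET + YMM_SIZE;

    return size;
}
```

With only x87 and SSE enabled the sketch reports 576 bytes; once the guest also enables YMM state in XCR0 it reports 832 bytes, which is why the reported size has to follow XCR0 at runtime.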
Igor Mammedov [Sat, 17 Sep 2011 23:00:26 +0000 (00:00 +0100)]
Clear IRQ_GUEST in irq_desc->status when setting action to NULL.
Looking more closely at the usage of the action field in relation to
the IRQ_GUEST flag, it appears that a set IRQ_GUEST implies that action
is not NULL. As a result it is not safe to set action to NULL and
leave IRQ_GUEST set.
Hence IRQ_GUEST should be cleared in dynamic_irq_cleanup where
action is set to NULL.
In addition, remove the BUG_ON in __pirq_guest_unbind that appears to be
bogus and not needed anymore.
Thanks Paolo Bonzini for NACKing previous patch, and pointing at the
correct solution.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reinstate the BUG_ON, but after the action==NULL check. Since we then
go and start interpreting action as an irq_guest_action_t, the BUG_ON
is relevant here.
More generally, the brute-force nature of dynamic_irq_cleanup() looks
a bit worrying. Possibly there should be more integration with
pirq_guest_unbind() logic, for cleaning up un-acked EOIs and the like.
Signed-off-by: Keir Fraser <keir@xen.org>
Keir Fraser [Sat, 17 Sep 2011 15:44:56 +0000 (16:44 +0100)]
x86/time: verify_tsc_reliability() can be run as a generic initcall.
Signed-off-by: Keir Fraser <keir@xen.org>
Jan Beulich [Sat, 17 Sep 2011 15:27:36 +0000 (16:27 +0100)]
x86-64/EFI: 2.0 hypercall extensions
Flesh out the interface to EFI 2.0 runtime calls and implement what
can reasonably be without actually having active call paths getting
there (i.e. without actual debugging possible: The capsule interfaces
certainly require an environment where an initial implementation can
actually be tested).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Sat, 17 Sep 2011 15:27:06 +0000 (16:27 +0100)]
x86-64/EFI: 2.0 header extensions
Updates from gnu-efi 3.0m. UEFI 2.0 runtime services additions taken
from EDK 1.06.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Sat, 17 Sep 2011 15:26:37 +0000 (16:26 +0100)]
x86/vmx: don't call __vmxoff() blindly
If vmx_vcpu_up() failed, __vmxon() would generally not have got
(successfully) executed, and in that case __vmxoff() will #UD.
Additionally, any panic() during early resume (namely the tboot
related one) would cause vmx_cpu_down() to get executed without
vmx_cpu_up() having run before.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Sat, 17 Sep 2011 15:25:53 +0000 (16:25 +0100)]
x86/tboot: make resume error messages visible
With tboot_s3_resume() running before console_resume(), the error
messages so far printed by it are mostly guaranteed to go into
nirwana. Latch MACs into a static variable instead, and issue the
messages right before calling panic().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Sat, 17 Sep 2011 15:22:54 +0000 (16:22 +0100)]
xen: Move tsc reliability check until after CPUs have booted
AMD CPUs by default enable X86_FEATURE_TSC_RELIABLE, and depend upon a
later check to disable this feature if TSC drift is detected.
Unfortunately, this check is done in time.c:init_xen_time(), which is
done before any secondary CPUs are brought up, and is thus guaranteed
to succeed.
This patch moves the check into its own function, and calls it after
cpus are brought up.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Paul Durrant [Sat, 17 Sep 2011 15:22:13 +0000 (16:22 +0100)]
x86/hvm: Tidy up the viridian code a little and flesh out the APIC
assist MSR handling code.
We don't say that we handle that MSR but Windows assumes it. In
Windows 7 it just wrote to the MSR and we used to handle that
ok. Windows 8 also reads from the MSR so we need to keep a record of
the contents.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
James Carter [Sat, 17 Sep 2011 15:20:58 +0000 (16:20 +0100)]
xen/xsm: Compile error due to naming clash between XSM and EFI runtime
The problem is that efi_runtime_call is the name of both a function in
xen/arch/x86/efi/runtime.c and a member of the xsm_operations struct
in xen/include/xsm/xsm.h. This causes the macro "#define
efi_runtime_call(x) efi_compat_runtime_call(x)" on line 15 of
xen/arch/x86/x86_64/platform_hypercall.c to cause the above compile
error.
Renaming the XSM struct member fixes the problem.
Signed-off-by: James Carter <jwcart2@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Sat, 17 Sep 2011 15:19:26 +0000 (16:19 +0100)]
Avoid race in schedule() when switching schedulers
Selecting the scheduler to call must be done under lock. Otherwise a
race might occur when switching schedulers in a cpupool.
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Olaf Hering [Fri, 16 Sep 2011 11:19:26 +0000 (12:19 +0100)]
mem_event: use different ringbuffers for share, paging and access
Up to now a single ring buffer was used for mem_share, xenpaging and
xen-access. Each helper would have to cooperate and pull only its own
requests from the ring. Unfortunately this was not implemented. And
even if it was, it would make the whole concept fragile because a crash
or early exit of one helper would stall the others.
What happened up to now is that active xenpaging + memory_sharing would
push memsharing requests in the buffer. xenpaging is not prepared for
such requests.
This patch creates an independent ring buffer for mem_share, xenpaging
and xen-access and adds also new functions to enable xenpaging and
xen-access. The xc_mem_event_enable/xc_mem_event_disable functions will
be removed. The various XEN_DOMCTL_MEM_EVENT_* macros were cleaned up.
Due to the removal the API changed, so the SONAME will be changed too.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Tim Deegan <tim@xen.org>
Olaf Hering [Fri, 16 Sep 2011 11:13:31 +0000 (12:13 +0100)]
mem_event: pass mem_event_domain pointer to mem_event functions
Pass a struct mem_event_domain pointer to the various mem_event
functions. This will be used in a subsequent patch which creates
different ring buffers for the memshare, xenpaging and memaccess
functionality.
Remove the struct domain argument from some functions.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
Juergen Gross [Thu, 15 Sep 2011 14:26:07 +0000 (15:26 +0100)]
libxc: Enable cpuid performance counter leaf for HVM
In HVM domains the usable performance counters can be checked
automatically only if cpuid leaf 0x0000000a is accessible.
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Olaf Hering [Thu, 15 Sep 2011 10:08:05 +0000 (11:08 +0100)]
xenstored: allow guest to shutdown all its watches/transactions
During kexec all old watches have to be removed, otherwise the new
kernel will receive unexpected events. Allow a guest to reset itself
and cleanup all of its watches and transactions.
Add a new XS_RESET_WATCHES command to do the reset on behalf of the
guest.
(Changes by iwj: specify the argument to be a single nul byte. Permit
read-only clients to use the new command.)
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 14 Sep 2011 10:38:13 +0000 (11:38 +0100)]
tools: Revert seabios and upstream qemu build changes
These have broken the build and it seems to be difficult to fix. So
we will revert the whole lot for now, and await corrected patch(es).
Revert "fix the build when CONFIG_QEMU is specified by the user"
Revert "tools: fix permissions of git-checkout.sh"
Revert "scripts/git-checkout.sh: Is not bash specific. Invoke with /bin/sh."
Revert "Clone and build Seabios by default"
Revert "Clone and build upstream Qemu by default"
Revert "Rename ioemu-dir as qemu-xen-traditional-dir"
Revert "Move the ioemu-dir-find shell script to an external file"
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Stefano Stabellini [Tue, 13 Sep 2011 14:46:47 +0000 (15:46 +0100)]
fix the build when CONFIG_QEMU is specified by the user
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Committed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Ian Jackson [Tue, 13 Sep 2011 13:52:22 +0000 (14:52 +0100)]
tools: fix permissions of git-checkout.sh
23828:0d21b68f528b introduced a new scripts/git-checkout.sh, but it
had the wrong permissions. chmod +x it, and add a blank line at the
end to make sure it actually gets updated.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Tue, 13 Sep 2011 10:20:57 +0000 (11:20 +0100)]
scripts/git-checkout.sh: Is not bash specific. Invoke with /bin/sh.
Signed-off-by: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 13 Sep 2011 09:43:43 +0000 (10:43 +0100)]
xen,credit1: Add variable timeslice
Add a xen command-line parameter, sched_credit_tslice_ms,
to set the timeslice of the credit1 scheduler.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Andrew Cooper [Tue, 13 Sep 2011 09:33:10 +0000 (10:33 +0100)]
IRQ: IO-APIC support End Of Interrupt for older IO-APICs
The old io_apic_eoi() function using the EOI register only works for
IO-APICs with a version of 0x20. Older IO-APICs do not have an EOI
register so line level interrupts have to be EOI'd by flipping the
mode to edge and back, which clears the IRR and Delivery Status bits.
This patch replaces the current io_apic_eoi() function with one which
takes into account the version of the IO-APIC and EOI's
appropriately.
v2: make recursive call to __io_apic_eoi() to reduce code size.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Stefano Stabellini [Tue, 13 Sep 2011 09:32:24 +0000 (10:32 +0100)]
xen: if mapping GSIs we run out of pirq < nr_irqs_gsi, use the others
PV on HVM guests can have more GSIs than the host, in that case we
could run out of pirq < nr_irqs_gsi. When that happens use pirq >=
nr_irqs_gsi rather than returning an error.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Benjamin Schweikert <b.schweikert@googlemail.com>
Stefano Stabellini [Tue, 13 Sep 2011 09:30:09 +0000 (10:30 +0100)]
Clone and build Seabios by default
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Stefano Stabellini [Tue, 13 Sep 2011 09:29:14 +0000 (10:29 +0100)]
Clone and build upstream Qemu by default
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Stefano Stabellini [Tue, 13 Sep 2011 09:27:53 +0000 (10:27 +0100)]
Rename ioemu-dir as qemu-xen-traditional-dir
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Stefano Stabellini [Tue, 13 Sep 2011 09:27:20 +0000 (10:27 +0100)]
Move the ioemu-dir-find shell script to an external file
Add support for configuring upstream qemu and rename ioemu-remote to
ioemu-dir-remote.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Olaf Hering [Tue, 13 Sep 2011 09:25:32 +0000 (10:25 +0100)]
xenpaging: use batch of pages during final page-in
Map up to RING_SIZE pages in exit path to fill the ring instead of
populating one page at a time.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Ian Campbell [Tue, 13 Sep 2011 09:22:03 +0000 (10:22 +0100)]
hvmloader: don't clear acpi_info after filling in some fields
In particular the madt_lapic0_addr and madt_csum_addr fields are
filled in while building the tables.
This fixes a bluescreen on shutdown with certain versions of Windows.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reported-by: Christoph Egger <Christoph.Egger@amd.com>
Tested-and-acked-by: Christoph Egger <Christoph.Egger@amd.com>
Tim Deegan [Thu, 8 Sep 2011 14:13:06 +0000 (15:13 +0100)]
x86/mm: use new page-order interfaces in nested HAP code
to make 2M and 1G mappings in the nested p2m tables.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 8 Sep 2011 14:13:06 +0000 (15:13 +0100)]
x86/mm: adjust paging interface to return superpage sizes
to the caller of paging_ga_to_gfn_cr3()
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 8 Sep 2011 14:13:06 +0000 (15:13 +0100)]
x86/mm: adjust p2m interface to return superpage sizes
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
Olaf Hering [Wed, 7 Sep 2011 09:37:48 +0000 (10:37 +0100)]
p2m-ept: remove map_domain_page check
map_domain_page() can not fail, remove ASSERT in ept_set_entry().
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Jan Beulich [Wed, 7 Sep 2011 09:37:20 +0000 (10:37 +0100)]
x86: remove unnecessary indirection from irq_complete_move()'s sole parameter
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 7 Sep 2011 09:36:55 +0000 (10:36 +0100)]
bitmap_scnlistprintf() should always zero-terminate its output buffer
... as long as it has non-zero size. So far this would not happen if
the passed in CPU mask was empty.
Also fix the comment describing the return value to actually match
reality.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
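The termination rule can be shown with a small stand-alone formatter. This is a hypothetical helper working on a 64-bit mask, not bitmap_scnlistprintf() itself; it writes the terminator before looking at any bit, so an empty mask still yields an empty string whenever the buffer has non-zero size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the set bits of mask as a range list such
 * as "0-2,5". Returns the number of characters written, not counting the
 * terminating NUL.
 */
size_t mask_to_list(char *buf, size_t len, uint64_t mask)
{
    size_t used = 0;
    int bit = 0;

    assert(buf != NULL);
    if (len == 0)
        return 0;
    buf[0] = '\0';  /* terminate up front: covers the empty-mask case */

    while (bit < 64) {
        int start, end, n;

        if (!(mask & (UINT64_C(1) << bit))) {
            bit++;
            continue;
        }
        start = bit;
        while (bit < 64 && (mask & (UINT64_C(1) << bit)))
            bit++;
        end = bit - 1;

        if (start == end)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, end);

        /* On truncation snprintf has already terminated the buffer. */
        if (n < 0 || (size_t)n >= len - used)
            return strlen(buf);
        used += (size_t)n;
    }
    return used;
}
```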
Keir Fraser [Tue, 6 Sep 2011 14:49:40 +0000 (15:49 +0100)]
docs: Fix 'make docs'
Signed-off-by: Keir Fraser <keir@xen.org>
Olaf Hering [Mon, 5 Sep 2011 14:10:28 +0000 (15:10 +0100)]
mem_event: use mem_event_mark_and_pause() in mem_event_check_ring()
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Olaf Hering [Mon, 5 Sep 2011 14:10:09 +0000 (15:10 +0100)]
mem_event: add ref counting for free request slots
If mem_event_check_ring() is called by many vcpus at the same time
before any of them called also mem_event_put_request(), all of the
callers must assume there are enough free slots available in the ring.
Record the number of request producers in mem_event_check_ring() to
keep track of available free slots.
Add a new mem_event_put_req_producers() function to release a request
attempt made in mem_event_check_ring(). It is required for
p2m_mem_paging_populate() because that function can only modify the
p2m type if there are free request slots. But in some cases
p2m_mem_paging_populate() does not actually have to produce another
request when it is known that the same request was already made
earlier by a different vcpu.
mem_event_check_ring() can not return a reference to a free request
slot because there could be multiple references for different vcpus
and the order of mem_event_put_request() calls is not known. As a
result, incomplete requests could be consumed by the ring user.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
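The bookkeeping described above could be sketched as follows. The struct and function names are hypothetical, the ring itself is reduced to two counters, and locking is left out entirely, so this is a single-threaded illustration of the counting only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified ring bookkeeping; not the Xen structures. */
struct ring_sketch {
    unsigned int free_slots;     /* slots not yet filled with a request */
    unsigned int req_producers;  /* callers holding an unused reservation */
};

/* Reserve a slot if one is left after accounting for pending producers. */
bool check_ring_sketch(struct ring_sketch *r)
{
    assert(r->req_producers <= r->free_slots);
    if (r->free_slots - r->req_producers == 0)
        return false;
    r->req_producers++;
    return true;
}

/* Use the reservation: the request now occupies a slot. */
void put_request_sketch(struct ring_sketch *r)
{
    assert(r->req_producers > 0 && r->free_slots > 0);
    r->req_producers--;
    r->free_slots--;
}

/* Drop the reservation without producing a request. */
void put_req_producers_sketch(struct ring_sketch *r)
{
    assert(r->req_producers > 0);
    r->req_producers--;
}
```

A caller that reserved a slot but then finds that the same request was already made by another vcpu would call the third function, returning its reservation to the pool.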
Andrew Cooper [Mon, 5 Sep 2011 14:09:24 +0000 (15:09 +0100)]
IRQ: Introduce old_vector to irq_cfg
Introduce old_vector to irq_cfg with the same principle as
old_cpu_mask. This removes a brute force loop from
__clear_irq_vector(), and paves the way to correct bitrotten logic
elsewhere in the irq code.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 5 Sep 2011 14:08:38 +0000 (15:08 +0100)]
IRQ: Fold irq_status into irq_cfg
irq_status is an int for each of nr_irqs which represents a single
boolean variable. Fold it into the bitfield in irq_cfg, which saves
768 bytes per CPU with per-cpu IDTs in use.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 5 Sep 2011 14:02:11 +0000 (15:02 +0100)]
IRQ: Remove bit-rotten code
irq_desc.depth is a write only variable.
LEGACY_IRQ_FROM_VECTOR(vec) is never referenced.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
George Dunlap [Mon, 5 Sep 2011 14:00:46 +0000 (15:00 +0100)]
xen, vtd: Fix device check for devices behind PCIe-to-PCI bridges
On some systems, requests from devices behind a PCIe-to-PCI bridge all
appear to the IOMMU as though they come from slot 0, function 0
on that device; so the mapping code must punch a hole for X:0.0 in the
IOMMU for such devices. When punching the hole, if that device has
already been mapped once, we simply need to check ownership to make
sure it's legal. To do so, domain_context_mapping_one() will look up
the device for the mapping with pci_get_pdev() and look for the owner.
However, if there is no device in X:0.0, this look up will fail.
Rather than returning -ENODEV in this situation (causing a failure in
mapping the device), try to get the domain ownership from the iommu
context mapping itself.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
George Dunlap [Mon, 5 Sep 2011 14:00:15 +0000 (15:00 +0100)]
xen: Add global irq_vector_map option, set if using AMD global intremap tables
As mentioned in previous changesets, AMD IOMMU interrupt
remapping tables only look at the vector, not the destination
id of an interrupt. This means that all IRQs going through
the same interrupt remapping table need to *not* share vectors.
The irq "vector map" functionality was originally introduced
after a patch which disabled global AMD IOMMUs entirely. That
patch has since been reverted, meaning that AMD intremap tables
can either be per-device or global.
This patch therefore introduces a global irq vector map option,
and enables it if we're using an AMD IOMMU with a global
interrupt remapping table.
This patch removes the "irq-perdev-vector-map" boolean
command-line option and replaces it with "irq_vector_map",
which can have one of three values: none, global, or per-device.
Setting the irq_vector_map to any value will override the
default that the AMD code sets.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
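A three-valued option of this kind might be parsed along the following lines. The enum, the function names, and the treatment of unrecognised strings (keep the default) are assumptions for this sketch; only the three value strings come from the description above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical representation of the option; VMAP_DEFAULT = not set. */
enum vector_map_mode {
    VMAP_DEFAULT,
    VMAP_NONE,
    VMAP_GLOBAL,
    VMAP_PERDEV
};

/* Map the option string to a mode; unknown strings leave the default. */
enum vector_map_mode parse_irq_vector_map(const char *s)
{
    assert(s != NULL);
    if (strcmp(s, "none") == 0)
        return VMAP_NONE;
    if (strcmp(s, "global") == 0)
        return VMAP_GLOBAL;
    if (strcmp(s, "per-device") == 0)
        return VMAP_PERDEV;
    return VMAP_DEFAULT;
}

/* An explicit user setting overrides whatever the platform code chose. */
enum vector_map_mode effective_vector_map(enum vector_map_mode user,
                                          enum vector_map_mode platform)
{
    return user != VMAP_DEFAULT ? user : platform;
}
```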
Keir Fraser [Fri, 2 Sep 2011 13:56:26 +0000 (14:56 +0100)]
ns16550: Simplify UART and UART-interrupt probing logic.
1. No need to check for UART existence in the polling routine. We
already check for UART existence during boot-time initialisation (see
check_existence() function).
2. No obvious need to send a dummy character. The poll routine will
run until a character is eventually sent, but for the most common use
of serial ports (console logging) that will happen almost immediately.
Signed-off-by: Keir Fraser <keir@xen.org>
Ian Campbell [Thu, 1 Sep 2011 16:46:43 +0000 (17:46 +0100)]
xen/x86: only support >128 CPUs on x86_64
32 bit cannot cope with 256 cpus and hits:
/* At least half the ioremap space should be available to us. */
BUILD_BUG_ON(IOREMAP_VIRT_START + (IOREMAP_MBYTES << 19) >=
FIXADDR_START);
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tim Deegan [Thu, 1 Sep 2011 08:39:25 +0000 (09:39 +0100)]
x86/mm: use defines for page sizes rather than hardcoding them.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
Stefano Stabellini [Wed, 31 Aug 2011 14:23:49 +0000 (15:23 +0100)]
xen: get_free_pirq: make sure that the returned pirq is allocated
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Stefano Stabellini [Wed, 31 Aug 2011 14:23:34 +0000 (15:23 +0100)]
xen: __hvm_pci_intx_assert should check for gsis remapped onto pirqs
If the isa irq corresponding to a particular gsi is disabled while the
gsi is enabled, __hvm_pci_intx_assert will always inject the gsi
through the violapic, even if the gsi has been remapped onto a pirq.
This patch makes sure that even in this case we inject the
notification appropriately.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Stefano Stabellini [Wed, 31 Aug 2011 14:23:12 +0000 (15:23 +0100)]
xen: fix hvm_domain_use_pirq's behavior
hvm_domain_use_pirq should return true when the guest is using a
certain pirq, no matter if the corresponding event channel is
currently enabled or disabled. As an additional complication, qemu is
going to request pirqs for passthrough devices even for Xen unaware
HVM guests, so we need to wait for an event channel to be connected
before considering the pirq of a passthrough device as "in use".
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Andrew Cooper [Wed, 31 Aug 2011 14:19:24 +0000 (15:19 +0100)]
IRQ: manually EOI migrating line interrupts
When migrating IO-APIC line level interrupts between PCPUs, the
migration code rewrites the IO-APIC entry to point to the new
CPU/Vector before EOI'ing it.
The EOI process says that EOI'ing the Local APIC will cause a
broadcast with the vector number, which the IO-APIC must listen to to
clear the IRR and Status bits.
In the case of migrating, the IO-APIC has already been
reprogrammed so the EOI broadcast with the old vector fails to match
the new vector, leaving the IO-APIC with an outstanding vector,
preventing any more use of that line interrupt. This causes a lockup
especially when your root device is using PCI INTA (megaraid_sas
driver *ehem*)
However, the problem is mostly hidden because send_cleanup_vector()
causes a cleanup of all moving vectors on the current PCPU in such a
way which does not cause the problem, and if the problem has occurred,
the writes it makes to the IO-APIC clear the IRR and Status bits
which unlocks the problem.
This fix is distinctly a temporary hack, waiting on a cleanup of the
irq code. It checks for the edge case where we have moved the irq,
and manually EOI's the old vector with the IO-APIC which correctly
clears the IRR and Status bits. Also, it protects the code which
updates irq_cfg by disabling interrupts.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Kevin Tian [Wed, 31 Aug 2011 14:18:23 +0000 (15:18 +0100)]
x86: add irq count for IPIs
Such a count is useful to assist decision making in the cpuidle governor,
while without this patch only device interrupts through do_IRQ are
currently counted.
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Dietmar Hahn [Wed, 31 Aug 2011 14:17:45 +0000 (15:17 +0100)]
vpmu: Add processors Westmere E7-8837 and SandyBridge i5-2500 to the vpmu list
Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Laszlo Ersek [Wed, 31 Aug 2011 14:16:14 +0000 (15:16 +0100)]
x86: Increase the default NR_CPUS to 256
Changeset 21012:ef845a385014 bumped the default to 128 about one and a
half years ago. Increase it now to 256, as systems with e.g. 160
logical CPUs are becoming (have become) common.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Christoph Egger [Wed, 31 Aug 2011 14:15:41 +0000 (15:15 +0100)]
nestedsvm: VMRUN doesn't use nextrip
VMRUN does not use nextrip. So remove pointless assignment.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Wed, 31 Aug 2011 14:14:49 +0000 (15:14 +0100)]
x86-64: Fix off-by-one error in __addr_ok() macro
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Ian Jackson [Tue, 30 Aug 2011 10:46:58 +0000 (11:46 +0100)]
Merge
Keir Fraser [Sat, 27 Aug 2011 11:20:19 +0000 (12:20 +0100)]
Config.mk: Include optional .config file *first* rather than *last*
Allows the core of Config.mk to correctly respond to any configuration
overrides specified in the .config file.
Signed-off-by: Keir Fraser <keir@xen.org>
Jan Beulich [Sat, 27 Aug 2011 11:15:07 +0000 (12:15 +0100)]
x86: drop unused parameter from msi_compose_msg() and setup_msi_irq()
This particularly eliminates the bogus passing of NULL by hpet.c.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Jan Beulich [Sat, 27 Aug 2011 11:14:38 +0000 (12:14 +0100)]
x86: work around certain Intel BIOSes causing (transient) hangs during boot
They apparently leave the USB legacy emulation bits set in ICH10's
SMI Control and Enable register, but fail to handle the resulting SMIs
gracefully. The hangs can apparently extend indefinitely, but are
commonly observed to last between a few seconds and a minute.
This assumes that only ICH10-based systems on Intel main boards with
Intel BIOS may be affected. Until Intel comes up with a more precise
identification of affected BIOSes, all Intel ones on Intel boards
will get this workaround applied.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Jan Beulich [Sat, 27 Aug 2011 11:13:39 +0000 (12:13 +0100)]
x86-64: allow mapping mmcfg space for high numbered PCI segments
Rather than using the segment number directly when determining the
virtual address for a particular mmconfig block, use the array index
instead. Thus a system with (perhaps significantly) fewer than 2048 PCI
segments, but with some having numbers beyond 2047, can actually have
all its mmconfig blocks mapped.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
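The mmcfg entry above can be illustrated with a minimal C sketch. It is not the patched code: the struct, the function names, WINDOW_BASE and the error convention (returning 0) are hypothetical. The slot size assumes a full 256-bus segment at 1 MiB of configuration space per bus.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS   2048u                 /* mapping slots, per the entry above */
#define SLOT_SIZE   (256ull << 20)        /* assumed: 256 buses * 1 MiB each */
#define WINDOW_BASE 0x0000100000000000ull /* hypothetical virtual base */

/* Hypothetical table entry; only the segment number matters here. */
struct mmcfg_entry_sketch {
    uint16_t segment;
};

/* Old scheme: slot chosen by segment number. Returns 0 when unmappable. */
uint64_t va_by_segment(const struct mmcfg_entry_sketch *e)
{
    if (e->segment >= MAX_SLOTS)
        return 0;
    return WINDOW_BASE + (uint64_t)e->segment * SLOT_SIZE;
}

/* New scheme: slot chosen by the entry's index in the table. */
uint64_t va_by_index(size_t nr_entries, size_t idx)
{
    if (idx >= nr_entries || idx >= MAX_SLOTS)
        return 0;
    return WINDOW_BASE + (uint64_t)idx * SLOT_SIZE;
}
```

With a two-entry table whose second entry is segment 4096, va_by_segment() rejects that entry while va_by_index() places it in slot 1.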
Kaushik Kumar Ram [Fri, 26 Aug 2011 13:58:41 +0000 (14:58 +0100)]
Add missing 'break' statement.
Without the 'break', assigning a pci device to a PV guest results in an abort,
since the code always falls through to the default abort case in the switch
statement.
Signed-off-by: Kaushik Kumar Ram <kaushik@rice.edu>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
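The fall-through described in the 'break' entry above can be sketched in C as follows. The enum, the function names and the return codes are hypothetical and are not taken from the patched source; where the real code aborts, this sketch returns -1 so that both variants can be exercised.

```c
/* Hypothetical guest types; the real code uses its own constants. */
enum guest_type_sketch { GUEST_HVM, GUEST_PV };

/* Buggy variant: the PV case lacks a 'break' and reaches 'default'. */
int assign_device_buggy(enum guest_type_sketch t)
{
    int rc = -1;

    switch (t) {
    case GUEST_HVM:
        rc = 0;
        break;
    case GUEST_PV:
        rc = 0;
        /* missing 'break': control falls through to the default case */
    default:
        rc = -1; /* stands in for the abort in the real code */
    }
    return rc;
}

/* Fixed variant: each handled case ends with 'break'. */
int assign_device_fixed(enum guest_type_sketch t)
{
    int rc = -1;

    switch (t) {
    case GUEST_HVM:
        rc = 0;
        break;
    case GUEST_PV:
        rc = 0;
        break;
    default:
        rc = -1; /* stands in for the abort in the real code */
        break;
    }
    return rc;
}
```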
Tim Deegan [Fri, 26 Aug 2011 12:06:39 +0000 (13:06 +0100)]
Update my email address in MAINTAINERS
Signed-off-by: Tim Deegan <tim@xen.org>
Christoph Egger [Fri, 26 Aug 2011 12:00:52 +0000 (13:00 +0100)]
x86/mm/p2m: use defines for page sizes
Use defines for page sizes instead of hardcoding the value.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 25 Aug 2011 11:03:14 +0000 (12:03 +0100)]
passthrough: Turn on IOMMU/HAP pagetable sharing by default.
Signed-off-by: Tim Deegan <tim@xen.org>
David Vrabel [Wed, 24 Aug 2011 08:33:10 +0000 (09:33 +0100)]
x86: don't limit dom0's maximum reservation by the available memory
Set dom0's initial maximum reservation using the max value supplied in
the dom0_mem command line option without limiting it by the available
memory.
This allows dom0 to make use of any hotplugged memory without having
to also adjust the maximum reservation.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Tim Deegan [Tue, 23 Aug 2011 09:54:27 +0000 (10:54 +0100)]
Passthrough: fix iommu_use_hap_pt() to use hap_enabled()
In line with 22924:86000076dcee, paging_mode_hap(d) shouldn't be
used in HAP internals that are called during HAP setup.
Signed-off-by: Tim Deegan <tim@xen.org>
Tim Deegan [Tue, 23 Aug 2011 09:43:25 +0000 (10:43 +0100)]
IOMMU: only try to share IOMMU and HAP tables for domains with P2M.
This makes the check more precise, and brings VT-d in line with AMD code.
Signed-off-by: Tim Deegan <tim@xen.org>
Tim Deegan [Tue, 23 Aug 2011 09:43:20 +0000 (10:43 +0100)]
VT-d: Explicitly test EPT capabilities during IOMMU init
because the cached version isn't set up until the EPT init happens.
Signed-off-by: Tim Deegan <tim@xen.org>
George Dunlap [Mon, 22 Aug 2011 15:15:33 +0000 (16:15 +0100)]
x86: Fix up irq vector map logic
We need to make sure that cfg->used_vector is only cleared once;
otherwise there may be a race condition that allows the same vector to
be assigned twice, defeating the whole purpose of the map.
This makes two changes:
* __clear_irq_vector() only clears the vector if the irq is not being
moved
* smp_irq_move_cleanup_interrupt() only clears used_vector if this
is the last place it's being used (move_cleanup_count==0 after
decrement).
Also make use of asserts more consistent, to catch this kind of logic
bug in the future.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
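The two rules in the irq vector map entry above can be modelled with a small C sketch. This is a simplification under stated assumptions, not the hypervisor code: the struct and function names are hypothetical, and a single boolean stands in for the bit in the used_vector map.

```c
#include <stdbool.h>

/* Simplified stand-in for the per-IRQ configuration; the layout and
 * most field names are hypothetical. */
struct irq_cfg_sketch {
    bool move_in_progress;           /* the IRQ is migrating between CPUs */
    unsigned int move_cleanup_count; /* CPUs that still have to clean up */
    bool vector_in_map;              /* stands in for the used_vector bit */
    unsigned int release_count;      /* how many times the bit was cleared */
};

static void release_vector(struct irq_cfg_sketch *cfg)
{
    cfg->vector_in_map = false;
    cfg->release_count++;
}

/* Analogue of __clear_irq_vector(): leave the map alone during a move. */
void clear_irq_vector_sketch(struct irq_cfg_sketch *cfg)
{
    if (!cfg->move_in_progress)
        release_vector(cfg);
}

/* Analogue of the move-cleanup handler: only the last CPU clears the bit. */
void move_cleanup_sketch(struct irq_cfg_sketch *cfg)
{
    if (cfg->move_cleanup_count == 0)
        return;
    if (--cfg->move_cleanup_count == 0)
        release_vector(cfg);
}
```

Starting from a move in progress with two CPUs pending, the bit is cleared exactly once, by the second cleanup call.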
Keir Fraser [Mon, 22 Aug 2011 15:15:19 +0000 (16:15 +0100)]
Adjust non-debug ASSERT() definition to avoid unused-variable warnings.
Signed-off-by: Keir Fraser <keir@xen.org>
Christoph Egger [Mon, 22 Aug 2011 13:37:29 +0000 (14:37 +0100)]
nested-p2m: suppress np2m flushes during p2m setup
There is no need to send IPIs within p2m_alloc_table() via
set_p2m_entry().
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Mon, 22 Aug 2011 09:12:36 +0000 (10:12 +0100)]
ACPI: add _PDC input override mechanism
In order to have Dom0 call _PDC with input fully representing Xen's
capabilities, and in order to avoid building knowledge of Xen
implementation details into Dom0, this provides a mechanism by which
the Dom0 kernel can, once it filled the _PDC input buffer according to
its own knowledge, present the buffer to Xen to apply overrides for
the parts of the C-, P-, and T-state management that it controls. This
is particularly to address the dependency of Xen using MWAIT to enter
certain C-states on the availability of the break-on-interrupt
extension (which the Dom0 kernel should have no need to know about).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Jan Beulich [Mon, 22 Aug 2011 09:11:10 +0000 (10:11 +0100)]
x86/IO-APIC: clear remoteIRR in clear_IO_APIC_pin()
It was found that in a crash scenario, the remoteIRR bit in an IO-APIC
RTE could be left set, causing problems when bringing up a kdump
kernel. While this generally is most important to be taken care of in
the new kernel (which usually would be a native one), it still seems
desirable to also address this problem in Xen so that (a) the problem
doesn't bite Xen when used as a secondary emergency kernel and (b) an
attempt is being made to save un-fixed secondary kernels from running
into said problem.
Based on a Linux patch from suresh.b.siddha@intel.com.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Jan Beulich [Mon, 22 Aug 2011 09:10:39 +0000 (10:10 +0100)]
pm: don't truncate processors' ACPI IDs to 8 bits
This is just another adjustment to allow systems with very many CPUs
(or unusual ACPI IDs) to be properly power-managed.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
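As an illustration of why 8-bit truncation matters in the ACPI ID entry above, the following C sketch shows two distinct IDs becoming indistinguishable once narrowed. The function names and the example IDs are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Narrow a 32-bit ACPI processor ID to 8 bits, as a too-small field would. */
uint8_t acpi_id_truncated(uint32_t acpi_id)
{
    return (uint8_t)acpi_id; /* keeps only the low 8 bits */
}

/* Nonzero when two distinct IDs collide after truncation. */
int acpi_ids_collide_8bit(uint32_t a, uint32_t b)
{
    return a != b && acpi_id_truncated(a) == acpi_id_truncated(b);
}
```

For example, IDs 0x004 and 0x104 both truncate to 0x04, so per-processor power management data keyed on the 8-bit value would be attributed to the wrong processor.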
Wei Wang [Mon, 22 Aug 2011 09:10:04 +0000 (10:10 +0100)]
AMD IOMMU: remove iommu tlb flush for non-present entries
Fixes dom0 boot on some systems.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
David Vrabel [Mon, 22 Aug 2011 09:05:27 +0000 (10:05 +0100)]
x86: use 'dom0_mem' to limit the number of pages for dom0
Use the 'dom0_mem' command line option to set the maximum number of
pages for dom0. dom0 can then use the XENMEM_maximum_reservation
memory op to automatically find this limit and reduce the size of any
page tables etc.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Tim Deegan [Fri, 19 Aug 2011 12:29:27 +0000 (13:29 +0100)]
nestedhvm: avoid endless loop of nested page faults
Stop sending IPIs to flush the nested-on-nested pagetable
after write operations. Instead flush the TLB only.
This fixes an endless loop of nested page faults after
adding an entry to the nested-on-nested pagetable.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Tim Deegan <tim@xen.org>
Tim Deegan [Fri, 19 Aug 2011 12:29:25 +0000 (13:29 +0100)]
nestedhvm: do not send IPIs twice
In p2m_get_nestedp2m() there is no need to send IPIs via
nestedhvm_vmcx_flushtlb() since p2m_flush_table() already
did that.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Fri, 19 Aug 2011 08:58:22 +0000 (09:58 +0100)]
x86/KEXEC: disable hpet legacy broadcasts earlier
On x2apic machines which booted in xapic mode,
hpet_disable_legacy_broadcast() sends an event check IPI to all online
processors. This leads to a protection fault as the genapic blindly
pokes x2apic MSRs while the local apic is in xapic mode.
One option is to change genapic when we shut down the local apic, but
there are still problems with trying to IPI processors in the online
processor map which are actually sitting in NMI loops.
Another option is to have each CPU take itself out of the online CPU
map during the NMI shootdown.
Realistically however, disabling hpet legacy broadcasts earlier in the
kexec path is the easiest fix to the problem.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jeremy Fitzhardinge [Fri, 19 Aug 2011 08:57:42 +0000 (09:57 +0100)]
mini-os: work around ld bug causing stupid CTOR count
I'm seeing pvgrub crashing when running CTORs. It appears it's because
the magic in the linker script is generating junk. If I get ld to
output a map, I see:
    .ctors  0x0000000000097000  0x18
            0x0000000000097000        __CTOR_LIST__ = .
            0x0000000000097000  0x4   LONG 0x25c04 (((__CTOR_END__ - __CTOR_LIST__) / 0x4) - 0x2)
     *(.ctors)
     .ctors 0x0000000000097004  0x10  /home/jeremy/hg/xen/unstable/stubdom/mini-os-x86_32-grub/mini-os.o
            0x0000000000097014  0x4   LONG 0x0
            0x0000000000097018        __CTOR_END__ = .
In other words, somehow ((0x97018-0x97000) / 4) - 2 = 0x25c04, when it
should evaluate to 4.
The specific crash is that the ctor loop tries to call the NULL
sentinel. I'm seeing the same with the DTOR list.
Avoid this by terminating the loop with the NULL sentinel, and get rid
of the CTOR count entirely.
From: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Keir Fraser <keir@xen.org>
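The sentinel-terminated loop described in the mini-os entry above could look roughly like the following C sketch. It is an illustration under assumptions, not the mini-os code: the list is an ordinary array rather than a linker-generated section, and all names are hypothetical. The first function just redoes the arithmetic from the map output.

```c
#include <stddef.h>

typedef void (*ctor_fn)(void);

/* The count the linker script expression should have produced for the
 * addresses shown in the map: ((0x97018 - 0x97000) / 4) - 2. */
unsigned long expected_ctor_count(void)
{
    return ((0x97018ul - 0x97000ul) / 0x4ul) - 0x2ul;
}

static int ctor_calls; /* records which demo constructors ran */

static void ctor_a(void) { ctor_calls += 1; }
static void ctor_b(void) { ctor_calls += 10; }

/* Stand-in for the .ctors contents: entries followed by a NULL sentinel. */
static const ctor_fn demo_ctors[] = { ctor_a, ctor_b, NULL };

/* Call every entry until the NULL sentinel; no count word is consulted.
 * Returns the number of constructors that were run. */
int run_ctors(const ctor_fn *list)
{
    int n = 0;

    for (; *list != NULL; list++) {
        (*list)();
        n++;
    }
    return n;
}

/* Convenience wrapper for the demo list; returns the accumulated marker. */
int run_demo_ctors(void)
{
    ctor_calls = 0;
    run_ctors(demo_ctors);
    return ctor_calls;
}
```

Because the loop stops at the sentinel instead of trusting a stored count, a bogus count word such as 0x25c04 can no longer cause the NULL entry to be called.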
Jan Beulich [Fri, 19 Aug 2011 08:55:20 +0000 (09:55 +0100)]
x86-64/EFI: construct EDD data from device path protocol information
In the absence of a BIOS to handle INT13 requests, this information
must be constructed artificially instead when booted from EFI.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Jan Beulich [Fri, 19 Aug 2011 08:54:53 +0000 (09:54 +0100)]
x86: trampoline cleanup
To make future changes less error prone, and to slightly simplify a
possible future conversion to a relocatable trampoline even for the
multiboot path (pretty desirable given that we had to change the
trampoline base a number of times to escape collisions with firmware
placed data),
- remove final uses of bootsym_phys() from trampoline.S, allowing the
symbol to be undefined before including this file (to make sure no
new references get added)
- replace two easy to deal with uses of bootsym_phys() in head.S
- remove an easy to replace reference to BOOT_TRAMPOLINE
Signed-off-by: Jan Beulich <jbeulich@novell.com>