xen.git
16 years ago[IA64] Fix some IPF Xen VT-d bugs.
Isaku Yamahata [Tue, 6 Jan 2009 09:05:32 +0000 (18:05 +0900)]
[IA64] Fix some IPF Xen VT-d bugs.

In arch_domain_create(): when Xen creates Dom0, need_iommu(d) is false,
so iommu_domain_init() is not invoked and, as a result, the IOMMU is
never enabled properly.
Note: d->need_iommu is set to 1 only by assign_device(), which is called
via the XEN_DOMCTL_assign_device hypercall and is never called for dom0.

In IA64 Xen, physdev_map_pirq()/physdev_unmap_pirq() are kept as dummies
since we don't support MSI in IA64 Xen now, but they shouldn't return
-ENOSYS because xend invokes them (the x86 versions are necessary
for x86 Xen). In IPF Xen, if they return -ENOSYS, xend does not allow us
to create an IPF HVM guest with devices assigned. They can return 0 instead.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
16 years ago[IA64] fix ia64_fast_eoi hypercall to catch up PHYSDEVOP_pirq_eoi_gmfn
Isaku Yamahata [Mon, 5 Jan 2009 05:13:38 +0000 (14:13 +0900)]
[IA64] fix ia64_fast_eoi hypercall to catch up PHYSDEVOP_pirq_eoi_gmfn

ia64 Xen Linux uses ia64_fast_eoi to do EOI, so c/s 18862:f0a9a58608a0
should have changed od_pir_guest_eoi() as well.
This patch changes it.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years ago[IA64] fix mis-setting ed bit for itlb entry for hvm domain.
Isaku Yamahata [Mon, 5 Jan 2009 03:24:58 +0000 (12:24 +0900)]
[IA64] fix mis-setting ed bit for itlb entry for hvm domain.

This patch fixes a Windows BSOD issue caused by mis-setting the pte's ED bit
for an itlb entry.
The hash vTLB is a unified TLB and doesn't differentiate itc and dtc
in its implementation, so the itlb_miss handler may reference a dtlb entry in
the hash vTLB.
That can cause problems, because the dtlb entry's ED bit may differ from
the itlb's setting.
Since the case is very rare, just purge the corresponding entry in the hash
vTLB once it is found and let the guest OS determine how to set the ED bit for
the itlb mapping.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>

16 years ago[IA64] paravirtualize itc and support save/restore.
Isaku Yamahata [Mon, 5 Jan 2009 03:24:58 +0000 (12:24 +0900)]
[IA64] paravirtualize itc and support save/restore.

ia64 Linux 2.6.18 only uses ar.itc for local ticks, so
ar.itc didn't need paravirtualization and could be worked around
on save/restore.
However, recent ia64 Linux uses ar.itc for sched_clock(),
CONFIG_VIRT_CPU_ACCOUNTING and others, so ar.itc needs
paravirtualization. Although most of the work is done in the guest OS,
save/restore needs hypervisor support.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years ago[IA64] remove warning.
Isaku Yamahata [Mon, 5 Jan 2009 03:24:58 +0000 (12:24 +0900)]
[IA64] remove warning.

This patch removes the following warning.
> hypercall.c:205: warning: implicit declaration of function 'vmx_lazy_load_fpu'

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agomerge with xen-unstable.hg
Isaku Yamahata [Wed, 24 Dec 2008 03:52:34 +0000 (12:52 +0900)]
merge with xen-unstable.hg

16 years ago[IA64]: Fix BUILD_BUG_ON().
Isaku Yamahata [Wed, 24 Dec 2008 03:50:57 +0000 (12:50 +0900)]
[IA64]: Fix BUILD_BUG_ON().

This is the ia64 counterpart of 1419a73316e1.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years ago[IA64]: fix compilation error.
Isaku Yamahata [Wed, 24 Dec 2008 03:50:55 +0000 (12:50 +0900)]
[IA64]: fix compilation error.

BUILD_BUG_ON() was changed so that it can no longer be
used with symbol values.
Fortunately dom_fpswa_hypercall_patch() isn't performance critical,
so replace BUILD_BUG_ON() with BUG_ON().
Also fix the condition, which had an off-by-one bug.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoi386: Fix the build.
Keir Fraser [Mon, 22 Dec 2008 13:48:40 +0000 (13:48 +0000)]
i386: Fix the build.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoshadow: Remove warnings about writes to read-only BIOS area. These
Keir Fraser [Mon, 22 Dec 2008 13:43:13 +0000 (13:43 +0000)]
shadow: Remove warnings about writes to read-only BIOS area. These
attempts can be legitimate.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoCleanup Intel CMCI support.
Keir Fraser [Mon, 22 Dec 2008 12:07:20 +0000 (12:07 +0000)]
Cleanup Intel CMCI support.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoEnable CMCI for Intel CPUs
Keir Fraser [Mon, 22 Dec 2008 08:12:33 +0000 (08:12 +0000)]
Enable CMCI for Intel CPUs

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Liping Ke <liping.ke@intel.com>

17 years agoSupport S3 for MSI interrupt
Keir Fraser [Fri, 19 Dec 2008 14:56:36 +0000 (14:56 +0000)]
Support S3 for MSI interrupt

From: "Jiang, Yunhong" <yunhong.jiang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoChange the pcidevs_lock from rw_lock to spin_lock
Keir Fraser [Fri, 19 Dec 2008 14:52:32 +0000 (14:52 +0000)]
Change the pcidevs_lock from rw_lock to spin_lock

As pcidevs_lock now protects more than just the alldevs_list,
it doesn't benefit much from being an rw_lock. Also, the
previous patch 18906:2941b1a97c60 was wrong to use read_lock to protect some
sensitive data (thanks to Espen for pointing that out).

This patch also contains two minor fixes:
a) deassign_device will deadlock when trying to get the pcidevs_lock if
called by pci_release_devices, so move the locking to the caller.
b) iommu_domain_teardown should not ASSERT on the pcidevs_lock
because it just updates the domain's VT-d mapping.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
17 years agoCPUIDLE: adjust cstate statistic interface
Keir Fraser [Fri, 19 Dec 2008 14:44:40 +0000 (14:44 +0000)]
CPUIDLE: adjust cstate statistic interface

1. change unit of residency, PM ticks -> ns.
2. output C0 usage & residency.
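
The unit change in item 1 can be sketched as follows. This is a minimal illustration, assuming the ACPI PM timer's nominal rate of 3.579545 MHz; the helper name pm_ticks_to_ns() is hypothetical and is not taken from the patch.

```c
#include <stdint.h>

/* Nominal ACPI PM timer frequency in Hz (3.579545 MHz). */
#define PM_TIMER_HZ 3579545ULL
#define NS_PER_SEC  1000000000ULL

/*
 * Hypothetical helper: convert a PM-tick count to nanoseconds.
 * Whole seconds and the sub-second remainder are converted separately
 * so that the intermediate product stays within 64 bits for any
 * realistic residency value.
 */
uint64_t pm_ticks_to_ns(uint64_t ticks)
{
    uint64_t secs = ticks / PM_TIMER_HZ;
    uint64_t rem  = ticks % PM_TIMER_HZ;

    return secs * NS_PER_SEC + (rem * NS_PER_SEC) / PM_TIMER_HZ;
}
```

One tick is roughly 279 ns, so reporting nanoseconds removes the need for consumers to know the timer frequency.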

Signed-off-by: Wei Gang <gang.wei@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoVT-d: Fix PCI-X device assignment
Keir Fraser [Fri, 19 Dec 2008 13:42:04 +0000 (13:42 +0000)]
VT-d: Fix PCI-X device assignment

When assigning a PCI device, the current code just maps its bridge and its
secondary bus number and devfn 0. This doesn't work for PCI-X device
assignment, because the request may carry the source-id from the original
PCI-X transaction or the source-id provided by the bridge. The code needs to
map the device itself and its upstream bridges up to the PCIe-to-PCI/PCI-X
bridge.

In addition, add description for DEV_TYPE_PCIe_BRIDGE and
DEV_TYPE_PCI_BRIDGE for understandability.

Signed-off-by: Weidong Han <weidong.han@intel.com>
17 years agoxend: Actually restrict a domU's access to xenstore when we mean to --
Keir Fraser [Thu, 18 Dec 2008 17:18:28 +0000 (17:18 +0000)]
xend: Actually restrict a domU's access to xenstore when we mean to --
this means that in some cases it cannot be owner of its own xenstore
nodes.

This bug was pointed out by Daniel Berrange at Red Hat. This patch is
my own more generic fix that automatically covers a range of callers
(albeit the patch is arguably a bit of a hack ;-).

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agox86: Quieten tracing in msi startup/shutdown handlers.
Keir Fraser [Thu, 18 Dec 2008 17:14:27 +0000 (17:14 +0000)]
x86: Quieten tracing in msi startup/shutdown handlers.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agorombios: Update to Bochs latest
Keir Fraser [Thu, 18 Dec 2008 14:52:53 +0000 (14:52 +0000)]
rombios: Update to Bochs latest

Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
17 years agoxenoprof: Add support for Intel Dunnington cores.
Keir Fraser [Thu, 18 Dec 2008 11:29:33 +0000 (11:29 +0000)]
xenoprof: Add support for Intel Dunnington cores.

Signed-off-by: Xiaowei Yang <Xiaowei.yang@intel.com>
Signed-off-by: Ting Zhou <ting.g.zhou@intel.com>
17 years agox86, shadow: Avoid duplicates in fixup tables.
Keir Fraser [Thu, 18 Dec 2008 11:28:25 +0000 (11:28 +0000)]
x86, shadow: Avoid duplicates in fixup tables.

Avoid entering duplicates in fixup tables, reducing fixup evictions.

Signed-off-by: Gianluca Guida <gianluca.guida@eu.citrix.com>
17 years agoFix mini-os ia64 compilation
Keir Fraser [Thu, 18 Dec 2008 11:27:37 +0000 (11:27 +0000)]
Fix mini-os ia64 compilation

- Avoid nested function to avoid a trampoline.
- Do not link mini-os_app.o when it is empty.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
17 years agox86, hvm: Don't ever call the shadow code to fix a page fault in an
Keir Fraser [Wed, 17 Dec 2008 11:36:22 +0000 (11:36 +0000)]
x86, hvm: Don't ever call the shadow code to fix a page fault in an
external-mode guest if the fault came from Xen; it would be making
changes to the wrong pagetables, potentially causing a pagefault loop
in Xen.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
17 years agoxenpm: add cpu frequency control interface, through which user can
Keir Fraser [Tue, 16 Dec 2008 13:14:25 +0000 (13:14 +0000)]
xenpm: add cpu frequency control interface, through which user can
tune the parameters manually.

Now, xenpm can be invoked with the following options:
Usage:
       xenpm get-cpuidle-states [cpuid]: list cpu idle information on
       CPU cpuid or all CPUs.
       xenpm get-cpufreq-states [cpuid]: list cpu frequency
       information on CPU cpuid or all CPUs.
       xenpm get-cpufreq-para [cpuid]: list cpu frequency information
       on CPU cpuid or all CPUs.
       xenpm set-scaling-maxfreq <cpuid> <HZ>: set max cpu frequency
       <HZ> on CPU <cpuid>.
       xenpm set-scaling-minfreq <cpuid> <HZ>: set min cpu frequency
       <HZ> on CPU <cpuid>.
       xenpm set-scaling-governor <cpuid> <name>: set scaling governor
       on CPU <cpuid>.
       xenpm set-scaling-speed <cpuid> <num>: set scaling speed on CPU
       <cpuid>.
       xenpm set-sampling-rate <cpuid> <num>: set sampling rate on CPU
       <cpuid>.
       xenpm set-up-threshold <cpuid> <num>: set up threshold on CPU
       <cpuid>.

To ease the use of this tool, the shortcut option is supported,
i.e. `xenpm get-cpui' is equal to `xenpm get-cpuidle-states'.
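
One way such shortcut matching could work is unique-prefix lookup. The sketch below is an assumption for illustration, not the code from this patch: the table lists the subcommands from the usage text above, and match_subcommand() is a hypothetical name. How the real tool treats an ambiguous prefix is not stated here; this sketch rejects it.

```c
#include <stddef.h>
#include <string.h>

/* Subcommand names copied from the usage text. */
static const char *const xenpm_cmds[] = {
    "get-cpuidle-states",
    "get-cpufreq-states",
    "get-cpufreq-para",
    "set-scaling-maxfreq",
    "set-scaling-minfreq",
    "set-scaling-governor",
    "set-scaling-speed",
    "set-sampling-rate",
    "set-up-threshold",
};

/*
 * Hypothetical lookup: return the index of the subcommand selected by
 * arg, or -1 if arg matches nothing or is a prefix of several entries.
 */
int match_subcommand(const char *arg)
{
    size_t len = strlen(arg);
    size_t n = sizeof(xenpm_cmds) / sizeof(xenpm_cmds[0]);
    int found = -1;
    int ambiguous = 0;

    for (size_t i = 0; i < n; i++) {
        if (strncmp(xenpm_cmds[i], arg, len) != 0)
            continue;               /* arg is not a prefix of this entry */
        if (xenpm_cmds[i][len] == '\0')
            return (int)i;          /* an exact match always wins */
        if (found != -1)
            ambiguous = 1;          /* second prefix match */
        else
            found = (int)i;
    }
    return ambiguous ? -1 : found;
}
```

With this table, "get-cpui" selects get-cpuidle-states, while "get-cpu" is rejected because it is a prefix of three subcommands.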

Signed-off-by: Guanqun Lu <guanqun.lu@intel.com>
17 years agox86: Update xen-detect utility to scan for Xen signature in CPUID space.
Keir Fraser [Tue, 16 Dec 2008 12:04:13 +0000 (12:04 +0000)]
x86: Update xen-detect utility to scan for Xen signature in CPUID space.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agomini-os: Make utility function get_self_id() in fs-front.c public.
Keir Fraser [Tue, 16 Dec 2008 12:00:25 +0000 (12:00 +0000)]
mini-os: Make utility function get_self_id() in fs-front.c public.

Signed-off-by: Yosuke Iwamatsu <y-iwamatsu@ab.jp.nec.com>
17 years agox86: Simpler time handling when TSC is constant across all power saving states.
Keir Fraser [Tue, 16 Dec 2008 11:59:22 +0000 (11:59 +0000)]
x86: Simpler time handling when TSC is constant across all power saving states.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Signed-off-by: Gang Wei <gang.wei@intel.com>
17 years agovmx: Do not disable real EFER.NXE even when disabled by guest.
Keir Fraser [Tue, 16 Dec 2008 11:54:11 +0000 (11:54 +0000)]
vmx: Do not disable real EFER.NXE even when disabled by guest.

We must not disable EFER.NXE in host mode since shadow code relies on
accessing shadow mappings with NX set.

We do not want to write EFER on every vmentry/vmexit if we can avoid
it, since it will be somewhat slow.

Finally, we don't believe that any guest relies on NX really being
disabled when EFER.NXE is cleared.

This given, it makes sense to ignore the guest's setting of EFER.NXE.
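
The idea can be sketched as a single mask operation. This is a minimal illustration, not the patch itself: efer_for_hardware() is a hypothetical name, and the only architectural fact used is that NXE is bit 11 of the EFER MSR.

```c
#include <stdint.h>

#define EFER_NXE (1ULL << 11)   /* No-Execute Enable bit of the EFER MSR */

/*
 * Hypothetical helper: compute the EFER value the hardware runs with.
 * The guest's own view of EFER is assumed to be stored separately and
 * is not modified; only the real register keeps NXE forced on.
 */
uint64_t efer_for_hardware(uint64_t guest_efer)
{
    return guest_efer | EFER_NXE;
}
```

Because the result does not depend on whether the guest set or cleared NXE, the real register never has to be rewritten for this bit on vmentry/vmexit.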

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agox86: Enable MTF for HVM guest single step in gdb
Keir Fraser [Tue, 16 Dec 2008 11:49:20 +0000 (11:49 +0000)]
x86: Enable MTF for HVM guest single step in gdb

Signed-off-by: Edwin Zhai <edwin.zhai@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agox86: Decode CPUID for TSC guarantees.
Keir Fraser [Mon, 15 Dec 2008 11:37:14 +0000 (11:37 +0000)]
x86: Decode CPUID for TSC guarantees.

Signed-off-by: Wei Gang <gang.wei@intel.com>
17 years agorombios: fix references to EBDA
Keir Fraser [Mon, 15 Dec 2008 11:23:22 +0000 (11:23 +0000)]
rombios: fix references to EBDA

The Extended BIOS Data Area (EBDA) can be relocated by the initialization
of a PCI option ROM, and so can the IPL boot table.
After that initialization, the EBDA must be accessed via 0x40E.
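
As an illustration of the lookup, the word at 0x40E in the BIOS data area holds the current EBDA segment, so the EBDA address is derived from it at use time rather than cached. The sketch below models low memory as a plain byte array; low_mem and ebda_linear_address() are hypothetical names, and the real rombios code is 16-bit BIOS code rather than this C.

```c
#include <stdint.h>

/* BIOS data area word holding the real-mode segment of the EBDA. */
#define BDA_EBDA_SEGMENT 0x40E

/*
 * Hypothetical helper: low_mem is a byte view of the first megabyte.
 * Reads the little-endian segment word and converts it to a linear
 * address (segment * 16).
 */
uint32_t ebda_linear_address(const uint8_t *low_mem)
{
    uint16_t seg = (uint16_t)(low_mem[BDA_EBDA_SEGMENT] |
                              (low_mem[BDA_EBDA_SEGMENT + 1] << 8));

    return (uint32_t)seg << 4;
}
```

If an option ROM moves the EBDA and updates the word at 0x40E, a caller that goes through this lookup follows the move automatically.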

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
17 years agox86: Small cleanups to time handling.
Keir Fraser [Mon, 15 Dec 2008 11:17:14 +0000 (11:17 +0000)]
x86: Small cleanups to time handling.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoxenpmd: Fix bogus fgets() size parameter.
Keir Fraser [Sat, 13 Dec 2008 17:44:20 +0000 (17:44 +0000)]
xenpmd: Fix bogus fgets() size parameter.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agox86: Clean up and simplify rwlock implementation.
Keir Fraser [Sat, 13 Dec 2008 15:56:16 +0000 (15:56 +0000)]
x86: Clean up and simplify rwlock implementation.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoClean up use of spin_is_locked() and introduce rw_is_locked().
Keir Fraser [Sat, 13 Dec 2008 15:28:10 +0000 (15:28 +0000)]
Clean up use of spin_is_locked() and introduce rw_is_locked().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoxc_pm: Fix off-by-one error in string array access.
Keir Fraser [Sat, 13 Dec 2008 15:04:53 +0000 (15:04 +0000)]
xc_pm: Fix off-by-one error in string array access.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agox86: Fix early time initialisation after recent changes.
Keir Fraser [Sat, 13 Dec 2008 15:02:55 +0000 (15:02 +0000)]
x86: Fix early time initialisation after recent changes.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoIA64: quieten PV fp fault/trap handler.
Isaku Yamahata [Fri, 12 Dec 2008 01:43:39 +0000 (10:43 +0900)]
IA64: quieten PV fp fault/trap handler.

Now fp fault/trap is handled correctly except in the case where fpswa
returns an error. So quieten the handler.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoIA64: fix panic caused by daccess fault.
Isaku Yamahata [Fri, 12 Dec 2008 01:36:23 +0000 (10:36 +0900)]
IA64: fix panic caused by daccess fault.

During fpswa emulation, the Xen VMM accesses the guest virtual address space,
which may cause a daccess fault resulting in a panic.
This patch makes the daccess fault handler handle such cases properly.

(XEN) Xen BUG at faults.c:583
(XEN) FIXME: implement ia64 dump_execution_state()
(XEN)
(XEN) Call Trace:
(XEN)  [<f4000000040fe360>] show_stack+0x90/0xb0
(XEN)                                 sp=f0000002b6067940 bsp=f0000002b6061860
(XEN)  [<f4000000040fee70>] dump_stack+0x30/0x50
(XEN)                                 sp=f0000002b6067b10 bsp=f0000002b6061840
(XEN)  [<f4000000040640d0>] __bug+0x70/0xa0
(XEN)                                 sp=f0000002b6067b10 bsp=f0000002b6061810
(XEN)  [<f4000000040b53b0>] ia64_handle_reflection+0x60/0x13b0
(XEN)                                 sp=f0000002b6067b10 bsp=f0000002b60617b8
(XEN)  [<f4000000040f5b40>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f0000002b6067b20 bsp=f0000002b60617b8
(XEN)  [<f4000000040c3a20>] __get_domain_bundle+0x0/0x40
(XEN)                                 sp=f0000002b6067d20 bsp=f0000002b6061778
(XEN)  [<f4000000040bee20>] vcpu_get_domain_bundle+0xb0/0xa10
(XEN)                                 sp=f0000002b6067d20 bsp=f0000002b60616e8
(XEN)  [<f4000000040b3f20>] handle_fpu_swa+0x360/0x4a0
(XEN)                                 sp=f0000002b6067d60 bsp=f0000002b6061660
(XEN) vcpu.c:1371: vcpu_get_domain_bundle gip 0x40000000000008a0
(XEN)  [<f4000000040b5e90>] ia64_handle_reflection+0xb40/0x13b0
(XEN)                                 sp=f0000002b6067df0 bsp=f0000002b6061610
(XEN) vcpu.c:1371: vcpu_get_domain_bundle gip 0x4000000000000730
(XEN) faults.c:343:d6 handle_fpu_swa(fault): floating-point bundle at 0x4000000000000730 not mapped
(XEN)  [<f4000000040f5b40>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f0000002b6067e00 bsp=f0000002b6061610
(XEN) vcpu.c:1371: vcpu_get_domain_bundle gip 0x40000000000008a0
(XEN) faults.c:343:d6 handle_fpu_swa(fault): floating-point bundle at 0x40000000000008a0 not mapped
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 15:
(XEN) Xen BUG at faults.c:583
(XEN) ****************************************

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoIA64: make the fpswa emulation keep the previous behaviour.
Isaku Yamahata [Fri, 12 Dec 2008 01:35:58 +0000 (10:35 +0900)]
IA64: make the fpswa emulation keep the previous behaviour.

When the fpswa library returns status > 0, keep the previous behavior.
This case should be addressed somehow later, but it seems somewhat
difficult to resolve, so keep the previous behavior for now.
It is assumed that a guest kernel calls the fpswa library
without preemption. This assumption breaks if a guest kernel is
preemptive.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoIA64: fix fp fault/trap handler.
Isaku Yamahata [Fri, 12 Dec 2008 01:34:18 +0000 (10:34 +0900)]
IA64: fix fp fault/trap handler.

This patch is a part of fixes to bug reported as
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1392

When the fpswa handler fails to get a bundle in the guest,
the fp fault/trap should be injected into the guest so that the guest
can handle it.
When the fpswa library returns an error, there is no way to
pass the value to the guest. In that case, just inject the fpswa
fault/trap into the guest, running the risk that the guest may get
an error from its own fpswa call. Here it is assumed that
no applications depend on the SIGFPE process signal to recover
their computation.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoIA64: fix emulation of fp emulation in pv domain
Isaku Yamahata [Fri, 12 Dec 2008 01:29:15 +0000 (10:29 +0900)]
IA64: fix emulation of fp emulation in pv domain

This patch is a part of fixes to bug reported as
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1392

When the VMM fails to get the bundle in question during fpswa processing,
the only option is for the guest to provide the bundle.
The current implementation, however, just returns a random value.
This patch makes the fpswa hypercall calling convention more complicated so
that the necessary information is passed to the hypervisor.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoxentop: Fix fprintf() build failure.
Keir Fraser [Thu, 11 Dec 2008 22:32:20 +0000 (22:32 +0000)]
xentop: Fix fprintf() build failure.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agohvmloader: enable bus mastering of PCI device
Keir Fraser [Thu, 11 Dec 2008 13:26:02 +0000 (13:26 +0000)]
hvmloader: enable bus mastering of PCI device

Without this, init routine in some PCI option ROM doesn't work well.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
17 years agox86: enable interrupts explicitly in __start_xen()
Keir Fraser [Thu, 11 Dec 2008 13:25:28 +0000 (13:25 +0000)]
x86: enable interrupts explicitly in __start_xen()

Instead of relying on smp_prepare_cpus() (via check_nmi_watchdog()) or
init_xen_time() (via init_platform_timer() -> plt_overflow())
implicitly enabling interrupts, enable them explicitly once safe to do
so (it may actually be possible to move this even further up, but I
don't think that would buy us much).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Also move spin_debug_enable() a bit higher. Moving it above
smp_prepare_cpus() didn't work for some reason though!

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agox86: Clean up early time setup.
Keir Fraser [Thu, 11 Dec 2008 13:10:19 +0000 (13:10 +0000)]
x86: Clean up early time setup.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agovga: Only vga_endboot() if vga_init() completed.
Keir Fraser [Thu, 11 Dec 2008 13:09:59 +0000 (13:09 +0000)]
vga: Only vga_endboot() if vga_init() completed.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agox86, cpufreq: Change cpufreq_driver->get so that it can get other
Keir Fraser [Thu, 11 Dec 2008 11:49:37 +0000 (11:49 +0000)]
x86, cpufreq: Change cpufreq_driver->get so that it can get other
cpu's real physical freq.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
17 years agoRe-enable MSI support
Keir Fraser [Thu, 11 Dec 2008 11:48:19 +0000 (11:48 +0000)]
Re-enable MSI support

Currently MSI is disabled because of some locking issues. This patch
tries to clean up the locking related to MSI.

Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
17 years agox86: fix the potential of encountering panic "IO-APIC + timer doesn't work! ..."
Keir Fraser [Thu, 11 Dec 2008 11:40:10 +0000 (11:40 +0000)]
x86: fix the potential of encountering panic "IO-APIC + timer doesn't work! ..."

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Linux commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4aae07025265151e3f7041dfbf0f529e122de1d8

x86: fix "Kernel panic - not syncing: IO-APIC + timer doesn't work!"

Under rare circumstances we found we could have an IRQ0 entry while we
are in the middle of setting up the local APIC, the i8259A and the
PIT. That is certainly not how it's supposed to work! check_timer()
was supposed to be called with irqs turned off - but this eroded away
sometime in the past. This code would still work most of the time
because this code runs very quickly, but if just the right timing
conditions are present and IRQ0 hits in this small, ~30 usecs window,
timer irqs stop and the system does not boot up. Also, given how early
this is during bootup, the hang is very deterministic - but it would
only occur on certain machines (and certain configs).

The fix was quite simple: disable/restore interrupts properly in this
function. With that in place the test-system now boots up just fine.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
17 years agox86: unify local_irq_XXX()
Keir Fraser [Thu, 11 Dec 2008 11:36:00 +0000 (11:36 +0000)]
x86: unify local_irq_XXX()

This also removes an inconsistency in that x86-64's __save_flags() had
a memory clobber, while x86_32's didn't.

It further adds type checking since blindly using {pop,push}{l,q} on a
memory operand of unknown size bears the risk of corrupting other
data.

Finally, it eliminates the redundant (with local_irq_restore())
__restore_flags() macro and renames __save_flags() to
local_save_flags(), making the naming consistent with Linux (again?).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
17 years agorombios: fix rom_scan (ja->jmp)
Keir Fraser [Thu, 11 Dec 2008 11:32:39 +0000 (11:32 +0000)]
rombios: fix rom_scan (ja->jmp)

Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
17 years agoFix a typo caused by 18898.
Keir Fraser [Thu, 11 Dec 2008 11:30:13 +0000 (11:30 +0000)]
Fix a typo caused by 18898.

new state is updated too early.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
17 years agocpufreq: Short path avoiding IPI in critical fast path.
Keir Fraser [Thu, 11 Dec 2008 11:27:49 +0000 (11:27 +0000)]
cpufreq: Short path avoiding IPI in critical fast path.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agolibxc: Fix xc_pm.c build by avoiding bogus header includes.
Keir Fraser [Thu, 11 Dec 2008 11:19:27 +0000 (11:19 +0000)]
libxc: Fix xc_pm.c build by avoiding bogus header includes.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoFix BUILD_BUG_ON()
Keir Fraser [Thu, 11 Dec 2008 11:19:01 +0000 (11:19 +0000)]
Fix BUILD_BUG_ON()

As was noticed on the Linux side, using an array here isn't appropriate
if the condition is not a compile-time constant: gcc allows such
arrays, and hence the intended effect of producing a compiler error is
not achieved in that case. Bit-field widths have no similar
language extension, and hence always produce a compiler error.
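
The bit-field approach can be sketched as below. The macro follows the well-known Linux-style bit-field idiom and is shown as an illustration, not quoted from this patch; struct demo_header and demo_header_size() are hypothetical, and the size check assumes a 4-byte unsigned int.

```c
#include <stddef.h>

/*
 * Bit-field form: a true condition yields width -1, which is always a
 * compile error, and a non-constant condition is rejected as well
 * because bit-field widths must be integer constant expressions.
 * A false condition yields a harmless zero-width unnamed bit-field.
 */
#define BUILD_BUG_ON(cond) ((void)sizeof(struct { int:-!!(cond); }))

/* Hypothetical layout, used only for illustration. */
struct demo_header {
    unsigned int magic;
    unsigned int length;
};

/* Returns the header size after checking it at compile time. */
size_t demo_header_size(void)
{
    BUILD_BUG_ON(sizeof(struct demo_header) != 8);
    return sizeof(struct demo_header);
}
```

By contrast, an array-based form such as sizeof(char[1 - 2 * !!(cond)]) can be accepted by gcc as a variable-length array when cond is not constant, which is the gap the message above describes.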

Signed-off-by: Jan Beulich <jbeulich@novell.com>
17 years agoAvoid negative runstate pieces.
Keir Fraser [Wed, 10 Dec 2008 14:05:41 +0000 (14:05 +0000)]
Avoid negative runstate pieces.

Also consolidate all places to get cpu idle time.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoInitialize state_entry_time to zero for all idle vcpus
Keir Fraser [Wed, 10 Dec 2008 13:41:34 +0000 (13:41 +0000)]
Initialize state_entry_time to zero for all idle vcpus

NOW() is not usable since the Xen time subsystem hasn't
been initialized yet. On my box, it gives an initial
stamp of ~60s because the local TSC stamp is zero and the TSC
count starts at power-on. A negative value is then
added to the runstate of that idle vcpu at the schedule
point. The net effect is that tools like xenpm
show a big idle time gap between the BSP and the other APs.
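
The arithmetic behind this can be sketched with a simplified accounting routine. The structure and function names below are hypothetical and heavily reduced. The guard that skips non-positive deltas is an assumption about how negative pieces can be kept out of the totals, not a quotation of the code.

```c
#include <stdint.h>

typedef int64_t s_time_t;           /* signed nanoseconds (assumption) */

struct runstate_sketch {
    int      state;                 /* index of the current state */
    s_time_t state_entry_time;      /* when the current state was entered */
    s_time_t time[4];               /* accumulated time per state */
};

/*
 * Charge the time since state_entry_time to the current state, then
 * switch to new_state. A stamp that lies in the "future" (for example
 * a bogus ~60s initial value) would give a negative delta, so such a
 * piece is skipped instead of being added.
 */
void runstate_change(struct runstate_sketch *r, int new_state, s_time_t now)
{
    s_time_t delta = now - r->state_entry_time;

    if (delta > 0) {
        r->time[r->state] += delta;
        r->state_entry_time = now;
    }
    r->state = new_state;
}
```

With state_entry_time initialized to zero, the first change at, say, 1s charges exactly 1s to the initial state; with a bogus 60s stamp the same change would have contributed -59s without the guard.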

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agox86: Make MCE panic message more obvious
Keir Fraser [Wed, 10 Dec 2008 13:30:10 +0000 (13:30 +0000)]
x86: Make MCE panic message more obvious

Make it more obvious to the untrained user that machine check reboots
are hardware faults, rather than just saying "CPU context corrupt".

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
17 years agogdbserver: Fix build failure.
Keir Fraser [Wed, 10 Dec 2008 13:28:58 +0000 (13:28 +0000)]
gdbserver: Fix build failure.

From: Edwin Zhai <edwin.zhai@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoAdd user PM control interface
Keir Fraser [Wed, 10 Dec 2008 13:27:41 +0000 (13:27 +0000)]
Add user PM control interface

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
17 years agoAdd cpufreq governors: performance, powersave, userspace
Keir Fraser [Wed, 10 Dec 2008 13:27:14 +0000 (13:27 +0000)]
Add cpufreq governors: performance, powersave, userspace

This patch adds 3 more governors besides the original ondemand
cpufreq governor:
- performance governor: best performance, keeping the cpu always
  running at the highest freq;
- powersave governor: best power saving, keeping the cpu always
  running at the lowest freq;
- userspace governor: lets the user set the freq.
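
The three policies described above can be summarized in a small selection function. This is a sketch under stated assumptions: the enum, the function name and the clamping of the user-supplied value to the [min, max] range are all hypothetical and are not taken from the patch.

```c
/* Hypothetical governor identifiers for illustration only. */
enum governor {
    GOV_PERFORMANCE,
    GOV_POWERSAVE,
    GOV_USERSPACE
};

/*
 * Pick the target frequency for a CPU whose supported range is
 * [min_freq, max_freq]. user_freq is only used by the userspace
 * governor and is clamped to the supported range (an assumption).
 */
unsigned int governor_target_freq(enum governor gov,
                                  unsigned int min_freq,
                                  unsigned int max_freq,
                                  unsigned int user_freq)
{
    switch (gov) {
    case GOV_PERFORMANCE:
        return max_freq;            /* always the highest frequency */
    case GOV_POWERSAVE:
        return min_freq;            /* always the lowest frequency */
    case GOV_USERSPACE:
        if (user_freq < min_freq)
            return min_freq;
        if (user_freq > max_freq)
            return max_freq;
        return user_freq;           /* whatever the user asked for */
    }
    return max_freq;                /* unknown governor: fall back to max */
}
```

Unlike ondemand, none of these three needs load sampling: the target depends only on the configured limits and, for userspace, on the requested value.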

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
17 years agolibxc: Fix memory leak in zlib usage
Keir Fraser [Wed, 10 Dec 2008 13:14:13 +0000 (13:14 +0000)]
libxc: Fix memory leak in zlib usage

Any call to inflate() must be followed by inflateEnd(), otherwise the
internal zlib state is leaked.

Signed-off-by: Kevin Wolf <kwolf@suse.de>
17 years agoIA64: fix efi_emulate_set_virtual_address_map()
Isaku Yamahata [Wed, 10 Dec 2008 06:39:47 +0000 (15:39 +0900)]
IA64: fix efi_emulate_set_virtual_address_map()

get_page() before touching guest pages.
Otherwise pages may be freed during those operations.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoIA64: improve handle_fpu_swa()
Isaku Yamahata [Wed, 10 Dec 2008 06:39:46 +0000 (15:39 +0900)]
IA64: improve handle_fpu_swa()

It tries to get a bundle in guest.
Make it more robust using vmx_get_domain_bundle() instead of
__get_domain_bundle().

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoIA64: use symbolic constant for hypercall.
Isaku Yamahata [Wed, 10 Dec 2008 06:39:44 +0000 (15:39 +0900)]
IA64: use symbolic constant for hypercall.

Define symbolic names for hypercall numbers and use them.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years agoUse virtual 8086 mode for VMX guests with CR0.PE == 0
Keir Fraser [Tue, 9 Dec 2008 16:28:02 +0000 (16:28 +0000)]
Use virtual 8086 mode for VMX guests with CR0.PE == 0

When a VMX guest tries to enter real mode, put it in virtual 8086 mode
instead, if that's possible.  Handle all errors and corner cases by
falling back to the real-mode emulator.

This is similar to the old VMXASSIST system except it uses Xen's
x86_emulate emulator instead of having a partial emulator in the guest
firmware.  It more than doubles the speed of real-mode operation on
VMX.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
17 years agovga: Fix screen clear at end of Xen bootstrap.
Keir Fraser [Tue, 9 Dec 2008 13:23:15 +0000 (13:23 +0000)]
vga: Fix screen clear at end of Xen bootstrap.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agopv-on-hvm: add pvSCSI frontend
Keir Fraser [Tue, 9 Dec 2008 13:06:19 +0000 (13:06 +0000)]
pv-on-hvm: add pvSCSI frontend

Signed-off-by: Tomonari Horikoshi <t.horikoshi@jp.fujitsu.com>
Signed-off-by: Jun Kamada <kama@jp.fujitsu.com>
17 years agopv-on-hvm: fix for Centos 5.2
Keir Fraser [Tue, 9 Dec 2008 13:00:52 +0000 (13:00 +0000)]
pv-on-hvm: fix for Centos 5.2

From: Yoshisato YANAGISAWA <yanagisawa.yoshisato@lab.ntt.co.jp>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoVT-d: check return value of pirq_guest_bind()
Keir Fraser [Tue, 9 Dec 2008 12:55:29 +0000 (12:55 +0000)]
VT-d: check return value of pirq_guest_bind()

This eliminates a hypervisor crash when the respective domain dies or
gets the device hot removed.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Reviewed-by: Weidong Han <weidong.han@intel.com>
17 years agotools: Fix a few error-path memory leaks.
Keir Fraser [Tue, 9 Dec 2008 12:53:19 +0000 (12:53 +0000)]
tools: Fix a few error-path memory leaks.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years agoxend: Remember bootable flag for vbds in xenstore
Keir Fraser [Tue, 9 Dec 2008 12:45:45 +0000 (12:45 +0000)]
xend: Remember bootable flag for vbds in xenstore

When xend is restarted, the bootable flags of all disk devices are lost
and the first disk is then marked as bootable by a "compatibility
hack". When a guest domain is created with a mixture of several vbd
and tap devices, the compatibility hack may fail to choose the right
bootable device, preventing the guest from being restarted. This patch
fixes this behavior by remembering the bootable flag for each disk device
in the xenstore database.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
17 years agoxend: Fix memory allocation bug after hvm reboot in numa system
Keir Fraser [Tue, 9 Dec 2008 12:44:32 +0000 (12:44 +0000)]
xend: Fix memory allocation bug after hvm reboot in numa system

Recently we found a bug on a Nehalem machine (two nodes in total, 6G of
memory, 3G in each node):
- Start an HVM guest with all its VCPUs pinned to node1, so all its
memory is allocated from node1.
- Reboot the HVM.
- Some memory will be allocated from node0 even though there is enough
free memory on node1.

Reason: For security reasons, Xen does not put all the pages of a dying
HVM back on the domheap directly, but puts them on the scrub list to be
handled by page_scrub_softirq(). If the dying HVM has a lot of memory,
page_scrub_softirq() will not have handled all of it before the new
HVM starts. Some pages belonging to node1 are still on the scrub list, and
the new HVM can't use them. So this HVM will get a different memory
distribution than before. Before changeset 18304, page_scrub_softirq()
could be executed in parallel on all the cpus. Changeset 18305
serialised page_scrub_softirq(), and changeset 18307 serialised
page_scrub_softirq() with a new lock to avoid holding up the acquisition of
page_scrub_lock in free_domheap_pages(). Those changesets slowed the
handling of pages on the scrub list, so the bug became more obvious afterwards.

Patch: This patch modifies balloon.free to avoid this bug. With the
patch, balloon.free checks whether the current machine is a NUMA
system and whether the newly created HVM has all its vcpus on the same node.
If both conditions hold, we wait until all the pages on the
scrub list are freed (if the wait goes beyond 20s, we stop
waiting).

This seems too restrictive at first glance. We used to wait only
until the free memory size of the pinned node was bigger than
required. But as we know, the HVM memory allocation granularity is 2M. Even
if the former condition is satisfied, we still may not find enough
2M-sized memory chunks on that node.

Signed-off-by: Ting Zhou <ting.g.zhou@intel.com>
Signed-off-by: Xiaowei Yang <Xiaowei.yang@intel.com>
17 years ago libxc: Fix gcc 4.3 build failure
Keir Fraser [Tue, 9 Dec 2008 12:42:18 +0000 (12:42 +0000)]
libxc: Fix gcc 4.3 build failure

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
17 years ago rombios: support BCV
Keir Fraser [Tue, 9 Dec 2008 12:41:12 +0000 (12:41 +0000)]
rombios: support BCV

Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
17 years ago Fix domain save when guest is in S3.
Keir Fraser [Fri, 5 Dec 2008 15:54:22 +0000 (15:54 +0000)]
Fix domain save when guest is in S3.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years ago x86: make an error message more precise
Keir Fraser [Fri, 5 Dec 2008 15:24:12 +0000 (15:24 +0000)]
x86: make an error message more precise

... making it possible to distinguish whether the to-be-added or the
already existing PIRQ binding is causing the failure.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
17 years ago cpufreq: allow customization of some parameters
Keir Fraser [Fri, 5 Dec 2008 15:23:32 +0000 (15:23 +0000)]
cpufreq: allow customization of some parameters

Short of having a way for powersaved to dynamically adjust these
values, at least allow specifying them on the command line. In
particular, always running at an up-threshold of 80% is perhaps nice
for laptop use, but certainly not desirable on servers. On shell
scripts invoking large numbers of short-lived processes I noticed a
50% performance degradation on a dual-socket quad-core Barcelona just
because of the load of an individual core never crossing the 80%
boundary that would have resulted in increasing the frequency.

(Powersaved on SLE10 sets this on native kernels to 60% or 80%,
depending on whether performance or power reduction is preferred,
*divided* by the number of CPUs, but capped at the lower limit of
20%.)
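
The Powersaved rule in the parenthesis above can be written out as a
small calculation. This is only an illustration of the arithmetic as
described; the function and parameter names are hypothetical, and the
use of plain (non-integer) division is an assumption.

```python
def powersaved_up_threshold(base_percent, ncpus, floor_percent=20):
    """Up-threshold as described: base / number of CPUs, never below 20%.

    base_percent is 60 or 80 according to the description above.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return max(base_percent / ncpus, floor_percent)
```

On an 8-CPU machine such as the dual-socket quad-core system mentioned
above, 80 / 8 = 10, so the 20% lower limit applies, which is far below
a fixed 80% threshold.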

Signed-off-by: Jan Beulich <jbeulich@novell.com>
17 years ago x86/cpufreq: reduce verbosity
Keir Fraser [Fri, 5 Dec 2008 15:22:43 +0000 (15:22 +0000)]
x86/cpufreq: reduce verbosity

These messages don't exist in powernow's equivalent code, and are
pretty useless anyway, hence just cluttering the logs.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
17 years ago powernow: implement struct cpufreq_driver.verify
Keir Fraser [Fri, 5 Dec 2008 15:22:21 +0000 (15:22 +0000)]
powernow: implement struct cpufreq_driver.verify

Without this, under rare conditions hypervisor crashes are possible
due to this method being called without checking against NULL.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
17 years ago x86/32on64: adjust address when converting syscall to fault
Keir Fraser [Fri, 5 Dec 2008 15:21:59 +0000 (15:21 +0000)]
x86/32on64: adjust address when converting syscall to fault

The faulting address is at the start of the syscall instruction rather
than at the following one.
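
As an illustration of the adjustment (not the hypervisor code itself):
the x86 syscall instruction is encoded as the two bytes 0F 05, so the
address of the instruction is the address of the following one minus
two. The names below are hypothetical.

```python
SYSCALL_INSN_LEN = 2  # the x86 `syscall` opcode is the two bytes 0F 05


def fault_address(next_insn_addr):
    """Address to report for the fault: the start of the syscall insn."""
    return next_insn_addr - SYSCALL_INSN_LEN
```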

Signed-off-by: Jan Beulich <jbeulich@novell.com>
17 years ago x86, time: Fix scale_reciprocal().
Keir Fraser [Fri, 5 Dec 2008 14:46:38 +0000 (14:46 +0000)]
x86, time: Fix scale_reciprocal().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years ago minios: Clip memory not usable by Mini-OS (above 1GB)
Keir Fraser [Fri, 5 Dec 2008 13:06:57 +0000 (13:06 +0000)]
minios: Clip memory not usable by Mini-OS (above 1GB)

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
17 years ago cpuidle: revise tsc-save/restore to reduce tsc skew between cpus
Keir Fraser [Fri, 5 Dec 2008 13:03:44 +0000 (13:03 +0000)]
cpuidle: revise tsc-save/restore to reduce tsc skew between cpus

Signed-off-by: Wei Gang <gang.wei@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years ago vga: Clear the screen when relinquishing VGA to dom0.
Keir Fraser [Fri, 5 Dec 2008 11:37:20 +0000 (11:37 +0000)]
vga: Clear the screen when relinquishing VGA to dom0.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years ago xentrace: trace interrupt window
Keir Fraser [Fri, 5 Dec 2008 11:05:45 +0000 (11:05 +0000)]
xentrace: trace interrupt window

Make a specific interrupt-window trace, with information about why the
interrupt in question can't be delivered.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
17 years ago VT-d code cleanup
Keir Fraser [Fri, 5 Dec 2008 10:59:41 +0000 (10:59 +0000)]
VT-d code cleanup

This patch narrows the context cache flush range from domain-selective
to device-selective when unmapping a device.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
17 years ago merge with xen-unstable.hg
Isaku Yamahata [Fri, 5 Dec 2008 06:47:19 +0000 (15:47 +0900)]
merge with xen-unstable.hg

17 years ago IA64: implement PHYSDEVOP_pirq_eoi_gmfn and related stuff.
Isaku Yamahata [Fri, 5 Dec 2008 06:43:08 +0000 (15:43 +0900)]
IA64: implement PHYSDEVOP_pirq_eoi_gmfn and related stuff.

This patch is the ia64 counterpart of 18844:c820bf73a914.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years ago IA64: eliminate NR_IRQ_VECTORS. ia64 part.
Isaku Yamahata [Fri, 5 Dec 2008 06:43:06 +0000 (15:43 +0900)]
IA64: eliminate NR_IRQ_VECTORS. ia64 part.

This is the ia64 counterpart of 18802:935bd48f096a, which eliminates
NR_IRQ_VECTORS.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
17 years ago docs: Add description of BUILD_BUG_ON().
Keir Fraser [Thu, 4 Dec 2008 16:36:43 +0000 (16:36 +0000)]
docs: Add description of BUILD_BUG_ON().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years ago Fix one timer range issue
Keir Fraser [Thu, 4 Dec 2008 14:12:08 +0000 (14:12 +0000)]
Fix one timer range issue

According to the timer semantics, a timer can be executed at any time
within [expires, expires_end]. However, the current implementation only
allows a timer to be executed after expires_end, which does not conform
to the timer semantics.

This patch fixes the SPECpower score regression (~5% degradation)
introduced by changeset 18744, "Change timer implementation to allow
variable 'slop'".
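
The difference between the two behaviours can be sketched as follows.
This is an illustration of the semantics described above, not the Xen
timer code; the function name and the honour_window flag are
hypothetical.

```python
def timer_is_due(now, expires, expires_end, honour_window=True):
    """Decide whether a timer may run at time `now`.

    With honour_window=True the timer may run anywhere inside
    [expires, expires_end], i.e. as soon as `expires` is reached.
    With honour_window=False it models the behaviour described as
    buggy: the timer only runs once expires_end has passed.
    """
    if expires > expires_end:
        raise ValueError("expires must not be later than expires_end")
    if honour_window:
        return now >= expires
    return now > expires_end
```

For a timer with expires=100 and expires_end=110, a check at now=105
lets the timer run under the intended semantics but still holds it back
under the old behaviour.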

Signed-off-by: Yu Ke <ke.yu@intel.com>
17 years ago New document on error handling in Xen.
Keir Fraser [Thu, 4 Dec 2008 12:35:22 +0000 (12:35 +0000)]
New document on error handling in Xen.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
17 years ago Fix existence check for MMIO-mapped 16550 UARTs
Keir Fraser [Thu, 4 Dec 2008 11:36:18 +0000 (11:36 +0000)]
Fix existence check for MMIO-mapped 16550 UARTs

Changeset 982e6fce0e47 added an existence test for UARTs.
Unfortunately, the existence test happens before MMIO UARTs are
ioremapped, therefore it may not be probing where it thinks it's
probing.  Rather than moving more code around, I think it's probably
safe to assume the arch code knows what it's doing if it passes in an
MMIO UART.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
17 years ago xm: Fix xm block-list for inactive managed domains
Keir Fraser [Thu, 4 Dec 2008 11:32:43 +0000 (11:32 +0000)]
xm: Fix xm block-list for inactive managed domains

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
17 years ago xend: Remember bootloader settings in xenstore
Keir Fraser [Thu, 4 Dec 2008 11:31:37 +0000 (11:31 +0000)]
xend: Remember bootloader settings in xenstore

When xend is restarted, the bootloader settings of all running domains
are lost. The attached patch fixes this by saving bootloader and
bootloader_args to the xenstore database.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
17 years ago merge with xen-unstable.hg
Isaku Yamahata [Thu, 4 Dec 2008 02:01:53 +0000 (11:01 +0900)]
merge with xen-unstable.hg

17 years ago xentop: Fix xentop for blktap
Keir Fraser [Wed, 3 Dec 2008 15:58:23 +0000 (15:58 +0000)]
xentop: Fix xentop for blktap

Blktap device information is not currently shown by xentop.

xen-unstable c/s 17813 said "blktap devices have statistics
counters (e.g., rd_req, wr_req, oo_req) prepended by tap_".
In fact, the counters are named as follows:

# ls -l /sys/devices/xen-backend/tap-1-769/statistics/
total 0
-r--r--r-- 1 root root 4096 Dec  3 20:37 oo_req
-r--r--r-- 1 root root 4096 Dec  3 20:37 rd_req
-r--r--r-- 1 root root 4096 Dec  3 20:37 rd_sect
-r--r--r-- 1 root root 4096 Dec  3 20:37 wr_req
-r--r--r-- 1 root root 4096 Dec  3 20:37 wr_sect

The statistics counters no longer have the "tap_" prefix because it was
removed by linux-2.6.18-xen c/s 34.

This patch reverts xen-unstable c/s 17813, so that xentop can show the
blktap device information.
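
To illustrate why the prefix matters, here is a small sketch that reads
the counters from a statistics directory like the one listed above. It
is not xentop's code; the helper name read_counters and its prefix
parameter are hypothetical, and the counter names are taken from the
directory listing.

```python
import os

# Counter file names as shown in the directory listing above.
COUNTERS = ("oo_req", "rd_req", "rd_sect", "wr_req", "wr_sect")


def read_counters(stats_dir, prefix=""):
    """Read each counter file in stats_dir; missing files yield None."""
    values = {}
    for name in COUNTERS:
        path = os.path.join(stats_dir, prefix + name)
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values
```

Looking the files up with prefix="tap_" in a directory that only
contains the unprefixed names finds nothing, which matches the symptom
described: no blktap statistics are shown.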

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
17 years ago AMD IOMMU: Invalidate all pages on domain destruction
Keir Fraser [Wed, 3 Dec 2008 15:56:33 +0000 (15:56 +0000)]
AMD IOMMU: Invalidate all pages on domain destruction

The attached patch adds support for invalidating all pages associated
with the same domain ID on domain destruction.

Signed-off-by: Wei Wang <wei.wang2@amd.com>