xen.git
11 years agox86: clear AC bit in RFLAGS to protect Xen itself by SMAP
Feng Wu [Mon, 12 May 2014 15:01:47 +0000 (17:01 +0200)]
x86: clear AC bit in RFLAGS to protect Xen itself by SMAP

Clear the AC bit in RFLAGS at the beginning of exception, interrupt, and
hypercall handling, so that Xen itself is protected by the SMAP mechanism. This
patch also sets the AC bit at the beginning of double_fault and fatal_trap() to
reduce the likelihood of taking a further fault while trying to dump state.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86: add support for STAC/CLAC instructions
Feng Wu [Mon, 12 May 2014 15:00:39 +0000 (17:00 +0200)]
x86: add support for STAC/CLAC instructions

The STAC/CLAC instructions are only available when the SMAP feature is
available. They are not needed if SMAP is not enabled, or before we start
to run userspace; in those cases the functions and macros do nothing.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agotools/pygrub: Fix error handling if no valid partitions are found
Andrew Cooper [Sat, 10 May 2014 01:18:33 +0000 (02:18 +0100)]
tools/pygrub: Fix error handling if no valid partitions are found

If no partitions at all are found, pygrub never creates the name 'fs',
resulting in a NameError indicating the lack of fs, rather than a
RuntimeError explaining that no partitions were found.

Set fs to None right at the start, and use the pythonic idiom "if fs is None:"
to protect against otherwise valid values for fs which compare equal to
0/False.

Reported-by: Sven Köhler <sven.koehler@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agoclarify SHUTDOWN_suspend additional argument
Stefano Stabellini [Thu, 8 May 2014 15:43:08 +0000 (16:43 +0100)]
clarify SHUTDOWN_suspend additional argument

Clarify the behaviour of SCHEDOP_shutdown: PV x86 guests need to pass a
third argument, that is unused on HVM and ARM guests.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/libxc: Issue individual DPRINTF()s rather than multiline ones.
Andrew Cooper [Fri, 9 May 2014 09:59:58 +0000 (10:59 +0100)]
tools/libxc: Issue individual DPRINTF()s rather than multiline ones.

For libxc users who log to syslog, this results in legible logging, rather
than long lines with #012's replacing newlines.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen: arm: bitops take unsigned int
Ian Campbell [Thu, 8 May 2014 15:13:55 +0000 (16:13 +0100)]
xen: arm: bitops take unsigned int

Xen bitmaps can be 4 rather than 8 byte aligned, so use the appropriate type.
Otherwise the compiler can generate unaligned 8 byte accesses and cause traps.
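
To illustrate the alignment point, here is a minimal, non-atomic sketch. The
sketch_* names are made up for this example, and Xen's real ARM bitops are
atomic and implemented differently. Indexing the bitmap in 32-bit words means
every access is a 4-byte access, which a 4-byte aligned bitmap always
satisfies:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_PER_WORD 32u

/* Set bit 'nr', touching only the 32-bit word that holds it. */
void sketch_set_bit(unsigned int nr, uint32_t *bitmap)
{
    assert(bitmap != NULL);
    bitmap[nr / SKETCH_BITS_PER_WORD] |=
        UINT32_C(1) << (nr % SKETCH_BITS_PER_WORD);
}

/* Return 1 if bit 'nr' is set, 0 otherwise. */
int sketch_test_bit(unsigned int nr, const uint32_t *bitmap)
{
    assert(bitmap != NULL);
    return (int)((bitmap[nr / SKETCH_BITS_PER_WORD] >>
                  (nr % SKETCH_BITS_PER_WORD)) & 1u);
}
```

With an unsigned long (8 byte) word type instead, the same bit could require
an 8-byte load from an address that is only 4-byte aligned.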

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years agopvh dom0: Add checks and restrictions for p2m_is_foreign
Mukesh Rathor [Mon, 12 May 2014 10:10:13 +0000 (12:10 +0200)]
pvh dom0: Add checks and restrictions for p2m_is_foreign

In this patch, we add some checks and restrictions in the relevant
p2m paths for p2m_is_foreign.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoadd the facility to limit ranges per rangeset
Paul Durrant [Mon, 12 May 2014 10:04:45 +0000 (12:04 +0200)]
add the facility to limit ranges per rangeset

A subsequent patch exposes rangesets to secondary emulators, so to allow a
limit to be placed on the amount of xenheap that an emulator can cause to be
consumed, the function rangeset_limit() has been created to set the allowed
number of ranges in a rangeset. By default, there is no limit.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agoadd an implementation of asprintf() for xen
Paul Durrant [Mon, 12 May 2014 10:03:57 +0000 (12:03 +0200)]
add an implementation of asprintf() for xen

Also needed to fix vsnprintf() et al so it can be called with a NULL buf
(and zero size, of course).
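
The NULL-buffer requirement follows from the usual two-pass way of building
asprintf(). The following is a minimal sketch against the standard C library,
not Xen's code; the name sketch_asprintf and the use of malloc() are
assumptions made for illustration. The first vsnprintf() call, with a NULL
buffer and zero size, only measures the output; the second fills a buffer of
exactly that size:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper modelled on asprintf(); uses libc, not Xen's allocator. */
int sketch_asprintf(char **bufp, const char *fmt, ...)
{
    va_list args, copy;
    int len;

    assert(bufp != NULL && fmt != NULL);

    va_start(args, fmt);
    va_copy(copy, args);

    /* Pass 1: NULL buffer, zero size. Nothing is written; the return
     * value is the number of characters the output would need. */
    len = vsnprintf(NULL, 0, fmt, copy);
    va_end(copy);

    if (len < 0) {
        va_end(args);
        return -1;
    }

    *bufp = malloc((size_t)len + 1);
    if (*bufp == NULL) {
        va_end(args);
        return -1;
    }

    /* Pass 2: format into the exactly sized buffer. */
    vsnprintf(*bufp, (size_t)len + 1, fmt, args);
    va_end(args);

    return len;
}
```

The caller owns the returned buffer and releases it with free() in this
sketch.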

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agoioreq-server: on-demand creation of ioreq server
Paul Durrant [Mon, 12 May 2014 10:03:19 +0000 (12:03 +0200)]
ioreq-server: on-demand creation of ioreq server

This patch only creates the ioreq server when the legacy HVM parameters
are read (by an emulator).

A lock is introduced to protect access to the ioreq server by multiple
emulator/tool invocations should such an eventuality arise. The guest is
protected by creation of the ioreq server only being done whilst the
domain is paused.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 years agoioreq-server: create basic ioreq server abstraction
Paul Durrant [Mon, 12 May 2014 10:02:20 +0000 (12:02 +0200)]
ioreq-server: create basic ioreq server abstraction

Collect the data structures concerning device emulation together into
a new struct hvm_ioreq_server.

Code that deals with the shared and buffered ioreq pages is extracted from
functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
and consolidated into a set of hvm_ioreq_server manipulation functions. The
lock in the hvm_ioreq_page served two different purposes and has been
replaced by separate locks in the hvm_ioreq_server structure.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 years agoioreq-server: centralize access to ioreq structures
Paul Durrant [Mon, 12 May 2014 10:01:43 +0000 (12:01 +0200)]
ioreq-server: centralize access to ioreq structures

To simplify creation of the ioreq server abstraction in a subsequent patch,
this patch centralizes all use of the shared ioreq structure and the
buffered ioreq ring to the source module xen/arch/x86/hvm/hvm.c.

The patch moves an rmb() from inside hvm_io_assist() to hvm_do_resume()
because the former may now be passed a data structure on stack, in which
case the barrier is unnecessary.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
11 years agoioreq-server: pre-series tidy up
Paul Durrant [Mon, 12 May 2014 10:00:30 +0000 (12:00 +0200)]
ioreq-server: pre-series tidy up

This patch tidies up various parts of the code that following patches move
around. If these modifications were combined with the code motion it would
be easy to miss them.

There's also some function renaming to reflect purpose and a single
whitespace fix.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 years agoNested VMX: load current_vmcs only when it exists
Edmund H White [Mon, 12 May 2014 09:59:19 +0000 (11:59 +0200)]
Nested VMX: load current_vmcs only when it exists

There may not be a valid VMCS on the current CPU, so only load it when one exists.

The original fix is from Edmund <edmund.h.white@intel.com>.

Signed-off-by: Edmund H White <edmund.h.white@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agopvh dom0: construct_dom0 changes
Mukesh Rathor [Thu, 8 May 2014 12:18:27 +0000 (14:18 +0200)]
pvh dom0: construct_dom0 changes

This patch changes construct_dom0() to boot in pvh mode:
  - Make sure dom0 elf supports pvh mode.
  - Call guest_physmap_add_page for pvh rather than simple p2m setting
  - Map all non-RAM regions 1:1 up to the end region in e820 or 4GB,
    whichever is higher.
  - Allocate p2m, copying calculation from toolstack.
  - Allocate shared info page from the virtual space so that dom0 PT
    can be updated. Then update p2m for it with the actual mfn.
  - Since we build the page tables for pvh same as for pv, in
    pvh_fixup_page_tables_for_hap we replace the mfns with pfns.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agox86: remove c_identify of the struct cpu_dev
Yi Li [Thu, 8 May 2014 12:06:10 +0000 (14:06 +0200)]
x86: remove c_identify of the struct cpu_dev

After commit 44e24f85674d (x86: don't call generic_identify() redundantly)
struct cpu_dev no longer needs c_identify.

Signed-off-by: Yi Li <peteryili@tencent.com>
11 years agognttab: don't flush the TLB on grant ops for auto-translated guests
Roger Pau Monné [Thu, 8 May 2014 12:05:35 +0000 (14:05 +0200)]
gnttab: don't flush the TLB on grant ops for auto-translated guests

For auto-translated guests the p2m code will do the necessary TLB
flushes, so there's no need to perform any TLB flushes in generic
grant table code.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/P2M: p2m_change_type() should pass on error from p2m_set_entry()
Jan Beulich [Thu, 8 May 2014 11:59:33 +0000 (13:59 +0200)]
x86/P2M: p2m_change_type() should pass on error from p2m_set_entry()

Modify the function's name to help eventual backports involving this
function, and in one case where this is trivially possible also stop
ignoring its return value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/P2M: pass on errors from p2m_set_entry()
Jan Beulich [Thu, 8 May 2014 11:58:46 +0000 (13:58 +0200)]
x86/P2M: pass on errors from p2m_set_entry()

... at least in a couple of straightforward cases.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agodomctl: tighten XEN_DOMCTL_*_permission
Jan Beulich [Thu, 8 May 2014 11:57:12 +0000 (13:57 +0200)]
domctl: tighten XEN_DOMCTL_*_permission

With proper permission (and, for the I/O port case, wrap-around) checks
added (note that for the I/O port case a count of zero is now being
disallowed, in line with I/O memory handling):

XEN_DOMCTL_irq_permission:
XEN_DOMCTL_ioport_permission:

 Of both IRQs and I/O ports there is only a reasonably small amount, so
 there's no excess resource consumption involved here. Additionally
 they both have a specialized XSM hook associated.

XEN_DOMCTL_iomem_permission:

 While this also has a specialized XSM hook associated (just like
 XEN_DOMCTL_{irq,ioport}_permission), it's not clear whether it's
 reasonable to expect XSM to restrict the number of ranges associated
 with a domain via this hook (which is the main resource consumption
 item here).
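
The wrap-around and zero-count checks mentioned at the top of this message can
be sketched as follows. This is an illustration only: the function name, the
0xffff bound and the exact form of the comparison are assumptions, not the
code added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_IOPORT UINT32_C(0xffff)  /* illustrative upper bound */

/* Return 1 if [first, first + count - 1] is a non-empty range that
 * neither wraps around nor exceeds the bound, 0 otherwise. */
int sketch_ioport_range_ok(uint32_t first, uint32_t count)
{
    uint32_t last;

    if (count == 0)
        return 0;               /* a count of zero is disallowed */

    last = first + count - 1;   /* unsigned arithmetic wraps modulo 2^32 */
    if (last < first)
        return 0;               /* the range wrapped around */

    return last <= SKETCH_MAX_IOPORT;
}

/* Worked cases for the checks above. */
void sketch_ioport_examples(void)
{
    assert(sketch_ioport_range_ok(0x3f8, 8) == 1);       /* ordinary range */
    assert(sketch_ioport_range_ok(0x3f8, 0) == 0);       /* zero count */
    assert(sketch_ioport_range_ok(UINT32_MAX, 2) == 0);  /* wraps */
    assert(sketch_ioport_range_ok(0xfffe, 4) == 0);      /* past the bound */
}
```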

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Detect comparator values in the past
Don Slutz [Fri, 2 May 2014 20:18:08 +0000 (16:18 -0400)]
hvm/hpet: Detect comparator values in the past

This statement only works using 64-bit arithmetic, provided the main
counter never changes by more than 2^63.  (Which is a boundary
case that should not happen in my lifetime.)
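
A common way to write such a test is to look at the sign of the wrapped
64-bit difference. The sketch below shows that idiom; the helper name is
hypothetical and this is not necessarily the statement the patch adds. It
relies on the two values being less than 2^63 apart, and on the usual
two's-complement conversion from uint64_t to int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if 'comparator' lies behind 'counter', assuming the two are
 * less than 2^63 apart. The unsigned subtraction wraps modulo 2^64 and
 * the sign of the reinterpreted result gives the direction. */
int sketch_comparator_in_past(uint64_t counter, uint64_t comparator)
{
    return (int64_t)(comparator - counter) < 0;
}

/* Worked cases, including one where the counter has wrapped. */
void sketch_comparator_examples(void)
{
    assert(sketch_comparator_in_past(100, 99) == 1);
    assert(sketch_comparator_in_past(100, 101) == 0);
    assert(sketch_comparator_in_past(5, UINT64_MAX - 5) == 1);
}
```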

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Prevent master clock equal to comparator while enabled
Don Slutz [Fri, 2 May 2014 20:18:07 +0000 (16:18 -0400)]
hvm/hpet: Prevent master clock equal to comparator while enabled

Based on the software-developers-hpet-spec-1-0a.pdf, the comparator
for a periodic timer will change to the new value when it matches
the master clock.  The current code here uses a very standard
rounding formula of "((x + y - 1) / y) * y".  This is wrong because
in this case you need to go to the next comparator value when "x"
equals "y". Not when "x + 1" equals "y".  In this case "y" is the
period and "x" is the master clock.

The code lines:

    elapsed = hpet_read_maincounter(h, guest_time) +
        period - 1 - comparator;
    comparator += (elapsed / period) * period;

are what matter here.

Using some numbers to help show the issue:

hpet_read_maincounter(h, guest_time) = 130252
period = 62500

comparator       : 130252
elapsed          : 62499
elapsed/period   : 0
comparator_delta : 0
new comparator   : 130252
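
The off-by-one can be reproduced with the numbers above. In this sketch the
first function is the formula quoted in this message; the second is only a
guess at one possible correction (dropping the "- 1"), not necessarily what
the patch does. Both function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Formula quoted in the commit message: a plain round-up, which leaves
 * the comparator unchanged when it equals the main counter. */
uint64_t sketch_next_comparator_old(uint64_t counter, uint64_t comparator,
                                    uint64_t period)
{
    uint64_t elapsed;

    assert(period > 0 && counter >= comparator);
    elapsed = counter + period - 1 - comparator;
    return comparator + (elapsed / period) * period;
}

/* Hypothetical variant: without the "- 1" the result is always strictly
 * greater than the main counter. */
uint64_t sketch_next_comparator_strict(uint64_t counter, uint64_t comparator,
                                       uint64_t period)
{
    uint64_t elapsed;

    assert(period > 0 && counter >= comparator);
    elapsed = counter + period - comparator;
    return comparator + (elapsed / period) * period;
}
```

For counter = comparator = 130252 and period = 62500, the quoted formula
returns 130252 again, while the variant moves on to 192752.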

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: comparator can only change when master clock is enabled.
Don Slutz [Fri, 2 May 2014 20:18:06 +0000 (16:18 -0400)]
hvm/hpet: comparator can only change when master clock is enabled.

This is based on software-developers-hpet-spec-1-0a.pdf saying:

When the main counter value matches the value in the timer's
comparator register, an interrupt can be generated.  The hardware
will then automatically increase the value in the compare register
by the last value written to that register.

When the overall enable is off (the main count is halted), none of
the compare registers should change.

The code lines:

    elapsed = hpet_read_maincounter(h, guest_time) +
        period - 1 - comparator;
    comparator += (elapsed / period) * period;

are what matter here.  They will always adjust comparator to be no
more than one period away.

Using some numbers to help show the issue:

hpet_read_maincounter(h, guest_time) = 67752
period = 62500
comparator = 255252 == 67752 + 3 * 62500

comparator       : 255252
elapsed          : -125001
elapsed/period   : -2
comparator_delta : -125000
new comparator   : 130252
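
The table above can be checked with a few lines of C. Signed 64-bit values
are used in this sketch so that the intermediate results match the ones
listed; the names and the 'enabled' parameter are assumptions, and the guard
is only one way to express "do not adjust while the main counter is halted".

```c
#include <assert.h>
#include <stdint.h>

/* The quoted adjustment. C integer division truncates toward zero,
 * so -125001 / 62500 is -2. */
int64_t sketch_adjust_comparator(int64_t counter, int64_t comparator,
                                 int64_t period)
{
    int64_t elapsed;

    assert(period > 0);
    elapsed = counter + period - 1 - comparator;
    return comparator + (elapsed / period) * period;
}

/* Hypothetical guard: leave the comparator alone while the main counter
 * is halted, and adjust it only when the counter is running. */
int64_t sketch_read_comparator(int enabled, int64_t counter,
                               int64_t comparator, int64_t period)
{
    if (!enabled)
        return comparator;
    return sketch_adjust_comparator(counter, comparator, period);
}
```

With the clock stopped, a guest that wrote 255252 reads back 255252 from the
guarded version, rather than the 130252 produced by the unconditional
adjustment.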

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Init comparator64 like comparator.
Don Slutz [Fri, 2 May 2014 20:18:05 +0000 (16:18 -0400)]
hvm/hpet: Init comparator64 like comparator.

The software-developers-hpet-spec-1-0a.pdf says that the comparator
starts as all 1's.  Also make the hidden register comparator64 the same.

Since only the hidden register comparator64 is used by hpet_save, it
needs to start out with the right value.

A disabled hpet (like when a guest is starting) should start with
the value the spec says.  Without this change, both the guest (via
reading the comparator) and an administrator using xen-hvmctx will
see all 0's, not all 1's.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: In hpet_save, call hpet_get_comparator.
Don Slutz [Fri, 2 May 2014 20:18:04 +0000 (16:18 -0400)]
hvm/hpet: In hpet_save, call hpet_get_comparator.

This changes save data to consistent/expected values.  It is not
technically required because hpet_get_comparator() will adjust from
any value to the correct value, and hpet_get_comparator() is
effectively called in hpet_load via hpet_set_timer.

However it does look strange to people that the output from
xen-hvmctx for the comparator values does not change when the master
clock does.

The software-developers-hpet-spec-1-0a.pdf says that the comparator
will always be greater than the master clock for a periodic timer.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: In hpet_save, correctly compute mc64.
Don Slutz [Fri, 2 May 2014 20:18:03 +0000 (16:18 -0400)]
hvm/hpet: In hpet_save, correctly compute mc64.

When the master clock is not enabled, mc64 has the right value.

Basically do the same thing as hpet_read_maincounter().

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Correctly limit period to a maximum.
Don Slutz [Fri, 2 May 2014 20:18:02 +0000 (16:18 -0400)]
hvm/hpet: Correctly limit period to a maximum.

In the code section after the comment:

    /*
     * Clamp period to reasonable min/max values:
     *  - minimum is 100us, same as timers controlled by vpt.c
     *  - maximum is to prevent overflow in time_after() calculations
     */

The current maximum limit actually allows "bad" values like 0 and 1.
This is because it uses a mask not a maximum.
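
The difference between a mask and a maximum is easy to see in isolation. In
this sketch the limit value and the function names are illustrative only; the
real bound used in hpet.c is not quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative limit; not the value used by Xen. */
#define SKETCH_PERIOD_MAX UINT64_C(0x7fffffff)

/* Masking keeps only the low bits, so an over-large period can
 * collapse to a tiny value such as 0 or 1. */
uint64_t sketch_limit_by_mask(uint64_t period)
{
    return period & SKETCH_PERIOD_MAX;
}

/* Clamping saturates at the maximum instead. */
uint64_t sketch_limit_by_clamp(uint64_t period)
{
    return period > SKETCH_PERIOD_MAX ? SKETCH_PERIOD_MAX : period;
}

/* Worked cases for both approaches. */
void sketch_limit_examples(void)
{
    assert(sketch_limit_by_mask(UINT64_C(0x80000000)) == 0);
    assert(sketch_limit_by_mask(UINT64_C(0x80000001)) == 1);
    assert(sketch_limit_by_clamp(UINT64_C(0x80000001)) == SKETCH_PERIOD_MAX);
    assert(sketch_limit_by_clamp(1000) == 1000);
}
```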

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Only set comparator or period not both.
Don Slutz [Fri, 2 May 2014 20:18:01 +0000 (16:18 -0400)]
hvm/hpet: Only set comparator or period not both.

The current code sets both.  When setting the comparator, also set
comparator64 (the hidden version).

Based on:

software-developers-hpet-spec-1-0a.pdf

A write call should only change comparator or period, not both.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Only call guest_time_hpet(h) one time per action.
Don Slutz [Fri, 2 May 2014 20:18:00 +0000 (16:18 -0400)]
hvm/hpet: Only call guest_time_hpet(h) one time per action.

This call is expensive and will cause extra time to pass.

The software-developers-hpet-spec-1-0a.pdf does not say how long it
takes after the main clock is enabled before the first change of the
master clock.  Therefore multiple calls to guest_time_hpet(h) are
not needed.  Since each timer is started by a loop, each one's start
time will change across the multiple calls.  In real hardware, there
is no delta based on which timer it is.

Without this change it is possible for an HVM guest running Linux to
get the message:

..MP-BIOS bug: 8254 timer not connected to IO-APIC

on the guest console(s), and the guest will panic.

Also the Xen hypervisor console will be flooded with:

vioapic.c:352:d1 Unsupported delivery mode 7
vioapic.c:352:d1 Unsupported delivery mode 7
vioapic.c:352:d1 Unsupported delivery mode 7

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Add manual unit test code.
Don Slutz [Fri, 2 May 2014 20:17:59 +0000 (16:17 -0400)]
hvm/hpet: Add manual unit test code.

Add the code at tools/tests/vhpet.

See comment in tools/tests/vhpet/main.c for details on running
either in a xen source tree or elsewhere.

A basic in source tree usage is:

make -C tools/tests/vhpet run

It reproduces the bug:

..MP-BIOS bug: 8254 timer not connected to IO-APIC

The makefile includes copying hpet.c and hpet.h from the source
tree.  hpet.c is then modified to remove all include files and add the
emul.h include file.

The manual test code has only a few automatic checks that output
messages to stderr:

1) Possible ..MP-BIOS bug: 8254 timer...
   if 1st period is not <= the expected value

2) hpet_set_mode(%ld): T%d Error: Set ...
   if read of comparator != write of comparator in

3) hpet_check_stopped(%ld): T%d Error: Set ...
   if read != write

4) main(%ld): With clock stopped mc64 changed: ...
   if hpet_save returns different master clock values when called
   more than once.

It also generates a lot of output, which is why the suggested way to
use it includes a redirect of stdout to a file.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxenstat: don't leak memory in getBridge
Matthew Daley [Sun, 4 May 2014 08:31:47 +0000 (20:31 +1200)]
xenstat: don't leak memory in getBridge

getBridge's method of returning a result was a little confused:
allocating a result buffer but never using it.

Simplify by instead allowing a result buffer to be passed in and
modifying the single usage to match.

Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxenstat: fix unsigned less-than-0 comparison
Matthew Daley [Sun, 4 May 2014 08:31:46 +0000 (20:31 +1200)]
xenstat: fix unsigned less-than-0 comparison

Commit 1438d36f ("xenstat: Fix buffer over-run with new_domains being
negative.") attempted to fix the handling of a negative error result
from xc_domain_getinfolist in xenstat_get_node. However, it forgot to
change the result variable from an unsigned type to a signed one.

Do so, allowing the error result to be handled properly.
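
The underlying C pitfall, reduced to a sketch: sketch_getinfolist() is a
made-up stand-in for a call that reports failure as -1, not the libxc
function itself.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a library call that fails with -1. */
int sketch_getinfolist(void)
{
    return -1;
}

/* Buggy pattern: the result is stored in an unsigned variable, so -1
 * becomes UINT_MAX and a later "< 0" test on it can never be true. */
unsigned int sketch_result_as_unsigned(void)
{
    unsigned int ret = (unsigned int)sketch_getinfolist();
    return ret;
}

/* Fixed pattern: keep the result signed so the error check works. */
int sketch_call_failed(void)
{
    int ret = sketch_getinfolist();
    return ret < 0;
}

/* Worked cases for both patterns. */
void sketch_signedness_examples(void)
{
    assert(sketch_result_as_unsigned() == UINT_MAX);
    assert(sketch_call_failed() == 1);
}
```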

Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agonetif.h: Document xen-net{back, front} multi-queue feature
Andrew J. Bennieston [Tue, 6 May 2014 11:03:18 +0000 (12:03 +0100)]
netif.h: Document xen-net{back, front} multi-queue feature

Document the multi-queue feature in terms of XenStore keys to be written
by the backend and by the frontend.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/libxl: add direct_io_safe to check-xl-disk-parse
Olaf Hering [Mon, 5 May 2014 13:30:28 +0000 (15:30 +0200)]
tools/libxl: add direct_io_safe to check-xl-disk-parse

Add missing bool "direct_io_safe" to expected output. It was added by
Commit 6ec48cf4 ("libxl: introduce an option for disabling the
non-O_DIRECT workaround"), but check-xl-disk-parse was not updated.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agox86: reduce redundancy in tsc_[gs]et_info()
Jan Beulich [Wed, 7 May 2014 14:36:11 +0000 (16:36 +0200)]
x86: reduce redundancy in tsc_[gs]et_info()

- some of the case statements are effectively or mostly special cases
  of others, so there's no good reason not to share the code
- in the "get" function, a variable can be made case-wide instead of
  having multiple instances of it (and those even with a pointless
  initializer)
- minor formatting adjustments

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agocredit2: use unique names
Juergen Gross [Wed, 7 May 2014 14:35:24 +0000 (16:35 +0200)]
credit2: use unique names

Avoid names duplicated with the credit scheduler. This makes life easier when
debugging with tools like cscope or crash.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
11 years agox86: merge stuff from asm-x86/x86_64/asm_defns.h to asm-x86/asm_defns.h
Feng Wu [Tue, 6 May 2014 11:55:27 +0000 (13:55 +0200)]
x86: merge stuff from asm-x86/x86_64/asm_defns.h to asm-x86/asm_defns.h

This patch moves stuff unchanged from asm-x86/x86_64/asm_defns.h to
asm-x86/asm_defns.h.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 years agox86: move common_interrupt to entry.S
Feng Wu [Tue, 6 May 2014 11:54:16 +0000 (13:54 +0200)]
x86: move common_interrupt to entry.S

This patch moves the label common_interrupt from asm_defns.h to entry.S and
converts SAVE_ALL from a C to an assembler macro.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86: define macros CPUINFO_features and CPUINFO_FEATURE_OFFSET
Feng Wu [Tue, 6 May 2014 11:51:27 +0000 (13:51 +0200)]
x86: define macros CPUINFO_features and CPUINFO_FEATURE_OFFSET

This patch defines macros CPUINFO_features and CPUINFO_FEATURE_OFFSET.
CPUINFO_features can be used as the base of the offset for cpu features,
while CPUINFO_FEATURE_OFFSET is used to define the right offset for
a specific CPU feature.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Some further cleanup (both to the patch and to surrounding code).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
11 years agox86,amd_ucode: verify max allowed patch size before apply
Aravind Gopalakrishnan [Tue, 6 May 2014 11:39:05 +0000 (13:39 +0200)]
x86,amd_ucode: verify max allowed patch size before apply

Each family has a stipulated max patch_size. Use this as an
additional sanity check before we apply it.

Also, tone down the amount of debug messages and
follow microcode_intel's implementation of pr_debug.

While at it, fix comment at very top to indicate we support ucode
patch loading from fam10h and higher.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Reviewed-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
11 years agox86/time: cpuid_time_leaf() cleanup
Andrew Cooper [Tue, 6 May 2014 11:33:46 +0000 (13:33 +0200)]
x86/time: cpuid_time_leaf() cleanup

* Don't mix uint32_t and unsigned int between prototype and definition
* Don't bitwise or with 0

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agoNPT: temporarily retain page table mapping in do_recalc()
Jan Beulich [Tue, 6 May 2014 11:30:31 +0000 (13:30 +0200)]
NPT: temporarily retain page table mapping in do_recalc()

Commit b3e024f3 ("x86/NPT: don't walk page tables when changing types
on a range") neglected the fact that p2m_next_level() replaces the
previous level's mapping with the new level's one, hence dereferencing
a stale pointer the translation for which may no longer be available
(timing dependent). Add a parameter to that function allowing the
caller to request that the mapping be retained (the unmapping will be
taken care of by the caller then).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agolibxl: introduce an option for disabling the non-O_DIRECT workaround
Stefano Stabellini [Wed, 30 Apr 2014 15:06:24 +0000 (16:06 +0100)]
libxl: introduce an option for disabling the non-O_DIRECT workaround

Document and implement a new option that permits disk backends which
would otherwise have to avoid O_DIRECT (because of the network memory
lifetime bug) to use it anyway.  This is:
 direct-io-safe   in the xl domain disk config specification
 direct_io_safe   in the libxl disk API
 direct-io-safe   in the backend xenstore interface

Add a reference to xen/include/public/io/blkif.h in
docs/misc/vbd-interface.txt.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Felipe Franciosi <felipe@paradoxo.org>
11 years agolibxl: Rerun bison
Ian Jackson [Fri, 2 May 2014 16:47:55 +0000 (17:47 +0100)]
libxl: Rerun bison

This updates libxlu_cfg_y.[ch] to code generated by bison from
Debian wheezy (1:2.5.dfsg-2.1 i386).

There should be no functional change since there is no change to the
source file, but we will inherit bugfixes and behavioural changes from
the new version of bison.  So this is more a matter of hope than
knowledge.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agoxen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle
Julien Grall [Tue, 22 Apr 2014 13:14:24 +0000 (14:14 +0100)]
xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle

Code adapted from linux drivers/of/base.c (commit ef42c58).

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: vtimer: rename vcpu_domain_init into domain_vtimer_init
Julien Grall [Thu, 1 May 2014 12:31:15 +0000 (13:31 +0100)]
xen/arm: vtimer: rename vcpu_domain_init into domain_vtimer_init

The current function name vcpu_domain_init doesn't reflect what the function
does and might be misused.

Rename it into domain_vtimer_init.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: bail from placement on non-NUMA boxes
Dario Faggioli [Wed, 30 Apr 2014 15:44:13 +0000 (17:44 +0200)]
libxl: bail from placement on non-NUMA boxes

If there is only 1 NUMA node, there is no need to go through placement
candidate selection, etc.; we can just bail.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/mfn-dump: Fixes to 'dump-p2m'
Andrew Cooper [Thu, 24 Apr 2014 21:06:27 +0000 (22:06 +0100)]
tools/mfn-dump: Fixes to 'dump-p2m'

* Don't walk off the end of p2m_table under the mistaken impression that it
  contains toolstack unsigned longs.  Despite its array type it contains guest
  unsigned longs so unconditionally needs casting to the guest width to use
  correctly.  Furthermore, a 64bit toolstack must be extra careful when it
  finds a 32bit guest's INVALID_MFN.

* Drop 'mapped' and 'pinned' descriptions.  These are both bogus, including all
  uses of the is_mapped() macro.

* Rearrange the type name printing to be more concise.
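
The first point above, reading the table at guest width, might look roughly
like this. It is a sketch under assumptions: the helper name, the all-ones
invalid marker and the byte-wise access are illustrative and not taken from
xen-mfndump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: an all-ones entry marks an invalid MFN at either width. */
#define SKETCH_INVALID_MFN UINT64_MAX

/* Read entry 'idx' from a p2m table whose entries are 'guest_width'
 * bytes wide (4 or 8), widening a 32-bit invalid marker to 64 bits. */
uint64_t sketch_p2m_entry(const void *p2m_table, size_t idx,
                          unsigned int guest_width)
{
    const unsigned char *base = p2m_table;

    assert(p2m_table != NULL);
    assert(guest_width == 4 || guest_width == 8);

    if (guest_width == 8) {
        uint64_t entry64;
        memcpy(&entry64, base + idx * 8, sizeof(entry64));
        return entry64;
    } else {
        uint32_t entry32;
        memcpy(&entry32, base + idx * 4, sizeof(entry32));
        /* 0xffffffff must not be mistaken for a valid 64-bit MFN. */
        return entry32 == UINT32_MAX ? SKETCH_INVALID_MFN : entry32;
    }
}
```

Indexing the same table as an array of toolstack unsigned longs would, for a
32-bit guest on a 64-bit toolstack, both walk off the end and fuse pairs of
entries.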

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/misc: Fix linkage of libxenstore
Andrew Cooper [Thu, 24 Apr 2014 21:17:57 +0000 (22:17 +0100)]
tools/misc: Fix linkage of libxenstore

* xen-mfndump doesn't use xenstore at all.  Don't link against it.

* xen-hptool can include the correct header rather than externing itself a
  single function.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/libxl: fix typo in main_tmem_freeable
Olaf Hering [Tue, 29 Apr 2014 09:09:55 +0000 (11:09 +0200)]
tools/libxl: fix typo in main_tmem_freeable

missing letter 'b'.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agovtpmmgr: properly remove t_uint size dependency
Daniel De Graaf [Mon, 28 Apr 2014 23:29:10 +0000 (19:29 -0400)]
vtpmmgr: properly remove t_uint size dependency

Rather than using the internal MPI format for the Diffie-Hellman group,
whose representation depends on the size of the t_uint type, store the
value as a big-endian integer and use mpi_read_binary to convert it in
an architecture-independent manner.  This patch also removes the
unnecessary range check on the exponent which ended up being different
between 32- and 64-bit code.
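
Why a big-endian byte string is architecture independent can be shown with a
small decoder. This sketch handles at most 8 bytes and is only meant to
convey the idea; it is not how mpi_read_binary is implemented, and the
function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode up to 8 big-endian bytes. The result depends only on the byte
 * sequence, not on the host's word size or byte order. */
uint64_t sketch_read_be(const unsigned char *buf, size_t len)
{
    uint64_t value = 0;
    size_t i;

    assert(buf != NULL && len <= sizeof(value));

    for (i = 0; i < len; i++)
        value = (value << 8) | buf[i];  /* most significant byte first */

    return value;
}
```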

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agobuild: export CC value to SeaBIOS
Roger Pau Monne [Wed, 16 Apr 2014 14:13:31 +0000 (16:13 +0200)]
build: export CC value to SeaBIOS

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agogdbsx: remove cast from ioctl
Roger Pau Monne [Wed, 16 Apr 2014 14:13:30 +0000 (16:13 +0200)]
gdbsx: remove cast from ioctl

The ulong type is not defined on FreeBSD, and the cast seems
pointless, so just remove it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
11 years agoxenstat: add a dummy FreeBSD implementation
Roger Pau Monne [Wed, 16 Apr 2014 14:13:29 +0000 (16:13 +0200)]
xenstat: add a dummy FreeBSD implementation

Add an empty FreeBSD implementation so xenstat can compile on FreeBSD.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: add support for OS-specific names to backend interfaces
Roger Pau Monne [Wed, 16 Apr 2014 14:13:24 +0000 (16:13 +0200)]
libxl: add support for OS-specific names to backend interfaces

libxl__device_nic_devname used to hardcode backend network interfaces
as "vif<domid>.<handle>". Remove this limitation and allow libxl to
deal with OS-specific interface names.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxenstore: add some missing headers
Roger Pau Monne [Wed, 16 Apr 2014 14:13:20 +0000 (16:13 +0200)]
xenstore: add some missing headers

xs_tdb_dump.c was including tdb.h, which makes use of dev_t and ino_t,
which are defined in sys/types.h.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibelf: add defines for bswap_* functions for FreeBSD
Roger Pau Monne [Wed, 16 Apr 2014 14:13:16 +0000 (16:13 +0200)]
libelf: add defines for bswap_* functions for FreeBSD

This maps bswap_* functions used in libelf to their FreeBSD
counterparts.
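
A minimal sketch of such a mapping, assuming the FreeBSD primitives bswap16,
bswap32 and bswap64 from <sys/endian.h> and the glibc-style names from
<byteswap.h>. The generic fallback branch and the self-check function are
additions for this illustration only and are not claimed to be part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__FreeBSD__)
#include <sys/endian.h>
/* Map the glibc-style names onto the FreeBSD primitives. */
#define bswap_16(x) bswap16(x)
#define bswap_32(x) bswap32(x)
#define bswap_64(x) bswap64(x)
#elif defined(__linux__)
#include <byteswap.h>  /* provides bswap_16/32/64 directly */
#else
/* Generic fallback (illustration only): build the swaps from shifts. */
static inline uint16_t bswap_16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}
static inline uint32_t bswap_32(uint32_t x)
{
    return ((uint32_t)bswap_16((uint16_t)x) << 16) |
           bswap_16((uint16_t)(x >> 16));
}
static inline uint64_t bswap_64(uint64_t x)
{
    return ((uint64_t)bswap_32((uint32_t)x) << 32) |
           bswap_32((uint32_t)(x >> 32));
}
#endif

/* Callers keep using the bswap_* names regardless of platform. */
static void bswap_selfcheck(void)
{
    assert(bswap_16(0x1122) == 0x2211);
    assert(bswap_32(0x11223344u) == 0x44332211u);
    assert(bswap_64(0x0102030405060708ULL) == 0x0807060504030201ULL);
}
```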

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxc: remove include of malloc.h
Roger Pau Monne [Wed, 16 Apr 2014 14:13:15 +0000 (16:13 +0200)]
libxc: remove include of malloc.h

The malloc set of functions should have their prototypes in stdlib.h
according to:

http://pubs.opengroup.org/onlinepubs/009695399/functions/malloc.html

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxc: remove usage of "daylight" variable
Roger Pau Monne [Wed, 16 Apr 2014 14:13:14 +0000 (16:13 +0200)]
libxc: remove usage of "daylight" variable

FreeBSD doesn't implement the XSI extension that mandates the presence
of the daylight variable as described in:

http://pubs.opengroup.org/onlinepubs/009696799/functions/tzset.html

So avoid using it for portability reasons. Use tm_isdst instead to
decide if daylight savings time conversions should be used or not.
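
A minimal sketch of the tm_isdst approach, using only standard C. The helper
name dst_in_effect is hypothetical and the code is not taken from libxc.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Report whether daylight saving time is in effect at *t in the local
 * time zone: 1 if so, 0 if not, -1 if the information is unavailable.
 * Unlike the XSI "daylight" variable, tm_isdst is part of standard C.
 */
static int dst_in_effect(const time_t *t)
{
    const struct tm *tm;

    assert(t != NULL);

    tm = localtime(t);  /* not thread-safe; fine for a sketch */
    if (tm == NULL || tm->tm_isdst < 0)
        return -1;

    return tm->tm_isdst > 0;
}
```

Note that tm_isdst describes one specific point in time, whereas daylight only
says whether the zone ever observes DST.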

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agobuild: set FreeBSD specific build variables
Roger Pau Monne [Wed, 16 Apr 2014 14:13:10 +0000 (16:13 +0200)]
build: set FreeBSD specific build variables

This is very similar to what we do in order to build on NetBSD.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: Add missing newline after commit 60f7376
Julien Grall [Thu, 24 Apr 2014 22:45:53 +0000 (23:45 +0100)]
xen/arm: Add missing newline after commit 60f7376

Commit 60f7376 "xen/arm: Inject an undefined instruction when the coproc/sysreg
is not handled" replaced panic by gdprintk.

Unfortunately, a panic message string doesn't need a trailing newline, whereas
gdprintk requires one.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: create_xen_entries has to flush TLBs on every CPU
Julien Grall [Wed, 23 Apr 2014 11:36:56 +0000 (12:36 +0100)]
xen/arm: create_xen_entries has to flush TLBs on every CPU

The function create_xen_entries creates mappings in second-level page tables
which are shared between every CPU.

Only flushing TLBs on the local processor may result in wrong behaviour
when io{re,un}map is used.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: traps: Add missing 0x in bad_trap
Julien Grall [Thu, 10 Apr 2014 11:44:25 +0000 (12:44 +0100)]
xen/arm: traps: Add missing 0x in bad_trap

The syndrome value is printed in hexadecimal. Prefix it by 0x for less
confusion.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/dts: Add dt_property_read_bool
Julien Grall [Tue, 22 Apr 2014 13:14:23 +0000 (14:14 +0100)]
xen/dts: Add dt_property_read_bool

The function checks whether a property exists in a specific node.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/common: grant-table: only call IOMMU if paging mode translate is disabled
Julien Grall [Tue, 22 Apr 2014 13:14:19 +0000 (14:14 +0100)]
xen/common: grant-table: only call IOMMU if paging mode translate is disabled

From Xen's point of view, ARM guests are PV guests with paging auto translate
enabled.

When IOMMU support is added for ARM, mapping a grant ref will always crash
Xen due to the BUG_ON in __gnttab_map_grant_ref.

On x86:
    - PV guests always have paging mode translate disabled
    - PVH and HVM guests always have paging mode translate enabled

It means that we can safely replace the check that the domain is a PV guest
with a check of whether the guest has paging mode translate enabled.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Keir Fraser <keir@xen.org>
11 years agoxen/arm: p2m: apply_p2m_changes: Only load domain P2M when we flush TLBs
Julien Grall [Tue, 22 Apr 2014 13:14:18 +0000 (14:14 +0100)]
xen/arm: p2m: apply_p2m_changes: Only load domain P2M when we flush TLBs

apply_p2m_changes needs to switch to another VTTBR temporarily to avoid
flushing every TLB.

As it's only needed there, we can restrict the scope where the VTTBR of this
domain is loaded.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: p2m: Move comment that was misplaced
Julien Grall [Tue, 22 Apr 2014 13:14:17 +0000 (14:14 +0100)]
xen/arm: p2m: Move comment that was misplaced

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: Constify address pointer for cache helpers
Julien Grall [Tue, 22 Apr 2014 13:14:16 +0000 (14:14 +0100)]
xen/arm: Constify address pointer for cache helpers

The memory pointed to by this pointer is not modified in clean_xen_dcache_va_range
and clean_and_invalidate_xen_dcache_va_range.

Constify it. This will allow us to use these helpers later in code which uses
const.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: map_device: Don't hardcode dom0 in print message
Julien Grall [Tue, 22 Apr 2014 13:14:15 +0000 (14:14 +0100)]
xen/arm: map_device: Don't hardcode dom0 in print message

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/serial: remove serial_dt_irq
Julien Grall [Tue, 22 Apr 2014 12:58:45 +0000 (13:58 +0100)]
xen/serial: remove serial_dt_irq

This function was only used for ARM IRQ routing which has been removed in an
earlier patch.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Keir Fraser <keir@xen.org>
11 years agoxen/arm: IRQ: Do not allow IRQ to be shared between domains and XEN
Julien Grall [Tue, 22 Apr 2014 12:58:44 +0000 (13:58 +0100)]
xen/arm: IRQ: Do not allow IRQ to be shared between domains and XEN

The current dt_route_irq_to_guest implementation sets IRQ_GUEST regardless of
whether the IRQ is correctly set up.

An IRQ can be shared between devices. If the devices are not assigned to the
same domain or to Xen, this could result in routing the IRQ to the domain
instead of Xen ...

Also avoid relying on the wrong behaviour when Xen is routing an IRQ to
DOM0. Therefore check the return code from route_dt_irq_to_guest in
map_device.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: Defer routing IRQ to Xen until setup_irq() call
Julien Grall [Tue, 22 Apr 2014 12:58:43 +0000 (13:58 +0100)]
xen/arm: IRQ: Defer routing IRQ to Xen until setup_irq() call

When an IRQ is handled by Xen, setup is done in 2 steps:
    - Route the IRQ to the current CPU and set priorities
    - Set up the handler

For PPIs, these steps are called on every cpu. For SPIs, they are only called
on the boot CPU.

Dividing the setup into two steps complicates the code when a new driver is
added to Xen (for instance a SMMU driver). Xen can safely route the IRQ
when the driver sets up the interrupt handler.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: Require desc.lock be held by callers of hw_irq_controller callbacks
Julien Grall [Tue, 22 Apr 2014 12:58:42 +0000 (13:58 +0100)]
xen/arm: IRQ: Require desc.lock be held by callers of hw_irq_controller callbacks

When multiple actions are supported, gic_irq_{startup,shutdown} will have
to be called in the same critical section as setup/release.
Otherwise there is a race condition if at the same time CPU A is calling
release_dt_irq and CPU B is calling setup_dt_irq.

This could end up with the IRQ not being enabled.

At the same time, modify gic_irq_{enable,disable} to require desc.lock be held.

With both of these changes, ARM's locking requirements are the same as x86's.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ Introduce irq_get_domain
Julien Grall [Tue, 22 Apr 2014 12:58:41 +0000 (13:58 +0100)]
xen/arm: IRQ Introduce irq_get_domain

This function retrieves a domain from an IRQ. It will be used in several
places (such as do_IRQ) to avoid duplicated code once multiple actions are
supported.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: Move IRQ management from gic.c to irq.c
Julien Grall [Tue, 22 Apr 2014 12:58:40 +0000 (13:58 +0100)]
xen/arm: IRQ: Move IRQ management from gic.c to irq.c

The file gic.c contains functions and variables which are not related to the GIC:
    - release_irq
    - setup_irq
    - gic_route_irq_to_guest
    - {,local_}irq_desc

Move all these functions/variables to irq.c.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: Rework gic_route_irq_to_guest function
Julien Grall [Tue, 22 Apr 2014 12:58:39 +0000 (13:58 +0100)]
xen/arm: IRQ: Rework gic_route_irq_to_guest function

The function gic_route_irq_to_guest contains code which is not related to the
GIC. Split the function into 2 parts:

- route_dt_irq_to_guest: set up the desc
- gic_route_irq_to_guest: correctly set up the GIC and the desc handler

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: remove __init from setup_dt_irq, request_dt_irq and release_irq
Julien Grall [Tue, 22 Apr 2014 12:58:38 +0000 (13:58 +0100)]
xen/arm: IRQ: remove __init from setup_dt_irq, request_dt_irq and release_irq

These functions will be used in the SMMU driver, which requests an interrupt
when a device is assigned to a guest.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: drop irq parameter in __setup_irq
Julien Grall [Tue, 22 Apr 2014 12:58:37 +0000 (13:58 +0100)]
xen/arm: IRQ: drop irq parameter in __setup_irq

The IRQ number is already provided by desc and __setup_irq doesn't use
it in any case.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: move gic {, un}lock in gic_set_irq_properties
Julien Grall [Tue, 22 Apr 2014 12:58:36 +0000 (13:58 +0100)]
xen/arm: IRQ: move gic {, un}lock in gic_set_irq_properties

The function gic_set_irq_properties is only called in two places:
    - gic_route_irq: the gic.lock is only taken for the call to the
    former function.
    - gic_route_irq_to_guest: the gic.lock is taken for the duration of
    the function, but the lock is only needed around the call to
    gic_set_irq_properties.

So we can safely move the lock into gic_set_irq_properties and restrict the
critical section for the gic.lock in gic_route_irq_to_guest.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: Rename irq_cfg into arch_irq_desc
Julien Grall [Tue, 22 Apr 2014 12:58:35 +0000 (13:58 +0100)]
xen/arm: IRQ: Rename irq_cfg into arch_irq_desc

irq_cfg is never used in the code and arch_irq_desc is an alias to irq_cfg.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: IRQ: Use default irq callback from common code for no_irq_type
Julien Grall [Tue, 22 Apr 2014 12:58:34 +0000 (13:58 +0100)]
xen/arm: IRQ: Use default irq callback from common code for no_irq_type

Most of the no_irq_type callbacks are already defined in common/irq.c. We don't
need to recreate our own callbacks.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: timer: replace timer_dt_irq by timer_get_irq
Julien Grall [Tue, 22 Apr 2014 12:58:33 +0000 (13:58 +0100)]
xen/arm: timer: replace timer_dt_irq by timer_get_irq

The function is almost exclusively used to retrieve the IRQ number.

There is one place where the IRQ type is used (in domain_build.c) but,
as the timer IRQ is virtualised for the guest, we might not have the same
property (e.g. active-low level-sensitive interrupt).

Replace timer_dt_irq by timer_get_irq which will return the IRQ number.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agodomctl: perform initial post-XSA-77 auditing
Jan Beulich [Fri, 2 May 2014 10:09:48 +0000 (12:09 +0200)]
domctl: perform initial post-XSA-77 auditing

In a number of cases, loops over each vCPU in a domain are involved
here. For large numbers of vCPU-s these may still take some time to
complete, but we're limiting them at a couple of thousand at most, so I
would think this should not by itself be an issue. I wonder though
whether it shouldn't be possible to have XSM restrict the vCPU count
that can be set through XEN_DOMCTL_max_vcpus.

XEN_DOMCTL_pausedomain:

 A loop over vcpu_sleep_sync() for each vCPU in the domain. That
 function itself has a loop waiting for the subject vCPU to become non-
 runnable, which ought to complete quickly (involving an IPI to be sent
 and acted on). No other unbounded resource usage.

XEN_DOMCTL_unpausedomain:

 Simply a loop calling vcpu_wake() (not having any loops or other
 resource usage itself) for each vCPU in the domain.

XEN_DOMCTL_getdomaininfo:

 Two loops (one over all domains, i.e. bounded by the limit of 32k
 domains, and another over all vCPU-s in the domain); no other
 unbounded resource usage.

XEN_DOMCTL_getpageframeinfo:

 Inquiring just a single MFN, i.e. no loops and no other unbounded
 resource usage.

XEN_DOMCTL_getpageframeinfo{2,3}:

 Number of inquired MFNs is limited to 1024. Beyond that just like
 XEN_DOMCTL_getpageframeinfo.

XEN_DOMCTL_getvcpuinfo:

 Only obtaining information on the vCPU, no loops or other resource
 usage.

XEN_DOMCTL_setdomainhandle:

 Simply a memcpy() of a very limited amount of data.

XEN_DOMCTL_setdebugging:

 A domain_{,un}pause() pair (see XEN_DOMCTL_{,un}pausedomain) framing
 the setting of a flag.

XEN_DOMCTL_hypercall_init:

 Initializing a guest provided page with hypercall stubs. No other
 resource consumption.

XEN_DOMCTL_arch_setup:

 IA64 leftover, interface structure being removed from the public
 header.

XEN_DOMCTL_settimeoffset:

 Setting a couple of guest state fields. No other resource consumption.

XEN_DOMCTL_getvcpuaffinity:
XEN_DOMCTL_getnodeaffinity:

 Involve temporary memory allocations (approximately) bounded by the
 number of CPUs in the system / number of nodes built for, which is
 okay. Beyond that trivial operation.

XEN_DOMCTL_real_mode_area:

 PPC leftover, interface structure being removed from the public
 header.

XEN_DOMCTL_resumedomain:

 A domain_{,un}pause() pair framing operation very similar to
 XEN_DOMCTL_unpausedomain (see above).

XEN_DOMCTL_sendtrigger:

 Injects an interrupt (SCI or NMI) without any other resource
 consumption.

XEN_DOMCTL_subscribe:

 Updates the suspend event channel, i.e. affecting only the controlled
 domain.

XEN_DOMCTL_disable_migrate:
XEN_DOMCTL_suppress_spurious_page_faults:

 Just setting respective flags on the domain.

XEN_DOMCTL_get_address_size:

 Simply reading the guest property.

XEN_DOMCTL_set_opt_feature:

 Was already tagged IA64-only.

XEN_DOMCTL_set_cpuid:

 MAX_CPUID_INPUT bounded loop, which is okay. No other resource
 consumption.

XEN_DOMCTL_get_machine_address_size:

 Simply obtaining the value set by XEN_DOMCTL_set_machine_address_size
 (or the default set at domain creation time).

XEN_DOMCTL_gettscinfo:
XEN_DOMCTL_settscinfo:

 Reading/writing of a couple of guest state fields wrapped in a
 domain_{,un}pause() pair.

XEN_DOMCTL_audit_p2m:

 Enabled only in debug builds.

XEN_DOMCTL_set_max_evtchn:

 While the limit set here implies other (subsequent) resource usage,
 this is the purpose of the operation.

I also verified that all removed domctls' handlers don't leak
hypervisor memory contents.

Inspected but questionable (and hence left in place for now):

XEN_DOMCTL_max_mem:

 This only sets the field capping a domain's allocation (which
 implies potential successive resource usage, but that's the purpose of
 the operation). However, XSM doesn't see the value that's being set
 here, so the net effect would be potential unbounded memory use.

XEN_DOMCTL_set_virq_handler:

 This modifies a global array. While that is the purpose of the
 operation, if multiple domains are granted permission they can badly
 interfere with one another. Hence I'd appreciate a second opinion
 here. [Andrew confirms that this being the nature of the operation,
 it's fine to be removed from the list - will be done in a 2nd round.]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86: fix guest CPUID handling
Jan Beulich [Fri, 2 May 2014 10:09:03 +0000 (12:09 +0200)]
x86: fix guest CPUID handling

The way XEN_DOMCTL_set_cpuid got handled so far allowed for surprises
to the caller. With this set of operations
- set leaf A (using array index 0)
- set leaf B (using array index 1)
- clear leaf A (clearing array index 0)
- set leaf B (using array index 0)
- clear leaf B (clearing array index 0)
the entry for leaf B at array index 1 would still be in place, while
the caller would expect it to be cleared.
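
The sequence can be replayed with a small model. Everything in the sketch
below is hypothetical (slot count, the UNUSED marker, both lookup functions);
it only mimics a lookup that stops at the first slot that is either unused or
matching, and contrasts it with one possible remedy that prefers an existing
match. It is not the hypervisor's code, nor necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS  4            /* hypothetical, kept small for illustration */
#define UNUSED UINT32_MAX   /* hypothetical "slot is free" marker */

struct slot { uint32_t leaf; uint32_t value; };
typedef int (*finder)(const struct slot *, uint32_t);

/* Problematic lookup: stop at the first slot that is unused OR matches. */
static int find_first_fit(const struct slot *s, uint32_t leaf)
{
    for (int i = 0; i < SLOTS; i++)
        if (s[i].leaf == UNUSED || s[i].leaf == leaf)
            return i;
    return -1;
}

/* Possible remedy: prefer an existing match, else the first unused slot. */
static int find_match_first(const struct slot *s, uint32_t leaf)
{
    int unused = -1;

    for (int i = 0; i < SLOTS; i++) {
        if (s[i].leaf == leaf)
            return i;
        if (s[i].leaf == UNUSED && unused < 0)
            unused = i;
    }
    return unused;
}

static void set_leaf(struct slot *s, finder find, uint32_t leaf, uint32_t v)
{
    int i = find(s, leaf);

    assert(i >= 0);
    s[i].leaf = leaf;
    s[i].value = v;
}

static void clear_leaf(struct slot *s, finder find, uint32_t leaf)
{
    int i = find(s, leaf);

    if (i >= 0 && s[i].leaf == leaf)
        s[i].leaf = UNUSED;
}

/* Replay the five steps; return how many slots still hold leaf B. */
static int replay(finder find)
{
    struct slot s[SLOTS];
    const uint32_t A = 1, B = 7;
    int left = 0;

    for (int i = 0; i < SLOTS; i++) {
        s[i].leaf = UNUSED;
        s[i].value = 0;
    }

    set_leaf(s, find, A, 10);
    set_leaf(s, find, B, 20);
    clear_leaf(s, find, A);
    set_leaf(s, find, B, 30);
    clear_leaf(s, find, B);

    for (int i = 0; i < SLOTS; i++)
        left += (s[i].leaf == B);
    return left;
}
```

In this model, replay(find_first_fit) leaves one stale entry for leaf B, while
replay(find_match_first) leaves none.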

While looking at the use sites of d->arch.cpuid[] I also noticed that
the allocation of the array needlessly uses the zeroing form - the
relevant fields of the array elements get set in a loop immediately
following the allocation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/hvm: indicate availability of HW support of APIC virtualization to HVM guests
Boris Ostrovsky [Fri, 2 May 2014 10:06:44 +0000 (12:06 +0200)]
x86/hvm: indicate availability of HW support of APIC virtualization to HVM guests

Set bits in hypervisor CPUID leaf indicating that HW provides (and the
hypervisor enables) HW support for APIC and x2APIC virtualization.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-off-by: Jan Beulich <jbeulich@suse.com>
11 years agox86/hvm: add HVM-specific hypervisor CPUID leaf
Boris Ostrovsky [Fri, 2 May 2014 10:04:20 +0000 (12:04 +0200)]
x86/hvm: add HVM-specific hypervisor CPUID leaf

CPUID leaf 0x40000004 is for HVM-specific features.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-off-by: Jan Beulich <jbeulich@suse.com>
11 years agolibxc: allow changing max number of hypervisor cpuid leaves
Boris Ostrovsky [Fri, 2 May 2014 10:03:36 +0000 (12:03 +0200)]
libxc: allow changing max number of hypervisor cpuid leaves

Add support for changing max number of hypervisor leaves from configuration
file.

This number can be specified using xl's standard 'cpuid' option. Only the
lowest 8 bits of the eax register of leaf 0x4000xx00 are processed; all others
are ignored.

The changes allow us to revert commit 80ecb40362365ba77e68fc609de8bd3b7208ae19,
which is most likely no longer needed now anyway (the Solaris bug that it
addressed has been fixed and backported to earlier releases), but leave the
possibility of running an unpatched version of Solaris by forcing the number
of leaves to 2 in the configuration file.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agox86/NPT: don't walk entire page tables when globally changing types
Jan Beulich [Fri, 2 May 2014 09:53:38 +0000 (11:53 +0200)]
x86/NPT: don't walk entire page tables when globally changing types

Instead leverage the NPF VM exit enforcement by marking just the top
level entries as needing recalculation of their type, building on the
respective range type change modifications.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/NPT: don't walk page tables when changing types on a range
Jan Beulich [Fri, 2 May 2014 09:52:42 +0000 (11:52 +0200)]
x86/NPT: don't walk page tables when changing types on a range

This builds on the fact that in order for no NPF VM exit to occur,
_PAGE_USER must always be set. I.e. by clearing the flag we can force a
VM exit allowing us to do similar lazy type changes as on EPT.

That way, the generic entry-wise code can go away, and we could remove
the range restriction enforced on HVMOP_track_dirty_vram for XSA-27.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/EPT: don't walk page tables when changing types on a range
Jan Beulich [Fri, 2 May 2014 09:51:46 +0000 (11:51 +0200)]
x86/EPT: don't walk page tables when changing types on a range

This requires a new P2M backend hook and a little bit of extra care on
accounting in the generic function.

Note that even on leaf entries we must not immediately set the new
type (in an attempt to avoid the EPT_MISCONFIG VM exits), since the
global accounting in p2m_change_type_range() gets intentionally done
only after updating page tables (or else the update there would
conflict with the function's own use of p2m_is_logdirty_range()), and
the correct type can only be calculated with that in place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/EPT: don't walk entire page tables when globally changing types
Jan Beulich [Fri, 2 May 2014 09:50:43 +0000 (11:50 +0200)]
x86/EPT: don't walk entire page tables when globally changing types

Instead leverage the EPT_MISCONFIG VM exit by marking just the top
level entries as needing recalculation of their type, propagating the
recalculation state down as necessary such that the actual
recalculation gets done upon access.

For this to work, we have to
- restrict the types between which conversions can be done (right now
  only the two types involved in log dirty tracking need to be taken
  care of)
- remember the ranges that log dirty tracking was requested for as well
  as whether global log dirty tracking is in effect

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agoamd, maintainers: Update MAINTAINERS file
Aravind Gopalakrishnan [Fri, 2 May 2014 09:47:00 +0000 (11:47 +0200)]
amd, maintainers: Update MAINTAINERS file

Add self as co-maintainer for AMD specific components.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
11 years agohvm_set_ioreq_page() releases wrong page in error path
Paul Durrant [Fri, 2 May 2014 09:46:32 +0000 (11:46 +0200)]
hvm_set_ioreq_page() releases wrong page in error path

The function calls prepare_ring_for_helper() to acquire a mapping for the
given gmfn, then checks (under lock) to see if the ioreq page is already
set up but, if it is, the function then releases the in-use ioreq page
mapping on the error path rather than the one it just acquired. This patch
fixes this bug.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agotmem: drop unnecessary lock in tmem_relinquish_pages()
Bob Liu [Fri, 2 May 2014 09:46:09 +0000 (11:46 +0200)]
tmem: drop unnecessary lock in tmem_relinquish_pages()

CID 1150562

tmem_rwlock is unnecessary in tmem_relinquish_pages(), as
that lock is used as a gate for hypercalls. However,
tmem_relinquish_pages deals with pages that are no longer
owned by any domain - hence there is no need for tmem_rwlock.

Also, the function is protected by the 'heap_lock', which is held by
the only caller of this function.

This patch drops said lock.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
11 years agox86/HVM: clean up HVMOP_set_mem_type processing
Jan Beulich [Fri, 2 May 2014 08:56:23 +0000 (10:56 +0200)]
x86/HVM: clean up HVMOP_set_mem_type processing

- drop unused variable "mfn"
- consistently do not use "else" when the prior "if" ends in "goto"
- use printk() referencing the target domain instead of gdprintk()
  (which references the current domain) and slightly shorten message
- annotate -EINVAL results in paging/shared paths to actually need
  switching to -EAGAIN (possible only when preemption logic got fixed
  to use -ERESTART)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agox86/EPT: flush cache when (potentially) limiting cachability
Jan Beulich [Fri, 2 May 2014 08:54:07 +0000 (10:54 +0200)]
x86/EPT: flush cache when (potentially) limiting cachability

While generally such guest side changes ought to be followed by guest
initiated flushes, we're flushing the cache under similar conditions
elsewhere (e.g. when the guest sets CR0.CD), so let's do so here too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/EPT: also force EMT re-evaluation if pinned ranges change
Jan Beulich [Fri, 2 May 2014 08:51:32 +0000 (10:51 +0200)]
x86/EPT: also force EMT re-evaluation if pinned ranges change

This was inadvertently left out of aa9114ed ("x86/EPT: force
re-evaluation of memory type as necessary"). Note that this
intentionally doesn't use memory_type_changed(): Changes to the pinned
ranges are independent of IOMMU presence, which that function uses to
determine whether to call the underlying p2m_memory_type_changed().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/EPT: fix pinned cache attribute range checking
Jan Beulich [Fri, 2 May 2014 08:50:55 +0000 (10:50 +0200)]
x86/EPT: fix pinned cache attribute range checking

This wasn't done properly by 4d66f069 ("x86: fix pinned cache attribute
handling"): The passed in GFN shouldn't be assumed to be order aligned.
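
To illustrate the alignment point, the sketch below first derives the
order-aligned block that contains the GFN and only then compares it with a
pinned range. The enum, the function name and the inclusive [start, end]
range convention are assumptions made for this example, not the hypervisor's
interface.

```c
#include <assert.h>
#include <stdint.h>

enum overlap { RANGE_NONE, RANGE_PARTIAL, RANGE_FULL };

/*
 * Classify how the 2^order-page block containing gfn relates to the
 * inclusive range [start, end]. gfn need not be order-aligned: the
 * block boundaries are derived by masking rather than assumed.
 */
static enum overlap classify(uint64_t gfn, unsigned int order,
                             uint64_t start, uint64_t end)
{
    uint64_t mask, first, last;

    assert(order < 64);

    mask  = ~UINT64_C(0) << order;  /* clears the low 'order' bits */
    first = gfn & mask;             /* first frame of the block */
    last  = gfn | ~mask;            /* last frame of the block */

    if (first >= start && last <= end)
        return RANGE_FULL;          /* block lies entirely inside */
    if (first <= end && last >= start)
        return RANGE_PARTIAL;       /* block straddles a boundary */
    return RANGE_NONE;
}
```

For example, with order 9 and gfn 0x203 the block spans 0x200 to 0x3ff, so a
pinned range covering only frame 0x203 is a partial overlap rather than a
full one.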

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/EPT: refine direct MMIO checking when determining EMT
Jan Beulich [Fri, 2 May 2014 08:50:04 +0000 (10:50 +0200)]
x86/EPT: refine direct MMIO checking when determining EMT

With need_iommu() only ever true when iommu_enabled is also true, and
with the former getting set when a PCI device gets added to a guest,
the checks can be consolidated. The range set checks are left in place
just in case raw MMIO or I/O port ranges get passed to a guest.

At the same time, drop the open-coding of cache_flush_permitted().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/EPT: consider page order when checking for APIC MFN
Jan Beulich [Fri, 2 May 2014 08:48:48 +0000 (10:48 +0200)]
x86/EPT: consider page order when checking for APIC MFN

This was overlooked in 3d90d6e6 ("x86/EPT: split super pages upon
mismatching memory types").
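
A hedged sketch of an order-aware containment test: a frame lies inside a
2^order-page block exactly when both share all bits above the low 'order'
bits. The function name is hypothetical, and the frame number 0xFEE00 used in
the note below assumes the conventional default APIC base address of
0xFEE00000 with 4K pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if frame 'target' falls inside the 2^order-page block
 * that contains 'mfn'. With order 0 this degenerates to equality.
 */
static bool block_contains_mfn(uint64_t mfn, unsigned int order,
                               uint64_t target)
{
    assert(order < 64);

    /* Frames in the same block differ only in their low 'order' bits. */
    return ((mfn ^ target) >> order) == 0;
}
```

With order 9 (a 2M superpage), a block starting at 0xFEE00 contains frame
0xFEE00 even when the mfn being checked is some other frame of that block,
which a plain equality test would miss.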

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>