dgit.raspbian.org Git

libxc: fix memory leak in move_l3_below_4G error handling

...otherwise mmu gets leaked.

Coverity-ID: 1055844
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

libxc: fix retrieval of and remove pointless check on gzip size

Coverity-ID: 1055587
Coverity-ID: 1055963
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xencommons: write domain 0's domid to xenstore

libvchan's init_xs_srv (server-side xenstore-related initialization)
expects to find the current domain's domid at this xenstore key. libxl
(and xend) write this for domains they create. Do the same for domain 0,
allowing the use of libvchan in dom0.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

libxl: HVM domain S3 bugfix

Currently Xen hvm s3 has a bug coming from the difference between
qemu-traditional and qemu-xen. For qemu-traditional, the way to
resume from hvm s3 is via 'xl trigger' command. However, for
qemu-xen, the way to resume from hvm s3 inherited from standard
qemu, i.e. via QMP, and it doesn't work under Xen.

The root cause is, for qemu-xen, 'xl trigger' command didn't reset
devices, while QMP didn't unpause hvm domain though they did qemu
system reset.

We have two qemu patches one xl patch to fix the HVM S3 bug:
This patch is the xl patch. It invokes QMP system_wakeup so that
qemu logic for hvm s3 could be triggered.

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

tools: remove unnecessary null pointer checks before frees

Patch generated by the following semantic patch
(http://coccinelle.lip6.fr/):

@@
expression *P;
@@

- if(P) free(P);
+ free(P);

...and then by filtering through the following command:

filterdiff -p1 -x 'stubdom/*' -x 'tools/firmware/*' -x 'tools/qemu-*' -x 'tools/blktap*'

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

libxl: ocaml: add META to list of generated files in Makefile

Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: idl: allow KeyedUnion members to be empty

This is useful when the key enum has an "invalid" option and avoids
the need to declare a dummy struct. Use this for domain_build_info
resulting in the generated API changing like so:
    --- tools/libxl/_libxl_BACKUP_types.h
    +++ tools/libxl/_libxl_types.h
    @@ -377,8 +377,6 @@ typedef struct libxl_domain_build_info {
                 const char * features;
                 libxl_defbool e820_host;
             } pv;
    -        struct {
    -        } invalid;
         } u;
     } libxl_domain_build_info;
     void libxl_domain_build_info_dispose(libxl_domain_build_info *p);

+ a related change to the JSON generation.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Revert "VMX: Eliminate cr3 store/load vmexit when UG enabled"

This reverts commit c9efe34c119418a5ac776e5d91aeefcce4576518. It
doesn't work right on non-UG hardware.

tools: xenstored: if the reply is too big then send E2BIG error

This fixes the issue for both C and ocaml xenstored, however only the ocaml
xenstored is vulnerable in its default configuration.

Adding a new error appears to be safe, since bit libxenstore and the Linux
driver at least treat an unknown error code as EINVAL.

This is XSA-72 / CVE-2013-4416.

Original ocaml patch by Jerome Maloberti <jerome.maloberti@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>

fix locking in cpu_disable_scheduler()

So commit eedd6039 ("scheduler: adjust internal locking interface")
uncovered - by now using proper spin lock constructs - a bug after all:
When bringing down a CPU, cpu_disable_scheduler() gets called with
interrupts disabled, and hence the use of vcpu_schedule_lock_irq() was
never really correct (i.e. the caller ended up with interrupts enabled
despite having disabled them explicitly).

Fixing this however surfaced another problem: The call path
vcpu_migrate() -> evtchn_move_pirqs() wants to acquire the event lock,
which however is a non-IRQ-safe once, and hence check_lock() doesn't
like this lock to be acquired when interrupts are already off. As we're
in stop-machine context here, getting things wrong wrt interrupt state
management during lock acquire/release is out of question though, so
the simple solution to this appears to be to just suppress spin lock
debugging for the period of time while the stop machine callback gets
run.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

VMX: Eliminate cr3 store/load vmexit when UG enabled

With the feature of unrestricted guest, Xen should not cause
vmexits for cr3 accesses in non-paging mode.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>

xl: don't emit misleading daemon pid message

After creating a domain, xl forks off a process to handle domain events
(shutdown, disk eject, ...). It prints out the pid of the process
created by the fork to stdout. However, the newly forked process soon
after calls daemon(), which itself forks once more (and exit()s the
original process). This means that the pid printed out is not the pid of
the actual process which remains in the background after all is said and
done, instead it is the pid of the transient process that exists between
xl's fork() and the fork'd process's daemon() call.

We could resolve this by printing the correct pid, ie. by open-coding
daemon() (we already do most of the heavy lifting it does ourselves by
fiddling with the standard fds). However, since no-one seems to be
complaining about the misleading message to begin with, and since it
seems like a pointless message anyway, just remove it outright instead.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxenstore: Use PTHREAD_STACK_MIN

The existing value of 16K is smaller than the arm64 minimum stack size, which
is 128K. PTHREAD_STACK_MIN appears to be standard
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_attr_setstacksize.html

Consindered setting a lower bound but the stack requirements of the watcher
thread are pretty minimal (tens of bytes from the looks of it) and unlikely to
blow PTHREAD_STACK_MIN on any useful platform.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: remove spurious newline from LOG() message

The macro already includes this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libelf: improve errors in elf_xen_note_check()

I recently debugged an isolated failure to boot, with no information other
than the logs.

The "Will only load images built for the generic loader or Linux images"
string was missing a newline, leading to the subsequent error being appended
to this line, rather than having its own line with correctly identified
function.

Furthermore, error messages which state "param containing $FOO is not $BAR" is
fairly useless for debugging without identifying which bad $FOO caused the
failure.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86/irq: print direct vector mappings in the 'i' debug key

Also adjust the initial print message, as the IRQ loop has contained
non-guest interrutps for a while now.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: refine address validity checks before accessing page tables

In commit 40d66baa ("x86: correct LDT checks") and d06a0d71 ("x86: add
address validity check to guest_map_l1e()") I didn't really pay
attention to the fact that these checks would better be done before the
paging_mode_translate() ones, as there's also no equivalent check down
the shadow code paths involved here (at least not up to the first use
of the address), and such generic checks shouldn't really be done by
particular backend functions anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>

common/initcall: extern linker symbols with correct types

Coverity IDs 1054956, 1054957

Coverity pointed out that we applying array operations based on an expression
which yielded singleton pointers. The problem is actually that the externs
were typed incorrectly.

Correct the extern declaration to prevent straying into undefined behaviour,
and relying on the lenience of GCC to work.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

xenctx: fix typo in arm64 output

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

xen: arm: Ensure HCR_EL2.RW is set correctly when building dom0

copy_to_user and friends rely on this, since the address transalation
functions (guest VA -> MFN) will truncate VA to the appropriate size.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

xen: arm: correctly round down MFN to 1GB boundary make sure pagetable mask macros as physaddr size

~FIRST_MASK is nothing like correct for rounding down an MFN. It is the
inverse *and* an address not a framenumber so wrong in every dimension! We
cannot use FIRST_MASK since that would mask off any zeroeth level bits.
Instead calculate the correct value from FIRST_SIZE.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

xen: arm: make sure pagetable mask macros have appropriate size

{ZEROETH,FIRST,SECOND,THIRD}_MASK are used with physical addresses which may
be larger than 32 bits. Therefore ensure that they are wide enough by casting
to paddr_t otherwise we may truncate addresses on 32-bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

xen: arm: map entire memory banks on arm64

Currently we only map regions which are not part of boot modules. However we
subsequently free at least some of those modules to the heaps in
discard_initial_modules and if we were unluckly with sizing/location we might
end up adding unmapped pages to the heap.

The heaps on 64-bit use 1GB mappings, so in practice this is probably pretty
unlikely and I've not actually seen it.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

xen: arm: Enable 40 bit addressing in VTCR for arm64

This requires setting the v8 specific VTCR_EL2.PS field. These bits are
UNK/SBZP on v7.

Also the TS0SZ field is described slightly differently for v8, so update the
comment to reflect this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

xen: correct xenheap_bits after "xen: support RAM at addresses 0 and 4096"

This is incorrect after commit 1aac966e24e which shuffled the zones up by one.
I've observed failures on arm64 systems with RAM at 0x8,00000000-0x8,7fffffff
since xenheap_bits ends up as 35 instead of 36 (which is the zone with all the
RAM).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

xen: arm: fix usage of bootargs for Xen.

The chosen node's bootargs property should be used for Xen if there is a dom0
kernel multiboot module with a command line, not just if xen,dom0-bootargs is
present.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.linaro.org>

xen: arm: correct XEN_COMPILE_ARCH autodetection for arm64

At least on aarch64 openSUSE running with qemu-user-aarch64 "uname -m" reports
"aarch64" and not "armv8" so include that in the seddery. There's no harm
leaving the existing armv8 rune too so do so.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

xen/arm: Allocate memory for dom0 from the bottom with the 1:1 Workaround

On Linux, the option CONFIG_ARM_PATCH_PHYS_VIRT (by default enabled) allows
the Kernel to be loaded anywhere (or nearly) by patching the translation
pv<->virt at boot time.

The current solution in Linux assuming that the delta physical address -
virtual address is always negative. A positive delta will destroy all the
optimisation to modify only a part of the translation instruction (add/sub).

By default, Xen is allocating memory from the top of memory and then
goes down. To avoid booting issue with Linux, we must allocate memory
from the bottom (ie starting from 0).

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: implement smp initialization callbacks for omap5

Signed-off-by: Chen Baozi <baozich@gmail.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

xen/arm: fix a typo in comment of PLATFORM_QUIRK_DOM0_MAPPING_11

Signed-off-by: Chen Baozi <baozich@gmail.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

netif.h: Add IPv6 related changes

My recent patch series to Linux netback added IPv6 checksum
offload and GSO support. This involved making some changes to the
copy of netif.h in Linux.
This patch adds those changes to the canonical copy of netif.h.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: add_to_physmap_one: Avoid to map mfn 0 if an error occurs

By default, the function add_to_physmap_one set mfn to 0. Some code paths that
result to an error, continue and the map the mfn 0 (valid on ARM) to the
slot given by the guest.

To fix the problem, return directly an error if sanity check has failed.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

spinlock: ensure the flags parameter is wide enough

Because of the construction of spin_lock_irq() (and varients), the flags
parameter could be trucated. Use a BUILD_BUG_ON() to verify the width of the
parameter.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

widen flags parameter for spinlock_irqsave() and friends

These issues were detected using the subsequent patch which forces a
compilation error if the result from local_irq_save() would be truncated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/irq: local_irq_restore() should not blindly popf

local_irq_restore() should only be concerned with possibly changing the
interrupt flag. A blind popf could corrupt other system flags.

While playing in this area, fixup an opencoded use of X86_EFLAGS_IF.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/xsave: also save/restore XCR0 across suspend (ACPI S3)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

xen/arm: Add CPU ID for Broadcom Brahma-B15

Let Xen recognize the Broadcom Brahma-B15 CPU by adding the appropriate
MIDR mask to the initialization phase. Further, ensure that the console
output properly reports the CPU manufacturer as "Broadcom Corporation".

Signed-off-by: Marc Carino <marc.ceeeee@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

x86: print relevant (tail) part of filename for warnings and crashes

In particular when the origin construct is in a header file (and
hence the file name is an absolute path instead of just the file name
portion) the information can otherwise become rather useless when the
build tree isn't sitting relatively close to the file system root.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

tools: update to SeaBIOS 1.7.3.1

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

xend: Drop long deprecation warning in /var/run not /tmp

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xen: arm: Emacs style fix

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

add cap value to credit scheduler debug info

Currently only the weight is the only scheduling parameter printed for
domains in the credit scheduler key handler. Add the cap value to be
printed as well.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

credit: unpause parked vcpu before destroying it

A capped out vcpu must be unpaused in case of moving it to another cpupool,
otherwise it will be paused forever.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

xen/evtchn: Fix build on ARM

The recent event channel changes introduced by commit a77eb86 and before...
break the compilation on Xen ARM. This commit adds missing includes in
common/event_fifo.c and include/xen/sched.h.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

libxl: remove qemu default devices for upstream qemu

Remove default devices created by qemu. Qemu will create only devices
defined by xen, since the devices not defined by xen are not usable.
Remove deleting of empty floppy no more needed with nodefault.

(Removed a whitespace error. -iwj)

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

pygrub: Support (/dev/xvda) style disk specifications

You get these if you install Debian Wheezy as HVM and then try to convert to
PV.

This is Debian bug #603391.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Tril <tril@metapipe.net>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl,xl: add max_event_channels option to xl configuration file

Add the 'max_event_channels' option to the xl configuration file to
limit the number of event channels that domain may use.

Plumb this option through to libxl via a new libxl_build_info field
and call xc_domain_set_max_evtchn() in the post build stage of domain
creation.

A new LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS #define indicates that this
new field is available.

The default value of 1023 limits the domain to using the minimum
amount of global mapping pages and at most 5 xenheap pages.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

libxc: add xc_domain_set_max_evtchn()

Add xc_domain_set_max_evtchn(), a wrapper around the
DOMCTL_set_max_evtchn hypercall.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

Add DOMCTL to limit the number of event channels a domain may use

Add XEN_DOMCTL_set_max_evtchn which may be used during domain creation to
set the maximum event channel port a domain may use. This may be used to
limit the amount of Xen resources (global mapping space and xenheap) that
a domain may use for event channels.

A domain that does not have a limit set may use all the event channels
supported by the event channel ABI in use.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Keir Fraser <keir@xen.org>

evtchn: add FIFO-based event channel hypercalls and port ops

Add the implementation for the FIFO-based event channel ABI. The new
hypercall sub-ops (EVTCHNOP_init_control, EVTCHNOP_expand_array) and
the required evtchn_ops (set_pending, unmask, etc.).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

evtchn: implement EVTCHNOP_set_priority and add the set_priority hook

Implement EVTCHNOP_set_priority. A new set_priority hook added to
struct evtchn_port_ops will do the ABI specific validation and setup.

If an ABI does not provide a set_priority hook (as is the case of the
2-level ABI), the sub-op will return -ENOSYS.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

evtchn: add FIFO-based event channel ABI

Add the event channel hypercall sub-ops and the definitions for the
shared data structures for the FIFO-based event channel ABI.

The design document for this new ABI is available here:

http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf

In summary, events are reported using a per-domain shared event array
of event words. Each event word has PENDING, LINKED and MASKED bits
and a LINK field for pointing to the next event in the event queue.

There are 16 event queues (with different priorities) per-VCPU.

Key advantages of this new ABI include:

- Support for over 100,000 events (2^17).
- 16 different event priorities.
- Improved fairness in event latency through the use of FIFOs.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

evtchn: allow many more evtchn objects to be allocated per domain

Expand the number of event channels that can be supported internally
by altering now struct evtchn's are allocated.

The objects are indexed using a two level scheme of groups and buckets
(instead of only buckets). Each group is a page of bucket pointers.
Each bucket is a page-sized array of struct evtchn's.

The optimal number of evtchns per bucket is calculated at compile
time.

If XSM is not enabled, struct evtchn is 16 bytes and each bucket
contains 256, requiring only 1 group of 512 pointers for 2^17
(131,072) event channels. With XSM enabled, struct evtchn is 24
bytes, each bucket contains 128 and 2 groups are required.

For the common case of a domain with only a few event channels,
instead of requiring an additional allocation for the group page, the
first bucket is indexed directly.

As a consequence of this, struct domain shrinks by at least 232 bytes
as 32 bucket pointers are replaced with 1 bucket pointer and (at most)
2 group pointers.

[ Based on a patch from Wei Liu with improvements from Malcolm
Crossley. ]

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

evtchn: use a per-domain variable for the max number of event channels

Instead of the MAX_EVTCHNS(d) macro, use d->max_evtchns instead. This
avoids having to repeatedly check the ABI type.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

evtchn: print ABI specific state with the 'e' debug key

In the output of the 'e' debug key, print some ABI specific state in
addition to the (p)ending and (m)asked bits.

For the 2-level ABI, print the state of that event's selector
bit. e.g.,

(XEN)     port [p/m/s]
(XEN)        1 [0/0/1]: s=3 n=0 x=0 d=0 p=74
(XEN)        2 [0/0/1]: s=3 n=0 x=0 d=0 p=75

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

evtchn: refactor low-level event channel port ops

Use functions for the low-level event channel port operations
(set/clear pending, unmask, is_pending and is_masked).

Group these functions into a struct evtchn_port_op so they can be
replaced by alternate implementations (for different ABIs) on a
per-domain basis.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

debug: remove some event channel info from the 'i' and 'q' debug keys

The 'i' key would always use VCPU0's selector word when printing the
event channel state. Remove the incorrect output as a subsequent
change will add the (correct) information to the 'e' key instead.

When dumping domain information, printing the state of the VIRQ_DEBUG
port is redundant -- this information is available via the 'e' key.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/HVM: cache emulated instruction for retry processing

Rather than re-reading the instruction bytes upon retry processing,
stash away and re-use what we already read. That way we can be certain
that the retry won't do something different from what requested the
retry, getting once again closer to real hardware behavior (where what
we use retries for is simply a bus operation, not involving redundant
decoding of instructions).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/HVM: properly deal with hvm_copy_*_guest_phys() errors

In memory read/write handling the default case should tell the caller
that the operation cannot be handled rather than the operation having
succeeded, so that when new HVMCOPY_* states get added not handling
them explicitly will not result in errors being ignored.

In task switch emulation code stop handling some errors, but not
others.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/HVM: don't ignore hvm_copy_to_guest_phys() errors during I/O intercept

Building upon the extended retry logic we can now also make sure to
not ignore errors resulting from writing data back to guest memory.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/HVM: fix direct PCI port I/O emulation retry and error handling

dpci_ioport_{read,write}() guest memory access failure handling should
be modelled after process_portio_intercept()'s (and others): Upon
encountering an error on other than the first iteration, the count
successfully handled needs to be stored and X86EMUL_OKAY returned, in
order for the generic instruction emulator to update register state
correctly before reporting failure or retrying (both of which would
only happen after re-invoking emulation).

Further we leverage (and slightly extend, due to the above mentioned
need to return X86EMUL_OKAY) the "large MMIO" retry model.

Note that there is still a special case not explicitly taken care of
here: While the first retry on the last iteration of a "rep ins"
correctly recovers the already read data, an eventual subsequent retry
is being handled by the pre-existing mmio-large logic (through
hvmemul_do_io() storing the [recovered] data [again], also taking into
consideration that the emulator converts a single iteration "ins" to
->read_io() plus ->write()).

Also fix an off-by-one in the mmio-large-read logic, and slightly
simplify the copying of the data.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/HVM: properly handle backward string instruction emulation

Multiplying a signed 32-bit quantity with an unsigned 32-bit quantity
produces an unsigned 32-bit result, yet for emulation of backward
string instructions we need the result sign extended before getting
added to the base address.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

sched: Correct function prototypes

struct vcpu pointers are traditionally v rather than d.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/MSI: fix locking in pci_restore_msi_state()

Right after the loop the lock is being dropped, so all loop exits
should happen with the lock still held.

Reported-by: Kristoffer Egefelt <kristoffer@itoc.dk>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Kristoffer Egefelt <kristoffer@itoc.dk>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

sched: fix race between sched_move_domain() and vcpu_wake()

From: David Vrabel <david.vrabel@citrix.com>

sched_move_domain() changes v->processor for all the domain's VCPUs.
If another domain, softirq etc. triggers a simultaneous call to
vcpu_wake() (e.g., by setting an event channel as pending), then
vcpu_wake() may lock one schedule lock and try to unlock another.

vcpu_schedule_lock() attempts to handle this but only does so for the
window between reading the schedule_lock from the per-CPU data and the
spin_lock() call. This does not help with sched_move_domain()
changing v->processor between the calls to vcpu_schedule_lock() and
vcpu_schedule_unlock().

Fix the race by taking the schedule_lock for v->processor in
sched_move_domain().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Use vcpu_schedule_lock_irq() (which now returns the lock) to properly
retry the locking should the to be used lock have changed in the course
of acquiring it (issue pointed out by George Dunlap).

Add a comment explaining the state after the v->processor adjustment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

scheduler: adjust internal locking interface

Make the locking functions return the lock pointers, so they can be
passed to the unlocking functions (which in turn can check that the
lock is still actually providing the intended protection, i.e. the
parameters determining which lock is the right one didn't change).

Further use proper spin lock primitives rather than open coded
local_irq_...() constructs, so that interrupts can be re-enabled as
appropriate while spinning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86: fix bug_line()

Due to the packing into a bit field together with a relocated field,
the computation can overflow when the relocated field ends up getting a
negative value stored. Hence it isn't sufficient to correct the value
by 1 in this case, but we also need to mask the result to the width of
the original bit field.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

Revert "QEMU_TAG update"

(My script edited the wrong xen.git branch)

This reverts commit 363cfda13a58eab51a4a85f30c7c740990b53c3a.

QEMU_TAG update

libxl: make libxl__poller_put tolerate p==NULL

This is less fragile, and more in keeping with the usual style of
initialising everything to 0 and freeing things unconditionally.

Correspondingly, remove the tests at the call sites.

Apropos of c1f3f174. No overall functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

x86: check for canonical address before doing page walks

... as there doesn't really exists any valid mapping for them.

Particularly in the case of do_page_walk() this also avoids returning
non-NULL for such invalid input.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86: use {rd,wr}{fs,gs}base when available

... as being intended to be faster than MSR reads/writes.

In the case of emulate_privileged_op() also use these in favor of the
cached (but possibly stale) addresses from arch.pv_vcpu. This allows
entirely removing the code that was the subject of XSA-67.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86: add address validity check to guest_map_l1e()

Just like for guest_get_eff_l1e() this prevents accessing as page
tables (and with the wrong memory attribute) internal data inside Xen
happening to be mapped with 1Gb pages.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86: correct LDT checks

- MMUEXT_SET_LDT should behave as similarly to the LLDT instruction as
  possible: fail only if the base address is non-canonical
- instead LDT descriptor accesses should fault if the descriptor
  address ends up being non-canonical (by ensuring this we at once
  avoid reading an entry from the mach-to-phys table and consider it a
  page table entry)
- fault propagation on using LDT selectors must distinguish #PF and #GP
  (the latter must be raised for a non-canonical descriptor address,
  which also applies to several other uses of propagate_page_fault(),
  and hence the problem is being fixed there)
- map_ldt_shadow_page() should properly wrap addresses for 32-bit VMs

At once remove the odd invokation of map_ldt_shadow_page() from the
MMUEXT_SET_LDT handler: There's nothing really telling us that the
first LDT page is going to be preferred over others.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

libxl: fix out-of-memory error handling in libxl_list_cpupool

...otherwise it will return freed memory. All the current users of this
function check already for a NULL return, so use that.

Coverity-ID: 1056194

This is CVE-2013-4371 / XSA-70

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

tools/ocaml: fix erroneous free of cpumap in stub_xc_vcpu_getaffinity

Not sure how it got there...

Coverity-ID: 1056196

This is CVE-2013-4370 / XSA-69

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

libxl: fix vif rate parsing

strtok can return NULL here. We don't need to use strtok anyway, so just
use a simple strchr method.

Coverity-ID: 1055642

This is CVE-2013-4369 / XSA-68

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Fix type. Add test case

Signed-off-by: Ian Campbell <Ian.campbell@citrix.com>

x86: check segment descriptor read result in 64-bit OUTS emulation

When emulating such an operation from a 64-bit context (CS has long
mode set), and the data segment is overridden to FS/GS, the result of
reading the overridden segment's descriptor (read_descriptor) is not
checked. If it fails, data_base is left uninitialized.

This can lead to 8 bytes of Xen's stack being leaked to the guest
(implicitly, i.e. via the address given in a #PF).

Coverity-ID: 1055116

This is CVE-2013-4368 / XSA-67.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Fix formatting.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

xen/arm: Fixing clear_guest_offset macro

Fix the the broken macro 'clear_guest_offset' in arm.

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

libxl: introduce libxl_node_to_cpumap

As an helper for the special case (of libxl_nodemap_to_cpumap) when
one wants the cpumap for just one node.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xl: fix a typo in main_vcpulist()

which was preventing `xl vcpu-list -h' to work.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xl: update the manpage about "cpus=" and NUMA node-affinity

Since d06b1bf169a01a9c7b0947d7825e58cb455a0ba5 ('libxl: automatic placement
deals with node-affinity') it is no longer true that, if no "cpus=" option
is specified, xl picks up some pCPUs by default and pin the domain there.

In fact, it is the NUMA node-affinity that is affected by automatic
placement, not vCPU to pCPU pinning.

Update the xl config file documenation accordingly, as it seems to have
been forgotten at that time.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/migrate: Fix regression when migrating from older version of Xen

Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010
  "libxc: provide notification of final checkpoint to restore end"
broke migration from any version of Xen using tools from prior to that commit

Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer
tools xc_domain_restore() to start reading the qemu save record, as
ctx->last_checkpoint is 0.

The failure looks like:
  xc: error: Max batch size exceeded (1970103633). Giving up.
where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu"

With this fix in place, the behaviour for normal migrations is reverted to how
it was before the regression; the migration is considered non-checkpointed
right from the start.  A XC_SAVE_ID_LAST_CHECKPOINT chunk seen in the
migration stream is a nop.  For checkpointed migrations the behaviour is
unchanged.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> (Remus bits)

tools: adds tracer on qemu-xen debug configure options

When building tools in debug mode (debug=y), pass also
--enable-trace-backend=stderr when configuring qemu-xen.
Useful to improve debug.

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

xen/arm32: Call start_xen only on the boot CPU

The boot CPU can have a CPU ID non-equal to zero. Xen needs to check the
logical CPU ID (in r12) to know if the CPU is the boot one.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

qemu-xen: Set localstatedir to /var.

This path is used by the QEMU build system to create the /run directory.
If local-state-dir is not set, the result become $prefix/var which
is not an acceptable path.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

qemu-xen: Disabling build of guest-agent.

It is not use when QEMU is run with Xen.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

hvm/vidirian: Avoid printing page_to_mfn(NULL) on error paths

While working in the viridian code, I noticed that 4cb6c4f4941

"x86/hvm: Use get_page_from_gfn() instead of get_gfn()/put_gfn."

introduced two error paths where page_to_mfn(NULL) would be formatted and
presented as a bad MFN. This provides junk in the warning rather than
something useful.

These two codepaths are fixed up to match their counterpart in
wrmsr_hypervisor_regs()

While auditing the other changes from 4cb6c4f4941, I noticed a small
optimisation which could be made by changing the order of the validity checks
to remove 6 NULL pointer checks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/traps: improvements to {rd,wr}msr_hypervisor_regs()

Coverity ID: 1055249 1055250

Coverity was complaining that the switch statments contained dead code in
their default statements. While this is quite minor, the code flow in
wrmsr_hypervisor_regs() was sufficiently opaque that I felt it approprate to
fix.

Other improvements include:
* not shadowing the function parameter 'idx'.
* use of PAGE_{SHIFT,SIZE} instead of opencoded numbers.
* a more descriptive error message for attempting to write invalid indicies
for hypercall pages.

There is no behavioural change as a result.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen/x86: Remove GB macro in asm-x86/config.h

Commit 983843e "xen: Add macros MB and GB" introduce a generic GB macro.
By mistake, the macro in asm-x86/config.h was not removed. This is result to
a compilation error when Xen is build for x86.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen/dts: Support Linux initrd DT bindings

Linux uses the property linux,initrd-start and linux,initrd-end to know where
the initrd lives in memory.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: Add support to load initrd in dom0

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/dts: Use ROUNDUP macro instead of the internal ALIGN

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen: Add macro ROUNDUP

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>

xen: Add macros MB and GB

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/HPET: basic cleanup

* Strip trailing whitespace
* Remove redundant definitions
* Update stale documentation links
* Move hpet_address into __initdata

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

VT-d: fix suspected data race condition in iommu_set_root_entry()

Coverity ID: 1054967

Coverity spotted that iommu->root_maddr was optionally allocated within the
protection of the iommu->lock, but was referenced with the protection of the
iommu->register_lock, and freed without any lock.

Luckily, the code as-is is not vulnerable to the potential risks identified.

However, the alloc_pgtable_maddr() is far more appropriately done in
iommu_alloc(), removing a set of spinlock calls, and a possibility for the
iommu setup to fail later than iommu_alloc() with an -ENOMEM.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>

libxc: add LZ4 decompression support

Since there's no shared or static library to link against, this simply
re-uses the hypervisor side code. However, I only audited the code
added here for possible security issues, not the referenced code in
the hypervisor tree.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>