Konrad Rzeszutek Wilk [Fri, 30 Sep 2016 14:53:01 +0000 (10:53 -0400)]
tmem/libxc: Squash XEN_SYSCTL_TMEM_OP_[SET|SAVE]..
Specifically:
XEN_SYSCTL_TMEM_OP_SET_[WEIGHT,COMPRESS] are now done via:
XEN_SYSCTL_TMEM_SET_CLIENT_INFO
and XEN_SYSCTL_TMEM_OP_SAVE_GET_[VERSION,MAXPOOLS,
CLIENT_WEIGHT, CLIENT_FLAGS] can now be retrieved via:
XEN_SYSCTL_TMEM_GET_CLIENT_INFO
All this information is now in 'struct xen_tmem_client' and
that is what we pass around.
We also rev up the XEN_SYSCTL_INTERFACE_VERSION as we are
re-using the value number of the deleted ones (and henceforth
the information is retrieved differently).
On the toolstack, prior to this patch, the xc_tmem_control
would use the bounce buffer only when arg1 was set and the cmd
was to list. With the 'XEN_SYSCTL_TMEM_OP_SET_[WEIGHT|COMPRESS]'
that made sense as the 'arg1' would have the value. However
for the other ones (say XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID)
the 'arg1' would be the length of the 'buf'. If this is
confusing, don't despair: the patch titled:
tmem/xc_tmem_control: Rename 'arg1' to 'len' and 'arg2' to arg.
takes care of that.
The astute reader of the toolstack code will discover that
we only used the bounce buffer for LIST, not for any other
subcommands that used 'buf'!?! Which means that the contents
of 'buf' would never be copied back to the caller's 'buf'!
The author is not sure how this could possibly work, perhaps Xen 4.1
(when this was introduced) was more relaxed about the bounce buffer
being enabled. Anyhow this fixes xc_tmem_control to do it for
any subcommand that has 'arg1'.
Lastly some of the checks in xc_tmem_[restore|save] are removed
as they can't ever be reached (not even sure how they could
have been reached in the original submission). One of them
is the check for the weight against -1 when in fact the
hypervisor would never have provided that value.
Now the checks are simple - as the hypercall always returns
->version and ->maxpools (which is mirroring how it was done
prior to this patch). But if one wants to check if a guest
has any tmem activity then the patch titled
"tmem: Batch and squash XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_
[FLAGS,NPAGES,UUID] in one sub-call: XEN_SYSCTL_TMEM_OP_GET_POOLS."
adds an ->nr_pools to check for that.
Also we add the check for ->version and ->maxpools and remove
the TODO.
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 30 Sep 2016 14:50:32 +0000 (10:50 -0400)]
tmem/sysctl: Add union in struct xen_sysctl_tmem_op
No functional change. We do this to prepare for another
entry to be added in the union. See patch titled:
"tmem/libxc: Squash XEN_SYSCTL_TMEM_OP_[SET|SAVE]"
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 30 Sep 2016 14:10:42 +0000 (10:10 -0400)]
tmem: Move client weight, frozen, live_migrating, and compress
in its own structure. This paves the way to make only one hypercall
to retrieve/set this information instead of multiple ones.
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 27 Sep 2016 13:40:22 +0000 (09:40 -0400)]
tmem: Delete deduplication (and tze) code.
Couple of reasons:
- It can lead to security issues (see row-hammer, KSM and such
attacks).
- Code is quite complex.
- Deduplication is good if the pages themselves are the same
but that is hardly guaranteed.
- We got some gains (if pages are deduped) but at the cost of
making code less maintainable.
- tze depends on deduplication code.
As such, deleting it.
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 21 Sep 2016 20:53:51 +0000 (16:53 -0400)]
tmem: Retire XEN_SYSCTL_TMEM_OP_[SET_CAP|SAVE_GET_CLIENT_CAP]
It is not used by anything. Its intent was to complement
the 'weight' attribute but there hadn't been any request for this.
If there is a need to resurface it, it can be integrated back
via the XEN_SYSCTL_TMEM_SET_CLIENT_INFO introduced in
"tmem/libxc: Squash XEN_SYSCTL_TMEM_OP_[SET|SAVE].."
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 22 Sep 2016 01:18:57 +0000 (21:18 -0400)]
libxc/tmem/restore: Remove call to XEN_SYSCTL_TMEM_OP_SAVE_GET_VERSION
The only thing this hypercall returns is TMEM_SPEC_VERSION.
The comment around is also misleading - this call does not
do any domain operation.
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Wei Liu [Fri, 30 Sep 2016 15:47:17 +0000 (16:47 +0100)]
svm/emulate: remove duplicated const specifier
Clang complains:
emulate.c:65:3: error: duplicate 'const' declaration specifier
[-Werror,-Wduplicate-decl-specifier]
} const opc_tab[INSTR_MAX_COUNT] = {
^
Remove that const to fix the issue.
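For illustration only, a minimal sketch of the pattern Clang objects to.
The log above only shows the closing "} const opc_tab[...]"; that the
declaration also opened with a const qualifier is presumed from the
diagnostic, and the table name, field names and values below are made up.

```c
#include <assert.h>

enum { EXAMPLE_COUNT = 2 };

/*
 * Writing both "static const struct {" at the top and "} const tab[...]"
 * at the bottom applies 'const' twice to one declaration, which Clang
 * reports under -Wduplicate-decl-specifier.  One qualifier is enough.
 */
static const struct {
    unsigned int opcode;   /* hypothetical field */
    unsigned int modrm;    /* hypothetical field */
} example_tab[EXAMPLE_COUNT] = {
    { 1, 10 },
    { 2, 20 },
};

/* Read-only lookup; the table stays const with a single qualifier. */
unsigned int example_lookup(unsigned int idx)
{
    assert(idx < EXAMPLE_COUNT);
    return example_tab[idx].opcode;
}
```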
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Thu, 15 Sep 2016 11:35:05 +0000 (12:35 +0100)]
xen: credit2: "relax" CSCHED2_MAX_TIMER
Credit2 is already event based, rather than tick
based. This means, the time at which the (i+1)-th
scheduling decision needs to happen is computed
during the i-th scheduling decision, and a timer
is set accordingly.
If there's nothing imminent (or, the most imminent
event is really really really far away), it is
ok to say "well, let's double-check things in
a little bit anyway", but such 'little bit' does
not need to be too little, as, most likely, it's
just pure overhead.
The current period, for this "safety catch"-alike
timer is 2ms, which indeed is high, but it can
well be higher. In fact, benchmarks show that
setting it to 10ms --combined with other
optimizations-- does actually improve performance.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 14:21:34 +0000 (16:21 +0200)]
xen: tracing: add trace records for schedule and rate-limiting.
As far as {csched, csched2, rt}_schedule() are concerned,
an "empty" event, would already make it easier to read and
understand a trace.
But while there, add some useful information, like
if the cpu that is going through the scheduler has
been tickled or not, if it is currently idle, etc
(they vary, on a per-scheduler basis).
For Credit1 and Credit2, add a record about when
rate-limiting kicks in too.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 14:21:27 +0000 (16:21 +0200)]
xen: credit2: implement yield()
When a vcpu explicitly yields, it is usually advising
us to "let someone else run and come back
to me in a bit."
Credit2 isn't, so far, doing anything when a vcpu
yields, which means a yield is basically a NOP (well,
actually, it's pure overhead, as it causes the scheduler
to kick in, but the result is --at least 99% of the time--
that the very same vcpu that yielded continues to run).
With this patch, when a vcpu yields, we go and try
picking the next vcpu on the runqueue that can run on
the pcpu where the yielding vcpu is running. Of course,
if we don't find any other vcpu that wants and can run
there, the yielding vcpu will continue.
Also, add a yield performance counter, and fix the
style of a couple of comments.
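A rough sketch of the selection described above, not the actual Credit2
code: the struct, the affinity bitmask and the function name are all
hypothetical, and credit ordering of the runqueue is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runqueue entry; names are illustrative, not Xen's. */
struct example_vcpu {
    int id;
    unsigned int affinity;   /* bit n set: may run on pcpu n */
};

/*
 * Return the id of the vcpu to run on 'cpu' after 'yielder_id' yielded:
 * the first runqueue entry allowed on 'cpu', else the yielder itself.
 */
int example_pick_after_yield(const struct example_vcpu *runq, size_t nr,
                             unsigned int cpu, int yielder_id)
{
    size_t i;

    assert(cpu < 8 * sizeof(unsigned int));

    for ( i = 0; i < nr; i++ )
        if ( runq[i].affinity & (1u << cpu) )
            return runq[i].id;

    /* Nobody else wants and can run here: the yielder continues. */
    return yielder_id;
}
```

With an empty runqueue, or one holding only vcpus that cannot run on this
pcpu, the function returns the yielder, matching the fallback in the text.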
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 15:11:56 +0000 (17:11 +0200)]
SVM: use generic instruction decoding
... instead of custom handling. To facilitate this break out init code
from _hvm_emulate_one() into the new hvm_emulate_init(), and make
hvmemul_insn_fetch() globally available.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Fri, 30 Sep 2016 14:45:46 +0000 (16:45 +0200)]
x86/32on64: don't modify guest descriptors without need
System gates with type 0 shouldn't have what might be their DPL altered
- such descriptors can't be used anyway without incurring a #GP, and
hence adjusting their DPL only risks confusing the guest.
Also bail right away for non-present descriptors - no need to write
back anything in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 14:44:49 +0000 (16:44 +0200)]
x86emul: support RTM instructions
Minimal emulation: XBEGIN aborts right away, hence
- XABORT is just a no-op,
- XEND always raises #GP,
- XTEST always signals neither RTM nor HLE are active.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 02:54:28 +0000 (04:54 +0200)]
xl: allow to set the ratelimit value online for Credit2
Last part of the wiring necessary for allowing to
change the value of the ratelimit_us parameter online,
for Credit2 (like it is already for Credit1).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 02:54:21 +0000 (04:54 +0200)]
libxl: allow to set the ratelimit value online for Credit2
This is the remaining part of the plumbing (the libxl
one) necessary to be able to change the value of the
ratelimit_us parameter online, for Credit2 (like it is
already for Credit1).
Note that, so far, we were rejecting (for Credit1) a
new value of zero, despite it being a pretty nice way to
ask for the rate limiting to be disabled, and the
hypervisor is already capable of dealing with it in
that way.
Therefore, we change things so that it is possible to
do so, both for Credit1 and Credit2.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 02:54:14 +0000 (04:54 +0200)]
libxl: fix coding style of credit1 parameters related functions
More specifically, the error handling path is
made compliant with libxl's coding style.
No functional change intended.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 02:54:07 +0000 (04:54 +0200)]
tools: tracing: handle more scheduling related events.
There are some scheduling related trace records that
are not being taken care of (and hence only dumped as
raw records).
Some of them are being introduced in this series, while
others were just neglected by previous patches.
Add support for them.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 02:53:46 +0000 (04:53 +0200)]
xen: credit2: only reset credit on reset condition
The condition for a Credit2 scheduling epoch coming to an
end is that the vcpu at the front of the runqueue has negative
credits. However, it is possible, that runq_candidate() does
not actually return to the scheduler the first vcpu in the
runqueue (e.g., because such vcpu can't run on the cpu that
is going through the scheduler, because of hard-affinity).
If that happens, we should not trigger a credit reset, or we
risk altering the length of a scheduler epoch, wrt what the
original idea of the algorithm was.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 02:53:39 +0000 (04:53 +0200)]
xen: credit2: make tickling more deterministic
Right now, the following scenario can occur:
- upon vcpu v wakeup, v itself is put in the runqueue,
and pcpu X is tickled;
- pcpu Y schedules (for whatever reason), sees v in
the runqueue and picks it up.
This may seem ok (or even a good thing), but it's not.
In fact, if runq_tickle() decided X is where v should
run, it did it for a reason (load distribution, SMT
support, cache hotness, affinity, etc), and we really
should try as hard as possible to stick to that.
Of course, we can't be too strict, or we risk leaving
vcpus in the runqueue while there is available CPU
capacity. So, we only leave v in runqueue --for X to
pick it up-- if we see that X has been tickled and
has not scheduled yet, i.e., it will have a real chance
of actually selecting and scheduling v.
If that is not the case, we schedule it on Y (or, at
least, we consider that), as running somewhere non-ideal
is better than not running at all.
The commit also adds performance counters for each of
the possible situations.
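The decision described above can be sketched roughly as follows. This is
only an illustration under assumptions: the per-pcpu struct and both of
its flags are hypothetical and do not reflect how Credit2 tracks tickling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pcpu state; field names are illustrative only. */
struct example_pcpu {
    bool tickled;    /* runq_tickle() asked this pcpu to reschedule */
    bool scheduled;  /* ... and it has since gone through the scheduler */
};

/*
 * Pcpu Y found vcpu v in the runqueue, but v was meant for pcpu X.
 * Return true if Y should leave v in the runqueue for X to pick up,
 * false if Y should consider running v itself.
 */
bool example_leave_for_tickled(const struct example_pcpu *x)
{
    assert(x != NULL);

    /* X still has a real chance to pick v only if its tickle is pending. */
    return x->tickled && !x->scheduled;
}
```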
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 02:53:32 +0000 (04:53 +0200)]
xen: credit1: don't rate limit context switches in case of yields
Rate limiting has been primarily introduced to avoid too
heavy context switch rate due to interrupts, and, in
general, asynchronous events.
If a vcpu "voluntarily" yields, we really should let it
give up the cpu for a while.
In fact, it may be that it is yielding because it's about
to start spinning, and there's little point in forcing a vcpu
to spin for (potentially) the entire rate-limiting period.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Fri, 30 Sep 2016 02:53:25 +0000 (04:53 +0200)]
xen: credit1: return the 'time remaining to the limit' as next timeslice.
If vcpu x has run for 200us, and sched_ratelimit_us is
1000us, continue running x _but_ return 1000us-200us as
the next time slice. This way, next scheduling point will
happen in 800us, i.e., exactly at the point when x crosses
the threshold, and can be descheduled (if appropriate).
Right now (without this patch), we're always returning
sched_ratelimit_us (1000us, in the example above), which
means we're (potentially) allowing x to run more than
it should have been able to.
Note that, however, in order to avoid setting timers to very
short intervals, which is part of the purpose of rate limiting,
we never use a time slice smaller than a well defined threshold.
Such threshold (CSCHED_MIN_TIMER defined in this patch) is, in
general, independent from rate limiting, but it looks like a good idea
to set it to the minimum possible ratelimiting value.
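The arithmetic above, as a small sketch. The function name is made up and
the 100us floor is an assumed stand-in for CSCHED_MIN_TIMER, whose actual
value is not stated here; all quantities are in microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, standing in for CSCHED_MIN_TIMER (microseconds). */
#define EXAMPLE_MIN_TIMER_US 100u

/*
 * Next time slice for a vcpu that has run for runtime_us and is still
 * protected by rate limiting (i.e. runtime_us < ratelimit_us).
 */
uint32_t example_ratelimit_slice_us(uint32_t runtime_us,
                                    uint32_t ratelimit_us)
{
    uint32_t remaining;

    assert(runtime_us < ratelimit_us);

    /* Time left until the vcpu crosses the rate-limiting threshold. */
    remaining = ratelimit_us - runtime_us;

    /* Never program a timer shorter than the minimum threshold. */
    return remaining < EXAMPLE_MIN_TIMER_US ? EXAMPLE_MIN_TIMER_US
                                            : remaining;
}
```

For the example in the text, a runtime of 200us against a 1000us limit
gives an 800us slice; a runtime of 950us would be clamped to the floor.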
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 13:37:34 +0000 (15:37 +0200)]
x86emul: consolidate segment register handling
Use a single set of variables throughout the huge switch() statement,
allowing to funnel SLDT/STR into the mov-from-sreg code path.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 13:37:00 +0000 (15:37 +0200)]
x86emul: support UMIP
To make this complete, also add support for SLDT and STR. Note that by
just looking at the guest CR4 bit, this is independent of actually
making available the UMIP feature to guests.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 13:06:40 +0000 (15:06 +0200)]
x86emul: sort opcode 0f01 special case switch() statement
Sort the special case opcode 0f01 entries numerically, insert blank
lines between each of the cases, and properly place opening braces.
No functional change.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Zhi Wang [Fri, 30 Sep 2016 13:01:23 +0000 (15:01 +0200)]
x86/emulate: add support for {,v}movd {,x}mm,r/m32 and {,v}movq {,x}mm,r/m64
Found that a Windows driver was using the SSE2 instruction MOVD.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Mihai Donțu [Fri, 30 Sep 2016 13:00:29 +0000 (15:00 +0200)]
x86/emulate: add support for {,v}movq xmm,xmm/m64
From: Mihai Donțu <mdontu@bitdefender.com>
Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 12:58:48 +0000 (14:58 +0200)]
x86emul: defer injection of #DB
Move the raising of the single step trap until after registers were
updated. This should probably have been that way from the beginning,
to allow the inject_hw_exception() hook to see updated register state
(in case it cares) - it's a trap, after all.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 12:57:59 +0000 (14:57 +0200)]
x86emul: support XSETBV
This is a prereq for switching PV privileged op emulation to the
generic instruction emulator. Since handle_xsetbv() is already capable
of dealing with all guest kinds, avoid introducing another hook here.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 30 Sep 2016 10:01:04 +0000 (11:01 +0100)]
x86/emulate: Resolve MISSING_BREAK issue in x86_decode()
Coverity doesn't appear to be able to spot that this is a terminal error path,
but leave a comment to "fix" MISSING_BREAK.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 30 Sep 2016 10:01:03 +0000 (11:01 +0100)]
tools/libxc: Don't leak foreign mappings when loading modules
Spotted by Coverity
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Lars Kurth [Mon, 26 Sep 2016 16:06:49 +0000 (17:06 +0100)]
Added COPYING and README.patch files to xen/common and xen/tools
This patch adds information related to non-GPL licenses and code
imports from 3rd party projects. The aim of this patch is to
make it easier for future contributors to perform a review
of the codebase.
Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: remove all trailing whitespaces ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Lars Kurth [Mon, 26 Sep 2016 12:16:34 +0000 (13:16 +0100)]
blktap2: Added COPYING file
Blktap2 has some complexity, as some files do not have (c) headers
and the directory did not have a COPYING file. At this stage, we
have not verified the intention of (c) holders. We may do this in
future, if the need arises.
Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: delete all trailing whitespaces ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Lars Kurth [Mon, 26 Sep 2016 12:16:33 +0000 (13:16 +0100)]
Added COPYING files and README.source files
Added a COPYING file as a boilerplate to explain license oddities in
this directory
Added a vtpm/COPYING file which contains MIT licensed files only
Added a vtpmmgr/README.source file which contains many BSD-3-Clause
files that originally came from tools/vtpm_manager
Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: delete all trailing whitespaces ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:19:02 +0000 (18:19 -0700)]
libxl/arm: Add the size of ACPI tables to maxmem
Add the size of the ACPI tables when setting the target maxmem, to avoid
providing less available memory to the guest.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:19:01 +0000 (18:19 -0700)]
libxl/arm: Initialize domain param HVM_PARAM_CALLBACK_IRQ
The guest kernel will get the event channel interrupt information via
domain param HVM_PARAM_CALLBACK_IRQ. Initialize it here.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:19:00 +0000 (18:19 -0700)]
public/hvm/params.h: Add macros for HVM_PARAM_CALLBACK_TYPE_PPI
Add macros for HVM_PARAM_CALLBACK_TYPE_PPI operation values and update
them in evtchn_fixup().
Also use HVM_PARAM_CALLBACK_IRQ_TYPE_MASK in hvm_set_callback_via().
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:59 +0000 (18:18 -0700)]
libxl/arm: Add ACPI module
Add the ARM Multiboot module for ACPI, so UEFI or DomU can get the base
address of ACPI tables from it.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:58 +0000 (18:18 -0700)]
libxl/arm: Factor finalise_one_memory_node as a generic function
Rename finalise_one_memory_node to finalise_one_node and pass the node
name via function parameter.
This is useful for adding ACPI module which will be added by a later
patch.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:57 +0000 (18:18 -0700)]
libxl/arm: Construct ACPI DSDT table
Copy the static DSDT table into ACPI blob.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:56 +0000 (18:18 -0700)]
libxl/arm: Construct ACPI FADT table
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:55 +0000 (18:18 -0700)]
libxl/arm: Construct ACPI MADT table
According to the GIC version, construct the MADT table.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:54 +0000 (18:18 -0700)]
libxl/arm: Factor MPIDR computing codes out as a helper
Factor the MPIDR computing code out as a helper, so it can be shared
between DT and ACPI.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:53 +0000 (18:18 -0700)]
libxl/arm: Construct ACPI GTDT table
Construct GTDT table with the interrupt information of timers.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:52 +0000 (18:18 -0700)]
libxl/arm: Construct ACPI XSDT table
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:51 +0000 (18:18 -0700)]
libxl/arm: Construct ACPI RSDP table
Construct ACPI RSDP table and add a helper to calculate the ACPI table
checksum.
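As a sketch of what such a helper generally does (the function name is
hypothetical, not the one added by the patch): an ACPI table is valid when
all of its bytes sum to zero modulo 256, so the checksum byte is the
two's complement of the sum taken with the checksum field zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the byte that, stored in the table's checksum field (which must
 * be zero while this runs), makes all bytes sum to 0 modulo 256.
 */
uint8_t example_acpi_checksum(const uint8_t *table, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    assert(table != NULL || len == 0);

    /* 8-bit accumulation wraps around, i.e. it is a sum modulo 256. */
    for ( i = 0; i < len; i++ )
        sum = (uint8_t)(sum + table[i]);

    return (uint8_t)(0u - sum);
}
```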
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:50 +0000 (18:18 -0700)]
libxl/arm: Estimate the size of ACPI tables
Estimate the size of ACPI tables and reserve a memory map space for ACPI
tables.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:49 +0000 (18:18 -0700)]
libxl/arm: Generate static ACPI DSDT table
Use a static DSDT table, as x86 does. Currently the DSDT
table only contains processor device objects, and the maximal number
of objects is generated, which so far is 128.
As GUEST_MAX_VCPUS is only defined under __XEN__ or __XEN_TOOLS__,
-D__XEN_TOOLS__ needs to be added to compile mk_dsdt.c.
Also only check iasl for aarch64 in configure since ACPI on ARM32 is not
supported.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: run autogen.sh and fix compilation on x86 ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:48 +0000 (18:18 -0700)]
libxl/arm: prepare for constructing ACPI tables
Only construct the ACPI tables for a 64-bit ARM DomU when the user enables
acpi, because a 32-bit DomU doesn't support ACPI. The generation code
is only built for a 64-bit toolstack.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Shannon Zhao [Thu, 29 Sep 2016 01:18:47 +0000 (18:18 -0700)]
tools/libxl: Add a unified configuration option for ACPI
Since the existing configuration option "u.hvm.acpi" is x86 specific and
we want to reuse it on ARM as well, add a unified option "acpi" for
x86 and ARM, and for ARM it's disabled by default.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Wed, 28 Sep 2016 12:00:31 +0000 (06:00 -0600)]
pub-headers: reduce C99 dependencies
For consumers not using (fully) C99-aware compilers, limit the number
of places where tweaking of the headers would be necessary: Introduce
and use xen_mk_ullong(), allowing its helper macro to be overridden at
once.
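A sketch of how such an overridable wrapper can look. Only the name
xen_mk_ullong() comes from the text above; the helper macro name, the
example constant and the function are assumptions made for illustration.

```c
#include <assert.h>

/*
 * A consumer whose compiler lacks the C99 "ULL" suffix could define the
 * helper before including the headers; otherwise the C99 spelling is
 * used.  The helper's name here is an assumption.
 */
#ifndef __xen_mk_ullong
# define __xen_mk_ullong(x) x ## ULL
#endif
#define xen_mk_ullong(x) __xen_mk_ullong(x)

/* Hypothetical constant that does not fit in 32 bits. */
#define EXAMPLE_4G xen_mk_ullong(0x100000000)

int example_constant_is_64bit(void)
{
    /* With the default helper the constant is an unsigned long long. */
    assert(sizeof(EXAMPLE_4G) == sizeof(unsigned long long));
    return EXAMPLE_4G > 0xffffffffu;
}
```

The point of the indirection is that every wide constant goes through one
macro, so a single override covers all of them.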
For now don't touch public/io/, which also has a few offenders.
The need to include xen.h in hvm/e820.h demonstrates that it is a bad
idea to include public headers first thing - arch/x86/hvm/mtrr.c needs
adjustment just because of this.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 08:01:14 +0000 (10:01 +0200)]
x86emul: simplify LEAVE handling
There's no 1-byte operand size case to take care of here, and there's
no point doing the first writeback using dst fields - we can read rBP
and write rSP directly.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 07:55:32 +0000 (09:55 +0200)]
x86/PV: split out dealing with MSRs from privileged instruction handling
This is in preparation for using the generic emulator here.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 07:55:08 +0000 (09:55 +0200)]
x86/PV: split out dealing with DRn from privileged instruction handling
This is in preparation for using the generic emulator here.
Some care is needed temporarily to not unduly alter guest register
state: The local variable "res" can only go away once this code got
fully switched over to using x86_emulate().
Also switch to IS_ERR_VALUE() instead of (incorrectly) open coding it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 07:54:43 +0000 (09:54 +0200)]
x86/PV: split out dealing with CRn from privileged instruction handling
This is in preparation for using the generic emulator here.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 07:53:40 +0000 (09:53 +0200)]
x86emul: generate and make use of a canonical opcode representation
This representation is then being made available to interested callers,
to facilitate replacing their custom decoding.
This entails combining the three main switch statements into one.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Sep 2016 07:52:52 +0000 (09:52 +0200)]
x86emul: fix {,i}mul and {,i}div
Commit
a3db233ede ("x86emul: use DstEax also for {,I}{MUL,DIV}") went
a little too far: DstEax and SrcEax weren't really meant to be used
together with ModRM - they assume modrm_reg remains zero by the time
the destination / source register pointer gets calculated. Don't fully
undo that commit though, but instead just correct the register pointer,
and don't use dst.val as input for mul and imul (div and idiv did avoid
that already).
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tamas K Lengyel [Thu, 29 Sep 2016 00:55:47 +0000 (18:55 -0600)]
vm_event: Implement ARM SMC events
The ARM SMC instructions are already configured to trap to Xen by default. In
this patch we allow a user-space process in a privileged domain to receive
notifications when such an event happens through the vm_event subsystem by
introducing the PRIVILEGED_CALL type.
The intended use-case for this feature is for a monitor application to be able
to insert tap-points into the domU kernel-code. For this task only unconditional
SMC instructions should be used.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Paul Lai [Thu, 29 Sep 2016 10:05:06 +0000 (12:05 +0200)]
x86: altp2m cleanup work
Indent goto labels by one space.
Inline (header) altp2m functions.
In do_altp2m_op(), during the sanity check of the passed command,
return -EOPNOTSUPP if not a valid command.
In do_altp2m_op(), when evaluating a command, ASSERT_UNREACHABLE()
if the command is not recognizable. The sanity check above should
have triggered the return of -EOPNOTSUPP.
Make hvm_funcs.altp2m_supported "bool" instead of "bool_t".
Make hvm_altp2m_supported() and altp2m_vcpu_emulate_ve() return
bool (rather than return void()).
Signed-off-by: Paul Lai <paul.c.lai@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 29 Sep 2016 10:04:33 +0000 (12:04 +0200)]
x86emul: add EVEX decoding
This way we can at least size (and e.g. skip) them if needed, and we
also won't raise the wrong fault due to not having read all relevant
bytes.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 29 Sep 2016 10:04:05 +0000 (12:04 +0200)]
x86emul: add XOP decoding
This way we can at least size (and e.g. skip) them if needed, and we
also won't raise the wrong fault due to not having read all relevant
bytes.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 29 Sep 2016 10:03:12 +0000 (12:03 +0200)]
x86emul: complete decoding of two-byte instructions
This way we can at least size (and e.g. skip) them if needed, and we
also won't raise the wrong fault due to not having read all relevant
bytes.
This at once adds correct raising of #UD for the three "ud<n>" flavors
(Intel names only "ud2", but AMD names all three of them in their
opcode maps), as that may make a difference to callers compared to
getting back X86EMUL_UNHANDLEABLE.
Note on opcodes 0FA6 and 0FA7: These are VIA's PadLock instructions,
which have a ModRM like byte where only register forms are valid. I.e.
we could also use SrcImmByte there, but ModRM is more likely to be
correct for a hypothetical extension allowing non-register operations.
Note on opcode 0FB8: I think we're safe to ignore the Itanium specific
JMPE (which doesn't take a ModRM byte, but an immediate).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 29 Sep 2016 10:02:39 +0000 (12:02 +0200)]
x86emul: track only rIP in emulator state
Now that all decoding happens in x86_decode() there's no need to keep
the local registers copy in struct x86_emulate_state. Only rIP gets
updated in the decode phase, so only that register needs tracking
there. All other (read-only) registers can be read from the original
structure (but sadly, due to it getting passed to decode_register(),
the pointer can't be made to point to "const" to make the compiler help
ensure no modification happens).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 29 Sep 2016 10:02:15 +0000 (12:02 +0200)]
x86emul: fetch all insn bytes during the decode phase
This way we can offer to callers the service of just sizing
instructions, and we also can better guarantee not to raise the wrong
fault due to not having read all relevant bytes.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 29 Sep 2016 10:01:37 +0000 (12:01 +0200)]
x86emul: split instruction decoding from execution
This is only the mechanical part, a subsequent patch will make non-
mechanical adjustments to actually do all decoding in this new
function.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tamas K Lengyel [Wed, 28 Sep 2016 23:23:05 +0000 (16:23 -0700)]
arm/mem_access: don't reinject stage 2 access exceptions
The only way a guest may trip a stage 2 access violation is if mem_access is
or was in use, so reinjecting these exceptions to the guest is never required.
Requested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Wei Liu [Wed, 28 Sep 2016 18:51:08 +0000 (19:51 +0100)]
libxc: use PRI_xen_pfn in xc_dom_load_acpi
This fixes compilation on ARM.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Wei Liu [Wed, 28 Sep 2016 15:38:19 +0000 (16:38 +0100)]
libxc: fix out of range shift in populate_acpi_pages
An unsigned int is only 4 bytes long and the constant "4" is treated as
int, so the shift would overflow.
Use the unsigned long type, and calculate the number of bits to shift up
front instead of shifting twice.
Caught by clang compilation test.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
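As an illustration of the shift fix in "libxc: fix out of range shift in populate_acpi_pages", here is a minimal sketch of the pattern. It is not the libxc code: DEMO_PAGE_SHIFT and order_to_bytes() are hypothetical names, and uint64_t is used instead of unsigned long so the sketch behaves the same on 32-bit builds.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u /* hypothetical: 4 KiB pages */

/*
 * Bytes covered by 2^order pages.
 * The caller must keep order + DEMO_PAGE_SHIFT below 64.
 */
static uint64_t order_to_bytes(unsigned int order)
{
    /* Combine the shift amounts first instead of shifting twice. */
    unsigned int shift = order + DEMO_PAGE_SHIFT;

    /* Widen before shifting: a plain int constant would overflow at bit 31. */
    return (uint64_t)1 << shift;
}
```

With order 20 the result is 2^32, which no longer fits in a 32-bit int or unsigned int; that is the case a narrow shift gets wrong.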
Dario Faggioli [Wed, 28 Sep 2016 15:04:30 +0000 (16:04 +0100)]
libxc: improve error handling of xc Credit1 and Credit2 helpers
In fact, libxc wrappers should, on error, set errno and
return -1.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
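The convention stated in "libxc: improve error handling of xc Credit1 and Credit2 helpers" (set errno and return -1 on error) can be sketched as follows. Both functions are hypothetical stand-ins and not part of libxc; demo_do_sysctl() only imitates a lower layer that reports failure as a negative errno value.

```c
#include <errno.h>

/* Hypothetical lower layer: returns 0 on success or a negative errno value. */
static int demo_do_sysctl(int fail_with)
{
    return fail_with ? -fail_with : 0;
}

/* Wrapper following the convention: on error, set errno and return -1. */
static int demo_sched_op(int fail_with)
{
    int rc = demo_do_sysctl(fail_with);

    if (rc < 0)
    {
        errno = -rc; /* expose the cause through errno ... */
        return -1;   /* ... and signal failure uniformly */
    }

    return 0;
}
```

Callers can then test for -1 and inspect errno, regardless of which scheduler helper they called.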
Dario Faggioli [Wed, 28 Sep 2016 15:04:30 +0000 (16:04 +0100)]
xen: libxc: allow to set the ratelimit value online
The main purpose of the patch is to provide the xen-libxc
plumbing necessary to be able to change the value of the
ratelimit_us parameter online, for Credit2 (like it is
already for Credit1).
While there:
- mention in the Xen logs when rate limiting was enabled
and is being disabled (and vice-versa);
- fix csched2_sys_cntl() which was always returning
-EINVAL in the XEN_SYSCTL_SCHEDOP_putinfo case.
And also:
- fix style of an if in csched_sys_cntl();
- fix the style of the switch in csched2_sys_cntl();
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:06 +0000 (09:22 -0400)]
libxc/xc_dom_core: Copy ACPI tables to guest space
Load ACPI modules into guest space
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:05 +0000 (09:22 -0400)]
libxl/acpi: Build ACPI tables for HVMlite guests
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:04 +0000 (09:22 -0400)]
libxl: Initialize domain build info before calling libxl__domain_make
libxl__domain_make() may want to use b_info so we should set defaults
a little earlier.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:03 +0000 (09:22 -0400)]
libxl/pvhv2: Include APIC page in MMIO hole for PVHv2 guests
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:02 +0000 (09:22 -0400)]
libxl/acpi: Add ACPI e820 entry
Add entry for ACPI tables created for PVHv2 guests to e820 map.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:01 +0000 (09:22 -0400)]
libxc/libxl: Allow multiple ACPI modules
Provide ability to load multiple ACPI modules. This feature is needed
by PVHv2 guests and will be used in subsequent patches.
We assume that PVHv2 guests do not load their ACPI modules specified
in the configuration file. We can extend support for that in the future
if desired.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:22:00 +0000 (09:22 -0400)]
libacpi: Build DSDT for PVH guests
PVH guests require DSDT with only ACPI INFO (Xen-specific) and Processor
objects. We separate ASL's ACPI INFO definition into dsdt_acpi_info.asl so
that it can be included in ASLs for both HVM and PVH2.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:59 +0000 (09:21 -0400)]
x86: Allow LAPIC-only emulation_flags for HVM guests
PVHv2 guests may request LAPIC emulation (and nothing else)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:58 +0000 (09:21 -0400)]
acpi: Move ACPI code to tools/libacpi
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:57 +0000 (09:21 -0400)]
acpi/hvmloader: Include file/paths adjustments
In preparation for moving acpi sources into generally available
libacpi:
1. Pass IOAPIC/LAPIC/PCI mask values via struct acpi_config
2. Modify include files search paths to point to acpi directory
3. Macro-ise include file for build.c that defines various
utilities used by that file. Users of libacpi will be expected
to define this macro when compiling build.c
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:56 +0000 (09:21 -0400)]
acpi/hvmloader: Link ACPI object files directly
ACPI sources will be available to various components which will build
them according to their own rules. ACPI's Makefile will only generate
necessary source files.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:55 +0000 (09:21 -0400)]
acpi/hvmloader: Translate all addresses when assigning addresses in ACPI tables
Non-hvmloader users may be building tables in virtual address space
and therefore we need to make sure that values that end up in tables
are physical addresses.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:54 +0000 (09:21 -0400)]
acpi/hvmloader: Replace mem_alloc() and virt_to_phys() with memory ops
Components that wish to use ACPI builder will need to provide their own
mem_alloc() and virt_to_phys() routines. Pointers to these routines will
be passed to the builder as memory ops.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
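A rough sketch of what "memory ops" passed to a table builder could look like, for the commit "acpi/hvmloader: Replace mem_alloc() and virt_to_phys() with memory ops". The structure layout and every name below are assumptions made for illustration; the real libacpi definitions may differ. The backend shown is a bump allocator over a caller-supplied buffer that will later be placed at a known guest-physical address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table; not the actual libacpi structure. */
struct demo_acpi_mem_ops {
    void *(*alloc)(void *opaque, size_t size, size_t align);
    uint64_t (*v2p)(void *opaque, void *va);
};

/* Hypothetical backend state. */
struct demo_ctx {
    unsigned char *buf; /* buffer the tables are built in */
    size_t size;        /* total buffer size */
    size_t used;        /* bytes handed out so far */
    uint64_t guest_pa;  /* guest-physical address of buf[0] */
};

/* Bump allocator; alignment is relative to the start of the buffer. */
static void *demo_alloc(void *opaque, size_t size, size_t align)
{
    struct demo_ctx *ctx = opaque;
    size_t start;

    if (align == 0)
        align = 1;
    start = (ctx->used + align - 1) / align * align;
    if (start > ctx->size || size > ctx->size - start)
        return NULL;
    ctx->used = start + size;

    return ctx->buf + start;
}

/* Translate a pointer into the buffer to its guest-physical address. */
static uint64_t demo_v2p(void *opaque, void *va)
{
    struct demo_ctx *ctx = opaque;

    return ctx->guest_pa + (uint64_t)((unsigned char *)va - ctx->buf);
}
```

A builder that only ever stores the result of v2p() in its tables ends up with physical addresses even when it runs in a virtual address space, which is the concern raised in the address-translation commit of the same series.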
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:53 +0000 (09:21 -0400)]
acpi/hvmloader: Build WAET optionally
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:52 +0000 (09:21 -0400)]
acpi/hvmloader: Make providing IOAPIC in MADT optional
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:51 +0000 (09:21 -0400)]
acpi/hvmloader: Set TIS header address in hvmloader
Users other than hvmloader may provide TIS address as virtual.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:50 +0000 (09:21 -0400)]
acpi/hvmloader: Collect processor and NUMA info in hvmloader
No need for ACPI code to rely on hvm_info variable.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:49 +0000 (09:21 -0400)]
acpi: Re-license ACPI builder files from GPLv2 to LGPLv2.1
ACPI builder is currently distributed under GPLv2 license.
We plan to make the builder available to components other than the
hvmloader (which is also GPLv2). Some of these components (such as
libxl) may be distributed under LGPL-2.1 so that they can be used by
non-GPLv2 callers. But this will not be possible if we incorporate
the ACPI builder in those other components.
To avoid this problem we are relicensing sources in the ACPI builder
directory to the GNU Lesser General Public License (LGPL) version 2.1.
The gpl/mk_dsdt_asl.sh file will remain GPL-only pending permission to
relicense from Lenovo due to commit 801d469ad ("[HVM] ACPI support
patch 3 of 4: ACPI _PRT table.").
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Daniel Kiper <dkiper@net-space.pl>
Acked-by: Stefan Berger <stefanb@us.ibm.com>
Acked-by: Kouya Shimura <kouya@jp.fujitsu.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [for Oracle, VirtualIron and Sun contributions]
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:48 +0000 (09:21 -0400)]
acpi: Prevent GPL-only code from seeping into non-GPL binaries
Some code (specifically, introduced by commit 801d469ad ("[HVM] ACPI
support patch 3 of 4: ACPI _PRT table.")) has only been licensed under
GPLv2. We want to prevent this code from showing up in non-GPL
binaries which might become possible after we make ACPI builder code
available to users other than hvmloader.
There are two pieces that we need to be careful about:
(1) A small chunk of code in dsdt.asl that implements _PIC method
(2) A chunk of ASL generator in mk_dsdt.c that describes PCI
interrupt routing.
This code will now be generated by a GPL-only script which will be
invoked only when ACPI builder's Makefile is called with GPL variable
set.
We also strip license header from generated ASL files to prevent
inadvertent use of those files with incorrect license.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Wed, 28 Sep 2016 13:21:47 +0000 (09:21 -0400)]
acpi: Extract acpi info description into a separate ASL file
This code will be needed by PVH guests who don't want to use full DSDT.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Wed, 28 Sep 2016 11:12:13 +0000 (12:12 +0100)]
Config.mk: update mini-os commit
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Wed, 28 Sep 2016 08:31:58 +0000 (10:31 +0200)]
pvgrub: use printk() instead of grub_printf()
grub_printf() supports only a very limited number of formats.
Some error messages in particular suffer from that, e.g. %lx won't work.
Switch to use printk() for error messages instead.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Wed, 28 Sep 2016 04:02:44 +0000 (06:02 +0200)]
pvgrub: fix crash when booting kernel with p2m list outside kernel mapping
When trying to boot a kernel with the p2m list not mapped by the
initial kernel mapping, pvgrub can fail because it keeps some page
tables mapped.
Unmapping the additional page tables created for the special p2m mapping
avoids this failure.
Reported-by: Sven Koehler <sven.koehler@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Konrad Rzeszutek Wilk [Sun, 11 Sep 2016 00:41:24 +0000 (20:41 -0400)]
livepatch: arm[32,64],x86: NOP test-case
The test-case is quite simple - we NOP the 'xen_minor_version'.
The amount of NOPs depends on the architecture.
On x86 the function is 11 bytes long:
  55                push   %rbp         <- NOP
  48 89 e5          mov    %rsp,%rbp    <- NOP
  b8 04 00 00 00    mov    $0x4,%eax    <- NOP
  5d                pop    %rbp         <- NOP
  c3                retq
We can NOP everything but the last instruction (so 10 bytes).
On ARM64 it is 8 bytes:
  52800100    mov w0, #0x8    <- NOP
  d65f03c0    ret
We can NOP the first instruction.
While on ARM32 there are 24 bytes:
  e52db004    push {fp}          <- NOP
  e28db000    add fp, sp, #0     <- NOP
  e3a00008    mov r0, #8         <- NOP
  e24bd000    sub sp, fp, #0     <- NOP
  e49db004    pop {fp}           <- NOP
  e12fff1e    bx lr
And we can NOP instructions 1 through 5.
Granted this code may be different per compiler!
Hence if anybody does run this test-case - they should
verify that the assumptions made here are correct.
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 23 Sep 2016 15:25:12 +0000 (11:25 -0400)]
livepatch, arm[32|64]: Share arch_livepatch_revert
It is exactly the same in both platforms.
No functional change.
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 23 Sep 2016 00:15:09 +0000 (20:15 -0400)]
livepatch: Initial ARM32 support.
The patch piggybacks on "livepatch: Initial ARM64 support", which
brings in all of the necessary livepatch infrastructure pieces.
This patch adds three major pieces:
1) ELF relocations. ARM32 uses SHT_REL instead of SHT_RELA, which
means the addend has to be extracted from within the
instruction. This required parsing BL/BLX, B/BL<cond>,
MOVT, and MOVW instructions.
The code was written from scratch using the ARM ELF manual
(and the ARM Architecture Reference Manual).
2) Inserting a trampoline. We use the B (branch to address)
instruction, which uses an offset that is based on the PC value: PC + imm32.
Because we insert the branch at the start of the old function
we have to account for the instruction already being fetched
and subtract 8 from the delta (new_addr - old_addr). See
ARM DDI 0406C.c, A2.3 (pg 45) and A8.8.18 (pg 334, 335).
3) Allowing the test-cases to be built under ARM32.
The "livepatch: tests: Make them compile under ARM64" patch
put in the right infrastructure for it and we piggyback on it.
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com> [for non-ARM parts]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
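The trampoline arithmetic and the in-instruction addend described in "livepatch: Initial ARM32 support" can be sketched as below. This is an illustrative reconstruction from the ARM (A1) instruction encodings, not the code from the patch; the function names are hypothetical, range checking is left to the caller, and addresses are assumed to be 4-byte aligned.

```c
#include <stdint.h>

/*
 * Encode an unconditional ARM-state "B <target>" placed at old_addr.
 * In ARM state the PC reads as the instruction address plus 8, hence
 * the subtraction. imm24 holds the signed word offset (bits [25:2] of
 * the byte offset).
 */
static uint32_t demo_arm32_branch(uint32_t old_addr, uint32_t new_addr)
{
    uint32_t delta = new_addr - old_addr - 8; /* modulo 2^32, two's complement */
    uint32_t imm24 = (delta >> 2) & 0x00ffffffu;

    return 0xea000000u | imm24; /* cond = AL (0xe), opcode bits = 1010 */
}

/*
 * With SHT_REL the addend lives inside the instruction. For MOVW/MOVT
 * (A1 encoding) the 16-bit immediate is split into imm4 (bits [19:16])
 * and imm12 (bits [11:0]).
 */
static uint16_t demo_movw_imm16(uint32_t insn)
{
    return (uint16_t)(((insn >> 4) & 0xf000u) | (insn & 0x0fffu));
}
```

A branch to its own address encodes as 0xeafffffe, the familiar "b ." loop, which is a quick sanity check for the minus-8 adjustment.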
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 20:04:55 +0000 (16:04 -0400)]
livepatch: tests: Make them compile under ARM64
We need to do two things:
1) Wrap the platform-specific objcopy parameters in defines.
The input and output parameters for $(OBJCOPY) are different
based on the platform. As such provide them in the
OBJCOPY_MAGIC define and use that.
2) The alternative is a bit different (it exists only under ARM64
and x86), while there are no exceptions under ARM at all.
We use the LIVEPATCH_FEATURE CPU id feature for ARM similar to
how it is done on x86.
We are not yet attempting to build them under ARM32 so
that is still ifdefed out.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 16:41:28 +0000 (12:41 -0400)]
livepatch: x86, ARM, alternative: Expose FEATURE_LIVEPATCH
To use as a common way of testing alternative patching for
livepatches. Both architectures have this FEATURE and the
test-cases can piggyback on that.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 13 Sep 2016 17:15:07 +0000 (13:15 -0400)]
livepatch/arm/x86: Check payload for for unwelcomed symbols.
Certain platforms, such as ARM [32|64], add extra mapping symbols
such as $x (for ARM64 instructions), or more interesting to
this patch: $t (for Thumb instructions). These symbols are supposed
to help the final linker make any adjustments (such as
adding a veneer). But more importantly - we do not compile Xen
with any Thumb instructions (which are variable length) - and
if we find these mapping symbols we should disallow such a payload.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Sat, 13 Aug 2016 03:08:32 +0000 (23:08 -0400)]
livepatch: ARM 32|64: Ignore mapping symbols: $[d,a,x]
Those symbols are used to help final linkers to replace insn.
The ARM ELF specification mandates that they are present
to denote the start of certain CPU features. There are two
variants of it - short and long format.
Either way - we can ignore these symbols.
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> [x86 bits]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
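The two mapping-symbol commits above (ignore $d, $a, $x; reject $t) both rely on recognising a mapping symbol in its short form ("$d") or long form ("$d.<anything>"). A minimal sketch of such a matcher follows; the function name and the "kinds" parameter are hypothetical and not taken from the Xen sources.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if name is a mapping symbol whose type letter is one of
 * the characters in kinds, in either the short ("$x") or the long
 * ("$x.<any>") format.
 */
static bool demo_is_mapping_symbol(const char *name, const char *kinds)
{
    if (name[0] != '$' || name[1] == '\0')
        return false;
    if (strchr(kinds, name[1]) == NULL)
        return false;

    return name[2] == '\0' || name[2] == '.';
}
```

A loader could then skip symbols matching "dax" and refuse a payload containing any symbol matching "t".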
Konrad Rzeszutek Wilk [Mon, 19 Sep 2016 16:37:50 +0000 (12:37 -0400)]
livepatch: ARM/x86: Check displacement of old_addr and new_addr
If the distance is too big we are in trouble - as our relocation
distance can surely be clipped, or still have a valid width - but
cause an overflow of distance.
The maximum displacement for an unconditional branch/jump varies
by architecture: ARM32 is +/- 32MB, ARM64 is +/- 128MB while x86
for 32-bit relocations is +/- 2GB.
Note: On x86 we could use the 64-bit jmpq instruction which
would provide much bigger displacement to do a jump, but we would
still have issues with the new function not being able to reach
any of the old functions (as all the relocations would assume 32-bit
displacement). And "furthermore would require an register or
memory location to load/store the address to." (From Jan).
On ARM the conditional branch supports an even smaller displacement
but fortunately we are not using that.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
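The check described in "livepatch: ARM/x86: Check displacement of old_addr and new_addr" can be sketched with the limits quoted in that message (ARM32 +/- 32MB, ARM64 +/- 128MB, x86 +/- 2GB). The enum, the function names and the exact boundary handling are assumptions made for illustration, not the code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_arch { DEMO_ARM32, DEMO_ARM64, DEMO_X86 };

/* Maximum branch reach in bytes, per the figures in the commit message. */
static int64_t demo_max_range(enum demo_arch arch)
{
    switch (arch)
    {
    case DEMO_ARM32: return INT64_C(32) << 20;
    case DEMO_ARM64: return INT64_C(128) << 20;
    case DEMO_X86:   return INT64_C(2) << 30;
    }

    return 0;
}

/* True if a branch at old_addr can reach new_addr on the given arch. */
static bool demo_displacement_ok(enum demo_arch arch,
                                 uint64_t old_addr, uint64_t new_addr)
{
    /* Unsigned subtraction wraps; the cast reinterprets it as signed
     * (two's complement assumed). */
    int64_t offset = (int64_t)(new_addr - old_addr);
    int64_t range = demo_max_range(arch);

    return offset >= -range && offset < range;
}
```

A payload whose new function fails this test for the running architecture would be rejected before any trampoline is written.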