dgit.raspbian.org Git

debian/scripts: Optimize runtime scripts

Fewer fork()s and execve()s quickly add up to significant savings. I'm
concerned Debian is slowly headed towards recreating SunOS^WSolaris
5.7^W2.7^W7 and the layers and layers of scripts which killed
performance.

As these runtime scripts are heavily used, avoid all uses of external
programs by them.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

Rework "debian/rules: Do not try to move EFI binaries on armhf"

What is actually needed is a move command which fails if the move
fails (i386/amd64, the EFI files were created, but the move failed), but
succeeds if no files are moved (armhf, no EFI files are created). A
combination of find/xargs matches this.

This reworks commit 8ff478af61fa8a87806a89bbd618cd9da2354302.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

d/shuffle-binaries: Fix binary shuffling script for cross-building

`ldd` doesn't work with cross-builds, so use `file` to detect scripts
and `strings | grep` for identifying linked libraries. Even though
debhelper depends on file, make the requirement explicit.

Heavily simplify script inner loop. While the core of that inner loop
was often executed only once, it is best to shrink inner loops.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/salsa-ci.yml: don't allow reprotest to fail

Xen now builds reproducibly.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/rules: reproducibly build oxenstored

Use BUILD_PATH_PREFIX_MAP to generate build path independent output with
ocaml.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/rules: don't include build path in binaries

This is part of making the package build reproducibly.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/rules: use SOURCE_DATE_EPOCH for xen build dates

Use fixed timestamps for reproducible builds.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/salsa-ci.yml: enable diffoscope in reprotest

This displays binary differences in the CI output.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/salsa-ci.yml: enable salsa-ci

according to the manual at
https://salsa.debian.org/salsa-ci-team/pipeline/-/blob/master/README.md

If the pipeline is not yet enabled and configured in the gitlab web
interface by default, this needs to be done manually as written in the
manual.

As of now the salsa-ci only supports gbp, so convert the debrebase
repository to a gbp one.
Also work around a problem with gbp export-orig replacing variables in
.gitarchive-info which then causes problems as the repository and the
generated .tar.gz don't match.

Allow reprotest to fail for now as the build is currently not
reproducible.

Disable blhc as xen currently does not enable hardening when building
the hypervisor so blhc outputs many errors.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/rules: Set CC/LD to enable cross-building

Always set $(CC) and $(LD) when building. This has no effect on native
builds, but is needed for cross-builds to use the right compiler.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
[Hans van Kranenburg]
Note: instead of using DEB_HOST_MULTIARCH, we have to use
DEB_HOST_GNU_TYPE for CC. This commit is the 'second take' of commit
900c799589
Acked-by: Hans van Kranenburg <hans@knorrie.org>

Update changelog for new upstream 4.14.1+11-gb0b734a8b3

[git-debrebase changelog: new upstream 4.14.1+11-gb0b734a8b3]

Update to upstream 4.14.1+11-gb0b734a8b3

[git-debrebase anchor: new upstream 4.14.1+11-gb0b734a8b3, merge]

debian/changelog: finish 4.14.0+88-g1d1d1f5391-2

Revert "debian/rules: Set CC/LD to enable cross-building"

This reverts commit 900c79958963e72192fbf3654b38a5c7f8353176.

Unfortunately, this change causes an FTBFS on i386, because what we need
to call on i386 is i686-linux-gnu-gcc. Sigh.

For now, revert the change, so we can get the security updates out.
Later we will have to see how to properly fix this.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

debian/changelog: finish 4.14.0+88-g1d1d1f5391-1

We're setting urgency=high for this one, because of the security issues.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

d/changelog: most stuff for 4.14.0+88-g1d1d1f5391-1

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

debian/changelog: add a few missing CVE numbers

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

debian/rules: enable verbose build

Debian policy says the package build should be as verbose as reasonably
possible. Also this helps with debugging build issues.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Reviewed-by: Hans van Kranenburg <hans@knorrie.org>

d/xen-hypervisor-V.bug-control.vsn-in: actually install script

Rename debian/xen-hypervisor-V.bug-control.vsn-in to
debian/xen-hypervisor-V-F.bug-control.vsn-in so it actually gets
installed.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
[Hans van Kranenburg: split submitted patch in two]
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

d/xen-hypervisor-V.*: clean up unused files

These files are unused identical copies of the -V-F. files.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
[Hans van Kranenburg: split submitted patch in two]
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

d/xen-hypervisor-V-F.postrm: actually install script

Fix the filename of debian/xen-hypervisor-V-F.postrm by adding the
missing .vsn-in suffix so it actually gets installed.

This fixes the issue of not running update-grub when the Xen hypervisor
packages get removed, which resulted in a system that could not be
rebooted without interactive usage of the grub menu.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Reviewed-by: Hans van Kranenburg <hans@knorrie.org>

d/xen-hypervisor-V-F.postinst.vsn-in: use reboot-required

When updating the xen hypervisor, touch the /run/reboot-required file to
indicate a reboot is required. Also add information about the package
requesting the reboot to the /run/reboot-required.pkgs file.

Closes: #862408
Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Reviewed-by: Hans van Kranenburg <hans@knorrie.org>

d/shuffle-binaries: Remove useless extra argument being passed in

This doesn't do anything. Really it is almost a bug the script /doesn't/
complain.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/xen.init: Load xen_acpi_processor on boot

This allows more control of processor state, potentially resulting in
reduced power usage. Alternatively simply more information on processor
use.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

debian/rules: Set CC/LD to enable cross-building

Always set $(CC) and $(LD) when building. This has no effect on native
builds, but is needed for cross-builds to use the right compiler.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

d/shuffle-boot-files: The Great Quotification

These should originate with the owner of a build system and are unlikely
to get hazardous values. This script though *should* work on a system
with such a bizzare setup. On general principle, add lots of double-quotes.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

d/shuffle-binaries: Add quoting for potentially changeable variables

These should originate with the owner of a build system and are unlikely
to get hazardous values. This script though *should* work on a system
with such a bizarre setup. On general principle, add double-quotes.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

d/shuffle-binaries: Make error detection/message overt

The reason for the `ls` at the end is pretty straightforward if you think
about it, mainly confirming the shuffle step did something. Yet that
is a bit non-obvious, and the error message produced on failure is poor
("No such file or directory" simply means the script failed somewhere?).
Add an overt error message.

Signed-off-by: Elliott Mitchell <ehem+debian@m5p.com>
Acked-by: Hans van Kranenburg <hans@knorrie.org>

d/rules: do not compress /usr/share/doc/xen/html

So, we have a xen-doc package which is already split off nicely as
binary package to contain the upstream html collection of documentation.

By default, dh_compress will gzip -9 .txt files which it considers to
be too large. However, in our case, there's some index.html with links,
and if this happens, the user who consciously installs the xen-doc
package, wanting to waste disk space on this ends up with a confusing
pile of broken links when navigating to file:///usr/share/doc/xen/html/
and browsing around.

The difference between disk space occupied when not compressing is 5.2M
vs. 5.1M before.

So, add an exclude on /usr/share/doc/xen/html

Closes: #942611
Reported-by: Diederik de Haas <didi.debian@cknow.org>
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

xen/arm: fix gnttab_need_iommu_mapping

Commit 91d4eca7add broke gnttab_need_iommu_mapping on ARM.
The offending chunk is:

#define gnttab_need_iommu_mapping(d)                    \
-    (is_domain_direct_mapped(d) && need_iommu(d))
+    (is_domain_direct_mapped(d) && need_iommu_pt_sync(d))

On ARM we need gnttab_need_iommu_mapping to be true for dom0 when it is
directly mapped and IOMMU is enabled for the domain, like the old check
did, but the new check is always false.

In fact, need_iommu_pt_sync is defined as dom_iommu(d)->need_sync and
need_sync is set as:

    if ( !is_hardware_domain(d) || iommu_hwdom_strict )
        hd->need_sync = !iommu_use_hap_pt(d);

iommu_use_hap_pt(d) means that the page-table used by the IOMMU is the
P2M. It is true on ARM. need_sync means that you have a separate IOMMU
page-table and it needs to be updated for every change. need_sync is set
to false on ARM. Hence, gnttab_need_iommu_mapping(d) is false too,
which is wrong.

As a consequence, when using PV network from a domU on a system where
IOMMU is on from Dom0, I get:

(XEN) smmu: /smmu@fd800000: Unhandled context fault: fsr=0x402, iova=0x8424cb148, fsynr=0xb0001, cb=0
[   68.290307] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK

The fix is to go back to something along the lines of the old
implementation of gnttab_need_iommu_mapping.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Fixes: 91d4eca7add ("mm / iommu: split need_iommu() into has_iommu_pt() and need_iommu_pt_sync()")
Backport: 4.13+
(cherry picked from commit 04085ec1ac05a362812e9b0c6b5a8713d7dc88ad)

xen/page_alloc: Only flush the page to RAM once we know they are scrubbed

At the moment, each page are flushed to RAM just after the allocator
found some free pages. However, this is happening before check if the
page was scrubbed.

As a consequence, on Arm, a guest may be able to access the old content
of the scrubbed pages if it has cache disabled (default at boot) and
the content didn't reach the Point of Coherency.

The flush is now moved after we know the content of the page will not
change. This also has the benefit to reduce the amount of work happening
with the heap_lock held.

This is XSA-364.

Fixes: 307c3be3ccb2 ("mm: Don't scrub pages while holding heap lock in alloc_heap_pages()")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 3b1cc15f1931ba56d0ee256fe9bfe65509733b27
master date: 2021-02-16 15:32:08 +0100

x86/dpci: do not remove pirqs from domain tree on unbind

A fix for a previous issue removed the pirqs from the domain tree when
they are unbound in order to prevent shared pirqs from triggering a
BUG_ON in __pirq_guest_unbind if they are unbound multiple times. That
caused free_domain_pirqs to no longer unmap the pirqs because they
are gone from the domain pirq tree, thus leaving stale unbound pirqs
after domain destruction if the domain had mapped dpci pirqs after
shutdown.

Take a different approach to fix the original issue, instead of
removing the pirq from d->pirq_tree clear the flags of the dpci pirq
struct to signal that the pirq is now unbound. This prevents calling
pirq_guest_unbind multiple times for the same pirq without having to
remove it from the domain pirq tree.

This is XSA-360.

Fixes: 5b58dad089 ('x86/pass-through: avoid double IRQ unbind during domain cleanup')
Reported-by: Samuel Verschelde <samuel.verschelde@vates.fr>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 58427889f5a420cc5226f88524b3228f90b72a58
master date: 2021-01-21 16:11:41 +0100

x86/mem_sharing: fix uninitialized 'preempted' variable

UBSAN catches an uninitialized use of the 'preempted' variable in
fork_hap_allocation when there is no preemption.

Fixes: 41548c5472a ("mem_sharing: VM forking")
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: cb34a2fa162184b150d48a3b6f385bacbec22ce7
master date: 2021-01-18 17:50:11 +0000

xen/memory: Fix compat XENMEM_acquire_resource for size requests

Copy the nr_frames from the structure which actually has the correct value, so
the caller doesn't unconditionally receive 0.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
master commit: 414be7b66349e7dca42bc1fd47c2b2f5b2d27432
master date: 2021-01-12 18:17:02 +0000

x86/ACPI: don't overwrite FADT

When marking fields invalid for our own purposes, we should do so in our
local copy (so we will notice later on), not in the firmware provided
one (which another entity may want to look at again, e.g. after kexec).
Also mark the function parameter const to notice such issues right away.

Instead use the pointer at the firmware copy for specifying an adjacent
printk()'s arguments. If nothing else this at least reduces the number
of relocations the assembler hasto emit and the linker has to process.

Fixes: 62d1a69a4e9f ("ACPI: support v5 (reduced HW) sleep interface")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 654c917d94d24587bb6b4a8d862baf04b25cbe33
master date: 2021-01-11 14:55:52 +0100

x86/hypercall: fix gnttab hypercall args conditional build on pvshim

A pvshim build doesn't require the grant table functionality built in,
but it does require knowing the number of arguments the hypercall has
so the hypercall parameter clobbering works properly.

Instead of also setting the argument count for the gnttab case if PV
shim functionality is enabled, just drop all of the conditionals from
hypercall_args_table, as a hypercall having a NULL handler won't get
to use that information anyway.

Note this hasn't been detected by osstest because the tools pvshim
build is done without debug enabled, so the hypercall parameter
clobbering doesn't happen.

Fixes: d2151152dd2 ('xen: make grant table support configurable')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b468b464c89e92629bd52cec58e9f51eee2e950a
master date: 2021-01-08 16:51:52 +0100

x86/dpci: EOI interrupt regardless of its masking status

Modify hvm_pirq_eoi to always EOI the interrupt if required, instead
of not doing such EOI if the interrupt is routed through the vIO-APIC
and the entry is masked at the time the EOI is performed.

Further unmask of the vIO-APIC pin won't EOI the interrupt, and thus
the guest OS has to wait for the timeout to expire and the automatic
EOI to be performed.

This allows to simplify the helpers and drop the vioapic_redir_entry
parameter from all of them.

Fixes: ccfe4e08455 ('Intel vt-d specific changes in arch/x86/hvm/vmx/vtd.')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: eb298f32fac5ac362eef30a66a9c9c42724d4ce6
master date: 2021-01-07 15:10:29 +0100

x86/vPCI: tolerate (un)masking a disabled MSI-X entry

None of the four reasons causing vpci_msix_arch_mask_entry() to get
called (there's just a single call site) are impossible or illegal prior
to an entry actually having got set up:
- the entry may remain masked (in this case, however, a prior masked ->
unmasked transition would already not have worked),
- MSI-X may not be enabled,
- the global mask bit may be set,
- the entry may not otherwise have been updated.
Hence the function asserting that the entry was previously set up was
simply wrong. Since the caller tracks the masked state (and setting up
of an entry would only be effected when that software bit is clear),
it's okay to skip both masking and unmasking requests in this case.

Fixes: d6281be9d0145 ('vpci/msix: add MSI-X handlers')
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
master commit: 04b090366ca59e8a75837c822df261a8d0bd1a30
master date: 2021-01-05 13:17:54 +0100

x86/hpet: Fix return value of hpet_setup()

hpet_setup() is idempotent if the rate has already been calculated, and
returns the cached value. However, this only works correctly when the return
statements are identical.

Use a sensibly named local variable, rather than a dead one with a bad name.

Fixes: a60bb68219 ("x86/time: reduce rounding errors in calculations")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 83736c567d6b64dbce98f251ca72e7870f556421
master date: 2020-12-31 16:19:00 +0000

update Xen version to 4.14.2-pre

update Xen version to 4.14.1

evtchn/FIFO: add 2nd smp_rmb() to evtchn_fifo_word_from_port()

Besides with add_page_to_event_array() the function also needs to
synchronize with evtchn_fifo_init_control() setting both d->evtchn_fifo
and (subsequently) d->evtchn_port_ops.

This is XSA-359 / CVE-2020-29571.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
master commit: dc8b01affd7f6f36d34c3854f51df0847df3ec0e
master date: 2020-12-15 13:42:51 +0100

evtchn/FIFO: re-order and synchronize (with) map_control_block()

For evtchn_fifo_set_pending()'s check of the control block having been
set to be effective, ordering of respective reads and writes needs to be
ensured: The control block pointer needs to be recorded strictly after
the setting of all the queue heads, and it needs checking strictly
before any uses of them (this latter aspect was already guaranteed).

This is XSA-358 / CVE-2020-29570.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
master commit: c5e63651fdc706954d920a2d98f74f4a21b46a7e
master date: 2020-12-15 13:46:37 +0100

x86/irq: fix infinite loop in irq_move_cleanup_interrupt

If Xen enters irq_move_cleanup_interrupt with a dynamic vector below
IRQ_MOVE_CLEANUP_VECTOR pending in IRR (0x20 or 0x21) that's also
designated for a cleanup it will enter a loop where
irq_move_cleanup_interrupt continuously sends a cleanup IPI (vector
0x22) to itself while waiting for the vector with lower priority to be
injected - which will never happen because IRQ_MOVE_CLEANUP_VECTOR
takes precedence and it's always injected first.

Fix this by making sure vectors below IRQ_MOVE_CLEANUP_VECTOR are
marked as used and thus not available for APs. Also add some logic to
assert and prevent irq_move_cleanup_interrupt from entering such an
infinite loop, albeit that should never happen given the current code.

This is XSA-356 / CVE-2020-29567.

Fixes: 3fba06ba9f8 ('x86/IRQ: re-use legacy vector ranges on APs')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ca85682e8c16361fdf3814c9b25a2ec3ff4f8bed
master date: 2020-12-15 13:42:16 +0100

x86: avoid calling {svm,vmx}_do_resume()

These functions follow the following path: hvm_do_resume() ->
handle_hvm_io_completion() -> hvm_wait_for_io() ->
wait_on_xen_event_channel() -> do_softirq() -> schedule() ->
sched_context_switch() -> continue_running() and hence may
recursively invoke themselves. If this ends up happening a couple of
times, a stack overflow would result.

Prevent this by also resetting the stack at the
->arch.ctxt_switch->tail() invocations (in both places for consistency)
and thus jumping to the functions instead of calling them.

This is XSA-348 / CVE-2020-29566.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: e6ebd394385db52855d1517cea829ffff68b34b8
master date: 2020-12-15 13:41:23 +0100

x86: fold guest_idle_loop() into idle_loop()

The latter can easily be made cover both cases. This is in preparation
of using idle_loop directly for populating idle_csw.tail.

Take the liberty and also adjust indentation / spacing in involved code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 058e469ab4d5cc5959423aafd6ba181dfc310a7f
master date: 2020-12-15 13:41:09 +0100

x86: replace reset_stack_and_jump_nolp()

Move the necessary check into check_for_livepatch_work(), rather than
mostly duplicating reset_stack_and_jump() for this purpose. This is to
prevent an inflation of reset_stack_and_jump() flavors.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: e3df3fbc448b9ae401332ba7d71c190d2efe3ae8
master date: 2020-12-15 13:40:27 +0100

tools/ocaml/xenstored: only Dom0 can change node owner

Otherwise we can give quota away to another domain, either causing it to run
out of quota, or in case of Dom0 use unbounded amounts of memory and bypass
the quota system entirely.

This was fixed in the C version of xenstored in 2006 (c/s db34d2aaa5f5,
predating the XSA process by 5 years).

It was also fixed in the mirage version of xenstore in 2012, with a unit test
demonstrating the vulnerability:

https://github.com/mirage/ocaml-xenstore/commit/6b91f3ac46b885d0530a51d57a9b3a57d64923a7
https://github.com/mirage/ocaml-xenstore/commit/22ee5417c90b8fda905c38de0d534506152eace6

but possibly without realising that the vulnerability still affected the
in-tree oxenstored (added c/s f44af660412 in 2010).

This is XSA-352.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/ocaml/xenstored: delete watch from trie too when resetting watches

c/s f8c72b526129 "oxenstored: implement XS_RESET_WATCHES" from Xen 4.6
introduced reset watches support in oxenstored by mirroring the change
in cxenstored.

However the OCaml version has some additional data structures to
optimize watch firing, and just resetting the watches in one of the data
structures creates a security bug where a malicious guest kernel can
exceed its watch quota, driving oxenstored into OOM:
* create watches
* reset watches (this still keeps the watches lingering in another data
structure, using memory)
* create some more watches
* loop until oxenstored dies

The guest kernel doesn't necessarily have to be malicious to trigger
this:
* if control/platform-feature-xs_reset_watches is set
* the guest kexecs (e.g. because it crashes)
* on boot more watches are set up
* this will slowly "leak" memory for watches in oxenstored, driving it
towards OOM.

This is XSA-330.

Fixes: f8c72b526129 ("oxenstored: implement XS_RESET_WATCHES")
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/xenstore: Preserve bad client until they are destroyed

XenStored will kill any connection that it thinks has misbehaved,
this is currently happening in two places:
* In `handle_input()` if the sanity check on the ring and the message
fails.
* In `handle_output()` when failing to write the response in the ring.

As the domain structure is a child of the connection, XenStored will
destroy its view of the domain when killing the connection. This will
result in sending @releaseDomain event to all the watchers.

As the watch event doesn't carry which domain has been released,
the watcher (such as XenStored) will generally go through the list of
domains registers and check if one of them is shutting down/dying.
In the case of a client misbehaving, the domain will likely to be
running, so no action will be performed.

When the domain is effectively destroyed, XenStored will not be aware of
the domain anymore. So the watch event is not going to be sent.
By consequence, the watchers of the event will not release mappings
they may have on the domain. This will result in a zombie domain.

In order to send @releaseDomain event at the correct time, we want
to keep the domain structure until the domain is effectively
shutting-down/dying.

We also want to keep the connection around so we could possibly revive
the connection in the future.

A new flag 'is_ignored' is added to mark whether a connection should be
ignored when checking if there are work to do. Additionally any
transactions, watches, buffers associated to the connection will be
freed as you can't do much with them (restarting the connection will
likely need a reset).

As a side note, when the device model were running in a stubdomain, a
guest would have been able to introduce a use-after-free because there
is two parents for a guest connection.

This is XSA-325.

Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: drop watch event messages exceeding maximum size

By setting a watch with a very large tag it is possible to trick
xenstored to send watch event messages exceeding the maximum allowed
payload size. This might in turn lead to a crash of xenstored as the
resulting error can cause dereferencing a NULL pointer in case there
is no active request being handled by the guest the watch event is
being sent to.

Fix that by just dropping such watch events. Additionally modify the
error handling to test the pointer to be not NULL before dereferencing
it.

This is XSA-324.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

tools/ocaml/xenstored: Fix path length validation

Currently, oxenstored checks the length of paths against 1024, then
prepends "/local/domain/$DOMID/" to relative paths.  This allows a domU
to create paths which can't subsequently be read by anyone, even dom0.
This also interferes with listing directories, etc.

Define a new oxenstored.conf entry: quota-path-max, defaulting to 1024
as before.  For paths that begin with "/local/domain/$DOMID/" check the
relative path length against this quota. For all other paths check the
entire path length.

This ensures that if the domid changes (and thus the length of a prefix
changes) a path that used to be valid stays valid (e.g. after a
live-migration).  It also ensures that regardless how the client tries
to access a path (domid-relative or absolute) it will get consistent
results, since the limit is always applied on the final canonicalized
path.

Delete the unused Domain.get_path to avoid it being confused with
Connection.get_path (which differs by a trailing slash only).

Rewrite Util.path_validate to apply the appropriate length restriction
based on whether the path is relative or not.  Remove the check for
connection_path being absolute, because it is not guest controlled data.

This is part of XSA-323.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/ocaml/xenstored: clean up permissions for dead domains

domain ids are prone to wrapping (15-bits), and with sufficient number
of VMs in a reboot loop it is possible to trigger it.  Xenstore entries
may linger after a domain dies, until a toolstack cleans it up. During
this time there is a window where a wrapped domid could access these
xenstore keys (that belonged to another VM).

To prevent this do a cleanup when a domain dies:
* walk the entire xenstore tree and update permissions for all nodes
   * if the dead domain had an ACL entry: remove it
   * if the dead domain was the owner: change the owner to Dom0

This is done without quota checks or a transaction. Quota checks would
be a no-op (either the domain is dead, or it is Dom0 where they are not
enforced).  Transactions are not needed, because this is all done
atomically by oxenstored's single thread.

The xenstore entries owned by the dead domain are not deleted, because
that could confuse a toolstack / backends that are still bound to it
(or generate unexpected watch events). It is the responsibility of a
toolstack to remove the xenstore entries themselves.

This is part of XSA-322.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

tools/xenstore: revoke access rights for removed domains

Access rights of Xenstore nodes are per domid. Unfortunately existing
granted access rights are not removed when a domain is being destroyed.
This means that a new domain created with the same domid will inherit
the access rights to Xenstore nodes from the previous domain(s) with
the same domid.

This can be avoided by adding a generation counter to each domain.
The generation counter of the domain is set to the global generation
counter when a domain structure is being allocated. When reading or
writing a node all permissions of domains which are younger than the
node itself are dropped. This is done by flagging the related entry
as invalid in order to avoid modifying permissions in a way the user
could detect.

A special case has to be considered: for a new domain the first
Xenstore entries are already written before the domain is officially
introduced in Xenstore. In order not to drop the permissions for the
new domain a domain struct is allocated even before introduction if
the hypervisor is aware of the domain. This requires adding another
bool "introduced" to struct domain in xenstored. In order to avoid
additional padding holes convert the shutdown flag to bool, too.

As verifying permissions has its price regarding runtime add a new
quota for limiting the number of permissions an unprivileged domain
can set for a node. The default for that new quota is 5.

This is part of XSA-322.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <julien@amazon.com>

tools/ocaml/xenstored: add xenstored.conf flag to turn off watch permission checks

There are flags to turn off quotas and the permission system, so add one
that turns off the newly introduced watch permission checks as well.

This is part of XSA-115.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/ocaml/xenstored: avoid watch events for nodes without access

Today watch events are sent regardless of the access rights of the
node the event is sent for. This enables any guest to e.g. setup a
watch for "/" in order to have a detailed record of all Xenstore
modifications.

Modify that by sending only watch events for nodes that the watcher
has a chance to see otherwise (either via direct reads or by querying
the children of a node). This includes cases where the visibility of
a node for a watcher is changing (permissions being removed).

Permissions for nodes are looked up either in the old (pre
transaction/command) or current trees (post transaction).  If
permissions are changed multiple times in a transaction only the final
version is checked, because considering a transaction atomic the
individual permission changes would not be noticable to an outside
observer.

Two trees are only needed for set_perms: here we can either notice the
node disappearing (if we loose permission), appearing
(if we gain permission), or changing (if we preserve permission).

RM needs to only look at the old tree: in the new tree the node would be
gone, or could have different permissions if it was recreated (the
recreation would get its own watch fired).

Inside a tree we lookup the watch path's parent, and then the watch path
child itself.  This gets us 4 sets of permissions in worst case, and if
either of these allows a watch, then we permit it to fire.  The
permission lookups are done without logging the failures, otherwise we'd
get confusing errors about permission denied for some paths, but a watch
still firing. The actual result is logged in xenstored-access log:

  'w event ...' as usual if watch was fired
  'w notfired...' if the watch was not fired, together with path and
  permission set to help in troubleshooting

Adding a watch bypasses permission checks and always fires the watch
once immediately. This is consistent with the specification, and no
information is gained (the watch is fired both if the path exists or
doesn't, and both if you have or don't have access, i.e. it reflects the
path a domain gave it back to that domain).

There are some semantic changes here:

  * Write+rm in a single transaction of the same path is unobservable
    now via watches: both before and after a transaction the path
    doesn't exist, thus both tree lookups come up with the empty
    permission set, and noone, not even Dom0 can see this. This is
    consistent with transaction atomicity though.
  * Similar to above if we temporarily grant and then revoke permission
    on a path any watches fired inbetween are ignored as well
  * There is a new log event (w notfired) which shows the permission set
    of the path, and the path.
  * Watches on paths that a domain doesn't have access to are now not
    seen, which is the purpose of the security fix.

This is part of XSA-115.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/ocaml/xenstored: introduce permissions for special watches

The special watches "@introduceDomain" and "@releaseDomain" should be
allowed for privileged callers only, as they allow to gain information
about presence of other guests on the host. So send watch events for
those watches via privileged connections only.

Start to address this by treating the special watches as regular nodes
in the tree, which gives them normal semantics for permissions. A later
change will restrict the handling, so that they can't be listed, etc.

This is part of XSA-115.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/ocaml/xenstored: unify watch firing

This will make it easier insert additional checks in a follow-up patch.
All watches are now fired from a single function.

This is part of XSA-115.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/ocaml/xenstored: check privilege for XS_IS_DOMAIN_INTRODUCED

The Xenstore command XS_IS_DOMAIN_INTRODUCED should be possible for privileged
domains only (the only user in the tree is the xenpaging daemon).

This is part of XSA-115.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/ocaml/xenstored: ignore transaction id for [un]watch

Instead of ignoring the transaction id for XS_WATCH and XS_UNWATCH
commands as it is documented in docs/misc/xenstore.txt, it is tested
for validity today.

Really ignore the transaction id for XS_WATCH and XS_UNWATCH.

This is part of XSA-115.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/xenstore: avoid watch events for nodes without access

Today watch events are sent regardless of the access rights of the
node the event is sent for. This enables any guest to e.g. setup a
watch for "/" in order to have a detailed record of all Xenstore
modifications.

Modify that by sending only watch events for nodes that the watcher
has a chance to see otherwise (either via direct reads or by querying
the children of a node). This includes cases where the visibility of
a node for a watcher is changing (permissions being removed).

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: allow special watches for privileged callers only

The special watches "@introduceDomain" and "@releaseDomain" should be
allowed for privileged callers only, as they allow to gain information
about presence of other guests on the host. So send watch events for
those watches via privileged connections only.

In order to allow for disaggregated setups where e.g. driver domains
need to make use of those special watches add support for calling
"set permissions" for those special nodes, too.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: introduce node_perms structure

There are several places in xenstored using a permission array and the
size of that array. Introduce a new struct node_perms containing both.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: fire watches only when removing a specific node

Instead of firing all watches for removing a subtree in one go, do so
only when the related node is being removed.

The watches for the top-most node being removed include all watches
including that node, while watches for nodes below that are only fired
if they are matching exactly. This avoids firing any watch more than
once when removing a subtree.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: rework node removal

Today a Xenstore node is being removed by deleting it from the parent
first and then deleting itself and all its children. This results in
stale entries remaining in the data base in case e.g. a memory
allocation is failing during processing. This would result in the
rather strange behavior to be able to read a node (as its still in the
data base) while not being visible in the tree view of Xenstore.

Fix that by deleting the nodes from the leaf side instead of starting
at the root.

As fire_watches() is now called from _rm() the ctx parameter needs a
const attribute.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: check privilege for XS_IS_DOMAIN_INTRODUCED

The Xenstore command XS_IS_DOMAIN_INTRODUCED should be possible for
privileged domains only (the only user in the tree is the xenpaging
daemon).

Instead of having the privilege test for each command introduce a
per-command flag for that purpose.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: simplify and rename check_event_node()

There is no path which allows to call check_event_node() without a
event name. So don't let the result depend on the name being NULL and
add an assert() covering that case.

Rename the function to check_special_event() to better match the
semantics.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: fix node accounting after failed node creation

When a node creation fails the number of nodes of the domain should be
the same as before the failed node creation. In case of failure when
trying to create a node requiring to create one or more intermediate
nodes as well (e.g. when /a/b/c/d is to be created, but /a/b isn't
existing yet) it might happen that the number of nodes of the creating
domain is not reset to the value it had before.

So move the quota accounting out of construct_node() and into the node
write loop in create_node() in order to be able to undo the accounting
in case of an error in the intermediate node destructor.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>

tools/xenstore: ignore transaction id for [un]watch

Instead of ignoring the transaction id for XS_WATCH and XS_UNWATCH
commands as it is documented in docs/misc/xenstore.txt, it is tested
for validity today.

Really ignore the transaction id for XS_WATCH and XS_UNWATCH.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/xenstore: allow removing child of a node exceeding quota

An unprivileged user of Xenstore is not allowed to write nodes with a
size exceeding a global quota, while privileged users like dom0 are
allowed to write such nodes. The size of a node is the needed space
to store all node specific data, this includes the names of all
children of the node.

When deleting a node its parent has to be modified by removing the
name of the to be deleted child from it.

This results in the strange situation that an unprivileged owner of a
node might not succeed in deleting that node in case its parent is
exceeding the quota of that unprivileged user (it might have been
written by dom0), as the user is not allowed to write the updated
parent node.

Fix that by not checking the quota when writing a node for the
purpose of removing a child's name only.

The same applies to transaction handling: a node being read during a
transaction is written to the transaction specific area and it should
not be tested for exceeding the quota, as it might not be owned by
the reader and presumably the original write would have failed if the
node is owned by the reader.

This is part of XSA-115.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>

tools/ocaml/xenstored: do permission checks on xenstore root

This was lacking in a disappointing number of places.

The xenstore root node is treated differently from all other nodes, because it
doesn't have a parent, and mutation requires changing the parent.

Unfortunately this lead to open-coding the special case for root into every
single xenstore operation, and out of all the xenstore operations only read
did a permission check when handling the root node.

This means that an unprivileged guest can:

* xenstore-chmod / to its liking and subsequently write new arbitrary nodes
   there (subject to quota)
* xenstore-rm -r / deletes almost the entire xenstore tree (xenopsd quickly
   refills some, but you are left with a broken system)
* DIRECTORY on / lists all children when called through python
   bindings (xenstore-ls stops at /local because it tries to list recursively)
* get-perms on / works too, but that is just a minor information leak

Add the missing permission checks, but this should really be refactored to do
the root handling and permission checks on the node only once from a single
function, instead of getting it wrong nearly everywhere.

This is XSA-353.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Update changelog for new upstream 4.14.0+88-g1d1d1f5391

[git-debrebase changelog: new upstream 4.14.0+88-g1d1d1f5391]

Update to upstream 4.14.0+88-g1d1d1f5391

[git-debrebase anchor: new upstream 4.14.0+88-g1d1d1f5391, merge]

finalise changelog for unstable

Signed-off-by: Ian Jackson <iwj@xenproject.org>

changelog: document ~exp2

Signed-off-by: Ian Jackson <iwj@xenproject.org>

x86/vioapic: fix usage of index in place of GSI in vioapic_write_redirent

The usage of idx instead of the GSI in vioapic_write_redirent when
accessing gsi_assert_count can cause a PVH dom0 with multiple
vIO-APICs to lose interrupts in case a pin of a IO-APIC different than
the first one is unmasked with pending interrupts.

Switch to use gsi instead to fix the issue.

Fixes: 9f44b08f7d0e4 ('x86/vioapic: introduce support for multiple vIO APICS')
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Roger Pau Monné <roge.rpau@citrix.com>
Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 3ae469af8e680df31eecd0a2ac6a83b58ad7ce53
master date: 2020-11-30 14:06:38 +0100

xen/events: rework fifo queue locking

Two cpus entering evtchn_fifo_set_pending() for the same event channel
can race in case the first one gets interrupted after setting
EVTCHN_FIFO_PENDING and when the other one manages to set
EVTCHN_FIFO_LINKED before the first one is testing that bit. This can
lead to evtchn_check_pollers() being called before the event is put
properly into the queue, resulting eventually in the guest not seeing
the event pending and thus blocking forever afterwards.

Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel
lock") made the race just more obvious, while the fifo event channel
implementation had this race forever since the introduction and use of
per-channel locks, when an unmask operation was running in parallel with
an event channel send operation.

Using a spinlock for the per event channel lock had turned out
problematic due to some paths needing to take the lock are called with
interrupts off, so the lock would need to disable interrupts, which in
turn broke some use cases related to vm events.

For avoiding this race the queue locking in evtchn_fifo_set_pending()
needs to be reworked to cover the test of EVTCHN_FIFO_PENDING,
EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too. Additionally when an
event channel needs to change queues both queues need to be locked
initially, in order to avoid having a window with no lock held at all.

Reported-by: Jan Beulich <jbeulich@suse.com>
Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
Fixes: de6acb78bf0e137c ("evtchn: use a per-event channel lock for sending events")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 71ac522909e9302350a88bc378be99affa87067c
master date: 2020-11-30 14:05:39 +0100

x86/DMI: fix SMBIOS pointer range check

Forever since its introduction this has been using an inverted relation
operator.

Fixes: 54057a28f22b ("x86: support SMBIOS v3")
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6befe598706218673b14710d90d00ce90763b372
master date: 2020-11-24 11:25:29 +0100

xen/events: access last_priority and last_vcpu_id together

The queue for a fifo event is depending on the vcpu_id and the
priority of the event. When sending an event it might happen the
event needs to change queues and the old queue needs to be kept for
keeping the links between queue elements intact. For this purpose
the event channel contains last_priority and last_vcpu_id values
elements for being able to identify the old queue.

In order to avoid races always access last_priority and last_vcpu_id
with a single atomic operation avoiding any inconsistencies.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
master commit: 1277cb9dc5e966f1faf665bcded02b7533e38078
master date: 2020-11-24 11:23:42 +0100

x86/vpt: fix build with old gcc

I believe it was the XSA-336 fix (42fcdd42328f "x86/vpt: fix race when
migrating timers between vCPUs") which has unmasked a bogus
uninitialized variable warning. This is observable with gcc 4.3.4, but
only on 4.13 and older; it's hidden on newer versions apparently due to
the addition to _read_unlock() done by 12509bbeb9e3 ("rwlocks: call
preempt_disable() when taking a rwlock").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f2c620aa062767b318267d678ae249dcb637b870
master date: 2020-11-18 12:38:01 +0100

xen/evtchn: revert 52e1fc47abc3a0123

With the event channel lock no longer disabling interrupts commit
52e1fc47abc3a0123 ("evtchn/Flask: pre-allocate node on send path") can
be reverted again.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: b5ad37f8e9284cc147218f7a5193d739ae7b956f
master date: 2020-11-10 14:37:15 +0100

xen/evtchn: rework per event channel lock

Currently the lock for a single event channel needs to be taken with
interrupts off, which causes deadlocks in some cases.

Rework the per event channel lock to be non-blocking for the case of
sending an event and removing the need for disabling interrupts for
taking the lock.

The lock is needed for avoiding races between event channel state
changes (creation, closing, binding) against normal operations (set
pending, [un]masking, priority changes).

Use a rwlock, but with some restrictions:

- Changing the state of an event channel (creation, closing, binding)
  needs to use write_lock(), with ASSERT()ing that the lock is taken as
  writer only when the state of the event channel is either before or
  after the locked region appropriate (either free or unbound).

- Sending an event needs to use read_trylock() mostly, in case of not
  obtaining the lock the operation is omitted. This is needed as
  sending an event can happen with interrupts off (at least in some
  cases).

- Dumping the event channel state for debug purposes is using
  read_trylock(), too, in order to avoid blocking in case the lock is
  taken as writer for a long time.

- All other cases can use read_lock().

Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
xen/events: fix build

Commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
introduced a build failure for NDEBUG builds.

Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 5f2df45ead7c1195142f68b7923047a1e9479d54
master date: 2020-11-10 14:36:15 +0100
master commit: 53bacb86f496fdb11560d9e3b361bca7de60d268
master date: 2020-11-11 08:56:21 +0100

memory: fix off-by-one in XSA-346 change

The comparison against ARRAY_SIZE() needs to be >= in order to avoid
overrunning the pages[] array.

This is XSA-355.

Fixes: 5777a3742d88 ("IOMMU: hold page ref until after deferred TLB flush")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
master commit: 9b156bcc3ffcc7949edd4460b718a241e87ae302
master date: 2020-11-24 14:01:31 +0100

debian/changelog: finalise 4.14.0+80-gd101b417b7-1~exp1

debian/changelog: fix typo

Partially revert "debian/rules: Combine shared Make args"

This reverts part of commit af30268ed73397d24acaee83477ff555a43df5e4.

The change caused the following FTBFS on i386:

    gcc    -Wl,-z,relro -Wl,-z,now -pthread -Wl,-soname
    -Wl,libxentoolcore.so.1 -shared -Wl,--version-script=libxentoolcore.map
    -o libxentoolcore.so.1.0 handlereg.opic
    /usr/bin/ld: i386:x86-64 architecture of input file `handlereg.opic' is
    incompatible with i386 output
    /usr/bin/ld: handlereg.opic: file class ELFCLASS64 incompatible with
    ELFCLASS32
    /usr/bin/ld: final link failed: file in wrong format
    collect2: error: ld returned 1 exit status

"For $(make_args_xen), ${XEN_TARGET_ARCH} gets $(xen_arch_$(flavour)),
but for $(make_args_tools), ${XEN_TARGET_ARCH} gets
$(xen_arch_$(DEB_HOST_ARCH))."

Update changelog for new upstream 4.14.0+80-gd101b417b7

[git-debrebase changelog: new upstream 4.14.0+80-gd101b417b7]

Update to upstream 4.14.0+80-gd101b417b7

[git-debrebase anchor: new upstream 4.14.0+80-gd101b417b7, merge]

WIP debian/changelog: start 4.14.0-1~exp2

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>

d/xen-utils-common.xen.init: disable oom killer for xenstored

In case of oom killer terminating some process, we'd rather not see
xenstored go. Xenstored has an in-memory database, and when starting the
process again, it would be empty, which is very inconvenient. Xenstored
should already score quite low and have a fairly low memory footprint,
but according to the user report, it happened.

Closes: #961511
Suggested-by: Samuel Thibault <sthibault@debian.org>
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
Acked-by: Ian Jackson <ijackson@chiark.greenend.org.uk>

x86/msr: Disallow guest access to the RAPL MSRs

Researchers have demonstrated using the RAPL interface to perform a
differential power analysis attack to recover AES keys used by other cores in
the system.

Furthermore, even privileged guests cannot use this interface correctly, due
to MSR scope and vcpu scheduling issues. The interface would want to be
paravirtualised to be used sensibly.

Disallow access to the RAPL MSRs completely, as well as other MSRs which
potentially access fine grain power information.

This is part of XSA-351.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}

Currently a PV hardware domain can also be given control over the CPU
frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.
However since commit 322ec7c89f6 the default behavior has been changed
to reject accesses to not explicitly handled MSRs, preventing PV
guests that manage CPU frequency from reading
MSR_IA32_PERF_{STATUS/CTL}.

Additionally some HVM guests (Windows at least) will attempt to read
MSR_IA32_PERF_CTL and will panic if given back a #GP fault:

vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented
d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0

Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR
handling shared between HVM and PV guests, and add an explicit case
for reads to MSR_IA32_PERF_{STATUS/CTL}.

Restore previous behavior and allow PV guests with the required
permissions to read the contents of the mentioned MSRs. Non privileged
guests will get 0 when trying to read those registers, as writes to
MSR_IA32_PERF_CTL by such guest will already be silently dropped.

Fixes: 322ec7c89f6 ('x86/pv: disallow access to unknown MSRs')
Fixes: 84e848fd7a1 ('x86/hvm: disallow access to unknown MSRs')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 3059178798a23ba870ff86ff54d442a07e6651fc)

xen/arm: Always trap AMU system registers

The Activity Monitors Unit (AMU) has been introduced by ARMv8.4. It is
considered to be unsafe to be expose to guests as they might expose
information about code executed by other guests or the host.

Arm provided a way to trap all the AMU system registers by setting
CPTR_EL2.TAM to 1.

Unfortunately, on older revision of the specification, the bit 30 (now
CPTR_EL1.TAM) was RES0. Because of that, Xen is setting it to 0 and
therefore the system registers would be exposed to the guest when it is
run on processors with AMU.

As the bit is mark as UNKNOWN at boot in Armv8.4, the only safe solution
for us is to always set CPTR_EL1.TAM to 1.

Guest trying to access the AMU system registers will now receive an
undefined instruction. Unfortunately, this means that even well-behaved
guest may fail to boot because we don't sanitize the ID registers.

This is a known issues with other Armv8.0+ features (e.g. SVE, Pointer
Auth). This will taken care separately.

This is part of XSA-351 (or XSA-93 re-born).

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
(cherry picked from commit 628e1becb6fb121475a6ce68e3f1cb4499851255)

tools/libs/stat: use memcpy instead of strncpy in getBridge

Use memcpy in getBridge to prevent gcc warnings about truncated
strings. We know that we might truncate it, so the gcc warning
here is wrong.
Revert previous change changing buffer sizes as bigger buffers
are not needed.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Wei Liu <wl@xen.org>
(cherry picked from commit 40fe714ca4245a9716264fcb3ee8df42c3650cf6)

tool/libs/light: Fix libxenlight gcc warning

Fix gcc10 compilation warning about uninitialized variable by setting
it to 0.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Wei Liu <wl@xen.org>
(cherry picked from commit 0241809bf838875615797f52af34222e5ab8e98f)

tools/libxc: report malloc errors in writev_exact

The caller of writev_exact should be notified about malloc errors
when dealing with partial writes.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
(cherry picked from commit 0d8d289af7a679c028462c4ed5d98586f9ef9648)

tools/libs/stat: fix broken build

Making getBridge() static triggered a build error with some gcc versions:

error: 'strncpy' output may be truncated copying 15 bytes from a string of
length 255 [-Werror=stringop-truncation]

Fix that by using a buffer with 256 bytes instead.

Fixes: 6d0ec053907794 ("tools: split libxenstat into new tools/libs/stat directory")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
(cherry picked from commit c8099e48c3dbb8c7399551a265756ecf354eece2)

tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating

The do_ls() function has somewhat inconsistent handling of errors.

If reading the node's contents with xs_read() fails, then do_ls() will
just quietly not display the contents.

If reading the node's permissions with xs_get_permissions() fails, then
do_ls() will print a warning, continue, and ultimately won't exit with
an error code (unless another error happens).

If recursing into the node with xs_directory() fails, then do_ls() will
abort immediately, not printing any further nodes.

For persistent failure modes — such as ENOENT because a node has been
removed, or EACCES because it has had its permisions changed since the
xs_directory() on the parent directory returned its name — it's
obviously quite likely that if either of the first two errors occur for
a given node, then so will the third and thus xenstore-ls will abort.

The ENOENT one is actually a fairly common case, and has caused tools to
fail to clean up a network device because it *apparently* already
doesn't exist in xenstore.

There is a school of thought that says, "Well, xenstore-ls returned an
error. So the tools should not trust its output."

The natural corollary of this would surely be that the tools must re-run
xenstore-ls as many times as is necessary until its manages to exit
without hitting the race condition. I am not keen on that conclusion.

For the specific case of ENOENT it seems reasonable to declare that,
but for the timing, we might as well just not have seen that node at
all when calling xs_directory() for the parent. By ignoring the error,
we give acceptable output.

The issue can be reproduced as follows:

(dom0) # for a in `seq 1 1000` ; do
              xenstore-write /local/domain/2/foo/$a $a ;
         done

Now simultaneously:

(dom0) # for a in `seq 1 999` ; do
              xenstore-rm /local/domain/2/foo/$a ;
         done
(dom2) # while true ; do
              ./xenstore-ls -p /local/domain/2/foo | grep -c 1000 ;
         done

We should expect to see node 1000 in the output, every time.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit beb105af19cc3e58e2ed49224cfe190a736e5fa0)

tools/xenpmd: Fix gcc10 snprintf warning

Add a check for snprintf return code and ignore the entry if we get an
error. This should in fact never happen and is more a trick to make gcc
happy and prevent compilation errors.

This is solving the following gcc warning when compiling for arm32 host
platforms with optimization activated:
xenpmd.c:92:37: error: '%s' directive output may be truncated writing
between 4 and 2147483645 bytes into a region of size 271
[-Werror=format-truncation=]

This is also solving the following Debian bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970802

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Wei Liu <wl@xen.org>
(cherry picked from commit 0dfddb2116e3757f77a691a3fe335173088d69dc)

libxl: fix -Werror=stringop-truncation in libxl__prepare_sockaddr_un

In file included from /usr/include/string.h:495,
                 from libxl_internal.h:38,
                 from libxl_utils.c:20:
In function 'strncpy',
    inlined from 'libxl__prepare_sockaddr_un' at libxl_utils.c:1262:5:
/usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 108 equals destination size [-Werror=stringop-truncation]
  106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit fff1b7f50e75ad9535c86f3fcf425b4945c50a1c)