dgit.raspbian.org Git

xl: improve exit codes of save/restore and migration functions

by making them more consistent with other examples in xl.

Macros CHK_ERRNOVAL, CHK_SYSCALL, MUST are also updated.

Signed-off-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

xl: improve return and exit codes of memory related functions

by making them more consistent with other examples in xl.

While there, make freemem() of boolean return type, which
looks more natural, and add comment explaining why
parse_mem_size_kb() needs to diverge from the pattern.

Signed-off-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: replace the usage of uuid_t with a char array

The internals of the uuid_t struct don't match a big endian octet stream on
*BSD systems, which means that it cannot be directly casted to a
uint8_t[16].

In order to solve that change the type to be an unsigned char[16], which
doesn't imply any other change on Linux. On *BSDs change the helpers so that
the uuid is always stored as a big endian byte stream.

NB: tested on FreeBSD and Linux only.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Discussed-with: Ian Jackson <Ian.Jackson@eu.citrix.com>
Discussed-with: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools: handle xl migrate --debug in legacy stream

Doing a 'xl migrate --debug domU host' on xen-4.5 adds a
XC_SAVE_ID_ENABLE_VERIFY_MODE marker, which is not handled.
Since using --debug is valid usage, handle it by logging the fact
instead of aborting the migration.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: bind_usbintf: do not reuse 'path'

To avoid confusion, use "intf_path" to indicate driver/interface path,
and "bind_path" indicate driver/bind path.

CID: 1358111

Signed-off-by: Chunyan Liu <cyliu@suse.com>
CC: Simon Cao <caobosimon@gmail.com>
CC: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Set rc on failure of usbdev_busaddr_to_busid

We must set rc before using `goto out'.

Bug introduced in bf7628f0 "libxl: add pvusb API".

CID: 1358113
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: coverity@xenproject.org
CC: Simon Cao <caobosimon@gmail.com>
CC: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Chunyan Liu <cyliu@suse.com>

libxl: fix rc handling in libxl_device_usbdev_list

In testing with libvirt pvusb functionality, found a rc check
error in libxl_device_usbdev_list. Correct it. This function
is not used by xl.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
CC: Simon Cao <caobosimon@gmail.com>
CC: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86: remove the use of vm86_mode()

Xen, being 64bit only, cannot run PV guests in vm86 mode.  HVM guests however
can be running in vm86 mode, and common codepaths need to be able to cope.

The definition of vm86_mode() in x86_64/regs.h is incorrect, as the predicate
is used by non-PV codepaths.

One buggy use is in hvm/emulate.c.  An HVM guest can be in vm86 mode, and
vm86_mode() sliently omits the check.  Luckily, due to the VEX prefix decoding
logic in x86_emulate(), there is no path to the erronious use with EFLAGS_VM
set.

Another potentially problematic use is in show_guest_stack().  In principle,
show_guest_stack() is common code called for both PV and HVM vcpus.  HVM vcpus
exit early (with no reasonable way of making the code generic), making this
part a PV-only codepath.

Open-code its use in emulate.c, matching the surrounding code.  This causes
all other uses to be in PV-only codepaths, making the code to be logically
dead.  Drop it completely, to avoid future misuse.

Part of resulting cleanup removes vm86attr from read_descriptor(), although
retaining one relevant piece of information; i.e. whether we are reading a
selector for an instruction fetch, or a data fetch.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/xsaves: ebx may return wrong value using CPUID eax=0xd,ecx =1

Refer to SDM Volume 1 Extended Region of an XSAVE Area. The value returned
by ecx[1] with cpuid function 0xd and sub-function i (i>1) indicates
the alignment of the state component i when the compacted format of the
extended region of an xsave area is used.

So when hvm guest using CPUID eax=0xd, ecx=1 to get the size of area
used for compacted format, we need to take alignment into consideration.

tools side is fixed by
"tools/libxc: Calculate xstate cpuid leaf from guest information"
by Andrew Cooper

Signed-off-by: Shuai Ruan <shuai.ruan@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/xsaves: fix two miscellaneous issues

1. get_xsave_addr() will only be called when
xsave_area_compressed(xsave) is true. So drop the
conditional expression.

2. expand_xsave_states() will memset the area when
get NULL from get_xsave_addr().

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Shuai Ruan <shuai.ruan@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

build: fix build with Clang

c/s 607044bf9 "build: avoid putting local absolute symbols in symbol tables"
breaks the build with Clang, as the command line argument isn't understood.

Clang does not appear to have any equivielent option, and already has
outstanding issues with duplicate symbols. Excluding this option makes the
problem no worse.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

sched: move pCPU initialization in a helper

That will turn out useful in following patches, where such
code will need to be called more than just once. Create an
helper now, and move the code there, to avoid mixing code
motion and functional changes later.

In Credit2, some style cleanup is also done.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

sched: implement .init_pdata in Credit, Credit2 and RTDS

In fact, if a scheduler needs per-pCPU information,
that needs to be initialized appropriately. So, we take
the code that is performing initializations from (right
now) .alloc_pdata, and use it for .init_pdata, leaving
only actualy allocations in the former, if any (which
is the case in RTDS and Credit1).

On the other hand, in Credit2, since we don't really
need any per-pCPU data allocation, everything that was
being done in .alloc_pdata, is now done in .init_pdata.
And the fact that now .alloc_pdata can be left undefined,
allows us to just get rid of it.

Still for Credit2, the fact that .init_pdata is called
during CPU_STARTING (rather than CPU_UP_PREPARE) kills
the need for the scheduler to setup a similar callback
itself, simplifying the code.

And thanks to such simplification, it is now also ok to
move some of the logic meant at double checking that a
cpu was (or was not) initialized, into ASSERTS (rather
than an if() and a BUG_ON).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

sched: make implementing .alloc_pdata optional

The .alloc_pdata scheduler hook must, before this change,
be implemented by all schedulers --even those ones that
don't need to allocate anything.

Make it possible to just use the SCHED_OP(), like for
the other hooks, by using ERR_PTR() and IS_ERR() for
error reporting. This:
- makes NULL a variant of success;
- allows for errors other than ENOMEM to be properly
communicated (if ever necessary).

This, in turn, means that schedulers not needing to
allocate any per-pCPU data, can avoid implementing the
hook. In fact, the artificial implementation of
.alloc_pdata in the ARINC653 is removed (and, while there,
nuke .free_pdata too, as it is equally useless).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>

libxl: pvusb: Correctly check the controller type

Missing a check of controller type.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
CC: Simon Cao <caobosimon@gmail.com>
CC: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

libxl: libxl_pvusb.c: Remove redundant lstat

There is no harm in calling realpath, and we can handle ENOENT then.

CID: 1358112

Signed-off-by: Chunyan Liu <cyliu@suse.com>
CC: Simon Cao <caobosimon@gmail.com>
CC: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

libxl: libxl_write_exactly: correct argument to sizeof

sizeof is wrongly used in libxl_write_exactly function, using
strlen instead.

CID: 1358110
CID: 1358109

Signed-off-by: Chunyan Liu <cyliu@suse.com>
CC: Simon Cao <caobosimon@gmail.com>
CC: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

libxc: fix uninitialized variable when changing rtds scheduling parameters

Commit 046c2b503a89d21b41e4d555a9f75d02af00dbc6 introduces a build
failure: in some cases (e.g., num_vcpus <=0),
xc_sched_rtds_vcpu_get/set returns an uninitialized variable.

Fix it.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Chong Li <chong.li@wustl.edu>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

xl: enable per-VCPU parameter for RTDS

Change main_sched_rtds and related output functions to support
per-VCPU settings.

Signed-off-by: Chong Li <chong.li@wustl.edu>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: enable per-VCPU parameter for RTDS

Add libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
functions to support per-VCPU settings.

Signed-off-by: Chong Li <chong.li@wustl.edu>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxc: enable per-VCPU parameter for RTDS

Add xc_sched_rtds_vcpu_get/set functions to interact with
Xen to get/set a domain's per-VCPU parameters.

Signed-off-by: Chong Li <chong.li@wustl.edu>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

hotplug/FreeBSD: document disk hotplug interface

Add the FreeBSD disk hotplug interface details to the block-scripts.txt
document.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

libxl: fix error message in local_device_attach_cb

The fields that are printed might not be set in the case of a failure, which
generates a segmentation fault.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: add a FreeBSD implementation of libxl__devid_to_localdev

This code is extracted from the FreeBSD blkfront implementation.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: properly use vdev vs local device

The current code in libxl assumed that vdev is equal to local device, but
this is only true for Linux systems. In other OSes the local device can use
a nomenclature completely different from the virtual device one.

Move the current libxl__devid_to_localdev Linux implementation out of the
OS-specific file and rename it to libxl__devid_to_vdev, and then make sure
local_device_attach_cb return the local device in the diskpath field.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: add support for disk hotplug scripts on FreeBSD

Allow FreeBSD to execute hotplug scripts when attaching disk devices.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

libxl: refactor the FreeBSD hotplug script code

This factors out the nic hotplug specific code from the common code path in
order to make it easier to add support for disk hotplug scripts. It
shouldn't include any functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

hotplug/FreeBSD: add block hotplug script

This is the default hotplug script for block devices. Its only job is to
copy the "params" blkback xenstore node to "physical-device".

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

blkif: document how FreeBSD uses the physical-device backend node

FreeBSD blkback uses the physical-device-path xenstore node in order to
fetch the path to the underlying backing storage (either a block device or
raw image). This node is set by the hotplug scripts. Also clarify the usage
of the physical-device node.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

blktap2: Invalid logic detecting unaligned buffers in vhd_write_block

It seems the logic is meant to detect sector unaligned buffers for block
writes. The NOTing of the logic instead masks off any unaligned bits and
also would cause the function to always fail. It seems the function is not
used in any of the tools so that is probably why the problem is not seen.
In the vhd_read_block function it is correct.

Signed-off-by: Ross Philipson <ross.philipson@ainfosec.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: colo: make it depend on availability of libnl

Netlink is required when initialising COLO, so make sure only to compile
COLO only when netlink is available. Change the inclusion of
linux/netlink.h to netlink/netlink.h so that it doesn't use Linux kernel
header directly.

Provide necessary stub functions in case COLO is disabled. This should
fix build on FreeBSD because there is no netlink there. It would also
make libxl build properly when netlink is not present on a Linux
system.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

build: rename CONFIG_REMUS_NETBUF to CONFIG_LIBNL

COLO and Remus net buffer support both depend on the availability of
libnl. Use a generic name.

No functional changes.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: colo: move netlink related stuff to libxl_colo_proxy.c

They are only used there, no need to expose them to other parts of
libxl.

This is necessary to make libxl build on FreeBSD again because FreeBSD
doesn't have netlink.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: colo: rearrange things in header files

We need to separate COLO code from common code as clean as possible.
With this patch, all COLO structures are now in libxl_colo.h.

It does the following:
1. Move two typedefs for libxl__domain_create_state{,cb} back to
libxl_internal.h.
2. Move libxl__colo_save_state to libxl_colo.h.
3. Include libxl_internal.h in libxl_colo.h.
4. Move a bunch of colo typedefs to the top of libxl_internal.h.
5. Move the inclusion of libxl_colo.h to the right place in
libxl_internal.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: check for dynamic device model start required

Add a service routine checking whether a device model must be started
after adding a device to a domain.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add service function to check whether device model is running

Add an internal service function to check for a running device model.
This can be used later when adding devices to a domain requiring a
device model for either printing an error message or starting the
device model in case it is not already running.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: add new pvusb backend "qusb" provided by qemu

Add a new pvusb backend type "qusb" which is provided by qemu. It can
be selected either by specifying the type directly in the configuration
or it is selected automatically by libxl in case there is no "usbback"
driver loaded.

The "qusb" backend is a full replacement for the kernel based backend
"vusb". The interface to the frontend is the very same including all
Xenstore paths, while the backend paths in Xenstore differ ("qusb"
instead ov "vusb").

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: add query function for backend support by device model

Add a function to query whether the device model is supporting a
specific backend type. The device model is writing the supported
backend types to Xenstore on startup. The new query function checks
for the appropriate entry to be present.

As not all versions of qemu are capable to indicate support of
specific backends the query function is to be called with an indicator
whether the default return value should be "supported" (in case qemu
doesn't know set any support information) or "not supported".

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: make libxl__need_xenpv_qemu() operate on domain config

libxl__need_xenpv_qemu() is called with configuration data for console,
vfbs, disks and channels today in order to evaluate the need for
starting a device model for a pv domain.

The console data is local to the caller and setup in a way to never
require a device model. All other data is taken from the domain config
structure.

In order to support other device backends via qemu change the interface
of libxl__need_xenpv_qemu() to take the domain config structure as
input instead of the single device arrays.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: handle error from libxl__need_xenpv_qemu() correctly

In case libxl__need_xenpv_qemu() returns an error let domain creation
fail.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl/CODING_STYLE: Clarify the singular statement in a conditional statement explanation.

It takes a bit of thinking to parse the original statement. It looks
like it is missing an ',' - right after the 'braced' to make it
obvious that:

if (X)
return XYZ;

is OK, while multiple conditionals require braces.

Changing the 'apart from' with 'except' makes it more obvious
(I hope).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: colo: remove dead code in colo_save_setup_script_cb

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: colo: fix indentation of abort()

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: colo: add missing break in qemu_disk_scsi_drive_string

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: colo: simplify colo_proxy_async_wait_for_checkpoint

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxc: colo: don't leak pfns and iov in send_checkpoint_dirty_pfn_list

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: fix POLLHUP handling

The current code in bootloader_copyfail will error out on expected POLLHUPs
because of a missing "else" in the if clause.

The behaviour that triggers this bug has only been seen on FreeBSD so far.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Suggested-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl: Fix uninitialized pointer when passing an empty cdrom

Commit 3fec17d4bb56567d139d7806392f4d8702d3f6a7 introduced a bug where
an empty cdrom would cause target_path to be uninitialized.  Initialize
target_path to NULL instead.

The other option here would have been to set target_path to NULL only
on the LIBXL_DISK_FORMAT_EMPTY path.  That would potentially enable
the compiler to catch future uninitialized paths, rather than having
those paths (potentially) dereference a NULL pointer.  But given that
a bunch of our compilers failed to catch *this* uninitialized path,
setting it to NULL at declaration seems the safer option for now.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

sched: fix deadlock when changing scheduling parameters

Commit f7b87b0745b4 ("enable per-VCPU parameter for RTDS") introduced
a bug: it made it possible, in Credit and Credit2, when doing domain
or vcpu parameters' manipulation, to leave the hypervisor with a
spinlock held and interrupts disabled.

Fix it.

Signed-off-by: Chong Li <chong.li@wustl.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>

x86/pvh: initialize ioreq server for PVH guests

ioreq server list needs to be initialized for PVH guests. This list is
walked in handle_hvm_io_completion() by all guests in HVM containers and
leaving it uninitialized may cause PVH guests to crash there.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/HVM: remove dead code

With commit 96ae556569 ("x86/HVM: fix forwarding of internally cached
requests") rc doesn't change anymore in the respective preceding
switch() statements.

Coverity ID: 1358080
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/xsaves: fix overwriting between non-lazy/lazy xsaves

The offset at which components xsaved by xsave[sc] are not fixed.
So when when a save with v->fpu_dirtied set is followed by one
with v->fpu_dirtied clear, non-lazy xsave[sc] may overwriting data
written by the lazy one.

The solution is when using_xsave_compact is enabled and taking xcr0_accum into
consideration, if guest has ever used XSTATE_LAZY & ~XSTATE_FP_SSE
(XSTATE_FP_SSE will be excluded beacause xsave will write XSTATE_FP_SSE
part in legacy region of xsave area which is fixed, saving XSTATE_FS_SSE
will not cause overwriting problem), vcpu_xsave_mask will return XSTATE_ALL.
Otherwise vcpu_xsave_mask will return XSTATE_NONLAZY.

This may cause overhead save on lazy states which will cause performance
impact. After doing some performance tests on xsavec and xsaveopt
(suggested by jan), the results show xsaveopt performs better than xsavec.
So hypervisor will not use xsavec anymore.

xsaves will not be used until supervised state is introduced in hypervisor.
And XSTATE_XSAVES_ONLY (indicates supervised state is understood in xen)
is introduced, the use of xsaves depend on whether XSTATE_XSAVES_ONLY is set
in xcr0_accum.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

public/xen.h: add flags field to vcpu_time_info

This field has two possible flags (as of latest pvclock ABI
shared with KVM).

flags: bits in this field indicate extended capabilities
coordinated between the guest and the hypervisor.  Specifically
on KVM, availability of specific flags has to be checked in
0x40000001 cpuid leaf. On Xen, we don't have that but we can
still check some of the flags after registering the time info
page since a force_update_vcpu_system_time is performed.

Current flags are:

flag bit   | cpuid bit    | meaning
-------------------------------------------------------------
            |              | time measures taken across
     0      |      24      | multiple cpus are guaranteed to
            |              | be monotonic
-------------------------------------------------------------
            |              | guest vcpu has been paused by
     1      |     N/A      | the host
            |              |
-------------------------------------------------------------

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Add XEN_ prefixes to new #define-s. Make structure layout change
dependent upon __XEN_INTERFACE_VERSION__ (intentionally comparing to
4.6, as we may want to backport this at least there).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

docs: Document block-script protocol

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>

libxl: Allow local access for block devices with hotplug scripts

pygrub and qemuu need to be able to access a VM's disks locally in
order to be able to pull out the kernel and provide emulated disk
access, respectively. This can be done either by accessing the local
disk directly, or by plugging the target disk into dom0 to allow
access.

Unfortunately, while the plugging machinery works for pygrub, it does
not yet work for qemuu; this means that at the moment, disks with
hotplug scripts or disks with non-dom0 backends cannot be provided as
emulated devices to HVM domains.

Fortunately, disks using hotplug scripts created in dom0 do create a
block device as part of set-up, which can be accessed locally; and if
they use block-common.sh:write_dev, this path will be written to
physical-device-path.

Modify libxl__device_disk_setdefault() to be able to fish this path
out of xenstore and pass it to the caller.

Unfortunately, at the time pygrub runs, the devices have not yet been
set up. Rather than try to stash the domid somewhere to pass, we just
pass INVALID_DOMID.

This allows qemuu to emulate block devices created with custom hotplug
scripts.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>

libxl: Share logic for finding path between qemuu and pygrub

qemu can also access disks which will be provided with a qdisk backend
directly; add a flag to libxl__device_disk_find_local_path to indicate
whether to check for qdisk direct access.

Call libxl__device_disk_find_local_path() for most paths. If we can't
find a local path, print an error and skip the disk, rather than using
a bogus path.

Now if there is no local access to the disk (i.e., because the disk
has a non-local backend, or relies on a custom hotplug script), libxl
will now print a warning and not provide the emulated disk, rather
than providing bogus parameters to qemu which cause it to error out.
(Such disks will still be available via the PV backend.)

I left the libxl__blktap_devpath in the qemuu-specific code rather
than sharing it with the pyrgub code because:

1) When the pygrub path runs the guest disks have not yet been set up

2) libxl__blktap_devpath() will give you the existing devpath if it
already exists, but will set one up for you if you don't. So on the
pygrub path, this would end up setting up a new tap device.

3) There is no tap-specific teardown code on the pygrub path, and I
don't want to add any (particularly since I'm hoping to remove tapdisk
altogether soon).

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>

libxl: Rearrange qemu upstream disk argument code

Reorganize the qemuu disk argument code to make a clean separation
between finding a file to use, and constructing the parameters:

* Rename pdev_path to target_path

* Only use qemu_disk_format_string() in circumstances where qemu may
be interpreting the disk (i.e., backend==QDISK). In all other cases,
it should use RAW.

* Share as much as possible between the is_cdrom path and the normal
path.

This is mainly prep for sharing the local path finder with the
bootloader; but it does allow cdroms to use any backend that a normal
disk can use. Previously this was limited to RAW files or things that
qemu could handle directly; as of this changeset, it now includes tap
disks; and in future changesets it will include backends with custom
block scripts.

NB that this retains an existing bug, that disks with custom block
scripts or non-dom0 backends will have the bogus pdev_path passed in
to qemu, most likely resulting in qemu exiting with an error. This
will be fixed in follow-up patches.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>

libxl: Move check for local access to a funciton

Move pygrub checks for local access ability into a separate function.

Also reorganize libxl__device_disk_local_initiate_attach so that we
don't initialize dls->disk unless we actually end up doing a local
attach.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>

tools/hotplug: Write physical-device-path in addition to physical-device

Change block-common.sh on Linux to write physical-device-path with the
path of the device node, in addition to physical-device with its
major:minor numbers.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>

libxl: Remove redundant setting of phyical-device

Regardless of whether we're running a custom hotplug script or using
normal phy: or file:, the "block" script will be run, which will set
all the necessary xenstore nodes.

In fact, writing this value here prevents the block script from
accomplishing its only purpose: to detect duplicate physical block
devices used in different virtual devices. The first thing the block
script does is check to see if this node is written; and if it is, it
silently exits.

Remove this, and let the block script perform its duplicate checking
function.

NOTE: It's likely that the duplicate checking for physical devices has
never been run under libxl (at least since this bug was introduced);
this may shake out some issues.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>

tools/hotplug: Add a "dummy" hotplug script for testing

Testing the hotplug external script path at the moment involves
actually setting up one of the alternate datapaths (blktap, iscsi,
&c). Simplify testing by making a script which does a simple loopback,
but still has a target that can't be used directly.

To use:

script=block-dummy,vdev=xvda,target=dummy:<file>

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>

unmodified_drivers: enable use of register_oldmem_pfn_is_ram() API

During the investigation of very slow dump times of guest images in
Amazon EC2 instance, it was discovered that the
register_oldmem_pfn_is_ram() API implemented by the upstream kernel
commit 997c136f518c5debd63847e78e2a8694f56dcf90:

        fs/proc/vmcore.c: add hook to read_from_oldmem() to check
                           for non-ram pages

was not being called.  This was due to the PV driver with the call
to register_oldmem_pfn_is_ram() API was not including the
kernel header file that is used to communicate support of the API in the
kernel.  Fix the issue by including the required header file.

Signed-off-by: Mike Meyer <mike.meyer@teradata.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Olaf Hering <olaf@aepfle.de>

MAINTAINERS: add ioreq.c and ioreq.h to x86 I/O emulation file list

Commit 108788e8 "x86/hvm: separate ioreq server code from generic hvm
code" split the code supporting ioreq servers from the rest of the HVM
code. Now that this is separate, the new files can be added to the x86
I/O emulation file list in MAINTAINERS.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

libxl: ARM build: fix type of libxl__srm_callout_callback_restore_results

COLO introduced a few callbacks. The original implementation used
unsigned long for a type which in fact should be xen_pfn_t. That broke
libxl compilation on ARM, because xen_pfn_t is not a synonym for
unsigned long on ARM platform.

Fixing this requires modifying the perl script: specifically now we
need to include xenctrl.h before _libxl_save_msgs_*.h, rather than
afterwards, so that we can use xen_pfn_t there.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxc: remove second unistd.h inclusion

There is already one a few lines above.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>

libxc: xc_domain_resume_hvm is used by x86 only

The call site is enclosed by x86 define guards.

Without this patch:

[ 334s] xc_resume.c:112:12: error: 'xc_domain_resume_hvm' defined but not used [-Werror=unused-function]
[ 334s] static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>

libxc: use PRIx64 to print out pfn

Pfn is always 64 bit long. Use PRIx64 to avoid truncation.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>

docs: fix xl manpage compilation broken by COLO

Rearrange the section to conform with pod syntax. Fix some typos along
the way.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>

doc: document pvusb Xenstore paths

The patches adding Xen tools support for paravirtualized USB devices
(pvUSB) omitted documenting the introduced Xenstore paths.

Add the paths to docs/misc/xenstore-paths.markdown

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

doc: correct hyperlinks in docs/misc/xenstore-paths.markdown

The hyperlinks for the different I/O protocols are wrong.

Correct them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

x86/hvm: separate ioreq server code from generic hvm code

The code in hvm/hvm.c related to handling I/O emulation using the ioreq
server framework is large and mostly self-contained.

This patch separates the ioreq server code into a new hvm/ioreq.c source
module and accompanying asm-x86/hvm/ioreq.h header file. There is no
intended functional change, only code movement.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

enable per-VCPU parameter for RTDS

Add XEN_DOMCTL_SCHEDOP_getvcpuinfo and _putvcpuinfo hypercalls
to independently get and set the scheduling parameters of each
vCPU of a domain.

Also fix a bug in XEN_DOMCTL_SCHEDOP_getinfo, where PERIOD and
BUDGET are not divided by MICROSECS(1) before being retruned
to the caller.

Signed-off-by: Chong Li <chong.li@wustl.edu>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>

libxl: allow 'phy' backend to use empty files

This was introduced by 97ee1f (~5 years ago), but was probably never
surfaced because most people used regular files as CDROM images, so the PHY
backend was actually never selected. A year ago this was changed, and now
regular RAW files are also handled by the PHY backend, which has made this
bug suface.

Fix it by allowing empty disks to use the PHY backend, skipping the stat
tests.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Alex Braunegg <alex.braunegg@gmail.com>

libelf: rewrite symtab/strtab loading

Current implementation of elf_load_bsdsyms is broken when loading inside of
a HVM guest, because it assumes elf_memcpy_safe is able to write into guest
memory space, which it is not.

Take the oportunity to do some cleanup and properly document how
elf_{parse/load}_bsdsyms works. The new implementation uses elf_load_image
when dealing with data that needs to be copied to the guest memory space.
Also reduce the number of section headers copied to the minimum necessary.

This patch also removes the duplication of code found in the libxc ELF
loader, since the libelf symtab/strtab loading code will also handle this
case without having to duplicate it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: introduce LIBXL_VGA_INTERFACE_TYPE_UNKNOWN

And use it as the default value for the VGA kind. This allows libxl to set
it to the default value later on when the domain type is known. For HVM
guests the default value is LIBXL_VGA_INTERFACE_TYPE_CIRRUS while for
HVMlite the default value is LIBXL_VGA_INTERFACE_TYPE_NONE.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

xenalyze: handle RTDS scheduler events

so the trace will show properly decoded info,
rather than just a bunch of hex codes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: George Dunlap <george.dunlap@citrix.com>

xenalyze: handle DOM0 operations events

(i.e., domain creation and destruction) so the
trace will show properly decoded info, rather
than just a bunch of hex codes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>

tools/libxc: Fix build error when using xc_version_len

Tools fails to build with gcc 4.5, it does not provide ssize_t.

Fixes d275ec9 ("libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION
hypercall")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reported-by: Olaf Hering <olaf@aepfle.de>
Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxc: Document xc_domain_resume

Document the save and suspend mechanism.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>

libxc: support to resume uncooperative HVM guests

Before this patch:
1. Suspend
  a. PVHVM and PV: we use the same way to suspend the guest (send the
     suspend request to the guest). If the guest doesn't support evtchn, the
     xenstore variant will be used, suspending the guest via XenBus control
     node.
  b. Pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
     the guest.

2. Resume:
  a. Fast path (fast=1)
     Do not change the guest state. We call libxl__domain_resume(.., 1) which
     calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
     PV:       Modify the return code to 1, and than call the domctl:
               XEN_DOMCTL_resumedomain
     PVHVM:    same with PV
     Pure HVM: Do nothing in modify_returncode, and than call the domctl:
               XEN_DOMCTL_resumedomain
  b. Slow path (fast=0)
     Used when the guest's state have been changed. Will call
     libxl__domain_resume(..., 0) to resume the guest.
     PV:       Update start info, and reset all secondary CPU states. Than call
               the domctl: XEN_DOMCTL_resumedomain
     PVHVM:    Can not be resumed. You will get the following error message:
               "Cannot resume uncooperative HVM guests"
     Pure HVM: Same with PVHVM

After this patch:
1. Suspend
  Unchanged.

2. Resume
  a. Fast path
     Unchanged.
  b. Slow path
     PV:       Unchanged.
     PVHVM:    Call XEN_DOMCTL_resumedomain to resume the guest. Because we
               don't modify the return code, PV drivers will disconnect
               and reconnect.
               The guest ends up doing the XENMAPSPACE_shared_info
               XENMEM_add_to_physmap hypercall and resetting all of its CPU
               states to point to the shared_info (except the ones past 32).
               That is the Linux kernel does that - regardless whether the
               SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
     Pure HVM: Call XEN_DOMCTL_resumedomain to resume the guest.

Under COLO, we will update the guest's state (modify memory, CPU registers,
device state etc). In this case, we cannot use the fast path to resume it.
Keep the return code 0, and use the slow path to resume the guest.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[ wei: reformat commit message a bit ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>

cmdline switches and config vars to control colo-proxy

Add cmdline switches to 'xl migrate-receive' command to specify
a domain-specific hotplug script to setup COLO proxy.

Add a new config var 'colo.default.agentscript' to xl.conf, that
allows the user to override the default global script used to
setup COLO proxy.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

setup and control colo proxy on secondary side

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

setup and control colo proxy on primary side

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

COLO nic: implement COLO nic subkind

implement COLO nic subkind.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

COLO proxy: implement setup/teardown/preresume/postresume/checkpoint

setup/teardown/preresume/postresume/checkpoint of COLO proxy module.
we use netlink to communicate with proxy module.
About colo-proxy module:
http://www.spinics.net/lists/netdev/msg333520.html
https://github.com/wencongyang/colo-proxy
How to use:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

COLO: use qemu block replication

Use qemu block replication as our block replication solution.
Note that guest must be paused before starting COLO, otherwise,
the disk won't be consistent between primary and secondary.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Support colo mode for qemu disk

Usage: disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
For QEMU block replication details:
http://wiki.qemu.org/Features/BlockReplication

Note: we just introduce COLO framework, but don't implement COLO
operations in this patch.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Introduce COLO mode and refactor relevant function

No functional changes.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

COLO: introduce new API to prepare/start/do/get_error/stop replication

We will use qemu block replication, and qemu provides some qmp commands
to prepare replication, start replication, get replication error, and
stop replication. Introduce new API to execute these qmp commands.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

implement the cmdline for COLO

Add a new option -c to the command 'xl remus'. If you want
to use COLO HA instead of Remus HA, please use -c option.

Update man pages to reflect the addition of a new option to
'xl remus' command.

Also add a new option --colo to the internal command 'xl migrate-receive'.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxc/save: support COLO save

After suspend primary vm, get dirty bitmap on secondary vm,
and send pages both dirty on primary/secondary to secondary.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxc/restore: support COLO restore

a. call callbacks resume/checkpoint/suspend while secondary vm
status is consistent with primary
b. send dirty pfn list to primary when checkpoint under colo
c. send store gfn and console gfn to xl before resuming secondary vm

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

primary vm suspend/resume/checkpoint code

We will do the following things again and again:
1. Suspend primary vm
   a. Suspend primary vm
   b. do postsuspend
   c. Read CHECKPOINT_SVM_SUSPENDED sent by secondary
2. Checkpoint
   a. Write emulator xenstore data and emulator context
   b. Write checkpoint end record
3. Resume primary vm
   a. Read CHECKPOINT_SVM_READY from slave
   b. Do presume
   c. Resume primary vm
   d. Read CHECKPOINT_SVM_RESUMED from slave
4. Wait a new checkpoint
   a. Wait a new checkpoint(not implemented)
   b. Send CHECKPOINT_NEW to slave

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_internal: move stream read manipulations to right place

No functional changes and this cleanup will make the later
patch called "primary vm suspend/resume/checkpoint code" not
too complicated.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

secondary vm suspend/resume/checkpoint code

Secondary vm is running in colo mode. So we will do
the following things again and again:
1. Resume secondary vm
   a. Send CHECKPOINT_SVM_READY to master.
   b. If it is not the first resume, call libxl__checkpoint_devices_preresume().
   c. If it is the first resume(resume right after live migration),
      - call libxl__xc_domain_restore_done() to build the secondary vm.
      - enable secondary vm's logdirty.
      - call libxl__domain_resume() to resume secondary vm.
      - call libxl__checkpoint_devices_setup() to setup checkpoint devices.
   d. Send CHECKPOINT_SVM_RESUMED to master.
2. Wait a new checkpoint
   a. Call libxl__checkpoint_devices_commit().
   b. Read CHECKPOINT_NEW from master.
3. Suspend secondary vm
   a. Suspend secondary vm.
   b. Call libxl__checkpoint_devices_postsuspend().
   c. Send CHECKPOINT_SVM_SUSPENDED to master.
4. Checkpoint
   a. Read emulator xenstore data and emulator context
   b. REC_TYPE_CHECKPOINT_END

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/libxl: add back channel support to read stream

This is used by primay to read records sent by secondary.

Note: The function libxl__stream_read_checkpoint_state() will be used
in later patches called "secondary vm suspend/resume/checkpoint code" and
"primary vm suspend/resume/checkpoint code".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/libxl: add back channel support to write stream

Add back channel support to write stream. If the write stream is
a back channel stream, this means the write stream is used by
Secondary to send some records back.

Note: The function libxl__stream_write_checkpoint_state() will be used
in later patches called "secondary vm suspend/resume/checkpoint code" and
"primary vm suspend/resume/checkpoint code".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxc/migration: export read_record for common use

read_record() could be used by primary to read dirty bitmap
record sent by secondary under COLO.
When used by xc save side, we need to pass the backchannel fd
instead of ctx->fd to read_record(), so we added a fd param to
it.
No functional changes.

CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxc/migration: Specification update for DIRTY_PFN_LIST records

Used by secondary to send it's dirty bitmap to primary under COLO.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

docs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo streams

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>