From: Timothée Ravier Date: Fri, 2 Oct 2020 12:34:33 +0000 (+0200) Subject: docs: Move and update pages from the manual X-Git-Tag: archive/raspbian/2022.1-3+rpi1~1^2~4^2~7^2~8^2~2 X-Git-Url: https://dgit.raspbian.org/?a=commitdiff_plain;h=68ac9e9c50d0e190a5a82bef5fdea29bd46fdd0a;p=ostree.git docs: Move and update pages from the manual --- diff --git a/docs/adapting-existing.md b/docs/adapting-existing.md new file mode 100644 index 00000000..cc4b76d2 --- /dev/null +++ b/docs/adapting-existing.md @@ -0,0 +1,180 @@ +--- +nav_order: 6 +--- + +# Adapting existing mainstream distributions +{: .no_toc } + +1. TOC +{:toc} + +## System layout + +First, OSTree encourages systems to implement +[UsrMove](http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge/) +This is simply to avoid the need for more bind mounts. By default +OSTree's dracut hook creates a read-only bind mount over `/usr`; you +can of course generate individual bind-mounts for `/bin`, all the +`/lib` variants, etc. So it is not intended to be a hard requirement. + +Remember, because by default the system is booted into a `chroot` +equivalent, there has to be some way to refer to the actual physical +root filesystem. Therefore, your operating system tree should contain +an empty `/sysroot` directory; at boot time, OSTree will make this a +bind mount to the physical / root directory. There is precedent for +this name in the initramfs context. You should furthermore make a +toplevel symbolic link `/ostree` which points to `/sysroot/ostree`, so +that the OSTree tool at runtime can consistently find the system data +regardless of whether it's operating on a physical root or inside a +deployment. + +Because OSTree only preserves `/var` across upgrades (each +deployment's chroot directory will be garbage collected +eventually), you will need to choose how to handle other +toplevel writable directories specified by the [Filesystem Hierarchy Standard](http://www.pathname.com/fhs/). 
+Your operating system may of course choose +not to support some of these such as `/usr/local`, but following is the +recommended set: + + - `/home` → `/var/home` + - `/opt` → `/var/opt` + - `/srv` → `/var/srv` + - `/root` → `/var/roothome` + - `/usr/local` → `/var/usrlocal` + - `/mnt` → `/var/mnt` + - `/tmp` → `/sysroot/tmp` + +Furthermore, since `/var` is empty by default, your operating system +will need to dynamically create the *targets* of these at boot. A +good way to do this is using `systemd-tmpfiles`, if your OS uses +systemd. For example: + +``` +d /var/log/journal 0755 root root - +L /var/home - - - - ../sysroot/home +d /var/opt 0755 root root - +d /var/srv 0755 root root - +d /var/roothome 0700 root root - +d /var/usrlocal 0755 root root - +d /var/usrlocal/bin 0755 root root - +d /var/usrlocal/etc 0755 root root - +d /var/usrlocal/games 0755 root root - +d /var/usrlocal/include 0755 root root - +d /var/usrlocal/lib 0755 root root - +d /var/usrlocal/man 0755 root root - +d /var/usrlocal/sbin 0755 root root - +d /var/usrlocal/share 0755 root root - +d /var/usrlocal/src 0755 root root - +d /var/mnt 0755 root root - +d /run/media 0755 root root - +``` + +Particularly note here the double indirection of `/home`. By default, +each deployment will share the global toplevel `/home` directory on +the physical root filesystem. It is then up to higher levels of +management tools to keep `/etc/passwd` or equivalent synchronized +between operating systems. Each deployment can easily be reconfigured +to have its own home directory set simply by making `/var/home` a real +directory. 
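
The toplevel links recommended in this section can be sketched in a scratch directory. This is a minimal illustration, not the layout code of any particular build system: the scratch directory and the choice of relative link targets are assumptions made here so the tree stays relocatable.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory standing in for the OS tree being composed.
tree=$(mktemp -d)
mkdir -p "$tree/usr" "$tree/var" "$tree/sysroot"

# Let the ostree tool find system data from inside a deployment.
ln -s sysroot/ostree "$tree/ostree"

# Redirect the writable toplevel directories into /var (or /sysroot for /tmp).
ln -s var/home "$tree/home"
ln -s var/opt "$tree/opt"
ln -s var/srv "$tree/srv"
ln -s var/roothome "$tree/root"
ln -s ../var/usrlocal "$tree/usr/local"
ln -s var/mnt "$tree/mnt"
ln -s sysroot/tmp "$tree/tmp"

ls -l "$tree" "$tree/usr"
```

The link targets under `/var` do not exist in the tree itself; they are expected to be created at boot, for example by the `systemd-tmpfiles` snippet shown earlier.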
+ +## Booting and initramfs technology + +OSTree comes with optional dracut+systemd integration code which follows +this logic: + +- Parse the `ostree=` kernel command line argument in the initramfs +- Set up a read-only bind mount on `/usr` +- Bind mount the deployment's `/sysroot` to the physical `/` +- Use `mount(MS_MOVE)` to make the deployment root appear to be the root filesystem + +After these steps, systemd switches root. + +If you are not using dracut or systemd, using OSTree should still be +possible, but you will have to write the integration code. See the +existing sources in +[src/switchroot](https://github.com/ostreedev/ostree/tree/master/src/switchroot) +as a reference. + +Patches to support other initramfs technologies and init systems, if +sufficiently clean, will likely be accepted upstream. + +A further specific note regarding `sysvinit`: OSTree used to support +recording device files such as the `/dev/initctl` FIFO, but no longer +does. It's recommended to just patch your initramfs to create this at +boot. + +## /usr/lib/passwd + +Unlike traditional package systems, OSTree trees contain *numeric* uid +and gids. Furthermore, it does not have a `%post` type mechanism +where `useradd` could be invoked. In order to ship an OS that +contains both system users and users dynamically created on client +machines, you will need to choose a solution for `/etc/passwd`. The +core problem is that if you add a user to the system for a daemon, the +OSTree upgrade process for `/etc` will simply notice that because +`/etc/passwd` differs from the previous default, it will keep the +modified config file, and your new OS user will not be visible. The +solution chosen for the [Gnome Continuous](https://live.gnome.org/Projects/GnomeContinuous) operating +system is to create `/usr/lib/passwd`, and to include a NSS module +[nss-altfiles](https://github.com/aperezdc/nss-altfiles) which +instructs glibc to read from it. 
Then, the build system places all +system users there, freeing up `/etc/passwd` to be purely a database +of local users. See also a more recent effort from [Systemd Stateless](http://0pointer.de/blog/projects/stateless.html) + +## Adapting existing package managers + +The largest endeavor is likely to be redesigning your distribution's +package manager to be on top of OSTree, particularly if you want to +keep compatibility with the "old way" of installing into the physical +`/`. This section will use examples from both `dpkg` and `rpm` as the +author has familiarity with both; but the abstract concepts should +apply to most traditional package managers. + +There are many levels of possible integration; initially, we will +describe the most naive implementation which is the simplest but also +the least efficient. We will assume here that the admin is booted +into an OSTree-enabled system, and wants to add a set of packages. + +Many package managers store their state in `/var`; but since in the +OSTree model that directory is shared between independent versions, +the package database must first be found in the per-deployment `/usr` +directory. It becomes read-only; remember, all upgrades involve +constructing a new filesystem tree, so your package manager will also +need to create a copy of its database. Most likely, if you want to +continue supporting non-OSTree deployments, simply have your package +manager fall back to the legacy `/var` location if the one in `/usr` +is not found. + +To install a set of new packages (without removing any existing ones), +enumerate the set of packages in the currently booted deployment, and +perform dependency resolution to compute the complete set of new +packages. Download and unpack these new packages to a temporary +directory. 
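
The database lookup with a legacy fallback, described earlier in this section, might look roughly like the following. This is only a sketch: `find_pkgdb` and both `pkgdb` paths are hypothetical names chosen for illustration, not locations defined by OSTree, `dpkg`, or `rpm`.

```shell
#!/bin/sh
set -eu

# Print the package database directory for the tree rooted at $1.
# Prefer the per-deployment copy under /usr; otherwise fall back to
# the legacy location under /var. Both paths are hypothetical.
find_pkgdb() {
    root=$1
    if [ -d "$root/usr/share/pkgdb" ]; then
        printf '%s\n' "$root/usr/share/pkgdb"
    else
        printf '%s\n' "$root/var/lib/pkgdb"
    fi
}

# Two scratch roots: one OSTree-style, one traditional.
demo=$(mktemp -d)
mkdir -p "$demo/ostree-root/usr/share/pkgdb" "$demo/legacy-root/var/lib/pkgdb"

ostree_db=$(find_pkgdb "$demo/ostree-root")
legacy_db=$(find_pkgdb "$demo/legacy-root")
echo "$ostree_db"
echo "$legacy_db"
```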
+ +Now, because we are merely installing new packages and not +removing anything, we can make the major optimization of reusing +our existing filesystem tree, and merely +*layering* the composed filesystem tree of +these new packages on top. A command like this: + +``` +ostree commit -b $osname/$releasename/$description \ +  --tree=ref=$osname/$releasename/$description \ +  --tree=dir=/var/tmp/newpackages.13A8D0/ +``` + +will create a new commit in the `$osname/$releasename/$description` +branch. The OSTree SHA256 checksum of all the files in +`/var/tmp/newpackages.13A8D0/` will be computed, but we will not +re-checksum the existing tree. In this layering model, +earlier directories will take precedence, but files in later layers +will silently override earlier layers. + +Then to actually deploy this tree for the next boot: +`ostree admin deploy $osname/$releasename/$description` + +This is essentially what [rpm-ostree](https://github.com/projectatomic/rpm-ostree/) +does to support its [package layering model](https://rpm-ostree.readthedocs.io/en/latest/manual/administrator-handbook/#hybrid-imagepackaging-via-package-layering). + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/atomic-upgrades.md b/docs/atomic-upgrades.md new file mode 100644 index 00000000..3ddd8b40 --- /dev/null +++ b/docs/atomic-upgrades.md @@ -0,0 +1,129 @@ +--- +nav_order: 5 +--- + +# Atomic Upgrades +{: .no_toc } + +1. TOC +{:toc} + +## You can turn off the power anytime you want... + +OSTree is designed to implement fully atomic and safe upgrades; +more generally, atomic transitions between lists of bootable +deployments. If the system crashes or you pull the power, you +will have either the old system, or the new one. 
+ +## Simple upgrades via HTTP + +First, the most basic model OSTree supports is one where it replicates +pre-generated filesystem trees from a server over HTTP, tracking +exactly one ref, which is stored in the `.origin` file for the +deployment. The command `ostree admin upgrade` +implements this. + +To begin a simple upgrade, OSTree fetches the contents of the ref from +the remote server. Suppose we're tracking a ref named +`exampleos/buildmaster/x86_64-runtime`. OSTree fetches the URL +`http://example.com/repo/refs/heads/exampleos/buildmaster/x86_64-runtime`, +which contains a SHA256 checksum. This determines the tree to deploy, +and `/etc` will be merged from the currently booted tree. + +If we do not have this commit, then we perform a pull. +At present (without static deltas), this simply involves +fetching each individual object that we do not have, asynchronously. +Put another way, we only download changed files (zlib-compressed). +Each object has its checksum validated and is stored in `/ostree/repo/objects/`. + +Once the pull is complete, we have all the objects locally +that we need to perform a deployment. + +## Upgrades via external tools (e.g. package managers) + +As mentioned in the introduction, OSTree is also designed to allow a +model where filesystem trees are computed on the client. It is +completely agnostic as to how those trees are generated; they could be +computed with traditional packages, packages with post-deployment +scripts on top, or built by developers directly from revision control +locally, etc. + +At a practical level, most package managers today (`dpkg` and `rpm`) +operate "live" on the currently booted filesystem. The way they could +work with OSTree is instead to take the list of installed packages in +the currently booted tree, and compute a new filesystem from that. A +later chapter describes in more detail how this could work: +[Adapting Existing Systems](adapting-existing.md). 
+ +For the purposes of this section, let's assume that we have a +newly generated filesystem tree stored in the repo (which shares +storage with the existing booted tree). We can then move on to +checking it back out of the repo into a deployment. + +## Assembling a new deployment directory + +Given a commit to deploy, OSTree first allocates a directory for +it. This is of the form `/ostree/deploy/$stateroot/deploy/$checksum.$serial`. +The `$serial` is normally `0`, but if a +given commit is deployed more than once, it will be incremented. +This is supported because the previous deployment may have +configuration in `/etc` that we do not want to use or overwrite. + +Now that we have a deployment directory, a 3-way merge is +performed between the (by default) currently booted deployment's +`/etc`, its default +configuration, and the new deployment (based on its `/usr/etc`). + +## Atomically swapping boot configuration + +At this point, a new deployment directory has been created as a +hardlink farm; the running system is untouched, and the bootloader +configuration is untouched. We want to add this deployment to the +"deployment list". + +To support a more general case, OSTree supports atomic transitioning +between arbitrary sets of deployments, with the restriction that the +currently booted deployment must always be in the new set. In the +normal case, we have exactly one deployment, which is the booted one, +and we want to add the new deployment to the list. A more complex +command might allow creating 100 deployments as part of one atomic +transaction, so that one can set up an automated system to bisect +across them. + +## The bootversion + +OSTree allows swapping between boot configurations by implementing the +"swapped directory pattern" in `/boot`. This means it is a symbolic +link to one of two directories `/ostree/boot.[0|1]`. 
To swap the +contents atomically, if the current version is `0`, we create +`/ostree/boot.1`, populate it with the new contents, then atomically +swap the symbolic link. Finally, the old contents can be garbage +collected at any point. + +## The /ostree/boot directory + +However, we want to optimize for the case where the set of +kernel/initramfs/devicetree sets is the same between both the old and new +deployment lists. This happens when doing an upgrade that does not +include the kernel; think of a simple translation update. OSTree +optimizes for this case because on some systems `/boot` may be on a +separate medium such as flash storage not optimized for significant +amounts of write traffic. Related to this, modern OSTree has support +for having `/boot` be a read-only mount by default - it will +automatically remount read-write just for the portion of time +necessary to update the bootloader configuration. + +To implement this, OSTree also maintains the directory +`/ostree/boot.$bootversion`, which is a set +of symbolic links to the deployment directories. The +`$bootversion` here must match the version of +`/boot`. However, in order to allow atomic transitions of +*this* directory, this is also a swapped directory, +so just like `/boot`, it has a version of `0` or `1` appended. + +Each bootloader entry has a special `ostree=` argument which refers to +one of these symbolic links. This is parsed at runtime in the +initramfs. + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/buildsystem-and-repos.md b/docs/buildsystem-and-repos.md new file mode 100644 index 00000000..6d506b4e --- /dev/null +++ b/docs/buildsystem-and-repos.md @@ -0,0 +1,196 @@ +--- +nav_order: 8 +--- + +# Writing a buildsystem and managing repositories +{: .no_toc } + +1. TOC +{:toc} + +OSTree is not a package system. It does not directly support building +source code. 
Rather, it is a tool for transporting and managing +content, along with package-system independent aspects like bootloader +management for updates. + +We'll assume here that we're planning to generate commits on a build +server, then have client systems replicate it. Doing client-side +assembly is also possible of course, but this discussion will focus +primarily on server-side concerns. + +## Build vs buy + +Therefore, you need to either pick an existing tool for writing +content into an OSTree repository, or to write your own. An example +tool is [rpm-ostree](https://github.com/projectatomic/rpm-ostree) - it +takes as input RPMs, and commits them (currently oriented for a server +side, but aiming to do client side too). + +## Initializing + +For this initial discussion, we're assuming you have a single +`archive` repository: + +``` +mkdir repo +ostree --repo=repo init --mode=archive +``` + +You can export this via a static webserver, and configure clients to +pull from it. + +## Writing your own OSTree buildsystem + +There exist many, many systems that basically follow this pattern: + +``` +$pkg --installroot=/path/to/tmpdir install foo bar baz +$imagesystem commit --root=/path/to/tmpdir +``` + +For various values of `$pkg` such as `yum`, `apt-get`, etc., and +values of `$imagesystem` could be simple tarballs, Amazon Machine +Images, ISOs, etc. + +Now obviously in this document, we're going to talk about the +situation where `$imagesystem` is OSTree. The general idea with +OSTree is that wherever you might store a series of tarballs for +applications or OS images, OSTree is likely going to be better. For +example, it supports GPG signatures, binary deltas, writing bootloader +configuration, etc. + +OSTree does not include a package/component build system simply +because there already exist plenty of good ones - rather, it is +intended to provide an infrastructure layer. 
+ +The above mentioned `rpm-ostree compose tree` chooses RPM as the value +of `$pkg` - so binaries are built as RPMs, then committed as a whole +into an OSTree commit. + +But let's discuss building our own. If you're just experimenting, +it's quite easy to start with the command line. We'll assume for this +purpose that you have a build process that outputs a directory tree - +we'll call this tool `$pkginstallroot` (which could be `yum +--installroot` or `debootstrap`, etc.). + +Your initial prototype is going to look like: + +``` +$pkginstallroot /path/to/tmpdir +ostree --repo=repo commit -s 'build' -b exampleos/x86_64/standard --tree=dir=/path/to/tmpdir +``` + +Alternatively, if your build system can generate a tarball, you can +commit that tarball into OSTree. For example, +[OpenEmbedded](http://www.openembedded.org/) can output a tarball, and +one can commit it via: + +``` +ostree commit -s 'build' -b exampleos/x86_64/standard --tree=tar=myos.tar +``` + +## Constructing trees from unions + +The above is a very simplistic model, and you will quickly notice that +it's slow. This is because OSTree has to re-checksum and recompress +the content each time it's committed. (Most of the CPU time is spent +in compression which gets thrown away if the content turns out to be +already stored). + +A more advanced approach is to store components in OSTree itself, then +union them, and recommit them. At this point, we recommend taking a +look at the OSTree API, and choose a programming language supported by +[GObject Introspection](https://wiki.gnome.org/Projects/GObjectIntrospection) +to write your buildsystem scripts. Python may be a good choice, or +you could choose custom C code, etc. + +For the purposes of this tutorial we will use shell script, but it's +strongly recommended to choose a real programming language for your +build system. + +Let's say that your build system produces separate artifacts (whether +those are RPMs, zip files, or whatever). 
These artifacts should be +the result of `make install DESTDIR=` or similar. They are basically +equivalent to RPMs/debs. + +Further, in order to make things fast, we will need a separate +`bare-user` repository in order to perform checkouts quickly via +hardlinks. We'll then export content into the `archive` repository +for use by client systems. + +``` +mkdir build-repo +ostree --repo=build-repo init --mode=bare-user +``` + +You can begin committing those as individual branches: + +``` +ostree --repo=build-repo commit -b exampleos/x86_64/bash --tree=tar=bash-4.2-bin.tar.gz +ostree --repo=build-repo commit -b exampleos/x86_64/systemd --tree=tar=systemd-224-bin.tar.gz +``` + +Set things up so that whenever a package changes, you redo the +`commit` with the new package version - conceptually, the branch +tracks the individual package versions over time, and defaults to +"latest". This isn't required - one could also include the version in +the branch name, and have metadata outside to determine "latest" (or +the desired version). + +Now, to construct our final tree: + +``` +rm -rf exampleos-build +for package in bash systemd; do + ostree --repo=build-repo checkout -U --union exampleos/x86_64/${package} exampleos-build +done +# Set up a "rofiles-fuse" mount point; this ensures that any processes +# we run for post-processing of the tree don't corrupt the hardlinks. +mkdir -p mnt +rofiles-fuse exampleos-build mnt +# Now run global "triggers", generate cache files: +ldconfig -r mnt +# (Insert other programs here) +fusermount -u mnt +ostree --repo=build-repo commit -b exampleos/x86_64/standard --link-checkout-speedup exampleos-build +``` + +There are a number of interesting things going on here. The major +architectural change is that we're using `--link-checkout-speedup`. +This is a way to tell OSTree that our checkout is made via hardlinks, +and to scan the repository in order to build up a reverse `(device, +inode) -> checksum` mapping. 
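
A `(device, inode) -> checksum` lookup is only trustworthy if a modified file never keeps the inode of the object it was hardlinked to. The sketch below illustrates that property with plain files; it does not use OSTree at all. The file names are hypothetical, and `stat -c %i` assumes GNU coreutils.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# "object" stands in for a file stored in the repository,
# "checkout" for the hardlink created when it is checked out.
printf 'original\n' > "$work/object"
ln "$work/object" "$work/checkout"
ino_object=$(stat -c %i "$work/object")
ino_before=$(stat -c %i "$work/checkout")

# Replace the checked-out file by writing a new file and renaming it
# into place, instead of writing through the shared hardlink.
printf 'modified\n' > "$work/checkout.tmp"
mv "$work/checkout.tmp" "$work/checkout"
ino_after=$(stat -c %i "$work/checkout")

echo "object=$ino_object before=$ino_before after=$ino_after"
```

Before the replacement both names share one inode, so the inode identifies the stored object. Afterwards the checkout has a different inode and the stored object is unchanged, so a lookup by inode would correctly miss and the file would be checksummed again.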
+ +In order for this mapping to be accurate, we needed the `rofiles-fuse` +to ensure that any changed files had new inodes (and hence a new +checksum). + +## Migrating content between repositories + +Now that we have content in our `build-repo` repository (in +`bare-user` mode), we need to move the `exampleos/x86_64/standard` +branch content into the repository just named `repo` (in `archive` +mode) for export, which will involve zlib compression of new objects. +We likely want to generate static deltas after that as well. + +Let's copy the content: + +``` +ostree --repo=repo pull-local build-repo exampleos/x86_64/standard +``` + +Clients can now incrementally download new objects - however, this +would also be a good time to generate a delta from the previous +commit. + +``` +ostree --repo=repo static-delta generate exampleos/x86_64/standard +``` + +## More sophisticated repository management + +Next, see [Repository Management](repository-management.md) for the +next steps in managing content in OSTree repositories. + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/deployment.md b/docs/deployment.md new file mode 100644 index 00000000..1ea7ea46 --- /dev/null +++ b/docs/deployment.md @@ -0,0 +1,109 @@ +--- +nav_order: 4 +--- + +# Deployments +{: .no_toc } + +1. TOC +{:toc} + +## Overview + +Built on top of the OSTree versioning filesystem core is a layer +that knows how to deploy, parallel install, and manage Unix-like +operating systems (accessible via `ostree admin`). The core content of these operating systems +are treated as read-only, but they transparently share storage. + +A deployment is physically located at a path of the form +`/ostree/deploy/$stateroot/deploy/$checksum`. +OSTree is designed to boot directly into exactly one deployment +at a time; each deployment is intended to be a target for +`chroot()` or equivalent. 
+ +### "stateroot" (AKA "osname"): Group of deployments that share /var + +Each deployment is grouped in exactly one "stateroot" (also known as an "osname"); +the former term is preferred. + +From above, you can see that a stateroot is physically represented in the +`/ostree/deploy/$stateroot` directory. For example, OSTree can allow parallel +installing Debian in `/ostree/deploy/debian` and Red Hat Enterprise Linux in +`/ostree/deploy/rhel` (subject to operating system support; present released +versions of these operating systems may not support this). + +Each stateroot has exactly one copy of the traditional Unix `/var`, +stored physically in `/ostree/deploy/$stateroot/var`. OSTree provides +support tools for `systemd` to create a Linux bind mount that ensures +the booted deployment sees the shared copy of `/var`. + +OSTree does not touch the contents of `/var`. Operating system +components such as daemon services are required to create any +directories they require there at runtime +(e.g. `/var/cache/$daemonname`), and to manage upgrading data formats +inside those directories. + +### Contents of a deployment + +A deployment begins with a specific commit (represented as a +SHA256 hash) in the OSTree repository in `/ostree/repo`. This commit refers +to a filesystem tree that represents the underlying basis of a +deployment. For short, we will call this the "tree", to +distinguish it from the concept of a deployment. + +First, the tree must include a kernel (and optionally an initramfs). The +current standard locations for these are `/usr/lib/modules/$kver/vmlinuz` and +`/usr/lib/modules/$kver/initramfs.img`. The "boot checksum" will be computed +automatically. This follows the current Fedora kernel layout, and is +the current recommended path. However, older versions of libostree don't +support this; you may need to also put kernels in the previous (legacy) +paths, which are `vmlinuz(-.*)?-$checksum` in either `/boot` or `/usr/lib/ostree-boot`. 
+The checksum should be a SHA256 hash of the kernel contents; it must be +pre-computed before storing the kernel in the repository. Optionally, +the directory can also contain an initramfs, stored as +`initramfs(-.*)?-$checksum` and/or a device tree, stored as +`devicetree(-.*)?-$checksum`. If an initramfs or devicetree exist, +the checksum must include all of the kernel, initramfs and devicetree contents. +OSTree will use this to determine which kernels are shared. The rationale for +this is to avoid computing checksums on the client by default. + +The deployment should not have a traditional UNIX `/etc`; instead, it +should include `/usr/etc`. This is the "default configuration". When +OSTree creates a deployment, it performs a 3-way merge using the +*old* default configuration, the active system's `/etc`, and the new +default configuration. In the final filesystem tree for a deployment +then, `/etc` is a regular writable directory. + +Besides the exceptions of `/var` and `/etc` then, the rest of the +contents of the tree are checked out as hard links into the +repository. It's strongly recommended that operating systems ship all +of their content in `/usr`, but this is not a hard requirement. + +Finally, a deployment may have a `.origin` file, stored next to its +directory. This file tells `ostree admin upgrade` how to upgrade it. +At the moment, OSTree only supports upgrading a single refspec. +However, in the future OSTree may support a syntax for composing +layers of trees, for example. + +### The system /boot + +While OSTree parallel installs deployments cleanly inside the +`/ostree` directory, ultimately it has to control the system's `/boot` +directory. The way this works is via the +[Boot Loader Specification](http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec), +which is a standard for bootloader-independent drop-in configuration +files. 
+ +When a tree is deployed, it will have a configuration file generated +of the form +`/boot/loader/entries/ostree-$stateroot-$checksum.$serial.conf`. This +configuration file will include a special `ostree=` kernel argument +that allows the initramfs to find (and `chroot()` into) the specified +deployment. + +At present, not all bootloaders implement the BootLoaderSpec, so +OSTree contains code for some of these to regenerate native config +files (such as `/boot/syslinux/syslinux.conf`) based on the entries. + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/formats.md b/docs/formats.md new file mode 100644 index 00000000..36d395bd --- /dev/null +++ b/docs/formats.md @@ -0,0 +1,196 @@ +--- +nav_order: 7 +--- + +# OSTree data formats +{: .no_toc } + +1. TOC +{:toc} + +## On the topic of "smart servers" + +One really crucial difference between OSTree and git is that git has a +"smart server". Even when fetching over `https://`, it isn't just a +static webserver, but one that e.g. dynamically computes and +compresses pack files for each client. + +In contrast, the author of OSTree feels that for operating system +updates, many deployments will want to use simple static webservers, +the same target most package systems were designed to use. The +primary advantages are security and compute efficiency. Services like +Amazon S3 and CDNs are a canonical target, as well as a stock static +nginx server. + +## The archive format + +In the [repo](repo) section, the concept of objects was introduced, +where file/content objects are checksummed and managed individually. +(Unlike a package system, which operates on compressed aggregates). + +The `archive` format simply gzip-compresses each content object. +Metadata objects are stored uncompressed. This means that it's easy +to serve via static HTTP. Note: the repo config file still uses the +historical term `archive-z2` as mode. 
But this essentially indicates +the modern `archive` format. + +When you commit new content, you will see new `.filez` files appearing +in `objects/`. + +## archive efficiency + +The advantages of `archive`: + + - It's easy to understand and implement + - Can be served directly over plain HTTP by a static webserver + - Clients can download/unpack updates incrementally + - Space efficient on the server + +The biggest disadvantage of this format is that for a client to +perform an update, one HTTP request per changed file is required. In +some scenarios, this actually isn't bad at all, particularly with +techniques to reduce HTTP overhead, such as +[HTTP/2](https://en.wikipedia.org/wiki/HTTP/2). + +In order to make this format work well, you should design your content +such that large data that changes infrequently (e.g. graphic images) +are stored separately from small frequently changing data (application +code). + +Other disadvantages of `archive`: + + - It's quite bad when clients are performing an initial pull (without HTTP/2), + - One doesn't know the total size (compressed or uncompressed) of content + before downloading everything + +## Aside: the bare and bare-user formats + +The most common operation is to pull from an `archive` repository +into a `bare` or `bare-user` formatted repository. These latter two +are not compressed on disk. In other words, pulling to them is +similar to unpacking (but not installing) an RPM/deb package. + +The `bare-user` format is a bit special in that the uid/gid and xattrs +from the content are ignored. This is primarily useful if you want to +have the same OSTree-managed content that can be run on a host system +or an unprivileged container. + +## Static deltas + +OSTree itself was originally focused on a continuous delivery model, where +client systems are expected to update regularly. However, many OS vendors +would like to supply content that's updated e.g. once a month or less often. 
+ +For this model, we can do a lot better to support batched updates than +a basic `archive` repo. However, we still want to preserve the +model of "static webserver only". Given this, OSTree has gained the +concept of a "static delta". + +These deltas are targeted to be a delta between two specific commit +objects, including "bsdiff" and "rsync-style" deltas within a content +object. Static deltas also support `from NULL`, where the client can +more efficiently download a commit object from scratch - this is +mostly useful when using OSTree for containers, rather than OS images. +For OS images, one tends to download an installer ISO or qcow2 image +which is a single file that contains the tree data already. + +Effectively, we're spending server-side storage (and one-time compute +cost), and gaining efficiency in client network bandwidth. + +## Static delta repository layout + +Since static deltas may not exist, the client first needs to attempt +to locate one. Suppose a client wants to retrieve commit `${new}` +while currently running `${current}`. + +The first thing to understand is that in order to save space, the checksums of these +two commits are encoded as "modified base64" - the `/` character is replaced with +`_`. + +Like the commit objects, a "prefix directory" is used to make +management easier for filesystem tools. + +A delta is named `$(mbase64 $from)-$(mbase64 $to)`, for example +`GpTyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk`, +which in SHA256 format is +`1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d-2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09`. + +Finally, the actual content can be found in +`deltas/$fromprefix/$fromsuffix-$to`. + +## Static delta internal structure + +A delta is itself a directory. Inside, there is a file called +`superblock` which contains metadata. The remaining files are +named with integers and carry packs of content. 
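
The delta naming scheme from the repository layout section can be reproduced with standard tools. The following is a sketch under stated assumptions, not OSTree's own code: `hex_to_mbase64` is a hypothetical helper, the trailing `=` padding is dropped because the example name in this document has none, and the two-character split into `$fromprefix` and `$fromsuffix` is assumed by analogy with the objects directory. It relies on the `base64` utility from coreutils.

```shell
#!/bin/sh
set -eu

# Convert a hex checksum to "modified base64": standard base64 with
# '/' replaced by '_' and the '=' padding removed.
hex_to_mbase64() {
    hex=$1
    while [ -n "$hex" ]; do
        rest=${hex#??}          # everything after the first two hex digits
        pair=${hex%"$rest"}     # the first two hex digits
        hex=$rest
        # Emit the byte as an octal escape, e.g. 1a -> \032.
        printf "\\$(printf '%03o' "0x$pair")"
    done | base64 | tr -d '=\n' | tr '/' '_'
}

# The two commit checksums used as the example in this document.
from=1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d
to=2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09

from_b64=$(hex_to_mbase64 "$from")
to_b64=$(hex_to_mbase64 "$to")
delta_name="$from_b64-$to_b64"

# Assumption: the prefix directory is the first two characters.
prefix=$(printf '%s' "$from_b64" | cut -c1-2)
suffix=$(printf '%s' "$from_b64" | cut -c3-)
delta_path="deltas/$prefix/$suffix-$to_b64"

echo "$delta_name"
echo "$delta_path"
```

The first line printed matches the example delta name given in the repository layout section; the second shows where a client would look for the delta under the assumed prefix split.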
+ +The file format of static deltas should be currently considered an +OSTree implementation detail. Obviously, nothing stops one from +writing code which is compatible with OSTree today. However, we would +like the flexibility to expand and change things, and having multiple +codebases makes that more problematic. Please contact the authors +with any requests. + +That said, one critical thing to understand about the design is that +delta payloads are a bit more like "restricted programs" than they are +raw data. There's a "compilation" phase which generates output that +the client executes. + +This "updates as code" model allows for multiple content generation +strategies. The design of this was inspired by that of Chromium: +[ChromiumOS Autoupdate](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate). + +### The delta superblock + +The superblock contains: + + - arbitrary metadata + - delta generation timestamp + - the new commit object + - An array of recursive deltas to apply + - An array of per-part metadata, including total object sizes (compressed and uncompressed), + - An array of fallback objects + +Let's define a delta part, then return to discuss details: + +## A delta part + +A delta part is a combination of a raw blob of data, plus a very +restricted bytecode that operates on it. Say for example two files +happen to share a common section. It's possible for the delta +compilation to include that section once in the delta data blob, then +generate instructions to write out that blob twice when generating +both objects. + +Realistically though, it's very common for most of a delta to just be +"stream of new objects" - if one considers it, it doesn't make sense +to have too much duplication inside operating system content at this +level. 
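To make the "raw blob plus restricted bytecode" idea concrete, here is a toy interpreter in Python. It is not OSTree's actual delta format: the `open`/`write`/`close` instruction names and the tuple encoding are hypothetical, chosen only to show how a section shared by two files can be stored once in the payload and written out twice.

```python
from typing import Dict, List, Tuple


def apply_part(payload: bytes, ops: List[Tuple]) -> Dict[str, bytes]:
    """Execute a toy instruction list against a payload blob.

    Hypothetical instructions (not OSTree's real opcodes):
      ("open", name)            start a new output object
      ("write", offset, length) append payload[offset:offset+length]
      ("close",)                finish the current object
    """
    objects: Dict[str, bytearray] = {}
    current = None
    for op in ops:
        if op[0] == "open":
            current = op[1]
            objects[current] = bytearray()
        elif op[0] == "write":
            if current is None:
                raise ValueError("write without an open object")
            _, offset, length = op
            objects[current] += payload[offset:offset + length]
        elif op[0] == "close":
            current = None
        else:
            raise ValueError(f"unknown instruction: {op[0]!r}")
    return {name: bytes(data) for name, data in objects.items()}


# The shared section "SHARED" is stored once and referenced twice.
payload = b"SHAREDonlyAonlyB"
ops = [
    ("open", "a"), ("write", 0, 6), ("write", 6, 5), ("close",),
    ("open", "b"), ("write", 0, 6), ("write", 11, 5), ("close",),
]
result = apply_part(payload, ops)
```

The payload here is 16 bytes while the two generated objects total 22 bytes, which is the saving the compilation phase is looking for.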
+ +So then, what's more interesting is that OSTree static deltas support +a per-file delta algorithm called +[bsdiff](https://github.com/mendsley/bsdiff) that most notably works +well on executable code. + +The current delta compiler scans for files with matching basenames in +each commit that have a similar size, and attempts a bsdiff between +them. (It would make sense later to have a build system provide a +hint for this - for example, files within a same package). + +A generated bsdiff is included in the payload blob, and applying it is +an instruction. + +## Fallback objects + +It's possible for there to be large-ish files which might be resistant +to bsdiff. A good example is that it's common for operating systems +to use an "initramfs", which is itself a compressed filesystem. This +"internal compression" defeats bsdiff analysis. + +For these types of objects, the delta superblock contains an array of +"fallback objects". These objects aren't included in the delta +parts - the client simply fetches them from the underlying `.filez` +object. + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/introduction.md b/docs/introduction.md new file mode 100644 index 00000000..a6fa2252 --- /dev/null +++ b/docs/introduction.md @@ -0,0 +1,191 @@ +--- +nav_order: 2 +--- + +# OSTree Overview +{: .no_toc } + +1. TOC +{:toc} + +## Introduction + +OSTree is an upgrade system for Linux-based operating systems that +performs atomic upgrades of complete filesystem trees. It is +not a package system; rather, it is intended to complement them. +A primary model is composing packages on a server, and then +replicating them to clients. + +The underlying architecture might be summarized as "git for +operating system binaries". It operates in userspace, and will +work on top of any Linux filesystem. 
At its core is a git-like +content-addressed object store with branches (or "refs") to track +meaningful filesystem trees within the store. Similarly, one can +check out or commit to these branches. + +Layered on top of that is bootloader configuration, management of +`/etc`, and other functions to perform an upgrade beyond just +replicating files. + +You can use OSTree standalone in the pure replication model, +but another approach is to add a package manager on top, +thus creating a hybrid tree/package system. + +## Hello World example + +OSTree is mostly used as a library, but a quick tour of using its +CLI tools can give a general idea of how it works at its most +basic level. + +You can create a new OSTree repository using `init`: + +``` +$ ostree --repo=repo init +``` + +This will create a new `repo` directory containing your +repository. Feel free to inspect it. + +Now, let's prepare some data to add to the repo: + +``` +$ mkdir tree +$ echo "Hello world!" > tree/hello.txt +``` + +We can now import our `tree/` directory using the `commit` +command: + +``` +$ ostree --repo=repo commit --branch=foo tree/ +``` + +This will create a new branch `foo` pointing to the full tree +imported from `tree/`. In fact, we could now delete `tree/` if we +wanted to. + +To check that we indeed now have a `foo` branch, you can use the +`refs` command: + +``` +$ ostree --repo=repo refs +foo +``` + +We can also inspect the filesystem tree using the `ls` and `cat` +commands: + +``` +$ ostree --repo=repo ls foo +d00775 1000 1000 0 / +-00664 1000 1000 13 /hello.txt +$ ostree --repo=repo cat foo /hello.txt +Hello world! +``` + +And finally, we can check out our tree from the repository: + +``` +$ ostree --repo=repo checkout foo tree-checkout/ +$ cat tree-checkout/hello.txt +Hello world! 
+``` + +## Comparison with "package managers" + +Because OSTree is designed for deploying core operating +systems, a comparison with traditional "package managers" such +as dpkg and rpm is illustrative. Packages are traditionally +composed of partial filesystem trees with metadata and scripts +attached, and these are dynamically assembled on the client +machine, after a process of dependency resolution. + +In contrast, OSTree only supports recording and deploying +*complete* (bootable) filesystem trees. It +has no built-in knowledge of how a given filesystem tree was +generated or the origin of individual files, or dependencies, +descriptions of individual components. Put another way, OSTree +only handles delivery and deployment; you will likely still want +to include inside each tree metadata about the individual +components that went into the tree. For example, a system +administrator may want to know what version of OpenSSL was +included in your tree, so you should support the equivalent of +`rpm -q` or `dpkg -L`. + +The OSTree core emphasizes replicating read-only OS trees via +HTTP, and where the OS includes (if desired) an entirely +separate mechanism to install applications, stored in `/var` if they're system global, or +`/home` for per-user +application installation. An example application mechanism is + + +However, it is entirely possible to use OSTree underneath a +package system, where the contents of `/usr` are computed on the client. +For example, when installing a package, rather than changing the +currently running filesystem, the package manager could assemble +a new filesystem tree that layers the new packages on top of a +base tree, record it in the local OSTree repository, and then +set it up for the next boot. To support this model, OSTree +provides an (introspectable) C shared library. 
+ +## Comparison with block/image replication + +OSTree shares some similarity with "dumb" replication and +stateless deployments, such as the model common in "cloud" +deployments where nodes are booted from an (effectively) +readonly disk, and user data is kept on different volumes. +The advantage of "dumb" replication, shared by both OSTree and +the cloud model, is that it's *reliable* +and *predictable*. + +But unlike many default image-based deployments, OSTree supports +exactly two persistent writable directories that are preserved across +upgrades: `/etc` and `/var`. + +Because OSTree operates at the Unix filesystem layer, it works +on top of any filesystem or block storage layout; it's possible +to replicate a given filesystem tree from an OSTree repository +into plain ext4, BTRFS, XFS, or in general any Unix-compatible +filesystem that supports hard links. Note: OSTree will +transparently take advantage of some BTRFS features if deployed +on it. + +OSTree is orthogonal to virtualization mechanisms like AMIs and qcow2 +images, though it's most useful if you plan to update stateful +VMs in-place, rather than generating new images. + +In practice, users of "bare metal" configurations will find the OSTree +model most useful. + +## Atomic transitions between parallel-installable read-only filesystem trees + +Another deeply fundamental difference from both package +managers and image-based replication is that OSTree is +designed to parallel-install *multiple versions* of multiple +*independent* operating systems. OSTree +relies on a new toplevel `ostree` directory; it can in fact +parallel install inside an existing OS or distribution +occupying the physical `/` root. + +On each client machine, there is an OSTree repository stored +in `/ostree/repo`, and a set of "deployments" stored in `/ostree/deploy/$STATEROOT/deploy/$CHECKSUM`. +Each deployment is primarily composed of a set of hardlinks +into the repository.
This means each version is deduplicated; +an upgrade process only costs disk space proportional to the +new files, plus some constant overhead. + +The model OSTree emphasizes is that the OS read-only content +is kept in the classic Unix `/usr`; it comes with code to +create a Linux read-only bind mount to prevent inadvertent +corruption. There is exactly one `/var` writable directory shared +between each deployment for a given OS. The OSTree core code +does not touch content in this directory; it is up to the code +in each operating system for how to manage and upgrade state. + +Finally, each deployment has its own writable copy of the +configuration store `/etc`. On upgrade, OSTree will +perform a basic 3-way diff, and apply any local changes to the +new copy, while leaving the old untouched. + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/adapting-existing.md b/docs/manual/adapting-existing.md deleted file mode 100644 index 3a1b8d69..00000000 --- a/docs/manual/adapting-existing.md +++ /dev/null @@ -1,172 +0,0 @@ -# Adapting existing mainstream distributions - -## System layout - -First, OSTree encourages systems to implement -[UsrMove](http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge/) -This is simply to avoid the need for more bind mounts. By default -OSTree's dracut hook creates a read-only bind mount over `/usr`; you -can of course generate individual bind-mounts for `/bin`, all the -`/lib` variants, etc. So it is not intended to be a hard requirement. - -Remember, because by default the system is booted into a `chroot` -equivalent, there has to be some way to refer to the actual physical -root filesystem. Therefore, your operating system tree should contain -an empty `/sysroot` directory; at boot time, OSTree will make this a -bind mount to the physical / root directory. There is precedent for -this name in the initramfs context. 
You should furthermore make a -toplevel symbolic link `/ostree` which points to `/sysroot/ostree`, so -that the OSTree tool at runtime can consistently find the system data -regardless of whether it's operating on a physical root or inside a -deployment. - -Because OSTree only preserves `/var` across upgrades (each -deployment's chroot directory will be garbage collected -eventually), you will need to choose how to handle other -toplevel writable directories specified by the [Filesystem Hierarchy Standard](http://www.pathname.com/fhs/). -Your operating system may of course choose -not to support some of these such as `/usr/local`, but following is the -recommended set: - - - `/home` → `/var/home` - - `/opt` → `/var/opt` - - `/srv` → `/var/srv` - - `/root` → `/var/roothome` - - `/usr/local` → `/var/usrlocal` - - `/mnt` → `/var/mnt` - - `/tmp` → `/sysroot/tmp` - -Furthermore, since `/var` is empty by default, your operating system -will need to dynamically create the *targets* of these at boot. A -good way to do this is using `systemd-tmpfiles`, if your OS uses -systemd. For example: - -``` -d /var/log/journal 0755 root root - -L /var/home - - - - ../sysroot/home -d /var/opt 0755 root root - -d /var/srv 0755 root root - -d /var/roothome 0700 root root - -d /var/usrlocal 0755 root root - -d /var/usrlocal/bin 0755 root root - -d /var/usrlocal/etc 0755 root root - -d /var/usrlocal/games 0755 root root - -d /var/usrlocal/include 0755 root root - -d /var/usrlocal/lib 0755 root root - -d /var/usrlocal/man 0755 root root - -d /var/usrlocal/sbin 0755 root root - -d /var/usrlocal/share 0755 root root - -d /var/usrlocal/src 0755 root root - -d /var/mnt 0755 root root - -d /run/media 0755 root root - -``` - -Particularly note here the double indirection of `/home`. By default, -each deployment will share the global toplevel `/home` directory on -the physical root filesystem. 
It is then up to higher levels of -management tools to keep `/etc/passwd` or equivalent synchronized -between operating systems. Each deployment can easily be reconfigured -to have its own home directory set simply by making `/var/home` a real -directory. - -## Booting and initramfs technology - -OSTree comes with optional dracut+systemd integration code which follows -this logic: - -- Parse the `ostree=` kernel command line argument in the initramfs -- Set up a read-only bind mount on `/usr` -- Bind mount the deployment's `/sysroot` to the physical `/` -- Use `mount(MS_MOVE)` to make the deployment root appear to be the root filesystem - -After these steps, systemd switches root. - -If you are not using dracut or systemd, using OSTree should still be -possible, but you will have to write the integration code. See the -existing sources in -[src/switchroot](https://github.com/ostreedev/ostree/tree/master/src/switchroot) -as a reference. - -Patches to support other initramfs technologies and init systems, if -sufficiently clean, will likely be accepted upstream. - -A further specific note regarding `sysvinit`: OSTree used to support -recording device files such as the `/dev/initctl` FIFO, but no longer -does. It's recommended to just patch your initramfs to create this at -boot. - -## /usr/lib/passwd - -Unlike traditional package systems, OSTree trees contain *numeric* uid -and gids. Furthermore, it does not have a `%post` type mechanism -where `useradd` could be invoked. In order to ship an OS that -contains both system users and users dynamically created on client -machines, you will need to choose a solution for `/etc/passwd`. The -core problem is that if you add a user to the system for a daemon, the -OSTree upgrade process for `/etc` will simply notice that because -`/etc/passwd` differs from the previous default, it will keep the -modified config file, and your new OS user will not be visible. 
The -solution chosen for the [Gnome Continuous](https://live.gnome.org/Projects/GnomeContinuous) operating -system is to create `/usr/lib/passwd`, and to include a NSS module -[nss-altfiles](https://github.com/aperezdc/nss-altfiles) which -instructs glibc to read from it. Then, the build system places all -system users there, freeing up `/etc/passwd` to be purely a database -of local users. See also a more recent effort from [Systemd Stateless](http://0pointer.de/blog/projects/stateless.html) - -## Adapting existing package managers - -The largest endeavor is likely to be redesigning your distribution's -package manager to be on top of OSTree, particularly if you want to -keep compatibility with the "old way" of installing into the physical -`/`. This section will use examples from both `dpkg` and `rpm` as the -author has familiarity with both; but the abstract concepts should -apply to most traditional package managers. - -There are many levels of possible integration; initially, we will -describe the most naive implementation which is the simplest but also -the least efficient. We will assume here that the admin is booted -into an OSTree-enabled system, and wants to add a set of packages. - -Many package managers store their state in `/var`; but since in the -OSTree model that directory is shared between independent versions, -the package database must first be found in the per-deployment `/usr` -directory. It becomes read-only; remember, all upgrades involve -constructing a new filesystem tree, so your package manager will also -need to create a copy of its database. Most likely, if you want to -continue supporting non-OSTree deployments, simply have your package -manager fall back to the legacy `/var` location if the one in `/usr` -is not found. 
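The fallback lookup described above can be sketched as follows. The two directory names are placeholders invented for this example; a real package manager would substitute its own database locations, and the `root` parameter exists only so the function can be exercised against a scratch directory.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical locations, in order of preference: the per-deployment
# database under /usr first, then the legacy location under /var.
DEFAULT_CANDIDATES = ("/usr/share/pkgdb", "/var/lib/pkgdb")


def find_package_db(
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
    root: str = "/",
) -> Optional[Path]:
    """Return the first existing database directory, or None."""
    for candidate in candidates:
        path = Path(root) / candidate.lstrip("/")
        if path.is_dir():
            return path
    return None
```

On an OSTree deployment the `/usr` location is found first; on a traditional installation only the `/var` location exists, so the same code path serves both.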
- -To install a set of new packages (without removing any existing ones), -enumerate the set of packages in the currently booted deployment, and -perform dependency resolution to compute the complete set of new -packages. Download and unpack these new packages to a temporary -directory. - -Now, because we are merely installing new packages and not -removing anything, we can make the major optimization of reusing -our existing filesystem tree, and merely -*layering* the composed filesystem tree of -these new packages on top. A command like this: - -``` -ostree commit -b osname/releasename/description ---tree=ref=$osname/$releasename/$description ---tree=dir=/var/tmp/newpackages.13A8D0/ -``` - -will create a new commit in the `$osname/$releasename/$description` -branch. The OSTree SHA256 checksum of all the files in -`/var/tmp/newpackages.13A8D0/` will be computed, but we will not -re-checksum the present existing tree. In this layering model, -earlier directories will take precedence, but files in later layers -will silently override earlier layers. - -Then to actually deploy this tree for the next boot: -`ostree admin deploy $osname/$releasename/$description` - -This is essentially what [rpm-ostree](https://github.com/projectatomic/rpm-ostree/) -does to support its [package layering model](https://rpm-ostree.readthedocs.io/en/latest/manual/administrator-handbook/#hybrid-imagepackaging-via-package-layering). - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/atomic-upgrades.md b/docs/manual/atomic-upgrades.md deleted file mode 100644 index f2c01cc1..00000000 --- a/docs/manual/atomic-upgrades.md +++ /dev/null @@ -1,121 +0,0 @@ -# Atomic Upgrades - -## You can turn off the power anytime you want... - -OSTree is designed to implement fully atomic and safe upgrades; -more generally, atomic transitions between lists of bootable -deployments. 
If the system crashes or you pull the power, you -will have either the old system, or the new one. - -## Simple upgrades via HTTP - -First, the most basic model OSTree supports is one where it replicates -pre-generated filesystem trees from a server over HTTP, tracking -exactly one ref, which is stored in the `.origin` file for the -deployment. The command `ostree admin upgrade` -implements this. - -To begin a simple upgrade, OSTree fetches the contents of the ref from -the remote server. Suppose we're tracking a ref named -`exampleos/buildmaster/x86_64-runtime`. OSTree fetches the URL -`http://example.com/repo/refs/heads/exampleos/buildmaster/x86_64-runtime`, -which contains a SHA256 checksum. This determines the tree to deploy, -and `/etc` will be merged from the currently booted tree. - -If we do not have this commit, then we perform a pull process. -At present (without static deltas), this involves simply -fetching each individual object that we do not have, asynchronously. -In other words, we only download changed files (zlib-compressed). -Each object has its checksum validated and is stored in `/ostree/repo/objects/`. - -Once the pull is complete, we have all the objects we need locally -to perform a deployment. - -## Upgrades via external tools (e.g. package managers) - -As mentioned in the introduction, OSTree is also designed to allow a -model where filesystem trees are computed on the client. It is -completely agnostic as to how those trees are generated; they could be -computed with traditional packages, packages with post-deployment -scripts on top, or built by developers directly from revision control -locally, etc. - -At a practical level, most package managers today (`dpkg` and `rpm`) -operate "live" on the currently booted filesystem. The way they could -work with OSTree is instead to take the list of installed packages in -the currently booted tree, and compute a new filesystem from that.
A -later chapter describes in more details how this could work: -[Adapting Existing Systems](adapting-existing.md). - -For the purposes of this section, let's assume that we have a -newly generated filesystem tree stored in the repo (which shares -storage with the existing booted tree). We can then move on to -checking it back out of the repo into a deployment. - -## Assembling a new deployment directory - -Given a commit to deploy, OSTree first allocates a directory for -it. This is of the form `/boot/loader/entries/ostree-$stateroot-$checksum.$serial.conf`. -The `$serial` is normally `0`, but if a -given commit is deployed more than once, it will be incremented. -This is supported because the previous deployment may have -configuration in `/etc` that we do not want to use or overwrite. - -Now that we have a deployment directory, a 3-way merge is -performed between the (by default) currently booted deployment's -`/etc`, its default -configuration, and the new deployment (based on its `/usr/etc`). - -## Atomically swapping boot configuration - -At this point, a new deployment directory has been created as a -hardlink farm; the running system is untouched, and the bootloader -configuration is untouched. We want to add this deployment to the -"deployment list". - -To support a more general case, OSTree supports atomic transitioning -between arbitrary sets of deployments, with the restriction that the -currently booted deployment must always be in the new set. In the -normal case, we have exactly one deployment, which is the booted one, -and we want to add the new deployment to the list. A more complex -command might allow creating 100 deployments as part of one atomic -transaction, so that one can set up an automated system to bisect -across them. - -## The bootversion - -OSTree allows swapping between boot configurations by implementing the -"swapped directory pattern" in `/boot`. This means it is a symbolic -link to one of two directories `/ostree/boot.[0|1]`. 
To swap the -contents atomically, if the current version is `0`, we create -`/ostree/boot.1`, populate it with the new contents, then atomically -swap the symbolic link. Finally, the old contents can be garbage -collected at any point. - -## The /ostree/boot directory - -However, we want to optimize for the case where the set of -kernel/initramfs/devicetree sets is the same between both the old and new -deployment lists. This happens when doing an upgrade that does not -include the kernel; think of a simple translation update. OSTree -optimizes for this case because on some systems `/boot` may be on a -separate medium such as flash storage not optimized for significant -amounts of write traffic. Related to this, modern OSTree has support -for having `/boot` be a read-only mount by default - it will -automatically remount read-write just for the portion of time -necessary to update the bootloader configuration. - -To implement this, OSTree also maintains the directory -`/ostree/boot.$bootversion`, which is a set -of symbolic links to the deployment directories. The -`$bootversion` here must match the version of -`/boot`. However, in order to allow atomic transitions of -*this* directory, this is also a swapped directory, -so just like `/boot`, it has a version of `0` or `1` appended. - -Each bootloader entry has a special `ostree=` argument which refers to -one of these symbolic links. This is parsed at runtime in the -initramfs. - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/buildsystem-and-repos.md b/docs/manual/buildsystem-and-repos.md deleted file mode 100644 index fbae0322..00000000 --- a/docs/manual/buildsystem-and-repos.md +++ /dev/null @@ -1,188 +0,0 @@ -# Writing a buildsystem and managing repositories - -OSTree is not a package system. It does not directly support building -source code. 
Rather, it is a tool for transporting and managing -content, along with package-system independent aspects like bootloader -management for updates. - -We'll assume here that we're planning to generate commits on a build -server, then have client systems replicate it. Doing client-side -assembly is also possible of course, but this discussion will focus -primarily on server-side concerns. - -## Build vs buy - -Therefore, you need to either pick an existing tool for writing -content into an OSTree repository, or to write your own. An example -tool is [rpm-ostree](https://github.com/projectatomic/rpm-ostree) - it -takes as input RPMs, and commits them (currently oriented for a server -side, but aiming to do client side too). - -## Initializing - -For this initial discussion, we're assuming you have a single -`archive` repository: - -``` -mkdir repo -ostree --repo=repo init --mode=archive -``` - -You can export this via a static webserver, and configure clients to -pull from it. - -## Writing your own OSTree buildsystem - -There exist many, many systems that basically follow this pattern: - -``` -$pkg --installroot=/path/to/tmpdir install foo bar baz -$imagesystem commit --root=/path/to/tmpdir -``` - -For various values of `$pkg` such as `yum`, `apt-get`, etc., and -values of `$imagesystem` could be simple tarballs, Amazon Machine -Images, ISOs, etc. - -Now obviously in this document, we're going to talk about the -situation where `$imagesystem` is OSTree. The general idea with -OSTree is that wherever you might store a series of tarballs for -applications or OS images, OSTree is likely going to be better. For -example, it supports GPG signatures, binary deltas, writing bootloader -configuration, etc. - -OSTree does not include a package/component build system simply -because there already exist plenty of good ones - rather, it is -intended to provide an infrastructure layer. 
- -The above mentioned `rpm-ostree compose tree` chooses RPM as the value -of `$pkg` - so binaries are built as RPMs, then committed as a whole -into an OSTree commit. - -But let's discuss building our own. If you're just experimenting, -it's quite easy to start with the command line. We'll assume for this -purpose that you have a build process that outputs a directory tree - -we'll call this tool `$pkginstallroot` (which could be `yum ---installroot` or `debootstrap`, etc.). - -Your initial prototype is going to look like: - -``` -$pkginstallroot /path/to/tmpdir -ostree --repo=repo commit -s 'build' -b exampleos/x86_64/standard --tree=dir=/path/to/tmpdir -``` - -Alternatively, if your build system can generate a tarball, you can -commit that tarball into OSTree. For example, -[OpenEmbedded](http://www.openembedded.org/) can output a tarball, and -one can commit it via: - -``` -ostree commit -s 'build' -b exampleos/x86_64/standard --tree=tar=myos.tar -``` - -## Constructing trees from unions - -The above is a very simplistic model, and you will quickly notice that -it's slow. This is because OSTree has to re-checksum and recompress -the content each time it's committed. (Most of the CPU time is spent -in compression which gets thrown away if the content turns out to be -already stored). - -A more advanced approach is to store components in OSTree itself, then -union them, and recommit them. At this point, we recommend taking a -look at the OSTree API, and choose a programming language supported by -[GObject Introspection](https://wiki.gnome.org/Projects/GObjectIntrospection) -to write your buildsystem scripts. Python may be a good choice, or -you could choose custom C code, etc. - -For the purposes of this tutorial we will use shell script, but it's -strongly recommended to choose a real programming language for your -build system. - -Let's say that your build system produces separate artifacts (whether -those are RPMs, zip files, or whatever). 
These artifacts should be -the result of `make install DESTDIR=` or similar. Basically -equivalent to RPMs/debs. - -Further, in order to make things fast, we will need a separate -`bare-user` repository in order to perform checkouts quickly via -hardlinks. We'll then export content into the `archive` repository -for use by client systems. - -``` -mkdir build-repo -ostree --repo=build-repo init --mode=bare-user -``` - -You can begin committing those as individual branches: - -``` -ostree --repo=build-repo commit -b exampleos/x86_64/bash --tree=tar=bash-4.2-bin.tar.gz -ostree --repo=build-repo commit -b exampleos/x86_64/systemd --tree=tar=systemd-224-bin.tar.gz -``` - -Set things up so that whenever a package changes, you redo the -`commit` with the new package version - conceptually, the branch -tracks the individual package versions over time, and defaults to -"latest". This isn't required - one could also include the version in -the branch name, and have metadata outside to determine "latest" (or -the desired version). - -Now, to construct our final tree: - -``` -rm -rf exampleos-build -for package in bash systemd; do - ostree --repo=build-repo checkout -U --union exampleos/x86_64/${package} exampleos-build -done -# Set up a "rofiles-fuse" mount point; this ensures that any processes -# we run for post-processing of the tree don't corrupt the hardlinks. -mkdir -p mnt -rofiles-fuse exampleos-build mnt -# Now run global "triggers", generate cache files: -ldconfig -r mnt - (Insert other programs here) -fusermount -u mnt -ostree --repo=build-repo commit -b exampleos/x86_64/standard --link-checkout-speedup exampleos-build -``` - -There are a number of interesting things going on here. The major -architectural change is that we're using `--link-checkout-speedup`. -This is a way to tell OSTree that our checkout is made via hardlinks, -and to scan the repository in order to build up a reverse `(device, -inode) -> checksum` mapping. 
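The reverse `(device, inode) -> checksum` mapping can be illustrated with a small Python sketch. It is a simplification and not OSTree's implementation: it assumes loose file objects are stored as `objects/<prefix>/<rest>.file`, and the function names are made up for this example.

```python
import os
from pathlib import Path
from typing import Dict, Optional, Tuple

DevIno = Tuple[int, int]


def build_devino_cache(objects_dir) -> Dict[DevIno, str]:
    """Map (st_dev, st_ino) of every loose file object to its checksum.

    Assumes a simplified layout of objects/<prefix>/<rest>.file.
    """
    cache: Dict[DevIno, str] = {}
    for prefix_dir in Path(objects_dir).iterdir():
        if not prefix_dir.is_dir():
            continue
        for obj in prefix_dir.iterdir():
            if obj.suffix != ".file":
                continue
            st = obj.lstat()
            cache[(st.st_dev, st.st_ino)] = prefix_dir.name + obj.stem
    return cache


def lookup_checksum(cache: Dict[DevIno, str], path) -> Optional[str]:
    """Return the known checksum if path is a hardlink to an object.

    None means the file has a different inode (new or replaced
    content) and would have to be checksummed from scratch.
    """
    st = os.lstat(path)
    return cache.get((st.st_dev, st.st_ino))
```

A file that is still a hardlink into the repository resolves to its checksum with a single `lstat()` call, while a file that was replaced during post-processing gets a new inode and misses the cache.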
- -In order for this mapping to be accurate, we needed the `rofiles-fuse` -to ensure that any changed files had new inodes (and hence a new -checksum). - -## Migrating content between repositories - -Now that we have content in our `build-repo` repository (in -`bare-user` mode), we need to move the `exampleos/x86_64/standard` -branch content into the repository just named `repo` (in `archive` -mode) for export, which will involve zlib compression of new objects. -We likely want to generate static deltas after that as well. - -Let's copy the content: - -``` -ostree --repo=repo pull-local build-repo exampleos/x86_64/standard -``` - -Clients can now incrementally download new objects - however, this -would also be a good time to generate a delta from the previous -commit. - -``` -ostree --repo=repo static-delta generate exampleos/x86_64/standard -``` - -## More sophisticated repository management - -Next, see [Repository Management](repository-management.md) for the -next steps in managing content in OSTree repositories. - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/deployment.md b/docs/manual/deployment.md deleted file mode 100644 index afbcbabb..00000000 --- a/docs/manual/deployment.md +++ /dev/null @@ -1,101 +0,0 @@ -# Deployments - -## Overview - -Built on top of the OSTree versioning filesystem core is a layer -that knows how to deploy, parallel install, and manage Unix-like -operating systems (accessible via `ostree admin`). The core content of these operating systems -are treated as read-only, but they transparently share storage. - -A deployment is physically located at a path of the form -`/ostree/deploy/$stateroot/deploy/$checksum`. -OSTree is designed to boot directly into exactly one deployment -at a time; each deployment is intended to be a target for -`chroot()` or equivalent. 
- -### "stateroot" (AKA "osname"): Group of deployments that share /var - -Each deployment is grouped in exactly one "stateroot" (also known as an "osname"); -the former term is preferred. - -From above, you can see that a stateroot is physically represented in the -`/ostree/deploy/$stateroot` directory. For example, OSTree can allow parallel -installing Debian in `/ostree/deploy/debian` and Red Hat Enterprise Linux in -`/ostree/deploy/rhel` (subject to operating system support; presently released -versions of these operating systems may not support this). - -Each stateroot has exactly one copy of the traditional Unix `/var`, -stored physically in `/ostree/deploy/$stateroot/var`. OSTree provides -support tools for `systemd` to create a Linux bind mount that ensures -the booted deployment sees the shared copy of `/var`. - -OSTree does not touch the contents of `/var`. Operating system -components such as daemon services are required to create any -directories they require there at runtime -(e.g. `/var/cache/$daemonname`), and to manage upgrading data formats -inside those directories. - -### Contents of a deployment - -A deployment begins with a specific commit (represented as a -SHA256 hash) in the OSTree repository in `/ostree/repo`. This commit refers -to a filesystem tree that represents the underlying basis of a -deployment. For short, we will call this the "tree", to -distinguish it from the concept of a deployment. - -First, the tree must include a kernel (and optionally an initramfs). The -current standard locations for these are `/usr/lib/modules/$kver/vmlinuz` and -`/usr/lib/modules/$kver/initramfs.img`. The "boot checksum" will be computed -automatically. This follows the current Fedora kernel layout, and is -the current recommended path. However, older versions of libostree don't -support this; you may need to also put kernels in the previous (legacy) -paths, which are `vmlinuz(-.*)?-$checksum` in either `/boot` or `/usr/lib/ostree-boot`.
-The checksum should be a SHA256 hash of the kernel contents; it must be -pre-computed before storing the kernel in the repository. Optionally, -the directory can also contain an initramfs, stored as -`initramfs(-.*)?-$checksum` and/or a device tree, stored as -`devicetree(-.*)?-$checksum`. If an initramfs or devicetree exist, -the checksum must include all of the kernel, initramfs and devicetree contents. -OSTree will use this to determine which kernels are shared. The rationale for -this is to avoid computing checksums on the client by default. - -The deployment should not have a traditional UNIX `/etc`; instead, it -should include `/usr/etc`. This is the "default configuration". When -OSTree creates a deployment, it performs a 3-way merge using the -*old* default configuration, the active system's `/etc`, and the new -default configuration. In the final filesystem tree for a deployment -then, `/etc` is a regular writable directory. - -Besides the exceptions of `/var` and `/etc` then, the rest of the -contents of the tree are checked out as hard links into the -repository. It's strongly recommended that operating systems ship all -of their content in `/usr`, but this is not a hard requirement. - -Finally, a deployment may have a `.origin` file, stored next to its -directory. This file tells `ostree admin upgrade` how to upgrade it. -At the moment, OSTree only supports upgrading a single refspec. -However, in the future OSTree may support a syntax for composing -layers of trees, for example. - -### The system /boot - -While OSTree parallel installs deployments cleanly inside the -`/ostree` directory, ultimately it has to control the system's `/boot` -directory. The way this works is via the -[Boot Loader Specification](http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec), -which is a standard for bootloader-independent drop-in configuration -files. 
- -When a tree is deployed, it will have a configuration file generated -of the form -`/boot/loader/entries/ostree-$stateroot-$checksum.$serial.conf`. This -configuration file will include a special `ostree=` kernel argument -that allows the initramfs to find (and `chroot()` into) the specified -deployment. - -At present, not all bootloaders implement the BootLoaderSpec, so -OSTree contains code for some of these to regenerate native config -files (such as `/boot/syslinux/syslinux.conf`) based on the entries. - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/formats.md b/docs/manual/formats.md deleted file mode 100644 index 884b1b5e..00000000 --- a/docs/manual/formats.md +++ /dev/null @@ -1,188 +0,0 @@ -# OSTree data formats - -## On the topic of "smart servers" - -One really crucial difference between OSTree and git is that git has a -"smart server". Even when fetching over `https://`, it isn't just a -static webserver, but one that e.g. dynamically computes and -compresses pack files for each client. - -In contrast, the author of OSTree feels that for operating system -updates, many deployments will want to use simple static webservers, -the same target most package systems were designed to use. The -primary advantages are security and compute efficiency. Services like -Amazon S3 and CDNs are a canonical target, as well as a stock static -nginx server. - -## The archive format - -In the [repo](repo) section, the concept of objects was introduced, -where file/content objects are checksummed and managed individually. -(Unlike a package system, which operates on compressed aggregates). - -The `archive` format simply gzip-compresses each content object. -Metadata objects are stored uncompressed. This means that it's easy -to serve via static HTTP. Note: the repo config file still uses the -historical term `archive-z2` as mode. But this essentially indicates -the modern `archive` format. 
- -When you commit new content, you will see new `.filez` files appearing -in `objects/`. - -## archive efficiency - -The advantages of `archive`: - - - It's easy to understand and implement - - Can be served directly over plain HTTP by a static webserver - - Clients can download/unpack updates incrementally - - Space efficient on the server - -The biggest disadvantage of this format is that for a client to -perform an update, one HTTP request per changed file is required. In -some scenarios, this actually isn't bad at all, particularly with -techniques to reduce HTTP overhead, such as -[HTTP/2](https://en.wikipedia.org/wiki/HTTP/2). - -In order to make this format work well, you should design your content -such that large data that changes infrequently (e.g. graphic images) -are stored separately from small frequently changing data (application -code). - -Other disadvantages of `archive`: - - - It's quite bad when clients are performing an initial pull (without HTTP/2), - - One doesn't know the total size (compressed or uncompressed) of content - before downloading everything - -## Aside: the bare and bare-user formats - -The most common operation is to pull from an `archive` repository -into a `bare` or `bare-user` formatted repository. These latter two -are not compressed on disk. In other words, pulling to them is -similar to unpacking (but not installing) an RPM/deb package. - -The `bare-user` format is a bit special in that the uid/gid and xattrs -from the content are ignored. This is primarily useful if you want to -have the same OSTree-managed content that can be run on a host system -or an unprivileged container. - -## Static deltas - -OSTree itself was originally focused on a continuous delivery model, where -client systems are expected to update regularly. However, many OS vendors -would like to supply content that's updated e.g. once a month or less often. - -For this model, we can do a lot better to support batched updates than -a basic `archive` repo. 
However, we still want to preserve the -model of "static webserver only". Given this, OSTree has gained the -concept of a "static delta". - -These deltas are targeted to be a delta between two specific commit -objects, including "bsdiff" and "rsync-style" deltas within a content -object. Static deltas also support `from NULL`, where the client can -more efficiently download a commit object from scratch - this is -mostly useful when using OSTree for containers, rather than OS images. -For OS images, one tends to download an installer ISO or qcow2 image -which is a single file that contains the tree data already. - -Effectively, we're spending server-side storage (and one-time compute -cost), and gaining efficiency in client network bandwidth. - -## Static delta repository layout - -Since static deltas may not exist, the client first needs to attempt -to locate one. Suppose a client wants to retrieve commit `${new}` -while currently running `${current}`. - -The first thing to understand is that in order to save space, the -checksums of these two commits are encoded as "modified base64" - the `/` character is replaced with -`_`. - -Like the commit objects, a "prefix directory" is used to make -management easier for filesystem tools. - -A delta is named `$(mbase64 $from)-$(mbase64 $to)`, for example -`GpTyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk`, -which in hexadecimal SHA256 form is -`1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d-2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09`. - -Finally, the actual content can be found in -`deltas/$fromprefix/$fromsuffix-$to`. - -## Static delta internal structure - -A delta is itself a directory. Inside, there is a file called -`superblock` which contains metadata. The rest of the files are -named by integers and carry packs of content. - -The file format of static deltas should be currently considered an -OSTree implementation detail.
Obviously, nothing stops one from -writing code which is compatible with OSTree today. However, we would -like the flexibility to expand and change things, and having multiple -codebases makes that more problematic. Please contact the authors -with any requests. - -That said, one critical thing to understand about the design is that -delta payloads are a bit more like "restricted programs" than they are -raw data. There's a "compilation" phase which generates output that -the client executes. - -This "updates as code" model allows for multiple content generation -strategies. The design of this was inspired by that of Chromium: -[ChromiumOS Autoupdate](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate). - -### The delta superblock - -The superblock contains: - - - arbitrary metadata - - delta generation timestamp - - the new commit object - - An array of recursive deltas to apply - - An array of per-part metadata, including total object sizes (compressed and uncompressed), - - An array of fallback objects - -Let's define a delta part, then return to discuss details: - -## A delta part - -A delta part is a combination of a raw blob of data, plus a very -restricted bytecode that operates on it. Say for example two files -happen to share a common section. It's possible for the delta -compilation to include that section once in the delta data blob, then -generate instructions to write out that blob twice when generating -both objects. - -Realistically though, it's very common for most of a delta to just be -"stream of new objects" - if one considers it, it doesn't make sense -to have too much duplication inside operating system content at this -level. - -So then, what's more interesting is that OSTree static deltas support -a per-file delta algorithm called -[bsdiff](https://github.com/mendsley/bsdiff) that most notably works -well on executable code. 
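
The "data blob plus restricted bytecode" idea from the delta part description can be illustrated with a toy interpreter. This is a minimal sketch under stated assumptions: the three instructions (`open`, `write`, `close`) and the function names are invented for illustration and are not OSTree's actual instruction set or file format.

```python
def apply_delta_part(blob, instructions):
    """Run a tiny, hypothetical instruction list against a data blob.

    Instructions (all invented for this sketch):
      ("open", name)             start writing a new object
      ("write", offset, length)  append blob[offset:offset + length] to it
      ("close",)                 finish the current object
    """
    objects = {}
    current = None
    for ins in instructions:
        op = ins[0]
        if op == "open":
            current = ins[1]
            objects[current] = bytearray()
        elif op == "write":
            _, offset, length = ins
            if current is None:
                raise ValueError("write without an open object")
            if offset < 0 or length < 0 or offset + length > len(blob):
                raise ValueError("write outside of the blob")
            objects[current] += blob[offset:offset + length]
        elif op == "close":
            current = None
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return {name: bytes(data) for name, data in objects.items()}


def example():
    shared = b"COMMON-SECTION;"
    tail_a = b"only-in-a"
    tail_b = b"only-in-b"
    # The shared section is stored once in the blob...
    blob = shared + tail_a + tail_b
    off_a = len(shared)
    off_b = off_a + len(tail_a)
    # ...and written out twice, once for each generated object.
    program = [
        ("open", "a"), ("write", 0, len(shared)),
        ("write", off_a, len(tail_a)), ("close",),
        ("open", "b"), ("write", 0, len(shared)),
        ("write", off_b, len(tail_b)), ("close",),
    ]
    return blob, apply_delta_part(blob, program)


if __name__ == "__main__":
    blob, objects = example()
    print(len(blob), {k: len(v) for k, v in objects.items()})
```

The blob is shorter than the two generated objects combined because the common section is carried once. Bounds checking every `write` is one way such a program can stay "restricted": it can only copy ranges out of the blob it shipped with.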
- -The current delta compiler scans for files with matching basenames in -each commit that have a similar size, and attempts a bsdiff between -them. (It would make sense later to have a build system provide a -hint for this - for example, files within a same package). - -A generated bsdiff is included in the payload blob, and applying it is -an instruction. - -## Fallback objects - -It's possible for there to be large-ish files which might be resistant -to bsdiff. A good example is that it's common for operating systems -to use an "initramfs", which is itself a compressed filesystem. This -"internal compression" defeats bsdiff analysis. - -For these types of objects, the delta superblock contains an array of -"fallback objects". These objects aren't included in the delta -parts - the client simply fetches them from the underlying `.filez` -object. - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/introduction.md b/docs/manual/introduction.md deleted file mode 100644 index c0113f5d..00000000 --- a/docs/manual/introduction.md +++ /dev/null @@ -1,183 +0,0 @@ -# OSTree Overview - -## Introduction - -OSTree is an upgrade system for Linux-based operating systems that -performs atomic upgrades of complete filesystem trees. It is -not a package system; rather, it is intended to complement them. -A primary model is composing packages on a server, and then -replicating them to clients. - -The underlying architecture might be summarized as "git for -operating system binaries". It operates in userspace, and will -work on top of any Linux filesystem. At its core is a git-like -content-addressed object store with branches (or "refs") to track -meaningful filesystem trees within the store. Similarly, one can -check out or commit to these branches. - -Layered on top of that is bootloader configuration, management of -`/etc`, and other functions to perform an upgrade beyond just -replicating files. 
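
The "content-addressed object store with refs" idea from the introduction can be sketched as follows. This is a toy model under simplifying assumptions, not OSTree's on-disk format: it hashes file content only, keeps refs in memory, and ignores directories' metadata. The class name `TinyStore` and its methods are hypothetical.

```python
import hashlib
import os
import tempfile


class TinyStore:
    """A toy content-addressed store with refs (not OSTree's format)."""

    def __init__(self, root):
        self.objects = os.path.join(root, "objects")
        os.makedirs(self.objects, exist_ok=True)
        self.refs = {}  # branch name -> {relative path: checksum}

    def commit(self, branch, tree_dir):
        """Record every file of tree_dir by checksum and point branch at it."""
        manifest = {}
        for dirpath, _dirs, files in os.walk(tree_dir):
            for name in files:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    data = f.read()
                checksum = hashlib.sha256(data).hexdigest()
                obj = os.path.join(self.objects, checksum)
                if not os.path.exists(obj):  # identical content is stored once
                    with open(obj, "wb") as out:
                        out.write(data)
                manifest[os.path.relpath(path, tree_dir)] = checksum
        self.refs[branch] = manifest

    def checkout(self, branch, dest):
        """Materialize a branch as hardlinks into the object store."""
        for rel, checksum in self.refs[branch].items():
            target = os.path.join(dest, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            os.link(os.path.join(self.objects, checksum), target)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tree = os.path.join(tmp, "tree")
        os.mkdir(tree)
        for name in ("hello.txt", "copy.txt"):
            with open(os.path.join(tree, name), "w") as f:
                f.write("Hello world!\n")
        store = TinyStore(os.path.join(tmp, "repo"))
        store.commit("foo", tree)
        out = os.path.join(tmp, "tree-checkout")
        store.checkout("foo", out)
        with open(os.path.join(out, "hello.txt")) as f:
            text = f.read()
        return len(os.listdir(store.objects)), text


if __name__ == "__main__":
    print(demo())
```

Two files with identical content end up as a single stored object, and the checkout costs no extra data blocks because it consists of hardlinks. That is also why such a checkout has to be treated as read-only.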
- -You can use OSTree standalone in the pure replication model, -but another approach is to add a package manager on top, -thus creating a hybrid tree/package system. - -## Hello World example - -OSTree is mostly used as a library, but a quick tour of using its -CLI tools can give a general idea of how it works at its most -basic level. - -You can create a new OSTree repository using `init`: - -``` -$ ostree --repo=repo init -``` - -This will create a new `repo` directory containing your -repository. Feel free to inspect it. - -Now, let's prepare some data to add to the repo: - -``` -$ mkdir tree -$ echo "Hello world!" > tree/hello.txt -``` - -We can now import our `tree/` directory using the `commit` -command: - -``` -$ ostree --repo=repo commit --branch=foo tree/ -``` - -This will create a new branch `foo` pointing to the full tree -imported from `tree/`. In fact, we could now delete `tree/` if we -wanted to. - -To check that we indeed now have a `foo` branch, you can use the -`refs` command: - -``` -$ ostree --repo=repo refs -foo -``` - -We can also inspect the filesystem tree using the `ls` and `cat` -commands: - -``` -$ ostree --repo=repo ls foo -d00775 1000 1000 0 / --00664 1000 1000 13 /hello.txt -$ ostree --repo=repo cat foo /hello.txt -Hello world! -``` - -And finally, we can check out our tree from the repository: - -``` -$ ostree --repo=repo checkout foo tree-checkout/ -$ cat tree-checkout/hello.txt -Hello world! -``` - -## Comparison with "package managers" - -Because OSTree is designed for deploying core operating -systems, a comparison with traditional "package managers" such -as dpkg and rpm is illustrative. Packages are traditionally -composed of partial filesystem trees with metadata and scripts -attached, and these are dynamically assembled on the client -machine, after a process of dependency resolution. - -In contrast, OSTree only supports recording and deploying -*complete* (bootable) filesystem trees. 
It -has no built-in knowledge of how a given filesystem tree was -generated, the origin of individual files, or the dependencies and -descriptions of individual components. Put another way, OSTree -only handles delivery and deployment; you will likely still want -to include inside each tree metadata about the individual -components that went into the tree. For example, a system -administrator may want to know what version of OpenSSL was -included in your tree, so you should support the equivalent of -`rpm -q` or `dpkg -L`. - -The OSTree core emphasizes replicating read-only OS trees via -HTTP, where the OS includes (if desired) an entirely -separate mechanism to install applications, stored in `/var` if they're system global, or -`/home` for per-user -application installation. An example application mechanism is - - -However, it is entirely possible to use OSTree underneath a -package system, where the contents of `/usr` are computed on the client. -For example, when installing a package, rather than changing the -currently running filesystem, the package manager could assemble -a new filesystem tree that layers the new packages on top of a -base tree, record it in the local OSTree repository, and then -set it up for the next boot. To support this model, OSTree -provides an (introspectable) C shared library. - -## Comparison with block/image replication - -OSTree shares some similarity with "dumb" replication and -stateless deployments, such as the model common in "cloud" -deployments where nodes are booted from an (effectively) -readonly disk, and user data is kept on different volumes. -The advantage of "dumb" replication, shared by both OSTree and -the cloud model, is that it's *reliable* -and *predictable*. - -But unlike many default image-based deployments, OSTree supports -exactly two persistent writable directories that are preserved across -upgrades: `/etc` and `/var`.
- -Because OSTree operates at the Unix filesystem layer, it works -on top of any filesystem or block storage layout; it's possible -to replicate a given filesystem tree from an OSTree repository -into plain ext4, BTRFS, XFS, or in general any Unix-compatible -filesystem that supports hard links. Note: OSTree will -transparently take advantage of some BTRFS features if deployed -on it. - -OSTree is orthogonal to virtualization mechanisms like AMIs and qcow2 -images, though it's most useful if you plan to update stateful -VMs in-place, rather than generating new images. - -In practice, users of "bare metal" configurations will find the OSTree -model most useful. - -## Atomic transitions between parallel-installable read-only filesystem trees - -Another deeply fundamental difference from both package -managers and image-based replication is that OSTree is -designed to parallel-install *multiple versions* of multiple -*independent* operating systems. OSTree -relies on a new toplevel `ostree` directory; it can in fact -parallel install inside an existing OS or distribution -occupying the physical `/` root. - -On each client machine, there is an OSTree repository stored -in `/ostree/repo`, and a set of "deployments" stored in `/ostree/deploy/$STATEROOT/deploy/$CHECKSUM`. -Each deployment is primarily composed of a set of hardlinks -into the repository. This means each version is deduplicated; -an upgrade process only costs disk space proportional to the -new files, plus some constant overhead. - -The model OSTree emphasizes is that the OS read-only content -is kept in the classic Unix `/usr`; it comes with code to -create a Linux read-only bind mount to prevent inadvertent -corruption. There is exactly one `/var` writable directory shared -among all deployments for a given OS. The OSTree core code -does not touch content in this directory; it is up to the code -in each operating system to manage and upgrade state.
- -Finally, each deployment has its own writable copy of the -configuration store `/etc`. On upgrade, OSTree will -perform a basic 3-way diff, and apply any local changes to the -new copy, while leaving the old untouched. - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/related-projects.md b/docs/manual/related-projects.md deleted file mode 100644 index 429b46b9..00000000 --- a/docs/manual/related-projects.md +++ /dev/null @@ -1,358 +0,0 @@ -# Related Projects - -OSTree is in many ways very evolutionary. It builds on concepts and -ideas introduced from many different projects such as -[Systemd Stateless](http://0pointer.net/blog/projects/stateless.html), -[Systemd Bootloader Spec](https://www.freedesktop.org/wiki/Specifications/BootLoaderSpec/), -[Chromium Autoupdate](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate), -the much older -[Fedora/Red Hat Stateless Project](https://fedoraproject.org/wiki/StatelessLinux), -[Linux VServer](http://linux-vserver.org/index.php?title=util-vserver:Vhashify&oldid=2285) -and many more. - -As mentioned elsewhere, OSTree is strongly influenced by package -manager designs as well. This page is not intended to be an -exhaustive list of such projects, but we will try to keep it up to -date, and relatively agnostic. - -Broadly speaking, projects in this area fall into two camps; either -a tool to snapshot systems on the client side (dpkg/rpm + BTRFS/LVM), -or a tool to compose on a server and replicate (ChromiumOS, Clear -Linux). OSTree is flexible enough to do both. - -Note that this section of the documentation is almost entirely -focused on the "ostree for host" model; the [flatpak](https://github.com/flatpak/flatpak/) -project uses libostree to store application data, distinct from the -host system management model. 
- -## Combining dpkg/rpm + (BTRFS/LVM) - -In this approach, one uses a block/filesystem snapshot tool underneath -the system package manager. - -The -[oVirt Node imgbased](https://gerrit.ovirt.org/gitweb?p=imgbased.git) -tool is an example of this approach, as are a few others below. - -Regarding [BTRFS](https://btrfs.wiki.kernel.org/index.php/Main_Page) -in particular - the OSTree author believes that Linux storage is a -wide world, and while BTRFS is quite good, it is not everywhere now, -nor will it be in the near future. There are other recently developed -filesystems like [f2fs](https://en.wikipedia.org/wiki/F2FS), and Red -Hat Enterprise Linux still defaults to -[XFS](https://en.wikipedia.org/wiki/XFS). - -Using a snapshot tool underneath a package manager does help -significantly. In the rest of this text, we will use "BTRFS" as a -mostly generic tool for filesystem snapshots. - -The obvious thing to do is layer BTRFS under dpkg/rpm, and have a -separate subvolume for `/home` so rollbacks don't lose your data. See -e.g. [Fedora BTRFS Rollback Feature](http://fedoraproject.org/wiki/Features/SystemRollbackWithBtrfs). - -More generally, if you want to use BTRFS to roll back changes made by -dpkg/rpm, you have to carefully set up the partition layout so that -the files laid out by dpkg/rpm are installed in a subvolume to -snapshot. - -This problem in many ways is addressed by the changes OSTree forces, -such as putting all local state in `/var` (e.g. `/usr/local` -> -`/var/usrlocal`). Then one can BTRFS snapshot `/usr`. This gets pretty -far, except handling `/etc` is messy. This is something OSTree does -well. - -In general, if one really tries to flesh out the BTRFS approach, a -nontrivial middle layer of code between dpkg/rpm and BTRFS (or deep -awareness of BTRFS in dpkg/rpm itself) will be required. A good -example of this is the [snapper.io](http://snapper.io/) project. 
- -The OSTree author believes that having total freedom at the block -storage layer is better for general purpose operating systems. For -example, the ability to choose dm-crypt per deployment is quite useful; -not every site wants to pay the performance penalty. One can choose -LVM or not, etc. - -Where applicable, OSTree does take advantage of copy-on-write/reflink -features offered by the kernel for `/etc`. It uses the now generic -`ioctl(FICLONE)` and `copy_file_range()`. - -Another major distinction between the default OSTree usage and package managers -is whether updates are "online" or "offline" by default. The default OSTree -design writes updates into a new root, leaving the running system unchanged. -This means preparing updates is completely non-disruptive and safe - if the -system runs out of disk space in the middle, it's easy to recover. However, -there is work in the [rpm-ostree](https://github.com/projectatomic/rpm-ostree/) -project to support online updates as well. - -OSTree supports using "bare-user" repositories, which do not require -root to use. Using a filesystem-level layer without root is more -difficult and would likely require a setuid helper or privileged service. - -Finally, see the next portion around ChromiumOS for why a hybrid but -integrated package/image system improves on this. - -## ChromiumOS updater - -Many people who look at OSTree are most interested in using -it as an updater for embedded or fixed-purpose systems, similar to use cases -from the [ChromiumOS updater](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate). - -The ChromiumOS approach uses two partitions that are swapped via the -bootloader. It has a very network-efficient update protocol, using a -custom binary delta scheme between filesystem snapshots. - -This model even allows for switching filesystem types in an update. - -A major downside of this approach is that the OS size is doubled on -disk always. 
In contrast, OSTree uses plain Unix hardlinks, which -means it essentially only requires disk space proportional to the -changed files, plus some small fixed overhead. - -This means with OSTree, one can easily have more than two trees -(deployments). Another example is that the system OSTree repository -could *also* be used for application containers. - -Finally, the author of OSTree believes that what one really wants for -many cases is image replication *with* the ability to layer on some -additional components (e.g. packages) - a hybrid model. This is what -[rpm-ostree](https://github.com/projectatomic/rpm-ostree/) is aiming -to support. - -## Ubuntu Image Based Updates - -See . Very architecturally -similar to ChromeOS, although more interesting is discussion for -supporting package installation on top, similar to -[rpm-ostree package layering](https://github.com/projectatomic/rpm-ostree/pull/107). - -## Clear Linux Software update - -The -[Clear Linux Software update](https://clearlinux.org/features/software-update) -system is not very well documented. -[This mailing list post](https://lists.clearlinux.org/pipermail/dev/2016-January/000159.html) -has some reverse-engineered design documentation. - -Like OSTree static deltas, it also uses bsdiff for network efficiency. - -More information will be filled in here over time. The OSTree author -believes that at the moment, the "CL updater" is not truly atomic in -the sense that because it applies updates live, there is a window -where the OS root may be inconsistent. - -## casync - -The [systemd casync](https://github.com/systemd/casync) project is -relatively new. Currently, it is more of a storage library, and doesn't -support higher level logic for things like GPG signatures, versioning -information, etc. This is mostly the `OstreeRepo` layer. Moving up to -the `OstreeSysroot` level - things like managing the bootloader -configuration, and most importantly implementing correct merging for `/etc` -are missing. 
casync also is unaware of SELinux. - -OSTree is really today a shared library, and has been for quite some time. -This has made it easy to build higher level projects such as -[rpm-ostree](https://github.com/projectatomic/rpm-ostree/) which has quite -a bit more, such as a DBus API and other projects consume that, such as -[Cockpit](http://cockpit-project.org/). - -A major issue with casync today is that it doesn't support garbage collection -on the server side. OSTree's GC works symmetrically on the server and client -side. - -Broadly speaking, casync is a twist on the dual partition approach, and -shares the general purpose disadvantages of those. - -## Mender.io - -[Mender.io](https://mender.io/) is another implementation of the dual -partition approach. - -## OLPC update - -OSTree is basically a generalization of olpc-update, except using -plain HTTP instead of rsync. OSTree has the notion of separate trees -that one can track independently or parallel install, while still -sharing storage via the hardlinked repository, whereas olpc-update -uses version numbers for a single OS. - -OSTree has built-in plain old HTTP replication which can be served -from a static webserver, whereas olpc-update uses `rsync` (more server -load, but more efficient on the network side). The OSTree solution to -improving network bandwidth consumption is via static deltas. - -See -[this comment](http://blog.verbum.org/2013/08/26/ostree-v2013-6-released/#comment-1169) -for a comparison. - -## NixOS / Nix - -See [NixOS](http://nixos.org/). It was a very influential project for OSTree. -NixOS and OSTree both support the idea of independent "roots" that are bootable. - -In NixOS, files in a package are accessed by a path depending on the checksums -of package inputs (build dependencies) - see -[Nix store](http://nixos.org/nix/manual/#chap-package-management/). 
-However, OSTree uses a commit/deploy model - it isn't tied to any particular -directory layout, and you can put whatever data you want inside an OSTree, for -example the standard FHS layout. Both a positive and a negative of the Nix model -is that a change in the build dependencies (e.g. being built with a newer gcc) -requires a cascading rebuild of everything. It's good because it makes it easy -to do massive system-wide changes such as gcc upgrades, and allows installing -multiple versions of packages at once. However, a security update to e.g. glibc -forces a rebuild of everything from scratch, and so Nix is not practical at -scale. OSTree supports using a build system that just rebuilds individual -components (packages) as they change, without forcing a rebuild of their -dependencies. - -Nix automatically detects runtime package dependencies by scanning content for -hashes. OSTree supports only system-level images, and doesn't do dependency -management. Nix can store arbitrary files, using nix-store --add, but, more -commonly, paths are added as the result of running a derivation file generated -using the Nix language. OSTree is build-system agnostic; filesystem trees are -committed using a simple C API, and this is the only way to commit files. - -OSTree automatically shares the storage of identical data using hard links into -a content-addressed store. Nix can deduplicate using hard links as well, using -the auto-optimise-store option, but this is not on by default, and Nix does not -guarantee that all of its files are in the content-addressed store. OSTree -provides a git-like command line interface for browsing the content-addressed -store, while Nix does not have this functionality. - -Nix used to use the immutable bit to prevent modifications to /nix/store, but -now it uses a read-only bind mount. The bind mount can be privately remounted, -allowing per-process privileged write access.
OSTree uses the immutable -bit on the root of the deployment, and mounts /usr as read-only. - -NixOS supports switching OS images on-the-fly, by maintaining both booted-system -and current-system roots. It is not clear how well this approach works. OSTree -currently requires a reboot to switch images. - -Finally, NixOS supports installing user-specific packages from trusted -repositories without requiring root, using a trusted daemon. -[Flatpak](https://lwn.net/Articles/687909/), based on OSTree, similarly has a -policykit-based system helper that allows you to authenticate via polkit to -install into the system repository. - -## Solaris IPS - -See -[Solaris IPS](http://hub.opensolaris.org/bin/view/Project+pkg/). Broadly, -this is a design similar to a combination of BTRFS+RPM/deb. There -is a bootloader management system which combines with the snapshots. -It's relatively well thought through - however, it is a client-side -system assembly. If one wants to image servers and replicate -reliably, that'd be a different system. - -## Google servers (custom rsync-like approach, live updates) - -This paper talks about how Google was (at least at one point) managing -updates for the host systems for some servers: -[Live Upgrading Thousands of Servers from an Ancient Red Hat Distribution to 10 Year Newer Debian Based One (USENIX LISA 2013)](https://www.usenix.org/node/177348) - -## Conary - -See -[Conary Updates and Rollbacks](http://wiki.rpath.com/wiki/Conary:Updates_and_Rollbacks). If -rpm/dpkg are like CVS, Conary is closer to Subversion. It's not bad, -but e.g. its rollback model is rather ad-hoc and not atomic. It also -is a fully client side system and doesn't have an image-like -replication with deltas. - -## bmap - -See -[bmap](https://source.tizen.org/documentation/reference/bmaptool/introduction). -A tool for optimized copying of disk images. Intended for offline use, -so not directly comparable.
- -## Git - -Although OSTree has been called "Git for Binaries", and the two share the idea -of a hashed content store, the implementation details are quite different. -OSTree supports extended attributes and uses SHA256 instead of Git's SHA1. It -"checks out" files via hardlinks, rather than copying, and thus requires the -checkout to be immutable. At the moment, OSTree commits may have at most one -parent, as opposed to Git which allows an arbitrary number. Git uses a -smart-delta protocol for updates, while OSTree uses 1 HTTP request per changed -file, or can generate static deltas. - -## Conda - -[Conda](http://conda.pydata.org/docs/) is an "OS-agnostic, system-level binary -package manager and ecosystem"; although most well-known for its accompanying -Python distribution anaconda, its scope has been expanding quickly. The package -format is very similar to well-known ones such as RPM. However, unlike typical -RPMs, the packages are built to be relocatable. Also, the package manager runs -natively on Windows. Conda's main advantage is its ability to install -collections of packages into "environments" by unpacking them all to the same -directory. Conda reduces duplication across environments using hardlinks, -similar to OSTree's sharing between deployments (although Conda uses package / -file path instead of file hash). Overall, it is quite similar to rpm-ostree in -functionality and scope. - -## rpm-ostree - -This builds on top of ostree to support building RPMs into OSTree images, and -even composing RPMs on-the-fly using an overlay filesystem. It is being -developed by Fedora, Red Hat, and CentOS as part of Project Atomic. - -## GNOME Continuous - -This is a service that incrementally rebuilds and tests GNOME on every commit. -The need to make and distribute snapshots for this system was the original -inspiration for ostree. - -## Docker - -It makes sense to compare OSTree and Docker as far as *wire formats* -go. 
OSTree is not itself a container tool, but can be used as a -transport/storage format for container tools. - -Docker has (at the time of this writing) two format versions (v1 and -v2). v1 is deprecated, so we'll look at [format version 2](https://github.com/docker/docker/blob/master/image/spec/v1.1.md). - -A Docker image is a series of layers, and a layer is essentially JSON -metadata plus a tarball. The tarballs capture changes between layers, -including handling deleting files in higher layers. - -Because the payload format is just tar, Docker hence captures -(numeric) uid/gid and xattrs. - -This "layering" model is an interesting and powerful part of Docker, -allowing different images to reference a shared base. OSTree doesn't -implement this natively, but it's not difficult to implement in higher -level tools. For example in -[flatpak](https://github.com/flatpak/flatpak), there's a concept of a -SDK and runtime, and it would make a lot of sense for the SDK to -depend on the runtime, to avoid clients downloading data twice (even -if it's deduplicated on disk). - -That gets to an advantage of OSTree over Docker; OSTree checksums -individual files (not tarballs), and uses this for deduplication. -Docker (natively) only shares storage via layering. - -The biggest feature OSTree has over Docker though is support for -(static) deltas, and even without pre-configured static deltas, the -`archive` format has "natural" deltas. Particularly for a "base -operating system", one really wants on-wire deltas. It'd likely be -possible to extend Docker with this concept. - -A core challenge both share is around metadata (particularly signing) -and search/discovery (the ostree `summary` file doesn't scale very -well). 
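The per-file checksumming described above can be illustrated with a small shell sketch. It is only an illustration of the general idea and does not use OSTree itself: it assumes GNU coreutils (`sha256sum`), and the directory and file names (`layer-a`, `layer-b`, `libfoo.so`, `app`) are made up. It shows that hashing individual files exposes identical content across two trees, which a checksum over a whole tarball would not.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; nothing here touches a real OSTree repository.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/layer-a" "$work/layer-b"
printf 'shared library\n' > "$work/layer-a/libfoo.so"
printf 'shared library\n' > "$work/layer-b/libfoo.so"   # identical content
printf 'app v1\n' > "$work/layer-a/app"
printf 'app v2\n' > "$work/layer-b/app"                 # differs between trees

# Hash every file individually, then count the distinct content hashes.
total=$(( $(find "$work" -type f | wc -l) ))
unique=$(( $(find "$work" -type f -exec sha256sum {} + | awk '{print $1}' | sort -u | wc -l) ))

echo "files=$total unique=$unique"   # prints: files=4 unique=3
```

Only three distinct objects would need to be stored or transferred for the four files, because the two copies of `libfoo.so` hash to the same value.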
- -One major issue Docker has is that it [checksums compressed data](https://github.com/projectatomic/skopeo/issues/11), -and furthermore the tar format is flexible, with multiple ways to represent data, -making it hard to impossible to reassemble and verify from on-disk state. -The [tarsum](https://github.com/docker/docker/blob/master/pkg/tarsum/tarsum_spec.md) effort -was intended to address this, but it was not adopted in the end for v2. - -## Docker-related: Balena - -The [Balena](https://github.com/resin-os/balena) project forks Docker and aims -to even use Docker/OCI format for the root filesystem, and adds wire deltas -using librsync. See also [discussion on libostree-list](https://mail.gnome.org/archives/ostree-list/2017-December/msg00002.html). - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/repo.md b/docs/manual/repo.md deleted file mode 100644 index 8746163e..00000000 --- a/docs/manual/repo.md +++ /dev/null @@ -1,158 +0,0 @@ -# Anatomy of an OSTree repository - -## Core object types and data model - -OSTree is deeply inspired by git; the core layer is a userspace -content-addressed versioning filesystem. It is worth taking some time -to familiarize yourself with -[Git Internals](http://git-scm.com/book/en/Git-Internals), as this -section will assume some knowledge of how git works. - -Its object types are similar to git; it has commit objects and content -objects. Git has "tree" objects, whereas OSTree splits them into -"dirtree" and "dirmeta" objects. But unlike git, OSTree's checksums -are SHA256. And most crucially, its content objects include uid, gid, -and extended attributes (but still no timestamps). - -### Commit objects - -A commit object contains metadata such as a timestamp, a log -message, and most importantly, a reference to a -dirtree/dirmeta pair of checksums which describe the root -directory of the filesystem. 
-Also like git, each commit in OSTree can have a parent. It is -designed to store a history of your binary builds, just like git -stores a history of source control. However, OSTree also makes -it easy to delete data, under the assumption that you can -regenerate it from source code. - -### Dirtree objects - -A dirtree contains a sorted array of (filename, checksum) -pairs for content objects, and a second sorted array of -(filename, dirtree checksum, dirmeta checksum), which are -subdirectories. These types of objects are stored as files -ending with `.dirtree` in the objects directory. - -### Dirmeta objects - -In git, tree objects contain the metadata such as permissions -for their children. But OSTree splits this into a separate -object to avoid duplicating extended attribute listings. -These types of objects are stored as files ending with `.dirmeta` -in the objects directory. - -### Content objects - -Unlike the first three object types which are metadata, designed to be -`mmap()`ed, the content object has separate internal header and -payload sections. The header contains uid, gid, mode, and symbolic -link target (for symlinks), as well as extended attributes. After the -header, for regular files, the content follows. These parts together -form the SHA256 hash for content objects. The content type objects in -this format exist only in `archive` OSTree repositories. Today the -content part is gzip'ed and the objects are stored as files ending -with `.filez` in the objects directory. Because the SHA256 hash is -formed over the uncompressed content, these files do not match the -hash they are named as. - -The OSTree data format intentionally does not contain timestamps. The reasoning -is that data files may be downloaded at different times, and by different build -systems, and so will have different timestamps but identical physical content.
-These files may be large, so most users would like them to be shared, both in -the repository and between the repository and deployments. - -This could cause problems with programs that check if files are out-of-date by -comparing timestamps. For Git, the logical choice is to not mess with -timestamps, because unnecessary rebuilding is better than a broken tree. -However, OSTree has to hardlink files to check them out, and commits are assumed -to be internally consistent with no build steps needed. For this reason, OSTree -acts as though all timestamps are set to time_t 0, so that comparisons will be -considered up-to-date. Note that for a few releases, OSTree used 1 to fix -warnings such as GNU Tar emitting "implausibly old time stamp" with 0; however, -until we have a mechanism to transition cleanly to 1, for compatibility OSTree -has reverted to using zero again. - -# Repository types and locations - -Also unlike git, an OSTree repository can be in one of four separate -modes: `bare`, `bare-user`, `bare-user-only`, and `archive`. A bare repository is -one where content files are just stored as regular files; it's -designed to be the source of a "hardlink farm", where each operating -system checkout is merely links into it. If you want to store files -owned by e.g. root in this mode, you must run OSTree as root. - -The `bare-user` mode is a later addition that is like `bare` in that -files are unpacked, but it can (and should generally) be created as -non-root. In this mode, extended metadata such as owner uid, gid, and -extended attributes are stored in extended attributes under the name -`user.ostreemeta` but not actually applied. -The `bare-user` mode is useful for build systems that run as non-root -but want to generate root-owned content, as well as non-root container -systems. - -The `bare-user-only` mode is a variant of the `bare-user` mode. Unlike -`bare-user`, neither ownership nor extended attributes are stored.
These repos -are meant to be checked out in user mode (with the `-U` flag), where this -information is not applied anyway. Hence this mode may lose metadata. -The main advantage of `bare-user-only` is that repos can be stored on -filesystems which do not support extended attributes, such as tmpfs. - -In contrast, the `archive` mode is designed for serving via plain -HTTP. Like tar files, it can be read/written by non-root users. - -On an OSTree-deployed system, the "system repository" is `/ostree/repo`. It can -be read by any uid, but only written by root. The `ostree` command will by -default operate on the system repository; you may provide the `--repo` argument -to override this, or set the `$OSTREE_REPO` environment variable. - -## Refs - -Like git, OSTree uses the terminology "references" (abbreviated -"refs") which are text files that name (refer to) particular -commits. See the -[Git Documentation](https://git-scm.com/book/en/v2/Git-Internals-Git-References) -for information on how git uses them. Unlike git though, it doesn't -usually make sense to have a "master" branch. There is a convention -for references in OSTree that looks like this: -`exampleos/buildmaster/x86_64-runtime` and -`exampleos/buildmaster/x86_64-devel-debug`. These two refs point to -two different generated filesystem trees. In this example, the -"runtime" tree contains just enough to run a basic system, and -"devel-debug" contains all of the developer tools and debuginfo. - -The `ostree` tool supports a simple syntax using the caret `^` to refer to -the parent of a given commit. For example, -`exampleos/buildmaster/x86_64-runtime^` refers to the previous build, -and `exampleos/buildmaster/x86_64-runtime^^` refers to the one before -that. - -## The summary file - -A later addition to OSTree is the concept of a "summary" file, created -via the `ostree summary -u` command. This was introduced for a few -reasons.
A primary use case is to be compatible with -[Metalink](https://en.wikipedia.org/wiki/Metalink), which requires a -single file with a known checksum as a target. - -The summary file primarily contains two mappings: - - - A mapping of the refs and their checksums, equivalent to fetching - the ref file individually - - A list of all static deltas, along with their metadata checksums - -This currently means that it grows linearly with both items. On the -other hand, using the summary file, a client can enumerate branches. - -Further, fetching the summary file over e.g. pinned TLS creates a strong -end-to-end verification of the commit or static delta. - -The summary file can also be GPG signed (detached). This is currently -the only way to provide GPG signatures (transitively) on deltas. - -If a repository administrator creates a summary file, they must -thereafter run `ostree summary -u` to update it whenever a ref is -updated or a static delta is generated. - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/manual/repository-management.md b/docs/manual/repository-management.md deleted file mode 100644 index 77519fb9..00000000 --- a/docs/manual/repository-management.md +++ /dev/null @@ -1,245 +0,0 @@ -# Managing content in OSTree repositories - -Once you have a build system going, if you actually want client -systems to retrieve the content, you will quickly feel a need for -"repository management". - -The command line tool `ostree` does cover some core functionality, but -doesn't include very high level workflows. One reason is that how -content is delivered and managed has concerns very specific to the -organization. For example, some operating system content vendors may -want integration with a specific errata notification system when -generating commits. 
- -In this section, we will describe some high level ideas and methods -for managing content in OSTree repositories, mostly independent of any -particular model or tool. That said, there is an associated upstream -project [ostree-releng-scripts](https://github.com/ostreedev/ostree-releng-scripts) -which has some scripts that are intended to implement portions of -this document. - -Another example of software which can assist in managing OSTree -repositories today is the [Pulp Project](http://www.pulpproject.org/), -which has a -[Pulp OSTree plugin](https://docs.pulpproject.org/plugins/pulp_ostree/index.html). - -## Mirroring repositories - -It's very common to want to perform a full or partial mirror, in -particular across organizational boundaries (e.g. an upstream OS -provider, and a user that wants offline and faster access to the -content). OSTree supports both full and partial mirroring of the base -`archive` content, although not yet of static deltas. - -To create a mirror, first create an `archive` repository (you don't -need to run this as root), then add the upstream as a remote, then use -`pull --mirror`. - -``` -ostree --repo=repo init --mode=archive -ostree --repo=repo remote add exampleos https://exampleos.com/ostree/repo -ostree --repo=repo pull --mirror exampleos:exampleos/x86_64/standard -``` - -You can use the `--depth=-1` option to retrieve all history, or a -positive integer like `3` to retrieve just the last 3 commits. - -See also the `rsync-repos` script in -[ostree-releng-scripts](https://github.com/ostreedev/ostree-releng-scripts). - -## Separate development vs release repositories - -By default, OSTree accumulates server side history. This is actually -optional in that your build system can (using the API) write a commit -with no parent. But first, we'll investigate the ramifications of -server side history. - -Many content vendors will want to separate their internal development -with what is made public to the world. 
Therefore, you will want (at -least) two OSTree repositories, we'll call them "dev" and "prod". - -To phrase this another way, let's say you have a continuous delivery -system which is building from git and committing into your "dev" -OSTree repository. This might happen tens to hundreds of times per -day. That's a substantial amount of history over time, and it's -unlikely most of your content consumers (i.e. not developers/testers) -will be interested in all of it. - -The original vision of OSTree was to fulfill this "dev" role, and in -particular the "archive" format was designed for it. - -Then, what you'll want to do is promote content from "dev" to "prod". -We'll discuss this later, but first, let's talk about promotion -*inside* our "dev" repository. - -## Promoting content along OSTree branches - "buildmaster", "smoketested" - -Besides multiple repositories, OSTree also supports multiple branches -inside one repository, equivalent to git's branches. We saw in an -earlier section an example branch name like -`exampleos/x86_64/standard`. Choosing the branch name for your "prod" -repository is absolutely critical as client systems will reference it. -It becomes an important part of your face to the world, in the same -way the "master" branch in a git repository is. - -But with your "dev" repository internally, it can be very useful to -use OSTree's branching concepts to represent different stages in a -software delivery pipeline. - -Deriving from `exampleos/x86_64/standard`, let's say our "dev" -repository contains `exampleos/x86_64/buildmaster/standard`. We choose the -term "buildmaster" to represent something that came straight from git -master. It may not be tested very much. - -Our next step should be to hook up a testing system (Jenkins, -Buildbot, etc.) to this. When a build (commit) passes some tests, we -want to "promote" that commit. Let's create a new branch called -`smoketested` to say that some basic sanity checks pass on the -complete system. 
This might be where human testers get involved, for -example. - -A basic way to "promote" the `buildmaster` commit that passed -testing is like this: - -``` -ostree commit -b exampleos/x86_64/smoketested/standard -s 'Passed tests' --tree=ref=aec070645fe53... -``` - -Here we're generating a new commit object (perhaps including links to build -logs, etc. in the commit log), but we're reusing the *content* from the `buildmaster` -commit `aec070645fe53` that passed the smoketests. - -For a more sophisticated implementation of this model, see the -[do-release-tags](https://github.com/ostreedev/ostree-releng-scripts/blob/master/do-release-tags) -script, which includes support for things like propagating version -numbers across commit promotion. - -We can easily generalize this model to have an arbitrary number of -stages like `exampleos/x86_64/stage-1-pass/standard`, -`exampleos/x86_64/stage-2-pass/standard`, etc. depending on business -requirements and logic. - -In this suggested model, the "stages" are increasingly expensive. The -logic is that we don't want to spend substantial time on e.g. network -performance tests if something basic like a systemd unit file fails on -bootup. - - -## Promoting content between OSTree repositories - -Now, we have our internal continuous delivery stream flowing, it's -being tested and works. We want to periodically take the latest -commit on `exampleos/x86_64/stage-3-pass/standard` and expose it in -our "prod" repository as `exampleos/x86_64/standard`, with a much -smaller history. - -We'll have other business requirements such as writing release notes -(and potentially putting them in the OSTree commit message), etc. - -In [Build Systems](buildsystem-and-repos.md) we saw how the -`pull-local` command can be used to migrate content from the "build" -repository (in `bare-user` mode) into an `archive` repository for -serving to client systems.
- -Following this section, we now have three repositories, let's call -them `repo-build`, `repo-dev`, and `repo-prod`. We've been pulling -content from `repo-build` into `repo-dev` (which involves gzip -compression among other things since it is a format change). - -When using `pull-local` to migrate content between two `archive` -repositories, the binary content is taken unmodified. Let's go ahead -and generate a new commit in our prod repository: - -``` -checksum=$(ostree --repo=repo-dev rev-parse exampleos/x86_64/stage-3-pass/standard) -ostree --repo=repo-prod pull-local repo-dev ${checksum} -ostree --repo=repo-prod commit -b exampleos/x86_64/standard \ - -s 'Release 1.2.3' --add-metadata-string=version=1.2.3 \ - --tree=ref=${checksum} -``` - -There are a few things going on here. First, we found the latest -commit checksum for the "stage-3 dev", and told `pull-local` to copy -it, without using the branch name. We do this because we don't want -to expose the `exampleos/x86_64/stage-3-pass/standard` branch name in -our "prod" repository. - -Next, we generate a new commit in prod that's referencing the exact -binary content in dev. If the "dev" and "prod" repositories are on -the same Unix filesystem, (like git) OSTree will make use of hard -links to avoid copying any content at all - making the process very -fast. - -Another interesting thing to notice here is that we're adding a -`version` metadata string to the commit. This is an optional -piece of metadata, but we are encouraging its use in the OSTree -ecosystem of tools. Commands like `ostree admin status` show it by -default. - -## Derived data - static deltas and the summary file - -As discussed in [Formats](formats.md), the `archive` repository we -use for "prod" requires one HTTP fetch per client request by default. -If we're only performing a release e.g. once a week, it's appropriate -to use "static deltas" to speed up client updates.
- -So once we've used the above command to pull content from `repo-dev` -into `repo-prod`, let's generate a delta against the previous commit: - -``` -ostree --repo=repo-prod static-delta generate exampleos/x86_64/standard -``` - -We may also want to support client systems upgrading from *two* -commits previous. - -``` -ostree --repo=repo-prod static-delta generate --from=exampleos/x86_64/standard^^ --to=exampleos/x86_64/standard -``` - -Generating a full permutation of deltas across all prior versions can -get expensive, and there is some support in the OSTree core for static -deltas which "recurse" to a parent. This can help create a model -where clients download a chain of deltas. Support for this is not -fully implemented yet however. - -Regardless of whether or not you choose to generate static deltas, -you should update the summary file: - -``` -ostree --repo=repo-prod summary -u -``` - -(Remember, the `summary` command cannot be run concurrently, so this - should be triggered serially by other jobs). - -There is some more information on the design of the summary file in -[Repo](repo.md). - -## Pruning our build and dev repositories - -First, the OSTree author believes you should *not* use OSTree as a -"primary content store". The binaries in an OSTree repository should -be derived from a git repository. Your build system should record -proper metadata such as the configuration options used to generate the -build, and you should be able to rebuild it if necessary. Art assets -should be stored in a system that's designed for that -(e.g. [Git LFS](https://git-lfs.github.com/)). - -Another way to say this is that five years down the line, we are -unlikely to care about retaining the exact binaries from an OS build -on Wednesday afternoon three years ago. - -We want to save space and prune our "dev" repository. - -``` -ostree --repo=repo-dev prune --refs-only --keep-younger-than="6 months ago" -``` - -That will truncate the history older than 6 months. 
Deleted commits -will have "tombstone markers" added so that you know they were -explicitly deleted, but all content in them (that is not referenced by -a still retained commit) will be garbage collected. - -###### Licensing for this document: -`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/related-projects.md b/docs/related-projects.md new file mode 100644 index 00000000..7ddf043f --- /dev/null +++ b/docs/related-projects.md @@ -0,0 +1,366 @@ +--- +nav_order: 10 +--- + +# Related Projects +{: .no_toc } + +1. TOC +{:toc} + +OSTree is in many ways very evolutionary. It builds on concepts and +ideas introduced from many different projects such as +[Systemd Stateless](http://0pointer.net/blog/projects/stateless.html), +[Systemd Bootloader Spec](https://www.freedesktop.org/wiki/Specifications/BootLoaderSpec/), +[Chromium Autoupdate](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate), +the much older +[Fedora/Red Hat Stateless Project](https://fedoraproject.org/wiki/StatelessLinux), +[Linux VServer](http://linux-vserver.org/index.php?title=util-vserver:Vhashify&oldid=2285) +and many more. + +As mentioned elsewhere, OSTree is strongly influenced by package +manager designs as well. This page is not intended to be an +exhaustive list of such projects, but we will try to keep it up to +date, and relatively agnostic. + +Broadly speaking, projects in this area fall into two camps; either +a tool to snapshot systems on the client side (dpkg/rpm + BTRFS/LVM), +or a tool to compose on a server and replicate (ChromiumOS, Clear +Linux). OSTree is flexible enough to do both. + +Note that this section of the documentation is almost entirely +focused on the "ostree for host" model; the [flatpak](https://github.com/flatpak/flatpak/) +project uses libostree to store application data, distinct from the +host system management model. 
+ +## Combining dpkg/rpm + (BTRFS/LVM) + +In this approach, one uses a block/filesystem snapshot tool underneath +the system package manager. + +The +[oVirt Node imgbased](https://gerrit.ovirt.org/gitweb?p=imgbased.git) +tool is an example of this approach, as are a few others below. + +Regarding [BTRFS](https://btrfs.wiki.kernel.org/index.php/Main_Page) +in particular - the OSTree author believes that Linux storage is a +wide world, and while BTRFS is quite good, it is not everywhere now, +nor will it be in the near future. There are other recently developed +filesystems like [f2fs](https://en.wikipedia.org/wiki/F2FS), and Red +Hat Enterprise Linux still defaults to +[XFS](https://en.wikipedia.org/wiki/XFS). + +Using a snapshot tool underneath a package manager does help +significantly. In the rest of this text, we will use "BTRFS" as a +mostly generic tool for filesystem snapshots. + +The obvious thing to do is layer BTRFS under dpkg/rpm, and have a +separate subvolume for `/home` so rollbacks don't lose your data. See +e.g. [Fedora BTRFS Rollback Feature](http://fedoraproject.org/wiki/Features/SystemRollbackWithBtrfs). + +More generally, if you want to use BTRFS to roll back changes made by +dpkg/rpm, you have to carefully set up the partition layout so that +the files laid out by dpkg/rpm are installed in a subvolume to +snapshot. + +This problem in many ways is addressed by the changes OSTree forces, +such as putting all local state in `/var` (e.g. `/usr/local` -> +`/var/usrlocal`). Then one can BTRFS snapshot `/usr`. This gets pretty +far, except handling `/etc` is messy. This is something OSTree does +well. + +In general, if one really tries to flesh out the BTRFS approach, a +nontrivial middle layer of code between dpkg/rpm and BTRFS (or deep +awareness of BTRFS in dpkg/rpm itself) will be required. A good +example of this is the [snapper.io](http://snapper.io/) project. 
+ +The OSTree author believes that having total freedom at the block +storage layer is better for general purpose operating systems. For +example, the ability to choose dm-crypt per deployment is quite useful; +not every site wants to pay the performance penalty. One can choose +LVM or not, etc. + +Where applicable, OSTree does take advantage of copy-on-write/reflink +features offered by the kernel for `/etc`. It uses the now generic +`ioctl(FICLONE)` and `copy_file_range()`. + +Another major distinction between the default OSTree usage and package managers +is whether updates are "online" or "offline" by default. The default OSTree +design writes updates into a new root, leaving the running system unchanged. +This means preparing updates is completely non-disruptive and safe - if the +system runs out of disk space in the middle, it's easy to recover. However, +there is work in the [rpm-ostree](https://github.com/projectatomic/rpm-ostree/) +project to support online updates as well. + +OSTree supports using "bare-user" repositories, which do not require +root to use. Using a filesystem-level layer without root is more +difficult and would likely require a setuid helper or privileged service. + +Finally, see the next portion around ChromiumOS for why a hybrid but +integrated package/image system improves on this. + +## ChromiumOS updater + +Many people who look at OSTree are most interested in using +it as an updater for embedded or fixed-purpose systems, similar to use cases +from the [ChromiumOS updater](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate). + +The ChromiumOS approach uses two partitions that are swapped via the +bootloader. It has a very network-efficient update protocol, using a +custom binary delta scheme between filesystem snapshots. + +This model even allows for switching filesystem types in an update. + +A major downside of this approach is that the OS size is doubled on +disk always. 
In contrast, OSTree uses plain Unix hardlinks, which +means it essentially only requires disk space proportional to the +changed files, plus some small fixed overhead. + +This means with OSTree, one can easily have more than two trees +(deployments). Another example is that the system OSTree repository +could *also* be used for application containers. + +Finally, the author of OSTree believes that what one really wants for +many cases is image replication *with* the ability to layer on some +additional components (e.g. packages) - a hybrid model. This is what +[rpm-ostree](https://github.com/projectatomic/rpm-ostree/) is aiming +to support. + +## Ubuntu Image Based Updates + +See . Very architecturally +similar to ChromeOS, although more interesting is discussion for +supporting package installation on top, similar to +[rpm-ostree package layering](https://github.com/projectatomic/rpm-ostree/pull/107). + +## Clear Linux Software update + +The +[Clear Linux Software update](https://clearlinux.org/features/software-update) +system is not very well documented. +[This mailing list post](https://lists.clearlinux.org/pipermail/dev/2016-January/000159.html) +has some reverse-engineered design documentation. + +Like OSTree static deltas, it also uses bsdiff for network efficiency. + +More information will be filled in here over time. The OSTree author +believes that at the moment, the "CL updater" is not truly atomic in +the sense that because it applies updates live, there is a window +where the OS root may be inconsistent. + +## casync + +The [systemd casync](https://github.com/systemd/casync) project is +relatively new. Currently, it is more of a storage library, and doesn't +support higher level logic for things like GPG signatures, versioning +information, etc. This is mostly the `OstreeRepo` layer. Moving up to +the `OstreeSysroot` level - things like managing the bootloader +configuration, and most importantly implementing correct merging for `/etc` +are missing. 
casync also is unaware of SELinux. + +OSTree is really today a shared library, and has been for quite some time. +This has made it easy to build higher level projects such as +[rpm-ostree](https://github.com/projectatomic/rpm-ostree/) which has quite +a bit more, such as a DBus API and other projects consume that, such as +[Cockpit](http://cockpit-project.org/). + +A major issue with casync today is that it doesn't support garbage collection +on the server side. OSTree's GC works symmetrically on the server and client +side. + +Broadly speaking, casync is a twist on the dual partition approach, and +shares the general purpose disadvantages of those. + +## Mender.io + +[Mender.io](https://mender.io/) is another implementation of the dual +partition approach. + +## OLPC update + +OSTree is basically a generalization of olpc-update, except using +plain HTTP instead of rsync. OSTree has the notion of separate trees +that one can track independently or parallel install, while still +sharing storage via the hardlinked repository, whereas olpc-update +uses version numbers for a single OS. + +OSTree has built-in plain old HTTP replication which can be served +from a static webserver, whereas olpc-update uses `rsync` (more server +load, but more efficient on the network side). The OSTree solution to +improving network bandwidth consumption is via static deltas. + +See +[this comment](http://blog.verbum.org/2013/08/26/ostree-v2013-6-released/#comment-1169) +for a comparison. + +## NixOS / Nix + +See [NixOS](http://nixos.org/). It was a very influential project for OSTree. +NixOS and OSTree both support the idea of independent "roots" that are bootable. + +In NixOS, files in a package are accessed by a path depending on the checksums +of package inputs (build dependencies) - see +[Nix store](http://nixos.org/nix/manual/#chap-package-management/). 
+However, OSTree uses a commit/deploy model - it isn't tied to any particular +directory layout, and you can put whatever data you want inside an OSTree, for +example the standard FHS layout. Both a strength and a weakness of the Nix model +is that a change in the build dependencies (e.g. being built with a newer gcc) +requires a cascading rebuild of everything. It's good because it makes it easy +to do massive system-wide changes such as gcc upgrades, and allows installing +multiple versions of packages at once. However, a security update to e.g. glibc +forces a rebuild of everything from scratch, and so Nix is not practical at +scale. OSTree supports using a build system that just rebuilds individual +components (packages) as they change, without forcing a rebuild of their +dependencies. + +Nix automatically detects runtime package dependencies by scanning content for +hashes. OSTree supports only system-level images, and doesn't do dependency +management. Nix can store arbitrary files, using nix-store --add, but, more +commonly, paths are added as the result of running a derivation file generated +using the Nix language. OSTree is build-system agnostic; filesystem trees are +committed using a simple C API, and this is the only way to commit files. + +OSTree automatically shares the storage of identical data using hard links into +a content-addressed store. Nix can deduplicate using hard links as well, using +the auto-optimise-store option, but this is not on by default, and Nix does not +guarantee that all of its files are in the content-addressed store. OSTree +provides a git-like command line interface for browsing the content-addressed +store, while Nix does not have this functionality. + +Nix used to use the immutable bit to prevent modifications to /nix/store, but +now it uses a read-only bind mount. The bind mount can be privately remounted, +allowing per-process privileged write access.
OSTree uses the immutable +bit on the root of the deployment, and mounts /usr as read-only. + +NixOS supports switching OS images on-the-fly, by maintaining both booted-system +and current-system roots. It is not clear how well this approach works. OSTree +currently requires a reboot to switch images. + +Finally, NixOS supports installing user-specific packages from trusted +repositories without requiring root, using a trusted daemon. +[Flatpak](https://lwn.net/Articles/687909/), based on OSTree, similarly has a +policykit-based system helper that allows you to authenticate via polkit to +install into the system repository. + +## Solaris IPS + +See +[Solaris IPS](http://hub.opensolaris.org/bin/view/Project+pkg/). Broadly, +this is a similar design to a combination of BTRFS+RPM/deb. There +is a bootloader management system which combines with the snapshots. +It's relatively well thought through - however, it is a client-side +system assembly. If one wants to image servers and replicate +reliably, that'd be a different system. + +## Google servers (custom rsync-like approach, live updates) + +This paper talks about how Google was (at least at one point) managing +updates for the host systems for some servers: +[Live Upgrading Thousands of Servers from an Ancient Red Hat Distribution to 10 Year Newer Debian Based One (USENIX LISA 2013)](https://www.usenix.org/node/177348) + +## Conary + +See +[Conary Updates and Rollbacks](http://wiki.rpath.com/wiki/Conary:Updates_and_Rollbacks). If +rpm/dpkg are like CVS, Conary is closer to Subversion. It's not bad, +but e.g. its rollback model is rather ad-hoc and not atomic. It also +is a fully client side system and doesn't have an image-like +replication with deltas. + +## bmap + +See +[bmap](https://source.tizen.org/documentation/reference/bmaptool/introduction). +A tool for optimized copying of disk images. Intended for offline use, +so not directly comparable.
+ +## Git + +Although OSTree has been called "Git for Binaries", and the two share the idea +of a hashed content store, the implementation details are quite different. +OSTree supports extended attributes and uses SHA256 instead of Git's SHA1. It +"checks out" files via hardlinks, rather than copying, and thus requires the +checkout to be immutable. At the moment, OSTree commits may have at most one +parent, as opposed to Git which allows an arbitrary number. Git uses a +smart-delta protocol for updates, while OSTree uses 1 HTTP request per changed +file, or can generate static deltas. + +## Conda + +[Conda](http://conda.pydata.org/docs/) is an "OS-agnostic, system-level binary +package manager and ecosystem"; although most well-known for its accompanying +Python distribution anaconda, its scope has been expanding quickly. The package +format is very similar to well-known ones such as RPM. However, unlike typical +RPMs, the packages are built to be relocatable. Also, the package manager runs +natively on Windows. Conda's main advantage is its ability to install +collections of packages into "environments" by unpacking them all to the same +directory. Conda reduces duplication across environments using hardlinks, +similar to OSTree's sharing between deployments (although Conda uses package / +file path instead of file hash). Overall, it is quite similar to rpm-ostree in +functionality and scope. + +## rpm-ostree + +This builds on top of ostree to support building RPMs into OSTree images, and +even composing RPMs on-the-fly using an overlay filesystem. It is being +developed by Fedora, Red Hat, and CentOS as part of Project Atomic. + +## GNOME Continuous + +This is a service that incrementally rebuilds and tests GNOME on every commit. +The need to make and distribute snapshots for this system was the original +inspiration for ostree. + +## Docker + +It makes sense to compare OSTree and Docker as far as *wire formats* +go. 
OSTree is not itself a container tool, but can be used as a +transport/storage format for container tools. + +Docker has (at the time of this writing) two format versions (v1 and +v2). v1 is deprecated, so we'll look at [format version 2](https://github.com/docker/docker/blob/master/image/spec/v1.1.md). + +A Docker image is a series of layers, and a layer is essentially JSON +metadata plus a tarball. The tarballs capture changes between layers, +including handling deleting files in higher layers. + +Because the payload format is just tar, Docker hence captures +(numeric) uid/gid and xattrs. + +This "layering" model is an interesting and powerful part of Docker, +allowing different images to reference a shared base. OSTree doesn't +implement this natively, but it's not difficult to implement in higher +level tools. For example in +[flatpak](https://github.com/flatpak/flatpak), there's a concept of a +SDK and runtime, and it would make a lot of sense for the SDK to +depend on the runtime, to avoid clients downloading data twice (even +if it's deduplicated on disk). + +That gets to an advantage of OSTree over Docker; OSTree checksums +individual files (not tarballs), and uses this for deduplication. +Docker (natively) only shares storage via layering. + +The biggest feature OSTree has over Docker though is support for +(static) deltas, and even without pre-configured static deltas, the +`archive` format has "natural" deltas. Particularly for a "base +operating system", one really wants on-wire deltas. It'd likely be +possible to extend Docker with this concept. + +A core challenge both share is around metadata (particularly signing) +and search/discovery (the ostree `summary` file doesn't scale very +well). 
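The per-file checksumming mentioned above can be illustrated with a toy shell sketch. It is only an approximation: real OSTree content checksums also cover uid, gid, mode and extended attributes, and the directory names used here (`tree-a`, `tree-b`, `objects`) are invented for the example. It assumes a POSIX shell with `sha256sum` available.

```shell
#!/bin/sh
# Toy content-addressed store. This is NOT OSTree's real object format,
# which also hashes uid, gid, mode and xattrs. All paths are hypothetical.
set -eu

work=$(mktemp -d)
store="$work/objects"
mkdir -p "$store" "$work/tree-a" "$work/tree-b"

# Two "images" that happen to ship one identical file.
printf 'same payload\n' > "$work/tree-a/libfoo.so"
printf 'same payload\n' > "$work/tree-b/libfoo.so"
printf 'only in b\n'    > "$work/tree-b/extra.conf"

# Store each file under its SHA256, then replace it with a hardlink
# into the store, so identical files share one inode.
for f in "$work"/tree-*/*; do
    sum=$(sha256sum "$f" | cut -d' ' -f1)
    obj="$store/$sum"
    [ -e "$obj" ] || cp "$f" "$obj"
    ln -f "$obj" "$f"
done

object_count=$(ls "$store" | wc -l | tr -d ' ')
echo "objects stored: $object_count"
```

Three files go in but only two objects are stored, because the two copies of `libfoo.so` hash to the same name. A tarball-per-layer format cannot share them unless both images reference the same layer.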
+ +One major issue Docker has is that it [checksums compressed data](https://github.com/projectatomic/skopeo/issues/11), +and furthermore the tar format is flexible, with multiple ways to represent data, +making it hard to impossible to reassemble and verify from on-disk state. +The [tarsum](https://github.com/docker/docker/blob/master/pkg/tarsum/tarsum_spec.md) effort +was intended to address this, but it was not adopted in the end for v2. + +## Docker-related: Balena + +The [Balena](https://github.com/resin-os/balena) project forks Docker and aims +to even use Docker/OCI format for the root filesystem, and adds wire deltas +using librsync. See also [discussion on libostree-list](https://mail.gnome.org/archives/ostree-list/2017-December/msg00002.html). + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/repo.md b/docs/repo.md new file mode 100644 index 00000000..5cc59bf1 --- /dev/null +++ b/docs/repo.md @@ -0,0 +1,166 @@ +--- +nav_order: 3 +--- + +# Anatomy of an OSTree repository +{: .no_toc } + +1. TOC +{:toc} + +## Core object types and data model + +OSTree is deeply inspired by git; the core layer is a userspace +content-addressed versioning filesystem. It is worth taking some time +to familiarize yourself with +[Git Internals](http://git-scm.com/book/en/Git-Internals), as this +section will assume some knowledge of how git works. + +Its object types are similar to git; it has commit objects and content +objects. Git has "tree" objects, whereas OSTree splits them into +"dirtree" and "dirmeta" objects. But unlike git, OSTree's checksums +are SHA256. And most crucially, its content objects include uid, gid, +and extended attributes (but still no timestamps). + +### Commit objects + +A commit object contains metadata such as a timestamp, a log +message, and most importantly, a reference to a +dirtree/dirmeta pair of checksums which describe the root +directory of the filesystem. 
+Also like git, each commit in OSTree can have a parent. It is +designed to store a history of your binary builds, just like git +stores a history of source control. However, OSTree also makes +it easy to delete data, under the assumption that you can +regenerate it from source code. + +### Dirtree objects + +A dirtree contains a sorted array of (filename, checksum) +pairs for content objects, and a second sorted array of +(filename, dirtree checksum, dirmeta checksum), which are +subdirectories. Objects of this type are stored as files +ending with `.dirtree` in the objects directory. + +### Dirmeta objects + +In git, tree objects contain the metadata such as permissions +for their children. But OSTree splits this into a separate +object to avoid duplicating extended attribute listings. +Objects of this type are stored as files ending with `.dirmeta` +in the objects directory. + +### Content objects + +Unlike the first three object types which are metadata, designed to be +`mmap()`ed, the content object has separate internal header and +payload sections. The header contains uid, gid, mode, and symbolic +link target (for symlinks), as well as extended attributes. After the +header, for regular files, the content follows. These parts together +form the SHA256 hash for content objects. The content type objects in +this format exist only in `archive` OSTree repositories. Today the +content part is gzip'ed and the objects are stored as files ending +with `.filez` in the objects directory. Because the SHA256 hash is +formed over the uncompressed content, these files do not match the +hash they are named as. + +The OSTree data format intentionally does not contain timestamps. The reasoning +is that data files may be downloaded at different times, and by different build +systems, and so will have different timestamps but identical physical content.
+These files may be large, so most users would like them to be shared, both in +the repository and between the repository and deployments. + +This could cause problems with programs that check if files are out-of-date by +comparing timestamps. For Git, the logical choice is to not mess with +timestamps, because unnecessary rebuilding is better than a broken tree. +However, OSTree has to hardlink files to check them out, and commits are assumed +to be internally consistent with no build steps needed. For this reason, OSTree +acts as though all timestamps are set to time_t 0, so that comparisons will be +considered up-to-date. Note that for a few releases, OSTree used 1 to fix +warnings such as GNU Tar emitting "implausibly old time stamp" with 0; however, +until we have a mechanism to transition cleanly to 1, for compatibility OSTree +has reverted to using zero again. + +## Repository types and locations + +Also unlike git, an OSTree repository can be in one of four separate +modes: `bare`, `bare-user`, `bare-user-only`, and `archive`. A bare repository is +one where content files are just stored as regular files; it's +designed to be the source of a "hardlink farm", where each operating +system checkout is merely links into it. If you want to store files +owned by e.g. root in this mode, you must run OSTree as root. + +The `bare-user` mode is a later addition that is like `bare` in that +files are unpacked, but it can (and should generally) be created as +non-root. In this mode, extended metadata such as owner uid, gid, and +extended attributes are stored in extended attributes under the name +`user.ostreemeta` but not actually applied. +The `bare-user` mode is useful for build systems that run as non-root +but want to generate root-owned content, as well as non-root container +systems. + +The `bare-user-only` mode is a variant of the `bare-user` mode. Unlike +`bare-user`, neither ownership nor extended attributes are stored.
These repos +are meant to be checked out in user mode (with the `-U` flag), where this +information is not applied anyway. Hence this mode may lose metadata. +The main advantage of `bare-user-only` is that repos can be stored on +filesystems which do not support extended attributes, such as tmpfs. + +In contrast, the `archive` mode is designed for serving via plain +HTTP. Like tar files, it can be read/written by non-root users. + +On an OSTree-deployed system, the "system repository" is `/ostree/repo`. It can +be read by any uid, but only written by root. The `ostree` command will by +default operate on the system repository; you may provide the `--repo` argument +to override this, or set the `$OSTREE_REPO` environment variable. + +## Refs + +Like git, OSTree uses the terminology "references" (abbreviated +"refs") which are text files that name (refer to) particular +commits. See the +[Git Documentation](https://git-scm.com/book/en/v2/Git-Internals-Git-References) +for information on how git uses them. Unlike git though, it doesn't +usually make sense to have a "master" branch. There is a convention +for references in OSTree that looks like this: +`exampleos/buildmaster/x86_64-runtime` and +`exampleos/buildmaster/x86_64-devel-debug`. These two refs point to +two different generated filesystem trees. In this example, the +"runtime" tree contains just enough to run a basic system, and +"devel-debug" contains all of the developer tools and debuginfo. + +The `ostree` command supports a simple syntax using the caret `^` to refer to +the parent of a given commit. For example, +`exampleos/buildmaster/x86_64-runtime^` refers to the previous build, +and `exampleos/buildmaster/x86_64-runtime^^` refers to the one before +that. + +## The summary file + +A later addition to OSTree is the concept of a "summary" file, created +via the `ostree summary -u` command. This was introduced for a few +reasons.
A primary use case is to be compatible with +[Metalink](https://en.wikipedia.org/wiki/Metalink), which requires a +single file with a known checksum as a target. + +The summary file primarily contains two mappings: + + - A mapping of the refs and their checksums, equivalent to fetching + the ref file individually + - A list of all static deltas, along with their metadata checksums + +This currently means that it grows linearly with both items. On the +other hand, using the summary file, a client can enumerate branches. + +Further, fetching the summary file over e.g. pinned TLS creates a strong +end-to-end verification of the commit or static delta. + +The summary file can also be GPG signed (detached). This is currently +the only way to provide GPG signatures (transitively) on deltas. + +If a repository administrator creates a summary file, they must +thereafter run `ostree summary -u` to update it whenever a ref is +updated or a static delta is generated. + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)` diff --git a/docs/repository-management.md b/docs/repository-management.md new file mode 100644 index 00000000..11fe2f40 --- /dev/null +++ b/docs/repository-management.md @@ -0,0 +1,253 @@ +--- +nav_order: 9 +--- + +# Managing content in OSTree repositories +{: .no_toc } + +1. TOC +{:toc} + +Once you have a build system going, if you actually want client +systems to retrieve the content, you will quickly feel a need for +"repository management". + +The command line tool `ostree` does cover some core functionality, but +doesn't include very high level workflows. One reason is that how +content is delivered and managed has concerns very specific to the +organization. For example, some operating system content vendors may +want integration with a specific errata notification system when +generating commits. 
+ +In this section, we will describe some high level ideas and methods +for managing content in OSTree repositories, mostly independent of any +particular model or tool. That said, there is an associated upstream +project [ostree-releng-scripts](https://github.com/ostreedev/ostree-releng-scripts) +which has some scripts that are intended to implement portions of +this document. + +Another example of software which can assist in managing OSTree +repositories today is the [Pulp Project](http://www.pulpproject.org/), +which has a +[Pulp OSTree plugin](https://docs.pulpproject.org/plugins/pulp_ostree/index.html). + +## Mirroring repositories + +It's very common to want to perform a full or partial mirror, in +particular across organizational boundaries (e.g. an upstream OS +provider, and a user that wants offline and faster access to the +content). OSTree supports both full and partial mirroring of the base +`archive` content, although not yet of static deltas. + +To create a mirror, first create an `archive` repository (you don't +need to run this as root), then add the upstream as a remote, then use +`pull --mirror`. + +``` +ostree --repo=repo init --mode=archive +ostree --repo=repo remote add exampleos https://exampleos.com/ostree/repo +ostree --repo=repo pull --mirror exampleos:exampleos/x86_64/standard +``` + +You can use the `--depth=-1` option to retrieve all history, or a +positive integer like `3` to retrieve just the last 3 commits. + +See also the `rsync-repos` script in +[ostree-releng-scripts](https://github.com/ostreedev/ostree-releng-scripts). + +## Separate development vs release repositories + +By default, OSTree accumulates server side history. This is actually +optional in that your build system can (using the API) write a commit +with no parent. But first, we'll investigate the ramifications of +server side history. + +Many content vendors will want to separate their internal development +with what is made public to the world. 
Therefore, you will want (at +least) two OSTree repositories, we'll call them "dev" and "prod". + +To phrase this another way, let's say you have a continuous delivery +system which is building from git and committing into your "dev" +OSTree repository. This might happen tens to hundreds of times per +day. That's a substantial amount of history over time, and it's +unlikely most of your content consumers (i.e. not developers/testers) +will be interested in all of it. + +The original vision of OSTree was to fulfill this "dev" role, and in +particular the "archive" format was designed for it. + +Then, what you'll want to do is promote content from "dev" to "prod". +We'll discuss this later, but first, let's talk about promotion +*inside* our "dev" repository. + +## Promoting content along OSTree branches - "buildmaster", "smoketested" + +Besides multiple repositories, OSTree also supports multiple branches +inside one repository, equivalent to git's branches. We saw in an +earlier section an example branch name like +`exampleos/x86_64/standard`. Choosing the branch name for your "prod" +repository is absolutely critical as client systems will reference it. +It becomes an important part of your face to the world, in the same +way the "master" branch in a git repository is. + +But with your "dev" repository internally, it can be very useful to +use OSTree's branching concepts to represent different stages in a +software delivery pipeline. + +Deriving from `exampleos/x86_64/standard`, let's say our "dev" +repository contains `exampleos/x86_64/buildmaster/standard`. We choose the +term "buildmaster" to represent something that came straight from git +master. It may not be tested very much. + +Our next step should be to hook up a testing system (Jenkins, +Buildbot, etc.) to this. When a build (commit) passes some tests, we +want to "promote" that commit. Let's create a new branch called +`smoketested` to say that some basic sanity checks pass on the +complete system. 
This might be where human testers get involved, for +example. + +A basic way to "promote" the `buildmaster` commit that passed +testing looks like this: + +``` +ostree commit -b exampleos/x86_64/smoketested/standard -s 'Passed tests' --tree=ref=aec070645fe53... +``` + +Here we're generating a new commit object (perhaps including links to +build logs in the commit log), but we're reusing the *content* from the `buildmaster` +commit `aec070645fe53` that passed the smoketests. + +For a more sophisticated implementation of this model, see the +[do-release-tags](https://github.com/ostreedev/ostree-releng-scripts/blob/master/do-release-tags) +script, which includes support for things like propagating version +numbers across commit promotion. + +We can easily generalize this model to have an arbitrary number of +stages like `exampleos/x86_64/stage-1-pass/standard`, +`exampleos/x86_64/stage-2-pass/standard`, etc. depending on business +requirements and logic. + +In this suggested model, the "stages" are increasingly expensive. The +logic is that we don't want to spend substantial time on e.g. network +performance tests if something basic like a systemd unit file fails on +bootup. + + +## Promoting content between OSTree repositories + +Now that our internal continuous delivery stream is flowing, tested, +and working, we want to periodically take the latest +commit on `exampleos/x86_64/stage-3-pass/standard` and expose it in +our "prod" repository as `exampleos/x86_64/standard`, with a much +smaller history. + +We'll have other business requirements such as writing release notes +(and potentially putting them in the OSTree commit message), etc. + +In [Build Systems](buildsystem-and-repos.md) we saw how the +`pull-local` command can be used to migrate content from the "build" +repository (in `bare-user` mode) into an `archive` repository for +serving to client systems.
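As a reminder of what that migration step looks like, here is a minimal sketch. The repository paths `repo-build` and `repo-dev` and the ref are placeholders, the archive repository is assumed to exist already, and the `run` helper with its `DRY_RUN` switch is purely an illustration device so the command can be previewed without touching any repository.

```shell
#!/bin/sh
# Sketch only: the repository paths and the ref are hypothetical placeholders.
set -eu

BUILD_REPO=${BUILD_REPO:-repo-build}  # bare-user repository written by the build
DEV_REPO=${DEV_REPO:-repo-dev}        # existing archive repository served over HTTP
REF=${REF:-exampleos/x86_64/buildmaster/standard}
DRY_RUN=${DRY_RUN:-1}                 # 1: only print the command, 0: run it

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy one ref (and the objects it needs) from the build repo into the dev repo.
migrate_ref() {
    run ostree --repo="$DEV_REPO" pull-local "$BUILD_REPO" "$REF"
}

migrate_ref
```

Set `DRY_RUN=0` to actually perform the pull. Because the source is `bare-user` and the destination is `archive`, this step converts formats instead of copying files verbatim.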
+ +Following this section, we now have three repositories; let's call +them `repo-build`, `repo-dev`, and `repo-prod`. We've been pulling +content from `repo-build` into `repo-dev` (which involves gzip +compression among other things since it is a format change). + +When using `pull-local` to migrate content between two `archive` +repositories, the binary content is taken unmodified. Let's go ahead +and generate a new commit in our prod repository: + +``` +checksum=$(ostree --repo=repo-dev rev-parse exampleos/x86_64/stage-3-pass/standard) +ostree --repo=repo-prod pull-local repo-dev "${checksum}" +ostree --repo=repo-prod commit -b exampleos/x86_64/standard \ + -s 'Release 1.2.3' --add-metadata-string=version=1.2.3 \ + --tree=ref="${checksum}" +``` + +There are a few things going on here. First, we found the latest +commit checksum for the "stage-3 dev", and told `pull-local` to copy +it, without using the branch name. We do this because we don't want +to expose the `exampleos/x86_64/stage-3-pass/standard` branch name in +our "prod" repository. + +Next, we generate a new commit in prod that's referencing the exact +binary content in dev. If the "dev" and "prod" repositories are on +the same Unix filesystem, (like git) OSTree will make use of hard +links to avoid copying any content at all - making the process very +fast. + +Another interesting thing to notice here is that we're adding a +`version` metadata string to the commit. This is an optional +piece of metadata, but we are encouraging its use in the OSTree +ecosystem of tools. Commands like `ostree admin status` show it by +default. + +## Derived data - static deltas and the summary file + +As discussed in [Formats](formats.md), the `archive` repository we +use for "prod" requires one HTTP fetch per changed file by default. +If we're only performing a release e.g. once a week, it's appropriate +to use "static deltas" to speed up client updates.
+ +So once we've used the above command to pull content from `repo-dev` +into `repo-prod`, let's generate a delta against the previous commit: + +``` +ostree --repo=repo-prod static-delta generate exampleos/x86_64/standard +``` + +We may also want to support client systems upgrading from *two* +commits previous. + +``` +ostree --repo=repo-prod static-delta generate --from=exampleos/x86_64/standard^^ --to=exampleos/x86_64/standard +``` + +Generating a full permutation of deltas across all prior versions can +get expensive, and there is some support in the OSTree core for static +deltas which "recurse" to a parent. This can help create a model +where clients download a chain of deltas. Support for this is not +fully implemented yet however. + +Regardless of whether or not you choose to generate static deltas, +you should update the summary file: + +``` +ostree --repo=repo-prod summary -u +``` + +(Remember, the `summary` command cannot be run concurrently, so this + should be triggered serially by other jobs). + +There is some more information on the design of the summary file in +[Repo](repo.md). + +## Pruning our build and dev repositories + +First, the OSTree author believes you should *not* use OSTree as a +"primary content store". The binaries in an OSTree repository should +be derived from a git repository. Your build system should record +proper metadata such as the configuration options used to generate the +build, and you should be able to rebuild it if necessary. Art assets +should be stored in a system that's designed for that +(e.g. [Git LFS](https://git-lfs.github.com/)). + +Another way to say this is that five years down the line, we are +unlikely to care about retaining the exact binaries from an OS build +on Wednesday afternoon three years ago. + +We want to save space and prune our "dev" repository. + +``` +ostree --repo=repo-dev prune --refs-only --keep-younger-than="6 months ago" +``` + +That will truncate the history older than 6 months. 
Deleted commits +will have "tombstone markers" added so that you know they were +explicitly deleted, but all content in them (that is not referenced by +a still retained commit) will be garbage collected. + +###### Licensing for this document: +`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)`