--- /dev/null
+# OSTree data formats
+
+## On the topic of "smart servers"
+
+One crucial difference between OSTree and git is that git has a
+"smart server": even when fetching over `https://`, the server isn't
+just a static webserver, but one that e.g. dynamically computes and
+compresses pack files for each client.
+
+In contrast, the author of OSTree feels that for operating system
+updates, many deployments will want to use simple static webservers,
+the same target most package systems were designed to use. The
+primary advantages are security and compute efficiency. Services like
+Amazon S3 and CDNs are canonical targets, as is a stock static
+nginx server.
+
+## The archive-z2 format
+
+In the [repo](repo) section, the concept of objects was introduced:
+file/content objects are checksummed and managed individually
+(unlike a package system, which operates on compressed aggregates).
+
+The archive-z2 format simply zlib-compresses each content object;
+metadata objects are stored uncompressed. This makes it easy
+to serve via static HTTP.
+
+When you commit new content, you will see new `.filez` files appearing
+in `objects/`.
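+
+For instance, a minimal sketch (the repository path, branch name, and
+content directory here are hypothetical):
+
+```bash
+# Create an archive-z2 repository and commit a content tree into it
+$ ostree --repo=repo init --mode=archive-z2
+$ ostree --repo=repo commit --branch=exampleos/x86_64/standard --tree=dir=content/
+# Content objects are stored compressed with a .filez suffix
+$ find repo/objects -name '*.filez' | head
+```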
+
+## archive-z2 efficiency
+
+The advantages of `archive-z2`:
+
+ - It's easy to understand and implement
+ - Can be served directly over plain HTTP by a static webserver
+ - Clients can download/unpack updates incrementally
+ - Space efficient on the server
+
+The biggest disadvantage of this format is that for a client to
+perform an update, one HTTP request per changed file is required. In
+some scenarios, this actually isn't bad at all, particularly with
+techniques to reduce HTTP overhead, such as
+[HTTP/2](https://en.wikipedia.org/wiki/HTTP/2).
+
+In order to make this format work well, you should design your content
+such that large data that changes infrequently (e.g. graphic images)
+is stored separately from small, frequently changing data (application
+code).
+
+Other disadvantages of `archive-z2`:
+
+ - It's quite bad when clients are performing an initial pull (without HTTP/2)
+ - One doesn't know the total size (compressed or uncompressed) of content
+ before downloading everything
+
+## Aside: the bare and bare-user formats
+
+The most common operation is to pull from an `archive-z2` repository
+into a `bare` or `bare-user` formatted repository. These latter two
+are not compressed on disk. In other words, pulling to them is
+similar to unpacking (but not installing) an RPM/deb package.
+
+The `bare-user` format is a bit special in that the uid/gid and xattrs
+from the content are ignored. This is primarily useful if you want
+the same OSTree-managed content to be usable both on a host system
+and in an unprivileged container.
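+
+As a sketch of that common operation (the remote URL and ref are
+hypothetical, and `--no-gpg-verify` is only appropriate for unsigned
+test repositories):
+
+```bash
+# Pull an archive-z2 remote into an uncompressed bare-user repository
+$ ostree --repo=local init --mode=bare-user
+$ ostree --repo=local remote add --no-gpg-verify origin https://example.com/repo
+$ ostree --repo=local pull origin exampleos/x86_64/standard
+```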
+
+## Static deltas
+
+OSTree itself was originally focused on a continuous delivery model, where
+client systems are expected to update regularly. However, many OS vendors
+would like to supply content that's updated e.g. once a month or less often.
+
+For this model, we can do much better at supporting batched updates
+than a basic `archive-z2` repo allows. However, we still want to
+preserve the model of "static webserver only". Given this, OSTree has
+gained the concept of a "static delta".
+
+Each static delta is a delta between two specific commit objects, and
+may include "bsdiff" and "rsync-style" deltas within a content
+object. Static deltas also support `from NULL`, where the client can
+more efficiently download a commit object from scratch.
+
+Effectively, we're spending server-side storage (and a one-time
+compute cost), and gaining efficiency in client network bandwidth.
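+
+Deltas are generated on the server side; with a recent ostree,
+something like the following should work (the commit variables are
+placeholders; consult `ostree static-delta --help` for your version):
+
+```bash
+# Generate a delta between two existing commits in the repository
+$ ostree --repo=repo static-delta generate --from=$current --to=$new
+# A "from NULL" delta, for clients pulling the commit from scratch
+$ ostree --repo=repo static-delta generate --empty --to=$new
+# List the deltas the repository now carries
+$ ostree --repo=repo static-delta list
+```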
+
+## Static delta repository layout
+
+Since static deltas may not exist, the client first needs to attempt
+to locate one. Suppose a client wants to retrieve commit `${new}`
+while currently running `${current}`.
+
+The first thing to understand is that, in order to save space, these
+two commit checksums are written in "modified base64": standard
+base64 with the `/` character replaced by `_`, and without the
+trailing `=` padding.
+
+Like the commit objects, a "prefix directory" is used to make
+management easier for filesystem tools.
+
+A delta is named `$(mbase64 $from)-$(mbase64 $to)`, for example
+`GpTyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk`,
+which in sha256 format is
+`1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d-2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09`.
+
+Finally, the actual content can be found in
+`deltas/$fromprefix/$fromsuffix-$to`.
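+
+As an illustrative sketch, the encoding and the final path can be
+reproduced with standard shell tools (the `mbase64` helper here is
+hypothetical, not part of ostree):
+
+```bash
+# Modified base64 of a sha256 hex digest: base64 with '/' -> '_', no '=' padding
+mbase64() { echo -n "$1" | xxd -r -p | base64 | tr '/' '_' | tr -d '=' ; }
+
+from=1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d
+to=2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09
+from64=$(mbase64 $from)  # GpTyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0
+to64=$(mbase64 $to)
+# Prefix directory: the first two characters of the encoded "from" checksum
+echo "deltas/${from64:0:2}/${from64:2}-${to64}"
+```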
+
+## Static delta internal structure
+
+A delta is itself a directory. Inside, there is a file called
+`superblock` which contains metadata; the rest of the files are
+integer-named parts, each bearing a pack of content.
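+
+Using the example names above, a delta on disk with one part would
+look something like this (hypothetical listing):
+
+```bash
+$ find repo/deltas -type f
+repo/deltas/Gp/TyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk/superblock
+repo/deltas/Gp/TyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk/0
+```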
+
+The file format of static deltas should currently be considered an
+OSTree implementation detail. Obviously, nothing stops one from
+writing code which is compatible with OSTree today. However, we would
+like the flexibility to expand and change things, and having multiple
+codebases makes that more problematic. Please contact the authors
+with any requests.
+
+That said, one critical thing to understand about the design is that
+delta payloads are a bit more like "restricted programs" than they are
+raw data. There's a "compilation" phase which generates output that
+the client executes.
+
+This "updates as code" model allows for multiple content generation
+strategies. The design of this was inspired by that of Chromium:
+[http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate](ChromiumOS
+autoupdate).
+
+### The delta superblock
+
+The superblock contains:
+
+ - Arbitrary metadata
+ - Delta generation timestamp
+ - The new commit object
+ - An array of recursive deltas to apply
+ - An array of per-part metadata, including total object sizes (compressed and uncompressed)
+ - An array of fallback objects
+
+Let's define a delta part, then return to discuss details:
+
+## A delta part
+
+A delta part is a combination of a raw blob of data, plus a very
+restricted bytecode that operates on it. Say for example two files
+happen to share a common section. It's possible for the delta
+compilation to include that section once in the delta data blob, then
+generate instructions to write out that blob twice when generating
+both objects.
+
+Realistically though, it's very common for most of a delta to just be
+a "stream of new objects" - if one thinks about it, there shouldn't
+be much duplication inside operating system content at this level.
+
+So then, what's more interesting is that OSTree static deltas support
+a per-file delta algorithm called
+[bsdiff](https://github.com/mendsley/bsdiff) that most notably works
+well on executable code.
+
+The current delta compiler scans for files with matching basenames in
+each commit that have a similar size, and attempts a bsdiff between
+them. (It would make sense later to have a build system provide a
+hint for this - for example, files within the same package.)
+
+A generated bsdiff is included in the payload blob, and applying it is
+an instruction.
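+
+The compiler's use of bsdiff can be tuned at generation time; for
+example (the option names below match recent ostree builds, but
+verify against `ostree static-delta generate --help`):
+
+```bash
+# Only attempt bsdiff for input files below a size threshold (in MB)
+$ ostree --repo=repo static-delta generate --max-bsdiff-size=128 --from=$current --to=$new
+# Or disable per-file bsdiff entirely
+$ ostree --repo=repo static-delta generate --disable-bsdiff --from=$current --to=$new
+```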
+
+## Fallback objects
+
+It's possible for there to be large-ish files which might be resistant
+to bsdiff. A good example is that it's common for operating systems
+to use an "initramfs", which is itself a compressed filesystem. This
+"internal compression" defeats bsdiff analysis.
+
+For these types of objects, the delta superblock contains an array of
+"fallback objects". These objects aren't included in the delta
+parts - the client simply fetches them from the underlying `.filez`
+object.
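+
+The size threshold above which objects are treated as fallbacks can
+likewise be tuned when generating the delta (again, treat the option
+name as an assumption to verify against your ostree version):
+
+```bash
+# Objects above this uncompressed size (in MB) become plain .filez fetches
+$ ostree --repo=repo static-delta generate --min-fallback-size=8 --from=$current --to=$new
+```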
+++ /dev/null
-OSTree Static Object Deltas
-===========================
-
-Currently, OSTree's "archive-z2" mode stores both metadata and content
-objects as individual files in the filesystem. Content objects are
-zlib-compressed.
-
-The advantage of this is model are:
-
-0) It's easy to understand and implement
-1) Can be served directly over plain HTTP by a static webserver
-2) Space efficient on the server
-
-However, it can be inefficient both for large updates and small ones:
-
-0) For large tree changes (such as going from -runtime to
- -devel-debug, or major version upgrades), this can mean thousands
- and thousands of HTTP requests. The overhead for that is very
- large (until SPDY/HTTP2.0), and will be catastrophically bad if the
- webserver is not configured with KeepAlive.
-1) Small changes (typo in gnome-shell .js file) still require around
- 5 metadata HTTP requests, plus a redownload of the whole file.
-
-Why not smart servers?
-======================
-
-Smart servers (custom daemons, or just CGI scripts) as git has are not
-under consideration for this proposal. OSTree is designed for the
-same use case as GNU/Linux distribution package systems are, where
-content is served by a network of volunteer mirrors that will
-generally not run custom code.
-
-In particular, Amazon S3 style dumb content servers is a very
-important use case, as is being able to apply updates from static
-media like DVD-ROM.
-
-Finding Static Deltas
-=====================
-
-Since static deltas may not exist, the client first needs to attempt
-to locate one. Suppose a client wants to retrieve commit ${new} while
-currently running ${current}. The first thing to fetch is the delta
-metadata, called "meta". It can be found at
-${repo}/deltas/${current}-${new}/meta.
-
-FIXME: GPG signatures (.metameta?) Or include commit object in meta?
-But we would then be forced to verify the commit only after processing
-the entirety of the delta, which is dangerous. I think we need to
-require signing deltas.
-
-Delta Bytecode Format
-=====================
-
-A delta-part has the following form:
-
-byte compression-type (0 = none, 'g' = gzip')
-REPEAT[(varint size, delta-part-content)]
-
-delta-part-content:
- byte[] payload
- ARRAY[operation]
-
-The rationale for having delta-part is that it allows easy incremental
-resumption of downloads. The client can look at the delta descriptor
-and skip downloading delta-parts for which it already has the
-contained objects. This is better than simply resuming a gigantic
-file because if the client decides to fetch a slightly newer version,
-it's very probable that some of the downloading we've already done is
-still useful.
-
-For the actual delta payload, it comes as a stream of pair of
-(payload, operation) so that it can be processed while being
-decompressed.
-
-Finally, the delta-part-content is effectively a high level bytecode
-for a stack-oriented machine. It iterates on the array of objects in
-order. The following operations are available:
-
-FETCH
- Fall back to fetching the current object individually. Move
- to the next object.
-
-WRITE(array[(varint offset, varint length)])
- Write from current input target (default payload) to output.
-
-GUNZIP(array[(varint offset, varint length)])
- gunzip from current input target (default payload) to output.
-
-CLOSE
- Close the current output target, and proceed to the next; if the
- output object was a temporary, the output resets to the current
- object.
-
-# Change the input source to an object
-READOBJECT(csum object)
- Set object as current input target
-
-# Change the input source to payload
-READPAYLOAD
- Set payload as current input target
-
-Compiling Deltas
-================
-
-After reading the above, you may be wondering how we actually *make*
-these deltas. I envison a strategy similar to that employed by
-Chromium autoupdate:
-http://www.chromium.org/chromium-os/chromiumos-design-docs/autoupdate-details
-
-Something like this would be a useful initial algorithm:
-1) Compute the set of added objects NEW
-2) For each object in NEW:
- - Look for a the set of "superficially similar" objects in the
- previous tree, using heuristics based first on filename (including
- prefix), then on size. Call this set CANDIDATES.
- For each entry in CANDIDATES:
- - Try doing a bup/librsync style rolling checksum, and compute the
- list of changed blocks.
- - Try gzip-compressing it
-3) Choose the lowest cost method for each NEW object, and partition
- the program for each method into deltapart-sized chunks.
-
-However, there are many other possibilities, that could be used in a
-hybrid mode with the above. For example, we could try to find similar
-objects, and gzip them together. This would be a *very* useful
-strategy for things like the 9000 Boost headers which have massive
-amounts of redundant data.
-
-Notice too that the delta format supports falling back to retrieving
-individual objects. For cases like the initramfs which is compressed
-inside the tree with gzip, we're not going to find an efficient way to
-sync it, so the delta compiler should just fall back to fetching it
-individually.
-
-Which Deltas To Create?
-=======================
-
-Going back to the start, there are two cases to optimize for:
-
-1) Incremental upgrades between builds
-2) Major version upgrades
-
-A command line operation would look something like this:
-
-$ ostree --repo=/path/to/repo gendelta --ref-prefix=gnome-ostree/buildmaster/ --strategy=latest --depth=5
-
-This would tell ostree to generate deltas from each of the last 4
-commits to each ref (e.g. gnome-ostree/buildmaster/x86_64-runtime) to
-the latest commit. It might also be possible of course to have
---strategy=incremental where we generate a delta between each commit.
-I suspect that'd be something to do if one has a *lot* of disk space
-to spend, and there's a reason for clients to be fetching individual
-refs.
-
-$ ostree --repo=/path/to/repo gendelta --from=gnome-ostree/3.10/x86_64-runtime --to=gnome-ostree/buildmaster/x86_64-runtime
-
-This is an obvious one - generate a delta from the last stable release
-to the current development head.