
OSTree – Robust OS upgrades for Linux - alexlarsson
http://blog.verbum.org/2013/08/26/ostree-v2013-6-released/
======
contingencies
The point that nobody seems to be making is that this degree of rigour is
essentially a requirement for some classes of systems, i.e. the OS platform
upon which you deploy applications becomes a named, versioned, tested entity
with its own repo and changelog, which can be rapidly provisioned at any of
its versioned states, rolled back, and have specific versions of specific
applications tested against it. This is not always required, but it is good
practice in all cases.

This is the part where I suggest the docker people broaden their scope to
include such... "devopsy" concerns around virt deployment. In the system we
use internally at my company we have this distinction: the platforms are
called PEs (platform environments) and the applications are SEs (service
environments). Combining both produces a SIN (service instance node) on a
particular CP (cloud provider).

While I really support docker (and its developers probably get annoyed at me
always commenting on their project and coming across as slightly critical), in
all honesty docker irritates me because it leaves all of this business out of
scope. But I think they are possibly also heading in this direction. :)

~~~
wmf
That stuff is being built by CoreOS, which is technically a different project,
but I suspect many people will end up using CoreOS and Docker together for
maximum DevOps. Since ChromeOS/CoreOS and OSTree have different philosophies,
it might be worth exploring both ways of doing it.

~~~
contingencies
Ahh yes, I forgot about that. CoreOS looks interesting but doesn't seem to
handle paravirt, which is a serious limitation if you need to run non-Linux
systems. Plus it's based on vagrant, which falls into the class of (IMHO)
misconstrued sysadmin-automation software I refer to as PFCTs (post-facto
configuration tinkerers). PFCT-based instantiation is, IMHO, an epic fail
versus cleaner methods such as cloning a blockstore, since it opens up large
classes of potential bugs that are otherwise avoidable.

Basically, it seems like CoreOS might replace Ubuntu as the default host
environment for the docker userbase; however, the issue of supporting more
exotic environments apparently isn't being tackled.

~~~
kanzure
> plus it's based on vagrant, which falls into the class of (IMHO)
> misconstrued sysadmin-automation software I refer to as PFCTs (post-facto
> configuration tinkerers).

Huh? I thought the point of vagrant was to just boot up a known image. Then
you have veewee which can build an image suitable for vagrant from kickstart
or whatever. Can you elaborate on your concerns? Thanks!

~~~
contingencies
There is a fallacy in here...

_Provisioners in Vagrant allow you to automatically install software, alter
configurations, and more on the machine as part of the vagrant up process._

_This is useful since boxes typically aren't built perfectly for your use
case. Of course, if you want to just use vagrant ssh and install the software
by hand, that works. But by using the provisioning systems built-in to
Vagrant, it automates the process so that it is repeatable._

It's the last part of the final sentence. You can't prove something is
repeatable if it's potentially accessing the internet, depending on date and
time logic, etc. Cloning an image is far more repeatable (and a known
quantity) than re-massaging one into existence.
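
To make that contrast concrete, here's a hedged sketch (package and image
names are hypothetical). A Vagrant shell provisioner pulls from the network at
"vagrant up" time, whereas cloning a prepared image yields the same bytes
every time:

    # Provisioning: re-massaging a box into existence; the result depends
    # on what the mirrors serve today. In a Vagrantfile:
    #   config.vm.provision "shell", inline: "apt-get update && apt-get install -y nginx"

    # Cloning: instantiate from a known-good, versioned image instead.
    cp --reflink=auto golden-v1.img node01.img    # copy-on-write file clone
    # or, with ZFS: zfs clone pool/golden@v1 pool/node01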

I term the class of (IMHO) misdirected systems-administration tools that
attempt the latter PFCTs (post-facto configuration tinkerers). Vagrant is at
the better end of these (at least it focuses on one-time instantiation versus
constant modification). Other tools such as puppet could be construed as
encouraging long-term modification without a clean reference at all, thus
potentially resulting in configuration drift.

It seems clear to me that this class of tool grew somewhat organically from
classical systems administration and is not the most rigorous of approaches in
this era of cheap virtualization and instant cloud-based provisioning.

------
binarycrusader
(disclaimer: I'm a developer on the Image Packaging System project:
[https://java.net/projects/ips/](https://java.net/projects/ips/))

I think there's been some editorialising of the title as I don't see that the
author of OSTree claims that it's a "robust" solution (alone).

With that caveat, without filesystem snapshot support, OSTree really isn't a
complete solution (as the original author points out).

On Solaris 11+, there are generally two update scenarios for package upgrades:

1) pkg update [name1 name2 ...]

      no packages have new or updated items tagged with
        reboot-needed=true
      pkg will create a zfs snapshot
      pkg will create a backup boot environment
      perform update in place on live root
      if update fails, will exit and tell admin name
        of snapshot so they can revert to it if desired;
        will also print name of backup BE
      if update succeeds, will destroy snapshot and exit

2) pkg update [name1 name2 ...]

      one or more packages have new or updated items tagged
        with reboot-needed=true
      pkg will create a zfs snapshot
      pkg will clone the live boot environment
      pkg will perform update on clone
      if update fails, will tell admin name
        of clone BE so they can inspect it and exit
      if update succeeds, will activate clone BE,
        destroy snapshot, and exit telling admin name
        of new clone

Put another way, on Solaris 11+, the "default practice is the best practice."
This is the advantage of integrating the package system with the native
features the OS itself supports.
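
For a sense of what scenario 2 looks like from the admin's side, here's a
hedged sketch (pkg, beadm, and init are real Solaris commands; the BE names
and behaviour shown in the comments are illustrative):

    $ pkg update        # items tagged reboot-needed=true, so pkg clones
                        # the live boot environment and updates the clone
    $ beadm list        # the new clone BE is marked active on reboot
    $ init 6            # reboot into the updated boot environment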

~~~
colinwalters
If you think OSTree _needs_ filesystem snapshot support, you aren't
understanding how it works. Or perhaps we don't have a shared definition of
"robust" - for me, assuming correct behaviour at the filesystem and block
layer, I believe deployment changes to be atomic.

How do you determine reboot-needed=true? Is that something assigned by the
package developer statically (i.e. the kernel is reboot-needed=true)? Or is it
determined dynamically at update time (e.g. in the package metadata for a
specific revision)?

Do you attempt to control for local configuration?

Is the X server package reboot-needed=true?

~~~
binarycrusader

      If you think OSTree needs filesystem snapshot support, you
      aren't understanding how it works. Or perhaps we don't
      have a shared definition of "robust" - for me, assuming
      correct behaviour at the filesystem and block layer, I
      believe deployment changes to be atomic.
    

While OSTree itself may not need filesystem snapshot support, for "robust OS
upgrades" (the original title of the ycombinator story) you do need some sort
of snapshotting mechanism. Whether that's in the form of ZFS-style snapshots
or boot environments is up to the implementer.
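
As a hedged illustration of the ZFS-style variant (the dataset name here is
hypothetical), the safety net is essentially:

    zfs snapshot rpool/ROOT/solaris@pre-update   # checkpoint before updating
    # ...perform the update; if it fails:
    zfs rollback rpool/ROOT/solaris@pre-update   # revert to the checkpoint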

Also, I'd argue that assuming correct operation at the filesystem and block
layer is a pretty big assumption :-) Even if the software itself works
correctly, time has proven that trusting the hardware is not a good thing.

With that said, yes, I think OSTree _needs_ filesystem snapshot support if
admins are to rely on it to provide "robust" OS upgrades.

    
    
      How do you determine reboot-needed=true? Is that
      something assigned by the package developer statically
      (i.e. kernel is reboot-needed=true)? Determined
      dynamically at update time (like in the package metadata
      for a specific revision?)
    

Currently, reboot-needed is determined by the package creator at the
individual item level by tagging individual "actions" in a package manifest.
It's a way for a package creator to say "this component can't be safely
updated on a live system, so force the creation and use of a boot environment
clone when updating it."
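
For illustration, a tagged action in a manifest might look roughly like this
(the path, owner, group, and mode are hypothetical; reboot-needed is the tag
described above):

    file path=kernel/drv/amd64/foo owner=root group=sys mode=0755 reboot-needed=true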

Further refinement of this functionality is planned for specific types of
actions found in a package manifest, such as "driver" actions. But it will be
handled by the package system transparently.

    
    
      Do you attempt to control for local configuration?
    

In what sense? pkg(5) allows an admin to force the creation and use of a clone
boot environment as part of any package-modifying operation.

If you're talking about service configuration, that's handled via SMF (the
service management facility) which has its own snapshot system that services
use.

If you're talking about legacy service configuration that uses flat files on-
disk, then package creators have the option of using service actuators to
trigger a restart or refresh of the service whenever the configuration file
changes.

You'd have to be more specific.

    
    
      Is the X server package reboot-needed=true?
    

No, because any files that the X server needs are already loaded into memory
so updates to the on-disk files won't generally affect it. Any changes made to
the X server binaries and libraries on-disk won't take effect until the X
server is restarted.

An incompatible change in the X server itself (such as a protocol change,
which is very rare) will likely come with a related kernel change which would
require a reboot. But again, the package creator is in control of that.

Also, the "reboot-neededed" tag doesn't force a reboot per se; it just forces
the update of a given component to be performed in a clone boot environment.

~~~
colinwalters
> With that said, yes, I think OSTree needs filesystem snapshot support

I'm happy with the fact that it allows admins to choose whatever they want at
the filesystem and block level, the same as dpkg/rpm do. For example in a
cloud environment, block level redundancy may be more easily provided at the
infrastructure level for guests.

For those who do need redundancy, you're free to choose BTRFS, XFS+LVM, or
hardware raid, whatever suits you.
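
For instance, a block-level safety net with LVM, taken independently of the
package system (volume names hypothetical):

    lvcreate --snapshot --name pre-upgrade --size 5G /dev/vg0/root
    # ...run the upgrade; if it goes wrong, merge the snapshot back
    # (merging into an in-use origin takes effect on the next activation):
    lvconvert --merge vg0/pre-upgrade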

As for the rest of your reply around the reboot-needed flag: it basically
sounds like it's fairly manual, but maybe that's "good enough". I have a
particular paranoia about race conditions though, so were I to design a system
that attempted to live-update applications, it'd be a whitelist, not a
blacklist as reboot-needed effectively is.

~~~
binarycrusader

      I'm happy with the fact that it allows admins to choose
      whatever they want at the filesystem and block level, the
      same as dpkg/rpm do. For example in a cloud environment,
      block level redundancy may be more easily provided at the
      infrastructure level for guests.
    

While it may provide admins with choice, it doesn't provide them with "robust
OS upgrades". Again, that's my only quibble here.

    
    
      For those who do need redundancy, you're free to choose
      BTRFS, XFS+LVM, or hardware raid, whatever suits you.
    

There's a large variance in those solutions that doesn't provide the same end-
user experience (or even result), and without integration with the package
system, doesn't really provide the sort of safety net most administrators are
looking for.

    
    
      As for the rest of your reply around the reboot-needed
      flag; it basically sounds like it's fairly manual, but
      maybe that's "good enough".
    

It's only "manual" in some sense for package creators; not administrators. For
administrators, it's a transparent decision about what's safe to update and
what's not.

As for actually being manual, not really. The package system provides package
creators with a tool called pkgmogrify, which allows them to set rules that
cause transformation of actions based on patterns at package publication time.
As an example, Solaris has a set of rules that say that, by default, any files
delivered to /kernel should be tagged with reboot-needed=true.
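
A rule of that shape might look something like the following (a sketch of
pkgmogrify transform syntax; the exact pattern is illustrative):

    <transform file path=kernel/.* -> default reboot-needed true>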

I won't claim it gets the OS 100% coverage, but as has been said before
"perfect is the enemy of good."

    
    
      I have a particular paranoia about race conditions though,
      so were I to design a system that attempted to live-update
      applications, it'd be a whitelist, not a blacklist
      as reboot-needed effectively is.
    

While I understand the paranoia, a whitelist is not actually a practical
option. The vast majority of content delivered onto a modern UNIX or UNIX-like
system doesn't require a reboot when being updated. man pages, header files,
and most userspace binaries and libraries can all be safely updated without a
reboot.

For example, on my desktop workstation, ~289,806 items have been installed by
the package system. Of those, only 857 have been determined to require a
reboot if they are updated.

That means that roughly 0.3% (yes, well under 1%) of the items on my system
have been determined to require updates to be performed in a clone boot
environment (i.e. to require a reboot).

For enterprise-level customers, minimising the number of reboots needed to
enact change is paramount. Any downtime at all can often cost them millions of
dollars.

------
aristidb
This would not be complete without a mention of Nix:
[http://nixos.org/](http://nixos.org/)

~~~
mst
Absolutely true; fortunately the article already provides one.

------
zokier
While I like the idea of atomic upgrades, the idea really should be made to
work without requiring a full reboot. The update procedure could be something
like the following (a rough sketch follows the list):

1) updater makes a private snapshot of the filesystem

2) updater writes its changes to the private snapshot while the rest of the
system gets served by unchanged FS

3) updater stops affected services

4) the updated snapshot is swapped in (atomically)

5) services are restarted
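
A minimal shell sketch of that flow, assuming Btrfs for the snapshot and that
the live tree is reached through a symlink (all paths, the updater, and the
unit name are hypothetical):

    btrfs subvolume snapshot /apps/v1 /apps/v2   # 1) private snapshot
    chroot /apps/v2 /usr/bin/apply-update        # 2) update the copy
    systemctl stop myapp.service                 # 3) stop affected services
    ln -s /apps/v2 /apps/live.new \
      && mv -T /apps/live.new /apps/live         # 4) atomic swap: repoint the
                                                 #    symlink via rename(2)
    systemctl start myapp.service                # 5) restart on the new tree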

~~~
wmf
A wise man once said there is no such thing as a bootable system, only systems
that have booted. Many OS updates will change something in the boot process
(maybe the kernel, maybe some init scripts, etc.) and you need to reboot to be
sure that stuff is still working. Likewise any change that would require the
user session to logout/login might as well force a full reboot.

~~~
contingencies
_Many OS updates will change something in the boot process (maybe the kernel,
maybe some init scripts, etc.) and you need to reboot to be sure that stuff is
still working_

That's precisely why a manual upgrade process should not exist for system
environments. Instead of live upgrades, you name, version, and test a newer,
known-good image of the environment, with the required applications on it, on
test infrastructure prior to final staging/deployment.

~~~
wmf
Keep in mind that some of this stuff is designed for near-zero-admin desktops,
not just the cloud.

~~~
contingencies
Fair point. Strange that anyone would bother, though; I always thought PXE
was the best desktop solution... maybe it's for often-offline nodes or
laptops.

~~~
wmf
Sure, PXE imaging is great for enterprise desktops but people don't use it at
home.

------
breakall
How does this relate to what I've been reading about lately with docker and
"containers"?

~~~
shykes
Docker applies similar concepts (change management at the file and directory
level, a chrootable filesystem as the unit of delivery, atomic and revertable
deployments), except it applies them 1 level higher in the stack.

Instead of rebooting machines into a new filesystem, docker spawns processes
directly into those filesystems, using the sandboxing capabilities of the
Linux kernel. So you get a much more powerful and flexible deployment
mechanism. But of course, you need a machine to exist in the first place,
which docker isn't designed to provide.

So OSTree is a good companion for Docker, because you need a machine to run
docker. A good workflow is probably: pack the bare minimum on a physical
machine, using ostree. Then put everything else into docker containers. That's
the approach of new "just enough" distros like CoreOS.
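
A hedged sketch of that workflow (ostree and docker are the real commands; the
image name is hypothetical):

    ostree admin upgrade            # update the minimal host OS tree
    systemctl reboot                # boot into the new deployment
    docker run -d example/webapp    # everything else runs as containers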

~~~
colinwalters
Right, this is a pretty good summary (that docker is 1 level above).

One fundamental difference is that OSTree has a custom serialization format
for trees (inspired by git), whereas from what I can tell from the code,
docker is just tar (I assume whatever the host /usr/bin/tar serializes to).
For example, OSTree explicitly supports extended attributes, so it can support
SELinux (and SMACK). Fedora ships a patched tar, but... the tar format is a
really serious mess.

I would further add though that OSTree does, providing the OS is compatible
with it, allow booting a separate "deployment" as a container. So for example
if you have Debian in /ostree/deploy/debian/90cd266 while you're booted into
/ostree/deploy/fedora/562d0a, you can easily just systemd-nspawn
/ostree/deploy/debian/90cd266 and boot the same OS as a container.
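
Spelled out as a command (the deployment path is from the example above; -D
selects the directory tree and -b boots an init inside it):

    systemd-nspawn -b -D /ostree/deploy/debian/90cd266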

But the emphasis right now of OSTree is indeed on bare metal deployments, and
I'd like to push hard to integrate with package systems.

~~~
shykes
> _you can easily just systemd-nspawn /ostree/deploy/debian/90cd266 and boot
> the same OS as a container_

Perhaps we should be looking at integrating ostree and docker then? :)

> _I'd like to push hard to integrate with package systems._

How would that integration work exactly?

~~~
colinwalters
> Perhaps we should be looking at integrating ostree and docker then? :)

It might make sense for docker to be able to store containers as OSTree
commits in addition to tarfiles. But I haven't used it myself. On the HTTP
side, OSTree may be less efficient or more efficient than what docker does on
the wire for updates; I don't know. Static deltas will help significantly.

> How would that integration work exactly?

This section describes it very briefly:
[https://people.gnome.org/~walters/ostree/doc/adapting-package-manager.html](https://people.gnome.org/~walters/ostree/doc/adapting-package-manager.html)

Basically this is something you can do on a build server _or_ on a client.

------
v0land
From [https://people.gnome.org/~walters/ostree/doc/ostree-package-comparison.html](https://people.gnome.org/~walters/ostree/doc/ostree-package-comparison.html):

> OSTree only supports recording and deploying complete (bootable) filesystem
> trees. It has no built-in knowledge of how a given filesystem tree was
> generated or the origin of individual files, or dependencies, descriptions
> of individual components.

Um-m, isn't this what snapshots (e.g. LVM snapshots) are for? Correct me if
I'm wrong.

~~~
cbhl
What he's saying is that if you had (say) a FS image with nginx, mysql, and
rails installed on it, the system has no idea of which files belong to mysql
and which files belong to rails. So you wouldn't be able to just take said
image and tell OSTree "remove mysql and replace it with postgresql".

Docker works in a really similar way, actually -- it creates file system
images and tars them up, and then uses AUFS to separate changes in each
container from the base image. Docker added metadata to describe how to build
images; if you want to change the components (AFAICT) the "right" way is to
change the metadata and then do a full rebuild.

~~~
shykes
> _Docker added metadata to describe how to build images; if you want to
> change the components (AFAICT) the "right" way is to change the metadata and
> then do a full rebuild._

Correct. And Docker implements caching of build layers, so you get the
_semantics_ of a full rebuild, but in practice are only rebuilding the
interesting parts.
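
For instance (a hedged sketch; the image name and packages are hypothetical,
with the Dockerfile shown as comments):

    # Dockerfile: the metadata describing how the image is built.
    #   FROM ubuntu
    #   RUN apt-get update && apt-get install -y nginx postgresql
    docker build -t example/app .   # unchanged layers come from the cache, so
                                    # the "full rebuild" is cheap in practice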

------
jaybuff
Reminds me of [http://camlistore.org/](http://camlistore.org/)

