
Checkpoint and restore Docker containers with CRIU - kimh
http://blog.circleci.com/checkpoint-and-restore-docker-container-with-criu/
======
616c
A very good interview with one of the CRIU devs on FLOSS Weekly.

[https://twit.tv/show/floss-weekly/334](https://twit.tv/show/floss-weekly/334)

[http://www.youtube.com/watch?v=w5ftqjOrpfA](http://www.youtube.com/watch?v=w5ftqjOrpfA)

Some of this stuff is crazy. Moving processes is cool, but faking TCP/IP state
between containers to basically transfer a TCP/IP stream from one machine to
the other without the client devices noticing? INSANE.
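For anyone curious what the TCP trick looks like from the command line, CRIU exposes it as the `--tcp-established` option, which has to be passed on both dump and restore. The minimal sketch below only builds the `criu dump` / `criu restore` argument lists; it does not run them by default, since CRIU normally needs root. The helper names and the images directory are hypothetical, and moving a live connection to another machine additionally assumes the destination host can take over the same IP address.

```python
import subprocess
from typing import List


def criu_dump_cmd(pid: int, images_dir: str, leave_running: bool = False) -> List[str]:
    """Build a `criu dump` command for the process tree rooted at `pid` (hypothetical helper)."""
    cmd = [
        "criu", "dump",
        "-t", str(pid),        # root of the process tree to checkpoint
        "-D", images_dir,      # directory that receives the image files
        "--tcp-established",   # also save the state of established TCP connections
    ]
    if leave_running:
        cmd.append("--leave-running")  # keep the original processes running after the dump
    return cmd


def criu_restore_cmd(images_dir: str) -> List[str]:
    """Build the matching `criu restore` command; the TCP option must be repeated here."""
    return ["criu", "restore", "-D", images_dir, "--tcp-established"]


def run(cmd: List[str]) -> None:
    """Execute a command, raising CalledProcessError on failure (needs CRIU installed, usually root)."""
    subprocess.run(cmd, check=True)
```

On the source host you would call `run(criu_dump_cmd(pid, "/tmp/ckpt"))`, copy the images directory to the destination, and call `run(criu_restore_cmd("/tmp/ckpt"))` there.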

I am a Linux enthusiast and a very novice *nix sysadmin. It is worth a listen,
even if you are not deep into containerization, because what they have
accomplished is impressive.

~~~
vezzy-fnord
EROS and KeyKOS accomplished the same OS-wide some 25 years ago, though they
were also built on a completely different process model. Grasshopper [1] was
another system from the same time period built around orthogonal persistence.

Bolting checkpointing on top of a monolithic Unix is harder, though doable.
DragonFly BSD has had a rather elegant and simple mechanism called
sys_checkpoint(2) since 2003, but it is still limited to single-threaded
programs.

[1] [http://www-systems.cs.st-andrews.ac.uk/gh/](http://www-systems.cs.st-andrews.ac.uk/gh/)

~~~
616c
> Bolting checkpointing on top of a monolithic Unix is harder

Indeed. This comes up in the interview: they wanted their patches mainlined
into Linux, and Linus (unusually, given all the jokes about his outbursts)
simply said they were insane or going for the impossible, and passed.

------
purplezky
Docker fork with CRIU support, known issues:

Currently, networking is broken in this PR.

Although it's implemented at the libcontainer level, the method used no longer
works since the introduction of libnetwork.

There are likely several networking-related issues to work out, like:

- ensuring IPs are reserved across daemon restarts
- ensuring port maps are reserved
- deciding how to deal with network resources in the "new container" model

------
marcosnils
I gave a talk about this a couple of months ago at a local meetup in
Argentina:
[https://www.youtube.com/watch?v=0ZKE5nJFDJ0](https://www.youtube.com/watch?v=0ZKE5nJFDJ0)

I also would like to share something I did for the Docker global hackday:
[https://github.com/marcosnils/cmt](https://github.com/marcosnils/cmt).

------
kimh
Btw, container migration will be supported natively by runC:
[https://github.com/opencontainers/runc/tree/master/libcontai...](https://github.com/opencontainers/runc/tree/master/libcontainer#checkpoint--restore)
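At the runC level the same operation is exposed through the `runc checkpoint` and `runc restore` subcommands. A minimal sketch, assuming a hypothetical container ID, image directory, and OCI bundle path, of the command lines a migration script might issue; it only builds the argument lists and does not execute anything.

```python
from typing import List


def runc_checkpoint_cmd(container_id: str, image_path: str, leave_running: bool = False) -> List[str]:
    """Build a `runc checkpoint` command that writes CRIU images to `image_path` (hypothetical helper)."""
    cmd = ["runc", "checkpoint", "--image-path", image_path]
    if leave_running:
        cmd.append("--leave-running")  # do not stop the container after checkpointing
    cmd.append(container_id)  # the container ID comes after all options
    return cmd


def runc_restore_cmd(container_id: str, image_path: str, bundle: str) -> List[str]:
    """Build a `runc restore` command; `bundle` is the container's OCI bundle directory."""
    return ["runc", "restore", "--image-path", image_path, "--bundle", bundle, container_id]
```

Keeping the container ID as the last argument mirrors the usual `runc <command> [options] <container-id>` shape.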

------
alfonsodev
What are the possible practical uses of this technology? I was thinking it
could be useful for taking "snapshots" of a system when a fatal exception is
raised, then downloading the files and performing a forensic analysis. Maybe
that's too much overhead? Other uses?

~~~
laurencerowe
Being able to use AWS spot instances for long running batch processes would be
great. We have some bioinformatics analysis jobs that take days to complete
and this could cut their cost by ~80%.

~~~
marcosnils
I'm interested in your use case, can you ping me at @marcosnils?

Thx!

