

Remus: possible high availability of apps through replicating virtual machines - ktom
http://www.usenix.org/events/nsdi08/tech/full_papers/cully/cully_html/index.html

======
glymor
Brendan Cully's master's thesis has more detail:
[http://www.cs.ubc.ca/grads/resources/thesis/Nov07/Cully_Bren...](http://www.cs.ubc.ca/grads/resources/thesis/Nov07/Cully_Brendan.pdf)

This was also interesting: _"we believe that the high-frequency checkpointing
mechanism we have engineered in support of Remus will have many other
interesting applications, ranging from forensics and error recovery tools
based on replayable history to software engineering applications such as
concurrency-aware time-travelling debuggers."_

------
glymor
The problem with these systems is that they don't know which state is
significant, so they have to copy everything to the slave.

Remus gets round this by bulk copying (up to 40 times a second) rather than
copying on every change, so the master runs slightly ahead of the slave.
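
To make the bulk-copy idea concrete, here is a rough Python sketch. It is not Remus code: the `Primary`, `Backup`, and `run_epochs` names are made up for illustration, and a dict stands in for VM memory. The point is only that a location written many times within one epoch is copied once, and that the backup lags until the next checkpoint.

```python
class Primary:
    """Hypothetical primary: tracks which keys changed since the last checkpoint."""

    def __init__(self):
        self.state = {}
        self.dirty = set()

    def write(self, key, value):
        self.state[key] = value
        self.dirty.add(key)  # remember the location, not every intermediate value

    def checkpoint(self):
        # Bulk-copy only what changed during this epoch, then start a new epoch.
        delta = {k: self.state[k] for k in self.dirty}
        self.dirty.clear()
        return delta


class Backup:
    """Hypothetical backup: only ever sees whole checkpoints."""

    def __init__(self):
        self.state = {}

    def apply(self, delta):
        self.state.update(delta)


def run_epochs(writes_per_epoch):
    """Replay lists of (key, value) writes, checkpointing once per epoch."""
    primary, backup = Primary(), Backup()
    copied = 0
    for epoch in writes_per_epoch:
        for key, value in epoch:
            primary.write(key, value)  # primary runs ahead here
        delta = primary.checkpoint()
        backup.apply(delta)            # backup catches up here
        copied += len(delta)
    return primary.state, backup.state, copied
```

With two epochs containing four writes in total, three of them to the same key, only three values get copied. As I understand the paper, the real system also holds back outgoing network traffic until the matching checkpoint is acknowledged, which this sketch leaves out.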

Terracotta is something similar for the JVM. I think they get round it by
exploiting the fact that the JVM knows what's going on, so, for example, you
could say you want only one particular field of a class to be replicated. (But
I've never used Terracotta, so someone might have to correct me on that.)
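
Roughly what I have in mind by replicating a single field, as a toy Python sketch. This is definitely not Terracotta's API: the `Account` class, the `REPLICATED` set, and the list used as a replication log are all invented for illustration.

```python
class Account:
    """Toy object that forwards writes only for explicitly marked fields."""

    REPLICATED = {"balance"}  # hypothetical opt-in list of significant fields

    def __init__(self, sink):
        # Bypass our own __setattr__ so the sink itself is never replicated.
        object.__setattr__(self, "_sink", sink)
        self.balance = 0
        self.cache = None

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name in self.REPLICATED:
            # Only significant state is shipped to the replica.
            self._sink.append((name, value))


log = []
acct = Account(log)
acct.balance = 10     # recorded in the log
acct.cache = "tmp"    # local only, never recorded
```

The runtime knows which writes matter, so it can skip the rest. A hypervisor can't tell, so it has to ship every dirty page.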

------
hedgehog
I imagine the performance hit is pretty substantial, but for things like VoIP
or messaging servers this will make real HA possible on commodity hardware.
Pretty cool.

------
jacquesm
Someone should combine this with openMosix!

