

Dynamic software updating in C - billconan
http://kitsune-dsu.com/

======
chwahoo
This was work done as part of my phd thesis. You can take the code and samples
that are here [https://github.com/kitsune-dsu](https://github.com/kitsune-dsu)
and play with them. However, I wouldn't consider what has been released ready
for production use. The papers were the main product of this research and are
your best resource if you're interested:
[http://www.cs.umd.edu/~hayden/papers/kitsune-
draft.pdf](http://www.cs.umd.edu/~hayden/papers/kitsune-draft.pdf)

I moved on (graduated!) from the project in 2012 and the code that has been
released is pretty much where I left it at that time. An undergrad
collaborator was doing neat work on updating Tor, so that code continued to
evolve a bit after I left.

I think there may still be folks at UMD working in some ways with Kitsune, but
I'm not up on the details.

~~~
billconan
I was thinking of using this to update our graphics software, but then I
realized I would need to back up GPU state (textures, for example) before the
update and restore it afterwards.

Lots of servers now use GPUs for deep learning, so maybe adding GPU support to
the framework would be a good feature.

~~~
mwhicks1
If such state is preserved while the process that uses it is still running,
then there is nothing to do: it will still be available to the updated
program.

~~~
billconan
I was thinking about memory-mapped resources, like memory-mapped I/O, and
resource references.

maybe you are right, there might be no problem. I need to read the paper to
understand better.

~~~
mwhicks1
Yes, memory mappings should be preserved, so I can't immediately think why it
wouldn't work.

------
jeffinhat
Seems interesting! It would be nice to have a diff between the base version
and the patch-ready version of the various programs (e.g., redis). It's hard
to quickly understand what integrating this into a large, existing piece of
software would involve.

That being said, I haven't read the paper so it may be clear as day there.

~~~
chwahoo
The paper describes the main changes that you have to make to make your
program updatable and is the best resource.

You can find all the modified programs (redis, etc) here:
[https://github.com/kitsune-dsu](https://github.com/kitsune-dsu)

~~~
mwhicks1
To add: The number of changes tends to be very small, as reported in the
paper. We are talking 100-300 LOC even for applications that are 100 KLOC. And
these changes are robust in the sense that once you retrofit to include them,
you rarely need to make further changes of that sort -- new versions will just
work.

------
patagonicus
Ah, Kitsune. I studied the paper for it, as well as Rubah, a DSU for Java,
quite a bit while working on my bachelor's thesis, which was implementing DSU
in pure Java. I was pretty surprised by how little code you needed to make it
work (for the one program I was testing on, at least ...).

Unfortunately, due to some failed prototypes and a general lack of time, I
barely managed to get the basics working; there are little to no tests, and
I'm pretty sure that when I handed everything in I knew of a couple of
critical bugs that hadn't been fixed.

The code isn't public right now, but if there's interest I may be able to make
it open source. Just have to talk to a few people first.

------
kozukumi
Very cool, this is one of the reasons I love Erlang and Java. Hot-patching is
so simple.

~~~
paulasmuth
Honest question: how can you even do this (replace the running binary/Java
code without closing file descriptors, etc.) in plain Java/on the JVM?
[Without resorting to tricks where some trampoline code that never gets
updated actually does the connection handling. I realize that's kind of what
the authors of the linked paper are doing too, but it's certainly possible to
do this "the old-fashioned way", without such hacks, in native code on Linux.]

~~~
paulasmuth
To answer my own question (though I'm not a Java expert, so probably talking
out of ignorance): this paper [1] argues there are only two methods to do this
on the stock JVM. One is putting in some trampoline code; the other is
patching the JVM itself. So it looks like the stock JVM doesn't actually
support restarting the program while keeping the old heap, FDs, etc. around.

[1]
[http://www.cs.umd.edu/~mwh/papers/rubah.pdf](http://www.cs.umd.edu/~mwh/papers/rubah.pdf)

~~~
mwhicks1
I was just about to point you to this paper (I'm a co-author), but you beat me
to it. The JVM has a fix-and-continue updating feature that permits replacing
method bodies as long as (a) the method is not running, and (b) it has the
same type as before. This approach is limiting and also slow, which led us to
use bytecode transformation instead. The Rubah implementation is pretty robust
at this point, at least for research software. To clarify your reference to
"trampoline code": you just need a way to "restart" your threads, as you do in
Kitsune. You don't have to insert trampolines for every updated method, as
some prior approaches require.

~~~
paulasmuth
Thanks for the clarification! I understand that you got rid of the
"trampoline"/dispatch code for calling methods by rewriting the respective
bytecode at runtime. But you still need to register file descriptors and such
with some Rubah runtime code, and that piece of Rubah runtime code doesn't get
replaced on process upgrade, right?

I guess this really doesn't matter from the user's point of view; just for the
sake of curiosity, I want to work out whether there is a fundamental
difference (FWIW) between the two approaches. What Rubah does, as I understand
it, is "hot-patch the code within the running process, and have some kind of
'trampoline'/dispatch code within that process to switch between the new and
old versions (whether as actual explicit code or implicit in the rewritten
bytecode), but never actually leave that original process". What you'd do in
native code on Linux is more like "tell the kernel to start a proper new
process and then pass all open FDs and program state to the new process"
(which obviously requires that process to be written in a way that allows it
to "continue where it left off"). And the latter is not possible when running
on the stock JVM, if I understand correctly, because the JVM doesn't allow you
to implement the part where you pass the file descriptors, right?

~~~
mwhicks1
There's no need to register file descriptors, since these stay open during the
upgrade -- the process identity doesn't change.

As far as a fundamental difference with the approach you propose: we actually
tried it with an earlier system called Ekiden. We found that updating by
migration was slower and more cumbersome than updating in place, but I don't
believe there were fundamental issues. If you look at the related work section
of the Kitsune paper, you'll see a nice comparison with all known approaches,
including Ekiden.

