
If you use Terminal on top of AWS (one deployment option), we can just migrate your workloads without rebooting.

The way it works is that you stream the RAM pages from one machine to another in real time, and once the two are almost synchronized you slam the IP address over to the new box. Then you let Amazon reboot your old box, and you can migrate back post-upgrade if you want to.
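That is the classic pre-copy migration loop. A minimal sketch, with dicts standing in for page tables and a callback standing in for the running workload (the names and the dirty-set model are illustrative, not Terminal's actual implementation):

```python
def precopy_migrate(src, dst, workload_step, threshold=4, max_rounds=50):
    """Pre-copy live migration, simulated with dicts as page tables.

    Copy dirty pages while the source keeps running; once the dirty set
    is small enough, pause the source, copy the remainder, and the IP
    address can be switched to the destination.
    """
    dirty = set(src)                    # first pass: every page is dirty
    for _ in range(max_rounds):
        for page in dirty:              # stream this round's dirty pages
            dst[page] = src[page]
        dirty = workload_step(src)      # source runs on, dirtying pages
        if len(dirty) <= threshold:     # small enough to stop-and-copy
            break
    for page in dirty:                  # final pause: copy the last pages
        dst[page] = src[page]
    return dst
```

In a real system the dirty set comes from the kernel's page-table dirty bits and the final pause lasts milliseconds, but the loop structure is the same.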

You can try it out on our public cloud at terminal.com if you'd like. We auto-migrate all of our customers off of degrading hardware before it reboots on our public cloud, but you can control that yourself if you're running Terminal as your own infrastructure.

... how?? That is seriously nifty.

Are you migrating just a process tree / other contained environment, or the entire machine?

Are you using CRIU or similar? Do open TCP connections survive the transfer?

We wrote a bunch of hacks to the Linux kernel to do it.

Custom container implementation, custom networking, custom storage.

It's just really good hardcore kernel engineering.

If you wanna talk more and you're in SF, come to our meetup on the 10th: machinelearningsf.eventbrite.com.

Edit: the whole machine including RAM cache, CPU instructions, IP connections, etc. is carried over. We can also resize your machine in seconds while it's running.

Is this somehow different from Xen Live Migration/VMware vMotion/etc.?

Yes. VMware vMotion and Xen Live Migration are both VM migration tools; what we move are containers, not VMs.

The difference is subtle, but important. VMs have overhead because they virtualize the kernel; containers don't (or rather, containers benefit from kernel performance much more than VMs do).

In other words, you can achieve the same thing with vMotion, but it's slower, has more overhead, and is harder to manage.

Ah, I didn't even know you were a container-based shop. So you're moving live containers between AWS-provided Xen VMs.

Or on bare metal. It works just the same on Xen, only slower because of the VM overhead.

Containers are really only part of the solution. There are a lot of other things you have to think about if you want to build a better mousetrap in the virtualization world, like networking and storage.

Wow, that's pretty sweet. Any plans for trying to get that upstream?

Don't know that we have plans around that at this time. I'll try to dig and see if I can flesh out our story around this.

With TCP_REPAIR, presumably they could... but both ends need to implement the REPAIR option, I think, so maybe not in practice yet.
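For reference, TCP_REPAIR (Linux 3.5+, the mechanism CRIU uses) only has to be supported on the host doing the checkpointing; the remote peer just sees the same connection continue. A rough sketch of the checkpoint side, with constants taken from linux/tcp.h since Python's socket module doesn't export them:

```python
import socket

# Constants from linux/tcp.h -- not exported by Python's socket module.
TCP_REPAIR = 19        # enter/leave repair mode (requires CAP_NET_ADMIN)
TCP_REPAIR_QUEUE = 20  # select which queue TCP_QUEUE_SEQ refers to
TCP_QUEUE_SEQ = 21     # read/write the selected queue's sequence number
TCP_SEND_QUEUE = 1

def checkpoint_seq(sock):
    """Read the send-queue sequence number of a live TCP connection.

    In repair mode the kernel exposes state that is normally internal
    (sequence numbers, queued data) without sending anything to the
    peer -- the first step of checkpointing a connection.
    """
    sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 1)   # EPERM w/o CAP_NET_ADMIN
    sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, TCP_SEND_QUEUE)
    seq = sock.getsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 0)   # back to normal
    return seq
```

On restore the dance runs in reverse: create a socket, enter repair mode, write the saved sequence numbers and queue contents, then connect(), which in repair mode rebuilds the connection state silently instead of performing a handshake.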

Or if the SDN of your cloud is good enough, even TCP_REPAIR might not be needed!

It's a custom SDN layer we wrote.

Are you running your VMs inside Amazon VMs? Or are you running containers instead, to avoid the overhead of having 3 nested OSs (the Xen host > the Amazon Xen guest > your VM)? If you run containers, how do you guarantee isolation of tenants (it is generally considered to be very difficult to achieve)?

We are running a custom container implementation. The goal of our implementation is containers that perform like VMWare.

Process isolation is hard, but we've achieved it. We currently have tens of thousands of users on our public cloud with zero container breakouts, and while no security is perfect, we're constantly trying to improve our offering through white-hat bounties and continuous security testing.

I can tell you heuristics from which you can infer security, but I can't blanket-label something as secure. I'd say I think it's the most secure new virtualization tech, but I'd also note that's a matter of personal opinion. Again, zero container breakouts is probably the main point.

You can run our virtualization inside of Amazon, in which case you only really have the pain of Xen host + Amazon Xen guest, but it performs faster on bare metal (as one might expect).
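Terminal's isolation layer is proprietary, but the generic Linux building blocks it would sit on are namespaces: a process unshares into fresh namespaces and gets its own view of PIDs, mounts, hostname, IPC, and network. A sketch (not Terminal's code) using unshare(2) via ctypes:

```python
import ctypes
import os

# Namespace flags from linux/sched.h
CLONE_NEWNS   = 0x00020000   # mount table
CLONE_NEWUTS  = 0x04000000   # hostname / domain name
CLONE_NEWIPC  = 0x08000000   # SysV IPC, POSIX message queues
CLONE_NEWUSER = 0x10000000   # UID/GID mappings; grants the rest unprivileged
CLONE_NEWPID  = 0x20000000   # process IDs (applies to children forked next)
CLONE_NEWNET  = 0x40000000   # interfaces, routes, firewall rules

_libc = ctypes.CDLL(None, use_errno=True)

def enter_container():
    """Detach the calling process into its own namespaces.

    Afterwards the process sees an empty network, a private mount
    table, and a fresh PID space for its children -- the skeleton
    that container runtimes build on.
    """
    flags = (CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWUTS |
             CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNET)
    if _libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Namespaces alone don't make a multi-tenant cloud safe, which is why the question above is a fair one: the hard part is everything layered on top (syscall filtering, resource limits, and the custom networking and storage mentioned earlier).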

Interesting! Thanks for the clarification.

How does it compare with OpenVZ, which is also able to live migrate containers?

I guess I have to use the kernel provided by you and cannot use a kernel of my choice?

Isn't your ability to "migrate workloads without rebooting" similar to Google Compute Engine transparent maintenance and to the live-update capability that Amazon is progressively deploying (which is explained in the post)?

How is it different from Xen or KVM live migration?

It's much faster and doesn't use VMs.

I guess it's much faster because you migrate containers instead of full VMs? (and also because your implementation is probably very good!)

It's faster because we made it fast. What's the difference between migrating a container with network, RAM, CPU and disk state and migrating a VM? IMHO, the difference is that the VM has a massive performance penalty and the container implementation does not.

It's not enough to just migrate the container; you need to migrate the network and the storage too. That's actually the harder part, IMHO. Everyone forgets about network and storage until it's time to go into production, and then it gets hard.

I don't see anything on your web page about running on top of AWS...? It looks like you guys only run your own cloud. Can you point me at some docs or anything about running on AWS?

I don't have docs yet because I haven't written them, but it's running on AWS right now.

It runs inside of any hypervisor or on bare metal.

Feel free to email me at josh[at]terminal[dot]com if you want to talk more. I can peel back the kimono quite far (we're also in SF if you wanna meet up).

Neat! I'll probably take you up on that. :-)

I was definitely impressed by the pulldocker err... now pullcontainer project and think it'd be great to see how you secure your containers and handle networking.
