Firecracker: Start a VM in less than a second (jvns.ca)
381 points by ArmandGrillet 40 days ago | 67 comments



Regarding performance, I wrote a bit about this a few months ago in a post comparing different workload isolation techniques:

https://fly.io/blog/sandboxing-and-workload-isolation/

Two useful links from my Pinboard research for that post:

A person at Red Hat optimizing QEMU boot time:

http://oirase.annexia.org/tmp/paper.pdf

An Intel deck talking about qemu-lite:

http://events17.linuxfoundation.org/sites/events/files/slide...

(The other thing to follow up on if you're interested in the background on this stuff is kvmtool).

In both cases, a big part of the answer seems to be eliminating BIOS overhead; getting rid of oproms appears to have been the single biggest win for Intel. But the Red Hat article also finds lots of overhead in QEMU itself, and both pieces talk about kernel config issues (for instance, scrubbing the kernel you boot of subsystems that have expensive initcalls).

By comparison: Firecracker is purpose-built in Rust for this one task, provides no BIOS, and offers only network, block, keyboard, and serial device support --- with tiny drivers (the serial support is less than 300 lines of code).


Regarding your first link I had questions about the following passages from it:

>"People like to say “Docker isn’t a security boundary”, but that’s not so true anymore, though it once was."

Could you explain why that wasn't true and why it is now?

>"Systems security people spent almost a decade dunking on Docker because of all the gaps in this simplified container model. But nobody really runs containers like this anymore."

Could you elaborate? People don't run containers like what? It isn't clear from the preceding paragraph.


Not OP, but most public clouds need stronger isolation than what Docker provides. GKE uses gVisor; AWS uses Firecracker now. When they started, Lambda code was scheduled on a different EC2 instance per customer. VM isolation is just so much stronger. Firecracker hopes to make launching VMs as quick as launching containers while retaining the great isolation benefits of a VM.


> ...GKE uses gVisor

Another commenter on this thread points out that GKE uses a trimmed-down Chromium OS? https://news.ycombinator.com/item?id=25884851

Edit: I guess container-optimized Chromium OS and gVisor are complementary (that is, you're welcome to run your app in gVisor sandboxed containers on a guest VM running Google's container-optimized OS). Ref: https://go-review.googlesource.com/c/playground/+/195983



Thanks. I remember mrkurt saying fly.io evaluated gVisor but eventually settled on Firecracker. I must ask, though: can gVisor on its own achieve the same level of protection as micro-VMs? Even if not, can the protection offered by application kernels like gVisor and Kata be considered enough for multi-tenant workloads (like the ones fly.io runs) inside the same guest (VM)?

Also, if I may, in what cases would one prefer to use Firecracker managed micro-VM running gVisor sandboxed containers? I'd imagine gVisor slows things down, so it might not be for everyone, but I'm curious as to what value-add might make gVisor worth that.


That's a tough question to answer! I spent some time on it in the blog post, and a lot more time that I didn't write about hacking on little bits of gVisor while researching that post. My take is, no, I don't think gVisor is as secure as Firecracker; it's way more ambitious and has a larger attack surface. But I buy that it is substantially better than OS-sandboxed container runtimes, and I would probably trust it for multitenant workloads.

(Kata, as I understand it, is mostly about microvm containers; I like Firecracker almost as much for its implementation, which I think is pretty gorgeous, as for the microvm design choice, but I like 'em both fine I guess. Also, there's a Firecracker-based Kata that people use, right?)


Modern Docker running in default configurations is reasonably well locked down. I wouldn't use it for multitenant scenarios, though.


Can you elaborate why? Would love to know where it falls behind and how to fix these without going the Firecracker route.


With containers, both the kernel and the hypervisor are shared. With vms, only the hypervisor is shared.

It's a matter of having a smaller attack surface. There are plenty of container images that run with root access by default, which is almost full access to the kernel. This means that if the application running in the container is compromised, you need to rely on the kernel enforcing the sandbox between containers. This is a relatively new threat (root not being fully trusted), so beyond there simply being more attack surface, there's likely to be more bugs/vulns out there to be discovered. With effort and care you can safely run this but reducing attack surface is a good idea for defense in depth.


If one only allows containers to run as a non-root user (no user namespace either), with all privileges dropped, strong mount isolation, and some form of syscall filtering, then the attack surface is similar to that of hypervisors, if not smaller, while the performance is significantly better.

But yes, quite a few services assume they have root privileges and do not work as-is in such containers, like recent OpenSSH. For those cases VM isolation makes for a much smaller attack surface.
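
For concreteness, here is a rough sketch of how the locked-down configuration described above might map onto `docker run` flags. The function name, the default UID/GID, and the choice of `--read-only` as a stand-in for "strong mount isolation" are my assumptions for illustration, not a vetted hardening recipe.

```python
from typing import List, Optional


def hardened_docker_run(image: str, uid: int = 1000, gid: int = 1000,
                        seccomp_profile: Optional[str] = None) -> List[str]:
    """Build (but do not execute) a `docker run` command line.

    Hypothetical helper: it only assembles flags that correspond to the
    restrictions listed in the comment above.
    """
    if uid == 0:
        raise ValueError("refusing to run the container as root")
    args = [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",               # non-root user
        "--cap-drop", "ALL",                    # drop every capability
        "--security-opt", "no-new-privileges",  # block setuid escalation
        "--read-only",                          # assumption: read-only root fs
    ]
    if seccomp_profile is not None:
        # Custom syscall filter; without this flag Docker applies its
        # default seccomp profile.
        args += ["--security-opt", f"seccomp={seccomp_profile}"]
    args.append(image)
    return args


if __name__ == "__main__":
    print(" ".join(hardened_docker_run("alpine:3")))
```

Whether this gets the attack surface down to hypervisor levels is exactly what the replies below dispute; the sketch only shows what the configuration looks like.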


The security models really aren't comparable. Again, see the blog post, which offers two examples of attacks that break the model you're proposing. I don't want to get into too much detail in this thread because I really just wanted to add some data to questions Julia Evans specifically asked in her post.


The blog post incorrectly states that Go is a memory-safe language. In its standard configuration, it is not. It also does not mention that hypervisor bugs that allow escaping from a VM are typically due to wrong logic, not memory safety. Using a memory-safe language does not protect against those. That in turn reflects the complexity of hypervisor interfaces in modern CPUs. And with a VM one gets much greater exposure to hardware bugs, as the VM has access to more instructions.


Homer-into-the-hedges.gif.


The blog post linked upthread goes into some detail about why; answering that question is kind of the point of the post.


Do you know libkrun? github.com/containers/libkrun

"libkrun is a dynamic library that allows programs to easily acquire the ability to run processes in a partially isolated environment using KVM Virtualization."


Another incredible blog post from Julia Evans.

Julia if you're reading, I'm a big fan. One request: could you add the date/time to the post, preferably near the title?

I do see a `<time>` tag in the HTML, but it doesn't render any human readable text. The datetime is also part of the URL which is "good enough," but it always takes me a minute to remember that and Ctrl+F won't find it.

I'm a little ADD about knowing when things were published, so far from an average person. If you like it the way it is, then don't worry about me. I just wanted to throw it out there.


+1 you're not alone; seconding the request -- and esp the kudos/praise for Julia, one of my "secret heroes"


On reading this, I went to see if the time tag renders in reader mode, assuming that’s part of why it’s there, but at least on my mobile Safari it surprisingly does not.

I’ll second the request. And also second that I’m a big fan of Julia Evans.


Totally agree but in the meantime, I just noticed that she has the dates on the main page: https://jvns.ca/


+1 on the ADD about timestamps :) and a +1 for Julia. Following her on Twitter and she's awesome!


>Another incredible blog post from Julia Evans.

Same same, everything grounded and then dug deep.


The blog post doesn't mention it but Firecracker was originally based on crosvm [0] built at Google by the ChromeOS team for a WSL-like Linux sandbox on top of ChromeOS running debian-buster containers viz. penguin (afaik) in a gentoo-based VM viz. termina [1]. crosvm in turn is part of a much bigger crostini project [2], which I find to be super fascinating, as it supports UI workloads (over Wayland and X).

If you're using ChromeOS to run any Linux app, you're using crostini, which launches those apps in crosvm-managed sandboxes (container inside a VM) in seconds. In the not-so-distant future, it looks like Android will sandbox platform workloads (running outside the Android framework?) managed by crosvm, to considerably improve security [3].

I looked for but couldn't find ChromeOS GCP instances. I mean, ChromeOS might be a great platform to run multi-tenant server workloads at this point.

[0] https://archive.is/T1ZNJ

[1] One can run custom Linux containers (other than the debian-based penguin) but not VMs (other than termina) at this point: https://chromeos.dev/en/linux/linux-on-chromeos-faq

[2] https://chromium.googlesource.com/chromiumos/docs/+/master/c...

[3] https://lwn.net/Articles/836693/


> ... launches those apps in crosvm sandboxes in seconds.

Technically true, but the initial VM launch is not all that quick. E.g., on a Pixelbook i7 launching a terminal session without the VM started, it takes about 20 seconds for the VM itself to initiate, and another 30 seconds for things like volume mounting and starting the container running Debian, for a total of 50 seconds.

Subsequent launches once the VM is started are much quicker, just a few seconds.

> I looked for but couldn't find ChromeOS GCP instances

You're looking for Container-Optimized OS: https://cloud.google.com/container-optimized-os/ , which is Chromium OS based.

It's the default OS for e.g. GKE nodes, so Google probably runs pretty large numbers of them.


Android has been increasing the amount of sandboxing around security-critical projects, so this might be the next step.

https://source.android.com/security/enhancements


> ... so this might be the next step.

Absolutely.

When I first heard about KVM on Android, I thought Android was going to run every app in its own micro-VM managed by crosvm... But that'd be too resource intensive for mobile devices, I think?

What they are instead doing with this project led by Will Deacon is more towards isolating non-Android workloads that OEMs run (like Radios): https://news.ycombinator.com/item?id=10905643, https://news.ycombinator.com/item?id=24109856, https://news.ycombinator.com/item?id=14859602?


The whole Java vs Android Java vs Kotlin puts me off, and how hard it is to be productive with the NDK (vs other platforms).

However, I really like all the security efforts they put into it (which is also a reason why the NDK is so constrained).

Thanks for the links.


And for some reason Crostini still isn't available on the majority of Chromebooks.


Even on mine it keeps breaking!


You can use QEMU snapshots to start a VM in under a second (but more than 125ms).

I used this in 2009 for an IRC bot that safely evaluated arbitrary shell commands for demonstration purposes, and a fork thereof is still chugging along to this day.

https://github.com/geirha/shbot


QEMU can start a VM quite quickly:

https://asciinema.org/a/jlTamarDTOVpO9wYAjXcnNZZt

For a generic build like you'll find in a Linux distro, QEMU startup is slow mostly because it links to dozens of shared libraries (103 on Fedora 33).
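
A minimal sketch of how one might reproduce that library count, assuming `ldd` is available and the binary is named `qemu-system-x86_64` (both assumptions; the exact counting method behind the 103 figure isn't stated here, and this version counts every line `ldd` prints, including the vDSO and the dynamic loader):

```python
import shutil
import subprocess
from typing import List


def parse_ldd(output: str) -> List[str]:
    """Return the library names listed in `ldd` output, one per line."""
    libs = []
    for line in output.splitlines():
        line = line.strip()
        # ldd prints one of these instead of a list for static binaries.
        if not line or line in ("not a dynamic executable", "statically linked"):
            continue
        # The first token is the library name (or path, for the loader).
        libs.append(line.split()[0])
    return libs


if __name__ == "__main__":
    qemu = shutil.which("qemu-system-x86_64")  # assumed binary name
    if qemu is None:
        print("qemu-system-x86_64 not found on PATH")
    else:
        result = subprocess.run(["ldd", qemu], capture_output=True,
                                text=True, check=True)
        print(len(parse_ldd(result.stdout)), "shared libraries")
```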


I wrote a similar post some weeks ago, going to similar depths to script launching a VM with Firecracker.

My main goal then was to provide the necessary automation to help with using cloud images for the VMs, so you can easily leverage a wide array of existing images. Most of the credit is due to cloud-init, which helps automate instance configuration after boot.

"Automation to run VMs based on vanilla Cloud Images on Firecracker": https://ongres.com/blog/automation-to-run-vms-based-on-vanil...


I just got this working in a Windows preview build too. If you enable KVM and rebuild the WSL2 kernel, then you can follow the Linux Firecracker demo on GitHub step by step and it works. I was able to launch 400 concurrent Firecracker VMs on my laptop in 60 sec.
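
For anyone who hasn't looked at the demo: it configures the VM by sending a few HTTP `PUT` requests to Firecracker's API over a Unix socket (the demo does this with `curl --unix-socket`). Below is a rough sketch of those request bodies. The function name, file paths, and boot arguments are placeholders I picked, so check them against the current Firecracker docs before relying on them.

```python
import json
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]


def firecracker_boot_requests(kernel_path: str, rootfs_path: str) -> List[Request]:
    """Return (method, path, JSON body) tuples to configure and start one VM.

    Hypothetical helper: it only builds the requests, it does not send them.
    """
    return [
        # Point the VM at an uncompressed kernel image and a kernel command line.
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        # Attach the root filesystem image as a block device.
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # Boot the configured microVM.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


if __name__ == "__main__":
    for method, path, body in firecracker_boot_requests("vmlinux", "rootfs.ext4"):
        print(method, path, json.dumps(body))
```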


60s total or each? I haven't tried WSL2 but I had to help people with _really_ bad Docker performance running in WSL (a PHP page load took around 60s)


Total. And it was like the first 200 were in 15 to 20 sec.


> Firecracker can start a VM in less than a second!

So this makes it comparable to containers when it comes to speed.

Is anybody (other than Amazon) using this instead of typical VMs in production?


Yep. At Fly.io, we run customer containers on our own hardware around the world --- the normal workflow just pushes Docker containers to our registry --- by converting them into root filesystems and running them as Firecracker instances.


What exactly is involved in converting them to root filesystems?


Have a look at: https://stackoverflow.com/questions/23436613/how-can-i-conve...

Or "slim": https://news.ycombinator.com/item?id=20182141

Typically, a docker filesystem doesn't include a proper unit (docker will inject "tiny init") - so you would probably have to add a kernel, and init, somehow.

I think I'd prefer to just create a vm, rather than re-use the pre-built docker.


Ed: a proper "init" not "unit"


The linked article contains a script building the file system.

It creates a block device using `qemu-img`, adds an empty filesystem using `mkfs.ext4` and then simply mounts it and copies in the files.
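
Roughly, those steps could be sketched as the command sequence below. This is my own illustration rather than the article's script: the function name, image size, and paths are assumptions, and the mount/copy steps need root when actually run.

```python
import subprocess
from typing import List


def rootfs_build_commands(src_dir: str, image: str, mount_point: str,
                          size: str = "800M") -> List[List[str]]:
    """Return the commands that turn a directory tree into an ext4 image."""
    return [
        # 1. Create an empty raw disk image of the requested size.
        ["qemu-img", "create", "-f", "raw", image, size],
        # 2. Put an empty ext4 filesystem on it (-F: target is a regular file).
        ["mkfs.ext4", "-F", image],
        # 3. Loop-mount the image so files can be copied in.
        ["mount", "-o", "loop", image, mount_point],
        # 4. Copy the directory contents, preserving ownership and modes.
        ["cp", "-a", f"{src_dir}/.", f"{mount_point}/"],
        # 5. Unmount; the image file is now a usable root drive.
        ["umount", mount_point],
    ]


def build_rootfs(src_dir: str, image: str, mount_point: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in rootfs_build_commands(src_dir, image, mount_point):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan instead of executing it.
    for cmd in rootfs_build_commands("app-root", "rootfs.ext4", "/mnt/rootfs"):
        print(" ".join(cmd))
```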

The previous posts cover this topic in more detail:

Day 43: Building VM images - https://jvns.ca/blog/2021/01/21/day-43--building-vm-images/

Day 44: Building my VMs with Docker - https://jvns.ca/blog/2021/01/22/day-44--got-some-vms-to-star...


It may not be what fly.io does but there's a walkthrough of the basic idea here https://iximiuz.com/en/posts/from-docker-container-to-bootab...


It's not a big secret or anything but it's changed recently and Jerome would do a better job of describing it than I would; apart from the filesystem stuff, we have an init we wrote in Rust that does a bunch of the lifting.


I’d love to know more about this as well. That’s the part usually nobody talks about.


We use containerd with the devmapper snapshotter! Works nicely.

We create a hard link to the resulting device for the root drive inside firecracker.


May I ask why you use firecracker, especially when you already have Docker images in your registry? Do you need root and/or a kernel in your containers for your application?


Because of multitenancy. It isn't safe to run jobs from different customers alongside each other in namespaced OS containers; instead, we give every customer instance its own VM and its own OS. This is the same model that Lambda and Fargate use (of course, that's what Amazon implemented Firecracker for).

I linked to a blog post we wrote about the rationales here, upthread.


Cool, thanks much. I'll find the link.


What is the difference between Firecracker vs LXC/LXD?


Linux containers are containers, not VMs. They are more like docker (although, lxd/lxc typically are used more like jails/VMs - a "full" user land, rather than just an application binary, like with a docker container wrapping a service implemented in go).

Technically, docker/lxc uses kernel namespaces to isolate a process tree - firecracker starts up a virtual machine.


When a VM context switch happens, the CPU uses extensions like Intel VMX to isolate the virtual machine code from the host code. Usually the hypervisor also forces a cache flush to mitigate CPU vulnerabilities as well.


VMs vs containers. One uses KVM under the hood, the other uses cgroups. Btw, you can run Firecracker VMs with container isolation on top.

https://github.com/firecracker-microvm/firecracker/blob/mast...


I'm surprised I hadn't encountered Firecracker until your post. Thanks for sharing.


Great post! :D Actually, Firecracker is so cool that we've used it to develop our own native alternative to Kubernetes based on running application containers as microVMs... https://opennebula.io/opennebula-kubernetes-comparing-two-co...


Re: if there’s something similar for macOS, it’s built-in. The new Virtualization framework in macOS 11 (Big Sur) is very very quick. User space with support for Virtio spec.


I read somewhere (?) that firecracker has poorer io performance than KVM/qemu. Is that still the case, or has it been fixed?


You can read about its IO performance against QEMU in their NSDI paper (table 8,9): https://www.usenix.org/system/files/nsdi20-paper-agache.pdf

Even though the startup times are fast, the IO performance is poor compared to native. This is primarily because of IO emulation. We've been working on a new hypervisor that can directly run isolated containers (no VMs). Email me if you are interested in learning more.


Hi can you post more info on this new hypervisor publicly? Who's involved?


We will post more publicly soon.


I am not so sure about that. Amazon uses Firecracker because of its performance and how lightweight it is. Qemu VMs used to be heavier and I think Qemu devs started a project to have a lightweight version like Firecracker recently.


Are VMs and containers different in this context? I’m confused about where Firecracker would be better than Docker for this usage.


Firecracker would be better from the security isolation point of view. It can use everything that Docker has to offer in the security domain and some more.

https://github.com/firecracker-microvm/firecracker/blob/mast...


This is where I believe we should have invested our efforts as an industry: making VMs faster, not designing and adopting containers.

If we can get VMs to launch even remotely as fast as containers, that possibly means the end of containers due to VMs offering far superior isolation.


I probably agree about container runtimes, but containers as a specification for a preconfigured unit of compute have lots of value outside the runtimes. We don't use namespaced containers at all at Fly.io, and run everything inside of jailed Firecrackers. But we get enormous value out of container tooling, and the most common and simplest way of deploying something to Fly is simply to have our tooling push a Docker image over to Fly's repo.

I think containers have been largely a force for good. They're not as much a force for good as some might claim, but that's a different argument than that they were a waste of time.


Does anyone know whether Firecracker can access a GPU?


Not currently, though there are some people working on that.

https://github.com/firecracker-microvm/firecracker/issues/11...


Anyone evaluated this for io (disk) performance? Any different from kvm?



