> I spoke to the GitHub Actions engineering team, who told me that using an ephemeral VM and an immutable OS image would solve the concerns.
That doesn't solve them all. The main problem is secrets: if a job has access to an API token that can modify your code or access a cloud service, a PR can abuse that token to modify things it shouldn't. A second problem is that even if no secrets are exposed, a PR can run a crypto miner and waste your money. Finally, a self-hosted runner is a foothold in your private network and can be used for attacks, which Firecracker can help mitigate but never eliminate.
The best solution to these problems is: 1) don't allow repos to trigger your CI unless the user is trusted or the change has been reviewed, 2) always use least privilege and zero trust for all access (yes, even for dev services), 3) apply basic constraints by default to all running jobs to prevent misuse, and finally 4) provide strong isolation in addition to ephemeral environments.
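A minimal sketch of point 1, assuming a small gate that runs before a job is handed to a self-hosted runner. The GitHub REST endpoints used here exist; `should_run`, the token handling, and the surrounding dispatch plumbing are hypothetical:

```python
import os
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def is_collaborator(owner, repo, user):
    # GitHub returns 204 if the user is a collaborator, 404 otherwise.
    r = requests.get(f"{API}/repos/{owner}/{repo}/collaborators/{user}", headers=HEADERS)
    return r.status_code == 204

def has_approved_review(owner, repo, pr_number):
    r = requests.get(f"{API}/repos/{owner}/{repo}/pulls/{pr_number}/reviews", headers=HEADERS)
    r.raise_for_status()
    return any(review["state"] == "APPROVED" for review in r.json())

def should_run(owner, repo, pr_number, author):
    # Only hand the job to a self-hosted runner if the author is trusted
    # or the change has already been reviewed and approved.
    return is_collaborator(owner, repo, author) or has_approved_review(owner, repo, pr_number)
```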
Facebook had this exact problem recently with the pytorch repo. Their self hosted CI runners would run on all PRs, and it could leak all sorts of stuff.
It’s written specifically to host the Linux kernel, and doesn’t use a BIOS or a boot loader. If you backported that into another hypervisor, it would probably have to be something like “are we loading a compatible Linux? If so, switch to Firecracker mode”. But of course you can do that yourself, with a small shell script that either starts the traditional VM or Firecracker.
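A rough sketch of that "small shell script" idea, written in Python for illustration: if the image is an uncompressed ELF vmlinux (which is what Firecracker boots directly on x86_64), start Firecracker; otherwise fall back to a traditional QEMU VM. The paths, config file, and QEMU flags are placeholders:

```python
import subprocess
from pathlib import Path

def is_vmlinux(path):
    # Firecracker on x86_64 boots an uncompressed ELF vmlinux directly,
    # with no BIOS or boot loader, so check for the ELF magic bytes.
    with open(path, "rb") as f:
        return f.read(4) == b"\x7fELF"

def start_vm(kernel_or_image, firecracker_config, disk_image):
    if is_vmlinux(kernel_or_image):
        # Firecracker accepts a JSON machine config via --config-file.
        subprocess.run(["firecracker", "--config-file", str(firecracker_config)], check=True)
    else:
        # Fall back to a full VM with a BIOS/boot loader (flags illustrative).
        subprocess.run(
            ["qemu-system-x86_64", "-enable-kvm", "-m", "2048",
             "-drive", f"file={disk_image},format=raw"],
            check=True,
        )
```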
> The recommended way to trigger a guest-initiated shut down is by generating a triple-fault, which will cause the VM to initiate a reboot
Doesn’t that mean it can’t distinguish an intentional triple fault to trigger a reboot from an accidental triple fault caused by a guest kernel bug that corrupts the IDT? I think it would be better if there were some kind of call the guest could make to the hypervisor to reboot; one is less likely to invoke that service by accident than to triple fault by accident.
I'm used to QEMU VMs being slow and annoying to work with due to them being full VMs, so I was quite surprised to see that this is really just as fast as Firecracker!
Hi, I'd not heard of webapp.io before so thanks for mentioning it.
Actuated is not a preview branch product, that's an interesting area but not the problem we're trying to solve.
Actuated is not trying to be a CI system, or a replacement for one like webapp.io.
It's a direct integration with GitHub Actions, and as we get interest from pilot customers for GitLab etc., we'll consider adding support for those platforms too.
Unopinionated, without lock-in. We want to create the hosted experience, with safety and speed built in.
Hey, yeah this looks somewhat similar to what we're building at https://webapp.io (née LayerCI, YC S20)
We migrated to a fork of firecracker, but we're a fully hosted product that doesn't directly interact with GHA at all (similar to how CircleCI works), so there's some positioning difference between us and OP at the very least.
Something I've increasingly wondered is whether the model of CI where a totally pristine container (or VM) gets spun up for each change and each test set imposes a floor on how fast CI can run.
Each job will always have to run a clone, always pay the cost of either bootstrapping a toolchain or downloading a giant container with the toolchain, and always have to download a big remote cache.
If I had infinity time, I'd build a CI system that routed each job to a test runner that maintained some state (gasp!) about the build: most of the local build cache already downloaded, the source code cloned, and the toolchain bootstrapped.
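A sketch of that stateful-runner idea, assuming each runner keeps a persistent workspace between jobs; the paths and repo URL are placeholders:

```python
import subprocess
from pathlib import Path

WORKSPACE = Path("/var/ci/workspace/myrepo")   # persists across jobs (placeholder path)
REPO_URL = "https://github.com/example/myrepo.git"  # placeholder repo

def checkout(ref):
    if (WORKSPACE / ".git").exists():
        # Warm runner: fetch only what's new instead of re-cloning everything.
        subprocess.run(["git", "-C", str(WORKSPACE), "fetch", "origin", ref], check=True)
    else:
        # Cold runner: pay the full clone cost once.
        WORKSPACE.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", REPO_URL, str(WORKSPACE)], check=True)
        subprocess.run(["git", "-C", str(WORKSPACE), "fetch", "origin", ref], check=True)
    subprocess.run(["git", "-C", str(WORKSPACE), "checkout", "FETCH_HEAD"], check=True)
    # The build then runs incrementally on top of whatever cache and toolchain
    # the runner already has on disk.
```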
You'd love a service like that, until you have some weird stuff working in CI but not locally (or vice versa). That's why things are built from scratch all the time: to prevent any such issues from happening.
Npm was (still is?) famously bad at installing dependencies, where sometimes the fix is to remove node_modules and simply reinstall. Back when npm was more brittle (yes, that's possible) it was nearly impossible to maintain caches of node_modules directories, as they ended up being different from what you'd get by reinstalling with no existing node_modules directory.
I think Nix could be leveraged to resolve this. If the dependencies aren't perfectly matched it downloads the _different_ dependencies, but can reuse anything it has already downloaded locally.
So infra concerns are identical. Remove any state your application itself uses (clean slate, like a local DB), but your VM can functionally be persistent (perhaps you shut it off when not in use to reduce spend)?
I mean, given that my full build takes hours but my incremental build takes seconds--and given that my build system itself tends to only mess up the incremental build a few times a year (and mostly in ways I can predict), I'd totally be OK with "correctness once a day" or "correctness on demand" in exchange for having the CI feel like something that I can use constantly. It isn't like I am locally developing or testing with "correctness each and every time", no matter how cool that sounds: I'd get nothing done!
This really depends a lot on context and there's no right or wrong answer here.
If you're working on something safety critical you'll want correctness every time. For most things short of that it's a trade-off between risk, time, and money—each of which can be fungible depending on context.
A small change in a dependency essentially bubbles, or chains, to all dependent steps. I.e., a change to the fizzbuzz source means we must run the fizzbuzz tests. This cascades into your integration tests: we must run the integration tests that include fizzbuzz, but those now need all the other components involved. So it bubbles or chains out to all reverse dependencies (i.e., we need to build the bazqux service, since it is in the integration test with fizzbuzz), and now I'm building a large portion of my dependency graph.
And in practice, to keep the logic in CI reasonably simple … the answer is "build it all".
(If I had better content-aware builds, I could cache them: I could say, ah, bazqux's source hashes to $X, and we already have a build for that hash, excellent. In practice, this is really hard. It would work if all of bazqux were limited to some subtree, but inevitably one file decides to include some source from outside the spiritual root of bazqux, and now bazqux's hash is effectively "the entire tree", which by definition we've never built.)
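A sketch of that content-hash cache, under the optimistic assumption that each component's inputs really are confined to one subtree (which, as noted above, is exactly the hard part); the cache location and the `build` callback are made up:

```python
import hashlib
from pathlib import Path

CACHE = Path("/var/ci/artifact-cache")  # placeholder shared artifact store

def tree_hash(root):
    """Hash every file path + contents under root, in a stable order."""
    h = hashlib.sha256()
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            h.update(str(p.relative_to(root)).encode())
            h.update(p.read_bytes())
    return h.hexdigest()

def cached_build(component_dir, build):
    """Return a cached artifact for this source tree, building only on a miss."""
    component_dir = Path(component_dir)
    key = tree_hash(component_dir)
    artifact = CACHE / component_dir.name / key
    if not artifact.exists():
        artifact.parent.mkdir(parents=True, exist_ok=True)
        build(component_dir, artifact)   # user-supplied build step; hypothetical signature
    return artifact
```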
I work in games; our repository is ~100 GB (20m download) and a clean compile takes 2 hours on a 16-core machine with 32 GB of RAM (c6i.4xlarge, for any AWS friends). Actually building a runnable version of the game takes two clean compiles (one editor and one client) plus an asset-processing task that takes about another 2 hours from clean.
Our toolchain install takes about 30 minutes (although that includes making a snapshot of the EBS volume to make an AMI out of it).
That's ~7 hours for a clean build.
We have a somewhat better system than this - our base AMI contains the entire toolchain, and we do an initial clone on the AMI to get the bulk of the download done too. We store all the intermediates on a separate drive and we just mount it, build incrementally, and unmount again. Sometimes we end up with duplicated work, but overall it works pretty well. Our full builds are down from 7 hours (in theory) to about 30 minutes, including artifact deployments.
This is how CI systems have always behaved traditionally. Just install a Jenkins agent on any computer/VM and it will maintain a persistent workspace on disk for each job to reuse in incremental builds. There are countless other tools that work the same way. This also solves the problem of isolating builds, if your CI only checks out the code and then launches a constrained Docker container to execute the build. It can easily be extended to use persistent network disks and scaled-up workers, but that is usually not worth the cost.
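A sketch of that pattern, assuming the agent has already checked out the code into a persistent workspace and then runs the build in a throwaway, constrained container; the image name and resource limits are placeholders:

```python
import subprocess

WORKSPACE = "/var/jenkins/workspace/myjob"   # persists between runs (placeholder)

def run_build():
    subprocess.run([
        "docker", "run", "--rm",
        "--network=none",            # no network: the build uses the workspace's existing deps
        "--cpus=2", "--memory=4g",   # basic resource constraints on the job
        "-v", f"{WORKSPACE}:/src",   # mount the reused checkout and incremental build dir
        "-w", "/src",
        "build-image:latest",        # placeholder toolchain image
        "make", "build",
    ], check=True)
```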
It's baffling to see this new trend of YAML actions running on pristine workers, re-downloading the whole npm universe from scratch on every change, birthing hundreds of startups trying to "solve" CI by presenting solutions to non-problems and then wrapping things in even more layers of lock-in and micro-VMs, detaching you from the integration.
While Jenkins might not be the best tool in the world, the industry needs a wake-up call on how to simplify and keep in touch with reality, not hide behind layers of SaaS abstractions.
Agreed, this is more or less the inspiration behind Depot (https://depot.dev). Today it builds Docker images with this philosophy, but we'll be expanding to other more general inputs as well. Builds get routed to runner instances pre-configured to build as fast as possible, with local SSD cache and pre-installed toolchains, but without needing to set up any of that orchestration yourself.
- Watch which files are read (at the OS level) during each step, and snapshot the entire RAM/disk state of the MicroVM
- When you next push, just skip ahead to the latest snapshot
In practice this makes a generalized version of "cache keys" where you can snapshot the VM as it builds, and then restore the most appropriate snapshot for any given change.
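A sketch of that snapshot-selection logic, assuming the tooling has already recorded which files each step read; the data structures here are made up for illustration:

```python
import hashlib
from pathlib import Path

def file_digest(path):
    p = Path(path)
    if not p.exists():
        return None   # a deleted input invalidates any snapshot that read it
    return hashlib.sha256(p.read_bytes()).hexdigest()

class Snapshot:
    def __init__(self, snapshot_id, step, reads):
        self.snapshot_id = snapshot_id  # handle to the saved RAM/disk state (hypothetical)
        self.step = step                # how far through the pipeline this snapshot got
        self.reads = reads              # {relative path: digest} of files read up to this step

    def still_valid(self, workspace):
        # Resumable only if nothing the snapshotted steps read has changed.
        return all(file_digest(Path(workspace) / p) == d for p, d in self.reads.items())

def best_snapshot(snapshots, workspace):
    """Pick the furthest-along snapshot whose recorded inputs are unchanged."""
    valid = [s for s in snapshots if s.still_valid(workspace)]
    return max(valid, key=lambda s: s.step, default=None)
```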
I have zero experience with Bazel, but I believe it offers mechanisms similar to this? Or a mechanism that makes this "somewhat safe"?
Yes it does, but one should be warned that adopting Bazel isn't the lightest decision to make. But yeah, the CI experience is one of its best attributes.
We are using Bazel with GitHub self-hosted runners, and have consistently low build times with a growing codebase and test suite, as Bazel will only re-build and re-test what is affected by a change.
The CI experience, compared to e.g. doing naive caching of some directories with GitHub-managed runners, is amazing, and it's probably the most reliable build/test setup I've had. The most common failure we see in the build system itself (which is still rare, at ~once a week) is network issues with one of the package managers, rather than quirks introduced by one of the engineers (and there would be a straightforward path to preventing those failures, we just haven't bothered to set that up yet).
> Each job will always have to run a clone, always pay the cost of either bootstrapping a toolchain or downloading a giant container with the toolchain, and always have to download a big remote cache.
Couldn’t this be addressed if every node had a local caching proxy server container/VM, and all the other containers/VMs on the node used it for Git checkouts, image/package downloads, etc?
I'm using Buildkite, which lets me run the workers myself. These are long-lived Ubuntu systems set up with the same code we use in dev and production, running all the same software dependencies. Tests are fast and it works pretty nicely.
Self-hosted runners are brilliant, but have a poor security model for running containers or building them within a job. Whilst we're focusing on GitHub Actions at the moment, the same problems exist for GitLab CI, Drone, Bitbucket and Azure DevOps. We explain why in the FAQ (link in the post).
That is a benefit over DIND and socket sharing, however it doesn't allow for running containers or K8s itself within a job. Any tooling that depends on running "docker" (the CLI) will also break or need adapting.
Good article. Firecracker is something that has definitely piqued my interest when it comes to quickly spinning up a throwaway environment for either development or CI. I run a CI platform [1], which currently uses QEMU for the build environments (Docker is also supported but currently disabled on the hosted offering). Startup times are OK, but having a boot time of 1-2s is definitely highly appealing. I will have to investigate Firecracker further to see if I could incorporate it into what I'm doing.
Julia Evans has also written about Firecracker in the past [2][3].
Thanks for commenting, and your product looks cool btw.
Yeah a lot of people have talked about Firecracker in the past, that's why I focus on the pain and the problem being solved. The tech is cool, but it's not the only thing that matters.
People need to know that there are better alternatives to sharing a docker socket or using DIND with K8s runners.
Firecracker is nice but still very limited in what it can do.
My gripe with all CI systems is that, as an industry standard, we've universally sacrificed performance for hermeticity and re-entrancy, even when it doesn't really give us a practical advantage. Downloading and re-running containers and VMs, endlessly checking out code, and installing deps over and over is just a waste of time, even with caching, COW, and other optimizations.
> My gripe with all CI systems is that, as an industry standard, we've universally sacrificed performance for hermeticity and re-entrancy, even when it doesn't really give us a practical advantage.
The perceived practical advantage is the incremental confidence that the thing you built won't blow up in production.
> even with caching, COW, and other optimizations
Many CI systems do employ caching. For example, Circle.
Hermeticity is precisely what allows you to avoid endlessly downloading and building the same dependencies. Without hermeticity you can't rely on caching.
I feel like 90% of the computer industry is ignoring the lessons of Bazel and is probably going to wake up in 10 years and go "ooooooh, that's how we should have been doing it".
I think everyone agrees that the Bazel/Nix approach is correct, the problem is that Bazel/Nix/etc are insanely hard to use. For example, I spent a good chunk of last weekend trying to get Bazel to build a multiarch Go image, and I couldn't figure it out. Someone needs to figure out how to polish Bazel/Nix so they're viable for organizations that can't invest in a team to operate and provide guidance on Bazel/Nix/etc.
I’ve used Pants professionally and that was possibly the worst of the three in my experience. Support across build tools varies by language, but I didn’t get the impression that Pants was head-and-shoulders above other tools for any language ecosystem.
Can you elaborate on some of the lessons of Bazel? I've only just heard of it recently, and while I'm intrigued, my impression is this is similar to Facebook writing their own source control: different problems at massive scale. Can a SMB (~50 engineers) benefit from adopting Bazel?
> Can a SMB (~50 engineers) benefit from adopting Bazel?
We are ~8 engineers, and yes, definitely. However there should be good buy-in across the team (as it can be quite invasive), and depending on your choice of languages/tooling the difficulty of adoption may greatly vary.
I was the one who introduced Bazel to the company, and across my ~80 weeks there I spent maybe ~4 weeks on setting up and maintaining Bazel.
I don't know about your current setup and challenges you have with your CI system. However, compared to the generic type of build system I've seen at companies of that size, I would estimate that with 50 engineers having a single build systems/developer tooling engineer focused on setting up and maintaining Bazel should easily have a positive ROI (through increased development velocity and less time wasted on hunting CI bugs alone).
If you're doing golang in a large monorepo, in a company of 1000+ engineers, then maybe. If you're a mobile dev in a similar-sized company, then also maybe. If you have devops resources and SREs and dedicated personnel that understand Bazel, then maybe.
Personally I wouldn't touch it with a ten-foot pole. It's an opinionated task runner, with terrible docs, that will just hurt your dev process if you don't configure it correctly.
Yes absolutely (depending on what you do exactly).
The core idea behind Bazel is to make build steps truly hermetic, so you know exactly what inputs they are using. This means you can rely on caching, incremental builds, distributed builds and so on.
I'm sure if you've had any experience with Make or similar systems you've encountered "a clean build fixed it". The root cause of that is a mistake somewhere in your build system where you forgot to declare a dependency on something; it just happens to work most of the time, but then one time the dependency changes, Make doesn't know it has to rebuild some stuff, and the build breaks.
That's basically why almost everyone's CI system builds everything from scratch every time. Nobody trusts incremental builds.
Bazel goes to great lengths to make it so that you have to declare dependencies, otherwise you simply can't access them. That includes:
* Cleaning environment variables for build steps
* Running build steps in sandboxes
* Storing intermediate artefacts in random directories
* Including tools themselves (e.g. compilers) as part of the dependency tree
Honestly it's not a perfectly hermetic environment, e.g. your build steps can still read the current time, RNGs, etc. so you can still have indeterminacy in your build, but it goes a lot further than anything else.
So ultimately the upside is that you can do things like have CI only build and test things that possibly could have been affected by a change. Fix a typo in a README? It won't have to build or test anything.
My current company spends 300 compute-hours and 2-6 wall-clock hours on every CI run, even for fixing doc typos. Bazel can prevent that.
There are downsides though - Bazel was the first system to do this, so it has rough edges. And all that sandboxing means there is extra effort to make debugging and IDE integration work. Also, because it is super conservative about rebuilding things, it can sometimes rebuild even when it doesn't need to. So I probably wouldn't use it on really small projects, like ones where CI time is under 10-20 minutes anyway.
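A toy version of the "only build and test what a change could affect" idea (the concept behind querying reverse dependencies, not Bazel's actual implementation): walk the reverse-dependency graph from the changed targets and run only what is reachable.

```python
from collections import defaultdict, deque

def affected_targets(deps, changed):
    """deps: {target: [targets it depends on]}; changed: set of edited targets.
    Returns every target that transitively depends on something that changed."""
    rdeps = defaultdict(set)
    for target, its_deps in deps.items():
        for d in its_deps:
            rdeps[d].add(target)

    affected, queue = set(changed), deque(changed)
    while queue:
        t = queue.popleft()
        for dependent in rdeps[t]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

# Hypothetical targets: fixing a typo in //docs:readme triggers nothing else.
deps = {
    "//app:server": ["//lib:core"],
    "//lib:core": [],
    "//docs:readme": [],
}
print(affected_targets(deps, {"//docs:readme"}))  # {'//docs:readme'}
print(affected_targets(deps, {"//lib:core"}))     # {'//lib:core', '//app:server'}
```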
There are a load of newer build systems that use the same idea: Pants, Buck, Pants 2, Please.build, etc. But obviously they don't have the momentum of Bazel.
I guess my issue is not hermeticity but ephemeral containers and the stateless approach to things. Maybe a "one-way" hermeticity... a semi-permeability, if you will.
for the frontend space, Nx gives you Bazel-like caching / features. It just doesn't cache dependencies, but in my experience with GitHub Actions, running `pnpm install` or `yarn install` is not the slowest operation; it's running the tools afterwards.
Honestly, I've never missed the shared mutable environment approach one bit. It might have been marginally faster, but I'd trade a whole bunch of performance for consistency (and the optimizations mean there's not much of a performance difference). Moreover, most of the time spent in CI is not container/VM overhead, but rather crappy Docker images, slow toolchains, slow tests, etc.
When you say it's limited in what it can do, what are you comparing it to? And what do you wish it could do?
Fly has a lot of ideas here, and we've also been able to optimize how things work in terms of downloads. As for boot-up speed, it's less than 1-2s before a runner is connected.
There isn't any NixOS-like tooling that isn't incredibly burdensome. I think Nix and NixOS have the right vision, but there's way too much friction for most orgs to use them. Containers are imperfect, but they're way easier to work with.
Oh yeah, I use it - I get it, lol. But as someone who uses NixOS: for all its flaws, the community is also quite passionate and pushes out quite a few features, ideas, etc. There are little experiments in all aspects of the ecosystem.
I'm just kinda surprised some Docker-esque thing hasn't stuck. Something that works with Docker, but transforms it to all the advantages of NixOS.
CI pipelines are just so rough and repetitive in plain Docker, which is what we use.
> for all its flaws, the community is also quite passionate and pushes out quite a few features, ideas, etc.
This hasn't been my experience. There have been significant issues with Nix since its inception and very little progress has been made. Here are a few off the top of my head:
* The Nix expression language is dynamically typed and there are virtually no imports that would point you in the right direction, so it's incredibly difficult to figure out what kind of data a package requires (you typically have to find the call site and recurse backwards to figure out what kind of data is provided, or follow the data down the call stack [recurse forwards], just to discern the 'type' of the data).
* The nix expression language is really hard to learn. It's really unfamiliar to most developers, which is a big deal because everyone in an organization that uses Nix has to interface with the expression language (it's not neatly encapsulated such that some small core team can worry about it). This is an enormous cost with no tangible upside.
* Package defs in nixpkgs are horribly documented.
* Nixpkgs is terribly organized (I think there is finally some energy around reorganizing, but I haven't discerned any meaningful progress yet).
I can fully believe that the community is responsive to improvements in individual packages, but there seems to be very little energy/enthusiasm around big systemic improvements.
> I'm just kinda surprised some Docker-esque thing hasn't stuck. Something that works with Docker, but transforms it to all the advantages of NixOS.
Using something like Nix to build Docker images is conceptually great. Nix is great at building artifacts efficiently and Docker is a great runtime. The problem is that there's no low-friction Nix-like experience to date.
It sounds like your issues with Nix stem from its steep adoption curve, rather than any technical concern. This _is_ a concern for a team that needs to manage it - I agree.
I'm quite diehard in terms of personal Nix/NixOS use, but I hesitate to recommend to colleagues as a solution because the learning curve would likely reduce productivity for quite some time.
That said - I do think that deterministic, declarative package/dependency management is the proper future, especially when it comes to runtime environments.
> It sounds like your issues with Nix stem from its steep adoption curve, rather than any technical concern
Not only is it difficult to learn (although that's a huge problem), but it's also difficult to use. For instance, even once you've "learned Nix", inferring data types is an ongoing problem because there is no static type system. These obstacles are prohibitive for most organizations (because of the high-touch nature of build tooling).
> This _is_ a concern for a team that needs to manage it
The problem is that there isn't "one team that needs to manage it"; every team needs to touch the build definitions or else you're bottlenecking your development on one central team of Nix experts which is also an unacceptable tradeoff. If build tools weren't inherently high-touch, then the learning curve would be a much smaller problem.
Sorry, I wasn't clear - I wasn't implying there should be a central team to manage it. One of the beauties of Nix is providing declarative dev environments in repositories, which means to fully embrace it each individual team should own it for themselves.
At best a central team would be useful for managing an artifactory/cache + maybe company-wide nixpkgs, but in general singular teams need to decide for themselves if Nix is helpful + then manage it themselves.
Agreed. It's just that when every team has to own their stuff, usability issues become a bigger problem and afaict the Nix team is not making much progress on usability (to the extent that it seems like they don't care about the organizational use case--as is their prerogative).
Firecracker is very cool, I wish/hope tooling around it matures enough to be super easy. I'd love to see the technical details on how this is run. It looks like it's closed source?
The need for baremetal for Firecracker is a bit of a shame, but it's still wicked cool. (You can run it on a DO droplet but nested virtualization feels a bit icky?)
I run a CI app myself, and have looked at Firecracker. Right now I'm working on moving some compute to Fly.io and its Machines API, which is well suited to on-demand compute.
We're running a pilot and looking for customers who want to make CI faster for public or self-hosted runners, want to avoid the side-effects and security compromises of DIND / sharing a Docker socket, or need to build on ARM64 for speed.
The article does not say what a MicroVM is. From what I can gather, it's using KVM to virtualize specifically a Linux kernel. In this way, Firecracker is somewhat intermediate between Docker (which shares the host kernel) and Vagrant (which is not limited to running Linux). Is that accurate?
Is it possible to use a MicroVM to virtualize a non-Linux OS?
It is, but is also a very low-level tool, and there is very little support around it. We've been building this platform since the summer and there are many nuances and edge cases to cater for.
But if you just want to try out Firecracker, I've got a free lab listed in the blog post.
I hear Podman desktop is also getting some traction, if you have particular issues with Docker Desktop.
Hey thanks for the feedback. We may do some more around this. What kinds of things do you want to know?
To get hands-on, you can run my Firecracker lab that I shared in the blog post; adding a runner can then be done with "arkade system install actions-runner"
Not the poster you were replying to, but I've looked at your Firecracker init lab (cool stuff!) and I'm just wondering how that fits together with a control plane. It would be cool to see how the orchestration happens in terms of messaging between host/guest and how I/O is provisioned on the host dynamically.
Wondering if it would be possible to run macOS. The hosted runners of GitHub Actions for macOS are really, really horrible; our builds easily take 2x to 3x more time than on the hosted Windows and Linux machines.
The interesting part of this is that the client supplies the most difficult resource to get for this setup. As in, a machine on which Firecracker can run.
Users provide a number of hosts and run a simple agent. We maintain the OS image, Kernel configuration and control plane service, with support for ARM64 too.
Great stuff, undeniably. There's not much going on in the open-source space around multi-host scheduling for Firecracker. So that's a mountain of work.
With regards to the host, I made that remark because of Firecracker's requirements around virtualisation. Running Firecracker is a no-brainer when an org maintains a fleet of its own hardware.
Hm. Well, assuming they have no network and don't otherwise encrypt bits that an attacker could get ahold of, it's probably fine.
The bigger issue would be something like spawning a bunch of servers that share the same RNG state, which can then be manipulated by an attacker (and therefore encrypt different data with the same key+nonce and such).
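A small illustration of that failure mode, using Python's userspace RNG as a stand-in for the cloned entropy state of a snapshotted VM: two clones restored from the same snapshot produce the same "random" nonce unless they re-seed from fresh entropy after restore.

```python
import copy
import os
import random

# Simulate snapshotting a VM whose RNG state is captured in the image.
snapshot_rng = random.Random(1234)
clone_a = copy.deepcopy(snapshot_rng)
clone_b = copy.deepcopy(snapshot_rng)

# Both restored clones draw the same value, so a key+nonce derived from it repeats.
assert clone_a.getrandbits(96) == clone_b.getrandbits(96)

# Mitigation: mix fresh entropy back in after restore (what virtio-rng / guest
# reseeding is for in real VMs), so the clones diverge.
clone_b.seed(os.urandom(32))
assert clone_a.getrandbits(96) != clone_b.getrandbits(96)  # differs with overwhelming probability
```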
It seems like BuildJet is competing directly with GitHub on price (GitHub now has bigger runners available, pay per minute), and GitHub will always win because Microsoft owns both GitHub and Azure, so I'm not sure what their USP is, and I worry they will get commoditised and then lose their market share.
Actuated is hybrid, not self-hosted. We run actuated as a managed service and scheduler, you provide your own compute and run our agent, then it's a very hands-off experience. This comes with support from our team, and extensive documentation.