There's something to say about building a tower of abstractions and then trying to tear it back down. We used to just run a compiler on a machine. Startup time: 0.001 seconds. Then we'd run a Docker container on a machine. Startup time: 0.01 sections. Fine, if you need that abstraction. Now apparently we're booting full VMs to run compilers - startup time: 5 seconds. But that's not enough, because we're also allocating a bunch of resources in a distributed network - startup time: 40 seconds.
Do we actually need all this stuff, or does it suffice to get one really powerful server (price less than $40k) and run Docker on it?
We don't need all of those layers and abstractions of course. But if we do things right we also don't need to go the bare metal server route -- cloud platforms, if done right, can provide both strong, hardware-level (read: vm) isolation plus fast starts.
On kraft.cloud (shameless plug) we build extremely specialized VMs (aka unikernels) where most of the code in them is the application code, and pair this with a fast, custom controller and other perf tweaks. We use Dockerfiles to build from, but when deploying we eliminate all of those layers you mention. Cold boot times are in milliseconds (e.g., nginx 20ms, a basic node app ~50ms), as are scale to zero and autoscale.
A really powerful server should not cost you anywhere near $40k unless you're renting bare metal in AWS or something like that.
Getting rid of the overhead is possible but hard, unless you're willing to sacrifice things people really want.
1. Docker. Adds a few hundred msec of startup time to containers, configuration complexity, daemons, disk caches to manage, repositories .... a lot of stuff. In rigorously controlled corp environments it's not needed. You can just have a base OS distro that's managed centrally and tell people to target it. If they're building on e.g. the JVM then Docker isn't adding much. I don't use it on my own companies CI cluster for example, it's just raw TeamCity agents on raw machines.
2. VMs. Clouds need them because they don't trust the Linux kernel to isolate customers from each other, and they want to buy the biggest machines possible and then subdivide them. That's how their business model works. You can solve this a few ways. One is something like Firecracker where they make a super bare bones VM. Another would be to make a super-hardened version of Linux, so hardened people trust it to provide inter-tenant isolation. Another way would be a clean room kernel designed for security from day one (e.g. written in Rust, Java or C#?)
3. Drives on a distributed network. Honestly not sure why this is needed. For CI runners entirely ephemeral VMs running off read only root drive images should be fine. They could swap to local NVMe storage. I think the big clouds don't always like to offer this because they have a lot of machines with no local storage whatsoever, as that increases the density and allows storage aggregation/binpacking, which lowers their costs.
Basically a big driver of overheads is that people want to be in the big clouds because it avoids the need to do long term planning or commit capital spend to CI, but the cloud is so popular that providers want to pack everyone in as tightly as possible which requires strong isolation and the need to avoid arbitrary boundaries caused by physical hardware shapes.
How do you get Docker container startup time of 0.01s with any real-life workload (yes, I know they are just processes, so you could build a simple "hello world" thing, but I'd be surprised if even that runs this fast)?
Do you have an example image and network config that would demonstrate that?
(I'd love to understand the performance limits of Docker containers, but never played with them deeply enough since they are usually in >1s space which is too slow for me to care)
On kraft.cloud we use Dockeffiles to build into extremely specialized VMs for deployment. With this in place, we can have say an nginx server cold started and ready to serve at a public URL in about 20 millis (not quite the 10ms you mention, but in the right ballpark, and we're constantly shaving that down). Heavier apps can take longer of course, but not too much (e.g., node/next < 100ms). Autoscale and scale to zero also operate in those timescales.
Underneath, we use specialized VMs (unikernels), a custom controller and load balancer, as well as a number of perf tweaks to achieve this. But it's (now) certainly possible.
Still, that mostly confirms my experience: to achieve this level of performance, you need to do optimizations on a lower level, and this is not really achievable with docker out of the box (plain Linux host with usual Docker runtime).
Maybe your container isn't set up right - Docker contains run directly on the host, just partitioned off from accessing stuff outside of themselves with the equivalent of chroot. Or it could be a Mac-specific thing. Docker only works that way on Linux, and has to emulate Linux on other platforms.
Right, they said they're on a macbook so unless they're going out of their way to run Linux bare-metal it has to use a VM. And AIUI there are extra footguns in that situation, especially that mapping volumes from the host is slower because instead of just telling the kernel to make the directory visible you have to actually share from the host to the VM.
> Shared folders are designed to allow application code to be edited on the host while being executed in containers. For non-code items such as cache directories or databases, the performance will be much better if they are stored in the Linux VM, using a data volume (named volume) or data container.
And we are curious why it is like so because we not only understand how to do stuff without containers, we also understand how containers work and your claim sounds off.
I'm saying it is slower on docker due to container startup, pulling images, overheads, working out what incantations to run, filesystem access, network weirdness, things talking to other things, configuration required, pull limits, API tokens, all sorts.
I’m using VMs these day because of conflicts and inconsistencies between tooling. But the VM is dedicated to one project and I set it up just like a real machine (GUI, browser, and stuff). No file sharing. It’s been a blast.
And you usually get lumbered with some shitty thing like github actions which consumes one mortal full time to keep it working, goes down twice a month (yesterday wasn't it this week?), takes bloody forever to build anything and is impossible to debug.
Do we actually need all this stuff, or does it suffice to get one really powerful server (price less than $40k) and run Docker on it?