It’s really the best OSS to come out of Amazon.
It seems that the autoscaling limits are only defined in fly.toml, via the soft and hard limits? It might be useful to make these easily visible under flyctl scale. Also, if I delete the fly.toml, can I regenerate it easily?
As a side note, I was looking around for more information on the platform and going through old HN posts. I know the company pivoted a couple of times, but all the old articles are 404ing because the blog URL changed.
We do need to clean up our old blog posts and links. We created a lot of content at various times, and it's not all relevant anymore.
As for your fly.toml question, you can get the config with `flyctl config save -a your-app`. It'll create a fly.toml with the latest config we know about.
Concurrency limits are still being worked on. They should definitely be visible in more places. The only way to know about them right now is from the fly.toml, which isn't ideal.
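For reference, this is roughly what they look like in a fly.toml today (the numbers are just example values):

```toml
[[services]]
  internal_port = 8080
  protocol = "tcp"

  # Per-VM connection limits that drive load balancing / autoscaling:
  # above soft_limit we prefer sending traffic to other instances,
  # at hard_limit we stop routing new connections to this VM.
  [services.concurrency]
    soft_limit = 20
    hard_limit = 25
```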
One issue I encountered is that the app in question does not benefit from full-page caching. Even if we deployed our app through Fly.io, we'd still have our databases hosted somewhere else. How does Fly.io solve this, or how could we solve this?
When I dabbled with this idea, I thought about deploying DB read-only replicas around the world. There would be some replication lag, but for the app in question that would not be problematic. Writes would still be affected by the added latency, but this would not be that problematic either, as the vast majority of the queries are reads.
When people use https://fly.io/heroku, we launch VMs in the same region their Heroku app is in, so there's no latency hit to the DB. Weirdly, latency between a Fly app and a DB on AWS in the same region is sometimes even better than AWS cross-zone latency.
We _also_ give apps a special global Redis cache (https://fly.io/docs/redis). This is sometimes enough to make a full stack app multi-regional: people usually cut way down on DB queries when they use their framework's caching abilities, which can make it pretty nice to run a Rails app + cache in, say, San Jose while the DB is in Virginia.
I know of a couple of devs running Elixir apps on Fly that leave a data service in the region where their DB is and basically RPC to it from other regions, which seems to work well.
Read replicas are a good idea, we'd actually like to try that out at some point. It seems pretty doable to put something like pgbouncer/pgpool in front of a read replica and let it handle routing write transactions properly.
Did you do any measurements, and if so, on which dyno types? We found that using Performance-M dynos gives us a rather large performance boost. Performance-M dynos are also more stable because they run on dedicated hardware. They're expensive, but we don't run any apps in production without them.
One thing that worked really well for us is to just put Cloudflare or CloudFront in front of our app. As I mentioned, we don't do any full-page caching. We cache pretty much everything else, but pages themselves have zero caching (business requirement). I believe Cloudflare and CloudFront also do edge TLS.
> Read replicas are a good idea, we'd actually like to try that out at some point. It seems pretty doable to put something like pgbouncer/pgpool in front of a read replica and let it handle routing write transactions properly.
This is going to be tricky. We weren't able to set up replication from Heroku Postgres databases to hosts outside of Heroku. Another thing to keep in mind is that it might be better to let the app decide what is a read query and what is a write query. Some parts of our app need to read directly from the master, so we let the app handle it. The app receives two database URIs, both pointing to pgbouncer.
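To make that concrete, here's a minimal sketch (Go, with hypothetical env var names) of the pattern: two connection pools, both pointing at pgbouncer, and the app explicitly picks one per query:

```go
package main

import (
	"database/sql"
	"log"
	"os"

	_ "github.com/lib/pq" // Postgres driver; pgbouncer speaks the same protocol
)

// DB holds two pools: one behind pgbouncer for the master, one behind
// pgbouncer for a read replica. The app, not a proxy, decides which to use.
type DB struct {
	primary *sql.DB // writes and reads that must not see replication lag
	replica *sql.DB // everything else
}

func open() (*DB, error) {
	// Hypothetical env var names; both URIs point at pgbouncer.
	primary, err := sql.Open("postgres", os.Getenv("PRIMARY_DATABASE_URL"))
	if err != nil {
		return nil, err
	}
	replica, err := sql.Open("postgres", os.Getenv("REPLICA_DATABASE_URL"))
	if err != nil {
		return nil, err
	}
	return &DB{primary: primary, replica: replica}, nil
}

func main() {
	db, err := open()
	if err != nil {
		log.Fatal(err)
	}
	var balance int
	// This read has to be fresh, so it goes to the master...
	if err := db.primary.QueryRow("SELECT balance FROM accounts WHERE id = $1", 1).Scan(&balance); err != nil {
		log.Fatal(err)
	}
	var title string
	// ...while an ordinary read can tolerate a bit of replication lag.
	if err := db.replica.QueryRow("SELECT title FROM posts ORDER BY id DESC LIMIT 1").Scan(&title); err != nil {
		log.Fatal(err)
	}
}
```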
If all you're doing is caching some endpoints and TLS termination then either will work. Cloudflare has a bigger network with robust security capabilities, Fly has more flexibility in application logic you can run.
Data has gravity and having a globally distributed database layer is something companies have spent millions on. Usually the solution is to cache as much as possible in each region first, then look at doing database replicas, and eventually multi-regional active/active database scale-outs.
We did some measurements, but mostly focusing on the network bits (which you largely solved with CloudFlare): https://fly.io/blog/turboku/
I was surprised at how much faster things seemed on our VMs vs Heroku's dynos, to be honest. We only compared Standard dynos, but we should be even better on price vs performance compared to the Performance dynos, since we run our own physical servers. A Performance-M dyno on Heroku costs about the same as 8 CPUs on Fly.
It's totally self serving, but if you feel like playing around with the Fly stuff I'd love to know how it compares.
> This is going to be tricky. We weren't able to set up replication from Heroku Postgres databases to hosts outside of Heroku. Another thing to keep in mind is that it might be better to let the app decide what is a read query and what is a write query. Some parts of our app need to read directly from the master, so we let the app handle it. The app receives two database URIs, both pointing to pgbouncer.
This is why I think the in-memory caching is such a good option. Usually if I'm building an app, I'll add a caching layer before a DB replica. Write-through caching seems to fit my mental processes better. :D
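Roughly what I mean by write-through, as a sketch (Go; the in-process map is just a stand-in for Redis/memcached and isn't goroutine-safe):

```go
package kvcache

import "database/sql"

// Store is a write-through layer: writes hit the DB first and then update
// the cache, so the cache never serves a value the DB hasn't accepted and
// most reads skip the database entirely.
type Store struct {
	db    *sql.DB
	cache map[string]string // stand-in for a real cache client
}

func (s *Store) Get(key string) (string, error) {
	if v, ok := s.cache[key]; ok {
		return v, nil // cache hit: no DB round trip
	}
	var v string
	if err := s.db.QueryRow("SELECT value FROM kv WHERE key = $1", key).Scan(&v); err != nil {
		return "", err
	}
	s.cache[key] = v // populate on miss
	return v, nil
}

func (s *Store) Set(key, value string) error {
	// Persist first, then update the cache.
	_, err := s.db.Exec(
		"INSERT INTO kv (key, value) VALUES ($1, $2) ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value",
		key, value)
	if err != nil {
		return err
	}
	s.cache[key] = value
	return nil
}
```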
I confused Cloudfront and CloudFlare yet again. :)
There’s no limit to domains.
What are usual cold-start times you see with firecracker?
What other VMMs or Unikernels did you consider before settling on firecracker?
Was the firecracker documentation good enough or did you have to go digging through emails or code to figure out certain things?
What was the hardest part of using firecracker in production?
Cold starts depend a lot on what people actually deploy. They're really fast for an optimized Go binary, really slow for most Node apps. We were playing with Deno + OSv just today and got an app to boot and accept an HTTP request in about 20ms. That assumes you have the root fs all built and ready to go, though; pulling down images and prepping everything is a bit of a bottleneck for that.
We looked at gvisor pretty hard but preferred more traditional virtualization. We didn't look much at other virtualization options, Firecracker was really good from day one.
The docs were pretty good. We ended up having to build a bunch ourselves, though, probably just because of the nature of our product. We built a custom init (in Rust), a Docker image to root fs builder, and a Nomad task driver (both in Go). The init includes an RPC mechanism so we can communicate with the VM.
Firecracker was pretty easy, building the scaffolding to use it was a little harder, but the vast majority of our time is spent on the dev UX and proxy/routing layer.
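For anyone curious how small the surface area is: driving a single microVM by hand is just a few PUTs against Firecracker's API over a unix socket. Roughly (file paths here are placeholders):

```sh
# Assumes a firecracker process is already listening on this socket, e.g.:
#   firecracker --api-sock /tmp/firecracker.socket
curl --unix-socket /tmp/firecracker.socket -X PUT 'http://localhost/boot-source' \
  -H 'Content-Type: application/json' \
  -d '{"kernel_image_path": "./vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"}'

curl --unix-socket /tmp/firecracker.socket -X PUT 'http://localhost/drives/rootfs' \
  -H 'Content-Type: application/json' \
  -d '{"drive_id": "rootfs", "path_on_host": "./rootfs.ext4", "is_root_device": true, "is_read_only": false}'

curl --unix-socket /tmp/firecracker.socket -X PUT 'http://localhost/actions' \
  -H 'Content-Type: application/json' \
  -d '{"action_type": "InstanceStart"}'
```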
Core product in Rust, Firecracker Micro-VMs, Nomad instead of k8s (never used it myself but see the strong value in it and think it makes sense + deserves more attention), and experimentation with Deno (huge fan).
I wish I could clone myself and do some work for you guys just to soak up that knowledge.
You can't quite use it as a drop in replacement for Heroku since we don't have a Postgres offering. You can use fly to replace the web dynos in a Heroku app for faster performance, though.
We don't scale to 0 because the cold start experience for most apps is brutal. In the future we may be able to suspend VMs and wake them up quickly, or even migrate them to regions with idle servers.
I'm not sure what you're asking.
I really was confused about what "OSS" meant w/o the "F" in front.
Two questions in case someone from Weave tunes into the discussion:
I got the impression that VMs needed an SSH server to be accessible. Is this correct, and if so, will it be possible to implement something similar to `docker exec` so that I won't need an SSH server on every VM?
> At the moment ignite and ignited need root privileges on the host to operate due to certain operations (e.g. mount). This will change in the future.
Is there a timetable and could you perhaps elaborate a bit as to why it currently requires root? (I don't know anything about virtual machine internals so this isn't a passive-aggressive question from my side. It's genuine curiosity.)
Virtual machines are provided by software or hardware emulation and run a separate guest OS with its own kernel. There is no standard way for a host to let you run an arbitrary process and interact with its stdio inside the guest OS, because the host simply isn't aware of what exactly you run inside.
The solution is to have an agreed connectivity standard on both the guest and the host. The guest can provide an SSH server, a telnet server, a serial terminal, an IRC bot, or some other kind of control capability. Then of course the host needs tooling too, e.g. an SSH client.
Not from Weave, but I might have an idea as I've played with Firecracker a bit.
When you start up a Firecracker VM, you need to provide it with a rootfs drive, which is a file containing the root file system to be used for the VM. Ignite uses OCI images, so I guess they are doing something similar in code; the `mount` part requires sudo, so that would be my guess as to why you need root.
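To illustrate the mount part, a hand-rolled version of that OCI-image-to-rootfs step looks roughly like this (whatever Ignite actually does in code may differ); the loop mount is where root traditionally comes in:

```sh
# Rough sketch: turn a Docker image into an ext4 root filesystem by hand.
truncate -s 1G rootfs.ext4
mkfs.ext4 -F -q rootfs.ext4
sudo mkdir -p /mnt/rootfs
sudo mount -o loop rootfs.ext4 /mnt/rootfs        # mounting needs root (CAP_SYS_ADMIN)
docker export "$(docker create alpine)" | sudo tar -x -C /mnt/rootfs
sudo umount /mnt/rootfs
# rootfs.ext4 can now be handed to Firecracker as the VM's root drive.
```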
But more generically, if you have to log into your cattle for troubleshooting, you probably need a better logging infrastructure.
Then again, if you are referring to how to initially install software, wouldn’t you usually just create an image for it to run?
I use it to figure out what the image and orchestration settings should be. Packaging an application for containers and container orchestration takes me many, many, attempts to get right. Personally I'm unable to divine the correct combination of settings by reading documentation alone so I try something, enter the container, and look at the outcome.
If you want the capability to exec processes from the host into the VM, I think either the Docker API or the Kube API is the thing for that, as I understand it. If you could kernel-exec processes directly into a VM, then it would not be isolated from the host; this seems almost tautological.
You can arrange for process execution in some way other than SSH, the Docker API, or the Kube API, but regardless of what shape it takes, it will still be an entry point in similar fashion to any of these, as the microVM or VM runs its own kernel on KVM and does not talk to the host in this way. Perhaps someone who knows more about KVM can clue me in further if there is more here than meets the eye and what you said is possible.
If you don't need the isolation of a proper VM and were only looking for a roughly VM-shaped system that you CAN "kernel exec" or use nsenter to get processes into, you should look at Footloose.
I'm suggesting that what you are looking for is actually a container that looks more like a VM or bare-metal machine from the perspective of init and of the processes running inside.
By default, Footloose nodes run SSH and systemd and may appear to work similarly to Ignite VMs, but they are Docker containers that may or may not run in privileged mode.
So, if it suits you, then you could start up Footloose "VMs" as I still call them, strip SSH from them, then nsenter or Docker exec into them as you desire, or run Kubernetes on them and use the Kube API including exec.
That is actually a lead-in to the next project, Firekube, which is kind of a mashup of all these technologies plus one more (wksctl). Firekube integrates Ignite and Firecracker as well, so you can use it on Linux (where KVM support is available for Ignite) or macOS (where Footloose runs container-VMs instead), and both behave alike. This whole suite of projects put together is a very slick and well-integrated package IMHO. It is probably comparable in functionality to Minikube, but with GitOps baked right in.
Disclosure: I am not working for Weaveworks, but we are good friends.
Probably the most interesting feature.
You get a VM with user-space networking and a one-button keyboard. This allows the kernel to aggressively swap out unused resources and ultimately increases the total concurrency supported.
This sounds like a hybrid system. The intermediary is cooperative, the client code is oblivious. I'm curious to see how this plays out over the long haul.
Coding practices change like the seasons. Every year is a little different, maybe a little better than the last, some things feel brand new, but the general patterns hold.
We've booted some apps in Firecracker in <20ms, so if you build the app properly you can absolutely do a new VM per request or TCP connection even.
And if I'd read farther down the chain, someone said that Amazon already does this.
> Still no nested virtualization on AWS
It should work on some Azure and GCE instances when you enable nested virtualization.
Or is this intended to be separate, a competitor?
Firecracker can be used in Kubernetes through the containerd + Kata Containers integration. (containerd, originally from Docker, is a key technology behind both Docker and K8s; Kata Containers, from Intel, enables containerd to run VMs.)
I don't think the experience is great yet, but it's on the way.
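If you want to poke at it, the Kubernetes-facing half is basically a RuntimeClass that points at whatever handler name you gave the Kata + Firecracker runtime in your containerd config, plus pods that opt in. The handler name below is just an example and has to match your containerd setup:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc        # must match the runtime name configured in containerd
---
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  runtimeClassName: kata-fc   # this pod's containers run inside a microVM
  containers:
    - name: web
      image: nginx:alpine
```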
This title is a bit awkwardly worded; in particular, "techno" in American English is a genre of music, not an abbreviation for "technology".
Submitters: please don't do that—this is in the site guidelines: https://news.ycombinator.com/newsguidelines.html. If a title is misleading or baity, please rewrite it, but please also make it good English.
As such, I'm never really sure if this is something worth playing with. I've played with docker and k8s, and generally understand how those tools help me, but I'm unsure about how firecracker would help unless I'm building a PaaS like fly.io.
It would be really cool if someone wrote a blog post comparing running a service on the different comparable options. Developers have Todo apps written in dozens of frameworks to compare against. Is a similar type of exploration not feasible here?
From what I understand, even QEMU can work in different modes, either emulating hardware or a system call interface. So I'm not sure which of those modes you are referring to.
> Firecracker enables you to deploy workloads in lightweight virtual machines, called microVMs, which provide enhanced security and workload isolation over traditional VMs, while enabling the speed and resource efficiency of containers
Security of a VM + efficiency of a container.
Firecracker looks very promising from a server-side technology standpoint, but the need for AMD and RISC-V platform support can't be stressed enough.
Amazon had better find a way of supporting AMD processors, since Intel's CPU bugs are being brought to light and exploited in all directions by security researchers, with cataclysmic implications for users and server providers these days. This is demonstrated by a ridiculous Intel vulnerability that rendered Apple's FileVault encryption completely useless, which is absolutely unacceptable to Apple. There are many other CPU vulnerabilities waiting to be found, and any one of them could be the next Meltdown-like candidate.
The sooner the move to AMD or RISC-V open technologies, the better for developers and users.
Firecracker is open source. We welcome community contributions to bring the technology to additional CPU architectures.
Also, as other commenters have noted, AMD support is already in-tree. It's just in Developer Preview, which indicates the relative level of maturity.
AWS does not have any RISC-V processors in its EC2 offering portfolio, but if customers are demanding them, we'd love to hear from you - please reach out to your account team.
Missing ARM support is also surprising, but less so, as ARM market penetration is obviously lower than x64's.
The shift is underway but it takes time.
It was also independently implemented in POWER and ARM A75
One prime example where, with virtio, it was possible to get native performance _after_ minimal configuration tuning:
More details on virtio:
Also, "not good enough" kind of needs a definition of a workload and what's "good enough" for it.
I'm sorry that my first comment was not informative enough. I'm referring to a paper that I read a couple of days ago. If you look at figures 8-9, FC has some issues with IO.
This is an odd thing to say; surely that depends on the application? I doubt most FaaS users are disk IO-bound?