
Firecracker is great. We use it to run fleets of fast-booting VMs at https://fly.io.

It’s really the best OSS to come out of Amazon.




I played around with fly.io for a bit; it seems pretty interesting. It works pretty well, too: I went through the setup for the DoH proxy, and the latency I get is very similar to Cloudflare itself, so that's pretty awesome.

It seems that the autoscaling limits are only defined in the fly.toml, via the soft and hard limits? It might be useful to make these easily visible under flyctl scale. Also, if I delete the fly.toml, can I regenerate it easily?

As a side note, I was looking around for more information on the platform and reading old HN posts. I know the company pivoted a couple of times, but all the old articles are 404ing because the blog URL changed.


That's nice to read! Thanks.

We do need to clean up our old blog posts and links. We created a lot of content at various times, and not all of it is still relevant.

As for your fly.toml question, you can get the config with `flyctl config save -a your-app`. It'll create a fly.toml with the latest config we know about.

Concurrency limits are still being worked on. They should definitely be visible in more places. The only way to know about them right now is from the fly.toml, which isn't ideal.
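For reference, here's roughly what that looks like in a fly.toml today. A minimal sketch; treat the exact key names as illustrative and check the docs:

    # fly.toml (sketch) -- per-service concurrency limits
    [[services]]
      internal_port = 8080
      protocol = "tcp"

      [services.concurrency]
        soft_limit = 20  # above this, prefer routing to other instances
        hard_limit = 25  # above this, stop sending new connections here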


I dabbled with an idea similar to Fly.io's Heroku supercharging functionality (Turboku?).

One issue I encountered is that the app in question does not benefit from full-page caching. Even if we deployed our app through Fly.io, we'd still have our databases hosted somewhere else. How does Fly.io solve this, or how could we solve this?

When I dabbled with this idea, I thought about deploying DB read-only replicas around the world. There would be some replication lag, but for the app in question, that would not be problematic. Writes would still be affected by the added latency, but this would not be that problematic either, as the vast majority of the queries are reads.


You've pretty much nailed the problem. The “good” news, though, is that Heroku is really slow, so just running Firecracker VMs on real hardware, doing edge TLS, and adding HTTP/2 + Brotli is a huge win.

When people use https://fly.io/heroku, we launch VMs in the same region their Heroku app is in, so there's no latency hit to the DB. Weirdly, latency between a Fly app and a DB on AWS in the same region is sometimes even better than AWS cross-zone latency.

We _also_ give apps a special global Redis cache (https://fly.io/docs/redis). This is sometimes enough to make a full-stack app multi-regional; usually people cut way down on DB queries when they use their framework's caching abilities, which can make it pretty nice to run a Rails app + cache in, say, San Jose while the DB is in Virginia.
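The pattern is basically cache-aside: check the region-local Redis first and only fall through to the remote DB on a miss. A minimal sketch in TypeScript, with node-redis and node-postgres purely as illustrative stand-ins:

    import { createClient } from "redis";
    import { Pool } from "pg";

    const cache = createClient({ url: process.env.REDIS_URL });
    const db = new Pool({ connectionString: process.env.DATABASE_URL });
    await cache.connect(); // top-level await assumes an ES module

    async function getProduct(id: string) {
      const key = `product:${id}`;
      const hit = await cache.get(key); // fast: same-region hop
      if (hit !== null) return JSON.parse(hit);

      // Miss: pay the cross-region round trip once, then cache the result.
      const { rows } = await db.query("SELECT * FROM products WHERE id = $1", [id]);
      await cache.set(key, JSON.stringify(rows[0]), { EX: 60 }); // 60s TTL
      return rows[0];
    }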

I know of a couple of devs running Elixir apps on Fly that leave a data service in the region where their DB is and basically RPC to it from other regions, which seems to work well.

Read replicas are a good idea; we'd actually like to try that out at some point. It seems pretty doable to put something like pgbouncer/pgpool in front of a read replica and let it handle routing write transactions properly.


> The “good” news, though, is that Heroku is really slow

Did you do any measurements, and if so, on which dyno types? We found that using Performance-M dynos gives us a rather large performance boost. Performance-M dynos are also more stable because they run on dedicated hardware. They're expensive, but we don't run any apps in production without them.

One thing that worked really well for us is to just put Cloudflare or CloudFront in front of our app. As I mentioned, we don't do any full-page caching. We cache pretty much everything else, but pages themselves have zero caching (business requirement). I believe Cloudflare and CloudFront also do edge TLS.

> Read replicas are a good idea; we'd actually like to try that out at some point. It seems pretty doable to put something like pgbouncer/pgpool in front of a read replica and let it handle routing write transactions properly.

This is going to be tricky. We weren't able to set up replication from Heroku Postgres databases to hosts outside of Heroku. Another thing to keep in mind is that it might be better to let the app decide what is a read query and what is a write query. We have some parts of the app that need to read directly from the master, so we let the app handle it. The app receives two database URIs, both pointing to pgbouncer.
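To make that concrete, the two-URI setup looks something like this. A sketch with node-postgres; the env var names are made up:

    import { Pool } from "pg";

    // Both URIs point at pgbouncer, as described above.
    const primary = new Pool({ connectionString: process.env.PRIMARY_DATABASE_URL });
    const replica = new Pool({ connectionString: process.env.REPLICA_DATABASE_URL });

    // Reads that tolerate replication lag go to the replica.
    async function listPosts() {
      return replica.query("SELECT * FROM posts ORDER BY id DESC LIMIT 10");
    }

    // Writes -- and reads that must see them immediately -- hit the primary.
    async function createPost(title: string) {
      await primary.query("INSERT INTO posts (title) VALUES ($1)", [title]);
      return primary.query("SELECT * FROM posts WHERE title = $1", [title]);
    }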


Cloudflare and Fly are both reverse-proxy CDN services that handle caching and TLS at the edge. They also both support running arbitrary logic at the edge. Cloudflare has Workers (the JavaScript Web Workers API) with its custom KV persistent key/value data layer. Fly started out similar, but now supports containers running anything and has a non-persistent Redis cache layer.
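To give a sense of what running logic at the edge looks like, a Worker that serves cached GET responses is only a few lines. A sketch against the Workers runtime's caches.default API:

    // Cloudflare Worker sketch: serve from the edge cache when possible.
    addEventListener("fetch", (event) => {
      event.respondWith(handle(event));
    });

    async function handle(event) {
      const cache = caches.default; // the Worker's per-datacenter cache
      const cached = await cache.match(event.request);
      if (cached) return cached;

      const response = await fetch(event.request); // fall through to origin
      // Store a copy without delaying the response to the client.
      event.waitUntil(cache.put(event.request, response.clone()));
      return response;
    }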

If all you're doing is caching some endpoints and terminating TLS, then either will work. Cloudflare has a bigger network with robust security capabilities; Fly has more flexibility in the application logic you can run.

Data has gravity, and a globally distributed database layer is something companies have spent millions on. Usually the solution is to cache as much as possible in each region first, then look at database replicas, and eventually multi-regional active/active database scale-outs.


> Did you do any measurements, and if so, on which dyno types? We found that using Performance-M dynos gives us a rather large performance boost. Performance-M dynos are also more stable because they run on dedicated hardware. They're expensive, but we don't run any apps in production without them.

We did some measurements, but mostly focusing on the network bits (which you largely solved with Cloudflare): https://fly.io/blog/turboku/

I was surprised at how much faster things seemed on our VMs vs. Heroku's dynos, to be honest. We only compared Standard dynos, but we should be even better on price vs. performance compared to the Performance dynos, since we run our own physical servers. A Performance-M dyno on Heroku costs about the same as 8 CPUs on Fly.

It's totally self-serving, but if you feel like playing around with the Fly stuff, I'd love to know how it compares.

> This is going to be tricky. We weren't able to set up replication from Heroku Postgres databases to hosts outside of Heroku. Another thing to keep in mind is that it might be better to let the app decide what is a read query and what is a write query. We have some parts of the app that need to read directly from the master, so we let the app handle it. The app receives two database URIs, both pointing to pgbouncer.

This is why I think the in-memory caching is such a good option. Usually if I'm building an app, I'll add a caching layer before a DB replica. Write-through caching seems to fit my mental processes better. :D
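Roughly, write-through just means updating the cache at write time instead of waiting for the next read to miss. A sketch, again with node-redis/node-postgres as illustrative stand-ins:

    import { createClient } from "redis";
    import { Pool } from "pg";

    const cache = createClient({ url: process.env.REDIS_URL });
    const db = new Pool({ connectionString: process.env.DATABASE_URL });
    await cache.connect(); // top-level await assumes an ES module

    async function renameProfile(id: string, name: string) {
      // 1. Write to the source of truth (the far-away primary)...
      const { rows } = await db.query(
        "UPDATE profiles SET name = $1 WHERE id = $2 RETURNING *",
        [name, id]
      );
      // 2. ...then push the fresh row straight into the nearby cache, so
      // subsequent reads never pay the cross-region round trip.
      await cache.set(`profile:${id}`, JSON.stringify(rows[0]), { EX: 300 });
    }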


I tried adding CloudFront in front of an app hosted on Heroku with page caching off (Vary by Cookie), and it increased latency to 3x; I never could figure out why. I'd like to do it, but it seems like way too much of a trade-off.


Shameless plug: https://fly.io/heroku gives you a lot of the benefit of CloudFront without adding a layer. It's like running Heroku with a modern router.

--edit-- I confused CloudFront and Cloudflare yet again. :)


Yeah, I’m gonna try this out. I’m writing a book authoring platform and I want to let people give access to their books on their own domains. Is there an API to add custom domains with LE certs to my app? And is there a limit on how many domains I can add?


There is indeed a certificate API! We’re putting up a guide for it this week; I can send you the draft if you’d like. The CLI commands for managing certs are here: https://fly.io/docs/flyctl/certs/

There’s no limit on the number of domains.
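Usage looks roughly like this; the hostname and app name are placeholders, and the exact subcommands are on the docs page above:

    # request a certificate for a customer's domain
    flyctl certs add books.example.com -a my-book-app

    # check issuance status later
    flyctl certs show books.example.com -a my-book-app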


Hi Kurt,

What are the usual cold-start times you see with Firecracker?

What other VMMs or unikernels did you consider before settling on Firecracker?

Was the Firecracker documentation good enough, or did you have to go digging through emails or code to figure out certain things?

What was the hardest part of using Firecracker in production?

Thanks.


Hello again!

Cold starts depend a lot on what people actually deploy. They're really fast for an optimized Go binary and really slow for most Node apps. We were playing with Deno + OSv just today and got an app to boot and accept an HTTP request in about 20ms. That assumes you have the root fs all built and ready to go, though; pulling down images and prepping everything is a bit of a bottleneck.

We looked at gVisor pretty hard but preferred more traditional virtualization. We didn't look much at other virtualization options; Firecracker was really good from day one.

The docs were pretty good. We ended up having to build a bunch ourselves, though, probably just because of the nature of our product. We built a custom init (in Rust), a Docker-image-to-root-fs builder, and a Nomad task driver (both of the latter in Go). The init includes an RPC mechanism so we can communicate with the VM.

Firecracker itself was pretty easy; building the scaffolding to use it was a little harder. But the vast majority of our time is spent on the dev UX and the proxy/routing layer.


Your tech stack is really fantastic and cutting-edge.

Core product in Rust, Firecracker microVMs, Nomad instead of k8s (never used it myself, but I see the strong value in it and think it makes sense + deserves more attention), and experimentation with Deno (huge fan).

I wish I could clone myself and do some work for you guys just to soak up that knowledge.


Even after reading the comments and looking through the site, I'm still not sure what fly.io is from a developer's perspective. Is it a drop-in replacement for Heroku? How different is it from Cloud Run or Cloud Functions?


It's closest to Cloud Run. We run your containers and scale across regions to minimize your users' latency.

You can't quite use it as a drop-in replacement for Heroku, since we don't have a Postgres offering. You can use Fly to replace the web dynos in a Heroku app for faster performance, though.


Thanks for clarifying! Does it scale down to zero? If I have a very small hobby app that may only have a few users a day would it make sense to throw it on to fly.io?


It would make sense! We give free credits specifically for side/hobby apps (in theory, you can run ~3 microscopic VMs pretty much full time with this): https://fly.io/docs/pricing/#free-for-side-projects

We don't scale to 0 because the cold start experience for most apps is brutal. In the future we may be able to suspend VMs and wake them up quickly, or even migrate them to regions with idle servers.


What do you use for orchestration?


Nomad + our own Firecracker task driver. There's a promising open-source task driver for Firecracker as well (ours does a ton that's specific to our networking setup): https://nomadproject.io/docs/drivers/external/firecracker-ta...
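If you're curious what that looks like, a Nomad job using a Firecracker driver is shaped like any other job; only the driver name and its config block change. A sketch -- the config keys are driver-specific, so they're left as comments here:

    job "microvm-app" {
      datacenters = ["dc1"]

      group "app" {
        task "server" {
          # the external driver linked above; ours is similar in spirit
          driver = "firecracker-task-driver"

          config {
            # kernel image, root drive, vCPUs, memory, network interface...
            # the exact keys depend on the driver version, so check its docs
          }
        }
      }
    }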


What's OSS in this context?


Open source software.


Oh FOSS. Got it.


Do people still try to claim that source-available software is open source? I go by the OSI definition.


That was weird; I'm not sure why that comment got flagged.

I'm not sure what you're asking.

I really was confused about what "OSS" meant w/o the "F" in front.



