Fly.io: The reclaimer of Heroku's magic (christine.website)
528 points by sealeck on May 15, 2022 | hide | past | favorite | 299 comments

At one point (a very long time ago now) it was declared that Dogwood was the future and as a result Go would be the language of choice at Heroku and Erlang would be no more.

Trouble is that Erlang ran all the important Cedar code (it might still today) and the Erlang engineers didn't particularly like the news that Erlang code was essentially deprecated so they left and nobody knew how to maintain the stack. This definitely wasn't the only problem we had but it was a big one.

What do fellow Herokai think? Was Dogwood a fool's errand? Or did we just not get enough staff to build it properly?

I just don't understand all the fetishism for go. At its core, it's a brutally pragmatic language optimized for engineers who barely understand closures, while running reasonably fast.

need a garbage-collected, natively compiled language?

- ocaml fits the bill and provides better type checking

need a fast as possible system?

- rust is faster and has much better abstractions for type checking. unless you're writing a throwaway or a very short lambda function, rust is almost always a better choice here, as handling errors and maintaining the code is going to matter more over time, and go is only now getting its generics story straight

need a networking application?

- elixir (and erlang) do this so much better. it has 30+ years of high-reliability networking built in and it's about as fast as go. additionally, fault tolerance and error handling are so much better. you get real parallelism out of the box and async primitives that make goroutines look like a joke.

additionally, all 3 (ocaml, rust and elixir) give you proper tools for handling error branches. go downgrades you back to c style, which works, but means your code is going to evolve into a goddamn mess, as there's no way to separate your error path from your happy path
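For what it's worth, the "c style" pattern being criticized looks like this in Go: every fallible call is followed by an explicit `if err != nil` check, so the error path is woven through the happy path (a minimal sketch; `parseConfig` and `validate` are hypothetical helpers):

```go
package main

import (
	"errors"
	"fmt"
)

// parseConfig is a hypothetical fallible step.
func parseConfig(raw string) (string, error) {
	if raw == "" {
		return "", errors.New("empty config")
	}
	return raw + ":parsed", nil
}

// validate is another hypothetical step; every caller must
// check its error before moving on.
func validate(cfg string) (string, error) {
	return cfg + ":valid", nil
}

// loadConfig shows the idiomatic Go shape: the happy path and the
// error path share the same column of `if err != nil` checks.
func loadConfig(raw string) (string, error) {
	cfg, err := parseConfig(raw)
	if err != nil {
		return "", fmt.Errorf("parse: %w", err)
	}
	cfg, err = validate(cfg)
	if err != nil {
		return "", fmt.Errorf("validate: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig("app")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg) // app:parsed:valid
}
```

Whether that reads as admirably explicit or as noise is exactly the taste divide in this subthread.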

Literally the only place I see go making sense is small scripts that need to be written asap and won't need much long-term maintenance. for everything else, go seems woefully inadequate.

Rust probably didn't exist when this decision was made.

Out of the box from day one, Go is great at writing HTTP services including proxies, which is a large part of what Heroku needed. Ocaml is harder to use and not a popular choice for such things. Go has easy to follow docs and tonnes of useful contemporary libraries. Go is especially easy to pick up for anyone with older C experience.

I've found Go excellent for the long term: when you come back to something that hasn't been touched for years, it compiles and runs quickly and easily. I wouldn't have thought it was any good until I actually used it for something.

Also, concurrency in Go is braindead easy, there are multiple choices of "worker pool" libraries and queue/messaging choices. You don't even have to know about channels to do work across cores.
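As a sketch of how little ceremony that takes, here's a bare `sync.WaitGroup` fan-out with no channels at all (a toy example, not any particular worker-pool library's API):

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll fans work out across goroutines with nothing more than a
// WaitGroup; the runtime schedules them over all available cores.
// Each goroutine writes a distinct index, so no locking is needed.
func squareAll(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			out[i] = n * n
		}(i, n)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```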

EDIT: Having said that, if you already had Erlang and an experienced team, you wouldn't ditch that for Go. Why do companies do this? Is there some famous historical case where keeping and growing a highly experienced team has backfired?

> Having said that, if you already had Erlang and an experienced team, you wouldn't ditch that for Go. Why do companies do this?

You don't get promoted until you do the new shiny thing. This seems to be especially prominent in Google and the likes. Big companies do not give out bonuses/promotions for those who quietly sit and work with existing, proven stack, maintaining it and fixing bugs. Shiny new things get noticed.

It might also be caused by staffing issues? Erlang/Elixir to this day are pretty niche stacks. It's much easier to find Go developers (or developers willing to switch to Go) than Erlang developers.

I worked at one of the beneficiaries of that exodus with some ex-Heroku Erlang folk. Can't speak to whether Dogwood was a fool's errand or not, but dumping Erlang in favor of Go, considering the quality of Erlang talent they had on staff, was definitely a colossal fuckup.

Have seen similar transitions in different companies, and while I can't say anything about Heroku, it is often not the target technology or architecture that matters.

For instance I've seen a PHP -> nodejs transition while moving to microservices, and while the ideas made sense on paper:

- It didn't come from the engineers at large. Most weren't fazed by the prospect, and the main engine for the change was the top architects and engineering management.

- The target architecture and language were very popular and easy to hire for. Incidentally, salaries would be cheaper for engineers of the same experience level.

Predictably, a ton of the existing engineers left and new blood came in, and it went mostly according to plan from what I gathered. Some of the "old folks" stayed at a premium once they were the only ones left with enough knowledge, but they got relegated to side "expert" roles, and the product as a whole saw a big change in philosophy (I think what was seen as the "core" also became "legacy" in everyone's mind as engineers moved away).

This is pretty common in corporates too. Shiny new project gets all the attention, funding and management focus. The poor guys left to deal with the day-to-day crap get screwed over, then leave. If they leave too soon it's an emergency; if they stay too long they get laid off.

Oh for sure. The last company I worked at did a similar thing and decreed that all Ruby projects would be maintenance-only and all new projects were to be in Golang, despite there being a single Golang dev in the entire company (who did not even work in the backend dev team but rather in a "platform" team). Within five months all the Ruby devs had left, so, despite not a single Golang project having been released yet, the company now had nobody who knew how the current system worked. Several large outages ensued.

Yeah.. I'm seeing the same thing where I work. Moving from ruby/rails to elixir. No outages, but dev speed is at a crawl. All the legacy stuff just gets ugly patches.

There was zero need to switch because even though elixir/BEAM allows for a pile of processes, postgres does not.

If I had to guess, the switch was made because someone saw something shiny.

Let's re-write it, they said. It will be easier to maintain, they said. It will be more user friendly, they said. We're Netscape and there's nothing like us, they said.

I think the answer to this question is very complicated.

Sorry I couldn't find a reference, what is dogwood?

Heroku named all of their stacks after an alliterated color + tree convention:

- Argent Aspen

- Badious Bamboo

- Celadon Cedar

I can’t remember what the color name for the Dogwood stack was meant to be; we mostly only ever referred to them by the tree name and dropped the color. Suffice it to say that Dogwood was meant to be the 4th major evolution of what Heroku was.

Second paragraph in the article

The next phase after Cedar. It was mostly a "Go 2" style mythical benchmark that they compromised on for private spaces but was never really able to meet.

It’s mentioned in the article.

The only thing I don't like is their usage-based pricing. On Heroku I could pay $7 a month and know I'd never be charged more than that. I'm sure when you're scaling a service it's fine - maybe even better - to do it on a sliding scale. But for a fire-and-forget blog site, I don't want to have to worry about stuff like that.

This is a problem. And a bit of an own goal on our part.

I hate services that don't put a price on things like bandwidth (because there's always a price!). So we priced bandwidth and made it transparent. You can put an app on Fly.io and serve petabytes of data every month, if you want. We'll never complain that you're serving the wrong content type.

But the reality is – having an unlimited bandwidth promise is perfect for a fire-and-forget blog site. We're not doing ourselves any favors with scary pricing for that kind of app.

I don’t want an unlimited bandwidth promise, I want a cap that I know can never be exceeded. I mean, I use Azure professionally and one of the key reasons I don’t use it to host my own stuff is exactly because it could potentially become very expensive. I’d rather have my own stuff shut down until I decide what I want to do with it.

Things like alerts are fine, professionally, but not for things like running a small app, blog or whatever, that you’re not sure where is heading.

I don’t think anything I’ve built on my own time has ever ended up breaking the bank, but signing up my credit card is a risk I’m never going to take, and I’m fairly certain I’m not alone in that. Of course I have no idea if there are enough of us to make small-scale fixed-price products profitable at scale.

We actually launched with that feature: https://news.ycombinator.com/item?id=22616857

No one took us up on it. What we found is that the majority of people want their stuff to stay up, and the right UX for "shut it down so you don't get billed" is not obvious.

We ended up implementing prepayment instead. If you sign up and buy $25 in credit, we'll just suspend your apps when the credit runs out.

Bandwidth is weird because we have to pay for it (as does every other provider). We aren't yet in a position where we can just make it free without limits. Maybe next year. :)

I'm actually very curious: why is bandwidth so much cheaper on more traditional VPS or dedicated server hosts like Hetzner? This extends to their somewhat new-ish cloud product, where you get 20TB traffic included - even on a tiny instance. And it's 1 Euro per TB after that. [1]

Do they just decide to not profit from bandwidth or are they doing something special that allows them to be so cheap?

[1] https://docs.hetzner.com/robot/general/traffic/

There are three ways to manage bandwidth prices:

1. Put servers where bandwidth is cheap (not Sydney, for example)

2. Constrain throughput per server

3. Buy from cheap transit providers like Cogent

Hetzner does all three. Bandwidth in the US/EU is very cheap. They meter total throughput on their services. And they use cheap providers. None of these are bad choices, just different than ours.

Our product has multiple layers, too. When you connect to a Fly app, you hit our edge, then traffic goes to a VM that's probably in another region. When you hit a Hetzner server, there are no intermediate hops.

We usually pay that three times as data moves from customer VMs to our edges to end users (out from our edge to worker vm, out from worker vm to our edge, out from our edge to end user). Or 10x, in some cases, if data moves from Virginia to Chennai to an end user.

We pay $0.005/GB in the US and $0.9/GB in Chennai. You can see how this might add up. :)

The Chennai pricing is incredibly steep. It's weird that bandwidth prices for end-users in India are among the cheapest (if not the cheapest) in the world but enterprise customers are paying so much.

Jio and friends are (were?) subsidizing connectivity and trying to drive growth to make it up later/elsewhere like payments and services.

Service providers face the opposite problem. Local infra sucks, the market is full of incumbents, and India is generally protective of those markets.

Very similar dynamics with China as well.

$0.9 is a typo, I meant $0.09/GB in Chennai. We're not taking that much of a haircut.

You forgot to play the peering game, which can lead to substantially cheaper bandwidth when you connect to enough Tier 1/2 ISPs. I think Hetzner does this some as well.

transit pricing can still be had readily for ~.10 cents per megabit. that's per second, mind you. so you pay for capacity, not for bits transferred. {$cloudprovider} makes a sweet margin by upselling that to you in terms of bytes transferred. CF wrote a post explaining it [1]

Obviously, you would need to manage servers, colo, and network and keep it up on your own, or pay for it. And cloud providers offer a lot of value as well. But if you are at the right scale and can DIY it (in-house ops/network team), you can save a LOT of money.

1 - https://blog.cloudflare.com/aws-egregious-egress/
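To make the capacity-vs-transfer gap concrete, here's the back-of-the-envelope conversion (a sketch assuming the figure above means roughly $0.10 per Mbps of committed transit per month; real quotes vary):

```go
package main

import "fmt"

// effectiveCostPerGB converts a committed-rate transit price
// (dollars per Mbps per month) into an equivalent per-GB price,
// assuming the link is fully utilized all month. The inputs are
// ballpark assumptions, not real quotes.
func effectiveCostPerGB(dollarsPerMbpsMonth float64) float64 {
	const secondsPerMonth = 30 * 24 * 3600 // 30-day month
	// 1 Mbps = 0.125 MB/s; total GB moved in a month at full rate.
	gbPerMonth := 0.125 * secondsPerMonth / 1000
	return dollarsPerMbpsMonth / gbPerMonth
}

func main() {
	// A fully used 1 Mbps commit at ~$0.10/Mbps/month moves ~324 GB,
	// i.e. a tiny fraction of a cent per GB.
	fmt.Printf("$%.5f per GB\n", effectiveCostPerGB(0.10))
}
```

Compare that fraction of a cent against typical cloud egress at several cents per GB and the margin Cloudflare's post describes falls out immediately.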

I would love to hear the answer to this as well.

My guess is that they either don't profit from bandwidth or they have peering agreements from back when the internet was young.

Having a billing limit is important to me as I don't want to risk unbounded costs for personal projects.

> we'll just suspend your apps when the credit runs out.

This sounds great! I've looked at Fly.io before but didn't realise this was a thing so didn't go past looking. I'll definitely give Fly.io a test run now. :)

I can't speak for your whole market, but I know for me the pre-loading flow sounds really clunky because I'd have to go and manually add funds each month (right?)

It's understandable if your usage data showed the fee-capping feature just wasn't popular enough to be worth maintaining, though that would surprise me based on this thread (but possibly HN just isn't representative of the whole market)

I’d be happy with a “refill if under $10 up to $x/month”. If they can cut off when you’re out of credit they can presumably do it for other criteria, too.

Do you also have a monthly prepayment subscription where the balance is topped up to a fixed amount (ideally one calculated automatically via the web UI, at the user's option)? Naively that's what I'd expect the solution to look like.

This. I've had a very real situation with DataDog: I accidentally created a Synthetics test (and forgot about it) on a test account and incurred a $5k+ bill at the end of the month. Zero notice or anything.

I ended up swallowing the bill but will not be using them again since this is plain scary.

Edit: funny thing - to add insult to injury, I've started to hear from their sales people on "my growth plans", even though I've had a support ticket hoping to resolve this to no avail.

Datadog is one of the worst offenders in this one. It's like they actually want to get you into that situation. Wanna see your prepaid capacity? Lol, email our support. Wanna use front-end monitoring? Here's an example with the defaults hidden. (Oh btw the defaults enable all the features and set the sampling to 100%) How do I turn on sampling? Don't worry, limitless tracing is the way, we'll figure out how to filter your traces. (Oh btw you'll still pay for ingestion, we're just omitting it here)

Yeah, if I'm going to be signing up with my credit card I'll just log back into my AWS account, since I already have price monitoring set up for that stuff. I definitely don't want to worry about surprise costs for scratch/hobby stuff, let alone monitor bills for it.

With Heroku it is stupidly simple to get things going, no credit cards, "fire and forget". Works great for hobby projects, and examples you want to show.

This topic has been discussed many times. There's no easy solution to what happens after the cap is exceeded - should the vendor delete everything? - which is why so many of them don't bother.

And for customers, it's far easier to negotiate billing disputes than to try to recover from an account deletion because of spending caps (and there have been plenty of examples here of companies shutting down because of such a mistake).

It's pretty simple: Ask people when signing up if they want to set a spending limit or not. If they do, return HTTP 500 as soon as the spending cap is reached. If it's for storage, return a quota exceeded error. Obviously the vendor shouldn't delete data, that would be stupid.

A lot of people try to get hobby users on a platform as a form of PR -- if you don't have spending caps, you are probably going to scare a lot of them away.

And I've never heard a company shutting down because they exceeded a limit -- I've only heard of people being surprised by unintended extremely high bills.

> "Obviously the vendor shouldn't delete data, that would be stupid"

Since data accrues charges over time, there is no alternative if you want a hard cap. Which is why none of this is very simple at all and requires a tremendous amount of planning and complexity just to implement, let alone all the possible new issues and mistakes it creates for customers.

Again, this is the typical "write some code in a weekend" approach that's missing all context of what it actually takes. And as I mentioned before, it's far easier to just negotiate billing than to deal with the aftereffects of whatever service and data disruption this feature would cause for the tiny fraction of customers that end up with this problem.

Waiving your $5 fee is far easier and cheaper than spending millions trying to avoid it in the first place, only to have it replaced with potential complaints that a production account was suspended or deleted.

It's very easy to solve this problem. Just let people set a cap on the things with usage billing:

"I want to cap the amount of data I store to 50GB and the amount of traffic I serve to 1000GB"

Of course there is an obvious problem with this, the pricing structure would become transparent and you don't want that as a cloud provider. You want your customer to just pay his bills and not even know why it costs this much.

> "an obvious problem with this, the pricing structure would become transparent "

How is the pricing structure not currently transparent? What number are you missing exactly?

Imo it’s really simple, actually. Stop the service and delete after the next billing cycle. Price that into the regular price of the service.

That's limited to free tiers that usually deactivate or delete if there's too much usage.

Anything with production usage and pay-as-you-go pricing means data at rest still costs money - and requires deleting to avoid accruing new charges. Do you want your databases and volumes and object storage deleted when your app stops?

And if this was offered then there would be a whole new class of mistakes leading to lost data. Like I said, billing is easier to negotiate than deleted accounts.

The high charges are almost never storage costs; it's almost always processing or bandwidth.

And storage costs are simple to predict -- as soon as you see the cap would be exceeded, stop accepting new data.

Overages are overages, whether it's 1 penny or thousands of dollars. Data also still costs money over time and can be unpredictable.

This kind of pacing and billing buffer is an immense amount of complexity at scale for very little benefit (even if an individual user might like it).

Can be unpredictable? How would data storage costs increase if you stopped accepting new data?

SaaS companies manage to pull off ridiculously complicated things, but coming up with a billing scheme that doesn't fuck over the customer is asking too much?

The simple truth is that usage based pricing is designed to be unpredictable, and surprising customers with high bills is probably considered a feature, not a bug.

Data (and all resources) cost money per time. You don't need to add new data to an S3 bucket to get a bill every month because the existing data accrues charges.

The billing scheme is very transparent and friendly by being pay-as-you-go. It doesn't "fuck over the customer".

Your entire complaint isn't about the pricing scheme but about an additional feature to stop billing at some point - which, as I've explained, is not easy to calculate precisely because charges accrue over time, and adding even more complexity for calculations and potential for mistakes is not worth it for the many reasons I outlined previously.

> "The simple truth"

You keep repeating that word. There is nothing simple about this. Let's end this here.

> You don't need to add new data to an S3 bucket to get a bill every month because the existing data accrues charges.

Yes, but it's pretty predictable. Once there's enough data in your bucket that the monthly cost would go over the limit, just stop accepting new data.

Nobody cares if the spending limit is accurate to the cent. What people care about is not being surprised by huge invoices.

> The billing scheme is very transparent and friendly by being pay-as-you-go.

Have you looked at eg. the glacier pricing scheme, or at lambda pricing? It's almost impossible to know how much it's going to cost you ahead of time. The only thing you know is that if you happen to use it differently than anticipated, it's going to be expensive.

> "it's pretty predictable"

It quite literally isn't, otherwise there would be no billing surprises in the first place. Your entire argument about predictability is counter to the problem of unpredictable charges.

> "just stop accepting new data"

This is still effectively data loss and a major problem in production. Customers would rather negotiate a bill than lose data.

> "Nobody cares if the spending limit is accurate to the cent. What people care about is not being surprised by huge invoices."

Then it's a soft-cap, and if that's all you want then you already have billing alarms. Otherwise what's the buffer amount? What overage is acceptable? Is there a real hard cap? What if that's reached? You didn't actually provide any solution here.

> "the glacier pricing scheme, or at lambda pricing ... It's almost impossible to know how much it's going to cost you ahead of time."

How so? AWS is completely transparent about pricing. The calculations for it might be hard, but that's an entirely different issue. There are plenty of tools you can use if you don't want to do it yourself, however this is another logically incongruent point where you claim billing is easy enough to calculate and predict accurately for caps yet simultaneously hard enough that it's "almost impossible".

You are twisting my words. I said it's easy to predict storage costs, to counter your claim that you'd need to delete data to stay within budget.

The biggest problem is mainly exorbitant bandwidth costs, and those are trivial to cap -- just stop serving requests.

Also, billing alarms are not a soft cap. They don't prevent you from waking up in the morning to a 5000€ bill.

> You didn't actually provide any solution here.

I'm commenting on the internet, I don't need to come up with a way for AWS to implement billing caps, especially since they have designed their service pricing in a way that makes estimates really hard.

But for most services, billing caps really aren't that hard, especially since the company we are discussing here (fly.io) apparently already allows billing caps if you prepay (according to other comments here).

> "it's easy to predict storage costs"

You're just repeating this. Predictable is the opposite of surprise.

Even if storage use was very stable, so what? The overall bill is the problem so where the charges come from doesn't matter, only that eventually a limit is crossed. An overage is still an overage and the only way for billing to stop immediately is to delete and drop everything. This is the fundamental issue that you're not considering. It's what happens at the limit, not about how you get there.

> " billing alarms are not a soft cap"

Soft caps that don't actually stop anything are effectively nothing more than billing alarms. What else is their purpose?

> "I don't need to come up with a way for AWS to implement billing caps"

I didn't ask for implementation, I'm inquiring as to what logically is supposed to happen in the scenarios that occur under your proposed "pretty simple" solution. If you can't answer, then it's not so simple, is it? Either you haven't thought it through entirely, or you'd have to conclude that it's not actually possible to do it that way.

> "designed their service pricing in a way that makes estimates really hard"

How so? You also keep repeating this without evidence. How does providing numbers on exactly what they charge make it difficult? It's as transparent as it gets. They also have a calculator on their site. What more are you expecting?

> "for most services, billing caps really aren't that hard"

The nature of the service changes everything. Fly.io doesn't have billing caps, they just stop the apps when the credits run out and eat the bandwidth cost for now. The economics of scale can change that answer drastically, however even Fly repeats what I've said before: "the majority of people want their stuff to stay up" and "shut it down so you don't get billed" is usually not the preferred solution compared to negotiating a large bill.

> I'm inquiring as to what logically is supposed to happen in the scenarios that occur based on your proposed "pretty simple" solution.

Here's the simplest solution: If the limit is reached, stop serving requests, stop accepting new data, but don't delete any data. Allow static storage costs to go over the limit. That is probably what 99% of people who ask for a budget cap want, and it's the most logical thing to do because typically 99% of the charges are for bandwidth/requests/processing and only 1% for storage. If I set a limit at 10€ and amazon ends up charging me 10.2€ I can live with that.

The next simplest solution would be to look at how much is currently stored, multiply that with the storage cost per hour, multiply that with the remaining hours in the month, then subtract that from the monthly budget, and stop serving requests or accepting new data as soon as this lower limit is reached. This will guarantee that you never go over the limit without having to delete data. If data in storage is deleted before the end of the month, you'll end up spending less than the limit.

Now if you consider this basic math too complicated for a software engineer making $300000 a year, you could do something even simpler: allow the customer to set a limit on each resource. Eg. let the customer say, I want to limit S3 to 100GB of storage and 5TB of bandwidth and 2 million requests (or whatever). Of course that would be a bit of a hassle for the customer, but it would be very effective at preventing surprise charges.
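The lookahead in that second solution really is a few lines of arithmetic (a sketch with made-up rates; `shouldSuspend` and its parameters are hypothetical, not any provider's API):

```go
package main

import "fmt"

// shouldSuspend implements the lookahead described above: reserve
// enough of the monthly budget to keep existing data stored through
// month end, and cut off traffic once spending reaches what's left.
// All rates here are illustrative assumptions.
func shouldSuspend(spent, budget, storedGB, costPerGBHour, remainingHours float64) bool {
	reserved := storedGB * costPerGBHour * remainingHours
	return spent >= budget-reserved
}

func main() {
	// $9 spent of a $10 budget, 100 GB stored at $0.0001/GB/hour,
	// 360 hours left in the month -> $3.60 must stay reserved,
	// so traffic stops once spend reaches $6.40.
	fmt.Println(shouldSuspend(9.0, 10.0, 100, 0.0001, 360)) // true
}
```

If stored data shrinks before month end, the reservation was conservative and the final bill simply lands under the cap.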

> the majority of people want their stuff to stay up

At any cost? That's unlikely. I'm pretty sure that every company has some limit where they'd prefer the service to be down rather than pay the bill.

But if you go back up the thread you'll see that this discussion is about hobby users and tinkerers, and people who just sign up to try out the tech. These people absolutely want a hard cap on potential charges, and if you don't offer that you might scare them away.

> billing is easier to negotiate than deleted accounts

Maybe, but I'd still like the option. My blog site is stateless so I couldn't care less if the service got deleted

I don't want unlimited bandwidth (I mean, it'd be nice - but I realize it costs money) -- I want predictable pricing and spending caps for my personal projects.

If I can't have that, then I just can't take the risk of my site getting hammered (by attackers, getting linked on HN, whatever) and racking up some kind of bill I can't pay. I realize the chances of that are small but that's scary.

I realize that, obviously, "stay up at all costs" is what a business needs and that's where the money is. Fly.io won't get rich off of personal projects. But, I do think they serve to get developers onto the platform.

Your bandwidth allocations and rates seem very fair and generous, btw. I do really want to check out your platform.

> And a bit of an own goal on our part.

More than a bit.

Simply give people the option to put a charge limit and let the app be offline when that limit gets hit. Don't make it the default, but do allow people to do it.

This would resolve 99% of the fear people have. And most people wouldn't set the limit anyway. However, your knowledgeable people might set it, and those are the ones you're most trying to attract.

When I worked in adtech (Brightroll) we hosted a segment mapping server on heroku. All it did was look at a url query for the target url, append a value from a cookie to it, and 302 to that url. The value in adtech is that I can tell you which segment a user is in by taking the data in my cookie and appending it to your url.

We launched it on Heroku using Nodejs and sent billions of requests a day to it. The thing ran on like a few dozen dynos, but it was responsible for like 20-30% of all of Heroku's bandwidth. Fantastic value for us. Immediate headache for Heroku.

> We'll never complain that you're serving the wrong content type.

Cloudflare shouldn't restrict media (video, images, and audio) from its unlimited bandwidth promise for Workers and R2 (though, ToS doesn't yet reflect that).


> But the reality is – having an unlimited bandwidth promise is perfect for a fire-and-forget blog site

I think an auto flyctl pause -a <myapp> when myapp exceeds $x in charges (with an auto-resume when the billing rolls over) may serve as a viable interim solution. Maybe this is already possible with fly.io's graphql api?

Yup. Not too hard. This outputs the current bandwidth in gigabytes when you pass your app name.

  #!/usr/bin/env bash

  set -euo pipefail

  # JSON body for the GraphQL billables query; $1 is the app name.
  QUERY=$(cat <<EOF
  {
      "variables": {
          "app": "$1",
          "start": "$(date +%Y-%m)",
          "end": "$(date -d "$(date +%Y%m01) +1 month -1 day" +%Y-%m-%d)"
      },
      "query": "query (\$app: String!, \$start: ISO8601DateTime!, \$end: ISO8601DateTime!) { app(name: \$app) { organization { billables(startDate: \$start, endDate: \$end) { edges { node { app { id } category quantity } } } } } }"
  }
  EOF
  )

  # Sum the data_out quantities for the given app this month.
  curl https://api.fly.io/graphql \
      -H "Authorization: Bearer $(fly auth token)" \
      -H 'content-type: application/json' \
      --compressed \
      -X POST \
      --data-raw "$QUERY" \
      --silent |
      jq '[.data.app.organization.billables.edges[].node | select(.app.id == "'"$1"'" and .category == "data_out") | .quantity] | add'

I might be in the minority here, but I don’t have the slightest clue what my monthly bandwidth currently is on heroku, and so I can’t estimate what Fly will cost.

Um, how do I find this out? Preferably historical usage.

I only found out my bandwidth after putting Cloudflare in front of my apps, as they email a monthly bandwidth-saved report.

I think it's also fine to just say "that's not our primary target market". Just thought it was worth pointing out as a (perhaps small?) segment of Heroku's market, if we're comparing apples to apples

Oh but you are! And it won't even cost you anything. I'd bet money your blog fits in our free tier, we just don't (a) tell you that and (b) solve the "what happens if there's a bandwidth burst" problem.

Yeah, I also supposed that it would probably mostly fit in the free tier, which is great. But I'd lose a small amount of sleep over the possibility of getting a huge bandwidth burst (DDOS or otherwise) that goes straight to my bank account

A feature that could help would be giving people the option to set a cost limit, where if their site surpasses that limit in a given month you just pull it offline instead of charging more money. That's what I'd want for my blog site, and I've heard others request such a feature from other cloud providers

Do you hate Cloudflare?

This is the reason I use Heroku and would never on a personal level use some of the bigger solutions, or apparently Fly.io. As an individual, and despite being generally careful, I just cannot accept even a tiny risk of a $100k+ accidental bill. I'd rather my project go down if there's a DDoS attack, or if I made a typo and created an infinite loop. If I take some of my hobby projects to the "next level", it'd definitely be behind an Ltd company.

Is it possible to rack up a 100k+ bill with personal projects on platforms like fly.io? Like some absolutely massive DDOS or something?

It could also be simple user error – there have been stories on HN about how people racked up massive bills, for example because CI was running continuously, etc.

Have you looked at render? Been using it for my hobby sites and am very happy with it.

I also really like render, very easy to use.

Does that mean if my webapps get DoSed or something like that, and I can't react very quickly, I could face a bill potentially in the thousands of dollars?

Currently considering switching from Heroku, but fixed pricing is a must. I'd rather they shut down my apps temporarily if something is out of control than go broke ;-)

Any other recommendations besides fly.io?

No, we don't bill people for traffic from attacks. We also waive fees from big, legitimate bursts. The intent of our bandwidth pricing is to allow high usage, sustained bandwidth workloads. It's not to sneak one over on you.

How about large traffic from a legitimate spike (e.g. front page of reddit or HN)?

You have enumerated a lot of alternatives so far (prepaying, waiving attack costs) but you still haven't addressed the number one scenario that everybody has been asking about, and which was the reason why Heroku was such a hit: do you offer a flat fee which, if exceeded, simply shuts down the app until the next billing cycle?

"We also waive fees from big, legitimate bursts."


Great question. I want to know the answer to it.

I apologize for being skeptical but the wording in the contract seems to be extremely handwavy and they don't give me any confidence that if my bill goes from $4 to $200 that month because I made it to the front page of reddit, my bandwidth will magically be waived.

Right now, I feel 100% sure that my credit card would get charged.

Contractually? I doubt it. But they have refunded many, many people on their forums after they report accidental charges. Whether they continue is a different story, but I feel safe using it and certain that I won't be overcharged.

To be clear a bandwidth limit would be awesome! And the pricing may not be for everyone. However there is a large amount of leeway as evidenced by the community forum posts.

Good will does not feel like a good way to scale this. For a business that might be ok, but I’m highly unlikely to recommend to a business something I haven’t personally investigated, and I’m not writing cheques whose value is dependent on the good will of a startup.

I don’t understand why cloud providers will not accommodate this basic “prepay to X and allow me to use the credit” model.

>I don’t understand why cloud providers will not accommodate this basic “prepay to X and allow me to use the credit” model.

They actually do:

> You can configure fly.io apps with a max monthly budget, we'll suspend them when they hit that budget, and then re-enable them at the beginning of the next month.

From their Launch HN: https://news.ycombinator.com/item?id=22616857

According to this they got rid of the feature at some point: https://news.ycombinator.com/item?id=31391451

How about you just offer the option to throttle free instances once they run out of free bandwidth? That way your free tier will be "safe".

Big concern of mine as well. They take your CC for usage, which is reasonable given bad actors, but then I can’t put a limit on my monthly charges.

I use privacy.com for things like this. Make as many CCs as you want with whatever limits you want.

That's also something that has kept me with VPS hosts over cloudy things for hobby stuff: a) included traffic amount is vastly higher there, leading to way lower cost per GB if you need it and b) they usually do just cap you if you exceed traffic and don't opt-in to pay more.

I've actually been deterred from Heroku because of the pricing. I find $7/month too much for a website or a blog, especially as I like starting new projects. There may be months where nobody visits my sites. I pay 2.49 euros/month for a VPS in Germany with 20 TB of free monthly bandwidth, 2GB RAM, 20 GB of disk space, no brainer. It's just a bit more manual.

Just use a static hoster for a blog site (like netlify or even github pages).

I've got a small amount of dynamic content

DigitalOcean App Platform has a $5/month flat rate for dynamic stuff on top of two static sites hosted for free...

if something goes crazy and you end up using a wild amount of outbound data, it looks like the next jump up is only to $12

Depends on what you would consider a "wild amount".

Their App Platform's bandwidth pricing is pretty painful at $0.10/GB. With these prices, and considering the App Platform lacks functionality like multi-regional droplets or VPC integration, they are a subpar choice even compared to Firebase or Amplify.

Author of the post here. Fun fact: if I paid that for how much traffic I'm getting, this post would cost me $5 for being on the front page of hacker news for a few hours.

Most static site hosts nowadays provide (serverless) functions for you to do something on the server side.

what if they want to use a CMS?

https://forestry.io could help

After all the chatter this week, I've come to the conclusion that Heroku froze at the perfect time for my 4 person company. All of these so called "features" are exactly what we don't want or need.

1. Multi-region deployment only works if your database is globally distributed too. However, making your database globally distributed creates a set of new problems, most of which take time away from your core business.

2. File persistence is fine but not typically necessary. S3 works just fine.

It's easy to forget that most companies are a handful of people or just solo devs. At the same time, most money comes from the enterprise, so products that reach sufficient traction tend to shift their focus to serving the needs of these larger clients.

I'm really glad Heroku froze when it did. Markets always demand growth at all costs, and I find it incredibly refreshing that Heroku ended up staying in its lane. IMO it was and remains the best PaaS for indie devs and small teams.

> Multi-region deployment only works if your database is globally distributed too. However, making your database globally distributed creates a set of new problems, most of which take time away from your core business.

Guess what? fly.io offers a turnkey distributed/replicated Postgres for just this reason. You use an HTTP header to route writes to the region hosting your primary.


You do still need to consider the possibility of read replicas being behind the primary when designing your application. If your design considers that from day 1, I think it takes less away from solving your business problems.

Alternatively, you can also just ignore all the multi-region stuff and deploy to one place, as if it was old-school Heroku :-)

> Guess what? fly.io offers a turnkey distributed/replicated Postgres for just this reason. You use an HTTP header to route writes to the region hosting your primary.

Doesn't this take away a lot of the benefits of global distribution?

For example if you pay Fly hundreds of dollars a month to distribute your small app in a few datacenters around the globe but your primary DB is in California then everyone from the EU is going to have about 150-200ms round trip latency every time you write to your DB because you can't get around the limitations of the speed of light.

Now we're back to non-distributed latency times every time you want to write to the DB which is quite often in a lot of types of apps. If you want to cache mostly static read-only pages at the CDN level you can do this with a number of services.

Fly has about 20 datacenters; hosting a smallish web app distributed across them will be over $200/month, not counting extra storage or bandwidth, just for the web app portion. Their Postgres pricing isn't clear, but a fairly small cluster is $33.40/month for 2GB of memory and 40GB of storage. Based on their pricing page it sounds like that's the cost for one datacenter, so if you wanted read replicas in a bunch of other places it adds up. Before you know it you might be at $500/month to host something that will have similar latency on DB writes as a $20/month DigitalOcean server that you self-manage. Fly also charges you $2/month per Let's Encrypt wildcard cert, whereas that's free from Let's Encrypt directly.
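Those latency figures pass a back-of-the-envelope check. Assuming a great-circle distance of roughly 9,000 km between California and central Europe, and a signal speed in fiber of about 200,000 km/s (roughly two-thirds the speed of light) — both illustrative assumptions, not measurements:

```python
# Back-of-the-envelope floor on EU <-> California round-trip latency.
# Assumptions: ~9,000 km great-circle distance, ~200,000 km/s in fiber.
distance_km = 9_000
fiber_speed_km_s = 200_000  # ~2c/3

one_way_ms = distance_km / fiber_speed_km_s * 1000  # 45 ms one way
theoretical_rtt_ms = 2 * one_way_ms                 # 90 ms round trip

# Real cable paths are longer than great-circle routes and add routing
# overhead, so observed RTTs of 150-200 ms sit plausibly above this floor.
print(round(theoretical_rtt_ms))  # 90
```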

You don’t need to route every write to primary though, but only those writes that have dependencies on other writes. Things like telemetry can be written in edge instances. Depends on your application of course, but in many cases that should be only a tiny fraction of all requests needing redirects to primary.

And why would you get 20 instances, all around the world right out of the gate? 6-7 probably do the job quite well, but maybe you don’t even need that many. Depending on where most of your customers are, you could get good results with 3-4 for most users.

> You don’t need to route every write to primary though, but only those writes that have dependencies on other writes.

Thanks, can you give an example of how that works? Did you write your own fork of Postgres or are you using a third party solution like BDR?

Also do you have a few use cases where you'd want writes being dependent on another write?

> 6-7 probably do the job quite well

You could, let's call it 5.

For a 2gb set up would that be about $50 for the web app, $50 for the background workers, $160ish for postgres and then $50 for Redis? We're still at $300+?

I was thinking maybe 5 background workers wasn't necessary but frameworks like Rails will put a bunch of things through a background worker where you would want low latency even if they're happening in the background because it's not only things like sending an email where it doesn't matter if it's delayed for 2 seconds behind the scenes. It's performing various Hotwire Turbo actions which render templates and modify records where you'd want to see those things reflected in the web UI as soon as possible.

> Thanks, can you give an example of how that works?

I just noticed I formulated it wrong, my apologies. What I meant is that the replicating regions don’t need to wait for the primary writes to go through before they respond to clients. They will still be read-only Postgres replicas, and info could be shuttled to primary in a fire-and-forget manner, if that’s an option.

Whenever an instance notices that it's not primary, but it is currently dealing with a critical write, it can refuse to handle the request, and return a 409 with the fly-replay header that specifies the primary region. Their infra will replay the original request in the specified region.
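A minimal sketch of that replay mechanic, assuming the `region=...` form of the fly-replay header; the handler shape, region codes, and write-method set are illustrative, not Fly's actual API:

```python
# Sketch of the fly-replay pattern described above: a replica region
# refuses a write and asks the proxy to replay it in the primary region.
# PRIMARY_REGION and the request shape are illustrative assumptions.
PRIMARY_REGION = "sjc"
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

def handle(method: str, current_region: str):
    """Return (status, headers) for a request hitting this instance."""
    if method in WRITE_METHODS and current_region != PRIMARY_REGION:
        # 409 + fly-replay tells the proxy to re-run the request elsewhere.
        return 409, {"fly-replay": f"region={PRIMARY_REGION}"}
    return 200, {}

print(handle("POST", "fra"))  # (409, {'fly-replay': 'region=sjc'})
print(handle("GET", "fra"))   # (200, {})
```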

> Did you write your own fork of Postgres or are you using a third party solution like BDR?

When using fly.io, the best option would probably be to use their postgres cluster service which supports read-only replicas (can take a few seconds for updates to reach replicas): https://fly.io/docs/getting-started/multi-region-databases/

> For a 2gb set up would that be about $50 for the web app, $50 for the background workers, $160ish for postgres and then $50 for Redis? We're still at $300+?

Maybe. A few thoughts:

- Why would you need 5 web workers, would one running on primary not be ideal? If you need so much compute for background work, then that's not fly's fault, I guess.

- Not sure the Postgres read replicas would need to be as powerful as primary

- Crazy idea: Use SQLite (replicated with Litestream) instead of Redis and save 50 bucks

> Why would you need 5 web workers, would one running on primary not be ideal?

It's not ideal due to some frameworks using background jobs to handle pushing events through to your web UI, such as broadcasting changes over websockets with Hotwire Turbo.

The UI would update when that job completes and if you only have 1 worker then it's back to waiting 100-350ms to reach the primary worker to see UI changes based on your location which loses the appeal of global distribution. You might as well consider running everything on 1 DigitalOcean server for 15x less at this point and bypass the idea of global distribution if your goal was to reduce latency for your visitors.

> Crazy idea: Use SQLite (replicated with Litestream) instead of Redis and save 50 bucks

A number of web frameworks let you use Redis as a session, cache and job queue back-end with no alternatives (or having to make pretty big compromises to use a SQL DB as an alternative). Also, Rails depends on Redis for Action Cable, swapping that for SQLite isn't an option.

For low-latency workers like that it might make sense to just run them on the same instance as the web servers.

Does Fly let you run multiple commands in separate Docker images? That's usually the pattern for running a web app + worker with Docker, as opposed to creating an init system in Docker and running two processes in one container (which goes against best practices). The Fly docs only mention the approach of using an init system inside your image, and also try to talk you into running a separate VM[0] to keep your web app and worker isolated.

In either case I think the price still doubles because both your web app and worker need memory for a bunch of common set ups like Rails + Sidekiq, Flask / Django + Celery, etc..

[0]: https://fly.io/docs/app-guides/multiple-processes/

It's interesting that their bash init uses fg %1. That may return only on the first process changing state, rather than either process exiting. It should probably use this instead:

  #!/usr/bin/env bash
  /app/server &
  /app/server -bar &
  # Wait for whichever background job exits first (bash >= 5.1 for -p);
  # -f waits for actual termination, -p stores the finished job's PID in $app.
  wait -f -n -p app ; rc=$?
  printf "%s: Application '%s' exited: status '%i'\n" "$0" "$app" "$rc"
  exit $rc

That looks a million times better than the horrible hack I wrote. Do you want credit for it when I fix the doc?

Only if it's credited to either "IPBH" or "Some Bash-loving troll on Hacker News" (ninja edit, sry)


It sounds like you're asking if we offer some alternative between running multiple processes in a VM, and running multiple VMs for multiple processes. What's the third option you're looking for? Are you asking if you can run Docker inside a VM, and parcel that single VM out that way? You've got root in a full-fledged Linux VM, so you can do that.

> Are you asking if you can run Docker inside a VM, and parcel that single VM out that way? You've got root in a full-fledged Linux VM, so you can do that.

On a single server VPS I'd use Docker Compose and up the project to run multiple containers.

On a multi-server set up I'd use Kubernetes and set up a deployment for each long running container.

On Heroku I'd use a Procfile to spin up web / workers as needed.

The Fly docs say if you have 1 Docker image you need to run an init system in the Docker image and manage that in your image, it also suggests not using 2 processes in 1 VM and recommends spinning up 1 VM per process.

I suppose I was looking for an easy solution to run multiple processes in 1 VM (in this case multiple Docker containers). The other 3 solutions are IMO easy because once you learn how they work you depend on the happy path of those tools using the built in mechanisms they support. In the Fly case, not even the docs cover how to do it other than rolling your own init system in Docker.

If you have root, can I run docker-compose up in a Fly VM? Will it respect things like graceful timeouts out of the box? Does it support everything Docker Compose supports in the context of that single VM?

This is embarrassingly non obvious in the docs, but you can run workers/web just like you would on Heroku: https://community.fly.io/t/preview-multi-process-apps-get-yo...

Most people run workers in their primary region with the writable DB, then distribute their web/DB read replicas.

The document you cited (I wrote it!) is entirely about the different ways to run multiple processes in 1 VM.

There's no reason I can see why you couldn't run a VM that itself ran Docker, and have docker-compose run at startup. I wouldn't recommend it? It's kind of a lot of mechanism for a simple problem. I'd just use a process supervisor instead. But you could do it, and maybe I'm wrong and docker-compose is good for this.

What you can't do is use docker-compose to boot up a bunch of different containers in different VMs on Fly.io.

I think docker-compose is pretty good at this. One advantage is that you get a development environment and a production setup in a single config file.

I feel like this setup might make quite a lot of sense if you have a bunch of micro services that are small enough that they can share resources.
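As an illustration of that single-config-file point, a minimal web + worker pair might look like this (image names and commands are placeholders, not from the thread):

```yaml
# Illustrative docker-compose file: web and worker share one image but
# run different commands, so both dev and prod use the same config.
services:
  web:
    image: myapp:latest
    command: bundle exec puma
    ports:
      - "8080:8080"
  worker:
    image: myapp:latest          # same image, different entrypoint
    command: bundle exec sidekiq
    depends_on:
      - web
```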

Not sure how Ruby works, but can you not run the workers and the web server in the same process? In our Node.js apps, this is as simple as importing a function and calling it.

Most of the popular background workers in Ruby run as a separate process (Sidekiq, Resque, GoodJob). The same goes for using Celery with Python. I'm not sure about PHP but Laravel's docs mention running a separate command for the worker so I'm guessing that's also a 2nd process.

It's common to separate them due to either language limitations or to let you individually scale your workers vs your web apps, since in a lot of cases you might be doing a lot of computationally intensive work in the workers and need more of them vs your web apps. Not just more in number of replicas, but potentially a different class of compute resources too. Your web apps might be humming along with consistent memory/CPU usage, but your workers might need double or triple the memory and better CPUs.
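For contrast with the separate-process model above, the in-process approach the parent describes can be as simple as a thread draining a queue. A hedged sketch (the job payloads and "work" are placeholders):

```python
import queue
import threading

# In-process worker: web process and worker share one process, which is
# simple but means they scale (and fail) together.
jobs: queue.Queue = queue.Queue()
results = []

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut down
            break
        results.append(job * 2)  # stand-in for real background work
        jobs.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

for n in (1, 2, 3):
    jobs.put(n)
jobs.put(None)
t.join()
print(results)  # [2, 4, 6]
```

Scaling this beyond one process is exactly where the separate-worker pattern (Sidekiq, Celery, etc.) takes over.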

Yeah, it definitely makes sense to be able to scale workers and web processes separately. It just so happens that the app I work on for my day job is:

1. Fairly low traffic (requests per minute not requests per second except very occasional bursts)

2. Has somewhat prematurely been split into 6 microservices (used to be 10, but I've managed to rein that back a bit!). Which means despite running on the smallest instances available we are rather over-provisioned. We could likely move up one instances size and run absolutely everything on the one machine rather than having 12 separate instances!

3. Is for the most part only really using queue-tasks to keep request latency low.

Probably what would make most sense for us is to merge back into a monolith, but continue to run web and worker processes separately, I guess. But in general, there is maybe a niche for running both together for apps with very small resource requirements.

Why are people tripping over $2/mo ? I don’t get this tight-ass mentality. It’s a rounding error.

It's not the $2/mo at face value. For me it's the idea of them pushing to make paying for SSL certificates the norm again after Let's Encrypt has put in a huge amount of effort to change that field. Not that there's anything wrong with charging for things, but charging for a free service rubs me the wrong way, especially since they're using Let's Encrypt.

Using the rounding error logic, how do you feel about companies adding $1.99 "convenience fees" or "administrative fees"?

why spend money when you don’t have to?

One reason is that many people use that as a baseline for how they'd do multi-tenancy. It could be that they just proxy resources from their customers down to the infrastructure.

And in today's world all data needs to be federated between national borders anyway. Try doing business in China if the user data isn't stored in China. Or Russia. Or the EU. Modern designs need the data layer to be forked between regions, not replicated, with merges between the forks.

On top of that, most replication systems are brittle and create logistical and administrative headaches. If you can get by with just rsync, do.

Yes, there are hundreds of different ways you could accomplish this. Fly.io is a convenient and easy to use one.

Hot take: if people spent half the energy doing multi-region that they today spend screwing around with Kubernetes, they’d be a hell of a lot more reliable.

I think people misconstrue the benefits of k8s to be related to reliability or similar. Ultimately it's about the API and the consistency and productivity it offers.

For larger teams having a well defined API that delineates applications from infrastructure that doesn't require extreme specialist knowledge (it still requires some specialist knowledge but vastly less than direct manipulation of resources via something like Terraform) is a massive productivity boost.

Of course none of that matters if you have 4 developers like OP but for folks like myself that routinely end up at places with 300+ engineers then it's a huge deal.

> I think people misconstrue the benefits of k8s to be related to reliability or similar. Ultimately it's about the API and the consistency and productivity it offers

I think this is the first time I've heard somebody say one of the benefits of kubernetes was productivity.

Really? I think it's a pretty obvious benefit. If you bundle something into a container, you can probably run it in kubernetes. This uniformity makes it incredibly easy to deploy and scale new applications.

Yeah, if you actually study it instead of copy-pasting stuff from the internet, I find k8s the best thing for my small projects. I only have to set up a simple Dockerfile and Helm chart and I can run a new service in my cluster on DO, which offers a free control plane, without being billed for a completely new app or having to set up all my deps and env vars in a clunky UI. I can set up scaling and ingress easily, the Datadog agent picks it up automatically, I can have services communicating via private DNS, etc.

I am not an ops guy.
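As a sketch of that uniformity: once the app is a container image, deploying it is a short manifest (names, image, and port are placeholders, not from the thread):

```yaml
# Minimal Kubernetes Deployment for a containerized service.
# The name, image, and port are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myservice
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myservice
  template:
    metadata:
      labels:
        app: myservice
    spec:
      containers:
        - name: myservice
          image: registry.example.com/myservice:latest
          ports:
            - containerPort: 8080
```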

+1 this is what I do as well. If you have any semblance of uniformity in your project folder structure, you can even automate the build/deploy process with a simple shell script/bash function.

Of course, this quickly stops working once your small projects grow to have multiple collaborators, a staging environment, etc. - but at that point you're running a proper business

I think what I've heard is the kubernetes end result is very often a massively overcomplicated infrastructure that nobody understands, that's a constant source of headaches and lost time due to leaky abstractions.

Disclaimer: I've never actually used it myself. That's mostly just what I've read and heard from people who use kubernetes.

Basically depends on expertise. The parent commenter probably comes from a team of good well paid ops engineers who understand and set up k8s well. In any other org it’s the show you describe.

It's like what Hedberg said about rice: k8s is great if you're really hungry and want to host two thousand of something.

The same is true of ECS, but with a much simpler API, much tighter integration with load balancers, a no-charge control plane, and not having to upgrade every cluster every 3-6 months.

I’ve had 2 services running flawlessly in ECS for over a year (with load balancing) without having to touch them. Took me all of 15 minutes to set them up. It’s quite good.

Fargate is really nice too—it takes the benefits of ECS even further to the point that the underlying EC2 instances are abstracted away almost completely.

Fargate with auto-scaling plus Aurora for the DB is pretty great in terms of a nearly zero-maintenance setup that can handle just about any type of service up to massive scale.

Unfortunately getting all the networking and IAM stuff right is still pretty thorny if you want to do things the correct way and not just run it all in public subnets. And the baseline costs are significantly higher than a paas.

We're running Nomad, but that's just a detail. The great thing for both development teams and the ops team is that the container orchestration starts working as a contract between the teams. This allows building more of a framework in which we can provide some initial starting point for standard tasks, like connections to infrastructure components, migrations in various languages, automated deployments, rollbacks and so on for teams out of the box. With this, product teams can go from an idea to deployed and running code very quickly and confidently.

As someone that was a very active Heroku user for years and then worked there for years: I wouldn't trust it as my host. There is nowhere near enough people maintaining it in order to have confidence it'll run without reliability or security issues. They aren't exactly in a position to retain or attract talent either.

I thought Cedar was going to fall over years ago but ironically I think people migrating off the platform are helping it stay alive.

I’m always confused why edge services are always selling points given point 1. The most basic of backend services won’t be able to completely utilize edge services.

It’s a tremendous latency speedup for read-heavy apps that can tolerate eventually consistent read replicas. Any app using a popular SQL RDBMS likely falls into this category at scale. Any app using a Redis cache likely falls into this category at scale.

Also any app that has global clients and terminates ssl likely benefits from edge compute.

Yep, for anyone confused on how this works:

You'd still be sending writes to a single region (leader). If the leader is located across the world from the request's origin, there will be a significant latency. Not to mention you need to wait for that write to replicate across the world before it becomes generally available.

This is the distribute-your-Rails-app-without-making-any-code-changes version of that story. It works great for apps that are 51% or more read heavy. You drop our library in, add a region, and off you go. The library takes care of eventual consistency issues.

HTTP requests that write to the DB are basically the same speed as "Heroku, but in one place". If you're building infrastructure for all the full stack devs you can target, this is a good way to do it.

Distributing write heavy work loads is an application architecture problem. You can do it with something like CockroachDB, but you have to model your data specifically to solve that problem. We have maybe 5 customers who've made that leap.

In our experience, people get a huge boost from read replicas without needing to change their app (or learn to model data for geo-distribution).

It's also trivial to serve read requests from a caching layer or via a CDN. At any sufficient scale, you're probably going to need a CDN anyway, whether your database is replicated or not. You don't want every read to hit your database.

I don't think this is that trivial. I've never seen it done correctly. It typically manifests itself as not being able to read your own writes, and I see this all the time (often from companies that have blog posts about how smart their caching algorithm is). For example, you add something to a list, then you're redirected to the list, and it's not there. Then you press refresh and it is.

I guess that's acceptable because people don't really look for the feedback; why do users add the same thing to the list twice, why does everyone hit the refresh button after adding an item to the list, etc. It's because the bug happens after the user is committed to using your service (contract, cost of switching too high; so you don't see adding the cache layer correspond to "churn"), and that it's annoying but not annoying enough to file a support ticket (so you don't see adding the cache layer correspond to increased support burden).

All I can say is, be careful. I wouldn't annoy my users to save a small amount of money. That the industry as a whole is oblivious to quality doesn't mean that it's okay for you to be oblivious about quality.

(Corollary: relaxing the transactional isolation level on your database to increase performance is very hard to reason about. Do some tests and your eyes will pop out of your head.)

> It typically manifests itself as not being able to read your own writes

Multi-region databases with read replicas face the same issue

Database read requests are not the same as read-only HTTP requests. I am much happier having all requests hit my app process than I am trying to do the CDN dance.

Right now your choices are: run a database in one region and:

1. Use the weird HTTP header based cache API with a boring CDN

2. Write a second, JS based app with Workers or Deno Deploy that can do more sophisticated data caching

3. Just put your database close to users. You can use us for this, or you can use something like Cloudflare Workers and their databases.

My hot take is: if something like Fly.io had existed in 1998, most developers wouldn't bother with a CDN.

Weirdly, most Heroku developers already don't bother with a CDN. It's an extra layer that's not always worth it.

> It's easy to forget that most companies are a handful of people or just solo devs.

I have the same complaint all the way down to simple sysadmin tasks. Ex: MS365 has a lot of churn on features and changes. It’s like they think everyone has a team of admins for it when in reality a lot of small businesses would be satisfied with a simple, email only product they can manage without help.

I strongly agree with your last paragraph. I used Heroku for my wedding website and I would 100% use it again on a project site.

In about 15 minutes I was able to take my site from localhost to a custom domain with SSL with just a little more than a git push. I can't think of many solutions that are simpler than that.

This is literally Netlify's core offering.

Vercel and Github pages would be better IMO if it's a static site.

If it’s a static site then just dump it in an s3 bucket and be on your way.

Static websites have their own issues but an S3 bucket is probably the worst hosting mechanism for them these days. The other services mentioned are much nicer and easier to deal with.

> 2. File persistence is fine but not typically necessary. S3 works just fine.

I'm so glad you pointed this out. Cloud-native development is an important factor in newly architected systems. Defaulting to an S3 API for persistent I/O brings loads of benefits over using traditional file I/O, and brings significant new design considerations. Until a majority of software developers learn exactly how and why to use these new designs, we'll be stuck with outmoded platforms catering to old designs.

> Multi-region deployment only works if your database is globally distributed too. However, making your database globally distributed creates a set of new problems, most of which take time away from your core business.

I have used multi-region for every production database I've deployed in the last ~8 years, and it took < 10 seconds of extra time. It's a core feature of services like RDS on AWS.

There is a benefit if you're multi-region (but not global) because individual regions go down all the time.

It costs more every month, but if you have a B2B business, it's worth the extra cost.

For RDS, are you talking about multi-region, or multi-AZ? I know the latter is easy, but I don't think the first is, though maybe Aurora makes it easier.
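The distinction matters: in boto3-style terms, multi-AZ is a flag on the instance, while multi-region means creating a cross-region read replica. A sketch of the parameters only — identifiers and the account number are made up, and no AWS call is made here:

```python
# Sketch of the multi-AZ vs multi-region distinction for RDS.
# All identifiers below are illustrative placeholders.

# Multi-AZ: a synchronous standby in another availability zone of the
# SAME region; failover is automatic, reads still go to one endpoint.
multi_az_params = {
    "DBInstanceIdentifier": "app-db",
    "Engine": "postgres",
    "MultiAZ": True,
}

# Multi-region: an asynchronous read replica in a DIFFERENT region,
# created from the source instance's ARN.
cross_region_replica_params = {
    "DBInstanceIdentifier": "app-db-eu",
    "SourceDBInstanceIdentifier": "arn:aws:rds:us-east-1:123456789012:db:app-db",
    "SourceRegion": "us-east-1",
}

# With a boto3 RDS client these would feed create_db_instance(...) and
# create_db_instance_read_replica(...) respectively (calls omitted here).
print(multi_az_params["MultiAZ"])  # True
```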

Oh, that's handy. Thanks for sharing!

We’re not “frozen”. The last week has been exhausting FUD.

That’s a lie. The official freeze on new features started in 2017, and customers can see for themselves by reading the changelog.

Even small companies should be multi-region, if they care about uptime.

No, they shouldn't. In many instances it's cheaper to tolerate downtime than to pay to avoid it, especially when there's no SLA involved.

Most of the time, if Heroku is having downtime, then Amazon is having downtime, and half the internet is down. Let customers know Amazon is down, sit back, and relax.

Uptime isn't an axiom. Most software isn't mission critical and most users won't notice if it's down for 30 minutes once or twice a month, and for everything else we have SLA's to manage professional expectations.

Wow, that's a horrible way of thinking about the user experience. And honestly, I'm not surprised. That's why companies that really care about the user experience will always steal market share from those that don't.

It’s actually small companies that care about user experience that will often make these trade-offs. Less time managing multi-cloud deployments means more time spent building our core product and talking to users.

On the one hand, yeah it sucks. On the other hand, my local ice-cream shop was closed for 30 minutes last week because the owner was doing something and the staff member who was rostered on was out sick. If your online business is at the same level of profit and necessity as an ice-cream shop, it can probably close for 30 minutes once or twice a year.

Very few companies have uptime requirements so critical they can justify this. Small companies with limited ops resources may struggle to make a multi-region setup work more reliably than a single-region one.

Often for small companies with limited resources, the act of trying to make something multi region has the effect of making the overall system less correct and less reliable than just running in a single region.

> I wouldn't straight out recommend fly.io. See, part of Heroku's value proposition was their excellent support. I had asked fly.io to delete my account since there wasn't a provision to delete it from their interface, and they never bothered to reply and put me in some kind of shadow ban from re-registering with my email. I mean, if this is how you're going to treat your customers, then good luck! I'm yet to try out railway.app, which looks interesting to say the least.

Same thing happened to me with Lime (the bike and scooter „rental“ company) after they kept charging my old CC even after deleting and adding a new one, which creeped me out as I would hope a delete action on a payment method actually means delete. Instead of deleting my account they banned me so I could never register again.

I don’t feel that way at all.

Every time I’ve tried Fly (trust me I’ve wanted to love it), there’s always a rough edge or the service breaks for me.

First time I tried it, the web panel wasn’t even loading. Second time, months later, everything was 500ing and I couldn’t find a way to SFTP into a disk (!!!). Total dealbreaker.

This was easily done in Render.com with an even more magical experience. Deploy from a GitHub repo and I was live in minutes. Upload the files from local and done.

I want to love Fly so much. I align with their mission. I love their first class Elixir support. But so far I’m not impressed.

It looks to me like Render is seriously taking the PaaS crown at the moment, with innovation after innovation, affordable pricing and excellent user experience.

We've been kicking around ideas for managing files on volumes. This is a common problem – it's actually more difficult than you'd expect because "securitah". Once your volume is mounted in one of your VMs, we can't run tools outside the VM to let you manage the file system. On something like k8s with vanilla Docker, we could. But no one should run multitenant Docker.

It's not really an excuse, just a reason it's taking longer to solve than we'd like.

The errors on the web UI sucked. These have improved drastically in the last three months (because we have smart, dedicated people working on fullstack for us now).

> We've been kicking around ideas for managing files on volumes. This is a common problem – it's actually more difficult than you'd expect because "securitah". Once your volume is mounted in one of your VMs, we can't run tools outside the VM to let you manage the file system. On something like k8s with vanilla Docker, we could. But no one should run multitenant Docker.

Was there a reason for not using something similar to kata containers where you run a microvm but still use containers inside them? It seems like it would make such things easier while getting the isolation of a VM.

They do that already AFAICT by using Firecracker.

Unless they’ve changed things, there is no containerization within the VM a la kata. They run their own custom init inside the VM and use it to start the entry point. https://github.com/superfly/init-snapshot is the source.

Thank you for the response. I want to love your service.

Will keep an eye on the changelogs to see when I can test deploy my apps.

Why do you need to SFTP into a disk?

I was moving a Ghost blog from Render. Ghost is notorious for having a difficult time with hosting assets in S3, so it uses disks.

I needed to move the Ghost assets directory for all the posts.

This was a ten minute thing in Render. SOL in Fly.

Can you not ssh into a fly instance and then pull?

No. Flyctl SSH does not allow this functionality.

We definitely don't do SFTP! My general M.O. for moving files to running VMs is Magic Wormhole. But that's not optimal either. You've got a legit irritation and I'm glad you put it on our radar.

So let me get this straight. Fly has a ‘flyctl ssh’ feature where you can gain console/ssh/whatever into the OS of the fly instance/firecracker/container/thing. So you are saying that once in, you cannot use curl even to pull/download something? (I believe curl supports sftp protocol as well)

You can. You gotta get Wireguard running on your local machine [0] and be able to see the ".internal" network, but once you do, you can ssh into any edge VM and access its filesystem. I've SCP'd SQLite db files and dumps. It's also a pretty anti-pattern way of debugging an app if you're an outlaw :)

[0] https://fly.io/docs/reference/private-networking/

There's a lot of chat about fly.io on HN, possibly just because the founders and friends are posters here.

Is there any in depth comparison between them and render.com ?

As an occasional third-party Fly “shill” of sorts, I mainly talk about Fly because I’ve really wanted almost exactly what Fly does for a long time now. I’d tried other things like Hyper, Fargate and Cloud Run, but they’ve mostly been disappointing in some regards, being cumbersome to use or get started with, unrealistically expensive, or simply being too slow or limited. Fly is none of that. It still has some stuff that’s lacking; resizing disks is probably the biggest thing missing; the proxying is surprisingly complete but UDP services are a little awkward; copying data in and out is somewhat tricky, you have to use SSH but over a wireguard tunnel with an ephemeral SSH key and on Windows it’s not particularly fun. (I don’t think SCP is supported either, which is fine, but still.) It remains exciting nonetheless because it scratches an itch that I don’t feel anyone else has really managed to; Netlify and Vercel and probably Cloudflare Pages does static sites very well, but Fly feels like, so far, the first PaaS to do any arbitrary service very well. The ability to throw Docker images into micro VMs with such ease and speed with this paradigm is truly liberating.

edit: As a note, I have not tried Render. I am sure it is fine too. I found Fly first and it satisfies my needs well enough that I don’t feel it is necessary to keep searching, though I wouldn’t mind checking it out just to see what it has to offer.

I think it is because they are refreshingly open about their stack and people love their writing tone.

And the architecture seems solid.

Agreed, really enjoyed their tone on explaining SQLite and litestream.

Not in-depth, but two things that have stood out to me in assessing Render vs Fly are Render's lack of Multi-region apps and release phase scripts for running migrations. Easy multi-region apps seem to be a main selling point of Fly. However, both of the features I mentioned are on Render's roadmap (Multi-region apps has been on their roadmap since 2019, so maybe not a priority?)

these were the reasons we chose fly in the end; very happy with it

They are backed by Y Combinator so that might be why they get extremely decent exposure on HN.

My experience with Fly.io has not been a good one.

If I could sum it up, it would be that the dev ux needs a lot of work, and it seems like they are mostly focused on the fundamentals of the platform first.

Following their guide, you get Postgres not spinning up and linking to your app correctly, and you have to nuke your entire app.

The billing UI is weird and feels cobbled together.

I don't feel secure using Fly right now. But again, they are doing cutting edge shit and are probably focused on the underlying bedrock of the platform. They can always circle back to polish the dev ux.

Right now we're on Render.com and it does absolutely everything we want with wonderful dev ux.

In my mind it's a race: can fly catch up to render's UX before render can catch up to flys global mesh load balancing? We'll see who wins.

It's very generous of you to assume we're focusing on underlying platform. It's true! But it's a difficult thing to notice when our service gave you papercuts.

We've just now grown large enough to have people focus full time on the in browser UX. If you feel like fiddling around again in a few months, let me know and I can hook you up with some credits. :)

The billing UI is definitely cobbled together. This is because I built it over a weekend and it's marginally better than "no billing UI". I have learned that if I'm building features, they probably aren't gonna be very good.

Being a small-scale Heroku user, I have a hard time deciding whether to stay with Heroku or move to render.com or fly.io. Before the latest incident, Heroku seemed to be frozen but stable. Now… I don't know. Are they even trying to bring back Github Connect?

Fly.io seems cutting-edge but I feel I would not profit from their multi-region, close to the user infrastructure. So what are their tradeoffs? Render.com appears more complete (?) and cheaper. But they don't have the same elegant database backups or the pipeline with review apps.

Fly.io isn't as much cutting-edge as it is a rethink on what devex on Cloud should look like. It is a fantastic offering that despite its shortcomings is really a delight to use. I use it for toy projects (mostly stateless, or state stored elsewhere but not on Fly.io), but there are plenty who run pretty serious workloads. Give it a spin! You'd be surprised how butter-smooth all that cutting-edge is.

> [...] despite its shortcomings [...]

What do you see as some of its shortcomings? Do e.g. semi-broken docs (or other instances of unclear/uncertain messaging) factor into your impression?

Docs, oh god yes. They're the anti-Stripe in this regard.

And hard-to-debug deploy-time (and sometimes runtime / uptime) issues are a major sticking point. It is hard to know what or who's at fault without asking for it in the forums, since you can't stackoverflow much.

Their support (granted, it's free) is a bit hit and miss, and it isn't clear what really is on their roadmap (or important) and what isn't (even though they are more transparent than other providers I've interacted with).

These are but pains that come with adopting a nascent ecosystem, I suppose. I still persist with Fly because it is still simpler to build certain apps on it than on any other BigCloud (bar Cloudflare).

Render has native automatic backups for PostgreSQL: https://render.com/docs/databases#backups

Review apps on Render are called Preview Environments: https://render.com/docs/preview-environments

Last week I couldn't get Fly.io to spin up a simple Postgres database for my app. I've now been on Render for a week with multiple production apps, and so far it has been working fine.

i used both Render and Fly.io

Render is more user-friendly and great for teams, but Fly.io is more flexible and has more advanced features

Fly has a nice DX, but unfortunately the platform still has major issues with reliability and networking. There are often problems with deploys leading to stuck and unreachable instances, or other networking issues. It's probably exacerbated by the new user growth, but a slick deploy workflow can't make up for an app that doesn't stay up.

Render and others are interesting but K8S is still fundamentally better considering all the DX progress there making it pretty easy to get a container running in a cluster.

What exactly makes K8s better?

Definitely not the developer experience.

But as is often mentioned, as companies and apps grow, so do their requirements and the need for flexibility. I can't think of a more flexible deployment target that handles a lot of the PaaS concerns than Kubernetes, warts and all.

Fly is surprisingly great, volumes + containers is very close to universal for a PaaS, but it certainly can't cover everything that is running in my workplace's kubernetes.

It's basically the platform to build platforms like these (and some of them already run on it).

K8S offers the same primitives and more, with progressive complexity as you need it. You can deploy a single container with a 1-line command or an entire PaaS subsystem. It's also more portable than any single platform and you can run it on a few VPS instances or your own bare metal. I've also found it far more reliable than all these PaaS services that have their own issues with no visibility.

K8S experience is also more valuable and useful in future projects and there are plenty of nice DX tooling to make deploys easy if that's the blocker.

I’d be very surprised if any edge compute systems run on (a single) K8s cluster. In nearly all cases you’d run at least a cluster per region. K8s also provides no functionality for per region routing or networking between clusters.

That is to say, for most of what fly gives you there is no K8s equivalent.

None of them run on a single cluster globally. That's not supported or recommended by anyone. Some PaaS do use K8S as an underlying orchestrator for workloads, the same way Fly uses Hashicorp's Nomad, with multiple clusters per region.

Kubernetes does have federation/multi-cluster abilities, and global routing/load balancing is available from every CDN and cloud. It's more work to set up, but not by that much.

I'm using fly.io lightly.

One thing for me: fly.io is cool and does a lot of cool fancy things. However, the basic PaaS stack from Heroku gets a bit lost / isn't always fully there for me. For a while they didn't really talk about their container deploy story (vs. AWS App Runner / AWS Fargate / AWS ECS). Digging in, it's all doable, but they do so many cool things that it sometimes isn't obvious to me how the basics go. That's changed in the last year (at least looking at the docs)? I also had some bobbles in the UI in the past.

I don't need multi-region in my case, but I do want low latency, for example. Easy to find this on AWS (https://www.cloudping.info/); a bit harder to get all regions with a ping endpoint on fly.io. I went looking; they have a demo app where I think you can find what they see as the closest region, but I wanted to roll my own cloudping.info-style approach and it wasn't obvious how to do so. I'm getting about 8ms from my home computer to my closest AWS region.

The basic story I need is: GitHub commit -> goes to dev for review -> promote to production. I happen to use containers now (even if inefficient) because getting stuff "working" is often easier: if it's working in the container on my machine, it'll work up in the cloud.

That said, there is definitely I think a crop of AWS / Heroku competitors that are going to start to be competitive (think cloudflare on primitives, fly.io and render etc on PaaS style).

> Heroku was catalytic to my career. It's been hard to watch the fall from grace. Don't get me wrong, Heroku still works, but it's obviously been in maintenance mode for years.

Sounds like mission accomplished; the elusive 1.0.

The main thing I want is pipelines - review apps, staging apps, and promotion to production, all integrated closely with GitHub, along with a slack integration that lets me do all of it in a public chatroom.

Until another service has all of this, we’re sticking with Heroku.

First, Heroku's pipelines and GitHub integration are (or, I guess, were) excellent.

We (Fly.io) intentionally didn't build a pipeline replacement. We exist for one reason: to run apps close to users. We're just now to the size where we can do that well. It'll be a while before we can get good at a second thing. Heroku shipped them something like 8 years after they launched.

At the same time, GitHub actions and Buildkite are _very_ good. They're less opinionated than Heroku Pipelines, but I don't regret figuring out how to use them for my own projects.

I think there's a chance that emulating Heroku that closely is a path to failure in the 2020s.

Yeah, totally get that.

> I think there's a chance that emulating Heroku that closely is a path to failure in the 2020s.

I'm not sure I agree, considering that a different platform emulating this exact setup with ~zero configuration is basically everything we want! GitHub actions is (I agree) really great and very versatile, but I'll take Heroku's UI over digging through actions plugin documentation for hours any day.

Render has review apps (https://render.com/docs/preview-environments). We're actively working on pipelines (early access ETA late summer).

Any plans for a chatops extension for Slack? The best thing about Heroku is being able to say "/deploy [app]/[branch] to [environment]" and have it Just Work, all with inline threads that convey status updates about the deployment. Does Render plan to build something like that?

No immediate plans, but I'd love to see this on Render at some point, and not just for Slack.

A coworker and I basically built review apps on top of hubot interactions with the Heroku API before review apps existed. It was probably the work of about two weeks to get right.

Yes, it’s automation you have to own, but it changed very little once it was done and provided phenomenal value. I wouldn’t let those features hold you back from exploring other options.

We’re super early but if all of that running on top of your own cloud using tools like cloud run or fargate sounds interesting, check out www.withcoherence.com (I’m a cofounder).

I get the intention with Slack, but I've never understood, except for the geek cred, pushing work into chat services. GitHub is open to the team too.

I hear complaints about chat distractions and see engineers create those distractions. I’m at a loss why we want to do that to ourselves?

Never mind that it's one more pipeline for messages to get lost in. It's needless complexity and configuration too.

The main reason to use a chat solution is as a unified ledger / single pane of glass. Silly as it is, there isn't really a better solution out there for "integrate all of my third-party deploy, CI, build, etc. status updates in real time, in one place." Sure, someone can click into GitHub, but if you use another service to deploy, does GitHub pipe the status of that service into your dashboard? How about error logs and alerting, do they go there? If there's a customer incident, does the trust site status show up as well?

On Slack and other chat solutions, it's possible to set up a #operations style single-pane-of-glass channel with deploy notifications, error alerting, and customer communication platforms all pushing to the same place. If an incident occurs, engineers, product, support, etc. can all collaborate around what's going on in real-time without needing to ask "hey has someone updated the trust site yet?" or click into 10 different tabs.

It's honestly pretty good when it works well and really bad and noisy when it doesn't, but it has a place.

Non-engineering are usually in Slack too, which really helps when support, product, or the field need quick answers to easy questions like "has this commit been deployed yet today?"

Perhaps for looking back to see the history of things, a unified ledger is convenient. But as a notifications/alerts solution it’s terrible because it essentially guarantees that the signal to noise ratio will be incredibly low.

This just depends on how disciplined you are. We use a system where backend errors are logged to slack, and then we treat addressing the source of the errors as an immediate priority. Keeping the SNR high is a priority. Plus generally being very selective about what goes into the channel.

Chat is a multi-player CLI. Anything you would do in a terminal that your team members should also know about can profitably be integrated with chat.

I'd never, ever go back to working without chat.

We've switched to Discord (because reasons; Slack is much better for work, though) and I rewrote a bunch of Slack hooks to get Discord notifications.

> I hear complaints about chat distractions and see engineers create those distractions.

People will complain no matter what.

Fly.io is a reclaimer. Also see: Vercel, Digital Ocean, Dokku, Netlify, Firebase, Engine Yard, OpenShift, Appfleet, Github Pages, Render, AWS Amplify.

Heroku made it easier to deploy, but now it feels a tad more frictional than other services, including fly.io and those mentioned above. It is probably a bit outdated in that regard.

We use a tool called squash.io. It's geared more for instant QA environment than production workloads, but it has the advantage of being super simple. It just listens for feature branch commits and grabs the docker-compose file and runs it. Funny enough, before we signed up we also considered Heroku since we had a relationship with Salesforce and I was familiar with their product. After an intro call, they never followed up. Guess that was a sign.

I could relate to the comment on Redis.

  The only things I can see missing are automated Redis hosting by the platform.
There have been so many times I wanted some simple key value store which I do not have to bother about setting up and taking care of. Something like "ambient Redis". It's OK not to have crazy scaling promises. You just enable an API (maybe for a small fee) and just use it.

If and when you get big enough you switch to a setup you bother about setting up and taking care of.

Am I making sense to anyone?
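It makes sense to me. The wish is basically a zero-setup key-value API with TTLs and no scaling promises. A toy in-process stand-in sketches the API surface being asked for (the `AmbientKV` class and its method names are made up for illustration; a real platform offering would back this with a managed service):

```python
import time


class AmbientKV:
    """Toy stand-in for an 'ambient Redis': get/set with optional TTL, nothing more."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the key never expires
        expires = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires = item
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # lazily evict expired keys on read
            return default
        return value


kv = AmbientKV()
kv.set("session:42", {"user": "alice"}, ttl=30)
print(kv.get("session:42"))  # -> {'user': 'alice'}
```

The point isn't the implementation; it's that the entire API you want from the platform is two calls, which is why "just enable it for a small fee" feels like it should exist.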

Fly did have a built-in redis cache (albeit multi-tenant / shared) and for the life of me I can't figure out why they'd deprecate it (though, it still works): https://community.fly.io/t/debugging-a-failing-image-without... and https://community.fly.io/t/please-bring-redis-back/1563

If anything, I'd prefer they moved their PaaS more towards serverless and managed offerings than towards IaaS.

On their forums they say they envision some other company handling Postgres too. Personally, I would prefer they handle the databases, Redis, etc., because that way the services you need can be provisioned right alongside your app servers; otherwise it undermines the value of having edge servers near your users. Heroku could get away with third parties being responsible for the database software because, being on AWS, it was easy for other companies to be in the same region or zone.

My money is on Supabase building for Fly as a target / default IaaS, and that's about as close to managed services as we are going to get, given Fly's insistence that they're not really good at (or don't want to be) building and maintaining managed services.

I've been really loving Upstash's Redis offering. Scales down to $0/month and for my needs (1-3 ops per second) even their high availability Redis ends up being just a few bucks a month. Probably cheaper and certainly easier than spinning a simple instance up on my own, but with performance and uptime guarantees closer to one of the cloud managed Redis offerings which start at mid to high $10s per month

I was in the exact same spot (only on render.com), where Redis seemed overkill, but a key-value store was needed.

So I created a Postgres Key Value Ruby library. https://github.com/berkes/postgres_key_value
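The core of a Postgres-backed key-value store really is just one table plus an upsert. As a sketch of the idea (not the linked Ruby library's actual API), here it is in Python using the stdlib `sqlite3` as a stand-in engine; the `ON CONFLICT ... DO UPDATE` upsert syntax shown is shared with Postgres, though Postgres drivers use `%s` placeholders instead of `?`:

```python
import sqlite3  # stand-in for Postgres; the upsert syntax below works in both

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")


def kv_set(key, value):
    # Insert, or overwrite the existing value if the key is already present
    con.execute(
        "INSERT INTO kv (key, value) VALUES (?, ?) "
        "ON CONFLICT (key) DO UPDATE SET value = excluded.value",
        (key, value),
    )


def kv_get(key, default=None):
    row = con.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default


kv_set("greeting", "hello")
kv_set("greeting", "hi")   # upsert overwrites
print(kv_get("greeting"))  # -> hi
```

You lose Redis's TTLs and data structures, but you gain the durability and backups of the database you're already running.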

Perhaps just using an embedded DB would be better in your case? Something like Berkeley DB. Alternatively, just Dockerize it, although I assume it's too much upkeep.

While not as "Redis-y", there are some decent KV-ish managed databases out there:

- Firestore

- Cloudflare Workers KV

- DynamoDB

- Bigtable

One GCP Firestore tip a lot of folks don't realize: for non-client-WebSocket-y workloads you can run Firestore in Datastore mode and it acts much more like DynamoDB. It can turn usage bills from $10s per month to pennies, and IIRC gives you faster response times and higher scaling limits in terms of writes per second.

Anything can be a key/value store and there are dozens of managed databases with serverless pricing and public/HTTP access so you can use it anywhere with acceptable latency.

Heroku was magic for hosting college projects of 2010's complexity. The failure wasn't in the prohibitive cost at scale (though that factor didn't help); it was that for most real-world stuff we need IaaS, not PaaS. That has become more and more evident over the last ten years.

I think if fly succeeds, they need to figure out edge IaaS, and not put all their eggs into edge PaaS. And I hope they do! I'm curious what a successful edge IaaS looks like!

This is pretty much what I believe. This isn't HN frontpage worthy, but one of the things I'm most excited about is people running production CockroachDB clusters on Fly.io. It is still a little more difficult to use the underlying infrastructure than it should be, but we're getting close.

Neat, I'd say that is absolutely HN frontpage worthy!

And disclosure, I used to be on Azure Front Door team and was the lead for Azure Edge Zones development. I really wanted to do something like fly with that. But it turned out that too many people wanted too many different things. Some needed GPU (some Nvidia, some AMD), some FPGA, some general compute. Surprisingly few cared about Functions (our Lambda-like service), or even Web Apps (our Heroku). SQL Server was a big request; CosmosDB was not. Remote Desktop was another big one. They also wanted availability zones within the edge zone itself. And even if latency was tolerable to the nearest region, we had to put almost all our infra services local too because everything needed to continue working if there's a network outage. And the extra annoying thing that shouldn't have to be annoying -- everybody wanted ipv4.

So it ended up hardly making sense to deploy anything less than like 80 racks per site, at which point it's basically a small region minus a few small pieces.

Then there's just the risk that the people who wanted whatever special GPU or SSD combination would quit wanting them and they'd just sit there unused indefinitely after that. Or stockouts when demand rose due to a conference or whatever that would tarnish the brand. And of course nobody wanted to pay more than like 10% markup. They were more amenable to long term contracts though. It was just hard to figure out the right use case and make it profitable.

Seemed like what customers really wanted out of them were nearby replacements for pieces of their own datacenter. It was exactly the opposite direction of where I was hoping things would go, which was something between fly and cloudflare workers. Not sure what they're doing now; I left about 18 months ago.

This is part of why the PaaS take has worked so well for us. People who think they want edge have all kinds of different needs. When we realized that all full stack devs could benefit from something kind-of-like-edge it helped us do more focused work.

Yeah, I hope it works! Prior to MS I was the solo dev for Smilebooth (a photobooth company), and when I joined Edge Zones my north star was to shrink the minimum footprint (I think it was three racks initially) so that you could load an edge zone host on something like a Smilebooth console and manage photobooth fleets by deploying Web Apps directly to them. (I realized that was ridiculously far-fetched at the time and that there were probably better ways of achieving that outcome, but I certainly didn't foresee the minimum footprint growing so dramatically!)

And, like I said earlier, I hope to see what a real edge IaaS solution looks like too, if such a thing is even possible. Maybe the IaaS that would allow a build-your-own-CDN.

If you're counting pennies, just use Dokku on a VPS; there was even an article here on HN recently. By the time you outgrow it, you'll have outgrown the vast majority of the new PaaS sites that are springing up trying to be the next Heroku, and your choice of where to go next will be easier. In the meantime, you'll have saved all that effort on just picking where your app is hosted and, instead, spent it on actually delivering your app. Crazy thought, I know.

i'm running https://wikinewsfeed.org on fly.io

very satisfied so far and would definitely deploy there next time!

the killer feature i like the most is automatic prometheus metrics collection

one thing i don't really like about fly.io is the fact that they charge money for free Let's Encrypt SSL certs

They aren’t charging for the certificate so much as they’re charging for TLS termination, infrastructure, DNS, caching, handling invalidations, etc. It’s also 10 free then $0.10/mo per certificate thereafter (or $2/mo for wildcard). They also donate 50% of their TLS fees to Let’s Encrypt.

So, yes, some users will have to pay for certificates, but it seems extremely reasonable to me.

well, maybe

but my expectation as a PaaS customer in 2022 is that you shouldn't need to pay for an SSL cert

the expectation is because nobody else charges for them anymore, not even their competitors

Do they? I feel like you're just uncomfortable with line-item pricing and prefer flat all-in-one pricing. What are the other competitors that offer actual PaaS instead of static-site hosting?

* Render.com charges $0.60 per custom domain after the first 25

* Heroku gives you "free" custom certificates once you're on a $7/mo minimum.

A clarification: every static site or full stack app can have up to 25 custom domains for free on Render. Most sites only have 2 (apex and www).

Yeah but "after the first 25" may cover the majority of their customers. I myself am one of them, and at this point I don't care at all what happens after I reach 25 domains.

Essentially it _is_ free.

I agree with you of course. I feel the same way about fly.io’s pricing. For nearly all use cases, certs are free. Which is why I found it so odd that GP was complaining about it.

Heroku is getting hammered like crazy this week on HN

The one thing I'm really missing after looking at a number of hosts (Fly and Render being top of list otherwise) is a free database tier.

For a toy app, 10k db rows (across all tables) from Heroku was enough to get the app running and have a public URL to share, and I miss those days.

I'm working on a fresh Rails7 toy app to try out some new features, and my current thinking is to use sqlite, and add an initializer early in the stack to migrate+seed the db if it's missing. If it's ephemeral that's fine, I just want some baseline data to interact with the app at a distance beyond localhost.

Fly's free resource allowance includes Postgres

Render has a free DB tier

Render's free DB tier isn't really usable when the data gets nuked after 90 days. Really strange decision IMO.

It's a temporary decision until we can build multi-tenant Postgres instances which will be perpetually free.

Good point. I haven’t used it because I use other free DB offerings. CockroachDB and cosmos DB both have free tiers.

For me the missing bit in Fly is scheduled tasks. I know how to solve this by spinning up an app that runs permanently as a scheduler, but basic cron-like scheduling should be part of the platform IMO. All other FaaS-like service do this.
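For context, the workaround described above (an app that runs permanently as a scheduler) can be a handful of stdlib lines. A minimal sketch, with made-up function and job names; real deployments usually reach for something like Celery beat or APScheduler instead:

```python
import time


def seconds_until(hour, minute, now=None):
    """Seconds until the next daily HH:MM occurrence, in local time."""
    now = time.time() if now is None else now
    lt = time.localtime(now)
    # Same calendar day at HH:MM:00; slide to tomorrow if already past.
    target = time.mktime(lt[:3] + (hour, minute, 0) + lt[6:])
    if target <= now:
        target += 24 * 3600
    return target - now


def run_daily(hour, minute, job):
    """The whole 'scheduler app' is this loop; run it as the process entrypoint."""
    while True:
        time.sleep(seconds_until(hour, minute))
        job()


# e.g. run_daily(3, 0, nightly_backup)  -- blocks forever
print(0 < seconds_until(3, 0) <= 86400)  # -> True
```

It works, but it's exactly the kind of always-on single-purpose VM that built-in cron support would make unnecessary.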

I'd love to do this, if for no other reason than I hate working with cron. What would you use it for? What would the ideal version of this feature look like for you? What kind of apps would you be more easily able to ship? Is it mostly so you wouldn't need to keep a single tiny running VM sitting around running cron?

The sort of things I have had running on a schedule (using Celery[0] or something like it) are sending a set of emails at a certain time, running a report or generating the data for reports, even running a backup of some data.

[0] - https://docs.celeryq.dev/en/stable/userguide/periodic-tasks....

> What would the ideal version of this feature look like for you?

I think that Render solved "Cron as a Service" beautifully:


I would use render but fly.io persistent storage is a killer feature. It is literally 100x simpler to just deal with files on a real file system than having to deal with remote object storage.

Render has persistent disks! https://render.com/docs/disks. We need to expose them for cron jobs though.

I'd like to do any one-off job, not only scheduled ones; retrieving data and transforming or storing it, scraping a web page, running a database migration. My biggest annoyance with Fly is the assumption that everything is a long running application.

Don't tell anyone, but you can try "fly machines run" (currently only in a new, empty app). Then "fly machine start <id>" to start the thing back up after it exits. It's not released yet, though.

I'm not sure how Fly handles one-off tasks behind the scenes, but we're building task queues (w/ cron) as a first class citizen at https://tasker.sh

Some apps are just that - a periodic task. Nothing more. Ideally a task project would have a git repo with a short fly.toml, a requirements.txt, and a main.py (in the case of Python; I'm not familiar with using other languages on fly.io). I don't think it needs to be more complicated than that.
