Hacker News

Does the other stuff basically work? I am shocked at how our production usage of Fly has gone. Even basic stuff is missing: support isn't able to just... look up internal platform issues. Cryptic/non-existent error messages. I'm not impressed. It feels like it's compelling to those scared of or ignorant of Kubernetes. I thought I was over Kubernetes, but Fly makes me miss it.



I was hoping to migrate to Fly.io, and during my testing I found that simple deploys would drop connections for a few seconds during the deploy switchover. Try a `watch -n 2 curl <serviceipv4>` during a deploy to see for yourself (try any one of the documented strategies, including blue-green). I wonder how many people know this?

When I tested it, I was hoping at worst for early termination of old connections with no dropped new connections, and at best for them to gracefully wait for old connections to finish. But nope, just a full-downtime switchover every time. And when you think about the network topology described in their blog posts, you realize there's no way it could've been done correctly to begin with.
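A rough Python equivalent of that `watch -n 2 curl` test, which also counts the longest run of consecutive failed probes so a switchover gap shows up as a number rather than something you have to eyeball. It's a sketch: the URL comes from the command line, and anything other than a 2xx/3xx response within two seconds counts as a failure.

```python
import http.client
import sys
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if GET url returns a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused/reset connections, timeouts and 4xx/5xx (HTTPError).
        return False


def probe(check: Callable[[], bool], samples: int, interval: float,
          sleep: Callable[[float], None] = time.sleep) -> list[bool]:
    """Run `check` `samples` times, `interval` seconds apart (like `watch -n`)."""
    results = []
    for i in range(samples):
        results.append(check())
        if i < samples - 1:
            sleep(interval)
    return results


def longest_outage(results: list[bool]) -> int:
    """Length of the longest run of consecutive failed probes."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


if __name__ == "__main__" and len(sys.argv) > 1:
    url = sys.argv[1]  # your app's public URL; start this, then run `fly deploy`
    results = probe(lambda: http_ok(url), samples=150, interval=2.0)  # ~5 minutes
    print(f"{results.count(False)}/{len(results)} probes failed; "
          f"longest outage: {longest_outage(results)} consecutive probes")
```

With a 2-second interval, a longest outage of 2 means the app was unreachable for somewhere around 4 seconds. Probing faster narrows that estimate, at the cost of more requests.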

It's very rare for me to comment negatively on a service, but the fact that this was the case, paired with the way support acted like we were crazy when we sent video evidence of it, definitely irked me by infrastructure-company standards. I wouldn't recommend it outside of toy applications now.

> It feels like it's compelling to those scared of or ignorant of Kubernetes

I've written pretty large deployment systems for Kubernetes. This isn't it. There's a real space for Heroku-like deploys done properly, and no one is really doing it well (or at least not without ridiculously thin or expensive compute resources).


Yeah, I had a similar experience where my builds were frozen for a couple of days, so I wasn't able to release any updates. When I emailed their support, I got an auto-response asking me to post in the forum. Pretty much all hosts are expected to offer a ticket system, even for their unmanaged services, if it's a problem on their side. I just moved all my stuff over to Render.com; it's more expensive, but it's been reliable so far.


The first (pinned) post in the fly.io forum explains it:

https://community.fly.io/t/fly-io-support-community-vs-email...


That forum post just says what OP said: that they will ignore all tickets from unmanaged customers. Which is a pretty shitty thing to do to your customers.


The cheapest plan that gets email support is nothing more than a commitment to spend a minimum of $29/mo on their services. That is, if you spend >=$29/mo, it costs nothing extra. Not what I'd call "managed".


> I've written pretty large deployment systems for Kubernetes. This isn't it. There's a real space for Heroku-like deploys done properly, and no one is really doing it well (or at least not without ridiculously thin or expensive compute resources).

Have you tried Google Cloud Run (based on Knative)? I've never used it in production, but on paper it seems to fit the bill.


Yeah, we're mostly hosted there now. The CPU/virtualization feels slow, but I haven't had time to confirm (we had to offload super small ffmpeg operations).

It's in a weird place between Heroku and Lambda. If your container has a bad startup time, like one of our Python services, autoscaling can't be used because latency becomes a pain. It's also common to deploy services there that need things like health checks (unlike functions, which you assume are alive), which implies at least one instance in sustained use if you run health checks every minute. Their domain mapping service is also really, really bad and can take hours to issue a cert for a domain, so you have to be very careful to put a load balancer in front of it for hostname migrations.

I don't care right now, but the fact that we're paying 5x in compute is starting to bother me a bit. An 8-core, 16 GB 'node' is ~$500/month ($100 on DO), assuming you don't scale to zero (which you probably won't). Plus I'm pretty sure the 8 cores reported aren't a meaty 8 cores.

But it's been pretty stable and nice to use otherwise!


A 6c/12t dedicated server with 32 GB of RAM is $65 a month with OVH.

I do get that it is a bare server, but if you deploy even just bare containers to it, you would be saving a good bit of money and get better performance from it.
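To put the monthly figures quoted above side by side, here is the per-core and per-GB arithmetic. These are the commenters' numbers, not current list prices, and the comparison ignores CPU generation, vCPU vs. physical core, bandwidth limits, and scale-to-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    name: str
    monthly_usd: float
    cores: int
    ram_gb: int

    @property
    def per_core(self) -> float:
        return self.monthly_usd / self.cores

    @property
    def per_gb(self) -> float:
        return self.monthly_usd / self.ram_gb


# Figures as quoted in this thread (Cloud Run assumes it never scales to zero).
OFFERS = [
    Offer("Cloud Run (always on)", 500, 8, 16),
    Offer("DigitalOcean", 100, 8, 16),
    Offer("OVH dedicated", 65, 6, 32),
]

if __name__ == "__main__":
    for o in OFFERS:
        print(f"{o.name:24} ${o.per_core:6.2f}/core  ${o.per_gb:5.2f}/GB")
```

That works out to $62.50/core for Cloud Run, $12.50/core for DigitalOcean, and about $10.83/core for OVH. The 5x figure is simply $500 / $100 for the same nominal 8 cores and 16 GB.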


Another interpretation is the so-called dedicated servers are too good to be true.


It depends on what the 6 cores are. Like I have a 8C/8T dedicated server sitting in my closet that costs $65 per the number of times you buy it. (Usually once.) The cores are not as fast as the highest-end Epyc cores, however ;)


At the $65/month level for an OVH dedicated server, you get a 6-core CPU from 2018 and a 500 Mbps public network limit. Doesn't even seem like that good a deal.

There is also a $63/month option that is significantly worse.


We also run some small ffmpeg workloads and experimented with Cloud Run consuming Pub/Sub via Eventarc triggers. Since Cloud Run's opaque scaling is tied to HTTP requests, Eventarc uses a push subscription. In Pub/Sub, these don't give you any knobs to turn regarding rate limiting/back pressure, so it basically tries to DoS your service and then backs off. This setup was basically impossible to tune or monitor properly.

Our solution was to migrate the service to Kubernetes using an HPA scaling on the number of un-acked messages in the subscription, and then use a pull subscription to ensure reliable delivery (if the service is down, the messages just sit in the queue rather than being retried indefinitely).
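Roughly how an HPA turns a backlog into a replica count. The Kubernetes docs give the core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipping changes when the ratio is within a tolerance (0.1 by default). For an External metric with an `AverageValue` target, the metric (here, the subscription's un-acked message count, which on GKE is commonly Pub/Sub's `num_undelivered_messages`) is divided by the pod count first. This is a simplified model. It leaves out stabilization windows, scaling policies, and the readiness handling the real controller does, and the parameter names are illustrative.

```python
import math


def desired_replicas(current: int, unacked: int, target_per_pod: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for an External metric with an AverageValue target.

    current        -- replicas running now (assumed >= 1)
    unacked        -- total un-acked messages in the subscription
    target_per_pod -- how many un-acked messages each pod should own
    """
    average = unacked / current            # external metric averaged over pods
    ratio = average / target_per_pod       # >1: under-provisioned, <1: over-provisioned
    if abs(ratio - 1.0) <= tolerance:
        desired = current                  # inside the tolerance band: no change
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 450 messages backed up, 2 pods, target of 100 messages per pod
    print(desired_replicas(current=2, unacked=450, target_per_pod=100))
```

The example prints 5 (450 / 2 = 225 per pod, 2.25x the target, ceil(2 x 2.25) = 5). A pull subscriber only acks what it has finished, so the backlog itself provides the back pressure that push delivery lacks.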

I'm convinced Cloud Run/Functions are only useful for trivial HTTP workloads at this point and I rarely consider them.


That's very interesting. Thanks for sharing.

But sweet, sweet GitHub-triggered deploys. Have you found an easy solution to this?


> easy solution to this

Triggered deploys to Kubernetes, you mean? There are a million ways to solve this problem, for better or worse. We use GitLab CI, so we invoke Helm in our pipelines (I'm sure there's a way to do this with GitHub Actions), but there's also Flux CD, Argo, etc. etc.

We use Kubernetes (GKE) elsewhere, so luckily we already had this machinery in place. I can see the appeal of Cloud Run/Functions as a way to avoid taking that plunge.


I have yet to have a positive experience with Cloud Run. I have one project on it, and Cloud Run is very unpredictable with autoscaling. Sometimes it starts spinning containers up/down without any apparent reason, and after chasing Google support for months, they said it is "expected behavior". Good luck trying to debug this on your own, because you don't have access to the Knative logs.

Starting containers on Cloud Run is weirdly slow, and oh boy, how expensive that thing is. I'm getting the impression that pure VMs + Nomad would be a way better option.


> I'm getting the impression that pure VMs + Nomad would be a way better option

As a long time Nomad fan (disclaimer: now I work at HashiCorp), I would certainly agree. You lose some on the maintenance side because there's stuff for you to deal with that Google could abstract for you, but the added flexibility is probably worth it.


> Starting containers on Cloud Run is weirdly slow

What is this about? I assumed a heavily throttled CPU or terrible disk performance. A Python process that would start in 4 seconds locally could easily take 30 seconds there.


Last I checked, Cloud Run isn't actually running real Linux, it's emulating Linux syscalls.


Cloud Run "gen2" runs a microVM (à la Lambda) rather than gVisor, so it depends on your settings.


Ah, good to know, thank you! I hadn't seen the announcement of the second generation environment.


I just use AWS EC2, a load balancer, and auto scaling groups. The user_data pulls and runs a Docker image. To deploy, I do an instance refresh, which has no downtime. The obvious downside is more configuration than with more managed services.
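A sketch of that deploy step with boto3's Auto Scaling client, using `start_instance_refresh` and polling `describe_instance_refreshes` until the refresh reaches a final status. It assumes the launch template's user_data pulls a mutable image tag (for example `:latest`) at boot, so replacing instances redeploys the app. The 90% minimum healthy share and 120-second warmup are illustrative values, not recommendations. The client is passed in so the function can be exercised without AWS.

```python
import sys
import time
from typing import Any, Callable

# Statuses after which an instance refresh will not progress further.
TERMINAL = {"Successful", "Failed", "Cancelled", "RollbackSuccessful", "RollbackFailed"}


def rolling_deploy(client: Any, asg_name: str, min_healthy: int = 90,
                   warmup_s: int = 120, poll_s: float = 15.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Start an ASG instance refresh and block until it finishes; return its final status."""
    resp = client.start_instance_refresh(
        AutoScalingGroupName=asg_name,
        Strategy="Rolling",
        Preferences={
            "MinHealthyPercentage": min_healthy,  # keep at least this share of capacity healthy
            "InstanceWarmup": warmup_s,           # seconds for a new instance to boot and pull the image
        },
    )
    refresh_id = resp["InstanceRefreshId"]
    while True:
        refreshes = client.describe_instance_refreshes(
            AutoScalingGroupName=asg_name, InstanceRefreshIds=[refresh_id]
        )["InstanceRefreshes"]
        status = refreshes[0]["Status"]
        if status in TERMINAL:
            return status
        sleep(poll_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    import boto3  # needs AWS credentials with autoscaling permissions

    print(rolling_deploy(boto3.client("autoscaling"), sys.argv[1]))
```

Whether this is actually zero-downtime depends on the load balancer side as well. Target group health checks and connection draining decide when old instances stop receiving traffic.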


I have been using Google Cloud Run in production for a few years and have had a very good experience. It has the fastest autoscaler I have ever seen, except for FaaS, which is not a good option for client-facing web services.


Same experience here, using it for years in production for our critical api services without issues.


Cloud Run is compatible with Knative YAML but actually runs on Borg under the hood, not Kubernetes. At least when not using the "Cloud Run on GKE" option via Anthos.


> Try a `watch -n 2 curl <serviceipv4>` during a deploy

You need blackbox HTTP monitoring right now, don't ever wait for your customer to tell you that your service is down.

I use Prometheus (and Grafana), but you can also get a hosted service like Pingdom or whatever.


Can you email the first two letters of my username at fly.io with more details? I'd love to find out what you've been having trouble with so I can help make the situation better any way I can. Thanks!


Another support.flycombinator.com classic.


Would you rather them be unresponsive?


It's HN -- if the company proved responsive, it might invalidate his OP and everyone who bandwagons on it.


Why would you care about customer problems if they don’t embarrass you in public?

/s


the only thing easier than them responding in this thread is someone making this comment in this thread…


[flagged]


It seems to me that your comment is personally targeting OP and I think that is quite out of line.


...as if it's one person who had issues! I thought it was just incompetence. But now it looks like a theatre, just pretending.


I've been a paying Fly.io customer for 3 years now, and for the past 18 months, I've had no real issue with any of my apps. In fact, I don't even monitor our Fly.io servers any more than I monitor S3 buckets; the kind of zero devops I expect from it is already a reality.

> it's one person who had issues

Issues specific to an application or one particular account have to be addressed as special cases (like any NewCloud platform, Fly.io has its own idiosyncrasies). The first step anyway is figuring out just what you're dealing with (special vs. common failure).

> looks like a theatre

I have had the Fly.io CEO do customer service. Some may call it theatre, but this isn't uncommon for smaller upstarts, and indicative of their commitment, if anything.


Yep they have terrible reliability and support. Couldn’t deploy for 2 days once and they actually told me to use another company. Unmanaged dbs masquerading as managed. Random downtime. I could go on but it’s not a production ready service and I moved off of it months ago.


The header at the top of their Getting Started is "This Is Not Managed Postgres" [1]

and they have a managed offering [2] in private beta now...

> Supabase now offers their excellent managed Postgres service on Fly.io infrastructure. Provisioning Supabase via flyctl ensures secure, low-latency database access from applications hosted on Fly.io.

[1] https://fly.io/docs/postgres/getting-started/what-you-should...

[2] https://fly.io/docs/reference/supabase/


> Unmanaged dbs masquerading as managed

Are you talking about fly postgres? Because I use it and feel they've been pretty clear that it's unmanaged.


Seriously! That's crazy. I guess I need to set up Terraform and move to AWS before launching.


> Seriously! That's crazy

huh? it does what it says on the tin. nothing crazy about it.

They spell out for you in detail what they offer: https://fly.io/docs/postgres/getting-started/what-you-should...

And suggest external providers if you need managed postgres: https://fly.io/docs/postgres/getting-started/what-you-should...


I was shocked because I didn't realise it wasn't managed. Even DigitalOcean offers managed Postgres.

Personally, I think that if you are offering a service like Fly, the database should be managed; the whole point of Fly.io is to provide abstractions that make production simpler.

Do you think the type of user who is using fly.io is interested in or capable of managing their own Postgres database? I'd rather just trust RDS or another provider.


> Do you think the type of user who is using fly.io is interested in or capable of managing their own Postgres database?

Honestly... kinda, yeah

At least I'm projecting my weird "I want to love you for some reason, Fly" plus my skillset onto anyone else that wants to love Fly too haha

They feel very developer/nerd/HN/tinkerer targeted


Yep, I never really trusted managed databases. It just feels like one of those things that's so important to your app that not having full control of it is weird.


Unfortunately this is a pretty common story. Half the people I know who adopted Fly migrated off it.

I was very excited about Fly originally, and built an entire orchestrator on top of Fly machines—until they had a multi-day outage where it took days to even get a response.

Kubernetes can be complex, but at least that complexity is (a) controllable and (b) fairly well-trodden.


Fly.io is not comparable to Kubernetes. It’s a bit like comparing AWS to Terraform.

Or, to clarify your comment: Kubernetes on which cloud? Amazon? Google? Linode?


Kubernetes on AWS, GCP, and Linode are all controllable and well-trodden.

I definitely understand the comparison between Kubernetes and fly. You have a couple of apps that are totally unrelated, managed by separate teams, and you want to figure out how you can avoid the two teams duplicating effort. One option is to use something like fly.io, where you get a command line you run to build your project and push the binary to a server. Another option is to self-host infrastructure like Kubernetes, and eventually get that down to one command to build and push (or have your CI system do it).

The end results organizations are aiming for are similar: developers write the code, and then the code runs in production. Frankly, a lot of toil and human effort is spent on this task, and everyone is aiming to get it to take less effort. fly.io is an approach. Kubernetes is an approach. Terraform on AWS is an approach.


Maybe you’re comparing flyctl with Kubernetes?

That’d be a slightly more valid comparison albeit flyctl is much less ambitious by choice and design. That said, using flyctl to orchestrate your deployments is not the only way to Fly. Example:

https://fly.io/blog/fks/


> Fly.io is not comparable to Kubernetes.

The Fly team has worked on solving similar problems to Kubernetes. E.g. https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/

Of course, Fly also provides the underlying infrastructure stack too. If you want to be pedantic, you can compare it to GKE/AKS/EKS.

Kubernetes on any major cloud platform is more mature, controllable, and reliable than Fly.


I have run several services on Fly for almost a year now, have not had any issues.


I find it amazing how much bad vibes fly.io gets here.

It looks worse than AWS or Azure to me.

Never used the service, but based on what I hear, I'll never try...


I don't see any reason to use Fly. There are more mature, more feature-rich, and cheaper solutions out there. We have the big, complex ones like AWS, Azure, and GCP; the easier, more affordable all-rounders like DO and Render; the hosting platforms like Vercel and Heroku; and finally the biggest bang for your money, barebones like Hetzner.

Why should I choose Fly? How come they are so prominent on Hacker News? Are they backed by VC and get their default 400 upvotes from backers? I get the impression that Fly posts here are kind of sponsored.


I switched to Kamal and Hetzner. It's the sweet spot.


Been on it 7 months, 0 issues. I feel like you're potentially alone on this.


Alone? Every thread about Fly has complaints about reliability, and people complain about it on Twitter too.


It's hard to tell how meaningful the reviews are. I have used AWS, GCP, DigitalOcean, and Linode throughout my career. Every single one of these, through no fault of my own or my team's, messed up and caused downtime. Like, you can get most SRE types in a room to laugh if you blurt out "us-east-1", because it's known to be so unreliable. And yet, it's where every Fortune 500 puts every service; we laugh about the reliability and it's literally powering the economy just fine.

So yes, a lot of people on HN complain about fly's reliability. fly posts to HN a lot and gives them the opportunity. Is it actually meaningful compared to the alternatives? It's very hard to tell.


Hoo boy.

First: this is 100% a "live by the sword, die by the sword" situation for us. We're as aware as anybody about our weird HN darling status (this is a post from two months ago, about an announcement from many months ago, that spent like 12 hours plastered to the front page; we have no idea why it hit today, and it actually stepped on another thing we wanted to post today so don't think we secretly orchestrated any of this!). We've allowed ourselves to be ultra-visible here, and threads like this are the natural consequence.

Moreover: a lot of this criticism is well warranted! I can cough up a litany of mitigating factors (the guy who stored his database in ephemeral instance storage instead of a volume, for instance), but I mean, come on. The single most highly upvoted and trafficked thing we've ever written was a post a year ago owning up to reliability issues on the platform. People have definitely had issues!

A fun cop-out answer here is to note all the times people compare us to AWS or Cloudflare, as if we were a hyperscaler public cloud. More fun still is to search HN for stories about us-east-1. We certainly do that to self-soothe internally! And: also? If your only consideration for picking a place to host an application is platform reliability? You're hosting on AWS anyways. But it's still a cop-out.

So I guess I'd sum all this up as: we've picked a hard problem to work on. Things are mathematically guaranteed to go wrong even if we're perfect, and we are not that. People should take criticisms of us on these threads seriously. We do. This is a tough crowd (the threads, if not the vote scores on our blog post) and there's value in that. Over the last year, and through this upcoming year, staffing for infra reliability has been the single biggest driver of hiring at Fly.io; I think that's the right call, and I think the fact that we occasionally get mauled on threads is part of what enabled us to make that call.

(Ordinarily I'd shut up about this stuff and let the thread die out itself, but some dearly loved user of ours took a stand and said they'd never had any problems on us, which: you can imagine the "ohhhhh nooooooo" montage that took place in my brain when I read that someone had essentially dared the thread to come up with times when we'd sucked for some user, so I guess all bets are off. Go easy on Xe, though: they really are just an ultra-helpful uncynical person, and kind of walked into a buzzsaw here).


I also don't know why HN is so upset about people willing to help out in the threads. The way I see it is, if you talk about your product on HN, inevitably someone will remember they have a support inquiry while HN is open, and ask it there instead of over email. Since employees are probably reading HN, they are naturally going to want to answer or say they escalated there. I don't think it's some sort of scam, just what any reasonable person would do.


It's become a YC cliche, that the way to get support for any issue is to get a complaint upvoted to the top of a thread. People used to talk about "Collison installs", which are real-use product demos that are so slick your company founder (in this case Stripe's 'pc) can just wander around installing your product for people to evangelize it; there should be another Collison term for decisively resolving customer support issues by having the founder drop into a thread, and I think that's the vibe people are reacting to here.


OK, possibly not alone; maybe the issues happened before I started using them extensively. I've had ~no downtime that affects me in 7 months.

I do wish they had some features I need, but their support and responses are top notch. And I've lost much less hair and time than I would going full-blown AWS or another cloud provider.


To be fair, most hosting providers come with plenty of public complaints about downtime. The big ones do way better: the best one is AWS, then GCP, and last Azure. They cost stupid money, though.

DigitalOcean has been terrible for me: some regions just go down every month and I lose thousands of requests, increasing my churn rate.

Fly.io had tons of weird issues but it got better in the last months. It's still very incomplete in terms of functionality and figuring out how to deploy the first time is a massive pain.

My plan is to add Hetzner and load balance across DO and Hetzner with BunnyCDN.
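A minimal sketch of the health-aware round robin that plan implies. It assumes something else (for example, a periodic HTTP probe) flips each origin's `healthy` flag. The origin names and URLs are hypothetical, and this does not model BunnyCDN's own origin or load-balancing settings.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Origin:
    name: str
    url: str
    healthy: bool = True  # updated externally by a health checker


class RoundRobin:
    """Rotate through origins, skipping any currently marked unhealthy."""

    def __init__(self, origins: list[Origin]):
        self.origins = list(origins)
        self._cycle = itertools.cycle(range(len(self.origins)))

    def pick(self) -> Origin:
        # Try each origin at most once per call before giving up.
        for _ in range(len(self.origins)):
            origin = self.origins[next(self._cycle)]
            if origin.healthy:
                return origin
        raise RuntimeError("no healthy origins")


if __name__ == "__main__":
    lb = RoundRobin([Origin("do", "https://do.example.com"),
                     Origin("hetzner", "https://hetzner.example.com")])
    print([lb.pick().name for _ in range(4)])
```

When one provider has a regional outage, its origin is marked unhealthy and all traffic shifts to the other until the checker marks it healthy again. That is the failure mode the DigitalOcean complaint above describes.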


Every thread on the Internet about any product or service has complaints.


Actually, here is a good example: Cloudflare. Sure, people complain a ton about privacy, but I haven't seen a single complaint about the reliability of Cloudflare Workers or similar products in the dozens of threads I've seen on HN.


Not to this extent, it has always stood out to me in particular


[flagged]


Wow that devolved into aggression pretty quickly.


I'd correct that as a corrupt definition of "aggression".


This isn't the place for silly, unjustified personal attacks. Stop it.


That hasn’t been my experience with Fly but I’m sorry to hear it seems to be others :(



Not alone, I’ve been part of two teams who have evaluated fly and hit weird reliability or stability issues, deemed it not ready yet.


This is what I thought, until I once spent two days trying to publish a new, trivial code change to my Fly.io-hosted API: it just wouldn't update! And every time I tried to re-publish, it'd give me a slightly different error.

When it works, it's brilliant. The problem is that it hasn't worked too well in the last few months.



