I'm still not entirely convinced that serving apps closer to users solves more problems than it creates, but I am always impressed by the fly.io blog posts. The writing style makes them a joy to read.
Most pieces of technology create as many problems as they're meant to solve. The cost-benefit is always around whether or not the solved problems are worth the cost of the new problems. If latency is such a problem that you'd want to deploy servers and caches as close as possible to a user, you'd probably gladly pay for the reduction of that latency at the cost of the new class(es) of problem.
It was an interesting decision to make a SaaS out of this solution, though, as I think 99% of problems are not solved by having machines closer to their users.
My favourite kind of technology is technology that either takes something that was impossible and makes it possible, or that takes something that was too hard to do and reduces the difficulty to the point that you choose to do it when previously you would have chosen not to.
Fly.io fits that latter category perfectly. Global distribution was generally Too Hard to consider for my projects. Now it isn't.
Right, and with fly.io it is easier to move our chunky, chock-full-of-latency code closer to the user than to remove the internal latency. It is kind of an Amdahl's Law of planet-scale serving: I get 100ms for free and I don't have to touch my code.
Not only latency but possibly also the cost/availability of massive egress. This is why e.g. YouTube has local caches around the world. Other heavily-hit caches serving heavy content may make sense, too; I can't easily imagine a case besides serving video, though.
It is also about pure network capacity: if all of the traffic on the internet had to traverse the entire path to a central data center, there wouldn't be enough backbone capacity.
Steam and other game distribution systems also have an extensive network of edge caches around the world, and Linux distros have had local mirrors since forever.
It’s also pretty common for people running large LAN parties to run a local Steam cache, so that when everyone in the building downloads a game on the tournament list it comes over the local network rather than as hundreds of separate copies from the Internet.
That's an interesting point. However, I wonder if there are extra costs to serve a request from Australia to the US (there are definitely extra costs, but I wonder whether either the user or the server pays them). If those costs are exposed to either end, then it might be cost-effective to create something that serves requests from local caches.
I don't know about now, but many years ago it was common for some Australians to have different prices or caps for AU vs non-AU destinations, and for some European customers to pay more for transatlantic traffic. I've never seen destination-specific charging in the US.
My guess is the cost differences are still there, but increases in capacity and overall decreases in cost, combined with how hard it is for users to control traffic at that level, mean you would really only see this if you're buying a lot of transit. If you're just a residential customer, it's more likely to show up as congestion on your ISP's oceanic routes than as an explicit charge.
I do occasionally see hosting operators that will optionally charge more for access to some routes that are better but too costly to include in a bundled rate.
Most of the big operators have peering agreements in place, but that doesn't mean every participant has infinite bandwidth. Google Global Cache and Netflix Open Connect appliances go a long way toward reducing costs by avoiding interconnects where possible.
> Most pieces of technology create as many problems as they're meant to solve. The cost-benefit is always around whether or not the solved problems are worth the cost of the new problems.
Fly is the only company that consistently makes me want to reconsider my negative views on "the cloud", and it's absolutely 100% due to their incredible blog posts.
As for this particular post, it's great as always, but I would have liked to see more specifics on how applications might write to the main replica vs their local Redis instance.
Well that's nice of you to say. Would you believe we also have negative views of the cloud?
You're right, we kind of glossed over how to do that. People usually just keep two Redis connections in their code, something like `regionalRedis` and `globalRedis`. It's cheap to keep Redis connections around.
I can't really think of a better way to handle it, it's kind of a weird problem because not _all_ writes need to go a certain place, just writes you deem global.
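For what it's worth, here's a minimal sketch of that two-connection pattern. It assumes the ioredis client, and `REGIONAL_REDIS_URL` / `GLOBAL_REDIS_URL` are made-up env var names, not anything we provide automatically:

```ts
// Minimal sketch: one connection to the nearby replica, one to the global writer.
// The env var names here are placeholders, not Fly-provided values.
import Redis from "ioredis";

const regionalRedis = new Redis(process.env.REGIONAL_REDIS_URL!); // nearby replica
const globalRedis = new Redis(process.env.GLOBAL_REDIS_URL!);     // global/primary writer

// Region-local state (e.g. a session) stays on the nearby instance.
export async function cacheSession(id: string, token: string) {
  await regionalRedis.set(`session:${id}`, token, "EX", 3600);
}

// State that every region should see goes through the global connection.
export async function publishFlags(flags: Record<string, boolean>) {
  await globalRedis.set("feature-flags", JSON.stringify(flags));
}
```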
Since I personally believe fly.io has the most potential to redefine how apps are deployed (though, no disrespect, I don’t believe it’s quite there yet)… are there any plans to provide a PaaS-like offering where I can deploy my favorite web framework to your edge network and you handle the OS, database, app server, etc.?
Said another way, I just write my web app code and fly.io handles literally everything else. (I don’t even mess with docker etc. Just my app code and be done)
Yes, this is the dream. We're definitely not there yet.
This is especially important for DBs. What I'd really love is for us to work with DB "owners" to jointly provide managed DBs. One thing I hate about AWS is that they have a lopsided, parasitic relationship with the people who build interesting databases.
Even though people love to hate PHP, people have been able to achieve that dream of only writing code while a hosting provider takes care of the rest for almost 20 years now.
I recall in the early 2000s having a personal VPS account with Dreamhost and doing just that, since they managed the OS, database, and Apache/nginx.
It’s amazing how, in many ways, deploying code has become radically harder over the years rather than simpler.
I worked for a Dreamhost competitor for years. These "managed VPS" accounts with providers like that are definitely not something where you can just throw your code over the wall. There were and are whole teams of unix weenies like me responding to the myriad ways that software or customers would bork their VPS install.
The situation on "shared hosting" was IMO much better in terms of reliability, since customers didn't have root, but these servers were definitely still pets and not cattle.
Basically, these companies are a way to outsource sysadmin labor to sweaty cubicle farms rather than a way to actually reduce the amount of labor that is needed. Arguably the same is true for cloud, but I think in general the cloud paradigm is actually more labor-efficient. As a thought experiment, imagine if AWS tried to serve their current customer base with the techniques of Dreamhost-style hosting companies. They'd need to employ 1000x as many people! And it would still be worse!
Once my PHP app became mildly popular, I quickly learned the limits of shared hosting. At that point I was forced to learn all of the things that people avoid by deploying apps on someone else’s PHP server.
But your app reached the point of popularity before you started needing to worry about provisioning and maintaining the infrastructure for it? That sounds like a win to me.
It took about a month to reach that point, and I had all sorts of confusing performance problems and errors (per-user process limits, script execution time limits, inability to manage extensions, lack of access to logs and configuration) from the shared hosting infrastructure. I would have been better off if I had realized from the beginning that shared hosting would never be adequate and just started it on a VM in Python.
Shared hosting has limits, but moving an app to a dedicated managed server (or managed VPS) is straightforward, especially if you were on a cPanel shared host (most of them are), since you can move accounts across servers with a few minutes of downtime at most.
Saying “I hit shared hosting limits, I would’ve been better off writing it in a different language and running on entirely different infrastructure” doesn’t really seem like the logical next step.
It’s like saying “I hit the limit of my barebones PostgreSQL server, and instead of getting a bigger instance I should’ve just built everything with NoSQL”.
I looked into a dedicated managed server and it did not give me the flexibility I wanted. Fewer resource limits, but the lack of admin rights is about as bad as shared hosting… Can you imagine having to contact support when you want to upgrade an extension or something? Maybe if you have no sysadmin skills that would be a good choice. I can easily run a server myself and I don’t want support staff at some random host administrating my server for me at all. I think I tried a managed dedicated server for about two weeks before switching to unmanaged.
I did what I’m describing very successfully 10 years ago, and I had very good reasons to make these decisions at the time.
I mentioned python because I ended up re-writing everything in python two or three years later. You’re right that it’s not related to this, other than that trying to run a python app under shared hosting was really difficult.
I moved to an unmanaged VM (this was 2008, first Slicehost and then Linode) since I’d been using Linux as my main home OS for 10 years anyway.
The easiest, and also most terrible, way to deploy was FTP, I guess: just drag the entire folder's contents over. During the copy you'd get errors and downtime, of course.
With my current stack I provision a VPS with Forge and deploy the repo with Envoy (zero downtime). It's pretty easy and feels very solid.
I will be happy if you just get to the point where I can upload a docker compose file and then run that. Not only would it make deployments/development much easier, but it would fit most companies better than maintaining their own cluster.
Just holding the two Redis connections makes sense. I remember the Postgres post said something about modifying HTTP headers (a solution I liked), and I didn't know if this would be something similar.
Actually, I just looked back, and that was to solve the problem of only being able to write to a single postgres instance of the globally distributed cluster. For just caching, with every instance writeable, that probably wouldn't apply here.
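For reference, my rough memory of that approach from the Postgres post: catch the replica's read-only error in the app and ask the Fly proxy to replay the whole request in the primary region via the `fly-replay` header. Treat the sketch below as a from-memory approximation; the Express handler, `createWidget`, and the `PRIMARY_REGION` env var are placeholders of mine:

```ts
// Sketch of the replay-to-primary pattern (as I recall it from the Postgres post).
import express from "express";

const app = express();
app.use(express.json());

// Stand-in for whatever write the app performs against its local Postgres replica.
async function createWidget(body: unknown): Promise<void> {
  /* ... INSERT into Postgres ... */
}

app.post("/widgets", async (req, res) => {
  try {
    await createWidget(req.body);
    res.sendStatus(201);
  } catch (err: any) {
    // Replicas reject writes; hand the request to the primary region instead.
    if (/read-only transaction/.test(String(err.message))) {
      res.set("fly-replay", `region=${process.env.PRIMARY_REGION}`).sendStatus(409);
      return;
    }
    throw err;
  }
});
```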
I think the main difference is that by making replicas writable you lose the ability to have Redis throw an error, like Postgres does, when you try to write to a replica. The upside is that you can write to replicas when you don’t need a global write, without rerouting the request somewhere else; the downside is that your app has to know which connection to use in which situation. It’s definitely an interesting set of trade-offs for the two use cases!
Fly has become a serious contender. I remember the early days when they were more of a reverse proxy with JavaScript workers, but they have since evolved into a fantastic platform, solving for both compute and data.
I might have to launch a startup just to get a chance to use them.
Yes, it looks really interesting. I wonder about the reverse proxy (with caching) aspect, since we’re looking at caching some of our Rails pages, and purging them would mean upgrading to Cloudflare Enterprise, which is expensive for us. Is Fly a more flexible CDN contender then, or would we be better off with Bunny or KeyCDN for this use case? Where does the reverse proxy route fit in this picture?
Fly is no longer a CDN or reverse proxy. It lets you run anything packaged into a container in several regions around the world and serve traffic to them all through a single domain.
You can build a CDN on top but if you just need basic CDN features then you should probably look at something else. Cloudflare allows purging all of the content, or individual URLs, for free through the UI or API. You only need enterprise for the more advanced tag-based purging. You can also look at Fastly which is another configurable CDN. BunnyCDN is also good. Start simple and then move as you need more.
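For reference, purging a handful of URLs on Cloudflare is roughly one API call. The zone ID and token below are placeholder env vars of mine; check the current API docs for exact limits:

```ts
// Sketch: purge individual URLs via Cloudflare's zone purge_cache endpoint.
// ZONE_ID and CF_API_TOKEN are placeholders for your own zone and token.
async function purgeUrls(urls: string[]): Promise<void> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${process.env.ZONE_ID}/purge_cache`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      // Either a list of exact URLs, or { purge_everything: true } to flush the zone.
      body: JSON.stringify({ files: urls }),
    }
  );
  if (!res.ok) throw new Error(`purge failed: ${res.status}`);
}
```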
Yes, we need more advanced purging functionality. Enterprise seems like the logical next step, but the price increase is hard to swallow. Using something like Bunny/Fastly etc can save us a bundle, but then we're kinda ditching all the other built-in features. I guess that's exactly the Cloudflare play to get you started cheap...
I wasn't aware Fly evolved from a CDN. I just saw your comment and the docs mentioning speeding up Heroku apps, running an nginx proxy, OpenResty, etc., so I was curious if it's something worth looking into.
We do this to locally distribute our JavaScript. From our main Rails app we publish out to each AWS region, where we have a cluster of Redis and nginx with a little bit of Lua to grab and render the data. It’s super fast and very easy to maintain: you keep your nice normalized database with all its complex queries, and on the client side you get one big blob of context, all precomputed and perfect for rendering exactly what the client needs. Bonus: you cache the context data for a shorter time in each local nginx worker for even faster response times than serving from disk.
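If it helps to picture it, the publish side of a setup like that could look roughly like this. The region list, env var names, and TTL are invented for the sketch, and it assumes the ioredis client:

```ts
// Sketch: push one denormalized, render-ready blob to every regional Redis cache
// so the nginx/Lua layer only ever reads locally. URLs/regions are placeholders.
import Redis from "ioredis";

const regionalCaches = [
  process.env.US_EAST_REDIS_URL!,
  process.env.EU_WEST_REDIS_URL!,
  process.env.AP_SOUTHEAST_REDIS_URL!,
].map((url) => new Redis(url));

export async function publishContext(key: string, context: object): Promise<void> {
  const payload = JSON.stringify(context);
  await Promise.all(
    // Short TTL so stale blobs age out even if a publish is missed.
    regionalCaches.map((redis) => redis.set(key, payload, "EX", 300))
  );
}
```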
These posts always strike a nice balance between doing a deep dive on a topic and staying accessible, building the reader up to the "Aha!" step by step. Although I have things deployed on Fly, so maybe I'm a little biased at this point.
"-is" is typical 3rd declension. If it's not neuter (how do you pick a gender for loanwords in Romance languages?), it'd be Redes. If it's neuter it'd be Reda as you said.
Or maybe it'd follow the -eris pattern and be Rederes or Redera.
Sorry I was just being pedantic on a Friday afternoon. The rest of the blog post was great and I agree with all your points. I've been advocating for immutable cache keys for years, for the exact reason you mention.
If we're playing Warhammer 40K, the chapter keeping all these Redises around the galaxy alive might be called the Cachess Redisari. Now I wanna see what happens when they get a little too twisted by the Chaos.
Great post! I'm not familiar with all of Fly's inner workings, but I'm guessing that each app gets its own Redis instance at each location (rather than all of your customers sharing instances) because "global" item replication is a bit cheaper if "global" == the locations where the app actually exists. I wonder if there's a way for an application to write to a subset of locations (e.g. I have locations all over the US but only want to update something in California). Obviously I could have each of my CA locations do a local update, but maybe I don't want to.
The Redis instances we're talking about here are "just" Fly.io apps; they're not like a special platform feature or anything. Which is good news; it means you can do whatever you want with Redis. You could, I guess, do a replica-of-replicas tree configuration, with app nodes holding connections (Redis connections are really cheap) to their local node, a macro-regional node, and the global node and selectively synchronizing. Or you could go the other way and lose some Redis servers, keeping Redis servers only in certain regions. We make it sort of straightforward for apps to suss out the topology with DNS queries.
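Here's a rough sketch of what that DNS-based discovery can look like. The exact `.internal` names are from memory, so double-check the private networking docs before relying on them:

```ts
// Sketch: discover an app's region topology over Fly's internal DNS.
// Names like regions.<app>.internal are from memory; verify against the docs.
import { promises as dns } from "node:dns";

export async function discoverTopology(appName: string) {
  // TXT record listing the regions the app currently runs in, e.g. "iad,lhr,syd".
  const [regionsTxt] = await dns.resolveTxt(`regions.${appName}.internal`);
  const regions = regionsTxt.join("").split(",");

  // AAAA records for the instances in each region (private 6PN addresses).
  const topology: Record<string, string[]> = {};
  for (const region of regions) {
    topology[region] = await dns.resolve6(`${region}.${appName}.internal`);
  }
  return topology;
}
```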
> Fly runs apps (and databases) close to users, by taking Docker images and transmogrifying them into Firecracker micro-vms, running on our hardware around the world.
Any reason you run the apps on micro-vms? Why not directly on a container runtime?
> MicroVMs provide strong hardware-virtualization-based security and workload isolation, this allows us to safely run applications from different customers on shared hardware.
I'm curious what the moat of fly.io is. Wouldn't Cloudflare use the same Firecracker VMs to replicate the same experience here if it becomes successful?
They're referring to the Cloudflare product suite which has massively grown from CDN to registrar to application security to corporate networking and more. I agree that it can be hard to see and use all the different product lines given how fractured everything is.