Fly Kubernetes (fly.io)
272 points by ferriswil on Dec 18, 2023 | 168 comments



This is really exciting, but there are a few things they will certainly have to work through:

*Services:*

Kubernetes expects DNS records like {pod}.default.svc.cluster.local. To achieve this, they will have to add custom DNS records for the "pod" (Fly Machine) that resolve against their metadata. Not impossible, but something that has to be taken into account.
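
For reference, a sketch of the kind of object behind those records: a headless Service, whose backing pods (when they set a hostname) each get an entry like {hostname}.myapp.default.svc.cluster.local that FKS would have to synthesize. All names here are illustrative.

    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
      namespace: default
    spec:
      clusterIP: None      # headless: DNS resolves to individual pod records
      selector:
        app: myapp
      ports:
        - port: 8080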

*StatefulSets:*

This has 2 major obstacles:

The first is dealing with disks. k8s expects that it can move a disk to a different logical pod when the pod loses its node (e.g. remapping an EBS volume to another EC2 node). The problem here is that Fly has a fundamentally different model. It means FKS either has to decline to schedule a pod because it can't get the machine that the disk lives on, or not guarantee that the disk is the same one. While this does exist as a setting currently, the former is a serious issue.

The second major issue is again with DNS. StatefulSets have ordinal pod names (e.g. {ss-name}-{0..n}.default.svc.cluster.local). While this can be achieved with their machine metadata and custom DNS on the machine, it means they either have to run a local DNS server to "translate" DNS records to the Fly nomenclature, or constantly update local services on machines to tell them about new records. Both will incur some penalty.
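
For a concrete picture of that naming scheme, a minimal StatefulSet sketch (names and image are placeholders): a StatefulSet "db" with serviceName "db" and 3 replicas yields pods db-0, db-1, db-2, reachable at db-0.db.default.svc.cluster.local and so on via the matching headless Service.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: db
      namespace: default
    spec:
      serviceName: db        # must point at a headless Service named "db"
      replicas: 3
      selector:
        matchLabels:
          app: db
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
            - name: db
              image: postgres:16   # placeholder image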


Am I understanding correctly that because they map a “Pod” to a “Fly Machine”, there’s no intermediate “Node” concept?

If so, this is very attractive. When using GKS, we had to do a lot of work to get our Node utilization (the percentage of resources we had reserved on a VM that was actually occupied by Pods) to be higher than 50%.

Curious what happens when you run “kubectl get nodes” - does it lie to you, or call each region one Node?


GKE Autopilot is an attractive option here if you don't want to worry about node utilization and provisioning. Effectively you have an on-demand, infinitely-sized k8s cluster that scales up and down as you need new pods. Some caveats, but it's an incredible onramp if you're coming from Heroku or a similar PaaS and don't want to worry about the infrastructure side of things: GitHub Actions building images and deploying a Helm chart to GKE Autopilot is a remarkably friendly yet customizable stack. Google should absolutely promote it more than it does. https://cloud.google.com/kubernetes-engine/docs/concepts/aut...
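
To make that stack concrete, a minimal workflow sketch under assumed names (the project, cluster, chart path, and GCP_SA_KEY secret are all placeholders; exact action inputs may vary):

    name: deploy
    on:
      push:
        branches: [main]
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: google-github-actions/auth@v2
            with:
              credentials_json: ${{ secrets.GCP_SA_KEY }}
          - uses: google-github-actions/setup-gcloud@v2
            with:
              install_components: gke-gcloud-auth-plugin
          - name: Build and push image
            run: |
              gcloud auth configure-docker
              docker build -t gcr.io/my-project/my-app:${{ github.sha }} .
              docker push gcr.io/my-project/my-app:${{ github.sha }}
          - name: Deploy Helm chart
            run: |
              gcloud container clusters get-credentials my-autopilot-cluster --region us-central1
              helm upgrade --install my-app ./chart --set image.tag=${{ github.sha }}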


Unfortunately last I checked the compute pricing for GKE autopilot was almost double, so if you can beat 50% utilization, you might as well just keep the under-utilized Node around.


If this is “free GKE autopilot” (autopilot billed at the same price as regular Fly Machine compute), then that changes the way I think about Fly’s basic compute pricing a lot.

I would think they should highlight that a lot more in the product announcement!


Say more! What should we highlight more?


As someone not familiar with Fly's offering (but very interested for the same reasons as the post you're replying to!), a couple things come to mind if you're looking at convincing people familiar with k8s to move workloads here:

- https://fly.io/docs/ doesn't show any results when searching kubernetes or k8s or k3s.

- https://fly.io/blog/fks/ is self-admittedly snarky but also doesn't provide details about the product itself. It jumps straight into technical details - and while I like the openness about fault tolerance, there's no paragraph after the intro about what Fly Kubernetes is.

- What exactly does the combination of k3s and virtual-kubelet provide compared to standard k8s? Does it provide Secret and ConfigMap storage and namespaces and all those expected things? Can we run things like the Kubernetes dashboard? cert-manager? nginx-ingress?

- On that note, what's the ingress story in general? Is Fly automatically routing traffic to the k8s cluster based on the ingress declarations? Are there limitations? Where are they documented?

- Most people running k8s will have fault-tolerant workloads, but reasonable expectations for pod lifetime and reliability of underlying "hardware" are nonetheless important. If I'm migrating from EKS or GKE and want to run a 24/7 background process, can I expect it to keep running on the same Fly Machine for weeks or months until updated? Or are there limits here? (This might be better documented for Fly Machine but it's worth documenting specifically in this context.)

Absolutely understand that this is an experimental work in progress. It's really cool work! But it's also impossible to even justify playing with as an experiment, with so many unanswered questions about where hard caps in the functionality may be hit.


If I use GKE or any other standard Kubernetes offering (excluding GKE autopilot for now), if I have a variable workload and I want Node-level autoscaling, I will probably pay between 1.5x-2.5x in compute costs above what my Pod requests sum to because of difficulty with Node utilization.

It seems like with FKS, my pods will map directly to Fly Machines billing, and so there’s no compute that I’m paying for but not using.


GKE Autopilot is pretty much useless; there are very few cases where it actually turns out cheaper than simply using Cluster Autoscaler + node auto-provisioning. Not only is the pricing absolutely absurd, they don't even allow normal K8s bursting behavior (requests need to equal limits), which means you not only end up paying more than for a regular K8s cluster but also need to heavily overprovision your pods.
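
For illustration, what that constraint looks like in a pod spec (numbers and image are placeholders): with limits pinned to requests, you size for peak instead of relying on bursting.

    apiVersion: v1
    kind: Pod
    metadata:
      name: example
    spec:
      containers:
        - name: app
          image: my-app:latest     # placeholder image
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 500m            # no burst headroom above requests
              memory: 512Mi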


Why would you use GKE Autopilot over Cloud Run?


Cloud Run is great if you just need to deploy a few services and expose their endpoints, and don't have a particularly complex backend service architecture.

But with more complex architectures, you'll end up implementing a sort of GKE-like layer over Cloud Run, at which point GKE would probably make more sense.

GKE lets you shell into containers, run all different kinds of workloads (e.g. no need for a separate "Cloud Tasks" system), supports stateful workloads, provides a standardized language for defining and deploying resources of all kinds (the k8s resource definition language), and as such integrates with standard gitops deployment systems such as ArgoCD.


My understanding is that Cloud Run is not suitable for stateful workloads (databases, etc.)


The node would be a virtual-kubelet. You can check out the virtual-kubelet GitHub repo for more info.

Interestingly, there are already multiple providers of virtual-kubelet. For example, Azure AKS has virtual nodes where pods are Azure Container Instances. There’s even a Nomad provider.

> So that’s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.

So probably a cluster per region. You could theoretically spin up multiple virtual-kubelets though and configure each one as a specific region.

> Because of kine, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.

This would mean the control-plane would be on a single server without high availability? Although, I suppose there really isn’t any state stored since they are just proxying requests to the Fly Machine API. But still, if the machine went down your kubectl commands wouldn’t work.


The diagram on https://virtual-kubelet.io/docs/architecture/ makes me wonder whether it's possible to have a k8s cluster where the nodes are all virtual kubelets backed by different cloud providers (and then perhaps schedule loads preferentially with selectors)


I think it’s completely possible. Though, you’ll have to manage your own control-plane.

Azure AKS and EKS provide virtual-kubelet functionality in some form, but AKS is a managed control-plane where you can’t add nodes yourself and EKS only allows nodes in the same VPC.

Edit: It already is a thing. https://github.com/virtual-kubelet/tensile-kube


tensile-kube seems to be structured as a "k8s cluster of k8s clusters", with an upper kubemaster farming out resources to lower kubemasters (through virtual-node). I don't know if there's any particular reason to have that separation; possibly the lower kubemasters could be removed and you could just run a bunch of virtual-kubelets.


I think the biggest hurdle would be networking between the pods since they will be running on different cloud providers.


I've seen some people using wireguard for intra-cluster networking so that all their nodes can run pretty much anywhere.


Wouldn't the network cost be absurd in that case? Not only would the pod-to-pod communication cost skyrocket; all the heartbeats, health checks, metrics, and daemonsets pinging each other would probably end up costing more than the CPU and memory.


> Had to do a lot of work to get node utilization ... higher than 50%

How is this the scheduler's fault? Is this not just your resource requests being wildly off? Mapping directly to a "fly machine" just means your "fly machine" utilization will be low.


I think there’s a slight misunderstanding - I’m referring to how much of a Node is being used by the Pods running on it, not how much of each Pod’s compute is being used by the software inside it.

Even if my Pods were perfectly sized, a large percentage of the VMs running them was underutilized because the Pods were poorly distributed across the Nodes.


Is that really a problem in Cloud environments where you would typically use a Cluster Autoscaler? GKE has "optimize-utilization" profile or you could use a descheduler to binpack your nodes better


DX might be better I suppose, since you don’t have to fiddle with node sizing, cluster autoscalers, etc.

Someone else linked GKE Autopilot which manages all of that for you. So if you’re using GKE I don’t see much improvement, since you lose out on k8s features like persistent volumes and DaemonSets.


> we had to do a lot of work to get our Node utilization ... over 50%

Same, a while back you had to install cluster-autoscaler and set it to aggressive mode. GKE has this option on setup now, though I think anyone who's had to do this stuff knows that just using a cluster-autoscaler is never enough. I don't see this being different for any cluster; it's more a consequence of your workloads and how they are partitioned (without partitioning, you'll have real trouble getting high utilization).


I wonder how it copes with things like anti-affinity rules, where you don't want two things running on the same physical / virtual server for resilience reasons.


You wouldn’t use affinity rules anymore. The pods are scheduled on a single virtual-kubelet node, so if you use anti-affinity scheduling would fail.


> You wouldn’t use affinity rules anymore

Point being: what if I wanted to do this? How could I achieve making sure services were running according to the antiaffinity rules I provided? E.g. not on same physical machine; not on same VM; not in same datacentre; not in same region; etc.


If there were a virtual kubelet per unit of granularity (datacenter, in their case?) then you would be able to use affinity rules just fine.
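
For reference, the standard stanza in question, as it would appear in a pod template (labels are placeholders): swap topologyKey for whatever granularity the provider actually exposes on its nodes (kubernetes.io/hostname, topology.kubernetes.io/zone, or a region label).

    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: myapp
            topologyKey: topology.kubernetes.io/zone   # or hostname / region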


Right. Though, the virtual-kubelets can actually be running on the same machine. They just need to be configured to have different node names.

The press release states that your k8s API is actually running on a single machine with k3s and a virtual-kubelet. So, I’m not sure if it’s one “cluster” per region, or one “cluster” with multiple virtual-kubelets for regions.

Either way, your FKS cluster control-plane would sit in a single region.


How do you forbid running two instances of the same service on one node without anti affinity?


Traditionally, each node is its own machine. virtual-kubelet creates a virtual node that is a proxy to some other pod infrastructure. In the case of FKS, each pod in the virtual node is a machine (a node in the traditional sense), so it’s the equivalent of having an anti-affinity on all pods with an infinite node pool.


if it is pod per vm, that would make it like EKS Fargate


is GKS some amalgamation of GKE and EKS


Typo haha - I meant GKE. Fixed now.


Is this still a limitation for Fly k8s?

> A Fly Volume is a slice of an NVMe drive on the physical server your Fly App runs on. It’s tied to that hardware.

Does the k8s have any kind of storage provisioning that allows pods with persistent storage (e.g. databases) to just do their thing without me worrying about it or do I still need to handle disks potentially vanishing?

I think this is the only hold-up that stops me actually using Fly. I don't know what happens if my machine crashes and is brought back on different hardware. Presumably the data is just not there anymore.

Is everyone else using an off-site DB like Planetscale? Or just hoping it's an issue that never comes up, w/ backups just in case? Or maybe setting up full-scale DB clusters on Fly so it's less of a potential issue? Or 'other'?


Not speaking for the FKS case, but in general for the platform: when you associate an app with a volume, your app is anchored to the hardware the volume is on (people used to use tiny volumes as a way to express hard-locked region affinity when we were still using Nomad). So if your Fly Machine crashes, it's going to come back on the same physical as the volume lives on.

We back up volumes to off-net block storage, and, under the hood, we can seamlessly migrate a volume to another physical (the way we do it is interesting, and we should write it up, but it's still also an important part of our work sample hiring process, which is why we haven't). So your app could move from one physical to another; the data would come with it.

On the other hand: Fly Volumes are attached storage. They're not a SAN system like EBS, they're not backed onto a 50-9s storage engine like S3. If a physical server throws a rod, you can lose data. This is why, for instance, if you boot up a Fly Postgres cluster here and ask us to do it with only one instance, we'll print a big red warning. (When you run a multi-node Postgres cluster, or use LiteFS Cloud with SQLite, you're doing at the application layer what a more reliable storage layer would do at the block layer.)


And Fly becomes a standard cloud provider like everyone else. I think this transition is only natural. It's hard to be a big business without catering to the needs of larger companies, and that means operating many services, not individual apps.


Nothing is changing for anybody who doesn't care about K8s. If you're not a K8s person, or you are and you don't like K8s much, you shouldn't ever touch FKS.


I used Fly for some projects, I really like it.

But once again, for many of my projects, I still need my outbound IPs to resolve to a specific country. I can't have them all resolve to Chicago, US in nondeterministic ways.

I would be willing to pay an additional cost for this but even with reserved IPs, I am given IPs that are labelled as Chicago, US IPs by GeoIP providers even for non US regions.


fwiw - our network folks _should_ have fixed this a few weeks ago. Some of the outbound IPs were incorrectly tagged in some of the geoip databases as being in the US when they were not.


I remember asking about this about half a year ago. Back then I was told that since Fly can route these IPs wherever they want in their infrastructure with simple configuration changes, the line is blurred anyway, and GeoIP providers just take the shortcut of resolving to where the company is registered.

I'll try again if the situation is different now.


Inbound IPs use anycast - the IP we give you routes to the nearest fly region and then hits a wireguard tunnel to get to the correct region.

Outbound IPs are tied to the individual host your machine is running on, which sits in a single region. In some cases these were resolving to the wrong region in a few of the GeoIP databases. We hopefully fixed this part.


Thank you, I will try again today. It would unblock deploying my usecase on Fly.


If they are reluctant and only do it because they have to, are they really the right vendor for managed k8s?

What about them makes for a good trade-off when considering the many other vendors?


We're not a K8s vendor. We're a lower-level platform than that. If all you care about is K8s, and no part of the rest of our platform is interesting to you --- the global distribution and Anycast, the fly-proxy features, the Machines API --- we're not a natural fit for what you're doing.

We were surprised at how FKS turned out, which is part of why we decided to launch it as a feature and all of why we wrote it up this way. That's all.


> We're not a K8s vendor

It might be more accurate to say “you were not a k8s vendor”; now you are, based on:

> If K8s is important for your project, and that’s all that’s been holding you back from trying out Fly.io, we’ve spent the past several months building something for you.

If it's fundamentally different, maybe you shouldn't call it Kubernetes - perhaps a Kubernetes-API-compatible alternative?

fwiw/context, I use GKE and also many of the low-level services on GCP

Is Fly supposed to be simpler for the average developer?


Are any of the cloud provided managed K8s offerings just K8s under the covers? I’ve always assumed all of them shim the K8s api onto other more bespoke orchestration systems.


Most are pretty much actually k8s, though they tend to have different ways of handling the masters; my understanding is that it's the same k8s binaries and code.

What I've seen is the k8s APIs working their way into other systems like cloud functions and servers, so you can use the same Yaml across vendors and products. This is further solidifying k8s APIs as an industry standard

Searching "managed kubernetes providers" is a good starting point to learn about the various offerings


I’ve looked extensively at the documentation for gke autopilot (a system I use extensively) and haven’t found any documentation on how they orchestrate those clusters under the covers.

I’ve always assumed it was a plugin to borg, not a different fleet orchestrator. Not to be overtly needy, but do you have a link to docs that contradicts that?


It is definitely not a plugin for Borg. Borg is a totally different API (source: I was a Borg SRE). AFAIK it's actually the vanilla k8s apiserver with some shimmed bespoke storage, but it's not really documented anywhere. You can test that fact using kubectl proxy, though.



You’ll note those documents are extremely careful in describing the control plane architecture to not promise you are running a stock k8s install. Which is why I’ve always assumed otherwise.

But I’ll trust the sibling comment which suggests the only bespoke component in gke is storage well enough to leave it alone.


Maybe you are reading it differently than I am, but when they refer to the same binaries I would use if I managed it myself as being the pieces they run, it definitely seems like the open source project is being used.

> The control plane is the unified endpoint for your cluster. You interact with the control plane through Kubernetes API calls. The control plane runs the Kubernetes API server process (kube-apiserver) to handle API requests.

> A node runs the services necessary to support the containers that make up your cluster's workloads. These include the runtime and the Kubernetes node agent (kubelet)


We are well into the weeds of what doesn't matter, except that Fly has given us a very under-the-covers look at their implementation that is hard to find with other alternatives. But as someone who has run K8s in other contexts, I find the following pretty circumspect (not in a way that causes me concern; I'm a happy GKE user):

> GKE Autopilot manages the entire underlying infrastructure of clusters, including the control plane, nodes, and all system components. If you use GKE Standard mode, GKE manages the control plane and system components, and you manage the nodes.

There is a mile of implementation detail in that. Which I’m happy for them to keep on their side of the street.


> I’m a happy gke user && Which I’m happy for them to keep on their side of the street.

100% agreement

IIRC, there was a time when I thought they were doing some consolidation with how they run the control plane; then at some point my cluster updates had a warning related to control-plane unavailability during an update, and this was on a single-node cluster.

I get what you are saying though, there's probably some magic going on somewhere, but after many years on GKE, I don't really think about it.


I'm excited about this as a way to configure my Fly.io apps in a more declarative way. One of my biggest gripes about Fly.io is that there's a lightly documented bespoke config format to learn (fly.toml), and at the same time there's a ton of stuff you can't even do with that config file.

I love Kubernetes because the .yaml gives you the entire story, but I'd _really_ love to get that experience w/o having to run Kubernetes. (Even in most managed k8s setups, I've found the need to run lots of non-managed things inside the cluster to make it user-friendly.)


Probably good for people already used to Fly, or interested in Fly for other reasons, who could also use k8s?

Sometimes you just want to run k8s without thinking too much about it, without having all the requirements that GCP has answers to.


If their reluctance were based on valid reasons that they handled in a unique way, it might be good. In theory.


k8s has become a standard API and platform for running apps. Putting an asterisk on it makes the implementation an outlier from the standard, which is not normally considered a good thing, because you have to be aware of the nuanced differences.


Maybe a good fit for someone who is reluctant to use Kubernetes but has to for whatever reason.


If someone isn't a cloud provider they should be reluctant to use Kubernetes.


Aren't most k8s users not cloud providers?

It's more about good abstractions and APIs for running applications in the cloud. Cloud providers are the ones offering the APIs and abstractions we use, and they are increasingly putting k8s abstractions/APIs at the forefront, because that is where the industry has moved.


> Aren't most k8s users not cloud providers?

Yes. The point still stands.


People and companies use it because it makes things easier in several areas; it's the industry standard at this point.

Can you explain why we should be reluctant to use k8s?


> People and companies use it because it makes this easier in several areas

Compute on demand is significantly more complex using k8s than what most companies already pay their own cloud provider for. AWS is the industry standard for this purpose, followed by Azure, then TF, then maybe Pulumi.

k8s is a meme for resume-driven development.


Ok, you're obviously just a hater and aren't worth taking seriously


Plenty of companies/teams qualify as "cloud providers" even though all the usage is internal.


There is a very high price to pay when going with your own scheduling solution: you have to compete with the resources google and others are throwing at the problem.

Also, there is the market for talent, which is non-existent for fly.io technology if it's not open source (I see what you did here, Google): you'll have to teach people how your solution works internally, and congratulations, now you have a global pool of 20 (maybe 100) people that can improve it (if you have really deep pockets, maybe you can have 5 PhDs). Damn, universities may already have classes about Kubernetes for undergrad students. Will they teach your internal solution?

So, if a big part of your problem is already solved by a gigantic corporation investing millions to create a pool of talented people, you had better make use of that!

Nice move, fly.io!


What if it’s really not that complicated, and by adding more people you make it more complex. So complex that you need even more people to maintain that complexity?

I love fly.io for rethinking some of the problems.


How does this handle multiple containers for a Pod? In a container runtime k8s, containers within a pod share the same network namespace (same localhost) and possibly pid namespace.

The press release maps pods to machines, but provides no mapping of pod containers to a Fly.io concept.

Are multiple containers allowed? Do they share the same network namespace? Is sharing PID namespace optional?

Having multiple containers per pod is a core functionality of Kubernetes.


You can use mount namespaces, or even containers in your VM. Maybe that's how?


Fly.io claims it’s “just a VM”. But Fly.io Machines are an abstraction of microVMs using Firecracker, and the FKS implementation is a further abstraction on top of Fly.io Machines. So what I’m asking is how, if at all, the FKS implementation supports multiple containers for a pod. Using FKS, the abstraction is no longer a VM.

It seems that Fly.io Machines support multiple processes for a single container, but not multiple containers per Machine [0]. This means one container image per Machine and thus no shared network namespace across multiple containers.

[0] https://community.fly.io/t/multi-process-machines/8375


You can run Docker on a Fly Machine, and run arbitrary numbers of containers inside of it. Or you can run lots of small Fly Machines. FKS is just one model for deploying things.


Right, but what is the point of FKS then? It’s no longer Kubernetes if it doesn’t support a core behavior of Kubernetes.

If you only support deploying single containers with single processes on FKS, then you might as well use flyctl.

It’s a solvable issue of course. The virtual-kubelet implementation would need to create a Machine running a container-runtime image that would then run the pod containers to match the pod configuration.

I think that some disclaimers of the limitations of FKS compared to standardized Kubernetes should be present and highly visible.


We're actively working on the ability to run multiple processes with different images because it's something people using our platform want and it just happens to also be something needed for us to make FKS a more standardized Kubernetes offering.


Seems like Fly.io Machines are trying to reimplement Kata Containers with the Firecracker backend [0], but also abstracting away the host hypervisor machine infrastructure.

Kata has a guest image and guest agent to run multiple isolated containers [1].

[0] https://katacontainers.io/

[1] https://github.com/kata-containers/kata-containers/blob/main...


The article discusses what you get by using K8s alongside Fly.io. If you want to bin-pack containers onto Fly Machines, you can of course just boot up your own K8s cluster here; that has always been an option.


It should discuss what you don’t get compared to the standard behaviors of Kubernetes.


The problem is not the need to bin-pack, the problem is completeness. Sidecars and multiple containers are used for logging, backups, etc. Not to mention that if you grab a manifest or chart for an app, it is going to have pods with multiple containers (whether that's strictly needed or not), and those won't work on fly.io.

This is a critical feature available in every Kubernetes offering, and as such people rely on it. Trying to say that maybe you can do without is missing the forest for the trees.

Like saying that your C compiler doesn't need arrays because it has pointers: sure, maybe, but now good luck compiling any existing code. Maybe don't call it a C compiler if no C program will work on it unmodified.


Again: Fly Machines are just Linux VMs, and you have root on them.


I'm discussing the capabilities of Fly Kubernetes, not Fly Machines. This is good news though, it means they might get there in the future.


Why should you do this? It sounds like an antipattern to me.


It’s used widely in the Kubernetes world and is known as sidecars [0].

[0] https://kubernetes.io/blog/2023/08/25/native-sidecar-contain...


Widely? I've lived in this Helm, Kubernetes, Pulumi world for the past 4 years and we followed the simple rule of one service/container per pod. Why add complexity where it’s not needed? Like running a DB and a service in the same Docker container - a no-go for me and many.


Kubernetes official documentation states, "A Pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers."

Running more than one container in a pod is a fundamental concept of Kubernetes. Init containers and sidecars allow for a separation of concerns, which is essential for non-cloud-native workloads. Logging and telemetry are just a couple of features which may be designed and built into cloud-native applications, but legacy applications need this flexibility without modifying the application itself.

The fact you ran K8S for four years without it demonstrates only that it is not required by your workload -- not that it is "unnecessary complexity" or an "anti-pattern."
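
As a concrete example of the pattern being described, a minimal multi-container Pod: an app plus a log-shipping sidecar sharing a volume (image names and paths are placeholders).

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-with-sidecar
    spec:
      volumes:
        - name: logs
          emptyDir: {}
      containers:
        - name: app
          image: my-app:latest          # placeholder application image
          volumeMounts:
            - name: logs
              mountPath: /var/log/app
        - name: log-shipper
          image: fluent/fluent-bit:2.2  # sidecar reads what the app writes
          volumeMounts:
            - name: logs
              mountPath: /var/log/app
              readOnly: true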


The thing is you might not be adding the sidecars yourself.

Kubernetes has a resource called MutatingAdmissionWebhook that allows mutations of the Pod object before creation.

I think the most common use case is for service meshes. Your Deployment might not have any sidecars, but the service mesh controller will automatically add a network proxy container to your pod via the admission webhook.

Another use case would be OpenTelemetry or some other observability service sidecar injection for auto-instrumentation.
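
Roughly what a sidecar-injecting controller registers, heavily trimmed (the webhook Service, namespace, and path are placeholders; a real one also carries a caBundle and failure policy): the API server sends pod CREATE requests to this webhook, and the webhook's response patches the pod spec to add the proxy container.

    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: sidecar-injector
    webhooks:
      - name: inject.sidecar.example.com
        admissionReviewVersions: ["v1"]
        sideEffects: None
        clientConfig:
          service:
            name: sidecar-injector     # placeholder in-cluster Service
            namespace: mesh-system
            path: /mutate
        rules:
          - apiGroups: [""]
            apiVersions: ["v1"]
            resources: ["pods"]
            operations: ["CREATE"]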


OK, that was something I was missing on my end. Thanks for all the explanations, and sorry for being wrong.


Nobody would suggest this. But what about a metrics scraper for your db pod? That’s where sidecars come in.


Like helm, it's a widely used anti-pattern


It's because you have multiple processes (containers) that work together in their little pod. You could stick them all in a single image somehow, but that would be much more work and less flexible.


Great writeup! Love reading about orchestration, especially distributed.

> When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.

Why a single machine? Is it because this single fly machine is itself orchestrated by your control plane (Nomad)?

> ...we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system based on eBPF). But the ideas are the same.

very cool, is this similar to how Cilium works?


The control plane is not Nomad anymore: https://community.fly.io/t/the-death-of-nomad/16220


Man, I just wish they'd work on stability. Fly.io is an amazing offering. But it's so buggy, it's almost more headache than it's worth trying to build PaaS-flavored software on it. Even the Fly docs are "buggy" since they mostly transitioned to v2 Machines but the docs are still a mix of Nomad and Machines.

There's so much power on the platform with Flycast, LiteFS and other clever ways to work with containers. If it was 90% stable I'd consider it a huge win.


I agree - I find if you pick the "mainstream" regions like IAD you get close to 100% uptime, like what you see from my 3rd-party status page here: https://flyio.onlineornot.com/

Once you start deploying in SIN/CDG etc you start to get really weird instability (and this is on v2 machines).


One of fly's main features is global distribution so it's kinda silly if you have to avoid SIN and CDG


You can use hkg and ams instead :)


I haven't been able to bring up a clustered Elixir server in hkg without experiencing netsplits every 5-10 minutes. ewr, ord, and cdg have been totally reliable.


I'm confused about what this is actually offering (also very tired due to some flight problems; anyway)

To me, I'd imagine kubernetes on fly as running kind (kubernetes in docker) with fly converting the docker images to firecracker images OR "normal" kubernetes api server running on one machine then using CAPI/or a homegrown thing for spinning up additional nodes as needed.

So, what's the deal here? Why k3s + a virtual kubelet?


You can certainly boot up your own K8s cluster, any way you'd like to, just by enlisting a bunch of Fly Machines and configuring them yourself. A Fly Machine is just a VM, and you have root in the VM. You can set up systemd, you can set up Docker, you can run kubelets on all your Machines.

The thought here is: Fly.io already does a lot of the things any K8s distribution would do. If you were to boot up a complete K8s distribution on your own Fly Machines, running oblivious to the fact that they were on Fly.io, you'd be duplicating some of the work we'd already done (that's fine, maybe you like your way better, but still, bear with me).

So, rather than setting up a "vanilla" K8s that works the same way it would if you were on, like, Hetzner or whatever, you can instead boot up a drastically stripped down K8s (based on K3s and Virtual Kubelet) that defers some of what K8s does to our own APIs. Instead of a cluster of scheduling servers synchronized with Raft, you just run a single SQLite database. Instead of bin-packing VMs with Docker and a kubelet, you just run everything as an independent Fly Machine.

We took the time to write about this because it was interesting to us (I think we expected a K8s to be more annoying for us to roll, and when it was easier we got a lot more interested). There are probably a variety of reasons to consider alternative formulations of K8s!


Right, I think I get it now, like benpacker said in https://news.ycombinator.com/item?id=38685760 it's a way to map the kubernetes API to the fly platform (basically). Which makes a lot of sense!

Not sure what the implications of that are in practice but sounds interesting.


Did you consider using/adopting the seemingly defunct nomad virtual kubelet?


Do people want to run Nomad on Fly.io? It wouldn't have addressed any of the reasons we replaced Nomad with flyd; it's hard to scale a globally synchronized distributed database for real-time scheduling.


I thought you guys used Nomad, disregard. The virtual kubelet for Nomad would've let you proxy Kubernetes into your Nomad environment basically.


Always look forward to reading the fly.io blog write-ups. As much as people hate it, K8s has become the de facto operating system for the cloud, so it makes sense to support it.


I like the discussion on scheduling. One of the things I've thought recently is that, since there's no one model of how an app or system should work, nor one network architecture, there shouldn't be one scheduler.

Instead, I think the system components should expose themselves as independent entities, and grant other system components the ability to use them under criteria. With this model, any software which can use the system components' interfaces can request resources and use them, in whatever pattern they decide to.

But this requires a universal interface for each kind of component, loosely coupled. Each component then needs to have networking, logging, metrics, credentials, authn+z, configuration. And there needs to be a method by which users can configure all this & start/stop it. Basically it's a distributed OS.

We need to make a standard for distributed OS components using a loosely coupled interface and all the attributes needed. So, not just a standard for logging, auth, creds, etc, but also a standard for networked storage objects that have all those other attributes.

When all that's done, you could make an app on Fly.io, and then from GCP you could attach to your Fly.io app's storage. Or from Fly.io, send logs to Azure Monitor Logs. As long as it's a standard distributed OS component, you just attach to it and use it, and it'll verify you over the standard auth, etc. Not over the "Fly.io integration API for Log Export", but over the "Distributed OS Logging Standard" protocol.

We've got to get away from these one-off REST APIs and get back to real standards. I know corporations hate standards and love to make their own little one-offs, but it's really holding back technological progress.


You're basically describing Kubernetes and why it has become so popular


Every time the topic of k8s is being discussed on hn, without fail, someone will chime in saying basically "i hate k8s - it would be better if someone did <proceeds to describe all the things that k8s already does>"


K8s is the opposite. It's proprietary, insular, not compatible with anything else, tightly coupled, not layered, not backwards compatible, etc. It has network services, logging, auth, etc, but so does literally every other system in the world, that doesn't make them all identical.

K8s is popular because it's free, has a lot of bells and whistles, and was made by Google. Otherwise nobody would use it. It's basically a larger, slightly less crappy Jenkins.


> K8s is the opposite. It's proprietary, insular, not compatible with anything else

Not accurate, k8s is open source and every major tech company is developing or using it. There are a large number of companies building on top of it as well.

- https://github.com/kubernetes (open source)

- https://landscape.cncf.io/ (huge ecosystem)

- https://k8s.devstats.cncf.io/d/9/companies-table?orgId=1 (count of contributions by company)

> It's basically a larger, slightly less crappy Jenkins.

What? This is not even remotely accurate. Where did you come to this opinion?

K8s replaced mesos/marathon, which was the dominant open source orchestration system at the time. Jenkins is a CI system and can run on k8s. There is also JenkinsX that is trying to be the yaml driven, k8s native platform, but I think it missed the mark. There are better k8s native CI/CD systems like the Argo projects


Proprietary as in, relates only to itself, not compatible with anything other than itself. It's a monolith with no standards. You have to write custom software to make anything work with it. Nothing just works with k8s out of the box, because it provides no loosely coupled standard interface that remains backwards-compatible. It has an API that becomes obsolete every 9 months.

Jenkins and K8s do effectively the same thing. They're both monolithic applications (in K8s' case it's a monolith of microservices, but same difference), both have a manager/worker (formerly master/slave) architecture, both run arbitrary workloads, load secrets, store and retrieve logs from your workload, schedule them to execute on worker nodes that you configure, manage users and permissions, etc, etc. They're functionally extremely similar: distributed centralized systems designed to execute arbitrary tasks. The difference is mostly technical. Ironically, Jenkins is the more flexible of the two, with much more stable interfaces.

People in tech think in terms of cargo-cult imaginary categories, like an "orchestration system" - which isn't a computer science concept. Schedulers are, operating systems are, but "orchestrators" are not. It's a term made up to sell a product, like a configuration management engine (Terraform) or an application workload scheduler (K8s). Very different things that people use the same word ("orchestrator") for because it sounds cooler, but doesn't mean anything.


> It's proprietary, insular, not compatible with anything else, tightly coupled, not layered, not backwards compatible, etc.

Is this a joke post? It's literally the opposite of all those things


> I know corporations hate standards and love to make their own little one-offs, but it's really holding back technological progress.

Corporations create standards all the time, either directly or through standards bodies, which they also fund. You can already push logs with syslog, or transform them with Beats and then push them; you can already attach storage from elsewhere, etc. It's just often a bad idea, for performance and data-movement-cost reasons.

I don't see the major technological progress this holds back, and if you think technological progress is a measure of how much corporations hate standards, then by that logic, based on the last 50 years of utterly insane progress, they must love standards.



Having little experience with k3s, how big of a workload (“nodes” aka virtual kubelets, pods, crds, etc) can you have before saturating the non-HA control plane becomes a concern?


This looks interesting, but I run a bare-metal k8s cluster over WireGuard for independence. I'm not willing to rely on a nonstandard API/platform. If my current provider annoys me, I'm shutting down those nodes the next day. I probably could not do that on FKS.


This is impressive, but also seems to fly in the face of their raison d'etre. I don't even bother with k8s on AWS because it's too complex for even a mid-size operation. Isn't the point of PaaS to obscure complexity?


We're not replacing Fly.io and the Fly Machines API and the Fly Launch stuff in `flyctl` with FKS. FKS is just there for people who want a K8s interface. If you're not interested in K8s at all, you shouldn't touch FKS.


Most of these PaaS are just abstracting their k8s away from you in the end anyway. But they'd never tell you that, they need to be able to switch back to Mesos or whatever the market heads to in 10 years without scaring customers.


Wouldn't it have cost less to enhance the Nomad scheduler rather than move to, and enhance, Kubernetes?

This aside, Fly is in a position to build its own alternative to K8s and Nomad from scratch, so maybe it will?


We absolutely have not moved to K8s. We've just added a feature that lets you run K8s, in a particularly simple configuration, if K8s is what you want. If you weren't already interested in using K8s, you shouldn't touch FKS.

The ordinary way someone would boot up an app on Fly.io is to visit a directory in their filesystem with a Rails or Django or Express app or something, or a Dockerfile, and just type `flyctl launch`. No K8s will be involved in any way. You have to go out of your way to get K8s on Fly.io. :)


They have for their infrastructure, as I understood from this and previous blogs. This is for their user-facing offering. It makes sense if people are using other cloud K8S solutions and want to migrate without rethinking too much of their existing architecture.


I kind of miss the point of this. So if I'm reading this right, fly.io practically only exposes the Pods API, but Kubernetes is really much more than that. I'm not very familiar with any serious company that directly uses Pods API to launch containers, so if their reimplementation of Pods API is just a shim, and they're not going to be able to implement ever-growing set of features in Kubernetes Pod lifecycle/configuration (starting from /logs, /exec, /proxy...) why even bother branding it Kubernetes? Instead they could do what Google does with Cloud Run (https://cloud.run/) which Fly.io is already doing?

I don't know why anyone would be like "here's a container execution platform, let me go ahead and use their fake Pods API instead of their official API".


This is a good comment. More like this!

Right now, the immediate things you'd get out of using FKS are:

* The declarative K8s style of defining an app deployment, and some of the K8s mechanics for reconciling that declaration to what's actually running. We did most of this stuff before when we were backed on Nomad, but less of it now with Fly Machines. If you missed having a centralized orchestrator, here's one.

* Some compatibility with K8s tooling (we spin up a cluster, spit out a kubeconfig file, and you can just go to town with kubectl or whatever).

This is absolutely not going to let you do everything you can possibly do with K8s! Maybe we'll beef it up over time. Maybe not many people will use it, because people who want K8s want the entire K8s Cinematic Universe, and we'll keep it simple.

Mostly: we wrote about it because it was interesting, is all that's happening here.

I think you asked a super good question, and "I don't know, you might be right" is our genuine answer. Are there big things this is missing for you? (Especially if they're low-hanging fruit). I can (sort of) predict how likely we are to do them near term.


I think there’s potential here.

It is Kubernetes since they are running k3s as the control-plane. It’s not just an implementation of the Pod API, it’s an implementation of kubelet which handles logs/exec/etc APIs. The rest of the Kubernetes API is part of the control-plane on k3s.

The only major issue I see is persistent volume support, but persistent volumes in Kubernetes were always a bit flaky and I’ve always preferred to use an externally managed DB or storage solution.


Nice!

Was there an internal project name for this? Fubernetes? f8s? :D


The internal project name was FKS. How could you do better than fks? :)


Do you handle high throughput volumes? I would need this for testing to host a database service at scale.


I definitely want to try this! I never really worked with Kubernetes because it always seemed too complicated for what I needed. After using fly.io for my first real web project in a while, they do seem to provide exactly what I want from a "hoster".


Well, that's a surprise. Glad to see that the team is flexible and willing to change. :)


Apples to oranges, but it has a similar vibe to when Deno added npm compat eventually.


> But, come on: you never took us too seriously about K8s, right?

What a strange way to admit they were wrong.


Is that what we did here? You get that this is just a `flyctl` feature and some Dockerfiles, right? You could have built FKS yourself by forking `flyctl`.


Maybe you were not wrong on technical merits, but your move proves you underestimated the value of the Kubernetes ecosystem, with its lots and lots of standards and trained talent. It's like building a fancy CMS today with a fancy API, but then making it WordPress-compatible when you realize the value of WP is not just in the tech, but also in the ecosystem.


I have so many questions, it is a very good article!

My most important one is this: can I build a distributed k8s cluster with this?

I mean having fly machines in Europe, US and Asia acting as a solid k8s cluster and letting the kube scheduler do its job?

If yes then it is better than what the current cloud offerings are, with their region-based implementation.

My second question is obviously how storage is handled when my workload migrates from the US to Europe: do I still profit from NVMe speeds? Is it replicated synchronously?

Last but not least: does it support RWM semantics?

If all the answers are yes, kudos, you just solved many folk’s problems.

Stellar article, as usual.


Wen custom OS?


I am a current Fly customer (personal and work), and have been happy with the service. Will likely be trying this out. That said, the marketing tone of this final part of the blog:

> More to come! We’re itching to see just how many different ways this bet might pay off. Or: we’ll perish in flames! Either way, it’ll be fun to watch.

is like nails on a chalkboard for me.


Why? If you're using something like Fly you should 100% always have a fallback plan ready. You are gambling, using smaller players to get cheaper services or some other benefit the big players don't offer, in exchange for the very real possibility that one random day they announce 30 days until they permanently shut down with zero migration path.

I don't think it's in poor taste to acknowledge exactly what everyone should understand and be prepared for.


You can communicate those ideas specifically without hiding them beneath a veneer of relatability. The entire post started with this bit:

> But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn’t mean it’s not a great fit for what you’re building. We’ve been clear about that all along, right? Sure we have!

which already starts the post in a bad space for the reader. I have cognitive whiplash from what is intended: "We DON'T like Kubernetes UNTIL WE DO, but then WE MIGHT NOT IN THE FUTURE". Clear meaning is far more appreciated.


We're not trying to sell you so much as we are trying to put you in the headspace we are in building this stuff. Your summary (we DON'T until we DO but MAYBE NOT) does feel pretty true to life!


Interesting! We're mostly not kidding about that. We launched in 2020 with a scheduler that looks a lot like how K8s works†. We ran into scaling issues. Instead of scaling a globally coordinated "eye in the sky" scheduler, like Nomad and K8s offer, we relaxed a constraint ("when you ask to run a job, we'll move heaven and earth to put it somewhere") and wound up with a totally different scheduling model (a market-based system that bids on resources, where requests to place jobs are all effectively fill-or-kill limit orders).

This was a bet. We're bullish about this bet! Even without K8s, having core scheduling be "less reliable" but with a simpler, more responsive interface puts us in a position to do some of the "move heaven and earth" work that K8s and Nomad do in simpler components (like: we can write Elixir code to drive the scheduler).

But it might not pay off! That's what makes it a bet.

(See: comments on this thread asking why we overengineered and wrote our own version of stuff; the expectation that you'd run a platform like Fly.io on standard K8s or Nomad is pretty strong!)


Some people have a cheeky sense of humor. Counterpoint: I'm ok with it.


I made the same bet with Cloudways, which is now owned by DigitalOcean. They filled a gap for me, and I was OK with it if they decided to close shop. I'm glad it didn't go that direction; they are part of a bigger company that was also once a small company but is now publicly traded. You make your bets...


> To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system based on eBPF).

That is quite the opposite of “simple”. That is, in fact, overly complex and over-engineered.


How do you know their own Anycast proxy isn't simpler than K8s CNIs? Building something yourself isn't necessarily overly complex or over engineered. Sometimes building a simple thing yourself is the way to simplicity when the only available options already built are very heavy/overkill or complex


What part of it is overly complex and engineered? Maybe you're right, but it's hard to respond without a better idea of what you think our problem domain was.


To be fair I don't have insights into your projects. But generally speaking in my experience, anytime there is already some standard most people have adopted, rolling your own solution is usually the wrong solution, and typically over engineered.


Reading the features of your CNI, I don't see why Calico wouldn't have worked for your needs. What made you want to DIY?


Can you say more about what you think our needs were? I'm not trying to be evasive, I just want to spare you a 9 paragraph response that doesn't address anything you were thinking.


This is all very common platform/infrastructure stuff for any PaaS. Even more so as multi-tenant k8s (and NICs, and NVMe-oF, etc.) isn't exactly one of the most supported or talked-about things. Lots of secret sauce everywhere, but they have to do it in a lot of scenarios.


Why should one use kubernetes? Or rather, at what point of an apps growth cycle does k8s become appropriate?


Kubernetes is popular because it solves problems at a certain scale. It's not for super small environments because you need a number of infrastructure engineers to manage it. But if you have a few hundred or thousand employees and don't want to write your own orchestration, it makes sense.

That said, it's a questionable design choice when you get to a hyperscale environment, since all the primitives are extremely opinionated and have design and scalability issues with service discovery, networking, and so on. All the controllers had to be rewritten, we had to roll our own deployment system, our own service discovery system, our own load balancing, and so on. But if you reach this level, you're probably making a lot of money and can figure out how to solve your problems.


Kubernetes is not really meant to assist apps themselves. It's a tool for organizations with multiple independent development teams which helps define a single source of truth for whats running where.

Kubernetes is a great fit for even extremely simple applications - assuming you have dozens to keep track of and dozens of developers who want to make changes to them.


> Or rather, at what point of an apps growth cycle does k8s become appropriate?

The real problem is that the point it becomes attractive to have something like Kubernetes is not too far from the point where Kubernetes becomes an overly-complex mess of disparate parts.


I'd say, not in an app's growth cycle, but when an organization wants to manage and scale platforms for itself, on which it runs apps, is when k8s becomes appropriate. In other words, k8s is a platform builder.


If the org is going to go that far with managing and scaling their platform, why would they use fly then?


I ditched k8s and imported an eBPF library into my project. When certain conditions are met I fork logic, and scale back as needed. I haz a v8-like engine built into my project.

Not needing a bloated black box sysadmin framework (aside from Linux itself, which is plenty bloated and over engineered) is a huge time saver. And the eBPF libs have a lot of eyes on them.

IMO sysadmin and devops are done for. They lasted this long to “create jobs”.


This is one of the biggest footguns of a tech company I've seen in the last decade.

Time will tell if embracing the complexity of Kubernetes was a good play for them or not. But, in all honesty, I'm pretty sad to see this happening, although I'm sure they had their reasons.


We don't use k8s and you don't have to either. This is for current and future users who absolutely want k8s. We are a compute provider after all and making it easy to host a great variety of apps is good for our users.


We really didn't expect so many people to read this like "Fly is going all K8s"! It's interesting.


There are a substantial number of frankly ignorant people on HN who see K8s and run for the hills, just because it is something they genuinely do not comprehend or were burned by an inappropriate deployment. The irony is that you aren't switching to K8s; you are simply offering it as a compatibility layer, and still seeing flak. Case in point.


It seems I misunderstood the article. I'm happy to read this in any case, thanks for the follow up!


fly.io employee here, it's basically an adaptor from the Kubernetes world of YAML to the fly.io world of Machines. How could we have framed it better so that it was more clear?


A more in-depth analysis of which parts of the Kubernetes spec are unsupported by this adaptor would be extremely useful in evaluating its viability for any given use case.


If the article was summarized with a TL;DR or phrase along the lines you mentioned "it's basically an adaptor from the Kubernetes world of YAML to the fly.io world of Machines" that would have made it way easier to understand.

In my opinion, the concept is not trivial to grasp by oneself (meaning: it is not trivial to arrive at that conclusion even after reading the article). So being explicit up front and guiding the reader through that concept over the course of the article would have made a big difference.


Kubernetes is really epic and powerful if you actually take the time to understand it from first principles. Unfortunately people don't do this, and individuals without good networking/devops experience roll something half-baked out with a terrible deployment process, a mess of helm charts, etc... and it ends up being hated by everyone.

At FarmLogs (yc 12) we had a pretty righteous gitops (homegrown) kube platform running dozens of microservices. We would not have been able to move as quickly as we did and roll out so many different features without it. This was back when people had just started to adopt it. Mesos was still a contender (lmao). We were polyglot too - python/clojure mixture. Heck, we even ran an ancient climate model called APSIM that was built in c#/mono, required all kinds of ancient fortran dependencies etc and it worked like a charm on kube thanks to containers. We had dedicated internal load balancers behind our VPN for raw access to services and endpoints, like "microservice.internal.farmlogs.com" (this was before istio, fabric networks, all the incredible progress that exists now)

I recall Brendan Burns asking me to write up a blog post for the Kube blog about our success story, but unfortunately was so saddled with product dev work and managing the team that I never found time for it.

I will absolutely adopt K8s again one day (very soon) but you need to know how to harness its capabilities and deploy it correctly. Build your own Heroku that fits your business. Use the Kube API directly. It's really not hard. It gets hard due to all the crap in the ecosystem (helm, yaml files). Hitting API direct means no yaml =)

I am stoked to see Fly offering this.


I have used a number of orchestrator platforms in production, including AWS ECS (Docker + custom orchestration), EKS (Kubernetes) and Google Cloud's Kubernetes.

I've also used Chef, custom RPM packages and classic Unix startup scripts. And I'm probably forgetting some.

And honestly? Kubernetes can be really great. Especially if you:

- Read enough to understand the split between pods/replication controllers/deployments, which is a bit unusual, and the fact that "services" are basically a name lookup system. This split is weird, but it's not that hard to figure out.

- Pay someone for a quality managed Kubernetes.

- Don't get clever with the networking overlays.

I especially like the way that Kubernetes allows me to deploy almost anything with a short YAML file, and the fact that I never need to worry about individual servers at all.

Now, I wouldn't use Kubernetes if I could get away with a "Heroku like" system. But for anything more complicated than that, Kubernetes can be pretty simple and reliable. Certainly I'd take Kubernetes over a really complex Terraform setup.


There is absolutely (still!!!!) room in the market for a PaaS built on top of Kube that is actually good: a hybrid of Heroku+Kube. More convention, less of the rope to hang yourself with, while still enabling advanced use cases with full control.

Even docker compose is too annoying for what most people need.

Most of our use cases are so simple. I need a (black box) container that exposes a TCP port. I want N of them. I want them load-balanced with a friendly name. I want them load balanced behind a friendly name with auth in front of it. I want centralized logs, based on these names. I want centralized stats, based on these names (not xyz-pod-232903209284390-dev but xyz aggregated). I want auto deploys based on a github repo. I want my releases tagged with the git short hash.
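
For contrast, here is roughly what that simple common case still costs in stock Kubernetes, a Deployment plus a Service, with all names illustrative:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: xyz
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: xyz
      template:
        metadata:
          labels:
            app: xyz
        spec:
          containers:
            - name: xyz
              image: ghcr.io/example/xyz:abc1234   # git short hash as the tag
              ports:
                - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: xyz
    spec:
      selector:
        app: xyz
      ports:
        - port: 80
          targetPort: 8080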

When someone cracks the nut of heroku+kube they will become the next billionares. This is why I think it is wise to try and enter this space, as Fly is doing.

The new stuff Microsoft has announced (named Radius, not to be confused with the RADIUS network auth protocol) is the closest thing to this sort of solution I have been imagining for years: https://azure.microsoft.com/en-us/blog/the-microsoft-azure-i...

I want N http/tcp/php/python/node/insert-tool-here API or web servers. I want a best-practices RabbitMQ deployment, a best-practices Redis deployment, and a best-practices PostgreSQL deployment. For Redis, I do not care about state, just make it work. For Rabbit, I want 3+ nodes for HA. For PSQL, I do care about state so please use <insert EBS volume here> and do backups.

This is the same shit we have been reinventing for years over and over again and the recipes are all the same now. Radius is the best (at least on paper) attempt at unifying all this stuff.

I've been beating this drum for over 10 years now, though; I keep thinking someone will figure it out, and no one has. Maybe it's time to bite the bullet and just do the thing.


I'm guessing this is one of the areas where sticking to a vision loses out over winning the most business in the short term.



