One more good experience.
I created a cluster of dedicated servers (64 cores, 6 TB of SSD storage, 256 GB of RAM, and 1 GPU) using Rancher, for about 250 euros/month. This would cost at least 2k in a cloud such as AWS. There is a post about how I handled persistent storage here (https://medium.com/@bratao/state-of-persistent-storage-in-k8...)
It really transformed my company's DevOps. I'm VERY happy. If you can, use Rancher. It is just perfect!
We're in the same camp with a cluster ~2x as large for Squawk[1], and it would cost us many multiples in the cloud (excluding our TURN relays, which aren't on k8s). However, the one killer feature that the cloud still has over self-hosted is the state layer. There is nothing that comes close to the turnkey, highly available, point-in-time-recoverable database offerings from the cloud providers. We're running Spilo/Patroni Helm charts, and we've really tried to break our setup chaos-monkey style. But I'll admit I'd sleep better leaving it in Amazon's hands (fortunately, with all the money we save, we have multiple synchronous replicas and ship log files every 10 seconds).
_EDIT_ I've just read your blog post. We went the other direction: we use the local storage provisioner to create PVCs directly on host storage and push the replication to the application layer. We run Postgres and Redis (KeyDB) with 3 replicas each, with at least one synchronous replica (where supported), and ship Postgres WAL files to S3 every 10 seconds.
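For anyone wondering what "PVCs directly on host storage" looks like in practice, here is a minimal sketch of the kind of PersistentVolume object a local volume ends up as. The helper name `local_pv`, the node name, the disk path, the size, and the storage class name are all made up for illustration; the actual provisioner setup from the comment above isn't shown here. The key part is the node affinity, which pins the volume (and therefore the pod that claims it) to one host, which is why replication has to move up to the application layer.

```python
import json


def local_pv(name, node, path, size, storage_class="local-storage"):
    """Build a PersistentVolume manifest backed by a directory on one node.

    All argument values are hypothetical; adjust them to your own hosts.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "volumeMode": "Filesystem",
            "accessModes": ["ReadWriteOnce"],
            # Keep the data on disk if the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "local": {"path": path},
            # A local volume must say which node physically holds the disk.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "kubernetes.io/hostname",
                                    "operator": "In",
                                    "values": [node],
                                }
                            ]
                        }
                    ]
                }
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output can be applied as-is.
    print(json.dumps(local_pv("pg-data-0", "node-1", "/mnt/disks/ssd0", "500Gi"), indent=2))
```

Because the volume cannot follow a pod to another machine, losing the node means losing that replica's data, so the database's own replication has to cover it.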
You can also try databases that are natively distributed, with replication and scaling built in. If you need SQL you have many "NewSQL" choices like CockroachDB, Yugabyte, Vitess, TiDB, and others.
The TURN relays aren't on k8s because we needed geographic distribution so that we don't end up hairpinning our users, and they only run a single service, so the value prop is much lower. We use Route 53 to do GeoDNS across a number of cheap instances around the world (which is also nice: it lets you pick regions with cheap bandwidth but good latency to major metro areas). We currently have TURN relays in Las Vegas, New York, and Amsterdam, and that gives us pretty good coverage (sorry Asia...you're just so damn expensive!).
But all of our APIs sit in one k8s cluster across two datacenters (Hetzner, with whom we couldn't be happier).
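To make the hairpinning point concrete, here is a rough sketch of the idea behind sending each user to a nearby relay. This is not how Route 53 decides anything (it routes on its own geolocation or latency data); it is just a great-circle distance comparison against the three relay cities mentioned above, with approximate city coordinates that I filled in myself.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate (latitude, longitude) of the relay cities; values are assumptions.
RELAYS = {
    "Las Vegas": (36.17, -115.14),
    "New York": (40.71, -74.01),
    "Amsterdam": (52.37, 4.90),
}

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_relay(user):
    """Return the name of the relay geographically closest to the user."""
    return min(RELAYS, key=lambda name: distance_km(user, RELAYS[name]))


if __name__ == "__main__":
    print(nearest_relay((48.86, 2.35)))     # a user in Paris
    print(nearest_relay((34.05, -118.24)))  # a user in Los Angeles
```

Geographic distance is only a stand-in for network latency, so a real setup would lean on the DNS provider's own routing policies rather than a table like this.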
Really interested in hosting at Hetzner, as their prices are fantastic by comparison to AWS, Azure & GCP.
I'm particularly interested in what an HA Postgres setup might look like. Assuming you are running some kind of database (whether Postgres or otherwise), what are you doing for persistent storage? Are you using Hetzner's cloud block storage volumes? What is performance like?
Interesting! Is that a single K8s control plane across one cluster? We've gone with fully isolated clusters across 2 data centers to protect against a network isolation incident between them causing a split brain/borking etcd.
Yes, the control plane is only in one of the data centers. The other only runs admin services like offsite backups, our development infra (GitLab, etc.) and CI/CD.
We could definitely do two clusters and probably should, but the secondary data center has so few services that it wasn't really worth the extra work.
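A quick sketch of the quorum arithmetic behind the etcd concern raised above. etcd only accepts writes while a strict majority of its members can reach each other, so if the link between two data centers is cut, at most one side keeps a working control plane. The function names and the member counts in the example are purely illustrative, not anyone's actual layout.

```python
def quorum(members):
    """Smallest number of etcd members that forms a strict majority."""
    return members // 2 + 1


def sides_with_quorum(dc_a, dc_b):
    """Which data center can still commit writes if the link between them drops.

    dc_a and dc_b are the number of etcd members placed in each data center.
    """
    need = quorum(dc_a + dc_b)
    return {"dc_a": dc_a >= need, "dc_b": dc_b >= need}


if __name__ == "__main__":
    for layout in [(3, 0), (2, 1), (2, 2), (3, 2)]:
        print(layout, sides_with_quorum(*layout))
```

With every member in one data center, as in the single-cluster setup described above, a partition leaves that side fully functional and the other side without a control plane. An even split such as 2+2 is the worst case, since neither side has a majority.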
At first look at the numbers in the colourful table near the end, Piraeus/Linstor/DRBD seems 10x faster than Longhorn 0.8. The article goes into great depth on the (a)synchronous replication options of Piraeus, but doesn't mention that Longhorn always does synchronous replication. I wonder why?
I should have made it clearer that Longhorn is sync by default. Linstor is also synchronous by default, but you can mess with it to make it async in some situations (in reality, you allow it to be out of sync).
I'm really rooting for Longhorn. I'm a sucker for GUIs. But in my tests the performance is not there yet.
However, they opened a new epic ticket to focus on performance, and hopefully they will keep improving Longhorn after the acquisition.
You mentioned somewhere that your servers were hosted with Hetzner - are you using their "cloud volume" block storage? Really curious to know what performance is like with this cloud attached SSD storage!