
One more good experience: I created a cluster of dedicated servers (64 cores, 6 TB of SSD storage, 256 GB of RAM, and 1 GPU) using Rancher, for about 250 euros/month. This would cost at least 2k in a cloud such as AWS. There is a post about how I dealt with persistent storage here (https://medium.com/@bratao/state-of-persistent-storage-in-k8...)

It really transformed my company's DevOps. I'm VERY happy. If you can, use Rancher. It is just perfect!




We're in the same camp with a cluster ~2x as large for Squawk[1], and it would cost us many multiples in the cloud (excluding our TURN relays, which aren't on k8s). However, the one killer feature that the cloud still has over self-hosted is the state layer. There is nothing that comes close to the turnkey, highly available, point-in-time-recoverable database offerings from the cloud providers. We're running the Spilo/Patroni helm charts, and we've really tried to break our setup, chaos-monkey style. But I'll admit I'd sleep better leaving it in Amazon's hands (fortunately, with all the money we save, we have multiple synchronous replicas and ship log files every 10 seconds).

[1] Shameless plug, Squawk: Walkie Talkie for Teams - https://www.squawk.to
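
For reference, the knobs that drive this in a Patroni-managed cluster look roughly like the sketch below. These are Patroni's documented DCS settings; the exact Spilo chart wiring and the archive_command are assumptions on my part:

    # patroni.yml excerpt (sketch): synchronous commit to at least one
    # replica, plus a forced WAL segment switch every 10 seconds so
    # archives stay fresh even on a quiet database.
    bootstrap:
      dcs:
        synchronous_mode: true        # commits wait for a sync replica
        postgresql:
          parameters:
            archive_mode: "on"
            archive_timeout: 10s      # ship a WAL file at least every 10s
            # Spilo normally wires this to WAL-E/WAL-G; command illustrative
            archive_command: "wal-g wal-push %p"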

_EDIT_ I've just read your blog post. We went the other direction and used the local storage provisioner to create PVCs directly on host storage, pushing replication to the application layer. We run Postgres and Redis (KeyDB) with 3 replicas each, with at least one in synchronous replication (where supported), and ship Postgres WAL logs to S3 every 10 seconds.
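
For anyone curious, the local static provisioner essentially manages PersistentVolumes of the `local` type, each pinned to its node. A minimal sketch (names, sizes, and paths are made up):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pg-data-node1              # hypothetical name
    spec:
      capacity:
        storage: 500Gi
      accessModes: ["ReadWriteOnce"]
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage  # no-provisioner StorageClass
      local:
        path: /mnt/disks/ssd1          # raw host SSD mount
      nodeAffinity:                    # consumers get scheduled onto this node
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values: ["node1"]

If the node dies, the data on that PV dies with it, which is exactly why the replication has to live in the application layer as described above.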


You can also try databases that are natively distributed, with replication and scaling built in. If you need SQL, you have many "NewSQL" choices: CockroachDB, Yugabyte, Vitess, TiDB, and others.


Why did you keep your TURN relays out of k8s?


Because we needed geographic distribution so that we don't end up hairpinning our users, and they only run a single service, so the value prop is much lower. We use Route 53 to do geoDNS across a number of cheap instances around the world (which is also nice; it lets you pick regions with cheap bandwidth but good latency to major metro areas). We currently have TURN relays in Las Vegas, New York, and Amsterdam, which gives us pretty good coverage (sorry Asia... you're just so damn expensive!).
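
For the curious, one way to express this in Route 53 is a geolocation record per relay region. A sketch in CloudFormation syntax (the zone, identifiers, and IPs are made up):

    Resources:
      TurnEurope:
        Type: AWS::Route53::RecordSet
        Properties:
          HostedZoneName: example.com.   # hypothetical zone
          Name: turn.example.com.
          Type: A
          TTL: "60"
          SetIdentifier: turn-eu         # one record per relay region
          GeoLocation:
            ContinentCode: EU            # Europeans resolve to the EU relay
          ResourceRecords:
            - 203.0.113.10               # example relay IP

You'd repeat this per region (plus a default "*" record as a fallback), and Route 53 answers each resolver with the closest matching record.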

But all of our APIs sit in one k8s cluster across two datacenters (Hetzner, with whom we couldn't be happier).


Really interested in hosting at Hetzner, as their prices are fantastic compared to AWS, Azure & GCP.

I'm particularly interested in what an HA Postgres setup might look like. Assuming you are running some kind of database (whether Postgres or otherwise), what are you doing for persistent storage? Are you using Hetzner's cloud block storage volumes? What is performance like?


Interesting! Is that a single k8s control plane across one cluster? We've gone with fully isolated clusters across 2 data centers to protect against a network isolation incident between them causing a split brain / borking etcd.


Yes, the control plane is only in one of the data centers. The other only runs admin services like offsite backups, our development infra (GitLab, etc.), and CI/CD.

We could definitely do two clusters and probably should, but the secondary data center has so few services that it wasn't really worth the extra work.


Oh cool, interesting. Thanks for the overview


Longhorn synchronously replicates the volume across multiple replicas stored on multiple nodes https://github.com/longhorn/longhorn
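
The replica count is just a StorageClass parameter. A sketch using Longhorn's CSI provisioner (the class name is made up):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: longhorn-3-replicas        # hypothetical name
    provisioner: driver.longhorn.io    # Longhorn's CSI driver
    parameters:
      numberOfReplicas: "3"            # synchronous copies on 3 nodes
      staleReplicaTimeout: "2880"      # minutes before a failed replica is cleaned up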

At first glance, the numbers in the colourful table near the end suggest Piraeus/Linstor/DRBD is 10x faster than Longhorn 0.8. The article goes into great depth on the (a)synchronous replication options of Piraeus, but doesn't mention that Longhorn always does synchronous replication. I wonder why?

With SUSE being all-in on btrfs and Ceph, I wonder if they will allow Yasker (https://github.com/longhorn/longhorn/graphs/contributors) to continue developing it. At KubeCon EU & US 2019 (https://youtu.be/hvVnfZf9V6o?t=1659), Sheng Yang explained how he tried to make Longhorn a first-class citizen of Kubernetes storage.


Longhorn serves a very different use case than btrfs and Ceph, so continued investment makes sense.

Disclaimer: I'm the Rancher Labs CTO


DRBD is really, really hard to use (Ceph as well, though).

Also, performance is extremely dependent on many factors which are not always a given, i.e. drives, network, etc.

For some use cases even a distributed FS, like GlusterFS, is enough.


I should have made it clearer that Longhorn is synchronous by default. Linstor is also synchronous by default, but you can mess with it to make it async in some situations (in reality, you allow it to be out of sync).

I'm really rooting for Longhorn. I'm a sucker for GUIs. But in my tests the performance is not there yet.

However, they opened a new epic ticket to focus on performance, and hopefully they will keep improving Longhorn after the acquisition.


You mentioned somewhere that your servers were hosted with Hetzner - are you using their "cloud volume" block storage? Really curious to know what performance is like with this cloud attached SSD storage!


That's a great depiction of the power of one person with the proper knowledge!

Get a little bit of money (in comparison to all those shiny great things), build it, wing it, and provide a huge benefit :)


Agreed, Rancher rocks.


Where did you rent dedicated servers?


Hetzner



