
> The only way to sanely run Kubernetes is by giving teams/orgs their own clusters

Funny, we're running it sanely without doing that. We've separated our clusters based on use-case - delivery vs. back-end, aiming towards the "cell-based" architecture.

> Managed Kubernetes (EKS on AWS, GKE on Google) is very much in its infancy and doesn’t solve most of the challenges with owning/operating Kubernetes (if anything it makes them more difficult at this time)

...some details on the challenges they don't solve, or indeed make more difficult, would be good.

But yep, K8s is complex. So, to paraphrase `import this`, you only want to use it when you have sufficiently complicated systems that the complexity is worth it.

It will catch up with you. I was at one shop with an 11-person platform team. The shop moved really fast, and even with 500 employees, they were able to move from OpenStack to DC/OS in 2-3 months. (We had CoreOS running on OpenStack, but fully migrated over to DC/OS. Jenkins -> GitLab also happened very rapidly; really good engineers.)

At my current shop, we struggle to maintain k8s clusters with an 8-person team. We inherited the debt of a previous team that had deployed k8s, and their old legacy stuff was full of dependency rot. We have new clusters, and we update them regularly, but it's taken nearly half a year so far and we don't have everything moved over.

You do need good teams to move fast, and good leaders to prioritize minimizing tech debt.

We've used a GitOps model (using Flux[1]) with a reviewer team made of people across our dev teams (and the sysops, natch) to ensure that people aren't just kubectl-ing or helm installing random crap. We also put about 2 weeks of effort into getting RBAC right, so that everyone has read access to cluster resources, but only a subset (generally 1 or 2 per team) have what we call "k8ops" roles. Those are the same people reviewing pull requests in the Flux repo, and the norm is to use the read-only role as default. The only time I've recently had to use my k8ops role was to manually scale an experimental app that was spamming the logs down to 0 replicas so the devs responsible could sort it in the morning.
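For anyone wanting a picture of that RBAC split, here's a rough sketch in Python that builds two ClusterRole manifests as plain dicts. The role name "read-only", the exact verbs, and the resources granted to "k8ops" are my guesses for illustration, not our actual config, and the `allows` helper is a simplified stand-in for the real authorizer (it only handles exact names and the `*` wildcard).

```python
import json

READ_VERBS = ["get", "list", "watch"]


def cluster_role(name, rules):
    """Build a ClusterRole manifest as a plain dict."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": rules,
    }


# Default role for everyone: read access to cluster resources.
# (Hypothetical name; note that "*" would also cover Secrets.)
read_only = cluster_role(
    "read-only",
    [{"apiGroups": ["*"], "resources": ["*"], "verbs": READ_VERBS}],
)

# Elevated role for the 1 or 2 reviewers per team.
# Assumption: they can additionally update/patch Deployments and their scale.
k8ops = cluster_role(
    "k8ops",
    [
        {"apiGroups": ["*"], "resources": ["*"], "verbs": READ_VERBS},
        {
            "apiGroups": ["apps"],
            "resources": ["deployments", "deployments/scale"],
            "verbs": ["update", "patch"],
        },
    ],
)


def _matches(patterns, value):
    # Simplified matching: exact name or the "*" wildcard only.
    return "*" in patterns or value in patterns


def allows(role, api_group, resource, verb):
    """Return True if any rule in the role grants verb on resource."""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in role["rules"]
    )


if __name__ == "__main__":
    # Dump the manifests as JSON, which kubectl also accepts.
    print(json.dumps([read_only, k8ops], indent=2))
```

With this split, `allows(read_only, "apps", "deployments/scale", "patch")` is false while the same check against `k8ops` is true, which is roughly the "scale the noisy app to 0 replicas" case. In a GitOps setup the manifests themselves would live in the reviewed repo rather than being applied by hand.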

I think the way we've approached it achieves the same goal as just giving each team their own cluster to avoid them messing up other teams.

[1]: https://docs.fluxcd.io/en/1.19.0/
