
It will catch up with you. I was at one shop with an 11-person team dedicated to platforms. The shop moved really fast: even with 500 employees, they were able to move from OpenStack to DC/OS in 2-3 months. (We had CoreOS running on OpenStack, but fully migrated over to DC/OS. Jenkins -> GitLab also happened very rapidly; really good engineers.)

At my current shop, we struggle to maintain k8s clusters with an 8-person team. We inherited the debt of a previous team that had deployed k8s, and their old legacy stuff was full of dependency rot. We have new clusters, and we update them regularly, but it's taken nearly half a year so far and we still don't have everything moved over.

You do need good teams to move fast, and good leaders to prioritize minimizing tech debt.

We've used a GitOps model (using Flux[1]) with a reviewer team made up of people from across our dev teams (and the sysops, natch) to ensure that people aren't just kubectl-ing or helm installing random crap. We also put about 2 weeks of effort into getting RBAC right: everyone has read access to cluster resources, but only a subset (generally 1 or 2 people per team) have what we call "k8ops" roles. Those are the same people who review pull requests in the Flux repo, and the norm is to use the read-only role by default. The only time I've recently had to use my k8ops role was to manually scale an experimental app that was spamming the logs down to 0 replicas, so the devs responsible could sort it out in the morning.
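For anyone wanting to copy the split described above (read-only for everyone, elevated "k8ops" access for a few people per team), here's a rough sketch of what the bindings could look like. It's a guess, not our actual config: the group names `all-developers` and `k8ops` are hypothetical, and it maps "read-only" to Kubernetes' built-in `view` ClusterRole and "k8ops" to the built-in `edit` ClusterRole, whereas a real setup would likely use custom roles. The script just emits a JSON manifest (which `kubectl apply -f` accepts); in a GitOps setup you'd commit the output to the Flux repo so it goes through the same review as everything else.

```python
import json

# Hypothetical group names: the comment doesn't say how users/groups are named.
READ_ONLY_GROUP = "all-developers"
K8OPS_GROUP = "k8ops"


def cluster_role_binding(name: str, role: str, group: str) -> dict:
    """Bind an existing ClusterRole to a group across the whole cluster."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role,
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "Group",
                "name": group,
            }
        ],
    }


def rbac_manifests() -> list:
    return [
        # Everyone: built-in read-only "view" role (note: it excludes Secrets).
        cluster_role_binding("everyone-read-only", "view", READ_ONLY_GROUP),
        # The 1-2 people per team: built-in "edit" role, standing in for "k8ops".
        cluster_role_binding("k8ops-edit", "edit", K8OPS_GROUP),
    ]


if __name__ == "__main__":
    # A v1 List lets a single file carry several objects for kubectl apply -f.
    manifest = {"apiVersion": "v1", "kind": "List", "items": rbac_manifests()}
    print(json.dumps(manifest, indent=2))
```

Binding to groups rather than individual users means adding or removing someone's k8ops access is a change to your identity provider, not a PR against the cluster config.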

I think the way we've approached it achieves the same goal as just giving each team their own cluster to avoid them messing up other teams.

[1]: https://docs.fluxcd.io/en/1.19.0/
