
Hi! Post author here! I agree that it's really important to be careful of "shiny new tool" syndrome -- one of my primary goals in writing this post was to show that operating Kubernetes in production is complicated and to encourage people to think carefully before introducing a Kubernetes cluster into their infrastructure.

As you say -- I think "we want to run some cron jobs" isn't a good enough reason by itself to use Kubernetes (though it might be if you’re using a managed Kubernetes cluster where someone else handles the cluster operations). A goal for this project was to prove to ourselves that we actually could run production code in Kubernetes, to learn about how much work operating Kubernetes actually is, and to lay the groundwork for moving more things to Kubernetes in the future.

In my mind, a huge advantage of Kubernetes is that Kubernetes' code is very readable and they're great at accepting contributions. In the past when we've run into performance problems with Jenkins (we also use jenkins-job-builder to manage our 1k node Jenkins cluster), they've been extremely difficult to debug and it's hard to get visibility into what's going on inside Jenkins. I find Kubernetes’ code a lot easier to read, it's fairly easy to monitor the internals, and the core components have pprof included by default if you want to get profiling information out. Being able to easily fix bugs in Kubernetes and get the patches merged upstream has been a big deal for us.




> A goal for this project was to prove to ourselves that we actually could run production code in Kubernetes, to learn about how much work operating Kubernetes actually is, and to lay the groundwork for moving more things to Kubernetes in the future.

Why wasn't the final sentence "and to re-evaluate if moving forward was even a good idea?"

Because I get nervous every time someone is relying on their patches to be included upstream. Or they need to dive in to the internals of something repeatedly. That screams "not production ready" to me.

After reading the post, Kubernetes did not sound at all like a slam dunk in terms of a solution, let alone a foundation for more mission critical infrastructure. The Jenkins solution offered by the parent sounds more reasonable, even with the objections you list.

Edit: Take my comments with a grain of salt, but from my internet armchair vantage point it does sound like Kubernetes was chosen first and rationalized second. (Though I very much appreciated the thoroughness with which you went about learning the technology.)


Hello! I work at Stripe and helped with some aspects of the Kubernetes cron stuff -- maybe these answers can be helpful.

  > Why wasn't the final sentence "and to re-evaluate if
  > moving forward was even a good idea?"
I think that's sort of implied -- complex technical projects have a risk of unexpected roadblocks, and it's important that "stop and roll back" always be on the list of options. Never burn your ships.

We invested a (proportionally) large amount of engineering effort to ensure we had the ability to move the whole shebang back to Chronos ~immediately. As noted in the article, we exercised this rollback feature several times when particular cronjobs deviated from expected behavior when run in Kubernetes.

  > Because I get nervous every time someone is relying on
  > their patches to be included upstream. Or they need to
  > dive in to the internals of something repeatedly. That
  > screams "not production ready" to me.
This is the same basic model as distro-specific patches to the Linux kernel.

Every engineering organization reaches the point where they want more features than are available in an existing platform. The most practical solutions for this are to launch a new platform ("Not Invented Here"), or contribute code upstream. The first option can provide better short-term outcomes, but is usually inferior on multi-year timescales.

Consider that with a mature build infrastructure, internal builds are actually the latest stable release plus cherry-picked patches. This provides the best of all worlds -- an upstream foundation, with bug fixes on our schedule, and an eventually-consistent contribution to the community.


Julia is a visibility pro. When things scale, you need to be able to look inside the thing. If that's tough, it's :grimacing: for probably hundreds of developers. What a waste! /irony



