As you say -- I think "we want to run some cron jobs" isn't a good enough reason by itself to use Kubernetes (though it might be a good enough reason if you’re using a managed Kubernetes cluster where someone else handles the cluster operations). A goal for this project was to prove to ourselves that we actually could run production code in Kubernetes, to learn how much work operating Kubernetes actually is, and to lay the groundwork for moving more things to Kubernetes in the future.
In my mind, a huge advantage of Kubernetes is that Kubernetes' code is very readable and they're great at accepting contributions. In the past when we've run into performance problems with Jenkins (we also use jenkins-job-builder to manage our 1k-node Jenkins cluster), they've been extremely difficult to debug, and it's hard to get visibility into what's going on inside Jenkins. I find Kubernetes' code a lot easier to read. It's also fairly easy to monitor the internals, and the core components have pprof included by default if you want to get profiling information out. Being able to easily fix bugs in Kubernetes and get the patches merged upstream has been a big deal for us.
Why wasn't the final sentence "and to re-evaluate if moving forward was even a good idea?"
Because I get nervous every time someone is relying on their patches to be included upstream. Or they need to dive in to the internals of something repeatedly. That screams "not production ready" to me.
After reading the post, Kubernetes did not sound at all like a slam-dunk solution, let alone a foundation for more mission-critical infrastructure. The Jenkins solution offered by the parent sounds more reasonable, even with the objections you list.
Edit: Take my comments with a grain of salt, but from my internet armchair vantage point it does sound like Kubernetes was chosen first and rationalized second. (Though I very much appreciated the thoroughness with which you went about learning the technology.)
> Why wasn't the final sentence "and to re-evaluate if
> moving forward was even a good idea?"
We invested a (proportionally) large amount of engineering effort to ensure we had the ability to move the whole shebang back to Chronos ~immediately. As noted in the article, we exercised this rollback feature several times when particular cronjobs deviated from expected behavior when run in Kubernetes.
> Because I get nervous every time someone is relying on
> their patches to be included upstream. Or they need to
> dive in to the internals of something repeatedly. That
> screams "not production ready" to me.
Every engineering organization reaches the point where they want more features than are available in an existing platform. The most practical solutions for this are to launch a new platform ("Not Invented Here"), or contribute code upstream. The first option can provide better short-term outcomes, but is usually inferior on multi-year timescales.
Consider that with a mature build infrastructure, internal builds are actually the latest stable release plus cherry-picked patches. This provides the best of all worlds -- an upstream foundation, with bug fixes on our schedule, and an eventually-consistent contribution to the community.