
Fix Two Linux Kernel Bugs While Testing TiDB Operator in K8s - WTTT
https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s/
======
gautamdivgi
The first bug hits close to home. I spent a good 2-3 months in 2018
researching this and providing mitigations to our ops teams. We also hit
another bug with older k8s versions: a cgroup leak where pods in a crash
loop would leave cgroups hanging around.
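A rough way to watch for that kind of leak on a cgroup v1 node is to track the num_cgroups column for the memory controller in /proc/cgroups. If the count keeps climbing while the number of pods stays flat, cgroups are probably being left behind. That is a hint, not proof. Below is a minimal Python sketch; the helper names are hypothetical, and it assumes the standard /proc/cgroups layout of subsys_name, hierarchy, num_cgroups, enabled.

```python
from typing import Optional


def memory_cgroup_count(proc_cgroups_text: str) -> Optional[int]:
    """Return num_cgroups for the memory controller, or None if absent.

    /proc/cgroups starts with a '#' header line, followed by one line per
    controller: subsys_name, hierarchy, num_cgroups, enabled.
    """
    for line in proc_cgroups_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#subsys_name ...' header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "memory":
            return int(fields[2])
    return None


def read_memory_cgroup_count(path: str = "/proc/cgroups") -> Optional[int]:
    """Read the live file on a Linux host (hypothetical helper)."""
    with open(path) as f:
        return memory_cgroup_count(f.read())


if __name__ == "__main__":
    # Illustrative sample text, not real measurements.
    sample = (
        "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
        "cpuset\t5\t12\t1\n"
        "memory\t9\t1843\t1\n"
    )
    print(memory_cgroup_count(sample))  # 1843
```

To catch a slow leak like the crash-loop one, you would sample read_memory_cgroup_count() periodically, for example from a cron job, and compare it against the node's pod count.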

We ended up doing exactly what the team here did: disable kmem accounting.
But we had to rebuild kubelet/runc after making code changes to disable it.

I created some very basic tests to check which kernels would not hit the kmem
accounting error. I tested a few kernels more or less at random (RHEL/CentOS
3.10.x; Ubuntu 4.4 and 4.15). The 4.15 kernel was the one where my test didn't
reproduce the issue, but I've still seen sporadic occurrences on 4.15 kernels
as well.

That said, I think you can get a no-kmem build of the k8s/runc stack for
RHEL/CentOS kernels. It's what we started using earlier this year, and we've
been happy with it so far.

