Unfortunately, this reads like a 100 foot marketing document for Sysdig, not actual container security best practices.
If you want to look at actual container security best practices, check out CIS [1] & DISA [2], and NSA [3], with some theory at NIST [4], as well as the documentation from your preferred cloud vendors, be it AWS, Azure, GCP, or other, as well as the specific container security practices.
(disclaimer: I know the company and some of the early founders)
I wish all "marketing documents" were this detailed. In other words, I disagree with you. I've read the blog post and it doesn't seem too high level. The resources you indicate are nice, but a 60-page Kubernetes hardening guide by the US Government is perhaps one level deeper than a blog post on the internet.
Clearly sounds like a marketing document. Cites a survey from "Cloud Native Computing Foundation" and claims "92 percent of companies are using containers in production" + "Thus, Kubernetes, Openshift, and other container technologies are present everywhere" while ignoring the fact that the survey is heavily biased towards companies that run containers, of course.
Their own services and blog posts are also referenced in almost every section of the post, even when better external resources exist. Zero competitors are listed in any section. Doesn't sound very neutral to me.
In this sense, yes, I agree with you. But a "100 foot marketing document" offers a certain negative connotation that reads like "no content, just fluff"; the content is there, and yes, it is biased, and yes, no competitors are mentioned.
I also agree with you on the fact that a "smarter" kind of content marketing would go beyond these limitations; it would mention competitors, or alternatives; and it wouldn't highlight its company's own services too much.
If someone from Sysdig is reading, these are suggestions for you, guys.
I read the entire article thinking it would be a shill, but I saw little evidence that it was. In fact, I got to the end and I still don't know what the hell Sysdig is.
If anything, Sysdig fucking sucked at marketing this one, if it was supposed to be a puff piece for the product.
Container security should start with image security. Instead of runtime security stuff, you can statically analyze images before they are running somewhere and find what known vulnerabilities might exist in them. This is also easier to scale.
One of the hardest things to get any dev organization to start taking seriously is supply chain security. That first scan which lights up like a Christmas Tree is always such a daunting obstacle to get over. It's a shame because it is probably the highest value SDLC practice that many are not doing.
Yet, the base Debian image _does_ light up like a Christmas tree when you run a Snyk scan. Mostly with issues that are either incorrect (the version number causes a flag but the fix is backported) or considered low priority and thus WONTFIX by upstream.
If you’re writing software against, say, dotnet3 (which has a docker image based on Debian) then you’re basically noised out.
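One way to cut that noise down, as a rough sketch: post-process the scanner output and set aside anything that has no upstream fix or falls below a severity threshold. The `Finding` shape and its field names are hypothetical and do not match any particular scanner's schema, and this does nothing about backport false positives, which need the distro's own security tracker data to resolve.

```python
from dataclasses import dataclass

# Hypothetical finding shape; real scanners each have their own JSON schema.
@dataclass(frozen=True)
class Finding:
    id: str
    severity: str        # "low", "medium", "high" or "critical"
    fix_available: bool  # False for WONTFIX / no upstream patch

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def triage(findings, min_severity="high"):
    """Split findings into (actionable, noise).

    A finding is actionable only if a fix exists and its severity is at
    or above min_severity; everything else is parked as noise for review.
    """
    threshold = SEVERITY_RANK[min_severity]
    actionable, noise = [], []
    for finding in findings:
        if finding.fix_available and SEVERITY_RANK[finding.severity] >= threshold:
            actionable.append(finding)
        else:
            noise.append(finding)
    return actionable, noise
```

The noise list is still worth a periodic look; the point is only to stop it from burying the handful of findings someone can actually act on.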
Even if it is a marketing document, it's still got incredibly valuable information. Almost nobody is going to read a government specification, but they will probably read this page.
>Almost nobody is going to read a government specification
Why is that?
Every company I have worked security in, including where I am at now, regularly reads government guidance. Especially NIST guidance, which is referenced all over the world.
It's funny that you use the term "actual" to describe the guidance from the US government. They don't really know what they are talking about. Their release process for guidance takes so long that by the time it's released, it's out of date. This is absolutely true for k8s guidance. Last I checked, they were suggesting everyone use "Docker Enterprise" in their guidance long after it no longer existed (are vendors supposed to magically know Mirantis is now an option?)
I always have to laugh a little bit when someone says NIST, NSA, etc. just "don't really know what they are talking about".
They aren't perfect (you know, being humans and all), and can sometimes be slow in disseminating information to the public, but you're out to lunch if you think they "don't really know" anything.
I'm scoping my statement to container security & orchestration best practices, not their competency as a whole. I know the specifics of their guidance due to the industry I work in, so I feel comfortable speaking generally about specific guidance in regards to specific technology.
>I'm scoping my statement to container security & orchestration best practices, not their competency as a whole.
vs.
>It's funny that you use the term "actual" to describe the guidance from the US government. They don't really know what they are talking about.
Perhaps you can understand why I thought you were speaking generally, when your comment is written generally. I can't read minds to figure out what you're silently scoping your comment to.
But if saying I laughed and why I laughed is overly defensive, my apologies. I'm not sure how else I would tell someone I find their comment funny.
In a similar vein, a fairly mid-level dev was recently trying to convince me that "Rob Pike is a clueless idiot who knows nothing about language design".
It really wasn’t more nuanced than that - I’m pretty much quoting verbatim. The argument stemmed from the lack of generics in Go, which apparently was a sign of incompetence.
My general point is that there are a lot of people who see the world in binary - genius or idiot, perfect or incompetent.
It seems that Sysdig doesn't have a blog post about making containers immutable and read-only, nor offer a service that enables that, so probably not worth mentioning for them.
Yep I've always had read only root filesystems down as a good control and one that's often not too tough to implement.
Another favourite of mine would be using multi-stage builds and minimal base images in production (FROM scratch, where possible). Having limited or no tooling in the running container makes an attacker's life trickier for sure.
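For anyone wanting to audit for the read-only root filesystem control, here is a minimal sketch that inspects one container entry from a pod spec (as a plain dict, e.g. loaded from YAML or JSON) and reports which hardening settings are missing. The field names are the standard Kubernetes container `securityContext` ones; the function name is made up, and it deliberately ignores pod-level `securityContext`, so a `runAsNonRoot` set at pod level would show up here as a gap.

```python
def container_hardening_gaps(container):
    """Return the names of hardening settings missing from one container spec.

    Only the container-level securityContext is checked; settings inherited
    from the pod-level securityContext are not considered in this sketch.
    """
    sc = container.get("securityContext") or {}
    gaps = []
    if sc.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem")
    if sc.get("allowPrivilegeEscalation") is not False:
        gaps.append("allowPrivilegeEscalation")
    if sc.get("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot")
    return gaps
```

An empty list means the container sets all three explicitly. Workloads that need scratch space can usually keep the root filesystem read-only and mount a writable volume at the specific path instead.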
My home k8s cluster is now "locked down" using micro-vms (kata-containers[0]), pod level firewalling (cilium[1]), permission-limited container users, mostly immutable environments, and distroless[2] base images (not even a shell is inside!). Given how quickly I rolled this out; the tools to enhance cluster environment security seem more accessible now than my previous research a few years ago.
I know it's not exactly a production setup, but I really do feel that it's at least the most secure runtime environment I've ever had accessible at home. Probably more so than my desktops, which you could argue undermines most of my effort, but I like to think I'm pretty careful.
In the beginning I was very skeptical, but being able to just build a docker/OCI image and then manage its relationships with other services with "one pane of glass" that I can commit to git is so much simpler to me than my previous workflows. My previous setup involved messing with a bunch of tools like packer, cloud-init, terraform, ansible, libvirt, whatever firewall frontend was on the OS, and occasionally sshing in for anything not covered. And now I can feel even more comfortable than when I was running a traditional VM+VLAN per exposed service.
Using a sidecar is also an option for debugging stuff involving shared storage, yes. The distroless project also ships aptly named "debug" containers that have BusyBox if you want a minimal shell for debugging something in the container filesystem itself. I've also made use of self-made "debug" containers with go-delve or the JVM in their respective over-the-network debugging modes and a kubectl port forward, for anything written by me.
For network observability I'm using Cilium's Hubble, which I will soon figure out how to get into a Graylog setup or something. For container image vulnerability interrogation I'm running Harbor with Trivy enabled. The initial motivation was to have an effective pull-through cache for multiple registries because I got rate limited by AWS ECR (due to a misconfigured CI pipeline, oops), but it ended up killing two birds with one stone.
Next on my list is writing an admission controller to modify supported registry targets to match my pull through cache configuration.
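The core of such an admission controller is just image-reference rewriting, which can be sketched roughly as below. The proxy hostnames and project names passed in `proxies` are hypothetical, the function names are made up, and only `spec.containers` is handled (not init or ephemeral containers). The registry detection follows Docker's convention: the first path component is a registry only if it contains a `.` or `:` or is `localhost`; otherwise Docker Hub is implied.

```python
def split_image(image):
    """Split an image reference into (registry, remainder)."""
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", image
    # Official Docker Hub images live under the implicit "library/" namespace.
    if registry == "docker.io" and "/" not in remainder:
        remainder = "library/" + remainder
    return registry, remainder

def rewrite_image(image, proxies):
    """Point an image at its pull-through cache, or return it unchanged."""
    registry, remainder = split_image(image)
    prefix = proxies.get(registry)
    return image if prefix is None else f"{prefix}/{remainder}"

def image_patches(pod, proxies):
    """Build JSON Patch operations for every container image that changes."""
    patches = []
    for index, container in enumerate(pod["spec"]["containers"]):
        rewritten = rewrite_image(container["image"], proxies)
        if rewritten != container["image"]:
            patches.append({
                "op": "replace",
                "path": f"/spec/containers/{index}/image",
                "value": rewritten,
            })
    return patches
```

With `proxies = {"docker.io": "harbor.example.internal/hub"}`, an image of `nginx:1.25` becomes `harbor.example.internal/hub/library/nginx:1.25`, while images from registries without a configured proxy pass through untouched. A real mutating webhook would wrap these patches in an AdmissionReview response.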
Inside the cluster my containers are Linux only. I don't believe kata-containers supports Windows containers as I don't think rust-vmm, which is used by CloudHypervisor[0], or the kata internal execution agent support it.
If I wanted to run Windows in the cluster I'd probably have to look at KubeVirt[1]. KubeVirt is oriented towards getting traditional VM workloads (ones you'd run in QEMU, Hyper-V, etc) functioning in a Kubernetes environment. While kata-containers is oriented towards giving container runtime based workloads (images that run on docker, containerd, CRI-O) the protection of virtualization, with minimal friction.
Previously external to the cluster I had some Windows VMs hosted on QEMU/KVM + libvirt for experimentation with Linux and Active Directory integration, but they've since been deleted. The only remaining traditional VMs I have are 2 DNS servers and one OpenBSD server for serving up update images to my routers.
For network infra I have a number of VyOS[2] firewalls both at the edge and between VLANs, and Mikrotik devices for switching.
The thing that kills me about all of this is how hard it is to do it right. I wish there were a dumbed down version of containers and orchestrators for people who are trying to do basic multi-tenant compute in a SaaS and don't care a ton about the best performance.
Would I be generally ok if I use gvisor to give a shell environment to customers and just keep the host up to date?
Or is using containers just relatively pointless for multitenant compute in a SaaS compared to giving customers virtual machines?
If you can't imagine the kind of SaaS I'm talking about, think something along the lines of Github's new online IDE, CodeSpaces.
Multitenancy is difficult with containerization and not something I would recommend. It isn't what the technology is intended for. The ultimate example of multitenancy is actual platform and infrastructure providers and they all do it by giving you VMs because type I hypervisors are actually designed to do this kind of thing. Breakouts are always still possible when two processes are on the same physical server, but it's never as trivial as figuring out how to mount the kernel virtual filesystems.
I say this as a Kubernetes consultant. If you want "multitenancy" in the sense of distinct product or application teams all employed by the same parent company or organization, it's fine. But if you're talking truly different organizations with no implied trust between them, don't put them on a shared cluster.
I'm kind of curious how Github does this, because you can still get very minimalistic with VMs. Make the startup script for your application something that also mounts the filesystems it needs and name it /sbin/init and you just made yourself a poor man's unikernel.
I'll be devil's advocate and say breakouts are totally possible with VM's, just by different vectors.
The vast majority of container breakouts are due to bugs in the control plane and not so much the kernel.
The same is likely true for VMM's/hypervisors until those really started getting mature.
dotCloud and Heroku are both examples of multi-tenant containers.
That's very true, although I think there's a difference in attack surface size between the three isolation options (process based, sandbox based, hypervisor based).
I think the challenge for process isolation container based stacks (as I'm sure you know :) ) is that there's multiple components/groups involved in security and then there's co-ordination with the underlying Linux kernel as well, which makes things tricky, as Linux kernel devs will have potentially differing goals to the container people (e.g. the challenges about how to handle the interaction of new syscalls and seccomp filters)
If you compare that to something like gVisor, where there's essentially a single group responsible for creating/maintaining the sandbox, it's an easier task for them.
I think "dumbed down" and "multi-tenant compute" aren't compatible. No company needs to do multi-tenant compute by default. If you do, you are in the cloud hosting/infrastructure business (whether you like it or not) and should be expected to have the knowledge necessary to run such an operation.
While that's a common sentiment in some topics in tech; I think the general intent, if not actual result, of progress in tech is to make things faster, more secure and easier.
Calling your guide the ‘ultimate guide’ is disingenuous marketing. No single guide can cover all security concepts in all contexts. Every time I see that sorta wording I just assume the writer doesn’t actually know what they’re talking about
Continued: and given the writer seems to be all about tools, the article fails to highlight that static (and automated dynamic) tools are limited in their ability to detect some classes of vulnerabilities and need to be backed with experienced manual testing. This almost feels like it’s been written by a devops engineer who has a vague understanding of containerisation but doesn’t have a clue about real and practical mechanisms to secure applications and services hosted inside containers.
I’m not saying the article is totally bad, but calling it an ‘Ultimate Guide’ makes the author a charlatan.
I'm always a bit confused about the CPU limit (for the pod): some guides (and tools) advise always setting one, but this one [0] doesn't.
Ops people I worked with almost always want to lower that limit and I have to insist on raising it (no way they disable it).
Is there an ultimate best practice for that?
CPU limits are harmful if they strand resources that could have been applied. I usually skip them for batch deployments, use them for latency-sensitive services. Doesn’t seem like a security topic though.
They are actually even worse for latency-sensitive workloads, because CFS with the default 100ms period will cause crap tail latency (especially for multithreaded processes such as most Go programs)
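The arithmetic behind that, as a simplified sketch: a CPU limit becomes a quota of CPU time per CFS period, and every runnable thread draws from the same quota. The model below assumes all threads are fully CPU-bound and that there are enough cores for them to run in parallel; it ignores how the kernel hands out quota in slices, so treat the numbers as an approximation.

```python
def cfs_throttle(limit_cpus, busy_threads, period_ms=100.0):
    """Return (run_ms, throttled_ms) per CFS period for one cgroup.

    The quota is limit_cpus * period_ms of CPU time. With busy_threads
    threads running in parallel it is burned busy_threads times faster
    than wall-clock time, after which the cgroup is throttled until the
    next period starts.
    """
    quota_ms = limit_cpus * period_ms
    run_ms = min(period_ms, quota_ms / busy_threads)
    return run_ms, period_ms - run_ms
```

For example, `cfs_throttle(1, 8)` gives 12.5 ms of running followed by 87.5 ms of being stalled in every 100 ms period. A request arriving just after the quota runs out waits most of a period before it is even looked at, which is where the tail latency comes from.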
Interesting. It's my impression too. I understand that CPU limit will artificially throttle CPU, when not necessarily needed, wasting CPU cycles I could use.
(Java programs in my case but I imagine it's comparable to Go ones)
Do you recommend to disable CPU limit? In the general case.
We don’t set them anywhere in prod and generally didn’t have any issues. We always set cpu requests and alert if those are exceeded for prolonged periods and always set memory req=limit
I think this is backwards. How are you planning on “sticking to it” when you’re serving unpredictable user traffic? If requests are set appropriately everywhere then it won’t really starve batch, as the kernel would just scale everything to their respective cpu.shares when the CPU is fully saturated. This would allow you to weather spiky load with minimum latency impact and minimize spend
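A rough sketch of that proportional scaling: Kubernetes turns a CPU request into cgroup v1 `cpu.shares` (1000 millicores maps to 1024 shares, with a floor of 2), and under full saturation each group gets CPU in proportion to its shares. The model below is simplified and assumes a flat hierarchy in which every container wants more CPU than its share; in practice idle capacity is handed to whoever can use it, and pods sit inside nested cgroups.

```python
def saturated_allocation(requests_millicpu, node_cpus):
    """Estimate CPUs per container when the node is fully saturated.

    requests_millicpu maps container name -> CPU request in millicores.
    Each request is converted to cpu.shares and the node's CPUs are
    divided in proportion to those shares.
    """
    shares = {
        name: max(2, millicpu * 1024 // 1000)
        for name, millicpu in requests_millicpu.items()
    }
    total = sum(shares.values())
    return {name: node_cpus * share / total for name, share in shares.items()}
```

On a 4-CPU node with `{"web": 3000, "batch": 1000}`, the result is 3.0 CPUs for `web` and 1.0 for `batch` under contention, and either can burst above that whenever the other is idle, with no limit configured at all.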
It's weird that apparently you are a borg user from google, according to other discussions we have exchanged, but you question the value of hard-capping for latency-sensitive processes.
Borg SRE even ;) (former), and yes, I do question them. For one, Borg ain’t using a 100ms CFS period, and it wasn’t even standard CFS if I recall, so yes, I do question that outside of the limited Borg use case
Curious to know whether anyone here can speak to how much safer Hyper V isolation[1] is than process isolation and whether it negates some of the concerns in the article.
Microsoft's guidance (last I looked) was that Windows containers (i.e. the non Hyper-V ones) were not a security boundary; only Hyper-V based Windows containers should be considered to provide isolation.
Virtualization & Containerization security depends a great deal on the security of the underlying platform.
Hyper-V can be used on endpoints [1], similar to VMware Workstation.
It can also be installed as a role on top of Windows Server [2], and used as a bootable OS of its own [3] (likely deprecated in the future, so no Hyper-V Server past Server 2019).
Related to this is the type of Windows server install, as it touches on attack surface also [4], but I believe there are constraints for the very small installs.
This matters because attack surface is likely to be, from smallest to largest: hyper-v server < Windows Server < Windows Endpoint
It just changes complexity. Compare a container on bare metal, where the target is an adjacent application (or container image), with a container inside a VM, where the target is an adjacent application on the host (or inside a VM/VM+container): in the second case the attack chain includes a container breakout and a hypervisor breakout, which is harder to do, but probably not beyond highly sophisticated threat actors.
Virtualization-backed container technologies are a definite security improvement over traditional containers (including Hyper-V), but most of the measures in this article are still important. Remember, security-in-depth. Virtualization mainly protects against zero-day kernel exploits, limiting the "blast radius" to a single container. You still need to monitor dependencies, isolation, signing, scanning, and have a vulnerability management program, among other things.
Production host root fs should be mounted ro. Check out Linux IMA and how to only allow specific executables by hash. Centrally forward container logs. Use a VCS for container/workload templates and routinely audit for misconfig. Sysdig/Falco and related tools are nice, but containers and their prod hosts are easier to harden.
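To make the allow-by-hash idea concrete, here is a user-space illustration only. It is not how IMA works internally: IMA measures and appraises files inside the kernel at access time, whereas this sketch just hashes a file and checks it against a set of approved digests. The function names and the idea of keeping the allowlist as a plain set are assumptions for the example.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_allowed(path, allowed_hashes):
    """True if the file's digest is in the set of approved digests."""
    return sha256_of(path) in allowed_hashes
```

A check like this could run in CI or as a periodic audit over a host's executables; enforcement at exec time still needs a kernel mechanism such as IMA appraisal.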
[1] https://www.cisecurity.org/
[2] https://public.cyber.mil/stigs/downloads/
[3] https://media.defense.gov/2021/Aug/03/2002820425/-1/-1/0/CTR...
[4] https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.S...