Container security best practices: Ultimate guide (sysdig.com)
221 points by knoxa2511 14 days ago | hide | past | favorite | 62 comments

Unfortunately, this reads like a 100 foot marketing document for Sysdig, not actual container security best practices.

If you want actual container security best practices, check out CIS [1], DISA [2], and NSA [3], with some theory at NIST [4], along with the documentation from your preferred cloud vendor, be it AWS, Azure, GCP, or another, and their specific container security guidance.

[1] https://www.cisecurity.org/

[2] https://public.cyber.mil/stigs/downloads/

[3] https://media.defense.gov/2021/Aug/03/2002820425/-1/-1/0/CTR...

[4] https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.S...

(disclaimer: I know the company and some of the early founders)

I wish all "marketing documents" were this detailed. In other words, I disagree with you. I've read the blog post and it doesn't seem too high level. The resources you indicate are nice, but a 60-page Kubernetes hardening guide by the US Government is perhaps one level deeper than a blog post on the internet.

Clearly sounds like a marketing document. Cites a survey from "Cloud Native Computing Foundation" and claims "92 percent of companies are using containers in production" + "Thus, Kubernetes, Openshift, and other container technologies are present everywhere" while ignoring the fact that the survey is heavily biased towards companies that run containers, of course.

Their own services and blog posts are also referenced in almost every section of the post, even when better external resources exist. Zero competitors are listed in any section. Doesn't sound very neutral to me.

In this sense, yes, I agree with you. But "100 foot marketing document" carries a negative connotation that reads like "no content, just fluff"; the content is there, and yes, it is biased, and yes, no competitors are mentioned.

I also agree with you that a "smarter" kind of content marketing would go beyond these limitations: it would mention competitors or alternatives, and it wouldn't highlight the company's own services so heavily.

If someone from Sysdig is reading, these are suggestions for you, guys.

>but a 60-page kubernetes hardening guide by the US Government is perhaps one level deeper

Perhaps "Ultimate guide" is a bit of a misnomer.

> Perhaps "Ultimate guide" is a bit of a misnomer.

"Ultimate Guide, Executive Version" ?

It's supposed to be an "Ultimate guide" though.

I guess Sysdig isn't a Y Combinator startup.

I read the entire article thinking it would be a shill, but I saw little evidence that it was. In fact, I got to the end and I still don't know what the hell Sysdig is.

If anything, Sysdig fucking sucked at marketing this one, if it was supposed to be a puff piece for the product.

Container security should start with image security. Instead of runtime security measures, you can statically analyze images before they are running somewhere and find what known vulnerabilities might exist in them. This is also easier to scale.

NIST gets it right by starting there.
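For example, a static scan can be wired into CI so that vulnerable images never reach a registry. This is just a sketch using Trivy, one open-source scanner among several (Clair, Grype, and Snyk are others); the registry and image names are placeholders:

```shell
# Scan an image for known CVEs before it is ever deployed anywhere.
# --exit-code 1 makes the CI job fail if HIGH/CRITICAL findings exist.
trivy image --severity HIGH,CRITICAL --exit-code 1 \
  registry.example.com/app:1.2.3
```

The same command run against a base image first helps separate inherited findings from ones your own layers introduce.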

One of the hardest things to get any dev organization to start taking seriously is supply chain security. That first scan, which lights up like a Christmas tree, is always such a daunting obstacle to get over. It's a shame, because it is probably the highest-value SDLC practice that many are not doing.

Yet, the base Debian image _does_ light up like a Christmas tree when you run a Snyk scan, mostly with incorrect findings (the version number triggers a flag, but the fix is backported) or ones considered low priority and thus WONTFIX by upstream.

If you’re writing software against, say, dotnet3 (which has a docker image based on Debian) then you’re basically noised out.

Even if it is a marketing document, it's still got incredibly valuable information. Almost nobody is going to read a government specification, but they will probably read this page.

>Almost nobody is going to read a government specification

Why is that?

Every company I've worked security at, including where I am now, regularly reads government guidance. Especially NIST guidance, which is referenced all over the world.

Yet another (soon to be penultimate, etc) ultimate guide

It's funny that you use the term "actual" to describe the guidance from the US government. They don't really know what they are talking about. Their release process for guidance takes so long that by the time it's released, it's out of date. This is absolutely true for k8s guidance. Last I checked, they were suggesting everyone use "Docker Enterprise" in their guidance long after it no longer existed (are vendors supposed to magically know Mirantis is now an option?)

I always have to laugh a little bit when someone says NIST, NSA, etc. just "don't really know what they are talking about".

They aren't perfect (you know, being humans and all), and can sometimes be slow in disseminating information to the public, but you're out to lunch if you think they "don't really know" anything.

I'm scoping my statement to container security & orchestration best practices, not their competency as a whole. I know the specifics of their guidance due to the industry I work in, so I feel comfortable speaking generally about specific guidance in regards to specific technology.

Your comment reads as overly defensive to me.

>I'm scoping my statement to container security & orchestration best practices, not their competency as a whole.


>It's funny that you use the term "actual" to describe the guidance from the US government. They don't really know what they are talking about.

Perhaps you can understand why I thought you were speaking generally, when your comment is written generally. I can't read minds to figure out what you're silently scoping your comment to.

But if saying I laughed and why I laughed is overly defensive, my apologies. I'm not sure how else I would tell someone I find their comment funny.

Yeah. Typical dev hyperbole.

In a similar vein, a fairly mid-level dev was recently trying to convince me that "Rob Pike is a clueless idiot who knows nothing about language design".

I somehow think that their opinion was a little more nuanced than that.

And fwiw, Rob Pike definitely did make mistakes. Golang is a great language, but it's not perfect.

It really wasn’t more nuanced than that - I’m pretty much quoting verbatim. The argument stemmed from the lack of generics in Go, which apparently was a sign of incompetence.

My general point is that there are a lot of people who see the world in binary: genius or idiot, perfect or incompetent.

Sometimes they take a long time to officially release the final version of a document, like NIST does.

However, they regularly put out drafts and socialize them at an early stage.

Additionally, there is a huge amount of content that they produce that isn't widely disseminated outside of DoD/IC.

Perhaps I overlooked it, but it seems strange there's nothing about making containers immutable and read-only. This is a powerful tool IMO.


It seems that Sysdig doesn't have a blog post about making containers immutable and read-only, nor a service that enables that, so it's probably not worth mentioning for them.

Hmm, that seems like a weird miss from my side.

E.g. we covered this across several articles, like this one about tags: https://sysdig.com/blog/toctou-tag-mutability/

This other one about file integrity monitoring (Disclaimer: A rather commercial one) https://sysdig.com/blog/file-integrity-monitoring/

And I recall others more explicit on the read-only part, but I’m away from my laptop now. Edit: Found it (point 1.3 in https://sysdig.com/blog/dockerfile-best-practices/ )

Thanks for pointing it out. It should definitely be more explicit.

Yep, I've always had read-only root filesystems down as a good control, and one that's often not too tough to implement.
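In Kubernetes this is a one-line setting in the pod spec. A minimal sketch (pod name, image, and mount paths are placeholders; real apps usually need a writable scratch volume, shown here as an emptyDir):

```yaml
# Pod with an immutable root filesystem; only /tmp is writable.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.2.3
    securityContext:
      readOnlyRootFilesystem: true   # the control discussed above
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}
```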

Another favourite of mine would be using multi-stage builds and minimal base images in production (FROM scratch, where possible). Having limited or no tooling in the running container makes an attacker's life trickier for sure.
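A minimal sketch of that pattern, assuming a statically linked Go binary (module path and names are placeholders):

```dockerfile
# Build stage: full toolchain, never shipped to production.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/app

# Final stage: just the binary -- no shell, no package manager,
# nothing for an attacker to live off the land with.
FROM scratch
COPY --from=build /app /app
USER 65534
ENTRYPOINT ["/app"]
```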

The distroless static images are pretty good. It’s essentially scratch plus certificate authority roots of trust.

I would assume that's because that mitigation isn't what sysdig does.

My home k8s cluster is now "locked down" using micro-VMs (kata-containers [0]), pod-level firewalling (cilium [1]), permission-limited container users, mostly immutable environments, and distroless [2] base images (not even a shell is inside!). Given how quickly I rolled this out, the tools for enhancing cluster environment security seem much more accessible than during my previous research a few years ago.

I know it's not exactly a production setup, but I really do feel that it's at least the most secure runtime environment I've ever had accessible at home. Probably more so than my desktops, which you could argue undermines most of my effort, but I like to think I'm pretty careful.

In the beginning I was very skeptical, but being able to just build a docker/OCI image and then manage its relationships with other services with "one pane of glass" that I can commit to git is so much simpler to me than my previous workflows. My previous setup involved messing with a bunch of tools like packer, cloud-init, terraform, ansible, libvirt, whatever firewall frontend was on the OS, and occasionally sshing in for anything not covered. And now I can feel even more comfortable than when I was running a traditional VM+VLAN per exposed service.

[0] https://github.com/kata-containers/kata-containers

[1] https://github.com/cilium/cilium

[2] https://github.com/GoogleContainerTools/distroless

So if you are trying to troubleshoot something in a particular container how do you handle it? Attach a sidecar with various tools or...?

Using a sidecar is also an option for debugging stuff involving shared storage, yes. The distroless project also ships aptly named "debug" containers that have BusyBox if you want a minimal shell for debugging something in the container filesystem itself. I've also made use of self-made "debug" containers with go-delve or the JVM in their respective over-the-network debugging modes and a kubectl port-forward, for anything written by me.
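On recent Kubernetes versions, ephemeral debug containers give a similar effect without baking any tooling into the production image. A sketch (pod and container names are placeholders):

```shell
# Attach a throwaway BusyBox container to a running pod;
# --target shares the process namespace of the named container,
# so you can inspect its processes from the debug shell.
kubectl debug -it mypod --image=busybox:1.36 --target=app
```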

I read a lot about home setups, and yours seems to balance high security and maintainability very well.

Care to share about the details of the security services side of your stack too?


Sure, hopefully I understand what you mean.

For network observability I'm using Cilium's Hubble, which I will soon figure out how to get into a Graylog setup or something. For container image vulnerability interrogation I'm running Harbor with Trivy enabled; the initial motivation was to have an effective pull-through cache for multiple registries because I got rate limited by AWS ECR (due to a misconfigured CI pipeline, oops), but it ended up killing two birds with one stone.

Next on my list is writing an admission controller to modify supported registry targets to match my pull through cache configuration.

Is there something more specific you wanted?

> Is there something more specific you wanted?

Yeah sure, what is your network infrastructure too? :)

Are all the containers Linux only, or other OSes too?

Inside the cluster my containers are Linux only. I don't believe kata-containers supports Windows containers as I don't think rust-vmm, which is used by CloudHypervisor[0], or the kata internal execution agent support it.

If I wanted to run Windows in the cluster I'd probably have to look at KubeVirt[1]. KubeVirt is oriented towards getting traditional VM workloads (ones you'd run in QEMU, Hyper-V, etc) functioning in a Kubernetes environment. While kata-containers is oriented towards giving container runtime based workloads (images that run on docker, containerd, CRI-O) the protection of virtualization, with minimal friction.

Previously external to the cluster I had some Windows VMs hosted on QEMU/KVM + libvirt for experimentation with Linux and Active Directory integration, but they've since been deleted. The only remaining traditional VMs I have are 2 DNS servers and one OpenBSD server for serving up update images to my routers.

For network infra I have a number of VyOS[2] firewalls both at the edge and between VLANs, and Mikrotik devices for switching.

[0] https://github.com/cloud-hypervisor/cloud-hypervisor

[1] https://github.com/kubevirt/kubevirt

[2] https://www.vyos.io

Correction, CloudHypervisor supports Windows, but the kata agent does not.

The thing that kills me about all of this is how hard it is to do it right. I wish there were a dumbed down version of containers and orchestrators for people trying to do basic multi-tenant compute in a SaaS and don't care a ton about the best performance.

Would I be generally OK if I use gVisor to give a shell environment to customers and just keep the host up to date?

Or is using containers just relatively pointless for multitenant compute in a SaaS compared to giving customers virtual machines?

If you can't imagine the kind of SaaS I'm talking about, think something along the lines of Github's new online IDE, CodeSpaces.

Multitenancy is difficult with containerization and not something I would recommend. It isn't what the technology is intended for. The ultimate example of multitenancy is actual platform and infrastructure providers and they all do it by giving you VMs because type I hypervisors are actually designed to do this kind of thing. Breakouts are always still possible when two processes are on the same physical server, but it's never as trivial as figuring out how to mount the kernel virtual filesystems.

I say this as a Kubernetes consultant. If you want "multitenancy" in the sense of distinct product or application teams all employed by the same parent company or organization, it's fine. But if you're talking truly different organizations with no implied trust between them, don't put them on a shared cluster.

I'm kind of curious how Github does this, because you can still get very minimalistic with VMs. Make the startup script for your application something that also mounts the filesystems it needs and name it /sbin/init and you just made yourself a poor man's unikernel.

I'll be devil's advocate and say breakouts are totally possible with VMs, just by different vectors.

The vast majority of container breakouts are due to bugs in the control plane and not so much the kernel. The same was likely true for VMMs/hypervisors until those really started getting mature.

dotCloud and Heroku are both examples of multi-tenant containers.

That's very true, although I think there's a difference in attack surface size between the three isolation options (process based, sandbox based, hypervisor based).

I think the challenge for process-isolation container stacks (as I'm sure you know :) ) is that there are multiple components/groups involved in security, and then there's coordination with the underlying Linux kernel as well, which makes things tricky, as Linux kernel devs will have potentially differing goals to the container people (e.g. the challenges around how to handle the interaction of new syscalls and seccomp filters).

If you compare that to something like gVisor, where there's essentially a single group responsible for creating/maintaining the sandbox, it's an easier task for them.

I think "dumbed down" and "multi-tenant compute" aren't compatible. No company needs to do multi-tenant compute by default. If you do, you are in the cloud hosting/infrastructure business (whether you like it or not) and should be expected to have the knowledge necessary to run such an operation.

While that's a common sentiment in some corners of tech, I think the general intent, if not the actual result, of progress in tech is to make things faster, more secure, and easier.

So forgive me for asking. :)

Multi-Tenant Kubernetes is straight up difficult to do well, especially where you're talking hard multi-tenancy for external customers.

There was a good report that covered a lot of the risks and mitigations here https://raw.githubusercontent.com/salesforce/kubernetes-cont...

But even then that had limited scope and didn't cover things like networking.

There's a set of benchmarks for multi-tenant kubernetes clusters that might prove useful (although they could use more depth): https://github.com/kubernetes-sigs/multi-tenancy/tree/master...

Reinventing the wheel, all the time. Early VMware (with VMs) had a much better sense of product than Google had with K8S.

Calling your guide the ‘ultimate guide’ is disingenuous marketing. No single guide can cover all security concepts in all contexts. Every time I see that sorta wording I just assume the writer doesn’t actually know what they’re talking about

Continued: and given the writer seems to be all about tools, the article fails to highlight that static (and automated dynamic) tools are limited in their ability to detect some classes of vulnerabilities and need to be backed with experienced manual testing. This almost feels like it's been written by a devops engineer who has a vague understanding of containerisation but doesn't have a clue about real and practical mechanisms to secure applications and services hosted inside containers.

I’m not saying the article is totally bad, but calling it an ‘Ultimate Guide’ makes the author a charlatan.

I'm always a bit confused about the CPU limit (for the pod): some guides (and tools) advise always setting one, but this one [0] doesn't. Ops people I've worked with almost always want to lower that limit and I have to insist on raising it (no way they'd disable it). Is there an ultimate best practice for that?

[0] https://learnk8s.io/production-best-practices

CPU limits are harmful if they strand resources that could have been applied. I usually skip them for batch deployments, use them for latency-sensitive services. Doesn’t seem like a security topic though.

They are actually even worse for latency-sensitive workloads, because CFS with the default 100ms period will cause crap tail latency (especially for multithreaded processes such as most Go programs).

Interesting. It's my impression too. I understand that CPU limit will artificially throttle CPU, when not necessarily needed, wasting CPU cycles I could use. (Java programs in my case but I imagine it's comparable to Go ones)

Do you recommend to disable CPU limit? In the general case.

We don't set them anywhere in prod and generally haven't had any issues. We always set CPU requests and alert if those are exceeded for prolonged periods, and we always set memory request = limit.
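That pattern looks roughly like this in a container spec (values are placeholders, a sketch of the setup described above): a CPU request but no CPU limit, so there's no CFS throttling, and memory request equal to limit for predictable eviction behaviour:

```yaml
# Container resources fragment: CPU request only, memory req == limit.
resources:
  requests:
    cpu: 500m        # guaranteed share under contention (cpu.shares)
    memory: 512Mi
  limits:
    memory: 512Mi    # no cpu limit -> no CFS quota throttling
```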

Yes, but I put limits on LS workloads because I expect them to have a capacity plan, stick to it, and not abusively starve out batch workloads.

I think this is backwards. How are you planning on “sticking to it” when you’re serving unpredictable user traffic? If requests are set appropriately everywhere then it won’t really starve batch as kernel would just scale everything to their respective cpu.shares when cpu is fully saturated. This would allow you to weather spiky load with minimum latency impact and minimize spend

It's weird that apparently you are a borg user from google, according to other discussions we have exchanged, but you question the value of hard-capping for latency-sensitive processes.

Borg sre even ;) (former) and yes i do question them. For one borg aint using 100ms cfs period and it wasn’t even standard cfs if i recall so yes i do question that outside of limited borg usecase

Curious to know whether anyone here can speak to how much safer Hyper V isolation[1] is than process isolation and whether it negates some of the concerns in the article.

1. https://docs.microsoft.com/en-us/virtualization/windowsconta...

Microsoft's guidance (last I looked) was that Windows containers (e.g. the non Hyper-V ones) were not a security boundary, only Hyper-V based Windows containers should be considered to provide isolation.

That clashes slightly with the fact that Hyper-V containers are not currently supported under Kubernetes (https://kubernetes.io/docs/setup/production-environment/wind...) :)

For more depth on the challenges of securing Process isolation containers with Windows https://googleprojectzero.blogspot.com/2021/04/who-contains-... is a great read.

It just changes the complexity. With a container on bare metal, the target is an adjacent application (or container image). With a container inside a VM, where the target is an adjacent application on the host (or inside another VM/VM+container), the attack chain includes a container breakout *and* a hypervisor breakout, which is harder to do, but probably not beyond highly sophisticated threat actors.

To add to above:

Virtualization & Containerization security depends a great deal on the security of the underlying platform.

Hyper-V can be used on endpoints [1], similar to VMware Workstation.

It can also be installed as a role on top of Windows Server [2], or used as a bootable OS of its own [3] (likely deprecated in the future, so no Hyper-V Server past Server 2019).

Related to this is the type of Windows server install, as it touches on attack surface also [4], but I believe there are constraints for the very small installs.

This matters because attack surface is likely to be, from smallest to largest: Hyper-V Server < Windows Server < Windows endpoint.

[1] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-w...

[2] https://docs.microsoft.com/en-us/windows-server/virtualizati...

[3] https://www.microsoft.com/en-us/evalcenter/evaluate-hyper-v-...

[4] https://docs.microsoft.com/en-us/previous-versions/windows/d...

Virtualization-backed container technologies are a definite security improvement over traditional containers (including Hyper-V), but most of the measures in this article are still important. Remember, security-in-depth. Virtualization mainly protects against zero-day kernel exploits, limiting the "blast radius" to a single container. You still need to monitor dependencies, isolation, signing, scanning, and have a vulnerability management program, among other things.

Production host root fs should be mounted ro. Check out Linux IMA and how to only allow specific executables by hash. Centrally forward container logs. Use a VCS for container/workload templates and routinely audit for misconfig. Sysdig/falco and related tools are nice, but containers and their prod hosts are easier to harden
