Docker Image Vulnerability Research (federacy.com)
90 points by jsulinski on Mar 14, 2017 | 46 comments

This is something near and dear to my heart! The great thing about container images is that software distribution is based on static assets, which enables scanners to give teams actionable data without being on every host. This is a net new capability, and I think it enables better security in organizations that adopt containers. And unlike "VM sprawl", container systems are generally introspectable via a cluster-level API like Kubernetes, so scanning doesn't require active agents on every node. Two things have happened recently in this space:

- Quay.io[0] offers scanning as a standard feature on all accounts, including free open source accounts. This also includes notifications to external services like Slack. This is what it looks like when you scan an image[1].

- The Kubernetes community has started automating scans of all of the containers that are maintained by that community to ensure that they are patched and bumped to the latest versions. A recent example[2].

The cool thing is that both of these systems utilize the Clair[3] open source project as a way of gathering up data sources from all of the various distribution projects. All of this is why we feel automated updates of distributed systems are so critical, and why CoreOS continues to push these concepts forward in CoreOS Tectonic[4].

[0] https://blog.quay.io/quay-secscanner-clair1/

[1] https://quay.io/repository/philips/host-info?tag=latest&tab=...

[2] https://github.com/kubernetes/kubernetes/pull/42933

[3] https://github.com/coreos/clair

[4] https://coreos.com/tectonic
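The cluster-level introspection mentioned above fits in one line of shell. This is a sketch that assumes kubectl is already configured against your cluster:

```shell
# List every container image currently running in a Kubernetes cluster,
# one per line, deduplicated -- a natural input for an image scanner.
kubectl get pods --all-namespaces \
  -o jsonpath='{.items[*].spec.containers[*].image}' \
  | tr ' ' '\n' | sort -u
```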

Absolutely. Huge props to CoreOS and the Kube community for pushing forward with this stuff.

I gave Clair a shout out in the article, and I intend to add it as an optional scanner to Federacy.

Funny, I'm pretty sure we met before either of us started our existing ventures, when you came to the San Francisco DevOps meetups. :)

Sounds great!

If you hit any issues with Clair feel free to file an issue; we have a lot of folks who have been helping maintain the project.

Some quick feedback on the post and your website: it says there is a quick command to scan my container images, but I couldn't find the command. I also signed up, but the confirmation URL was a 404, and the email came from "team" with the subject "Confirmation email".

In any case really happy to see more people digging into these problems and coming up with new solutions.

Thanks sir! I redesigned my landing page and made it static, and forgot to update the confirmation email link.

The command is a standard 'wget/bash' script that you will receive when you log in, but it's pretty simple to grok.

I'll send you a new confirmation email soon.


Got it!

This is great research, but I think an important point is missed. It may come across that these images are vulnerable because of some intrinsic property of using Docker; however, this is not the case. It is also important to point out that by adopting Docker, this analysis actually becomes easier to do across an organization, and similarly mitigation becomes easier as well.

I think another aspect that is missed is that using a vulnerable image doesn't necessarily mean you are at risk of being compromised; that depends on what other security layers you employ. This gets to the practical scenarios of security operations.

Absolutely agree. I did see some bad practices in the Docker community that I expect to see elsewhere as well. Specifically: reliance on deprecated images and not updating images during build. Thoughts?

I didn't address the implications of software vulnerabilities with respect to other mitigation techniques, however, as that's far outside the scope of the article. I probably should at least add a second addendum, though. I'll work on this soon. Thanks!

> by adopting Docker this analysis actually becomes easier to do across an organization and similarly mitigation becomes easier as well

I'm curious what makes you say such analysis and mitigation are easier with docker?

Analysis is easier because you already have a running agent that you can remotely query about deployed software. With a regular OS you need to provide such an agent yourself.

I don't know why it's easier to mitigate risks, though. Maybe just because it's easier to run the analysis.

> Analysis is easier because you already have a running agent that you can remotely query about deployed software.

Not sure I buy this. Sure, I can query the docker daemon for what images are running, but that's not enough to tell me which images are vulnerable. I still need to build something to actually scan the images.

Also, on any Linux host, I don't need a daemon to tell me about deployed software: the package manager can do just that. The tool used for scanning in this article appears to just query the package manager, which would work just as well on any Linux host outside of Docker.

That's correct: vuls queries the package manager for installed packages, versions, and changelogs, then compares the CVEs found in the changelogs against the NVD.

There are certainly flaws in this approach; it's one of the reasons we intend to support multiple scanners. We started with vuls because Clair hadn't been released yet and we wanted to support more than containers.
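The changelog-mining idea can be sketched in a few lines of shell. The changelog text below is inlined as a stand-in; a real scanner would fetch it with something like `apt-get changelog <package>`:

```shell
# Extract CVE identifiers from a package changelog, ready to be compared
# against a vulnerability database such as the NVD.
changelog='openssl (1.1.0e-1) unstable; urgency=high

  * Fix CVE-2017-3733: Encrypt-Then-Mac renegotiation crash.'

echo "$changelog" | grep -oE 'CVE-[0-9]{4}-[0-9]+' | sort -u
```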

Are there any dynamic scanners that are designed like Vuls or Clair (I'm assuming they're both static)?

I don't fully understand the question.

clair does static analysis

vuls uses a package manager and changelogs

Are there (any) dynamic analysis options available that would give a report similar in scope to clair or vuls?

> Sure, I can query the docker daemon for what images are running, but that's not enough to tell me which images are vulnerable.

If you can query what images are running, you can tie that to a list of deployed software. Then you can compare that list with a database of known vulnerabilities; obviously, you'd do the same if you were assessing the host OS without Docker. What's easier is that you already have an API that can be called remotely.

> Also, on any linux host, I don't need a daemon to tell me about deployed software - the package manager can do just that

But you need to get to each of these hosts somehow and get the data out of the package manager, so a report can be prepared. This is the part that is easier to assess in the case of Docker. Then there is also software that was not installed with the OS-supplied package system, because programmers somehow dislike those and work around them with virtualenv or the npm-du-jour.
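The "one agent you already have" pattern looks roughly like the loop below. To keep the sketch runnable anywhere, `docker` is stubbed out with a shell function; on a real Docker host you would delete the stub. The container names and package output are invented for illustration.

```shell
# Stub standing in for the real Docker CLI (delete this on a real host).
docker() {
  case "$1" in
    ps)   printf 'web\ndb\n' ;;          # pretend: two running containers
    exec) echo 'openssl 1.1.0e-1' ;;     # pretend: dpkg-query output
  esac
}

# Walk every running container and dump its package list through the
# daemon, instead of installing an agent on each host.
docker ps --format '{{.Names}}' | while read -r name; do
  echo "== $name =="
  docker exec "$name" dpkg-query -W -f='${Package} ${Version}\n'
done
```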

> [...] the tool used for scanning in this article appears to just query the package manager, which would work just as well on any linux host outside of docker.

I haven't read the article, but most probably you're right.

The conversion of package version numbers to vulnerabilities is perilous and incredibly complicated. That's one of the most significant challenges that we want to solve, which is even more pressing considering how badly the CVE ecosystem is breaking down.
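One small illustration of why it's perilous: plain lexicographic comparison, which naive tooling sometimes falls back on, orders version numbers wrongly (Debian versions with epochs and revisions, e.g. 1:1.2.8-5, are harder still):

```shell
# Plain sort compares character by character, so '9' > '1' makes
# "1.0.9" come out "newer" than "1.0.10".
printf '1.0.9\n1.0.10\n' | sort | tail -n 1      # prints 1.0.9 (wrong)

# GNU version sort understands numeric components.
printf '1.0.9\n1.0.10\n' | sort -V | tail -n 1   # prints 1.0.10 (right)
```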

Note that an image containing vulnerable binaries is not the same thing as an exploitable container. A container derived from a full OS like Ubuntu will have many binaries to provide a standard environment, but most of them will never be touched by the running program. That year-old image might have a vulnerable Perl version, but nothing in the container even runs Perl, so it's a non-issue.

This is why many people can get away with a minimal base image like Alpine-- a tiny busybox shell provides enough features to run the application while still supporting some manual debugging with docker exec. It also avoids false positives like these, letting you more quickly find precisely what you need to upgrade when a new OpenSSL vulnerability is announced.

(Disclaimer: I work on Google Container Engine / Kubernetes).

This is a very well-written explanation.

The only exception is when people have access to the underlying container, whether intended or not. Then these vulnerable binaries can lead to a vulnerable container.

This is also why the subjectivity in CVE rating is such a significant problem.

In addition, it's really not that hard to add the equivalent of "apt update && apt -y upgrade && apt autoclean" to every dockerfile.
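In Dockerfile terms that's a couple of lines near the top. The base image tag below is just an example, and apt-get is used as the script-friendly variant of the commands above:

```dockerfile
FROM ubuntu:16.04
# Pull in upstream security fixes at build time instead of trusting
# the base image to be current.
RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get autoclean && \
    rm -rf /var/lib/apt/lists/*
```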

One thing that does get a mention but only right at the bottom of the post is using smaller base images (e.g. Alpine).

If you can, I'd recommend this as a good practice to reduce these kinds of problems. The fundamental fact is that if you don't have a library installed, you can't be affected by a vulnerability in it. So the smaller your image, the fewer possible avenues for shipping vulnerable libs you'll have, and the less time you'll spend re-building images with updated packages.
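A minimal sketch of that practice; the base tag, package, and binary name here are all hypothetical:

```dockerfile
FROM alpine:3.5
# Only the runtime dependencies the app actually needs.
RUN apk add --no-cache ca-certificates
# A prebuilt static binary copied in from the build context.
COPY myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]
```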

Honest question: Do you know if Alpine ships patches to substantial vulnerabilities more quickly than Debian and/or Ubuntu? Thanks!

I don't know the answer to this, but I do know that Alpine has some really awesome stuff around vulnerabilities, and I would presume that they react to vulnerabilities more quickly.

However, I intend to validate this presumption in a future project.

Thank you! I'd love to hear your findings.

Absolutely. I intended this post to identify (some of) the problems/challenges. My next post will focus on how to address them.

Alpine is definitely one of the major points, as well as static binary images and some advice on Dockerfile configuration.

I'm looking for a base image choice and this article helped me a lot. It seems a Debian base image is a good choice so far. Alpine is quite popular lately, but I'm afraid the musl C library may cause some headaches in the future. Is Debian the way to go for production use? What about other alternatives like CentOS?

CentOS/RHEL have a very small footprint in the open source community, it seems. I was pretty surprised by this, because they have such significant corporate backing, a lot of enterprise software is RHEL-only, and they may be the only Linux distribution that currently supports SCAP (required by FISMA for federal agencies).

In order, I would opt for: a binary image, Alpine, then Debian. There are other choices like CoreOS, FreeBSD, etc. if you are comfortable moving away from Linux.

CoreOS, the distribution, is itself a Linux distribution. They also have a lot of container-oriented tools. The distro is optimized for use in container management, and it is also relatively lightweight for use as a container (though not as light as Alpine).

Good point. What I meant was comfort moving away from a major distribution.

NixOS is another distro that looks interesting.

Not for shaming purposes, but to see if there are any patterns, will you release a list of the docker images reviewed, and which have vulnerabilities?

Do those without vulnerabilities use a CI/CD process which results in the container being auto-updated whenever there are new releases?

I will absolutely release some data. I intend to fully automate this research so that it is current whenever viewed as well.

Not sure about the state of CI/CD in the image building process; I assume it varies wildly. Two of the major points I'll address in my next posts are deprecation in Docker repositories and the lines of a Dockerfile important to minimizing vulnerabilities.

To be clear, one of those lines relates to making sure you pull in upstream during image building. This is super important, as it seems that people have assumed their base image will be current and that is not always the case.

So that's another factor to check for a pattern: do the images without problems run apt-get update && apt-get upgrade?

And maybe there's an opportunity for a Chrome extension that overlays an indicator when choosing a Docker image, so you can pick one that uses best practices like that.

There absolutely is a pattern, but the thing is -- even if the image is updated at build, as soon as you deploy it, vulnerabilities begin to emerge.

If this is the vulnerability rate for 'latest' Docker images...imagine how many servers in general have vulnerabilities like this.

That's exactly what I'm trying to highlight. After you take that 'latest' image, if you're not applying updates regularly, you are vulnerable from almost day 1.

This also applies to most of the AWS, DigitalOcean, etc. images I have seen. I'll be writing more about how to mitigate this in the next article.

Is there a way for teams with production Docker deployments to easily experiment with this kind of scanning on their own infra to understand their own situation? Maybe worth writing up a quick description of how operators can do something like that.

Absolutely. Docker and Quay.io both offer scanning for repositories they host, there are open source options like Vuls and Clair that are a bit more work to set up, and we have a free plan for up to 5 hosts and for open source projects and schools.

Happy to help if you need a hand.

How do people currently scan their infrastructure to look for vulnerabilities? Do you have a dedicated team that handles this, or is security "everyone's job"?

I'd say that's very organization-specific. Personally, I'd see the maturity curve as:

no one's doing it --> specialists doing it --> everyone's doing it.

With the speed of modern development, everyone should ideally have a good handle on basic security practices, with a specialist team available for more niche requirements.

I did some market research before I started working on Federacy (which began as frustrations I encountered at MoPub/Twitter). It seems that very few companies below hundreds of employees and thousands of servers have a security specialist, and almost no one is running vulnerability analysis in a meaningful way.

So really big companies will/do have teams of VA people, and where they're in a regulated industry subject to things like PCI, I'd expect to see that as well.

However as you say in the small company space, it's very hit or miss as to what effort can be put into this kind of work.

The thing I'd say about services that do package vuln. scanning is that they can be useful, but it's easy to get seduced by absolute-sounding numbers (e.g. a CVSS 10, oh, that must be much worse than a 4).

Unfortunately from what I've seen scoring can be pretty arbitrary (e.g. https://raesene.github.io/blog/2014/11/17/want-to-improve-yo... )

Also, the problems there have been in the CVE space (http://www.theregister.co.uk/2016/05/25/mitre_fighter_deploy...) could reduce the efficacy of that kind of scanning, if there are gaps where vulnerabilities are never entered into the system.

All that's not to say there's no value in that kind of work; it's definitely a piece of the programme, but it's important to keep it in the appropriate context :)

To add a bit of detail here, one of the most surprising things I found, which I'm saving for my next post, is that 24% of recent vulnerabilities in the NVD have no rating, and that doesn't even include the ones that were never posted to the NVD at all.

On top of this, the rating systems used by the different vendors/sources of vulnerabilities are quite different, and, like you mentioned, there's the implicit subjectivity... it's a mess. But a solvable one! That's what I'm working on.

Good luck :) It's an interesting challenge for sure!

Thank you. I'll definitely be reaching out.

You're spot on. These are two of the things I intend to work on next.

Thanks for the links.
