The number of vulnerabilities in an image is not, by itself, very useful information. A long list of vulnerabilities in a Docker image does not necessarily mean anything insecure is going on. Many people don't realize that a vulnerability is usually defined as "has a CVE security advisory", and that CVEs are assigned based on a worst-case evaluation of the bug. As a result, having a CVE in your container tells you very little about your actual exposure. In fact, most of the time you will find that a CVE in some random utility doesn't matter: most CVEs in system packages simply don't apply to most containers' threat models.
Why not? Because an attacker is very unlikely to be able to reach vulnerabilities in these system libraries or utilities. The utilities are usually not in active use in the first place, and even when they are, an attacker is rarely in a position to trigger the vulnerable code path.
Just as an example, a hypothetical outdated version of grep in one of these containers could contain many CVEs. But if your Docker service doesn't use grep, someone would need to manually run grep for those to matter. And an attacker who is able to run grep in your Docker container has already owned you - it makes no difference that your grep is vulnerable! This hypothetical vulnerable version of grep therefore changes nothing about the security of your container, despite contributing many CVEs to the count.
It's the quality of those vulnerabilities that matters: can an attacker actually exploit them to do bad things? For almost all of these CVEs the answer is "no". But that's not really the product Snyk sells - Snyk sells a product that shows you as many vulnerabilities as possible. Every vulnerability scanner vendor believes it provides the most business value (and makes the most money) by reporting as many vulnerabilities as it can. A scanner can certainly help you pinpoint the few vulnerabilities that are exploitable, but that's where your own analysis has to come in.
I'm not saying there's not a lot to improve in terms of container security. There's a whole bunch to improve there. But focusing on quantities like "number of CVEs in an image" is not the solution - it's marketing.
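As a rough illustration of cutting that noise down before doing the manual exploitability analysis - assuming a scanner like Trivy here (the flags below are Trivy's; other scanners have equivalents, and the image name is made up):

```shell
# Drop everything except severe, actually-patchable findings before triage.
# --ignore-unfixed: skip CVEs the distro hasn't shipped a fix for anyway
# --severity:       keep only the worst categories
trivy image --severity HIGH,CRITICAL --ignore-unfixed myorg/myservice:latest
```

The remaining findings still need the "can an attacker actually reach this?" analysis by hand; the filtering just stops the three serious ones drowning in five hundred irrelevant ones.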
Relying on an Alpine/Debian/Ubuntu base helps to get dependencies installed quickly. Docker could have just created their own base distro and some mechanism to track package updates across images, but they did not.
There are guides for making bare containers that contain nothing - no ip, grep, bash - only the bare minimum libraries and requirements to run your service. They are minimal, but incredibly difficult to debug (sysdig still sucks unless you shell out money for the enterprise version).
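A minimal sketch of such a bare container, assuming a statically linked service (the paths and names here are hypothetical, and Go is just one language that makes static builds easy):

```dockerfile
# Build stage: produce a statically linked binary.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /service ./cmd/service

# Final stage: an empty image holding nothing but the binary.
# No shell, no grep, no package manager - nothing for an attacker to live off.
FROM scratch
COPY --from=build /service /service
# Run as a non-root numeric UID (scratch has no /etc/passwd to name users).
USER 1000
ENTRYPOINT ["/service"]
```

The trade-off is exactly the one above: the attack surface is tiny, but you can't `docker exec` a shell into it when something goes wrong.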
I feel like containers are alright, but Docker is a partial dumpster fire. cgroup isolation is good, the crazy way we deal with packages in container systems is not so good.
Sure, if you're just checking base-distro packages for security vulnerabilities, you're going to find security issues that don't apply (e.g. an exploit in libpng even though your container runs nothing that even links to libpng), but that doesn't excuse the whole issue with the way containers are constructed.
I think this space is really open too, for people to find better systems that are also portable: image formats that are easy to build, easy to maintain dependencies for, and easy to run as FreeBSD jails OR Linux cgroup managed containers (Docker for FreeBSD translated images to jails, but it's been unmaintained for years).
> e.g. an exploit in libpng even though your container runs nothing that even links to libpng
it's a problem because some services or APIs could be abused to give the attacker a path to those vulnerable resources, at which point the vulnerability can be exploited regardless of whether your service currently uses it.
I like my images to contain only what's absolutely needed to do the job, and nothing else. It's not so difficult to do, provided people are willing to architect systems from the ground up instead of pulling in a complete Debian or Fedora installation and then removing things (that should be outlawed imho lol). Not only do I get less attack surface, but also smaller updates (which in turn is an incentive to update more often), less complexity, fewer logs, easier auditing (now every log file or even log line might give valid clues), faster incident response, easier troubleshooting... sorry for going on and on.
It's a cultural problem too: people work in environments where it's normal to have every command available on a production system (really?), where there is no barrier to installing anything new that is "required" without discussion & peer review (what are we pair programming for?), and where nobody even tracks the dead weight in production or whether configs are locked down.
I sometimes think many companies lost control over this long ago.  :(
The ops folks I work with banter around the same idea that you're getting at here, that engineers should not have access to the production system they maintain. I'll ask you the question I ask them: when the system has a production outage¹, how am I supposed to debug it, effectively? To do that, I need to be able to introspect the state of the system, and that pretty much necessitates running arbitrary commands.
Even if I'm stripped of such abilities… I write the code. I can just change the code to surface what data I need, and redeploy. That can be incredibly inefficient, as deploying often resets the very state I seek to get at, so I sometimes have to wait for it to recur naturally if I don't have a solid means of reproducing it.
You debug it via tooling, instrumentation and logs. I realize if you're accustomed to sudoing on prod when troubleshooting, this sounds crazy. Trust me, it works fine; better, in fact. Far fewer weird things happen in well-controlled environments.
One of the technical things it does is segregate (on purpose) developers from any kind of production access.
This is because (from historical experience) a "bad apple" developer can do amazingly fraudulent things, and have a more than reasonable chance of covering it up. o_O
We have a platform as a service team which maintains our PaaS infrastructure (think an internal version of Heroku), and they are the only ones who can SSH into any production systems (<50 engineers, I'd guess).
Engineers write the code (mandatory green build and peer review required before merge to protect against bad actors.. but that's just good engineering practice too!), build their own deployment pipelines on Bamboo or Bitbucket Pipelines to push up code & assets to a docker registry, and ultimately the deployment pipelines make some API calls to deploy docker images. Engineers are also responsible for keeping those services running; most products (such as Jira, Confluence, Bitbucket, etc) also have a dedicated SRE team who are focused on improving reliability of services which support that product.
The vast majority (95%) of our production issues are troubleshot by looking at Datadog metrics (from CloudWatch, plus the services themselves publish a great deal of metrics) and Splunk (our services log a lot, we log all traffic, and the host systems also ship their logs off). Fixes are usually to do an automated rollback (part of the PaaS), turn off a feature flag to disable code, redeploy the existing code to fix a transient issue (knowing we'll identify a proper fix in the post-incident review), or in rare cases, roll forward by merging a patch & deploying that (~30 mins turnaround time - but this happens <5% of the time). Good test coverage, fast builds (~5 mins on avg), fast deploys, and automated smoke tests before a new deploy goes live all help a lot in preventing issues in the first place.
It's not perfect, but it works a lot better than you might expect.
Asking because this is one of the conceptual things I've been trying to figure out.
Still currently using non-Docker deployment of production stuff for our public services. I've looked at Docker a few times, but for deployment to the public internet, where being accessible to clients on both IPv4 and IPv6 is mandatory, it just doesn't seem to fit.
Docker (swarm) doesn't seem to do IPv6 at all, and the general networking approach in non-swarm Docker seems insecure as hell for public services - plus it also seems to change arbitrarily between versions. For a (currently) one-man setup, it seems like a bad use of time to have to keep on top of. ;)
Maybe using Nginx on the public services, reverse proxying to not-publicly-accessible docker container hosts would be the right approach instead?
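A hedged sketch of what that could look like (the addresses, ports and hostname are all hypothetical): a public-facing nginx listens on both address families and proxies to a Docker host that is not reachable from the internet at all:

```nginx
# Public edge: dual-stack listeners, backend Docker host stays internal.
server {
    listen 80;
    listen [::]:80;           # IPv6 listener alongside the IPv4 one
    server_name example.com;

    location / {
        proxy_pass http://10.0.0.5:8080;   # internal container host
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

This sidesteps Docker's IPv6 story entirely: only nginx needs to speak v6, and the container networking stays on a private v4 network.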
Btw - asparck.com (as per your profile info) doesn't seem to be online?
The actual PaaS relies pretty heavily on AWS Cloud Formation - it predates swarm, mesosphere, kube, etc. So when we deploy a new version of a service, it's really "deploy an auto scaling group of EC2 instances across several AZs fronted by an ELB, then there's an automatic DNS change made which makes the new stack of EC2 instances live once the new stack is validated as working". The upside of the one service per EC2 image approach is no multi-tenancy interference - downsides are cost and it takes a bit longer to deploy. There's a project underway to switch compute to being Kube-based though, so that's promising.
All this is apples and oranges though - solns for big companies don't make sense for a 1-person shop. I still have side projects being deployed via ansible and git pull instead of Docker, because it hasn't been worth the ROI to upgrade to the latter.
Re asparck - yeah, it was my personal site but I struggled to find the time to work on it. In the end I decided it was better to have it offline than terribly out of date, but hopefully I'll resurrect it some day.
Problems with this are as follows (real, not imagined):
1. AWS cloudformation scripts - who makes them? If dev does, sysads can't change.
2. Does dev have the security mindset to maintain configurations in IaaS things like Cloudformation? Who reviews things like NACLs, Security Groups, VPCs, and the like?
3. Scripts - how big or what impact does a script need to be written by sysad or dev?
4. Oncall - normally a sysad's job, but when you implement strong gates between dev/sysad, you need oncall devs.
Note that I'm not saying devs should have access to every production machine; I'm only saying that access should be granted to devs for what they are responsible for maintaining.
Sure, one can write custom tooling to promote chosen pieces of information out of the system into other, managed systems. E.g., piping logs out to ELK. And we do this. But it often is not sufficient, and production incidents might end up involving information that I did not think to capture at the time I wrote the code.
Certain queries might fail, only on particular data. That data may or may not be logged, and root-causing the failure will necessitate figuring out what that data is.
And it may not be possible to add it to the code at the time of the incident; yes, later one might come back and capture that information in a more formal channel or tool now that one has the benefit of hindsight, but at the time of the outage, priority number one is always to restore the service to a functioning state. Deploying in the middle of that might be risky, or simply make the issue worse, particularly when you do not know what the issue is. (Which, since we're discussing introspecting the system, I think is almost always the part of the outage where you don't yet know what is wrong.)
This is what I always felt was the more appropriate use of "tech debt". You are literally borrowing against tech built by others that likely did not have the same requirements as you.
Is it convenient? Yeah. But it breeds bad choices.
I only have so much time, and very little is budgeted towards things like pushing information into managed systems. I do that when and where I can, but I have never, frankly, gotten sufficient support from management or ops teams to build enough tooling/infrastructure that I could introspect the system adequately during issues without direct access to the system itself.
The only place where I really disagree in principle (that is, what you propose is theoretically possible, just given way more time & money than I have) is unexpected, unanticipated outages, which IMO should be the majority of your outages. Nearly all of our production issues are actual, unforeseen, novel issues with the code; most of them are one-offs, too, as the code is subsequently fixed to prevent recurrence.
But right at the moment it happens, we generally have no idea why something is wrong, and I really don't see a way to figure that out w/o direct access. We surface what we can: e.g., we send logs to Kibana, metrics to Prom/Grafana. But that requires us to have the foresight to send that information, and we do not always get that right; we'd need to be clairvoyant for that. What we don't capture in managed systems requires direct access.
I'm not really disagreeing. There will be "break glass" situations. I just think these situations should be few and far between, and we should be working to make them fewer and farther. Consider, when was the last time you needed physical access to a machine? Used to, folks fought to keep that ability, too.
High-quality testing of your systems, leading to engineers generating playbooks that cover the vast majority of production incidents, is one approach. Designing in metrics that aid debuggability is another. Taken together, this can mean engineers get woken up less often for trivial things.
This isn't impossible. It's not even difficult or complex. It is time-consuming, and definitely requires a shift in mindset on the part of engineers.
For any incident that happens, I'm going to — if at all possible — fix it in code s.t. it doesn't happen again, ever. There is no playbook: the bug is fixed, outright.
That only leaves novel incidents, for which a playbook cannot exist by definition. Had I thought to write a playbook, I would have just fixed the code.
(I am not saying that playbooks can't exist in isolated cases, either, but in the general case of "system is no longer functioning according to specification", you cannot write a playbook for every unknown, since you quite simply can't predict the myriad of ways a system might fail.)
Fair points. But situations do occur that don't fall into either category: a hardware failure, a third-party service failure, or a common process that was mis-applied and needs to be reversed. There is a vast number of potential scenarios that are neither bugs nor novel events, and playbooks can be authored for them. There is non-trivial value in making it easy for an operational team to handle such events, particularly when events that do recur have their handling codified.
You are, of course, absolutely correct that many events either will not recur or cannot be anticipated. But there is also value in recognizing the events outside those categories that can be anticipated and planned for.
Otherwise what's the point of automated testing? Just fix any bugs when they show up and never write tests!
That is a point, but it wasn't at all _my_ point. With "what is available" I was referring to the commands that are installed inside the container, which allow potential breakout of the container once it is compromised.
fwiw there is a breaking point with teams that don't restrict access to the production environment: once too many people have access, it becomes unmanageable.
The busybox image is a good starting point. Take that, then copy in your executables and libraries. If you are willing to go further, you can rather easily compile your own busybox with most utilities stripped out. It's not time-intensive because you only need to do it once, and it takes just an afternoon to figure out how.
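A hedged sketch of that starting point (the binary name and the library are hypothetical; in practice you would list the real dependencies with `ldd` on the binary first):

```dockerfile
# Start from busybox, then add only the service binary and the shared
# libraries it actually needs.
FROM busybox:stable
COPY service /usr/local/bin/service
# Hypothetical dependency discovered via `ldd ./service` on the build host:
COPY lib/libfoo.so.1 /usr/lib/libfoo.so.1
USER 1000
ENTRYPOINT ["/usr/local/bin/service"]
```

Unlike `FROM scratch`, you keep a minimal shell and a handful of utilities for debugging, while still leaving out the whole distro userland.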
"The tooling makes it too easy to do it wrong." Compared to shell scripts with package manager invocations?
Nobody configures a system with just packages: there are always scripts to call, chroots to create, users and groups to create, passwords to set, firewall policies to update, etc.
There are a bunch of ways to create LXC containers: shell scripts, Docker, ansible. Shell scripts preceded Docker: you can write a function that stops at each step, creates an intermediate tarball, and then proceeds (so that you don't have to re-run e.g. debootstrap without a mirror every time you manually test your system build script; so that you can cache build steps that completed successfully).
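That caching idea can be sketched in plain shell (the step names and commands here are made up for illustration - a real script would call debootstrap, chroot, etc.): each completed step leaves a stamp tarball of the rootfs, and re-running the script restores the checkpoint instead of repeating the work.

```shell
#!/bin/sh
# Checkpointed build: each step runs once; on re-runs, completed steps are
# restored from their stamp tarball instead of being executed again.
set -eu

ROOTFS=./rootfs
STAMPS=./stamps
mkdir -p "$ROOTFS" "$STAMPS"

step() {                           # usage: step <name> <command...>
  name=$1; shift
  stamp="$STAMPS/$name.tar"
  if [ -f "$stamp" ]; then
    echo "skip: $name (cached)"
    tar -xf "$stamp" -C "$ROOTFS"  # restore the checkpointed rootfs state
  else
    echo "run:  $name"
    "$@"                           # run the build step
    tar -cf "$stamp" -C "$ROOTFS" .  # checkpoint the result
  fi
}

# Illustrative steps standing in for debootstrap / package installation:
step base     sh -c "echo base-files > $ROOTFS/etc-base"
step packages sh -c "echo 'grep curl' > $ROOTFS/pkg-list"
```

Delete a stamp tarball to force that step (and only that step) to re-run - the same mental model as Docker's layer cache, in twenty lines of POSIX shell.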
With Docker images, the correct thing to do is to extend FROM the image you want to use, build the whole thing yourself, and then tag and store your image in a container repository. Neither should you rely upon months-old liveCD images.
"You should just build containers on busybox." So, no package management? A whole ensemble of custom builds to manually maintain (with no AppArmor or SELinux labels)? Maintainers may prefer for distros to field bug reports for their own common build configurations and known-good package sets. Please don't run as root in a container ("because it's only a container that'll get restarted someday"). Busybox is not a sufficient OS distribution.
It's not the tools, it's how people are choosing to use them. They can, could, and should try and use idempotent package management tasks within their container build scripts; but they don't and that's not Bash/Ash/POSIX's fault either.
This should rebuild all.
There should be an e.g. `apt-get upgrade -y && rm -rf /var/lib/apt/lists` in there somewhere (because base images are usually not totally current (and neither are install ISOs)).
`docker build --no-cache --pull`
You should check that each Dockerfile extends FROM `tag:latest` or the latest version of the tag that you support. It's not magical; you do have to work at it.
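Putting those tips together, a hedged sketch (the base image is illustrative; pick whatever tag you actually track):

```dockerfile
# Track a current base tag rather than a months-old pinned digest.
FROM debian:stable

# Bring the (usually stale) base image up to date, then drop the apt
# metadata so it doesn't bloat the layer.
RUN apt-get update \
 && apt-get upgrade -y \
 && rm -rf /var/lib/apt/lists/*
```

Combined with `docker build --no-cache --pull`, both the base image and the installed package set get refreshed on every rebuild instead of being served from stale caches.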
Also, IMHO, Docker SHOULD NOT create another Linux distribution.
When bootstrapping a VM or container image it's the same principle (imo): it's safer to think once about what is needed and have that locked down and under control, rather than having to keep track of what else was forgotten or might suddenly become a flaw in this ever-changing "threat landscape". E.g. the libpng example from above: what else is lying around that is suddenly vulnerable tomorrow?
If you've got time to compile your own kernel, then why not. It's a process, IMO, and I usually get to that stage once everything else is locked down. Starting with a minimal setup using Alpine (or busybox) instead of Ubuntu or whatever is already a huge reduction in attack surface (and complexity). Next I want restriction of all system calls on a per-process basis (if a syscall isn't whitelisted for this specific process, it's an error). Once this is in place I might think about the kernel, but in my scenarios this is rare. It really depends what your service does (what you want to protect).
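The per-process syscall whitelist can be approximated today with a Docker seccomp profile. A deliberately tiny sketch follows - the syscall list below is far too short for any real service and would have to be built up by tracing the actual process:

```shell
# Write a minimal deny-by-default seccomp profile: any syscall not listed
# returns an error to the process instead of executing.
cat > profile.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "close", "exit", "exit_group",
                "futex", "epoll_wait", "accept4"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
EOF
echo "profile written: $(wc -c < profile.json) bytes"

# Then run the container under it (service name hypothetical):
#   docker run --security-opt seccomp=profile.json myservice
```

Docker ships a default seccomp profile that already blocks the scariest syscalls; this flips the model around to "deny everything except what this one service needs".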
It's imo less work (lower maintenance cost) in the long run to always know what you have (and why), rather than having to think "oh, I didn't know there was foo too - what does foo do?". Because as code changes and you ship updates, you will be attending to updates that suddenly break your setup (and run up against your whitelist) anyway. But doing this review/audit during an upgrade, on a system that suffers from feature and scope creep, is impossible (at least imho).
Generally, the less I have installed, the less I need to track what might suddenly become a problem. Once my container/VM image is the way I want it, I push it to my own Docker registry and only deploy from there to production.
EDIT: really good talk on docker security (not just microservices) https://www.youtube.com/watch?v=346WmxQ5xtk
Sanitary environments seem to be in the extreme minority to me!
With Docker and its procedural dependency declarations ("apt-get install this", "wget | bash that") we've lost all that precious version information and reproducibility.
Imagine if Docker was like Bundler/Maven/… but for Debian/Ubuntu repositories. You could deterministically reproduce an image from an easily auditable package file and basically do an apt-get update on the .lock file to fix the security vulnerabilities in the selected versions.
I think one of the big problems is that people are using insane base images (and by "people" I include myself, because I'm guilty of it too). A Debian distro is not an appropriate base image for a Rails server, for instance.
For what it's worth, though, there is nothing stopping you from using apt the way you want. You just have to do it all by hand (i.e. specify the versions you want when you install them). The Dockerfile is your lockfile. I have been thinking that something like Guix is a much better fit, though. I just haven't gotten around to trying it...
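A hedged sketch of that "Dockerfile as lockfile" idea (the version strings below are hypothetical; `apt-cache policy <pkg>` shows the real candidate versions):

```dockerfile
FROM debian:bookworm-slim

# Pin exact package versions so the image is reproducible until you
# deliberately bump a pin - the apt equivalent of a .lock file entry.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      libpng16-16=1.6.39-2 \
      curl=7.88.1-10 \
 && rm -rf /var/lib/apt/lists/*
```

One caveat: Debian mirrors drop superseded versions, so pins can stop resolving over time unless you also build against an archived snapshot of the repository (snapshot.debian.org exists for exactly this).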
FWIW I made a presentation years ago about this topic (it's in the second half):
Until recently, they didn't follow immutable versioning: a publisher could remove a package at will.
The NPM team looks like they ignored everything every existing package manager before them ever learned. npm just happened to get bundled with NodeJS and became the de facto standard by convenience.
I remember when I first started dabbling in web development from an avionics systems background and I'd just made a little toy express app that did a couple of tricks and I was kind of exploring and naive and took an innocent gander into that specially icon'd node_modules directory and it was filled with other projects and peeking into a few of those I started to get this feeling of dread, the kind that starts in your stomach and slowly diffuses through the rest of you with that leaden feeling of growing hopelessness. Looking back, I should have stopped there, I really should have, but instead I popped open a terminal and, I can still hear each key as my fingers thumped it out: `tree ./node_modules`
Watching the unraveling was me bearing witness to everything I had ever known and loved and had believed in and strived for get thrown naked and defenseless and pathetic into the abyss and still it kept unraveling and still I sat and watched the descent until I no longer knew where I was or even who I was and still the modules unraveled relentlessly.
I spent that night looking into the mirror as if looking into the eyes of a stranger. The person staring back at me was someone I had not only ceased to know but had never known to begin with. And when the sun rose I had changed and I knew then as the sparrows began to chirp into the cold winter twilight spreading light over a now alien and unrecognizable planet that I had lost something I would never regain because to lose it is to simply realize it was never there to begin with.
I steeled my nerves and pulled on my itchy long underwear and buttoned up my plaid button up and I did not put on socks and headed downstairs to the kitchen and put on Now That's What I Call Music 23 and I drank a glass of orange juice and ate my oatmeal and the mix I had hitherto so much enjoyed now struck me as chaotic and ill-thought out and then Abba came on and the song was about a Dancing Queen and I resolved myself once more and brought my dishes to the sink to rinse them and decided there was nothing to do besides async.waterfall the next set of promises only after the necessary precedents resolved (which is how we spoke back in 2016s).
And I found myself back at my machine and worked like hell to rein my terror in because the tree command I had run the previous evening was still printing to stdout and I closed my eyes and blindly mapped my fingers to ctrl+c and felt just a little twinge of reassurance as the reflexive `clear` followed without me having to have consciously willed it and I took a deep breath and scratched my leg because of the itchy long underwear and then I was typing out `npm --i numbButAlive` and by the shift+b,u tap dance I began to feel hope one last time and finger resting slack on the carriage return I couldn't bring myself to push it yet for some reason and I froze and I clearly remember hearing Agnetha at that exact moment singing about that 17 year old dancing queen, "Having the time of your life, oh, see that girl" and then the hope was gone because it wasn't ever there, not really there, like the hopelessness of hoping to feel hope and I felt an anger and a revolt in me rise up and the room filled with my voice "Fuck it, fuck it all!" and I meant it: fuck this world and fuck Abba and fuck Now That's What I Call Music Vol. 3 and up (with the exception of Vol. 
7, maybe) and fuck all the empty dreams of all the fucking vapid nobodies polluting it with their nonsense parade of nonsense distraction after distraction after distraction from the truth that their life will end in death and so will their children's and their children's children and there's no greater significance to that, no hidden meaning or lesson, only death and emptiness and going through life knowing this and knowing how all everything you do to whittle away at the precious little time you've been alotted is a complete waste despite your protests and denials and self-deception and still you keep denying it to your dying breath because here even acceptance is denial in that acceptance is meaningless to that which is what is with or without your fucking acceptance and I found myself laughing a hollow, bitter laugh, a laugh that didn't sound like me, a sound I could never had made until that moment, this moment, a moment without joy or sadness or anything besides utter resignation and defeat and I tacked a `-g` onto the command for no other reason than fuck it, fuck everything and everyone and all their delusional basis for anything and it was easy to push the carriage return now and as I did so and I tasted something new to me then and it must have been my first taste of true freedom in all its terrifying glory, a freedom that leaves nothing to its beholder and is beholden to no one and ultimately sentenced to an eternity imprisoned within itself to as its becoming of self is just a much a defiance to what it should have been free to become.
Can you expand on this? With respect to packages a container is just a mounted file system. A base image is just a way to start with a known file system state. You can do whatever you want to the file system state using whatever package manager you like. In what way should docker have attempted to exert more control over the file system to improve package management?
Docker is my go-to example for worse-is-worse, because of that. They have solved only the easy problems and gotten a phenomenally approachable UI as a result (everyone who has used the console to install dependencies can write their own Dockerfile). But in the process they have occupied the niche in which a better packaging solution could have evolved and grabbed all the mindshare with enormous marketing effort (aided by an easy-to-use product).
You can create images in other ways than Docker or Dockerfiles (ocra-build, img). Other programs can run container images (runc, containerd).
For example, Google's BLAZE/BAZEL build software can directly output a container image (and upload it to a registry) and then you can run that with runc on any platform and you haven't touched Docker or a Dockerfile once.
From the docker.com website ( https://www.docker.com/get-started ):
> Building and deploying new applications is faster with containers. Docker containers wrap up software and its dependencies into a standardized unit for software development that includes everything it needs to run: code, runtime, system tools and libraries. This guarantees that your application will always run the same and makes collaboration as simple as sharing a container image.
Basically I understand that as "write it on my machine, deploy it anywhere". "Everything it needs to run" are the dependencies in my lingo. So for me, all of this is dependency management. I have never asked for a way to drive a file system to a particular state, in the same way that I don't particularly care how a 'node_modules' folder is structured, as long as I can `require` whatever I want inside my programs.
(My point is muddied by the other task docker fulfills for me: Software configuration by creating a directory structure with the following access rights here, writing an nginx config file there. But for me, the ideal scenario would be to reduce the accidental complexity involved in the configuration (I don't care where exactly my program is stored and how exactly it is called, I just want to run it at much reduced privileges and the way I know how to achieve that is to create a custom user and run my program under that user) and define the rest declaratively.)
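The "custom user, reduced privileges" part can at least be expressed declaratively in the Dockerfile rather than managed by hand (names and paths below are hypothetical):

```dockerfile
FROM debian:stable-slim

# Create an unprivileged user and give it only the directory it needs,
# instead of hand-managing permissions on the host.
RUN useradd --system --no-create-home appuser \
 && mkdir -p /var/lib/app \
 && chown appuser:appuser /var/lib/app

COPY --chown=appuser:appuser app /usr/local/bin/app

# Everything from here on runs unprivileged.
USER appuser
ENTRYPOINT ["/usr/local/bin/app"]
```

It doesn't remove the accidental complexity - you still decide where things live - but at least the privilege drop is recorded in one auditable place.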
In another discussion the other day, I heard programming these days compared to slowly transitioning out of the hunter-gatherer phase and into a more structured society. From what I have seen this largely rings true: we are still relying on software that is largely not engineered, but written with loose engineering. The security industry seems largely to be like this too, but with more of a wild-west (as depicted in Westerns) feel to it. Some companies and organizations have structured strategies for security, but even in large organizations like Equifax there's still a kind of "go shoot the bad guys and tie up the gate so the cattle don't get out" aspect to it - very ad hoc.
I am hoping the industry moves more towards engineering things, standardizing interactions, characterizing software modules, etc so that the security industry can spend less time on wild goose chases when trying to figure out how something is supposed to work and how this latest vulnerability applies to that.
Good luck with that. From my PoV, the "industry" has moved away from standardization ever since around 2008, when the "consumerization of IT" became a thing. Almost no meaningful standardization has been carried out in this decade. Previous efforts are being derided (XML bashing etc.) by people who haven't experienced 1990s lock-in. The web is driven ad absurdum with folks standing by and cheering. Enterprises are catering to the uneducated with "REST microservice" spaghetti, and younger devs mostly get their information from random (or even targeted) product blogs on the 'web.
It depends on need and risk.
In theory I agree that there are trade-offs like this, but in practice I rarely see them being applied properly. A small startup using Electron to build a cross-platform app, for instance - I can see how that's a good trade-off. But then you see multi-billion-dollar companies with hundreds of devs and millions of users building Electron apps when they could easily dedicate the resources for native ones.
Security tends to be similar, giant (non-tech) companies with lots of important data are the ones that optimize for cost the most and don't care about the risk.
I'm a pretty big fan of Electron + Cordova to reach a broader base of users. I don't think it's a net bad as a user who prefers the same tools on windows, linux and mac as much as possible.
There are a lot of things you get in the box with a browser-based platform beyond the cross-platform support. I mean, even reflows and alignments working right are far more consistent, more easily achieved. CSS/styling is the same as in the browser, which is very flexible/capable. Some may dislike JS, but it gets the job done.
But on the flip side, I've seen people build an entire application (executable) or service from what could be a simple script in any given scripting language.
If a client doesn't need extensive security, I would strongly advise him to make compromises to push his product out faster.
Professionals should put client needs in first place and help meet them, without making the client run out of money chasing some Utopian image he does not even need.
There is a saying which also expresses the same intent: "life is a series of compromises".
If businesses tend to ignore safety and gamble with human life (in scenarios where that results in net cost savings), I have no illusions that they'll care about data security in cases where human life isn't at risk (again, unless it's shown to reduce overall costs and management can be convinced of that).
This is a really dangerous way of thinking of container security.
First of all, if I give employee A a list of 500 "vulnerabilities" to fix, and employee B a list of three real, serious vulnerabilities to fix, employee A is much more likely to wait for the problem to go away and employee B is much more likely to find the issue approachable and get it resolved. The difference sounds contrived until you realize that employee A was actually given the same three serious vulnerabilities plus 497 "unexploitable" vulnerabilities, and just didn't know which was which because the three serious vulnerabilities got lost in the noise. You need to instill a zero-tolerance culture to make sure that you don't let serious vulnerabilities stick around. Compliance regimes acknowledge this, which is why they're ultimately reasonably effective at getting large, lumbering enterprises to be secure.
Second of all, while much of the open-source software inside your container may not be directly invokable without an exploit, the fact of the matter is that virtually no organization is subjecting every release of software produced in-house to rigorous security auditing. Yeah, your software needs to be pwned before those vulnerabilities matter, but Murphy would like to remind you that your software was being pwned while you wrote your comment and that the attackers exploited the other software in the container while you were reading mine. And maybe it makes a difference, for example, if your containerized service runs under a limited-privilege service user but a vulnerability in adjacently installed software permits the attacker to escalate to root within the container.
You're right in that most orgs probably have lower-hanging fruit that provides better bang for the buck to improve their org's security posture. But adopting an attitude of "meh, not all CVEs are really CVEs" is irresponsible at best.
Isn't "really dangerous" a bit hyperbolic here? I'm describing a process by which you figure out the actual risk of vulnerabilities before treating them further. You can find quotes in my comment to make it look worse than it is, but I was not expecting the process I described to be controversial.
But yes, it's true: I'm advocating a risk-based approach to such vulnerabilities rather than a compliance-based one. I guess which is better depends on organizational fit and personal taste.
I'm also confused what to do with your example of "fixing container vulnerabilities" in this context of base image vulnerabilities. Both employees A and B would have to fix their set of vulnerabilities by either (a) updating the vulnerable base image or (b) switching to a different base image. Fixing base image vulnerabilities is not the pick-and-choose versus all-or-nothing affair you seem to be describing.
The assumption is that we're talking about big-enough orgs here. If you're running a small enough org, take a weekend, put your system in distroless containers and be done with it. Container scanners don't add much value to small orgs to begin with, their whole value proposition is "you run so many containers that you can barely keep track of them, so here's a tool that helps you understand the true state of things." Process and compliance are practically a given.
> either (a) updating the vulnerable base image or (b) switching to a different base image
Non-trivial in practice. Hopefully your org has standardized on a single base image which the org maintains and takes responsibility for, so (b) is a non-starter. If you could just update the base image fleet-wide overnight without issues then we wouldn't need containers in the first place; if you tried to do that twenty years ago you'd instantly cause rolling outages (now you'll roll back after your canary dies, but it's a moot point, you still aren't de-facto instantly updating). Containerization made it easier and safer to deploy services, but it didn't give you a "click here to magically update everything with no risk of rejection" button. Vulnerable services have often been vulnerable for long periods of time, with complicated update paths, possibly needing in-house patches, etc.
If you are working in a more political organization (that is not a value judgment - it often comes with organizational scale) then other things influence your processes. I'm sorry, but that's not the perspective I take. That doesn't make my approach any more dangerous though - I think it's an appropriate perspective and I'm happy I can take it.
The problem here is that your assessment of how a vulnerability might be leveraged or accessed is bound by your own team's limited knowledge.
The reality is, the attacker's knowledge and creativity is more or less unbounded (and unknown). So making that judgment call of what is a real risk, versus having zero tolerance, is a huge gamble IMHO, especially if your teams are not red-team wizards.
Moreover, it seems you're stating that "no tolerance" should focus on having no CVEs in container images. Does the CVE database really have that level of authority for people? It seems like the wrong thing to focus on even in these hypothetical no tolerance situations, I'm really not sure what to tell you there.
But it is useful for maintaining a system if you examine this as an ongoing business process in which you're continually trying to minimize a set of "unknowns". For third-party libraries, I argue it's generally cheaper to get rid of unknowns (when you can) rather than take the time to quantify them. What's left over is easier to prioritize.
As you point out, all these things are probably not vulnerabilities, but they might be. What's the likelihood? Well, by upgrading or patching, the probability becomes zero, and then you can stop caring about exactly what it is. Patch it, move on.
(And, to be clear, some unknowns are related to your code, so you do want to investigate those, but those are presumably the unknowns that you have the most expertise with, so they're much cheaper.)
It often takes more time to assess whether your system is truly vulnerable to a given public exploit than it takes to just grab a newer version of the component.
Also worth considering: getting pwned because of a 0day is no fun, but getting pwned because of an unpatched CVE in your system - priceless.
* The ability to get a containerized app to promote you to an in-app admin
* Getting RCE as the application user
* Escalating from the application user to the container's root
* Going from container root to attacking the host
Each of these represents, broadly, an increase in threat. Each attack can be aided by outdated and vulnerable versions of libraries or utilities. It's not always obvious what in your container can be attacked or used to escalate, and how a developer intends a container to be used isn't always a good guide.
Designing for safety means designing for safe failure. Designing for security means designing for being pwned and minimizing the blast radius. The common term of art is "defense in depth".
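One concrete layer in that defense: running the containerized process as an unprivileged user, so that "RCE as the application user" doesn't immediately equal "container root". A minimal sketch, with image, user, and file names purely illustrative:

```dockerfile
# Hypothetical app image. Creating and switching to an unprivileged user
# means a compromised process starts without root inside the container.
FROM node:10-alpine
WORKDIR /app
COPY . .
RUN addgroup -S app && adduser -S app -G app
USER app
CMD ["node", "server.js"]
```

At run time, flags such as `--read-only` and `--cap-drop=ALL` on `docker run` shrink the blast radius further, addressing the later links in the escalation chain above.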
Yes, this is so true. Almost all of the time, the vulnerability is only a vulnerability under certain very specific scenarios which are not relevant to the project. For example, it might only be a vulnerability if you pass user input to the function, but for all other possible use cases it's not a vulnerability at all... In fact, your system may not even be using that function. Snyk will still mark all dependent modules as vulnerable; but it's a lie. I feel that they are intentionally overzealous just to grab people's attention for their platform... Much like this post, "Top ten most popular docker images each contain at least 30 vulnerabilities" - it's attention-grabbing but it's not true. The real title should be "Top ten most popular docker images each contain at least 30 possible vulnerabilities, none of which are actual vulnerabilities".
I think that Snyk has been very useful for the Node.js ecosystem in terms of encouraging module authors to get rid of unnecessary dependencies but it doesn't change the fact that Snyk is a liar and that we should be cautious with it (some misinformation can be a catalyst for positive change, but too much can be dangerous).
The bad thing about Snyk is that they can only publicly shame open source projects; not commercial solutions (which are usually far worse). They should definitely try to make a distinction between 'vulnerability' and 'possible vulnerability' because it's becoming downright deceptive and it's going to start hurting open source as a whole.
Either they should fix their platform to have fewer false positives, or they should fix their communication around it so that they're not blatantly lying to people and harming the reputation of open source developers who are producing high quality, secure code.
I agree that the article doesn't emphasize what is the actual important point here (it is clickbait-ish), but the numbers they're presenting should (hopefully) trigger people to actually think about putting "continuous eyes" on the container images they're using. Just like you should continuously monitor your code, your application's dependencies, and your host system libraries.
A hacker needs to be right once, you need to be right 100% of the time. That's not marketing.
Current version: 1.5.2-r0
Fixed version: 1.5.3-r0
This actually helps you get actionable data. You can even sort by "stuff I can fix".
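To illustrate the "sort by stuff I can fix" idea: once a scanner reports installed and fixed versions side by side, filtering down to actionable items is trivial. The report below is fabricated sample data; real scanners (Trivy, for instance, has an option to ignore unfixed findings) can do this filtering for you.

```shell
# Hypothetical scanner report: package, installed version, fixed version.
# A "-" in the third column means no fix is available yet.
cat > report.txt <<'EOF'
musl 1.1.20-r3 1.1.20-r4
openssl 1.1.1a-r0 -
busybox 1.29.3-r9 1.29.3-r10
EOF

# Keep only "stuff I can fix": rows where a fixed version exists.
awk '$3 != "-" { print $1 " -> " $3 }' report.txt
```

This prints only the packages with an available upgrade (`musl` and `busybox` in the sample), which is the list you can actually act on today.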
The flip side of "this reports too many vulnerabilities" is "this reports too few vulnerabilities": it should always be made clear that we are talking only about publicly known vulnerabilities, which are a subset of all discovered vulnerabilities, which are in turn a subset of all vulnerabilities.
Side note: this is why it is more difficult (from a security perspective) to run a computer lab than to host a web application. Much greater attack surface area when you have users who have shell access.
Whether an exploit is potentially exploitable locally or remotely doesn't matter by itself. What matters is whether it's actually exploitable by an attacker.
For example, CVE-2017-5645 is a remote code execution vulnerability in Log4j that will light up your vulnerability scanner like a Christmas tree, but requires you to use Log4j functionality that you will never realistically use in an application container.
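For context: CVE-2017-5645 affects Log4j 2's TCP/UDP socket server, which deserializes log events received over the network. A typical application container never starts that server; it just logs locally with a configuration along these lines (a generic sketch, not taken from the article):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Plain console logging: no SocketServer is ever started, so the
     network deserialization path behind CVE-2017-5645 is unreachable
     even though a scanner flags the vulnerable log4j jar. -->
<Configuration status="WARN">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d %p %c - %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
```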
Sure, it would be nice to see some examples where chaining the CVEs from a popular image can lead to an actual attack, but I wouldn't write off the problem just because someone is marketing their product with it.
I usually don’t look into them, since the script rarely gets run and isn’t exposed on the internet, but I do wonder if there could be any real vulnerability.
Taking a risk-based approach doesn't get you to skip out on thinking about any category of vulnerabilities, at any layer of defense.
Am I crazy?
What does this tell us about Docker as an ecosystem? It's amazing tech, to be sure, but I feel like a lot of projects are leaning on "just install the docker image" to avoid the hassles of making flexible, compatible, installable, readable software. If people out in the world can't install your software because it's not compatible with a library they have updated to patch a security vulnerability, then you will hear about it, and maybe get a patch. If people just install your docker image... eh, it works, why bother looking behind the curtain to see how? That's an ecosystem where I would expect to see a lot of bloat and security vulnerabilities creeping in and getting worse over time.
We started in a place of way too little concern about security vulnerabilities. Some environments are still there, but many have been driven by draconian policy to go way overboard.
But my big concern here is, "How do Docker users stay on top of vulnerabilities?" And I worry that for many of them, the answer is that they don't. Or they just update their image when a new version comes out. And the latter answer could actually be a big win for security... provided Docker image maintainers are staying on top of vulnerabilities. Is the Docker infrastructure doing a good job of policing that? Of highlighting images that have known vulnerabilities?
Lots of people are replying that the article doesn't give any details about which vulnerabilities. That's valid, but is Docker giving details about known vulnerabilities?
We aren't saying that it was the good approach to security, we are just saying that fear-mongering articles like that aren't the good approach to security.
That reminds me of the anti-virus software that listed every cookie as a virus.
If exploiting a security vulnerability in a piece of software first requires a security vulnerability that gives you the same rights as the software containing it... yeah, you don't have to care about this. I would be happy to be shown wrong on that statement.
Does that apply to all the vulnerabilities they count? Probably not, but it certainly inflates the number, just like cookies inflate an anti-virus count. Both give a wrong idea of what's actually happening and aren't the solution to a safer world.
Am I crazy to think that we don't have to lie to give security suggestions?
> If people out in the world can't install your software because it's not compatible with a library they have updated to patch a security vulnerability, they you will hear about it, and maybe get a patch. If people just install your docker image... eh, it works, why bother looking behind the curtain and see how?
That's your definition of a good approach to security?
Are you asking if many people looking at your software is likely to improve its security?
The general attitude is that, if you're not saying what the issues actually are, then counting them doesn't tell you anything meaningful.
No. By throwing away dependency management since the beginning, security was thrown under the bus straight away.
Meanwhile some Linux distributions are still doing careful packaging, freezing, vendorization of dependencies, backporting security fixes to stable releases and so on.
- the docker containers are often based on the same distributions
- the distribution keeps receiving security updates that are often neglected by the docker ecosystem
- the distribution is designed to give maximum control to system engineers: not to install (or remove) any package that is not required
- applications are packaged using dynamic linking and without vendored dependencies as much as possible. Vulnerabilities in dependencies are patched once and for all system-wide
- some distributions do peer review and have a very high entry bar to become a member
Well no, but they're not off-base either.
Security is best implemented through a layered approach.
Security is also an exercise in risk analysis and mitigation.
Such an important point.
Unless your job is nothing but security, you better focus on the impact first.
This article is scaremongering with the intent to market a product. The scaremongering itself is really not helpful to the education of the non-security or non-technical people.
Yes, we should strive to be a lot better, and maintaining dependencies is one thing that generally everyone in modern development does badly, but this sort of alarmist post with no concrete examples generally just leads to people ignoring the whole field/industry. If those images are exploitable in the ways they are intended to be used, highlight that. If it's as bad as this post makes it sound, then that should be easy.
Having said that, what can lead to a serious attack on your organization is often subject to a combination of factors. Say there is an SSRF vulnerability in your own application, because the HTTP library you use doesn't parse the URL correctly, so now an attacker can make your application perform arbitrary HTTP requests. But fortunately, the connectivity of your application server is quite limited: an attacker can't reach internal systems or the open internet, the application uses strong authentication for the web services it does talk to, all the good stuff. So now the chance of a successful, serious attack is largely diminished.
Also, it can be quite complicated to know what the exact, real dependencies of your application are. What is the transitive/recursive list of dependencies your application uses? Which of your application's dependencies actually use libraries on your system? And what are _their_ dependencies? I think that cost wise, it is cheaper to make sure your application dependencies, containers, host system libraries, container orchestration tools, etc. are always up to date.
And yeah, I agree that the post doesn't do a good job at all to provide a sane rationale on _why_ you should update. Anyone who has ever administered an operating system knows that security vulnerabilities are found in them every day. But the awareness that a Docker container is subject to the same pace is definitely not present everywhere, and it probably should be.
Just saying that the default node image has 580 vulnerabilities helps no one actually trying to fix these vulnerabilities or assess how to prevent this in the future.
Just try "docker run -it registry.access.redhat.com/rhel7-minimal /bin/bash" and you're good to go...
This page says (for the NodeJS image)
"Before downloading or using this Certified Container, you must agree to both the Red Hat subscription agreement located at redhat.com/licenses and the Red Hat Connect Certified Container Partner’s terms which are referenced by URL in the Partner’s product description and/or included in the Certified Container. If you do not agree with these terms, do not download or use the Certified Container. If you have an existing Red Hat Enterprise Agreement (or other negotiated agreement with Red Hat) with terms that govern subscription services associated with Certified Containers, then your existing agreement will control."
I started doing this around the time we started doing vulnerability scanning and now the containers are both tiny and free of scannable security issues. I recommend that others take this approach if possible, as having too much stuff in your container increases app startup time, storage costs, and your attack surface.
So far this approach seems to work OK, but it feels unnecessarily hacky, and I wish there was better tool support for this kind of thing.
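For reference, the usual route to such a tiny, scan-clean image is a multi-stage build: compile and install dependencies in a full-featured image, then copy only the artifacts into a minimal runtime image. A rough sketch, with image names and paths purely illustrative:

```dockerfile
# Stage 1: build in a full image that has npm, compilers, etc.
FROM node:10 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .

# Stage 2: distroless runtime. No shell, no package manager, and far
# fewer packages for a vulnerability scanner to flag.
FROM gcr.io/distroless/nodejs
COPY --from=build /app /app
WORKDIR /app
CMD ["server.js"]
```

The trade-off mentioned above is real: with no shell in the final image, troubleshooting means attaching a debug sidecar or rebuilding with extra tooling.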
You also find images that contain some weird little tool, apparently solely because the author of the Dockerfile wrote it himself, not because you actually need it.
I would wish that more people would build packages for their distribution of choice and simply install via the package manager, rather than pulling down compile time dependencies and rebuilding things like webservers, databases or frameworks in containers.
We've reached the point where I'm concerned when people around me use images that aren't just base OS images. The quality of images on Docker Hub is all over the place.
Obviously, any reasonable shop will have such a pipeline in place, but DockerHub and the whole ecosystem of docker getting started tutorials seem to really encourage a "set it and forget it" mentality toward a container once it's built and working.
Docker has a lot of problems, but build repeatability is not one of them, in my experience. It makes things really, really frictionless; in some cases way too frictionless.
"Wait what is all th---"
And then the tutorials or documentation just rolls on.... and doesn't really get back to what I should be doing to maintain / secure anything.
Can anyone speak to what the actual attack vector is?
Russ Cox wrote an article about this in 2007, and the situation is still the same. Go doesn't have this problem since Russ Cox is one of the lead authors of Go, and wrote Go's regex library.
- C (and thus everything with a FFI): https://github.com/rust-lang/regex/tree/master/regex-capi
- Go (heh): https://github.com/BurntSushi/rure-go
- Possibly somewhat out of date Python: https://github.com/davidblewett/rure-python
I wasn't trying to say "you should have linked this instead", just trying to point anyone who sees this and goes "that's an issue in my code" in the right direction.
Sometimes there is one, but it's something like if you pipe large amounts of data to socket.io (or whatever else is parsing any input) in a certain way it will hang (often marked as a 7.5/High on CVSS).
And then every once in a while there is an actual, serious vulnerability because you or someone in your dependency chain glued libs together in a certain way and by that point you have already stopped listening to any report looking similar to the two previous examples.
I'm not sure how anyone maintaining a large node app built using current js ecosystem best-practices could ever keep track well enough to actually find the exploitable ones.
It's not perfect, it's maybe not even good, but it's better than nothing.
So to answer your question:
- Really understand what those reports mean and what vulnerabilities apply to your application using those images
- Use a minimal image (like Alpine). I'm not a fan of that solution, because Alpine is so minimal that it makes troubleshooting difficult, and distributions like Ubuntu have competent security teams, which Alpine doesn't have.
- Update the image often and have a CI/CD pipeline that does that for you (with a security scanner).
- Some languages like Go can compile with zero dependencies, so you can use a scratch image that has almost nothing (this brings another set of problems, like updating the app itself when there is a security issue).
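The scratch approach for a Go service looks roughly like this (a sketch; the module path and binary name are placeholders):

```dockerfile
# Stage 1: build a statically linked binary (CGO disabled so it has
# no libc dependency at runtime).
FROM golang:1.12 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: "scratch" is an empty image. No shell, no libc, no system
# packages, so nothing for a scanner to flag besides your own binary.
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

As noted above, the catch is that every fix, including one in your own dependencies, requires rebuilding and redeploying the image, since there is no package manager inside it.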
With Docker, you can attempt defense in depth. Even if someone breaks into an app in a container, it can be very hard to break into other containers or the host.
I suspect that many, maybe most, developers have lower-hanging fruit on the security tree than upgrading deployed docker containers daily.
The removal of a vulnerability which can't be used as a link in the "kill chain" of a hacker attacking your system isn't improving security that much.
curl/oldstable 7.38.0-4+deb8u14 amd64 [upgradable from: 7.38.0-4+deb8u13]
libcurl3/oldstable 7.38.0-4+deb8u14 amd64 [upgradable from: 7.38.0-4+deb8u13]
libcurl3-gnutls/oldstable 7.38.0-4+deb8u14 amd64 [upgradable from: 7.38.0-4+deb8u13]
libcurl4-openssl-dev/oldstable 7.38.0-4+deb8u14 amd64 [upgradable from: 7.38.0-4+deb8u13]
libpq-dev/oldstable 9.4.21-0+deb8u1 amd64 [upgradable from: 9.4.20-0+deb8u1]
libpq5/oldstable 9.4.21-0+deb8u1 amd64 [upgradable from: 9.4.20-0+deb8u1]
libsystemd0/oldstable 215-17+deb8u10 amd64 [upgradable from: 215-17+deb8u9]
libtiff5/oldstable 4.0.3-12.3+deb8u8 amd64 [upgradable from: 4.0.3-12.3+deb8u7]
libtiff5-dev/oldstable 4.0.3-12.3+deb8u8 amd64 [upgradable from: 4.0.3-12.3+deb8u7]
libtiffxx5/oldstable 4.0.3-12.3+deb8u8 amd64 [upgradable from: 4.0.3-12.3+deb8u7]
libudev1/oldstable 215-17+deb8u10 amd64 [upgradable from: 215-17+deb8u9]
systemd/oldstable 215-17+deb8u10 amd64 [upgradable from: 215-17+deb8u9]
systemd-sysv/oldstable 215-17+deb8u10 amd64 [upgradable from: 215-17+deb8u9]
udev/oldstable 215-17+deb8u10 amd64 [upgradable from: 215-17+deb8u9]
The article mentions backports of fixes, so I suppose they don't just blindly compare package version numbers with the versions given in the CVE report. For Debian, they could use the security tracker to know whether a CVE is fixed and in which version (something Alpine is lacking, which makes it difficult to assess the security of Alpine). However, many CVEs are not fixed because the security issue is deemed too minor. A bit more detail about the 500 vulnerabilities would help to understand.
You could obviously do this with cron, as well, but if you already have a CI pipeline managing your base images, it makes sense to set up a recurring build.
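As a sketch, a recurring rebuild in GitHub Actions would look something like this (workflow name, image name, and schedule are placeholders; other CI systems have equivalent cron-style triggers):

```yaml
# .github/workflows/rebuild-base.yml (hypothetical)
name: rebuild-base-image
on:
  schedule:
    - cron: "0 4 * * *"   # rebuild nightly to pick up upstream security fixes
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Rebuilding from the same Dockerfile pulls in the latest patched
      # packages from the upstream base image and distro repositories.
      - run: docker build -t myorg/base:latest .
      - run: docker push myorg/base:latest
```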
The node:10-alpine image is a better option [...]
while no vulnerabilities were detected
in the version of the Alpine image we tested
… and everything loaded by every dependency under any situation which doesn't require admin access. Your server can be fine but if e.g. you process images you have to follow libjpeg, libpng, zlib, littlecms, etc.
Yes, it's a lot better than a full multiuser Unix system where you have to worry about background processes which aren't useful for a dedicated microservice but there's a long history of vulnerabilities in components being combined into successful exploits and it's usually far more expensive to try to analyze those chains than to upgrade.
This brings me to:
> Or same with Redis, which I hope isn't anywhere near the public internet?
That's hopefully true in general, but also consider chained attacks: say you're running a web app and I find a way to run code in the app process. That might be limited, but if I can poke at Redis enough to run code there, I can test whether you were as diligent about sandboxing it. That'll hurt if, say, there was a container exploit which someone delayed patching because they "knew" the app only runs as an unprivileged user.
How fast do these images get updated? How widespread are these vulnerable images?
Maybe we could call it a "base OS image" or maybe "shared libraries."