FUD. The technology of deployment does not change 'minutes or hours' into 'days or months' - it's management red tape that does that. In fact, in my use case, Docker takes a similar time to build as a normal package (.deb) using an up-to-date base image, but is actually slower to deploy, since now my servers have to download a stupidly large container with build-essential (npm doesn't really survive without it), python (because npm maintainers use python frequently), and graphicsmagick (for the in-house app), instead of 'just the app' that's in a normal package.
If your environment is simple enough that you don't have to be concerned with testing in 'staging' against staging databases or similar, then you're definitely not saving 'days', because your env just isn't that complicated.
I wouldn't say that's true. We're transitioning into multiple languages, and want to have an environment that will allow future languages to be added as required. Building a generic infrastructure to run containers lets us run everything on the same base platform. Otherwise, we'd need to tailor the images and configuration for the individual language type. When a new language is introduced, it can take 'days or months' to get everything working well.
That's not to say Docker doesn't require the same attention to security as other options. This seems to me akin to running a downloaded base VM image without first doing updates.
What do I mean by that? Shared drives.
Seriously: install python$ver plus its dependencies into /mnt/bin and add it to your path. You now have a single (optionally read-only) source for each binary version.
This means you can have many versions of the same software, each compiled in a different way, but because they are on the path, they can be managed transparently. It also means that much of the config management is now in one place, making joining nodes super simple.
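The idea can be sketched in a few lines of shell. The /tmp/mnt prefix and the `tool` stub below are just stand-ins for a real shared mount and a real `./configure --prefix=... && make install` build:

```shell
# Each version gets its own prefix under the shared mount; the stub
# below simulates an installed binary for illustration.
mkdir -p /tmp/mnt/bin/tool-1.2.0/bin
printf '#!/bin/sh\necho tool 1.2.0\n' > /tmp/mnt/bin/tool-1.2.0/bin/tool
chmod +x /tmp/mnt/bin/tool-1.2.0/bin/tool

# Expose each version through a versioned symlink on the PATH, so
# multiple builds of the same software coexist transparently.
ln -sf /tmp/mnt/bin/tool-1.2.0/bin/tool /tmp/mnt/bin/tool-1.2
export PATH="/tmp/mnt/bin:$PATH"
tool-1.2
```

Every node that mounts the share and prepends it to PATH sees the same set of versions, which is the "config management in one place" part.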
No, it is much much MUCH better to actually have an application build with its dependencies and deploy with its dependencies. And you know how you fix issues with security patches? You have a real build system that rebuilds your binaries and you redeploy regularly.
Yes, and now you have to tailor the distribution in the container to the new language. Of course, the impact is smaller than changing one system that contains everything.
However, this problem was solved long before Docker, both by containers (as in OS-level virtualization) and by full virtualization (Xen, KVM, etc.). (Of course, FreeBSD had containers for ages, but they were largely ignored.)
And here we have the prime example, why the Docker-model of building and distributing containers is horrible when it comes to security and maintenance.
Bundling dependencies for production environments has always been and always will be a terrible idea.
> Bundling dependencies for production environments has always been and always will be a terrible idea.
We're considering Docker currently -- not for the distribution model at all, since we'd only ever use our own internally built & maintained images -- but as a clean way to break apart dependencies, and make it possible to run a diverse multiple-server-type environment (production) in miniature (development, demo, UAT).
I quite like the idea of something that may occupy multiple VMs or dedicated servers in production being able to run as a lightweight app in a dev environment, with exactly the same dependencies in place -- that's quite useful.
If this kind of use case is also a terrible idea, I'm interested to hear more -- we're just now tinkering with the idea, and haven't yet moved from theory to practice.
My own concerns revolve around how easy it will be to keep updated on RHEL patches, for example -- apparently we should be able to keep both host and app dependencies updated without much trouble, but it adds more complexity to the maintenance cycle (it seems).
That's exactly the "problem" with Docker: it's deceptively easy to roll out everything as its own containerized app. Updating? Not so much.
It turns Docker from a magical silver bullet into a slightly fancier way to handle reproducible deployments. Using it this way is fine, but not what Docker is marketed as by many.
sudo docker exec -it my_pgsql_container_name /bin/sh -c "apt-get update; apt-get -qqy upgrade; apt-get clean"
For pg, there might be some migration needed when jumping from a major version to the next. Which requires both versions installed, on Debian at least.
Many programs have their state represented as files that are stable across versions. If you have a cluster of the same image with different states it's more efficient to move volume containers across a network. Easier to backup/upgrade too.
pg is going to give you those problems whether you are using Docker or not.
Note: I am not related to Redhat, but we are considering Docker, too. And we are evaluating how would Atomic fit in our infrastructure.
I'd just use an existing one. PaaSes require an enormous amount of work to make them featuresome and robust. That's all work you're spending that isn't user-facing value.
I've worked on Cloud Foundry and so obviously I think it's the bee's knees. You might prefer OpenShift.
If you're happy in the public cloud, you can host on Heroku, Pivotal Web Services (my employer's Cloud Foundry instance) or on Bluemix (IBM's Cloud Foundry instance).
But lucky for you, Docker provides some ways to run commands on an existing image, like the RHEL patching/updating tools. It should be possible to update an image's files using RHEL's patches, as long as the whole RHEL install is there in the images.
As far as breaking apart these sets of files into disparate dependencies: again, it's totally possible, but it does not simplify nor reduce your maintenance complexity.
Now, some really stupid people would recommend you compile applications from source and deploy them on top of RHEL, and basically build all your deps from scratch. You don't want to do that because a large company has already done that for you and put it into a nice little package called an "rpm". You take these RPMs and you find a simple way to unpack them on the filesystem, make a Docker image out of them, label/version them, and keep them in your Docker image hub. Now you have your RHEL patches as individual Docker images and can deploy them willy-nilly.
(This is, of course, exactly the same as maintenance on systems without Docker, and your dev & production environments would be the same with or without Docker, but Docker does make a handy wrapper for deploying and running individual instances with different dependencies)
Because I know what I'd bet on.
Also, official images are not production-ready; they are apparently intended for development purposes. Take the Django image as an example. The server it starts by default is not Gunicorn, or uWSGI, or Apache. It is Django's development server. I can do better than that myself.
I don't think that is a problem with Docker - the application. If Docker - the company - does not have the resources to properly maintain so many official images then it shouldn't try to.
You may very well do better than that yourself. I don't doubt that large proportions of HN users would do better.
But how many will?
Consider that the quality of the official Docker images illustrates what you get from people who are more invested in this than average.
Look at some of the unofficial images, and you will find incredible dreck very quickly.
Now imagine the set of users of images that have not even tried to build their own images yet, and imagine they were asked to put together their own replacements for the official images to use...
Reason being, you can more easily deal with silly things like goofy hosts, goofy networks, possible lack of internet connections, bad host OS support, etc.
The normal downsides of doing it yourself of course apply.
But that's not what I'm questioning; the question is whether homegrown images are, on average, going to do better. Look at the non-official images, and see how much nonsense is in there.
If you know you can do better, by all means do. For many of us that is the best option. And I absolutely wish there was more focus on more secure practices for the official images too. But I still think the official images are likely to be better than what most developers would cook up.
Doesn't mean it's good. Just better than the (terrifying) alternative.
The trick is to get people to care about their security. In theory, this is what open source is about. Why not assemble a taskforce to go and secure these containers?
If Docker apps were somehow integrated with maintained Linux repos, this could be possible by default -- e.g., all Docker images built on Debian stable dependencies would have their internal dependencies auto-upgraded with each Debian stable sub-release, and possibly be flagged as "needs human intervention" on major releases.
Have there been efforts to do anything like this? I'm new to the Docker world....
There needs to be, though, otherwise a "secure app" is always a temporary creation.
It's 2015. If security isn't a priority for a project, then that project is just incompetent. That may sound harsh, but are we really talking about security as optional for internet-facing services? This is what happens when devs build their own systems without the experience of being a sysadmin. There's a lot of kitchen-sink, duct-tape, "does it work? Yes, then we're done" mentality at play here. Not enough people are worrying about maintainability and upgradability.
Heck, most of these things ship with everything running as root. It's like we've regressed to the 90s with Docker and Docker-like technologies.
If you are not bundling dependencies how do you rollback a deploy that migrated to a new version of a dependency? If you rollback your code, you also have to do something to rollback the dependency.
For Python, I currently rebuild a virtualenv from scratch on each deploy, but it just feels like a poor solution. Docker containers seem like an interesting way to package these dependencies in a way that is portable, where a deploy is just pushing a new version of the Docker container. Is there a Better Way(tm) that doesn't involve me needing to deal building OS packages for all of my virtualenv dependencies?
(I'll note that several dependencies have C extensions, and are thus not pure Python -- e.g. `itsdangerous` depends on `pycrypto`, which has extensions.)
That process relies on your platform's own dependency-resolution system, and I hope you're using something sane such as Debian/Ubuntu, or are building from source via Gentoo. RPM distros can work but tend to be far flakier.
Start with a base install, have a package for your own source which specifies deps, including if necessary _maximum_ version numbers for deps, and build the target image. Once that's built, you can generally deploy that directly rather than re-build for each deployed host.
Packaging and image preparation _aren't_ tasks which can be abstracted away entirely. It's this point which the containers craze founders on the reefs of reality. Yes, packaging software properly is a pain. But not packaging it properly is an even bigger pain.
It's 'simple' for me to build a virtualenv in a directory with `pip install -r requirements.txt` in my source repo, but everything I've read about making those virtualenvs portable (even moving them between directories on the same server you built them on) says it is a path fraught with peril.
In other words, the app should be able to bundle dependencies without having to use a crazy opaque container system, and those dependencies should be easily auditable.
This is the case for Java, where dependencies are 1) bundled with the application, 2) declared explicitly, 3) signed, 4) centrally managed with maven repository software.
> It's now feasible to build a new virtualenv on every deploy. The virtualenv can be considered immutable. That is, once it is created, it will never be modified. No more concerns about legacy cruft causing issues with the build.
> This also opens the door to saving previous builds for quick rollbacks in the event of a bad deploy. Rolling back could be as simple as moving a symlink and reloading the Python services.
This is exactly what I do now: a new virtualenv from scratch on each deploy in the same directory with all other build artifacts (so that each deploy is in a self-contained, timestamped directory that is swapped out with a 'current' symlink). I just bite the bullet on the additional time it takes to deploy.
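That symlink scheme fits in a few lines of shell (the directory names here are just examples):

```shell
# Every deploy lands in its own self-contained, timestamped directory.
STAMP=$(date +%Y%m%d%H%M%S)
mkdir -p "releases/$STAMP"
# ... build the fresh virtualenv and other artifacts into releases/$STAMP ...

# Cut over by repointing one symlink; -n stops ln from descending into
# the old link target, so the swap stays a single rename-like step.
ln -sfn "releases/$STAMP" current
```

Rollback is the same `ln -sfn` aimed at the previous release directory, followed by reloading the Python services.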
The part of this blog post that affects me is that upgrading to pip 7 would speed up my deploy times.
This part seems interesting:
> Another possibility is building your wheels in a central location prior to deployment. As long as your build server (or container) matches the OS and architecture of the application servers, you can build the wheels once and distribute them as a tarball (see Armin Ronacher's platter project) or using your own PyPI server. In this scenario, you are guaranteed the packages are an exact match across all your servers. You can also avoid installing build tools and development headers on all your servers because the wheels are pre-compiled.
I've looked at platter a bit, but I haven't really digested what will be needed to migrate to that point, and he doesn't really expand on it.
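As far as I can tell, the flow the quote describes looks roughly like this; the package pin and paths are just examples, and the pip invocations are shown as comments because the exact flags depend on your setup:

```shell
# On a build host that matches the app servers' OS and architecture:
mkdir -p wheelhouse
echo "itsdangerous==0.24" > requirements.txt   # example pin, not a recommendation
# pip wheel -r requirements.txt -w wheelhouse  # compile every dep once

# On each app server: install strictly from the shipped wheelhouse, so no
# compilers, dev headers, or PyPI access are needed at deploy time.
# pip install --no-index --find-links=wheelhouse -r requirements.txt
```

The wheelhouse directory is what you'd tar up (or publish to a private PyPI) and distribute to the servers.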
First, if you look at their own analysis, the number drops from 30% to 23% when limited to only the latest tagged images in the official repository. I'd expect to see a higher rate of vulnerabilities in previous versions... that's why you rebuild. Find me a Linux admin who would accept the claim that their OS is vulnerable when you're citing old, unpatched versions.
Second, they seem to virtually _all_ be package vulnerabilities. These would, ostensibly, reach parity with whatever the target distro is by simply updating packages on a rebuild.
Finally, I think one would be hard pressed to lay any vulnerabilities traced to updated, current packages at the feet of docker. That fault would seem to lie squarely with distro package maintainers.
So, two simple rules would seem to bring the security of container deployment in line with standard bare metal deployment (by the metrics applied in this research):
1. Don't use old shit
2. Rebuild your selected docker container to ensure packages are up to date. Why? See rule #1.
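Rule 2 can be baked into the image build itself; a minimal sketch, assuming a Debian-based base image (the image name and tag are just examples):

```shell
# Hypothetical Dockerfile: apply current security updates at build time,
# so every rebuild picks up whatever the distro has patched since.
cat > Dockerfile <<'EOF'
FROM debian:stable
RUN apt-get update && apt-get -y upgrade && \
    rm -rf /var/lib/apt/lists/*
EOF
# docker build --no-cache -t myapp-base .
# --no-cache forces the RUN step to actually re-run instead of being
# served from a stale layer cache, which would defeat the point.
```

Rebuild on a schedule (or on security announcements) and redeploy, and you're at parity with a routinely patched bare-metal box.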
Personally, my biggest gripe with Dockerhub is that a Dockerfile should be required in order to upload to the hub, and it should show the Dockerfile that produced each version. The fact that people can create fundamentally unreproducible binaries is nasty (there's also the issue of not specifying versions in the apt/yum steps used in the Dockerfiles, but that's just a general problem with the way package management software is designed).
None of that's a problem with Docker itself though.
E.g. you have a consistent, reproducible application environment which _should_ be vetted through a gauntlet of continuous integration, testing, etc. that once created will run identically on any host running docker.
If you have a "trusted source" to do all the grunt work for you, fine. But docker's promise isn't guaranteeing a trusted source. It's providing a consistent, invariant application target from developer laptop -> production host.
Just to clarify, our article was not meant to blame any particular party, but rather to provide awareness of the security vulnerabilities that exist even in the latest official images on Docker Hub.
As you point out, this study specifically focused on the OS package vulnerabilities -- including application-level packages and/or other types of vulnerabilities would increase the percentage of vulnerable images.
As we also mention in the article, rebuilding is a great way to solve some of the problems. However, rebuilding comes at a cost -- the overhead of redeploying the container infrastructure, managing audit trails, potential instability introduced to developer applications, etc. These need to be balanced against the benefits of rebuilding constantly.
1. Don't use old shit
2. Docker should provide a way to tell you you're not running the latest tagged image so you stop running old shit
3. Don't use base images whose maintainers can't be bothered to rebuild when security updates hit
This is assuming you want to trust some 3rd party with the maintenance and security of your production environment.
Docker containers are, usually, just operating systems running a single logical application service. I don't think Docker promises a free Sys Admin. ;)
It's not about trusting a 3rd party with the maintenance and security of your production environment as much as it is "Docker should provide a way to let the people handling the maintenance of your production environment to know shit may be happening". Rebuilding from the 'latest' tag is great. If you know you have to rebuild, and that there's an update available.
And if you have a continuous integration environment building and validating artifacts on every developer commit with a regular, vetted release cycle that catches any regression bugs...
Well, now you're on the right track.
The libtasn1 bug seems to be only relevant if you're using GnuTLS. Again, not great but not the most widely used library either.
Cutting those two out cuts the number of vulnerable images in half, and there's probably a few more rarely used programs with security issues further down the tail. Again, this isn't great, but it's not quite as terrible as the authors are making it out to be.
The user-supplied packages, on the other hand, seem to be quite a bit worse.
More importantly, however, you want updates to be a routine frequent thing so you don't train people to ignore them or let the backlog build up to the point where the size itself becomes a deterrent to updating because too many things will change. If you install updates regularly, you keep changes smaller and keep the focus on the tight reaction time which you'll need for serious vulnerabilities.
We think Docker, and containers in general, is a great way to deploy software -- the speed and agility is so much better than traditional approaches. This also means that we should have sound security practices in place from the very beginning, or else we could easily end up with insecure images floating around in several places (dev laptops to public cloud).
Complete agreement here – Docker's strong points are exactly the things which make patch deployment easier than in legacy environments. Hopefully we'll start seeing orchestration tools which really streamline the rebuild/partial deploy/monitor error rates/deploy more cycle when updates are available.
If Docker Hub is a monetization strategy, I think a lot of people might be willing to pay for that -- though it's weird, because that's a problem golden images themselves created, so maybe it's not fair, and the world would be better if security info was always free. Tracking security updates is hard if you use a lot of deps anyway; this has the benefit of being a central place that can check these things. Most developers shipping software definitely do not track security history for most of their components, and this is a huge opportunity.
Problem gets harder when people get things from outside package managers and vendor stuff though -- which does not help.
I owe Red Hat for a large part of the way I think about things, and I do think the world would be better if package managers were used more extensively for exactly the reason of tracking vendor security. I also realize not everybody can package everything and do like to vendor deps (or similarly use language specific package managers often installed in arbitrary locations) or put them together however (random internet tarballs), and this ironically is why things like Docker also exist too.
The immutable systems movement is good, but something to clean up security practices would be a huge plus to avoid the comparisons to regression back to "golden images". Using random base images vs distro base images makes it worse, but using stale distro images is itself a thing.
However, merely having some packages with vulnerabilities may not be enough to make an image exploitable. E.g. there may be a security issue in the package manager (apt), but you never use it after building the image. Even Shellshock is a non-issue if you don't use CGI scripts and don't allow ssh access.
This problem also exists in virtual machines. I guess it is more about how often you update your software than about Docker itself.
Gotta love those security experts that your company hires when they say to you "your app has a security issue right here" and I say "alright then prove it, hack it, let's see if there really is a security issue" and they can't do it.
If I don't want to worry about deployment, there's Heroku. If I don't want to worry about testing, there's Circle CI. If I don't want to worry about scaling, there's AWS EC2. If I don't want to worry about security, there's... nothing. Because it's not a real product. At least not real in the way databases, deployment, testing and scaling are.
So when people say "programmers don't care about security" I honestly don't understand what they mean since I've never seen a secure app. It's like there's this mob of believers that want to convince you security is the salvation. OK, teach me by showing. Show me a bunch of secure apps and we'll learn from it. But those don't exist, so no one ever learns, but that doesn't keep "security experts" from blaming programmers building real things in the real world for not caring about their imaginary friend.
I'll believe security experts care when they create a service and sell it for money to people like me.
Bank: So what? Nobody knows about those tunnels.
Security Guy: But someone who finds them, like me, but with less morals, could rob you.
Bank: Prove it. Rob the vault.
Security Guy: ..... ?
Finding a vulnerability isn't the same thing as exploiting one, and a lack of exploitation doesn't imply a lack of vulnerability. You also have to consider that only a small portion of vulnerabilities are actually exploitable, but it's a very hard problem to find out which ones are and which ones aren't. Exploiting a single vulnerability is typically harder, in fact, than patching a dozen of them (for example, you can easily switch to a secure alternative to strcpy(), but exploiting strcpy() requires an attacker to smash the stack or ROP their way into full execution).
The bottom line is that you're not only naive if you believe what you just said, but you're doing a huge disservice to anybody who uses any code that you may write.
Why does that never happen? Why are security experts always consultants and they never have a product to sell?
Naive is a person who thinks that just because they are a security expert, programmers will care. No amount of shaming will change that. If you're a security expert, your job is to make this so easy that I almost don't think about it, the way I almost don't think about databases, deployment, testing, or scaling. Getting on your high horse and begging programmers changes nothing.
Just look at RSpec. All of a sudden everyone wants to write tests because it's fun and easy and looks sort of like English. Now we don't have to care much about tests, we just write them and RSpec runs them, collects and reports errors, formats them nicely, tells me the path and the line number where each error occurred, etc. Now imagine you're a "testing expert" and there's no RSpec and you keep yelling at programmers to change their ways, to write and maintain tests, and so on. No one would do it (like few did before the recent craze). So please, learn from that lesson, round up some peers, and contribute to your damn field by letting me forget about it.
HOW DOES THAT MAKE ANY SENSE?!?!
Like it or not, we're stuck on Von Neumann architecture, and as a result, data can be treated as code and vice-versa. The consequence of this is that, under certain circumstances, data can be carefully crafted to act as code, and can be executed in an unforeseen context. As a software engineer, it is your job to take precautions when developing software. Precautions that prevent this execution. Security people do the best they can to make it easy to develop safely, but all of that is useless if the developers ignore it. And, because security vulnerabilities are a manipulation of context-and-program-specific control flow, there's not a way to encapsulate all security measures in a way that is transparent. It's just not possible. Only developers know the specifics of their software, and only developers can protect certain edge cases. If you assert otherwise, you have a fundamental misunderstanding of the systems that you work with, and you need to re-evaluate your education before continuing to work in the industry (assuming you do). This isn't an opinion. This is a fact.
Lastly, us "security experts" do contribute to our field. Security is one of the hard problems in computer science - far harder than whatever you're doing that lets you "not think about databases, deployment, testing, scaling" - and there's a lot of solutions that have been engineered to deal with software that has been created by people like you. There's static code analysis tools, which can detect bugs in code before it is even compiled. There's memory analyzers that can detect dozens of different classes of memory-related bugs by just watching your software run. There's memory allocators and garbage collectors that can prevent issues with use-after-free and other heap-related exploitation bugs at run-time. There's data execution prevention and buffer execution prevention that, at run-time, help prevent code from being executed from data pages. There's EMET and other real-time exploit detection tools that exist outside of your software and can still prevent exploitation. That's not even an exhaustive list. There are literally hundreds of tools out there that make finding and fixing security bugs easy, but those tools can't patch your code for you. That's why there are consultants, code auditors, and penetration-testers that can give advice on how to fix bugs, find bugs where automated tools fail, and even coach developers into writing more secure code; because having smart, security aware developers is one of the major ways to defend against security bugs.
On other people's software as well? Why was it not PostgreSQL's (random example) job to make sure their software rejects invalid input? All it would take is for them to use a typed language (given that the type system in Haskell, for instance, is enough to prevent SQL injection). So tell me, when does it become my job to patch whatever database code I choose because no database ever has concerned itself (it seems) with solving this for everyone else in one fell swoop (so we didn't have to think about it anymore for all these decades of dealing with SQL injection in every language that implements a database driver)?
Before the first million programmers had to write the same damn code to sanitize the input they give to these databases, the database coders should have fixed it themselves. But you weren't there to chastise them, so we didn't get it.
Maybe the "mere mortal" programmers like me would be more excited about security if the industry standard software was also secure (we would want to mimic it, and keep it all secure, and not introduce security problems). No security expert has fixed the SQL injection problem where it should be fixed, but they do charge by the hour to fix it in every company that uses a database.
Vulnerable: the user input is concatenated straight into the query text:
query = "SELECT * FROM USERS WHERE NAME = '" + userinput + "'";
An input like this then rewrites the query itself:
' OR 1=1--
Parameterized: the query text is fixed and the input is bound as data:
query = "SELECT * FROM USERS WHERE NAME = @:USER";
statement = prepare(query, "USER", userinput)
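To see the difference concretely, here's a small runnable sketch using Python's sqlite3 module (the table and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

evil = "' OR 1=1--"

# Concatenation: the input escapes the quotes and rewrites the query,
# so the OR 1=1 matches every row and the -- comments out the rest.
leaked = conn.execute(
    "SELECT * FROM users WHERE name = '" + evil + "'").fetchall()
print(len(leaked))  # 1 row leaked: the whole (only) table

# Parameterized: the driver binds the input purely as data, so it just
# looks for a user literally named "' OR 1=1--" and finds none.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (evil,)).fetchall()
print(len(safe))    # 0 rows
```

The placeholder syntax varies by driver (`?`, `$1`, `%s`, named parameters), but the principle is the same everywhere.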
Also, just to be pedantic, I'll point out that a type system wouldn't change how SQL injects currently work, lol, no clue how you think that's the case, but I wouldn't put it past you at this point.
Just to be pedantic, I'll point out that maybe your C and C++ "type" system wouldn't change how SQL injects currently work, lol, but the one I use can avoid not just SQL injection but XSS attacks: http://www.yesodweb.com/page/about
I'll say it again, you're wasting your time staying in that small rickety photocopy room called C/C++. But I wouldn't put it past you at this point. Whatever that means, hahah.
And I never said anything about any C/C++ type system doing anything? But okay.
Back to the topic: if you've heard of them, why did you insist that SQL is inherently insecure? Did you forget they existed, or did you just think I wouldn't notice? Are you that cocky?
I really hope your employer one day recognizes your incompetence and fires you, because the software world is plagued with enough bugs without people like you purposely and gladly laying out a red carpet for them to walk in on. I can't continue to argue with what is either a relentless geyser of misinformation or a brilliant troll, so I'm done. Maybe one day you'll come to your senses, but I doubt it.
Well guess what, you don't need pre-compiled statements to benefit from this feature - all you need is the hoisting aspect of it. In other words, if SQL drivers did not offer the unsafe function exec_query that takes the whole query as a string and returns a result, and instead they only exposed a hoisted version of that function that takes a list of arguments and a placeholder query as a string...
exec_query ["john", 12] "SELECT ... WHERE... = $1 AND ... = $2"
So if only SQL database drivers did not offer exec_query but instead forced the user to provide the whole query string in one go with placeholders, then the driver would be able to enforce security at the proper software layer - which is not everyone's program that interacts with a database.
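A minimal sketch of such a restricted driver surface, here wrapped around Python's sqlite3 (the class and method names are hypothetical):

```python
import sqlite3

class SafeDB:
    """Driver facade that only exposes the placeholder form: there is
    deliberately no method that accepts a query with data spliced in."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")

    def exec_query(self, args, query):
        # args are always bound by the driver itself; since this is the
        # only entry point, callers can't concatenate values into SQL.
        return self._conn.execute(query, args).fetchall()

db = SafeDB()
db.exec_query([], "CREATE TABLE t (name TEXT, age INTEGER)")
db.exec_query(["john", 12], "INSERT INTO t VALUES (?, ?)")
rows = db.exec_query(["john"], "SELECT age FROM t WHERE name = ?")
print(rows)  # [(12,)]
```

Nothing stops a determined user from building strings anyway, but making the safe form the only convenient form is exactly the layering the comment argues for.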
The flag might make sense on a new vulnerability, and it could be applied automatically. Imagine [Tag: Heartbleed - Untested] when the vulnerability happened, then as the automated process rolls through the images [Tag: Heartbleed - vulnerable] [Tag: Heartbleed - no vulnerability detected]. Future images are required to pass first.
We have to be careful with widely distributed images.
You have to be a little bit careful when it comes to version numbers and matching them to security issues. Most Linux distributions, for example, apply security patches to older releases.
E.g. Ubuntu 14.04 LTS comes with Apache 2.4.7-1ubuntu4.4, which one might parse as 2.4.7, a version with multiple known security issues upstream.
The article references distribution-specific vulnerability ratings, so I assume they also matched those versions correctly.
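The split is easy to see mechanically (a small Python illustration):

```python
# A distro package version carries more than the upstream number:
# "<upstream>-<revision>", where the revision encodes distro patching.
pkg_version = "2.4.7-1ubuntu4.4"   # the Apache-on-14.04 example above
upstream, _, revision = pkg_version.partition("-")
print(upstream)   # 2.4.7 -- matching CVEs on this alone is misleading
print(revision)   # 1ubuntu4.4 -- the fix may have been backported here
```

A CVE scanner that ignores the revision will flag packages whose vulnerabilities the distro already backported fixes for.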
Ultimately, keeping your OS completely up to date is on you: not Docker, not Amazon, you. VMs suffer from the exact same problems as Docker containers.
Edit: Also, the security issues with using community AMIs are already well known; it should be no surprise that the same applies to Docker community images.
I thought it was obvious that public images on Docker Hub were to be used for experimentation only--even in that case I only use the "official" Docker images in the library namespace. Anyone using Docker for serious purposes should build their own or at least vet the pre-built images.
Docker IMO creates a "never touch a running system" attitude. The "running system" in this case is the Docker image, which nobody dares touch after the developer has left the company (or which the developers themselves no longer understand three weeks later).
Also, the overhead of setting up containers in a secure way is even more work than not using Docker in the first place (ever had to look seriously into SELinux? Not something you do casually on the side, as it's massively complex).
So the justification that "by using docker we save time on deployment" is a farce. I guess it creates new jobs though for container specialists.
to paraphrase Theo de Raadt:
“You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can’t write operating systems or applications without security holes, can then turn around and suddenly write virtualization layers without security holes.”
EDIT: is it still possible in Docker/LXD to access /proc/sys/kernel/panic or /sys/class/thermal/cooling_device0/cur_state ? And how about consuming all the entropy of the host via /dev/random ?
Looking at the top vulnerability, CVE-2014-9462 in Mercurial:
it affects Mercurial clients that access crafted repositories, as far as I understand.
Even if I use Mercurial in my Docker image to fetch my app rather than prepackaging it (which is what I do), and I know this is about public images, how is this a "high" vulnerability? I don't deny it's one; I would just like to learn why it is classified as high if, e.g., I use Docker for my HAProxy.