Hacker News
Docker Hub Hacked – 190k accounts, GitHub tokens revoked, builds disabled
1146 points by lugg on April 27, 2019 | 257 comments
Received this email a few minutes ago:

"On Thursday, April 25th, 2019, we discovered unauthorized access to a single Hub database storing a subset of non-financial user data. Upon discovery, we acted quickly to intervene and secure the site.

We want to update you on what we've learned from our ongoing investigation, including which Hub accounts are impacted, and what actions users should take.

Here is what we’ve learned:

During a brief period of unauthorized access to a Docker Hub database, sensitive data from approximately 190,000 accounts may have been exposed (less than 5% of Hub users). Data includes usernames and hashed passwords for a small percentage of these users, as well as Github and Bitbucket tokens for Docker autobuilds.

Actions to Take:

- We are asking users to change their password on Docker Hub and any other accounts that shared this password.

- For users with autobuilds that may have been impacted, we have revoked GitHub tokens and access keys, and ask that you reconnect to your repositories and check security logs to see if any unexpected actions have taken place.

- You may view security actions on your GitHub or BitBucket accounts to see if any unexpected access has occurred over the past 24 hours - see https://help.github.com/en/articles/reviewing-your-security-log and https://bitbucket.org/blog/new-audit-logs-give-you-the-who-what-when-and-where

- This may affect your ongoing builds from our Automated build service. You may need to unlink and then relink your Github and Bitbucket source provider as described in https://docs.docker.com/docker-hub/builds/link-source/

We are enhancing our overall security processes and reviewing our policies. Additional monitoring tools are now in place.

Our investigation is still ongoing, and we will share more information as it becomes available.

Thank you,

Kent Lamb
Director of Docker Support
info@docker.com"

What permissions did the leaked tokens have?

If they had write access, then leaked personal data is the least of anyone's worries. The real concern is how close the hackers came to infiltrating the image source for virtually every modern microservices system. If you could put a malicious image in say alpine:latest for even a minute, there's no telling how many compromised images would have been built using the base in that time.

Yes, it's a huge poisoning target, made worse by the fact that images/tags are not immutable: you really have no idea what you are fetching straight from Docker Hub, and one pull of a given image/tag may differ from the next. Most people blindly fetch without verifying anyway, from multiple images of varying quality for the same software packages.

Tags are not immutable, but images (manifests) are, much like git commits vs. branches/tags. That is why best practice is to resolve a Docker image tag into its "@sha256:..." digest and pull that instead of the tag. It guarantees that the image you are pulling stays byte-for-byte the same.
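The digest is nothing magical, by the way: it's just the SHA-256 of the raw manifest bytes the registry serves, so you can compute it yourself. A minimal sketch (the manifest content and image name here are made-up stand-ins):

```shell
# The "@sha256:..." part of an image reference is the SHA-256 of the
# raw manifest bytes. Stand-in manifest for illustration:
printf '{"schemaVersion":2,"layers":[]}' > manifest.json

# This is the digest a registry would report for those exact bytes:
digest="sha256:$(sha256sum manifest.json | cut -d' ' -f1)"
echo "pin this reference: myimage@$digest"
```

Once resolved, `docker pull myimage@sha256:...` fetches exactly those bytes every time, no matter where the tag later points.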

How could one verify ?

You can't. Not without end-to-end integrity with non-repudiation. Checksums aren't anywhere near enough. But that's Docker... security optional, and running random, untrusted code from the internet.

And Docker has a signing system but it's only enabled for the official-library builds! So all user images are completely unsigned despite all of the discussions of how secure the notary project might be.

And even if you hosted your own distribution and notary (like we now do for openSUSE and SUSE), you can't force Docker to check the signatures of all images from that server!

Only docker.io/library/* has enforced image signing and the only other option is to globally enforce image signing which means "docker build" will result in unusable images out-of-the-box.

If you look at something like OBS (the Open Build Service that openSUSE and SUSE use to provide packages as well as host user repos), the signing story is far better (and OBS was written a very long time ago). All packages are signed without exception, and each project gets its own key that is managed and rotated by OBS. zypper will scream very loudly if you try to install a package that is unsigned or if the key for a repo changes without the corresponding rollover setup. And keys are associated with projects, so a valid rpm from a different project will also produce a warning and resolution prompt. That's how Docker Hub should've been designed from the start.

(Disclaimer: I work for SUSE, on containers funnily enough.)

The company I work for, Sylabs, is taking what I think to be a pretty great approach to solving this problem. Essentially we've introduced a container image format where the actual runtime filesystem can be cryptographically signed (you can read about that here: https://www.sylabs.io/2018/03/sif-containing-your-containers...). The Singularity container runtime we develop treats this concept of "end-to-end integrity" as a core philosophy. Our docker hub analogue, the container Library, is working to make cryptographic signing one of the fundamentals of a container workflow. We're also actively working on container image encryption, which I think will bump container integrity up a few notches.

> Checksums aren't anywhere near enough.

Why not?

A checksum’s typical use is to detect transmission errors. A cryptographically secure signature is what’s needed.

It uses SHA-256 right? My understanding is that there isn't yet a workable collision attack on the SHA-2 family.

Regardless, I think it's certainly an excellent hardening step.

Infosec in 2019: The server I download code from telling me the hash of said code is "certainly an excellent hardening step".

Well, I was approaching it from the point of view that you verify the image is correct, and then you guarantee you'll always use that image and not some other version, given that tags are mutable.

Can't the hash be verified by the client too?

Sure, but who tells the client what the correct hash is?

If you're the entity who created the image you can retain the original hash and verify it against the downloaded copies. But that kind of defeats the purpose of being able to download docker images across distributed hosts.

They'd really need to be signatures attached to the images, not just hashes.

Why do you need a collision? If you control the build, you control the sha-256 hash. But you can't sign it with a key that you don't have.

A hash only provides integrity. A signature provides integrity and authentication.

Integrity is all you need as long as you have verified the original image that you have saved the hash for.

Is your argument that you only need integrity if you verified the authenticity out of band?

No, I'm saying you only need integrity to validate you are getting the same thing each time. If I checked and made sure an image is safe, then I can save that hash and know that as long as the hash matches, I'm always getting that same safe image.

This is useless without authentication though. You're opening yourself up to attacks on the first retrieve. Sure, you can make sure you're getting the file they want you to have, but you don't know _who_ is giving you that file.

If I hard-code the checksum, then the base image can't be tampered with at least.

You can tamper with data protected by checksums: they are not designed to be irreversible, just fast to calculate and good at detecting errors, not deliberate manipulations.

Use proper cryptography and don't roll your own!

Wouldn't that mean you need to find a collision?

There's a good chance that someone who can modify your base image can also modify the checksum you're showing to whatever is the new checksum.

For example, when Linux Mint's ISOs were briefly backdoored, the attackers also changed the checksum shown on the website: https://www.zdnet.com/article/hacker-hundreds-were-tricked-i...

But that's not the point here. The point is that you choose an image, verify that it is safe, and then pin the hash. So I can pull that hash a thousand times over from whatever source I want and be sure it is always the same image that I originally verified. I don't care who has it or who now has control over the site, because if the image doesn't match the hash, then it isn't mine.

I think y'all are using the terminology differently from each other in this thread. "Checksum" historically did not imply resilience against intentional modifications.

Nowadays, it's arguably a best-practice when designing a new protocol or storage format to simply make all checksums cryptographically strong unless there's a reason not to. I think that might be where the confusion is coming from.

You're right, I confused checksum with hash.

The issue is: how do you verify that the checksum you are using is valid? If you obtain the checksum from the same place you get the image, then an attacker can simply calculate a new checksum for the malicious image and publish it too.

I guess if you were really sure you had obtained a checksum prior to the service compromise, then that would give reasonable assurance the image was not tampered with.
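That attack, and why a pre-compromise pin still catches it, fits in a few lines of shell (file names made up):

```shell
# A checksum published next to the download gets recomputed by the
# attacker; a hash pinned *before* the compromise does not.
echo "genuine image contents" > image.tar
pinned=$(sha256sum image.tar | cut -d' ' -f1)     # saved out-of-band, pre-compromise

echo "malicious image contents" > image.tar       # attacker swaps the file...
published=$(sha256sum image.tar | cut -d' ' -f1)  # ...and republishes a matching checksum

current=$(sha256sum image.tar | cut -d' ' -f1)
[ "$current" = "$published" ] && echo "published checksum: OK (proves nothing)"
[ "$current" = "$pinned" ]    || echo "pinned hash: MISMATCH, tampering detected"
```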

Checksums/fingerprints can help mitigating the problem of _changing_ images people already use. As you correctly point out they don't solve the problem of authenticated distribution.

Assuming you have fetched a given image and captured its sha in a config file in your version control (e.g. a Kubernetes manifest), then whenever you deploy a container you are sure that you're not affected by exploits happening _after_ you saved the fingerprint.

You create the docker image on your local computer, create checksum and write it down / remember it. Then you just use this checksum when downloading the image from other computers to check it's the same one. This only works for images created and uploaded by you of course, for images created by other people it does not work.
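A sketch of that workflow, with a plain file standing in for `docker save` output (names made up):

```shell
# On the machine that built the image:
echo "my image bits" > myimage.tar            # stand-in for `docker save` output
sha256sum myimage.tar > myimage.tar.sha256    # write this down / commit it somewhere trusted

# Later, on any other machine, after fetching myimage.tar:
sha256sum -c myimage.tar.sha256               # fails loudly if the bytes changed
```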

Second preimage mostly, which is harder than collision with most common algorithms (even MD5 might still be secure for this, not that anyone should use it for anything at this point). Collision resistance is only important if someone can create two versions at the same time that have the same hash and wants to quietly switch them later.

Using SHA-256 as you describe works well and is widely used by package systems to ensure that source code being built does not change. Signatures can be better for determining if the initial image is valid if the signing is done away from the distribution infrastructure since development machines might be more secure than distribution infrastructure (and if not you will have problems either way). You still need to get the correct public key to start with. However, if you do have a good image by luck or actually secure enough distribution infrastructure then SHA-256 will certainly let you know that you get the same one later. Signatures almost always sign a cryptographic hash of the message and not the message itself.

I think his point is that for some checksums it could be trivial (and for some, tools already exist). Checksums aren't designed for this, while on the other hand secure hashing is. As a result, authors of hashing algorithms often attempt to mathematically prove their strength and resistance to a collision.

Docker uses SHA256 though, for which it isn't trivial.

Yes. The previous comments were about checksums, which SHA256 is not.

that's most modern programming languages too

Docker Hub's Github integration requires both read and write access for private repositories. In fact, if you want it to set up deploy keys I think it requires Admin access.

See this issue: https://github.com/docker/hub-feedback/issues/873

Maybe some day we'll get serious about reproducible builds, since reproducibility can serve as a layer of defense against such compromises.

Maybe I'm missing something, but reproducible builds wouldn't be that helpful here with write access to the source repo, no?

Definitely wouldn't have helped prevent the compromise.

True, but they allow you to find out whether any Docker images have actually been compromised.

Debian is already using reproducible builds.

I'd like to mention that Docker recently changed their automated builds to require giving them access to GitHub instead of just using a webhook. Glad I disabled access but no telling how long this was undiscovered.

Interesting. What’s their rationale?

I don't know for sure, but I would imagine it has something to do with wanting a unified solution for managing things. But I see a lot of great options, such as setting up a free GitLab pipeline to build and push your image. You don't even have to use Docker: with kaniko you get a Kubernetes-native image builder, and there are great registries that can be deployed in Kubernetes, like Harbor, with automated security scanning. This can all be done in GitLab as well with paid features. I also recommend checking out building and deploying rootless containers for builds.

Pretty sure (don't quote me) those are read only and repo specific but that could contain all sorts of juicy info depending how lax you are with security of configs in private repos.

Even then just read access to code often allows enough info for leveraging/escalating privilege.

When you connect your Github account to Docker Hub, that will give DH full access to all repos (https://i.imgur.com/4jJWrez.png). I'm not even sure if Github's permission model supports adding only read access to private repositories.

I'm not 100% sure if Docker hub uses deploy keys for repos it has access to thru the integration, but at least previously there was an option to manually add one to repository if it couldn't access it otherwise.

> I'm not even sure if Github's permission model supports adding only read access to private repositories.

Their newer GitHub Apps permission model allows fine-grained access to only specific repos (and, e.g., read-only access). However, their older OAuth flow only allows full access to everything. And 99% of GH integrations still seem to use the older authentication method.

This is also something that many CI providers suffer from. There are only few that already support GH apps.

I tried to give a user read-only access to a private repository on GitHub a few weeks ago, and from what I could tell it isn't possible.

Did you look under Settings > Collaborators?

Yep, it's only possible to give full access there. No read only.

Docker Hub being hacked was basically just a question of time.

With how much of the internet blindly pulls images from it, the potential gain from hijacking just one high-profile one would be monumental.

Hacking aside, Docker is an invitation to trouble. Anybody can publish a binary blob, and users are expected to blindly trust it. It's centralized. It doesn't have a concept of "trustworthiness", yet I don't recall Docker ever warning me that the image I'm downloading could have been the work of any random person.

Shortcuts all around -- kind of reminds me of MongoDB. Sad it's the primary player...

> Docker is an invitation to trouble. Anybody can publish a binary blob, and users are expected to blindly trust it

How is this issue specific to Docker? Anyone can download a random library off github, use a shady linux distribution, or install utility tools loaded with spyware.

I don't think Docker aims to solve issues relating to trusting upstream software. It's a tool to help package applications, just like how tar allows you to package files. What you put in it is up to you.

That's a strawman because the shriekingly-obvious difference is ease of facilitation. With Docker, it's a mere couple of commands to pull a box and run it. Docker weakens trust because it lets anonymous people, as well as trusted ones, upload images that can be immediately run... but without a proper chain-of-custody, QA or assurance that an image hasn't been manipulated on Docker's side. It's spray-and-pray DevOps. Image integrity has to be solved on Docker's side with end-to-end integrity or it's all for naught... this is something that cannot be solved within a container or separate from Docker, it must be universal, mandatory and trustworthy.

> Docker weakens trust because it lets anonymous people ... upload images that can be immediately run... but without a proper chain-of-custody, QA or assurance that an image hasn't been manipulated

How is GitHub any different?

Many package managers that support git as a source allow pinning to a specific commit sha. As far as I can see, that's a quite secure way to keep using an uncompromised/verified version. It's not the most popular feature, but people do it every now and then; probably it should be done more.

I wonder if Docker allows this, and on the other hand if that's even feasible for, say, application images, given that applications must be updated a lot for security reasons. Of course, if the Dockerfile's parent reference is not pinned, that only helps to some degree...

You can pull an image using the sha:

docker pull ubuntu@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2

Which effectively nobody does. Package managers and distribution packaging systems default to the safe method rather than defaulting to insecure rewritable tags.

To be fair, the docker.io/library/* images are signed but no other images are and there are a bunch of issues with how the signing policies work for users that want to enforce that some images must be signed.

The important thing is that tags are signed and up-to-date, like how git tags work or how Debian signs its entire repository as a unit (via the Release file) rather than having developers just sign individual packages. Otherwise, even if it's signed, it's subject to downgrade attacks.

Installing known-vulnerable old versions of legitimate software can be just as bad as installing custom malware.

Sure, that's how almost all package managers work. I can't think of a modern package manager from an "enterprise" distribution that didn't have a lot of the features of TUF[+].

And as I said, only official-library Docker images are signed. All other images are unsigned and even for third-party repos you can't force Docker to verify all images from a given repo (you have to enable it globally, which breaks the utility of a local "docker build").

[+] Arch is the only counterexample I can think of and I'm not even sure if my memory is correct.

I do it! Everything I pull is pinned with sha256 since I use Nix/Kubenix, so I'm required to pin sha256 if I'm fetching from the Docker registry (or build the package deterministically myself.)

The way image signing works with Docker is that there is a signature tying a tag to a sha256. If you use the sha256 directly you get immutable sources, but now your source isn't signed anymore -- how are you sure the hash is correct?

It's a bit of a pain, you need to build, push, pull, then get the sha. I suspect it would be done more if there was actually a decent UX for it.

> How is this issue specific to Docker? Anyone can download a random library off github, use a shady linux distribution

Strawman. Anyone can use Debian Stable or at least Testing.

Libraries off Github literally have the source available for you and the community at large to vet. And you'll find almost no sane shop on the planet where people are allowed, hell encouraged to use shady distros or install random utility tools in production the way they are encouraged to pull unchecked binary blobs from Docker Hub in an often non-reproducible manner.

> Libraries off Github literally have the source available for you and the community at large to vet.

Nobody reads the source code, for this exact reason: "the community is here to read it, so I won't".

Well, for most libraries used in desktop Linux, a significant number of stakeholders (developers + users) exist who actually care about the development and the complete thing itself. Also, the libraries are generally designed for solving problems, and not for getting GitHub stars via bots/dependency-building.

For docker (and npm, for all that matters) _a lot_ of important dependencies are basically simple one-off "developments" with a single developer and no userbase at all caring for them, because they don't really solve any consistent problem, being basically just created to increase the visibility of their creator on primitive metrics. The community is there for high-level packages, but the dependencies lurk in test scripts and seldom-used functions, carefully placed by some idiotic digital nomads for their personal CV polishing (ehm, not looking at you: https://github.com/sindresorhus/shebang-regex). Have a look at where this package is used (basically only in cross-spawn, where there are 10 other similar dependencies), then think about how much effort creating the dependency hierarchy was, then look up who contributed the changes where this micro-package was required, and finally decide whether this was something sane people would do or if it's just for personal gain...

> Nobody read the source code

In Debian we review and vet packages.

Speak for yourself, please - there are plenty of places that aren't into cowboy coding.

> Libraries off Github literally have the source available for you and the community at large to vet.

If Github was compromised, it would be easy and obvious to insert malicious code in a repository, but hide those changes from anyone on the github website.

Which you can avoid by forking the mainline repo and depending on your fork.

Images on Docker Hub don't even need to share their Dockerfile, to talk of all the source/etc that went into their build.

>> If Github was compromised, it would be easy and obvious to insert malicious code in a repository

> Which you can avoid by forking the mainline repo and depending on your fork.

If GitHub was compromised, then it would be pretty easy to generate forks with the same compromised code.

> It's centralized

Actually, only short names go to Docker Hub; one can set up their own registry and use it via DNS names.

Example: docker pull quay.io/letsencrypt/letsencrypt

Exactly this. For the Docker images we use in production, we fork the corresponding git repo, build our own image, push it to our own local Docker registry, and pull it from there. It's fairly easy to set up, in fact.

Out of curiosity do you resolve it so that the image is FROM scratch or do you rely on alpine/some other base image?

I forked an ubuntu image and then used it as a base for all my projects. It doesn't come for free though, you will need to periodically run security updates and then rebuild all images that depend on it.

Not docker, but library hosting, in general. My company maintains client libraries for 6 languages, also hosted in the popular place for that language. The standards for account management and authenticating the libraries are all different. Some have scary-little security, some have painful security.

Agreed 100%. It's insane the practices that fly in our industry. It's as if we didn't know any better.

One day, there's going to be a colossal compromise, and that might finally change where we place security in the priority chain.

But it was possible to see this trouble coming from way back. Docker, Inc. took on over $150m of VC investment up to the end of 2015. One hundred and fifty million dollars. How do you possibly plan to show a return on such an investment? The only way is to get one of your "services" injected into people's pipelines as a critical component, no matter how questionable the fundamental necessity of that service is. But of course, you own the tool, so you get to design the workflow and do your best to shape your users' worldview.

I do wish developers would be a little wiser around these things, especially when they see companies taking such huge amounts of capital. I found it quite depressing to watch the unquestioning way development communities assimilated the docker worldview.

I was originally going to argue with it being "just a matter of time" -- there is such a thing as good security practices. It's certainly not "just a matter of time" before Microsoft or Google see such compromises. I'm pretty confident that these companies have their sh*t in order.

But no, not Docker.

You're totally right; with as important as their registry is to well funded attackers, and as startup-y and "agile" as they are, and as godawful as the security practices are that underlie their tools and standards... they hadn't a chance. They still don't.

There is no reason to expect them to get better.

Fun fact, there was a universal XSS vulnerability on google (including search, support, accounts, cloud, etc) found just last week [0]. I'd say it's always just a matter of time. That doesn't mean they don't have everything in order, but securing everything as much as possible is half the battle. The other half is a solid response when things do happen, which we will now see in how Docker handles this situation.

[0] https://twitter.com/WHHackersBR/status/1118393568656334850

And do we ever find out how much that was being exploited "in the wild"?

You'd want the XSS vulnerability to be on accounts.google.com. There's much more to do before you can successfully exploit it: you still need to get people to come to your malicious page that exploits it, and then it's a question of whether your attack will show up on Google's radar for abnormal behavior. Most likely, for Google's security - since their landscape is so big - XSS vulnerabilities are considered a given. Then, as soon as abnormal behavior is detected, Google gets to discover the XSS vulnerability.

we grit our teeth and "believe" that anyone traceably affected got an email directly from the company or something :D

(that said, google main page vulnerable to xss is kind of like... what, we're afraid someone will take over google and put some cryptominers on the google.com main page?)

Well, a compromised google.com main page could return malicious search results for certain queries. How many Windows sysadmins install PuTTY by googling "putty", and then installing an executable from whatever site shows up in the first couple of results?...

If the primary install method is "search and download whatever manually from the internet," you have bigger issues than a potential Google compromise: create a site with better ranking than the canonical HTTP (!) download page, MITM the HTTP download, whatever.

The Microsoft Approach... 'people totally didn't access your email body... except we eventually owned up to it after it got leaked'

Where did they deny that anybody's email bodies were read? I'm looking for it and I can't find it. I only see that they told the other 94%(?) of people that unauthorized access did not reveal the contents of their messages in particular, which seems to be truthful?

Initial email said the body wasn't affected, and motherboard asked for a confirmation, so they said 'Yes'.

6% of the people received a specific email saying the body of their email was accessed and they had to backtrack.

Well the email said:

> This unauthorized access could have allowed unauthorized parties to access and/or view information related to your email account (such as your e-mail address, folder names, the subject lines of e-mails, and the names of other e-mail addresses you communicate with), but not the content of any e-mails or attachments, between January 1st 2019 and March 28th 2019.

Notice it says your email account. The whole email is about the account of the recipient, not those of other recipients. Given that they explicitly worded it this way and people clearly misinterpreted it to mean something else, I hope you can forgive me for being a little skeptical of third-party anecdotes that suggest Microsoft claimed nobody's email contents were accessed...

“We are enhancing our overall security processes and reviewing our policies. Additional monitoring tools are now in place.”

Why wasn’t that the case before?!

Because "Security is a journey, not a destination".

There's always a way to enhance your processes, monitor more indicators, etc. or otherwise improve your security.

Sometimes you don't know what to monitor until you know the attack vector. We are only human

1. Because humans aren’t perfect. 2. Because mistakes happen. 3. Because there’s a cost to everything: if you want better security, it’s going to cost you more, immediately. And we don’t always estimate trade-offs correctly (see points 1 and 2).

Exactly... life in general is about constant refinement. If today's hacker could time travel to 1999, she would be in a nirvana of BIND, SSHv1, Apache, IIS, etc. vulnerabilities. Hacks happen and we learn and improve, even down to the language being used, a la Rust.

You are aware that Google identified a vulnerability so awful that they hid it from the public so as not to draw government scrutiny, did not retain access logs, and ultimately shut down a major public application?

It wasn't authentication credentials, but still.

Which vulnerability was this?

Presumably the Google+ exfiltration issue.

> The bigger problem for Google isn’t the crime, but the cover-up. The vulnerability was fixed in March, but Google didn’t come clean until seven months later when The Wall Street Journal got hold of some of the memos discussing the bug. The company seems to know it messed up — why else nuke an entire social network off the map? — but there’s real confusion about exactly what went wrong and when, a confusion that plays into deeper issues in how tech deals with this kind of privacy slip.


it is of course just a matter of time for either of the companies you mentioned to "be hacked" (obviously it's happened countless times with Microsoft, both the OS and their cloud services like O365, and there was a recent high profile revelation that the google apps suite APIs exposed user info to developers). the difference is incident response and layered security.

as long as you're using software somewhere in the stack that isn't like maturity level 5, AND you don't have constant audits looking for novel attacks on working-as-intended systems, you're pretty much guaranteed to inherit (or create) a vulnerability at some point, and if you're important enough it will get exploited. the reason that doesn't mean we should start modeling computer systems as "living organisms that eventually get old and die" and should keep modeling security like war is that when you get hit, you can respond. all the layers matter, and insofar as Microsoft or Google do it right, they primarily do it right by having a mature process for monitoring, patching, isolating, etc.

as for docker hub though, yeah i'm totally with you. i'm just saying we shouldn't overestimate the preventive capacity of anyone, honestly. if you're doing anything important over the internet at all, you're making some compromises somewhere.

here are 2 links to things i handwaved at above, for example's sake:



It's made worse by the fact that only a few major images are used as bases. That's normally good for security, as they are highly vetted and quickly updated, but if one of them could be compromised, say Alpine or Ubuntu Cloud, even for a minute, countless images would be built using the compromised base and it would be very hard to ensure they were all rebuilt.

As I understand it, there's no element of signing from the actual devs of an image, just from the central trust service of Docker Hub.

I don’t think Docker has a way to revoke individual image hashes. Or does it?

That doesn't mean there aren't plenty of things they could have done to make this more secure. The fact that you can just `docker login` with the same credentials that allow access to your entire registry is pretty poor security design IMO.

Is hacking even needed?

There already have been questionable images hosted there ... just by users uploading compromised images. No hacking needed.

There's a difference between alpineworm:latest and alpine:latest. Someone would have to choose to download the questionable image, while someone compromising a base image could go unnoticed for quite some time and have a massive install base since it's used in so many other images.

This raises the question: were any high-profile images targeted by the infiltrators?

Do they offer end-to-end encryption and signing? That would make your CI say f-off if the image has been tampered with, and also protect any secrets, although there is no need to have source code or passwords in the image anyway.

Their hub website is pretty bad. I tried changing the password and the website came back with an error: Failed to save password. Interesting, so I tried again. This time it said: Current password is incorrect.

I thought, maybe I need to log out and try if the new password works. I clicked on Log Out link, the website has refreshed and I was still logged in.

Yep - same here. :( It changes it, but reports error...

yeh that happened to me when i rotated the password on our master docker hub (or cloud or whatever it is today) account prior to all of this.

Password reset works

Same here.

same issue.

Why am I being asked to change my password? Why haven't they just invalidated it for me already? I'm astounded I was still able to login with my existing password.

It looks like they have sent emails to everyone, not just the 5% affected.

I haven't received an e-mail, I've got multiple docker-hub accounts.

I haven't received one yet either.

Neither have I. I manage over 30 images on dockerhub. Maybe this means they are certain my data was not in the data that was leaked but I'm not sure how they'd be certain of that.

They did just post the notice in a banner at the top of https://hub.docker.com


Well, this is pretty disappointing. Docker doesn’t let you install it without an account, so I registered and used it for maybe a day in all. And poof, there goes my account data.

I’m just hoping that I was using a password manager by then.

Any word as to the cause of this? Was something important stored in plaintext, etc.?

You can brew cask install it on a Mac without an account

> Well, this is pretty disappointing. Docker doesn’t let you install it without an account, so I registered and used it for maybe a day in all. And poof, there goes my account data.

Eh? Doesn’t let you use what without an account?

Anyone can pull images anonymously. An account is only for publishing.

Downloading Docker CE for mac or windows requires an account.


I downloaded it last week without an account. It involved one of those non-obvious skip button dark patterns.

True, but they deliberately obfuscate that to get people to sign up. Not a great look.

This is the opposite of what it actually should be. All the startups of the world, please ask for the least amount of personal information, or none at all -- for all we know, these things are bound to happen.

Installing Docker for Mac/Windows has required users to log in for a while now.

It doesn’t require one to use it. I use it daily on MacOS and I don’t have an account.

It requires one to download the installer: https://hub.docker.com/editions/community/docker-ce-desktop-...

Notice the big "Please Login to Download" button.

True, they make it look required there, but you can use one of the direct download links instead (eg linked from https://docs.docker.com/docker-for-mac/release-notes/) or use Homebrew to install it.

The problem is more that they make it harder to find if you don’t log in, which is really not great. But if you don’t want to create an account there is certainly no need to do so.

They say "accessed database," so I'm thinking SQLi.

SQLi that managed to access only a single shard though? Hm.

It sounds more like a developer environment got exposed with prod data on it.

That's going by the way it's worded: "a single Hub database storing a subset of non-financial user data".

Yeah, I got that vibe too.

Not a huge surprise. Here's another security issue with Docker Hub they've let sit for 4 years with no action: https://github.com/docker/hub-feedback/issues/590 (which is apparently a dupe of https://github.com/docker/hub-feedback/issues/260).

I've seen some failed attempts to log in to my GitHub account from 'Quito, Provincia de Pichincha, Ecuador' (which is quite far from where I live, as I live in Sweden...). Not sure this is related at all, but they started to appear after this leak was announced...

Luckily I use both 2FA and a random password for GitHub; it would suck to lose that account ;)

I wonder if that will encourage them to finally resolve this issue: https://github.com/docker/docker.github.io/issues/6910

Or fix this 4 year old issue where you can't use 2FA for accounts: https://github.com/docker/hub-feedback/issues/358

(Side note: this obviously wouldn't have prevented the current attack)

From the same company that tried to force people to login before downloading Docker CE.

Official Article from Docker (Same Text as the email): https://success.docker.com/article/docker-hub-user-notificat...


That's a nice summary. One thing I'm curious about is:

> Data includes usernames and hashed passwords

How are they hashed? And specifically, can we expect them to be already cracked?

Yes, in particular we need to know algorithm, work factor and salting details to know whether or not the passwords may be compromised.

Just assume that it's compromised and generate a new one. There is no point in wasting time trying to estimate how long it might take someone to crack it.

It matters at the lower extreme. If it was something trivial and people shared the password with another account, then they may already be compromised. If it was hard and salted per-user, they still have to change it, but the chance of compromise on other services is significantly lower.

It may also explain some suspicious behaviour / source of compromise in the past (we know when the issue was uncovered, not when the first dump was taken)

Knowing the hash algorithm, work factor and salting details would be helpful in knowing whether or not passwords may be compromised. This should be standard information given in a breach, rather than just whether passwords were hashed.

Though, as they say that passwords need changing, we can safely assume that their salting, hashing and work factor were insufficient and not following best practice. Just like the lack of 2FA.

Eh, if hashes leaked I would still suggest changing passwords no matter the crypto practices involved. If you change the password, the hash is useless. If you don't, it's still an attack vector, even if a technically impractical one (for now).

Just wondering, genuinely out of curiosity - how does one get to this 5% number? If the attacker had access to the DB s/he had access to 100% user data right?

Or did they get access to a partition of the user data? How is this even possible?

Some very old backup that had only 5% of earliest users?

Some log file which had plain-text creds of approx 5% users?

Or did they discover the attack as it was happening and kicked-out the attacker in the middle of a data download (only 5% complete)?

Their data can be sharded whereas only a part of their databases got compromised. Or it could be a cache layer that got compromised. Or a partial user dump intended for something else that somehow ended up in the wrong hands. I guess there could be a lot of reasonable explanations.

same feelings here. On what basis are they predicting 5%?

A differential backup file would be my guess.

Why can’t these emails just come out and say it: “your account was affected”. It’s always implicit.

Also, why rely on users to change their passwords? Is there a security log I can check?

Should they change your password for you? How do they communicate it securely then? Over unencrypted email, whose password may or may not be the same as your just-compromised Docker account's?

They could invalidate the passwords making you use a 'forgot password' link to enter a new password instead of keeping the old compromised ones :)

Imagine the impact if NPM got hacked instead of Docker Hub. People would go crazy, running the streets like monkeys, yelling that NPM is untrustworthy and must be boycotted. Last time one user got hacked, they blamed NPM for letting it happen. Everyone went crazy...

Both situations are bad, and people are upset over Docker Hub. It just happens to be Friday night so it's not getting as much attention.

NPM is bad because the Javascript ecosystem is fast-moving with loose builds that have thousands of dependencies that are all bundled and run insider consumer's browsers.

I never complained and whined like a baby every time I installed Gnome, for example, using Debian's apt package manager, where it fetches hundreds of packages worth about 1 GB. Do you know how many Linux devs require you to use Lua libraries for a single isolated piece of code, just because they were too lazy to write it in C?

Most distributions' package repos aren't a free-for-all, unlike NPM

It'd be a legit criticism of ruby gems or CPAN, but linux distros are an entirely different kettle of fish, and most of the mainstream distros take security pretty seriously

Yeah, just like Mint, one of the most popular Linux distros, where you had preinstalled malware on your ISO because the servers got hacked. Should I mention the ultra-critical vulnerability in apt that was discovered a few months ago, or that apt doesn't use HTTPS, because it was designed to work with HTTP only in the first place?

Not sure about apt, but this is solvable. Arch's pacman supports https and package signing and only packages signed by trusted maintainers will get installed. That means it should be fairly difficult to swap legit packages for malicious ones and them getting installed.

Not impossible, nothing ever is, but fairly difficult.

APT does https and multiple flavors of signing, the repo maintainer just has to use it

That's why I said most

Wait, designed to work with http only? Link?

I think "http only" is a bit misleading given [1], but I'm no expert. In essence, apt doesn't use HTTPS because it provides limited value for a package manager. However see the link for a more comprehensive explanation.

[1] https://whydoesaptnotusehttps.com

Apt was created 20 years ago, and it's still using the HTTP protocol almost everywhere today. They should have redesigned the whole project and banned HTTP completely IMHO. I'm using HTTPS even on localhost services, for example when I have a project that needs Grafana and InfluxDB.

The cryptography that apt (and other major distro package managers) use is much safer and more useful than TLS. Even if they switched to TLS on all transports, all of the package signing would still be absolutely required in order for package upgrades to be safe. In addition, the package manager should distrust the transport no matter what (in fact, it should be resilient to compromised repo servers).

Now, should apt use TLS by default? Ideally, yes. A secure transport is better than an insecure one regardless of what you're sending through it. But unfortunately it's not as simple as that. Most CDNs charge extra for TLS, and many existing free mirrors of packages don't provide TLS at all. Also, using HTTP allows for proxies to cache packages.

Unfortunately, as we discovered recently, apt had not been distrustful enough of HTTP metadata (which was a pretty big mistake since the entire design of package managers is that they must distrust the transport, especially if it's completely insecure like HTTP).
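The signing model described above can be sketched in a few lines of Python. This is a toy illustration, not apt's actual implementation: the client checks each downloaded payload against a digest from a manifest whose signature it has already verified, so a tampering HTTP mirror is caught regardless of transport. The manifest and package names here are invented for the example.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex digest of a payload, as a package index would record it."""
    return hashlib.sha256(data).hexdigest()

def verify_package(payload: bytes, manifest: dict, name: str) -> bool:
    """Compare the payload's digest against the (already signature-checked)
    manifest entry; any transport-level tampering changes the digest."""
    return manifest.get(name) == sha256_of(payload)

# Toy manifest, standing in for a GPG-signed Release/Packages file.
package = b"pretend .deb contents"
manifest = {"hello_1.0_amd64.deb": sha256_of(package)}

assert verify_package(package, manifest, "hello_1.0_amd64.deb")
assert not verify_package(b"tampered contents", manifest, "hello_1.0_amd64.deb")
```

The transport can be plain HTTP; what matters is that the manifest's signature key is distributed out of band (e.g. shipped with the distro), so a mirror can't forge it.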

I'll admit to being ignorant of apt, as my primary distributions aren't Debian-based, but aren't packages cryptographically signed? If package signatures are validated after download, then it shouldn't matter, right? Edit: I was skimming and shamefully didn't read the grandparent post. The link addresses exactly this point.

You don't need HTTPS if everything is signed appropriately

That was the main reason why it got pwned a few months ago. All main sources use HTTP; it comes as the default. It's your responsibility to switch it to HTTPS. Most of the distros use HTTP by default, except a few that respect privacy and security.



NPM already freaks out many people.

I secretly love NPM. If your open source project’s first code section is “npm i ...” I’m happy.

That's because npm has a history of screwing the pooch.

What did they do specifically? Not saying npm is beyond criticism, but we shouldn't just accept vague and unsubstantiated claims here.

There's a few previous issues, just use the site search here for npm and have a look.

Would have made this a bit clearer to note in the post that this is an email you received, and that you are not Kent Lamb using Hacker News as a medium to distribute Docker announcements, which is what this looks like.

Good point, it does look wrong. Updated.

I can't find an announcement of this anywhere besides HN? Will Docker be publishing info via official mediums?

I assume they will. I only just got the email and it looks like only a small subset of accounts are affected. Or at least that's what that PR spin is supposed to make you think.

I see. I originally thought this was the announcement, as that is what the post indicated.

Yea sorry about that I was more focused on figuring out what needed to be done today and who needed waking up so I just dumped the email.

I hope this doesn't hurt docker too badly. I really like the hub / auto build service.

You aren't the one hurting Docker, they've done that themselves. You put the word out there, so thank you for thinking of everyone else out there.

I got email at work.

It saddens me that docker hub is still lacking FIDO or any 2FA support.

Most companies get 2FA after the damage is done..

That would help very little.

You're probably busy, but you might want to update the splash page on Docker Hub (https://hub.docker.com) to notify users of the incident?

I did not get any email but my github is showing dozens of failed login attempts over the last 3 days.

Sending 190k emails takes time, but please update us here if you don't receive one in a day or so. Curious if their 190k is accurate or downplay spin.

It takes around 2 hours to send ~200k emails if you use an external email gateway and have good outgoing bandwidth.

https://status.docker.com still has no mention. Wonder how long until it does.

That's the wrong place to track a hack. The status page is concerned with uptime, not security.

I disagree, destroying a ton of keys breaks stuff.

They added it.

that's not good...

What are dockerhub's alternatives? No 2FA. That is bad.

As others have stated you could run your own registry or use an alternative service for private repositories, to minimise or eliminate the attack vector.

By replicating the images (or packages) that you need into your own account, you can minimise the possibility of a bad actor replacing a well-known image with something untrusted.

An alternative is to side-cart a service like Notary (https://docs.docker.com/notary/getting_started/) in order to establish a chain of trust for images. If an image gets changed, Docker will refuse to use it and you will be warned that it is untrusted.

Biased opinion on an alternative registry:

- Cloudsmith: https://cloudsmith.io/l/docker-registry/

But you've got other options, such as:

- Self-hosted: https://github.com/docker/distribution)

- Cloud-specific (e.g. ECR, GCR, ACR, etc.)

- Sonatype Nexus: https://www.sonatype.com

- ProGet: https://inedo.com/proget

- Gitlab: https://gitlab.com

- Artifactory: https://jfrog.com/artifactory/

If you're missing the auto-build functionality, this can be achieved reasonably easily with any of the mainstream and awesome CI/CD services out there, such as:

- SemaphoreCI: https://semaphoreci.com/

- CircleCI: https://circleci.com/

- DroneCI: https://drone.io/

Disclaimer: I work for Cloudsmith, and still think Docker Hub is great. :-)

You can run your own private Docker registry but you will still depend upon the base images pulled from hub.docker.com in your deploy chain unless you make sure to clone the base image Dockerfile from github and build it yourself. Even with this protected setup; you still have exposure from poisoned Github repos after this attack because of the compromised Github access keys. I'm not sure you can eliminate this threat, even with third-party services. What a mess.

It might be OK for the Docker Hub aspect at least, with a caveat later on; the GitHub aspect is unfortunate and I completely agree. Direct access to source is rather dangerous territory.

Back to the images bit first:

Base images are only referenced/pulled at build time. So if you've already built your own image and stored it, it'll contain all of the layers necessary to run it without explicitly pulling from Docker Hub.

In the case that you're building new images (likely), it'll need to pull the base images from Docker Hub. However, if you pull the base image(s) from Docker Hub first, you can tag them and store them in your local (or hosted) registry, then refer to those explicitly instead.

For example (using a Cloudsmith hosted registry):

  docker pull alpine:3.8
  docker tag alpine:3.8 docker.cloudsmith.io/your-account/your-repo/alpine:3.8
  docker push docker.cloudsmith.io/your-account/your-repo/alpine:3.8
Now, instead of the usual FROM directive:

  FROM alpine:3.8
You can refer to your own copy of alpine:

  FROM docker.cloudsmith.io/your-account/your-repo/alpine:3.8
As you can see Docker's syntax doesn't make this extremely pleasant, and you'll have to change existing Dockerfiles to point at the base images, but it's certainly possible to mirror your dependencies without rebuilding.

Caveat: The downside is that you have to trust those dependencies at the exact point you pull them down, so I concede it is still not perfect without rebuilding the lot. :-)

Your own repo, AWS ECR, whatever GCP's version is called, and many others.

There are actually very few alternatives for the autobuild part. The only alternative that I'm aware of is Quay, others require you to roll out your own build & push process.

I use drone.io self hosted to build all my images. They then get pushed to a self-hosted hub.

GCP's Cloud Build is also a simple option.

It's not that hard to roll your own (I'm doing that). It's not trivial, but if you need autobuild rather than just tags, it's not a huge time investment either. Some systems have all the necessary stuff exposed as plugins too (for example buildkite)

Autobuilding is really just a free GitLab pipeline.

Gitlab provides public and private projects with their own registry which can host Docker images built by Gitlab's CICD service.

And you can even run your own gitlab instance and don't expose it to the internet.

Nevertheless: The base images will be pulled from dockerhub by default and I am not sure we should trust them. Do we have any better alternatives for this?

on-site nexus is good.

Another reason to have Encrypted Container Images :) https://github.com/opencontainers/image-spec/issues/747

Hmm, my GitHub account is showing failed logins starting from 2 days ago, with none for the remaining period that GitHub shows - no email from Docker yet, but I wonder if this is related?

Why is this publicly posted here and not just on the platform, directly contacting the people affected? Has the HN influence and reach become that big? It could have just been linked.

Well this is fun, I'm now unable to logout. I have a feeling there's more to this incident than Docker is currently disclosing...

This step-by-step "what should I do" checklist might help you review your accounts.


I'm curious, how can a database be accessed without authorization, if authorization is enabled? And how can unauthorized access be discovered?

Say I use MongoDB and enable authorization. Will I be fine then? How do I discover unauthorized access?

Authorization has a common English definition too. If, for example, an employee's credentials were compromised, anyone who wasn't that employee who accessed the database would be considered "without authorization". And checking the access logs for any use of that employee's credentials would give you some idea of what data was accessed. Enabling authorization on your mongodb is good, but it absolutely won't stop all forms of unauthorized access. They may gain access to your server itself, or gain some credentials to your MongoDB database some other way (for example, if someone carelessly ships them as part of your software, or includes them in a github commit, or something like that).

In the worst case, if someone notifies you of a configuration problem or some software bug that allows anonymous access to your database or the ability to remove logs, you may have to assume the entire database was compromised for as long as that configuration issue or software bug existed.
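The log-review idea above can be illustrated with a short sketch. The log format, field layout, and trusted network range here are all hypothetical; real databases and proxies each have their own audit log formats.

```python
# Hypothetical access-log lines: "timestamp user source_ip action..."
log_lines = [
    "2019-04-25T10:02:11Z alice 10.0.3.7 SELECT users",
    "2019-04-25T21:40:05Z alice 203.0.113.99 DUMP users",
]

# Assumption for the example: legitimate access only comes from 10.x (office/VPN).
trusted_networks = ("10.",)

def suspicious(lines, user, trusted):
    """Flag uses of a credential from outside the networks it normally uses."""
    hits = []
    for line in lines:
        ts, who, ip, *action = line.split()
        if who == user and not ip.startswith(trusted):
            hits.append(line)
    return hits

print(suspicious(log_lines, "alice", trusted_networks))
# prints the single DUMP line from 203.0.113.99
```

This only tells you which accesses look anomalous; if the attacker could also delete logs, absence of evidence proves nothing, which is the point of the comment above.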

Is this affecting CircleCI? As far as I know their images pull from Docker Hub.

If the passwords are hashed, what is the likelihood of your passwords being decrypted? I'd also imagine it is a one way hash, since that's typically the norm, so I don't even know how it can get decrypted.

Hashes are not decrypted, they are bruteforced.

> I imagine it is a one way hash

All hashes are one way. If it's lossless and can be reverted, it's a compression algorithm or isomorphism or encryption or cipher or any of a number of other things, but not a hash.

> I don’t even know how it can get decrypted.

It is not decrypted, but brute forced. For example, even if you can't algorithmically figure out what the input to md5sum is that gives you '1f3870be274f6c49b3e31a0c6728957f', you could apply md5sum to every word in the dictionary in a matter of seconds and find out that 'apple', when md5summed, has that output. You would then have one possible password for that hash (though technically there are infinitely many inputs that have that output).

The only way we can know how computationally difficult it is to brute force the password hashes is if we know the following: 1) the hash algorithm used (and other inputs like cost factor) and 2) the entropy of the salt used. Those two together let us calculate the amount of computation needed to try one brute force "guess". An individual password's difficulty to brute force can then be calculated from its entropy (e.g. 'apple' has less entropy than '2SEZb'), to determine the average number of inputs that need to be tried, multiplied by the cost of each attempt. Given that difficulty, you can then estimate how long an attacker will take to find your password by estimating how much computational power they have at their disposal.

In general, if you randomly generate 10+ character passwords and docker used best practices, the answer is that any attacker will not get your password in under a thousand years, and if you use a password which has been leaked before or is a dictionary word (or simple variation), it can be found on the order of minutes to days.
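The 'apple' example above can be run directly. This is a toy dictionary attack with a four-word wordlist; real crackers use wordlists of billions of entries plus GPU hashing, but the principle is the same: nothing is "decrypted", candidates are hashed and compared.

```python
import hashlib

# The md5 digest from the comment above, treated as a leaked hash.
leaked_hash = "1f3870be274f6c49b3e31a0c6728957f"
wordlist = ["banana", "letmein", "apple", "hunter2"]  # tiny stand-in dictionary

def crack(target: str, candidates):
    """Hash every candidate and compare; no decryption is involved."""
    for word in candidates:
        if hashlib.md5(word.encode()).hexdigest() == target:
            return word
    return None

print(crack(leaked_hash, wordlist))  # prints: apple
```

An unsalted fast hash like MD5 makes each guess nearly free, which is why the algorithm and work factor matter so much to the estimate above.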

Hopefully they are salted with a unique per-user value, because if they are using a plain md5sum only, then you're screwed with rainbow tables.

I'm going to go out on a limb and say Docker are unlikely to be using MD5, salted or otherwise.


Rainbow tables are impractical with either salt or more password entropy. If you use actual random passwords of, say, twelve alphanumerics because you have a password generator, then even a bad choice like md5(password) is not practical to attack with brute force or rainbow tables.

The most famous application of rainbow tables is one of Microsoft's family of terrible password hashes, LM. But the reason it breaks that wide open is not just the lack of salt; it's also that the LM hash only works on 7-character passwords, with up to 14 characters supported by doing two entirely separate hashes - so you can craft rainbow tables for all possible 7-character inputs and then reverse the hash.
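A quick sketch of why salting defeats precomputed tables: the same password stored for two different users yields two different digests, so a single precomputed table no longer covers every user at once. (SHA-256 here is just for illustration; a real password store should use a slow hash like bcrypt or PBKDF2.)

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    """Hash the salt together with the password; the salt is stored in the clear."""
    return hashlib.sha256(salt + password.encode()).hexdigest()

salt_a, salt_b = os.urandom(16), os.urandom(16)  # unique per-user salts
h_a = salted_hash("apple", salt_a)
h_b = salted_hash("apple", salt_b)

# Same password, different stored hashes: a precomputed (rainbow) table
# built without the salt matches neither entry.
assert h_a != h_b

# The defender can still verify logins, because the salt sits next to the hash.
assert salted_hash("apple", salt_a) == h_a
```

Salt does not make one specific hash slower to attack; it only forces the attacker to redo the work per user instead of once for everyone.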

It would be helpful if docker would tell us the work factor, algorithm and saltedness of the hashes, so we could know whether they were following best practices. Most people don't.

Docker has revoked GitHub and Bitbucket access tokens, at least as of 27 Apr 2019 18:41:36 UTC


Let me get this right, Docker now forces users to register in order to download their client and they don't secure our data? Insane!

Error 500 when I try to login:

"Sorry, it's not you. It's us, but we are working on it!"

So, would it have been possible that the perpetrators knew about the keys and had built a way to scan them all beforehand? Or is this more likely to be an attempt at farming passwords?

hash+salt please

you guys should partner with github to disable those tokens which have leaked

Does this lessen the relevance that docker has these days?

Did Docker become any less useful for you due to this, or provides less value? Unlikely.

I’m thinking twice about using docker hub.

And the main use case is k8s, so Docker is just an implementation detail; its relevancy is waning IMO.

Docker hub is a centralized service. What we are seeing is the result of having a huge centralized service: if it gets compromised, then many dependencies are compromised.

Some organizations took the risk of running Docker pulling images directly from Docker Hub. They were delegating the security of those images to it.

Some organizations are going to panic now and host their own registry. Which they need to protect as well. But in general it will create a better decentralized ecosystem.

I think this is good for the docker community in general.

We run our own registry that just mirrors images that we want to use and keeps them up to date. It’s not a silver bullet but it works.

I'd worry about mirroring the images because of cases like this, you'd want some sort of triage process before it gets into your environment.

That is great news. Happy security.

I think this is a very ugly incident for Docker.

If you got an email you should:

- Change your password on https://hub.docker.com

- Check https://github.com/settings/security

- Reconnect oauth for Automated Builds

- Roll over affected passwords and API keys stored in private repos / containers

Quick take:

- Password hashes

- Github tokens

- Bitbucket tokens

- Your Automated Builds might need new tokens

Checking my github logs - It looks like they've known about this for at least a full 24 hours. Most people aren't going to have this looked at until Monday which kind of sucks. Hopefully there is more of a postmortem coming.

Is anyone from github able to comment on this as well?

There doesn't seem to be a way for us to tell if a repo was read by these keys over that time period.

Yesterday at 9pm PT my private Github repo produced this notification:

  The following SSH key was added to the foo/bar repository 
  by myorg-dockerhub-user:

  Docker Cloud Build

  If you believe this key was added in error, you can 
  remove the key and disable access...
I wonder if this is related? Dockerhub integration and its keys were still present on Github. In any case, I've revoked everything until the impact becomes clearer.

Can I complain a bit about GitHub? Why can I only authorize my entire GitHub account for third-party access? Couldn't things be slightly better if authorization were done at the repository level?

GitHub provides a way for more granular third-party access: GitHub Apps. There, access can be set on a repository level [1]. E.g. Netlify can be configured as a GitHub app.

It seems like Docker Hub is implemented as an OAuth app [2], where these granular options are not available and you have to grant access to all your repositories.

[1] https://developer.github.com/apps/differences-between-apps/

[2] https://docs.docker.com/docker-hub/builds/link-source/

GitHub could implement per-repo OAuth if they wanted to, though. Or alternatively, can you grant access to a specific organisation? Not sure. The default should be per-repo auth IMO.

I just looked at github OAuth scopes ( https://developer.github.com/apps/building-oauth-apps/unders... )

honest question, what's the point of using OAuth when the authz is so coarse? Why not augment it to have scopes per repo? Is it considered bad practice to have a variable (repo name) as a scope?

IIRC the OAuth2-interfacing application needs to (or at least should) know beforehand exactly what to request access to, so if that's read/write access to all of the user's content, it's trivial. For the external application to know something specific like a particular resource is more complicated to deal with (especially with private/hidden content), so most OAuth providers don't provide that level of granularity. It can be done, it just requires more engineering than most (all?) off-the-shelf OAuth solutions provide, and it's more control than most users actually need.

Holy shit this is a crazy attack vector.

I found this snippet on Docker Hub's Linked Account Settings:

> Service user (or machine/bot account) suggested

> Attaching your personal GitHub or Bitbucket account to this Docker Hub organization will allow other organization owners to create builds from your private repositories. We suggest using a service user (also referred to as a machine user or bot account).

c.f.: https://docs.docker.com/docker-cloud/builds/automated-build/...

Seems worthwhile to do this, if you're an enterprise or otherwise have sensitive private repos. But I agree that it would be better to have an easier per-repo authorization system, since many users won't bother going through the hassle of setting up a service account.

> > Attaching your personal GitHub or Bitbucket account to this Docker Hub organization will allow other organization owners to create builds from your private repositories. We suggest using a service user (also referred to as a machine user or bot account).

> c.f.: https://docs.docker.com/docker-cloud/builds/automated-build/....

Did they remove this language from your link? I don't see it anymore.

Or to take it a step further, let me override which permissions I grant during the OAuth request.

In my case I don't even know why it needs read and WRITE access to ALL repositories. All I want is for it to build one public repository. It doesn't need any special permissions for that at all.

You can authorize specific orgs your account has access to vs your whole account if that's what you're looking for.

Also not sure what access permissions you need but deploy keys are repo level.


Machine users are another option.


Seems that dockerhub is using the github oauth permissions to do three things:

- retrieve a list of all repos to display in the autobuild setup page

- setup webhooks for the gh repo that should be built via dockerhub autobuild

- setup a deploy key for said repo, so that it can be cloned

I removed the dockerhub oauth on github side, after setting up autobuild. My builds on push to master and tag are still working. So it seems possible to remove dockerhubs write access to your github repos after the autobuild setup, which really seems to be a good idea.

At the moment I can't change the password. It fails with "Failed to save password" error, no more information.

EDIT: it finally worked, 4th attempt, and very slowly. Looks like something isn't working 100% as it should

EDIT 2: aaaand I can't login now with the new password. A password reset did work, but it looks like their password database is under some stress at the moment.

My guess is their auth system is/was under a ton of load.

Specifically to make the password database more secure, the generation of password hashes is very computationally intensive by design (e.g. that's the whole point of something like bcrypt vs. sha1)

Password systems really shouldn't be designed to handle a 10x or 100x load without some slowdown. If they could handle that, it means their password DB probably isn't as hardened as it should be.
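That design trade-off can be illustrated with the standard library's PBKDF2 as a stand-in for bcrypt: the iteration count plays the role of the work factor, so every legitimate verification (and every load spike, and every cracking attempt) pays the same per-hash cost. Actual timings depend on the machine.

```python
import hashlib
import time

def hash_password(password: str, salt: bytes, iterations: int) -> bytes:
    # PBKDF2 repeats the underlying hash `iterations` times on purpose,
    # which is exactly what makes a password DB expensive to brute force
    # and an auth system expensive to run under 100x load.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = b"0123456789abcdef"  # would be random and per-user in a real system
for iters in (1_000, 100_000):
    start = time.perf_counter()
    hash_password("correct horse", salt, iters)
    print(f"{iters:>7} iterations: {time.perf_counter() - start:.4f}s")
```

The second line should take roughly 100x as long as the first, which is the slowdown a hardened password store experiences when everyone resets their password at once.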

Password reset worked for me. Trying to change it from the account page did not.

Same can't change password

I could as of 10 minutes ago

Yep - my "deleted by associated Oauth application" event was triggered 2019-04-25 20:12:25 -0400
