MDN Database Disclosure (blog.mozilla.org)
73 points by diegocr on Aug 2, 2014 | 46 comments

I have been wondering who leaked my address ever since I started getting the "E.N.L.A.R.G.E...Y.O.U.R....." spam about a month ago.

Initially I thought that it might have been my fault for entering the email address where I shouldn't have. I am disappointed that a process which exposes internal data externally is even architecturally possible at Mozilla.

Also, this has raised a question. Almost everybody knows that passwords must be hashed and salted. But I haven't seen encrypted email addresses anywhere. Are there any strongly negative consequences to encrypting sensitive personal data in databases?

I don't think encrypting an email address would create any issues. In fact, if I were to provide a service, I'd store a hash of the email address for easy authentication (i.e. hash the email given during a login and compare it with the stored hash) and one encrypted version of the email address so I can use it if needed (to inform the user or whatever).
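A minimal sketch of the lookup half of that idea, under stated assumptions: it uses a keyed hash (HMAC-SHA256) instead of a plain hash, because email addresses are guessable and an unkeyed hash can be tested against a list of candidate addresses. `LOOKUP_KEY`, `email_lookup_token`, and `find_user` are hypothetical names, and lowercasing the whole address is a simplification. The encrypted copy would need a vetted encryption library and is not shown.

```python
import hashlib
import hmac

# Hypothetical secret; in a real system it would be stored outside the database.
LOOKUP_KEY = b"example-key-not-for-production"

def email_lookup_token(email, key=LOOKUP_KEY):
    """Return a keyed hash of the normalized address, usable for equality lookups."""
    normalized = email.strip().lower()  # simplifying assumption about normalization
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Simulated user table: only the token is stored for lookup purposes.
stored = {email_lookup_token("Alice@example.com"): "user-1"}

def find_user(email):
    """Hash the address given at login and compare with the stored token."""
    return stored.get(email_lookup_token(email))
```

As the reply below points out, this only helps if the key is harder to obtain than the database itself.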

I started getting a lot of spam about one month ago too and even emailed LastPass a bit angry. But this Mozilla incident could well be the cause of the spam...

The encrypted e-mail address has to be read somehow, so it's just as likely that an attacker gets the decryption key as the database itself (unless you use e.g. a hardware security module). That's probably good enough for e-mail addresses, but as you likely know, not acceptable for passwords.

Can someone explain the meaning of "data sanitization process of the site database had been failing"?

Isn't that another way of saying SQL injection?

I wondered about this too. At a guess, perhaps they were doing a straight database dump from a production system which had sensitive information as well as public data. They would then run a script to delete the sensitive columns before posting the dump.

This seems likely to have been broken at the design stage: systems should fail safe. The first-order fix might be to check the return value of the sanitizer script and refuse to upload if it failed. But a better solution would be to write a system which makes it much less likely to leak private data. For example by copying only whitelisted columns (so if new sensitive columns are added to the system they are not dumped by default). Or storing sensitive data in separate tables or even a separate database (this will take more work if levels of sensitivity change over time).
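The allowlist approach can be sketched as follows. This is an illustration using an in-memory SQLite database, not Mozilla's actual schema or tooling; the `users` table, its columns, and the names `PUBLIC_COLUMNS` and `export_public` are all assumptions made for the example.

```python
import sqlite3

# Hypothetical allowlist: table name -> columns considered safe to publish.
PUBLIC_COLUMNS = {"users": ["id", "username"]}

def export_public(conn, allowlist=PUBLIC_COLUMNS):
    """Copy only allowlisted columns; anything not listed is never read."""
    out = {}
    for table, columns in allowlist.items():
        cols = ", ".join(columns)
        # Identifiers come from the fixed allowlist above, never from user input.
        rows = conn.execute(f"SELECT {cols} FROM {table}").fetchall()
        out[table] = [dict(zip(columns, row)) for row in rows]
    return out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, email TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com', 'sha256$salt$hash')")
dump = export_public(conn)
```

If a new sensitive column is added later, it stays out of the export until someone deliberately adds it to the allowlist, which is the fail-safe property described above.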

I've speculated here about the details to illustrate the point about systems design. Unfortunately, too often the glue code for these sorts of things is written with little or no error checking, so when something is wrong the system just proceeds through unknown or unvalidated states as we see here. It doesn't help that the default language for cron and a lot of "supervisory" jobs tends to be Bash (or Dash) these days, where error checking is turned off by default.

Yep, naive bash scripts don't stop on failure and your non-sanitized file will happily get uploaded.
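For comparison, here is a rough sketch of the same "refuse to upload if sanitizing failed" rule written in Python instead of shell. The commands are stand-ins (a Python one-liner that exits with a chosen status), and `publish_dump` and the `upload` callback are hypothetical names, not anything from Mozilla's pipeline.

```python
import subprocess
import sys

def publish_dump(sanitize_cmd, upload):
    """Run the sanitizer; call upload() only if it exited with status 0."""
    try:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(sanitize_cmd, check=True)
    except subprocess.CalledProcessError:
        return False  # fail safe: nothing is uploaded
    upload()
    return True

uploaded = []
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]  # stand-in for a broken sanitizer
passing = [sys.executable, "-c", "pass"]                     # stand-in for a working sanitizer

result_fail = publish_dump(failing, lambda: uploaded.append("unsanitized"))
result_ok = publish_dump(passing, lambda: uploaded.append("sanitized"))
```

Only the second call reaches the upload step, so a failed sanitizer blocks publication instead of being silently ignored.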

Does anyone know why Mozilla was posting database dumps (sanitized or otherwise) onto public servers?

It's useful to contributors who work on the site development.

Every unattended shell script should start with this:

    set -e

No, see jvehent's comment:

"an automated data sanitization process failed, emails and salted hashed passwords were disclosed. no server was hacked."

I don't understand why they had to do this, couldn't they just use a schema dump with random data? They are already setting the passwords to null and names to a random number in their sanitization script...
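A schema-plus-random-data copy could look roughly like the sketch below. It uses SQLite for self-containment and assumes simple tables with only integer and text columns; `synthetic_copy` and the sample `users` table are hypothetical. No row from the source database is ever read, only its schema.

```python
import random
import sqlite3
import string

def synthetic_copy(source, rows=3, seed=0):
    """Recreate the source schema in a fresh DB and fill it with random values."""
    rng = random.Random(seed)
    target = sqlite3.connect(":memory:")
    tables = source.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for name, create_sql in tables:
        target.execute(create_sql)  # schema only, no data
        columns = source.execute(f"PRAGMA table_info({name})").fetchall()
        for _ in range(rows):
            values = []
            for _cid, _col, col_type, *_rest in columns:
                if "INT" in col_type.upper():
                    values.append(rng.randint(1, 10**6))
                else:
                    values.append("".join(rng.choices(string.ascii_lowercase, k=8)))
            placeholders = ", ".join("?" for _ in values)
            target.execute(f"INSERT INTO {name} VALUES ({placeholders})", values)
    return target

prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER, username TEXT, email TEXT)")
prod.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")
fake = synthetic_copy(prod)
```

The trade-off is that random data does not reproduce real content such as wiki pages, which may be why a sanitized dump was requested in the first place.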

Emails were just sent out to users, full text: https://gist.github.com/simonsarris/829ba1c0669c404f0da5

In December 2013, MDN switched from a hosted wiki solution to Kuma, Mozilla's self-developed open source wiki software, written in Python on the Django framework. https://developer.mozilla.org/en-US/docs/MDN/Kuma , https://news.ycombinator.com/item?id=6876636

We launched Kuma on August 3, 2012. That post is about the MDN redesign we launched in 2013.

The email that was just sent out to MDN users seems to differ from this post. The email says:

> Your email address (but not password) was posted on that server for that 30 day time period.

There is no other mention of the word password or hash (encrypted or otherwise). However, the post says

> in the accidental disclosure of MDN email addresses of about 76,000 users and encrypted passwords of about 4,000 users on a publicly accessible server.

The emails are customized for each type of affected user.

tldr: an automated data sanitization process failed, emails and salted hashed passwords were disclosed. no server was hacked.

There is much that could be done to improve this announcement:

1- What does "encrypted, salted passwords" mean? MD5 with a static salt? Holy shit, that's a problem. bcrypt? Less so. I have no context to know how concerned I should be, or any indication of how incompetent, or awesome, Mozilla's existing processes and defenses are. Fail.

2- They talk about a "data sanitization process" failing, but then talk about a "database dump file" being publicly accessible. Say what? This could mean anything from "an input validation error allowed wrong passwords to work" to "we do a regular database dump, and store that on a public HTTP directory for some cron job to grab." Without explanation, I assume the worst. Fail.

3- "While we have not been able to detect malicious activity on that server..." Again, without the context of what happened, this statement is worthless. If you leaked the database of your users, I wouldn't expect any malicious activity. An adversary wouldn't attack Mozilla. They would crack the passwords of the users and attempt to hijack their accounts on other sites that matter, like banking or ecommerce sites. At best Mozilla knows this and just wanted to include some proof-point that at least they have logs/basic monitoring of stuff in place, and wanted to save face. At worst, Mozilla truly believes that someone not actively attacking them somehow means that nothing bad will happen from this loss, which is stupid. And Mozilla's Security usually isn't stupid. Fail.

4- "In addition to notifying users and recommending short term fixes, we’re also taking a look at the processes and principles that are in place that may be made better to reduce the likelihood of something like this happening again." This is a completely unsatisfactory statement. If you just discovered the problem this afternoon, like, "oh shit, why is there a .sql dump in our HTTP readable /backups/ folder?" then saying "hey, we discovered a problem, we think we have stopped it, and we are looking into our processes" is a reasonable response. However when you have "just concluded an investigation" you should, I don't know, tell us your conclusions maybe? What happened? Why did it happen? What changed in your existing system that allowed it to happen? Or has this shortcoming always existed? If so, who is defining/vetting your processes? What are you doing so this issue doesn't happen again? What other thing are you doing to watch the thing that's going to make sure it doesn't happen again? Instead, we get a generic statement. Fail.

While not as completely opaque as some "oh no, we got pwn3d" posts, this blog post has completely failed to do the 3 things any post of this kind should do: 1) educate me about what happened 2) help me understand the risk Mozilla's actions have exposed me to, and 3) give me confidence by demonstrating clear actions you are taking so this won't happen again.

Yes attacks happen, but when a company or organization is up front, honest, and over communicates, it does wonders to calm the situation.

Mozilla, I expect more from you.

A process failed, and the DB dump that is published to help contributors improve the MDN site got out unsanitized. The sanitization/publication process will be redesigned to include stricter controls. For now, it is shut down.

MDN has been using persona for a while now, meaning that most accounts don't have passwords in the database. But older accounts still had the SHA256 salted hash that Django creates.

We traced back as much as we could. Access logs, netflow data, etc... We found that the tar.gz containing the DB dump had been downloaded only a small number of times. Mostly by known contributors. But we can't rule out that someone with malicious intentions got access to it.

Who exactly are these "known contributors", and why did they have access to this data? Why did they not report the problem earlier?

And if it was downloaded "mostly" by "known contributors", who was involved with the rest of the detected downloads?

https://bugzilla.mozilla.org/show_bug.cgi?id=932869 was the request for a sanitized DB for folks wanting to develop MDN itself. We could trace most of the handful of IPs that downloaded the file during the period when it was unsanitized back to individuals (i.e. IPs inside Mozilla offices, etc.). However, because some IPs were unknown, public, or potential NAT addresses, Mozilla decided it was best to disclose the issue.

If some of the accesses were by people or systems within Mozilla, can you please address why a month went by before the problem was noticed?

If there was enough need to justify putting forth the effort required to export a sanitized version of these data for developers to use, then why didn't these users notice that something was wrong much sooner? And if they did notice, why weren't the appropriate parties within Mozilla notified sooner?

Could you please provide more specific details about these IP addresses that couldn't be accounted for, too? Perhaps a list of them, for instance? At least then affected users will be able to make their own call regarding their level of risk due to this incident.

Sorry, I can't provide a list.

Why not?

Because our privacy policies state that we won't disclose personally identifiable information about users, and IP addresses can be personally identifiable.

Unfortunately security incidents happen, but we won't violate the commitments we have made to our users; in this case, if we revealed the IP addresses we would have another, deliberate information leak on our hands.

Sha256+salt. See https://github.com/mozilla/kuma

We are still working on the rest.

Why did you decide to use sha256 instead of a kdf like bcrypt or pbkdf2? I'm not attacking you, genuinely curious.

To provide a bit more context, in early 2011 we made a conscious decision to move towards the password storage methods described here : https://wiki.mozilla.org/WebAppSec/Secure_Coding_Guidelines

While we were moving in that direction (upgrading apps, etc) we also launched Persona (BrowserID at the time). Some apps opted to switch to Persona, others opted to upgrade password storage mechanisms.

That's an interesting document.

It mainly contains assertions for what one should do. Do you know if there's an explanation for the rationale anywhere? For example _"Passwords must be 8 characters or greater"_ or _"Privileged accounts - Password for privileged accounts should be rotated every: 90 to 120 days"_.

Not on hand, those decisions were made years ago, and done in email discussions and in person meetings.

That said, the password length requirements were driven by the cost of performing effective brute force attacks against properly hashed and salted values at the time we set that length.

Rotating privileged passwords was basically a stopgap measure to ensure that users were refreshing passwords regularly. The correct solution is to deploy multi-factor authentication.

It's what Django uses, and this site uses Django.

Django 1.3 or lower. Django uses PBKDF2 [1] since 1.4 (March 23, 2012) [2].

[1] https://docs.djangoproject.com/en/1.5/topics/auth/passwords/...

[2] https://docs.djangoproject.com/en/dev/releases/1.4/
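For readers unfamiliar with PBKDF2, here is a minimal standard-library sketch of the idea: a random per-user salt plus many iterations of an HMAC, so each guess costs an attacker far more than a single salted SHA-256. This is not Django's hasher or its storage format; the function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a figure from the thread

def hash_password(password, salt=None, iterations=ITERATIONS):
    """Derive a PBKDF2-HMAC-SHA256 digest with a fresh random salt."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_password(password, salt, iterations, expected):
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)

salt, iterations, stored = hash_password("correct horse")
```

Storing the iteration count next to the salt lets the work factor be raised later without invalidating existing hashes.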

MDN was on Django 1.2 for a while, albeit with monkeypatched password hashing (since Django at the time was still defaulting to SHA1, I believe). With the switch to Persona, it no longer matters -- for a new account -- what hasher is used, since Persona doesn't involve storing a password.

Not to mention it's always been easy to implement even when it wasn't bundled by default.

There is utterly no excuse for storing passwords with anything that's not PBKDF2, bcrypt, or scrypt starting in 2009.

Right, they switched over to Persona, which is far better. Unfortunately the old hashes were still left in the DB.

To be clear, the only old hashes were those from folks who haven't used persona to log in.

also the feature was built a long time ago.

current standards for sites not using persona are here: https://wiki.mozilla.org/WebAppSec/Secure_Coding_Guidelines

As the number of logins per core increases, the login latency with these functions increases fairly quickly.
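The trade-off can be measured directly. The sketch below times PBKDF2 at two illustrative iteration counts (both chosen for the example, neither taken from the thread) and derives a rough upper bound on how many logins one core could verify per second at each setting. Actual numbers depend on the hardware.

```python
import hashlib
import time

def seconds_per_hash(iterations, repeats=3):
    """Average wall-clock time of one PBKDF2-HMAC-SHA256 computation."""
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.pbkdf2_hmac("sha256", b"password", b"0123456789abcdef", iterations)
    return (time.perf_counter() - start) / repeats

cheap = seconds_per_hash(1_000)
costly = seconds_per_hash(200_000)

# Upper bound on logins a single core can verify per second at each setting.
max_logins_cheap = 1.0 / cheap
max_logins_costly = 1.0 / costly

print(f"1,000 iterations:   {cheap * 1000:.2f} ms per login")
print(f"200,000 iterations: {costly * 1000:.2f} ms per login")
```

Because the cost grows linearly with the iteration count, the same knob that slows an attacker also caps login throughput per core, so concurrent logins beyond that cap queue up and latency rises.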

Please try also to improve the page load time by using a cache, faster hardware, etc.

This was one of the least detailed and helpful disclosures I've ever seen.

Reading "encrypted passwords", and then "salted hashes" without any specification of the hash algorithm from a tech company like Mozilla is utterly astonishing to me.

I feel like whoever discovered this breach sent it over to the PR department without any sort of final review of the release by an engineer.

Not at all, the decision was made to release the information necessary to help people decide if action was needed. As you can see from this thread, we are happy to provide relevant details, and like most of our other tools, Kuma, the platform that MDN uses is open source, and on github.

>What does "encrypted, salted passwords" mean? MD5 with a static salt? Holy shit, that's a problem. bcrypt? Less so. I have no context to know how concerned I should be, or any indication of how incompetent, or awesome, Mozilla's existing processes and defenses are.

Given that this leak happened at all, incompetent to criminally negligent would be my estimation.

Can we please start making people go to jail when this happens? I am so tired of having personal information leaked so often.


Christ, it's comments like these that make me glad I don't have to write messages for public consumption at large enterprises. People absolutely hang off of every word in the most absurd fashion.

I feel like a large enough number of people just spend their days finding a reason to complain about everything. A word, a look, whatever really.

Whatever one does, it will never be good enough for some. They just want to be unhappy about it. They want to blame something, someone. It's cool or something.

The now-deleted comment was some downright vitriolic complaining about the use of the word "like" in the notification email. "We would like to inform you..."
