Tell HN: Hacker News Profile Leak (Fixed)
170 points by kogir on Nov 13, 2014 | 62 comments
Under certain error conditions, a bug in our API code briefly published 84 users' usernames, email addresses, password hashes, and 100 most recent votes. This information appeared at https://hacker-news.firebaseio.com/v0/updates. We notified affected users on Monday, November 10th via email and (for users without email addresses in their profile) on Tuesday the 11th via a message in the site header.

Affected profiles were leaked on one of 10/12, 10/20, or 11/02. In every case, the leaked data was overwritten 30 seconds later by the subsequent update batch. The leaked password hashes were salted bcrypt (FreeBSD's default libcrypt implementation). Though we think the risk is low, we encouraged affected users to change their password on HN as well as on any other sites where they used the same password.

Many thanks to Ovidiu Toader for alerting us to the bug and for sending us examples that assisted us in tracking it down. While the bug was fixed on Sunday, November 9th within minutes of our becoming aware of it, Ovidiu originally reported the issue one week prior - we just didn't see it in a timely manner.

To help improve our future response times, we've created a dedicated reporting address, security@ycombinator.com, that we'll publish on our contact form. We're also creating a "Wall of Fame" to properly thank and credit past and future vulnerability reporters. More details will follow.

Super sorry about this,

The Hacker News Team


A clarification, since some people seem to be misunderstanding: Only publicly available data is intentionally pushed to Firebase. That any part of a user's profile other than their username, account age, about text, and list of submitted items was published IS THE BUG, and is now fixed.

100 most recent votes

This has me curious. Why 100? Why not 0 or all? 100 seems to indicate you're aggregating the 100 most recent votes for some specific purpose, and that the feature unintentionally leaked the data.

I wonder if mods have the ability to go to a page, type in a username, and see the 100 most recent things they've upvoted/downvoted? I guess as a way of looking for voting rings?

I often upvote comments I feel are unfairly downvoted, definitely not because I agree with the comment. Hopefully vote history isn't being used as a metric of character. Then again, maybe it's a useful filter. I've often wished Reddit would drag down comments from people who upvote angry bully-type comments from other people, so there might be all kinds of interesting ways "100 most recent votes" could be used.

100 was picked because it's a large round number. We need vote history to show vote arrows correctly, and for performance reasons we try to avoid loading entire vote files.
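A minimal sketch of the idea described above, assuming a hypothetical in-memory structure (the real HN code is written in Arc and is not shown in this thread). It keeps only the most recent 100 votes per user, so vote arrows can be drawn without loading the whole vote file. `RecentVotes`, `record`, and `direction_for` are illustrative names, not HN's.

```python
from collections import deque

MAX_RECENT = 100  # the "large round number" mentioned above


class RecentVotes:
    """Bounded history of a user's most recent votes (hypothetical sketch)."""

    def __init__(self, limit=MAX_RECENT):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._votes = deque(maxlen=limit)

    def record(self, item_id, direction):
        """Remember that the user voted 'up' or 'down' on item_id."""
        self._votes.append((item_id, direction))

    def direction_for(self, item_id):
        """Return 'up' or 'down' if the item is in recent history, else None."""
        for voted_id, direction in reversed(self._votes):
            if voted_id == item_id:
                return direction
        return None

    def __len__(self):
        return len(self._votes)
```

Anything older than the window returns `None` here; a real system would presumably fall back to the full vote file in that case, which is an assumption on my part.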

128 is round. 100 is just weird!

It's true. I can count it on my own 10 hands.

If you flip your fingers like bits, you can count up to 1023 (1024 distinct values). Using your toes as well would get you up to 1048575.
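For the arithmetic: ten fingers give 2^10 = 1024 distinct patterns, so counting from zero the largest value is 1023. Twenty digits give 2^20 = 1048576 patterns. A small sketch, with illustrative function names:

```python
def max_count(digits):
    """Largest number representable with `digits` binary digits."""
    return 2 ** digits - 1


def fingers_to_number(fingers):
    """Interpret up (1) / down (0) fingers, most significant finger first."""
    value = 0
    for bit in fingers:
        value = value * 2 + bit
    return value
```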

You have to show me how you do that toe thing. Youtube, maybe?

Just use finger up for 1, down for 0. It's a nice way to count.

with your toes??

I demand that youtube video now. Amazing skills abound!

Your circle is very square.

I would bet on technical reasons. From what I've read HN uses some unusual caching and optimization strategies - part of how the Arc application was optimized after initial development. I have no idea if this is the case here but it seems plausible - there is (outdated) source available[0] so you may be able to take a look for yourself.

[0] Mirrored at https://github.com/wting/hackernews

There's nothing particularly unusual about what Arc does. This probably has a technical explanation, but the most logical deduction seems to be that when a mod visits your profile page, they can follow a link that will show them the 100 most recent items you've voted for.

Either that, or a program is aggregating the 100 most recent votes in order to be used in some filtering or weighting algorithm to change the ranking of comments. (People who upvote what YC alums/patio11/tptacek/etc upvote may be upvoting high-quality things, so make their votes count more. That sort of thing.)

I guess it might improve server performance if the server caches your 100 most recent upvotes for page generation purposes, since when it serves you a page, it has to know whether you've upvoted each comment or not. But it seems like a stretch to say it's only used for that purpose, because it still needs to know whether you've upvoted things from very old pages, yet the list stays at exactly 100 items. And they already hunt for and punish voting rings. I was just wondering the extent to which votes are scrutinized.

EDIT: The more I think about this, the dumber I sound. The dataset that was leaked was almost certainly used to generate pages and nothing more. It's the simplest explanation, for one. Manually looking doesn't scale. The program to look for voting rings is probably a completely different program. Etc.

As I said, I have no idea if this is what is happening, it would just fit in with the pattern of "weird" caching that I've heard HN uses - I mention Arc only because some of the caching is related to eliminating closures kept around for things like user sessions IIRC[0], though I assume any Lisp would lead to similar optimizations.

[0] It appears this is not the case with the vote caching from staff comments in this thread.

> The leaked password hashes were salted bcrypt (FreeBSD's default libcrypt implementation).

As bad as data leaks are, it's at least nice to see one of these data leak stories where the passwords were actually stored correctly, instead of being MD5 / plaintext / base64.

> one of these data leak stories where the passwords were actually stored correctly

The reason we see so few hacks where passwords were stored properly might be because they do things properly, so odds are lower they get hacked in the first place. Just a thought.

Be nice to know the number of rounds used though - the default of 10 in most implementations is starting to become not enough.

Also bcrypt does not seem to be FreeBSD's default libcrypt implementation from the source code [1] - it appears to be DES (or SHA512 if DES isn't available). What makes HN think it's bcrypt? @kogir?

[1] https://github.com/freebsd/freebsd/blob/master/lib/libcrypt/...

Thanks for pointing that out. You're correct (and I was wrong) - it's not the default, but we format our salts to request it.

The docs are here: https://www.freebsd.org/cgi/man.cgi?query=crypt&apropos=0&se...

The code looks something like this:

    (withs (salt (+ "$2$10$" (rand-string 20))
            ux   (crypt pw salt))

"(rand-string 20)" provides 128 bits of randomness from /dev/urandom.

I'll up the number of rounds and upgrade them in place as users log in. Thanks!
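A rough sketch of what "upgrade them in place as users log in" can look like. This is not HN's code. Python's standard library has no bcrypt, so PBKDF2 from `hashlib` stands in as the KDF, and `TARGET_ROUNDS`, `hash_password`, and `login` are hypothetical names. The point is only the pattern: the plaintext password is available at login and nowhere else, so that is when a stale hash gets replaced.

```python
import hashlib
import hmac
import os

TARGET_ROUNDS = 100_000  # hypothetical current work factor


def hash_password(password, rounds=TARGET_ROUNDS, salt=None):
    """Return a record storing the work factor and salt next to the digest."""
    salt = salt if salt is not None else os.urandom(16)  # 128 random bits
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)
    return {"rounds": rounds, "salt": salt, "digest": digest}


def login(password, record):
    """Verify a password; return (ok, record), re-hashing if the cost is stale."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), record["salt"], record["rounds"]
    )
    if not hmac.compare_digest(candidate, record["digest"]):
        return False, record
    if record["rounds"] < TARGET_ROUNDS:
        # Correct password and an old work factor: upgrade now.
        record = hash_password(password)
    return True, record
```

Accounts that never log in again keep their old hash under this approach, which is the usual trade-off of upgrading in place.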

I know it's compelling to answer users who ask for more transparency on security. But it's not the best thing to reveal implementation details after hashes were leaked.

That's not how bcrypt works. If you have the hash, you know the number of rounds.

Even if that weren't true --- and it very much is --- you'd still trivially be able to find that number given a single known password.
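To illustrate the point that the hash itself carries the number of rounds: a bcrypt string in modular crypt format looks like `$<version>$<cost>$<22-char salt><31-char digest>`, where the cost is the base-2 logarithm of the iteration count. A minimal parser sketch follows. The regex accepts the common `2`, `2a`, `2b`, `2x`, and `2y` version tags, and `parse_bcrypt` is an illustrative name.

```python
import re

# $<version>$<two-digit cost>$<22-char salt><31-char digest>
BCRYPT_RE = re.compile(
    r"^\$(2[abxy]?)\$(\d{2})\$([./A-Za-z0-9]{22})([./A-Za-z0-9]{31})$"
)


def parse_bcrypt(hash_string):
    """Split a bcrypt hash string into its publicly readable parts."""
    match = BCRYPT_RE.match(hash_string)
    if match is None:
        raise ValueError("not a bcrypt hash")
    version, cost, salt, digest = match.groups()
    return {
        "version": version,
        "cost": int(cost),             # log2 of the iteration count
        "iterations": 2 ** int(cost),
        "salt": salt,
        "digest": digest,
    }
```

With the `$2$10$` prefix quoted earlier in the thread, the cost field is 10, meaning 2^10 = 1024 iterations. Nothing secret is needed to read it.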

> you'd still trivially be able to find that number given a single known password.

I understand that it's because the salt is entirely appended in clear to the hash. Isn't it better to have a second static salt implemented in the code, in case only the database would be compromised?

I think those schemes are pretty silly, but as long as you're using a well-tested implementation of a real KDF and not some goofy scheme you hacked up yourself so you could add the second secret nonce, I don't care.

Alright, it's true that it feels silly to add negligible protections when your security here is reduced to the KDF and its implementation.

I honestly wouldn't ask if I thought that bcrypt could be cracked by knowing the number of iterations.

Now there may be other bcrypt attacks out there, but none have been announced, and the usage of bcrypt was announced in the text of the post.

Furthermore, if you have the hash, you have the number of rounds. I just have no desire to see if I can download the data (I don't trust random sites, and frankly I don't want the data).

Security by obscurity is not worth the price you pay for it.

> Security by obscurity is not worth the price you pay for it.

That's why I was posting this. Many people think that because of Kerckhoffs's principle you should always disclose everything. But obscurity is not always a bad thing and it can be part of the obfuscation here. Although here it was indeed a good thing since admins are gonna up the cost of their KDF following advice from the community.

Shit happens, we're all human. All you can do is learn from it. Thanks for setting up that dedicated reporting address, and for being transparent about the incident.

I was pretty amused to see the notification come across my email...I never get to be part of the fun hacks, and also, I randomize my password to make it hard to log back in (without searching for the text file I've buried it in) when I've logged back out...so no big loss. I know the email asked us to not say anything about it until everyone was properly notified but I was pretty surprised no one blurted it out anyway, this being HN and the lively discussions we have about hack incidents.

This is a perfect example of how security vulnerabilities should be disclosed.

• Quickly
• Transparently
• With a fix already in place

I was one of the lucky users. I was on the road (coming back from a family funeral in rural Kansas) at the time the email was sent, so I was barely on my cell phone network enough to see the email, and wasn't near my desktop computer where I usually do all my password changes. But I only use my password for any given site on that site itself, and I have now reset my Hacker News password, so all's well that ends well. Thanks to the HN team for fixing the bug and for notifying the users affected by the bug.

Just wanted to thank the HN team for a responsible disclosure.

It's never fun to be on the receiving end of these e-mails, but the HN leak e-mail was the most responsible data-leak notification I've received.

Thanks for being professional and responsible!

So why was that data pushed "randomly" into the updates queue? Or, put another way, why was it random rather than happening all of the time?

Had it happened in all cases, I'd have noticed it (I hope!) before pushing it to the live site.

This was a case of rare error handling having unintended side effects. The profiles were only published in one very unlikely case.

> The profiles were only published in one very unlikely case

Can you comment on what this case was? I'm just curious.

Why are password hashes available via public API at all? Email addresses too?

Because we made a mistake that is now fixed? It was definitely not intentional.

OK. The way you worded it, it sounded like the mistake was the publishing of the info, not that certain info was made available via the API. I was questioning why certain data would be available via the API, but it appears it was only available during error.

> it appears it was only available during error

Yes, for two different meanings of "error". I wonder if this ambiguity is part of what was confusing, so let me disambiguate.

There was a programming error in some error-handling code. If that makes sense, you can skip the following.

The first kind of error was a bug in our code. The API code is never supposed to send info to Firebase that you couldn't get by scraping the public website. Unfortunately, we wrote a line of code that (in a totally non-obvious way, which is why it got past both testing and code review) broke this rule.

The buggy line of code happened to live inside an exception handler. That is, it was only ever executed when a runtime error occurred. That's the second type of error we're talking about. An example might be a network timeout. These are infrequent, so the buggy code only ran infrequently.

Had the bug been in the happy path, testing would have caught it right away. In this case, though, even testing the error path didn't catch it, because certain other (basically random) conditions had to happen in order for the buggy code to leak the data. That's why we didn't know about it until Ovidiu told us.
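The thread does not show the buggy line, so the following is not the actual fix. It is a minimal sketch of one common defense against this class of bug: build the outgoing payload from an allow-list once, so the error path has nothing but the filtered copy to send. The field names (`id`, `created`, `about`, `submitted`) are assumptions based on the public fields listed in the clarification above. `public_view` and `publish_update` are hypothetical names.

```python
# Fields assumed to be public: username, account age, about text, submissions.
PUBLIC_FIELDS = ("id", "created", "about", "submitted")


def public_view(profile):
    """Copy only allow-listed fields; everything else is dropped by default."""
    return {key: profile[key] for key in PUBLIC_FIELDS if key in profile}


def publish_update(profile, send):
    """Send a profile update; the error path reuses the filtered payload."""
    payload = public_view(profile)
    try:
        send(payload)
    except ConnectionError:
        # Retry once with the already-filtered payload, never the raw profile.
        send(payload)
```

A test for this would force the `except` branch, for example with a `send` that fails on its first call, and assert that private fields never appear in what was sent.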

Wow, if HN can't even get this right, it's no wonder so many Fortune 1000 and Fortune 100 companies are compromised so often. Information wants to be free. Secure programming requires thorough discipline.

From the data leaked it sounds like HN is exporting a lot more than it should.

Well, yes. That was the bug that was fixed.

Ah I see. Ouch. Well, I'm sure you'll be a lot more careful in the future ;) Thanks for the transparency, it's much appreciated (and in general short supply elsewhere).

Is there a prize for the 84 users? even symbolic...

As one of the 'winners' I'm just happy that everything was hashed and salted correctly.

I won, too. HN could remove all the undeserved downvotes I get ;).

Hey, how come this is not yet on the homepage? My last post is weirdly stuck in newest as well... Is there a problem with how posts are picked up for the homepage?

You can't derive the front page or an individual story's rank from the displayed score and timestamp alone. That's on purpose, as an anti-gaming measure.

I'm going to mark this subthread off-topic now. If you have questions like this, please don't post them here, but rather email hn@ycombinator.com, as the guidelines ask.

sure no problem

"Please put a valid address in the email field, or we won't be able to send you a new password if you forget yours. Your address is only visible to you and us. Crawlers and other users can't see it."

Welp, I'm taking my email out.

Make sure never to commit to a public git repo either.


In all seriousness though, we're really bummed this happened and wish it hadn't. We do code reviews and try our best to prevent this kind of thing from happening. That said, if you truly want your account here to be anonymous, you're right to remove all personally identifiable information. I'd also recommend using tor (and using it correctly).

How do you make an account through Tor without being insta-banned?

The ban only applies for the first 2 weeks, then is lifted.

> We do code reviews...

Out of curiosity, how many people are familiar with HN's codebase? I thought it was developed in PG's personal flavor of Lisp.

You could always ask users for their email address at sign-up, send them a random, single-use, account recovery code, and then never store their address.
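A hedged sketch of that suggestion, assuming the site emails the code once at sign-up and then stores only a digest of it. `issue_recovery_code` and `redeem` are hypothetical names. The code is a long random token, so a plain SHA-256 digest is a reasonable thing to store, unlike with a human-chosen password.

```python
import hashlib
import hmac
import secrets


def issue_recovery_code():
    """Return (code_to_email, digest_to_store); the address itself is not kept."""
    code = secrets.token_urlsafe(32)
    digest = hashlib.sha256(code.encode()).hexdigest()
    return code, digest


def redeem(code, stored_digest):
    """Check a presented code against the stored digest in constant time."""
    presented = hashlib.sha256(code.encode()).hexdigest()
    return hmac.compare_digest(presented, stored_digest)
```

To keep the code single-use, the caller would delete the stored digest after a successful `redeem`. The obvious cost is that a user who loses the original email has no way to recover the account.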

I've made statements on HN I'd prefer are anonymous to the general public?

Also, there was no mention they were handing the info over to a 3rd party. If you explicitly state something like that, you should follow it and/or change it when the situation changes.

I don't have that issue with git repos.

I'm kinda amused a yc employee went through the effort of downvoting it after pointing out this situation is caused by y'all not following what you actually have in your notices for things.

>I'm kinda amused a yc employee went through the effort of downvoting it after pointing out this situation is caused by y'all not following what you actually have in your notices for things.

Seriously what is it with HN/Reddit where everyone assumes that any downvotes are from people with an agenda?

Because I know I'm right and all rational people agree with me. Anyone who disagrees is a bad person.

I'm not sure how to make it more clear than I did, but this data was not intentionally shared with a third party. Had we known it would happen, we'd obviously have prevented it.

The only data we knowingly send to Firebase is already public and visible to anyone that can speak HTTP.

Sorry if I'm still talking past you.

As far as I know, a person cannot downvote top-level comments on their own threads (or a reply to their comment). Perhaps employees & mods have the power to do that. But I'm not sure how you can tell it was a YC employee that downvoted you.

It took me a while to figure out what exactly you were objecting to, but I guess you don't like the fact that HN sends its data to Firebase?

I don't think it's fair to criticize the admins for that. For pretty much any web application you want to use, "only visible to you and us" should automatically be understood to include "and our hosting provider too, if they go digging or screw up."

The situation is actually tighter than that. We don't give the "only visible to you and us" data to Firebase (or anyone else), precisely so it won't matter if somebody else goes digging or screws up. You're protected from all of that. What you're not protected from, unfortunately, is us screwing up. We'll try our best not to do that again.

What's the problem?

That some people use email addresses that are personally identifiable in the non-visible portion of their profile but have an otherwise anonymous profile whose comments they do not wish to be traced back to the maker. For instance an Apple employee that speaks about Apple internals anonymously might get sacked if they were exposed.

Right, and just so it's 100% clear to everybody: we never did and never would knowingly publish this data. A small amount of it leaked (for 30 seconds on 3 occasions) because of an obscure mistake in our code, and we're deeply sorry about that. We turned off API publishing the instant we found out about it and dropped everything until we were sure it was fixed.

The API design has always been to publish only information that is already public, that anyone could get by scraping the website.

The fairly obvious solution would be to use another email. Or none. If it's a burner account, don't put your email on it. "Pay cash."
