
Tell HN: Hacker News Profile Leak (Fixed) - kogir
Under certain error conditions, a bug in our API code briefly published 84
users' usernames, email addresses, password hashes, and 100 most recent votes.
This information appeared at https://hacker-news.firebaseio.com/v0/updates. We
notified affected users on Monday, November 10th via email and (for users
without email addresses in their profile) on Tuesday the 11th via a message in
the site header.

Affected profiles were leaked on one of 10/12, 10/20, or 11/02. In every case,
the leaked data was overwritten 30 seconds later by the subsequent update
batch. The leaked password hashes were salted bcrypt (FreeBSD's default
libcrypt implementation). Though we think the risk is low we encouraged
affected users to change their password on HN as well as on any other sites
where they used the same password.

Many thanks to Ovidiu Toader for alerting us to the bug and for sending us
examples that assisted us in tracking it down. While the bug was fixed on
Sunday, November 9th within minutes of our becoming aware of it, Ovidiu
originally reported the issue one week prior - we just didn't see it in a
timely manner.

To help improve our future response times, we've created a dedicated reporting
address, security@ycombinator.com that we'll publish on our contact form.
We're also creating a "Wall of Fame" to properly thank and credit past and
future vulnerability reporters. More details will follow.

Super sorry about this,

The Hacker News Team

(Edit)

A clarification, since some people seem to be misunderstanding: Only publicly
available data is intentionally pushed to Firebase. That any part of a user's
profile other than their username, account age, about text, and list of
submitted items was published _IS THE BUG_, and is now fixed.
======
sillysaurus3
_100 most recent votes_

This has me curious. Why 100? Why not 0 or all? 100 seems to indicate you're
aggregating the 100 most recent votes for some specific purpose, and that the
feature unintentionally leaked the data.

I wonder if mods have the ability to go to a page, type in a username, and see
the 100 most recent things they've upvoted/downvoted? I guess as a way of
looking for voting rings?

I often upvote comments I feel are unfairly downvoted, definitely not because
I agree with the comment. Hopefully vote history isn't being used as a metric
of character. Then again, maybe it's a useful filter. I've often wished Reddit
would drag down comments from people who upvote angry bully-type comments from
other people, so there might be all kinds of interesting ways "100 most recent
votes" could be used.

~~~
yzzxy
I would bet on technical reasons. From what I've read HN uses some unusual
caching and optimization strategies - part of how the Arc application was
optimized after initial development. I have no idea if this is the case here
but it seems plausible - there is (outdated) source available[0] so you may be
able to take a look for yourself.

[0] Mirrored at
[https://github.com/wting/hackernews](https://github.com/wting/hackernews)

~~~
sillysaurus3
There's nothing particularly unusual about what Arc does. This probably has a
technical explanation, but the most logical deduction seems to be that when a
mod visits your profile page, they can follow a link that will show them the
100 most recent items you've voted for.

Either that, or a program is aggregating the 100 most recent votes in order to
be used in some filtering or weighting algorithm to change the ranking of
comments. (People who upvote what YC alums/patio11/tptacek/etc upvote may be
upvoting high-quality things, so make their votes count more. That sort of
thing.)

I guess it might improve server performance if the server caches your 100 most
recent upvotes for page generation purposes, since when it serves you a page,
it has to know whether you've upvoted each comment or not. But it seems like a
stretch to say it's only used for that purpose, because it still needs to know
whether you've upvoted things from very old pages, yet the list stays at
exactly 100 items. And they already hunt for and punish voting rings. I was
just wondering the extent to which votes are scrutinized.

EDIT: The more I think about this, the dumber I sound. The dataset that was
leaked was almost certainly used to generate pages and nothing more. It's the
simplest explanation, for one. Manually looking doesn't scale. The program to
look for voting rings is probably a completely different program. Etc.

~~~
yzzxy
As I said, I have no idea if this is what is happening, it would just fit in
with the pattern of "weird" caching that I've heard HN uses - I mention Arc
only because some of the caching is related to eliminating closures kept
around for things like user sessions IIRC[0], though I assume any Lisp would
lead to similar optimizations.

[0] It appears this is not the case with the vote caching from staff comments
in this thread.

------
SCdF
> The leaked password hashes were salted bcrypt (FreeBSD's default libcrypt
> implementation).

As bad as data leaks are, it's at least nice to see one of these data leak
stories where the passwords were actually stored correctly, instead of being
MD5 / plaintext / base64.

~~~
baudehlo
Be nice to know the number of rounds used though - the default of 10 in most
implementations is starting to become not enough.

Also bcrypt does not seem to be FreeBSD's default libcrypt implementation from
the source code [1] - it appears to be DES (or SHA512 if DES isn't available).
What makes HN think it's bcrypt? @kogir?

[1]
[https://github.com/freebsd/freebsd/blob/master/lib/libcrypt/...](https://github.com/freebsd/freebsd/blob/master/lib/libcrypt/crypt.c)

~~~
kogir
Thanks for pointing that out. You're correct (and I was wrong) - it's not the
default, but we format our salts to request it.

The docs are here:
[https://www.freebsd.org/cgi/man.cgi?query=crypt&apropos=0&se...](https://www.freebsd.org/cgi/man.cgi?query=crypt&apropos=0&sektion=3&manpath=FreeBSD+9.2-RELEASE+and+Ports&arch=default&format=html)

The code looks something like this:

    (withs (salt (+ "$2$10$" (rand-string 20))
            ux   (crypt pw salt))
      <stuff>)


"(rand-string 20)" provides 128 bits of randomness from /dev/urandom.

I'll up the number of rounds and upgrade them in place as users log in.
Thanks!
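
For readers unfamiliar with "upgrade in place as users log in", here is a minimal sketch of the general pattern. It is not HN's code: it uses PBKDF2 from the Python standard library as a stand-in for bcrypt, and the storage format, the function names, and the `TARGET_ITERATIONS` value are all assumptions made for illustration.

```python
import hashlib
import hmac
import os

# Hypothetical work factor; a real deployment would tune this.
TARGET_ITERATIONS = 100_000


def hash_password(password, iterations=TARGET_ITERATIONS):
    """Hash with a fresh 128-bit salt; store the work factor next to the digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_and_upgrade(password, stored):
    """Return (ok, record). The record is re-hashed only after a successful
    login whose stored work factor is below the current target."""
    iterations_s, salt_hex, digest_hex = stored.split("$")
    iterations = int(iterations_s)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), iterations
    )
    if not hmac.compare_digest(candidate, bytes.fromhex(digest_hex)):
        return False, stored
    if iterations < TARGET_ITERATIONS:
        # The plaintext is only available at login, so this is the moment
        # to replace the weaker record.
        return True, hash_password(password)
    return True, stored
```

The upgrade has to happen at login because that is the only time the server sees the plaintext password; old records for users who never log in again stay at the old work factor.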

~~~
baby
I know it's compelling to answer users who ask for more transparency on
security. But it's not the best thing to reveal implementation details after
hashes were leaked.

~~~
tptacek
That's not how bcrypt works. If you have the hash, you know the number of
rounds.

Even if that weren't true --- and it very much is --- you'd still trivially be
able to find that number given a single known password.
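
To illustrate the point about the hash carrying its own parameters, here is a small sketch that reads the cost out of a bcrypt-style string in modular crypt format. The function name and the sample string are made up for the example (the sample is just filler characters, not a real hash).

```python
import re

# Modular crypt format for bcrypt:
#   $<version>$<two-digit cost>$<22-char salt><31-char digest>
BCRYPT_RE = re.compile(
    r"^\$(2[abxy]?)\$(\d{2})\$([./A-Za-z0-9]{22})([./A-Za-z0-9]{31})$"
)


def parse_bcrypt(hash_str):
    """Split a bcrypt hash string into its visible parameters."""
    m = BCRYPT_RE.match(hash_str)
    if m is None:
        raise ValueError("not a bcrypt hash string")
    version, cost, salt, _digest = m.groups()
    return {
        "version": version,
        "cost": int(cost),           # log2 of the iteration count
        "rounds": 2 ** int(cost),
        "salt": salt,                # stored in the clear by design
    }


# Filler string with the right shape, not a real password hash.
sample = "$2a$10$" + "A" * 22 + "B" * 31
info = parse_bcrypt(sample)
```

Anyone holding the hash can read both the cost and the salt directly from it, which is why neither is treated as a secret.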

~~~
baby
> you'd still trivially be able to find that number given a single known
> password.

I understand that it's because the salt is entirely appended in clear to the
hash. Isn't it better to have a second static salt implemented in the code, in
case only the database would be compromised?

~~~
tptacek
I think those schemes are pretty silly, but as long as you're using a well-
tested implementation of a real KDF and not some goofy scheme you hacked up
yourself so you could add the second secret nonce, I don't care.

~~~
baby
Alright, it's true that it feels silly to add negligible protections when your
security here is reduced to the KDF and its implementation.

------
MalcolmDiggs
Shit happens, we're all human. All you can do is learn from it. Thanks for
setting up that dedicated reporting address, and for being transparent about
the incident.

------
danso
I was pretty amused to see the notification come across my email...I never get
to be part of the fun hacks, and also, I randomize my password to make it hard to
log back in (without searching for the text file I've buried it in) when I've
logged back out...so no big loss. I know the email asked us to not say
anything about it until everyone was properly notified but I was pretty
surprised no one blurted it out anyway, this being HN and the lively
discussions we have about hack incidents.

------
voska
This is a perfect example of how security vulnerabilities should be disclosed.

• Quickly • Transparently • With a fix already in place

------
tokenadult
I was one of the lucky users. I was on the road (coming back from a family
funeral in rural Kansas) at the time the email was sent, so I was barely on my
cell phone network enough to see the email, and wasn't near my desktop
computer where I usually do all my password changes. But I only use my
password for any given site on that site itself, and I have now reset my
Hacker News password, so all's well that ends well. Thanks to the HN team for
fixing the bug and for notifying the users affected by the bug.

------
ncallaway
Just wanted to thank the HN team for a responsible disclosure.

It's never fun to be on the receiving end of these e-mails, but the HN leak
e-mail was the most responsible data-leak notification I've received.

Thanks for being professional and responsible!

------
Someone1234
So why was that data pushed "randomly" into the updates queue? Or put another
way why was it random rather than happening all of the time?

~~~
kogir
Had it happened in all cases, I'd have noticed it (I hope!) before pushing it
to the live site.

This was a case of rare error handling having unintended side effects. The
profiles were only published in one very unlikely case.

~~~
Alupis
Why are password hashes available via public API at all? Email addresses too?

~~~
kogir
Because we made a mistake that is now fixed? It was definitely not
intentional.

~~~
Alupis
ok. the way you worded it, it sounded like the mistake was the publishing of
the info, not that certain info was made available via the api. i was
questioning why certain data would be available via the api, but it appears it
was only available during error.

~~~
dang
> it appears it was only available during error

Yes, for two different meanings of "error". I wonder if this ambiguity is part
of what was confusing, so let me disambiguate.

 _There was a programming error in some error-handling code._ If that makes
sense, you can skip the following.

The first kind of error was a bug in our code. The API code is _never_
supposed to send info to Firebase that you couldn't get by scraping the public
website. Unfortunately, we wrote a line of code that (in a totally non-obvious
way, which is why it got past both testing and code review) broke this rule.

The buggy line of code happened to live inside an exception handler. That is,
it was only ever executed when a runtime error occurred. That's the second
type of error we're talking about. An example might be a network timeout.
These are infrequent, so the buggy code only ran infrequently.

Had the bug been in the happy path, testing would have caught it right away.
In this case, though, even testing the error path didn't catch it, because
certain other (basically random) conditions had to happen in order for the
buggy code to leak the data. That's why we didn't know about it until Ovidiu
told us.
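
One common way to guard against this class of bug is to filter through an allowlist before any branching, so the error path cannot see the raw record at all. The sketch below is an illustration only, not the actual HN or Firebase code; the field names, `public_view`, `publish_update`, and the retry queue are all hypothetical.

```python
# Hypothetical names for the fields the announcement lists as public:
# username, account age, about text, and submitted items.
PUBLIC_FIELDS = ("username", "created", "about", "submitted")


def public_view(profile):
    """Copy only allowlisted fields; everything else is dropped."""
    return {k: profile[k] for k in PUBLIC_FIELDS if k in profile}


def publish_update(profile, send, retry_queue):
    """Send a profile update, queueing it for retry on a timeout."""
    payload = public_view(profile)  # filter once, before the try block
    try:
        send(payload)
    except TimeoutError:
        # Error path: only the already-filtered payload is in reach here,
        # so a mistake in this branch cannot leak private fields.
        retry_queue.append(payload)


def flaky_send(payload):
    """Stand-in transport that always times out, to exercise the error path."""
    raise TimeoutError("simulated network timeout")


profile = {
    "username": "alice",
    "created": 1160418111,
    "about": "hello",
    "submitted": [1, 2, 3],
    "email": "alice@example.com",  # private
    "pwhash": "not-a-real-hash",   # private
}
queue = []
publish_update(profile, flaky_send, queue)
```

After the simulated timeout, the queued payload contains only the four public fields.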

------
undrcvr-lagggal
Wow if _HN_ can't even get this right, it's no wonder so many fortune 1000 and
fortune 100 companies are compromised so often. Information wants to be free.
Secure programming requires thorough discipline.

------
jacquesm
From the data leaked it sounds like HN is exporting a lot more than it should.

~~~
kogir
Well, yes. That was the bug that was fixed.

~~~
jacquesm
Ah I see. Ouch. Well, I'm sure you'll be a lot more careful in the future ;)
Thanks for the transparency, it's much appreciated (and in general short
supply elsewhere).

------
wslh
Is there a prize for the 84 users? even symbolic...

~~~
louthy
As one of the 'winners' I'm just happy that everything was hashed and salted
correctly.

------
spolu
Hey, how come this is not yet on the homepage? My last post is weirdly stuck
in newest as well... Is there a problem with how posts are picked up for the
homepage?

~~~
dang
You can't derive the front page or an individual story's rank from the
displayed score and timestamp alone. That's on purpose, as an anti-gaming
measure.

I'm going to mark this subthread off-topic now. If you have questions like
this, please don't post them here, but rather email hn@ycombinator.com, as the
guidelines ask.

~~~
spolu
sure no problem

------
opendais
"Please put a valid address in the email field, or we won't be able to send
you a new password if you forget yours. Your address is only visible to you
and us. Crawlers and other users can't see it."

Welp, I'm taking my email out.

~~~
kogir
Make sure never to commit to a public git repo either.

(Edit)

In all seriousness though, we're really bummed this happened and wish it
hadn't. We do code reviews and try our best to prevent this kind of thing from
happening. That said, if you truly want your account here to be anonymous,
you're right to remove all personally identifiable information. I'd also
recommend using tor (and using it correctly).

~~~
opendais
I've made statements on HN I'd prefer are anonymous to the general public?

Also, there was no mention they were handing the info over to a 3rd party. If
you explicitly state something like that, you should follow it and/or change
it when the situation changes.

I don't have that issue with git repos.

I'm kinda amused a yc employee went through the effort of downvoting it after
pointing out this situation is caused by y'all not following what you actually
have in your notices for things.

~~~
Igglyboo
> I'm kinda amused a yc employee went through the effort of downvoting it after
> pointing out this situation is caused by y'all not following what you actually
> have in your notices for things.

Seriously what is it with HN/Reddit where everyone assumes that any downvotes
are from people with an agenda?

~~~
tedunangst
Because I know I'm right and all rational people agree with me. Anyone who
disagrees is a bad person.

