
I'm Open Sourcing the Have I Been Pwned Code Base - vquemener
https://www.troyhunt.com/im-open-sourcing-the-have-i-been-pwned-code-base/
======
rethab
For the lazy ones searching for a GitHub link: it is not open source yet:

> HIBP isn't in a state to simply flick the visibility of it in GitHub, but it
> needs to get to that point. Instead, I need to choose the right parts of the
> project to open up in the right way at the right time.

and then further down:

> I don't have a timeline for each step along the way yet as HIBP remains
> something I do in my spare time and I've always got a bunch of other stuff
> on my plate, but the process has already begun and I'll be sharing more on
> that as soon as I can.

------
yodsanklai
Very naive question. Suppose I offer a service (e.g. Have I Been Pwned) and I
want my users to trust my service not to store/share their data. I understand
that open sourcing the code base is one step, but how can I know that the
server running the service actually does run this codebase and not something
else?

~~~
matsemann
Slightly off to the side of your question, but HIBP's API is designed in such a
way that you don't have to give them your password to find out whether it has
been leaked. You instead send HIBP only the first five characters of the
password's SHA-1 hash; it answers with every leaked hash that starts with those
characters, and your browser checks locally for a match. Take a look at the
requests in devtools on the page
[https://haveibeenpwned.com/Passwords](https://haveibeenpwned.com/Passwords)

So here it's taken one step further: you don't have to trust that it's the
same code running on HIBP, because you don't have to trust them at all for the
service to work.
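The range query described above can be sketched in Python. This assumes the public Pwned Passwords range endpoint (`https://api.pwnedpasswords.com/range/` followed by a five-character hash prefix), which returns one `SUFFIX:COUNT` line per leaked hash sharing that prefix. The helper names are hypothetical, and this is an illustration of the k-anonymity idea, not HIBP's own client code.

```python
import hashlib
import urllib.request

# Assumed public range endpoint; only the 5-character prefix is appended.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> tuple:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix: str, body: str) -> int:
    """Look for our suffix in a 'SUFFIX:COUNT' response; this runs locally."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Send only the prefix over the network; the full hash never leaves this machine."""
    prefix, suffix = split_hash(password)
    with urllib.request.urlopen(RANGE_URL + prefix, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    return count_in_response(suffix, body)
```

The server only ever learns a prefix shared by many unrelated hashes, which is why you don't need to trust what code it runs.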

~~~
faebi
But how many people will check in detail what's submitted? And again, how do
you prevent different users from getting different functionality?

~~~
mcherm
> But how many people will check in detail what's submitted?

That's a valid criticism. Still, it only takes one person finding clear evidence of
problematic behavior to advertise that fact to the entire community. So long
as a small fraction of people do actually check, the whole community will be
fine. But if only a _negligible_ number of people check it, then perhaps no
one will be checking when it is abused.

> And again, how do you prevent different users from getting different
> functionality?

Well, it depends on how the system is discriminating among users. To a great
extent, that kind of abuse is prevented by letting users remain anonymous: you
don't have to log into HIBP in order to use it. The system could still be
discriminating by IP address, or even by various kinds of browser
fingerprinting (including just collecting the advertising company cookies that
so effectively eliminate anonymity for much of the web). Occasionally using Tor
won't help against this -- a malicious operator could simply provide "clean"
functionality to all Tor users.

- - - -

On the whole, things like k-anonymous API design, making source code open, or
having a small fraction of users checking for security issues all help a great
deal and make it difficult for a malicious operator to abuse the system. But
none of them are _perfect_. In the end it comes down to trust. Troy Hunt has
EARNED that trust due to his openness, and in no small part by choosing to
implement protections like these that he didn't need to implement, and I find
that extremely persuasive. If you take steps to ensure that you CANNOT cheat
your "customers", then that's pretty good evidence you aren't likely to be
trying to cheat them.

~~~
kungato
How do we know if anyone is checking?

~~~
joshspankit
Another good question, but since checking is trivial (opening the browser dev
tools during a submission), we can assume a significant number do. Doubly so
because any service that says it’s secure gets a lot more scrutiny.

------
deanclatworthy
Troy does talk a little bit about the data here, but the codebase is not
entirely useful without it. I don't doubt the code can be improved beyond
Troy's one-man effort, but the real value of haveibeenpwned is in the data -
the code is useless without a huge trove of breach data - which I guess is
never going to be open.

~~~
1f60c
There’s no perfect solution here: publish the raw data, and people who haven’t
changed their passwords get cracked, or publish hashed (in some way)
passwords, and people will crack the passwords themselves. (That said, I guess
most crackers will just get the data straight from the source, but still.)

~~~
speedgoose
He does publish the passwords hashed, along with the number of times each
password was leaked (unsalted, and with the weak SHA-1).
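To illustrate why unsalted SHA-1 makes such a list easy to crack: each guess needs to be hashed only once, and that single hash is checked against every published entry at the same time. A minimal sketch, assuming the downloadable list's one-entry-per-line `HASH:COUNT` format; the function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def sha1_upper(text: str) -> str:
    # Unsalted: the same password always produces the same hash for everyone.
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def load_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'HASH:COUNT' lines into a hash -> count mapping, skipping blanks."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, _, count = line.partition(":")
        counts[digest.upper()] = int(count)
    return counts


def dictionary_attack(counts: Dict[str, int], guesses: Iterable[str]) -> Dict[str, int]:
    """Hash each guess once and look it up against the whole published set."""
    found = {}
    for guess in guesses:
        digest = sha1_upper(guess)
        if digest in counts:
            found[guess] = counts[digest]
    return found
```

Loading the real multi-gigabyte file into a dict would be impractical; streaming it or searching the hash-sorted file would be the realistic variant, but the cracking logic is the same.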

------
octorian
I hate to say it, but almost all of these sites/systems end up being far more
annoying than useful. Why? Almost all the time, their alerts simply are not
actionable. At best, they'll tell you that your username/email was included in
a breach and often identify the breach itself by some nebulous data cache name
that means nothing to you.

They almost never tell you which site was actually breached, nor do they ever
give you any hints as to what password was actually compromised.

So really, when I show up in one of these alerts, I'm always asking myself:

- Was this a recent breach, or a redundant alert from something I dealt with
months ago?

- What account do I actually need to update, if any, to be safe from this
alert?

These questions almost never seem to be answered. As someone who uses a
password manager and a different random password for every site, there's no
way I'm going to proactively hunt down and change every single entry in its DB
when I get the "alert of the week."

(FWIW, I once worked somewhere that somehow had access to a far better version
of this data than the public will ever get. That system actually did generate
alerts that were actionable. I only wish I had a way to get useful alerts like
that as a private individual.)

~~~
darekkay
> They almost never tell you which site was actually breached

That's probably true for most people. But that's what email aliases are for.
If I notice (through 1Password alerts) that my me+facebook@mydomain.com email
got leaked, I pretty much get an idea which site was breached. At least Gmail
and Fastmail support email aliasing.
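The tagging trick can be sketched like this: generate one plus-tagged address per site, and when a tagged address shows up in a breach, the tag names the site that leaked it. This assumes the provider delivers `user+tag@domain` to `user@domain`, as plus addressing does; the function names are hypothetical.

```python
from typing import Optional


def tagged_address(user: str, domain: str, site: str) -> str:
    """Build a per-site address such as me+facebook@mydomain.com."""
    return f"{user}+{site.lower()}@{domain}"


def leaking_site(address: str) -> Optional[str]:
    """Return the tag from a leaked address, or None if it carries no tag."""
    local, _, _ = address.partition("@")
    _, plus, tag = local.partition("+")
    return tag if plus else None
```

For example, `leaking_site("me+facebook@mydomain.com")` gives `"facebook"`.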

~~~
toyg
I keep using that scheme, but it’s been clear for a while that most systems
(and hence leakers) clean up those addresses. That's to be expected for a
feature that has been around for 15 years.
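The cleanup being described can be as simple as dropping everything between `+` and `@`. A hypothetical normalization step is sketched below: it erases the tag, while a per-site local part with no `+` (as with a catch-all domain) passes through unchanged.

```python
def strip_plus_tag(address: str) -> str:
    """Remove a '+tag' from the local part, e.g. me+facebook@x.com -> me@x.com."""
    local, at, domain = address.partition("@")
    base = local.split("+", 1)[0]  # keep only what precedes the first '+'
    return base + at + domain
```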

~~~
clarkdave
I use `facebook@mydomain.com` instead, which is easy enough to do with catch-
all aliases.

~~~
quickthrower2
Or me+Facebook and block email to me.

------
theshrike79
I give it 24 hours after release before someone rewrites it in Rust.

48 hours before it's on HN.

~~~
notmalc
This man knows

------
atxbcp
One of the tweets in the article says GitHub is open source: it's not.

~~~
Techbrunch
True but FYI if you want to, you can reverse engineer the code of the GitHub
enterprise version: [https://blog.orange.tw/2017/01/bug-bounty-github-
enterprise-...](https://blog.orange.tw/2017/01/bug-bounty-github-enterprise-
sql-injection.html)

~~~
johannes1234321
The fact that you can (probably in breach of the license terms) get to the
source doesn't make it open source, as understood by the majority of the
community (which is somewhere close to OSI's definition).

------
nelsonic
Is there a _reason_ (Security, Competitive, etc.) that the code wasn't open
source from the start?

~~~
buster
Not wanting to have toxic user requests and issues open for something you do
in your spare time could be one good reason.

~~~
msla
It's entirely possible to run a Git server which is read-only (for the rest of
the world, anyway) and doesn't allow anyone to bother you.

~~~
jcims
What's the point of doing that though? It's just chumming the water for
unsolicited feedback.

~~~
msla
> What's the point of doing that though? It's just chumming the water for
> unsolicited feedback.

You'd do it to prevent feedback.

~~~
jcims
People shit-talking your code on Twitter is still feedback.

~~~
msla
Nobody real monitors Twitter.

------
detaro
It's an interesting question: could some open group replicate it entirely,
e.g. something like Let's Encrypt (which is also a free service funded by
various companies)? The data sources and import are the key bit where trust is
an issue, and weirdly enough a single individual might have an easier time
than a well-funded foundation.

~~~
anaganisk
I'm not sure those corporate institutions would let it notify people of
breaches as quickly if HIBP were backed by the likes of Facebook, Google, etc.
It's better for HIBP to stay away from them.

------
nickthemagicman
Isn't the data whats important?

Isn't the code just essentially a text input box that takes a string, hashes
it, and runs it against hashed passwords in a database?

~~~
mooreds
Yup. Gathering the data is a large part of the value he provides. I wrote an
article about how to implement this, and data gathering tasks are crucial:
[https://fusionauth.io/learn/expert-
advice/security/breached-...](https://fusionauth.io/learn/expert-
advice/security/breached-password-detection)

He touches on the non technical difficulties as well with his comment: "We
invite parties to form their own views on the legality of the data." So the
fact he's gathered it all lets HIBP be a service that other companies can use
without worrying about the thorny legal question.

But there's also his reputation as a steward of the system, which is valuable
beyond the data itself.

Anyway, while he didn't actually open source anything yet, I'm glad he's
committing to it, as hopefully that will allow this internet resource to
continue.

------
42droids
Wow this is awesome news. I always wondered how the internals could work.
Can’t wait to take a peek...

------
mikorym
Didn't Troy want to sell to someone just a few years ago?

~~~
teh_klev
As mentioned in the article in the first paragraph:

 _" and it took a failed M&A process to get here"_

Second paragraph:

 _" especially in the wake of the M&A process[0] that ended earlier this year
right back where I'd started"_

[0]: [https://www.troyhunt.com/project-svalbard-have-i-been-
pwned-...](https://www.troyhunt.com/project-svalbard-have-i-been-pwned-and-
its-ongoing-independence/)

------
aspenmayer
I asked in the comments on the post which open source license he picked. Will
update if he replies.

~~~
tokai
Hopefully it'll be a proper free one.

~~~
aspenmayer
Hopefully it’ll _actually_ be open source, as in free as in money _AND_ free
as in beer. That is, I hope the license Troy Hunt picks for Have I Been Pwned
is _OSI-approved_ :

[https://opensource.org/licenses](https://opensource.org/licenses)

~~~
aspenmayer
Needless to say, I hope it’s also _free as in speech_. Saved the best for
last.

------
makach
It's good just for the sake of transparency. Maybe he will find someone who is
interested in helping maintain and contribute.

------
jcun4128
I'm curious what the code would be and how far it goes. I mean, from my
experience using it, it's a search box: type in an email, get results... so is
this going to show scrapers or something for where the data comes from?

Maybe it has efficient hash comparisons or something...

~~~
matsemann
He linked to lots of blogposts and also talked about the k-Anonymity API
design that was invented for HIBP.

~~~
jcun4128
Interesting, I hadn't heard of k-anonymity before. Will check that out,
thanks.

------
throwaway77384
What is this M&A process he keeps referring to?

~~~
MattGaiser
He tried to sell it at one point.

[https://threatpost.com/have-i-been-pwned-no-longer-for-
sale/...](https://threatpost.com/have-i-been-pwned-no-longer-for-sale/153401/)

~~~
throwaway77384
Thanks!

------
samirillian
> I'm sure I speak for [Junade] as well when I say we couldn't be happier that
> other companies have taken the model we pioneered and applied it to their
> own services too because at the end of the day, that's in everyone's best
> interests.

Socialism for "other companies," capitalism for us. It's details like this
that prove to me how purely ideological it is to claim that we need capitalism
to "produce value." _We_ produce value already. Capitalism uses that value for
free or a ridiculously low rate, then turns around and charges us for it.

------
bawana
Open sourcing code is one thing. Open sourcing the email addresses of
vulnerable victims is something else. It’s like publishing the largest
vulnerability of all time before it can be patched.

~~~
BenjiWiebe
Am I missing something? I understand from the article that he _is_ open
sourcing the codebase, not the data...?

