
1.4B records from “Have I been pwned” for analysis - runesoerensen
https://www.troyhunt.com/heres-1-4-billion-records-from-have-i-been-pwned-for-you-to-analyse/
======
minimaxir
Hmm. Given the amount of information removed (justifiably so), there isn't
much left to analyze other than aggregates (which the dashboard mockup in the
post already does).

What might be interesting is graphing a network of the relationship between
services which suffered a breach. Fortunately, I have the proper workflow
prepared to process the data in this manner, so I'll take a look.
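
A minimal sketch of what such a network could be built from, assuming each anonymized record can be reduced to the list of breaches one account appeared in. The record layout, the `breach_cooccurrence` name, and the sample data are illustrative assumptions, not the actual format of the release or the workflow mentioned above.

```python
from collections import Counter
from itertools import combinations


def breach_cooccurrence(records):
    """Count how often each pair of breaches shares an account.

    `records` is an iterable of lists; each list holds the breach names
    one account appeared in (a hypothetical layout, not the real schema).
    The result maps (breach_a, breach_b) -> shared account count, which
    can serve as weighted edges of a graph.
    """
    edges = Counter()
    for breaches in records:
        # Deduplicate and sort so every pair has one canonical ordering.
        for a, b in combinations(sorted(set(breaches)), 2):
            edges[(a, b)] += 1
    return edges


# Made-up sample data for illustration only.
sample = [
    ["Adobe", "LinkedIn"],
    ["Adobe", "LinkedIn", "Dropbox"],
    ["Dropbox"],
]

for (a, b), weight in breach_cooccurrence(sample).most_common():
    print(f"{a} -- {b}: {weight}")
```

The edge weights could then be handed to any graph library for layout and plotting.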

~~~
thaumaturgy
Yeah. I would've rather had a dump of _just_ the passwords -- no usernames,
site names, service names, counts, deduplication, just 1.4 billion passwords,
one password to a line.

I'd feed it into my bad-password-indexing program
([https://github.com/robsheldon/bad-passwords-index](https://github.com/robsheldon/bad-passwords-index))
and it would spit out a file that could be used to improve password security
in a not-entirely-stupid way somewhere.

~~~
ComodoHacker
Also, that's exactly what malicious actors would want to improve their
dictionaries, to brute-force the poorly hashed dump from the next breach.

~~~
koolba
I thought all his data came from publicly available dumps. If so, it might
make it easier for them to have it all in one place, but it's not like he's
giving them anything they can't get if they really wanted it (and probably
have already).

~~~
lorenzhs
No, it also contains non-public dumps that someone gave to him (after
authentication) iirc. But he never pays for dumps. I think he posted an
explanation on his blog a while ago.

------
gkafkg8y8
I used to be somewhat interested in stats on passwords, etc. from breach data
dumps.

haveibeenpwned is a helpful and legit site, though I think it should have used
email confirmation instead of requiring only an email address.

I also respect Troy along with many other security researchers. Even those
that are up to no good in the security world in some ways have contributed
good things; after all, the rest of us are stronger and more vigilant now than
we used to be because of their work.

However, this anonymized data will almost certainly be used by black hats more
than white hats, and I don't see how this release is good for the majority of
those that were affected by these breaches.

~~~
Arcsech
It does require email confirmation before showing "sensitive" data, like if an
email was in the Ashley Madison or AdultFriendFinder breaches.

~~~
problems
Which in and of itself is silly. The raw dumps are already available to
everyone, blackhats included. Personally I'm tempted to make a site that just
lists it by domain and stuff. I found several people at my company with Ashley
Madison accounts using a quick grep.

~~~
hughes
Even so, raising the barrier of entry to this data will prevent some people
from casually looking up their peers. It's worth doing.

~~~
problems
Casually looking up your peers is exactly what you should be doing in my
opinion. But I'm not a good person and I'd rather see details like that
plastered everywhere. Be an idiot, get what you deserve.

~~~
scrame
You must be a blast to work with.

------
ksec
I have wanted to ask this for a long time. What do / should you do once you
have been pwned?

You definitely have to change your password, or even start using a password
manager. But your record is now widely available; it feels as if you have been
left naked on the Internet.

You might change your email address (login name). But opening another
email account is such a hassle. And opening yet another account on each of
your favorite sites means you lose all of your past records.

~~~
awad
Saw this on HN earlier in the year and have been slowly phasing it into my life.
I have mail set up for my domain and assign per-site aliases (hn@mail.com,
fb@mail.com, etc). All of the mail just forwards to my gmail that I've had
forever, though that is more of a matter of convenience and habit. The key
thing is, I can spin up a new email whenever I want since I am the wildcard,
and I control the host and am not tied to the whims of gmail so I am covered
on both ends.
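
A rough sketch of the bookkeeping side of this scheme, assuming a catch-all domain you control. The helper names and the `example.com` domain are hypothetical; the mail forwarding itself is configured at the mail host and is not shown.

```python
from typing import Dict, Optional


def alias_for(site: str, domain: str) -> str:
    """Derive a per-site address such as 'hackernews@example.com'."""
    local = "".join(ch for ch in site.lower() if ch.isalnum())
    if not local:
        raise ValueError("site name has no usable characters")
    return f"{local}@{domain}"


def leaked_by(address: str, registry: Dict[str, str]) -> Optional[str]:
    """Return the site an alias was issued to, or None if unknown.

    If an alias turns up in a dump or starts receiving spam, this tells
    you which site exposed it, so only that alias needs replacing.
    """
    return registry.get(address.lower())


# Hypothetical domain and site names for illustration.
registry = {alias_for(s, "example.com"): s for s in ["Hacker News", "Facebook"]}
print(leaked_by("hackernews@example.com", registry))
```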

~~~
FT_intern
Be careful running your own email server. Some domain hosts have terrible
security (easily socially engineered).

I would love to know which DNS provider has the best security if anyone has
done the research.

~~~
quesera
Running your own name server is even easier than running your own mail server.

Running your own registrar gets tricky, however.

~~~
garaetjjte
But the registrar can still change which name servers the domain points to
(and its DS records).

------
jasonallen
I'm always amazed at how efficient AWS is at detecting private keys on the
internet (checked into github, etc..), and then proactively locking down
accounts. I wonder how long it will be until we see a similar service for
consumer accounts, like github or twitter. Seems like "have I been pwned"
might offer a commercial API for that purpose...

~~~
symlinkk
HIBP is simply a database of email addresses that have been associated with
leaks. I don't see how that would help him identify private keys.

~~~
johncolanduoni
Well AFAIK Twitter doesn't give users keypairs, so I think they meant do the
same with leaked credentials.

~~~
jasonallen
yeah, that's what I mean. Provide a service that reports back leaked working
credentials so that the service provider can lock down the account.

------
kumarski
The data can all be found here as well though:
[https://www.thecthulhu.com/](https://www.thecthulhu.com/)

I've come to assume some non-zero percentage of sales/marketing/lead gen
companies are using this as a go-between for an SMTP validator.

------
meneses
Awesome Analysis of the data -
[https://www.sizzleanalytics.com/Boards/sizzle/Data-Breaches-...](https://www.sizzleanalytics.com/Boards/sizzle/Data-Breaches-from-Have-I-Been-Pwned/3799c30d-b9e7-419b-bcc2-085f18d7a898)

------
gnu8
I still want the dumps for myself, so I can determine if any of my accounts
are compromised. I understand why it's not possible to distribute these dumps,
but for the same reason, I'm not going to search someone else's database. How
can we have a database of dumps that people can query safely, without
revealing their information to the people with the dumps?

~~~
mdrzn
Search the dump yourself and look for your address.

Or even better, build the service you're asking for! Clearly, there's a need.
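
One possible shape for such a service is a hash-prefix (k-anonymity style) range query: the client sends only the first few characters of a hash, the server returns every stored suffix in that bucket, and the client checks for a match locally. The sketch below is an in-memory illustration under that assumption. The class name, the prefix length, and the use of SHA-1 are choices made for the example, and email addresses are guessable enough that a short prefix still leaks some information.

```python
import hashlib

PREFIX_LEN = 5  # hex characters revealed to the server (an assumed value)


def sha1_hex(value: str) -> str:
    """Hash a normalized (trimmed, lowercased) value to uppercase hex."""
    normalized = value.strip().lower().encode("utf-8")
    return hashlib.sha1(normalized).hexdigest().upper()


class BreachIndex:
    """Server side: stores only hash suffixes, bucketed by hash prefix."""

    def __init__(self, addresses):
        self.buckets = {}
        for address in addresses:
            digest = sha1_hex(address)
            bucket = self.buckets.setdefault(digest[:PREFIX_LEN], set())
            bucket.add(digest[PREFIX_LEN:])

    def query(self, prefix: str):
        """Return every suffix in the bucket; the server never sees the rest."""
        return self.buckets.get(prefix, set())


def is_breached(index: BreachIndex, address: str) -> bool:
    """Client side: send the prefix, compare the suffix locally."""
    digest = sha1_hex(address)
    return digest[PREFIX_LEN:] in index.query(digest[:PREFIX_LEN])


# Made-up addresses for illustration.
index = BreachIndex(["alice@example.com", "bob@example.com"])
print(is_breached(index, "Alice@Example.com"))
print(is_breached(index, "carol@example.com"))
```

In a real deployment `query` would be a network call, and the bucket size (how many other entries share a prefix) determines how much the server can infer about the searcher.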

------
jxramos
Good distinction: data for analysis/trends vs. data for nefarious victimization.

