
Show HN: 5+ Billion Passwords in Order of Most Popular - berzerk0
https://github.com/berzerk0/Probable-Wordlists
======
lucasgonze
This list is immediately useful for validating user-created new passwords.
Just stop with the bizarre rules about having uppercase, lowercase, symbols,
numbers, length, etc. Instead require a string not in the top 10K (or 100K, or
1M) most popular.
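
A minimal sketch of that check, assuming the wordlist is a plain text file with one password per line, most popular first. The names `load_blocklist` and `is_acceptable` and the tiny in-memory sample are made up for illustration; they are not part of the project.

```python
def load_blocklist(lines, limit=10_000):
    """Collect the first `limit` distinct non-empty entries into a set."""
    blocked = set()
    for line in lines:
        word = line.rstrip("\r\n")
        if word:
            blocked.add(word)
        if len(blocked) >= limit:
            break
    return blocked


def is_acceptable(password, blocked):
    """Accept a password only if it is not among the most popular ones."""
    return password not in blocked


# Hypothetical stand-in for the top of a popularity-ordered wordlist file.
sample = ["123456", "password", "qwerty", "daniel"]
blocked = load_blocklist(sample, limit=3)

print(is_acceptable("password", blocked))
print(is_acceptable("daniel", blocked))
```

With a real file, `load_blocklist(open(path, encoding="utf-8", errors="ignore"))` would do the same thing; `limit` is where the 10K / 100K / 1M choice goes.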

~~~
microwavecamera
"Your password must be between 12 to 46 characters long and must include at
least one number, upper case character, special character, kanji character,
rune and quadratic equation"

~~~
dheera
I used to just tack on Aa$0 to the end of all my (already good) passwords to
satisfy these idiotic rules. Until one website decided that '0' was not a
number, and then I had to change everything to Aa$1.

~~~
terminado
I tend to revolt against this kind of bullshit by creating terrifyingly
primitive, pathetic passwords, and flagrantly sharing what my password is, in
front of relevant admins, while complaining loudly about the concept of
"_character diversity,_" as a way of fomenting popular rebellion.

When I'm met with periodic expiration, I employ constant direct password
assistance as a means of unpopular two-factor authentication, to piss people
off, and stir the pot.

I then fail my password reset as many times as necessary to recover my
preferred password for that system, to enforce my freedom of password
selection.

I control my password. Me.

Not you. Me.

~~~
arcbyte
I'm glad I'm not the only one who does this.

------
gfody
you should package this up as a bloom filter and simple js routine web
developers could use to do client-side checks to validate passwords.

edit: on 2nd thought looks like a bloom filter for 5B entries at p=0.01 would
be ~5GB, so not exactly convenient
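
For reference, the estimate can be checked with the standard Bloom filter sizing formula m = -n ln(p) / (ln 2)^2. This is only a back-of-the-envelope sketch; the function names are made up for illustration.

```python
import math


def bloom_size_bits(n, p):
    """Bits needed for n entries at false-positive rate p (optimal sizing)."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


def optimal_hashes(n, m):
    """Number of hash functions that minimises false positives for m bits."""
    return max(1, round(m / n * math.log(2)))


n = 5_000_000_000  # entries in the full list
p = 0.01           # target false-positive rate

m = bloom_size_bits(n, p)
print(f"{m / 8 / 1e9:.2f} GB")   # size in decimal gigabytes
print(optimal_hashes(n, m))      # hash functions per lookup
```

This comes out to roughly 6 GB (about 5.6 GiB), so the same ballpark as the figure above. The same formula gives about 1.2 MB for a 1M-entry list at p=0.01, which is a lot more shippable.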

~~~
berzerk0
The largest list is 20GB, but it's not the only one.

Popularity was based on how many times they appeared across files that had
all duplicates removed (each file deduplicated against itself).

The smallest file had passwords that appeared 75+ times, and the largest file
had passwords that appeared 2+ times.

The top 195 thousand (which appeared 25+ times in analysis) clocks in at 803 KB
as a text file with nothing but the passwords themselves.
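
A rough sketch of that kind of counting, assuming each source is a list of passwords and that "appeared N times" means "appeared in N deduplicated sources". The function name and the sample data are invented for illustration; this is not the project's actual tooling.

```python
from collections import Counter


def rank_by_file_count(sources, min_count=2):
    """Rank passwords by how many sources contain them, most common first."""
    counts = Counter()
    for source in sources:
        # set() removes duplicates inside one source, so each source
        # contributes at most 1 to a password's count.
        counts.update(set(source))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(pw, c) for pw, c in ranked if c >= min_count]


# Made-up miniature "files".
sources = [
    ["123456", "password", "123456"],
    ["password", "qwerty"],
    ["password", "123456", "letmein"],
]

print(rank_by_file_count(sources, min_count=2))
```

Raising `min_count` gives the smaller files (75+ appearances) and lowering it to 2 gives the largest one.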

------
Roritharr
Somehow I wonder if life is significantly different for people named Daniel.

~~~
fasteo
Or Alexander or Victoria. It's weird that these first names appear in the
Top196-probable.txt file.

------
berzerk0
Having issues with the Seedbox, torrents are down temporarily.

The main page contains links to Mega.NZ alternative downloads. Will be fixed
shortly, apologies for the inconvenience.

~~~
kaslai
Is there really any reason to multiply the downloads by 4 just to have
different compression methods? It seems to me like that just needlessly
dilutes the seed swarm and wastes space, since pretty much any modern archive
reader can unpack all the provided formats. Sure, specialized command-line
utils can't, but if you're using one of those then you probably know which one
to use for a given format.

~~~
berzerk0
You know, you're not the first person to bring that up - and I am having seed
problems.

Perhaps next rev I'll only include the .7z (smallest) and greppable formats.

------
jacquesm
I don't understand: Why do you assume that password checkers keep their lists
in alphabetically sorted form rather than just loading the whole thing into a
db table with an index on it?

~~~
berzerk0
As I was looking around for the files to make this project, on SecLists,
Weakpass, and Hashes.org, most of the files were in alphabetical order. This
was especially true for the larger files.

~~~
jacquesm
Yes, but it doesn't matter what order they are in if you put them in a DB. I
don't know anybody that would do a sequential scan over a file in production.

All you've done is shuffled around the most frequent cases to the beginning,
which is great if that's what you are looking for, but now something else
occupies the last slot and that case is just as bad as before.

The real solution is to see these files as input to an indexed table so that
you can get to the entry you want in O(log n) time.
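
A minimal sketch of what that could look like with SQLite, assuming an in-memory database and a made-up table name (`common_passwords`). The `PRIMARY KEY` gives the column a B-tree index, so a lookup does not depend on the order of the input file.

```python
import sqlite3

# Hypothetical sample; in practice this would be streamed from a wordlist file.
words = ["123456", "password", "qwerty", "daniel", "letmein"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE common_passwords (password TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT OR IGNORE INTO common_passwords (password) VALUES (?)",
    [(w,) for w in words],
)
conn.commit()


def is_common(conn, candidate):
    """Indexed lookup: cost grows with log(n), not with position in the file."""
    row = conn.execute(
        "SELECT 1 FROM common_passwords WHERE password = ? LIMIT 1",
        (candidate,),
    ).fetchone()
    return row is not None


print(is_common(conn, "letmein"))
print(is_common(conn, "correct horse battery staple"))
```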

~~~
neuroid
_I don't know anybody that would do a sequential scan over a file in
production_

Well, that's pretty much how one would try to crack a password using a
wordlist.

EDIT: If the goal is to crack a bunch of properly hashed (PBKDF2, scrypt,
etc.) and salted passwords then a lookup table is not very practical.
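
To illustrate why, here is a small audit-style sketch using PBKDF2 from Python's `hashlib`. The iteration count is deliberately tiny for the demo, and the helper names are invented. Because the salt is mixed into every hash, the same password hashes differently per account, so nothing can be precomputed and each candidate has to be re-hashed in list order.

```python
import hashlib
import hmac
import os

ITERATIONS = 1_000  # demo value only; real deployments use far more


def hash_password(password, salt):
    """Derive a salted PBKDF2-HMAC-SHA256 hash."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def audit(stored_hash, salt, wordlist):
    """Return the 1-based position of the matching candidate, or None."""
    for position, candidate in enumerate(wordlist, start=1):
        if hmac.compare_digest(hash_password(candidate, salt), stored_hash):
            return position
    return None


wordlist = ["123456", "password", "qwerty"]  # most popular first
salt_a, salt_b = os.urandom(16), os.urandom(16)

# Same password, different salts: the hashes differ, so no shared lookup table.
print(hash_password("password", salt_a) == hash_password("password", salt_b))

# The scan cost depends on where the password sits in the list.
print(audit(hash_password("password", salt_a), salt_a, wordlist))
```

This is where popularity ordering pays off: the earlier the likely candidates come, the fewer expensive hashes are needed before a weak password is flagged.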

~~~
jacquesm
If you're trying to crack 'a password' you're doing it wrong. Alternatively,
you're doing something illegal.

That's not why these lists are made public (though I can appreciate the fact
that they are 'dual use').

------
smaili
Not to sound negative, but is this something we should _really_ be exposing?
It feels like the only ones who gain from this are hackers and password
crackers, no?

~~~
mvdwoord
Audits. Education. Change.

It's not like bad guys don't already have wordlists, or that they are new in
any way. Having good ones available, in the open, for everyone to use,
provides a net benefit in the long run imo.

~~~
berzerk0
This is similar to my train of thought.

Year after year people still use the same passwords that were the most popular
for the year before. I think we should make it as obvious as possible that it
is worth your time to make secure passwords.

I think people know that they shouldn't use "password" as a password, but it
just needs to be made very clear through safe means (such as publishing
wordlists that don't have any associated user data) that it leaves you very
vulnerable. Many people simply haven't stopped to wonder how secure their
password is.

My goal with this project is to flip the status of the passwords on the list -
and make them the LEAST likely to be used.

------
macscam
Wow this is so useful for um security

------
bbcbasic
If you pick the 5 billionth one you've probably picked a gooden.

~~~
dang
We've banned this account for abusing the site, including posting many
unsubstantive comments after we asked you to stop. Please don't create
accounts to break the site rules with.

------
Mz
_• These lists are for LAWFUL, ETHICAL AND EDUCATIONAL PURPOSES ONLY._

Yeah, like that is going to stop people from doing nefarious things with this
info. If you feel the need to post this screechy, all caps disclaimer, maybe
rethink your project entirely?

Geez.

