Cool! I found the usernames interesting as well, since not many studies have been done on them. "dragon" is both a common username and password! In reply to another child post: the enormous number of "michael" passwords probably has to do with the smaller, but still large, number of "michael" usernames.
I'd run some more commands, to find out how many "michael"s use "michael" as their password, but I've got to head out now. Would be interesting -- anybody up for it?
(Ooh -- you could even juxtapose the usernames against common American names by decade [1], and probably derive some data about the ages of these users as well!)
(Furthermore -- what if we started keeping track of most common passwords by decade? That could be super interesting! I wonder if it's changed much!)
$ export LC_ALL='C'
$ 0-million-combos.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | head -n 20
3044 info
2119 admin
1323 michael
1113 robert
1095 2000
1049 john
1041 david
967 null
940 richard
922 thomas
901 chris
866 mike
843 steve
832 dave
816 daniel
812 andrew
797 george
765 james
735 mark
730 dragon
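For anyone who would rather do this in Python than in a shell pipeline, here is a rough equivalent of the count above. It is only a sketch: it assumes the usernames have already been extracted one per line, and the function name `top_usernames` is made up for illustration.

```python
from collections import Counter


def top_usernames(lines, n=20):
    """Lowercase each username and return the n most common ones with counts."""
    # Skip blank lines; strip trailing newlines before counting.
    counts = Counter(line.strip().lower() for line in lines if line.strip())
    return counts.most_common(n)
```

One difference worth knowing: `tr 'A-Z' 'a-z'` only folds ASCII letters, while `str.lower()` also folds non-ASCII ones, so the two can disagree on usernames with accented characters.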
For some reason I seem to be getting different values than you. However, from what I got, there was only a single instance of the username 'michael' having the password 'michael'.
HOWEVER, of all of the people whose password is 'michael', 83 seem to have usernames that CONTAIN the string 'michael'.
Of the set of usernames 'michael', there are 20 whose passwords contain the string 'michael'.
Of the set of usernames containing the string 'michael', there are 276 passwords that contain the string 'michael'.
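If anyone wants to reproduce these four numbers, something like the following would do it. This is a sketch, not the commands actually used: it assumes the dump has already been parsed into (username, password) pairs, it compares case-insensitively, and it reads the second figure as "password is exactly the name, username contains it". The function and key names are invented.

```python
def name_overlap_counts(pairs, name="michael"):
    """Count exact and substring overlaps between usernames and passwords."""
    counts = {
        "both_exact": 0,                 # username == name and password == name
        "pw_exact_user_contains": 0,     # password == name, username contains name
        "user_exact_pw_contains": 0,     # username == name, password contains name
        "both_contain": 0,               # both contain name
    }
    for user, password in pairs:
        u, p = user.lower(), password.lower()
        if u == name and p == name:
            counts["both_exact"] += 1
        if p == name and name in u:
            counts["pw_exact_user_contains"] += 1
        if u == name and name in p:
            counts["user_exact_pw_contains"] += 1
        if name in u and name in p:
            counts["both_contain"] += 1
    return counts
```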
In other words, supposing that this data is representative of most peoples' password practices, just trying these 20 passwords gives you a ~18% success rate for any username.
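For what it's worth, a figure like that is presumably the share of accounts whose password is among the top 20. The thread doesn't show how it was computed, so the following is just one plausible way to get such a number, with a made-up function name:

```python
from collections import Counter


def top_n_coverage(passwords, n=20):
    """Fraction of accounts whose password is one of the n most common."""
    counts = Counter(passwords)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total
```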
And... dragon. That's an unusual password to make the top-10 list. I think this might be a somewhat skewed sampling.
I think it's probably just a common thought process. I'll pick an animal -> dragons are the coolest animal -> nobody will ever guess dragon, this is way better than using my dog's name.
Have you ever seen those online riddle things that say pick a color, pick a tool, wow I bet you picked a red hammer! We all grow up in relatively similar societies, so we all have relatively similar ways of thinking.
> I'll pick an animal -> dragons are the coolest animal -> nobody will ever guess dragon, this is way better than using my dog's name.
I must confess, this is typically my exact thought process when crafting a password, a username, or even sometimes a nickname for people to call me in real life.
That many people have noted the "dragon" phenomenon as strange, but we don't yet have an explanation, is perhaps stranger yet. In the early days, one could have hypothesized that some basic "how to use passwords" resource had offered "dragon" as an example of a password, but after two decades of the internet it seems unlikely that something like that could have had such a large effect.
Part of it may be where the passwords are scraped from. If "dragon" has some relevance to a field, then there's a higher probability that it will be used by people working in that field. This list is a sample of passwords from compromised databases, not from all databases in the world.
I wonder about the prevalence of "allsop" as a password. I came across it in a computer I was repairing last week and it shows up 159 times in this list. Is it from the acronym SOP? Or because of the company that makes mouse pads?
My first thought was that there could be some connection with MMORPGs, which often feature dragons.
> Or because of the company that makes mouse pads?
This is probably the case for "allsop" - there are people who will look around them for inspiration when coming up with a password, and what was written on their mousepad caught their attention.
I have a close friend whose password is dragon. At first, I thought it was a joke, but it's true.
Those of us who were kids in the 90's had dragons everywhere. Hell, we wore shirts with dragon patterns. Dragons were cool. Dragons were our passwords.
> supposing that this data is representative of most peoples' password practices
That might not be the case; not all passwords are created equal.
As an example, my password to some goofy online game that requires registration is nowhere near as strong as the password required to log into my work email account - for some things, I prioritize being able to type a password in quickly on a mobile device over the danger of someone breaking in and playing a low-scoring word in online scrabble.
It makes no sense to me, but I do recall a middle school phase where I used either "dragon" or "drag0n" for my passwords. I didn't particularly even like dragons and I don't recall ever hearing others use it, so it really catches me by surprise. Whenever I see it in a top passwords list I am filled with memories of after school library trips.
For sensitive sites, my preferred solution to this problem is to add a sequence of random characters to the User ID field. The user would then authenticate with something like this:
User ID: John-CPE4E38J
Password: snoopy
For extra security the code would then move the random characters to the password so the authentication library would see this:
User ID: John
Password: snoopy-CPE4E38J
In this way even an attacker who gains full access to the server database would be unable to read the passwords (assuming they have been hashed well).
Also, the User ID can be stored in a cookie so that the User ID field on screen is pre-populated and the user only has to type "John-CPE4E38J" when he switches to a new computer.
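The rewrite step described above (moving the random characters from the User ID onto the password before the authentication library sees them) could be sketched like this. The function name and the choice of "-" as the separator are assumptions taken from the example, not from any real library.

```python
def split_user_id(user_id, password, sep="-"):
    """Move the random suffix from the user ID onto the password.

    ("John-CPE4E38J", "snoopy") -> ("John", "snoopy-CPE4E38J")
    """
    # Split on the LAST separator so names that contain "-" still work.
    name, found, suffix = user_id.rpartition(sep)
    if not found or not name or not suffix:
        raise ValueError("user ID has no random suffix")
    return name, f"{password}{sep}{suffix}"
```

The returned pair is what would be handed to whatever hashing and lookup code the site already uses.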
This is a horrible practice. You are trying to implement two factor auth, but with a static second factor that will not be considered private by most users. It is a huge burden on them to remember, and is providing you with dubious security at best, and actually providing a vector of attack at worst. Please don't do this.
Yes, it is two factor authentication with a static second factor that will not be considered private by most users. And yes, a 'real' two-factor authentication mechanism would provide better security.
Unfortunately, due to market competition many websites simply cannot require 'real' two-factor authentication for all users. Here are the steps I would need to provide to my father to register for a typical '30-day free trial':
1) Go to website.com and click 'Register'
2) Enter your email address
3) Think of a password and type it
4) Click 'I agree'
5) Click 'Register'
Here are the steps I would need to provide to my father to register on a website for a free trial with 2-factor authentication using the Google Authenticator app:
1) Go to website.com and click 'Register'
2) Enter your email address
3) Think of a password and type it
4) Click 'I agree'
5) On your phone, press the 'Play Store' or 'App Store' icon
6) Press the 'Search' icon and search for 'Google Authenticator'
7) Press 'Install' and wait for it to install (if you have an iPhone the install button might look like a little cloud icon)
8) Press 'Open' to open Google Authenticator
9) Press the 'Menu' button which looks like three dots in the top-right corner of the phone screen
10) Choose 'Scan with barcode'
11) Point the phone at the computer screen as though you were going to take a photo of the barcode on screen.
12) Wait for the phone to register the barcode, then enter the number shown on your phone into the website form
13) Click 'Register'
Even with all these steps laid out for him, my father would probably find it extremely frustrating to get to step 13.
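For context on step 12: the number the phone shows is a time-based one-time password (TOTP, RFC 6238), which Google Authenticator implements. A minimal sketch of the calculation follows, assuming the shared secret is already available as raw bytes (in practice the barcode carries it Base32-encoded) and using the common defaults of HMAC-SHA1, 30-second steps, and 6 digits.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238, HMAC-SHA1) for raw secret bytes."""
    if at is None:
        at = time.time()
    counter = int(at) // step  # number of time steps since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The server runs the same calculation with its copy of the secret and compares the result with what the user typed, usually allowing a step or two of clock drift.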
Your bank has not done this for your benefit and it hasn't done it in a way that benefits you. They've done it to pass on (to you) the liability for any fraudulent activity.
"We reverse engineered the UK variant of card readers and smart cards and here provide the first public description of the protocol. We found numerous weaknesses that are due to design errors such as reusing authentication tokens, overloading data semantics, and failing to ensure freshness of responses. The overall strategic error was excessive optimisation. There are also policy implications."
"The move from signature to PIN for authorising point-of-sale transactions shifted liability from banks to customers; CAP introduces the same problem for online banking. It may also expose customers to physical harm."
Meanwhile, I switched to a bank that uses SMS as a second factor and only where it's necessary: I don't need to use an inconvenient calculator.
Are you generating the User ID with the additional characters and expecting the user to remember/keep track of it? I do not think that is very user-friendly, even with the cookie trick you describe.
It seems like you are trying to force your user to remember a salt. Why not just use a proper salt and a strong password hashing function?
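For readers unsure what "a proper salt and a strong password hashing function" means in practice, here is a minimal sketch using only the Python standard library. PBKDF2-HMAC-SHA256 stands in here for whichever slow hash a site would really choose (bcrypt, scrypt, and so on); the iteration count and function names are assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random per-user salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The salt is stored next to the digest on the server, so the user never has to see or remember it.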
Also note that this protection is only useful in the case where an attacker can get a database dump but cannot perform an active attack on the server.
On the other hand, I have seen some sites (gandi.net comes to mind) do something similar to this. Wonder if they have a similar security reasoning?
> It seems like you are trying to force your user to remember a salt.
Yes, essentially I'm trying to force the user to remember a client-side 'salt'.
> Why not just use a proper salt and a strong password hashing function?
Because it wouldn't protect against the attack described by userbinator (ie. 'just trying these 20 passwords gives you a ~18% success rate for any username'). Having a client-side 'salt' gives you that protection.
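To put a rough number on that protection, here is a sketch of how such a suffix might be generated and how many values an attacker would have to cover. The alphabet (uppercase letters and digits) and the length of 8 are inferred from the "CPE4E38J" example; the function names are hypothetical.

```python
import secrets
import string

# Assumed alphabet, inferred from the "CPE4E38J" example.
ALPHABET = string.ascii_uppercase + string.digits


def random_suffix(length=8):
    """Generate a suffix from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def suffix_space(length=8):
    """Number of distinct suffixes of the given length."""
    return len(ALPHABET) ** length
```

With 36 symbols and 8 characters that is 36^8, about 2.8 trillion possibilities, so a short list of common passwords no longer works against an account unless the attacker also knows that account's suffix.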
> I do [not] think that is very user-friendly, even with the cookie trick you describe.
Yes, this system imposes a cost in terms of user-friendliness. But for sensitive sites (eg. medical or financial) I think it's worth it.
Sensitive sites should use 2-factor authentication by default, as your method won't help against keyloggers and other malware. I don't like 2-factor authentication (it's more time-consuming and costly to get a throwaway phone number than a new single-purpose email address to register on a random site), but this method is even less user-friendly, as you can't expect an average user to remember a random symbol string in a few months. What would really improve the security situation is a good, easy-to-use, cross-platform, cross-device password manager that would be included in major browsers by default.
From a user experience standpoint, this is a bit of a nuisance. Users are already having real difficulty remembering all of their different usernames and passwords for different things. A password manager is still an alien thing to a lot of people. A lot of people still have a little text file somewhere, or they rely on messages stored deep in their mailboxes somewhere, or they have a little piece of paper they try desperately not to lose...
You're right that their browser auto-complete will usually take care of it, but once it doesn't (because they switched browsers, because they got a new computer, because it got infected with malware and they took it to a wipe-and-reinstall shop), I'd expect a significant number of your users to fall back to just doing a password reset, which is a hassle.
From a security standpoint, I'm not sure what problem you're trying to solve. I get that you want to strengthen your users' passwords, but what is the specific scenario you're imagining where this is the best prevention? If you're concerned about someone brute-forcing user accounts from the outside, just make sure you have some sane throttling code. If you're concerned about someone stealing your database and breaking user passwords, just make sure you're using a robust password storage mechanism (blah blah bcrypt scrypt etc. etc.) and the usual other internet-facing application best practices (parameterized queries for example). If you're still feeling paranoid about that situation, then probably your server code could add some value to each password without doing any harm, I dunno. If someone gets sufficient access to your server to get your database and your code, game's over anyway. If you're concerned about your user having their credentials compromised elsewhere and that being used to access their account, do the same thing that many banks, Linode, and other services do: maintain IP white, grey, and black lists, and send a challenge/response to the user by text or email if the IP is on a grey list (in addition to checking for their login cookie first).
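As an illustration of "sane throttling code", here is a minimal in-memory sketch. It is keyed by username and uses a sliding window; the class name, the limits, and the idea of keeping state in a dict are all assumptions for the example (a real service would likely also key on IP and persist the state).

```python
import time


class LoginThrottle:
    """Block further attempts after too many recent failures for one key."""

    def __init__(self, max_failures=5, window_seconds=900.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = {}  # key -> list of failure timestamps

    def _recent(self, key, now):
        # Drop failures that have aged out of the window.
        cutoff = now - self.window_seconds
        recent = [t for t in self._failures.get(key, []) if t > cutoff]
        self._failures[key] = recent
        return recent

    def allowed(self, key, now=None):
        now = time.monotonic() if now is None else now
        return len(self._recent(key, now)) < self.max_failures

    def record_failure(self, key, now=None):
        now = time.monotonic() if now is None else now
        self._recent(key, now).append(now)
```

The login handler would call `allowed()` before checking the password and `record_failure()` after each wrong one.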
Your approach is different, but I don't understand it yet. :-)
Yes, I agree it would be a bit of a nuisance for users to perform a password reset after they switch computers, re-install the operating system, etc.
And yes, I'm trying to solve the two problems you mentioned: (a) someone brute-forcing user accounts from the outside; and (b) someone gaining access to the server database and thereby gaining access to other sites where the user has the same credentials. If it is true that "just trying these 20 passwords gives you a ~18% success rate for any username" then it seems to me that throttling brute-force attempts would not be very effective.
Is this materially different from requiring the user to have some random characters in the password, but for some reason making them type these characters into the username field where it'll be cached by the browser's autocomplete feature?
It seems like this is an amusing enough hack to do on non-sensitive sites, but I wouldn't do this on anything "real". When it comes to authentication, "hey I had this really neat idea" is almost always an immediate precursor to making things worse.
If the random characters are stored in the User ID field then 95% of the time the user just has to remember their password. It is only when the user switches to a new computer that they would need to type in the random characters. Wouldn't that be a significant benefit over having to type the random characters every time the user logs in?
I agree with your observation that "hey I had this really neat idea" is almost always an immediate precursor to making things worse. Almost.
Don't read too much into this. My main email account is in the original list that was posted in October of 2014. My account that is listed is myname@gmail.com. The password, though, is not the password to myname@gmail.com but rather my "junk" site password.
For almost any site where I have an account, I use a strong, unique password. For sites that I don't care about at all AND that I suspect have security problems, I use a standard, common, insecure password. It is that common insecure password that is paired with my gmail account.
Heh. I use 'password' for when I'm purposely trying to make things insecure (like being nice and sharing my unlimited data via my phone's wifi hotspot on public transport).
[1]: http://www.wordle.net