Fun! $ export LC_ALL='C' $ awk '{ print $2 }' 10-million-combos.txt | tr 'A-Z' '...

dvdhsu · on Feb 10, 2015

Cool! I found the usernames interesting as well, since not many studies have been done on them. "dragon" is both a common username and password! In reply to another child post: the enormous number of "michael" passwords probably has to do with the smaller, but still large, number of "michael" usernames.

I'd run some more commands, to find out how many "michael"s use "michael" as their password, but I've got to head out now. Would be interesting -- anybody up for it?

(Ooh -- you could even juxtapose the usernames against common American names by decade [1], and probably derive some data about the ages of these users as well!)

(Furthermore -- what if we started keeping track of most common passwords by decade? That could be super interesting! I wonder if it's changed much!)

  $ export LC_ALL='C'
  $ awk '{ print $1 }' 10-million-combos.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | head -n 20
  3044 info
  2119 admin
  1323 michael
  1113 robert
  1095 2000
  1049 john
  1041 david
  967 null
  940 richard
  922 thomas
  901 chris
  866 mike
  843 steve
  832 dave
  816 daniel
  812 andrew
  797 george
  765 james
  735 mark
  730 dragon

1. http://www.ssa.gov/oact/babynames/decades/names1980s.html
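
For anyone without a shell handy, the username count above can be sketched in Python. This is only an illustration: it assumes each line of the dump is a username and a password separated by whitespace, and `top_usernames` and the sample lines are made-up names and data, not anything from the real file.

```python
from collections import Counter

def top_usernames(lines, n=20):
    """Count lowercased usernames (the first whitespace-separated field)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0].lower()] += 1
    return counts.most_common(n)

# Hypothetical sample standing in for 10-million-combos.txt
sample = ["Michael hunter2", "michael dragon", "info 12345", "admin admin"]
print(top_usernames(sample, n=2))
```

To run it on the real file you would pass an open file object instead of `sample`, since iterating a file yields its lines.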

jcm1317 · on Feb 10, 2015

For some reason I seem to be getting different values than you. However, from what I got, there was only a single instance of a username 'michael' having a password 'michael'.

HOWEVER, of all of the people whose password is 'michael', 83 have usernames that seem to CONTAIN the string 'michael'.

Of the set of usernames 'michael', there are 20 whose passwords contain the string 'michael'.

Of the set of usernames containing the string 'michael', there are 276 whose passwords contain the string 'michael'.

I honestly expected much more.
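
The four counts above could be reproduced with something like the following. It is a rough sketch, not the commands actually used: it assumes the dump has already been parsed into (username, password) pairs, compares case-insensitively, and `name_overlap` plus the sample pairs are hypothetical.

```python
def name_overlap(pairs, name="michael"):
    """Tally exact and substring matches between usernames and passwords."""
    name = name.lower()
    counts = {
        "username_is_and_password_is": 0,
        "password_is_and_username_contains": 0,
        "username_is_and_password_contains": 0,
        "both_contain": 0,
    }
    for user, pw in pairs:
        user, pw = user.lower(), pw.lower()
        if user == name and pw == name:
            counts["username_is_and_password_is"] += 1
        if pw == name and name in user:
            counts["password_is_and_username_contains"] += 1
        if user == name and name in pw:
            counts["username_is_and_password_contains"] += 1
        if name in user and name in pw:
            counts["both_contain"] += 1
    return counts

# Hypothetical pairs, not taken from the real dump
pairs = [
    ("michael", "michael"),
    ("michael99", "michael"),
    ("michael", "michael1"),
    ("bigmichael", "xmichaelx"),
    ("bob", "dragon"),
]
print(name_overlap(pairs))
```

Note that each category includes the exact-match rows, so the numbers overlap rather than partition the data.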

userbinator · on Feb 10, 2015

In other words, supposing that this data is representative of most people's password practices, just trying these 20 passwords gives you a ~18% success rate for any username.

And... dragon. That's an unusual password to make the top-10 list. I think this might be a somewhat skewed sampling.

yoha · on Feb 10, 2015

You forgot a zero:

   >>> (55893+20785+13582+13230+11696+10938+6432+5682+4796+4191+3845+3734+3664+3655+3330+3206+3136+3126+3050+3002) / 1e7
   0.0180973

That is, 1.8%. This is confirmed by http://maxmcd.com/passwords.html.
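
The same arithmetic as a small script, in case anyone wants to vary the cutoff. The counts are the ones from the sum above; the 10,000,000 total is the `1e7` used there, and the variable names are just illustrative.

```python
from itertools import accumulate

# Occurrence counts of the 20 most common passwords, as summed above
top20_counts = [55893, 20785, 13582, 13230, 11696, 10938, 6432, 5682, 4796,
                4191, 3845, 3734, 3664, 3655, 3330, 3206, 3136, 3126, 3050,
                3002]
total_rows = 10_000_000  # same denominator as the 1e7 above

covered = sum(top20_counts)
fraction = covered / total_rows
print(f"top 20: {covered} of {total_rows} rows = {fraction:.2%}")

# Running coverage: entry k-1 is the share matched by the top k passwords
running = [c / total_rows for c in accumulate(top20_counts)]
print(f"top 10: {running[9]:.2%}")
```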

pgwhalen · on Feb 10, 2015

It makes equally little sense to me, but "dragon" is routinely high on top password lists.

burkaman · on Feb 10, 2015

I think it's probably just a common thought process. I'll pick an animal -> dragons are the coolest animal -> nobody will ever guess dragon, this is way better than using my dog's name.

Have you ever seen those online riddle things that say pick a color, pick a tool, wow I bet you picked a red hammer! We all grow up in relatively similar societies, we all have relatively similar ways of thinking.

jimmytucson · on Feb 10, 2015

    > I'll pick an animal -> dragons are the coolest animal -> nobody will ever guess dragon, this is way better than using my dog's name.

I must confess, this is typically my exact thought process when crafting a password, a username, or even sometimes a nickname for people to call me in real life.

caf · on Feb 10, 2015

Better change your passwords, Dragon.

jessaustin · on Feb 10, 2015

That many people have noted the "dragon" phenomenon as strange, but we don't yet have an explanation, is perhaps stranger yet. In early days, one could have hypothesized that some basic "how to use passwords" resource had offered "dragon" as an example of a password, but after two decades of internet it seems unlikely that something like that could have had such a large effect.

whoopdedo · on Feb 10, 2015

Part of it may be where the passwords are scraped from. If "dragon" has some relevance to the field then there's a higher probability that it will be used by people working in that field. This list is a sample of passwords from compromised databases not from all databases in the world.

I wonder about the prevalence of "allsop" as a password. I came across it in a computer I was repairing last week and it shows up 159 times in this list. Is it from the acronym SOP? Or because of the company that makes mouse pads?

userbinator · on Feb 10, 2015

My first thought was that there could be some connection with MMORPGs, which often feature dragons.

> Or because of the company that makes mouse pads?

This is probably the case for "allsop" - there are people who will look around them for inspiration when coming up with a password, and what was written on their mousepad caught their attention.

Karawebnetwork · on Feb 10, 2015

I have a close friend whose password is dragon. At first, I thought it was a joke, but it's true.

Those of us who were kids in the 90's had dragons everywhere. Hell, we wore shirts with dragon patterns. Dragons were cool. Dragons were our passwords.

CamperBob2 · on Feb 10, 2015

So is "jesus", and that doesn't seem to be true here. I find this list highly dubious, compared to others I've seen (and, long ago, obtained myself.)

m8urn · on Feb 10, 2015

And it has been for 20 years.

WillNotDownvote · on Feb 10, 2015

Computers are magic. Dragons are magic. QED.

I'm actually kinda serious.

Also, humans are monkeys. Ergo, "monkey" is popular.

oneeyedpigeon · on Feb 10, 2015

"humans are monkeys" - yeah, in the same way that unicycles are hovercrafts.

pavel_lishin · on Feb 10, 2015

> supposing that this data is representative of most people's password practices

That might not be the case; not all passwords are created equal.

As an example, my password to some goofy online game that requires registration is nowhere near as strong as the password required to log into my work email account - for some things, I prioritize being able to type a password in quickly on a mobile device over the danger of someone breaking in and playing a low-scoring word in online scrabble.

crisnoble · on Feb 10, 2015

It makes no sense to me, but I do recall a middle school phase where I used either "dragon" or "drag0n" for my passwords. I didn't particularly even like dragons and I don't recall ever hearing others use it, so it really catches me by surprise. Whenever I see it in a top passwords list I am filled with memories of after school library trips.

maxmcd · on Feb 10, 2015

https://github.com/maxmcd/pwd-guess

MarkMc · on Feb 10, 2015

For sensitive sites, my preferred solution to this problem is to add a sequence of random characters to the User ID field. The user would then authenticate with something like this:

  User ID: John-CPE4E38J
  Password: snoopy

For extra security the code would then move the random characters to the password so the authentication library would see this:

  User ID: John
  Password: snoopy-CPE4E38J

In this way even an attacker who gains full access to the server database would be unable to read the passwords (assuming they have been hashed well).

Also, the User ID can be stored in a cookie so that the User ID field on screen is pre-populated and the user only has to type "John-CPE4E38J" when he switches to a new computer.

More details here: http://security.stackexchange.com/questions/80352/is-it-a-ba...
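
A minimal sketch of the server-side step described above, for illustration only. It assumes the random suffix is everything after the last hyphen in the typed User ID, and it uses PBKDF2 from the standard library as a stand-in for whatever "hashed well" means on a given site. The function names and the iteration count are hypothetical.

```python
import hashlib
import os

def split_user_id(typed_user_id):
    """Split 'John-CPE4E38J' into ('John', 'CPE4E38J') at the last hyphen."""
    name, _, suffix = typed_user_id.rpartition("-")
    if not name or not suffix:
        raise ValueError("User ID must look like name-SUFFIX")
    return name, suffix

def effective_credentials(typed_user_id, typed_password):
    """Move the random suffix from the User ID onto the password."""
    name, suffix = split_user_id(typed_user_id)
    return name, f"{typed_password}-{suffix}"

def hash_password(password, salt=None, iterations=200_000):
    """Hash with PBKDF2-HMAC-SHA256 (an assumed choice, not from the post)."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

user, password = effective_credentials("John-CPE4E38J", "snoopy")
print(user, password)
```

Only the hash of the combined password would be stored, so the suffix never sits in the database in the clear.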

IgorPartola · on Feb 10, 2015

This is a horrible practice. You are trying to implement two factor auth, but with a static second factor that will not be considered private by most users. It is a huge burden on them to remember, and is providing you with dubious security at best, and actually providing a vector of attack at worst. Please don't do this.

MarkMc · on Feb 10, 2015

Yes, it is two factor authentication with a static second factor that will not be considered private by most users. And yes, a 'real' two-factor authentication mechanism would provide better security.

Unfortunately, due to market competition many websites simply cannot require 'real' two-factor authentication for all users. Here are the steps I would need to provide to my father to register for a typical '30-day free trial':

  1) Go to website.com and click 'Register'
  2) Enter your email address
  3) Think of a password and type it 
  4) Click 'I agree'
  5) Click 'Register'

Here are the steps I would need to provide to my father to register on a website for a free trial with 2-factor authentication using the Google Authenticator app:

  1) Go to website.com and click 'Register'
  2) Enter your email address  
  3) Think of a password and type it 
  4) Click 'I agree'
  5) On your phone, press the 'Play Store' or 'App Store' icon
  6) Press the 'Search' icon and search for 'Google Authenticator'
  7) Press 'Install' and wait for it to install (if you have an iPhone the install button might look like a little cloud icon)
  8) Press 'Open' to open Google Authenticator
  9) Press the 'Menu' button which looks like three dots in the top-right corner of the phone screen
  10) Choose 'Scan with barcode'
  11) Point the phone at the computer screen as though you were going to take a photo of the barcode on screen. 
  12) Wait for the phone to register the barcode, then enter the number shown on your phone into the website form
  13) Click 'Register'

Even with all these steps laid out for him, my father would probably find it extremely frustrating to get to step 13.
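
For context on what step 12 actually checks: Google Authenticator codes are time-based one-time passwords (TOTP, RFC 6238). Below is a bare-bones sketch of the calculation using only the standard library. It assumes HMAC-SHA1 and a 30-second step, which are the RFC defaults; the function name `totp` is made up, and a real site would also handle base32 secrets and clock drift.

```python
import hashlib
import hmac
import struct

def totp(secret, unix_time, step=30, digits=6):
    """Compute an RFC 6238 code for a raw-bytes secret at a given time."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

# Test secret from the RFC 6238 appendix, evaluated at time = 59 seconds
print(totp(b"12345678901234567890", 59, digits=8))
```

The barcode in step 11 is just a way of getting the shared secret onto the phone; after that both sides compute the same number independently.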

stef25 · on Feb 10, 2015

You could do it for him. Google Authenticator is great. My bank uses 2FA but it's on some fiddly little calculator device that I never have with me.

Some sites (Coinbase) do 2FA with text message, which is also great.

seriocomic · on Feb 10, 2015

> My bank uses 2FA but it's on some fiddly little calculator device that I never have with me.

I left my bank for this very specific reason (HSBC Aust)

Grrr

dingaling · on Feb 10, 2015

Conversely, I stay with my bank (Nationwide) because they use the device...

learnstats2 · on Feb 10, 2015

Your bank has not done this for your benefit and it hasn't done it in a way that benefits you. They've done it to pass on (to you) the liability for any fraudulent activity.

From http://www.cl.cam.ac.uk/~sjm217/papers/fc09optimised.pdf:

"We reverse engineered the UK variant of card readers and smart cards and here provide the first public description of the protocol. We found numerous weaknesses that are due to design errors such as reusing authentication tokens, overloading data semantics, and failing to ensure freshness of responses. The overall strategic error was excessive optimisation. There are also policy implications."

"The move from signature to PIN for authorising point-of-sale transactions shifted liability from banks to customers; CAP introduces the same problem for online banking. It may also expose customers to physical harm."

Meanwhile, I switched to a bank that uses SMS as a second factor and only where it's necessary: I don't need to use an inconvenient calculator.

handsomeransoms · on Feb 10, 2015

Are you generating the User ID with the additional characters and expecting the user to remember/keep track of it? I do not think that is very user-friendly, even with the cookie trick you describe.

It seems like you are trying to force your user to remember a salt. Why not just use a proper salt and a strong password hashing function?

Also note that this protection is only useful in the case where an attacker can get a database dump but cannot perform an active attack on the server.

On the other hand, I have seen some sites (gandi.net comes to mind) do something similar to this. Wonder if they have a similar security reasoning?

MarkMc · on Feb 10, 2015

> It seems like you are trying to force your user to remember a salt.

Yes, essentially I'm trying to force the user to remember a client-side 'salt'.

> Why not just use a proper salt and a strong password hashing function?

Because it wouldn't protect against the attack described by userbinator (ie. 'just trying these 20 passwords gives you a ~18% success rate for any username'). Having a client-side 'salt' gives you that protection.

> I do [not] think that is very user-friendly, even with the cookie trick you describe.

Yes, this system imposes a cost in terms of user-friendliness. But for sensitive sites (eg. medical or financial) I think it's worth it.
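
To put a rough number on how much a suffix like "CPE4E38J" adds, here is a back-of-the-envelope calculation. It assumes the suffix is 8 characters drawn uniformly from A-Z and 0-9; that alphabet is a guess from the one example given, not something stated in the scheme.

```python
import math

alphabet_size = 36  # assumption: uppercase letters plus digits
suffix_length = 8   # length of the example suffix "CPE4E38J"

combinations = alphabet_size ** suffix_length
bits = math.log2(combinations)
print(f"{combinations} possible suffixes, about {bits:.1f} bits")
```

That is on the order of 41 bits of guessing work layered on top of whatever the user picked, provided the attacker does not already know the suffix.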

kbart · on Feb 10, 2015

Sensitive sites should use 2-factor authentication by default, as your method won't help against keyloggers and other malware. I don't like 2-factor authentication (it's more time consuming and costly to get a throwaway phone number than a new single-purpose email address to register to a random site), but this method is even less user-friendly, as you can't expect an average user to remember a random symbol string in a few months. What would really improve the security situation is a good, easy to use, cross-platform, cross-device password manager that would be included in major browsers by default.

thaumaturgy · on Feb 10, 2015

From a user experience standpoint, this is a bit of a nuisance. Users are already having real difficulty remembering all of their different usernames and passwords for different things. A password manager is still an alien thing to a lot of people. A lot of people still have a little text file somewhere, or they rely on messages stored deep in their mailboxes somewhere, or they have a little piece of paper they try desperately not to lose...

You're right that their browser auto-complete will usually take care of it, but once it doesn't (because they switched browsers, because they got a new computer, because it got infected with malware and they took it to a wipe-and-reinstall shop), I'd expect a significant number of your users to fall back to just doing a password reset, which is a hassle.

From a security standpoint, I'm not sure what problem you're trying to solve. I get that you want to strengthen your users' passwords, but what is the specific scenario you're imagining where this is the best prevention? If you're concerned about someone brute-forcing user accounts from the outside, just make sure you have some sane throttling code. If you're concerned about someone stealing your database and breaking user passwords, just make sure you're using a robust password storage mechanism (blah blah bcrypt scrypt etc. etc.) and the usual other internet-facing application best practices (parameterized queries for example). If you're still feeling paranoid about that situation, then probably your server code could add some value to each password without doing any harm, I dunno. If someone gets sufficient access to your server to get your database and your code, game's over anyway. If you're concerned about your user having their credentials compromised elsewhere and that being used to access their account, do the same thing that many banks, Linode, and other services do: maintain IP white, grey, and black lists, and send a challenge/response to the user by text or email if the IP is on a grey list (in addition to checking for their login cookie first).

Your approach is different, but I don't understand it yet. :-)

MarkMc · on Feb 10, 2015

Yes, I agree it would be a bit of a nuisance for users to perform the 'reset password' after they switch computers, re-install the operating system, etc.

And yes, I'm trying to solve the two problems you mentioned: (a) someone brute-forcing user accounts from the outside; and (b) someone gaining access to the server database and thereby gaining access to other sites where the user has the same credentials. If it is true that "just trying these 20 passwords gives you a ~18% success rate for any username" then it seems to me that throttling brute-force attempts would not be very effective.

chias · on Feb 10, 2015

Is this materially different from requiring the user to have some random characters in the password, but for some reason making them type these characters into the username field where it'll be cached by the browser's autocomplete feature?

It seems like this is an amusing enough hack to do on non-sensitive sites, but I wouldn't do this on anything "real". When it comes to authentication, "hey I had this really neat idea" is almost always an immediate precursor to making things worse.

MarkMc · on Feb 10, 2015

If the random characters are stored in the User ID field then 95% of the time the user just has to remember their password. It is only when the user switches to a new computer that they would need to type in the random characters. Wouldn't that be a significant benefit over having to type the random characters every time the user logs in?

I agree with your observation that "hey I had this really neat idea" is almost always an immediate precursor to making things worse. Almost.

libria · on Feb 10, 2015

I'm surprised (disappointed?) only 1 person used "correcthorsebatterystaple".

pthreads · on Feb 10, 2015

That is terrible, he/she used the same phrase as in the example!

300bps · on Feb 10, 2015

Don't read too much into this. My main email account is in the original list that was posted in October of 2014. My account that is listed is myname@gmail.com. The password though is not the password to myname@gmail.com but rather to my "junk" site password.

For almost any site where I have an account, I use a strong, unique password. For sites that I don't care about at all AND that I suspect have security problems, I use a standard common insecure password. It is that common insecure password that is paired with my gmail account.

vacri · on Feb 10, 2015

Looks like if you know someone called Michael, chances are that you need to talk to him and his loved ones about password hygiene...

WalterBright · on Feb 10, 2015

My name isn't Michael, but I use the password 'michael' all the time.

Edit: oh, crud

num · on Feb 10, 2015

  10938 12345

That's the same combination I have on my luggage!

benbristow · on Feb 10, 2015

Heh. I use 'password' for when I'm purposely trying to make things insecure (like being nice and sharing my unlimited data via my phone's wifi hotspot on public transport).

stinos · on Feb 10, 2015

So this dataset seems to be limited to English-speaking, QWERTY-using users, i.e. US only I guess?

lurkinggrue · on Feb 10, 2015

Cool! My password hunter2 wasn't at the top of the list!

cfrs · on Feb 10, 2015

Here is the top 48K for the lazy ones: http://ix.io/ggh