The longer passwords in the Last.fm database (leakedsource.com)
141 points by ProfDreamer on Sept 8, 2016 | 119 comments



At first I was really impressed by `1qaz2wsx3edc4rfv5tgb6yhn7ujm8ik,9ol.0p;/`, but then I looked at my keyboard and all became clear. These brute force tools are getting better and better at trying useful combinations, to the point where I think all "clever" patterns are now known to the tools and the only thing that remains is completely random passwords as generated by password managers.

Thank you for posting this list - this is very enlightening.


I once used shift + top number row as a password for windows.

I thought I was really clever until I changed keyboard layouts one day and had to use the browser on my phone to get a picture of the old keyboard (and its layout) to figure out what the actual password was.


Yep yep yep! A long time ago I had four or five different passwords based on QWERTY keyboard patterns and I had no idea what any of the passwords even were. I thought it was so cool!

Then I moved to a country that used predominately AZERTY keyboards and couldn't access any of my accounts...


This is known as a "keyboard walk" and its variations are one of the more common patterns tested for.


I wonder if Dvorak keyboard walks are also targeted?


I'm guessing not. Your comment is my first result in Google search for "keyboard walk" dvorak.

Edit: Ars Technica had a piece a while back on crackers and how they build their lists of probable passwords. I imagine if Dvorak-based passwords became common, they wouldn't last long as "secure" before appearing in those lists.

Edit: The Ars link follows. You'll notice that the password in the story title is a keyboard walk, which the writer confuses as being complex, when it is not: http://arstechnica.com/security/2013/05/how-crackers-make-mi...


To add to the ease of hacking around it, how the hell are you supposed to log in from a phone? Or a different keyboard layout?


Password manager but then in that case it might as well be random.


That's a matter of memorizing when you encounter that situation.


Had a good laugh at this one

     <script>alert(document.cookie);</script>


It looks like someone fishing for a vulnerability rather than a real password.


Why not both? I mean Robert');DROP TABLE students;--1 seems like a pretty good password.


I'm not laughing.

This was probably an account created by a bot trying to detect the site's vulnerability to HTML injection.

It makes sense for a password cracker to include some common such attack strings, so it can get any accounts that get created by such probing: very clever.

A random, uncommon piece of programming language or markup language syntax would actually be a good password---you would think! I also wouldn't laugh if that were cracked.


If it were a bot, why would they include the `alert`? That would freeze up the UI, and provide no more useful information than `console.log`.

I'm guessing a human, who uses that password to see which websites are poorly coded.


See previous discussion here: https://news.ycombinator.com/item?id=12409530

Notably, the passwords were stored as unsalted MD5 hashes, which even in 2012 was known to be a poor idea.


This is a stupid question but how is it possible to reverse such long passwords from even a poor hashing algorithm? Even if the hashing algorithm is super fast, testing all combinations up to dozens of characters should still be impossible. And isn't there the possibility of collisions, so that even if you find a string that maps to the same hash, it might not be the original password?


> Even if the hashing algorithm is super fast, testing all combinations up to dozens of characters should still be impossible.

It's just time consuming, and compute time has gotten a lot cheaper. Advancement in GPU computing, cheap clouds, etc have really changed the game on this one since the invention of MD5 (which is why unsalted MD5 is such a bad idea now).

It's important to note that brute force is always technically possible, and the best hashing algorithms just consume more time. When we say "impossible" we really just mean "impossible to bruteforce before the heat death of the universe".

> And isn't there the possibility of collisions, so that even if you find a string that maps to the same hash, it might not be the original password?

That's correct, but if the hash is what's stored then the server is checking hash vs hash and a collision will result in login.

This is all coming from one computer security course, but I'm sure someone on HN can correct me if I've got any of this wrong.


They were obviously testing combinations of words, not combinations of single characters. They might even have tested plain sentences. Still very impressive. After all, the leak dates back to 2012. I wonder how much time the first one took, for example.

I think strings that map to the same hash are just unintelligible garbage. If you find something that looks like human-written text, then it's almost certainly the original password.


The first one is the title of a song [1]. The attackers probably have a lot of common phrases, song titles, and other catchy excerpts in their dictionary.

[1] https://www.youtube.com/watch?v=I915tOiR9sM

If it weren't a song title, it would probably have been impossible to crack. That sentence has 12 words. People say that most English conversations only use 3000 words. 3000^12 is 2^138. It has quite a bit more entropy than what we can crack nowadays. Besides, "stripper" isn't part of the 3000-word dictionary.


Those 3000 words are not random in natural language. If they were your calculation would be correct, but they aren't so the actual entropy of the system is likely nowhere near 138 bits. In other words, song title or not, if the sentence was an actual sentence the entropy is much lower. To get maximum entropy out of sets of words you have to use something equivalent to Diceware.


The ArsTechnica coverage of password cracking is eye-opening and informative about how high-speed password cracking is done. And this was 3 years ago!

http://arstechnica.com/security/2013/05/how-crackers-make-mi...


> And isn't there the possibility of collisions, so that even if you find a string that maps to the same hash, it might not be the original password?

Not really. 2^128 is a stupefyingly big number. We'll likely never find a random collision in MD5. We know how to generate collisions in MD5, but those attacks construct both colliding messages together; they don't help you find an input for a given hash (a preimage, e.g., the m in `h = H(m)`).


Thanks for the reminder. Likely couldn't crack these in SHA-1 (or not without substantially more effort), let alone some of the newer hashing algorithms.

Still, it's impressive.


Unsalted SHA1 is just as easy to brute force as unsalted MD5. There is maybe a ~30% speed difference.

Now adding a salt would have made it ~2 million times harder for these ~2 million unique passwords.
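
To illustrate the salting point, here is a minimal Python sketch. The user names, the `salt-<user>` salts, and the `crack_unsalted` / `crack_salted` helpers are made up for illustration and are not taken from the leak. With unsalted hashes, one MD5 computation per candidate is checked against every account at once; with per-user salts, each candidate has to be hashed again for every account.

```python
import hashlib

def md5_hex(data: str) -> str:
    return hashlib.md5(data.encode("utf-8")).hexdigest()

def crack_unsalted(hashes, candidates):
    """hashes: dict user -> md5(password). Returns (found, hashes_computed)."""
    by_hash = {}
    for user, h in hashes.items():
        by_hash.setdefault(h, []).append(user)
    found, work = {}, 0
    for cand in candidates:
        work += 1  # one hash is compared against every account
        for user in by_hash.get(md5_hex(cand), []):
            found[user] = cand
    return found, work

def crack_salted(records, candidates):
    """records: dict user -> (salt, md5(salt + password))."""
    found, work = {}, 0
    for cand in candidates:
        for user, (salt, h) in records.items():
            work += 1  # the candidate must be re-hashed for each account
            if md5_hex(salt + cand) == h:
                found[user] = cand
    return found, work

# Hypothetical toy data
passwords = {"alice": "letmein", "bob": "qwerty", "carol": "letmein"}
candidates = ["123456", "qwerty", "letmein", "password"]

unsalted = {u: md5_hex(p) for u, p in passwords.items()}
salted = {u: (f"salt-{u}", md5_hex(f"salt-{u}" + p)) for u, p in passwords.items()}

found_u, work_u = crack_unsalted(unsalted, candidates)
found_s, work_s = crack_salted(salted, candidates)
print(work_u, work_s)  # hashes computed: candidates vs candidates * accounts
```

The work for the salted case grows with the number of accounts, which is where the "~2 million times" figure for ~2 million passwords comes from.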


On a 2012 laptop with john 1.8:

• 24 million tries/second for unsalted MD5

• 18 million tries/second for unsalted SHA1

• 80,000 tries/second for crypt-md5 (state of the art… in 1995)

• 2400 tries/second for bcrypt (already becoming obsolete in 2012)

• And a tremendous 100 tries/second for scrypt (then state of the art)

So yeah, the ~30% performance hit roughly fits. But even md5 can be used much more intelligently than just salting, and would have been available in major programming languages.


Still, scanning the MD5 key space of 2^128 to brute force passwords would take a while. Of course it only takes time. But a lot of time. SHA1 is better, because it has 2^160 options. There's also another problem: what if the data isn't in a single block? Then you might find a collision, but that collision might include something which can't be put in the password field, and therefore doesn't solve the problem.

MD5 for PQKDEj52vGQVKudQaBMSewJ5MMifgaVxNYK9zsRTxMzBkyvompLMtgYCYv6SNzDE is cd5480cf1ad1cf7fab3aedbc6495609d. I would love to see someone reverse that. Even if you find a colliding input with a shorter set of input bits, it's highly likely that it won't be in the acceptable character set. Or am I getting something wrong?


> Or am I getting something wrong?

Yes, we're talking about passwords, not private keys. Virtually all passwords found in the wild have a massively lower entropy, so the amount of possible hashes doesn't really matter, only how long it takes to iterate through our much smaller password space. The longer that takes, the less attractive it is for attackers to try and brute-force people's passwords so they can try them on other services.


Unsalted MD5 is not that much worse than unsalted SHA-1 in this case. Afaik the only vulnerability MD5 has over SHA-1, fundamentally, is that MD5 has easy collision exploitation. But that doesn't matter when bruteforcing passwords.

* I'm not 100% on this, but let's be bold :) please forgive me if I'm wrong.


Interestingly, many of the longest passwords follow the same principle: A sentence repeated three times with two scrambled letters in one word each.


That might be an artefact of how they were found. Note that the page says "a few of the longer passwords". It's likely that they specifically searched for passwords using that principle.


Disturbingly, many do not, like the very first and longest one.

Of course, that one was gotten because it's simply the title of a song, mapped to lower case with spaces removed.

Password phrases, even if long sentences, must not be something pulled from popular culture.

I wouldn't expect, say, "thequickbrownfoxjumpedoverthelazydogs" to hold up.

I bet you that it wouldn't have been cracked had the user changed a few words in a way that is easy to memorize. I would have changed "stripper is crying" to "stripper is dichloromethane". (Compound used as powerful paint stripper.)


Sentences usually have whitespaces between words though, makes the passwords much easier to remember and handle.

I'm assuming last.fm does not support blanks in their passwords since none of these passwords use that character. Or perhaps very few people realize you can use that character to help make passwords more manageable.

My recommendation to people who ask the past few years has been full, grammatically correct sentences.


The problem is that the vast majority of websites I've seen handle the whole process involving passwords horribly (registration, resetting, etc), which induces users to use bad passwords just to get it over with.

Some let you fill out the form and then click on submit and tell you a problem with your password or something. You change it, then they tell you it has to be shorter than 15 or 10 characters, and impose such conditions you almost wait for them to tell you "use: 2Hx,!rJ" as your password. Some don't even support "special" characters, spaces, or hyphens. By the 4th or 5th attempt to register, you're basically trying to come up with the stupidest password you can to feed this monstrosity.

Mind you, some of these are big companies' websites. I think password or registration management also affects things like talent acquisition. Companies using Taleo, for instance, are doing a great job of repulsing normal, mentally sane people. The whole approach of registering one account for each company on a different company subdomain on the same domain (company1.taleo.net, company2.taleo.net), and filling out the profile all over again for each one, is beyond the realm of my comprehension.

The browser asks you to save the password/username for the website, but it does so for the domain, not the subdomains which all have different passwords. I give up on a company if it's using Taleo. I'm not talented or competent, but I'm sure really competent people wouldn't want to put up with this either and it hurts recruiting.


>The browser asks you to save the password/username for the website, but it does so for the domain, not the subdomains which all have different passwords. I give up on a company if it's using Taleo. I'm not talented or competent, but I'm sure really competent people wouldn't want to put up with this either and it hurts recruiting.

This sounds like your browser's password manager's problem, namely assuming that users will only have a single password per top-level domain.


They do support spaces. There's one password with 39chars down the list (about an "unveiling")


If I know that I'm going to try to compromise a system and access its username/password lists, is there an advantage to creating a number of accounts to which I know the password prior to the break-in?

Does this make it easier to break the other accounts once I have access to them encrypted? As in, I know that the account with username X has an unencrypted password Y, so now I have guideposts to tell if my cracking attempts are pointing in the right direction, trying to get back to password Y from the hashes. I imagine there would be something of an advantage to already knowing, say, 10,000 plaintext-encrypted pairs in a big list.

If this is the case, should one be concerned in managing a system that sees a dramatic uptick in new user registrations as a precursor to an attack?


Only if you are not able to characterize the hashing scheme, or if you think the app uses a static salt that you don't know. Then having a known plaintext would help figure it out. But I think you would only need one, not a ton of them.

It seems like if someone hacks a system so badly that they get the whole DB, they can probably also figure out the hashing scheme while they are in there.

I doubt that a dramatic uptick in new user registrations is a useful precursor signal.


Any ideas on how they manage to crack these? I can't grok how they would achieve this via a dictionary attack, especially the likes of:

    MgihtyDutchmanMgihtyDutchmanMgihtyDutchman
    alapdanceissomuchbetterwhenthestripperiscrying
    <script>alert(document.cookie);</script>


Brute forcing tools (eg. Hashcat) can take lists of words, mangle them to simulate typos, concatenate them a few times. That's how the first password is constructed (mighty, dutchman). There is really not that many combinations to try: take the most common 2^13 English words, add 4 variations of each word to simulate 4 different typos (2^15 words), test all possible pairs of words (2^30 pairs), repeat each pair up to 4 times, and that's a total of only 2^32 candidate passwords, which takes 1-2 seconds to brute force with hashcat on a GPU rig.
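
A rough Python sketch of that word-mangling idea, not Hashcat's actual rule engine: the `typo_variants`, `candidates`, and `crack` helpers and the three-word list are hypothetical, and "typos" are modeled simply as swaps of two adjacent letters plus optional capitalization.

```python
import hashlib
from itertools import product

def typo_variants(word):
    """The word itself plus every version with two adjacent letters swapped."""
    variants = {word}
    for i in range(len(word) - 1):
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return variants

def candidates(words, max_repeat=3):
    """Yield every ordered pair of mangled words, repeated 1..max_repeat times."""
    mangled = set()
    for w in words:
        for v in typo_variants(w):
            mangled.add(v)
            mangled.add(v.capitalize())
    for first, second in product(sorted(mangled), repeat=2):
        for n in range(1, max_repeat + 1):
            yield (first + second) * n

def crack(target_md5, words):
    """Return the first candidate whose MD5 matches, or None."""
    for cand in candidates(words):
        if hashlib.md5(cand.encode("utf-8")).hexdigest() == target_md5:
            return cand
    return None

# Hash of one of the passwords quoted above, computed here for the demo
target = hashlib.md5(b"MgihtyDutchmanMgihtyDutchmanMgihtyDutchman").hexdigest()
print(crack(target, ["mighty", "dutchman", "flying"]))
```

With only three base words this tries a few thousand candidates; the same structure scaled to thousands of words is what gives the 2^30 to 2^32 figures mentioned above.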

The 2nd password seems to come from a long dictionary: "a lap dance..." is the exact title of a song with no spaces and all in lowercase. It's good practice when brute forcing to take the titles of all known books, movies, songs... and put them in your dictionary.

The 3rd password also seems to come from a dictionary, typically built by scraping a few million web pages and taking literally all strings separated by whitespace.


If they had a list of usernames it shouldn't have been too hard. Most of the stuff on here is repeated usernames. The one in the middle is a song title with no spaces -- Bloodhound Gang, Hooray For Boobies


Password cracking "dictionaries" can have phrases in them.


I get that but permuting over typos, letter casing, lengths and combination of words would make the dataset huge.

Is there a massively collaborated rainbow table database that is constantly growing? Are there other heuristics that come into play such as guessing the password length or some such thing?


Rainbow tables aren't really a thing any more, you can calculate a hash much faster than you can download one


I wonder how my password policy stands up?

I have a memorized "satisfy stupid password rules"-string made up of lowercase, uppercase, digit, special character. Eg. pA5$word

Then I use "service name" [space] above string [space] "4-5 word sentence that first pops into my mind when I think about the service name"

So for netflix I would get:

netflix pA5$word the net is flickering

Serves me well and I have never entered the secret string in any password manager, only the ending sentence. I can't autotype it though but since it's a sentence it's remarkably easy to type correctly. It also surprises me how often I remember the "first sentence that pops into my mind".

The only problem I have with this scheme right now is services that don't allow something in this pattern (mostly no spaces) and force me to deviate, which makes my blood boil.


Depends on what the threat is.

For a brute-force dictionary attack: the "netflix" part is worth about as much as a single random character, but the length added by the sentence will do you much good. The special chars are good.

When a hack like this becomes public and someone tries to attack you specifically: the "netflix pA5$word" part becomes worthless, but the sentence saves you.

You forgetting stuff: the sentence will break your neck.

I guess a good master password and a password safe with random passwords is better, but you are doing pretty well! Also, you can use a single password on an untrusted computer without fear of compromising all your other passwords too (again, thanks to that sentence).


> His password: netflix pA5$word the net is flickering

I don't get it that you say that "netflix" in this password has no more worth than a single character. How can the cracker know that this is "netflix" and not "netfli " or "neTflix"?

Furthermore, it's not like the password reveals itself during the process. Until all characters are found, there should be no logic in the result, or am I wrong?


I thought he uses the unchanged service name as a prefix. If I had the chance to bruteforce netflix accounts with a dictionary, I'd definitely have "netflix" as one of my dictionary words (and Netflix and netflix.com and Netflix.com etc).


I assume netflix is in the dictionary for all word-based bruteforce attacks. It's just a prefix word in the scheme that is super easy to remember; it's in the URL. And an attacker can't know whether it's www.netflix.com, www.netflix.se, Netflix, NETFLIX, in the beginning, in the end or any number of variants that could be used consistently in the scheme. The main part is that I can remember it as "service name lower case" "breaker string" "words".


> the "netflix" part is worth as much as a single random character

Unless you mean that they'll look for "netflix" specifically because of the service they're cracking, surely a word adds much more entropy than a single character?

~50 commonly usable characters, vs tens of thousands of words.


Would be really interested to know how they cracked these passwords...


Computerphile did a great video on how to crack passwords here: https://www.youtube.com/watch?v=7U-RbOKanYs


Brute forced them?


I really doubt that they brute-forced alapdanceissomuchbetterwhenthestripperiscrying. I have no exact idea, but I guess it would take thousands or millions of years to brute-force 1.22680068e65 combinations (taking only lowercase letters into account), if you don't have a working quantum computer available.

UPDATE: I did some rudimentary math and think that top-notch server farms would take something like 1e35 to 1e42 years to brute-force 26^46 combinations.
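
For anyone who wants to redo that arithmetic, a small Python sketch. The guess rate of one trillion MD5 hashes per second is an assumption for a single large rig, not a measured number; a bigger farm lowers the exponent by a few digits but does not change the conclusion.

```python
password = "alapdanceissomuchbetterwhenthestripperiscrying"

keyspace = 26 ** len(password)      # lowercase letters only
rate = 10 ** 12                     # assumed: 1e12 guesses per second
seconds_per_year = 365 * 24 * 3600
years = keyspace / rate / seconds_per_year

print(len(password))                # number of characters in the password
print(f"{keyspace:.3e}")            # size of the pure brute-force search space
print(f"{years:.1e}")               # years to exhaust it at the assumed rate
```

The password has 46 characters, so the pure brute-force space is 26^46, about 1.23e65, which works out to years on the order of 1e45 at the assumed rate.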


It's not a random sequence of characters, there are only 12 words in there. The cracker is trying words, not just random bytes and so the search space is much smaller


It's not trying random words, it's not even trying random syntactically valid English sentences, it's trying out song titles.

Which is a laughably tiny password space.


Yes, it's not brute-forcing then. That's what I was getting at.


'97 quattuordecillion years' according to howsecureismypassword.net/, if the password weren't a Bloodhound Gang song title


It's probably a phrase from a book or movie. There are far fewer published 12-grams than 26^46.


It's actually a song title.


Which is really easy information to track down.

https://www.google.com/search?q=alapdanceissomuchbetterwhent...


Last.fm users are clearly big fans of Radiohead.


No surprises there. Radiohead were always the most scrobbled artist when I used the site actively. They probably still are.


Well, this was a great promo for the people that built this site. I just paid $4 for a 24 hour pass to search and view all the info of mine that's been leaked. Well worth the price in my mind.

I'd love to scan my work's customer database for hits, in order to prompt those customers to reset their passwords. But I think $1k/month is too expensive for us. Does anyone know of any cheaper alternatives?

In any case, it's a great service to provide. After one of the more recent leaks I ended up receiving emails from Pandora and Uber, prompting me to reset my password.


I'd hate to be David Iceland right now....


Not many special characters there. Still, some notes on what those tools try first: some keyboard walks, some XSS strings. A single dot at the end, parentheses, or underscores don't seem to help much.

Seems like today a password manager is a must.


Kind of counters the idea from this xkcd comic that longer passwords are better, even when they just contain dictionary words:

https://xkcd.com/936/


The XKCD comic is showing its age. The comic mentions 1000 hashes per second. Assuming the entropy estimation is accurate (is it?), and it would take 550 years with 1000 guesses per second, that's still not very impressive.

A single AMD Radeon Pro Duo graphics card can perform an estimated 8 billion guesses per second on the password hashes (unsalted SHA-1). A sub $10000 cracking rig with four of them can do 32 billion per second. That would mean the 550 year guessing time of XKCD's example password has been reduced to 9 minutes due to sheer computation power alone [1]. This is why it's important for everyone to use a slow and salted password hashing function (Argon2, scrypt, bcrypt, PBKDF2) to make sure that GPUs cannot guess hashes so terribly efficiently.

Note that this even ignores any benefits attackers have had cracking a large amount of unsalted passwords, which will have been substantial.

[1]: Edit: Looking up the current status quo, a single Nvidia GTX Titan XP can do almost 12 billion hashes per second in oclHashcat, that's 48 billion hashes per second for your cracking rig. Down to 6 minutes it is.
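
A quick Python check of those numbers, taking the comic's 44-bit estimate at face value and counting the time to exhaust the whole space (the `seconds_to_exhaust` helper is just for illustration):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
guesses = 2 ** 44  # the comic's entropy estimate for its four-word example

def seconds_to_exhaust(rate_per_second):
    """Worst-case time to try every one of the 2**44 candidates."""
    return guesses / rate_per_second

print(f"{seconds_to_exhaust(1_000) / SECONDS_PER_YEAR:.0f} years at 1,000 guesses/s")
print(f"{seconds_to_exhaust(32e9) / 60:.1f} minutes at 32 billion guesses/s")
print(f"{seconds_to_exhaust(48e9) / 60:.1f} minutes at 48 billion guesses/s")
```

This gives roughly 558 years, 9.2 minutes, and 6.1 minutes, in line with the figures above.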


The comic itself mentions that its intended threat model is someone trying to remotely log in to your web server/ssh/whatever, not trying to crack stolen hashes.

I doubt any web service lets you try 32 billion logins/sec.


Ideally you generate the password for your password manager XKCD-style, and let the password generator spit out 128 bit ASCII passwords for everything else. Then an attacker needs to get a sufficiently recent copy of your password database first, and password managers can afford using much higher work factors for their master password than websites for every single user.


The gulf is so vast between how humans use a web service and how an automated brute force attempt uses a web service, that it should be trivial to block remote brute forces.

Limit attempts to 1 per second per user ID, and block IPs with 50 consecutive failed attempts per user ID. These should be invisible to a human, but totally stop the brute force of any but the most obvious passwords.
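
A minimal in-memory Python sketch of those two rules. The `LoginThrottle` class and its method names are hypothetical; a real deployment would need shared storage, block expiry, and handling for shared or spoofed IPs.

```python
import time
from collections import defaultdict

class LoginThrottle:
    """1 attempt per interval per user ID; block an IP after N consecutive failures."""

    def __init__(self, min_interval=1.0, max_failures=50, clock=time.monotonic):
        self.min_interval = min_interval
        self.max_failures = max_failures
        self.clock = clock
        self.last_attempt = {}              # user_id -> time of last allowed attempt
        self.failures = defaultdict(int)    # (ip, user_id) -> consecutive failures
        self.blocked_ips = set()

    def allow(self, ip, user_id):
        """Return True if this attempt may proceed to the password check."""
        if ip in self.blocked_ips:
            return False
        now = self.clock()
        last = self.last_attempt.get(user_id)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_attempt[user_id] = now
        return True

    def record(self, ip, user_id, success):
        """Record the outcome of a password check that was allowed."""
        key = (ip, user_id)
        if success:
            self.failures.pop(key, None)    # a success resets the streak
            return
        self.failures[key] += 1
        if self.failures[key] >= self.max_failures:
            self.blocked_ips.add(ip)

throttle = LoginThrottle()
if throttle.allow("203.0.113.7", "bob"):
    throttle.record("203.0.113.7", "bob", success=False)
```

The clock is injectable only so the behaviour can be exercised without sleeping.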

These don't seem like they would be difficult to do, but I am shocked at how few web apps do this. Last I checked, Wordpress ships with no limits at all on login attempts, for example.


The comic doesn't say anything about what hash function to use. At any given time you could identify a hash function which takes a millisecond per password. That will be the hash function to use for the comic's algorithm.

The point of the comic is that a character salad is both less secure and harder to remember than four random dictionary words. That's no less true today.

The MD5 hash of "correcthorsebatterystaple" might be trivial to brute-force today, but it is - and always will be - harder to brute-force than "Tr0ub4dor&3", even if in both cases the attacker knows how you generated the password.


If the server uses bcrypt and a salt, even with a bank of GPUs it's not cost effective to brute force. Easier to go after stupid companies that use unsalted MD5.


> The XKCD comic is showing its age. The comic mentions 1000 hashes per second.

scrypt should be about in that order of magnitude even on modern hardware.


I think the problem with the longer passwords in this list is that they're well known phrases like song titles or just repetitions of the username.


Additionally, last.fm used unsalted MD5 hashes, which are so trivial to break it's not funny any more.

(john manages 24 million hashes/second on my five-year-old laptop for that; even crypt(3) with md5 is three orders of magnitude harder to brute force.)


How? The passwords on the list might be long but that's the only thing they share with the method from the comic. Usually they are just the same short phrase (often well-known) repeated a few times. This is not what that comic suggests.


Not a single one of those phrases is a random mix of dictionary words.


how so?


It doesn't actually. It shows how badly flawed the XKCD comic is. The problem with the XKCD comic is that he advocates creating a passphrase without a random function. Turns out, with no surprise, the resulting passphrases are easy to guess, because they are predictable word sequences, phrases, or sentences.

To illustrate, suppose you have a word list of 8,192 entries, and a cryptographically secure random function. Shannon entropy says that each word in that list then contains exactly 13-bits of entropy (2^13=8,192). According to https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a27..., 8 Nvidia GTX 1080 GPUs with Hashcat 3.0 can process 200 billion MD5 hashes per second, which means 5 of those password cracking rigs, working in concert, can do 1 trillion MD5 hashes per second.

So, if you have a cryptographically secure random function choosing your words from that list of 8,192 words, what are we looking at?

- 1 word (13-bits): 1 in 8,192 possibilities

- 2 words (26-bits): 1 in 67,108,864

- 3 words (39-bits): 1 in 549,755,813,888

- 4 words (52-bits): 1 in 4,503,599,627,370,496

- 5 words (65-bits): 1 in 36,893,488,147,419,103,232

- 6 words (78-bits): 1 in 302,231,454,903,657,293,676,544

There is no need to go any higher than that, as we'll see in a second. If the password cracker searches 1/2 of the total combinations, there is a 50% probability that the password has been found by then. So, armed with this, it would take the password cracker:

- 13-bits: < 1 second to search 1/2 the space

- 26-bits: < 1 second

- 39-bits: ~ .3 seconds

- 52-bits: ~ 38 minutes

- 65-bits: ~ 213 days

- 78-bits: ~ 4,792 years
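
The two lists above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming the 1 trillion MD5 hashes per second figure from above; the `half_space_seconds` helper name is made up.

```python
import math

WORDLIST_SIZE = 8192   # 2**13 words, as in the example
RATE = 1e12            # assumed: 1 trillion MD5 guesses per second

def half_space_seconds(words):
    """Seconds needed to search half of all passphrases with this many words."""
    return (WORDLIST_SIZE ** words) / 2 / RATE

for n in range(1, 7):
    bits = n * math.log2(WORDLIST_SIZE)
    combos = WORDLIST_SIZE ** n
    print(f"{n} words ({bits:.0f} bits): 1 in {combos:,}, "
          f"{half_space_seconds(n):.3g} s to search half")
```

Dividing the last three results by 60, 86,400 and 31,536,000 respectively gives the minutes, days and years quoted in the list.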

It's reasonable to conclude that if your threat model is password cracking clusters working on leaked hashed password databases, and assuming the password is hashed with MD5, then at least 65-bits of entropy, or 5-6 words chosen from a list of 8,192 with a cryptographically secure random function, is a good target for a secure passphrase length.

For what it's worth, Diceware has been promoting this approach for years now, where the word list is 7,776 entries (~12.93-bits of entropy per word), and the cryptographically secure random function is 5 fair 6-sided dice. The XKCD "correct horse battery staple" approach is just a simplified implementation, forgetting the random factor.


It does say "four random words". To be fair, that could be spelled out more clearly.


Yeah, we need to be pedantic here. Humans picking words at "random" is far different from dice picking words at random. Turns out, human-randomness isn't very random at all.


Also, "random" is often used colloquially to mean "arbitrary" which further muddies the waters for an uninformed reader.


It is indeed no longer good advice, but not because of longer passwords not being better:

https://www.schneier.com/blog/archives/2014/03/choosing_secu...

> This is why the oft-cited XKCD scheme for generating passwords -- string together individual words like "correcthorsebatterystaple" -- is no longer good advice. The password crackers are on to this trick.


Argh, no!

Entropy. It's all about entropy calculation. There is no trick.

Look at it this way:

- choose 68 bits at random. This is findable by a nation state; if they are your adversary, add a couple more words.

- split those bits into 4 17-bit numbers (17 bits has a maximum of about a hundred thousand)

- your password is 4 words from a hundred thousand word dictionary

- there is no trick: you've encoded 68 random bits securely, and guessing just those bits is pretty impossible.

Schneier is normally sensible but calling entropy a "trick" is idiocy.
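
The steps above can be sketched in Python as follows. The `passphrase_from_bits` helper is hypothetical, and `demo_words` is a placeholder list standing in for a real dictionary with at least 2^17 (131,072) distinct words.

```python
import secrets

def passphrase_from_bits(wordlist, words=4, bits_per_word=17):
    """Draw words * bits_per_word random bits and map each chunk to one word."""
    if len(wordlist) < 2 ** bits_per_word:
        raise ValueError("word list too small for the requested bits per word")
    value = secrets.randbits(words * bits_per_word)   # e.g. 68 random bits
    mask = (1 << bits_per_word) - 1
    chosen = []
    for _ in range(words):
        chosen.append(wordlist[value & mask])          # low 17 bits pick a word
        value >>= bits_per_word                        # move to the next chunk
    return " ".join(chosen)

# Placeholder entries; a real list would hold actual dictionary words.
demo_words = [f"word{i:06d}" for i in range(2 ** 17)]
print(passphrase_from_bits(demo_words))
```

Because every 17-bit chunk maps to exactly one word, the passphrase carries the full 68 bits even if the attacker has the same word list.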


Or with actual code:

https://github.com/creshal/yspave/blob/master/yspave/pwgen.p...

A password generator does not care whether its input alphabet is \d, \w, all emoji codepoints, or `cat /usr/share/dict/words`. You determine the entropy of it, and then output as many tokens as needed. It doesn't matter whether an attacker has a copy of your input alphabet, or knows your algorithm. You defeat him by setting the entropy bar high enough (and using a cryptographically secure RNG to generate it).

But a random selection of /usr/share/dict/words (~120,000 entries on Arch Linux) will be easier to memorize than a random selection of the ascii printable range.
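
This is not the code from the linked repository, just a minimal Python sketch of the same idea under the assumption that the alphabet contains no duplicate tokens: work out the entropy per token, then emit as many tokens as the target requires. The `generate` function and the placeholder word list are made up for illustration.

```python
import math
import secrets
import string

def generate(alphabet, target_bits=128, sep=""):
    """Return (password, token_count) with at least target_bits of entropy."""
    bits_per_token = math.log2(len(alphabet))
    count = math.ceil(target_bits / bits_per_token)
    tokens = [secrets.choice(alphabet) for _ in range(count)]
    return sep.join(tokens), count

ascii_alphabet = string.ascii_letters + string.digits + string.punctuation
# Placeholder standing in for a ~120,000 entry /usr/share/dict/words
word_alphabet = [f"word{i}" for i in range(120_000)]

print(generate(ascii_alphabet))             # 94 symbols -> 20 characters
print(generate(word_alphabet, sep=" ")[1])  # 120,000 words -> 8 words
```

The attacker knowing the alphabet and the algorithm changes nothing here; only `target_bits` sets the difficulty.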


This article is posted every time the comic is mentioned but I can't understand the argument. The calculation of password complexity in the comic is made with that in mind. It's assumed that the cracker knows the method used to generate the password, including the dictionary. The strength of this method does not rely on the fact that the password has many characters but that words are randomly chosen from a large dictionary. The attacker would need to do the same at minimum.


This is my understanding, could be missing something though:

The comic proposes that correcthorsebatterystaple would be a secure password. It is 25 characters long and contains 4 basic English words that would be in even the smallest of dictionaries. Let's say a dictionary with 100,000 entries contains these words. Combining 4 words from the dictionary gives you 100,000^4 = 1e+20 possibilities. Not bad, but let's say 15 random uppercase and lowercase letters (like the first letters of every word in a phrase, as Schneier proposes) give you (26+26)^15 = 5.49e+25, still 5 orders of magnitude better.

Edit: a phrase with 26 words to take each letter from indeed isn't feasible to remember, so I changed it to 15.


100k words might be a bit too many for a dictionary of words that are easy to remember. The comic proposes using not enough entropy (i.e. you should use more than four random words).

The Schneier method is basically equivalent but with important caveats: 1) Such a phrase is not randomly chosen but follows at minimum basic English grammar, and at worst is well-known and thus part of the attacker's dictionary, and 2) even if that is taken into account, the first letters are not distributed uniformly across the alphabet; you just need to take a look at any (printed) dictionary. That greatly reduces entropy and makes it hard to estimate entropy reliably.


> 100k words might be a bit too many for a dictionary of words that are easy to remember.

I've been generating all my (memorized) passwords from a 120k word dictionary for years now, can't complain. And estimating the entropy is easy.


Easy entropy calculation is the beauty of that method.

But for "take the first letters in an English phrase" you can't just do (2*26)^{length} or you are way overestimating.


Why limit yourself to one language? You could even use dialects. Five words, five languages, and not too common words.


But the words themselves follow an identifiable pattern (their spelling). As such, a 4 letter word in your password is cracked much quicker than a portion of your password being 4 characters of random info.


That only matters if, for some reason, your password is length limited. If your password must not be more than four letters long, then yes, choosing your tokens from an ASCII table has the highest possible entropy. (Example: WPA2 PSKs, shitty websites.)

If your password can have arbitrary length (or arbitrary enough, about 120 letters), you can generate a 128-bit password with dictionary words as tokens. Sure, the password will be much longer (factor ~6), but also much easier to memorize.
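
As a rough sketch of that trade-off in Python: the number of tokens needed for a target entropy depends only on the size of the set you draw from. The function name and the 50,000-word list size are assumptions for illustration, and the choice is assumed to be uniform.

```python
import math


def tokens_needed(target_bits: int, alphabet_size: int) -> int:
    """Smallest number of uniformly random tokens giving at least target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


# Printable ASCII has 95 characters.
print("characters:", tokens_needed(128, 95))
# A word list of 50,000 entries (the size is an assumption).
print("words:", tokens_needed(128, 50_000))
```

How many letters the word version comes to depends on the average word length of the list you use.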


Schneier completely misses the point. The entropy estimates quoted by XKCD (and indeed, used by implementations) assume that the crackers are "on to this trick". For any given entropy you use to generate your password, xkcd-style passwords will be easier to memorize and type out.


Hasn't it always been the assumption that password crackers know about the trick?

If I choose 4 words from a dictionary of 50,000 words [1] that produces 50000^4 possible passphrases. That's equivalent to 62 bits of entropy, or a 10-character [a-zA-Z0-9] password. About 8 years to brute force on MD5 with 2x AMD HD 6990. And obviously an extra word makes it take thousands of years.

It's not ideal, but it's better than a lot of password advice.

[1] cat /etc/dictionaries-common/words | grep -v "'s" | egrep -v 's$' | wc -l gives me 51726
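
A small Python sketch to reproduce these numbers. The guess rate is not a benchmark; it is simply backed out of the "8 years" figure above, and the function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def passphrase_bits(dictionary_size: int, n_words: int) -> float:
    """Entropy in bits of n_words drawn uniformly and independently."""
    return n_words * math.log2(dictionary_size)


def years_to_exhaust(dictionary_size: int, n_words: int, guesses_per_second: float) -> float:
    """Years needed to try every possible passphrase at a fixed guess rate."""
    return dictionary_size ** n_words / guesses_per_second / SECONDS_PER_YEAR


# Guess rate implied by the comment's own figures (50,000^4 in about 8 years).
# This is an assumption derived from the text, not a measured hash rate.
implied_rate = 50_000 ** 4 / (8 * SECONDS_PER_YEAR)

print(f"4 words from 50,000: {passphrase_bits(50_000, 4):.1f} bits")
print(f"10 chars of [a-zA-Z0-9]: {math.log2(62 ** 10):.1f} bits")
print(f"implied rate: {implied_rate:.2e} guesses/s")
print(f"5 words: {years_to_exhaust(50_000, 5, implied_rate):,.0f} years")
```

These are times to exhaust the whole space; on average an attacker would need about half of that.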


I use this:

  grep --perl-regexp '^[a-z]{5,8}$' /usr/share/dict/words | shuf -n 5 | tr '\n' ' '
There are about 31000 words.
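
A rough Python equivalent, as a sketch rather than a drop-in replacement: it applies the same 5-to-8 lowercase letter filter and picks words with the standard-library `secrets` module. The function names are mine, and one difference is deliberate: `shuf -n 5` never repeats a word, while this draws each word independently, which is what the usual entropy formula assumes.

```python
import re
import secrets

WORD_PATTERN = re.compile(r"[a-z]{5,8}")  # same filter as the grep above


def load_words(path: str = "/usr/share/dict/words") -> list[str]:
    """Return the unique lowercase 5-8 letter words found in the file at `path`."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        stripped = (line.strip() for line in fh)
        return sorted({word for word in stripped if WORD_PATTERN.fullmatch(word)})


def make_passphrase(words: list[str], n_words: int = 5) -> str:
    """Join n_words words, each chosen independently and uniformly via the OS CSPRNG."""
    return " ".join(secrets.choice(words) for _ in range(n_words))


# Example (the dictionary path is system-dependent):
# print(make_passphrase(load_words()))
```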


> | grep -v "'s" | egrep -v 's$'

Why filter those out?


Not all of the 50,000 words will be chosen with the same probability. I think this is closer to a random 8-character [a-zA-Z0-9] password.


This is my major concern with the "4 random word" approach: password crackers are extremely clever, and there may be non-obvious heuristics they can use to reduce the size of the search space. E.g. if we aren't totally committed to the method and we generate a few 4-word combos until we find one that we think is "memorable", that introduces an obvious bias that can probably be exploited.

(For the record I do use the "4 random word" method for some passwords I want to be able to remember without having to open up my password manager. But I have nagging doubts about its security vs. random strings.)
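
One way to put a number on the "reroll until memorable" bias, as a worst-case sketch in Python. It assumes each candidate is generated uniformly and that the attacker knows exactly which candidate you would prefer. A given passphrase can only be picked if it shows up among the k candidates, so its probability is at most k divided by the size of the search space, which costs at most log2(k) bits. The function name is hypothetical.

```python
import math


def min_entropy_after_rerolls(dictionary_size: int, n_words: int, candidates: int) -> float:
    """Lower bound on min-entropy (bits) when keeping one of `candidates` uniform passphrases."""
    return n_words * math.log2(dictionary_size) - math.log2(candidates)


for k in (1, 10, 100):
    print(f"{k:>3} candidates: at least {min_entropy_after_rerolls(50_000, 4, k):.1f} bits")
```

This bound covers rerolling whole passphrases only; it says nothing about picking the individual words yourself.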


> all the 50,000 words will not be chosen with the same probability.

Why?


red hammer effect.


How does that affect /dev/random?


The XKCD comic skips lightly over this by simply stating that the words were randomly chosen. But being truly random is actually hard for most people to do off the tops of their heads.

I doubt many people are taking away from that comic that they should use software to reliably randomly choose the words they memorize. Instead the advice seems to usually get shrunk down to "choose 4 random words," i.e. out of your own head. Most people don't carry 50,000 word dictionaries around in their heads. More like a few thousand. That changes the math considerably.
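
To show how much the math changes, here is a quick Python sketch. The 3,000-word figure is only my stand-in for "a few thousand", and it still assumes a uniform pick from that vocabulary, which is generous for words chosen out of one's head.

```python
import math


def search_space(vocabulary_size: int, n_words: int = 4) -> int:
    """Number of possible passphrases of n_words words from the given vocabulary."""
    return vocabulary_size ** n_words


dictionary = search_space(50_000)
from_memory = search_space(3_000)   # assumed stand-in for "a few thousand"

print(f"{math.log2(dictionary):.1f} bits vs {math.log2(from_memory):.1f} bits")
print(f"search space shrinks by a factor of about {dictionary / from_memory:,.0f}")
```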


I still don't understand how this gained traction.

I memorize 3 passwords and they are extremely hard. The rest are not created by me and I don't have repeated passwords.


> I memorize 3 passwords and they are extremely hard.

Not everyone is a masochist. It doesn't matter whether you encode 128 bit entropy in a base95 string, or a list of ~8 random words, it's still 128 bit entropy… but the word list will be easier to memorize and to type out.


How many websites? How many passwords are you going to memorize and how are you going to not repeat a password?


> How many websites?

Zero. Why would I bother memorizing them?

> How many passwords are you going to memorize

The LUKS password for my home laptop, the LUKS password for my work laptop, logon passwords for each, and password manager master passwords for each. I guess I could move some of these to hardware keys, but I'm too lazy.

> how are you going to not repeat a password?

The same way anyone avoids repeating passwords: strong password generators.


> I memorize 3 passwords and they are extremely hard. The rest are not created by me and I don't have repeated passwords.

So why do you disagree with me?


Because XKCD-style passwords are as secure as your "hard" passwords, but much easier to memorize.


A password is just not enough. 2FA is almost a necessity I would guess.


I had a good chuckle at the first one.

alapdanceissomuchbetterwhenthestripperiscrying


Which apparently is the name of a song from Bloodhound Gang.

Not that hard to brute force.


how about ilikedyoubeforeyouwerenakedontheinternet ?


Also appears to be a song by From First to Last.


How does leakedsource work? Basically they got password dumps and are selling this information to companies? Isn't this illegal somehow?


These passwords are just abysmal.


ok, the first one is just priceless!


Laughed at that one myself. It's an old Bloodhound Gang song - https://www.youtube.com/watch?v=YMGVMtnxXEw

And there's the connection to last.fm :)


That makes much more sense than this being just a random sentence!



