
Lessons learned from cracking 2 million LinkedIn passwords - ibotty
https://community.qualys.com/blogs/securitylabs/2012/06/08/lessons-learned-from-cracking-2-million-linkedin-passwords
======
runeks
Here's a useful one-liner to create a strong password in Linux:

    
    
        grep -Ev "é|'s$|[Åå]|[Øø]" /usr/share/dict/words | shuf --random-source=/dev/random -n4
    

This uses the dictionary _/usr/share/dict/words_ and skips all the words
containing characters like é, å, ø and all those ending in _'s_. The resulting
word list has 72,940 words in it. Then it chooses 4 random words from this
dictionary and prints them to the screen. This gives a password with about 65
bits of entropy.

By adding another word, thus creating a 5-word passphrase, a botnet capable of
checking 1,000 trillion passwords per second would spend, on average, 1600
years cracking away before it would find the correct passphrase.
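The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration under the assumptions quoted above: a 72,940-word list, uniformly random word choice, and an attacker making 10^15 guesses per second. The function names are made up for this example. With exactly these inputs the five-word average comes out at roughly 33 years rather than 1600, so the result is very sensitive to the guess rate you assume.

```python
import math

WORDLIST_SIZE = 72_940        # word count quoted in the comment above
GUESSES_PER_SECOND = 1e15     # "1,000 trillion" guesses per second (assumed attacker)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def passphrase_bits(words, wordlist_size=WORDLIST_SIZE):
    """Entropy in bits of `words` words drawn uniformly at random."""
    return words * math.log2(wordlist_size)


def average_crack_years(words, wordlist_size=WORDLIST_SIZE, rate=GUESSES_PER_SECOND):
    """Expected time to hit the passphrase: half the key space at `rate` guesses/s."""
    keyspace = wordlist_size ** words
    return keyspace / 2 / rate / SECONDS_PER_YEAR


for n in (4, 5, 7):
    bits = passphrase_bits(n)
    years = average_crack_years(n)
    print(f"{n} words: {bits:.1f} bits, about {years:.3g} years on average")
```

Changing `GUESSES_PER_SECOND` shows how quickly the estimate moves with the assumed attacker.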

Here are some example 4-word passphrases produced using this method:

    
    
        poetically archaisms accept constrictors
        leukemia shuttlecocked checkout benevolently
        climactic gyrate dynamical predominates
        massage beef Concords recliners
    

These are surprisingly easy to remember. I use a 7-word passphrase for the
most important things and it didn't take me more than a day or two to learn
it.

~~~
jackalope
That's only 4 tokens and you should assume at least some of the combinations
are already in rainbow tables. I don't know how long it would take to create a
rainbow table for the whole space, which is this big:

    
    
      72940^4 = 2.8304992 × 10^19
    

To put it in perspective, a 14-character password using _only_ lower case
English alphabet letters as individual tokens already beats this:

    
    
      26^14 = 6.45099747 × 10^19

~~~
archivator
Then again, a random string of 14 characters is nearly impossible to remember.

The key takeaway here is that every word adds another 16.15 bits (assuming
good random source and no non-random decisions by the user), whereas another
character adds only 4.7 bits. I'd argue that the effort to remember another 4
random characters (to reach those 16 bits) is far more than the one to
remember another random word. We're quite good with words, you know :)
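A quick check of the numbers in this exchange, as a hedged sketch: it assumes the 72,940-word list and a 26-letter lowercase alphabet mentioned above, with every word or letter chosen uniformly at random.

```python
import math

WORDLIST_SIZE = 72_940   # word list size from the parent comment
ALPHABET_SIZE = 26       # lowercase English letters only

bits_per_word = math.log2(WORDLIST_SIZE)
bits_per_char = math.log2(ALPHABET_SIZE)

four_words = WORDLIST_SIZE ** 4
fourteen_letters = ALPHABET_SIZE ** 14

print(f"bits per word: {bits_per_word:.2f}")
print(f"bits per char: {bits_per_char:.2f}")
print(f"72940^4 = {four_words:.4e}")
print(f"26^14   = {fourteen_letters:.4e}")
# How many random lowercase letters carry the same entropy as one word
print(f"letters per word: {bits_per_word / bits_per_char:.2f}")
```

So one extra random word is worth a little under four extra random lowercase letters.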

------
ams6110
_no matter how elaborate a password you choose, as long as it is based on
words and rules, even if there are many words and many rules, it will probably
be cracked_

So this is what I've been wondering about the current "best practice" to use
long passphrases. How are those really any stronger than any other
"rule"-based password, the "rule" being that they are likely constructed of
words and phrases from human language?

Would the passphrase "My first car was a 1972 Monte Carlo" really be harder to
crack (once the cracking tools are adapted) than a random 8 character
password?

~~~
dkokelley
Well, calculating the "true" strength is difficult to do, because even though
sophisticated tools are available to aid the process, the attackers are still
human, and can input their own guesses that may or may not be more accurate.
If the attacker knows (or can closely guess) the password rules used to
generate your password, he or she has a better chance of getting a hit.

Let's look at a password like "My first car was a 1972 Monte Carlo". The
password is 35 chars, 3 upper case, 7 special (spaces), and 4 digits. The key
space is all upper and lowercase English letters, all digits, and all special
characters. That's a key space of 95 characters, over 35 places. Objectively,
there are 1.66 x 10^69 possible combinations. Given that the LinkedIn password
crackers are slowed down at about 9 chars it seems like you're incredibly
secure. But let's assume the attacker knows something about your password
structure. Let's say they know that you use words (many people do, so it's a
reasonable guess). Let's also assume that for numbers the attacker knows that
years are popular for password numbers. Now instead of 35 chars, your password
has 7 words and a date. We've changed the key space from 95 to about 100,000.
(The exact number of words there are is a tricky number to pin down, but
crackers have some good data on what the most popular ones are.) As for the
date, there are really only a couple hundred interesting numbers, including
all dates from this and last century, as well as common patterns.

Password strength is (key depth) ^ (key length). An uninformed attacker has
1.66 x 10^69 possible combinations (95^35), while an informed attacker has
roughly 1.0 x 10^40 possible combinations (100,000^8). Obviously, the less an
attacker knows (or can guess) about your password structure, the better your
password's chances of not being cracked.

Now, you asked about your password versus a random 8 char password. Let's take
a "strong" password like "1~qQ%57h". This password also has upper and lowercase
letters, numbers, and symbols. We can assume that there is nothing predictable
about this password for this exercise. The password strength is 95^8, or 6.6 x
10^15, obviously much lower than the longer sentence, even if the attacker
knows the sentence is 7 words and a date.
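The three key-space figures used so far can be reproduced with a short sketch. The only inputs are the ones assumed above: 95 printable characters, a vocabulary of about 100,000 tokens, and the (key depth) ^ (key length) rule. The `keyspace` helper is just a name chosen for this example.

```python
def keyspace(depth, length):
    """Number of candidates: (key depth) ^ (key length)."""
    return depth ** length


uninformed = keyspace(95, 35)       # attacker treats the sentence as 35 raw characters
informed = keyspace(100_000, 8)     # attacker guesses 7 words plus a date token
random8 = keyspace(95, 8)           # random 8-character password

for label, size in [("uninformed", uninformed),
                    ("informed", informed),
                    ("random 8 chars", random8)]:
    print(f"{label:>15}: {size:.2e}")
```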

Now remember, our passwords are being matched against human crackers
attempting to guess the ways our passwords are most likely put together. For
now, most passwords are 6-12 characters. In fact, most websites only allow
passwords of these kinds, so it makes the most sense for crackers to go after
these passwords. But it's still an arms race. If we assume that webmasters see
the light and allow (or enforce) long, sentence-like passwords, the crackers
will adjust. It's plausible I think that 5-10 years from now, we'll see
articles like this one that use sentence structure syntax as an attack method.

Until we discover and implement a better system that obsoletes passwords, the
best we can really do is have long, complex, and unique passwords for
everywhere we go, and have a system to manage them for us. I believe that
something like LastPass or KeePass is the way to go for now.

*Disclaimer: This was written on a groggy Sunday morning. Do not rely on my calculations. Do not use any of the examples as passwords. Do please check my work.

~~~
Someone
One improvement: for most people, the risk is not that someone tries to crack
your password, it is that someone uses rainbow tables to crack many passwords,
one of which may be yours.

Rainbow tables have a degree of freedom: the function that maps hashes back to
passwords. You should try and pick a password that that function will never
generate. To get that, do something unique. Good options, I think, are
including a foreign language word (neither English nor your native language,
nor the site's language), reversing a word or a syllable inside it, and made
up words that have Hamming distance greater than two to any other 'obvious'
word.

Short (<= 8 characters) passwords, I think, are bad choices for that reason,
even if they consist of ASCII gibberish.

Disclaimer: I have never looked at what kind of code commonly used rainbow
tables use.

~~~
nodata
I thought GPUs killed rainbow tables? (the storage space alone makes them
impractical compared to cracking in real time)

~~~
Someone
Not that that says much, but I am not aware of that. More importantly,
googling for "GPU vs rainbow table" leads me to phrases such as "a fully GPU
accelerated set of rainbow table tools". Or has the term changed meaning?

~~~
nodata
These links are good:

<http://news.ycombinator.com/item?id=3140614>

<http://news.ycombinator.com/item?id=4073597>

~~~
Someone
Thanks.

------
peterkelly
So here's an idea:

Pretty much every site with a login facility has an "I forgot my password"
option where you put in your username or email address, and it sends you a
link to reset your password. This is effectively a second form of
authentication - the ability to receive email at that address implies the
ability to log in to that account.

So what about an authentication mechanism that works as follows:

1\. You type in your username

2\. The site emails you a one-time authentication token (as part of a link)

3\. You click on that link and then you're logged into the site

Of course, there are a few obvious problems with this: it's a bit cumbersome
to have to do for every login, email is unencrypted, and message reception
uses a pull-based mechanism.

So I could envisage a standard, incorporated into browsers, as follows:

1\. When you first launch your browser, you log into an authentication server
S, supplying your password (either manually, or automatically via a saved
password)

2\. When you want to log in to a site, you type in your username, and the site
sends an authentication token to the server S

3\. S sends a push notification to your browser with the authentication token

4\. Your browser passes this token to the site, and you're logged in

This way, your (hashed + salted) password need only be stored on server S (in
the first example, this corresponds to your email server). This means that
apart from S, none of the sites you use need to store any password information
at all.

I'm sure this basic idea has been implemented previously in other contexts.
Why are we not using it for all our web logins?
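The four-step flow proposed above can be modelled as a toy, in-process sketch. Everything here is hypothetical: the class names `AuthServerS` and `Site`, the list used as a browser "push channel", and the omission of the password check at S and of any network transport. It only illustrates that the site stores a one-time token and no password.

```python
import hmac
import secrets


class AuthServerS:
    """Hypothetical stand-in for server S; the password check at login is omitted."""

    def __init__(self):
        self.inboxes = {}  # username -> list acting as the browser's push channel

    def browser_login(self, username):
        # Step 1: the browser authenticates to S and gets a push channel.
        inbox = []
        self.inboxes[username] = inbox
        return inbox

    def deliver(self, username, token):
        # Step 3: S pushes the token to the logged-in browser, if there is one.
        inbox = self.inboxes.get(username)
        if inbox is None:
            return False
        inbox.append(token)
        return True


class Site:
    """A site that stores no passwords, only short-lived one-time tokens."""

    def __init__(self, auth_server):
        self.auth_server = auth_server
        self.pending = {}  # username -> outstanding token

    def start_login(self, username):
        # Step 2: generate a one-time token and hand it to S.
        token = secrets.token_urlsafe(32)
        self.pending[username] = token
        return self.auth_server.deliver(username, token)

    def finish_login(self, username, token):
        # Step 4: the browser presents the token; it is consumed on first use.
        expected = self.pending.pop(username, None)
        return expected is not None and hmac.compare_digest(expected, token)


server = AuthServerS()
inbox = server.browser_login("alice")
site = Site(server)
site.start_login("alice")
print(site.finish_login("alice", inbox.pop()))
```

A real deployment would also need token expiry, transport security, and a way for the site to find the user's server S, none of which is shown here.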

~~~
tjoff
I don't want to let server S know which sites I have an account on. Also, I
don't trust server S.

And as if that wasn't bad enough ( _which it is_ ) it is a single point of
failure both in terms of security and availability.

~~~
peterkelly
The role of server S would be analogous to that of your email server, which
you already trust. Just like email, it would be a decentralised system - with
numerous public providers, as well as servers that organisations and
individuals have set up themselves.

You could also have multiple accounts with different S servers, e.g. one for
work and another for personal use.

I agree with the single point of failure regarding availability - if your
authentication server is down, you won't be able to log into _anything_.
Though we already have the single point of failure with the existing system,
in that once someone has your email password, they can obtain password reset
messages from any site that you've registered on with that account.

~~~
tjoff
So you gladly replace that single point of failure with two points of complete
failure? You can't conceive of any problem with that reasoning?

 _You could also have multiple accounts with different S servers, e.g. one for
work and another for personal use._

If I was forced to use such a service I'd make a service that made it easy to
automatically create one "S-server account" for each "real" account and
continue to use passwords for those accounts as if nothing had happened.

In practice, BrowserID doesn't solve anything for me - at the cost of reduced
security, availability and integrity - as well as forcing me to put trust in a
third party.

There is a huge difference between my mail server and the S server. If someone
uses my mail to reset passwords I will notice, since my credentials won't work
anymore. Also there are different levels of security, I value my mail account
more than say my account on hacker news. Which I haven't even entrusted with
my mail-address - love that you don't have to supply even a fake one and
considering that I don't forget my password (or allow anyone to hijack my
session) I can't possibly gain anything from supplying it.

Which is the key point, rather than me not trusting ycombinator there is just
no incentive for me to supply it - so why should I? Maybe ycombinator gets
hacked and my mail gets leaked, I might thus end up with spam - no need to
take that minuscule risk when there is nothing to gain. Just as I see _no_
reason to link independent accounts together with a service such as BrowserID.

------
gambler
_Thus, it is highly recommended to use a strong random password generator that
is known to be actually random._

The whole point of a password is that you can remember it. The moment you need
software to store and retrieve passwords, you're better off using asymmetric
cryptography. That said, I really dislike the idea that it's the only way to
achieve security. I would really like to see more discussions and propositions
for solving this problem.

~~~
fl3tch
It will get to the point that we need biometric scanners or implanted RFID
chips with private keys to do authentication, since brute forcing even
unrememberable passwords will be trivial eventually.

~~~
vegardx
Why? There are plenty of ways to make brute force hard or damn near
impossible, it's just a little harder to implement. It has been discussed a
lot of times here at HN and other places.

Also, do you really think that storing private keys on RFID would be a wise
choice? I would put that in the "stupid as fuck"-category, just above storing
your keys on a USB-stick, as they are both easy to copy and duplicate, and
with RFID I could do it remotely.

A real solution to all of these problems would be for people to stop reusing
passwords. You don't really need the account passwords when you basically have
access to all the data there anyway.

~~~
gambler
That's the problem, though. If people didn't reuse passwords, if people didn't
use words and personal information in passwords and if people had no problem
changing them every day then a lot of problems with security would be solved.
But those are not realistic expectations, and blaming users for not being
computers does not solve the problem.

Slow hash functions certainly help, but I think we also need something that
goes beyond straightforward cryptography to address authentication issues.
Something that redefines the rules of the game to be more human-friendly and
less computer-friendly.

But then again, I have never even heard the question phrased this way: what do
we want from "program-less" authentication, and what can we use to achieve it?

------
dhughes
Well, it's to the point now where I can't remember my passwords: they are so
long and complex, and I have so many accounts, that I need a password manager
to manage all of that.

With a password manager, why bother restricting any password to anything less
than the maximum? Gmail's password limit is 100 characters, so I use all of
them, and my passwords on every other account are maxed out too. Add extra
authentication to that, and change them at least once every six months.

The problem is my most valuable account, my bank, is stuck 15 years in the
past.

~~~
SCdF
I've done the same, though I tend to limit it at about 24 characters, simply
because if I come across a situation where I have to type it on a foreign
computer while reading the password from the password manager on my phone, I
don't want to be there all week.

------
donutdan4114
I've always wondered why password hashing is not a law (at least in the US).
There needs to be an agreed upon minimum level of security for storing
credentials.

Or, just make it where websites HAVE to state somewhere how they are storing
the credentials. It's shocking how many places still use plain text, or
encryption and store the key in the database..

It's pathetic that a major company like LinkedIn is simply storing credentials
with a SHA1 hash. At LEAST use a really good salt...
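For comparison with a bare SHA-1 digest, here is a minimal sketch of salted, deliberately slow password hashing using Python's standard `hashlib.pbkdf2_hmac`. The iteration count, salt length, and function names are assumptions picked for illustration, not a recommendation from this thread or a description of what any particular site does.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000   # assumed work factor; tune to your hardware
SALT_BYTES = 16        # assumed salt length


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated per password."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because every user gets a different salt, identical passwords produce different digests, and a precomputed table for unsalted hashes no longer applies.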

~~~
SkyMarshal
_> Or, just make it where websites HAVE to state somewhere how they are
storing the credentials. It's shocking how many places still use plain text,
or encryption and store the key in the database.._

I like this idea. Like a Surgeon General's Warning for the web. I wouldn't
want the government making specific laws about hashing, but requiring
transparency and disclosure about how data is stored would be useful in a
variety of ways.

~~~
krrrh
The concern I have with this is that it provides a bit too much information to
potential crackers. Security through obscurity is nothing to rely on, but it
doesn't hurt to have a little. It's why disabling the reporting of http server
version information is a common practice in hardening a server.

OTOH, it may be worth it. It's shocking that LinkedIn could be so negligent,
especially after high-profile screwups like Gawker.

~~~
SkyMarshal
>The concern I have with this is that it provides a bit too much information
to potential crackers.

Only the script kiddies. The ones you have to worry about have bots and
automated scans that can figure that stuff out in an instant.

Yeah, unbelievably shocking that such an advanced web company as LinkedIn
could be so negligent. Amateur bitcoin sites, social media sites, venerable
Web 1.0 ones like Last.fm don't surprise me much, but LinkedIn? WTF.

------
tedunangst
I always use site specific, but also site derived passwords. I think it's time
to reevaluate that practice. I remember seeing that three of the top password
fragments for LinkedIn were link, job, and work. My password was all three...
oops.

~~~
Periodic
You might want to look at a browser extension like PwdHash [1]. It uses a
client-side script to generate a cryptographic hash from your common password
and the domain name. I've been using it for about four years now and have been
generally very happy. It means that if my password ever gets leaked the
attackers are not only unlikely to find my password of "ngjO3uBJrvt", but if
they get it they do not have any information about other sites, even if I
reuse the password elsewhere.

There are some newer password management/hashing tools. I've stuck with this
one both because it works for me and I know and trust the authors, a group at
Stanford.

1\. <https://www.pwdhash.com/>
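The general idea of deriving a per-site password from one master password and the domain can be sketched as below. This is not PwdHash's actual algorithm; HMAC-SHA256, the base64 encoding, the 12-character length, and the `site_password` name are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac


def site_password(master, domain, length=12):
    """Derive a per-site password; the same inputs always give the same output."""
    mac = hmac.new(master.encode("utf-8"),
                   domain.lower().encode("utf-8"),
                   hashlib.sha256).digest()
    # URL-safe base64 keeps the result typeable; truncate to the wanted length.
    return base64.urlsafe_b64encode(mac).decode("ascii")[:length]


print(site_password("my master secret", "linkedin.com"))
print(site_password("my master secret", "news.ycombinator.com"))
```

A leaked derived password reveals nothing directly usable on another domain, although a weak master password could still be guessed offline from it.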

~~~
tedunangst
I don't use that because of (unfounded) concerns that I will someday need to
enter my password into some device where it's not available, or the domain
will change, or some other scenario where bad things will happen because I do
not actually know the password I told the site.

~~~
16s
I call this 'primary authentication', which means you're in an environment
where you can't execute code (staring at your xfce4 desktop logon prompt, for
example). Password managers and generators are only useful _after_ you've
logged on. From there, you can execute code and use a password manager for
'secondary authentication' (websites, email, etc.).

------
16s
LinkedIn allowed 6-character passwords. If a user selected six random
uppercase ASCII, lowercase ASCII and numbers, this would be the bit-strength:

    import math
    # 62 symbols (26 upper + 26 lower + 10 digits), 6 positions
    print(math.log2(62) * 6)

About 35.7 bits

That's easy to crack. Also, keep in mind that humans don't select chars
randomly. So the bit-strength of these passwords was probably closer to 20
bits. I cracked 2.5 million with an old CPU and JtR within a few hours.

~~~
0003
This is the advice I give to my family members. The solution is to create a
one-time-pad in Excel(!) that contains all of your passwords. Store it on an
encrypted thumb drive and carry it with your keys. There is the possibility
that your OTP may contain a character set that is not compatible with a web
service's password system, but these circumstances are rare.

~~~
saulrh
Just use keepass. If you're already carrying around a password file, you can
carry around portable binaries of the program that reads your passwords.

www.keepass.info

~~~
LinXitoW
Sadly, KeePass is not very Mac/Linux friendly, because it's built on .NET, so
it's only an option for pure Windows users.

~~~
dopo
I don't know about Linux, but I've been using KeePassX for Mac for about a
year now. It only supports the older KeePass file format, and it's ugly, but
it works great. There are also a few iOS apps you can use, my favorite of
which is KyPass.

------
MetaCosm
This week has finally got me off my ass, and I generated a truly random
password for all 260 sites I have built up in LastPass over the past few
years... as well as a strong LastPass password (most likely going to be
hardened with two-factor via YubiKey in the near future).

I must say, just finding the password reset function on some of these forums
and less popular sites is a beast. Also, I was shocked by the number of 10 and
12 char limitations I hit.

------
16s
How to calculate real-world password bit-strength:
<http://16s.us/word_machine/bits/>

------
robomartin
With all this talk about security I am still wondering why everyone isn't up
in arms about the fact that Chrome makes all of your stored passwords plainly
visible at the click of a button or two. This has been the case for years and
many complaints have been recorded, but Google, for some strange reason,
seems to refuse to even attempt to put forth any effort to secure their
browser.

~~~
abraham
There are going to be much better discussions about this in Chromium
discussion archives but I'll cover one of the main reasons why Chrome and
Firefox (by default) do not have a master password. If someone has physical or
remote access to your computer it is an endgame scenario and it does not
matter if you have a master password enabled or not. The most basic attack is
just to install a keylogger and steal the master password. They could also
just copy the password database remotely and brute force it.

~~~
robomartin
I understand this very well. Here's one scenario: Busy office. Hundreds of
computers. Open environment (no doors, just a bunch of desks/tables). Everyone
using Chrome.

The current version of Chrome would allow someone to, within a few clicks,
grab a pile of passwords.

Here's another scenario: Your mother takes her laptop to be repaired/updated.
She uses Chrome. The entire repair shop has easy, unencumbered access to all
of her passwords and logins.

Similar scenario: Computer goes to IT guy where you work for repairs/updates.
He now has any and all of your passwords and logins with no effort.

My point is that for all this talk about security it seems really dumb for a
prominent player (any prominent player) to not take extra steps to ensure that
our valuable data is secure within reason. With LinkedIn the problem is, at
the very least, the lack of anything beyond SHA-1 to protect passwords. Bad
idea. In the browser case, it seems to me that, unless the intent is to
provide a browser used only by those like us who understand and are very aware
of security issues, it might just be a good idea to put in a few things that
will make it harder for curious eyes or the 16 year old at the repair shop to
grab all of your login data.

I don't propose nor do I expect perfection or absolute security, but what
Chrome does today is, in my opinion, at the very least irresponsible. The
uninformed user has NO IDEA WHATSOEVER that a huge security hole exists in
their browser. Maybe we need to stop thinking in our terms and focus on mom,
dad, uncle or grandma. When you first install Chrome you should, at the very
least, see a screen telling you about security and the options you might have.
I think that a master password would most definitely serve a purpose in the
case of "innocent" peeking. Yes, with pros all bets are off. It's only a
matter of time until someone tracks identity theft to the lack of browser
security and they sue the fuck out of the browser publisher.

~~~
abraham
> The current version of Chrome would allow someone to, within a few clicks,
> grab a pile of passwords

With a USB stick and one click anyone can install malware that would give
the attacker complete remote control of the computer.

> Computer goes to IT guy where you work for repairs/updates.

IT repair guys generally need admin access to the computer and will have all
the time in the world to install any amount of malware for remote access.

> but what Chrome does today is, in my opinion, at the very least
> irresponsible

For Chrome to add a master password would be irresponsible because it would
give users the illusion of security they don't have. All OSes already have
password protection against innocent peeking with user accounts and the
ability to lock your computer when you walk away.

------
mmaunder
20 years ago we called this crackerjack.

<http://web.textfiles.com/computers/jack14.txt>

------
samhan
Well, here's an idea I use. If you know how to touch type, just offset your
fingers on the keyboard in a special way so you can turn a memorable password
into slightly more random gibberish. E.g.: "I Love you" becomes "o ;pbr upi".
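The finger-offset trick in the example corresponds to shifting every key one position to the right. A small sketch, assuming a US QWERTY layout and ignoring case and the number row (the `shift_right` name and the row strings are choices made for this illustration):

```python
# Letter rows of a US QWERTY keyboard, left to right (assumed layout)
ROWS = ["qwertyuiop[", "asdfghjkl;'", "zxcvbnm,./"]

# Map each key to its right-hand neighbour; the last key of a row is left as-is
SHIFT_RIGHT = {row[i]: row[i + 1] for row in ROWS for i in range(len(row) - 1)}


def shift_right(text):
    """Type `text` with both hands moved one key to the right."""
    return "".join(SHIFT_RIGHT.get(ch, ch) for ch in text.lower())


print(shift_right("I Love you"))
```

Note that this is a fixed substitution, so it adds very little entropy against an attacker who includes keyboard shifts among their mangling rules.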

------
ronnier
Were these just password hashes or did the leak include usernames and/or email
addresses?

~~~
josso
The file that got leaked to the public contains only the hashes. The
hackers behind the file probably have usernames or emails.

------
rikf
head -c x /dev/random | uuencode -

where x is the length of the password you want.
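A rough Python counterpart, for systems without `uuencode`, is sketched below. It uses the standard `secrets` module. The choice of alphabet and the `random_password` name are assumptions for this example, and unlike the shell pipeline the argument here is the exact number of output characters.

```python
import secrets
import string

# Assumed alphabet: ASCII letters, digits and punctuation (94 symbols)
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length):
    """Return `length` characters drawn uniformly from ALPHABET using the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_password(16))
```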

~~~
chmike
How do you remember it? What do you do if you need to log in from another
computer? This doesn't work on my Windows computer. If we store this random
password in a local password store, how different is that from using
asymmetric keys (RSA, ...)?

