
Introduction to GPU Password Cracking: Owning the LinkedIn Password Dump - adamnemecek
https://www.trustedsec.com/june-2016/introduction-gpu-password-cracking-owning-linkedin-password-dump/
======
miles
From a comment by giveen[1] on reddit earlier:

 _" At hashes.org, 87% completed
[https://hashes.org/public.php](https://hashes.org/public.php) "_

There are _loads_ of cracked hashes from other public leaks as well at
hashes.org - worth adding these to your pen testing dictionaries.
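A cracked-password list helps because the LinkedIn dump was unsalted SHA-1. Each wordlist entry therefore only has to be hashed once, and the result can be checked against every leaked hash at the same time. The minimal sketch below shows that dictionary attack in plain Python. It assumes the dump is a list of hex SHA-1 digests. The function name `crack_unsalted_sha1` is hypothetical, and real tools such as hashcat do the same job on GPUs many orders of magnitude faster.

```python
import hashlib
from typing import Dict, Iterable


def crack_unsalted_sha1(target_hashes: Iterable[str], wordlist: Iterable[str]) -> Dict[str, str]:
    """Return {hex_digest: plaintext} for every target hash found in the wordlist.

    Hypothetical helper: with no salt, one SHA-1 per candidate word is enough
    to test it against every target at once via a set lookup.
    """
    remaining = {h.lower() for h in target_hashes}
    cracked: Dict[str, str] = {}
    for word in wordlist:
        digest = hashlib.sha1(word.encode("utf-8")).hexdigest()
        if digest in remaining:
            cracked[digest] = word
            remaining.discard(digest)
            if not remaining:  # everything cracked, stop early
                break
    return cracked


if __name__ == "__main__":
    leaked = [hashlib.sha1(b"trustno1").hexdigest(), hashlib.sha1(b"linkedin").hexdigest()]
    print(crack_unsalted_sha1(leaked, ["password", "123456", "linkedin", "trustno1"]))
```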

[1]
[https://www.reddit.com/r/netsec/comments/4ozdz8/introduction...](https://www.reddit.com/r/netsec/comments/4ozdz8/introduction_to_gpu_password_cracking_owning_the/d4gypj3)

~~~
lstamour
A bit on the motivations of hashes.org:
[https://s3inlc.wordpress.com/2014/05/27/hashes-org/](https://s3inlc.wordpress.com/2014/05/27/hashes-org/)

It seems it was started for the LinkedIn hashes, which are at the bottom of the
page, if you were wondering...

~~~
stephengillie
This is the second[0] post I've read this weekend where someone preferred a
file-based database system to MySQL. In the other, they explain why MySQL
wouldn't be preferable to SQLite in their use case. Here, moving to flat files
was faster and easier to work with, at 1/5 the total size, and much more
granular.

[0]
[https://news.ycombinator.com/item?id=11934826](https://news.ycombinator.com/item?id=11934826)

~~~
andreareina
They essentially built an ad-hoc database complete with sequential keys and
indexing. I'm curious to see what the resulting performance would be if they
used a database that's appropriate for the type of data they have. I'm not
familiar with non-relational databases but I suspect that a key-value or
document store would work very well.
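One cheap way to try the key-value idea without running a database server is SQLite with the SHA-1 digest as the primary key. The sketch below is not what the article did. It assumes a simple two-column schema (`sha1`, `plaintext`), and the table and function names are hypothetical. `WITHOUT ROWID` makes SQLite store rows in key order, so a lookup is a single B-tree search.

```python
import hashlib
import sqlite3
from typing import Iterable, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WITHOUT ROWID keeps rows ordered by the primary key, so the table acts
    # like a sorted key-value map from SHA-1 hex digest to plaintext.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cracked ("
        " sha1 TEXT PRIMARY KEY,"
        " plaintext TEXT NOT NULL"
        ") WITHOUT ROWID"
    )
    return conn


def add_plaintexts(conn: sqlite3.Connection, words: Iterable[str]) -> None:
    rows = ((hashlib.sha1(w.encode("utf-8")).hexdigest(), w) for w in words)
    with conn:  # commit the whole batch as one transaction
        conn.executemany("INSERT OR IGNORE INTO cracked VALUES (?, ?)", rows)


def lookup(conn: sqlite3.Connection, sha1_hex: str) -> Optional[str]:
    row = conn.execute(
        "SELECT plaintext FROM cracked WHERE sha1 = ?", (sha1_hex.lower(),)
    ).fetchone()
    return row[0] if row else None
```

Whether this beats hand-rolled flat files at LinkedIn scale is an open question. It mainly trades some disk space for not having to maintain your own indexing code.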

------
afreak
Any developer today who is developing an application and isn't using
something like Argon2, bcrypt, or scrypt should be planning a move away from
whatever they're currently using, and should have started yesterday. There is
no reason to be using anything less than those three, and continued use is, in
my mind, negligence.
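Of the three, scrypt is the one available in Python's standard library, as `hashlib.scrypt` (it needs Python built against OpenSSL 1.1+). Argon2 and bcrypt need third-party packages. The sketch below shows the basic pattern: a random per-user salt, a deliberately expensive hash, and a constant-time comparison. The cost parameters are assumptions for illustration only, and you should tune them so that one hash is noticeably slow on your hardware.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Illustrative cost parameters (assumptions, not a recommendation):
# memory use is roughly 128 * r * n bytes, about 16 MiB here.
SCRYPT_N = 2 ** 14  # CPU/memory cost
SCRYPT_R = 8        # block size
SCRYPT_P = 1        # parallelism
KEY_LEN = 32


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=KEY_LEN)


def hash_password(password: str) -> Tuple[bytes, bytes]:
    salt = os.urandom(16)  # fresh random salt for every password
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # compare_digest avoids leaking how many bytes matched via timing
    return hmac.compare_digest(_derive(password, salt), expected)
```

Store the salt alongside the key (and ideally the parameters too) so you can raise the cost later.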

If at all possible you shouldn't be storing passwords to begin with and
instead relying on another service for authentication.

This should be the takeaway from this article.

~~~
Matt3o12_
> If at all possible you shouldn't be storing passwords to begin with and
> instead relying on another service for authentication.

Should we all be using "Login with LinkedIn", then?

Passwords are always difficult to deal with, even when using bcrypt. Who knows
if bcrypt will still be considered secure in 5 years? How long would it take to
implement a change that updates the hashing algorithm for new logins while
still using the old algorithm for old logins? When should you erase the
passwords of inactive users who haven't logged in and thus still use the old
algorithm? (If you are interested in this problem, Django's user model uses a
pretty straightforward and good approach[1].)
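Django's documented approach is to store the algorithm and its parameters with each hash and to re-hash transparently when a user logs in, because that is the only moment the plaintext is available. The sketch below imitates that idea with the standard library. It is not Django's code. The `algorithm$iterations$salt$hash` string format echoes Django's style, but the function names, the `save` callback and the `CURRENT_ITERATIONS` value are hypothetical.

```python
import base64
import hashlib
import hmac
import os
from typing import Callable

CURRENT_ITERATIONS = 600_000  # assumed current policy; raise it over time


def make_hash(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    salt = base64.b64encode(os.urandom(12)).decode("ascii")  # base64 never contains '$'
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt.encode("ascii"), iterations)
    digest = base64.b64encode(dk).decode("ascii")
    return f"pbkdf2_sha256${iterations}${salt}${digest}"


def check_and_upgrade(password: str, stored: str, save: Callable[[str], None]) -> bool:
    algorithm, iterations, salt, digest = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False  # a real system would dispatch to a legacy hasher here
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt.encode("ascii"), int(iterations))
    if not hmac.compare_digest(base64.b64encode(dk).decode("ascii"), digest):
        return False
    # Login is the only time we see the plaintext, so upgrade weak hashes now.
    if int(iterations) < CURRENT_ITERATIONS:
        save(make_hash(password))
    return True
```

Accounts that never log in keep their old hash, which is exactly the "when do you erase them" question raised above.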

Outsourcing them is not the answer. It is a good idea to add that for the
user's convenience, but I hate it when websites only offer the option to log in
with "your" favorite social media. But even then, by outsourcing the
passwords, you are risking your users' privacy by giving them to
Google/Facebook/etc. This even compromises users' privacy when they are not
using Facebook for authenticating, because Facebook can see that user X visited
your website (and sometimes even every URL on that website they have visited).
This is because those "Login with" and "Like" buttons always hit Facebook's
and Google's servers with every webpage.

[1]:
[https://docs.djangoproject.com/en/1.9/topics/auth/passwords/](https://docs.djangoproject.com/en/1.9/topics/auth/passwords/)

Edit: Forgot the link, thanks!

~~~
lpage
> _Outsourcing them is not the answer_

It very much is, if you're outsourcing to someone who can do it with greater
competence than the average team can. Keeping current on the crypto, designing
with the ability to sunset algorithms in mind, continuous pen testing,
investing in physical security/network security/HSMs/you name it definitely
isn't cheap or easy. Unless you're in the business of doing _that_ you're
almost certainly better off having someone do it for you.

That said, I'm with you on the social logins front. I have/had? hope for
OpenID Connect as an alternative, so it would be great if someone neutral like
Mozilla jumped on the bandwagon.

~~~
woodman
[https://en.wikipedia.org/wiki/Pluggable_authentication_modul...](https://en.wikipedia.org/wiki/Pluggable_authentication_module)

That is pretty much the first thing I do when I inherit a project with
authentication. You don't need to make another company your application's
doorman, there are a lot of PAM backends that you can run on premises that "do
it for you". If you have the competency to manage a LAMP stack - then you can
likely handle a well tested and existing authentication server.

All the years in physical security might have broken my brain, because I am
always surprised by how willing people are to leak information that doesn't
need to be leaked. One project I was pulled into was on the precipice of
uploading millions of customers' addresses to Google's geolocation API; had I
not been able to bring the lead to his senses, I might have made a run for the
network closet.

~~~
lpage
PAM is great, and it's especially great as a layer of indirection, but I can't
agree with your overall point that using PAM = problem solved. As for your "no
harder than LAMP" point: most teams _can't_ competently manage a
security-critical LAMP stack. They're in good company, given that big
companies/governments get pwned with great regularity. Survival requires
defense in depth, and that gets expensive. It's a matter of everything from
policy (are there two-man change controls on firewall rules, do separate teams
own the route tables and firewall, do separate teams develop/audit/deploy
security critical code) to hardware (is private key material stored on an HSM,
are sensitive services physically isolated, does entropy come from a hardware
RNG). Most small companies aren't thinking about those things.

Also, given that the P is for pluggable, what's the backend? You wouldn't use
pam_unix for users outside your org. A DB? Now you're back to square one.
LDAP+Kerberos/AD? That beats the DB but it doesn't do anything for your
defense in depth requirement.

~~~
woodman
> ...I can't agree with your overall point that using PAM = problem solved.

I don't think we have the same problem definition. I'm saying that it solves
the problem of authentication implementation details - where the just-enough-
to-be-dangerous types screw up (salting, keyspace, the keeping current on
crypto part). LDAP can certainly be leveraged for defense in depth,
authorization vs authentication, but that is much less off-the-shelf. This
also provides some separation between the authentication server and braindead
PHP scripts that barf the results of ";select * from users;".

> Also, given that the P is for pluggable, what's the backend?

Kerberos is the obvious choice for authentication, LDAP integration for
authorization if you're needing a fine granularity. You'd really have to go
out of your way to end up with a PAM that dumps right into a DB with a poor
crypto policy - I've never seen it. You could use /etc/passwd - but you're
right, you wouldn't want to... the option is nice though.

I don't disagree that a company that makes money primarily on identity
management could do it better, if you assign a low value to the information
that is necessarily leaked. But let me just point out the context in which we
are having this conversation: LinkedIn offered such a service, as does
Facebook - both have suffered breaches. While that isn't how they made their
money, plenty of people used the service - following the letter of your
advice, if not the spirit of it.

------
jacquesm
85% cracked in a few days is seriously depressing.

What does a box like the one mentioned in the article cost?

Any estimate on the time to crack the remaining 15%?

~~~
zeveb
> Any estimate on the time to crack the remaining 15%?

I know of at least one person on LinkedIn using passwords of the form
2jzAwGyOzfxNoW0u3lTIIa (i.e., 22 random digits and mixed-case letters). At
209.7 million hashes per second, a cracker would need about 2 × 10^23 years on
average to crack his first password of that sort (about 4 × 10^23 years to try
all 62^22 combinations).

I cordially wish him the best.
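The estimate for that kind of password is easy to check with a back-of-envelope calculation. The sketch assumes each of the 22 characters is independently random over 62 symbols (26 + 26 + 10), 209.7 million guesses per second, and 365-day years. The helper name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def exhaustive_search_years(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Years needed to try every password of exactly `length` symbols."""
    keyspace = alphabet_size ** length
    return keyspace / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    worst = exhaustive_search_years(62, 22, 209.7e6)
    # On average a random password is found halfway through the search.
    print(f"worst case ~{worst:.2e} years, average ~{worst / 2:.2e} years")
```

Both figures come out on the order of 10^23 years, far beyond any realistic attack.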

~~~
bigiain
Anybody using a modern password safe should be in exactly that position. I use
1Password and default to 25 chars including upper/lower/digits/specials. It's
occasionally annoying when I need to transcribe one of those passwords from my
phone into a system I trust, but not enough to keep my password safe on (my
work laptop, for example). That's rare enough that I just suck it up and
cope.
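Generating that kind of password yourself takes a few lines with Python's `secrets` module, which draws from a cryptographically secure source. The sketch below is not 1Password's generator. It assumes "specials" means ASCII punctuation, which gives an alphabet of 94 symbols. It uses rejection sampling so that every character class appears, as many generators promise. At about 6.55 bits per character, 25 characters is well over 150 bits.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 symbols


def generate_password(length: int = 25) -> str:
    """Random password drawn uniformly from ALPHABET, redrawn until it
    contains at least one lowercase, uppercase, digit and punctuation char."""
    if length < 4:
        raise ValueError("length must leave room for all four character classes")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate
```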

------
Keyframe
I forgot the root password on my old IRIX / SGI Octane2 and had to "crack" it a
few months ago. It turned out it was using the full eight characters and was on
the tail end of the alphabet. It took less than a day to guess it on an older
two-GPU GTX 680 machine with cudaHashcat. Also, how awesome of the IRIX guys to
not allow more than 8 characters in a password?

~~~
bluedino
If it was IRIX it might have been faster to run some old exploit and then
reset it

~~~
Keyframe
I looked at one where you would twiddle the bytes on the filesystem, but this
was easier. Fire and forget until the results are in.

------
kriro
Pretty good writeup. My takeaway is that it's more important to use a long
password than to mix and match letters/digits/special characters, if that's the
choice you have (since the cracking process is greedy and goes from short to
long, not from low entropy to high entropy, for lack of a better description).
Would that be a correct assumption?

Just to pose a silly example: take a password of 300 "x"s (maybe turn the 42nd
into an "o" for good measure). Since many attacks probably won't enumerate
that many characters before they've reached a sufficient mass of cracked PWs,
that would be reasonably safe even though it is kind of a silly PW, right?

Edit: there's no need to do that in practice, since you can just use a randomly
generated PW with a PW safe, but maybe there's a case where you need to
remember the PW.

~~~
e12e
> My takeaway is that it's more important to use a long password than to mix
> and match letters/digits/special characters

Yes. Trivially, if you use only two symbols (eg: "0" and "1"), a (random)
password of 128 letters should be pretty safe. Note that your example of just
300 of one letter wouldn't be all that safe. In general, a good password won't
really be easy to remember, because it needs to encode a lot of entropy.

More generally, you probably want log2(Nsymbols^length) >= 64, possibly >= 96
(ie: equivalent to at least 64 or 96 bits of entropy). If you're using upper-
and lowercase letters, digits, and say ten printable symbols, every single
character (each random pick of one of the 26+26+10+10 = 72 symbols) adds
roughly 6.17 bits of entropy. So you'll need at least eleven characters in your
password. If you just use lowercase letters, every character in your password
adds about 4.7 bits, so to "climb over" 64 bits of entropy, you'd need at least
14 letters in your password.

Using just digits, each digit 0-9 adds about 3.32 bits, so for 64 bits you'd
need a string of 20 _random_ digits.
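The rule of thumb above reduces to length >= target_bits / log2(N). The small sketch below reproduces the figures in this comment. It assumes every symbol is picked uniformly at random, which is what makes the log2 formula valid. The function names are hypothetical.

```python
import math


def bits_per_symbol(alphabet_size: int) -> float:
    """Entropy contributed by one uniformly random pick from the alphabet."""
    return math.log2(alphabet_size)


def min_length(alphabet_size: int, target_bits: float) -> int:
    """Shortest random password with length * log2(N) >= target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


if __name__ == "__main__":
    for name, n in [("letters+digits+10 symbols", 72), ("lowercase", 26), ("digits", 10)]:
        print(f"{name}: {bits_per_symbol(n):.2f} bits/char, "
              f"{min_length(n, 64)} chars for 64 bits, {min_length(n, 96)} for 96 bits")
```

For digits this gives 20 characters for 64 bits and 29 for 96 bits, matching the numbers in the thread.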

To enumerate half of 2^64 passwords at 200 million tries/second would take
about 2^63/(200 000 000 * 3600 * 24 * 365) ~ 1 462 years. Clearly, if you had
3 000 machines, you could do this in about half a year, so depending on your
risk profile, you might choose to aim for 96 bits: 2^95/(200 000 000 * 3600 *
24 * 365) ~ 6.28 trillion years... (That's eg: 29 random digits.)
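The same arithmetic, expressed as a function, makes it easy to plug in other rates or machine counts. The sketch assumes 365-day years, a fixed guess rate per machine, and that on average half the 2^bits keyspace must be searched. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def years_to_search_half(bits: int, guesses_per_second: float, machines: int = 1) -> float:
    """Expected years to find a random key: half of the 2**bits keyspace."""
    return 2 ** (bits - 1) / (guesses_per_second * machines * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"64 bits, 1 machine:      {years_to_search_half(64, 200e6):,.0f} years")
    print(f"64 bits, 3000 machines:  {years_to_search_half(64, 200e6, 3000):.2f} years")
    print(f"96 bits, 1 machine:      {years_to_search_half(96, 200e6):.3e} years")
```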

------
pellej_s
Yes, sure. You should not be using anything but Bcrypt et al for passwords
(salt, salt, salt!) – but... Out of curiosity. What if these passwords were
SHA-512 hashed (unsalted) rather than SHA1?

Anyone know of comparable articles?

~~~
afreak
As part of a presentation I did at a local OWASP chapter, here are some
numbers based on just using CPython's hashlib to process 14,000,000-some-odd
passwords:

Intel Xeon E5-1620 3.6 GHz: SHA: 8.16 seconds, SHA256: 11.01 seconds, MD5: 8.7
seconds

AMD FX-8320 3.5 GHz: SHA: 10.63 seconds, SHA256: 13.49 seconds, MD5: 10.06
seconds

Intel Celeron N2840 2.2 GHz: SHA: 32.4 seconds, SHA256: 39.75 seconds, MD5:
28.95 seconds

Intel Pentium M 1.7 GHz: SHA: 37.98 seconds, SHA256: 48.12 seconds, MD5: 34.49
seconds

SHA512 isn't going to make it much better.
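You can run a comparable measurement yourself with the timing sketch below. It hashes every password once per algorithm on a single CPU core. In pure Python the interpreter's loop overhead is a large share of each measurement, which is one reason the algorithms land so close together. Either way, none of them is designed to be slow. The sample data and the function name are hypothetical, and absolute numbers depend entirely on your machine. GPU crackers are far faster than this.

```python
import hashlib
import time
from typing import Dict, List, Tuple


def time_hashes(passwords: List[bytes],
                algorithms: Tuple[str, ...] = ("sha1", "sha256", "sha512", "md5")) -> Dict[str, float]:
    """Seconds spent hashing every password once with each algorithm."""
    results: Dict[str, float] = {}
    for name in algorithms:
        start = time.perf_counter()
        for pw in passwords:
            hashlib.new(name, pw).digest()
        results[name] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    sample = [f"password{i}".encode() for i in range(200_000)]
    for name, secs in time_hashes(sample).items():
        print(f"{name:>7}: {secs:.2f} s")
```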

------
13of40
If you want to throw a wrench in a password cracker's gears, why can't you
just run your 'crypt' function on its own output a thousand times in a row, so
that anyone attempting to crack it will need to run it a thousand times with
every candidate password? What I mean is 'crypt(crypt(crypt(p)))' should take
three times as long as 'crypt(p)', right? And scale 'a thousand' to however
many iterations takes one second on contemporary hardware.
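What this comment describes is usually called key stretching, and the intuition is right: each extra round multiplies the attacker's work per guess. The sketch below contrasts the naive chained-hash version with PBKDF2, the standardized form of the same idea (`hashlib.pbkdf2_hmac`), which mixes the password back in at every round through HMAC. It also includes a simple calibration loop for "however many iterations takes one second". The salt handling and function names are assumptions for illustration. Note that this only adds CPU cost. Memory-hard functions such as scrypt try to make GPUs and ASICs less effective as well.

```python
import hashlib
import os
import time


def naive_stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Hash once, then feed the output back in until `rounds` hashes are done."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest).digest()
    return digest


def pbkdf2_stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Standardized key stretching: PBKDF2-HMAC-SHA256 with `rounds` iterations."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, rounds)


def calibrate_rounds(target_seconds: float, start: int = 1000) -> int:
    """Double the round count until one PBKDF2 call takes at least target_seconds."""
    rounds = start
    salt = os.urandom(16)
    while True:
        t0 = time.perf_counter()
        pbkdf2_stretch(b"benchmark", salt, rounds)
        if time.perf_counter() - t0 >= target_seconds:
            return rounds
        rounds *= 2
```

For example, `calibrate_rounds(1.0)` on a given server returns a round count you could store alongside each hash and raise as hardware gets faster.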

~~~
zamalek
Use scrypt or bcrypt: both are computationally expensive by design. Both are
largely unapproachable on GPUs and ASICs.

~~~
masklinn
> Both are largely unapproachable on GPUs and ASICs.

Not true with respect to bcrypt. Bcrypt is not designed to be memory hard, it
uses a constant and relatively small amount of memory (~4kB IIRC) which is
only incrementally better than PBKDF2.

 _However_ its memory access pattern means it needs _fast_ RAM, which is what
makes GPUs inefficient for it: they have lots of memory[0] but it's slow to
access from the parallel cores[1]. You can absolutely have small amounts of
fast RAM in FPGAs and ASICs.

[0] and a good amount of memory/core, a 1080 has ~3MB/core, a 1070 has 4 (they
both have 8GB RAM but the 1080 has 2560 CUDA cores versus 1920 for the 1070)

[1] GPUs have very high memory throughput but very high latencies, even for
caches[2]: e.g. on Kepler (a few generations back) L1 latency was already 48
cycles, whereas Skylake can go all the way to L3 in 42 cycles

[2] which are shared by multiple cores and are very very small: from what I've
found Pascal has 64KiB of L1 per SM (with each SM grouping 64 CUDA cores) and
4MB of L2 for the entire chip

~~~
zamalek
Thanks for the correction.

------
frgewut
We may discuss proper authentication mechanisms here, but I guess the
real lesson learned is "don't gather data if you really don't need to".

------
matt_wulfeck
Honest question: how does somebody know whether a dumped hashed/"encrypted"
password has actually been broken and exists in plaintext?

Some time ago I reset almost all my passwords to 1passwd $RAND, but some of
these dumps are ooooold. Is there a legit way to find what's available for my
email?

~~~
afreak
There are several services (including one run by me).

[https://canar.io](https://canar.io) (mine)

[https://haveibeenpwned.com/](https://haveibeenpwned.com/)

Mine lets you do free-form searches, whereas HaveIBeenPwned is for searching
e-mail addresses only.
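For the narrower question of whether a specific password shows up in known dumps, HIBP later added a "Pwned Passwords" range endpoint that postdates this thread. As I understand it, you send only the first five hex characters of the password's SHA-1 to `api.pwnedpasswords.com/range/<prefix>`. It returns `SUFFIX:COUNT` lines, and you compare the suffix locally, so the full hash never leaves your machine. The sketch below covers only the local parts, splitting the hash and parsing a response body. It deliberately leaves out the HTTP call. The response format is an assumption you should check against the service's own documentation, and the function names are hypothetical.

```python
import hashlib
from typing import Tuple


def split_sha1(password: str) -> Tuple[str, str]:
    """Uppercase SHA-1 hex: 5-char prefix to send, 35-char suffix kept local."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix: str, body: str) -> int:
    """Parse assumed 'SUFFIX:COUNT' lines; return the count for our suffix, or 0."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0
```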

~~~
hodwik2
Yours is fun, in that it's more free-form, but HIBP seems to cover more
ground.

------
xapata
I'm amazed that "trustno1" is one of the top 50 most popular passwords in some
large datasets.

