
Linguistic Cracking of Passphrases Using Markov Chains [pdf] - xxtesterxx
http://www.simovits.com/sites/default/files/files/PederSparell_Linguistic_Cracking_of_Passphrases_using_Markov_Chains.pdf
======
wodenokoto
I thought a passphrase was supposed to be a series of random words. If you use
natural sentences you limit the choice of words.

~~~
yeukhon
Nothing on Earth is safe and uncrackable, given enough time and compute power.
We live in paranoid times. Passwords and passphrases are theoretically no
different. You can generate a password like the one below and consider it a
passphrase. "Password" and "passphrase" are just labels/names. I have
passwords which are probably more secure than a passphrase because of their
length and heavier use of upper case/lower case/numbers/special characters.

A passphrase (which is supposed to replace the concept of a password) should
be long and hard to guess, but easier to memorize. Is "bravo multiple
Pneumonoultramicroscopicsilicovolcanoconiosis mixture chemistry Gordon Ramsay"
a better passphrase? Oh sure. But who the hell would come up with this
passphrase? It's unique, but it's something I made up, and I don't expect to
be able to memorize it even if I know how to spell each word right.

Passphrase and password generation by a human is always limited to her own
knowledge of things. I just finished playing Battlefield 4, and the random
words I am thinking of right now are bravo, delta, charlie, because those
words are constantly ringing in my ears. I just finished watching Hell's
Kitchen, so Ramsay is on my mind. So there are other channels through which
you can gather what a passphrase is likely to be for a particular user.

My point is: advise users on what makes a good passphrase/password, but don't
make them choose totally random words. Reject the ones you know are totally
weak. It is hard enough to memorize a password, so don't sell the passphrase
as a huge savior while making it look like a password. You ain't helping.
That's the conclusion/advice given by the authors. Security hardness all comes
down to the strength of the policy. If you allow a plain English passphrase
you are weaker, but a requirement to mix upper/lowercase/special
chars/numbers? You're back to the traditional password.

Oh, and the greatest insult to password/passphrase security? We don't use a
different password/passphrase on every site we register for. So you are
basically screwed from the beginning. This ain't "update OpenSSL and you
mitigate X attack."

~~~
zokier
> But who the hell would come up with this passphrase?

Computers could, except for ...

> Unique but this is something I made up and I expect myself not able to
> memorize it even if I know how to spell each word right

That's why a wordlist of _common_ words is generally used as a source.

> Passphrase and password generation by a human is always limited to her own
> knowledge of things.

That's why passwords should not be generated by a human, but a computer.

> My point is, advise user what is a good passphrase/password, but don't make
> them choose totally random words

Don't make them choose words at all. Give them tools that generate the
passwords for them.
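
For illustration, such a tool is only a few lines when built on a CSPRNG. A
minimal sketch (the tiny wordlist is a placeholder; a real tool would use
something like the Diceware list of 7776 common words):

```python
import secrets

# Placeholder wordlist; a real generator would load a large list of
# common words (e.g. the 7776-word Diceware list).
WORDLIST = [
    "apple", "river", "copper", "violin", "meadow", "lantern",
    "pepper", "orbit", "sailor", "thimble", "walnut", "glacier",
]

def generate_passphrase(wordlist, n_words=4):
    # secrets.choice draws from the OS CSPRNG, so each word adds
    # log2(len(wordlist)) bits of entropy.
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

print(generate_passphrase(WORDLIST))
```

With the full Diceware list, four words give 4 * log2(7776), about 51.7 bits.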

~~~
yeukhon
If you have a word list, you are no better off than choosing a phrase you
invented. If you are searching for random words on the Internet, you are also
likely to suffer from common words appearing on the Internet. Password
generation means memorizing something very hard, and a lot of the push to use
passphrases is to get rid of traditional password generation and the differing
password-requirement policies. If we were to generate random passwords, you
would be better off generating a SHA256 hash of some random source (from
/dev/urandom or /dev/random).
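
A sketch of that approach in Python (worth noting: the bytes from
/dev/urandom are already uniformly random, so the SHA256 step only reformats
them into a fixed-length printable string rather than adding entropy):

```python
import hashlib
import os

# 32 bytes from the OS CSPRNG already carry 256 bits of entropy.
raw = os.urandom(32)

# Hashing just reformats that randomness as 64 hex characters.
password = hashlib.sha256(raw).hexdigest()
print(password)
```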

------
based2
[https://github.com/sparell/phraser](https://github.com/sparell/phraser)

------
merpnderp
Texas nitrogen snail purple penut.

Every word from a different domain, complete with a word I often misspell, and
at least one word not in the top 10,000 most common words. I think you're
going to need a bigger Markov Chain.

~~~
atemerev
No, unless you obtained these words using a crypto-rated PRNG. Otherwise your
passphrase is vulnerable to semantic network analysis (somewhat more
complicated than Markov chains, but still trivial).

Sorry, humans can't generate secure passwords and passphrases anymore. (You
still can memorize secure passwords, but it becomes increasingly harder).

~~~
chronial
Yes, this. If you want a password that is secure against a brute-force
attack, I would strongly advise against making it up yourself. You will never
know the entropy of your password, and it could be anything.

I personally just generate 10 passphrases from a wordlist and take the one
that I like the most. This way you get both predictable entropy and a good
chance of getting something you can remember. Even though I could never quite
figure out exactly which effect the "pick one of ten" method has on the
entropy :). If anybody has any good input on that, I would be very happy to
hear it.

If you have a wordlist that contains information on the type of each word,
you can generate grammatically valid sentences without losing too much
entropy. I wrote a tool to do this in my native language (German), and I get
sentences like "schleimendes Hervorbringen transportiert krampflösend". This
(very) roughly translates to "slimy production transports anticonvulsantly".
The German sentence has an entropy of 52 bits.
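
The slot-based idea can be sketched like this (the tagged wordlists below are
tiny made-up English placeholders, not the actual German lists such a tool
would use):

```python
import math
import secrets

# Hypothetical part-of-speech wordlists; a real tool would load
# thousands of words per category.
ADJECTIVES = ["slimy", "rapid", "hollow", "bitter"]
NOUNS = ["production", "lantern", "glacier", "violin"]
VERBS = ["transports", "devours", "polishes", "ignores"]
ADVERBS = ["quietly", "backwards", "sideways", "twice"]

# Sentence template: adjective noun verb adverb, one slot each.
TEMPLATE = [ADJECTIVES, NOUNS, VERBS, ADVERBS]

def grammatical_passphrase(template):
    # Fill each slot independently and uniformly; each slot then
    # contributes log2(len(slot)) bits of entropy.
    words = [secrets.choice(slot) for slot in template]
    entropy = sum(math.log2(len(slot)) for slot in template)
    return " ".join(words), entropy

phrase, bits = grammatical_passphrase(TEMPLATE)
print(phrase, f"({bits:.0f} bits)")
```

With realistic list sizes (thousands of words per slot), four slots land in
the ~50-bit range quoted above.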

~~~
gjm11
> which effect the "pick one of ten" method exactly has on the entropy

Probably impossible to say exactly without knowing what your brain is doing
when it picks one. But let's be pessimistic and suppose it picks the "nicest"
in some global sense and that "nicest" is the same for everyone (or, at
least, that an attacker knows what you consider "nicest").

Then the question is this. Generate _m_ samples uniformly from {1,...,N} (with
replacement? without? That depends on exactly what you do if you happen to get
two passphrases the same, but the probability of this is small enough that it
likely doesn't matter) and take the smallest; what's the entropy of the
resulting probability distribution?

That seems like a difficult combinatorial question, so let's consider a
continuous approximation. If we pick numbers uniformly on [0,1] and take the
min of m of them, the CDF is 1-(1-t)^m because the probability the min is >= t
equals the probability all m numbers are >= t, so the pdf is m(1-t)^(m-1). We
can compute the continuous entropy of this distribution (if both Wolfram Alpha
and I have done our parts right, the answer is - log m + 1 - 1/m) and of the
uniform distribution (zero). Continuous entropies are kinda meaningless but in
the limit their differences equal those of the corresponding discrete
distributions -- and for large N we are close to that limit.

So I think what this means is that by picking the "nicest" of m choices, you
lose log(m) - 1 + 1/m nats of entropy. (Divide by log 2 for the number of
bits.)
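
One way to sanity-check that figure is to compute the exact discrete entropy
of the min of m uniform draws from a large set of size N and compare it with
the formula (a sketch, taking m = 10 as in the example below):

```python
import math

def entropy_loss_bits(N, m):
    # Exact entropy of the min of m uniform draws from {0, ..., N-1},
    # using P(min = k) = ((N-k)/N)^m - ((N-k-1)/N)^m.
    H = 0.0
    for k in range(N):
        p = ((N - k) / N) ** m - ((N - k - 1) / N) ** m
        H -= p * math.log2(p)
    # Bits lost relative to a uniform pick from N items.
    return math.log2(N) - H

m = 10
exact = entropy_loss_bits(10**6, m)
formula = (math.log(m) - 1 + 1 / m) / math.log(2)
print(round(exact, 3), round(formula, 3))
```

For m = 10 both values come out at about 2.02 bits, consistent with the
"about 2 bits" figure below.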

Crappy simple approximation to those: you lose log(m) nats of entropy (log2(m)
bits), which is just the same as if you divided the number of possible
passphrases by m. In reality the amount you lose is a little less.

So, e.g., if m=10 you lose about 1.4 nats or 2 bits. (The crappy approximation
says 2.3 bits.)

~~~
chronial
Thanks a lot for this clear analysis. I had gotten as far as the “difficult
combinatorial question” and was stuck. I didn't think of just going into
continuous space.

The “crappy approximation” is actually what I used up until now and it's good
to see that I wasn't wrong in assuming it would overestimate the loss.

You made one small mistake at the end: the approximation says you lose 2.3
nats = 3.32 bits, so they actually differ by over a bit.

Btw: what did you ask Wolfram Alpha to get the -log m + 1 - 1/m? It ran out
of computation time when I asked :)
[http://www.wolframalpha.com/input/?i=-+int_0^1+%28m*%281-x%2...](http://www.wolframalpha.com/input/?i=-+int_0^1+%28m*%281-x%29^%28m-1%29%29+*+log%28%28m*%281-x%29^%28m-1%29%29%29++dx)
Or do you have a subscription?

~~~
gjm11
> You made one small mistake at the end

Oh yes, so I did. Sorry about that.

> What did you ask wolfram alpha to get the -log m + 1 - 1/m?

This, I think:
[http://www.wolframalpha.com/input/?i=Integrate[m+t^%28m-1%29...](http://www.wolframalpha.com/input/?i=Integrate\[m+t^%28m-1%29+Log\[m+t^%28m-1%29\],{t,0,1})]

I get a message about computation time exceeded, but I also get the answer
(both the indefinite integral and the definite).

I don't have a subscription.

~~~
chronial
Fascinating ^^. Removing the minus before the integral and changing 1-x to x
is the difference that matters. I would have thought that WA is so quick with
those transformations that it doesn't make a difference. Well, I was wrong :).

------
Pelam
XKCD got it right. Just pick 3 random words from a dictionary and the
resulting nonsense will still be easy to remember.

I wrote a script that even estimates the entropy for you based on the
available dictionary size:

[http://stackoverflow.com/questions/12646318/tools-for-generating-strong-passphrases/12646320#12646320](http://stackoverflow.com/questions/12646318/tools-for-generating-strong-passphrases/12646320#12646320)
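
The entropy estimate behind such a script is a one-liner: n words drawn
uniformly from a dictionary of size D give n * log2(D) bits. A sketch (the
dictionary size here is illustrative):

```python
import math

def passphrase_entropy_bits(dictionary_size, n_words):
    # Each uniformly chosen word contributes log2(dictionary_size) bits.
    return n_words * math.log2(dictionary_size)

# e.g. 3 words from a 100,000-word dictionary: just under 50 bits.
print(passphrase_entropy_bits(100_000, 3))
```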

