
Anatomy of a pseudorandom number generator – visualising Cryptocat's buggy PRNG - tptacek
http://nakedsecurity.sophos.com/2013/07/09/anatomy-of-a-pseudorandom-number-generator-visualising-cryptocats-buggy-prng/
======
oinksoft

      Cryptocat's hacktivist credibility was cemented in 2012 when
      its Canadian developer, Nadim Kobeissi, was stopped at the
      US border and interviewed about his Cryptocat-related
      programming activities.
    

s/was/claimed to be/. This software clearly is not Ft. Knox, and it's becoming
less and less believable that US intelligence would ever feel the need to
interrogate the author of an _open source_ project, and with such brittle
security, about the cryptography techniques used therein.

I find it far more believable that the young developer with a penchant for
dramatics and a reply for everything has a talent for PR. And a shallow
understanding of the grave consequences of faulty crypto software.

~~~
marshray
I believe Nadim is being truthful in that someone at the border asked him
about Cryptocat.

I think they probably saw his tweets "I'M CROSSING THE BORDER NOW OMG I HOPE
THEY DON'T GIVE ME TROUBLE FOR BEING A BIGTIME HACKER ACTIVIST", they Googled
him, and asked him about his website.

------
magikarp
As the lead developer for Cryptocat, I must say this is really a great example
of how to write a post-mortem for a security bug. Sophos bloggers are always
worth reading.

~~~
aortega
Just remember that your software is not you. Have a thick skin, learn and start
again. Probably a name-change will help too...

~~~
yawgmoth
I don't understand this. Why would anyone have to change their alias just
because they wrote buggy software? Mistakes happen, people move forward, but
they don't have to recreate themselves just because they mess up. That seems
radically unhealthy.

------
zaroth
First, this is a great article showing the bias in Cryptocat's very awkward
PRNG code.

However, the off-by-one bias was actually the _least_ of the problems with
Cryptocat's random numbers...

From reading Steve's write-up, the problem was that their keys were ridiculously
undersized, because they called their own function wrong:

May 7, 2012 (switched from DH to ECC):

    
    
      myPrivateKey = Cryptocat.randomString(32, 0, 0, 1);
    

April 19, 2013:

    
    
      rand = Cryptocat.randomString(32, 0, 0, 1);
      myPrivateKey = BigInt.str2bigInt(rand, 10);
    

June 3, 2013:

    
    
      rand = Cryptocat.randomString(64, 0, 0, 1, 0);
      myPrivateKey = BigInt.str2bigInt(rand, 16);
    
      

> The bug that lasted 347 days was the confusion between a string and an array
> of integers. This made the ECC private keys ridiculously small because they
> passed a string of decimal digits into a function expecting an array of 17,
> 15 bit integers. Each character was considered an element in the array. So
> each of those "15 bit integers" were only the values 0 to 9 (3.32 bits).
> Also the least significant 3 bits are zeroed giving you a key space of
> 2*10^16 (2^54.15). --
> [http://tobtu.com/decryptocat.php](http://tobtu.com/decryptocat.php)
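
The figures in that quote can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the reading that the zeroed low 3 bits affect a single digit (leaving it with just the values 0 and 8) is an assumption made here to reproduce the quoted 2*10^16, and the variable names are illustrative.

```python
import math

# Bits of entropy carried by one decimal digit (values 0-9).
bits_per_digit = math.log2(10)

# Intended key size: 17 integers of 15 bits each.
intended_bits = 17 * 15

# Assumption: clearing the low 3 bits of one digit leaves only 0 or 8.
masked_digit_values = {d & ~0b111 for d in range(10)}

# 16 unconstrained digits (10 values each) times the one masked digit.
key_space = len(masked_digit_values) * 10 ** 16
actual_bits = math.log2(key_space)

print(f"{bits_per_digit:.2f} bits per digit")
print(f"{intended_bits} bits intended, {actual_bits:.2f} bits actual")
```

Under that assumption the numbers line up with the quote: about 3.32 bits per digit and a key space of roughly 2^54.15 instead of the intended 255 bits.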

See for yourself here:
[https://github.com/cryptocat/cryptocat/commit/a17bb1599463e0...](https://github.com/cryptocat/cryptocat/commit/a17bb1599463e0c079137d8200eb32002ac1afc7#src-core-js-lib-multiparty-js-P4)

Even now looking at this code, for me it just doesn't pass the 'WTFs per
minute' test... It still looks fucking wrong to me...
Cryptocat.randomString(64, 0, 0, 1, 0) returns a 64 character string of 0-9,
why are they calling str2bigInt with a hex radix? Did they add the last '0' in
the wrong place? Or maybe I'm looking at the wrong check in...

EDIT: Ok, it WAS still wrong as of June 4th... It wasn't until July 4th that
it finally became:

    
    
      var rand = Cryptocat.randomString(64, 0, 0, 0, 1)
    

See:
[https://github.com/cryptocat/cryptocat/commit/8fb7f4b8e59c76...](https://github.com/cryptocat/cryptocat/commit/8fb7f4b8e59c76c78c06bd16f87d3e5fa9defc7a#L1L91)

EDIT EDIT: So just to be clear... v2.1.11 which was supposed to fix this
problem has this 'incorrect radix' bug in it? Please tell me I'm wrong, or do
we need another CVE and another release?
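
To make the radix concern concrete, here is a minimal sketch using Python's built-in `int()` as a stand-in for the BigInt parsing call. The digit string is made up, and nothing here reproduces Cryptocat's actual code; it only shows what happens when a string of decimal digits is parsed as hex.

```python
import math

# Hypothetical stand-in for a 64-character string of decimal digits.
digits = "9" * 64

as_decimal = int(digits, 10)
as_hex = int(digits, 16)

# Parsed as hex, the number spans 64 * 4 = 256 bit positions...
hex_bit_length = as_hex.bit_length()

# ...but every character still has only 10 possible values, not 16,
# so the string carries at most 64 * log2(10) bits of entropy.
entropy_bits = 64 * math.log2(10)

print(hex_bit_length, as_decimal.bit_length(), round(entropy_bits, 1))
```

The parsed value looks like a 256-bit number, yet it can take only 10^64 distinct values, which is a little under 213 bits.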

~~~
marshray
Maddening isn't it?

~~~
tptacek
I am totally, completely in the dark as to which coding errors (there have
been several in the past few months) attach to which CVEs and which versions,
and about when these vulnerabilities were reported to the project and when
they were subsequently disclosed. I can't be the only one confused, and I do
wonder whether that confusion is strategic.

Pointing this out would be gratuitous, except that Cryptocat's project
leader has very high standards when it comes to the disclosure behavior of
that project's competitors:

[https://github.com/SilentCircle/silent-phone-base/issues/5](https://github.com/SilentCircle/silent-phone-base/issues/5)

(Cryptocat is "kaepora" in that thread; the context there is that Mark Dowd
had reviewed ZRTPCPP, an open source implementation of the ZRTP protocol that
Silent Circle funds and relies on, and found several memory corruption
vulnerabilities. Mark had gone public with those vulnerabilities quickly as a
result of a miscommunication with the Silent Circle team, but then redacted
his disclosure and agreed to embargo the findings while Silent Circle
coordinated fixes with all of the Silent Circle competitors that relied on
that library; what you see in that thread is, apparently, the lead developer
of Cryptocat, which doesn't use ZRTPCPP, repeatedly criticizing Silent
Circle's handling of the incident.)

The project should meet the standards that it has loudly set for others, and I
don't think it has. I'd be happy to be corrected on this.

~~~
DanBC
[edited to keep this thread constructive]

~~~
tptacek
My concern is substantive:

In the process of handling an embargoed disclosure, Silent Circle groomed
(perhaps aggressively) a Github pull request thread. Handling a coordinated
disclosure on a Github pull request is tricky under the best of circumstances,
but the Silent Circle team was dealing with an unplanned partial disclosure
and was, with Dowd's cooperation, trying to get the genie back into the bottle
long enough for (from what I can tell) their competitors to get patched.

This was, as is evident from the thread, a problem for Cryptocat's uninvolved,
un-implicated lead developer.

Meanwhile: I cannot immediately tell from the record which of the
_independently discovered_ vulnerabilities in Cryptocat correspond to which
CVEs or which advisories and which severity descriptions.

The problem isn't one of demeanor; it's that I can't tell what's going on with
this project. Does this sound fiddly? Let's make it more concrete: did an
exploitable vulnerability get reported to the Cryptocat project, mitigated in
a public Github commit, and sit there for months without any acknowledgement
from the project?

Cryptocat's project leader is an HN commenter, and if this confusion is my own
doing --- totally possible --- I'd be happy for some clarification; not about
Silent Circle, but on how someone would get a record of what vulnerabilities
were disclosed in Cryptocat, when, and what the timeline was for each from
discovery to public disclosure. It's plausible that there's a clear record
somewhere and I just missed it.

------
speeder
I don't understand much of crypto...

But I am a game developer, and game developers (especially RPG fans) love random
numbers.

In some games of mine, I suspected something was off with the PRNG, so I did
something like what they did at the end of the article: I used the random
number generator to draw pictures.

Biased generators were quite obvious, because they made obvious patterns (one
of the worst offenders was the default C random function in the mingw that came
with Dev-Cpp: instead of drawing noise, or weird noise, it drew diagonal
lines... yes, LINES. It became clear to me that it was terrible...)

In the end I chose the Mersenne Twister; it is a very good PRNG... for games,
that is. Although the Mersenne Twister generates random-looking numbers, its
authors say it is unsuitable for crypto. I don't remember why, but its output
can be predictable enough to break crypto.
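
The "draw pictures with it" trick is easy to try. The sketch below renders a bit stream as a plain-text PBM image. The weak source is an assumption chosen for illustration (the low bit of a power-of-two-modulus LCG); it is not the mingw `rand()` described above.

```python
from typing import Iterator


def lcg_low_bit(seed: int) -> Iterator[int]:
    """Yield the lowest bit of a power-of-two-modulus LCG (a deliberately weak source)."""
    state = seed
    while True:
        state = (1103515245 * state + 12345) % 2**31
        yield state & 1


def render_pbm(bits: Iterator[int], width: int, height: int) -> str:
    """Render a bit stream as a plain-text PBM (P1) image, one bit per pixel."""
    rows = []
    for _ in range(height):
        rows.append(" ".join(str(next(bits)) for _ in range(width)))
    return f"P1\n{width} {height}\n" + "\n".join(rows) + "\n"


image = render_pbm(lcg_low_bit(seed=1), width=16, height=8)
print(image)
```

Because the low bit of such an LCG simply alternates, every row comes out identical and the image is vertical stripes rather than noise. Saving the string to a `.pbm` file lets most image viewers display it.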

~~~
nikic
MT is not suitable for crypto because after observing 624 values you can
predict all further values. 624 is the size of MT's internal state vector, and
the output of MT is basically just values from that state vector run through
a tempering function. That tempering function is invertible, thus by looking
at 624 values you get the full state vector, at which point you can generate
the next values as usual.
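
As a rough sketch of that attack, the code below clones Python's own `random.Random` (which is MT19937) from 624 consecutive 32-bit outputs. The helper names are made up for this example; the shift and mask constants are the standard MT19937 tempering parameters.

```python
import random

N = 624  # number of 32-bit words in MT19937's state


def undo_right_shift_xor(y: int, shift: int) -> int:
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def undo_left_shift_xor(y: int, shift: int, mask: int) -> int:
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x


def untemper(y: int) -> int:
    """Undo the four tempering steps in reverse order."""
    y = undo_right_shift_xor(y, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y


def clone_mt(outputs: list) -> random.Random:
    """Rebuild a generator from 624 consecutive 32-bit outputs."""
    state = tuple(untemper(o) for o in outputs[:N])
    clone = random.Random()
    # Index N tells the generator its state block is fully consumed.
    clone.setstate((3, state + (N,), None))
    return clone


victim = random.Random(2013)
observed = [victim.getrandbits(32) for _ in range(N)]
clone = clone_mt(observed)
print(clone.getrandbits(32) == victim.getrandbits(32))
```

The clone never sees the seed; it only needs the 624 observed outputs, which is the whole point of the argument above.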

~~~
axman6
That's a great answer, thanks for that!

~~~
tptacek
You should definitely actually try to clone an MT instance sometime. It's
illuminating, and just tricky enough to be fun without being frustrating.

------
txttran
I'm a complete noob when it comes to cryptography. I understand that having a
PRNG that doesn't return numbers with even distribution across a range is bad.
Extreme example would be something like
[http://xkcd.com/221/](http://xkcd.com/221/).

But could someone explain how an attacker can take advantage of the fact that
0 is returned ~1% more often than other digits? Is this flaw alone sufficient
to break cryptocat? Or does it simply make brute forcing easier when combined
with other crypto flaws?

~~~
patio11
At a high level of abstraction: If one runs a nuclear power plant, one does
not make a practice of tolerating small oil spills. Small oil spills are
almost harmless. So are small quantities of sparks. The combination of small
oil spills and small quantities of sparks, however, is a severe problem and
gets worse _in a hurry_ if it compounds with certain other usually benign
properties of nuclear power plants.

Unfortunately, the sort of nuclear power plant operators which tolerate oil
spills are often sufficiently not on their A game to tolerate sparks.

This is a very handwavy explanation. In particular, God doesn't hate nuclear
powerplants and try to introduce sparks into them at inopportune moments just
to see if they happen to find an oil spill, but The Adversary often can and
will do this to your cryptosystem.

------
ChuckMcM
Nice article, I can certainly vouch for creating a visualization to clue you
in that there may be a problem.

One of the things I built out of Java when I was hanging out on the
cypherpunks list was a 'Noise Sphere' applet. Basically this is a way of
testing a PRNG visually. It was fun to put various ideas through it to see how
they panned out (most really sucked). One of the cool things was using the
alpha emitter RNG from a smart card and finding out that it too was slightly
biased in the presence of foil on one side (never did find out why that was,
but it did stick out)

~~~
DanBC
H-Online have a nice article about an external enclosure that offered
"hardware data encryption with 128-bit AES, access control via an RFID chip".

The visualization (of the encrypted data, not of any PRNG used) showed that
the encrypted data had strong patterns. The data wasn't encrypted using AES;
it used XOR with a non-changing cipher block. (This is trivially easy to
crack. Even I could do it.)

The error arose because the manufacturer of the controller chip was using
confusing terminology.

> The IM7206 merely uses AES encryption when saving the RFID chip's ID in the
> controller's flash memory. The company explained that actual data encryption
> is based on a proprietary algorithm.

([http://www.h-online.com/security/features/Enclosed-but-not-e...](http://www.h-online.com/security/features/Enclosed-but-not-encrypted-746199.html))

Other people interested in testing PRNGs might be interested in the DIEHARD
tests
([http://en.wikipedia.org/wiki/Diehard_tests](http://en.wikipedia.org/wiki/Diehard_tests)),
but read all the cautions too.
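
To show why XOR with a non-changing block is so easy to crack, here is a minimal sketch. The 16-byte block, the sample data and the function names are all hypothetical; nothing is known here about the enclosure's real algorithm beyond what the article says.

```python
def xor_with_block(data: bytes, block: bytes) -> bytes:
    """'Encrypt' or 'decrypt' by XORing data with a fixed, repeating block."""
    return bytes(b ^ block[i % len(block)] for i, b in enumerate(data))


def recover_block(ciphertext: bytes, known_plaintext: bytes, block_len: int) -> bytes:
    """Recover the repeating block from one block of known plaintext."""
    pairs = zip(ciphertext[:block_len], known_plaintext[:block_len])
    return bytes(c ^ p for c, p in pairs)


secret_block = bytes(range(1, 17))  # hypothetical 16-byte block
disk = bytes(16) + b"confidential payroll data, Q3"  # zero-filled region, then data
ciphertext = xor_with_block(disk, secret_block)

# Any zero-filled region of the disk leaks the block directly.
recovered = recover_block(ciphertext, bytes(16), 16)
plaintext = xor_with_block(ciphertext, recovered)
print(plaintext[16:])
```

Disks are full of predictable content (zeroed sectors, filesystem headers), so one known block is usually enough to decrypt everything else.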

------
jlgreco
Really drives home what many people were saying about the authors not merely
being bad at cryptography but programming in general...

~~~
tptacek
I upvoted this because, of course, it confirms my own biases, but kind of wish
I hadn't, because we're not helping with comments like these.

~~~
glurgh
Sure but the author(s) of Cryptocat have been so uniquely resistant to
help/input, over a fairly long period of time. Maybe your euphemism
'unserious' is worth adopting for every time one wants to say 'bad' or
'incompetent', in cases like this.

~~~
tptacek
I think the project is worth criticizing, but that the concerns are serious
enough to deserve better than snark.

~~~
glurgh
Maybe at some point a project exhausts whatever potential
goodwill/snarklessness is available and just becomes an object of snark - and
at that point and beyond, perhaps the benefits of the project having a
reputation of being a snarkmagnet outweigh negatives of the inevitable snark
it generates.

Makes it a lousy, toxic topic for message boards but that seems a
comparatively small price for a defense against the notion that Cryptocat is
(or likely, ever will be) secure.

------
dchest
While we're at it, did anyone review the Salsa20 implementation, or is anyone
willing to review it?

[https://github.com/cryptocat/cryptocat/blob/master/src/core/...](https://github.com/cryptocat/cryptocat/blob/master/src/core/js/lib/salsa20.js)

------
mpyne
This reminds me of the IBM RANDU PRNG on which I gave an undergrad
presentation (w/ visualization). RANDU was legendarily bad; all you had to do
was plot triples in a 3-D volume to see that _all_ of the generated points
would fall within one of 15 (Edit: 15, not 11) planes.
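
The planes come from a simple linear relation between consecutive outputs. A short sketch, using the well-known RANDU recurrence x' = 65539 * x mod 2^31 (the function name and seed are just for illustration):

```python
def randu(seed: int):
    """Generate RANDU outputs: x' = 65539 * x mod 2**31."""
    x = seed
    while True:
        x = (65539 * x) % 2**31
        yield x


gen = randu(seed=1)
xs = [next(gen) for _ in range(10_000)]

# Since 65539 = 2**16 + 3, squaring it gives 6 * 65539 - 9 (mod 2**31),
# so every triple satisfies x[k+2] = 6 * x[k+1] - 9 * x[k] (mod 2**31).
violations = sum(
    1
    for a, b, c in zip(xs, xs[1:], xs[2:])
    if (c - 6 * b + 9 * a) % 2**31 != 0
)
print(violations)
```

Every triple satisfies the same integer relation, which is why the points (x[k], x[k+1], x[k+2]) can only land on a small number of parallel planes.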

------
mistercow
>The code above is certainly an inelegant solution, since it is, in theory, at
least, a potentially infinite loop.

As far as I know, it is the _only_ solution for generating unbiased random
numbers in a range that does not divide the native range of the PRNG you're
using.

What I _don't_ understand is why they're working in decimal in the first
place. The obvious solution is to generate a random integer in the range [0,
2^53) (which does not require an unbounded-time loop, since 2^53 divides 2^64)
and then divide by 2^53. I am open to the possibility that there's some quirk
of floating point arithmetic that means that won't work [edit: less so now
that I see it's how Java implements it¹], but I can hardly imagine that it's
worse than this crazy business of generating 16 decimal digits because, hey,
log₁₀(2^53) is pretty close to 16.

¹[http://docs.oracle.com/javase/7/docs/api/java/util/Random.ht...](http://docs.oracle.com/javase/7/docs/api/java/util/Random.html#nextDouble\(\))
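
Both approaches discussed here fit in a few lines. This is a sketch in Python rather than Cryptocat's JavaScript; the function names are invented, and `secrets` stands in for whatever byte source the real code used.

```python
import secrets
from collections import Counter


def unbiased_digit() -> int:
    """Return a uniform digit 0-9 by rejection sampling on one random byte."""
    while True:
        byte = secrets.randbelow(256)
        if byte < 250:  # 250 = 25 * 10, so each digit gets exactly 25 byte values
            return byte % 10


def random_unit_float() -> float:
    """Return a float in [0, 1) built from 53 random bits, with no loop needed."""
    return secrets.randbits(53) / 2**53


# How many byte values map to each digit under the correct and off-by-one bounds.
correct_bound = Counter(b % 10 for b in range(250))  # accept byte < 250
off_by_one = Counter(b % 10 for b in range(251))     # accept byte <= 250

print(sorted(correct_bound.values()))
print(off_by_one[0], off_by_one[1])
```

With the strict bound every digit is backed by 25 byte values. Accepting 250 as well gives digit 0 a 26th value, which is exactly the kind of small bias the article visualises.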

~~~
marshray
You could accumulate numbers until you reached the least common multiple of
the source and desired ranges.

~~~
kunil
How is that any different, am I missing something?

~~~
marshray
Never mind. I'm confused. Not the first time :-)

------
cobbal
I think the criticism of the loop is a bit unfair. If your random number
generator returns >250 a thousand times in a row, you have bigger problems
than slowness.

~~~
q3k
Purely mathematically, you cannot assert that a certain value will appear in a
certain timeframe - it's random!

~~~
cobbal
True, but purely mathematically every cipher is vulnerable to the "get lucky
and guess the key right the first time" attack (OTP excepted of course).

Almost all crypto operates in the realm of absurdly close to certain.
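
To put a number on "absurdly close to certain", here is a back-of-the-envelope sketch. It assumes a uniform byte source and a loop that rejects 6 of the 256 byte values (250 through 255); both figures are assumptions for illustration.

```python
import math


def log10_prob_all_rejected(tries: int, rejected_values: int = 6,
                            total_values: int = 256) -> float:
    """Base-10 log of the chance that every one of `tries` draws is rejected."""
    return tries * math.log10(rejected_values / total_values)


# Average number of draws needed per accepted byte.
expected_draws = 256 / 250

print(expected_draws)
print(log10_prob_all_rejected(1000))
```

The logarithm is used because the probability itself is far too small to represent as a float; for a thousand rejections in a row it is on the order of 10^-1630.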

------
JunkDNA
Seems like at a minimum, if you are doing your own random number generator,
you should have a test harness that runs the chi squared test and compare
against a known good random number source. I wonder how many other automated
tests one could stack up against a crypto codebase to do that kind of basic
checking.
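
A minimal version of such a harness might look like the sketch below. It tests digit counts against the uniform distribution the generator claims to produce; the function names and the choice of a 0.1% significance level are assumptions, and passing this test says nothing about cryptographic strength.

```python
def chi_square_uniform(counts):
    """Chi-squared statistic of observed counts against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Critical value for 9 degrees of freedom (10 digits) at the 0.1% level.
CRITICAL_DF9_P001 = 27.877


def looks_biased(counts) -> bool:
    """True if the digit counts are very unlikely under a uniform distribution."""
    return chi_square_uniform(counts) > CRITICAL_DF9_P001


# Idealised counts: a fair source, and one where digit 0 gets 26 of every 251 draws.
fair_counts = [25_000] * 10
skewed_counts = [26_000] + [25_000] * 9

print(looks_biased(fair_counts), looks_biased(skewed_counts))
```

In a real test run the counts would come from sampling the generator, and a bias this small only becomes detectable once the sample reaches a few hundred thousand digits.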

~~~
yk
You are running the chi-square against the claimed distribution of the RNG,
not against another RNG. Which is just about enough to use the RNG for
simulating dice rolls (sometimes, if the players are not too passionate). RNGs
can have a lot more subtle flaws; for example, the outputs of the ANSI C rand()
with the suggested parameters lie in just a few thousand planes, if you
construct points in some vector space out of the random numbers. (That is
bad if you want to run Monte Carlo integration, because your error estimate
improves, but not the accuracy of the result.)

So even for simulation you can have rather subtle effects, even though there
you get bitten just by bad luck. The task for cryptographically secure random
number generators is actually quite a bit harder, because an attacker tries to
exploit any weakness in your RNG, but your tests do only tell you which
specific attacks do not work.

~~~
caf
The ANSI C rand() doesn't actually specify any particular algorithm. The
complete set of requirements for rand() is just that: rand() computes a
sequence of pseudo-random integers in the range 0 to RAND_MAX; RAND_MAX must
be at least 32767; if srand() is called with the same seed value, the same
sequence of values will be produced; calling rand() before calling srand()
acts as if srand() had been called with a seed of 1; and no library function
can perturb the pseudo-random sequence.

Historical C standard library implementations _did_ tend to use the same or
similar LCRNGs, but that was never a requirement, and is no longer the case.

------
autodidakto
I thought cryptocat was independently audited. Who audited it and why didn't
they find these problems?

~~~
afreak
Cryptocat was audited by Veracode, but it is my understanding that they just
audited the functionality of it (i.e. the interface and all that is related)
but not the cryptography, as it was out of scope.

------
Ellipsis753
Interesting though this is, all it really shows is that there is a bias in the
random number generation.

I would be more interested to know if this is sufficient to break Cryptocat.
Would this really make a brute force attempt much easier?

~~~
tptacek
This isn't a disclosure of a new vulnerability; it's a followup on an earlier
article which documented a much more significant problem with key management
in Cryptocat.

------
danso
I can mostly grok the scope of the error and the theory behind it...what's
more interesting to me is to hear, preferably from the Cryptocat developers,
what led to this error? An accidental regression? A deliberate decision?
Cryptocat and what it aims to do is respectable, but I think what concerns
people outside of the project is the prospect of unknown unknowns in the rest
of the code: are the recent bug revelations indicative of endemic blind spots?

~~~
sirsar
Off-by-one errors are very common (and they have a long history). My best
guess is "an honest mistake."

As for the "endemic blind spots," any crypto software needs professional
reviews and a good deal of time before it can be used in critical situations.

~~~
tptacek
Yes. A pro friend of mine is fond of saying: budget 10x to design, 1x to
implementation, and 10x to verification.

------
canvia
I don't know if the name is supposed to be a joke, but it's a bit weird that
the NSA uses it internally:

[http://blogs.marketwatch.com/thetell/2013/06/06/nsa-targets-...](http://blogs.marketwatch.com/thetell/2013/06/06/nsa-targets-kids-with-crypto-cat-decipher-dog-and-other-cartoon-characters/)

------
StavrosK
Very interesting article, thank you.

------
rheide
This should be possible to test automatically. And if that's not possible/easy
to do, it should be standard procedure to generate colour maps that can be
checked by humans.

------
revelation
We talk about how JavaScript delivered from a webserver can never be a secure
crypto platform, but of course in reality the problems are far more mundane.

~~~
StavrosK
s/mundane/interesting/

------
bcl
Why doesn't the article include updated graphs with the bug fixed?

------
kunil
Wow, someone used <= instead of < and they wrote a 5-page report with all those
images and charts. The bug is obvious and simple, so what is with all those
explanations?

~~~
baq
this is crypto. there were 'less serious' bugs like initializing arrays to
zeros (see debian) that led to catastrophic results, which is what the
explanation is about.

~~~
kunil
I am not saying whether the bug is critical or not. It is just pages of
explanations, charts and stuff. I was expecting a video explanation by the end.

If they are targeting non-crypto people (which includes me), explaining how a
bad random algorithm affects cryptology would be better than showing
that the algorithm is bad in 5 different ways.

Also any link to articles about that Debian bug?

~~~
DanBC
Random numbers are used for different things, but mostly generating keys. An
attacker that knows everything about your system (all the hardware, and the
software, all the sourcecode, everything) should not be able to predict the
next bit output by a prng.

Errors include:

i) using a source that is not random. As mentioned elsewhere some hardware
devices provide skewed numbers, and even de-skewing doesn't help too much.

ii) using a poor seed.

The Debian bug is an example of ii -

> _This vulnerability was caused by the removal of two lines of code from the
> original version of the OpenSSL library. These lines were used to gather
> some entropy data by the library, needed to seed the PRNG used to create
> private keys, on which the secure connections are based. Without this
> entropy, the only dynamic data used was the PID of the software. Under Linux
> the PID can be a number between 1 and 32,768, that is a too small range of
> values if used to seed the PRNG and will cause the generation of predictable
> numbers. Therefore any key generated can be predictable, with only 32,767
> possible keys for a given architecture and key length, and the secrecy of
> the network connections created with those keys is fully compromised._

([http://en.wikinews.org/wiki/Predictable_random_number_genera...](http://en.wikinews.org/wiki/Predictable_random_number_generator_discovered_in_the_Debian_version_of_OpenSSL))

(www.schneier.com/blog/archives/2008/05/random_number_b.html)
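
A toy sketch of why a PID-only seed is fatal: with so few possible seeds, an attacker can simply regenerate every candidate key. The key generator below is a made-up stand-in built on Python's `random`, not OpenSSL's PRNG, and the PID value is arbitrary.

```python
import hashlib
import random

MAX_PID = 32768  # upper PID bound quoted above


def toy_keygen(pid: int) -> bytes:
    """Toy stand-in for key generation whose only entropy is the process ID."""
    rng = random.Random(pid)
    return rng.getrandbits(128).to_bytes(16, "big")


def fingerprint(key: bytes) -> str:
    """Public fingerprint of a key, as an attacker might observe it."""
    return hashlib.sha256(key).hexdigest()


def recover_pid(target_fingerprint: str):
    """Try every possible PID until the fingerprint matches."""
    for pid in range(1, MAX_PID + 1):
        if fingerprint(toy_keygen(pid)) == target_fingerprint:
            return pid
    return None


victim_fingerprint = fingerprint(toy_keygen(20513))
print(recover_pid(victim_fingerprint))
```

The search space is tiny no matter how long the resulting key is, which is why the affected keys could be enumerated ahead of time.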

------
zainny
I wonder - if you took in aggregate the time people spent complaining about
Cryptocat could a more secure alternative have already been built? :-)

Anyway, I really enjoyed this article. Informative, interesting, and free of
snark.

~~~
paranoiacblack
No, stop trivializing cryptography software.

