
A Note on SIMON-32/64 Security - mrb
https://eprint.iacr.org/2019/474
======
tptacek
I posted this on HN yesterday, with a comment saying I was skeptical. Then we
talked about it on Slack and Thomas Pornin utterly debunked it, so I deleted
it. Pornin posted his analysis this morning:

[https://crypto.stackexchange.com/questions/70467/what-are-
th...](https://crypto.stackexchange.com/questions/70467/what-are-the-
implications-of-the-new-alleged-key-recovery-attack-preprint-on-
sim/70471#70471)

The cryptanalysis (really, "proof of zero knowledge", if I might clarify and
shoplift someone else's zinger at the same time) they claim to have performed
is slower than brute force.

Long story short: this is almost certainly just a troll.

~~~
throwawaymath
Thanks for posting this, I came here to post the same thread.

For those who are unaware: Thomas Pornin is a professional cryptographer. He's
a member of the NCC Crypto Services team and one of the authors of the
Sosemanuk stream cipher, which was part of the final portfolio for eSTREAM.
He's also involved in the development of one of the cryptosystems which has
made it to round 2 of the ongoing NIST PQCRYPTO standardization process.

His writing on crypto.stackexchange is prolific and highly informative, and
this is a strong rebuttal in particular.

~~~
nmadden
He is also the author of [https://www.bearssl.org/](https://www.bearssl.org/)

~~~
throwawaymath
Nice, I forgot that one. BearSSL is one of only two TLS implementations (if I
recall correctly) that weren't vulnerable to the most recent Bleichenbacher
attack variant.

~~~
tptacek
That's correct, BearSSL and BoringSSL (Google's OpenSSL fork). Pornin is a
bad-ass. But keep in mind that they were targeting C/C++ libraries, so we
don't have telemetry on (for instance) Go's crypto/tls or whatever Rust is
doing with _ring_.

------
dchest
The paper looks suspicious to me.

\- Novel crypto analysis technique which can't be revealed because "a lot of
work to be done to obtain an optimized, more efficient and industry-level
version".

\- Claim that success probability is low, but "we have a 2nd algorithm": "To
date the probability of success is still very low (p = 0.025) but we are
optimistic about the possibility of significantly increasing it in the
upcoming months. Indeed we have a second algorithm, which is theoretically
proven, with a success rate of 0.25. It is not yet fully tested and executed
because it requires a higher computing power, although it is still reasonable
for operational cryptanalysis (which can be repeated over time)."

\- Claim that SIMON algorithm had been introduced in the Linux 4.16 kernel (it
was Speck, not Simon). May be an honest typo.

\- Testing on Odroid cluster.

\- Can't find anything on "Alba3 Group" or the authors (who have
@protonmail.com address -- also suspicious).

\- "The only reference for this magnitude of cryptanalysis is [9]. In 2002, a
64-bit RC4 key was obtained in 1,754 days (300,000 participants). Today, to
keep up with the evolution of computing power since 2002, it is necessary to
divide by 2^10, or about 3 days with the same number of participants." It was
RC5, not RC4 (it's even in the title of the reference -- again, may be a
typo), and how did they come up with the 2^10 number?

I don't math, but it also looks suspicious:
[https://twitter.com/colmmacc/status/1127100892883312640](https://twitter.com/colmmacc/status/1127100892883312640)

See also:

[https://www.reddit.com/r/crypto/comments/bn5hds/crikey_key_r...](https://www.reddit.com/r/crypto/comments/bn5hds/crikey_key_recovery_attack_on_simon_3264/en37dh0/)

~~~
wwwigham
The math isn't too suspicious from what I can read, though I'm by no means an
expert - it's just set stuff, from what I can tell. I think that Twitter post
is just confusing the different fonts for the sets vs the scalars. "The
cardinality of the set of keys is equal to two to the power of the length of
the key" is a pretty sensible statement. The Yi is probably a typo,
considering the `Xi = Xi` on the prior line is a bit redundant (and this is a
preprint, so I'd forgive it).

In any case, my real takeaway:

> Our main result is that we can find a 64-bit key in about three days
> (average time) on two Odroid MC1 clusters (8 Gb) [18] from two pairs of
> plaintext/ciphertext.

The algorithm isn't strong vs key reuse. I'm unsure if it actually claimed to
be so - key reuse is almost always a big problem, though, and in this case
they're exploiting the birthday paradox to make the search for the key more
efficient.
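
As a rough sanity check on the birthday intuition (the standard approximation, not anything from the paper): with n random samples from a space of size N, the collision probability is about 1 - exp(-n^2 / 2N).

```python
import math

# Standard birthday-bound approximation: probability of at least one
# collision among n uniform samples from a space of size N.
def birthday_collision_prob(n, N):
    return 1 - math.exp(-n * n / (2 * N))

# With only 2^16 random 32-bit blocks you already expect a collision
# ~39% of the time -- short block lengths make reuse cheap to exploit.
p = birthday_collision_prob(2**16, 2**32)
print(f"{p:.3f}")  # ~0.393
```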

------
debatem1
There are really good reasons to be suspicious of short block length ciphers,
but there's nothing in this paper besides a dire warning and hint of future
results. Contrary the authors' plea at the end, I think this proves exactly
why cryptanalytic results _must_ be published.

------
emilfihlman
>We have discovered this great thing

>But we are not going to publish how

>Trust us

Yeaaaaah, no.

As highlighted by dchest and that Twitter thread, it's highly improbable that
anything real comes out of this.

------
jepler
I tried to verify "table 5", but while the "YHWH" keys all worked, only 2 of
the 5 " " keys did.

[https://emergent.unpythonic.net/files/sandbox/474.py](https://emergent.unpythonic.net/files/sandbox/474.py)

(Table 5 isn't "important", it's just a justification of why their algorithm 1
skips analyzing pairs where the plaintext in each pair is identical; and
finding such keys doesn't seem TOO hard to do in the obvious way, you can get
one such key every 4 billion trials or so)
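
For anyone who wants to reproduce this kind of spot check without hunting down a library, a minimal pure-Python SIMON-32/64 fits in a few dozen lines. This is my own sketch written from the designers' spec (not the script linked above); it reproduces the published Simon32/64 test vector:

```python
MASK = 0xFFFF  # SIMON-32/64: 16-bit words, 32-bit blocks, 64-bit keys

def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK

def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK

# Constant sequence z0 from the SIMON/SPECK design paper.
Z0 = [int(b) for b in
      "11111010001001010110000111001101"
      "111101000100101011000011100110"]

def expand_key(k3, k2, k1, k0):
    """Expand a 64-bit key (four 16-bit words, in the order the test
    vectors print them) into 32 round keys."""
    ks = [k0, k1, k2, k3]
    for i in range(28):
        t = ror(ks[i + 3], 3) ^ ks[i + 1]
        t ^= ror(t, 1)
        # c ^ k[i] with c = 2^16 - 4 is the same as (~k[i] & MASK) ^ 3
        ks.append((~ks[i] & MASK) ^ 3 ^ Z0[i % 62] ^ t)
    return ks

def encrypt(x, y, ks):
    """Encrypt one block (x = left word, y = right word), 32 rounds."""
    for rk in ks:
        x, y = y ^ (rol(x, 1) & rol(x, 8)) ^ rol(x, 2) ^ rk, x
    return x, y

# Published test vector: key 1918 1110 0908 0100,
# plaintext 6565 6877 -> ciphertext c69b e9bb
ks = expand_key(0x1918, 0x1110, 0x0908, 0x0100)
assert encrypt(0x6565, 0x6877, ks) == (0xC69B, 0xE9BB)
```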

PS The authors thank an Oleg Ivanovich Popov. I found one such person who is a
researcher ... with publications such as "Thermodynamics of Hydrogen-Sulfide
Conversion in a Claus Reactor in Coke-Oven Gas Desulfurization Circuit of MMK"

~~~
jepler
I also tried to verify "table 6", which appears to be correct except for the
second row, where the hex value shown for "C" is accidentally the same as "E".
Correcting it to 0x7761792c20666f72 causes all the "test cases" to come out as
claimed.

[https://emergent.unpythonic.net/files/sandbox/474bis.py](https://emergent.unpythonic.net/files/sandbox/474bis.py)

It does seem like there has to be some interesting cryptanalysis going on to
produce this many interesting pairs, particularly when they're (claimed to be)
drawn from a relatively short corpus.

However, when considering how you would generate such pairs, you need not
actually do "Algorithm 1" (in which step 6 is "do the secret magic"); you can
select your block Pi and key Ki, then see if the result happens to be a block
Ci that you can claim you picked first. In "pg10.txt" there are 53103 distinct
4-byte blocks, or about 2^32 pairs of blocks, so if you work in this way you
only have to do around 2^33 SIMON-32/64 block encryptions to be able to
produce a new row of "table 6".
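
To make the shape of that search concrete, here is a sketch of the pattern with a toy 16-bit keyed permutation standing in for SIMON (so this illustrates the trick only; it does not reproduce their table):

```python
import hashlib
import random

def toy_encrypt(block16, key):
    """Stand-in 16-bit 'cipher' (truncated keyed BLAKE2b), NOT SIMON --
    it only plays the role of E_K(P) in the search loop below."""
    h = hashlib.blake2b(block16.to_bytes(2, "big"),
                        key=key.to_bytes(8, "big"), digest_size=2)
    return int.from_bytes(h.digest(), "big")

random.seed(1)
# Pretend corpus: a set of "interesting" blocks, standing in for the
# 53103 distinct 4-byte blocks of pg10.txt mentioned above.
corpus = set(random.sample(range(2**16), 1000))
blocks = sorted(corpus)

# The trick: fix P from the corpus, pick random keys, keep any
# (P, K, C) where C happens to land back in the corpus -- then claim
# the pair (P, C) was chosen first.
hits = []
for trial in range(20000):
    p = blocks[trial % len(blocks)]
    k = random.getrandbits(64)
    c = toy_encrypt(p, k)
    if c in corpus:
        hits.append((p, k, c))

# Roughly 20000 * 1000/65536, i.e. a few hundred "impressive" pairs,
# at the cost of nothing but brute encryptions.
print(len(hits))
```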

Also, some of the C/E values are not actually present in "pg10.txt", their
supposed restricted corpus. For instance, "{" is a character in several of
them, but appears nowhere at all in my copy downloaded from the URL they gave.
So they have somehow failed to accurately describe the corpus they actually
used.

~~~
jepler
Here are 100 more items just like table 6, generated in, oh, about 2 seconds
on an i7-4790k.

[https://emergent.unpythonic.net/files/sandbox/trolled.txt](https://emergent.unpythonic.net/files/sandbox/trolled.txt)

Here's the search program (built with g++ -O3 -fopenmp -fno-strict-aliasing on
debian stretch)

[https://emergent.unpythonic.net/files/sandbox/search.c](https://emergent.unpythonic.net/files/sandbox/search.c)

Just put the corpus "pg10.txt" in the current directory and run.

Newlines in the corpus are turned to spaces; carriage returns are deleted.

I back-checked just a few using the same Python SIMON implementation.

------
brohee
I was about to read it, and then saw they were citing Filiol, which put me
off, to say the least (the dude has been completely discredited since he
"broke" AES).

Also, interestingly, none of the authors seem to have left much of a footprint
on the Internet, which is extremely odd at best, the sign of some kind of
fraud at worst...

------
tempodox
Surprise! Cryptography designed by the NSA is tailor-made to be easily cracked
considering the computing resources they command.

~~~
floatingatoll
The surprise is that they may have proven that a weakness exists. To date,
it’s merely been a suspicion. It will be interesting to see whether the
weakness is deemed plausible or not by security researchers - and whether it
can be applied to non-NSA systems.

~~~
nullc
> The surprise is that they may have proven that a weakness exists.

A 64-bit keyspace is small enough that I don't think it's reasonable to take
their paper as proof.

It would be fairly straightforward to use an FPGA farm (or potentially GPU
farm) to search the entire keyspace. The fringes of the cryptocurrency altcoin
ecosystem have caused the creation of some pretty impressive FPGA and GPU
farms...

Additionally, since they equate two pieces of text from the document via a key,
they can get a massive speedup if they don't actually fix them. On this basis
their proof mechanism seems highly suspect to me, unless I'm misreading it.

To be clearer about what I'm suggesting: pick a piece of text from the
source, pick a key, encrypt or decrypt, then check if the result is anywhere
in the text. Assuming the lookups are free this gets you a 4.2 million fold
speedup (the corpus they're using is about 4.2MB). I'm not familiar with the
structure of the cipher but there may be a meet in the middle that
dramatically improves this approach.
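
Back-of-the-envelope on that speedup (my arithmetic, under the same free-lookup assumption): a ~4.2 MB corpus gives about 2^22 acceptable targets, so accepting any of them knocks a 64-bit exhaustive search down to roughly 2^42 trial encryptions.

```python
import math

corpus_bytes = 4.2e6                   # ~4.2 MB corpus, every position a target
corpus_bits = math.log2(corpus_bytes)  # ~22 bits of targets
effective_bits = 64 - corpus_bits      # ~42-bit effective search

print(f"corpus ~ 2^{corpus_bits:.1f} targets, "
      f"effective search ~ 2^{effective_bits:.1f}")
```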

It's completely reasonable that someone might search a 64-bit keyspace as a
prank, ... I factored a 100-digit semiprime last night for a prank. My prank
was not anywhere near as "cool" as convincing a lot of people that an NSA
cipher was backdoored.

~~~
tptacek
Forget the FPGA farm. The actual task the authors have set out for themselves
to "prove" SIMON weak, you could complete on a laptop. The problem is, you can
complete it on a laptop for _any_ cipher with the same block and key size
parameters.

------
ggm
If the NSA responded, is there a response which could both command respect and
be truthful? It feels like the only one would be to acknowledge this team
found a flaw. The secondary question would be to ask whether the flaw was
inserted or happenstance (which implies the NSA is incompetent at maths,
distinct from politics: we already know they blotted their copybook
politically).

~~~
wwwigham
The construction is at least at face value a pretty simple ARX cipher - the
only real questions being why choose the specific rotation constants they do
for each round (ciphers developed in the open usually do things like key off
digits of pi) and why compose specifically via xor, then add, then rotate in
each round, rather than more of a blend.
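
For comparison, "key off digits of pi"-style derivations are easy to check independently. A well-known example (SHA-2, nothing to do with SIMON): SHA-256's initial hash values are the first 32 fractional bits of the square roots of the first eight primes.

```python
import math

# First 32 fractional bits of sqrt(p) for the first 8 primes --
# exactly SHA-256's published initial hash values H0..H7.
primes = [2, 3, 5, 7, 11, 13, 17, 19]
H = [int((math.sqrt(p) % 1) * 2**32) for p in primes]

assert H[0] == 0x6A09E667  # matches the value published in FIPS 180-4
print([hex(h) for h in H])
```

Anyone can rerun the derivation and confirm the constants weren't cherry-picked, which is the whole point of "nothing up my sleeve" numbers.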

If this style of algorithm does indeed have an intentional backdoor, that's
some crazy mathematical chops, and I'd love to read the theory behind how it
(the intentionally weak algorithm) was found (since hopefully that leads to a
natural way to generate stronger algorithms, or at least check for weak ones).
That'd be a valuable takeaway for the security community once the secret's
spoiled, if it is the case.

~~~
cyphar
> ciphers developed in the open usually do things like key off digits of pi

It should be noted that we should be very wary of these types of "nothing up
my sleeve" numbers. djb showed[1] that with enough effort you could come up
with more than a million "obviously not backdoored" numbers (this was done in
the context of elliptic curves) -- enough to exploit a million-to-one unknown-
by-the-public vulnerability.

[1]: [https://youtu.be/Cj3PN5-n108](https://youtu.be/Cj3PN5-n108)

~~~
throwawaymath
The context Bernstein is talking about is very meaningful here. The
mathematics of public-key cryptography provides a rich tapestry for covering
up hidden backdoors. ARX ciphers are very simple compared to elliptic curves,
and the complexity of round constants isn't really comparable to that of
curves.

~~~
cyphar
Right, I should've added I agree with GP that ARX ciphers are incredibly
simple and the ability to backdoor them would be a very novel (and concerning)
discovery.

My point is that "the values come from pi" is not necessarily proof that the
constants really are "nothing up my sleeve". Bernstein was discussing this in
the context of NIST curves (which could be backdoored), but the same one-in-a-
million maths works for any constants (so long as you happen to know a weak-
constants vulnerability that isn't known by the public).

