
The design of Chacha20 - loup-vaillant
http://loup-vaillant.fr/tutorials/chacha20-design
======
riffraff
> [re magic "expand 32-byte k" string] And it's readable ASCII text, so you
> can be pretty sure there's no back door in there.

I am not sure this matters?

I mean, facebook managed to get a reasonably nice .onion routing id
(facebookcorewwwi.onion) by bruteforcing stuff right?

I can imagine bruteforcing the "backdoor key space" to find something that
looks good, am I insane?

~~~
vog
I was wondering about this, too. To bring even more paranoia to the table:

Readable ASCII means that every byte is in a certain range. For example, bit 7
is 0 for every byte. Maybe this alone enables a backdoor.

That is, the mere fact that this is readable ASCII could enable a backdoor.
Who knows?

~~~
tptacek
No, it does not. You're taking this notion of constrained values out of
context. ASCII strings in crypto constructions are very common; for instance,
they provide domain isolation in hash constructions (where you have a single
hash function applied to inputs of different sensitivity, and want to mint
multiple logically unrelated hash functions from the one you have). They're
also common in versions.

The ASCII we're looking at here is conceptually a _hash input_. It's not a
part of the design of the hash core itself.
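
To make the constant concrete, here is a small C sketch (not taken from the article) of how the 16 ASCII bytes become four little-endian 32-bit words, together with a check of the "bit 7 is 0 for every byte" observation raised above. The names `load32_le`, `chacha_constants`, and `all_bytes_ascii` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 16-byte constant discussed above, as plain ASCII. */
static const char SIGMA[] = "expand 32-byte k";

/* Read four bytes as one little-endian 32-bit word. */
uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Fill words[0..3] with the constant, four characters per word. */
void chacha_constants(uint32_t words[4])
{
    assert(strlen(SIGMA) == 16);
    for (int i = 0; i < 4; i++)
        words[i] = load32_le((const uint8_t *)SIGMA + 4 * i);
}

/* Returns 1 if bit 7 of every byte is clear, i.e. all bytes are 7-bit ASCII. */
int all_bytes_ascii(const uint32_t words[4])
{
    uint32_t high_bits = 0;
    for (int i = 0; i < 4; i++)
        high_bits |= words[i] & 0x80808080u;
    return high_bits == 0;
}
```

Calling `chacha_constants(w)` yields `0x61707865, 0x3320646e, 0x79622d32, 0x6b206574`, the four words that ChaCha implementations usually hard-code.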

------
jean-
If you enjoyed the style of this article, I would also recommend you have a
look at the brilliant explanation of Earley parsing by the same author:
[http://loup-vaillant.fr/tutorials/earley-parsing/](http://loup-vaillant.fr/tutorials/earley-parsing/)

~~~
frankenbagel
There have also been some good AES vs ChaCha performance tests:

1) Speedify's AES vs ChaCha in VPN:
[http://speedify.com/blog/a-new-kind-of-vpn/](http://speedify.com/blog/a-new-kind-of-vpn/)

2) Cloudflare's AES vs ChaCha in browser:
[https://blog.cloudflare.com/do-the-chacha-better-mobile-perf...](https://blog.cloudflare.com/do-the-chacha-better-mobile-performance-with-cryptography/)

------
vog
How does ChaCha20 compare to the established AES standard? Is it stronger?
weaker? faster? slower? easier to implement correctly? harder to implement
correctly? better for some other reason? worse for some other reason?

~~~
TillE
AES is a block cipher, Salsa/ChaCha are streams.

This makes them very useful for, say, file encryption with random access.

~~~
loup-vaillant
Chacha20 can do random access. See the end of my article, when I talk about
counter mode. To get the part of the stream you want, you just generate the
block you need (they're all the same, only the counter changes), then encrypt
it. No need to generate all previous blocks.

Indeed, one reason for using AES in counter mode is this random access, which
among other things enables parallel encryption. The same strategy works with
Chacha20.
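
A minimal C sketch of that random access, assuming the original ChaCha20 layout (four constant words, eight key words, a 64-bit block counter, then a 64-bit nonce) and a key already loaded as little-endian words. `chacha20_block` and `chacha20_xor_at` are hypothetical names, not any library's API. The code is unauthenticated and for illustration only; real applications should use a vetted library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
/* One quarter-round; expands to several statements, so use it inside braces. */
#define QR(a, b, c, d)                  \
    a += b; d ^= a; d = ROTL32(d, 16);  \
    c += d; b ^= c; b = ROTL32(b, 12);  \
    a += b; d ^= a; d = ROTL32(d,  8);  \
    c += d; b ^= c; b = ROTL32(b,  7)

/* Produce the 64-byte keystream block with index `counter`. */
void chacha20_block(uint8_t out[64], const uint32_t key[8],
                    const uint32_t nonce[2], uint64_t counter)
{
    uint32_t in[16] = {
        0x61707865u, 0x3320646eu, 0x79622d32u, 0x6b206574u, /* "expand 32-byte k" */
        key[0], key[1], key[2], key[3], key[4], key[5], key[6], key[7],
        (uint32_t)counter, (uint32_t)(counter >> 32),        /* only these change */
        nonce[0], nonce[1]
    };
    uint32_t x[16];
    for (int i = 0; i < 16; i++)
        x[i] = in[i];
    for (int round = 0; round < 20; round += 2) {
        QR(x[0], x[4], x[ 8], x[12]);   /* column round */
        QR(x[1], x[5], x[ 9], x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]);   /* diagonal round */
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[ 8], x[13]);
        QR(x[3], x[4], x[ 9], x[14]);
    }
    for (int i = 0; i < 16; i++) {
        uint32_t w = x[i] + in[i];      /* add the input back, then serialize */
        out[4 * i + 0] = (uint8_t)(w);
        out[4 * i + 1] = (uint8_t)(w >> 8);
        out[4 * i + 2] = (uint8_t)(w >> 16);
        out[4 * i + 3] = (uint8_t)(w >> 24);
    }
}

/* XOR `len` bytes of `buf` with the keystream starting at byte `offset`.
 * Only the blocks that cover [offset, offset + len) are ever generated. */
void chacha20_xor_at(uint8_t *buf, size_t len, uint64_t offset,
                     const uint32_t key[8], const uint32_t nonce[2])
{
    assert(buf != NULL || len == 0);
    uint8_t  block[64];
    uint64_t counter = offset / 64;          /* which block to start in */
    size_t   pos     = (size_t)(offset % 64); /* position inside that block */
    for (size_t i = 0; i < len; i++) {
        if (i == 0 || pos == 0)
            chacha20_block(block, key, nonce, counter);
        buf[i] ^= block[pos];
        if (++pos == 64) {
            pos = 0;
            counter++;
        }
    }
}
```

Decrypting bytes 70..129 of a file then means calling `chacha20_xor_at(buf, 60, 70, key, nonce)` on just those ciphertext bytes. Block 0 is never generated, and since encryption and decryption are the same XOR, the same call works in both directions.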

~~~
rakoo
I'm probably stating the obvious here, but whatever your strategy for
decrypting is, you still _must_ verify the ciphertext integrity, which
unfortunately for you is calculated on the _whole_ ciphertext. You may win
some time by not reading the stuff before the block you're interested in, but
you will have to read the _whole_ thing anyway if you want to be safe.

I'm no expert of course so I don't even know if there's an AEAD that can bring
you integrity on parts of the input; at least I know that minilock
([https://github.com/kaepora/miniLock/blob/master/README.md#-m...](https://github.com/kaepora/miniLock/blob/master/README.md#-minilock))
builds some kind of counter mode where each chunk is properly encrypted _and_
has everything needed to check its integrity.

~~~
tptacek
The most widespread way of using Salsa/ChaCha is in the "Chapoly"
construction, which combines ChaCha20 with DJB's Poly1305 polynomial MAC; this
is an authenticated construction. Pretty much every mainstream application of
Salsa20 is in fact a Salsa/Poly1305 construction.

You can also just combine Salsa and HMAC.

It's true that you need to authenticate your data, but this is true for any
cipher that you use.

It's a bad idea to implement your own cipher code, no matter what you're
doing. If you're looking to include Salsa/ChaCha in an application, use NaCl,
which refuses to give you unauthenticated ciphertext.

------
tptacek
This is solid but might miss the forest for the trees.

If you want to understand Chacha/Salsa, the best way to start is that it's an
_ARX-based_ _hash function_, _keyed_, running in _counter mode_.

ARX stands for addition, rotation, and XOR, which are the three operations ARX
designs are composed of. Addition is nonlinear in the context of ARX, which
eliminates the need for S-boxes (or complex alternatives to S-boxes) and makes
it easier to build constant time crypto. You can make any function out of just
A, R, and X (technically, any function out of A and R, but less efficiently).
A good place to start understanding why you want rotation and nonlinearity is
the Wikipedia page for SP Networks:

[https://en.wikipedia.org/wiki/Substitution-permutation_netwo...](https://en.wikipedia.org/wiki/Substitution-permutation_network)

Another good bit of background is the old idea of iterated ciphers (round
functions run repeatedly, rather than one giant cipher function), and the
slightly more modern idea that it's better to have a very simple round
function you repeat a lot than a complicated round function you run fewer
times. When you get this you can start grokking design decisions in terms of
how many rounds they shave off your design to achieve the same security (which
also might get you back to why you have X in addition to A and R).

The Salsa20 core is a very simple hash function designed to be fast and
flexible for multiple constructions. Bernstein designed the stream cipher we
all know based on it, and also Rumba20, which is a more traditional
collision-resistant cryptographic hash. Designing ciphers out of hash functions has been
a research interest of Bernstein's since the 1990s, when hash functions were
approved for export but ciphers not.

A keyed hash (also: PRF) is a hash function that takes a secret key input. In
a general-purpose hash like SHA3, you provide the key along with the rest of
the input by simple concatenation. Fun true fact: in SHA-2 and hashes before
that, it was unsafe to do this, which is why we have the HMAC construction,
which SHA-3 and Blake2 obsolete. At any rate, Salsa20 takes the key as a
special parameter and encodes it into a block.

Counter mode is conventionally a method for turning a block cipher into a
stream cipher. In 2017 it's widely seen as the most important and primary way
you should use block ciphers (if you're not using an AEAD, most of which are
built on counter mode in some way). Counter mode is super simple: you encrypt
a counter of some sort and XOR the resulting block with your plaintext. To
decrypt, you do the same thing.

A really good place to start learning about Salsa20 (and thus Chacha20) is
Bernstein's design paper, which is extremely readable and easy to skim:

[https://cr.yp.to/snuffle/design.pdf](https://cr.yp.to/snuffle/design.pdf)

~~~
psykotic
> A good place to start understanding why you want rotation and nonlinearity
> is the Wikipedia page for SP Networks:

I guess the real story requires knowing a little bit about linear and
differential cryptanalysis, which are conceptually quite simple in their
genesis, from a mathematical perspective.

XOR and n-bit addition are both forms of addition over different finite
fields, GF(2) and GF(2^n). Multiplication in GF(2) is AND, so any linear
function on a vector space over GF(2) is some kind of "masked parity"
function, with functions only distinguished by their mask.

You can back-solve for inputs given enough independent outputs using Gaussian
elimination and other standard linear algebra algorithms. Linear cryptanalysis
is based on finding combinations of output bits that behave close enough to
linear as a function of input bits to make this kind of strategy yield usable
information. That is, just as in computational mathematics more generally, we
approximate non-linear functions by linear functions and apply linear algebra
techniques to the linear functions.

Differential cryptanalysis is the same general idea but with GF(2^n) as the
scalar field instead of GF(2). If f is a linear function then f(x + y) = f(x)
+ f(y) and f(a x) = a f(x), so it's likewise true that f(x - y) = f(x) - f(y)
by taking a = -1. That is, reading this last equation backwards, if f is truly
linear then for any pair of vectors x and y with the same difference x - y, we
should expect f(x) - f(y) to have the same exact value. If f is an encryption
function (assume the key is baked into it), then all we have is f(x) and f(y),
so we can't compute f(x - y) without knowing x and y, but we can certainly try
to feed lots of plaintext pairs x and y with the same difference and see how
the differences f(x) - f(y) of their ciphertexts relate to each other. If we
can find a large family of plaintext pairs that have nearly the same
difference in ciphertexts (by an appropriate measure of "nearly"), then this
reveals an approximate linearity in the encryption function, and at that point
we're back to being able to use linear algebra techniques to gain information
about f and hence the key baked into f.

ARX attempts to foil such techniques by mixing both XOR and addition, which
would individually create linear functions over their respective fields, but
in combination help a little bit to break up the linearity over both finite
fields. And the R in ARX is bitwise rotation, which is actually linear over
GF(2) vector spaces (it's just a permutation of the vector's entries) but
strongly nonlinear over GF(2^n) vector spaces.
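
The claim about which operation is linear over which structure can be poked at with a small C sketch. It treats "addition" as ordinary arithmetic mod 2^32, and it only samples 256 x 256 input pairs, so a passing check is evidence rather than proof, while a failing check is a genuine counterexample. All names here (`word_fn`, `sample`, `respects_xor`, `respects_add`, `rotl7`, `times3`) are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*word_fn)(uint32_t);

/* The R in ARX: a fixed rotation, i.e. a permutation of bit positions. */
uint32_t rotl7(uint32_t x)  { return (x << 7) | (x >> 25); }

/* A map built only from addition mod 2^32. */
uint32_t times3(uint32_t x) { return x + x + x; }

/* Test words with bits set at both ends, so carries and wrap-around occur. */
static uint32_t sample(uint32_t i) { return i * 0x01000001u; }

/* Returns 1 if f(x ^ y) == f(x) ^ f(y) on every sampled pair, else 0. */
int respects_xor(word_fn f)
{
    assert(f != NULL);
    for (uint32_t i = 0; i < 256; i++)
        for (uint32_t j = 0; j < 256; j++) {
            uint32_t x = sample(i), y = sample(j);
            if (f(x ^ y) != (f(x) ^ f(y)))
                return 0;
        }
    return 1;
}

/* Returns 1 if f(x + y) == f(x) + f(y) (mod 2^32) on every sampled pair. */
int respects_add(word_fn f)
{
    assert(f != NULL);
    for (uint32_t i = 0; i < 256; i++)
        for (uint32_t j = 0; j < 256; j++) {
            uint32_t x = sample(i), y = sample(j);
            if (f(x + y) != f(x) + f(y))
                return 0;
        }
    return 1;
}
```

With these definitions, `rotl7` passes `respects_xor` and fails `respects_add`, while `times3` does the opposite, which is the mismatch between the two structures that ARX relies on.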

~~~
fpgaminer
Are there any concrete, simple examples showing linear and differential
cryptanalysis (simple, breakable cipher + example cracking program)? As much
as I've studied the theory and perused the design decisions of modern ciphers
to avoid such attacks, I've never taken the time to sit down and actually
crack a simple cipher using them. Would be neat to do so.

~~~
psykotic
I did some of these exercises a long time ago and learned a lot.
[https://www.schneier.com/academic/paperfiles/paper-self-stud...](https://www.schneier.com/academic/paperfiles/paper-self-study.pdf)

The starter exercise labeled 6.2 is a good way to get your feet wet with the
ideas I described. 12-round DES without any S-boxes consists of P-boxes
(permutations) and XORs, which are both linear over GF(2) vector spaces, so
it's a linear block cipher and hence trivially breakable with any linear
algebra package. RC5 without rotations is not exactly linear over either
GF(2^n) or GF(2) since it mixes XORs and (mod 2^n) additions, but the
combination is only very weakly nonlinear (there's not enough avalanching from
the carries to entangle entries that are far apart), and therefore a good
demonstration of why you need rotations in ARX to introduce rapid long-range
bit entanglement. And in case it wasn't already obvious, the exercise about
RC5 with rotations by a round number will show you why the rotation amount in
ARX should be relatively prime to the bit width. Otherwise you end up with
disconnected rotation orbits where the round function only mixes within a
given orbit. In the extreme case where the rotation amount is half the bit
width, each orbit contains at most two elements, so it's hardly any better
than no rotation at all.
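
The orbit argument is plain arithmetic: repeatedly rotating by `r` in a `width`-bit word moves a bit through `width / gcd(width, r)` positions. A small C sketch, with `gcd_u`, `orbit_size`, and `orbit_count` as made-up helper names, and with the caveat that it models a single fixed rotation in isolation and says nothing about how several different rotation amounts and addition carries combine in a real design:

```c
#include <assert.h>

/* Greatest common divisor (Euclid). gcd_u(a, 0) is a. */
unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Number of distinct bit positions one bit visits under repeated
 * rotation by r in a word of `width` bits. */
unsigned orbit_size(unsigned width, unsigned r)
{
    assert(width > 0);
    return width / gcd_u(width, r % width);
}

/* Number of disconnected orbits the bit positions split into. */
unsigned orbit_count(unsigned width, unsigned r)
{
    assert(width > 0);
    return gcd_u(width, r % width);
}
```

For a 32-bit word, `orbit_size(32, 16)` is 2 (the "half the bit width" case above), `orbit_size(32, 8)` is 4, and `orbit_size(32, 7)` is 32, a single orbit covering every bit.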

I bet there are also modern textbooks in cryptanalysis with exercises and a
more hand-holding approach. Maybe any cryptographers reading this could
recommend something.

------
JoachimS
Fwiw, here is a HW-implementation of ChaCha. It is very fast due to the big
block and four parallel quarterrounds.
[https://github.com/secworks/chacha](https://github.com/secworks/chacha)

~~~
j_s
Interesting to see the changelog with a bunch of improvements in September
after silence for nearly 2 years.

------
vog
The author seems to favor XChaCha20 over ChaCha20, even though XChaCha20 is
not part of any formal standard or any widely known paper. [1] It would be
interesting to know what DJB (the author of ChaCha20) thinks about XChaCha20
and related variants.

[1]
[http://crypto.stackexchange.com/a/34605](http://crypto.stackexchange.com/a/34605)

~~~
marcosdumay
Well, NaCl (DJB's encryption library) uses XChaCha20, doesn't it?

Edit: Nope, it's XSalsa20.

~~~
panic
NaCl provides Salsa20 and XSalsa20
([https://nacl.cr.yp.to/stream.html](https://nacl.cr.yp.to/stream.html)).

libsodium adds ChaCha20
([https://download.libsodium.org/doc/advanced/chacha20.html](https://download.libsodium.org/doc/advanced/chacha20.html))
but not XChaCha20.

~~~
CiPHPerCoder
That's coming soon:
[https://github.com/jedisct1/libsodium/blob/master/src/libsod...](https://github.com/jedisct1/libsodium/blob/master/src/libsodium/crypto_aead/xchacha20poly1305/sodium/aead_xchacha20poly1305.c)

~~~
asdaksdhksajd
ugh, an extra ietf variant that pads the remaining 64 bits of the nonce to
fit 96 bits, which is incompatible with all the implementations out there... :-(

why can't we get our shit together...

------
zeveb

        a += b;  d ^= a;  d <<<= 16;
        c += d;  b ^= c;  b <<<= 12;
        a += b;  d ^= a;  d <<<=  8;
        c += d;  b ^= c;  b <<<=  7;
    

I'm curious: could a C compiler look at the quarter-round function above and
determine that a+b (or the other terms) might overflow a 32-bit integer, and
thus invoke undefined behaviour to eliminate the loop entirely?

~~~
cesarb
In C, unsigned overflow is defined as wrapping, and all these numbers are
unsigned. It's signed overflow that's undefined in C.
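
For reference, a sketch of how the quarter-round pseudocode might be written in actual C, assuming `uint32_t` operands so that every addition wraps mod 2^32 by definition. The `<<<=` operator does not exist in C, so the rotation is spelled out; `rotl32` and `quarter_round` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits. n must be in 1..31: shifting a 32-bit value
 * by 32 would be undefined, unlike the unsigned additions below. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* The quarter-round from the comment above, on four unsigned words. */
void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b;  *d ^= *a;  *d = rotl32(*d, 16);
    *c += *d;  *b ^= *c;  *b = rotl32(*b, 12);
    *a += *b;  *d ^= *a;  *d = rotl32(*d,  8);
    *c += *d;  *b ^= *c;  *b = rotl32(*b,  7);
}
```

With the inputs `a = 0x11111111`, `b = 0x01020304`, `c = 0x9b8d6f43`, `d = 0x01234567`, the last `c += d` exceeds 32 bits and simply wraps, giving `a = 0xea2a92f4`, `b = 0xcb1cf8ce`, `c = 0x4581472e`, `d = 0x5881c4bb`.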

~~~
Gibbon1
'Safe for now'

Notably, I think the NaCl crypto library implements a compare as follows:

    
    
       uint32_t diff_bits = 0;
       diff_bits |= x[0] ^ y[0];
       ...
       diff_bits |= x[31] ^ y[31];
       return (1 & ((diff_bits - 1) >> 8)) - 1;
    

This is because memcmp() leaks timing information. And implementing it with a for
loop also leaks.

Longer term worry is the optimizer will figure the above out as well.

~~~
loup-vaillant
> _And implementing it with a for loop also leaks._

Assuming you're talking about this for loop:

    
    
      for (int i = 0; i < 32; i++) // safe branch
          diff_bits |= x[i] ^ y[i];
    

then no, it doesn't leak, because the result of the resulting conditional
branch doesn't depend on a secret. The only reason NaCl unrolls that loop is
because _neeed moar speeed_.

If you were talking about the _early return_ straightforward for loop:

    
    
      for (int i = 0; i < 32; i++) // safe branch
          if (x[i] != y[i])        // timing leak...
              return -1;           // ...magnified
    

Then yeah, it leaks.

> _Longer term worry is the optimizer will figure the above out as well._

It may, but even compiler implementers realise the value of constant time
code. Replacing this code with an early return doesn't just require very
sophisticated optimisations; the pattern also hardly ever shows up outside of a
crypto library.
There is little incentive for compiler writers to do this.
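
Putting the pieces from this subthread together, a complete comparison function might look like the following sketch. It uses the loop form rather than the unrolled one, and the name `verify_32` is made up here. As discussed in this subthread, the C standard itself does not promise constant-time behaviour, so this shows the source pattern only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the two 32-byte buffers are equal, -1 otherwise. */
int verify_32(const uint8_t x[32], const uint8_t y[32])
{
    assert(x != NULL && y != NULL);
    uint32_t diff_bits = 0;

    /* The loop bound is public, so this branch depends on no secret. */
    for (size_t i = 0; i < 32; i++)
        diff_bits |= (uint32_t)(x[i] ^ y[i]);

    /* diff_bits is now in 0..255.
     * Equal:   diff_bits - 1 wraps to 0xFFFFFFFF; >> 8 leaves bit 0 set,
     *          so the result is 1 - 1 = 0.
     * Unequal: diff_bits - 1 is in 0..254; >> 8 gives 0,
     *          so the result is 0 - 1 = -1. */
    return (int)(1 & ((diff_bits - 1) >> 8)) - 1;
}
```

The final expression replaces an `if (diff_bits != 0)` test with arithmetic, so the source contains no branch on the accumulated difference.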

~~~
Gibbon1
Problem is the optimizer is totally free to implement the 'safe' code snippet
using an 'unsafe' early return. According to the standard that would be
completely legal.

~~~
wolf550e
A compiler that did this optimization would immediately introduce a pragma
guaranteed to result in constant time memcmp from some blessed source pattern.

Maintainers of crypto libraries inspect the assembly when upgrading their
compilers, test with many compilers and document the versions of compilers
they support.

------
asdaksdhksajd
the nonce and counter state words seem to be swapped in the 3rd figure of the
"A much bigger nonce: XChacha20" section:

    
    
      block'[ 0]: "expa"      block'[ 8]: kcolb[12]
      block'[ 1]: "nd 3"      block'[ 9]: kcolb[13]
      block'[ 2]: "2-by"      block'[10]: kcolb[14]
      block'[ 3]: "te k"      block'[11]: kcolb[15]
      block'[ 4]: kcolb[0]    block'[12]: nonce[4]    <---
      block'[ 5]: kcolb[1]    block'[13]: nonce[5]    <---
      block'[ 6]: kcolb[2]    block'[14]: counter[0]  <---
      block'[ 7]: kcolb[3]    block'[15]: counter[1]  <---

~~~
loup-vaillant
Corrected, thanks.

~~~
asdaksdhksajd
the page didn't change for me yet, caches?

~~~
loup-vaillant
Crap, I thought you were referring to _another_ figure, which had the same
error. Correcting now.

------
overcast
I thought this was for Chacha the site, and I'm equally bewildered that
service is still running after twelve years!

