Hacker News new | past | comments | ask | show | jobs | submit login

Hopefully someone finds it helpful!

I should also note that the picture I painted is most applicable to block ciphers, but it does apply mutatis mutandis to hash functions and pseudo-random generators.

With hashing, you have an inherent loss of information and hence the linear functions in question are non-invertible. Linear hash functions can be expressed as a factorization into two parts, an invertible function followed by a projection onto the first n bits (for a hash function producing n bits of output). With hash functions you're typically trying to find collisions or first and second preimages. Both of these are simple for linear functions. Let me just paint the picture for collisions. If f is linear then its kernel ker(f) = {x | f(x) = 0} can be efficiently calculated by Gaussian elimination. Then you can crank out collisions like no-one's business: if k is in ker(f) then f(x + k) = f(x) + f(k) = f(x). In practice, you're not going to find perfectly linear hash functions in the wild, but if you can detect an approximate linearity on some subspace, you can calculate the kernel of the linear approximation and use that to generate perturbations (the k from earlier) for a randomized collision search with a much higher likelihood of success per perturbation than random chance.

For pseudo-random generators, you're usually trying to solve for the PRG's internal state from a sequence of outputs. The generator function for a PRG takes its current internal state and produces the new internal state and an output. Incidentally, this is a nice completion of the triangle of cryptographic primitives. With block ciphers, you had invertible (injective and surjective) functions. With hash functions, you had non-injective functions (fewer output bits than input bits). With pseudo-random generators, you now have non-surjective functions (more output bits than input bits). While we cannot see the private state output from the generator, we do have multiple examples of the public output. So if (x_n, s_n) = g(s_(n-1)) is the generator equation, then in an ideal case we have the sequence of equations (x_1, s_1) = g(s_0), (x_2, s_2) = g(s_1), ..., (x_n, s_n) = g(s_(n-1)), where x_1, ..., x_n are known to us. If we assume g is a linear function that is known to us as well (no security through obscurity), then this is just a system of linear equations. For a generator with a maximal period, if we have as many bits of output as there are bits of internal state, we may solve uniquely for the initial state s_0, and from there we can calculate s_1, s_2, etc, by just replaying g starting with s_0. As in our previous examples, things in the wild aren't perfectly linear, but it's enough for PRGs to be approximately linear to leak bits of internal state, though we usually need far more example bits of output than there are bits of internal state to make a dent.

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact