
I find it disingenuous to see properties such as equidistribution or low discrepancy being described as 'quality' of a generator. Yes, they make certain Monte Carlo algorithms converge quicker. No, they are not closer to a truly random sequence than a CSPRNG is. The goal of a CSPRNG is to be indistinguishable from true randomness, and a predictor is quite obviously a distinguisher. But so is a statistical test, or any 'nice' property that makes some algorithm behave better.

By the way, I think I've said this before, but I'll say it again. I really dislike the usage of the term 'CSPRNG' in these discussions. A CSPRNG is typically understood to be /dev/urandom---an algorithm that not only generates a long sequence of numbers, but also harvests entropy and tries to achieve forward and backward security. But it is also sometimes overloaded to mean a stream cipher, which is a conceptually simpler construct, and also able to be much faster. Stream ciphers can definitely compete in performance with things like Xorshift.

If, by quality, you mean "indistinguishable from true randomness" then there are many valid metrics. One important metric that many non-CS PRNGs have, but many CSPRNGs do not have, is a provable deterministic cycle length for all seeds.

These characteristics don't just make Monte Carlo simulations converge quicker; they also do things like guarantee that your 52-card deck shuffle can actually produce the one shuffle that gives player two a royal flush on the flop in a round of Texas hold 'em. Or, more generally, they guarantee that your Array.shuffle() method is unbiased for a reasonably sized array. In general you need true randomness and/or a long-cycle linear-transformation PRNG to make those sorts of guarantees. It's also nice that they're fast. That said, I do also agree that the scenarios in which these things matter are rather rare, and most people should probably just use a CSPRNG by default.
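
For concreteness, here's the kind of thing I mean---a bare-bones Fisher-Yates sketch; the generator is a placeholder (plug in xorshift1024*, a stream cipher, whatever), and the "unbiased" claim only holds if its output is uniform and its state/cycle is large enough to reach every ordering:

  #include <cstdint>
  #include <numeric>
  #include <random>
  #include <vector>

  // Unbiased Fisher-Yates shuffle: every ordering is equally likely,
  // *provided* gen is uniform and its state space / cycle length is
  // large enough to reach all n! of them.
  template <typename Rng>
  void shuffle_cards(std::vector<int>& deck, Rng& gen) {
    for (std::size_t i = deck.size(); i > 1; --i) {
      // uniform_int_distribution avoids the modulo bias of gen() % i
      std::uniform_int_distribution<std::size_t> d(0, i - 1);
      std::swap(deck[i - 1], deck[d(gen)]);
    }
  }

  int main() {
    std::vector<int> deck(52);
    std::iota(deck.begin(), deck.end(), 0);
    std::mt19937_64 gen(0xdeadbeef);   // stand-in; not an endorsement of MT
    shuffle_cards(deck, gen);
  }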

Do you really think a CSPRNG is fast enough for your standard library's array shuffle though? Wouldn't you rather have something like xorshift1024* that's probably > an order of magnitude faster? Maybe a CSPRNG is fast enough... I don't have all the numbers. So this is an honest question.

CSPRNGs are a type of pseudo random number generator, which are defined to be deterministic. I have no problem with calling /dev/urandom a CSPRNG for practical purposes, but very strictly speaking, it's not.

Do you mean there cannot be a CSPRNG? Most of cryptography is based on security in practice, not in absolute theory (i.e. everything except for one time pads and maybe some quantum crypto). That's like saying there are no cryptographically secure encryption algorithms other than one time pads, since you can break all of them in theory. That's a pretty useless definition of cryptographically secure.

edit: By in theory I mean that with enough computation resources you could break them even if you didn't find some new, clever weakness. Not that you could break them in theory because a weakness could always be found.

No, I mean that the fact that /dev/urandom "reseeds" in mid stream means it is not strictly speaking pseudo-random, since it is not completely deterministic. Maybe I'm wrong, but the comment I was replying to was arguing that /dev/urandom was the only CSPRNG, and things like stream cipher algorithms are not.

> But so is a statistical test, or any 'nice' property that makes some algorithm behave better.

Not really. In general, the statistical tests are chosen such that when the law of large numbers comes into force, a perfectly random sequence would pass them with flying colors. If repeated runs of a perfect random number generator were failing these with a frequency that were improbable (i.e. far outside what the central limit theorem would predict), I think it would be safe to say the universe was playing a trick on you. Many of these are very closely related to predictability in practice as well. Uniformity is a statistical test, and I think it's probably the lowest bar you can set for "unpredictable".
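
To make that concrete, here's a toy version of the sort of test I mean---a chi-square check of uniformity on binned outputs (std::mt19937_64 is just a stand-in generator; a perfect source would fail an alpha-level cutoff only about alpha of the time):

  #include <cstdio>
  #include <random>
  #include <vector>

  // Toy chi-square uniformity test: bin N outputs into k buckets and
  // compare against the expected count N/k.
  int main() {
    std::mt19937_64 gen(12345);            // stand-in generator
    const int k = 256;
    const long N = 1L << 24;
    std::vector<long> bins(k, 0);
    for (long i = 0; i < N; ++i)
      ++bins[gen() % k];                   // k = 256 divides 2^64, so no modulo bias
    const double expected = double(N) / k;
    double chi2 = 0;
    for (long c : bins)
      chi2 += (c - expected) * (c - expected) / expected;
    // Compare against chi-square with k-1 = 255 degrees of freedom
    // (mean ~255, standard deviation ~sqrt(2*255) ~ 22.6).
    std::printf("chi^2 = %.1f (expect around %d)\n", chi2, k - 1);
  }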

The fundamental issue with measuring "randomness", especially for a PRNG, is that there is no set definition of information entropy suitable for calculation. Shannon entropy requires you to have a (possibly numerically sampled) PDF. The problem with that is that if you want to find the entropy of a sequence, you can't use the PDF of all the bytes you've gotten out of the generator, since that PDF will have 1 for the exact sequence you obtained, and zero everywhere else. If you do it by multiple runs, there is no way you'd get enough data to construct an accurate PDF with a suitably large sample size. You have to use something based on a highly subdivided sequence of values, but if you only look at each value in isolation you will only determine the uniformity with some small bin size, which ignores patterns due to proximity of values in the output.

So generally entropy is calculated using a more complex probability kernel, like the ones used in these statistical tests. But none of these can possibly be perfect. The "perfect" measure of entropy is Kolmogorov complexity[1], which captures the minimum length of a program you would need to be able to print the given output. On top of being undecidable in general, and also unobtainable in practice, you can show that the Kolmogorov complexity of a CSPRNG's output has an upper bound that is actually quite low. Just take the algorithm the CSPRNG uses, plus any external entropy it got (from hardware interrupts, etc.). So in the end, various kinds of statistical tests are the only ones that make sense.

The actual goal of a CSPRNG is far more specific than just simulating a truly random sequence, which is why it differs from the generators used in simulations. It is unpredictability, defined by trying to make the short program that can generate all the output impossibly hard to obtain from just the output in practice. In a vague sense, it is like (some of) the desired properties of a hash or cipher: we can't use a perfect random source (resp. one-time pad), so let's get something that will take an impossible amount of computational power/measured output to break (resp. ciphertext using the same key), mix it in with a seed that is as close to truly random as we can get (resp. an IV), and then mix some more in from time to time to make it even harder (resp. changing keys). That is the reason stream ciphers, while generally not as good as /dev/urandom-style CSPRNGs, are actually closely related enough that I think it's fair to call them CSPRNGs as well. They're just ones with a far worse algorithm/entropy source.

[1]: https://en.wikipedia.org/wiki/Kolmogorov_complexity

> Not really.

A statistical test _is_ a distinguisher! This seems self-evident to me. When the test repeatedly fails (with an extreme p-value or whatever criteria you want to apply here for 'failure'), you have successfully distinguished the sequence from random. Also you seem to misunderstand what I meant by equidistribution---uniformity is a different property, and of course crucial. I am puzzled, then, as to why you claim some generators are more 'uniform' than cryptographic ones.

Any pseudorandom bit generator whose next bit cannot be predicted in under 2^n "operations" also passes _all_ statistical tests that require less than this computational effort. This is a well-known result by Yao. In other words, unpredictability and indistinguishability are equivalent. In the case of, e.g., AES-CTR, n = 64 (birthday-bound distinguishers force it to be 64 instead of 128).

Your philosophizing about what true randomness really is does not strike me as very useful for practical purposes. Given unbounded computational power, obviously no such algorithm is actually indistinguishable. But who cares?

Unpredictability is irrelevant to a use case like shuffling an array. Moreover, your conjectured polynomial-time perfect CSPRNG may actually produce a less useful stream of entropy for this use case than a good non-CS PRNG because its cycle length may well be shorter, and all that matters is uniformity (which can actually be guaranteed, not just conjectured and measured empirically) and enough seed entropy to kick things off. Top it all off with the non-CS PRNG being perhaps orders of magnitude faster.

I'll repeat once again that I do think people should generally default to CSPRNGs, but I really don't understand why everyone involved in crypto insists that there are no valid use cases for non-CS PRNGs / that CSPRNGs are uniformly better for all use cases when it seems very obvious to me that they are not. I don't see any reason why a good PRNG and a good CSPRNG shouldn't exist in every standard library. Am I missing something?

A short cycle would be a catastrophic outcome for any serious cipher. Although not every cryptographic generator has a provably guaranteed period, relying instead on the properties of the 'average' random function, it is easy to guarantee one when you need it. AES in counter mode, where key=seed and block=counter, has a guaranteed period of 2^128.

As to why we insist that cryptographic primitives are uniformly better? Well, that is more subjective. I personally believe that they have gotten fast enough that the set of cases where they are unacceptably slow is vanishingly small, and the trend is clearly on the side of crypto getting faster. And I can sleep at night knowing that my results are not wrong due to some unknown, unexpected, property of my generator.

That's all fair-ish, and I think I need to learn more about CS alternatives. Necessary cycle length is relative. For crypto it seems like the definition is "big enough that it generates keys that are infeasible to predict," or "big enough that it can be used to stretch entropy a long ways for a cipher" for all practical purposes. But even the AES generator you cited sounds like it would be incapable of generating an unbiased shuffle of a deck of cards, which has 52! or ~2^225 unique shuffles... For a use case like this it's hard for me to wrap my head around using anything other than a good non-CS PRNG or true randomness. If you believe otherwise I'm truly interested in your thoughts, links, etc.
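
(Quick sanity check on that 52! figure:)

  #include <cmath>
  #include <cstdio>

  int main() {
    double bits = 0;
    for (int k = 2; k <= 52; ++k) bits += std::log2(double(k));
    std::printf("log2(52!) = %.2f\n", bits);   // ~225.58, i.e. 52! is about 2^226
  }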

Another case where a long cycle length is useful would be the sort of scenario in my post, at much larger scale. If you have thousands of machines re-seeding at every service restart and then each generating millions or billions of random numbers from its sequence, and you want a vanishingly small probability that those sequences will ever overlap, could you use a CSPRNG?

Re: the generator properties, do you _really_ believe that? I mean, good non-CS PRNGs usually have fairly rigorous presentations and pass statistical tests. They aren't as sexy as crypto stuff, and maybe don't get the eyeballs, but they're relatively straightforward proven math whereas all of the CSPRNG results are based on conjecture. It seems at least as likely that some unexpected property of a CSPRNG will be discovered, a la RC4.

The shuffle question ties in with the overlap question. Let's stick with the AES in counter mode example. AES is a block cipher, or in other words a keyed permutation. What this means is that each key (resp. seed) determines a distinct permutation that we use to generate the blocks AES_K(0), AES_K(1), etc. In effect, each seed will result in an entirely distinct and independent 2^128-block sequence, instead of simply the same sequence starting at a different place, which is usually the case for non-cryptographic generators. So 2^128 possible keys (or 2^256, if you go AES-256) times 2^128 output blocks per key is probably enough to cover all possible card shuffles.
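
For illustration, here's a bare-bones sketch of that construction with AES-NI intrinsics (no buffering, no nonce, nothing production-grade---just key = seed, block = counter):

  #include <stdint.h>
  #include <immintrin.h>   // compile with -maes

  // One step of the standard AES-128 key expansion via AESKEYGENASSIST.
  static __m128i expand_step(__m128i key, __m128i gen) {
    gen = _mm_shuffle_epi32(gen, _MM_SHUFFLE(3, 3, 3, 3));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, gen);
  }

  struct AesCtr {
    __m128i rk[11];
    uint64_t ctr;

    explicit AesCtr(uint64_t seed) : ctr(0) {
      rk[0]  = _mm_set_epi64x(0, seed);   // seed -> AES key
      rk[1]  = expand_step(rk[0], _mm_aeskeygenassist_si128(rk[0], 0x01));
      rk[2]  = expand_step(rk[1], _mm_aeskeygenassist_si128(rk[1], 0x02));
      rk[3]  = expand_step(rk[2], _mm_aeskeygenassist_si128(rk[2], 0x04));
      rk[4]  = expand_step(rk[3], _mm_aeskeygenassist_si128(rk[3], 0x08));
      rk[5]  = expand_step(rk[4], _mm_aeskeygenassist_si128(rk[4], 0x10));
      rk[6]  = expand_step(rk[5], _mm_aeskeygenassist_si128(rk[5], 0x20));
      rk[7]  = expand_step(rk[6], _mm_aeskeygenassist_si128(rk[6], 0x40));
      rk[8]  = expand_step(rk[7], _mm_aeskeygenassist_si128(rk[7], 0x80));
      rk[9]  = expand_step(rk[8], _mm_aeskeygenassist_si128(rk[8], 0x1b));
      rk[10] = expand_step(rk[9], _mm_aeskeygenassist_si128(rk[9], 0x36));
    }

    // Next 128-bit block: AES_K(counter). Each key selects an independent
    // permutation of the 2^128 counter space, so the period is 2^128 and
    // distinct seeds give unrelated streams.
    __m128i next_block() {
      __m128i b = _mm_xor_si128(_mm_set_epi64x(0, ctr++), rk[0]);
      for (int r = 1; r < 10; ++r) b = _mm_aesenc_si128(b, rk[r]);
      return _mm_aesenclast_si128(b, rk[10]);
    }
  };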

I do believe that. RC4 is a good example, actually; the unexpected properties that make RC4 be considered completely broken for cryptography are considered irrelevant for simulations (I won't swear by this, but I'll bet that RC4 passes all of TestU01's tests). The standards are just so much higher for cryptographic primitives. I'm sure unsexy generators can be suitable to this or that application. But due to the ad hoc nature of the statistical tests used, you can't be sure that your generator is not oddly correlated with a concrete application. I've written a comment a few days ago---on a thread very similar to this one---about a real-world case of this happening [1].

[1] https://news.ycombinator.com/item?id=10549024

Ok, I get the seed/state space vs. cycle length bit. So it can theoretically generate all of the permutations of a shuffle if you re-key/re-seed periodically. But you can't guarantee that it actually will, right? It is possible that, for all 2^128 keys, and in all 2^128 cycles there are some shuffles that AES will never generate (in other words, some sequences of 52 numbers that will never be generated) so you effectively have a 0% chance of hitting them.

From a cryptanalysis standpoint I think that's consistent -- the probability of generating any particular single shuffle is astronomically low, so the difference between the actual probability and 0% is probably outside the realm of what's computationally feasible. But with something like MT19937 I can show that it's at least possible for every outcome to have a non-zero probability. Perhaps this is just interesting academically, but maybe not if you're actually playing a card game, for instance?

No, that cannot be guaranteed. Equidistribution in 52 dimensions necessarily requires a minimum state size of 52 log_2(52) (~296) bits, which we don't even have. Even if we did, it could not be guaranteed anyway.

You have already covered the distinction between guaranteed equidistribution and not, so I don't think there is much to add there. I do think the concern is mostly academic, since the difference between a subsequence that doesn't exist and one that will never be reached before the universe collapses is irrelevant to the user. I will note that Richard Brent, someone who is much better qualified than I to discuss this, has argued against the usefulness of equidistribution: http://maths-people.anu.edu.au/~brent/pub/pub240.html

I think you have to hit every permutation, at least for AES, because AES is reversible.

In particular, if there were some 0 <= y < 2^128 such that AES_k(x) != y for all 0 <= x < 2^128, then by the pigeonhole principle there would have to be some other value z that two different inputs x1 and x2 both encrypt to, which can't happen if AES is reversible.

Sure but I don't immediately see how that equates to a proof of its ability to produce every combination of, say, 52 consecutive numbers... Just that it will produce every possible individual number. In PRNG-parlance it shows 1-dimensional equidistribution but not 52-dimensional (or generally (k, w)-dimensional) equidistribution. Or is there another pigeonhole step I'm missing?

Yes, a single block encrypted under 2^40 distinct keys (you can generalize this to a few popular blocks, instead of just the same block encrypted over and over). Each key you guess during bruteforce has a 2^40/2^128 chance of being correct. After 2^88 AES operations you're likely to recover at least one key. After 2^100 you've recovered, on average, 4096 out of the 2^40 keys.

In the context of reverse engineering I feel like learning assembly this way is 'doing it wrong'. Of course, there's no such thing as 'doing it wrong' when it comes to learning, but here's what worked for me.

What worked for me was going in the opposite direction: start with simple C programs [0], and see what compilers do with them. If you understand C, you already kind of understand how the machine works, though without the machine specifics. If you see an assembly instruction that you don't recognize, check the manual [1]. You can do this online these days, say, with [2]. Here's an example of a simple program that covers calls and branches: http://goo.gl/DKrYrE
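
Something along these lines is a fine starting point (this is just an illustrative stand-in, not the program behind the short link):

  #include <stdio.h>

  /* One call and one branch: compile at -O0 and -O1 and compare the
     prologue/epilogue, the cmp/jcc pair, and the call/ret sequence. */
  int collatz_step(int x) {
    if (x % 2 == 0)
      return x / 2;
    return 3 * x + 1;
  }

  int main(void) {
    int x = 27, steps = 0;
    while (x != 1) {
      x = collatz_step(x);
      ++steps;
    }
    printf("%d steps\n", steps);
    return 0;
  }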

Then, use an interactive debugger (like OllyDbg or whatever works for your platform) to trace through your small programs, instruction by instruction, and see how memory and registers get manipulated at each step. Change instructions and see what happens. This will also make you familiar with common compiler idioms, which is very useful in RE work. Once you get reasonably familiar with these small programs, try your hand at a program you _don't know_, or increase the complexity of your small programs. Rinse and repeat.

[0] The choice of C here is relevant, since many other compiled languages tend to add a lot of cruft to their binaries.

[1] http://www.felixcloutier.com/x86/

[2] http://gcc.godbolt.org/


I would generally agree that writing/manipulating assembly language is a better learning device than black-box reversing (I mentioned reversing because of the context of the article).

But learning to reverse assembly language is also one of those daunting tasks that turns out not to live up to its scary reputation. I wouldn't want to suggest that you can't just dive in and learn to reverse if reversing is your actual goal.


My bad; I somehow skipped over your paragraph saying to 'dive in', which renders all of my disagreement invalid.


No, your disagreement is awesome stuff. I'm glad I got you to write it down.


It means exactly what it sounds like: the author tried to attack the generator, failed, and thus concluded that it must be a Very Hard Problem.

Note that the field of pseudorandom number generation for non-cryptographic purposes is much less rigorous than cryptography. Typically, for a new generator to be accepted as "OK" all it needs to do is pass a number of fixed statistical tests, usually one of the TestU01 batteries [1]. This is usually the only falsifiable claim you get, and if you're familiar with diffusion and how to achieve it this is easy to work out. Other falsifiable claims include equidistribution, but that is not a very useful guarantee---simply incrementing by 1 will achieve that. The notion of indistinguishability against computationally-bounded adversaries does not exist. All of this contributes to this being a field where crackpottery abounds, and it is hard to distinguish good from bad.

For example, here's a generator that uses 1/4 of an AES round per output word, and that passes the Crush battery of statistical tests (which uses ~2^35 samples). Is it a good generator? I would not bet on it.

  #include <stdint.h>
  #include <immintrin.h>

  struct S {
    static const unsigned kAesRounds = 1;
    union {
      uint32_t state_[4 * 4];
      __m128i  words_[4];
    } u_;
    unsigned counter_;
    const __m128i key_;

    S(uint64_t seed)
    : u_{{0}}, key_{_mm_set_epi64x(seed, -seed)}, counter_{0} {
      for(unsigned i = 0; i < 16; ++i) (void)next();
    }

    uint32_t next() {
      const uint32_t output = u_.state_[counter_];
      __m128i t = u_.words_[counter_ & 3];
      counter_ = (counter_ + 1) & 15;
      for(unsigned i = 0; i < kAesRounds; ++i)
        t = _mm_aesenc_si128(t, key_);
      u_.words_[counter_ & 3] = t;
      return output;
    }
  };

[1] http://simul.iro.umontreal.ca/testu01/tu01.html


Nit: you call _mm_aesenc_si128() once per call to next(), right? I don't see how that constitutes "1/4th of an AES round per output word". (You do output 1/4th of an AES block.)

You are, of course, right about the actual point you're making. And calling _mm_aesenc_si128() once per 4 calls to next() may well suffice to pass a statistical test. Then again, even an LFSR passes most statistical tests...


Yes, you're right, I am calling the AES round once per `next()` call. But note that this could be rewritten as

  if(counter_ >= 4) {
    counter_ = 0;
    u_.words_[0] = _mm_aesenc_si128(u_.words_[0], key_);
  }
  return u_.state_[counter_++];

which only uses one AES call every 4 words (and only one block of storage). Instead, I chose to avoid the `if` and compute blocks ahead of time, which makes for more predictable performance.


Is there a particular reason you would bet against it, or are you just saying that you wouldn't bet on any implementation without proven theoretical properties?

(As an aside, it's probably just my lack of familiarity, but considering how simple the algorithm is I find the syntax surprisingly hard to follow. Is this modern C++?)


Although this thing is able to fool the 200 or so tests from TestU01, it would take a short time for an expert to devise a test that would distinguish it from random, and even recover the key (read: seed). On the other hand, AES-CTR (conjecturally) fools _all_ statistical tests that take less than 2^128 effort, and on top of that it is still very very fast (0.63 cycles per byte on Haswell). So what really was gained? Cryptographic primitives have gotten so fast that the additional speed from these ad hoc generators seems like an unworthy risk. I concede that it can be very fun to try to make them, though.

Additionally, a generator that passes those 200 tests is not guaranteed to be high-quality for a particular usage, which might as well be considered nothing but another statistical test. There is the famous case of the r250 generator---an additive Fibonacci generator that passed all statistical tests of the time---which turned out to have characteristics that rendered costly real-world simulations wrong [1].

That is C++, yes, but I don't think there is much of anything modern about it. Trying to save on space probably contributed to its unreadability a bit.

[1] http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.69....


Here's a paper that includes an analysis of a similar approach that uses 5 rounds of AES under the name ARS-5: http://www.thesalmons.org/john/random123/papers/random123sc1...


I know this paper, and like this approach. Reduced-round primitives let you take advantage of existent analysis, and keep some guarantees while improving speed. In the case of AES, we know that the probability of any differential for 4 rounds is at most 2^-113; similar for linear probabilities [1]. I'm curious why ARS-4 fails Crush; I wonder if the custom key schedule is the culprit.

[1] https://eprint.iacr.org/2005/321


About the C: https://software.intel.com/sites/landingpage/IntrinsicsGuide...


As far as academic articles go, "* considered harmful" is probably as vague and bombastic (read: clickbaity) as a title gets (perhaps after 'Ron was wrong, Whit is right'). Personally I'd prefer a more descriptive title, like 'A survey of weaknesses and attacks on the x86 platform'. But then again, I'm a boring kind of person.


It is not a common occurrence, but it does happen occasionally.

Once upon a time the best known way to compute discrete logarithms was Shanks's Baby-Step Giant-Step. This method begins by constructing a table of size sqrt(p), and then finds a logarithm in time sqrt(p) as well. But in 1978, Pollard came up with the rho algorithm, which did not require any storage beyond 2 group elements, and had the same asymptotic runtime! This method relies on collision-finding, and is still the best algorithm today---in a modified fashion to make it parallelizable---to attack elliptic curves.
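
For concreteness, here's a toy baby-step giant-step over Z_p^* for a tiny prime; the ~sqrt(p)-entry `baby` table below is exactly the storage that Pollard's rho manages to avoid:

  #include <stdint.h>
  #include <stdio.h>
  #include <math.h>
  #include <unordered_map>

  // Square-and-multiply: b^e mod p.
  static uint64_t pow_mod(uint64_t b, uint64_t e, uint64_t p) {
    uint64_t r = 1 % p;
    for (; e; e >>= 1, b = b * b % p)
      if (e & 1) r = r * b % p;
    return r;
  }

  // Baby-step giant-step: find x with g^x = h (mod p), p a small prime.
  // ~sqrt(p) time and ~sqrt(p) storage (the `baby` table).
  static int64_t bsgs(uint64_t g, uint64_t h, uint64_t p) {
    const uint64_t m = (uint64_t)ceil(sqrt((double)(p - 1)));
    std::unordered_map<uint64_t, uint64_t> baby;          // g^j -> j
    uint64_t cur = 1;
    for (uint64_t j = 0; j < m; ++j, cur = cur * g % p)
      baby.emplace(cur, j);
    const uint64_t g_minus_m = pow_mod(pow_mod(g, m, p), p - 2, p);  // g^(-m) via Fermat
    uint64_t gamma = h;
    for (uint64_t i = 0; i < m; ++i, gamma = gamma * g_minus_m % p) {
      auto it = baby.find(gamma);
      if (it != baby.end()) return (int64_t)(i * m + it->second);
    }
    return -1;                                            // no solution
  }

  int main() {
    const uint64_t p = 65537, g = 3;        // 3 generates the full group mod 65537
    const uint64_t h = pow_mod(g, 12345, p);
    printf("log_3(%llu) mod 65537 = %lld\n",
           (unsigned long long)h, (long long)bsgs(g, h, p));  // prints 12345
  }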

Another example happened in integer factorization, but is more nuanced. In the early 1980s, the best known integer factorization algorithm to break semi-primes (i.e., RSA keys) was the quadratic sieve. Now, the quadratic sieve runs in time exp( sqrt( log n log log n ) ), but also requires storage proportional to the size of its factor base---exp( 1/2 sqrt( log n log log n ) ). Then in 1985 Lenstra came up with the elliptic curve method, which once again requires little storage and has the same asymptotic runtime as the quadratic sieve. In practice, however, the quadratic sieve will be faster for RSA numbers. And in 1990, the number field sieve improved the running time to something curve-based methods could not match.


Thanks for your response; 1971 Shanks' BSGS improvement to 1974 Pollard Rho for DLP is indeed a nice example.


My mistake; 1978 Pollard's Rho algorithm for DLP. Was looking at Pollard's Rho for integer factorization at the time. On that note; what a monster this Pollard character. Wonder what he is up to these days.


Retired: https://sites.google.com/site/jmptidcott2


Following the "List of my papers" section from the "Number Theory" page https://sites.google.com/site/jmptidcott2/nthy

>Some authors even applied the name "kangaroo" to any random walk in a cyclic group. This is zoologically absurd (a kangaroo cannot jump in one bound to another continent) - and mathematically confusing.

Spurred by this thread and especially after studying BSGS and the Pollard Rho for DLP set of algorithms more in depth over the last few days; I found his clarification and justification regarding the "taxonomy" of these methods entertaining and enlightening. Thanks again.


Loosely speaking, you need n + epsilon qubits and 4n^3 operations to break RSA-n, whereas you need ~6n qubits and 360n^3 operations to break ECC-n. For n = 256, you need around 1536 qubits, whereas you need at least 3072 for RSA-3072.

This suggests a criterion to reject P-256: it needs fewer than 2048 qubits to break. P-384 is above this threshold, at around 2.5k qubits. Hey, it's as good speculation as any.


Does this mean that barring a quantum-hard alternative we could get effectively quantum-hard crypto by using crazy key sizes like RSA-131072 or ECC-4096?


Yes, though it would be quite impractical. See for example http://cr.yp.to/talks/2010.05.28/slides.pdf


"Key almost fits on a hard drive" :-)


The important bit here is not that the IV is different, but that you can also inject arbitrary differences into the IV. This gives the attacker more freedom than in the usual case, where differences only go into the message.


Right. The relationship between SHA1 outputs and the IV seemed important to the concept of this being a building block for multicollisions.

Also: I think it's pretty surprising to generalists (it certainly was to me!) that the SHA1 hash is literally the transformation of the IV, and that you can pick up the results of a SHA1 hash and keep hashing with it. If you grok this, length extension attacks are obvious, as is the benefit of the truncated SHA-2 variants.


It's definitely a neat demonstration that he can produce a collision for these favourable "neutral-bit" IV's relatively cheaply.

I suppose the next steps are to find two inputs that produce favourable IV's, then repeat this process for those, and that seems like it's getting really close. Stevens actually already has the best near-collision in non-reduced SHA-1 I know about from this paper:


I made a quick visualization of it here, it looks pretty cool:


I also added the Shappening vectors:


It's funny that they actually collide around R74, then he has to diverge again so that when you add the IV (part of the process of creating the IV for the next block) they collide.


On that note, I was surprised to learn that if all you do is apply some invertible transform to the IV (xor with a constant, swap words, whatever) before processing the last padded block in Merkle-Damgard, you get an indifferentiable hash.

This was suggested by John Kelsey way back in 2001 for SHA-2, but never took off: http://www.cs.utsa.edu/~wagner/CS4363/SHS/dfips-180-2-commen...
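
Structurally it's a one-line change to the usual iteration. Here's a toy sketch (compress() is a meaningless placeholder, not a real compression function; the only point is where the tweak goes):

  #include <stdint.h>
  #include <vector>

  typedef uint64_t State;

  // Placeholder compression function, NOT a real one.
  static State compress(State h, uint64_t block) {
    h ^= block;
    h *= 0x9e3779b97f4a7c15ull;
    return h ^ (h >> 29);
  }

  // Plain Merkle-Damgard, except the chaining value goes through a fixed
  // invertible transform (here, XOR with a constant) right before the
  // final padded block. `blocks` is the already-padded message, so it is
  // never empty.
  static State md_with_final_tweak(const std::vector<uint64_t>& blocks, State iv) {
    State h = iv;
    for (size_t i = 0; i + 1 < blocks.size(); ++i)
      h = compress(h, blocks[i]);
    h ^= 0xa5a5a5a5a5a5a5a5ull;   // the invertible tweak
    return compress(h, blocks.back());
  }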


> - It has powerful generics and macros, so people who want to write really clever code, can.

To take a cue from the article: how do you write a generic max() function for an arbitrary number of arguments?


Someone had commented this in a response on the site.

   // assumes a two-argument max() in scope, e.g. `use std::cmp::max;`
   macro_rules! max {
        ($e: expr) => { $e };
        ($e: expr, $($rest: tt)*) => { max($e, max!($($rest)*)) };
   }


Very small one, but (taken from [1]): http://goo.gl/M8XThn vs http://goo.gl/RZZf1N

[1] https://cryptocoding.net/index.php/Coding_rules#Compare_secr...


Interesting. Looks like it affects all versions of llvm (at least those that compile).

I guess with LTO the result could be the same, but asking to always inline seems like asking the compiler to fuse operations, and then it sees shortcuts. Dunno.


`always_inline` is only there to make -O0 smaller for presentation purposes. It has no impact on -O2.

The trick here is that -march=i386 predates CMOV, and that LLVM specializes code emission for bool (and _Bool). If the secret bit were a uint32_t, there wouldn't be a branch anymore.
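
For reference, the uint32_t version of the select looks like this at the source level (with no guarantee that any particular compiler keeps it branch-free, which is rather the point of this subthread):

  #include <stdint.h>

  // Constant-time select: keep the secret bit in a uint32_t and expand it
  // into an all-zeros / all-ones mask instead of handing the compiler a
  // bool it can branch on.
  static uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b) {
    const uint32_t mask = 0u - (bit & 1);   // 0x00000000 or 0xffffffff
    return (a & mask) | (b & ~mask);        // bit ? a : b, without a branch in the source
  }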


Oh! I did not look closely enough to see that this was c++ with bool. Now I am even less worried. :)


Yeah, I'm not too worried either.

There is one class of cryptographic code, however, that is entirely unsuitable to distribute in Bitcode---DPA/EM-protected code. EM attacks on middle-end ARM chips have been demonstrated recently [1, 2].

Protecting against these attacks usually involves splitting the computation into 2 or more "shares" (see, for example, [3]); these require strict control of which register each word goes into, and which registers overwrite which. This cannot be enforced in Bitcode---or any other bytecode, for that matter---and direct assembly must be used.

[1] https://eprint.iacr.org/2015/561

[2] http://cr.yp.to/talks/2014.09.25-2/slides-dan+tanja-20140925...

[3] http://keccak.noekeon.org/NoteSideChannelAttacks.pdf
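
For reference, the share-splitting itself is easy enough to sketch (rand_word() below is a stand-in for a proper randomness source); the hard part, and the reason bytecode is a non-starter, is that the security argument also depends on which register each intermediate lives in and what overwrites what, and that can only be pinned down in assembly:

  #include <stdint.h>
  #include <random>

  // Stand-in randomness source; a real masked implementation needs fresh,
  // high-quality randomness for every nonlinear operation.
  static uint32_t rand_word() {
    static std::mt19937 g{std::random_device{}()};
    return (uint32_t)g();
  }

  // First-order Boolean masking: a secret x is held as two shares with
  // x = s1 ^ s2, and no single intermediate should depend on both shares.
  struct Masked { uint32_t s1, s2; };

  static Masked mask(uint32_t x)   { uint32_t r = rand_word(); return {x ^ r, r}; }
  static uint32_t unmask(Masked a) { return a.s1 ^ a.s2; }

  // Linear operations act share-wise.
  static Masked masked_xor(Masked a, Masked b) { return {a.s1 ^ b.s1, a.s2 ^ b.s2}; }

  // A nonlinear AND needs a fresh random word r; c1 ^ c2 == (a1^a2) & (b1^b2).
  static Masked masked_and(Masked a, Masked b) {
    uint32_t r = rand_word();
    uint32_t c1 = (a.s1 & b.s1) ^ r;
    uint32_t c2 = (((a.s2 & b.s2) ^ (a.s1 & b.s2)) ^ (a.s2 & b.s1)) ^ r;
    return {c1, c2};
  }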


