The League of Entropy Is Making Randomness Truly Random (onezero.medium.com)
94 points by kiyanwang 64 days ago | 37 comments



Somewhat relevant, Eric Lippert recently had an excellent 40-part blog series about fixing randomness: https://ericlippert.com/2019/01/31/fixing-random-part-1/


Not really on topic, but can we discourage the use of Medium? It’s annoying to have to open these links in incognito, and on top of that they do some other things that make reading more difficult.


> It’s annoying to have to open these links in incognito

Note that the thing that looks like a paywall isn't - while Medium does have paywalled articles, they're rare (and as far as I can tell this one isn't) and at the discretion of the author, not of Medium, anyway. Medium aggressively wants you to sign up if you've read more than n articles in a month, but you can just click the X and read n+1 articles.


We must have gotten different versions. I’m on mobile and my paywall didn't have an X to close it. I have seen the thing you’re talking about on desktop, though.


There seem to be two versions of the paywall for this article - one you can click an X on and one you can't. Which version you get seems to depend on whether you have ad blocking enabled when you visit the site.


With JavaScript disabled, works like a charm.


What I don't get is that modern chips are reaching the level of being limited by quantum noise, yet modern GPUs and tensor processors don't seem to offer any sources of quantum randomness. Can someone who understands this stuff please explain?


Quantum noise does not automatically mean quality randomness.

These discussions about randomness tend to get too philosophical. In practice, technical details matter more: for cryptographic or statistical applications you need steady quality.

256 bits of random state that gets slowly updated with a few random bits from a physical source (thermal or quantum noise) is enough when it's fed into a PRNG process that whitens it.
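To make that concrete, here is a minimal toy sketch of the pattern (Python, with SHA-256 standing in for a real DRBG; the class and method names are made up for illustration, and a real system should just use the OS CSPRNG):

    import hashlib

    class ToyEntropyPool:
        """Toy illustration only: 256 bits of state, slowly fed with physical noise."""

        def __init__(self):
            self.state = bytes(32)  # 256-bit pool state

        def add_noise_bits(self, noise: bytes):
            # Fold a few raw, possibly biased bits from a physical source
            # (thermal/quantum noise) into the pool via a hash.
            self.state = hashlib.sha256(self.state + noise).digest()

        def random_bytes(self, n: int) -> bytes:
            # "Whiten" the pool through a hash-based output function, so raw
            # bias in the physical source never reaches the consumer directly.
            out = b""
            counter = 0
            while len(out) < n:
                out += hashlib.sha256(self.state + counter.to_bytes(8, "big")).digest()
                counter += 1
            # Ratchet the state forward so past outputs can't be recomputed.
            self.state = hashlib.sha256(self.state + b"ratchet").digest()
            return out[:n]

The point is only that the raw physical bits never reach the consumer directly; they are folded into a fixed-size state and whitened on the way out.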


Being affected by noise doesn't mean you get a source of noise that has the particular properties you want (namely, not compressible / not predictable in a statistical fashion).

You can take 'bad' sources of noise and make them better, but these generally require a fair bit of math [1] and can be costly to implement. At that point, why not just use a PRNG a la urandom?

[1] https://en.wikipedia.org/wiki/Randomness_extractor
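As a small illustration of what [1] is about, the classic von Neumann extractor debiases independent coin flips; it's only a sketch of the idea, since it does nothing about correlated bits, which is where the heavier math comes in:

    def von_neumann_extract(bits):
        """Debias a stream of independent but biased bits.

        Pairs of input bits map as: 01 -> 0, 10 -> 1, 00/11 -> discarded.
        Output is unbiased *if* the input bits are independent."""
        out = []
        for a, b in zip(bits[::2], bits[1::2]):
            if a != b:
                out.append(a)
        return out

    # Example: a source that outputs 1 about 80% of the time still yields
    # roughly 50/50 output bits (but far fewer of them).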


Doesn't the Bitcoin blockchain serve as a strong, independent, difficult to manipulate source of entropy that gets published every 10 minutes on average?

Why not just use that?


Bitcoin block hashes are expensive to manipulate, but not fundamentally difficult to manipulate. All you need to do is bribe the miners. Let's say you strike a deal with AntPool, which mines about 10% of Bitcoin blocks. If they find a block with favorable entropy, they broadcast it; otherwise, they withhold it, and you compensate them for the missed coinbase. So if you only need to control 1 bit of entropy, you can expect this bit to be available within ~20 blocks, at a cost of one coinbase. Very expensive -- but not difficult.

In essence, the difference between Bitcoin and an aggregate scheme is that in Bitcoin, your control over the entropy is probabilistic and directly proportional to the number of bribed parties, whereas in an aggregate scheme, you have no control whatsoever until you manage to bribe every party, at which point you gain full control.


The effect of this can be minimized by pre-committing to which block will be used for entropy. Then AntPool has a 10% chance of mining the relevant block, a 5% chance (0.1 * 0.5 = 0.05) of mining an unwanted bit and discarding it, and from there the chance that another unwanted bit is found will be close to 50%.

All considered, that means AntPool has a 2.5-3% chance of flipping an unwanted result into a wanted one. At $125,000 per block, that's about $4 million of cost per bit manipulated favorably, getting exponentially worse as you need to manipulate more bits.
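Spelling out the arithmetic from the two comments above (the 10% hash-rate share, the 50/50 target bit, and the $125,000 block value are all the parent comments' assumptions, not hard numbers):

    p_pool     = 0.10   # AntPool mines the pre-committed block
    p_unwanted = 0.5    # ...and its bit comes out the wrong way, so it is withheld
    p_replaced = 0.5    # ...and the replacement block's bit is the wanted one
    p_flip = p_pool * p_unwanted * p_replaced
    print(p_flip)                    # 0.025, i.e. the ~2.5-3% quoted above

    block_value = 125_000            # USD, per the parent comment
    print(block_value / 0.03)        # ~$4.2M per favorably flipped bit, roughly the $4M figure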


I think the problem is that if you pick the first block mined after midnight on April 1st and there are two competing blocks, there will be trouble. In the blockchain, one of the blocks will win sooner or later and be extended with a chain long enough to kill the other block, but the people who would have won the lottery with the other block will be angry.

And remember that the timestamp of the block is only an approximation. You can use any time you want. I'm not sure if the network rejects a block when the timestamp is too far off, but you could use a fake timestamp a few minutes in the future and wait to broadcast the block (perhaps secretly mining the next block in the meantime), or perhaps use a fake timestamp a few minutes in the past and blame the delayed broadcast on a bad connection. (It's more difficult to do this secretly in a pool.)

The newspapers here make a big fuss about the first baby of the year, and I always suspected the exact birth time recorded in that moment is not reliable.


You can solve this problem by picking a specific block height, and not confirming the result until it has at least 6 confirmations.


I think this claim ought to be very controversial; I seriously doubt anyone would consider a blockchain a viable source of true randomness.


Why? Please provide a meaningful reason why Bitcoin cannot be used for secure randomness.


Secure randomness has to be unpredictable, unstoppable, and unbiasable. If you are flipping a coin based on Bitcoin randomness, the block proposer has an advantage over you, so it is not technically secure randomness. That's why Ethereum 2.0 is planning to use RANDAO + VDF to ensure it. More info at vdfresearch.org


A block proposer has a small probability of being able to re-roll the random number a single time, and an exponentially lower probability of getting a second or third chance. And each attempted manipulation costs a lot of money.

You can easily put secure bounds on this and use it for most applications.


Additionally, you can use iterated hashing: take the block hash b and hash it n times, as in H(H(H(H(b)))), etc.

Since hashing is a serial operation, and each hash is a random mapping of input to output, with enough iterations (hundreds of billions) you make it completely infeasible for the miner to even know what the result was by the time they have to make the block public.
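A minimal sketch of that construction (the iteration count below is illustrative; in practice it would be tuned to the wall-clock delay you want):

    import hashlib

    def iterated_hash(block_hash: bytes, n: int) -> bytes:
        # Serial by construction: each step depends on the previous output,
        # so nobody can know the result until ~n sequential hashes after the
        # block hash is fixed and published.
        h = block_hash
        for _ in range(n):
            h = hashlib.sha256(h).digest()
        return h

    # e.g. randomness = iterated_hash(block_hash, 10**11)  # hours-to-days of delay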

Zcash actually did this for their second trusted setup; IIRC the delay was set to be about a week's worth of computation. It's a much better scheme for many use cases than anything else I've seen in this conversation. The main downside is that exactly when each participant actually finds out what the final result is isn't well defined, but for cases where you can commit to the result in advance that's fine.


What Zcash did, and what Solana plans to use, is a VDF without succinct verification. Unfortunately, I can't verify that in a smart contract, which brings me back to RANDAO + VDF for on-chain verifiable randomness.


While you are right for most applications, it would not work for gambling or for block selection in the case of proof of stake. Let's say I'm flipping a coin (based on the last bit of the block hash) with you. You have already locked the funds in the contract, waiting for me to commit. I can wait until I'm the block proposer to push a block where I'm included. Even if we defined ahead of time how many rounds of hashing to use, I still have access to that information ahead of time and can decide whether to participate in the bet. Therefore, it is not secure randomness. (I won't get into commit-and-wait schemes, which could solve it but are a pain in any case.)


No need for commit-reveal, just use a relative offset for picking the RNG: after you place a bet, the block 6 blocks later is the one that decides whether you succeed or fail.

You can use simple techniques like this to make most use cases secure.


Having many sources makes it even more difficult to manipulate.


How does this sort of scheme prevent a Sybil attack? Assuming that I generate a “random” pool of people, what’s stopping me from controlling a dozen of those with a generator that’s predictable to me? (In essence, how is this different than “mixing” a bunch of different sources of random data together?)


It's (as far as I can tell) not open to the public in terms of hosting so they don't need to be Sybil resistant.

This doesn't seem to be much more than a fancy way of mixing randomness together (which produces valid randomness if just one source was truly random) and then using BLS signatures on top of that. BLS signatures let you verify the nodes that were involved, and it seems like with how they threshold, they're trying to prevent attacks based on not revealing (i.e. the randomness is the aggregated threshold signature of the random data they mixed together).
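The "valid if just one source was truly random" property is easiest to see with plain XOR mixing (sketch only; as described above, the actual output is derived from the aggregated threshold signature rather than a bare XOR):

    from functools import reduce

    def mix(contributions):
        # XOR of all equal-length contributions: uniform as long as at least
        # one contribution is uniform and independent of the rest.
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), contributions)

The catch with bare XOR is that the last party to reveal can withhold its share after seeing the others, which is the "not revealing" attack the thresholding is meant to prevent.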


I run one of the drand nodes. Yes, the membership is limited. If someone wanted to join, they could, just by asking and committing to run a node for the long term. It is not a permissionless system, and so Sybil attacks are not in the threat model.


> what’s stopping me from controlling a dozen of those

They're signing their contributions.


This is a solution looking for a problem that doesn't exist.

For all practical purposes, a properly implemented software random number generator provided by the OS is fine. The few corner cases where this isn't true (mostly early-boot entropy problems) appear in situations where you don't yet have Internet connectivity.
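For reference, "use the OS generator" really is a one-liner in most environments; e.g. in Python the secrets module just reads from the kernel CSPRNG:

    import secrets

    key = secrets.token_bytes(32)   # 256 bits straight from the OS CSPRNG
    token = secrets.token_hex(16)   # same source, hex-encoded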


The article, in fact, reveals plenty of practical purposes, besides early boot initialization, where pseudo-random numbers are entirely unsuitable.

If the article deserves criticism, it is in the suggestion that true randomness is hard to come by. We have billions of connected devices with high-density CCD image sensors. Each pixel generates random noise that may be sampled at many frames per second. The true randomness of any pixel is limited -- bias is easy to demonstrate -- but mixed with millions of others their randomness becomes exemplary. The top-quality random bits available from these legions of phone and surveillance cameras far exceed any practical need.

Those without a camera often have access to a microphone. Microphones, likewise, provide a ready source of random noise in their least-significant bit. Where a stereo source is available, XORing low bits is better.

Those with a radio receiver can find easy random bits by tuning to an unused channel -- or even a used one -- and stirring least-significant bits into a pool.

Such a pool may be left over from previous runs, so a device with any persistent storage need never start with no random bits.

Systems with access to a CCD, microphone, or receiver have a ready solution to the problem of early-boot key generation.
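A rough sketch of the stirring idea (the sensor-reading calls at the bottom are hypothetical placeholders for whatever camera or microphone API is actually available; raw LSBs are biased, which is why everything goes through a hash rather than being used directly):

    import hashlib

    def stir_sensor_noise(pool: bytes, samples) -> bytes:
        # Keep only the least-significant bit of each sensor sample and fold
        # the resulting bitstring into the pool with a hash; bias in individual
        # pixels/samples is tolerable because millions of them get mixed.
        lsbs = bytearray()
        for i in range(0, len(samples) - 7, 8):
            byte = 0
            for j in range(8):
                byte = (byte << 1) | (samples[i + j] & 1)
            lsbs.append(byte)
        return hashlib.sha256(pool + bytes(lsbs)).digest()

    # pool = stir_sensor_noise(pool, read_camera_frame())  # hypothetical sensor APIs
    # pool = stir_sensor_noise(pool, read_audio_frame())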


How do you prove to another party that the data was random, or that you didn't just keep sampling until you got a desired result?


I think this is the biggest requirement for anyone truly invested in a genuinely random system.

A truly random system can also support more sophisticated simulations, and more faithful effects within those simulations.

Think of trying to programmatically solve unbounded equations or the Millennium Prize problems. Anything in testing that involves dealing with chaos would not be accurately represented until true randomness is available.

Now, what true randomness is, and whether it exists at all, is a whole other topic of conversation.


But is physical reality truly random? :>


Quantum theory pretty conclusively shows that it is, in fact.


Conclusive is a stretch. I personally think that reality has "true randomness", or is otherwise not deterministic.

Unfortunately, at least from a philosophical perspective, things like Superdeterminism have to be considered for now. One might come to the conclusion that there are both deterministic models and randomness models of our universe, and thus that the physical laws don't commit one way or the other.


Don't forget randomness is just a hypothesis:

https://news.ycombinator.com/item?id=20670176


Does it? Maybe at the micro level, but at the classical macro level it's pretty deterministic. Why not both? In some beautiful Taoist fashion, perhaps they're mutually complementary: both opposites need to exist for reality to exist as it is. But as a layman, I know enough to know how hotly debated this subject is, so I digress.


Decentralized randomness should be pretty easy to do.

One of the problems with current PKI is weakness in the face of quantum computers, leading to a new crop of algorithms being submitted to NIST, etc.

I wanted to ask whether the following simple scheme, based just on cryptographic hashes, can be used CONFIDENTLY, SECURELY and RELIABLY in many situations where asymmetric-key cryptography is used today, and in many others too, such as providing provably random polling. It is very similar to a One Time Pad but uses a cryptographic hash function to generate the OTP codes.

Here is the scheme:

Everyone generates a random private key K[p] and stores it just like any asymmetric private key (encrypted with some non-stored derived key).

They use any cryptographic hash function that hasn't had a serious preimage attack (perhaps even MD5?), hash K[p] n times (e.g. 10,000,000) to get h[p][n], and publicly commit to that number. This is like a public key.

The hashes are long enough that it’s infeasible to reverse them. Key strengthening can be achieved by jumping a few onion layers between transactions.

If you start running out of layers, you post a new public key, signed with one of your remaining onion-layer codes.

Any verifier stores the original public key per participant, and can then replace it with the new public key if it was properly signed by the old one, etc.
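A bare-bones sketch of the setup and a single onion-layer reveal, just to make the scheme above concrete (SHA-256 standing in for "any hash without a serious preimage attack"; this is a sketch, not a vetted construction):

    import hashlib, os

    N = 10_000_000                     # chain length, as in the description above

    def H(x: bytes) -> bytes:
        return hashlib.sha256(x).digest()

    def keygen():
        k = os.urandom(32)             # private key K[p]
        h = k
        for _ in range(N):             # (slow at this N; a toy run can use a smaller one)
            h = H(h)
        return k, h                    # (private key, public commitment h[p][N])

    def reveal(k: bytes, layer: int) -> bytes:
        # The value `layer` steps below the commitment in the chain.
        h = k
        for _ in range(N - layer):
            h = H(h)
        return h

    def verify(commitment: bytes, revealed: bytes, layer: int) -> bool:
        # Re-hashing the revealed value `layer` times must reproduce the
        # commitment (or, in a real verifier, the most recently accepted reveal).
        h = revealed
        for _ in range(layer):
            h = H(h)
        return h == commitment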

Use case: generating provably random numbers by mutually distrusting parties

Participants gradually reveal their hashes, one onion layer per transaction. Each provably random seed is a function of the alphabetically smallest/largest three of those hashes at the next onion layer. If not all of them reveal their hashes in time, they gossip, verify, and agree on which ones are the smallest/largest three before some cutoff point, like "most reported that most reported". That leaves tons of bits of entropy coming from everyone!

Use case: Authenticator Apps

The hash h[p][n+1] would be a hash of some substring of h[p][n] with enough bits that finding all chosen preimages (by an eavesdropper of the previous code) would be infeasible in advance. Perhaps 10 alphanumeric characters is enough. Also, when displaying the code to enter, the authenticator app can tell the user a number from 1-100 indicating to the verifier how many onion layers to peel, making it harder to precompute the preimages. Alternatively, the user could enter the entire hash via the network-connected computer by scanning a QR code, using NFC, or something similar. From a security standpoint, this method seems superior to the HOTP and TOTP schemes used in authenticator apps today, since there is no need to trust the verifier with any secret keys (https://www.ietf.org/rfc/rfc4226.txt). There is also no need to synchronize clocks, since the client simply tells the server how many times to run the hash and increments that number every time.
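The verifier side of that is essentially the Lamport/S-KEY hash-chain idea: the server stores only the last accepted value and hashes the submitted code forward to check it, so there is no shared secret to steal. A sketch (using the full hash value as the code rather than the truncated substring suggested above):

    import hashlib

    def H(x: bytes) -> bytes:
        return hashlib.sha256(x).digest()

    class OTPVerifier:
        def __init__(self, initial_commitment: bytes):
            # The server stores only the public commitment h[p][n],
            # never the client's private key.
            self.last_accepted = initial_commitment

        def check(self, code: bytes, layers_to_peel: int) -> bool:
            # The client tells the server how many onion layers to peel
            # (the 1-100 number mentioned above).
            h = code
            for _ in range(layers_to_peel):
                h = H(h)
            if h == self.last_accepted:
                self.last_accepted = code   # ratchet forward; older codes become useless
                return True
            return False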

Use case: Signing Payloads

Participants reveal a payload and commit to an HMAC signature using the cryptographic key at the next onion level, which at that point would be known only to them. All these signatures are collected into a blockchain block / Merkle tree timestamp / similar structure, and it is sent to the participant before they reveal the onion key they used to sign it.

Use case: Off the Record Messaging

The blockchain or Merkle tree is private between a few parties only, so once the next onion level is revealed, no participant can prove the payload was generated by a given participant: since all the onion hashes are known, any of them could generate a new valid tree with any payload history. They can only prove it to each other, or, given enough "witnesses" attesting to that tree, people might trust them on the basis of consensus among (presumably) mutually distrusting parties, but that's not the same thing as cryptographic proof. But that is true of any OTR conversation.

Use case: Restoring Access

This can be used instead of Shamir secret sharing. The server would have to store keys for every participant, and M of N participants would just sign that they approve authorizing some new session, some new key, or whatever. These signatures could easily be checked by anyone who has the public keys of the M participants who signed.

Use case: Decrypting payloads

Not sure how one would do this one, to be honest. With PKI, someone can encrypt a payload that can only be decrypted by the private key holder. I see how to do signatures and HMACs, but not the other way around.



