
Getting randomness from an Apple device with particle physics, thermal entropy - lordmax
https://vault12.com/blog/true-entropy
======
cozzyd
I'm not familiar with how the video is encoded, but maybe you have to be
careful to make sure that the encoding doesn't introduce correlations between
adjacent frames (you can imagine how P or B frames might have correlated
noise, for example).

Also, I don't think you'll ever see cosmic ray interactions last more than one
frame... the time scales for cosmic ray interactions are nanoseconds, not ms. I
suppose electrons could get trapped somewhere and diffuse slowly (that does
happen in some CCDs; I don't know enough about CMOS sensors to say whether it
can happen there), but that would happen with bright images in addition to
cosmic rays.

In such a small sensor, cosmic rays should be relatively rare. To first order,
you can model the number of cosmic rays you see as a Poisson process, which
should be a pretty good source of entropy. You can even estimate the rate
based on altitude / sensor orientation, etc. However, nearby cosmic ray
detectors will detect correlated cosmic rays (from extensive air showers from
the same primaries), which might hurt.
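
A sketch of that first-order model (the sea-level muon flux of roughly 1 per cm^2 per minute is a commonly quoted figure; the sensor area and frame rate are illustrative assumptions, not measurements):

```python
import numpy as np

# Sea-level muon flux is roughly 1 per cm^2 per minute; the sensor area
# and frame rate below are illustrative assumptions, not measured values.
FLUX_PER_CM2_PER_S = 1.0 / 60.0
SENSOR_AREA_CM2 = 0.17        # ~4.8 mm x 3.6 mm sensor
FPS = 30.0

rate_per_frame = FLUX_PER_CM2_PER_S * SENSOR_AREA_CM2 / FPS  # ~9e-5 hits/frame

rng = np.random.default_rng(0)
hits = rng.poisson(rate_per_frame, size=1_000_000)  # hits per simulated frame

# Poisson sanity check: the mean and variance of the counts should match.
print(hits.mean(), hits.var())
```

At those assumed numbers a hit lands only every few minutes, so cosmic rays would be a rare garnish on top of the thermal noise, not the main course.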

~~~
lordmax
The key property for crypto randomness here is that these high-energy particle
events (be they cosmic rays, background radiation, etc.) are not just random,
but independent from thermal noise. They are few and far between, but they
affect each sample somewhere. One way or another all that entropy gets hashed,
and having even a few bits contributed by independent phenomena makes the final
hash extremely hard to attack.

Considering all the sources that contribute noise to sensors (thermal noise,
photon count variation, high-energy particles, shot/RTS noise, and I'm probably
missing a few), each with its own distribution and characteristics, every
sample readout is very hard to predict.

------
AceJohnny2
Tangentially, I wonder if most modern smartphone chips include a hardware
random number generator, and if that is exposed to userspace?

The iPhone has a hardware random number generator, at least: _"The Secure
Enclave is a coprocessor fabricated in the Apple S2, Apple A7, and later
A-series processors. It uses encrypted memory and includes a hardware random
number generator."_

[https://www.apple.com/business/docs/iOS_Security_Guide.pdf](https://www.apple.com/business/docs/iOS_Security_Guide.pdf)

I couldn't immediately find if that functionality was exposed in an API.

~~~
lordmax
worth mentioning (it's sort of the main premise of the article that gets a
little bit unnoticed in all the methodology discussion): all existing HWRNGs
are relatively low bandwidth, because they are bound by a physical process
rather than the endless spinning of /dev/urandom. They all have to wait for
physics to produce each bit, and existing chips don't have that much "physics"
in them.

The main novelty of a "camera noise HWRNG" is that we're effectively leveraging
12M micro HWRNGs in parallel; that's where the firehose of entropy is coming
from.
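
Back-of-envelope, with a purely illustrative guess at the per-pixel entropy yield:

```python
# Back-of-envelope only; the bits-per-pixel yield is an illustrative guess.
pixels = 12_000_000               # "12M micro HWRNGs" working in parallel
bits_per_pixel_per_frame = 0.1    # assumed extractable entropy per pixel
fps = 30

raw_entropy_bps = pixels * bits_per_pixel_per_frame * fps
print(f"{raw_entropy_bps / 1e6:.0f} Mbit/s of raw entropy")  # 36 Mbit/s
```

Even at a tenth of a bit per pixel per frame, the parallelism dominates any single avalanche diode.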

~~~
atoponce
> all existing HWRNG are relatively low bandwidth

The Intel on-die DRNG has a cited bandwidth of about 3 Gbps [1]. My Raspberry
Pi 3 has an on-die generator via its BCM2835. Single-threaded testing with
dd(1) shows about 1.5 Mbps of bandwidth.

Off-chip, using $25 RTL-SDR dongles, you can get about 2.8 Mbps of bandwidth.
In fact, if you look at the Wikipedia article comparing HWRNGs [2], you can
see that there are a lot of implementations with surprisingly high bandwidth.
Some you'll empty your wallet with, others not so much.

The point is, there are plenty of HWRNGs out there with high bandwidth.
Whether or not you trust them though, is a completely different matter.
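
For a rough point of comparison, the (whitened) kernel CSPRNG bandwidth on any given box is easy to measure; a quick sketch:

```python
import os
import time

def urandom_throughput_mbps(total_bytes: int = 16 * 1024 * 1024,
                            chunk: int = 1024 * 1024) -> float:
    """Time os.urandom() reads and return throughput in Mbit/s."""
    start = time.perf_counter()
    read = 0
    while read < total_bytes:
        os.urandom(chunk)   # nonblocking kernel CSPRNG output
        read += chunk
    elapsed = time.perf_counter() - start
    return (read * 8) / (elapsed * 1e6)

print(f"os.urandom: {urandom_throughput_mbps():.0f} Mbit/s")
```

Numbers vary wildly by machine and kernel, which is part of why cited bandwidth figures deserve scrutiny.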

1\. [https://software.intel.com/en-us/articles/intel-digital-rand...](https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide)

2\. [https://en.wikipedia.org/wiki/Comparison_of_hardware_random_...](https://en.wikipedia.org/wiki/Comparison_of_hardware_random_number_generators)

~~~
lordmax
Here is the problem: most of these stats are not what they pretend to be
(unless the exact circuit/spec is published). Look at the low-level details of
building, say, an avalanche noise source:
[http://holdenc.altervista.org/avalanche/](http://holdenc.altervista.org/avalanche/)
\- bandwidth is mostly bound by voltage/frequency/sampling resolution: how
often can you trigger an entropy event, and how many of them run in parallel?
The true result for one AN circuit: 2000 bits/sec.

What a lot of these vendors do is have some physical phenomenon on the chip
that feeds a hardware "whitener" (endless hashing) that responds to all
requests without blocking. That's practically a hardware version of
"/dev/urandom" that is bound only by chip IO, but it's completely disconnected
from the bandwidth of the actual "true" entropy phenomenon underneath. Of
course it is still a good CSPRNG, but it's not a "true" source. BTW, a nice
exception: the TrueRNG team are pretty honest and provide a direct schematic,
hence the real entropy speed of 40kb/sec.

In short, every single entry on that list should be independently inspected
down to the specs and schematic of the whitener. If they are not publishing the
chip spec with exact details, I highly doubt the bandwidth of "true" entropy
events really approaches Gbps; that is the speed of the whitener, not of the
actual generator.
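
The whitener effect is easy to demonstrate: a hash in counter mode turns a tiny fixed seed into an IO-bound stream that looks statistically perfect while containing almost no entropy. A minimal sketch:

```python
import hashlib
from collections import Counter

def whitened_stream(seed: bytes, n_blocks: int) -> bytes:
    """SHA-256 in counter mode: unlimited output from a fixed, tiny seed."""
    out = bytearray()
    for counter in range(n_blocks):
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
    return bytes(out)

# 16 bits of "entropy" in, a megabyte of statistically flawless output.
stream = whitened_stream(b"\x00\x01", 32768)   # 1 MiB
counts = Counter(stream)
# The byte histogram is flat to within sampling noise...
print(max(counts.values()) / min(counts.values()))
# ...but anyone who guesses the 2-byte seed reproduces the whole stream.
```

This is exactly why output-side statistical tests can't distinguish a "true" source from a whitener running on fumes.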

------
mceachen
Really approachable writing. Fun read!

For people that want to read about other entropy sources:
[http://www.fourmilab.ch/hotbits/how3.html](http://www.fourmilab.ch/hotbits/how3.html)

~~~
lordmax
Matthew, here is my personal collection of entropy sources for your
reference. That stuff is addictive!

\- True quantum source: [http://whitewoodsecurity.com/products/entropy-engine/](http://whitewoodsecurity.com/products/entropy-engine/)

\- Atmospheric noise:
[https://www.random.org/randomness/](https://www.random.org/randomness/)

\- Avalanche Effect Generators
[http://holdenc.altervista.org/avalanche/](http://holdenc.altervista.org/avalanche/)
[http://ubld.it/truerng_v3](http://ubld.it/truerng_v3)

\- Great deck about overall entropy engineering:
[https://www.blackhat.com/docs/us-15/materials/us-15-Potter-U...](https://www.blackhat.com/docs/us-15/materials/us-15-Potter-Understanding-And-Managing-Entropy-Usage.pdf)

And of course the "lava wall" we mentioned just takes the cake:
[https://sploid.gizmodo.com/one-of-the-secrets-guarding-the-s...](https://sploid.gizmodo.com/one-of-the-secrets-guarding-the-secure-internet-is-a-wa-1820188866)

------
bitL
A question for physicists: how do we know that some process in nature is
truly random and conforms to some ideal probability distribution? The link
between math and the real world is not clear to me. Thanks!

~~~
colmmacc
[https://medium.com/@oskansavli/bells-theorem-and-randomness-...](https://medium.com/@oskansavli/bells-theorem-and-randomness-ffe9820c88b)
is a good write-up of the relationship between randomness and Bell's theorem.

------
saagarjha
Interesting fact: on Darwin, /dev/random and /dev/urandom behave identically;
both are nonblocking:
[https://developer.apple.com/legacy/library/documentation/Dar...](https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man4/random.4.html)

~~~
gerdesj
RLY? How on earth can a read from /dev/random be non-blocking if it does not
have sufficient entropy?

My understanding is that /dev/random "promises" to return decent random stuff,
while /dev/urandom returns decent algorithmically generated random stuff.
urandom is good; random is better but finicky and can sulk for a while.

~~~
atoponce
> RLY? How on earth can a read from /dev/random be non blocking if it does not
> have sufficient entropy?

In short, to answer your question: I am not aware of any /dev/random that will
not block if it has not been sufficiently seeded. Linux is the only kernel
where /dev/random always blocks when the input pool entropy estimate is low.
This is highly criticized in most security communities.
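
On Linux that estimate is visible from userspace; a quick peek (Linux-only path):

```python
import os

ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"

def entropy_estimate() -> int:
    """Return the kernel's input-pool entropy estimate in bits (Linux-only)."""
    with open(ENTROPY_PATH) as f:
        return int(f.read())

if os.path.exists(ENTROPY_PATH):
    print(entropy_estimate())   # the number /dev/random's blocking keys off
```
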

Mac OS X and iOS have implemented the FreeBSD CSPRNG, of which /dev/random is
a symlink to /dev/urandom. The CSPRNG blocks on boot until it's sufficiently
seeded with hardware timing events. Once seeded, it never blocks again.

On NetBSD, /dev/random sometimes blocks, although not as notoriously as Linux.
However, it will fully block on boot until sufficiently seeded, after which it
no longer blocks.

OpenBSD also provides no functional difference between /dev/random and
/dev/urandom. Both will block until sufficiently seeded.

As far as I know, on every Unix-like operating system, data is saved from the
generator to disk on shutdown, and on boot, that data is read into the CSPRNG
as an unpredictable seed. With every GNU/Linux operating system I can think
of, this happens as part of the install, so on first boot, the kernel is
already sufficiently seeded with random data.

On FreeBSD, this seed is saved to "/var/db/entropy-file". On GNU/Linux, it's
either "/var/lib/systemd/random-seed" or "/var/lib/urandom/random-seed",
depending on whether or not you're using systemd. I'm not sure if NetBSD saves
a seed to disk or not, as it's been many years since I last ran NetBSD
seriously.

~~~
JdeBP
The saved seed file on OpenBSD is /etc/random.seed . NetBSD has the
/etc/rc.d/random_seed service, which uses the same filename as FreeBSD.

------
CPAhem
There can be a few pitfalls here, such as assuming that the dark-field image of
the Apple camera is actually random noise and not some property of the sensor
that can vary between images.

Getting the distribution of the randomness right can be hard.

~~~
lordmax
you are quite correct about this: "lens closed" mode has the least amount of
entropy by our measure. However, entropy is clearly proportional to the
intensity of light hitting the sensor (and besides the natural photon variance
of the light itself, the shot noise in the sensor increases with more light).
So in what we call "optimal" conditions (enough light to generate noise, not
enough to oversaturate) there is plenty of natural entropy coming from the
sensor.
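
That light dependence is what Poisson shot noise predicts: photon counts have variance equal to their mean, so brighter (unsaturated) pixels carry more noise. A quick simulation of the claim:

```python
import numpy as np

rng = np.random.default_rng(42)

# Photon arrivals are Poisson: at mean intensity mu, the variance is also mu,
# so well-lit (but unsaturated) pixels have more noise to harvest.
dim = rng.poisson(10.0, size=1_000_000)       # near-dark pixels
bright = rng.poisson(1000.0, size=1_000_000)  # well-lit pixels

print(dim.var(), bright.var())  # variance tracks the mean intensity
```
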

------
rudolfwinestock
This reminds me of SID, the custom sound chip on the Commodore 64. It was a
hybrid digital/analog chip, so the sound which it produced wasn't consistent
between chips due to thermal entropy effects. El33t h4x0rz took advantage of
this by making the SID output a bit of white noise into a CPU register rather
than the speakers in order to get true random numbers.

------
mrkoot
Alexandre Anzala-Yamajako posted interesting comments on this to
[Cryptography] (@metzdowd.com):

> IMO a statistical approach based on taking a bunch of data and saying
> essentially "I don't see any signs that it's not random" is not a good
> approach for entropy seeding. The example is old, but I could give you the
> output of AES in counter mode with a null key and a null IV, and no
> standard statistical test would ever show you any defects while you have
> absolutely no entropy.

> Your case is particularly worrisome for several reasons: 1) you use a von
> Neumann-like extractor, but you have also shown that your data is not only
> biased but also correlated; 2) you don't seem to have a model of your
> hardware source from which you could derive the output distribution; 3) you
> do some wizardry to remove some correlation but nowhere show or prove that
> there isn't more correlation to be taken care of, or how; 4) I didn't see in
> your document a justification of the fact that the manufacturer of the
> camera (software and hardware) doesn't have more information than you and
> could therefore target defects in your entropy management procedure.

> You should have a look at the work of Viktor Fischer, David Lubicz, Florent
> Bernard and Patrick Haddad. They invested quite a bit of effort to give
> entropy guarantees when using very specific hardware devices.

Skibinsky subsequently responded:

> Alexandre, thanks for reading and suggestions! I will certainly check out
> your references.

> As is probably obvious from the essay-style narrative, this is not intended
> to be a tight scientific paper, just our research log of the first-order
> ideas we coded up for a minimal working prototype. You are correct on #1 and
> #3: the current codebase doesn't address these issues. #2 is interesting
> because, besides the wide variety of camera hardware that such a model
> should reflect, iOS camera parameters present us with an opportunity to
> create an optimal hardware source. This is far from our area of expertise,
> so I hope somebody in the open source community will pick it up from here
> and figure out both a formal model and which physical settings will optimize
> the source.

> Thanks again for the great suggestions; I will further emphasize the impact
> of correlations & VN sensitivity to non-IID data in the final section.

> The most likely practical direction, of course, is simply to use a universal
> hash extractor instead of VN, since it relaxes a lot of requirements.

------
atoponce
This is really cool. The fact that I can, basically, carry around a TRNG in my
pocket is like the ultimate nerd gadget. Other than reseeding /dev/urandom, I
don't really have any personal need for it, but the discussions it can
generate could be very interesting.

~~~
lordmax
my motivation from a while back was that it would be so useful to have
something that lets you dump a few megs of independent entropy into your
/dev/urandom right before generating BTC/SSH keys. A TrueRNG stick is also
good, but it's just 40kb/sec (I have TrueRNG on a cron job reseeding every
hour anyway)
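
One caveat for anyone wiring this up on Linux: writing to /dev/urandom mixes the bytes into the pool, but it does not credit the entropy estimate; that needs the privileged RNDADDENTROPY ioctl. A minimal sketch of the uncredited write:

```python
import os

def mix_into_pool(data: bytes, dev: str = "/dev/urandom") -> int:
    """Mix bytes into the kernel pool. On Linux this perturbs the pool but
    does NOT credit the entropy estimate; the privileged RNDADDENTROPY
    ioctl is what actually credits entropy."""
    with open(dev, "wb") as f:
        return f.write(data)

if os.path.exists("/dev/urandom"):
    mix_into_pool(os.urandom(64))  # any process may do this on a default setup
```

Mixing is still useful even without the credit: it can only add unpredictability, never remove it.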

------
iridium
Wouldn't a lossy JPEG (or HEIC) compression reduce the entropy?

~~~
lordmax
That would be correct! However, we get the values from the raw camera video
buffer before any compression takes place.

------
et2o
Couldn’t you have used more formal methods for spatial autocorrelation?

~~~
lordmax
Do you have any references to existing code or research that deals with
correlation issues? We considered a few home-grown ideas (like measuring the
correlation level in each sample and then compacting the sample by that %
before quantizing), but all of them were pretty computationally heavy...

~~~
et2o
I’d just google “spatial autocorrelation correction,” the top few hits are
good intros. This is a pretty mature statistical field.

Lots of examples in ecology. Most techniques are regression based and pretty
doable.
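
As a concrete starting point, a lag-1 autocorrelation estimate on a noise frame is only a few lines (Moran's I and the regression-based techniques are the formal versions); a sketch:

```python
import numpy as np

def lag1_autocorr(frame):
    """Correlation between each pixel and its right/down neighbor."""
    f = frame - frame.mean()
    horiz = (f[:, :-1] * f[:, 1:]).mean() / f.var()
    vert = (f[:-1, :] * f[1:, :]).mean() / f.var()
    return horiz, vert

rng = np.random.default_rng(7)
white = rng.normal(size=(1080, 1920))      # uncorrelated "noise frame"
print(lag1_autocorr(white))                # both values near 0

# Any smoothing (e.g. demosaicing-like averaging) shows up immediately:
smoothed = (white[:, :-1] + white[:, 1:]) / 2
print(lag1_autocorr(smoothed))             # horizontal lag-1 jumps to ~0.5
```

Cheap enough to run per frame, unlike compact-by-correlation-percentage schemes.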

------
zkms
> Take two video frames as big arrays of RGB values.

> Subtract one frame from another, that leaves us with samples of raw thermal
> noise values.

> Calculate the mean of a noise frame. If it is outside of ±0.1 range, we
> assume the camera has moved between frames, and reject this noise frame.

> Delete improbably long sequences of zeroes produced by oversaturated areas.
> For our 1920x1080=2Mb samples and a natural zero probability of 8.69%, any
> sequence longer than 7 zeros will be removed from the raw data.

> Quantize raw values from ±40 range into 1,2 or 3 bits: raw_value % 2^bits.

> Group quantized values into batches sampled from different R,G,B channels,
> at big pixel distances from each other and in different frames to minimize
> the impact of space and time correlations in that batch.

> Process a batch of 6–8 values with the Von Neumann algorithm to generate a
> few uniform bits of output.

> Collect the uniform bits into a new entropy block.

> Check the new block with a chi-square test. Reject blocks that score too
> high and therefore are too improbable to come from a uniform entropy source.

This reads like a highly ad-hoc process with nothing resembling a formal
justification for any of its steps, its general outline, or any of the magic
numbers used in it. It's unclear what properties are being achieved and how
exactly the steps guarantee them. There is no analysis of the predictability
of the data by an adversary, either.

What does this get you that SHA512'ing the entire raw image bitmap doesn't?
Using statistical tests makes sense to verify that the camera data isn't
pathologically anomalous (say, all zeroes or all 255), but I don't understand
why this sort of procedure is preferable to using a strong hash function to
extract randomness from an image sensor's output.

~~~
lordmax
you are completely correct - as I mentioned at the end, this would be an easier
extractor: "Want to relax IID assumptions and avoid careful setup of a non-
correlated quantizer? Easy — use any good universal hash function. The only
new requirement is that the universal hash will require a one-time seed. We
can make an easy assumption that the iPhone's local state of /dev/urandom is
totally independent from the thermal noise the camera is seeing, and simply
pick that seed from everybody's favorite CSPRNG."

The main reason we went with VN instead of SHA or a universal hash is that it
was just a more fun thing to build and experiment with. SHA is like a
flamethrower that will deal with anything you throw at it. VN is far more
brittle, and you see all the mistakes in scenery or generation. Of course you
then hash the output anyway before use!
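
For reference, the VN step is tiny; a sketch of the classic debiaser (the brittleness: the output is unbiased only if the input bits are independent, which is exactly what the correlation handling has to guarantee):

```python
import random

def von_neumann(bits):
    """Von Neumann debiasing: 01 -> 0, 10 -> 1, 00/11 -> discard.
    Output is unbiased only if the input bits are independent."""
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

# A 75/25-biased but independent source still yields unbiased output:
random.seed(1)
biased = [1 if random.random() < 0.75 else 0 for _ in range(100_000)]
unbiased = von_neumann(biased)
print(sum(unbiased) / len(unbiased))   # ~0.5
```

Note the cost: a p=0.75 source keeps only 2p(1-p) ≈ 37.5% of the pairs, so the extractor also throws away most of the raw stream.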

