
Entropy and the NetBSD Kernel - jayp1418
https://washbear.neocities.org/entropy.html
======
als0
> My goal was to ensure that we have collected from as many potential sources
> of randomness as possible to ensure a quality initial seed for userspace
> random number generators. So, it includes everything, including sources
> considered "low value", like interrupt timings.

While a TRNG is fundamentally necessary for initial entropy, it is generally a
good idea to mix as much other randomness as possible to reduce your
dependency on a single source, in case that single source turns out to be
biased. So I agree with the author that there is still some value in the "low
value" options.

------
loeg
If you like this sort of thing, here's a zoo of other designs:

* [http://aka.ms/win10rng](http://aka.ms/win10rng) (Windows 10, PDF)

* [https://en.wikipedia.org/wiki/Fortuna_(PRNG)](https://en.wikipedia.org/wiki/Fortuna_\(PRNG\)) (FreeBSD)

* [https://csrc.nist.gov/publications/detail/sp/800-90a/rev-1/f...](https://csrc.nist.gov/publications/detail/sp/800-90a/rev-1/final) (NIST)

* [https://github.com/torvalds/linux/blob/master/drivers/char/r...](https://github.com/torvalds/linux/blob/master/drivers/char/random.c#L55-L307) (Linux; I could not find a separate design document as such. Perhaps the best design reference is this block comment at the beginning of the file implementing it)

------
CJefferson
Picking on one point, I think the name of the hardware is a fair thing to
chuck in the pool.

As this article says, early in the process you often have several sources of
low quality, and all we can do is hope to get the best we can from anywhere we
can. The name and versions of everything attached is fairly horrible entropy,
but it doesn't hurt.

~~~
libeclipse
Yep, one of the cool things about entropy is that it always increases. If you
keep XORing more information into the seed pool, you only ever increase the
total entropy.

~~~
dfox
As long as you use a reasonably secure way to add entropy into the pool, i.e.
don't just XOR stuff into some kind of pool buffer, but use something that at
least partly resembles a secure hash even for adding entropy. (Using full-blown
SHA-x to process interrupt timings and similar high-volume, low-quality entropy
sources is not exactly practical, due to both performance and synchronization
issues.)
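
A minimal sketch of the difference, with a made-up 32-byte pool and invented
sample strings (real kernels use faster, keyed constructions; hashing each
sample into the pool rather than XORing it in is the idea):

```python
import hashlib

POOL_SIZE = 32  # bytes; hypothetical pool the size of a SHA-256 digest

def mix(pool: bytes, sample: bytes) -> bytes:
    """Fold a sample into the pool through a hash. Unlike raw XOR, a
    partially attacker-controlled sample cannot cancel out bits that
    earlier samples contributed."""
    return hashlib.sha256(pool + sample).digest()

pool = bytes(POOL_SIZE)
for sample in (b"interrupt:1234", b"disk:5678", b"hostname:rpi"):
    pool = mix(pool, sample)
# The pool now depends on every sample; recovering or cancelling any one
# of them would require inverting SHA-256.
```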

Also, total entropy is not necessarily the same thing as the attacker's
uncertainty about the pool's state. The difference seems mostly theoretical,
but there are practical consequences when these two quantities are wildly
different.

~~~
loeg
+1 on the mix-with-SHA or something like it paragraph.

Re: Entropy; in a system RNG design, entropy is typically considered relative
to an attacker. In some very real sense, the "total entropy" of the system is
only the unpredictability of the generator to the most knowledgeable attacker.

------
throwaway2048
I think OpenBSD has an interesting solution to this problem, randomness is
saved in a seedfile on shutdown/periodically during runtime, and injected
directly into the kernel via boot loader before the kernel loads.

This gives you quality randomness as early as possible during kernel boot, and
lets you employ it in kernel structures that are otherwise impossible to
modify post-initialization (structures that, say, may be required to get
"randomness harvesting" working).

~~~
flatiron
I think nowadays "reboots" are a thing of the past. All of my instances
scale up and scale down and have a max life of 30 days before they are rebuilt
on a patched AMI. I think most stuff is done like that now, not on real metal
boxes.

~~~
toast0
If you're running in a VM, you are at the mercy of the host, so you may as
well ask the host for a random seed. Use other factors if you can, but those
are all controllable by a sufficiently determined host.

------
jayp1418
More is being discussed in the mailing list thread: [https://mail-index.netbsd.org/tech-userlevel/2020/05/02/msg0...](https://mail-index.netbsd.org/tech-userlevel/2020/05/02/msg012333.html)

------
cipher_314159
A couple of considerations:

Regarding the hostname thing, the NIST SP 800-90Ar1 spec for PRNGs includes
"personalization strings" as an optional part of the initialization. Lots of
other systems have similar parameters. The fundamental idea is that randomness
sources are sometimes subject to failure, especially as you ask for more and
more random data (e.g., until recently, /dev/random on Linux would block if it
"didn't have enough entropy", and there have been plenty of issues related to
code NOT CHECKING for short reads). The NIST spec suggests using things like
serial numbers, user IDs, MAC addresses, software versions, timestamps, module
and application identifiers, even random numbers derived from other sources as
parts of
the personalization string. The idea is that each PRNG created will always
have at least one unique input to it, so you at least avoid repeated outputs
if the seeds and the nonces get hosed up.

So there's a case to be made for hauling in low-randomness data, even if it
doesn't help accumulate enough randomness. The hostname certainly isn't
perfect (think of all those Raspberry Pis with the same hostname), but it
doesn't HURT as long as we're honest about what that input is providing
(distinction versus randomness).
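
As a hedged illustration of the personalization-string idea (the inputs chosen
here are arbitrary stand-ins for this sketch, not what SP 800-90A mandates):

```python
import hashlib
import os
import socket
import time

def personalization_string() -> bytes:
    """Gather distinguishing (NOT secret, NOT random) inputs. Per the
    SP 800-90A idea they provide distinction, not entropy: they keep two
    instances from producing identical output if seeding goes wrong."""
    parts = [
        socket.gethostname().encode(),
        str(os.getpid()).encode(),
        str(time.time_ns()).encode(),
    ]
    return hashlib.sha256(b"\x00".join(parts)).digest()

# Feed this alongside the real seed material when instantiating a DRBG.
ps = personalization_string()
```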

Also: the author notes that some of the random sources they tested are highly
biased. While biased output from a random source isn't GREAT, it isn't
necessarily a showstopper. The key thing for security in this context is not a
lack of bias, but a lack of predictability. Suppose I have a biased random
source, with a 1/3 probability of outputting a 1, and a 2/3 probability of
outputting a 0. If there's nobody out there who can predict the NEXT output
with probability greater than 2/3 (i.e., it's not backdoored or subject to
some nasty attack), then it's just fine as a random source. We just can't
treat each bit we read as providing a "full bit" of randomness to the system. In
the example above, you could just seed your 256-bit PRNG with about 280 bits
of biased input. Alternatively, you could just do the old "read twice: discard
if results match; otherwise take the first value" trick to get an unbiased
source.
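
The "read twice" trick is the classic von Neumann extractor. A small Python
simulation of it, using the 1/3-biased source from the example above:

```python
import random

def von_neumann(bits):
    """Von Neumann extractor: read bits two at a time, discard matching
    pairs, and emit the first bit of each unequal pair. For independent
    samples with any fixed bias p, P(0,1) == P(1,0) == p*(1-p), so the
    output is unbiased."""
    it = iter(bits)
    for a, b in zip(it, it):
        if a != b:
            yield a

rng = random.Random(0)
# Biased but independent source: P(1) = 1/3, P(0) = 2/3.
biased = [1 if rng.random() < 1 / 3 else 0 for _ in range(100_000)]
out = list(von_neumann(biased))

# Shannon entropy of this source is about 0.918 bits per input bit, which
# is where the "~280 biased bits to seed 256 bits" figure comes from.
```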

Also, it's important to remember that "entropy" in these conversations is used
in a squishy way, and it's easy to mix up different definitions of the term.
Folks talking about "entropy" in the context of PRNGs usually mean
unpredictability, unstructuredness, unrecoverability, or some combination
thereof. They typically do _not_ mean Shannon entropy, which is essentially a
measure of UNIFORMITY of output. If you feed the numbers 0 through 255-- in
order-- into an entropy calculator, it will report exactly 8 bits of entropy,
even though the input was clearly structured and predictable. That's why I've
tried to avoid using the term "entropy" and focus on "random" or "randomness".
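
A quick way to see this, using a toy entropy calculator over byte frequencies:

```python
import math
from collections import Counter

def shannon_entropy_bits(data: bytes) -> float:
    """Shannon entropy in bits per byte, from empirical byte frequencies.
    This measures uniformity only; it is blind to ordering and structure."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

ordered = bytes(range(256))  # 0, 1, 2, ..., 255: perfectly predictable
assert shannon_entropy_bits(ordered) == 8.0  # maximal, despite zero surprise
```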

As for backdoors-- it's theoretically possible that things like RDRAND or a
USB key or whatever can be compromised. Standard, non-dedicated randomness
inputs (like keystroke timing, network packet arrival times, disk I/O info,
interrupt timing, clock drift, etc.) are still included in OS PRNGs, and PRNG
state is periodically updated to fold in that randomness. While it isn't EASY
for a hardware backdoor to overcome this, it's theoretically possible, and it
only takes a small amount of influence to create some devastating effects. Dan
Bernstein wrote an article back in 2014 about the hardware backdoor idea; one
very simple suggestion that he made was to simply design cryptosystems to use
LESS random data (his Ed25519 system generates nonces deterministically, for
instance).

~~~
tialaramex
> you could just do the old "read twice: discard if results match; otherwise
> take the first value" trick to get an unbiased source.

Huh?

If I have a legitimately random source that gives 00, 10 or 11 entirely at
random (but never 01 which is why we're filtering it), and you feed it through
your proposed process to get a 16-bit number you always get 1111111111111111
which is clearly no sort of random.

~~~
cipher_314159
I suppose I should have made it explicit that this construction relies on the
unpredictability of SINGLE-bit reads. I had actually considered some wording
about that and the independence of each sample, but figured I'd sound
pedantic.

Under the system you outlined, that single-bit unpredictability condition
doesn't hold, so you're right that the construction totally breaks down. Given
a starting bit of 0, you can predict the next bit with absolute certainty.

For something like your random source, it would be best to just skip every
other bit and look at the result as a biased bit generator. In that case, the
construction works: you would have (0, 1) and (1, 0) each happening with
probability 2/9, while the matching sets (0, 0) and (1, 1) happen with a
combined probability 5/9. That gives (0, 1) and (1, 0) as equally likely
outputs, so just consistently take one of them, and you have an unbiased
source.
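
A small simulation of this repair, with the hypothetical pair source described
above (00, 10, 11 equally likely, 01 never):

```python
import random

rng = random.Random(1)
# The correlated source: pairs 00, 10, 11, each with probability 1/3.
pairs = [rng.choice([(0, 0), (1, 0), (1, 1)]) for _ in range(90_000)]

# Skip every other bit: keep only the first bit of each pair. These are
# independent samples with P(1) = 2/3.
firsts = [p[0] for p in pairs]

# Von Neumann on the decorrelated stream: (0, 1) and (1, 0) each occur
# with probability 2/9, so taking the first bit of unequal pairs is fair.
it = iter(firsts)
out = [a for a, b in zip(it, it) if a != b]
```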

But what you mention DOES have some relevance to bit generators, too. One of
the Bernstein scenarios deals with a hypothetical backdoored RDRAND
instruction on x86. The basic idea is that the instruction is designed with
some understanding of the system the values will be used in, and doing a short
brute-force to see which value would fix the first four bits to a desired
pattern. With the "sample, check unmatched, take first" construction, that
pattern would be SUPER easy to fix.

------
29athrowaway
Buy a hardware source of entropy.

~~~
loeg
If you own an Intel CPU made in the last decade? 15 years? you've already got
one. PowerPC's got the DARN instruction as well.

~~~
29athrowaway
What Theodore Ts'o said about it:

[https://web.archive.org/web/20180611180213/https://plus.goog...](https://web.archive.org/web/20180611180213/https://plus.google.com/117091380454742934025/posts/SDcoemc9V3J)

(This Theodore Ts'o:
[https://en.wikipedia.org/wiki/Theodore_Ts%27o](https://en.wikipedia.org/wiki/Theodore_Ts%27o))

~~~
a1369209993
You obviously should never use a hardware module as your _only_ source of
entropy. What loeg is pointing out is that if you need a hardware RNG (in
_addition_ to your seedfile, RTC, environmental noise, etc), a modern CPU
fills that particular role in the SRNG system without any additional hardware.

