
Really Fixing Getrandom() - Tomte
https://lwn.net/SubscriberLink/802360/06e2457983b56edb/
======
Nican
This is a continuation of this discussion "Fixing Getrandom()"
([https://news.ycombinator.com/item?id=21114524](https://news.ycombinator.com/item?id=21114524)).

I love this story: a completely unrelated kernel commit that reduced the
amount of file I/O caused the kernel to not generate enough entropy at boot
and, in consequence, made the system unbootable.

------
tytso
My response to Linus:

I'm OK with this as a starting point. If a jitter entropy system allows us to
get past this logjam, let's do it. At least for the x86 architecture, it will
be security through obscurity. And if the alternative is potentially failing
where the adversary can attack the CRNG, it's my preference. It's certainly
better than nothing, in that in the very worst case, "security through
obscurity" is better than "don't block, because blocking is always worse than
a guessable value being returned through getrandom(0)". And so if these are
the only two options, I'm certainly going to choose the former.

That being said, I'm still very worried that people with deep access to the
implementation details of a CPU might be able to reverse engineer what a
jitter entropy scheme produces. For that reason, I'd very much like to see
someone do an analysis of how well these jitter schemes work on an open,
simple architecture such as a RISC-V implementation (you know, the ones that
were so simplistic and didn't have any speculation so they weren't vulnerable
to Spectre/Meltdown). If jitter approaches turn out not to work that well on
RISC-V, perhaps that will be a goad for future RISC-V chips to include the
crypto extension to their ISA.

In the long term (not in time for the 5.4 merge window), I'm convinced that we
should be trying as many ways of getting entropy as possible. If we're using
UEFI, we should be trying to get it from UEFI's secure random number
generator; if there is a TPM, we should be trying to get random numbers from
the TPM, and mix them in, and so on.

After all, the reason why we lived with getrandom(0) blocking for five years was
because for the vast majority of x86 platforms, it simply wasn't a problem in
practice. We need to get back to that place where in practice, we've harvested
as much uncertainty from hardware as possible, so most folks are comfortable
that attacking the CRNG is no longer the simplest way to crack system
security.

(The above is a lightly edited amalgamation of
[https://lore.kernel.org/r/20190930033706.GD4994@mit.edu](https://lore.kernel.org/r/20190930033706.GD4994@mit.edu)
and
[https://lore.kernel.org/r/20190930131639.GF4994@mit.edu](https://lore.kernel.org/r/20190930131639.GF4994@mit.edu))

~~~
pjc50
> "don't block, because blocking is always worse than an guessable value being
> returned through getrandom(0)".

It seems that this fight has continued for so long because both possible
answers are valid here, depending on the exact use case of the system as a
whole(+)? Have we got an interface which lets you properly specify the
paranoia vs. best-effort choice yet?

(+) e.g. the SSH server running on your lightbulb may accept that it's not
exposed to the internet and therefore not subject to targeted attack, but it's
really inconvenient to sit in the dark, so the security-availability tradeoff
is worth it.

> most folks are comfortable that attacking the CRNG is no longer the simplest
> way to crack system security.

Was it ever? It seems that phishing the admins has always been the easiest
way.

~~~
justincormack
If your lightbulb is running ssh not telnet it is assuming some source of
entropy, or it may as well be running telnet. There has been some work on
hardening handshakes on protocols where randomness might not be available that
it could look at.

~~~
derefr
Does that really need to be true? Protocols with fixed keys can skip Diffie-
Hellman, right?

As such, is there an option to configure OpenSSH such that it has a fixed
authorized_keys, and does the authentication handshake in the opposite order:
_first_ establishing that it can talk to the client by decrypting the messages
the client is sending; and _then_ parsing the client’s auth commands from said
decrypted stream, where one SSH AUTH message might be “auth me using the very
key we’re conversing over.”

If there is, I’m surprised IoT devices don’t go for it. If there isn’t... why
not?

~~~
tialaramex
Now that I'm in front of an actual PC, let's explain a bit more.

It's absolutely critical in SSH that we end up with a unique session ID which
will be the output of a cryptographic hash function (these days maybe SHA-256)
but obviously the input to that function must be secret. Everything in the
higher level parts of SSH assumes that there is a secret unique session ID.
The session ID stays the same for the lifetime of a connection, although you
can run key agreement itself again if the connection is long-lived or moves a
lot of data so that it's unwise to keep using the same symmetric keys.

If you do any variant of DH obviously this session ID is the end result of the
first DH key agreement process and so it'll be different every time because
you're using random numbers.

But if you want to add a "fixed asymmetric keys" mode you're going to need to
agree on a secret session ID for each new connection... somehow.

At some cost you could pick at random and send it from one party to the other,
but then we're exactly back where we started about how we're assuming we have
a good source of entropy.

If we just pick any fixed value (it might as well be 0000 0000 0000 0000 and
so on), then obviously a bad guy can break everything and we should have just
used telnet.

------
lathiat
I had a great bug in libvirt fixed where two machines being deployed in an
automated fashion generated the same MAC address for the virbr0. And at least
one other person had reported the same issue previously on a mailing list.

Details from the bug I filed
([https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/17103...](https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1710341)):

src/util/virrandom.c:virRandomOnceInit seeds the random number generator using
this formula: unsigned int seed = time(NULL) ^ getpid();

This seems to be a popular method after a quick google, but it's easy to see
how this can be problematic. The time is only in seconds, and when relatively
identical systems boot at about the same time, which is quite likely in
cloud-like environments, both numbers are likely to be very similar across
systems. Secondly, by using bitwise XOR (the `^` operator) only a small
difference is created, and if the 1st or 2nd MSB of the pid or time are 0 then
it would be easy to have colliding values.

Though this is problematic from basic logic alone, I also tested it with a
small test program trying 67,921 unique combinations of time() and getpid(),
which produced only 5,693 distinct seeds using PID range 6799-6810 and time()
range 1502484340 to 1502489999.

During the actual incident in question, the 4 systems were all booted within
1-2 seconds of each other. We can see from dmesg that the two systems that
generated the same MAC did in fact boot during the same second and the other
two did not
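
A rough way to re-run that kind of count, as a sketch only: it assumes the seed is exactly `time(NULL) ^ getpid()` truncated to 32 bits and reuses the PID and time ranges quoted above. The function name `count_seeds` is made up for illustration; this is not the test program from the bug report, and its totals may differ slightly from the figures quoted.

```python
def count_seeds(times, pids):
    """Return (combinations tried, distinct seeds) for seed = time ^ pid."""
    seeds = {(t ^ p) & 0xFFFFFFFF for t in times for p in pids}
    return len(times) * len(pids), len(seeds)


# Ranges quoted in the bug report, treated here as inclusive at both ends.
times = range(1502484340, 1502489999 + 1)
pids = range(6799, 6810 + 1)

combos, distinct = count_seeds(times, pids)
print(f"{combos} combinations -> {distinct} distinct seeds")
```

The PIDs in that range differ only in their low few bits, so XOR just shuffles the low bits of the timestamp and most (time, pid) pairs land on a seed that another pair already produced.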

~~~
pjc50
Reminds me of mjg's recent attack on Bird:
[https://mjg59.dreamwidth.org/53258.html](https://mjg59.dreamwidth.org/53258.html)

> Digging through the code revealed 8 bytes worth of key fairly quickly, but
> the other 8 bytes were less obvious. I finally figured out that 4 more bytes
> were the value of another Bluetooth variable which could be simply read out
> by a client. The final 4 bytes were more confusing, because all the evidence
> made no sense. It looked like it came from passing the scooter serial number
> to atoi(), which converts an ASCII representation of a number to an integer.
> But this seemed wrong, because atoi() stops at the first non-numeric value
> and the scooter serial numbers all started with a letter[2]. It turned out
> that I was overthinking it and for the vast majority of scooters in the
> fleet, this section of the key was always "0".

------
vanni
Somewhat relevant: CPU Time Jitter Based Non-Physical True Random Number
Generator by Stephan Müller

[http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html#toc-Ap...](http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html#toc-Appendix-B)

[https://pdfs.semanticscholar.org/af73/17c970fd416646b2e46659...](https://pdfs.semanticscholar.org/af73/17c970fd416646b2e46659c9624108be4fcc.pdf)
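
To give a feel for the general idea only, here is a toy sketch that hashes the low bits of successive timing deltas. It is an assumption-laden illustration, not the algorithm from the paper above or from the kernel: `jitter_sample` is a made-up name, there is no entropy estimation or health testing, and the output should not be treated as cryptographically vetted.

```python
import hashlib
import time


def jitter_sample(rounds=2048):
    """Hash the low byte of successive timing deltas (toy illustration)."""
    h = hashlib.sha256()
    last = time.perf_counter_ns()
    for i in range(rounds):
        # A little varying work so iterations do not all take the same time.
        _ = sum(range(i % 64))
        now = time.perf_counter_ns()
        h.update(((now - last) & 0xFF).to_bytes(1, "big"))
        last = now
    return h.digest()


print(jitter_sample().hex())
```

Whether those deltas are actually unpredictable to someone who knows the CPU in detail is exactly the open question raised elsewhere in this thread.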

~~~
mangix
FWIW this is used in OpenWrt.

------
marios
I am by no means qualified to judge on the quality of the patch, but I'm glad
Linus is moving away from adding flags to getrandom() (resulting in subtly
different behaviour than the Open/FreeBSD implementations). Now if only Linux
fixed the "/dev/random provides higher quality random numbers than
/dev/urandom and blocks when the entropy pool is empty" nonsense, that would
be great.

~~~
thenewnewguy
What's the issue there? As far as I can tell that's intended behavior - for
most randomness /dev/urandom _never_ blocks, and for the paranoid /dev/random
blocks when there's no entropy.

~~~
toast0
Classically, on Linux you had /dev/urandom which always gave you something,
even if the system hadn't achieved a seeded state, and /dev/random which would
block in case the system wasn't seeded and also in case random had been used
too much without more input into the entropy.

Neither one of those is usable for key generation. urandom may give you
repeatable data if the system hasn't seeded itself yet. random may block on
accounting that most experts find problematic.

getrandom() finally provided the right semantics of only blocking when the
system isn't seeded, and without a filesystem/device node dependency; however
changes in startup software, including filesystem improvements have resulted
in shortened boot sequences and less entropy gathered. In some cases, systems
were blocked waiting on entropy before any (non human interfaced) entropy
sources were enabled.
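
For reference, the "only block when not seeded" behaviour can be probed from userspace. A minimal sketch, assuming Linux and Python 3.6+, where `os.getrandom()` and `os.GRND_NONBLOCK` wrap the syscall; the helper name `seed_bytes` and the `os.urandom` fallback for other platforms are illustrative choices, not equivalents.

```python
import os


def seed_bytes(n=16):
    """Return n random bytes without blocking, or None if the kernel CRNG
    has not been seeded yet."""
    if not hasattr(os, "getrandom"):
        # Not Linux: fall back to os.urandom (an assumption for portability).
        return os.urandom(n)
    try:
        return os.getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        # Pool not initialized; the caller decides whether to wait or fail.
        return None


key = seed_bytes()
print("not seeded yet" if key is None else key.hex())
```

With flags set to 0 instead of `GRND_NONBLOCK`, the same call would wait until the pool is initialized, which is the early-boot blocking discussed here.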

------
derefr
So, the big outlier in terms of TSC predictability would be virtual machines,
no? Which VM hypervisors pass through RDTSC to the host (and then, presumably,
add an offset before returning it, such that the VM can start at TSC=0); and
which ones just have a “virtual” TSC unrelated to the host’s? Do instances on
any production IaaS compute cloud have predictable virtual TSCs?

------
ChrisSD
Using CPU jitter is a clever solution. I hope it stands up to scrutiny.

> but mixing entropy from the hardware with other sources is considered to be
> safe by most, even if the hardware generator is somehow suspect.

To be clear, it's a property of bitwise xor that it preserves entropy. If you
xor a random bit with 1 you still have a random bit.

~~~
dooglius
Well, that's not entirely true: even if two random bits are independent,
xoring them only gives you one random bit. What actually happens in Linux is
that each new input of random bits gets pushed through a cryptographic hash
before being xor'd.

~~~
ChrisSD
From replies it seems my two sentences weren't as clear as I thought. I'll try
to refine it:

A property of bitwise xoring random variable `x` with a constant (or otherwise
non-random but independent) value is that the entropy of `x` is preserved in
the result.

~~~
WAHa_06x36
However, xoring two arbitrary random variables is NOT guaranteed to preserve
entropy. Preserving entropy only happens under certain specific conditions.

~~~
ChrisSD
Under what circumstances would xoring `x` and `y` not produce a result with
entropy at least as great as `x`?

~~~
WAHa_06x36
In the very obvious cases where x == y, or x == not y, or x == y + 1, or many
other correlations.
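
A small worked example of both claims, as a sketch: it takes `x` uniform over one byte and measures the Shannon entropy of `x ^ y` for a constant `y` and for the correlated choices listed above. The helper `entropy_bits` and the constant `0x5A` are arbitrary illustrations.

```python
import math
from collections import Counter


def entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


xs = range(256)  # x uniform over one byte, i.e. 8 bits of entropy

h_const = entropy_bits(x ^ 0x5A for x in xs)              # y is a constant
h_same = entropy_bits(x ^ x for x in xs)                  # y == x
h_not = entropy_bits(x ^ (~x & 0xFF) for x in xs)         # y == not x
h_plus1 = entropy_bits(x ^ ((x + 1) & 0xFF) for x in xs)  # y == x + 1

print(h_const, h_same, h_not, h_plus1)
```

The constant case keeps all 8 bits, `y == x` and `y == not x` collapse to a single output value (0 bits), and `y == x + 1` lands in between, well under 8 bits.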

~~~
ChrisSD
Only if y is known to be dependent on x. If y is random but just so happens to
be `not x` or `== x` then that won't make the result any less random.

0 is as random a number as 529890873740477 is.

~~~
WAHa_06x36
Single numbers don't really have entropy at all, so that is not something it
even makes sense to talk about.

------
codedokode
Maybe random values should not be necessary to boot the system? I remember
reading somewhere that systemd needs cryptographically random numbers to start
up and I don't understand why.

------
devit
Well, the proper solution for the future would be for the kernel to save and
restore a random seed using UEFI variables, a similar firmware mechanism or
free space in a boot sector or swap partition.

But of course this doesn't solve the problem of newer kernels on existing
installations with no such stored seed and that don't pass command line
argument telling the kernel where to store the seed.

~~~
mlyle
> Well, the proper solution for the future would be for the kernel to save and
> restore a random seed using UEFI variables, a similar firmware mechanism or
> free space in a boot sector or swap partition.

This needs to be invalidated before numbers dependent on the seed are used,
else you risk attacks where you get someone to generate the same random
numbers on consecutive boots.

In turn, if you crash during bootup at the wrong time, you're left without a
seed.

~~~
devit
Yes, you need to update the seed immediately after reading it and before using
it.

There is no problem with a crash assuming that the seed write mechanism is
crash-safe (which is achievable with block storage, but indeed I'm not sure
whether firmware is properly designed and implemented).
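
A minimal sketch of that ordering on block storage, under stated assumptions: the seed lives in an ordinary file, the replacement is written with the usual temp-file, fsync, rename pattern, and the next seed and the value handed to the caller are derived with SHA-256 under different prefixes. The function name and the derivation are hypothetical, not what any particular kernel or init system does.

```python
import hashlib
import os
import tempfile


def load_and_rotate_seed(path):
    """Read the stored seed, replace it on disk, then return a derived value.

    The on-disk seed is overwritten BEFORE anything derived from the old
    seed is returned, so the same stored seed is not consumed twice.
    """
    with open(path, "rb") as f:
        old = f.read()

    # Different prefixes so the stored successor differs from the value used.
    next_seed = hashlib.sha256(b"next" + old).digest()
    use_now = hashlib.sha256(b"use" + old).digest()

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(next_seed)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX filesystems

    return use_now
```

A more careful version would also fsync the containing directory so the rename itself survives a power loss; that step is omitted here for brevity.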

------
enriquto
Speaking of which,...

does anybody here know a portable, easy to use, and high quality,
deterministic random number generator? It is a requirement that you must get
exactly the same sequence of pseudo-random numbers on any architecture
(depending on a user-settable seed).

I use a hand-crafted (two-line) linear congruential generator, but always feel
a bit uneasy about that.

~~~
ChrisLomont
PCG generators are very good, very fast, and decently random, don't suffer
(AFAIK) from bad seeds like Mersenne Twister, mix well, have small state, and
have selectable periods of about any size. They are vastly better than LCGs.

I use a 64 bit one and a 128 bit one for all sorts of work.

These are not crypto, so don't use them for that. For non-crypto I have seen
no better PRNG than these (and I've followed and written on this space for a
long time).

Code available on github.

[http://www.pcg-random.org/](http://www.pcg-random.org/)
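
For a sense of how small these are, here is a sketch of the 32-bit-output variant (PCG32, XSH-RR), transcribed into Python from the minimal C version as I remember it; treat the constants and steps as something to check against the reference code on the site above. All arithmetic is explicitly masked to 64 or 32 bits, which is what makes the sequence identical on any architecture.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULTIPLIER = 6364136223846793005  # 64-bit LCG multiplier used by PCG32


class PCG32:
    """Non-cryptographic, deterministic PRNG with 64-bit state."""

    def __init__(self, seed, stream=0):
        self.state = 0
        self.inc = ((stream << 1) | 1) & MASK64  # increment must be odd
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self):
        old = self.state
        # Advance the underlying linear congruential generator.
        self.state = (old * MULTIPLIER + self.inc) & MASK64
        # Output permutation: xorshift the old state, then rotate right.
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32


a = PCG32(seed=42, stream=54)
b = PCG32(seed=42, stream=54)
print([a.next_u32() for _ in range(3)] == [b.next_u32() for _ in range(3)])
```

As noted above, this is not suitable wherever the security of the randomness matters.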

~~~
tptacek
When you say "not crypto, don't use them for that", you mean "don't use them
for any situation in which the security of the randomness matters", right?

~~~
ChrisLomont
Yep - which is a lot, if not the vast majority, of random numbers generated.

Also, I'm answering a question by someone that looks like they want precisely
this type of PRNG. There is no need to make a mess with CPRNGs.

------
aidenn0
Has anyone considered fixing systemd to be able to load the on-disk entropy
store sooner? This would be only an issue when booting machines for the first
time (with the exception of diskless machines) if that is fixed.

~~~
tzs
An issue with that is how to deal with things that want random numbers before
the disk is writable. If you read the seed while the disk is read-only, and
then something uses random numbers based on that seed, and then something
causes the boot to abort before the on-disk seed is updated, then the next
boot attempt will reuse the same on-disk seed.

Depending on what random numbers are used for between seeding and writing the
updated seed, that could range from harmless to disastrous.

With some care in how things are ordered during system startup, it could
probably be made safe.

~~~
aidenn0
As long as you order it so the network isn't up before the disk is writable,
then no persistent effects could happen from reusing numbers, right? If you
can't send out a network packet and can't write to the disk, what other
recordable events are there? An attacker writing down numbers observed on the
console?

~~~
viraptor
That assumes your disk is not provided over network. Which may or may not be
true.

------
neilsimp1
Does anybody who is familiar with the BSDs know how those OSes solve this kind
of problem? (I don't know enough to say if this even _could_ be a problem with
their architecture, just curious).

~~~
throw0101a
The boot loaders read entropy off-disk to seed the kernel:

* [https://www.freebsd.org/cgi/man.cgi?loader.conf(5)](https://www.freebsd.org/cgi/man.cgi?loader.conf\(5\)) (see "entropy_cache_load")

When the system is brought down cleanly (init 0 or 6) a seed file is saved.
Also, when the system is started up, a seed file is created in case there's a
crash. OpenBSD's _rc(8)_ :

    
    
      # Push the old seed into the kernel, create a future seed  and create a seed
      # file for the boot-loader.
      random_seed() {
        dd if=/var/db/host.random of=/dev/random bs=65536 count=1 status=none
        chmod 600 /var/db/host.random
        dd if=/dev/random of=/var/db/host.random bs=65536 count=1 status=none
        dd if=/dev/random of=/etc/random.seed bs=512 count=1 status=none
        chmod 600 /etc/random.seed
      }
    

* [https://cvsweb.openbsd.org/src/etc/rc?rev=1.537](https://cvsweb.openbsd.org/src/etc/rc?rev=1.537)

FreeBSD also has a cron job that saves 4KB of entropy every 11 minutes in a
rotating series of files:

* [https://svnweb.freebsd.org/base/head/libexec/save-entropy/](https://svnweb.freebsd.org/base/head/libexec/save-entropy/)

* [https://svnweb.freebsd.org/base/head/libexec/rc/rc.d/random](https://svnweb.freebsd.org/base/head/libexec/rc/rc.d/random)

~~~
beefhash
The installer also leaves a random.seed file so that there's a direct chain
from the very first boot.

~~~
throw0101a
So the main remaining risk is things like (cloud) images where many
people/machines use the same seed. This can be mitigated by stirring in
RDRAND, unique CPU information (serial number?), high-res date/time, and other
possibly unique information (MAC addresses?).
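
A sketch of what "stirring in" could look like, assuming the pieces are simply hashed together with SHA-256. The helper `mix_sources` is hypothetical, RDRAND is left out because it is not reachable from plain Python, and `uuid.getnode()` may itself return a random value if no MAC address is found.

```python
import hashlib
import time
import uuid


def mix_sources(stored_seed, *extra):
    """Hash a stored seed together with per-machine and per-boot values."""
    h = hashlib.sha256()
    for part in (stored_seed, *extra):
        # Length prefix so (b"ab", b"") and (b"a", b"b") hash differently.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


per_machine = uuid.getnode().to_bytes(6, "big")  # 48-bit MAC address
per_boot = time.time_ns().to_bytes(8, "big")     # high-resolution wall clock
mixed = mix_sources(b"seed baked into the image", per_machine, per_boot)
print(mixed.hex())
```

None of these extra inputs is secret; they only make clones of the same image diverge from one another and add nothing against an attacker who can learn the MAC address and boot time.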

~~~
anjbe
If you’re running virtual machines you can just use a virtual RNG device
exposed by the hypervisor.

~~~
wahern
The Virtio RNG driver was merged into Linux 2.6.26, released a decade ago.
AFAIU it _should_ be built into most Linux kernels. I just confirmed on Alpine
3.10 and Ubuntu 18.10 (Cosmic).

The problem is that not all hypervisors provide the device by default. I use
libvirt KVM/QEMU and the default template doesn't include it, but you can add
it. AWS EC2 doesn't provide it, though it doesn't provide any virtio devices.
Parallels Desktop 15 doesn't support virtio-rng, either.

OTOH, OpenBSD's VMD not only supports it but I _think_ it's enabled by
default. (I see it in the source code. I can't find a way to enable/disable
it, but the publicly posted dmesg dumps seem to always show it as found when
an OpenBSD guest boots.)

Of course, most VMs are x86_64-based using hardware extensions and likely
using CPUs providing rdrand. But hypervisors really should provide the virtio-
rng device default, perhaps even unconditionally as OpenBSD apparently does.

~~~
anjbe
> OTOH, OpenBSD's VMD not only supports it but I _think_ it's enabled by
> default.

Yes, my VMD virtual machines have it with no configuration necessary.

    
    
        pvbus0 at mainbus0: OpenBSD
        pci0 at mainbus0 bus 0
        virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
        viornd0 at virtio0
    

Here’s the manpage:
[https://man.openbsd.org/viornd.4](https://man.openbsd.org/viornd.4)

------
magashna
I didn't know that a jiffy was the formal unit for kernel timing. That tickled
me quite a bit.

~~~
denton-scratch
Analogously, a "mickey" is the (informal) unit for timing mouse movements.

------
fjp
How the hell do people get to a skill level where they can do this stuff?

~~~
tenebrisalietum
Do what? Fix getrandom() or take advantage of getrandom() vulnerabilities?

------
cabalamat
When I want a randomness source, I use the output from `ps auxw`, `df`, and
`date`.

~~~
shakna
Ignoring that those may not be great sources for cryptographic randomness, how
does one do something like that when the userland isn't established yet
because the system is still booting?

That's why a working getrandom is so important.

