
OpenSSH taking minutes to become available, booting takes half an hour (2018) - zdw
https://daniel-lange.com/archives/152-hello-buster.html
======
notaplumber
It doesn't on OpenBSD. arc4random(3) simply cannot fail, and getentropy(2)
does not block. Linux screwed up getrandom. High-quality random numbers are
available very early in the OpenBSD kernel, and there is no early-boot problem
for userland.

Linux has an opportunity to learn from OpenBSD.

[https://www.openbsd.org/papers/hackfest2014-arc4random/index...](https://www.openbsd.org/papers/hackfest2014-arc4random/index.html)

~~~
deathanatos
Seems like they store entropy in an on-disk file[1], and re-read that file on
startup. This feature is available in Linux as well (see urandom(4) for an
example of how to do it manually, though I think most distros bundle this in
some manner); although apparently, according to the OP, it doesn't fully work?

Seems like the man page sort of notes the problem in the article:

> _Writing to /dev/random or /dev/urandom will update the entropy pool with
> the data written, but this will not result in a higher entropy count._

But, why not, man page, why not? I feel like if I, as root, write to
/dev/random, I am saying that this is acceptable input for seeding the
generator, and that it is on me not to, say, seed it with the exact same stuff
every boot, or seed it with straight nuls, no?
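The distinction the man page is drawing can be made concrete: a plain write(2) to /dev/random only mixes the data in, while the privileged RNDADDENTROPY ioctl both mixes and credits the entropy counter. A minimal Python sketch (Linux-specific constants; actually issuing the ioctl requires root):

```python
import struct

def _IOW(ioc_type: str, nr: int, size: int) -> int:
    # Linux ioctl request encoding: direction | payload size | type | number
    IOC_WRITE = 1
    return (IOC_WRITE << 30) | (size << 16) | (ord(ioc_type) << 8) | nr

# struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }
RNDADDENTROPY = _IOW('R', 0x03, struct.calcsize("ii"))

def entropy_payload(seed: bytes, bits: int) -> bytes:
    # Payload for ioctl(fd, RNDADDENTROPY, ...): entropy credit in bits,
    # buffer length in bytes, then the seed itself. Unlike a plain
    # write(2) to /dev/random, the ioctl also credits the entropy count.
    return struct.pack("ii", bits, len(seed)) + seed

# Actually crediting requires root, roughly:
#   import fcntl
#   with open("/dev/random", "wb") as f:
#       fcntl.ioctl(f, RNDADDENTROPY, entropy_payload(seed, 8 * len(seed)))
```

This is the mechanism tools like rng-tools use to credit entropy from a seed they consider trustworthy.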

How does OpenBSD guarantee that this file exists, though? Perhaps making it a
system default is wise, but what about brand new VMs/installs? Does the
install just take the time to do the first generation of that file?

The OpenBSD folks also appear to trust RDRAND, which would help with not
blocking. But I disagree with that trust decision.

[1]: start reading here:
[https://www.openbsd.org/papers/hackfest2014-arc4random/mgp00...](https://www.openbsd.org/papers/hackfest2014-arc4random/mgp00019.html)

~~~
notaplumber
> The OpenBSD folks also appear to trust RDRAND, which would help with not
> blocking. But I disagree with that trust decision.

It feeds the entropy pool, mixed in with many sources. OpenBSD also feeds in
data from the AMD CCP as well. The point is it doesn't matter.

[https://man.openbsd.org/ccp](https://man.openbsd.org/ccp)

And for anyone running "echo badapples > /dev/random" in a loop: it still
doesn't matter.

[https://man.openbsd.org/random.4](https://man.openbsd.org/random.4)

[https://man.openbsd.org/arc4random.9](https://man.openbsd.org/arc4random.9)

> How does OpenBSD guarantee that this file exists, though?

If you keep reading, OpenBSD's random subsystem _always mixes old with new
data from many sources_. If the random.seed file is not available (on boot
media, for example), it is not fatal; a warning is displayed. The system rc(8)
scripts handle creating the seed file at both boot and shutdown.

[https://www.openbsd.org/papers/hackfest2014-arc4random/mgp00...](https://www.openbsd.org/papers/hackfest2014-arc4random/mgp00019.html)

~~~
deathanatos
> _If you keep reading, OpenBSD's random subsystem always mixes old with new
> data from many sources._

Sure. But say we don't trust RDRAND. It's the first boot, so you haven't got a
seed file yet. Assuming you haven't got a HW generator at hand, what other
sources are there that don't take significant time to collect from? (I.e., I
know you can get entropy from network timings, keyboard/mouse if they exist,
etc., but that takes time, and a call that doesn't want to return before
sufficient data exists would have to block. If you're only mixing in unready
or untrusted sources, you might get _lucky_ and make it hard for an attacker;
or, worse, if your other sources are blocking because they're collecting
data, you might not. Either way it hardly seems principled?)

I'd be willing to write off the first boot, except that I feel like a lot of
things do potentially get initialized then; e.g., in a VM in the cloud, I
think that's when host keys are generated, and certainly whatever application
that VM might be hosting could have further requirements. Not blocking and
issuing a warning would be to supply data prior to having sufficiently
initialized the generator, no?

~~~
ori_b
> Sure. But say we don't trust RDRAND. It's the first boot, so you haven't got
> a seed file yet.

Why not? The install media should be collecting entropy as it does the
install, xoring it with a pool of random data built into it, and writing that
to disk as part of the install.

Take a look at
[https://github.com/openbsd/src/blob/master/distrib/miniroot/...](https://github.com/openbsd/src/blob/master/distrib/miniroot/install.sub),
specifically, feed_random and store_random

~~~
deathanatos
Okay, that's fair for install media installing to a physical machine such as a
desktop.

What about virtual machines / machines booting off an image? We can't put a
seed in the image, or it'll get distributed to all downstream consumers.

Also, there's the case posed in response to my first comment, about IoT
devices which would ship with some factory-installed image.

(I suppose there are some novel ways one _could_ work around this, such as
keeping a pool of one-shot images ready with just the seed added to them, but
I don't feel like this is how real-world systems work. E.g., an AMI in AWS?)

~~~
ori_b
> _What about virtual machines / machines booting off an image? We can't put a
> seed in the image, or it'll get distributed to all downstream consumers._

Mitigations within mitigations.

Having one random seed per VM image that gets installed is better than having
nothing at all -- now, an attacker needs to have your install image. Then the
hardware RNG and the virtio random drivers help mix into that seed.

Given that presumably you've already booted the image for testing, you've
probably already generated the SSH keys that you need to be concerned about --
so you'd probably need to take care to regenerate them securely in any case,
from a running system that has started to gather entropy from all the sources
it can get its hands on, including the boot loader.

And, ideally part of the imaging process would write some random data to the
image. But I agree, most people wouldn't even think of doing that.

------
bcoates
Is it cryptographically possible to not require randomness for an SSH server
past initial key creation? If so, it'd be worth doing; quality entropy is a
hard ask for embedded/virtual systems.

~~~
wahern
You need randomness for ephemeral keys, both asymmetric and symmetric.

Quality entropy isn't a hard ask, at least not for anything typically running
OpenSSH or other server software. Intel has RDRAND, and even where AMD's
RDRAND is broken their PSP coprocessors provide an entropy function.
Similarly, NICs and other controllers also often come with RNGs. RNGs abound
on modern embedded systems, actually; it's just that nobody has the full-time
job of plugging them into the kernel's PRNG pool. It's not a full-time job for
any OpenBSD developer either, but they do seem to do a better job of this
than Linux; oftentimes the only supported feature of a miscellaneous system
component is its RNG.

It's the APIs that are broken. getrandom shouldn't block, period. A system
only needs 16-32 bytes of pure hardware randomness for _strong_ security.
That's it! You either have it shortly after boot, or you don't. If you don't,
you're screwed anyhow, so why block? If you have 32 bytes of good entropy, all
the entropy-accounting mumbo-jumbo is pointless.
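For what it's worth, the non-blocking behavior being argued for is already expressible through the syscall's flags; a sketch using Python's os.getrandom wrapper (Linux-only):

```python
import os

def get_seed(n=32):
    # GRND_NONBLOCK makes getrandom(2) fail with EAGAIN instead of
    # blocking when the kernel pool is not yet initialized.
    try:
        return os.getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        # Pool not yet initialized: the caller decides whether to wait,
        # fall back, or fail, rather than silently hanging at boot.
        return None

seed = get_seed()
```

After the pool initializes once, the call never blocks again, so the None branch is only ever reachable in early boot.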

I do take issue with the author's complaint that systemd shouldn't have its
own user-space PRNG. BSD systems have arc4random() as part of their libc,
which is seeded from the kernel pool. This is very convenient; developers
shouldn't have to think twice about calling into a PRNG for a 32-bit number,
but if acquiring that requires a syscall they do think twice, and often screw
things up. Not to mention that Linux getrandom's default blocking semantics
are broken by design.[1] Until glibc, musl libc, and other Linux runtimes
wise up and add arc4random, it's hard to blame projects like systemd for
including their own PRNG.

[1] I realize that getrandom has an option to _not_ block, but its only
function is to cast suspicion on itself, when in fact the only thing that
deserves suspicion is entropy guesstimators.
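The arc4random-style pattern being described (one kernel seed at startup, then syscall-free generation) looks roughly like this. Note the real arc4random uses ChaCha20 with periodic rekeying; this SHA-256 counter construction only illustrates the structure:

```python
import hashlib
import os

class UserspacePRNG:
    """Sketch of the libc-arc4random / systemd-internal-PRNG pattern:
    seed once from the kernel, then generate locally so that fetching a
    32-bit random number does not cost a syscall."""

    def __init__(self):
        self._key = os.urandom(32)   # one kernel round-trip at startup
        self._ctr = 0

    def bytes(self, n: int) -> bytes:
        # Hash(key || counter) in counter mode; illustrative only.
        out = b""
        while len(out) < n:
            block = self._key + self._ctr.to_bytes(8, "little")
            out += hashlib.sha256(block).digest()
            self._ctr += 1
        return out[:n]

    def u32(self) -> int:
        return int.from_bytes(self.bytes(4), "little")
```

A production version would also rekey after fork() and erase consumed key material for forward secrecy, which is exactly the bookkeeping libc can do once so every application doesn't have to.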

~~~
Hello71
> A system only needs 16-32 bytes of pure hardware randomness for strong
> security. That's it!

Correct.

> You either have it shortly upon boot, or you don't.

Not correct. On virtualized systems, with minimal interrupts, there is
frequently _not_ enough entropy available. This comment seems to reflect a
poor understanding of the getrandom(2) system call and its history; part of
the reason for implementing getrandom was to provide this new
behavior of no longer blocking once enough entropy has been collected for a
secure CSPRNG initialization. Linux needs this "entropy accounting mumbo
jumbo" because it doesn't have a standard mechanism for persisting random
seeds across boots; OpenBSD doesn't need it because the bootloader and kernel
are tightly integrated, so they can easily implement this feature.

~~~
wahern
virtio-rng has been around since 2013. If a hypervisor doesn't support passing
through entropy, then it's broken. Period.
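For reference, wiring virtio-rng up is a one-liner on the host. A sketch with QEMU (the rate-limit values here are arbitrary examples):

```shell
# Host side: expose the host's RNG to the guest via virtio-rng,
# optionally rate-limited so one guest can't drain the host.
qemu-system-x86_64 \
    -object rng-random,id=rng0,filename=/dev/urandom \
    -device virtio-rng-pci,rng=rng0,max-bytes=1024,period=1000 \
    ... # remaining machine options

# Guest side: check that the device showed up
cat /sys/class/misc/hw_random/rng_available
```

Libvirt exposes the same thing as an `<rng model='virtio'>` element, and most cloud hypervisors can provide the equivalent device.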

Entropy accounting can't fix the underlying problems here; all it does is
obscure and confuse. Hypothetically, and in a very technical sense, it can be
useful and even necessary. But in practice it simply has no place outside of
the actual hardware-based entropy-generating devices. If you can't quickly
seed yourself with 32[1] bytes of cryptographically strong entropy, then
you're screwed, period. Blocking doesn't improve security; it just induces
people and developers to implement awkward workarounds with the net effect of
drastically reducing security.

When Linux added the getrandom syscall they _should_ have dropped support for
blocking. It was patterned after OpenBSD's getentropy, which doesn't block;
neither did Linux's long-deprecated sysctl() random-UUID mechanism that many
programs once relied upon (like Tor). But they, and Ts'o in particular, seem
unable to resist the siren call of entropy guesstimation.

[1] Even 16 bytes is enough to seed the system pool for an indefinite period,
at least relative to a system without a strong hardware RNG. And that's the
point. There are reasons for why a component might need ongoing sources of
strong entropy, but if a system can't even provide 16-32 bytes at boot then
those arguments are purely hypothetical because there clearly aren't sources
of strong entropy available, anyhow. But if those sources _are_ available,
then it's ridiculous to think they can be "depleted" as a practical matter. If
the CSPRNG pooling functions are broken, then all modern cryptography is
broken, so you gain nothing with the convoluted semantics of Linux's
traditional /dev/random machinations.

------
neilwilson
This affects Debian 10 official cloud images on various clouds, as the image
doesn't handle virtio-rng out of the box.

~~~
notaplumber
It's not just their cloud images; as far as I know, Debian excludes all
virtio drivers (net/disk/rng) from all install media.

It is very frustrating.

~~~
simcop2387
I regularly do net installs of Debian with virtio net and disk. I think this
used to be true but it isn't anymore.

Edit: looks like this has been supported at least as far back as Debian
Lenny, released in 2009.

------
AaronFriel
Is this because there is zero entropy available at boot, and therefore even
/dev/urandom is unavailable? Or is this because legacy tools are still relying
on the concept of being able to "drain" the entropy pool? I'm not sure which
of these the getrandom() call relates to.

~~~
kroeckx
The problem with /dev/urandom is that it's always available, even before any
entropy has been added. Changing /dev/urandom to block until enough entropy
is available at least once would cause breakage like in the article. That is
one of the reasons getrandom() was added: so that existing applications can
keep using the old and broken behavior, but applications that care about the
RNG being properly initialized can wait for it. So software that does crypto
switched to it, and then things broke anyway.

~~~
cat199
> Fixing /dev/urandom to block until enough entropy is available once causes
> breakage like in the article.

... not to mention that this is the entire point of /dev/urandom. If you want
something to block for entropy, you use /dev/random. If you are using
/dev/urandom, you are saying "I want something 'randomly random', and I'm
okay if I can't get something random enough."

------
plq
I'm so happy I've stayed with Gentoo and never had to bother with this systemd
thing that everyone is talking about.

~~~
kroeckx
systemd didn't change much related to this. The seed file never got credited,
at least not in Debian. So either you have the same problem on Gentoo, or
Gentoo has other problems.

------
wiredfool
I’ve had docker take 5 minutes to start on a cloud server due to entropy
starvation. Iirc it’s somewhat kernel-version dependent, between the later
4.x kernels and the older Xenial default kernel.

------
DonHopkins
TIL Linux launches a built-in denial-of-service attack on itself every time it
reboots!

------
dfeojm-zlib
1\. /etc/ssh/sshd_config: UseDNS no

2\. If haveged isn't perfect enough, set up a hw TRNG (e.g., EntropyKey,
BitBabbler, InfiniteNoise) and EGD or EntropyBroker.

[https://everipedia.org/wiki/lang_en/Comparison_of_hardware_r...](https://everipedia.org/wiki/lang_en/Comparison_of_hardware_random_number_generators)

[http://egd.sourceforge.net](http://egd.sourceforge.net)

[https://www.vanheusden.com/entropybroker](https://www.vanheusden.com/entropybroker)

3\. It may be something else.

~~~
emmelaich
> UseDNS no

It is not sufficient by itself to ensure that no part of the SSH connection
and authentication uses DNS.

See [https://unix.stackexchange.com/questions/56941/what-is-the-p...](https://unix.stackexchange.com/questions/56941/what-is-the-point-of-sshd-usedns-option)
and [https://serverfault.com/questions/576293/sshd-tries-reverse-...](https://serverfault.com/questions/576293/sshd-tries-reverse-dns-lookups-with-usedns-no)

~~~
jwilk
And "UseDNS no" is the default anyway.

------
peeters
The most fun bug I've ever fixed was when we had builds time out while running
unit tests, but it would never happen on our dev machines. Unless we let it
run while grabbing a coffee. Or if we just sat there watching the screen
without touching the mouse. But as soon as we touched the mouse it would
resume.

...tl;dr: a quick change to a Java property so that SecureRandom uses
/dev/urandom instead of /dev/random, and problem solved.
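For anyone hitting the same thing, the usual fix is the JVM's entropy-source property. The odd /dev/./urandom spelling works around old JDKs that special-cased the literal string file:/dev/urandom and used /dev/random anyway:

```shell
# Point the JVM's SecureRandom at the non-blocking pool.
java -Djava.security.egd=file:/dev/./urandom -jar app.jar

# Or set it system-wide in $JAVA_HOME/conf/security/java.security:
#   securerandom.source=file:/dev/./urandom
```

Recent JDKs no longer need the workaround spelling, but it remains harmless to include.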

------
0815test
Have you tried wiggling your mouse? It worked wonders for me, back when I was
on Windows 95!

~~~
DonHopkins
Maybe Amazon can hook up a Mechanical Turk service to AWS so real live people
can generate entropy by wiggling mice over the internet.

~~~
jtl999
Hope the connection doesn't get MITM'd and have the entropy replaced with bias
:)

------
stevenicr
Had similar years ago on a box and found out the disk(s) were 99.99% full -
took forever to whittle it down.

~~~
hnarn
No monitoring? :(

------
codebeaker
Needs at least a (2018) in the headline (the article date). Although it
reports an issue introduced to Linux in 2013 which was "discovered" in 2017.

~~~
marcosdumay
The systemd bug thread was started in 2016, closed as not-a-bug still in
2016, and has since been referenced by a lot of other bugs claiming wontfix
because the problem is this one "not a bug".

There are posts as recently as this year.

~~~
pmlnr
> closed as not a bug

I can't even count these anymore; systemd closed it as "not a bug". I have
angry words in my head.

------
buildzr
Very disappointing to hear that Debian and some other distros enabled
CONFIG_RANDOM_TRUST_CPU.

I'd rather not be relying on Intel hardware to do something so important
correctly.

~~~
pjc50
And yet you trust it to run all your other instructions? This is kind of a mad
threat model. The processor can see everything, change anything, and do so at
any time.

~~~
buildzr
That's fair, I suppose; the difference being that there are a lot more ways
for an instruction backdoor to screw up and become extremely visible than
for RDRAND, where the output is claimed to be unpredictable.

In the end it's a defense-in-depth approach: I simply think the original and
default Linux kernel policy of demanding sufficient entropy from other means
is preferable from a security perspective.

Theodore Ts'o has some good posts on this on the kernel mailing lists. Due to
the way these sources get mixed, it would only improve security to turn this
option off.

