Linux Kernel 5.3 (omgubuntu.co.uk)
190 points by AlexDGe 9 months ago | 89 comments

An interesting discussion in Linus's release announcement email about userspace regressions: https://lkml.org/lkml/2019/9/15/241

> Do we just make it act like /dev/urandom by default, and add a new flag for "wait for entropy"?

Dear God. The CSPRNG situation on Linux is deeply depressing.

/dev/urandom is useless because it spews non-random data if it hasn't been seeded yet.

/dev/random is useless because it starts blocking if you try to read too much data from it, because of a mistaken belief that a properly seeded CSPRNG can run out of entropy.

Plus they're both slow as hell, so people try to implement their own PRNGs, often having bugs in the generator or seeding, leading to security issues.

Meanwhile the BSDs have handled this correctly for years. But inexplicably, instead of actually fixing /dev/(u)random, the Linux engineers decide to add a new getrandom() syscall which implements the correct behaviour of only blocking if the PRNG hasn't been seeded.

So finally with getrandom() Linux has a way to securely generate random data without unnecessarily blocking, and now Linus seems to be floating the idea to break it again!

The kernel has plenty of ways to securely seed a PRNG at boot on modern systems; IRQ timings, multicore tricks, sensor data, etc. Run some statistical tests on it to ensure you have a couple hundred bits of randomness and you're done.
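Here is a sketch of the kind of statistical test meant here. It assumes NIST SP 800-22's frequency (monobit) test as the choice, and the function names are hypothetical. The test flags a stuck source such as all-1s output. A trivially predictable pattern like the bytes 0 to 255 still passes it, though. So a check like this can reject bad seeds, but it cannot prove that a seed holds a couple hundred bits of entropy.

```python
import math

def monobit_p_value(data: bytes) -> float:
    """NIST SP 800-22 frequency (monobit) test: p-value for the hypothesis
    that ones and zeros are equally likely in `data`."""
    n = len(data) * 8
    if n == 0:
        raise ValueError("need at least one byte")
    ones = sum(bin(b).count("1") for b in data)
    s_obs = abs(2 * ones - n) / math.sqrt(n)  # |#ones - #zeros| / sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def passes_monobit(data: bytes, alpha: float = 0.01) -> bool:
    """Reject the sample if the p-value falls below the significance level."""
    return monobit_p_value(data) >= alpha
```

With these definitions, `passes_monobit(b"\xff" * 32)` is False (the all-1s case), while `passes_monobit(bytes(range(256)))` is True despite the input being completely predictable.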

> So finally with getrandom() Linux has a way to securely generate random data without unnecessarily blocking, and now Linus seems to be floating the idea to break it again!

Yes, getrandom() works pretty much the "right" way. But the problem is that it can still block during boot, indefinitely. And nobody really wants their computer to just stop working because it can't guarantee that the entropy isn't, theoretically, "bad". Real users do not want this. But it happens.

The root of this is security paranoia. Security people didn't want the RDRAND instruction to be trusted. systemd didn't credit the entropy pool when adding the saved seed file from the previous boot, until it very recently gained an option to do so. These things are all mixed into the pool, and on any desktop machine /dev/urandom is absolutely fine, but pressure from security experts has forced these systems not to trust that real entropy has been added by the many sources that are already implemented. You might be surprised how many people make this problem go away by running havaged, which provides very dubious entropy.

> Security people didn't want the RDRAND instruction to be trusted.

The recent AMD issues have shown that you certainly shouldn't trust rdrand blindly. Even using it after running some statistical tests would still have blocked the kernel bootup on affected machines.

> And nobody really wants their computer to just stop working

I would rather have my computer stop working than initiate cryptographic keys from an all-1s seed.

Of course, filling the entropy pool should keep making progress, no matter how slowly (e.g. via the jitter RNG), and eventually unblock the system. But as long as there isn't enough entropy available for userspace, it shouldn't pretend there is.

> You might be surprised how many people make this problem go away by running havaged, [sic] which provides very dubious entropy.

My personal favorite is:

  # rngd -r /dev/urandom -o /dev/random

TBF that's a proper fix for the stupidity that is the entropy estimator.

Whenever I read these discussions, I always see references to BSD vs Linux. The BSD "way" seems cleaner to me, but I'm certainly no expert.

But does anyone know what Windows and OS X (and iOS, for that matter) do to "warm up" entropy?

On macOS, /dev/random never blocks. From the beginning up until 2014 or so, this was blatantly insecure: the only initial entropy was the system clock! In microseconds, not seconds, thankfully, but that's still a very low amount of entropy. securityd in userland would send the kernel more entropy once it came up, but before that point, /dev/random would just spew low-quality random numbers.

Since 2014, however, macOS has expected to get a random seed from the bootloader, which in turn gets it from rdrand if available, or some complicated timer stuff if not. I'm not sure how secure the latter is, but as of Mojave, there are no longer any supported Macs without rdrand, making the issue moot...

On Windows, I believe RtlGenRandom is pretty similar to getrandom() etc. Including the "it won't fail due to failure to open a device" aspect.

> inexplicably, instead of actually fixing /dev/(u)random, the Linux engineers decide to add a new getrandom() syscall which implements the correct behaviour of only blocking if the PRNG hasn't been seeded.

FWIW the OpenBSD folks first implemented getentropy() and recommended that Linux do the same[0]: because device files cause various issues (e.g. chrooting, attacker-controlled FD exhaustion, …), having a reliable syscall is extremely valuable.

Sadly the Linux folks way over-engineered the thing with all kinds of tuning knobs so that you can get any preexisting behaviour you want: by default getrandom will read from the "urandom source" but block until the entropy pool has been initialised once.

However, you can also:

* GRND_RANDOM to have it read from the "random source" and block if "no random bytes are available"

* GRND_NONBLOCK to have it never block on lack of entropy, whether from the entropy pool not being initialised at all (!GRND_RANDOM) or because "there are no random bytes" (GRND_RANDOM), in which case it can also fail (getentropy should only fail if you give it an invalid buffer address or you request more than 256 bytes)

[0] https://www.openbsd.org/papers/hackfest2014-arc4random/mgp00... and also through the LibreSSL project complaining about the lack of a safe way to get good random data[1]

[1] https://github.com/libressl-portable/openbsd/blob/4e9048830a...
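Here is a minimal sketch of getentropy()-style semantics layered on Python's `os.getrandom()`, which wraps the Linux syscall. It uses the default flags, so it reads from the urandom source and blocks only until the pool has been initialised once, and it adds the 256-byte cap. The helper name is hypothetical. `os.getrandom()` requires Linux and Python 3.6+.

```python
import os

GETENTROPY_MAX = 256  # getentropy() rejects requests larger than this

def getentropy_like(length: int) -> bytes:
    """Hypothetical helper with getentropy()-style rules, built on getrandom():
    default flags (urandom source, block only until the pool is initialised)
    and a hard 256-byte cap per call."""
    if not 0 <= length <= GETENTROPY_MAX:
        raise ValueError("requests are limited to 256 bytes")
    buf = b""
    while len(buf) < length:
        # getrandom(2) may return fewer bytes than requested; loop until full.
        buf += os.getrandom(length - len(buf))
    return buf
```

Keeping the cap small is a deliberate choice. Callers can only draw seed-sized chunks and are nudged toward seeding their own generator.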

> /dev/urandom is useless because it spews non-random data if it hasn't been seeded yet.

Does it? Under what circumstances? Where can I read about it?

`man 4 random` states,

> When read during early boot time, /dev/urandom may return data prior to the entropy pool being initialized. If this is of concern in your application, use getrandom(2) or /dev/random instead.

It happened on my system during the last boot: dmesg says dbus-daemon tried (twice!) to read urandom before it was seeded, and the next message (same second, about 200 ms later) says urandom has been seeded. It is a race condition!
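Following the man page's advice, a program can check whether the pool is seeded before touching /dev/urandom. Until then, getrandom() with GRND_NONBLOCK fails with EAGAIN, which Python surfaces as `BlockingIOError`. Here is a sketch with hypothetical function names:

```python
import os

def pool_initialized() -> bool:
    """True once the kernel's urandom source is seeded; before that,
    getrandom(GRND_NONBLOCK) fails with EAGAIN (BlockingIOError)."""
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False

def read_urandom_checked(n: int) -> bytes:
    """Hypothetical guard for code that must use /dev/urandom directly:
    refuse to read it during the early-boot window described above."""
    if not pool_initialized():
        raise RuntimeError("entropy pool not initialised; use getrandom() and block")
    with open("/dev/urandom", "rb") as f:
        return f.read(n)
```

Once the pool is initialised it stays initialised, so the check cannot go stale between the test and the read.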

> Meanwhile the BSDs have handled this correctly for years.

How do the BSDs handle it correctly?

And before people start bashing systemd: the cause of the boot hangs was actually the collection of randomness for an Xorg/Xwayland MIT cookie through GDM. So Linus' e-mail linked by the parent is somewhat inaccurate.

Yet it is implicitly directed at systemd, which has not hesitated in the past to break userspace. More than once.

> The correct fix is to fix getrandom() to not block when it's not appropriate,...

I'm not a cryptography expert, but this suggestion doesn't look right.

Edit: IMO the main problem is the lack of a forward progress guarantee for entropy generation, even if there are suitable sources for entropy in the system.

I've worked in crypto before (not now though) and your statement is actually a common misconception. The information about whether the randomness is sourced from "true" entropy or not is in and of itself a form of entropy from the perspective of an assailant. It's enough that given some period of time, "some" amount of entropy is present and how much is dependent on the degree to which the system is used. You can use a game theoretic approach to determine how much that is.

edit: to clarify, occasionally not using /dev/random when it may block is not actually a security issue (in most cases)

How do we know which cases?

The only case where it could be an issue is at bootup, when the subsystem simply does not have enough entropy. OpenBSD solved that by storing a seed for the next bootup at the end of boot and during shutdown.
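A seed-file scheme along those lines can be sketched as follows. The paths, the 512-byte size and the function names are assumptions for illustration, not OpenBSD's implementation. Writing to /dev/urandom mixes the old seed in without crediting any entropy, which is the conservative behaviour discussed earlier for systemd.

```python
import os

SEED_BYTES = 512  # assumed seed size, for illustration only

def restore_seed(path: str, pool: str = "/dev/urandom") -> bytes:
    """Early boot: mix the previous seed into the pool. Writing to
    /dev/urandom stirs the data in but does not credit any entropy."""
    try:
        with open(path, "rb") as f:
            old = f.read()
    except FileNotFoundError:
        return b""
    with open(pool, "wb") as p:
        p.write(old)
    return old

def write_seed(path: str, size: int = SEED_BYTES) -> None:
    """End of boot and shutdown: atomically replace the seed file with
    fresh output, readable only by its owner."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(os.getrandom(size))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers never see a half-written seed
```

Call `restore_seed()` early in boot. Call `write_seed()` at the end of boot and again at shutdown, so a crash does not leave the same seed to be reused on the next boot.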

So what happens at the very first boot (e.g. after system installation, or cloud instance just being spawned)? Is that the only circumstance where it would be OK to block? Does OpenBSD trust RdRand for such occasions?

You can set /etc/random.seed (or /var/db/host.random for spawned instances) prior to first boot. That's what cloud providers do IIRC. It also mixes in hardware random (if available).

> Does OpenBSD trust RdRand for such occasions?

It is used as one source among many to seed the PRNG, so I think the answer is no.

A new cloud instance seems like a small problem. The host could just generate an entropy file for the first boot.

Yes - the hard case is a little flash-based low power ARM router, cloned by the million.

From 2012, but still at least somewhat relevant (https://factorable.net/weakkeys12.extended.pdf).

> RSA and DSA can fail catastrophically when used with malfunctioning random number generators ... network survey of TLS and SSH servers and present evidence that vulnerable keys are surprisingly widespread ... we are able to obtain RSA private keys for 0.50% of TLS hosts and 0.03% of SSH hosts, because their public keys shared nontrivial common factors due to entropy problems, and DSA private keys for 1.03% of SSH hosts, because of insufficient signature randomness ... the vast majority appear to be headless or embedded devices ...
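This toy pairwise-GCD sketch shows why a shared prime is fatal. It uses tiny made-up primes (61, 53, 59). The gcd of two moduli reveals the common prime, and dividing it out factors both. The paper scans millions of keys with a much faster batch-GCD algorithm; this quadratic loop only illustrates the principle.

```python
from math import gcd

def find_shared_factors(moduli):
    """Pairwise GCD over RSA moduli. Any pair sharing a prime is broken,
    because that prime also factors both moduli."""
    found = {}
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            g = gcd(a, b)
            if 1 < g < min(a, b):
                found[a] = (g, a // g)
                found[b] = (g, b // g)
    return found

# 3233 = 61 * 53 and 3599 = 61 * 59 share 61; 77 = 7 * 11 shares nothing.
print(find_shared_factors([3233, 3599, 77]))
# {3233: (61, 53), 3599: (61, 59)}
```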

I hope the seed is very well protected.

wdym? If an attacker can access your filesystem, it's already game over.

The same could be said for some of the entropy sources that were nerfed or removed over time due to security concerns. That's the main reason why we even are in this situation today.

PRNG needs entropy, but it doesn't consume it. You don't need to feed entropy into it continuously.
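To illustrate that a seeded generator doesn't use up its seed, here is a toy counter-mode construction, chosen for brevity: output block i is SHA-256(seed || i). It is an assumption, not the kernel's CRNG or NIST's Hash_DRBG. It also lacks the reseeding and backtracking resistance that real designs add.

```python
import hashlib

class CounterDRBG:
    """Toy generator: block i = SHA-256(seed || i). Illustrative only; no
    reseeding, and a leaked seed exposes all past and future output."""

    def __init__(self, seed: bytes):
        if len(seed) < 32:
            raise ValueError("use at least 256 bits of seed material")
        self._seed = bytes(seed)
        self._counter = 0

    def read(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            block = self._counter.to_bytes(16, "big")  # 128-bit counter
            out += hashlib.sha256(self._seed + block).digest()
            self._counter += 1
        return bytes(out[:n])  # leftover bytes of the last block are discarded
```

The seed never shrinks: every read just advances the counter. If the counter ever passed 2^128, `to_bytes` would raise `OverflowError` rather than silently loop.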

> PRNG needs entropy, but it doesn't consume it. You don't need to feed entropy into it continuously.

In the extreme case, this means you can run a PRNG with a fixed seed indefinitely, which is definitely wrong because such a PRNG will necessarily loop.

That might not be feasible to exploit, however.

> such a PRNG will necessarily loop

Given infinite time, energy, and computing power, yes. Given computers made out of matter and running on energy for use by meat-based intelligences, no.

This is really analogous to saying "technically a 256-bit encryption key is brute-forceable". In fact, this is so close to being the actual underlying situation it's barely even an analogy.
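Here is some rough arithmetic on the looping concern. It assumes a counter-mode generator with a 128-bit counter and a generous 10^12 outputs per second; both figures are assumptions. Under them, the counter only wraps after about 1.08 × 10^19 years, close to a billion times the age of the universe (about 1.38 × 10^10 years).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_cycle(counter_bits: int, outputs_per_second: float) -> float:
    """Years before a counter-mode generator must repeat, producing one
    output per counter value at the given rate."""
    return 2 ** counter_bits / outputs_per_second / SECONDS_PER_YEAR

years = years_to_cycle(128, 1e12)  # about 1.08e19 years
```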

> even if there are suitable sources for entropy in the system

What might those be on, say, a freshly booted RasPi that hasn't even brought up much of userspace besides systemd?

The problematic commit reduced the amount of disk I/O at bootup, which starved getrandom() of the entropy it needed. If disk I/O is already considered a reliable source of entropy, then there is no reason not to actively exercise it when entropy is needed.

The Raspberry Pi has a HWRNG onboard.

Do you trust it?

You’re forced to trust the maker of your CPU, mainboard, etc. If you can’t trust your hardware, you’re toast. There’s nothing special about RNGs in this regard.

Something else I'd wonder is whether there are other sources of entropy that could be used here other than the disk -- as improving disk IO is what seems to have caused this particular issue in the first place.

Hmm, isn't the solution to do something proactively to increase entropy when getrandom() is waiting for it? (especially for the first bytes)

Like inserting arbitrary reads of any available SSD or hard disk that has already been spun up, or something better if possible.

And in newer userspace, just properly save and restore a seed.

What does "arbitrary" mean if you don't have any randomness? Response times of devices are getting increasingly predictable.

The timer entropy daemon has been around for a while. At minimum, one has access to a hardware random number source with an output of 256 bits per second by waiting for an interrupt from a timer.
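Here is a toy sketch of harvesting timer jitter, in the spirit of the timer entropy daemon and the jitter RNG mentioned above. It hashes the nanosecond deltas between successive clock readings around a scheduler yield. It is not how either tool is implemented, and the function name and sample count are hypothetical. It makes no claim about how many bits of real entropy the result holds.

```python
import hashlib
import time

def timer_jitter_seed(samples: int = 256) -> bytes:
    """Hash the deltas between clock readings taken around sleep(0) calls;
    scheduler and timer-interrupt jitter perturb each delta."""
    h = hashlib.sha256()
    prev = time.perf_counter_ns()
    for _ in range(samples):
        time.sleep(0)  # yield to the scheduler between readings
        now = time.perf_counter_ns()
        h.update((now - prev).to_bytes(8, "little"))  # monotonic, so >= 0
        prev = now
    return h.digest()
```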

There's a limit to how far you can go with that policy, but it does seem that in this case the regression was fairly obvious. As always, there's an xkcd about it:

Great; and while we’re reverting obvious breakages of userspace performance, perhaps Linus can also un-break ZFS SIMD.

ZFS on Linux is neither mainline nor userspace. Feel free to either fix it or contact its maintainers. If they're lagging behind kernel versions, it's their fault.

Love the spelling joke in there.

Only the finest speling will do for the LKML.

What is with anything with the words “release notes” in the title being completely unreadable on a mobile device?

> 2015 era MacBooks and MacBook Pros get working keyboard and touchpad support with this release, courtesy of the Apple SPI driver

Anyone know if the audio for >=2016 era macbooks will ever be made to work? https://github.com/Dunedan/mbp-2016-linux

Btw: the quoted changes for MacBooks are slightly inaccurate. MacBook Pros from 2015 already had working keyboards and touchpads, as they have them connected via USB as well. The models which got support now are:

- MacBook 2015

- MacBook 2016

- MacBook 2017

- MacBook Pro 2016

- MacBook Pro 2017

Mind that MacBook Airs and MacBook Pros from 2018 onward still don't have upstream keyboard and touchpad support.

There has been a lot of progress regarding audio in the past few weeks. Rudimentary audio support is there now and I'm confident we'll see better support soon. Check out https://github.com/Dunedan/mbp-2016-linux/issues/111 for more details.

What stands out is Google's GVE driver. It's the first of its kind:

1. The GVE host will never see the light of day

2. It will never be used outside of Google's DCs

3. The hardware that backs it is very likely of Broadcom origin, with their virtualisation API.

4. It made it! Unlike dozens of attempts by other companies to do the same for their proprietary in-DC hardware. How does one do it?

Sadly, it looks like WireGuard missed the cut again.

There's still progress being made, but WireGuard does a lot of things to/for the in-kernel cryptography stuff. That's been the major holdup for getting it in. There are some who don't like that a new API is being made at all, but I believe most are fine with that as long as it shares as much internally as possible with the existing code, so that there aren't two copies of everything. A lot of that had been done last I heard, but it wasn't quite ready yet for mainlining in that fashion. I suspect we'll see another RFC for it in the 5.4 merge window and maybe finally some traction at getting it into staging or all the way in.

EDIT: see sibling comment about wireguard and 5.4, apparently not slated for it this time but maybe 5.5.

Wouldn't expect something like that to be merged as late as rc8.

Last patch set I remember seeing was v9 submitted to the mailing list in March. There was some feedback regarding the new zinc crypto interface and replacement crypto implementations. Jason indicated he would address them for the next patch submission which I don't believe has been made yet.

I'm confused about the source, didn't Linux Journal just shut down (for the second time)?

OP/submitter signed up a month ago and has spent the past week spamming nothing but links to his own splog, which uses old linuxjournal material for 99.9% of its content.

With this he has simply copied an article from Omgubuntu and spammed it via his splog, where he even puts his own name on it.

The link should be changed to the correct Omgubuntu link, to credit the original, up-to-date content rather than lazy copy-and-paste spammers.

It helps if you email hn@ycombinator.com and tell them that. Dan replied to me in about 60 seconds.

Ah, thanks. I'll do that if there's a next time.

OP points to a domain name different from Linux Journal Magazine's: https://www.linuxjournal.com/

I for one am excited that 5.3 potentially includes a bug fix for the random hard freezes I see ~daily on my laptop with a Baytrail cpu [1].

Had sort of given up on it ever being resolved, but sounds like a patch got into 5.3 that may fix it. Installed 5.3-rc5 last night. Guess I'll wait and see if it actually resolves the issue.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=109051

Installed the Ubuntu MATE 19.10 daily build 5 days ago on an ASUS T100TA (Atom) and have had absolutely no issues. Added bootia32 to the live USB, as is the norm with this machine. Still no camera, but that's no big deal for me. Everything else works perfectly. No C-state freezes and no workarounds needed. 5.3 seems to have fixed the Bay Trail issues as advertised.

I also have not had a hard freeze since upgrading to 5.3. Very pleasantly surprised. I've had a few times where things get very laggy, but they pull out of it. I imagine those are the situations where it would freeze up before.

If you game on Linux, note that the 5.3 upgrade breaks DualShock controllers. A recent systemd upgrade also broke some udev rules for controllers. That is easier to fix, though:

    sudo modprobe uinput

Does anybody know when WireGuard is going to be merged? Can it be done before the release that will be deployed in Ubuntu 20.04 next April, or is it too late?

According to this article: https://www.phoronix.com/scan.php?page=news_item&px=WireGuar...

Wireguard won't be in before 5.5.

5.5 should be released in early February 2020, so it might be in Ubuntu 20.04.

Thanks, let's hope so. Making it into Ubuntu 20.04 would be a huge success, since Ubuntu LTS versions are used heavily and supported by all major cloud vendors.

Far too late unfortunately.

Navi support has finally arrived.

In the kernel. But if you're not running the very specific version of Ubuntu they compiled, you still need a ton of custom packages from mesa-git and others. I think we're still a few months from stable on all packages, and then further months from those packages making it into distros.

Mesa 19.3-rc3 is enough. Besides that, you also need the Navi firmware (not sure why it's not in the upstream repo yet¹): https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/

That's about it. I wouldn't use Ubuntu for Navi though. Some rolling distro makes more sense.

1. https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...

Are you referring to the AMD Drivers?

amdgpu in particular.

> makes 16 million new IPv4 addresses available

How is IPv4 address availability up to the Linux kernel?

They were previously allowed by the IP RFCs, but limited by the kernel. They are in the 0/8 block.
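The "16 million" figure is just the size of that block. 0.0.0.0/8 leaves 24 host bits, so it holds 2^24 = 16,777,216 addresses. You can check this with Python's standard `ipaddress` module:

```python
import ipaddress

net = ipaddress.ip_network("0.0.0.0/8")
count = net.num_addresses  # 2**24
print(f"{net}: {count:,} addresses")
# 0.0.0.0/8: 16,777,216 addresses
```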

Since I'd need to create a new account on that site specifically to send this correction...

Paragraph 6, s/Chromebook harder/Chromebook hardware/

Spellcheck is not a proofreader.

Monolithic kernel means the changelist includes everything from file systems to GPU drivers.

It also means that everything either works or doesn't work.

A dependency tracking system with mix & match of versions of subcomponents of the kernel that either work together or not looks like a nightmare to me.

> It also means that everything either works or doesn't work

Isn't quite a large percentage of the Linux kernel in the form of loadable modules, such that a given running system will only be running a fraction of the total code?

Yes but these modules are built from the exact same code version as the kernel being used. No dependency hell.

But maybe such a solution could work with a micro-kernel architecture too. It seems that a micro-kernel and its components could also be gathered in the same code base. You would get the kind of changelog the first commenter of this thread mentioned, though. It seems we are speaking of two different things here: code management and kernel architecture.

Ok I misunderstood, having fallen into the trap of reading and reacting to the first sentence before reading the rest. (: I didn't realize that the topic was dependency management, but rather about 'kernel breakage'. My apologies.

I have professional experience in large organizations that use monorepos, and have been impressed with the results.

It definitely works with a micro-kernel. The issue in question fits under the umbrella of "software configuration management." In other words, how do you make sure that you are running versions of all of your components which are the same as when you tested? And how do you establish a known-good set of components in the first place?

A competent engineering team should have a way to verify that all the important components of their system are controlled so that a different version can't sneak in. This could be through the revision control system, or it could be through some design notes and a verification activity.

But you have to do something to make sure your system is built from a known-working set of artifacts. And in a lot of cases your software configuration extends to your build system. I've been burned before by build system and host operating system changes that result in the software configuration of my embedded images changing and introducing bugs.
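Here is a minimal sketch of one concrete form of that verification activity. It records SHA-256 hashes of a known-good set of artifacts in a manifest and reports anything that differs before you ship. The manifest format and function names are hypothetical. In practice such a manifest would typically live in revision control next to the build configuration.

```python
import hashlib
from pathlib import Path

def sha256_file(path) -> str:
    """Hash a file in chunks so large images don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest: dict, root: str) -> list:
    """Return the artifacts (relative paths) that are missing or whose hash
    differs from the known-good manifest; an empty list means a match."""
    bad = []
    for rel, expected in manifest.items():
        p = Path(root) / rel
        if not p.is_file() or sha256_file(p) != expected:
            bad.append(rel)
    return bad
```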

Hurd's changelist looks similar, because being a microkernel doesn't mean you don't want the user space parts colocated in the source tree.

Additionally, pretty much all microkernels that support GPUs (which are very few) have about the same in kernel drivers as monolithic kernels since GPUs have their own MMUs and managing the MMU is about the only thing either wants to do in kernel space.

No - a single source tree means that it contains all that. A microkernel could easily have its source tree set up in a similar way (and there are many reasons one might want to).
