More fundamentally, an important conceptual part of crypto is that you can use a random key of a very small, fixed size, like 16 bytes, to generate an infinite amount of output, such that knowing any part of the output doesn't help you guess at any other part of the output nor the input key. If true, this is an amazingly powerful property because you can securely exchange (or derive) a short key once and then you can send as much encrypted data back and forth as you want. If you needed to re-seed the input with as many bits as you take from the output, disk encryption wouldn't be sound (you'd need a key or passphrase as long as your disk), SSL would be way more expensive (and couldn't be forward-secure) if it even worked at all, Kerberos wouldn't work, Signal wouldn't work, etc.
The claim of /dev/random and this wacky government standard is that in fact disk encryption, SSL, etc. are flawed designs, good enough for securing a single communication but suboptimal because they encrypt more bits of data than the size of the random key, and so when generating random keys, you really ought to use "true random numbers" so that breaking one SSL connection doesn't help you break another. Whether it's a pen-and-paper cipher like Vigenère or a fancy modern algorithm like AES, anything with a fixed-size key can be cryptanalyzed and you shouldn't provide too much output with it, for your secure stuff you must use a one-time pad. The claim of the cryptography community is that, no, in fact, there's nothing flawed about this approach and stretching fixed-size keys is the very miracle of cryptography. We know how to do it securely for any worthwhile definition of "securely" (including definitions where quantum computers are relevant) and we should, because key-based encryption has changed our world for the better.
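The key-stretching property described above can be sketched with a toy generator that expands a short key into an arbitrarily long keystream by running a hash in counter mode (illustrative only; a real system would use a vetted stream cipher such as ChaCha20 or AES-CTR, not an ad-hoc hash construction):

```python
import hashlib

def keystream(key: bytes, nbytes: int) -> bytes:
    """Stretch a small fixed-size key into an arbitrarily long keystream
    by hashing key || counter. Toy sketch of the principle, not a vetted
    cipher -- real systems use ChaCha20, AES-CTR, etc."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

key = b"0123456789abcdef"           # 16-byte key
stream = keystream(key, 1_000_000)  # a megabyte of output from 16 bytes
```

The point being: the output length is unbounded, while the secret you had to exchange stays 16 bytes.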
Isn't that the theory behind every stream cipher? (And stream ciphers are generally just 'simplified' one-time pads.)
That's what OpenBSD's arc4random(3) started as: the output of RC4.
The kernel starts a ChaCha20 stream cipher with this supplied entropy, while constantly mixing in timing entropy from devices.
This cipher stream supplies the kernel with random data; once userland is up, it is good enough and is also used for /dev/random and /dev/urandom, which on OpenBSD are the same (non-blocking) device.
Now the fun part: when a userland process is created, it has a randomdata ELF segment that the kernel fills, which is used as entropy for a new ChaCha20 stream just for that application, should it decide to call arc4random or use random data in any other way (such as calling malloc or free, which on OpenBSD make heavy use of random data).
Interestingly, Linux provides 128 bits of random data on exec through the ELF auxiliary vector mechanism. (https://lwn.net/Articles/519085/) Between the disappearance of the sysctl(2) syscall and the addition of getrandom(2), it was the only way to acquire strong seed entropy without opening a file resource.
EDIT: Which makes me curious how Linux filled AT_RANDOM for init(1) and other early boot time processes. But not curious enough to comb through old code...
It uses get_random_bytes(), which is documented as "equivalent to a read from /dev/urandom."
This is news to me. When did they add this (neat) functionality?
No other OS has this.
(Unfortunately, glibc uses this data directly as stack and pointer protector cookies, instead of deriving something from it, which means it feels a little risky to use this to initialize a userspace PRNG. I guess you shouldn't be leaking cookies....)
Linux added this in v2.6.29 (2009) in https://github.com/torvalds/linux/commit/f06295b4 , and glibc in 2009 added support for using it if available to set up cookies (it previously read from /dev/urandom). That said, I don't really think the "we're the only OS" game is a game worth playing - if it's a security improvement, it's best if everyone has it, regardless of OS!
> The original version of this random number generator used the RC4 (also known as ARC4) algorithm. In OpenBSD 5.5 it was replaced with the ChaCha20 cipher, and it may be replaced again in the future as cryptographic techniques advance. A good mnemonic is “A Replacement Call for Random”.
So let's look at how a hypothetical CSPRNG might work. We get our random numbers by repeatedly hashing a pool of bytes, and then feeding the result, and various somewhat random events, back into the pool. Since our hash does not leak any information about the input (if it did, we'd have much bigger problems), this means attackers must guess, bit for bit, what the value of the internal pool of entropy is.
This is essentially how randomness works on Linux (they just use a stream cipher instead, for performance).
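A minimal sketch of the hash-pool scheme just described (toy code, not the actual Linux implementation; the class and method names are made up):

```python
import hashlib

class ToyPool:
    """Hash-pool CSPRNG sketch: output is a hash of the pool, and both
    the output and incoming events are folded back into the pool.
    Toy code -- no locking, no entropy accounting, no hardening."""

    def __init__(self, seed: bytes):
        self.pool = hashlib.sha256(seed).digest()

    def mix(self, event: bytes) -> None:
        # fold a somewhat-random event (interrupt timing, etc.) into the pool
        self.pool = hashlib.sha256(self.pool + event).digest()

    def read(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            block = hashlib.sha256(self.pool + b"out").digest()
            out += block
            # feed the result back so the pool keeps moving
            self.pool = hashlib.sha256(self.pool + block).digest()
        return bytes(out[:n])
```

Since SHA-256 doesn't leak its input, an attacker who sees outputs still has to guess the entire pool state, bit for bit.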
This clarifies a few things:
1. Even if you assume Intel's randomness instructions are compromised, it is still not an issue to stir them into the pool. Attackers need to guess every single source of randomness.
2. "Running out of randomness" is nonsensical. If you couldn't guess the exact pool before, you can't suddenly start guessing the pool after pulling out 200 exabytes of randomness either.
Basically, you should think of /dev/random as having a front buffer and a back buffer. The back buffer has a certain amount of entropy in it, but you can't take part of that entropy out; the only things you can do with it are add entropy or empty the entire back buffer into the front buffer. The front buffer doesn't have an entropy amount per se; what it has is a security rating. When you empty the back buffer into it, its security rating increases up to (not plus) the number of bits in the back buffer (this is not additive; a 256-bit front buffer combined with a 256-bit back buffer produces a front buffer with 256 bits, not 512) and the back buffer goes to zero. If you keep dumping the back buffer into the front buffer whenever it reaches 64 bits, you'll never have an RNG that's more than 64-bit secure.
Reading from /dev/random doesn't deplete the front buffer (because CSPRNG) or the back buffer (because it doesn't interact with the back buffer). A memory-read attack on the other hand basically sets both buffers to zero - you have to start all over again.
So you can "use up" randomness by constantly wasting it to refresh an insufficiently strong front buffer. And you can "run out" if someone is able to read your buffers (or brute-force a weak buffer in, say, 2^64 CSPRNG invocations).
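The front/back buffer mental model might be sketched like this (a toy illustration of the model, not real kernel code; the 256-bit cap and use of SHA-256 are arbitrary choices):

```python
import hashlib

class TwoBufferRNG:
    """Sketch of the front/back buffer model: the back buffer accumulates
    entropy; dumping it raises the front buffer's security rating up to
    (not plus) the back buffer's credited bits. Illustration only."""

    def __init__(self):
        self.back = b""
        self.back_bits = 0        # entropy credited to the back buffer
        self.front = b"\x00" * 32
        self.rating_bits = 0      # security rating of the front buffer

    def add_entropy(self, data: bytes, est_bits: int) -> None:
        self.back = hashlib.sha256(self.back + data).digest()
        self.back_bits = min(256, self.back_bits + est_bits)

    def dump_back_into_front(self) -> None:
        self.front = hashlib.sha256(self.front + self.back).digest()
        # rating goes UP TO the back buffer's bits -- never additive
        self.rating_bits = min(256, max(self.rating_bits, self.back_bits))
        self.back, self.back_bits = b"", 0

    def read(self, n: int) -> bytes:
        # reading depletes nothing: just run the CSPRNG forward
        out = bytearray()
        while len(out) < n:
            self.front = hashlib.sha256(self.front + b"next").digest()
            out += self.front
        return bytes(out[:n])
```

Note how two successive 64-bit dumps still leave a 64-bit rating, which is the "never more than 64-bit secure" point above.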
0: As a designer. As a user, you should treat /dev/random (like any cryptographic primitive) as something that will look perfectly secure from the outside even if it's hopelessly broken, and investigate the details of the specific implementation you're using accordingly.
1: Just like a cryptographic algorithm, the lowest rating involved determines how secure your system is. A 512-bit RNG with a 64-bit cipher is only 64-bit secure, and a 512-bit cipher fed by a 64-bit RNG is also only 64-bit secure.
/dev/random and arc4random(3) under OpenBSD originally used the output of RC4, which has a finite state size.
Rekeying / mixing up the state semi-regularly would reset things. It's the occasional shuffling that really helps with forward security, especially if a system has been compromised at the kernel level.
Yes, I know. Where did I say anything about revealing? My comment was about 'running out', which is (IIRC) a limitation of some random number generators because of how they handle internal state. Now, that state may have many, many bits, but it is still finite. An analogy I've seen is like having a (paper) codebook.
Of course, if a system is compromised, and the attacker can read kernel memory, they can probably then recreate the stream--which is why (e.g.) OpenBSD stirred things up every so often.
Also, when you look at cache side-channel attacks -- RC4 definitely leaks its internal state.
But the point is moot because the stream cipher used was switched from RC4 to ChaCha20 about five years ago. And there is no side-channel attack on ChaCha20, yet.
Yes, everybody does that. But how many bytes you drop matters; over the years the recommendations have gone from 256 bytes to 512 bytes to 768 bytes to 1536 bytes to 3072 bytes as attacks have gotten better.
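For illustration, RC4 with keystream dropping looks like this (RC4 is broken and shouldn't be used; this only shows what "dropping the first N bytes" means):

```python
def rc4_stream(key: bytes, nbytes: int, drop: int = 3072) -> bytes:
    """RC4 keystream with the first `drop` bytes discarded -- the classic
    (and ultimately insufficient) mitigation for RC4's biased early
    output. Illustration only."""
    # key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # pseudo-random generation (PRGA), discarding the first `drop` bytes
    i = j = 0
    out = bytearray()
    for k in range(drop + nbytes):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        byte = S[(S[i] + S[j]) % 256]
        if k >= drop:
            out.append(byte)
    return bytes(out)
```

The early bytes correlate with the key schedule, which is why the recommended `drop` kept growing as the attacks improved.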
1. The adversary has an amount of computing power that is feasible as currently foreseeable: in this case, all you need are K "truly random" bits, where K = 128/256/512, and you can then use a strong stream cipher (or equivalent) to generate infinite random bits. So you only need to block at boot to get those K bits, and you can even store them on disk from the previous boot, or have them passed in from an external source at install time.
2. The adversary has unlimited computing power: in this case, you need hardware that can generate truly random bits and can only return randomness at the rate the hardware gives you the bits
Now obviously if you are using the randomness to feed an algorithm that is only strong against feasible computing power (i.e. all crypto except one-time pads) then it doesn't make sense to require resistance against unlimited computing power for the RNG.
So in practice /dev/urandom, /dev/random, getrandom(), etc. should all only resist feasible computing power, and resisting unlimited computing power should be a special interface that is never used by default except by tools that generate one-time pads or equivalent.
What would you need those bits for in that case? Literally the only thing that comes to mind is generating one-time pads, as standard cryptography is useless in such a scenario.
Actually, there are three:
3. The adversary has the ability to put backdoors in your hardware so you can't trust it to give you truly random bits at all.
This makes zero sense. You trust your processor/chipset/mobo enough to do your bidding and not leak the sensitive plaintext data your app is processing.
But you don’t trust that hardware enough to give you entropy via RDRAND?
If your adversary is someone who can undetectably compromise your hardware you’ve already lost.
Yes, because hardware that "leaks" app data would have to do it in ways that are easily detectable (for example, if an instruction to write a certain byte to one memory location also wrote it to some other location, or to an I/O port or a device driver). Whereas a compromised RDRAND can be undetectable--the bits it yields look random and pass all the tests I can give them for randomness, but are still predictable by the attacker.
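A toy illustration of that attack class: a generator whose output passes statistical randomness tests but is fully predictable to whoever holds an implanted secret (everything here is hypothetical; the names are made up):

```python
import hashlib

def backdoored_rdrand(secret: bytes, counter: int) -> bytes:
    """Toy backdoored generator: output = SHA-256(secret || counter).
    The bits look random and pass statistical tests, yet anyone holding
    `secret` can reproduce every 'random' value ever emitted.
    Hypothetical sketch, not a claim about any real hardware."""
    return hashlib.sha256(secret + counter.to_bytes(8, "big")).digest()

# the victim draws a "random" key...
victim_key = backdoored_rdrand(b"fab-implanted-secret", 42)
# ...and the attacker, knowing the implanted secret, reproduces it exactly
attacker_guess = backdoored_rdrand(b"fab-implanted-secret", 42)
```

No black-box statistical test distinguishes this from honest randomness, which is exactly the asymmetry the comment above describes.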
The universe of potentially undetectable hardware compromises is a much larger threat than a potential compromise of an RNG which is certain to be under constant scrutiny by security researchers.
You assume the existence of an attacker who is able and willing to compromise RDRAND but unable or unwilling to implement one of a million other compromises at the same time. This seems unlikely.
Not to me, for the reasons I've already given. I guess we'll just have to agree to disagree.
Hence, this is not a credible threat model.
The strongest argument in favor of not making this change was there are some (misguided) PCI compliance labs which had interpreted the PCI spec as requiring /dev/random, and changing /dev/random to work like getrandom(2) might cause problems for some enterprise companies that need PCI compliance. However, the counter-argument is that it wasn't clear that the PCI compliance labs somehow thought that /dev/random was better than getrandom(2); it was just as likely they were so clueless that they hadn't even heard about getrandom(2). And if they were that clueless, they probably wouldn't notice that /dev/random had changed.
If they really did want TrueRandom (whatever the hell that means; can you guarantee your equipment wasn't intercepted by the NSA while it was in transit to the data center?) then the companies probably really should be using some real hardware random number generator, since on some VMs with virtio-rng, /dev/random on the guest was simply getting information piped from /dev/urandom on the host system --- and apparently that was Okey-Dokey with the PCI labs. Derp derp derpity derp....
I was confused for a bit since we're talking about the kernel...
This talk from 2015 springs to mind to me as the previous widely discussed one.
But the ncc article might of course have been the straw that healed the camel's back.
Entropy tracking was used in the original versions of PGP because there were those people who had a very healthy (for that time) lack of confidence in "strong cryptographic algorithms" actually being strong.
As PBS Space Time once said, discussing Pilot Wave Theory and why it's considered unorthodox when compared to the Many Worlds interpretation of Quantum Theory, "Orthodoxy == Radicalism plus Time". There was a time when the Many Worlds interpretation was considered out there.
Similarly, there was a time when not trusting crypto algorithms as being Forever Strong was normal, and designing a network protocol like Wireguard without algorithm agility would have been considered highly radical. Today, trusting "strong cryptographic primitives" is considered an axiom. But remember that an axiom is something that you assume to be true and use as the basis of further proofs. Just as a Voodoo practitioner assumes that their belief system is true....
See also: https://latacora.micro.blog/2019/07/16/the-pgp-problem.html
>... the GnuPG team's belief that they can't make breaking changes without synchronizing with the IETF OpenPGP working group ...
That does not actually sound like a bad thing to me.
The linked rant against OpenPGP/GnuPG takes the form of a semi-random list of minor issues/annoyances associated with the OpenPGP standard and the GnuPG implementation, mixed together in no particular order. It ends with the completely absurd solution of just abandoning email altogether. So you have to explain which parts of it support your contention.
The OpenPGP standard is in reality one of the better written and implemented standards in computing (which isn't saying much). There may in the future be something better but it is downright irresponsible to slag it without coming up with any sort of alternative. It is here and it works.
It was literally not GPG's fault or the fault of the standard.
Stuff like this happened to TLS for a decade, and then the TLS working group wised up and redesigned the whole protocol to foreclose on these kinds of attacks. That's never going to happen with PGP, despite it having a tiny fraction of the users TLS has.
Cipher: AES256, AES192, AES, 3DES
Digest: SHA512, SHA384, SHA256, SHA224, SHA1
Compression: ZLIB, BZIP2, ZIP, Uncompressed
Features: MDC, Keyserver no-modify
Throwing out the discontinued things (Outlook 2007) and the webmail things that PGP can't be even sort of secure on (Roundcube, Horde), we end up with 7 bad clients out of a total of 27 clients. So 26%. To get that, they allegedly had to downgrade GPG.
This is like counting how many TLS implementations were vulnerable to Heartbleed. Was WolfSSL? GnuTLS? TLS.py? What, just OpenSSL? I guess things are looking pretty good!
... and just to be clear, we are only talking about Efail here... There is no pattern of client information leakage issues... So it is hard to generalize.
Much of this thread predicates on your claim that an issue which could have been caught in an end-use implementation is only a flaw in that implementation. By contrast, I’m claiming that it’s the responsibility of a secure specification and ecosystem to guard against classes of misuse, such that end-use implementations are not each individually required to mitigate that class of issue.
None of the above is specific to EFail, but the resulting threshold is noteworthy for EFail. Even if, by your numbers, we’re talking about 26% (I’m not sure why it would be “or zero depending on what you believe happened here”, since there’s not really any interpretation of what happened here where 0% of implementations were impacted), that’s a quarter of implementations that were impacted by this class of misuse (the ecosystem/specification not enforcing that a MAC check pass before returning decrypted plaintext).
As tptacek points out in a parallel comment, this is a pretty skewed measurement, because that 26% of implementations accounts for the vast majority of actual users (for example, “Thunderbird” and “GPGTools” account for the same weight in your percentage as “United Internet” and “Mailpile”). But even so, if a quarter of apples in my basket were bad, I’d potentially stop blaming individual apples and start to wonder about quality control at the grocery store.
As is exemplified by newer libraries like Nacl / libsodium, a primary goal of a strong cryptographic library is providing interfaces which make correct usage easy and avoid classes of misuse, so that authors of end-use tools are not each required to replicate the same correct sequence of sanity checking and safeguards, with any misstep being a security fault for their users.
I’m still curious for your threshold. For example, by your measurement methodology, is the EFail attack on S/MIME clients purely a client error? In that case, 23 of 25 tested end-use implementations were vulnerable, or 92%. Is 92% enough widespread impact for ownership to bubble up to the overall S/MIME specification, to guard against this kind of misuse?
None of this changes the fact that the GPG people claimed that the current implementation was not vulnerable with any of the clients and that the Efail people had to downgrade GPG so they could even mention GPG at all. If that is true (and there is evidence that it was) then the whole thing was just a hoax, at least as presented.
In other words, at the time of Efail, there was nothing further that the GPG people could do to work around the dodgy client implementations that were leaking data in general. They had already done it.
Even if the GPG implementation and/or OpenPGP standard had been entirely broken, this is still mostly the email clients' fault. The information leak from URLs loaded in HTML emails was up to that point routinely exploited. Heck, it is still routinely exploited. Efail did not actually result in a fix for all or even most of the email clients affected.
Given that you’d established that this issue, in your opinion, wasn’t with GPG but instead was with the end-use implementations, I thought that discussing that threshold would help clarify the point under discussion.
I didn’t ask about an “acceptable” level of anything, nor was this intended to make the discussion about you, except insofar as you are a party to the discussion, so I was attempting to get more details on your position.
The new position you’ve given, that EFail was a “hoax” and that GPG wasn’t vulnerable, is pretty readily false given the details already provided as part of the EFail disclosure. The claim from the GnuPG devs (between https://lists.gnupg.org/pipermail/gnupg-users/2018-May/06032... and https://lists.gnupg.org/pipermail/gnupg-users/2018-May/06033... ) is that GnuPG will print a warning if the MDC is stripped or fails to validate. This isn’t disputed by the EFail release, which notes that the issue occurs because decrypted plaintext is returned alongside the eventual warning, and that common clients will utilize the decrypted plaintext despite the warning.
This is the crux of the discussion as to whether this is an end-use implementation problem or a spec/ecosystem problem. My position is that as a security-focused tool, GnuPG should not present foot-gun opportunities like this which require all end-use implementations to handle their own validation for these kind of failure modes. A proper security-focused tool would refuse to provide decrypted plaintext in the case that the MAC check failed, because it would have required the MAC check to pass before ever starting to decrypt anything.
There is a tendency in these sorts of things to get so wrapped up in the details that the root issue gets forgotten. In this case the root issue is the leakage of information from HTML emails. After that, what really matters here? What point is there in considering each and every thing that could have been different that would have prevented the attack? Sure, if I hadn't left the house on Thursday I would not have been hit by the bus, but this particular insight is not valuable in any way.
The hoax here is the suggestion that PGP (and S/MIME for that matter) was broken in some way. The original paper was called "Efail: Breaking S/MIME and OpenPGP Email Encryption using Exfiltration Channels" which was not just misleading, it was straight up wrong.
The point of considering each thing that could have prevented an attack is clear, and is a central part of threat modeling and defense in depth. Those concepts aren’t really controversial. Thinking critically about the parts of a system that can contribute to adverse results, and then applying mitigations and avoiding pitfalls, is a pretty core part of basically all engineering (software and otherwise).
The bus analogy (if I hadn’t left the house on the day I got hit by a bus, I’d not have gotten hit) would, in a threat modeling context, be accurately identified as both ‘definitely true’ and ‘low probability’. Yes, leaving your house is dangerous, for a variety of reasons. But the relative danger of leaving your house vs not leaving your house is more questionable (staying inside the house is likewise dangerous), and the probability of leaving-the-house causing hit-by-bus is low. But a security tool returning plaintext despite a MAC fail isn’t like leaving your house, it’s like looking both ways once you’re already crossing the street. GPG warns you there’s a car coming, but you’re already standing in front of the car. A dexterous human could potentially dive out of the way, as an end-use implementation could discard the decrypted plaintext when it sees the MDC warning, but a root-cause-analysis would still rightly suggest that you should be looking both ways before crossing, rather than during.
Taking this thread in aggregate, it's interesting to me how the goalposts keep shifting. The original thrust was "There were no real weaknesses in the OpenPGP standard or the GnuPG implementation of that standard". When pressed, your position shifted to include "Even if the GPG implementation and/or OpenPGP standard had been entirely broken, this is still mostly the email clients' fault." You've now further shifted to questioning why we should even worry about whether GPG could be better ("What point is there in considering each and every thing that could have been different that would have prevented the attack?").
I don’t understand the rigidity with which you refuse to consider the possibility that GPG could have better handled this kind of threat.
Oh, you're commenting badly on purpose because you misinterpreted something as a 'trap'? Great.
It's not a trap question. If something gets misimplemented once, it's maybe probably not an issue with the spec. If it happens over and over again, it's suspicious.
PGP/GPG has certainly fallen out of favor.
Use a non-Linux OS with reasonable /dev/(u)random or use Linux with its Sophie's choice:
/dev/random will give you something that's probably good, but will block for good and bad reasons.
/dev/urandom will never block, including when the random system is totally unseeded.
GnuPG could not use /dev/urandom, since there was no indication of seeding, so it had to use /dev/random, which blocks until the system is seeded and also when the entropy count of nebulous value was low. Most (all) BSDs have /dev/urandom the same as /dev/random, where it blocks until seeded and then never blocks again. This behavior is available in Linux with the getrandom() syscall, but perhaps GnuPG hasn't updated to use it? Also, there was some discussion in the last few months of changing the behavior of that syscall, which thankfully didn't happen, in favor of having the kernel generate some hopeful entropy on demand in case there is a caller blocked on random with an unseeded pool.
GnuPG has been using getrandom() where available for over a year. Obviously some distros may not yet have updated to a recent enough version, but it (and OpenSSL) are no longer among the offenders that cause /dev/random blocking hangs.
Thanks for breaking that down for me!
The problem is, before this patch, Linux keeps track of an entropy estimate for /dev/random, and if the estimate gets too low, read requests will block. Each read reduces the estimate significantly, so something that does a lot of reads makes it hard for other programs to do any reads in a reasonable amount of time.
If you knew the system was seeded, you could use urandom instead, but there's not a great way to know. Perhaps, you could read from random the first time, and urandom for future requests in the same process... but that only helps in long running processes; also reading once from random and using it as a seed to an in-process secure random generator works almost as well. The getrandom() syscall is really the way forward, but you would need to keep old logic conditionally or accept loss of compatibility with older releases.
In summary, it's not really fair to say GnuPG is doing it wrong, when they didn't have a way to do it right.
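A sketch of the modern pattern in practice: use getrandom(2) where the platform exposes it (blocks only until the pool is initially seeded, then never again) and fall back to /dev/urandom otherwise. Assumes a Unix-like system; the function name is made up:

```python
import os

def get_seed(n: int = 32) -> bytes:
    """Fetch seed material for an in-process generator. Prefers
    getrandom(2) via os.getrandom (Linux 3.17+, Python 3.6+), which
    blocks only until the kernel pool is initially seeded; falls back
    to /dev/urandom on platforms without the syscall. Sketch only."""
    try:
        return os.getrandom(n)
    except (AttributeError, OSError):
        with open("/dev/urandom", "rb") as f:
            return f.read(n)
```

Seeding once and running an in-process CSPRNG from the result is the "read once from random" approach described above, without the per-read entropy drain.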
Per-process CSPRNGs are pretty common. Most programs don't fork without exec, no problem for them. Managing a per-process CSPRNG is only hard for libraries that might be used by some programs that fork without exec, and don't want to require the program to do anything right.
No! It's not hard, just don't screw it up. This is true of most things.
A user space CSPRNG is just a foot-gun waiting to go off.
And then suddenly, as if overnight, the "crypto community" was all about crapping on it. Open source and open standards were suddenly not so important, for reasons that were never really explained. Proprietary "secure" hardware was suddenly fine and not worth worrying about. Automated updates from a single vendor, yeah, why not. And a theoretical cryptographic property whose real-world impact was marginal-to-nonexistent (perfect forward secrecy) was suddenly the most important thing and a reason to write off any existing cryptosystem.
Call me a conspiracy theorist, but something stinks there.
The current defaults GPG presents aren't that safe anymore, and everyone who wants to develop an integration with GPG suffers extreme pain because for GPG there is only the CLI interface.
Modern E2EE-capable chat solutions are a good replacement, which are cryptographically stronger and don't have the same chances of blowing up as GPG does.
I don't think it's that much of a conspiracy; there is a bit of time between those events. It's simply that in recent years people have been advocating for security tools that prefer being resistant to misuse (GPG isn't) and safe by default (GPG isn't) over other tools.
> Modern E2EE-capable chat solutions are a good replacement, which are cryptographically stronger and don't have the same chances of blowing up as GPG does.
I'm not convinced. Most or all of these chat solutions seem to involve closed-source code, single-vendor implementations, closed networks, complicated protocols that lead to incomplete analysis, lack of pseudonymity, and an embrace of closed-source operating systems and hardware, and I think those things are still just as worrying as they were 10 years ago. I'm all for improving on the safety and usability of GPG, but I don't think the tradeoff in overall security that we're currently offered is a good one.
The first response to start such a process is taligent's, in response #22.
A useful addition to that is #23 where Steven Ayre suggests opening that as a separate bug that focuses solely on this issue.
I'm not sure what the purpose is for the other responses you received. They seem to seek to use the breadth of your issue report to widen the discussion to maximally contentious security topics.
This is even true for shell scripting, see for instance http://www.tldp.org/LDP/abs/html/randomvar.html
> We discuss how to configure and use turbid, which is a Hardware Random Number Generator (HRNG), also called a True Random Generator (TRNG). It is suitable for a wide range of applications, from the simplest benign applications to the most demanding high-stakes adversarial applications, including cryptography and gaming. It relies on a combination of physical process and cryptological algorithms, rather than either of those separately. It harvests randomness from physical processes, and uses that randomness efficiently. The hash saturation principle is used to distill the data, so that the output is virtually 100% random for all practical purposes. This is calculated based on physical properties of the inputs, not merely estimated by looking at the statistics of the outputs. In contrast to a Pseudo-Random Generator, it has no internal state to worry about. In particular, we describe a low-cost high-performance implementation, using the computer’s audio I/O system.
On randomness, http://www.av8n.com/turbid/paper/turbid.htm#sec-raw-randomne...
> Understanding turbid requires some interdisciplinary skills. It requires physics, analog electronics, and cryptography.
There are basically only three problems with generating good randomness:
1. Very early during boot you don't have a lot of good sources.
2. On very constrained devices you have limited external input.
3. Bugs in the implementation.
Randomness from soundcards doesn't help with any of these. They probably aren't yet initialized at the point in time where it matters most, they don't exist for the most problematic devices and bugs are bugs, no matter what your source of randomness.
"Randomness from soundcards" is also spectacularly unlikely to help on the cloud. Pretty sure they don't fit SoundBlaster16s to EC2 instances or Digital Ocean Droplets...
In my experience it's not usually super difficult to get a decent amount of entropy on complex desktop or servers, it's only a real issue on simple embedded hardware where you might have no way to sample the environment and all the timings are ultra-deterministic. In this case I've resorted to using temperature measurements and fan speed as a source of entropy during early boot, which isn't ideal.
Philosophically, does that make it more or less random? The whole category of side-channels suggests that there are problems with sharing a system like this.
There's also the philosophical question of how cryptographically secure a fully virtual system can be when the host has full inspection and control capability. If you're running in the cloud you need to think carefully about how to delegate this to the platform if at all possible.
This is sort of the problem the CPU random number generator is intended to solve, but see the discussion on trust.
This paper demonstrates that by making a small change to the dopant in the RDRAND circuitry, you can weaken it enough while it still passes the NIST suite. And the modification is undetectable.
In this paper we introduced a new type of sub-transistor level hardware Trojan that only requires modification of the dopant masks. No additional transistors or gates are added and no other layout mask needs to be modified. Since only changes to the metal, polysilicon or active area can be reliably detected with optical inspection, our dopant Trojans are immune to optical inspection, one of the most important Trojan detection mechanisms. Also, without the ability to use optical inspection to distinguish Trojan-free from Trojan designs, it is very difficult to find a chip that can serve as a golden chip, which is needed by most post-manufacturing Trojan detection mechanisms.
To demonstrate the feasibility of these Trojans in a real world scenario and to show that they can also defeat functional testing, we presented two case studies. The first case study targeted a design based on Intel's secure RNG design. The Trojan enabled the owner of the Trojan to break any key generated by this RNG. Nevertheless, the Trojan passes the functional testing procedure recommended by Intel for its RNG design as well as the NIST random number test suite. This shows that the dopant Trojan can be used to compromise the security of a meaningful real-world target while avoiding detection by functional testing as well as Trojan detection mechanisms. To demonstrate the versatility of dopant Trojans, we also showed how they can be used to establish a hidden side-channel in an otherwise side-channel resistant design. The introduced Trojan does not change the logic value of any gate, but instead changes only the power profile of two gates. An evaluator who is not aware of the Trojan cannot attack the Trojan design using common side-channel attacks. The owner of the Trojan however can use his knowledge of the Trojan power model to establish a hidden side-channel that reliably leaks out secret keys.
It's much much harder. They'd have to insert something on the frontend (where the instruction decoder is) or on the L1 instruction cache to recognize when it's running that specific piece of code; both parts are very critical for the processor performance, so every gate of delay counts. And that's before considering that the Linux kernel code changes unpredictably depending on the compiler, kernel configuration options, and kernel release. Oh, and you have to be very precise in detecting that code, to make sure nothing else misbehaves or even gets slower (some people count cycles on parts of their code, so an unexpected slowness would get noticed).
Contrast with RDRAND, an instruction which is defined to return an unpredictable value; it would be simple to make its output depend on a counter mixed with a serial number and a couple of bits of real randomness, instead of being fully random. It's not even on the performance-critical part of the chip; it's isolated on its own block, so adding a backdoor to it would cause no performance problems, and would break no other software.
There are ways of mixing RDRAND into the entropy pool safely and this can be done easily. Why would you deliberately choose to not mix RDRAND and use it directly instead? You wouldn't. It makes no sense. Therefore, RDRAND should be mixed into the pool, it is being mixed into the pool and there is no more reason to debate this.
Malicious entities don't play by made up rules. RDRAND being a convenient scapegoat seems like exactly the thing they'd want, too.
Imagine a parallel universe where everybody happily just uses RDRAND to get random numbers. For example when you connect to an HTTPS server, as part of the TLS protocol it sends you a whole bunch of random bytes (these are crucial to making TLS work; similar approaches appear in other modern protocols). But in that universe those bytes came directly from RDRAND, after all it's random...
Except, if RDRAND is actually an exfiltration route, what you've just done is carve a big hole in your security perimeter to let RDRAND's owners export whatever they want.
XORing RDRAND into an apparently random bitstream is thus safe because the bitstream is now random if either RDRAND works as intended OR your apparently random bitstream is indeed random.
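That argument can be sketched directly (illustrative Python; `suspect` models a worst-case, fully attacker-controlled RDRAND stream):

```python
import os

def mix(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length streams. The result is uniformly random
    if either input is uniformly random and independent of the other."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

suspect = bytes(32)      # worst case: RDRAND output fully attacker-chosen (all zeros)
pool = os.urandom(32)    # assumed-good kernel entropy
out = mix(suspect, pool)
assert out == pool       # XOR with zeros is the identity: no randomness lost
```

The converse also holds: if the pool happened to be weak but RDRAND honest, the mix is still strong. That asymmetry is why XOR-mixing (rather than direct use) is the safe way to consume RDRAND.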
That's making assumptions. For instance, it wouldn't be beyond the realms of possibility for the compromised CPU to also track which registers contain a result produced from RDRAND, and make (X XOR RDRAND) produce a predictable result. After all, RDRAND is already an undefined number, so the system can legitimately decide later what it would like it to be. Yes, it would require more silicon in the registers, re-ordering and dispatch system, and ALU, but it would be feasible.
Obviously, if you store the result of RDRAND to main RAM and read it back in later, this would defeat the system as the flag wouldn't be preserved. But I'm guessing most code won't do that for performance reasons.
The simpler option of just making RDRAND predictable is less powerful, because then the operating system can compensate with randomness obtained from elsewhere. The attack above allows the CPU to actually compromise the operating system's own randomness source.
To modify RDRAND so that it is less random in a way that's useful for an attacker, yet passes statistical randomness testing by the OS and other software, would require RDRAND to implement something cryptographic, so that only the attacker, knowing a secret, can "undo" the not-really-randomness.
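As a toy illustration of that idea (a purely hypothetical construction, not Intel's or anyone's actual design, with SHA-256 standing in for whatever cryptographic block the attacker would embed):

```python
import hashlib

def backdoored_rng(secret: bytes, counter: int) -> bytes:
    """Toy keyed 'backdoor': each 32-byte output is SHA-256(secret || counter).
    Without `secret` the stream is computationally indistinguishable from
    random; with it, every output can be regenerated exactly."""
    return hashlib.sha256(secret + counter.to_bytes(8, "big")).digest()

secret = b"known only to the fab"   # hypothetical attacker secret
victim = [backdoored_rng(secret, i) for i in range(4)]
attacker = [backdoored_rng(secret, i) for i in range(4)]
assert victim == attacker           # attacker predicts every "random" byte
```

The victim's statistical tests see a stream with no detectable structure; only the holder of `secret` can "undo" it, which is exactly the property the comment above describes.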
A new crypto block would surely be very noticeable at the RTL level.
Being defensive against RDRAND is just one (relatively easy) way to defend oneself against a (relatively easy to implement) backdoor. Yes, this defense isn't perfect, because there can be other (more difficult to escape) backdoors, but that's not a compelling reason to leave the “easy” ones undefended…
Verifying an ADD instruction is very easy and if it were to return wrong results then it would be obvious that it is buggy. Programs would cease to work. Alternatively it would have to be incredibly smart and detect the currently executed program and exfiltrate the encryption key during the execution of the encryption function.
The first backdoor scales to millions of potential targets. The second is a carefully targeted attack against a known enemy. Targeted attacks have cheaper alternatives than silicon back doors.
There's a very important difference: ADD and MUL are deterministic, while the output of RDRAND is random. If ADD or MUL or FDIV return incorrect results, that can be detected (as the Pentium has shown). If RDRAND returns backdoored values, it cannot be detected by only looking at its output; you have to check the circuit.
Surely you can check the output and see if it's random? Don't these attacks rely on perturbing the RNG so it's no longer a TRNG, isn't output the only way to tell??
No you can't. That's an inherent property of randomness. If you're lucky you can win the lottery 10 times in a row. The only thing you can verify is that the random number generator is not obviously broken (see AMD's RDRAND bug) but you can't verify that it's truly random.
> isn't output the only way to tell??
Looking at the implementation is the only way to tell.
Let's say I generate a list of one million random numbers through a trustworthy source.
Now I build a random number generator that does nothing but just return numbers from the list.
There is an impractical way of verifying a random number generator that is only useful in theory. Remember how you can flip a coin and take 1000 samples and you get roughly 1/2 probability for each side? The number of samples you have to take grows with the number of outcomes. If your RNG returns a 64-bit number you can take 2^64*x samples where x is a very large number (the larger the better). x = 1 is already impractical (roughly 200 years @ 3 billion RDRAND/s) but to be really sure you would probably need x > 10000. Nobody on earth has that much time. Especially not CPU manufacturers that release new chips every year.
Even if you cannot be 100% sure that something is not really random, there are plenty of statistical measurements that can be used to assess the quality of the output (in terms of entropy).
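For example, the simplest check in the NIST SP 800-22 family is the frequency (monobit) test: is the count of 1-bits consistent with a fair coin? A sketch in Python (the `alpha` threshold follows NIST's conventional 0.01):

```python
import math
import random

def monobit_pass(data: bytes, alpha: float = 0.01) -> bool:
    """NIST SP 800-22 frequency (monobit) test: compute a p-value for
    the hypothesis that 1-bits occur with probability 1/2."""
    n = len(data) * 8
    ones = sum(bin(b).count("1") for b in data)
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    p_value = math.erfc(s_obs / math.sqrt(2))
    return p_value >= alpha

print(monobit_pass(random.Random(0).randbytes(4096)))  # seeded PRNG output
print(monobit_pass(bytes(4096)))                       # all zeros -> False
print(monobit_pass(b"\x55" * 4096))                    # alternating bits -> True
```

Note that the trivially predictable stream of repeated `0x55` bytes sails through this particular test (its bit count is exactly balanced), which is the thread's point: statistical measurements can bound gross defects in the output, but can never certify true randomness.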
> Let's say I generate a list of one million random numbers through a trustworthy source. Now I build a random number generator that does nothing but just return numbers from the list.
In less than a second your generator would stall, which is a pretty obvious flaw to spot.
The real issue is cryptographic algorithms, because they are designed to simulate randomness, and they improve adversarially as statistical methods of analysis become more powerful. At any point in time, state-of-the-art cryptography will be able to produce fake randomness that the current state-of-the-art cryptanalysis cannot prove is not random.
That may not be true. (As in, I'm not sure it is.)
For many useful crypto algorithms where we give a nominal security measure (in bits), there is a theoretical attack that requires several fewer bits but is still infeasible.
For example, we might say a crypto block takes a 128-bit key and needs an order of magnitude 2^128 attempts to brute force the key.
The crypto block remains useful when academic papers reveal how it could be cracked in about 2^123 attempts.
The difference between 2^128 and 2^123 is irrelevant in practice as long as we can't approach that many calculations. But it does represent a significant bias away from "looks random if you don't know the key".
It seems plausible to me that a difference like that would manifest in statistical analysis that state of the art cryptanalysis could prove as not random by practical means, while still unable to crack (as in obtain inputs) by practical means.
1. Make a kernel config option that makes /dev/urandom block until entropy pool initialises.
2. Make a dependent kernel config option so /dev/random is /dev/urandom.
There, done. Everyone can make their own choice about what security they want.
`cat /dev/random` may still block, but only once per reboot: it may block if it is called so early that not enough entropy has been gathered yet. Once enough entropy has been gathered, it will never block again.
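That block-once-per-boot behavior is exactly the semantics of the getrandom(2) syscall (Linux 3.17+), which Python exposes on Linux as `os.getrandom`:

```python
import os

# getrandom(2) blocks only until the kernel's pool has been initialized
# (at most once per boot); afterwards it never blocks for "entropy
# depletion", unlike the old /dev/random accounting.
buf = os.getrandom(16)   # may block very early at boot, never after
print(len(buf))          # 16
```

Once the pool is initialized, successive calls return immediately with fresh output.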
I think this is a change for the best, in particular this bit sounds completely true to my ears:
> Theodore Y. Ts'o, who is the maintainer of the Linux random-number subsystem, appears to have changed his mind along the way about the need for a blocking pool. He said that removing that pool would effectively get rid of the idea that Linux has a true random-number generator (TRNG), which "is not insane; this is what the *BSD's have always done". He, too, is concerned that providing a TRNG mechanism will just serve as an attractant for application developers. He also thinks that it is not really possible to guarantee a TRNG in the kernel, given all of the different types of hardware supported by Linux. Even making the facility only available to root will not solve the problem: Application programmers would give instructions requiring that their application be installed as root to be more secure, "because that way you can get access the _really_ good random numbers".
The number of times I've had to deal with security-related software and scripts that insisted on sampling /dev/random and stalling for minutes at a time...
* Only FreeBSD symbolically links, and it does it in the other direction. urandom is the symbolic link to random.
* OpenBSD has four distinct character device files: random, arandom, srandom, and urandom.
* NetBSD (as of 2019) has two distinct character device files: random and urandom. They have different semantics from each other. https://netbsd.gw.com/cgi-bin/man-cgi?rnd+4+NetBSD-current
$ ls -l /dev/*random*
lrwxr-xr-x 1 root wheel 7 Dec 10 15:05 /dev/random@ -> urandom
crw-r--r-- 1 root wheel 45, 0 Jan 6 15:30 /dev/urandom
$ ls -F /dev/*random*
/dev/arandom /dev/random /dev/srandom /dev/urandom
That’s why it is still blocking until it has been sufficiently initialized. After it has gathered sufficient entropy, the pool’s entropy is not exhausted by reading from it. /dev/random assumes that reading 64 bits from it will decrease the entropy in its pool by 64 bits, which is nonsense.
Linux’s PRNG is based on cryptographically strong primitives, and reading output from /dev/random does not expose its internal state.
Your pointless rant just indicates that you do not really understand what’s going on.
To amplify Hendrikto's point, /dev/random is implemented to "believe" that if it has 128 bits of randomness, and you get 128 bits from it, it now has 0 bits of randomness in it. 0 bits of randomness means that you ought to now be able to tell me exactly what the internal state of /dev/random is. I don't mean it vaguely implies that in the English sense, I mean, that's what it mathematically means. To have zero bits of randomness is to be fully determined. Yet this is clearly false. There is no known and likely no feasible process to read all the "bits" out of /dev/random and tell me the resulting internal state. Even if there was some process to be demonstrated, it would still not necessarily result in a crack of any particular key, and it would be on the order of a high-priority security bug, but nothing more. It's not an "end of the world" scenario.
It's completely nuts.
With that in mind, zero is OK as a value. You may not be able to calculate the state of /dev/random given the tools and techniques available to you, but that doesn't make zero an incorrect lower bound on what you could mathematically calculate from the extracted data.
The meaningful states of a CSPRNG are "initialized" or "not". Once initialized, there is never a diminishment of "entropy".
(What I disagreed with is the argument made by the GP, not you, that the Linux entropy value was incompatible with their in-principle mathematical description of "true" entropy. Pretty irrelevant to real cryptography.)
That's fine if you trust the PRNG. Linux used to at least attempt to provide a source of true randomness. You and Hendrikto are essentially asserting that everyone ought to accept the PRNG output in lieu of true randomness. Given various compromises in RNG primitives over the years, I'm not so sure it's a good idea to completely close off the true entropy estimation to userspace. I prefer punting that choice to applications, which can use urandom or random today at their choice.
Maybe everyone should be happy with the PRNG output. Ts'o goes further and argues, however, that if you provide any mechanism to block on entropy (even to root only), applications will block on it (due to a perception of superiority) and so the interface must be removed from the kernel. I see this change as an imposition of policy on userspace.
Linux never provided a source of true randomness through /dev/random. The output of both /dev/random and /dev/urandom is from the same PRNG. The difference is that /dev/random would provide an estimate of the entropy that was input to the PRNG, and if the estimate was larger than the number of bits output, it would block.
No, what I am asserting is simply that the idea that you drain one bit of randomness out of a pool per bit you take is not practically true unless you can actually fully determine the state of the randomness generator when you've "drained" it. No less, no more. You can't have "zero bits of entropy" and also "I still can't tell you the internal contents of the random number generator" at the same time, because the latter is "non-zero bits of entropy". Either you've got zero or you don't.
As of right now, nobody can so determine the state of the random number generator from enough bits of output, we have no reason to believe anybody ever will, and the real kicker is even if they someday do, it's a bug, not the retroactive destruction of all encryption ever. A SHA-1 pre-image attack is a much more serious practical attack on cryptography than someone finding a way to drain /dev/random today and get the internal state.
It's only true in theory that you've "drained" the entropy when you have access to amounts of computation that do not fit into the universe. Yes, it is still true in theory, but not in a useful way. We do not need to write our kernels as if our adversaries are turning galaxy clusters into computronium to attack our random number generator.
Please carefully note the distinction between "we have no reason to believe anyone ever will" and "it is absolutely 100% certain nobody ever will". We have no choice but to operate on our best understanding of the world now. "But maybe somebody may break it" doesn't apply only to the current random generator... it applies to everything, including all possible proposed replacements, so it does not provide a way to make a choice.
It's not complete nonsense. Suppose for explanation purposes that you had a pool with only 256 bits (2^256 possible states), and you read 64 bits from it. Of these 2^256 possible states, most would not have output that exact 64-bit value you just read; on average only 2^192 of the possible states would have resulted in that output value. Therefore, once you know that 64-bit value, the pool now has only 192 bits of entropy (2^192 possible states). Read three more 64-bit values, and on average only one of the 2^256 originally possible states could have resulted in these four 64-bit values; since there's only one possible state, the pool's entropy is zero.
However, like you said "Linux’s PRNG is based on cryptographically strong primitives": the only known way to find which of the 2^256 possible states could have resulted on these four 64-bit values would be to try all of them. Which is simply not viable, even with all the computing power in the world. That is, once the number of possible states (the pool's entropy) gets above a threshold, even if you later exhaust all the theoretical entropy from it, there's still no way to know the pool's state.
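The counting argument scales down to a size you can actually enumerate: a 16-bit pool and 8-bit reads, with SHA-256 standing in for the pool's output function (a toy model, not Linux's actual extraction code):

```python
import hashlib

POOL_BITS = 16

def read_byte(state: int, i: int) -> int:
    """i-th 8-bit 'read' from a toy pool with a fixed 16-bit state."""
    return hashlib.sha256(state.to_bytes(2, "big")).digest()[i]

true_state = 0xBEEF
o0 = read_byte(true_state, 0)
# States consistent with the first observed byte: about 2^16/2^8 = 256 on average.
after_one = [s for s in range(1 << POOL_BITS) if read_byte(s, 0) == o0]
o1 = read_byte(true_state, 1)
# After a second byte, about one candidate survives -- "zero entropy left" --
# yet finding it required enumerating the entire state space.
after_two = [s for s in after_one if read_byte(s, 1) == o1]
print(len(after_one), len(after_two))
```

At 16 bits the brute-force enumeration takes a fraction of a second; at 256 bits the same search is the "try all of them" step above, which is not viable with all the computing power in the world.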
 Let's say you plug in a USB based hardware RNG, after booting. You turn a software entropy source on, after booting. /dev/random would immediately take advantage of the RNG. Already opened /dev/urandom streams wouldn't, until they are closed. (For how many people is this a critical feature?)
I was under the impression that urandom does not wait for initial entropy on Linux. Am I mistaken/has this changed?
 Notably, Solaris does not overcommit for fork.
The main allocation APIs allow for a "reservation" type of allocation, which reserves some of the virtual address space in the process, without actually allocating it. But those need to be converted to real allocations before use, and doing that adds to the process's commit charge.
I never found any API for actually making memory available to a process without increasing that process's commit charge. That would obviously be necessary for real overcommit, as to my knowledge it is supposed to be an invariant of Windows that sumOfAllProcessesCommitCharge < physicalMemory + pageFileSize.
If an API exists for this, I must have missed it.
I suppose a process could simulate overcommit by catching the Access Violation, verifying that the violation was a read/write of a reserved page allocated with some special "virtual overcommit" allocator, and requesting that it be committed, and resuming execution. Needless to say, such user mode page fault handling will be significantly slower than a kernel doing it.