SSH gets protection against side-channel attacks (undeadly.org)
430 points by throw0101a 30 days ago | 158 comments

RAM encryption for sensitive data is overlooked in so many applications; even "highly secure" applications like VeraCrypt [0] only recently started adding it.

In my opinion, server applications of all sorts should encrypt their private keys by default; this makes cold-boot attacks and other memory-escape attacks much harder, since two totally unrelated memory chunks now have to be combined in order to retrieve the private key (in fact one could argue that the more memory chunks are involved, the harder it is to correctly reconstruct the original private key, although to me this has a security-through-obscurity smell).

In practice, however, this is rarely done. OpenSSL, for example, has a whole lot of memory-management functions [1], but none of them seem to include RAM encryption (AFAIK it is not present elsewhere in the source code either, but it's a large codebase, so I am not familiar with all of it. Related: some people even argue it's way too large for its own good [2]).

Better still would be to keep private keys in the CPU cache, since that memory is even more difficult to access through any sort of cold boot. An interesting paper with much more information on this subject can be found here [3], for those interested.

[0] https://github.com/veracrypt/VeraCrypt/commit/321715202aed04...

[1] https://www.openssl.org/docs/man1.1.0/man3/OPENSSL_zalloc.ht...

[2] https://queue.acm.org/detail.cfm?id=2602816

[3] https://www.ieee-security.org/TC/SP2015/papers-archived/6949...

Yup, we added this feature to Varnish Cache a few years ago: random-key encryption. It generates a random key at startup and encrypts all memory with it. Since this kind of memory is only resident for the lifetime of the process, it works. We stored the random key in the Linux kernel using the crypto API [0], because it's not safe to store any kind of key in a memory space used for caching (Cloudbleed [1]). We then use the key to generate a per-object HMAC, so each piece of data ends up with its own key, which further mitigates something like Cloudbleed. Since we used kernel crypto, overhead was about 50%. If you stay completely in user space, it's probably much lower.

[0] https://www.kernel.org/doc/html/v4.17/crypto/userspace-if.ht...

[1] https://en.wikipedia.org/wiki/Cloudbleed

Could this be implemented at the OS level, i.e. whenever a process launches, the OS generates a key that it keeps to itself and uses to transparently encrypt all memory allocated by that process?

My first thought was to try to use 'containers' (cgroups) combined with the AMD secure memory extensions to achieve this type of isolation using as much off the shelf hardware as possible.



From the quick description it sounds like this provides a way of encrypting, per memory page, based on a symmetric key backed by some level of hardware support. It was not clear (on a quick read) how or where to specify the key by which an individual page is encrypted; that would be critical for determining whether this could be used to encipher individual processes and further isolate memory. It sounds like per-process memory isolation might be possible, which is probably the best level of security achievable without resorting to entirely isolated hardware.

Per-process keys aren't really possible, because memory can change process ownership (vmsplice) or be shared across processes (fork, page cache, memfd). It might be possible for pages marked MADV_DONTFORK.

Additionally, a per-process key does not help against Spectre-style attacks, where you trick the process into speculating on protected memory.

You'd probably want a hardware module to do that, lest performance plummet. Memory controllers can already deal with ECC efficiently; adding a simple cipher on top of it should definitely be feasible.

Possibly, but memory is accessed using plain CPU instructions, so it would be hard to transparently encrypt all memory for an application at the kernel level. You do have virtual memory, but I don't think that could be leveraged for this. But who knows what's possible there; maybe if you aligned each memory value at a page boundary and always forced a page fault you could have a really poor implementation :)

Transparent disk encryption is not a problem, since devices have filesystems, which can implement encryption at that layer.

Modern Intel chips can encrypt memory on the fly without performance loss (SGX does this). However I think it's not exposed for non-enclave use. Perhaps it should be.

Note: inside the enclave there is a performance loss but that's due to MAC checks. If you just want encryption without integrity against tampering you don't need that.

But that wouldn't prevent (or mitigate) Cloudbleed anymore, as that problem is about isolating contexts within process boundaries.

Technically yes, but practically no, because mediating all memory reads through the kernel would be very slow.

SME/MKTME add hardware support for this.

Yes. Most research makes CPU modifications since that makes the most sense. Sometimes they try to use OS-level techniques. Here's a survey showing some of each:


Just to clarify my understanding: the reason for doing this is that random sampling/leakage of the contents of RAM stops being useful; you need to specifically get the key (and then presumably a whole chunk of encrypted RAM to decrypt)?

Yup. When something goes wrong in these kinds of applications, you sometimes tend to just randomly dump memory, which is a huge data leak. Or even worse, if someone figures out a way to force a data leak, then you are completely compromised. Giving each piece of data its own key, where that key is derived in part from data outside the process address space, drastically lowers the chances of data leakage and total compromise.

Could you point me to the relevant source code? I'm highly interested in taking a look at it over the weekend.

No offense, I am genuinely curious, why would anyone use any closed source software for anything related to security after the Snowden revelations?

> Closed source

Ah, so we'll just have to trust you that it's doing anything at all, then.

"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."


> "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

> https://news.ycombinator.com/newsguidelines.html

Forgive me but can we not be skeptical of claims made about a commercial product?

Of course you can, and there are plenty of ways to do so that don't break the site guidelines. Cheap, snarky one-liners are not the way. If someone's posting about their own work, there's no need to be disrespectful.

It's also not helpful to post such a clichéd dismissal of what someone else says or their work. That's in the site guidelines too.


Isn’t his reply valid and an exception to the rule given the context?

We have no problem sharing our codebase with customers, especially if there are concerns like this. Shoot me a msg if you are genuinely interested in anything you have read.

I've tried implementing a couple of (toy) password managers over the years and dealing with private keys is genuinely complicated. Even something as trivial as making sure that the memory gets zeroed correctly when you discard the key is trickier than it seems. Compilers these days are very good at detecting "dummy" memsets to memory that's never read afterwards and optimize them away. You have to use some dirty tricks to get the compiler to do what you want (copious amounts of volatile helps).

There's also the problem of making sure that your internal API is sound and doesn't copy the key around. In languages like C it's relatively easy to achieve (because copying a buffer in C normally requires some explicit code) but in higher level languages you might easily mistakenly pass the key object by copy instead of reference, leaving some duplicate of the sensitive information in some other location in RAM.

I think there really should be some kind of industry-standard multiplatform library meant to deal with secret keys that would implement all that behind the scenes and offer a simple API. It's simply too easy to make a dumb mistake when implementing these things, and you won't notice anything is wrong until somebody attempts to actually recover the key somehow.

> Compilers these days are very good at detecting "dummy" memsets to memory that's never read afterwards and optimize them away. You have to use some dirty tricks to get the compiler to do what you want (copious amounts of volatile helps).

Or you can use explicit_bzero(), which is designed for that use case.

Sure, although this is a non-standard and potentially non-portable extension. memset_s in Annex K is standard, but all of Annex K is optional, and like the rest of Annex K, it has an awful interface.

explicit_bzero and SecureZeroMemory are both insecure against the new side-channel attacks we are talking about here. Only memset_s is.

You really need an mfence (a full memory barrier), not just a compiler barrier, maybe even a clflush.

You might find this paper interesting in regard to memsets optimized away:


I agree with you on a standard platform. I'd get a company like Galois to build it. They already have, and have open-sourced, the necessary tooling from their prior contracts.

Yes, ALL private keys should be encrypted and stored elsewhere. The onion/knot should be unwound from some remote place, so you have to get access to a FEW places in order to get any private keys.

Now, in MY opinion, the whole idea of servers storing OR MANIPULATING unencrypted user-submitted information is what’s wrong with our current Internet culture. This needs to stop. It has massive social and economic effects and creates honeypots for hackers.

The sooner we make end-to-end encryption a basic expectation, the sooner we change our society to have a far more private and safe environment for everyone, and correct massive power imbalances: an organization or user should have power and data and connections and ability to SPAM others because people VOLUNTARILY gave them those things, not because they happened to build proprietary software and run it on infrastructure that is needed to operate the application!!!

> in fact one could argue that, the higher the quantity of memory chunks is, the harder it is to correctly decrypt the original private key; although to me this has the security through obscurity smell

I am not sure this is a bad place for security through obscurity. ASLR is similar. The thing is, here we are not using this as a primary mechanism (that would be preventing RCE). Instead, this is a secondary 'defense in depth' measure meant to make it harder to deploy an exploit after you have been compromised.

For such a second layer of defense in depth, obscurity is a decent option. Things go wrong if you start relying on your obscurity to do anything but slow down an attacker.

I feel like there should be a different name for that class of mitigations. Probabilistic security, maybe? Security through obscurity, to me, means "we won't publish our algorithm". But that's not the same as ASLR or memory encryption, which target a specific type of attack by adding probably-too-high randomness to the attack path.

AMD has some interesting work in this area.


RAM encryption is a feature in my password manager (java):


Curiously, RAM encryption, and its relative, clearing secrets from memory when not in use, cannot be added to the Bouncycastle library, because it uses Java's BigInteger (unless reflection is used, of course).

You can't treat memory, even arrays, as fixed memory locations in Java. Stuff moves around because collectors are compacting. If you want to "write in place" in Java you need to either use a non-moving garbage collector, use sun.misc.Unsafe to read/write memory directly, or mayyyyyybe take advantage of humongous allocations (arrays big enough that the JVM won't move them).

Direct ByteBuffer?

Yes, a direct ByteBuffer wrapped in a class that ensures you can never copy the data somewhere else. That would also require cryptographic routines that can operate on ByteBuffers or VarHandles, which is quite an obstacle since many crypto libraries consume keys as byte[].

How do you guarantee actual clearing of physical memory in a garbage collected VM?

By explicitly clearing it when an operation is finished:



    byte[] key = generateKey();
    try {
        // sensitive operation
    } finally {
        Crypto.zero(key);
    }

Once the operation finishes, the memory is cleared (and possibly subject to GC).

The problem is that some Java classes, for instance BigInteger, are not designed for cryptographic operations: their underlying arrays are not easily accessible (save via reflection).

In that code, after generateKey, during the sensitive operation, the system might need to do a garbage collection, at which point the array might have been copied to a different location in memory before your call to zero. You also have to "pin" (AFAIK this is the usual terminology) the array to a fixed location (which would have to be a feature of the runtime and garbage collector) after allocating it but before generating the key.

Java also has no mfence or clflush support, so Crypto.zero would never be secure. It might overwrite the key only in the store buffer, but not on the heap immediately, so it's prone to side-channel attacks.

This is interesting. Could you explain this please?

So what you are saying is that JVM copying surviving object between generational pools might expose secrets in memory?

A very good point, thank you.

What could be done to mitigate this? Direct ByteBuffer per



In .NET you can pin memory in the GC heap, either using something like the fixed statement in C# or GCHandle. A quick Googling did not turn up anything directly usable from Java, except ByteBuffer.

That operation is almost certainly optimized out.

Would you elaborate please?

The compiler will see that your operation does nothing and simply not do it.

What I meant was: do you know this to be the case for JVM 8 and later? This is an interesting subject. Was there a study or a paper you can refer me to?

JVM 8 isn't an implementation, it's a specification.

Assuming you mean Oracle's implementation, it's likely that it is.

https://www.sjoerdlangkemper.nl/2016/05/22/should-passwords-...

http://www.daemonology.net/blog/2014-09-04-how-to-zero-a-buf...

https://man.openbsd.org/explicit_bzero.3

Thank you. I was hoping for a more definitive answer, maybe a reference to an explicit test involving memory dump analysis.

I believe he is referring to compiler optimisation which removes wasteful or extraneous operations. I don't write in Java though, so can't comment more than that.

Ideally, private keys should never touch the main CPU at all. You should use something like a secure enclave or a key dongle like a YubiKey.

No. The CPU cache can easily be read out with side-channel attacks via hyperthreading. It's a similar problem to unencrypted keys at an absolute location, which don't get cleared with explicit_bzero, something e.g. libsodium refused to fix. Hopefully crypto maintainers will come to their senses eventually.


libsodium has the sodium_mshield()/sodium_munshield(). For libhydrogen, storing a large prekey can be a problem, so it may only be implemented on some platforms.

If I were writing a cryptographic algorithm in C++, how would I ensure the CPU cache was used for private keys? Would it have to be written in a lower level language, or does there exist a library for C/C++?

Previous thread with lots of relevant information: https://news.ycombinator.com/item?id=8542405

Short answer is yes usually, and if not, you can always do inline asm. Of course, you're really, really, really not supposed to write your own crypto.

If everyone follows this advice, who will write the crypto code? If anything, we need a lot more people who are formally trained to write proper crypto code, find bugs in such code, etc.

Let me qualify that. You're right, we do need a lot more people. But the answer is, don't write your own, write as part of a team. Ideally, a public and peer-reviewed project. The short answer is many people will work on it together, but don't write your own.

When I first saw this post, I got excited that someone had finally addressed the side channel from 17 years ago, which gathered inter-packet timing from SSH sessions (exploiting the fact that passwords are not echoed) to a) detect when passwords were being typed, and b) use hidden Markov models on keystroke timings to extract the likely passwords.

There have been defenses proposed over the years, but none were accepted by OpenSSH AFAICT.



The proper defense is to not use passwords. It's 2019, nobody should be using passwords on SSH-enabled shell accounts.

Heck, I use secure tokens for authentication so this memory encryption hack is useless to me--even I don't know what my private key(s) are, nor does OpenSSH or any other software on my computers. But I appreciate that we're a long way from ubiquitous hardware-based authentication.

After you are connected you might still want to enter a password (for sudo perhaps).

The solution there is to have the app use line buffering rather than a "raw" term mode that exposes inter-char timing on the network. How widely that's followed in practice, I do not know, but one would certainly hope that sudo does it.

If you really cannot use keys, then one mitigation is to use copy/paste to paste the entire password instead of typing it one character at a time. That can open some copy/paste vulnerabilities e.g. in X11 where any app can then read the password until you copy something else in its place. And a network observer may still determine the password length. But it closes the inter-key timing channel that permits direct character recovery.

Unfortunate that there's no commentary on performance impact. It's symmetric encryption on a few kB, so probably fast, but I'd like to have numbers.

If I understand correctly it's effectively a one-time constant cost when setting up an encrypted connection, so it should be negligible unless you have a use case for setting up many extremely short-lived connections.

SSH has multiplexing [1] specifically to speed up multiple connections... would be interesting to know whether it incurs the overheads once vs. on every connection.

[1] https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Multiplexing
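For reference, multiplexing is opt-in via ControlMaster in ssh_config; something like the following (hostname and timings are example values) shares one connection across sessions, so the handshake cost is paid once per master connection:

```
Host example.com
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```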

Just keep in mind that multiplexing makes phishing risks significantly higher: ssh won't log multiplexed auth, and phished connections are auth-free, up to 9 more sessions by default.

> Unfortunate that there's no commentary on performance impact. It's symmetric encryption on a few kB, so probably fast, but I'd like to have numbers.

TFA comments on the performance impact. It's only about the handshake, i.e. irrelevant.

Saying it's about the handshake is not a comment on performance impact, and doesn't need a patronizing "TFA". Especially when the article just talks about 'signatures', in a way that doesn't make it obvious exactly when this key is being used.

This code is probably not adding more than a millisecond, but it's not at all true that handshake speed is irrelevant. If you rewrote the RSA code in a slow secure way, going from 100ms to 5s on a slow chip, that would have very real effects.

If you need high performance why are you using OpenSSH?

Ssh isn't just for interactive terminal sessions. It's not that uncommon for people to use ssh tunnels in situations where proper VPNs etc. are blocked. Others use sftp or tunnels plus a SOCKS proxy in one specific application. In those applications, performance regressions are a legitimate concern precisely because it's not expected to be all that great to begin with.

If you’re concerned about performance then sftp or SSH tunnels simply aren’t the right tools for you.

If the "better" tools are unavailable, then it's the right tool. Your needs and priorities are not the only ones that exist anywhere in the universe.

I really don’t see a problem here. If you have some weird edge case that requires a less secure version of OpenSSHd then you should either stick with the old version or fork the new version.

The point is that we'd like to know that it's not heavily degrading the existing purpose. No one said anything about high performance.

What part of the existing purpose is performance-critical?

I certainly wouldn't want scp to suddenly slow down compared to what I'm used to. I use scp and sftp rather heavily to transfer very large files and do backups, that might not be performance-critical but it's definitely performance-sensitive.

I haven't had to mess with that in a while but there was a time where you'd get a significant boost with scp on underpowered hosts if you used the arcfour cipher for instance (I believe that it's now fully deprecated, and for good reasons).

If you were willing to sacrifice best practices for performance, one obvious option is to accept only hmac-md5, which is very fast and still somewhat secure.

If the systems involved lack native "AES-NI" CPU opcodes, then you might resort to chacha20-poly1305, which is supposedly faster on CPUs that lack acceleration. This also overrides the hmac-md5 above, as it is an AEAD.

If you needed to go faster still, then you could instead choose ARCFOUR (RC4) as you say, but this has been removed from the latest versions of OpenSSH because it is not safe.

The worst of the above configurations are still not as bad as classic FTP.

Using LFTP's mirror sub-system plus SFTP can be faster than all of the unencrypted protocols. It gives you the behavior of rsync, but can work with sftp+chroot logins and can spawn multiple threads per job or even per file. Working demo [1]

I find it very useful when I want to give people the ability to transfer files quickly, but I don't want to give them a shell.

[1] - https://tinyvpn.org/sftp/#lftp

Tanking DDoS? Like, the more connections you can handle normally, the less filtering you need to do on incoming connections?

OpenSSHd does not “tank DDoS”, it’ll simply consume all resources available to it if attacked.

The only way to mitigate such an attack would be to drop it before it reaches the SSH daemon.

I don't understand the purpose of this question. The existing purpose is that it works at all?

I’m really not convinced that this is a realistic concern.

And anyway, what you’re posing certainly isn’t a performance concern, but a “does the software work” concern.

"I’m really not convinced". Ok great thanks, neither is anyone else, which is why we'd like numbers.

Why do you think that this specific change would make ssh not "work at all" as opposed to the hundreds of other changes that happen? Do you even pay close attention to those changes? I for sure don't.

Why are you concerned about this in the first place? Obviously there's always a general concern about any changes, but what makes this specific change a big deal to you?

This particular change got a hacker news article. It seems to be inherently more noticeable than others.

Seems reasonable to ask about the performance (and memory) implications of the change. If they're minimal, then this solution can easily be added to other situations where encryption is being used. If it's heavy, then different solutions would need to be developed on a case-by-case basis.

edit: your annoyance is not worth anyone's time

>reading way too far into it

Dunno, your earlier comments seem to imply otherwise:

>we'd like to know that its not heavily degrading to the existing purpose... which is why we'd like numbers

Far more important than that, what I want to know is whether I should even care; e.g., is there any evidence of Spectre being used in the wild or not?

A vulnerability that has been shown to work should not be patched in software this widely used around the world to connect all kinds of Linux/Unix servers and even other systems? They should wait for it to start getting exploited "in the wild"? I'm just glad that the security of my systems does not depend on people with this kind of attitude.

Where did I say it shouldn't be patched? I just said this is information I think a user would care to know along with the patch.

You said you shouldn't care if it was not found in the wild; this is just wrong.

You don't even know my requirements or how I use my computers. How are you so confident it's "just wrong"?

If you want to know whether it was found "in the wild" and think that this is a relevant factor in deciding whether you should care, that is enough information about your usage to be confident it is wrong.

It's most definitely not, but I don't get the impression you're open to seeing things as nonbinary.

Ah yes, I too love the panicked rush of trying to force application vendors to patch once we discover something is being exploited in the wild.

Remember, defense in depth is a valid strategy.

What I was trying to say was I think library users care to know how urgent it is. If this is being used in the wild then application vendors might need to provide out-of-band patches somehow, and end users should rush to get those patches. If OTOH this is known to be extremely hard to pull off and not known to be used, then it'd be nice for users to know that just the same. Nowhere was I trying to suggest they should avoid providing the patch altogether or something.

I agree with the idea that availability of information is good, and that information about the context for a security-related change should be made transparent. But how relevant is it? I would think relevant enough for FAQ or other reference information. I wouldn't include it in announcements, though.

The headline is "patch available, mitigating known exploit". "Not yet widely exploited" is barely a footnote. The release of a patch can bring enough attention to make the window between release and full deployment of the patch the single worst time to be vulnerable. If I tell you it wasn't being exploited yesterday, and you delay patching based on that information, and then the storm of exploits blows through ... I'd feel bad.

> I would think relevant enough for FAQ or other reference information. I wouldn't include it in announcements, though.

Maybe you wouldn't, but US-CERT, Mozilla, etc. do...



That's not how security works. If it can potentially lead to a software like openssh leaking secrets, it is of the highest urgency, period. It doesn't matter if it is thought to be hard to exploit and it doesn't matter if it was already found "in the wild".

Okay, but if 1 of my 3 highest urgency vulnerabilities is known to have been exploited in the wild, and is easy to exploit, then I may want to focus on that one over the other 2.

Alright, I should have said it is "critical" instead

> That's not how security works. If it can potentially lead to a software like openssh leaking secrets, it is of the highest urgency, period. It doesn't matter if it is thought to be hard to exploit and it doesn't matter if it was already found "in the wild".

Really? This isn't how security works? Yeah, I guess I forgot security is a 100% binary thing. That's why you never read actual security bulletins advising you when vulnerabilities are actively being exploited in the wild. It's insane to think that should matter or raise the urgency of a patch. [1] [2]

[1] https://www.us-cert.gov/ncas/current-activity/2019/06/18/Moz...

[2] https://www.mozilla.org/en-US/security/advisories/mfsa2019-1...

None. There hasn't even been a real-world demo under normal conditions: a running sshd alongside other programs, with a running browser script exfiltrating a key via Meltdown.

Spectre probably isn't possible either, and that is the easy one. The load-store buffer attack seems completely impossible. Even for the PoC, they essentially had to write a program specially designed to be exploited.

The attacks are very interesting and neat, but I think things like this and other techniques effectively remove any last chance.

Don't underestimate what hackers with enough motivation can achieve, especially when the stakes are so high and the geopolitical power coming from a software vulnerability can be significant.

It's not mentioned because it's irrelevant. There is no security/performance trade-off for a secure program. It's secure or it's not.

I don't think that makes sense. For starters, threat model is everything. To grab an obvious example here, speculative execution attacks are only a concern if an attacker can cause execution of code on your physical host. A secure program facing the network will be secure even if it fails to defend against CPU side channels. Further, reducing to a binary de facto means that literally everything is "insecure", the end. (I defy you to show me a piece of software that never has had any vulnerabilities and never will, since of course undiscovered bugs still make the system insecure)

There is a lot of knee-jerking in this thread. I don't think anybody is arguing that this is a bad patch, but if it does have a notable performance impact (which it probably does not), that's still worth knowing. If only to be able to answer questions like "why do our backups suddenly take 10% more time to complete? Do we have a problem with our network infrastructure?"

It's not about tradeoffs, it's about understanding what's going on and anticipating potential problems.

By your logic we shouldn't have to discuss the performance regressions on CPUs that implement Spectre/Meltdown mitigations, because "it's irrelevant". Obviously these patches are necessary, but the performance impact is very relevant for many users.

By the parent's logic, there's no concept of improving security, because if there remains attacks, then it is still not secure. You can't take a position of "it's either secure or it isn't."

> It's not mentioned because it's irrelevant. There is no security / performance trade off for a secure program. It's secure or its not.

Everything in the world is about trade-offs. Security is no exception. Humans figured this out a long time ago with physical security. Somehow it hasn't sunk in for cybersecurity. I would recommend mentioning this to a well-known security expert if you ever come across one and seeing their reaction.

> It's secure or its not.

Then there is no concept of "improves" security. Either it is now secure or it is not. Do they have proof that there are now no side channel attacks that can be made on their software?

EDIT: sigh This comment is not arguing that they need a proof for their change. It is arguing that you can improve security without making something completely secure, which undermines the idea that code is 'either secure or it is not'.

> It's secure or its not.

Well, it's not. Now that we've debunked performance and security, I guess we can just write everything in PHP.

I think folks are missing your point, which (I think) is something like:

I don't care if that car is faster if it is already on fire.

Degree of security is routinely measured by expense to overcome it.

Could password managers like 1Password (AgileBits Inc.) employ similar techniques? They seem to make very little effort in this regard. E.g. on 1Password, https://discussions.agilebits.com/discussion/101551/article-....

No. I just tested KeePass.

1. I created a new test-entry with a long and random password.

2. Opened the process's memory in the HxD hexeditor.

3. Ctrl+f

4. Found this password 4 times.

Even after I locked KeePass, I still found the password 2 times.

This is bad!
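For the curious, the same experiment is scriptable. A rough sketch for Linux (helper name and test string are mine): it scans the *current* process via /proc/self/maps + /proc/self/mem, so no ptrace privileges are needed; scanning another process, as HxD does on Windows, works the same way given permission to read its memory.

```python
# Count how many times a plaintext secret appears in this process's memory.
import re

def find_in_memory(needle: bytes) -> int:
    hits = 0
    with open("/proc/self/maps") as maps, open("/proc/self/mem", "rb", 0) as mem:
        for line in maps:
            # Only regions mapped with read permission ("r" right after the range).
            m = re.match(r"([0-9a-f]+)-([0-9a-f]+) r", line)
            if not m:
                continue
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            try:
                mem.seek(start)
                data = mem.read(end - start)
            except (OSError, ValueError, OverflowError):
                continue  # e.g. [vsyscall] can't be read this way
            hits += data.count(needle)
    return hits

secret = b"correct-horse-battery-staple-42"
print(find_in_memory(secret))  # at least 1: the plaintext is trivially findable
```

Unless the application encrypts or scrubs the secret, a one-liner like this finds it.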

This kind of mitigation really only makes sense on shared machines (such as servers). On a desktop OS, if an attacker is in a position to read memory from other processes, it's pretty much game over already.

Browsers implement Spectre/Meltdown mitigation on desktop OSes because without that, JS could read secrets from other JS contexts executing in the same process. One of the mitigations is in fact to just segregate JS contexts into different processes depending on the domain they belong to. But most apps don't execute untrusted code so most apps don't have this sort of in-process attacker to worry about.

> On a desktop OS, if an attacker is in a position to read memory from other processes, it's pretty much game over already.

Not really. Password managers have access information to your other accounts in the cloud. Your desktop may not have your most valuable data.

Here is a study on how well various password managers try to scrub their memory to avoid RAM dumps: https://www.securityevaluators.com/casestudies/password-mana...

I've looked at it. There's still a Spectre window of opportunity to get the shielded private host keys.

sshkey_shield_private => explicit_bzero() openbsd-compat/freezero.c


It only uses the insecure freezero(), which in turn uses the insecure explicit_bzero(): a compiler barrier only, no memory barrier. So it's unsafe against the advertised Spectre/Meltdown side-channel attacks; the secrets are still in the caches.

> Attackers must recover the entire prekey with high accuracy before they can attempt to decrypt the shielded private key, but the current generation of attacks have bit error rates that, when applied cumulatively to the entire prekey, make this unlikely.

It seems the real mitigation isn't the prekey size, but the temporal sparseness of the symmetric key -- since I would've imagined attackers would just try to obtain the symmetric key rather than the prekey. Weird to see they didn't even mention this... I imagine attackers would try to find a way to get the symmetric key to stay in memory for a while.

The prekey is hashed into the symmetric key. Both the hash function and the symmetric cipher have avalanche effects that mean that N bit errors require the attacker to bruteforce 2^N combinations.

Unprotected RSA keys, on the other hand, have structure and are dense in memory. That means fewer bit errors and the ability to guess the missing bits faster than O(2^N).
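The avalanche effect is quick to demonstrate (SHA-256 here purely as a stand-in for whatever hash is in play): a single bit error in a 16 KiB "prekey" flips roughly half of the derived key bits, so errors can't be corrected independently.

```python
import hashlib, os

prekey = bytearray(os.urandom(16 * 1024))
key_a = hashlib.sha256(prekey).digest()

prekey[0] ^= 0x01  # one-bit error in the attacker's recovered prekey
key_b = hashlib.sha256(prekey).digest()

# Hamming distance between the two derived 256-bit keys.
differing = sum(bin(a ^ b).count("1") for a, b in zip(key_a, key_b))
print(differing)  # ~128 of 256 bits differ
```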

Yeah but my concern was about direct leakage not brute force. That's what Spectre is about.

Oh yeah, but the temporal sparseness doesn't just apply to the symmetric memory encryption key. The most important part is that it also applies to the asymmetric host keys, which are the actual thing one wants to have protected.

prekey -(hash)-> memory key -(decrypt)-> host private key
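A toy model of that pipeline (illustrative only: OpenSSH reportedly derives an AES-256-CTR key/IV by hashing the 16 KiB prekey, whereas the SHA-256 counter-mode keystream below is just a stdlib-only stand-in, and the function names are mine):

```python
import hashlib, os

PREKEY_BYTES = 16 * 1024  # attacker must recover all of this exactly

def keystream(key: bytes, n: int) -> bytes:
    # Hash-in-counter-mode keystream; a stand-in for a real stream cipher.
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def shield(private_key: bytes):
    prekey = os.urandom(PREKEY_BYTES)
    k = hashlib.sha256(prekey).digest()  # prekey -(hash)-> memory key
    enc = bytes(a ^ b for a, b in zip(private_key, keystream(k, len(private_key))))
    return prekey, enc                   # plaintext key can now be zeroed

def unshield(prekey: bytes, enc: bytes) -> bytes:
    k = hashlib.sha256(prekey).digest()
    return bytes(a ^ b for a, b in zip(enc, keystream(k, len(enc))))

secret = b"-----BEGIN FAKE HOST KEY-----"
prekey, enc = shield(secret)
assert unshield(prekey, enc) == secret
```

Between signing operations only `prekey` and `enc` are resident, so the attacker has to exfiltrate 16 KiB error-free rather than a small dense key.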

I think the assumption is the symmetric key can be stored in AES-NI registers that aren't susceptible to RAM based side channel attacks?

Is this only for OpenBSD's implementation of SSH?

If we wait long enough, it will go all the places that the portable release at openssh.com pushes it.

It will take a few service packs, but even Windows will get it.

    C:\>ssh -V
    OpenSSH_for_Windows_7.6p1, LibreSSL 2.6.4

Does this only protect the SSH host key?

What about the session keys for the sysadmins who have logged in, and maybe keep their session up 24/7?

Session keys are rotated. See "RekeyLimit" in sshd_config(5).
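For example (values illustrative, not a recommendation), in sshd_config:

```
# Rekey after 500 MB of traffic or 30 minutes, whichever comes first.
RekeyLimit 500M 30m
```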

Take my money!


You can donate to OpenSSH[0], whose "funding is generally done via the same donation framework" as the rest of OpenBSD, to which you can donate either directly[1] or via the OpenBSD Foundation[2]. If you're serious about donating obviously please check that these links are legitimate and I'm not a scammer. (I'm not affiliated with OpenBSD in any way.)

[0] https://www.openssh.com/donations.html

[1] https://www.openbsd.org/donations.html

[2] https://www.openbsdfoundation.org/donations.html

$50 donated! I hope that's good for some beers :)

On a side note, the PayPal donation took way too long. If anyone is interested in making open source donations simpler, feel free to ping me.

For your donation, you also received preliminary code for quantum-resistant key exchange.

    $ sed -n '/NTRU/,/enabled/p' ChangeLog
    sntrup4591761x25519-sha512@tinyssh.org using the Streamlined NTRU Prime
    4591^761 implementation from SUPERCOP coupled with X25519 as a stop-loss. Not
    enabled by default.

Krypton Authenticator app anyone? https://krypt.co/docs/security/privacy-policy.html

It could use the x87 floating-point register stack to store the encryption secret. These registers go unused unless some assembly in the SSH libraries accesses them.

Wouldn't they be saved to memory anyway on a context switch?

Yes, but I thought this is done in kernel space and a userspace attack has no access to that while the syscall is served.

That is true - it's probably more secure than storing them in the process, but several of these side-channel attacks can apply to kernel space (depending on hardware and security patches applied).

It's also not portable - OpenSSH runs on non-x86 architectures, and they might not have spare basically unused registers lying around.

Finally, I'm not sure the x87 registers have enough space to fit these keys. You have 8 80-bit registers, for a total of 640 bits. Your typical SSH private key might be 2048 bits or more.

So it's a fun and creative line of thinking, but probably not practical in this case.

> Hopefully we can remove this in a few years time when computer architecture has become less unsafe.

Narrator: It hasn’t.

Couldn't they also move the keys around in memory every second, or keep the bytes of the key separated (this seems like it would be similar to encrypting)?

These side channel attacks even under ideal conditions take a very long time and part of the problem is they basically need to guess at memory addresses. Even when data is in a known location, it is sketchy. Anything that slows down locating data would help immensely.

Does this support all Private Keys across all operations?

Trying to understand the coverage.

Host keys only, as those are the only ones that are long-lived.

Session and cipher keys are ephemeral for forward security, and are rekeyed on a regular basis: see "RekeyLimit" in sshd_config(5).

Anyone with the actual link to the specific commit?

Hm, I wonder if the symmetric encryption with a sophisticated cipher is really necessary in this scenario. The aim is to require an attacker to read 16 KB of memory in addition to the much smaller key data, i.e. for each bit of the key, the attacker needs X additional, random bits.

Wouldn't it be possible to block-wise xor the random data onto the key?

Maybe use a windowing mechanism, where the window is moved forward depending on the random data (e.g. for a 16 bit key, xor random bits 0 to 15, then move forward 4 to 16 bits, depending on the current window; iterate until the maximum number of window movements necessary to go over the whole random data is reached [leak less bits via timing]; if the end of the data is reached, again start at the beginning, but with some offset to avoid result_bit_0 = secret_bit_0 xor random_bit_0 xor random_bit_0).
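Concretely, the plain block-wise XOR version might look like this (a sketch of the idea only, not a vetted construction; names and sizes are illustrative):

```python
import os

def xor_mask(key: bytes, pad: bytes) -> bytes:
    # Fold every key-sized block of the pad onto the key, so recovering
    # the key requires reading the entire pad.
    out = bytearray(key)
    for off in range(0, len(pad), len(key)):
        for i, b in enumerate(pad[off:off + len(key)]):
            out[i] ^= b
    return bytes(out)

key = os.urandom(32)
pad = os.urandom(16 * 1024)
masked = xor_mask(key, pad)
assert xor_mask(masked, pad) == key  # XOR is its own inverse
```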

You want every single bit error to avalanche to the whole key. You also don't want bit errors at some offsets modulo X to cancel each other out. Cryptographic hashes provide these properties. For any custom solution you would first have to prove that it has the security properties needed.

The asymmetric crypto of a connection setup takes the lion's share of CPU cycles. I don't think it's worth the risk just to beat the cheap (relatively speaking) symmetric algorithms.

   -  /* $OpenBSD: authfd.c,v 1.113 2018/12/27 23:02:11 djm Exp $ */ 
   +  /* $OpenBSD: authfd.c,v 1.114 2019/06/21 04:21:04 djm Exp $ */ 
What weird tool/infrastructure do these guys use, such that they maintain CVS/SVN string identifiers in the files but nevertheless use git?

I actually liked these identifiers. There were tools like ident (http://manpages.org/ident/1) to deal with them.

They use CVS, not git. The repository that’s on GitHub is a mirror of the CVS repository; I don’t know what tool is used to do the mirroring.

Here is the original change in authfd.c in cvsweb:


They use CVS; the github repo description says it is a mirror:

> Public git conversion mirror of OpenBSD's official cvs src repository.

> Hopefully we can remove this in a few years time when computer architecture has become less unsafe.

I think the author misspelled 'decades' here :)

I believe all consumer AMD Ryzen CPUs support Secure Memory Encryption (SME), but for some reason it's not enabled by default on most Ryzen devices/motherboards. It's a shame.

I'd also love to see AMD bring an updated and patched version of Secure Encrypted Virtualization (there have been some attacks against it, although still fewer than against Intel's SGX) to consumer Ryzen in the near future. With so many cores available in consumer AMD CPUs (up to 16 now), people will start to use VMs more. Even Windows 10 has the easy-to-use Windows Sandbox now, as well as the App Guard sandbox for Edge.

Not to mention they could use this as yet another "killer app" of their many-core CPUs, because otherwise people will eventually start to wonder why even get CPUs with so many cores over CPUs with fewer cores but higher singlethread performance. No different than say Verizon promoting high-quality 4k streaming on its new 5G network.

I would've already preferred to see this in Zen 2, but at least Zen 3, which will otherwise bring few performance improvements and remain on the 7nm node, should come with these as some sort of "security-focused generation of Zen".

Isn't AMD's encrypted memory meant more for the case of someone with physical access aggressively cooling a running system, then cutting power and removing the chilled memory for analysis (which will preserve contents much longer when cold than at normal temperatures)?

The case of an attack on the SSH Agent would take place within the CPU, perhaps even in the same core on a separate HT/SMT thread, where the memory would be cleartext.

..or forgot to add "/sarcasm".

But why? For security-critical software like this, they should assume as little as possible. In essence, you want to make the algorithms immune to side-channel attacks where possible.

Because the more complexity you have in software, the harder it is to keep it secure. Even security mitigations can potentially introduce other vulnerabilities. This is one of the reasons that, as a general rule, we should strive to keep software simple.

Given the nature of this software, it's natural to have mitigations against side-channel attacks. They have happened multiple times, and will happen again in the future, no matter how secure we believe the hardware is.

With that in mind, it's probably the better strategy to use slower and more complicated algorithms to protect the user. That way, when a side-channel attack becomes known and the algorithm already protects against it, nothing has to be done; unlike when a fix needs to be made, you don't face the problem you've outlined. I believe it's better to have a better security baseline at the cost of complexity, because it means fewer hotfixes need to be released.

Side channel attacks are only possible because the hardware is currently vulnerable. They are not a law of nature. Once you solve the vulnerability at its root and it becomes physically inexistent, and there's no more running hardware in the market that has such vulnerability, it would make no sense to keep such software mitigation.

Clive Robinson on Schneier's blog predicted lots of these problems after arguing they were a law of nature. He said any form of matter or energy connecting two machines might create a side channel. He said we'd have to clock all the inputs and outputs, make them predictable, and then "energy gap" the systems. We both already knew about CPU leaks since that was described as risky in 1990's. Sure enough, air-gap-jumping malware and processor leaks showed up.

Later, getting a high-level view of hardware reinforced it was a law of nature. First, there's all kinds of RF leaks that attackers might pick up from normal operation. Second, most systems aren't fault/leak-proof if attackers actively hit the system with different physical effects or RF. Finally, each process shrink increases how easily chips, including mitigations on them, break. It looked like the stuff at 28nm was kind of broken by design with fixes and stuff built in to delay failures user would notice.

This all sounds like the laws of physics are a huge obstacle to computers (a) working at all and (b) keeping secrets. Achieving (a) takes hundreds of millions to billions in R&D each year. I can only imagine what (b) might take.

Businesses still run mainframes from the 1970s - so, get ready to wait a century.

Well yes, but that's a pretty special case. Those running such ancient hardware keep their own software and patches, and you don't see many software vendors supporting them unless they are being very well paid. OpenBSD itself no longer even supports VAX, and developers felt pretty happy when they finally deleted large portions of specific code. Ubuntu is talking about dropping i386. Given enough time, the burden of supporting old hardware outweighs the benefits for pretty much everybody, so it makes sense that it should fall on the shoulders of those who decided keeping the old hardware was a good idea.
