In my opinion, server applications of all sorts should encrypt their private keys by default; this makes cold-boot attacks and other memory-disclosure attacks much harder, since two entirely unrelated memory chunks must now be combined to retrieve the private key. One could even argue that the more memory chunks are involved, the harder it is to correctly reconstruct the original private key, although to me that has a security-through-obscurity smell.
In practice, however, this is rarely done. OpenSSL, for example, has a whole lot of memory-management functions, but none of them seem to include RAM encryption (AFAIK it is not present elsewhere in the source code either, but it's a large codebase, so I am not familiar with all of it. Related: some people even argue it's far too large for its own good).
Better still would be to use the CPU cache for private keys, since that memory is even harder to reach with any sort of cold-boot attack. An interesting paper with much more information on the subject can be found here, for those interested.
From the quick description it sounds like this provides a way of encrypting memory per page, based on a symmetric key backed by some level of hardware encryption. It was not clear (on a quick read) how or where to specify the key by which an individual page is encrypted; that would be critical for determining whether this could be used to encrypt individual processes and further isolate their memory. It sounds like per-process memory isolation might be possible, which is probably the best level of security achievable without resorting to entirely separate hardware.
Additionally, a per-process key does not help against Spectre-style attacks, where you trick the process into speculating on protected memory.
Transparent disk encryption isn't a problem, since devices have filesystems that can implement encryption at that layer.
Note: inside the enclave there is a performance loss, but that's due to MAC checks. If you just want encryption without integrity protection against tampering, you don't need those.
SME/MKTME add hardware support for this.
Ah, so we'll just have to trust you that it's doing anything at all, then.
Forgive me but can we not be skeptical of claims made about a commercial product?
It's also not helpful to post such a clichéd dismissal of what someone else says or their work. That's in the site guidelines too.
There's also the problem of making sure that your internal API is sound and doesn't copy the key around. In languages like C it's relatively easy to achieve (because copying a buffer in C normally requires some explicit code) but in higher level languages you might easily mistakenly pass the key object by copy instead of reference, leaving some duplicate of the sensitive information in some other location in RAM.
I think there really should be some kind of industry-standard multiplatform library meant to deal with secret keys that would implement all that behind the scenes and offer a simple API. It's simply too easy to make a dumb mistake when implementing these things, and you won't notice anything is wrong until somebody attempts to actually recover the key somehow.
Or you can use explicit_bzero(), which is designed for that use case.
You really need an mfence (a full memory barrier), not just a compiler barrier, and maybe even a clflush.
I agree with you on a standard platform. I'd get a company like Galois to build it. They have already built and open-sourced the necessary tooling from their prior contracts.
Now, in MY opinion, the whole idea of servers storing OR MANIPULATING unencrypted user-submitted information is what’s wrong with our current Internet culture. This needs to stop. It has massive social and economic effects and creates honeypots for hackers.
The sooner we make end-to-end encryption a basic expectation, the sooner we change our society into a far more private and safe environment for everyone, and correct massive power imbalances: an organization or user should have power, data, connections, and the ability to SPAM others only because people VOLUNTARILY gave them those things, not because they happened to build proprietary software and run it on the infrastructure needed to operate the application!
I am not sure this is a bad place for security through obscurity. ASLR is similar. The thing is, we are not using this as a primary mechanism (that would be preventing RCE). Instead, this is a secondary defense-in-depth measure, meant to make it harder to deploy an exploit after you have already been compromised.
For such a second layer of defense in depth, obscurity is a decent option. Things go wrong if you start relying on your obscurity to do anything but slow down an attacker.
Curiously, RAM encryption and its relative, clearing secrets from memory when not in use, cannot be added to the Bouncy Castle library, because it uses Java's BigInteger (unless reflection is used, of course).
    byte[] key = generateKey();
    // ... sensitive operation ...
    Arrays.fill(key, (byte) 0);
Once the operation finishes, the memory is cleared (and possibly subject to GC). The problem is that some Java classes, for instance BigInteger, are not designed for cryptographic operations: their underlying arrays are not easily accessible (save for reflection).
A very good point, thank you.
What could be done to mitigate this? A direct ByteBuffer, perhaps?
Assuming you mean Oracle's implementation, it's likely that it is.
Short answer is yes usually, and if not, you can always do inline asm. Of course, you're really, really, really not supposed to write your own crypto.
There have been defenses proposed over the years, but none were accepted by OpenSSH AFAICT.
Heck, I use secure tokens for authentication so this memory encryption hack is useless to me--even I don't know what my private key(s) are, nor does OpenSSH or any other software on my computers. But I appreciate that we're a long way from ubiquitous hardware-based authentication.
TFA comments on the performance impact. It's only about the handshake, i.e. irrelevant.
This code is probably not adding more than a millisecond, but it's not at all true that handshake speed is irrelevant. If you rewrote the RSA code in a slow secure way, going from 100ms to 5s on a slow chip, that would have very real effects.
I haven't had to mess with that in a while but there was a time where you'd get a significant boost with scp on underpowered hosts if you used the arcfour cipher for instance (I believe that it's now fully deprecated, and for good reasons).
If the systems involved lack "AES-NI" native CPU opcodes, then you might switch to chacha20-poly1305, which is supposedly faster on CPUs without hardware acceleration. This also overrides the hmac-md5 above, as it is an AEAD.
If you needed to go faster still, you could instead choose ARCFOUR (RC4), as you say, but it has been removed from the latest versions of OpenSSH because it is not safe.
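As a sketch, cipher preference can be pinned per host in ~/.ssh/config (the host name "slowbox" is hypothetical; the cipher names are real OpenSSH identifiers):

```
# Prefer ChaCha20-Poly1305 on a host whose CPU lacks AES-NI
Host slowbox
    Ciphers chacha20-poly1305@openssh.com,aes128-ctr
```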
The worst of the above configurations are still not as bad as classic FTP.
I find it very useful when I want to give people the ability to transfer files quickly, but I don't want to give them a shell.
 - https://tinyvpn.org/sftp/#lftp
The only way to mitigate such an attack would be to drop it before it reaches the SSH daemon.
And anyway, what you're positing certainly isn't a performance concern, but a "does the software work" concern.
Why are you concerned about this in the first place? Obviously there's always a general concern about any changes, but what makes this specific change a big deal to you?
Seems reasonable to ask about the performance (and memory) implications of the change. If they're minimal, then this solution can easily be added to other situations where encryption is used. If it's heavy, then different solutions will need to be developed on a case-by-case basis.
Dunno, your earlier comments seem to imply otherwise:
>we'd like to know that its not heavily degrading to the existing purpose... which is why we'd like numbers
Remember, defense in depth is a valid strategy.
The headline is "patch available, mitigating known exploit". "Not yet widely exploited" is barely a footnote. The release of a patch can bring enough attention to make the window between release and full deployment of the patch the single worst time to be vulnerable. If I tell you it wasn't being exploited yesterday, and you delay patching based on that information, and then the storm of exploits blows through ... I'd feel bad.
Maybe you wouldn't, but US-CERT, Mozilla, etc. do...
Really? This isn't how security works? Yeah, I guess I forgot security is a 100% binary thing. That's why you never read actual security bulletins advising you when vulnerabilities are actively being exploited in the wild. It's insane to think that should matter or raise the urgency of a patch.  
Spectre probably isn't possible either, and that is the easy one. The load-store buffer attack seems completely impossible. Even for the PoC, they essentially had to write a program specifically designed to be exploitable.
The attacks are very interesting and neat, but I think things like this and other techniques effectively remove any remaining chance of exploitation.
It's not about tradeoffs, it's about understanding what's going on and anticipating potential problems.
By your logic we shouldn't have to discuss the performance regressions on CPUs that implement Spectre/Meltdown mitigations, because "it's irrelevant". Obviously these patches are necessary, but the performance impact is very relevant for many users.
Everything in the world is about trade-offs. Security is no exception. Humans figured this out a long time ago with physical security. Somehow it hasn't sunk in for cybersecurity. I would recommend mentioning this to a well-known security expert if you ever come across one and seeing their reaction.
Then there is no concept of "improves" security. Either it is now secure or it is not. Do they have proof that there are now no side channel attacks that can be made on their software?
EDIT: sigh This comment is not arguing that they need a proof for their change. It is arguing that you can improve security without making something completely secure, which undermines the idea that code is 'either secure or it is not'.
Well, it's not. Now that we've debunked performance and security, I guess we can just write everything in PHP.
I don't care if that car is faster if it is already on fire.
1. I created a new test-entry with a long and random password.
2. Opened the process's memory in the HxD hexeditor.
3. Found this password 4 times.
Even after I locked KeePass, I still found the password 2 times.
This is bad!
Browsers implement Spectre/Meltdown mitigation on desktop OSes because without that, JS could read secrets from other JS contexts executing in the same process. One of the mitigations is in fact to just segregate JS contexts into different processes depending on the domain they belong to. But most apps don't execute untrusted code so most apps don't have this sort of in-process attacker to worry about.
Not really. Password managers have access information to your other accounts in the cloud. Your desktop may not have your most valuable data.
sshkey_shield_private => explicit_bzero()
It's only using the insecure freezero(), which uses the insecure explicit_bzero(): a simple compiler barrier only, no memory barrier. So it's unsafe against the advertised Spectre/Meltdown side-channel attacks; the secrets are still in the caches.
It seems the real mitigation isn't the prekey size, but the temporal sparseness of the symmetric key, since I would've imagined attackers would just try to obtain the symmetric key rather than the prekey. Weird that they didn't even mention this... I imagine attackers would try to find a way to make the symmetric key stay in memory for a while.
Unprotected RSA keys, on the other hand, have structure and are dense in memory. That means fewer bit errors and the ability to guess the missing bits faster than O(2^N).
prekey -(hash)-> memory key -(decrypt)-> host private key
It will take a few service packs, but even Windows will get it.
OpenSSH_for_Windows_7.6p1, LibreSSL 2.6.4
What about the session keys for the sysadmins who have logged in, and maybe keep their session up 24/7?
You can donate to OpenSSH, whose "funding is generally done via the same donation framework" as the rest of OpenBSD, to which you can donate either directly or via the OpenBSD Foundation. If you're serious about donating obviously please check that these links are legitimate and I'm not a scammer. (I'm not affiliated with OpenBSD in any way.)
On a sidenote paypal donation took way too long. If anyone is interested in making open source donations simpler feel free to ping me.
$ sed -n '/NTRU/,/enabled/p' ChangeLog
firstname.lastname@example.org using the Streamlined NTRU Prime
4591^761 implementation from SUPERCOP coupled with X25519 as a stop-loss. Not
enabled by default.
It's also not portable - OpenSSH runs on non-x86 architectures, and they might not have spare basically unused registers lying around.
Finally, I'm not sure the x87 registers have enough space to fit these keys. You have 8 80-bit registers, for a total of 640 bits. Your typical SSH private key might be 2048 bits or more.
So it's a fun and creative line of thinking, but probably not practical in this case.
Narrator: It hasn’t.
These side channel attacks even under ideal conditions take a very long time and part of the problem is they basically need to guess at memory addresses. Even when data is in a known location, it is sketchy. Anything that slows down locating data would help immensely.
Trying to understand the coverage.
Session and cipher keys are ephemeral for forward security, and are rekeyed on a regular basis: see "RekeyLimit" in sshd_config(5).
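For example, in sshd_config (the values here are illustrative, not recommendations):

```
# Rekey after 1 GiB of transferred data or 1 hour, whichever comes first
RekeyLimit 1G 1h
```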
Wouldn't it be possible to block-wise xor the random data onto the key?
Maybe use a windowing mechanism, where the window is moved forward depending on the random data (e.g. for a 16-bit key, XOR random bits 0 to 15, then move the window forward by 4 to 16 bits depending on the current window; iterate until the maximum number of window movements needed to cover the whole random data is reached [to leak fewer bits via timing]; if the end of the data is reached, start again at the beginning, but with some offset, to avoid result_bit_0 = secret_bit_0 xor random_bit_0 xor random_bit_0).
The asymmetric crypto of a connection setup takes the lion's share of CPU cycles. I don't think it's worth the risk just to beat the cheap (relatively speaking) symmetric algorithms.
- /* $OpenBSD: authfd.c,v 1.113 2018/12/27 23:02:11 djm Exp $ */
+ /* $OpenBSD: authfd.c,v 1.114 2019/06/21 04:21:04 djm Exp $ */
I actually liked these identifiers. There were tools like ident (http://manpages.org/ident/1) to deal with them.
Here is the original change in authfd.c in cvsweb:
> Public git conversion mirror of OpenBSD's official cvs src repository.
I think the author misspelled 'decades' here :)
I'd also love to see AMD bring an updated and patched version of Secure Encrypted Virtualization (there have been some attacks against it, although still fewer than against Intel's SGX) to consumer Ryzen in the near future. With so many cores available in consumer AMD CPUs (up to 16 now), people will start to use VMs more. Even Windows 10 has the easy-to-use Windows Sandbox now, as well as the App Guard sandbox for Edge.
Not to mention they could use this as yet another "killer app" of their many-core CPUs, because otherwise people will eventually start to wonder why even get CPUs with so many cores over CPUs with fewer cores but higher singlethread performance. No different than say Verizon promoting high-quality 4k streaming on its new 5G network.
I would've already preferred to see this in Zen 2, but at least Zen 3, which will otherwise bring few performance improvements and remain on the 7nm node, should come with these as some sort of "security-focused generation of Zen".
The case of an attack on the SSH Agent would take place within the CPU, perhaps even in the same core on a separate HT/SMT thread, where the memory would be cleartext.
With that in mind, it's probably the better strategy to use slower and more complicated algorithms to protect the user. That way, when a side-channel attack becomes known, if the algorithm already protects against it, nothing has to be done; whereas if a fix needs to be made, you now face the problem you've outlined. I believe it's better to have better baseline security at the cost of complexity, because it means fewer hotfixes need to be released.
Later, getting a high-level view of hardware reinforced that it was a law of nature. First, there are all kinds of RF leaks that attackers might pick up from normal operation. Second, most systems aren't fault- or leak-proof if attackers actively hit the system with different physical effects or RF. Finally, each process shrink increases how easily chips, including the mitigations on them, break. It looked like the stuff at 28nm was kind of broken by design, with fixes built in to delay the failures a user would notice.
This all sounds like the laws of physics are a huge obstacle to computers (a) working at all and (b) keeping secrets. Achieving (a) takes hundreds of millions to billions in R&D each year; I can only imagine what (b) might take.