Part 2 is correct in that trying to zero memory to "cover your tracks" is an indication that You're Doing It Wrong, but I disagree that this is a language issue.
Even if you hand-wrote some assembly, carefully managing where data is stored and wiping registers after use, you still end up with information leakage. Typically the CPU cache hierarchy is going to end up with some copies of keys and plaintext. You know that? OK, then did you know that typically a "cache invalidate" operation doesn't actually zero its data SRAMs, and just resets the tag SRAMs? There are instructions on most platforms to read these back (if you're at the right privilege level). Timing attacks are also possible unless you hand-wrote that assembly knowing exactly which platform it's going to run on. Intel et al. have a habit of making things like multiply-add take a "fast path" depending on the input values, so you end up leaking the magnitude of the inputs.
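To make the timing point concrete: the usual software-side discipline is to avoid data-dependent branches and early exits, as in this constant-time comparison sketch. Note that this kind of code does nothing about the data-dependent instruction latencies above; it only removes the data dependence you control.

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch of a constant-time comparison: no data-dependent branches or
       early exits, so the running time doesn't reveal where the buffers
       first differ. Returns 0 iff the buffers are equal. */
    int ct_compare(const uint8_t *a, const uint8_t *b, size_t len)
    {
        uint8_t diff = 0;
        for (size_t i = 0; i < len; i++)
            diff |= a[i] ^ b[i];
        return diff;
    }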
Leaving aside timing attacks (which are just an algorithm and instruction selection problem), the right solution is isolation. Often people go for physical isolation: hardware security modules (HSMs). A much less expensive solution is sandboxing: stick these functions in their own process, with a thin channel of communication. If you want to blow away all its state, then wipe every page that was allocated to it.
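A minimal sketch of that sandboxing idea, assuming a made-up fixed-size request/reply format: keep the key material in a child process and talk to it over a socketpair. When the child exits, its pages go back to the kernel, which zeroes them before handing them to anyone else.

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <string.h>

    int main(void)
    {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
            return 1;

        pid_t pid = fork();
        if (pid == 0) {                 /* child: holds the secrets */
            close(sv[0]);
            unsigned char request[64], reply[64];
            while (read(sv[1], request, sizeof(request)) == (ssize_t)sizeof(request)) {
                /* ...do the private-key operation here, never exporting the key... */
                memset(reply, 0, sizeof(reply));
                write(sv[1], reply, sizeof(reply));
            }
            _exit(0);                   /* all of its pages die with it */
        }

        close(sv[1]);                   /* parent: only ever sees requests and replies */
        /* write(sv[0], ...); read(sv[0], ...); */
        close(sv[0]);
        waitpid(pid, NULL, 0);
        return 0;
    }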
Trying to tackle this without platform support is futile, even if you have language support. I've always frowned at attempts to make userland crypto libraries "cover their tracks" because it's an attempt to protect a process from itself. That engineering effort would be better spent on some actual, hardware-supported separation, such as process isolation.
The "right privilege level" allows you to see anything that happens during the execution of the lower privilege levels. I can even single-step your application with the right privilege level. So the crypto services have to run at the high privilege level and ideally your applications should leave even the key management to the "higher privilege levels." That way attacking the application can leak the data, but not the key, that is, you can still have the "perfect forward secrecy" from the point of the view of the application. So you have to trust the OS and the hardware and implement all the tricky things on that level. Trying to solve anything like that on the language level doesn't seem to be the right direction of the attacking the problem.
So is it correct to say that if a process does not want to leak information to other processes with different user IDs running under the same kernel, then a necessary (but not necessarily sufficient, due to things like timing attacks) condition is for it to ensure that any allocated memory is zeroed before being freed?
I wonder if current VM implementations are doing this systematically.
It seems like a kernel API to request "secure" memory, with the kernel guaranteeing the zeroing, would be useful. Without this I'm wondering if it's even possible for a process to ensure that physical memory is zeroed, since it can only work with virtual memory.
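For what it's worth, Linux already exposes some of the pieces: mlock(2) to keep pages off swap and madvise(2) with MADV_DONTDUMP to keep them out of core dumps. A sketch of wrapping them, with no claim that this pins down what happens at the physical or cache level:

    #include <sys/mman.h>
    #include <string.h>

    /* Sketch: ask the kernel to keep a buffer off swap and out of core
       dumps, then wipe it before handing it back. */
    void *secure_alloc(size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        mlock(p, len);                    /* keep it out of swap */
    #ifdef MADV_DONTDUMP
        madvise(p, len, MADV_DONTDUMP);   /* keep it out of core dumps (Linux) */
    #endif
        return p;
    }

    void secure_free(void *p, size_t len)
    {
        memset(p, 0, len);                /* explicit_bzero() where available */
        munlock(p, len);
        munmap(p, len);
    }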
All kernels I know of zero all memory they hand over to user processes. It's been part of basic security for quite some time - exactly for this kind of thing. It's usually done on allocation, not on free - it doesn't really matter which way around, but doing it "lazily" often gives better performance.
In that case your original comment looks like the way to go and should make pretty much everything else in this thread moot.
It seems like the key though is ensuring that your environment uses distinct non-root users for all security relevant processes so that a security bug in one process doesn't allow the attacker to gain access to others.
EDIT: On second thought there may be some advantage to effectively zeroing memory for security critical data within a process but the likely value add seems low to me. Once a process has been hacked it seems pretty unlikely that you can hope to control what information it leaks.
Actually, use of uninitialized memory is a reasonably common flaw, and it doesn't imply the process has been or can be hacked to execute arbitrary code.
So wiping that sort of information as soon as it becomes unneeded is good hygiene. And I still think it is reasonable to do the least you can to avoid ending up with sensitive data on the disk after a core dump.
Use of uninitialized memory is certainly a common bug, but I'm not seeing what that has to do with zeroing freed memory. It might be easier to detect such a bug if the uninitialized memory is zeroed, but it seems like the work devoted to zeroing memory would be better spent fixing the uninitialized memory accesses.
As for the second point, production software isn't typically configured to produce core dumps (i.e. ulimit -c 0).
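(If a process wants to enforce that itself rather than rely on the environment's ulimit, it can. A sketch using setrlimit(2), plus the Linux-specific prctl(2) knob:

    #include <sys/resource.h>
    #ifdef __linux__
    #include <sys/prctl.h>
    #endif

    /* Sketch: refuse to produce core dumps regardless of the inherited ulimit. */
    static void no_core_dumps(void)
    {
        struct rlimit rl = { 0, 0 };
        setrlimit(RLIMIT_CORE, &rl);
    #ifdef __linux__
        prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);   /* Linux only */
    #endif
    }

)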
It's not so hard to zero memory when it becomes unused. So libraries like LibreSSL do that. Increasingly, other applications are also starting to use this pattern. It is easier to add a few safeguards into the library than it is to fix every past, present and future application that uses it.
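The tricky part is making sure the compiler doesn't throw the wipe away as a "dead store." LibreSSL ships explicit_bzero() for this (and OpenBSD adds freezero(), if I recall correctly). Where those aren't available, the usual portable sketch routes the call through a volatile function pointer:

    #include <string.h>

    /* Sketch: a wipe the optimizer shouldn't eliminate. Because the call
       goes through a volatile function pointer, the compiler has to assume
       it could do anything and must actually perform it. Prefer
       explicit_bzero()/memset_s() where the platform provides them. */
    typedef void *(*memset_t)(void *, int, size_t);
    static volatile memset_t memset_fn = memset;

    void secure_wipe(void *buf, size_t len)
    {
        memset_fn(buf, 0, len);
    }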
It's a start. Adding the safeguard doesn't mean effort won't be put into fixing the actual bugs. But you just don't fix all the world's bugs overnight. That's why things like virtual memory, permissions, chroots, ASLR, NX, SSP and such exist.
How many systems enable core dumps by default? I don't actually know, but I think quite a few do. Every application you use to get stuff done is a production application. Every application that handles sensitive information handles sensitive information whether it is in production or not. Leaking passwords and keys can be as simple as working on some client software, having it crash once, then passing through airport security and getting your HD snooped on...
I understand your point, but it seems like this approach ends up making security dependent on a very deep stack of technological solutions, each rather fragile (as this post and thread demonstrate).
I wonder if it wouldn't make more sense to do a first principles analysis of what needs to be protected and then design mechanisms at the appropriate level of abstraction to ensure that these requirements are met. It seems to me that this is the approach that has been traditionally taken in OS level design and I agree that it hasn't worked very well. But I wonder if that isn't more because applications and environments are not being carefully designed to take advantage of the OS level security mechanisms that already exist.
Personally I would feel more confident depending on a robust kernel level security mechanism than a hodgepodge of application level fixes that depend on everything from compiler optimizations to CPU caching mechanisms.
I, for one, welcome new research. But it also needs to be demonstrated in practice before it can be used in practice. Until then, it's all hypotheticals and wishful thinking. Unicorns, basically.
In the meanwhile, I do what I can to review, audit, and apply best practices at application level. These are the things we can do here and now. These are things that are already in use to make your system more secure.
You're right that it is a hodgepodge of tricks and never quite perfect or capable of blocking all attacks. In an ideal world someone would design and give us a system that provides perfect security right out of the box, in a small & elegant & easy to understand manner.
I'm not smart enough to do that so I'll only dream of the unicorns. :-)