
Zeroing buffers is insufficient - MartinodF
http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html
======
pslam
Part 2 is correct in that trying to zero memory to "cover your tracks" is an
indication that You're Doing It Wrong, but I disagree that this is a
_language_ issue.

Even if you hand-wrote some assembly, carefully managing where data is stored,
and wiping registers after use, you still end up with information leakage. Typically
the CPU cache hierarchy is going to end up with some copies of keys and
plaintext. You know that? OK, then did you know that typically a "cache
invalidate" operation doesn't actually zero its data SRAMs, and just resets
the tag SRAMs? There are instructions on most platforms to read these back (if
you're at the right privilege level). Timing attacks are also possible unless
you hand-wrote that assembly knowing exactly which platform it's going to run
on. Intel et al have a habit of making things like multiply-add have a "fast
path" depending on the input values, so you end up leaking the magnitude of
inputs.

Leaving aside timing attacks (which are just an algorithm and instruction
selection problem), the right solution is isolation. Often people go for
physical isolation: hardware security modules (HSMs). A much less expensive
solution is sandboxing: stick these functions in their own process, with a
thin channel of communication. If you want to blow away all its state, then
wipe every page that was allocated to it.
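
To make that concrete, here is a minimal sketch (my own toy example, not from
the article): the key lives only in a forked child, the parent talks to it over
a socketpair, and a placeholder XOR stands in for the real cipher. Killing the
child destroys its state, and the kernel will zero its pages before reuse.

    #include <stdint.h>
    #include <signal.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define MSGLEN 64

    /* Child process: the only place the key ever exists. */
    static void crypto_worker(int fd)
    {
        uint8_t key[32] = { 0x42 };   /* imagine the real key loaded here */
        uint8_t buf[MSGLEN];

        while (read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf)) {
            for (size_t i = 0; i < sizeof(buf); i++)
                buf[i] ^= key[i % sizeof(key)];     /* placeholder "cipher" */
            if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
                break;
        }
        _exit(0);
    }

    int main(void)
    {
        int sv[2];
        uint8_t msg[MSGLEN] = "hello";

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
            return 1;

        pid_t pid = fork();
        if (pid == 0) {                 /* child: the isolated crypto process */
            close(sv[0]);
            crypto_worker(sv[1]);
        }
        close(sv[1]);

        (void)write(sv[0], msg, sizeof(msg));   /* request */
        (void)read(sv[0], msg, sizeof(msg));    /* response */

        kill(pid, SIGKILL);             /* blow away all of the child's state */
        waitpid(pid, NULL, 0);
        return 0;
    }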

Trying to tackle this without platform support is futile. Even if you have
language support. I've always frowned at attempts to make userland crypto
libraries "cover their tracks" because it's an attempt to protect a process
from itself. That engineering effort would have been better spent making some
actual, hardware supported separation, such as process isolation.

~~~
tgflynn
So is it correct to say that if a process does not want to leak information to
other processes with different user IDs running under the same kernel that a
necessary (but not necessarily sufficient, due to things like timing attacks)
condition is for it to ensure that any allocated memory is zero'd before being
free'd ?

I wonder if current VM implementations are doing this systematically.

It seems like a kernel API to request "secure" memory and then have the kernel
ensure zeroing would be useful. Without this I'm wondering if it's even
possible for a process to ensure that physical memory is zero'd, since it can
only work with virtual memory.
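
Something along those lines already exists, though it's weaker than guaranteed
zero-on-free. A hedged sketch of the existing Linux/POSIX knobs (the wrapper
functions are just my example): mlock() keeps the pages out of swap and
madvise(MADV_DONTDUMP) keeps them out of core dumps, but you still have to
wipe them yourself.

    #include <string.h>
    #include <sys/mman.h>

    /* Allocate a buffer that won't be swapped out or written to core dumps. */
    void *alloc_secret(size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        (void)mlock(p, len);                   /* keep it out of swap */
    #ifdef MADV_DONTDUMP
        (void)madvise(p, len, MADV_DONTDUMP);  /* keep it out of core dumps */
    #endif
        return p;
    }

    /* Best-effort wipe before returning the pages to the kernel. */
    void free_secret(void *p, size_t len)
    {
        memset(p, 0, len);   /* subject to the caveats discussed in the article */
        munlock(p, len);
        munmap(p, len);
    }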

~~~
pslam
All kernels I know of zero all memory they hand over to user processes. It's
been part of basic security for quite some time - exactly for this kind of
thing. It's usually done on allocation, not free - it doesn't really matter
which way around, but doing it "lazily" can often be better performance.
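
Easy to check on Linux or the BSDs: freshly mapped anonymous pages come back
zeroed (memory recycled by malloc within the same process is a different
story).

    #include <assert.h>
    #include <stddef.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1 << 20;
        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(p != MAP_FAILED);
        for (size_t i = 0; i < len; i++)
            assert(p[i] == 0);      /* fresh pages from the kernel are zeroed */
        munmap(p, len);
        return 0;
    }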

~~~
tgflynn
In that case your original comment looks like the way to go and should make
pretty much everything else in this thread moot.

It seems like the key though is ensuring that your environment uses distinct
non-root users for all security relevant processes so that a security bug in
one process doesn't allow the attacker to gain access to others.

EDIT: On second thought there may be some advantage to effectively zeroing
memory for security critical data within a process but the likely value add
seems low to me. Once a process has been hacked it seems pretty unlikely that
you can hope to control what information it leaks.

~~~
clarry
Actually, use of uninitialized memory is a reasonably common flaw and doesn't
imply the process has been or can be hacked to execute arbitrary code.

So wiping that sort of information as soon as it becomes unneeded is good
hygiene. And I still think it is reasonable to do the least you can to avoid
ending up with sensitive data on the disk after a core dump.

~~~
tgflynn
Use of uninitialized memory is certainly a common bug but I'm not seeing what
that has to do with zeroing free'd memory. It might be easier to detect such a
bug if the uninitialized memory is zero'd but it seems like the work devoted
to zeroing memory would be better spent fixing the uninitialized memory
accesses.

As for the second point, production software isn't typically configured to
produce core dumps (i.e. ulimit -c 0).

~~~
clarry
It's not so hard to zero memory when it becomes unused. So libraries like
LibreSSL do that. Increasingly, other applications are also starting to use
this pattern. It is easier to add a few safeguards into the library than it is
to fix every past, present and future application that uses it.
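
LibreSSL/OpenBSD expose explicit_bzero() for exactly this. Where that isn't
available, one widely used trick (a sketch, not anything LibreSSL-specific) is
to call memset through a volatile function pointer so the compiler can't prove
the call away:

    #include <string.h>

    /* The compiler must load the volatile pointer at runtime, so it cannot
     * assume the call is a plain memset and dead-store-eliminate it. */
    static void *(*const volatile memset_fn)(void *, int, size_t) = memset;

    static void secure_wipe(void *p, size_t len)
    {
        memset_fn(p, 0, len);
    }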

It's a start. Adding the safeguard doesn't mean effort won't be put into
fixing the actual bugs. But you just don't fix all the world's bugs overnight.
That's why things like virtual memory, permissions, chroots, ASLR, NX, SSP and
such exist.

How many systems enable core dumps by default? I don't actually know, but I
think quite a few do. Every application you use to get stuff done is a
production application. Every application that handles sensitive information
handles sensitive information whether it is in production or not. Leaking
passwords and keys can be as simple as working on some client software, having
it crash once, then passing through airport security and getting your HD
snooped on...

~~~
tgflynn
I understand your point but it seems like this approach ends up making
security dependent on a very deep stack of technology solutions, each
rather fragile (as this post and thread demonstrate).

I wonder if it wouldn't make more sense to do a first principles analysis of
what needs to be protected and then design mechanisms at the appropriate level
of abstraction to ensure that these requirements are met. It seems to me that
this is the approach that has been traditionally taken in OS level design and
I agree that it hasn't worked very well. But I wonder if that isn't more
because applications and environments are not being carefully designed to take
advantage of the OS level security mechanisms that already exist.

Personally I would feel more confident depending on a robust kernel level
security mechanism than a hodgepodge of application level fixes that depend on
everything from compiler optimizations to CPU caching mechanisms.

~~~
clarry
I, for one, welcome new research. But it also needs to be demonstrated in
practice before it can be used in practice. Until then, it's all hypotheticals
and wishful thinking. Unicorns, basically.

In the meanwhile, I do what I can to review, audit, and apply best practices
at application level. These are the things we can do _here and now_. These are
things that are already in use to make your system more secure.

You're right that it is a hodgepodge of tricks and never quite perfect or
capable of blocking all attacks. In an ideal world someone would design and
give us a system that provides perfect security right out of the box, in a
small & elegant & easy to understand manner.

I'm not smart enough to do that so I'll only dream of the unicorns. :-)

------
willvarfar
Excellent point! I really hope such a sensible suggestion is added to
mainstream compilers asap and blessed in future standards.

Apologies to everyone suffering Mill fatigue, but we've tried to address this
not at a language level but a machine level.

As mitigation, we have a stack whose rubble you cannot browse, and no ... No
registers!

But the real strong security comes from the Mill's strong memory protection.

It is cheap and easy to create isolated protection silos - we call them
"turfs" \- so you can tightly control the access between components. E.g. you
can cheaply handle encryption in a turf that has the secrets it needs, whilst
handling _each_ client in a dedicated sandbox turf of its own that can only
ask the encryption turf to encrypt/decrypt buffers, not access _any_ of that
turf's secrets.

More in this talk
[http://millcomputing.com/docs/security/](http://millcomputing.com/docs/security/)
and others on same site.

~~~
polarix
What's the current status of the Mill project? Is there a proof of concept
compiler / emulator? What's the bootstrap strategy to get things rolling?

~~~
Symmetry
Last I heard they're still limited to sims and are concentrating on patent
filings. The bootstrap strategy is LLVM (once they get around LLVM assuming
addresses are integers as opposed to the Mill's compound things) and to get
Linux running on top of L4 which seems doable[1]. They say they're looking for
a niche to start in before going after PCs.

[1][http://l4linux.org/](http://l4linux.org/)

~~~
xorcist
Why on L4? Is Mill somehow tied to it, architecture-wise? Or is it just that
L4 has a smaller footprint and is easier to port?

~~~
willvarfar
It's about footprint. We certainly will run Linux on the Mill, but it's work we
don't have to put on the critical path. L4 is just a familiar lightweight OS,
and we're keen to play with Mill-specific security features which are
particularly applicable to microkernels too.

When we do port Linux, I expect it to become much more microkernel-like, as in:
why would you want your disk drivers to be able to read/write video memory,
etc.?

~~~
xorcist
> why would you want your disk drivers to be able to read/write video memory,
> etc.?

That is a much bigger issue than the CPU architecture, as it has more to do
with how the peripheral hardware works (firmware, DMA, etc.), but I appreciate
the effort.

~~~
willvarfar
Well, it's how the CPU architecture exposes the hardware to software, i.e.
drivers. The Mill does MMIO but doesn't have rings.

------
AlyssaRowan
It's becoming gradually more tempting to write a crypto library in assembly
language, because at least then, it says exactly what it's doing.

Alas, microcode, and unreadability, and the difficulty of going from a
provably correct kind of implementation all the way down to bare metal by
hand.

The proposed compiler extension, however, makes sense to me. Let's get it
added to LLVM & GCC?

~~~
ctz
That works for well-defined ISAs (like ARM), but not for those with
undocumented pipelines, or instructions defined by practice (like x86 and
amd64).

In other words, if you write a crypto library in x86 assembler, Intel _don't_
guarantee that they won't introduce a side channel in their next chip model or
stepping.

~~~
AlyssaRowan
Sadly, I know that only too well: hence my "alas, microcode" comment! A prefix
or mode or something which lets code declare it's handling secure data, so it gets
constant-time multipliers, for example, or true µop-level register
zeroisation, would be handy, but also close to unverifiable - we just have to
sort of trust it, which sucks.

Until then, we do the best we can with turtles all the way down. Software
running under that same undocumented pipeline is going to find it very hard to
access or leak (accidentally or otherwise) internal registers, at least.

For the other avenue of attack (cold-boot attacks), it's also notable that
registers, at least, have extremely fast remanence compared to cache, or DRAM
- bit-fade is a very complex process, but broadly speaking, faster memory
_usually_ fades faster.

Digression along that vein: I basically pulled off a cold-boot attack on my
Atari 520 STe in the early 1990s (due to my wanting to, ahem, 'debug' a pesky
piece of software that played shenanigans with the reset vector and debug
interrupts), with Servisol freezer spray pointed directly at the SIMMs in my
Xtra-RAM Deluxe RAM expansion (and no, cold-boot attacks are _not_ new, GCHQ's
known about them for at least 3 decades and change under the peculiarly-
descriptive cover name NONSTOP, I believe?). It just seemed sensible to me:
cold things move slower, and they had a particularly long (and very pretty)
remanence - I was able to get plenty of data intact, including finding where I
needed to jump to avoid the offending routine and continue my analysis with a
saner technique (i.e. one that didn't make me worry about blowing up the power
supply or electrocuting myself)! It's harder these days - faster memory - but
the technique incredibly still works and was independently rediscovered as
such more recently: very much a "wait, this still works on modern RAM?" moment
for me. (By the way, when I accidentally pulled out the SIMMs with the
internal RAM _disabled_ - whoops - and rebooted the Atari on my first try, it
actually powered up with an effect that I can only describe as "pretty rainbow
noise with double-height scrolling bombs" that would not have looked out of
place in a demoscreen! I don't know if that was just mine, but... the ROM
probably never expected to find RAM not working, and I guess the error-
plotting routine had a very pretty and unusual error in that event?)

I've never seen or heard of anyone pulling off a NONSTOP on a register in a
CPU, or actually even on an L1, L2 or L3 cache (_maybe_ an L2 or L3 might be
possible, depending on design?). They're _fast_ - ns->µs remanence? - and
cooling doesn't help much. I don't know if it's possible at all, but I'd
tentatively suggest that it might be beyond practical attack - unless the
attacker has decapped the processor and it's already in their lab (in which
case you're fucked, no matter what!). That's what suggests that approaches
like TRESOR (abuse spare CPU debug registers to store encryption key; use that
key to encrypt keys in RAM), despite being diabolical hacks, actually work.

If you fancy giving it a try in the wild by the way, I think a Raspberry Pi
might be a good modern test subject - the RAM's exposed on top of the SoC,
there are no access problems, and it's cheap so if it dies for science, it's
not such a problem. (Of course, you'd want probably to want to change
bootcode.bin so that it dumps the RAM after it enables it but _before_ it
clears it.) The VideoCore IV is kind of a beast - and is frustratingly close
to being able to do ChaCha20 extraordinarily efficiently, if I can just figure
out how to access the diagonal vectors... or if I can, or if I can fake it.

~~~
cesarb
If I wanted to read CPU registers from the outside, there's an easy way: JTAG.
You should be able to halt the CPU, read (and modify!) the registers, and
resume the CPU.

That should be possible even on x86, though on x86 the relevant documentation
is probably hard to find. For some ARM processors, it should be as easy as
installing openocd.

Of course, JTAG requires physical access to plug the debugging cable, which
puts it in a different category of attack.

~~~
AlyssaRowan
I can't believe I'd forgotten about JTAG! Yes, that's definitely more viable
than decapping! <g> Same completely-doomed threat model though ("attacker has
physical access, can do anything they want and take as long as they need").

Sorry, I've been dealing with a few things more recently which, uh, haven't
been quite so accommodating to analysis.

------
cesarb
For AESNI, you probably are already using some sort of assembly to call the
instructions. In the same assembly, you could wipe the key and plaintext as
the last step.

For the stack, if you can guess how large the function's stack allocation can
be (shouldn't be too hard for most functions), you could after returning from
it call a separate assembly function which allocates a larger stack frame and
wipes it (don't forget about the redzone too!). IIRC, openssl tries to do
that, using an horrible-looking piece of voodoo code.

For the registers, the same stack-wiping function could also zero all the ones
the ABI says a called function can overwrite. The others, if used at all by
the cryptographic function, have already been restored before returning to the
caller.

Yes, it's not completely portable due to the tiny amount of assembly; but the
usefulness of portable code comes not from it being 100% portable, but from
reducing the amount of machine- and compiler-specific code to a minimum. Write
one stack- and register-wipe function in assembly, one "memset and I mean it"
function using either inline assembly or a separate assembly file, and the
rest of your code doesn't have to change at all when porting to a new system.
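
A rough C approximation of that stack-wiping step, just to illustrate the shape
of it (the assembly version is preferable, as noted, because C makes no
promises about frame layout; STACK_SCRUB_BYTES is my guess at the previous
callee's maximum stack usage):

    #include <stddef.h>

    #define STACK_SCRUB_BYTES 4096

    #if defined(__GNUC__)
    __attribute__((noinline))
    #endif
    void stack_scrub(void)
    {
        /* This frame roughly overlays the frame(s) the previous call used;
         * the volatile qualifier keeps the stores from being eliminated. */
        volatile unsigned char frame[STACK_SCRUB_BYTES];
        for (size_t i = 0; i < sizeof(frame); i++)
            frame[i] = 0;
    }

    /* usage:
     *     do_crypto(key, buf, len);
     *     stack_scrub();   // overwrite whatever do_crypto left on the stack
     */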

------
kabdib
I don't think this can be a language feature. It's more a platform thing: Why
is keeping key material around on a stack or in extra CPU registers a security
risk? It's because someone has access to the hardware you're running on. (Note
that the plain-text is just as leaky as the key material. Yike!)

So stop doing that. Have a low-level system service (e.g., a hypervisor with
well-defined isolation) do your crypto operations. Physically isolate the
machines that need to do this, and carefully control their communication to
other machines (PCI requires this for credit card processing, btw). Do end-to-
end encryption of things like card numbers, at the point of entry by the user,
and use short lifetime keys in environments you don't control very well.

The problem is much, much wider than a compiler extension.

~~~
userbinator
This is also why dedicated cryptoprocessors exist, with special features for
attack resistance; I'm not completely certain about this, but I'd think the
software running on those does not have to zero memory containing keys,
because the whole environment that said software runs in has been secured from
the outside already. And if it's possible to read any memory or run untrusted
code from outside on those without being detected, then there are far bigger
problems to worry about...

~~~
AlyssaRowan
Having seen a few existing designs of those, up-close and personal - actually
they _do_ have to worry about zeroisation, quite an awful lot.

And sometimes they don't worry enough either. They ought to fail a FIPS-style
audit for that. But, well... they _ought_ not to contain proprietary LFSR
"crypto" algorithms, either. They are not as well audited, or as publicly
designed, as they ought to be: many are as black-box closed-source as they
could possibly be.

They tend to be based on extraordinarily old architectures with new bits glued
on - think Intel 8051, that kind of era. If you're really lucky you _might_
get an ARM, or at least a Thumb. People making them are notoriously hyper-
conservative (most don't support ECC yet, and many don't even go above
RSA-2048 or SHA-1 without going to firmware), and minimise any changes,
perhaps for cost reasons, the effects of which are not always positive
(actually, CFRG are discussing that general area right now in the context of
side-channel defences for elliptic-curve crypto).

So, how would you think that environment translates to writing secure
firmware, or designing secure, state-of-the-art hardware? ;-)

~~~
lazyjones
Are current GPUs suitable subsystems for running properly isolated
cryptographic algorithms? If not, why not? If yes, perhaps a well-audited open
source library would be possible.

~~~
AlyssaRowan
In the presence of closed-source drivers that manage them and compile the
shaders? I'd say probably not. Something open-source with actual direct access
to the opcodes (would Mantle work? Intel's embedded GPUs?)… _maybe_. I don't
know if I'd consider them _secure_. Some running hot and loud and by the seat
of their pants? The presence of DMA? Hm. I have my doubts they'd be better
than the CPU safety-wise.

What I do know is they can usually run them _very fast_ if asynchronous low-
communication parallelisation is not undesirable - GPUs overtook PlayStation 3
Cell processors for the "throw watts at it, but we don't have enough money to
burn FPGAs or ASICs" class of crypto attacks quite a few years ago now. (As
anyone who runs a Bitcoin mine knows!)

Might be effective on, say, the Raspberry Pi, where the "GPU"'s the ringmaster
and the ARM's the clown. That vector processor looks tempting, and if I could
figure out how to get it to do diagonals, it's a poster child for ChaCha.

------
dmm
Remember this the next time someone says "C is basically portable assembler."
It's not, precisely because you can do many things in assembly that you can't
directly do in C, such as directly manipulating the stack and absolutely
controlling storage locations.

------
pbsd
> For encryption operations these aren't catastrophic things to leak — the
> final block of output is ciphertext, and the final AES round key, while
> theoretically dangerous, is not enough on its own to permit an attack on AES

This is incorrect. The AES key schedule is bijective, which makes recovering
the last round key as dangerous as recovering the first.

~~~
tptacek
How hard is that attack to code? I have a hard time imagining a case where a
target leaks just a subkey, so this is one of those things I knew "about" but
not "how".

~~~
cperciva
Dead simple. 2nd year undergraduate programming assignment.

~~~
tptacek
Is it perhaps so simple that... Colin Percival could effectively describe how
to do it in an HN comment, perhaps even challenging someone like Thomas Ptacek
to code it up and publish it instead of just yakking on HN like he always does
I hate him so much?

~~~
cperciva
Each word in the 4-word AES round keys is computed as w[i] = Mangle(w[i - 1])
xor w[i - 4], where Mangle(x) = Subword(Rotword(x)) xor Rcon for i%4=0 and
Mangle(x) = x otherwise.

Just turn that around and you get w[i - 4] = w[i] xor Mangle(w[i - 1]). Now
start with i = 43 (i.e., w[i] is the last word of the last round key) and
count backwards, filling in words of the round keys until you get to w[0].
Then w[0..3] is the AES key.
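
A C sketch of exactly that unwind for AES-128 (the helper names are mine, and
the S-box is computed on the fly just to keep the snippet self-contained): give
it the 16 bytes of the last round key and it hands back the original key.

    #include <stdint.h>

    /* GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1. */
    static uint8_t gmul(uint8_t a, uint8_t b)
    {
        uint8_t p = 0;
        for (int i = 0; i < 8; i++) {
            if (b & 1)
                p ^= a;
            uint8_t hi = a & 0x80;
            a <<= 1;
            if (hi)
                a ^= 0x1b;
            b >>= 1;
        }
        return p;
    }

    /* AES S-box: multiplicative inverse (x^254, with 0 -> 0) + affine map. */
    static uint8_t sbox(uint8_t x)
    {
        uint8_t inv = 1, base = x;
        for (int e = 254; e; e >>= 1) {
            if (e & 1)
                inv = gmul(inv, base);
            base = gmul(base, base);
        }
        uint8_t s = 0x63;
        for (int r = 0; r < 5; r++)
            s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));
        return s;
    }

    /* SubWord(RotWord(w)), with words packed big-endian. */
    static uint32_t subrot(uint32_t w)
    {
        uint32_t r = (w << 8) | (w >> 24);
        return ((uint32_t)sbox(r >> 24) << 24) |
               ((uint32_t)sbox((r >> 16) & 0xff) << 16) |
               ((uint32_t)sbox((r >> 8) & 0xff) << 8) |
                (uint32_t)sbox(r & 0xff);
    }

    /* Recover the original AES-128 key from the last (10th) round key. */
    static void aes128_unwind(const uint8_t last[16], uint8_t key[16])
    {
        static const uint8_t rc[11] = { 0, 0x01, 0x02, 0x04, 0x08, 0x10,
                                           0x20, 0x40, 0x80, 0x1b, 0x36 };
        uint32_t w[44];

        for (int j = 0; j < 4; j++)        /* w[40..43] = last round key */
            w[40 + j] = ((uint32_t)last[4*j] << 24) |
                        ((uint32_t)last[4*j + 1] << 16) |
                        ((uint32_t)last[4*j + 2] << 8) | last[4*j + 3];

        for (int i = 43; i >= 4; i--) {    /* w[i-4] = w[i] ^ Mangle(w[i-1]) */
            uint32_t m = (i % 4 == 0)
                       ? subrot(w[i - 1]) ^ ((uint32_t)rc[i / 4] << 24)
                       : w[i - 1];
            w[i - 4] = w[i] ^ m;
        }

        for (int j = 0; j < 4; j++) {      /* w[0..3] is the original key */
            key[4*j]     = w[j] >> 24;
            key[4*j + 1] = (w[j] >> 16) & 0xff;
            key[4*j + 2] = (w[j] >> 8) & 0xff;
            key[4*j + 3] = w[j] & 0xff;
        }
    }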

------
nly
Anything sent over HTTP(S), such as your credit card numbers and passwords,
likely already passes through generic HTTP processing code which doesn't
securely erase anything (for sure if you're using separate SSL termination).
Anything processed in an interpreted or memory safe language puts secure
erasure outside of your reach entirely.

Afaict there's no generic solution to these problems. 99.9% of what these code
paths handle is just non-sensitive, so applying some kind of "secure tag" to
them is just unworkable, and they're easily used without knowing it... it only
takes one ancillary library to touch your data.

~~~
Taek
Some of this can be addressed by never giving sensitive data to remote
servers. This wouldn't work for credit cards, but with Bitcoin you never need
to let a non-bitcoin library touch your private key, because that's not going
over https.

Similarly, if you encrypt all of your information from within a safe library
before handing it out to unsafe libraries, they can't leak anything. This can
add overhead and redundant encryption (and you still need to trust that the
remote server processing your data is safe), but there are steps you can take
to be more safe.

------
Someone
_" As with "anonymous" temporary space allocated on the stack, there is no way
to sanitize the complete CPU register set from within portable C code"_

I don't know enough of modern hardware, but on CPUs with register renaming, is
that even possible from assembly?

I am thinking of the case where the CPU, instead of clearing register X in
process P, renames another register to X and clears it.

After that, program Q might get back the old value of register X in program P
by XOR-ing another register with some value (or just by reading it, but that
might be a different case (I know little of hardware specifics)), if the CPU
decide to reuse the bits used to store the value of register X in P.

Even if that isn't the case, clearing registers still is fairly difficult in
multi-core systems. A thread might move between CPUs between the time it
writes X and the time it clears it. That is less risky, as the context switch
will overwrite most state, but, for example, floating point register state may
not be restored if a process hasn't used floating point instructions yet.

~~~
zAy0LfpBZLC8mAC
Register renaming doesn't work like that. How could register contents of a
process changing randomly even be usable for anything? Register renaming is
about dynamically mapping a small number of ISA register names to a larger
number of hardware registers to increase parallelism, but the whole reason for
the exercise is that those additional registers don't have ISA names, so you
obviously can't read them explicitly, at least not as part of the normal
instruction set - who knows what backdoors some CPUs might have...

------
ggchappell
This article makes a good point, but I think the problem is even worse than he
describes.

Computer programs of all kinds are being executed on top of increasingly
complicated abstractions. E.g., once upon a time, memory was memory; today it
is an abstraction. The proposed attribute seems workable if you compile and
execute a C program in the "normal" way. But what if, say, you compile C into
asm.js?

Saying, "So don't do that" doesn't cut it. In not too many years I might
compile my OS and run the result on some cloud instance sitting on top of who-
knows-what abstraction written in who-knows-what language. Then someone
downloads a carefully constructed security-related program and runs it on that
OS. And this proposed ironclad security attribute becomes meaningless.

So I'm thinking we need to do better. But I don't know how that might happen.

~~~
danielweber
You remind me why it's so hard to do secure deletion: there are a bunch of
abstractions built on old assumptions that no one cares about secure deletion.
If you forget your pointer to that memory, it can be reused, so it's
effectively deleted, we're all good, right? Meanwhile, the file you "sync"ed
to disk might be synced to a network drive or flash memory or a zillion cache
layers.

I think we need, right at the base metal, a way of saying "this data needs to
_not_ be copied" and/or "if you do copy it you _must_ remember all copy
locations so we can sanitize them all." And then we require every abstraction
on up to have a way of maintaining this, the same way all the abstractions are
required to, say, let us read data.

Or I guess this is part of what HSMs are supposed to do -- do all your
"secure" work in something that is very strictly controlled.

------
anon4
If I have enough control to the point where I can read your memory in some
way, I can just use ptrace. Heck, I could attach a debugger. It seems
ludicrous to want that level of protection out of a normal program running on
Mac/Win/Linux.

Now, if your decryption hardware was an actual separate box, where the user
inserts their keys via some mechanism and you can't run any software on it,
but simply say "please decrypt this data with key X", then we'd be on to
something. It could be just a small SoC which plugs into your USB port.

Or you could have a special crypto machine kept completely unconnected to
anything, in a Faraday cage. You take the encrypted data, you enter your key
in the machine, you enter the data and you copy the decrypted data back. No
chance of keys leaking in any way.

~~~
nathan7
What you're describing is called a smartcard, and readily available on the
market. I keep my PGP key on one.

~~~
hollerith
Does your PGP key _stay_ on the smartcard or is a copy of it transferred to
your computer on occasion?

~~~
nathan7
The key can be generated on the smartcard, and it's not possible to transfer
it out of the smartcard by design. (anything that calls itself a smartcard but
allows this _isn't a smartcard_)

------
Chiba-City
Please, assembly is OK. It's not even magic or special wizardry. My dad
programmed and maintained insurance industry applications in assembly side by
side with many other normal office workers for decades. Assembly is OK.

~~~
pinkyand
Assembly is bad for auditability, which is important in crypto to prevent
subtle errors.

~~~
AlyssaRowan
As a seasoned and experienced reverse-engineer myself, I'm (genuinely) curious
where you got that impression. Do you find it unapproachable?

Assembly is the simplest language you can write a computer program in, for a
certain very textbook definition of "simple" - it's just that you actually
have to do _everything_ by hand that you normally wouldn't. And yes, that can
be a pain in the ass, and yes, you do have to watch out for not seeing the
wood for the trees - but one thing it most definitely is, is auditable.

Bearing in mind, say, the utter briar-patch that is OpenSSL: a crufty
intractably complex library written in a high-level language with myriad
compiler bug workarounds, compatibility kludges and where - despite it being
open source, and "many eyes making bugs shallow" - few eyes ever actually
looked, or _saw_, or wanted to see, and when attention was finally paid to
it, it was found wanting... might not assembly be perhaps better for a
compact, high-assurance crypto library? Radical, I know, but perhaps an
approach that's worthy of consideration.

I understand you may well be more familiar with high-level languages, and I
don't know if you're confident about your ability to audit that - but I must
point out, if you're auditing it from source, you're trusting the compiler to
faithfully translate it. So to actually audit the _code_, you need to include
the compiler in that audit. Compilers have (lots of) bugs and oversights too
(lots of OpenSSL cruft is compiler bug workarounds, it seems?): as the article
points out, existing compilers just weren't really designed to accommodate
writing secure code.

Meanwhile an assembler makes a direct translation from source assembly to
object machine code - that is deterministic (a perniciously-hard process with
compilers) and much more easily, and automatically, auditable and indeed
directly reversible.

To be clear, I'm _not_ suggesting we replace, say, libsodium with something
written in assembly language tomorrow! There are _good_ high-level language
implementations. And inline assembly is already used in some places for
certain functions, including this exact one (zeroing memory), to try to
minimise the compiler second-guessing us. But as the article points out, that
approach only takes us so far, and it's something we need to be guarded
against when trying to write secure code.

~~~
tedunangst
The briar patch of OpenSSL is more in the high level protocol code, and not
the asm crypto (the perl obfuscation layer makes it fun, but isn't a major
source of bugs). I would not want to write a robust asn.1 parser in assembly.
Lots of other cruft works around the presence or absence of various #define
values in header files. Rewriting in assembly is not going to solve the
problem of deciding how big socklen_t is.

~~~
AlyssaRowan
Mm, true: the long grass of the libcrypto part pales in comparison to the
thorny nightmare that is the rest of it. Thank you to OpenBSD's LibreSSL for
beginning to clear away the worst of the bramble (and uncovering the
occasional juicy blackberry in the Valhalla rampage process).

There's a crypto library, and then there's a _protocol_ library. And TLS, to
put it politely, has lots of hairy bits, and I hope and pray TLS 1.3 makes a
positive impact on that, but I'm not yet sure if it will.

If one _were_ developing from scratch (and I am not) I'd wonder if a
reasonable approach would be a ridiculously low-level approach for the
primitives, but a ridiculously high-level approach for the protocol. I might
consider writing the first in assembly, but the second? If it involved an
ASN.1 parser? I would prefer to _do something else, anything but that!_ :-) At
the very least, if someone _did_ do it, we'd be able to see exactly how. We
just might not _want_ to! I would suggest perhaps instead, maybe something
involving formal correctness proofs and then converting _those_ to assembly,
because miTLS indicates that this can be an enlightening approach, and seL4
proves that it's _possible_ to actually make use of? (For someone else. I
think it is outwith my expertise!)

~~~
tedunangst
In light of the current discussion, it's hard to make such a clean
distinction. Your private key is going to be stored in a file that goes
through the PEM and ASN.1 parsers. It's going to hang around for a bit while
you sign stuff (using some sweet asm code), but now you need to dispose of it.
The object lifetime is often much longer than we'd like even with perfect
zeroing, and there are some ways to address that, but it casts a long "shadow"
on the call graph, not all of which can be made minimal.

In short: imperfect buffer zeroing probably reduces risk enough that it drops
below several other concerns.

------
cheez
The suggestion has the right idea, but the wrong implementation. The developer
should be able to mark certain data as "secure" so the security of the data
travels along with it through the type system.

Botan, for example, has something called a "SecureVector" which I have never
actually verified as being secure, but it's the same idea.

~~~
cperciva
This was my initial idea, but talking to compiler developers convinced me that
the dataflow analysis needed for this would be tricky. They were much happier
with the idea of a block-scope annotation.

~~~
cheez
Similar data-flow analysis techniques to those used for volatile.

------
delinka
Why are there no suggestions to change processors accordingly? Intel should be
considering changing the behavior of its encryption instructions to clear
state when an operation is complete or at the request of software. Come to
think of it, every CPU designer should be considering an instruction to clear
the specified state (register set A, register set B) when requested by
software. Then, the compiler can effectively support SECURE attributed
variables, functions, or parameters without needing to stuff the pipeline
with some kind of sanitizing code.

~~~
tedunangst
You can clear the CPU state. But how is the CPU to know when it's safe to
clear unless the software tells it?

------
db999999
Try:

    
    
      #include <string.h>
    
      void bar(void *s, size_t count)
      {
            memset(s, 0, count);
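            /* empty asm with s as an in/out operand: intended to make the
               compiler treat s as still live, so the memset above is not
               eliminated as a dead store */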
            __asm__ ("" : "=r" (s) : "0" (s));
      }
    
      int main(void)
      {
            char foo[128];
            bar(foo, sizeof(foo));
            return 0;
      }
    
      gcc -O2 -o foo foo.c -g
      gdb ./foo
      ...
      (gdb) disassemble main
      Dump of assembler code for function main:
       0x00000000004003d0 <+0>:	sub    $0x88,%rsp
       0x00000000004003d7 <+7>:	mov    $0x80,%esi
       0x00000000004003dc <+12>:	mov    %rsp,%rdi
       0x00000000004003df <+15>:	callq  0x400500 <bar>
       0x00000000004003e4 <+20>:	xor    %eax,%eax
       0x00000000004003e6 <+22>:	add    $0x88,%rsp
       0x00000000004003ed <+29>:	retq   
      End of assembler dump.
      (gdb) disassemble bar
      Dump of assembler code for function bar:
       0x0000000000400500 <+0>:	sub    $0x8,%rsp
       0x0000000000400504 <+4>:	mov    %rsi,%rdx
       0x0000000000400507 <+7>:	xor    %esi,%esi
       0x0000000000400509 <+9>:	callq  0x4003b0 <memset@plt>
       0x000000000040050e <+14>:	add    $0x8,%rsp
       0x0000000000400512 <+18>:	retq   
      End of assembler dump.

~~~
jedisct1
That should be __asm__ __volatile__; but extended inline asm, even with no
actual opcode (though a "nop" would work pretty much everywhere), is not
portable. So at this point, you might as well just use clang/gcc/icc pragmas instead.
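
For reference, the stronger (but GCC/Clang-specific) shape being described
would look something like this - volatile so it can't be moved or dropped, plus
a "memory" clobber so the compiler must assume the zeroed buffer is observed:

    #include <string.h>

    static void secure_memzero(void *s, size_t n)
    {
        memset(s, 0, n);
        /* volatile + "memory" clobber: the compiler must assume this asm
         * reads memory, so the memset above can't be treated as dead. */
        __asm__ __volatile__("" : : "r"(s) : "memory");
    }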

------
erik123
It very much looks like a situation in which the system has already been
compromised and is running malicious programs that it shouldn't. These
malicious programs could still face the hurdle of being held at bay by the
permission system that prevents them from reading your key file.

However, they could indeed be able to circumvent the permission system by
figuring out what sensitive data your program left behind in uninitialized
memory and in CPU registers.

Not leaving traces behind then becomes a serious issue. Could the kernel be
tasked with clearing registers and clearing re-assigned memory before giving
these resources to another program? The kernel knows exactly when he is doing
that, no?

It would be a better solution than trying to fix all possible compilers and
scripting engines in use. Fixing these tools smells like picking the wrong
level to solve this problem ...

~~~
clarry
I'm not sure this is the scenario we're fighting. The problem is when your
program (which handles sensitive data) has a flaw in it: for example, it might
be possible to trick it into leaking uninitialized data (possibly out of
bounds) over the wire. Another potential issue is core dumps (and maybe
swapping, but that's a little different). You don't want sensitive data to be
written on the disk.

Malicious programs running with your program's privileges are a different
scenario altogether, and usually they can do a lot of damage. Want sensitive
information out of another process? Try gdb.

But yes, it is trivial for the kernel to zero a page before handing it out.

~~~
tgflynn
What about malicious programs without privileged access? Is it possible for
them to just keep requesting new memory pages from the kernel and see leaked
data that was free'd by another process they shouldn't have access to, or is
this something kernels are already preventing?

------
gioele
WRT the AESNI leaking information in the XMM registers, wouldn't starting a
fake AES decryption solve the problem?

Also, wouldn't a wrapper function that performs the AES decryption and then
manually zeroes the registers be a good enough work around?

~~~
AlyssaRowan
If you're using AES-NI, you're already using an intrinsic. I haven't yet met a
compiler "smart" enough to recognise an AES implementation and replace it with
AES-NI, and god, I hope I never do.

Yes, you probably ought to be clearing xmm* registers touched by it, and that
would _I hope_ be good enough.

The point in the article - that compiled code very seldom touches xmm*, so that
if you don't wipe it (is doing so currently common practice? I haven't checked,
but I feel like that would be something that needs checking!) it's hanging
around and you might leak it - is completely valid, however.
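
A sketch of what "clearing the xmm* registers touched by it" could look like,
using GCC/Clang inline asm on x86-64 (my own example; extend the list to xmm15,
or use vzeroall to clear all the ymm registers on AVX parts):

    /* Explicitly zero the low XMM registers after an AES-NI routine returns. */
    static inline void wipe_xmm(void)
    {
        __asm__ __volatile__(
            "pxor %%xmm0, %%xmm0\n\t"
            "pxor %%xmm1, %%xmm1\n\t"
            "pxor %%xmm2, %%xmm2\n\t"
            "pxor %%xmm3, %%xmm3\n\t"
            "pxor %%xmm4, %%xmm4\n\t"
            "pxor %%xmm5, %%xmm5\n\t"
            "pxor %%xmm6, %%xmm6\n\t"
            "pxor %%xmm7, %%xmm7"
            : : : "xmm0", "xmm1", "xmm2", "xmm3",
                  "xmm4", "xmm5", "xmm6", "xmm7");
    }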

------
Demiurge
Every time I read one of these posts about a clever "attack vector", how
something can be gleaned from this special register, or a timing attack,
or somesuch, I remember a theory that the sound of a dinosaur's scream can
be extracted from the impact its waves made on a rock's crystal structure.

I googled pretty hard for real-life examples of a timing attack being used, and
now of stale data in registers being used, but couldn't find anything. Does anyone
know of examples of this actually being done?

~~~
Taek
These types of attacks though only require one person to create a system that
can reliably exploit them, and then the vulnerability will be in the wild and
a more significant problem. Pulling off this type of attack is difficult, but
you only need one piece of malware that has a reliable way to exploit this in
a general case and then it becomes available to every script kiddie who finds
some motivation for stealing private keys.

These type of attacks also might become more of a problem as more sensitive
computation is done on shared machines (IE cloud compute).

So, while there's no reason to panic even though these security features are
hardly implemented anywhere, you can't let the issues sit unaddressed for long
periods of time.

~~~
Demiurge
But there is a whole range of potential issues, and of things compiler
developers can do. As with any task, they should be sorted and weighted by ease
of exploitation and ease of solving. What I suspect, and I'm just curious to
see if I am wrong, is that developers postulate vulnerabilities that real
hackers would never bother with, and miss what they really go for - trivial
mistakes such as forgetting bounds checking.

So, I've seen a lot of (conceptually) trivial exploits and combinations of
trivial exploits, but I would love to see a real world example of someone
collecting enough information from an 'bad RNG', registers, or timing, to do
anything with it.

------
lnanek2
Doesn't actually seem true. OK, running the decrypt leaves the key and data in
SSE registers that are rarely used, where they might be looked up later by
attackers. There isn't any portable way to explicitly clear the registers.
Then why not just run the decrypt again with nonsense inputs when you are done
to leave junk in there instead? Yes, inefficient, but a clear counter example.
You could then work on just doing enough of the nonsense step to overwrite the
registers.

~~~
TheLoneWolfling
> Then why not just run the decrypt again with nonsense inputs when you are
> done to leave junk in there instead?

Because the compiler is perfectly within its rights to optimize that out!

~~~
MichaelGG
Not if you write the junk output to a volatile variable, right?

~~~
TheLoneWolfling
Let me clarify.

If you use a deterministic nonsense value, the compiler can turn the result of
decrypt(nonsense) into a constant at compile time, and just directly output
the constant to the volatile variable, without actually ever calling decrypt
again at runtime. So it can turn this:

    
    
        decrypt(real);
        nonsense = <whatever>;
        volatile junk = hash(decrypt(nonsense));
    

Into this:

    
    
        decrypt(real);
        volatile junk = <the appropriate constant value>;
    

But even if your nonsense is non-deterministic (although I question where you
are getting the entropy - if you're using a syscall / random / etc your
performance has potentially just gone out the window), the compiler is well
within its rights to optimize the second junk decrypt of the nonsense input
differently than the first (real) decrypt - in such a way that it does not
overwrite everything left behind by the first decryption.

(Same with encryption)

~~~
Roboprog
I think the "slowness" aspect of "do it twice" is a given, making the best of
a bad situation. Clearly, the people who want languages or hardware to do this
"right" have a better solution, but if you have to get by with C on existing
hardware, it would seem that an option in a library to select further security
at less speed is reasonable. Of course assuming said option runs code that
won't be optimized away.

------
ge0rg
Even if the proposed feature is added to C and implemented, there is still the
(practical) problem of OS-level task switching: when your process is
interrupted by the scheduler, its registers are dumped into memory, from where
they might even go into swap space.

It would be consequential (but utterly impractical) to add another C-level
primitive to prevent OS-level task suspension during critical code paths. Good
luck getting that into a kernel without opening a huge DoS surface :)

~~~
kazinator
The obvious fix is to address "might go into swap space". However, the real
problem is that the process can be interrupted at any time and examined, not
that the registers might go to swap.

If someone has the root privs to peek at your memory, they can also stop your
process at any time and examine all the registers, whether they were swapped
out to disk or not.

Moving the crypto code into the kernel and running with disabled interrupts
doesn't help because the attacker is already assumed to have super-user
privileges (they can peek at arbitrary RAM, after all). There are also non-
maskable interrupts.

You basically cannot hide the machine state from someone who controls the
machine: not without splitting the machine itself into additional privilege
levels, such that there is a security level that is not accessible even to the
OS kernel. The sensitive crypto routines run in that level. The manufacturer
of the SoC provides these as firmware, and the regular kernel has no
visibility to the internals.

ARM has a security model that supports this.

There is also something even more paranoid called TrustZone:
[http://en.wikipedia.org/wiki/ARM_architecture#Security_exten...](http://en.wikipedia.org/wiki/ARM_architecture#Security_extensions_.28TrustZone.29)

------
zvrba
Posts like this just make me more convinced that C combines the worst of
"portability" and "assembly" into "portable assembly".

------
cousin_it
I don't completely understand the C spec. Would the following approach work
for zeroing a buffer?

1) Zero the buffer.

2) Check that the buffer is completely zeroed.

3) If you found any non-zeros in the buffer, return an error.

Is the compiler still allowed to optimize away the zeroing in this case?

~~~
pascal_cuoq
> Is the compiler still allowed to optimize away the zeroing in this case?

Yes, completely. In the snippet below, the compiler is allowed to eliminate
all code after “leave secrets in array c”.

    
    
      {
        char c[2];
        ... /* leave secrets in array c */
        memset(c, 0, 2);
        c[0] = 0;
        c[1] = 0;
        memset(c, 0, 2);
        if (c[0] || c[1]) exit(1);
      }
    

The compiler is also allowed to compile the last three instructions below as
if they were “return 0;”

    
    
      {
        char c[2];
        ... /* leave secrets in array c */
        c[0] = 0;
        c[1] = 0;
        return c[0] + c[1];
      }

~~~
lazyjones
> _In the snippet below, the compiler is allowed to eliminate all code after
> “leave secrets in array c”_

gcc 4.4.5 doesn't though (-O3), it still clears the stack once and performs
the comparison.

I believe these optimizations can be defeated by declaring a global

    
    
      volatile char fill = 0;
    

and using that instead of 0 in memset().

~~~
MichaelGG
It's not guaranteed to defeat the optimization. For instance, it could just
read fill into two registers and do the comparison there.

------
ausjke
There are some chips that provide zeroizing of a small region of device memory
when needed; that region is specially designed to hold encryption keys etc.,
and the zeroizing is done by hardware.

------
rsync
Would running your file system read only and optimizing the system for fast
bootup be a workaround ? If so you could zero successfully by rebooting...

~~~
tedunangst
After what? Every https request? Simply exiting the process is sufficient to
prevent most info leaks, but even that's much too slow and not even a
solution. The class of bugs here is that sensitive data is in memory and then
the same program inadvertently leaks it while performing some other operation.
If you reboot before the leak, you won't make it to that other operation,
sure, but your program won't be much use either.

User logs in by sending password. System transitions to authorized state.
System wants to wipe password to avoid later leak. If you reboot at this
point, the user will no longer be authorized.

------
higherpurpose
> It is impossible to safely implement any cryptosystem providing forward
> secrecy in C

What about Rust?

~~~
pcwalton
Rust has the same problem described in the article.

~~~
nathan7
I'm guessing that mitigating this at the Rust level isn't doable, because its
memory model has the same properties with regards to zeroing. To change that,
LLVM support would be needed. This does make me wonder — how do you integrate
this into a type system? Rust has already done a pretty awesome job at
integrating memory-safety into the type system, but memory-secure type systems
seem fairly unexplored.

~~~
Jweb_Guru
Memory-security as defined in this article isn't exactly safe. It's just a
mitigation feature. The comments above provide plenty of examples of how once
an attacker is on the system, he or she can easily get past any language-level
construct you care for. If the system were _completely_ memory-safe (which
would mean no memory safety bugs in the hardware, kernel, SELinux (or some
other kernel extension that lets you do things like deny ptrace), LLVM, the
Rust compiler, any libraries you're using, or your program itself, _and_ you
weren't doing anything that completely negated all those benefits, like JIT
compiling code), then you don't need to zero the memory at all--it will do
nothing for you as a mitigation technique, and you'd be "memory safe" by your
definition. But you're not out of the woods yet, because even with zeroing you
are STILL vulnerable to ordinary, non-memory-safety bugs in your code allowing
that data to be read. Multiple threads, forgotten heap allocations, and so on.
A user with sufficient privileges could glance at the available cache lines.
Etc. Anyway, this entire scenario is a fantasy because your system isn't fully
memory safe :)

The only way I can think of to actually guarantee real memory security in any
meaningful way is to _completely_ verify a much smaller system (not just
memory safety, but that it's actually bug-free), isolate it at the hardware
level, and do _all_ of your computation using that hardware isolation feature.
It _has_ to be hardware because, for example, there's no reasonable way to
deterministically erase data swapped to SSD. You'd still be susceptible to
hardware bugs, but you can't ever protect against those completely. So
basically, get an HSM :)

