Why is encryption so hard?
52 points by retube on Aug 9, 2013 | 36 comments
So I see a lot of posts on HN that allude to difficulties when developing apps with encrypted comms. Do most languages not already have encrypt/decrypt libraries to leverage? E.g. I would expect to be able to find a public-private (RSA) implementation in most languages, where I could do something like:

String encryptedMessage = RSAEncryptor.encrypt(publicKey, message)

Is this not the case? Do such libs have bugs?

You can certainly find libraries with interfaces like that. For example, OpenSSL has extensive libraries for all sorts of cryptographic primitives and protocols.

If you take a narrow focus on a particular cryptographic event (such as your encryption of a string with an RSA public key) then you miss the greater story about encryption: it's not just the individual cryptographic primitive that needs to be implemented correctly, it's everything else.

An RSA encryption like that does not stand alone. Keys must be generated, secured and distributed. The RSA library itself must be validated to ensure that it works correctly. The actual primitive must be used correctly (in the case of RSA don't use a stupid exponent as some have done). And the environment within which the encryption is used must be understood and secured (just look at the CRIME and BREACH attacks against TLS to see how something 'secure' can be broken because of something apparently irrelevant, in this case, compression).
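To make the "stupid exponent" point concrete, here is a minimal Python sketch of textbook (unpadded) RSA with e = 3. The primes, the message and the helper names are assumptions picked for illustration, not anything from a real library. The only point is that when m^3 is smaller than the modulus, no modular reduction happens, so an attacker can recover the message with an integer cube root and never needs the private key.

```python
# Toy demonstration only: textbook RSA without padding. Never use this for real data.

# Two well-known primes, assumed here purely for convenience.
# Real keys use large random primes.
P = 2**448 - 2**224 - 1   # the field prime of Curve448
Q = 2**130 - 5            # the prime used by Poly1305
N = P * Q
E = 3                     # the "stupid exponent"


def textbook_encrypt(message: bytes) -> int:
    """Unpadded RSA encryption: c = m^e mod n."""
    m = int.from_bytes(message, "big")
    return pow(m, E, N)


def integer_cube_root(n: int) -> int:
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def attack(ciphertext: int) -> bytes:
    """Recover a short message using only the ciphertext."""
    m = integer_cube_root(ciphertext)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


secret = b"attack at dawn"
ciphertext = textbook_encrypt(secret)
recovered = attack(ciphertext)
```

Randomized padding such as OAEP is what normally prevents this, because it makes the padded message about as large as the modulus.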

The overriding reason that encryption is 'hard' is that secure computer systems have enemies and those enemies (attackers) will do _anything_ to attack the system. They will attack it based on timing, compression problems, flaws in the protocol, freezing the RAM to extract a private key, etc. etc. There's really no end to the variety of things you can try to attack a cryptosystem.

So, building a secure system may have encryption as a necessary condition, but it's not sufficient. So much else can and will go horribly wrong.

If you are interested in this hit the books and understand the history of cryptography. For example, look at how Vigenere was broken by Babbage, or the Venona ciphers, or Lorenz. These 'old' ciphers can tell you a lot about how people actually attack things. Then read about modern ciphers and attacks on them. Wikipedia has much. Read about TEMPEST and imagine other attacks possible in that way.

And let's not forget the entropy problems - if you control the RNG you control the encryption - case in point http://www.schneier.com/blog/archives/2008/05/random_number_... the infamous Debian OpenSSL error. So you need to be sure of your RNG.

Having a good RNG free from external interference is getting easier. Where you previously needed to buy expensive hardware or rely on external services like random.org, many "system on a chip" solutions have a built-in hardware RNG (though not all systems built using them expose it in a useful manner) and the newest server lines of chips out of Intel do too.

The SoC used by the rPi for instance has such a generator, exposed as /dev/hwrng once the appropriate module is loaded. It passes all the tests I've exposed it to and seems to reliably serve ~550,000 bits/sec. No doubt many other small computers have a similar facility available by some means.

That would not have helped that infamous Debian bug of course, as that was due to a change meaning the SSL libs were not properly using the entropy pools that were available (no entropy source is useful if you ignore it).
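A rough Python sketch of why a starved RNG is fatal. The assumption here, made for illustration and only similar in spirit to the Debian bug, is a generator whose sole entropy is a process ID below 32768. The function names are hypothetical. An attacker who sees one key simply tries every seed.

```python
import random
import secrets

# Assumption for illustration: the only "entropy" is a process ID in 0..32767.
PID_SPACE = 32768


def weak_key(pid: int) -> int:
    """128-bit 'key' derived entirely from a small seed (do NOT do this)."""
    return random.Random(pid).getrandbits(128)


def brute_force(observed_key: int):
    """Try every possible seed; there are only 32768 candidates."""
    for pid in range(PID_SPACE):
        if weak_key(pid) == observed_key:
            return pid
    return None


victim_key = weak_key(4242)
recovered_seed = brute_force(victim_key)

# What a key should come from instead: the operating system's CSPRNG.
strong_key = secrets.token_bytes(16)
```

The key looks like 128 random bits, but the search space is 15 bits, which is the whole problem.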

Indeed. All of the crypto providers seem to be running into problems faced by the Allied militaries and security services as far back as WWII.

Even when Japanese Naval codes were unbroken, the U.S. Navy was able to provide Adm. Nimitz with valuable intelligence on Japanese fleet movements and concentrations by using nothing more than traffic analysis.

Now imagine that computers can automate traffic analysis and what that can bring, encryption or no...

I think you hit the nail right on the head. "Encryption" at a low level, like using AES to encrypt a block of data, is pretty easy. But you need a lot more than that to build an encryption system like a secure communications channel. Each individual part is relatively easy to get right, but putting them all together offers a lot of opportunity for attackers to find a way to exploit the system.

"They will attack it based on timing, compression problems, flaws in the protocol, freezing the RAM to extract a private key, etc. etc. There's really no end to the variety of things you can try to attack a cryptosystem."

I will say that in my experience 95% of the attacks are going to be social engineering; nobody bothers with sophisticated techniques when the social route is way easier.

The problem is that while probably FEWER than 5% of attacks are technical, they generate industrial scale issues -- e.g. tens of thousands of stolen identities.

Even so, while there have been high profile examples of encryption algorithms being shown to be flawed (e.g. RSA had to reissue all its dongles a couple years back because they were using a flawed RNG) I do not know of any actual successful dark-hat attacks along those lines (of course they may have occurred undetected or not been disclosed or I may simply be ill-informed).

High profile security breaches are generally a result of poor or no cryptographic practices, negligence (e.g. IEEE keeping its member records in a plain text file on an FTP server), or (as you say) social engineering.

In short, while really good cryptography may be hard, halfway decent is not hard, so it becomes a case of "assumed hard and left untried" rather than "tried and found hard".

Finally: there's also the problem of security theater, such as forcing people to change their passwords at a ridiculous rate.

Threats, imprisonment and physical violence can be used to attack crypto as well. Go after the weakest link.

relevant xkcd: http://xkcd.com/538/

Great reply, thanks

Lots of good answers here. NaCl (salt) is one (relatively) recent effort to be just such a library, see eg under the sub-heading "High Level Primitives" on the features page:

  High-level primitives

  A typical cryptographic library requires several
  steps to authenticate and encrypt a message.
  Consider, for example, the following typical
  combination of RSA, AES, etc.:

  * Generate a random AES key.
  * Use the AES key to encrypt the message.
  * Hash the encrypted message using SHA-256.
  * Read the sender's RSA secret key from
    "wire format."
  * Use the sender's RSA secret key to sign the
    hash.
  * Read the recipient's RSA public key from wire format.
  * Use the recipient's public key to encrypt the
    AES key, hash, and signature.
  * Convert the encrypted key, hash, and signature to wire
    format.
  * Concatenate with the encrypted message.

  Sometimes even more steps are required for storage
  allocation, error handling, etc.

  NaCl provides a simple crypto_box function that
  does everything in one step. The function takes the
  sender's secret key, the recipient's public
  key, and a message, and produces an authenticated
  ciphertext. All objects are represented in wire
  format, as sequences of bytes suitable for
  transmission; the crypto_box function
  automatically handles all necessary conversions,
  initializations, etc.

Of course, "such libs have bugs" -- it is software after all. But bugs can (and will be) fixed.

Somewhat unique to security and cryptography are the number of subtle bugs possible. There are both problems of actual "normal" bugs (like the Debian entropy bug) and system level design errors (like CRIME).
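A small Python sketch of the idea behind CRIME-style attacks, under the stated assumption that the cipher preserves length (as stream ciphers do), so the eavesdropper effectively sees the compressed size. The secret, the request layout and the function name are made up for illustration. Encryption is left out because only the length matters here.

```python
import zlib

# Hypothetical secret that gets attached to every request.
SECRET = b"sessionid=7f3a9c21"


def observed_length(attacker_controlled: bytes) -> int:
    """Length an eavesdropper would see on the wire.

    Assumption: ciphertext length equals compressed plaintext length.
    """
    plaintext = (
        b"GET /" + attacker_controlled + b" HTTP/1.1\r\nCookie: " + SECRET + b"\r\n"
    )
    return len(zlib.compress(plaintext))


# The attacker injects guesses into the same compressed stream as the secret.
right_guess = observed_length(b"sessionid=7f3a9c21")
wrong_guess = observed_length(b"sessionid=e0b46d58")
```

A correct guess lets the compressor replace the whole secret with a back-reference, so `right_guess` comes out smaller than `wrong_guess`. Real attacks refine the guess a character at a time.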

NaCl/Salt tries to reduce the number of errors possible by using the library wrong (as opposed to eg: openssl that has a very (some say too) rich interface). But you could still end up writing the secret key to swap. Or doing something silly with the plain text. Or expose yourself to a buffer overflow in the part of the code that renders those cute avatar-images for your chat application.

edit: formatting

Encryption is basically a tool against wiretappers and men in the middle. But what if you don't fully trust your users, and fear they'll try to add exploits to your software?

Another high-level effort in security is Meredith Patterson's "language theoretic security"[1], which can help you code secure protocols to fight this problem.

There's also a tool to help implement it called Hammer[2]. Not sure if it's fully developed yet.



Simply invoking some sort of "encrypt" library is easy, it's everything else that's hard, and you have to get it perfect.

- Simply encrypting your message as indicated will not protect you from replay attacks. Someone could record your message and re-transmit.

- Simply encrypting your message will not assure that the contents haven't been modified, someone could patiently sit in the middle poking bits to see what happens.

- Most encryption schemes will require you to choose a block cipher and mode; doing so requires some knowledge of the options and the data you're sending. Some handle large amounts of data poorly, others fail when you send identical messages.

- Most encryption schemes will require you to initialize them with truly random data. Both an early version of Netscape and Debian messed something up and provided far less entropy than it appeared. Relying on /dev/urandom on a machine that's just booted, or on otherwise faulty entropy providers, is fatal.

- Attackers can record your data and play with it forever, so even if a mistake or attack isn't revealed for years, they can still go back and decrypt your data. I believe the NSA broke the Russians' use of a one-time pad because they re-used pages years later.

- Simply encrypting data doesn't provide assurances that you're communicating with the system you think you are, the initial contact is still tricky.

So there's more to it than a single function call.
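The replay and tampering bullets above can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not a protocol: the key is assumed to be shared out of band, the message layout (8-byte counter, payload, HMAC-SHA256 tag) is made up, and confidentiality is left out entirely since the point is only integrity and freshness.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # shared secret, assumed to be exchanged out of band
TAG_LEN = 32                   # HMAC-SHA256 output size in bytes


def seal(counter: int, payload: bytes) -> bytes:
    """Prefix an 8-byte counter and append an HMAC tag covering both."""
    body = counter.to_bytes(8, "big") + payload
    return body + hmac.new(KEY, body, hashlib.sha256).digest()


class Receiver:
    """Accepts each authentic message once, in increasing counter order."""

    def __init__(self) -> None:
        self.last_counter = -1

    def open(self, message: bytes) -> bytes:
        body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
        expected = hmac.new(KEY, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("tampered message")
        counter = int.from_bytes(body[:8], "big")
        if counter <= self.last_counter:
            raise ValueError("replayed message")
        self.last_counter = counter
        return body[8:]


def rejected(receiver: Receiver, message: bytes) -> bool:
    """True if the receiver refuses the message."""
    try:
        receiver.open(message)
    except ValueError:
        return True
    return False


receiver = Receiver()
original = seal(0, b"transfer 100 to alice")
first_delivery = receiver.open(original)

# Re-sending the exact same bytes is refused because the counter did not advance.
replay_rejected = rejected(receiver, original)

# Flipping one payload bit invalidates the tag, even for a fresh receiver.
tampered = original[:8] + bytes([original[8] ^ 0x01]) + original[9:]
tamper_rejected = rejected(Receiver(), tampered)
```

Even this small piece has state to manage (the last counter seen), which is exactly the kind of "everything else" a single encrypt call does not give you.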

A few reasons.

The overarching problem is that you don't really get any feedback about whether what you're doing is right or wrong. For example, no cryptographer would use RSA like that, but that's not obvious just from studying the wiki article. Or from looking at the function output - it does turn ASCII into gibberish, as advertised, and that's where most developers will call it a day.

The moving parts are also treacherous. You're not just going to encrypt a string - someone is meant to decrypt it. Have you authenticated the ciphertext? Are you exposing a padding oracle? Or timing attacks? Are messages susceptible to replay? In crypto systems, these things are equivalent to locking the front door and leaving the window wide open.

In practice, most insecure crypto constructions aren't due to bugs in the implementation of RSA or AES. They're because of developers choosing inappropriate primitives, gluing them together incorrectly, or inadvertently exposing dangerous side channels.

Fortunately, there are libraries that can help. As mentioned elsewhere, NaCl/Sodium and KeyCzar provide higher-level interfaces that can abstract away many of these issues.

To answer the "why is it hard?" question, I tried to collect my own experiences at http://www.acooke.org/cute/WhyandHowW0.html - not sure I did a good job, but the main conclusion was that you underestimate how important experience is in avoiding errors.

To repeat what others have said in answer to your more general question - solutions to "real world" problems include more than a single call to a primitive. So you need to find libraries that provide a higher level API, like parts of NaCl http://nacl.cr.yp.to/, Google's keyczar http://www.keyczar.org/, etc.

Even for simply encrypting a string with a password - https://pypi.python.org/pypi/simple-crypt which is what I talk about in the first link - I needed three things: key strengthening, the encryption itself, and an HMAC. Making those work well together was harder than I expected (at least 5 bugs harder...)
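As an illustration of the key strengthening step only, here is a minimal standard-library sketch. It is not how simple-crypt does it; the function name, the labels and the iteration count are assumptions for the example. It stretches a password with PBKDF2 and then derives separate keys for encryption and for the HMAC, so the same key is never used for both jobs.

```python
import hashlib
import hmac
import secrets


def derive_keys(password: str, salt: bytes, iterations: int = 200_000):
    """Stretch a password and split it into (encryption key, HMAC key).

    The iteration count is an assumption; tune it to your own hardware.
    """
    master = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    # Derive two independent 32-byte keys from the master key using distinct labels.
    enc_key = hmac.new(master, b"encryption", hashlib.sha256).digest()
    mac_key = hmac.new(master, b"authentication", hashlib.sha256).digest()
    return enc_key, mac_key


salt = secrets.token_bytes(16)  # random per message, stored alongside the ciphertext
enc_key, mac_key = derive_keys("correct horse battery staple", salt)
```

The salt is not secret, but it has to be fresh for every message so that identical passwords do not yield identical keys.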

Lots of good answers on this thread. I think the fundamental underlying reason is that programming is difficult and so poorly understood.

A given: all software has bugs. Usually, that doesn't matter — a CRUD app will eventually get debugged enough to the point of usability. (Sometimes even maintainability.) We do not understand enough about programming to guarantee perfect execution in all cases, but no one gains any value by causing an obscure input case to cause a null pointer exception.

Whenever we use crypto, however, we inherently have code which protects something valuable: from forum passwords to credit card numbers to state secrets. This means that all the subtleties which break in ordinary code, but no one cares about, suddenly become important. Every interaction of input to memory to processing to storage (to network) must be scrutinized for places where a crucial piece of data may leak an encryption key, or perhaps just enough known plaintext or known ciphertext to mount an attack.

Also, encryption is all about maths, so there are hundreds of ways to do just about anything, different parameters, different algorithms with different tradeoffs about speed, performance, resistance to attacks, data bandwidth, etc. etc. So I don't think a library with the kind of interface you describe would be very useful. But I do think it would be great to have a library that allows us to configure encryption based on requirements instead of technicalities.

There are two facts about crypto that often get mixed up in these discussions:

1. For a high value target like Edward Snowden, there is a broad spectrum of attacks, and any operational weakness is fatal. There are many examples of these attacks described on this thread. Unless you know what Snowden knows, odds are you will not get it right.

2. BUT, if everyone had easy encrypted email and real time communication, the mass surveillance machine would be blinded, because the kinds of attacks that are used against high value targets do not scale up well.

I guess the main problem is that encryption is foreign to most of us (myself included). It's hard to understand what is safe from what isn't.

It's also very hard to figure out if your encryption is buggy or not. I guess that for most of us, once your method returns a hash, you expect that everything is secure.

On a side note, I wonder how many people on HN would claim to know the ins and outs of encryption. (Not the difference between SHA1/MD5/bcrypt but the actual math behind the derivations and how they work)

Encryption being painfully and needlessly difficult is one reason why it isn't widespread on both the business end and the consumer end. GPG, which _everybody_ should use for email, has one of the most terrible interfaces conceived. It is absolutely no surprise that people would rather be spied on than spend a week getting that POS working.

There's a massive market for easy-to-use encryption. Easy-to-use does not imply insecure in any way at all.

There are two orthogonal aspects, and you touch both of them.

1) All software has bugs. The problem with crypto software is that bugs are far more dangerous, even when they may appear to be insignificant. To use an utterly broken analogy: a faulty seal in a pressure cooker does not cause all the locks and hinges in your house to magically unscrew themselves. But even a slightly broken crypto implementation in software can cause a complete breach. (No pun intended.)

2) Developing good and intuitive UIs is hard. When the UI has to hide the complexity of secure key management, it's even harder. Humans are by nature lazy and inventive; if the UI allows any way to achieve convenience over security, convenience will be what most of the users choose.

In my mind, there is one particular implementation where adding security allowed more convenience. The humble ssh-agent. When used properly, you don't need to know any of the passwords for remote systems. And unfortunately, the most convenient way to achieve this is, of course, to leave the private key either unencrypted or protected with an empty passphrase...

AFAIK there's no "encryption for humans" library (at least no widely known, widely used, widely tested one) - they all rely on the developer to specify the right parameters into the function, with no sanity checking asserts.

The results of this is things like the developer who used "1" as the multiplication factor, so to decrypt the data, you need to divide each block by 1...
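A sketch of what a "sanity checking assert" could look like, in Python. The wrapper name is hypothetical and the modulus is a textbook-sized toy; this is still unpadded RSA and only illustrates rejecting parameters that are obviously wrong.

```python
def checked_textbook_rsa(message: int, exponent: int, modulus: int) -> int:
    """Hypothetical wrapper that refuses obviously unsafe public exponents."""
    if exponent < 3 or exponent % 2 == 0:
        raise ValueError(f"unsafe RSA public exponent: {exponent}")
    if not 0 <= message < modulus:
        raise ValueError("message must be in the range [0, modulus)")
    return pow(message, exponent, modulus)


modulus = 3233      # 61 * 53, far too small for real use
plaintext = 1234

# With e = 1 the "ciphertext" is simply the plaintext.
unchecked = pow(plaintext, 1, modulus)

# The wrapper rejects e = 1 instead of silently "encrypting".
try:
    checked_textbook_rsa(plaintext, 1, modulus)
    refused = False
except ValueError:
    refused = True

# An odd exponent of at least 3 is accepted.
accepted = checked_textbook_rsa(plaintext, 17, modulus)
```

A check like this catches only the most blatant mistakes; passing it says nothing about whether the overall construction is sound.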

Sometimes crypto libraries have bugs, but it's also easy to use them incorrectly, especially if you don't have a good understanding of cryptography.

For example, a common mistake is to assume that by encrypting something, attackers can no longer change it. Or perhaps you'll use your standard equality operation to check whether a decrypted string matches some value, without thinking about timing attacks. Or maybe you'll just use AES in ECB mode.
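For the equality-check mistake, a short Python sketch contrasting an early-exit comparison with the standard library's `hmac.compare_digest`. The function names `naive_equal` and `verify_tag` are made up for the example.

```python
import hashlib
import hmac
import secrets


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check an HMAC tag without leaking the position of the first mismatch."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


key = secrets.token_bytes(32)
good_tag = hmac.new(key, b"hello", hashlib.sha256).digest()
bad_tag = bytes([good_tag[0] ^ 0x01]) + good_tag[1:]
```

Both functions return the same answers. The difference is that an attacker who can time `naive_equal` may learn how many leading bytes of a forged tag were right, and use that to guess the tag byte by byte.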

Just in your example there's already a problem. If you don't use something like OAEP padding (PKCS1 v1.5 padding has been proven to have issues) then you're vulnerable to attack (see: Bleichenbacher http://www.bell-labs.com/user/bleichen/papers/pkcs.ps).

For anyone interested, we have encryption examples in Python (PyCrypto & M2Crypto library), Ruby (OpenSSL), PHP (phpseclib), Java (Spongy Castle) and Objective-C (CommonCrypto) here: https://launchkey.com/docs/api/encryption

disclosure: I'm a co-founder of LaunchKey

I know it's cool to hate Microsoft and .NET here on [Y], but .NET framework actually comes with a ton of encryption classes & methods - http://msdn.microsoft.com/en-us/library/system.security.cryp...

There is a crypto challenge that explains many of the flaws in crypto done not exactly right, by giving real examples / puzzles on how to break the crypto. See http://www.matasano.com/articles/crypto-challenges/

This is exactly what the NSA wants you to think! Encryption is only a tiny part of the problem space, and yet still gets broken in fun ways (padding oracles, bad RNGs, etc). The more difficult part is key management and distribution. This is where crypto rubs up against the human. Humans suck.

As a different, less technical response: crypto is so hard because it's natural to assume that cryptanalysts are so persistent.

A good crypto library should keep your data safe for decades. We don't make the same demands (no bugs, due to no updates possible) of other software that often.

Encryption is easy, security is hard. Every time you increase the security of a product, you decrease usability.

e.g. easiest to use :: SSH with password <<>> SSH with passphraseless keys <<>> SSH with passphrase-protected keys :: most secure

This is not entirely correct.

For your SSH example, pubkey authentication has a bit of a learning curve to set up but then makes ongoing use far easier.

For networked file systems, at work we have a common shared drive (H:) and a private shared drive (J:). The private drive is clearly more secure as people who shouldn't have access can't even see the files, but it's also much easier to use as you don't have to sort thru tons of other people's old crap.


From their homepage:

Crypter crypter = new Crypter("/path/to/your/keys");

String ciphertext = crypter.encrypt("Secret message");

Encryption is hard because computers are constrained by (but exceptionally good at) discrete maths. All encryption does is slow cryptographic attacks down (a lot).

Also, proper key management is out of reach for most of us.

More simply put, and to answer the main (title) question, the reason encryption is so hard is because it's usually economically rewarding to break.

To step back a bit from the tech problems -- encryption is hard because not everybody uses it.

I think what we need, for email at least, is a completely new protocol that's end to end secure (as hard as that is). The problem though is that I don't think something like this can be done anymore, without "interested" corporations co-opting or talking it to death. The golden age of the internet is gone.

So, reading many of the replies in this thread, they all cover good points, but I have a slightly different point: the current encryption libraries make the "easy" stuff hard. Assume that all the hard work of actually doing things is taken care of, and just look at the interface of the command line tools (the C APIs are generally worse).

The NSS command line tools require you to provide an entropy file of "sufficient size". "Oh!" you think, "I'm on Linux, I can just use /dev/random". No, it tries to read the entire file, and thus hangs forever after using up all your entropy. "Ok, I'll do something like: dd if=/dev/random of=- bs=1024 count=1 | .... --nonce-file /proc/self/fd/0". Nope, doesn't work, because it tries to read from this file multiple times. (Also, how big is "sufficient"? 8192 bits seems like overkill, entropy I just threw down the toilet, but the opposite case is even worse.)

The openssl command line tool has a simple "ca" command, which appears to be the "State of the art", but doesn't support revoking keys or even concurrent access. The openssl command line in general is confusing and difficult to remember.

PKCS#11 is a nice C API, but there's no standardised config to tell the machine to use PKCS#11 for everything, so you have to specify it on every command line. Forget it just once and you've possibly just created a key that's not in your HSM and you might not notice that.

A little bit of UI TLC would make the world much more secure. SSH sees high levels of uptake because while it has strong crypto, it's relatively easy to use.

Exploits where people use TLS and then don't properly verify the far end's cert are usually down to the fact that the library they're using doesn't provide a "just verify the cert for me" option but expects every developer to implement it themselves. Most libraries will default to ridiculously stupid ciphers, even if the other end will let you negotiate stronger ciphers, again requiring every application developer to make sure that they have overridden the defaults to something sane, and perhaps just offloading the problem onto the system administrator.
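As one example of a library that does offer a "just verify the cert for me" path, here is a short sketch using Python's standard `ssl` module. Raising the minimum protocol version is an assumption about what counts as sane, not a universal recommendation, and the snippet only builds the context without opening a connection.

```python
import ssl

# create_default_context() turns on certificate and hostname verification
# and loads the system's trusted CA certificates.
context = ssl.create_default_context()

# Assumption for this sketch: refuse protocol versions older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

verifies_certificate = context.verify_mode == ssl.CERT_REQUIRED
verifies_hostname = context.check_hostname
```

A client would then pass this context to `context.wrap_socket(sock, server_hostname=...)`. The failure mode described above is when developers build a context by hand, or switch these checks off to make an error go away.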

djb's NaCl has a nice API, but it's incompatible with everything, so you can't use it to interoperate with the rest of the world. (Also djb has a history of abandonware, and licenses that prevent other people from continuing to maintain the software as the realities around it evolve, so it's probably not useful for something that's going to last >3-5 years.).

Crypto is notoriously difficult to get right, but the API of the current tools (both the language bindings and CLI tools) makes it far harder than it needs to be, inviting a slew of additional errors.
