Basic Cryptography Concepts for Developers (paragonie.com)
62 points by paragon_init on Oct 7, 2015 | hide | past | web | favorite | 45 comments

I once saw a production site that stored the password in plaintext, and then ran base64(base64()) on the result and sent that in a cookie. It still gives me nightmares sometimes. That's what I consider to be criminal incompetence.
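For anyone who hasn't seen this in the wild, the whole "scheme" reverses in one line with no key at all. A sketch in Python (the cookie and password values are made up):

```python
import base64

password = b"hunter2"  # stored in plaintext server-side, per the story
cookie = base64.b64encode(base64.b64encode(password))

# Encoding is not encryption: anyone holding the cookie recovers
# the original password with two decodes and zero secrets.
assert base64.b64decode(base64.b64decode(cookie)) == password
```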

Oh, you're going to love this - I worked for a firm that developed electronic trading software, back in the late 90s. They "encrypted" their incoming connections with a simple XOR scheme that used a "key" which was, basically, a random number. You want to guess how they generated the random number? They seeded the random function with the current time... now, almost all of their clients would just leave their trading systems running overnight, so in the morning all the clients would reconnect at just about the exact same time (market open), so then 90% of the clients would end up using the same "key" to encrypt their sessions.

Now, ordinarily it would be bad enough that almost all of your clients are using the same encryption key, but it got much worse from here - turns out there was a bug in a proxy server they had written, so that every once in a while client B's traffic would be routed into client A's session, and hilarity would ensue; because the "keys" were the same, Client B's traffic looked like normal traffic from client A, so nobody could ever figure out why Client A would call us up and complain that he was getting fills for orders that he'd never sent! It took me forever to figure out WTF was going on, but one day I was watching a logging display and noticed garbled traffic showing up in Client A's logs, and it was garbled because it was a user who actually didn't auto-logon at 0930 in the morning and subsequently had a different encryption "key" resulting in his traffic not being decoded properly. From there, it became obvious what the problem was.

Edit: Yeah, yeah, I know - you're thinking "Wait, XOR? shared secret? But, but... how did they do key exchange?" Well, how the heck else would you send a client the secret key for the session - in plain text, when the client first logs on, of course!
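To make the failure mode concrete, here's a toy Python reconstruction (the names and the single-byte XOR are my own simplifications, not the firm's actual code): two clients that reconnect in the same second derive the same "key", so either one can read the other's traffic.

```python
import random
import time

def make_session_key(seed: int) -> int:
    # "Random" key seeded with the current time, as described above.
    return random.Random(seed).getrandbits(8)

def xor_cipher(data: bytes, key: int) -> bytes:
    # Toy single-byte XOR; applying it twice decrypts.
    return bytes(b ^ key for b in data)

market_open = int(time.time())  # everyone auto-reconnects at 09:30
key_a = make_session_key(market_open)
key_b = make_session_key(market_open)
assert key_a == key_b  # identical seeds, identical "keys"

order = b"BUY 100 ACME @ 42.00"
wire = xor_cipher(order, key_a)
# Client B's key "decrypts" Client A's session perfectly, which is
# exactly why the proxy bug produced valid-looking cross-traffic:
assert xor_cipher(wire, key_b) == order
```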

I would love to know why the programmer thought the second base64 would help.

"No one would ever expect the data to be base64-encoded twice! Mwa ha!"

Probably. "Looks like gibberish. Base64 decode it. Still looks like gibberish. Must be random bytes." Never mind that random bytes are unlikely to all be ASCII letters and numbers.

Random ASCII letters and numbers are just as good as random bytes.

Sort of.

There's less entropy per character in a string you know is alphanumeric than in a string of random bytes.

If you take 32 bytes from /dev/urandom and base64 encode it, it has the same entropy as the original binary string (unless you use it in a way that will result in it being truncated).

But the total entropy of the two strings is unchanged. One is just longer.

In short: You're correct, but with nuance that bears stating explicitly.

Random ASCII letters decoded as base64 are unlikely to result in only ASCII letters. And if you generated random letters to start, there's no reason to base64 encode it.
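The point about total entropy vs. per-character entropy is easy to check in Python (a sketch, drawing from the OS CSPRNG via os.urandom):

```python
import base64
import os

raw = os.urandom(32)             # 32 bytes = 256 bits of entropy
encoded = base64.b64encode(raw)  # same 256 bits, just spread out

# Base64 carries at most log2(64) = 6 bits per character, versus
# 8 bits per byte of raw binary, so the encoding must be longer:
assert len(encoded) == 44        # vs. 32 bytes of raw input

# The encoding is lossless, so the total entropy is unchanged:
assert base64.b64decode(encoded) == raw
```

This is why truncating a base64-encoded token is worse than truncating the raw bytes it came from: each dropped character costs you up to 6 bits.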

That's why I always base64-encode three times. And then swap the first and last character.

Man, you gotta rot13 that shit a couple times, just to be extra safe. And make sure you generate a random seed to seed your RNG with. Twice as random, right?

He knew that something wasn't quite bulletproof with his "encryption" scheme. He left a comment: "Won't stop 1337 hackers".

Yeah, I mean the IETF thought one pass of Base64 was good enough for HTTP.

Should've base64-encoded one more time just in case, to add another layer of security.

You mean sort of like this? (From the article.)


> "Furthermore, don't confuse password hashing algorithms with simple cryptographic hash functions. They're not the same thing:"

"Password hashing algorithms" should probably be replaced with "key derivation functions" - which makes it clearer what they are, and makes further reading (by the reader) into KDFs easier.

For the Password Hashing Competition (https://password-hashing.net/) we tried to come up with a better term that describes both password-based KDFs and password hashing for server storage, but couldn't, so it was decided to keep calling those things "password hashes".

- "KDF" says nothing about passwords (e.g. see HKDF) and,

- "PBKDF" (password-based KDF) says nothing about a password verifier, such as for server storage (e.g. bcrypt doesn't produce a "key"), plus it's already taken by specific algorithms,

- "password hash" sounds good for both key derivation and verifier, but makes it easy to confuse with normal hashes.
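The distinction being drawn here (simple cryptographic hash vs. password hash) comes down to cost per guess. A rough Python sketch using stdlib primitives (the iteration count is illustrative, not a recommendation):

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)

# Simple cryptographic hash: designed to be as fast as possible.
t0 = time.perf_counter()
fast = hashlib.sha256(salt + password).digest()
fast_time = time.perf_counter() - t0

# Password hash / PBKDF: deliberately slow via chained iterations.
t0 = time.perf_counter()
slow = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
slow_time = time.perf_counter() - t0

# An attacker pays the slow cost on every single guess.
assert slow_time > fast_time
assert len(slow) == 32  # still just a 256-bit digest at the end
```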

Great article for a basic primer, but anyone who is actually developing cryptography needs to hit a few more books - short articles like this are great because they make it simple, but that can often lead to the perception that a dev understands everything they need to, which leads to broken crypto in production and tears.

Reading Ross Anderson's "Security Engineering" is probably a good idea. It covers security as a system, instead of concentrating just on encryption.

PDF of the first and second editions available at [ https://www.cl.cam.ac.uk/~rja14/book.html ]

Security engineering is a great book. I learned a lot about systems (as opposed to software) from the first edition. Definitely worth a read for getting the 'security mindset'. Techniques can be picked up according to the needs of the job (and should be!)

Whoa, awesome! And it's online for free too.

I've looked at the first chapter so far and I'm hooked.

I'm adding this to https://github.com/paragonie/awesome-appsec :D

This is an excellent point. Cryptography is not a trivial field to master.

(I've updated the title to read "Basic Cryptography for Developers" instead of "Cryptography for Developers" so as to not accidentally make someone think they know it all.)

This is very true. I've been involved in building systems involving cryptography for about a decade now, and having transitioned from the weeds (side channels, physical security) to the forest (secure systems architectures), I think I could have spent the whole time in one fairly narrow field and still not know half of it. At best, I can say that I've gained an appreciation for where the demons lie. At worst, I can say that I'm educated enough to be dangerous ;)

It's a pity that very few customers are willing to pay for the effort of exorcising all (or at least most) of those demons in the systems I've seen deployed. The product differentiator (and hence the ROI) is almost always somewhere else. The science & engineering is vastly better than it was when I entered the field, but it still has a long way to go.

What I find frustrating, personally, about cryptography (or security in general) is that a lot of knowledge seems to be concentrated in a very specialized area, but it doesn't diffuse very far outside of there.

For example: https://twitter.com/voodooKobra/status/651554578719227904

To try to combat this, I've been trying to help make basic security and cryptography knowledge accessible to web and mobile app developers, to hopefully result in an overall net gain for the security of many companies the world over.

Obviously I can't teach everyone everything there is to know about these areas. (For starters, I myself probably don't even know half of it. Lattice-based timing attacks? Not a clue!) But realistically if I can make a dent in the propensity for bad habits and worse design choices, it's something.

Agreed. I've started poking into security for HTTP-based things (web sites, REST APIs, etc.), and it's got a mostly different knowledge base from the one I've studied. Compare, for example, the CHES[1] proceedings (a good source of interesting side-channel attacks and mitigations back when I was doing that) with something like The Hacker Playbook. Once you get down into the details, these fields have a lot of unique concerns. Some implementation principles (least privilege, reference monitors, etc.) may have value in both domains, but the domain-specific knowledge is completely siloed, as you have observed.

That said, my biggest lamentation is that most people don't seem to apply solid implementation principles to the systems they build. Witness vulnerabilities like Heartbleed. It would have easily been avoided if the common buffer-reuse guidelines in pretty much every security implementation guide had been followed.

[1] http://www.chesworkshop.org/

> Some implementation principles (least privilege, reference monitors, etc.) may have value in both domains, but the domain specific knowledge is completely siloed as you have observed.

Somewhat conversely, I've proposed a model for classifying various forms of security vulnerabilities that might be easier for developers to conceptualize:


The idea is to treat it like a taxonomic model: you have general security mistakes (data-instruction confusion), which can be drilled down into vulnerability classes and then specific bugs that, either stand-alone or chained together, result in specific vulnerabilities in specific implementations.

Or maybe I'm way off base here. The feedback I've gotten has largely been positive, though.

I think we're in violent agreement. I'm a fan of the high-level approach taken by the common criteria[1], which emphasizes cataloging threats, security objectives, environmental concerns & vulnerability mitigations when designing secure systems. This seems to overlap more than a bit with your thought process based on your blog post.

The challenge is to get people with good critical thinking skills (like the kind you advocate, and embodied in processes like the CC) and domain knowledge involved in building things. For example, I think I have a good ability to reason (I started my career in formal methods), but at present I know very little about how a browser actually treats its inputs. So, it's quite difficult for me to reason about the application operating environment presented by a web browser running some JavaScript. I wouldn't trust myself to start coding a secure web site today. Hopefully the industry will foster both of these skills (a critical mindset & good domain knowledge) and put them to good use.

[1] http://www.commoncriteriaportal.org/

> The idea is to treat it like a taxonomic model...

You are aware of Mitre's Common Weakness Enumeration, right?


How does https://www.crypto101.io/ compare?

Crypto 101 is a ~200 page book that is far richer in detail and examples. It also covers a lot of primitives (one-time pads, hash trees, etc.) that we eschewed for the sake of brevity.

This blog post is digestible in a few minutes. Crypto 101 can occupy a novice for a few hours. Given the hard choice of one of the two, we would encourage anyone interested to read Crypto 101 over our blog post, as they'll walk away with a much more detailed understanding.

I really don't like to see code examples with all the

  /* Don't ever really do this */
comments. Why not show the correct way to do it?

It's not any more clear to see an example of how NOT to do something. And yes, despite all the comments, someone will copy and paste that into their program, remove the comments, and then never really properly update the code.

Doing things the right way is not very illustrative. This isn't meant to teach developers how to reinvent NaCl in $langOfChoice, it's meant to instill an understanding of the vocabulary.

If nothing else, it aims to put to rest stupid phrases like "password encryption".

If people copy&paste despite all the warnings, they're beyond the help of any blog post.

EDIT: In the spirit of being more helpful, I've added links to more complete sample code (on StackOverflow) and made the warnings less obnoxious.

In that case they should just leave out the examples entirely. If the code runs, someone in the world WILL copy & paste it into their project and walk away, no matter if it's marked as insecure or not, which just perpetuates the problems that the post is trying to deal with.

Also I'm not sure "password hashing" is any more descriptive than "password encryption."

> In that case they should just leave out the examples entirely.

I'll consider it.

> Also I'm not sure "password hashing" is any more descriptive than "password encryption."

It absolutely is. Encryption is a two-way transformation of data. It is, by design, reversible.

Hashing is one-way.

Password hashing is a special case of hashing where you want it to be computationally expensive (even with special hardware at your disposal) to attack, but still perform well enough to interact with.

Password encryption implies that a two-way transformation has taken place, and given the key, you should be able to reverse it. This is not within the scope of the requirements for secure password storage.
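The one-way property shows up in how verification works: you never decrypt anything, you recompute and compare. A minimal sketch with stdlib PBKDF2 (the parameter choices are illustrative, and the function names are mine):

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest  # store both; the password itself is gone

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Recompute the hash from the candidate password and compare
    # in constant time. There is no "decrypt" step anywhere.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, stored = hash_password("hunter2")
assert verify_password("hunter2", salt, stored)
assert not verify_password("hunter3", salt, stored)
```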

Funny thing is, I did BASE64 a password. Sort of. I used to run strong passwords + random stuff on paper through SHA-2, with the result encoded in BASE64. Optionally an extra iteration or two of SHA-2. Used BASE64 because it fit the max number of characters for TrueCrypt passwords: my use case for this method. Had a tool to make this easier.

Idea was to obfuscate a strong password to make cracking more difficult. Plus, it was relatively easy to implement on arbitrary computers. These days I just memorize them or write them on paper.
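For the curious, the scheme described would look something like this (my reconstruction; the function name and iteration default are hypothetical, not the original tool):

```python
import base64
import hashlib

def stretch_passphrase(passphrase: str, iterations: int = 2) -> str:
    # Hash the memorized material a couple of times with SHA-256,
    # then Base64-encode so the result fits within TrueCrypt's
    # 64-character password limit.
    data = passphrase.encode()
    for _ in range(iterations):
        data = hashlib.sha256(data).digest()
    return base64.b64encode(data).decode()

derived = stretch_passphrase("strong passphrase + random stuff")
assert len(derived) <= 64  # a 32-byte digest encodes to 44 characters
# Deterministic, so it can be re-derived on any machine with the tool:
assert derived == stretch_passphrase("strong passphrase + random stuff")
```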

If you use bcrypt (which we recommend), you're technically using a variant of BASE64 too.

The title is a reference to a meme ("You wouldn't download a car!") and also a reference to some absurdly bad cryptography, e.g. http://www.cryptofails.com/post/87697461507/46esab-high-qual...

I've actually seen production code that BASE64-encoded the password of a service account a script needed. Spotted it because the password ended in "==" and, sure enough, the script only decoded the BASE64, no cipher needed. Advised the client and chown + chmod'd the file. Client still wanted the BASE64 encoding...

I don't quite understand the point concerning executable downloads and comparing hashes. Yes, comparing a hash found on a file downloaded from mozilla.com and a hash also on mozilla.com is stupid. However comparing a hash on mozilla.com and a download/torrent from an untrusted source seems to be valid and useful. The only attack vector in that case is at mozilla.com and not the download source.

I recall a story about an infected version of qemu (might have happened to other software) for Windows. Basically they hacked the site, replaced the binaries with infected ones AND updated the hashes.

I also recall one or two stories where the binaries were infected but the hashes not updated - this was obviously caught pretty quickly and fixed.

However, I remember a time when Firefox served downloads directly from their mirrors. This case could be good for comparing hashes - but now it looks like they use Amazon's cloudfront.

But yes - for the average guy, generating a hash for your releases (where your release and hash come from the same server) doesn't provide any real benefit.

Running it through VirusTotal is neat, as it'll tell you when it has first seen a file. If the file is old enough and the hash has been seen for a long time then it makes it less likely to be a fake. (Unless you think e.g. Mozilla has been compromised for a long time.)

> I don't quite understand the point concerning executable downloads and comparing hashes.

Downloading a file from Server A and checking the hash delivered by Server A is security theater. In this case, only a digital signature (with a pre-established public key that you already trust) can really stop the server from being compromised (or malicious).

Downloading a file/torrent from Server B and verifying the hash delivered by Server A is a different situation entirely, and that boils down to a trust decision. Do you trust Server A to not be compromised? Do you trust them to not be malicious or in cahoots with Server B? If not, it's at the very least a larger attack surface than the previous scenario. (Trusting the public key for the digital signature is also a trust decision. Only the details are different.)

Basically: If you're going to do anything at all, verifying hashes from the same source is a waste of CPU and human effort.

I hope that helps.
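To illustrate the second scenario (file from one place, hash from another), a small Python sketch; the filename and file contents are made up:

```python
import hashlib
import os
import tempfile

def sha256_file(path: str) -> str:
    # Stream the file so large downloads aren't loaded fully into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Throwaway file standing in for a downloaded installer:
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"not really an installer")

# The hash the vendor publishes on its own (trusted) site:
published = sha256_file(path)

# A clean copy fetched from a mirror passes the check...
assert sha256_file(path) == published

# ...while a tampered mirror copy fails it:
with open(path, "ab") as f:
    f.write(b"\x00payload")
assert sha256_file(path) != published
```

Note this only helps because the expected hash came from somewhere other than the mirror; if the mirror serves both the file and the hash, an attacker who replaces one can replace the other.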

You made exactly the same point as the post you are replying to, you just used more words.

I wasn't really trying to argue, they said they didn't understand the point.

They didn't understand why the article says that comparing hashes is "a completely ludicrous waste of time." In some cases, it's not (as you both mentioned).

Re-read their post. I think you probably didn't understand their point. :)

That's certainly possible. I'm not seeing what I missed. Maybe if I sleep on it, it will be clearer?

I've updated the article to make the context (and consequences) a bit clearer. I'm sorry for wording that part so weirdly before.

