Hacker News
The Secure Programmer's Pledge (ircmaxell.com)
59 points by ircmaxell on July 16, 2012 | hide | past | favorite | 41 comments



Problems:

I will only use vetted and published algorithms

This abets a hugely widespread misunderstanding about the security of crypto. You could in fact invent your own block cipher core, and if you dropped it into Keyczar or cryptlib's high-level library, be more secure than people using AES-256 directly via OpenSSL. The problem isn't algorithms. The problem is constructions, particularly at the joints.

You should change this to "I will only use vetted high-level cryptographic libraries", with the descriptive text explaining that a "high-level cryptographic library" is one that handles key generation and makes all the decisions about block cipher modes and MACs and verification and order-of-operations for you.

So with that having been said:

I will not store sensitive data in plain text

Encouraging people to encrypt data while giving them bad advice about how to accomplish that is a recipe for disaster.

Next:

I will use parameterized queries when executing SQL

Specifically, you write: "Parameterized queries are a better way of solving the problem, because it doesn't require any escaping". This is wrong. Most database protocols will allow you to bind data to a query, but not keywords, or even limits and offsets. A whole generation of programmers has been convinced that using parameterized queries shields them from SQL Injection, while writing pagination code or sortable tables that are trivially injectable.
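
To make the limitation concrete, here's a minimal sketch using Python's stdlib sqlite3 (a hypothetical users table; other databases behave similarly for identifiers): placeholders bind data values, but a bound ORDER BY parameter is treated as a constant expression, not a column reference, so the "sort column" silently does nothing; a real column reference has to be spliced into the SQL string, which is exactly where injection re-enters.

```python
import sqlite3

# Hypothetical table, standing in for any sortable listing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", 30), ("bob", 25)])

# Binding works fine for data values...
rows = conn.execute("SELECT name FROM users WHERE age > ?", (26,)).fetchall()
print(rows)  # [('alice',)]

# ...but a bound ORDER BY parameter is a constant, the same for every row,
# so the requested sort column has no effect at all:
by_age = conn.execute("SELECT name FROM users ORDER BY ?", ("age",)).fetchall()
by_name = conn.execute("SELECT name FROM users ORDER BY ?", ("name",)).fetchall()
print(by_age == by_name)  # True: "sorting by age" and "by name" were identical

# An actual column reference must be interpolated into the query text --
# which is where developers start concatenating user input again.
print(conn.execute("SELECT name FROM users ORDER BY age").fetchall())
# [('bob',), ('alice',)]
```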

Finally:

I will understand the OWASP top 10

I can't knock you for asking people to know what the OWASP top 10 is, but contra the words in your pledge, "OWASP" (whatever that is in reality) does not "track" the "top 10" vulnerabilities in any methodical way. It's basically just a bunch of people getting together and making a case for what they think the most important vulnerabilities are. If you know very little about web security, the OWASP Top 10 is a fine starting point, but your readers should know that's all it is.


One point: this list/pledge is for average developers, not crypto experts...

> This abets a hugely widespread misunderstanding about the security of crypto. You could in fact invent your own block cipher core, and if you dropped it into Keyczar or cryptlib's high level library be more secure than people using AES-256 directly via OpenSSL.

You could of course. But the average developer cannot. It takes quite a bit of knowledge about cryptography to be able to do this and have it be more secure than AES... And even if you have that knowledge the chance that a mistake was made is high enough that you shouldn't use it anyway (the algorithm isn't vetted). So I stand behind my point...

> Encouraging people to encrypt data while giving them bad advice about how to accomplish that is a recipe for disaster.

What bad advice? The only thing I said was hash it if you just need to verify, or encrypt if you need to reverse it.

Additionally, would you rather have CC numbers stored in plain text? I'd rather have a botched encryption on them that's somewhat easy to break than have it in plain text...

> Specifically, you write: "Parameterized queries are a better way of solving the problem, because it doesn't require any escaping". This is wrong.

It's not wrong. Raw user input should never enter a query. Period. If you're going to paginate or sort or filter, you need to white-list filter on available fields and values. Escaping and adding it to the query is just a recipe for disaster... Obviously just using a parameterized query API isn't going to do it for you. But escaping, in any context, is an incorrect way of handling it...
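
A minimal sketch of the whitelist approach described above, again with stdlib sqlite3 and a hypothetical users table (the request parameters are invented for illustration): identifiers are checked against a fixed set and fall back to a default, while values still go through placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", 30), ("bob", 25)])

# Hypothetical request parameters, exactly as an attacker might send them.
sort_param = "age; DROP TABLE users --"
min_age_param = "20"

# Identifiers come from a fixed whitelist; anything else gets the default.
ALLOWED_SORT = {"name", "age"}
order_by = sort_param if sort_param in ALLOWED_SORT else "name"

# Values still go through placeholders as usual.
rows = conn.execute(
    f"SELECT name FROM users WHERE age >= ? ORDER BY {order_by}",
    (int(min_age_param),),
).fetchall()
print(rows)  # sorted by the fallback column, table intact
```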

> If you know very little about web security, the OWASP Top 10 is a fine starting point, but your readers should know that's all it is.

I'm not trying to suggest that they should only know the top 10, but that they should know it in its entirety...


Do you understand why I said a developer who wrote their own block cipher core and plugged it into Keyczar would be more secure than a developer who used AES directly?

If you don't understand, why do you "stand behind your point"? Why don't you instead try asking questions?

Similarly: you wrote none of that stuff about "whitelisting" (whatever it is you mean by that) in your "pledge". You just told developers, "use parameterized queries so you don't have to escape them". And now, when it's pointed out that that's not great advice, you find a way to argue with it?


Ok, I'll bite. Why would it be more secure?

Would the following block cipher be more secure than AES?

    function encrypt(block, key) {
        return block ^ key;  // XOR each block with the key
    }


That block cipher would not in practice be much worse than the cryptosystems developers end up with when they use OpenSSL, its bindings in Python or Ruby, or "javax.crypto" to get AES.

AES in its default block cipher mode can usually be byte-at-a-time decrypted. AES in its "conservative" mode can almost always be byte-at-a-time decrypted when not augmented with another crypto building block that developers invariably forget. When developers don't forget that building block, they often manage to implement it in such a way that it too can be broken byte-at-a-time. AES in its most "modern" mode ends up being exactly as secure as naive XOR when developers use it without understanding its parameters.

On the other hand, if you read the Wikipedia page on Feistel networks and wrote your own --- or if you just used reduced-round FEAL-4 --- but used Keyczar to actually deploy it against real data, all those mistakes I alluded to above would be avoided, and your attackers would have to do real cryptanalysis to attempt to break your application; nobody does that.

Knowing this, you can now see why I'd take issue with the idea that your "Secure Programmer's Pledge" urges people to use "vetted algorithms" to protect data. AES is about as "vetted" as algorithms ever get, and its use in production code by generalist developers is almost always comically insecure.

So: no, that one example turns out not to be more secure than AES, even in Keyczar. The problem is, by itself, it usually turns out not to be less secure either.


As a security researcher (but not necessarily a crypto one), I do not understand this comment.

> AES in its default block cipher mode can usually be byte-at-a-time decrypted.

1. Block ciphers don't have default modes. Implementations might. Does OpenSSL really use ECB as the default mode? (I agree wholeheartedly with you that sensible defaults are extremely important, and so ECB-as-default seems hard to believe.)

2. What does "byte-at-a-time" decrypted mean? You haven't specified the threat or attacker models.

Are you saying that given several million ciphertexts, you can recover the key from AES-ECB? AES-CTR? Does the attacker need side channel access? How about given one ciphertext? Or is this a chosen-plaintext or chosen-ciphertext attack?

In short, could you please detail the attack you have in mind?

> AES in its most "modern" mode ends up being exactly as secure as naive XOR when developers use it without understanding its parameters.

As far as I can tell, this is entirely predicated on your later statement that "nobody does [real cryptanalysis]". What is AES's 'most "modern" mode'? Which parameters are you referring to here (key size, mode, any others?)

My guess is that XOR will fall in some small number of hours against someone who cares; AES-128-ECB (as bad as it is) may require many more resources for key retrieval.

For fun, which definition of security are you using to compare cryptosystems?


This comment is harshly written, but I don't mean it personally (you're anonymous, so how could I?) and anyways, I don't know what else to do with this (common) sentiment of "I don't understand the vulnerabilities you're talking about so I'm going to assume there's something basic about how stuff works that I grasp but you do not".

You're a security researcher who doesn't know crypto. This stuff isn't hard, but for some reason, most security researchers know fuck-all about how to test and exploit crypto bugs. Don't take too much offense; I was in the same bucket until a few years ago (and I'm not far from it even now), and I've been a researcher since 1994.

ECB is the default mode not because people choose overtly to make it the default mode, but because it requires no parameters to make it work. Look at a generalist programmer's cryptosystem. Flip a coin. Did it come up heads? It's ECB mode, because that's the moral default.
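
The defining ECB failure is easy to show without any crypto library at all. The sketch below uses a toy keyed block function built from SHA-256 as a stand-in for AES (the key and plaintext are invented); the only property it shares with real ECB is the one that matters: blocks are encrypted independently, so identical plaintext blocks produce identical ciphertext blocks, leaking structure to anyone who can see the ciphertext.

```python
import hashlib

BLOCK = 16
KEY = b"0123456789abcdef"  # hypothetical server-side key

def toy_ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # Stand-in for AES-ECB: each 16-byte block is transformed independently,
    # with no IV and no chaining -- the only property this demo needs.
    assert len(data) % BLOCK == 0
    return b"".join(
        hashlib.sha256(key + data[i:i + BLOCK]).digest()[:BLOCK]
        for i in range(0, len(data), BLOCK)
    )

ct = toy_ecb_encrypt(KEY, b"YELLOW SUBMARINE" * 2)
# Equal plaintext blocks leak as equal ciphertext blocks:
print(ct[:BLOCK] == ct[BLOCK:])  # True
```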

Nothing I'm talking about involves "several million ciphertexts".

Nothing I'm talking about involves side channels --- at least, not precision measuring side channels. "Side channel attacks" are the voodoo totem that security researchers wave around when they don't know a specific attack that will break a cryptosystem. Sort of like not knowing how to pick a lock a pin at a time, but talking about "bump keys".

Nothing I'm talking about even involves the attacker knowing for sure what algorithm the defender used. We test for this stuff black box; it takes less than a week to train people to do it.

No, I'm not going to provide more details here. Not because I jealously guard this stuff (I've written most of this stuff on HN before, and I've given talks about it that are recorded online), but because every time I get into a thread like this, someone comes back and says "oh yeah well THAT attack is LAME and I assumed that any smart developer would already have defended against it" and I'd rather reveal ignorance for what it is.

ECB will fall in seconds in most situations. If you knew how to test crypto, you'd know that none of these attacks "retrieve keys". Again: don't take offense. People way smarter than me don't know this stuff. I think it's because the papers use math notation.


I'd be the first to tell you that I don't understand crypto. Most people don't.

You're unwilling to have this conversation again; I understand. Do you have a link to one of your talks? I'd be interested in watching.

Can you at least tell me what definition you're using for "fall", if not key retrieval? Replay attack? Information leakage?

Edit:

> "I don't understand the vulnerabilities you're talking about so I'm going to assume there's something basic about how stuff works that I grasp but you do not"

Sorry if it came off that way! I'm assuming that you understand something basic about how this works that I do not, and wondering what it is :)


The security of most block ciphers rests, in some ways, on the difficulty of brute-force iterating through very large numbers --- 2^128, say.

"Byte-at-a-time decryption" means creating a scenario where attackers can brute force numbers like 2^8, winning a single byte of "plaintext" (or whatever the equivalent is depending on the primitive you're targeting). If your block size is 16 bytes long, the attacker might have to brute force 2^8 16 times; with a laptop, you might be talking about whole seconds of work.
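
The shape of that attack can be sketched in a few lines. This is a simplified version of the classic chosen-prefix attack on an ECB oracle, using a toy hash-based block function as a stand-in for AES and zero padding for simplicity (the key, secret, and function names are all invented): the attacker pads so the next unknown byte lands at the end of a block, then brute-forces just that byte --- 2^8 guesses per byte, exactly as described above.

```python
import hashlib

BLOCK = 16
KEY = b"server-side-key."      # hypothetical; unknown to the attacker
SECRET = b"attack at dawn"     # what the attacker wants to recover

def ecb_block(block: bytes) -> bytes:
    # Toy keyed block function standing in for AES: deterministic, no chaining.
    return hashlib.sha256(KEY + block).digest()[:BLOCK]

def oracle(attacker_input: bytes) -> bytes:
    # All the attacker can do: get ECB-encrypt(input || SECRET) back.
    data = attacker_input + SECRET
    data += b"\x00" * (-len(data) % BLOCK)  # zero padding, for simplicity
    return b"".join(ecb_block(data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))

def recover_secret() -> bytes:
    known = b""
    while True:
        # Pad so the next unknown byte sits at the end of a block,
        # then brute force that single byte: 2^8, not 2^128.
        pad = b"A" * (BLOCK - 1 - len(known) % BLOCK)
        bi = len(known) // BLOCK
        target = oracle(pad)[bi * BLOCK:(bi + 1) * BLOCK]
        for b in range(256):
            guess = pad + known + bytes([b])
            if oracle(guess)[bi * BLOCK:(bi + 1) * BLOCK] == target:
                known += bytes([b])
                break
        else:
            # No byte matched: we've run off the end into the padding.
            return known.rstrip(b"\x00")

print(recover_secret())  # b'attack at dawn'
```

Note that, per the comment above, the key is never recovered --- the plaintext is, one byte at a time.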

Block cipher attacks generally never recover crypto keys.

I am being intentionally vague. Not because I want to keep information from you, but because I don't want to create yet another crypto thread that gives developers a false sense of knowing what the risks are when building crypto.

If this is something you're seriously interested in, and you can code in any programming language, email me and I'll give you a syllabus of straightforward things to work on.


Ok, you have me confused. I half want to raise the BS flag...

Could you explain something here? How can a block cipher with 128 bits of output be attacked 8 bits at a time, when a 1-bit change in the input changes on average 64 bits of the output in a non-predictable manner? Sure, you can try every 8-bit permutation, but without knowing the form of the original text, how can you know whether you have a valid character? And how is that different from extracting "raw data" out of pure randomness (where the fallacy is obvious: you're extracting data that was never there)?

I'm genuinely interested, so if an email will do it, could you please follow up: ircmaxell [at] php [dot] net...

Thanks!


If I sound smart about any of this stuff, alarm bells should be going off in your head, because in terms of testing and breaking crypto, I am a piker.

I think you get my point now. Maybe rethink the crypto stuff in your pledge.


How would this change for example if instead of using just plain AES you use AES with a block cipher mode (CTR-BE/CBC/or others just no CBE)?


If you read my comment carefully, you'll see I'm talking about mistakes people make when using "safe" block cipher modes. The default mode is ECB, by the way, not CBE.


Excuse the typo, had just woken up.


Actually, depending on the mode of operation it may be about as secure as AES in ECB... That is: not secure at all.

Using a well known algorithm is such a small part of the overall security of a cryptosystem that it makes the call to use only well known algorithms useless by itself. Hence tptacek recommending using proven high level libraries (e.g. OpenSSL's EVP_* family of functions).


Whoah. Hold on. OpenSSL EVPxxx claims to be a "high-level interface", but isn't one. Here are some acid tests for high-level libraries:

* Does the library expose block cipher mode choices to callers? It's not high-level.

* Does the library expose IVs to callers? It's not high-level.

* Does the library separate "encryption" from "authentication", offering the choice of doing one without the other? It's not high-level.

* Does the library by default allow users to pass in raw buffers as keys? It's not high-level.

Don't use OpenSSL directly to do crypto in applications.


Why shouldn't the IV be available to callers? How are you supposed to implement cryptography when it isn't available, so that you can send data from one party to another (such as data encrypted using a private/public key system) and both ends have the same IV for decryption?

Why shouldn't there be a choice to offer separate encryption from authentication?

How does interoperation with other libraries/platforms work? For example, interoperating with Java for Diffie-Hellman has meant using a library that does BER/DER and decoding the values ourselves, because Java uses an older standard than the library we use on the C++ side (Botan). We have to be able to manage keys ourselves. All the higher-level abstractions create all kinds of issues, because the choices they make don't let us interoperate; they differ from what the next library chooses.


Because encryption without authentication in most systems is pointless?

Because if you're using a high-level interface to cryptography, the handling of the IV is a detail that can easily be hidden from developers without loss of security or, in reality, meaningful flexibility? Meanwhile, asking developers to provide an IV is an invitation to cryptosystem flaws?

Because maybe you shouldn't be doing custom Diffie Hellman in your application to begin with? Is there a pointer to either your Java or C++ code on Github? Do you know how to do DH safely? I don't, but I know some flaws to look for, and that I've found in the real world on pentests.

Instead of talking about BER/DER, how about you talk about the DH pitfalls you avoided and how you avoided them? That way, readers of this thread would get a whiff of maybe why they wouldn't want to come within a mile of a custom DH implementation.


My code is not on Github because it is proprietary code, so unfortunately I can't send you a link to have a look at it.

Sure, encryption without authentication is worthless, but what if I want to do the authentication in a different manner? What if I want to encrypt using AES-128/CTR-BE and then sign using a public/private key pair rather than an HMAC-SHA-256? That choice can be left up to the developer. Also, there still seems to be some debate as to whether you encrypt-then-HMAC or HMAC-then-encrypt. (I personally prefer encrypt-then-HMAC.)

I don't think you necessarily have to have the developer provide the IV; you can provide a way for the developer to get the generated IV back from the system after the fact, and that would work just as well, so long as you can get access to it and build to spec. All this hiding is good for projects where it's possible, but interoperability and gov't requirements can throw a wrench into simply using something like Keyczar.

The biggest issue with DH that we have had to overcome is the different encodings that may be used. You have various different standards for how to share the DH keys. There is an ANS X9 standard that is recommended by the government[1] (and required in our product) and then there is PKCS3.

The biggest issue we ran into is that Java (on Android) only does PKCS3, and Botan doesn't by default decode that so you end up having to implement that functionality yourself. We ended up using PKCS3 but that will most likely have to be changed for gov't approval which means we need to get Java to grok ANS X9.

When doing DH I also suggest using one of the NIST approved DH groups and not generating your own.

Do note that none of this is using a custom DH implementation, it is using the stuff that is available in Botan/Java itself, the biggest issue is finding common ground for finishing the DH key exchange because of the different formats used in the in-between steps.

Also, generating a random key from the DH data differs across APIs. Botan can generate keys for you, and will use a KDF, while Java will simply give you the raw bytes back.

So far this isn't a solved problem for us. Since we are not using it for cryptographic purposes we may end up doing our own custom implementation so that we have less to worry about in terms of encoding/decoding various different storage schemes. And any weakness in the system won't be a fatal flaw that could be used/exploited in a meaningful manner, so fixing things at a later date won't be catastrophic.

The biggest issue is that there are so many standards, and so many little variations of them, in cryptography that trying to get systems to interoperate is a massive undertaking. Each time you run up against a wall because something is abstracted away, you wish for, and eventually want, access to the lower-level primitives.

[1]: http://csrc.nist.gov/publications/nistpubs/800-56A/SP800-56A...


I mean, there are widespread DH vulnerabilities that are trivial to test for that you haven't mentioned here. Obviously, you've written code to do DH before, and crypto code in general, but how confident do you feel about telling generalist developers how to do crypto? My job is to find flaws in other people's code and then effectively exploit those flaws in front of them, and I certainly don't feel comfortable recommending implementations.

Incidentally, there's not much controversy about encrypt-then-MAC vs. MAC-then-encrypt. You E-t-M, so attackers can't play games with ciphertext bits. Also: if you have the choice, you'd probably want to avoid using public key signatures when you could just get away with using a MAC. The public key stuff strictly adds vulnerabilities; it doesn't ameliorate any.
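
The E-t-M ordering itself is mechanical. Below is a minimal stdlib-only sketch; the "cipher" is a hash-counter keystream stand-in (not a vetted construction --- the function names, key sizes, and framing are all invented) and exists only so the ordering can be shown end to end: the tag covers the ciphertext, and verification happens, in constant time, before any decryption runs.

```python
import hashlib, hmac, secrets

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Stand-in stream cipher (hash-counter). NOT a vetted construction;
    # a real system would use a high-level library or an AEAD.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in
               zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    # Encrypt-then-MAC: the tag covers nonce + ciphertext, so any
    # bit-twiddling of the ciphertext is rejected before decryption.
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expect = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):  # constant-time compare, first
        raise ValueError("bad MAC")
    return bytes(a ^ b for a, b in zip(ct, keystream(enc_key, nonce, len(ct))))

ek, mk = secrets.token_bytes(32), secrets.token_bytes(32)
blob = seal(ek, mk, b"hello")
print(open_sealed(ek, mk, blob))  # b'hello'
```

Note the distinct encryption and MAC keys, and that the nonce is generated inside seal() rather than supplied by the caller --- the kind of decision a high-level library makes for you.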


Vulnerabilities such as accidentally setting a value (g^x or g^y, or x, or y, IIRC) to 1, and then having the generated keys be invalid because the key would end up being 1?

Wasn't this an issue in an IPSEC implementation?

Or are you talking about the lack of authentication on DH, so that MITM is trivially possible? Or the choosing of bad prime numbers (hence my suggestion of using the NIST groups), which could make it easier for an attacker to figure out the generated private key?

Speaking of which, this paper which I read a while back is definitely a good idea for reading before implementing DH:

http://crypto.cs.mcgill.ca/~stiglic/Papers/dhfull.pdf

Do I feel confident how to tell generalist developers how to do crypto? Not entirely, but I do feel confident telling generalist developers what I have seen that is completely wrong or what has to be avoided.

As for encrypt-then-MAC, I'd definitely agree with you there, and also agree that public key may not be the best, but the option should exist. For what I am currently working on: the data is encrypted, then an HMAC is taken over the encrypted data. The HMAC is stored publicly, but the HMAC key is encrypted with the receiver's public key, and that encrypted data is signed with the sender's private key. The receiver verifies the signature with the sender's public key, decrypts the symmetric key/IV/HMAC key with their private key, verifies the HMAC over the data, and then decrypts the data.

Which as far as I am aware (and please correct me if I am wrong) is the best way to make sure that the data has not been tampered with.


Zero-mod-p was an IPSEC fault, as was subgroup negotiation (but you mentioned NIST groups earlier).

Lots of people have done their own implementations of DH (DH is trivial to implement; it's just a couple lines of Ruby); how many of them do you think caught these flaws?

I have no idea what the utility of the public key is in the system you just described. Public key crypto adds susceptibility to attack; it doesn't ever really spare you from it. Here, you're vulnerable both to attacks on your HMAC verification and to implementation flaws in your more number-theoretic (and way more dangerous) public key code.


Indeed, DH can be difficult to get right if you are not paying attention.

I wish I could go into more specifics for our implementation and what it is we do.

Either way the public/private key stuff is done using various "high" level libraries that make reasonable choices and are audited, so it is not code that we are implementing ourselves, nor is the HMAC. So long as there are no issues in the libraries I should be fairly good.

Thanks for the discussion :-).


I get your point, but pretty often you must respect some general mandate (e.g. must use this approved chaining mode, this certified PRNG...), and OpenSSL EVP comes in very handy if you can't find a library doing exactly what you need. It still shields you quite a bit from the lower-level mistakes.

What's your go to solution?


Keyczar.

For the situation you describe, a "low-level" crypto library is handy; for instance, very few libraries implement ciphertext stealing, so if you require compatibility with some wacky protocol or file format that uses CTS, OpenSSL EVP is no help either.

That doesn't make OpenSSL's primitive interface a good choice.

If you are designing crypto for your own application, and you find yourself typing "O-p-e-n-S-S-L" or even "A-E-S", you are probably in for a lot of trouble.


Except that to implement TLS for example you NEED to type OpenSSL into your code, and to set the allowed ciphers for the connection you need to type AES.

I agree with you that Keyczar is good for certain use cases, but that isn't always the case. Instead of always telling us to use Keyczar, how about showing us how to use the primitive libraries more correctly, so we have secure applications yet still have the flexibility to use them as needed for compatibility with other implementations or with government requirements (for those of us in the gov't contracting world)?

You are in the unique position that you have seen all of the mistakes and know how to avoid them and thus could help developers avoid those mistakes as well.


If you're implementing TLS using OpenSSL, you aren't using it to do custom crypto for your app.

No, I don't think I'll be spending time trying to help people roll safe custom crypto by hand using the primitives in OpenSSL. For one thing: I'm nowhere close to confident that I know enough to make recommendations like that.


So don't tell us how to use the primitives, tell us in high level overview of common mistakes that are made so that we can watch out for those. Some of us simply don't have the luxury of using Keyczar.

You are well respected when it comes to crypto in this community; you have a lot of knowledge that comes with that, and I believe sharing it would be more beneficial than just saying "Use Keyczar". Sometimes that one tool doesn't fit the bill, or can't be used due to contractual obligations/constraints. Even if you are not confident enough to recommend how to use OpenSSL securely, you can at least say "These are the top 5 mistakes I see". "Typing A-E-S into your code" can be on that list, but on its own it isn't very constructive.


Getting people not to make dumb crypto mistakes, by getting them to either (a) not write crypto code in the first place, or (b) if they have to encrypt stuff, to use GPG to do it, or (c) at the very least, to use a library interface that has been bulletproofed like Keyczar or cryptlib --- that seems way more productive than leading people down a garden path of the individual flaws I know how to test for, and letting them think they're secure.


I find it kinda sad that you have "I will not assume that I know better" and then your first reply is a point-by-point rebuttal of a paid expert's opinion. :)


> This abets a hugely widespread misunderstanding about the security of crypto. You could in fact invent your own block cipher core, and if you dropped it into Keyczar or cryptlib's high level library be more secure than people using AES-256 directly via OpenSSL.

No.

Someone who knew their way around cryptography algorithms could do that. I can't. This is good advice for someone who isn't a security expert. (Edit: With my level of crypto knowledge, I don't even know how to evaluate whether your claim bears any relationship to reality.)

(Yes. I would like to remedy that. I'm working on it, slowly.)


Are you sure you're right about that? Because I think you're wrong. I think you could spend 15 minutes on Wikipedia, write your own block cipher core, and if you used Keyczar as the protocol implementation, you'd be better off than if you had used AES directly.

Your AES code, I am implying, would be so bad that you'd be better off not having used it at all.


That's fair. However, I'd argue that someone implementing their own algorithm would not use a library like Keyczar. They would just write their own.

So while it's possible to write your own and be secure, IMHO it'd be better to stick to vetted algorithms and libraries... I think it's just that - all other things being equal - the public algorithm is more likely to be more secure...


(Psst, your reply to maxgrep is [dead].)


Weird. Tried to recover it, gave up. Other readers: you're not missing much. :)


"I will not store sensitive data in plain text"

By this logic, Gmail would need to encrypt the contents of every email and every attachment, then encrypt the full-text indexes of those emails and attachments. Obviously my contacts should be encrypted too. Then they'd need to encrypt the names of all labels/tags, which can contain sensitive information. Then they'd need to encrypt their logs, since when I visit the service and from where is actually sometimes sensitive info.

This is basically endless. It's impossible to accurately assess the bounds of what an arbitrary user will consider sensitive. The core is reasonably easy -- passwords, CC numbers -- but there can be hugely sensitive data at the edges.

My favorite example of this at the moment is 1Password. 1Password does a very thorough job of encrypting their passwords file. You can go read a whole blog post and white paper they wrote on their keychain format. But it turns out (as I and others have raised in their forums) there is a cache they create in the clear in the filesystem where the cache files are named after the websites where you have an account and have recently visited.

Now MANY people will not consider this sensitive data. But some people will. The passwords are not leaking, but the names of sites where I have accounts IS leaking. No problem, unless you have an account at donkey-fetish-dot-xxx your partner doesn't know about or whatever. The guys who designed 1Password clearly didn't think this issue would come up because, to their credit, they probably don't spend a lot of time on sites like private-pirated-movie-trove-dot-info. Or they don't have spouses/bosses who know where to look for these cache files.

Anyway, this is a long way of saying that you'll go crazy trying to encrypt all sensitive info. I think the 1Password case is clear cut but, judging by the response from their support, they did not. You might think a user's bookmark tag is not sensitive info, but if it's named 'job-hunt' or 'divorce-lawyers' it probably is. In the end, everything is sensitive info to someone.


Particularly with web apps, this "encrypt all the sensitive data" stuff is usually masturbatory. If it's data your application needs to function, the server needs ready access either to the key, or at least to some online oracle that provides access to the data. So you end up with these silly systems that encrypt data under AES keys stored, at best, on the filesystem.

"Encrypt all data" is something that sounds good in a message board thread, but really doesn't do much to shield you from the fact that to be secure, you just have to flush all the bugs out of your application.


How about storing the AES key in a HSM (Hardware Security Module) instead of a filesystem?


Does anyone else here in fact wish for gmail to do that?


Left this as a comment on the blog posting, but felt it might be worth reiterating here:

    An important addendum to "I will use existing libraries where possible" -- 
    you should also pledge to share those implementations freely. This will allow
    your implementations to enjoy the same scrutiny that the implementations of 
    others have (cf. "I will only use vetted and published algorithms") and 
    enrich the development community.
Without this bit -- especially if your target audience is average developers -- you're inadvertently condoning ignorance due to lack of peer review.


One entry missing:

I won't make use of programming languages that make it easy to do security exploits



