

The Secure Programmer's Pledge - ircmaxell
http://blog.ircmaxell.com/2012/07/secure-programmers-pledge.html

======
tptacek
Problems:

 _I will only use vetted and published algorithms_

This abets a hugely widespread misunderstanding about the security of crypto.
You could in fact invent your own block cipher core, and if you dropped it
into Keyczar or cryptlib's high level library be _more secure_ than people
using AES-256 directly via OpenSSL. The problem isn't _algorithms_. The
problem is _constructions_, particularly at the joints.

You should change this to "I will only use vetted high-level cryptographic
libraries", with the descriptive text explaining that a "high-level
cryptographic library" is one that handles key generation and makes all the
decisions about block cipher modes and MACs and verification and order-of-
operations for you.

So with that having been said:

 _I will not store sensitive data in plain text_

Encouraging people to encrypt data while giving them bad advice about how to
accomplish that is a recipe for disaster.

Next:

 _I will use parameterized queries when executing SQL_

Specifically, you write: "Parameterized queries are a better way of solving
the problem, because it doesn't require any escaping". This is wrong. Most
database protocols will allow you to bind _data_ to a query, but not keywords,
or even limits and offsets. A whole generation of programmers has been
convinced that using parameterized queries shields them from SQL Injection,
while writing pagination code or sortable tables that are trivially
injectable.

Finally:

 _I will understand the OWASP top 10_

I can't knock you for asking people to know what the OWASP top 10 is, but
contra the words in your pledge, "OWASP" (whatever that is in reality) does
not "track" the "top 10" vulnerabilities in any methodical way. It's basically
just a bunch of people getting together and making a case for what they think
the most important vulnerabilities are. If you know very little about web
security, the OWASP Top 10 is a fine starting point, but your readers should
know that's all it is.

~~~
ircmaxell
One point: this list/pledge is for average developers, not crypto experts...

> This abets a hugely widespread misunderstanding about the security of
> crypto. You could in fact invent your own block cipher core, and if you
> dropped it into Keyczar or cryptlib's high level library be more secure than
> people using AES-256 directly via OpenSSL.

You _could_ of course. But the average developer cannot. It takes quite a bit
of knowledge about cryptography to be able to do this and have it be more
secure than AES... And even if you have that knowledge the chance that a
mistake was made is high enough that you shouldn't use it anyway (the
algorithm isn't vetted). So I stand behind my point...

> Encouraging people to encrypt data while giving them bad advice about how to
> accomplish that is a recipe for disaster.

What bad advice? The only thing I said was hash it if you just need to verify,
or encrypt if you need to reverse it.
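
For the "hash it if you just need to verify" half of that, a minimal sketch using only Python's standard library could look like the following. The names `hash_password` and `verify_password`, the iteration count, and the salt size are illustrative assumptions, not anything taken from the pledge.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware


def hash_password(password):
    """Return (salt, digest) for storage; the password itself is never stored."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```

The stored pair can confirm a login attempt but cannot be reversed into the password, which is the distinction being drawn between hashing and encrypting.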

Additionally, would you rather have CC numbers stored in plain text? I'd
rather have a botched encryption on them that's somewhat easy to break than
have it in plain text...

> Specifically, you write: "Parameterized queries are a better way of solving
> the problem, because it doesn't require any escaping". This is wrong.

It's not wrong. Raw user input should _never_ enter a query. Period. If you're
going to paginate or sort or filter, you need to white-list filter on
available fields and values. Escaping and adding it to the query is just a
recipe for disaster... Obviously just using a parameterized query API isn't
going to do it for you. But escaping, in any context, is an incorrect way of
handling it...
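
To make the whitelisting idea concrete, here is a rough sketch in Python with the standard `sqlite3` module. The `users` table, the `SORTABLE_COLUMNS` set, and the `list_users` helper are all hypothetical. The sort column and direction are the parts that cannot be bound as parameters, so they are picked from fixed sets in code. SQLite happens to accept bound values for `LIMIT` and `OFFSET`; as noted upthread, not every database does, so the sketch also coerces them to integers.

```python
import sqlite3

# Hypothetical whitelist: the only column names allowed to reach the SQL text.
SORTABLE_COLUMNS = {"name", "created_at"}


def list_users(conn, sort_by, descending, limit, offset):
    """Fetch one page of users, sorted by a whitelisted column."""
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError("unsupported sort column: %r" % (sort_by,))
    # The keyword is chosen by code, never copied from user input.
    direction = "DESC" if descending else "ASC"
    # Identifier and keyword come from the fixed sets above;
    # the page size and offset are coerced to int and bound as data.
    sql = (
        "SELECT id, name FROM users "
        "ORDER BY " + sort_by + " " + direction + " LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, (int(limit), int(offset))).fetchall()
```

A request for an unlisted column, including anything carrying SQL syntax, is rejected before a query string is ever built.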

> If you know very little about web security, the OWASP Top 10 is a fine
> starting point, but your readers should know that's all it is.

I'm not trying to suggest that they should _only_ know the top 10, but that
they should know it in its entirety...

~~~
tptacek
Do you understand why I said a developer who wrote their own block cipher core
and plugged it into Keyczar would be more secure than a developer who used AES
directly?

If you don't understand, why do you "stand behind your point"? Why don't you
instead try asking questions?

Similarly: you wrote none of that stuff about "whitelisting" (whatever it is
you mean by that) in your "pledge". You just told developers, "use
parameterized queries so you don't have to escape them". And now, when it's
pointed out that that's not great advice, you find a way to argue with it?

~~~
ircmaxell
Ok, I'll bite. Why would it be more secure?

Would the following block cipher be more secure than AES?

    
    
        function encrypt(block, key) {
            // Assumes block and key are byte arrays of equal length.
            return block.map((byte, i) => byte ^ key[i]);
        }

~~~
tptacek
That block cipher would not in practice be much worse than the cryptosystems
developers end up with when they use OpenSSL, its bindings in Python or Ruby,
or "javax.crypto" to get AES.

AES in its default block cipher mode can usually be byte-at-a-time decrypted.
AES in its "conservative" mode can almost always be byte-at-a-time decrypted
when not augmented with another crypto building block that developers
invariably forget. When developers don't forget that building block, they
often manage to implement it in such a way that it too can be byte-at-a-time
broken. AES in its most "modern" mode ends up being _exactly as secure as
naive XOR_ when developers use it without understanding its parameters.
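
The comment does not name the mode or the parameter. One possible reading, and it is only an assumption, is a counter-style stream mode used with a repeated nonce. The Python sketch below uses a SHA-256 based keystream as a stand-in for such a mode (it is not AES, and `keystream`, `stream_encrypt`, and `xor` are made-up names) to show how two messages encrypted under the same key and nonce collapse into plain XOR.

```python
import hashlib


def keystream(key, nonce, length):
    """Counter-style keystream: one SHA-256 output per counter value."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def stream_encrypt(key, nonce, plaintext):
    """Encrypt by XORing the plaintext with the keystream."""
    return xor(plaintext, keystream(key, nonce, len(plaintext)))
```

If `c1` and `c2` are ciphertexts of `p1` and `p2` under the same key and nonce, then `xor(c1, c2)` equals `xor(p1, p2)`: the keystream cancels out, and knowing or guessing one plaintext reveals the other without touching the key.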

On the other hand, if you read the Wikipedia page on Feistel networks and
wrote your own --- or if you just used reduced-round FEAL-4 --- but used
Keyczar to actually deploy it against real data, all those mistakes I alluded
to above would be avoided, and your attackers would have to do real
cryptanalysis to attempt to break your application; nobody does that.

Knowing this, you can now see why I'd take issue with the idea that your
"Secure Programmer's Pledge" urges people to use "vetted algorithms" to protect
data. AES is about as "vetted" as algorithms ever get, and its use in
production code by generalist developers is almost always comically insecure.

So: no, that one example turns out not to be more secure than AES, even in
Keyczar. The problem is, by itself, it usually turns out not to be _less_
secure either.

~~~
blazingice
As a security researcher (but not necessarily a crypto one), I do not
understand this comment.

> AES in its default block cipher mode can usually be byte-at-a-time
> decrypted.

1\. Block ciphers don't have default modes. Implementations might. Does
OpenSSL really use ECB as the default mode? (I agree wholeheartedly with you
that sensible defaults are extremely important, and so ECB-as-default seems
hard to believe.)

2\. What does "byte-at-a-time" decrypted mean? You haven't specified the
threat or attacker models.

Are you saying that given several million ciphertexts, you can recover the key
from AES-ECB? AES-CTR? Does the attacker need side channel access? How about
given one ciphertext? Or is this a chosen-plaintext or chosen-ciphertext
attack?

In short, could you please detail the attack you have in mind?

> AES in its most "modern" mode ends up being exactly as secure as naive XOR
> when developers use it without understanding its parameters.

As far as I can tell, this is entirely predicated on your later statement that
"nobody does [real cryptanalysis]". What is AES's 'most "modern" mode'? Which
parameters are you referring to here (key size, mode, any others?)

My guess is that XOR will fall in some small number of hours against someone
who cares; AES-128-ECB (as bad as it is) may require many more resources for
key retrieval.

For fun, which definition of security are you using to compare cryptosystems?

~~~
tptacek
This comment is harshly written, but I don't mean it personally (you're
anonymous, so how could I?) and anyways, I don't know what else to do with
this (common) sentiment of "I don't understand the vulnerabilities you're
talking about so I'm going to assume there's something basic about how stuff
works that I grasp but you do not".

You're a security researcher who doesn't know crypto. This stuff isn't hard,
but for some reason, most security researchers know fuck-all about how to test
and exploit crypto bugs. Don't take too much offense; I was in the same bucket
until a few years ago (and I'm not far from it even now), and I've been a
researcher since 1994.

ECB is the default mode not because people choose overtly to make it the
default mode, but because it requires no parameters to make it work. Look at a
generalist programmer's cryptosystem. Flip a coin. Did it come up heads? It's
ECB mode, because that's the moral default.
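
To see why a parameter-free mode is a problem, here is a small Python sketch. It uses a toy four-round Feistel cipher built on SHA-256 as a stand-in for AES; the cipher, the key, and all function names are hypothetical and unvetted. The point is the mode, not the cipher: under ECB, equal plaintext blocks always produce equal ciphertext blocks.

```python
import hashlib

BLOCK = 16          # bytes per block
HALF = BLOCK // 2   # bytes per Feistel half


def _round(half, key, i):
    # Round function: hash of key, round number, and one half block.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(block, key, rounds=4):
    left, right = block[:HALF], block[HALF:]
    for i in range(rounds):
        left, right = right, _xor(left, _round(right, key, i))
    return left + right


def decrypt_block(block, key, rounds=4):
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(rounds)):
        left, right = _xor(right, _round(left, key, i)), left
    return left + right


def ecb_encrypt(plaintext, key):
    """Encrypt each block independently; no IV, no nonce, no other parameters."""
    if len(plaintext) % BLOCK:
        raise ValueError("this demo skips padding")
    return b"".join(
        encrypt_block(plaintext[i:i + BLOCK], key)
        for i in range(0, len(plaintext), BLOCK)
    )
```

Encrypting `b"A" * 16 + b"B" * 16 + b"A" * 16` gives a ciphertext whose first and third blocks are identical, so the structure of the plaintext shows through without any knowledge of the key.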

Nothing I'm talking about involves "several million ciphertexts".

Nothing I'm talking about involves side channels --- at least, not precision
measuring side channels. "Side channel attacks" are the voodoo totem that
security researchers wave around when they don't know a specific attack that
will break a cryptosystem. Sort of like not knowing how to pick a lock a pin
at a time, but talking about "bump keys".

Nothing I'm talking about even involves the attacker knowing for sure what
algorithm the defender used. We test for this stuff black box; it takes less
than a week to train people to do it.

No, I'm not going to provide more details here. Not because I jealously guard
this stuff (I've written most of this stuff on HN before, and I've given talks
about it that are recorded online), but because every time I get into a thread
like this, someone comes back and says "oh yeah well THAT attack is LAME and I
assumed that any smart developer would already have defended against it" and
I'd rather reveal ignorance for what it is.

ECB will fall in seconds in most situations. If you knew how to test crypto,
you'd know that none of these attacks "retrieve keys". Again: don't take
offense. People way smarter than me don't know this stuff. I think it's
because the papers use math notation.

~~~
blazingice
I'd be the first to tell you that I don't understand crypto. Most people
don't.

You're unwilling to have this conversation again; I understand. Do you have a
link to one of your talks? I'd be interested in watching.

Can you at least tell me what definition you're using for "fall", if not key
retrieval? Replay attack? Information leakage?

Edit:

> "I don't understand the vulnerabilities you're talking about so I'm going to
> assume there's something basic about how stuff works that I grasp but you do
> not"

Sorry if it came off that way! I'm assuming that you understand something
basic about how this works that I do not, and wondering what it is :)

~~~
tptacek
The security of most block ciphers rests, in some ways, on the difficulty
of brute force iterating through very large numbers --- 2^128, say.

"Byte-at-a-time decryption" means creating a scenario where attackers can
brute force numbers like 2^8, winning a single byte of "plaintext" (or
whatever the equivalent is depending on the primitive you're targeting). If
your block size is 16 bytes long, the attacker might have to brute force 2^8
16 times; with a laptop, you might be talking about whole seconds of work.
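
The comment deliberately does not name an attack, so the following is only one well-known scenario with this shape, sketched in Python under loud assumptions: a server encrypts attacker-supplied bytes followed by a secret, one block at a time with no chaining, and returns the result. A truncated HMAC stands in for the block cipher; it is not invertible, but the attack only needs equal input blocks to give equal output blocks. `KEY`, `SECRET`, `oracle`, and `recover_secret` are hypothetical, and the attacker is assumed to know the secret's length.

```python
import hashlib
import hmac
import os

BLOCK = 16
KEY = os.urandom(16)           # never revealed to the attacker
SECRET = b"attack at dawn!!"   # hypothetical data the server appends


def toy_block(block):
    # Stand-in for one block-cipher call: keyed and deterministic.
    return hmac.new(KEY, block, hashlib.sha256).digest()[:BLOCK]


def oracle(prefix):
    """Encrypt prefix + SECRET block by block (ECB-style), zero padded."""
    data = prefix + SECRET
    data += b"\x00" * (-len(data) % BLOCK)
    return b"".join(
        toy_block(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)
    )


def recover_secret(secret_len):
    """Return (recovered bytes, number of guesses) using only oracle()."""
    known = b""
    guesses = 0
    for _ in range(secret_len):
        # Pad so exactly one unknown byte sits at the end of a block.
        pad = b"A" * (BLOCK - 1 - len(known) % BLOCK)
        start = (len(known) // BLOCK) * BLOCK
        target = oracle(pad)[start:start + BLOCK]
        for candidate in range(256):
            guesses += 1
            trial = pad + known + bytes([candidate])
            if oracle(trial)[start:start + BLOCK] == target:
                known += bytes([candidate])
                break
    return known, guesses
```

Each byte costs at most 2^8 oracle guesses, so a 16-byte secret takes at most 16 * 256 = 4096 of them, and the key is never recovered or needed.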

Block cipher attacks generally never recover crypto keys.

I am being intentionally vague. Not because I want to keep information from
you, but because I don't want to create yet another crypto thread that gives
developers a false sense of knowing what the risks are when building crypto.

If this is something you're seriously interested in, and you can code in any
programming language, email me and I'll give you a syllabus of straightforward
things to work on.

~~~
ircmaxell
Ok, you have me confused. I half want to raise the BS flag...

Could you explain something here? How can a block cipher that has 128 bits of
output be attacked 8 bits at a time (where 1 bit change in the input will
change on average 64 bits of the output in a non-predictable manner)? Sure,
you can try every 8 bit permutation, but without knowing the form of the
original text how can you know if you have a valid character? And how is that
different from extracting "raw data" out of pure randomness (where the fallacy
is obvious, you're extracting data that was never there)?

I'm genuinely interested, so if an email will do it, could you please follow
up: ircmaxell [at] php [dot] net...

Thanks!

~~~
tptacek
If I sound smart about any of this stuff, alarm bells should be going off in
your head, because in terms of testing and breaking crypto, I am a piker.

I think you get my point now. Maybe rethink the crypto stuff in your pledge.

------
mapgrep
"I will not store sensitive data in plain text"

By this logic, Gmail would need to encrypt the contents of every email and
every attachment, then encrypt the full-text indexes of those emails and
attachments. Obviously my contacts should be encrypted too. Then they'd need
to encrypt the names of all labels/tags, which can contain sensitive
information. Then they'd need to encrypt their logs, since when I visit the
service and from where is actually sometimes sensitive info.

This is basically endless. It's impossible to accurately assess the bounds of
what an arbitrary user will consider sensitive. The core is reasonably easy --
passwords, CC numbers -- but there can be hugely sensitive data at the edges.

My favorite example of this at the moment is 1Password. 1Password does a very
thorough job of encrypting their passwords file. You can go read a whole blog
post and white paper they wrote on their keychain format. But it turns out (as
I and others have raised in their forums) there is a cache they create in the
clear in the filesystem where the cache files are named after the websites
where you have an account and have recently visited.

Now MANY people will not consider this sensitive data. But some people will.
The passwords are not leaking, but the names of sites where I have accounts ARE
leaking. No problem, unless you have an account at donkey-fetish-dot-xxx your
partner doesn't know about or whatever. The guys who designed 1Password
clearly didn't think this issue would come up because, to their credit, they
probably don't spend a lot of time on sites like private-pirated-movie-trove-
dot-info. Or they don't have spouses/bosses who know where to look for these
cache files.

Anyway, this is a long way of saying that you'll go crazy trying to encrypt
all sensitive info. I think the 1Password case is clear cut but, judging by
the response from their support, they did not. You might think a user's
bookmark tag is not sensitive info, but if it's named 'job-hunt' or 'divorce-
lawyers' it probably is. In the end, everything is sensitive info to someone.

~~~
tptacek
Particularly with web apps, this "encrypt all the sensitive data" stuff is
usually masturbatory. If it's data your application needs to function, the
server needs ready access either to the key, or at least to some online oracle
that provides access to the data. So you end up with these silly systems that
encrypt data under AES keys stored, at best, on the filesystem.

"Encrypt all data" is something that sounds good in a message board thread,
but really doesn't do much to shield you from the fact that to be secure, you
just have to flush all the bugs out of your application.

~~~
rickette
How about storing the AES key in a HSM (Hardware Security Module) instead of a
filesystem?

------
Cyranix
Left this as a comment on the blog posting, but felt it might be worth
reiterating here:

    
    
        An important addendum to "I will use existing libraries where possible" -- 
        you should also pledge to share those implementations freely. This will allow
        your implementations to enjoy the same scrutiny that the implementations of 
        others have (cf. "I will only use vetted and published algorithms") and 
        enrich the development community.
    

Without this bit -- especially if your target audience is average developers
-- you're inadvertently condoning ignorance due to lack of peer review.

------
pjmlp
One entry missing:

 _I won't make use of programming languages that make it easy to do security
exploits_

