
Preventing heartbleed bugs with safe programming languages - cpeterso
http://bluishcoder.co.nz/2014/04/11/preventing-heartbleed-bugs-with-safe-languages.html
======
zepolud
To take it one step further, it might be a good idea to specify the whole
thing in Coq, prove correctness and derive the C code from there. Coq is a
theorem prover whose foundations descend from Martin-Löf type theory, like
the ATS mentioned in the article. Xavier Leroy (the man behind OCaml) managed
to write a whole verified C compiler in Coq[1], so while this is slower than
writing C directly, it should still be feasible for projects of such critical
importance.

On a related note, mathematicians are also finding it increasingly difficult
to trust unverified proofs. Papers by Vladimir Voevodsky, a Fields medalist
and one of the most prominent mathematicians in his field, were discovered to
contain errors more than ten years after publication, despite having been
cited hundreds of times without anyone noticing the flaws in the proofs. This
was one of the principal motivations for him to initiate the homotopy type
theory project, which also has roots in Martin-Löf type theory and is
likewise being developed formally in Coq.

[1][http://compcert.inria.fr/diagram.png](http://compcert.inria.fr/diagram.png)

~~~
jude-
I've said it before and I'll say it again.

Formalizing the TLS specification and proving that an implementation is
consistent with it only shows that the implementation is _logically correct_.
However, it does NOT show that the implementation is _secure_. Your
implementation can be vulnerable to side-channel attacks (particularly timing
attacks) while still being logically correct.

Proofs of correctness are a (big) step in the right direction, but they're not
a silver bullet for building a secure implementation.

~~~
tomp
You can use mathematics to prove other things as well. You could e.g. use a
special bool type for all secure data that you can't branch on, so that no
secrets could be exposed by examining the runtime path of execution. Or build
a model of your CPU's caching and prefetching behaviour and prove that your
code doesn't expose any secrets through that channel.
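
For a flavour of the first idea, here's a minimal Rust sketch (illustrative
only; the `subtle` crate offers a production version of this pattern): a
secret boolean with no conversion to `bool`, so code holding one simply
cannot take a secret-dependent branch.

    // A "secret bool" you cannot branch on (sketch, not real library code).
    pub struct SecretBool(u8); // always 0 or 1; the field stays private

    impl SecretBool {
        pub fn new(b: bool) -> Self {
            SecretBool(b as u8)
        }

        // Deliberately no method returning `bool`, so there is no way
        // to branch on the secret value.

        /// Constant-time select: `a` if true, `b` if false, with no
        /// data-dependent branch or memory access.
        pub fn select(&self, a: u32, b: u32) -> u32 {
            let mask = (self.0 as u32).wrapping_neg(); // 0 or 0xFFFF_FFFF
            (a & mask) | (b & !mask)
        }
    }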

~~~
jude-
Right, this will work against some types of side-channel attacks. For
example, I could prove that my decryption function runs in a constant amount
of time no matter the input.

However, you can't proactively defend against side-channel attacks. By
definition, a side-channel attack exists because the designer didn't know that
the cryptosystem leaks information when (s)he designed and proved the
correctness of the implementation.

Proving that the cryptosystem doesn't leak information is damn nigh
impossible, and often outside the scope of software engineering. My favorite
example is acoustic analysis
([https://www.cs.tau.ac.il/~tromer/acoustic/](https://www.cs.tau.ac.il/~tromer/acoustic/)).
Before this was shown to be possible, how were cryptosystem designers supposed
to know to defend against it?

The best we can do is come up with and formalize a threat model, and then
prove that the cryptosystem is secure against the adversaries in the threat
model. The problem is that threat models don't help you with adversaries in
the future, who have means of getting information from your cryptosystem that
you didn't think of.

~~~
justinsb
I take your point, but I don't think _anything_ can defend against unknown
attacks. However, we could defend against all known attacks using formal
methods, which is a lot better than we're doing at the moment. Further, when a
new attack becomes known, I would wager that having something that is
verifiable today will be easier to verify or fix in the future, as compared to
a code-base that is not provable.

Just because we can't achieve theoretical perfection doesn't mean we should
rule out methods that would solve many of the actual problems we encounter
today.

Of course, if you can find a way to defend against unknown future problems,
I'm all in favor of using your method!

~~~
conover
Unknown-unknowns as it were.

------
josteink
In an age where "most" web-servers are idling at 10% load and yaaaawning most
of the hours away, it makes no sense that security-critical network-code
should be written in unsafe languages that can leak memory contents, such as
C.

Due to improved hardware, we've now reached the point where we can have
reasonable security _and_ performance at the same time. We no longer need to
sacrifice security by using unsafe languages. So why _are_ we using them?

Yes, some companies like Google and Facebook have busy servers, but for the
rest of us, for the rest of the entire internet having to redo all our keys
and certs and passwords due to an archetypal C-style bug in OpenSSL, the
_cost_ of using C compared to the potential performance benefits is
absolutely not worth it. Not even close.

TLDR: Stop writing network-code in C unless you bloody well need to

~~~
hackerboos
I think the point of TFA is that we can write this code and it will run as
efficiently as C in a language which is safer than C.

~~~
pjmlp
It was already possible in the 70's with PL/I, Mesa and Modula-2, but alas
C's designers thought it better not to offer bounds checking in the language
at all, rather than offering it with a way to disable it when required, like
those other languages did.

------
Scramblejams
We really need to stop writing network code in unsafe languages, period.

~~~
eatfish
But even 'safe' languages have unsafe VMs. There is no question that over the
years these VMs (or the runtimes) have had equally severe vulnerabilities.

The thing that really sets heartbleed apart is not the details of the bug,
it's the scale of the 'infection'. OpenSSL is a core dependency of so many
distributions and of so many pieces of software.

I think we could argue for lots of different solutions (diversity of
implementations, safe languages, more tests) and all of them might be good in
one way or another but none of them are a silver bullet for any possible bug
either.

I think the takeaway is that you need to be in a position to upgrade _any_
part of your software stack at a moment's notice, not just the obvious top
level (e.g. Rails/Django/Jetty).

~~~
iopq
Rust allows you to write safe code without a VM. It is statically checked to
be memory and concurrency safe.
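
Both guarantees, sketched (illustrative code, not from any real project):

    // (1) Compile time: the borrow checker rejects dangling references.
    //     This function is refused outright:
    //
    //     fn dangling() -> &'static [u8] {
    //         let v = vec![1, 2, 3];
    //         &v // error[E0515]: cannot return reference to local data
    //     }

    // (2) Run time: an out-of-bounds read is a deterministic error,
    //     never a silent read of whatever sits next to the buffer.
    fn main() {
        let buf = [1u8, 2, 3];
        let claimed_len = 64 * 1024; // a Heartbleed-sized request
        match buf.get(..claimed_len) {
            Some(payload) => println!("{payload:?}"),
            None => println!("request exceeds buffer; refused"),
        }
    }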

~~~
eatfish
1. I think a lot of people believe Rust will just type-check any old program
and tell you when it has faults. So you can start with a bit of Ruby/C/Python,
translate it to Rust and presto, all your bugs are exposed for the world to
see.

In practice Rust's type checker accepts only a _very_ small subset of correct
programs. I've been in a position to write some decent-sized Rust code
recently, and it takes a shift in your mindset to start writing decent Rust
code.

Even now there are patterns I'm unsure how to model in Rust. Arena allocation
is a good example, because it was partly the cause of Heartbleed too. Arena
allocation in Rust seems to require unsafe pointers and unsafe code blocks;
you can look at Rust's standard library and see this.

2. The point being that the Rust language exposes unsafe code blocks and
pointers. At some point you're going to hit those blocks (if nothing else, in
3rd-party code) and you're back to square one: you need to trust that the
unsafe code is correct. It doesn't matter whether that code is a VM or an
unsafe block.

*edited for some legibility.

~~~
ldng
The argument Rust devs make is that most of the time you would not need to
use unsafe code, and when you do, being explicit about it makes you more
careful and makes you think twice about it.

To me it makes sense. And the example you give here is very relevant. First
you'd try to do it within the standard language bounds, and only when you
realize you can't do it that way would you resort to unsafe code. But now
you're very aware that this part of the code needs to be treated with extra
care. So, to me, you're not completely back to square one.

Nicholas Matsakis makes this very point near the end of this talk:
[https://www.youtube.com/watch?v=9wOzjbgRoNU](https://www.youtube.com/watch?v=9wOzjbgRoNU)

I would even add that, if care is taken to make that unsafe code really
small, it could even be generated by Coq, for instance, as stated in some
comments here.

That said, Rust might not be the best out there for the job, but IMHO it
shouldn't be dismissed too fast either. It is similar enough to C++ to allow
a less painful transition for devs with the domain knowledge.

~~~
NateDad
Would you not assume that the entire OpenSSL library would count as being in
need of extra scrutiny? The point is that any time you let people directly
access memory, they can, and often will, screw it up.

~~~
ldng
Ok, maybe we should make a distinction between, let's say, the plumbing code
and the algorithms. Rust could help with the former. According to some
comments I've read here, it seems OpenSSL is using its own abstraction over
malloc/free (not that I have actually read the code). I suppose that this
part of the code would be a suitable candidate for unsafe code with special
extra care taken; then the rest of the algorithm does not need to be unsafe
code. If you watch the video you might understand better what I mean: the Arc
is built on unsafe code but provides you with a safe abstraction for you to
use in the checked part of the language.
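
For instance, a minimal bump-arena sketch in Rust (illustrative only, not
production code): the unsafe pointer work is confined to one small block that
deserves that extra care, while callers only ever see safe references.

    use std::cell::Cell;

    pub struct Arena {
        buf: *mut u32, // backing storage, allocated once in `new`
        cap: usize,
        used: Cell<usize>,
    }

    impl Arena {
        pub fn new(cap: usize) -> Self {
            // Leaked for brevity; a real arena would implement Drop.
            let storage: &'static mut [u32] = vec![0u32; cap].leak();
            Arena { buf: storage.as_mut_ptr(), cap, used: Cell::new(0) }
        }

        /// Hand out the next slot as an ordinary safe reference.
        pub fn alloc(&self, value: u32) -> Option<&mut u32> {
            let i = self.used.get();
            if i >= self.cap {
                return None;
            }
            self.used.set(i + 1);
            // SAFETY: `used` only grows, so each call yields a distinct
            // slot and no two `&mut` ever alias; the buffer outlives the
            // arena because it was leaked above.
            unsafe {
                let slot = self.buf.add(i);
                slot.write(value);
                Some(&mut *slot)
            }
        }
    }

Earlier allocations stay usable while new ones are made (`alloc` takes
`&self`), which is exactly the kind of aliasing the borrow checker alone
can't express; the unsafe block is the part you audit.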

Of course such a project requires extra scrutiny on all levels, and Rust does
not resolve all the problems. I'd say pick your battles. Rust provides some
interesting middle ground between C/C++ and a completely different language
like Ada.

------
pjmlp
Since the 70's we have had lots of safe programming languages with native
compilers to choose from:

Ada, Modula-2, Modula-3, Oberon, Oberon-2, Active Oberon, Component Pascal,
Delphi, Turbo Pascal, Turbo Basic, Quick Pascal, D, Haskell, OCaml, Eiffel,
Go, ...

C only became widespread in the industry as a side effect of UNIX's adoption.

Nowadays we pay the price for it.

Security conscious developers should only use C when there isn't any way
around it.

~~~
dorfsmay
I'm trying to learn ATS right now, and indeed I immediately thought of Ada.
One of the advantages of ATS is that it is a lot more functional, though it
is far from being anywhere near as mature yet.

------
chotu
There seems to be a misconception that a safe language implies inefficient
and slow code. It is not so in ATS2: the generated code is quite efficient,
and it is safe even when manipulating memory directly from ATS2.

------
jacquesm
There will always be a compromise between cost, performance, features,
usability and security.

'Safe' programming languages will improve security, presumably at the cost of
usability through decreased performance. A bit like how a Ferrari is faster
than a Volvo, but not as safe.

Performance has been a major driver in the choices made so far. I'm sure that
the heartbleed affair will move the needle towards the 'security' end of the
spectrum, but I doubt it will move enough to drop C as the main workhorse of
systems software coding.

Reducing the complexity of the protocols used would seem to me to be a better
place to reduce the exposed attack surface. No matter the language used, if
you make a system extremely complex, bugs are going to be more numerous and,
due to the interactions between the various parts, much harder to detect.

~~~
zepolud
>'Safe' programming languages will improve security, presumably at the cost of
usability through decreased performance. A bit like how a Ferrari is faster
than a Volvo, but not as safe.

This is fundamentally untrue. Safety of the kind described in the article
does not come at the cost of performance -- all type checks, invariant
conditions and general formal proofs of correctness are discharged at compile
time. The produced code is meant to be indistinguishable from a correctly
written, efficient C program. You are allowed to play as much as you like
with direct memory accesses and use all the dirty tricks you like, as long as
you can prove that the resulting program is formally correct.

What could be argued is that the price you pay for all this is the difficulty
of writing such programs. But definitely not their performance.
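
A tiny flavour of the zero-cost idea, sketched in Rust with const generics
rather than ATS's dependent types (illustrative only):

    // The lengths live in the types, so the bounds reasoning happens at
    // compile time; the internal length check in `copy_from_slice` can
    // never fire and is trivially optimized away.
    fn copy_exact<const N: usize>(dst: &mut [u8; N], src: &[u8; N]) {
        dst.copy_from_slice(src);
    }

    // copy_exact(&mut [0u8; 4], &[1, 2, 3, 4]); // fine
    // copy_exact(&mut [0u8; 4], &[1, 2, 3]);    // refused at compile time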

~~~
jacquesm
Ok, point taken, if you take the pre-processor or code generator approach such
as in the article the performance hit might be manageable, but when I look at
that 'memcpy' example I can't help but suspect that that is not a free lunch
(as in, that all the extra work is done at compile time). For that you'd need
to look at the code generated.

Even if there is a zero performance hit in this case, the majority of 'safe'
languages are neither pre-processors for C nor code generators for a C-like
language (which would presumably also require the linked libraries to be
rewritten in something safer).

So in the specific case outlined here this may be true but in the more general
case there are usually run-time trade-offs involved.

I think the key operative word in your comment is 'correctly': writing
correct C is extremely hard, and this approach makes it harder to create a
certain class of bugs at the expense of making it harder to write the program
in the first place. Tough choice, even in the absence of a performance hit!

~~~
doublec
In the example given in the article there are no run time costs. The generated
C code is much the same as the original C code from OpenSSL. There is no ATS
runtime that gets included either.

------
AnimalMuppet
OK, help me out here...

If I understand the heartbleed bug correctly, isn't there an RFC that it is
implementing? Does the RFC say that you can ask for a length, and you get that
length? That is, isn't this really a bug in the RFC?

If the security flaw's in the spec, there's not too much that the programming
language can do to help you. The best it could do is point out that you're
overrunning a buffer, at which point the programmer has a choice: Proceed to
implement the spec in code that is declared to be unsafe (useless), scream
about the spec (the right answer, and useful if anybody listens), or find some
other way to generate the extra bytes that are supposed to be in the reply
(which makes _this_ implementation safe, and still satisfies the flawed spec).
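
(For reference, the heartbeat extension is RFC 6520, which does say that a
HeartbeatMessage whose payload_length is too large MUST be discarded
silently; the bug was skipping that check.) A rough sketch in Rust of a
checked handler, with the framing simplified for illustration:

    fn handle_heartbeat(msg: &[u8]) -> Option<Vec<u8>> {
        // Wire format: 1-byte type, 2-byte payload_length (big endian),
        // then the payload, then at least 16 bytes of padding.
        let claimed = u16::from_be_bytes([*msg.get(1)?, *msg.get(2)?]) as usize;
        // The crucial check: `get` yields None instead of over-reading
        // when the peer claims more payload than it actually sent.
        let payload = msg.get(3..3 + claimed)?;
        let mut resp = Vec::with_capacity(3 + claimed + 16);
        resp.push(2); // heartbeat_response
        resp.extend_from_slice(&(claimed as u16).to_be_bytes());
        resp.extend_from_slice(payload); // only bytes the peer sent
        resp.extend_from_slice(&[0u8; 16]); // padding (random per the RFC; zeroed here)
        Some(resp)
    }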

------
ASneakyFox
I feel like everyone is using heartbleed to pitch their products.

~~~
mercurial
I don't think the author makes any claim to have invented ATS. And why not
show how an alternative technology would have avoided this issue?

~~~
djjaxe
Just because this alternative language would have avoided this bug does not
mean a much worse bug would not have been created with a higher-level
language, or even with ATS.

~~~
mercurial
Well, people can write bad/unsafe code in any language. But ATS can remove
entire classes of bugs from a program, while C is notorious for its lack of
safety. Though obviously, it has no more built-in protection from side-channel
attacks than C.

~~~
djjaxe
Basically what it all comes down to is laziness: how the code is developed
and how many people are actually looking over the entire codebase. ATS lets
the developers know that they can be even more lackadaisical about coding, as
ATS will remove bugs for them...

~~~
mercurial
"There are collisions because people don't pay enough attention. Imagine if we
installed a collision detection system, people would pay even less attention!"

I hadn't heard this line of argument before.

------
Dewie
So Rust has unsafe code blocks that you use when you can't use the type
system to enforce that a piece of code is safe. I wonder if it would be
viable to write those things in ATS with all the proofs to guarantee that the
code is safe, compile to C, and then call that trusted C code through the
FFI?

It might not work in all cases, since I have the impression that unsafe Rust
code is even more expressive than C when it comes to low-level code (I guess
this also means more unsafe).
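
Mechanically, the gluing might look like this (everything here is
hypothetical; `verified_memcpy` stands in for whatever symbol the
ATS-compiled object file would export):

    use std::os::raw::c_void;

    extern "C" {
        // Assumed to come from C generated, and proven safe, in ATS.
        fn verified_memcpy(dst: *mut c_void, src: *const c_void, n: usize);
    }

    /// Safe wrapper: the precondition is re-checked on the Rust side,
    /// so the one unsafe call is backed both by this assert and by the
    /// (assumed) ATS-side proof.
    pub fn copy_into(dst: &mut [u8], src: &[u8]) {
        assert!(dst.len() >= src.len(), "destination too small");
        unsafe {
            verified_memcpy(
                dst.as_mut_ptr() as *mut c_void,
                src.as_ptr() as *const c_void,
                src.len(),
            );
        }
    }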

~~~
kibwen
Rust code within "unsafe" blocks is still way, way safer than C code, while
still being almost uniformly more powerful than C.

------
coherentpony
You can write unsafe code in _any_ language. It's not the language's job to
protect you from yourself, that's what tests are supposed to do.

~~~
chriswarbo
Actually that's what types are supposed to do ;)

Tests are still useful, as sanity checks and for checking empirical
properties (e.g. wall-clock time usage, timing attacks, etc.)

~~~
coherentpony
Consider the scenario where my programme is dangerous for values of an
unsigned integer less than or equal to 2. As far as I know, there is no such
type for unsigned integers equal to 3 or more. Sure, in object-oriented
languages you can essentially define your own type, but this flexibility is
exactly where you are not protected against yourself.

I see your point about types though. That said, both types _and_ testing exist
to protect you. I stand by what I said, however; languages do not exist to
protect you from yourself.

~~~
Dewie
> Consider the scenario where my programme is dangerous for values of an
> unsigned integer less than or equal to 2. As far as I know, there is no such
> type for unsigned integers equal to 3 or more. Sure, in object oriented
> languages you can essentially define your own type, but this flexibility is
> where you are not protected against yourself.

You can specify that type in a dependently typed language. Then, if you can
prove that values of that type can not violate your requirement, there is no
runtime overhead.
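
Even without full dependent types, the usual halfway house is a "smart
constructor" that bakes the invariant into a type. A Rust sketch (here the
check happens once at construction, at run time, whereas a dependently typed
language could discharge it at compile time):

    // Assuming this lives in its own module, so the field stays private
    // and `new` is the only way to obtain a value.
    pub struct AtLeast3(u32);

    impl AtLeast3 {
        pub fn new(n: u32) -> Option<Self> {
            if n >= 3 { Some(AtLeast3(n)) } else { None }
        }

        pub fn get(&self) -> u32 {
            self.0
        }
    }

    // A function that is "dangerous for values <= 2" can now demand
    // the evidence in its signature instead of re-checking:
    fn dangerous_op(n: AtLeast3) -> u32 {
        n.get() - 3 // cannot underflow, by construction
    }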

Are there things which are unknowable or unprovable in a dependently typed
system? Sure. But I think your initial assertion that all languages are unsafe
to a degree and therefore it can't be their job to protect you against
mistakes is unhelpful; it muddies the water by appealing to the fact that
'nothing is perfect/no approach is perfect'. But the whole point of type
systems is to eliminate certain classes of bugs - the rest can hopefully be
caught by other, less rigorous means, like fuzz testing and unit testing. All
other bugs are relegated to problems which are (in general) undecidable.

~~~
mannykannot
I think there is an important and quite general point here: it is not just
about programming languages, but also about programming knowledge and skills.
You are replying to someone who was unaware of the state of the art in program
verification (to be fair, he recognized the issue to be solved, which is an
important start.)

As the example in the original post demonstrates, programming in languages
that have this level of support for verification is very different from
programming as it is currently commonly practiced. Not everyone will be
capable of making the switch, and for an organization to simply say 'from now
on, we are going to use this safe language', without addressing the skills
issue, is to set itself up for failure.

~~~
Dewie
> I think there is an important and quite general point here: it is not just
> about programming languages, but also about programming knowledge and
> skills. You are replying to someone who was unaware of the state of the art
> in program verification (to be fair, he recognized the issue to be solved,
> which is an important start.)

Well let me be clear that I knew of program verification because I'm a PL
geek, not because of any skill whatsoever.

> Not everyone will be capable of making the switch, and for an organization
> to simply say 'from now on, we are going to use this safe language', without
> addressing the skills issue, is setting up for failure.

Well, let's keep the discussion to programmers who really _need_ the things
that we are after - safety and efficiency. It's not _all_ programmers, just
programmers in some domains. Some people might even think that some
programmers _can't_ adjust to writing low-level code à la C, period. But some
domains need these things, which means that we just need the programmers who
are motivated enough/have the patience to learn it. Just those programmers,
not all programmers.

If we can't get them, then maybe someone will actually have to offer some
incentives, like money - instead of a purely volunteer effort, as I think was
the case in this debacle. :)

~~~
mannykannot
I wasn't intending to cast doubts on any particular individual's skills, which
is why I wrote 'knowledge and skills'. I am learning this stuff myself.

You make some good points about where safety matters most, but I think a
greater general awareness would help drive adoption where it matters.
Furthermore, while this problem had widespread consequences due to it being in
widely-deployed system- or middle-level software, 'ordinary' programming can
have quite serious vulnerabilities, too.

I think schools, especially below the first tier, could do more to promote
awareness of static verification and other safe practices, and that might
modify the way their graduates approach development, even though they probably
will not be using formal methods.

There are things that can be done to improve safety in general-purpose
programming languages. I feel certain that garbage collection and the
avoidance of pointers have made programming safer, but I suspect 'duck'
typing has had the opposite effect.

In the past, the DOD has been a driver of code safety, though it has backed
down from its possibly ill-advised 'nothing but Ada' position. In fact, Ada
might be the counter-example to the idea that you can drive safety through
language choice.

You would think the banks would have a vested interest in improving things.
Perhaps they could divert a fraction of their bonus payments to create
incentives...

