TLS/SSL implementation in Haskell (github.com)
231 points by dyoder 781 days ago | 176 comments



A number of people are suggesting that this Haskell implementation must be worse than OpenSSL. It probably is. Writing good crypto code is hard. There are probably bugs.

Many are saying that one problem with Haskell is that you can't eliminate side-channel attacks due to features of the language. I disagree. There is no common language better than Haskell at encoding invariants in the type system. One could, for example, implement a "biggishnum" library in Haskell using large but fixed size integers and constant-time operations.

Free monads are a powerful idea in Haskell[1]. They allow one to easily generalize "interpreters" over sequences of commands. In Haskell, more so than in any other language I've ever used, one can decouple execution from algorithm specification.
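
For readers who haven't seen the construction in [1], the whole thing fits in a few lines; this is the standard definition, nothing specific to this thread:

  -- a free monad adds sequencing to any functor f; programs built this
  -- way are just data until an interpreter walks them
  data Free f a = Pure a | Free (f (Free f a))

  instance Functor f => Functor (Free f) where
    fmap g (Pure a)  = Pure (g a)
    fmap g (Free fa) = Free (fmap (fmap g) fa)

  instance Functor f => Applicative (Free f) where
    pure = Pure
    Pure g  <*> x = fmap g x
    Free fg <*> x = Free (fmap (<*> x) fg)

  instance Functor f => Monad (Free f) where
    return = pure
    Pure a  >>= k = k a
    Free fa >>= k = Free (fmap (>>= k) fa)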

Free applicative functors generalize further[2]. They define a computational structure that must be fixed a priori. That is, by definition a free applicative functor cannot know the state of the data during its execution.
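
Concretely, the free applicative from [2] looks roughly like this; note that the chain of Ap layers, and hence the shape of the computation, is fixed when the value is constructed, never at run time:

  {-# LANGUAGE GADTs #-}

  data FreeA f a where
    Pure :: a -> FreeA f a
    Ap   :: f b -> FreeA f (b -> a) -> FreeA f a

  instance Functor f => Functor (FreeA f) where
    fmap g (Pure a) = Pure (g a)
    fmap g (Ap x k) = Ap x (fmap (g .) k)

  instance Functor f => Applicative (FreeA f) where
    pure = Pure
    Pure g <*> y = fmap g y
    Ap x k <*> y = Ap x (flip <$> k <*> y)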

There are some problems with this. Applicative functors have an operation which can lift regular functions and values into them. That operation would have to be hidden, so that only a kernel was exposed that offered the ability to initialize data and then perform computations upon it.

But it's possible to do this. It is actually not a radical idea to imagine this being done in Haskell. Making a library and a set of primitive operations that can be used by an end user safely, in provably constant time, is possible.

[1] http://www.haskellforall.com/2012/06/you-could-have-invented... [2] http://paolocapriotti.com/assets/applicative.pdf

-----


> Many are saying that one problem with Haskell is that you can't eliminate side-channel attacks due to features of the language. I disagree. There is no common language better than Haskell at encoding invariants in the type system. One could, for example, implement a "biggishnum" library in Haskell using large but fixed size integers and constant-time operations.

Why do you disagree?

Timing attacks are probably the easiest kind of side-channel attack to prevent, but CPU cache line eviction and branch prediction are a lot more difficult to deal with, especially when using a programming language with managed memory, where you can't be quite sure what kind of memory accesses a program will perform. And in the days of virtual machines on cloud servers, they are very relevant.

While I do not think that it is totally impossible to write Haskell crypto code without any side channels, it is certainly more difficult than writing crypto code in C or assembler.

But the big problem is that now you'd need the programmers dealing with crypto not only to be proficient cryptographers but also experts on the internals of the Haskell runtime system. That's a pretty tall order.

I would like to hear from the original authors of this library: what was their motivation in starting this project? I am sure they were not unaware of the potential for side-channel attacks. I don't think this project was started with the intention of replacing OpenSSL in production use, but perhaps for fun or educational purposes. Or perhaps to write offline tools for creating and validating certificates, etc.

-----


I'm sorry if I wasn't clear. With the appropriate types in Haskell, you can write a data type that would be called a free applicative functor. It can be used to describe the shape of a computation. By reducing the surface area of said functor, by not exposing the "pure" operation, one can describe computations that are fixed in terms of their input.

This is amazing, because I mean truly fixed. The course of execution cannot be altered by the values of the input. Cache line eviction? All memory access is statically known in advance. Branch prediction? There are no branches.
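
To make "the shape cannot depend on the data" concrete, here is a deliberately tiny sketch (Fixed and run are made-up names, far simpler than a real free applicative):

  -- the computation is ordinary data: every operation is visible before
  -- any secret value is supplied, and there is no constructor for
  -- branching on a value
  data Fixed
    = Input            -- a slot where the secret will be plugged in
    | Add Fixed Fixed
    | Mul Fixed Fixed

  run :: Int -> Fixed -> Int
  run s Input     = s
  run s (Add a b) = run s a + run s b
  run s (Mul a b) = run s a * run s b

An interpreter like run walks the same structure no matter what the secret is.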

The code would look a lot like what the Haxl[1] project at Facebook has produced. They used applicative functors to create a domain-specific language that described the shape of a computation without specifying the details. That allows them to write a compiler for their DSL in Haskell. In their case, they used Haxl to hide concurrency from their users, Facebook developers. Those developers can use Haxl to write concurrent, cached, optimized and type-checked queries against Facebook's datastores, without having to know how to write safe concurrent, fast queries. To the Facebook developers, there is no concurrency, the code they write just looks like a monad or an applicative in Haskell.

It should be no surprise then that one of the lead developers of Haxl originally worked on the Glasgow Haskell Compiler.

[1] Haxl slides [PDF warning]: https://github.com/meiersi/HaskellerZ/raw/master/meetups/201...

-----


> The course of execution cannot be altered by the values of the input. Cache line eviction? All memory access is statically known in advance. Branch prediction? There are no branches

This might be what it looks like before GHC has had its way with it. After GHC inlining, optimization passes, and rewrite rules, the code may well contain branches, conditional memory accesses, etc.

-----


I'm wondering about a situation where forgetting a strictness annotation is the Haskell analogue of forgetting to bounds-check some input. There are lots of ways in Haskell for something not to be evaluated. (No, simple fixes like 'use strict fields' do not cover all cases.) To put all of them behind a free monad seems like it would be giving up a lot.

In C you could have made some 'safe buffer' struct together with some safe set of functions. But people didn't do that.

-----


> In C you could have made some 'safe buffer' struct together with some safe set of functions. But people didn't do that.

Exactly.

  #include <stdlib.h>   /* abort */
  #include <string.h>   /* memcpy */

  typedef struct
  {
  	void* data;
  	size_t len;
  } buffer;

  /* bounds-checked memcpy: dies loudly rather than overrunning either buffer */
  void smemcpy(buffer* dst, const buffer* src, size_t n)
  {
  	if(dst->len < n || src->len < n)
  		abort();
  	memcpy(dst->data, src->data, n);
  }
I mean how hard is that, honestly?

Buffer bugs in C don't come from the language, they come from the tendency of C programmers to value performance highly. "I just checked that the buffer is big enough, I don't need memcpy to waste cycles checking it again. I don't need that length variable sucking up memory when I know how big the buffer is." Except when you do.

-----


It's hard to define semantics. Here, you have potentially copied fewer than dst->len bytes into dst->data. Is the remaining data (up to dst->len) still valid, is it garbage, should you have adjusted dst->len, etc.?

When does it even make sense to partially mix byte-level representations of some data?

The buffer abstraction is necessary, but far from sufficient to build security-critical code. To do so, you also need to code in semantics somewhere, and buffers know zilch about the meaning of the data.

-----


Check again. Hint: "abort()" is not "return" -- that code doesn't prevent you from having to know how large the buffer is, it just shoots your program in the head if you don't. Which is basically the same semantics Java et al have by throwing an out of bounds exception when you don't catch it.

Incidentally, this is another reason that the language can't save you from yourself. Program termination is sometimes totally unreasonable, e.g. in real-time settings. And it's unreasonable regardless of the language. When an airplane falls out of the sky, you don't get to declare victory just because the reason it did was an uncaught out of bounds exception rather than a buffer overrun. In that context the programmer must explicitly handle the condition that the buffer is too small, regardless of the language, because "abnormal program termination" is not an acceptable outcome.

-----


Your code checks whether dst->len < n, but not whether dst->len > n; i.e., after copying 5 bytes to dst, there may be another 10 bytes of garbage in dst. This 'garbage' may well be a password from a string that was just free'd. A fix would be to set dst->len to n after this operation.

-----


OK, I see what you're getting at. There is a difference between the end of the buffer and the end of the data. But copying some bytes into a buffer doesn't imply that you intend to discard the rest of the buffer. It's quite common to want to replace a header or some other subsection of a buffer and have the rest of the buffer remain unmodified.

Moreover, the purpose of the above is to achieve the level of bounds checking that you get with the likes of Java. If you want to go beyond that then you need something more complicated -- maybe add separate variables to the buffer struct for buffer_len and data_len and then have different memcpy functions based on whether you want existing data in the buffer to be truncated. But there comes a point at which further complexity produces more confusion than safety.

-----


Ideally, you would have a parsing and validation routine which fills in a struct. Afterwards, you manipulate only the struct instead of raw byte representation.

> But there comes a point at which further complexity produces more confusion than safety.

A buffer is simply not the right abstraction for complex protocols, regardless of how "complex" the operations you define are.

-----


> Ideally, you would have a parsing and validation routine which fills in a struct.

The parsing and validation routine is where you need the buffer abstraction. If you make a mistake there then a copy function that won't let you read or write past the end of the buffer will mitigate the damage.

-----


    if(obj != null) {
        obj.stuff = ...
    }
I mean how hard is that exactly? Yet Java applications have null pointer exceptions all the time.

-----


Nothing wrong with null pointer exceptions - they are specific checks to see if the pointer is valid and are a good thing. If you see one, it means the programmer made a mistake and the runtime caught it for him.

There are also tools to help with it in the form of @NotNull, @Nullable, etc.

It's a completely different issue compared to actually accessing memory in an undefined way as you can in C. The bounds check you are replying to manually creates something like that exception through the use of abort(), while in Java it is always done, at the cost of some performance in exchange for safety.

-----


Consider the following: some languages don't have null. When you need nullability, what you really want is an Option/Maybe type. No null pointer exceptions, ever! It just won't compile until you fix the bug.
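
A minimal sketch of what that looks like in Haskell (lookupUser is a made-up example function):

  lookupUser :: Int -> Maybe String
  lookupUser 1 = Just "alice"
  lookupUser _ = Nothing

  greeting :: Int -> String
  greeting uid = case lookupUser uid of
    Just name -> "hello, " ++ name
    Nothing   -> "no such user"  -- with -Wall, forgetting this case is a
                                 -- compile-time warning, not a runtime NPE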

-----


Or, more generally, I think languages should have union and intersection types - "(IFoo && IBar) || (IBaz)", etc. (I also think languages should have arbitrary Boolean expressions as types, where the compiler emits a warning and a runtime check if it can prove neither that the condition will always be met nor that it never will, but that is another matter.)

Null is just a built-in final/sealed type (enum?) with a single value and no operations defined, just as boolean is a type (enum?) with two values.

That, combined with proper type inference, would alleviate a large chunk of the boilerplate associated with statically typed languages in general, and Java in particular. (For how much Java is pushed as moving errors to compile time, there are an awful lot of things that fail only at run time that could be checked at compile time...)

Same thing with checked exceptions. A checked exception is really just an option - return foo or failure.

-----


Even when Java fails at runtime, because it forces garbage collection for memory management, buffer overflows and dangling pointers can't really happen. Yes, if you're not careful, the process can still have unexpected behavior, or it might crash. Actually, the Heartbleed bug wouldn't have been such an issue if OpenSSL didn't use its own wrapper around malloc/free (at the very least, the malloc/free underneath would be a variable in that equation), e.g. http://article.gmane.org/gmane.os.openbsd.misc/211963

So while I love strong and static type systems, the issue at play here is memory safety - the very reason people still use C/C++ is the same reason Heartbleed happened.

-----


You're assuming that the JVM is secure. But yes, good point.

Although I'd argue that a truly strong type system should also catch problems with pointers. The compiler should try to prove whether any pointer access or arithmetic can cause undefined behavior: if it can prove that it does, emit a compile error; if it cannot prove that it won't, emit a warning and insert a runtime check.

(Still have the underlying "unsafe" functions. But group them into part of the standard library as an "unsafe" package.)

-----


That's what Rust does. Its type system prevents memory errors and data races. But to be able to have a self-hosting compiler (and to be able to write an OS in it) there are also `unsafe` blocks.

-----


If you use @NotNull etc in Java, then it also won't compile until you fix the bug.

-----


They come from the C language allowing it.

-----


All Turing complete languages allow you to do anything all Turing complete languages allow you to do. You can allocate a huge block of memory in Java or C# and then write your own malloc to allocate buffers from subsets of it. In some cases that may have better performance. It also allows you to have buffer overruns. You can run right off the end of one buffer and into the next and the language won't stop you because it only knows that you haven't reached the end of the array it allocated.

-----


Do Turing machines have a concept of "undefined behavior" in the same way that C does? AFAICT all behavior of a Turing machine is defined -- sure, you might not calculate what you want, but that doesn't admit arbitrary behavior outside the semantics of the machine, as in the C language.

-----


> Do Turing machines have a concept of "undefined behavior" in the same way that C does?

Sure they do. For example, nothing in the Turing machine specification tells you how fast they are. Whether a given program will run on a Turing machine in a given amount of wall clock time is undefined.

Here's a real example with real machines. Suppose you have a program with two threads. The first thread computes data and puts it on a queue for the second thread, the second thread writes it to the filesystem. When the compute thread has computed all of the data, it sleeps for 10 seconds and then terminates the program without checking whether the write thread has actually completed writing all the data to the filesystem.

That results in undefined behavior in every real language I've ever heard of. Whether the data is all written depends on how fast the CPU is, how fast the filesystem is, what other programs are scheduled on the machine that compete for CPU and filesystem access at what priority, etc. Languages don't define any of those things so you have undefined behavior.

-----


That's a different abstraction than "Turing Machine". Turing machines do not have '"undefined behavior" in the same way that C does' - it would look like a "in state 3, if you read a 0, move to an arbitrary state, write arbitrary symbols to arbitrary locations on the tape, move head to an arbitrary position on the tape, and possibly restructure the state machine in arbitrary ways."

-----


Turing machines have undefined behavior in exactly the same way that C does. What do you suppose happens if a Turing machine reads from a location on the tape that has never been written? The value the machine observes in that circumstance is not required to be well-defined in order for the machine to be able to do all computations required of a Turing machine. Moreover, the more you add features of actual machines like parallelism and timers, the more opportunities for undefined behavior you introduce. What do you suppose a multiprocessor Turing machine does if you write concurrently to the tape without locking?

-----


"What do you suppose happens if a Turing machine reads from a location on the tape that has never been written?"

It reads the default symbol (see e.g. the description of a TM on Wikipedia). Its semantics are well-defined.

The semantics of TMs are well-defined. The semantics of C's "undefined behavior" are not well-defined.

-----


I should have said "blank symbol" not "default symbol".

-----


"Undefined behavior" means "undefined by the language specification", but it is well defined by the compiler implementation. In other words, you never use the C language, always an implementation.

-----


You're thinking of implementation-defined behaviour. Undefined behaviour is not well-defined; the compiler does not have to implement it consistently or sanely, and certainly gcc won't.

-----


"All Turing complete languages allow you to do anything all Turing complete languages allow you to do."

This is false. All Turing complete languages allow you to compute anything other Turing complete languages allow you to compute, but that says nothing about encoding, or about interaction with the world. For an obvious encoding example, imagine you have a Turing machine that can write only the symbols 0 and 2 to its tape. This is obviously Turing complete - it's isomorphic to a TM that writes 0 and 1, since the TM doesn't care how the symbols are labelled. Now, if we ask it to encode 7 it has no trouble doing that in some encoding, but that can't ever be "111" on the tape.

-----


Clarification: all Turing complete languages allow you to do any _computation_ all Turing complete languages allow you to do.

-----


The Python language allows constructs such as socket.write(my_private_key). Do you think that just because Python makes it possible to write your private key to a socket that it must necessarily happen?

-----


> Buffer bugs in C don't come from the language, they come from the tendency of C programmers to value performance highly.

You have to use other people's code and language constructs which are underspecified and/or which you don't have time to fully understand. Throw versioning into the equation and now the problem is exponentially worse. This is why C code will always be full of memory vulns. Heartbleed is a very oversimplified example of the problems in C.

-----


The point is that in Haskell you could reasonably define a 'biggish-num' type on which only strict and constant-time operations would be possible - you couldn't forget to make an annotation; you'd have to explicitly ask for the value to become a 'normal number' before being able to perform any such operation.

In C, having such a 'safe buffer' struct would require discipline from you; in Haskell, that'd be the natural order of things.

-----


From your comment, I can't understand why defining a special struct in C is different from defining a special type in Haskell.

I understand that many C programmers don't do it, mind you, but how is that different from a Haskell programmer not using the proper type for a certain operation?

-----


In Haskell, there are types that you have probably heard of but not used: functors, applicative functors, and monads. In Haskell, I could define a value and its type:

  value :: ConstantTime Int

But how do I obtain an "Int" from that, since I know it somehow relates to an integer type? The answer is: you must ask the implementation of ConstantTime for the value, and if it doesn't want to give it to you, it doesn't have to. Constructors can be one-way.
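
A minimal sketch of how that works (ConstantTime, CT, and fromInt are hypothetical names, not a real library):

  module ConstantTime
    ( ConstantTime  -- the type is exported, the CT constructor is not
    , fromInt
    ) where

  newtype ConstantTime a = CT a

  fromInt :: Int -> ConstantTime Int
  fromInt = CT

Client code can build ConstantTime values and combine them with whatever operations the module exports, but it can never pattern-match on CT to get the raw Int back.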

The library author could define a number of additional operations. These operations are privileged: being written as part of the library, they can actually inspect the inner data at any time. This is the "small core" that needs to be verified.

  addInt :: ConstantTime Int -> ConstantTime Int -> ConstantTime Int
  subInt :: ConstantTime Int -> ConstantTime Int -> ConstantTime Int
  mulInt :: ConstantTime Int -> ConstantTime Int -> ConstantTime Int
  divInt :: ConstantTime Int -> ConstantTime Int -> ConstantTime Int

Then to compute something like Newton's method one could write a function in terms of those functions:

  newtonsMethod :: ConstantTime Int -> ConstantTime Int
  newtonsMethod = ...

And one can be certain that the laws that apply to `addInt` apply to `newtonsMethod`.

This composition has wonderful benefits: if there is no branching operation defined by the library, users cannot use branches inside ConstantTime. To serially apply addition, multiplication, etc., one can write in a sort of reverse Polish notation:

  square :: ConstantTime Int -> ConstantTime Int
  square input = mulInt input input

Using the multiplication operator, (*), would be illegal there. And that function, square, cannot inspect the value of input. Because it can't inspect the value, it can't make choices based on that value.

-----


Thanks for your detailed reply.

I still fail to see how this is fundamentally different from a similar approach in C (but maybe that's because I don't understand Haskell at all). In C you would define your ConstantTimeInt as a struct, and define a bunch of functions that operate on this struct. The + and * operators are likewise not defined for this struct. You'd have to use

   void csiAdd(const struct ConstantTimeInt *addend1, const struct ConstantTimeInt *addend2, struct ConstantTimeInt *res)
In all cases it requires the developer of the library to be conscious of timing attacks, and not just use a regular int with variable-time addition.

-----


In your example, any attempt to extend the core library is fraught with error. There is no way to safely do so and transitively trust the authorship. For example:

    void square(const struct ConstantTimeInt *value,
                struct ConstantTimeInt *result)
    {
      sleep(rand());  /* nothing in the type stops this */
      multiply(value, value, result);
    }
This is legal C! The ConstantTimeInt structure gives you no way to guarantee that a composition of safe operations is itself safe!

This is illegal Haskell:

    square :: ConstantTime Int
           -> ConstantTime Int
    square value = do
      randomRIO (0,100) >>= threadDelay
      return $ multiply value value
In Haskell, tight, small libraries can be written using impure constructs (as these ConstantTime things likely would be) and verified logically. Then, using the type system, that kernel can be used to create an amazing variety of strongly typed systems safely from a compact representation.

In your envisioning of a C ConstantTime library, you would never finish writing the library. If someone wanted to perform date arithmetic, you would have to implement that in your library and check it by hand and monitor it and maintain it. You could never let your guard down on any commit that would touch an ever-expanding body of code that would have to be rigorously checked to handle increasingly complex scenarios.

For example, to implement a constant time big (but fixed size) integer multiplication operation, you would not be able to trust that some other author's composition of your functions is safe purely by its type.

-----


OP's point is that the onus is on the library developer to not just use Int. You're treating it like the library developer is a user of another library.

-----


I guess the point is that Haskell is more expressive and declarative, which makes this almost (though not totally) the default behaviour of Haskell library developers. And I guess the resulting code would be easier to parse (by a human) because of its declarative nature. It would also be easier to test due to its functional purity.

Whether that merits dumping the incredible number of man-hours that have gone into OpenSSL is debatable. I'd probably say it's worth fixing OpenSSL and trying to refactor it into a more provably sound system.

-----


Ok, but saying that Free Applicatives are the "default" behavior of library developers is disingenuous.

-----


Fair point

-----


In Haskell I can validate and be sure that I'm using "the proper type" in 100% of operations - in C it's hard to verify that there is no code (intentional or accidental) that touches the inside of that struct in some way.

For example, C is full of operations (pointer arithmetic, read from an unbounded array) that may read the inside of that struct without that code mentioning that struct in any way.

-----


In C, sequencing isn't typechecked. In Haskell, it is. Actions must be compatible to be chained together in (what in C would be) a block.

-----


>In C you could have made some 'safe buffer'

No you couldn't've. C is a fundamentally unsafe language and people like you should stop pretending it isn't.

-----


People (all people) are fundamentally unsafe, not languages. It may be harder to write safe code in some languages than others, but that arguably doesn't make the language unsafe. Most certainly though, it does not make it "fundamentally" unsafe.

-----


Browse through the National Vulnerability Database, and write down the language of each bug.

Discounting bad PHP, a huge majority of bugs are in C/C++ code. And "past popularity" isn't a good explanation, because a lot of the software is fairly recent.

So, the following isn't an opinion. It's simply a matter of observable fact. If you write in C/C++, you are far more likely to introduce security vulnerabilities than in other languages; therefore, unless there's a pressing reason to use these languages, don't.

-----


You are making the claim that there would not be an increase in vulnerabilities in other languages if those languages were more represented in the wild. I don't know how you could prove that.

That is, I don't think this exercise really shows what you think it does. Consider, if 90% of the software out there is in c/c++, and you had equal representation of language to vulnerability, then you would expect 90% of the vulnerabilities to be in c/c++. This would not mean that you are more likely to write bugs in those languages. In fact, unless I misunderstand, it would simply mean you are just as likely to have bugs there as otherwise.

Right?

-----


You missed the point, and your characterization of my claim is way off.

1. A huge number of vulnerabilities affect C/C++ programs, almost all of which are memory based.

2. Memory-managed languages take care of this for you.

3. Therefore, C/C++ shouldn't be a default choice in domains where a managed language does just as well (a separate question).

Everything else is tangential.

But here are some reasons why your second paragraph is dangerously wrong (and why my claim was not as you characterize):

1. It only applies if we assume a uniform distribution of security effort across languages.

2. It only applies if we assume that C/C++ is being used for the same class of applications as programs written in other languages, or that the applications have similar attack surfaces.

3. It only applies if we assume a uniform distribution of security effort over all code regardless of age.

and so on...

And also, 90% of code -- especially relatively recent code -- is not written in c/c++.

-----


Look, I don't even really disagree with your hypothesis. I just question whether counting all of the CVEs and "discounting bad PHP, a huge majority of bugs are in C/C++ code" proves much. Other than the majority of attacked applications are c/c++. (well, likely they are php, but we are tossing that for some reason.)

Especially if you follow that with the claim of "If you write in C/C++, you are far more likely to introduce security vulnerabilities than in other languages." In order for that claim to stick, you have to show not just that there are more CVEs against c/c++ than other languages, but that there is the same effort spent in attacking non c/c++ programs. Right? (Or, am I misreading a claim on that?)

Sure, if you reduce that to "memory vulnerabilities", it is a true statement. However, you did not make that reduction. As you point out in your counter #2, there are plenty of other vulnerabilities out there. What makes you think people are more likely to avoid those than they are memory vulnerabilities in c/c++?

As for the 90% of code not being c/c++, what is the point? Unless you can show that they receive the same level of attack as the c/c++ code, you cannot really use that to claim that they are inherently more secure. Worse, your throwing of php under the bus just shows that recent languages don't do enough to prevent security mistakes.

Heaven help you if you throw in XSS and friends. Suddenly one of the darlings of the tech industry at the moment, javascript, is rocketing to the top of the list for security blunders.

-----


> It's simply a matter of observable fact. If you write in C/C++, you are far more likely to introduce security vulnerabilities than in other languages; therefore, unless there's a pressing reason to use these languages, don't.

And in the case of crypto code, one could argue that there is a very good reason to use C/C++: to prevent timing attacks.

-----


How hard would it be to develop a fast memory safe language that eliminates timing attacks?

-----


That seems like a great idea.

Timing attacks work when code tries to be fast, taking shortcuts where possible. There's a reason we talk big-O instead of big-Theta: we gladly accept when an algorithm finishes early. This is, in this case, highly undesirable. Shortcuts are great, everywhere but crypto; not unlike recursion being great, unless it's embedded.

A language where every expression takes constant time to compute seems like a solution. No short-circuiting, no variable runtimes. Don't use sorting algorithms like insertion/quicksort, where best case and worst case are very different; use selection/merge, where best case = worst case = average case. Use constant-time hash functions. Caching is a very prevalent (and somewhat invisible) shortcut that needs to be solved (how can a language disable caching?).
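
For a taste of what a branch-free primitive looks like, here is a sketch in Haskell (ctSelect is a hypothetical helper, with mask assumed to be all ones or all zeros):

  import Data.Bits ((.&.), (.|.), complement)
  import Data.Word (Word32)

  -- selects x when mask is all ones, y when mask is all zeros; the same
  -- instructions execute regardless of which operand "wins"
  ctSelect :: Word32 -> Word32 -> Word32 -> Word32
  ctSelect mask x y = (x .&. mask) .|. (y .&. complement mask)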

Obviously this isn't fast but you can't have both.

-----


> Obviously this isn't fast but you can't have both.

Merge sort is still pretty fast. It isn't the fastest. But safe > fast when it comes to reliability and security.

-----


That's a very good question. I think it's possible that the recent revelations will at least spur some interest in researching this. It seems like we need it badly.

-----


You can imagine a language that's memory safe. In fact, these languages exist today.

-----


> There is no common language better than Haskell at encoding invariants in the type system.

There are a few uncommon dependently typed ones with code extraction to other languages, though doing so is still fairly time-intensive.

-----


> A number of people are suggesting that this Haskell implementation must be worse than OpenSSL. It probably is. Writing good crypto code is hard. There are probably bugs.

Ok, but if it were vetted as much as openssl, it would probably be much better, yes?

-----


Define "better". Probably it wouldn't be more secure because both would be scrutinized and production-tested about the same. It would protect us against some common C errors but might make us vulnerable to "common" Haskell vulnerabilities. You really can't say.

It probably would be slower though, which is kind of a big deal for webservers and the like.

-----


Haskell is actually a very fast compiled language. Especially for webservers, where its concurrency shines.

-----


> It probably would be slower though, which is kind of a big deal for webservers and the like.

More critically, it would only be useful to Haskell code. Haskell, unlike C, cannot be embedded. Having a version in Rust - which does not rely on a runtime - ought to be safe from this kind of issue (barring any problem in Rust itself) while still being embeddable from external code. It would still be somewhat slower than C, and of course Rust is far from mature.

-----


Not only can it be embedded, it can also be compiled into a dynamically linked library.

(Shameless plug) I've collected examples of this here: https://github.com/drdaeman/haskell-library-ffi-example
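
The Haskell side of such a library is mostly just a foreign export; a minimal sketch (module and function names made up):

  {-# LANGUAGE ForeignFunctionInterface #-}
  module Adder where

  import Foreign.C.Types (CInt)

  -- exported with C linkage, so any language with a C FFI can call it
  foreign export ccall add_in_haskell :: CInt -> CInt -> CInt

  add_in_haskell :: CInt -> CInt -> CInt
  add_in_haskell = (+)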

-----


The nice thing about this example is that you only need GHC to create the shared library, and then GCC can do the rest with just the .so file produced.

-----


Doesn't Haskell compile first to C and then to native?

-----


Not since GHC 7.0: http://www.haskell.org/ghc/docs/7.4.2/html/users_guide/code-...

-----


http://www.haskell.org/haskellwiki/Calling_Haskell_from_C

-----


I was unaware of that. Looking a bit deeper into it, it looks pretty involved as soon as you want to do something non-trivial with it.

-----


How would you restrict the possibility of lifting a function into an Applicative? Can't you simply use pure and <*> for that?

-----


While Haskell mitigates or eliminates some classes of bugs common in C (such as buffer overflow), it also makes it more difficult to guard against side-channel attacks like timing attacks[0], because lazy evaluation makes it more difficult to reason about the actual behavior of the code at runtime.

This isn't a dig at either Haskell or C; the point is that all programming languages and environments have their "gotcha!" moments.

[0] https://en.wikipedia.org/wiki/Timing_attack

-----


http://hackage.haskell.org/package/securemem implements constant-time bytestring comparisons for Haskell, with the bonus that the data is scrubbed when GC'd. This is used by parts of the Haskell TLS stack, for example http://hackage.haskell.org/package/cipher-aes
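
The idea behind a constant-time comparison is simple enough to sketch (constEq is a hypothetical function, not securemem's actual code):

  import qualified Data.ByteString as B
  import Data.Bits (xor, (.|.))
  import Data.List (foldl')

  -- XOR every byte pair and OR the results together, so the comparison
  -- never exits early at the first mismatch
  constEq :: B.ByteString -> B.ByteString -> Bool
  constEq a b =
    B.length a == B.length b
      && foldl' (.|.) 0 (B.zipWith xor a b) == 0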

I don't think that laziness makes things much harder. Compare a C function with "if foo return 0" in the middle and a Haskell function that lazily builds thunks that don't end up being evaluated in a particular branch. There are obvious things to do to both to make them run in constant time. The hard part about dealing with a timing attack in both cases is recognising that the function is used in such a context and needs to be constant-time.

AFAIK, the Haskell TLS stack has not yet been audited. For this reason, there's some reluctance to use it in parts of the Haskell community. Although http-conduit and http-client-tls do use it, so everyone writing programs that hit https from Haskell using conduit is using it already.

-----


> http://hackage.haskell.org/package/securemem implements constant-time bytestring comparisons for haskell. With the bonus that the data is scrubbed when GC'd.

As far as I understand, this is not sufficient.

Some things that are needed in addition are (at least):

* constant time math operations (multiply, divide, exponentiation, etc.)

* no branching based on secrets (running the code results in no branch conditions based on secret data)

Here's an interesting article on implementing a form of Elliptic Curve Cryptography without even local timing attacks: http://blog.cr.yp.to/20140323-ecdsa.html

-----


Why can't timing be handled external to the scrambling or descrambling operations? Look at the time, do the crypto, then check the time and sleep as necessary so that this particular operation takes about as much time as the worst case. Something like:

  cryptofunc x = do
    begintime <- gettime
    result <- purecryptofunc x
    endtime <- gettime
    sleep (worstcase - (endtime - begintime))
    return result

-----


Properly implemented crypto takes a constant number of instructions to encrypt/decrypt, regardless of key, plaintext, etc.

Time to sleep() would vary across processors, OSes, and even load (!), so it's not exactly a workable solution. Not to mention it would be grossly inefficient (a context switch on every crypto primitive? No thanks).

-----


This still would allow side-channel attacks on CPU utilization. CPU utilization can be inferred, for example, if the attacker has a VM on the same machine, or if the attacker can make a second request which completes sooner when the CPU is less utilized.

-----


While not a completely terrible idea, sleep doesn't give you any guarantee of when it'll wake your code up again, only that it'll be asleep for at least X time. I think this is OK, but someone with more info on these sorts of bugs would be better placed to comment.

It also seems reasonable to add a random length of time to the sleep time, so that differences in computation time are indistinguishable from the randomness added by the random variable. I could be completely wrong though.

(Also, you probably need to force the result of purecryptofunc with seq or deepseq before getting the second time, or you might be adding that evaluation time to another part of the code, and then this code becomes just a constant time added to the computation.)

-----


> The AES implementation uses AES-NI when available (on x86 and x86-64 architecture), but fallback gracefully to a software C implementation.

> The software implementation uses S-Boxes, which might suffer for cache timing issues. However do notes that most other known software implementations, including very popular one (openssl, gnutls) also uses similar implementation.

Oh, well if OpenSSL does it..

-----


Downvotes for quoting from the pages linked, which are defending their use of non-constant time operations by referencing OpenSSL. Do you guys have working brains?

-----


Perhaps it would be more interesting to discuss how the author of the Haskell TLS/SSL library gets around timing attacks. Actually I have him on twitter... paging @vincenthz.

Timing attacks were the first thing I thought of when I saw this implementation, so this should be interesting if he can respond.

-----


(author here)

Getting around timing attacks is not too different in Haskell from doing the same thing in C. First, there are two different classes of timing issues:

1) cryptographic ones

2) and the others (for the lack of better classification)

For cryptographic ones, it's hard to enumerate all the counter measures used/not used, but some examples:

- use of blinding with RSA

- with GHC 7.8, exponentiation uses GMP expmod_safe

- AES uses native instructions when possible

- Use of scrubbed memory/memory constant where possible

- some lowlevel cryptographic implementations are actually in C (hash, aes, ..)

Also, while I'm linking against a specific set of cryptographic implementations by default, there's nothing preventing anyone from pulling different crypto implementations from other places.

For the other issues, it's usually just a matter of strictness, and of not bailing/notifying errors too early. (e.g. this is what Bleichenbacher and CRIME are all about.)

- data comparison is constant-time, comparing all bytes where necessary: prevents padding attacks

- there are routines to evaluate boolean values without bailing too early (False && ... would bail at the first False)

- lots of parts in tls use strict bytestrings and strict values.

- the GC adds a lot of noise that potentially hides timing issues, potentially exacerbated by pure values instead of mutable values (like you would use in C)

Also, I don't want to claim there are no issues or ever will be. Hopefully there are no such things, but I welcome any audits and questions about stuff that looks fishy. But people shouldn't only look at the cryptographic side of security; OpenSSL has had so many issues with basic unchecked values, buffer underflows, and overflows that it's not even funny anymore. Just sayin'.

-----


How do you prevent the compiler optimizing constant-time operations into non-constant time?

-----


You would want to specify a version of GHC and optimization flags, then it would just involve code analysis of your output.

Likely this work would be necessary to "prove" (not in the strict sense) your assumptions about constant time anyway.

-----


GHC allows you to force fields to be strict rather than lazy.

http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/bang-...

-----


Strict fields are actually standard Haskell, http://www.haskell.org/onlinereport/haskell2010/haskellch4.h...

(The similarly looking BangPatterns extension is supported by GHC only.)
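
For anyone unfamiliar with the two, a quick sketch (Pair and sumStrict are made-up examples):

  -- strict fields are standard Haskell: the bangs force both components
  -- whenever a Pair is constructed, leaving no hidden thunks inside it
  data Pair = Pair !Int !Int

  -- BangPatterns is the GHC extension (the pragma goes at the top of the
  -- file): the bang forces the accumulator on every recursive call
  {-# LANGUAGE BangPatterns #-}

  sumStrict :: Int -> [Int] -> Int
  sumStrict !acc []     = acc
  sumStrict !acc (x:xs) = sumStrict (acc + x) xs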

-----


If I were handed a few million dollars and told to go create an SSL implementation, my approach would be to write the implementation in Haskell in a way that compiles down to LLVM or, possibly, assembler, which is then what the project would distribute. Haskell's type system can be used for all the juicy goodness being described in the other comments, but it would not be running in the Haskell runtime. See: http://hackage.haskell.org/package/atom which goes to C; in principle it's not hard to imagine then how you'd go to another target.

There are two reasons for this... first, a pure-Haskell solution locks it to Haskell only, as there's very little realistic chance of embedding Haskell in any other environment. Having it compile to something like straight assembler or LLVM means that we can almost certainly wrap that in almost any runtime environment effectively. (We might require a bit of new SWIG-like tech to do it, but under the circumstances it would probably be worth it.) To the extent that's not a true statement, imagine that I will research and find the correct level of abstraction. (Maybe it really is C.)

Second, with all due respect to the Haskell runtimes in the world, it's just too big a surface to secure. People are asserting here in this thread that you could avoid side-channel attacks in the Haskell runtime, but how would you prove that? To any degree?

Theoretically, you might be even better off in a real proof language, but it's not clear to me that any of them have a good enough extraction-to-LLVM/assembler story.

This would, in theory, produce a solution that should be embeddable in nearly any other language, and indeed, even in embedded environments, while still leveraging the strengths of Haskell's expressiveness and type safety to build a very safety-critical library. We would also be able to write some powerful tests to verify that the putative time-invariant operations really are time-invariant. My strong suspicion is that we'd discover some processor differences here that would surprise everybody.

-----


Exactly the first thing I thought. There are plenty of side-channels in cryptographic code that require access to somewhat low-level hardware to fix (e.g. the acoustic attack on GPG that got plenty of press), and I don't see how you can address those without e.g. inline C or something else that negates the advantages of using Haskell.

-----


FWIW inline C doesn't negate the advantages of using Haskell. It's inline.

-----


The first thing I searched for in the README was "constant time".

This whole thing has got me curious about techniques for formally verifying software.

I'm a complete beginner on this topic, but I assume there are tools to help write software that's formally verified / proven to be correct. Can they make any guarantees about side-channel aspects such as timing in these systems? Anyone care to give me some pointers on where to begin?

-----


This blog post isn't about writing automated tools to detect timing attacks in code, but it has a lot of stuff about how to write crypto code -- elliptic curve cryptography in particular -- that is impervious to timing attacks, very interesting if you ask me: http://blog.cr.yp.to/20140323-ecdsa.html

-----


http://scholar.google.co.uk/scholar?q=%22timing+attack%22+%2... turns up some papers that consider this. They don't seem to provide much by way of tooling though.

A glance over the modelling doesn't seem to preclude using a model checker like http://www.event-b.org/ for it though.

The challenge comes in when you try to translate. You write your model up and validate your approach; you then try to translate it into C (or whatever), and at that point, however certain you are of your design, you still can't be sure that the compiled software will work properly. I haven't seen a convincing solution to that particular problem (and I've done some research on the topic for three separate projects now).

-----


Maybe Rust will provide the right balance between the two.

-----


No programming language seems to provide primitives for constant-time algorithms.

At this point, gcc and clang should at least provide constant-time memcmp primitives; any custom solution not supported by the compiler is not portable and needs to be validated separately for each implementation, each platform, and each compiler version.

-----


http://golang.org/pkg/crypto/subtle/

-----


> Package subtle implements functions that are often useful in cryptographic code but require careful thought to use correctly.

-----


whoosh!

-----


Don't make shit up. It's very easy to write totally strict code in Haskell. It's even possible to write code that won't yield to the runtime system during a constant-time op.

-----


Possibly more interesting is a machine checked implementation.

http://www.mitls.org/wsgi

-----


From the website, it seems that they proved that their implementation is correct with respect to their formalization of the interfaces in the RFCs. That is, their implementation is logically correct.

However, this says nothing about whether or not the implementation is secure. They admit that they don't model time in their proofs, so I doubt their implementation is free of timing attacks. Moreover, it's written in F#, so you have to trust your CLI implementation to be bug-free as well.

-----


> its written in F#, so you have to trust your CLI implementation to be bug-free as well

Is that any different to an implementation in C relying on the processor being bug-free?

-----


If you count up all of the possible states a processor can be in, as well as the number of transitions between them, you'll come up with a very big number of possible execution paths you'll need to verify work correctly. However, if you do the same for the CLI (or any non-trivial piece of software), you'll come up with a much, much, MUCH bigger number. As in, each additional state the CLI can enter potentially doubles the number of possible execution paths you'll need to check.

The number of states and transitions for a processor today is large, but not so large that engineers can't formally and automatically verify that the processor will behave correctly under all inputs. Also, the structure of the processor and the way it is specified (i.e. Verilog) make it amenable to formal verification.

This is not true for most software, not even things written in Haskell. You can cover a lot of cases with automated software testing, but you'll find that it's very, very, VERY hard to prove that you've covered every possible case. Even if you can, modeling multiple instances of the system as they evolve in time (i.e. any networked system or interactive system) means you have to consider all possible combinations of states they can be in.

To put into perspective how hard formal verification of software is, I have a story. A friend of mine did his masters thesis on modifying TCP to allow for host mobility, and formally proving the correctness of his new TCP protocol. Despite having a 100-node cluster of beefy (48GB RAM) compute nodes at his disposal, it simply didn't have enough total RAM to verify the correctness of his protocol beyond five rounds of communication between one client and one server.

Unless the CLI developers add machine-checked but hand-crafted proofs of correctness for each and every method, I trust the processor to be bug-free far more than any piece of software it runs. Now of course, if the NSA tampers with either, then all bets are off :P

-----


The CLI is running on a processor, isn't it? I guess your argument is that software using the CLI is potentially affected by more bugs ;)

-----


Very interesting. I was thinking something similar could be done in Haskell.

-----


Haskell is not a theorem prover, and I doubt it can be made one. You can encode some properties of data via the type system, but it's still a general purpose language. Agda on the other hand is a theorem prover, but much less general purpose.

-----


This seems interesting.

I'm completely ignorant about Haskell. I see there's some code in a "Benchmarks" folder; I think it would be highly interesting to see a comparison in speed between OpenSSL's SSL implementation and this one (the operations that a web server would normally have to do).

Can anyone make that happen? I can't even figure out how to execute Haskell code in Ubuntu 13.04.

Seems to me like if the code base is 20 times smaller than OpenSSL, and we can assess whether timing attacks are present or not -- and if they are, replace the timing critical code with C code, perhaps -- that this would be a real alternative to OpenSSL. Am I being unrealistic in thinking this? Not that everyone will adopt it, mind you, but that adopting it would be a wise thing to do?

-----


    sudo apt-get install haskell-platform
You can install development packages through the "cabal" command line tool. The REPL environment is "ghci".
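
Once that's installed, running a file is a one-liner; a trivial smoke test:

  $ echo 'main = putStrLn "hello"' > Hello.hs
  $ runghc Hello.hs
  hello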

On one hand, it is a very high performance language with tons of purity, abstraction, and invariants built in to the type system and semantics. On the other hand, certain aspects can be problematic to reason about.

-----


    E: Unable to locate package haskell-platform
The package seems unavailable for Ubuntu 13.04. I think it's time to install 14.04 for me.

-----


http://www.ubuntuask.com/q/answers-haskell-platform-on-13-04...

-----


The big difficulty in adopting it in any current codebase is that most current pieces of software are not written in Haskell. The huge advantage of projects in C like OpenSSL is that you can produce platform-native shared libraries that can be used easily from any other language.

A bit of research shows that it's possible to make shared libraries with Haskell, but not quite as straightforward to use them. I have no practical experience of how troublesome it would actually be.

-----


If you're curious (like I was) if anything else in the Haskell ecosystem is using this, this page lists packages that have dependencies on tls in Hackage (the Haskell package repository). There are 26 packages that depend on tls.

http://packdeps.haskellers.com/reverse/tls

Meanwhile, HsOpenSSL (Haskell bindings for OpenSSL) has 22 packages depending on it:

http://packdeps.haskellers.com/reverse/HsOpenSSL

-----


Nice and everything, but I somehow cannot imagine people massively jumping over to it. Maybe it's superstition, I dunno…

On the other hand, I undoubtedly agree that we should start making and deploying alternatives in safer, modern languages. In fact, I guess we should start rewriting, step by step, everything that's written in C/C++, and OpenSSL is a good thing to start with.

I guess it's a good chance for Rust & friends.

-----


Crypto poses a unique challenge with the threat of timing attacks. A major benefit of using low-level languages is that it is easier to ensure that your code takes constant time regardless of the input.

-----


So what candidates are there? While C may make it possible to write code that avoids timing attacks, a quick glance at [1] shows one must avoid memcmp and array lookups based on secret data, neither of which would be prevented by the compiler, and both are probably mistakes about as easy to make as Heartbleed.

Can Rust compile to a shared lib the programs can use as easily as a C library? (I think the answer is "not yet", but I'm not sure? At least, I've seen projects for compiling Rust without any runtime library).

Today I learned this is possible with Ada [2], but it seems there is some boilerplate to start/stop the runtime. Does Ada provide the necessary low-level control to implement crypto?

Are there sufficiently strict dialects of C?

NaCl [1] says "There are an increasing number of cases where the C implementations and assembly-language implementations are automatically generated from code that was actually written in another language, such as CAO or qhasm."

CAO is a "a domain specific language for describing cryptographic software" [3], while qhasm [4] is a portable assembly seemingly not focused on safety.

[1] http://nacl.cr.yp.to/internals.html

[2] http://gcc.gnu.org/onlinedocs/gcc-3.4.3/gnat_ugn_unw/Creatin...

[3] [PDF] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.113...

[4] http://cr.yp.to/qhasm.html

-----


Ada allows much more precise and standardised control of low level details than C or C++ do. You can, in a standardised way, define a record (struct) where the fields are a signed integer taking up 3 bits, followed by a 1 bit bool and then a 4 bit signed integer, followed by two 16 bit unsigned integers, and have them all packed in the order you've specified (including whether it is big or little endian), and only allowed to be used in the specific memory locations you've selected (think memory mapped control registers on a MCU). This is all part of the language spec. Not only that, if you want you can have a pointer (or access type) to that single bit bool if you need to (so obviously access types are more powerful than pointers because almost anything can be addressed, including things on remote machines if memory serves).

You also get the powerful feature of pre and post conditions being part of the language from Ada 2012 onwards, meaning you can even more precisely control what's happening. You can have types for potentially dangerous data and safe data that cannot be intermixed without being very explicit about it (something that would've avoided this bug if I understand it correctly).

I've said it in other threads, and it's worth saying again (as someone whose primary language is Haskell): Ada is absolutely the right choice for this sort of programming. It's just as low-level as C, if not more so, with all the high-level constructs of C++ (well, most anyway); it encourages writing safe code by making the safe easy and the unsafe hard, and has excellent open source tools backed by commercial compiler writers (AdaCore).

-----


You could also probably go a little further for the core of the library and use the SPARK subset of Ada. It is almost as if it were made for the purpose: http://www.adacore.com/sparkpro/language-toolset http://en.wikipedia.org/wiki/SPARK_(programming_language)

-----


> Can Rust compile to a shared lib the programs can use as easily as a C library?

Yes it can! You use the `#[no_std]` attribute if you wish to be rid of the runtime, and `#[crate_type = "lib"]` for outputting shared libraries (or use the `--crate-type` compiler flag).

Here's an older blog post on it, though the rust code there probably won't compile as-is with current Rust: http://bluishcoder.co.nz/2013/08/08/linking_and_calling_rust...

-----


To be clear, #[no_std] only "removes" the runtime by ensuring you don't accidentally use any functions that need the runtime. Something like

  #![crate_id="basic_lib"]
  #![crate_type="dylib"] // for a .so

  /* note there's no `no_std` */

  #[no_mangle]
  pub extern "C" fn add_in_rust(x: i32, y: i32) -> i32 {
      x + y
  }
Compiling that will give a `libbasic_lib....so` which can be linked against by any C program, with the `add_in_rust` function callable without the Rust code needing to touch a runtime. (I imagine that one may have to add some extra linker flags.)

There are many functions/types in libstd etc. that can be used without a runtime (although its still not as great as it will be in future).

-----


Unfortunately, this means giving up the powerful Rust standard library and threads. Fortunately, third-party devs are creating a "zero-runtime" standard library that can be used in situations like this, and in embedded programming:

https://github.com/thestinger/rust-core

Right now, rust-core only provides basic collections and io, but more may be in the works. I suspect rust-core will eventually grow into a small but capable standard library.

-----


rust-core has been coasting along in maintenance mode for quite a while now, because it was only intended as a temporary stopgap. Nobody really wants to fragment the community with a third-party standard library.

Instead, the Rust devs intend to restructure the standard library such that it has multiple profiles, and that you can jettison parts of it (such as the parts that require the runtime) without losing all of the other goodies. Here's the most recent RFC for doing so, filed yesterday: https://github.com/rust-lang/rfcs/pull/40/files

-----


Oh, very cool. I was hoping something like this would emerge.

-----


> qhasm [4] is a portable assembly seemingly not focused on safety.

When you start to worry about compiler bugs, hand-written macro assembly isn't such a crazy idea for algorithms like SHA, MD5, RC4, and AES. It is also easier to reason about timing in assembly. The truly paranoid would have to check compiler output anyway.

-----


The truly paranoid worry that the processor will not execute instructions as they think it will. This ranges from timing irregularities due to branch prediction and instruction reordering, to bugs in CPU microcode, to deliberate backdoors in the design, or even the fab altering the mask to change the design. https://eprint.iacr.org/2007/039.pdf http://www.cl.cam.ac.uk/~sps32/microsemi_re.pdf

-----


With the downside being that things like Heartbleed become possible. Dealing with timing attacks in high-level code is not necessarily hard -- Haskell, for example, supports strict evaluation where you need it.

-----


I may be wrong, but I don't really think it's that big an issue. First off, "low level" isn't exactly synonymous with "efficient machine code"; I guess we can well expect languages like Rust to become both efficient and predictable after a while. In fact, every well-designed, non-lazy, statically typed language implementation makes it very much possible to write code that compiles in a pretty predictable way, so you can control execution time as well.

And last, but not least, timing attacks aren't the chief vulnerability, really. If you expect a timing attack to be a real hazard, you can randomize the return time explicitly.
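
To sketch that last suggestion concretely (`withRandomDelay` is a made-up name, and note that random padding only adds noise a patient attacker can average away, so treat it as a mitigation rather than a fix):

  import Control.Concurrent (threadDelay)
  import System.Random (randomRIO)  -- needs the `random` package

  -- Pad the response with a random delay; this blurs, but does not
  -- remove, the timing signal.
  withRandomDelay :: IO a -> IO a
  withRandomDelay action = do
      r <- action
      pause <- randomRIO (0, 1000)  -- microseconds of padding
      threadDelay pause
      return r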

-----


The problem is that a memory-unsafe language leaves gaping holes. Protection against timing attacks doesn't help me if one can read my private keys (remotely!) with a simple Python script. Timing attacks at least require colocation in most cases and are much harder to implement and run for meaningful results. It's certainly worth thinking about how we can put memory problems to death in at least crypto code, and imho C has proven to be a permanent problem in that regard.

I am still an (interested) Haskell beginner, but whenever this discussion comes up the first thing I think is "why not try this in Haskell?" -- and it never crosses my mind to do it in Java, which is my everyday language (and reasonably memory safe).

-----


"Timing attacks at least require colocation in most cases and are much harder to implement and run for meaningful results"

Not true. It's long been known that timing attacks over a LAN are practical (replace LAN with intra-datacenter network). Relevant paper from 2003: https://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf

-----


Isn't it then marvellous that Rust is (also) a low-level language?

-----


Why rewrite C and C++ parts? Just because someone doesn't program properly in one language doesn't mean you need to rewrite everything written in that language. Bit like throwing the baby out with the bathwater.

-----


Because this type of mistake is easy to make in C and other programming languages make this class of bugs impossible. By moving the need for careful design from library to language implementation you get greater leverage in preventing bugs.

-----


>Just because someone doesn't program properly

It isn't just accidental errors you need to worry about. Obfuscated intentional errors are a problem too. If the language design prevents bugs that an NSA mole would try to slip in, then that is a very good reason to rewrite in a new language.

-----


The response to a subtle weakness in cryptographic software should not be to reimplement the cryptographic implementation from scratch. This inevitably introduces far more problems than it solves.

-----


Heartbleed is not exactly a "subtle weakness"; it is a gaping hole -- and it is a gaping hole that is only possible in languages with raw pointers and unchecked pointer arithmetic. In other words, using better languages for the next generation of crypto implementations will prevent this kind of thing from happening, and then we can all get back to worrying about subtle weaknesses.

-----


You're not wrong, but this clearly wasn't written overnight in response to the news. There are almost a thousand commits in this repo.

-----


I'd rather say that the long list of overflow-related vulnerabilities is a good argument for ensuring that all crypto libraries are implemented in some kind of strict language that disallows a large class of vulnerabilities, even if it costs us some time and effort.

Seriously, in most languages bugs cause unintended behavior within the bounds of the implemented logic - such as the recent logic bug that returned an 'okay' response even if the certificate was invalid. The fact that ordinary bugs tend to have effects like "read arbitrary memory" or "execute arbitrary code" is a very specific niche of C and friends.

In a language appropriate for security, a bug in the heartbeat extension would break the heartbeat extension - maybe make heartbeats not work, maybe cause different heartbeat behavior - but it would be unable to affect the rest of the application. In a language appropriate for security, if a single module is secure (i.e., the limited number of API/method calls it exposes is secure), then bugs in all other modules shouldn't be able to affect the insides of that module; so if your app has a tiny core that does some signing without exposing the key secrets, then the rest of the app can't touch it even deliberately, much less by an accidental bug.
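
A minimal sketch of that "tiny core" idea (the `Signer` module is hypothetical, and the "signature" is a placeholder, not a real MAC - the point is the type boundary, not the crypto):

  module Signer (SigningKey, newKey, sign) where

  import Data.Bits (xor)
  import qualified Data.ByteString as BS

  -- The constructor is deliberately left out of the export list,
  -- so no other module can reach the raw key bytes.
  newtype SigningKey = SigningKey BS.ByteString

  newKey :: BS.ByteString -> SigningKey
  newKey = SigningKey

  -- Placeholder "signature"; a real library would use an HMAC here.
  sign :: SigningKey -> BS.ByteString -> BS.ByteString
  sign (SigningKey k) msg = BS.pack (BS.zipWith xor k msg)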

-----


I don't think this is a response to a recently-disclosed weakness in OpenSSL. The project has been active for almost four years.

Minor edit: almost four years

-----


This inevitably introduces far more problems than it solves.

Could you elaborate more on this? I fail to see how this is inevitable.

-----


Haskell is a particularly odd choice for a project like this, as lazy evaluation makes it difficult to reason about the runtime of various operations under various conditions.

Something with a bit more ML in its blood would be better, and for a library that pretty much means OCaml.

Edit: I do not at all mean to imply that it is impossible to implement TLS securely in Haskell. Only that there are more natural choices if one wants the advantage of a strong algebraic type system.

-----


Lazy evaluation is the default in Haskell, but that doesn't mean strict evaluation is unavailable.

Here's an example of a type definition that specifies strict evaluation https://github.com/vincenthz/hs-tls/blob/master/core/Network...
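
The gist, as a minimal sketch (a made-up record, not the actual one in that file): a `!` before a field's type forces that field to be evaluated as soon as the constructor is applied, so no thunk hides behind it.

  data CipherParams = CipherParams
      { paramKeySize   :: !Int
      , paramBlockSize :: !Int
      } deriving Show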

See? They don't have to switch languages after all.

-----


This is an excellent point, and I didn't mean to imply that it's impossible to implement TLS securely in Haskell (though my original comment did seem to; I plan to fix that shortly). Only that there are more natural choices that give the same primary advantage (a strong algebraic type system).

-----


Why don't you test the library before making assumptions about its runtime characteristics? How about doing the slightest bit of research on the creator?

-----


Simple(ish) example of using strictness annotations with folds:

http://www.haskell.org/haskellwiki/Foldr_Foldl_Foldl%27
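
The short version of what that page describes:

  import Data.List (foldl')

  -- foldl defers every (+) into a growing chain of thunks; foldl'
  -- forces the accumulator at each step and runs in constant space.
  sumLazy, sumStrict :: [Int] -> Int
  sumLazy   = foldl  (+) 0   -- can blow up on a large list
  sumStrict = foldl' (+) 0   -- strict accumulator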

-----


There is a TLS implementation in OCaml in progress, it still has a long way to go though: https://github.com/mirleft/ocaml-tls

-----


Could you elaborate on how lazy evaluation is problematic for such a project?

-----


In cryptography, for most operations, you need to be sure that the operation takes the same amount of time for all possible inputs. Otherwise, you leak information.

The classic example of this is checking string equality with strncmp(): this takes a different amount of time depending on how similar the strings are. If one string is secret and the other is controlled by an attacker, the attacker can use a clock and multiple attempts to discover the secret.

Obviously this particular one isn't relevant to SSL, but there are a number of other possibilities to worry about in most languages, most obviously short-circuiting operations like boolean AND/OR. Lazy evaluation makes every operation short-circuit, so you need to worry about this in every operation.

It can be done, but it's harder than it needs to be.
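
For reference, the usual fix looks something like this sketch (`constEq` is hypothetical; a length mismatch exits early, but that leaks only the lengths):

  import Data.Bits (xor, (.|.))
  import Data.List (foldl')
  import qualified Data.ByteString as BS

  -- XOR every byte pair and OR the results together, inspecting
  -- nothing until the very end, so there is no data-dependent
  -- early exit.
  constEq :: BS.ByteString -> BS.ByteString -> Bool
  constEq a b =
      BS.length a == BS.length b
          && foldl' (.|.) 0 (BS.zipWith xor a b) == 0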

-----


So we should prefer the approach that gives us trivial attacks revealing 64K straight out of Compton over the approach that may be slightly harder to defend against timing attacks?

-----


This is such an obvious false dichotomy that I'm sure most people will notice, but I'm pointing it out anyway.

We could use something that gives both advantages, like the OCaml implementation I already mentioned. Or we could take a hybrid approach, where something like Haskell generates C code that provably can't have buffer problems. Or we could statically verify that the library is written in a known-memory-safe subset of C++. Or we could use a language like Rust, which (once it's eventually complete) seems ideal for this sort of application.

-----


I think the "define a strict, branchless DSL" approach is the right one, if you're going with Haskell. Then use the type system to ensure that only that stuff can touch key data. No problems with laziness or timing attacks, if the core of that is implemented correctly.
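
A minimal sketch of what such a DSL could look like (all names hypothetical): the command type simply has no conditional constructor, so programs written in it cannot branch on secret data.

  {-# LANGUAGE GADTs #-}

  import Data.Bits (xor, (.&.), (.|.))
  import Data.Word (Word64)

  -- Every constructor maps to one fixed-time machine operation;
  -- there is deliberately no If/branch constructor.
  data Op where
      Lit :: Word64 -> Op
      Xor :: Op -> Op -> Op
      And :: Op -> Op -> Op
      Or  :: Op -> Op -> Op

  eval :: Op -> Word64
  eval (Lit w)   = w
  eval (Xor a b) = eval a `xor` eval b
  eval (And a b) = eval a .&. eval b
  eval (Or  a b) = eval a .|. eval b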

-----


Nope. This is a common troll whenever any Haskell project is mentioned. It's despicable FUD and just plain wrong. Haskell makes it very easy and lightweight to add strictness annotations to your data structures; far, far easier than it is to add laziness to a strict-by-default language.

-----


Not him, but I don't think the usual lazy-evaluation worries - building up a million thunks and GC hell - would be as much of a problem as the awkwardness of keeping operations from completing too fast and opening up timing side channels.

-----


TLS implementation in Go. http://golang.org/pkg/crypto/tls/

Go is probably better at this.

-----


According to Adam Langley, a Go contributor, it can still be side-channeled. https://twitter.com/agl__/status/453370970552532992

-----


s/Go contributor/primary author of the Golang TLS package/g

(Also, he's one of Google's point people on TLS.)

JFWIW.

-----


>Go is probably better at this.

Why? Go doesn't really offer a whole lot in terms of security, except for better managed memory. I'm not even sure you could reliably eliminate side channel attacks in Go.

-----


>Go doesn't really offer a whole lot in terms of security, except for better managed memory.

What makes it better? Haskell's GC is very advanced.

-----


>What makes it better?

Better than C, that is. Haskell is on a whole 'nother level of "what are memory bugs?"

-----


Of course you can have memory bugs in Haskell, but it's obvious that you're doing special memory stuff.

http://hackage.haskell.org/package/base-4.7.0.0/docs/Foreign...
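
A tiny sketch of what that "special memory stuff" looks like (the out-of-bounds offset is deliberate):

  import Data.Word (Word8)
  import Foreign.Marshal.Alloc (free, mallocBytes)
  import Foreign.Ptr (Ptr)
  import Foreign.Storable (peekByteOff)

  main :: IO ()
  main = do
      p <- mallocBytes 4 :: IO (Ptr Word8)
      -- Reading past the allocation is possible, but you had to
      -- reach for Foreign.* and IO to do it.
      b <- peekByteOff p 100 :: IO Word8
      print b
      free p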

-----


Unfortunately this library can only be used in Go, right?

-----


You could terminate connections with it and have a local socket to another process. As we have seen, putting SSL in another process is helpful for memory isolation...

-----


How bad are timing/side-channel attacks, really? I think that half of the people who talk about this are showing off. Some nerdy one-upmanship.

-----


>How bad are timing/side-channel attacks, really?

Bad. They are not something to blow off. They may seem "out there", but they are actually leveraged somewhat frequently. They are particularly dangerous in certain shared-hardware environments (e.g. VPS services like DigitalOcean, AWS, etc.), and against physical devices that are supposed to resist data extraction (like smart cards).

-----


When you're talking about something used to secure as much sensitive data for as many people as TLS/SSL, timing and side channel attacks are very important. Any little flaw can lead to information exposure such that people are put into danger.

-----


Extremely bad and practical : http://cr.yp.to/antiforgery/cachetiming-20050414.pdf

-----


Totally off topic, but programmers, I REALLY need your help... https://news.ycombinator.com/item?id=7559067

-----


and you can still write bugs in Haskell.

-----


I'll notify the Haskell people straight away, sir.

-----


With all due respect, I don't know that TLS/SSL implementation problems will be largely solved by changing programming languages.

-----


If changing languages lets you reduce code size 20-fold, it's very much worth it.

Reasoning about a smaller code base is nonlinearly easier. This is especially important for security software where letting your reasoning slip usually renders the whole thing useless.

-----


Is that really the case? In Java (my native language, so to speak) I quite regularly browse through the language library implementation. I find it quite easy to read and comprehend, and I almost always learn something new by doing so.

Not so much for Haskell. When I started with it, I similarly tried to learn by going through the language libraries. Even now that I have gained some more experience (I am still a beginner and only a casual user, though) I have a very hard time understanding what's going on. The code is so dense that I find it hard to follow.

-----


You have an easy time understanding the libraries of your native language, but a hard time understanding the libraries of a language in a completely different paradigm that you're a beginner at? You don't say.

Seriously, the Haskell libraries are understandable and informative - once you learn the language.

-----


That's exactly what I tried to convey: I used that tactic when I was a beginner at Java and had few problems with it. I used the same tactic when Generics came up, again without problems. A beginner can become a better programmer by reading the code. Now I am at the point where I can say "this shouldn't be implemented that way" - which is a completely different thing, and is grounded in Java being my native language.

I've had casual encounters with Haskell for two years now. Library code is still hard to grok. Don't get me wrong: I like Haskell very much and I am pretty sure it's a decent language to write security-critical code in, but I think this is still a problem for many beginners out there. I very rarely stumble on something and say: "hey, that's cool, I will use that in my code". But I admit that I give up after the third nested "lift-<whatever>" ;-).

I want the libraries help me understanding the language, not vice versa. But of course, that's not their primary goal.

-----


When you were a beginner at Java, did you by chance already know how to program in an imperative or OO manner? I find it highly unlikely that anyone unfamiliar with imperative/OO programming in general would get any benefit from reading Java libraries. If you want to get benefit from reading Haskell libraries, just like in Java, you need to learn how to program in the paradigm. Not just "have casual encounters" with it, not just reading things about it, but actually learn the thing.

A direct analogy to what you're saying: when I was studying French, I was able to read articles above my level and get a decently good idea of what they said. Now I'm studying Chinese, and I can't tell at all what the characters mean! This should not be surprising. French is very similar to English, whereas Chinese is very different. Of course picking things up in the latter will be more difficult. It would be absolutely absurd to blame the Chinese language for that.

-----


> Reasoning about a smaller code base is nonlinearly easier.

I agree.

The effect could be offset by having to reason about the implementation details of a more complex platform though.

-----


Every flaw we have seen in OpenSSL in the last year would definitely have been impossible in a managed language.

-----


Not this one: http://eprint.iacr.org/2014/161.pdf

-----


I think side-channel attacks are a valid concern, but in this case they are a red herring. One side-channel exploit versus, what, three remote code executions?

It sounds like lots of people (sockpuppet echo chamber) are saying that if the new language/runtime doesn't fix all possible flaws, we should stay with the broken-on-purpose OpenSSL codebase?

-----


> It sounds like lots of people (sockpuppet echo chamber) are saying if the new language/runtime doesn't fix all possible flaws we should stay with the broken-on-purpose OpenSSL codebase?

I think it's more likely they are saying that replacing one broken implementation with another broken implementation isn't worth the effort.

Sure, remote code execution and reading of arbitrary memory are a lot worse than an attacker figuring out a private key. But if an implementation of a crypto system is not able to conceal the private key, it is completely useless.

Seems to me like we may need to, first of all, redesign some algorithms to be easier to implement in constant time (like Ed25519 and symmetric encryption algorithms that avoid S-boxes), and secondly, develop a language that combines both memory safety and the ability to reason about whether an operation runs in constant time.

I'm not saying this is something we can just do - it's a huge undertaking. I'm saying that it seems to me this may be needed, unless we want to find out, 5 years down the line when everyone is using a Haskell TLS implementation, that attackers can retrieve the private key through timing attacks. That would be a fairly large effort put towards writing another TLS implementation that is broken by design.

-----


How huge? So huge we should stay on OpenSSL!

Here are all the current TLS implementations http://en.wikipedia.org/wiki/Comparison_of_TLS_implementatio...

There is room for one in a secure language.

Completely useless? Go ahead and throw the baby out with the bathwater. At this point OpenSSL is a net negative.

It really smells like you are spreading FUD to keep people on OpenSSL.

-----



