
Formal verification of Amazon's s2n SSL/TLS library - NathanCollins
https://galois.com/blog/2016/09/verifying-s2n-hmac-with-saw/
======
serge2k
Isn't verifying that the math is correct the really easy part of making sure
crypto code is safe?

~~~
NathanCollins
Do you mean as opposed to e.g. verifying the absence of timing attacks? While
I agree that verifying the absence of timing attacks is probably much harder than
what was done here, the difficult part of the s2n verification I linked to was
that we verified equivalence between _imperative_ C code and a _functional
mathematical_ specification.

~~~
guitarbill
Right, it says "convincing argument that the C implementation does the same
thing as the mathematical specification" and "Assuming that we didn’t
accidentally program the same “bug” into our Cryptol spec".

My understanding is that it's another way of white-box testing the code
against specified behaviour, except that using a (proven?) mathematical
specification for algorithms is probably easier than writing unit tests that
have to capture all edge cases. (In essence, it sounds like verification
software is probably set up to detect such edge cases, which I do think is a
good idea, because you only have to program such software once.)

~~~
NathanCollins
I don't think I understand what you mean by "white-box testing" here, but
perhaps it's helpful to clarify what I meant by "equivalence" above, and how
it relates to testing: what we did here was verify _input/output equivalence_
between the imperative C code and our functional mathematical spec in Cryptol,
for a range of key and input buffer sizes. This corresponds to testing _all_
inputs of those sizes, which is not possible to do by direct testing: e.g.,
for a 64-byte key and a 1000-byte message, the equivalence corresponds to
checking

8^(64 + 1000) =

772229093352564060021182203061704429810699485400692901921197
543030601797302324658889178066005708227773161814337173682980
065612522479316644103460638515687114933331680544961552375412
914711698479251875125441335427310394080188149008724146221306
402242642191159219745353079189135871713826154087180913177991
135554545843425504232155742364801022614341625532175948198587
539576566458760517446126909555225085347521013376171505426231
008775737688282539095967230536510936329489906183630574979494
541005574981802619546120394597788656899688609063922312837993
473534655739423794995816974687759952971465473538229880976237
137410666755636310464327792929854669852851716265627988045993
010404521026728809660275537200281773360887456757531693050082
473180078568595877659952113273156104380151800825339034988199
020562681928372626978536148813617979584497069978086989075685
756621893032191527888867820144068182725496496585643739551119
7590300209437142003442599950379602277911674788208191414992896

tests, which would take "forever" to verify by direct testing.

We did not prove any properties of our mathematical specification in Cryptol,
but the claim is that it's close enough to the official FIPS mathematical
specification for HMAC [1] that it's easy to believe that it's correct.
However, a group at Princeton has also verified HMAC in the past, and gone
further than us by not only proving that the imperative C code is input/output
equivalent to their mathematical spec in Coq, but also proving that their
mathematical spec has the security properties of a secure hash function [2].

[1] http://csrc.nist.gov/publications/fips/fips198-1/FIPS-198-1_final.pdf

[2] https://www.cs.princeton.edu/~appel/papers/verified-hmac.pdf

~~~
guitarbill
AFAIK, white-box testing is simply when you can look at the source code (as
opposed to black-box testing); for example, a unit test is a type of white-box
test.

What I was struggling to express is that in the mathematical notation, the
operations are well defined (right?); in C that's not necessarily the case. So
you could argue that if you were writing direct tests, you wouldn't need to
check all inputs; testing edge cases would do. And maybe that's true, but it's
practically impossible for complex algos, because how do you know which inputs
cause edge-case behaviour? So I was agreeing that this approach is probably
better than having some fallible human write test cases :) (better = more
thorough and reliable.) And although you'd have to make sure the same fallible
human hasn't put bugs in the mathematical spec, as you've said, that's
probably easier to check.

EDIT: Nevermind, I found part three about undefined behaviour. I had written:
_You seem to know loads about this, maybe you could say how undefined C
behaviour is handled when comparing against a spec? Is e.g. shift-past-
bitwidth simply forbidden? The only alternative I can think of is looking at
the disassembly on a certain platform and checking those instructions, which
sounds less than ideal._

~~~
NathanCollins
Thanks for clarifying "white-box testing".

Some comments:

* the operations in the mathematical spec are mostly well defined, but e.g. division by zero is not defined. However, the verification handles this by checking that all operations are well-defined on all possible inputs.

* yes, identifying the "edge cases" is not something you can do easily, and it's hard to make formal. In some sense, the fact that the non-edge-case inputs are treated in a uniform way is probably what allows the verification to succeed at all.

* a short summary of the answer you already found in the third blog post: what we actually verify is the LLVM assembly that Clang produces when compiling the C program. Much of the potentially undefined behavior in a C program is translated away by the compiler on the way to LLVM assembly. For any potential undefined behavior that remains in the LLVM assembly, the verification checks that it cannot happen at runtime.

