

GnuTLS vulnerability: is unit testing a matter of language culture? - michaelfeathers
http://gehrcke.de/2014/03/gnutls-vulnerability-is-unit-testing-a-matter-of-language-culture/

======
michaelfeathers
Unit testing is definitely influenced by language culture but there's also the
issue of how easy it is to do.

JS doesn't have the unit testing uptake that Java/C#/Ruby/Python have, and I
think that's largely because JS is often used in the front end, where people
aren't used to having any sort of abstraction layer.

The issue with C is that it is hard to mock. You can use the preprocessor,
the linker, or function pointers. The last is the best option, but your code
ends up looking quasi-OO, with structs of function pointers. It's a style
people are used to in some OS programming, but it does freak some people out.

------
dalke
I once did a security audit of Fitnesse, perhaps the best known software
developed under the red-green-refactor cycle of TDD. I found a number of
security vulnerabilities. A path traversal vulnerability meant I could read an
arbitrary file on the server, including the password file. The password hash
encoding was weak, and given a hash I was able to find a password that
matched it. Once I had upload access to the server, another path traversal
vulnerability meant I could upload to an arbitrary spot on the server, so long
as the process had permissions to it. By design, the server can execute Java
code. Combined, this means an attacker could run arbitrary code on the server.

While the worst vulnerabilities have since been fixed, the server still
contains a CSRF vulnerability.

These all suggest that unit testing/TDD doesn't really add much to security
protection.

~~~
michaelfeathers
> These all suggest that unit testing/TDD doesn't really add much to security
> protection.

I find it hard to believe that anyone would find that surprising. Security is
something you either have in mind while you work or you don't.

~~~
dalke
The article is titled "GnuTLS vulnerability: is unit testing a matter of
language culture".

The coupling of vulnerability to unit testing is, IMO, a red herring. I've
never seen any evidence that unit testing leads to more secure systems than
any other commonly used method, including code review and coverage based
functional tests. My observation is that unit testing/TDD doesn't
significantly reduce security failures (i.e., it may be the same, or it may
be worse), and I use my Fitnesse pen test as a real-world example. You seem to
agree.

The article goes on to say "An automated test suite should have immediately
spotted that invalid commit, right." I just downloaded the package. GnuTLS
_has_ an automated test suite. There's nearly 22KLOC in the tests/
subdirectory, with 75KLOC in gl/, lib/, and src/ combined.

The rest of the article slags on some perceived lack of testing based on C
culture ("did you really, honestly, expect a C code base that reaches back
more than a decade to be under surveillance of ideal unit-tests, by modern
standards"), and implies that "ideal unit-tests, by modern standards" _would_
have caught the error. This is unjustified.

Moreover, we know what ideal non-unit tests from 10+ years ago look like. For
example, SQLite is the exemplar of how a C library without low-level unit
tests, coupled with strong code coverage, can produce high-quality code.

(To tie in with your other post: SQLite uses "structs of function pointers"
to provide a plug-in filesystem architecture for users, which also allows
emulation of unusual file system failures.)

If GnuTLS had the ideal tests of the standard of 10+ years ago, then these
bugs would still have been found, unit tests or not. As its authors point out
in the README, "Thorough testing is very important and expensive."

So, "unit tests" isn't really the issue, is it?

I would have been happier with an article titled "GnuTLS vulnerability: we are
all cheapskates".

~~~
michaelfeathers
> The article goes on to say "An automated test suite should have immediately
> spotted that invalid commit, right." I just downloaded the package. GnuTLS
> has an automated test suite. There's nearly 22KLOC in the tests/
> subdirectory, with 75KLOC in gl/ lib/ and src/ combined.

Did you see a test for the error? Maybe check to see whether there were tests
for other parts of the code that prevented vulnerabilities? I'm sure there are.

First and foremost, people need to be conscious of security to do well. No
amount of testing will overcome that. But, if you have security consciousness,
I'm sure that TDD can augment it. It's a practice that encourages reflection.

Re the article itself: fair enough, it uses a flawed example, but I can back
up the assertion. I see less unit testing in C codebases.

~~~
dalke
I don't understand your question. I think the answer is that I don't have the
domain knowledge to evaluate the GnuTLS tests, so I don't know what I'm
looking for. I tried to compile the package but I'm missing at least one
third-party package, and then decided it wasn't worth my time to dig more into
the code.

My point is that the article makes a false claim, and the veracity of that
claim is easy to check. That makes it hard for me to believe that what the
author proposes is actually true or meaningful.

You write that you have observed less unit testing in C code bases. It's hard
to know what to make of that observation. Is it a function strictly of
language culture, as the author suggests? Or is it more a function of age?
Perhaps C packages developed in the last 5 years have similar test
effectiveness as Ruby packages developed in the last 5 years. Or perhaps
something else is the key discriminator? While such an analysis is possible in
theory, it will be hard to standardize effectiveness across a large number of
packages.

I used the clumsy term "test effectiveness" there instead of "unit tests" for
a reason. As you write, unit testing is harder in C because of the difficulty
of making mocks. Personally, my views are much more aligned with that of James
Coplien. I believe, quoting his recent essay, "most unit testing is waste",
and "[y]ou'll probably get better return on your investment by automating
integration tests, bug regression tests, and system tests than by automating
unit tests."

Which means that I'm not really convinced that a metric based on the number
of unit tests is a persuasive indicator of security, software quality, or
other more operationally based goals. The metric should really cover all
tests, of which unit tests are only one part.

As you say, "if you have security consciousness, I'm sure that TDD can augment
it". My observation though is that any sort of testing can augment security
consciousness, so why specifically promote TDD, when TDD alone is often
insufficient and other tests (including some non-TDD unit tests) are needed?
(Fuzz tests are an example of a potentially useful non-unit test which can
specifically help some aspects of security.)

That is, TDD is a design method, not a testing method. It's a strict subset of
unit testing as a whole. TDD, at least in its red-green-refactor formalism,
doesn't have a spot for adding tests which are expected to pass. These tests
might come from, say, formal boundary-condition checking, algorithm complexity
analysis, or security tests, and serve as a validation that the algorithm as
implemented can handle the full range of inputs.

Of course, I picked the RGR formalism because it's obviously incomplete in the
first place. The list of refactorings includes "Substitute Algorithm". Even if
the new algorithm is cleaner, there can be different boundary cases than the
previous algorithm, so new tests may be needed in order to fully test the new
algorithm. But most, if not all, RGR descriptions say something like this
about the refactoring step: "Now that your tests are passing, you can make
changes _without worrying about breaking anything_." (Emphasis mine.) That's clearly
not universally true, which tells me that the RGR formalism can't be generally
correct.

Going back to Kent Beck's original Fibonacci walk-through, it's clear that the
final stack-based Fibonacci algorithm will take exponential time, and likely
never compute Fib(30). Even if memoization is added, to give the linear time
performance and avoid a stack overflow, it will silently overflow on output
values larger than max int. This example tells me that formal
boundary-condition checking is not at the heart of TDD.

It may be an add-on, but later examples, like Robert Martin's prime
factorization Kata, also fail to establish that the final algorithm works
across the range of acceptable inputs.

As you might infer, I have looked hard for a description of how to incorporate
security as part of the TDD process, and failed. There are people who suggest
using the unit test framework (post-development) for security tests, but test-
after isn't part of TDD.

Do you have any pointers to how to combine security and TDD?

Referring specifically to Fitnesse, I think it shows that "conscious of
security" is about as useful as saying "conscious of formal boundary-condition
checking." That is, the Fitnesse authors _have_ some security consciousness.
They chose a password hashing function specifically for better security. It's
just that they didn't know enough to choose a good one, and instead made one
which is easily broken.

This is really skill acquisition, which is more difficult than simple
consciousness. Security skills aren't important for 99+% of programming.
It's just that <1% of a 100KLOC program still leaves a pretty vulnerable
attack surface. (The math doesn't actually work that way, but the idea holds.)

