Hi, I'm the author of the blog post and the paper. I certainly do not argue that static typing obviates unit testing. I agree that both are valuable together. I did, however, find examples where individual unit tests could be deleted because they didn't test anything that wasn't covered by the static type system. In other words, static typing didn't eliminate the need for unit tests (for catching bugs), but it did reduce the number of tests needed (in real-world code).
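A hypothetical Python sketch (not one of the paper's actual examples) of the kind of test that becomes deletable: its only purpose is to guard against a wrong argument type, which a static type checker would reject before the program ever ran.

```python
# Hypothetical illustration, not taken from the paper: a unit test whose
# only job is to guard against a wrong argument type. In Python this
# test catches a real failure mode; under static typing the compiler
# rejects the bad call outright, so the test adds nothing.

def mean(values):
    """Average a non-empty list of numbers."""
    return sum(values) / len(values)

def test_rejects_non_numeric():
    # A static type checker would reject mean("abc") at compile time,
    # making this whole test redundant.
    try:
        mean("abc")
    except TypeError:
        return True  # the dynamic runtime caught it, as the test expects
    return False

def test_average():
    return mean([2, 4, 6]) == 4
```

Under static typing, `test_average` still earns its keep (it checks behaviour), while `test_rejects_non_numeric` checks only what the compiler already guarantees.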
So I think you are running on a false premise here, efarrer.
The false premise is that anyone ships bug-free code. It can be done, but it's stupidly expensive: formal methods, Z proofs, etc.
Unit testing and static typing are two different techniques for eliminating bugs.
The question that needs to be answered is which of the two is more effective per developer hour spent.
Unit tests do not catch all the bugs that static typing would catch, nor does static typing catch all the bugs that unit tests would catch. That's not a useful observation.
Nor is the moral here that you should be doing both; doing both is expensive and costs $$$.
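The point about non-overlapping bug classes can be sketched with a made-up example (illustrative only): a logic bug whose types are entirely correct, so a static checker such as mypy passes it, and only a unit test exposes it.

```python
# Hypothetical sketch: a logic bug that no type system catches but a
# unit test does. The annotation below satisfies a static checker such
# as mypy; only the assertion notices the wrong value.

def days_in_week() -> int:
    return 8  # deliberate logic bug: the type is right, the value is wrong

def unit_test_catches_it() -> bool:
    # Returns False, i.e. the test fails and exposes the bug that
    # static typing let through.
    return days_in_week() == 7
```

The mirror-image case (a type error a test suite never exercises) is exactly what the paper's Haskell translation was probing for.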
Hi, I'm the author of the blog post and the paper. Every bug that was caught by the Haskell transition was verified to exist in the original Python. If I couldn't reproduce it in the Python, I fixed my translation. More information on my methodology can be found in the paper. https://drive.google.com/file/d/0B5C1aVVb3qRONVhiNDBiNUw0am8...
Hi, I'm the author of the blog post and the paper. In the paper I provide a reference to such a claim by proponents of dynamic typing. From my paper:
"Because some error detection can be done by both unit testing and static type
checking, some proponents of dynamic type checking claim that static type checking is not needed [3]."
and the reference:
[3] J. Spolsky and B. Eckel, “Strong Typing vs. Strong Testing,” in The Best Software Writing I. Apress, pp. 67–77, 2005.
I do agree that development time is an important factor, and one that I didn't address simply because that would require a different type of experiment. I hope that researchers look into that. I also think that, in addition to development time, overall maintenance time should be considered. For example, it's plausible that dynamic languages are faster to develop in, but take longer to change because the code is harder to understand and changes aren't guided by the type system. It is also possible that dynamic languages are both faster and have lower TCO. To be clear, I have no idea which is faster or which has a lower TCO. I'd love to see some scientific evidence on development time and TCO.
> I do agree that development time is an important factor
In my experience, development time is much less of an important factor than management would like to think it is. I've seen companies burn customer goodwill by releasing a quickly-built, bug-ridden product too early, because they wanted to be first to market or whatever.
And I've also found that the companies that push for faster releases (without reducing scope, of course) end up shipping late anyway. If they'd accepted a later deadline in the first place, fewer coding mistakes would end up getting made along the way. (Stressed-out developers rushing toward an unachievable deadline make more mistakes than those whose time estimates are listened to.)
I know I don't represent all developers, but I do much better when I write in a "slower" language when that language's compiler verifies more things before we get to the point of running the code. The end product is more stable and more correct than what it'd be otherwise, and I don't think I deliver slower in a way that's significant to the business.
"If your unit tests provide good code coverage, don't feel too paranoid about giving up compile-time type checking." (pp 69)
"The only guarantee of correctness [...] is whether it passes all tests which define the correctness of your program." (pp 75)
"To claim that static type checking constraints in C++, Java, or C# will prevent you from writing broken programs is clearly an illusion" (pp 76)
The closest I could find to your quote was at the end: "a dynamically typed language could be much more productive but create programs that are just as robust as those written in statically typed languages." (pp 77) In the context of the article (see previous quote), this is presumably referring to the goal of "defining the correctness of your program", and the article in fact makes the point which I made reference to, which is that incorrect inputs and API usage could well be out of scope for this goal (and perhaps were in the programs you tested). Or, to put it another way, it doesn't make sense to talk about robustness unless you also give a context.
The worst I could say about the article is that it is kind of tautological, because it's easy to no-true-Scotsman the concept of "sufficiently good" unit tests. But to be honest this also happens with type systems.
I realise there's a lot of ... let's say "enthusiasm" in this particular area, but I found the article quite a bit more measured than what was implied by the blog post.
I agree there are generally trade-offs; that's essentially the result of my "60 pages of pointlessness" paper. If you did have time to read the paper, you would notice a reference to this book http://my.safaribooksonline.com/book/software-engineering-an... which is an argument for unit testing instead of static typing. I think we need to use the scientific method in computer science and not just base our ideas on intuition, belief, or absolutes like "It's never so black and white".
It seems like your two criticisms are as follows:
1. The sample size is too small.
2. The quality of the Python code isn't that great.
I (the author) would like to address both of these. First of all, I completely agree that more research needs to be done. I mention this in the paper. I have provided a data point, not a proof. It took me a couple of months of several hours a day to do the translation; I hope more people translate more programs to see if the results hold in the face of a larger data set. Second, I agree the quality of the Python code isn't that great. I wanted to see whether unit testing obviated static typing in practice. In order to avoid selection bias I chose the projects at random: I picked the first four projects that were < 2000 lines of code and that had some sort of unit testing.
I believe that neither my methodology nor my conclusions are flawed, but all should remember that a single experiment does not constitute scientific proof. I hope that others will try to replicate this experiment on many more code bases. If they do, it will be interesting to see the results.
I (the author) appreciate the feedback. I believe that many of your criticisms are addressed in the actual paper. First of all I completely agree that my sample size is too small for a conclusive proof. I mention in the paper that I hope that others will try and replicate this experiment on other pieces of software. I do think it's appropriate when conducting an experiment to publish a conclusion, not that the experiment will constitute proof (or an established scientific theory), but as a conclusion to the study that others can try to confirm or refute.
I also mention in the paper that it would be beneficial to conduct this experiment using different type systems for the reasons that you stated above.
The argument against static typing that I was testing didn't mention any particular type system nor any particular dynamically typed language; it was a general argument that stated that unit testing obviated static typing. Because the argument was so general and absolute, I felt that any static type system that could be shown to expose bugs not caught by unit testing would be enough to refute the argument. I was not trying to prove that any type system would catch bugs missed by any unit-tested software. The paper also points out that I'm trying to see whether unit testing obviates static typing in practice; in theory you could implement a poor man's type checker as unit tests, but my experiment was focused on whether, in practice, unit testing obviates static typing.
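A minimal sketch (illustrative only, not from the paper) of that "poor man's type checker": type constraints hand-written as runtime assertions, re-implementing per call site what a static type system verifies once for every call site.

```python
# Illustrative only: the "poor man's type checker" written as ordinary
# runtime assertions. Every function that wants this guarantee must
# repeat the checks, and they only fire on inputs the tests happen
# to exercise.

def apply_discount(price, rate):
    # Manual checks standing in for compile-time verification.
    assert isinstance(price, (int, float)), "price must be numeric"
    assert isinstance(rate, float), "rate must be a float"
    return price * (1 - rate)

def test_rejects_bad_types():
    try:
        apply_discount("100", 0.1)  # wrong type for price
    except AssertionError:
        return True
    return False
```

The sketch also shows why this works only "in theory": the checks run at test time rather than compile time, so an unexercised call site with a bad type goes unnoticed.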
Finally I believe that my conclusion in the paper was at least a bit more modest than that of the blog post. The lack of apparent modesty in the blog post was caused more by a lack of ability on my part to accurately summarize than an inflated sense of accomplishment and self importance.
Thanks for the response! I appreciate the effort you went to here, this was no small task you set yourself to.
I appreciate the clarification. I think now I see better where your emphasis was: the purpose of the paper was to refute an argument, and of course the level of burden of proof is different and far less in that case. I think this misunderstanding on my part is what caused me to call the conclusions 'trivial' -- too strong and dismissive language on my part anyway.
The irony is, you were attempting to do to the unit-testing-is-sufficient argument what I was attempting to do to what I assumed yours was: provide one counter-example to falsify a broad and generalized thesis.
That said, I think I would have liked to have seen your original unit-testing-is-sufficient argument punched up and qualified into something a little more reasonable and real-world. As you stated the argument, it seems like a straw man to me. It seems one could reduce your version of the argument to something like: "Dynamic languages with unit test coverage will always catch errors that statically-typed environments will." And of course this is far too broad and unqualified a statement, and that is precisely why all you needed was one counter-example to refute it. You didn't even need a handful of Python programs, or 9 or 20 or 100 errors to prove your point. You only needed one, as you stated above. This is why the burden of proof for your thesis was so small, but also why, in my opinion, even with that reduced scope and more modest conclusion, we haven't really learned much.
As someone who has spent most of my career in statically-typed environments and the last 6 years or so mostly in dynamic environments, and also as someone who has made something like the argument you were attempting to refute, I have to say I would definitely never have made such a brittle and unqualified statement as the one you refuted in your paper. To put it more directly, I think I'm probably a poster-child for the kind of developer you were aiming your thesis at, and I don't feel that my perspective was adequately or reasonably represented. More importantly, having looked at the examples given in your paper, I may have learned a bit about the kinds of errors that Haskell can catch automatically that some coders might miss in a dynamic environment, but not much useful to me in my everyday work context.
I think a more reasonable version of the argument, but more qualified and therefore requiring a far larger sampling of code to prove or refute, would be something like: "Programs written in a dynamic language with adequate or near-100 percent unit test coverage are no more prone to defects than programs written in a statically-typed language with a comparable level of unit test coverage."
I agree this is a very important conversation to have, and again kudos to the work you put in here. Obviously people have strong opinions both directions, and the discussion, however heated at various moments, is an important one, so thanks for this!