

The Empirical Evidence That Types Affect Productivity and Correctness - danso
http://www.danluu.com/empirical-pl/

======
jerhinesmith
This was just posted a few days ago -- here's the discussion for anyone
interested:

[https://news.ycombinator.com/item?id=8594769](https://news.ycombinator.com/item?id=8594769)

------
jfoutz
That very first study, ANSI C vs. K&R C, seems like exactly the kind of test
that's needed.

Unfortunately, most languages are now heavily shaped by the typechecker. That
is, not many languages can flip the typechecker on or off with a switch, or
maintain parallel typechecked and untyped implementations.

That was one golden chance to get people familiar with the language to try
both variations.

And the effect of that? Slightly in favor of typechecked (he typed smugly).
But that's not really enough. It's not clear there's even a 5% bump in
productivity.

I like compile time typechecking. I like static guarantees. But there's just
not enough evidence to say typechecking is a must-have. And really, I'm not
sure it's possible to construct an environment to test that.

~~~
joseraul
We could try TypeScript vs JavaScript.

TypeScript's type annotations are optional, and its semantics are by design
exactly those of JavaScript.
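As a minimal illustration (my own sketch, not from the article): TypeScript erases its annotations at compile time, so the annotated and unannotated versions below behave identically at runtime, but only the annotated one gets checked.

```typescript
// Sketch: annotations are optional and erased at compile time.
// Both functions behave identically at runtime; only addTyped is checked.
function addLoose(a, b) {               // implicitly 'any' -- plain JavaScript
  return a + b;
}

function addTyped(a: number, b: number): number {
  return a + b;
}

// addTyped("1", "2") would be a compile-time error;
// addLoose("1", "2") silently concatenates, as in JavaScript.
```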

~~~
kybernetikos
I've come to the conclusion that when you can get your code running and
producing errors (JavaScript) in less time than it takes to compile and see
the type errors (Scala), the benefits of types seem a lot less persuasive.

Anything that adds an extra step to the workflow can diminish the benefit of
types, so if you were to do a TypeScript vs. JavaScript comparison, you'd have
to decide whether to include the extra time TypeScript adds to your iteration
flow, or add a fake build step for vanilla JavaScript that slows things down.
Maybe ES6 transpiled to ES5 vs. TypeScript would be a fair comparison.

------
Animats
That's a good paper. The studies that involve someone writing a small program
aren't too helpful, though. Typing is most useful when the type structure
won't fit in the programmer's head all at once, or when there are multiple
programmers and changes over time. Small scale experiments don't catch that,
and are more sensitive to how well the compiler reports errors.

Looking at code on Github is promising. Several people have done that, trying
to analyze changes. Of course, you only see the changes that get pushed to the
server, so you're only seeing post-test results.

We seem to be converging on a consensus on typing. Within a module, automatic
type inference makes life easier and programs less wordy. The newer hard-
compiled languages, Go and Rust, have that. Even C++ has that now, with
"auto". Program-wide type inference is hard and less helpful. Program-wide
inference also means the caller's types drive the callee, which is a mismatch
to the concept of libraries. So that hasn't caught on.
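A small sketch of that split (mine, using TypeScript for illustration): inference fills in types inside the function body, while the public signature stays explicit, so callers' types never flow backwards into the implementation.

```typescript
// Module-local inference, roughly what C++ 'auto' or Go's ':=' give you.
// The public signature is written out explicitly, so callers' types never
// flow backwards into the implementation -- the library-boundary point above.
function mean(xs: number[]): number {   // explicit at the boundary
  let total = 0;                        // inferred: number
  for (const x of xs) {                 // x inferred: number
    total += x;
  }
  return total / xs.length;             // inferred: number
}

const m = mean([1, 2, 3, 4]);           // m inferred: number
```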

Expressing size remains an issue. Is size part of a type? C and C++ struggle
with this and lose. Go and Rust make a distinction between arrays and slices,
which seems to be working.
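TypeScript happens to make the same distinction expressible, which may help illustrate the point (my sketch, not Go or Rust syntax):

```typescript
// "Is size part of the type?" -- a tuple type fixes the length, while an
// ordinary array type does not; roughly the array-vs-slice split above.
const fixed: [number, number, number] = [1, 2, 3];  // length 3 is in the type
const anyLength: number[] = [1, 2, 3];              // any length allowed

// const bad: [number, number, number] = [1, 2];    // compile error: too short
```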

Type attributes are still up in the air. Read-only/immutable is common. By
value or by ref is sometimes explicit and sometimes implicit. Whether
something can be nil is not common, but has been tried. Rust has the "borrow"
concept, which is promising but doesn't have much mileage on it yet.
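For what it's worth, TypeScript spells two of those attributes directly (a sketch of mine, not from the comment): `readonly` for immutability, and `| null` to make nullability part of the type under strict null checks.

```typescript
// Two of the "type attributes" above, as TypeScript spells them:
// 'readonly' marks immutability, and '| null' makes nullability explicit
// (with strictNullChecks, a plain 'User' can never be null).
interface User {
  readonly name: string;                   // read-only attribute
}

function greet(u: User | null): string {   // nullability is part of the type
  if (u === null) {
    return "hello, stranger";
  }
  // u.name = "x";  // would be a compile error: 'name' is read-only
  return "hello, " + u.name;
}
```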

There's been real progress in recent years. Until recently, type systems in
mainstream languages looked like either C or Pascal. Now they're getting
better.

------
bkirwi
Great to come across such an even-handed review; thanks!

One quick note:

> Other than cherry picking studies to confirm a long-held position, the most
> common response I’ve heard to these sorts of studies is that the effect
> isn’t quantifiable by a controlled experiment. However, I’ve yet to hear a
> specific reason that doesn’t also apply to any other field that empirically
> measures human behavior.

In practice, I'm not actually aware of any good studies on 'tools
productivity' in any other fields. (And it's not for lack of strong opinions
-- grab a professional photographer sometime and ask them about Canon vs.
Nikon.) I think it really is fundamentally difficult to measure things like
this: getting the most out of a tool can take years, and ecosystem factors can
outweigh even a large difference in 'inherent quality'.

If we care about using nice languages, it seems like the best thing we can do
is try and lower the impact of these external factors. I think the situation
here is actually pretty promising -- projects like LLVM take a lot of the
heavy lifting out of implementing a performant compiler, and the JVM and CLR
are becoming more and more hospitable to diverse language implementations.
There will always be a cost to choosing an unpopular language -- but the
smaller that cost is, the easier it will be for better technologies to win out
in practice.

~~~
chadzawistowski
I know the CLR was designed to support multiple languages from the very start,
whereas the JVM has historically been very tied to Java. What sort of recent
improvements have made the JVM 'more hospitable to diverse language
implementations'?

~~~
bkirwi
The big one is the recent addition of the 'invokedynamic' instruction, which has
dramatically improved the performance of dynamically typed languages.[0]
Mostly, though, it's just externalities from the slowly-and-steadily-improving
Java language; for example, the improved support for anonymous functions in
Java 8 also promises to make Scala faster.

[0]:
[https://docs.oracle.com/javase/7/docs/technotes/guides/vm/mu...](https://docs.oracle.com/javase/7/docs/technotes/guides/vm/multiple-language-support.html)

------
jcr
Most of the papers are available, but the first linked IEEE paper is
paywalled: "A controlled experiment to assess the benefits of procedure
argument type checking" by Prechelt, L. and Tichy, W.F.

But it's available here:

[http://page.mi.fu-berlin.de/prechelt/Biblio/tcheck_tse98.pdf](http://page.mi.fu-berlin.de/prechelt/Biblio/tcheck_tse98.pdf)

------
MrBuddyCasino
I remember Google made public some very interesting statistics about hard
disks and their modes of failure, which was unprecedented in scale.

Google has lots of code in C++, Java, Python, and JavaScript, and their
employees have all passed very rigorous assessments before hiring, so language
comparisons should be less skewed by individual variance in skill.

I would be surprised if they didn't already perform these sorts of analyses
internally.

------
davidgrenier
One big problem is that programmers tend not to write programs from scratch;
rather, they work within existing systems.

And even though I'm a static-typing proponent, I will concede that developer
productivity is likely to correlate a lot more with the state of your existing
systems.

~~~
bunderbunder
I'm also starting to believe that the devil is very much in the details.

Personal testimonials:

Objective-C is an uncouth mishmash of strong, weak, dynamic, static, and duck
typing, with different rules applying to different corners of the language. In
theory this should be a train wreck. In practice I found it to be one of the
most productive languages I've ever worked in - provided I was very principled
in how I built my code. Whenever I let myself slip into hack mode, the result
was indeed a train wreck.

F# has a mostly-fantastic type system that still manages to fall apart in a
couple of key areas: First, its type inference mechanism doesn't have any clue
what to do with calls to non-curried ("C#-style") methods, so any code that
interacts with the .NET API ends up littered with type annotations or
convenience functions whose only purpose is to wrap a .NET API call up in a
curried function that the type inferencer can understand. Second, no type
classes. This is particularly painful because .NET's original designers didn't
have the foresight to create a base abstraction that numeric types can conform
to.
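For readers unfamiliar with the currying complaint, the "convenience wrapper" pattern looks roughly like this (sketched in TypeScript rather than F#, purely as an analogy; in F# the motivation is keeping type inference working across pipelines):

```typescript
// Analogy sketch (TypeScript, not F#): wrapping an uncurried,
// "method-style" call in a curried function so it composes with
// pipeline-style code and partial application.
const replaceUncurried = (s: string, from: string, to: string): string =>
  s.split(from).join(to);

// Curried wrapper: arguments one at a time, data last.
const replace = (from: string) => (to: string) => (s: string) =>
  replaceUncurried(s, from, to);

const censor = replace("bad")("***");   // partially applied
```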

