

Programming Languages vs. Fat Fingers - petsos
http://www.spinellis.gr/blog/20121205/

======
kps
As a historical note, the Mercury period-vs-comma¹ bug probably wasn't a
_typo_ as such. In that era, programmers didn't type their programs; they
wrote them on paper, using special printed forms² so that their intent would
be clear to the professional typists who entered them. To prevent typos,
everything would be typed at least twice; in the days before diff, there was
special-purpose hardware for this.³

I can think of three possibilities for the programmer making such an error:
actually writing it wrong (which seems unlikely to me because the mental
context of writing a loop range doesn't lend itself to writing a single
number); a skipping pen; or a locale issue — i.e. a programmer of European
origin mixing up the characters.

¹ <http://catless.ncl.ac.uk/Risks/9.54.html#subj1.1>
² <http://en.wikipedia.org/wiki/File:FortranCodingForm.png>
³ <http://en.wikipedia.org/wiki/Keypunch#IBM_056_Card_Verifier>

------
Someone
I cannot find how they controlled for the wordiness of the different
languages. They changed one token in each file, but the number of tokens per
file might be different. For example, Python will likely be shorter than Java
due to its significant whitespace.

Also, the 'replace a single character in a token by noise' change may have
hugely different effects, not only because of differences in keywords
(begin…end vs. {…}) but also, and probably more so, because of average
variable and function name length. (For the languages tested this is a
cultural issue, but it would not surprise me if the effect were large: you
won't find 'FooFactory' in a perl program.)

~~~
ispolin
Using my complete lack of statistical knowledge, I multiplied the wrong-output
rate % by the total lines of code in the examples from the original paper
(<http://www.spinellis.gr/pubs/conf/2012-PLATEAU-Fuzzer/pub/html/fuzzer.html>)
to get a very bad approximation of fat fingering adjusted for program length.
You'd expect more typos in a longer program; the original experiment always
introduced one typo per run regardless of program length.

You guys enjoy while I prepare for the lynch mob of Statisticians :-)

    
    
        Lang     Err %   LOC   Err % × LOC
        Ruby     0.17    159    27.03
        Python   0.15    161    24.15
        Perl     0.22    156    34.32
        PHP      0.36    224    80.64
        JS       0.18    102    18.36
        Java     0.10    331    33.10
        Haskell  0.15    114    17.10
        C#       0.095   389    36.955
        C++      0.08    461    36.88
        C        0.10    458    45.80
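
The adjustment is just the wrong-output rate times the line count. A minimal
Python sketch of that arithmetic, with the figures copied from the table above:

```python
# Recompute the LOC-adjusted error rates: wrong-output % times the
# Rosetta Code line count for each language (figures from the table above).
rates = {
    "Ruby": (0.17, 159), "Python": (0.15, 161), "Perl": (0.22, 156),
    "PHP": (0.36, 224), "JS": (0.18, 102), "Java": (0.1, 331),
    "Haskell": (0.15, 114), "C#": (0.095, 389), "C++": (0.08, 461),
    "C": (0.1, 458),
}
adjusted = {lang: round(err * loc, 3) for lang, (err, loc) in rates.items()}

# Print languages from best (lowest adjusted score) to worst.
for lang, score in sorted(adjusted.items(), key=lambda kv: kv[1]):
    print(f"{lang:8}{score}")
```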

~~~
Someone
Looks like an improvement to my (completely unbiased, of course) eyes. Haskell
moves away from C++/Java, C moves away from them in the reverse direction, and
PHP moves into its own league.

The surprises, IMO, are JavaScript (I would place it close to PHP) and perl
(apparently, it is easy to come up with character sequences that are not valid
perl :-))

Thinking of ways to get a perfect language according to this metric: the way
to get there is to introduce lots of redundancies in the grammar. For example,
if one requires two exact copies of the same source before code compiles, any
single change will give compilation errors. However, programmers would build
tools to defeat such strategies.

Maybe one should scale for actual content, e.g. by weighting against the size
of the gzipped source code?
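
That normalization could look something like this rough Python sketch (the
stand-in sources here are illustrative, not the actual Rosetta Code
solutions):

```python
import gzip

def content_weight(source: str) -> int:
    """Proxy for 'actual content': size of the gzip-compressed source."""
    return len(gzip.compress(source.encode("utf-8")))

def scaled_error(err_pct: float, source: str) -> float:
    """Scale a wrong-output rate by compressed size instead of raw LOC."""
    return err_pct * content_weight(source)

# Toy stand-ins: a terse and a verbose program computing the same thing.
terse = "print(sum(range(10)))\n"
verbose = (
    "total = 0\n"
    "for i in range(10):\n"
    "    total = total + i\n"
    "print(total)\n"
)
print(content_weight(terse), content_weight(verbose))
```

Unlike raw LOC, the gzipped size discounts boilerplate that repeats
throughout a file, so verbose-but-repetitive languages are penalized less.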

~~~
ispolin
Yeah, look at the JavaScript LOC! Who wrote the rosetta code for those,
Brendan Eich?!

This hints at another way to optimize for this metric: make the language as
expressive as possible. Fewer characters should translate into fewer typos. Paul
Graham strikes again! (<http://www.paulgraham.com/power.html>)

As to your point about redundancy, I think the researchers are in agreement
with you on that one if you consider unit tests to be a sort of redundancy,
expressing the same concept in two different ways. They bring this up
repeatedly in their report.

Obligatory Perl jab: It surprised me that any of the Perl solutions used more
than one line. :-P

------
Too
So, assume that every inserted typo is in fact a serious error in the program,
and that the output comparison they used acts as a unit test. Then an
interesting number to look at is:

    
    
        Errors remaining = Successful runs - Faults caught by unit tests
    

In that regard, most languages in the study come out quite equal, although
with the static languages you most likely catch the errors much earlier and
probably get a much better hint of _where_ the error is, instead of just an
assertion that you have an error. And of course this assumes that you _do_
have unit tests.

------
lucian1900
This basically studies how impactful typos are for these various languages.
Well, typos are always trivial to find and fix.

It's more serious issues that I would want a type system to help me with (like
null references, concurrent access, etc.)

~~~
mdonahoe
A typo is trivial to find when you know there exists a typo.

The scary part of this is the percentage of programs that ran successfully but
produced the wrong output. Output errors can go unnoticed.

I don't know if I am scared enough to stop using python though.

------
ckakman
Nice article that shows some of the virtues of statically-typed languages.

~~~
Luyt
Although 'strongly typed' shouldn't be confused with 'dynamically typed'.

------
Tyr42
I find it funny that Java scored very similarly to Haskell, when otherwise the
languages are quite different.

~~~
dons
Java: lots of syntax to fuzz, all of which has to be right.

Haskell: few tokens, stronger checking (i.e. no coercions), though Rosetta
Code is not as type-heavy as real Haskell code.

I have a conjecture that Haskell examples designed by experts - with types in
mind, as we do in production systems - would have lower compile rates than the
Rosetta examples, which are written mostly by non-experts without regard to
maintainability.

In production Haskell code, I will usually wrap Double values in newtypes,
e.g. to distinguish currency amounts, percentages and ratios from each other,
specifically to guard against typos where I accidentally pass Double
parameters in the wrong order. Designing code with an intent to make it less
vulnerable to fat fingers is certainly possible.
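
A minimal sketch of that pattern (illustrative names, not from the study):
each flavour of Double gets its own newtype, so swapping two arguments
becomes a compile-time type error instead of a silently wrong result.

    
    
        -- Illustrative newtype wrappers; erased at compile time, so no
        -- runtime cost.
        newtype Amount  = Amount  Double deriving Show
        newtype Percent = Percent Double deriving Show
    
        applyDiscount :: Amount -> Percent -> Amount
        applyDiscount (Amount a) (Percent p) = Amount (a * (1 - p / 100))
    
        main :: IO ()
        main = print (applyDiscount (Amount 200) (Percent 10))
        -- applyDiscount (Percent 10) (Amount 200) is rejected by the
        -- type checker instead of computing the wrong number.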

------
fox91
I don't see any value or meaning in this "experiment". What should it
demonstrate?

~~~
JulianMorrison
Features such as automatic creation of mentioned variables and dynamic typing
allow a mistake in code to change a correct program into a syntactically valid
program that does the wrong thing.

~~~
pekk
Now show that this is a significant cause of costly errors in practice...

~~~
gordaco
I know my evidence is just anecdotal, but I see it a lot. Man, do I miss my
C++ and my Haskell when I have to use Python and javascript at work.

Most people seem to disagree with that happening "a lot", so maybe I'm working
with bad codebases or just being unlucky. Indeed, a controlled study would
yield more reliable information.

