
37M Compilations: Investigating Novice Programming Mistakes [pdf] - gwern
https://kar.kent.ac.uk/46742/1/fp1187-altadmri.pdf
======
olejorgenb
Most of these errors are not really interesting. Almost all of them are simple
typos or syntax errors.

I get that these are the errors that are simple to analyze. Reference/value
confusion, for instance, would be more interesting, but I guess that's harder
to detect automatically.

~~~
achamayou
Aren't they? The fact that confusing = with == is the 4th most frequent mistake
is quite interesting, for example. It's possible to disallow assignments in
places where comparisons are most likely (expressions that are known to coerce
to a boolean); Python does this, with presumably significant productivity
benefits (much less time is spent dealing with that common mistake).

D (mixing up & and | with && and ||) can arguably be mitigated by
differentiating the bitwise operators more clearly from their logical
counterparts (for example, by spelling the logical operators in plain English).

Many more are better handled at the editor/IDE level, but this seems like a
really interesting read for anyone involved in PL design.

~~~
TheOtherHobbes
Someone in the Alto thread posted a reference to BCPL, and it was interesting
that the original symbols for the comparison operators were text: eq, ne, gt,
lt.

C broke with BCPL syntax, which was clear, memorable, and consistent, and
replaced it with a math-like syntax that must have wasted millions of
person-hours of debugging time in the decades since, for no good reason.

Similarly, & (address-of) and && (logical AND) are too close and too easy to
mistype.

Language design really should have considered human factors much more than it
did.

~~~
dsp1234
Interestingly, PowerShell uses -eq, -ne, -gt, -lt, etc. for its comparison
operators.

~~~
13of40
The historical reason is that the > and | symbols were already used for
redirection and piping, following shell conventions, so they used -gt and -bor.
All of the other operators followed for consistency. There was also a big
kerfuffle over statements like "$a = 1" or "$a++" not returning values the way
they do in C: if they did, then using them stand-alone would cause the host to
print the return value.

------
nchelluri
I took courses at two universities (I transferred), and in both cases you had
to compile the assignment source code in order to actually do the homework. I
think a compiler, especially javac, would catch almost all of these and warn
about some of the others.

Moreover, good syntax highlighting or an IDE with some static analysis built in
would help a lot too. I think that might be a useful thing to put in intro
programming classes. Eclipse is free, right? I use IDEA-based editors for most
of my work, but even the syntax highlighting in, say, Emacs without installing
any packages (at least on the OSes/distros I'm familiar with) would go a fair
distance toward this goal. I assume the same is true of vi.

------
PythonicAlpha
From a short look at the compilation, it shows one thing very accurately,
IMHO: C syntax is very error-prone.

To sad, that so many new programming languages chose to use exactly this
syntax that is so error prone.

C was a very good programming language, and its terse syntax might have some
appeal, but for learning to program this syntax is not the best option, unless
you use it as a kind of intellectual test to find the best computer people ...

~~~
nulbyte
English is also error-prone, as you have demonstrated—perhaps unwittingly—but
we still use it around the world. I would argue that this fact makes C syntax
a particularly good choice for one learning to program: Computers don't think
like humans, unless we program them to think like humans.

~~~
riwsky
Though in this case it is worth noting that English has enough redundancy that
you can both a) tell that "to bad" is wrong and b) know they meant "too bad",
which isn't always the case for = vs. ==.

~~~
syphilis2
Of course the desired phrase was "So sad". Or was it "Too sad"?

------
SCHiM
Error D would be particularly confusing, since it would still produce the
result the programmer intended (at least in my setup, where int converts
implicitly to bool).

"The right thing for the wrong reasons".
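
A small sketch of why mistake D so often goes unnoticed, assuming Java
semantics (the names are hypothetical): on boolean operands, & computes the
same value as && but always evaluates both sides, so the slip only surfaces
when the right-hand side has a side effect or depends on the left-hand side
being true.

```java
public class ShortCircuit {
    // Non-short-circuit `&`: both operands are always evaluated.
    static boolean nonEmptyEager(String s) {
        return s != null & s.length() > 0;
    }

    // Short-circuit `&&`: the right operand is skipped when the left is false.
    static boolean nonEmptyLazy(String s) {
        return s != null && s.length() > 0;
    }

    public static void main(String[] args) {
        // With side-effect-free operands, both give the same answer...
        System.out.println(nonEmptyEager("hi")); // true
        System.out.println(nonEmptyLazy("hi"));  // true
        // ...until the right operand relies on the left one being true.
        System.out.println(nonEmptyLazy(null));  // false
        try {
            nonEmptyEager(null); // s.length() is still evaluated
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from &");
        }
    }
}
```

In other words, "the right thing for the wrong reasons" holds only as long as
the second operand is safe to evaluate unconditionally.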

~~~
icefo
This shouldn't work. I sometimes spot those errors in my code, and I find them
fun to fix.

------
darioush
A different perspective: (1) The paper shows that humans quickly learn to avoid
simple syntax mistakes after they compile code and get an error message. These
messages often pinpoint the error location and suggest the fix, so this result
is hardly surprising (e.g., "Invalid token '}', did you forget ';'").

(2) The authors assume every type error is unintentional. That may not be
true. Consider switching from a String to a numeric type to represent a number
(e.g., a command-line argument). The switch may be made to catch errors up
front and to avoid parsing the number in multiple places. After the programmer
changes the type, the resulting type errors point to every one of those places.
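
A rough sketch of the refactoring described in (2), assuming the number
arrives as a command-line argument (all names here are hypothetical): the
string is parsed and validated once at the boundary, and everything downstream
takes an int, so any call site still passing a String becomes a compile error
that points straight at the code to update.

```java
public class ParseOnce {
    // Downstream code takes an int. If this parameter used to be a String
    // that was re-parsed here, changing its type makes javac flag every
    // caller that still passes a String: a deliberate type error.
    static int totalItems(int count, int perBatch) {
        return count * perBatch;
    }

    // Parse and validate the command-line argument once, at the boundary.
    static int parseCount(String arg) {
        int n = Integer.parseInt(arg.trim()); // NumberFormatException on bad input
        if (n < 0) {
            throw new IllegalArgumentException("count must be non-negative: " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        int count = parseCount(args.length > 0 ? args[0] : "3");
        System.out.println(totalItems(count, 4)); // prints 12 when no argument is given
    }
}
```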

------
adrianratnapala
Is it even interesting to worry about the mistakes novice programmers make?
Novice programmers are learning programmers.

What matters more are the mistakes that people continue to make even after
they are no longer novices.

~~~
enraged_camel
From the paper:

 _1. Introduction

Knowledge about students’ mistakes and the time taken to fix errors is useful
for many reasons. For example, Sadler et al [10] suggest that understanding
student misconceptions is important to educator efficacy. Knowing which
mistakes novices are likely to make or finding challenging informs the writing
of instructional materials, such as textbooks, and can help improve the design
and impact of beginner’s IDEs or other educational programming tools._

In other words, yes, understanding what types of errors novice programmers
make can be very interesting and useful.

~~~
adrianratnapala
Good.

I had my language-designer goggles on, but you and the paper are right that
educator goggles matter too.

------
EGreg
A lot of these would be solved by the following language:

[https://news.ycombinator.com/item?id=2044752](https://news.ycombinator.com/item?id=2044752)

