
Hundred Year Mistakes - joosters
https://ericlippert.com/2020/02/27/hundred-year-mistakes/
======
dahfizz
I once asked my college professor about operator precedence in C. He had been
writing C code in industry for decades.

"I have no idea" he told me.

He said many programmers try to make their code as short and pretty as
possible, as if there was some kind of character limit. Instead of falling
into that trap, he just used parentheses. And the order of operations never
was an issue.

~~~
hirundo
I agree with your prof, and have a peeve with linters that complain about
unnecessary parens. They may be unnecessary, but they can sure be helpful for
those of us who don't have the precedence tables fully embedded in our neural
code scanners.

~~~
Izkata
Even when I have no issues with the precedence order, I often add parens
anyway because it helps with mental grouping at a glance, instead of having to
scan and mentally parse the line.

~~~
cannam
A wise practice.

As a young egotist I would often omit parens in complicated C expressions. I
did this intentionally and in a very self-satisfied way - writing multi-line
conditionals and lining them up neatly without parens with a metaphorical
flourish of my pen.

Then one day, chasing a hard-to-find bug, I realised it had happened because
I'd mixed up the precedence of && and || in a long conditional. I was an
idiot. Since then I've made a point of reminding myself that I know nothing
and that there's nothing to be gained from pretending I do, and putting parens
in everywhere.

Sometimes, even now, I get those grandiose moments when I think the code I'm
writing cannot possibly go wrong. Those are the moments that call for a bit of
fresh air and an extra unit test or two.

~~~
codebje
I heard an aphorism once I've tried to live by since:

It is twice as hard to debug code as it is to write it. If you write code as
clever as you can make it, you will never be able to debug it.

~~~
samatman
Ironically enough, given the article, this is attributed to Brian Kernighan.

~~~
codebje
I'll try to remember the attribution as well as the aphorism in future, thanks
:-)

------
michaelt
_> The moral of the story is: The best time to make a breaking change that
involves updating existing code is now, [...] It’s fifty years since this
mistake was made_

Another possible moral is: The languages that stay in use for fifty years are
the ones that avoid making breaking changes. :)

~~~
ericlippert
Though I get the humour and agree with the sentiment, the history of C tells
us that a language can massively succeed even with constant "breaking
changes". One of the things I learned working at Coverity is that there is no
such language as C or C++; rather, there are hundreds of mutually incompatible
languages with those names. Almost no compilers implement any of the standards
exactly, and there are so many standards to choose from. The vast majority of
real-world line-of-business C programs are designed to work with a single
compiler and are therefore never evaluated in terms of their correctness when
ported to another compiler.

That said, the C# compiler team was and continues to be extremely concerned
about breaking changes because we very clearly perceived the cost to customers
and the barriers to upgrading entailed by breaking changes. I introduced a
handful of deliberate breaking changes in my years on the C# design and
compiler team, and every one was agonized over for many hours by members of
the design team who were experts on the likely customer impacts.

------
svat
Similar story: Stuart Feldman, the author of `make`, on why Makefiles require
tabs by default:

> _After getting myself snarled up with my first stab at Lex, I just did
> something simple with the pattern newline-tab. It worked, it stayed. And
> then a few weeks later I had a user population of_ about a dozen _, most of
> them friends, and I didn't want to screw up my embedded base. The rest,
> sadly, is history._

------
bla3
Here's what clang does with this code
([https://godbolt.org/z/Ar2qt6](https://godbolt.org/z/Ar2qt6)):

    
    
      <source>:5:11: warning: & has lower precedence than ==; == will be evaluated first [-Wparentheses]
      int t = x & y == z;   // ?
                ^~~~~~~~
      <source>:5:11: note: place parentheses around the '==' expression to silence this warning
      int t = x & y == z;   // ?
                ^
                  (     )
      <source>:5:11: note: place parentheses around the & expression to evaluate it first
      int t = x & y == z;   // ?
                ^
              (    )
    

Seems like a mostly solved problem.

~~~
ericlippert
The problem is "mostly solved" by adding a warning to a single compiler?

Sure, there are plenty of ways to mitigate the problem. That's not the point.
The point is that the problem should not have arisen in the first place to
require ongoing mitigation fifty years later!

~~~
carterehsmith
What are you going to do about it? a) just use parentheses like everyone else,
or b) turn the time back to 50 years ago and change that?

Seriously, the article was just a rant. Yeah, people 50 years ago did
something wrong. So what.

~~~
ericlippert
My blog is about the design and implementation of programming languages; by
understanding the causes of past mistakes we can learn to recognize them again
today. The best mistakes to learn from are other people's!

------
burlesona
There’s a similar problem that comes up in business often enough, where a
change in your software leaves behind “legacy” code or data. When developing
you almost always have to write things such that the new model and old model
can live side by side, and customers using the old model can (at least for a
time) just keep going while new users go on the new model.

Whenever a project like this is done, it’s very tempting from a business point
of view to just leave the “legacy” users / data alone. But, technically, all
your code continues to have to support two different models. Yes, with the
right abstractions you can make this work without being terribly painful, but
in practice there are always rough edges.

The time and effort it would take to migrate the old stuff into the new format
is high, and the perceived business value is low. In many cases the migration
never happens, and the old code is never removed.

But the problem is, you are now paying a tax to deal with that old code for
all time. Every time a decision like this is made, the tax goes up. The tax
slows you down, and it makes all future work you do more complicated.

So, there is a game theory problem here: for each individual decision, it
arguably makes more sense to leave the legacy case alone and move on. But if
you choose that path every time, it is a mistake, because the costs compound.

So far, the only thing I’ve seen that prevents this is very strong technical
leadership that ensures the migration is baked into the project and made non-
optional to the business teams. That’s really hard to do though, and arguably
not always the right decision for the business, which may need to move fast
_now_ just to survive.

------
cjfd
The operator precedence rules that are discussed here are a bit tricky, and
programmers can be excused for needing a few seconds to recall how they work.
Therefore, it is better to use parentheses for these tricky cases.
On the other hand, everybody should know without thinking that * has
precedence over +, therefore, a * b + c * d should not have parentheses if it
is intended to mean what is written. Parentheses would just be needless
clutter in this case.

~~~
kgwxd
I know * and + precedence without thinking about it, but when I see it without
parentheses, I can't help but pause to be sure I'm not having a brain fart,
and then also question why it wasn't just put in parentheses. So I just put in
the parentheses or split up the steps, every time.

------
kozak
Same thing about frame rates derived from NTSC, where e.g. 60 fps is not
really 60 fps, but 60/1.001 fps. Those extremely inconvenient frame rates are
ubiquitous in systems that have nothing to do with analog broadcast television
(e.g. on your YouTube).

~~~
AnimalMuppet
Pedantic nit: That's the field rate, not the frame rate. The frame rate is
30/1.001 fps. This is because NTSC is interlaced.

~~~
kozak
No, I really meant what I said. If you see a video that's labeled "60 fps",
quite likely it's actually 60/1.001 fps. It applies to the entire 24/30/60/120
range.

~~~
AnimalMuppet
I re-read, and your claim was "derived from NTSC", not that NTSC itself was
60/1.001 fps. So, yeah. NTSC itself was (30 frames or 60 fields) / 1.001.
"Derived from NTSC"... yeah, that could be "60 frames".

------
rgoulter
I think the author discounts the cost of "there are only a few users, so it's
not a big deal to have breaking changes". I think stability and the backwards
compatibility of a tool are important considerations to make in "is this worth
using?".

If a tool has a large community of usage, breaking compatibility affects a
large number of users. If the tool doesn't have a large community, then a lack
of stability means there's risk the tool won't in-future support current use
cases, or that I'll have to spend much more time maintaining my use of it than
with 'boring' tools.

It's fine for a tool in v0.x to have breaking changes. I think not supporting
stable versions of a tool would make it hard to retain users in the long run,
though.

~~~
TeMPOraL
Here's a counterargument: if a tool starts developing userbase, it's likely to
have _even more_ users in the future. Up to a point, the more users it has,
the more it will have. So a breaking change while the userbase is still
relatively small can be justified on the grounds of making things better for a
larger number of soon-to-be users.

~~~
jon889
I agree. People complain that Swift changed too much in the earlier versions
but I'm glad it did and I'm kind of sad that it's slowed down now and there
are things I wish could be changed but can't. Yes it was a bit of a pain but
there were migration tools that helped. And that pain is long since forgotten.

------
carapace
I think elaborate syntax is in itself a "hundred year mistake". I've been
messing about with a language that uses Reverse Polish Notation (like Forth)
for a bit, so of course there's no operator precedence at all. Also the
distinction between short-circuiting Boolean combinators and "mathematical"
Boolean operators is obvious, the former require a quoted function as one of
their inputs (to conditionally execute) while the latter just take two ints
(or bools or whatever, primitive values rather than quoted functions).

In a crude type notation:

    
    
        bool -> (... -> bool) -> bool
    

vs.

    
    
        int -> int -> int

~~~
Twisol
You could go the next step for symmetry and make _both_ operands quoted
functions, at which point you're just working in a lazy language. :D

~~~
carapace
That's actually how I did it at first, but then I realized that it was silly
because you _always_ run the first quoted function right off anyway so it's
cleaner to just require a Boolean value.

(There's a way using Category Theory to firm up that assertion into proper
math, but I'm not good enough at CT to do that.)

:D

------
kazinator
> _You might say “just search all the source code for that pattern” but this
> was two years before grep was invented! It was as primitive as can be._

If you control the compiler, you can make the compiler diagnose all the cases
which are affected by the precedence of &, emitting a file name and line
number.

That was the thing to do with those several hundred kilobytes of code. Get the
precedence right, and then warn about any code actually relying on the
precedence. The compiler then "greps it out" accurately; even instances that
are obfuscated by preprocessing.

Or, add the warning _first_ before rolling out the change to the precedence:

    
    
      foo.c:35: warning: obsolescent & precedence: use parentheses
    

Let the programmers go through a period of obsolescence whereby they rid their
code of these warnings by using parentheses. Then, the compiler can be
changed, while maintaining a diagnostic in that area of the language for some
time. Eventually, when everyone has forgotten the old dialect with the bad
precedence, the diagnostic can be removed.

When the new precedence is rolled out, a compatibility switch can be included
for compiling code with the obsolescent precedence. Actually, that option can
even be rolled out first (so it initially does nothing).

When the option is eventually removed, the compiler will refuse to run if it
is specified.

So anyway, Ritchie had a few fairly easy and cheap alternatives to sticking
with bad precedence; he perhaps wasn't aware of them.

------
ajross
So... yeah, I totally agree with the moral of the article. Unless you are
actually planning for how to handle a user uprising due to incompatible
changes, fix your stuff now. But let's talk about operator precedence instead,
because that's more interesting:

The boolean operations are the biggest gotcha, but these are everywhere.
Similarly the "*" indirection operator binds more strongly than many
programmers think it does (and the same symbol in type expressions too, which
is why those parentheses in function pointer declarations that no one really
understands are needed). The shift operators got picked up by C++ to do I/O of
all things, where they get to wreak havoc in a whole new regime.

Frankly the only precedence rules that most programmers understand (because we
were taught them in grade school!) are the two-level/left-to-right grammar for
the four basic arithmetic operations, along with a vague sense that these
should bind more tightly than an operator with "=" in it, because that
separates the "other side of the equation".

Given that, why don't new programming languages just require fully
parenthesized expressions for expressions involving other operators?

~~~
Ididntdothis
“Given that, why don't new programming languages just require fully
parenthesized expressions for expressions involving other operators?”

That would be excellent and be in line with the trend towards “safe”
languages. I have countless examples where I read code with several
expressions and when I asked the dev if they really understood the precedence
rules of this language they usually didn’t. They just wrote something they
thought should be ok.

------
rob74
> _C++, Java, JavaScript, C#, PHP and who knows how many other languages
> largely copied the operator precedence rules of C_

Ok, that's understandable for C++, but for all the other languages I really
wonder what the reasoning behind this (if any) was - "our language looks
similar to C, so we have to copy as many of its warts as possible so C
developers feel at home"?

~~~
karatestomp
PHP, at least, is (or used to be, I've been out of that world for... oh wow, a
long time now) a really thin layer over C, so that's its excuse.

------
rzwitserloot
The operator precedence in the article is just an example of a principle that
does hold: A series of choices, each individually plausibly the right call,
lead to a bizarre situation that doesn't feel right.

However, often (and I submit this very article as exhibit A), the logic is
reversed: One observes a bizarre scenario, points at it, and goes: I don't
know what or how, but clearly somebody somewhere messed up.

That's not true; sometimes seemingly simple things simply aren't simple, and
you need to take into account all users of a feature and not just your more
limited view.

Take operator precedence. __operator precedence is an intractable problem__.

Some languages try to make it real simple for you and say that all binary
operators are resolved left to right, and have the same precedence level. This
makes it easy to explain and makes it much simpler to treat anything as an
operator; a feature many languages have. Smalltalk works like this. The
smalltalk language spec fits on a business card; obviously, you must use such
principles as C's operator precedence table would otherwise occupy most of
your card!

But that does mean that `1 + 2 * 3` is 9 and not 7, and that is extremely
weird in other contexts.

So, you're in a damned if you do and damned if you don't situation: There is
no singular answer: No operator precedence rule is inherently sensible
regardless of the context within which you are attempting to figure out how
operator precedence works: It is an intractable problem.

One way out is to opt out entirely and make parentheses mandatory. If you then
also make it practical and say that the same operator associates left to right
regardless of what operator it is, you can still write, say, `1 + 2 + 3 + 4`,
but you simply aren't allowed to write `1 + 2 * 3` – you are forced to write
`1 + (2 * 3)` or `(1 + 2) * 3`. But now anybody who feels they can copy
problems from math domains, or from technical specifications such as a crypto
algorithm description now gets tripped up. You may disagree (I certainly do),
but a significant chunk of programmers just prefer shorter code. I would
certainly prefer my languages to work this way, and have configured my linters
like this as well, but it's still not a universally superior solution
regardless of point of view.

In other words, had Ritchie chosen to go with this 'when in doubt, do not
compile it' strategy, this article could not be written. And we'd still have
programmers unhappy with how the operator precedence rules work.

~~~
SAI_Peregrinus
> No operator precedence rule is inherently sensible regardless of the context
> within which you are attempting to figure out how operator precedence works:
> It is an intractable problem.

I disagree. FORTH got it right. Postfix notation is inherently sensible, as
any RPN calculator user will tell you. It eliminates the need for both
precedence rules and parentheses, unambiguously.

1 2 + 3 * is 9. 1 2 3 * + is 7.

It's infix notation that's the problem, precedence confusion is a consequence
of it. (Prefix notation can work too, but requires more parentheses).

~~~
rzwitserloot
Avoiding precedence issues is great. Absolutely. However, it's not crazy to
want to write it like most humans are used to. I have no problem with the idea
that it's more trouble than it sounds like, and '1 2 +' is the best answer to
the dilemma. That still doesn't make it a slam dunk. Or should we start saying
that just about every programming language out there has a major design flaw,
in that it has infix and not postfix operators?

------
kazinator
The hundred year mistake is ambiguous syntax with hidden precedence rules
_itself_, rather than any specific choice of precedence.

At least, particularly poor choices of precedence can be diagnosed. Whenever
the compiler applies a precedence rule between two operators that has been
identified as awkward, it can emit a diagnostic mentioning those operators:
"warning: quirky precedence: use parentheses when & is combined with ==".

------
Camillo
I guess it's my turn to be the dipstick who complains about the website's
design instead of discussing the contents, but I'm finding the font extremely
hard to read in Chrome. In Safari it's better, but it still doesn't look great
in terms of kerning. Anyone else seeing this?

~~~
ericlippert
It's rendered using the default WordPress theme ("Twenty Eleven") with the
sole customization being that the body text colour is changed to purple. If it
is rendering poorly, take it up with either WordPress or your browser
provider; there's not much I can do about it.

------
AtlasBarfed
operator precedence is arbitrary to a language. There is no universal "right"
or "wrong" and the LISP prefix people know this, there's no operator
precedence there, just a sequence of operators and the operands that follow
that explicitly declares the order of evaluation. It does have parentheses
though :-)

So the mistake here seems to be expectation, which is arbitrary. Everyone
knows the +-*/() precedence of basic arithmetic, but boolean operations and
others? That isn't at the cultural level of basic schooling, so the author's
contention that the precedence is wrong isn't a priori.

------
thedance
The article rests on the notion that this is confusing, but fails to prove it:

    
    
      int x = 0, y = 1, z = 0;
      int r = (x & y) == z; // 1
      int s = x & (y == z); // 0
      int t = x & y == z;   // 0
    

None of that surprises me, but moreover I would never expect myself or anyone
else to be sure about it, I would refer to the operator table to check. The
article also supposes that this line has some natural meaning that everyone
expects, but I'm not seeing it:

    
    
      if(x() & a() == y)
    

I think the author is projecting their own cognitive patterns onto the rest of
us without justification.

~~~
ericlippert
Sure, that's entirely possible.

I'm curious to know if you knew from the moment of your birth that you should
look at the operator table in this case, or if you learned that mitigation on
a particular day. If the latter, what might you have done _before_ you learned
that?

~~~
thedance
I don't know. I feel like I've always needed to think about any line of code
with several operators on the same line, but I've been in this business for a
long time, so I might have learned the habit from negative experience.

I'm glad you mentioned it though, because now I'm curious if there's a
difference among programmers who score well or poorly on a cognitive
reflection test. Maybe people with a tendency to suppress their intuitive
response are also less likely to think that a line of code has an obvious
meaning?

------
svnpenn
Personally I think the problem is right here:

    
    
        int x = 0, y = 1, z = 0;
        int s = x & (y == z); // 0
    

Other languages like Go don't allow this:

    
    
        package main
        func main() {
         n1, n2, n3 := 10, 11, 12
         n4 := n1 & (n2 == n3)
         println(n4)
        }
    

Result:

    
    
        invalid operation: n1 & (n2 == n3) (mismatched types int and bool)

~~~
dahfizz
TFA directly addresses this. The problem you demonstrate is that C does not
have a bool type, and the == operator instead returns an int.

~~~
binarycrusader
Ah, but C99 has a bool type: _Bool. Also available as 'bool' via stdbool.h:

[https://en.cppreference.com/w/c/keyword/_Bool](https://en.cppreference.com/w/c/keyword/_Bool)

But I otherwise agree with you.

~~~
codebje
It also has silent type coercion, so it will happily promote your _Bool to an
int, neatly voiding any benefits the type might have had to offer.

------
Taster5
Move Fast & Break Things approach yet again would save the day if applied

------
jlv2
Too bad he didn't post this on Precedence Day last month (February 17).

~~~
ericlippert
WOW. JUST... WOW.

I will definitely try to remember that for next year!

------
aaron695
To get into this issue it's bad programming?

You should never write something so complex?

~~~
2zcon
I only know not to write things so complex because either I have made the
mistake or some generation of someone I've learned from has made the mistake.
It's not a logic problem, where the less complicated, better understood and
more general rules of precedence allow writing equations that are terse,
objective, and easily parsed by humans: it was born with C's invented,
abstraction-breaking rules of precedence. You only learn it with a C-like
language. What's a bit to a mathematician?

I'd argue the practices that are most popularly known as 'good practices' are
to avoid mistakes that are common.

------
LeonB
A “nor only” language would eradicate the issue of operator precedence.

------
anonymousiam
Just use parentheses.

------
0xff00ffee
This article is really funny because it is true. I started programming C in
1987 and got my first programming job in 1992 (writing drivers for flash
memory for windows 3.1). I was lucky to have a really good mentor. He beat me
over the head to always use parentheses whenever I used more than one binary
operator and I never stopped. Sure, my macros looked more like LISP than C,
but he burned the fear of god into me.

