Ask HN: What would Your_Favorite_Compiler do?

mahmud · on Aug 23, 2009

  (let* ((x 10)
         (y x))
    (+ x (incf x)))

  vs

  (let* ((x 10)
         (y x))
    (prog1
      (+ x x)
      (incf x)))   

  [INCF does it with side-effects, use 1+ or directly evaluate  (+ 1 x) for the clean version]

Not everyone puts up with semantic ambiguity in operator precedence and evaluation. Some of us program in parse trees, directly :-)

mst · on Aug 23, 2009

Not really. ++x (as opposed to x++) is expected to return the incremented value, not the pre-increment value. So your choices are

  (+ x (incf x))

and

  (do
    (incf x)
    (+ x x))

depending on whether the language evaluates arguments in order, or if it builds the arguments as cells to observe and then reads their values (there's presumably a technical term for that which I can't remember - I blame sunday :)

So in C it's undefined behaviour because the compiler should be permitted to handle it whichever ways it wants, but in perl you'll get 21 as expected.

C is great fun for this - people often expect things like function arguments to be evaluated in the order as written but since it's a low level languages the compiler is allowed to do something completely different if it wants (note this something is still "equivalent" in terms of the C spec, just not in terms of the developer's expectations).

This is one of the many reasons -O3 can make apparently correct code start crashing all over the place.

mahmud · on Aug 23, 2009

What do you mean by "not really"? :-P

The poster's question is a classic comp.lang.c "favorite"; misunderstanding operator precedence and the two flavors of incrementing in C.

By the way, the term you're looking for is "binding", and bad languages have a habit of confusing binding and assignment.

LET "binds" its arguments to their corresponding values within its body, in a way identical to function application.

Algebraically his question looks like this:

  f(x) -> x + succ(x)
  y = f(10)

Where succ is the incrementing, or successor function, and + is the addition function.

Btw, much fun can be had with just the increment and decrement functions, along with a predicate to test for zero equality.

http://en.wikipedia.org/wiki/Primitive_recursive_function

P.S. The Lisp "DO" form is for iteration; PROGN and BEGIN are the main block-structure forms used in Common Lisp and Scheme respectively. To the original poster; invest in learning Standard ML and a few lisp dialects, on your own, they will make your compiler hacking far more enjoyable, and you probably wont waste as much time debugging a bad language design from 1970s that thought formal language research from the 1670s was too cutting edge; New Jersey heard "Leibniz" and they thought "Lebanese".

huhtenberg · on Aug 23, 2009

> The poster's question is a classic comp.lang.c "favorite"; misunderstanding operator precedence and the two flavors of incrementing in C.

It is a classic C question not knowing about the sequence points and side-effects. It has nothing to do with either operator precedence or lexer behavior.

zck · on Aug 23, 2009

>P.S. The Lisp "DO" form is for iteration; PROGN and BEGIN are the main block-structure forms used in Common Lisp and Scheme respectively.

I think mst's using Arc's do, which is progn.

3ds · on Aug 23, 2009

yes, "operator precedence" is of course the key

cia_plant · on Aug 23, 2009

    val x = unsafePerformIO $ readIORef x

    inc x = unsafePerformIO $ modifyIORef x (1+) >> readIORef x

    main = do
      x <- newIORef (10::Int)
      print $ val x + inc x

The result is 21.

Edit: an explanation. GHC runs on graph reduction, and will do "normal order" reduction in the normal case. + is strict in both arguments, so (val x) will be fully evaluated before (inc x) is evaluated.

If we replace with a lazy operator like (:), then we can get different results. Here is a program which will evaluate the (inc x) first:

    main = do
      x <- newIORef (10::Int)
      print $ reverse $ val x : inc x : []

the expression is not fully evaluated until it is printed, and then it is evaluated in reverse order, so inc x runs before val x. The result is [11,11].

Of course, none of this can be relied on. GHC does a lot of optimizations, and unsafe operations are unsafe.

huhtenberg · on Aug 23, 2009

The formal answer for C and C++ is "undefined", because the + operator is not a sequence point (meaning that the side-effect of ++ may or may not flushed). So my favourite compiler would issue a warning :)

nostrademons · on Aug 23, 2009

Interpreter, really:

    >>> x = 10
    >>> x + ++x
    20

I suspect my other favorite compiler will say 'Type error: Could not match "Num a => a" against "Num a => [a] -> [a]"', but I don't want to wait for it to finish installing.

3ds · on Aug 23, 2009

yup, that's python alright. Very odd behaviour

ajt · on Aug 23, 2009

++x in python parses as +(+(x))

jules · on Aug 23, 2009

Think about how this is compiled. In machine language there are no nested expressions, so the compiler will have to split the expression up.

    expression( ... ++x ... )

will probably be translated to

    x = x+1;
    expression( ... x ... )

This is the simplest way to compile ++x:

    1. put x=x+1 before the expression
    2. replace ++x with x

So I think most languages would get 22.

But in languages where + is a function there's a good chance you get 21.

gizmo · on Aug 23, 2009

Sounds plausible, but I don't think it's true.

Compilers don't magically filter assignments out of expressions. The compiler already has the parse tree (+ x (1+ x)), and the arguments are pushed on the stack in reverse order because that's consistent with function calls (in c++ when you were to overload operator+ nothing would change, I suspect). So first 1+ x is evaluated and the value of x is updated, and the number 22 is returned.

Or to illustrate with a stack machine:

    PUSH X
    INC     
    STORE X
    PUSH X
    ADD
    STORE Y

But the whole discussion is still silly, because the whole deal is undefined in any sane language (and for good reason). Unless variables become immutable you can't really prevent this kind of ambiguity from occurring, so it's no big deal.

jonsen · on Aug 23, 2009

Java: "Regardless of their complexity, the meanings of expressions are allways well defined."

That's why you can infer that y = 21 in Java.

Would you call Java insane for that?

3ds · on Aug 23, 2009

Java's result is 21

gcc says 22 and it gives you a warning, that this expression is illegal, we looked it up, because you cannot use a variable which is ++incremented more than once in the same expression.

pmorici · on Aug 23, 2009

I would expect 22. On a related a aside this reminds me of a time in school where I wrote a recursive function that I called using the post decrement operator as follows, some_func(x--) which was supposed to stop recursion when x reached zero. Of course, it never did and went until it filled the stack. One of those things that stares you in the face and yet takes hours before you're like duh.

gaius · on Aug 23, 2009

    # let x = 10;;
    val x : int = 10
    # let y = x + ++x;;
    Error: Syntax error

Hmm. Let's try that again:

    # let x = ref 10;;
    val x : int ref = {contents = 10}
    # let y = !x + incr x;;
    Error: This expression has type unit but an expression was expected of type int

So nothing, I guess!

yannis · on Aug 23, 2009

My favourite compiler, would have a manual one page long. I should be able to read it and program 'hello world' in five minutes. It should come with about 40 primitives and leave the rest to me and its community!(It should also throw an error if you type y=x + ++x

:)

blackdog · on Aug 23, 2009

I predicted 21 and got 22 with gcc on mac. This is why I hack Haskell :)

zokier · on Aug 23, 2009

i'd think in this way:

    y = operator+(x, operator++(x));

and therefore operator++(x) would be called before operator+()

cperciva · on Aug 23, 2009

My_Favourite_Compiler would compile "x + ++x" as a call to abort(3). I think "undefined behaviour" is a horrible idea.

daeken · on Aug 23, 2009

Undefined depends on the language. In many languages, the precedence rules are explicit enough that this code is clearly defined. In Ultilang (my own), Boo, and others, this would be parsed as x + {x += 1; x} and processed left-to-right.

cperciva · on Aug 23, 2009

Undefined depends on the language.

Of course. What I didn't mention is that my favourite language is C. :-)

sid0 · on Aug 23, 2009

I'd go one step further: my favourite compiler would give a compile error on seeing that.

viraptor · on Aug 23, 2009

y is 22. But my favourite imaginary compiler doesn't allow you to compile the second statement unless 1. your IQ is above a certain threshold 2. you explained in a comment - why?...

derefr · on Aug 23, 2009

That "comment" is called parentheses :)