Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Ask HN: What would Your_Favorite_Compiler do?
12 points by 3ds on Aug 23, 2009 | hide | past | favorite | 26 comments
x = 10

y = x + ++x

What is y?

Even the professor who teaches compiler construction at my university thought it was interesting to see how different languages or compilers handle this. I've tried a couple but leave it to you to post and discuss your results.

(I know it's terrible code)



  (let* ((x 10)
         (y x))
    (+ x (incf x)))

  vs

  (let* ((x 10)
         (y x))
    (prog1
      (+ x x)
      (incf x)))   

  [INCF does it with side-effects, use 1+ or directly evaluate  (+ 1 x) for the clean version]
Not everyone puts up with semantic ambiguity in operator precedence and evaluation. Some of us program in parse trees, directly :-)


Not really. ++x (as opposed to x++) is expected to return the incremented value, not the pre-increment value. So your choices are

  (+ x (incf x))
and

  (do
    (incf x)
    (+ x x))
depending on whether the language evaluates arguments in order, or if it builds the arguments as cells to observe and then reads their values (there's presumably a technical term for that which I can't remember - I blame sunday :)

So in C it's undefined behaviour because the compiler should be permitted to handle it whichever ways it wants, but in perl you'll get 21 as expected.

C is great fun for this - people often expect things like function arguments to be evaluated in the order as written but since it's a low level languages the compiler is allowed to do something completely different if it wants (note this something is still "equivalent" in terms of the C spec, just not in terms of the developer's expectations).

This is one of the many reasons -O3 can make apparently correct code start crashing all over the place.


What do you mean by "not really"? :-P

The poster's question is a classic comp.lang.c "favorite"; misunderstanding operator precedence and the two flavors of incrementing in C.

By the way, the term you're looking for is "binding", and bad languages have a habit of confusing binding and assignment.

LET "binds" its arguments to their corresponding values within its body, in a way identical to function application.

Algebraically his question looks like this:

  f(x) -> x + succ(x)
  y = f(10) 
Where succ is the incrementing, or successor function, and + is the addition function.

Btw, much fun can be had with just the increment and decrement functions, along with a predicate to test for zero equality.

http://en.wikipedia.org/wiki/Primitive_recursive_function

P.S. The Lisp "DO" form is for iteration; PROGN and BEGIN are the main block-structure forms used in Common Lisp and Scheme respectively. To the original poster; invest in learning Standard ML and a few lisp dialects, on your own, they will make your compiler hacking far more enjoyable, and you probably wont waste as much time debugging a bad language design from 1970s that thought formal language research from the 1670s was too cutting edge; New Jersey heard "Leibniz" and they thought "Lebanese".


> The poster's question is a classic comp.lang.c "favorite"; misunderstanding operator precedence and the two flavors of incrementing in C.

It is a classic C question not knowing about the sequence points and side-effects. It has nothing to do with either operator precedence or lexer behavior.


>P.S. The Lisp "DO" form is for iteration; PROGN and BEGIN are the main block-structure forms used in Common Lisp and Scheme respectively.

I think mst's using Arc's do, which is progn.


yes, "operator precedence" is of course the key


    val x = unsafePerformIO $ readIORef x

    inc x = unsafePerformIO $ modifyIORef x (1+) >> readIORef x

    main = do
      x <- newIORef (10::Int)
      print $ val x + inc x
The result is 21.

Edit: an explanation. GHC runs on graph reduction, and will do "normal order" reduction in the normal case. + is strict in both arguments, so (val x) will be fully evaluated before (inc x) is evaluated.

If we replace with a lazy operator like (:), then we can get different results. Here is a program which will evaluate the (inc x) first:

    main = do
      x <- newIORef (10::Int)
      print $ reverse $ val x : inc x : []
the expression is not fully evaluated until it is printed, and then it is evaluated in reverse order, so inc x runs before val x. The result is [11,11].

Of course, none of this can be relied on. GHC does a lot of optimizations, and unsafe operations are unsafe.


The formal answer for C and C++ is "undefined", because the + operator is not a sequence point (meaning that the side-effect of ++ may or may not flushed). So my favourite compiler would issue a warning :)


Interpreter, really:

    >>> x = 10
    >>> x + ++x
    20
I suspect my other favorite compiler will say 'Type error: Could not match "Num a => a" against "Num a => [a] -> [a]"', but I don't want to wait for it to finish installing.


yup, that's python alright. Very odd behaviour


++x in python parses as +(+(x))


Think about how this is compiled. In machine language there are no nested expressions, so the compiler will have to split the expression up.

    expression( ... ++x ... )
will probably be translated to

    x = x+1;
    expression( ... x ... )
This is the simplest way to compile ++x:

    1. put x=x+1 before the expression
    2. replace ++x with x
So I think most languages would get 22.

But in languages where + is a function there's a good chance you get 21.


Sounds plausible, but I don't think it's true.

Compilers don't magically filter assignments out of expressions. The compiler already has the parse tree (+ x (1+ x)), and the arguments are pushed on the stack in reverse order because that's consistent with function calls (in c++ when you were to overload operator+ nothing would change, I suspect). So first 1+ x is evaluated and the value of x is updated, and the number 22 is returned.

Or to illustrate with a stack machine:

    PUSH X
    INC     
    STORE X
    PUSH X
    ADD
    STORE Y
But the whole discussion is still silly, because the whole deal is undefined in any sane language (and for good reason). Unless variables become immutable you can't really prevent this kind of ambiguity from occurring, so it's no big deal.


Java: "Regardless of their complexity, the meanings of expressions are allways well defined."

That's why you can infer that y = 21 in Java.

Would you call Java insane for that?


Java's result is 21

gcc says 22 and it gives you a warning, that this expression is illegal, we looked it up, because you cannot use a variable which is ++incremented more than once in the same expression.


I would expect 22. On a related a aside this reminds me of a time in school where I wrote a recursive function that I called using the post decrement operator as follows, some_func(x--) which was supposed to stop recursion when x reached zero. Of course, it never did and went until it filled the stack. One of those things that stares you in the face and yet takes hours before you're like duh.


    # let x = 10;;
    val x : int = 10
    # let y = x + ++x;;
    Error: Syntax error

Hmm. Let's try that again:

    # let x = ref 10;;
    val x : int ref = {contents = 10}
    # let y = !x + incr x;;
    Error: This expression has type unit but an expression was expected of type int

So nothing, I guess!


My favourite compiler, would have a manual one page long. I should be able to read it and program 'hello world' in five minutes. It should come with about 40 primitives and leave the rest to me and its community!(It should also throw an error if you type y=x + ++x

:)


I predicted 21 and got 22 with gcc on mac. This is why I hack Haskell :)


i'd think in this way:

    y = operator+(x, operator++(x));
and therefore operator++(x) would be called before operator+()


My_Favourite_Compiler would compile "x + ++x" as a call to abort(3). I think "undefined behaviour" is a horrible idea.


Undefined depends on the language. In many languages, the precedence rules are explicit enough that this code is clearly defined. In Ultilang (my own), Boo, and others, this would be parsed as x + {x += 1; x} and processed left-to-right.


Undefined depends on the language.

Of course. What I didn't mention is that my favourite language is C. :-)


I'd go one step further: my favourite compiler would give a compile error on seeing that.


y is 22. But my favourite imaginary compiler doesn't allow you to compile the second statement unless 1. your IQ is above a certain threshold 2. you explained in a comment - why?...


That "comment" is called parentheses :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: