

Ask HN: What would Your_Favorite_Compiler do? - 3ds

x = 10<p>y = x + ++x<p>What is y?<p>Even the professor who teaches compiler construction at my university thought it was interesting to see how different languages or compilers handle this. I've tried a couple but leave it to you to post and discuss your results.<p>(I know it's terrible code)
======
mahmud

      (let* ((x 10)
             (y x))
        (+ x (incf x)))
    
      vs
    
      (let* ((x 10)
             (y x))
        (prog1
          (+ x x)
          (incf x)))   
    
      [INCF does it with side-effects, use 1+ or directly evaluate  (+ 1 x) for the clean version]
    

Not everyone puts up with semantic ambiguity in operator precedence and
evaluation. Some of us program in parse trees, directly :-)

~~~
mst
Not really. ++x (as opposed to x++) is expected to return the incremented
value, not the pre-increment value. So your choices are

    
    
      (+ x (incf x))
    

and

    
    
      (do
        (incf x)
        (+ x x))
    

depending on whether the language evaluates arguments in order, or if it
builds the arguments as cells to observe and then reads their values (there's
presumably a technical term for that which I can't remember - I blame sunday
:)

So in C it's undefined behaviour because the compiler should be permitted to
handle it whichever ways it wants, but in perl you'll get 21 as expected.

C is great fun for this - people often expect things like function arguments
to be evaluated in the order as written but since it's a low level languages
the compiler is allowed to do something completely different if it wants (note
this something is still "equivalent" in terms of the C spec, just not in terms
of the developer's expectations).

This is one of the many reasons -O3 can make apparently correct code start
crashing all over the place.

~~~
mahmud
What do you mean by "not really"? :-P

The poster's question is a classic comp.lang.c "favorite"; misunderstanding
operator precedence and the two flavors of incrementing in C.

By the way, the term you're looking for is "binding", and bad languages have a
habit of confusing binding and _assignment_.

LET "binds" its arguments to their corresponding values within its body, in a
way identical to function application.

Algebraically his question looks like this:

    
    
      f(x) -> x + succ(x)
      y = f(10) 
    

Where succ is the incrementing, or _successor_ function, and + is the addition
function.

Btw, much fun can be had with just the increment and decrement functions,
along with a predicate to test for zero equality.

<http://en.wikipedia.org/wiki/Primitive_recursive_function>

P.S. The Lisp "DO" form is for iteration; PROGN and BEGIN are the main block-
structure forms used in Common Lisp and Scheme respectively. To the original
poster; invest in learning Standard ML and a few lisp dialects, on your own,
they will make your compiler hacking far more enjoyable, and you probably wont
waste as much time debugging a bad language design from 1970s that thought
formal language research from the 1670s was too cutting edge; New Jersey heard
"Leibniz" and they thought "Lebanese".

~~~
huhtenberg
> _The poster's question is a classic comp.lang.c "favorite"; misunderstanding
> operator precedence and the two flavors of incrementing in C._

It is a classic C question not knowing about the sequence points and side-
effects. It has nothing to do with either operator precedence or lexer
behavior.

------
cia_plant

        val x = unsafePerformIO $ readIORef x
    
        inc x = unsafePerformIO $ modifyIORef x (1+) >> readIORef x
    
        main = do
          x <- newIORef (10::Int)
          print $ val x + inc x
    

The result is 21.

Edit: an explanation. GHC runs on graph reduction, and will do "normal order"
reduction in the normal case. + is strict in both arguments, so (val x) will
be fully evaluated before (inc x) is evaluated.

If we replace with a lazy operator like (:), then we can get different
results. Here is a program which will evaluate the (inc x) first:

    
    
        main = do
          x <- newIORef (10::Int)
          print $ reverse $ val x : inc x : []
    

the expression is not fully evaluated until it is printed, and then it is
evaluated in reverse order, so inc x runs before val x. The result is [11,11].

Of course, none of this can be relied on. GHC does a lot of optimizations, and
unsafe operations are unsafe.

------
huhtenberg
The formal answer for C and C++ is "undefined", because the + operator is not
a sequence point (meaning that the side-effect of ++ may or may not flushed).
So my favourite compiler would issue a warning :)

------
nostrademons
Interpreter, really:

    
    
        >>> x = 10
        >>> x + ++x
        20
    

I suspect my other favorite compiler will say 'Type error: Could not match
"Num a => a" against "Num a => [a] -> [a]"', but I don't want to wait for it
to finish installing.

~~~
3ds
yup, that's python alright. Very odd behaviour

~~~
ajt
++x in python parses as +(+(x))

------
jules
Think about how this is compiled. In machine language there are no nested
expressions, so the compiler will have to split the expression up.

    
    
        expression( ... ++x ... )
    

will probably be translated to

    
    
        x = x+1;
        expression( ... x ... )
    

This is the simplest way to compile ++x:

    
    
        1. put x=x+1 before the expression
        2. replace ++x with x
    

So I think most languages would get 22.

But in languages where + is a function there's a good chance you get 21.

~~~
gizmo
Sounds plausible, but I don't think it's true.

Compilers don't magically filter assignments out of expressions. The compiler
already has the parse tree (+ x (1+ x)), and the arguments are pushed on the
stack in reverse order because that's consistent with function calls (in c++
when you were to overload operator+ nothing would change, I suspect). So first
1+ x is evaluated and the value of x is updated, and the number 22 is
returned.

Or to illustrate with a stack machine:

    
    
        PUSH X
        INC     
        STORE X
        PUSH X
        ADD
        STORE Y
    

But the whole discussion is still silly, because the whole deal is undefined
in any sane language (and for good reason). Unless variables become immutable
you can't really prevent this kind of ambiguity from occurring, so it's no big
deal.

~~~
jonsen
Java: "Regardless of their complexity, the meanings of expressions are allways
well defined."

That's why you can infer that y = 21 in Java.

Would you call Java insane for that?

------
pmorici
I would expect 22. On a related a aside this reminds me of a time in school
where I wrote a recursive function that I called using the post decrement
operator as follows, some_func(x--) which was supposed to stop recursion when
x reached zero. Of course, it never did and went until it filled the stack.
One of those things that stares you in the face and yet takes hours before
you're like duh.

------
gaius

        # let x = 10;;
        val x : int = 10
        # let y = x + ++x;;
        Error: Syntax error
    
    

Hmm. Let's try that again:

    
    
        # let x = ref 10;;
        val x : int ref = {contents = 10}
        # let y = !x + incr x;;
        Error: This expression has type unit but an expression was expected of type int
    
    

So nothing, I guess!

------
yannis
My favourite compiler, would have a manual one page long. I should be able to
read it and program 'hello world' in five minutes. It should come with about
40 primitives and leave the rest to me and its community!(It should also throw
an error if you type y=x + ++x

:)

------
blackdog
I predicted 21 and got 22 with gcc on mac. This is why I hack Haskell :)

------
zokier
i'd think in this way:

    
    
        y = operator+(x, operator++(x));
    

and therefore operator++(x) would be called before operator+()

------
cperciva
My_Favourite_Compiler would compile "x + ++x" as a call to abort(3). I think
"undefined behaviour" is a horrible idea.

~~~
daeken
Undefined depends on the language. In many languages, the precedence rules are
explicit enough that this code is clearly defined. In Ultilang (my own), Boo,
and others, this would be parsed as x + {x += 1; x} and processed left-to-
right.

~~~
cperciva
_Undefined depends on the language._

Of course. What I didn't mention is that my favourite language is C. :-)

------
viraptor
y is 22. But my favourite imaginary compiler doesn't allow you to compile the
second statement unless 1. your IQ is above a certain threshold 2. you
explained in a comment - why?...

~~~
derefr
That "comment" is called parentheses :)

