Compiler Writers Gone Wild: ARC Madness (metaobject.com)
48 points by mpweiher on June 28, 2014 | 33 comments

The complaints about clang optimizations are misplaced; the example relies on undefined behavior; regardless of what you think about it, all compilers are free to do such things.

This is also a classic case of optimizing for the wrong and uncommon things (billions of objc_msgsend calls in a tight loop), instead of the really common stuff (like leaks/crashes due to a mistake in memory management, lower productivity across your entire team due to time wasted thinking about retain/release or tracking down aforementioned leaks, etc).

Two points.

1. Breaking reflexivity is not OK. Ever. a==a should hold.

2. The compiler knows that this is undefined behavior, and uses this to "optimize" to a wrong result.

However, it does not warn me about the use of undefined behavior. Not with -Wall, not with -Wpedantic.

   marcel@localhost[tmp]cc -Wall -Wpedantic -Os -o undefined-realloc undefined-realloc.c 
   1 2
I can't think of any reason for this to be OK. Either warn me, or don't use it. Silently breaking reflexivity...did I mention "not OK"?

Of course, that's only a minor point in the article, but it does capture the theme that's explored in the main point.

There's more discussion about this on this reddit thread: http://www.reddit.com/r/programming/comments/x1ql4/undefined...

The problem is that the realloc builtin's return type is internally flagged as noalias, which makes clang think it is safe to constant-propagate the old value into the printf. I guess LLVM doesn't take the equality check into account.

> 1. Breaking reflexivity is not OK. Ever. a==a should hold.

This reminds me of the "proof" that 1=2. Once an undefined operation is introduced, the end result means nothing.

> 2. The compiler knows that this is undefined behavior, and uses this to "optimize" to a wrong result.

Maybe, maybe not. It's not a requirement that a compiler even recognizes that undefined behavior happens. In many cases behavior is undefined simply because it would be too expensive or impractical for the compiler to deal with that behavior.
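For reference, the classic 1=2 "proof" mentioned above hides a division by zero; once that one undefined step is admitted, any conclusion follows, which is the analogy to UB in C:

```latex
\begin{align}
a &= b \\
a^2 &= ab \\
a^2 - b^2 &= ab - b^2 \\
(a+b)(a-b) &= b(a-b) \\
a+b &= b \quad \text{(invalid: both sides divided by } a-b = 0\text{)} \\
2b &= b \\
2 &= 1
\end{align}
```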

1. a==a is false for NaN, so it's not quite "Ever"

2. The compiler does not know this is undefined behavior. Dereferencing a pointer that has been previously passed to realloc is not necessarily undefined behavior. realloc may fail and return NULL; in that case the passed-in pointer remains valid.

Just a note: "a==a is false for NaN" is only true if compiled with precise or strict floating point mode. It will return true in fast mode, which is often what is used. Do not rely on this behavior for checking NaNs, use isnan instead.

> 1. Breaking reflexivity is not OK. Ever. a==a should hold.

No. Once you've invoked undefined behavior, all bets are off. There's a trivial solution: Don't do that.

Once you've called realloc() and it's returned non-NULL, p is gone--for all anyone knows it points to random nothingness. Yes, it seems weird that the library was able to enlarge it in-place but the pointers end up non-equivalent. But you can't ever blindly rely on the in-place realloc() optimization anyway.

Simply put, using p blindly after a successful realloc() is a use-after-free bug, no matter what. If you use realloc() correctly you will never see this.

Note that if you've proven the equivalence to the compiler it works fine:

    #include <stdio.h>
    #include <stdlib.h>

    int main() {
      int *p = (int*)malloc(sizeof(int));
      int *q = (int*)realloc(p, sizeof(int));
      if (p == q) {
          *p = 1;
          *q = 2;
          printf("%d %d\n", *p, *q);
      }
      free(q);
      return 0;
    }
Production code should never do this, though (it's pointless). The safest, most correct thing to do is to stop using p after a successful realloc().

actually, realloc infrequently moves data around, especially if the block is large; quite often the returned pointer value is the same. but yes, even if it has been freed, the page in which it resided is likely still valid, and free doesn't zero out memory. so dereferencing p even after realloc will likely point to a section of memory that still contains the old data.

that said, it is, quite rightly, a use-after-free 101 bug. i only mention it because a lot of people see this behaviour appear to work, and it only explodes sometime later (either by exceeding a page boundary in the old data, or by reading stale data later on..)

I believe realloc must return the same memory spot if the size (and maybe type?) didn't change.

The type is irrelevant, as realloc gets a void* and a size. But the standard does not require that a resize happen in-place when possible (which would be the case if the block gets smaller or remains the same size). There is no requirement even for the case where the requested size is the same as the block had before. That being said, most realloc implementations will probably hand out the same pointer again if size <= old_size; it just makes sense to move only when needed. But that behaviour is not required.

the compiler will rely on undefined behaviour all the time for any given set of optimisations, a lot of them are harmless. the compiler can't know if a use of undefined behaviour is intentional and safe, or unintentional and dangerous.

you could then argue that there are some cases in which undefined behaviour is obviously a mistake. and this is what the static code analysis in clang does. it's just not on by default.

Harmless or not, I'd like to have a way to know when the compiler is relying on undefined behavior for optimization, and use that to make an informed decision. I'd argue a very large number of the cases where undefined behavior is used by the compiler are actual mistakes in the code, and having a way to get those marked by the compiler would be useful.

well, this would mean you'd get feedback every time you declare a variable without giving it an initial value? (even if the value is never read before being written to, i.e. it is entirely safe to do?)

or how about when you write a for loop?

    for(int i=0; i<LIMIT; i++)
the compiler assumes that i cannot overflow in this case, because that would be undefined behaviour. on the most obvious level, it doesn't need to do an overflow check on each loop iteration, if it exploits undefined behaviour. and no, it simply can't check the value of LIMIT before the loop, because i may be changed in the loop body, or LIMIT itself may change. this also allows a myriad of other loop optimisations. pipelining, vectorization, unrolling..

i don't think you realise just how often undefined behaviour is used in compiler optimisations.

Yes, and having the compiler tell me these things would be welcome. I'm not saying I don't WANT those optimizations, I just want to be aware when they happen.

for even simple programs, this would produce a very large output. we've become accustomed to the idea that C is a low level language, but actually it's somewhat high level. optimisations are in your code everywhere, and if a C compiler were to list even only the ones that exploit undefined behaviour.. well, it would essentially be noise.

There's actually a pretty simple way to determine whether the compiler is relying on undefined behavior for optimization. Simply evaluate the following two criteria:

1. Are you using a C compiler?

2. Are optimizations enabled?

If the answer to both of these is "yes", then the compiler is almost certainly relying on undefined behavior for optimization.

In what way is reflexivity broken in your example? I don't see anywhere that a==a doesn't hold.

For anyone like me who wonders what ARC is (my first thought was "pg's Lisp dialect?") it sounds like they are talking about Objective-C Automatic Reference Counting:


Tip to writers: Even if you think your audience knows what an acronym stands for, spell it out or provide a link the first time you use it. You never know when other people will see your article who are interested in the general area but unfamiliar with your specific topic.

It's kind of ironic that you use "pg" just a few words after the acronym you're complaining about, although I understand that a blog post and a comment aren't quite the same. Still, is it really necessary to define acronyms when they're easy to find and pretty fundamental to the language? A Google search for "objective-c arc" turns up nothing but hits that discuss Automatic Reference Counting. Should he also expand shorthand like "x86" or "WWDC"?

Ah, but how would I know that I need to search for "objective-c arc"? There's no mention of Objective-C until well past the midway point of the article.

And of course a search for "ARC" turns up all kinds of unrelated topics. In fact, the only software-related reference on the first page of a Google search is Paul Graham's Arc.

It would have been fine if the first mention of ARC in the article was more specific: "Objective-C ARC".

This is different from x86 or WWDC, where the very first match on a Google search is exactly what you'd be looking for.

(FWIW, I upvoted your comment because I always appreciate interesting feedback.)

Fair point. But on the other hand, Objective-C is pretty much all he writes about. Is it really necessary to mention it every time? I understand that it makes life harder if you come in for the first time, but you did figure it out, and I imagine he writes for his usual audience. Personally, I would say that this is a good case for making the HN title different from (more explicit than) the page title, although I know that that way often lies madness.

Really, I think this is just another example of how we should name everything using UUIDs rather than pronounceable chains of letters.

The author's surprise at the size of the ARC code seems to have led him astray from the actual bug in his code. The unoptimized version performs refcount manipulation of 'id' and crashes. The optimized version avoids refcount mucking and does not crash. Seems to me this strongly implies that an invalid 'id' is being passed into the function!

I was a little disappointed that we didn't get a resolution here, but it's interesting to see a little more of ARC under the hood:

"It isn't clear why those retains/releases were crashing, all the objects involved looked OK in the debugger, but at least we will no longer be puzzled by code that can't possibly crash...crashing, and therefore have a better chance of actually debugging it."

I was also expecting to see an explanation of why the crash was occurring, given the build-up.

Here's my theory:

One or both of the parameters were passed in from unsafe references, and the objects had been deallocated. According to a simple reading of the code, you would not expect this to be a problem, because the method doesn't attempt to access either of them (it's equivalent to passing an invalid pointer to a C function, and then not accessing it).

However ARC subtly changes the semantics of the language. Strictly speaking, the parameters are supposed to be valid, as ARC adds retain/release calls for them (the fact that those are normally optimised away is purely that - an optimisation). So the bug here was that invalid pointers were being passed to a method which does actually use them (albeit non-obviously). As soon as retain was called on the invalid pointer, it ended up accessing or calling to some arbitrary location in memory, hence the crash.

I'd be surprised if suppressing the crash actually makes this easier to debug. I'd much prefer to debug with the crash in place, than to try to determine what's wrong later in the execution of the program when the bug has had the opportunity to spread.

I'm also not very impressed with trotting out the assembly without using it to diagnose the crash. Particularly not if it's being used to disparage ARC (wrongly, I believe).

It's pretty weak. He doesn't tell us which line of assembly actually crashes, nor (presuming it's one of the calls to objc_storeStrong) what object is being manipulated when it crashes. There's no point in looking at your assembly code unless you're going to actually use it to investigate what's going on.

> Turn on the "standard" optimization -Os and we get the following, much more reasonable result

That's still 250% more instructions than the necessary "xor eax, eax; ret". I find it a bit disappointing that the compiler would miss this trivial optimisation opportunity while at the same time doing subtle and often unwanted things with undefined behaviour. C was conceived as a "portable assembly language", and as much as the language lawyers, theorists, and compiler writers love to exercise their "undefined behaviour" rights and try to dissuade others from thinking of it that way, that's what people use it for and that's what they'll expect.

Rather, undefined behaviour should be interpreted as "do the obvious thing; the results may differ on different platforms", and optimisations exploiting it should be turned off, because the behaviour is either intentional or the programmer has made a mistake, and in neither case should the compiler generate obviously wrong code. A warning would be nice too. Instead of blind adherence to the standard, consider what behaviour programmers actually expect, and write compilers accordingly. One example of this is casting integers to pointers of various types and using them to access hardware. Device drivers would be impossible to write if the compiler decided that such undefined behaviour meant it could remove the accesses completely (which is completely legal according to the standard.)

More examples: dereferencing a null pointer should generate code that accesses address 0. Signed integers should wrap around on overflow. Shifts should do whatever the shift instruction of the architecture does. Trying to access beyond the end of the array should try to access memory there. Division by zero will do what the machine instruction would do. Etc.

The 250% more instructions are the frame pointer, which is how your debugger figures out the stack trace when you happen to break execution at the "xor eax, eax". Use -fomit-frame-pointer to get rid of that. That has nothing to do with optimization.

Those extra instructions shouldn't be necessary even when debugging, since that's what the -g option is for; to generate information the debugger can use: http://yosefk.com/blog/getting-the-call-stack-without-a-fram...

When optimisation is enabled, all extraneous instructions should disappear.

The C standard even describes the probable result of undefined behaviour as ranging from "ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment". But for some reason nobody seems to pay any attention to the second possibility.

I don't know why people like the current situation so much. Strikes me it's bad for everybody. If you know how the underlying system works, you'll be confused by this second pile of strange rules that you now have to keep on top of; if you don't know how the underlying system works, you're no better served by these new rules than you would have been by simply finding out. And it's not even like most systems these days vary all that much! - assuming you can even find something that isn't ARM/x86/x64 in the first place <-- this statement may be mildly tongue in cheek.

Very strange.

My favourite link on the subject is http://robertoconcerto.blogspot.co.uk/2010/10/strict-aliasin.... "The C compiler writers have no idea who their users are" seems plausible :)

Why were they compiling without the optimizer (-O0) in the first place?

Presumably to aid in debugging- that's why I do it with native languages...

It's annoying when people write things like "13-inch MBPR" and not the concrete CPU model...

I was really thinking it'd be about the Arc lisp
