Mulle-objc: A New Objective-C Compiler and Runtime (mulle-objc.github.io)
102 points by WoodenChair on Nov 26, 2016 | 41 comments

The biggest change so far seems to be a new memory management style called AAM, "Always Autorelease Mode". It's described on the compiler's GitHub page:


"The transformed methods will return objects that are autoreleased. Hence the name of the mode. The net effect is, that you have a mode that is ARC-like, yet understandable and much simpler."

This is like a halfway house between traditional Objective-C's "semi-manual" memory management, and the modern ARC (Automatic Reference Counting). Autorelease pools were a cornerstone of the traditional model: autoreleased objects will typically stick around until the end of the event loop cycle, which was a reasonable way to do garbage collection in mostly UI-centric Obj-C apps.

ARC replaced this model with static reference tracking in the compiler: instances are freed when the compiler knows that the last reference has gone out of scope. This is obviously more memory-efficient than the autorelease pool, and that was probably a big factor in Apple's decision to go with ARC (as the primary focus of Obj-C had shifted to iOS).

In summary, the Mulle compiler tries to combine the programmer ease-of-use of ARC with the simpler compiler/runtime implementation of traditional Obj-C memory management. The tradeoff is memory usage and incompatibility with some modern code that assumes an ARC compiler.

Here's a list of language features that the compiler will support:


Personally I agree with the author's positions in principle, but not entirely in practice. In particular, not supporting the property dot syntax will make it unnecessarily difficult to compile most Obj-C code written since about 2009. Same goes for blocks -- they're deeply integrated into Obj-C APIs by now, even though they do add substantial runtime complication.

> autoreleased objects will typically stick

> around until the end of the event loop cycle

They stick around until their reference count drops to zero. The -autorelease is turned into a -release when the containing autorelease pool is released, so you have full control over when that happens.

   @autoreleasepool {
      [[[NSObject alloc] init] autorelease];
   }
The object is freed at the end of that block.
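The mechanics described above can be modelled in a few lines of plain C. This is only a toy sketch, not how Foundation or mulle-objc implement pools: `Obj`, `Pool`, `obj_autorelease` and `pool_drain` are hypothetical names, and the pool has a fixed capacity to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy object: a reference count plus a flag the demo can inspect. */
typedef struct { int refcount; int *freed_flag; } Obj;

enum { POOL_CAP = 16 };
typedef struct { Obj *pending[POOL_CAP]; size_t count; } Pool;

/* "alloc/init": the caller owns one reference. */
static Obj *obj_new(int *freed_flag) {
    Obj *o = malloc(sizeof *o);
    if (o == NULL) abort();
    o->refcount = 1;
    o->freed_flag = freed_flag;
    return o;
}

static Obj *obj_retain(Obj *o) { o->refcount++; return o; }

/* Frees the object once the count reaches zero. */
static void obj_release(Obj *o) {
    if (--o->refcount == 0) {
        if (o->freed_flag != NULL) *o->freed_flag = 1;
        free(o);
    }
}

/* "autorelease": remember that one release is owed when the pool drains. */
static Obj *obj_autorelease(Pool *p, Obj *o) {
    assert(p->count < POOL_CAP);
    p->pending[p->count++] = o;
    return o;
}

/* End of the @autoreleasepool block: send the deferred releases. */
static void pool_drain(Pool *p) {
    while (p->count > 0) obj_release(p->pending[--p->count]);
}
```

An object that was only autoreleased is freed by `pool_drain`; one that was also retained survives the drain until its owner releases it, which matches "they stick around until their reference count drops to zero".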

Yes, there is a default autorelease pool for the event loop.

I know. That's why I wrote "typically".

> This is obviously more memory-efficient than the autorelease pool, and that was probably a big factor in Apple's decision to go with ARC

Another big factor was the failure of implementing a tracing GC.

It had even more corner cases than ARC, it was quite easy to make it crash given the underlying C semantics and developers mixing frameworks, not all of them GC enabled.

The good idea behind ARC is that it just automates what framework developers were already doing manually with Cocoa.

besides being substantially integrated, blocks are just hella-handy. objc programming got way more pleasant when blocks were added.

The best thing about this was the link about the "Spirit of C", which shows that even ANSI agrees that trusting the programmer is no longer a good idea.

"12. Trust the programmer, as a goal, is outdated in respect to the security and safety programming communities. While it should not be totally disregarded as a facet of the spirit of C, the C11 version of the C Standard should take into account that programmers need the ability to check their work."

In practical terms it's not very dramatic. C11 lets you query and set exact alignment with alignof and alignas. There are also bounds-checking (_s) functions that report overflow through a runtime-constraint handler and an errno_t return value. And _Generic can select an expression based on an argument's type in certain cases.

All of these things simply help you 'check your work'. Bounds checking has been standard good practice for years now, so it's not a surprise they put that in the standard library. There is still no magic, and it's the programmer who has to put in the effort to check for errors.
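For readers who haven't used the C11 features mentioned here, a minimal sketch follows. The struct `vec4`, the macro `KIND_OF` and the helper `is_aligned` are made up for illustration; only `alignas`, `alignof`, `static_assert` and `_Generic` come from the standard.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>
#include <stdint.h>

/* Ask for 16-byte alignment; the struct inherits it from its member. */
struct vec4 { alignas(16) float v[4]; };

/* Checked at compile time: this is the "check your work" part. */
static_assert(alignof(struct vec4) == 16, "vec4 must be 16-byte aligned");

enum kind { KIND_INT, KIND_DOUBLE, KIND_OTHER };

/* _Generic picks one branch at compile time from the argument's type. */
#define KIND_OF(x) _Generic((x), \
    int:     KIND_INT,           \
    double:  KIND_DOUBLE,        \
    default: KIND_OTHER)

/* Run-time check that an address is a multiple of the given alignment. */
static int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
```

None of this adds run-time checking by itself; the programmer still has to write the assertions, as the comment above says.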

> Bounds checking has been standard good practice for years now, so it's not a surprise they put that in the standard library.

Not really. They were never part of the required standard library: C11 made them an optional annex (Annex K), and a C11-compliant compiler is not required to provide them.

For me those "secure" functions are just as insecure as their cousins, because they still require C developers to track pointers and bounds separately.

The only thing they improve in regards to security is not forcing us to manually place a null character at the very end, in case the destinations are filled up. But the danger of passing the wrong length, origin or destination is still there.
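Because Annex K is optional, portable code has to test for it. The sketch below assumes a hypothetical helper `copy_bounded`: it uses `strcpy_s` only when the implementation defines `__STDC_LIB_EXT1__`, and otherwise falls back to a hand-written check. Either way the caller still passes the buffer size separately, which is exactly the weakness described above.

```c
#define __STDC_WANT_LIB_EXT1__ 1  /* must come before the first #include */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which holds dst_size bytes.
   Returns 0 on success and nonzero if src (with its terminator) does not fit.
   Nothing verifies that dst_size really describes dst. */
static int copy_bounded(char *dst, size_t dst_size, const char *src) {
#ifdef __STDC_LIB_EXT1__
    /* Annex K is available: make violations return an error
       instead of running the implementation-defined default handler. */
    set_constraint_handler_s(ignore_handler_s);
    return strcpy_s(dst, dst_size, src) != 0;
#else
    /* Fallback for libraries without Annex K. */
    size_t len = strlen(src);
    if (dst_size == 0 || len >= dst_size) {
        if (dst_size > 0) dst[0] = '\0';  /* leave a valid empty string */
        return 1;
    }
    memcpy(dst, src, len + 1);  /* len + 1 copies the terminator too */
    return 0;
#endif
}
```

Calling `copy_bounded(buf, sizeof buf, "abc")` with a 4-byte `buf` succeeds, while `"abcd"` is rejected and leaves `buf` empty.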

I think threads, atomics and bounds checking were all TR proposals up until C11. Then C11 added them as conditional (optional) features: threads and atomics can be omitted (__STDC_NO_THREADS__, __STDC_NO_ATOMICS__), and bounds checking (Annex K) and 'analyzability' (Annex L) are conditional as well.

If you have to deal with strings in C, more than likely you have written a string interface with bounds checking. It's not clear if it needs to be standard, so it's good that it's optional. Icc, gcc and clang support it so I would not hesitate to use it.

My point was that 'safety' in C is still more or less a process of training the programmer rather than a language feature. And that was and still is a design goal. But at least now you can point to the bounds checking functions and other new features as an example of how to get there.

I'm not going to argue whether C is safe. Or whether XYZ is safer than C. That's way off topic.

I wasn't aware that there was something wrong with the old compilers. Reading the page, I'm still not aware of the motivations behind this.

What does this compiler do better/different than clang or gcc?

Apple's Objective-C runtime is not portable. There's a lot of assembly code and platform-specific code. This is trying to fix that.

Doesn't at least objc_msgSend have to be assembly?

You can do it in plain C, but it will be quite slow.

It isn't only written in Assembly, it is written in clever optimised Assembly fine tuned across several compiler releases, to minimise as much as possible the impact of dynamic dispatch.

I don't remember why, but years ago I came to the firm conclusion that it was impossible to write it in anything but assembly.

Don't tell that to the GCC Objective C runtime.

With the NeXT/Apple runtime,

    [object selector: arguments...]
is syntactic sugar for

    objc_msgSend(object, @selector(selector:), arguments...)
With the GCC runtime, that's syntactic sugar for:

    IMP tmp = objc_msg_lookup(object, @selector(selector:));
    tmp(object, @selector(selector:), arguments...)

No assembly necessary. I believe there are also compiler flags to assume the function won't be replaced behind your back so objc_msg_lookup can be hoisted out of a loop, etc.
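The lookup-then-call shape, and the hoisting it allows, can be sketched in plain C. This is a toy model rather than the GCC runtime: `Object`, `Method`, `msg_lookup` and the string-keyed method table are hypothetical stand-ins for real classes, selectors and `objc_msg_lookup`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Object Object;

/* Toy IMP with a fixed signature; real IMPs vary per method. */
typedef long (*Imp)(Object *self, const char *sel, long arg);

typedef struct { const char *sel; Imp imp; } Method;
struct Object { const Method *methods; size_t method_count; long total; };

/* Implementation of a hypothetical "add:" method. */
static long counter_add(Object *self, const char *sel, long arg) {
    (void)sel;
    self->total += arg;
    return self->total;
}

static const Method counter_methods[] = { { "add:", counter_add } };

/* Plays the role of objc_msg_lookup: return the function, do not call it. */
static Imp msg_lookup(const Object *obj, const char *sel) {
    for (size_t i = 0; i < obj->method_count; i++)
        if (strcmp(obj->methods[i].sel, sel) == 0)
            return obj->methods[i].imp;
    return NULL;
}

/* The lookup is hoisted out of the loop; only the call repeats. */
static long add_range(Object *obj, long n) {
    Imp add = msg_lookup(obj, "add:");
    long last = 0;
    if (add == NULL) return -1;
    for (long i = 1; i <= n; i++)
        last = add(obj, "add:", i);
    return last;
}
```

The caller makes the final call itself, with arguments it knows the types of, which is why no argument-forwarding stub is needed in this style.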

objc_msg_lookup is very emphatically not objc_msgSend.

Lookup operates differently from the nominally tail-calling msgSend in that it cannot support transparent forwarding (where the receiver is switched to another object before dispatch.) Since objc_msgSend is given control over the final call, the destination can be swapped if necessary.

It's not possible[1] to implement objc_msgSend in plain (portable) C, as it needs to pass on an unknowable number of arguments without damaging the ones passed in registers, the stack frame, or the optional hidden return value pointer. Any methods attempting to do so using, say, naked functions and the method's type encoding will be more fragile than a perfect argument forwarding stub written in assembly.

[1]: __builtin_apply_args, __builtin_apply and __builtin_return (https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gcc/Constructing-Ca...) notwithstanding.

GCC's C is emphatically not "plain" (ANSI/ISO) C, but that does not mean it isn't "portable" C, given that GCC itself targets many platforms, and other compilers (clang, at least) do attempt to support GCC's dialect.

I couldn't find documentation that supports Clang offering the __builtin_apply intrinsics, but their availability would make a C objc_msgSend somewhat more palatable.

Ah. I don't mean to be too argumentative. You've helped me to understand why objc_msgSend() could not be done in standard C, and I thank you for that :)

However, in the case of GCC's ObjC implementation, I don't see why it couldn't be implemented in GCC's dialect, thus obviating any need for assembly.

Pardon me while I reflect on (and lament) the fact that we're still stuck using such crude and ancient tools for systems programming...

FWIW, it appears that Mulle-objc gets around this by actually changing the calling convention for Obj-C methods, so the IMP takes a single parameter (beyond self and _cmd) which is a struct that contains all of the method parameters. This way the compiler can just synthesize the struct for you, and then objc_msgSend() itself can be written in C because it just needs to pass the pointer to the struct along to the IMP rather than messing about with forwarding arbitrary arguments.
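A rough sketch of that idea in plain C is below. It is not mulle-objc's actual API: `Obj`, `Imp`, `msg_send`, the string selectors and the `area_params` block are all invented for illustration. The point is only that once every method takes (self, selector, pointer-to-parameters), the dispatcher has a fixed signature and needs no assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Obj Obj;

/* Every method has the same C signature: arguments travel in a block. */
typedef void *(*Imp)(Obj *self, const char *sel, void *params);

typedef struct { const char *sel; Imp imp; } Method;
struct Obj { const Method *methods; size_t count; };

/* Block the compiler would synthesize for a hypothetical
   "-areaWithWidth:height:" call; the result is written back into it. */
struct area_params { int width; int height; int area; };

static void *shape_area(Obj *self, const char *sel, void *params) {
    struct area_params *p = params;
    (void)self;
    (void)sel;
    p->area = p->width * p->height;
    return NULL;
}

static const Method shape_methods[] = {
    { "areaWithWidth:height:", shape_area },
};

/* The dispatcher: ordinary C, because it only passes one pointer along. */
static void *msg_send(Obj *self, const char *sel, void *params) {
    for (size_t i = 0; i < self->count; i++)
        if (strcmp(self->methods[i].sel, sel) == 0)
            return self->methods[i].imp(self, sel, params);
    return NULL;  /* a real runtime would forward or raise here */
}
```

The cost is that arguments go through memory instead of registers, and that methods compiled this way are not call-compatible with IMPs from other runtimes.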

Interesting. Thanks!

I don't see anything w.r.t. objc_msgSend() that obviously prevents it from being written in C, as opposed to objc_msg_lookup(). I mean, worst case scenario, could it not just be:

    id objc_msgSend(id object, SEL selector, ...) {
        IMP tmp = objc_msg_lookup(object, selector);
        return tmp(object, selector, ...);
    }
(forgive me for any mistyping here, as I'm not an Obj-C programmer, and for the varargs shorthand, to keep things simple)? Especially given that GCC supports some extensions to C that could reduce the overhead a bit (tail-recursion elimination, for example), I don't see why such a function couldn't be implementable in (GCC) C.

Would you be willing to enlighten me?

It's the ... that's the hard part; I've explained a little more here: https://news.ycombinator.com/item?id=13046391.

Conceptually, it is:

    IMP tmp = objc_msg_lookup(object, selector);
    goto *tmp;
but C can't handle that (and even with gcc computed gotos you need to preserve any argument registers and not screw up the stack).

objc_msgSend has a variable number of arguments and no way to determine how many there are.
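What portable C does offer is forwarding a `va_list`, and a small sketch shows why that isn't enough. The functions `sum` and `sum_v` are hypothetical: forwarding works here only because the callee is written to accept a `va_list` and is told the count, whereas an IMP is an ordinary function expecting ordinary arguments.

```c
#include <assert.h>
#include <stdarg.h>

/* The callee must be written specifically to take a va_list,
   and it needs `count` to know when to stop reading. */
static int sum_v(int count, va_list ap) {
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);  /* the type is assumed, not discovered */
    return total;
}

/* A variadic wrapper can pass its arguments on only as a va_list;
   standard C has no syntax for calling another function "with the same ...". */
static int sum(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = sum_v(count, ap);
    va_end(ap);
    return total;
}
```

Nothing in standard C turns `ap` back into a plain call such as `imp(self, _cmd, a, b, c)`, which is the step objc_msgSend would need.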

I don't see why the naive approach to a possible implementation cannot be done in C, even if using extensions to ANSI C.

No. Why? What's magic about assembly code that means you couldn't do the same thing in C?

The objc_msgSend function doesn't know what its return type is or how many arguments it has. It needs to look up the actual function and then jump to it without modifying the stack or any of the parameter registers. It simply doesn't follow the calling conventions, and C can't handle it.

Right, true. I guess I'm used to working on languages where you can stick everything into a uint64, but that isn't the case with something like returned structs. If you controlled the calling convention you could do it in C.

It appears that controlling the calling convention is exactly what Mulle-objc is doing. It's layering a new convention on top of C calling conventions, where objc_msgSend() is given a pointer to a struct containing the arguments. This way the compiler can synthesize the struct for you, and objc_msgSend() itself can be written in vanilla C.

While in this case I can't think of any specific reasons beyond performance to need to implement objc_msgSend in assembly, there are absolutely instances where assembly is the only option.

Example: when performing a context switch in a kernel, assembly is required since C abstracts away architecture-specific details necessary to swap out the register set, swap the page directory pointer, etc.

My ObjC-fu is pretty old --- the last time I looked at it seriously was when poc was still a thing --- but my understanding is:

Traditionally objc_msgSend is in assembly because it doesn't know how many and what types the message parameters are. So it can't portably preserve them while it does the computations to find out where to send the message to.

This project's alternative objc_msgSend always passes parameters in a structure. That helps, because now you know that all message calls have three parameters: the destination object, the selector, and the parameter block.

However... they don't always pass the return value in the message block; instead it coerces any integer types to void* and returns them in the usual way, only using an extra field in the parameter block for those values which aren't coercible to void*.

This does make me kinda unhappy. I don't think it'll be a problem, but... it's definitely pushing the limits of what C allows; round-tripping arbitrary integers through pointers is one of those things that's not technically allowed but all real-world compilers support, so, um. I'd much rather they just pass everything through the parameter block.
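The round trip in question looks roughly like this. `box_int` and `unbox_int` are hypothetical names, not mulle-objc functions, and the sketch assumes a platform that provides `intptr_t` (the type is optional in the standard). The standard only guarantees that a valid `void *` survives a trip through `intptr_t`; the reverse direction shown here is implementation-defined, although mainstream compilers preserve the value.

```c
#include <assert.h>
#include <stdint.h>

/* Smuggle an integer return value out through a pointer-typed result.
   Integer-to-pointer conversion is implementation-defined in C. */
static void *box_int(intptr_t value) {
    return (void *)value;
}

/* Recover the integer on the caller's side. The pointer is never
   dereferenced, only converted back. */
static intptr_t unbox_int(void *boxed) {
    return (intptr_t)boxed;
}
```

Values wider than a pointer, or floating-point and struct results, cannot travel this way, which is presumably why those go through the parameter block.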

But yeah, it's a decent enough idea, which simplifies things considerably at not a lot of performance hit. I'm kind of surprised that ObjC doesn't already do that.

My main disappointment is that this is a clang variant and not a standalone preprocessor. Since poc died there isn't a working ObjC preprocessor, which means if you want to use ObjC you're limited to gcc or clang. An ObjC->C preprocessor would solve that.

I think any real world systems C project will have undefined behaviour. I'm sure HotSpot, V8, .net etc are all totally undefined.

The author says the goal is to be able to keep writing Objective-C in the future. I can get that. I did like ObjC. He also lists some good technological goals like speed, portability, and the rest on the page.

There is some more information available here: https://www.mulle-kybernetik.com/weblog/2015/mulle_objc_a_ne...

Apple used to be the sponsor of Objective C for a long time. Was Apple's focus on Swift prompting this? From his documentation page:

> About mulle-objc

> When I wrote the Optimizing Objective-C Article Series, Objective-C had pretty much weathered its first hostile obsolesence attempt by Java. Now a decade later, it looks like the time has come, that I have to save Objective-C - at least for myself.

Apple is pretty open that they plan to use Swift to replace C and Objective-C on their stack.


"Swift is a successor to both the C and Objective-C languages."

Of course this is something that will not happen overnight, and the language still has some corners to fine-tune.

Objective-C improved a lot since the NeXT days, but the C part is still there, regardless of the safety improvements it got on top.

Also since Swift has been announced, the only language improvements Objective-C got were related to improving the interoperability between both languages.

So I imagine some devs that are passionate about Objective-C might not like the day Objective-C joins Carbon and MPW.

I surely did not like it when that happened to Object Pascal, Mac Common Lisp or Dylan.

> I surely did not like it when that happened to Object Pascal, Mac Common Lisp or Dylan.

The difference here is that Objective-C's lack of safety and its relatively low level of abstraction make it less than ideal as an applications language, especially compared to those languages. As you say, "the C part is still there".

I'm with you. I want to be using safer languages, both because there is inherent value in safety, and because safety tends to come with more and better tools for building abstractions.

While I lament the loss of Obj-Pas, MCL, and Dylan, I whole-heartedly welcome the arrival of Swift. Even if it means nothing to me directly anymore as an apostate of the Apple platform, I hope that it will fuel a push for better, safer languages across the board. In many ways, I think technologies like MCL and Dylan were victims of being ahead of their time.

These are exciting times to live in as a programming language geek/snob, as the lessons of the past are finally rediscovered, absorbed, and made practical, popular, and cool :)

Fully agree with you.

While I find this an interesting experiment, after nearly 18 years of Objective-C dating back to NeXT I have no desire to go back after switching to Swift. The future (at least some part of it) belongs to modern languages like Swift, Rust, Clojure, etc.


"Let's make Objective-C great again"
