
An Illustrated History of objc_msgSend - twsted
http://sealiesoftware.com/msg/index.html
======
pavlov
It's cool to get a look at a piece of assembly code that seems oddly familiar
from all those times when my app has crashed in objc_msgSend due to a
prematurely released object... (I still write non-ARC code for compatibility
reasons. Sigh.)

I find the following claim slightly misleading:

 _32-bit x86 was quickly overtaken by 64-bit, so this code received little
attention after Tiger and many of those inefficiencies remain to this day._

There are still major 3rd party apps that run in 32-bit mode on the Mac.
Google Chrome is probably the most popular.

~~~
mikeash
It's perfectly correct from Apple's perspective (which is the perspective from
which this article is written). Apple started shipping 64-bit Intel Macs less
than a year after they started shipping Intel Macs at all. OS X 10.4 was the
first version to support Intel, 10.5 was the first version to fully support
64-bit, and by 10.6 most of the system was running 64-bit. 10.7 dropped
support for 32-bit Macs entirely.

Apple had little reason to improve 32-bit since apps can simply target 64-bit
if they want the speed improvement. From Apple's perspective, Chrome makes a
deliberate choice to run in that environment and they have no particular
reason to help them out.

~~~
jamesaguilar
Anyway, since Chrome is mostly C++ code, especially in the inner loop, I doubt
that a faster objc_msgSend would noticeably improve performance.

------
taspeotis
Does anyone know if this function is actually written in asm? Case in point:

    
    
        The code was probably originally written for NeXT
        by an engineer who was familiar with load-store
        architectures like PowerPC but not so familiar with
        register-memory architectures like x86 ... Those of
        you who do know x86 better may be able to identify
        some of the inefficiencies in this code
    

Why couldn't this function be written in C, leaving it to an optimizing
compiler to get the instructions right? (Are there any instructions that would
need non-portable intrinsics?) Sure, sending messages is low-level and needs
to be high performance, but that, to me, doesn't necessitate "we have to do
this by hand" asm instead of C.

~~~
chrisdevereux
objc_msgSend would actually be impossible to implement in C.

Its role is to look up the function pointer that implements a method, then
call that function pointer with the arguments passed to objc_msgSend,
returning the result of the implementation.

Implementing objc_msgSend in asm guarantees that the argument & return
registers aren't touched by the method lookup. Similarly to how ffi libraries
and implementations of setjmp/longjmp need to be implemented in asm, it
operates at a lower level than C's stack & function abstractions.

~~~
mikeash
To elaborate on this, a C implementation of objc_msgSend would look something
like this:

    
    
        id objc_msgSend(id self, SEL _cmd, args...) {
            IMP methodImplementation = ...; // look up the method however
            return methodImplementation(self, _cmd, args...);
        }
    

But there is no facility in C that lets you take arbitrary additional
arguments and then pass them all to another function unchanged.

(In theory you could accomplish this using variadic functions. But then every
method would have to take a va_list for its parameters instead of just taking
a regular parameter list, breaking the idea that ObjC methods are just C
functions with two implicit parameters. It would also be substantially
slower.)
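
A minimal sketch of the variadic workaround described above, in plain C. Every `toy_` name is hypothetical and none of them is a real runtime symbol. It assumes selectors are C strings and every method returns `long`. Note how the method has to unpack its own arguments from a `va_list` instead of declaring an ordinary parameter list:

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the runtime's types; not the real ObjC ones. */
typedef const char *toy_sel;
typedef long (*toy_imp)(void *self, toy_sel cmd, va_list args);

struct toy_method {
    toy_sel name;
    toy_imp imp;
};

/* A "method" in this scheme cannot declare (long a, long b); it has to
   pull each argument out of the va_list itself. */
static long toy_add(void *self, toy_sel cmd, va_list args) {
    (void)self;
    (void)cmd;
    long a = va_arg(args, long);
    long b = va_arg(args, long);
    return a + b;
}

static const struct toy_method toy_methods[] = {
    { "add::", toy_add },
};

/* The dispatcher looks the selector up, then forwards the packed
   arguments. Unknown selectors return 0 in this sketch. */
long toy_msg_send(void *self, toy_sel cmd, ...) {
    size_t count = sizeof toy_methods / sizeof toy_methods[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(toy_methods[i].name, cmd) == 0) {
            va_list args;
            va_start(args, cmd);
            long result = toy_methods[i].imp(self, cmd, args);
            va_end(args);
            return result;
        }
    }
    return 0;
}
```

A call such as `toy_msg_send(obj, "add::", 2L, 3L)` works, but only because `toy_add` was written against `va_list`; an ordinary C function with a normal signature could not be dropped in as the implementation.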

------
gilgoomesh
Fascinating stuff. It's so weird to realize that such a critical piece of code
is still changing significantly after 10 years at NeXT and 10 years at Apple.

~~~
kryptiskt
I think it's weirder that such critical bits are so often still the makeshift,
unoptimized first version. But I know how it works in the real world: it's
good enough and there are no reported bugs, so don't touch it until you have
implemented all these features and fixed all these issues....

------
nviennot
Related: If you want to see what messages are being passed around in your
programs, you can use this:
[https://github.com/nviennot/objc-tracer](https://github.com/nviennot/objc-tracer)

~~~
adamnemecek
While this was probably an interesting exercise, I should remind everyone that
one can get the same functionality by setting the environment variable
NSObjCMessageLoggingEnabled to YES.

------
fyolnish
Where does he get that code from? The actual objc_msgSend function is much
much longer and employs a bunch of other tricks.

~~~
mikeash
Where do you get that idea? What's shown is it. There's a lot more to the
message send system, but the rest is helpers that live outside of objc_msgSend
and aren't on the fast path. There's no _room_ for "a bunch of other tricks",
because objc_msgSend needs to be as fast as possible, and the more it does,
the slower it'll run.

~~~
jws
In the Mavericks version there are two jumps to labels that are not present in
the code shown. One is for tagged pointers, the other is for a cache miss. The
cache miss is probably interesting, tricky code.

Apropos of nothing… the cache search algorithm is "enter at a predictable point,
linear search everything". When one thinks of all the PhD time spent
developing and analyzing search data structures, this is a good reminder that
at small N, other factors dominate.
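
The "enter at a predictable point, linear search everything" idea can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the runtime's actual cache: the `toy_` names, the table size, and the hash (shifted pointer bits masked to the table size) are all made up here, and selectors are assumed to be unique pointers that can be compared by address.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SIZE 8 /* power of two, so masking wraps the index */

typedef void (*toy_cache_imp)(void);

struct toy_bucket {
    const char *sel; /* NULL marks an empty slot */
    toy_cache_imp imp;
};

struct toy_cache {
    struct toy_bucket buckets[TOY_CACHE_SIZE];
};

/* The "predictable point": an index derived from the selector's address. */
static size_t toy_cache_start(const char *sel) {
    return (size_t)(((uintptr_t)sel >> 2) & (TOY_CACHE_SIZE - 1));
}

void toy_cache_insert(struct toy_cache *c, const char *sel, toy_cache_imp imp) {
    size_t start = toy_cache_start(sel);
    for (size_t n = 0; n < TOY_CACHE_SIZE; n++) {
        struct toy_bucket *b = &c->buckets[(start + n) & (TOY_CACHE_SIZE - 1)];
        if (b->sel == NULL || b->sel == sel) {
            b->sel = sel;
            b->imp = imp;
            return;
        }
    }
    /* Table full: this sketch simply drops the entry. */
}

/* Linear scan from the start index; an empty slot means a cache miss. */
toy_cache_imp toy_cache_lookup(const struct toy_cache *c, const char *sel) {
    size_t start = toy_cache_start(sel);
    for (size_t n = 0; n < TOY_CACHE_SIZE; n++) {
        const struct toy_bucket *b = &c->buckets[(start + n) & (TOY_CACHE_SIZE - 1)];
        if (b->sel == sel) return b->imp;
        if (b->sel == NULL) return NULL;
    }
    return NULL;
}

static void toy_hello(void) {}
static void toy_bye(void) {}

/* Usage example: returns 1 if two hits and one miss behave as expected. */
int toy_cache_selftest(void) {
    static const char sel_hello[] = "hello";
    static const char sel_bye[] = "bye";
    static const char sel_missing[] = "missing";
    struct toy_cache cache = {0};
    toy_cache_insert(&cache, sel_hello, toy_hello);
    toy_cache_insert(&cache, sel_bye, toy_bye);
    return toy_cache_lookup(&cache, sel_hello) == toy_hello
        && toy_cache_lookup(&cache, sel_bye) == toy_bye
        && toy_cache_lookup(&cache, sel_missing) == NULL;
}
```

With only a handful of entries, the scan touches a few adjacent buckets, which is the "at small N, other factors dominate" point: there is no tree or chained list to chase.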

Also, adding a one byte instruction prefix that you know will be ignored
because your processor is happier if instructions aren't so bunched up is so
close to witchcraft that the author should avoid campfires with stakes in the
center.

~~~
mikeash
The cache miss may or may not be interesting, depending on your interests, but
it's all written in C and is much longer and more boring. As soon as you miss
the cache, you're off the fast path, so the cache miss code in objc_msgSend
just does the requisite register preservation and then calls into C.

------
program
As you can clearly see from the code, it's perfectly acceptable in Objective-C
to send a message to nil. This is a powerful feature that prevents the program
from crashing, but it is confusing for beginners.

~~~
pavlov
It doesn't seem particularly confusing. It's pretty much the first thing you
learn: square brackets send a message, and if there's no recipient, the return
value will be zero. (Granted, it used to be more complicated in the PowerPC
era, when e.g. a selector that returns a double was not guaranteed to return
0.0 with a nil receiver, but those cases were fixed in the runtime.)

I suspect the messaging-to-nil behavior is mostly confusing to intermediate
programmers who only have experience with Java's passive-aggressive null
references.

~~~
program
In my opinion the confusing part is not about SEGFAULT or
NullPointerException. Let's take an example:

    
    
        NSArray *obj = nil;
        [obj count]; // returns 0
    

In theory a nil object can hide in your code for a long time if you only use
methods that can return 0.
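
The same ambiguity can be shown in plain C. This is a hedged sketch with hypothetical names (`toy_array`, `toy_send_count`), assuming a dispatcher that does what the thread describes: check the receiver first and return zero when it is nil.

```c
#include <stddef.h>

/* Hypothetical stand-in for an array object. */
struct toy_array {
    size_t count;
};

/* Mimics sending "count": a nil receiver skips the call and yields zero. */
size_t toy_send_count(const struct toy_array *receiver) {
    if (receiver == NULL) {
        return 0; /* message to nil: no crash, just zero */
    }
    return receiver->count;
}
```

From the caller's side, `toy_send_count(NULL)` and a call on a genuinely empty array both produce 0, so the result alone cannot tell a missing object from an empty one.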

~~~
vor_
It's a trade-off, because it's convenient to not have to check that objects
are non-nil when you send messages to them. With experience, you learn to deal
with code that requires non-nil objects via assertions:

    
    
        NSAssert(array, @"Array can't be nil.");

------
jarjoura
Would be great to see the changes made in the ARM variants too. I know there's
no practical value in compiling libobjc for ARM, but it would at least have
academic value.

