
Dissecting objc_msgSend on ARM64 - ingve
https://www.mikeash.com/pyblog/friday-qa-2017-06-30-dissecting-objc_msgsend-on-arm64.html
======
pakl
> objc_msgSend is written in assembly. There are two reasons for this: one is
> that it's not possible to write a function which preserves unknown arguments
> and jumps to an arbitrary function pointer in C.

Wow... this is a bit off topic but can anyone expand on this side note and
explain why?

(Every Objective-C implementation _requires_ assembly code?)

~~~
mikeash
A C implementation of objc_msgSend would look like:

    
    
        ... objc_msgSend(id self, SEL _cmd, ...) {
            fptr = ...lookup code...
            return fptr(self, _cmd, args...)
        }
    

There's no way to express that args... argument when calling the function
pointer, and no way to express forwarding an arbitrary return value.

However, Objective-C does not require objc_msgSend. With objc_msgSend, a
method call site generates code that's essentially equivalent to (for a method
that takes one object parameter and returns void):

    
    
        ((void (*)(id, SEL, id))objc_msgSend)(object, selector, parameter);
    

In other words, take objc_msgSend, cast it to a function pointer of the
correct type, and call it.

Instead of objc_msgSend, the runtime can provide a function which looks up the
method implementation and returns it to the caller. The caller can then invoke
that implementation itself. This is how the GNU runtime does it, since it
needs to be more portable. Their lookup function is called objc_msg_lookup.
The generated code would look like this:

    
    
        void (*imp)(id, SEL, id) = (void (*)(id, SEL, id))objc_msg_lookup(object, selector);
        imp(object, selector, parameter);
    

However, each call now suffers the overhead of two function calls, so it's a
bit slower. Apple prefers to put in the extra effort of writing assembly code
to avoid this, since it's so critical to their platform.
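
To make the lookup-then-call shape concrete, here is a minimal sketch in plain C. It is not the GNU or Apple runtime: every `toy_` name, the string-based selectors, and the linear method table are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for runtime types; none of this is a real ObjC runtime. */
typedef struct toy_object *toy_id;
typedef const char *toy_SEL;
typedef void (*toy_IMP)(void); /* generic function pointer, cast before use */

struct toy_method { toy_SEL name; toy_IMP imp; };
struct toy_class  { const struct toy_method *methods; size_t count; };
struct toy_object { const struct toy_class *isa; int value; };

/* Plays the role of a lookup function: find the implementation, do not call it. */
toy_IMP toy_msg_lookup(toy_id receiver, toy_SEL sel) {
    const struct toy_class *cls = receiver->isa;
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].name, sel) == 0)
            return cls->methods[i].imp;
    }
    return NULL;
}

/* A "method": an ordinary C function with two hidden leading parameters. */
static int toy_add(toy_id self, toy_SEL _cmd, int amount) {
    (void)_cmd;
    self->value += amount;
    return self->value;
}

static const struct toy_method counter_methods[] = {
    { "add:", (toy_IMP)toy_add },
};
const struct toy_class counter_class = { counter_methods, 1 };

/* What a call site would expand to: look up, cast to the real type, call. */
int toy_send_add(toy_id obj, int amount) {
    toy_IMP raw = toy_msg_lookup(obj, "add:");
    assert(raw != NULL);
    int (*imp)(toy_id, toy_SEL, int) = (int (*)(toy_id, toy_SEL, int))raw;
    return imp(obj, "add:", amount);
}
```

The caller knows the method's real signature, so the cast and the second call need no assembly. The cost is the extra call into the lookup function.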

~~~
tom_mellior
> There's no way to express that args... argument when calling the function
> pointer

Yes there is: va_list.

> no way to express forwarding an arbitrary return value

Of course there is, and lots and lots of language runtimes implemented in C
use those ways. Usually it boils down to having a base type called Object or
Value and passing around pointers to that. In fact, from your example it looks
like the "id" type is meant to play this role.

This is not syntax checked, but the code above would be something like:

    
    
        Object *objc_msgSend(id self, SEL cmd, ...) {
            fptr = ...lookup code...
            va_list args;
            va_start(args, cmd);
            Object *result = fptr(self, cmd, args);
            va_end(args);
            return result;
        }
    

Yes, this can be faster in assembly, but it's not true that there is no way to
express this. (Unless I'm misunderstanding something.)

~~~
mikeash
These are ways to simulate it. Of course you can simulate it; the language is
Turing-complete, after all. But it does not actually _do_ it. You can write
something similar to objc_msgSend in C, but you cannot write objc_msgSend in
C.

Using varargs and passing va_list into the method would mean that your method
is no longer a plain C function with the declared parameters plus two hidden
parameters. It's now a different sort of beast, and has to use va_ calls to
extract the values. This would require a lot more work in the method, and hurt
performance.

Returning everything as an object would mean boxing and unboxing primitive
values at every call, which would be horrendously inefficient.

And if you don't care about extracting every last bit of performance, it's
much easier to do the lookup approach I discussed than it is to faff around
with varargs and wrapping return values.
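
A rough sketch of what the va_list approach would force on every method, in plain C. The names `toy_msgSend`, `toy_lookup`, and `scaled` are hypothetical, and the lookup is a stub that always returns the same function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Every method must share this one signature instead of its natural one. */
typedef int (*va_method)(void *self, const char *cmd, va_list args);

/* Would naturally be (int a, double b); here it has to unpack by hand. */
static int scaled(void *self, const char *cmd, va_list args) {
    (void)self;
    (void)cmd;
    int a = va_arg(args, int);
    double b = va_arg(args, double);
    return (int)(a * b);
}

/* Hypothetical lookup stub: always hands back the same method. */
static va_method toy_lookup(void *self, const char *cmd) {
    (void)self;
    (void)cmd;
    return scaled;
}

/* The dispatcher can forward the va_list, but not the original arguments. */
int toy_msgSend(void *self, const char *cmd, ...) {
    va_method imp = toy_lookup(self, cmd);
    assert(imp != NULL);
    va_list args;
    va_start(args, cmd);
    int result = imp(self, cmd, args);
    va_end(args);
    return result;
}
```

Note that the return type is also fixed at `int` here. A method returning a struct or a double would need yet another dispatcher or some form of boxing.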

~~~
revelation
va_* is not a simulation. It compiles down to the exact same stack accesses.
There is no list. It is a plain C function. It is the same calling convention.
No boxing.

This is plain false.

~~~
lgg
It depends on the platform's C ABI, but no, the argument marshaling for va_args
is not necessarily (or even usually) the same as normal args. In the case of
iOS you can look here[1], the relevant bit being: "The iOS ABI for functions
that take a variable number of arguments is entirely different from the
generic version."

This actually manifests in errors if you directly call objc_msgSend, which is
why in order to guarantee correct codegen you need to cast objc_msgSend to the
actual prototype you want[2]:

"An exception to the casting rule described above is when you are calling the
objc_msgSend function or any other similar functions in the Objective-C
runtime that send messages. Although the prototype for the message functions
has a variadic form, the method function that is called by the Objective-C
runtime does not share the same prototype. The Objective-C runtime directly
dispatches to the function that implements the method, so the calling
conventions are mismatched, as described previously. Therefore you must cast
the objc_msgSend function to a prototype that matches the method function
being called."

1:
[https://developer.apple.com/library/content/documentation/Xc...](https://developer.apple.com/library/content/documentation/Xcode/Conceptual/iPhoneOSABIReference/Articles/ARM64FunctionCallingConventions.html)
2:
[https://developer.apple.com/library/content/documentation/Xc...](https://developer.apple.com/library/content/documentation/Xcode/Conceptual/iPhoneOSABIReference/Articles/ARM64FunctionCallingConventions.html)

~~~
revelation
This is C, I'm talking C calling convention (and x64, which is the same).
Caller cleans up the stack, so va_list is a zero cost abstraction.

Citing the bastard architecture of iOS isn't really making the case for
"usually".

~~~
mikeash
Requiring the caller to put all arguments on the stack isn't "zero cost." For
a non-variadic call on ARM64, the first eight parameters (or more, if some are
floats) will be passed in registers without ever touching the stack.

On x86-64, the caller also has to set %al to the number of vector registers
used for the call, and the compilers I've seen always check %al and
conditionally save those registers as part of the function prologue. Cheap,
but not "zero cost."

~~~
revelation
va_ doesn't change the calling convention. Parameters passed as registers
continue to be passed as registers.

We could probably argue this some more but I suggest you simply try it with a
compiler..

~~~
mikeash
Good idea!

[https://gist.github.com/mikeash/ce38d3a77b88734a9e0e9dc3f352...](https://gist.github.com/mikeash/ce38d3a77b88734a9e0e9dc3f352fbc7)

You'll notice how `normal` takes all of its arguments out of registers `x0`
through `x7` and places them on the stack for the call to `printf`. And you'll
notice how `vararg` plays a bunch of games with the stack and never touches
registers `x1` through `x7`. (It still uses `x0` because the first argument is
not variadic.)

On the caller side, observe how `call_normal` places its values into `x0`
through `x7` sequentially and then invokes the target function, while
`call_vararg` places one value into `x0` and places everything else on the
stack.

So, no, it looks to me like varargs _very much_ change the calling convention.
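
For anyone who wants to repeat the experiment, a small pair of functions along these lines should be enough. This is a sketch and not the contents of the gist; the function bodies are assumptions. Compile it to assembly (for example with `clang -O1 -S`) for the target you care about and compare how the two callers pass their arguments.

```c
#include <assert.h>
#include <stdarg.h>

/* Fixed signature: arguments travel however the normal ABI says. */
int normal(int a, int b, int c, int d) {
    return a + b + c + d;
}

/* Variadic signature: only `count` is a named parameter. */
int vararg(int count, ...) {
    assert(count >= 0);
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Two call sites passing the same values, one per convention. */
int call_normal(void) { return normal(1, 2, 3, 4); }
int call_vararg(void) { return vararg(4, 1, 2, 3, 4); }
```

Both callers compute the same result, so any difference in the generated code comes from the calling convention alone. An optimizer may inline the calls, so marking the callees noinline or lowering the optimization level may be needed to see the argument setup.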

------
rurban
I'm a bit sceptical about the mandatory PIC (method cache) as hash. Usually
you put the most common classes into a small array upfront and search and
extend just that. The hash lookup would come in the slow part then. With
assembly it's easy to create the self-modifying PIC, from e.g. 0-3.

------
nimrody
Even the method cache seems expensive -- compared to an indirect function call
(using a virtual method table). Can clang replace msgSend with direct calls
when the destination class is known at compilation time? (perhaps with a guard
to verify that the object class is as expected)

~~~
thought_alarm
You have that option in performance-critical situations. The ObjC runtime does
allow you to look up a method's underlying function pointer ahead of time in
order to bypass objc_msgSend.

The compiler can't do that automatically because any method and any class
can be replaced at any time. Key-value observing is a common feature that
replaces method implementations on the fly at runtime.

~~~
plorkyeran
KVO creates a new subclass and changes your object to that subclass rather
than replacing methods on the original class, so checking the isa pointer
would be sufficient for that case. Method swizzling would break, but I suspect
that most obj-c code could be compiled without support for swizzling without
breaking anything.
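
The isa-guard idea can be sketched in plain C with toy types. Nothing here is the real runtime or anything clang emits; `toy_class`, `guarded_get`, and the one-slot method table are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_class;
struct toy_object { const struct toy_class *isa; int value; };
typedef int (*get_fn)(struct toy_object *self);
struct toy_class { get_fn get_value; };

static int base_get(struct toy_object *self) { return self->value; }
static int sub_get(struct toy_object *self)  { return self->value * 2; }

const struct toy_class base_class = { base_get };
const struct toy_class sub_class  = { sub_get };

/* Slow path: stand-in for full dynamic dispatch through the class. */
static int dynamic_get(struct toy_object *obj) {
    assert(obj->isa != NULL && obj->isa->get_value != NULL);
    return obj->isa->get_value(obj);
}

/* Guarded fast path: if the class is the one predicted at compile time,
   call the known implementation directly; otherwise dispatch dynamically. */
int guarded_get(struct toy_object *obj) {
    if (obj->isa == &base_class)
        return base_get(obj);
    return dynamic_get(obj);
}
```

An object whose isa has been swapped to a subclass fails the guard and takes the slow path, which is the KVO case. The guard says nothing about the method table of `base_class` itself changing, so the fast path is only valid while that table is never modified.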

~~~
icodestuff
Swizzling is not the only thing that would break. You'd break categories in
dynamically loaded frameworks or bundles, and yes, you'd still break KVO
because it replaces -dealloc and -class on the newly-created classes.

You'd also break dynamically adding methods to classes.

~~~
CodeWriter23
What is a typical use case for dynamically adding methods to a class?

------
tinus_hn
Great to see this series started again; articles with such an in-depth take
are rare.

