> As long as you cast it to the correct type, it always works. The new way of doing things thus encourages doing things correctly and makes it harder to do things wrong.
This seems like a really interesting case where removing type information (removing something that was presumably thought of as a safety guard) and requiring explicit casts actually makes the type safety stronger, by forcing the user to state the types explicitly. Maybe it makes sense that dynamic calling mechanisms should require casting (and static type checking is less safe there), whereas static calling mechanisms should use static type checking (and type casting is less safe there).
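For anyone who hasn't run into the casting idiom before, here's a minimal sketch of what "cast it to the correct type" looks like in practice (the object and selector are just placeholders, and I'm assuming a 64-bit target where NSInteger is long):

    #include <objc/message.h>
    #include <objc/runtime.h>

    // Hypothetical helper: send -setTag: (which takes an NSInteger) to an object.
    // With the void-returning prototype you have to spell out the full function
    // type yourself before calling objc_msgSend.
    static void set_tag(id obj, long tag) {
        typedef void (*SetTagFn)(id, SEL, long);   // assumes NSInteger == long (64-bit)
        ((SetTagFn)objc_msgSend)(obj, sel_registerName("setTag:"), tag);
    }

The cast forces you to write down the exact argument types, which makes the call go through the same argument-passing rules as a normal (non-variadic) function call.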
> C specifies that certain types get promoted to wider types when passed as a variadic argument. Integers smaller than int (such as char and short) get promoted to int, and float gets promoted to double. If your method signature includes one of these types, it's not possible for a caller to pass a parameter as that exact type if it's using a variadic prototype.
Holy crap! I’ve used these things for many years without knowing this important tidbit. Does anyone know if it’s the same in C++, or whether GPUs are currently ignoring or following the spec?
> When there is no parameter for a given argument [...] If the argument has integral or enumeration type that is subject to the integral promotions (4.5), or a floating point type that is subject to the floating point promotion (4.6), the value of the argument is converted to the promoted type before the call. These promotions are referred to as the default argument promotions.
Sections 4.5 and 4.6 spell out in detail what promotions are applied.
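For anyone who wants to see the rule in action, here's a tiny C sketch (not from the thread, just illustrating the promotions):

    #include <stdarg.h>
    #include <stdio.h>

    // A variadic callee never sees char, short or float: the caller has already
    // promoted them, so va_arg has to name the promoted types.
    static void show(int count, ...) {
        va_list ap;
        va_start(ap, count);
        int    c = va_arg(ap, int);     // a char argument arrives as int
        double f = va_arg(ap, double);  // a float argument arrives as double
        va_end(ap);
        printf("%d %f\n", c, f);
    }

    int main(void) {
        char  c = 'A';
        float f = 1.5f;
        show(2, c, f);   // promoted to int and double at the call site
        return 0;
    }

Asking va_arg for char or float here would be undefined behavior, which is exactly why a variadic prototype can't express "this argument is exactly a float".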
Were you reading "variadic arguments" as template arguments or function arguments? (Because I think this only applies to functions, as C lacks templates, and I haven't ever seen varargs in CUDA. Is that a thing?)
Hey again! I was thinking only functions, not templates, just wondering if C++ had changed things to not require the type promotion. CUDA has printf, but I suspect it’s implemented differently, not according to the C variadic function spec, reflecting on some of the weird ways that I’ve seen printf fail in CUDA code...
Whoa, this is quite the rabbit hole. This commit [1] and its partner [2] suggest that at least around 2016 there was some real special-casing along the lines of "yeah, but we really want printf". I couldn't find the CUDA runtime implementation on my phone, but now I'm curious what the PTX for a device function that calls printf looks like (perhaps it does a bunch of work to emit a call to the host's C printf, except it has/had to do so in a sort of strange way, building up buffers of things to print, exiting, and coming back afterwards).
I’m going to look it up Monday, pretty sure there is something surprising in the printf impl, and IIRC there’s like no type safety at all. I vaguely remember writing buggy code that printed out data that I didn’t pass to printf.
That's common though for variadic functions, particularly printf! Many printf implementations will blindly do a va_arg (advancing the argument pointer by 4 or 8 bytes) if the format string says so. If your call to printf looks like:
    printf("%d %d %d %d", x, y);
then you're going to "read" two more ints from the stack at the call site. clang (and later gcc, I think) thus has a semantic analysis pass that understands/parses printf-style format strings, so it can tell you "hey, you didn't pass enough arguments" or, just as frequently, "you passed too many".
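Side note for anyone writing their own printf-style wrappers: you can opt into that same check with the format attribute (the warning text in the comment below is roughly what clang prints; gcc's wording differs):

    #include <stdarg.h>
    #include <stdio.h>

    // The format attribute tells clang/gcc to run the printf format-string
    // check against calls to this wrapper, not just against printf itself.
    static void log_msg(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

    static void log_msg(const char *fmt, ...) {
        va_list ap;
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
    }

    int main(void) {
        int x = 1, y = 2;
        log_msg("%d %d %d %d\n", x, y);  // warning: more '%' conversions than data arguments
        return 0;
    }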
Yeah, of course, but that doesn’t by itself lead me to the conclusion that those are all first converted to another type before being passed on the stack. They could be passed as-is and type-cast inside the printf implementation.
The more interesting printf example would be %f, since according to what I learned today, it’s not a float, it’s a double.
Are you saying you already knew doubles were passed in variadics instead of float, and the reason you knew is printf?
I know from having to implement va_start/va_arg/va_end, and it comes down to the ABI. The challenge is how you have either the caller or the callee use a consistent piece of code for pushing/popping the stack.
The int case is pretty clear, because either way you go with a 32-bit register like most args in the C family of ABIs, and it only matters for register pressure (you're really just sign- or zero-extending char and short). I do find the float/double case clumsy, as is the int64 case, but it falls out of the promotion rules as mentioned and the ABI churn for both floating point and AMD64.
> They could be passed as-is and type-cast inside the printf implementation.
How would you know what to type-cast FROM? It's not like C compilers pass type information around, nor are C values self describing.
> Are you saying you already knew doubles were passed in variadics instead of float, and the reason you knew is printf?
I can't speak for the person you're replying to, but in pre-ANSI C these promotions were ubiquitous (since there were no function prototypes yet), so programmers were very likely to have heard about them at some point.
It's just that it burns tons more registers than the promotion scheme does. And you'd still end up with a mini "interpreter" in the callee to at least switch over the types.
No problem, Dave! I had to do precisely this for PHP’s variadic arguments. I think HHVM made up even more calling conventions.
Fwiw, the next optimization is to “compress” the type signatures into a handful of bits, to save registers. But it’s still not that big of a safety win for actual C code because of the clumsy switch inside the callee.
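Roughly what that could look like, purely as an illustration (the encoding and names are made up, and you can see the "mini interpreter" cost right in the switch):

    #include <stdarg.h>
    #include <stdio.h>

    // Hypothetical scheme: 2 bits per argument encode its (promoted) type.
    enum { T_INT = 0, T_DBL = 1, T_PTR = 2 };

    static void dump(unsigned tags, int count, ...) {
        va_list ap;
        va_start(ap, count);
        for (int i = 0; i < count; i++) {
            switch ((tags >> (2 * i)) & 3) {               // the callee still interprets
            case T_INT: printf("%d ", va_arg(ap, int));    break;
            case T_DBL: printf("%f ", va_arg(ap, double)); break;
            case T_PTR: printf("%p ", va_arg(ap, void *)); break;
            }
        }
        va_end(ap);
        putchar('\n');
    }

    int main(void) {
        dump(T_INT | (T_DBL << 2), 2, 42, 3.14);  // caller encodes the tags by hand
        return 0;
    }

And of course the caller encoding the tags by hand is exactly the kind of thing that goes wrong in real C code, which is why the safety win is limited.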
Quite the opposite — it's very precisely defined behavior if you bother to read the applicable standards. It's a rather useful exercise, and I'd recommend it to anyone who makes a living using a language.
Are there any conditions where the compiler can optimize away the call to objc_msgSend? Or is it always used for any call between 2 ObjC/Swift methods?
In general, no, as message sends are observable and swizzlable, so they must go through the bare minimum of objc_msgSend just to make sure that they can be resolved through the method cache. The only exception to this that I know of is the retain/release methods, which are rarely overridden and called extremely often, so they have code that doesn't go through the message sending machinery unless necessary (i.e. unless there is a custom implementation). And of course, pure Swift calls don't use the Objective-C runtime at all, so they can be optimized as usual.
Always wondered about this too, but there doesn't seem to be a way of knowing which implementation your code will end up calling in Objective-C. "Final" would be helpful but there's no "final" in the language.
Speculative inlining driven by heuristics or PGO should always be possible. Same thing is done for virtual functions in C++ (or any indirect call really).
Objective-C's OO model is far more dynamic than you may realize. There are no guarantees at run time, even about the type of the object reference, i.e. no guarantee that the method you are calling will be applied to an object of a specific or compatible type. ObjC is very permissive in this regard; it can give warnings in certain cases, but they are never 100% precise (in my experience anyway). You could run analysis on the entire end product and still be unsure of what's going to happen at run time.
This is also true for languages like Self as well which pioneered inline (and polymorphic) caches in the first place. Self did it using JITs, but you "only" need a sufficient hit rate to make up for the check+branch to justify inlining the most likely option(s) at compile time.
The interesting thing is that by inlining, for the inlined case you will often gain additional type information. E.g. to take an example from Ruby, since I don't know Objective-C very well: in isolation you have no way of telling what type "foo + 1 + 2 + 3" will return, as it depends entirely on "foo". But let's say most call sites calling the method where this expression is found pass an integer.
If I can guarantee that "foo + 1" is an Integer addition, then I know it will return an Integer, and so I know the same addition method will be used for the next addition (and by extension the next as well), so I can turn the above into the following Ruby-ish pseudo-code instead:
    if foo.is_a?(Integer)
      # By recursively inlining, I know not just the type of foo, but the type of the full expression.
      inlined foo + 1 + 2 + 3
    else
      foo.send(:+, 1).send(:+, 2).send(:+, 3)
    end
Even when you can't safely inline the actual calls, you can often elide checks or resort to more specialized method caching.
Yes, of course with a JIT you can do much better than an AOT compiler, and for dynamic languages it's pretty much required to get reasonable performance.
Well, compared to profile-guided optimization as mentioned by the other commenter earlier, that's really only the case if the profile of called methods varies greatly between runs.
The polymorphic inline caching from Self, for example, is guided by collecting simple stats. Tracing does the same. A JIT ensures those stats are always completely up to date, but nothing stops you from saving them and using them for an AOT compiler as well.
But often even that is overkill, as you can often statically deduce a lot about the types a method is likely to get called with by simply looking at the call sites, and most programs have very static call profiles.
Doesn't matter; at some point there's an indirect function call, and the compiler can try to guess the target, inline it, and add an address check that, on failure, falls back to the slow path.
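In C-ish terms, something like this (likely_handler is just a stand-in for whatever the profile says the hot target is):

    typedef int (*handler_t)(int);

    static int likely_handler(int x) { return x + 1; }   // hypothetical hot target

    static int dispatch(handler_t h, int x) {
        if (h == likely_handler) {
            return x + 1;        // body of likely_handler, effectively inlined
        }
        return h(x);             // guard failed: fall back to the indirect call
    }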
This actually seems like it’d be quite beneficial, as I’d assume 90+% of method call targets can be statically guessed just by looking at the code (to increase this ratio even more, I’m sure Apple could even ignore Cocoa methods that use the forwarding machinery).
I don't know about Objective-C specifically, but generally one way to achieve this in most dynamic languages is a speed/memory tradeoff.
In my Ruby compiler project which has to deal with the same level of dynamic behaviour, I handle dynamic overriding of methods with C++-style vtables, which turns method calls into the equivalent of this C-ish pseudo-code:
    (*ob->vtable[some_method_offset])(args...)
Since Ruby classes can have a method_missing handling undefined methods, for any class that doesn't implement a given method, the vtable contains a pointer to a thunk that tweaks the stack to push the relevant method symbol as the first argument and then does the same as above with the method offset of method_missing.
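A rough C model of what that looks like, heavily simplified (all names invented, and I'm passing the selector as a plain argument rather than actually tweaking the stack):

    typedef struct Object Object;
    typedef void (*Method)(Object *self, int selector);

    struct Object {
        Method *vtable;   // one global slot per known method name
    };

    enum { SEL_method_missing = 0, SEL_greet = 1, NUM_SELECTORS = 2 };

    // Regular dispatch: about as cheap as a C++ virtual call.
    static void call(Object *obj, int selector) {
        obj->vtable[selector](obj, selector);
    }

    // Thunk installed in the slot of every method a class does not implement:
    // it just re-dispatches to the class's method_missing with the original selector.
    static void missing_thunk(Object *self, int selector) {
        self->vtable[SEL_method_missing](self, selector);
    }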
Since Ruby classes can have methods overridden at any time, if class Bar inherits from class Foo, which inherits from class Object, and I override a method in class Object that has previously been explicitly overridden in class Bar, this would happen:
- Store pointer to method in Object in ptr.
- Replace method pointer in Object's vtable.
- Iterate over all direct sub-classes of Object (but here we only care about Foo)
- Compare the same method offset against ptr. Since Foo has not overridden the method, it matches.
- Replace the pointer in Foo, and iterate over all direct sub-classes of Foo (but here we only care about Bar)
- Compare the method offset against ptr. Since Bar has overridden the method, it doesn't match, so leave it alone.
This means that as long as method overrides don't happen extremely frequently, the cost of a method override is relatively low: iterate over all the descendant classes of the class you override a method in.
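A sketch of that propagation in C-ish pseudo-code (names invented, fixed-size child array just to keep it short):

    #define MAX_CHILDREN 8

    typedef void (*Method)(void);

    typedef struct Class Class;
    struct Class {
        Method *vtable;                   // shared layout: one slot per method name
        Class  *children[MAX_CHILDREN];
        int     num_children;
    };

    static void override_method(Class *cls, int slot, Method new_impl) {
        Method old = cls->vtable[slot];   // remember the inherited pointer
        cls->vtable[slot] = new_impl;
        for (int i = 0; i < cls->num_children; i++) {
            Class *child = cls->children[i];
            // Only descend into children still inheriting the old pointer; a child
            // with its own override is left alone and shields its whole subtree.
            if (child->vtable[slot] == old)
                override_method(child, slot, new_impl);
        }
    }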
Method calls on the other hand are about as cheap as virtual method calls in C++, except when you hit method_missing where this approach gives you a very low extra overhead of tweaking the stack to add the symbol and jumping to the method_missing implementation.
This overall approach works for most dynamic languages. The caveat is memory: if you have an application with very large class hierarchies in a language where they are all singly rooted (as in Ruby, where everything ultimately inherits from BasicObject), each vtable will cost you at least pointer_size * global_number_of_method_names. In practice so far I've not seen all that many cases where this is a problem, and it's always possible for the compiler to set a ceiling above which it resorts to a slow send mechanism (you need to support that anyway in any language that allows dynamic sends; e.g. in Ruby you can always send a message to an object by a dynamically obtained symbol, so you still need the equivalent of objc_msgSend as well).
A slightly cheaper approach in terms of memory was described by Michael Franz [1]. His approach was to group methods into interfaces, so instead of a vtable of method pointers, you have a vtable of pointers to interfaces, each of which holds pointers to methods. You save memory because most classes typically implement either most or none of the methods of a given interface; it provides potential namespacing of the methods if you want that; and you can cut memory further by reusing the same vtable for an interface until someone tries to override at least one method in it. The cost is one extra indirection at call sites.
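In the same C-ish pseudo-code, the two-level version looks roughly like this (again, names invented):

    typedef void (*Method)(void);

    typedef struct {
        Method methods[16];    // one interface groups a small, related set of methods
    } Itable;

    typedef struct {
        Itable **interfaces;   // vtable of pointers to interfaces instead of methods
    } Class;

    // Call sites pay one extra indirection: class -> interface -> method.
    static void call_via_interface(Class *cls, int iface, int slot) {
        cls->interfaces[iface]->methods[slot]();
    }

Classes that don't override anything in a given interface can all share the same Itable, which is where the memory saving comes from.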
Generally impractical for real-world ObjC, which has around 100,000 method names.
Apple ObjC used to have a simplified version of this which used a vtable for the most frequent 16 selectors. It wasn't considered profitable after sufficient optimizations on the hash-table based method cache.
Apart from sounding absolutely crazy (not doubting you; I've seen how verbose Objective-C can get), that sounds like the method names almost certainly consist of many sets of names that are each only used by small sets of classes, in which case the extra indirection in Franz's approach should work just fine and not require any caching.
What I saw when looking at this before choosing to go that route years ago is that most dynamic language implementations seem to have either rejected vtable-based approaches out of hand or never considered them at all, because they've come to be seen as a "write once" approach unsuitable for updates, and the default assumption has become that a complex lookup is needed.
It's been a few years since I looked, but at the time Franz's paper was the only one I found that investigated dynamic changes to objects at runtime using a vtable-like approach at all, and ironically he did so with a statically typed language... It seems like a curious blind spot to me. Maybe it genuinely is unsuitable for Objective-C, but most dynamic dispatch mechanisms I've looked at have been in environments where the number of names being looked up tends to be too small for even Franz's approach to be necessary.
(For Ruby the method name count tends to remain quite small, to the point where Franz's approach doesn't seem worth the cost of that extra indirection most of the time.)
No that’s not right. The implementation is not changing. The prototype is changing to break code that was poorly formed in the first place. If you had cast to the correct type in your code as you always should have, this change shouldn’t break anything.
As far as code-breaking changes go, the impact should be low. How many places in a code base should really need to call objc_msgSend directly? And the risk of problems is low: only if your code was silently relying on float corruption would you notice a change. Anybody making six figures should be able to analyze such a thing without too much whining.
Well, I'm not sure either who would rely on calling this function extensively, but why would you call it "whining"? Apart from the fact that earning six figures still isn't the norm for most Objective-C developers.
> The prototype is changing to break code that was poorly formed in the first place.
I guess calling it directly would even be poor form by itself so who knows.
> I guess calling it directly would even be poor form by itself so who knows.
The point being that there are limited cases where it is justifiable. For example, a VM or language interop would likely be justified. Those types of cases mean the absolute number of call sites should be small though. In other words, the problem is somewhat self-limiting. Probably, in appropriate use, the function would be called extensively, but likely not from many sites.
> Apart from the fact that earning six figures still isn't the norm for most Objective-C developers.
I have no horse in this race... I haven't worked in the industry for years.
This was also a somewhat tongue-in-cheek remark that I think you are reading too deeply into. The real point is, software engineers are generally paid well relative to median incomes and have a cushy job. On the scale of pain, this doesn't seem like a big deal - not even a Python 2 to 3 transition.
The BLS average salary for software engineers is $103k, not including bonuses, stock, etc. I would think the average engineer who would justifiably use objc_msgSend directly is paid more than the BLS average, which includes all the enterprise LOB app slaves that don't do incredibly well.
I mean, yeah, you're not "supposed to" use it; it's a private framework. Also, the only reason why someone like me would know if it's in C or not is that I need one of the aspects of that library that can't be expressed in the high level APIs.
My point is that there's tons of non-legacy C code in the OS outside of the kernel and POSIX subsystems on *OS.
I wonder if the new signature has any effect on pointer authentication (or was related to those changes). Also, I think the title would be better served with the correct capitalization of the method name, objc_msgSend.
As an aside, am I missing something or is Apple's doc on objc_msgSend describing the parameters of the old form? The parameters section doesn't seem to match the declaration section.