Can you recreate Objective-C in C? (orangejuiceliberationfront.com)
62 points by ingve 4 months ago | 37 comments



> Objective-C under the hood compiles every method call into a call to the function objc_msgSend().

Well, kinda. You need a family of functions (objc_msgSend, of course, but also objc_msgSend_stret, objc_msgSendSuper, and objc_msgSendSuper_stret) to get this to really work when you have structures as your return value or are calling a superclass's implementation (or both!). Since the assembly routine that is objc_msgSend has no idea what the type of the function it's calling is, the compiler must pick the appropriate variant based on the type information it has.


This is also why dynamic languages and bridges like PyObjC, RubyObjC, RubyCocoa, MacRuby, Nu, and F# had difficulty bridging certain functions and methods: you have to either create an insanely complex function that's able to take any @encode type string and turn it into a runtime type, or you have to shim it with guesses for the most common ones like NSRect et al.


LuaCocoa author here.

While the objc_msgSend, objc_msgSend_stret, et al. split is an annoying detail, it isn't really the hard part. (And in fact, all the good bridges use libffi to do more direct invocations and avoid a lot of this problem.)

The hard part of bridging is those places where Objective-C introspection is not powerful enough to discover the types of parameters or signatures of functions. Typically, this is all the C stuff.

So, for example, Obj-C's runtime introspection cannot tell you the makeup of a struct. It cannot tell you the size of NSRect, NSPoint, NSSize, or the individual data types inside the struct (or the names of each item if you need to access them).

And the Obj-C runtime introspection cannot tell you about what C functions exist and what the parameter types and return types are. So for example, there is no Obj-C runtime way to look up that CGPointMake takes two CGFloat parameters and returns a CGPoint struct.

Also, inline functions (macros) in C/Obj-C are problematic for these bridges, since the bridges need to call the functions at runtime.

Apple shipped a framework called BridgeSupport in Mac OS X 10.5 which contains XML data for all the things that cannot be determined at runtime. It also contains .dylibs with symbols for inline functions. BridgeSupport is still in macOS today, but it isn't getting much love and Apple keeps (accidentally?) breaking things in BridgeSupport every release.
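To make the missing piece concrete, here is a rough C sketch of the kind of side-table a bridge needs when the runtime can't describe a struct. It is not BridgeSupport's actual format (that is XML); ToyRect, FieldInfo, StructInfo, and struct_get_double are all hypothetical names, and the 'd' type code just borrows the @encode letter for double.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a plain C struct such as NSRect. */
typedef struct { double x, y, width, height; } ToyRect;

/* One entry per field: exactly the facts the runtime cannot report. */
typedef struct {
    const char *name;
    size_t offset;
    char type_code;              /* 'd' = double, as in @encode strings */
} FieldInfo;

typedef struct {
    const char *name;
    size_t size;
    const FieldInfo *fields;
    size_t field_count;
} StructInfo;

/* Hand-maintained metadata, playing the role of the external description. */
const FieldInfo toy_rect_fields[] = {
    { "x",      offsetof(ToyRect, x),      'd' },
    { "y",      offsetof(ToyRect, y),      'd' },
    { "width",  offsetof(ToyRect, width),  'd' },
    { "height", offsetof(ToyRect, height), 'd' },
};
const StructInfo toy_rect_info = {
    "ToyRect", sizeof(ToyRect), toy_rect_fields, 4
};

/* Read a double field by name from raw bytes, using only the metadata.
   Returns 1 on success, 0 if the field is unknown or not a double. */
int struct_get_double(const StructInfo *info, const void *bytes,
                      const char *field, double *out) {
    assert(info != NULL && bytes != NULL && field != NULL && out != NULL);
    for (size_t i = 0; i < info->field_count; i++) {
        const FieldInfo *f = &info->fields[i];
        if (strcmp(f->name, field) == 0 && f->type_code == 'd') {
            memcpy(out, (const char *)bytes + f->offset, sizeof *out);
            return 1;
        }
    }
    return 0;
}
```

A scripting-language bridge would consult a table like this to box and unbox struct arguments it only sees as opaque bytes.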


When you think about it, it's bonkers that this language is doing dynamic method calls without any inline caching and, I presume, no inlining?! You would think this was intractable for good performance, but it seems to work fine!


There's a lot of caching involved. And it brings a lot of benefits.


There's no inline caching is there though? And so no 'mother of all optimisations' inlining. Or is my understanding of how Objective C works mistaken?

You wouldn't dream of implementing a dynamic language these days designed for performance without basic polymorphic inline caching. But Objective C seems to get by fine without it. Maybe it's not as essential as we think.


Do any statically compiled languages have inline caches? This would be a huge loss for ObjC, since it would greatly increase memory pressure by dirtying every text page.

ObjC has per-class out-of-line caches keyed by the selector (intern'd string).
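A rough C sketch of what a per-class, out-of-line cache keyed by interned selectors could look like. Everything here (Class, CacheEntry, sel_intern, class_lookup, answer_imp, the 8-slot size, the hash) is made up for illustration and is not Apple's actual layout or hashing scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef const char *SEL;                  /* interned: compare by pointer */
typedef int (*IMP)(void *self, SEL cmd);  /* toy method signature */

#define CACHE_SIZE 8                      /* power of two, so we can mask */
#define MAX_SELECTORS 16

typedef struct { SEL name; IMP imp; } Method;
typedef struct { SEL key; IMP imp; } CacheEntry;

typedef struct Class {
    const Method *methods;        /* full method list (slow path) */
    size_t method_count;
    CacheEntry cache[CACHE_SIZE]; /* lives in the class, not at the call site */
    unsigned slow_lookups;        /* instrumentation for this example only */
} Class;

static const char *interned[MAX_SELECTORS];
static size_t interned_count;

/* Same spelling always yields the same pointer.
   Callers must pass strings with static lifetime (e.g. literals). */
SEL sel_intern(const char *name) {
    for (size_t i = 0; i < interned_count; i++)
        if (strcmp(interned[i], name) == 0) return interned[i];
    assert(interned_count < MAX_SELECTORS);
    interned[interned_count] = name;
    return interned[interned_count++];
}

static size_t cache_slot(SEL sel) {
    return ((uintptr_t)sel >> 2) & (CACHE_SIZE - 1);
}

/* Fast path: one pointer compare. Slow path: scan, then fill the slot. */
IMP class_lookup(Class *cls, SEL sel) {
    CacheEntry *e = &cls->cache[cache_slot(sel)];
    if (e->key == sel) return e->imp;

    cls->slow_lookups++;
    for (size_t i = 0; i < cls->method_count; i++) {
        if (cls->methods[i].name == sel) {
            e->key = sel;                 /* may evict a colliding entry */
            e->imp = cls->methods[i].imp;
            return e->imp;
        }
    }
    return NULL;                          /* a real runtime would forward */
}

/* Sample method used to exercise the cache. */
int answer_imp(void *self, SEL cmd) {
    (void)self; (void)cmd;
    return 42;
}
```

Because the cache is data hanging off the class, a hit never writes to code pages, which is the memory-pressure point made above.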

It's true that there's no hope of inlining across a message send, but ObjC's C and C++ compatibility mitigates a lot of this expense. Apps routinely use C and C++ for perf critical sections.

Also some of Apple's APIs are inline C functions (NSMakeRect, etc) and others are designed to minimize dynamic dispatch. Iterating an NSArray typically requires only one message send (see the NSFastEnumeration protocol).

There's also more exotic techniques like IMP caching.


AFAIK objc_msgSend is always called, but it's very heavily optimized and written in assembly language. Mike Ash had a great post: https://www.mikeash.com/pyblog/friday-qa-2017-06-30-dissecti...


I wonder if /u/mikeash can still comment on this thread!

My recent experience has been that indeed inlining is essential for true "zero-cost abstraction"—say an array of integers with index set/get methods—but that once your abstraction is weighty enough that a method is performing more than a few dozen "interesting" instructions, inlining doesn't win much over how well the modern CPU can already optimize across calls.

EDIT: I should add I'm talking about static inlining. I'm sure your typical alien-technology JIT can identify the exact 800 instruction trace in your inner loop and pull it all together into a perfectly machine-sympathetic sequence that runs 50% faster than anything you could construct statically in the absence of profile-driven feedback.


There’s no inlining for Objective-C methods, if that’s what you’re alluding to, because the language dictates that every method call must be observable. objc_msgSend is generally really fast, though; it’s usually as fast as or faster than a virtual function call.


There’s no inlining, because determining whether inlining would be safe is probably Turing-complete. However, IMP caching is a thing and that’s enough for most cases. For the rest there is C and (Objective-)C++.


Your comment was killed very quickly, strangely, but I thought it had valid content so I revived it. The actual issue against inlining is that anybody can intercept method calls, even from places that the compiler cannot know about such as bundles loaded at runtime. So this isn’t even an issue with Turing-completeness; it’s an impossible problem to solve at compile time.


> at compile time

That's the key thing - other languages which solve this solve it dynamically.


Yeah, that’s what a JIT does. Unfortunately that’s not something that Apple really wants to open the door to on their platforms, especially for native code.


I don’t think that comes into play here. Objective-C is C, and although you could JIT any language, C isn’t made for it, both philosophically (one of its main claims to fame is ‘close to the metal’) and technically (a source file could be compiled multiple times with different macro definitions or with a different set of #included files).


C isn't made for JIT because it's static; that's why inlining exists. Objective-C has room to improve because it has dynamic function calls.


C can also benefit from inline caching - function pointers!


Did you mean to say "NP-Complete" where you said "Turing-complete"?


Nitpick: languages don’t do inline caching, implementations do.

Also, Objective-C being “C with a little bit of dynamism where you need it” doesn’t need it as much as languages where every call is, principally, a dynamic one.

Having said that, the current implementation does some caching, but inside objc_msgSend, not inline with the call (https://www.mikeash.com/pyblog/friday-qa-2012-11-16-lets-bui...)


Can you link to any info about implementation details for objc? I’ve been trying to make my own language and this seems helpful.


Check out Mike Ash's blog:

https://mikeash.com/pyblog/

It has posts where he recreates the functionality of the runtime, but without the optimizations that are in Apple's runtime (which is open source).


Can confirm that mikeash knows a lot about this stuff and posted a lot about it.

Also bbum, he does a lot of posts on SO about this stuff, and I thought he wrote blog posts on the subject too but can't seem to find them.


> Can confirm that mikeash knows a lot about this stuff

Well, he works on Apple’s runtime team now, so it would be bad if he didn’t ;)


This post seems to confuse "shouldn’t it be possible to recreate an Objective-C program in straight C" and "can you recreate the ObjC runtime in C". The answer to the second is no, because of the reason mentioned. You can't jump to an arbitrary function with any set of arguments and get the right calling conventions in C. But you can easily call objc_msgSend from C; you just need to cast it to the right function signature first.

The real problem is that there's also no way in C to express the ObjC metadata properly. You could in principle register all the classes/protocols/methods at run time, I suppose, but that would be slow and gross.


It's certainly not slow and gross; in fact, it's how you should implement a proper dynamic language. Lua, Potion, and Self, for example, did it this way, without any assembler or FFI tricks. An apply in C is easily doable if your data is tagged.
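For what "apply with tagged data" can mean, here is a minimal C sketch. It is not how Lua, Potion, or Self actually do it; Value, Fn, fn_add, and apply are hypothetical names. The idea is that every callable shares one C signature, so argument count and types travel with the data instead of the prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged value: every datum carries its runtime type. */
typedef enum { T_INT, T_DOUBLE } Tag;

typedef struct {
    Tag tag;
    union { long i; double d; } as;
} Value;

/* One signature for every callable, so no assembly or FFI is needed. */
typedef Value (*Fn)(size_t argc, const Value *argv);

Value make_int(long i)      { Value v; v.tag = T_INT;    v.as.i = i; return v; }
Value make_double(double d) { Value v; v.tag = T_DOUBLE; v.as.d = d; return v; }

static double to_double(Value v) {
    return v.tag == T_INT ? (double)v.as.i : v.as.d;
}

/* Sums any mix of ints and doubles; the result stays an int
   only when every argument was an int. */
Value fn_add(size_t argc, const Value *argv) {
    int all_int = 1;
    long isum = 0;
    double dsum = 0.0;
    for (size_t k = 0; k < argc; k++) {
        if (argv[k].tag == T_INT) isum += argv[k].as.i;
        else all_int = 0;
        dsum += to_double(argv[k]);
    }
    return all_int ? make_int(isum) : make_double(dsum);
}

/* "apply": call any function with any argument list, in portable C. */
Value apply(Fn f, size_t argc, const Value *argv) {
    assert(f != NULL);
    return f(argc, argv);
}
```

The trade-off is that native C functions with arbitrary prototypes still need a wrapper of type Fn, which is where the ObjC case differs.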


Ok, maybe I'm missing something, I'm neither a C nor an Objective-C dev, but...

Language is just a specification, right? Which is just a document.

Then you have to implement that language using that spec: interpreter, compiler, VM, runtime, whatever.

So I was always under impression that you could implement any language in any other language (almost; sometimes with the help of another, more low-level lang, say assembly, C etc.).

It is like saying "Java has no Tail-Call optimization (yet), hence you can't implement Scheme (which requires TCO) in Java". Surely you can! Just use trampolines or even use your own stack and calling convention.
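The trampoline idea fits in a few lines. This sketch is in C rather than Java, and the names (Bounce, Step, trampoline, is_even_step, is_odd_step) are invented for the example: each function returns a description of the next call instead of making it, and a loop drives the chain at constant stack depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Bounce Bounce;
typedef Bounce (*Step)(unsigned long n);

/* A "bounce" says what to call next instead of calling it. */
struct Bounce {
    Step next;          /* NULL when the computation has finished */
    unsigned long n;    /* argument for next */
    bool result;        /* only meaningful when next == NULL */
};

Bounce is_even_step(unsigned long n);
Bounce is_odd_step(unsigned long n);

/* Mutually recursive in spirit, but each "tail call" is returned as data. */
Bounce is_even_step(unsigned long n) {
    if (n == 0) return (Bounce){ NULL, 0, true };
    return (Bounce){ is_odd_step, n - 1, false };
}

Bounce is_odd_step(unsigned long n) {
    if (n == 0) return (Bounce){ NULL, 0, false };
    return (Bounce){ is_even_step, n - 1, false };
}

/* The trampoline: one loop, one stack frame, however long the chain is. */
bool trampoline(Step start, unsigned long n) {
    assert(start != NULL);
    Bounce b = start(n);
    while (b.next != NULL)
        b = b.next(b.n);
    return b.result;
}
```

Calling trampoline(is_odd_step, 1000001) walks a million "tail calls" without growing the C stack, even with no help from the compiler.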

What am I missing?


I think what you are missing is that while any Turing complete system can solve the same problems as any other, not all programming languages have the same functionality. An easy example is that some languages are typed and some are untyped. Some have objects (Objective C) and some don't (C). Can you replicate the functionality of Objective C in C? Apparently not completely.


But that is my point: if the host language doesn't have feature X, you can still implement that feature yourself, can't you?

If the host language is untyped, then write your own type system.

Another example: Java has no coroutines and no continuations, but Kotlin (which targets the JVM) does. The Kotlin compiler turns code with coroutines into finite state machines.
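As an illustration of lowering a coroutine to a state machine, here is a hand-written C sketch. It is not what the Kotlin compiler actually emits; SquaresGen, squares_init, and squares_next are hypothetical names. It plays the role of a generator whose body would be "for (i = 0; i < limit; i++) yield i * i;".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The coroutine's locals move into a frame that survives between calls. */
typedef struct {
    int state;      /* 0 = not started, 1 = suspended at the yield, 2 = done */
    int i, limit;
} SquaresGen;

void squares_init(SquaresGen *g, int limit) {
    assert(g != NULL);
    g->state = 0;
    g->i = 0;
    g->limit = limit;
}

/* Resumes the generator. Returns true and stores the next value in *out,
   or returns false once the sequence is exhausted. */
bool squares_next(SquaresGen *g, int *out) {
    assert(g != NULL && out != NULL);
    switch (g->state) {
    case 0:             /* first entry: run the loop initializer */
        g->i = 0;
        break;
    case 1:             /* resuming after a yield: run the loop increment */
        g->i++;
        break;
    default:            /* already finished */
        return false;
    }
    if (g->i < g->limit) {
        *out = g->i * g->i;
        g->state = 1;   /* remember where to resume */
        return true;
    }
    g->state = 2;
    return false;
}
```

With a limit of 3, successive calls yield 0, 1, and 4, and then report exhaustion.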


An object is just a struct with a well-known layout. What you cannot necessarily do with C is offer the same type guarantees.


BOOPSI is another one of these message based object oriented systems tacked onto C, but without the convenient pre-processor. It was introduced with Amiga Workbench 2.0, its developers supposedly impressed by NeXTSTEP. I think it's worth having a look at if you know C and are thinking of how to architect an object oriented system.


It's worth pointing out that you can in fact forward function calls using GCC builtins, although not in clang without adding some assembly.

To quote the API documentation on Constructing Function Calls: "Using [built-in functions], you can record the arguments a function received, and call another function with the same arguments, without knowing the number or types of the arguments." [1]

[1] http://gcc.gnu.org/onlinedocs/gcc/Constructing-Calls.html


This goes into a lot of detail on all this http://phrack.org/issues/66/4.html


You can use dlopen()/dlsym() to replicate something like objc_msgSend(). That's what I was doing for AppSketcher on BeOS.


I don’t quite follow how you’re using the functions in dlfcn to replace the Objective-C runtime?


The GNU Objective C runtime can be implemented in C.

  [object method: args...]
compiles to (more or less)

  IMP tmp = objc_msg_lookup(object, @selector(method:));
  (*tmp)(object, @selector(method:), args...);
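The lookup-then-call split can be mimicked in plain C. This is a toy under loud assumptions: toy_msg_lookup only stands in for objc_msg_lookup, selectors are plain strings compared with strcmp, and Object, Class, counter_add, and send_add are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Object Object;
typedef const char *SEL;
typedef void (*IMP)(void);   /* generic function pointer: cast before calling */

typedef struct { const char *name; IMP imp; } Method;
typedef struct { const Method *methods; size_t count; } Class;
struct Object { const Class *isa; int value; };

/* Step 1: find the implementation. No arguments are touched here,
   so this part needs no assembly at all. */
IMP toy_msg_lookup(const Object *obj, SEL sel) {
    assert(obj != NULL && obj->isa != NULL);
    for (size_t i = 0; i < obj->isa->count; i++)
        if (strcmp(obj->isa->methods[i].name, sel) == 0)
            return obj->isa->methods[i].imp;
    return NULL;
}

/* A method taking one int argument, in the spirit of -add: */
int counter_add(Object *self, SEL cmd, int amount) {
    (void)cmd;
    self->value += amount;
    return self->value;
}

const Method counter_methods[] = { { "add:", (IMP)counter_add } };
const Class CounterClass = { counter_methods, 1 };

/* Step 2: what a compiler might emit for [obj add: amount], by hand.
   The compiler knows the static types, so it can cast the IMP back to
   the real signature; calling through a mismatched type is undefined. */
int send_add(Object *obj, int amount) {
    IMP tmp = toy_msg_lookup(obj, "add:");
    assert(tmp != NULL);
    return ((int (*)(Object *, SEL, int))tmp)(obj, "add:", amount);
}
```

Since the call itself happens at the call site with a known signature, the runtime never has to forward an unknown argument list, which is the part objc_msgSend needs assembly for.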


"C is a language that is turing-complete. So is Objective-C. So shouldn’t it be possible to recreate an Objective-C program in straight C?"

Major misunderstanding about Turing completeness.

A language is so much richer than simply enabling you to write a map from inputs to outputs. Sure you could probably rewrite my Haskell program in Fortran in the sense that the Fortran program produces the exact same outputs for any given inputs...but it's hardly convincing to me that you "recreated" my program in Fortran after dropping my monad transformers and lenses.

Regardless, you wouldn't even be able to generally prove (assuming you lack dependent typing) that your program is functionally equivalent to mine (since this is undecidable).


Yup. To think of it another way: one way of "recreating a Foo program in Bar" would be to write a Foo compiler or interpreter in Bar, and then use that interpreter to run your original Foo program. Problem solved!



