Seems like a very smart way to keep binaries up to date without developer intervention -- and possibly even allow re-targeting to different CPU architectures after the fact. That would eliminate the need for something like Rosetta if Apple ends up switching major CPU architectures again some day.
I really think that LLVM is one of the best things to happen to computer science in a long, long time.
I remember reading some time ago that the LLVM IR format is private to a specific version of LLVM, and is not guaranteed to be compatible with other versions in the future or the past. Does this announcement mean that is no longer the case, and LLVM IR is backwards compatible now?
There's IR assembly (the human-readable format) and the bitcode format. There are no compatibility guarantees for the assembly, but new LLVM versions can read old bitcode within the same major version.
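For what it's worth, the reader doesn't care which of the two formats you hand it. A minimal sketch, assuming a reasonably recent LLVM C++ API: parseIRFile() accepts either textual IR (.ll) or bitcode (.bc), so a tool like the one below is an easy way to check whether files produced by an older toolchain still load in the LLVM you link against.

    // Minimal sketch: load a .ll or .bc file and report whether the linked-in
    // LLVM can still read it. Error handling is deliberately bare-bones.
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/IRReader/IRReader.h"
    #include "llvm/Support/SourceMgr.h"
    #include "llvm/Support/raw_ostream.h"
    #include <memory>

    int main(int argc, char **argv) {
      if (argc < 2) {
        llvm::errs() << "usage: loadir <file.ll | file.bc>\n";
        return 1;
      }
      llvm::LLVMContext Ctx;
      llvm::SMDiagnostic Err;
      std::unique_ptr<llvm::Module> M = llvm::parseIRFile(argv[1], Err, Ctx);
      if (!M) {  // bitcode/assembly this LLVM can't read ends up here
        Err.print(argv[0], llvm::errs());
        return 1;
      }
      llvm::outs() << "loaded module: " << M->getModuleIdentifier() << "\n";
      return 0;
    }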
LLVM is a great tool, but I've had trouble in the past with version compatibility. I wrote a compiler that used LLVM as a backend (I used LLVM versions 3.4-3.6). The problem was that each minor version of LLVM slightly changed the API. It was things like removing parameters from methods, renaming methods, removing the need for some methods completely, or adding/removing some static-link libraries. If you only wish your tool to be compatible with a single version of LLVM, it's not a problem, but attempting to support a selection of minor versions ended up being a pain. These minor versions would come out 3-4 times a year and I would need to find what broke each time, and whether there was even an equivalent solution in the new version (see the version-pinning sketch below).
I didn't work on the level of IR, so I didn't come across any problems there, but I wouldn't be surprised if the IR syntax changed slightly across minor versions.
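A minimal sketch of the version pinning mentioned above, assuming only LLVM's generated llvm-config.h header (the 3.4-3.6 range is just the example from the parent comment): fail the build loudly on a minor version you haven't ported to, rather than chasing subtle breakage later.

    // Pin the LLVM versions this tool has actually been ported to; an untested
    // minor release then fails at compile time instead of breaking subtly.
    #include "llvm/Config/llvm-config.h"  // defines LLVM_VERSION_MAJOR / LLVM_VERSION_MINOR

    #if LLVM_VERSION_MAJOR != 3 || LLVM_VERSION_MINOR < 4 || LLVM_VERSION_MINOR > 6
    #error "Only ported to LLVM 3.4 through 3.6 -- check the release notes before bumping"
    #endif

    int main() { return 0; }  // the real tool would go on to use the C++ API as usual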
All of LLVM is open source, and the format isn't quite so fluid as GP makes it sound. It should be relatively trivial for one versed in the LLVM codebase to grok the current features of the bitcode.
That said, as long as you keep an indication of the LLVM version that your bitcode was generated with, I really don't see a problem with fluid bitcode specs.
The major difference with LLVM is modularity. If you create a new language frontend, you get all LLVM's CPU support "for free." If you create a new processor backend, you get all LLVM's frontend languages "for free."
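A minimal sketch of what that split looks like from the frontend side, assuming LLVM's C++ API: a frontend only has to build a module like this, and any backend LLVM was built with can then lower the same module to its CPU.

    // Build IR for: int add(int a, int b) { return a + b; }
    // Nothing here names a CPU; lowering to one is the backend's job.
    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/raw_ostream.h"

    int main() {
      llvm::LLVMContext Ctx;
      llvm::Module M("demo", Ctx);
      llvm::IRBuilder<> B(Ctx);

      llvm::Type *I32 = llvm::Type::getInt32Ty(Ctx);
      llvm::FunctionType *FT = llvm::FunctionType::get(I32, {I32, I32}, false);
      llvm::Function *F =
          llvm::Function::Create(FT, llvm::Function::ExternalLinkage, "add", &M);

      llvm::BasicBlock *Entry = llvm::BasicBlock::Create(Ctx, "entry", F);
      B.SetInsertPoint(Entry);
      auto Args = F->arg_begin();
      llvm::Value *A = &*Args++;
      llvm::Value *Bv = &*Args;
      B.CreateRet(B.CreateAdd(A, Bv, "sum"));

      M.print(llvm::outs(), nullptr);  // emit the textual IR
      return 0;
    }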
This has interesting consequences such as retargeting anything from the frontend to anything on the backend. I'd venture a wager that in the old mainframe days, the monolithic nature of a JIT would not have been friendly to a porting campaign.
The difference is that they had to port the entire JIT to the new processor. With LLVM, you just need the backend components that represent the CPU. Granted, from an app developer's perspective, this is not really relevant because they're targeting a VM; where the VM runs, their apps run.
My limited understanding is that LLVM bitcode doesn't insulate you completely from the ABI/platform differences, e.g. between 32- and 64-bit. So I wonder if they'll be "fat" binaries.
Coincidentally, there's been a bunch of stuff on the mailing lists recently about embedding bitcode in object files in order to support link time optimisation.
You are correct about the 32-bit, and furthermore correct about ABI (LLVM code targeted at Mac wouldn't work on Linux without a lot of work and inefficiencies, and it's not obvious how to fix that).
LLVM IR files are already target specific. There are some targets like Google NaCl that work on several architectures, but I doubt Apple wants to go that way.
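To make that target-specificity concrete, a tiny sketch (assuming the C++ API of the LLVM versions being discussed; the triple string is only an example): the module itself carries a target triple and data layout, and pointer sizes, calling conventions, and so on in the emitted IR all follow from that choice.

    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/raw_ostream.h"

    int main() {
      llvm::LLVMContext Ctx;
      llvm::Module M("app", Ctx);
      // Baked in by the frontend; the "portable" bitcode still says arm64 + iOS here.
      M.setTargetTriple("arm64-apple-ios");
      llvm::outs() << "target triple: " << M.getTargetTriple() << "\n";
      return 0;
    }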
I came to understand the whole concept of having a bytecode format for executables, with JIT/AOT deployment options, much better when I started delving into the old mainframe world.
I used to do AS/400 backups, but never coded for it. So it was quite interesting to discover the whole TIMI concept.
Also other similar environments like the Burroughs B5000.
That's true...MS has been doing this for phone and Windows Store apps (the few that there are) that are written in .NET to improve performance. They get AOT'd from CIL to the target binary when uploaded to the store.
I wonder why this isn't bigger in the FOSS world...maybe because the source and toolchain are already available...I don't know, it might be a neat idea to have an IR userland that compiles on install.
I think this is exactly it. Perhaps they do have a specific usage in mind that they're aiming for, but I bet they'd be taking these same steps regardless.
True, though nominally the same partner. Motorola was the M in AIM, though yes, IBM did the heavy lifting. (Remember Motorola made a Mac clone? How weird.) And the situation wasn't as dire then.
If Apple does the linking of the final binary for you, you won't have a way to recover the symbol <-> address mappings you need to symbolicate (that's the dSYM file you usually upload to HockeyApp).
I'm honestly surprised that no one has commented on the security implications of this yet. After all, allowing Apple to recompile your "bitcode" essentially means that the user can in no way validate that the version he's running is the version the developer published. It would be fairly straightforward to perform MITM attacks on apps this way. (And, sure, Android's APKs face much the same issue - but AFAIK Google doesn't modify your bytecode (yet?))
They could already do that. They sign and encrypt the (plaintext) code segment you submit, so they could easily insert a JMP to their own payload at the head of your code any time.
They could also just have the kernel do whatever it wants to your program, because they control that too. If you are worried that Apple might tamper with your binaries if given the chance, using an iOS device is pure folly, because they outright control much more important parts of your system (the kernel, the UI libraries, the RNG…).
Apple has a pretty complex boot system in which each stage of the boot process verifies a checksum of the next layer before starting it up. The lowest layer is etched in ROM. Theoretically, if you verify the integrity of the bitcode, and you verify the integrity of the bitcode compiler, you should be able to trust the native binary as well.
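Roughly, that chain-of-trust logic looks like the toy sketch below (my own illustration, not Apple's code; FNV-1a stands in for the real cryptographic hash and signature check, and the stage list is simplified): each stage only hands over control once the next stage's image matches the digest it expects.

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    // Toy digest; a real boot chain uses a cryptographic hash plus signatures.
    static uint64_t fnv1a(const std::string &data) {
      uint64_t h = 1469598103934665603ULL;  // FNV-1a offset basis
      for (unsigned char c : data) { h ^= c; h *= 1099511628211ULL; }
      return h;
    }

    struct Stage { std::string name; std::string image; uint64_t expected; };

    int main() {
      std::vector<Stage> chain = {
          {"bootloader", "bootloader image bytes", 0},
          {"kernel",     "kernel image bytes",     0},
          {"userland",   "compiled app bytes",     0},
      };
      // Demo only: pretend these digests were provisioned when the images were built.
      for (auto &s : chain) s.expected = fnv1a(s.image);

      for (const auto &s : chain) {
        if (fnv1a(s.image) != s.expected) {  // verify before executing
          std::cerr << "refusing to start " << s.name << "\n";
          return 1;
        }
        std::cout << "verified " << s.name << ", handing over control\n";
      }
      return 0;
    }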
Seeing as the only way to run these apps is to get them through the App Store Apple could also just completely replace your app code with whatever they want.
They can also choose to just not validate signatures. If they wanted to MITM your app, this doesn't make it any easier for them as it's already very easy for them to do that.
Signed binaries don't mean anything to someone like Google or Apple who control the host OS (and the hardware).
In both their OS/kernel and in the hardware, Google has the ability to make your app do anything they want to regardless of how you coded it or signed the binary.
They've just announced a major new addition to CloudKit though to basically allow integration into CloudKit via web apps, which seems like a pretty big deal.
I'm thinking that Apple sees a new threat to their mobile (profit) dominance: Fully portable Mobile+Desktop OS enabled through Core M, i.e. Windows 10. "Bitcode" is them preparing to unify their OSes just in case Win10 is successful and buyers suddenly want to have one device to rule them all (i.e. MS Continuum). This could very soon be a reality - and Apple seems to be preparing themselves. Smart move [1].
My thought is more that they are going the Android route of using an intermediary bytecode, to get both platform independence, but on-device compilation for performance. If they can push hardware that will automatically run faster, that makes it all that much easier to sell new product.
Yes, I didn't mean Apple is going the MS route from a technical point of view, just the timing of their platform independence efforts leads me to believe that it has to do with pressure (or the potential thereof) by Microsoft.
I find it interesting this hasn't gotten as much attention as other announcements. It seems to hand over a great deal of responsibility to Apple, at least in terms of performance. Something some might feel comfortable with or not. I can imagine this lack of control over what instructions will actually run on the hardware might not sit well with some developers.
As for Android, having ART on the phone is a technicality; given the platform fragmentation, Google would rather leave to the OEMs the task of making ART generate the proper code.
This will affect crash reporting and symbolication. Presumably Apple strips debug symbols when they compile. Will they make the dSYMs (DWARF-containing files for stripped Mach-O binaries, necessary to symbolicate crash reports) available somehow? If not, third parties (or anyone who chooses to handle their own crash reports directly) are in trouble.
I'm curious how that works. LLVM bitcode backwards/forwards compatibility is quite poor, from what I've heard. Will they fix some bitcode version to transmit, and have translators on the client/server sides? Have a bunch of different toolchain versions on the server side? Force everyone to resubmit with a newer version if they want to do any changes on the server side?
Presumably the backwards/forwards compatibility is poor simply because people don't tend to have these files sitting around, they mostly exist (at the moment) as part of a compilation framework and are tempfiles.
If needed, it wouldn't be too hard for someone with a few programmers (Apple could spare a few!) to write the proper versioning to upgrade/downgrade the file format as LLVM changes.
I am not so confident: they now require that you ship a 64-bit version of your apps, which can also be a problem if you use 32-bit-only proprietary libraries.
There's a good reason for that requirement. Loading 32-bit apps on a 64-bit device requires loading all the 32-bit versions of the shared libs, massively increasing the memory footprint.
I like that idea in general. The other day I was reading about OS/400 on Wikipedia. It always used an intermediate bytecode... and because of that they were able to move seamlessly (who knows how seamlessly) from architecture to architecture.
Like Apple's, .NET Native compiles to native code on the server.
It's your comment that gave the impression that you thought Apple's new bitcode thing is not AOT and that they follow MS in this not-AOT-ness.
It might not have been what you meant, but it's not very clear from the phrasing:
>"(...) Apple's AOT compilation toolchain was being discussed by some Apple fans as the way to go. Now they turn around and follow what the others are doing"
This reads like Apple had an AOT compilation toolchain that Apple fans thought was "the way to go", and now Apple doesn't have one (an AOT compilation toolchain) anymore, following MS's lead in this regard.
Whereas what you actually meant was probably that Apple fans thought that Apple's PREVIOUS AOT compilation toolchain was the way to go, but now they've changed course and went for an MS style AOT compilation toolchain.
(It read like you think "Apple's AOT toolchain" is a thing of the past, and that they now follow MS, which doesn't have AOT.)
> Whereas what you actually meant was probably that Apple fans thought that Apple's PREVIOUS AOT compilation toolchain was the way to go, but now they've changed course and went for an MS style AOT compilation toolchain.
What I said is that I heard from many Apple fans that the MDIL/.NET Native compilation model[0] didn't make sense, and that compiling directly from Xcode to the device was the way to go.
[0]Uploading IL to the store and having a server based compiler generate native code before download.