
I thought that just being LLVM bitcode wasn't enough to guarantee portability, as the author assumes it is.

There are ABI-specific pieces that are still not abstracted in the bitcode, like struct packing rules.
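To make the struct packing point concrete, here is a minimal sketch of how field offsets fall out of per-target alignment rules. The `layout` helper and the two tables are hypothetical names for illustration; the alignment values follow the usual System V conventions for i386 and x86-64, where an 8-byte scalar inside a struct aligns to 4 bytes on the former and 8 on the latter.

```python
# Hypothetical alignment tables: scalar size in bytes -> alignment in bytes.
# Values follow the common System V conventions; other ABIs differ.
I386 = {1: 1, 2: 2, 4: 4, 8: 4}    # 8-byte scalars align to 4 inside structs
X86_64 = {1: 1, 2: 2, 4: 4, 8: 8}  # 8-byte scalars align to 8


def layout(field_sizes, align_table):
    """Return (offsets, total_size) for a C-style struct of scalar fields."""
    offsets = []
    offset = 0
    max_align = 1
    for size in field_sizes:
        align = align_table[size]
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, total


# struct { char tag; long long value; }
fields = [1, 8]
print(layout(fields, I386))    # ([0, 4], 12)
print(layout(fields, X86_64))  # ([0, 8], 16)
```

Once a frontend has lowered field accesses to fixed offsets like these, the numbers are baked into the IR, which is why the same bitcode can't simply be retargeted to an ABI with different rules.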

You are correct, and the article isn't even close to right.

There are things you can do if the ABIs are the same, such as optimizing for specific microarchitectures, but otherwise the article's author has literally no idea what they are talking about.

Bitcode is meant for repeated optimization of the same IR.

That is, it would make sense to start from a well-optimized AOT compiled version of bitcode, then JIT the bitcode at runtime and try to come up with something better once you have profiling feedback.

I expect this is the plan, given that it's what everyone else who is serious about LLVM does.

It would not make any sense to start from bitcode and try to generate code for different architectures.

LLVM is a low level virtual machine for a reason.

There has been work on things like virtual ISAs using LLVM (see http://llvm.org/pubs/2003-10-01-LLVA.html), but the result of this research was, IMHO, that it's not currently a worthwhile endeavor. You can also restrict bitcode in ways that make it portable (as PNaCl does, for example), but this is closer to the LLVA work than anything else: it's essentially a portability layer. It still requires porting, just porting to the single portability layer.

I appreciate the clarification.

Not only that, but "[target data layout] is used by the mid-level optimizers to improve code, and this only works if it matches what the ultimate code generator uses. There is no way to generate IR that does not embed this target-specific detail into the IR. If you don’t specify the string, the default specifications will be used to generate a Data Layout and the optimization phases will operate accordingly and introduce target specificity into the IR with respect to these default specifications." [1]

Much of this article is simply inaccurate speculation.

[1] http://llvm.org/docs/LangRef.html#data-layout

Note that the data layout used to be optional, but is now mandatory.

That's correct. Bitcode for a 32-bit processor won't work with 64-bit processors, for instance.
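As a rough illustration of where the 32-bit vs 64-bit difference lives, here is a sketch that reads the pointer width out of a data layout string. The two strings are abbreviated, made-up examples rather than real target layouts, and `pointer_bits` is a hypothetical helper. It assumes the LangRef syntax `p[n]:<size>:<abi>` and the documented default of 64-bit pointers when no `p` spec for address space 0 is present.

```python
# Abbreviated, illustrative layout strings (not copied from a real target).
LAYOUT_32 = "e-p:32:32-i64:32-n8:16:32"
LAYOUT_64 = "e-i64:64-n8:16:32:64"  # no "p" spec, so the default applies


def pointer_bits(datalayout):
    """Return the pointer size in bits for address space 0."""
    for spec in datalayout.split("-"):
        head, *rest = spec.split(":")
        if head in ("p", "p0") and rest:
            return int(rest[0])
    return 64  # LangRef default: p:64:64:64


print(pointer_bits(LAYOUT_32))  # 32
print(pointer_bits(LAYOUT_64))  # 64
```

Anything the optimizer computed from that width (pointer arithmetic, struct sizes, folded `sizeof` constants) is already in the IR by the time it is serialized.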

The ABI would be compatible between similar architectures: Apple has somewhat encouraged developers to provide binaries for all of armv6, armv7, and armv7s in the past, all of which are 32-bit ARM architectures. While some new features might be difficult to take advantage of without high-level information, I imagine a difference as drastic as ARM vs Thumb would be easy to pull off starting from bitcode.

Bitcode alone might not be infinitely malleable. But given that Apple knows intimately the original platform the submitted app targeted, it has the ability to add additional logic in iTunes Connect that permits tweaks and assumptions you couldn't get away with in the promiscuous and agnostic setting of a generic compiler toolchain.

Yeah, you more or less can't change struct layout once the IR has been generated (unless, maybe, you had a no-aliasing, no-FFI guarantee... safe Rust?). But you can encode new or different instructions and add new optimizations.

Perhaps someone can explain to me why converting a binary straight from one architecture to another couldn't be done. If emulators can work, why not translators?

An emulator has run-time information, a translator does not.

That can be important information. For example, the translator might not figure out where data embedded in the code (for example a jump table) ends, and, consequently, continue translating from a point mid-way into a multi-byte instruction.
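A toy sketch of that failure mode, assuming a made-up four-opcode ISA (the opcodes, lengths, and program bytes are all hypothetical). A naive static translator decodes byte after byte from offset 0, walks into the embedded data, and comes out misaligned:

```python
# Hypothetical toy ISA: opcode byte -> (mnemonic, total instruction length).
LENGTHS = {0x01: ("NOP", 1), 0x02: ("LOAD", 3), 0x03: ("JMP", 2), 0x04: ("RET", 1)}

PROGRAM = bytes([
    0x03, 0x05,        # 0: JMP 5   (skips the embedded table)
    0x01, 0x01, 0x02,  # 2: data, not code (stand-in for a jump table)
    0x02, 0xAA, 0x01,  # 5: LOAD 0x01AA
    0x04,              # 8: RET
])


def linear_sweep(code):
    """Decode sequentially from offset 0, treating every byte as code."""
    starts = []
    pc = 0
    while pc < len(code):
        name, length = LENGTHS.get(code[pc], ("???", 1))
        starts.append((pc, name))
        pc += length
    return starts


print(linear_sweep(PROGRAM))
# [(0, 'JMP'), (2, 'NOP'), (3, 'NOP'), (4, 'LOAD'), (7, 'NOP'), (8, 'RET')]
```

The last data byte at offset 4 happens to look like a LOAD, so the sweep swallows offsets 5 and 6 and resumes at offset 7, in the middle of the real LOAD that starts at offset 5. The real instruction is never decoded.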

Translating code that generates code (as done in JIT engines or to speed up image processing kernels) also is a challenge; the best one realistically can do is to translate the existing code as-is and then call the translator at run time to do the translation. If that works, the 'call the translator' step may kill any performance that was won by generating code.

Of course, self-modifying code is a challenge, too.

The problem is that perfect disassembly (figuring out where every instruction starts, and if bytes are instructions or data) of an arbitrary program is undecidable. Emulators get around that problem by only disassembling instructions that actually get executed at run-time (and therefore can safely be called "code").
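For contrast, here is a sketch of the emulator-style approach on a made-up four-opcode ISA (opcodes, lengths, and program bytes are hypothetical, and `executed_offsets` is an illustrative name). It decodes only at offsets the program counter actually reaches, so embedded data is never mistaken for code:

```python
# Hypothetical toy ISA: opcode byte -> (mnemonic, total instruction length).
LENGTHS = {0x01: ("NOP", 1), 0x02: ("LOAD", 3), 0x03: ("JMP", 2), 0x04: ("RET", 1)}

PROGRAM = bytes([
    0x03, 0x05,        # 0: JMP 5   (skips the embedded table)
    0x01, 0x01, 0x02,  # 2: data, not code (stand-in for a jump table)
    0x02, 0xAA, 0x01,  # 5: LOAD 0x01AA
    0x04,              # 8: RET
])


def executed_offsets(code, entry=0):
    """Follow control flow from `entry`, decoding only what gets executed."""
    seen = []
    pc = entry
    while pc < len(code):
        name, length = LENGTHS[code[pc]]
        seen.append((pc, name))
        if name == "JMP":
            pc = code[pc + 1]  # operand byte holds the absolute target
        elif name == "RET":
            break
        else:
            pc += length
    return seen


print(executed_offsets(PROGRAM))
# [(0, 'JMP'), (5, 'LOAD'), (8, 'RET')]
```

The bytes at offsets 2 to 4 are never touched because execution never gets there. A static translator has no such guarantee and has to guess which bytes are code.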
