There are ABI-specific pieces that are still not abstracted in the bitcode, like struct packing and layout rules.
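To make the struct-packing point concrete, here's a toy illustration (using Python's ctypes to mimic C struct layout on the host ABI — not LLVM itself): the size and field offsets of a struct depend on the target's alignment rules, and bitcode bakes those offsets in at compile time.

```python
import ctypes

# A struct laid out with the host ABI's natural alignment rules.
class Natural(ctypes.Structure):
    _fields_ = [("a", ctypes.c_char),   # 1 byte
                ("b", ctypes.c_int)]    # 4 bytes, typically 4-byte aligned

# The same fields with packing forced to 1 (no padding).
class Packed(ctypes.Structure):
    _pack_ = 1
    _fields_ = [("a", ctypes.c_char),
                ("b", ctypes.c_int)]

# On a typical ABI, Natural has 3 padding bytes after 'a' (size 8),
# while Packed has none (size 5). Bitcode that accesses 'b' already
# encodes one of these offsets; it can't be retargeted to the other.
print(ctypes.sizeof(Natural), Natural.b.offset)
print(ctypes.sizeof(Packed), Packed.b.offset)
```

Two ABIs that disagree on these rules would produce bitcode with different, incompatible field offsets for the same source struct.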
There are things you can do if the ABIs are the same, such as optimizing for different microarchitectures, but otherwise they have literally no idea what they are talking about.
Bitcode is meant for repeated optimization of the same IR.
That is, it would make sense to start from a well-optimized AOT compiled version of bitcode, then JIT the bitcode at runtime and try to come up with something better once you have profiling feedback.
I expect this is the plan, given that it's what everyone else who is serious about LLVM does.
It would not make any sense to start from bitcode and try to generate code for different architectures.
LLVM is a low level virtual machine for a reason.
There has been work on things like virtual ISAs using LLVM (see http://llvm.org/pubs/2003-10-01-LLVA.html), but the result of that research was, IMHO, that it's not currently a worthwhile endeavor.
You can also do things like restrict bitcode in ways that make it portable (as PNaCl does, for example), but this is closer to the LLVA work than anything else (it's essentially a portability layer).
It actually still requires porting, just porting to the single portability layer.
Much of this article is simply inaccurate speculation.
That can be important information. For example, the translator might not figure out where data embedded in the code (for example a jump table) ends, and, consequently, continue translating from a point mid-way into a multi-byte instruction.
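The jump-table problem can be sketched with a toy linear-sweep decoder over a made-up variable-length ISA (the opcodes and lengths here are hypothetical, chosen only to show the failure mode): when the sweep runs into embedded data, it can misdecode the data bytes as an instruction and resume past the start of the next real instruction.

```python
# Hypothetical variable-length ISA, for illustration only:
#   0x01        = NOP  (1 byte)
#   0x02 imm8   = PUSH (2 bytes)
#   0x03 imm16  = JMP  (3 bytes)
LENGTHS = {0x01: 1, 0x02: 2, 0x03: 3}

code = bytes([
    0x03, 0x05, 0x00,  # offset 0: JMP 0x0005 (skips over the table)
    0x03, 0x10,        # offsets 3-4: jump-table DATA, not code
    0x01,              # offset 5: real code resumes here (NOP)
    0x01,              # offset 6: NOP
])

def linear_sweep(buf):
    """Decode from offset 0 straight through, ignoring control flow."""
    pc, starts = 0, []
    while pc < len(buf):
        starts.append(pc)
        pc += LENGTHS.get(buf[pc], 1)
    return starts

# The table byte 0x03 at offset 3 is misread as a 3-byte JMP, which
# swallows the real NOP at offset 5: the sweep reports instruction
# starts [0, 3, 6] and never sees offset 5.
print(linear_sweep(code))
```

Real disassemblers use recursive traversal and heuristics to avoid this, but as the comment notes, there is no general way to know where embedded data ends.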
Translating code that generates code (as done in JIT engines or to speed up image-processing kernels) is also a challenge; the best one can realistically do is translate the existing code as-is and then invoke the translator at run time on the newly generated code. Even if that works, the 'call the translator' step may kill any performance that was won by generating code in the first place.
Of course, self-modifying code is a challenge, too.