I do agree that current processors are optimized for C/C++ (although of course there are niche systems like Azul, which optimize for other languages). It would be nice to have processor extensions that let us get better GC performance or better handling of immutable values. There's a chicken-and-egg problem in getting there.
On GPUs, the binary compatibility issue is solved by having the driver compile the shader/compute kernel before it's used. For example, NVIDIA uses PTX (see http://docs.nvidia.com/cuda/parallel-thread-execution/) as an intermediate language in CUDA, which the runtime then compiles into the GPU's native machine code.
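To make the "ship an intermediate language, compile at load time" idea concrete, here is a toy Python model. Everything in it is hypothetical (the IR format, the `lower`/`load_kernel`/`run` names, and the two targets); it is not how PTX or NVIDIA's driver actually work, only the shape of the idea: the same portable program is lowered differently depending on what the target supports (here, whether a fused multiply-add exists), and the result is cached so compilation happens once per target.

```python
import operator
from functools import lru_cache

# Hypothetical portable IR: (op, dst, src1, src2) three-address instructions.
PROGRAM = (
    ("mul", "t0", "a", "b"),
    ("add", "out", "t0", "c"),
)

OPS = {"add": operator.add, "mul": operator.mul}

# Hypothetical targets: does the hardware have a fused multiply-add?
TARGETS = {"old_gpu": False, "new_gpu": True}


def lower(ir, has_fma):
    """Translate portable IR into target-specific instructions."""
    out = []
    i = 0
    while i < len(ir):
        ins = ir[i]
        nxt = ir[i + 1] if i + 1 < len(ir) else None
        # Peephole: fuse "mul t, x, y; add d, t, z" into "fma d, x, y, z".
        # Toy assumption: the temporary t is not read again afterwards.
        if (has_fma and ins[0] == "mul" and nxt is not None
                and nxt[0] == "add" and nxt[2] == ins[1]):
            out.append(("fma", nxt[1], ins[2], ins[3], nxt[3]))
            i += 2
        else:
            out.append(ins)
            i += 1
    return tuple(out)


@lru_cache(maxsize=None)
def load_kernel(ir, target):
    """Stand-in for the driver: compile once per (program, target), then cache."""
    return lower(ir, TARGETS[target])


def run(code, **inputs):
    """Interpret the lowered code; a stand-in for executing native code."""
    regs = dict(inputs)
    for ins in code:
        if ins[0] == "fma":
            _, dst, x, y, z = ins
            regs[dst] = regs[x] * regs[y] + regs[z]
        else:
            op, dst, x, y = ins
            regs[dst] = OPS[op](regs[x], regs[y])
    return regs["out"]
```

The same shipped `PROGRAM` runs on both targets and gives the same answer, even though only `new_gpu` gets the fused instruction; that is the property that lets the hardware ISA change underneath without breaking distributed binaries.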
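On modern CPUs, register renaming has already decoupled the physical registers from the architectural registers named in the instruction set. For example, Intel's Haswell has well over 100 physical registers per core, while x86-64 code can name only 16 general-purpose registers.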
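A toy Python model of register renaming may help here: every write to an architectural register is given a fresh physical register from a free list, so two back-to-back writes to `r1` no longer conflict and can be in flight at the same time. The `RenameTable` class and its names are hypothetical, and the sketch ignores retirement (physical registers are never returned to the free list), branch recovery, and the stalls real hardware handles.

```python
from collections import deque


class RenameTable:
    """Toy model: maps architectural registers to physical registers."""

    def __init__(self, arch_regs, num_physical):
        # Initially each architectural register maps to its own physical register.
        self.map = {r: i for i, r in enumerate(arch_regs)}
        # The remaining physical registers start out free.
        self.free = deque(range(len(arch_regs), num_physical))

    def rename(self, dst, srcs):
        """Rename one instruction 'dst = op(srcs...)'.

        Returns (new physical dst, physical srcs, previous physical dst).
        """
        # Sources are read through the current mapping.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            # Real hardware would stall here until a register retires.
            raise RuntimeError("out of physical registers")
        old = self.map[dst]
        new = self.free.popleft()  # fresh register for every write
        self.map[dst] = new
        return new, phys_srcs, old
```

With architectural registers `r0` to `r3` and 8 physical registers, `r1 = r2 + r3` followed by `r1 = r1 + r0` writes `r1` into physical registers 4 and then 5, and the second instruction correctly reads its `r1` input from physical register 4.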
Would you mind expanding on this?
So, in general, I am saying that the road to better performance will not lie in aggressive compiler optimization, but rather in higher-level design tools for managing entirely new software/hardware abstractions. Binaries will be specified at a higher level and look more like source code. At this point my crystal ball admittedly becomes a bit fuzzy.