
Hardware doesn't work like this. You might want to read Hennessy and Patterson, and the original RISC I paper.

http://www.amazon.com/Computer-Architecture-Fifth-Edition-Qu...

http://www.cecs.pdx.edu/~alaa/ece587/papers/patterson_isca_1...




RISC created a huge local minimum by speeding up C code to the exclusion of other languages. I predict that future processors will eventually hide more features from the higher software levels (such as the number of registers, and the instruction types and formats) in order to improve efficiency at the machine level. I think we are already seeing this trend with GPUs. Current CPUs don't do this because they have to maintain binary compatibility with a huge installed base. We can compare notes in a decade or so :-)


Current processors already do that. You don't see the true number of registers or the true instruction set/format of any modern Intel processor. x86 instructions are translated into micro-ops, so x86 is really just a compatibility layer.
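To make that concrete, here is a hedged sketch (the exact instruction the compiler emits and the exact micro-op breakdown both vary by compiler and microarchitecture):

    /* The read-modify-write below usually compiles to a single x86
       instruction, roughly `addl $1, (%rdi)`.  The front end of a modern
       Intel core cracks that one instruction into several micro-ops
       (load, add, store-address, store-data), possibly fusing some of
       them back together -- none of which is visible in the x86 binary. */
    void bump(int *counter) {
        *counter += 1;
    }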

I do agree that current processors optimize for C/C++ (although of course there are niche systems, like Azul's, that optimize for other languages). It would be nice to have processor extensions that let us get better GC performance, or better handling of immutable values. There's a chicken-and-egg problem getting there.


GPUs don't do any OoO (out-of-order) processing the way modern CPUs do, and they don't do register renaming either. They execute things quite literally, to the point where you have to insert delay slots by hand for pipelined operations if you write the raw asm (which the manufacturers tend to keep well hidden, precisely to avoid the binary compatibility trap; see https://github.com/NervanaSystems/maxas for an example of a third-party assembler for NVIDIA's Maxwell architecture).

On GPUs the binary compatibility issue is solved by having the driver compile the shader/compute kernel before it's used. For example, NVIDIA uses PTX (see http://docs.nvidia.com/cuda/parallel-thread-execution/) as an intermediate language in CUDA, which the runtime then compiles into the actual machine code.
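Here is a rough sketch of what that looks like from the host side with the CUDA driver API (assumptions: error checking omitted, and a hypothetical file kernel.ptx produced by `nvcc -ptx kernel.cu` containing an entry point named "scale"):

    #include <cuda.h>      /* CUDA driver API */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        cuInit(0);
        CUdevice dev;  cuDeviceGet(&dev, 0);
        CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

        /* Read the PTX text (hypothetical file from `nvcc -ptx kernel.cu`). */
        FILE *f = fopen("kernel.ptx", "rb");
        fseek(f, 0, SEEK_END); long n = ftell(f); rewind(f);
        char *ptx = malloc(n + 1);
        fread(ptx, 1, n, f); fclose(f);
        ptx[n] = '\0';

        /* This is the step described above: the driver JIT-compiles the
           PTX into the native ISA of whatever GPU is actually installed. */
        CUmodule mod;  cuModuleLoadData(&mod, ptx);
        CUfunction fn; cuModuleGetFunction(&fn, mod, "scale");

        /* ... cuMemAlloc / cuMemcpyHtoD / cuLaunchKernel would follow ... */

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        free(ptx);
        return 0;
    }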

On modern CPUs, register renaming has already decoupled the physical registers from the architectural (instruction-set) registers. As an example, a modern Haswell core has well over 100 physical registers.
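As a hedged illustration of why that matters (plain C; what the compiler actually emits depends on flags and version):

    /* Every iteration of this loop writes the same architectural
       registers (the accumulator, the index, the loaded value).  Without
       renaming, iteration N+1's writes would have to wait on iteration
       N's uses of those same register names (WAR/WAW hazards).  With
       renaming, each write gets its own physical register out of the
       large physical register file, so work from several iterations can
       be in flight at once -- only the true dependency through `s`
       serializes. */
    long sum(const long *v, long n) {
        long s = 0;
        for (long i = 0; i < n; i++)
            s += v[i];
        return s;
    }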


> RISC created a huge local minimum by speeding up C code to the exclusion of other languages

Would you mind expanding on this?


In the future I believe you are going to see less emphasis on the aggressive speedup of C code for traditional CPUs. Instead you will see many more gadgets with simpler processors that run C code more slowly in an effort to save power. GPGPUs and algorithm-specific hardware (e.g. video, crypto, network, DSP, neural nets) will fill out the rest of the chip. At some point GPUs will have enough raw power and general-purpose features that it will be possible to run an instance of a late-80s operating system within the working set of a single GPU processing element (perhaps with virtual memory emulated, as in jslinux). At that point the need for a power-hungry CPU, and the artificial CPU/GPU distinction, will start to fade away completely. Along with Peak Oil we will have Peak CPU.

So, in general I am saying that the road to better performance will not be in aggressive compiler optimization, but rather in higher level design tools to manage totally new software/hardware abstractions. Binaries will be specified at a higher level and look more like source code. At this point my crystal ball becomes admittedly a bit fuzzy.


Your claim was that RISC "created a huge local minimum by speeding up C code to the exclusion of other languages", and it's not at all clear to me what RISC has to do with C, or why other languages are worse off because of it.



