How to use 1000 registers (1979) [pdf] (caltech.edu)
38 points by luu 5 months ago | 10 comments

CPUs today have register renaming, but it happens below the level at which the programmer sees the machine. At the level where the machine is superscalar, x86 CPUs have far more physical registers than the few architectural ones the user sees.

Having many registers at the programmer level turned out not to be all that useful. SPARC machines did that with register windows, but it wasn't a huge win.

It's worth noting that modern GPUs have several megabytes of register file. That translates to roughly a million registers (assuming 32-bit registers), if I did the math correctly.
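For anyone who wants to check the arithmetic, here is a quick sketch. Both inputs are assumptions chosen for illustration ("several megabytes" taken as 4 MiB, and 32-bit registers), not figures for any particular GPU.

```python
# Back-of-the-envelope check. Both constants are assumed, not vendor figures.
REGISTER_FILE_BYTES = 4 * 1024 * 1024  # assumed: "several megabytes" taken as 4 MiB
REGISTER_WIDTH_BYTES = 4               # assumed: 32-bit registers


def register_count(file_bytes: int, width_bytes: int) -> int:
    """Number of whole registers that fit in a register file of the given size."""
    return file_bytes // width_bytes


print(register_count(REGISTER_FILE_BYTES, REGISTER_WIDTH_BYTES))  # 1048576
```

So "about a million" holds under those assumptions; a different register width or file size scales the answer linearly.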

Manageable parallelism is an answer to "how to use a bajillion registers."

It's a little different though. GPUs have all those registers because they (1) have a desperate need to optimize around pipeline stalls (because a texture read is many times slower than blocking on main memory) and (2) "know" that they have problem areas appropriate to tiny kernels of code that can work efficiently in fixed size register sets. So they arrange that a "context switch" can be done by changing one index register in a hardware scheduler.

They have a bajillion registers because that works to solve the problem they have, not because they're trying to find out "how to use" them. If you had the same amount of die area and a scalar-ish problem, you'd spend it on a traditional cache hierarchy just like CPUs do.
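To make the "context switch by changing one index register" idea concrete, here is a toy model. The class and method names are hypothetical, and real GPU schedulers are far more involved; the sketch only shows why nothing has to be saved or restored when every context owns a fixed slice of one big register file.

```python
class RegisterFile:
    """Toy model: one large physical register file split into fixed-size slices."""

    def __init__(self, num_contexts: int, regs_per_context: int):
        self.num_contexts = num_contexts
        self.regs_per_context = regs_per_context
        self.physical = [0] * (num_contexts * regs_per_context)
        self.base = 0  # the single "index register" that selects the active context

    def switch_to(self, context_id: int) -> None:
        # The entire context switch: update one index, copy nothing.
        if not 0 <= context_id < self.num_contexts:
            raise IndexError("no such context")
        self.base = context_id * self.regs_per_context

    def _slot(self, r: int) -> int:
        # Map a context-local register number to a physical slot.
        if not 0 <= r < self.regs_per_context:
            raise IndexError("register number outside the fixed-size set")
        return self.base + r

    def read(self, r: int) -> int:
        return self.physical[self._slot(r)]

    def write(self, r: int, value: int) -> None:
        self.physical[self._slot(r)] = value


rf = RegisterFile(num_contexts=4, regs_per_context=8)
rf.switch_to(0)
rf.write(3, 111)
rf.switch_to(1)
rf.write(3, 222)   # same register number, different physical slot
rf.switch_to(0)
print(rf.read(3))  # 111
```

The trade-off is the one described above: it only works because each kernel fits in a fixed-size register set that is known up front.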

Any guesses for the typesetting system used to produce this paper? Is that troff or something even older?

If I created a runtime virtual machine these days, I wouldn't use stacks or registers... I'd address virtual/virtualized memory directly and let the CPU and/or the compiler (or LLVM) sort out register placement... registers are basically another layer in the storage hierarchy, from registers/L0 file all the way down to tape, or eww, hard copy.

That's the way the JVM and .NET work. Dalvik moved in the other direction and has explicit registers. Honestly, the distinction is mostly a wash. The representations are essentially 1:1; any reasonable optimizer will be able to move stuff around regardless of whether it's a "register" or a "local variable".
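A small sketch of the "essentially 1:1" claim. The two instruction sets here are made up for illustration (they are not JVM, .NET, or Dalvik bytecode), and variables and virtual registers share one namespace to keep the toy short. The translator simulates the operand stack at translation time, tracking which virtual register holds each stack slot.

```python
# Hypothetical toy instruction sets, not real JVM/.NET/Dalvik bytecode.
STACK_PROGRAM = [
    ("load", "a"), ("load", "b"), ("add",),
    ("load", "c"), ("mul",), ("store", "x"),
]  # computes x = (a + b) * c


def stack_to_register(program):
    """Translate stack code into three-address code over virtual registers."""
    out, stack, next_reg = [], [], 0
    for ins in program:
        op = ins[0]
        if op == "load":
            reg = f"r{next_reg}"
            next_reg += 1
            out.append(("mov", reg, ins[1]))
            stack.append(reg)
        elif op in ("add", "mul"):
            rhs, lhs = stack.pop(), stack.pop()
            reg = f"r{next_reg}"
            next_reg += 1
            out.append((op, reg, lhs, rhs))
            stack.append(reg)
        elif op == "store":
            out.append(("mov", ins[1], stack.pop()))
        else:
            raise ValueError(f"unknown op {op!r}")
    return out


def run_stack(program, env):
    env, stack = dict(env), []
    for ins in program:
        op = ins[0]
        if op == "load":
            stack.append(env[ins[1]])
        elif op == "store":
            env[ins[1]] = stack.pop()
        else:  # "add" or "mul"
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if op == "add" else lhs * rhs)
    return env


def run_register(program, env):
    env = dict(env)  # holds both variables and virtual registers
    for ins in program:
        op, dst = ins[0], ins[1]
        if op == "mov":
            env[dst] = env[ins[2]]
        else:  # "add" or "mul"
            lhs, rhs = env[ins[2]], env[ins[3]]
            env[dst] = lhs + rhs if op == "add" else lhs * rhs
    return env


REGISTER_PROGRAM = stack_to_register(STACK_PROGRAM)
inputs = {"a": 2, "b": 3, "c": 4}
print(run_stack(STACK_PROGRAM, inputs)["x"])        # 20
print(run_register(REGISTER_PROGRAM, inputs)["x"])  # 20
```

Each stack instruction maps to exactly one register instruction here, which is the sense in which the two forms carry the same information for an optimizer.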

Why not go all the way with this idea and pretend everything is a hard copy? Your program's statements would all be printing the result of something onto paper, or scanning that paper. And then you just let the compiler and CPU figure out which pieces of paper can be put into registers.

The answer is that this is too cumbersome and hard to deal with. So is memory, to a lesser degree: when things are in memory they can be affected by other threads, the outputs of different instructions can overlap, etc.

That would not be a good idea. The effort needed to analyze something in memory is significantly higher than for values kept in (virtual) registers.

Sounds like the reverse problem is easier: pretending there are many registers and letting the toolchain figure out how to spill them to memory.
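As a rough illustration of what "letting the toolchain figure out how to spill" can look like, here is a simplified linear-scan style allocator. It is a sketch, not how any particular compiler does it: the live intervals are given as input rather than computed, the names `v0`, `p0`, etc. are hypothetical, and a spilled value is just listed instead of getting real load/store code.

```python
def linear_scan(intervals, num_physical):
    """Assign virtual registers to physical ones, spilling when none are free.

    intervals: dict mapping vreg name -> (start, end), both inclusive.
    Returns (assignment, spilled): a dict vreg -> physical register,
    and a list of vregs that have to live in memory instead.
    """
    if num_physical < 1:
        raise ValueError("need at least one physical register")
    free = [f"p{i}" for i in range(num_physical)]
    active = []       # (end, vreg) pairs that currently hold a physical register
    assignment = {}
    spilled = []
    # Visit intervals in order of increasing start point.
    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release registers whose intervals ended before this one starts.
        for old_end, old in sorted(active):
            if old_end < start:
                active.remove((old_end, old))
                free.append(assignment[old])
        if free:
            assignment[vreg] = free.pop(0)
            active.append((end, vreg))
            continue
        # No register free: spill whichever interval ends last.
        far_end, victim = max(active)
        if far_end > end:
            # The victim outlives the new value, so hand its register over.
            assignment[vreg] = assignment.pop(victim)
            active.remove((far_end, victim))
            active.append((end, vreg))
            spilled.append(victim)
        else:
            spilled.append(vreg)
    return assignment, spilled


# Four virtual registers, only two physical ones.
live = {"v0": (0, 9), "v1": (1, 3), "v2": (2, 5), "v3": (4, 7)}
assignment, spilled = linear_scan(live, num_physical=2)
print(assignment)  # {'v1': 'p1', 'v2': 'p0', 'v3': 'p1'}
print(spilled)     # ['v0']
```

The program side can use as many virtual registers as it likes; only the long-lived `v0` ends up in memory, because it would otherwise tie up a physical register for the whole range.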

There was an odd platform in the 80s/90s called Taos which did this. Its virtual machine had up to 5x65535 registers.

