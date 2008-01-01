I will, however, say that the Reduceron, and in general the idea of doing FP in hardware in the most direct way, is a terrible waste of resources, and I'm pretty sure it loses to a good compiler targeting a von Neumann machine on overall efficiency.
The way to go is not to make a hardware interpreter; that is no better than a processor with a for-loop instruction added to better support C. The trick is to carefully partition software and hardware responsibilities, as in the model that C+Unix/RISC+MMU converged on.
It failed due to very poor performance. There is an excellent paper by Bob Colwell about why the performance turned out the way it did. Prior HN discussion: https://news.ycombinator.com/item?id=9447097
2. Eliminate virtual memory (replacing it with nothing)
I'm not a CPU designer, but my understanding is that removing features allows for a denser/faster CPU. Well, these are two features that a suitably high-level language has no need for, because a high-level language doesn't expose "memory" to the programmer.
Getting rid of virtual memory is potentially a big win, especially for architectures where you can't make the L1D cache virtually indexed but physically tagged. And in general there are a lot of special cases you don't even have to think about if different memory addresses can't alias to the same memory. You do lose out on a lot of software tricks there, though.
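To make the VIPT constraint concrete, here is a rough sketch of the usual rule of thumb: a virtually indexed, physically tagged cache avoids aliasing without extra hardware only when the set index fits inside the page offset, i.e. when the size of one way is no larger than a page. The function names and the example cache sizes are made up for illustration and are not taken from any particular CPU.

```python
def vipt_alias_free(cache_bytes: int, ways: int, page_bytes: int) -> bool:
    """Return True if the cache can be indexed using page-offset bits only.

    Assumption: power-of-two sizes and a plain set-associative layout, so
    the index plus line-offset bits span exactly one way's worth of bytes.
    """
    if cache_bytes <= 0 or ways <= 0 or page_bytes <= 0:
        raise ValueError("sizes must be positive")
    if cache_bytes % ways != 0:
        raise ValueError("cache size must be a multiple of the way count")
    way_bytes = cache_bytes // ways
    # Bits below the page size are identical in the virtual and physical
    # address, so indexing with them cannot create aliases.
    return way_bytes <= page_bytes


def max_alias_free_cache_bytes(ways: int, page_bytes: int) -> int:
    """Largest cache size that still satisfies the constraint above."""
    return ways * page_bytes


if __name__ == "__main__":
    # Hypothetical configurations, 4 KiB pages.
    print(vipt_alias_free(32 * 1024, 8, 4096))   # one way = 4 KiB
    print(vipt_alias_free(64 * 1024, 4, 4096))   # one way = 16 KiB
    print(max_alias_free_cache_bytes(8, 4096))
```

Under these assumptions, growing the L1D past `ways * page_bytes` means adding ways, using bigger pages, or handling aliases some other way, which is the kind of special case that disappears if there is no virtual memory at all.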
https://www.cs.york.ac.uk/fp/reduceron/
That's specialized for just one language, though. In general you can always speed things up, sometimes by quite a bit, if you're willing to make your general purpose computer somewhat less general purpose.
Some of what the Mill folks are doing with hardware-assisted stack operations might fall under the category of higher-level instructions, but those are for C just as much as for any other language.
https://millcomputing.com/
