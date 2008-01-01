Hacker News
The “high-level CPU” challenge (2008) (yosefk.com)
27 points by panic 1 hour ago

Author here. Today I think that, apart from making my case in a somewhat obnoxious tone, I also overstated it. It's true that many "high-level" constructs have a cost that no logic built into hardware will make go away, at least not fully. But it's probably also true that a lot can be done in hardware to make software's life easier under a particular HLL programming model, and I'm hardly an expert on this. My real interest is accelerator development, which starts at the GPU and moves further away from the CPU, so the programming models I deal with are lower-level and gnarlier than C.

I will, however, say that the Reduceron, and in general the idea of implementing functional programming (FP) in hardware in the most direct way, is a terrible waste of resources, and I'm pretty sure it loses on overall efficiency to a good compiler targeting a von Neumann machine.

The way to go is not to make a hardware interpreter; that is no better than a processor with a for-loop instruction added to better support C. The trick is to carefully partition software and hardware responsibilities, as in the model that C+Unix/RISC+MMU converged to.

Intel did try to introduce a high-level CPU in 1981: https://en.wikipedia.org/wiki/Intel_iAPX_432

It failed due to very poor performance. There is an excellent paper by Bob Colwell about why the performance turned out the way it did. Prior HN discussion: https://news.ycombinator.com/item?id=9447097

1. Eliminate cache coherency protocols (replacing them with cache manipulation/inter-CPU communication instructions)

2. Eliminate virtual memory (replacing it with nothing)

I'm not a CPU designer, but my understanding is that removing features allows for a denser/faster CPU. Well, these are two features that a suitably high-level language has no need for, because a high-level language doesn't expose "memory" to the programmer.

#2 saves a little, but not much, and completely precludes unsafe low-level code. #1, I think, impairs many HLLs, certainly all multithreaded, imperative, shared-memory ones (and IMO nothing comes close to these in terms of efficiency on multicore). Which languages can work well without cache coherence? (I worked in C++ on multicore with no hardware coherence, btw. Quite the cruel and unusual punishment.)

Replacing software with dedicated hardware usually speeds things up, and I don't think it's obvious that using explicit communication would actually be faster. Most memory operations don't involve any sharing, and having the sharing that is required happen automatically seems efficient.
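To make #1 concrete, here is a toy Python model of two cores with private write-back caches and no coherence protocol. The `flush` and `invalidate` methods stand in for the explicit cache-manipulation instructions proposed in #1; the names and the whole model are hypothetical, a sketch rather than any real ISA. It shows where the burden lands once hardware stops doing it: the consumer keeps reading a stale value until the producer writes back and the consumer invalidates.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Shared main memory: address -> value (toy model)."""
    cells: dict = field(default_factory=dict)


@dataclass
class Core:
    """A core with a private write-back cache and NO coherence protocol (hypothetical)."""
    mem: Memory
    cache: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)

    def load(self, addr):
        if addr not in self.cache:              # miss: fill the line from memory
            self.cache[addr] = self.mem.cells.get(addr, 0)
        return self.cache[addr]                 # hit: may be stale, nobody tells us

    def store(self, addr, value):
        self.cache[addr] = value                # write-back: memory is not updated yet
        self.dirty.add(addr)

    def flush(self, addr):
        """Hypothetical explicit write-back instruction."""
        if addr in self.dirty:
            self.mem.cells[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Hypothetical explicit invalidate: drops the line, discarding unflushed writes."""
        self.cache.pop(addr, None)
        self.dirty.discard(addr)


def demo():
    mem = Memory()
    producer, consumer = Core(mem), Core(mem)
    consumer.load(0)             # consumer caches the initial value 0
    producer.store(0, 42)        # lands only in the producer's cache
    stale = consumer.load(0)     # no coherence: consumer still sees 0
    producer.flush(0)            # software must push the data out...
    consumer.invalidate(0)       # ...and drop the stale copy on the other side
    fresh = consumer.load(0)     # now the miss refills 42 from memory
    return stale, fresh
```

`demo()` returns `(0, 42)`. Every shared write needs a matching flush/invalidate pair in the right order, and an invalidate issued before the flush silently loses data.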

Getting rid of virtual memory is potentially a big win, especially for architectures where you can't make the L1D cache virtually indexed but physically tagged. And in general there are a lot of special cases you don't even have to think about if different memory addresses can't alias to the same memory. You do lose out on a lot of software tricks there, though.
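A minimal sketch of the aliasing that virtual memory permits, assuming a Unix-like system: Python's standard `mmap` module maps the same file page twice, so two different virtual addresses refer to the same physical memory, and a store through one mapping is visible through the other. This is the case that the comment says goes away without virtual memory.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def alias_demo() -> bytes:
    """Write through one mapping, read back through a second mapping of the same page."""
    with tempfile.TemporaryFile() as f:
        f.truncate(PAGE)                    # back the mappings with one zero-filled page
        a = mmap.mmap(f.fileno(), PAGE)     # first virtual mapping (MAP_SHARED by default on Unix)
        b = mmap.mmap(f.fileno(), PAGE)     # second, distinct virtual mapping of the same page
        try:
            a[0:5] = b"hello"               # store through a...
            return b[0:5]                   # ...is visible through b
        finally:
            a.close()
            b.close()
```

Calling `alias_demo()` returns `b"hello"`: hardware (e.g. a virtually indexed cache) and compilers have to allow for this kind of aliasing whenever virtual memory exists.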

Finally I run into someone who shares my exact pet peeves!

Well, the Reduceron seems to count as an example. I'm not sure I'm convinced about its performance, though.

https://www.cs.york.ac.uk/fp/reduceron/

That's specialized for just one language, though. In general you can always speed things up, sometimes by quite a bit, if you're willing to make your general purpose computer somewhat less general purpose.

Some of what the Mill folks are doing with hardware assisted stack operations might fall under the category of higher level instructions but those are for C just as much as any other language.

https://millcomputing.com/

The comments following this article (which span a period from 2008 to 2015!) are also very interesting.

And interesting comments from 2008 and 2015 here: https://hn.algolia.com/?query=The%20“high-level%20CPU”%20cha...

