Hacker News
Why does the 80486 take longer to execute simple instructions than complex ones? (stackexchange.com)
64 points by segfaultbuserr 15 days ago | 19 comments

This reminds me of my teacher in micro architecture who mocked the insanity of the Pentium 4 NetBurst pipeline. It's hard to believe today that Intel fully expected and was committed to making that stuff run at upwards of 10 GHz. Meanwhile, AMD offered tongue-in-cheek Prescott Survival Kits¹ due to, uh, thermal hazards. Those were the days. :-)

¹ https://i.imgur.com/BYtbfDX.jpg

Those "Extreme Edition" CPUs[0] were nicknamed "Emergency Edition" for a reason.

[0] https://en.wikipedia.org/wiki/Pentium_4#Gallatin_(Extreme_Ed...

>"This reminds me of my teacher in micro architecture who mocked the insanity of the Pentium 4 NetBurst pipeline."

Could you elaborate on what exactly was so insane about the NetBurst microarchitecture?

Not the same person but I assume it is related to the issues discussed under "Scaling up issues" here:


Life was so simple back in the 1970s, when I began to program. Processor manuals for CPUs had a table of instruction timings, and you could just add up those times for each instruction and figure out exactly how long a code segment would take to execute. No branch prediction failures, no cache misses, no pipeline quirks...
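
To make the "just add up the table" arithmetic concrete, here's a minimal sketch. The mnemonics, cycle counts, and clock rate are made up for illustration and are not taken from any real processor manual.

```python
# Hypothetical timing table: mnemonic -> clock cycles.
# These values are illustrative placeholders, not real manual figures.
TIMINGS = {"LOAD": 7, "ADD": 4, "STORE": 7, "DEC": 5, "JUMP": 10}


def total_cycles(program, timings=TIMINGS):
    """Sum the fixed per-instruction cycle counts for a straight-line program."""
    return sum(timings[mnemonic] for mnemonic in program)


def execution_time_us(program, clock_mhz, timings=TIMINGS):
    """Cycles divided by clock rate in MHz gives the run time in microseconds."""
    return total_cycles(program, timings) / clock_mhz


if __name__ == "__main__":
    snippet = ["LOAD", "ADD", "STORE"]
    print(total_cycles(snippet), "cycles")
    print(execution_time_us(snippet, clock_mhz=2.0), "microseconds")
```

This only works because every instruction has one fixed cost; once caches and branch prediction enter the picture there is no single number to put in the table.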

Also made it really easy to time your glitch attacks against “secure” MCUs ;)

Yes it seems everything is inside out now. Used to be everything had to be aligned to perform well (or at all). Now, you want to be unaligned in many cases, at least as far as cache lines are concerned, to avoid every malloc returning a block that maps to the same cache lines.
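
A rough sketch of why that happens, assuming a simple set-indexed cache with 64-byte lines and 64 sets (both numbers are assumptions for illustration, real caches vary): blocks spaced at a power-of-two stride all land in the same set, while a slightly offset stride spreads them out.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 64    # number of cache sets (assumed)


def cache_set(address, line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Set index for an address in a simple modulo-indexed cache."""
    return (address // line_size) % num_sets


def sets_touched(base, stride, count):
    """Distinct cache sets hit by `count` blocks starting at `base`, `stride` bytes apart."""
    return {cache_set(base + i * stride) for i in range(count)}


if __name__ == "__main__":
    # 4096 = LINE_SIZE * NUM_SETS, so every block maps to the same set.
    print(len(sets_touched(0, 4096, 16)))
    # Padding each block by one extra line spreads them over different sets.
    print(len(sets_touched(0, 4096 + LINE_SIZE, 16)))
```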

And there are so many registers, it's often possible to put whole algorithms into registers instead of stack or memory. I wonder if the age of the 'stack' should come to an end?

Couldn't the transistors required for those extra jobs be used for extra cores instead? Would that be faster or slower?

It doesn't really work like that. It is ridiculously easy to stamp out ALUs on a die, but the problem is keeping those ALUs fed with useful data. And memory just can't keep up with modern CPU clock speeds. Modern CPUs use cache hierarchies to put off accessing main memory for as long as possible, and even then, they use a lot of transistors to deal with the fact that even on-chip caches are slow compared to the ALUs. GPUs instead use those transistors to keep as many threads running as possible, treating a request to go to main memory as an event to switch to another bank of threads while it waits for the memory request to come back.
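
A back-of-the-envelope model of that thread-switching trick, as a sketch only: assume each thread alternates between a fixed burst of compute and a fixed memory stall, and that switching threads is free. The cycle counts below are invented for illustration.

```python
def alu_utilization(threads, compute_cycles, memory_latency):
    """Fraction of time the ALU is busy when stalled threads are swapped out.

    Toy model: each thread computes for `compute_cycles`, then waits
    `memory_latency` cycles; other threads run during the wait.
    """
    busy_fraction_per_thread = compute_cycles / (compute_cycles + memory_latency)
    return min(1.0, threads * busy_fraction_per_thread)


def threads_to_hide_latency(compute_cycles, memory_latency):
    """Smallest thread count that keeps the ALU fully busy in this toy model."""
    period = compute_cycles + memory_latency
    return -(-period // compute_cycles)  # integer ceiling division


if __name__ == "__main__":
    # Assumed numbers: 10 cycles of work per 190-cycle memory stall.
    print(alu_utilization(1, 10, 190))
    print(threads_to_hide_latency(10, 190))
```

With one thread the ALU mostly sits idle; with enough threads in flight the stalls overlap and the ALU stays fed, which is the trade the GPU makes instead of big caches and out-of-order logic.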

We do have processors that use their silicon to implement a bunch of simple, consistent cores that do simple operations; the most common form is the GPU. They can run parallel operations much more efficiently, and theoretically perhaps most or all programs could be implemented in a way that would be more efficient on those processors. But certainly the fastest way that's been found to run existing C-style code is these very complex execution pipelines; extra cores are not the limiting factor for most workloads.

This is relevant for me as I'm making a game using a codebase from the mid 90s designed for the 486 (a mix of C and assembler). I assume most of this was known and optimized for at the time but something to consider as I add features to keep performance.

You can find the i486 programmer's reference manual easily. It contains, in the part documenting all the instructions, the number of cycles they take. (That's kind of a simplification, but in the context of the in-order, small pipeline, and not-superscalar i486, not by much.)

Oh excellent, I'll check that out. Been a while since I read a microprocessor book.

Sometimes with these incredibly specific SO questions that have incredibly detailed and knowledgeable answers I start wondering what the likelihood is that they happened organically vs the likelihood that they were teed up Quora style to improve content quality.

> the 80486 is not a RISC CPU

My immediate thought when I saw the title was: "hey that is a CISC CPU, right?". I only get the chance to have the CISC/RISC dialogue once every couple of years. Most IT folks don't bother with chatting about CPU architecture any more. These were some fun discussions to have back in the day. With the price and speed that CPUs have nowadays these discussions feel redundant.

Not just price and speed. The original motivation behind RISC was to make a chip that was simple enough to fit all the functionality onto one die. However, chips have been on a single die for many years, so that's no longer an advantage.

The actual instruction set design is much less (but still) important these days. There aren't many "pure" RISC designs left because the universal demand for accelerated encryption, decompression, vectorization, etc. outweighs the ideological need for a pure design.

IMHO what's more relevant today is having an instruction set that compilers can fully exploit, rather than having an instruction set that's optimized for human understanding. In other words, it's no longer a decision between complex and simple.

> There aren't many "pure" RISC designs left

True, from the point of view of the "user" (i.e., the compilers). Internally, most CISC machines nowadays are essentially RISC machines with a CISC-to-microcode translator, so you could also say that there aren't many "pure" CISCs left either :)
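
A toy illustration of that translation step, assuming a made-up three-field instruction format and invented micro-op names (real micro-op encodings are internal to each vendor and not modeled here): an instruction with a memory operand gets cracked into separate load, compute, and store steps.

```python
def is_memory_operand(operand):
    """Treat anything written as [name] as a memory reference (toy syntax)."""
    return operand.startswith("[") and operand.endswith("]")


def crack(instruction):
    """Split one (op, dst, src) instruction into register-only micro-ops.

    Hypothetical scheme: memory sources become a load into a temporary,
    memory destinations become load + op + store.
    """
    op, dst, src = instruction
    micro_ops = []
    if is_memory_operand(src):
        micro_ops.append(("load", "tmp_src", src))
        src = "tmp_src"
    if is_memory_operand(dst):
        micro_ops.append(("load", "tmp_dst", dst))
        micro_ops.append((op, "tmp_dst", src))
        micro_ops.append(("store", dst, "tmp_dst"))
    else:
        micro_ops.append((op, dst, src))
    return micro_ops


if __name__ == "__main__":
    for uop in crack(("add", "[counter]", "eax")):
        print(uop)
```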

Indeed, with macro-op or micro-op fusion in the microarchitecture, some of the performance costs are amortised.

Hit me up anytime you'd like to have an architecture convo, esp. a CISC/RISC discussion.

Keybase in profile, but lmk if a different medium would be more palatable.
