Ask HN: Why are older processors slower at the same frequency?
32 points by etamponi on Sept 24, 2022 | 23 comments
I am a software engineer so I am pretty knowledgeable in the topic of computers in general, but this specific question continues to bother me. Why does a processor from 2015 at 2.5GHz run slower than a processor from 2022 at 2.5GHz? What should I look at specifically? Is the difference reported somewhere?

In general: how can I tell when I need to replace my processor with a new one (without needing to manually compare the new and old one...)?

I think looking at the GHz and number of cores is not enough anymore.




Modern CPUs can look ahead in the instruction stream and run 4 or more instructions simultaneously. This isn't easy: you have to respect data dependencies, where one instruction depends on the output of a previous one. When something depends on a load instruction that missed in the cache, CPUs can keep going farther ahead and do some other work while waiting for the load to complete.

This "speculative out-of-order execution" requires a huge number of transistors to consider various combinations of future instructions it might be able to execute every clock cycle, and burns some extra power doing that. So although most of the basic ideas were known by the late 90s, adding more transistors in every generation lets it do more and more in parallel.

Also, faster and larger caches cause fewer stalls.

Also, modern cores are better at predicting branches, so they can start executing instructions past a branch before knowing which way the branch will go. If the core guesses wrong about the branch, it has to undo the results of some instructions, so it adds a lot of complexity to track each side effect that might need to be canceled.

Also, SIMD parallelism has gotten much better. Some modern cores can do 8 floating point operations per cycle using AVX2 or Neon. While older SIMD systems had very limited instruction sets, you can do a lot with modern ones. x86 SIMD instructions can process 32 bytes at a time. With a great deal of cleverness, you can do some byte stream operations in less than one cycle per byte. See https://arxiv.org/abs/2010.03090

GPUs generally do 32 parallel floating point operations per core per cycle, with hundreds of cores.

Also, main memory is gradually getting slightly faster and wider.

Lastly, more cores are good. Back when the most cores you could get was 4, it was barely worth writing parallel software because all the locking slowed things down almost as much as the 4 cores sped things up. But high-end Xeons can have 40+ cores, which makes it worth the hassle of writing parallel code. And GPUs have 1000s of cores, so it's worth a lot of complication to make use of them.


In a word: IPC, instructions per clock cycle. Even in a simple, non-pipelined, strictly in-order processor, some operations take more cycles to complete than others: division almost always takes much longer than addition, although some processors simply don't divide or multiply, so every instruction can run at the same speed.

Beyond that, cache and memory interfaces are important too; instructions don't run without data, and while early computers often had synchronous RAM as fast as the CPU clock, that's not possible anymore: data that's not in registers takes time to load and store. Superscalar designs, out-of-order execution, etc. mask some of that, but not all of it.


>I think looking at the GHz and number of cores is not enough anymore.

It never was. Think of the whole Pentium 4 vs Centrino era, or PowerPC vs x86.

>I am a software engineer so I am pretty knowledgeable in the topic of computers in general, but this specific question continues to bother me.

I think this pretty much sums up the modern-day software engineer in the industry: a lack of knowledge of hardware. Everything is so abstracted that most people think of it as someone else's problem. Until now, when Moore's law is finally dead.


To be honest, I don't think this is entirely true. I'd say it's more that actually _experimenting_ with and _learning_ these kinds of things in the field requires very specialized knowledge and a lot of time. I'd be more than happy to learn a programming language (or some other, similar thing) that helps me understand how IPC works and the effects of caching.

The fact that processor manufacturers have put all of this under a big "implementation details" sign makes it very difficult to develop effective knowledge.


Imo this is why it's useful to distinguish an engineer from a developer. A developer can pick up a high-level abstraction easily without ever needing to think about low-level issues, and then comfortably stick to that silo for many years.


^ this. This is absolutely horrifying.


Actual performance is based on two things: clock speed, and how much useful work the CPU design can do per clock cycle. Improvements in the former have mostly stopped, so now most of the improvements happen in various aspects of the latter.

The best comparison is and has always been benchmark results.


Like other comments said, there are a lot of factors that go into how much work gets done per clock cycle. One of the more interesting areas here is the optimization of the instructions themselves - Agner Fog has a great document[0] comparing the performance of common x86 instructions across multiple CPU generations. It makes it really easy to see how great the early Ryzen chips were despite their low clock speeds.

[0] https://www.agner.org/optimize/instruction_tables.pdf


Because a CPU is more complex than that, you also have to look at the other parameters, like the RAM clock (that's handled by the CPU), the instruction set, the other buses, etc… obviously assuming we are comparing the same x86 architecture and not ARM vs x86 or some other RISC vs CISC matchup.

Using only the clock is like comparing two cars using only the horsepower figure, without the weight, aerodynamics, frame, etc…


Newer CPUs have more transistors, meaning they can do more things, and the more things you can do at once, the faster you can be. "More things" might be more complex instructions, more cache memory, or more things attempted at one time.

Now, when to replace... I suppose that depends entirely on when you feel like it is too slow. At that point, evaluate whether a newer CPU would help.


If you want to compare CPUs, run benchmarks like Geekbench instead of comparing a single metric.


Mostly because of Instruction Level Parallelism. While the number of cycles per second has not increased since 2015, the number of instructions per cycle has more than doubled due to more efficient architectures and duplicated circuits. So a core from 2015 could perhaps run 1.5 instructions per cycle, but one from 2022 could run 8 instructions per cycle and thus would be more than four times faster (at least on some tasks).

You cannot tell when you need to replace your old processor with a new one without manually comparing them. Performance is much too workload-specific these days.


The keyword to research here is ILP (Instruction Level Parallelism) - https://en.wikipedia.org/wiki/Instruction-level_parallelism

There are a whole bunch of factors involved (as pointed out in the above link and by others in this thread) but the basic idea is to parallelize instructions using both Processor Micro-architectural and Compiler techniques i.e. how to get more done in a single clock cycle aka IPC (Instructions Per Clock cycle).


Besides what others have mentioned, cache sizes and speeds help a lot, and I/O speeds in general (mostly in that some machines from 2015 might still have an HDD; not too many do these days).


Look up a computer architecture book or course materials that discuss the evolution of CPU designs and how we got from multiple clock cycles per instruction to several instructions per cycle and several data items per instruction: pipelining, superscalar execution, vector processors, out-of-order execution, VLIW, simultaneous multithreading, SIMD, symmetric multiprocessing, branch prediction, memory caching, etc.


throughput

newer CPUs have higher levels of parallelism, therefore having higher throughput, even at the same frequency

the parallelism can be achieved via vector instructions, out of order execution, along with other changes, like better or more caching

system performance as a whole doesn't just depend on the CPU though, a beefy CPU with shitty RAM or an HDD might be worse than a mid CPU with high-speed RAM and an SSD (even a SATA one)


Clock frequency and core counts were never really enough. When I benchmarked hardware for <ABigBank> years ago, the OS made quite a bit of difference as well. E.g. WinNT was maybe ~2x slower for I/O for a given MHz of CPU than SunOS or Linux, IIRC.


All of these have been around for a long time, but just like well-maintained software, 7 years worth of incremental improvements add up:

https://en.wikipedia.org/wiki/Superscalar_processor

https://en.wikipedia.org/wiki/Pipeline_(computing)

https://en.wikipedia.org/wiki/Single_instruction,_multiple_d...

https://en.wikipedia.org/wiki/Branch_predictor

https://en.wikipedia.org/wiki/Speculative_execution

https://en.wikipedia.org/wiki/Multi-core_processor

Since RAM (slow) and/or cache access is involved in nearly every step--which becomes increasingly complicated when trying to preserve cache coherency across multiple cores--improvements in the next two are a big deal:

https://en.wikipedia.org/wiki/Memory_management_unit

https://en.wikipedia.org/wiki/Cache_hierarchy

Executive summary: work smarter, not harder, and the ultimate measure of performance is performance (which may sound stupid, but it's in the textbook because it's true!).

I would also add that more cores aren't necessarily better. The utility depends upon the nature of the task & how memory-hungry it is. If the task is inescapably sequential, it doesn't really matter how many cores are on the die. Same story with parallelizable tasks that pound RAM: at any given time, one core is hogging the memory bus, and the rest are waiting their turn. They may take turns, but at any point in time, it's essentially single-core performance.

The place where multi-core really shines is when you have a highly parallelizable task, where each thread grinds really hard over a smallish data set that fits comfortably in the core's cache. In that case, you can definitely max out all cores. Though from what I see in the wild, that is a rare case.

A lot of the industry is really just gaming benchmarks at this point, which are, for the most part, bullshit. I think Apple will remain in very good shape on this front, if only because of their customers (i.e. normies instead of gamers; people who haven't fallen completely into the quantity cult). They will complain when it stops feeling fast (the only measure...), instead of taking it as a challenge & wasting their lives on overclocked water-cooling bullshit.


Process fabrication improvements can mean a basic CPU design gets faster with no redesign. Memory chips also get faster from the same improvements.


Fabrication improvements mostly mean that the processor will run cooler, so it can run at a higher frequency without burning up. They do not improve the IPC of that processor.


It's not always running cooler per se that makes the difference; other problems, such as electron tunnelling (we're getting well beyond my pay grade here; there's a lot more), can also be limited with a better hardware design (say, tapeout). But yes, you get better performance from being able to function at a higher frequency, either on a large die (with more heat, but heat wasn't necessarily the limiting factor previously) or on a smaller die where the parts are now closer together and so take less time and energy (heat) to communicate. A better tapeout geography can mean fewer cycles lost waiting for this or that info to get where it's going, and so actually increase useful IPC at least a bit.


Not to be an ass, but it's horrifying that we graduate and/or employ engineers who don't already know this, at least at a high level.


But you are an ass.

It is more horrifying seeing assholes like you disrespecting people for asking questions.

You can teach people about the history of CPUs, but it is much harder to teach them to respect others.



