Highlights: The version of BOOM in the paper (two years ago) had the same IPC as a Cortex-A15 in half the die area on a process node two generations older while using compiler ports which were immature at the time (and still probably have plenty of room to improve). Most impressively of all, it was designed and developed by three people (Chris Celio, David A. Patterson, Krste Asanović), two of whom (David and Krste) were not primarily focused on it.



Hi, author here. Just to point out to others (and to be fair to the A15), IPC is just one part of the final performance equation. =)
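For anyone who wants the "performance equation" spelled out: runtime is instruction count divided by (IPC × clock frequency), so equal IPC only means equal performance when the clocks and instruction counts also match. A minimal sketch follows; the function name and all the numbers are made up for illustration and are not measurements of BOOM or the A15.

```python
def runtime_seconds(instruction_count, ipc, clock_hz):
    """Execution time = instructions / (IPC * clock frequency)."""
    cycles = instruction_count / ipc   # cycles needed at the given IPC
    return cycles / clock_hz           # seconds at the given clock rate

# Hypothetical values, purely illustrative.
insts = 1_000_000_000
core_a = runtime_seconds(insts, ipc=1.5, clock_hz=1.0e9)
core_b = runtime_seconds(insts, ipc=1.5, clock_hz=2.0e9)

# Same IPC, same instruction count: the faster clock wins outright.
speedup = core_a / core_b
print(f"core_a: {core_a:.3f} s, core_b: {core_b:.3f} s, speedup: {speedup:.1f}x")
```

With IPC and instruction count held fixed, the speedup is just the ratio of the two clock frequencies.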


Chris, I worked hard to get you on that pedestal, don't go jumping off. ;- )

Fair enough, you didn't reach the same frequencies, but that's what the other 1.4mm² and the process shrink are for.

ARM[v7] maybe does a little bit more per instruction, what with those conditions and 14+ character non-mnemonic mnemonics; but ultimately instruction counts should be pretty close, right?

Update: also probably SIMD[or vectors], breakpoints, more interesting memory management, the handling of bizarre FP corner cases, maybe power management[high frequency dvfs? :- )], and other things go in that additional 1.4mm².


> Chris, I worked hard to get you on that pedestal, don't go jumping off. ;- )

O:-)

> ARM[v7] maybe does a little bit more per instruction, what with those conditions and 14+ character non-mnemonic mnemonics; but ultimately instruction counts should be pretty close, right?

Great question. I just so happen to have written a tech report on this very topic! https://arxiv.org/abs/1607.02318

Basically, performance should be identical between ARMv8 and RISC-V, given the RISC-V core implements macro-op fusion to combine things like pair loads together.
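To make the pair-load case concrete, here is a toy sketch of the idea: a decoder-like pass that merges two adjacent `ld` instructions off the same base register into one macro-op. The tuple encoding, the `ld_pair` name, and the fusion conditions are assumptions for illustration only; this is not how BOOM or any real core implements it.

```python
def fuse_load_pairs(instrs):
    """Fuse adjacent 8-byte loads from the same base register into one macro-op.

    Each load is a tuple (op, rd, base, offset); this encoding is hypothetical.
    """
    fused = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (nxt is not None
                and cur[0] == "ld" and nxt[0] == "ld"
                and cur[2] == nxt[2]        # same base register
                and nxt[3] - cur[3] == 8    # consecutive doublewords
                and cur[1] != cur[2]):      # first load must not overwrite the base
            fused.append(("ld_pair", (cur[1], nxt[1]), cur[2], cur[3]))
            i += 2                          # consumed both loads
        else:
            fused.append(cur)
            i += 1
    return fused

program = [
    ("ld", "a0", "sp", 0),
    ("ld", "a1", "sp", 8),
    ("add", "a2", "a0", "a1"),
    ("ld", "a3", "sp", 16),
]
macro_ops = fuse_load_pairs(program)
print(len(program), "instructions ->", len(macro_ops), "macro-ops")
```

The instruction stream stays plain RISC-V; only the count of operations the backend has to track goes down.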


> Great question. I just so happen to have written a tech report on this very topic! https://arxiv.org/abs/1607.02318

> Basically, performance should be identical between ARMv8 and RISC-V, given the RISC-V core implements macro-op fusion to combine things like pair loads together.

Yeah, I drew my conclusions from your papers. :- )

I really should diversify my sources, I bring nothing to this exchange.


Wasn't that comparison totally rigged by omitting a bunch of stuff (e.g. SIMD) from the BOOM core that wasn't used by a particular benchmark?


I was using Coremark in that report, which is what ARM used to market their cores. The 32b ARMv7 cores have NEON SIMD, but I believe no FMAs, whereas BOOM is 64b and includes double-precision FMA units.

Also there's no current RISC-V extension for SIMD or vector ops, so I didn't maliciously "omit" things to "totally rig" the comparison. But even running SPECint is not going to fire up the SIMD unit.


Uhhh yes ARMv7 NEON has vector FMAs.

SPEC with modern compilers will definitely use SIMD, utilization on hot regions is variable but it's definitely beneficial.


Not all ARMv7 cores have FMA, only the newer models.

Models with FMA: Cortex-M4, Cortex-M7, Cortex-A5, Cortex-A7, Cortex-A15, Cortex-A17. I do not remember which Cortex-R have FMA. Models without FMA: Cortex-A8, Cortex-A9.



