
SweRV – An Annotated Deep Dive of the SweRV RISC-V Core - matt_d
https://tomverbeure.github.io/2019/03/13/SweRV.html
======
_chris_
_> When you have a look at the SweRV architecture and implementation, it is
not incredibly complex, especially compared to an out-of-order triple-issue
A-15 core. And yet it achieves similar or higher CoreMark/MHz scores._

FYI, CoreMark fits entirely within an L1 cache, so performance is
significantly affected by load-to-use and load-to-load delay. The "skewed"
pipeline design of SweRV helps it really shine here! The OOO cores typically
have worse load-to-use, and the OOO advantage typically shows up once you
start missing in the L1 cache.

The remaining significant factor in CoreMark performance is a good branch
predictor, and to that end, I'm very impressed that their gshare can
completely learn CoreMark.
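For readers unfamiliar with gshare: it indexes a table of 2-bit saturating counters by XORing branch address bits with a global history register. Here's a minimal toy sketch of the idea (parameters like the 10-bit history/table size are illustrative assumptions, not SweRV's actual configuration):

```python
# Toy gshare branch predictor sketch -- table size and indexing details
# are assumptions for illustration, not SweRV's real configuration.

class Gshare:
    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # 2-bit saturating counters (0..3), initialized to "weakly taken"
        self.table = [2] * (1 << index_bits)
        self.ghr = 0  # global history register of recent branch outcomes

    def _index(self, pc):
        # XOR the PC with global history -- the "share" in gshare
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2  # counter >= 2 means "taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

Because the history register disambiguates different iterations of the same branch, a predictor like this can learn short repeating patterns (e.g. a loop branch that is taken N times then falls through) essentially perfectly, which is what "completely learn CoreMark" amounts to.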

~~~
tyingq
Also, SweRV doesn't have an MMU, so a comparison to an A15 is a bit odd. They
wouldn't have the same use cases at all.

~~~
tverbeure
Author here.

I chose the A15 because it was shown on the WD performance slide with a score
that's close to the SweRV's.

The target use cases of the SweRV are obviously very different from the A15
(lack of MMU, lack of Dcache, lack of floating point will indeed do that).

But I didn't expect that adding those features would have a significant impact
on the peak performance of a benchmark that fits completely in a cache?

My comment about complexity is entirely about the core execution pipeline: OoO
vs in-order, number of pipes, number of ALUs. Even there, the A15 is
significantly more complex than the SweRV, yet it performs more or less the
same in best-case circumstances. I expected at least some benefit? Since that
is not the case, I assume that this complexity helps for non-ideal use cases
(which the SweRV will probably never experience).

Edit: as _chris_ points out: OoO helps once you start missing in the L1 cache.

~~~
tyingq
Makes sense. It popped into my head because any RISC-V story here usually gets
flooded with people hoping for an affordable RISC-V chip that will run Linux.
I figured the comparison might spawn some false hope :)

------
justasimpleman
What is this scattered design about?

[https://tomverbeure.github.io/assets/swerv/slides/13%20-%20S...](https://tomverbeure.github.io/assets/swerv/slides/13%20-%20SweRV%20-%20SweRV%20Core%20Phyiscal%20Design.png)

~~~
tverbeure
This is the result of a simulated annealing cell placement algorithm. 99%+ of
all modern silicon will look very similar at the same zoom level.

If your point of reference is the layout of an Intel CPU, don't forget that
those are on the order of 100mm2, where this layout is around 0.1mm2.

~~~
justasimpleman
That's incredibly fascinating. It seems counter-intuitive that a random
placement would be superior to a more regular one. Does this merely optimize
for minimal length of the datalines?

~~~
tverbeure
> Does this merely optimize for minimal length of the datalines?

That's how it used to be 25 years ago. It's probably still a factor in the
cost function, but the biggest part is timing. Of course, timing and length
are closely related.

> It seems counter-intuitive that a random placement would be superior to a
> more regular one.

It's not necessarily superior if you want to have optimal timing for all paths
between all cells. But you don't need that: only a few percent of all paths
are actually timing critical. Those determine the maximum clock speed. The
others have enough slack such that it doesn't matter that they are placed tens
of microns too far.

Random placement (at an even lower level of detail) is also better at avoiding
crosstalk. If you placed a bunch of driver cells in a nicely aligned stack
and ran their wires to a nicely aligned stack of receiving flip-flops with
parallel wires in between, you'd get the mother of all crosstalk problems.
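To make the annealing idea above concrete, here's a deliberately tiny toy sketch: cells are swapped between grid sites, the cost is half-perimeter wirelength, and worse moves are accepted with a temperature-dependent probability. (This is an illustration of the general technique only; real placers use timing-driven cost functions and far more sophisticated moves, as described above.)

```python
# Toy simulated-annealing placement sketch. Cost is pure wirelength;
# production placers weight timing much more heavily, per the discussion.
import math
import random

def wirelength(placement, nets):
    # Half-perimeter wirelength: bounding-box size of each net's cells.
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(cells, nets, grid, iters=20000, t0=5.0, seed=0):
    rng = random.Random(seed)
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(sites)  # start from a random placement
    placement = {c: sites[i] for i, c in enumerate(cells)}
    cost = wirelength(placement, nets)
    for i in range(iters):
        t = t0 * (1 - i / iters) + 1e-6  # simple linear cooling schedule
        a, b = rng.sample(cells, 2)
        placement[a], placement[b] = placement[b], placement[a]
        new = wirelength(placement, nets)
        # Accept improvements always; accept worse moves with probability
        # exp(-delta/T), which shrinks as the temperature cools.
        if new <= cost or rng.random() < math.exp((cost - new) / t):
            cost = new
        else:
            placement[a], placement[b] = placement[b], placement[a]  # undo
    return placement, cost
```

Running this on a handful of cells and random nets reliably drives the wirelength well below the initial random placement, and the resulting layout looks just as "scattered" as the slide in question, because nothing in the cost function rewards visual regularity.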

------
person_of_color
Awesome. Is NVIDIA working on any RV projects? If so, hiring?

~~~
childintime
NVIDIA has been using RISC-V in their internal "Falcon" controller for a few
years.

