
Seventh RISC-V Workshop: Day Two - rbanffy
http://www.lowrisc.org/blog/2017/11/seventh-risc-v-workshop-day-two/
======
alexibm
What bothers me is that all the digital design tools are freaking expensive
and closed source. For example, building RISC-V is possible on an FPGA board,
but you have to use Xilinx tools (or some other vendor's).

~~~
FullyFunctional
Both Xilinx and Altera^H^H^H^H^H^HIntel provide gratis tools for their lower-
end FPGAs, but those are plenty to implement an SoC, even one that can boot
Linux. In particular, the Cyclone V A9 is a pretty impressive FPGA. Sadly,
Intel decided to _not_ provide gratis support for their Cyclone 10 GX :(

------
Klasiaster
I am amazed to see so much activity around open hardware. If the idea of
collaboration is really taken up, this can be a success and a realistic way
out of the current IP misery with only closed hardware.

~~~
wyldfire
I don't think we should overstate (or understate) the significance of RISC-V.
Without a doubt, industry is putting lots of time and effort towards RISC-V.

Everyone who manufactures electronic devices that include some kind of CPU
(hard drive manufacturers, cell phone manufacturers, GPU manufacturers, etc.)
is paying some royalty to someone for that CPU, even if they designed it
themselves. Eliminating those royalties is a great way to get ahead because
you can shave that $1 off your per-unit cost -- if you're shipping 100k or 10
million devices this year, that's definitely worth the design effort. It's
this marketplace that is fueling RISC-V.

> ... this can be a success ...

I have no doubt that RISC-V will succeed in the microcontroller/embedded
device space.

> ... realistic way out of the current IP misery with only closed hardware ...

I wonder, are you referring more to frustrations with things like Intel's ME
or frustrations with devices that ship without documentation/open source
software details? I fear that RISC-V will likely not change much in either
space. Yes, it will lower the bar for new devices that can be fully open. And
with fully open devices we could hopefully audit and trust them more. But I
think change here will be very slow. The ultimate consumer of these devices
just doesn't care much about fully open devices and it's extra work to
shepherd open source software/hardware. I hope I'm just being too pessimistic.

It's true that there are folks like SiFive who are making boards that
integrate RISC-V cores. They seem to be ramping up to make general-purpose
computer boards that would be pretty useful. But I doubt we'll see them end up
in mass-market desktop/laptop/server/tablet/etc. devices any time soon.

~~~
pedroaraujo
That $1 you pay to other IP companies, like Arm, gives you access to a fully
verified design with support from the best engineers in the field. And those
companies want you to succeed, because they will only collect those royalties
if your device reaches the market.

By saving that $1 and not buying IP from others, you will need to invest in
time, engineers, know-how, support, verification, and so on, all by yourself.

It's not necessarily a good cost-benefit trade-off.

~~~
nickpsecurity
The IP that academics keep making on the cheap is always compared to ARM cores
of a similar class to prove it out, and it always comes out superior. So I'm
doubting the best-engineers-or-product claim. Since ARM does have good
engineers, they must be intentionally underdoing their designs to maximize
their cash or something. Maybe to make a lot of them. I don't know how much IP
they actually have.

I agree it's beneficial that the IP is already silicon-proven by the time most
customers get it. The biggest selling point, though, was what ARM themselves
said when the open-ISA topic first came up: the massive ecosystem, especially
software and dev tools, makes the licensing worth the money. I'm not sure how
true that is; it probably varies a lot depending on who is using it, but it
might be true for many. RISC-V needs to get its ecosystem super-strong to
challenge ARM in a general sense, on top of pre-proven IP customers can trust.
The ecosystem is the bigger issue, though.

------
jabl
Some discussion about day 2 in yesterday's thread about day one:
[https://news.ycombinator.com/item?id=15811229](https://news.ycombinator.com/item?id=15811229)

To cross-pollinate further, I found yesterday's presentation about Esperanto
particularly fascinating (or to be fair, it's close to my own interests, not
that it's objectively better or worse than anything else). So let me speculate
a bit.

It'll be at least a year or so before they have something on the market. Given
that NVidia's Volta claims 7 DP TFLOPS, let's say Esperanto is targeting 16
TFLOPS in order to have a competitive product at launch. Further, let's guess
a target clock of 2 GHz. To reach that performance they would thus need
16e12 / 2e9 / 2 = 4000 DP FP execution units (the extra divisor of 2 is due to
an FMA being counted as 2 flops). That matches pretty well with their 4096
minion cores having 1 DP FP unit each. Now, they also say the minion cores
have the RISC-V vector extension rather than being scalar cores, and the V
extension says that the minimum vector length is 4. That implies executing a
vector arithmetic instruction takes up at least 4 (consecutive) issue slots, a
bit like old-school pipelined vector supercomputers (think: Cray-1). So the
primary purpose of the vectors is not to get wider execution width, but to
amortize instruction execution overhead (fetch, decode, etc.) and to drive
memory-level parallelism, like in old-school vector supercomputers.
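
The arithmetic above as a checkable snippet (the 16 TFLOPS target and 2 GHz
clock are my guesses, as stated):

    # How many DP FP (FMA) units are needed to hit a target FLOP rate?
    target_flops = 16e12   # assumed 16 DP TFLOP/s to be competitive at launch
    clock_hz = 2e9         # assumed 2 GHz target clock
    flops_per_fma = 2      # one fused multiply-add counts as two flops

    fp_units = target_flops / clock_hz / flops_per_fma
    print(f"DP FP units needed: {fp_units:.0f}")  # -> 4000, ~4096 minions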

Further, it is mentioned that each minion core has several hw threads. For the
sake of argument, let's make that 4. Now, let's look at the aggregate size of
the vector register files. Each VRF has 32 registers. Since they claim to be
targeting HPC as well, and not only ML/AI, let's assume that means they
support double-precision FP, so the maximum vector element size is 64 bits
(8 bytes). And as mentioned before, the maximum vector length must be at least
4. Thus, at a minimum, each VRF is 32 * 4 * 8 = 1 kB, and the total size of
the register files with 4096 minions and 4 hw threads/minion is at least
4096 * 4 * 1 kB = 16 MB. With a vector length of 8, that becomes 32 MB; with
8 hw threads/minion, also 32 MB; with both vector length 8 and 8 hw threads,
64 MB. That already starts to be a pretty huge number, even on 7 nm. So I
wouldn't be surprised if the vector units bypass the caches and go directly to
memory. Oh, and you'll need absolutely gargantuan memory BW to feed this
thing.
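
The same sizing math over all four combinations (the vector lengths and hw
thread counts are my guesses; only the 32 registers and 8-byte elements follow
from the assumptions above):

    # Aggregate vector register file size for each guessed configuration.
    regs_per_vrf = 32    # registers per vector register file
    element_bytes = 8    # double-precision elements
    minions = 4096       # minion core count from the talk

    for vlen in (4, 8):            # guessed maximum vector lengths
        for hw_threads in (4, 8):  # guessed hw threads per minion
            vrf_bytes = regs_per_vrf * vlen * element_bytes
            total_mb = minions * hw_threads * vrf_bytes / 2**20
            print(f"vlen={vlen}, threads={hw_threads}: "
                  f"{vrf_bytes} B per VRF, {total_mb:.0f} MB total")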

And how is this thing supposed to be programmed? Vectorizing compilers,
obviously, but what then? OpenMP? Message passing between the 16 fat cores,
with each fat core running the main thread and farming out to OpenMP threads
running on 4096/16 = 256 minion cores?

Anyone care to confirm, deny, or at least poke holes in the argument above?

