Oh, I just realized that Roger Espasa, the presenter of this talk, was one of the authors of the Alpha Tarantula paper. Lovely paper back in the day; it showed the advantages of a vector ISA over spending the transistors on ever-wider superscalar and deeper OoO designs.
And he works for Esperanto, which was founded by Dave Ditzel, one of the authors (together with David Patterson) of 'The Case for the Reduced Instruction Set Computer' in 1980.
These guys know what they are up to; this Vector extension will be awesome.
Really hope they go after server CPUs once they get their feet wet with the ML supercomputer-on-a-chip they are trying to sell now. Shouldn't be that hard, because there already is a general-purpose CPU core in their product. It would need some extensions, but nothing they can't handle (vector unit, crypto).
They could get a general-purpose RV64GCV out there; that would be awesome.
I hope so too, but I'm not holding my breath. Competing with high-end x86 or POWER for maximum single-thread performance is, as far as I understand it as a layman, incredibly expensive. Or if by server CPUs you mean something like the recently announced Qualcomm Centriq, competing mostly on throughput/W/$ while still having decent single-thread performance, that's maybe a step easier but still very expensive.
Now, if RISC-V really takes off, I guess at some point we'll see such chips. But "maximum single-thread performance" is probably the last bastion to be attacked.
Totally agree. You don't need "maximum single-thread performance" to be useful for many server workloads. If the energy cost is lower, container infrastructure might want to move over.
Let's hope that RISC-V can overtake ARM on this. Seems to me that Esperanto could build something along the lines of a Centriq as well if they wanted to get into this.
I guess they see more money in ML right now, but it's good that ARM is pushing into that market and showing the viability of lower-power chips.
It does look quite GPU-like, so one wonders what the "secret sauce" is that makes customers willing to pay a premium for it. One advantage could be a programming model that exposes the vector ISA directly, rather than hiding it behind a SIMT model as in CUDA or OpenCL. Of course, then the question is how to manage all those 4096 minion cores: a mix of MPI and OpenMP? Another advantage could be a single memory domain shared by the "maxion" and "minion" cores, so there's no need to transfer data back and forth between main and GPU memory.
Taken together, the programming model could be closer to a "standard" one, maybe making it easier to reuse existing code. Then again, nowadays there are probably more people familiar with CUDA programming than with programming parallel vector supercomputers.
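For anyone wondering what "exposing the vector ISA directly" looks like compared to SIMT: instead of launching one thread per element, you write an ordinary loop that strip-mines the data, asking the hardware on each pass how many elements it will handle. RISC-V V does this with the vsetvli instruction, which is what lets the same binary run on a narrow or a wide vector unit. Below is a rough Python model of that pattern, not real RVV code; `HYPOTHETICAL_VLMAX` and `setvl` are made-up stand-ins for the hardware limit and for vsetvli.

```python
# Hypothetical hardware limit: the most elements one vector instruction can
# handle on a given core. Real RISC-V V code gets this at run time from the
# vsetvli instruction; this constant is an assumption for illustration only.
HYPOTHETICAL_VLMAX = 8


def setvl(remaining: int) -> int:
    """Stand-in for vsetvli: request `remaining` elements and return how
    many the (imaginary) hardware will process in this pass."""
    return min(remaining, HYPOTHETICAL_VLMAX)


def saxpy(a: float, x: list, y: list) -> list:
    """Return a*x + y, processed in strip-mined, vector-length-agnostic chunks."""
    out = list(y)
    start, remaining = 0, len(x)
    while remaining > 0:
        vl = setvl(remaining)                # elements handled this pass
        for i in range(start, start + vl):   # models one vector instruction
            out[i] = a * x[i] + out[i]
        start += vl
        remaining -= vl
    return out


if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    ys = [1.0] * 20
    print(saxpy(2.0, xs, ys)[-1])  # 39.0
```

Note that the loop never hard-codes the vector width: changing `HYPOTHETICAL_VLMAX` (say, a 1-lane minion versus a wider unit) changes the number of passes but not the result.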
My impression is that the minion cores will be full RV64GCV but very simple, like Rocket but with a vector unit. Also that the whole thing acts like a 4112-core machine with everyone in the same memory space (did he say that?). So you get 16 cores with awesome performance and 4096 that are slower on normal code but vector-capable. You'll want to make sure the right threads run on the right cores, but otherwise it sounded like a massive multicore CPU.
I was happy to see that they've simulated using the minions to do graphics. I've suggested on a number of occasions that the lack of a GPU for RISC-V could be a problem, and that just using a bunch of RV cores with llvmpipe might do the job well enough for desktop use. I think they've shown that possibility in simulation. It may not be a gaming machine, but it should make a nice desktop machine with solid desktop-compositing performance. Oh, and it'll have teraflops of performance for ML and other things.
As for me, my hobby is ray tracing, which really wants to run on general-purpose CPUs. Vector instructions can help a bit, but 4096 cores would be spectacular.
Also, a decent desktop experience might be had with only 4-8 big cores and 16-64 minions doing GFX. I see potential for a lot of variants of that thing. He said it will be licensable.
> My impression is that the minion cores will be full RV64GCV but very simple - like rocket but with a vector unit.
Yes, I think they must be very simple. Otherwise they couldn't fit 4096 of them, even on 7 nm. I'm quite sure it's a scalar CPU (er, non-superscalar, to be clear), with a 1- or maybe 2-lane vector unit.