New Libre GPU Effort Based on RISC-V (phoronix.com)
78 points by xvilka 3 months ago | 14 comments

I don't have particularly high hopes for this. "Start looking at HW" is their last step in the roadmap. GPU microarchitectures are all about being a reflection of the hardware, amortizing a lot of traditional CPU components across as many threads as you can, to such a degree that there are ISA implications.

EDIT: What would be cool would be a more traditional GPU architecture married to the RISC-V minion-style cores that already exist today but aren't open at all. There are a lot of very closed RISC processors in GPUs (even more closed than the shader units). AMD has a RISC core that reads and processes the command lists, among other things (https://github.com/fail0verflow/radeon-tools/tree/master/f32). PowerVR has one that does thread scheduling and argument marshaling (the Programmable Data Sequencer). Nvidia has its own cores, which it just switched to RISC-V.

Imagine having a custom command list tailored to your application, with benefits like those custom N64 RSP microcode provided.
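To make the idea concrete, here is a toy command processor in Python. Everything in it is hypothetical: the opcodes, the tuple encoding, and the `DRAW_SPRITES` command are made up for illustration and do not correspond to any real GPU's command format. It only shows the shape of the idea: an application-specific command that the command processor expands into ordinary draws itself, instead of the host CPU doing it.

```python
from dataclasses import dataclass, field

# Hypothetical opcodes, invented for this sketch.
SET_STATE = 0
DRAW = 1
DRAW_SPRITES = 2  # hypothetical application-specific command


@dataclass
class CommandProcessor:
    state: dict = field(default_factory=dict)
    emitted: list = field(default_factory=list)  # draws forwarded to the "shader units"

    def run(self, commands):
        for cmd in commands:
            op = cmd[0]
            if op == SET_STATE:
                _, key, value = cmd
                self.state[key] = value
            elif op == DRAW:
                _, first_vertex, count = cmd
                self.emitted.append((dict(self.state), first_vertex, count))
            elif op == DRAW_SPRITES:
                # The app-specific command is expanded on the command processor:
                # one 4-vertex quad per sprite, with no host CPU involvement.
                _, first_vertex, sprite_count = cmd
                for i in range(sprite_count):
                    self.emitted.append((dict(self.state), first_vertex + 4 * i, 4))
            else:
                raise ValueError(f"unknown opcode {op}")
        return self.emitted
```

For example, `CommandProcessor().run([(SET_STATE, "blend", "add"), (DRAW, 0, 3), (DRAW_SPRITES, 3, 2)])` yields one triangle draw followed by two quad draws starting at vertices 3 and 7.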

> GPU microarchitectures are all about being a reflection of the hardware, amortizing a lot of traditional CPU components across as many threads as you can...

Is that so bad? I would think the point of the project is to demonstrate a proof of principle, not to actually build something usable, so efficiency would be pretty low on the list of priorities.

The whole point of a GPU is efficiency. If you don't care about that you can already run OpenSWR on a rocket core and call it good.

I recently wrote a real-time tiled software rasterizer and shader using SSE.

Graphics APIs (OpenGL, DirectX) are leaky in that the graphics programmer knowingly targets an architecture with GPU performance properties: rasterization is cheap, texturing and filtering are cheap, and fixed-function blending and saturation, clears, and compression are all free.
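For readers who haven't written a software rasterizer, here is a tiny example of work that is "free" on a GPU but not on a CPU: saturating additive blending. This is a plain Python sketch of the per-channel arithmetic, not the author's SSE code, and the function name is made up.

```python
def additive_blend_saturate(dst: bytes, src: bytes) -> bytes:
    """Additive blend of two packed RGBA8 buffers, clamping each channel at 255.

    A GPU does this in fixed-function blend hardware; a software rasterizer has to
    spend instructions on it for every pixel (SSE2 offers a saturating byte add,
    _mm_adds_epu8, which handles 16 channels per instruction).
    """
    if len(dst) != len(src):
        raise ValueError("buffers must be the same size")
    # Plain uint8 addition would wrap modulo 256; saturation clamps instead.
    return bytes(min(d + s, 255) for d, s in zip(dst, src))
```

Blending `bytes([200, 10, 0, 255])` with `bytes([100, 20, 0, 1])` gives `bytes([255, 30, 0, 255])`: the red and alpha channels clamp rather than wrap.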

CPU rendering cannot hope to compete with GPU hardware, because the graphics programmer has already made GPU-style optimizations in how they use the GPU APIs to render 3D scenes.

For those interested in RISC-V and the possibility of GPU-type things, I can mention some things that might not be well known.

RISC-V is developing a Vector extension (V) that will allow SIMD-style programming with a variable vector length.
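A rough Python model of how variable-length vector code is usually written (strip-mining). `VLMAX` is a made-up hardware width, and `vsetvl` here is a simplified stand-in for the instruction of that name in the V proposal: it grants min(remaining, VLMAX) elements, which is one permitted choice (the real rules are slightly more flexible near the tail). The point is that the same loop runs unchanged whether a chip implements 4, 8, or 64 lanes.

```python
from __future__ import annotations

VLMAX = 8  # hypothetical hardware vector length, in elements


def vsetvl(remaining: int) -> int:
    """Simplified model of RVV's vsetvl: grant min(requested, VLMAX) elements."""
    return min(remaining, VLMAX)


def saxpy(a: float, x: list[float], y: list[float]) -> list[float]:
    """Compute a*x[i] + y[i], strip-mined the way vector-length-agnostic code is written."""
    out = list(y)
    n = len(x)
    i = 0
    while i < n:
        vl = vsetvl(n - i)            # hardware decides how many elements this pass covers
        for j in range(i, i + vl):    # stands in for one vector instruction over vl lanes
            out[j] = a * x[j] + out[j]
        i += vl
    return out
```

With `VLMAX = 8`, a 20-element call runs passes of 8, 8, and 4 elements; changing `VLMAX` changes the pass count but not the code.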

See these talks from the last workshop:

Intro: https://www.youtube.com/watch?v=S4fxBZD79gc

Project Update: https://www.youtube.com/watch?v=ESu9NI3h1Y4

Initially, the standard also included a vector type field for each register, which would have allowed different types such as tensors, matrices, and so on.

This has been removed from the initial Vector extension proposal, but work on it will continue. At least one company is already actively developing hardware for V with tensors (see below), sadly not open source.

This Libre GPU project is going a slightly different route, targeting Simple-V, a slimmer version of the Vector extension. See: http://hands.com/~lkcl/simple_v_chennai_2018.pdf
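A toy Python model of the Simple-V idea as I understand it from the linked slides: rather than adding new vector opcodes, some scalar registers are tagged as vectors (the slides use CSR tables for this), and an ordinary scalar ADD then acts as a hardware loop over VL consecutive registers. The class, its fields, and the register-numbering details are hypothetical simplifications, not the actual Simple-V encoding.

```python
class ToySimpleV:
    """Hypothetical model: scalar ops become hardware loops when operands are tagged as vectors."""

    def __init__(self, nregs: int = 32):
        self.regs = [0] * nregs
        self.is_vector = set()  # register numbers tagged as vectors (stand-in for SV's CSR tables)
        self.vl = 1             # vector length used for tagged operands

    def add(self, rd: int, rs1: int, rs2: int) -> None:
        """The ordinary scalar ADD opcode; no new vector opcode is involved."""
        if not ({rd, rs1, rs2} & self.is_vector):
            self.regs[rd] = self.regs[rs1] + self.regs[rs2]  # plain scalar add
            return
        for i in range(self.vl):
            # Vector-tagged operands step through consecutive registers; scalar ones stay fixed.
            d = rd + i if rd in self.is_vector else rd
            a = rs1 + i if rs1 in self.is_vector else rs1
            b = rs2 + i if rs2 in self.is_vector else rs2
            self.regs[d] = self.regs[a] + self.regs[b]
```

For example, with registers 8 and 16 tagged and `vl = 4`, `add(16, 8, 20)` adds the scalar in register 20 to registers 8..11 and writes registers 16..19, using the same opcode as a scalar add.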

Esperanto Technologies is developing a chip (and IP) that will do this, but it will be closed source (as far as we know).

See this talk by David Ditzel: https://www.youtube.com/watch?v=f-b4QOzMyfU

I love this idea. It's kind of like a Xeon Phi, but with RISC-V and an ISA that could be tailored for different functions.

Imagine a processor with a couple of beefy RISC-V cores with lots of memory bandwidth and deep pipelines, sharing the system with some more power-efficient (but slower) cores and some cores that have very wide SIMD pipelines but sacrifice branch prediction and speculative execution to get them.

I'd love to program such a beast.

Well, on the other hand, the article is very light on details; it doesn't even say whether the proposed GPU would be massively multicore the way Larrabee/Xeon Phi was.

Also, RISC-V completely gives up the benefits that Mike Abrash & co. designed into the LRBni ISA. LRBni was x86-based, but it certainly had a lot of highly CISC-y features: at minimum, fused load-op and store-op, in addition to replicated mask setting. For instance, one of the most important instruction primitives was 'addsets'; for registers A, B, and K it does:

    A += B
    sets K to the sign bit of scalar float in A
That looks like an odd duck unless you know this function's nickname: "the rasterizer". LRBni was chock-full of these instructions; they were added because Mike, after decades of experience, knew where the hotspots were in building software rasterizers.

A few other instructions, some implemented (MAD233) and some not (full permute), were needed to round out the performance profile.

In addition, LRB was designed so that (almost) every instruction retired in 4 stages. Each 'core' was a 4-wide barrel processor (four hardware threads issuing in turn), so, except for a wart involving read-after-write hazards on mask-register ops, all instructions (including fused test-and-jumps) retired "the next cycle".
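Why a barrel design makes everything look like it retires "the next cycle": with N hardware threads issuing round-robin and a pipeline latency of at most N cycles, each thread's previous result is ready by the time that thread issues again. Below is a small simulation, assuming a uniform 4-cycle latency and that every instruction depends on its predecessor in the same thread. Both are simplifications for illustration, not Larrabee's documented timing, and the function name is made up.

```python
from __future__ import annotations


def barrel_schedule(programs: list[list[str]], latency: int = 4) -> tuple[list[tuple[int, int, str]], int]:
    """Round-robin issue across hardware threads, at most one instruction per cycle.

    Each instruction is assumed to depend on the previous one in its thread and to
    take `latency` cycles to produce its result. Returns (issue trace, stall cycles),
    where each trace entry is (cycle, thread, instruction).
    """
    ready_at = [0] * len(programs)  # cycle at which each thread's last result is ready
    pcs = [0] * len(programs)       # next instruction index per thread
    trace, stalls, cycle, t = [], 0, 0, 0
    while any(pc < len(p) for pc, p in zip(pcs, programs)):
        if pcs[t] < len(programs[t]):
            if cycle < ready_at[t]:
                stalls += ready_at[t] - cycle  # wait for this thread's dependency
                cycle = ready_at[t]
            trace.append((cycle, t, programs[t][pcs[t]]))
            ready_at[t] = cycle + latency
            pcs[t] += 1
            cycle += 1
        t = (t + 1) % len(programs)    # rotate to the next hardware thread
    return trace, stalls
```

With four threads of three instructions each, the core issues on cycles 0 through 11 with no stalls; a single thread running alone issues on cycles 0, 4, and 8 and stalls for 6 cycles in total.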

LRB died on its shit backbone (the triple ring). If they'd had a proper message-passing cache, with a parallel scratch RAM next to the cache hierarchy, it would have knocked everyone's socks off, even in 2009, three years late.

For those who are curious but lack the context to google the right terms: addsets is pretty obviously the inner loop of a barycentric-coordinate rasterizer. With the right setup, K will be set based on whether or not a pixel is inside a triangle.
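A scalar Python sketch of that inner loop, using the standard edge-function formulation (edge functions are unnormalized barycentric coordinates). Each edge function changes by a constant amount per one-pixel step in x, so stepping is exactly `A += B`, and the sign of the result says which side of the edge the pixel is on. The function name, the counter-clockwise winding convention with y pointing up, and sampling at integer pixel coordinates are assumptions for illustration; LRBni would evaluate many pixels per vector instruction rather than one.

```python
def rasterize_row(tri, y, x_start, width):
    """Return per-pixel coverage for one scanline of a counter-clockwise triangle."""
    edges = [(tri[i], tri[(i + 1) % 3]) for i in range(3)]
    # Edge function E(px, py) = (x1 - x0)*(py - y0) - (y1 - y0)*(px - x0); it is >= 0
    # on the inner side of every edge of a counter-clockwise triangle (y axis up).
    # Start one pixel left of x_start so each step is exactly "A += B, then K = sign(A)".
    A = [(x1 - x0) * (y - y0) - (y1 - y0) * (x_start - 1 - x0) for (x0, y0), (x1, y1) in edges]
    B = [y0 - y1 for (x0, y0), (x1, y1) in edges]  # dE/dpx, constant for each edge
    coverage = []
    for _ in range(width):
        A = [a + b for a, b in zip(A, B)]  # A += B
        K = [a < 0 for a in A]             # K = sign bit of each edge value
        coverage.append(not any(K))        # inside iff no edge value is negative
    return coverage
```

For the triangle `[(0, 0), (4, 0), (0, 4)]`, row `y = 1` starting at `x = 0` covers pixels 0 through 3 and rejects pixels 4 and 5.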

Indeed. The closest thing to details is this linked page: http://libre-riscv.org/3d_gpu/

It targets Simple-V, a vector ISA extension that 'could' be massively parallel.

This is something I have been interested in for a long time. I was hoping that if RISC-V were to branch out into GPUs, we would get a very similar set of instructions for both the CPU and the GPU, allowing us to use one compiler.

Let's see if an open-source GPU would unleash user-friendly OSS machines.
