Is RISC-V ready for HPC? Evaluating the 64-core Sophon SG2042 RISC-V CPU (arxiv.org)
102 points by anewhnaccount2 on Dec 10, 2023 | 67 comments



Closely related: discussion of the Milk-V Pioneer workstation employing this chip [1].

[1] https://news.ycombinator.com/item?id=38553647


Is this what an academic paper looks like nowadays? I thought it would fit Tom's Hardware more... maybe I'm getting old.


This is a workshop paper. ACM workshops were always like that.


The other thing to look at is the affiliations. All these people work for https://www.epcc.ed.ac.uk/ i.e. they work for an HPC facility attached to a university.


I see, that's a bit of a relief. Thanks for the clarification. Though I still think it should go to Tom's Hw or ServeTheHome instead of arxiv :)


This particular chip is very slow, so definitely not. However Sophgo are going to switch the cores to using SiFive P670 in the next iteration (SG2380) and those cores have much faster single thread performance.


It's not that slow.

It's about the same as the AWS Graviton 1 that went into their data centres in 2019, except it's got 64 cores while the Graviton has 16.

On tasks that can use all 64 cores (building software, web serving, and yes HPC) it can have about the same total performance as a current generation 16 core x86 that costs about the same.


Is that still happening after SiFive let go a bunch of staff recently?

https://news.ycombinator.com/item?id=37996295


To my understanding, SiFive continues to offer their selection of core IP. Anyway, I would assume any existing contract would have to be fulfilled for various legal reasons.

SG2042 itself is a T-HEAD C920 design, which is a mess and might not even be called a RISC-V compliant design. We are kinda stuck with it existing and being used in various chips. Other design issues have been discovered too, IIRC: atomics might not work properly (workarounds are required, at least on the kernel side), and there are floating point failures in the glibc testsuite because the FP implementation is not compliant. SG2044 is scheduled for next year (2024). Not many details are known: 64 cores, 8 DDR controllers, 3x memory bandwidth, vector v1.0 support, 2x PCIe (unknown what that means; Gen3 -> Gen4? More lanes?). The cores are unknown, but SG2380 is SiFive P670. T-HEAD has the C908, which supports vectors v1.0 (and solves some other issues), but that's a smaller core, not a replacement for the C910/C920.


Thanks, that's good info. :)


This is dated after the layoffs, so presumably yes

https://www.sifive.com/press/sophgo-licenses-sifive-risc-v-p...


Sophon CPU? Is this a reference to "The Three-Body Problem" novel?


Those transistors better be down to the size of protons at least.


If they’re not that small, just fold up some dimensions until it is.


I can guarantee they are at least as large as protons.


I don't really get why people are so excited about RISC-V. An open ISA doesn't really offer any additional freedoms to the user. It does not mean we get truly open hardware.

This whole thing is just about Chinese companies not wanting to pay for ARM licenses anymore. Which is good for them but I don't get the excitement from tech people.

Even if a RISC-V CPU someday offered the same performance for the same price as an x86-64 counterpart, it would be a strictly worse deal, as software support will be so much worse. The x86 monoculture was an amazing time to live in and helped us enjoy so much backwards compatibility. The promised lower power consumption might be nice, but we will have to see how much the ISA really matters for that.


From a tech-nerd point of view, here is the reason why I am so excited about RISC-V. RISC-V means there is a possibility of creating a completely open source implementation of an industry-grade microprocessor, just like the Linux kernel did for operating systems, democratising the business of building advanced microprocessors. I can't work for Intel or Arm design teams. Hardware is not my main job, and the entry barrier to those companies is so high (not to mention age and geographic barriers). However, I would have loved it if there were a real, industrial, free and open implementation of a microprocessor where you collaborate with smart people all over the world to solve interesting problems and implement advanced hardware designs and algorithms.


I feel the same way. I think RISC-V has changed my opinion on the hardware aspect of things, and now for the first time I want to try to create a simple CPU at home.

There is even a simpler RV32E spec, for those (like me) with little hardware experience. Perhaps RV32E is reasonable to start with?

https://five-embeddev.com/riscv-isa-manual/latest/rv32e.html


The only difference between RV32E and RV32I is 16 registers vs 32. That saves a little silicon die space, or some TTL chips if you're building something like it's the 1970s, but it doesn't remove any complexity.


The 10 cent CH32V003 https://www.wch-ic.com/products/CH32V003.html implements RV32E

I find it hard to believe that much silicon is saved these days by having 16 registers vs 32; in fact, I wish they had not made this extension -- the lowest-end RV CPU possible. In other words, if the minimum were 32 regs, as is standard, maybe such a chip would cost 11 cents.

On the 3rd hand, compilers handle 16 regs just fine, and also, the kinds of jobs these 10-cent parts do are no doubt well-served by having 16 registers.

$ 4.95 for 50 pieces: https://www.aliexpress.us/item/3256804850399956.html


I'm well aware ... I have some CH32V003 chips and dev board. Very cool little chips:

https://www.youtube.com/watch?v=1W7Z0BodhWk

https://www.youtube.com/watch?v=-4d3PgEXhdY

The main reason for 16 registers is not even the silicon savings, but decreased interrupt response time from having fewer `A` and `T` registers to save and restore. Standard RISC-V has 15 of them, plus `RA`, which is a lot more than the main microcontroller competition. The downside is that normal code can run 20% to 30% more slowly due to an increased number of register spills to the stack.

Interrupt response time is more about the ABI than the ISA, so this could be handled with simply a different ABI, but some people want the smaller silicon size as well. Those who want to can compile `-march=rv32i -mabi=ilp32e` to use the RV32E ABI but also have `x16` to `x31` available as callee save registers.
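
To make the save/restore numbers concrete, here's a minimal bare-metal sketch (the handler and counter names are made up, and it assumes a riscv32 gcc that supports the `interrupt` function attribute):

  #include <stdint.h>

  volatile uint32_t ticks;

  /* The compiler saves/restores only the registers this handler
     actually clobbers, but a generic trap entry (e.g. an RTOS
     context switch) must spill every caller-saved register the
     ABI defines: the 15 A/T registers plus RA under ilp32, only
     9 plus RA under ilp32e. */
  __attribute__((interrupt))
  void timer_irq_handler(void) {
      ticks++;
  }

Build with e.g. `riscv32-unknown-elf-gcc -O2 -march=rv32e -mabi=ilp32e`, or with `-march=rv32i -mabi=ilp32e` for the hybrid described above.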

It was initially assumed that only RV32E would be defined, but when RV32E was ratified a year ago, RV64E was ratified as well, because there had been unexpectedly high demand from chip designers for 64 bit RISC-V with 16 registers: small, simple controllers in the Cortex M0 class, but placed in a chip with 64 bit application processors, so the controller can access all of RAM.

Both 32 bit ARM and 64 bit x86 have been getting by with 16 registers, but Arm put 32 in their 64 bit ISA, and Intel recently extended x86 to 32 registers (and 3-address instructions!) also.


Note that the C910 CPU cores used in this chip are in fact open source:

https://github.com/T-head-Semi/openc910

(C920 is just C910 plus RVV draft 0.7.1 vector unit which pretty much no software uses anyway, sadly)


There is the Berkeley BOOM core as well [1], which is more interesting to me. [1] https://github.com/riscv-boom/riscv-boom


Really, I think it’s about competition and monopolies. If enough vendors get behind the ISA, we could see a massive ecosystem flourish, and cheap RISCV chips that are widely available.

A bonus: if enough people are working on the ecosystem and contributing back, we could all get the benefit of the contributions.


Just speculating and projecting into the future - if the world drifts further apart, as currently seems to be the case, then both incumbent ISAs are US- or UK-owned and might come under export control and such.

But in theory that might not happen with an open source ISA, though I have not read the fine print on the RISC-V licensing, so this euphoria might not be warranted.


the risc-v privileged spec was the first time i looked at a page table structure and thought 'hey, writing a protected-memory operating system doesn't really sound that hard' — though admittedly i haven't done it yet
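
for a taste of why it looks tractable, here's a minimal sv32 page-table-entry decoder in c (a sketch: the field layout is from the privileged spec, the helper names are mine):

  #include <stdint.h>

  /* sv32 PTE: 22-bit physical page number in bits 31:10,
     permission/status flags in the low bits */
  #define PTE_V (1u << 0)  /* valid */
  #define PTE_R (1u << 1)  /* readable */
  #define PTE_W (1u << 2)  /* writable */
  #define PTE_X (1u << 3)  /* executable */
  #define PTE_U (1u << 4)  /* accessible from user mode */

  static inline uint32_t pte_ppn(uint32_t pte) { return pte >> 10; }

  /* r=w=x=0 means this PTE points to the next level of the
     two-level table rather than mapping a leaf page */
  static inline int pte_is_leaf(uint32_t pte) {
      return (pte & (PTE_R | PTE_W | PTE_X)) != 0;
  }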

also, rv32i is a pretty nice assembly language, though not quite as nice as arm32. what would be just about perfect would be arm32 with compare-and-branch instructions instead of condition codes, and modeless rv32c-style compressed instructions instead of a whole separate decoder mode that's full of easy ways to confuse your disassembler

also, a lot of companies and individuals are doing experimentation with the isa that arm licenses didn't allow them to do

there's no arm equivalent to the ch32v003 or serv or even picorv32. picorv32 means you can run linux on an fpga, and it's tantalizingly close to being supported with a fully free toolchain

it would be cool if you could write an assembly-language program on a ch32v003 and run it on a supercomputer, but that probably won't be a useful thing to do in practice; rv32 and rv64 are incompatible enough to be problematic, and cramping your code to fit into 16 registers will compromise its performance on your supercomputer, not to mention not using isa extensions like v. and if you don't care about its performance on your supercomputer you might as well just use qemu


I don't get why people trust PRC-fabbed silicon. You couldn't pay me to put anything that comes from these fabs on my home network.


It's so important for gatekeepers not to exist if free market capitalism is to succeed. RISC-V is reinstating this state of normalcy.


As much as free market capitalism can exist when China completely dominates the RISC-V market.


Except they don't.

At the moment Chinese companies (well, WCH) dominate the stand-alone RISC-V microcontroller market, which has mindshare with engineers designing PCBs well out of proportion to the market size or profitability. Western companies dominate the market for RISC-V cores that are invisibly going into larger chips in huge markets such as mobile phone SoCs made by Apple, Qualcomm, and Samsung.

Chinese companies are also at the moment dominating the assembling of RISC-V cores into Linux-capable chips with the large collection of other IP needed for this, and putting those chips on to SBCs. But the CPU cores used are a mix of Chinese and Western ones, with for example SOPHGO's upcoming high performance 16 core SG2380 SoC using American SiFive P670 cores.


The appeal in my case was thinking in the RISC instruction set. I just picked up the VisionFive 2 SBC and plan to use it to learn.


I think it's amazing that RISC-V is advancing at such a pace that there are already RISC-V cores that are "only" ten times slower than similarly-SotA x86 cores.


4x slower. It's only 10x slower if you can use SIMD on the x86 but don't use the Vector instructions the RISC-V has.

By this time next year the speed difference in shipping machines will be reduced to 2x. Speed parity in probably 2026 -- or maybe 2020 x86 speed in 2026 ... close enough that it doesn't matter for most purposes anyway.


Considering that they are comparing against an Icelake Xeon with AVX-512, it is pretty decent.

I mean the answer is basically no, it isn’t ready. But x86 has a pretty huge head start; RISC-V had to start somewhere.


Hi all - I'm one of the authors of the paper so thought I would post my thoughts. Firstly, thanks for the interest in the paper - it's really nice to have this sort of discussion. As it's been highlighted in the comments this is a workshop paper (the workshop on RISC-V for HPC last month at SC) which allows us to focus on some of the more practical aspects compared to, for instance, a main-track conference or journal etc. Given the availability of this 64-core RISC-V CPU we felt that it would be interesting to, independent of the manufacturer, explore some of the performance and try and answer the question around how realistic a proposition this is for HPC workloads (I suppose really trying to preempt questions from the HPC community around whether this 64-core CPU moves us closer to RISC-V being a realistic choice for HPC).

Obviously the numbers are in the paper so you can draw your own conclusions, but we were pretty impressed by the results overall - both in relation to the SG2042 itself and also more widely what this means for RISC-V. The SG2042 isn't perfect (as has been highlighted here, it only supports RVV 0.7.1 for instance), but my feeling is that it's a significant step forward for the RISC-V community.

For the SG2042 specifically, as it's been highlighted in these comments, it is within the same order of magnitude (pretty much anyway depending on which number(s) you look at) as well established x86 high-performance CPUs (and we threw in an old Sandybridge CPU too as a bit of a baseline). I think that's pretty impressive, after all the SG2042 is a first-generation RISC-V CPU from Sophon being compared against mature x86 CPUs. As someone else has said they need to start somewhere and are now building on this as illustrated by their roadmap. Furthermore, something we didn't consider in the paper was price - this is a tricky one as it can depend on where you are geographically with exchange rate etc, but I think that the SG2042 is probably a fair bit cheaper than some of the x86 CPUs we compared against too (when they were new anyway).

What I think is pretty phenomenal here is the pace of change for RISC-V more widely. At the start of 2023 the best commodity available RISC-V hardware that we could get was the 4-core VisionFive V2. As we show in the paper, each C920 core in the SG2042 is quite a bit faster for the benchmarks than the U74 in the V2, but also the SG2042 is providing 64 vs 4 cores. This is within the space of 12 months or so, and there seems to be a whole load of new high performance RISC-V hardware planned for general availability in 2024 (from a range of manufacturers across the globe) including new CPUs and high-core-count accelerators (e.g. see the slides of the four vendor talks at the workshop we presented this paper at https://riscv.epcc.ed.ac.uk/community/sc23-workshop/ ). So I think it's really interesting to see the trajectory of RISC-V to date, and over the next few years to track whether this pace continues (or even accelerates!)

My personal feeling is that unless it unlocks some very significant new capabilities (which to be fair is possible), using RISC-V CPUs instead of a x86 CPUs in supercomputers will probably be a tough sell in the short term. However, I think there is a lot of potential on the accelerator side of things and I suspect this is where we will start seeing RISC-V emerge for HPC initially (and maybe by stealth where people are unaware that their compute or in-network accelerators are leveraging RISC-V in some way).


Thank you for doing this!

How is availability of this kind of chip? Did you have to jump through hoops to get your hands on it or was it as simple as placing an order somewhere? And what about the rest of the system?

I take it this was your test system: https://milkv.io/pioneer but in spite of the 'buy now' links it doesn't seem any of them will actually get you a system, it's all pre-order not GA so I'm kind of curious how you got your hands on one. I hate it when 'buy now' doesn't mean 'buy now'.

Edit: Ah, in the acknowledgements: "We thank PerfXLab for access to the SG2042 used in this work."

More about these systems:

https://www.crowdsupply.com/milk-v/milk-v-pioneer


Shipments of preorders are starting right about now. I think there will be GA sometime in January.


Ok! Very interesting development this.


you don't need a paper from a group of UK developers to understand its performance & potential. The processor is made by a Chinese startup, and as China is fighting an all-out semiconductor war with the US, there is literally unlimited public & private investment that could be poured into such a 64-core processor if it were indeed remotely on par with the SOTA of x86. Any Chinese company holding such a crown jewel would spend billions of $ on PR to get everyone in China to know its name, to further milk more investment.

As someone who lives in China and has been in the area for decades, the only reason why I had not heard about it until today is pretty obvious - it is a toy implementation no one cares about. There is actually a dedicated term in Chinese describing such junk - “落后产能”, which means backward production capacity.

They knew the expected performance, they knew it was going to be laughed at by peers in the area, but they still did it for a good reason - to fool certain low-IQ investors in China and get a free ride on the whole RISC-V thing. Whoever is behind this laughable release should really be ashamed - what is the next move? Glue together maybe 1024 of those 8051 "cores" and claim to have built a supercomputer on a chip?


you evidently do need a paper from a group of uk developers to understand that it is not junk; as the abstract explains, it's 5–10 times as fast as other risc-v hardware, and even performs better on some workloads than "the x86 high performance CPUs under test", though it still lags behind them by a factor of 4–8 on average

this is enormous progress, and much more convincing than performance projections from simulations carried out by the chip's own designers


> as the abstract explains, it's 5–10 times as fast as other risc-v hardware

you mean those controllers designed to power your SSDs or the 10 cent RISC-V based microcontroller?

> though it still lags behind them by a factor of 4–8 on average

10x slower when compared to the second gen Epyc processor, which is already 2 generations behind.

that 10x gap is not in some specially picked areas; these are day-to-day tasks.


no


Nothing laughable about it.

It's comparable on a core to core basis with the first AWS Graviton server chip that was deployed in 2019, only four years ago -- except this has 64 cores while the Graviton has 16.

It's also comparable to current x86 chips with 16 cores that cost about the same but use a lot more electricity. Each core is about 1/4 the speed of the x86, but you get four times more of them, which is fine for many server or HPC workloads.


> It's comparable on a core to core basis with the first AWS Graviton server chip that was deployed in 2019, only four years ago

you do realize that this is not an apples-to-apples comparison, don't you?

First gen Graviton was created specifically for EC2's A1 instances, where performance is not the goal. Amazon designed the chip with a well defined & targeted market in mind, and they delivered just that. Tell me which half-decent org is going to use this 64-core junk?

You also realize that Graviton could afford to go from a relatively slow gen1 to the current gen4 in just 4-5 years largely due to the fact that they licensed the cores from ARM, which has been building this whole ARM stuff for 30 years. Tell me what makes you believe that some random & unknown newcomer can duplicate such progress without putting in all those extremely expensive & time consuming efforts?

The core concerned is built by a group known for selling fake stuff online; historically and statistically, those people don't build the next-gen tech.

> except this has 64 cores while the Graviton has 16.

gluing together more highly inefficient cores is not rocket science.


It is absolutely apples to apples. The SG2042 is the first RISC-V server chip, with much the same goals as the first Graviton.

It's not going to take 4-5 years for RISC-V server chips to match x86 (and Arm). They'll be there three years from now, by the end of 2026.

Those designs are already in the pipeline, made by many of the same people who previously designed the fastest chips at AMD, Intel, and Apple.


[flagged]


now that’s an elaborate way to stomach not everyone sharing your opinions


Does Betteridge's law apply here?


yes


I'm sure it's far from being a sophon.. :)


Three body problem


What's very nice about 64-bit RISC-V: write your assembly once, run it everywhere, almost literally... no absurdly and grotesquely massive and complex compilers anywhere, no planned obsolescence, feature creep in computer language syntax nowhere to be found, ultra stable over time, near zero SDK.


That's not really true though, is it? A lot of the speed and interesting bits for HPC would come in the form of ISA extensions, and the paper even mentions challenges in this area due to the chip only supporting version 0.7.1 of the RVV vectorisation extension.

Regardless, I imagine in HPC you'll want to recompile anyway to get the most bang for your buck, unless you're doing a short run. Why throw away performance if you'll be running your code for days or months?
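
To make that concrete, here's a sketch (the kernel is a made-up stand-in for real HPC code, and the flags assume an upstream gcc, where `v` in `-march` means the ratified RVV 1.0, not the 0.7.1 draft this chip implements):

  /* axpy-style loop: whether it vectorises depends entirely on
     what -march tells the compiler about the target */
  void axpy(int n, double a, const double *x, double *y) {
      for (int i = 0; i < n; i++)
          y[i] += a * x[i];
  }

  /* RVV 1.0 hardware:   gcc -O3 -march=rv64gcv ...
     SG2042 (RVV 0.7.1): needs a vendor toolchain (e.g. T-Head's),
     since code built for the ratified 'v' won't run on the draft */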


Well, this is much more true than with classic computer languages. That's more than enough when you look at the disastrous stability of classic computer languages. It's not even reasonable with C, and I have strong suspicions about rust syntax (it seems it has become not much less complex and insane than c++, am I wrong?). Most other real-life computer language syntaxes are hopeless if you are honest with yourself, and that on cycles of even less than 10 years (blame ISO and compiler extensions).

If one cannot reasonably write a naive but real-life alternative compiler for a computer language syntax that would be REALLY stable over time -> full frontal compiler planned obsolescence and feature creep.

That's why risc-v has a very high potential: roughly speaking, you exit the compilers, which is a very good thing.

That's why I wish RISC-V to succeed: we all know that once some code paths are written in assembly, we get very strong independence from those absurdly and grotesquely massive and complex compilers, and that with a worldwide, IP-lock-free standard ISA: this is priceless.

I see risc-v compiler support as legacy support.

You can even get a nice middle ground with high-level-language interpreters written directly in 64-bit risc-v assembly. Think about a python3 interpreter, a javascript interpreter, lua, etc etc...


I'll admit I have a fever so perhaps that is why, but you're not making much sense to me.

It's not like we haven't had ISAs spanning different core architectures already. Take Intel's NetBurst (Pentium 4) vs Core (Core 2 Duo, for example) architectures. Same ISA, quite different optimization targets.

I don't see why RISC-V will be different in this regard.

Of course relying on higher-level languages with JIT compilation is a thing, that's why Julia[1] exists. And with RISC-V you do have those extensions that languages like Julia could take advantage of. But it would have to be implemented in Julia's compiler and libraries; there's no free lunch.

[1]: https://julialang.org/


> But it would have to be implemented in Julia's compiler and libraries; there's no free lunch.

Actually, for the most part Julia itself doesn't need to be concerned at all with the ISA or hardware differences. That's mostly LLVM's job. So yes, someone does need to implement it, but many languages would get to benefit from that work, not just Julia.

It may not be a free lunch, but you can pay for one lunch and feed many mouths.


I don't think it has anything to do with your fever - the post looks like autogenerated garbage. Below they were speaking about "planned obsolescence of compilers", which makes no sense, as many make efforts to support legacy architectures.


Hey maybe you can get together with the grand unified pure pipeline dataflow functions theory guy and make some kind of HN self-help group.


That was more than a decade ago.

Nowadays you write pipeline-generic assembly, as a lot happens at runtime in modern micro-archs. If some static "optimizations" go in, it is mostly those that are likely to be "true or benign" enough across most if not all micro-archs, for instance cache line or code fetch window alignment (and even that...), branching reduction, etc.

You can still have pipeline specific optimized assembly code, usually just a matter of installing/branching to the right assembly code path at runtime, hardly more.

We have to realize that with that, the entire insane (the word is fair) cost and planned obsolescence of optimizing compilers is literally... gone... and just for that, even if some assembly code paths are a bit slower, oh god, this is worth billions!

But there is a pitfall though: if the assembly code is written using an ultra powerful macro preprocessor, "c++ grade" (you get the picture), this would be a complete loss, as it is just displacing the core of the issue from an optimizing compiler to an omega preprocessor.


> no absurdly and grotesquely massive and complex compilers anywhere

Absence of evidence is not evidence of absence, and anyway there's not even an absence: https://github.com/riscv-collab/riscv-gnu-toolchain https://llvm.org/docs/RISCVUsage.html

> feature creep in computer language syntax nowhere to be found

At least one of us is very confused, and in case it's me, how do language details matter to RISC-V?


Yeah, no. I have a bad feeling that fragmentation will become a mess in the RISC-V world. Just a few months ago Qualcomm threatened not to support the existing C extension and proposed their own 16-bit instruction set. Also, another example of such a mess from a sibling post by davidlt (https://news.ycombinator.com/item?id=38591977):

> SG2042 itself is a T-HEAD C920 design, which is a mess and might not even be called a RISC-V compliant design. [...] Other design issues have been discovered too, IIRC: atomics might not work properly (workarounds are required, at least on the kernel side), and there are floating point failures in the glibc testsuite because the FP implementation is not compliant

Guess what, in the above T-HEAD C920 situation, would be the best way to achieve "code once"? Add the workaround for the buggy instructions in the compiler and then recompile, with no source changes.


I think once Google picks their minimum RISC-V extension set for Android (soon?), the rest of the vendors will aim at that, leaving everything else obsolete.


"Rome was not built in one day": Mistakes, debugging, fixing MUST happen.

And it is going to hurt. To get rid of the plague of optimizing compilers, I am willing to pay that price.

And instead of having to fix assembly code generation in those ultra-complex compilers, it is much more reasonable to hook in an assembler, which is orders of magnitude saner than dealing with any optimizing compiler.


Would love it if you would expand more. "Mistakes, debugging, fixing"?

'Plague' of optimizing compilers? Who really cares what they do, as long as the code they produce executes the abstract machine that the compiled language provides?

If you're saying optimizing compilers are broken, that's different. I fully expect AI to be bolted on to optimizing compilers - especially for global optimization - that is going to be mind-blowing when it works error-free.


No AI will be able to solve NP-hard problems.


It's not an issue of 'solving' arbitrarily large np-complete problems; it's solving the required problems (e.g. register usage across multiple function calls) better than today's compilers.

And, AI will for sure do that.


you might do the same on x86 as long as you code using only the instructions you'd find on a 286 or a 386 (if you feel fancy you can assume you always have the x87 math coprocessor)


Just realized that the chip is probably backed by Alibaba; it uses a core built by Alibaba, the company behind most of the fake stuff sold online. Its founder openly challenged the financial system in China, arguing that his algorithm targets and profits from the poorest more efficiently than traditional banks!

Now everything can be explained. What can you expect from Alibaba?



