Yet I also feel the things C910 does well are overshadowed by executing poorly on the basics. The core’s out-of-order engine is poorly balanced, with inadequate capacity in critical structures like the schedulers and register files in relation to its ROB capacity. CPU performance is often limited by memory access performance, and C910’s cache subsystem is exceptionally weak. The cluster’s shared L2 is both slow and small, and the C910 cores have no mid-level cache to insulate L1 misses from that L2. DRAM bandwidth is also lackluster.
I'm not a CPU designer, but shouldn't these be points that one could discover using higher-level simulators? I.e. before even needing to do FPGA or gate-level sims?
If so, are they doing a SpaceX thing where they iterate fast with known less-than-optimal solutions just to gain experience building the things?
Quite likely, yes. It should be possible to make estimates of how much your cache misses are going to impact speed.
But there's a tradeoff. It looks like they've chosen small area/low power over absolute speed. Which may be entirely valid for whatever use case they're aiming at.
No, that's when they open sourced it. It was designed in 2018/early 2019 and picked up the May 2019 RVV spec. By late 2021 I already had a commercially sold C910 dev board (RVB ICE).
I swear there's one brilliant chip journalist/analyst at any given moment who holds the Mandate of Heaven to do brilliant things. Anand Lal Shimpi was that person once, then Ian Cutress...
All the RTL basically. It’s in a directory called gen_rtl (generated RTL?) and has remarkably few comments for such a complex code base.
Also, although it's technically open source, if it's generated Verilog then isn't that a lot less useful than the code that was used to generate the RTL?
As long as it looks vaguely like any other register-based ISA, there is generally very little in an architecture that would prevent making a high-performance implementation. Some details might make it more difficult, but Intel has shown very effectively that with enough thrust even pigs can fly.
The details would be in the microarchitecture, which would not be specified by RISC-V.
> This document was originally written several years ago. At the time I was working as an execution core verification engineer at Arm. The following points are coloured heavily by working in and around the execution cores of various processors. Apply a pinch of salt; points contain varying degrees of opinion.
> It is still my opinion that RISC-V could be much better designed; though I will also say that if I was building a 32 or 64-bit CPU today I'd likely implement the architecture to benefit from the existing tooling.
The criticisms there are at the same time 1) true and 2) irrelevant.
Just to take one example. Yes, on ARM and x86 you can often do array indexing in one instruction. And then it is broken down into several µops that don't run any faster than a sequence of simpler instructions (or if it's not broken down then it's the critical path and forces a lower clock speed just as, for example, the single-cycle multiply on Cortex-M0 does).
Plus, an isolated indexing into an array is rare and never speed critical. The important ones are in loops, where the compiler uses "strength reduction" and "code motion out of loops" so that you're not doing "base + array_offset + index*elt_size" every time, but just "p++". And if the loop is important and tight then it is unrolled, so you get ".. = p[0]; .. = p[1]; .. = p[2]; .. = p[3]; p += 4", which RISC-V handles perfectly well.
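As a rough sketch of what that means (hand-written illustration, not actual compiler output; function names are made up):

    #include <stddef.h>

    /* What the compiler conceptually starts from: an indexed access that
     * needs base + i*sizeof(long) on every iteration. */
    long sum_naive(const long *a, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* What the loop effectively becomes after strength reduction and
     * unrolling by 4: the address arithmetic collapses to one pointer
     * increment per 4 loads.  (Tail handling omitted; assumes n is a
     * multiple of 4.) */
    long sum_unrolled(const long *a, size_t n) {
        long s = 0;
        const long *p = a, *end = a + n;
        while (p != end) {
            s += p[0]; s += p[1]; s += p[2]; s += p[3];
            p += 4;
        }
        return s;
    }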
"But code size!" you say. That one is extremely easily answered, and not with opinion and hand-waving. Download amd64, arm64, and riscv64 versions of your favourite Linux distro .. Ubuntu 24.04, say, but it doesn't matter which one. Run "size" on your choice of programs. The RISC-V will always be significantly smaller than the other two -- despite supposedly being missing important instructions.
A lot of the criticisms were of a reflexive "bigger is better" nature, without any examination of HOW MUCH better, or of the cost in something else you can't do instead because of it. For example, both conditional branch range and JAL/JALR range are criticised as being limited because the instructions include one or more 5-bit register specifiers: conditional branches encode "compare and branch" in a single instruction (instead of using condition codes), and JAL/JALR explicitly specify where to store the return address instead of always using the same register.
RISC-V conditional branches have a range of ±4 KB while arm64 conditional branches have a range of ±1 MB. Is it better to have 1 MB? In the abstract, sure. But how often do you actually use it? 4 KB is already a very large function -- let alone loop -- in modern code. If you really need it then you can always do the opposite condition branch over an unconditional ±1 MB jump. If your loop is so very large then the overhead of one more instruction is going to be far down in the noise .. 0.1% maybe. I look at a LOT of compiled code and I can't recall the last time I saw such a thing in practice.
What you DO see a lot of is very tight loops, where on a low end processor doing compare-and-branch in a single instruction makes the loop 10% or 20% faster.
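To make that concrete, here's a minimal sketch (the loop is mine, and the instruction sequences in the comment are hand-written illustrations, not real compiler output):

    #include <stddef.h>

    /* Tight loop whose back edge is a register-register comparison.
     * Illustrative back-edge sequences:
     *   RV64:   bne  a0, a1, loop      # compare-and-branch in one instruction
     *   arm64:  cmp  x0, x1
     *           b.ne loop              # a general reg-reg compare needs two
     * When the whole loop body is only a handful of instructions, that one
     * extra instruction on a simple in-order core is where the 10-20% goes. */
    long sum_range(const long *p, const long *end) {
        long s = 0;
        while (p != end)
            s += *p++;
        return s;
    }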
"don't run any faster than a sequence of simpler instructions"
This is false. You can find examples of both x86-64 and aarch64 CPUs that handle indexed addressing with no extra latency penalty. For example, AMD's Athlon through Family 10h has a 3-cycle load-to-use latency even with indexed addressing. I can't remember off the top of my head which aarch64 cores do it, but I've definitely come across some.
For the x86-64/aarch64 cores that do take additional latency, it's often just one cycle for indexed loads. To do indexed addressing with "simple" instructions, you'd need a shift and a dependent add. That's two extra cycles of latency.
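If anyone wants to measure this on their own hardware, a pointer-chasing harness along these lines should expose it (this is my own rough sketch, not something from the thread; the numbers are only meaningful because both chases stay in L1D and each load depends on the previous one, so only the addressing-mode latency differs):

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     2048          /* 2048 * 8 bytes = 16 KiB, fits in L1D */
    #define ITERS 100000000L

    static size_t perm[N], idx[N];
    static void *ptrs[N];

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        /* Build one random cycle, expressed both as indices and as pointers. */
        for (size_t i = 0; i < N; i++) perm[i] = i;
        srand(1);
        for (size_t i = N - 1; i > 0; i--) {          /* Fisher-Yates shuffle */
            size_t j = (size_t)rand() % (i + 1);
            size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (size_t i = 0; i < N; i++) {
            idx[perm[i]]  = perm[(i + 1) % N];         /* next index in the cycle */
            ptrs[perm[i]] = &ptrs[perm[(i + 1) % N]];  /* next slot in the cycle */
        }

        /* Indexed chase: every load is idx[cur], i.e. base + cur*8. */
        size_t cur = 0;
        double t0 = now_sec();
        for (long k = 0; k < ITERS; k++) cur = idx[cur];
        double t1 = now_sec();

        /* Plain chase: every load is simply *p, no address arithmetic. */
        void **p = &ptrs[0];
        double t2 = now_sec();
        for (long k = 0; k < ITERS; k++) p = (void **)*p;
        double t3 = now_sec();

        /* Print the results (and keep cur/p live so the chases aren't removed). */
        printf("indexed: %.2f ns/load  plain: %.2f ns/load  (cur=%zu p=%p)\n",
               (t1 - t0) / ITERS * 1e9, (t3 - t2) / ITERS * 1e9, cur, (void *)p);
        return 0;
    }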
Note that Zba's sh1add/sh2add/sh3add take care of the problem of separate shift+add. But yeah, modern x86-64 doesn't have any difference between indexed and plain loads[0], nor Apple M1[1] (nor even cortex-a53, via some local running of dougallj's tests; though there's an extra cycle of latency if the scale doesn't match the load width, but that doesn't apply to typical usage).
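For reference, here's what that looks like for an 8-byte element load on RV64 (hand-written sequences, not compiler output):

    #include <stdint.h>
    #include <stddef.h>

    /* Loading a[i] from an array of 8-byte elements.
     *
     *   RV64 without Zba:   slli   t0, a1, 3      # t0 = i * 8
     *                       add    t0, a0, t0     # t0 = a + i*8
     *                       ld     a0, 0(t0)
     *
     *   RV64 with Zba:      sh3add t0, a1, a0     # t0 = a + (i << 3)
     *                       ld     a0, 0(t0)
     *
     * sh3add folds the shift and the add into one instruction, so only one
     * ALU op sits in front of the load instead of two dependent ones. */
    int64_t load_elem(const int64_t *a, size_t i) {
        return a[i];
    }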
Apart from the obvious lip service to the idea that the latest chip technology is crucial to national security, I fail to understand why this is the case.
What is the disadvantage for a country that only has access to computer technology from the 2010s? They will still make the same airplanes, drones, radars, tanks and whatever.
It seems to me that SOTA manufacturing capability for semiconductors is nice to have, but not necessary.
Once China has a CPU that is really good enough for most critical tasks, it might as well start dealing with Taiwan in order to let other countries see how well they progress if they no longer have the manufacturing capabilities of TSMC and others at their disposal.
If played well, it could even let them win the AI race even if they and everyone else have to struggle for a decade.
In the early 2000s, bringing China into the global community was widely seen as a strategic decision. The Bill Clinton and George W. Bush administrations supported integrating China’s economy into the international rules-based system.
China did not want to integrate. China has been seeking strategic independence for its economy by developing alternative layers of global economic ties, including the Belt and Road Initiative, PRC-centered supply chains, and emerging-country groupings, for longer than the US has.
Too many technological ties with China are seen as a potential vulnerability. It's not just the technology itself, but its importance in trade and the economy. If the US or its allies have value chains tightly integrated with China for strategic components, it creates dependence.
This is dishonest. China didn't spend the last 20 years invading multiple countries, committing acts of mass murder and destabilizing the whole Middle East.
If anything, China's rise is a stabilizing factor for the whole world. It balances the aggression originating from the United States.
Running LLMs is something I'm absolutely positive I could have done with my Dell R810 cluster back in 2010, if I'd had access to DeepSeek.
Training a frontier model, probably not. Again, it's not clear what the strategic benefit is of having access to a frontier LLM vs a more compact, less capable one.
At this rate they're going to get them whether they need them or not. Big push in the west for "AI everywhere" e.g. Microsoft Copilot; UK has some ill-defined AI push https://www.bbc.co.uk/news/articles/crr05jykzkxo
>They will still make the same airplanes, drones, radars, tanks and whatever.
Eventually there'll be fully autonomous drones and how competitive they'll be will be directly proportional to how fast they can react relative to enemy drones. All other things being equal, the country with faster microchips will have better combat drones.
Alternatively, the country which has the biggest drone manufacturer in the world, one that can sell a $200 drone[0] capable of following a human using a single camera and sending the video in real time over 20 km using the same in-house-designed chipset for both AI control and video transmission[1], would probably win.
> All other things being equal, the country with faster microchips will have better combat drones
That's very unlikely imo.
When it comes to drones... no matter how fast your computation is, there are other bottlenecks, like how fast the motors can spin up, how fast the sensor readings are, how battery-efficient they are, etc.
Right now the 8-bit ESCs are still as competitive as 32-bit ESCs, and a lot of the "follow me" tasks use a lot less computational power than what your typical smartphone offers these days...
Current drones are very limited compared to what they could do with a lot more processing power and future hardware developments. E.g. imagine a drone that could shoot a moving target hundreds of metres away in the wind, while it itself was moving very fast.
"Large" drones (aircraft rather than quadcopter) seem to follow the same rules as manned aircraft and engage with guided or unguided munitions of their own. If the drone is cheap enough then "drone as munition" seems likely to win.
> They will still make the same airplanes, drones, radars, tanks and whatever.
At the same cost and speed? Volume matters.
> is crucial to the national security
National security isn't just about military power. Without the latest chips, e.g. if there were sanctions, it could impact the economy. The nation can become insecure, e.g. by means of more and more people suffering from poverty.
Off the top of my head: you would like to cut costs for materials and nuclear research, big data analytics, all kinds of simulations, and machine learning tasks that might not be LLMs but still give you an intelligence advantage, plus the economic security of being able to provide multiple services at lower prices. If necessary, one can throw money, people, and other resources at a problem, but those could be spent elsewhere for a higher return on investment. Especially if you have multiple compute-intensive tasks in the queue, you might have to prioritize and deny yourself certain capabilities as a result. So I'd say it is not any one single task that needs cutting-edge compute; it is the capability to perform multiple tasks at the same time at acceptable prices that is important.
From the benchmarks in another of the blog posts, I very roughly estimate this to be about 1/50 the performance of a Ryzen 7950X for CPU-bound tasks not requiring vector instructions. For vectorizable workloads it will be much slower still, due to the lack of software support for SIMD.
That said, isn't the C910 the one with the critically buggy vector block?
It is an amazing achievement in a saturated market. The road to a fully mature and performant large RISC-V implementation is still long (to catch up with the other ones)...
... but a royalty free ISA is priceless, seriously.
- If points are (significantly) higher than comments: the submission is so niche or highly technical that a lot of people can appreciate it but only a few can meaningfully comment. See this one right now: 120 vs 14
- If comments are higher than points, or gravitating towards a 1:1 ratio: casual topic or flamewar (politics). See the DOGE post on the front page now, 1340 vs 2131
That being said, I think "healthy" posts have a 1.5:1 - 2:1 ratio
I think "scary" is the best word. If, of course, the rumor about choosing RISC-V over ARM is true!
We just saw a big win for ARM over Intel's multi-decade scam, and yet here we are with another split from ARM, because of politics. Scary to see how stupid talking heads, convincing other people to hate without real reasons, can lead to such stories…
> We just saw a big win for ARM over Intel's multi-decade scam
We also just saw Arm sue one of their customers following an acquisition of another of Arm's customers, and try to make them destroy IP that was covered by both customers' licenses. Nobody wants to deal with licensing, and when the licensor is that aggressive it makes open alternatives all the more compelling, even if they're not technically on-par.
SoftBank is usually the dumb money at the poker table, funding bad ideas long after anyone intelligent has left them. WeWork is probably the best example.
Getting funded by SoftBank is probably a good proxy indicator for a company losing its competitive edge.