I don't expect to see competitive RISC-V servers any time soon (utcc.utoronto.ca)
65 points by goranmoomin on June 25, 2023 | 65 comments



    Given that moving to RISC-V will make people's life harder, these servers
    and their CPUs need to be unambiguously better than the x86 (and ARM)
    server systems available at the same time.
While this would generally be true, it misses an extremely important current factor.

China really wants to find a way to circumvent US soft power limiting their future growth.

The current tech embargoes by the US and friends are only going to make China try harder, and RISC-V seems perfectly positioned to become that circumvention measure.

Note the plethora of Chinese RISC-V boards and chips coming out... which is exactly what they need to do (iterate as fast as possible).


They can do the same by copying ARM and x86 designs.


They won’t be able to export them to other countries then


Sure. They're likely doing that too meanwhile. :)


I'm pretty bullish on RISC-V; if anyone has suggestions on how to bet some money on this instinct (invest), I'd be interested to hear about it. Openness, tooling support being decent enough already, and general capabilities are all good (and power consumption could be a differentiator). Actual chips are being produced. I'm curious why it's so "underrated" or sometimes even reduced to politics (China). Even with no trade war nonsense, it makes quite a bit of sense for countries to move to open architectures (imo). At least I'd be a bit more comfortable betting on it as opposed to ARM, where licensing terms may or may not change. Curious to see when the first mass-produced RISC-V cell phones will hit the shelves (Android support is announced already).


> I'm curious why it's so "underrated" or sometimes even reduced to politics (China)

Because Tenstorrent is licensing 8-wide RISC-V cores and... No one is taking them up on it to turn around and sell.

Because OpenPOWER is ahead of RISC-V in most ways, with much better software support + IBM pushing it, yet has basically gained zero traction.

RISC-V is a nice ISA for high-end hardware, in theory, but like other theoretical hardware it's irrelevant if it's not economical.

In fact, I think ARM was kind of a fluke, and gained traction because of its huge embedded presence and the stars aligning among some big companies.


> No one is taking them up on it to turn around and sell.

And you know that how? Because you can't buy a server with it in it?

> Because OpenPOWER is ahead of RISC-V in most ways, with much better software support + IBM pushing it, yet has basically gained zero traction.

OpenPOWER was simply not open. It was 'marketing' open. It was tightly controlled by a few companies, and they wanted to sell you expensive CPUs.

Only after RISC-V was successful did OpenPOWER finally change its license terms, and by then it was already too late, as RISC-V already had all the momentum.

Being from IBM is not actually a benefit for many companies.

We have seen far, far, far larger investments in RISC-V than we have ever seen in OpenPOWER. And I question if the software support is really that much better by now.

Ventana, Alibaba, Esperanto, Tenstorrent, SiFive are all investing lots of money, there is a lot of traction. I think we should be careful predicting 10 years ahead with any confidence.


> And you know that how? Because you can't buy a server with it in it?

Nor a desktop/laptop. And running a linux desktop on it (much less a Windows desktop) is not easy from what I can tell.

I want to be enthusiastic about Tenstorrent in particular, but their supposedly open market PCIe accelerator cards never materialized.


I literally have a RISC-V SBC running a graphical desktop next to me right now, it's a MangoPi MQ-Pro.

Sure there aren't many boards so far (good list: http://krimsky.net/articles/riscvsbc.html ), but come on, give it a couple more years! The progress is insanely fast for the hardware world.


You do realise it takes three to four years to go from licensing a core from someone like Tenstorrent to having a mass-production chip, on a board, in the market?

We are just now in the first half of 2023 getting mass-production chips and boards using cores announced in October 2018 (SiFive U74 -- VisionFive 2, Pine64 Star64 and PineTab-V, Milk-V Mars) and July 2019 (THead C910 -- Sipeed Lichee Pi 4A, Milk-V Pioneer).

This is NORMAL.

It took the same amount of time from Arm A53 announced (October 2012) to Raspberry Pi 3 (February 2016) and Arm A72 announced (February 2015) to Pi 4 (July 2019) and Graviton 1 installed at AWS (November 2018).

It takes a while, but once set in motion it is basically clockwork inevitable.


> Nor a desktop/laptop.

You realize the world doesn't revolve around you and what you need, right?

The market for chips is large; selling them to you might not be a top priority.


I think the thing you're missing is that RISC-V is already a huge success. It's now the de facto choice for "hidden" CPUs that don't need to run user code - auxiliary CPUs in SoCs, BMCs etc.

It might not seem like it because you never see those.

Given that it's already a success and not going anywhere I don't think it's a stretch to think that it will eventually break into the mobile/server market. Google already announced that they're going to support RISC-V on Android.


Those hidden cores need to be running open source firmware.


It would be better if they did, but this has nothing to do with the ISA.

The only possible link I can see is that you can arbitrarily modify RISC-V on your core, so it is easier to tivoize your software... but this seems like a reach.


No they don't. You can run anything you like on them.


It isn't possible to trust or fix proprietary software; the same goes even more for proprietary firmware. All firmware needs to be open source.


You would like all firmware to be open source.

I agree that would be nice. Though for most of these chips it wouldn't help too much without documentation for the chip and all its peripherals/registers, and probably a signing key to get it to actually run your code.

The fact that the code is closed source doesn't really change the trust model much either since the code is usually supplied by the chip vendor. If they wanted to do nasty things and keep them secret they could do it in hardware.


The most likely way RISC-V makes it into the datacenter, IMO, is through AI accelerator chips like Tenstorrent's. Initially this will not replace traditional compute but merely be deployed alongside it. If the cores prove to be fast for general-purpose compute, someone will use them for that, since they are already available. From there I can imagine dedicated servers (although still likely aimed at niche workloads for some time).


The author seems to believe that speed is the only competitive factor. The fact is that many services need only a fraction of the speed available to them, and that power consumption is, in some areas, the more important competitive distinction. RISC-V will beat out Intel on power consumption out of the gate, even if they never compete in terms of raw speed.


Low power consumption isn't worth much if the performance sucks. Performance-per-watt is important. Surely AArch64 is going to win over RISC-V there for the foreseeable future?


> Low power consumption isn't worth much if the performance sucks.

Not every service needs high performance. A web site serving static content, for example, or a low-traffic forum. I co-admin multiple public forums, e.g. sqlite.org/forum, which run on single-CPU nodes and get along just fine with that.


10 years is a long time in tech. It's hard to say what will happen in that kind of timeframe.

I suspect RISC-V won't have taken over datacenters by then, but I could see it being an emerging, stably growing phenomenon.

Without sounding too critical, I feel like this is sort of a discussion about strawmen. Keller set up the strawman, so this is responding to that, but I think the title (or something) is misleading, and so it doesn't do much to address the fact that it's still a strawman. "No competitive RISC-V servers" is not the same as "everything will be competitive RISC-V servers."


This seems like an awfully clueless post. What is "soon"? This year? No. Within five years? Yes. Ten years could well see >50% share. Ten years ago there was no Aarch64 chip available from anyone. The iPhone 5s came out in September 2013, Android phones early 2015, the first SBCs (e.g. Pi 3 and Odroid C2) in 2016 -- just 7 years ago.

Now Arm makes up 25% of AWS.

The RISC-V Linux kernel, gcc & llvm, Debian and Ubuntu are all in fine shape for generic C/C++ datacenter stuff. All the interpreted languages are there; the things with their own JITs are in the worst shape, because they are the only things that need a lot of special RISC-V work (given gcc and llvm are long done), but they are coming along nicely.

There are a ton of Raspberry Pi size and cost RISC-V boards now, with a quad-core 1.5 GHz dual-issue JH7110 SoC (like the Pi 3) or a quad-core 1.85 GHz OoO TH1520 SoC (like the Pi 4).

There is a server chip (SG2042) available RIGHT NOW that has similar cores to the first Graviton generation four years ago, but 64 cores per chip instead of 16. You can get it on a single-socket board today, and dual- and quad-socket boards are coming within months. That's up to 256 cores, hundreds of GB of RAM, lots of PCIe, lots of L3 cache.

The $100 TH1520 boards make a great development device for things to run on the SG2042 as they have the same C910 CPU cores.

RISC-V chips in the same class as current (or very recent) x86 and Apple M1 will be available for development purposes NEXT YEAR and in mass production in 2025 or 2026.

Is RISC-V going to take over entire datacentres? No, of course not. Is it going to have a significant and growing place in them within the next five years? Yes.


It is unfortunate that people take Jim Keller's word for granted. Much like many people believed in Raja Koduri.

And I'm not even going by the literal sense of completely taking over servers in 10 years, as in every server in use. I am willing to bet that in 10 years' time RISC-V won't even account for half of the CPUs shipping into servers, a feat that even ARM is still far from claiming.

It is sad that most if not all CEOs (even if they are supposed to be engineers) like to paint overly optimistic pictures just because they have a vested interest.


As Linus Torvalds once said, developers want servers with the same hardware as their dev machines. So I think RISC-V server success will only happen after RISC-V diffuses into the dev desktop market.


And yet many developers now have ARM laptops.


While I don't believe in RISC-V taking over the server for the foreseeable future, Mr Torvalds definitely never used shared UNIX development servers with X Windows/Telnet connections.


The issue is that servers that go in a datacenter have multiple constraints, not just one.

Computational speed, RAM capacity, networking, amount of power used, all play a part.

For instance, say a RISC-V 1024-core multi-chiplet CPU with the ability to have 4TB on the motherboard; even if it used 400W for the CPU, it might make sense because of how many VMs you could pack per system. For datacenters it sometimes may make sense to talk about how "wide" rather than how "deep" you can go, especially for cloud or VM tasks. Each core being slightly slower might not matter, in that you have a lot of concurrent tasks that may well be disk or network IO-bound part of the time...


Intel pairing up with SiFive shows the obvious path forward here where the new ISA is paired with existing tech from a large company.

If Intel decided to ship a core that could execute either x86 or RISC-V instructions, we could see massive adoption in no time at all.


I think it would be informative to draw a comparison with the adoption of Linux.

The Top 500 supercomputer OS mix is maybe the easiest example to use:

https://en.wikipedia.org/wiki/History_of_Unix#/media/File:Op...

I was specifying large systems for banking applications in the late 90s, and we never installed Linux anywhere. It was all Solaris/HP-UX/AIX (and mainly Solaris!). I actually had Linux running at home at the time, so I was totally aware of its capability, but there's no way I'd have seen it displacing the Unixes in the sort of timeframe that happened with the Top500.

So, I'd probably conclude that the 'out of nowhere' rise of RISC-V is a possibility. I'd say 5 years sounds quick, but 10 years is a long time. But hey, I'll be able to look back on this comment and reflect on what an idiot I was about this in a few years' time :)


A killer price/performance/energy-efficiency ratio would do the job way sooner than anyone expects.


I think one of the most crucial factors for RISC-V adoption is proper RVV intrinsics and compiler support.

It is currently almost* impossible to add RVV support to things like SIMDe, because neither gcc nor clang can eliminate redundant RVV vector loads/stores. (https://godbolt.org/z/ocs5rnzrs)

*You can actually do it now with clang, but only if you use statement expression macros instead of functions and hardcode -mrvv-vector-bits=n: https://godbolt.org/z/r3haM9Yar
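
To make the failure mode concrete, here is a minimal sketch of the fixed-width wrapper pattern that SIMDe-style code relies on. The names (my_i32x4, my_add_i32x4, add3) are illustrative, not SIMDe's actual API, and the __riscv_-prefixed intrinsic spellings assume a recent gcc/clang with the ratified RVV intrinsics:

    #include <riscv_vector.h>
    #include <stdint.h>

    /* Fixed-width wrapper type, as SIMDe-style libraries use (hypothetical). */
    typedef struct { int32_t v[4]; } my_i32x4;

    static inline my_i32x4 my_add_i32x4(my_i32x4 a, my_i32x4 b) {
        my_i32x4 r;
        size_t vl = __riscv_vsetvl_e32m1(4);
        vint32m1_t va = __riscv_vle32_v_i32m1(a.v, vl);   /* load wrapper -> vreg */
        vint32m1_t vb = __riscv_vle32_v_i32m1(b.v, vl);
        vint32m1_t vr = __riscv_vadd_vv_i32m1(va, vb, vl);
        __riscv_vse32_v_i32m1(r.v, vr, vl);               /* store vreg -> wrapper */
        return r;
    }

    /* Chaining two wrapped ops: the store at the end of the inner add and the
       load at the start of the outer one are redundant, but (per the godbolt
       links above) gcc/clang currently can't eliminate them for RVV, so the
       intermediate value round-trips through memory. */
    my_i32x4 add3(my_i32x4 a, my_i32x4 b, my_i32x4 c) {
        return my_add_i32x4(my_add_i32x4(a, b), c);
    }

The footnote's workaround (statement-expression macros plus a hardcoded -mrvv-vector-bits=n) presumably helps because the vector width becomes a compile-time constant, so the value can stay in a register.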


Nothing new here. We all know perfectly well that without extremely performant RISC-V implementations (open source or closed source) on the latest silicon process nodes, its success will be somewhat limited, even if it offers a modern ISA standard for everyone, worldwide and royalty-free.


I'm curious why AWS chose ARM instead of RISC-V for their in-house graviton chips.


RISC-V was not ready. The necessary specs were only ratified in late 2021[0].

The first batch of hardware that's compliant with these (which include Tenstorrent Ascalon) is expected to show up in 2024.

0. https://wiki.riscv.org/display/HOME/Specification+Status#Spe...


At the time Graviton 1 deployed in AWS (November 2018) there were only two RISC-V ASICs in the world (SiFive FE-310 and FU-540) and both were low-volume test chips with just CPU cores and some RAM and GPIO. No PCIe, no USB. Proof of concept only.

The base instruction set (RV64GC + priv 1.10) was only set in stone in July 2019.

And of course the Graviton project will have started in more like 2015 when RISC-V was a rapidly-iterating draft spec with no chips, just starting to transition from being a university research project to being managed by the RISC-V Foundation.

A lot has changed in the last 4 1/2 years. There are now RISC-V server chips that are better than Graviton 1 (A72-class, 64 cores vs 16), with much better ones in the pipeline.


> all of this will take money both literally, for hardware, and possibly figuratively, for people's time

Anyone who's ever run a company knows that people's time is not figurative money but literal money.


The performance characteristics of ARM, RISC-V, and x86 are the same under a server workload. So there's no reason to switch.


Technology shifts are always hard to predict. I don't really have a perspective on whether or not RISC-V will dominate the server market in 10 years (I think we simply do not have enough info to say one way or the other).

However, the author is taking a very techno-centric line of argumentation. I think the success of RISC-V will mainly be decided by a combination of economic and cultural factors. RISC-V wasn't really created to address a fundamental technical limitation of the current ISA landscape (POWER, ARM, x86, MIPS, etc.), and it's debatable how much of an impact an ISA has on top-line metrics (Jim Keller has said as much, I think).

IMO RISC-V came to be because of frustration around the lack of innovation and progress in ISA/uArch/CPU design. Its creators set out to create a common platform to foster innovation and collaboration.

It's mainly a play on a better license and modularity, which enable a wider range of innovative designs and better collaboration between the research world and industry.

The closest analogy I can think of is the way LLVM gained market share on GCC over the last 10 years. LLVM/Clang didn't need to be a "better" C/C++ compiler than GCC to be viable. It was a combination of a friendlier code base, a focus on user experience (mainly error messages, compilation speed, etc.), and licensing that was friendlier to companies, which fostered massive investment from some of the biggest players. In parallel, the modular nature of LLVM made it the de facto research platform, so most of the innovation happened there.

I think something similar will be the deciding factor for RISC-V's success in the server landscape:

- Do we have enough big, deep-pocketed players for whom ARM/x86 is a sufficient pain in the rear to warrant large investments?

- Will the research community come up with enough innovative designs to make the transition interesting?

I can't really comment on the research side. But I think if RISC-V succeeds on the server side, it will start with some private designs from the hyperscalers to run large in-house software (Google search, etc.).

The dynamic from the hyperscaler perspective is definitely changing and interesting:

- The hyperscalers have a lot of in-house software and services where they have much more control over which software runs. Sure, they all use OSS software, but the contact surface is much lower there, making the transition pain easier to handle.

- The economics of designing/producing a chip are very different for a chip manufacturer vs a hyperscaler.

PS: The Power ISA is also open, but doesn't seem to gain any traction... IBM, always too early to the party.


I agree with the author. This will be like turning around thousands of tankers.


Some RISC-V chips can execute in order and avoid having Spectre-like problems. I work at an IaaS provider where customers deploy through our hosted CI/CD, and we are very interested in RISC-V solving our multi-tenancy concerns.

GCC and clang already have enough support for RISC-V that we have no concerns about support being there. If GCC and clang have what's necessary, everything else is fine.

The moment we get close to a juncture where we can put in an order for RISC-V with very many cores, our investors are going to put in a massive round of funding.

Whether RISC-V is popular or not doesn't matter. Our multi-tenant services won't have fundamental security isolation problems; everyone serving on x86_64 and ARM will. It looks like RISC-V will dominate where things actually matter, unless some new iteration of OpenPOWER comes out.


> Risc-v chips execute in order

Odd then that one of the first RISC-V hardware implementations called itself "Berkeley Out-of-Order Machine".


What the person said was "SOME risc-v chips can execute in order".

In fact the vast vast majority of RISC-V chips shipped to date are in-order, including the JH7110 SoC in this year's VisionFive 2, Star64 and PineTab-V, and Milk-V Mars. The only shipping OoO RISC-V chips are the TH1520 and SG2042 with C910 cores, as used in the Sipeed LicheePi4A and Milk-V Pioneer boards.

BOOM was (and is) an experimental university project, not a product. But as Jim Keller said in a recent interview "with RISC-V a couple of university students created an OoO core that was better than the first Intel i7s which hundreds of engineers worked on".


That's a revision. The comment originally made the blanket statement that RISC-V executes in order.


Ok, I didn't see that. It's true for 99.99999% of currently-deployed RISC-V cores but, since the TH1520 and SG2042 boards started shipping in May, not 100% any more.


Most of the server chips I've seen coming down the pipe (Tenstorrent, Ventana, Rivos) are all relatively standard OoO cores. The XuanTie C910 cores in the Alibaba chips are simpler, but OoO as well.


The architecture does not determine microarchitectural features like speculation. BOOM is designed to speculate, and there are anti-Spectre protections chips can use.


> don't have spectre-like problems

You could just disable out-of-order execution but there’s a major performance hit. Some simpler ARM designs don’t have speculative execution.


I don't think in-order execution is related to the ISA. There may be a guarantee of the order of operations when it comes to the view of memory the programmer sees, but that does not prevent the hardware from executing out of order, as long as the end result is the same. x86 has strict memory ordering too, but both Intel and AMD use out-of-order execution at the hardware level.


Isn't encrypted memory a better strategy for multi-tenancy?

Also, why not have dedicated CPUs per customer? Designing a RISC-V CPU for multi-tenancy shouldn't be impossible, regardless of whether it uses out-of-order execution or not.


SiFive makes OoO cores for RISC-V.


How is speculative execution inherently insecure?



Because it leads to timing attacks as an immediate next issue.


That's not inherent.


So we’re going to speculatively execute some code and then not speed anything up? I mean, theoretically yes but practically isn’t that the point?


The relevant question is whether any information leaks to other processes. And that can be prevented if you take care in your processor design.


The timing attacks result in data leakage between processes, even if the processes are in different VMs.

It's a huge problem for cloud providers, but most customers don't fully understand the issue and cloud providers aren't forthcoming that "mitigations" are partial at best, so it's really just a huge problem for cloud customers.


Yes but how are the timing attacks an inherent vulnerability and not the result of an accounting error? If the bookkeeping is done properly and the correct amount of cleanup time on a speculated branch is done I don't see how adjacent processes would leak data.


This is an idea many have had before but it doesn't quite work. When you do this, you tend to lose all the performance gained from speculative execution. It's essentially data-independent-timing as applied to loads and stores, so you have to treat all hits as if they were misses to DRAM, which is not particularly appealing from a performance standpoint.

This is not to mention the fact that you can use transient execution itself (without any side channels) to amplify a single cache line being present/not present into >100ms of latency difference. Unless your plan is to burn 100ms of compute time to hide such an issue (nobody is going to buy your core in that case), you can't solve this problem like this.


Why hits to DRAM? Just use cache for speculated branches. The performance gain of the difference between the length of the speculated branch and the length of the bookkeeping is still there. There are workloads with short branches that would have a performance penalty. In those cases it would be helpful to have a flag in the instruction field to stop speculative execution.


It's not that simple. The problem is not just branches but often the intersection of memory and branches. For example, a really powerful technique for amplification is this:

    ldr x2, [x2]
    cbnz x2, skip
    /* bunch of slow operations */
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
  skip:

Here, if the branch condition is predicted not taken and ldr x2 misses in the cache, the CPU will speculatively execute long enough to launch the four other loads. If x2 is in the cache, the branch condition will resolve before we execute the loads. This gives us a 4x signal amplification using absolutely no external timing, just exploiting the fact that misses lead to longer speculative windows.

After repeating this procedure enough times and amplifying your signal, you can then directly measure how long it takes to load all these amplified lines (no mispredicted branches required!). Simply start the clock, load each line one by one in a for loop, and then stop the clock.
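
For illustration, a rough sketch of that final timing loop (assumed structure and made-up helper names, not the parent's actual code); after amplification the hit/miss difference is large enough that an ordinary monotonic clock suffices:

    #include <stddef.h>
    #include <stdint.h>
    #include <time.h>

    static uint64_t now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }

    /* probe[i] points at the i-th amplified cache line; n is how many were touched. */
    uint64_t time_probe_lines(volatile const uint8_t **probe, size_t n) {
        uint64_t start = now_ns();          /* start the clock */
        for (size_t i = 0; i < n; i++)
            (void)*probe[i];                /* load each line one by one */
        /* stop the clock: hits vs. misses show up in the total */
        return now_ns() - start;
    }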

As I mentioned earlier, unless your plan is to treat every hit as a miss to DRAM, you can't hide this information.

The current sentiment for spectre mitigations is that once information has leaked into side channels you can't do anything to stop attackers from extracting it. There are simply too many ways to expose uarch state (and caches are not the only side channels!). Instead, your best and only bet is to prevent important information from leaking in the first place.


Some timing differences are inherent, but whether they are exploitable is the real question. There are papers and tools that can give you high confidence that you are not leaking.


Much of transient execution research over the years has been invalidated or was complete bogus to begin with. It was extremely easy to get a paper into a conference for a while (and frankly still is) just by throwing in the right words, because most people don't really understand the issue well enough to tell which techniques are real and practical and which are totally non-functional.

You have to stop the leak into side channels in the first place, it's simply not practical to try to prevent secrets from escaping out of side channels. This is, unfortunately, the much harder problem with much worse performance implications (and indeed the reason why Spectre v1 is still almost entirely unmitigated).



