
Ask HN: Does anyone have something negative to say about RISC-V? - kiriakasis
There seems to be (deserved) unanimous love for RISC-V, but are there things that could have been done better? Or choices without a clear best answer on which there is still some debate?

I believe that learning where a technology is weak is a necessary step to learning where it is strong.
======
Someone
It’s hard to critique a design that’s mostly hypothetical (implementations
exist, but those parry critiques such as “it is slow” or “it doesn’t have
vector instructions” with “it’s only the first version”).

So, the only thing one can readily critique is the instruction set. Here, they
made trade-offs that some would argue are bad. For example:

\- They deliberately chose to leave parts of the instruction space open.
That’s good for extensibility, but must have a negative effect on code
density.

\- They are extremely liberal in allowing implementations to mix and match
instruction set extensions. There already are 13 different extensions,
potentially allowing for 8,192 different instruction sets (many of them quite
silly), and they are already thinking about how to handle more than 26: _“Since single
letters will run out someday, it is currently discussed to use a new naming
scheme with Zxxx for standard instruction extensions and Yxxx for non-standard
(vendor specific) instruction extensions”_
([https://en.wikipedia.org/wiki/RISC-V#ISA_Base_and_Extensions](https://en.wikipedia.org/wiki/RISC-V#ISA_Base_and_Extensions))

~~~
nickik
First of all, the ISA does not need implementations to be non-hypothetical:
the ISA is defined and frozen, and that includes most of the extensions.

There have been hundreds of tape-outs of different RISC-V chips, so clearly
it's not hypothetical in that sense either.

> \- They deliberately chose to leave parts of the instruction space open.
> That’s good for extensibility, but must have a negative effect on code
> density.

and

> \- They are extremely liberal in allowing implementations to mix and match
> instruction set extensions.

That's not a problem, that's a feature. It was specifically designed to be
extendable and to have many different extensions so implementers could pick
and choose what they want.

RISC-V is designed to work for tiny close-to-sensor chips all the way up to
supercomputers and everything in between. To handle that with one instruction
set, you cannot force every deeply embedded chip to implement all the
features.

To combat the explosion of theoretical ISA variants, the foundation defines
profiles. Profiles are designed to select standard configurations for specific
applications or groups of users. For example, RISC-V for Linux uses 'RV64GC',
and all Linux software targets it.

However, if you use RISC-V internally, you can pick whatever you like for your
needs.

~~~
Someone
Early Bluetooth had (and probably still has) profiles, too. I don’t think it
is a slam dunk to say they were a net benefit.

Similarly, USB-C cables may or may not allow USB 3.1 (doubles speed to
10Gbit/s), power delivery, or alternate mode (sending video signals over some
of the pins). That makes it easier for suppliers to create such cables, but
does it help adoption?

Further back, the first phones running Java featured so many profiles that
programs were barely portable between phones.

------
wk_end
There's nothing particularly interesting about RISC-V. It's a simple, clean,
boring MIPS-like ISA.

A lot of their freedom to keep it simpler and cleaner than older RISC ISAs
comes from the fact that they aren't responsible for engineering real
implementations. Whereas the MIPS, SPARC, and ARM folks ended up with delay
slots, register windows, and conditional execution because those helped them
get high-speed CPUs out with the processes of the day, the RISC-V people just
say "well, those eventually turn out to be bottlenecks for high-speed
implementations, so we won't have them" \- never mind that these hypothetical
high-speed implementations will require enormous engineering effort and piles
of money to actually build. We're not quite talking "Sufficiently Smart
Compiler", because the technology does exist - but it's not totally far off.

People have been comparing it to Linux, but it's just an instruction set, not
an implementation of one - so it's really more like POSIX - just a theoretical
standard of compatibility that'll allow an ecosystem to build around it.

The problem is that, as Someone mentions in this thread, there's lots of
officially codified extensions, and no restrictions (as far as I know?) to
keep implementors from devising their own. We'll see how this turns out; my
guess is, if RISC-V catches on, we're going to end up with mutually
incompatible extensions, balkanization, and proprietary extensions becoming
accepted "standards" in the "embrace, extend, extinguish" style.

~~~
nickik
> A lot of their freedom to keep it simpler and cleaner than older RISC ISAs
> comes from the fact that they aren't responsible for engineering real
> implementations;

So David Patterson doesn't understand anything about implementations?

They designed and taped out many chips, including out-of-order cores, before
they froze the ISA, and they are confident that it works for both simple and
high-performance implementations.

You seem to imply that while 'delay slots and register windows and conditional
execution' might not be needed for high performance, RISC-V's lack of them
hurts it as long as there are no 'high-speed implementations'. But that is not
correct: RISC-V is competitive with or better than MIPS, SPARC, and ARM on the
low end.

~~~
wk_end
I don’t mean to suggest that the involved parties don’t _know_ about
microarchitectural implementations. Names aside, it’s clear from reading the
RISC-V documents that they do. And I don’t mean to imply that they’re wrong
about the various “unclean” elements of other RISC ISAs being bottlenecks,
either.

All I mean to say is: as academic designers of a license-free ISA in the
2010s, to whatever extent they’re incentivized at all, they’re incentivized
towards designing a “simple” and “elegant” ISA regardless of the concrete
impact on performance or die space in a way that the MIPS, SPARC, and ARM
teams in the early 80s - designing for commercial production and profitability
using the processes of the time and their limited resources - weren’t.

That’s not necessarily a criticism - but the consequences of their decisions
on performance and adoption are harder to assess. You’re claiming otherwise, but
without further evidence and based on my personal understanding of
architecture and microarchitecture design I believe that ceteris paribus a
low-end RISC-V implementation is likely to perform worse and use more silicon
than the corresponding MIPS, ARM, or SPARC implementations. A high-end
(speculative, out-of-order, modern-process) RISC-V implementation would
likely be able to outperform a similar one for those architectures with less
silicon. The question for me is whether we get to the point where the market
builds one, and whether those potential performance improvements would (alone
or in tandem with RISC-V’s other benefits) justify adoption.

~~~
nickik
I see your point about incentives.

However, I have to disagree with your opinion on the hardware. RISC-V was
specifically designed to do the low end very well. The encoding, for example,
is optimized to take up a tiny amount of space, and the overall instruction
set is likewise designed to use a minimal amount of space.

There is a whole project about 'ultra low power processors' at ETH Zürich [1].
They specifically selected RISC-V because it fits their needs so well, and
they produce cores that beat available ARM and MIPS cores, especially on
energy efficiency. IBM selected a RISC-V core for its 'Next-Generation Edge
Computing' project [2].

So it seems to me that small and simple really does pay off when going to very
tiny cores.

[1] [https://www.pulp-platform.org/](https://www.pulp-platform.org/)

[2] [https://content.riscv.org/wp-
content/uploads/2018/05/16.10-1...](https://content.riscv.org/wp-
content/uploads/2018/05/16.10-16.25-Seiji-Munetoh-IBM-Japan.pdf)

------
cesarb
I have been following the main RISC-V ISA design mailing list for a while, and
of course there has been criticism of some design decisions there.

My personal criticism would be that there are still missing parts: the B
extension (bit operations like count leading zeros/count trailing
zeros/popcount/byte swap), the V extension (extensible vector operations), and
the P extension (packed SIMD) aren't ready yet. If your problem domain needs
these, you'll have to either wait until they're specified, or use a non-
standard extension (of course, a benefit of the RISC-V ISA design is that it
makes it easy to use non-standard extensions when you need to).
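
For a concrete feel of the missing B extension, here is a sketch (my own, not
from the thread) of the software fallback for population count; compilers emit
a similar bit-twiddling sequence for `__builtin_popcountll` on a base RV64
target:

```c
#include <stdint.h>

/* Software popcount, the kind of sequence needed while there is no
   hardware popcount instruction (SWAR bit-twiddling, illustrative). */
int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555u);          /* 2-bit partial sums */
    x = (x & 0x3333333333333333u)
      + ((x >> 2) & 0x3333333333333333u);              /* 4-bit partial sums */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fu;          /* 8-bit partial sums */
    return (int)((x * 0x0101010101010101u) >> 56);     /* add up the bytes */
}
```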

Other criticisms I've seen in the mailing list:

\- As already mentioned by someone else here, the RV32 and RV64 variants
(32-bit and 64-bit registers) are incompatible, that is, code written for RV32
won't run on RV64, while with some small tweaks to the instruction encoding,
it could;

\- Some have argued that the C extension (compressed 16-bit instructions -
normal instructions are 32-bit) could have been done in a better way;

\- All Linux distributions have converged on RV64GC as their base ISA, which
includes the C extension, and some have complained about this choice;

\- There have been some complaints about possible interrupt latency on
embedded systems, due to having to save/restore registers without hardware
help (see the sketch below).
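
As a rough illustration of that last point (my own sketch, assuming GCC's
RISC-V interrupt attribute, with a hypothetical handler): the hardware saves
no general-purpose registers on trap entry, so the compiler-generated prologue
has to spill everything the handler touches.

```c
#include <stdint.h>

volatile uint32_t tick_count;  /* state updated by the handler */

/* Hypothetical machine-mode timer ISR. RISC-V saves no GPRs on trap
   entry, so the compiler-generated prologue/epilogue must spill and
   restore every register this function clobbers - that per-interrupt
   save/restore is the latency being complained about. */
__attribute__((interrupt("machine")))
void timer_isr(void) {
    tick_count++;  /* a real handler would also clear the pending bit */
}
```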

------
RNeff
Integer divide by zero does NOT raise an exception.

From the Spec: We considered raising exceptions on integer divide by zero,
with these exceptions causing a trap in most execution environments. However,
this would be the only arithmetic trap in the standard ISA (floating-point
exceptions set flags and write default values, but do not cause traps) and
would require language implementors to interact with the execution
environment’s trap handlers for this case. Further, where language standards
mandate that a divide-by-zero exception must cause an immediate control flow
change, only a single branch instruction needs to be added to each divide
operation, and this branch instruction can be inserted after the divide and
should normally be very predictably not taken, adding little runtime overhead.
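
For concreteness, a minimal sketch (mine, not from the spec) of the pattern
the spec describes, as a language runtime might emit it; `checked_div` is a
hypothetical name:

```c
#include <stdint.h>
#include <stdlib.h>

/* Checked division for a language that mandates an error on divide by
   zero. The RISC-V div instruction itself never traps (x/0 yields all
   ones), so the runtime adds one branch, normally predicted not-taken. */
int64_t checked_div(int64_t a, int64_t b) {
    if (b == 0)   /* compiles to a single beqz on RISC-V */
        abort();  /* stand-in for the language's divide-by-zero handler */
    return a / b;
}
```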

~~~
nine_k
Is this a genuinely bad decision? The spec does a good job of arguing why it's
not.

------
giomasce
It is not really something negative (because it will have to be seen in the
future), but to me the biggest point about RISC-V is whether it will deliver
its promise of eventually giving birth to completely open and high quality
CPUs. The experiments so far are interesting and the people and firms working
on it seem (to me) to be trustworthy, but the implementations so far are
still lacking: the HiFive Unleashed board has a lot of proprietary IP inside,
and I am told that it has some stability problems. I understand and agree that
things take time, but in the end we will have to see if a good and open
ecosystem develops around RISC-V or not.

~~~
jdoege
Moreover, no one is releasing the gate-level netlists, the LEF and DEF, or the
mask data. Without those you can't have security unless you build your own.

------
jdoege
What I have to say that's negative has more to do with those producing
implementations than with the architecture or RTL. In order to have the security
afforded by being open, every part of the process must be open from the spec
to the silicon. The entities producing RISC-V devices are simply passing
forward the RTL along with their additions and changes. They are not releasing
the work product of any of the steps between there and silicon. There are many
differences between RTL and the gate-level representation, many of which
represent potential security issues (testability features, for instance.)
Between gates and physical design more changes can be made and between
physical design and mask, even more. Some of those could result in sloppy
security holes or some could even be maliciously introduced security holes.
With any of the current providers there is no way to know or test for these
potential issues.

------
CodesInChaos
Note that the following are just my impressions from reading though the spec
about a year ago. I did not do any deeper investigation beyond that and might
misremember some details.

* IMO they were a bit too generous with using up 32-bit instruction encodings, only 3 of 28 major opcodes (blocks of instructions) are still reserved, though there is still some space for individual instructions within several of the used blocks. Using 4 major opcodes for 4 different sign variants of fused multiply-add seems particularly wasteful. (Of course there is still plenty of space in 48-bit and longer instruction spaces)

* Integer overflow handling code is quite verbose and might also result in lowered performance (at least in simple implementations). I'm unhappy that many languages choose silent wrapping over trapping on overflow out of performance concerns (e.g. Rust and C#), so efficient overflow checking is quite important to me. (Efficient overflow checking is also useful for seamlessly switching to big integers on overflow, which is my preferred approach for high-level programming languages.) See the sketch after this list.

* Their approach to atomics is elegant but also unconventional. This might make porting existing code difficult. In particular the lack of double wide compare-and-swap looks like a pain point.

* Their approach to SIMD/vector instructions only seems like a good approach for floating-point arithmetic, but might struggle with common integer tasks. For example, I'm not sure how shuffles fit into their variable-length vector model.

* I don't like their choice of sign extending unsigned integers in the ABI.
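
On the overflow point above, here is the kind of sequence involved (my own
sketch): with no flags register, a RISC-V compiler must reconstruct the
overflow condition from the operands, costing a couple of extra ALU ops and a
branch per checked addition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Branch-based checked signed addition, roughly what a compiler has to
   emit on RISC-V, which exposes no overflow flag. Signed overflow
   happened iff both operands share a sign and the sum's sign differs. */
bool add_overflows(int64_t a, int64_t b, int64_t *sum) {
    *sum = (int64_t)((uint64_t)a + (uint64_t)b);  /* wrapping add, no UB */
    return ((a ^ *sum) & (b ^ *sum)) < 0;         /* sign-bit test */
}
```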

~~~
brandmeyer
> * IMO they were a bit too generous with using up 32-bit instruction
> encodings, only 3 of 28 major opcodes (blocks of instructions) are still
> reserved, though there is still some space for individual instructions
> within several of the used blocks. Using 4 major opcodes for 4 different
> sign variants of fused multiply-add seems particularly wasteful. (Of course
> there is still plenty of space in 48-bit and longer instruction spaces)

Along the same vein, the FPU also uses three bits per instruction to provide
for static rounding modes. Almost nobody even touches the rounding modes, and
those that do only do so dynamically because that's what ARM, Power, and x86
give you. It's as if the designers wanted to get rid of the floating-point
status and control register. The experiment failed, but the legacy cruft is
still there.

~~~
titzer
Rounding modes aren't going away, so adding them to the instruction encoding
is IMO cleaner, since it gets rid of ambient state that alters the semantics
of instructions. It no doubt simplifies the internal implementation, since
there are no implicit dependencies between every FP operation and the status
register. Fewer dependencies == more parallelism.

~~~
brandmeyer
The trouble is that rounding modes need to be dynamically specified in order
to enable debugging of suspicious floating-point programs. See William Kahan's
rants about debugging tools for floating-point arithmetic for a detailed
explanation of it, which I will attempt to summarize.

If you have in hand several algorithms A, B, and C, and one or two of them
produce results that differ significantly from the other, how do you tell
which one is more likely to be in error? Kahan has a tool that helps: The
method which is more sensitive to roundoff error will have its results
perturbed worse by directed roundings, and is more likely to be the method in
error.

So, you run each of the suspect codes in three modes: round to nearest even
(the default), round toward +inf, and round toward -inf. The code whose
results change the least under directed roundings is more likely to be correct
than the others. In practice, the difference can be pretty dramatic.

This method just helps to point the arrow of suspicion; it isn't foolproof.
But it is a useful debugging tool. Static rounding modes don't help for a
broad class of problems without recompilation. Good luck running some slice of
a Python program rounded three ways, for example. It also requires the ability
to control which portion of the program is affected, since decimal -> float
parsing code only works correctly in the round to nearest even mode.
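
A minimal sketch of that three-mode run (my own example; the data and the
summation under test are illustrative), relying on C99's dynamic rounding
control:

```c
#include <fenv.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

/* The algorithm under suspicion; naive summation is roundoff-sensitive. */
static double naive_sum(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s;
}

int main(void) {
    double data[] = {1e16, 1.0, 1.0, -1e16};
    const int modes[] = {FE_TONEAREST, FE_UPWARD, FE_DOWNWARD};
    const char *names[] = {"nearest", "+inf", "-inf"};
    for (int m = 0; m < 3; m++) {
        fesetround(modes[m]);  /* dynamic mode switch, no recompilation */
        printf("round to %-7s: %.17g\n", names[m], naive_sum(data, 4));
    }
    fesetround(FE_TONEAREST);  /* restore the default */
    return 0;
}
```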

That's the crux of my argument: IEEE754 requires dynamic rounding, the
numerical analyst needs dynamic rounding, and dynamic rounding subsumes static
rounding. Given all that, burning three bits per opcode is pretty damn
expensive, especially when you're cramming FMA4 into 32-bit opcodes. It's a
mistake at least as bad as branch delay slots. But since it's hiding in the
floating-point portion of the ISA, it isn't as obviously wrong to the common
programmer.

------
deepnotderp
1\. It fully subscribes to the Stanford Orthodox RISC propaganda (e.g. "RISC
is the best, CISC sux"), and this leads to some rather questionable ISA
choices: no condition codes, poor vector instruction implementations,
excessive architectural registers, bad code density, too many ISA extensions,
lack of exceptions on many things, etc.

The other general point I really dislike is their focus on ISA as if that
matters so much compared to uArch.

~~~
nickik
> poor vector instruction implementations

All I have heard is that the people who know about vectors absolutely love the
extension. What is your problem with it?

> bad code density

That is just plain false. It's actually extremely good: RV32C and RV64C easily
beat every other ISA. Even without the 'C' extension, the differences from
other ISAs are minimal.

See:
[https://www.youtube.com/watch?v=Ii_pEXKKYUg](https://www.youtube.com/watch?v=Ii_pEXKKYUg)

> The other general point I really dislike is their focus on ISA as if that
> matters so much compared to uArch.

You don't like that the people who are designing an ISA focus on the ISA? That
seems an incredibly strange argument.

It is well known, and the RISC-V people often repeat, that the ISA doesn't
matter much for performance. That's exactly why it makes no sense not to have
an open, unrestricted one, so that you can have competition and open-source
implementations.

------
rwallace
As I understand it, if you are going to go out of order, a large number of
architectural registers is actually a handicap; it costs more resources to map
them to physical registers.

Given that, was it the right choice to go with thirty-two architectural
integer registers? Sixteen registers would be mostly adequate, and would seem
to make it easier to implement out of order. What am I missing?

~~~
titzer
> As I understand it, if you are going to go out of order, a large number of
> architectural registers is actually a handicap; it costs more resources to
> map them to physical registers.

Hmm, I doubt it. You need at least as many hardware registers as reorder
buffer entries, which these days is 100+. The only costs I can think of are
checkpoint entries inserted into the ROB needing more bits, and generally more
bits to encode register indexes. Besides that,
more hardware registers means more ILP available to the compiler before it
starts spilling to the stack, which is overall a good thing, since it gives
the OOE more instructions that have explicit data dependencies through
registers rather than implicitly through spill slots.

------
cmrx64
rv32 and rv64 are incompatible. this is unfortunate, and it's not clear to me
what the tradeoff for that is.

~~~
pertymcpert
What kind of compatibility are you thinking of? Running 32 bit code in 64 bit
mode?

~~~
cmrx64
yes. [https://groups.google.com/a/groups.riscv.org/d/msg/isa-
dev/e...](https://groups.google.com/a/groups.riscv.org/d/msg/isa-
dev/eOQ6xN9pzrs/63-QG6xcAgAJ) has a long thread about this. it's not the worst
thing in the world, but I think the choice is confusing when it seems easy to
sidestep. this is the only negative thing I have to say about RISC-V and it's
not very major.

------
londons_explore
I question if RISC-V went far enough.

Having human readable and understandable instructions seems unnecessary in the
days of optimizing compilers.

Instead, the CPU should have a bunch of hardware and then allow the compiler
to define its own instruction set via a large control blob. Each software
binary should begin with "here are all the instructions I'm going to use, and
here is what action should be taken for each".

The mapping from the instructions an application desires to the actual
hardware should then be left to some OS component which is easily upgradable.
That is the component which will have by far the biggest performance impact.

------
ChaosMarine
The only thing I have against RISC-V is that I do not have a real RISC-V
server up and running.

There is the 'HiFive Unleashed' [1], but there could be something more real
than that. I am ready to put down a lot of money on getting myself some real
hardware. I am totally fed up with x86, its management engine, and its
insanely bloated ISA. It is time for open hardware to take over my server
closet... and desktop... and laptops... and so on.

Now, not later!!!

[1]: [https://www.crowdsupply.com/sifive/hifive-
unleashed](https://www.crowdsupply.com/sifive/hifive-unleashed)

------
bitcoinmoney
I work for one of the big semiconductor companies, and there is an effort in
the industry/our company to replace ARM with RISC-V. It is currently in the
trial phase, where I think the low end will be tackled first to pipe-clean the
design flow and serve as a proof of concept. Once that's done (don't know how
long) the big desktop cores will probably follow and then server. I don't know
if it will play out that way but that's probably how it's being sold by VPs in
order to get funding.

------
repolfx
Well, compared to Intel's chips it lacks a lot of advanced tech like SGX, TSX,
MPK etc.

Compared to the Mill it's a rather boring design that doesn't try to do
anything new. The Mill is even more hypothetical than RISC-V but at least it
is a genuinely interesting architecture.

~~~
nickik
RISC-V started with a simple core to bootstrap software and cover many
initial workloads.

But RISC-V will add extensions for more advanced features, because many cores
will need them.

Saying that RISC-V is boring is a bit unfair, because there is lots of
interesting work being done on the Privileged Architecture, the vector unit,
trusted execution, and so on. Those will all be part of the RISC-V standard.

SGX/MPK see:

\- [https://keystone-enclave.org/](https://keystone-enclave.org/)

\- There is a TEE working group in the foundation:
[https://content.riscv.org/wp-
content/uploads/2018/05/09.00-0...](https://content.riscv.org/wp-
content/uploads/2018/05/09.00-09.45-Security-Task-Group-8th-RV-Workshop-
May-9-2018.pdf)

\- There is lots of work on tagged memory (see LowRisc/Shakti)

------
ibotty
RISC-V is something one might even learn something from, maybe. But it will
never be a viable commercial alternative.

\-- ARM

------
mhkool
There is no RISC-V-based computer commercially available.

~~~
nickik
What's your definition of a 'computer'?

~~~
brandmeyer
I need to be able to buy it in quantity 1 without talking to a salesman. For
embedded, that pretty much means Digikey, and for general-purpose compute that
means Newegg.

There are plenty of economically interesting use cases where neither of those
things are true, but I don't consider any of them to be computers in and of
themselves.

------
btrask
Yes, I do.

Turing completeness means that instruction sets simply do not matter. In fact,
you cannot buy a "native" x86 processor today. Every modern x86 processor
actually has hardware to run some highly optimized microcode instruction set,
plus an x86 emulator in software that runs on top.

ARM is the current viable alternative to x86. You could put an ARM chip in a
desktop and it would be cheaper and faster than RISC-V for the foreseeable
future. RISC-V is even more experimental than that and it has nothing to offer
even in theory. The only standard for an ISA is performance. Any other
features can and should be built on top.

I set the bozo bit on anyone crowing about RISC-V. Don't get me started on the
Mill, it's just a shitty GPU.

~~~
deepnotderp
Please explain why you feel the Mill is "just a shitty GPU"

~~~
btrask
GPUs are the state of the art in massively parallel processors. If you are
building a parallel processor, it should be evaluated as a GPU to see how it
stands up. For example, Larrabee was tried as a GPU and found wanting.

~~~
jcranmer
Uh... not really.

Historically, the design of GPUs is essentially about extremely high
data parallelism across relatively tiny, simple kernels (shaders) that all
operate on the same massive data set but have very little communication. The
tradeoffs involved in making this work mean that GPUs are effective at
speeding up only certain kinds of computations. Workloads that are memory-
latency bound or communication heavy--such as a BFS graph traversal--turn out
to be utter crap on GPUs, and you can only claw back some of that performance
with epic engineering effort.

Larrabee and its successor Xeon Phi concepts were actually fairly well-
received in terms of giving you something that could tackle non-GPU-amenable
problems. What caused Larrabee to suffer was that Intel basically tried to
flog a "just write C/C++/Fortran and the autovectorizer will take care of
everything" model for programming which, well, didn't work (as many compiler
people could have predicted for you before you started). Nvidia also made sure
to do a massive push to get CUDA working in terms of library support,
educational cooperation, and language flogging to get people interested in
CUDA, which Intel didn't attempt for Xeon Phi.

~~~
imtringued
>What caused Larrabee to suffer was that Intel basically tried to flog a "just
write C/C++/Fortran and the autovectorizer will take care of everything" model
for programming which,

This reminds me of the story of ispc [1]: one guy wrote a compiler based on
the SPMD model while the rest of the company was focused on the doomed
autovectorization approach.

[1] [http://pharr.org/matt/blog/2018/04/30/ispc-
all.html](http://pharr.org/matt/blog/2018/04/30/ispc-all.html)

~~~
jcranmer
That was one of the sources I used for the comment.

