This has been discussed many times on many forums on the Internet.
The summary is that RISC-V is inefficient because it requires more instructions to do the same work as other ISAs and it does not have any advantage to compensate for this flaw.
Those extra instructions appear in almost all loops, and the most important reason is that RISC-V has a worse set of addressing modes than the vacuum-tube computers of more than 60 years ago, which were built with only a few thousand tubes, compared to the millions or billions of transistors available now for a CPU.
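To make the addressing-mode point concrete, here is a minimal sketch in C of the same array sum written two ways. The function names are hypothetical, and the instruction sequences in the comments are hand-written illustrations of what each form typically maps to, not the output of any particular compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed form: every iteration needs the address base + i * 4.
 *
 * Illustrative (hand-written, not compiler output):
 *   x86-64:   mov  eax, [rdi + rcx*4]     ; one instruction
 *   AArch64:  ldr  w1, [x0, x2, lsl #2]   ; one instruction
 *   RV64I:    slli t0, a1, 2              ; scale the index
 *             add  t0, a0, t0             ; add the base
 *             lw   t1, 0(t0)              ; load
 * The base RISC-V ISA only offers register + 12-bit immediate addressing. */
int32_t sum_indexed(const int32_t *base, size_t n) {
    int32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += base[i];
    }
    return total;
}

/* Pointer-bump form: the address is kept in a register and advanced by
 * hand, so each load is a plain register + 0 access. A compiler may
 * rewrite the indexed loop into this shape by itself (strength
 * reduction), but this only works for simple access patterns. */
int32_t sum_pointer(const int32_t *base, size_t n) {
    int32_t total = 0;
    for (const int32_t *p = base, *end = base + n; p != end; p++) {
        total += *p;
    }
    return total;
}
```

Both functions return the same result; the difference is only in how much address arithmetic has to be spelled out as separate instructions when the ISA lacks a base + scaled index mode.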
Because of this defect of the RISC-V ISA, the Alibaba team who designed the RISC-V implementation with the highest current performance (Xuantie910, which was presented last month at Hot Chips) had to add a custom ISA extension with additional addressing modes, in order to be able to reach an acceptable speed.
Whenever the designers of the RISC-V ISA are criticized, they reply that the larger number of instructions is not important, because any high-performance implementation should do instruction fusion, to be able to reach the IPC of other ISAs.
Nevertheless, that is wrong for 2 reasons: instruction fusion cannot reduce the larger code size due to the inefficient instruction encoding, and the hardware required for decoding more instructions in parallel and for doing instruction fusion is much more complex than the hardware required for decoding fewer instructions with a better encoding, as in other ISAs.
> Nevertheless, that is wrong for 2 reasons, instruction fusion cannot reduce the larger code size due to the inefficient instruction encoding
RISC-V includes a compressed extension that makes its instruction encoding competitive with or better than x86(!), and with none of the drawbacks of ARM's Thumb modes.
It is very difficult to make a non-biased comparison between different ISAs.
If you just compile some benchmark programs for 2 different architectures and you look at the program sizes and the execution times, the differences might happen to be determined mostly by the quality of the compilers, not by the ISAs, in which case you could reach a wrong conclusion.
Many years ago, I spent many months porting a real-time operating system between Motorola 68k and 32-bit POWER. At another time I spent a couple of months porting many device drivers between 32-bit POWER and 32-bit ARM and Thumb.
Such projects required a lot of examination of the code generated by the compilers for the target architectures, and also a lot of time spent writing optimized assembly sequences for the few parts of the code that were critical for performance.
After spending that much time, i.e. weeks or months, porting a large program whose performance you understand well between 2 ISAs, you may be reasonably confident of having a correct comparison of them.
If you want to reach a conclusion in a few hours at most, you are unlikely to find an unbiased benchmark.
RISC-V is however a special case. Even though I have never implemented any program for it, I have experience with assembly programming for more than a dozen ISAs, and when I see that almost any RISC-V loop may require up to twice as many instructions as on most other ISAs, I do not need further investigation to realize that reaching the same level of performance with RISC-V will require more complex hardware than for other ISAs.
Also, when comparing ISAs, I place a large weight on how good those ISAs are at GMPbench, i.e. at large-number arithmetic. In my experience with embedded system programming, large integer operations are useful much more frequently than traditional RISC ISA designers believe.
While x86 has always been very good at GMPbench, many traditional RISC ISAs suck badly, because they lack either good carry handling instructions or good double-word multiply/divide/shift instructions.
RISC-V also seems to have particularly bad multi-word operation support.
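As an illustration of the carry-handling issue, here is a minimal sketch of multi-word addition written in portable C without any carry flag. The function name `add_limbs` is hypothetical. On an ISA with an add-with-carry instruction (such as `adc` on x86), each limb can be handled by a single add; RISC-V has no flags register, so the carry has to be recovered with explicit comparisons, roughly as the code below spells out.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb little-endian numbers a and b into dst and returns the
 * final carry (0 or 1). The carry is reconstructed with unsigned
 * comparisons, which is what a flag-less ISA has to do (on RISC-V each
 * comparison would typically be an sltu instruction). */
uint64_t add_limbs(uint64_t *dst, const uint64_t *a, const uint64_t *b,
                   size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];   /* may wrap around */
        uint64_t c1 = s < a[i];     /* 1 if the first add wrapped */
        uint64_t t = s + carry;     /* add the incoming carry */
        uint64_t c2 = t < s;        /* 1 if the second add wrapped */
        dst[i] = t;
        carry = c1 | c2;            /* both cannot be 1 at the same time */
    }
    return carry;
}
```

For example, adding 1 to the two-limb value with all bits set yields two zero limbs and a returned carry of 1. Per limb, this is two adds, two compares, and an OR, compared to one add-with-carry where the ISA provides it.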
Thanks for the perspective and GMPbench reference. I'm sure you're correct that RISC-V has a lot of optimization work to do both at the compiler and chip implementation levels.
I'm curious whether vector operation support in RISC-V might also make up for any apparent shortcomings in raw arithmetic throughput - I guess a lot of it will depend on the types of workloads involved.