
RISC-V formal spec public review - edwintorok
https://github.com/riscv/ISA_Formal_Spec_Public_Review/blob/master/README.md
======
ncmncm
Well, this is progress.

I wonder how many counterparts to delay slots, stack windows, conditional
moves, and other embarrassments we are inadvertently enshrining. There's
nothing like hindsight to make you facepalm. (The crypto extension is my bet
ATM for most-likely-to-embarrass. But that's without reading it.)

The only way to approach this project sensibly is to assume every single FPGA
produced after some near future point will have at least one, and more
typically dozens, of RISC-V cores scattered around like the multipliers you
see in them now, just to try to be competitive.

Personally, I am banking my enthusiasm for when the Bitmanip extension goes
in.

~~~
astrange
What's wrong with conditional moves? They're good for mispredictable branches,
although I liked them better on PPC which had 8 condition/flags registers
instead of just 1.

~~~
ncmncm
Conditional moves depend on state left over from the last instruction. But
when you want to re-order your instructions to keep all your functional units
busy, keeping the conditional moves right makes a big mess. They're fine as
micro-ops after you've scheduled them all, but are lousy at the ISA level.

Intel and AMD make it work by throwing another 10,000 or 100,000 transistors
at it.

It is better to let macro-op fusion hardware identify opportunities to convert
a branch-over-move sequence, all by itself. RISC-V is supposed to be all about
powerful macro-op fusion.

Clang is really aggressive about generating cmovs. On GCC you can still use
(x & -c) expressions to get nicely pipelined conditional expressions, but
Clang stomps them all to cmovs.
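
For anyone who hasn't seen the (x & -c) idiom, here is a minimal sketch in C. The function names mask_if and select_u32 are made up for illustration, and it assumes c is always exactly 0 or 1. Whether the compiler keeps it as and/xor or turns it back into a cmov is up to the compiler, as noted above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns x when c == 1 and 0 when c == 0, with no branch in the source.
   In unsigned arithmetic, 0 - c is all ones for c == 1 and all zeros
   for c == 0, so it works as a mask. */
static uint32_t mask_if(uint32_t c, uint32_t x)
{
    assert(c <= 1);              /* the trick only works for a 0/1 condition */
    return x & (0u - c);
}

/* Branchless select built on the mask: a when c == 1, b when c == 0.
   (a ^ b) is either kept or zeroed, then xor-ed back onto b. */
static uint32_t select_u32(uint32_t c, uint32_t a, uint32_t b)
{
    return b ^ mask_if(c, a ^ b);
}
```

So select_u32(x < y, x, y) computes an unsigned minimum without a conditional branch in the source.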

Cmov is one of the methods to mitigate Spectre because Intel refuses to
speculate loads in them. So, cmov from memory pessimizes your code in cases
where you aren't worried what might be sharing your cache.

~~~
devit
Isn't cmov just a normal instruction with 3 source operands?

Or do you mean that the problem is having 3 instead of 2 source operands?

Or that one of the operands is the flag register on x86-like architectures?
(but you can just use a normal register being nonzero)

~~~
mellum
Having 3 input operands is definitely an issue. In the Alpha EV6 architecture,
it was decided that only 2 inputs would be supported, so a CMOV had to be
split into two actual instructions: the first set the 65th bit of one of the
operands to the result of the test, and the second selected the correct
output. Thus, CMOV had a 2-cycle latency and still required extra hardware
resources to implement.

------
nagisa
It's over.

> The Public Review period is: March 29, 2019 through May 13, 2019

------
edflsafoiewq
How easy is it to do bigint arithmetic in RISC-V?

~~~
Dylan16807
It has perfectly good add and multiply instructions, so uh easy? What at an
ISA level would make it hard?

~~~
ncmncm
Connecting from one word to the next. Do you have to do 32-bit multiplications
so as to retain the high half of the result, or does a 64-bit multiply deliver
the high 64 bits of the result somewhere?

Also there is stuff about the carry from 64-bit addition.

(Substitute 16 and 32, for 32-bit units.)

~~~
Dylan16807
There are instructions to get the top half of a multiply, and a suggested
ordering so that the chip can deliver both halves with only one calculation.

There's no carry flag but I'd say it's still easy to work around that.
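
A rough sketch of that workaround in C, for the multi-word add case. The name bigint_add and the little-endian limb layout are just assumptions for the example. The carry out of each limb is recovered with an unsigned compare instead of a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit limbs, least significant limb first.
   Returns the carry out of the top limb (0 or 1).
   r may point to the same array as a or b. */
static uint64_t bigint_add(uint64_t *r, const uint64_t *a,
                           const uint64_t *b, size_t n)
{
    assert(r != NULL && a != NULL && b != NULL);
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];   /* may wrap around */
        uint64_t c1 = s < a[i];      /* wrapped iff the sum is below an input */
        uint64_t t  = s + carry;     /* add the incoming carry, may wrap too */
        uint64_t c2 = t < s;
        r[i] = t;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return carry;
}
```

Each compare is the kind of thing a single unsigned set-less-than instruction can do, so the cost per limb is a few extra ALU ops compared to an add-with-carry ISA.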

~~~
waterhouse
Specifically, I think the "sltu" (set if less than, unsigned) instruction is
useful for carries. When you add two unsigned integers, the result is smaller
than either input if and only if an overflow occurred.

    
    
      # a1:a0 * a2 --> a5:a4:a3
      # using t0 as temp reg
      mulhu a5, a1, a2
      mul   a4, a1, a2
      mulhu t0, a0, a2
      mul   a3, a0, a2
      add   a4, a4, t0
      # carry a 1 or a 0, overwriting t0
      sltu  t0, a4, t0
      add   a5, a5, t0
    

(edit: switched a1 and a0 for consistency)
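
If you want to check the sequence without a RISC-V toolchain, here is a C model of it. It assumes 32-bit registers (the "substitute 16 and 32" case from upthread) so that uint64_t can stand in for the hardware multiplier. The helper names mul_lo, mul_hi, struct u96, and mul_64x32 are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the low-half and high-half unsigned multiply instructions
   on 32-bit registers. */
static uint32_t mul_lo(uint32_t x, uint32_t y)
{
    return (uint32_t)((uint64_t)x * y);
}

static uint32_t mul_hi(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* 96-bit result, named after the registers in the assembly; a5 is the top word. */
struct u96 { uint32_t a5, a4, a3; };

/* a1:a0 * a2 --> a5:a4:a3, one C statement per instruction. */
static struct u96 mul_64x32(uint32_t a1, uint32_t a0, uint32_t a2)
{
    struct u96 r;
    r.a5 = mul_hi(a1, a2);
    r.a4 = mul_lo(a1, a2);
    uint32_t t0 = mul_hi(a0, a2);
    r.a3 = mul_lo(a0, a2);
    r.a4 += t0;              /* may wrap */
    t0 = r.a4 < t0;          /* carry: 1 if the add wrapped, else 0 */
    r.a5 += t0;              /* cannot wrap: the full product fits in 96 bits */
    return r;
}

/* Worked case that exercises the carry:
   0x1_FFFFFFFF * 0xFFFFFFFF = 0x1_FFFFFFFD_00000001. */
static void mul_64x32_selftest(void)
{
    struct u96 r = mul_64x32(0x00000001u, 0xFFFFFFFFu, 0xFFFFFFFFu);
    assert(r.a5 == 0x00000001u);
    assert(r.a4 == 0xFFFFFFFDu);
    assert(r.a3 == 0x00000001u);
}
```

The same structure carries over to 64-bit registers, but then the high half needs either a wider integer type or a real high-multiply instruction.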

