
The Design of Scalar AES Instruction Set Extensions for RISC-V - gbrown_
https://eprint.iacr.org/2020/930
======
microcolonel
I'm interested in whether there's a simple design for a ChaCha/Poly1305
accelerating ISA extension for RISC-V (outside the general crypto extension;
even as a member of that effort, I'm not sure where it is going). I feel like
it is a lot simpler, if for no other reason than there being effectively a
single mode for the ChaCha primitive rather than three or more. The whole
operation is composed of bitwise rotations and adders that can be arranged in
a static network, and (as far as I know) is more or less not a source of
timing side-channel information one way or another.

Maybe I could try banging my head against the XCrypto repository this weekend.

~~~
brandmeyer
ARX ciphers suffer somewhat on vanilla RISC-V due to the lack of native
rotations. You have to synthesize rotations from shifting and logical ops.
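
To make that concrete, here is a minimal sketch of a rotate built from two
shifts and an OR, used in the ChaCha quarter-round. The function names are my
own, not from any particular library; the rotation amounts (16, 12, 8, 7) are
the ones given in the ChaCha RFC. Without a rotate instruction, each rotl32
call costs roughly three base-ISA instructions instead of one.

```c
#include <stdint.h>

/* Rotate left synthesized from two shifts and an OR, which is what the
 * base ISA forces you to do. Only valid for 0 < n < 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* One ChaCha quarter-round: nothing but add, xor and rotate (ARX). */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```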

Prime field algorithms suffer somewhat on vanilla RISC-V due to the lack of
good bignum support. For example, there is no add-with-carry, nor a convenient
way to get the carry-out at a low level. You have to use the set-if-less-than
instruction after an addition to separately compute the carry bit.
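
As a rough sketch of what that looks like in portable C (the function name
and the little-endian limb layout are my own choices for illustration): each
unsigned comparison below is the kind of thing that becomes a set-if-less-than
(sltu) after the add, so one limb costs two adds, two compares and an OR where
an add-with-carry ISA needs a single instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit limbs, least significant limb first.
 * Returns the carry out of the top limb (0 or 1). */
static uint64_t bignum_add(uint64_t *r, const uint64_t *a,
                           const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + carry;
        uint64_t c1 = s < carry;   /* wrapped while adding the carry-in */
        uint64_t t  = s + b[i];
        uint64_t c2 = t < b[i];    /* wrapped while adding b[i] */
        r[i] = t;
        carry = c1 | c2;           /* the two can never both be set */
    }
    return carry;
}
```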

~~~
a1369209993
> For example, there is no add-with-carry, nor a convenient way to get the
> carry-out at a low level.

IIUC, you're supposed to break the bignum up into (say) 56-bit chunks, and use
the upper bits of the register as the carry. I'm sceptical that that works as
well as it should for practical bignum applications, though. (I haven't had
occasion to try it out.)
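
For illustration, a minimal sketch of that reduced-radix idea, assuming 56-bit
limbs held in 64-bit words (names and layout are hypothetical, not taken from
any real library). Additions are plain limb-wise adds with no carry handling;
the 8 spare bits per word soak up the overflow until a separate normalization
pass pushes it upward.

```c
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 56
#define LIMB_MASK ((UINT64_C(1) << LIMB_BITS) - 1)

/* Limb-wise add with no carry propagation. The top 8 bits of each word
 * are headroom, so many normalized values can be summed before a
 * normalization pass is required. */
static void limbs_add(uint64_t *r, const uint64_t *a,
                      const uint64_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        r[i] = a[i] + b[i];
}

/* Push the accumulated upper bits into the next limb.
 * Returns whatever carries out of the top limb. */
static uint64_t limbs_normalize(uint64_t *r, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = r[i] + carry;
        r[i] = t & LIMB_MASK;
        carry = t >> LIMB_BITS;
    }
    return carry;
}
```

The trade-off is more limbs per number (and more partial products in a
multiply) in exchange for not computing a carry bit on every addition.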

~~~
microcolonel
I think this becomes less terrible the wider your bignum is.

------
cordite
Secure constant-time primitive computation is important. Outside of symmetric
encryption with AES, other primitive operations could be beneficial.

Avoiding side channels and valuing small code size can result in some really
funky workarounds. For example, using a whole ECDH key exchange in a JavaCard
co-processor in order to do EC point calculations, as seen in [1], slide 30.

[1]: https://www.blackhat.com/docs/us-17/thursday/us-17-Mavroudis-Opencrypto-Unchaining-The-JavaCard-Ecosystem.pdf

------
BooneJS
What’s the strategy for operating systems, compilers, and/or language runtimes
to figure out what a RISC-V chip supports? It used to be one thing checking
which SIMD units an x86 had, but the nature of an open source processor lends
itself to a near infinite listing of /proc/cpuinfo flags.

~~~
microcolonel
> _It used to be one thing checking which SIMD units an x86 had, but the
> nature of an open source processor lends itself to a near infinite listing
> of /proc/cpuinfo flags._

I mean, x86 sets a very high bar for the sheer number of feature flags. Most
ISA extensions of interest to compilers are standardized with the RISC-V
Foundation †.

The mechanism is similar to other chips, and the situation so far seems to be
about the same, except that the current number of flags is a couple of orders
of magnitude smaller.

† Seems it's now _RISC-V International_

~~~
fuoqi
So is there a CPUID-like instruction in the base RISC-V instruction set? After
a cursory search I couldn't find anything like it. If there is indeed no such
instruction, it will be a real shame. Why would you not include it in such an
extensible ISA?

~~~
loeg
> So is there a CPUID-like instruction in the base RISC-V instruction set?

Yes.

> After a cursory search I couldn't find anything like it.

This was the first google result for "riscv cpuid feature instruction" for me
(RISC-V Instruction Set Manual v1.7):

> The mcpuid register is an XLEN-bit read-only register containing information
> regarding the capabilities of the CPU implementation. This register must be
> readable in any implementation
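
A rough sketch of decoding such a register value, assuming (this is my reading
of the spec, not something quoted above) that the low 26 bits carry one bit
per standard extension letter, with bit 0 for "A" through bit 25 for "Z". The
function name is made up, and actually reading the register needs a
machine-mode CSR access, which is not shown here.

```c
#include <stdint.h>

/* Turn an extensions bitmask into a string of letters.
 * Assumption: bit 0 = 'A' ... bit 25 = 'Z'; higher bits are ignored.
 * out must have room for 26 letters plus the terminating NUL. */
static void isa_extensions_to_string(uint64_t reg, char out[27])
{
    int k = 0;
    for (int bit = 0; bit < 26; bit++) {
        if (reg & (UINT64_C(1) << bit))
            out[k++] = (char)('A' + bit);
    }
    out[k] = '\0';
}
```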

~~~
fuoqi
Thank you! I was looking at Volume 1 only.

But IIUC it will not be accessible at user-level, so capability detection will
be significantly less convenient for cross-platform library authors compared
to x86.

~~~
Narishma
Why does it need to be accessible from user-space? Applications can just ask
the OS what extensions are available before using them.

~~~
jlokier
Ew. So generic RISC-V libraries, that do things like calculations, crypto or
even memcpy(), need to have _OS-specific code_ in them?

They won't work on OSes the library author didn't know about, or hasn't
supplied code for, or which didn't exist when the library was created.

There are a lot of OSes.

Most likely it will be a call to libc or equivalent, and therefore be tied to
libc. For generic libraries it may be the only reason for a dependency on
libc.

There are even more libc variations than OSes. In Linux terms, it is almost
distro-specific.

This is effectively part of the RISC-V ABI, and it means there is no OS-
independent ABI.

~~~
microcolonel
> _There are a lot of OSes._

Maybe, but right now there are only a couple of OSes that run on RISC-V where
you would care to detect these things at runtime. As of now, I think FreeBSD
has an AT_HWCAP, which exposes RISC-V standard extension information. (Though
yes, it will be _some_ platform code: one #ifdef and a couple of lines to call
either _getauxval_ or _elf_aux_info_.)
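
Something along these lines is what I have in mind; treat it as a sketch. The
helper names are invented, and the "one bit per extension letter" layout of
the hwcap word is an assumption based on how Linux reports RISC-V
capabilities, so check the platform headers before relying on it.

```c
#if defined(__linux__) || defined(__FreeBSD__)
#include <sys/auxv.h>
#endif

/* Fetch the AT_HWCAP word from the OS, or 0 if we don't know how. */
static unsigned long read_hwcap(void)
{
#if defined(__linux__)
    return getauxval(AT_HWCAP);
#elif defined(__FreeBSD__)
    unsigned long hwcap = 0;
    if (elf_aux_info(AT_HWCAP, &hwcap, sizeof hwcap) != 0)
        return 0;
    return hwcap;
#else
    return 0; /* unknown OS: report no optional extensions */
#endif
}

/* Assumption: bit (letter - 'a') is set when that single-letter
 * extension is present, e.g. 'c' for compressed instructions. */
static int hwcap_has_letter(unsigned long hwcap, char letter)
{
    return (int)((hwcap >> (letter - 'a')) & 1UL);
}
```

A library would call read_hwcap() once at startup and pick its code path from
hwcap_has_letter(hwcap, 'v') and friends, falling back to the plain base-ISA
version when the answer is 0.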

> _...or even memcpy()_

P.S. memcpy goes in libc, the same place that implements these functions, so I
think probably choosing the right memcpy is not an issue.

------
userbinator
Making RISC-V not so RISCy... it turns out that complex instructions are
actually useful in practice, and ISAs which don't have them will eventually
develop them in order to remain competitive.

~~~
simias
I'm not sure if it's very unRISC-y. CISC is all about these
sort-of-generic-but-slightly-high-level instructions such as SCASB (find a
byte in a string), LOOPE (loop while equal) and all the in-memory operations,
prefix instructions etc... All that with a lot of redundancy in the ISA. A
RISC CPU will "emulate" these instructions with a couple of more primitive
instructions, favoring speed and simplicity.

Of course the line is blurry, but to me things like SIMD, floating point or
AES are more like "coprocessor" extensions; they are not something that can
decently be emulated using a handful of RISC opcodes. Making a quasi-religious
point that RISC shouldn't support high-level instructions will doom these CPUs
to irrelevance.

A RISC CPU is no less RISC if you put it on a board next to a hardware video
decoder module, for instance, so why should an on-die AES extension be a game
changer?

The original PlayStation's CPU had a coprocessor dealing with 3D
transformations (the part of the pipeline that would be handled by a vertex
shader these days) so it actually supported opcodes such as "MVMVA: Multiply
vector by matrix and add vector". I don't think anybody would consider that
unRISC-y.

~~~
fluffything
In fact, the RISC-V V (Vector) extension supports vector "shapes", so you can
take a f32x16 vector and make it a f32x4x4 4x4 matrix, and then multiplying it
with another f32x4x4 vector will do a 4x4 * 4x4 matrix multiplication, in
hardware. IIUC these shapes generalize to tensors, so you can use these
hardware units to, e.g., implement DNNs, similar to NVIDIA's tensor cores.

------
anonymousDan
Slightly off topic, but can anyone tell me if the RISC-V ISA makes static
disassembly easier than x86-64?

~~~
brandmeyer
Some of the operand fields in the 16-bit instructions are positioned in weird
places in the instructions, but otherwise the process is pretty mechanical.
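
As an example of how mechanical it is, finding instruction boundaries only
needs the low bits of the first 16-bit parcel. A minimal sketch (the function
name is mine; it only distinguishes the 16-bit compressed and standard 32-bit
encodings and gives up on anything longer):

```c
#include <stdint.h>

/* Instruction length in bytes, decided from the first 16-bit parcel.
 * Returns 0 for the longer/reserved encodings this sketch ignores. */
static int rv_insn_length(uint16_t parcel)
{
    if ((parcel & 0x03) != 0x03)
        return 2;  /* low two bits not 11: 16-bit compressed */
    if ((parcel & 0x1c) != 0x1c)
        return 4;  /* bits [4:2] not 111: standard 32-bit */
    return 0;      /* 48-bit and up, not handled here */
}
```

Compare that with x86-64, where you have to walk prefixes, opcode maps and
ModRM/SIB bytes before you even know where the next instruction starts.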

