> No document that I could find has ever tried to specify an instruction set ind...

acuster · 2024-04-28T06:52:38

Sorry, it's three in the (Sunday) morning, and I've been hitting the whisky trying to handle the establishment journalists having fun, while other journalists are talking about humans struggling to get water while themselves being asked if they will survive the night. I'm not at my best.

You're right to call me on my statement; I should have all my notes on hand to make that claim and I don't. Paah, no, I do: ARMv7-M Architecture Reference Manual ... Part A Application Level Architecture ... ...processor in Thread mode (vs. in Handler mode).

So ARM already has a lot of detail whereas the RISC-V architecture is trying to (has to?) start even more abstract, where code doesn't even have modes (no interrupts).

This all started a pandemic Saturday morning, cup of coffee in hand, enthusiasm to read the "RISC-V Spec" and see what I could learn. Download. Confusion: it says "manual," did I get the right thing? ... Ok, yeah, that's what's on offer. Half an hour later, I'm actually pissed off, like actively angry. I'm reading this from the point of view of "what's the execution environment that I'll be working against?" and I'm getting hit with "unprivileged" which is just wrong. It turns out they are mixing up "the environment of general purpose programmers" with "the minimum that needs to be implemented"; it's a royal mess, and they kinda give up on it in the middle. I'm angry about being asked to read this as "the product"; it's not even properly proof-edited. So I took my frustration and tried to figure out 'what would you do to make this better?'

The 'RISC-V' spec is trying to specify: [instructions], and what they do to the [architecture]. I don't know much about the details, but I have a notion that there was push back on writing this up as a 'state machine' and how each instruction might change that state. I assume Prof. Asanović had his own good reason to avoid framing things that way but he's yet to give us a good explanation of why. So probably he's right, I just don't know why.
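
To make the 'state machine' framing concrete, here is a minimal sketch of what I have in mind. It is a toy, not the RISC-V spec: `State`, `step`, and `run` are names made up for illustration, and the two operations only borrow the names and rough behaviour of ADDI and ADD (32-bit wraparound, register 0 always reads as zero). Each instruction is just a function from one state to the next.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF  # assumption: a 32-bit machine, values wrap around


@dataclass
class State:
    """The whole architectural state of the toy machine."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)


def step(state, instr):
    """Apply one already-decoded instruction and return the next state."""
    op, rd, a, b = instr
    regs = list(state.regs)          # copy, so the old state is untouched
    if op == "addi":                 # rd = regs[a] + immediate b
        value = regs[a] + b
    elif op == "add":                # rd = regs[a] + regs[b]
        value = regs[a] + regs[b]
    else:
        raise ValueError(f"unknown op: {op}")
    if rd != 0:                      # register 0 stays hard-wired to zero
        regs[rd] = value & MASK32
    return State(pc=(state.pc + 4) & MASK32, regs=regs)


def run(program):
    """Fold `step` over a list of instructions, starting from reset state."""
    state = State()
    for instr in program:
        state = step(state, instr)
    return state


final = run([("addi", 1, 0, 5), ("addi", 2, 0, 7), ("add", 3, 1, 2)])
```

In this framing, "the spec" would be the definition of `State` plus one case in `step` per instruction; everything hard about a real ISA (traps, memory ordering, privilege) is exactly what this toy leaves out.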

So how could this be done?

I went to look at the history. The original x86 spec was tied to the chip they were trying to sell. PowerPC, MIPS, if I remember right, were not 'specified' in a clean way. None of them had the same challenge as RISC-V does, starting in pure execution environment mode. I went to read the infamous von Neumann writeup and got side-tracked by his virtual neurons but didn't find the right level of abstraction there either.

So, I'm sorry I can't really justify myself here, but this is all subtle and hard. From what I have found, I don't think anyone has faced the challenge that RISC-V faces, so I don't think we have a roadmap for the spec that RISC-V ought to have.

cheers

thechao · 2024-04-28T13:50:27

I really think something like Ghidra/SLEIGH and a formal specification of the p-code could help; but, only if the following things happened:

1. A p-code parser front end in C existed;

2. An alternate XML/JSON version of SLEIGH; and,

3. A way to integrate the above to document (book) generation.

For the latter I'd prefer HTML. I've found the SLEIGH spec, itself, heavy enough going that I can't tell if it supports full constraint specifications, or not.

IAmLiterallyAB · 2024-04-28T18:56:54

> full constraint specifications

Is that a technical term? (if so, can you explain further)

I've made SLEIGH specs for two architectures. In my experience, it can describe 95% of the semantics well enough for decompilation (it gets weird when your ISA has quirks). It's not as comprehensive as SAIL appears to be.

Also, SLEIGH compiles to an XML format, which is what Ghidra actually uses.

thechao · 2024-04-28T23:55:28

CPUs are fairly orthogonal in terms of capabilities; if the instruction can encode it, the CPU can interpret it. Coprocessors (GPUs, NPUs, etc.) have ISAs where the legal encoding space is much larger than the legal instruction space: the set of valid instructions is not dense in its own encoding space. This smaller legal space is defined by a set of constraints on the set of legal encodings.
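
A toy illustration of what I mean, with everything in it invented for the example: an 8-bit instruction word split into a 2-bit `op`, a 3-bit `dst`, and a 3-bit `src`, plus three made-up constraints. None of this is a real ISA or anything SLEIGH provides; it only shows the legal instructions as the subset of encodings that passes every constraint.

```python
def fields(word):
    """Split a hypothetical 8-bit word into (op, dst, src) bit fields."""
    return (word >> 6) & 0x3, (word >> 3) & 0x7, word & 0x7


# Each constraint is a predicate over the decoded fields. All are invented.
CONSTRAINTS = [
    # op 3 is reserved and never legal
    lambda op, dst, src: op != 3,
    # op 1 may not use the same register as source and destination
    lambda op, dst, src: op != 1 or dst != src,
    # op 2 is a "wide" op that only accepts even-numbered registers
    lambda op, dst, src: op != 2 or (dst % 2 == 0 and src % 2 == 0),
]


def is_legal(word):
    """An encoding is a legal instruction only if every constraint holds."""
    op, dst, src = fields(word)
    return all(check(op, dst, src) for check in CONSTRAINTS)


# Enumerate the whole encoding space and keep the legal part.
legal = [w for w in range(256) if is_legal(w)]
print(f"{len(legal)} legal instructions out of 256 encodings")
```

A "full constraint specification" would be a spec language where predicates like the ones in `CONSTRAINTS` are first-class, so that tools (assembler, disassembler, docs) can all reject the same illegal encodings.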