Peter_Sewell's comments

Peter_Sewell · 2024-04-29T05:32:46

Neither of these are actually the case: one doesn't need to be fluent in PL research to understand Sail specs (see @timhh comment above, forex), and Sail was developed independently of the origins of RISC-V.

Peter_Sewell · 2024-04-28T14:48:33

Here is a piece of the Arm-A ASL:

    (result, -) = AddWithCarry(operand1, imm, '0');
    if d == 31 then
      SP[] = ZeroExtend(result, 64);
    else
      X[d, datasize] = result;

and here is some analogous Sail (automatically translated from the ASL, in fact):

    (result, nzcv) = AddWithCarry(operand1, operand2, carry_in);
    if d == 31 then {
        SP_set() = ZeroExtend(result, 64)
    } else {
        X_set(d, datasize) = result
    }

I'm somewhat at a loss to understand in what way that's less readable?

(There surely are things that can be improved, of course - e.g. perhaps the concrete syntax for fancy types - but readability was a primary goal in the Sail design; it's really not a theorem-proving language or something that needs fancy PL knowledge.)

sweetjuly · 2024-04-28T19:50:17

Hmm, thanks for keeping me honest; I took another look after reading your response and I've changed my mind a bit :)

I think my primary criticism is the way in which the formal specification is surfaced in RISC-V as opposed to ARM rather than any issue with Sail in particular.

In ARM, the relevant portions of the pseudocode/ASL are available directly in the architecture reference manual. A snippet is either included directly underneath the relevant instruction or there is a link to it. This makes it very easy to use since only the relevant bits are presented when it is being used for reference/human spec reading purposes. This relationship to pseudocode/ASL, of course, makes sense given the history of ASL. The documentation pseudocode was originally just a nice and clearer way to explain how instructions work (free from the trappings of natural language). Later, they cleaned up the pseudocode and created an actual language specification for it so that it could be used as a formal model.

In RISC-V, we don't really get any such luxuries. If you find yourself confused about how exactly an instruction works, you have to somehow know that the Sail model is the official specification (it is not called out anywhere in the natural language spec!) and then go trawl through the entire model yourself hunting for the definition of your instruction. This isn't super hard if you know what you're doing, but there is a lot of other machinery in these models that you can trip over (it includes instruction binary encoding formats and the assembler language specification as well). I would say that I'm comfortable using the model as a reference but it's definitely not as easy to access as it is over on ARM.

I see now though that there is an effort to integrate the Sail source into the natural language specification. I hope that makes it eventually, it'll be a big quality of life improvement.

Peter_Sewell · 2024-04-28T08:19:16

Sail is pretty similar to ASL (both current ASL and ASL 1.0) except that (1) it has a more expressive type system, so that bitvector lengths can all be statically checked, (2) it has proper tagged unions and pattern matching, and (3) there's a wide range of open-source tooling available, for execution, specification coverage, generating emulators, integrating with relaxed concurrency models, generating theorem-prover definitions, etc. We've recently updated the Sail README, which spells some of this out: https://github.com/rems-project/sail .

As Alastair Reid says, one of the main things missing in the current RISC-V specification documents is simply that the associated Sail definitions are not yet interspersed with the prose instruction descriptions. The infrastructure to do that has been available for some time, in the Sail AsciiDoc support by Alasdair Armstrong (https://github.com/Alasdair/asciidoctor-sail/blob/master/doc...) and older LaTeX versions by Prashanth Mundkur and Alasdair (https://github.com/rems-project/riscv-isa-manual/blob/sail/r...).

Peter_Sewell · 2024-04-19T09:48:09

There's increasingly mature work on CHERI temporal memory safety, eg https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/2020oa... and more recent versions.

Peter_Sewell · 2024-04-19T06:24:28

If you want to narrow the bounds of `data` on reuse, you can do easily do that in CHERI C (you'd need to also keep the larger capability somewhere to rederive the `data` capability from, either in this struct or elsewhere, depending where the allocation come from).

PhilipRoman · 2024-04-19T06:57:03

Oh that is great news, I was not aware that CHERI was this powerful. Interesting if this could be abused by JIT compilers to provide free runtime bounds checks, kind of like the free null pointer checks done by catching SEGV.

Peter_Sewell · 2024-04-19T05:39:09

CHERI enables a rather high level of memory safety. It uses dynamic hardware checks rather than the safe-Rust static checks, which means that existing C/C++ code can often be ported to CHERI C/C++ with minor changes. Of course, as others note, "memory safety" is not a simple single thing, and there are certainly some cases that CHERI C/C++ don't depend on (as noted in this paper by Vadim Zaliva and others in our group). But in examples like this one from another comment:

    struct buffer {
          char *data;
          size_t capacity;
          size_t length;
        }

the pointer 'data' will in CHERI have to be a valid capability, not just a virtual address, to permit access. It should normally have been instantiated with the correct bounds from the appropriate allocation, separately from the 'length' field, so the hardware will do the right check.

Peter_Sewell · on Aug 12, 2020

There's work along these lines (substantially predating Sail, but one could in principle do much the same starting from Sail specs) by Dias and Ramsey, e.g. the POPL 2010 and 2011 papers at https://www.cs.tufts.edu/~nr/

ampdepolymerase · on Aug 12, 2020

This is how Go's assembler works.