Hacker News new | past | comments | ask | show | jobs | submit login
F# RISC-V Instruction Set formal specification (github.com/mrlsd)
134 points by mrLSD-dev on July 29, 2023 | hide | past | favorite | 42 comments



RISC-V CPU formal specification written on F#. Formalazation of RISC-V ISA architecture.


I know F#, I know roughly what RISC-V is. Not sure I understand the high level though. Is it that they formalized the instructions set in F# or is it that F# now can use the formal instruction set to compile programs?

Sorry this isn't my wheelhouse but I am curious.


Due to the properties of F# as a functional language, using a pure representation of functions and a strong type system - in this case, this is a formalization of RISC-V ISA (instruction set). Since we don't have side effects for pure functions. As it has a State machine, one fun opportunity is to execute elf-bin files for it for RISC-V architecture.

I'm not sure what do you mean "compile programs", because it's not the compiler.


I think the right word would have been 'execute' programs. Not compile.

Thanks for explaining.


Just glad to see F# used for something and hitting front page.

Such a great language, it never seems to get traction, to hit any critical mass.


The main competitor of Haskell, and also not the most popular language. However, the only way to popularize a language is to write in it. This project is trying to reveal the possibility of F#, and show the worthy side of F#,


Always fun to see executable specs. For a bit I was experimenting with using F# to write the WebAssembly specification. We ended up using OCaml.


I remember in 1982 or 1983 we had a very very slow implementation of Ada on our university VAX that was written at NYU as an executable specification in a language called SETL.


why ocaml instead


I don't think it was a formal decision that got made at any point, so there wasn't any particular reason. When someone else started writing the actual spec he picked Ocaml. It ended up probably being the wrong choice in the short term (we had to reimplement 32-bit floats in software, and for a while we couldn't run the ocaml files on windows very easily) but in the long term it seems like it worked out OK.

I kind of wish we had used a theorem prover language of some sort though since we had the opportunity to do so. I didn't really have the chance to change things there since other people had higher seniority than me.


I'm slowly coming around to SML and HOL as the best available tools for stuff like the op. In particular cakeml and candle look suspiciously like a solid foundation for modelling instruction sets.


Is this subset of F# itself formally specified? How is it different from an emulator that is slightly more clear to read?


Since F# is a functional language, it allows, using a purely functional approach and a system of strong types, pure functions, to formally verify the correctness of a particular ISA. The emulator is nothing more than a side effect.


>it allows to formally verify the correctness of a particular ISA

That must be hypothetical. "Functionalness" of the language/code doesn't make everything that is written automatically subject to formal verification. (mechanized or pen and paper). What kind of correctness properties does it actually allow to formally verify? I understand if it was the F* language, which is a full blown dependently typed proof checker, but with F#, which is defined by the implementation and 300 page English spec, I don't think you can verify anything interesting. As far as I know F# itself doesn't have mechanized formal semantics and its type system could be unsound.

https://github.com/mit-plv/riscv-coq and https://github.com/riscv/sail-riscv (don't know how complete they are) approaches actually allow to formally (mechanically) verify riscv properties.

Your executable specification could probably be called "Formal", in the same sense as a PDF spec being called "Formal", but being subject to verification by the virtue of F# being functional or being written in the functional style might be pushing it too far.


I completely agree. And I specifically draw your attention to the fact that this is not a formal verification, which it would be reasonable to do: Coq, Isabellll, Agda, F* etc. However, Formal Specification. Those. representation of the specification in a formalized form. Haskell example: https://github.com/rsnikhil/Forvis_RISCV-ISA-Spec

In this case, the term "formal" refers to the formalization of the representation of the specification. And it seems to be already established.


>this is not a formal verification

Then what have you meant by "allows to formally verify"?

>Haskell example:

I think (I'm not gonna insist on it) they are misusing the word "formal" too. Formalization must involve logic/math, i.e. a formalism, either in code or in prose. Just having lambda calculus as a base somewhere deep doesn't cut it. Haskell's type system is known to be (logically) unsound/inconsistent for example. This is an executable specification imho.


Just out of curiosity, what about that: https://people.csail.mit.edu/bthom/riscv-spec.pdf


They chose Haskell for the reasons stated in the section 2.1. Namely human reasons of it being a general purpose language that is supposedly more readable by engineers, and preexistence of translators that they needed. To me the "formal" aspect of their spec starts after the output of the hs-to-coq compiler, which isn't a verified compiler. I already have to trust coq, in which about a paradox a year have been discovered(maybe less now), and which isn't small (about 20k lines minimal core iirc). Trusting an additional compiler and a language subset(some total Haskell subset) is stretching it formality wise. Using a general purpose language is also their innovation (Academic papers sometimes are written to cover the yet unclaimed design space, not necessarily because the approach is the best or practical). They even used the following language "It may turn out that the ideal community-shared spec language is quite different from Haskell, but our interest here is showing that a general-purpose language can work well as the host for a multipurpose spec, leaving fine-tuning of the approach for future work." I see no reason why they couldn't have done it in coq or lean directly, other than the inconvenience(having to write to-verilog and other translators) and "non-innovativeness"(publishability). Good old C today has been fully formally specified, and therefore could also technically be used to provide a "formal" specification for hardware. Would it be a good formal specification? The further it is from the logical core the less useful it seems to me as a formalization. Haskell's formal aspects are also scattered across multiple papers throughout the years and Haskell's versions as far as I know. Coq on the other hand has https://github.com/MetaCoq/metacoq


> To me the "formal" aspect of their spec starts after the output of the hs-to-coq compiler, which isn't a verified compiler. I already have to trust coq, in which about a paradox a year have been discovered(maybe less now), and which isn't small (about 20k lines minimal core iirc). Trusting an additional compiler and a language subset(some total Haskell subset) is stretching it formality wise.

I'm a bit confused from my lack of knowledge here. If it's fed into Coq, why does the origin of the code matter? As it's a proof assistant, wouldn't it reject invalid proofs of validity (assuming you fully trust Coq)? Shouldn't it be fine even if the code is randomly generated by monkeys on typewriters?


>why does the origin of the code matter?

The subject of discussion was whether an executable specification in F# or Haskell are also formal specifications. While Haskell is based on System FC, a type of lambda calculus, there are many additions and quirks, and as far as I know no mechanised metatheory, or even complete language semantics stated in one document. There are also some language features known to make Haskell inconsistent in the logical sense, not to mention that turing completeness makes any language inconsistent as a logic. In the paper linked above authors call their multipurpose specification in Haskell formal, while I don't like it, they have their reasons and justifications (total, non-turing complete subset of the language, haskell's metatheory scattered across papers, System F, which is well established). But to me when one sees some formal statement, the shorter is the path to the trustable foundation the better, and in statements in Haskell I see all of the issues above, plus possibly bugs in the language. If there were proofs of correctness/soundness of hs-to-coq compiler and of Haskell metatheory I couldn't have found reasons to complain. (But then it would have been better to invest this effort into verifying a hardware specification DSL to coq translation or embedding) As the result, what one would have to trust in this situation is the translation of this Haskell spec to coq, not the spec itself, considering that in this setup you have to prove theorems in coq anyway, and guarantees of the proven theorems only work for coq definitions. So Haskell's spec is seen by me as a superfluous entity, that only complicates things and lowers trust. You are correct that a proven statement is coq doesn't depend on the source from which boilerplate was generated, but you would have to trust the statement in coq, and not the "spec", and that's a fail for a formalization in my eyes. I suspect (did't check) that the translation and language subset they chose are trivially translatable to coq and they don't use anything particularly questionable, but the superfluousness doesn't go away.


Now I understand the concerns you brought up, and I see how it complicates things. Thanks for clarifying.


My understanding of this is that it is an emulator that is meant to be very clear to read.



As far as F#/OCaml goes, that's excellent.

(I'm convinced one of the reasons why Rust is so popular is because it's OCaml with a legible syntax.)


F# has diverged quite a lot from OCaml and is very readable IMO.

In any case, I think this view stems from our Algol/c/Java centric CS education system that casts ML and Lisp as weird and “unreadable”, when really it’s just a matter of perspective.


> when really it’s just a matter of perspective.

it's really not, in the engineering school I did in the first year we were taught both C and LISP during the same semester and it was obvious to almost everyone except math nerds how much easier C was mapped to a normal human being (e.g. someone without prior programming experience)'s mental model of the world


Hadouken! They wish they had monads.


Not necessarily. I would say that the hand-rolled argument parsing business it a call for Argu: http://fsprojects.github.io/Argu/tutorial.html

But the code isn't all that complicated to begin with.


thats cool! Is there any project that does this for other architectures?


I created this for x86 many years ago: https://github.com/kristiandupont/rtasm

It's not an emulator, it allows you to assemble code in C++ at runtime. It breaks down the architecture (as it looked at the time) quite detailed, if you are interested :-)

EDIT to add: the .cbi file is a text file that contains most of the documentation: https://github.com/kristiandupont/rtasm/blob/master/RTASM.cb...


thank you!!


Right there with you. I'd love one of these for x86-64 or amdgpu. Assemblers are good things (like the sibling to this) but it's a huge effort to translate the ISA docs to an executable representation.

If you've got that encoding <-> code mapping though, you can turn that into an assembler, an emulator and a compiler backend with a sufficiently determined code generator. Probably a profiler and debugger too. That's a whole set of tooling derived from a single source of truth. In the best case, you persuade the people writing the verilog to provide said information as something like xml. More likely it's an error prone transcription from incomplete pdf files.


aw hell yeah, something like this could be ridiculously useful. Been playing with my own interactive low level language[glorified forth, dont wanna touch C and friends again.] and something like this would make life super easy.



There are Sail models for ARM, RISC-V, x86 and MIPS. I think ARM has its own executable spec too.


Very cool. Does this allow simulation / verification of L1/L2 cache usage?


unfortunately not, because it does not apply directly to ISA. However, the idea is interesting.


Can anyone ELI5 this? Is it basically an emulator?


Basically yes, but with the goal being semantic correctness rather than performance.

Seems to be similar to the official Sail model but F# instead of Sail.

Kind of funny since Sail can already compile to OCaml - probably wouldn't be too hard to add a F# backend. Then again, more independent implementations are always nice to have.

Would be interesting to know their motivation for this.

Edit: actually this looks like it has been dead for 3 years so maybe it was just a precursor to the Sail model.

https://github.com/riscv/sail-riscv


It's possible to emulate. But not only. The main goal is to formalize the representations of the RISC-V instruction set (ISA), decoder, executor, and state machine. So it's more formal point of view for RISC-V ISA.


> the RISC-V instruction set (ISA), decoder, executor, and state machine

So does this repo define these things as F# function and as a user I can import this repo as library and call those functions and my function call would directly run instructions on RISC-V processors, with no "middlewares" like operating system and such?


You can easily import and use specific functions for the decoder, or executor for specific ISA. Or even use the whole state machine. And this is represented by tests. Those. any single RISC-V architecture instruction, or an entire program. Because it can be used as a cpu emulator. Those. OS doesn't matter in this case. However, I draw your attention to the fact that this is only a processor, and not an emulation of the PC and its peripherals.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: