
A Verified, Efficient Embedding of a Verifiable Assembly Language [pdf] - pjmlp
https://www.microsoft.com/en-us/research/uploads/prod/2018/12/popl19-vale.pdf
======
austincheney
It appears the F* language is created for verification as its primary project,
Project Everest, is a replacement technology stack for HTTPS with cryptography
embedded.

The challenge this paper attempts to address is embedding a low level language
in a high level language plus verification. In a general sense language
grammar embedding is a common problem. Examples include Boost templates in
C++, React JSX which embeds XML in JavaScript, and HTML templates (ERB, Twig,
PHP, Elm, Handlebars, Dust.js).

I have been attempting to solve for the problem of language embedding, in
various ways, for years. The root cause of the complexity is that different
parsing schemes are executed as a document is passed through different
applications ignoring the remainder of the given document outside the
delimiters of the syntax concerned by the parser at present. There are
consequences to that approach in performance, integrity, and structure. This
became apparent when I encountered documents containing FreeMarker code and
Handlebar code in a way that did not respect the structure of each other
template scheme but made sense in isolation.

I was able to solve for embedding by realizing that embedding is a data
structure imposed upon the outer language. If a language is reduced to a
primitive structure, such as a parse tree, then all languages become equally
represented as various data structures. That won’t make practical sense unless
the structure representing these languages is uniform. If there were a common
parse tree to all languages that represents respective languages as data
structures and the points of embedding are just child structures apart from
the whole the problem of embedding is solved for.

Because everything is represented as simple structures in a set of like
structures recursion from nesting is accounted for without increased
complexity. An example in this case would be to embed an instance of assembly
in the outer F* grammar which then contains as instance of F* as a string or
comment in that assembly.

A demo of the simplified and uniform parsing scheme is at
[https://prettydiff.com/parse-
framework/runtimes/browsertest....](https://prettydiff.com/parse-
framework/runtimes/browsertest.xhtml) but at this time only a few languages
are supported. I am still working on expanding support and proving the
concept. Unlike the research paper though, a data structure alone is not
enough to provide verification, which requires a proof of some sort.

~~~
bdamm
I was ready to shred your idea but then I tried out your parser. It's
interesting, actually. But, the detail is where things get really challenging.
Escape tokens tend to be pretty quirky, like <?= or ```. And C++'s template
declaration operators, along with C's parameter type declarations, will be
super challenging because there is grammar feedback into the tokenizer for
100% correct tokenizing. But in general I like your idea of a global standard
for tokenizing, leading to a universal tokenizer. Full parsing is another
story though. For a correct parse tree of C, you need to actually have
feedback from the symbol table. And I suspect plenty of languages have one or
two weird parsing tricks, so supporting it through all languages will end up
being a major challenge. Good luck!

------
renholder
If anyone else was just as confused as myself, by not knowing what F* was, see
this[0].

[0] - [https://www.fstar-lang.org/](https://www.fstar-lang.org/)

------
eggy
With F# being a practical language for most tasks, and F* meshing well with
it, this makes for a great combination of languages to know for verifiable
programs. F* outputs F# (and OCaml), and C (Kremlin). Now you can have your
low-level, systems programming, and your general PL along with some code
verification and dependent types. I code F# on my iPad Pro with a full, live
IDE called Continuous[1]. The author of Continuous basically forked the Roslyn
compiler for iOS. Idris was my other contender for this niche, however, the
F#/F* combo seems to be more practical especially if you want to do
iOS/Android in addition to desktops.

[1]. [http://continuous.codes/](http://continuous.codes/)

