Their approach lines up pretty well with how we do regions in Vale. [0]
Specifically, we consider the "spine" of a linked list to be its own private "region", separate from the elements' region. This lets us freeze the spine region, while keeping the elements region mutable.
This mechanism is particularly promising because it likely means one can iterate over a collection with zero run-time overhead, without the normal restrictions of a more traditional Rust/Cyclone-like borrow checker. We'll know for sure whether this iteration benefit pans out when we finish part 3 (one-way isolation [1]); part 1 landed in the experimental branch only a couple of weeks ago [2].
The main difference between the two approaches is that Vale doesn't assume all elements are self-isolated fields; it allows references between elements and even references to the outside world. However, this does mean that Vale sometimes needs "region annotations", whereas the paper's system doesn't need any annotations at all, and that's a real benefit of their method.
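If it helps to picture the idea, here's a loose Haskell analogy I'm making up on the spot (not actual Vale code, and none of these names come from Vale or the paper): the spine is an ordinary immutable list, while each element lives in a mutable IORef cell. "Freezing" the spine costs nothing because it's already immutable, yet the elements stay mutable and can even alias each other or point at outside data:

    import Control.Monad ((>=>))
    import Data.IORef (newIORef, modifyIORef, readIORef)

    -- Loose analogy only: the spine (the list structure itself) is
    -- immutable, while the elements are mutable cells that may alias
    -- each other or reference anything else.
    main :: IO ()
    main = do
      a <- newIORef (1 :: Int)
      b <- newIORef 2
      let spine = [a, b, a]                        -- 'a' appears twice: elements alias
      mapM_ (\r -> modifyIORef r (+ 10)) spine     -- iterate the frozen spine, mutate elements
      mapM_ (readIORef >=> print) spine            -- prints 21, 12, 21 ('a' was bumped twice)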
Other languages are experimenting with regions too, such as Forty2 [3] and Verona [4], though they're leaning more towards a garbage-collection-based approach, whereas Vale is more like a memory-safe C++.
Pretty exciting time for languages!
Honest question: why do so many researchers in this area reach for a completely custom language? There is a huge barrier to entry for a new language: all the tooling, bindings for the OS, tailor-made libraries for specific use cases, and so on. It all has to be done almost from scratch.
Why not start from what people already know and love and bolt your experiments on top of that? What I do with my DSL AmpLang is start with Python, throw away the parts I don't like, and add the things I'm interested in. It seems to work well enough, and my language feels familiar to Python developers. I haven't tested this, but many existing Python programs would probably work out of the box in AmpLang. Python's grammar is very simple, even more so for 3.9, which does not have pattern matching.
Former PL researcher here. There are a few reasons for this.
First, it's important to distinguish between "positive" features and "negative" features. "Positive" features are where you can do additional things you couldn't do before. I'm guessing your AmpLang is primarily about positive features? Researchers do implement these as libraries or DSLs. For example, the most influential paper in pretty printing is Wadler's, which implements a pretty printer as combinators in Haskell.
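To give a flavor of what that looks like, here's an illustrative fragment in the same spirit (my own stripped-down version, not Wadler's actual algebra, which adds grouping and best-fit layout choice):

    -- A tiny Doc algebra plus a renderer.
    data Doc = Nil
             | Text String Doc   -- literal text, then the rest of the document
             | Line Int Doc      -- newline indented n spaces, then the rest

    layout :: Doc -> String
    layout Nil        = ""
    layout (Text s d) = s ++ layout d
    layout (Line i d) = '\n' : replicate i ' ' ++ layout d

    -- layout (Text "if x" (Line 2 (Text "then y" (Line 2 (Text "else z" Nil)))))
    --   ==> "if x\n  then y\n  else z"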
However, a lot of research is on "negative" features, where the compiler or type system gives guarantees that certain bad things can't happen. You can't usually add to a language and get negative features. For example, this post's paper is in the "negative" category, so it wouldn't make sense to develop it as a DSL, and furthermore it's about static type systems, so it wouldn't make sense to use anything like Python. Also, existing languages are huge, and when you're doing proofs, you need to fully model the language you're proving things about, so papers would get vastly more complicated if they tried to build on top of an existing language. Instead, the usual approach is to develop a teeny tiny boring language that also has the new feature (in this case, the new type system that handles concurrency).
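As a concrete miniature of that pattern (entirely made up by me, not this paper's calculus): a toy language with just enough terms to state the property you care about, plus a checker that rejects the bad programs. In Haskell, the whole thing fits on a napkin:

    data Ty = TInt | TBool deriving (Eq, Show)

    data Term = Lit Int
              | BoolLit Bool
              | Add Term Term        -- both sides must be Int
              | If Term Term Term    -- condition Bool, branches must agree

    -- Nothing means "ill-typed". The negative guarantee is that any
    -- program this rejects could go wrong at run time; the proofs in a
    -- paper establish that accepted programs can't.
    typeOf :: Term -> Maybe Ty
    typeOf (Lit _)     = Just TInt
    typeOf (BoolLit _) = Just TBool
    typeOf (Add a b)   = do { TInt <- typeOf a; TInt <- typeOf b; Just TInt }
    typeOf (If c t e)  = do
      TBool <- typeOf c
      tt <- typeOf t
      te <- typeOf e
      if tt == te then Just tt else Nothing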
Another thing to realize is that most "languages" in research papers aren't meant to be used. Success for such a language doesn't mean that people start using the little prototype they built (if the language is even implemented); instead success means that someone transfers their ideas into a new or existing language, and builds all of the tooling for that language. The hard problem that the researchers are solving is proving things. This is very hard! Real languages (especially type systems and compilers) are built on top of decades of language research! But it's very different from developing all of the libraries and tooling for a practical language, which is an entirely different process.
It's a lot less work to start with a language that's as simple as possible so you can focus on the problem that you're interested in solving, instead of spending years to figure out how your idea would interact with all the existing features of some existing language.
If it turns out that your idea might actually work and improve something, somebody else can take it and figure out how to add it to some existing language.
A counterexample to this would be Haskell. Before Haskell, most researchers in functional languages started by defining a CPU instruction set, then added their programming language, defined how the backend of the compilation works, and only after that presented their point. That is the sort of situation we see here.
Then Haskell came. They presented a nice language based on something others were familiar with (lambda calculus). Since then, two interesting things have happened, as I understand it. People started adding more nuanced optimizations to the lambda-calculus-based backend; this is similar to what LLVM does. The second thing is that instead of coming up with their own languages, most adopted Haskell-style syntax. Just look at modern functional languages: they mostly look like Haskell (well, ML).
You seem to be implying these researchers are reinventing the wheel by making a syntax that isn't based on anything that's already implemented (like Haskell). That couldn't be further from the truth. The syntax is absolutely unimportant to a paper like this. Choosing to use curly braces is their version of naming a variable "foo". The goal is not to make a compiler that produces machine code or eventually gets a package manager, at all. They're not playing that game.
They are writing a type system, not a language. (Note the title.) They're attempting to write one that captures the very weak guarantees of concurrent systems, and this is actually really complicated, so research tends to start with a simple version of the new type system and then steadily narrow the assumptions behind it over subsequent publications. The target is not machine code, but having a computer check that your typing rules work and that you can write interesting proofs with them. The toy language only exists for this purpose. The same way a Haskell programmer might code until it compiles and then put it into production, a PL researcher codes until the Coq checker gets all the way through their proofs and then submits to the conference. And much the same way that you can implement a GHC extension to test an idea, Coq is an extremely feature-rich base on which to build a type-checker demo language and proof system. They are not wasting any time here.
The syntax they end up with is not meant to be used, it is meant to be the smallest possible language that demonstrates the theory in a machine-checkable way. If you want these features in a real language, take the OCaml type checker they wrote and go write it again for that new C++ frontend Circle or something, this time accounting for all the wild requirements that pop up in a real compiler.
These papers typically contain (semi-)formal descriptions of the type system and various proofs about it. The technical material for this paper is in an appendix:
For real-world languages, this is even more involved, and existing formalizations only cover supposedly distinct sub-languages, not the full language (with the risk being that there are unexpected interactions). Most languages also have complicated things like higher-order functions or some form of polymorphism, and the new type system would have to be integrated with that.
This is true, but higher-order functions and polymorphism are extremely well understood (including the various compromises with type inference that arise when you combine them with other constructs).
I think there are two scenarios, each calling for a different solution.
One scenario is when your goal is to make something new, something that would feel different and have different ergonomics. This is like inventing Python when there are mostly C-style and Pascal-style languages.
The other is when you are interested in a particular behaviour of the language, like the type system. Why should you care whether you write let x = 5; or int x = 5; when all you are interested in is its type? This is likely the case with this paper.
Honest answer: because people doing this sort of research have actually studied computer science. They don't need to cobble things together using the only language they barely know.
Where it falls short, as usual, is when OS IPC, external resources, or multiprocessing enter the picture, since the type system cannot prevent external events from taking place.
The concurrency remains fearless only for process-private, in-memory data accesses.
> Figure 2. Removing the final element of a singly linked list. Note that both the returned result and list remain mutable, and the returned result is no longer encapsulated by the linked list, unlike in prior work (e.g., [26, 46]). Note also that this function returns none on lists of size one, as it would be impossible to separate the list from its tail in this case.
Huh, now that sounds like a substantial pitfall... any good way around this?
Unless the length of the list is tracked in the type (a "Vector" in typical dependent-type nomenclature), I can see the issue.
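Roughly what I mean, sketched with GADTs in Haskell (hypothetical, not the paper's system; all names are mine): if the length lives in the type, the size-one case becomes a compile-time error at the call site instead of a run-time none:

    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

    data Nat = Z | S Nat

    -- A list indexed by its length: the classic dependent-type "Vector".
    data Vec (n :: Nat) a where
      VNil  :: Vec 'Z a
      VCons :: a -> Vec n a -> Vec ('S n) a

    -- Only accepts vectors of length >= 2, so the awkward size-one
    -- case simply can't be passed in.
    removeLast :: Vec ('S ('S n)) a -> Vec ('S n) a
    removeLast (VCons x (VCons _ VNil))             = VCons x VNil
    removeLast (VCons x rest@(VCons _ (VCons _ _))) = VCons x (removeLast rest)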
I have not read the paper, but why isn't the empty list the value that returns None when this function is called on it? I assume the empty list is still a list in this type system?
No, it's not: they're passing in a "node", not a list, which always contains both a head element and a (nullable) tail pointer. That's why it can't separate the head in the case of size one: the node must maintain a non-null head.
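To make that concrete, here's a hypothetical model of the shape involved (the names and Haskell encoding are mine, not the paper's):

    -- A node always carries a head element; only the tail may be absent,
    -- so a list built from nodes can never be empty.
    data Node a = Node { hd :: a, tl :: Maybe (Node a) }

    -- Remove the final node, returning (shortened list, removed node),
    -- or Nothing for a size-one list: stripping its only node would
    -- leave no Node to hand back as the remaining list.
    removeFinal :: Node a -> Maybe (Node a, Node a)
    removeFinal (Node _ Nothing)     = Nothing
    removeFinal (Node x (Just rest)) =
      case removeFinal rest of
        Nothing             -> Just (Node x Nothing, rest)
        Just (rest', final) -> Just (Node x (Just rest'), final)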
Who said anything about painful? HN posts that are direct downloads of PDF files are universally marked with [pdf], either because the poster remembered to do so or because the title gets edited after the fact.
This is not an article, it's a download, and should be marked as such.
It's only a download if your browser chooses not to display it. On Chrome, it's just another tab among tabs; you can choose to save to a file if you want.
I just love the beauty of type systems, the vast complexity and effort put into building an extensive architecture. And at the same time, systems without one are capable of wonderful things.
[0] https://verdagon.dev/blog/zero-cost-borrowing-regions-overvi...
[1] https://verdagon.dev/blog/zero-cost-borrowing-regions-part-3...
[2] https://verdagon.dev/blog/regions-prototype
[3] http://forty2.is/
[4] https://github.com/microsoft/verona