High-order Virtual Machine (HVM): Massively parallel, optimal functional runtime (github.com/kindelia)
493 points by Kinrany on Feb 5, 2022 | 151 comments


Author here.

HVM is the ultimate conclusion to years of optimal evaluation research. I've been a great enthusiast of optimal runtimes. Until now, though, my most efficient implementation had barely passed 50 million rewrites per second. It did beat GHC in cases where optimality helped, like λ-encoded arithmetic, but in more real-world scenarios, it was still far behind. Thanks to a recent memory layout breakthrough, though, we managed to reach a peak performance of 2.5 billion rewrites per second, on the same machine. That's a ridiculous 50x improvement.

That is enough for it to enjoy roughly the same performance as GHC in normal programs, and even outperform it when automatic parallelism kicks in, if that counts! Of course, there are still cases where it will perform worse (by 2x at most usually, and some weird quadratic cases that should be fixed [see issues]), but remember it is a 1-month prototype versus the largest lazy functional compiler on the market. I'm confident HVM's current design is able to scale and become the fastest functional runtime in the world, because I believe the optimal algorithm is inherently superior.

I'm looking for partners! Check the notes at the end of the repository's README if you want to get involved.


The How page [0] is nicely written! It was a real pleasure to read :)

[0] https://github.com/Kindelia/HVM/blob/master/HOW.md


This is very cool research that I’ve been loosely following for a while but I feel the benchmarks you’ve presented are very misleading. They are comparing parallelized code to sequential code and then making a big deal that it’s faster. Of course, you could also write a parallel Haskell version of the same benchmarks in about the same amount of code, and I’d expect similar speedups.

Haskell doesn’t parallelize every independent expression because it’s a general purpose language and that’s unlikely to be the correct choice in all contexts.

I don’t have much of an intuition for whether the optimal evaluation stuff will be useful in practice for the kinds of programs people actually write. Like I get that it helps a lot if you’re doing multiplication with Church numerals… but can you give some more compelling examples?


I didn't want to be misleading, sorry. My reasoning was to just benchmark identical programs.

I personally think Haskell's approach to parallelism is wrong, though, since it demands in-code annotations to work. The problem is that the decision on whether an expression should be parallelized doesn't depend on the code itself, but on where the expression is used. For example, if `fib(42)` is the topmost node of your program's normal form, you always want to parallelize its branches (`fib(41)` and `fib(40)`), but if it is called in a deeply nested list node, you don't. That information isn't available in the code of `fib`, so placing "spark" annotations there seems conceptually wrong.
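To make that concrete, this is roughly what the spark-based approach looks like in Haskell (a minimal sketch using `par`/`pseq` from the `parallel` package; real code would also add a depth cutoff):

    import Control.Parallel (par, pseq)

    -- The parallelism annotations live inside `fib` itself, even though
    -- whether sparking pays off depends on where `fib` is called from.
    fib :: Int -> Integer
    fib n | n < 2 = fromIntegral n
    fib n = a `par` (b `pseq` (a + b))
      where
        a = fib (n - 1)
        b = fib (n - 2)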

HVM parallelizes by distributing to-be-evaluated redexes of the normal form among available cores, prioritizing those closest to the root. Here is an animation: https://imgur.com/a/8NtnEa3. That always seems like the right choice to me. After all, it just returns the same result, faster. I could be wrong, though. In which cases do you think parallelism isn't the correct choice? Is it because you don't always want to use the entire CPU? Would love to hear your reasoning.

> Like I get that it helps a lot if you’re doing multiplication with Church numerals…

This isn't about multiplication with Church numerals. Don't you find it compelling to write binary addition as "increment N times" and have it be as efficient as the add-with-carry operation? Making mathematically elegant code fast matters, and HVM does that. See the overview [0] for instance. Also, this allows us to have all the "deforestation" optimizations that Haskell applies to List with hardcoded rewrite (RULES) pragmas, for any user-defined datatype, which is also quite useful. There are certainly many uses that I'm not creative enough to think of.

[0] https://github.com/Kindelia/HVM/blob/master/HOW.md#bonus-abu...
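For reference, this is the kind of hand-written fusion rule I mean (a sketch of a GHC RULES pragma; GHC's base library actually fuses `map` through `build`/`foldr` rules rather than this exact one):

    module MapFusion where

    -- A hand-coded deforestation rule: collapse two list traversals into one.
    -- The claim is that HVM gets this kind of fusion for any algebraic
    -- datatype, without per-type pragmas.
    {-# RULES
    "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
      #-}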


If you're interested in collecting a more diverse set of benchmarks, Stephanie Weirich presented her work [1] at the recent WITS workshop [2] comparing the efficiency of different reduction strategies in the untyped λ-calculus across a dataset of natural and synthetic terms. I would be curious to see how your implementation compares.

[1]: https://github.com/sweirich/lambda-n-ways

[2]: https://popl22.sigplan.org/home/wits-2022


> My reasoning was to just benchmark identical programs.

I would just report the results of a parallel Haskell program as well, and let people decide what programs they think are comparable. I don’t really think you come off well making the speed claims you do since many people who bother to dig in will conclude you are comparing apples to oranges.

> After all, it just returns the same result, faster.

That is not always true. There’s some overhead to introducing parallelization so it’s not always worth it unless you have a big enough chunk of work.

But the other reason fine grained parallelism isn’t often worth it even if it can provide a speedup for a single computation: you have other concurrent things going on, and plenty of potential parallelism from that. Think of a web server responding to many concurrent requests. In this situation, your program could easily have worse performance by injecting parallelism within each connection. Imagine we have k cores and k concurrent connections - we would likely be better off just giving each connection one core rather than having them all thrashing each other to use all the cores.

> Don't you find it compelling to write binary addition as "increment N times" and have it be as efficient as the add-with-carry operation?

It’s very cool in a “wow that’s neat” kind of way, but, well we already have machine ints that are even faster. :) I feel you need better examples to be compelling - people just don’t care very much about optimizing Peano arithmetic. If those speedups generalize to things people do care more about then you should show that and then people can be impressed! It sounds like you do have some ideas for examples, and I think your point about getting deforestation “for free” is cool… but, show us!

My other question - it seems like this approach requires giving up on separate compilation? Like you can’t compile a function in isolation, since it will do completely different things when composed with other functions? Or am I misunderstanding?


> That is not always true. There’s some overhead to introducing parallelization so it’s not always worth it unless you have a big enough chunk of work.

This isn't fine-grained parallelism though! As I said, HVM spawns threads on to-be-evaluated redexes closest to the root of the program's normal form. So, either there is a significant speedup, or the program is sequential (or too small) and there is no speedup, but the overhead is minuscule, since only a bounded, small number of spawn() calls occurs. To be clear: there is no situation where HVM's parallelism will make the program significantly slower, which *does* happen if you use too many sparks in Haskell. That will never happen on HVM.

> But the other reason fine grained parallelism isn’t often worth it even if it can provide a speedup for a single computation:

Yep that's definitely a good reason. Thanks for sharing. (You can just run HVM in a single thread, though.)

> It’s very cool in a “wow that’s neat” kind of way, but, well we already have machine ints that are even faster

I couldn't disagree more. Just because I used integers as an example, which happen to be optimized, does it mean an entire optimization technique isn't interesting? This applies to every data structure that can be defined algebraically.

> My other question - it seems like this approach requires giving up on separate compilation?

Compiling a function in isolation is fine, why wouldn't it be? Not sure I get it.


> As I said, HVM spawns threads on to-be-evaluated redexes closest to the root of the program's normal form. So, either there is a significant speedup, or the program is sequential (or too small) and there is no speedup, but the overhead is minuscule, since only a bounded, small number of spawn() calls occurs.

To me that sounds like overhead. Rather than just computing 1 + 1 as a single assembly language instruction you’re sending the 1 + 1 expression tree to a thread’s work queue or something? If I’m misunderstanding can you clarify?

> Compiling a function in isolation is fine, why wouldn't it be? Not sure I get it.

It seems like optimal evaluation is a whole program optimization - you need the whole program available in order to do it, you can’t just compile one function to assembly language in isolation, and then link that against other precompiled functions. Do I have the wrong idea here?

> Just because I used integers as an example, which happens to be optimized, it means an entire optimization technique isn't interesting? This applies to every data structure that can be defined algebraically.

I do think it sounds cool as I said, but I want to see it on an example I care about, not Peano arithmetic. Especially since deforestation is already an optimization that is used by languages that don’t do optimal evaluation. So I want to see how HVM does it better, if that’s the case. And not just you telling me how it’s theoretically better. :)

Lastly, a while ago I remember you mentioning that you couldn’t compile arbitrary lambda calculus terms. Is that still the case and if so can you better describe what can’t be compiled?


I think I very much agree with your points and approach to parallelism.

> This applies to every data structure that can be defined algebraically.

Is this actually true? I'd expect this to hold for inductive algebraic data types but not for co-inductive types?


> You can just run HVM in a single thread, though.

How easy would it be to share data structures between different threads? Say I have a large binary tree in one thread, can I send it to another thread without much cost? How about lambdas?


That isn't how it works. HVM threads work transparently, you don't need to think about them. The same data structure *can* be visible to multiple threads under some conditions, through cloning; for example, consider the following program:

    (Sum Nil        )  = 0
    (Sum (Cons x xs))  = (+ x (Sum xs))
    (Head (Cons x xs)) = x

    (Main x) =
      let xs = (Cons 1 (Cons 2 (Cons 3 Nil)))
      (Pair (Sum xs) (Head xs))
The result of this program is `(Pair 6 1)`. Since `Pair` is the topmost (root) node of the normal form, each of its elements will be computed in a separate thread. Because of that, the same `[1,2,3]` list will be accessed by each thread. But that's not actually a shared reference, because values only exist in one place; instead, the lazy cloner makes copies of the list, layer-by-layer, and sends these copies to the thread that will handle them.

Note that if the same expression was written in a deeper position of the normal form, it would be evaluated sequentially. This is what avoids the fine-grained parallelism issue. It also means that in many cases HVM will not be as parallel as it could be, though! But when the parallelism kicks in, it is always productive.


Ok. I'd be interested in the following benchmark: A program that reads in a large map of key-value pairs; it then reads in a list of keys, and produces a list of values (order of the output is not important). The map can be modeled by e.g. a red/black tree.

I'd be interested in the performance in Haskell, if it was implemented using a number of worker threads (each with their own reference to the map), versus the best implementation you can think of in HVM.
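In Haskell I'd expect it to look roughly like this (a rough sketch using Strategies rather than explicit worker threads, and generating the data in place instead of reading it in; names and chunk size are arbitrary):

    import qualified Data.Map.Strict as M
    import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

    -- Look up a list of keys in a shared map, splitting the lookups
    -- across cores in chunks.
    lookupAll :: M.Map Int Int -> [Int] -> [Maybe Int]
    lookupAll m ks = map (`M.lookup` m) ks `using` parListChunk 1000 rdeepseq

    main :: IO ()
    main = do
      let m  = M.fromList [(i, i * i) | i <- [1 .. 1000000 :: Int]]
          ks = [1, 3 .. 999999]
      print (length (filter (/= Nothing) (lookupAll m ks)))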


Looks good. I'll keep that in mind and let you know if we write such a benchmark. Opening an issue would be helpful to remind us!


I wonder if there's been attempts at solving "when is auto-parallelization worth it" with profile feedback?


> if `fib(42)` is the topmost node of your program's normal form, you always want to parallelize its branches (`fib(41)` and `fib(40)`)

How does that make things more efficient? Naively, it seems like the fib(40) and fib(41) threads are going to do almost exactly the same computation, assuming they're also defined recursively.


It's neat when code that looks wildly inefficient goes fast anyway, but might that mean minor changes can have unpredictable effects on performance? We see that a fair bit with JavaScript, where there is a happy path for performance but you can fall off of it with apparently minor changes.

Perhaps, with more experience, the performance of programs targeting HVM will be predictable, but I suspect it will mean a lot of relearning intuitions about what makes programs slow?

Do Haskell programmers have good intuition about performance? It seems like laziness would make it hard?


> Do Haskell programmers have good intuition about performance? It seems like laziness would make it hard?

It’s very hard. I’ve found most experienced Haskell devs still get it wrong at times. It’s one of the things that makes me think that we need to rethink the overall approach to laziness


> Do Haskell programmers have good intuition about performance? It seems like laziness would make it hard?

Laziness in data structures usually gives you good enough performance in most cases, but it's harder to make sure it's consistent and to reason about it when you need to adjust it. For common operations with an average requirement of performance, laziness seems to give the developer an easier chance of good performance, with a higher possibility of screwing it up if the requirement is really strict.


> Do Haskell programmers have good intuition about performance? It seems like laziness would make it hard?

The solution is to just not use laziness when you don't need it, then reasoning about Haskell performance is the same as in any other language.

http://h2.jaguarpaw.co.uk/posts/make-invalid-laziness-unrepr...
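For example, something as small as making accumulator fields strict goes a long way (a minimal sketch in the spirit of the linked post; the `Stats` type is made up):

    -- Strict fields mean the running totals can never pile up as thunk chains,
    -- so the space behaviour is as predictable as in a strict language.
    data Stats = Stats
      { statCount :: !Int
      , statSum   :: !Double
      }

    observe :: Stats -> Double -> Stats
    observe (Stats n s) x = Stats (n + 1) (s + x)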


I vaguely recall reading in https://www.amazon.com/Parallel-Concurrent-Programming-Haske... that there was research done into this in GHC, but it wasn't possible to generalize the behavior so that there weren't situations which were very bad. I'll look for it soon and see if I can find it.


I’d love to see this as well.


A huge improvement by improving memory layout sounds fascinating. For example, the described "pointer-heavy graph to something else" change sounds like a class of transformation that could, in some cases, be made in a compiler- or tooling-assisted way. I wish compilers (or other automated tooling) would help more with memory layout optimization. Anyone know if there's any interesting work going on in this area in the FP world (or outside, for that matter)?


This work is mind-blowing to me. Also, I love the README!

You did mention that you pretty much implement pages 14-39 of The Optimal Implementation of Functional Programming Languages. Any idea why the authors of the book didn't implement it themselves?

Also, what led you to this recent memory layout breakthrough? What was the journey of thinking towards this end?


I don't think the authors are focused on engineering matters; they're researchers after all. But that's just my guess. Also, there is an obsession with covering the full λ-calculus, which the abstract algorithm doesn't. A lot of energy has been put into that. I think this is just wrong. Rust has severe limitations on how you can write lambdas. HVM has some, which seldom occur in practice.

Regardless, some people did try implementing this in practice, dozens of times, including myself. It just wasn't that fast except in those cases where the asymptotics are superior. A naive implementation just stores graphs/edges in memory, but most of these edges are redundant. For example, a lambda doesn't need to point to its parent. By trying to turn the graph into a tree, I ended up with SIC [0], which, once implemented efficiently, cut down memory and computation costs significantly. Doing so was tricky, especially because 1. the missing edges complicated the traversal; 2. some upward edges can't disappear (variables); 3. DUP nodes aren't part of expressions, they just "float", which was counterintuitive to get right. Finally, I learned to appreciate the fact that global rewrite rules are better than case-trees for recursive functions, since they greatly reduce the total rewrite count.

So, in short, my journey was:

1. Learn the abstract algorithm

2. Implement it naively as a graph

3. Optimize it by making trees whenever possible

4. Favor rewrite equations over case-trees

These steps led to the design of HVM, which does fairly well in practice.

[0] https://github.com/VictorTaelin/Symmetric-Interaction-Calcul...


And another question, is this for purely functional code only, or does it / could it also apply to mutable data structures. In particular, is it somehow related to copy-on-write techniques (like for example used for Swift's data structures), or is this something entirely different?


Since HVM is linear, adding some mutability would be completely fine and would NOT break referential transparency (because values only exist in one place!). So, we could actually have "pure mutable arrays". In order to make that mix well with lazy cloning, though, we'd need to limit the size of arrays to about 16. But larger arrays could be made as trees of mutable 16-tuples. There was a post about it, but I can't find it right now.
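To illustrate the shape of that encoding (a sketch in Haskell, and immutable here; on HVM the leaves would be the mutable 16-tuples, so lazy cloning only ever copies O(16) cells at a time):

    -- A "large array" as a 16-ary tree whose leaves are 16-element chunks.
    data Chunked a = Leaf [a] | Node [Chunked a]

    -- Index into a tree of the given depth (depth 0 = a single Leaf).
    index :: Int -> Int -> Chunked a -> a
    index _ i (Leaf xs)   = xs !! i
    index d i (Node kids) = index (d - 1) (i `mod` w) (kids !! (i `div` w))
      where w = 16 ^ d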


That makes sense to me; I am currently implementing a package with mutable data structures with shallow constant time cloning (via copy-on-write) for TypeScript, and the dup idea somehow seems to be compatible with that.


But wouldn't it make mutability only useful in local scopes?

```

fn f(x) { *x = 42 // lazily cloning on write };

let x = 1; f(x); // x still equals 1 here

```


Based on my understanding of the HOW.md text, this relies heavily on mathematical equality for performance. In its execution it is essentially constantly inlining operations as it goes.


Could HVM let Haskell entirely eliminate its GC, doing away with GC pauses and enabling its use in real-time applications? That could be just as revolutionary as the beta-optimality.


It does that! There are no GC pauses on HVM's model. Compiling Haskell to it would be neat.


Could this lazy cloning be applied to an eager language like Rust? (sorry if that's a dumb question. This isn't my area of expertise)


If I were serious about porting this project to Go, where would you recommend that I start?


You mean, the whole project? Or just the runtime? The runtime has recently been ported to Go by Caue F. He posted it in our Telegram: https://pastebin.com/0X0bbW8b

Perhaps you could team up?


Oh my lord, this is seriously the best news of the last decade. You, sir -- along with Caue F. -- have just made my decade!!

EDIT: is there a license for this? Can I use it?


This is MIT licensed, so feel free to fork, use, edit and sell it without even notifying me. Now go ahead and get rich!


I'll get right on that. :)

In all seriousness, thank you. I have exactly zero experience with Rust, but if I can help with anything, feel free to reach out: @lthibault on GitHub.

Moreover, I'll likely be spinning this off into a full-fledged repo of some sort. I'll keep you updated, and am more than happy to coordinate our efforts.

Thanks again, and congratulations on this incredible piece of engineering!


Hey, this was my port, nice to see people interested.

Yeah, it still needs a few changes on the HVM rust compiler to work, but we could figure it out together if you want.

Hit me up on @Cauef on telegram, caue.fcr@gmail.com, or let's talk on the issue pages of HVM!


Have to say this is the most excited comment I've ever seen on this website.


This is amazing!!


This is the sort of project which causes even me to break from wasting time on HN and go actually download things and try things.

In particular, a while back I came across an interesting tree-based hugenum representation in Haskell, Paul Tarau's "Hereditarily Binary Natural Numbers". It might be something (somewhat) practical that HVM can be used for right now. I'll see if I can translate it and get it to run.


JFYI, the "This book" link on https://github.com/Kindelia/HVM/blob/master/HOW.md goes to a ResearchGate page which has the wrong book for download (something about category theory).

So, this dup primitive, is that explained in the book as well, or is this a new addition in your implementation?


I fixed the link. This is all part of the system described in the book; HVM is merely a very fast practical implementation. The additions (things like machine integers and constructors) don't affect the core.


I've seen your post in /r/brasil. Nice work.



> TornadoVM accelerates parts of your Java applications on heterogeneous hardware devices such as multicore CPUs, GPUs, and FPGAs.

So an HVM-like interpreter could be written in Java or variants thereof (instead of Rust), in order to take advantage of TornadoVM to run efficient lazy evaluation on exotic devices.


This looks pretty awesome, and holds much promise for the future. Optimal beta reduction has always seemed like something of theoretical interest only, but Victor has been working many years on making it more practical, which deserves a lot of respect.

> HVM files look like untyped Haskell.

Why not make it look exactly like Haskell? So that every valid Haskell program is automatically a valid HVM program, while the reverse need not hold due to HVM being untyped.

Instead the syntax looks like a mix of Haskell and Lisp, with parentheses around application, and prefix operators. I prefer the cleaner Haskell syntax.

Some of the speedup examples look somewhat artificial:

The exponential speedup is on 2^n compositions of the identity function, which has no practical use. It would be nice to show such a speedup on a nontrivial function.

The arithmetical example uses a binary representation, but strangely uses a unary implementation of the operators, with addition implemented as repeated increment, and multiplication as repeated addition. Why not use a binary implementation of the operators? Does that not show a good speedup?


VictorTaelin here. It won't show a speedup because you already optimized it to a low-level algorithm manually! The point of that benchmark is to stress-test the cost of overwhelming abstraction. What is really notable is that addition by repeated increment on HVM performs as well as addition with a carry bit. The former is an elegant one-liner, the latter is an 8-case, error-prone recursive definition. The point is to show how you can often write highly abstract, elegant mathematical definitions, and it will perform as well as a manually crafted version, and that is very cool.
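For readers who don't have HOW.md open, this is the shape of the two definitions being compared (a Haskell sketch with names of my own; on GHC the repeated-increment version is of course linear in its first argument, while the claim is that HVM runs the analogous λ-encoded definition as efficiently as the carry version):

    -- Bit strings, least significant bit first; E marks the end (zero).
    data Bin = E | O Bin | I Bin deriving Show

    -- Increment: flip low bits until the first 0.
    inc :: Bin -> Bin
    inc E     = I E
    inc (O x) = I x
    inc (I x) = O (inc x)

    -- The "elegant one-liner": add by just incrementing `b`, `a` times.
    addInc :: Int -> Bin -> Bin
    addInc 0 b = b
    addInc a b = addInc (a - 1) (inc b)

    -- The hand-rolled version with an explicit carry: more cases, more room for bugs.
    addCarry :: Bin -> Bin -> Bin
    addCarry E     b     = b
    addCarry a     E     = a
    addCarry (O a) (O b) = O (addCarry a b)
    addCarry (O a) (I b) = I (addCarry a b)
    addCarry (I a) (O b) = I (addCarry a b)
    addCarry (I a) (I b) = O (inc (addCarry a b))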


Oh, you can write addition as repeated increment, and you get the efficiency of addition in binary?

Wow; that certainly is impressive.


Yes! The fact HVM can apply a function 2^N times in N steps is still mind-blowing to me. I think solutions to important problems might come from exploiting this fact. Keep in mind you need to use some techniques for that to work. In particular, there are 3 important rules. I've just written an overview here: https://github.com/Kindelia/HVM/blob/master/HOW.md#bonus-abu...
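Here is the composition trick in plain Haskell, just to show its shape (a sketch; in GHC the final call still performs 2^n reductions of `f`, whereas the claim is that HVM, on λ-encoded data and following those rules, keeps the work proportional to n):

    -- Self-composing a function n times yields 2^n applications of it.
    applyPow2 :: Int -> (a -> a) -> (a -> a)
    applyPow2 0 f = f
    applyPow2 n f = let g = applyPow2 (n - 1) f in g . g

    main :: IO ()
    main = print (applyPow2 20 (+ 1) 0)  -- 2^20 increments of 0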


> The fact HVM can apply a function 2^N times in N steps is still mind-blowing to me. I think solutions to important problems might come from exploiting this fact.

Is it just me, or does that look similar to the Fast Fourier Transform (FFT) [1] speedup (against the regular Fourier Transform implementation)?

[1] https://en.wikipedia.org/wiki/Fast_Fourier_transform


What kind of similarity do you perceive? What analogue of Fourier transforms and of the involved algebraic structures can you find in heterogeneous graphs of constructors?


> Why not make it look exactly like Haskell? So that every valid Haskell program is automatically a valid Hvm program, while the reverse need not hold due to hvm being untyped.

> I prefer the cleaner Haskell syntax.

"Haskell syntax" is a pretty bad design in my experience. Firstly, it isn't just one thing: there's Haskell 98 and Haskell 2010, but most "Haskell" code uses alternative syntax from GHC extensions (lambda case, GADTs, type applications, view patterns, tuple sections, etc.). (Frustratingly, those extensions are often applied implicitly, via a separate .cabal file, which makes parsing even harder; plus lots of "Haskell code" is actually written as input for a pre-processor, in which case all bets are off!)

GHC alone includes two separate representations of Haskell's syntax (one used by the compiler, one used by TemplateHaskell). Other projects tend to avoid both of them, in favour of packages like haskell-src-exts.

I've worked on projects which manipulate Haskell code, and getting it to parse was by far the hardest step; once I'd managed to dump it to an s-expression representation the rest was easy!


> I prefer the cleaner Haskell syntax.

My God, this is written so matter-of-factly that it hurts.

To me, Lisp is awesome BECAUSE of its uniform syntax!

To me, Lisp is waaaaaay cleaner and always unambiguous!

I guess what I'm saying is that it's subjective.


Lisp and Haskell are both easier to read than this syntactic hybrid of the two, but that might be familiarity


I totally agree on the syntax, Haskell is far cleaner.

I’m not sure what the reasoning was here. I hope the author or others can shed some light.


The reason for the syntax is that HVM aims to be a low-level compile target, which sounds confusing because closures make it look very high level, but it should be seen as an LLVM IR. It isn't meant for direct human use in the long term. Ideally it will be the compile target of languages like Haskell, Lean, Idris, Agda, Kind (my language) and others. As such, I just went for a syntax that simplifies and speeds up parsing, while still keeping it usable if you need to edit it manually: there is no backtracking in this syntax.


Are there any pages that compare Kind and Idris 2?


There aren't. Kind is more stable, is more JS-looking and less Haskell-looking, which makes it more practical IMO, has a super fast JS compiler and you can write React-like apps on it easily, but some work is needed to improve the type-checker performance when certain syntax sugars are used. The core's performance is superb, though.

Idris2, I can't comment much. All I know is that I envy its unification algorithm and synthesis features. Its syntax is more haskellish, if you like that. It seems less mature in some aspects, though. Very promising language overall.


My guess is the author just picked the simplest syntax to parse in order to demonstrate the tech. If it works and shows to be successful you can always write a GHC (or whatever) backend that compiles to this runtime. Maybe come up with some binary IL etc


Or write a self-hosting GHC backend that works like HVM does. That would be very neat.

For me, these projects (such as HVM) are great research stuff showing us how to do (or not do) things.


> I’m not sure what the reasoning was here. I hope the author or others can shed some light.

From my point of view, the author almost made it into s-expressions, but decided not to go all the way, which (imo) would have been amazing and the "cleanest" (although I don't like using the term "clean" for code, it's so subjective). I'd like to have some light shed over the decision not to go all the way.


Changing it to pure s-expressions would be easy though, I guess just `let` isn't following that rule? I do love the no-parenthesis lets though.


On a quick scan, it seems to be some more things than just `let`. One example would be `(Main n) = (Sum (Gen n))` should be closer to `(define Main (n) (Sum (Gen n)))` or similar. Not sure the change would be as easy as you think, but happy to be proven otherwise ;)

A `let` could accept an uneven number of forms and only evaluate the last one, using the other ones as bindings.

Example from your README:

    (Main n) =
      let size = (* n 1000000)
      let list = (Range size Nil)
      (Fold list λaλb(+ a b) 0)
    
    (define Main (n)
      (let size (* n 1000000)
           list (Range size Nil)
        (Fold list λaλb(+ a b) 0)))

Could even get rid of the parentheses for arguments (`(n)` => `n`) and treat forms inside `define` the same as in `let`.


> On a quick scan, it seems to be some more things than just `let`. One example would be `(Main n) = (Sum (Gen n))` should be closer to `(define Main (n) (Sum (Gen n)))` or similar.

Good point.

> Could even get rid of the parentheses for arguments (`(n)` => `n`) and treat forms inside `define` the same as in `let`.

No, because non-curried functions are a feature. If we did that, every function would be curried. Which is nice, but non-curried functions are faster (lots of lambdas and wasteful copies are avoided using the equational rewrite notation), so they should definitely be accessible.


Just wanted to clarify that I definitely don't think this looks like a bad language as-is; I don't want to give the impression that, because of this small preference, it's less valuable (but I'm sure you knew that already). It's a really impressive design, congratulations on getting it out for the world to play with :)


Curious, this suggests (lambda a b form) to go along with (let a 42 form). I've seen let without the extra parens but not define or lambda. Thanks for the aside.


Yeah, can't remember if I've seen that in the wild before either. Just as a disclaimer, the idea is off-the-cuff and I'm not 100% sure it would work in practice or be beneficial beyond containing fewer parentheses. You wouldn't be able to have multiple forms that get evaluated, for example, without having to wrap them in something like `do`, which is the first argument against it. But maybe multiple forms in the body are less common than one in the first place.

Many questions arise, for a small amount of benefit (imo), so not sure it's worth it.


Well we gotta have _some_ word for subjective quality.


We do have "elegant," which to me means "intellectually ergonomic." Something can be "beautiful" while being totally intimidating and impossible for a human to work with, and it can be "easy" despite wasteful complexity because you can confidently work through the cruft, but "elegant" implies both beauty and ease, fitting naturally to our human mental capabilities without unnecessary detail.


Not very knowledgeable here, but does this mean it can be a foundation for a Scheme compiler?


So many new programming languages coming up with clever features that makes me wish I could mix and match their core features while using libraries from more popular languages.

Up and coming Languages I am excited about -

1. Roc - https://www.youtube.com/watch?v=vzfy4EKwG_Y

2. Zig - https://ziglang.org/

3. Scopes - https://sr.ht/~duangle/scopes/

4. Gleam - https://gleam.run/

5. Odin - https://github.com/odin-lang/Odin

6. Koka - https://github.com/koka-lang/koka

7. Unison - https://github.com/unisonweb/unison


I have seen talks and/or papers on all of these except Scopes and Gleam. Out of the list the only one that does not provide something I am interested in is Unison. Given that your feelings for interesting language features seem to be at least marginally close to mine, I am going to check out Scopes and Gleam just to see what they have that interested you.

Personally, from a conceptual level, I find that Koka and Roc provide some of the more interesting developments in PL design. For anyone wondering, as to Roc (which I don’t think has an implementation or even a full description yet) I am particularly interested in their concept and implementation of the ‘platform’/‘application’ layer idea. As to Koka, the papers the team has generated are almost all excellent. The paper describing Perseus, the GC/resource tracking system, and the compiler implementation review are stunning.


Unison is really quite interesting:

* Effect system

* Built-in code distribution. As in: call a function on a different node that doesn't have that code yet

But most interesting: it stops storing code as plain text files and stores it in a database instead. This breaks many workflows but could offer quite a few interesting properties.


I don't really have any outstanding objections to the goals of Unison, it is just focused on paths through the PL design space I am not particularly deep into. My hobby language is a multi-modal, multi-type-system, 'concatenative' language. So my tastes and interests are pretty far away from where Unison is experimenting. But varying paths travelled hopefully lead us to a more diverse and productive place for everyone.


There is an initial version of Roc, though the repo is private and you need to ask Richard Feldman for access


Don't forget Kind (https://github.com/Kindelia/Kind) which is mentioned in the "How can I Help?" section as a potential language for targeting HVM.


Nim - https://nim-lang.org/

Lets you mix and match other libraries with their native ABI as it compiles to C, C++, ObjC and JS + has excellent FFI.


What about support for using Golang or C# packages? That'd be handy.

Interesting read about Nim:

https://www.quora.com/Why-hasnt-Nim-Nimrod-become-as-popular...

Key points:

- Nim lacks official support to multiple inheritance or interfaces (as that in Java) or mixin (as that in Ruby). It feels uneasy when implementing common design patterns, which is important for middle to large-sized projects.

- The error messages emitted by the Nim compiler look a bit obscure. Sometimes you have to guess or google to comprehend what really happens in your code.

These seem like pretty good problems compared to common annoyances and caveats in many other languages.


TBF the reply in that link was written 4 years ago, before the language went 1.0

> Nim lacks official support to multiple inheritance or interfaces

Multiple inheritance is a bit of a trainwreck IMO (see diamond problem).

The language isn't designed around OOP, and is instead procedural with metaprogramming for extension. You get more bang for your buck this way, but for people with their head in inheritance it’s probably a shock.


Not general purpose but has very interesting ideas nonetheless - https://imba.io/


there is also maybe Dada https://dada-lang.org/welcome.html


Great work Victor, it's so encouraging to see Asperti & Guerrini's research receive engineering attention! Anton Salikhmetov also had some nice implementations [1] a few years ago. As I understand, even Andrea Asperti didn't believe his approach was practical [2] :)

Any idea how hard it would be to extend with a nondeterminism operator, as in Plotkin's parallel or? (fork a computation and terminate if either branch terminates)

[1] https://dblp.org/pid/49/4772.html

[2] https://arxiv.org/abs/1701.04240


Andrea Asperti's only mistake was thinking he was wrong.


Readers will find HOW.md interesting too: https://github.com/Kindelia/HVM/blob/master/HOW.md


It says:

> Notice also how, in some steps, lambdas were applied to arguments that appeared outside of their bodies. This is all fine, and, in the end, the result is correct.

But one of the example steps is this transformation:

    dup a b = {x0 x1}
    dup c d = {y0 y1}
    (Pair λx0(λy0(Pair a c)) λx1(λy1(Pair b d)))
    ------------------------------------------------ Dup-Sup
    dup c d = {y0 y1}
    (Pair λx0(λy0(Pair x0 c)) λx1(λy1(Pair x1 d)))
I think I'm missing a subtlety. This step substitutes the a/b superposition into the lambdas, but I don't see how it knows that the x0 goes into the first lambda and the x1 goes into the second one. After all, in HVM bizarro-land, lambda arguments have no scope, so the reverse substitution seems equally plausible and even syntactically valid:

    (Pair λx0(λy0(Pair x1 c)) λx1(λy1(Pair x0 d)))
This substitution produces two magically entangled lambdas, which seems fascinating but not actually correct. How does the runtime (and, more generally, the computational system it implements) know in which direction to resolve a superposition?


`dup a b = {x0 x1}` will just substitute `a` by `x0` and `b` by `x1`, no matter where `a` and `b` are. So, if you apply that, verbatim, to:

    (Pair λx0(λy0(Pair a c)) λx1(λy1(Pair b d)))
You get:

    (Pair λx0(λy0(Pair x0 c)) λx1(λy1(Pair x1 d)))
Does that make sense?


Wow, I was looking for an ELI5 and I never expected to find one from the author of such a project...

I actually feel like I kind of understand it now - kudos to @LightMachine!


“In other words, think of HVM as Rust, except replacing the borrow system by a very cheap .clone() operation that can be used and abused with no mercy. This is the secret sauce! Easy, right? Well, no. There is still a big problem to be solved: how do we incrementally clone a lambda? There is a beautiful answer to this problem that made this all possible. Let's get technical!”

Every third paragraph feels like the autogenerated flavour text gibberish of a Kerbal Space Program mission description.


As an author of a language myself, going through the process of documenting it, let me assure you that writing technical docs is several orders of magnitude harder than writing comments on HN. And not only is it hard, it is also extremely time consuming and far less interesting than writing code. Yet it needs to be done. To that end, the value of community support in writing documentation can never be overstated. So if you have any specific issues with his documentation, I'm sure the author would welcome a pull request.


I know it needs improvements, there is even a disclaimer at the top of HOW.md. I'll go over it and improve sentences like that in the future.


I enjoyed it, you can clearly see the author is excited about explaining his work.


I don't know that I would go as far as you did, but the writing style is a little bit too hyperbolic, or maybe just too rhetorical/conversational, for my tastes.


I see the author is replying here so I just wanted to say I enjoyed following the Formality posts on your blog and updates on GitHub and hope there's time for more of that in the future. Wishing you all the best with this project!

As an aside, it may be a convenient reference if you laid out the history of Formality/Kind, as I think a large portion of your target audience already knew about Formality. The rename gets a little lost IMHO, at first glance at least.


This had a Show HN a few days ago: https://news.ycombinator.com/item?id=30152714


Really neat. I've been working on a lazy parallel language and so I read the "HOW" document closely to see the secret sauce. What's exciting to me is that, while I'm not familiar with the academia behind their decisions, it seems that I arrived at the same result with regard to their lazy clones: I realized early in the language that caching the eval results would make things really fast, and this caching ends up doing exactly what their lazy clones achieve!
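The core of it is basically this (a toy sketch, not my actual implementation): key a cache by the expression itself, so identical sub-expressions are only reduced once per cache.

    import qualified Data.Map as M
    import Control.Monad.State (State, evalState, gets, modify)

    data Expr = Lit Int | Add Expr Expr deriving (Eq, Ord, Show)

    -- Evaluate with a cache of already-computed results.
    eval :: Expr -> State (M.Map Expr Int) Int
    eval e = do
      hit <- gets (M.lookup e)
      case hit of
        Just v  -> pure v
        Nothing -> do
          v <- case e of
                 Lit n   -> pure n
                 Add a b -> (+) <$> eval a <*> eval b
          modify (M.insert e v)
          pure v

    main :: IO ()
    main = print (evalState (eval (Add (Lit 2) (Lit 2))) M.empty)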

Of course one place I don't cache is in function application because the language is not pure (so we want side effects to occur again), but this makes me think it might be really good to derive which functions/subexpressions are pure and allow the cache to be retained through function calls for those.

I'm gonna keep a close watch on this. I should probably read more white papers :)


> I realized early in the language that caching the eval results would make things really fast, and this caching ends up doing exactly what their lazy clones achieves!

Sounds similar to the evaluation model of the Nix expression language (the pure functional programming language used by the Nix package manager). It caches evaluation results, which (a) avoids re-computing identical expressions, and (b) allows some infinite loops to be detected.

See https://www.researchgate.net/publication/222827743_Maximal_L...


Thanks for bringing that up! I suspected that nix did that, but wasn't sure. This language is also a "task runner" (being used for build systems mainly), so it's not surprising that it behaves like nix does.


Thank you!


The fellow behind this has been hammering at experiments around so-called optimal reduction strategies for a while!

As another commenter's remarks imply, some of the so-called benefits of what are called optimal reduction strategies are limited to efficiently manipulating data structures that are represented by lambdas. And that's also probably why this is untyped: it makes it easier to embed / support such an encoding.


Agreed, but the reference he makes to the symmetric interaction calculus blog post seems to at least have a basis in sound logic/semantic principles. I haven't read the whole post, but I am at least willing to give it a chance.


Linear logic inspired ideas are always great


Disclaimer: I'm just a Pascal programmer... so I didn't grok the nitty gritty of the lambda calculus, but I think enough to see where this is going.

This evokes a similar feeling to when I learned that Lisp wasn't John McCarthy's target programming language. I began to wonder how powerful that intended language would have been, if everyone hadn't stopped at the bootstrap language.

There is great power in what is shown, and I suspect even more power in what it enables. I look forward to seeing where this goes; there clearly could be another similar leap.

Question: Does this mean it's possible to rework Rust to remove the accursed borrow checker? (I tried learning Rust, and stopped at the Borrow Checker, so I gotta ask)


> Question: Does this mean it's possible to rework Rust to remove the accursed borrow checker?

Sure, with the "unsafe" keyword. But you probably don't want to do that in most cases as it will open new failure modes the compiler can protect you from. As another commenter pointed out, the borrow checker has become more clever over time (with NLLs), so if you tried Rust years ago I definitely recommend giving it another try.

At first I was a little confused by compiler errors, but they're pretty good (unique error code, and usually suggestions for a fix), and over time I realized the compiler was not my enemy but rather a very friendly reviewer preventing me from making obvious mistakes in my code. Nowadays I pseudocode my Rust on paper, then write the actual code, then spend some time arguing with the compiler, but the design-code-debug lifecycle has been much reduced (compared with other languages) because there are very good chances the program works perfectly as soon as it compiles, and when it doesn't, in 99% of cases that's because my logic was wrong to begin with, not because of a programming error.


For the most part, Rust's borrow checker just serves to verify that non-`unsafe` code doesn't modify/deallocate memory that is still accessible through a different pointer (a reference, in Rust terms; actual (raw) pointers free of such implications can't be used for data access without `unsafe`).

They further imply that they are non-NULL and their pointee is dereferenceable (no segfault; correctly aligned) and a valid bitpattern for the pointee type. These assumptions are provided to LLVM, allowing a bunch of optimizations already. Mutable references further guarantee `noalias`/`restrict`, as a mutable reference to some memory location implies no other references to that memory exist, an exclusivity enabling aggressive store-to-load forwarding and elimination of redundant stores.

The non-NULL nature makes `Option<&T>` pointer-sized, so semantically nullable pointers have no memory cost and just force explicit handling of their NULL state when accessed (there are convenience method-syntax functions and `match` statements whose cases (patterns, in Rust terms) look like type constructors on the left side of a `=>` and have the inner values bound to the identifiers on the expression to the right of the `=>`). If a function itself returns an `Option`, `foo?` can be used to just unpack a `Some(bar)` and `return None;` on the nullable case. This syntax also works with `Result<T, E>`, which allows conveying detailed errors (similar to exceptions, but with performance matching C's negative return codes instead of C++'s unwinding).

References are written as `&T` for immutable ones and `&mut T` for mutable ones. Raw pointers are written as `*const T` for immutable ones and `*mut T` for mutable ones.

These guarantees of no-mutable-shared-memory not guarded by explicit `Atomic` types (or concurrent datastructures (I count `Mutex` and `RwLock`)) carry semantic guarantees beyond what an auto-`.clone()` and/or GC can provide.

I highly suggest you try Rust again, but this time using non-lexical lifetimes (NLLs), which shift the lifetime enforcement from simple lexical scope to control-flow-graph based reachability. They lack the unintuitively-restrictive limitations particularly notable with loops, while still simplifying to abstract over combinatorial path explosion.

BorrowChk is your friend, I promise. Just ask in e.g. the Community Discord if you're getting rejected for no good reason despite trying to understand/work around its complaints. You're not really fighting it; it's more like a sparring partner than an enemy.

Also, `.clone()` and `#[derive(Debug, Clone, Copy)]` are your friends when fighting BorrowChk.


The submission title says "optimal", the README says "beta-optimal", what do they mean in this context and are they true?


Optimal beta reduction is a very technical term. In effect it means any "family" of lambda terms will all be reduced "at once." What is a family? Go take two semesters in rewriting theory and then perhaps our two heads together can pin it down exactly.

In layman's terms, it means that you're doing a variant of call-by-need, except you can be absolutely sure you don't copy applications without need.

I am personally skeptical it's a useful idea at all. Machine computation is what we really care about (how fast can you push instructions through the pipeline) and graph reduction systems were already all the rage in the 80s. We gave up on them because they didn't interact with the cache well.

So, this is the "motherlode" of all graph reduction strategies, yet it will get absolutely trounced in settings like numerical computation.


There's an interesting interaction with implicit parallelism. For a given reduction to occur at most once you need communication across threads so that two don't do the same reduction at once.

I suspect at most once per hardware thread is a faster choice - repeating some calculation for decreased communication.

Or you need a static assignment of reduction to threads which means predicting how long each reduction will take.


As a counter-argument (and I'm not 100% sure it holds up):

The cache failures were because caches were too small. Whereas now we care about CPU op data dependencies.


Can you link to learning resources?


"Optimal" in this context means it never repeats the work of beta reduction when it doesn't have to. Ie, evaluating `(\x -> x + x) (2 * 2)` only evaluates `2 * 2` once.


Why is this special? Doesn't that mean GHC is beta optimal, since AFAIK it stores thunks for computations?


GHC is not beta optimal, due to some complications around what happens when closures are created. (This is what `HOW.md` claims.)
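A small example of the kind of duplication involved (a sketch with made-up names): a partial application shares the closure, but not the reductions under its lambda.

    import Debug.Trace (trace)

    expensive :: Int -> Int
    expensive x = trace "expensive!" (x * x)

    -- `g x` evaluates to a closure \y -> expensive x + y.
    g :: Int -> Int -> Int
    g x y = expensive x + y

    main :: IO ()
    main = do
      let h = g 10        -- the partial application is shared...
      print (h 1 + h 2)   -- ...but "expensive!" can print twice, since the
                          -- work under the lambda is redone per call.

Without optimization GHC prints "expensive!" twice here; whether -O recovers the sharing depends on transformations like full laziness, whereas optimal evaluation shares that reduction by construction.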


That's not optimal on real machines; it's often faster to recalculate x*y than to cache it.

In general this "optimal" strategy seems to assume infinite memory that's all accessible at the same speed.


The compiler could have heuristics and profiling data to replace optimal but "small" reductions with cheap recomputation with even less locking where it seems like a good idea; it's an additional level of improvement rather than a competing choice. As the Python documentation says, "practicality beats purity", particularly when the sacrificed purity is only the pride of using an universal solution.

In a similar vein, if typical strategies for unstable sorting of arrays in place use quicksort splitting for long spans and insertion sort for short spans, it doesn't mean that either algorithm is "better".


"Beta Optimial" != Fastest Execution strategy for all kinds of code. Just defining the term, not arguing about it's engineering application.


The project certainly says it has "drastically improved efficiency" and "uncharted levels of performance".

I don't think "massively parallel" is all that good for performance either, because battery life is part of that.


Ok, well _I_ was just defining the term. I didn't write the project. You will note they provide benchmarks with data showing the efficiency gain. What's left (as a reader) is to evaluate the quality of those benchmarks.


Can someone point me to a reference explaining what beta reduction is?


https://en.wikipedia.org/wiki/Lambda_calculus#%CE%B2-reducti...

Basically, beta-reduction is the single primitive that performs all computation in the lambda calculus. When you apply the function mapping x to M, to an argument N, the result is M with all occurrences of x replaced by N.


Is there any practical example of this? What's a common use case?


Any evaluation of a functional-based language is going to use this. Here's a simple example:

Let's say I have a function `add1`: `add1 = \x -> x + 1` (a function that takes a value `x` and then computes `x + 1`). And then I write the program: `add1 2`

If we want to evaluate this program, first we substitute the variable to get `add1 2` -> `(\x -> x + 1) 2`

Then we perform "beta reduction" to get: `(\x -> x + 1) 2` -> `2 + 1`

Then it's simple arithmetic: `2 + 1` -> `3`

"The point" here is to have formal rules for defining what a computation means. This helps with writing proofs about what programs do, or what is possible for programs to do.

For example, say I have a simple program with static types. I might want to prove that if the programs type checks, type errors will never occur when you run the program. In order to do this proof, I have to have some formal notion of what it means to "run the program". For a functional language, beta reduction is one of those formal rules.


I will note though that in the actual lambda calculus there is no "+", "2" or "1" - it's functions all the way.

1 is typically defined in the Church encoding as \f -> \x -> f x. Addition is \m -> \n -> \f -> \x -> m f (n f x). Finding the Church encoding of 2 is left as an exercise.

We can then use beta reduction to compute 1 + 1 by replacing m and n with 1:

  1 = \f -> \x -> f x
  (\m -> \n -> \f -> \x -> m f (n f x)) 1 1 =>beta
  \f -> \x -> 1 f (1 f x)

  \f -> \x -> 1 f x = (\f -> \x -> f x) f x =>beta
  1 f x = f x

  \f -> \x -> 1 f (1 f x) <=> \f -> \x -> 1 f (f x)

  \f -> \x -> (\f -> \x -> f x) f (f x) =>beta
  \f -> \x -> f (f x) - the Church encoding of 2.
This is what computation using only beta reduction looks like.


Yup, was just trying to give an example that utilized primitives to make it easier.


Yes yes, I wanted to add to your example, not to say that you said anything wrong!


> Whats a common use case?

If we expand the discussion from "lambda calculus" to "pretty much every programming language", then "beta reduction" becomes "calling a function with an argument"

> Is there any practical example on this?

Given the above, you'll be able to find beta-reduction in practically all software ever written ;)


Isn't that the process of compilation?


"Compilation" usually describes translation from one language to another (e.g. C to machine code).

In contrast, beta-reduction rewrites expressions in the same language. It's a "reduction" since it brings expressions closer and closer to their "normal form" (i.e. a halted program); although some expressions have no normal form (they run forever, i.e. the Halting Problem).

In Javascript syntax, the following expression is not in "normal form":

    (function(x) { return x + 5; })(42)
If we apply beta-reduction to it, we get back the function body, with occurrences of 'x' substituted by '42':

    42 + 5
That expression can no longer be beta-reduced; it is "beta-normal". Lambda calculus only uses beta-reduction (although there are "alpha" and "eta" replacements, which are more like refactorings).

However, Javascript has more reduction rules than just beta-reduction; in this case we can use its numerical operations, to reduce the expression further:

    47
That is in normal form for Javascript; i.e. the program has halted.

Note that it's not "compilation", since we stayed in the same language at all times, just replacing one expression with an equivalent expression.


This is incredibly exciting and a tour de force. My immediate thoughts are on turning this into a hardware automaton, e.g. an FPGA softcore like the Reduceron. I see some interesting questions like how to handle user-defined rewrite rules, but nothing insurmountable.


Does anyone know how this affects people who write code in OOP-languages like C++/Java? The only time I have used a functional language is first year of college, and it was a pain.


Really awesome, any thoughts on autodiff?


A problem with Haskell and with this is that they are inherently only suitable for batch processing. You cannot build interactive systems that have real-time requirements with these languages because how computation is done is not generally under control of the programmer.

People attempt to program interactive systems with Haskell but then they spend most of their time adapting their program to GHC’s particular evaluation strategy. Usually the optimized program ends up convoluted and nothing like how Haskell code should look, additionally still with no guarantees on the run time.


It's hard to build systems with hard real-time requirements with GHC Haskell.

The vast majority of systems people build do not have hard real-time requirements. People have built video games, real-time audio, and all sorts of other soft-real-time systems with Haskell


I've worked on a real-time requirements system with GHC Haskell and it's certainly possible!


> soft-real-time systems

No, they aren't soft real-time systems either; most applications in general don't satisfy soft real-time requirements.

Most people hack it but hacking it with Haskell requires rewriting your pure functional code into a form that strongly resembles procedural languages like C. This defeats the purpose of a lazy pure functional programming language.


If most applications don't satisfy soft real time requirements either, then maybe those requirements aren't really important for most cases.

There's only one industrial Haskell codebase I've worked on, and although parts of it were very procedural, that was probably less than half the codebase, even including the IO-bound code. Sure it's one way to use the language, but it's certainly not the only one.


> If most applications don't satisfy soft real time requirements either, then maybe those requirements aren't really important for most cases.

Most applications are buggy and crappy. Sure, if you're churning out minimally passable garbage for a paycheck, then don't worry about correctness or timing requirements and pray for the best. Cf. the Slack desktop client. If you are producing something that necessarily has a higher quality bar then these things matter, e.g. an aircraft control interface. Haskell isn't up to the challenge (imagine indefinitely waiting without feedback for a long-lived, heavily nested thunk to evaluate when you press "land"; ouch).


That is really not true, but I do agree Haskell gives that impression, for the wrong reasons. I'm working hard to fix these misconceptions with HVM, Kind-Lang and related projects!


It is true, and it's not a bug. It's a feature. It's an inherent property of the lazy pure functional programming model, not any particular language. I can't remember where he said this, but SPJ himself said that was a design goal of Haskell. In C you must specify both how and where computations take place. In Java you don't have to worry about the where (because memory allocation is automatic) but you still have to worry about the how. Haskell goes one step further and abstracts both time and space away from the programmer. Touting this as a benefit of the language but then denying the negative consequences is attempting to have your cake and eat it too.

And for the record Java and other garbage collected languages are not generally suitable for interactive applications either. Anyone who has ever waited on a GC pause can attest to that. This is the exact reason Rust exists and why people continue to use C/C++ despite being inconvenient languages to use.


I do disagree with SPJ here, though. On HVM, you can have complete control over how the resources (both space and time) are used in your programs and algorithms. That is why it outputs the memory (space) and rewrites (time) cost of each program you run, and it is so precise you can even make formulas of how much memory and how many rewrites your functions will demand. You can absolutely have tight memory control in HVM, just like in C. Being lazy, high-level, etc. is NOT incompatible with that, I'd argue.


Would you say DOOM is interactive with real-time requirements?

https://github.com/rainbyte/frag


The vast majority of the code in that repository is in the IO monad and uses carefully placed “!” eager evaluation annotations. Both facts only serve to reinforce my point that a programmer must have control over the way in which computations proceed to build interactive applications. Haskell does allow some level of control over this as I said but it ends up not looking like Haskell, it looks like C with funny syntax.


> The vast majority of the code in that repository is in the IO monad and uses carefully placed “!” eager evaluation annotations.

Looking through that repository, I believe we have extremely different definitions of "vast majority."


Nearly everything in Render.hs is in the IO monad. That is a fact, not a subjective opinion. I point out those routines because they are the heart of the application.


Yes, rendering is by definition I/O. That's also one file. You stated:

> The vast majority of the code in that repository is in the IO monad and uses carefully placed “!” eager evaluation annotations.

Do you have anything to back that up?


> Yes, rendering is by definition I/O.

First of all rendering is not by definition I/O except in the trivial sense that all functions take input and produce output. A pure 3D rendering function takes game state as input and produces a list of triangles + attributes to draw as output.

Even if that were true it would only further validate my point.

> Do you have anything to back that up?

Yes, I already showed that Render.hs validates my point.


> The vast majority of the code in that repository is in the IO monad and uses carefully placed “!” eager evaluation annotations.

This is the claim we're talking about. Since you're into facts, not subjective opinions, show some evidence that the vast majority of the code in the repository is in the IO monad and uses carefully placed "!" eager evaluation annotations. Just to be clear, you haven't done that yet. That's a fact, not a subjective opinion.



Here's one more file than you listed without BangPatterns enabled:

https://github.com/rainbyte/frag/blob/master/src/BitSet.hs

https://github.com/rainbyte/frag/blob/master/src/Command.hs

https://github.com/rainbyte/frag/blob/master/src/Curves.hs

BangPatterns is a normal language extension to have enabled, was this used heavily? (hint: it's not enabled on "every source file.") You listed two of 28 files there, I'm assuming to try to show that the vast majority of the code in that repository is in the IO monad? You went from "vast majority" from one file to now two files. I'm looking for objectivity here, as you seem to be into. Let's see some numbers. I think anyone that glances at that repo would need some convincing of your claims.

Render.hs is 211 of 5580 lines of Haskell in the repository, by the way.


Curves.hs may not have BangPatterns but it is heavily in the IO monad. My point is sufficiently supported by the provided evidence, even evidence provided by you, exceptional cases notwithstanding. Thank you.


You still haven't shown me any data to back up your assertion, but you're welcome I guess.



