Hacker News
Provably space-efficient parallel functional programming (sigplan.org)
115 points by matt_d 6 days ago | 24 comments

This sounds a bit like software transactional memory (STM), in the sense of giving threads the illusion of total freedom while keeping them separate in reality. In STM data races are solved by unwinding the victim thread, essentially allowing threads to ask for forgiveness instead of permission.
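The "ask for forgiveness" idea can be sketched in a few lines: read a version number, compute optimistically, and commit only if nothing changed in the meantime. This is a toy single-variable sketch of the optimistic-retry pattern, not a real STM implementation (real STMs track whole read/write sets):

```python
import threading

class TVar:
    """A transactional variable: a value plus a version number."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

def atomically(tvar, update):
    """Optimistically apply update(value); if another thread committed
    in between, unwind and retry (forgiveness, not permission)."""
    while True:
        seen_version = tvar.version
        new_value = update(tvar.value)  # computed outside the lock
        with tvar._lock:
            if tvar.version == seen_version:  # no conflicting commit
                tvar.value = new_value
                tvar.version += 1
                return new_value
            # else: conflict detected -- loop and retry

counter = TVar(0)
threads = [threading.Thread(target=atomically, args=(counter, lambda v: v + 1))
           for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.value)  # 100, despite concurrent optimistic updates
```

The retry loop is where the bookkeeping overhead of STM lives: every conflicting commit throws away work.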

Crucially, this new scheme still allows for mutation of shared data structures as long as they were allocated before the thread's birth. You can enjoy parallel execution managed by the compiler, but maintain your ability to footcannon if you need to, and don't have anywhere near the bookkeeping overhead of STM.

> We view disentanglement as generalizing the notion of “time” as it relates to memory for parallel programs: in both sequential and disentangled parallel programs, an instruction can only access objects that are allocated before it.

> Disentanglement opens a new perspective on parallel memory management: rather than viewing memory as a global shared resource, we can instead organize memory as a distributed structure that mirrors (i.e., corresponds with) the structure of parallelism in the program.
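The "allocated before it" rule from the first quote can be illustrated with an ordinary fork-join program. This is a Python stand-in for what is a Standard ML discipline in MPL: children freely read data allocated before the fork, while data they allocate themselves stays private until the join:

```python
from concurrent.futures import ThreadPoolExecutor

# Allocated before the fork: both child tasks may freely access
# this structure, since it predates their "birth".
shared = list(range(10))

def child(lo, hi):
    # Objects allocated *inside* the task are private to it until join,
    # so the tasks stay disentangled from each other.
    local = [x * x for x in shared[lo:hi]]
    return sum(local)

# Fork-join: the heap structure mirrors the structure of parallelism.
with ThreadPoolExecutor() as pool:
    left = pool.submit(child, 0, 5)
    right = pool.submit(child, 5, 10)
    total = left.result() + right.result()  # join: results flow back up

print(total)  # sum of squares 0..9 = 285
```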

Parallel, concurrent, and distributed systems are moving closer and closer to a relativistic model of the world. If we consider memory as roughly equivalent to space (N bits of memory take O(sqrt(N)) physical space http://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html ) then this approach is essentially scheduling the work across spacetime (rather than just time); or, from a different perspective, the interactions defined within the program can be interpreted as a spatial layout of the problem/solution, which this compiler is utilising when deciding how to arrange the allocations.

See also https://en.wikipedia.org/wiki/Relativistic_programming

I know nothing about distributed programming, but Leslie Lamport's paper [1] from 1978 about distributed systems already reads very close to the typical intro in special relativity about clock synchronization. I would be surprised if there is not already a body of research about whether:

1) Relativity in the physical world arises if we consider some distributed model of computation

2) Physical laws, in particular speed of light, affecting algorithm complexity limits

[1] http://lamport.azurewebsites.net/pubs/time-clocks.pdf
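The clock-synchronization analogy in Lamport's paper is concrete enough to sketch: each process keeps a logical clock that ticks on local events, and on receiving a message it jumps past the message's timestamp, so timestamps respect the happens-before order. A minimal sketch:

```python
class Process:
    """Lamport logical clock: increments on local events; on receive,
    jumps to max(own clock, message timestamp) + 1."""
    def __init__(self):
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock  # the timestamp travels with the message

    def receive(self, msg_time):
        self.clock = max(self.clock, msg_time) + 1
        return self.clock

p, q = Process(), Process()
p.local_event()          # p's clock: 1
t = p.send()             # p's clock: 2; message stamped 2
q.receive(t)             # q's clock: max(0, 2) + 1 = 3
print(p.clock, q.clock)  # 2 3
```

The resulting timestamps only order causally related events, which is exactly the partial-order structure that mirrors relativistic light cones.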

It's not Einsteinian, though. Simple Galilean relativity is enough to model what's going on.

I disagree. Concurrency (including parallelism) only defines a partial order on events, not a total/global order. This leads to one of the following:

- We assume there's a total order anyway. This lets us use Galilean relativity, but it is an incorrect model, which manifests as race conditions.

- We impose a total order by introducing causality between every event (e.g. synchronisation, or data dependencies like counters). This lets us use Galilean relativity and maintain correctness, but it is no longer concurrent/parallel.

- We only assume a partial order. This avoids race conditions, but allows races between independent results (which can therefore be calculated in parallel). This is inherently relativistic, since independent events can occur in a different order for different observers (in the case of this article, the observers are CPU cores).

See also https://en.wikipedia.org/wiki/Relativity_of_simultaneity
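The partial-order view can be made concrete with vector clocks: one event happens-before another exactly when its clock is dominated componentwise, and two events are concurrent when neither dominates. A minimal sketch:

```python
def happens_before(a, b):
    """a -> b iff a <= b componentwise and a != b (vector clocks)."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    """Independent events: neither happened before the other, so
    different observers (e.g. CPU cores) may legitimately see them
    in either order."""
    return not happens_before(a, b) and not happens_before(b, a)

e1 = (2, 0)  # second event on core 0
e2 = (0, 1)  # first event on core 1
e3 = (2, 2)  # event on core 1 after hearing from core 0

print(concurrent(e1, e2))      # True: unordered, safe to parallelise
print(happens_before(e1, e3))  # True: causally ordered
```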

This seems cool, but the blog article doesn't really delve into what the "provably" part means. I can see how the structure they describe, where processors handle GC for a collection of structurally linked heaps, might make GC more efficient, but they don't really say what is proved or how.

I guess, even from a practical perspective, how does this end up comparing with GC in something like the JVM with functional languages like Scala?

Near the top there's a link to a paper which seems to go into more technical detail: https://dl.acm.org/doi/10.1145/3434299

Context around the link:

> In our POPL 2021 paper (which won a Distinguished Paper award), we proposed a specific heap scheduler and showed that it yields provable work and space bounds.

I haven't read it yet, but this is where their claims of being provably space-efficient seem to come from.

EDIT: After a quick read, Section 6 is where they prove their claim. It's efficient when compared to equivalent sequential execution.

> It's efficient when compared to equivalent sequential execution.

It looks like they're comparing against MLton, which is interesting since it performs whole-program optimisation. That makes their speedups even more impressive!

MPL is based on MLton, so they also do whole-program optimisation (the same one, in fact).

Yes, but I imagine traditional compiler optimisations like inlining, unrolling, etc. are easier to apply to serial code.

Well, to the compiler, the parallelism probably just looks like an opaque call to a higher-order function. The bulk of the program will still be serial code, which will be subject to all the normal optimisations. I talked a bit with one of the main authors of MPL, and he mentioned that the existing optimisation passes in MLton just kept working without any nontrivial modifications.
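Roughly, this is the shape the optimiser sees: parallelism enters through one higher-order call, and everything inside is ordinary serial code. A sketch with a hypothetical `par` helper (not MPL's actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def par(f, g):
    """Run two thunks in parallel and return both results.
    To an optimiser this is just an opaque higher-order call;
    the code inside f and g is plain serial code."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ff, gg = pool.submit(f), pool.submit(g)
        return ff.result(), gg.result()

def fib(n):
    # Ordinary serial code: inlining, unrolling, etc. apply as usual.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

a, b = par(lambda: fib(20), lambda: fib(21))
print(a, b)  # 6765 10946
```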

So they are using the same strategy as Erlang (partitioned heaps), with a nice twist of adding hierarchy.

That is great, but it runs into the next problem with concurrent programming, shared data, where for performance you absolutely have to share memory between heaps and pass ownership between them.

Erlang has specific escape hatches for that; I wonder how that would look in a functional language.

Author here. One of the cool things about the property we're using here---disentanglement---is that it specifically allows for shared data, under only mild restrictions. This allows us, for example, to implement fast libraries which utilize shared mutable state for efficiency under the hood. A good example is our parallel arrays library (https://github.com/shwestrick/sml-parseq), which is "purely functional" in terms of its interface, but not its implementation.
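The pure-interface-over-mutable-implementation pattern can be sketched like this. This is an illustration, not the actual sml-parseq code, and `par_tabulate` is a hypothetical name: workers fill disjoint slices of a shared preallocated buffer, and the caller only ever sees an immutable result:

```python
from concurrent.futures import ThreadPoolExecutor

def par_tabulate(n, f, workers=4):
    """Purely functional interface: returns a fresh immutable tuple.
    Impure implementation: worker tasks fill a preallocated mutable
    buffer in parallel, each writing a disjoint slice."""
    buf = [None] * n  # shared mutable state, hidden from callers
    def fill(span):
        lo, hi = span
        for i in range(lo, hi):
            buf[i] = f(i)
    chunk = (n + workers - 1) // workers
    spans = [(i, min(i + chunk, n)) for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fill, spans))  # force all fills to finish
    return tuple(buf)  # freeze before exposing

print(par_tabulate(8, lambda i: i * i))  # (0, 1, 4, 9, 16, 25, 36, 49)
```

The buffer is allocated before the workers fork and each worker writes a disjoint region, so the program stays data-race-free even though the implementation mutates shared state.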

It's helpful here to distinguish parallelism from concurrency. Disentanglement naturally emerges in data-race-free parallel programs, which have no concurrency. But certainly, you bring up a good point for programs that are highly concurrent in addition to being highly parallel. There's lots of work already on concurrent functional programming, for example CML (https://en.wikipedia.org/wiki/Concurrent_ML), and we think these ideas could be adapted to work with disentanglement really well.

I don't know enough about how Pony was implemented, but I know that each `Actor` has its own GC running independently of any other actor's, which sounds very similar to the idea being proposed in this research?

Memory sharing in Pony uses a more advanced form of borrow checker which allows not just two "kinds" of shared memory (as with Rust's `mut` and not `mut`), but several: `val`, `ref`, `iso`, `trn`, `box`, `tag`[1]. I think that for any solution to this problem that actually allows any shared memory to exist between threads, something in those lines must be required.

[1] http://jtfmumm.com/blog/2016/03/06/safely-sharing-data-pony-...

> To write parallel programs for such hardware, researchers have converged on using implicit parallelism, where the programmer expresses all opportunities for parallelism, and the compiler and run-time system then work together to manage parallel execution, relieving the programmer from an otherwise tedious and difficult task.

This idea would be hugely controversial in all places I've worked at, and is often the opposite direction of where many languages, runtimes etc are moving. Maybe it's just innocent wording, but "research converging" on something that few in the industry believe in makes me think academia may be navel-gazing beyond healthy levels, even if the research itself is very interesting.

> Maybe it's just innocent wording, but "research converging" on something that few in the industry believe in makes me think academia may be navel-gazing beyond healthy levels

Did the industry believe in the value of linear or affine types? Sound static typing? Pure functional programming?

It's the typical arrogance of the practitioner to dismiss computer science research as irrelevant just because the application doesn't come in a nicely advertised package yet.

Of course, the scientific community can get lost on wrong paths or mix up fashion with established knowledge. I don't know enough about this particular topic to form an opinion here. But in general, when there has been a scientific consensus regarding programming languages it turned out to be relevant for the field.

> I don't know enough about this particular topic to form an opinion here.

To be fair, I don't either, depending on what counts as enough. I base this on what I've read, heard and experienced regarding GCs, query planning, and a lot of runtime "magic" that often ends up as an over-generalized black box in many practical cases. In turn, this led to a renaissance of simpler systems that are predictable, easy to reason about, and tweakable for the specific domain. Just my observations.

> But in general, when there has been a scientific consensus regarding programming languages it turned out to be relevant for the field.

I have a different view of academia than most people, so I'm not expecting agreement here, but scientific consensus in a field like programming language theory should be rare; in fact, I'd expect much more vigorous debate. This field is young, incredibly complex, and shares surface area with industry use cases and our mental models.

Looks exciting.

Quick question: what sort of parallelism are they discussing, MIMD or SIMD?

Notably, can what they're discussing run on a GPU?

Difference here: https://en.wikipedia.org/wiki/Flynn%27s_taxonomy

A combination of both. This work targets multicore, but execution on the GPU is definitely possible. It's just a different line of research. I'd highly recommend taking a look at Futhark (https://futhark-lang.org/), which has a similar programming model (purely functional and parallel), but targets GPU. The compilation strategy is quite a bit different!

I have looked at Futhark, however...

What I find more interesting is things like DynaSOAr [1] as well as the parent project/paper. Futhark seems primarily specific to tensor-style computation (like APL plus extra features), and I'm interested in things that are more naturally prone to code and data divergence, and in trying to find ways to compensate for this.

[1] https://2019.ecoop.org/details/ecoop-2019-papers/12/DynaSOAr...


"Please don't complain about tangential annoyances—things like article or website formats, name collisions, or back-button breakage. They're too common to be interesting."


For me scrolling works fine on mobile but is broken on Chrome desktop (page snaps around in ways I don't expect). Just disable javascript on blog.sigplan.org, the page works fine without it.

In what way?

For me: white layout repaints. They could probably prevent this with CSS `contain`, even though that would not solve the root cause of the anomaly.
