

Show HN: Micro-mitten – Research language with compile-time memory management - doctor_n_
https://github.com/doctorn/micro-mitten
I've been working on implementing the compile-time approach to memory management described in this thesis (https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-908.pdf) for some time now - some of the performance results look promising! (Although some less so...) I think it would be great to see this taken further and built into a more complete functional language.
======
steveklabnik
I haven't dug into the details a ton, but I am excited to see this! Would love
to see more research in this direction.

> micro-mitten's approach is significantly different from Rust's. Rather than
> depending on single ownership and a complex lifetime system, micro-mitten
> uses a series of data-flow analyses to statically approximate heap liveness.

To be clear, Rust these days also looks at control-flow. This was what all the
"non-lexical lifetimes" hubbub was about. And the next generation checker is
based on datalog...
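
A minimal example (names illustrative) of code the old lexical checker
rejected but NLL accepts:

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0];      // shared borrow of `v` begins
        println!("{}", first);  // ...and its last use is here
        v.push(4);              // OK under NLL: the borrow is dead at
                                // this point, even though `first` is
                                // still in lexical scope
    }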

~~~
throwaway894345
> And the next generation checker is based on datalog...

Where can I learn more about this?

~~~
Rusky
It's called "polonius": [https://github.com/rust-lang/polonius](https://github.com/rust-lang/polonius)

There are some posts on Niko Matsakis' blog, starting with this one:
[https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-...](https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/)

More recently a really good talk, with slides here:
[https://nikomatsakis.github.io/rust-belt-rust-2019/](https://nikomatsakis.github.io/rust-belt-rust-2019/)

------
TheAsprngHacker
Discussion on r/ProgrammingLanguages:
[https://www.reddit.com/r/ProgrammingLanguages/comments/gfgn0...](https://www.reddit.com/r/ProgrammingLanguages/comments/gfgn0r/research_programming_language_with_compiletime/)

Discussion on r/rust:
[https://www.reddit.com/r/rust/comments/gfgt1b/rustlike_langu...](https://www.reddit.com/r/rust/comments/gfgt1b/rustlike_language_with_static_memory_management/)

I look at this and I think it's an innovative and promising idea - the
freedom of a garbage-collected language, but with the tracing done as a
type-aware static analysis, and the cleanup code inserted at compile time!

~~~
pjmlp
It is not the only one:

[https://www.csail.mit.edu/event/safe-parallel-programming-pa...](https://www.csail.mit.edu/event/safe-parallel-programming-parasail-ada-202x-openmp-and-rust)

[https://chapel-lang.org/docs/master/builtins/OwnedObject.htm...](https://chapel-lang.org/docs/master/builtins/OwnedObject.html)

And a couple more with affine types, or algebraic effects.

Yes, this might be the future, but we are still far from the overall
convenience of GC for common programming scenarios.

If anything, one has to thank the Rust community for pushing more people to
look into this area, regardless of its outcome in the language market.

------
flohofwoe
> This means that it maintains the ability to insert freeing code at
> appropriate program points, without putting restrictions on how you write
> your code.

How does the approach in mitten compare to Automatic Reference Counting in
Objective-C (and I think Swift too)? From my experience, ARC can still add a
surprising amount of memory management overhead to a program and needs a lot
of hand-holding to keep that overhead down to an acceptable level (e.g. low
single-digit percentage of overall execution time in programs that talk to
Obj-C APIs a lot). I would be surprised if a "traditional GC" can do any worse
in that regard (maybe reference counting smears the overhead over a wider
area, e.g. no obvious spikes, but instead "death by a thousand cuts").
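
As a rough illustration of that "smeared" cost (using Rust's Rc as a
stand-in for ARC here, so this is only an analogy):

    use std::rc::Rc;

    fn main() {
        let data = Rc::new(vec![0u8; 1024]);
        let mut handles = Vec::new();
        for _ in 0..1_000_000 {
            handles.push(Rc::clone(&data)); // one refcount increment each
        }
        handles.clear(); // one decrement per handle: the cost is spread
                         // across the whole loop rather than showing up
                         // as a single visible pause
        assert_eq!(Rc::strong_count(&data), 1);
    }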

One thing I'd like to see in modern languages is to encourage and simplify
working with an (almost) entirely static memory layout, and make manipulations
inside this static memory layout safe. This static memory layout doesn't need
to be magically derived by the compiler as long as the language offers
features to easily describe this memory layout upfront.

A lot of data structures in applications don't need to live in "short-lived"
memory regions, but they often do because that's what today's languages either
encourage (e.g. when built on the OOP philosophy), or what happens under the
hood without much control from the code (e.g. in "reference-heavy" languages
like Javascript, Java or C# - or even "modern C++" if you do memory management
via smart pointers).

Minimizing data with dynamic lifetime, and maximizing data with static
lifetime, could mean less complexity in the language and runtime (e.g.
lifetime tracking by the compiler, or runtime memory management mechanisms
like refcounting or GCs).
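
As a sketch of what I mean by describing the layout upfront (purely
illustrative, with Rust const generics as the vehicle): a pool whose
capacity is part of the type, manipulated through bounds-checked index
handles instead of pointers:

    struct Pool<T, const N: usize> {
        slots: [Option<T>; N], // capacity fixed at compile time
    }

    impl<T, const N: usize> Pool<T, N> {
        fn new() -> Self {
            Self { slots: std::array::from_fn(|_| None) }
        }

        // Returns an index handle; all manipulation stays inside the
        // static layout and is bounds-checked, hence safe.
        fn insert(&mut self, value: T) -> Option<usize> {
            let i = self.slots.iter().position(|s| s.is_none())?;
            self.slots[i] = Some(value);
            Some(i)
        }

        fn get(&self, handle: usize) -> Option<&T> {
            self.slots.get(handle)?.as_ref()
        }
    }

    fn main() {
        let mut pool: Pool<u32, 16> = Pool::new();
        let h = pool.insert(42).unwrap();
        assert_eq!(pool.get(h), Some(&42));
    }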

~~~
eklavya
From what I understood, it's not reference counting; it tries to determine
at compile time when to drop, using data-flow analysis to come up with an
approximation of liveness.

I had a thought some time back: could compilers do a profiling run, dumping
GC info, to learn the liveness of objects they couldn't determine
statically?

~~~
amedvednikov
Most programs are complex and have lots of branching, so this wouldn't work.

------
myu701
If a language like this were to take off, I could see linter-style errors pop
up that are not currently possible.

"ERROR: maximum memory usage computed to be XYZ MB, which is higher than the
speicified limit of 500MB"

Now that would help keep the RAM bloat down!
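
A toy sketch of how such a bound might be computed (entirely illustrative;
a real prover is far more involved): walk an allocation profile and take
the worst case over branches:

    #[allow(dead_code)]
    enum Prog {
        Alloc(usize, Box<Prog>),      // allocate n bytes, then continue
        Free(usize, Box<Prog>),       // free n bytes, then continue
        Branch(Box<Prog>, Box<Prog>), // worst case of either arm
        Halt,
    }

    fn peak(p: &Prog, live: usize) -> usize {
        match p {
            Prog::Alloc(n, k) => peak(k, live + n).max(live + n),
            Prog::Free(n, k) => peak(k, live.saturating_sub(n)),
            Prog::Branch(a, b) => peak(a, live).max(peak(b, live)),
            Prog::Halt => live,
        }
    }

    fn main() {
        const MB: usize = 1024 * 1024;
        let prog = Prog::Alloc(600 * MB, Box::new(Prog::Halt));
        let max = peak(&prog, 0);
        if max > 500 * MB {
            eprintln!("ERROR: maximum memory usage computed to be {} MB, \
                       which is higher than the specified limit of 500MB",
                      max / MB);
        }
    }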

~~~
Ono-Sendai
My functional language Winter has errors like that. You can compute the
maximum memory usage of the program and then throw an error if it exceeds some
threshold.

Edit: I'll add that there are lots of programs the prover can't effectively
handle right now, so it can't compute a good memory bound. It works fine for
some relatively simple programs, however.

~~~
MaxBarraclough
> there are lots of programs the prover can't effectively handle right now

And always will be, due to Rice's Theorem. Can still be useful though -
various formal methods techniques are like this.

~~~
Ono-Sendai
Winter is not a Turing-complete language, so technically Rice's theorem
doesn't apply (I think). Nevertheless, practically speaking, you are right:
there will always be valid programs that the prover can't prove are correct.

------
fulafel
This could be an interesting avenue to work towards data layout optimizations.
If the language is built to require static knowledge about memory accesses, it
could change the layout to be more cache-friendly, use range optimizations to
pack fields more tightly, and customize array-of-struct representations to be
blocking/tiling friendly etc.
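
For example (hand-written in Rust here, though the point is that a
layout-aware compiler could pick the second representation automatically):

    // Array-of-structs: each element carries padding, and a sweep over
    // `pos` drags `vel` and `flags` through the cache with it.
    #[allow(dead_code)]
    struct ParticleAos {
        pos: f32,
        vel: f32,
        flags: u8,
    }

    // Struct-of-arrays: fields packed tightly, each array contiguous.
    #[allow(dead_code)]
    struct ParticlesSoa {
        pos: Vec<f32>,
        vel: Vec<f32>,
        flags: Vec<u8>,
    }

    // A position-only sweep now touches just the two hot arrays.
    fn step(p: &mut ParticlesSoa, dt: f32) {
        for (pos, vel) in p.pos.iter_mut().zip(&p.vel) {
            *pos += vel * dt;
        }
    }

    fn main() {
        let mut p = ParticlesSoa {
            pos: vec![0.0; 4],
            vel: vec![1.0; 4],
            flags: vec![0; 4],
        };
        step(&mut p, 0.016);
        assert_eq!(p.pos[0], 0.016);
    }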

------
xiphias2
The great thing about Rust is that lifetimes are defined at function
boundaries. Taking that away would take away the guarantees that Rust
libraries can provide.

In other words, it's a good thing for programmers that Rust doesn't allow
more freedom, and requires them to restructure the code if necessary.
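
For example (a standard Rust signature, shown just to make the point
concrete) - the lifetime contract is visible at the boundary, so callers
never need to read the body:

    // The signature promises the returned reference borrows from
    // `items` (not from `name`); the body is checked against this,
    // and callers rely on nothing more than the signature.
    fn longest<'a>(items: &'a [String], name: &str) -> &'a str {
        items
            .iter()
            .filter(|s| s.starts_with(name))
            .map(|s| s.as_str())
            .max_by_key(|s| s.len())
            .unwrap_or("")
    }

    fn main() {
        let items = vec!["alpha".to_string(), "alphabet".to_string()];
        let found = {
            let pattern = String::from("alph");
            longest(&items, &pattern) // `pattern` may die here; the
                                      // result borrows from `items`
        };
        println!("{}", found);
    }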

------
amelius
I'm thinking that, until they are perfect, these kinds of languages lure the
developer into a one-way street: convenient until you reach a dead end, at
which point your only escape (if you are lucky) is a whole bunch of contrived
and difficult-to-maintain typing constructs.

Still interesting research though.

~~~
iknowstuff
Your comment carries very little substance (reads like a baseless opinion) but
appears on the very top of the page. Interesting.

~~~
edjroot
At least according to [1] and [2], comment order is not determined just by the
comment's score but also by the posters', when it was posted, and other
things.

[1]
[https://news.ycombinator.com/item?id=1398764](https://news.ycombinator.com/item?id=1398764)

[2]
[https://news.ycombinator.com/item?id=13867739](https://news.ycombinator.com/item?id=13867739)

------
dwenzek
It's refreshing to see new approaches to memory management, notably ones
exploring the power of static analysis at compile time. I will take the
time to read this dissertation!

Just a question: it reminds me of previous work on static inference of
stack-allocated regions. What are the relationships, if any?

* [A Simplified Account of Region Inference]([https://hal.inria.fr/file/index/docid/72527/filename/RR-4104...](https://hal.inria.fr/file/index/docid/72527/filename/RR-4104.pdf))

* [MLton regions]([http://mlton.org/Regions](http://mlton.org/Regions))

~~~
zozbot234
The GitHub readme links to a scholarly thesis discussing the general
approach[0], as well as to the author's own dissertation[1] on this specific
project (which "aims to investigate the practical viability" of [0] "by
building the technology into a real compiler" for empirical evaluation on
real-world hardware). Region inference is discussed throughout [0], and
extensively in Chapter 9.

[0] [https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-908.pdf](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-908.pdf)

[1] [http://nathancorbyn.com/nc513.pdf](http://nathancorbyn.com/nc513.pdf)

~~~
sitkack
If the paper is anything like the abstract, this will be wonderful!

------
est31
Quite interesting.

quicksort example: [https://github.com/doctorn/micro-mitten/blob/f2e7eb12a5d8f88...](https://github.com/doctorn/micro-mitten/blob/f2e7eb12a5d8f8812358e1119e35a6ecef0ed164/src/test/benchmarks/quick_sort.mmtn)

------
xiaodai
"In its current form, asap is not equipped to handle CONCURRENT programs.
Managing memory in concurrent programs poses its own set of challenges."

So it doesn't have the fearless concurrency of Rust yet, and I wonder
whether this approach can support concurrency at all. I guess it's an open
research question.

~~~
zozbot234
Concurrency is a "proposed extension", per section 6.4 of the referenced
thesis. Among other things, it is noted that _cooperative_ concurrency with
explicit yield points would be somewhat feasible, whereas anything more
general than that is very much an active research area to say the least.

------
api
If unrestrictive compile time GC is possible, couldn't this be retrofitted
into JITs for JavaScript, Java, .NET, WASM, etc.? Isn't this just another way
of implementing GC?

~~~
pjmlp
JITs already do this kind of thing; it's called escape analysis, and it's
just quite hard to get right.

Also, many GC-based languages are adding some form of linear types, so that
you can still enjoy the productivity of having a GC around, while being able
to get hold of these kinds of tools.

[https://github.com/apple/swift/blob/master/docs/OwnershipMan...](https://github.com/apple/swift/blob/master/docs/OwnershipManifesto.md)

[https://gitlab.haskell.org/ghc/ghc/-/wikis/linear-types](https://gitlab.haskell.org/ghc/ghc/-/wikis/linear-types)
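
Rust's affine types give a feel for what those proposals are after (shown
only as an analogy; Swift's and GHC's designs differ in the details) - a
moved-from value can't be used again, so the compiler knows statically
where to free it:

    struct Token(String);

    // Taking `Token` by value consumes it; the drop (free) point is
    // statically known to be the end of this function.
    fn consume(t: Token) -> usize {
        t.0.len()
    }

    fn main() {
        let t = Token("hello".into());
        let n = consume(t);
        // println!("{}", t.0); // error[E0382]: use of moved value: `t`
        println!("{}", n);
    }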

------
Ericson2314
> series of data-flow analyses

Screams highly non-compositional to me. No thanks if so; I'll stick with
good old types and proofs.

------
syockit
I imagine the compile times would be orders of magnitude slower than Rust or
C++, and I can't really stomach slow compilers. Yesterday I was hacking on a
Qt app and the time taken to rebuild after a slight change to a header was
distressing (more than 5 seconds, which would afford a full rebuild of a
typical C program). I'm kind of surprised that the quicksort example took
only around twice to thrice as long to compile compared to the no-GC
approach. The example seems to be working on a compile-time-defined list,
though. I'd like to see how it scales on arbitrary runtime-defined input.

~~~
fluffything
> I imagine

No need to imagine; the thesis mentioned in the README provides compile-time
results: from no noticeable overhead to 2x larger compile times than Rust.

Quite reasonable if you take into account that one is one person's university
thesis, and the other is a project with 100s of active developers, 100s of
open PRs, etc.

~~~
TheAsprngHacker
From my skim of the paper, my understanding is that the analysis must
compute "call contexts" for functions, which use information from call
sites. I wonder if this will impede incremental compilation and modularity.
As programs get bigger, perhaps this approach may not scale.

------
pietroppeter
Might be worth knowing that there is a _production-deployed_ programming
language which - besides being a _great_ language in many, many respects -
will very soon (next release) have compile-time memory management (already
working and performant for the stdlib, including async) in a _stable_
release: Nim.

[1] [https://forum.nim-lang.org/t/5734#35562](https://forum.nim-lang.org/t/5734#35562)

[2] [https://forum.nim-lang.org/t/6125#37829](https://forum.nim-lang.org/t/6125#37829)

~~~
tayistay
Looks like the ARC we've had in other languages (ObjC, Swift) for many years
now. Any difference?

