
Lessons in Managing Haskell Memory - ruuda
https://tech.channable.com/posts/2020-04-07-lessons-in-managing-haskell-memory.html
======
kccqzy
> But by pretending to need the region inside using touch#, after converting
> the address back, we ensure that the compact region remains live from the
> GCs point of view.

Wow, but this seems incredibly fragile to me. touch# is a primop that is very
magical and, in the face of simplification, prone to being eliminated by the
compiler. I remember some bug in which the compiler eliminated the touch#
after deducing that the code is dead. Chaos ensues.

If it were me, I'd just keep a random object inside the compact region, and
then refer to that object in the regular way (i.e. not via the custom hash
table of raw addresses) in order to keep the compact region alive.

In fact, it seems that the main purpose of that complicated ST-like RankNTypes
dance is to ensure liveness of the compact region, when objects in the compact
region are being referred to using raw addresses. Then why not just keep some
regular Haskell value inside the compact region and use that to keep track of
regions instead of phantom types? That is, every `Entry` now contains some
token that is stored inside the compact region, as well as the address of the
data it actually points to. Then, whenever any Entry is reachable in GC, the
entire compact region is reachable in GC.

Am I missing something?

~~~
Kutta
I agree that storing a reference besides an Entry is probably safer than
#touch. It's also similar to arrays slices, where we store reference to the
array itself + offsets.

The ST-like region typing can be still useful though, because we get a static
bound on the lifetime of a compact region. We'd prefer to avoid leaking space
via large compact regions which survive beyond their intended life.

~~~
radekchannable
That would remove the need for touch#, but the point of Entry is that it _does
not_ contain references to values that the GC has to follow. If we would add a
reference to the region, then the GC will have to look into the Entry, which
is what we want to avoid. :)

~~~
Kutta
I thought the cost of GC following a pointer into a compact region is
basically constant time, or is that incorrect?

------
rwmj
I wonder if Haskell has a way to (optionally of course) put objects into
manually allocated memory regions? I implemented this in 2006 for OCaml[1]. We
used it for (what we would nowadays call) "big data" processing where we had a
huge amount of static data for analysis and we didn't want the GC to be
constantly traversing it. We could move this data into the manually allocated
region we called the "ancient heap". It still appeared as regular OCaml
objects, but when the GC saw a pointer that crossed into the ancient heap it
would stop following it.

Of course the obvious down side is you have to remember to manually deallocate
it :-)

[1]
[https://opam.ocaml.org/packages/ancient/](https://opam.ocaml.org/packages/ancient/)

~~~
arianvanp
This is exactly what the article is describing no? This sounds like the
"compact regions" feature the entire article is about.

~~~
rwmj
Oh I see - on first reading I thought that compact regions were separately
GC'd regions in the old generation. Skimming the paper
([http://ezyang.com/papers/ezyang15-cnf.pdf](http://ezyang.com/papers/ezyang15-cnf.pdf))
it does seem as if they independently rediscovered ocaml-ancient.

------
logicchains
Is it possible to allocate an object directly into a compact region, rather
than allocating outside and then copying it in?

~~~
rwmj
If you explore ways it might work, I think you'll see that's not easy. For
example, let's say you have an "alloc_compact" function. To put a tuple
"directly" into the compact heap you might write:

    
    
        alloc_compact (1,2)
    

But wait! You've actually allocated (1,2) and then copied it.

------
AnimalMuppet
You're almost trying to do C++ memory management, but do it in Haskell. This
seems almost like masochism.

~~~
kccqzy
I think that, for many applications that need to deal with a large heap
written in a GC'd language, sooner or later people realize the GC just isn't a
good fit and reinvent manual memory management. This is the same regardless of
whether you choose Haskell or Java or Go. Even in Java world you can find
plenty of evidence that people find GC + large heap unsatisfactory and resort
to alternatives like memory-mapped files or JNI or just moving the data out of
the process. Same thing with Go. I wouldn't call it masochism.

~~~
AnimalMuppet
"Any sufficiently complicated program in a garbage-collected language contains
an ad hoc, informally-specified, bug-ridden, slow implementation of half of
malloc and free"?

