This post seems related to the authors of MPS (1) that seems to be a general garbage-collector to use with various languages.
Many GC'd languages really didn't bother with stack-allocating variable-size entities, and regardless of if they did then _precicely_ scanning the stack would be complicated without compiler help.
If the compiler doesn't leave any info to the GC, then it can't know if it's scanning a pointer or a float and if your GC strategy relies on compacting memory (ie moving objects) then trying to guess between a float or a pointer can become fatal.
I've always wondered why there is no API to tell the memory allocator to take some time to reorganize itself.
E.g. a http server just processed a request and sent a response. Before it sits in the queue to pick up a next request, it could tell: Now is a great time to stop this thread if you really need to. Ditto a GUI processed an event.
GC could do a minor collection there. Malloc/free could take or give back some chunks to a lower level or the OS.
Java had System.gc() but it was more a 'take as much time as you want on every thread and go deep'. I am thinking more of 'Here is avtime limit of a few ms, reorganize this thread only'.
Thread-local compacting GCs already handle this. Requests tend to be served by a single thread and the GC dishes out memory from a thread-local pool to avoid contention. Server-side GCs can also be concurrent and avoid suspending threads as much as possible. If you aren't generating garbage that escapes the nursery generation they may basically never suspend threads or block at all.
Arenas are a similar concept that work well in non-GC environments if you don't let objects allocated during a request escape to live beyond the request.
Thread-local pools to satisfy allocations are also a well-known technique used in various allocators so your system's malloc might be doing this for you. It might also be able to efficiently collapse free lists which more or less behaves like an arena allocator when your request pipeline doesn't let allocations escape.
In practical terms: most approaches generalize well enough that if you follow the rules you'd need to follow anyway you get the benefits without the drawbacks. In most of the remaining edge cases you get even better benefits by adopting other strategies. For example Unity uses C# which has a GC... but games shouldn't allocate on the game loop _anyway_ so GC vs ref counts vs malloc isn't very important compared to avoiding allocations altogether.
Or if you already obey the rules you'd need for an arena allocator in your server then letting the GC or allocator deal with it might mean you can coalesce cleanup across multiple requests which may actually be better for performance anyway. And without manual arena management you don't risk problems if some esoteric edge case in the code needs to allocate something that will outlive that specific request.
Most people just want an allocator that works reasonably well, and maintaining that expectation means not exposing too many details that you might be held accountable to. If you care deeply, there are usually alternatives.
There's nothing really stopping systems from exposing a hint for controlling this, but usually the people who might care about it don't just want hints, but actual guarantees, and then you have to consider all the users who hint incorrectly (or correctly, but for a different system/version). So, the benefit/cost ratio is low.
Integration of GC with thread scheduling was once an active area of research, but the world has mostly moved on (perhaps prematurely, but so goes).
There’s a concept of a memory arena… allocate a bunch of memory per frame/request ahead of time and the just increase a pointer when you need memory and then release everything at the end of the frame/request.
I think apache httpd works like this, and there the 2 subjects are mixed. They have a per-request arena. As it gets destroyed at the end of the request, you can just use a bump allocater which is one of the fastest ways to alloc memory.
I think there's two common usages of the term arena, I might call them homogenous and non homogenous. Is there much reason to use a non homogenous arena with something other than a bump allocator?
I'm not sure what homogeneous means in this context?
Arenas allow you to not worry about freeing memory, since it will all be freed in a single chunk. It also allows using allocators which are less concerned with fragmentation, since it is known that the arena memory will eventually be dropped.
The amount you are concerned with fragmentation ranges from zero concern (bump allocator) to moderate concern (maybe buddy allocator) to larger concern depends on the application and its tradeoffs, like most things in software engineering.
Lua's garbage collector can be asked to do a "step", which is a not-especially-well-defined concept, but which can be tuned until it takes the amount of time you intend. Works well in practice.
I thought the link to the "Memory Pool System" they maintain was interesting. I can't think of too many stand-alone alternatives to libgc (by the somewhat famous author, Boehm) out there.
The most depressing thing about memory management is that there's been 0 innovation in the past 10+ years, and it seems like we're fundamentally at a dead end where only tradeoffs exist.
There are basically 3 options going forward:
- manual memory management (C++, Rust) "I can verify your code but you need to `unsafe` to do anything interesting"
- reference counting (Swift) "I leak memory through cycles"
- tracing GC (Java, Go) "I stop the world and trash your swap"
The rest is engineering (e.g. Go reduced the "stop-the-world" phase to O(1) from O(n) where n is the size of memory)
You might be interested in what we're working on with Vale, its core innovations happen to be around memory management.
Some off the top of my head:
* Generational references, for single-ownership memory safety without garbage collection, ref counting, or borrow checking. [0]
* Higher RAII, a form of "linear typing" that lets us add parameters and returns to destructors, to enforce that if you do X, you won't forget to do Y. [1]
* Regions, which could make it so we can eliminate most generation checks in Vale. If it catches on, it could also help other paradigms and get reference counting back into the fight. (Note, we aren't done with this one yet, it's about 3-8 weeks out) [2]
Those are just a few off the top of my head, there are a lot of crazy ideas floating around there. I'm hoping they'll combine to form something like an easier more flexible Rust, because that's the holy grail of language design, to me at least.
A few years ago, I was also pretty down about the state of memory innovation... it felt like not much had been discovered since borrow checking. I knew there had to be something with its speed, but the flexibility of C++. Perhaps we found it!
Generational References is quite a neat trick, obvious in hindsight! I assume that the general idea is that non-owners really only have "weak" references (and safety comes from the compiler warning/checking that this isn't broken).
One thing though about them is that I assume the generation to be co-located with the memory? If so, then the runtime wouldn't be able to free memory back to the OS while running? (Unless implemented with a side-lookup but that would hamper runtime performance?)
The generations are colocated with the objects, and we can release memory back to the OS using the virtual memory shenanigans explained in [0].
However, the design and implementation have simplified quite a bit since that article. Now:
* We use these generations only for assertions on dereferencing non-owning references, and don't use them for weak references anymore.
* We let malloc release memory back to the OS like normal. If the user dereferences something in a released page, we'll get a page fault instead of an assertion failure, but both are handled the same.
* When the generation hits INT_MAX, we just let it overflow now instead of blacklisting the allocation forever.
These adjustments mean we never need the heap which is a big improvement, and really unlocks Vale as a systems programming language.
The purposeful overflowing does mean that only 99.99999998% of invalid memory accesses will halt the program, which is more than enough certainty in practice [1]. (The probability is actually higher, as we increment generations by prime numbers and won't even hit our first possible collision until well after most runs have ended.)
I really like the ideas behind Vale, but I don’t understand this:
“These adjustments mean we never need the heap which is a big improvement”
Why would the heap be problematic? It is not different from the stack, there is nothing different between the stack and the heap from the perspective of the CPU. Memory locality is what matters - if you allocate an arena and fill it tightly and refer to elements in it in a predictable manner it will be just as fast.
The real disclaimer should perhaps be that this stuff is something that's still being worked on wrt. production use, just like many other kinds of PL research. Once it's ready, Rust will probably be in a very good position to adopt these features.
That would be pretty cool! I'm not sure Rust can support these features, though.
* Generational References involve some pretty deep compiler support. One can implement something like it in Rust with a thread-local side table, but with >3x overhead compared to language-supported generational references.
* Higher RAII requires that a language not treat destructors and panic handlers as the same thing. Unfortunately, Rust uses one function (drop) for both. Additionally, it would be a backwards-incompatible change; Rust generic parameters already assume a zero-arg drop() is available for all parameters, even though there's no trait bound declared for them.
* Regions get most of the borrow checker's optimizing power with a fraction of the user complexity. Having both borrow checking and regions would just be adding complexity though, which defeats the purpose a bit.
Still, there are a few other features of Vale that we're thinking of proposing to Rust, especially around concurrency, determinism, and better memory safety guarantees.
If talking about regions and HGM, that's fair to say. Regions are further behind where I hoped because the latter stages are blocked on generics, and I had to take 8 months off a year ago because of some PTSD from work (long story).
I acknowledge that these are excuses, and that these two features of Vale can reasonably be regarded as vaporware until we get them finished. We're over halfway done with regions, fingers crossed for September!
For what it's worth, most of our innovations are actually complete and released. Generational References, Fearless FFI (minus sandboxing), and Higher RAII are all done and usable, and the Perfect Replayability proof-of-concept is finished as well. All these are in version 0.2, which can be downloaded at [0].
And if you ever want to see what we're working on, we have a "What's been happening recently" on the roadmap page [1].
I'm not privy to the story here, but taking eight months off to treat PTSD doesn't sound like an excuse to me, but a really important period of time that needed to be taken. I hope you're doing alright now, but don't let yourself go thinking that taking time to deal with something like that is an excuse for other things not happening - it's far more important.
Separately, I've not checked Vale out yet, but props to you for diving in on new ways of handling memory and RAII - looks really interesting and I'll be giving the language a whirl at some point soon!
Thanks mate, I appreciate the kind words. I try not to judge myself too harshly for taking the time off, but it does help to hear it from a kind soul such as yourself. Luckily, the PTSD isn't spilling over into my other interests (games, Vale, etc) much any more so life is steadily improving!
Well first of all there is tons of research being done right now into linear type systems, which have the potential to statically verify a whole bunch of things that Rusts borrow checking still struggles with. It's a very hot field of CS research atm.
I also think that you discount the amount of advancement that has been done in the development of pauseless GC. It's a shame that many of these are only available on the JVM (and sometimes only at a steep price), but I have no doubt the main ideas will eventually get adopted by other languages.
But really there are so many unexplored niches. AFAIK no current architecture has hardware support for read/write barriers, having that would make pauseless GC much more feasible. Zig has been experimenting with letting people use custom memory allocators, but there are many more avenues there that have not been explored. Why not have a language feature for "this value lives outside GC management" similar to Haskells compact regions but more powerful? Nim has some cool ideas about mixing reference counting for most values and a more general GC for cycle breaking. Mixing GC and non-GC values is also still a seriously underexplored field IMO.
Actually _most_ hardware architectures has support for read/write barriers, it's just that the technique is patented. :(
Also fully agreeing on the GC/non-GC mixing being underexplored, however outside of compiler analysis optimizations this would require language support and the intersection of people who would be interested in this mix doesn't seem that large.
Nim has for a long time had automatically managed `ref` as well as manually managed `ptr` which is similar to (perhaps directly inspired?) your last comment, and has several options for Auto Mem Mgmt.
The hard truth is that there is no free lunch. We like to pretend we're using a Turing machine, but the moment you start caring about performance or memory limits, the abstraction breaks down and you realize that physics dictates that we have a finite state machine instead. This was "solved" about as well as it could be many years ago with however many billions were poured into GC R&D for all the people who don't care deeply about the limits, but nothing is going to magic away that fundamental trade-off unless we discover new physics.
Every innovation since then has been about developing workarounds to deal with that trade-off in more or less sophisticated ways.
I'm doing some pretty different things with civboot.org and http://github.com/vitiral/fngi, but most would consider them regressions instead of advancements.
The goal is to have fragmentation-free memory managers that can work with minimal hardware support. This has limitations like block size (only 4k blocks allowed), but forces me to innovate. For instance, I'm planning on an "arena stack", where most functions simply use the arena on the top of the stack. Functions can reserve an arena on the arena-stack, call a function, then copy essential data and drop the arena, reducing fragmentation.
Might not be the kind of ideas you were thinking, but I consider this simpler than the current abstractions which hide complexity.
By your definition of what is "engineering" I feel like all improvements or changes are just engineering.
Because you've described three classes:
* No explicit runtime knowledge of whether a given object is alive
* Explicitly tracking how many references there are to an object
* Searching the program memory to see if there are references to an object that make it live
I'm not sure there is another class of memory management that isn't one of these three?
It's also somewhat unfair to rust - the static and parametric lifetime management _is_ new, and doesn't "need unsafe" to do anything interesting; and to tracing GCs modern GCs have developed ever more techniques to not be "stop the world" including parallel, iterative, lock free tracing, etc.
Doesn't the JVM ecosystem have fully concurrent GC with the Azul C4 collector? It's not my favorite set of languages but I don't think we can realistically claim the JVM is not a production-ready runtime.
Yes, "pauseless" is another keyword I forgot to mention.
Azul uses a read barrier which I guess makes it so slow that noone uses it (and the same ideas haven't made their way into JDK collectors). The previous version required hardware-supported read barrier, now AFAIK they only require a Linux kernel extension.
> Algebraic data types are usually represented using a heap. Because of their non-uniformity, algebraic data types are more difficult to scan.
Using a heap??? We can represent ADTs using stack as well, that's what we usually do in C and Rust.
Shameless plug: https://github.com/Hirrolot/datatype99 (a library of mine that generates type-safe ADTs for pure C99).