
I don't know Zig, but it sounds like Zig lets you use arbitrary allocators for anything. The abstraction Rust is getting will only work for things that explicitly account for arbitrary allocators; anything that doesn't will end up using the global allocator. That's a significant difference.



To clarify, it's up to the function being called; the convention set by Zig's standard library is that if a function needs to allocate memory, then the allocator (specifically, a struct of function pointers to an allocator's implementations of realloc and shrink) should be one of the function's arguments.
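
For illustration, here's a minimal sketch of that convention (assuming a recent-ish Zig; the exact std APIs have shifted between releases, and "join" is a made-up helper, not anything from std):

    const std = @import("std");

    // The function that allocates takes the allocator as a parameter,
    // so the caller decides where the memory comes from.
    fn join(allocator: std.mem.Allocator, a: []const u8, b: []const u8) ![]u8 {
        const out = try allocator.alloc(u8, a.len + b.len);
        @memcpy(out[0..a.len], a);
        @memcpy(out[a.len..], b);
        return out;
    }

    pub fn main() !void {
        // Caller A: general-purpose heap allocation.
        const heap = std.heap.page_allocator;
        const s1 = try join(heap, "hello, ", "heap");
        defer heap.free(s1);

        // Caller B: the same function, backed by a fixed stack buffer.
        var buf: [64]u8 = undefined;
        var fba = std.heap.FixedBufferAllocator.init(&buf);
        const s2 = try join(fba.allocator(), "hello, ", "stack");
        _ = s2; // gone wholesale when buf goes out of scope
    }

The callee never learns which allocator it was handed; it only sees the interface.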

There is of course nothing stopping a function from ignoring this and using the C global allocator if need be (or, as I've done in some experiments, using a C library's custom allocator - in my case that of SQLite).
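
(A minimal sketch of that escape hatch, assuming you link libc - std.heap.c_allocator is Zig's wrapper over malloc/free, so a function can simply hardwire it instead of taking a parameter:

    const std = @import("std");

    // Ignores the convention: allocates from the C global allocator
    // internally rather than accepting an allocator argument.
    fn leakyHelper() ![]u8 {
        return std.heap.c_allocator.alloc(u8, 64);
    }

)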

(EDIT: from what I understand, there's technically nothing stopping C from using this sort of strategy, either; a struct of function pointers ain't exactly exotic. It's just a matter of libraries being written with that convention in mind, which doesn't seem to be very common.)


Oh so it has, in fact, the same caveat as Rust's scheme.

Edit: I guess the difference is that Zig doesn't have years of not supporting it, making the ecosystem more likely to support it.


In a large complex application you are going to want to use the same allocator everywhere, or close to it, and almost all functions may allocate memory directly or indirectly - in which case this Zig convention will require almost every function to carry a useless parameter. That sounds enraging.


It's common to use multiple allocators in some domains; game developers often use a bump allocator[0], which is reset at the end of every frame.

[0] https://twitter.com/SebAaltonen/status/1080235671883841541
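
A minimal sketch of that pattern in Zig - std.heap.FixedBufferAllocator is effectively a bump allocator over a fixed buffer, and its reset() rewinds everything in O(1) (the "frame loop" here is obviously simplified):

    const std = @import("std");

    var frame_buf: [1024 * 1024]u8 = undefined;

    pub fn main() !void {
        var frame = std.heap.FixedBufferAllocator.init(&frame_buf);

        var i: usize = 0;
        while (i < 3) : (i += 1) { // three simulated frames
            const scratch = frame.allocator();
            const verts = try scratch.alloc(f32, 1024); // per-frame scratch data
            _ = verts;
            // ... render the frame ...
            frame.reset(); // everything allocated this frame is dropped at once
        }
    }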


Thanks, that's a good counterexample.


A large complex application seems like exactly the kind of environment where being stuck with a single allocator would be enraging. I personally like the idea of being able to give each component of a large system its own fixed chunk of memory (and an allocator over that chunk), such that if one component goes crazy with memory consumption it's isolated to that component instead of choking out the whole system.
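
A minimal sketch of that isolation, assuming Zig's std.heap.FixedBufferAllocator: a component that exhausts its chunk gets error.OutOfMemory while the rest of the system keeps running.

    const std = @import("std");

    var component_a_buf: [64 * 1024]u8 = undefined;
    var component_b_buf: [64 * 1024]u8 = undefined;

    pub fn main() void {
        var fba_a = std.heap.FixedBufferAllocator.init(&component_a_buf);
        var fba_b = std.heap.FixedBufferAllocator.init(&component_b_buf);

        // Component A goes crazy: the failure is confined to component A.
        if (fba_a.allocator().alloc(u8, 1024 * 1024)) |_| {
            std.debug.print("component A: ok\n", .{});
        } else |err| {
            std.debug.print("component A: {}\n", .{err}); // error.OutOfMemory
        }

        // Component B is unaffected and allocates normally.
        const fine = fba_b.allocator().alloc(u8, 128) catch unreachable;
        _ = fine;
    }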


That makes a little bit of sense if by "component" you mean not a software component but a unit of work users care about (e.g. a document).


It's applicable in either case:

- As you mentioned, if I'm editing a document, it's useful to have an allocator on a chunk of memory dedicated to that document. When I close the document, I can then simply free that chunk of memory - and everything allocated against it (see the sketch after this list).

- If I'm implementing an operating system, I'm probably going to want to give each application, driver, etc. a limited amount of memory to use and an allocator against that memory, both so that I can free a whole process at once when it terminates and so that a single process can't gobble up memory unless my operating system specifically grants it more to use (i.e. by itself allocating more memory and making the process' allocator aware of it).
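
A minimal sketch of the document case, assuming Zig's std.heap.ArenaAllocator (Document here is a made-up type for illustration):

    const std = @import("std");

    const Document = struct {
        arena: std.heap.ArenaAllocator,

        fn open(backing: std.mem.Allocator) Document {
            return .{ .arena = std.heap.ArenaAllocator.init(backing) };
        }

        fn addContent(self: *Document, text: []const u8) ![]u8 {
            // Every piece of document data is allocated against the arena.
            return self.arena.allocator().dupe(u8, text);
        }

        fn close(self: *Document) void {
            self.arena.deinit(); // frees the whole document in one call
        }
    };

    pub fn main() !void {
        var doc = Document.open(std.heap.page_allocator);
        _ = try doc.addContent("some paragraph");
        doc.close(); // no per-allocation bookkeeping needed
    }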


> When I close the document, I can then simply free that chunk of memory - and everything allocated against it.

You probably don't want to do this directly. Instead you want to walk the object graph and run cleanup code for everything in that graph, because in general there will be resources that aren't just memory that need to be released, and for consistency with normal operation, it should deallocate memory as it goes.

You probably don't want to allocate an actual "chunk of memory" either. That just creates unnecessary fragmentation. All you really need is accounting and the ability to report when you're consuming too much memory.

Your driver example is not an example where you would allocate memory per software component. You would actually want to allocate per device, not per driver module; it's just confusing because in many cases there is only one device. But if you can plug in many devices that use the same driver, you'd want independent allocation accounting per device.


> in general there will be resources that aren't just memory that need to be released

Zig already handles this with its "defer" feature; as a resource goes out of scope, it can be released automatically. In the document example, that document's existence would likely be a running function, and as that function terminates, it would likely have "defer" statements kick in that free the document's chunk of memory and release any file descriptors and such.
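
A minimal sketch of that, reusing the document example (file APIs as in recent Zig std; details vary by release):

    const std = @import("std");

    fn processDocument(backing: std.mem.Allocator, path: []const u8) !void {
        var arena = std.heap.ArenaAllocator.init(backing);
        defer arena.deinit(); // the document's memory, freed on any exit path

        const file = try std.fs.cwd().openFile(path, .{});
        defer file.close(); // non-memory resource, also released automatically

        const contents = try file.readToEndAlloc(arena.allocator(), 1024 * 1024);
        _ = contents;
        // ... work with the document ...
    } // defers run here in reverse order: file closed, then arena freed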

> You probably don't want to allocate an actual "chunk of memory" either. That just creates unnecessary fragmentation.

If anything, that should help reduce fragmentation - or at least reduce its impact - since you have better control over whether that allocation exists as a contiguous block.

> All you really need is accounting and the ability to report when you're consuming too much memory.

Which is trivial to do when you know for sure that a given component can only work with a given chunk of memory.

But yeah, there's nothing stopping anyone from implementing an allocator that cares nothing about where its bytes actually live and just keeps a running tab of how much memory it has handed out. That is: custom allocators are an elegant and simple way to implement that accounting, since that's basically what an allocator already is.
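
A minimal sketch of that accounting: std.heap.FixedBufferAllocator exposes its bump pointer as the public field end_index, so "bytes consumed" is one field read, and the chunk size is a hard cap.

    const std = @import("std");

    var component_buf: [256 * 1024]u8 = undefined;

    pub fn main() !void {
        var fba = std.heap.FixedBufferAllocator.init(&component_buf);

        _ = try fba.allocator().alloc(u8, 10_000);
        std.debug.print("component is using {d} of {d} bytes\n", .{
            fba.end_index, component_buf.len,
        });
    }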

> But if you can plug in many devices that use the same driver, you'd want independent allocation accounting per device.

We're probably talking about the same thing here, then, but with slightly different terminology (and perhaps different structure); I'd be pushing for each device to be controlled by an instance of a driver (much like how an ordinary process is an instance of a program), and it would be those per-device instances that would each have their own allocator. Those instances are what I'm calling "drivers" in this context; they might share the same code, but they run independently (or at least they should run independently; a single malfunctioning disk shouldn't bring down all the other disks).


> that document's existence would likely be a running function

No, that would mean an application managing multiple documents would need one thread per document, which is not normal practice for GUIs. In fact it would then need one event loop per document thread which is not even possible on many platforms.

"defer" simply doesn't serve as a wholesale replacement for destructors, but that's a tangent to this discussion.

> If anything that should help reduce fragmentation

No, there would be fragmentation at document granularity. For example, if you create a document, add a lot of content to it, then delete some of that content, then do that again for several documents, the memory used would be the sum of the maximum sizes of the documents.

I agree with the rest of your comment.


> No, that would mean an application managing multiple documents would need one thread per document, which is not normal practice for GUIs.

Unless those functions are async, which Zig also supports (even on freestanding targets!). Single OS thread, single event loop, many concurrent cooperatively-scheduled functions. Or you can get fancy and implement a VM that in turn runs preemptively-scheduled userspace processes, in essence reinventing Erlang's abstract machine (and that's exactly a pet project I'm working on, on that note).

And even keeping each document in its own (OS) thread ain't really that unprecedented; browsers already do this, last I checked (each open tab being a "document" in this context) - in some cases (like Chrome) even doing one "document" per process.

> For example, if you create a document, add a lot of content to it, then delete some of that content, then do that again for several documents, the memory used would be the sum of the maximum sizes of the documents.

Would that not also be the case if all those documents used a single shared block of memory? Again, splitting things up helps avoid fragmentation here, especially if you know that most documents won't exceed a certain size (in which case fragmentation is only an issue for data beyond that boundary) - or, better yet, if you ain't storing the whole document in memory, in which case the buffer of actively-in-use data can be fixed. Further, if each allocation is a whole page of memory, that's about as much control over fragmentation as an application can hope for short of being the OS itself (and it probably won't make much of a difference if those pages are scattered across RAM; swapping would suffer on spinning rust, but that's already bad news performance-wise).


> And even keeping each document in its own (OS) thread ain't really that unprecedented; browsers already do this, last I checked (each open tab being a "document" in this context) - in some cases (like Chrome) even doing one "document" per process.

That is not correct. (Source: I am a former Mozilla Distinguished Engineer.)

Chrome (and Firefox, with Fission enabled) do one process "per site", e.g. one process for all documents at google.com. (In some cases they may use finer granularity for various reasons, but that's the default.) In each process, there is one "main thread" that all documents share.

> Would that not also be the case if all those documents used a single shared block of memory?

No. Memory freed when you delete content from one document would be reused when you add content to another document.


That is being backfilled in; Vec already implements it on nightly, IIRC.

And really, what you're talking about here is the standard library's data structures, which aren't super likely to be used in firmware anyway. It's a lot easier for ecosystem data structures to add support; after all, they already chose to call the global allocator, so now they can offer either. And it's much easier for them to cut backwards-incompatible changes if they have to.


> which aren't super likely to be used in firmware anyway.

Why not? Zig's std library is specifically designed to be usable for freestanding/baremetal targets (e.g. firmware), and the compiler is smart enough to only include the parts of a library (including std) that are actually used. If you do need to reimplement a part of std, you can just... reimplement that part, and import your own implementation instead of the one from std.

Unless you're talking purely about Rust?


I am talking purely about Rust, yes. Firmware tends to use libcore, and if it does happen to dynamically allocate memory, liballoc. libstd assumes you have an OS, so...


I mean, in terms of Rust, it sounds like Zig lets you use any allocator for anything in any crate - not only structs in std or other crates that explicitly allow a custom allocator. In Rust, and only talking about std, you'd need to change a lot of things to allow e.g. BufWriter to use a custom allocator - and the same goes for every crate that uses types that allocate under the hood. But maybe I'm misunderstanding what Zig allows.


You are not misunderstanding what Zig allows, but Rust can do the same thing; https://doc.rust-lang.org/stable/core/alloc/trait.Allocator.... just isn't stable yet. And in Zig, it's conventional for everything that needs to allocate to take the allocator as an argument.

BufWriter would do it the same way any data structure would: an additional parameter, all the same.


I actually did misunderstand. I thought it allowed callers to give an allocator and the callees didn't have to know for it to be used.


I mean, I'd say you were mostly right, in the sense that the callee doesn't know the implementation details of the passed allocator; it's only aware of the interface (i.e. the struct of function pointers that defines that interface).



