>> Erlang does not use a contiguous chunk of memory to represent a sequence of bytes. Instead, it something called an “I/O list” — a nested list of non-contiguous chunks of memory.
An `iolist()` or `iodata()` structure is mainly designed for I/O.
The idea is rather simple but powerful.
Let's say you want to send `"Hello " + name`, where `name` is a string, over the network. Traditionally, in C, you would allocate a buffer that is large enough and copy the "Hello " literal into it and then copy the `name` string before calling `write(fd, buffer, length)`.
If you wanted to avoid allocating the buffer and doing the copy, you would make use of `writev(fd, iovec, count)`, and this is exactly what the `iolist()` structure allows for. The erlang runtime system (erts) makes use of efficient `writev` calls instead of having to allocate temporary buffers just to send data over the network (something Erlang is notoriously good at) when you make use of `iolists`.
Definitely maybe. If your I/O list is a list of integers (the traditional Erlang string type), probably not.
If your I/O list is a bunch of binaries you got from here and there and you don't need to inspect them, just pass them along to another destination, then maybe yes. When BEAM writes an I/O list to an FD, it's going to use scatter/gather I/O and at no time does BEAM need to form a contiguous chunk of memory with all the data; what the OS does with that is up to the OS though.
> almost certainly will never win any medals for speed ... The language is slow, awkward, and ugly
I suspect the answer is probably moot.
But I also think that the answer would depend on what the bytes are being used for, how large the chunks are, etc. CPU cache utilisation is influenced by many factors, and can't be deduced from examining the data structure alone.
Probably isn't too bad. chunks probably fill a cache line. Then it is just a question of whether or not beam inserts prefetches for the next chunk, and or the CPU prefetch realizes we are chasing this set of pointers.
Note: I am more familiar with C++, so C++ digression here.
C++ has std::deque that is a similar non-contiguous chunked container, comparing it to std::vector(the contiguous container) it is really close for things vector is good at, and better than it at things vector is bad at like random insert and removal.
It punts that question to how you end up using it. If it lets you avoid blowing out your cache by copying huge arrays all the time, then it's more cache-friendly than using flat arrays. In most use cases you'll end up using chunks that are "big enough". It's definitely possible (unless the VM is doing something very clever) to get yourself into a pathological case where your tree degenerates into a linked list and your performance becomes awful - but that's something you can fix in how you construct your things rather than an inherent problem with the datastructure.
is this cpu cache-friendly ?