
Checking up on realloc efficiency - luu
http://www.tedunangst.com/flak/post/checking-up-on-realloc-efficiency
======
hosay123
Ted's C programming posts make me feel really old, because I can remember
exactly the same programming topics being discussed _in the mid 90s_. And
that's C development for you, really.. every old bug is new again.

It's frankly boring that 20 years later we still feel attracted to discuss any
of this stuff. It's not like the machines or the code running on them have
fundamentally changed in the interim..

~~~
Flow
Not fundamentally changed, but some things have become more expensive, so old
assumptions and optimisations should be re-evaluated. It's just sound
engineering, really.

~~~
userbinator
One of the best examples of this being the increasing gap between memory and
CPU speeds. Good cache usage is far more important than before.

~~~
Flow
Yes, I had that example in mind.

Another example, I might be wrong here though, is the ooold disk handling code
in BSD, where you had to specify cylinders, heads and other things, and the OS
tried to optimize performance based on closeness. See here for more info:
[http://www.cdf.toronto.edu/~csc369h/fall/lectures/w10/DiskIO...](http://www.cdf.toronto.edu/~csc369h/fall/lectures/w10/DiskIO.pdf)

Today, I imagine most I/O interfaces use a large block index instead of
cyl/disk/head and other ancient specifications. All the optimizations about
head and so on are device dependent and are best handled by the disk itself.

Imagine the feeling of being able to delete old code in the disk subsystem and
get a cleaner and more abstract view of the disk.

------
haberman
If I'm reading this right, it is advocating that malloc() calls be spaced in
memory to increase the likelihood that realloc() won't need to relocate/copy
the memory.

Isn't that a tradeoff of memory utilization vs. CPU? The memory holes you
leave sitting around for a potential realloc() are memory that will be wasted
if realloc() never comes, or if the new size is too big to fit anyway.

And because of cache behavior, bad memory utilization is a CPU efficiency
issue too.

So what's the right tradeoff here? That's what I was hoping to see in this
article, but I didn't see any evaluation of that tradeoff. Since it is
inherently a tradeoff, it seems like it needs to be a data-driven decision
rather than working exclusively from first principles.
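
As a starting point for that kind of measurement, here is a minimal sketch
that counts how often realloc() has to move a buffer that grows by doubling.
The function name count_realloc_moves and the doubling pattern are my own
assumptions, not something from the article, and the numbers will depend
entirely on the allocator and platform you run it on.

```c
#include <stdint.h>
#include <stdlib.h>

/* Grow one buffer from `start` bytes by doubling it `steps` times and
 * count how many times realloc() returned a different address, i.e. had
 * to relocate the block. Returns -1 on allocation failure.
 * (Hypothetical helper for illustration only.) */
int count_realloc_moves(size_t start, int steps)
{
    size_t size = start;
    char *buf = malloc(size);
    if (buf == NULL)
        return -1;

    int moves = 0;
    for (int i = 0; i < steps; i++) {
        /* Save the address as an integer: the old pointer value must
         * not be used once realloc() has succeeded. */
        uintptr_t before = (uintptr_t)buf;

        size *= 2;
        char *grown = realloc(buf, size);
        if (grown == NULL) {
            free(buf);
            return -1;
        }
        if ((uintptr_t)grown != before)
            moves++;
        buf = grown;
    }

    free(buf);
    return moves;
}
```

Calling count_realloc_moves(16, 10) reports how many of the ten growth steps
needed a move. It says nothing about the memory left idle in the gaps, so it
only covers one side of the tradeoff.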

~~~
PythonicAlpha
More discussion would really help here.

I think, in some cases, spaced allocation will come (nearly) for free, since
with 64bit addressing, there is enough space reserve (most processors can only
use a fraction of this space as physical RAM). The other point is, that with
the use of MMUs and a proper spacing (regarding the MMUs page size), the gaps
come nearly "for free".

But I think that needs very careful thought to get it right and not
unnecessarily introduce new tradeoffs like the ones you were talking about.

What you should not do is use such spacing for small allocations. Another
point would be to inform the allocator that it is likely/unlikely that a
reallocation will be done ... in most cases, such information is available.
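
To make the hint idea concrete, here is a rough sketch of what passing such
information could look like. Everything in it (grow_hint, struct hbuf and the
hbuf_* functions) is hypothetical and not part of any real allocator API. It
is only a wrapper over malloc()/realloc() that over-allocates when growth is
expected; a real allocator could use the same hint to space blocks apart
instead.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hint type; no standard allocator takes this. */
enum grow_hint { HINT_FIXED, HINT_MAY_GROW };

struct hbuf {
    void *data;
    size_t len;           /* bytes the caller asked for */
    size_t cap;           /* bytes actually allocated */
    enum grow_hint hint;
};

/* Capacity to allocate for a request of len bytes under the given hint. */
static size_t padded_cap(size_t len, enum grow_hint hint)
{
    size_t cap = len ? len : 1;            /* avoid malloc(0) */
    if (hint == HINT_MAY_GROW && cap <= SIZE_MAX / 2)
        cap *= 2;                          /* leave room to grow in place */
    return cap;
}

/* Returns 0 on success, -1 on allocation failure. */
int hbuf_init(struct hbuf *b, size_t len, enum grow_hint hint)
{
    size_t cap = padded_cap(len, hint);
    b->data = malloc(cap);
    if (b->data == NULL)
        return -1;
    b->len = len;
    b->cap = cap;
    b->hint = hint;
    return 0;
}

/* Returns 0 if the new length fit in the existing capacity, 1 if
 * realloc() had to be called, -1 on failure (buffer left unchanged). */
int hbuf_resize(struct hbuf *b, size_t new_len)
{
    if (new_len <= b->cap) {
        b->len = new_len;
        return 0;
    }
    size_t cap = padded_cap(new_len, b->hint);
    void *p = realloc(b->data, cap);
    if (p == NULL)
        return -1;
    b->data = p;
    b->len = new_len;
    b->cap = cap;
    return 1;
}

void hbuf_free(struct hbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

With HINT_FIXED nothing extra is reserved, so blocks that never grow waste
nothing. With HINT_MAY_GROW a request for 100 bytes gets 200, and a later
resize to 150 does not touch the allocator at all.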

~~~
userbinator
_Another point would be to inform the allocator that it is likely/unlikely
that a reallocation will be done ... in most cases, such information is
available._

I agree. A lot of blocks which get allocated are never resized (e.g. objects),
while other blocks may be subjected to frequent resizing - expanding buffers
and strings being the most common example. malloc()/realloc() are very simple
APIs, and work for the general purpose of "I want to allocate some
memory/resize it", but this simplicity also means that they can't take
advantage of specific usage patterns very well.

Working with data of indeterminate size is always going to involve more
complexity. While it may look like a solved problem (just resize when there
is not enough space left), as evidenced by the proliferation of string
implementations in various languages, the efficiency aspects are more subtle
and still well within the area of "programmer must know what he/she is doing".
One option, for example, is designing systems and protocols that don't need
allocations to be resized, because how long something is can be easily
calculated and known in advance.
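
A small illustration of the "know the length in advance" approach: joining
strings with one pass to compute the total size and a single malloc(),
instead of appending to a buffer that gets realloc()ed as it grows. The
function name join_exact is made up for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Join n strings with sep between them. The total length is computed
 * first, so exactly one allocation is made and nothing is ever resized.
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 * (Hypothetical helper for illustration only.) */
char *join_exact(const char *const *parts, size_t n, const char *sep)
{
    size_t sep_len = strlen(sep);

    /* Pass 1: compute the exact size, including the terminating NUL. */
    size_t total = 1;
    for (size_t i = 0; i < n; i++) {
        total += strlen(parts[i]);
        if (i + 1 < n)
            total += sep_len;
    }

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy the pieces into place. */
    char *p = out;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        memcpy(p, parts[i], len);
        p += len;
        if (i + 1 < n) {
            memcpy(p, sep, sep_len);
            p += sep_len;
        }
    }
    *p = '\0';
    return out;
}
```

The cost is walking the input twice, which is usually cheap compared to
repeated reallocation and copying, but that is an assumption worth measuring
for any particular workload.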

~~~
PythonicAlpha
The "one size fits all" notion of malloc/free/realloc is just wrong. When
you want optimal allocation performance, you can't get around specialized
allocators of various sorts for various needs.

