
Please grow your buffers exponentially - AndrewDucker
http://blog.mozilla.org/nnethercote/2014/11/04/please-grow-your-buffers-exponentially/
======
bhaak
From the comments, the linked SO question is probably a better treatment of
the title than the article: [http://stackoverflow.com/questions/1100311/what-is-
the-ideal...](http://stackoverflow.com/questions/1100311/what-is-the-ideal-
growth-rate-for-a-dynamically-allocated-array)

I would have guessed that by now it's common knowledge that you should grow
your buffers exponentially. The only question is: by how much?

For general usage a factor of 1.5 seems typical, but another factor could be
better if you have different constraints on reallocation cost and memory
fragmentation, or weigh them differently.
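
For illustration, a minimal sketch of the 1.5x pattern in C (hypothetical
helper names, not from the article; assumes the struct starts zeroed):

    /* Append with 1.5x growth: amortized O(1) pushes. */
    #include <stdlib.h>

    typedef struct { char *data; size_t len, cap; } buf_t;

    int buf_push(buf_t *b, char c) {
        if (b->len == b->cap) {
            size_t ncap = b->cap ? b->cap + b->cap / 2 : 8;  /* 1.5x, floor of 8 */
            char *p = realloc(b->data, ncap);
            if (!p) return -1;           /* old buffer stays valid on failure */
            b->data = p;
            b->cap = ncap;
        }
        b->data[b->len++] = c;
        return 0;
    }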

~~~
venomsnake
I think that we need smart realloc. Something that takes a buffer's history to
determine the proper increase ratio.

~~~
Cthulhu_
Probably easy enough to implement, but it'll take someone smarter than me to
explain why it probably won't be an improvement.

~~~
byuu
Heap allocators already are ridiculously smart. But there are just too many
divergent use cases for allocators.

I wrote a simple string heap allocator (grab a 16MB block and hand out
256-byte chunks) and the resulting code was 85x* faster than jemalloc (
[http://pastebin.com/dAqa4dbN](http://pastebin.com/dAqa4dbN) ) [* I know the
way I am benchmarking is shit and probably way off from real world usage, it's
hard to do benchmarking ideally though.]

But of course, mine wasn't thread safe, couldn't handle larger blocks
(fallthrough to malloc), sucked at fragmentation, and couldn't detect double-
frees.
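
For the curious, a minimal sketch of that kind of pool (my reconstruction of
the idea, not byuu's actual code; same caveats apply - single-threaded, no
double-free detection, no failure handling):

    /* One big block carved into fixed 256-byte chunks, with an
       intrusive free list threaded through the free chunks. */
    #include <stdlib.h>

    #define POOL_SIZE  (16u << 20)   /* 16MB block */
    #define CHUNK_SIZE 256u

    static unsigned char *pool;
    static void *free_list;          /* each free chunk holds a next pointer */

    static void pool_init(void) {
        pool = malloc(POOL_SIZE);    /* sketch: no failure handling */
        for (size_t i = 0; i + CHUNK_SIZE <= POOL_SIZE; i += CHUNK_SIZE) {
            *(void **)(pool + i) = free_list;   /* push chunk onto the list */
            free_list = pool + i;
        }
    }

    static void *chunk_alloc(void) {
        void *p = free_list;
        if (p) free_list = *(void **)p;  /* pop; NULL means exhausted */
        return p;
    }

    static void chunk_free(void *p) {
        *(void **)p = free_list;         /* push back */
        free_list = p;
    }

Alloc and free are each a couple of pointer operations, with no size classes
or locking, which is consistent with a large speedup for this narrow use case.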

Similarly, I found a massive speed boost for strings by using my own hand-
rolled memcpy [[ while(l--) *t++ = *s++; ]] ... undoubtedly because all the
trickery in setting up SSE and handling lengths not evenly divisible by the
register sizes was more expensive than just dumbly transferring the data, when
most strings are very small. Though I am sure memcpy would destroy me if I
wanted to copy 2GB of data in one go.

All of the extra logic to be smart will help you when it actually works, but
will make your worst case substantially more painful.

Let's say you had to create 100 objects. Would you rather create all 100
objects in 10ns each, or 95 objects in 1ns each, and the remaining 5 objects
in 1000ns each? What if you were writing a particularly weird app that ended
up needing 1000ns for every alloc it did?

One immediate concern I'd have with a smart exponential allocator would be
when you're memory constrained and your application is a web server or SQL
database.

~~~
sitkack
Back when we used 32-bit machines, the exponential container would kill us.
Couldn't use all of the memory without running out of RAM. When you're at 1GB
of space, do you really want to double that on insertion? Yeah, yeah, demand
paging - doesn't work in practice. My instinct is that you want a doubling
allocator with a max, some sigmoid that responds to free space and allocation
rates.

For FF, they might be better off just making a couple memory pools per page
and bump allocating. Throw the pool away on exit and be able to migrate a long
running tab into a dynamically allocated pool.

------
kabdib
That, or don't make your buffers contiguous unless you absolutely have to.

I did one of those classic "100X" speedups once where the buffer was growing
by a set value on every expansion. Could have gotten 10X or better improvement
with exponential growth, but did even better by having an interface that
didn't guarantee address-based access from one element to another, so all I
needed was a tracking structure for multiple smaller buffers.

Don't use an array unless you need an array's cross-element addressing
leakage.

~~~
twic
It continues to baffle me that most standard libraries don't have collections
which work like this. They seem like they'd be a much better general-purpose,
worst-case-avoiding choice than either arrays or linked lists.

It's not like they're difficult. I wrote one in Java not long after 1.2 came
out!

~~~
userbinator
They don't because it's easy enough to make one using what's available (a
linked list or tree of arrays), and then you get to decide all the little
things that matter with composite structures like this: the size of each
array, whether to resize an existing "segment" or to add a new one, how you'd
like to access the data in them, etc.
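
A minimal sketch of the shape being described (names are mine; a linked list
of fixed-size segments, so appends never move existing elements):

    #include <stdlib.h>

    #define SEG_CAP 1024

    typedef struct segment {
        struct segment *next;
        size_t used;
        int items[SEG_CAP];
    } segment;

    typedef struct { segment *head, *tail; } seglist;

    int seglist_push(seglist *l, int v) {
        if (!l->tail || l->tail->used == SEG_CAP) {
            segment *s = calloc(1, sizeof *s);   /* grow by adding a segment */
            if (!s) return -1;
            if (l->tail) l->tail->next = s; else l->head = s;
            l->tail = s;
        }
        l->tail->items[l->tail->used++] = v;
        return 0;
    }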

------
blt
I thought Firefox was mostly written in C++? Pretty surprised that raw
realloc()s are widespread enough to warrant a note like this.

I agree that it's better to know how much space you need up front - even if it
turns a one-pass algorithm into a two-pass one.

One of my happiest moments as a programmer came when reading a co-worker's
code to render 2d parametric functions smoothly in screen space. It
recursively divides down the parameter interval until it's small enough, then
draws line segments. The code is written iteratively and manually manages a
stack to simulate recursion. The stack is a fixed-size static C array instead
of a vector. "Aren't you worried about overflow?", I asked. But he had
computed the number of halvings it takes to get from a FLT_MAX-length interval
down to a FLT_MIN-length interval, and made the array big enough to hold that
many steps. Goodbye malloc(), hello one not-really-that-big static array that
stays warm in the cache.
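
For a sense of scale (my back-of-envelope numbers, not the co-worker's code):
FLT_MAX is about 2^128 and FLT_MIN about 2^-126, so the bound is roughly 254
halvings - a few hundred stack entries at most.

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Number of halvings from a FLT_MAX-length interval to FLT_MIN. */
        printf("%.0f\n", ceil(log2((double)FLT_MAX / FLT_MIN)));  /* ~254 */
        return 0;
    }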

~~~
reubenmorais
>I thought Firefox was mostly written in C++? Pretty surprised that raw
realloc()s are widespread enough to warrant a note like this.

It is, but STL usage is very limited because of all the headaches involved in
linking against all the different STL implementations on all the different
platforms. mozilla-central has its own containers.

~~~
ExpiredLink
The C++ variant most used in the real world is 'C with (some) classes'.

~~~
cerberusss
That, or people stay put in a framework such as Qt. I'd consider myself an
experienced Qt developer. C++? Not so much.

------
nly
This approach is still actually suboptimal from a spatial-efficiency
standpoint, wasting half or a third of your memory (depending on your chosen
factor) whenever you push that key Nth element onto the back.

The real question is, why do people use completely contiguous memory so much
when they really don't need to? For I/O we have nice APIs like
readv()/writev() which deal with segmented buffers just fine... and you really
don't need _humongous_ contiguous vectors to get all of the benefits out of
your CPU cache...

This paper details another approach that has many of the same benefits if you
don't need completely contiguous storage, and only wastes O(sqrt(N)) memory at
a time, resulting in much smoother allocation while still maintaining O(1)
append operations.

[https://cs.uwaterloo.ca/research/tr/1999/09/CS-99-09.pdf](https://cs.uwaterloo.ca/research/tr/1999/09/CS-99-09.pdf)

------
barrkel
Not using exponential growth in buffers / vectors / other contiguously
allocated collections is a rookie mistake. It shouldn't be made by anyone
who's ever studied algorithm complexity, never mind studied CS at any level.

About the only exception is when you have global knowledge about how a buffer
is going to be used. But that's rare in modern modular software.

~~~
abcd_f
What a remarkably misguided rant.

To each his own. What works for a general-purpose app on desktop hardware
doesn't work for resource-constrained firmware (that still has to deal with
potentially unbounded inputs). Yes, there's exponential reallocation, but
there's also a multitude of other strategies, each the best fit for its
particular application domain. "Rookie mistake".

~~~
pja
You're making a pedantic special case correction to a generally correct truth.
This is not an endearing trait.

Everybody knows that a statement such as the one the parent post made will
have special case exceptions: The point is that exponential growth should be
the default & any other choice requires justification.

Asserting this is not a "misguided rant".

~~~
nnethercote
> Everybody knows that a statement such as the one the parent post made will
> have special case exceptions

Thank you for reading with a spirit of intellectual generosity.

When I write posts like this I always wonder about how many qualifications and
weasel words I should add just in case it makes Hacker News and the nitpickers
come out in force. In this case I wrote things like "A strategy that is
_usually_ better is exponential growth" but still many of the comments are
polite variations on "exponential isn't best in all circumstances, you idiot".

~~~
Someone
Instead of weasel words, [RFC 2119] should be sufficient
([https://www.ietf.org/rfc/rfc2119.txt](https://www.ietf.org/rfc/rfc2119.txt))

(Google also gave me
[http://tools.ietf.org/html/rfc6919](http://tools.ietf.org/html/rfc6919),
which I did not know about yet. The third month of the year often sees
remarkable productivity, culminating in superb output in the beginning of the
fourth month)

------
ot
Please do grow your buffers exponentially: multiplying the size of your array
by c at each reallocation yields amortized O(1) operations for any c > 1.

But please don't use c=2: c=1.5 is a much better constant.

The reason is that when you grow the array one element at a time, after i
reallocations your memory looks like this:

    
    
        <previous allocations>|c^i| |c^(i+1)|
                               old    new
    

Since the previous allocations are freed already, you would like to place the
new allocation in the freed space. How much is that? Well that's 1 + c + c^2 +
... + c^(i-1), that is (c^i - 1)/(c - 1). For c=2 that's 2^i - 1, so the new
allocation can _never_ fit. It turns out that if c is smaller than the golden
ratio (1.618...) then eventually new allocations will fit in the freed area.
It is a nice exercise to see how the golden ratio shows up.

c=1.5 is also a commonly used constant because it's easy to compute as n +
n/2.
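
(Filling in the exercise, in the same notation: the new block has size
c^(i+1) and the freed space is (c^i - 1)/(c - 1), so fitting requires

    c^(i+1) <= (c^i - 1)/(c - 1)

Dividing both sides by c^i and letting i grow, this tends to c <= 1/(c - 1),
i.e. c^2 - c - 1 <= 0, whose positive root is the golden ratio (1+sqrt(5))/2.)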

~~~
Chinjut
I don't understand; why hang on to previously freed array space to reuse for
new iterations of the array? Isn't the whole point of freeing array space to
allow arbitrary other data to use it as needed?

~~~
pedroo
Take a look here:
[https://en.wikipedia.org/wiki/Memory_management#DYNAMIC](https://en.wikipedia.org/wiki/Memory_management#DYNAMIC).

No one is really hanging on to previously freed space (except for the
allocator, or if you're using a custom object pool allocation scheme)
specifically for new versions of the array.

But if your allocations look like this:

    
    
      array = malloc(<capacity>);
      // do stuff with array
      free(array);
      ...
      array = malloc(<new capacity>);
      // do stuff with array
      free(array);
    

with no other allocations in between then it is possible that the allocator
might reuse previously freed array space.

~~~
Chinjut
I guess let me put my question another way: what's the advantage of being able
to reuse previously freed array space for new iterations of the array, vs.
having that space used for something else and using some other space for new
iterations of the array?

It seems to me that, when reallocating, one ought to be able to say to the
memory allocator "I want you to reallocate this array to a new size of
[whatever]; go find a contiguous block of size [whatever], possibly
overlapping the current array but not overlapping any other allocated memory,
and copy/move the contents over there appropriately". (I believe that's what
the "realloc" function does, no?). And, in that context, I can't see why any
of the golden ratio stuff matters (though, of course, exponential resizing is
still useful as noted).

------
userbinator
The argument in this article is mainly based on a naive realloc() that does
allocate-copy-free, but on a system with virtual memory the allocator should
be able to reallocate by modifying page table entries, which is far faster
than copying the data around. In the (highly improbable) unfortunate case
that there are no contiguous VAs at any reallocation, this method still has
to move a quadratic number of PTEs; but for the given example of growing to
1MB, assuming 4KB pages that's only 256 PTEs in total, and 1 + 2 + 3 + ... +
256 is 32896. Assuming each PTE is 4 bytes, that gives 131584 total bytes
moved in the pathologically worst case.

 _And if you’re lucky the OS’s virtual memory system will do some magic with
page tables to make the copying cheap. But still, it’s a lot of churn._

There is no actual copying of data, and as the numbers show, a little over
128K for zero-copy reallocations via VM is still 16x less than the 2M of a
doubling, copying buffer.

Thus, a better strategy could be allocate-copy-free with doubling size for
small buffers, and resizing in page-sized increments allowing realloc() to do
zero-copy via VM for large buffers.
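
On Linux this is exposed directly as mremap(2). A minimal sketch (Linux-
specific; assumes the buffer was originally obtained from mmap):

    /* Grow a large buffer by remapping pages instead of copying payload. */
    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>

    void *grow(void *buf, size_t old_size, size_t new_size) {
        /* MREMAP_MAYMOVE lets the kernel pick a new virtual range if the
           current one can't be extended in place; only PTEs move. */
        void *p = mremap(buf, old_size, new_size, MREMAP_MAYMOVE);
        return p == MAP_FAILED ? NULL : p;
    }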

Related links:

[http://stackoverflow.com/questions/16765389/is-it-true-
that-...](http://stackoverflow.com/questions/16765389/is-it-true-that-modern-
os-may-skip-copy-when-realloc-is-called)

[http://blog.httrack.com/blog/2014/04/05/a-story-of-
realloc-a...](http://blog.httrack.com/blog/2014/04/05/a-story-of-realloc-and-
laziness) (discussed previously at
[https://news.ycombinator.com/item?id=7541004](https://news.ycombinator.com/item?id=7541004)
)

------
jonstewart
This advice is not nearly as good as it may first seem. C++'s std::vector
generally uses array-doubling for growth. A few years ago I had some code to
create a graph for an automaton, backed by C++'s std::vector. Growing to a few
million vertices from 0 would take minutes (on a machine with only 3GB of
RAM). When I added a routine to guess the ultimate size of the graph and
preallocate a large enough array, the time to construct the complete graph
fell to a few seconds.

Array-doubling works fine on small-to-medium arrays, but once you're in the
MB range you really need to find a way to guess the right size, or switch to
a better data structure, like a linked list or a rope, and rely on smart
pointers, etc. As noted in the comments, a smart realloc() can also avoid
copying data.

~~~
dpark
I think you found the bottleneck you were looking for and not the one that
actually existed. 31 doublings gets you from 1 byte to an out of memory
exception on your 3GB machine. I doubt that a pure alloc and copy 30 times
took several minutes. You would have only moved 4GB total, worst case. Either
your vector implementation was not using a doubling strategy or something else
was at fault.

What was the structure of the object you were putting into the vector? Did you
have an expensive copy constructor? Perhaps some deep copying was happening
(possibly chasing lots of pointers)?

~~~
jonstewart
It was a small value type. Allocation never failed. Thanks for playing.

"Several" in this case may be "two", can't quite remember that detail very
well, and we're talking about a white plastic MacBook with the first-gen Core2
Duo with a crappy 5400rpm drive. I suspect some fragmentation-induced paging.
But you are right that I didn't get all dtrace on it; I chose a good guess for
std::vector::reserve() and that was that.

Jon

------
praptak
A related puzzle: a robot is on an infinite line extending in both directions.
Somewhere on this line there is a pebble which the robot has to find, and
which is only visible when the robot crosses over it. The problem is that the
robot doesn't know which way the pebble is.

Program the robot so that it finds the pebble, optimizing for the distance
traveled as a function of the initial distance between the robot and the
pebble. The robot can tell direction (say east from west) and measure the
distance traveled.
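
(Spoiler for one classic strategy, not spelled out in the thread: sweep back
and forth with exponentially growing turnaround points, which travels at most
a constant factor times the true distance.)

    #include <stdbool.h>

    /* Hypothetical primitive: move one unit in dir (+1 east, -1 west),
       returning true if the robot is now on the pebble. */
    extern bool step(int dir);

    void find_pebble(void) {
        long pos = 0, bound = 1;
        int dir = +1;
        for (;;) {
            while (dir > 0 ? pos < bound : pos > -bound) {
                if (step(dir)) return;   /* found it */
                pos += dir;
            }
            bound *= 2;                  /* double the turnaround point */
            dir = -dir;                  /* sweep the other way */
        }
    }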

~~~
computer
What's the statistical distribution for the location of the pebble? (Without
that, the answer is "it depends"/undefined.)

Note that a uniform distribution isn't possible over an infinite line.

~~~
praptak
Agreed on the impossibility of the uniform distribution. Partially disagree on
undefined: a function from the initial distance to the travel distance can be
defined regardless of the distribution of the initial distance.

------
flohofwoe
That's bad _general_ advice in my opinion. Instead, encourage programmers to
reserve the expected capacity beforehand, or at least to tweak growth
parameters for the specific 'growth pattern'. At least in games (the only area
where I have experience), dynamic containers will often grow very quickly to a
fairly predictable size and remain there for a long time (several seconds to
minutes). Reserving capacity beforehand to the expected size will always beat
'smart' growth strategies. If a lib can't predict the size on its own because
it doesn't have enough information, provide API functions to configure it from
the outside. As a fallback I use a 1.5x growth factor on the previous
capacity, clamped against per-container-configurable min and max growth
values.

~~~
Roboprog
When I was writing parser/tokenizer stuff for a dev tool shop called "Morada"
back in the early 90s, it was a bit hard to predict how many lines (parsed
items) the input files would contain, unlike a game with a constrained
"arena". So this double-upon-realloc was exactly the approach I used in some
library calls I made up for dynamic array / "index" / list functions I put
together in C. So I would say that the article is very GOOD advice for many,
if not all, usages.

. . .

To me, it seemed an obvious implementation of the "List ADT" stuff that I
learned in uni in the 80s (even if it was in Pascal back in the day), but my
bosses were minicomputer guys from the 70s, so they wanted to be sure I didn't
"steal" the code from somewhere. (and this is pretty much the approach taken
in the Java runtime for Vector & ArrayList)

It was my first job coding C on a regular basis. I think my employers may have
confused my initial unfamiliarity with where to put my "asterisks and
ampersands" (C gobble-de-gook syntax) with some kind of general ignorance :-)

(to their credit, they knew I wasn't stupid - I had done a sub-contract job
for them in another language)

------
agumonkey
hints on how to shrink a dynamic structure are at the bottom of this page:
[http://c2.com/cgi/wiki?DoubleAfterFull](http://c2.com/cgi/wiki?DoubleAfterFull)
#hysteresis

------
cirwin
Doubling is actually pretty memory efficient in real-world scenarios, e.g. in
libxml, using 2^n buffer sizes wasted less memory than tacking an additional
10 bytes onto the end of each.

[https://mail.gnome.org/archives/xml/2012-May/msg00009.html](https://mail.gnome.org/archives/xml/2012-May/msg00009.html)

------
zamalek
In one of our structures here we used to use exponential growth, which worked
pretty well.

We improved it by taking some real-world measurements of the structure's use
and using that to pre-allocate an estimate. If it still ran out we would use
the old exponential growth. It's been some years since we made the change, so
I no longer have the figures but I do remember the heuristic yielding orders
of magnitude performance improvements.

As for the exponential growth factors, we arrived at those by gathering stats
on real-world scenarios - with the value being the mode of all the scenarios.

------
Sir_Cmpwn
I just addressed this the other day, when a pull request wanted to use
exponential growth in a buffer. I told them not to. The difference is that
this software is going to run on an embedded system for which I wrote both the
kernel and the realloc function. My realloc expands the block in place instead
of moving data when possible, and I know that that's a likely scenario during
normal operation. On top of that, there's only 32K of memory to go around, so
wasting space on exponential growth is a bad idea.

Always be willing to question the standard doctrine.

~~~
beagle3
There's only 32K of memory to go around, and you're doing dynamic memory
allocation? It's been 20 years or so since I last touched such a limited
memory system - and everyone I know who did switched to static allocations
after being bitten by the unpredictability of dynamic allocations.

~~~
Sir_Cmpwn
Going on 4 years and it's working out well so far.

------
salgernon
This is great until you start vending an API to a higher-level client that
makes several arbitrary mutable copies of your buffer object. There are still
32-bit machines out there, and it is not unlikely for a contiguous 4MB region
of heap to be unavailable in a long-lived app. (One option that helps is to
ensure that the naive object-duplication code always compacts the new copy.
But malloc can, and will, fail.)

~~~
Roboprog
If you are handing out (long lived) references to your elements, then yes, you
cannot relocate them. In that case, you need an array of pointers, rather than
an array of inline values. In this case, your memory use will become even more
fragmented, though.

Either way, make sure you have swap space, and you should be all right,
especially if the swap is usually just holding idle slop. 2 GB (2^31 bytes) is
still a pretty big address space for many purposes. It will hold about 500 4
MB buffers (512, less overhead).

~~~
Roboprog
P.S. - I never had to deal with 8-bit machines, but spent plenty of time on
16-bit machines. 32 bits is a big address space, despite Microsoft having
pissed all of it (and a CPU core) away just to load the OS, leaving little-to-
nothing for actual work.

------
ccleve
This is actually a pretty old idea:

[http://en.wikipedia.org/wiki/Buddy_memory_allocation](http://en.wikipedia.org/wiki/Buddy_memory_allocation)

The primary benefit of it is that it makes space reclamation more efficient.

(At least I think so. Might not be true with modern garbage collectors.)

------
muyuu
There is no substitute for profiling and testing your code thoroughly under
real-world conditions (and worse, stress conditions).

With this in mind you can have your rules of thumb, but trying to sell a
problem like this as solved with a simple, universal silver bullet is not a
good idea IMO.

~~~
nnethercote
Sure, but when writing a generic container implementation -- such as nsTArray
-- you have to pick a single strategy. And generic containers generally use
exponential growth. And in this case it made a clear improvement on memory
usage in the general case, as the AWSY result showed.

Someone will probably now ask why Firefox doesn't use standard containers like
std::vector. Several reasons: (a) having our own container lets us know that
the implementation is the same on every platform; (b) having our own container
gives us more control and allows us to add non-standard features like
functions that measure memory usage; (c) Firefox doesn't use C++ exceptions
(for the most part) but the standard containers do.

~~~
muyuu
Yep, but your post is about "fixing" all possible allocation strategies to
exponential growth: "please grow your buffers exponentially", "if you know of
some more code in Firefox that uses a non-exponential growth strategy for a
buffer, please fix it, or let me know so I can look at it".

That sounds as if very little effort was put into the allocation strategies in
the first place, since you're willing to override whatever thought went into
choosing an allocation strategy and replace it with exponential growth (which,
admittedly, is often a good heuristic), or just have other people do it.

It's perfectly plausible that in other circumstances a different approach is
better. That said, in both patches mentioned it makes sense (I think the
minimum for XDR encoding buffers should be at least quadrupled from its
current 8KiB if it's true that on startup it already gets bigger than 500KiB).
One thing about exponential growth with rates as high as 2x each time is that
picking a reasonably big initial size (even a slight overshoot) based on the
expected buffer size is the conservative thing to do, because if you let the
allocator do the growing it's often going to overshoot a lot more. If your
buffers have, say, a normal distribution of maximum sizes over their
lifetimes, it's wise to preallocate to the 90th-percentile expected size and
then grow by perhaps 1.5x instead of 2x. Something worth testing and tweaking
because it makes a real difference.

Sorry if this sounded negative, it wasn't meant to.

------
qwtel
What is the rationale behind this? My guess is that when a buffer reaches its
limit, it's expected that it will need to grow by another factor of its
current size, while a fixed-size increment doesn't incorporate previous or
current growth needs.

~~~
ubernostrum
This is a problem that occurs in lots of places, not just memory allocation.
To pull an example I'm a little bit familiar with, check out CVE-2012-3444
from Django a couple years ago.

The tl;dr there is that determining the dimensions of an uploaded image file
was done by reading 1024 bytes at a time until enough information had been
read to figure out the dimensions. That works on a lot of common stuff, but
falls over hard on other files, since there are plenty of cases where you have
to read the whole file to get the dimensions. At 1024 bytes per read
operation, that can be a _lot_ of operations, enough that it was a denial-of-
service vector.

The solution was exactly what the linked blog post advocated: we switched from
reading a fixed size in bytes on each operation to doubling the size each
time. The exponential growth of successive reads means that the pathological
case (having to read the whole file) becomes far less pathological.

IIRC Python's dictionary implementation does a similar trick, doubling in size
every time it needs to grow. Turning lots of small read or write or realloc
(or whatever) operations into fewer, larger-each-time operations is almost
always correct, in my experience, and when it's not done from the start you'll
almost always notice sooner or later, while wondering why your performance has
degraded.
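
A sketch of the same pattern at the read(2) level, in C (illustrative, not
Django's actual fix; real code would also handle EINTR):

    #include <unistd.h>

    /* O(log n) read() calls instead of O(n/1024): double the chunk size
       after each successful read. */
    ssize_t read_doubling(int fd, char *dst, size_t max) {
        size_t total = 0, chunk = 1024;      /* start at 1024 bytes */
        while (total < max) {
            if (chunk > max - total) chunk = max - total;
            ssize_t n = read(fd, dst + total, chunk);
            if (n <= 0) break;               /* EOF or error */
            total += (size_t)n;
            chunk *= 2;                      /* exponential growth of reads */
        }
        return (ssize_t)total;
    }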

~~~
vidarh
You're absolutely right, and too few realise this.

In particular people doing IO with small buffers drives me crazy. People
unfortunately don't seem to realise how expensive context switches are, and
how brutal they are on throughput.

I've seen this in so many places. MySQL's client library used to be full of
4-byte reads (reading a length field, then doing a usually-larger-but-still-
small read of the following data). I believe it's fixed, but I don't know
when. I also remember with horror how t1lib - a reference library Adobe
released for reading Type 1 fonts ages ago (in the '90s) - spent 90%+ of its
time on the combination of malloc(4) and read(..., 4), for tables of a size
known at the outset (basically a small table with one entry per glyph that
stored a pointer to a 4-byte struct instead of storing it inline).

Currently I'm hacking on SDL_vnc now and again, and it's full of 1-16 byte
reads. This seems to make sense at first glance - after all, VNC protocol
packets are of a size that depends on the values of various fields - but on
high-throughput networks or local connections the small reads/writes totally
dominate overall throughput, even when the protocol overhead is a tiny
percentage of the bitmap data being pushed.

Basically, pretty much anywhere you want to read less than 4K-16K (possibly
more these days), it's better to do buffering in your app and do non-blocking
reads, so you can read blocks as large as possible at a time...

But the general problem is not paying attention to the number of system calls.
People not paying attention to stat()/fstat()/lstat() etc. is another common
one (common culprit: Apache - if you use the typical default options for a
directory, Apache is forced to stat its way up the directory tree; it's easy
to fix, but most people don't seem to be aware how much it affects
performance)

------
asgard1024
While I agree with the gist of the article, is the recommendation to use
powers of 2 as the size correct? Every allocator adds a small overhead to the
size, so one could be better off using sizes like 2^n - 16 or so.

~~~
Dewie
It isn't necessarily a power of two, since the initial capacity might not be a
power of two.

------
cowabunga
This doesn't always work.

When you have 512MiB of memory and your current allocation is 256MiB, where do
you go?

Also when your dataset allocates 64MiB up front, then chucks two bytes on the
end, where do you go?

Know thy data and test test test is the only rule.

~~~
nnethercote
> When you have 512MiB of memory and your current allocation is 256MiB, where
> do you go?

Then you're out of luck no matter what strategy you use. Even allocating
256MiB + 1 byte won't work, because you can't deallocate the old buffer until
the new one has been allocated.

> Also when your dataset allocates 64MiB up front, then chucks two bytes on
> the end, where do you go?

What I didn't say in the article is that for some of the Firefox examples
where the buffers can get really large -- such as nsTArray -- it switches to a
growth rate of 1.125 once it gets to 8 MiB. So in that case it would grow to
72 MiB.
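
As a sketch, that policy looks something like this (constants from the comment
above; the helper name is mine, not Firefox's actual code):

    #include <stddef.h>

    /* Double up to 8 MiB, then grow by 1.125x: 64 MiB + 64/8 = 72 MiB. */
    static size_t next_size(size_t cap) {
        const size_t threshold = (size_t)8 << 20;   /* 8 MiB */
        if (cap < threshold)
            return cap * 2;        /* exponential phase for small buffers */
        return cap + cap / 8;      /* gentler 1.125x phase for large ones */
    }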

~~~
cowabunga
Well, the first point depends on the platform and how the kernel memory
allocator is configured. It's possible to overcommit on Linux, for example,
which is acceptable for short windows. NT will most likely tell you to go
away.

Now that last point is the important one, and is actually what I do - which is
what I was hoping to hear somewhere :-)

~~~
barrkel
You can reserve more address space than physical memory just fine on NT.
Committing beyond the amount of physical memory available will mean paging at
some point, should you try and use all of it.

------
Heliosmaster
A few months ago a similar story popped up on Hacker News about jemalloc
(can't find it now). It had very cool proofs of why it was optimal as well :D

------
JoeAltmaier
Pedant: Doubling is geometric, not exponential. Right?

~~~
bdhe
Both are okay according to
[http://en.wikipedia.org/wiki/Exponential_growth](http://en.wikipedia.org/wiki/Exponential_growth)

~~~
JoeAltmaier
I would think 'geometric' meant x^2 or x^3, while 'exponential' meant 2^x. Are
there different words to describe these different growth rates?

------
easytiger
Slightly confused: doesn't realloc() grow existing memory by n, avoiding the
new/copy/free cycle?

~~~
andrewla
realloc will grow if possible, but will malloc and copy if it cannot be grown.
Some allocators will round up to a larger size (like a page size), so
allocations up to the page boundary will be free. Most modern allocators use
small block optimizations and other techniques, the net effect of which is
that realloc will usually require a malloc/copy.

You should treat realloc as a semantic hint to the allocator, but for
performance analysis you should always assume that it will allocate/copy.
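
That's also why the usual idiom keeps the old pointer until the call is known
to have succeeded, since a failed realloc returns NULL but leaves the old
block intact (a minimal sketch):

    #include <stdlib.h>

    /* Never assign realloc's result directly onto the only pointer to the
       block, or a failure leaks it. */
    int grow_buffer(char **buf, size_t new_size) {
        char *tmp = realloc(*buf, new_size);
        if (!tmp) return -1;   /* *buf is still valid; caller can free it */
        *buf = tmp;
        return 0;
    }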

------
kazinator
More importantly, Firefox, please _shrink_ your monstrous memory footprint
exponentially!

~~~
ben0x539
I bet this never occurred to them

~~~
kazinator
I mean, I don't want to be the "640K oughtta be enough for everybody" person
here, yet somehow I can't accept that it should take a quarter gigabyte to
render a "hello world" HTML document. Every morning when I get into work, the
first thing I have to do is click a button that I installed into Firefox to
restart it so that my machine (4GB RAM) becomes responsive. (That this
important button even has to be a third-party add-on indicates serious reality
denial.)

~~~
nnethercote
This sounds like atypical behaviour. One option is to try resetting Firefox,
which gives you a new profile while keeping your history, bookmarks, etc:
[https://support.mozilla.org/en-US/kb/reset-firefox-easily-
fi...](https://support.mozilla.org/en-US/kb/reset-firefox-easily-fix-most-
problems). It can fix a lot of weird problems.

If that still doesn't help, I'd be interested to see what about:memory says.
Instructions are at the top of [https://developer.mozilla.org/en-
US/docs/Mozilla/Performance...](https://developer.mozilla.org/en-
US/docs/Mozilla/Performance/about:memory). Please file a bug in Bugzilla or
email me. Thanks.

------
signa11
i was under the impression that it was common knowledge to have a doubling
strategy for buffer sizes to amortize the cost of each addition. fwiw, java
uses a 3/2 strategy, and python uses a 9/8...

------
eru
Shouldn't that be geometrically?

~~~
darrenmambo
Although exponential is technically correct when the base is 2, I find it
really masks what real exponential growth is like.

When I hear "exponential growth," I think 256, 65536, 4294967296, 2^64, 2^128

~~~
eru
2^(2^n) is a double exponential function.
([https://en.wikipedia.org/wiki/Double_exponential_function](https://en.wikipedia.org/wiki/Double_exponential_function))

------
edman
This is the same old story of time vs. space. There is no clear answer; it
really depends.

------
rinon
Is there a reason the link is directed to a comment?

~~~
AndrewDucker
Yes. I'm an idiot, and it's been more than 15 minutes since I posted it, so I
can't trim the URL!

If there are any mods about, I'd appreciate their help...

~~~
dang
Should be fixed now.

------
maxk42
There is no circumstance under which Firefox should be allocating 4kb of heap.

------
riffraff
[https://areweslimyet.com/](https://areweslimyet.com/) seems badly named: all
the curves are trending upwards, so it seems like it should be
"wellneverbeslim.com"

------
throwawayaway
haha, cue firefox going back to using ridiculous amounts of memory in 5, 4, 3,
2, 1.

seriously, there is no one-size-fits-all solution and you cannot really
predict who is going to end up using your code later on, or in what manner
they will use it.

i've worked on defects created by exponential buffer growth; as you can
imagine, they can be a lot worse than malloc churn.

~~~
DrJokepu
I was under the impression that modern kernels and memory allocators are
clever enough to not initialise memory pages that have been allocated but not
used yet, is that not the case? So unless you calloc() or memset() your newly
allocated memory, overallocation won't really affect actual memory usage?

~~~
nnethercote
That's right. The untouched memory won't be committed, i.e. put into physical
memory or swap. But it will take up address space, which can be a problem --
on Windows Firefox is a 32-bit process and running out of address space tends
to happen more often than running out of physical memory, ironically enough.

~~~
e12e
That reminds me (as I now have Windows 8.1 on one of my machines again): does
anyone know if there is a roadmap/plan for official 64-bit builds of FF on
Win64?

~~~
nnethercote
It's being actively worked on.
[https://wiki.mozilla.org/Firefox/win64](https://wiki.mozilla.org/Firefox/win64)
seems to be an up-to-date planning document.

