
C++ containers that save memory and time - dsr12
http://google-opensource.blogspot.com/2013/01/c-containers-that-save-memory-and-time.html
======
chrisaycock
Facebook released their C++ containers last year as well, which they named
Folly. An interesting feature they added is that their vector class is "aware"
of the memory allocator. I.e., if jemalloc is available, then the vector class
will attempt to resize all dynamic memory structures to fit within cache
hierarchy sizes.

<http://news.ycombinator.com/item?id=4059356>

------
arianvanp
Reading C++ code always gives me a weird feeling that something is inherently
wrong with the language. Header files containing nearly all of the
implementation, because that's the only place where you can define templates,
is so odd...
(<https://code.google.com/p/cpp-btree/source/browse/btree.h>)

~~~
viraptor
Comments like this one don't help either:

        // Inside a btree method, if we just call swap(), it will choose the
        // btree::swap method, which we don't want. And we can't say ::swap
        // because then MSVC won't pickup any std::swap() implementations. We
        // can't just use std::swap() directly because then we don't get the
        // specialization for types outside the std namespace. So the solution
        // is to have a special swap helper function whose name doesn't
        // collide with other swap functions defined by the btree classes.

~~~
zanny
I don't get what's wrong. They already have an implementation of swap in their
namespace that they don't want to use, and the standard one isn't templated to
work with their btree, so you use a different swap in methods.

Sounds more like this should be btree::swap and whatever they currently have
as swap() should be named something else.

~~~
viraptor
I think the syntax gymnastics needed to achieve the result are quite complex
for what they aim for. There's nothing technically wrong with it. It's just
that choosing a function from a namespace shouldn't need a 7-line comment
about how and why, in my opinion. It's a single-statement function.

Additionally `using std::swap; swap();` not being exactly equivalent to
`std::swap();` is rather strange (even if it can be explained). A candidate
for code/design smell in my opinion.

~~~
lbrandy
> I think the syntax gymnastics needed to achieve the result are quite complex
> for what they aim for. There's nothing technically wrong with it. It's just
> that choosing a function from a namespace shouldn't need a 7 line comment...

It's probably important to point out that your last sentence above is not
correct. That's not what this code is doing. If all it needed to do was call
swap from a particular namespace, it would be as simple as you imply. It's
trying to maximize the caller's ability to specialize swap for the type he's
using in the template, while still allowing for std::swap to be used as a
fallback.

> Additionally `using std::swap; swap();` not being exactly equivalent to
> `std::swap();` is rather strange (even if it can be explained).

One says "use standard swap". One says "bring std::swap into this namespace
and choose the best swap". I agree with you the difference is subtle and non-
obvious. But it is important when doing sufficiently generic programming.
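The distinction being described is the standard "two-step" swap idiom. A minimal sketch (the `Widget` type and `generic_swap` helper are illustrative names, not from the btree code):

```cpp
#include <utility>

namespace mylib {

struct Widget {
    int value;
};

// A user-provided overload: found via argument-dependent lookup (ADL)
// because it lives in the same namespace as Widget.
void swap(Widget& a, Widget& b) {
    std::swap(a.value, b.value);
}

// The "two-step" idiom: bring std::swap into scope as a fallback, then make
// an unqualified call so ADL can pick a better overload if one exists.
template <typename T>
void generic_swap(T& a, T& b) {
    using std::swap;  // fallback for types with no custom swap
    swap(a, b);       // ADL finds mylib::swap for Widget, std::swap for int
}

}  // namespace mylib
```

Writing `std::swap(a, b)` instead would always use the standard version and silently skip `mylib::swap`, which is exactly the subtlety the btree comment is working around.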

------
leetrout
And this is why you need to know all the algo and data structure "basics" when
interviewing at the big G. Because people there are working on these sorts of
optimizations in their spare time.

~~~
Smerity
Agreed. When I was working on Google App Engine, the Quota team sat next to
me. The Quota team runs the service that just about every other service at
Google calls to ensure a user isn't abusing "The Google". That means they
need insanely small response times so that all the other services run on
time.

They hit an issue no-one really thinks about: resizing hash tables. No-one
thinks about it because it happens transparently under the surface, but for them,
the few milliseconds it took to resize the hash tables would have enormously
detrimental effects on just about every other Google service. The solutions
they were talking about were really quite amazing. That's the insanity they
face.

So, if anyone hasn't seen Google Sparsehash[1], check it out: it's another
near drop-in improvement to the STL. Two bits per entry of overhead and
insanely fast. I can't remember if the Quota team used it specifically, but I
wouldn't be surprised if it's one of the results of their work.

[1]: <http://code.google.com/p/sparsehash/>

~~~
jzwinck
No one thinks about resizing hash tables...except for any high-frequency
trading firm worth its salt. And probably several game companies. I'm sure
there are others.

It gets weirder when we talk about C++ in particular: did you know that thanks
to a standards committee decision, erase by iterator in a typical C++ TR1 hash
set or map is O(n)? Erase by key actually has better performance despite the
extra lookup. People do think about these things, in a few places, but it can
be hard to get ideal performance standardized...even in C++.

~~~
antirez
Redis has an implementation of an incrementally rehashing hash table in the
"dict.c" file. The code is very simple to understand.

<https://github.com/antirez/redis/blob/unstable/src/dict.h>

<https://github.com/antirez/redis/blob/unstable/src/dict.c>

As you can see, no rocket science there.
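The core trick is easy to sketch in C++. This is a toy illustration of the Redis approach, not the real dict.c API: keep two bucket arrays and, instead of moving everything at once when the table grows, migrate one old bucket per operation (duplicate keys and shrinking are omitted for brevity):

```cpp
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical incrementally-rehashing map, in the style of Redis's dict.c.
class IncrementalMap {
    using Bucket = std::list<std::pair<std::string, int>>;
    std::vector<Bucket> old_table_;  // source table being drained
    std::vector<Bucket> new_table_;  // destination table (empty unless rehashing)
    std::size_t rehash_idx_ = 0;     // next old bucket to migrate
    std::size_t size_ = 0;

    bool rehashing() const { return !new_table_.empty(); }

    std::size_t index(const std::string& k, std::size_t buckets) const {
        return std::hash<std::string>{}(k) % buckets;
    }

    // Move one bucket from the old table to the new one. Each step is
    // O(bucket length), so the cost of a full rehash is amortized over
    // many ordinary operations instead of one long pause.
    void rehash_step() {
        if (!rehashing()) return;
        while (rehash_idx_ < old_table_.size() && old_table_[rehash_idx_].empty())
            ++rehash_idx_;
        if (rehash_idx_ == old_table_.size()) {  // rehash finished
            old_table_ = std::move(new_table_);
            new_table_.clear();
            rehash_idx_ = 0;
            return;
        }
        for (auto& kv : old_table_[rehash_idx_])
            new_table_[index(kv.first, new_table_.size())].push_back(std::move(kv));
        old_table_[rehash_idx_].clear();
        ++rehash_idx_;
    }

public:
    explicit IncrementalMap(std::size_t buckets = 4) : old_table_(buckets) {}

    void insert(const std::string& k, int v) {
        rehash_step();
        if (!rehashing() && size_ + 1 > old_table_.size())  // grow at load factor 1
            new_table_.assign(old_table_.size() * 2, Bucket{});
        auto& table = rehashing() ? new_table_ : old_table_;
        table[index(k, table.size())].emplace_back(k, v);
        ++size_;
    }

    const int* find(const std::string& k) {
        rehash_step();
        for (auto* t : {&old_table_, &new_table_}) {
            if (t->empty()) continue;
            for (auto& kv : (*t)[index(k, t->size())])
                if (kv.first == k) return &kv.second;
        }
        return nullptr;
    }

    std::size_t size() const { return size_; }
};
```

While a rehash is in progress, inserts go only to the new table (so migrated buckets never grow again) and lookups consult both tables, which is the same discipline dict.c follows.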

~~~
matthavener
Really great C code. However, "timeInMilliseconds" should really use
clock_gettime(CLOCK_MONOTONIC, ..) instead of gettimeofday(..). A sudden
change in time due to NTP, an administrator, timezone changes, etc., could
have the loop in "dictRehashMilliseconds" running for longer than expected.

(Here's someone explaining the problem with gettimeofday:
<http://blog.habets.pp.se/2010/09/gettimeofday-should-never-be-used-to-measure-time>)
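In C++ the same fix is available without touching the POSIX API directly: std::chrono::steady_clock is monotonic (it never jumps when NTP or an administrator resets the wall clock), so it is the safe choice for time-bounded loops. A small sketch (the function names are illustrative, not from dict.c):

```cpp
#include <chrono>

// Monotonic milliseconds since an arbitrary epoch; immune to wall-clock jumps.
long long monotonic_ms() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// A time-bounded loop in the style of dictRehashMilliseconds:
// do small units of work until the (monotonic) budget is spent.
int work_for_ms(long long budget_ms) {
    long long start = monotonic_ms();
    int iterations = 0;
    while (monotonic_ms() - start < budget_ms)
        ++iterations;  // stand-in for "rehash a few buckets"
    return iterations;
}
```

Had `monotonic_ms` been built on the wall clock (std::chrono::system_clock or gettimeofday), a backwards clock step during the loop would extend it far past its budget, which is exactly the bug described above.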

~~~
antirez
Good advice, thanks, taking note

------
shin_lao
I would recommend having a look at sorted arrays first. There is an
implementation available in Boost:

<http://www.boost.org/doc/libs/1_52_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx>

~~~
cbsmith
Sorted arrays are a common choice that I've seen people reach for in lieu of
std::set, but a proper B-tree implementation kicks butt compared to a sorted
array when mutating data, and an _unsorted_ array actually tends to kick butt
for small array sizes (<100 elements).
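The unsorted-array approach is simple to sketch: a linear scan over contiguous memory is cache friendly and, at small sizes, often beats the pointer-chasing of a node-based std::set. A minimal illustration (`TinyIntSet` is a hypothetical name, not from any library mentioned here):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Set-like container backed by an unsorted, contiguous array.
// Everything is O(n) in element count, but n is small by assumption
// and every operation is a sequential scan over one allocation.
class TinyIntSet {
    std::vector<int> items_;  // unsorted storage
public:
    bool contains(int v) const {
        return std::find(items_.begin(), items_.end(), v) != items_.end();
    }
    // Insert without maintaining any sort order: just append.
    bool insert(int v) {
        if (contains(v)) return false;
        items_.push_back(v);
        return true;
    }
    // Erase by swapping the match with the last element and popping;
    // legal precisely because no order is maintained.
    bool erase(int v) {
        auto it = std::find(items_.begin(), items_.end(), v);
        if (it == items_.end()) return false;
        *it = items_.back();
        items_.pop_back();
        return true;
    }
    std::size_t size() const { return items_.size(); }
};
```

Past a few hundred elements the O(n) scans lose to logarithmic structures, which is why this only pays off in the small-size regime the comment describes.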

------
damian2000
Anyone know why the STL doesn't use B trees? Are there use cases where Red-
Black trees are better?

~~~
gsg
The STL guarantees address stability for the elements of a set, map, or the
unordered variants (and iterator stability for the ordered ones; unordered
iterators can be invalidated by rehashing). Those guarantees can't be
provided by many more compact representations.

Note that this rules out more than just B-trees: open addressed hash tables
and judy array like structures have the same issue.

~~~
brooksbp
What do you mean by "iterator and address stability"?

~~~
gsg
Iterator stability means iterators that point into a container are not
invalidated by operations (such as insertions or deletions) on that container.

Address stability means that addresses of elements are not changed by
container operations. If you think about it, you can see that maintaining this
property rules out contiguous storage of any subset of a container's elements,
hence my comment about "compact representations".
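These two definitions can be demonstrated directly. A small sketch contrasting node-based std::map (stable addresses) with contiguous std::vector (addresses move on reallocation), which is the same trade-off that rules out B-trees for std::map:

```cpp
#include <map>
#include <vector>

// std::map allocates one node per element, so the address of an existing
// element survives any number of later insertions.
bool map_address_stable() {
    std::map<int, int> m{{1, 10}};
    const int* before = &m.at(1);
    for (int i = 2; i < 1000; ++i) m[i] = i;
    return before == &m.at(1);
}

// std::vector stores elements contiguously, so growth may reallocate and
// move every element to a new address. (Whether a particular push_back
// reallocates depends on capacity; with 1000 insertions into a capacity-1
// vector, a reallocation is certain in practice.)
bool vector_reallocated() {
    std::vector<int> v{10};
    const int* before = &v[0];
    for (int i = 0; i < 1000; ++i) v.push_back(i);
    return before != &v[0];
}
```

A B-tree stores several elements contiguously inside each node and shuffles them between nodes on insert and delete, so it can offer neither guarantee.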

------
olivier1664
I'm confused. From my understanding, B-trees are good when you load data from
a hard drive: they minimize the number of times you have to load a range of
data from the disk. Since the C++ B-tree is in RAM, where data loading should
matter less, how does it produce such good benchmarks? What is the secret
ingredient?

~~~
alexgartrell
If I had to guess, I'd say better cache locality. It's tremendously more
expensive to go to main memory for something than to go to the cache.
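A back-of-the-envelope way to see the locality argument (these layouts are simplified sketches, not the actual cpp-btree or libstdc++ structures): a red-black tree node carries one key plus three pointers and a color, so fetching one 64-byte cache line yields a single key; a B-tree leaf packs an array of keys, so the same fetch yields many comparisons' worth of data.

```cpp
// One key per heap allocation: a cache-line fetch retrieves one key
// plus bookkeeping, and following a child pointer likely misses again.
struct RBNode {
    int key;
    RBNode* left;
    RBNode* right;
    RBNode* parent;
    bool red;
};

// Many keys per node: a single cache-line fetch retrieves ~15 int keys,
// and a binary or linear search within the node touches no other memory.
struct BTreeLeaf {
    int count;
    int keys[15];  // with 4-byte ints, the whole leaf is one 64-byte line
};
```

The disk-era argument ("minimize block fetches") and the in-RAM argument are the same argument one level down the memory hierarchy: the cache line plays the role the disk block used to.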

------
grundprinzip
One question: Is this something similar to the CSB+ Tree (Cache Conscious B+
Tree)?

<http://people.cs.aau.dk/~simas/dat5_07/papers/p475-rao.pdf>

Or are the optimisations mentioned there still valid?

