
Graph Engine vs. C++ unordered_map: memory consumption test, Part 2 - graphengine
http://blogs.msdn.com/b/graphengine/archive/2015/06/07/graph-engine-vs-c-unordered-map-memory-consumption-test-part-2.aspx
======
jandrewrogers
I would make the point that C++ unordered_map is not remotely optimized for
this use case, so it is not surprising that it is less efficient. You normally
would not use it for something like this. In fact, C++ unordered_map is not
tunable enough to offer excellent performance for _most_ use cases. It is a
very generic implementation of an unordered_map.

In any code base where performance matters, most important use cases for
unordered_map would be replaced with a purpose-built implementation (like with
GraphEngine). Not only is it pretty trivial to design and implement custom
maps for various use cases, you can usually ensure at least a 2X improvement
across all relevant metrics relative to the standard library unordered_map.
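
To make the "purpose-built map" idea concrete, here is a minimal sketch of one, assuming C++17. Everything in it is hypothetical: the name `FlatU64Map`, the fixed 64-bit key and 32-bit value, and the choice of open addressing with linear probing. It is not GraphEngine's design, and it omits erase entirely. It only illustrates the kind of constraint you can exploit once you stop needing a generic interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical special-purpose map: uint64 keys, uint32 values, no erase.
// All slots live in one contiguous array; collisions probe the next slot.
class FlatU64Map {
    struct Slot { std::uint64_t key; std::uint32_t value; bool used; };
    std::vector<Slot> slots_;
    std::size_t size_ = 0;

    // 64-bit finalizer from MurmurHash3, used to scatter clustered keys.
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return x;
    }

    // Doubles the table and reinserts every live slot.
    void grow() {
        std::vector<Slot> old = std::move(slots_);
        slots_.assign(old.size() * 2, Slot{0, 0, false});
        size_ = 0;
        for (const Slot& s : old)
            if (s.used) insert_or_assign(s.key, s.value);
    }

public:
    // Assumption: capacity_pow2 is a power of two, so masking replaces modulo.
    explicit FlatU64Map(std::size_t capacity_pow2 = 16)
        : slots_(capacity_pow2, Slot{0, 0, false}) {}

    void insert_or_assign(std::uint64_t key, std::uint32_t value) {
        if ((size_ + 1) * 4 > slots_.size() * 3) grow();  // load factor <= 0.75
        const std::size_t mask = slots_.size() - 1;
        for (std::size_t i = mix(key) & mask;; i = (i + 1) & mask) {
            if (!slots_[i].used) {
                slots_[i] = Slot{key, value, true};
                ++size_;
                return;
            }
            if (slots_[i].key == key) {
                slots_[i].value = value;
                return;
            }
        }
    }

    // Probing stops at the first empty slot, which always exists because
    // the load factor is kept below 1.
    std::optional<std::uint32_t> find(std::uint64_t key) const {
        const std::size_t mask = slots_.size() - 1;
        for (std::size_t i = mix(key) & mask; slots_[i].used; i = (i + 1) & mask)
            if (slots_[i].key == key) return slots_[i].value;
        return std::nullopt;
    }

    std::size_t size() const { return size_; }
};
```

There is no per-node allocation and no pointer chasing here, which is where a node-based std::unordered_map spends much of its memory and time. Whether that adds up to any particular speedup depends on the workload and would need measuring.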

Based on the relative metrics for GraphEngine and C++ unordered_map in this
article, I would expect a purpose-built C++ implementation to be significantly
faster than GraphEngine. I know that is not what is being measured, but in
real-world C++ code bases, it is how most properly engineered
architectures would do it.

~~~
nly
Indeed. std::unordered_* implementations are hogtied by the interface and
complexity requirements in the C++ standard (it basically mandates chained
buckets), and more generally the need to work with almost any hash function
(linear probing would suck with crappy hash functions for example).
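
A small sketch of the interface constraints in question, using only standard std::unordered_map members. The helper names `count_via_buckets` and `pointer_survives_rehash` are made up for illustration. The first walks the per-bucket iterators that the standard requires, and the second relies on the guarantee that rehashing does not invalidate pointers or references to elements. Both are awkward to satisfy with a flat open-addressing table.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Counts elements by walking every bucket through the standard bucket API
// (bucket_count, begin(n), end(n)).
std::size_t count_via_buckets(const std::unordered_map<int, std::string>& m) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        for (auto it = m.begin(b); it != m.end(b); ++it) {
            ++total;
        }
    }
    return total;
}

// Takes the address of one element, forces rehashing by inserting many more,
// and reports whether the element is still at the same address.
bool pointer_survives_rehash() {
    std::unordered_map<int, std::string> m;
    m.emplace(1, "one");
    const std::string* before = &m.at(1);
    for (int i = 2; i < 10000; ++i) {
        m.emplace(i, "x");
    }
    m.rehash(1 << 16);
    return before == &m.at(1);
}
```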

Jo Muñoz wrote a cool blog series on the current implementations in
circulation, the flaws with the interface, and how his implementation (Boost
multi_index) made further optimisations despite the constraints.

Don't miss the other two parts of the series - look under October on the
right. He did extensive benchmarks in November as well.

[http://bannalia.blogspot.co.uk/2013/10/implementation-of-c-u...](http://bannalia.blogspot.co.uk/2013/10/implementation-of-c-unordered.html)

~~~
StephanTLavavej
I bookmarked those posts a while ago in the hopes that I'll get a chance to
rewrite VC's unordered_map in the future. We've taken the first step towards
doing so by deprecating the non-Standard hash_map (which shares its guts with
the Standard unordered_map) in 2015, so I can remove it outright at the
beginning of the next development cycle. Then the unordered family can be
changed without fear of breaking the hash family.

~~~
nly
Thanks for what you do, STL. (For those who don't know, Stephan maintains
Microsoft's C++ standard library.) Even though I don't develop against MSVC at
all, improving the state of C++ at Microsoft makes life easier for all of us
in terms of portability and bringing nice new things into common use.

Thank you also for maintaining your GCC distro for Windows[0] and for the many
great presentations you have given at CppCon, Channel 9, Going Native etc.

[0] [http://nuwen.net/mingw.html](http://nuwen.net/mingw.html)

~~~
StephanTLavavej
You're welcome! And even if you don't use VC's STL, hopefully make_unique and
the other things I've designed will be helpful.

~~~
jaytaylor
I'm curious...

Is it possible to use Visual C++'s STL on Linux?

~~~
StephanTLavavej
Nope, our sources assume they're running on Windows.

------
huhtenberg
> _with MSVC's built-in memory allocator_

... which has been a trivial wrapper around HeapAlloc for a long time, and the
HeapAlloc implementation varies by Windows version. In particular, Vista+
versions use low-fragmentation heaps and W7+ versions have _outstanding_
multi-threaded performance.

In other words, referring to "MSVC's built-in memory allocator" doesn't make
much sense.

~~~
StephanTLavavej
And recently (in 2013, I forget about previous versions) the CRT began using
the process heap, instead of having its own heap handle. So the
malloc/new/std::allocator family is an even thinner wrapper around HeapAlloc
in that sense.

(In 2015, we're introducing a little bit of extra trickery though. In
std::allocator, we'll highly align large allocations to be friendly to
autovectorization. This isn't done at the lower malloc/new levels because they
don't get size information at deallocation time, but the std::allocator
interface has always required that information even if it was previously
ignored. IIRC this buys 15% or so, which makes the back-end devs drool. Our
current magic number for "large" is 4 KB, so this basically never affects
node-based containers, only stuff like big vectors.)
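
To illustrate why the size passed to deallocate matters, here is a minimal sketch of an allocator that over-aligns only large blocks. It is not VC's implementation. The name `BigAlignAllocator`, the 32-byte alignment, and the use of C++17 aligned operator new are all assumptions made for the example; only the 4 KB threshold comes from the comment above. The sketch also ignores overflow in `n * sizeof(T)` and types whose own alignment exceeds the default.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical allocator: blocks of at least kLargeBytes get kBigAlign
// alignment, smaller blocks go through plain operator new.
template <class T>
struct BigAlignAllocator {
    using value_type = T;

    static constexpr std::size_t kLargeBytes = 4096;  // "large" threshold
    static constexpr std::size_t kBigAlign = 32;      // assumed vector width

    BigAlignAllocator() = default;
    template <class U>
    BigAlignAllocator(const BigAlignAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        const std::size_t bytes = n * sizeof(T);
        if (bytes >= kLargeBytes) {
            return static_cast<T*>(
                ::operator new(bytes, std::align_val_t{kBigAlign}));
        }
        return static_cast<T*>(::operator new(bytes));
    }

    // The element count n is what lets deallocate pick the matching
    // deallocation function without storing any extra header.
    void deallocate(T* p, std::size_t n) noexcept {
        const std::size_t bytes = n * sizeof(T);
        if (bytes >= kLargeBytes) {
            ::operator delete(p, std::align_val_t{kBigAlign});
        } else {
            ::operator delete(p);
        }
    }
};

template <class T, class U>
bool operator==(const BigAlignAllocator<T>&, const BigAlignAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const BigAlignAllocator<T>&, const BigAlignAllocator<U>&) noexcept {
    return false;
}
```

Used as `std::vector<float, BigAlignAllocator<float>>`, a big vector gets an aligned buffer while small node-sized allocations take the ordinary path. A plain malloc/free pair could not branch this way, because free is never told the size.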

~~~
stinos
> In std::allocator, we'll highly align large allocations to be friendly to
> autovectorization

OK, this and the other things you posted here are mighty interesting. Do you
(or someone else working on this) have a blog where news like this gets
announced?

~~~
StephanTLavavej
VCBlog is the VC team's official blog. See
[http://blogs.msdn.com/b/vcblog/archive/2014/06/06/c-14-stl-f...](http://blogs.msdn.com/b/vcblog/archive/2014/06/06/c-14-stl-features-fixes-and-breaking-changes-in-visual-studio-14-ctp1.aspx)
for where I announced the alignment thing.

------
chuckcode
I've found measuring memory usage with STL containers to be difficult, as they
have their own allocator and, unless you're using reserve(), often end up
allocating extra space. Dr. Dobb's had an interesting article a long time ago
on optimizing allocators [1], and memory management was also one of the main
reasons that Electronic Arts rolled their own variant of the STL [2]. I use a
lot of STL for convenience, but I have been burned enough times that when
memory consumption is high or the memory is heavily used, I manage it manually
and make sure to use a modern allocator like jemalloc or tcmalloc [3].

[1] [http://www.drdobbs.com/cpp/improving-performance-with-custom...](http://www.drdobbs.com/cpp/improving-performance-with-custom-pool-a/184406243)

[2] [http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n227...](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html#Motivation)

[3] [http://locklessinc.com/benchmarks_allocator.shtml](http://locklessinc.com/benchmarks_allocator.shtml)

~~~
StephanTLavavej
By default, std::allocator just wraps new, so it doesn't make things any
harder to measure. vector isn't "allocating extra space" for no reason - its
capacity grows geometrically in order to provide amortized O(1) push_back. If
you use reserve() incorrectly, you can actually trigger quadratic complexity
(as I've done in the past when I didn't know any better). Even with unused
capacity, vector is quite space-efficient. For example, with GCC's 2x growth,
on average a vector will be using 33% extra space (that's 0.5/1.5). With VC's
1.5x growth, on average a vector will be using 20% extra space (that's
0.25/1.25). That's a pretty small cost to pay for vector's optimal locality.
And in practice, many containers will actually be exactly-sized (range
construction and copy construction will produce exact sizing), so the overhead
is even lower.
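
The growth behaviour and the reserve() pitfall described above can be observed with a small sketch like the following. `GrowthStats` and `fill_vector` are names invented for this example. It counts how often the capacity changes while filling a vector, once with plain push_back and once with the misuse of calling reserve(size() + 1) before every push_back. On the common implementations reserve() allocates exactly what is requested, although the standard only promises at least that much, so the exact counts are implementation-specific.

```cpp
#include <cstddef>
#include <vector>

struct GrowthStats {
    std::size_t reallocations;   // number of times capacity() changed
    std::size_t final_capacity;  // capacity() after the last push_back
};

// Fills a vector with n ints. If reserve_each_time is true, it calls
// reserve(size() + 1) before each push_back, which defeats geometric growth.
GrowthStats fill_vector(std::size_t n, bool reserve_each_time) {
    std::vector<int> v;
    GrowthStats stats{0, 0};
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        if (reserve_each_time) {
            v.reserve(v.size() + 1);
        }
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++stats.reallocations;
            last_capacity = v.capacity();
        }
    }
    stats.final_capacity = v.capacity();
    return stats;
}
```

With geometric growth the number of reallocations grows logarithmically in n. When every reserve() call reallocates, it grows linearly, and since each reallocation copies all existing elements, the total work becomes quadratic.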

------
halayli
This test is flawed.

    
    
        start_time = std::chrono::steady_clock::now();
    
        for (auto entry : param_entries)
        {
            void* cell_buf      = new char[entry.cell_size];
            auto  upsert_result = LocalStorage.emplace(entry.cell_id, CellEntry{ cell_buf, entry.cell_size, 0 });
    
            if (!upsert_result.second)
            {
                std::swap(cell_buf, upsert_result.first->second.ptr);
                delete[] cell_buf;
            }
    
            if (rand() % 3 == 0)
            {
                delete[] upsert_result.first->second.ptr;
                LocalStorage.erase(entry.cell_id);
            }
        }
    
        end_time = std::chrono::steady_clock::now();
    
    

He's not testing unordered_map performance alone, but the combined performance
of new/delete and unordered_map. Also, entry is a copy of each item in
param_entries; the loop variable should be declared as auto& entry (or
const auto& entry). As written, he's essentially copying the whole array
while iterating.
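
The difference between the two loop forms can be made visible with a copy-counting type. This is a sketch assuming C++17; `Entry` here is a hypothetical stand-in with a user-written copy constructor, not the benchmark's actual type. For a small trivially copyable struct, an optimizer may well remove the copy entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the benchmark's entry type; counts its own copies.
struct Entry {
    long cell_id = 0;
    int cell_size = 0;
    static inline std::size_t copies = 0;

    Entry() = default;
    Entry(long id, int size) : cell_id(id), cell_size(size) {}
    Entry(const Entry& other)
        : cell_id(other.cell_id), cell_size(other.cell_size) {
        ++copies;
    }
    Entry& operator=(const Entry&) = default;
};

// Iterates by value: each element is copy-constructed into the loop variable.
std::size_t copies_by_value(const std::vector<Entry>& entries) {
    Entry::copies = 0;
    long total = 0;
    for (auto entry : entries) {
        total += entry.cell_size;
    }
    (void)total;
    return Entry::copies;
}

// Iterates by const reference: the loop variable aliases the element.
std::size_t copies_by_reference(const std::vector<Entry>& entries) {
    Entry::copies = 0;
    long total = 0;
    for (const auto& entry : entries) {
        total += entry.cell_size;
    }
    (void)total;
    return Entry::copies;
}
```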

~~~
v-yadli
I'm pretty sure the values in entry will be loaded into registers.

~~~
halayli
That doesn't change any fact.

------
peterazo
Yet another example of how C# developers can't write C++ code. :)

