Advancing the state of the art for std::unordered_map implementations (bannalia.blogspot.com)
87 points by fanf2 on June 19, 2022 | 51 comments



The fact that the STL API leaks these kinds of implementation details is such a shame. Multiplied globally over all users of unordered_map, the amount of CPU time that has been wasted because of that decision boggles the mind. It has to be one of the biggest library design blunders in the history of the industry (up there with C strings being null-terminated). They should just add a new map to the standard with a better API.


> The fact that the STL api leaks these kind of implementation details is such a shame.

There were good reasons for it to do so. C++ is often used for performance reasons, and requiring low-level access was a way to prevent an implementation from being standards-compliant yet having horrible performance.

I once had a coworker who had spent some time on the standards committee. He said they would dive into the technical details and tradeoffs to an insane level (utilizing journal papers, theses, and real world benchmarks). He would often quip that the number of times a coworker came to him claiming to have a better implementation than the standard's, and turned out to be correct, was zero. Sure, if you're a Boost developer, you may have the chops to improve on the standard, but you really need to be at that level.

The other issue, of course, is that when you go into such detail, then yes - the standard is a bit brittle. The solution is for the standard to introduce another unordered_map (with a different name) that allows for open addressing, and keep the old one for backwards compatibility. Anyone who's spent time with the algorithms library knows that this is a common approach: adding new functions to improve deficiencies in the previous ones.

Anti-disclaimer: I hate C++. Not a fan.


> There were good reasons for it to do so. C++ is often used for performance reasons

It is asserted that there were good reasons. I believe Facebook/Meta openly tells you that in practice you can probably drop in their notionally incompatible hash map and it'll just work. So it seems that either only secret internal projects needed details like the bucket API, or else it was bullshit and - as seems far more likely - the bucket API was exposed because nobody on the committee had the imagination to see that a better unordered_map might not use buckets at all.

One of the things Rust gets to do here is the "Crater run". Since so much Rust (all popular libraries, much else of note) is on crates.io it's no problem to just "scrape" all that source code and run the toolchain over it to find crashes, compiler performance changes or whatever. This would have been completely unthinkable in C++ as it stood when standardised, and nothing comparable is done today so far as I know, but for Rust it was no big deal.

This puts the Rust maintainers in a much better place to judge the difference between "This is technically incompatible but will not break real uses" and "This is a change that will sow Python 3.x style chaos and discord" if it ever comes to that - rather than just guessing.


> in practice you can probably drop in their notionally incompatible hash map and it'll just work.

The article unintentionally makes kind of a similar point in the "Which hash container to choose" section: often when using std::unordered_map, we don't rely on every single word of the API. All we care about is that there is an insert(), find(), begin(), end() [and the associated iterator], size(), erase() and maybe clear(), with the correct function signatures and O(kay) performance class. Only when we rely on e.g. count() really being O(1), or on being able to create pointers to objects inside the map (I never did that, but do other people really do that?), do other parts of the API become relevant.
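
For what it's worth, here is a minimal sketch of what "only relying on that subset" looks like in practice; the function and names are made up for illustration, and it compiles against std::unordered_map or any other map type exposing the same handful of members:

    #include <cstddef>
    #include <string>
    #include <unordered_map>

    // Counts how often a key has been seen; only uses insert() and size().
    template <class MapT>
    std::size_t record(MapT& seen, const std::string& key) {
        auto [it, inserted] = seen.insert({key, 0});  // insert()
        if (!inserted) ++it->second;
        return seen.size();                           // size()
    }

    int main() {
        std::unordered_map<std::string, int> seen;    // any map with this subset works
        record(seen, "a");
        record(seen, "a");
        for (auto it = seen.begin(); it != seen.end(); ++it) {}  // begin()/end()
        seen.erase("a");                              // erase()
        seen.clear();                                 // clear()
        return 0;
    }

Swapping in e.g. absl::flat_hash_map<std::string, int> for the std::unordered_map here would not change a line of the calling code.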


> being able to create pointer to objects inside the map (I never did that, but do other people really do that?)

Yes, we do it at my day job in several places, where we take advantage of the pointer stability to improve our overall performance a lot. It's a much more impactful performance improvement for our overall product than micro-optimizations of the unordered_map's insert/erase times would achieve.

Think of, for example, multi-index containers. Or an ordered hash-map (one that is both a hash map and keeps insertion ordering).

In such cases you can keep the actual objects in an unordered_map, and only store pointers to the nodes in other containers (e.g., vectors).
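
To make that concrete, here is a minimal sketch of an insertion-ordered map along those lines (names are purely illustrative): the unordered_map owns the objects, and a side vector of pointers records insertion order, relying on the pointer stability guarantee.

    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct OrderedMap {
        std::unordered_map<std::string, int> storage;                  // owns the values
        std::vector<const std::pair<const std::string, int>*> order;   // non-owning views

        void insert(const std::string& key, int value) {
            auto [it, inserted] = storage.emplace(key, value);
            if (inserted) {
                // &*it stays valid across later inserts and erases of *other* keys,
                // which is exactly the guarantee open-addressing maps drop.
                order.push_back(&*it);
            }
        }
    };

Erasing from storage would of course require removing the matching pointer from order first; that part is left out to keep the sketch short.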

Obviously you can achieve the same thing with open-addressing hashmaps, by making the map's nodes unique_ptrs (i.e., creating the actual objects on the heap externally), which is what such hashmaps tell you to do if you need pointer stability.

But you asked if anyone really used pointer stability, hence my answer.


I can totally see the line of reasoning that ended up with this decision. It was the wrong decision, but I absolutely can see how smart and competent people could've arrived there, or missed the reasons why (for instance) promising pointer stability was a bad idea. I've seen that Facebook/Meta talk as well, but they didn't have that data in 2003. Hindsight is 20/20.


open addressing of course gives completely different guarantees about pointer and iterator invalidation on insertion, and even removal.

Those guarantees are important for people who write non-trivial code.


> There were good reasons for it to do so. C++ is often used for performance reasons, and requiring low level access was a means to prevent some implementation being standards compliant yet with horrible performance.

The API guaranteed mediocre performance and prevented good performance.


On the bright side, constraints breed innovation. Joaquín wouldn't be bringing this great content to us, and furthering the state of the art for node-based hash maps, if the standard library wasn't constrained.

Squeezing more efficiency out of something already used in millions of programs is very noble


> Has to be one of the biggest library design blunders in the history of the industry

May I introduce you to std::regex?

Although to be fair, that one's not so much a blunder due to the standard as it is one of poor _implementations_, coupled with an overly restrictive standardization that prevents fixing them.


C++ STL design is broken. The language shouldn't define an API for std; it should provide an implementation. AFAIK no other major language defines the standard library the way C++ does. It's so bad and problematic.


Sorry, can you elaborate? Some things in the STL require compiler support, so you can't really "provide" an implementation in the narrow sense.

Also, how does providing a concrete/reference implementation prevent the API problems? Hyrum's law [0] means that you would make the situation worse because now people will depend on every detail you can think of. And if you want to improve the implementation in a new version, you now are (most likely) breaking ABI on every platform at once, instead of giving different platforms the ability to find compromises as needed. Just look at the recent "no ABI break in C++23" discussions to see how contentious that topic is.

[0]: https://www.hyrumslaw.com/


STL should “just” be a library written against the language spec. The fact that there are so many different, unrelated implementations is crazy.

The STL spec is so overdesigned that it strongly implies, if not mandates, a particular implementation. But it doesn't actually provide an implementation, so every platform has to implement one. However, those implementations select different trade-offs, which means different behaviors, so developers wind up writing their own version that behaves consistently. It's basically the worst outcome.


STL is simultaneously overspecified and underspecified. In some situations, like std::unordered_map, the specification mandates an implementation that is suboptimal for many applications. In other situations, the specification is so vague that those parts of the library are often useless.

For example, random number distributions are underspecified. The standard only defines the distributions but not the algorithms. If you need deterministic behavior after choosing the rng and the seed – for example for testing, reproducibility, or procedural content generation – you can't use the standard distributions.
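
A small sketch of what that means in practice (the Box-Muller transform below is just an illustrative stand-in for "fix the algorithm yourself", not what any standard library implementation does):

    #include <cmath>
    #include <random>

    int main() {
        std::mt19937_64 rng(42);              // the engine's output is bit-for-bit specified

        std::normal_distribution<double> dist(0.0, 1.0);
        double not_reproducible = dist(rng);  // algorithm left to the implementation
        (void)not_reproducible;

        // Deterministic alternative: spell out the transform yourself.
        auto uniform01 = [&rng] { return (rng() >> 11) * 0x1.0p-53; };  // 53 bits -> [0, 1)
        double u1 = uniform01(), u2 = uniform01();
        double reproducible = std::sqrt(-2.0 * std::log(u1 + 1e-300)) *
                              std::cos(6.283185307179586 * u2);
        (void)reproducible;
        return 0;
    }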

C++ is typically used in applications where performance is important. If you want a "good enough" standard library, it must be good enough for most people who care about performance. A cross-platform implementation is rarely good enough, because performance often depends on platform-specific parameters and instructions.

Data structure APIs are particularly difficult if you care about performance. If you use a generic API, you will often end up using the data structure suboptimally. Your mental model is likely wrong, and the API may not pass through information that could help you avoid redundant work. You want an API that matches the actual implementation closely – but not so closely that every implementation change becomes a breaking API change.


> A cross-platform implementation is rarely good enough, because performance often depends on platform-specific parameters and instructions.

I respectfully disagree with this.

We’re talking about std::unordered_map here. I don’t know of any platform that has done implementation work to make std::unordered_map not mediocre.

Let’s be specific. What platforms are we talking about? How many STL implementations are there? Which ones offer a significant improvement for which platforms?

C++ has been my daily driver for 15 years. I’ve shipped quite a lot of code that uses std::vector and std::unordered_map, amongst others! If I really cared about performance I’d pick a hash map implementation that wasn’t mediocre.

I would definitely not consider any std::unordered_map to be highly tuned for any platform. Quite the opposite. It’s over-specified but still stuck with different implementations. IMHO it would be much better if the C++ committee released C++XX libraries that were considered “pretty good” with support for common platforms. New or proprietary platforms can then add the relatively small bits of code necessary to support their platform (think PS5) or release a completely separate library with different constraints.

Part of the problem here is C++ also really, really sucks to compile. Maybe someday modules will make it easy to drop in new libraries and containers. Jury’s still out on that one though.


I also have something like 15 years of serious C++ experience, but I've used it more for data processing than software engineering. While std::vector has almost always been good enough for me, I've often had to replace std::unordered_map with something smaller or faster, or with something where I have more control over memory layout or serialization.

std::unordered_map is not a particularly good example of things that need platform-specific optimizations. The implementations I have seen are pretty close to standard C++. std::vector implementations, on the other hand, are often really weird and rely extensively on compiler-specific annotations.


random was what came to my mind after your first sentence. It could really use a deterministic test harness for the different layers.


There's no reason the standard template library (basically generic container types and their support) should require compiler support in C++. However, C++ programmers tend (incorrectly) to use the abbreviation STL to mean the entire standard library for their language, so this can result in talking at cross purposes.

Thus while your parent may have meant that types like std::unordered_map shouldn't be magic, it's true that for example you can't implement atomics from the standard library usefully without help from the vendor.


> However C++ programmers tend (incorrectly) to use the abbreviation STL to mean the entire standard library for their language.

Just like some people have a pet peeve to correct people on this usage, I have a pet peeve to tell those people to look at the name of the MSVC implementation of the C++ standard library.

https://github.com/microsoft/STL

Yes, historically "the STL" was a subset of what became the C++ standard library. But unless you want to tell one of the big three standard library implementations that you know better than them what the name should be, using "STL" for the entire C++ standard library is simply not incorrect.

(Your point that this may be a misunderstanding between the other poster and me stands though.)


Microsoft in abysmal naming shock: Surprise as company notorious for giving products useless or misleading names gives its variant of the C++ standard library a bad name.

We're talking here about the company that has two very well known products "Active Directory" and "Visual Studio" but decided there was no reason it couldn't call unrelated products "Azure Active Directory" and "Visual Studio Code".

Here's how Sony named their video game consoles: PlayStation, PlayStation 2, PlayStation 3, and PlayStation 4. Makes sense. Microsoft decided to make video game consoles too. It named them (I am not kidding, for anyone who doesn't play video games; this is real): Xbox, Xbox 360, Xbox One, Xbox Series. Why?


For the record, Alexander Stepanov's original focus for the Standard Template Library was generic _algorithms_. Of course, he provided generic data structures ("containers") too. See http://stepanovpapers.com .


> Back in the day, open-addressing approaches were not regarded as sufficiently mature, so closed addressing was taken as the safe implementation of choice.

I always chuckle when I read statements like this because it didn't match my own experience as a learner. When I learned computer science, I was taught hash tables right after arrays, but before linked lists or even pointers. So open addressing had always been the default for hash tables for me. I'm curious to hear what other people's learning journeys are like.


I lack the experience to tell - does anybody use chaining in production?


The Linux kernel uses chained hash tables in many places. https://github.com/torvalds/linux/blob/master/include/linux/...


It's a tradeoff. Open addressing is faster due to cache locality, but needs a good hash function. If the hash function tends to cluster values and generate collisions you might (key word here "might", as usual you need to do profiling to check each case) end up with way more operations per query.
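
A rough sketch of that clustering effect, as one possible illustration (this is not a real hash map, it only counts linear-probe steps, and the key pattern and table size are arbitrary):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Total probe steps needed to insert `keys` into a linear-probing table.
    static std::uint64_t probe_count(const std::vector<std::uint32_t>& keys,
                                     std::uint32_t (*hash)(std::uint32_t),
                                     std::size_t capacity) {
        std::vector<bool> used(capacity, false);
        std::uint64_t total = 0;
        for (auto k : keys) {
            std::size_t i = hash(k) % capacity;
            while (used[i]) { ++total; i = (i + 1) % capacity; }  // collision: probe on
            used[i] = true;
        }
        return total;
    }

    int main() {
        std::vector<std::uint32_t> keys;
        for (std::uint32_t k = 0; k < 6000; ++k) keys.push_back(k * 1024u);  // clustered keys
        auto identity = [](std::uint32_t x) -> std::uint32_t { return x; };
        auto mixed = [](std::uint32_t x) -> std::uint32_t {
            x *= 0x9E3779B1u;
            return x ^ (x >> 16);
        };
        std::printf("identity hash: %llu probes\n",
                    (unsigned long long)probe_count(keys, identity, 8192));
        std::printf("mixed hash:    %llu probes\n",
                    (unsigned long long)probe_count(keys, mixed, 8192));
        return 0;
    }

With the patterned keys above, the identity hash lands everything in a handful of home slots and the probe count explodes, while the mixing hash keeps it modest.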

But for 90% of cases, you don't really care that much about performance and whether it's open or closed addressing, as long as the mapping works.


std uses chaining because deletion isn't allowed to invalidate iterators.
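
A tiny illustration of that guarantee (values are made up):

    #include <cassert>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, int> m{{1, 10}, {2, 20}, {3, 30}};
        auto it_to_3 = m.find(3);
        int& ref_to_2 = m[2];

        m.erase(1);  // only invalidates iterators/references to the erased element

        assert(it_to_3->second == 30);  // still valid
        assert(ref_to_2 == 20);         // still valid
        return 0;
    }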


I guess what we need now then is std::open_unordered_map ;-)


It is conceptually simpler, especially once deletion is involved, and is the more common implementation.


I think it was a while ago, but at this point the cost of pointer chasing and the memory overhead of separate chaining mean that it is being phased out.


unordered_map uses chaining


Why would anyone, when absl::flat_hash_map<> and absl::node_hash_map<> are available?
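
For anyone who hasn't used them, a hedged sketch of the difference between the two (assuming the usual Abseil container headers): flat_hash_map trades away pointer stability for speed, while node_hash_map keeps per-element nodes and so keeps pointers valid.

    #include "absl/container/flat_hash_map.h"
    #include "absl/container/node_hash_map.h"
    #include <string>

    int main() {
        absl::flat_hash_map<std::string, int> fast;    // open addressing; values may move on rehash
        absl::node_hash_map<std::string, int> stable;  // pointers/references to elements stay valid
        fast["a"] = 1;
        stable["a"] = 1;
        const int* p = &stable["a"];
        stable["b"] = 2;   // even if this rehashes, *p is still valid for node_hash_map
        return (*p == 1 && fast.at("a") == 1) ? 0 : 1;
    }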


IIRC php and go use separate chaining (unless php switched while I wasn’t looking).


I thought Java's Hashmap used chaining.


IME, if you want a fast hash map, the very first thing you should do is ignore anything that uses closed addressing. If you really do need one of the extra guarantees provided by std::unordered_map, see if you can't get it by using a container of unique_ptr<T> or shared_ptr<T> instead of unordered_map<T>.


What's the problem with iterator increment being (amortized) constant time for closed addressing? The algorithm could iterate over each bucket and on each bucket iterate over each element. Going from the end of one bucket to the start of the next one may be slower than within the same bucket, but still constant time.


There's a link in the article to a paper (N2023) proving this is not constant time. The issue is that as you delete elements, and the hash table becomes sparse, finding the next nonempty bucket is not so fast. However, I believe that the paper conveniently glosses over the fact that there will be rehashing into a smaller table which would fix this issue.
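
A quick sketch of the scenario the paper describes (sizes are arbitrary): fill a table, erase most of it, and the survivors are spread thinly over a bucket array sized for the original population, so iteration has to skip long runs of empty buckets.

    #include <cstddef>
    #include <cstdio>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, int> m;
        for (int i = 0; i < 1000000; ++i) m[i] = i;          // grows the bucket array
        for (int i = 0; i < 1000000; ++i)
            if (i % 1000 != 0) m.erase(i);                   // keep ~1000 elements

        // Whether the table shrinks again after erasure is up to the implementation;
        // printing bucket_count() shows what yours does.
        std::printf("size=%zu bucket_count=%zu\n", m.size(), m.bucket_count());

        // Walking the ~1000 survivors touches on the order of bucket_count() buckets
        // in total, so a single ++iterator is not O(1) in the number of elements.
        std::size_t visited = 0;
        for (const auto& kv : m) { (void)kv; ++visited; }
        return visited == m.size() ? 0 : 1;
    }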


Thanks. I got confused between constant time relative to the number of elements in the table (as typically calculated) and constant time relative to the number of slots in the table.


There has also been a lot of research in the field lately. For example, [1, 2] have a lot of new methods for concurrent or space-efficient hash tables.

[1] http://algo2.iti.kit.edu/maier.php

[2] https://github.com/TooBiased/growt


I am always sad that the default STL collections leak so much implementation detail that they are stuck on such abysmally performing algorithms.

That every serious project ends up with its own hashmap, etc., is such a shame.


Not only can't you do standards-conformant open-addressing hash map implementations of std::unordered_map, I don't think you can do cuckoo or robin-hood hashing either. It's just so restrictive.

Unordered maps are also annoying for a completely different reason: You generally want your program to run in a reproducible way and unordered maps will undermine that goal again and again in subtle ways you will discover too late. std::map doesn't scratch that itch because you have to provide an ordering function when 99% of the time insertion ordering would be just fine. I wrote about this at https://blog.toit.io/hash-maps-that-dont-hate-you-1a96150b49...
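
A minimal example of where that bites (the output is only described, since the whole point is that it differs between implementations):

    #include <iostream>
    #include <map>
    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<std::string, int> um{{"a", 1}, {"b", 2}, {"c", 3}};
        for (const auto& [k, v] : um) std::cout << k << '=' << v << ' ';  // order: unspecified
        std::cout << '\n';

        std::map<std::string, int> om(um.begin(), um.end());
        for (const auto& [k, v] : om) std::cout << k << '=' << v << ' ';  // order: sorted by key
        std::cout << '\n';
        return 0;
    }

The second loop is reproducible, but only because a key ordering happens to exist here; the point above is about the cases where what you actually wanted was insertion order.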


Features and containers have been deprecated before. Could it be an option to deprecate only the complexity requirements on some of the operations on std::unordered_map, so that we don't need a whole new container and standard library implementers get the option to use open addressing?


The problem with the standard's requirements is that they choose an implementation and then tailor the requirements to it tightly, thus preventing any other set of tradeoffs.


That's not actually what drove it here. std::unordered_map has the requirements it does in order to make it a drop-in replacement for std::map in cases where you don't care about traversal order. They wanted it to be as easy as possible for people to find/replace in their code.


Can I just settle for O(log n) lookup and forget about the theoretical O(1) and constant struggle to find the "best" hash function and data structure?


std::map (as opposed to std::unordered_map) uses a balanced binary search tree approach and thus guarantees O(log n) lookup for anything that has a well-defined total order (i.e. comparison function), without worrying about hash functions, etc.

So, yes you can ;)
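
For completeness, a short example of the "well-defined total order" requirement with a struct key (Point and PointLess are made-up names):

    #include <map>
    #include <string>

    struct Point { int x, y; };

    struct PointLess {
        bool operator()(const Point& a, const Point& b) const {
            return a.x != b.x ? a.x < b.x : a.y < b.y;  // lexicographic total order
        }
    };

    int main() {
        std::map<Point, std::string, PointLess> names;  // O(log n) lookups, no hash needed
        names[{1, 2}] = "labelled";
        return names.count({1, 2}) == 1 ? 0 : 1;
    }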


Sure, just use a tree-based map such as `std::map`. (Which, as the name implies, was the OG associative container in C++.)


If you don't care about performance at all you can of course do that. For many real world applications however asymptotic runtimes are not enough, you need to care about constant factors too.


Really interesting post; I love reading these kinds of details about how the collection types I take for granted work and are iterated on.


Now I'm curious to know what's the state of the art for open addressing implementations in C++.


It seems like the relevant benchmarks would be vs. absl, folly, etc, not libc++.


Both are relevant. This is supposed to be a drop-in replacement for std::unordered_map, whereas the other implementations break from the std API contract in various ways (either by not supporting certain operations, or by not providing the same runtime complexity for certain operations).

So you'd expect the performance to fall in between std implementations and the others, with folly/absl/etc essentially serving as an upper bound on the possible gains.



