
How expensive are the union and intersection of two unordered_set in C++? - AndreyKarpov
http://lemire.me/blog/2017/01/27/how-expensive-are-the-union-and-intersection-of-two-unordered_set-in-c/
======
_benedict
This seems like a fairly lazy article.

Comparing a merge of two _already sorted_ vectors with a (naive) merge of two
hash collections is not at all a like-for-like comparison.
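
To make the mismatch concrete, here is a minimal sketch of the two operations being compared. It assumes `std::uint32_t` elements, and the helper names `sorted_union` and `hash_union` are hypothetical, not taken from the article. `std::set_union` makes one linear pass over two sorted, contiguous ranges, while the naive hash-set union re-hashes and inserts every element of both inputs into scattered buckets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <unordered_set>
#include <vector>

// Union of two already-sorted vectors: a single linear pass over
// contiguous memory (std::set_union requires sorted input ranges).
std::vector<std::uint32_t> sorted_union(const std::vector<std::uint32_t>& a,
                                        const std::vector<std::uint32_t>& b) {
    std::vector<std::uint32_t> out;
    out.reserve(a.size() + b.size());  // upper bound on the result size
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}

// Naive union of two hash sets: every element of both inputs is hashed
// and inserted, touching buckets spread across memory.
std::unordered_set<std::uint32_t> hash_union(
    const std::unordered_set<std::uint32_t>& a,
    const std::unordered_set<std::uint32_t>& b) {
    std::unordered_set<std::uint32_t> out;
    out.insert(a.begin(), a.end());
    out.insert(b.begin(), b.end());
    return out;
}
```

Both produce the same set of values; only the sorted version relies on its inputs already being ordered, which is exactly the advantage the article's comparison hands it.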

Yet there is no real elucidation of the meaningful takeaways: for instance, that
random walks in memory over a larger structure are slower than a linear walk
over a more compact structure, or that if you already have a sorted collection
and don't need to shuffle it, you probably shouldn't.

Nor is there any attempt to normalise the results, for instance by constructing
two sorted vectors from the unordered sets and merging those, while noting of
course that this has worse algorithmic complexity (the sort is O(n log n),
versus expected O(n) for hashing) but better constant factors.
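
A sketch of that normalisation, assuming `std::uint32_t` elements and using a hypothetical helper `intersect_via_sorted`: copy each unordered set into a vector, sort both, then merge with `std::set_intersection`. The sorts are where the extra O(n log n) goes; the merge itself is a linear pass over contiguous memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical helper: intersect two hash sets by first copying each into
// a vector and sorting it (O(n log n)), then doing a linear merge (O(n)).
std::vector<std::uint32_t> intersect_via_sorted(
    const std::unordered_set<std::uint32_t>& a,
    const std::unordered_set<std::uint32_t>& b) {
    std::vector<std::uint32_t> va(a.begin(), a.end());
    std::vector<std::uint32_t> vb(b.begin(), b.end());
    std::sort(va.begin(), va.end());
    std::sort(vb.begin(), vb.end());

    // std::set_intersection requires both ranges to be sorted.
    std::vector<std::uint32_t> out;
    std::set_intersection(va.begin(), va.end(), vb.begin(), vb.end(),
                          std::back_inserter(out));
    return out;  // sorted, duplicate-free
}
```

Timing this end to end (copy, sort, merge) against a direct hash-set intersection would be the fairer comparison the comment asks for.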

Nor is there even any discussion of the more efficient approaches for producing
the intersection/union when you cannot afford batch-wise construction of a
sorted vector.
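
One such approach (a common technique, not necessarily the one the commenter had in mind) is to iterate over the smaller set and probe the larger one, which needs only O(min(|A|, |B|)) expected lookups and no sorting at all. The helper name `hash_intersection` and the `std::uint32_t` element type are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Intersection without sorting: walk the smaller set and look each element
// up in the larger one, so the work scales with the smaller input.
std::unordered_set<std::uint32_t> hash_intersection(
    const std::unordered_set<std::uint32_t>& a,
    const std::unordered_set<std::uint32_t>& b) {
    const auto& smaller = a.size() <= b.size() ? a : b;
    const auto& larger = a.size() <= b.size() ? b : a;

    std::unordered_set<std::uint32_t> out;
    for (std::uint32_t x : smaller) {
        if (larger.count(x) != 0) {  // expected O(1) lookup
            out.insert(x);
        }
    }
    return out;
}
```

When the two sets differ greatly in size, this avoids touching most of the larger set, which a merge of sorted vectors cannot do without extra machinery such as galloping search.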

Basically, if you did not already know this before you read the article, you
probably are no better informed now.

------
greg7mdp
Interesting, but not unexpected: the memory locality of the vector dominates
otherwise similar algorithms.

One remark: in the unordered_set case, it would be faster to assign the first
set rather than insert it, because with insert() we lose the fact that the
elements in A are already unique, and out will not be resized appropriately.

  out = A;
  out.insert(B.begin(), B.end());
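
A self-contained version of that suggestion, as a sketch; the function name `union_by_copy` and the `std::uint32_t` element type are hypothetical. Copy-assigning A means the result starts out holding A's elements, instead of growing and rehashing as they are inserted one at a time; how much this saves in practice depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Union by copying A and inserting only B's elements.
std::unordered_set<std::uint32_t> union_by_copy(
    const std::unordered_set<std::uint32_t>& A,
    const std::unordered_set<std::uint32_t>& B) {
    std::unordered_set<std::uint32_t> out;
    // Copy-assign A first: its elements are already unique, so they are
    // copied wholesale rather than inserted (and deduplicated) one by one.
    out = A;
    // Only B's elements go through insert(); duplicates of A are skipped.
    out.insert(B.begin(), B.end());
    return out;
}
```

An alternative with a similar intent is calling `out.reserve(A.size() + B.size())` before inserting both sets, which avoids rehashing during the inserts at the cost of possibly allocating more buckets than the final union needs.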

