As they said though, the hash function on a string type requires looking at every element in the string. Also, modulo is historically very slow. So they still have to deal with O(n) operations, where n is the length of the input string. If they can improve the memory-hopping problem associated with lists / graph structures (which they claim to have done in their library), then a trie could be fast enough, which is what they observed. Combine that with their claim that 90% of the time a query against the trie is a miss, and you exit early a lot, whereas with the hash map strategy you always have to hash the entire input string.
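To make the early-exit property concrete, here is a minimal trie sketch (this is not trie-hard's actual layout; `TrieNode` and its methods are invented for illustration). A miss can bail out at the first byte with no matching child, while a hash lookup has to scan the whole key just to compute the hash:

```rust
use std::collections::HashMap;

// Toy trie: each node maps a byte to a child node.
#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    terminal: bool, // true if a stored key ends at this node
}

impl TrieNode {
    fn insert(&mut self, key: &[u8]) {
        let mut node = self;
        for &b in key {
            node = node.children.entry(b).or_default();
        }
        node.terminal = true;
    }

    /// Returns (found, bytes_examined) so the early exit is visible.
    fn lookup(&self, key: &[u8]) -> (bool, usize) {
        let mut node = self;
        for (i, &b) in key.iter().enumerate() {
            match node.children.get(&b) {
                Some(child) => node = child,
                // Miss: stop immediately without reading the rest of the key.
                None => return (false, i + 1),
            }
        }
        (node.terminal, key.len())
    }
}
```

With `"x-debug"` inserted, a lookup of `"content-length"` examines only one byte before rejecting, whereas hashing it would touch all fourteen.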
The hash does not need to be computed on the whole string. I pointed this out in my other comment, but just as an example: a hash function could be as simple as xoring the first 16 and last 16 bits of the string and then indexing a 2^16 array. That means hashing is two pointer offsets and an xor (no modulo required). If there are 100 strings that need to be removed, then ~99% of rejections will be a very fast O(1).
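A sketch of that scheme, assuming keys are at least two bytes long (`mini_hash` and `Blocklist` are names made up for this example, not anything from trie-hard):

```rust
// Cheap hash: xor the first and last 16 bits of the key.
fn mini_hash(s: &[u8]) -> u16 {
    // Assumes s.len() >= 2; real code would handle shorter keys.
    let first = u16::from_le_bytes([s[0], s[1]]);
    let last = u16::from_le_bytes([s[s.len() - 2], s[s.len() - 1]]);
    first ^ last
}

struct Blocklist {
    // 2^16 slots; each holds the keys whose mini-hash lands there
    // (usually zero or one, for a list of ~100 strings).
    table: Vec<Vec<String>>,
}

impl Blocklist {
    fn new(words: &[&str]) -> Self {
        let mut table = vec![Vec::new(); 1 << 16];
        for w in words {
            table[mini_hash(w.as_bytes()) as usize].push((*w).to_string());
        }
        Blocklist { table }
    }

    fn contains(&self, s: &str) -> bool {
        // Two offsets and an xor; most misses stop at an empty slot.
        self.table[mini_hash(s.as_bytes()) as usize]
            .iter()
            .any(|w| w == s) // full comparison only on a slot hit
    }
}
```

Note that correctness never depends on the hash distributing well, only speed does: a collision just means one extra full comparison.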
And in the case of a match, a fast hash + memcmp will be way faster than a trie traversal. In fact, according to the trie-hard GitHub README, std::HashMap is already much faster than their trie implementation when searching for a match.
> As they said though, the hash function on a string type requires looking at every element in the string.
Maybe for the default hash function. As another commenter pointed out, your data may make the following hash very effective: s[0] + s[len(s)//2] + s[-1], which would be very fast. The point is that spending a day checking whether such a hash exists for your data is worth it.
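That three-byte hash is trivial to write down (the function name here is invented; whether it distributes well depends entirely on your key set):

```rust
// First byte + middle byte + last byte, as in the comment above.
fn three_byte_hash(s: &[u8]) -> u32 {
    // Assumes non-empty input.
    s[0] as u32 + s[s.len() / 2] as u32 + s[s.len() - 1] as u32
}
```

You would then mask or modulo the result down to your table size, and verify with a quick script that it spreads your actual keys across buckets without pathological clustering.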
CRC32 is the same speed as an integer multiply, going all the way back to Nehalem (2008). 3 cycles latency, and you can start a new one each cycle (or more than one, on Zen 5).
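A sketch of using that instruction as a string hash, assuming x86-64 with SSE4.2 (the byte-at-a-time `_mm_crc32_u8` is used for clarity; the `u64` variant is what you would use for throughput). The portable fallback implements the same CRC-32C polynomial so the two paths agree:

```rust
fn crc32c(data: &[u8]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.2") {
            use std::arch::x86_64::_mm_crc32_u8;
            let mut crc: u32 = 0xFFFF_FFFF;
            for &b in data {
                // SAFETY: guarded by the runtime feature check above.
                crc = unsafe { _mm_crc32_u8(crc, b) };
            }
            return crc ^ 0xFFFF_FFFF;
        }
    }
    // Bit-by-bit software fallback, reflected CRC-32C polynomial 0x82F63B78.
    let mut crc: u32 = 0xFFFF_FFFF;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = (crc >> 1) ^ (0x82F6_3B78 & 0u32.wrapping_sub(crc & 1));
        }
    }
    crc ^ 0xFFFF_FFFF
}
```

Both paths yield the standard CRC-32C check value 0xE3069283 for the input `"123456789"`.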