
Linear scan is very fast, but it's not competing against a worse scan; it's competing against not scanning at all, which is what the hash table is going to do.

At N=4, say we're mapping one 64-bit integer to another for some reason, so that's 16 bytes per entry, four entries, 64 bytes, conveniently sized linear scan table.

The linear scan reads each of the four keys in turn; if one matches, that's a hit, and if none do, we missed. Four 64-bit reads, four 64-bit compares & branches.
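Concretely, something like this (a sketch; the Entry layout and names are just for illustration):

    #include <cstdint>
    #include <optional>

    // Hypothetical 4-entry table mapping one 64-bit integer to another.
    struct Entry { uint64_t key; uint64_t value; };

    std::optional<uint64_t> linear_find(const Entry (&table)[4], uint64_t key) {
        for (const Entry& e : table) {             // four 64-bit key reads
            if (e.key == key) return e.value;      // four compares & branches
        }
        return std::nullopt;                       // all four missed
    }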

A Swiss table big enough for N=4 has eight slots (128 bytes) plus 16 bytes of metadata (total 144 bytes). Half of the slots are empty.

We do a single arithmetic operation on the key to pick the only 16 bytes of metadata which exist, we load that metadata, a single SIMD instruction matches either zero or one entry, and if it matched we load and compare that entry.
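For flavour, the metadata probe looks roughly like this (a sketch of the idea, not Abseil's actual code; I'm assuming SSE2 and the usual one-control-byte-per-slot scheme):

    #include <cstdint>
    #include <emmintrin.h>  // SSE2

    // ctrl holds one byte per slot: a 7-bit hash fragment for full slots,
    // a special marker for empty/deleted ones.
    uint32_t match_group(const uint8_t ctrl[16], uint8_t hash_fragment) {
        __m128i group  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl));
        __m128i needle = _mm_set1_epi8(static_cast<char>(hash_fragment));
        __m128i eq     = _mm_cmpeq_epi8(group, needle);       // one SIMD compare across all 16 bytes
        return static_cast<uint32_t>(_mm_movemask_epi8(eq));  // bitmask of candidate slots
    }

A set bit tells you which slot, if any, to actually load and compare against the key.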

In practical systems you'll get more leeway because Swiss tables want a real hash; your keys probably aren't suitable as-is, so maybe there's a dozen or more ALU operations to hash a key, which is a substantial head start for a linear scan. But hey, we're talking small amounts of data, who even cares about hash quality?
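For scale, even a cheap 64-bit integer mixer (splitmix64's finalizer here, purely as an example) is a handful of multiplies, shifts and xors before you've touched the table at all, and string or composite keys cost considerably more:

    #include <cstdint>

    // splitmix64 finalizer: roughly half a dozen ALU ops per key.
    uint64_t mix64(uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }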

But do measure, I too am interested, at least in a real shoot-out between an actual linear scan and a decent hash table. If "same API as a hash table, but implemented as a linear scan" makes sense for some sizes, maybe it's worth writing that collection type.
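Something in this direction, say (a hypothetical small_flat_map, sketch only, not a real library):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Hash-map-shaped API, linear-scan implementation: no hashing, no buckets.
    template <typename K, typename V>
    class small_flat_map {
        std::vector<std::pair<K, V>> entries_;
    public:
        V* find(const K& key) {
            for (auto& e : entries_)                 // plain linear scan
                if (e.first == key) return &e.second;
            return nullptr;
        }
        void insert_or_assign(const K& key, const V& value) {
            if (V* v = find(key)) { *v = value; return; }
            entries_.emplace_back(key, value);
        }
        std::size_t size() const { return entries_.size(); }
    };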




> Linear scan is very fast, but it's not competing against a worse scan; it's competing against not scanning at all, which is what the hash table is going to do.

Not sure I agree, since "not scanning" would imply that we have a hash table where probing never goes beyond the initial step. That is not a very realistic workload, so I also think that N=4 is way pessimistic. Memory prefetchers, especially with linear access patterns in your code, will hide the latency, so I would expect that N would need to be much, much larger to show the diff.


Probabilistically, the long probe lengths occur for large N, but for large N the linear scan is already horrible. But it's definitely worth actually measuring.
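Even a crude harness would settle it for a given N; here's a sketch, assuming 64-bit integer keys (no warm-up or optimizer barriers, so treat the numbers as rough and reach for a real benchmarking framework before drawing conclusions):

    #include <chrono>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    int main() {
        constexpr std::size_t N = 4;          // vary this: 4, 8, 16, 32, ...
        constexpr int reps = 10'000'000;
        constexpr uint64_t spread = 0x9e3779b97f4a7c15ULL;  // arbitrary key spacing

        std::vector<std::pair<uint64_t, uint64_t>> flat;
        std::unordered_map<uint64_t, uint64_t> hashed;
        for (uint64_t i = 0; i < N; ++i) {
            flat.emplace_back(i * spread, i);
            hashed.emplace(i * spread, i);
        }

        uint64_t sink = 0;

        auto t0 = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r) {
            uint64_t probe = (r % N) * spread;
            for (const auto& e : flat)
                if (e.first == probe) { sink += e.second; break; }
        }
        auto t1 = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r) {
            uint64_t probe = (r % N) * spread;
            auto it = hashed.find(probe);
            if (it != hashed.end()) sink += it->second;
        }
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::printf("linear scan: %lld ms, unordered_map: %lld ms (sink=%llu)\n",
                    (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
                    (long long)std::chrono::duration_cast<ms>(t2 - t1).count(),
                    (unsigned long long)sink);
    }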


The point is that a linear scan is not horrible when the data is held in cache lines that are several times faster than main memory.

That, and branch prediction (or the absence of it), leads to performance characteristics that seem entirely counterintuitive to people of my age who grew up counting cycles on a 68000.


The point, as I have already said once, is that even though it's fast it's not faster than not doing it at all.

For very small containers the linear scan wins because it's the same work you'd have done anyway, so we're cutting to the chase. My conjecture is that they have to be very small. Several people now have insisted that it will somehow just stay faster beyond that, and my guess is that while they've thought about an actual linear scan on modern CPUs, they've chosen to imagine a 1980s closed-addressing hash table, so they're racing against pointer chasing, but that was last century. A modern hash table is just linear data laid out more cleverly, so it's not chasing pointers; it's just not wasting time on your linear scan.

C++ std::unordered_map is going to involve pointer chasing (the "hash table" just has pointers in it; every entry is a separately allocated node), which is probably going to result in a cache miss and give you a lot of time to win the race. A Swiss table is not: it's just M slots plus metadata, where typically M <= 2*N. Several other techniques have the same consequence, but I won't spell them all out.
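Roughly, the two layouts look like this (a sketch of the idea, not either library's real internals):

    #include <cstddef>
    #include <cstdint>
    #include <utility>

    // std::unordered_map-style (closed addressing): a bucket array of pointers,
    // and every hit chases at least one pointer to a separately allocated node.
    struct Node { uint64_t key; uint64_t value; Node* next; };
    struct ChainedTable { Node** buckets; std::size_t bucket_count; };

    // Swiss-table-style (open addressing): one flat allocation, M slots plus a
    // control byte of metadata per slot, nothing to chase. Typically M <= 2*N.
    struct FlatTable {
        uint8_t* ctrl;                          // M control bytes (hash fragment, or empty/deleted markers)
        std::pair<uint64_t, uint64_t>* slots;   // M key/value slots
        std::size_t capacity;                   // M
    };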

Because M > N, if we linear scanned the entire table it would be maybe 2x slower, but we're not going to scan the entire table; that's why it's a hash table. We're probably jumping straight to a single entry, if it's even present.



