
Linear scan is very fast, but it's not competing against a worse scan; it's competing against not scanning at all, which is what the hash table is going to do.

At N=4, say we're mapping one 64-bit integer to another for some reason, so that's 16 bytes per entry, four entries, 64 bytes, conveniently sized linear scan table.

The linear scan reads each of the four keys in turn; if one matches, that's a hit, and if none do, we missed. Four 64-bit reads, four 64-bit compares & branches.
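Concretely, something like this (a sketch; the Entry layout and names are just for illustration):

    #include <cstdint>
    #include <optional>

    // Hypothetical 4-entry table mapping one 64-bit integer to another.
    struct Entry { uint64_t key; uint64_t value; };

    std::optional<uint64_t> linear_find(const Entry (&table)[4], uint64_t key) {
        for (const Entry& e : table) {             // four 64-bit key reads
            if (e.key == key) return e.value;      // four compares & branches
        }
        return std::nullopt;                       // all four missed
    }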

A Swiss table big enough for N=4 has eight slots (128 bytes) plus 16 bytes of metadata (total 144 bytes). Half of the slots are empty.

We do a single arithmetic operation on the key to pick the only 16 bytes of metadata which exist, we load that metadata, a single SIMD instruction matches either zero or one entry, and if it matched we load and compare that entry.
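For flavour, the metadata probe looks roughly like this (a sketch of the idea, not Abseil's actual code; I'm assuming SSE2 and the usual one-control-byte-per-slot scheme):

    #include <cstdint>
    #include <emmintrin.h>  // SSE2

    // ctrl holds one byte per slot: a 7-bit hash fragment for full slots,
    // a special marker for empty/deleted ones.
    uint32_t match_group(const uint8_t ctrl[16], uint8_t hash_fragment) {
        __m128i group  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl));
        __m128i needle = _mm_set1_epi8(static_cast<char>(hash_fragment));
        __m128i eq     = _mm_cmpeq_epi8(group, needle);       // one SIMD compare across all 16 bytes
        return static_cast<uint32_t>(_mm_movemask_epi8(eq));  // bitmask of candidate slots
    }

A set bit tells you which slot, if any, to actually load and compare against the key.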

In practical systems you'll get more leeway because Swiss tables want a real hash; your keys probably aren't suitable as-is, so maybe there's a dozen or more ALU operations to hash a key, which is a substantial head start for a linear scan. But hey, we're talking small amounts of data, who even cares about hash quality?
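For scale, even a cheap 64-bit integer mixer (splitmix64's finalizer here, purely as an example) is a handful of multiplies, shifts and xors before you've touched the table at all, and string or composite keys cost considerably more:

    #include <cstdint>

    // splitmix64 finalizer: roughly half a dozen ALU ops per key.
    uint64_t mix64(uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }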

But do measure, I too am interested, at least in a real shoot-out between an actual linear scan and a decent hash table. If "same API as a hash table, but implemented as a linear scan" makes sense for some sizes, maybe it's worth writing that collection type.
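Something in this direction, say (a hypothetical small_flat_map, sketch only, not a real library):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Hash-map-shaped API, linear-scan implementation: no hashing, no buckets.
    template <typename K, typename V>
    class small_flat_map {
        std::vector<std::pair<K, V>> entries_;
    public:
        V* find(const K& key) {
            for (auto& e : entries_)                 // plain linear scan
                if (e.first == key) return &e.second;
            return nullptr;
        }
        void insert_or_assign(const K& key, const V& value) {
            if (V* v = find(key)) { *v = value; return; }
            entries_.emplace_back(key, value);
        }
        std::size_t size() const { return entries_.size(); }
    };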




> Linear scan is very fast, but it's not competing against a worse scan; it's competing against not scanning at all, which is what the hash table is going to do.

Not sure I agree, since "not scanning" would imply that we have a hash table where probing never goes beyond the initial step. That is not a very realistic workload, so I also think that N=4 is way pessimistic. Memory prefetchers, especially with linear access patterns in your code, will hide the latency, so I would expect that N would need to be much, much larger to show the diff.


Probabilistically, the long probe lengths occur for large N, but for large N the linear scan is already horrible. But it's definitely worth actually measuring.
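Even a crude harness would settle it for a given N; here's a sketch, assuming 64-bit integer keys (no warm-up or optimizer barriers, so treat the numbers as rough and reach for a real benchmarking framework before drawing conclusions):

    #include <chrono>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    int main() {
        constexpr std::size_t N = 4;          // vary this: 4, 8, 16, 32, ...
        constexpr int reps = 10'000'000;
        constexpr uint64_t spread = 0x9e3779b97f4a7c15ULL;  // arbitrary key spacing

        std::vector<std::pair<uint64_t, uint64_t>> flat;
        std::unordered_map<uint64_t, uint64_t> hashed;
        for (uint64_t i = 0; i < N; ++i) {
            flat.emplace_back(i * spread, i);
            hashed.emplace(i * spread, i);
        }

        uint64_t sink = 0;

        auto t0 = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r) {
            uint64_t probe = (r % N) * spread;
            for (const auto& e : flat)
                if (e.first == probe) { sink += e.second; break; }
        }
        auto t1 = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r) {
            uint64_t probe = (r % N) * spread;
            auto it = hashed.find(probe);
            if (it != hashed.end()) sink += it->second;
        }
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::printf("linear scan: %lld ms, unordered_map: %lld ms (sink=%llu)\n",
                    (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
                    (long long)std::chrono::duration_cast<ms>(t2 - t1).count(),
                    (unsigned long long)sink);
    }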


The point is that a linear scan is not horrible when the data is held in cache lines that are several times faster than main memory.

That, and branch prediction (or the absence of it), leads to performance characteristics that seem entirely counterintuitive to people of my age who grew up counting cycles on a 68000.


The point, as I have already said once, is that even though it's fast it's not faster than not doing it at all.

For very small containers the linear scan wins because it's the same work you'd have done anyway, so we're cutting to the chase. My conjecture is that they have to be very small. Several people now have insisted that it will somehow just stay faster beyond that, and my guess is that while they've thought about an actual linear scan on modern CPUs, they've chosen to imagine a 1980s closed-addressing hash table, so they're racing against pointer chasing, but that was last century. A modern hash table is just linear data laid out more cleverly, so it's not chasing pointers; it's just not wasting time on your linear scan.

C++ std::unordered_map is going to involve pointer chasing (the "hash table" just has pointers in it; every entry is a separately allocated node), which is probably going to result in a cache miss and give you a lot of time to win the race. A Swiss table is not: it's just M slots plus metadata, where typically M <= 2*N. Several other techniques have the same consequence, but I won't spell them all out.
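Roughly, the two layouts look like this (a sketch of the idea, not either library's real internals):

    #include <cstddef>
    #include <cstdint>
    #include <utility>

    // std::unordered_map-style (closed addressing): a bucket array of pointers,
    // and every hit chases at least one pointer to a separately allocated node.
    struct Node { uint64_t key; uint64_t value; Node* next; };
    struct ChainedTable { Node** buckets; std::size_t bucket_count; };

    // Swiss-table-style (open addressing): one flat allocation, M slots plus a
    // control byte of metadata per slot, nothing to chase. Typically M <= 2*N.
    struct FlatTable {
        uint8_t* ctrl;                          // M control bytes (hash fragment, or empty/deleted markers)
        std::pair<uint64_t, uint64_t>* slots;   // M key/value slots
        std::size_t capacity;                   // M
    };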

Because M > N, if we linear scanned the entire table it would be maybe 2x slower, but we're not going to scan the entire table; that's why it's a hash table. We're probably jumping straight to a single entry, if it's even present.



