The main point is that big-O notation ignores cache misses.
- try writing a quicksort as a template so the comparator is inlined, avoiding an indirect function call per comparison
- in the sorting loop, switch to an insertion sort when a bucket has <= 32 elements
You will see speedups of 2 to 3x over a vanilla quicksort implementation.
Although this approach is slightly more complex at the abstract level, it's faster because it avoids a lot of cache misses.
O(n) + O(n) = O(2n) = O(n) = O(n + 1) = O(n) + O(1)
Cache performance is just one example of a constant factor, right?
The point is to reuse data that was recently fetched into the (faster) cache memory as much as possible, instead of incurring the penalty of a cache miss.
Cache performance really depends on memory access patterns.
Anyways :) I'm being pedantic here, I should probably go back to work.
Yes, random access (missing the cache every time) may be 100x slower than sequential access (hitting the cache almost always), but if you iterate through an array twice as large, it will still be just 100x slower: the factor is constant, it doesn't grow with n.
I don't know about you, but I don't sort 100k records in a single batch, and if I am, it's because I messed up. But I might sort different batches of 100 records 1000 times a minute.