
Of course this was a single example.

The main point is that the big O notation ignores the cache misses.

For example:

- try writing a quicksort as a template to avoid the comparison function calls

- in the sorting loop, switch to an insertion sort when buckets have <= 32 elements

You will see speedups of 2 to 3 times over a vanilla quicksort implementation.

Although this approach is slightly more complex at the abstract level, it's faster because it eliminates a lot of cache misses.
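A minimal sketch of that hybrid sort, under my own assumptions (function names and the median-of-three pivot are mine; the 32-element cutoff matches the suggestion above). Taking the comparator as a template parameter lets the compiler inline the comparisons instead of calling through a function pointer:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort: cheap on tiny ranges that already sit in cache.
template <typename T, typename Compare>
void insertion_sort(T* a, std::size_t n, Compare cmp) {
    for (std::size_t i = 1; i < n; ++i) {
        T key = std::move(a[i]);
        std::size_t j = i;
        while (j > 0 && cmp(key, a[j - 1])) {
            a[j] = std::move(a[j - 1]);
            --j;
        }
        a[j] = std::move(key);
    }
}

// Hybrid quicksort: the comparator is a template parameter (inlinable),
// and partitions of <= 32 elements fall back to insertion sort.
template <typename T, typename Compare>
void hybrid_quicksort(T* a, std::size_t n, Compare cmp) {
    while (n > 32) {
        // Median-of-three pivot, moved to the last slot.
        std::size_t mid = n / 2;
        if (cmp(a[mid], a[0]))       std::swap(a[mid], a[0]);
        if (cmp(a[n - 1], a[0]))     std::swap(a[n - 1], a[0]);
        if (cmp(a[n - 1], a[mid]))   std::swap(a[n - 1], a[mid]);
        std::swap(a[mid], a[n - 1]);

        // Lomuto partition around the pivot at a[n - 1].
        std::size_t store = 0;
        for (std::size_t i = 0; i + 1 < n; ++i)
            if (cmp(a[i], a[n - 1])) std::swap(a[i], a[store++]);
        std::swap(a[store], a[n - 1]);

        // Recurse into the smaller side, loop on the larger one,
        // to keep stack depth logarithmic.
        if (store < n - store - 1) {
            hybrid_quicksort(a, store, cmp);
            a += store + 1;
            n -= store + 1;
        } else {
            hybrid_quicksort(a + store + 1, n - store - 1, cmp);
            n = store;
        }
    }
    insertion_sort(a, n, cmp);
}
```

Usage would be e.g. `hybrid_quicksort(v.data(), v.size(), [](int a, int b){ return a < b; });` with the lambda inlined at the call sites.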

Surely the main point is that

    O(n) + O(n) = O(2n) = O(n) = O(n + 1) = O(n) + O(1)
and the constant factors left out of O() often dominate execution time.

Cache performance is just one example of a constant factor, right?

Cache performance is not constant; if it were, we wouldn't even consider it when optimizing.

The point is to use data that was recently fetched into the (faster) cache memory as much as possible, instead of incurring the penalty of a cache miss.

Cache performance really depends on memory access patterns.
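A small sketch of what "depends on memory access patterns" means in practice (function names are mine): the two loops below compute the same sum over the same row-major matrix, but one walks it in storage order while the other strides across it, so on a large enough matrix the second one misses the cache far more often.

```cpp
#include <cstddef>
#include <vector>

// Row-by-row traversal of a row-major matrix: consecutive addresses,
// so each cache line fetched is fully used before it is evicted.
long long sum_row_major(const std::vector<long long>& m,
                        std::size_t rows, std::size_t cols) {
    long long s = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += m[r * cols + c];   // stride of 1 element
    return s;
}

// Same sum, column-by-column: each access jumps `cols` elements ahead,
// wasting most of every cache line it pulls in.
long long sum_col_major(const std::vector<long long>& m,
                        std::size_t rows, std::size_t cols) {
    long long s = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += m[r * cols + c];   // stride of `cols` elements
    return s;
}
```

Same big-O, same result, very different constant factor.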

Anyways :) I'm being pedantic here, I should probably go back to work.

Memory access time is a constant multiplier, whether it hits or misses the cache. There can be a big difference in that constant depending on the access pattern, but it is still considered a constant factor.

Yes, random access (missing the cache every time) may be 100x slower than sequential (hitting the cache almost always), but if you're iterating through an array twice as large, it will still be 100x slower.

I agree; on the same machine, the slowdowns are constant for each level of cache and for RAM access.

One of my favorite ACM papers analyzed how the asymptotic runtimes of various sorting algorithms didn't tell the full story: the asymptotically faster algorithms needed a pretty big data set before they even broke even with some of the others. If I recall correctly, even at 10,000 records the 'better' algorithm was still about 30% slower than a 'worse' one, and you had to sort a considerable amount of data before it actually won.

I don't know about you, but I don't sort 100k records in a single batch, and if I am it's because I messed up. But I might sort different batches of 100 records 1000 times a minute.

Not necessarily. You might want to create an algorithm that iterates in a sequential fashion through memory, since your memory controller will see the iteration and prefetch the next block of memory. This is a huge performance win on larger data structures.
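A sketch of that difference (function names mine): both walks below visit every element of the same array once and produce the same sum, but only the first presents a predictable address stream that the prefetcher can run ahead of; in the second, each address depends on the previous load, so with a scattered `next` permutation the hardware has nothing to prefetch.

```cpp
#include <cstddef>
#include <vector>

// Sequential walk: addresses increase monotonically, so the hardware
// prefetcher can stream the upcoming cache lines in before they're used.
long long walk_sequential(const std::vector<long long>& data) {
    long long s = 0;
    for (std::size_t i = 0; i < data.size(); ++i)
        s += data[i];
    return s;
}

// Dependent walk: the next index is only known once the current load
// completes. `next` is assumed to encode a permutation forming a single
// cycle over all indices, so every element is still visited exactly once.
long long walk_dependent(const std::vector<long long>& data,
                         const std::vector<std::size_t>& next) {
    long long s = 0;
    std::size_t i = 0;
    for (std::size_t step = 0; step < data.size(); ++step) {
        s += data[i];
        i = next[i];
    }
    return s;
}
```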
