
You’re stretching there imho.

L1 and L2 caches are per core. A cache line is 64 bytes, and typical per-core sizes are on the order of 32 KB for L1 and 256 KB for L2.

Reading data from L1 takes on the order of 1 nanosecond (or less); a read that goes all the way to RAM is on the order of 50 nanoseconds.

If you’re scanning an array and load a dozen cache lines sequentially, that’s almost certainly preferable to a handful of scattered cache misses: back-of-envelope, a dozen prefetched lines cost roughly 12 × 1 ns, while even three misses that go to RAM cost around 3 × 50 ns.

Memory access is very often an application’s bottleneck. The answer is almost always more arrays and fewer pointers.



> The answer is almost always more arrays and fewer pointers.

The number of people who dismiss the lowly array is way too high. Arrays are fast. Keep your data in the cache and it's 10x or more faster than going out to memory, and plain flat arrays are almost always faster than literally any OOP nonsense. By "almost always" I mean that I've never once encountered a situation where flat arrays weren't the fastest solution attempted; I haven't seen everything, so I can't claim they're always fastest.

People really don't understand how fast their computers really are, thanks to developers not caring how fast their code is.



