Hm. On my 7300HQ with single-channel 2667MHz DDR4, my hand-crafted assembly code runs through 512 megs of random data (Mersenne Twister) looking for a binary needle (8 bytes long for this example) in 59 milliseconds. The needle is, of course, at the very end of the 512 megs. No SSE, on a single core, and it scales nearly perfectly.
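For anyone who wants to try the same setup, here is a rough Python sketch of the test, not the assembly itself. The function names, the seed, and the needle bytes are made up for illustration, and `bytes.find` stands in for the hand-written search loop, so its timings say nothing about the assembly version.

```python
import random
import time


def build_haystack(size, needle, seed=0):
    """Return `size` bytes of random data with `needle` as the final bytes.

    CPython's random.Random is a Mersenne Twister (MT19937).
    randbytes() requires Python 3.9 or newer.
    """
    rng = random.Random(seed)  # the seed is an arbitrary choice
    body = rng.randbytes(size - len(needle))
    return body + needle  # note: this concatenation briefly doubles memory use


def timed_find(haystack, needle):
    """Return (index of first match, elapsed seconds) for a single search."""
    start = time.perf_counter()
    index = haystack.find(needle)  # stand-in for the hand-written search
    elapsed = time.perf_counter() - start
    return index, elapsed


def throughput_gb_per_s(num_bytes, seconds):
    """Bytes scanned per second, in decimal gigabytes (1e9 bytes)."""
    return num_bytes / seconds / 1e9
```

Calling `timed_find(build_haystack(512 * 1024 * 1024, needle), needle)` with any 8-byte `needle` mirrors the setup described above. Plugging in the quoted numbers, `throughput_gb_per_s(512 * 1024 * 1024, 0.059)` comes out at about 9.1 GB/s. If "2667MHz" means DDR4-2667 on one 64-bit channel, the theoretical peak is 2667 MT/s × 8 bytes, roughly 21.3 GB/s, so the scan runs at well under half of that.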
Not sure what to make of this article. In the end, what truly matters is knowing how a CPU operates and what not to do if you want your code to actually run fast, not just "perceived fast".
That being said, I still consider the 59ms to be slow.