I, and my data, disagree. Most memcmp() calls look at a single byte and return. ... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		jws on May 14, 2009 \| parent \| context \| favorite \| on: Exploring memcmp I, and my data, disagree. Most memcmp() calls look at a single byte and return. Let us consider five test cases: memcmp0 - Calling memcmp() at each byte in the file looking ofr a match, as the original article. memcmp1 - As above, but optimize by checking the first character and only calling if it matches. memmem0 - A single call to the glibc memmem() function. Not portable. kmp0 - The Knuth-Morris-Pratt algorithm as implemented at "Exact String Matching Algorithms", http://www-igm.univ-mlv.fr/~lecroq/string/index.html bm0 - The Boyer Moore algorithm, also from above Looking for a 1k needle which lies at the end of a 100MB haystack of random bytes, with the data in file system buffers on a 1.6GHz Atom processor running Linux 2.6.29: `algorithm milliseconds memcmp0 2400 ================================== memcmp1 530 ======== memmem0 460 ======= kmp0 970 ============== bm0 70 =` On the whole codesink1 wins with the Boyer Moore suggestion. In broader terms, algorithm selection beats optimization and all is right with the CS universe. (KMP's poor showing surprises me, but I've not used it before and don't know what to expect.) If we assume the data is out on a hard drive, then we get these numbers (same machine, a WD Green Power HD which is an evil thing to benchmark with because it has variable spindle and seek speeds, but suffice it to say it is not terribly fast). All OS caches flushed before each test. `algorithm milliseconds memcmp0 3500 =================================== memcmp1 1300 ============= memmem0 1200 ============ kmp0 1700 ================= bm0 1100 ===========` Here disk read time sets a lower bound, algorithm is still important, but not so much as before. I have a sneaky suspicion that mmap() is a poor choice here. There doesn't seem to be much overlap between the IO and compute when looking at the memcmp0 times. Testing on a beefier machine, with a Core 2 Duo 2.8GHz, OS X gives this: `algorithm milliseconds memcmp0 1100 ========================= memcmp1 180 ==== memmem0 not portable kmp0 360 ======== bm0 88 ==` The processor is several times faster, but bringing pages in from the OS is slightly slower. I don't know how to flush OS buffers on this machine so I can't do the disk reading tests. Update: well crap. this fell of the front page before I finished testing. No one will see these results, but I enjoyed the tests anyway. It is useful to update my rules of thumb and preconceived notions.

tptacek on May 14, 2009 | [–]

I'm impressed with the benchmark, but it seems to me like using the right algorithm for the job (in this case, Boyer-Moore or memmem) beats hacking memcmp to do a job it shouldn't be employed to do.

jmtulloss on May 14, 2009 | | [–]

I see it, and I'm impressed. I'll link to it for you!

codesink on May 14, 2009 | [–]

very good analysis, thanks for sharing that

Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact