
I wrote my own duplicate file finder way back in the day.

I did the obvious trick of binning by size before computing any hashes, and was mildly surprised to find how few of my ~million files had exactly the same size.

For multiple files with identical size I just did the full-file MD5. We only had HDDs back then, and we all know how much they like random access.
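For anyone curious, here is a rough Python sketch of that approach: bin by size first, then do a full sequential MD5 only within bins that hold more than one file. This is not the original code; the names `file_md5` and `find_duplicates` and the 1 MiB chunk size are my own assumptions for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Hash the whole file front to back in chunks (sequential reads only)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size and MD5 both match."""
    # Pass 1: bin every regular file by size; this needs no file reads.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, cannot be a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_md5(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates("/some/dir")` gives back one list per set of matching files. Matching MD5s are treated as duplicates here without a byte-for-byte comparison, which is a simplification.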



I wrote one too, over 20 years ago. That .exe still works, even today. Unsurprisingly, I was using CRC32 too. When I look at the code now I cringe at what a mess it is. Oh well, everyone has to start somewhere.



