But why use MD5 anyway? Google has quite an impressive image search algorithm; why not just use that? MD5 won't help here: it only matches files that are byte-for-byte identical, and any other match is a collision, which almost certainly is /not/ the same image. You'd need some kind of rolling hash whose output grows with the length of the file, and then do a soft (fuzzy) comparison on that.
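A minimal Python sketch of the contrast, under some assumptions of mine: MD5 from the standard library stands in for the exact check, and, as one possible reading of "rolling hash, then a soft check", a Rabin-Karp style rolling hash is taken over every 8-byte window and two files are compared by the Jaccard similarity of their window-hash sets. The window size, `BASE`, `MOD`, and the function names (`md5_hex`, `rolling_fingerprint`, `similarity`) are illustrative choices, not anything from the post.

```python
import hashlib

BASE = 257             # polynomial base for the rolling hash (illustrative choice)
MOD = (1 << 61) - 1    # large prime modulus (illustrative choice)


def md5_hex(data: bytes) -> str:
    """Exact check: changing even one byte gives an unrelated digest."""
    return hashlib.md5(data).hexdigest()


def rolling_fingerprint(data: bytes, window: int = 8) -> set[int]:
    """Hash every `window`-byte slice; the number of hashes grows with file length."""
    window = min(window, len(data))
    if window == 0:
        return set()
    high = pow(BASE, window - 1, MOD)  # weight of the byte that leaves the window
    h = 0
    for b in data[:window]:            # hash of the first window
        h = (h * BASE + b) % MOD
    hashes = {h}
    for i in range(window, len(data)):
        # Slide one byte: drop data[i - window], append data[i].
        h = ((h - data[i - window] * high) * BASE + data[i]) % MOD
        hashes.add(h)
    return hashes


def similarity(a: bytes, b: bytes, window: int = 8) -> float:
    """Soft check: Jaccard similarity of the two fingerprint sets (0.0 to 1.0)."""
    fa, fb = rolling_fingerprint(a, window), rolling_fingerprint(b, window)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    # Deterministic pseudo-random "file" contents, 2048 bytes each.
    original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(64))
    other = b"".join(hashlib.sha256(b"x" + i.to_bytes(4, "big")).digest() for i in range(64))
    edited = bytearray(original)
    edited[1000] ^= 0xFF               # flip one byte
    edited = bytes(edited)

    print("MD5 match:", md5_hex(original) == md5_hex(edited))      # False
    print(f"soft similarity (edited): {similarity(original, edited):.3f}")
    print(f"soft similarity (unrelated): {similarity(original, other):.3f}")
```

The one-byte edit breaks the MD5 match completely, while the rolling fingerprint only loses the few windows that overlap the changed byte, so the soft score stays close to 1. This still compares raw bytes, though, so a re-encoded or resized copy of the same picture would score low.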