

Microsoft releases tool to spot abuse images - jorD8
http://www.bbc.co.uk/news/technology-33567981

======
nacs
So apparently it's a cloud-only service: you have to upload all images to
Microsoft to use it.

Also, to use the service you have to apply as a business and be approved
manually.

This would have been more useful as software that runs directly on your
server and checks images at upload time (or scans existing local user
content), instead of requiring you to re-upload the content to their cloud,
not to mention the privacy implications.

Another puzzling thing is that they say MS doesn't retain the images and that
they're converting the content to hashes immediately and comparing against
known abuse hashes. Wouldn't it be more efficient for everyone if the hash
were generated locally and only the hash sent up to their cloud?

~~~
cmdrfred
I agree that sending the hash is the best implementation if it works as you
have described, but that system would be trivial to circumvent (convert the
.png to a .jpg, for example). Maybe they convert all the images to a common
file type and then hash them?
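
To illustrate why a plain hash of the file bytes is easy to defeat, here is a
minimal Python sketch. SHA-256 is only a stand-in for "a hash of the file
contents" (an assumption; nothing here is PhotoDNA), and the byte strings are
made-up placeholders rather than real image data. Flipping a single bit, which
re-encoding a .png as a .jpg would do many times over, gives an unrelated
digest.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


# Placeholder bytes standing in for an image file (not a real PNG).
original = bytes(range(256)) * 4

# Same content with one bit flipped in the first byte.
tweaked_buf = bytearray(original)
tweaked_buf[0] ^= 1
tweaked = bytes(tweaked_buf)

digests_match = file_digest(original) == file_digest(tweaked)

# Count how many of the 256 digest bits differ between the two inputs.
differing_bits = bin(
    int(file_digest(original), 16) ^ int(file_digest(tweaked), 16)
).count("1")

print("digests match:", digests_match)
print("differing digest bits:", differing_bits)
```

An exact-match lookup on digests like these would miss the tweaked copy, which
is why the hash would need to be computed on normalized image content rather
than on the file as stored.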

~~~
nacs
The hash they use is more complicated than that:

    
    
      converts images into a common black-and-white format 
      and uniform size, then divides the image into squares 
      and assigns a numerical value that represents the 
      unique shading found within each square. Together, 
      those numerical values represent the "PhotoDNA 
      signature" or "hash" of an image
    

\- [https://www.microsoft.com/en-us/PhotoDNA/FAQ](https://www.microsoft.com/en-us/PhotoDNA/FAQ)
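
The steps in that FAQ description can be sketched roughly as below. This is
not PhotoDNA, whose actual algorithm is not given in the quote; it is a
simple block-average signature that follows the same outline (grayscale,
uniform size, squares, one number per square). The function names
`to_gray`, `resize_nearest`, and `block_signature`, the 32x32 working size,
and the 4x4 grid are all assumptions made for illustration, and images are
plain nested lists of RGB tuples so the sketch needs no imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def to_gray(img: Image) -> List[List[int]]:
    """Convert RGB pixels to 0..255 luminance using integer weights."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in img]


def resize_nearest(gray: List[List[int]], size: int) -> List[List[int]]:
    """Resample to size x size using nearest-neighbour sampling."""
    h, w = len(gray), len(gray[0])
    return [[gray[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def block_signature(img: Image, size: int = 32, grid: int = 4) -> List[int]:
    """Return one average-shade value per square of a grid x grid layout."""
    gray = resize_nearest(to_gray(img), size)
    step = size // grid
    signature = []
    for by in range(grid):
        for bx in range(grid):
            total = sum(gray[y][x]
                        for y in range(by * step, (by + 1) * step)
                        for x in range(bx * step, (bx + 1) * step))
            signature.append(total // (step * step))
    return signature


# A synthetic horizontal gradient, plus the same gradient at twice the size.
small = [[(x * 4, x * 4, x * 4) for x in range(64)] for _ in range(64)]
large = [[(x * 2, x * 2, x * 2) for x in range(128)] for _ in range(128)]

print(block_signature(small) == block_signature(large))
```

Because both test images are normalized to the same working size before the
squares are averaged, the resized copy produces the same signature here. A
real system would presumably also need to tolerate compression noise, which
this sketch does not attempt.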

What I was suggesting was that the hash generation tool they use be
distributed so the hash can be generated locally before sending to MS. This
way MS can continue to keep all known hashes to themselves while making it
easier for businesses to bulk check images without huge bandwidth charges or
privacy issues.

~~~
cmdrfred
It appears we are in agreement. The only reason I can think of for doing it
this way is that they want to keep the PhotoDNA algorithm private. Imagine the
overhead on their end as well. The best approach might be to download the
hashes to each server and compare them locally, like a virus scanner.
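
A rough sketch of that virus-scanner style check, in Python. Everything here
is hypothetical: the signature format (a short list of per-square shade
values), the `distance` and `find_match` helpers, the tolerance of 8, and the
sample signatures are invented for illustration and say nothing about how
Microsoft stores or compares real PhotoDNA hashes.

```python
from typing import List, Optional, Sequence, Tuple


def distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equal-length signatures."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def find_match(candidate: Sequence[int],
               known: List[Sequence[int]],
               max_distance: int = 8) -> Optional[Tuple[int, int]]:
    """Return (index, distance) of the closest known signature within
    max_distance, or None if nothing in the local list is close enough."""
    best: Optional[Tuple[int, int]] = None
    for idx, sig in enumerate(known):
        d = distance(candidate, sig)
        if d <= max_distance and (best is None or d < best[1]):
            best = (idx, d)
    return best


# Made-up local signature list, as if downloaded ahead of time.
known_signatures = [
    [10, 20, 30, 40],
    [200, 180, 160, 140],
]

print(find_match([11, 19, 30, 42], known_signatures))
print(find_match([100, 100, 100, 100], known_signatures))
```

The trade-off is the one raised above: matching stays local and cheap, but
the full signature list has to be shipped to every server.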

