Show HN: Generating Collisions on NeuralHash (gist.github.com)
22 points by GistNoesis on Aug 9, 2021 | 6 comments



This is a proof of concept of generating collisions on a perceptual hash similar to the one Apple recently decided to use to scan your devices for CSAM. (https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...)

Perceptual hashes are not cryptographically secure, so no matter how much crypto you add on top of them, they remain a weak point of attack for the whole system.

To create a collision, you search for an input image that minimizes the distance between its hash and the target hash. Because neural networks are continuous, you can use a gradient method.
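
Here is a rough sketch of that search in PyTorch. The network below is a toy stand-in, not Apple's NeuralHash: the architecture, the 96-bit hash length, the margin and the learning rate are all assumptions for illustration.

    # Toy gradient-based collision search against a neural perceptual hash.
    # The network is a stand-in, NOT Apple's NeuralHash.
    import torch
    import torch.nn as nn

    class ToyPerceptualHash(nn.Module):
        def __init__(self, bits=96):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4),
            )
            self.proj = nn.Linear(32 * 4 * 4, bits)

        def forward(self, x):
            # Real-valued logits; the discrete hash is sign(logits) > 0.
            return self.proj(self.features(x).flatten(1))

    hasher = ToyPerceptualHash().eval()

    # Target hash, e.g. computed from some source image we want to collide with.
    source = torch.rand(1, 3, 64, 64)
    with torch.no_grad():
        target_bits = (hasher(source) > 0).float()

    # Start from an arbitrary "innocent" image and descend on a differentiable
    # surrogate of the Hamming distance between its hash and the target hash.
    img = torch.rand(1, 3, 64, 64, requires_grad=True)
    opt = torch.optim.Adam([img], lr=1e-2)

    for step in range(500):
        opt.zero_grad()
        logits = hasher(img)
        signs = target_bits * 2 - 1          # map {0,1} -> {-1,+1}
        # Hinge-style loss: push each logit to the correct side of zero.
        loss = torch.relu(0.1 - signs * logits).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            img.clamp_(0, 1)                 # stay a valid image

    collision = (hasher(img) > 0).float()
    print("matching bits:", int((collision == target_bits).sum()), "/", target_bits.numel())

To get a natural-looking result, you would also add a term keeping the image close to a chosen starting picture (e.g. a meme).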

I don't have the architecture or the weights of the actual neural hash they use, so this is only a POC in the general case.

What does this mean? An attacker can easily produce natural-looking images that have a specified perceptual hash. For example, an attacker can generate a meme image which looks normal but whose hash is inside the CSAM database (he can easily get one of these bad hash values by computing the neural hash of a known offensive image). Then he sends you a mail, you save the image to your cloud because you find it funny (or because your phone automatically backs up your mail to your cloud), a collision is registered, and you get arrested (if the manual review fails, for example because the attacker has steganographically hidden offensive content in those images).


>I don't have the architecture or the weights of the actual neural hash they use, so this is only a POC in the general case.

Then why is this titled "Generating Collisions on NeuralHash", and why are you making claims about how easy it is to fool NeuralHash based on it?


It's a shorter version of "Generating Collisions on a Neural Perceptual Hash like Apple's NeuralHash" optimized for click-baitingness ;), but the spirit of the attack is there.

It's a pretty standard attack based on the fact that they use a neural network: the function is continuous(ly differentiable) and therefore vulnerable to (gradient-)optimisation-based attacks. Neural networks are famous for being susceptible to adversarial attacks.

I don't own an iPhone, so that's as far as I can go. (If you have one, jailbreak it, then locate and copy the weights.)

Unless their version of NeuralHash has additional security they don't mention in their technical documentation, it has the same weakness as this standard, generic neural-network perceptual hash.

The neural network is considered public, as it runs on-device without homomorphic encryption. They also can't change it very often, because each time they would need to recompute the hashes of the database of sensitive material they don't own (each time leaking more information about the raw sensitive material).


In the manual review failure case, you're saying that the reviewer would decode the steganographically hidden content, discover the offensive material, and thereby decide that it's a true positive?

And related to that, I wonder how steganographically hidden offensive content would affect a perceptual hash. Would it help or hurt the similarity measure if an attacker were trying to generate a benign-looking collision in the manner described?


My guess is that adding steganographically hidden content is almost free for the attacker, as perceptual hashes are resistant to small variations.

There is a free parameter called gap in my code which helps make sure that you can modify the image a little bit and still get the same hash.
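
Roughly, and only as a guess at what such a gap does (this is not the exact code from the gist): the optimisation pushes each logit past a margin of ±gap rather than just past zero, so small later perturbations of the pixels are unlikely to flip any hash bit.

    # Hypothetical margin loss with an explicit `gap` parameter (illustrative,
    # not the gist's actual code): a larger gap makes the forged hash more
    # robust to small pixel changes such as LSB steganography.
    import torch

    def gap_loss(logits, target_bits, gap=1.0):
        signs = target_bits * 2 - 1          # {0,1} -> {-1,+1}
        return torch.relu(gap - signs * logits).sum()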

So the attacker can probably use the least significant bit of each pixel as a hidden channel. Take 8 pictures and you can hide any grey image (or hide them all in a single picture if the original image is high resolution, since the perceptual hash effectively acts on thumbnails).
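
For example, a minimal LSB scheme (purely illustrative, nothing Apple- or gist-specific):

    # Hide one bit-plane of a secret image in the least significant bits of a
    # cover image, and read it back. Max per-pixel change is 1 out of 255.
    import numpy as np

    def hide_bitplane(cover, secret_bits):
        # cover: uint8 HxWx3, secret_bits: 0/1 array of the same shape.
        return (cover & 0xFE) | secret_bits.astype(np.uint8)

    def extract_bitplane(stego):
        return stego & 0x01

    cover = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
    secret = np.random.randint(0, 2, (64, 64, 3), dtype=np.uint8)

    stego = hide_bitplane(cover, secret)
    assert np.array_equal(extract_bitplane(stego), secret)
    print("max pixel change:", int(np.abs(stego.astype(int) - cover.astype(int)).max()))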

The human eye won't be able to see anything special in the picture, but forensic software will definitely notice that something's off in the least significant bits of your pictures and raise a red flag. Then once the reviewer figures out the steganographic scheme, he will decode the hidden picture, and after an a-ha moment he will be 100% convinced that this is offensive content and that it should be reported.


What you're highlighting is the (intentional) lack of avalanche effect in perceptual hashes. This makes them great for identifying transformed content (e.g. local TV channels via a cable box, ads on channels with slide-in lower-third tickers, etc.) but horrible as an evidentiary basis.
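
For illustration, a toy dHash-style perceptual hash next to SHA-256 (the dHash here is a simplified stand-in, not any production hash): flipping one low-order bit of one pixel scrambles the cryptographic hash completely but typically leaves the perceptual hash untouched.

    # Contrast: cryptographic hash (avalanche) vs. a toy dHash-style
    # perceptual hash (deliberately no avalanche).
    import hashlib
    import numpy as np

    def dhash_bits(img):
        # Toy difference hash: compare horizontally adjacent pixels of a
        # small grayscale thumbnail (9x8 grid -> 64 bits).
        ys = np.linspace(0, img.shape[0] - 1, 8).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, 9).astype(int)
        thumb = img[np.ix_(ys, xs)].astype(int)
        return (thumb[:, 1:] > thumb[:, :-1]).astype(np.uint8).ravel()

    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, (256, 256), dtype=np.uint8)
    tweaked = img.copy()
    tweaked[0, 0] ^= 1                        # flip one least-significant bit

    # SHA-256: a single-bit change flips roughly half the output bits.
    print(hashlib.sha256(img.tobytes()).hexdigest()[:16])
    print(hashlib.sha256(tweaked.tobytes()).hexdigest()[:16])

    # Perceptual hash: the same change usually flips zero of the 64 bits.
    print("dHash Hamming distance:", int((dhash_bits(img) != dhash_bits(tweaked)).sum()))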

That said, there's no need to clickbait here. Perceptual hashes are going to be a hot topic for a while.



