Why would anyone bother with such an attack? The end result is that some peon at Apple has to look at the images and mark them as not CSAM. You've cost someone a bit of privacy, but that's it.
It's entirely possible to alter an image such that its raw form looks different from its scaled form [0]. A government, or just a well-resourced group, can take a legitimate CSAM image and modify it such that, when scaled for use in the perceptual algorithm(s), it changes into some politically sensitive image. Upon review it'll look like CSAM, so off it goes to the reporting agencies.
Because the perceptual hash algorithms are presented as black boxes, the image they perceive isn't audited or reviewed. There's zero recognition of this weakness by Apple or NCMEC (and their equivalents). For the system to even begin to be trustworthy, all content would need to be reviewed both raw and as-scaled-and-fed-into-the-algorithm.
This attack does seem easily defeated, even naively, by downscaling by three different means (bicubic, nearest neighbor, Lanczos, etc.) and rejecting the downscale that most differs from the other two, since the attack is tailored to a specific downscaling algorithm -- the attack seems to only be effective against systems that make no effort at all to safeguard against it.
Granted, Apple makes no mention of any safeguard, but it would be trivial in principle to protect against, and is not an unavoidable failing.
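Something like the cross-check described above fits in a few lines, e.g. (a rough sketch, assuming a recent Pillow; the choice of filters and the threshold are arbitrary, not anything Apple has published):

    # Toy defense against image-scaling attacks: downscale with several
    # unrelated filters and flag the image if any pair of thumbnails
    # disagrees sharply. Filter set and threshold are arbitrary choices.
    import numpy as np
    from PIL import Image

    def scaling_attack_suspected(path, size=(64, 64), threshold=25.0):
        img = Image.open(path).convert("L")
        filters = [Image.Resampling.NEAREST,
                   Image.Resampling.BICUBIC,
                   Image.Resampling.LANCZOS]
        thumbs = [np.asarray(img.resize(size, resample=f), dtype=np.float32)
                  for f in filters]
        # Mean absolute difference for every pair of thumbnails.
        diffs = [np.abs(a - b).mean()
                 for i, a in enumerate(thumbs) for b in thumbs[i + 1:]]
        # Benign photos downscale to roughly the same thumbnail under every
        # filter; an attack tailored to one filter makes that thumbnail
        # diverge wildly from the other two.
        return max(diffs) > threshold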
The objective of being mindful of the thumbnail is to fool the human reviewer responsible for alerting the police to your target's need for a good swatting - the algorithm has already flagged the image by the time it is presented as a thumbnail during review.
You'd basically start off with an image known (or very likely) to be cataloged in a CP hash database.
Note its NeuralHash.
Find a non-CP image that would, after being scaled down or otherwise sanitized, fool an unaccountable and likely uninterested Apple employee into muttering "close enough" while selecting whichever option box it is that causes life ruination.
Feed that image into an adversarial attack (gradient-based perturbation; see the sketch below) until it spits out the desired NeuralHash.
Distribute that image to everyone who has ever disagreed with you on the internet, prayed to the wrong god, competed with you in business, voted the wrong way, etc.
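For anyone wondering what that adversarial step looks like in practice: it's ordinary adversarial-example machinery. A rough sketch, assuming you have a differentiable reimplementation of the hash network (like the extracted model people have been poking at); `model`, the sign-of-logits bit convention, and all the hyperparameters here are assumptions, not Apple's actual pipeline:

    # Toy gradient-descent collision search against a differentiable
    # surrogate of a perceptual hash. `model` is assumed to map a 1x3xHxW
    # tensor in [0,1] to real-valued logits whose signs give the hash bits.
    import torch

    def collide(image, target_bits, model, steps=2000, lr=0.01, eps=0.05):
        # image: 1x3xHxW float tensor; target_bits: tensor of +/-1 values.
        delta = torch.zeros_like(image, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            adv = (image + delta).clamp(0, 1)
            logits = model(adv).squeeze(0)
            if (torch.sign(logits) == target_bits).all():
                break                      # every bit matches the target
            # Hinge-style loss pushing each logit toward the target sign.
            loss = torch.relu(0.1 - logits * target_bits).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
            # Keep the perturbation visually small.
            delta.data.clamp_(-eps, eps)
        return (image + delta).clamp(0, 1).detach()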
my aim was to point out that the above-referenced "image scaling attack" is easily protected against, because it is fragile to alternate scaling methods -- it breaks if you don't use the scaling algorithm the attacker planned for, and there exist secure scaling algorithms that are immune. [0] Since defeating the image scaling attack is trivial, a system that addresses it ensures the thumbnail will always resemble the full image.
With that out of the way: obviously, that just forecloses this one particular attack, specifically, where you want the thumbnail to appear dramatically different than the full image in order to fool the user that it's an innocent image and the reviewer that it's an illegal image. It's still, nevertheless, possible to have a confusing thumbnail -- perhaps an adult porn image engineered to have a CSAM hash collision will be enough to convince a beleaguered or overeager reviewer to pull the trigger. The "Image Scaling Attack" is neither sufficient nor necessary.
(However, that confusing image would almost certainly not also fool Apple's unspecified secondary server-side hashing algorithm, as referenced on page 13 of Apple's Security Threat Model Review, so would never be shown to a human reviewer: "as an additional safeguard, the visual derivatives themselves are matched to the known CSAM database by a second, independent perceptual hash" [1])
> However, that confusing image would almost certainly not also fool Apple's unspecified secondary server-side hashing algorithm, as referenced on page 13 of Apple's Security Threat Model Review...
Uh, on what timescale? If you mean "tomorrow" then sure, if you mean "for years" - then no. They're relying on the second perceptual hashing algorithm to remain a secret, which is insanely foolish. Just based on what I know about these CP hashlists and the laziness of programmers, I feel pretty confident that it is either an algorithm trained on the thumbnails themselves (which would be laughably bad) or it was a prior attempt that got replaced by what is now deployed on the users' hardware. Why would I think that? Because it would have been the only other thing on hand for the necessary step of generating the hash black list. So they're stuck with at least one of those forever - and will have a very limited range of potential responses to the massive infosec spotlight picking them apart... unless they want to recatalog every bit of CP all over again.
Yeah, I don’t have that answer, of course. But nothing prevents them from changing that secondary algorithm yearly, or at whatever rate the CSAM database owners would tolerate full rehashing, or chaining together multiple hashes. They can literally tune it to whatever arbitrary false positive rate they want. Although, not knowing any better, I would guess that they would just use Microsoft’s PhotoDNA hash unchanged, and just keep it under wraps, since I think that’s what they already use for iCloud email attachment scanning. PhotoDNA just does a scaled down, black and white edge/intensity gradient comparison, and not a neural net feature detection. I would think using a completely different technology would make the pair of algorithms extremely robust taken together, but that’s not my field.
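To give a flavour of what that kind of hash looks like (this is NOT PhotoDNA, which has never been published; it's a dHash-style toy in the same "downscale, greyscale, compare neighbouring intensities" spirit):

    # Toy gradient-based perceptual hash: 64 bits, each saying whether a
    # pixel in a tiny greyscale thumbnail is brighter than its neighbour.
    # Small edits to the photo flip only a few bits.
    import numpy as np
    from PIL import Image

    def gradient_hash(path, size=9):
        img = Image.open(path).convert("L").resize(
            (size, size - 1), Image.Resampling.LANCZOS)   # 9x8 greyscale
        px = np.asarray(img, dtype=np.int16)
        # One bit per horizontal neighbour pair: is the left pixel brighter?
        return (px[:, :-1] > px[:, 1:]).flatten()

    def distance(a, b):
        # Hamming distance between two 64-bit hashes.
        return int(np.count_nonzero(a != b))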
While there may not be an immovable obstacle standing between them and a complete recataloging, there are a lot of factors that would strongly disincentivise it. Chief among them being the fact that the project is already a radioactive cost center - and unless they plan on switching industries and giving Blue Coat a run for its money, it always will be.
> ...chaining together multiple hashes.
That would be the lazy-programmer way to do it, and would very likely result in a situation where correlation starts popping up - that is why DBAs were never advised to do some wacky md5/sha1 mashup that avoids requiring every user to rekey in the wake of a digest bump up.
> ...I would guess that they would just use Microsoft’s PhotoDNA hash unchanged...
That is a reasonable guess, because that is what all the NGOs have been using - IWF being one of the more notorious. That would be bad news though, for anyone expecting the thumbnail perceptual hashing step to provide meaningful protection.
> I would think using a completely different technology would make the pair of algorithms extremely robust...
Nope - which is why you don't see hybrid cryptographic algorithms. Also, if they are using PhotoDNA on their verification step then they actually implemented the thing totally backwards... because the high-pass filter approach makes it resistant to the hash length extension attacks that are imperceptible to humans. That counts for nothing by the time the first algorithm has been fooled by an extension attack (and this neural thing is definitely vulnerable to it), because the attacker would already be selecting for a thumbnail image that would fool a human in the second step - and PhotoDNA would be looking for the exact same thing that a human would: points of contrast.
BTW, PhotoDNA is a black box with no outside scrutiny to speak of - you can count on one hand the number of papers where it is even mentioned (and only ever in passing).
> The objective of being mindful of the thumbnail is to fool the human reviewer responsible for alerting the police to your target's need for a good swatting - the algorithm has already flagged the image by the time it is presented as a thumbnail during review.
Yeah, mentioned something like that here [0]:
>> And then one can compromise and infect millions of such backdoored devices and start feeding (much cheaper than the government enforcement implementation) spoofed data into these systems at scale on these backdoored devices that act like "swatting as a service" and completely nullify any meaning they could get from doing this.
So have you never heard of catfishing? Because if you have, then you know it wouldn't be hard to do just what you described - and you're pretending otherwise for some reason.
The attack relies on the fact that when downscaling by a large factor, the tested downscalers (except Pillow in non-nearest-neighbor modes, and all of them in area averaging mode) ignore most of the pixels of the original image and compute the result based on the select few which are the same in all modes, making the result look nearly the same regardless of the mode.
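You can see how few source pixels actually matter with a few lines of Pillow/NumPy (a toy demonstration; the exact sampling positions depend on the library and the scale factor, which is precisely what the attacker tailors for):

    # Encode each source pixel's index into a float image, downscale with
    # nearest-neighbour, and count how many distinct source pixels ended
    # up in the thumbnail at all.
    import numpy as np
    from PIL import Image

    n, m = 1024, 32                      # 1024x1024 source, 32x32 thumbnail
    idx = np.arange(n * n, dtype=np.float32).reshape(n, n)
    thumb = np.asarray(Image.fromarray(idx)          # 32-bit float image
                       .resize((m, m), Image.Resampling.NEAREST))

    sampled = np.unique(thumb).astype(np.int64)
    print(f"{sampled.size} of {n * n} source pixels determine the thumbnail")
    # -> 1024 of 1048576. An attacker who knows the resampling algorithm
    #    only needs to control those pixels to control the scaled result,
    #    and can fill the other ~99.9% with whatever should be visible at
    #    full resolution.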
Thanks for that reference to Pillow. I presume it's from "Understanding and Preventing Image-Scaling Attacks in Machine Learning" [0] which mentions secure scaling algorithms immune to the attack. I wish I could mention this in the grand parent, but the editing window closed.
Just a correction for you, there's not a list of approved images. The CSAM database is a list of illegal (unapproved, if you will) images.
Other than that, yes it's possible to add noise to an image so a perceptual algorithm misidentifies it. I described the false positive case but it can also be used for false negatives. Someone can apply noise to a legit CSAM image (in the NCMEC database) so Apple's system fails to identify it.
The false positive case is scary because if it happens to you your life is ruined. The false negative case just means people have CSAM and don't get found by the system. I'm much more concerned about the false positive case.
Keep in mind that there are multiple straight paths from the false negative case to the false positive case. I'll give you one example: pedos can and will use the collider to produce large batches of CSAM that collide with perfectly legitimate images (e.g. common iPhone wallpapers). They literally have nothing to lose by doing this.
Eventually, these photos will make their way into the NCMEC database, and produce a large number of false positives. This will also make the other attacks discussed here easier to execute (e.g. by lowering the human review threshold, since everybody will start with a few strikes).
> The end result is that some peon at Apple has to look at the images and mark them as not CSAM.
As others said, if the non-CSAM image looks sexual at all, it'll probably get flagged for post-Apple review.
Beyond that, it doesn't seem to be in Apple's interest to be conservative in flagging. An employee reviewer's best interest is to minimize false negatives, not false positives.
As many mentioned, even an investigation can have horrible effects on some (innocent) person's life. I would not be shocked to learn that some crafty individual working at a "meme factory" is creating intentional collisions with distributed images just for "fun" - and politically motivated attacks seem plausible (eg. make liberal political memes flag as CSAM).
Then there are targeted motives for an attack. Have a journalist you want to attack or find a reason to warrant? Find them on a dating app and send them nudes with CSAM collisions. Or any number of other targeted attacks against them.
> Beyond that, it doesn't seem to be in Apple's interest to be conservative in flagging. An employee reviewer's best interest is to minimize false negatives, not false positives.
I would have thought the opposite. If there is a false positive that leads to an arrest and ruins someone's life, but the public sees that it is a false positive, then Apple will take an enormous hit in the marketplace. Nobody will want to take on the demonstrated real risk of being falsely accused of possessing CSAM.
If they have a false negative, it is unclear to me what negative effects Apple would suffer. As far as I know, nobody would know about it outside of Apple.
Another commenter used the right phrase, "terrible asymmetry". There won't be any such thing in the public eye as a "false positive", only kiddie porn traders who got away with it.
> Nobody will want to take on the demonstrated real risk of being falsely accused of possessing CSAM.
I don't think this is much of a risk. Defamation is difficult to prove, never applies to the law enforcement agencies who are going to make these arrests and "ruin people's lives", and is never going to be down to Apple anyway (Apple is only referring things for investigation). I don't think you'll see much public indignation about this--even arguing against it in the real public eye (outside of wonky venues like HN) sounds tantamount to "supporting kiddie porn" in the naïve view of most members of the public.
> If they have a false negative, it is unclear to me what negative effects Apple would suffer. As far as I know, nobody would know about it outside of Apple.
This could potentially arise in any case with an abuser or producer or enabler or pimp or Maxwell/Epstein customer who has an iPhone; which is a lot of cases. As soon as Apple devices are supposed to "detect" kiddie porn, people will ask why people like this weren't caught earlier; and since Apple has money, they won't just ask this in the court of public opinion, they will sue for damages for abuses that "should have" been prevented by Apple's inspection of their pictures. Even if that's unlikely, it's much easier for a peon whose job it is to look at kiddie porn to just forward it on; and such a case really could damage Apple's optics.
At least I'm given to understand that they already scanned all these photos uploaded to iCloud anyway (in the same way many other similar providers do). Whether it happens on the device or the server doesn't seem to make any difference to this attack.
(That's not to say that (a) the scanning of stuff on a server was a good idea in the first place or (b) encouraging politicians to use your own device to spy on you is a good idea or (c) this isn't the thin end of a very painful wedge, just that we've not opened a new vulnerability)
Because people behind the keyboards make mistakes all the time. Just in the last month I experienced:
* A call center agent at a haulage firm, instead of entering the delivery date we talked about on the phone, clicked for the delivery to be returned to the factory.
* Google automatically blocked an ad account from delivering ads because we allegedly profiteered from Covid (untrue of course, but we surely talked about the challenges caused by the pandemic somewhere on the site, so the "AI" apparently got triggered by some keywords), and humans repeatedly confirmed the AI decision.
* Facebook blocked an ad account that was unused in 2020, wanted ID, got the correct ID (identical name etc.), and the human denied confirmation.
Google and Facebook are of course known to be beyond kafkaesque, so this is no surprise. But imagine the costs the innocents pay once they accidentally get entered into the FBI CP suspect database.
Why is this question being downvoted? I too would like to know what this attack achieves.
From what I see, the end result of false flagging is either someone has CSAM in iCloud and you push them over the threshold that results in reporting and prosecution, or there is no CSAM, so the reviewer sees all of the hash collision images, including those that are natural.
Is the problem that an attacker can force natural hash collision images to be viewed by a reviewer, violating that person's privacy? Do we know if this process is different than how Google, Facebook, Snapchat, Dropbox, Microsoft, and others have implemented these necessarily fuzzy matches for their CSAM scans of cloud-hosted content?
Or am I missing something that the downvoters saw?
You are one underpaid random guy in India looking at CSAM all day clicking the wrong button away from a raid of your home and the end of your life as you know it.
You're assuming the police are nice and quickly announce when they haven't found anything.
The more likely outcome is that it takes several months until the case is dropped silently.
Could you explain this? The process, according to what's publicly known, is that the images will go to NCMEC for further review, then NCMEC will report it to the authorities, if it's actually CSAM. The low paid (a big assumption here) Apple reviewer is only the final step for Apple, not prosecution.
This. These charges are damning once they are made. Plus the countless legal dollars you are going to have to front and the hours spent proving innocence, and that's assuming the justice system actually works. Try explaining this to your employer while you start missing deadlines due to court dates. The police could also easily leverage this to warrant-hop, as they have been found doing in the past. I think the bike rider who had nothing to do with a crime and got accused because he was the only one caught in a broad geolocation warrant is all the precedent you need that this will be abused.
The idea I've heard is that images could be generated that are sexual in nature but that have been altered to match a CSAM hash, making a tricky situation.
That's an interesting point! From my understanding, Apple's hash is not the final qualifier. NCMEC also reviews them before reporting to the authorities. But, I can imagine a scenario that might come down to opinion.
> "The end result is that some peon at Apple has to look at the images and mark them as not CSAM. You've cost someone a bit of privacy, but that's it."
This can be abused to spam Apple's manual review process, grinding it down to a halt. You've cost Apple time and money by making them review each such fake report.
> You've cost Apple time and money by making them review each such fake report.
Ok, but… how do I profit? If I wanted to waste Apple employee time, I could surely find a way to do it, but why would I? The functioning of society relies on the fact that people generally have better things to do than waste each others time.
Or you could identify the factors that cause a hash to be computed and then start generating random images that compute to the same hash, creating 10s of thousands of images of digital noise that all look alike to the computer.
It can’t be. There’s a different private hash function that also has to match that particular CSAM image’s hash value before a human sees it. An adversarial attack can’t produce that one since the expected value isn’t known.
This second "secret" hash function, because it is applied to raw offensive content that Apple can't have, has to be shared at least with people maintaining the CSAM database.
You can't rely on it never leaking, and when it does, it will be almost undetectable and have huge consequences.
As soon as the first on-device CSAM flag has been raised, it becomes a legal and political problem. Even without a second matching hash, it already put Apple in an untenable position. They already are in a mud fight with the pigs.
They can't say: we got 100M hits this month on our first CSAM filter but we only reported 10 cases, because to avoid false positives our second filter threw everything to /dev/null, and we didn't even manually review them because your privacy matters to us. It has become a political problem where, for good measure, they will have to report cases to make the numbers look "good".
Attackers of the system can also plant false negatives, aka real CSAM that has been modified enough to pass the first hash but fail this second hash. So that, in an audit, independent security researchers who review Apple's system will be able to say that Apple's automated system sided with the bad guys by rejecting true CSAM and not reporting it.
Also remember that Apple can also do something else than what they say they do for PR reasons: maybe some secret law will force them to reveal things to the authorities as soon as the first flag has been raised, and force them not to tell anyone about it. And because it's in the name of fighting the "bad guys", that's something most people expect them to do.
From the user perspective, there is nothing we can audit; it's all security by obscurity disguised with pseudo-crypto PR. It's just a big "Trust us" blank signed paper that will soon be used to dragnet-surveil anyone for any content.
What if I can generate an attack that will mark your own picture of your own toddler nude in a bathtub as CSAM? Do you still feel confident in "some peon at Apple" to mark it as not CSAM?
Okay, let's play peon. Here are three perfectly legal and work-safe thumbnails of a famous singer: https://imgur.com/a/j40fMex. The singer is underage in precisely one of the three photos. Can you decide which one?
If your account has a large number of safety vouchers that trigger a CSAM match, then Apple will gather enough fragments to reassemble a secret key X (unique to your device) which they can use to decrypt the "visual derivatives" (very low resolution thumbnails) stored in all your matched safety vouchers.
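(For anyone wondering what "reassemble a secret key from fragments" means mechanically: a toy k-of-n secret-sharing sketch, Shamir over a prime field, is below. It only illustrates the threshold property; it is not Apple's actual threshold PSI construction, and the parameters are made up.)

    # Toy k-of-n Shamir secret sharing: any k shares reconstruct the key,
    # any k-1 shares reveal nothing. Not Apple's actual construction.
    import random

    P = 2**127 - 1   # a Mersenne prime, big enough for a toy 126-bit key

    def make_shares(secret, k, n):
        # Random degree-(k-1) polynomial with constant term = secret.
        coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
        return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
                for x in range(1, n + 1)]

    def reconstruct(shares):
        # Lagrange interpolation at x = 0; needs at least k shares.
        secret = 0
        for j, (xj, yj) in enumerate(shares):
            num, den = 1, 1
            for m, (xm, _) in enumerate(shares):
                if m != j:
                    num = num * (-xm) % P
                    den = den * (xj - xm) % P
            secret = (secret + yj * num * pow(den, P - 2, P)) % P
        return secret

    key = random.randrange(P)
    shares = make_shares(key, k=30, n=1000)   # e.g. a match threshold of 30
    assert reconstruct(random.sample(shares, 30)) == key
    # With only 29 shares, every possible key remains equally likely.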
An Apple employee looks at the thumbnails derived from your photos. The only judgment call this employee gets to make is whether it can be ruled out (based on the way the thumbnail looks) that your uploaded photo is CSAM-related. As long as the thumbnail contains a person, or something that looks like the depiction of a person (especially in a vaguely violent or vaguely sexual context, e.g. with nude skin or skin with injuries) they will not be able to rule out this possibility based on the thumbnail alone. And they will not have access to anything else.
Given the ability to produce hash collisions, an adversary can easily generate photos that fail this visual inspection as well. This can be accomplished straightforwardly by using perfectly legal violent or sexual material to produce the collision (e.g. most people would not suspect foul play if they got a photo of genitals from their Tinder date). But much more sophisticated attacks [2] are also possible: since the computation of the visual derivative happens on the client, an adversary will be able to reverse engineer the precise algorithm.
While 30 matching hashes are probably not sufficient to convict somebody, they're more than sufficient to make somebody a suspect. Reasonable suspicion is enough to get a warrant, which means search and seizure, computer equipment hauled away and subjected to forensic analysis, etc. If a victim works with children, they'll be fired for sure. And if they do charge somebody, it will be in Apple's very best interest not to assist the victim in any way: that would require admitting to faults in a high profile algorithm whose mere existence was responsible for significant negative publicity. In an absurdly unlucky case, the jury may even interpret "1 in 1 trillion chance of false positive" as "way beyond reasonable doubt".
Chances are the FBI won't have the time to go after every report. But an attack may have consequences even if it never gets to the "warrant/charge/conviction" stage. E.g. if a victim ever gets a job where they need to obtain a security clearance, the Background Investigation Process will reveal their "digital footprint", almost certainly including the fact that the FBI got a CyberTipline Report about them. That will prevent them from being granted interim determination, and will probably lead to them being denied a security clearance.
(See also my FAQ from the last thread [1], and an explanation of the algorithm [3])
Fair enough. I suppose it's true that you could create a colliding sexually explicit image where age is indeterminate, and the reviewer may not realize it isn't a match.
> Given the ability to produce hash collisions, an adversary can easily generate photos that fail this visual inspection as well.
Apple could easily fix this by also showing a low-res version of the CSAM image that was collided with, but I'll grant that they may not be able to do that legally (and reviewers probably don't want to look at actual CSAM).
The problem is that it is a scaled low-res version. There are well publicized attacks[1] showing you can completely change the contents of the image post scaling. There's also the added problem that if the scaled down image is small, even without the attack, it's impossible to make a reasonable human judgement call (as OP points out).
The problem isn't CSAM scanning in principle. The problem is that the shift to the client & the various privacy-preserving steps Apple is attempting to make is actually making the actions taken in response to a match different in a concerning way. One big problem isn't the cases where the authorities should investigate*, but that a malicious actor can act surreptitiously and leave behind almost no footprint of the attack. Given SWATting is a real thing, imagine how it plays out when the trigger is child pornography. From the authorities' perspective SWATting is low incidence & not that big a deal. Very different perspective on the victim side though.
* One could argue about the civil liberties aspect & the fact that having CSAM images is not the same as actually abusing children. However, among the general population that line of reasoning just gets you dismissed as supporting child abuse & is only starting to become acknowledged in the psychiatry community.
You're adding quite a lot of technobabble gloss to an "attack vector" that boils down to "people can send you images that are visually indistinguishable from known CSAM".
Guess what, they can already do this but worse by just sending you actual illegal images of 17.9 year olds.
While it would be bad to be subjected to such an attack, and there is a small chance it would lead to some kind of interaction with law enforcement, the outcomes you present are just scaremongering and not reasonable.
I suggest you reread the comment, because "people can send you images that are visually indistinguishable from known CSAM" is not what is being said at all. Where did you even get that from?
The point is precisely that people can become victims of various new attacks, without ever touching photos that are actual "known CSAM". For Christ's sake, half the comments here are about how adversaries can create and spread political memes that trigger automated CSAM filters on people's phones just to "pwn the libz".
> Guess what, they can already do this but worse by just sending you actual illegal images of 17.9 year olds.
No, this misses the point completely. You cannot easily trigger any automated systems merely by taking photos of 17.9 year olds and sending them to people. E.g. your own photos are not in the NCMEC databases, and you'd have to reveal your own illegal activities to get them in there. You (or malicious political organizations) especially cannot attack and expose "wrongthinking" groups of people by sending them photos of 17.9 year olds.
> No, this misses the point completely. You cannot easily trigger any automated systems merely by taking photos of 17.9 year olds and sending them to people.
An attacker can embed a matching image inside of a PowerPoint zip file, and email it to any corporate employee using O365.
Or, an angry parent can call the police and let them know that a 16 year old possesses nude pictures of their 15 year old girlfriend.
The over-the-top response to this controversy is really disappointing.
Sure, your proposed attack, which requires the victim to have a 15 year old girlfriend, to break an (admittedly silly) law by having nude photos on their phone, for you to call the cops, and for them to take such a call seriously, is clearly comparable to a vector that can be used to target innocents, groups of individuals, etc. who did not break the law in any way, that does not require the attacker to handle prohibited material at all, and that requires Apple to keep a ton of information completely obscure to even provide a weak semblance of security (it was shown to be completely broken, except possibly for one unknown hash, in two weeks). Clearly comparable. Sure. Clearly.
For one last time, the NeuralHash collisions make this tool perfectly unusable for catching pedos: all of the next generation of CSAM content will collide with hashes of popular, innocent images. Two weeks after it was deployed, Apple's CSAM scanning is now _only_ an attack vector and a privacy risk. It's completely useless for its nominal function. This would be a massive, hilarious own goal from Apple even if the public reaction was over the top (although it isn't). They just reduced the privacy and security of nearly all their customers, further exposed themselves to the whims of governments, and for no gain whatsoever.
Can you explain how these theoretical political memes hash-match to an image in the NCMEC database, and then also pass the visual check?
> "No, this misses the point completely. You cannot easily trigger any automated systems merely by taking photos of 17.9 year olds and sending them to people."
Did I say "taking"? I am talking about sending (theoretical) actual images from the NCMEC database. This is functionally identical to the "attack" you describe.
Yes, I can. This is just one possible strategy: there are many others, where different things are done, and where things are done in a different order.
You use the collider [1] and one of the many scaling attacks ([2] [3] [4], just the ones linked in this thread) to create an image that matches the hash of a reasonably fresh CSAM image currently circulating on the Internet, and resizes to some legal sexual or violent image. Note that knowing such a hash and having such an image are both perfectly legal. Moreover, since the resizing (the creation of the visual derivative) is done on the client, you can tailor your scaling attack to the specific resampling algorithm.
Eventually, someone will make a CyberTipline report about the actual CSAM image whose hash you used, and the image (being a genuine CSAM image) will make its way into the NCMEC hash database. You will even be able to tell precisely when this happens, since you have the client-side half of the PSI database, and you can execute the NeuralHash algorithm.
You can start circulating the meme before or after this step. Repeat until you have circulated enough photos to make sure that many people in the targeted group have exceeded the threshold.
Note that the memes will trigger automated CSAM matches, and pass the Apple employee's visual inspection: due to the safety voucher system, Apple will not inspect the full-size images at all, and they will have no way of telling that the NeuralHash is a false positive.
Okay, perhaps the three thumbnails was unclear. I didn't mean to illustrate any specific attack with it, just to convey the feeling of why it's difficult to tell apart legal and potentially illegal content based on thumbnails (i.e. why a reviewer would have to click "possible CSAM" even if the thumbnail looks like "vanilla" sexual or violent content that probably depicts adults). I'd splice in a sentence to clarify this, but I can't edit that particular comment anymore.
Ok yeah, I do agree this scaling attack potentially makes this feasible, if it essentially allows you to present a completely different image to the reviewer as to the user. Has anyone done this yet? i.e. an image that NeuralHashes to a target hash, and also scale-attacks to a target image, but looks completely different.
(Perhaps I misunderstood your original post, but this seems to be a completely different scenario to the one you originally described with reference to the three thumbnails)
This attack doesn’t work. If the resized image doesn’t match the CSAM image your NeuralHash mimicked, then when Apple runs its private perceptual hash, the hash value won’t match the expected value and it will be ignored without any human looking at it.
We have no reason to believe that Apple's second, secret perceptual hash provides any meaningful protection against such attacks. At best, we can hope that it'll allow early detection of attacks in a few cases, but chances are that's the best it can do. We might not ever learn: Apple now has a very strong incentive not to admit to any evidence of abuse or to any faults in their algorithm.
(Sorry, this is going to be long. I know you understand most/all of this stuff; it's mostly there to provide a bit of context for the users reading our exchange)
The term "hash function" is a bit of a misnomer here. When people hear "hash", they tend to think about cryptographic hash functions, such as SHA256 or BLAKE3. When two messages have the same hash value, we say that they collide. Fortunately, cryptographic hash functions have several good properties associated with them: for example, there is no known way to generate a message that yields a given predetermined hash value, no known way to find two different messages with the same hash value, and no known way to make a small change to a message without changing the corresponding hash value. These properties make cryptographic hash functions secure, trustworthy and collision-resistant even in the face of powerful adversaries. Generally, when you decide to use two unrelated cryptographic hash algorithms instead of one, executing a preimage attacks against both hashes becomes much more difficult for the adversary.
However, as you know, the hash functions that Apple uses for identifying CSAM images are not "cryptographic hash functions" at all. They are "perceptual hash functions". The purpose of a perceptual hash is the exact opposite of a cryptographic hash: two images that humans see/hear/perceive (hence the term perceptual) to be the same or similar should have the same perceptual hash. There is no known perceptual hash function that remains secure and trustworthy in any sense in the face of (even unsophisticated) adversaries. In particular, preimage attacks against perceptual hashes are very easy, compared to the same attacks against cryptographic hashes.
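To make the contrast concrete (a toy illustration: SHA-256 on the cryptographic side, an "average hash" stand-in on the perceptual side; "photo.jpg" is just a placeholder filename):

    import hashlib
    import numpy as np
    from PIL import Image

    # Cryptographic: a one-character change scrambles the entire digest.
    print(hashlib.sha256(b"hello world").hexdigest()[:16])
    print(hashlib.sha256(b"hello world!").hexdigest()[:16])

    # Perceptual (toy aHash): small visual edits flip only a few bits.
    def average_hash(img, size=8):
        px = np.asarray(img.convert("L").resize((size, size)),
                        dtype=np.float32)
        return (px > px.mean()).flatten()

    img = Image.open("photo.jpg")                  # placeholder filename
    edited = img.rotate(1).crop((2, 2, img.width - 2, img.height - 2))
    a, b = average_hash(img), average_hash(edited)
    print(np.count_nonzero(a != b), "of 64 bits differ")  # usually a handful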
Using two unrelated cryptographic hashes meaningfully increases resistance to collision and preimage attacks. Using ROT13 twice does not increase security in any meaningful sense. Using two perceptual hashes, while not as bad, is still much closer to the "using ROT13 twice for added security" than to the "using multiple cryptographic hashes" end.
Finding a SHA1 collision took 22 years, and there are still no effective preimage attacks against it. Creating the NeuralHash collider took a single week. More importantly, even if you were to use two unrelated perceptual hash functions, executing preimage attacks against both hashes need not become much more difficult for the adversary: easy * easy is still easy. Layering cryptography upon cryptography is meaningful, but only as long as one of the layers is actually difficult to attack. This is not the case for perceptual hashes. In fact, in many similar contexts, these adversarial attacks tend to transfer: if they work against one technique or model, they often work against other models as well [3]. In the attack discussed above, the adversary has nearly full control over the "visual derivative", so even a very unsophisticated adversary can subject the target thumbnail itself to the collider before performing the resizing attack, and hope that it transfers against the second hash. If the second hash is a variant of NeuralHash (somewhat likely, it could even be NeuralHash performed on the thumbnail itself; we don't know anything about it!), or if it's a ML model trained on the same or similar datasets (quite likely), or if it's one of the known algorithms (say PhotoDNA) then some amount of transfer is likely to happen. And given an adversary that is going to distribute a large number of photos anyway, a 10% success rate is more than enough. Given the diminished state space (fixed-size thumbnails, almost certainly smaller than 64x64 for legal reasons), a 10% success rate is completely plausible even with these naive approaches. An adversary that has some (even very little) information about the second hash algorithm can do much more sophisticated stuff, and perform much better.
But what if we boldly rule out all transfer results? Doesn't Apple keep their algorithm secret?! Can we think of the weights (coefficients) of the second perceptual hash as some kind of secret key in the cryptographical sense? Alas, no. Apple would have to make sure that all the outputs of the secret perceptual hash are kept secret as well. Due to the way perceptual hashing algorithms work, they provide a natural training gradient: having access to sufficiently many input-output examples is probably enough to train a high-fidelity "clone" that allows one to generate adversarial examples and perform successful preimage attacks, even if the weights of the clone are completely different from the secret weights of the original network. This can be done with standard black box techniques [4]. It's much harder (but nowhere near crypto hard, still perfectly plausible) to pull this off when they have access to one bit of output (match or no match). A single compromised Apple employee can gather enough data to do this given the ability to observe some inputs and outputs, even if said employee has no access to the innards or the magic numbers. The hash algorithm is kept secret because if it wasn't, an attack would be completely trivial: but an adversary does not need to learn this secret to mount an effective attack.
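Concretely, the black-box cloning referred to in [4] boils down to ordinary supervised training on observed outputs. A minimal sketch; the architecture, the 96-bit output width, and the (image, bits) data loader are all assumptions:

    # Fit a surrogate network to (image, observed-hash-bits) pairs, then
    # run collision/adversarial searches against the surrogate instead of
    # the secret original. Shapes and architecture are illustrative only.
    import torch
    import torch.nn as nn

    class Surrogate(nn.Module):
        def __init__(self, bits=96):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, bits))

        def forward(self, x):
            return self.net(x)   # logits; sign() gives the predicted bits

    def train_clone(loader, bits=96, epochs=10):
        model = Surrogate(bits)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.BCEWithLogitsLoss()
        for _ in range(epochs):
            for images, observed_bits in loader:   # observed bits in {0,1}
                opt.zero_grad()
                loss = loss_fn(model(images), observed_bits.float())
                loss.backward()
                opt.step()
        return model   # usable as `model` in the collision sketch upthread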
These are just two scenarios. There are many others. "Nobody has ever demonstrated such an attack working end-to-end" is not a good defense: it's been two weeks since the system was rolled out, and once an attack is executed, we probably won't learn about it for years to come. But the attacker can be rewarded way before "due process" kicks in: e.g. if a victim ever gets a job where they need to obtain a security clearance, the Background Investigation Process will reveal their "digital footprint", almost certainly including the fact that the NCMEC got a report about them, even if the FBI never followed up on it. That will prevent them from being granted interim determination, and will probably lead to them being denied a security clearance. If you pull off this attack on your political opponents, you can prevent them from getting government jobs, possibly without them ever learning why. And again, this is one single proposed attack. There were at least 6 different attacks proposed by regular HN users in the recent threads!
As a more general observation, cryptography tends to be resistant to attacks only if one can say things such as "the adversary cannot be successful unless they know some piece of information k, and we have very good mathematical reasons (e.g. computational hardness) to believe that they can't learn k". The technology is flawed: even the state-of-the-art in perceptual hashes does not satisfy this criterion. Currently, they are at best technicool gadgets, but layering technicool upon technicool cannot make their system more secure. And Apple's system is a high-profile target if there ever was one.
Barring a major breakthrough in perceptual hashing (one that Apple decided to keep secret and leave out of both whitepapers), the claim that the secret second hash will prevent collision attacks is not justified. The chances of such a secret breakthrough are very slim: it'd be like learning that SpaceX has already built a base on the Moon and has been doing regular supply runs with secret spaceships. Vaguely plausible in theory (SpaceX has people who do rocketry, Apple has people who do cybersecurity), but vanishingly unlikely in practice.
And that's before we mention that the mere existence of the collider made the entire exercise completely pointless: the real pedos can now use the collider to effectively anonymize their CSAM drops, making sure that all of their content collides with innocent photos, and ensuring that none of the images will be picked up by NeuralHash anyway. For all practical purposes, Apple's CSAM detection is now _only_ an attack vector, and nothing else.
The first half of your post is predicated on it being likely that the noise added to generate hash A using the NeuralHash is likely to produce a specific hash B with some unknown perceptual hashing function (which they specifically call out [1] as independent of the NeuralHash function precisely because they don’t want to make this easy, so speculating it might be the NeuralHash run again is incorrect). Hash A is generated via thousands of iterations of an optimization function, guessing and checking to produce a 96-bit hash. What shows that the same noise would produce an identical match when run through a completely different hashing function that is designed very differently specifically to avoid these attacks? Just one bit of difference will prevent a match. Nothing you’ve linked to would show any likelihood of that being anywhere close to 10 percent.
For the second part, yes, if an Apple engineer (that had access to this code) leaked the internal hash function they used or a bunch of example image-to-hash-value pairs, that would allow these adversarial attacks.
Until you can show an example or paper where the same adversarial image generates a specific hash value for two unrelated perceptual hash functions, with one being hidden, it is not right to predict a high likelihood of that first scenario being possible.
Here’s a thought exercise: how long would it have taken researchers to generate a hash collision with that dog image if the NeuralHash wasn’t public and you received no immediate feedback that you were “right” or getting closer along the way?
> Until you can show an example or paper where the same adversarial image generates a specific hash value for two unrelated perceptual hash functions, with one being hidden, it is not right to predict a high likelihood of that first scenario being possible.
"There is no paper attacking ROT13 done twice, therefore it must be secure". Usually, it's on the one proposing the protocol to make a case for its security. Doubly so when it's supposed to last a long time, a lot of people are interested in attacking it, and successful attacks can put people in harm's way.
You know what, if you think that this is difficult, feel free to pick an existing perceptual hash function H, cough up some money, and we'll announce a modest prize (say $4000) on HN for the first person to have a working collision attack for NeuralHash+H. H will run on a scaled-down thumbnail, and we'll keep the precise identity of the algorithm secret. If the challenge gets any traction, but nobody succeeds within 40 days, I'll pay you $4000 for your effort. If you're right, this should be easy money. (cf SHA1, which lasted 22 years)
Heck, if Apple claims that this is difficult (afaict they don't; it would be unwise), they might even join in with their own preimage challenge for $$$. It'd be a no-brainer, a simple and cheap way of generating good publicity.
They claim their H is resistant to adversarial attacks, so they are claiming this to be difficult.
If I took an exact public perceptual hash function implementation and used that as H in your contest, it might be possible for a researcher attacking all public perceptual hash functions to stumble on the right one within 40 days.
I agree with you that we are trusting Apple to implement this competently. This isn’t something that can be proved to work mathematically where nothing about the implementation has to be kept secret.
So, worst case, everything you say could come true, but to imply that it is likely is wrong.
This leaves open the question of how the image gets on the device of the victim. You would have to craft a very specific image that the victim is likely to save, and the existence of such a specially crafted file would completely exonerate them.
1. Pick an innocuous photo the target already has (or is likely to save) - say, a popular meme.
2. Generate an objectionable image with the same hash as the target's photo. (This is obviously illegal.)
3. Submit the objectionable image to the government database.
Now the target's photo will be flagged until manually reviewed.
This doesn't sound impossible as a targeted attack, and if done on a handful of images that millions of people might have saved (popular memes?) it might even grind the manual reviews to a halt. But maybe I'm not understanding something in this (very bad idea) system.
This requires the attacker handling CSAM, which defeats the benefit. The risk in all cases is that any time you actually handle CSAM, the attack is void since you're now actually guilty of the crime yourself (very few will cross that line).
The point though is that this is something the victim's Apple phone is doing, not something the attacker's device is doing. So the goal is to send hash-collided images by non-Apple channels (email) where there is a reasonably good chance the image would make its way into someone's global device photo store and into automatic iCloud uploads.
Sending an MMS would work, for example, or a picture via Signal which someone then saves outside of Signal (a meme).
In all these cases, the original sender doesn't have an Apple device: so they're not getting scanned by the same algorithm, but more importantly their device is not spying on them. Importantly too: they've done nothing illegal.
But: the victim is getting flagged by their own device. And the victim has to have their device seized and analysed to determine (1) that it's not CSAM, and (2) that they were sent the images that flagged and aren't trying to divert attention by getting themselves falsely pinged up front; but then (3) the sender has committed no crime. There's no reason or even risk to investigate them, because by the time the victim has dealt with law enforcement, it's been established that no one had anything illegal.
It's the digital equivalent of a sock of cat litter testing positive for methamphetamine, except it was your drive-through McDonald's order.
The goal is not to get convictions, the goal is harrassment.
Perhaps that's true in the narrowest sense, but aren't the odds of generating a colliding file so low as to all but rule out coincidence and therefore strongly indicate premeditated cyber-attack (which is illegal)?
If I were law enforcement, at the very least I'd want to keep tabs on these sources of false positives. Probably easy enough to convince a judge that someone capable of the "tech wizardry" to collide a hash can un-collide one too, and therefore more thorough/invasive search warrants of the source are justified.
Your argument is "the technology is flawed, therefore let's also arrest anyone who we suspect of generating false positives".
Like security researchers. Or the people currently inspecting the algorithm. And also frankly what are you going to do about overseas adversaries? The most likely people looking at how to exploit this would explicitly be state-sponsored Russian hackers - this is right up the alley of their desire to be able to cause low level chaos without committing to a serious attack.
And at the end of the day you've still succeeded: the point is that by the time you've established it was spurious, the target has already been through the legal wringer. The legal wringer is the point.
None of those thumbnails (or visual derivatives) will match the hash value of the known CSAM you are trying to simulate, since it won’t be possible to know the target hash value given that the hash function is private.
Seeing how "well" app review works, I would not be surprised if the "peon" sometimes clicks the wrong button while reviewing, bringing down a world of hurt on some innocent apple user, all triggered by the embedded snitchware running on their local device.
> The end result is that some peon at Apple has to look at the images and mark them as not CSAM
Btw, this reminded me of a podcast about FB's group to do just this. Because it negatively impacted the mental health of those FB employees, they farmed it out to the contractors in other countries. There were interviews with women in the Philippines, and it was having the same impact there.
Some dudette/dude is going to look at my personal pictures every now and then? What if they are of my naked children and what if that person is a CSAM-interested person? And she/he takes a picture of the screen? Ugh, it feels so bad!
I don’t want there to be a chance some person is going to look at my pics!
Why would anyone bother calling the cops and telling them that someone they don't like is an imminent threat? The end result is that some officer just has to stop by and see that they aren't actually building bombs. You've cost someone a bit of time, but that's it.