Reading the paper carefully, I think an important part of their "one in a trillion" calculation rests on the cryptographic threshold scheme they use. In particular, it seems likely to me that the number of matches required for your account to be flagged is relatively high (perhaps a dozen). If that is the case, the per-image hash collision rate could be "only" 1 in a million, and it would still be vanishingly unlikely for a typical iCloud user to get a dozen false positives. A rate of 1e-6 is _much_ more testable than 1e-12 for the perceptual hash, and the cryptographic parts of the secret sharing are easy to analyze mathematically.
As a disclaimer, I haven't done the actual math here. This also implies that the risk of your account being falsely flagged grows with the number of images you upload.
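For anyone who wants to do the math: if you assume that (1) every image independently has the same false-match probability p, and (2) an account is flagged once at least k images match, then the false-flag probability is the upper tail of a Binomial(n, p) distribution over your n uploads. Here's a rough Python sketch. The values p = 1e-6 and k = 12 are just the guesses from above, not parameters Apple has published, the function names are my own, and the independence assumption is a big one.

```python
import math


def binom_log_pmf(n: int, i: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), via lgamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def prob_at_least_k(n: int, k: int, p: float) -> float:
    """P(X >= k): chance that at least k of n independent images falsely match.

    Sums the upper tail directly instead of computing 1 - P(X < k), which
    would round to 0 for results far below 1e-16.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(binom_log_pmf(n, i, p))
        total += term
        # Beyond the mode (about n * p) the terms only shrink, so stop once
        # they no longer change the sum.
        if i > n * p and term <= total * 1e-16:
            break
    return total


if __name__ == "__main__":
    # Assumed values: per-image false-match rate 1e-6, threshold of 12.
    for n in (1_000, 10_000, 100_000):
        print(n, prob_at_least_k(n, k=12, p=1e-6))
```

With n = 10,000 uploads this comes out on the order of 1e-33, and because the dominant term scales roughly like n^12, doubling the library size multiplies the risk by about 2^12. That's the "depends on how many images you upload" effect, and it only holds if the matches really are independent.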
You're assuming that perceptual hashes are uniformly distributed, but that's not the case. If I post a picture of my kid at the beach, I'm far, far more likely to generate perceptual hashes close to the matching threshold. Not to mention intimate photos of/with my partner.
Good point about the possibility of capturing a bunch of distinct photos with the same perceptual hash, either by taking a burst of photos or by editing one photo a bunch of times. I guess a better implementation would never upload two different encryption keys for the same perceptual hash and would send dummy data instead, but I haven't seen any indication that they actually do that.
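A minimal sketch of that dedup idea, assuming the client already has some perceptual hash for each photo (passed in as an integer here; this is not NeuralHash) and that the real key share can be stood in for by random bytes. `DedupingUploader`, `Voucher`, and `SHARE_SIZE` are hypothetical names for illustration, not anything from Apple's design.

```python
import os
from dataclasses import dataclass, field
from typing import Set

SHARE_SIZE = 32  # hypothetical size in bytes of one key share


@dataclass
class Voucher:
    image_id: str
    payload: bytes
    # Exposed only so this demo can count real shares; a real client must not
    # reveal which vouchers are dummies.
    is_dummy: bool


@dataclass
class DedupingUploader:
    """Hypothetical client that contributes at most one share per perceptual hash."""
    seen_hashes: Set[int] = field(default_factory=set)

    def make_voucher(self, image_id: str, perceptual_hash: int) -> Voucher:
        if perceptual_hash in self.seen_hashes:
            # Repeat of an already-uploaded hash (burst shot, re-edit): send
            # random bytes of the same length instead of a second share.
            return Voucher(image_id, os.urandom(SHARE_SIZE), is_dummy=True)
        self.seen_hashes.add(perceptual_hash)
        # Placeholder for the real secret share; random bytes here too.
        return Voucher(image_id, os.urandom(SHARE_SIZE), is_dummy=False)


if __name__ == "__main__":
    uploader = DedupingUploader()
    burst = [uploader.make_voucher(f"IMG_{i:04d}", 0x5EED) for i in range(12)]
    print(sum(not v.is_dummy for v in burst), "real share(s) out of", len(burst))
```

With this, a burst of 12 shots that all land on the same hash counts as one match toward the threshold instead of twelve. It only dedups exact hash collisions, though; burst frames whose hashes differ by a bit or two would still each contribute a share.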
yep. what if i take a burst of 12 photos that NeuralHash (an ML black box) all flags as false positives, and now an Apple reviewer is invading my privacy by looking at my photo library?
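To put the burst scenario in numbers, compare two idealized extremes (both are assumptions, not measurements of NeuralHash): burst frames that match independently, versus frames so similar that one outcome decides the whole burst. The function name is hypothetical.

```python
def burst_all_match_prob(p: float, burst_size: int, correlated: bool) -> float:
    """Probability that every photo in a burst is a false match.

    independent: each photo matches on its own with probability p
    correlated:  near-identical frames, so a single outcome decides them all
    """
    if correlated:
        return p
    return p ** burst_size


if __name__ == "__main__":
    p = 1e-6  # assumed per-image false-match rate from the thread
    print("independent:", burst_all_match_prob(p, 12, correlated=False))
    print("correlated: ", burst_all_match_prob(p, 12, correlated=True))
```

Under the correlated extreme, a 12-match threshold buys essentially nothing for that burst: the whole event is as likely as a single false positive.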