
The NCMEC database that Apple is likely using to match hashes contains countless non-CSAM pictures that are entirely legal, not only in the U.S. but globally.

This should be reason enough for you to not support the idea.

From day 1, it's matching legal images and phoning home about them. Increasing the scope of scanning is barely a slippery slope; they're already beyond the stated scope of the database.




The database seems legally murky. First of all, who would want to manually verify that there aren't any images in it that shouldn't be? Even if the public could request to see it, which I doubt, would doing so put you on a watch list of potentially dangerous people or destroy your own reputation? Who adds images to it, and where do they get those images from?

My point is that we have no way to verify that the database isn't abused or mistaken, and a lot of that rests on the fact that CSAM is not something people ever want to have to encounter.


It’s a database of hashes, not images, though, right? I would argue the hashes absolutely should be public, just as any law should be public (and yes, I am aware of some outrageously brazen exceptions to even that).

Anyone should be able to scan their own library against the database for false positives. “But predators could do this too and then delete anything that matches!” some might say, but in a society founded on the presumption of innocence, that risk is a conscious trade-off we make.


Yes, it is a database of hashes, but I don't know if the hashes are public information per se, although I am sure copies of it are floating around. But I am referring to the images that the hashes are generated from. There is no verification of these that I know of. No one would want to do that, and if you did you might be breaking the law.

The law requires companies like Google and Apple to report when they find CSAM, and as far as I can tell they would generate hashes and add them to this database if new material is found.

I don't know if there is any oversight of this. It's all done behind closed doors, so you just have to trust that the people creating the hashes aren't doing anything nefarious or mistaken. And that's a separate point from what others have said on here: that you should be able to trust the devices you own not to be informants against you.


Not an American, but shouldn't you be able to FOIA this?


The NCMEC, which manages the CSAM database, is a private organization.


Wait, so two American companies, Apple and NCMEC, are working together to install spyware on all Apple devices world-wide, with no government involvement?



I'm sure they'd argue until they're blue in the face about how it's not spyware, but... yes.


Not quite: they're an NGO, and the CyberTipline it operates (the database Apple will use) was established by S.1738 [the PROTECT Our Children Act of 2008]. They get an appropriation from Congress to run it and work with law enforcement.

It's kind of like the PCAOB... a private 501(c)(3) with congressional oversight and funding.

I think the strategy is that the organization is able to do more to help children internationally if it's not seen as part of the Justice Department and the executive branch, which, after the debacle with CBP and "kids in cages", was probably the right call.


Interesting how that shields them from any and all government transparency.


Could you say more about these legal photos? That's a pretty big difference from what I thought was contained in the DB.


If pictures are recovered alongside CSAM but are not CSAM themselves, they can be included in the database.

The thing I can publicly say is that the database is not strictly for illegal or even borderline imagery.

NCMEC tries to keep the contents, processes, and access to the database under wraps for obvious reasons.


So are we talking about images which are not actually CSAM but a reasonable person would consider to be CSAM if they encountered them? Or is it just the README.txt for whatever popular bittorrent client?


A reasonable person would call it a normal picture if they encountered it in isolation. Like I said, content that is not even borderline is included.

By both a legal and a moral standard, they're not CSAM. Not even close.

Of course, this is a minority of the content in the database. But even one such image is a gross neglect of the stated purpose, in my book.


But are these photos bundled together with other CP, and thus indicative that if you have this photo, it came from a CP collection?

Why would I have any content in my phone that would be in that database?


Imagine things like:

- A kitchen with nobody in frame.

- A couch with nobody in frame.

- Outdoor scenery with nobody in frame.

- A bathroom with nobody in frame.

Is it hard to believe you might download something like this without knowing where it came from?

I'm not talking about borderline stuff. I'm talking about content that has not even a hint of pornography or illegality.


But why am I downloading such things to begin with? Not only do these sound like very boring photos, but given their provenance I don't see a realistic pathway for them to get onto my phone.


So why is it in there and why do they care?


Why assume competence of a shadowy unelected NGO?


Law enforcement care because they want to get pinged whenever someone shares imagery of interest to an investigation, irrespective of its legality.

A person who creates CSAM likely doesn't just create CSAM all the time, right? Those innocuous pictures get lumped together with illegal content and make it into the database.

The database is a mess, basically. Of course it is. It's gigantic beyond your wildest estimates.


Maybe cropped photos, or innocent frames from videos containing abuse? Not sure what GP is referring to.


I am reminded of an infamous couch and pool that are notorious for appearing in many adult productions... Possibly stock footage of a production room or a recurring prop, intended to be subsampled so that multiple or repeated works by the same person or group can be flagged. I recall a person of interest was arrested after, of all things, posting a completely benign YouTube tutorial video. My thought at the time was that it was likely a prop match to the environment or some such within the video. The method is definitely doable. Partitioned out to every consumer device with unflinching acceptance? Yeahhhh.

Remember, these databases are essentially signature DBs, and there is no guarantee that all hashes are just doing a naive match on the entire file, or that all scans performed are fundamentally the same.

This is why I reject outright the legitimacy of any Client-based CSAM scanners. In a closed source environment, it's yet another blob, therefore an arbitrary code execution vector.

I'm sorry, but in my calculus, I'm not willing to buy into that, even for CSAM. It won't stay just filesystem scanning. It won't stay just hash matching. The fact that there's so much secrecy around ways and means implies there's likely some dynamism in what they are looking for, and with the permissions and sensors on a phone that many apps already ask for, my "not one inch" instincts are sadly firmly engaged with no signs of letting up.

I'm totally behind the fight. I'm not an idiot, though, and I know what the road to hell is paved with. Law enforcement and anti-CSAM agencies are cut a lot of slack and enjoy a lot of unquestioning acceptance by the populace. In my book, this warrants more scrutiny and caution, not less. The rash of inconvenient people being called out as having CSAM found on their hard drives, reported in the media with no additional context, indicates the CSAM definition is being wielded in a manner that produces a great degree of political convenience.

Again, more scrutiny, not less.


Who knows what that code blob is really doing? It's cop spyware. Sometimes cops plant evidence. Maybe courts shouldn't trust it.

In principle, the OS environment could be made independently auditable - for example, by keeping undeletable signed logs.
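
Something like a hash chain plus signatures is the usual building block for that. A toy sketch, with HMAC standing in for a proper signature scheme, key management hand-waved, and the key name and log format made up for illustration:

  # Toy append-only audit log: each entry commits to the previous entry's tag,
  # so deleting or rewriting history breaks the chain on verification.
  # HMAC stands in for a real signature; key handling is hand-waved.
  import hashlib, hmac, json, time

  KEY = b"device-attestation-key"  # hypothetical; would live in secure hardware

  def append_entry(log, event):
      entry = {"time": time.time(), "event": event,
               "prev": log[-1]["mac"] if log else "genesis"}
      payload = json.dumps(entry, sort_keys=True).encode()
      entry["mac"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
      log.append(entry)

  def verify(log):
      prev = "genesis"
      for entry in log:
          body = {k: v for k, v in entry.items() if k != "mac"}
          payload = json.dumps(body, sort_keys=True).encode()
          ok = hmac.compare_digest(entry["mac"],
                                   hmac.new(KEY, payload, hashlib.sha256).hexdigest())
          if not ok or body["prev"] != prev:
              return False
          prev = entry["mac"]
      return True

An auditor holding the verification key could then detect deleted or rewritten entries, though of course that only helps if the log was written honestly in the first place.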


It's easy to imagine a collection with pictures from numerous sources. Including pictures of adults.


I would imagine it would include things like Nirvana's Nevermind album cover, or, a better example, the Scorpions' album cover for Virgin Killer.


TIL - and both are findable via a Google image search.


Why would there be legal images in the db? Do you have a source for that?


Can you give more information about this? What kind of legal images might it match?


What about pictures of your own children naked?


Since this is using a db of known images, I doubt that would be an issue. I believe the idea here is that once police raid an illegal site, they collect all of the images in a db and then want to know a list of every person who had these images saved.


But it said they use a "perceptual hash" - so it's not just looking for 1:1, byte-for-byte copies of specific photos, it's doing some kind of fuzzy matching.

This has me pretty worried - once someone has been tarred with this particular brush, it sticks.


You can’t do a byte-for-byte hash on images because a slight resize or minor edit will dramatically change the hash, without really modifying the image in a meaningful way.

But image hashes are “perceptual” in the sense that the hash changes in proportion to how much the image changes. This is how reverse image searching works, and why it works so well.
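
To make that concrete, here's a rough sketch of a simple "difference hash" (dHash) - emphatically not Apple's NeuralHash or Microsoft's PhotoDNA, whose internals aren't public, just an illustration of why a resize or minor edit barely moves a perceptual hash:

  # Minimal dHash sketch using Pillow. Not NeuralHash/PhotoDNA; it just shows
  # why small edits leave a perceptual hash mostly unchanged.
  from PIL import Image

  def dhash(path, hash_size=8):
      # Shrink to (hash_size+1) x hash_size grayscale pixels, discarding detail.
      img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
      px = list(img.getdata())
      bits = []
      for row in range(hash_size):
          for col in range(hash_size):
              # Each bit: is this pixel brighter than its right-hand neighbour?
              left = px[row * (hash_size + 1) + col]
              right = px[row * (hash_size + 1) + col + 1]
              bits.append(1 if left > right else 0)
      return bits  # 64 bits

  def hamming(a, b):
      return sum(x != y for x, y in zip(a, b))

  # hamming(dhash("photo.jpg"), dhash("photo_resized.jpg")) is typically close
  # to 0, while two unrelated photos land around 32 bits apart on average.

"Matching" then just means the distance falls below some threshold, and that threshold is exactly where the false positives people are worried about come from.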


Sure, I get how it works, but I feel like false positives are inevitable with this approach. That wouldn't necessarily be an issue under normal police circumstances where they have a warrant and a real person reviews things, but it feels really dangerous here. As I mentioned, any accusations along these lines have a habit of sticking, regardless of reality - indeed, irrational FUD around the Big Three (terrorism, paedophilia and organised crime) is the only reason Apple are getting a pass for this.


There is also a threshold number of flagged pictures that has to be reached before an individual is actually classified as a "positive" match.

It is claimed that the chance of an account being falsely flagged as a positive match is one in a trillion.

> Apple says this process is more privacy mindful than scanning files in the cloud as NeuralHash only searches for known and not new child abuse imagery. Apple said that there is a one in one trillion chance of a false positive.

https://techcrunch.com/2021/08/05/apple-icloud-photos-scanni...
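
For what it's worth, the threshold is what turns a merely-small per-image error rate into a vanishingly small per-account one. A back-of-the-envelope sketch, where the per-image rate, library size, and threshold are all made-up numbers, since Apple hasn't published them:

  # Rough sketch: chance an account crosses the threshold purely by accident,
  # treating per-image false matches as rare independent events (Poisson
  # approximation). All numbers below are illustrative, not Apple's.
  from math import exp, factorial

  def p_account_flagged(n_photos, p_image, threshold, terms=100):
      lam = n_photos * p_image          # expected false matches in the library
      return sum(exp(-lam) * lam**k / factorial(k)
                 for k in range(threshold, threshold + terms))

  # 10,000 photos, a 1-in-a-million per-image false match rate, threshold of 10:
  print(p_account_flagged(10_000, 1e-6, 10))   # on the order of 1e-27

Of course, that arithmetic assumes the per-image matches really are independent and the per-image rate is known, neither of which anyone outside Apple can verify.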


This isn't CSAM or illegal, nor would it ever end up in a database. Speaking generally, content has to be sexualized or have a sexual purpose to be illegal. Simple nudity does not count inherently.


> Simple nudity does not count inherently.

That’s not entirely true. If a police officer finds you in possession of a quantity of CP, especially of multiple different children, you’ll at least be brought in for questioning if not arrested/tried/convicted, whether the images were sexualized or not.

> nor would it ever end up in a database

That’s a bold blanket statement coming from someone who correctly argued that NCMEC’s database has issues (as in, I know your previous claim is true because I’ve seen false positives for completely innocent images, both legally and morally). That said, with the number of photos accidentally shared online (or hacked), saying that GP’s scenario could never end up in a database seems a bit off the mark. It’s very unlikely, as the sibling commenter said, but still possible.


>That’s not entirely true

That's why I said it's not inherently illegal. Of course, if you have a folder called "porn" that is full of naked children, it modifies the context and therefore the classification. But if it's in a folder called "Beach Holiday 2019", it's not illegal nor really a moral problem. I'm dramatically over-simplifying, of course. "It depends" all the way down.

>That’s a bold blanket statement

You're right, I shouldn't have been so broad. It's possible but unlikely, especially if it's not shared on social media.

It reinforces my original point, however, because I can easily see a case where a totally voluntary nudist family who posts to social media gets caught up in a damaging investigation because of this. If their pictures end up in the possession of unsavory people and get lumped into NCMEC's database, then it's entirely possible they get flagged dozens or hundreds of times and get referred to the police. Edge case, but a family is still destroyed over it. Some wrongfully accused people have their names tarnished permanently.

This kind of policy will lead to innocent people getting dragged through the mud. For that reason alone, this is a bad idea.


> But, if it's in a folder called "Beach Holiday 2019", it's not illegal nor really morally a problem.

With all due respect, please please stop making broad blanket statements like this. I'm far from a LEO/lawyer, yet I can think of at least a dozen ways a folder named that could be illegal and/or immoral.

> This kind of policy will lead to innocent people getting dragged through the mud. For that reason alone, this is a bad idea.

Completely agree.


I believe both you and the other poster, but I still haven't seen anyone give an example of a false positive match they've observed. Was it an actual image of a person? Were they clothed? etc.

It's very concerning if the fuzzy hash is too fuzzy, but I'm curious to know just how fuzzy it is.


> Was it an actual image of a person? Were they clothed?

Some of the false positives were of people, others weren’t. It’s not that the hashing function itself was problematic, but that the database of hashes included hashes which weren’t of CP content, as the chance of a collision was way lower than the false positive rate (my guess is it was “data entry” type mistakes by NCMEC, but I have no proof to back up that theory). I made it a point to never personally see any content which matched against NCMEC’s database until it was deemed “safe”, as I didn’t want anything to do with it (both from a disgust perspective and from a legal-risk perspective), but I had coworkers who had to investigate every match, and I felt so bad for them.

In the case of PhotoDNA, the hash is conceptually similar to an MD5 or a SHA1 hash of the file. The difference between PhotoDNA and your normal hash functions is that it’s not an exact hash of the raw bytes, but rather more like a hash of the “visual representation” of the image. When we were doing the initial implementation/rollout (I think late 2013ish), I was curious and did a bunch of testing to see how much I could vary a test image and still have the hash match. Resizes or crops (unless drastic) would almost always come back within the fuzziness window we were using. Overlaying some text or a basic shape (like a frame) would also often match. I then used Photoshop to tweak color/contrast/white balance/brightness/etc., and that’s where it started getting hit or miss.
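
From the sound of it, the "fuzziness window" is just a distance threshold: a candidate hash is compared against the database and anything closer than N counts as a match. A rough sketch of that idea, where the 64-bit hashes, Hamming distance, and cutoff of 10 bits are my own illustrative assumptions, not PhotoDNA's actual format:

  # Sketch of threshold matching against a database of known hashes. The hash
  # size, distance metric, and cutoff are illustrative, not PhotoDNA's.
  def hamming64(a: int, b: int) -> int:
      return bin(a ^ b).count("1")      # number of differing bits

  def matches(candidate: int, known_hashes: list, max_distance: int = 10):
      # Everything within the "fuzziness window" of the candidate is a hit.
      return [h for h in known_hashes if hamming64(candidate, h) <= max_distance]

Tighten the window and you miss recompressed or lightly edited copies; loosen it and unrelated images start matching, which is exactly the trade-off behind the false positives being discussed here.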


There are examples (from the OP, but in reply tweets) in the submission.


Unless I'm missing something, those are just theoretical examples of how one could potentially deliberately try to find hash collisions, using a different, simpler perceptual hash function: https://twitter.com/matthew_d_green/status/14230842449522892...

So, it's theoretical, it's a different algorithm, and it's a case where someone is specifically trying to find collisions via machine learning. (Perhaps by "reversing" the hash back to something similar to the original content.)

The two above posters claim that they saw cases where there was a false positive match from the actual official CSAM hash algorithm on some benign files that happened to be on a hard drive; not something deliberately crafted to collide with any hashes.


You're not missing something, but you're not likely to get real examples. As I understand it, the algorithm and database are private, and the posters above are just guardedly commenting with (claimed) insider knowledge, so they're not likely to want to leak examples (and it's not just that it's private, but consider the supposed contents... Would you really want to be the one saying "but it isn't, look"? Would you trust someone who did, and follow such a link to see for yourself?)


To be clear, I definitely didn't want examples in terms of links to the actual content. Just a general description. Like, was a beach ball misclassified as a heinous crime, or was it perfectly legal consensual porn with adults that was misclassified, or was it something that even a human could potentially mistake for CSAM. Or something else entirely.

I understand it seems like they don't want to give examples, perhaps due to professional or legal reasons, and I can respect that. But I also think that information is very important if they're trying to argue a side of the debate.


> Just a general description.

I gave that above in a sibling thread.

> I understand it seems like they don't want to give examples, perhaps due to professional or legal reasons, and I can respect that.

In my case, it’s been 7 years, so I’m not confident enough in my memory to give a detailed description of each false positive. All I can say is that the false-positive photos that included people showed them very obviously fully clothed and doing something normal, or the photo was of something completely innocuous altogether (I seem to remember an example of the latter being the Windows XP green-field stock desktop wallpaper, but I’m not positive on that).


NCMEC is a private organization created by the U.S. Government, funded by the U.S. Government, operating with no constitutional scrutiny and no oversight or accountability, and open to being prodded by the U.S. Government, and they tell you to "trust them".


To be fair the Twitter thread says (emphasis mine) "These tools will allow Apple to scan your iPhone photos for photos that match a specific perceptual hash, and report them to Apple servers if too many appear."

I don't know what the cutoff is, but it doesn't sound like they believe that possession of a single photo in the database is inherently illegal. That doesn't mean this is overall a good idea. It simply weakens your specific argument about occasional false positives.


I hardly trust a cutoff number to be the arbiter of privacy or justice.



