Ask HN: How do I keep child porn out of my site?
252 points by VexedSiteOwner on July 11, 2015 | 186 comments
(Pardon this disturbing subject interfering with your Friday night rest, and my (very necessary) throwaway account.)

A year or two ago I started an image sharing site that's been modestly successful in terms of traffic (a blessing). No money or fame, but it's nice to see movement.

I try to filter user uploads to at least classify the sexual stuff (80% of it) as nsfw and feature good stuff on the homepage. This is excruciatingly time consuming with 6000 galleries posted per day, but I suffer through it as I can.

Sadly, I've noticed a huge amount of extremely taboo photos on the site. From rape and bdsm, which I can kind of tolerate, all the way to extreme child porn. The latter is extremely disturbing.

Amazingly, these people post this openly.

I never see the press talking about the nsfw side of Youtube, Tumblr, Reddit, Imgur, and others. How do those sites deal with this problem? What kind of content filtering systems do they use to keep the visible parts of the site clean? How many interns are flagging photos all day long? Is it wise to allow these pages to be indexed? What's my legal burden under Safe Harbor?

And, more importantly, how does the organic traffic in the nsfw sections play into the strategy of these huge user-generated content sites?

NB. I've attempted to build user profiles and a kind of self-moderation system, akin to how Reddit flagging works, but my users seem to be mostly interested in "one thing," and no community-focused members have emerged so far. I still have hope, but need a solution that I can use now.



If you're in the United States you should call the National Center for Missing & Exploited Children[1]. They already work with internet service providers to help identify unencrypted images depicting abuse transported over their network. They do this, I think, at an automated level. They should have the information you need. You should probably also call the FBI.

http://www.missingkids.com/Contact


But, should he/she contact legal counsel prior to contacting the FBI or anyone else? Personally, I think I would want to understand my potential culpability and other factors here.


You should definitely consult legal counsel before and during talking to the authorities (which you should also do). The laws surrounding CP in particular are outdated and do not fit well into the digital world. For example, simply looking at CP can be a crime, which can make it difficult to report unless you know the right words to say. Always consult counsel in these cases.


Definitely talk to legal counsel, but I can already tell you what the FBI told me when I asked them about this exact situation in a hypothetical:

"You didn't ask for it or seek it out did you? Someone else uploaded it to your server and you don't want it? Report it to us, then delete it once we've collected our evidence."

That's probably representative of the average agent's disposition, but make sure your ass is covered first.


> Report it to us, then delete it once we've collected our evidence.

This seems very ominous.


Why does that seem ominous? Genuine question. What would be a non-ominous response from the FBI in this situation?


Part of what makes it ominous is that the agent is too casually requesting that the host open the gates, and suggesting that the host has zero risk simply because he/she states his/her innocence.

This, when it seems pretty obvious that they'd have to do some investigation of the host, if only to rule out his/her degree of involvement.


Yes, you're right, like I said in my other reply, I'd assumed the parent poster's engagement with the agent had been trimmed down for the sake of the story, that there was more substance they left out because that was the overall thrust of the story.

And I think you're also right that they'd do some investigation, but I imagine the investigation would be over very quickly. This "people are uploading filth to my website" situation isn't as uncommon as it once was (back in my day, sonny!!).

The chance that that individual is trying to buddy up to the FBI in order to escape detection is more of a Hollywood fantasy than how real life would play out. These agents are human beings too, and they know that if someone's coming to them to ask for help, they're all on the same page.


>I imagine the investigation would be over very quickly

You're most likely correct. I'd add that even with the agents being human beings, there may still be some protocol that they are compelled to follow in vetting the host. I'd want to know that going in.

That's still not to say that this would end badly for the host. It's just that, given that the tone of the response is correct (even if shortened), there is clearly more involved. And, when you have a representative of an agency with pretty broad powers, deep resources, and potential mandates soft-pedaling what's at stake, it can have a pretty ominous feel to it.


Something written from the FBI, like a receipt that the FBI has received the evidence and it is therefore of no consequence if the files are deleted. Something that will keep him out of jail if by chance another unrelated law enforcement agency happened to be investigating and became upset or suspicious because evidence was disappearing.


Ok, fair enough. I assumed that that was a somewhat trimmed-down version of the actual discussion, for the sake of telling their story briefly.

I didn't imagine that the FBI would literally have a 30-second conversation with someone who claimed to have child porn in their possession, with no actual follow-up or action steps discussed.

But I can see how that would seem problematic if you did take it literally.


"Collecting evidence" seems extremely broad. What is the evidence collection process? Access granted to servers, wire sniffing, seizing of hardware? How long will that process take? What recourse is available should the FBI seize hardware?

I know the parent commenter said they would speak with a lawyer, I just wouldn't take comfort in a casual remark by an FBI agent.


not to me. ominous would be "we'll delete it once we've collected our evidence."


Never never trust the FBI. That's how they nailed DotCom, instructing him to let them collect 'evidence' against someone else, then they used it against him. Seems like you're in a hornet's nest, and you're not even making any money for your trouble.


>This seems very ominous.

It does. Likewise with the casual questioning of guilt and suggestion that his/her answers will simply be taken at face value.


Why would they self report it then?


You sound like someone that expects human beings in positions of authority to operate rationally.


Guilty people also self-report crimes to which they are obviously close and for which they assume they'd fall under suspicion anyway. A fairly heavily trafficked, public website featuring CP images may well already be under investigation.

In any case, self-reporting does not absolve one of suspicion; else it'd be easy to get away with pretty much anything.


One might self-report if they were involved, suspected they were under investigation, and wanted to give the appearance that their involvement was unwitting.


Exactly. Beyond my own liability, the other question I would want answered is whether I could potentially be compelled to cooperate in some long-term investigation. If so, then what could that mean in terms of time and expense, and is it worth it?


The alternatives are

- shut it down right now

- take a 'wait and see' approach. Then one day during the course of a bigger investigation they find that your server is hosting CP. Also, you apparently knew about it and didn't do anything (admittedly proving this will be quite difficult and unlikely, but still). In that case, they'll come down on you like a hammer.

Better be proactive. And if you're paranoid about the feds jailing an infrastructure provider who actively came to them asking for help (do you have any examples of where this happened? even just an investigation?), then all you have left is option 1.


The alternative I'm suggesting is to retain legal counsel to determine what my actual alternatives are and their associated potential costs/risks.


You are not seriously asking whether it's worth it to help law enforcement stop child abuse?


Yes, I am.

Volunteering to help stop child abuse and being compelled to participate in an investigation of unknown depth, breadth, duration, and resource burden (time, money, etc.) to you are two completely different things.

If you've never been involved in litigation or other legal situation wherein you couldn't just stop the process whenever you chose to, it might be more difficult to imagine the stress and costs involved, as well as the loss of control over one's own life.

It's nice to think that it's worth it at any cost (and at the sacrifice of one's other life responsibilities). But, of course, given that one can volunteer to make such a sacrifice without waiting to be compelled by a police investigation, then anyone who has not already chosen to do so might be wise to consider whether it's really a manner in which they can afford to help.

(EDIT: conciseness)


> "potentially be compelled to cooperate"

In the US, you're likely to be left alone with all the associated costs. Help the cops all right, but if I do their work for them, I don't want to bear all the costs.


Talking to cops is a bad idea. I'd only do that if I had to and even then I'd minimize the exposure: https://medium.com/human-parts/good-samaritan-backfire-9f53e...

Also, the abuse already happened, you are only stopping the dumber CP collectors from sharing images of it.


That abuse has already happened sure but it will probably continue. You want to follow any trace you can find to suppliers. Shutting down demand might also help in eliminating any economic incentives that might exist on the supply side.


> You are not seriously asking whether it's worth it to help law enforcement stop child abuse?

How about you donate all your time, money and resources to stopping child abuse.


Nobody is talking about donating all time, money and resources of anyone towards that goal.

In any case, if you create a platform (which you possibly profit from) that is used to distribute child pornography, you are faced with restrictions that the rest of the public understandably isn't.

I see no reason OP shouldn't be legally compelled to cooperate in an investigation in at least the same way a witness can be.


> Nobody is talking about donating all time, money and resources of anyone towards that goal.

That is exactly what you implied by mocking the fact that the OP even asked the question.

> I see no reason OP shouldn't be legally compelled to cooperate in an investigation in at least the same way a witness can be.

This is irrelevant to the question of whether the OP should look out for personal interests. You implied that one should not be so selfish as to even ask the questions about personal risks and costs. You mocked the very idea that he might ask such a question.

Before you make such a callous and judgmental comment, you should really think about what you would yourself sacrifice to cooperate with law enforcement. If you had, you would not have been so myopic. You wouldn't question the seriousness of the question, even if you still thought that cooperation was necessary.


You say that as if the cost of helping is near zero. What if authorities decided that your hobby project's server was interesting to their investigation, and subsequently showed up at your home with a warrant to seize every electronic device in your home/business, including the server that hosts your business, as well as unrelated things like cell phones, video game consoles, etc?


So instead of going to the authorities directly, describing the situation and offering to work together you propose sticking your head in the sand, trying to deal with the problem on your own and hoping the authorities won't ever come across child pornography on your site?

Especially at a scale where you need automated systems to deal with the problem, law enforcement will inevitably notice sooner or later. I can't help but feel that it's not going to go over well with them (and it shouldn't), if they notice you deleted that content and possibly destroyed evidence in the process.

Technology companies and law enforcement have cooperated on this issue for a long time very successfully. They have experienced people working on nothing but this kind of thing and you're not going to deal with some local low level idiot that barely manages to deal with noise complaints. There is no reason to be paranoid and to believe they are going to act stupid.


Is there an automatic script to thumbnail images? If so, that simply multiplies the number of problem images in your data store.


You should report any child porn to the CyberTipline, run by NCMEC: https://report.cybertip.org/index.htm

NCMEC has protocols around how to report the images/video, and how to delete it on your end.

I would highly recommend against calling the FBI. You should work with NCMEC, as they have experience working with this stuff and their CyberTipline is one of the major ways that Congress has mandated that online service providers should report this stuff. Plus talking to law enforcement employed by the federal government has a host of risks associated with it:

https://en.wikipedia.org/wiki/Making_false_statements


Microsoft made an automated system (PhotoDNA) for detecting known child pornography images available to the public a few years ago and it's probably a good starting point: http://www.microsoft.com/en-us/PhotoDNA/

Hopefully this can help you.

(Disclosure: I work at Microsoft but not on PhotoDNA.)


PhotoDNA is the gold standard for this. I tried to get access to this via the NCMEC to use with Neocities, but the process was, frankly, very convoluted. I signed at least 10 forms and still didn't end up getting what I needed.

I'm happy that Microsoft is providing this as a free service. It's going to be a lot less painful for me to use it than to figure out how to run my own (or in this case, figure out how to even get it).


Update: I just tried to get it to work and surprise! It doesn't work.

Somebody please just give me access to the PhotoDNA code, the hashes, and a little funding. I'll make an API anybody can use for this. It's ridiculous how hard it is to do this. It's still easier for people to get spam IP lists than to see if CP is being uploaded to their servers. You can't just have it available for Facebook and Google or it doesn't work, you need to make it available to everybody in an easy, simple way.

Seriously, if you are connected with this at all or want to fund this work please email me, I am more than happy to work on improving this: kyle@neocities.org.


Contact this person. He manages Microsoft's Safety Services division, which includes PhotoDNA.

https://www.linkedin.com/pub/john-scarrow/6/2b8/354


Would you like to describe what did not work? What was your experience? Please blog about this!


On the other hand, if everyone uses the same non-transparent list of magic hashes to ban hosting images then censorship potentially becomes a concern.


If non-CP images start being blocked by this system, some gallery author is going to notice it pretty quickly and report it to the website owner. That kind of censorship would quickly destroy trust in the CP filter, and I doubt people at Microsoft would fail to predict this.


Yahoo, as well as Tumblr, uses an internal service that is synced with PhotoDNA. Content is also filtered by outsourced help. There is some company that employs people in a Southeast Asian country to provide this service to many Internet companies (including Yahoo). (I work at Yahoo and discussed this with the team here.)


what company? Please ask for the name. Thanks!



Do you know any other software like this that is open-source? I posted a thread a few days ago about image comparing software like google reverse image search. Similar to OP, I wanted to index a few popular image boards and make sure that no one had tried to post unauthorized photos on them.

edit: I should clarify not child porn, but personal photos such as instagram and facebook which are private/semi-private and then are posted to public forums.


There's a system called IQDB used for various 'booru' websites. It's open sourced and available here: http://iqdb.org/code/

Really though it's not too hard to whip something up yourself. I did it for a bunch of those 'booru' sites (roughly 3 million images) like this (rough sketch after the list):

- Find image hashing library (I used https://github.com/JohannesBuchner/imagehash but there's a nice series of articles here http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-I... if you want to implement your own)

- Build a database of image hashes using said library

- Use an algorithm that allows you to look up hashes by distance. In the case of Hamming distance (used by many image hashes) you can just throw them in MySQL. You could also use any of the nearest-neighbour search algorithms like k Nearest Neighbours or locality sensitive hashing (you'd want one of these for larger datasets)
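
Roughly, the core of it looks like this in Python with the imagehash library mentioned above (a from-memory sketch; file names and the distance threshold are illustrative, and the linear scan only makes sense for small sets - use a BK-tree or LSH for millions of images):

  import imagehash
  from PIL import Image

  def hash_image(path):
      # phash/dhash both return a 64-bit perceptual hash
      return imagehash.phash(Image.open(path))

  def is_near_duplicate(candidate, known_hashes, max_distance=8):
      # Hamming distance = number of differing bits; imagehash
      # overloads subtraction to compute it.
      return any(candidate - h <= max_distance for h in known_hashes)

  # Build the database once from images you've already flagged...
  known = [hash_image(p) for p in ("flagged1.jpg", "flagged2.jpg")]

  # ...then screen each new upload against it.
  if is_near_duplicate(hash_image("new_upload.jpg"), known):
      print("matches a previously flagged image; hold for review")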


Why does one need to sign in to Azure to help fight CP? Why are the hashes not available via a public API so that any webmaster could just use it right now? Please explain.


Why would it even need an API? Just provide the hashes.

What are they afraid of? That "pedophile hackers" will be able to reverse the hashes and get the images?


The service is more than a hash matching service. It hashes different regions of the image, allowing it to match images that have been altered.

Authenticating access to the service is desirable for many obvious reasons.


>Authenticating access to the service is desirable for many obvious reasons.

Help me out with the obviousness please? Are those reasons more important than deleting child pornography from the web?


Off the top of my head: DoS, providing perverts with confirmation that an image is what they want it to be, giving organized groups intel that images are known to law enforcement, etc.


Perhaps to avoid people figuring out how to evade it?


Just compressing the images into an archive, if not encrypting them, is enough to evade such a filter.

There are a lot more legitimate uses for a public repository of CP hashes along with free software for verifying them locally. Not only entrepreneurs and online community operators who don't want the stuff on display, but also users of poorly moderated online communities who don't want the stuff in their browser caches.


One vendor selling an automatic child porn filter using data from INTERPOL is https://www.netclean.com/ . It also uses Microsoft PhotoDNA technology.


Interesting. I had not known about that service and it's cool that it's free (though I'm not sure if it's always free; it says to qualified applicants).

I am curious how one goes about developing a service like this without having to see child porn itself. Is there a database somewhere with known hashes? I'm assuming there would have to be, along with a way to generate test hashes yourself, as I couldn't imagine running automated unit testing with real child porn.


AFAIK the database of hashes is maintained by the National Center for Missing and Exploited Children, who unfortunately do have to deal with the disheartening task of viewing some of that stuff.


I was going to suggest a convolutional NN, but you'd need to go through the gruelling task of creating the training corpus.


Are you based in the US? There's a good chance you are required by law to report images of apparent child pornography. You should talk to a lawyer.

https://www.law.cornell.edu/uscode/text/18/2258A

http://www.ncsl.org/research/telecommunications-and-informat...


You may be interested in this article:

The Laborers Who Keep Dick Pics and Beheadings Out of Your Facebook Feed http://www.wired.com/2014/10/content-moderation/


I came here to share this article.

Anyway, I think your best bet is to outsource this kind of work to the sort of company described in the article. It seems to be a regrettable necessity for any sizable user-generated content site.

Also, of course, please try and get in touch with the relevant authorities mentioned in other comments and assist their efforts in tracking users who try distributing that kind of... content.


By default, any site that allows users to share content will devolve toward an attractive nuisance [0]. Like any security issue, passive measures are a Maginot Line awaiting blitzkrieg; even all the resources of a Google or Facebook aren't enough to automate all these things... they depend on communities to report issues [e.g. webmasters for Google]. And that's the only defense in depth: community.

"Everybody who signs up" isn't a community. There has to be some higher order interest...and what you're finding is that unfortunately the higher order interest of the community for your site is child porn.

There's no fixing DNS. If child porn is not what you want, your site is broken. Shut it down. The sort of users you want either don't care enough to keep out the bad or are overwhelmed by its volume just as you are. They are or will be moving on. You have my sympathies.

Yeah it sucks but you have learned some things:

  1. Community is the hard part.
  2. Technology is necessary but not sufficient.
  3. You can build something that scales to the point where
     it becomes useful to a community.
Consider this version 0.1. You've gotten feedback, and it says that the product (not the code) has failed by your definition of "fail" because it has not attracted the market segment you want. You have a platform from which to relaunch.

Good luck.

[0] https://en.wikipedia.org/wiki/Attractive_nuisance_doctrine


Until there is a Machine Learning algorithm that can detect CP, you'll have to have human beings flag it and then other human beings view it and remove it.

Someone brought it to my attention that Bing's cache is full of CP; after the offending websites are taken down, Bing keeps the images for a long time. The Rapidshare sites are also full of it, and they password-protect RAR files so admins cannot peek into them. It is a major problem that has no solution yet. People run Wordpress blogs and spambots leave comments that link to CP sites.

This has become a hot topic issue because that Jared guy from Subway had a manager of his foundation that was found with CP, and they raided Jared's computers and found more evidence.

My ethics and morals won't allow me to look at porn, but it is a big industry. There are all kinds of porn out there. The CP is the worst of it, and a lot of children are trafficked as sex slaves for it. They grow up with a criminal record and sex offender record, and by the time they expunge the record they are in their 40s and can't find work. I was contacted by a woman who was in that situation on Github during the Opal CoC debates. She is trying to get out of her situation by programming and cannot find work because of it.

This CP stuff ruins the lives of the children who suffer abuses for it. Once they grow up they have a hard time in life trying to make ends meet. Some have serious psychological problems that are hard to treat and deal with.

I remember that in some cases the website is found responsible for the content that users post on their websites. Laws in your nation may vary on that. If you find illegal content you should remove it, lest you be found liable for it. Make sure to report the IP address of the poster to the government or a non-government agency that handles it.


Are there any good ML algorithms for detecting porn at all? I tried to implement the standard "pink detector" with mixed results.
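
For reference, what I tried is roughly this: count the fraction of pixels falling inside a skin-tone region of RGB space. A toy sketch with Pillow (the thresholds are a commonly cited RGB skin rule; the 0.4 cutoff is arbitrary):

  from PIL import Image

  def skin_ratio(path):
      img = Image.open(path).convert("RGB")
      def is_skin(rgb):
          r, g, b = rgb
          # Crude skin-tone rule; fooled by lighting, sand, wood, etc.
          return (r > 95 and g > 40 and b > 20 and
                  max(rgb) - min(rgb) > 15 and
                  abs(r - g) > 15 and r > g and r > b)
      pixels = list(img.getdata())
      return sum(map(is_skin, pixels)) / float(len(pixels))

  if skin_ratio("upload.jpg") > 0.4:
      print("possible NSFW; queue for human review")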


No. I looked into this a lot for a dating app I ran, and no algorithms came close to human moderation (which can get expensive), even for images you'd consider obviously pornographic.

A funny idea I had was to reverse the whole system - feed UGC content of a site that's supposed to be SFW into a porn site which is definitely NSFW, one that has lots of thumbnails. The ones that don't get any clicks to enlarge probably aren't porn and can pass the test :)


That's brilliant!


Until you go searching for porn and every other picture is corn on the cob or doorknobs.



ML is very difficult for porn. You end up with so many false positives it almost becomes useless compared to human moderation. "Porn" is also a very context sensitive term. What is porn vs. nude art? Is it still porn if the people in it aren't "pink"? What about cultural differences? What is considered inappropriate in the US may not be inappropriate in Europe, etc. How much skin has to be shown for it to be porn? There are so many questions. I'm sure detecting porn accurately will be one of the hardest problems we'll overcome in computer vision, as outrageous as that sounds - because of the level of context that is required.

See also: SFW porn - where other images are superimposed on top of real porn. It's hilarious, but is it really SFW? You decide. http://www.reddit.com/r/sfwporn


ML is very limited. It tagged African-American people as gorillas, for example, and white people as dogs or goats. The African-American people were more outraged because being called a gorilla is racist, even if the ML doesn't know what race is. Facial recognition is also hard; the ML confuses some people for others who have similar faces but are different people. ML just isn't at the level where it is reliable enough to do what we ask of it yet.



You'd think HN would spot a market opportunity like that and exploit it. Good programmers with unfair criminal records at below market rates?


Your safest bet is running a system where you have no way of knowing what users upload. Depending on jurisdiction, reviewing and moderating content may increase your civil and/or criminal liability. There's typically a "safe harbor" for service providers. You just need to respond to LEA and DMCA takedown requests.

Edit: Other advantages: 1) you never risk viewing stuff that you can't unsee; and 2) you outsource content review to concerned users and other third parties.


This is bullshit. http://www.niemanlab.org/2009/01/david-ardia-why-news-orgs-c...

> They say “the lawyers” tell them they can’t edit out an obscenity or remove a rude or abusive post without bringing massive legal liability upon themselves [...] That’s not true, and hasn’t been true since 1996.


I'm not saying that the site can't remove CP. But I was arguing that it's safest to base decisions about removal on input from users and third parties. PhotoDNA etc would also be good resources. The site operator should be responsible for the algorithm, and not for particular outcomes.


Safest in what regard? If you argue on Chapter 110 grounds then I will tell you I am not a lawyer but my understanding is that you need to report CP (2258A) but beyond that you are covered (2258B). If you argue on free speech grounds then parent post applies.


Yes, one would need to take steps to identify CP, and then to report and remove it. But the law is unsettled. It would seem best to be plausibly proactive, and yet to distance oneself from particular choices. But hey, I'm not a lawyer either. Maybe I'm too immersed in the world of VPN services and Tor, where providers avoid case-by-case filtering.


Wouldn't this result in providing a service for pedophiles or sick people to propagate and amplify their voice?


As long as reported images were removed, it wouldn't.


This is the best answer so far. While some jurisdictions protect web admins from the actions of their users, others don't. So if you don't want to be extradited to some fascist state, you had better make sure that you can prove you have no ability to moderate or even know what content is being uploaded.


Doesn't there need to be a "report abuse" button on each image? Users could report it and only then would he use something like Microsoft's PhotoDNA (which a user mentioned above).


Only a qualified lawyer can say.

But then what? Look at positives, and decide? Or just forward all positives to LEA, let them decide, and nuke what they indicate? LEA probably wouldn't like that.


A user above mentioned Microsoft's PhotoDNA. He could run the flagged images through that first, and if it returns nothing, maybe look at it.

Or don't look at it until a certain number of users flag it but still run it through the PhotoDNA. Now I am curious how imgur handles this problem.


Or use all of those CP databases, and just report images flagged by some subset of them.

I am curious how this gets handled. Having been goatse'd a few times with CP, I cannot imagine reviewing that crap on a regular basis.


I used to work for an also-ran social network (20m users), and this was a big problem for them too, particularly when they found that they were a popular option for sharing CP. When I say a big problem, it's really an existential threat for any kind of user generated content sharing site.

Yes, you need a way of finding and flagging this stuff. Algorithms help, but people always need to be involved, and that's problematic. It can be hard to find people that want to be exposed to this material as their full-time job, and it's a liability headache. Even if some employees are ok with being exposed to it as part of their jobs, other employees might have a legitimate expectation of not having to be exposed at their workplace, and it's difficult to contain.

Yes, you will need to develop a relationship with law enforcement. They have a number of programs for submitting evidence, they're actually quite easy to use, and they are cooperative if you follow their rules. Even so, it's time consuming, and if you don't maintain a good relationship and comply fully, then you can become a target for enforcement.

You say you've become moderately successful in terms of traffic, but there's a big proportion of dubious content. Frankly, this means that certain people have noticed that your site is not as good at identifying, flagging, and reporting this content, so they're gravitating to you, having been kicked out of facebook, etc. That's fine in the short term; in the long term it's unsustainable from a business and legal perspective. Either you'll need to devote more resources to fighting this (instead of development, marketing, more interesting things) and find a way to attract more legitimate users, or you will become the next attractive target for legal issues.

This is not a simple problem that can be solved with mechanical turk, an algorithm, etc. It's a never-ending game of cat and mouse, walls and ladders, and a fundamental problem to be dealt with on any site that allows sharing. It's not just sexual stuff; there's also copyright - the music and movie industries are pretty keen on finding targets too.

It might be feasible to compete with facebook on product, or popularity with niche audiences, but competing with them on their ability to keep bad content off their site so that it's palatable for a wide audience is a lot harder. That's their core business, and they employ a lot of humans to make it work.


This is just an idea, since I never built such a filter, but you could automate a large part of filtering NSFW images. A quick search on Google led to this paper: http://cs229.stanford.edu/proj2005/HabisKrsmanovic-ExplicitI... Once you have that in place, I guess it's better to make it aggressive and report false positives as NSFW.

Google's "safe image search" has the additional help of searching the content of the page the image is used on. You might be able to do the same, up to some limit, by checking the HTTP referer header field to know where requests are coming from. You could scan the referer's page for some keywords (sketch below). This might give you a better idea of the context where the image is used. Note that this might be tricky, since you probably don't want traffic coming out of your server to some child porn site.
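
To illustrate the referer idea, a sketch only (the keyword list is hypothetical, and the fetch should happen from a separate worker or proxy for the reason above):

  import requests

  SUSPECT_KEYWORDS = ("keyword1", "keyword2")  # hypothetical list

  def referer_looks_suspect(referer_url):
      # Run this out-of-band (queue/worker, ideally via a proxy) so
      # your web server never originates requests to such sites.
      try:
          page = requests.get(referer_url, timeout=5).text.lower()
      except requests.RequestException:
          return False
      return any(kw in page for kw in SUSPECT_KEYWORDS)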

That said, those are just some ideas. Youtube has a good community that flags videos, but also an army of reviewers that look at the flagged content.

http://mobile.nytimes.com/2010/07/19/technology/19screen.htm...

Another way to look at it would be to try to manually select some images as "front page worthy", instead of trying to filter the bad stuff.


Other posters are correct; you are obligated under US law to report child porn to the NCMEC CyberTipline.

The fine for not complying started off fairly low, but has been increased in subsequent legislation. In my experience though, NCMEC is mostly just interested in getting regular reports uploaded to their system. I met with them once, and they have a rough sense for how many reports should be sent over for a site of a certain activity/traffic level, and if the number of reports is zero... then they know you're not in compliance.

Their reporting interface is beyond awful though. Maybe they've improved it in recent years; when I last saw it, everything had to be uploaded and reported manually.


Would it be enough to report content flagged by other users as CP? Is there a requirement to review content before submission? That's not something I'd ever want to do. I don't think that I'd want to pay someone else to do it either.


As I understand it, there's no requirement to review content before submission. My understanding is that once someone has flagged or reported it to you, you're mandated to report it.


As a father this horrifies me. If this were my site, hobby or not, I would spend a great deal of time implementing a system to report to the authorities. Just imagine how good it would feel if you did manage to help/save just one kid.

The PhotoDNA API looks absolutely brilliant. That is one reason why I love Bill Gates: he always (or a lot of the time) gets involved with projects that truly help people.

Don't think of it as a problem, it is an opportunity to help a child or a parent that may not know a relative, teacher, or stranger is hurting their child.

Many times these crimes are committed by loved ones and the children are not abducted, they are lured / tricked by people near and dear to them.


In all likelihood the photos are just stuff people copied from around the internet and not from the original abusers.


Yes. Based on what I've read, abusers typically trade with other abusers in more private ways.


So you catch these guys and follow the trail from there. Just because you don't reach your goal with your first step doesn't mean that it's not worth making that step at all.


What are you attempting to say, that the images shouldn't be filtered because they aren't from the original source?


I think the implication was that the uploaders are not the perpetrators, so reporting them is unlikely to reduce abuse. I'm not sure I buy that line of thinking. I suspect plenty of perpetrators upload in public forums.


very true, but there is still the possibility that it was copied recently and the authorities are not aware of the source. even more reason to report these people so that the authorities can reduce the amount of "hops" to find the origin.


From the experience of running a similar site:

1. Monitor only the most-viewed pages, as 99% of images nobody will ever see again - not the uploader, nor the law agencies. The page must have some traffic to be discovered. Just make a page "top 200 today" and have a look from time to time.

2. "report nsfw" button does not work. The pedophiles do not report, the rest have no chance to hit the pedo-page.

3. Almost all the pedo-uploaders use Tor. Check how many non-pedophiles use Tor and consider blocking IPs of exit nodes (or making Tor-uploaded images initially hidden until reviewed - see the sketch at the end of this comment).

4. Own your IP address or set up a relationship with the owner of your site's IP. Law enforcers send email to them (or to you and CC: to them). If "whois $YOURIP" shows not your email but, for example, abuse@digitalocean.com or abuse@azure.com, then your server has a good chance of being disconnected hours before you would know why.

5. About the big players - at least Twitter has a lot of pedo-content (my service is screenshot-oriented and I have seen many screenshots of Twitter pages with CP). "how does the organic traffic in the nsfw sections play into the strategy of these huge user-generated content sites" - very good question, I would like to know as well.

6. About the advice from comments to check images in the porn context: it does not work. SFW images are very clickable when surrounded by NSFW (think of a portrait of a celebrity in that context).

PS. My advice may look like half-measures, but it provides the same level of quality as Machine Learning or Mechanical Turk solutions (which are not complete solutions either) for a lower price.
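
For point 3, the gating could look roughly like this (the Tor Project publishes its exit-node addresses; the URL/format has changed over the years, so check their current docs, and the two helpers are placeholders):

  import requests

  def fetch_tor_exit_ips():
      # Bulk exit list; refresh periodically, e.g. hourly.
      r = requests.get("https://check.torproject.org/torbulkexitlist",
                       timeout=10)
      return set(r.text.split())

  TOR_EXITS = fetch_tor_exit_ips()

  def handle_upload(uploader_ip, image):
      if uploader_ip in TOR_EXITS:
          quarantine_until_reviewed(image)  # placeholder helper
      else:
          publish(image)                    # placeholder helper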


If there isn't already, maybe there should be some kind of public perceptual hash database (http://www.phash.org/) for this kind of stuff.


Such databases exist, but I don't think there's any that is publicly available. https://wiki.openrightsgroup.org/wiki/Indecent_image_identif... lists some, there's more.


There is a government-run database of all known child exploitation images. For obvious reasons you need to show a reasonable need for access. Contact the National Center for Missing and Exploited Children.


No need to apologize, this is not only an interesting topic and problem, but also a very good discussion.

Thanks for bringing that up. Might it be ok if I use your question to build around it and see what it is like for non-US websites?


There are already some interesting solutions posted here. If you wanted to tackle the issue with a stop-gap in the meantime, you could add an image hashing step to the upload process to identify images that have already been flagged as NSFW or worse.

dHash is fairly simple to implement, and you might even be able to offload the hash checking to the database level. Comparing dHashes is just a matter of XOR'ing the two hashes and counting the differing bits (the Hamming distance).
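
A quick sketch of that comparison (the hash values are arbitrary 64-bit examples):

  def hamming_distance(h1, h2):
      # XOR leaves a 1 bit wherever the hashes differ; count them.
      return bin(h1 ^ h2).count("1")

  # dHashes within a few bits of each other are near-duplicates.
  if hamming_distance(0x8F373714ACFCF4D0, 0x8F373714ACFCF4D1) <= 5:
      print("probable match against the flagged set")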

Obviously as the sample size increases so will the computation time. You could help the process by prioritising checks against new accounts, certain IP ranges (if you're seeing more or less content of a certain type from different countries or VPN providers) or if an account has a history of uploads in the past.

It's a horrible problem to have. Best of luck!


Not sure how useful this would be, but the first thing that came to mind is CrowdFlower[0].

[0] https://www.crowdflower.com/type-content-moderation


At least with reddit, there's community moderation (read: free employees) which enforces the content rules of each section.

is there any incentive to participate in your community?

With moderators you feed the "power tripper".

With karma you feed people obsessed with points.

This is a bit complicated: what if you had some sort of captcha that required users to classify images as nsfw/sfw/illegal?


Reddit doesn't allow users to upload images. Just links. And they ban problematic domains.


Linking to certain NSFW domains in a comment will prohibit the comment from being posted by Reddit's internal logic.


> This is a bit complicated: what if you had some sort of captcha that required users to classify images as nsfw/sfw/illegal?

If the user normally encounters photos on the site by requesting them (e.g., by entering a search query or browsing a friend's album) rather than having random photos thrown in their face (like HotOrNot.com), I would think you could run into some very upset users (and possibly legal problems) if you are throwing random photos that might contain disturbing images in their faces. I mean, if you go to a website intending to browse photos your friend took of his boat and the site throws up some random child porn on your screen, you'd be pretty annoyed, right?


no i mean something like this.

prompt a captcha when someone attempts to `upload` a file

https://pbs.twimg.com/media/CA4S4lFWEAEJUUB.png:large


Forcing your users to look through a selection of random images which could include several which may be illegal for them to view is not okay at all. Especially if you know you're putting them at that risk!


I don't think anything good can come from throwing random pictures that may very well contain child porn at random users.


This would indeed be a problem at the start, but after some time it could weed itself out. People wanting to share illegal images would go away from the site if they get rejected. We could expect a significant drop in nasty images.


You would have to expect a significant drop in traffic after the ensuing PR nightmares and FBI raids, followed by years of lawsuits from people who claim PTSD symptoms and whatnot after being forced to view CP.


I'm not sure why it never caught on with my users.

I do support Twitter login which is fairly common, and have had a few hundred users sign up (out of like >10mil uniques), but I wouldn't say that they've exhibited overly engaged behavior after doing so. They're basically the same from a stats perspective from what I've observed.

I'm afraid to make user logins compulsory, especially considering the kinds of knuckleheads that are on my site.

How do I get from "eh, have fun as a guest" to "everyone's got an id" without destroying my stats?


Why are stats so important? Do big numbers mean more than doing the morally and legally correct thing?

I also question the assumption that taking steps toward accountability and community will hurt the numbers. Maybe in the short term, but if your site becomes a cesspool it will eventually drive your real users away anyway. If you figure out how to sustainably handle the filth, it may be possible to attract a much larger audience who would otherwise have been repulsed. Yik Yak is an example of that strategy.


Another possibility is to add a service to label/tag images.

See if you could turn it into a game where people would gain karma points by properly labelling/tagging images. Make it people's choice to participate in it instead of forcing them into it with a captcha.

Your image stack would gain significant value by being labelled and searchable by label.

See https://www.cs.cmu.edu/~biglou/ESP.pdf


If you give people a way to send passworded links to cash subscribers, then none of a seller's subscribers will rat him out. If you make all images open to view, you can then appeal to people to flag items for removal, or just autoremove them with, say, 2 or 3 flags, and trust your nicer clients to police the site. If your clients all sign up with throw-aways, then load a huge block of images, each with its own password, they can sit far away and sell passwords all day and never emerge to be caught, adding a new throw-away account every day if they like. Full accountability is the answer, so all images can be tracked back to a real address and name. Sadly, only good people deal with this, but it might be a way to thin the crowd. A secret untrackable photo site will also soon attract the police as they hunt for child porn sellers, so they will sooner or later come knocking on your door.

One way is to make contact with the police and get permission to list the names of the police agencies that are allowed to inspect the site via backdoor etc. Of course this might enrage some?? So some sort of middle ground might be to quietly approach the police for advice.


Paid moderation. Putting up an open image sharing site with no security is akin to opening up a nightclub with nobody checking ids at the door and no security.

I can't tell you what to do to meet the minimum legal standard of covering your ass, but that's going to vary by jurisdiction and current whims over time. I can tell you, though, that by the time somebody has stumbled over a terrible image and reported it, they 1) will be horrified by your site and never use it again, and 2) the poster will have shared the url with everyone they wanted to and the image will already have been distributed as far as it was intended to be. If the number of terrible galleries is increasing, you're probably becoming well known within tiny circles as a convenient place to share the stuff.


People are recommending very exotic image detection solutions. But unfortunately that's an ongoing arms race between two groups of very smart people: one driven by the moral justification to stop abuse against children, and the other driven by their unfathomable lusts.

So I would rather just K.I.S.S. and put your time into maintaining a reporting system where any anonymous user can report violations.

Then put emphasis into making this system as easy as possible for you or any potential moderators. The ideal would be to have an app that would notify you of any new reports and allow you to just swipe from image to image pressing an icon if you want to delete, ban or whatever.

Edit: Such a system could eventually form the basis of a bayesian filtering system.


> ... swipe from image to image ...

I would not want that job!


A "flag content" link should be good enough. Sounds like you're US based; you're not required to manually check all user-contributed content. Set up a DMCA section and link to it at the bottom. If LEA contact you, make sure you react fast, and maybe you want to offer them an automated way of takedown if this really happens that often.

For the future: Asking legal questions without stating your jurisdiction is... not helpful. :)

Yes, big sites employ a lot of people to clean content. I remember reading an article about poor people in $third_world_country that do this all day long.


If you can locate the article, I'd love to give it a read.

I'd rather keep my money inside US borders, but it ain't cheap, and staring at the sickening contents of the internet's collective wet dream ain't exactly a high-profile career path for young folks.


Here's one: https://www.wired.com/2014/10/content-moderation/

"Hemanshu Nigam, the former chief security officer of MySpace who now runs online safety consultancy SSP Blue, estimates that the number of content moderators scrubbing the world’s social media sites, mobile apps, and cloud storage services runs to “well over 100,000”—that is, about twice the total head count of Google and nearly 14 times that of Facebook."


Why not report their ip address to the police? Hopefully people will learn that by posting cp on your site they will get flagged.


I wonder if it is possible to check whether the request came through Tor, and not report it in that case? Bringing down exit nodes with CP reports does not seem like it would hurt the right people (only very indirectly).



Supposing I was ok with directly alerting the authorities, which for some reason seems to be perhaps a bridge too far, how would I even go about that? Where's the "report pedophiles" REST API?


Why is it a bridge too far? I'd argue it's a civic duty. I'm sure the fbi has a tip line / email.

Also, post on your site that you report child porno. That should scare people off.


OP said "API", and refers to many thousands of images which need to be evaluated. It seems like he is pointing out here that this problem is unlikely to have a completely algorithmic solution, and if it does, the solution is probably not to spam the police with tens of thousands of criminal accusations against his customers that have never been vetted by a human.

Whether or not this is moral is irrelevant if it's not scalable. How about automated flagging of images for moderator review, and thinking about ways to reward or remunerate the moderators?

CV age detection should really be a thing, if it isn't already. Maybe there's an opportunity there if it's not already done, or way too hard?


> CV age detection should really be a thing, if it isn't already. Maybe there's an opportunity there if it's not already done, or way too hard?

I think this falls into the category of impossible. It is a hard enough problem for the human eye/brain. see https://en.wikipedia.org/wiki/Traci_Lords


Because making pictures -- however contrary to present sensibilities -- illegal is a bit too close to thoughtcrime?


Producers and consumers of CP are not prosecuted for their thoughts, they are prosecuted for their actions. The subjects of these images suffer real and lasting psychological and physical damage.


Once again, these images don't contain metadata stating the person's age, ages of consent differ around the world, and someone can accidentally click something they didn't mean to or reasonably expected was not CP, as almost the entire internet avidly sanitizes these images. I wouldn't go as far as to say it is "thought crime", but it can be non-obvious what is illegal. I am assuming the GP meant this, and not that it's acceptable because the images already exist and the fantasy is only in one's head, as that is not a reasonable excuse.


That argument doesn't apply here though. If someone is manually uploading an image of a child being raped to an image hosting website, there's no accident about it. Reporting someone who uploaded that image (i.e. someone who is distributing child pornography) is the only ethical and sensible option.

Now, if someone is uploading an image of their 16-year-old girlfriend taking a nude picture of themselves, that's still illegal but more of an ethical gray area, in which case I could understand hesitation at reporting to authorities. But based on OP's description, it's the real bad stuff, so reporting seems to be a no-brainer.


I am not crusading to uphold CP on the internet. It is interesting that you probably think my above comment was distasteful. But then you say:

> if someone is uploading an image of their 16-year-old girlfriend

That isn't a grey area, that is wrong unless you have consent[0]. My point is that it is hard to tell what is and isn't child porn(hence this thread in general) and that an 18 y/o could look younger. What I think is possible is that someone could upload photos of THEMSELVES when they are either in the US or even in a country where that is legal and it is non-obvious what the age is.

Child Porn is fucking disgusting, I am not here to defend it. I am just pointing out how hard it is to identify it and that some people upload their own photos online. If you had a database that magically let you know if an image was illegal then of course I would say take it down.

edit: [0]I meant this generally, not that it would be OK to do this if she was 16, but if your gf was of legal age and you posted pictures of her or the two of you together.


Let's say the person did have consent. It's still illegal, but is it still wrong?

Let's say they didn't have consent. It's wrong in the sense of it being "revenge porn", but should it really be considered as child pornography in the way it's classically viewed?

It's a gray area in that regard.

If someone is uploading things to his website which he can clearly identify as child porn without further analysis (if the child depicted is under 13, it's probably pretty easy to tell, unfortunately), I don't think he should hesitate to report it to law enforcement.

And if law enforcement investigates and learns it's not actually child porn... then they almost definitely won't charge anyone so long as no other crimes were committed. There are no downsides to reporting it and many downsides to keeping it to yourself.


That doesn't make a lot of sense if the photos are records of real crimes.


Photos of most kinds of disgusting crime are legal.


And I would imagine a bunch of child porn photos don't technically show anything illegal (nothing illegal about a child being naked on their own, what's illegal is the taking of the photos for sexual gratification - don't ask me exactly where the line is there, though)


It's one thing to not want to report IPs that access the image, as it could be a complete accident. But why wouldn't you want to report someone who is actively trying to share such content?


This is who many large (top 20) websites report to:

http://www.missingkids.com/

Reporting to them is the same as reporting it to the FBI.

They have various APIs you can use, but you probably need to contact them to make sure you meet the requirements.


I know there is something out there that allows you to report. I have not had experience with it so I can't really be more helpful in this direction.

If you aren't comfortable reporting it, what is your banning process like currently?


You would contact the FBI and give them the relevant information, perhaps in a spreadsheet with date/time information.


I'm disgusted replying to you. A bridge too far?

Anyway, here's your API https://developer-westus.microsoftmoderator.com/docs/service...

Guess you can be trendy AND help child rape victims.


I think you may be making unwarranted assumptions about the OP's reasons for not wanting to automatically, algorithmically contact police.


Just to provide a counterpoint, many people (myself included) believe free speech and privacy are fundamental in society, extending to the internet. Also, having a very US-centric view of the world can make you forget that the age of consent is arbitrary depending on where you are in the world, with many European countries having different laws. Also, it is quite difficult to ascertain the exact age of a subject given simply a photo. So while you are "disgusted", it is important to realize that this is a difficult problem, both in identifying content and identifying perpetrators.

That being said, if it is obvious that the image is CP and you have the IP address of someone who uploaded it, I couldn't imagine not passing on that information. Also, I would put a public disclaimer on the site indicating your policy.

edit: before people start flaming me as a rape apologist I want to clarify that notifying the FBI and giving away your users public information is a strong action. It is warranted in the case of CP (one of the few things I personally think) but that, as this thread is indicative of, it is difficult to identify IRL.


First time ever posting on a throw away account, but this shit is too personal.

As the victim of child molestation; this absolutely needs to be reported. Saying anything else is sympathy for the lowest filth in existence.


People who commit crimes like this are some of the worst in the world and I am truly sorry that happened to you. As I stated, I personally think that reporting it is an absolute necessity and something that should be written into your sites policy and prominently displayed. Just to reiterate though, the issue isn't whether child porn is wrong it is how to properly identify it. If you are going to accuse someone of being a paedophile and give their information to the FBI it is important you are correct.


And I'm sure parents of murdered or otherwise killed children would be offended by reddit's "pictures of dead kids". Yet it's not illegal to share such images.


If you think it is CP you should report it, period. It's law enforcement's job to determine whether it is for sure, not you. Better to raise a false alarm or two than to let a child predator get away.


Like most forms of human behaviour, if you add enough resistance, people generally start looking for alternatives. With that in mind, I would suggest you consider a service similar to Mechanical Turk to perform random checks on images, and ban any accounts that violate your terms.

Using a service like this (perhaps not MTurk exactly, I'm not sure how they feel about this content type being reviewed on their service) you could identify how long it takes until you can trust an account, to reduce your expenses associated with the filtering of traffic.

Sadly, this is not a free option (probably 1c per image + MTurk commissions), but I imagine eventually you can optimize the process to a point where either you spend very little reviewing images, or you hit a critical mass of a user base such that the site can moderate itself.


Honestly, I am not sure what you can do about this, other than request users report it when noticed.


After you follow the suggestions for using PhotoDNA.

I would also announce to uploaders that you will be processing their images through the service. Including their ip address, etc.

That might help with just the volume of bad images.

But I do have some questions:

1) considering the headache - maybe shut down site for maintenance while implementing service.

2) process all images currently

3) consider that some images will inevitably not be in the database but still be CP.

... therefore, at what point do you decide to just shut down the service.

It would suck to have to spend a lot on a lawyer to stay out of jail.

It's nice to pretend that the FBI would understand that you are an innocent victim - but what happens when those images end up on your machine (browser cache) and a fed prosecutor sees things differently?


I hope I'm not reiterating anything people have said... but generally sites with a community focus have volunteer moderators who monitor for this stuff and help ban users who post it.

If your'e doing anonymous posting, and the images are meant for sharing on other sites (ala Imgur) then you'll have a harder time of it. You can ban the images themselves and save the hash of the image to compare to future uploads and automatically reject those.

I can probably tell you more about how we do this on my own sites and my newer project, 9Cloud.us, which allows site owners to outsource all these concern. Just email at admin@9cloud.us


It would be interesting to see if this kind of content is amenable to classification. Maybe it would be worth looking into something like Caffe [1], it may even help you with managing the site in general. I can't search right now, however I think that a quick search in Google Scholar could yield a few different approaches in this direction.

[1] http://caffe.berkeleyvision.org/


Visibly watermark NSFW images with the uploader's IP and the exact time when the image has been uploaded. This should deter the most casual child porn uploaders.
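
A minimal sketch with Pillow, assuming you have the uploader's IP and a timestamp at upload time (placement and the default font are illustrative, and the mark obviously won't survive cropping or editing):

  from PIL import Image, ImageDraw

  def watermark(path, uploader_ip, timestamp):
      img = Image.open(path).convert("RGB")
      draw = ImageDraw.Draw(img)
      # Stamp IP and time in the bottom-left corner.
      draw.text((10, img.height - 20),
                "%s @ %s" % (uploader_ip, timestamp),
                fill=(255, 255, 255))
      img.save(path)

  watermark("upload.jpg", "203.0.113.7", "2015-07-11T21:14:00Z")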


And then no one (including the CP people) would use the site... Sure you don't have CP on your site but you don't have any users either. There are other ways to handle this.


They could silently do it in IPTC form. I don't know why this isn't already a thing.


Or all images uploaded, just in case your algo isn't perfect at detecting fleshy things.


One could use a hash set to identify and exclude known child porn images. Having done that, the law may require one to report it to law enforcement.

See the 'Child Exploitation Hash Sets' available here: http://www.nist.gov/oles/forensics/forensic-database-tech-di...
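
A sketch of screening uploads against such a hash set, assuming it's distributed as one hex digest per line (the file name is made up). Exact hashes only catch byte-identical files; any re-encode slips through, which is what perceptual hashing like PhotoDNA is for:

  import hashlib

  # Known-bad file hashes, one hex digest per line.
  with open("known_bad_md5.txt") as f:
      KNOWN_BAD = set(line.strip().lower() for line in f)

  def is_known_bad(path):
      with open(path, "rb") as f:
          return hashlib.md5(f.read()).hexdigest() in KNOWN_BAD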


I believe machine learning is the way to go. It might require quite a bit of coding and time, but in my humble opinion it is your best bet. Also, someone wrote some JS code a while ago that was posted on HN that would detect nudity in a given photo. I believe you could automatically tag NSFW if the script alerts for nudity. I will edit for link as soon as I find it.


Machine Learning is the best way to detect illegal images. Here's a story on this topic: http://abcnews.go.com/Technology/detect-digital-child-pornog...


Have you blocked known tor exit nodes? That would be my first course of action. Blacklist IPs for any offence.


Here is how Facebook and Youtube do it: http://www.dediced.com/links/the_laborers_who_keep_dick_pics...


Look, there is no 100% way to keep child pornography off of your site. The only way to attempt to prevent it is to have a reporting button on all images and pay someone to do the moderation, or to have an automated system that flags images and removes them for you.


HN today (7 days later): MS have released their PhotoDNA publicly so it's now easy to use for small sites:

https://news.ycombinator.com/item?id=9903263


Several third-party services for reviewing uploaded images have been suggested. I don't argue with their value. But how can one review effectiveness without viewing images? There is stuff that just can't be unseen.


"From rape and bdsm, which I can kind of tolerate"

I hope you mean simulated rape (although I'm not sure how people know the difference). Victims don't suddenly become unworthy of your concern because they pass 18.


First, be careful because the law on this is pretty tough, even though the content is user generated.

But I'm wondering how much messaging you present indicating that illegal content is forbidden and will be prosecuted?


1 - Charge a one-time fee of USD .01 to your users, payable only via PayPal.

2 - Report to the police any child pornography issue, handing them the PayPal coordinates.


3 - Lose all your users?

I wouldn't pay USD .01 to access a simple image upload site


Maybe mechanical Turk?



There are APIs to find naked people. Some are terrible, but pifilter, which is designed specifically to recognise body parts, is pretty good.


You should look into automatic classification of the images. IANAL, but I imagine you probably need some kind of permit if you DIY.


For CP if you're US based, log it, take it down, and use one of many report GOV/NGO systems available.


Seems like a good use for MechanicalTurk (mturk.com), at least until a free/open-source API becomes available that can replicate the abilities of a human.

I'd recommend sending new photos to a queue, and only letting them get indexed after someone from mturk has marked them as acceptable.

If the workers find something illegal, seek counsel and do as the counsel advises.


Definitely not!!! Asking people to look at child porn for you is a terrible idea, and may be illegal in many places.


You're not asking them to look at anything that you've identified as illegal material. If you already knew it was illegal, you'd have deleted it or reported it yourself.

You're asking them to moderate unknown material from new users you know nothing about. That's something that every volunteer moderator from Reddit does.


There were paid staffers doing that at MySpace back in the day. I talked to their head of security; not a fun job.


that's pretty much exactly how it works though


what site is this?


How-old.net


What's your problem with BDSM, as long as it's consensual?


Images of BDSM are illegal in some countries, like the UK, IIRC.


Oh, such a weird argument coming from Hacker News, the libertarian haven.


Well, to clarify (seeing how I can't edit my post) - if I was in OP's position that's what my concern would be. As I live in the UK I'd have to be careful about hosting things like BDSM pictures as the porn laws here are rather retarded.


I figured. I was just wondering the downvotes with no clarifications. (I don't care about the votes themselves, but I wanted to know what other people thought)



