

The SOPA Debate and Congress's Understanding of Child Porn - danso
http://danwin.com/2012/01/the-sopa-debate-and-how-its-affected-by-congresss-understanding-of-child-porn/

======
memset
I really enjoyed reading this perspective. It reminds me of something Feynman
described (can't find the link), wherein some military general asked
scientists to develop a tank which would convert rocks and sand into fuel as
it drove across the terrain. And Babbage before that - "I cannot comprehend
the confusion of ideas..."

If we want to frame SOPA in the context of these kinds of incidents, one
optimistic view would be that, as has happened in the past, science will
prevail (if we can agree that determining whether or not a copyrighted work is
stolen is a scientific question).

~~~
danso
Great anecdote... I'm ashamed I didn't fit it in, even though I read it
recently. It's from his memoir, "Surely You're Joking, Mr. Feynman!"

<http://goo.gl/KjW6V>

~~~
asmosoinio
Please don't use link shorteners in HN. Full link:

[http://books.google.fi/books?id=7papZR4oVssC&pg=PA289...](http://books.google.fi/books?id=7papZR4oVssC&pg=PA289&lpg=PA289&dq=I+work+out+a+way+in+which+we+could+use+silicon+dioxide%E2%80%94sand,+dirt%E2%80%94as+a+fuel%3F&source=bl&ots=esP-9irL-Z&sig=l9wPtf8dzvKp18OuXhaiFRLhYUw&hl=en&sa=X&ei=IbQNT9PLBcTa0QG4-bDuBQ&redir_esc=y#v=onepage&q=I%20work%20out%20a%20way%20in%20which%20we%20could%20use%20silicon%20dioxide%E2%80%94sand%2C%20dirt%E2%80%94as%20a%20fuel%3F&f=false)

------
bitwize
_In a more physical sense, this is like detecting a machine that can determine
from a photograph of your handbag whether it’s a cheap knockoff and whether or
not you actually own that bag – as opposed to having stolen it, or having
bought it from someone who did steal it._

Congressman: "That's no excuse. I've seen CSI. You can just go 'enhance,
enhance, enhance' and find out anything about it."

~~~
wazoox
I think the quoted handbag example is a great, understandable analogy that
should be taught to everyone who lacks the necessary technical understanding.
It makes obvious how ridiculous the whole project is.

------
thebigshane
I think I see a hole in this logic. A very well-thought-out article, and I
agree with the main premise, but allow me to nitpick a crucial point...

 _[Warning: speculation, I do not work for Google...]_ Google has developed a
very effective algorithm for detecting porn, _not_ child porn. They can use
flesh tone analysis for detecting nudity but algorithmic analysis of a single
image to determine the age of the person in the picture seems way harder. File
names don't work either: a picture of a handbag with a file name of
"16yr_old_porn.jpg" is not illegal.

Battling _child_ pornography requires a human to look at stuff. Search results
get flagged by normal users (and by paid teams I'd bet). These flags are
analyzed and used as patterns to detect other similar images (similar colors,
sizes, names, patterns in the image, source urls, watermarks) and this
secondary analysis can be done with fancy algorithms but it still requires
groups of people to flag things and create these patterns.

If I'm right, then a similar method could be used to help combat infringing
material. But as danwin points out, there is a new factor: figuring out whether
a particular piece of content is licensed or infringing is hard, even for a
human. Still, there are some people who can look at a video or picture, listen
to a song, etc., and be pretty sure that it is infringing. Clips or full
videos of some popular movies on YouTube are obviously infringing. Most MP3s
on the web are infringing.

There would be false positives and false negatives (just like with child
porn), but you could build a similar type of pattern database that could be
used to find similar files. This same type of flagging/reporting could be used
to identify infringing sites. I know YouTube has an existing algorithm for
matching audio tracks against some reference database, but I don't think
these patterns are based on flags; I think they are hard-coded.
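One common way to build the kind of "find similar images" pattern database described above is a perceptual hash. Below is a minimal Python sketch of an "average hash" matched by Hamming distance. The `FlagDatabase` class, the 2-bit tolerance and the 4x4 test images are hypothetical, and this is not a claim about how Google or YouTube actually do it.

```python
from typing import List, Sequence

# A small grayscale image: rows of 0-255 values, assumed to be already
# downscaled (real systems typically shrink to something like 8x8 first).
Image = Sequence[Sequence[int]]


def average_hash(img: Image) -> int:
    """Perceptual 'average hash': one bit per pixel, set if brighter than the mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class FlagDatabase:
    """Hypothetical store of hashes built from items that humans flagged."""

    def __init__(self, max_distance: int = 2) -> None:
        self.flagged: List[int] = []
        self.max_distance = max_distance  # assumed tolerance, not a tuned value

    def flag(self, img: Image) -> None:
        self.flagged.append(average_hash(img))

    def matches(self, img: Image) -> bool:
        h = average_hash(img)
        return any(hamming(h, f) <= self.max_distance for f in self.flagged)


if __name__ == "__main__":
    original = [[10, 20, 200, 210], [15, 25, 205, 215],
                [12, 22, 198, 220], [11, 21, 202, 212]]
    brightened = [[p + 5 for p in row] for row in original]  # re-encoded copy
    unrelated = [[200, 10, 200, 10], [10, 200, 10, 200],
                 [200, 10, 200, 10], [10, 200, 10, 200]]

    db = FlagDatabase()
    db.flag(original)
    print(db.matches(brightened))  # True
    print(db.matches(unrelated))   # False
```

Note what it can and can't do: it finds altered copies of something a human already flagged, but nothing in the hash says whether a given copy is licensed.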

Users could still browse directly to a URL (or IP) hosting questionable
content but it would definitely cut down on the easy access and spread of
files.

NOTE: I don't like the idea of Google (or any organization) being mandated to
implement such a process but it may be better than interfering with DNS
mechanisms and lawsuits and court orders. I also don't like how reliant we are
on a private organization holding such power over what the world can find in
their results. Nearly everyone agrees on child porn but I think one of the
main reasons SOPA is controversial is because we don't all agree on the
importance of fighting infringement. (Counterfeit goods maybe, but not license
violations)

By no means do I claim that SOPA is a good bill. I think doing nothing is
better than implementing what is in SOPA.

EDIT: An assumption made here is that the way Google is fighting child
pornography is effective, sufficient, and ethical. I think that is a
reasonable assumption.

EDIT2: One factor I didn't consider is the quantity and diversity of child
pornography versus infringing material. I'm not sure how the algorithm would
be affected but it'd probably be slightly less effective.

~~~
burgerbrain
_"Clips... ...of some popular movies on youtube are obviously infringing"_

Or you know, fair use.

~~~
thebigshane
"Fair use" is pretty specific. Most interpretations allow you to _use_ a clip
in your work. But if we are talking about a clip of a movie on its own, like
the first 10 minutes, without any _transformative_ addition or change, I don't
see a "fair use" defense.

~~~
trevelyan
My sole encounter with the Youtube copyright police came from posting a two
minute, highly edited visual commentary on the film Inception
(<http://www.youtube.com/watch?v=B6rLEzrWzCw>), pointing out the religious
allusions to Matthew 7.24 and the allegorical significance of the final scene.

It was auto-flagged by their system and I had to jump through hoops to get it
back online. I can't imagine this sort of thing would survive SOPA. And what
other platform is there but the Internet for the sharing of multimedia
commentary and criticism?

~~~
wmf
YouTube isn't the Internet, BTW. I know people have forgotten how to do this,
but it is possible to host videos on your own Web site. In that case you
probably would have been left alone, or you might have received a DMCA
takedown which you could easily defuse with a counter-notice.

~~~
trevelyan
Sure, I could have hosted it on one of my servers, but that isn't the same, any
more than posting a couple of links in an Apache directory is the same as
sharing it on Reddit or Facebook. And am I supposed to be maintaining a
full-time blog on film criticism in order to share a single video with an
audience of people who might care?

There's no sane reason we should lose the visibility, distribution and
community relevance of social platforms simply in exchange for exercising fair
use of something. That said, the point was less about Youtube being unfair and
more about how these platforms are _already_ heavily biased against the
assumption of fair use.

~~~
thebigshane
The fact that you were able to get your video back up (and that it remains up
at the moment) is a testament that this is a _decent_, but not perfect,
solution.

And it's better than letting Congress enact SOPA, which is my whole point
upthread. I know that's not saying much, but SOPA supporters claim that we
can't (or won't) offer any alternatives to SOPA. Since Congress is always in
the mindset of "we have to do _something_", let's give them an alternative
that they can support, one that shows "something is being done".

~~~
trevelyan
I agree. I was responding more to what I read (perhaps incorrectly) as the
insinuation that fair use is well-defined and well-protected. I had read this
as implying that it would consequently be protected under SOPA as well.

------
mindslight
The other major difference between the two (which today's politicians are
incapable of understanding) is that most everyone disapproves of child
pornography. Their laws criminalizing it do not _make_ it go away - they are
but a tool to enforce the general societal view that child pornography is
despicable. On the other hand, copyright is a government edict that very few
people care about. You can't legislate societal values - good laws merely
codify them.

------
jopt
This ties back to why opponents of copyright as such call it "imaginary
property:" there is nothing about the copy that makes it any worse than the
original. We can't recognize a copy because then it isn't a copy, in the
information theory sense.
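To make the information-theory point concrete, here is a minimal Python sketch showing that a purchased file and a bit-for-bit copy are indistinguishable to any function that only looks at the bytes. The byte strings are placeholders, not real audio, and SHA-256 is just one example of such a function.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Any deterministic function of the bytes; SHA-256 is used here."""
    return hashlib.sha256(data).hexdigest()


# Placeholder bytes standing in for one song file.
purchased = bytes(range(256)) * 4   # the copy someone bought
pirated = bytes(purchased)          # a bit-for-bit copy of that file

print(purchased == pirated)                            # True
print(fingerprint(purchased) == fingerprint(pirated))  # True
```

Whatever `fingerprint` is swapped for (a hash, an audio fingerprint, a classifier), both copies get the same answer; whether one of them is licensed lives outside the file.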

Meanwhile, in the physical world, a knock-off handbag can be easily identified
as such by airport personnel. This is the world these pro-PIPA politicians
live in. Child porn, online as offline, is inherent in the work. You don't
need supplementary information to decide if something is child porn. Online,
the difference between a copy and the original is in some people's minds, not
anchored in reality.

~~~
rcthompson
Good point. The misunderstanding is rooted in a fundamental difference between
physical objects and ethereal "content".

------
kpanghmc
"But we have the technology, Google has the technology, we have the brainpower
in this country, we certainly can figure it out."

Even if this were true, just because Google has the capability to do this
doesn't mean every company does. If it costs Google $X million to prevent
online piracy on their sites, is it reasonable to expect every web startup to
do the same?

~~~
mgkimsal
SOPA backers can't be responsible for every undercapitalized entrepreneur in
America.

~~~
WildUtah
For those who missed the joke, sixth paragraph:

[http://www.pittsburghlive.com/x/pittsburghtrib/opinion/colum...](http://www.pittsburghlive.com/x/pittsburghtrib/opinion/columnists/reiland/s_557486.html)

~~~
asmosoinio
For the lazy:

"I can't be responsible for every undercapitalized entrepreneur in
America," Mrs. Clinton said in 1993, responding to charges that her plan would
bankrupt businesses and cut employment.

------
lomegor
Wow! Really good, unbiased article on why some people support SOPA/PROTECT-IP.
It's hard to find these things these days when everyone is accusing the
government of being run by corporations.

~~~
knowtheory
It's also worth noting that Dan Nguyen (pronounced "win," even) is an awesome
dev who works for ProPublica and has written a book on learning Ruby
(<http://ruby.bastardsbook.com/>), among other things he's accomplished!

------
Jun8
"So for child porn, we are able to design a machine that is able to detect
child porn. You can detect certain colors that would show up in pornography,
you can detect flesh tones."

AFAIK, this is complete BS! I was quite up to date with research in this field
until about 5 years ago, and the systems are still in their infancy. If you
deployed an automated system to detect pornography, the false alarm rate would
be very high. And differentiating child porn from regular porn is also
extremely hard.
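For a sense of what "detect flesh tones" looks like in practice, here is a minimal Python sketch using a commonly cited fixed-threshold RGB skin-color rule. The 0.4 cutoff and the sample colors are assumptions for illustration, and nude.js or Google's system may work quite differently. It shows both problems raised here: ordinary tan objects trip the rule, and nothing in a pixel's color says anything about a person's age.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def is_skin(p: Pixel) -> bool:
    """A commonly cited fixed-threshold RGB skin-color rule."""
    r, g, b = p
    return (r > 95 and g > 40 and b > 20
            and max(p) - min(p) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_fraction(pixels: Iterable[Pixel]) -> float:
    """Share of pixels that pass the skin rule."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_skin(p) for p in pixels) / len(pixels)


def looks_nude(pixels: Iterable[Pixel], threshold: float = 0.4) -> bool:
    """Flag an image if 'enough' of it is skin-colored (threshold is arbitrary)."""
    return skin_fraction(pixels) > threshold


if __name__ == "__main__":
    tan_sofa = [(210, 160, 120)] * 100   # a leather sofa, a beach, a wooden table...
    blue_sky = [(100, 150, 255)] * 100
    print(looks_nude(tan_sofa))  # True  (false alarm)
    print(looks_nude(blue_sky))  # False
```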

~~~
thebigshane
I haven't tried it, but it looks fairly effective:
<http://www.patrick-wied.at/static/nudejs/>

~~~
burgerbrain
The author seems to say that it has a detection rate of 60%.

------
pfraze
Actually, how hard would it be, really? All you'd need to do is create a
central registry of content-owners and licensed content-users, build Shazam-
like content-detection, then force everybody to operate on that
infrastructure.

Please don't do that, but how hard would it be?
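The lookup half of that proposal is easy to sketch. Here is a minimal Python version built around a hypothetical `Registry` keyed by fingerprint strings, which are assumed to come from some Shazam-like detector that isn't shown. The hard parts are everything it takes as given.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Work:
    owner: str
    licensees: Set[str] = field(default_factory=set)


class Registry:
    """Hypothetical central registry keyed by a content fingerprint string.

    Assumes some Shazam-like detector (not shown) produces the fingerprint,
    and that every uploader has a verified identity.
    """

    def __init__(self) -> None:
        self.works: Dict[str, Work] = {}

    def register(self, fingerprint: str, owner: str) -> None:
        self.works[fingerprint] = Work(owner)

    def grant(self, fingerprint: str, user: str) -> None:
        self.works[fingerprint].licensees.add(user)

    def check_upload(self, fingerprint: str, uploader: str) -> str:
        work = self.works.get(fingerprint)
        if work is None:
            return "unknown"  # unregistered content: the registry has no opinion
        if uploader == work.owner or uploader in work.licensees:
            return "licensed"
        # Fair use, parody, criticism, etc. all land here too.
        return "unlicensed"


if __name__ == "__main__":
    reg = Registry()
    reg.register("fp:song-123", owner="label")
    reg.grant("fp:song-123", "alice")
    print(reg.check_upload("fp:song-123", "alice"))  # licensed
    print(reg.check_upload("fp:song-123", "bob"))    # unlicensed
    print(reg.check_upload("fp:home-video", "bob"))  # unknown
```

The code is trivial; the inputs aren't. `uploader` has to be a verified identity on every upload, and an "unlicensed" result can't tell infringement apart from fair use.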

~~~
SoftwareMaven
Don't forget to ID every user on every access so you can see what permissions
they have. Even setting aside what it means to require papers to traverse the
Internet, I can't imagine how difficult that would be to scale. And if it goes
wrong, the whole Internet goes down.

~~~
pfraze
Nah, you just need it for content uploaders and copyright holders. No need to
implement a global permissioning system.

~~~
burgerbrain
In case you haven't been paying attention for the past decade or so,
"uploaders" are now the general population. _Everyone_ uploads things.

~~~
Natsu
I get the feeling that a lot of people have no idea how BitTorrent works.

~~~
burgerbrain
Yes. Or simply just posting comments online.

~~~
pfraze
It's not that I'm unaware of peer-to-peer sharing, or of the fact that
uploading is inherently part of browsing. My thinking is that most hosted
content is published to be found. An indexing process could find published
content, and laws could enforce findability.

It's about as effective as our existing regulatory systems. For instance, a
lot of commerce is taxed and regulated, but you could just walk up to somebody
and make an exchange. It's never airtight, but it is enforcement.

Is my suggestion wise? I definitely don't think so, but it is feasible. And
maybe I'm not considering something correctly -- but, please, don't be rude
about my technical prowess. I'm just trying to strengthen our argument.

------
Bobby_Tables
What annoys me to no end is that the counter-argument to the "Google is smart
enough to build it" canard is legal, not technological. Child porn is illegal
in this country, full stop. If it exists, it's prosecutable. An MP3 of a
Justin Bieber song isn't necessarily illegal, and determining whether a
particular copy is legal requires information that is not readily available
for financial security reasons. So nobody can build a detector, regardless of
how smart they are.

Which is exactly the point of SOPA...since we can't stop the flow of
copyrighted media on the internet, we can't have the internet.

~~~
chairface
This is just a friendly suggestion to edit that second sentence - it threw me
for a bit.

~~~
Bobby_Tables
GOOD CALL. This is why I should not post on HN while waiting for my tests to
run.

------
phamilton
From a technical standpoint:

We can verify authenticity. We can transfer authenticity. I think about
BitCoin and the security of the network. Everyone on the network can verify
the origin and authenticity of a transaction.

Could something similar be put in place for media? A distributed DRM
essentially? While DRM in its current form is a nightmare, isn't the main
drawback being tied down to certain platforms? If DRM were decentralized, I
don't think I would have a problem with it.

Thoughts?

~~~
wazoox
This looks like "what colour are your bits?".
<http://ansuz.sooke.bc.ca/entry/23>

Your concept seems ridden with so many problems and possibilities for abuse
that I'll leave it as an exercise for the reader... (examples: how do you "tag"
existing files? Existing CDs? Whose interest would this serve? Would people
bother? etc.)

~~~
phamilton
To tag existing CDs, we've already seen that problem solved with iTunes Match.

To get people to do it, it would be a part of uploading media files. If SOPA
were to pass, I imagine hosting sites would require only "signed" media. It
wouldn't be perfect, but any distributed file would be signed at creation.
Self signing would be simple enough if it were built into media programs. The
key to all this is that nobody controls it, and the standard is open source.
Once again looking at BitCoin as a role model.

Also, I don't think it's fair to tag a concept "ridden with problems and
possibilities for abuse". It's a concept, not an implementation. And it was a
mighty vague one at that.
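The "signed at creation" idea rests on ordinary public-key signatures. Here is a toy Python sketch using textbook RSA with tiny, insecure parameters; it illustrates the mechanism only and is not a proposal for a real scheme. The creator signs a file's hash with a private key, and anyone can verify it with the public one.

```python
import hashlib

# Toy textbook-RSA key pair from p = 61, q = 53. Far too small to be secure;
# a real system would use a vetted library and a modern signature scheme.
N = 61 * 53   # 3233, public modulus
E = 17        # public exponent
D = 2753      # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1


def digest(media: bytes) -> int:
    """SHA-256 of the file, reduced into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(media).digest(), "big") % N


def sign(media: bytes) -> int:
    """Creator signs with the private exponent D."""
    return pow(digest(media), D, N)


def verify(media: bytes, signature: int) -> bool:
    """Anyone can check with the public pair (N, E)."""
    return pow(signature, E, N) == digest(media)


if __name__ == "__main__":
    song = b"placeholder bytes for a recording"
    sig = sign(song)
    print(verify(song, sig))            # True
    print(verify(song, (sig + 1) % N))  # False: a forged signature fails
```

It also illustrates the "what colour are your bits?" objection: the signature travels next to the bytes, not inside them. Strip it and the file still plays. A valid signature proves who signed a file, not whether a particular copy is licensed.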

------
usaar333
_So what Rep. Marino essentially wants is for Google to build a Shazam-like
service that doesn’t just identify a song by “listening” to it, but also
determines if whoever playing that song has the legal right to do so. Thus,
this anti-pirate-Shazam would have to determine from the musical signature of
a song such things as whether it came from an iTunes or Amazon MP3 or a CD.
And not only that, it would have to determine whether or not the MP3 or CD is
a legal or illegal copy._

This isn't as hard as the author is making it sound, and I suspect YouTube
already handles it with movie clips. With the aid of an external "copyright
registry", Google could see if the fingerprint of an unknown song is close to
one in the registry. Obviously, there would be many false positives and false
negatives, but that is certainly true with child porn detection as well.
Indeed, it wouldn't take all that much coordination to pull off; a better-funded
copyright office could handle this.

I don't agree with Google/internet being forced to do this due to the burden
it creates, but it isn't that hard to pull off...

~~~
wmf
Especially since basically zero major label music or Hollywood movies are
legally available for free, so if Google detects such a file it's
presumptively infringing. (Oops, is that something you can't say?)

~~~
funthree
> basically zero major label music or Hollywood movies are legally available
> for free

Only because it's not true?

------
bad_user
TL;DR ... If a normal person isn't able to properly classify something (good /
bad), then a computer algorithm definitely cannot.

My opinion: pushing for such legislation would make it a criminal act to err
on the side of false negatives, which means false positives will be very
common.

This point even a monkey could understand. If it refuses, then it is wishing
for bananas or something.

~~~
feralchimp
> TL;DR ... If a normal person isn't able to properly classify something (good
> / bad), then a computer algorithm definitely cannot.

Technically untrue. Computers are able to correctly classify (and with good
accuracy/precision) lots of things that even domain-expert humans have a tough
time classifying.

One problem with those systems is that, depending on how they're trained up,
humans aren't even able to pick apart how the software is making its decisions
(and use that to advance the state of the relevant science, for example).

Creepy and awesome.

~~~
bad_user
Examples required.

A 7-year-old who can read could be trained to properly recognize spam, child
porn, and (allegedly) copyright infringement, and could also give you advice
on purchases after seeing your spending habits. A 7-year-old may not have the
bandwidth of a supercomputer to process hundreds of thousands of items at once,
but he is capable of greater accuracy, because the human brain is the most
advanced pattern-matching processor in existence.

You can classify _anything_ by means of statistics, sometimes with surprising
results, however my point (and maybe I wasn't making myself clear) is that
you'll get a lot of errors of judgment. Which is why algorithms will be
trained to err on the side of false positives, because doing otherwise will
put the business in jeopardy.
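The incentive argument can be made concrete with a tiny cost calculation. Here is a minimal Python sketch that assumes a hypothetical classifier emitting scores and uses made-up labelled samples (not measurements of any real system). When a missed infringer (false negative) is penalized much more heavily than a wrongly blocked upload (false positive), the cost-minimizing threshold drops and the false positives pile up.

```python
from typing import List, Sequence, Tuple

# (score from a hypothetical classifier, whether the item really infringes)
Sample = Tuple[float, bool]

# Made-up illustrative data, not measurements of any real system.
SAMPLES: List[Sample] = [
    (0.1, False), (0.2, False), (0.3, True), (0.45, False), (0.6, True),
    (0.65, True), (0.7, False), (0.8, True), (0.9, True),
]
CANDIDATES = [0.15, 0.5, 0.75]


def errors(samples: Sequence[Sample], threshold: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) when flagging score >= threshold."""
    fp = sum(1 for s, bad in samples if s >= threshold and not bad)
    fn = sum(1 for s, bad in samples if s < threshold and bad)
    return fp, fn


def best_threshold(samples: Sequence[Sample], fp_cost: float, fn_cost: float,
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the lowest total penalty."""
    def cost(t: float) -> float:
        fp, fn = errors(samples, t)
        return fp * fp_cost + fn * fn_cost
    return min(candidates, key=cost)


if __name__ == "__main__":
    # Equal penalties: the middle threshold wins, with 1 FP and 1 FN.
    print(best_threshold(SAMPLES, 1, 1, CANDIDATES))   # 0.5
    # Missing an infringer costs 10x more: the lowest threshold wins.
    t = best_threshold(SAMPLES, 1, 10, CANDIDATES)
    print(t, errors(SAMPLES, t))                       # 0.15 (3, 0)
```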

~~~
feralchimp
I should have mentioned up front that I strongly agree with your core point
about legal penalties and their effects on selection bias.

I was hoping the Wikipedia page on 'expert systems' would bail me out on the
examples front, but its 'disadvantages' section isn't all that clear or
complete. It does touch on the issue I mentioned, though.

I think many high-frequency stock trading algorithms are examples. The trades
may as well be magical, and as long as the program makes money the owner
doesn't much care why.

------
wallawe
I think this article is empathetic in how it has shifted focus to trying to
understand the view of SOPA from "their" position. This is a good thing to do,
because without understanding your opponent's argument you can't fully
understand your own.

However, as kpanghmc pointed out, even if Google had the capability, that
doesn't mean every company does. I'll take that a step further. Even if Google
had the capability, the government doesn't have the right to step in and
demand that a single private entity build something to stop it, even using
government (taxpayer) money. This is how you know the government has gotten
too big.

Furthermore, the article, while trying to empathize, omits the fact that this
is _censorship_. Banning web pages, or entire websites, without due process is
unconstitutional and un-American.

------
jimworm
In fact, the computer program could only ever get to analyze a copy of the
file, and not "the" file itself. The legality of the in-memory copy remains in
a quantum superposition, simultaneously legal and illegal until a politician
discovers how computers work.

In other news, the police start withdrawing money from every bank account
using an ATM and analyzing the $100 bills for signs of the account holders'
criminal transactions.

------
Joakal
Demand that anti-child-porn politicians answer this anti-Internet question:
What's more important, blocking rare child pornography content or allowing
children unrestricted access to knowledge and a future with the Internet?

If they wish to block child abuse, they basically admitted to wishing to kill
the future of children just to hide evidence of abuse.

The ball's in their court.

~~~
CWuestefeld
If this saves even one child from the degradation and abuse of pornography,
then any cost is worth it.

</sarcasm>

~~~
Joakal
If this ruins the ability of almost a billion children to freely access
knowledge and to express themselves in the future, then I hope the cost is
worth it, congressperson.

Speaking of, I wonder how many children there are in the world.

------
tkahn6
> REP. MARINO: I only have a limited amount of time here and I appreciate your
> answer. But we have the technology, Google has the technology, we have the
> brainpower in this country, we certainly can figure it out.

Wow that is infuriating; ignorance masquerading as arrogance.

~~~
Natsu
Indeed, that was an incredibly pernicious comment. I can understand how he got
into politics.

Sadly, even though it was carefully explained to him, data on whether or not
something constitutes copyright infringement does not exist. That's not a
technical problem. It's not something you can change with a magical computer
program. Infringement rests upon whether or not the person has permission to
do whatever it is they're doing. That data is simply not available, so there
aren't any numbers for Google (or anyone else) to crunch. He can only say it's
easy because he doesn't know anything.

If he worked in sales, I bet he'd promise customers a time machine, collect a
huge bonus, then blame the geeks for failing to deliver.

