
Google doesn't recognise or penalise stolen content - ollieglass
http://www.pi-datametrics.com/fatal-flaw-googles-inability-recognise-stolen-content/
======
bpodgursky
I think this is pretty fair on Google's part. How could you possibly figure
out who owned content?

What if I published a book, it was copy-pasted in blogs, and then later I put
it somewhere crawlable by Google? You certainly can't just say "first time we
saw it, that's the proper owner". It would either require a massive amount of
manual QA to get right (and even then, there are going to be interminable
copyright battles), or have a super high error rate.

I think Google's best value is letting proper content owners easily find
violators via normal searches, and let them deal with them via takedown
notices or the court system -- which is where it should be done, not in a
pseudo-court run by a Google that does not want that responsibility.

~~~
ChuckMcM
So back when Blekko was a consumer search engine we could 100% figure out who
owned content on sites we crawled often. And even when we didn't, we could
guess correctly more often than not based on the domain registration dates
(not to mention registry owners). That is because few people who rip
off content rip off just one web site, they will rip off dozens of web sites
and they will all share the same AdSense ids and the same domain registrar.
This is _easy_ stuff to spot when you crawl the web regularly.
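
A minimal sketch of the kind of clustering described above, not Blekko's actual method. The record layout, the domain names, the AdSense IDs, the registrar names, and the `min_size` threshold are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical crawl records: (domain, adsense_id, registrar, registered_on).
# Every value here is invented for the example.
SITES = [
    ("original-blog.example", "pub-1111", "RegistrarA", date(2006, 3, 1)),
    ("copy-one.example", "pub-9999", "RegistrarB", date(2015, 1, 10)),
    ("copy-two.example", "pub-9999", "RegistrarB", date(2015, 1, 12)),
    ("copy-three.example", "pub-9999", "RegistrarB", date(2015, 2, 2)),
]


def suspicious_clusters(sites, min_size=3):
    """Group domains that share both an AdSense ID and a registrar."""
    groups = defaultdict(list)
    for domain, adsense_id, registrar, _registered_on in sites:
        groups[(adsense_id, registrar)].append(domain)
    # Keep only groups large enough to look like a network of copies.
    return {
        key: sorted(domains)
        for key, domains in groups.items()
        if len(domains) >= min_size
    }


def guess_original(sites, domains):
    """Among domains carrying the same text, guess the oldest registration."""
    candidates = [site for site in sites if site[0] in domains]
    return min(candidates, key=lambda site: site[3])[0]


print(suspicious_clusters(SITES))
print(guess_original(SITES, {"original-blog.example", "copy-one.example"}))
```

The registration-date guess is only a heuristic: an old dropped domain that gets picked up and filled with copied pages would fool it.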

I suspect that Google simply doesn't care. They get Ad revenue regardless and
in their laissez-faire editorial position it doesn't matter. What are you
going to do, use another search engine?

~~~
sounds
Or, more likely, they can't get involved for legal reasons. If they took steps
to block the easy stuff, an arms race would ensue, and the content providers
would never be satisfied with the performance being provided _for_ _free_ by
Google. The content providers would always demand stricter enforcement, and
could threaten to sue for copyright infringement regardless of merit.

~~~
ChuckMcM
I think if that were the case they would not have pushed out the "Panda"
updates which penalized content farms so heavily. If their past behavior (with
content farms and other "low value" content sites) is a guide they will not do
anything until enough people complain about it.

In the meantime it isn't even Google's content so it's not a hosting issue,
they are just the "neutral" third party providing their 10 blue links (oh and
supplying the advertising engine those sites are using)

------
55555
I was huge into SEO for a few years. I try to stay out of it now, but it's
worth noting that this is almost certainly due to the current algorithm's
obsession with "freshness." The weaker site is ranking higher with the stolen
content because their site was updated more recently. Steal some back and I
bet they swap ranks again.

Also, the combination of the pagerank algorithm and normal user behavior
typically helps Google to understand who was first and who deserves to rank
higher. That is, most people don't plagiarize content, they quote it and then
cite the source, which (thanks to pagerank) tends to rank the original better
than sites which have plagiarized it.
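
To illustrate that point, here is a toy power-iteration PageRank, assuming a five-page web in which three blogs cite the original and nobody links to the copy. The page names and the link graph are hypothetical, and this is the textbook algorithm rather than anything Google runs today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is shared by everyone.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: honest blogs quote and cite the original; the copy
# has the same text but no inbound citations.
LINKS = {
    "blog_a": ["original"],
    "blog_b": ["original"],
    "blog_c": ["original"],
    "original": [],
    "copy": [],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the original ends up with the highest score and the copy ties with the blogs, which is why a freshness boost is a plausible explanation when the copy outranks it anyway.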

~~~
scholia
_> the current algorithm's obsession with "freshness."_

Which is how Google makes blogspam such a good business to be in, even if your
content is inferior to the post you used for "research".

~~~
55555
Most of the spam I see in the wild these days is indeed (established) dropped
domains which were picked up and then loaded with thousands of pages of
"fresh" spun content, with an incestuous backlink profile if any. So indeed
'blogspam'. Everything old is new again; it feels just like twelve years ago.
Soon people will be keyword stuffing in a font the same color as the
background...

But Google certainly isn't intending to make blogspam a good business to be
in, and I'd argue that they aren't; over the past four years Demand Media's
stock price has fallen from $400/share to $4, and the general marketplace for
commoditized SEO services has shrunk by a similar degree over the same period.
19 out of every 20 SEOs who were active five years ago have thrown in the
towel... just check Alexa graphs for the top SEO forums.

The SERPs are clean these days. Google has done an amazing job every year for
at least thirteen years now of improving them constantly. The new wave of spam
is social. In practice this means Buzzfeed writers stealing user-produced
content from AskReddit threads and it ending up polluting my Facebook feed to
the point that I can't even find any good counterfeit Raybans.

~~~
aaronwall
"The SERPs are clean these days."

Here's an alternate take on that:
[http://www.johnon.com/1075/bullish-on-seo-rankbrain-vs-seobr...](http://www.johnon.com/1075/bullish-on-seo-rankbrain-vs-seobrain.html)

~~~
scholia
That's fascinating because I really don't understand it. Maybe I'm just out of
touch with SEO, but things like this escape me completely:

"This is because SEOs follow and influence the intent of searchers in the
marketplace, while Google’s algorithm (and AI) merely monetizes it."

Where does the extra monetization on page 1 results come from? Unless he's
implying that Google provides bad search results so that people will click the
ads instead.....

~~~
aaronwall
There are numerous ways to interpret that. At a base level, one could look at
how the mobile search results are sometimes a screen full of ads, or how in
some verticals they are a screen full of ads followed by yet another screen
full of ads.

And then there is the knowledge graph & other flavors of scrape-n-displace,
which is largely content recycled from elsewhere, given prominent positioning
not based on merit or editorial quality, but based on who the publisher (or
recycler) is.

Another parallel trend would be the confirmation bias / brand bias factors
promoting older and staler sites. Or simplified "take" articles in the
mainstream media rather than the original source articles on niche hobbyist
blogs and forums or such.

And in taking broad sets of new niche intents and trying to guide those
streams of users back down well worn paths. For example, sometimes when you
want to find a particular news story _about_ a broad & well-known web platform
like Apple, Amazon, Facebook, or Google it can be hard to find sites other
than the official site. And on some other longtail queries Google rewrites
what is being searched for in a way that brings up some results that don't
match the true searcher intent. Probably the best example I can come up with
on this front: say you wanted a pair of shoes of a specific brand, size,
width, and model number. If they are not the most recent and most heavily
marketed versions, it can be tough to find them. Auto-generated internal search pages on
trusted brand sites rank well, while a small retailer carrying that specific
shoe might be penalized by Panda.

------
Animats
This came up before on YC.[1] Google does have a system to detect provenance,
but you have to report your changes to Google as an RSS feed.[2] Google hasn't
updated that page since 2010, and it may no longer do anything.

[1]
[https://news.ycombinator.com/item?id=10103545](https://news.ycombinator.com/item?id=10103545)
[2] [https://pubsubhubbub.appspot.com/](https://pubsubhubbub.appspot.com/)

------
tomschlick
And they shouldn't. That's not their job.

~~~
scriptproof
Just as it is not the job of the street vendor to know where these Rolexes
come from.

~~~
smt88
I don't know if you're being sarcastic, but it is illegal to sell stolen or
fake merchandise in the United States. Anyone selling fake Rolexes is
committing a crime and could also be sued.

Saying that Google is "selling" stolen content isn't that clear, though. Yes,
they're selling ads on search results, but wouldn't they get the same ad
revenue regardless of where those links pointed?

It's easier to make the case with AdSense, where Google literally profits
directly from stolen content.

~~~
PhantomGremlin
_it is illegal to sell stolen or fake merchandise in the United States. Anyone
selling fake Rolexes is committing a crime and could also be sued_

Huh? Now I'm worried. Are you telling me that the Rolex watch I paid $30 for,
that I bought from a street vendor near Times Square, might be fake? Oh no,
the horror! /sarcasm

I don't think that Rolex is too worried about this. Nobody would mistake a $30
watch for a real Rolex. And, give it credit, my fake Rolex worked for a year
or so. It probably just needs a new battery.

Besides, you can't sue a street vendor. They're what's known as "judgement
proof".[1]

And as to police action against them, the de Blasio administration seems to
have adopted a laissez-faire attitude about all this stuff. If they're willing
to allow squeegee men to operate with impunity, they certainly won't care
about novelty watches being peddled.

[1]
[https://en.wikipedia.org/wiki/Judgement_proof](https://en.wikipedia.org/wiki/Judgement_proof)

------
DarkLinkXXXX
This may be pedantic, but is stolen the right word to use?

I think plagiarized is more accurate.

~~~
smt88
Plagiarism can also apply to copying content without using the exact same
wording. In my mind, "stolen" means copying verbatim.

Plagiarism comes from a word meaning "kidnapping" though, so the tone of both
words is pretty similar.

~~~
johansch
_Stolen_ implies that the original owner no longer has access to the data due
to the actions of the perpetrator.

~~~
bduerst
For physical property, yes. For intellectual property, it can still be stolen
even if the original owner still has a copy.

e.g. The Soviet spies stole the plans for the hydrogen bomb.

~~~
tbrownaw
No, for intellectual property the word "stolen" is not appropriate.

(Well, unless you're talking about getting the courts to tell the original
owner it's yours instead. Which has been at least attempted a few times.)

~~~
bduerst
It is semantically appropriate for the word _steal_. I think you're confusing
it with the criminal implications.

------
Daneel_
What a rubbish article. It doesn't fly for a second under copyright law.
Google is entirely within their rights doing what they're doing. The onus
isn't on Google to detect the infringing content.

For anyone interested in copyright and legal issues, I'd recommend checking
out techdirt.com. They have a great starter section at
[https://www.techdirt.com/blog/?tag=techdirt+feature](https://www.techdirt.com/blog/?tag=techdirt+feature),
and they cover legal, copyright, patent, surveillance and all sorts of related
topics. High quality journalism.

------
6stringmerc
Is this any better/worse than Facebook actively trying to profit and win over
users when people or organizations copy / upload / soak up views for material
they did not create and don't have the rights to use? Because that's a hot-
point of discussion in some creative circles as well.

~~~
Houshalter
Facebook's freebooting is pretty terrible. But this can destroy entire
websites. The title is misleading. Google isn't just not punishing thieves,
it's _heavily_ punishing the originals. They dropped from 20th result, to
100+, because someone stole their content.

~~~
6stringmerc
Yikes! That is much worse, at least based on your note. Do you think this is
an area where the EFF could litigate on behalf of the original creators in a
fraud context? Just curious, and also grateful to not be dealing with such a
horrible prospect.

------
nkozyra
Recognizing "stolen" content autonomously can only pivot on knowing when
something was first published or visible to Google, which is a pretty dubious
measurement.
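
A minimal sketch of why that measurement is dubious, assuming a hypothetical crawl log of `(url, crawled_at, text)` tuples and exact-match fingerprinting after whitespace and case normalization. The URLs, the timestamps, and the helper names are invented for the example.

```python
import hashlib
from datetime import datetime


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def first_seen_owner(crawl_log):
    """Attribute each distinct text to the URL where it was crawled first."""
    owners = {}
    for url, crawled_at, text in crawl_log:
        key = fingerprint(text)
        if key not in owners or crawled_at < owners[key][1]:
            owners[key] = (url, crawled_at)
    return {key: url for key, (url, _crawled_at) in owners.items()}


ARTICLE = "Ten years of notes on restoring vintage bicycles."

# Hypothetical log: the scraper's fast-updating site happens to be crawled
# a day before the author's rarely-crawled blog.
CRAWL_LOG = [
    ("http://scraper.example/post", datetime(2015, 10, 1, 9, 0), ARTICLE),
    ("http://author.example/post", datetime(2015, 10, 2, 9, 0), "  " + ARTICLE.upper()),
]

print(first_seen_owner(CRAWL_LOG))
```

Here the crawler's own schedule, not the publication order, decides who looks like the owner, so the scraper gets the credit.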

~~~
matt_morgan
But the weird thing is the stolen content, even when it's on a crappier site
with no shares etc., knocking the original out of its slot. I.e., something
(probably freshness) is causing the stolen content on the crappy site to be
higher-ranked than the original content on the strong site. That seems
avoidable (and in Google's best interest).

------
randyrand
The title should more accurately be Google _search._ Many parts of Google, for
instance YouTube, definitely do penalize stolen content.

~~~
walshemj
I have had a site that had UGC spam promoting dodgy TV streams - get hit with
a penalty.

------
jeremy7600
Advertisement post?

~~~
mikkom
Yes but a very interesting one.

------
hlmencken
This is not stealing, and even if it is illegal that is a bad way to put it.
Also, google's service is primarily to the searcher so this isn't a huge issue
for them.

~~~
smt88
> _google's service is primarily to the searcher_

You are very wrong. For many years, nearly 100% of Google's revenue was from
AdSense.

What if someone spends days writing an article and posts it on his blog, and
then someone else copies and pastes it onto BuzzFeed, which becomes the top
search result for that topic?

BuzzFeed is making money that the same blogger would have made from his own
content. Now, also assume Google serves ads to BuzzFeed, but it does not serve
ads to the blogger. _Google has a financial interest in ignoring the
provenance of the content in this case._

Is all of that ethically acceptable?

~~~
morgante
> You are very wrong. For many years, nearly 100% of Google's revenue was from
> AdSense.

That's incredibly untrue. A substantial portion of Google's revenue has always
been and continues to be from first-party AdWords ads.

The fact that you're using BuzzFeed as an example, a firm which emphatically
does not use display ads, shows how little you know about this.

------
jakeogh
It's downright tragic to ask for more rules.

------
sismoc
There is no such thing as "Stolen" content.

~~~
smt88
If that's true, there's also no such thing as "stealing" at all.

Consider a novelist who works for 10 years on her novel. A hacker steals the
document from her computer and publishes it online under his own name. He
makes $100M.

Is it wrong for the novelist to feel like someone stole from her? What word
would you use instead?

~~~
jsizz
> What word would you use instead?

Infringing. (duh)

~~~
smt88
I can't tell if you're joking, but "infringe" means "to violate" which implies
that there is a law or agreement that's being broken. That makes it sound like
you agree with the idea that this is stealing.

~~~
jsizz
I absolutely do not agree with the idea that this is stealing. How can it be
stealing, when the owner still has the thing that was supposedly stolen?

Different circumstances, different terminology. The correct terminology (see
US Title 17 or CDPA 1988) is "infringing". Anyone who insists on using the
word "stolen" is signalling their ignorance of the first, most basic fact of
copyright law.

~~~
smt88
In my example, the specific crime may not have been stealing, but there was
revenue stolen.

~~~
dsp1234
_but there was revenue stolen._

1.) If the item is being given away for free, there can still be infringement.

2.) If a person would never purchase an item at the available price (due to
the law of supply and demand for example), that person might still infringe.
No revenue was lost or gained since the transaction would never have completed
at the existing price.

In either of those cases, no revenue was "stolen", but infringement still
occurred. These are some of the many reasons that stealing isn't a good way to
describe copyright infringement.

