
To show how easy it is for plagiarized news sites to get ad revenue, I made one - airstrike
https://www.cnbc.com/2020/05/17/broken-internet-ad-system-makes-it-easy-to-earn-money-with-plagiarism.html
======
mgamache
Don't mean to be snarky, but this is _not_ how easy it is to get ad revenue.
It's how easy it is to get approved for ad networks. She didn't even get
adsense. I find it completely unremarkable that anyone could set up a non-
adult site that has human generated content and get ads placed. The traffic
needed for real ad revenue is a different story. I bet that site gets close to
zero traffic (not even enough to cover hosting). SEO (black-hat or whatever)
is the trick IMO, not getting ad revenue. Plagiarized new domains get no weight
in the Google engine.

~~~
CM30
Honestly, it'd be pretty interesting to see an article like this which
continues after the 'approved for ad networks' part and shows how such a site
could rank in Google, do well on social media sites, etc.

Could be interesting to see how scammers are doing that, and lead to some
potentially interesting insights about black hat SEO, social media marketing,
targeted ads, etc.

Because yeah, as you said, getting approved by an ad network is only part of
the story, and not very much of it at that.

~~~
berbec
It may be a low bar, but being able to automate this (a website-scraping Ansible
playbook?) makes the effort required near-nil. They only have to clear $50
to pay for A LOT of domains and hosting.

~~~
notahacker
Sure, but you need thousands of legitimate-seeming pageviews to get that $50
back, and the networks - even or especially the bottom tier ones - are likely
to be hotter on click fraud than scraped content.

~~~
mgamache
You would have to get ~16,000 pageviews to make $50 (assuming a $3.00 CPM,
which would be low for AdSense, but not for these second-tier networks).
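
For anyone who wants to check the arithmetic: CPM is revenue per 1,000 ad impressions, so the pageviews needed are roughly target / CPM * 1000. A minimal sketch, assuming a flat CPM and one ad impression per pageview (both simplifications; the function name and the `ads_per_page` parameter are made up for illustration):

```python
import math


def pageviews_needed(target_revenue: float, cpm: float, ads_per_page: int = 1) -> int:
    """Pageviews required to earn target_revenue at a given CPM.

    CPM is revenue per 1,000 ad impressions. Assumes a flat CPM and a
    fixed number of ad impressions per pageview (illustrative assumptions).
    """
    if cpm <= 0 or ads_per_page <= 0:
        raise ValueError("cpm and ads_per_page must be positive")
    impressions = target_revenue / cpm * 1000
    # Round up: a fraction of a pageview does not exist.
    return math.ceil(impressions / ads_per_page)


print(pageviews_needed(50, 3.00))
```

At a $3.00 CPM this comes to 16,667 pageviews, in line with the ~16,000 estimate; a lower CPM pushes the number up proportionally.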

~~~
luckylion
And 16,000 page views without unique content and links _might_ happen, but
most likely not within a year or five.

If you're lucky, you trigger something in Google's black box and they rank
your site better than others at the same level, but you'll still only do long
tail, and even on long tail, you'll compete with the original source of the
article, which has a billion links pointing to its domain. Since you'll also
need to go for quantity, you'll have a giant amount of pages as well, which
will not help you even with niche rankings.

I doubt that the site would pull in 10 actual, human visitors per day on
average with just scraped content.

------
mrtksn
It’s a dirty business, no cost is too great for eyeballs.

In Turkish, whatever you search, the first few pages of results are from the
largest Turkish news outlets because they SEO’ed for everything and Google
doesn’t care.

Do you want to learn how to renew your driver's license? Good luck with that
because your search results will bring you a wall of text articles that are
almost the same for every search term.

“Lately people started to ask themselves how to renew their driver's license.
But do they consider the risks of renewing drivers licenses? Experts agree
that renewing the driver's license can be a complicated thing. Now strap on
and get ready to learn how to renew your driver's license”

Imagine having pages like that on CNN, BBC and others. They are the top result
for so many searches.

Plagiarism of news, on the other hand, is more nuanced IMHO. There’s nothing
stopping you from saying “NBC reports that” anyway. As per the article, you cannot
use their assets but you can create or even generate articles about the news
based on the news.

The ad business is dirty. I’m almost proud of blocking ads.

~~~
mprev
In the U.K. it’s supermarket opening times, especially near holidays.

Google for “Aldi opening times Easter Sunday” and you’ll get articles from the
lower quality newspaper websites.

It’s pathetic.

~~~
mrtksn
Oh, definitely. Especially in these pandemic days, that was something that I
tried and failed at. On the Turkish web, apparently the news outlets gave up any
hope of respect and now every single one of them is doing it. The biggest
ones, the leftie ones, the right-wing ones, the ones cozy with the
government. All of them.

No way Google isn't aware of this: there's a local Google office in Turkey,
they have a large presence and full Turkish language support in most of their
products.

Maybe it's simply part of the business model now. If a supermarket wants
people to find their opening times, maybe they should buy an ad placement.
There's no money in the high-quality organic search results I guess.

~~~
gokhan
I've been watching this exact SEO problem in Turkey and I can say that it was in
full force way before the pandemic. Google doesn't care. Instead, they're busy
flagging pages discussing "penisilin (penicillin in Turkish) application for
kids" as an AdSense Policy Violation since the page contains "penis".
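
How AdSense's policy checks actually work isn't public, so this is only a guess at the kind of failure involved: a naive substring match flags "penisilin" because a blocked term sits inside it, while matching on whole words does not. The blocklist and both function names are hypothetical, for illustration only.

```python
import re

# Hypothetical blocklist; not taken from any real ad network.
BLOCKED_TERMS = ["penis"]


def flags_substring(text: str) -> bool:
    """Naive check: flag if a blocked term appears anywhere, even inside a word."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def flags_whole_word(text: str) -> bool:
    """Stricter check: flag only when a blocked term appears as a whole word."""
    return any(
        re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE) is not None
        for term in BLOCKED_TERMS
    )


page = "penisilin application for kids"
print(flags_substring(page))   # True: "penis" is a substring of "penisilin"
print(flags_whole_word(page))  # False: no standalone blocked word
```

Word-boundary matching is not a complete fix either (it misses inflected forms and deliberate obfuscation), but it avoids this particular false positive.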

------
propter_hoc
I suspect one of the hard parts of this for Google is that many news sites
legitimately publish the same articles because of wire services and
correspondence arrangements, like AP and Reuters. Hard to tell whether the new
site is plagiarizing or syndicating.

~~~
wobbly_bush
Don't they always include the source as AP or Reuters in the body of text
somewhere?

~~~
ryanwaggoner
You can copy that text too...

------
bufferoverflow
Doesn't say how much she made from it. Guess: very close to zero, if not zero.

~~~
benburleson
Including hosting? I'm sure it's actually in the red.

~~~
xwdv
If you include the hit to your professional reputation from actually
plagiarizing a news site for revenue and getting blacklisted from the
industry, then what did it cost? Everything.

~~~
waheoo
Or you get placed as a CTO with a fat raise.

~~~
xwdv
My dream job.

------
kayoone
I remember in 2004 or so when an agency I worked for had a Wikipedia clone
running with AdSense and tons of SEO, which made 20k per month and basically
kept the company afloat. I was a young junior dev and while I was impressed by
it, it never felt right to me (which it obviously wasn't in many ways). As far
as I remember this only worked for about a year at best until Google penalised
those sites more and more.

------
gitgud
Unethical Continuation of This Idea:

Scrape existing news sites, and use machine learning to paraphrase everything
so Google doesn't detect plagiarism.

~~~
zaphods3rdhead
The US military has already been working on this for ~ a decade. There was a
contract out of Redstone Arsenal where they were writing "story spinners" to
scrape and re-word war-time propaganda.

------
randomgoose
How easy is it to get distribution over social media sites like Facebook or
Twitter? The distribution costs would be close to zero there, right?

------
peter_d_sherman
>"It all underscores the fact that the ad tech space is so convoluted, it’s
easy to make money from legitimate advertisers just by setting up a web page.

 _That means there’s significant incentive to create sites with not just with
low-quality clickbait or A.I.-generated nonsense, but sites filled with
outright plagiarized content._ "

------
ipiz0618
It's really easy to get any content on the internet but really hard to verify
whether it is plagiarized. Basically anyone can place some ads on their
website, but if the site posts nothing but copied content, I doubt it will
last.

------
kevsim
> These firms mostly sold “popunder” ads, which pop up a new link in a browser
> tab when you click something

Who is buying ads on these networks? There cannot possibly be any returns can
there?

~~~
is_true
It might be a "victim filter": some scams are designed to avoid wasting time on
people smart enough not to fall for the scam in the following steps.

------
nillium
We're working on another way of disseminating news. It might make plagiarism
a little more difficult, while also working a little better for our audience:
https://blog.nillium.com/what-can-napster-teach-local-news/

~~~
kevsim
How does that help prevent plagiarism?

~~~
nillium
Because it isn't full articles -- just updates as they happen coming straight
from the newsroom, more like tweets. It's not to say that people can't
plagiarize, but it wouldn't be as easy or make as much sense as just copying
and pasting an article.

------
aaron695
OK first try. But needs more work.

Not much proven so far.

Many sites _seem_ to translate to language X and back to English to clean the
data.

Research this.

Anyone using GANs yet?

How do you stop sites blocking your scraper?

There's money for the ad companies in letting you plod along, then stealing your
hard-earned money because you are breaking the rules. Are they?

