
How much of your audience is fake? - fauigerzigerk
http://www.bloomberg.com/features/2015-click-fraud/
======
meeper16
This is exactly why social networks are far less valuable than properties like
Google that started with algorithmic foundations as opposed to rebranding the
next GeoCities. I'll never trust Facebook's numbers on their fake profiles
especially when Yahoo kills close to 1mil fake profiles every month and while
Ashley Madison was rife with them. Reddit started this way too, with tons of
bots, fake accounts etc. We just don't know how many fake or synthetic
Astroturfed
[https://en.wikipedia.org/wiki/Astroturfing](https://en.wikipedia.org/wiki/Astroturfing)
accounts are really out there. I have a close friend that ran a consumer
oriented search site with 50 million monthly active users and he showed me
that if certain sites, not his, did not make it easy for fake accounts to be
auto generated from various advertising groups and spammers, then they just
would not get the bloated monthly active user numbers they wanted.

Ironically, Google the term "buy facebook accounts". This is the elephant in
the room nobody wants to talk about in the social networking space. 50% or
more of the social networking profiles out there could be fake given bots,
systems and marketplaces that auto generate this stuff in mass amounts to spam
and siphon ad dollars.

Another way to look at this is based on how and why virus makers target PCs
along with how many PCs are currently affected.

Not many fake 'searches' are happening compared to fake bots and social
networking profiles. Yet another reason why Google makes $70 billion per year.

~~~
downandout
I personally know the founder of an SEO company that is running around 50 fake
searches per second as I type this.

~~~
vorg
Some outfits don't just do fake searches but also fake downloads. E.g. visit
[https://bintray.com/groovy/maven/groovy/view/statistics#stat...](https://bintray.com/groovy/maven/groovy/view/statistics#statistics)
then click on date range "1 year" and wonder why the daily download numbers
suddenly multiplied by 10 in late May. You'll have trouble believing the 2.5
million download number for Groovy from Bintray, and wonder how long they've
been running the same ploy with Maven.

~~~
thebournepopret
Late May is the beginning of intern/coop season and Groovy and Maven are
popular technologies used by software engineering interns.

Could explain some of the spike.

------
cognivore
I find the whole soft white underbelly of internet advertising interesting in
that it pollutes the entire advertising pool. Even ads that are "legitimate" are
lost in a sea of click bait, click fraud, and the morass that is covered in the
article.

It makes me wonder about companies that get a large part of their income from
advertising (Google...). Once the ad market descends into a cesspool that no
legitimate company will dip their toe into, can companies that depend on
advertising revenue survive on the self-serving, artificial, and mostly
automated, ad market?

~~~
ttctciyf
I quite like the idea of a self-serving, artificial and mostly automated ad
market. Maybe making it wholly automated would be an improvement..

Like: a browser plugin that would randomly click ads in a different browser
profile, maybe through an IP-masking proxy, without ever displaying them or
the pages they load to the user (hey, maybe without ever traversing the last
mile from the proxy to the user's computer.)

In this way, and assuming the clicks can be made indistinguishable from actual
for-real user clicks (maybe a tall order if proxies are involved?), the
whole ad market automation circle could complete while still retaining a
usable web. The online ad economy spins off to become a self-sustaining fully
automated exchange bubble, visible to the external world only at its interface
to participants' billing systems and analytics.

Publishers and ad networks win because clicks, network infrastructure
providers win because traffic is lucrative and content providers can flourish,
users win big time because they have all the benefits of an ad-supported free-
beer web with none of the down side (actually seeing and being tracked by ads,
that is.)

So long as the advertisers (or, more specifically, the campaign managers'
bosses) don't catch on, what could go wrong?

~~~
chongli
_In this way, and assuming the clicks can be made indistinguishable from
actual for-real user clicks (maybe a tall order if proxies are involved?)_

Unless your automated system actually spends money then it will always be
distinguishable from the real thing. Advertisers, no matter how wealthy they
might be, cannot afford to pay for a "fully automated exchange bubble" that
gives them zero returns.

~~~
ttctciyf
Unfortunately, I guess you're right. Dammit, I knew there'd be a catch
somewhere! :)

------
marwann
Any clever advertiser doesn't judge the success of a campaign by its number of
clicks, but by its number of conversions (and ultimately, revenue). What's
become a new pain in the advertising space is spam/ghost referrals, which mess
up your analytics for the sole purpose of making you visit their websites and
add to your site's stack of fake audience (which the article fails to
mention).

~~~
fideloper
The really crazy thing is that tracking from click to conversion is still not
a truly solved problem!

e.g. When a person clicks, and then closes the browser and/or views other web
sites, only to come back as direct traffic later and convert. The funnel is
very hard to track and very easy to lose.

Not that I do this full time (I don't), but I've yet to really see a solution
that lets me truly match up a person who clicks an ad and then converts.

At best, I can roughly correlate ads to a bump in revenue.

~~~
ecopoesis
Your example is actually pretty easy to track. That's why sites drop cookies
and have look back logs.

Harder to track is when someone clicks on a link on mobile, and then switches
to desktop to complete the transaction. Unless everyone is logged in, you have
no chance of tracking the funnel.
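
A minimal sketch of the cookie-plus-lookback idea described above, assuming each click and conversion is logged with a cookie ID and a timestamp. The field names, the 30-day window, and the `attribute` helper are all hypothetical, not any particular ad network's method:

```python
from datetime import datetime, timedelta

# Hypothetical lookback window; real networks choose their own.
LOOKBACK = timedelta(days=30)

def attribute(conversion, clicks, lookback=LOOKBACK):
    """Return the latest click from the same cookie inside the window, or None."""
    candidates = [
        c for c in clicks
        if c["cookie_id"] == conversion["cookie_id"]
        and timedelta(0) <= conversion["time"] - c["time"] <= lookback
    ]
    return max(candidates, key=lambda c: c["time"], default=None)

# Illustrative click log (invented data).
clicks = [
    {"cookie_id": "abc", "ad": "banner-1", "time": datetime(2015, 9, 1, 10, 0)},
    {"cookie_id": "abc", "ad": "banner-2", "time": datetime(2015, 9, 20, 9, 0)},
    {"cookie_id": "xyz", "ad": "banner-1", "time": datetime(2015, 9, 21, 9, 0)},
]

# Same browser comes back as "direct" traffic days later: the cookie still matches.
same_browser_sale = {"cookie_id": "abc", "time": datetime(2015, 9, 25, 18, 0)}
# Click on mobile, purchase on desktop: a different cookie, so nothing matches.
cross_device_sale = {"cookie_id": "desktop-999", "time": datetime(2015, 9, 25, 18, 0)}

print(attribute(same_browser_sale, clicks))
print(attribute(cross_device_sale, clicks))
```

The first lookup finds the most recent matching click; the second returns `None`, which is the mobile-to-desktop gap mentioned above.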

~~~
noxToken
I know nothing about advertising, but isn't this method thwarted by users
(likely small in number) that either do not allow cookies or dump cookies
after a session? The latter type would only affect tracking if the transaction
was completed in another session. I would think that IP tracking is somewhat
useless due to shared public address among users.

Really what I'm asking is: is it trivially possible to track a user who
does not store cookies?

~~~
ecopoesis
Yes, dumping cookies makes it much harder to track lookback conversions. Some
networks will use fingerprinting (IP, user-agent: Panopticlick-style stuff)
to supplement their data, but it's much less reliable and a determined user,
like someone blocking or dumping cookies, is going to defeat it.
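
As a rough illustration of why this kind of fingerprinting is less reliable than a cookie, here is a sketch that hashes just an IP address and a user-agent string. The `fingerprint` helper and the sample values are assumptions for the example; real systems combine many more signals:

```python
import hashlib

def fingerprint(ip: str, user_agent: str) -> str:
    """Hash a couple of request attributes into a pseudo-identifier."""
    raw = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]

# Made-up user-agent and documentation-range IP addresses.
ua = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

first_visit = fingerprint("203.0.113.7", ua)
return_visit = fingerprint("203.0.113.7", ua)

# A different person behind the same shared address with the same browser collides.
coworker = fingerprint("203.0.113.7", ua)
# The same person on a new IP (mobile network, VPN) no longer matches.
moved = fingerprint("198.51.100.20", ua)

print(first_visit == return_visit)  # True
print(first_visit == coworker)      # True
print(first_visit == moved)         # False
```

The collision and the miss are the two failure modes: shared public addresses merge distinct users, and any change in IP or browser splits one user in two.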

------
xenadu02
Modern web advertising networks are an example of data telling lies we (or
advertisers) want to hear.

The dream has always been to connect each advertising dollar spent to revenue
generated. The reality is that data only exists in people's heads, is spread
out over time, or is in their social interactions. My hunch is at least half,
if not more, of all positive impact of ads (for the ad buyer) is generated
this way. The data isn't low quality... it is literally impossible to collect.

For example: You research items on your desktop, then actually make the
purchase on your phone while taking the train home. Maybe they can connect you
to that purchase with enough tracking. But what if your neighbor or coworker
asks about the product? Can't track that one.

Another example: You hear ads on a podcast for Igloo and when work starts
asking about preferences for Confluence vs Jira you mention looking into
Igloo. Your company ends up adopting it. Later you grow from 20 people to
3000. There's absolutely _no way_ for Igloo to connect the dots leading to a
3000 user account. Let's say after discounts that's $300k/year. If Igloo paid
$200/episode * 52 episodes/year * 20 podcasts = $208k. That's an absolute
_steal_ just to acquire that single customer.
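
The arithmetic in that example can be checked directly. All figures are the hypothetical ones from the comment above, not real Igloo pricing:

```python
# Hypothetical figures taken from the comment above.
cost_per_episode = 200      # dollars per podcast episode
episodes_per_year = 52
podcasts = 20

annual_ad_spend = cost_per_episode * episodes_per_year * podcasts
annual_contract = 300_000   # dollars per year, the post-discount guess

net_first_year = annual_contract - annual_ad_spend

print(annual_ad_spend)  # 208000
print(net_first_year)   # 92000
```

So under these assumptions the single untrackable customer covers the whole year's podcast spend with $92k to spare.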

Yet another example: You may be 24 with no kids living in an apartment but
will you always be that way? Smart car makers understand that if you have a
good experience with your first car in their brand you're much more likely to
buy a larger car from them when you have kids, or a more luxurious car when
your career advances. How can you figure out if the dollars spent advertising
to the college kid with no money are wasted in that scenario?

So that's the great lie... that somehow all the tracking cookies, comScore
profiles, etc. will make advertising more effective or have some benefit, period,
regardless of fraud or click bots.

Advertising on the web right now is a fool's errand in many cases. I'm glad
because it gives the small guys like me a chance to exploit the system to buy
ads on lower-volume sites directly from the content creators and reach a good
sized audience.

~~~
eterm
You can track all of that stuff, you just need enough data to be able to draw
inference from the statistics.

Ultimately, if lots of poor people are worth targeting in a hope they will
one day be rich, the statistics will ultimately back that plan.

But I suspect that anomalies will fall out in the big picture.

------
Gustomaximus
This is such a re-hashed story. While not good, it's not the big issue people
seem to think they keep discovering.

For non-brand advertising this is easily solved by down funnel tracking. Any
marketer worth their salt tracks to the 'ultimate goal' and will avoid shadier
networks by results based tracking. End of the day marketing value comes down
to a ROMI metric. It doesn't matter if 50% of the impressions are fake as long
as the total investment pays off. For brand marketing I simply follow the
networks that show DM performance as an indicator of quality display.
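
To make the ROMI point concrete, here is a small sketch using one common definition, (attributed revenue - spend) / spend. The two networks and every number in them are invented for illustration:

```python
def romi(attributed_revenue: float, spend: float) -> float:
    """Return on marketing investment: net return per dollar spent."""
    return (attributed_revenue - spend) / spend

# Hypothetical networks; every number here is invented for illustration.
networks = {
    "clean_network": {"spend": 1000.0, "revenue": 1500.0, "fake_share": 0.05},
    "noisy_network": {"spend": 1000.0, "revenue": 2500.0, "fake_share": 0.50},
}

results = {name: romi(n["revenue"], n["spend"]) for name, n in networks.items()}

# Rank by ROMI; the fake-impression share is shown but plays no part in the ranking.
for name, value in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    share = networks[name]["fake_share"]
    print(f"{name}: ROMI {value:.2f}, fake impressions {share:.0%}")
```

In this made-up case the network with 50% fake impressions still ranks first, because only down-funnel revenue enters the metric.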

Generally the worst spot for junk clicks is mobile display, particularly
in-app. I'd never advise RON campaigns here without tight monitoring. I like to
build whitelists rather than blacklists for companies where I handle their
marketing for display on mobile.

Interestingly I've been noticing increasing bot clicks on Google search ads.
If I can detect this, it makes me really suspicious why Google, with all their
data/prowess, is not.

This would really make interesting reading if someone wants to show this to
the world. Even more interesting than the 2012 FB 80% bot claim:
[http://techcrunch.com/2012/07/30/startup-claims-80-of-its-fa...](http://techcrunch.com/2012/07/30/startup-claims-80-of-its-facebook-ad-clicks-are-coming-from-bots/)

~~~
meeper16
Try this: Inside a counterfeit Facebook farm
[https://news.ycombinator.com/item?id=10275631](https://news.ycombinator.com/item?id=10275631)

------
sologoub
"The most startling finding: Only 20 percent of the campaign’s “ad
impressions”—ads that appear on a computer or smartphone screen—were even seen
by actual people.

“The room basically stopped,” Amram recalls. The team was concerned about
their jobs; someone asked, “Can they do that? Is it legal?” But mostly it was
disbelief and outrage. “It was like we’d been throwing our money to the mob,”
Amram says."

Every time I see an account of an advertiser seeing a viewability metric, it
reminds me how scare tactics always work to drive sales. Viewability tech is
very nascent and is often just guessing because there are so many variables
out there that can give it both false positives and false negatives.

However, giving such numbers to advertisers drives sales of your tech big
time, even if the number cannot be verified because it's such a scary number.

~~~
geocar
Amram is trying to become his own brand, so I skip articles that he's got a
quote in simply because he's not out to solve any problems, just get an agency
job in a few years. I don't think he actually understands ad fraud at all.

Here's the thing: The (Boris Boris) sites that myspace is selling are
obviously crap. Absolutely worthless. The traffic he's buying is a bunch of
fraud redirects, and _he_ thinks that it's the media buyer's responsibility to
be choosier.

However Christopher Barnet of Myspace is selling these sites as genuine video
preroll that retails at around $6-8 per thousand views. He's selling them on
ad exchanges and they're buying it like crazy because it reports as highly
"viewable", it's in "demo" (meaning a lot of these users have a comScore
cookie), and of course, because nobody believes Myspace is defrauding them.

So here we are: At least 80% of Myspace's ad inventory is fraud, and nobody's
going to ask for their money back because "hey, we got fooled too".

And life goes on.

------
zelos
For this article? Most of it, because the animations annoyed all the human
readers so much they closed the tab.

------
davidgerard
This is absolutely hilarious. And another powerful argument in favour of
powerful ad blockers. Let the ad malware talk to the bot viewers and leave us
out of it.

(That said, I'm still trying to work out how to _unblock_ Project Wonderful, a
non-arsehole ad network, in uBO. Seems to require picking the precise JS they
serve.)

------
ck2
If you have any kind of posting allowed on your website, 80% of your server
power is being used to deal with bots.

Good luck dealing with ipv6 traffic where their behavior cannot even be
tracked across connections.

I'm already seeing bots that do not repeat an IPv4 address in their farm for
hours, and that is with one attempt every few seconds.

------
anonymousDan
Would be an interesting tactic for e.g. Microsoft to secretly set up a click-
spamming botnet so as to pollute Google's click-through measurement enough
that advertisers no longer trust it. Would that even be illegal?

------
chad_strategic
I'm now very interested in making my own bot to generate views. Any code out
there?

This comes from a coder that had 100 twit bots, before twitter banned my API
access.

------
inversionOf
Aside from bot traffic, a significant percentage of "legitimate" traffic
seems, anecdotally, to be engineered accidental clicks -- the mobile site that
is constantly pushing content around in the hopes that one of your screen
interactions accidentally yields an ad click. As one of an endless number of
examples, a well respected, major recipe site has a mechanism to change the
servings, and first you have to click on a "servings" button, and then on the
actual serving count. After clicking on the servings, several hundred
milliseconds later an ad appears exactly where the count input is, and clearly
considerable engineering effort went into designing this, and many other,
accidental interactions.

For what? I can only speak for myself but my immediate reaction is to click
back and feel annoyed, and consider ad blocker options. It has never led to
engagement or a purchase. Ever. The end result is that the performance of ads
simply collapses, and sites have to get even trickier to entice accidental
clicks. Rinse and repeat.

If you work in the "trick click" space, you are just dooming yourself. It is a
race to the bottom.

~~~
commentzorro
_> engineered accidental clicks_

You mean the Slashdot model. Four huge buttons that take up the entire screen
while scrolling. No room on either side to avoid them or get past them.
Slashdot has become the poster child for this crappy model.

~~~
dspillett
slashdot, now there is a memory...

I used to spend a lot of time there. That went down to barely any in recent
history as HN and other sources "took over". When SourceForge started adding
rubbish to downloads and Slashdot reportedly censored discussion of the topic
(they are owned by the same parent company) I realised how little I'd visited
in recent months and decided that I never needed to go there again.

Silly tricks like the one you describe when seen on previously respectable
sites seem to be a symptom of the site slowly dying and desperately grasping
for what it can on the way down.

------
socialmediageek
Very insightful. Thanks for sharing.

------
jooukish
Great article, thanks for sharing.

> He dismisses the idea that it’s hard to tell genuine traffic from fake. “The
> whole thing about throwing your hands in the air and saying, ‘I don’t know,
> maybe it’s real, maybe it’s not real’.

