
61.5% of Web Traffic Is Not Human - vellum
http://www.theatlantic.com/technology/archive/2013/12/welcome-to-the-internet-of-thingies-615-of-web-traffic-is-not-human/282309/
======
omarforgotpwd
100% of web traffic isn't human, but some of the software listens to humans
more than others.

------
ChrisNorstrom
On my blogs I used to run a script that banned bad bots by adding their IPs to
a list in my .htaccess file. That lasted until the banning script crashed
after adding over 5,000 bad-bot IPs in just one month. If my Google Analytics
numbers were correct, I'd say between 50% and 70% of my traffic was bots. So
61.5% sounds about right.
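
A minimal sketch of what such a banning step might look like, for illustration only. The original script is not shown, so the function name `add_ban` and the use of Apache 2.2-style `Deny from` lines are assumptions. It validates the address and skips duplicates, which keeps the list from growing with repeat offenders:

```python
import ipaddress


def add_ban(htaccess_text: str, ip: str) -> str:
    """Return htaccess_text with a 'Deny from <ip>' rule appended.

    Hypothetical helper: assumes Apache 2.2-style access directives.
    Raises ValueError if `ip` is not a valid IPv4/IPv6 address.
    """
    addr = str(ipaddress.ip_address(ip))  # validates and normalizes the address
    rule = f"Deny from {addr}"
    lines = htaccess_text.splitlines()
    # Skip the write if this IP is already banned.
    if rule in (line.strip() for line in lines):
        return htaccess_text
    lines.append(rule)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    current = "Order Allow,Deny\nAllow from all\n"
    print(add_ban(current, "203.0.113.7"), end="")
```

With thousands of entries, a per-request .htaccess list gets slow to evaluate, so a firewall-level block list may be a better fit at that scale.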

I've pretty much stopped looking at "number of impressions" when purchasing
advertising too. It's always untrue. A recent blog I was going to advertise on
said it had 35,000 unique visitors a month. Yet just a few weeks ago they
had a "free give away contest for a jewelry brand (no purchase necessary,
international readers allowed, free to apply, no shipping costs)" and only 100
people applied. That pretty much said it right there.

One more example: A site I was on allowed unregistered visitors to give their
entire collection of corporate logos a rating between 1 and 10 stars. And they
used radio buttons to vote for the numbers 1-10. As soon as you select a radio
button it auto-submits and gives the logo that rating. EVERY SINGLE LOGO had a
massive number of "1 star" ratings because all the bots would see a radio
button, check it, and trigger the vote. All the logos on their site had a
rating of 1.

~~~
viraptor
> "free give away contest for a jewelry brand (no purchase necessary,
> international readers allowed, free to apply, no shipping costs)"

The internet conditioned me to ignore such claims as too-good-to-be-true. I
would expect loads of people to treat it like "you're the 1000000th visitor!
you won!" and other "free stuff here (if you register and fill out hundreds of
forms)" offers.

~~~
ChrisNorstrom
Na, it was a very personal blog and the owner does things like that all the
time. The name of the jeweler was disclosed and everything. I get what you're
saying though, but it was legit.

------
guiambros
_...of all website traffic..._

The report [1] does not include any non-http traffic, which excludes all heavy
content like torrent, Netflix, music streaming, etc.

[1] [http://www.incapsula.com/the-incapsula-blog/item/820-bot-tra...](http://www.incapsula.com/the-incapsula-blog/item/820-bot-traffic-report-2013)

~~~
swinglock
That's what "web traffic" means. It didn't say "Internet traffic".

------
grinich
I'm actually surprised that 38.5% is human. That seems high to me.

~~~
tedsanders
It could be that humans tend to use more bandwidth. I imagine plenty of humans
watch Netflix, but very few web crawlers do. :)

~~~
solistice
They're excluding streaming and torrenting.

------
skylan_q
_61.5% of Web Traffic Is Not Human_

I, for one, am glad that human trafficking over the web is down to only 38.5%.

~~~
devilshaircut
Thanks for the laugh.

Also, I wonder what the definition for "traffic" is within the context of this
research.

------
ds9
I wondered how they distinguished bots. The only info I found on the study
[1], on a quick search, says:

"For the purpose of this report we observed 1.45 Billion bot visits, which
occurred over a 90 day period. The data was collected from a group of 20,000
sites on Incapsula’s network"

So, in the first place, not necessarily representative of the web generally.
And it does not say how they detect bots. Of course some declare themselves
[2]; otherwise I guess one would have to rely on captchas [3], which are known
to be of limited reliability and subject to an arms race.

1\. [http://www.incapsula.com/the-incapsula-blog/item/820-bot-tra...](http://www.incapsula.com/the-incapsula-blog/item/820-bot-traffic-report-2013)

2\. kristopolous on this page, '(bot|spider|crawl|aggregator)'

3\. which Incapsula does, as a condition of access to its summary of its
previous similar research.

------
bkirwi
It's interesting to me that the author links to an article about how coding is
useless for journalists, and then spends the rest of the article talking about
all the web-scraping scripts she's written or could easily write.

For me, this seems like a _success_ of the whole teach-everyone-to-code thing;
the author's picked up a bit of simple scripting and it's made some things
dramatically easier for her. It's certainly not professional-level, but that's
normal; for example, many people think programmers should know how to write,
but few think they should all be able to churn out high-caliber journalism. Is
this a failure of setting expectations, or is that really where some people
are setting the bar?

------
FrankenPC
I remember during the heady days of SLIP and PPP dialups, my dad was trying to
explain how the future of this new weird invention called the internet would
have automated software robots that would perform automated tasks to help us
out.

I had no idea what he was talking about.

------
onethumb
I estimated >50% of our traffic was from search engine crawlers specifically
(billions of requests per month) way back in 2010. It's only gotten worse
since then.
[http://don.blogs.smugmug.com/2010/07/15/great-idea-google-sh...](http://don.blogs.smugmug.com/2010/07/15/great-idea-google-should-open-their-index/)

------
moca
The Internet access cost tends to be higher than advertising cost. You can
just compare revenue of AT&T to Google. I pay about $600/year for cable
internet, and my advertising value is much less than that. As long as bots pay
for the internet access, I think it is just fine.

------
jgalt212
Yet, advertisers pay for 100% (or very nearly that amount) of ads displayed--
regardless of the client.

~~~
PavlovsCat
Dunno. I assume quite a lot of bots don't execute Javascript and therefore
don't get to "see" a lot of ads anyway.

~~~
jgalt212
That's a good point. It would be nice to see someone who does not have a
vested interest in under- or over-reporting the number of bots on the web take
a good crack at this number.

Per spider.io, bots are ripping off advertisers left and right. That being
said, scared advertisers are the target market for their products.

[http://www.spider.io/blog/](http://www.spider.io/blog/)

~~~
kristopolous
I've got no investment either way. My nginx log has 19,333,928 entries ...
when I do grep -Ei '(bot|spider|crawl|aggregator)' on it I get 2,609,874.

That means that bots that declare themselves make up about 13.5% of requests
to this site. The robots.txt has no restrictions.

I don't know if that's helpful.
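
For anyone wanting to repeat the count above without grep, here is a rough Python sketch of the same idea. The function name `declared_bot_share` is hypothetical. Like `grep -Ei`, it matches the pattern anywhere on the line, so a URL path containing "bot" would also be counted:

```python
import re

# Same pattern as the grep command above, matched case-insensitively.
BOT_PATTERN = re.compile(r"(bot|spider|crawl|aggregator)", re.IGNORECASE)


def declared_bot_share(lines):
    """Return (bot_lines, total_lines, share) for an iterable of log lines."""
    total = 0
    bots = 0
    for line in lines:
        total += 1
        if BOT_PATTERN.search(line):
            bots += 1
    share = bots / total if total else 0.0
    return bots, total, share


if __name__ == "__main__":
    # Arithmetic from the comment above: 2,609,874 of 19,333,928 entries.
    print(f"{2609874 / 19333928:.1%}")
```

To run it over a real log, pass an open file object, e.g. `declared_bot_share(open("access.log", errors="replace"))`, where the path is a placeholder. This only catches bots that identify themselves in the log line; anything spoofing a browser user agent is missed.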

~~~
arn
That doesn't really tell us anything about whether or not they trigger
javascript. Bots triggering javascript will inflate advertising and js-based
analytics numbers, which would be more problematic.

Bots that don't trigger js, however, are more a drain on bandwidth than
anything else.

~~~
ch4s3
I seem to remember from a job I used to have as an analyst that advertisers
have a few ways to discern human and bot traffic. They don't pay for traffic
that is known to be from bots. However there are clearly some bots that aren't
accounted for this way, and some that are designed specifically to trigger
ads.

------
sparrish
I know monitoring services like ours (NodePing) create a huge amount of
traffic by just keeping watch over all those servers... in case someone wants
to see that cat video. <grin>

------
proksoup
At least.

------
UNIXgod
zombie? UDP? DDoZ botnet?

~~~
ehsanu1
googlebot

~~~
STRiDEX
bingbot

