
Quantifying the Clickbait and Linkbait in BuzzFeed Article Titles - minimaxir
http://minimaxir.com/2015/01/linkbait/
======
logicallee
I'll tell you guys a funny story. So, the Onion has this site:
[http://clickhole.com](http://clickhole.com) Check it out, it's great parody.
I actually go there.

So, the last time someone mentioned BuzzFeed - which I don't really go on, I
don't recognize it or anything - I ended up opening it and opening
Clickhole.com in adjacent tabs.

Now, since I don't read buzzfeed, I was then _completely_ convinced it was
clickhole. It had articles like this (from the current front page, not
what I was reading but they'll do):

\- 23 Ways You Know Your Obsession With Your Cat Is Getting Out Of Control

\- The Definitive Ranking Of The Most Cringingly Great Dialogue In “Titanic”

and so forth. (The above are verbatim from the site right now.) But then I got
to one, and I couldn't figure it out. (Just looking at the title.) I've
actually found it just now, based on remembering a keyword. It was:

\- "Muslim-Owned Shops In Birmingham Were Attacked With Guns And Hammers"

I was looking at it and looking at it, trying to figure out the "joke" (why
guns and hammers.) It was like a pun I couldn't get. Like the Onion was too
clever for me. After about a minute, I gave up. And I saw on the tabs that I
was on an actual site, not clickhole. I had to just laugh and laugh. The
reason I couldn't figure out the joke was that I was on buzzfeed, which isn't
satire! I just couldn't stop laughing.

------
jcr
Max, great analysis. I was curious if you researched whether or not BuzzFeed
is using the tactic of multiple titles for a single (partially generated)
article?

The multi-title tactic was mentioned in the "The King of Clickbait" article
and discussion:

[https://news.ycombinator.com/item?id=8810670](https://news.ycombinator.com/item?id=8810670)

[http://www.newyorker.com/magazine/2015/01/05/virologist](http://www.newyorker.com/magazine/2015/01/05/virologist)

~~~
minimaxir
There were articles posted in multiple categories, which caused misleading
results due to double-counting when I first made the chart.
([http://www.reddit.com/r/dataisbeautiful/comments/2s6d1y/30_l...](http://www.reddit.com/r/dataisbeautiful/comments/2s6d1y/30_linkbait_phrases_in_buzzfeed_headlines_you/cnmk7y3))

The dupe articles had the same exact title, however.
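A minimal sketch of that kind of de-duplication, assuming each scraped listing is a `(title, category)` pair and that duplicates share the exact same title, as described above. The rows and category names are hypothetical, and this is not the code used for the chart.

```python
# Hypothetical scraped rows: one (title, category) pair per listing.
# An article posted in two categories shows up twice with the same title.
rows = [
    ("23 Ways You Know Your Obsession With Your Cat Is Getting Out Of Control", "Animals"),
    ("23 Ways You Know Your Obsession With Your Cat Is Getting Out Of Control", "LOL"),
    ("Muslim-Owned Shops In Birmingham Were Attacked With Guns And Hammers", "News"),
]


def dedupe_by_title(rows):
    """Keep only the first row seen for each distinct title."""
    seen = set()
    unique = []
    for title, category in rows:
        if title not in seen:
            seen.add(title)
            unique.append((title, category))
    return unique


unique_rows = dedupe_by_title(rows)
print(len(rows), "listings ->", len(unique_rows), "distinct articles")
```

Counting phrases over `unique_rows` instead of `rows` avoids counting a cross-posted article once per category.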

------
DanAndersen
What I would love is a fully-integrated browser extension that
detects/fades/hides clickbaity headlines/links/articles, using something like
Bayesian spam filtering. I know there's already a node.js package that worked
with this concept
([https://github.com/TJkrusinski/clickbait](https://github.com/TJkrusinski/clickbait)),
but it would be great to have something that works like Adblocker on the
various CSS classes where headlines usually lurk, something that would check
all headlines instead of just some sort of blacklist for a few certain sites.
Maybe it would include a crowdsourcing aspect, where users would, given two
headlines, choose which one is more clickbaity, to train the system.
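As a rough illustration of the Bayesian-filtering idea, here is a minimal naive Bayes sketch in plain Python. It is not the linked node.js package and not a browser extension; the class name, the labels, and the tiny training set (a few headlines quoted in this thread plus two made-up "plain" ones) are assumptions for demonstration only.

```python
import math
import re
from collections import Counter


def tokenize(headline):
    """Lowercase the headline and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", headline.lower())


class NaiveBayesHeadlineFilter:
    """Two-class naive Bayes over headline words, with add-one smoothing."""

    LABELS = ("bait", "plain")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, headline, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(headline))

    def log_score(self, headline, label):
        vocab = set(self.word_counts["bait"]) | set(self.word_counts["plain"])
        total_words = sum(self.word_counts[label].values())
        # Log prior: fraction of training headlines carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(headline):
            # Add-one smoothing so unseen words do not zero out the score.
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def is_clickbait(self, headline):
        return self.log_score(headline, "bait") > self.log_score(headline, "plain")


# Toy training data; a real filter would need far more labeled headlines.
headline_filter = NaiveBayesHeadlineFilter()
for text in [
    "23 Ways You Know Your Obsession With Your Cat Is Getting Out Of Control",
    "The Definitive Ranking Of The Most Cringingly Great Dialogue In Titanic",
    "You Won't Believe What Happened Next",
]:
    headline_filter.train(text, "bait")
for text in [
    "Muslim-Owned Shops In Birmingham Were Attacked With Guns And Hammers",
    "City council approves budget for new bridge",   # made-up example
    "Central bank holds interest rates steady",      # made-up example
]:
    headline_filter.train(text, "plain")

print(headline_filter.is_clickbait("17 Ways You Know Your Dog Is The Best"))
```

The crowdsourced "which of these two is more clickbaity" votes could feed `train()` by labeling the winner `"bait"` and the loser `"plain"`, though that mapping is only one possible design.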

------
JetSpiegel
> BuzzFeed was one of the first news sources to use non-neutral headlines that
> deliberately invoke a reaction in the reader

Now now, let's not go overboard. BuzzFeed is a new tabloid, they didn't invent
tabloids.

------
dude_abides
Fascinating topic, but nothing surprising about any of the findings. (Does
that qualify this post as clickbait?:))

An extremely interesting question IMO would be: what is the longevity of
these tactics? Are there phrases that worked extremely well for a period of
time, and then stopped being effective? In other words, how long before people
get tired of clickbait?

~~~
onewaystreet
The fallacy of the clickbait charge against Buzzfeed is that they don't use
clicks as a metric, they focus on shares. Buzzfeed learned from the once hot
Upworthy (who originated the "You Won't Believe What Happened Next" style
headline) that if a visitor feels like they were tricked into clicking on a
link to your site then they aren't going to share it. You get a lot more
traffic by delivering on the promises made by your headlines.

~~~
zeeshanm
I think it's kind of subjective. A clickbait headline may "trick" a user into
clicking on a link, but it may also deliver the expected content. Is there a
blog post dissecting the above hypothesis, though? I'd be interested in reading it.

------
danso
I'm teaching a Unix/data course this quarter, and inspired by the OP (who I
believe did an analysis/visualization prior to this) and other BuzzFeed-title
analysts, I put together an exercise in doing a frequency distribution of
BuzzFeed titles (really, an exercise in basic HTML parsing and regexes)
[http://www.compciv.org/homework/assignments/buzzfeed-listicl...](http://www.compciv.org/homework/assignments/buzzfeed-listicle-title-parser/)
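For readers who want the flavor of such an exercise, here is a minimal sketch in Python using only the standard library. It is not the linked assignment: the sample HTML is a made-up fragment rather than BuzzFeed's real markup, and the choice to count the leading number of each listicle title is just one possible frequency distribution.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text inside every <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data


# Hypothetical archive fragment; the real page structure may differ.
SAMPLE_HTML = """
<ul>
  <li><a href="/a">23 Ways You Know Your Obsession With Your Cat Is Getting Out Of Control</a></li>
  <li><a href="/b">The Definitive Ranking Of The Most Cringingly Great Dialogue In Titanic</a></li>
  <li><a href="/c">23 Things Only Cat Owners Understand</a></li>
  <li><a href="/d">17 Signs You Need A Vacation</a></li>
</ul>
"""

# A listicle title is assumed to start with a number followed by whitespace.
LEADING_NUMBER = re.compile(r"^\s*(\d+)\s")


def listicle_number_counts(html):
    """Count how often each leading number appears across link titles."""
    parser = LinkTextParser()
    parser.feed(html)
    counts = Counter()
    for title in parser.titles:
        match = LEADING_NUMBER.match(title)
        if match:
            counts[int(match.group(1))] += 1
    return counts


counts = listicle_number_counts(SAMPLE_HTML)
for number, freq in counts.most_common():
    print(number, freq)
```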

I gathered the links through their archive pages, which go back to 2006...I
haven't run the exercise myself yet, but at one point did a quick count of
total titles since 2006 and counted more than 200,000. That didn't seem right
(and maybe my selector was off)...but the other day, I tried it with just the
2014 year and came up with more than 60,000 articles.

...that sounded _clearly_ off (the OP said he scraped only ~60,000 distinct
articles)...but when I visited one of the archive pages to eyeball it...yeah,
60,000 articles in a single year looks about right:
[http://www.buzzfeed.com/archive/2014/11/18](http://www.buzzfeed.com/archive/2014/11/18)

It's even more astonishing when you go back to the 2006 pages and see that
there were days/regular weekends in which BuzzFeed published _no_ content at
all.

~~~
minimaxir
Huh, I was not aware of the archive at all. However, looking at the archive,
it appears that there are a lot of community, user-added posts so I'm unsure
of the reliability of those posts.

~~~
danso
Yeah it took me a bit of time to find the archive. Unlike you, I have a little
more faith in BuzzFeed's SEO expertise and figured they would have to have
some kind of sitemap ;)

I don't know about the community posts or how they're structured relative to
the main content...it might be possible on a second pass to filter out
non-BuzzFeed-staff posts based on the URL (which I assume contains the
username of the author).
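A rough sketch of that second pass, in Python. It leans on the same unverified assumption that the first path segment of an article URL is the author's username; the staff usernames and the example article URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical staff usernames; a real list would have to come from elsewhere.
STAFF_USERNAMES = {"staffwriter1", "staffwriter2"}


def author_from_url(url):
    """Return the first path segment, assumed to be the author's username.

    Returns None when the path has fewer than two segments, since it then
    cannot be of the assumed /username/article-slug form.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if len(segments) >= 2 else None


def staff_only(urls, staff=STAFF_USERNAMES):
    """Keep only URLs whose assumed author segment is a known staff username."""
    return [u for u in urls if author_from_url(u) in staff]


urls = [
    "http://www.buzzfeed.com/staffwriter1/some-listicle",     # hypothetical
    "http://www.buzzfeed.com/communityuser9/another-post",    # hypothetical
    "http://www.buzzfeed.com/archive/2014/11/18",
]
print(staff_only(urls))
```

If the URL layout turns out to be different, only `author_from_url` would need to change.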

(Also, I don't know if they changed this since you attempted your last scrape
via pagination...but now if you paginate too far, you will be rewarded with a
Boyz II Men video)

~~~
aw3c2
[http://www.buzzfeed.com/sitemap.xml](http://www.buzzfeed.com/sitemap.xml)

------
benbristow
Someone definitely needs to make a Chrome extension that blocks this sort of
thing. Or an Adblock filter.

~~~
pconner
Several already exist. Here are a few

[https://chrome.google.com/webstore/detail/baitblock/nnonoddk...](https://chrome.google.com/webstore/detail/baitblock/nnonoddkboglnjgnajhjiimgkclmciji)

[https://chrome.google.com/webstore/detail/gawkblocker/fbdakf...](https://chrome.google.com/webstore/detail/gawkblocker/fbdakfnfbdeccpildcaemgdipfkamghn)

------
ryan90
This is a great article for us marketers whose primary goal is shares and
pageviews.

Unfortunately, it's sad to think that this is the state of journalism. <sigh>

------
mFixman
> Additionally, BuzzFeed was one of the first news sources to use non-neutral
> headlines that deliberately invoke a reaction in the reader which then
> subsequently tempts them to click on the article in an attempt to promote
> virality.

I don't think this is a new phenomenon. Yellow newspapers like The Daily Mail
have existed since shortly after the invention of the printing press.

------
thehal84
Through my research I have found that you can trick someone into clicking a
link, but tricking them into sharing it is much harder.

I built a search engine that ranks on social shares and the social comments.

It's great for avoiding click bait in results.

[http://engu.me](http://engu.me)

------
zeeshanm
The analysis is right on. I once experimented with creating a single page
website, literally a single page, by posting a listicle with photos. The
webpage got over a million shares on Facebook.

