
Insights into a corpus of 2.5M news headlines - randomwalker
https://freedom-to-tinker.com/2016/09/14/all-the-news-thats-fit-to-change-insights-into-a-corpus-of-2-5-million-news-headlines/
======
jplasmeier
I nearly skimmed over this line but it's pretty important:

"The classifier was trained by the developer using Buzzfeed headlines as
clickbait and New York Times headlines as non-clickbait."
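
For context, a classifier trained that way is typically just a bag-of-words model over the two headline sets. A minimal sketch of the idea (toy Naive Bayes with made-up headlines standing in for the Buzzfeed / NYT corpora; none of this is the post's actual classifier):

```python
import math
from collections import Counter

def tokenize(headline):
    return headline.lower().split()

def train(examples):
    """Train a Naive Bayes model from (headline, label) pairs."""
    counts = {"clickbait": Counter(), "news": Counter()}
    totals = Counter()
    for headline, label in examples:
        counts[label].update(tokenize(headline))
        totals[label] += 1
    return counts, totals

def classify(model, headline):
    counts, totals = model
    vocab = set(counts["clickbait"]) | set(counts["news"])
    best, best_score = None, float("-inf")
    for label in counts:
        # log prior + log likelihood with add-one smoothing
        score = math.log(totals[label] / sum(totals.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokenize(headline):
            score += math.log((counts[label][tok] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy stand-ins for the Buzzfeed (clickbait) / NYT (news) training sets
examples = [
    ("27 things you will not believe actually happened", "clickbait"),
    ("you will never guess what this dog did", "clickbait"),
    ("senate passes budget resolution after long debate", "news"),
    ("markets fall as central bank raises rates", "news"),
]
model = train(examples)
print(classify(model, "you will not believe these 27 dogs"))  # → clickbait
```

Which also illustrates the complaint: the model learns "Buzzfeed-style vs. NYT-style", not "clickbait vs. not".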

~~~
nerdponx
Bold and incorrect assumption.

~~~
rustyfe
Also amusing to me was that Reuters, Politico, and CNBC all scored _better_
than the NYT on the clickbait rating, despite the NYT being used as the
non-clickbait training data. They wrote headlines more NYT-like than the NYT
itself did.

~~~
TheCowboy
I think it is likely a (perverse) indication that the NYT is more successfully
adapting and competing in the news market. Headlines have always been written
with the primary intention of grabbing attention; modern web clickbait-style
headlines might just be another evolution of that approach.

I've noticed that Times article headlines have become more clickbaity over
time. Some articles, especially those not related to current events, are
becoming more casual in their voice as well.

If they can maintain quality journalism by only sacrificing headline quality,
that is probably a good and necessary tradeoff to make.

~~~
Torgo
NYT is struggling to survive, made more difficult by not wanting to succumb
completely to the evolutionary advancements that have made clickbait
successful.

------
beilabs
I worked on the Irish version of an open source tool that looked into this
somewhat.

It allowed you to pull in each article from the various news sites and stick
it into git, where you could then see how the articles changed over time.

A PhD student found some interesting insights, particularly in the Irish
media, where content would often be changed weeks / months / even years after
the original story was placed online.

[https://github.com/johnl/news-sniffer](https://github.com/johnl/news-sniffer)
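
The versioning idea can be sketched in a few lines (news-sniffer itself is a Ruby app; this is just an illustration of the approach, with the fetch step replaced by hard-coded text):

```python
import pathlib
import subprocess
import tempfile

def snapshot(repo, name, text, message):
    """Commit one fetched version of an article into the git repo."""
    (pathlib.Path(repo) / name).write_text(text)
    subprocess.run(["git", "-C", repo, "add", name], check=True)
    subprocess.run(["git", "-C", repo, "commit", "-q", "-m", message], check=True)

repo = tempfile.mkdtemp()
subprocess.run(["git", "-C", repo, "init", "-q"], check=True)
subprocess.run(["git", "-C", repo, "config", "user.email", "sniffer@example.com"], check=True)
subprocess.run(["git", "-C", repo, "config", "user.name", "sniffer"], check=True)

# In the real tool, the text would come from re-fetching the live article page
# on a schedule; the filenames and article text here are illustrative.
snapshot(repo, "story.txt", "Minister denies report.\n", "first fetch")
snapshot(repo, "story.txt", "Minister confirms report.\n", "later fetch")

# git log -p then shows every revision the story went through
diff = subprocess.run(["git", "-C", repo, "log", "-p", "--", "story.txt"],
                      check=True, capture_output=True, text=True).stdout
print(diff)
```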

Often, though, I found it showed how little time a journalist had to work on
revisions to any one story... the world should lament the death of journalism.

------
brownbat
Really interesting findings on clickbait. I worry about the viability of the
second half, using machine learning to detect bias (which, admittedly, seemed
inconclusive).

> finding bias in headlines is a more subjective exercise than finding it in
> Wikipedia articles...

Probably because bias introduced by new Wikipedia editors is less carefully
disguised. And absent smoking guns, once you get into the gray areas, several
studies have shown that bias is often in the eye of the beholder:

[https://en.wikipedia.org/wiki/Hostile_media_effect](https://en.wikipedia.org/wiki/Hostile_media_effect)

------
cha-cho
I'm glad this work is being done. For kicks, I'd like to see the clickbait
classifier take on /r/savedyouaclick

------
yakult
What I'd really want is an adblock extension that automatically hides
clickbait articles on, say, google news. Or maybe a crowd-sourced thing that
converts titles into nonclickbait titles. People need to think of it as just
another form of spam.

~~~
a_imho
tldr.io might be close to your needs, but IMO one can usually guess pretty
accurately whether a title is intended as clickbait or not (of course it helps
that, even if you are moderate about it, by my guesstimate 90%+ of articles
easily fall into that category).

I'm on the other end of things, though: I would really like a tool which can
generate clickbait titles based on the article body, or even better, generate
the whole fluff piece from some keywords. Something similar to
[https://pdos.csail.mit.edu/archive/scigen/](https://pdos.csail.mit.edu/archive/scigen/),
maybe.

------
noir-york
The BBC has more clickbaity headlines than Politico, Fox, Breitbart? Really?

I think the research may need to take a second look at the algos used. Would
have been interesting to have the Economist in there as well.

