
Headlines - dirtyaura
http://avc.com/2016/12/headlines/
======
jawns
Former headline writer here.

Headlines are not one monolithic thing, any more than publishers are.

There are giant worldwide news organizations, there are personal blog
publishers, and everything in between.

They all have different constraints and different motivations for their
headlines.

Even if you look at just one type of publisher -- a traditional print
newspaper with an online presence -- you are likely to find that they write
multiple headlines for each story, depending on where that headline will be
seen.

For instance, they might write one headline for the print version of the story
that's constrained by page layout requirements.

Then they might write another headline that gets displayed on their homepage.
It has to be short, punchy, and eye-catching.

Then they might write another longer, more complete, SEO-friendly headline
that gets displayed when you actually click on the link to the story.

There might also be an alternate headline that gets displayed as the HTML/meta
title of the page, which can be useful when the link is shared via social
media.

And all of those have their specific purposes and limitations.

Another wrinkle: In the case of news organizations, it's highly likely that
the person who writes the headline is not the person who writes the article,
whereas typically with personal blogs the same person writes both the story
text and the title.

So if inaccurate or sensationalistic headlines are one problem, then another
problem is treating headlines as if they are all the same. They're not.

~~~
misterbowfinger
_...then another problem is treating headlines as if they are all the same.
They're not._

What you indicated was that headlines are written in different circumstances.
But it doesn't mean that they're not all, roughly, similar.

I'm betting there exists a strong correlation between clickbait-y-ness and
pageviews.

So,

1\. Gather all articles with a lot of pageviews

2\. Do some NLP to get the really bad ones, e.g. "Look at what this <noun> did
after <predicate>"

3\. Remove outliers

And bang, you mark all of the sucky articles. That could be included in a
Chrome extension, much like uBlock. Hell, why not just have it in uBlock?
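The three steps above might look something like this rough sketch. It uses a few regex heuristics as a stand-in for real NLP scoring, and the patterns, threshold, and `pageviews` field are all invented for illustration:

```python
import re

# Hypothetical clickbait patterns -- a crude stand-in for real NLP.
CLICKBAIT_PATTERNS = [
    r"^\d+ (things|reasons|ways)\b",
    r"\byou won'?t believe\b",
    r"\bwhat happened next\b",
    r"\bthis (one )?\w+ (did|will)\b",
]

def clickbait_score(headline: str) -> int:
    """Count how many clickbait patterns the headline matches."""
    h = headline.lower()
    return sum(1 for p in CLICKBAIT_PATTERNS if re.search(p, h))

def flag_articles(articles, min_pageviews=10_000):
    """Steps 1-2: keep high-pageview articles, then flag the
    ones whose headlines look clickbait-y."""
    popular = [a for a in articles if a["pageviews"] >= min_pageviews]
    return [a for a in popular if clickbait_score(a["headline"]) > 0]

articles = [
    {"headline": "You won't believe what this dog did", "pageviews": 50_000},
    {"headline": "Fed raises interest rates by 0.25%", "pageviews": 80_000},
]
flagged = flag_articles(articles)  # only the first headline gets flagged
```

A real version would replace the pattern list with a trained classifier and add the outlier-removal step, but the filter-score-flag shape is the same.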

------
jonstokes
Wilson is solving the wrong problem.

Incentives matter, and the underlying problem that gives us clickbait
headlines (and fake news, and the rest of our current journalistic ills) is
that sites are primarily incentivized to maximize the number of people who
show up on the site (so that they load ads and/or can be surveilled) -- hence
the chase for "clicks at any cost".

There are two ways to fix this:

1\. Somehow change adtech's incentives from "I get paid in proportion to how
many clicks I get" to "I get paid in proportion to some other combination of
metrics that's a better proxy for quality content (and quality engagement)
than 'someone clicked on this and my ads loaded.'"

2\. Break the link between advertising and some worthwhile subset of "content
that we want to exist in the world" by coming up with a scheme to entice
readers to fund the content directly.

I think there's a whole universe of untried startup ideas in both areas. I've
been noodling around with an idea for #2 as a side project, and maybe at some
point this spring I'll attempt a "Show HN".

Anyway, my ultimate point is that attacking this problem at the level of the
content itself -- verifying headlines or rating the "fakeness" of news, etc.
-- is the wrong approach, and I think most of the smart people who toss these
ideas out know on some level that the proposed cure may end up being worse
than the disease.

The fundamental problem is the busted incentive structure. You have to find
ways to incentivize the creation of quality content by either rethinking the
relationship between advertising and users, or finding a way to get users to
pay. There is no third option that's market-based and sustainable.

~~~
secstate
To point one: a former co-worker of mine cooked up a great algorithm that
measured how much time a user spent on your site and how far down they
scrolled, then presented that as aggregated data on the different levels of
engagement users displayed.

It was brilliant work, and exposed deep flaws in many customer's traffic data.
But no one cared all that much and it became a buried tool in a much bigger
startup idea.
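A toy sketch of that kind of time-plus-scroll scoring. The weights, the
3-minute dwell cap, and the bucket thresholds are all invented for
illustration, not the co-worker's actual algorithm:

```python
def engagement_score(seconds_on_page: float, scroll_depth: float) -> float:
    """Combine dwell time (capped at 3 minutes) with scroll depth
    (0.0 = top of page, 1.0 = bottom) into a single 0-1 score.
    Weights are hypothetical."""
    time_component = min(seconds_on_page, 180.0) / 180.0
    return round(0.5 * time_component + 0.5 * scroll_depth, 3)

def bucket(score: float) -> str:
    """Aggregate sessions into coarse engagement levels."""
    if score >= 0.66:
        return "engaged"
    if score >= 0.33:
        return "skimmed"
    return "bounced"

# A 5-second visit that never scrolled past the fold counts as a bounce,
# even though it shows up as a "pageview" in ordinary traffic data.
bucket(engagement_score(5, 0.1))    # "bounced"
bucket(engagement_score(200, 0.9))  # "engaged"
```

Aggregating buckets like these per page is exactly the kind of report that makes a lot of "traffic" look worthless, which is the flaw being described.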

~~~
jonstokes
"[it] exposed deep flaws in many customers' traffic data" \-- this is probably
why nobody cared that much and it got buried. See my longish reply below. The
incentives are such that the only metrics publishers and ad agencies have any
love for are the ones that make traffic look more valuable.

I'm sure a truly great way of measuring a user session's real value to the
advertiser has been invented and discarded hundreds of thousands of times over
the past two decades -- invented because it seems needed, and discarded
because it actually worked and holy crap most traffic is garbage.

~~~
davemel37
>a truly great way of measuring a user session's real value to the advertiser
has been invented and discarded hundreds of thousands of times over the past
two decades --

Most advertisers do track and measure actual revenue to them. I think your
mistake is assuming junk traffic and bounced eyeballs don't convert or
contribute to sales...They do. If you have a decent product with decent
margins, each sale can pay for thousands of useless eyeballs.

If it didn't work, advertisers would stop.

------
beat
Validating headlines may not be as good a model as it seems.

First, _what problem are you trying to solve_? In this case, it's "How can I
find good articles even with bad headlines?" So while the approach addresses
headlines, the interest is in the content. So I'm not sure the proposed
solution solves the perceived problem.

Second, _what are the current solutions/workarounds to the problem_? In my
case, at least, the solution is blanket rejection of certain sites. I assume
certain sites are so full of clickbait nonsense and/or partisan propaganda
that I won't read them at all. That probably works better than some software
that will consistently rate The Economist as good and anything from Infowars
as nonsense (or worse, think the nonsense headline and the nonsense content
are simpatico, so it's fine).

Third, _what is the root of the problem_? And the root is largely that people
_like_ their nonsense. People consistently read bad headlines and bad stories,
often preferring them over respectable mainstream news.

And finally, _how do you implement this_? You clearly don't want something
that can be gamed by crowdsourced campaigns, or it _will_ be gamed. So you're
either somehow relying on deep learning automation, or you're relying on human
editorial effort. The former is unreliable, the latter is expensive, and
itself prone to both bias and rejection (consider how many people consider
Snopes to be untrustworthy).

I dunno. Maybe there's a great business or social idea here. But it's going to
take some deeper thinking.

------
dirtyaura
This is a very good idea: "someone, or some company, or some open source
community ought to build software that parses headlines and the stories that
follow and rate them for how well the headline represents the article."
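A minimal sketch of what that rating could look like, using a plain bag-of-words cosine similarity between headline and article text as a crude proxy for "how well the headline represents the article" (a real system would use proper NLP, but the shape is the same):

```python
import math
import re
from collections import Counter

def tokens(text: str) -> Counter:
    """Lowercased bag-of-words counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def headline_fit(headline: str, article: str) -> float:
    """Cosine similarity between the headline's and article's word
    counts, in [0, 1]: 0 means the headline shares no vocabulary
    with the story it sits on top of."""
    h, a = tokens(headline), tokens(article)
    dot = sum(h[w] * a[w] for w in h)
    norm = (math.sqrt(sum(v * v for v in h.values()))
            * math.sqrt(sum(v * v for v in a.values())))
    return dot / norm if norm else 0.0
```

A headline that actually describes its story scores well above one that is pure bait with no overlap, which is the signal such a tool would surface.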

~~~
ams6110
I think the cure would be worse than the disease in this case. Fake news is a
fake problem, or at least not a new one. We've had the National Enquirer and
similar publications in every supermarket checkout for decades.

------
jccalhoun
What I would really like to see is a way that the original source of a story
is promoted or easier to find. Too often I see a headline online and click it
only to see that the entire story is "according to site x..." Then I go to
site X and see that its story is "according to site y..." and so on.

While I know that some subsequent stories do original reporting, too often
sites with better SEO just republish stories without adding much and, whether
intentionally or not, often distort some part of the actual story.

~~~
jawns
I've noticed that Google is starting to experiment with this.

If you search for a news story, below some results you'll see a "Related
stories" box.

Some of those stories have labels like:

* In-Depth: a longer article about the story

* Local Source: an article from a source local to the story

* Highly Cited: the article that appears to be most frequently cited by other articles

* Most Referenced: web content that appears to be linked to from other articles the most frequently

* Preferred source: an article from a source you've marked as a favorite

See:
[https://support.google.com/news/answer/1217612](https://support.google.com/news/answer/1217612)

------
nattaylor
Wilson writes:

    
    
      I also have seen hundreds of stories written about me,
      USV, and our portfolio companies that have sensational
      and often inaccurate headlines followed by stories that
      are essentially correct and well reported. It drives me
      nuts but I don’t often do much about it.
    

Subjectively, this is not what I see. Instead I find that junky headlines go
with junky articles. That would still be an interesting thing to try to
objectively quantify, but different from what the author has observed.

------
vonklaus
Are there any free APIs/RSS feeds for breaking news stories?

Reuters and other publishers have RSS feeds, but they are split across
many categories and also have strict ToU. I have been trying to find news &
event feeds that are free to consume; ideally with a headline and article, but
even a simple blast like "3 trapped in hiking incident in Montana cavern"
would be useful.

Any resources or experiences would be helpful.

------
roryisok
It would be nice if URLs were a living thing: upon loading a page, the
browser would pull live metadata for that URL indicating the title and final
URL. No more broken links, and the original content provider could retain
control of the headline.

It would increase page load time of course.

Actually maybe this is a job for a browser plugin
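The plugin's core job would be extracting that metadata from the page a link resolves to. A minimal sketch using only the standard library's HTML parser (the fetch step and any canonical-URL convention are left as assumptions):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Pull the <title> and <link rel="canonical"> out of a page's
    HTML -- the 'live metadata' a plugin could compare against the
    headline and URL the link was shared with."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.canonical = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def page_metadata(html: str) -> dict:
    """Return the title and canonical URL found in the given HTML."""
    parser = MetaExtractor()
    parser.feed(html)
    return {"title": parser.title.strip(), "canonical": parser.canonical}
```

If the canonical URL or title disagrees with what the link claimed, the plugin could flag or rewrite it, which gets most of the benefit without changing how URLs themselves work.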

