
Inside the Google News algorithm - iuguy
http://blogs.computerworld.com/14861/inside_the_google_news_algorithm
======
anigbrowl
A lot of work has gone into Google News and it was really 'best in class' for a
long while. But progress has gone into reverse over the last year, to the
point that I've been trying to spot which competitor will disrupt it. Of
course, I have no idea what weighting is given to the various coefficients,
but the ones revealed at this talk seem entirely consistent with the cruft I
see on my news page.

 _Volume and originality of content produced consistently about a topic._

This would explain why I see so many blogs, one-note websites, and even press
releases from lobbying groups or those obsessed with a particular topic. It
works OK for broad categories like 'Business' or 'Sport' but allows niche
subjects to be gamed easily.

You know the most annoying one that keeps coming up? I have a search for
'legal' because I'm interested in news about the legal industry, major court
cases and suchlike. 20-30% of stories in my daily news feed are about how
someone arrested for drunk driving had 3-4x the legal limit of alcohol in his
bloodstream. Not anyone famous, it's just an easy story for local newspapers -
'you won't believe how drunk this guy was!' Well yes I would believe it,
because I get this story several times a week.

 _Links around the Web._

Content/policy affiliate networks defeat this easily. I just gave up and
removed my political news feed a few months ago because both left and right
bloggers seem determined to live in an echo chamber of ideological allies, and
quote endlessly from each other to drown out any other perspectives. The same
thing happens with non-ideological topics that have a strong overlap with
large commercial markets; instead of news about health and medicine, I often
get information about new services available under medicare that are thinly
disguised plugs for commercial products aimed at the elderly, such as mobility
scooters. Health & wellness is a favorite topic of content farmers since
readers and searchers of these topics are usually highly motivated to learn as
much as possible.

 _What users do in response to links to that source on Google News._

To the point where sometimes I prefer not to click on a story, but open up an
incognito tab in Chrome and retype the headline to read the story. Earlier
this year, I clicked on two stories within the same week, referring to
lawsuits involving sports injuries. Although I like my local baseball team, I
really don't care for sports in general, and even less for sports writing. I
have Google News configured not to show me any sports stories. For months
after reading the two articles mentioned above, I kept getting stories about
pro football players in my 'top stories' news feed - who was being traded, how
well they played, who they're having affairs with, and what-all else. I'd
select 'fewer stories from ESPN' if it was a sports-centric website, but WTF.
I'm OK with seeing things like who won a major championship or somebody famous
getting arrested, but those sorts of things are relatively rare.

I know I'm not the typical news consumer and frankly I'm not 'where the money
is' from Google's perspective, or that of the media in general. But I
desperately want less automation and better input/configurability.

