
More News Is Being Written By Robots - gphilip
http://singularityhub.com/2014/03/25/more-news-is-being-written-by-robots-than-you-think/
======
jawns
I used to work as a web editor at a daily newspaper. One of my daily
responsibilities was to write and post short blurbs about lottery results and
beach surf conditions. They both were extremely formulaic, so I wrote push-
button scripts that would fetch the data, parse it, and generate stories.
Saved me time, which I used to do other, more important things.

Some people have said, "Why not just display the raw data?"

Well, to save you the trouble of having to analyze the data.

For instance, suppose I've written a Powerball results script that lists the
winning numbers, along with which states had winners.

If I'm just spitting out the raw data, then people might miss the fact that
one of the winners was from our state -- whereas if I'm generating a story,
I'm going to make that the lede.

Similarly, the magic of a company like Narrative Science and its Quill service
is not in taking a ton of data and filling in a bunch of blanks with the
values, Mad Libs style. It's in analyzing the data and figuring out what the
most important parts are, and constructing a story around those findings.

In other words, it's not hard for a bot to write, "The Tigers played the
Wolves yesterday. The Tigers won 1-0. The Tigers' John Johnson scored a home
run."

It's more difficult to write, "The Tigers' Tom Thompson pitched the first no-
hitter of his career yesterday in a 1-0 game against the Wolves. Remarkably,
the Wolves' pitcher, Dobbie Dobson, was moments away from forcing the game
into extra innings with his own no-hitter, when, in the ninth inning with two
outs and two strikes on the board, the Tigers' John Johnson hit a home run."

(I'm not a baseball writer, so I'm probably bungling it, but you get the
point.)
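
The kind of push-button script described above is simple to sketch. This is a toy Python version, with the home state, data, and wording all invented for illustration; the point is the lede-selection step, not the template:

```python
# A minimal sketch of a push-button lottery script: take structured
# Powerball results and generate a blurb, promoting a home-state
# winner to the lede. State code, data, and wording are invented.

HOME_STATE = "PA"

def powerball_story(numbers, winners_by_state):
    """Turn raw Powerball results into a short story, leading with a
    home-state winner if there is one."""
    total = sum(winners_by_state.values())
    nums = ", ".join(str(n) for n in numbers)
    if HOME_STATE in winners_by_state:
        lede = (f"A {HOME_STATE} player is among {total} winners "
                "of last night's Powerball drawing.")
    else:
        lede = f"Last night's Powerball drawing produced {total} winning tickets."
    return f"{lede} The winning numbers were {nums}."

story = powerball_story([3, 17, 24, 38, 52], {"PA": 1, "TX": 2})
```

The analysis lives in that one `if`: the same data produces a different story depending on what the script decides matters.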

~~~
exelius
News in this sense is just catching up to the financial world. Big finance has
had bots that parse news feeds and raw data for YEARS as part of their high
frequency trading (HFT) platforms as a form of arbitrage.

So let's say a major earthquake happens in San Francisco and the Google HQ
falls into a bottomless pit. The trading algorithms will know about the
earthquake within seconds, and they will likely also know about Google's
obliteration from speech-to-text of police scanners. The millisecond these
reports hit the wire, the HFT algorithms would short the hell out of Google's
stock while also placing positive bets on Google's
competitors as well as construction companies and building suppliers. And this
was going on 10 years ago; they're probably light years ahead of this by now.

It's also not very hard to write the second piece given the right machine
learning technology. Feed a decade's worth of human-generated articles into an
algorithm (which could also be paired with marketing metrics like page views,
avg time spent on page, etc.) and the algorithm picks out the types of
statistics it should look up from the databases and how those statistics
should be incorporated into the story. Heck, if you wanted to get really
fancy, you could try to pick the major themes of the game out of a transcript
of the play-by-play or specific emotional moments of the game by measuring the
level of crowd noise or twitter traffic.
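
The crowd-noise/Twitter idea in particular is easy to sketch: flag the minutes where tweet volume spikes well above its running average. A toy Python version, with an invented threshold and invented numbers:

```python
# Toy sketch of finding a game's "emotional moments" from per-minute
# tweet volume: flag any minute whose count jumps well past the running
# mean of the minutes before it. Threshold and data are invented.

def find_spikes(counts, factor=2.0):
    """Return the indices (minutes) where volume exceeds
    factor times the running mean of the preceding minutes."""
    spikes = []
    for i in range(1, len(counts)):
        baseline = sum(counts[:i]) / i
        if counts[i] > factor * baseline:
            spikes.append(i)
    return spikes

# minute-by-minute tweet counts; the jump at minute 4 marks a big play
find_spikes([100, 110, 95, 105, 400, 120])  # -> [4]
```

A real system would want a smarter baseline than a running mean, but the spikes it finds are exactly the moments a story generator should write about.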

A computer will have better information recall, better research skills and
better perspective to write stories that are more appealing to people. Heck,
they could even write a story on the fly designed to appeal specifically to
the individual's preferences. I understand that there's a place for reporters,
but I don't think it's going to be written event-based reporting for much
longer.

------
Pitarou
TL;DR

A Chicago startup called Narrative Science makes scripts that turn raw data
feeds such as seismological data or website analytics into accessible natural
language reports. Some news outlets are already using them to get out
earthquake and Little League baseball reports quickly and cheaply.

The founder, Kristian Hammond, has big ideas. "Pulitzer Prize by 2017." "90%
of news written by bots in 2030."

This TL;DR was written by a human. Just saying.

~~~
pantalaimon
\- In a recent example, an LA Times writer-bot wrote and posted a snippet
about an earthquake three minutes after the event.

\- The LA Times claims they were first to publish anything on the quake, and
outside the USGS, they probably were.

\- The LA Times example isn’t special because it’s the first algorithm to
write a story on a major news site.

\- Indeed, Kristian Hammond, cofounder and CTO of Narrative Science, thinks
some 90% of the news could be written by computers by 2030.

\- The LA Times earthquake story, written by an algorithm created by one of
their staff, included a disclaimer.

This TL;DR was written by a bot.

[http://www.textteaser.com/s/Y4TWuU](http://www.textteaser.com/s/Y4TWuU)

~~~
Pitarou
And the bot doesn't get the details wrong!

I mistakenly wrote "Bill Hammond" instead of "Kristian Hammond" in my TL;DR.
(Sorry Kristian.)

~~~
higherpurpose
But it could still have bias.

~~~
sosborn
Everything is subject to bias.

------
seanccox
"If a writer never had to compose a fifty word earthquake report again—few
would complain. Better to leave the short, dry, purely informational articles
to the bots."

The logical extension of this software is to replace the phrase 'earthquake
report'.

"If a writer never had to compose a fifty word Presidential press brief
again—few would complain."

"If a writer never had to compose a fifty word news leak again—few would
complain."

"If a writer never had to compose a fifty word announcement of a declaration
of war again—few would complain."

Let the machine do the writing to expedite publishing: the accuracy of the
source is simply assumed before the information is even shared publicly,
provided the source submits releases in a formulaic manner.

This removes the agency of humans that might ask complicated questions
(journalists) – so I don't believe it qualifies as news or journalism, at all.
This just helps move the words of the source to the front page of a news
agency.

Why dress up the release as unique content at all? Just print the damn press
release word for word and cite the source.

~~~
EdiX
A lot of modern journalism is already like this: it's a middle man between a
number of sources (news agencies, PR offices, politicians) and the public.
What's worse is that usually the source wants the message spread as far and
wide as possible; that's why journalism has been having so many economic
problems recently.

~~~
seanccox
Agreed. The internet burst the bubble that news agencies possessed editorial
integrity, and that has cost them business. This just helps online media
transition faster in the direction of link bait bullshit. The only people who
benefit from bot composition are the wealthy and powerful. I am glad the LA
Times has disclosed their use of bots, and I hope others will do the same, so
I can stop reading their 'work'.

~~~
dragonwriter
> The internet burst the bubble that news agencies possessed editorial
> integrity, and that has cost them business.

That's a nice explanation and particularly common since the 2000s, but the
problem with it is that the decline in perception of most mainstream news
sources (particularly newspapers) and the resulting decline in the industry
was already widely discussed before the internet, as anything other than
ARPANET, existed, and certainly before it was a major factor in public
perception of the news.

------
mlchild
Most news stories are already being boiled down to < 140 characters by humans.
The next frontier in journalism is an explosion in informed, opinionated
analysis from multiple perspectives.

We're seeing this in sports (Grantland, Deadspin, 538, Baseball Prospectus),
politics (Vox.com/Ezra Klein, 538 again, Politico), and tech (Thompson, Evans,
Gruber, The Wirecutter, Anandtech,
MG/Dixon/Wilson/Suster/Andreessen/Horowitz/etc/yesIleftamillionpeopleout).

Tech leads the way because those on the cutting edge are often interested in
tech, and I believe it's a solid leading indicator of where all journalism is
headed—a 'Cambrian explosion' of thoughtful, analytic, but not purely
objective writing.

~~~
jonnathanson
There are three areas of journalism I see as relatively "safe" from bots for
the foreseeable future:

1) Big-picture Op/Ed (of the kind you're talking about)

2) Investigation (of the deep, involved, I-need-to-know-where-to-start,
I-need-to-get-people-to-talk variety)

3) Narrative nonfiction (profiles, colorful takes on a subject; see: The New
Yorker, Michael Lewis, etc.)

The interesting thing is that each of these domains could be _enhanced_ by the
use of better software. And that's the best way to start looking at bot-fueled
journalism. I'd _love_ a machine-based fact checker for my pieces. I'd love
AI-driven help with compiling and analyzing data. I'd love to outsource the
high-frequency grunt work to a machine, just as it's done in many other
industries. It would give me more time to think. It would let me focus where
my focus matters. Instead of spending 99% of my time chasing raw research, and
1% of my time scrambling to assemble the piece, I could get more data, better,
faster, up front, and thus put deeper thought into the analysis.

On that note: the journalists who thrive in the machine age will be the ones
who understand what the machine is doing. Data-literacy is already a big,
differentiating factor in the market now. Statistical competency will be very
important, and the bar for competency will be positioned higher every few
years. Beautiful logic will trump beautiful fluff.

This will accelerate the bimodal shake-out of journalism that we're already
starting to see. We'll have base, commodified, LCD-pandering content farms on
one end, and deep thought on the other. Quantity shops and quality shops, both
of them improved by the automation of certain functions. Anyone in the middle
ground is in trouble. This will be a good thing for both poles -- provided we
can find a way to make the economics of quality work as well as those of
quantity.

~~~
mlchild
Interesting—is there something small and replicable that a piece of software
could help you do right away?

Also, I lump 1) and 3) together in my mind as not-quite-objective, analytic
content, but I admit that was poorly expressed.

~~~
jonnathanson
_" is there something small and replicable that a piece of software could help
you do right away?"_

Fact checking is a huge problem. I'm not sure I'd call it a hair-on-fire
problem. But it's certainly pissed me off on any number of occasions.
Typically, what happens is that I'll write a story mildly critical of BigCo.
The PR department at BigCo will catch wind of it, then email me and my editor
with as many small "corrections" as they can muster, in an attempt to take
down the piece.

When I say corrections, I don't mean broad facts. I mean barely perceptible
technicalities. For instance, I might report someone's title as
the "VP of Strategy," because that's what's listed on this company's website
and on the person's LinkedIn page. But the company's PR group will come at me,
screaming bloody murder, because the guy's title is actually "VP of Strategic
Initiatives." They'll do this because they're unhappy with the broader
criticism I'm leveling against their business -- so they want to pick away at
as many nits as possible. It's a guerilla harassment tactic.

Now, you might well ask: aren't I just being sloppy? How hard can it be to
fact check everything? The answer is: it's damned near impossible to catch
everything. Some of this information is conflicting across multiple sources.
Some of it's out of date or inaccurate. And some of it's not even publicly
available, and you wouldn't _know_ you're slightly off until someone corrects
you. (Case in point: the "Strategic Initiatives" example, a nonsensically
titled internal promotion that was never made public.) Add to this the immense
pressure of weekly deadlines, and factor in how much time I need to spend
ideating, researching, writing, and editing in the first place, and you see
how little time I have to get every last thing 100% accurate. I'd love to be
100% accurate 100% of the time, but given my constraints, that's rarely
feasible. It's like asking an engineer to write flawless, bug-free code at a
hackathon.

I'd _love_ a piece of software I could run my stories through, which would
analyze my piece against a public data set (Google, for instance) to check for
factual accuracy in all these little places. Or at least a piece of software
that would throw up red flags on unverified statements, for which multiple,
conflicting data exists on the web. Sort of like the little, red underlines on
automatic spell check.

Perhaps that's not a big enough use case or TAM. But broaden it out. I'd love
a better way to compile Google research about any given query. This is why I'm
rooting for the stuff Wolfram is working on, though I'd love a more modifiable
and specialized front end for professional users.
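
As a thought experiment, the "red underline for facts" idea could start very small: pull one kind of checkable claim (say, job titles) out of a draft with a pattern, and flag any that conflict with a reference source. In this Python sketch the names, titles, and the reference dict are all made up, with a hard-coded dict standing in for a real public data set:

```python
# Toy sketch of a "red underline for facts" pass: pull job-title claims
# out of a draft with a regex and flag any that disagree with reference
# data. The dict stands in for a public data set; names and titles are
# invented.
import re

REFERENCE_TITLES = {"Jane Doe": "VP of Strategic Initiatives"}

def flag_title_claims(text):
    """Return (name, claimed, expected) tuples for title claims that
    conflict with the reference data."""
    flags = []
    for claimed, name in re.findall(r"(VP of [\w ]+?), (\w+ \w+)", text):
        expected = REFERENCE_TITLES.get(name)
        if expected and expected != claimed:
            flags.append((name, claimed, expected))
    return flags

flag_title_claims("the VP of Strategy, Jane Doe, said the move was planned")
# -> [('Jane Doe', 'VP of Strategy', 'VP of Strategic Initiatives')]
```

The hard part, of course, is the reference data itself, which is exactly the conflicting/out-of-date problem described above; the flagging step is the easy bit.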

~~~
mlchild
Really interesting. Complicated, but really interesting.

~~~
jonnathanson
Yeah, by no means an easy problem to solve. But I think better research tools,
in general, are an interesting area. Search engines are beautiful things, but
their output is limited by your input. They can't necessarily help you with
the things you didn't know to look for in the first place. In a way, a search
process is a very deterministic, almost teleological one. I believe there is
room in the market for a more focused, yet more open-ended research
methodology.

------
davidw
Indeed, it turns out that HN's own tptacek is a particularly clever Perl
script.

~~~
noonespecial
Heh, I only get my news from ruby scripts. Perl written news can be a bit
obtuse and hard to parse as its umm... regularly expressed.

~~~
imikushin
Beware of Scala bots, too. They type way too much :D

------
shakethemonkey
Thousands of Wikipedia articles about US cities were originally written by a
bot years ago. Most still retain this content, or traces of it, generated
from US Census data.

~~~
Semaphor
I found this interesting, so here are some links:

[http://en.wikipedia.org/wiki/User:Rambot](http://en.wikipedia.org/wiki/User:Rambot)

[http://en.wikipedia.org/wiki/Wikipedia:History_of_Wikipedia_...](http://en.wikipedia.org/wiki/Wikipedia:History_of_Wikipedia_bots)

------
motters
Also see [http://churnalism.com/](http://churnalism.com/)

More news than you might think is really just crudely rehashed or verbatim
press releases from particular companies.

~~~
Qworg
Out of all the things I learned working in marketing, this was one of the most
shocking.

Many times, cool news about a particular company or a new product was
written, verbatim, by the company itself.

~~~
ryandrake
Really, you found it shocking? I thought this concept was pretty well
understood by anyone who's seen an infomercial.

~~~
motters
I found it shocking when I first learned about it. Previously I had thought
that articles featuring in the news were actual journalism, not just copy and
pasted stuff from company press releases. It changed my view of what the news
is.

------
ptha
"With the help of Chicago startup and robot writing firm, Narrative Science,
algorithms have basically been passing the Turing test online for the last few
years." If an article written by a bot is indistinguishable from one written
by a human author, that doesn't mean it passes the Turing test, which is
typically based on conversation. One of the claims is that the bots produce
typo-free articles; funnily enough, some programs were able to fool judges of
the Turing test by imitating human misspellings:
[http://en.wikipedia.org/wiki/Turing_test#Loebner_Prize](http://en.wikipedia.org/wiki/Turing_test#Loebner_Prize)

------
gphilip
In a recent example, an LA Times writer-bot wrote and posted a snippet about
an earthquake three minutes after the event. The LA Times claims they were
first to publish anything on the quake, and outside the USGS, they probably
were.

------
asperous
I've always thought news could be boiled down to a few important facts, and I
wonder if having all that extra copy is really important in this day and age.

Why aren't news digest sites a thing?

~~~
perlgeek
Taking this one step further: Why aren't "factual" news sites a thing, where
news is computer-readable (think JSON), and you can filter for the
information you want?

Not all news can be made computer-readable easily, but there are several
classes of news that can be, or at least can have rich, computer-readable
metadata (sports results, software releases, new music albums, movies,
upcoming events, announcements of road maintenance, ...).

Update: Is anybody interested in working on such a system with me? If yes,
drop me an email (moritz@faui2k3.org)!
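
A minimal sketch of what such a feed might look like; the schema and every item here are invented, but the point is that each story is a typed record a client can filter on:

```python
# Sketch of a "factual", machine-readable news feed: each item is a
# typed JSON record, so a subscriber can filter for exactly the kinds
# of news they want. Schema and items are invented for illustration.
import json

feed = json.loads("""[
  {"type": "sports_result", "league": "MLB", "home": "Tigers",
   "away": "Wolves", "score": [1, 0]},
  {"type": "software_release", "project": "ExampleDB", "version": "2.1.0"},
  {"type": "road_maintenance", "road": "I-80", "status": "closed"}
]""")

def filter_news(items, wanted_type):
    """Keep only the items of the type the subscriber cares about."""
    return [item for item in items if item["type"] == wanted_type]

releases = filter_news(feed, "software_release")
```

Rendering any of these records back into a readable sentence is then exactly the Narrative Science problem discussed in the article, run in the other direction.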

~~~
herokusaki
Great idea! I'm not sure something good would come out of it, but another
type of news that follows a predictable pattern and could be made
machine-readable is celebrity gossip.

------
nyrina
It would've been a really cool mindfuck if the article had ended with
something along the lines of "And this article was written by a bot."

~~~
NKCSS
You beat me to it :)

------
asgard1024
This is terrible. What a waste of effort! We have a machine that translates a
simple fact into a few paragraphs of text, so that thousands of people can
spend time trying to do the exact reverse when they read it.

Why don't they just publish the raw facts in computer-readable form?

~~~
christiansmith
That might work well for someone with domain expertise. But not everyone is
prepared to look at raw data and instantly extrapolate meaning. Taken to an
absurd extreme, do you also think that machine translation between languages
is also a terrible waste of effort? Why not just learn to read kanji fluently
and skip the extra step?

~~~
asgard1024
If the computer adds some additional analysis to the data, autonomously, then
it would be fair for the reader to understand the nature of that analysis.
With humans, we kind of live with it because we know they have common sense,
so we kind of know what to expect (or maybe, humans understand what our needs
are). With automated translation or adding titles, we understand the process.
But in automated writing? I worry a lot of effort will be wasted trying to
figure this out.

------
swalsh
"Indeed, Kristian Hammond, cofounder and CTO of Narrative Science, thinks some
90% of the news could be written by computers by 2030."

I think there's more to a reporter's job than writing; in fact, I'd say the
writing part is maybe the smallest aspect when it comes to the most
interesting stories.

Investigation, interviews, and general "experiencing the world" are required,
and computers aren't ready to do that yet. I'd be surprised if they are by
2030, either.

Of course, it could be a sign of how far the quality of news has fallen that
90% of it is now formulaic. I guess I had hoped that these new sites that are
popping up would become a trend toward quality. Artisanal news, as the
hipsters might say.

~~~
StavrosK
He didn't say "90% of the job of reporters could be done by computers", he
said computers will do the writing.

~~~
hpriebe
True. And I wonder how computers/bots might assist journalists in the other
aspects of journalism: investigation, interviews, trend spotting, and so on.

I'm not sure I hope for the day that computers replace the human analysis,
emotion, or opinion in news, but I do believe they could help journalists
better spot leads and dig into stories. This would free journalists up to do
less leg work and more thinking about the content in front of them.

That said, the process of investigation, etc. does contribute to the
experience a journalist brings to the table, so the question then is: does
cutting out the legwork undermine a journalist's ability to deeply understand
and analyze a given story?

------
atmosx
If the _news_ turned into reports that way, I think it would be better than
_opinionated_ news. It's okay to get opinionated news, as long as you _know_
the frame and background of the speaker/paper/channel/etc.

~~~
icebraining
All news is opinionated. Even the driest report has an implicit opinion on
what facts were considered relevant to the audience and which ones should be
left out. If in this case the algorithm is just turning data into English,
they're just leaving the opinionating to whoever wrote the data report.

~~~
pjc50
.. and the choice to present the news in English is itself an opinion.

(Check out some French, German or Spanish news sources and see how different
the world is sometime)

------
underyx
>Pulitzer Prize by 2017

Isn't that a bit too ambitious? I admittedly don't know a lot about the
Pulitzer Prize, but I'd assume the main qualities they're looking for in a
journalist are not 'fast' and 'factually correct'. The first one doesn't
really relate to the quality of journalism, and the second is just too basic
a requirement.

What I'd assume is worthy of a prize in journalism is the data collection and
investigation process before even starting to write the article. This would
include very complex tasks, most involving social interaction, which I just
can't see a computer outperforming a human at by 2017.

------
camus2
It doesn't really matter; journalism is already dead. Free news means
sponsored news, and therefore it's not news, it's marketing.

The future of news is raw/unredacted/untransformed data with traceable origin
that people can analyse themselves and decide whether or not to trust.
Exactly like the Snowden leaks. We don't need "journalists" to source/filter
the data.

People then will be able to mashup the data with apps designed for that.

------
calebclark
I was half expecting the final sentence to announce that the writer of the
article itself was actually a robot named Jason Dorrier.

------
quasque
That's nice, but I found the story behind the leading image more interesting:
[http://www.robotlab.de/bios/bible.htm](http://www.robotlab.de/bios/bible.htm)
(robotic calligraphy)

------
shittyanalogy
Next step: individuals have personal news-writing software that scans sources
of interest and generates human-digestible stories, completely bypassing news
agencies altogether.

------
marincounty
Sometimes I wonder if the average dude, or Karen the cool Koder, even
notices. I have read the TL;DR too many times. "I want my golden egg now,
daddy!"

------
ExpendableGuy
Reminds me of this:
[http://youtu.be/IC3W1BiUjp0](http://youtu.be/IC3W1BiUjp0).

------
cylinder
When did software and scripts become known as "robots?"

------
ape4
Why did it take the bot 3 minutes?

~~~
perlgeek
There was manual approval involved; I guess that's what took 3 minutes (or
maybe 15 seconds for polling the data, 5 seconds for actually composing the
article, and 2m 40s waiting for approval).

