
Can an Algorithm Write a Better News Story Than a Human Reporter?  - J3L2404
http://m.wired.com/gadgetlab/2012/04/can-an-algorithm-write-a-better-news-story-than-a-human-reporter/all/1
======
knowtheory
Instead of trolling from Wired about Narrative Science, how about we see how
an _actual_ reporter uses tools to automate his job?

Here's Ben Welsh of the L.A. Times talking about "Human Assisted Reporting"
(which itself is a play on "Computer Assisted Reporting"):
<http://www.youtube.com/watch?v=iP-On8PzEy8>

Oh, and here's the slide deck:
[https://docs.google.com/a/knowtheory.net/presentation/embed?...](https://docs.google.com/a/knowtheory.net/presentation/embed?id=1sW3iLUDXs7NTdo7EUUxD7X3vnIWYo7G8orMf5FT05Zk&start=false&loop=false&delayms=3000#slide=id.p)

~~~
corin_
I found Ben Welsh himself to be interesting, and his opinions, but the overall
premise was talking about using computers for simple tasks that frankly I
assumed they had all been doing for a long time, stuff like "now we can run a
search to see if somebody has been arrested before!"

~~~
knowtheory
Well, you have to remember two things.

First, publishing (whether it's papers, broadcast, whatever) is a big
lumbering business. There are entrenched interests, people who fear or loathe
technology, egos, and all of the other perils which modulate large scale human
endeavors.

There are people within the news world who have been working on computer
assisted reporting, doing all sorts of cool stuff for decades now. What
_hasn't_ happened is a uniform and widescale adoption of all of these
practices, because many of the workflows involved in the business are tied to
their legacy business, and those legacy components are really difficult to
shed for a pile of reasons.

Ben may be talking about this stuff as a good idea, but it's not that nobody's
been doing this stuff... it's that stuff like this isn't standard practice
across the industry (yet).

Also, please remember that Ben is talking about _automating_ these processes.
Reporters have been filing FOIA requests manually and getting info from the
government essentially for as long as there has been a government and FOIA
laws. That's where part of the tepidness about technology comes from. The
government isn't particularly friendly to efforts to streamline processes
that get more information to reporters.

~~~
ceejayoz
A third thing to remember: the folks you're getting information from
(governments) are often reluctant to provide that information, and there's no
legal requirement for it to be in a particularly useful format.

It'll often be a CD of some 1970s mainframe database format with no
information on what format it is, how to open it, etc. They'll do a little
happy dance if you just give up and move on to another story.

~~~
knowtheory
My employer still has two DAT readers which we use to process records from the
Department of Labor.

I had never seen one in action before.

------
aqme28
In this economy, software developers are in extremely high demand and
journalists are a dime a dozen. What's the value in building software to do
the job of journalists?

That said, it's a really neat idea and I'm curious to see this technology
mature.

~~~
vibrunazo
It's much more fun to code a robot than to hire a journalist :-)

But also, once you get to massive robot production, the number of
journalists will seem small. Think of content personalized to each user. Real
time description of things around you in your software. In our startup we plan
to have a robot write the plot for user generated games based on their custom
settings. Hiring a writer for each use would certainly increase our costs.

~~~
knowtheory
Explain to me why this isn't techno-utopian claptrap?

The thing that frustrates me about the way that Narrative Science is discussed
and some of the unhinged techno-futurism stuff that people write, is that it's
never about making _people_ or _individuals_ more effective.

Automated processes and people have different strengths. Computers really
aren't that great at pretending that they're people, and people are bad at
accuracy on repetitive tasks.

That people keep on talking about "replacing" so-and-so misses the point.
Change their market, change their workflows, make them better at accomplishing
the function that journalists (or whoever) play in society.

Endeavoring to make them obsolete is both unnecessarily inflammatory and also
not terribly sensible.

~~~
vibrunazo
We don't want to replace them as the end goal. We simply have repetitive tasks
that could be automated. Whether that automation will, in fact, replace
journalists or assist them is another problem that depends on your specific
implementation details and economics.

For our company's specific use, hiring a writer to be assisted by our robots
would just add unnecessary cost to the process and delay to the end-user
experience, with little to no gain in quality. We only need a
simple transformation from structured data to narrated English, and we want
lots of it to be delivered instantly to our users.

It just makes more sense for applications like ours to have robots replace
humans.

------
OmegaKnot
Good journalists write stories that are not solely based on data but also on
experience and observations. I personally hate it when a game summary of a
sporting event is just the box score in sentence structure. Yes, rote
descriptions of data are replaceable by software, but is it even worth
replacing? Maybe it's just because I'm a data-oriented person, but I would
rather look at a table of data or a graph than read these types of stories.

I want to read details that can only come from being there in person. I can
look at a box score to see Lebron James scored 27 points or missed a free
throw at the end of a game. A journalist's job is to tell me that he appeared
distracted by the crowd loudly booing a controversial call or that Andrew
Bynum pouted on the bench after getting into a heated argument with his coach
during a timeout. Those kinds of observations are what make a story come
alive and it will be a long time before software can replace this human
element.

~~~
vibrunazo
Robots have much better memory and access to what you call "experience" than a
human.

Robots only do what they're told and are only as good and creative as the
coder. If you can write an emotional, heated narrative, then you can code a
robot that does the same. There's nothing intrinsic to programming that makes
it impossible for good writers to code a robot that mimics their style.

~~~
dansingerman
I think you are missing the grandparent's point. It is easy(-ish) to get some
structured data and turn it into sentence format.

Unless you have structured data that represents all human experience someone
present at an event could muster (smell, sound, emotional feeling...) I don't
see how a robot could replace it.
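To illustrate the "easy(-ish)" part: a minimal template-based sketch of turning structured data into a sentence. This is not Narrative Science's method; the `GameResult` fields, the `recap` function, the verb thresholds, and the sample teams are all hypothetical. Note that the output can only ever restate what is in the data, which is exactly the limitation being described.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    # Hypothetical box-score fields; a real feed would have many more.
    home: str
    away: str
    home_score: int
    away_score: int
    top_scorer: str
    top_points: int


def margin_verb(margin: int) -> str:
    # Pick a verb from how lopsided the game was (thresholds are arbitrary).
    if margin >= 20:
        return "routed"
    if margin >= 10:
        return "beat"
    return "edged"


def recap(g: GameResult) -> str:
    # Assumes no ties (as in basketball); on a tie the away team is "winner".
    if g.home_score > g.away_score:
        winner, loser, w, l = g.home, g.away, g.home_score, g.away_score
    else:
        winner, loser, w, l = g.away, g.home, g.away_score, g.home_score
    verb = margin_verb(w - l)
    return (f"{winner} {verb} {loser} {w}-{l}. "
            f"{g.top_scorer} led all scorers with {g.top_points} points.")


if __name__ == "__main__":
    # Hypothetical sample data.
    print(recap(GameResult("Springfield", "Shelbyville", 102, 95, "J. Smith", 31)))
```

Everything the commenters above say a reporter adds (the booing crowd, the pouting on the bench) has no field in `GameResult`, so the template cannot produce it.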

------
anamax
See the talk by Kris Hammond (founder of Narrative Science) at Stanford's
EE380 on January 18. See [http://www.stanford.edu/class/ee380/winter-schedule-20112012...](http://www.stanford.edu/class/ee380/winter-schedule-20112012.html).

------
baconner
Seems to me that the current state of the art here is getting a computer to
write a couple of paragraphs to describe a data set. While that's a neat trick
it's not a very valuable one in this context.

Congratulations you've replaced the absolute bottom of the barrel of
reporting, that which could and probably should be replaced with a table or
two.

Until the software actually attends the event somehow to gather the
information and add additional context, it's just a clever trick. If you want to
save money on this kind of reporting, I'd probably stick to outputting some nice
tables of data in the sports section.

------
redwood
How about the opposite: I assume high-frequency traders use algos to interpret
whether news stories are good or bad for stocks and trade accordingly.

~~~
scarmig
People do use news stories, blog posts, Twitter, pretty much anything to
algorithmically generate profits. But that's not really the domain of HFT.

~~~
nitid_name
Are you certain about that?

I thought the whole point of HFT was to use small shreds of information to
play the liquidity gap between sell and buy orders.

~~~
scarmig
Someone should confirm this, but my understanding is that HFT typically
monitors events on the stocks themselves to drive actions. I also believe that
HFT is something that happens with response times on the scale of
microseconds, which obviously scraping CNN.com for content isn't suited for.

------
cantankerous
I can't say with any certainty how much you could automate the job of a
featured journalist or editorialist for a big journalistic institution like
the Economist or maybe the Times, but given the current quality of network
and cable news reports... automating the "what's up right now" news
is probably already doable. That's not necessarily saying a lot about the
algorithm, though.

------
DanBC
Five years until a computer wins a Pulitzer Prize is ridiculously
optimistic.

