
In Case You Wondered, a Real Human Wrote This Column - l_adams
http://www.nytimes.com/2011/09/11/business/computer-generated-articles-are-gaining-traction.html
======
jellicle
What this article doesn't tell you is that the human who wrote it works for
Narrative Science.

For those who don't know, if you see a story in your local paper, and it
doesn't involve a car crash, crime, weather, or sports, it was probably placed
there by a PR representative. Most of the things you read are not the result
of random reporters deciding to cover X or Y, but a paid, concerted effort to
place story X or Y in the paper by providing the paper with a fully
pre-digested story to perhaps rewrite, or perhaps not.

The words "narrative science" appear 14 times in that story, including such
clunkers as "To generate story “angles,” explains Mr. Hammond of Narrative
Science...." when Mr. Hammond has already been introduced earlier in the
story. It even includes pricing: hey readers, this is not only cool and will
win the Pulitzer Prize, but it's cheap too! No mention of competitors... It
reads like an ad because it is an ad.

This story was provided, probably almost word for word, by a PR person to the
NYT reporter.

I'm not sure if computer-generated text will be better or worse than the media
system we have now.

~~~
dimitar
This is true. I've been told by PR people that 80% of the content in a
newspaper or a magazine is written by PR agencies.

I have a question, though. Why don't they also create an expert system that
writes PR pieces? I imagine it would be a really nice tool for people in the
business. You write down the variables (who, what, when) from an interview with
a client in a form, and you get to show him an instant first draft; you correct
it together, and you publish it after leaving the room.

Or, better yet, take that data and use advanced algorithms to embed
mentions of clients, products and initiatives in bigger articles.

This is how a PR piece usually goes:

1\. Case, problem, solution. Short story here.

2\. Introduction of company

3\. A Question and an Answer

4\. Another look at the company, credentials.

5\. This is really cool; think about it. There may be a sound (text) bite here.

Do you see the patterns?
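The five-part structure above is regular enough that the "instant first draft" idea can be sketched as filling a fixed outline from interview fields. This is a minimal Python sketch, not any real PR tool: the field names (`client`, `problem`, `solution`, and so on) and the Acme Widgets data are made up for illustration.

```python
from string import Template

# Hypothetical outline following the five steps listed above; field names are illustrative.
PR_OUTLINE = [
    Template("$client faced $problem. Their answer: $solution."),  # 1. case, problem, solution
    Template("$client is $description."),                          # 2. introduction of company
    Template('Asked $question, $spokesperson said: "$answer"'),    # 3. a question and an answer
    Template("$client has $credentials."),                         # 4. credentials
    Template('"$soundbite"'),                                      # 5. sound (text) bite
]


def first_draft(fields: dict[str, str]) -> str:
    """Fill each outline step from the interview form; a missing field raises KeyError."""
    return "\n\n".join(step.substitute(fields) for step in PR_OUTLINE)


# Made-up interview answers, typed into the form during the meeting.
draft = first_draft({
    "client": "Acme Widgets",
    "problem": "rising shipping costs",
    "solution": "a regional warehouse network",
    "description": "a mid-sized manufacturer",
    "question": "why now",
    "spokesperson": "CEO Jane Doe",
    "answer": "Our customers asked for faster delivery.",
    "credentials": "served retailers for two decades",
    "soundbite": "This changes how we ship.",
})
print(draft)
```

Some answers will read awkwardly once dropped into a fixed sentence, which is where the "correct it together" step comes in.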

~~~
rdouble
_Why don't they also create an expert system that writes PR pieces?_

It's cheaper and easier to just use the endless supply of interns.

~~~
dimitar
Unpaid interns? Ouch. I didn't think of that.

Although there are hidden costs even behind unpaid labor. Office space, HR
costs (hiring isn't easy), training, etc, etc.

~~~
rdouble
It's not hard to hire people to write PR stuff. They're just unemployed English
majors, of whom there are legions. There are no associated costs, as the
positions have no benefits and everyone works from home.

~~~
bergie
...and the interns get to say "I've written for the New York Times" the next
time they interview for a job. Everybody wins, as The Economist argued:
<http://www.economist.com/node/21528449>

------
6ren
I love seeing these examples of product development: begin with a very
specific niche at the edge (not tackling the mainstream head-on) and "target
non-consumption". That way you have no competition, and it's not a zero-sum
game where you beat someone; you're creating value that never existed before.
This is possible not because it's good, but because it's _cheap_ (and good
enough):

> primarily a low-cost tool ... for local youth sports .... and financial
> results of local public companies ... _“Mostly, we’re doing things that are
> not being done otherwise,”_

Then, once you have some customers - _any_ customers! - you improve it, bit by
bit. It doesn't need to be perfect in the first place; it doesn't need to be
perfect in the end. It just needs to be good enough to be useful.

> [customer] worked with Narrative Science for months to fine-tune the
> software

As for the technology itself, we're not told anything of its details, just
what it can do. This is a marketing article, not a tech report. It would be
interesting to see the models they use for stories, and whether they use
grammars for the overall structure. These are very narrow domains, which are
the easiest to start with: you could enumerate all the standard cliches,
understand when they apply, and tweak the model. That's where the journalism
domain expertise of the two founders would come in handy. BTW: "easiest" is
only relative. It would still be very difficult (almost impossible), so kudos
to these guys for actually doing it and, even better, making an actual
business out of it.
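To make the "grammars for the overall structure" idea concrete, here is a toy context-free grammar in Python: each nonterminal in angle brackets expands to one randomly chosen production, and `{slots}` are filled from the game data afterwards. The grammar, the phrasings and the game data are all hypothetical; nothing in the article says how Narrative Science actually builds its stories.

```python
import random

# Hypothetical toy grammar: keys are nonterminals, each maps to alternative productions.
# Any string that is not a key is a terminal; {slots} are filled in after expansion.
GRAMMAR = {
    "<story>": [["<lede>", " ", "<detail>"]],
    "<lede>": [
        ["{winner} beat {loser} {w_score}-{l_score}."],
        ["{winner} got past {loser}, {w_score}-{l_score}."],
    ],
    "<detail>": [
        ["{star} led the way with {points} points."],
        ["{star} finished with {points} points to pace {winner}."],
    ],
}


def expand(symbol: str, rng: random.Random) -> str:
    """Recursively expand a symbol, picking one production per nonterminal."""
    if symbol not in GRAMMAR:
        return symbol  # terminal text, returned unchanged
    production = rng.choice(GRAMMAR[symbol])
    return "".join(expand(part, rng) for part in production)


def write_story(data: dict, seed: int = 0) -> str:
    """Expand <story> with a seeded RNG (so output is reproducible), then fill the slots."""
    return expand("<story>", random.Random(seed)).format(**data)


game = {"winner": "Tigers", "loser": "Bears", "w_score": 54, "l_score": 41,
        "star": "Sam Lee", "points": 22}
print(write_story(game, seed=1))
```

Enumerating the clichés then amounts to adding productions; "understanding when they apply" would mean choosing among them based on the data rather than at random.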

It reads like a 1950s Asimov story: the future is finally arriving.

But a Pulitzer in 5 years is absurd: either cynical puff or visionary bravado.
Theoretically possible, I think, maybe in 50 years, the figure I've long
given for strong AI. ;-)

~~~
tomahhy
Curious, what do you mean by "target non-consumption"?

~~~
6ren
It means target people with needs that aren't being met. Those people are not
consuming (i.e. buying and using) a product or service to address their
problem.

It's clearest to see when a product exists to solve a problem, but it's too
expensive for some people (or some situation). In the article, the problem of
reporting on local sports/financials has a solution (reporters), but the value
of that news isn't worth their time: their time is too expensive. So the
newspaper doesn't "consume" a solution to the problem of reporting that
particular news.

By targeting this non-consumption, the startup doesn't compete against
reporters (yet...), so it provokes no desperate fight for survival.

It's a term from Clayton Christensen, who wrote The Innovator's Dilemma,
though he doesn't use it until "The Innovator's Solution", and expands on it in
"Seeing What's Next".

------
talbina
Did they really write an entire two-page article while ignoring what is, in my
opinion, the real leader in this space? <http://statsheet.com/>

~~~
hyperbovine
A computer never would have missed that.

------
levy
My worry here is that computers will learn to write articles specific to every
individual. The computer will know what other articles we liked and what we
didn't like, and just try to write what we want to read. This will make it
even less likely we'll hear a view opposed to our own, if the computers are
giving us what we want to read.

~~~
zalthor
We don't need computers to do this; people do it already. Most "news" channels
spew out such a monotonous train of thought that it makes me wonder whether
switching to a news channel these days is even worth it. For now, I am able to
get my news and information online from a variety of sources that offer me
interesting views on both sides. Even if one day all of this content is
produced by machines, I think it's the individual's responsibility to make sure
that he/she gets information from both sides.

------
jgilliam
If a computer _had_ written this article, maybe it would have mentioned how
useful this technology is for spammers.

~~~
alexandros
I've said it before: sufficiently advanced spam is indistinguishable from
content. If this comes to pass, I'm not sure who will have won the spam wars,
but I'm inclined to say that the low-end content authors will have lost it.

~~~
LiveTheDream
Relevant: <http://xkcd.com/810/>

------
thalecress
I'm skeptical of the claim that a program could win a Pulitzer. How does it
decide what to write about, who to interview, and what questions to ask?

Reporting a day at the races or the markets is easy because we know which
kinds of data are relevant and we have them available.

~~~
dr_
It's not inconceivable that, as AI advances, at some point there will be
algorithms that figure out the questions you have posed.

~~~
arethuza
True, it's not inconceivable, but arguably having a system that is an
effective investigative journalist is AI-Complete:

<http://en.wikipedia.org/wiki/AI-complete>

~~~
nl
This is true, but like most things to do with AI, it has limitations.

In some fields (e.g., finance) one can conceive of a computer-based process that
would do a _better_ job than most investigative journalists. It can't deal
with missing data, but it can discover inaccuracies, unusual events and
suspicious patterns, and in some limited fields this is enough.

For example, an AI-based process might have been just as good at finding the
problems at Enron as conventional journalists were (since the problems
there were mostly uncovered by forensic accounting on their public balance
sheets):

 _But hard information was scarce. "It's almost as if you have to use forensic
accountants when you're doing a company story because many companies are using
very aggressive accounting techniques that are perfectly legal," Shepard
says._

[http://www.washingtonpost.com/wp-dyn/articles/A64769-2002Jan...](http://www.washingtonpost.com/wp-dyn/articles/A64769-2002Jan17.html)
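The kind of anomaly spotting described above can be sketched, very crudely, as flagging values that sit far from the rest of a series. This Python sketch uses a plain z-score over made-up quarterly ratios; the threshold of 2 standard deviations is an arbitrary assumption, and real forensic accounting is far more involved than this.

```python
from statistics import mean, stdev


def flag_unusual(values: list[float], threshold: float = 2.0) -> list[int]:
    """Return indices of values more than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up ratio reported over eight quarters; the last one jumps.
ratios = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 5.0]
print(flag_unusual(ratios))  # → [7]
```

A single large outlier also inflates the standard deviation it is measured against, which is one reason robust measures such as the median absolute deviation are often preferred.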

------
TorKlingberg
I wonder if these automatically generated articles will ever become good
enough to be worth reading. Currently, they seem to be just good enough to
fool Google, and convince people to click the link. Do any sports fans
bookmark and come back to these sites?

No matter how good the algorithms get, they are still limited by their input,
the statistics. If, for example, a player scores a very unusual goal, say a
bicycle kick in soccer, then a real writer who actually saw the match would
surely mention it. An algorithm could not, if there is no field for "unusual
goal" in the match statistics.
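The limitation shows up in even the simplest generator, which can only verbalize the fields its author anticipated. In this Python sketch the schema and team data are hypothetical; a `bicycle_kick_goals` entry is passed in to show that a detail with no matching sentence never reaches the text.

```python
# Hypothetical schema: the only statistics this generator knows how to put into words.
SENTENCES = {
    "goals": "{team} scored {value} goals.",
    "shots": "{team} had {value} shots.",
    "possession": "{team} had {value}% of possession.",
}


def recap(team: str, stats: dict[str, int]) -> list[str]:
    """Emit one sentence per recognized statistic; unrecognized keys are silently dropped."""
    return [SENTENCES[key].format(team=team, value=value)
            for key, value in stats.items() if key in SENTENCES]


lines = recap("United", {"goals": 2, "shots": 11, "bicycle_kick_goals": 1})
print(" ".join(lines))  # → United scored 2 goals. United had 11 shots.
```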

~~~
cageface
I think I'd rather see more innovative ways of representing the history of a
sporting event graphically instead of trying to replicate the relatively inane
blow-by-blow that you currently get from real sports journalists.

Maybe, though, if this kind of thing continues to improve, the humans will have
to start doing more serious analysis instead of fluff coverage.

------
jawns
Here's a description of my venture into this territory, in which I generated
formulaic lottery result briefs:

"I wrote this article with one mouse click"

[http://coding.pressbin.com/60/I-wrote-this-article-with-one-...](http://coding.pressbin.com/60/I-wrote-this-article-with-one-mouse-click)

I can't imagine the sort of code base that would be needed to make these
stories not seem formulaic.
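For a sense of what "formulaic" means in code, here is an independent Python sketch of a lottery brief (not the code from the linked post). The game name, date, numbers and jackpot are made up; the only logic is sorting and joining the numbers and choosing between two outcome sentences.

```python
from datetime import date


def join_numbers(nums: list[int]) -> str:
    """Format numbers in ascending order as '4, 8, 15, 23 and 42'."""
    parts = [str(n) for n in sorted(nums)]
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def lottery_brief(game: str, draw_date: date, numbers: list[int],
                  jackpot: int, winners: int) -> str:
    """Build a two-sentence brief: the draw, then either the winners or a rollover."""
    if winners:
        outcome = f"{winners} ticket{'s' if winners != 1 else ''} matched all numbers"
    else:
        outcome = f"No one matched all numbers, so the ${jackpot:,} jackpot rolls over"
    return (f"The winning {game} numbers drawn on {draw_date:%B} {draw_date.day} were "
            f"{join_numbers(numbers)}. {outcome}.")


print(lottery_brief("Pick 5", date(2011, 9, 10), [23, 4, 15, 42, 8], 1250000, 0))
```

Every brief comes out with the same shape, which is exactly the formulaic quality the comment describes.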

------
kia
Single page:

[http://www.nytimes.com/2011/09/11/business/computer-generate...](http://www.nytimes.com/2011/09/11/business/computer-generated-articles-are-gaining-traction.html?_r=1&pagewanted=all)

~~~
guygurari
Page One [1] is a browser extension that automatically redirects you to the
single-page version of articles on popular news sites, including NYT. Works
great for me.

[1] [http://globalmoxie.com/blog/page-one-safari-chrome-extension...](http://globalmoxie.com/blog/page-one-safari-chrome-extension.shtml)

~~~
kia
But it doesn't work with Firefox.

------
dredmorbius
ObXKCD: <http://xkcd.com/904/>

There are certain topical areas which lend themselves to automated content
generation. Sports, financial news, weather, astronomy (astrology isn't worth
mentioning), earthquakes and other severe events, machine monitoring.

These are domains in which a quantified or measured outcome occurs, tied to a
specific point in time or event (final score, market close, daily forecast,
etc.). The important data has already been highlighted; all you've got to do
is sprinkle some syntactic sugar around it.

Oddly enough, these are areas in which you're already most likely to find
existing "AI"-type content generators.

In areas in which you've got to do significant determination of what is
salient, the approach isn't nearly as successful.
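A minimal Python sketch of "sprinkling syntactic sugar" around one measured outcome: the only inputs are the previous and current close, and the only decision is which direction word to use. The index name and figures are hypothetical.

```python
def market_close(index: str, prev_close: float, close: float) -> str:
    """Wrap a single measured outcome (today's close vs. the previous one) in a sentence."""
    change = close - prev_close
    pct = 100 * change / prev_close
    if change > 0:
        verb = "rose"
    elif change < 0:
        verb = "fell"
    else:
        return f"The {index} closed unchanged at {close:,.2f}."
    return (f"The {index} {verb} {abs(change):,.2f} points, or {abs(pct):.1f}%, "
            f"to close at {close:,.2f}.")


print(market_close("Example Index", 1000.0, 1015.0))
# → The Example Index rose 15.00 points, or 1.5%, to close at 1,015.00.
```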

------
lexicon
This is a recent email I got from the Facebook Support team regarding a vanity
URL for my business. I could swear this guy is a robot or a script, and I
wonder if Facebook is using the technology described in the article:

\----------------

We’re sorry, but we’re unable to process your request because another entity
has made a previous request concerning this username. If you are still
interested in claiming the username, you may contact us in 60 days for an
update about its availability.

\---

You have reached the right channel for these requests. As mentioned earlier,
we have no further information to share with you concerning the username
"xxxx" (marked out). We will be unable to assist you further from this alias.

\----------------

What human being talks like that?

~~~
ugh
A human who had to write several dozen support answers?

Naming collisions have to be a common occurrence for Facebook. It’s sufficient
to write exactly one mail for such cases. There is no need to re-write or
change things around, it’s always the same answer to the same question.

Using a robot to write stuff like that seems wasteful – I don’t even think it
would currently be possible.

~~~
ori_b
It's probably a template. Someone read it, and picked
"template-name-conflict-XYZ" to respond with.

------
sjs
I suppose this may do for articles that just deliver some facts. However, the
kind of stuff I enjoy reading doesn't just barf up some facts in the form of
sentences; it provides insight into what the implications of those facts may
be, and also draws from the past to better put things in context.

That's not to say their technology couldn't be improved to search the web and
see what past events are relevant, but providing _good_ insights about the
implications of the facts will be a whole lot tougher. I don't think
journalists need to be shaking in their boots unless they only deliver the
quality and depth of results that this algorithm delivers.

------
kiba
These technological advances make me shudder at the potential job losses of
the future, even though previous technological advances created new jobs.

Sure, there's no way that my profession and the great majority of jobs on the
internet would be possible if we relied on human switchboard operators rather
than on automation. That doesn't mean the same will be true for the next
advances in technology, does it?

~~~
notahacker
If the cost of producing even the sort of dry, statistics-heavy content the
program presently excels at were a primary factor, then we'd have outsourced it
to India or the Philippines by now. You'd certainly pay less than $10 for an
article like this: [http://www.builderonline.com/local-housing-data/new-england/...](http://www.builderonline.com/local-housing-data/new-england/manchester-nashua-nh.aspx)

I'm willing to believe the underlying machine learning technology is very
clever, but I'm also willing to believe a _specialised_ toy script could
produce similar results, even if you had to hard code the minimum winning
margin for a "rout".
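The hard-coded "rout" margin might look like this in Python. The thresholds (3, 10 and 20 points) and the verbs are arbitrary assumptions for a basketball-style score, which is rather the point: the cut-offs are editorial judgment written down once.

```python
from bisect import bisect_right

# Hypothetical cut-offs: margins below 3 "edged", 3-9 "beat", 10-19 "defeated", 20+ "routed".
THRESHOLDS = [3, 10, 20]
VERBS = ["edged", "beat", "defeated", "routed"]


def result_verb(margin: int) -> str:
    """Map a winning margin to a verb by finding which threshold band it falls in."""
    return VERBS[bisect_right(THRESHOLDS, margin)]


def headline(winner: str, w_score: int, loser: str, l_score: int) -> str:
    """Compose a one-line result using the margin-dependent verb."""
    return f"{winner} {result_verb(w_score - l_score)} {loser} {w_score}-{l_score}"


print(headline("Tigers", 78, "Bears", 51))  # → Tigers routed Bears 78-51
```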

As for the Freakonomics comparison, they seem to have missed the appeal of
Levitt: his ability to posit a _plausible causal relationship_ between
two apparently unrelated variables. Any idiot can summarise "remarkable
findings" based on spurious correlations.

~~~
bergie
Fun idea: produce similar autogenerated narratives about the commit activity of
various open source projects.

 _Bergie had a strong start on the office day, closing four bugs in a row. Then
luck turned and he broke the build..._
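The commit-log idea makes a nice small case. This Python sketch collapses runs of identical events into clauses; the event names and phrasings are hypothetical, and the "strong start" and "luck turned" color in the example above would need extra rules on top of it.

```python
from itertools import groupby

# Hypothetical phrasing for each event type; n is the length of a run of that event.
PHRASES = {
    "close_bug": lambda n: f"closed {n} bug{'s' if n != 1 else ''}",
    "commit": lambda n: f"pushed {n} commit{'s' if n != 1 else ''}",
    "break_build": lambda n: "broke the build" if n == 1 else f"broke the build {n} times",
}


def narrate(author: str, events: list[str]) -> str:
    """Collapse consecutive identical events into runs and chain them into one sentence."""
    clauses = [PHRASES[kind](len(list(run))) for kind, run in groupby(events)]
    if not clauses:
        return f"{author} was quiet today."
    return f"{author} " + ", then ".join(clauses) + "."


print(narrate("Bergie", ["close_bug"] * 4 + ["break_build"]))
# → Bergie closed 4 bugs, then broke the build.
```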

------
SwellJoe
This is pretty fascinating stuff, despite the limitations and obvious bias of
this article. Are there any Open Source libraries or papers which cover toy
implementations of this sort of thing? (Assuming, of course, that it is not
simply a bunch of if/else constructs applied to templates, which would be far
less interesting.)

------
jasonshen
This reminds me of what MarketBrief is doing for financial documents.
Definitely less color / variance in the stories though.

[http://techcrunch.com/2011/08/15/yc-funded-marketbrief-makes...](http://techcrunch.com/2011/08/15/yc-funded-marketbrief-makes-obtuse-sec-documents-human-friendly/)

------
nl
For those interested, the best source of research in this field is the
"Special Interest Group on Natural Language Generation":
<http://www.aclweb.org/anthology/siggen.html>

------
jack7890
If this works as advertised, it would have important (bad) consequences for
SEO, right?

------
mkramlich
Making a note here: add some shiny around my Python template engine and I can
land $6m investment.

