

How I automated my writing career - RobbieStats
http://radar.oreilly.com/2011/11/automated-writing-software.html

======
jonnathanson
Just a prediction here, so take it for what it's worth, but my hunch is that
the profession of writing is going to fragment into more and more subsets in
the coming years. At the low end of the totem pole -- what we'll call "low-
value" writing -- will be the sorts of articles that software can eventually
automate. Things like news updates, information dumps, how-to pieces, lists,
summaries, and so forth. Much of what traditional journalism would call "news"
stories, and what magazine journalism would call "informational" pieces, fall
into this bucket. In these sorts of articles, substance is more important than
style. These pieces are all about the facts, or summations of the facts. Or,
in the case of content farms, they're about relaying and recombining
information in endless mixes, using provocative headlines. You don't need a
Pulitzer-caliber author to crank these out. Hell, pretty soon you won't even
need a _human_ to crank them out. It's no surprise that this type of writing
doesn't pay well, because frankly, it's the fast food of journalism. It's
cheap, it's disposable to the consumer, and so it pays cheaply.

On the other hand, higher-value writing will be that which isn't easily
automated, and for which style is every bit as important as substance. Fiction
(good fiction, at least), features, human-interest stories, editorials
(especially those relying on expertise), and so forth. This will be the kind
of writing that either pays crap, or pays big, depending on the writer's skill
level -- and his or her ability to build a market or following for it. There
will always be a need for this kind of writing, and until such time as
software AI becomes genuinely creative, it'll be very hard to automate the
highest-quality, most interesting, and most innovative stuff.

Low-value writing will, if anything, see its value decline even further. It is
the equivalent of the man on the assembly line who can be replaced by a
tireless, hyper-efficient machine. High-value writing will not, on average,
find itself paying more handsomely than it used to. It will still be a high-
variance profession. But it will be what remains for professional writers in
the age of content farms, automated news, social networking, and so forth.

Essentially, the way to earn a decent living in the future will be: 1) be
damned good, 2) build and maintain a following, 3) differentiate yourself, and
4) produce at high volume.

~~~
lupatus
Fiction, at least, has been well-studied and broken into defined components.
All stories follow a predictable arc: exposition, conflict, climax,
conclusion, and denouement [1]. Use well-tested plot devices to move through
the story arc [2]. Define a skeleton framework of what kind of plot devices go
where in the story. Have a database of scene location descriptions. Have a
database of character stereotypes. Make your algorithm mix and match them to
fill in the story skeleton.

Seen this way, A New Hope and Raiders of the Lost Ark are kind of similar.

Plot: Rescue the princess/my father

Hero: young/grizzled adventurer

Plucky sidekick: loyal robots/Professor

Ally: Semi-sleazy smuggler

Scene: Fascist Empire spaceship/Nazi North Africa

Bad guys: Stormtroopers/Nazis

Main villain: black-clothed mystical bad Jedi/Archeologist

Heck, using this format, you could make your own OWS drama:

Plot: Rescue my bankrupt mother

Hero: idealistic hippy

Plucky sidekick: loyal golden retriever

Ally: Semi-sleazy drug dealer

Scene: Urban city streets

Bad guys: city police

Main villain: black-suited bad finance executive

I would call it "Wall Street Raiders" or "Wall Street: A New Hope".

Yes, FWIW, I've been contemplating coding just such a story-generating engine.
Like Mad Libs on steroids.
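
A minimal sketch of that mix-and-match idea, assuming each "database" is just a short in-memory list. The slot names and entries are lifted from the two examples above; `COMPONENTS`, `generate_skeleton`, and `render` are hypothetical names, and a real engine would still need the arc structure and actual prose on top of this.

```python
import random

# Hypothetical component "databases". The entries come from the examples
# in this comment, not from any real corpus.
COMPONENTS = {
    "plot": ["Rescue the princess", "Rescue my father", "Rescue my bankrupt mother"],
    "hero": ["young adventurer", "grizzled adventurer", "idealistic hippy"],
    "plucky sidekick": ["loyal robots", "Professor", "loyal golden retriever"],
    "ally": ["semi-sleazy smuggler", "semi-sleazy drug dealer"],
    "scene": ["Fascist Empire spaceship", "Nazi North Africa", "urban city streets"],
    "bad guys": ["Stormtroopers", "Nazis", "city police"],
    "main villain": [
        "black-clothed mystical bad Jedi",
        "Archeologist",
        "black-suited bad finance executive",
    ],
}


def generate_skeleton(rng):
    """Pick one entry per slot to fill in the story skeleton."""
    return {slot: rng.choice(options) for slot, options in COMPONENTS.items()}


def render(skeleton):
    """Format a skeleton in the same 'Slot: value' layout used above."""
    return "\n".join(
        f"{slot.capitalize()}: {choice}" for slot, choice in skeleton.items()
    )


# Each run prints a different random combination of the components.
print(render(generate_skeleton(random.Random())))
```

Passing a seeded `random.Random(42)` instead makes a given skeleton reproducible, which helps when debugging the templates.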

[1]<http://en.wikipedia.org/wiki/Dramatic_structure>

[2]<http://en.wikipedia.org/wiki/Plot_device>

~~~
akg_67
Could this be done in reverse by software? Instead of a story-generating
engine, automated identification of the plot line and character profiles:
basically, automatically building the database you mentioned by feeding it
already-written fiction. I think a larger business may be in building such a
repository. A story-generating engine will commoditize fiction creation.

~~~
lupatus
I think that there are far fewer plot devices, character stereotypes, and
background scenes for fiction than you expect. There are enough though to
have, at least, 100s of millions of possible combinations. And those
combinations are where the "creative" novelty of fiction novels arises.
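
As a rough illustration of the arithmetic, the number of distinct combinations is the product of the number of options per slot. The slot sizes below are made-up assumptions, not counts from any real catalogue; they only show how a handful of modest lists multiplies into the hundreds of millions.

```python
import math

# Hypothetical catalogue sizes per story slot. None of these counts come
# from the thread; they only illustrate how quickly the product grows.
slot_sizes = {
    "plot": 20,
    "hero": 15,
    "plucky sidekick": 15,
    "ally": 15,
    "scene": 30,
    "bad guys": 15,
    "main villain": 15,
}

# One choice per slot, so the total is the product of the slot sizes.
combinations = math.prod(slot_sizes.values())
print(f"{combinations:,} possible story skeletons")  # 455,625,000
```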

For example, let us suppose that at the end of Braveheart, instead of Mel
Gibson getting disemboweled, Optimus Prime flies down out of the sky, kills
the executioner, saves him, and lays waste to the English forces, thus
allowing Mel Gibson to have his revenge and to ride off into the sunset with
the English queen. Creative? Sure. Formulaic? Yes!

I think that a few Google searches would reveal lists of the majority of the
different story components.

------
gvb
That was a very inaccurate headline on the story.

 _Our software can create eight paragraphs now, but is it possible to create
eight chapters' worth of content? The answer is "yes," but not quite the same
kind of technical books I used to write, at least right now._

...which is why the previous paragraph says "[b]ecause I've been so focused on
running Automated Insights, I haven't had time to write any new books
recently."

That is a variant on the Calvin and Hobbes bed making robot[1]. I strongly
suspect he will end up like Calvin, with something that doesn't work as
planned. If he is lucky, like Calvin he will find he accomplished his goal,
but discover he started with the wrong goal in mind.

[1] http://books.google.com/books?id=NV4WEqQtvTYC&pg=PA126&#...

~~~
zackmansfield
The headline was a lede to draw you in...which it did. The reality is that he
has created really interesting technology which _is_ automating content, which
is really freaking cool. As far as him becoming like Calvin and developing
something that won't "work as planned", it seems from the article that the plan
wasn't and isn't to replace himself, but rather to replace some of the lower
value pieces of content. The human/machine interplay is an important part of
the whole concept.

------
swombat
"How I didn't do what I claimed in the title, really, but I'm going to use
this title anyway because it's bound to get clicks and upvotes"

 _cynic_

------
jgrahamc
Slightly off topic, but when I was writing The Geek Atlas one of the things I
did was keep metrics about my writing so that I knew where I was, and then I
used those metrics to predict the book's delivery date to O'Reilly, and
measure how I was doing against the required delivery date.

This was all done in a spreadsheet and it enabled me to see whether I was
ahead or behind on my writing. Turned out to be very, very useful.

~~~
arethuza
What metrics did you use?

~~~
jgrahamc
The Geek Atlas consists of 128 similarly sized 'chapters' (one for each place)
so I had a number of key metrics:

1\. Number of chapters completed. A very gross progress bar that I could use
to get a rough estimate of when I would deliver.

2\. Words per chapter. I used this to determine if I was changing the length
of the chapter without realizing (which did happen) and correct for that so
that the book would be consistent.

3\. Hours per chapter. I used this to test my writing speed and work out more
accurately when I would be done and also how many hours I could allow per
chapter.
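
The comment describes a spreadsheet, not code, so the following is only a sketch of how those three metrics could feed a delivery estimate. `ChapterLog`, `project_delivery`, `length_drift`, and the hours-per-day input are hypothetical; the 128-chapter total is the one figure taken from the comment.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean

TOTAL_CHAPTERS = 128  # from the comment: one chapter per place


@dataclass
class ChapterLog:
    """One completed chapter: its word count and the hours it took."""
    words: int
    hours: float


def project_delivery(log, today, hours_per_day):
    """Estimate the finish date from average hours per completed chapter."""
    remaining = TOTAL_CHAPTERS - len(log)
    avg_hours = mean(entry.hours for entry in log)
    days_needed = math.ceil(remaining * avg_hours / hours_per_day)
    return today + timedelta(days=days_needed)


def length_drift(log, window=5):
    """Ratio of recent average chapter length to the overall average.

    A value well above or below 1.0 suggests chapter length is creeping.
    """
    overall = mean(entry.words for entry in log)
    recent = mean(entry.words for entry in log[-window:])
    return recent / overall


# Made-up sample data: 28 chapters done at 800 words and 4 hours each.
log = [ChapterLog(words=800, hours=4.0) for _ in range(28)]
print(project_delivery(log, date(2008, 1, 1), hours_per_day=2.0))
print(length_drift(log))
```

Comparing the projected date with the contractual one gives the ahead-or-behind signal, and the drift ratio flags the unnoticed change in chapter length mentioned in point 2.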

------
sharmajai
On an unrelated note, if you highlight a portion of the article, you can
listen to it. Very useful for listening to articles while working, instead of
listening to music, or just to rest your eyes.

It is powered by <http://www.readspeaker.com> and AFAICT is the best-sounding
Text to Speech implementation I have heard so far.

If you just listen to the text, it sounds like a human news reader, much
better than Siri. Wow. And the cherry on the top is that it highlights the
text which it is reading as it's being read.

------
jawns
I tried a little automated journalism a while back and wrote a blog post about
my code:

"I wrote this article with one mouse click"
[http://coding.pressbin.com/60/I-wrote-this-article-with-one-...](http://coding.pressbin.com/60/I-wrote-this-article-with-one-mouse-click)

There are a whole bunch of little things that go into play with something like
this that you just don't think much about until you try it -- stuff like
subject/verb agreement, when to use figures and when to spell out numbers,
etc.
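
To give a flavor of those little things, here is a small sketch. This is not the code from the blog post: the helper names are hypothetical, and the "spell out zero through nine, use figures above that" rule is one common newsroom convention assumed for illustration.

```python
SMALL_NUMBERS = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"]


def format_number(n):
    """Spell out 0-9 and use figures (with thousands separators) otherwise."""
    if 0 <= n < len(SMALL_NUMBERS):
        return SMALL_NUMBERS[n]
    return f"{n:,}"


def agree(count, singular, plural):
    """Choose the singular or plural form to match the count."""
    return singular if count == 1 else plural


def describe_postponed(count):
    """Build a sentence whose noun and verb both agree with the count."""
    noun = agree(count, "game", "games")
    verb = agree(count, "was", "were")
    return f"The league said {format_number(count)} {noun} {verb} postponed."


print(describe_postponed(1))   # The league said one game was postponed.
print(describe_postponed(12))  # The league said 12 games were postponed.
```

Even this tiny case hides more rules: a sentence that starts with the number, for example, would usually need it spelled out regardless of size.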

------
brador
What's Google's take on this? Are they for or against automated content
creation? Will they be kicking these sites out of search or letting them stay?

~~~
Lukeas14
Regardless of whether they are for or against it, they would have a tough time
detecting it in the first place. In theory, their detection algorithm would
have to be at least as smart as the StatSheet Algorithm.

This also differs from most forms of automated content creation in that the
underlying data is recent and newsworthy as opposed to simply reworking text
found elsewhere on the internet.

------
pdenya
I loved the 3rd bullet point: "Software doesn't get bored and start wondering
how to automate itself."

I'm not sure if this will ever be applied to non-data-driven fields but this
is still extremely cool.

------
hkmurakami
"A common, and funny, question I get from journalists is: _"when will you
automate me out of a job?"_ I find the question humorous because built
into the question is the assumption that if our software can write the perfect
story on a particular topic, then no one else should attempt to write about
it. _That's just not going to happen."_

It only takes one misguided and uninformed manager to fire good writers,
thinking that they can be replaced with an army of computers, only to find
that the product is now crap. Damage will have been done.

The example I'm thinking of? Square-Enix firing their developers and
outsourcing core development to China. The result? Crappy games. (In a
humorous twist, they've since been asking the very developers they fired
to come back and work for them.)

------
3dFlatLander
It makes sense that the sports genre was chosen. With scores, winners, teams,
tournaments and the like all being mentioned in pretty much every article out
there, it stands to reason that it would be fairly easy to parse them all and
get good data. I'd imagine tech and celebrity writing would also work well.

Political stories have such a wide range of views, this approach would produce
gibberish until you sort out all the articles on a left-right scale.

~~~
eru
> Political stories have such a wide range of views, this approach would
> produce gibberish until you sort out all the articles on a left-right scale.

I'd hope we need more scales than that.

------
snorkel
If your writing style happens to be very repetitive and templatized, then
yes, you've automated your writing career.

A more likely scenario for applying this tech to journalism would be
providing filler paragraphs around the more substantive prose banged out by a
human journalist. That way the journalist doesn't have to write as much or
spend time pulling tedious raw data into the story.

------
shabble
A friend of mine was working on an automated story telling system for Nethack
as his Masters (Linguistics & CS/AI, IIRC) thesis.

It was never really completed, but there was some interesting work in applying
goal-based planning AI in reverse to generate possible long-term motivations
for individual actions.

I don't think it's available online though, sadly.

------
JeremyStein
Funny how articles about computer-generated prose are never computer-
generated.

------
hammock
Would like to see an example of this - even if it's not all that impressive -
applied to some non-data intensive area, i.e. someplace other than reporting
(sports, finance, etc).

~~~
dpapathanasiou
An INSEAD professor built an automated system to do something similar, though
the quality of the output leaves something to be desired:
[http://www.nytimes.com/2008/04/14/business/media/14link.html...](http://www.nytimes.com/2008/04/14/business/media/14link.html?_r=2&pagewanted=all)

~~~
JimmyL
If you're interested in this guy - Philip M. Parker - check out his list on
Amazon
([http://www.amazon.com/gp/search/ref=sr_nr_i_0?rh=k%3APhilip+...](http://www.amazon.com/gp/search/ref=sr_nr_i_0?rh=k%3APhilip+M.+Parker%2Ci%3Astripbooks&keywords=Philip+M.+Parker&ie=UTF8&qid=1320365116#/ref=sr_st?keywords=Philip+M.+Parker&qid=1320365151&rh=k%3APhilip+M.+Parker%2Cn%3A283155&sort=inversepricerank)),
currently standing at 111k+ "books" published.

From a skim of his titles, it seems like there are different series of his
books - from the "The Official Patient's Sourcebook on $disease" series
($28.95) to the "The 2007-2012 World Outlook for $niche-industrial-product"
($795.00) series.

------
legec
I stand unimpressed. I am still waiting for "How I automated my reading Hacker
News"...

~~~
eru
Just like you use a VCR to watch your favourite TV shows for you.

------
danso
This is one example he gives of his automated writing: "Second-seeded North
Carolina was defeated in the Elite Eight with a 76-69 loss to fourth-seeded
Kentucky in the Regional Finals in Newark."

That's perfectly serviceable. But it makes me wonder...what is the point of
this? Not his automated-writing tool, but why are we putting what was meant
for a statistical/symbolic graphic into sentence form?

This kind of writing is only possible with the collection of discrete
datapoints: the date, the score, the participants, and the location. From
there, you can do any kind of variation of subject-verb etc., even adding
adjectives if the point spread is high.
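
A sketch of that template idea, assuming a simple record of the datapoints. This is not the article's actual software: `GameResult`, `margin_phrase`, `recap`, and the margin thresholds are hypothetical, and the sample values are taken from the sentence quoted above.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """The discrete datapoints behind one recap sentence."""
    winner: str
    loser: str
    winner_score: int
    loser_score: int
    round_name: str
    location: str


def margin_phrase(margin):
    """Vary the verb phrase with the point spread (thresholds are arbitrary)."""
    if margin >= 20:
        return "was routed by"
    if margin <= 3:
        return "was narrowly defeated by"
    return "was defeated by"


def recap(game):
    """Render one sentence from the structured result."""
    margin = game.winner_score - game.loser_score
    return (
        f"{game.loser} {margin_phrase(margin)} {game.winner} "
        f"{game.winner_score}-{game.loser_score} "
        f"in the {game.round_name} in {game.location}."
    )


game = GameResult("Kentucky", "North Carolina", 76, 69, "Regional Finals", "Newark")
print(recap(game))
# North Carolina was defeated by Kentucky 76-69 in the Regional Finals in Newark.
```

Seedings and the date would be additional fields on the same record, and more templates per margin band would reduce the repetitive feel.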

So we're taking data and turning it into a less efficiently readable form.
It's no fault of the auto-writer, of course; that's just how we are taught to
read and write. Someday, we may move towards a society in which other forms of
communication, particularly visual, are as commonplace. [insert your own
Tufte-inspired rant here]

~~~
patio11
_why are we putting what was meant for a statistical/symbolic graphic into
sentence form?_

Because many people (and other important constituents, like Googlebot) cannot
read graphs and so perceive a graph as having zero value but a paragraph
telling 1/10th the story of the graph as having positive value.

This is hardly the only "inferior form factor dominates because of ease of
consumption" thing out there. For example, rather than look at an unemployment
graph or read a paragraph beginning with "Government figures released earlier
today say that unemployment is the highest it has been since 2004", most of
the world prefers someone whose professional competence is looking pretty to
read "Unemployment is up" to them for approximately 4 seconds before cutting
to commercial.

