
Journalism generated by machine is on the rise - pseudolus
https://www.nytimes.com/2019/02/05/business/media/artificial-intelligence-journalism-robots.html
======
jawns
I'm a former journalist and spent the first 10 years of my career at various
magazines and newspapers.

About eight years ago, I landed my first full-time software development job in
part because of a walkthrough I gave during my in-person interview process of
a tool I'd developed while working for a metro daily newspaper. It wrote
lottery results stories with the click of a button.

Writing those results stories was a daily chore. We would look up the state's
daily lottery results and also report Powerball and Mega Millions results. For
those larger lotteries, we'd also adjust our headlines depending on whether
there was a big winner in our state. But aside from that, it was just tedious
and formulaic, so I wrote a script that would hit the various lottery sites,
scrape the data, and generate a two- or three-paragraph story that was ready
to post. It took the task down from five minutes to five seconds. (I later
worked on a similar tool for generating short weather reports and alerts.)
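
A minimal sketch of the kind of template-driven generator described above, in Python. It is not the original script: the field names (`game`, `numbers`, `jackpot`, `in_state_winner`) and the wording are assumptions made for illustration, and the scraping step is left out because the lottery sites are not named.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Drawing:
    """One lottery drawing. All field names are hypothetical."""
    game: str
    numbers: List[int]
    jackpot: int            # dollars
    in_state_winner: bool


def headline(d: Drawing, state: str) -> str:
    # The headline changes only when there is a big winner in our state.
    if d.in_state_winner:
        return f"{state} ticket wins ${d.jackpot:,} {d.game} jackpot"
    return f"{d.game} numbers drawn; jackpot at ${d.jackpot:,}"


def story(d: Drawing, state: str) -> str:
    """Fill a fixed two-paragraph template and prepend the headline."""
    nums = ", ".join(str(n) for n in d.numbers)
    if d.in_state_winner:
        second = (f"A ticket sold in {state} matched all numbers "
                  f"for the ${d.jackpot:,} jackpot.")
    else:
        second = (f"No ticket sold in {state} matched all numbers. "
                  f"The jackpot stands at ${d.jackpot:,}.")
    first = f"The winning {d.game} numbers were {nums}."
    return "\n\n".join([headline(d, state), first, second])
```

The only real decision in the template is the in-state-winner branch; everything else is slot filling, which is why this sort of story is cheap to automate once the data is in hand.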

Since then, I've worked on other NLG stuff, and honestly, it's pretty hard,
and the topic really has to be data-driven to begin with; we're probably
decades away from really insightful NLG that offers "why" explanations, as
opposed to "what" or "how" descriptions.

The other tough challenge about computer-generated journalism is that getting
the data is often the hardest part of the process. Oh, sure, businesses are
going to release their quarterly earnings reports, and sports teams are going
to release their game data. You might even use computer-assisted reporting to
generate FOIA requests for data that governments are required by law to
release upon request. But you're not going to write a story that leads to
Nixon's resignation or Enron's collapse simply by asking for the data.

~~~
msla
> we're probably decades away from really insightful NLG that offers "why"
> explanations, as opposed to "what" or "how" descriptions.

For any specialized field, doing this would require something more than the
kinds of schooling journalists get. It would also step on a lot of political
toes: For example, the current national debt is $X. Why is it that big? Is
that number a problem? Say anything concrete regarding either of those
questions and you get screeching. Just endless, wordless screeching, like the
Pod People in _Invasion of the Body Snatchers_ , only directed into your
Letters to the Editor column.

~~~
neffy
To be fair to the journalists, those are actually open research questions.

~~~
msla
> To be fair to the journalists, those are actually open research questions.

There are strong consensuses about some aspects of those problems, however,
especially regarding the benefits of using government debt to finance stimulus
programs as part of a counter-cyclical action against a recession. Saying it's
all unknown is about as intellectually honest as saying that we don't know
whether the world is flat or round because we're unsure of its topology at the
millimeter scale.

~~~
bgorman
Economics, like all social sciences, is nowhere near as sound as the hard
sciences. You simply cannot have control groups for many experiments so you
are left with case studies, which are often unconvincing. For example, does a
true free market really exist anywhere in the world? How about a 100% command
economy? How can you honestly determine if the federal reserve helps or hurts
the economy?

~~~
msla
> Economics, like all social sciences, is nowhere near as sound as the hard
> sciences. You simply cannot have control groups for many experiments so you
> are left with case studies, which are often unconvincing.

You can say the same thing about paleontology, but do you seriously doubt the
past existence of non-avian dinosaurs?

> For example, does a true free market really exist anywhere in the world? How
> about a 100% command economy?

You don't need perfect examples of things to observe what happens when
economies closely approach those extremes.

> How can you honestly determine if the federal reserve helps or hurts the
> economy?

By observing how horrible the business cycle was prior to its existence.

~~~
msla
I see disagreement by downvoting is in effect, which discredits the
disagreement.

------
freehunter
My side project is a local news outlet for two small cities near me. They are
small enough cities that I am the only game in town when it comes to digital
media; the only competitor is the local newspaper, which has no online presence.
Plenty of our stories are actual news articles talking about local politics
and new businesses/restaurants and local interest type stuff, all written by
humans.

But filling in the gaps is plenty of machine-generated content. One of our big
draws is the event calendar, and fortunately events give me enough data to
feed an AI that I've developed. There is enough structured data and the
information is so routine that this system can churn out several articles per
day that sound hand-written, and every article sounds custom tailored to the
business and the event itself. This fills in the routine work of keeping
content flowing in between the stuff that takes a lot longer to research and
write, and makes it viable for us (one full time employee and one part time
employee) to run two popular small-town news websites where otherwise no one
would even bother trying.

This is also a great case where classical AI is still relevant to modern
problems. Neural networks just are not good at writing English in a way that
humans would enjoy reading. At some point I plan on packaging it up and doing
a Show HN, but for the time being, this article is spot on. Machine-generated
news content breathes whole new life into an area that's increasingly hard to
turn a profit.

~~~
pizzapill
I'd be really interested in your technical approach to this. How did you
implement this at a high level?

~~~
freehunter
The core of it is a decision tree tied to a pretty in-depth database that
either knows everything about the city or seeks to learn everything about the
city. The system reads an event on the calendar that says "Van Halen is
playing at American Brew Pub on February 12th at 7pm" and starts running
through the decision tree, pulling information out of the database to
fill in the blanks on the phrases it's picked.

Then it strings them all together to write something that looks something
like:

> Do you have plans on Tuesday? Well now you do! Van Halen is playing at
> American Brewpub at 7pm! Van Halen is a local rock band that we love. They
> have been to our city before, but they played at Next Door Music Venue (link
> to the previous article for Van Halen at Next Door). This time they're playing
> at our favorite brewery, so you can watch the show while drinking Local IPA!
> Tickets are required, and you can purchase them here (link to tickets). We'll
> see you there!

People love these articles but they're not super fun to write every day
(sometimes twice a day for the two cities), it's just routine stuff.
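
One way to picture the decision-tree-plus-database idea above is the rough Python sketch below. It is a guess at the general shape, not the actual system: the `DB` contents, the keys (`kind`, `previous_venue`, `is_brewery`, `ticketed`), and the phrases are all invented for illustration.

```python
from typing import Dict, List

# Toy stand-in for the "in-depth database"; every key and value is made up.
DB: Dict[str, Dict[str, object]] = {
    "Van Halen": {"kind": "local rock band",
                  "previous_venue": "Next Door Music Venue"},
    "American Brewpub": {"is_brewery": True, "ticketed": True},
}


def write_event(artist: str, venue: str, day: str, time: str) -> str:
    """Walk a small hand-written decision tree, adding one phrase per branch."""
    a = DB.get(artist, {})
    v = DB.get(venue, {})
    parts: List[str] = [
        f"Do you have plans on {day}? {artist} is playing at {venue} at {time}."
    ]

    # Branch: do we know what kind of act this is?
    if "kind" in a:
        parts.append(f"{artist} is a {a['kind']}.")

    # Branch: has the act played in town before, at a different venue?
    prev = a.get("previous_venue")
    if prev and prev != venue:
        parts.append(f"They have been to our city before, at {prev}.")

    # Branch: venue attributes choose the closing phrases.
    if v.get("is_brewery"):
        parts.append("You can watch the show at a brewery this time.")
    if v.get("ticketed"):
        parts.append("Tickets are required.")
    return " ".join(parts)
```

Each branch only fires when the database actually holds the fact, so an unknown act at an unknown venue falls back to the bare announcement rather than making claims it cannot support.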

~~~
jon-wood
The concept seems pretty solid, but reading the output I feel like this is
bordering on unethical. Maybe this is just a bad example, but it seems you’ve
gone past programmatically giving people information, and into giving the
impression you’re actually endorsing things. Unless you actually add metadata
to the artist saying you love them, or rate the venue to indicate it’s good
for live music, you’ve got no way of going beyond “Van Halen, a local rock
band, are playing at American Brewpub at 7pm Thursday. You can drink local IPA
while watching them, and buy tickets here.”

~~~
freehunter
This is a very simplified example written by human hands just for this post
(not machine written). The database is immense and has far more fields than
you might imagine. It takes into account how many clicks content with Van
Halen has had in the past and how many clicks content with American Brewpub
has had etc to figure out how likely this article is to resonate with our
audience, plus how many human-written articles have been published about these
subjects and sentiment analysis to figure out if our articles were positive or
neutral (we don't write negative articles, if we don't like a business/event
we just don't talk about it), which impacts how the machine-generated article
is phrased or even if it gets written at all.

We have about 90 events on our calendar per city in any given month, and this
AI writes about three or four articles per city per week even though it's
running constantly. It's pretty selective and very rarely writes something
that our two human writers wouldn't have mentioned otherwise. Our personal
preferences and those of our audience are certainly taken into account.
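
The selection step described above (past clicks, prior human coverage, sentiment) could be sketched roughly as follows. This is an illustration only: the field names, the thresholds (`min_clicks=500`, a 0.5 cutoff for an enthusiastic tone), and the three-way outcome are assumptions, not details of the real system.

```python
from dataclasses import dataclass


@dataclass
class SubjectStats:
    """Per-subject history. Field names are hypothetical."""
    past_clicks: int        # clicks on earlier content about this subject
    human_articles: int     # human-written articles already published
    mean_sentiment: float   # -1.0 (negative) .. 1.0 (positive)


def decide(artist: SubjectStats, venue: SubjectStats,
           min_clicks: int = 500, min_sentiment: float = 0.0) -> str:
    """Return 'skip', 'neutral', or 'enthusiastic' for a candidate article."""
    # No prior human coverage means no basis for a tone, so stay out of it.
    if artist.human_articles == 0 and venue.human_articles == 0:
        return "skip"
    # Use the less favorable of the two subjects as the overall sentiment.
    sentiment = min(artist.mean_sentiment, venue.mean_sentiment)
    if sentiment < min_sentiment:
        return "skip"       # mirrors "we just don't talk about it"
    if artist.past_clicks + venue.past_clicks < min_clicks:
        return "skip"       # unlikely to resonate with the audience
    return "enthusiastic" if sentiment >= 0.5 else "neutral"
```

Gating on prior human coverage is what keeps the generated copy from expressing an opinion the writers never held, which was the concern raised earlier in the thread.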

~~~
jon-wood
That’s incredibly cool, and definitely covers my concerns there. Good work on
giving it some restraint rather than going for sheer volume as well.

------
rchaud
> The program can dissect a financial report the moment it appears and spit
> out an immediate news story that includes the most pertinent facts and
> figures.

As an accountant in a prior life, I can tell you that this approach won't
provide anything near "the most pertinent facts". That's because public
earnings reports are written specifically to circumvent automated analysis.
Wall Street firms have tools in place to scrape the data tables from those
PDFs, and even then the technology isn't perfect, because layouts aren't
standardized (merged cells, table built in InDesign, etc).

At best, it will get the headline numbers right, which are meaningless without
context. What does a quarterly profit of $3.3m for MSTR mean without
benchmarking against its competitors, or taking the macroeconomic environment
(interest rates, etc) into account?

The headline numbers on the balance sheet, P&L and cash flow statement don't
say nearly as much as the notes to those statements, which often contain
minute details that are extremely relevant for investors and analysts. Trained
accountants and analysts can miss details when reading through those, so how
is AI going to parse it any better?

Unfortunately, this smacks of "quantity of articles" over quality of analysis,
the latter of which represents why journalism is valuable and necessary.

~~~
porpoisely
Wall Street firms get financial feeds from zacks, edgars, morningstar and a
whole slew of other financial data providers. Finance sites, like yahoo
finance also get their data from these sources. They don't have to scrape
anything for data. It comes in structured CSV, XML, etc format already.

Also, journalists are supposed to provide the "facts", not analysis. They
aren't financial experts so even if they provided analysis, I wouldn't put
much stock in them.

~~~
nickles
> Wall Street firms get financial feeds from zacks, edgars, morningstar and a
> whole slew of other financial data providers. Finance sites, like yahoo
> finance also get their data from these sources. They don't have to scrape
> anything for data. It comes in structured CSV, XML, etc format already.

Data quality varies from vendor to vendor. Additionally, speed is a factor in
how profitably some strategies can be executed. When firms are examining bits
on the wire to guess whether earnings were good or not (before the full
headline arrives), you can’t necessarily wait for the vendors to update their
releases, especially since all of your competitors will have exactly the same
data.

------
rubidium
Just like HackerNews comments.

Somewhat seriously, I know someone out there is posting AI generated HN
comments and testing it here. With the proper timing/rates/etc... it wouldn't
be hard to avoid easy detection. I don't have specific accounts in mind, but I
have a hard time believing no-one is trying it out (given the overlap of HN
with AI enthusiasts).

So the real question is: can anyone detect the AI comments?

~~~
dreen
Interesting! Please tell me more about the can anyone detect the AI comments?

~~~
vharuck
Does humor signal humanity, or does a reference to a popular line from a show
famous in nerd culture show lack of creativity?

No matter the answer, I was entertained.

~~~
dreen
Not aware of the show you speak of, was just parroting poorly written
chatbots.

As for humor, that's one thing AI is not going to be able to do well for many
years to come because it requires too much creativity. But as a Turing test
it's not very good - some people are just fundamentally unfunny.

------
apocalypstyx
The thing is, the majority of what humans consume [tv procedurals,
romance/SF/fantasy novels, superhero comics, news, etc, ad nauseam] is so
tightly structured you don't even need an AI to generate it,
even just a markov generator will do it most of the time (someone had a
project that generated quotes indistinguishable from 50 Shades of Grey).
They'd already written screenplays on computers at MIT in the 60s. But even
the high-end isn't _saved_ ; if Burroughs were alive today and in his 20s-30s,
he'd probably be using computers for a modern generated 'cut-up' technique.
Really, Dada was just ahead of the curve by a century.

~~~
thfuran
You're not going to get a (remotely decent) novel out of a Markov chain text
generator unless you're using tuples so long that you're just regurgitating an
existing novel.
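
For anyone who wants to see the tuple-length trade-off concretely, here is a small word-level Markov chain sketch in Python. The function names (`build_chain`, `generate`) and the fixed-seed start from the opening of the corpus are choices made for this example, not a reference to any particular project.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def build_chain(words: List[str], order: int) -> Dict[Tuple[str, ...], List[str]]:
    """Map each run of `order` consecutive words to the words seen right after it."""
    chain: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(words: List[str], order: int, length: int, seed: int = 0) -> List[str]:
    """Generate up to `length` words, always conditioning on the last `order` words."""
    rng = random.Random(seed)
    chain = build_chain(words, order)
    out = list(words[:order])          # start from the opening of the corpus
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:              # dead end: no continuation was ever seen
            break
        out.append(rng.choice(followers))
    return out
```

With a short tuple (`order=1`) the output only has to be locally plausible, one word pair at a time, so it wanders. Once `order` is long enough that every context in the corpus is unique, each context has exactly one follower and `generate` simply replays the source text.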

~~~
fdggdfsvscvsd
Not Markov chain, but I have been wondering about Dwarf Fortress. Some stories
of Dwarf Fortress sound quite interesting. So I wonder if an automated writeup
of an automated Dwarf Fortress game could yield something worthwhile.

------
Alterlife
Is AI the death of human creativity?

Assisted AI apparently now designs our cars and our planes, designs our logos,
designs our buildings, generates our music.

More significantly, un-assisted AI curates what we are exposed to in the form
of what's shown to us in our feeds and suggestions, and judging by the
articles in OP, already is capable of writing articles for us. AI voices and
news casters seem to be getting better by the day.

I can't tell where this is going anymore.

~~~
freehunter
I'd argue that this type of AI frees up humans to be _more_ creative. If a
journalist doesn't have to spend time writing routine articles about baseball
scores, they have more time to spend tracking down real stories. I posted
elsewhere in these comments about my own AI-powered news system, and if I
didn't have that my entire day would be taken up writing routine articles that
are interesting to readers but have very little journalistic value. Instead I
can turn on the machine and let it write this routine content while I go out
and interview the city manager about the proposed development downtown or the
upcoming tax increase.

If the routine and mindless tasks of humans are moved to automation, those
humans are now free to actually _create_.

~~~
Alterlife
While I do agree with your argument to a point, it breaks down because the
internet is a mountain of crap, and while attention is a limited resource, the
crap is limitless.

You say you use AI to generate mundane articles. Well, your mundane AI article
about baseball and 10,000 other AI generated articles are competing with
quality content written by real people.

There are people who are passionate about baseball, who went out to watch
that baseball game and write about the passion behind every play and every
ball. Possibly interviewing the crowd and the players.

Your AI is probably learning from those articles and getting better at faking
that passion. So much so that in a few years maybe people won't be able to
tell the difference.

~~~
Footkerchief
In most cases, they are not competing, because the primary niche for NLG is to
generate content that was never economical for a human to take the time to
create -- minor leagues, fantasy football, etc.

------
TheGRS
I was under the impression almost anything that is trigger-based and has some
data points is automated these days. Severe weather, sports statistics, poll
results, finance numbers. All of that is fairly automate-able. They were
talking about this 10 years ago. I'm not sure how to automate much of the rest
since the good stories require some kind of context and talking to people on
the field.

I just hope we don't lose too many reporters who go to city council meetings
and court hearings. Such places need a human present who understands the
context of what is happening and can spot corruption as it occurs. The dark side of
automation is where people figure out ways to game the system.

~~~
marcosdumay
> Such places need a human present who understands the context of what is
> happening and can spot corruption as it occurs.

We have some interesting AI projects going for spotting corruption as it
occurs here in Brazil, both government-backed and fully private ones.

------
biztos
It's worth asking whether "thousands of articles on company earnings reports
each quarter" constitutes journalism.

~~~
freehunter
I think the obvious answer is "if it can be done by a machine today, it's not
journalism". It's stuff that journalists are expected to do, but it's busy
work that takes time away from actual journalism.

~~~
biztos
Yeah I would assume that there are some interesting stories hidden in the
earnings reports, but your typical regurgitated press release style "article"
is about the least useful way of presenting the data I can think of.

I hope the financial reporters (or whatever they're called) would be free to
do more digging into anomalies, and have more help spotting those anomalies,
thanks to software.

------
ChuckMcM
Isn't the elephant in the room here that the stories that can be auto-
generated don't have a creative component? They are just reporting some
factual event or data. That is exactly analogous to robots that can assemble
something on a factory line because all of the steps can be pre-programmed.

"Real" journalists are the ones that go through various facts and other
tidbits and realize that there is a story that is deeper than just the facts.
They research and tell that story so that the reader can understand it and
appreciate why it is important or insightful. That is something that a 'robo-
journalist' won't be able to do for a while.

For example, you can easily write a script to publish descriptions of the
local high school football game every Friday night. But a robot doing that
won't recognize when one of the team's members is exceptional, or how a team
has changed its tactics to increase its ability to win games, or the impact
new facilities have had on the team's performance. Connecting those dots is
still outside the realm of the possible for these things.

------
simonw
I'm a fan of Quakebot by the LA Times. The bot even gets a byline:
[https://www.latimes.com/local/lanow/la-me-earthquakesa-earth...](https://www.latimes.com/local/lanow/la-me-earthquakesa-earthquake-32-quake-strikes-near-freshwater-calif-yfdd-story.html)

------
summm
So lots of effort is spent to collect data, make it machine-readable,
distill it into a selected set of information (still machine-readable?) and
then... it is obfuscated again by converting it back into natural language
which must be parsed again by humans? What a waste! Couldn't that last step be
skipped or put under the user's control? Say, my personal
assistant/agent/filter/script could ingest the actual data and act on it on my
behalf? Maybe tell me only what is actually relevant for me. Such that I don't
have to wade through heaps of fluffy bs all the time.

------
atakiel
> “I hope we’ll see A.I. tools become a productivity tool in the practice of
> reporting and finding clues,” said Hilary Mason, the general manager for
> machine learning at Cloudera, a data management software company. “When you
> do data analysis, you can see anomalies and patterns using A.I. And a human
> journalist is the right person to understand and figure out.”

While there's been a lot of great stuff happening on the front of machine-
generated news for a while now, data analysis is definitely another great
target, with perhaps some more immediate gains, when it comes to AI / general
automation in journalism.

------
rblion
Interesting timing when you consider the layoffs and AI-assisted content. It's
like the people that lost their jobs were replaced, so really the only change
ahead for many firms is content strategy, which is what will drive traffic.

Yet, the elephant in the room is the fact that more and more attention span is
held by just a handful of platforms. If economic trends are any indication,
the coming 'winter' will be rough for many firms that are competing for fewer
dollars that aren't on Facebook or Google.

Interesting times ahead...

------
opportune
If you've ever actually read these financial articles you would know that they
are basically hot garbage. Usually someone just wrote a template that scrapes
a bunch of numbers out of a financial report and puts them in predetermined
places. It has no utility beyond reading the financial report itself aside
from being marginally more accessible.

------
tw1010
What's an example of the most impressive article written by a bot, in your
opinion (fellow HN reader)?

------
charlie0
It would be really ironic if this article was machine generated.

------
therealforsen
Sounds like they finally learned to code

