About eight years ago, I landed my first full-time software development job in part because of a walkthrough I gave during my in-person interview process of a tool I'd developed while working for a metro daily newspaper. It wrote lottery results stories with the click of a button.
Writing those results stories was a daily chore. We would look up the state's daily lottery results and also report Powerball and Mega Millions results. For those larger lotteries, we'd also adjust our headlines depending on whether there was a big winner in our state. But aside from that, it was just tedious and formulaic, so I wrote a script that would hit the various lottery sites, scrape the data, and generate a two- or three-paragraph story that was ready to post. It took the task down from five minutes to five seconds. (I later worked on a similar tool for generating short weather reports and alerts.)
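The core of such a tool is just scraped numbers poured into a story template. Here's a minimal sketch of that kind of pipeline; the function, field names, and threshold logic are my invention, not the actual tool (and real scraping of the lottery sites is omitted):

```python
def write_lottery_story(daily_numbers, powerball_jackpot, big_winner_in_state):
    """Assemble a short results story from already-scraped data.

    All inputs are hypothetical: daily_numbers is a list of winning numbers,
    powerball_jackpot is a dollar amount, big_winner_in_state adjusts the headline.
    """
    if big_winner_in_state:
        headline = "Local ticket hits Powerball jackpot"
    else:
        headline = "Daily lottery results for today"
    body = (
        f"The winning daily numbers were {', '.join(map(str, daily_numbers))}. "
        f"The Powerball jackpot stands at ${powerball_jackpot:,}."
    )
    return headline, body

headline, body = write_lottery_story([4, 8, 15], 220_000_000, False)
```

With the data already structured, the "writing" step really is this mechanical, which is why the five-minute chore collapsed to seconds.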
Since then, I've worked on other NLG stuff, and honestly, it's pretty hard, and the topic really has to be data-driven to begin with; we're probably decades away from really insightful NLG that offers "why" explanations, as opposed to "what" or "how" descriptions.
The other tough challenge about computer-generated journalism is that getting the data is often the hardest part of the process. Oh, sure, businesses are going to release their quarterly earnings reports, and sports teams are going to release their game data. You might even use computer-assisted reporting to generate FOIA requests for data that governments are required by law to release upon request. But you're not going to write a story that leads to Nixon's resignation or Enron's collapse simply by asking for the data.
For any specialized field, doing this would require something more than the kinds of schooling journalists get. It would also step on a lot of political toes: For example, the current national debt is $X. Why is it that big? Is that number a problem? Say anything concrete regarding either of those questions and you get screeching. Just endless, wordless screeching, like the Pod People in Invasion of the Body Snatchers, only directed into your Letters to the Editor column.
There are strong consensuses about some aspects of those problems, however, especially regarding the benefits of using government debt to finance stimulus programs as part of a counter-cyclical action against a recession. Saying it's all unknown is about as intellectually honest as saying that we don't know whether the world is flat or round because we're unsure of its topology at the millimeter scale.
There is no consensus that counter-cyclical stimulus is beneficial, outside of those economists already committed to central economic planning of the money supply. Basic economic theory tells us that spending directed by political forces is likely to be less efficient than spending directed by market forces. Counter-cyclical stimulus uses future earnings of the private sector to pay for current government spending, so assuming all other factors remain equal, the price is less market-directed spending.
Moreover, counter-cyclical stimulus interferes with the development of the natural corrective processes of a free market, because it means those who saved cash during the credit bubble have less opportunity to buy up assets when the bubble pops. By reducing the capital allocated to this group, you reduce their future influence on the economy, which would have the effect of reducing the magnitude of bubbles, and with it, economic volatility.
You can say the same thing about paleontology, but do you seriously doubt the past existence of non-avian dinosaurs?
> For example, does a true free market really exist anywhere in the world? How about a 100% command economy?
You don't need perfect examples of things to observe what happens when economies closely approach those extremes.
> How can you honesty determine if the federal reserve helps or hurts the economy?
By observing how horrible the business cycle was prior to its existence.
But machine-generated content is plenty good for filling in the gaps. One of our big draws is the event calendar, and fortunately events give me enough data to feed an AI that I've developed. The data is structured enough, and the information routine enough, that this system can churn out several articles per day that sound hand-written, each custom-tailored to the business and the event itself. This fills in the routine work of keeping content flowing between the pieces that take a lot longer to research and write, and makes it viable for us (one full-time employee and one part-time employee) to run two popular small-town news websites where otherwise no one would even bother trying.
This is also a great case where classical AI is still relevant to modern problems. Neural networks just aren't good at writing English in a way that humans would enjoy reading. At some point I plan on packaging it up and doing a Show HN, but for the time being, this article is spot on. Machine-generated news content breathes new life into an area where it's increasingly hard to turn a profit.
Send me an email at moura at the domain in my profile
Then it strings them all together to produce something like:
>Do you have plans on Tuesday? Well now you do! Van Halen is playing at American Brewpub at 7pm! Van Halen is a local rock band that we love. They have been to our city before, but they played at Next Door Music Venue (link to the previous article for Van Halen at Next Door). This time they're playing at our favorite brewery, so you can watch the show while drinking Local IPA! Tickets are required, and you can purchase them here (link to tickets). We'll see you there!
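The "strings them all together" step can be sketched as template assembly over a structured event record. This is a guess at how such a system might look, not the commenter's actual code; the field names (band, venue, day, time, past_venue) are hypothetical:

```python
def event_article(event):
    """Turn a structured calendar event into a short promo paragraph.

    The event schema here is invented for illustration. Optional fields
    (past_venue, tickets_url) add sentences only when present.
    """
    parts = [
        f"Do you have plans on {event['day']}? Well now you do!",
        f"{event['band']} is playing at {event['venue']} at {event['time']}!",
    ]
    if event.get("past_venue"):
        parts.append(
            f"They have been to our city before, but they played at {event['past_venue']}."
        )
    if event.get("tickets_url"):
        parts.append("Tickets are required, and you can purchase them here.")
    parts.append("We'll see you there!")
    return " ".join(parts)

article = event_article({
    "day": "Tuesday",
    "band": "Van Halen",
    "venue": "American Brewpub",
    "time": "7pm",
    "past_venue": "Next Door Music Venue",
    "tickets_url": "https://example.com/tickets",
})
```

Varying sentence templates and conditionally including facts is what keeps each article sounding tailored rather than mail-merged.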
People love these articles, but they're not super fun to write every day (sometimes twice a day for the two cities); it's just routine stuff.
We have about 90 events on our calendar per city in any given month, and this AI writes about three or four articles per city per week even though it's running constantly. It's pretty selective and very rarely writes something that our two human writers wouldn't have mentioned otherwise. Our personal preferences and those of our audience are certainly taken into account.
In any case the AI could make several educated guesses on how good the band is. Quality of the venue would be another indicator.
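One way that kind of selectivity could work is a weighted score over a few signals like venue quality and band popularity, with articles only for high scorers. The signals, weights, and threshold below are entirely invented for illustration:

```python
def should_write_article(event, threshold=0.6):
    """Heuristic filter: only events scoring above the threshold get an article.

    Signal names, weights, and the threshold are hypothetical; in practice
    they would encode the editors' and audience's preferences.
    """
    score = (
        0.4 * event.get("venue_rating", 0.0)      # 0..1, e.g. editor-assigned
        + 0.4 * event.get("band_popularity", 0.0)  # 0..1, e.g. from past coverage
        + 0.2 * (1.0 if event.get("ticketed") else 0.0)
    )
    return score >= threshold
```

A filter like this would explain writing only three or four articles per city per week out of roughly ninety monthly events.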
As an accountant in a prior life, I can tell you that this approach won't provide anything near "the most pertinent facts". That's because public earnings reports are written specifically to circumvent automated analysis. Wall Street firms have tools in place to scrape the data tables from those PDFs, and even then the technology isn't perfect, because layouts aren't standardized (merged cells, tables built in InDesign, etc.).
At best, it will get the headline numbers right, which are meaningless without context. What does a quarterly profit of $3.3m for MSTR mean without benchmarking against its competitors, or taking the macroeconomic environment (interest rates, etc) into account?
The headline numbers on the balance sheet, P&L and cash flow statement don't say nearly as much as the notes to those statements, which often contain minute details that are extremely relevant for investors and analysts. Trained accountants and analysts can miss details when reading through those, so how is AI going to parse it any better?
Unfortunately, this smacks of "quantity of articles" over quality of analysis, the latter of which represents why journalism is valuable and necessary.
This doesn't seem that difficult to implement into a story. There are 125 earnings reports today; a retelling of the headline numbers with competitor analysis should fit the bill for reporting on most of them.
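A retelling like that is mostly arithmetic over structured fields. A toy version, with the company name and all figures invented:

```python
def earnings_blurb(company, profit, peer_avg_profit):
    """One-sentence earnings summary benchmarked against a peer average.

    All inputs are hypothetical; profit and peer_avg_profit are in dollars.
    """
    direction = "above" if profit > peer_avg_profit else "below"
    pct = abs(profit - peer_avg_profit) / peer_avg_profit * 100
    return (
        f"{company} reported a quarterly profit of ${profit / 1e6:.1f}m, "
        f"{pct:.0f}% {direction} the peer average of ${peer_avg_profit / 1e6:.1f}m."
    )

blurb = earnings_blurb("ExampleCorp", 3.3e6, 3.0e6)
```

Of course, as the parent points out, this only captures headline numbers; the notes to the statements are where the automation stops working.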
> Trained accountants and analysts can miss details when reading through those, so how is AI going to parse it any better?
How could we expect a journalist to parse it better? Is that really the goal of news on financial reports?
Also, journalists are supposed to provide the "facts", not analysis. They aren't financial experts so even if they provided analysis, I wouldn't put much stock in them.
Data quality varies from vendor to vendor. Additionally, speed is a factor in how profitably some strategies can be executed. When firms are examining bits on the wire to guess whether earnings were good or not (before the full headline arrives), you can’t necessarily wait for the vendors to update their releases, especially since all of your competitors will have exactly the same data.
As 'structured data' goes, it's nowhere near where it needs to be to support 'instant article generation' beyond anything but the shallowest headline numbers. The consolidated financial statements all have appendices (notes) with relevant details. The case study you linked to only indicates that they significantly reduced man-hours by automating the process of manually picking numbers and putting them into an article template. They may as well be putting out a press release.
Although the topic of the post is how automation affects financial journalism, it bears mentioning that an analyst's job is to reverse engineer the report to see how they arrived at those numbers. The vendors' auto-generated reports never include formulas, so you'll have to be doing your own calculations as part of your due diligence.
For instance, if they're trading at a high P/E ratio, how much of that is due to positive investor sentiment and not related to recent buybacks? The headline numbers won't reveal that, but past data and the notes to the financial statements usually will.
If their cash balance says $x billion, how much of that came from convertible bond issues that are coming due in the next 12 months?
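To make the buyback point concrete, here's a toy calculation (all numbers invented): a buyback shrinks the share count, which mechanically lifts EPS and lowers the P/E at an unchanged price, so a P/E that stays high after heavy buybacks implies an even larger sentiment premium than the headline number suggests.

```python
def pe_ratio(price, earnings, shares):
    """Price-to-earnings ratio from total earnings and shares outstanding."""
    eps = earnings / shares
    return price / eps

price = 80.0       # share price, assumed unchanged by the buyback (hypothetical)
earnings = 4.0e9   # total annual earnings (hypothetical)

pe_before = pe_ratio(price, earnings, shares=1.0e9)
pe_after = pe_ratio(price, earnings, shares=0.9e9)  # after a 10% buyback
```

Untangling how much of a P/E move is sentiment versus share-count mechanics is exactly the kind of calculation the auto-generated reports leave to the reader.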
Somewhat seriously, I know someone out there is posting AI generated HN comments and testing it here. With the proper timing/rates/etc... it wouldn't be hard to avoid easy detection. I don't have specific accounts in mind, but I have a hard time believing no-one is trying it out (given the overlap of HN with AI enthusiasts).
So the real question is: can anyone detect the AI comments?
As this article puts it: 'And it also illustrated how much people tend to anthropomorphize AI, believing that it has deep-seated beliefs rather than seeing it as a statistical machine.'
But, really, have we proved there is anything to such romantic or spiritual notions about human beings, or are we just 'statistical machines'?
Anyway, my test for an AI-generated comment: determine a measure for 'sensicalness', the higher the score (aka, more sensical) the higher the probability of non-human origin.
Really, I think our definition of human comes not from the mind but from the body. That's always the definition we deploy, whether it's for or against racism or abortion or...anything else, really. Even a brain in a jar is still defined in terms of being a brain in a jar. This is probably why the internet as we've known it will cede to the 'video-sphere': when we invented writing (such as this message), we divorced the content from human embodiment, so we could never be sure, even all the way back then, whether we were looking at something composed by man or gods or...anything.
I would go in exactly the opposite direction. AI does have deep-seated beliefs because the programmers who input the training data and label it have deep-seated beliefs, as does the culture the content is drawn from. I'd say it's much more likely that AI is more human than philosophically ignorant scientists obsessed with mechanistic empiricist dogma would let on than it is that humans are just 'statistical machines'.
For instance, AI identifying some women as men (and some men as women) shows that it's just as human as the rest of us; it was trained on data based on squarely modernist gender appearances.
This is a good article that touches on the issue: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3078224
No matter the answer, I was entertained.
As for humor, that's one thing AI is not going to be able to do well for many years to come because it requires too much creativity. But as a Turing test it's not very good - some people are just fundamentally unfunny.
An actual implementation of this would be wonky? Yes. But "decent" is a relative word. People are quite willing to put up with a lot of issues to get what they want. (The churn of quick-shift material in the self-publishing world is enough to demonstrate this.)
But, in relation to the nature of text vs video as I mentioned in another comment, the integrity of the written word probably doesn't matter. If video dominates, the words needed in a script need be no more than a generalized layout of plot-points filled with a bit of ad lib and improvisation. (Given the nature of so much 'reality tv' you don't even need that, simply impress the images and arrange them to the pre-defined consumer-expected structure.)
We really overestimate the importance of the novel. Tweets can be generated easily (and are), and such snippets, I'll argue, are the most-consumed scriptorial content. And visual media predominates. The novels that have wide-ranging effects are things like Dan Brown's, which are structured in such a way that they would be much easier to systematize than one might suspect (as most people will admit: he's really good at writing the same book over and over again).
And the people who care about 'remotely decent novels' beyond their own engagement are few and far between. They have no mainstream cultural value (in the US) beyond shock value.
[This is not to be taken as me knocking novels. Also, I may be too US-centric, but I doubt the rest of humanity is any less dominated by the arresting nature of the visual, or, again, they interact heavily through messaging apps that count in the snippet category.]
>regurgitating an existing novel
That's what most media is. (Which is not to be taken as a slam; a given culture has to repeat or it's not a culture.)
Am I going to absolutely bet that it would work? No. But I've done some work in this area, and I'll still contend that the distance between theory and praxis is much smaller than anyone wants to admit. The tooling would be much simpler than a full-blown AI.
Assisted AI apparently now designs our cars and our planes, designs our logos, designs our buildings, generates our music.
More significantly, un-assisted AI curates what we are exposed to in the form of what's shown to us in our feeds and suggestions, and judging by the articles in OP, already is capable of writing articles for us. AI voices and news casters seem to be getting better by the day.
I can't tell where this is going anymore.
If the routine and mindless tasks of humans are moved to automation, those humans are now free to actually create.
You say you use AI to generate mundane articles. Well, your mundane AI article about baseball and 10,000 other AI generated articles are competing with quality content written by real people.
There are people who are passionate about baseball, who went out to watch that baseball game and write about the passion behind every play and every ball, possibly interviewing the crowd and the players.
Your AI is probably learning from those articles and getting better at faking that passion. So much so that in a few years maybe people won't be able to tell the difference.
Photographs took up the mantle of depicting reality and painting became more about visual expression of feelings, perhaps.
I just hope we don't lose too many reporters who go to city council meetings and court hearings. Such places need a human present who understands the context of what is happening and can spot corruption as it occurs. The dark side of automation is where people figure out ways to game the system.
We have some interesting AI projects going for spotting corruption as it occurs here in Brazil, both government-backed and fully private ones.
I hope the financial reporters (or whatever they're called) would be free to do more digging into anomalies, and have more help spotting those anomalies, thanks to software.
Another good question might be whether there's an inherent gain in machine-generated news playing by the same rules and constraints as regular human-generated news. Is a traditional news article the correct format for machine-generated news, or are some of the constraints of traditional articles due to human limitations? Basically, which of the rules of traditional articles exist because of the creator, and which are there to make the information easier for the reader to consume?
Is there a gain in creating 1,000 articles on company earnings versus condensing or distributing (roughly) the same information in another manner? Mainly, would the reader gain more from news created in another format or medium?
What happens here: is it similar to when newspapers moved from print to the web, initially staying in almost the same format they had used for a hundred years?
When we talk about tools and intermediate forms in relation to journalism and the journalistic process, there's also the question of transparency. Is only the end product journalism, or is the end product only a medium for the message? For example, The Correspondent has been a vanguard on this front in traditional journalism, opening up the intermediate forms of the process to the public.
Should we consider, for example, the aforementioned "thousands of articles on company earnings reports from each quarter" as an intermediate form, rather than the journalistic end product, just a transparent intermediate form? If so, through that lens, is the article as a format still the best means of sharing this information?
"Real" journalists are the ones that go through various facts and other tidbits and realize that there is a story that is deeper than just the facts. They research and tell that story so that the reader can understand it and appreciate why it is important or insightful. That is something that a 'robo-journalist' won't be able to do for a while.
For example, you can easily write a script to publish descriptions of the local high school football game every Friday night. But a robot doing that won't recognize when one of the team's members is exceptional, or how a team has changed its tactics to increase its ability to win games, or the impact new facilities have had on the team's performance. Connecting those dots is still outside the realm of the possible for these things.
While there's been a lot of great stuff happening on the front of machine-generated news for a while now, data analysis is definitely another great target, with perhaps some more immediate gains, when it comes to AI / general automation in journalism.
Yet, the elephant in the room is the fact that more and more attention span is held by just a handful of platforms. If economic trends are any indication, the coming 'winter' will be rough for many firms that are competing for fewer dollars that aren't on Facebook or Google.
Interesting times ahead...