

AI Sports Journalist Covers Every Division I College Basketball Team - kkleiner
http://singularityhub.com/2010/11/17/ai-sports-journalist-covers-every-division-i-college-basketball-team/

======
cryptoz
The future. It has arrived. I remember reading a few years ago that sports
writing is very formulaic and _should_ be possible to automate. At the time,
nobody had really done it...but here we are. It's fantastic to see all this
"weak AI" out there, making people money and building a reputation for _being
better than most humans_ at a task previously thought to be human-only.

Are we slowly walking directly towards strong AI, with all our little tiny
steps of weak AI?

I can't wait.

~~~
JonnieCache
Surely strong AI is by definition such a paradigm shift that it cannot be
achieved with lots of little steps of weak AI? I thought that was kind of the
point of the strong/weak AI dichotomy.

~~~
jerf
A definition has no power to change the engineering feasibility of a task. If
it is possible to build a strong AI from a series of weak AI steps, which is
certainly not out of the question (one can argue that's how human brains
work), no amount of definition will change that.

~~~
JonnieCache
However if we accept the definition, then it is surely meaningless
(pointless?) to discuss AI progress in those terms.

EDIT: what I really mean is that I reject the strong/weak AI dichotomy. I
really need to resist the urge to play pointless semantic games. Guess that's
an occupational hazard of programming all day long.

------
DevX101
This is really amazing. I love data, but the real value comes when I'm able to
translate that data into a story. Programs like this are going to become
increasingly important as we become deluged with data but don't know what it
means.

All that being said, I want to start hacking around on something like this in
a different domain. Any tips on how to get started on the data-to-language
transformation? Code or academic articles welcome.

------
dlytle
I never understood what was interesting about sports or sports journalism
until I moved back to Nebraska.

I've been watching Nebrasketball, the mandatory Huskers games in the evenings,
and roller derby, all with a group of very sports-aware people and one
aspiring sports journalist. His commentary on the football/basketball games
converted me from a bored observer to actually enjoying the sports.

I'm impressed by what this site is generating, but I'm not too worried about
it displacing actual writers. It's the difference between a local news
affiliate press release, and an article in the New Yorker or the Atlantic, or
BBC reporting. There's still something that a passionate human can bring to
the equation that will be almost impossible to automate.

That said, StatSheet is really damn impressive.

~~~
RobbieStats
I agree that we aren't replacing anything. There will never be one source for
a particular topic (although our content is significantly less expensive to
produce).

But I wouldn't underestimate the potential for automated content to be even
better than what the best journalists can produce. We are at the infant stage
of automated content. We have a long way to go and it will only get better.

~~~
JoelSutherland
You've already surpassed Adrian Wojnarowski, but computers have a long way to
go to replace even average humans:

[http://www.truthaboutit.net/2010/11/wizards-claw-
raptors-109...](http://www.truthaboutit.net/2010/11/wizards-claw-
raptors-109-94-the-gilbert-arenas-hockey-assist.html)

That's a bit of a joke, but the real value automation will provide is not the
translation of statistics into narrative, it is providing the right
statistics. I can honestly say that your generated stories are better than
most AP articles in that regard.

Have you looked at computing any advanced basketball statistics? Highlighting
those and turning them into narrative could go a long way toward raising
awareness about them in the general (sports) public.

~~~
RobbieStats
Yes, we compute a bunch of advanced stats for statsheet.com. I've
intentionally shied away from doing too much "advanced" stuff on the
individual sites because I want them to be consumable by the average fan. Some
feedback we get about statsheet.com is that it can be overwhelming.

That said, we'll be adding more unique stuff over time. We started doing a
"Fan Satisfaction" chart: [http://carolinaupdate.com/north-carolina-
basketball/fan_sati...](http://carolinaupdate.com/north-carolina-
basketball/fan_satisfaction) that attempts to objectively measure the
subjective notion of "fan satisfaction".

------
snorkel
Expectations are high that this year’s <% TEAM_NAME %> team is an improvement
on last year’s. They’ll be bringing back a group that played <% TOTAL_MINS %>
of last season’s minutes and adding the efforts of <% NUM_TOP100_PLAYERS %>
Top 100 recruits that will *###>>>#> RISE UP AND KILL ALL LIFEFORMS ###>> ...
losing in the Championship game to <% FINAL_OPPONENT %>

~~~
RobbieStats
Not quite that easy I'm afraid :-)

~~~
snorkel
Are you sure about that?
[http://www.google.com/#q=%22Expectations+are+high+that+this+...](http://www.google.com/#q=%22Expectations+are+high+that+this+year%27s%22+%22is+an+improvement%22)

~~~
kenjackson
Not quite that easy. Notice that the following sentence is often different.
What I suspect they do is have a set of characteristics they look for, like
winning team, number of starters coming back, all starters coming back, won
conference, made NCAA, made final four, won it all, etc...

And then they have a set of sentence structures they can plug in for these.
This allows you to actually generate a different story for the exact same team
-- saying really the exact same thing.
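The approach guessed at above (detect characteristics, then plug in one of several interchangeable sentence structures for each) could be sketched roughly like this. It's only a minimal illustration under assumptions: the `TeamSeason` fields, the characteristic names, and the templates are all hypothetical, not StatSheet's actual system.

```python
import random
from dataclasses import dataclass


@dataclass
class TeamSeason:
    # Hypothetical fields; StatSheet's real data model isn't described here.
    name: str
    wins: int
    losses: int
    returning_starters: int
    won_conference: bool
    made_ncaa: bool


# Each characteristic maps to several interchangeable sentence structures
# that say the same thing in different words.
TEMPLATES = {
    "all_starters_back": [
        "{name} returns all five starters from last season.",
        "Every starter is back for {name} this year.",
    ],
    "some_starters_back": [
        "{name} brings back {returning_starters} starters.",
        "{returning_starters} starters return for {name}.",
    ],
    "winning_record": [
        "{name} went {wins}-{losses} last season.",
        "{name} finished {wins}-{losses} a year ago.",
    ],
    "won_conference": [
        "{name} is coming off a conference title.",
        "{name} won the conference title last year.",
    ],
    "made_ncaa": [
        "{name} reached the NCAA tournament last season.",
        "{name} made the NCAA field a year ago.",
    ],
}


def characteristics(team):
    """Return the characteristic keys that apply to this team, in story order."""
    found = []
    if team.returning_starters == 5:
        found.append("all_starters_back")
    elif team.returning_starters >= 2:
        # Zero or one returning starter gets no sentence in this sketch.
        found.append("some_starters_back")
    if team.wins > team.losses:
        found.append("winning_record")
    if team.won_conference:
        found.append("won_conference")
    if team.made_ncaa:
        found.append("made_ncaa")
    return found


def preview(team, seed=None):
    """Build a preview by picking one template per applicable characteristic."""
    rng = random.Random(seed)
    fields = vars(team)
    return " ".join(
        rng.choice(TEMPLATES[key]).format(**fields) for key in characteristics(team)
    )


team = TeamSeason("State U", 20, 11, 3, True, True)  # made-up team
print(preview(team, seed=1))
print(preview(team, seed=2))  # same facts, possibly different wording
```

Changing the seed only changes which sentence structure is picked for each characteristic, so the facts stay fixed while the wording varies.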

------
swombat
The lesson for journalists here is: if you get into the habit of writing
formulaic garbage, you WILL be replaced by a small shell script.

We're a long way from top writers like Hunter S Thompson (who was a sports
journalist, among other things) being replaced by programs.

Even far less outstanding writers still provide something valuable, by, for
example, extracting knowledge and insight out of the data, rather than just
regurgitating data in a human-readable format. If your job is to publish hard
facts without insight or analysis, though, your job is toast.

------
ilamont
Bravo! This type of technology will free up sports journalists to concentrate
on the areas where they can really add value, such as interviews, analysis,
and "color" that the algorithms cannot generate.

A word of caution, though: The potential for bad data to enter the system and
get published and spread without a human editor to correct it could cause real
problems for teams and athletes. It will require tightening up on the sources
that are used, redundancy of sources, and checking mechanisms.
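One simple form of "redundancy of sources, and checking mechanisms" is a majority vote: pull the same fact from several independent feeds and publish only when enough of them agree, otherwise hold it for a human editor. A minimal Python sketch, assuming each feed reports the same field; the function name, the `min_agree` threshold, and the sample feed values are hypothetical.

```python
from collections import Counter


def reconcile(values, min_agree=2):
    """Return the value enough sources agree on, or None to route it to a human.

    `values` holds one reading per source (None if a source had no data).
    A value is accepted only if it is the unique most common reading and at
    least `min_agree` sources report it.
    """
    counts = Counter(v for v in values if v is not None)
    ranked = counts.most_common()
    if not ranked:
        return None
    value, n = ranked[0]
    # A tie for first place means the sources genuinely disagree.
    if len(ranked) > 1 and ranked[1][1] == n:
        return None
    return value if n >= min_agree else None


# Hypothetical readings of one team's record from independent feeds.
print(reconcile(["7-3", "7-3", "6-4"]))  # two of three agree, so "7-3" is accepted
print(reconcile(["7-3", "6-4"]))         # tie, so None: hold for an editor
```

The design choice is to fail closed: disagreement or too few sources produces no automated claim at all rather than a guess.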

------
gcampbell
I'm a Stanford fan, so here's a (nit-picky?) deconstruction of
[http://cardinalupdate.com/stanford-basketball/game-
recap/sta...](http://cardinalupdate.com/stanford-basketball/game-
recap/stanford-wins-64-48-over-san-diego), the recap of the team's first
game.

"Stanford has already started living up to monumental expectations with a good
first game": I'm not sure where the "monumental expectations" comes from;
Stanford finished below .500 last year and was picked 9th in the Pac-10
preseason media poll.

"Stanford defeated San Diego with domination in offensive rebounding and an
additional beat in possession management.": Not sure what "an additional beat
in" means here.

"Stanford has incredible expectations for this season, and this victory over
Cardinal was a good start.": Should be "over the Toreros" or "for the
Cardinal".

Overall, though, it's pretty slick.

~~~
RobbieStats
Two of those three have been fixed now. The first (around expectations)
requires a larger tweak which we are making.

The point to keep in mind is that we are at the infant stage of automated
content. Unlike most writers, our content will get significantly better over
time.

------
lsb
At a hospital that my dad programs for, the radiologists were using medical
transcription software to translate "normal foot, normal shoulder" into
doctor-speak, and the pre-canned phrases were too repetitive to be pleasant,
so they ended up with 50 ways of expanding "normal", and that was acceptable.
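The "50 ways of expanding normal" trick is basically phrase rotation: deal expansions from a shuffled pool so the same wording doesn't show up back to back. A minimal Python sketch; the phrase list, class, and function names are made up for illustration and are not the hospital's actual software.

```python
import random

# A small, made-up pool; the radiologists reportedly settled on about 50.
NORMAL_PHRASES = [
    "No acute abnormality is identified.",
    "Findings are within normal limits.",
    "The study is unremarkable.",
    "No significant abnormality is seen.",
]


class PhraseRotator:
    """Deals phrases from a shuffled pool, like a deck of cards.

    No phrase repeats within one pass through the pool; after the pool is
    exhausted it is reshuffled, so a repeat can only happen at that boundary.
    """

    def __init__(self, phrases, seed=None):
        self._phrases = list(phrases)
        self._rng = random.Random(seed)
        self._deck = []

    def draw(self):
        if not self._deck:
            self._deck = self._phrases[:]
            self._rng.shuffle(self._deck)
        return self._deck.pop()


def expand_normals(body_parts, rotator):
    """Turn terse 'normal <part>' dictation into one varied sentence per part."""
    return [f"{part.capitalize()}: {rotator.draw()}" for part in body_parts]


rotator = PhraseRotator(NORMAL_PHRASES, seed=42)
for line in expand_normals(["foot", "shoulder"], rotator):
    print(line)
```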

This looks like a pretty cool way to interpret stats; I wonder if they could
automate Tufte-quality graphics of how the team performed, annotating graphs
with news excerpts.

------
akkartik
The original robot in Asimov's "I, Robot" short story was called Robbie.

OMG, statsheet has robot journalists _and_ a robot CEO!

~~~
RobbieStats
Can you tell me more about "robot CEO"? :-)

~~~
julianz
Well played :)

------
scotch_drinker
As someone who has long had an interest (my wife would call it an obsession)
in transforming the basic stats of sports into something more useful and
meaningful, this is fascinating stuff. I think anyone that reads the game
previews and recaps on NBA.com, NFL.com or any other major league site can see
where things could be automated. That writing IS largely formulaic and
forgettable. It would be very interesting if instead of just doing the same
thing through automation, it produced innovative stats and interpreted them in
ways that gave the reader insight into the outcome of the game.

My current pet project is taking NBA data and transforming it into something
beyond points per game, etc. I think that's where the future of sports related
AI resides.
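One common step beyond points per game is adjusting for pace: estimate possessions from the box score and report points per 100 possessions. A minimal Python sketch, assuming team box-score totals are available; the `TeamBox` field names are hypothetical, and the 0.44 free-throw factor is the widely used convention, not anything specific to StatSheet or this project.

```python
from dataclasses import dataclass


@dataclass
class TeamBox:
    # Team totals from one game's box score (field names are hypothetical).
    pts: int  # points
    fga: int  # field goal attempts
    fta: int  # free throw attempts
    orb: int  # offensive rebounds
    tov: int  # turnovers


def possessions(box, ft_factor=0.44):
    """Estimate possessions with the common box-score formula
    FGA - ORB + TOV + ft_factor * FTA.

    The free-throw factor approximates the share of free-throw attempts that
    end a possession; 0.44 is a widely used value, but it is a convention,
    not a constant.
    """
    return box.fga - box.orb + box.tov + ft_factor * box.fta


def offensive_rating(box):
    """Points scored per 100 estimated possessions."""
    return 100 * box.pts / possessions(box)


# Made-up game totals for illustration.
game = TeamBox(pts=75, fga=60, fta=20, orb=10, tov=12)
print(f"{possessions(game):.1f} possessions")
print(f"{offensive_rating(game):.1f} points per 100 possessions")
```

Normalizing by possessions lets a slow, defensive team and a fast-breaking team be compared on the same scale, which raw per-game averages hide.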

~~~
yesno
Mind if I ask you where you can get these data?

I'm a semi hard-core NBA fan.

~~~
mbergins
I'd like such data as well, email in profile.

------
bryanh
How interesting, I was just reading some articles [1] [2] about learning to
think like a programmer and program. Both mention how beneficial it is for
journalists to code and think like a coder.

While they mainly talk about saving time with repetitive tasks or
compiling/understanding complex data, _I suppose now it's good for job security
too_...

[1] [http://infovore.org/archives/2009/01/22/learning-to-think-
li...](http://infovore.org/archives/2009/01/22/learning-to-think-like-a-
programmer/)

[2] <http://www.charlesarthur.com/blog/?p=1098>

------
smackfu
Article says: "Many games, though not all, get detailed write ups. Computer
programs turn data from box scores into full sentences that put the reader in
the game. "

I have yet to find a game that has a write-up, even for top 25 games. Am I
missing the link or something?

Because just automated stats doesn't seem that impressive or very AI.

~~~
RobbieStats
Actually we generate game recaps for EVERY division I college basketball team.
Which one did you not see it for?

Here is Duke: <http://bluedevildaily.com>

~~~
smackfu
Oh, I didn't realize that the per-team domains have different stuff than what
is on StatSheet.com. I just went to the site, clicked on College Basketball in
the header, and then picked my team.

Basically, I was expecting to see a generated recap on this page:
[http://statsheet.com/mcb/games/2010/11/12/rutgers-73-princet...](http://statsheet.com/mcb/games/2010/11/12/rutgers-73-princeton-78)

~~~
RobbieStats
And it should be there! We have a lot to do and integrating the team site
content within statsheet.com is a top priority.

------
iampims
If automating sports news is first, I predict that finance will come next.

~~~
rorrr
Seriously, most finance "articles" already look like they were written by
bots. I have no idea who reads garbage like this:

[http://www.tradershuddle.com/20101117119131/Stocks/pre-
marke...](http://www.tradershuddle.com/20101117119131/Stocks/pre-market-
update-gapping-stocks-f-urbn-c-fcx-hd.html)

[http://collegestock.com/blog/885-metal-stocks-slumped-
alcoa-...](http://collegestock.com/blog/885-metal-stocks-slumped-alcoa-inc-
cenx-nor-fcx/)

There's literally no useful info, just data written as words.

~~~
ankimal
HFT = High-Frequency Trading?

~~~
JonnieCache
Yep. They do natural language parsing of wire news stories to try and
synthesize market intelligence. Which is one day going to lead to a
civilization-ending feedback loop due to the sort of thing we're discussing
here (in my paranoid hyperbolic opinion :)

------
nkassis
I don't know where they get their data, but I just checked my team's (FSU)
football page and they have a bug in the record (6-4 instead of 7-3). Not too
impressed there.

------
joezydeco
If the bot is willing to make Taiwanese-style CG animations of the highlights,
I'm all for it.

------
herdrick
Obviously the big win here is in the long tail: high school sports, rec
leagues, etc.

------
omouse
Ugh this obsession with AI is annoying.

