

Movie Scripts Ranked by Flesch-Kincaid Grade Level - dfkoz
http://dfkoz.tumblr.com/post/84628568976/movie-scripts-ranked-by-flesch-kincaid-grade-level

======
carlob
Wouldn't using a database of subtitles remove most of the bias?

~~~
serf
mostly. some subtitles contain Closed Captioning expressions like [Dogs
Barking] or [Crowd Murmurs].

It'd definitely be less biased than the full scripts+stage direction though.
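Something like this would do it — a rough sketch, stdlib only, assuming cues appear in square brackets or parentheses as in typical SRT files:

```python
import re

def strip_cc_cues(line):
    """Remove closed-captioning cues like [Dogs Barking] or (whispers)
    before feeding subtitle text to a readability scorer."""
    line = re.sub(r"\[[^\]]*\]", "", line)   # bracketed cues
    line = re.sub(r"\([^)]*\)", "", line)    # parenthesized cues
    return " ".join(line.split())            # collapse leftover whitespace

print(strip_cc_cues("[Dogs Barking] Get down! (whispers) Now."))  # -> "Get down! Now."
```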

------
anigbrowl
_Second, the analysis is unfairly favorable to scripts with detailed stage
directions, which tend to be more complex than dialogue._

Yeah, this basically makes it useless for me. Description can convey a mood
but there's no telling how that will translate to what the audience sees. I
used to like writing impressionistic, evocative descriptions; over time I
shifted towards a very minimal style where each sentence describes a single
action or visual subject.

------
waterlesscloud
Buckaroo Banzai is #2, and #1 by Fog score. Helps to have your main character
named "Buckaroo", I guess. And your MacGuffin named the Oscillation
Overthruster.

~~~
terranstyler
You could improve the score by mitigating the effect of words that occur
often, e.g., by dividing by the log(word frequency). This would cover both
names and effects of movie-specific language (think Star Trek).

------
sloak
Scripts let you do interesting analyses. Dialog, action (non-dialog), and
directions all have their own syntax. I once wrote a tool that let you pick
movies based on dialog complexity vs. the ratio of action to dialog.

You also have to be careful to use the shooting script. The scripts available
online are often many generations removed and quite different from the
actually filmed scenes.

~~~
andreasvc
That sounds interesting. Consider putting it up on Github or some such.

------
Donzo
I love the concept.

This could be improved by using multiple reading level tests and averaging the
results.

Kind of like this guy does with his open source project:

[https://readability-score.com/](https://readability-score.com/)

------
qwerty_asdf
I'm amazed that a talking head movie like Glengarry Glen Ross (0.11) ranks
lower than The Shining (0.73), with its long tracts of imagery and music.

[http://www.imsdb.com/scripts/Glengarry-Glen-Gross.html](http://www.imsdb.com/scripts/Glengarry-Glen-Gross.html)
[http://www.imsdb.com/scripts/Shining,-The.html](http://www.imsdb.com/scripts/Shining,-The.html)

I guess that's a testament to David Mamet's distinctive writing style, and his
technique of characterization (...and probably the stage directions, as
mentioned in the article).

Also, out of 956 movies, nothing ranks above 6th grade.

~~~
andreasvc
It's much more likely that it's a testament to the unreliability of this
superficial readability measure. A long sentence may be less complex than a
perhaps cryptic shorter sentence. A long, rare word may be more descriptive
than a long-winded circumlocution using common words.
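For context, the grade level being discussed is a purely surface measure: words per sentence and syllables per word, nothing else. A stdlib-only sketch (the syllable counter here is a heuristic, not what reference implementations use):

```python
import re

def syllables(word):
    # vowel-group heuristic; real tools use dictionaries or better rules
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentence)
    + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    wps = len(words) / len(sentences)                     # words per sentence
    spw = sum(syllables(w) for w in words) / len(words)   # syllables per word
    return 0.39 * wps + 11.8 * spw - 15.59
```

Since nothing in the formula looks at meaning, a short sentence of rare monosyllables scores as "simpler" than a long plain one.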

------
dfkoz
Author here: all of the comments are spot on. This is what I love about Hacker
News.

------
chbrown
Very cool. @dfkoz, did you do any pre-processing to remove script boilerplate
(e.g., I'm looking at Zero Dark Thirty and there's a lot of CUT TO ... CUT TO
... CUT TO) or try to extract dialogue vs. stage direction?

~~~
dfkoz
Thanks! I removed all of the ALL CAPS words to eliminate many instances of
character names and some stage directions.
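If anyone wants to replicate that step, a minimal sketch of the idea (assuming "ALL CAPS" means tokens of two or more capital letters, optionally ending in `.` or `:` to catch things like `CUT TO:`):

```python
import re

def drop_all_caps(text):
    """Strip ALL-CAPS tokens (2+ letters) such as character names and
    transitions like 'CUT TO:' before computing readability."""
    return " ".join(
        w for w in text.split()
        if not re.fullmatch(r"[A-Z]{2,}[.:]?", w)
    )

print(drop_all_caps("MAYA We'll never find him. CUT TO:"))  # -> "We'll never find him."
```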

------
dmourati
They should rename it the Fletch-Kincaid score.

[During a proctological exam] Fletch: You using the whole fist, Doc?

