
Stop Using the Flesch-Kincaid Test - polm23
https://andreadallover.com/2019/02/26/stop-using-the-flesch-kincaid-test/
======
victoro0
It's time people realized that a great many scientific papers are written
merely as a requirement to finish a master's or doctoral degree, or just to
meet the department's yearly publication quotas. I know very proficient
writers who probably haven't written a single article worth reading.

~~~
PeterStuer
While there are many things extremely wrong with the quantitative output
requirements of modern science, arguing that it is somehow a bad thing that "a
great deal of scientific papers are just written as a requirement to finish a
masters/doctors degree" is disingenuous.

For many, their PhD thesis is probably both the most laborious and the most
scrutinized piece of work they will ever produce.

------
tsimionescu
The idea of counting syllables for languages other than English reminded me of
a fun personal experience.

When I was in high school, I was at a camp with some American Peace Corps
students in my country. They were learning Romanian since they had been
spending some time here and were curious. At one point, one of them asked 'how
do you say "hug"?' to which we replied 'îmbrățișare' (uhm-bruh-tzee-shuh-reh,
approximately). They were taken aback, and after a pause, jokingly quipped
'don't you have a shorter word? Like... Hug?'.

~~~
jacobush
"embrace" it sounds like

~~~
vidarh
That's because Romanian is a Romance language, descended from Vulgar Latin
after the Roman colonization of present-day Romania.

Both derive (through intermediaries, at least in the case of English) from
the Latin prefix "in-" and "bracchia" (arm).

Also compare to modern Italian "abbracciare".

------
cthor
The author's criticism that Flesch-Kincaid isn't suitable for non-English text
is on point, but history hasn't been kind to critics of readability scores in
general: [http://www.impact-information.com/impactinfo/newsletter/smartlanguage02.pdf](http://www.impact-information.com/impactinfo/newsletter/smartlanguage02.pdf)

The usual counter-arguments against their use in general ("but what about
context!", "they're clearly too simple!") have been retread for years.
Readability formulas are surprisingly robust, although obviously weak to
adversarial input.

~~~
tgv
Not just adversarial. FK simply uses word length as a proxy for frequency,
which is obviously wrong: it rates "crocodiles" and "elephants" as harder than
"gybe" and "vaunt". It also doesn't acknowledge that embedded clauses are more
difficult than a "sequence" of clauses, and so on. In practice it'll often
point in roughly the right direction, but you need to take the outcome with a
pinch of salt.
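
The word-length-as-frequency complaint is easy to demonstrate. Below is a
minimal sketch of the FK grade formula (0.39 * words-per-sentence + 11.8 *
syllables-per-word - 15.59) with a naive vowel-run syllable counter; the
counter is my own rough heuristic, not any standard implementation:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels, minus a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def fk_grade(text: str) -> float:
    """FK grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

# Long but common words are rated harder than short but rare ones:
print(fk_grade("The crocodile and the elephant ran."))  # about grade 6
print(fk_grade("We gybe and vaunt and tack."))          # negative "grade"
```

Long, familiar words push the grade up; short, obscure ones pull it down,
which is exactly backwards as a measure of difficulty.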

Here's an almost random, older paper that discusses readability score
differences between genres:
[http://csjarchive.cogsci.rpi.edu/Proceedings/2008/pdfs/p1978.pdf](http://csjarchive.cogsci.rpi.edu/Proceedings/2008/pdfs/p1978.pdf).
Their conclusion:

> ... the analyses confirmed that several frequently used approaches for
> measuring vocabulary difficulty tend to be structured such that resulting
> text difficulty estimates overstate the difficulty of informational texts
> while simultaneously understating the difficulty of literary texts. These
> results can be explained in terms of the higher proportion of “core”
> vocabulary words typically found in literary texts as opposed to
> informational texts.

~~~
mannykannot
To be fair, these measures are most useful when we are considering alternative
ways to state a given set of straightforward facts. It is unlikely that one
version will use 'crocodile' and another will use 'gybe'... which, on the
other hand, is a good reason for regarding them as merely a first step in
studying political speech.

I wonder what William Buckley would have thought of the claim that the
language of conservatives is simpler than that of liberals?

~~~
tgv
Even then, many of these metrics will prefer "the rat the cat the dog bit
chased escaped" over "the dog bit the cat that chased the rat that escaped".
They are very, very rough measures, to be used only in case of emergency.
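
A quick way to see this: treat every word as one syllable (close enough for
this vocabulary), so that the only signal FK has left is length. A toy sketch
of the FK grade along those lines:

```python
import re

# FK grade = 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59.
# With syllables fixed at one per word, only sentence length matters,
# and the center-embedded garden-path sentence is the shorter one.
def fk_grade(sentence: str) -> float:
    words = len(re.findall(r"[a-z]+", sentence.lower()))
    return 0.39 * words + 11.8 * 1.0 - 15.59

nested = "the rat the cat the dog bit chased escaped"
flat = "the dog bit the cat that chased the rat that escaped"
print(fk_grade(nested) < fk_grade(flat))  # True: the harder sentence scores "easier"
```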

> the language of conservatives is simpler than that of liberals?

Times have changed.

------
deathwarmedover
The title is perhaps missing "... for spoken and/or non-English sources,
preferably not at all".

If we should stop using this test, what should we start using? In the author's
comment on the study, they noted "There are ways to study linguistic
complexity".

I'm aware, for example, of this Python project, which provides F-K scores
along with 7 other readability metrics:
[https://pypi.org/project/py-readability-metrics/](https://pypi.org/project/py-readability-metrics/)

------
yebyen
I read the headline and said "I don't even know what a Flesch-Kincaid test
is." Then I read the article, and realized that I actually do use this test
every day. TIL

The Boomerang extension for Gmail applies a test, in the free version, to give
you a readability grade level. It always scores me at 12+, and I try to do
better (a lower score), because I know that nobody will read my emails and
I'll have wasted my time unless I make them short and to the point.

And now I know the name of the test. Cool.

------
nooyurrsdey
> Liberals lecture, conservatives communicate: Analyzing complexity and
> ideology in 381,609 political speeches

At what point does it become irresponsible to put out studies with this kind
of title?

All this does is create two warring factions: one side that is bolstered by
the claims and feels superior, and another that feels attacked and loses some
faith in the institution of studies like this.

Furthermore, these are rarely ever conclusive. They take a topic like
understanding and interpreting linguistics (an already incredibly difficult
field of study) and boil it down to a dinner-table quip.

But the damage is done. This clickbait will be freely shared on social media
and consumed by unwitting participants.

~~~
jfengel
So much popular-science reporting is abysmal that I have to flip it around:
assume the study is going to be misreported and just accept that as a given. I
don't want that certainty to factor into the choice of what research gets
done.

Politics is hard to study, but it's important. If the results are valid,
people will apply them, and that has a direct effect on people's lives. The
data may be messy and the theory doubtful, but if there's any measure of merit
to it, it can gain a small advantage that can be the critical difference. And
hopefully, each subsequent study builds evidence for a paradigm shift that
makes the theory less dubious over time.

------
waisbrot
Imagine applying F-K to code. Since it's based around syllables, we'd discover
that Perl5 code I write on the command-line is some of the most readable code
out there.

------
undecisive
I feel like the author of this criticism is screaming into the void. I have no
doubt that the study is probably flawed, and likely has not proven beyond
doubt any of the hypotheses it has put forward. While FK may not be a great
test of linguistic complexity, it is a test.

To truly annul the study, the author needs a smoking gun - for example:

- Take a couple of the source texts that score wildly far apart.

- Rework the punctuation such that an objective reader still agrees that the
transcription is a valid representation of the original speech.

- Show that the readability scores in the re-punctuated texts are inverted,
or at the very least fall within a small percentage of each other.

It's a bit like trying to measure the happiness of a city by counting the
percentage of smiling people as they walk down the high street. There are many
social reasons a person might smile from the depths of misery, different
people might classify a smile differently, and in different cultures walking
down the street with a grin might be either frowned upon or mandated. Weather,
time of day, month of year, public holidays, and so on will all affect the
results.

But even if you work hard to eliminate these biases, you still couldn't use
such a measure to test whether Beijing is happier than Belfast, though you
might use it as an indicator of whether left-handed people are happier than
right-handed people. You can't tell whether men are happier than women (social
pressures will influence facial indicators along gender lines from an early
age in many cultures), but you might be able to begin a conversation on
whether conservatives are happier than liberals. It won't be conclusive proof,
but as a starting point for future research it still holds some validity.

My point is that nothing in this criticism conclusively proves that this
particular measure is unsuited to the stated outcomes. Even a measure that is
the current darling of the linguistics community will have its drawbacks; the
author needs to show that the weaknesses in the measure introduce a bias of
their own, not just that such measures are generally deficient.

To my mind, the better criticism is intent. A person can - and should - craft
a speech based on the audience. A speech given at a university or research
facility should have a different complexity score than one given at a rally or
disaster zone. The interesting question to my mind is whether more effort is
put into this by conservatives or liberals, and then comparing that to those
in power vs their contenders.

But then, that paper wouldn't have such a catchy title, would it?

~~~
ykts
The smoking gun you're looking for is in the Language Log posts linked from
the blog post. In particular,
[https://languagelog.ldc.upenn.edu/nll/?p=21847](https://languagelog.ldc.upenn.edu/nll/?p=21847)
shows how trivial changes to punctuation can inflate a paragraph's score from
4.4 to 12.5.
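
The effect is easy to reproduce. A rough sketch of the FK grade formula,
holding syllables at one per word so that punctuation is the only variable
(an illustration of the mechanism, not the scoring the Language Log post
used):

```python
import re

def fk_grade(text: str) -> float:
    """FK grade with syllables fixed at 1/word, isolating sentence segmentation."""
    sentences = max(len(re.findall(r"[.!?]", text)), 1)
    words = len(re.findall(r"[A-Za-z']+", text))
    return 0.39 * words / sentences + 11.8 * 1.0 - 15.59

# The same words, transcribed with different punctuation:
as_periods = "We came. We saw. We spoke. We left. We won."
as_commas = "We came, we saw, we spoke, we left, we won."
print(fk_grade(as_commas) - fk_grade(as_periods))  # roughly 3 grade levels apart
```

Nothing about the speech changed; only the transcriber's choice of commas
versus periods did.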

