
Paving the way for human-level sentence corrections - elizacassan
https://tech.grammarly.com/blog/paving-the-way-for-human-level-sentence-corrections
======
aisofteng
The problem that the authors are trying to tackle is an interesting and
difficult one.

I have noticed that when dealing with natural language using artificial
intelligence / machine learning techniques, the work being done by computer
scientists very often would have greatly benefitted from collaboration with a
linguist or other sort of language expert, especially in the design phase of
an experiment. This work is a good example of what I mean.

People trained in CS or similar precise fields develop, over time, a tendency
of thinking in terms of "getting the right result" (I say this as one of these
people). When dealing with natural language, however, sometimes there simply
is no single correct result.

Consider the topic of fluency that the authors work on: is there a rigorous,
objective definition of "fluent"? The answer, as any linguist would tell you,
is "no". There are idiomatic expressions, grammatical structures,
contractions, slang, and so on that vary from city to city within a country,
let alone globally. What may sound "fluent" to one native speaker of a
language may sound strange to another. It is impossible to evaluate
"fluency" objectively in general. In particular, any practicing linguist will be
able to give examples, likely off the top of their head, of English sentences
that would be rated as "fluent" by someone from one geographical area and
"awkward" by someone from another.

Furthermore, using Mechanical Turk to find humans to rate the fluency of a
particular sentence makes for an unclean dataset and evaluation benchmark. The
linked post says that, in the end, 50 people found via Mechanical Turk rated
sentences for fluency; since any one language is used significantly
differently around the globe, there will be an unpredictable range of fluency
ratings for at least some sentences across just 50 people around the world.
Choosing a different 50 people to rate the same sentences would most likely
result in different fluency ratings.
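The sampling concern can be made concrete with a toy simulation (entirely
illustrative; the panel size matches the post's 50 raters, but the 1-5 scale,
regional bias, and noise model are my own assumptions, not anything from the
article):

```python
import random

# Toy model: two panels of 50 raters score the same sentence on a
# 1-5 fluency scale, but the second panel is drawn from a region
# whose speakers find the construction more natural (higher bias).
def panel_mean(seed, regional_bias, n_raters=50):
    rng = random.Random(seed)
    ratings = [min(5, max(1, round(rng.gauss(3 + regional_bias, 1.0))))
               for _ in range(n_raters)]
    return sum(ratings) / len(ratings)

mean_a = panel_mean(seed=0, regional_bias=0.0)
mean_b = panel_mean(seed=1, regional_bias=0.8)
print(mean_a, mean_b)  # the two panels disagree on the same sentence
```

Swapping in a different seed (i.e. a different panel of 50) shifts the mean
rating, which is exactly the instability described above.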

I do not mean to detract from the authors' work; this is a difficult problem
to tackle, with no clear path to a general solution. However, I am forced to
wonder why the authors, who, based on their biographies linked to in the
article, seem to have a range of experience, did not comment on the
considerations I've mentioned here.

~~~
matt4077
Maybe they went to ten linguists, and all they got as an answer was "there is
no objective definition of 'fluent'. You are trying to find a single correct
result that doesn't exist!"

Then, armed with the naiveté of thinking that if there is something like
'fluency' it must be possible to measure it, they just threw a bit of money at
the problem. Note that asking a representative group of people is the closest
you can get to exactly what you want to measure (apart from asking everyone).
It doesn't matter that there's no agreed-upon method to measure the quality of
pizza: if I maximise the subjective impression, I'll get exactly what I
wanted.

------
matt4077
I don't quite get this... According to the bar graph, the automated systems
fail to correct around half of even orthographic (spelling) mistakes. One
example is "advertissment", which macOS is now trying really hard to correct
against my will in this text area.

Another example is "From this scope, social media has shorten our distance",
where "scope" is supposed to be "perspective". That seems to be something that
machine learning should easily pick up on, and indeed, when I just tried it on
Google Translate, I couldn't get it to make this mistake without my original
(German) sentence also becoming awkward.
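For what it's worth, even a bare-bones edit-distance lookup handles the
"advertissment" case. A minimal sketch, assuming a tiny illustrative
vocabulary rather than a real dictionary (and nothing like the systems
compared in the post):

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Illustrative vocabulary; a real corrector would use a full lexicon.
VOCAB = ["advertisement", "advertise", "adverse", "statement"]

def correct(word):
    # Return the vocabulary entry closest to the misspelled word.
    return min(VOCAB, key=lambda w: edit_distance(word, w))

print(correct("advertissment"))  # advertisement
```

This is only a baseline: it fixes orthography, not word-choice errors like
"scope" for "perspective", which need context.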

So I'm unsure how much value there is in winning against systems that fail
rather spectacularly. I also don't quite understand why you would need
manually created data for this task, instead of just buying everything ever
written for TOEFL essay questions and pitting it against the New York Times's
archive.

It's obviously quite likely that there are good reasons for all this. They may
have thought a bit longer about it than I just did.

------
cooper12
Hmm, I'm sensing a bit of garbage-in, garbage-out here. For starters, their
original sentences contain unlikely typos instead of homonyms, which would be
much more common. (It complicates the learning as well, I'm sure, since some
changes were made to correct similarly spelled terms, which could really
change a sentence's meaning once applied.) Second, the human corrections
aren't that good. We really need to stop creating datasets using anonymous
exploited labor that is paid pennies. (They did screen the Amazon Turk users,
but if you live in America or work at a university, is there really a shortage
of fluent English speakers around you?) Overall, I'd say the fluency-editing
approach shows promise and would be a boon to ESL learners, but the training
data needs to be improved.

------
lutusp
This is a great project -- in Phase One, the algorithm will correct sentences
written by people who didn't learn basic literacy in school and who
subsequently endeavor to avoid reading or writing any text, preferring video.
In Phase Two, the algorithm will do away with the poorly written source and
create something entirely on its own. Based on my sampling of contemporary
human-crafted sentences, Phase Two will take place just in time.

Apropos, my all-time favorite malapropism took place 50 years ago when I was a
teenage TV repairman. I visited a household, spied a record turntable, and
asked, "Is that a stereo turntable?" "No," replied the customer, "It's
_monorail_."

I was able to avoid blurting out, "I think you mean _monaural_, yes?" -- for
three reasons. One, it's regarded as bad form to correct the grammar of
customers, who are always right. Two, technically, the turntable was in fact
monorail (i.e. able to follow only one recorded track). Three, I was too busy
trying not to laugh.

~~~
kwhitefoot
> In Phase Two, the algorithm will do away with the poorly written source and
> create something entirely on its own. Based on my sampling of contemporary
> human-crafted sentences,

A problem with this is that there will be a tendency for it to become
normative. This is what happened to the OED. Originally it was an etymological
dictionary of the usage of English. Now it is regarded as an arbiter of
'correct' English.

~~~
lutusp
> A problem with this is that there will be a tendency for it to become
> normative.

Yes, true. It would turn description into prescription, but we're already
approaching that point. I'm not advocating this, only mentioning it.

> This is what happened to the OED. Originally it was an etymological
> dictionary of the usage of English. Now it is regarded as an arbiter of
> 'correct' English.

I suspect those behind the OED would deny that as a goal, while acknowledging
it as an outcome.

I have a little fun with people who think dictionaries prescribe correct
usage, by pointing out that, according to current dictionaries, "literally"
and "figuratively" mean the same thing. This is true because that's how people
use the words, and a dictionary's purpose is to dispassionately record how
people use words, without judgment or rancor.

This is why "reign it in" (now seen regularly) will become an accepted
substitute for "rein it in" -- people want to say it that way, so be it.
_Reigning_ is what a monarch does to a kingdom, _reining_ is what a cowboy
does to a horse, but people are free to say what they want.

------
saagarjha
This was an interesting read for someone unacquainted with the field; it
appears to be very difficult to fix "awkwardness" in sentences, since none of
the methods were able to reduce it significantly. It looks to me like
awkwardness is based more on common usage than on actual grammar; perhaps this
could be improved with a solution similar to Google Translate's, which looks
at real-world usage instead of syntax?

~~~
839083
Real-world usage would have to be curated, though; awkward sentence
constructions or word choices do happen in real-world usage. Or, as the
article shows, there can be multiple, very different ways of fixing
awkwardness. I'm not sure what it would look like to find a solution that's
"fitted" to several of these.

------
mannykannot
Fluency is nice, but semantics matter the most.

------
jwilk
s/are comprised of/are composed of/

~~~
3131s
Are you saying that "are comprised of" is not grammatically correct in its
context in the article? Why?

~~~
jwilk
[https://en.wikipedia.org/wiki/Comprised_of#Evaluation](https://en.wikipedia.org/wiki/Comprised_of#Evaluation)

------
throwayedidqo
I have a feeling this is one of those places where ML will not be useful until
we have strong AI.

Certain grammatical errors are impossible to fix unless you understand the
overall meaning of the text. Sometimes this meaning is embedded over many
paragraphs. Errors involving incorrect word usage are unsolvable when words
have more than one meaning and you don't comprehend the subject at hand.

~~~
mack73
Once we have strong AI, whatever that buzzword means, what then would be the
usefulness of understanding slang?

Personally, I think the usefulness is already to be able to interpret a
concept encoded in slang as the same as the concept derived from a message
encoded in a different dialect (or language).

I would never assume a machine spoke this language, only that it understood
it. Machines should evolve into speaking succinctly, so as not to include
unnecessary complexity in their messages, as they would strive to be well
understood like all other persons do. I fail to see why we would want to
produce slang-encoded messages, unless we want to mask the fact that we are a
machine.

~~~
throwayedidqo
Ambiguous messages do not imply slang. Plenty of words have multiple meanings
in normal and formal English. It's a much worse problem in tonal languages
like Chinese. Tell me how you could grammatically correct this without
understanding meaning:
[https://en.m.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_Den](https://en.m.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_Den)

"Strong AI" isn't a buzzword either; it's been in use for as long as I can
remember. Maybe you would be able to understand my grammar better if I said
"superhuman general intelligence" and wasted a bunch of space in the process.

I don't think you read my comment? You seem to imply that the corrections
would be unambiguous, while my point was that some errors are uncorrectable
without understanding meaning.

~~~
3131s
> _Plenty of words have multiple meanings in normal and formal English._

There are some stats from WordNet on polysemy in English. Obviously this
depends on the granularity of a dictionary's sense inventory, but regardless,
English has many polysemous words (26,000+ according to WordNet). More
importantly, these polysemous words also tend to be the most common words;
hence words like "set" have around 120 definitions in the Oxford English
Dictionary.

[https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html#s...](https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html#sect3)
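The "many senses, and the common words have the most" point is exactly what
makes naive correction hard: the right sense depends on context. A minimal
sketch of the simplified Lesk idea (pick the sense whose gloss shares the
most words with the surrounding sentence); the two-sense lexicon and glosses
below are invented for illustration, not taken from WordNet:

```python
# Toy sense inventory: each sense is (label, gloss). Glosses are
# made up here; a real system would pull them from WordNet.
TOY_SENSES = {
    "set": [
        ("collection", "a group or collection of related things"),
        ("adjust", "to adjust or fix an instrument or device"),
    ],
}

def lesk(word, sentence):
    # Simplified Lesk: score each sense by gloss/context word overlap.
    context = set(sentence.lower().split())
    best, best_overlap = None, -1
    for label, gloss in TOY_SENSES[word]:
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best, best_overlap = label, overlap
    return best

print(lesk("set", "please set the device to fix the reading"))  # adjust
print(lesk("set", "a set of related tools in a group"))         # collection
```

Even this crude overlap heuristic shows why sense selection needs the
surrounding text, and why single-sentence correction hits a ceiling when the
disambiguating context lives paragraphs away.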

