
Linguistics and Programming Languages - rumcajz
http://250bpm.com/blog:95
======
JoeDaDude
"We have again the popularity of "Wouldn't it be nice if our machines were
smart enough to allow programming in natural language?". Well, natural
languages are most suitable for their original purposes, viz. to be ambiguous
in, to tell jokes in and to make love in, but most unsuitable for any form of
even mildly sophisticated precision. And if you don't believe that, either try
to read a modern legal document and you will immediately see how the need for
precision has created a most unnatural language, called "legalese", or try to
read one of Euclid's original verbal proofs (preferably in Greek). That should
cure you, and should make you realize that formalisms have not been introduced
to make things difficult, but to make things possible. And if, after that, you
still believe that we express ourselves most easily in our native tongues, you
will be sentenced to the reading of five student essays."

E. W. Dijkstra, EWD952 [1]

[1]
[https://www.cs.utexas.edu/users/EWD/ewd09xx/EWD952.PDF](https://www.cs.utexas.edu/users/EWD/ewd09xx/EWD952.PDF)

~~~
AnimalMuppet
I think Perl is interesting here.

When we're talking about computer code, we humans say things like "Read in a
line of text. If it ends in a newline, remove it." But we can't program
computers that way. The compiler says, in effect, "Read in a line of text from
where? And put it where? If it ends in a newline? If _what_ ends in a
newline?" And so on.

But Perl actually lets you program that way. Perl says, "Read in a line of
text? You didn't say from where, so I'll assume the default place, which is
the files that were named as arguments in the program invocation. You didn't
say where to put it, so I'll put it in the default variable. If it ends in a
newline? You didn't say if _what_ ends in a newline, so I'll assume that
you're talking about the default variable, which happens to contain the line
we just read in." And so on.

Effectively, you can use "it" in your conversation with Perl, and it will do
reasonable things. This is one of the places that it shows that Perl was
designed by a linguist.
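
For comparison, here is a rough sketch of the same job in Python, where every "it" has to be spelled out. This is an illustration, not a description of how Perl works internally: the names chomp, read_and_chomp and main are made up for the example, and fileinput.input() is used as the closest standard-library analogue of Perl's default input (the files named as arguments, else standard input).

```python
import fileinput
from typing import Iterable, Iterator


def chomp(line: str) -> str:
    """Remove a single trailing newline, if there is one."""
    return line[:-1] if line.endswith("\n") else line


def read_and_chomp(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with its trailing newline removed."""
    for it in lines:  # "it" must be named explicitly here
        yield chomp(it)


def main() -> None:
    # With no arguments, fileinput.input() reads the files named on the
    # command line, falling back to standard input if there are none.
    for line in read_and_chomp(fileinput.input()):
        print(line)
```

main() is only defined here, not called. The point is how many names (lines, it, line) Python makes you write for what the English sentence and the Perl version leave implicit.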

~~~
schoen
By the way, an important problem in natural language processing, which is
related to this at a high level, is anaphora resolution. When we use pronouns
to refer to people and things we've mentioned before, there are often
ambiguities about what the referent of a particular pronoun is, but native
speakers almost never have to consciously think about this question. But in
practice, resolving these references correctly may require sophisticated
reasoning about our knowledge of the world in order to determine which
interpretation is plausible.

An example adapted from the Winograd Schema Challenge is:

The file couldn't fit on the hard drive because it was too big.

The file couldn't fit on the hard drive because it was too small.

In the first sentence, "it" refers to the file; in the second sentence, "it"
refers to the hard drive. Native speakers who know what files and hard drives
are (or even who don't) should have no trouble understanding the references and
might not even have noticed that there was any ambiguity (!), even though
resolving the ambiguity requires bringing to bear specific knowledge about the
world.

There is a whole family of AI language understanding problems based around
this, such as

[https://en.wikipedia.org/wiki/Winograd_Schema_Challenge](https://en.wikipedia.org/wiki/Winograd_Schema_Challenge)

Looking over some examples shows just how challenging this can be, because of
the way the sentences can require people to know arbitrary things about the
world. ("The atom emitted a photon because it was entering a lower energy
state.")

~~~
thaumasiotes
> even though resolving the ambiguity requires bringing to bear specific
> knowledge about the world

Actually, in this example knowledge of the world (I'm interpreting this as
referring to knowing the details of how "files" and "hard drives" relate to
each other) is not necessary. The ambiguity disappears as soon as you know the
meaning of "fit": fitting requires a large thing to contain a small thing.
Therefore if something is too large, it must be the contained object, and if
something is too small, it must be the container, and those roles are marked
directly within the syntax of the sentence.

You can easily see this experimentally by asking people about sentences with
nonsense words:

1. The glirp couldn't fit on the vell because it was too small.

2. The glirp couldn't fit on the vell because it was too big.

Then again, I see you've already noted that speakers who don't know what files
or hard drives are should have no trouble with these sentences. Is the lexical
meaning of "fit" "knowledge about the world" to you?
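
The purely lexical rule described above is simple enough to write down. The sketch below is a toy, not a real anaphora resolver: the regular expression only covers this one sentence frame, and resolve_it is a hypothetical name. It knows nothing about files, hard drives, glirps or vells.

```python
import re

# Matches only the frame "The X couldn't fit on the Y because it was too Z."
FRAME = re.compile(
    r"The (.+?) couldn't fit on the (.+?) because it was too (big|small)\."
)


def resolve_it(sentence: str) -> str:
    """Return the noun phrase that "it" refers to under the 'fit' rule."""
    match = FRAME.fullmatch(sentence)
    if match is None:
        raise ValueError("sentence does not match the expected frame")
    contained, container, size = match.groups()
    # Too big -> the thing being contained; too small -> the container.
    return contained if size == "big" else container
```

For example, resolve_it("The glirp couldn't fit on the vell because it was too small.") returns "vell". The rule assumes the subject is always the contained thing, so it cannot represent any other reading of "fit on".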

~~~
gpawl
When you say "glirp couldn't fit on the vell", do you mean to say that the
glirp was placed onto the vell, or that the glirp placed the vell on the glirp
itself?

The melon could not fit on the hat because it was too small.

The melon could not fit on the hat because it was too big.

[https://www.google.com/search?q=watermelon+hat](https://www.google.com/search?q=watermelon+hat)

~~~
thaumasiotes
The first. If the glirp was trying to wear the vell, I'd say "the glirp
couldn't put on the vell" or something else idiomatic, like "the glirp
couldn't get the vell on". I cannot use "fit on" in the sense you're going
for.

You can investigate this yourself at
[http://corpus.byu.edu/coca/](http://corpus.byu.edu/coca/) ; the first hundred
results for "fit on" contain, by my eyeball estimate, more than 90 of the
sense I describe, zero of the sense you insinuate, and a few spurious hits
(such as "to spend as it sees fit on government services", "kept himself fit
on a rowing machine", and my favorite, "I've worked with schools such as the
Pratt Institute and FIT on developing eco-friendly vegan design programs").

There are six results, out of over 500 million words, for "fit it on", of
which one matches your pattern. ("She passes it under the running tap and
hikes her tank up to fit it [a strip of nylon] on around her rib cage")

------
falsedan
Note that Martin Sústrik is a programmer, not a linguist. Larry Wall (creator
of Perl) _is_ a linguist _and_ a programmer, and has written on the topic [0].

[0]:
[http://world.std.com/~swmcd/steven/perl/linguistics.html](http://world.std.com/~swmcd/steven/perl/linguistics.html)

~~~
forgotpwtomain
Curiously I find myself disagreeing quite a bit with Larry.

> If a language is designed so that you can "learn as you go", then the
> expectation is that everyone is learning, and that's okay.

It's okay if we never reach understanding or agreement on what Faulkner
intended by a particular sentence (we can still grasp most of the whole). For
a programming language this is explicitly not okay! This goes for ambiguity as
well.

> Multiple ways to say the same thing

> This one is more of an anthropological feature. People not only learn as
> they go, but come from different backgrounds, and will learn a different
> subset of the language first.

This increases cognitive load with no particular benefit.

~~~
falsedan
For the former, I think Larry does not imply a post-modernist “death of the
author” position: there is still an objectively correct interpretation of the
code as the author intended, but it may be understood differently by people of
different experience levels. For example, a map with reference to a sub can be
thought of as a loop that calls the sub.
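
A minimal sketch of that analogy, written in Python rather than Perl (so Python's built-in map stands in for Perl's, and shout is a hypothetical function made up for the example):

```python
def shout(word: str) -> str:
    """Hypothetical stand-in for the sub being passed to map."""
    return word.upper() + "!"


words = ["read", "a", "line"]

# Reading 1: map applied to a function reference.
mapped = list(map(shout, words))

# Reading 2: the beginner's mental model, a loop that calls the function.
looped = []
for word in words:
    looped.append(shout(word))

print(mapped == looped)
```

Both readings produce the same list, which is the sense in which readers at different experience levels can arrive at the same intended meaning.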

With Perl, the objective truth is opcodes, which are well-understood by a
small group. Everyone else bases their understanding on heuristics and
analogies, and the goal is to write your code to trigger the same
heuristics/etc. in the reader.

For the latter, you are declaring your opinion as fact. Every language allows
redundancy and variation of expression; if it truly provided no benefit, why
have we not seen a popular language that only allowed a single expressive
style?

~~~
forgotpwtomain
> For the latter, you are declaring your opinion as fact. Every language
> allows redundancy and variation of expression; if it truly provided no
> benefit, why have we not seen a popular language that only allowed a single
> expressive style?

It is indeed my opinion (as prefaced by 'I find myself disagreeing..') but the
premise that a programming language _benefits_ from resembling or mimicking
features of natural languages is also an _opinion_.

Also I should note that your phrasing of the question isn't quite correct ("if
it truly provided no benefit"), something that provides no benefit is unlikely
to be excluded from a language (e.g. double negatives "there ain't nothing
here to see!") unless there is a clear benefit to doing so, and in fact there
are times that there is.

In particular, I would draw your attention to more limited lexicon sets
(such as those used by dispatchers, rescue workers, climbers, EMT
professionals etc.). Those explicitly exclude variation of expression, since
that leads to an increased risk of misinterpretation in often critical
situations. It is my inclination that while interpretation of code isn't time
critical, reducing variance (e.g. a common coding style does this as well)
reduces cognitive load (makes comprehension more efficient).

~~~
falsedan
Short-order fry cooks, too. That's a different kettle of fish though.

What do you think regarding my position on heuristics/forming an idea in the
mind of the reader?

------
DonbunEf7
Learn Lojban today:
[https://mw.lojban.org/papri/la_karda](https://mw.lojban.org/papri/la_karda)

Lojban is based on predicate and relational logic, and parses unambiguously
for both humans and computers, meaning that we can get straight to semantics
instead of faffing about with syntax.

~~~
kbenson
> parses unambiguously for both humans and computers

I suspect this is only because there are so few speakers of Lojban. As soon as
a language is in common use, it will be extended naturally by the users. I
doubt any level of consistency can be enforced over time in that case. Popular
slang and idioms generally make it into a language if they persist long
enough. Good luck keeping unambiguous parsing in that case.

~~~
TremendousJudge
Just look at JavaScript

~~~
msla
JavaScript cannot be ambiguous because there's a constant acid test: Does the
machine find it ambiguous? If it does, it fails to parse. (If it parses, it
wasn't ambiguous, but it still might mean something the programmer didn't
intend.) That property will hold until we do something like embedding AI in
the compiler.
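
The same acid test can be sketched with Python's own parser instead of JavaScript's (a substitution made for illustration; parses is a hypothetical helper name). Source text either fails to parse or yields exactly one tree, and that one tree may still not be the one the programmer had in mind.

```python
import ast


def parses(source: str) -> bool:
    """Return True if source is a syntactically valid Python expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


# Incomplete input is rejected outright rather than guessed at.
print(parses("1 +"))

# This parses to exactly one tree: unary minus applied to (2 ** 2),
# not (-2) ** 2, whichever of the two the programmer intended.
tree = ast.parse("-2 ** 2", mode="eval")
top = tree.body
value = eval(compile(tree, "<example>", "eval"))
print(type(top).__name__, value)
```

Here the parser settles on -(2 ** 2), so value is -4. There is no ambiguity for the machine, only a possible mismatch with the reader's expectation.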

Humans can deal with ambiguity. There are various reasons for humans to want
to be ambiguous, and any natural language has to support that to be useful for
day-to-day conversation. Loglan and Lojban might well escape that, if they're
only ever used in contexts where ambiguity is not desired and will be repaired
if it is found.

Think formal specifications, not love letters.

~~~
setr
Isn't the automated insertion of semicolons to your detriment a result of
ambiguity in the language, with the compiler forced to decide?

~~~
laszlokorte
The rules for automatic semicolon insertion are defined by the language spec.

------
Ngunyan
English ≠ Computerish

but:

English = C || English = Pascal || English = SQL .......

