
An amateur linguist loses control of the language he invented (2012) - godarderik
http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners
======
godarderik
Reading this story brings to mind the history of algorithms in the field of
machine translation. Early attempts tried to explicitly define the rules for
converting between tongues using meticulously laid out systems of vocabulary
and syntax. This approach proved untenable, in part due to the complex and
ever-changing nature of language. Modern systems such as Google Translate
make use of machine learning algorithms that are fed large amounts of source
material and computationally discern relationships between them.

I wonder if a similar approach could be taken with language construction.
Instead of spending 25+ years fleshing out the details of a language in
painstaking detail, computer programs could be devised that, using large
amounts of input, determine the most "efficient" means of expressing information.
The approach would not only be far less labor intensive, it could also
accommodate the rapidly evolving nature of language, for example adding to its
"dictionary" in response to new phenomena in need of naming.

~~~
erikb
In fact it doesn't just sound like an experiment worth doing. It sounds like
something that somebody somewhere might have already done.

~~~
voronoff
To be glib, it has been done. We call it language.

Seriously, though, take a look at the link I posted in
[https://news.ycombinator.com/item?id=8180924](https://news.ycombinator.com/item?id=8180924)

One of the techniques used is to computationally create a space of possible
ways to partition semantic domains on a plane whose dimensions are simplicity
and informativeness, in order to look at where in the possible space it is
that real languages lie. While it's not been done (to my knowledge) for a
whole language, it's a potential direction to go.

------
voronoff
For anyone who is interested in what an ideal language would look like,
particularly in respect to brevity vs. informativeness I'd highly suggest
looking into Terry Regier's work:
[http://lclab.berkeley.edu/](http://lclab.berkeley.edu/)

I worked in his lab on one of many projects showing that most human languages
use a near optimal trade-off in various semantic domains (so far - color,
kinship, containers, and spatial relations). His work also includes some of
the best evidence for some language dependent forces in cognition interacting
with some universal ones.

------
MichaelDickens
Ithkuil seems like what a language should be: as the article said, it is both
precise and concise. It looks the way Esperanto ought to have looked. I find
Quijada's effort deeply impressive.

I don't know much about designing human languages, but I know how hard it is
to design a decent programming language (see
[http://colinm.org/language_checklist.html](http://colinm.org/language_checklist.html)),
and building a serious human language seems orders of magnitude more
difficult. I've never seen an attempt that really intrigued me until I found
Ithkuil.

~~~
smsm42
Language should not be concise. Redundancy is built into language for a
reason: spoken communication is extremely noisy, and if there were only a
two-bit error distance between "I love you" and "I killed and ate your dog",
then using this language would not be comfortable for humans.

Moreover, people communicating are imperfect. So if you have a language which
is very precise and concise, you would have to spend a lot of effort to find a
word or set of words which exactly expresses your meaning (in programming, we
call it design when we do it upfront, and debugging when we do it post factum)
and communication would be a very complex exercise. However, if you have a lot
of words which mean roughly the same, you can be sure the meaning is passed
through even if the words are not chosen super-carefully.
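The error-distance point can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than anything from the comment above: the two 4-bit "messages" are made up, and a simple repetition code stands in for the redundancy of natural language.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def encode(bits: str, n: int = 3) -> str:
    """Repetition code: send every bit n times."""
    return "".join(bit * n for bit in bits)


def decode(received: str, n: int = 3) -> str:
    """Majority vote over each group of n received bits (n should be odd)."""
    groups = (received[i:i + n] for i in range(0, len(received), n))
    return "".join(Counter(group).most_common(1)[0][0] for group in groups)


def flip(bits: str, index: int) -> str:
    """Simulate channel noise by flipping a single bit."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


# Two hypothetical 4-bit messages that differ in only two positions.
love, dog = "1010", "1001"

print(hamming(love, dog))                  # distance between the concise forms
print(hamming(encode(love), encode(dog)))  # distance after adding redundancy

noisy = flip(encode(love), 4)              # one bit corrupted in transit
print(decode(noisy) == love)               # majority vote recovers the message
```

Tripling every bit triples the distance between the two messages, so a single flipped bit no longer turns one into something closer to the other. The cost is a message three times as long, which is the trade-off the comment describes.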

~~~
laichzeit0
In some languages the meaning of a word is highly dependent on the pitch
accent, like ancient Greek. If you're 1 bit off you have trouble :)
Surprisingly enough, I read an example of this yesterday evening:

"Hegelochus, the actor in Euripides' Orestes, which was presented in 408 BC,
in line 279 of the play, instead of "after the storm I see again a calm sea"
(galeén' horoo), Hegelochus recited "after the storm I see again a weasel"
(galeên horoo)."

~~~
legutierr
One of the most famous passages from the Bible seems to have been
affected by (or benefited from) a similar ambiguity.

Mark 10:25 (and parallel versions in Matthew and Luke) has prompted much
speculation over the centuries with regard to the origin of its evocative
metaphor: "It is easier for a camel to go through the eye of a needle than for
someone who is rich to enter the kingdom of God."

But when you consider that the word for camel (kamêlos) and for rope (kamilos)
differ by only one vowel, quite a mundane explanation springs to mind: someone
in the early church misheard, misspelled, or mistranslated Jesus' original
admonition.

The more satisfying explanation, the one that I prefer, is that this is a pun
that happens to have gotten lost in translation.

The few comments in this blog article offer some interesting explanations:

[http://rambambashi.wordpress.com/2010/06/03/common-
errors-36...](http://rambambashi.wordpress.com/2010/06/03/common-
errors-36-a-needles-eye/)

------
ejr
If anyone wants to hear what Ithkuil sounds like:
[https://upload.wikimedia.org/wikipedia/commons/c/c9/Ithkuil_...](https://upload.wikimedia.org/wikipedia/commons/c/c9/Ithkuil_pull_uiqisx.ogg)

From
[https://en.wikipedia.org/wiki/Ithkuil](https://en.wikipedia.org/wiki/Ithkuil)

~~~
lloeki
> _Ithkuil does not use the concept of zero_

Interesting. How is one supposed to talk about math without a concept of zero?

~~~
ejr
It's a fascinating problem. It makes me wonder, without zero, what paths
mathematics would have taken. There have been civilisations that used math
extensively without zero and I hope Ithkuil-fluent mathematicians some day
would continue exploring this.

~~~
csirac2
"Zero: The Biography of a Dangerous Idea" was quite a memorable book for
me, and goes into some good detail on those very points, even trying to
guesstimate just how much damage was done to the progress of civilization by
the numerous rejections of zero at several points in our early history.
[http://books.google.com.au/books/about/Zero.html?id=obJ70nxV...](http://books.google.com.au/books/about/Zero.html?id=obJ70nxVYFUC&redir_esc=y)

------
tomkinstinch
The same thing happened to Blissymbols[1], as documented by radiolab[2].

1\.
[https://en.wikipedia.org/wiki/Blissymbols](https://en.wikipedia.org/wiki/Blissymbols)

2\. [http://www.radiolab.org/story/257194-man-became-
bliss/](http://www.radiolab.org/story/257194-man-became-bliss/)

~~~
MBCook
That was a great episode.

This has also happened with the language Lojban[1], which was 'forked' from
Loglan[2] when the creator started making copyright complaints, so that the
community could maintain control.

Such an odd concept that someone could 'own' a language, but I guess if you
created it I can see why you would want to.

[1] [http://en.wikipedia.org/wiki/Lojban](http://en.wikipedia.org/wiki/Lojban)
[2] [http://en.wikipedia.org/wiki/Loglan](http://en.wikipedia.org/wiki/Loglan)

~~~
mchaver
I think it's a great lesson for any creator. Just because you have invented
something does not necessarily mean you have the right or are capable of
dictating how people use it.

I seem to remember Umberto Eco mentioning that he does not offer his own
interpretations of his novels for a similar reason, but I can't find the
quote.

~~~
Uncompetative
I can recommend his book.

The Search for the Perfect Language - Umberto Eco - 2010

------
tokenadult
This is attracting some reader interest here, so I should probably mention,
for other Hacker News participants deeply interested in human languages, a
definitive analysis of Esperanto[1] explaining why Esperanto has not caught on
with more speakers.

[1]
[http://www.xibalba.demon.co.uk/jbr/ranto/](http://www.xibalba.demon.co.uk/jbr/ranto/)

~~~
ketralnis
I don't think that explains anything; it looks like a list of aesthetic
"faults" that the author finds, unlikely to be recognised by anyone without a
degree in linguistics. Surely those faults exist, but it's a stretch to
pretend that they alone explain anything.

I speak some Esperanto but I'm no zealot. The "explanation" that your
neighbours don't speak Esperanto, if such a simplified thing can exist, is
probably as simple as network effects. I can learn German and speak to the man
next-door, or I can learn Esperanto and speak to some theoretical people that
may exist somewhere but I don't know them and they are mostly a bunch of nerds
that meet at the local co-op to speak in Esperanto mostly about how great
Esperanto is.

Go ahead. Ask your neighbour why he doesn't speak Esperanto. Does he say "The
'basic' number‐terms tri, trio, tria ('three, threesome, third') are a crowded
jumble, making a mockery of the regular root/noun/adjective pattern they
imitate" (K5 in the article)?

Or does he say "what's that?"?

~~~
Pamar
Agreed: studying Esperanto now makes as much sense as striving to provide CP/M
compatibility in your product.

~~~
ketralnis
Please don't put words in my mouth, I don't think you agree with me at all.

I think Esperanto is a fun hobby and I have fun toying with it every few
years, even if I'm under no delusion that it's a practical way to, say, do
international business. I do think that if it or any universal second language
(French, English, Mandarin, whatever) were to suddenly become more widespread,
the world would be better off.

My claim was only that the reason that it's not more widespread has nothing to
do with subjective flaws in the language itself.

~~~
Pamar
Sorry, I wasn't trying to put words in your mouth. I was just trying to
underline the fact that if you put "CP/M compatible" in the ad for your
product you'd get exactly the same response from most people, i.e. "uh, what?"

And probably the same response after you patiently explained to them what CP/M
(or Esperanto) was: "Why the hell should I care about that? Give me something
that works with Windows (or Mandarin) so I can work with a sizeable portion of
the rest of the world!!!"

------
ilaksh
Ithkuil is definitely one of the most amazing pieces of work I have ever come
across. I have been using the name as my email address for many years and
another variant of it he had called 'ilaksh' as my screen name (note I didn't
have anything to do with the creation of ithkuil/ilaksh, just a fan). I think
not only other conlangers but also anyone interested in fields like
linguistics, computer programming, knowledge representation, etc. can be
inspired by what Quijada did.

I did get a few somewhat weird emails that I think were in Russian some years
ago, but I think they figured out pretty quick that it wasn't the right email
address to reach Quijada.

------
jqm
Losing control of a language seems to be standard procedure.

If this invented language were to catch on, it likely wouldn't be more than a
generation or two before kids who grew up speaking it would start saying the
Ithkuil equivalent of things like "yo dog, that's the rad shizaz!". Then, several
generations thereafter grandmothers would be regularly using the word "shizaz"
and they would have to put it in the dictionary. That's just the way it goes
and is probably the reason we don't all speak the same language in the first
place.

That being said, I've always been fascinated by the idea of a systematically
created universal language and think the world would be a much better place
with one... if that were possible.

This was a neat article.

------
Terr_
I think there's some research out there that suggests all natural languages
have about the same information density, when you factor how two people in
conversation will add error-correction or extra context to frame an idea.

IMO this suggests the bottleneck is something about our brains on a biological
rather than linguistic level.

~~~
godarderik
According to a study published several years ago, mainstream languages seem to
operate on an information density/speed tradeoff [1]. The authors found that
languages that are spoken faster seem to encode less information per syllable
than those uttered at a slower pace.

This does seem to suggest that biology may be the limiting factor in controlling
the rate at which humans convey information. Indeed, the language mentioned in
the article seems almost laughably cryptic and dense. However, I feel that the
limitation of the mentioned study results from the fact that it treats
information on a relatively limiting per syllable basis. Quijada seems to
suggest that an artificially constructed language has the ability to
incorporate all the implicit meanings of a phrase that are left unsaid in
normal conversation.

Ultimately, while Quijada's project seems quite unlikely to catch on among
those who are not fringe pseudoscientists, it poses interesting philosophical
questions about the nature of speech and communication and perhaps earns its
title as a "conceptual-art project."

[1] [http://rosettaproject.org/blog/02012/mar/1/language-speed-
vs...](http://rosettaproject.org/blog/02012/mar/1/language-speed-vs-density/)

~~~
hueving
>The authors found that languages that are spoken faster seem to encode more
information per syllable than those uttered at a slower pace.

I think you mean the inverse.

~~~
godarderik
Fixed it, thanks.

------
gabemart
I found this article fascinating and satisfying.

I'm curious about the desire to reduce ambiguity, which seemed to be
emphasized as a motivation for the creation of Ithkuil and some of the other
languages mentioned.

Is it desirable to completely eliminate ambiguity? I can see why it would be
desirable in a scientific paper or a public political debate. But in everyday
interactions, (intentional) ambiguity plays many important roles.

In my experience, politeness is bolstered by some level of ambiguity. Rather
than explicitly state your needs, desires or opinions, you imply them at some
level of abstraction, allowing other participants in the conversation to
accept or decline more easily. Imagine Jessica who has brought two friends who
don't know each other to see a play. They chit-chat a little afterwards, then
Jessica goes home early leaving two virtual strangers to have a drink
together. It's not hard to imagine the conversation going like this:

A: "Did you enjoy the play?"

B: "It was very interesting. I thought the stage dressing was a little
unconventional."

A: "Yes, I noticed that too. Very creative. I was intrigued by the style of
the narration. It really let the audience write the story for themselves."

B: "It certainly didn't constrain the imagination did it? I couldn't help
noticing that many of the actors took a somewhat avant-garde interpretation of
the source material."

A: "Yes, as if they didn't want it to seem like they were 'acting', so to
speak?"

B: It was awful wasn't it!?

A: Thank god! Yes, worst thing I've ever seen!

Ambiguity allows subtle social cues (not so subtle in my example!) that avoid
direct confrontation when it might be uncomfortable. If one person loved the
play and the other hated it, they each might want to avoid offending the
other.

Intentional ambiguity plays an important role in other social interactions
like dating or friendship-making. Correct use of ambiguity protects feelings,
demonstrates subtlety and good judgement, and avoids non-productive conflict.

In artistic expression too, ambiguity is often intentional or even necessary
to the effectiveness of the work. Consider a poem like "My Papa's Waltz" [1].
Does it describe happy memories of the narrator's father, or dark memories of
childhood abuse [2]? Can it describe both? Is there something in between? The
ambiguity isn't a byproduct of imprecise language. The ambiguity _is_ the
meaning. To resolve it is to remove the point of the work. The poem cannot be
effectively communicated in any medium that does not allow for the existence
of ambiguity.

[1]
[http://www.poetryfoundation.org/poem/172103](http://www.poetryfoundation.org/poem/172103)

[2] 'Yet, this poem has an intriguing ambiguity that elicits startlingly
different interpretations. Kennedy calls it a scene of "comedy" and
"persistent love", and Balakian, in part, labels it a "comic romp" (62). In
contrast, Ciardi sees it as a "poem of terror"' \- from
[http://www.mrbauld.com/exrthkwtz.html](http://www.mrbauld.com/exrthkwtz.html)

~~~
jasode
>, politeness is bolstered by some level of ambiguity. Rather than explicitly
state your needs, desires or opinions, you imply them at some level of
abstraction, allowing other participants in the conversation to accept or
decline more easily.

Steven Pinker explores this point in an entertaining presentation[1] for RSA.
He also covers other language topics such as spacetime encoding, and profanity
but the last part analyzes the need for ambiguity in a language. I deep-linked
into the relevant portion of the presentation, although the entire talk is
very enlightening. It's worth rewinding to the beginning to watch the whole
thing. The 2nd youtube video[2] is mostly the same material, but it's the
older one he presented at Google TechTalks.

[1][http://www.youtube.com/watch?v=5S1d3cNge24#t=32m55s](http://www.youtube.com/watch?v=5S1d3cNge24#t=32m55s)

[2][http://www.youtube.com/watch?v=hBpetDxIEMU#t=40m38s](http://www.youtube.com/watch?v=hBpetDxIEMU#t=40m38s)

------
JoeAltmaier
"Among the Wakashan Indians of the Pacific Northwest, a grammatically correct
sentence can’t be formed without providing what linguists refer to as
“evidentiality,” inflecting the verb to indicate whether you are speaking from
direct experience, inference, conjecture, or hearsay"

This is amazing. But I can't grasp the difference between inference and
conjecture - they are both 'figuring out' what happened rather than knowing or
hearing?

------
lotsofmangos
I wonder how well Ithkuil can be represented in Iain Banks' Marain script.
[http://trevor-hopkins.com/banks/a-few-notes-on-marain.html](http://trevor-
hopkins.com/banks/a-few-notes-on-marain.html)

~~~
gamegoblin
Ithkuil has well over double the number of phonemes as Marain, so the answer
would probably be "not so well".

~~~
lotsofmangos
Rotation and reflection of the basic set extend the phonemes and can link
together similar sounding ones in Marain, so I would have thought it would be
achievable.

------
arsalanb
>"Languages are something of a mess. They evolve over centuries through an
unplanned, democratic process..."

I'm in awe of the creator of _any_ language, because creating a (good)
language isn't easy. This is true of both programming languages and otherwise.
However, it goes without saying that adoption is a vital component of any
language, and with mass adoption comes evolution.

People will often make changes in languages and make their own dialects (based
on things they can perhaps relate to on a deeper level, etc.). This isn't a bad
thing. To me it only signifies growth and expansion of the language.

+1

------
pohl
I really enjoyed this article when it was new. Not long ago, when I was
learning Octopress, my first post was Hello World in Rust and Ithkuil. (I just
wanted to make sure code formatting was working.) I have no idea how correct
the translation is. I just googled around until I found someone else's.

[http://screaming.org/blog/2014/07/12/ettawil-
cutx/](http://screaming.org/blog/2014/07/12/ettawil-cutx/)

------
wyager
Can someone list a few popular constructed languages (maybe comparing them to
programming languages)? I'd only heard of Lojban and Esperanto before reading
this.

~~~
mchaver
Toki Pona is a minimalist language with a bit of a following. It has 120 root
words and tries to build all concepts based on those.

Loglan is a predecessor and inspiration to Lojban.

Slovio is the Slavic version of Esperanto.

Dothraki, Elvish (Quenya, Sindarin), Klingon, Na'vi are constructed languages
from popular novels/movies.

~~~
iLoch
> Slovio is the Slavic version of Esperanto.

Someone doesn't understand the point of Esperanto.

~~~
mchaver
Regardless of the point, the truth is Esperanto's lexicon is mostly inspired
by Romance languages whereas Slovio's lexicon is mostly inspired by Slavic
languages. It's not that shocking of a comparison.

------
StavrosK
TFA:

> A sentence like “On the contrary, I think it may turn out that this rugged
> mountain range trails off at some point” becomes simply “Tram-mļöi
> hhâsmařpţuktôx.”

Wikipedia:

> Romanization: Oumpeá äx’ääļuktëx.

> Translation: "On the contrary, I think it may turn out that this rugged
> mountain range trails off at some point."

------
yongjik
That was an interesting read, but the reporter's breathless assertions
frequently got in the way of appreciating Quijada and his idea.

I mean, things like:

> A sentence like “On the contrary, I think it may turn out that this rugged
> mountain range trails off at some point” becomes simply “Tram-mļöi
> hhâsmařpţuktôx.”

 _Simply?_

We could have used the LZW algorithm and the sentence could probably become
even shorter, just a "simple" sequence of random-ish bytes. If you increase the
number of allowed symbols, of course you need fewer symbols to convey the same
information. If you allow for a limitless set of words that are dynamically
generated from combining many roots, of course the number of words
decreases... sometimes down to 1, as in polysynthetic languages. This is
Information Theory 101.
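The alphabet-size argument reduces to one line of arithmetic: a symbol drawn from an alphabet of k equally likely symbols carries at most log2(k) bits, so a message of B bits needs at least ceil(B / log2(k)) symbols. A minimal sketch follows; the function name and the 64-bit example message are assumptions chosen for illustration.

```python
import math


def symbols_needed(bits: float, alphabet_size: int) -> int:
    """Lower bound on symbols needed to carry `bits` of information.

    Assumes every symbol in the alphabet is equally likely, so each one
    carries log2(alphabet_size) bits.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet must contain at least two symbols")
    return math.ceil(bits / math.log2(alphabet_size))


# The same hypothetical 64-bit message in ever larger alphabets.
for size in (2, 26, 256, 65536):
    print(size, symbols_needed(64, size))
```

The count drops as the alphabet grows, but each symbol becomes correspondingly harder to tell apart from its neighbours, which is the complaint about calling the Ithkuil sentence "simple".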

~~~
moconnor
The English text is 97 ASCII encoded bytes.

Compressed with zlib: 86 bytes.

Compressed with lzma: 98 bytes.

The Ithkuil representation is just 30 UTF-8 encoded bytes.

Compressed with zlib: 39 bytes.

Compressed with lzma: 47 bytes.

(Measured using python's zlib/pylzma modules to avoid e.g. file header
overhead)

It's hard to achieve this kind of compression without an external dictionary.
What Quijada has created with Ithkuil is, in part, a dictionary for the space
of human thought and concepts, something I wouldn't have expected to work in
the way the article describes it.
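A rough way to repeat this kind of measurement is sketched below, with some assumptions: it uses the standard-library `lzma` module (which writes a full `.xz` container) instead of pylzma, so the LZMA figures will not match the ones quoted above, and the Ithkuil letters are written as Unicode escapes to pin down their precomposed forms. The raw byte counts of 97 and 30 only come out if the trailing period of each sentence is left off.

```python
import lzma
import zlib

ENGLISH = ("On the contrary, I think it may turn out that this rugged "
           "mountain range trails off at some point")

# "Tram-mļöi hhâsmařpţuktôx" with the six non-ASCII letters as escapes,
# so each is a single code point that takes two bytes in UTF-8.
ITHKUIL = "Tram-m\u013c\u00f6i hh\u00e2sma\u0159p\u0163ukt\u00f4x"


def sizes(text: str) -> dict:
    """Byte counts for the UTF-8 text and two compressed forms of it."""
    data = text.encode("utf-8")
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),  # includes zlib header and checksum
        "xz": len(lzma.compress(data)),       # includes .xz container overhead
    }


for name, text in (("English", ENGLISH), ("Ithkuil", ITHKUIL)):
    print(name, sizes(text))
```

On inputs this short, both compressors make the Ithkuil text larger, since there is almost no repetition to exploit and the container overhead dominates.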

~~~
Dylan16807
Actually, using zlib format gets you an unnecessary 2 byte header and 4 byte
footer, so the proper sizes are 80 and 33.

I'm having trouble figuring out what's going on with lzma because the spec is
lying about the header, so I won't attempt to guess the correct number there.
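The zlib overhead can be checked directly. Passing a negative `wbits` value to Python's `zlib.compressobj` requests a raw DEFLATE stream with no wrapper, which can then be compared with the output of `zlib.compress`. This is only a sketch: the helper name `raw_deflate` and the sample sentence are choices made for the example.

```python
import zlib


def raw_deflate(data: bytes, level: int = 9) -> bytes:
    """Compress to a bare DEFLATE stream, without zlib header or checksum."""
    # wbits=-15: the minus sign suppresses the wrapper, 15 is the window size.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


sample = (b"On the contrary, I think it may turn out that this rugged "
          b"mountain range trails off at some point")

wrapped = zlib.compress(sample, 9)
bare = raw_deflate(sample, 9)

# Wrapper size: 2-byte header in front plus 4-byte Adler-32 checksum at the end.
print(len(wrapped) - len(bare))

# The last four bytes of the zlib stream are the big-endian Adler-32 of the input.
print(wrapped[-4:] == zlib.adler32(sample).to_bytes(4, "big"))

# The bare stream still round-trips when decompressed with the same wbits.
print(zlib.decompress(bare, wbits=-15) == sample)
```

Subtracting those six bytes from the quoted zlib figures of 86 and 39 gives the 80 and 33 mentioned above.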

------
based2
[http://www.reddit.com/r/linguistics/comments/2dlsgl/utopian_...](http://www.reddit.com/r/linguistics/comments/2dlsgl/utopian_for_beginners)

------
mariusz79
While it looks like an impossible language to use every day, I'm
wondering if it could be used for science and technology. Just imagine having
all scientific papers in it :)

------
alxndr
I'm amazed that the article doesn't mention Lojban at all.

------
thisjepisje
Off topic: Are the drop caps supposed to be lower than the line of text to
which they belong? It looks kind of silly IMO.

------
lawlessone
Any font files for this? would be interesting to use.

------
stuaxo
If there was a site that summarised New Yorker articles in 2 pages I would be
there in a flash.

~~~
Fastidious
That would be atrocious! You need to savour and enjoy the reading, just the
same you enjoy a nice drink, or a good cup of coffee, or you take the time to
make coitus a never-ending engagement.

Just enjoy it.

~~~
StavrosK
Sometimes I need a quick snack, for the calories.

~~~
krisgee
To be fair there are lots of "snack food" articles around these days but few
three course meals. I try to appreciate the long form articles that I find
interesting for what they are.

------
QuantumGood
Seems this could contribute to accelerating artificial intelligence towards
the possibility of the singularity.

------
sbussard
another hacker news TL;DR article

~~~
knieveltech
Too bad. It's a pretty good read.

~~~
JohnTHaller
It's a very interesting article. But it's done in old journalism/academic
paper style where it takes 5 pages to get to the point and has huge
multiparagraph asides that the reader is often uninterested in. I already know
the history of Esperanto... most people won't even care about it. I don't care
at all that George Soros learned it as his first language, it's unrelated
nonsense. Tell me about the topic of the article. If you want interested
people to be able to learn more about Esperanto, link to a side article. We
can do this today.

~~~
Uncompetative
Infodump

------
selimthegrim
Two things struck me about this article in hindsight when I read it.

\-- Whose pot did the Croats, Bosnians and Slovenes piss in to not make it
into this super Slavic union?

\-- China Miéville wrote a book[0] along very similar thought lines which won
the Locus Award.

Also, Garkavenko appears not to have taken the obvious side [1] in Ukraine's
present conflict given how he is described in Foer's article

[0]
[https://en.wikipedia.org/wiki/Embassytown](https://en.wikipedia.org/wiki/Embassytown)

[1] [http://maidantranslations.com/2014/06/24/russian-
volunteers-...](http://maidantranslations.com/2014/06/24/russian-volunteers-
returning-home-from-donbas-will-bring-the-deadly-maidan-virus-to-russia/)

