Distant languages have similar sounds for common words (economist.com)
44 points by tintinnabula on Sept 20, 2016 | 60 comments

It is a curious fact, and one to which no one knows quite how much importance to attach, that something like 85% of all known worlds in the Galaxy, be they primitive or highly advanced, have invented a drink called jynnan tonnyx, or gee-N'N-T'N-ix, or jinond-o-nicks, or any one of a thousand or more variations on the same phonetic theme. The drinks themselves are not the same, and vary between the Sivolvian 'chinanto/mnigs' which is ordinary water served at slightly above room temperature, and the Gagrakackan 'tzjin-anthony-ks' which kill cows at a hundred paces; and in fact the one common factor between all of them, beyond the fact that the names sound the same, is that they were all invented and named before the worlds concerned made contact with any other worlds.

What can be made of this fact? It exists in total isolation. As far as any theory of structural linguistics is concerned it is right off the graph, and yet it persists. Old structural linguists get very angry when young structural linguists go on about it. Young structural linguists get deeply excited about it and stay up late at night convinced that they are very close to something of profound importance, and end up becoming old structural linguists before their time, getting very angry with the young ones. Structural linguistics is a bitterly divided and unhappy discipline, and a large number of its practitioners spend too many nights drowning their problems in Ouisghian Zodahs.

-- Douglas Adams

There's a visceral gratification in the pronunciation "ginin taw nick": it rolls off the tongue, and perhaps that is the root. My favorite word for this reason is pumpernickel.

Pumpernickel. A very woody word. [1]

[1] https://www.youtube.com/watch?v=-gwXJsWHupg

That made my day, thanks :)

This isn't really a new discovery. It has been common knowledge in linguistics at least since the '70s that there is a universal tendency to use /i/-like sounds in words that designate smallness and /o/ or /a/-like sounds in words for the opposite. (Note: Tendency. There are plenty of counter-examples, but the tendency is still there.) This is possibly linked to sound symbolism and has to do with how you position your tongue when you say those vowels. Possibly.

There are also plenty of older studies using made-up words that show that people link /i/ sounds to sharpness and /o/~/a/ sounds to roundness. (E.g. show a group of subjects a triangle and a hexagon and ask them which one they think the word 'pirl' refers to. Then show another group the same shapes but use the word 'porl' instead.)

Sound symbolism is super interesting. The Jabberwocky is a great example of how certain phonemes/syllables can affect our interpretation despite not being connected to each other in any immediately meaningful way.

My interpretation of the significance of the study this article is about was that it is a very broad study, demonstrating the patterns you mention across thousands of languages.

Anecdotally speaking as a multilingual person, I've noticed some of the same patterns - but untested, anecdotal comments aren't worth much; I find it significant and interesting that (at least some of) these trends seem to hold across thousands of languages. This isn't my field, though - is my impression of the significance here correct?

A second question: isn't it the currently held theory in linguistics that there is more or less a single "super parent" language - which my memory of my intro to linguistics class from ten years ago identifies as Proto-Indo European - and wouldn't that common linguistic ancestor provide a plausible alternative explanation for this common sound symbolism?

A question tangentially related to the above: the sounds you mention are all vowels, which are easier for infants to make and which infants tend to learn to make first; it seems to stand to reason that sound symbolism tied to sounds made earliest by infants would be more likely to be unchanged over time. Is there anything that rules out this perspective? Building on this, I know that there are some patterns to the development of abstract reasoning in children; has any work been done on correlating abstract reasoning development timescales with the development of phonetic production skills? Perhaps this sound symbolism has to do with a parallel in the developmental timelines of these two?

There was something called "Nostratic" hypothesized by some Russian linguists, I think. But broad though its spread is, I don't think that Proto-Indo European accounts even for all European languages--Hungarian, Basque, and Finnish being outliers. However, I am not a linguist.

I'm not sure if you realize the great differences in generality between this study and previous studies in sound-meaning associations. First, this study drew from ~4,000 languages representing all the world's major language families (a scale I assume the /i/ vs. /u/ studies didn't achieve). Second, this study was sensitive to all associations in its methodology, as opposed to the high vs. low study, where the scientists came into the study asking a very narrow question (is there an association between high vowels and smallness and low vowels and bigness?). Here is the table from the study that summarizes associations that were very strongly attested in their corpus: http://i.imgur.com/iCOM8KQ.jpg

This study corroborates previous findings in sound-meaning association studies, but it also gives us a whole lot of new associations ('horn' and /k/, /r/? who knew?). Further, it investigates a previously uninvestigated (as far as I know) question: are there sounds that negatively correlate with certain meanings? (And it turns out there are.)

Ugh! 'Pirl', 'pirl'! What a nasty tinny sort of word.

But I do wonder how much of the similarity they're talking about here is accounted for by phonoaesthetics, and how much of it might be simply due to some inevitable coincidence (like Zipf's law being applicable to gibberish texts). I mean, common concepts seem more likely to be referred to by short words, and there are only so many short words that can be formed with a given number of phonemes.

Shouldn't some repetition and similarity be expected from that alone?
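As a rough way to put numbers on this question, here is a minimal Python sketch. It borrows two figures quoted elsewhere in this thread (sounds reduced to 34 consonants and 7 vowels, and roughly 4,000 languages); everything else is an assumption made for illustration: the function names are hypothetical, every sound is treated as equally likely, and languages are treated as choosing words independently.

```python
# Back-of-the-envelope sketch: how many short words can exist at all?
# Assumptions (not from the study): every sound is equally likely, languages
# pick words independently, and words follow simple consonant/vowel shapes.

def count_forms(n_consonants: int, n_vowels: int, shape: str) -> int:
    """Count distinct words matching a shape such as "CV" or "CVCV"."""
    total = 1
    for slot in shape:
        if slot == "C":
            total *= n_consonants
        elif slot == "V":
            total *= n_vowels
        else:
            raise ValueError(f"unknown slot {slot!r}")
    return total


def expected_matching_pairs(n_languages: int, n_forms: int) -> float:
    """Expected number of language pairs that land on the identical word
    for one concept, if each language draws uniformly from n_forms."""
    n_pairs = n_languages * (n_languages - 1) // 2
    return n_pairs / n_forms


for shape in ("CV", "CVC", "CVCV"):
    forms = count_forms(34, 7, shape)
    print(shape, forms, round(expected_matching_pairs(4000, forms), 1))
```

Under these toy assumptions there are only 238 possible CV words and 56,644 possible CVCV words, so among about eight million language pairs well over a hundred would coincide exactly on a CVCV word by chance alone. That is a statement about the toy model, not about the study, which tested sound-meaning associations rather than whole-word matches.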

> The words for “nose”, for instance, often involve either an “n” sound or an “oo” sound, no matter the language in question.

The wording here is almost painful. You can't play "often" as a minimizer against "no matter the X in question" like that. It is terribly confusing and the mark of someone who wants to sound sure, isn't sure, and isn't capable of making a more nuanced statement.

Seriously, that wording is bizarre, like they started with "often," and then later made a "use more convincing language" pass without really paying much attention...

[Nose in Mandarin: bí or bízi...]

No joke, the Mandarin translation was almost the comment I made. Just before posting I reread it and discovered that their misleading phrasing tricked me into almost saying they were wrong. With all the conflicting modifiers, I suppose they are right, but man did I feel like I was supposed to be convinced of the false statement I thought I read.

As a self-diagnosed abuser of nitpicks, I have to say that this is a nitpick. The intent here is pretty clear.

In Japanese, it's "Hana", and in Mandarin it's "Bizi". Seems like they're only talking about IE languages, and those influenced by them.

This isn't true--the study considered 4,000 languages from all the world's major language families: http://www.pnas.org/content/early/2016/09/06/1605782113.abst...

Japanese fits perfectly, actually.

Nose - hana has N in it.

Sand - suna has S.

Round - maru has R.

There's not too much information in the article, to be honest. However, it's an interesting notion to me because I've recently become very interested in etymologies, specifically Proto-Indo-European. Since all people come from the same place and have roughly the same features, it's not hard to imagine that there was once a common shared tongue. After all, in the grand scheme of things, these migrations didn't happen all too long ago. I wonder if this will eventually lead to a new reconstructed language, something to tie together the Afro-Asiatic, Indo-European, etc. Very exciting.

Some researchers believe articulated language arose after the last out-of-Africa event. If that's the case, it's possible the gene(s) arose outside of Africa and migrated back into Africa from, e.g., Asia. And that primitive articulated language with grammar, etc, as we understand it arose multiple times independently.

Most researchers believe the cognitive prerequisites for articulated speech came before and drove the emergence of the physical ability for articulated speech. But Joseph Jordania, for example, argues that speech evolved from choral singing, which arose as a defense mechanism on the savannah where groups of human ancestors would vocalize and gyrate in unison to intimidate predators. The emergence of this particular strategy explains, he argues, why our ancestors, after descending from the trees, didn't evolve more typical defense mechanisms like becoming bigger, growing thicker hides, being able to run faster, etc.

The group intimidation display selected for many things, including the ability to dance and sing in a group, because if you weren't in unison with the group, or if you tried to run away, you'd be the first to be eaten. This also explains, he argues, why people can become entranced while dancing in a group. It also, FWIW, neatly skirts the dilemma of so-called group genetic selection--the free-rider problem is taken care of by the hungry lion, and we don't require a complex, unproven fitness model for how cheating was suppressed; classic genetic models would explain how such extremely cooperative behavior could arise and persist among members not immediately related genetically (i.e. not extended family).

In that scenario the physical aspects of articulated speech could arise first, and the last piece of the cognitive puzzle could have come last, possibly long after out-of-Africa. But that last piece might have been so advantageous that it could have quickly migrated everywhere, including back into Africa.

In Jordania's model African and European populations were the last to receive articulated speech. He predicts, among other things, that polyphonic (i.e. choral, aka group) singing would be more common in European and African societies. And that articulated speech glitches, like stuttering, would be more common in Europe and Africa than in Asia, as Asian populations would have had more time to see these things suppressed in their populations.

Check out my post. I researched the timelines of when mammals appeared, when primates appeared, and when the continents first separated. One conclusion is that language existed WAY before even primates did.

On language: There is much to be learned from dolphins. They were just last week reported to have been seen having an actual back and forth complex conversation. Two dolphins were given a technical task to solve, and were discussing how best to do it. Pretty amazing.

Considering our nearest ancestors don't have complex language I'd say convergent/parallel evolution is a much more reasonable conclusion.

Dolphins have complex language using sound, and they are very distant cousins. The fact that chimps don't talk is likely to be 50% because of inadequate throat construction, and 50% lack of intelligence. Past extinct lifeforms could have easily had vocal cords and intelligence. We just don't know. That's the point of my post. The possibility exists that language first evolved billions of years ago.

175 million years ago all continents were merged into one, and about 65 million years ago primates appeared. If there is an actual historical link to this language pattern, one possible explanation is that animals were able to make sounds with vocal cords well before (like 100 million years before) we actually were primates. This is not far-fetched at all. Recently dolphins were found to be having actual conversations. I mean, mammals had been around 20 million years BEFORE the continents broke apart, so perhaps there has always been a "segment" of mammals that had the right brain nodules to control speech and the requisite vocal cords also. Possibly the dino-killing asteroid killed off most of the 'talking mammals' and only left one group of mammals that had these verbalizations, which were for some reason only transferred down one evolutionary branch of the tree that ended up at humans. It makes sense that it would always follow the branch of the evolutionary tree that represents the highest intelligence. Language and intelligence reinforce each other in an evolutionary sense, so perhaps speech just died out in most animals because it had little value without the intelligence to develop it further.

This is a completely unnecessary hypothesis. At some point early on all humankind was geographically co-located. If there are somewhat universal features to language that were transmitted memetically, it's a much more reasonable hypothesis that these are choices that were made before humanity spread too widely.

The thing that makes this a mystery is that we think the continents separated 175 million years ago, but primates didn't evolve until only 50 million years ago. I guess if there were still land-bridges between the continents 50 million years ago, then it makes sense that all humans are from the same small tribe that spoke the same "original" language. Problem solved. But my point was that language could have existed long before primates, and no one knows if that's right or wrong. The fact that monkeys don't talk is moot, because they are already 98% identical to humans so you can't make any argument based on similarity. Based on genetics there is no analysis that would have predicted monkeys wouldn't be able to speak, or for that matter be as intelligent as man.

> I guess if there were still land-bridges between the continents 50 million years ago

There are land connections between two groups (of 3 and 2) of the inhabited continents today, and as recently as 11,000 years ago five of the six human-inhabited continents were mutually connected by land.

Yeah, the simplest explanation is the conventional one, which is that all humans came from a common tribe that migrated out of Africa long after the continents had mostly separated, and the way they got to some of the continents had to have been by boat. My initial post was merely saying how interesting it would be if language had evolved well before primates, and we would have no way of knowing it. All you have to do is look at the fact that dolphins have language, and are very intelligent. Intelligence and language are far from being specific to human brains.

> animals were able to make sounds with vocal cords well before (like 100 million years before)

Um, no. Even between chimps and ourselves lies a vast gulf of unpronounceability, just due to vocal cord physics. Yet more interesting is the mapping between formants and meaning in linguistics.

Lots of primate species have died off. Since even humans and chimps are 98% identical there is plenty of reason to believe other primates had vocal cords and even language. The only way you can argue against this is to say that you know the 2% difference between man and chimp contains instructions for vocal cords that could never have appeared in the past, which is obviously a wild guess on your part. Neither one of us knows, so your "um, no" snarkiness is nothing but that: snarkiness.

Any non-Asian languages use onomatopoeia as the basis for animal nouns? In Vietnamese, the words for some common animals are basically the sound associated with said animal, like referring to a cat as a "meow":


The funny thing is that "bo" means cow not only in Vietnamese but also in Irish, as well as in some Italian dialects (though Irish and Italian share the Indo-European roots). Does anybody know if the Vietnamese is in any way connected?

'bo*' is pretty common for cows in European languages - they are the genus 'bos' after all (see: English 'bovine').

As for the Vietnamese roots, I thought it might have been a French loanword but it looks like it came from a proto-vietnamese language: https://en.wiktionary.org/wiki/b%C3%B2#Etymology_1

The Google etymology viewer is pretty good for this: https://www.google.com.au/search?q=etymology%20cow

I have only done the shallowest research, but in Swedish (and other Scandinavian languages) cattle are called "boskap". The etymology is shared with "bo", meaning dwelling (animal or human house). The original meaning of "boskap" is that which belongs to the house.

I wonder if it might be shared with the other languages, that cows and sheep share the name of the dwelling because they are part of it?

Looking further at the etymology of (Swedish) "bo", it seems it comes from the Proto-Indo-European "bhuh", to become, to grow into being, sharing the same etymology as English "be". In Swedish, "bo" is also a verb meaning "to live (some place)", e.g. "I live here": "jag bor här".

Edit: Actually, "be" has a mixed history it seems. Indeed it is mixed with "to reside". See here for more etymology: https://en.wiktionary.org/wiki/be#Etymology

Edit II: After a bit more of Wiktionary, "bo" has the exact same meaning as in Swedish in Cuiba, a language with less than 3000 speakers in Colombia and Venezuela. Wow!

When it comes to the cows though, my original research might have been misguided. Looking at the etymology for Latin "bos", it comes from a different Indo European root: https://en.wiktionary.org/wiki/bos#Latin

French had a big impact on Viet culture and, I believe, was in some part related to the romanization of the language. To me Viet seems like Cantonese written in Latin script, with borrowed French words. I don't know it very well, obviously.

Not quite what you are asking for, but in Danish you tell kids that a dog is a wuf-wuf-dog and a cow is a muu-cow, etc., basically concatenating the onomatopoeia and the normal noun together.

A few onomatopoeia have also evolved into nouns; for example, vov (wuf) is the basis for vovse (informal for dog).

Similarly, in English we use "yappy dog" to describe little Chihuahua-like dogs. You can drop "dog", or substitute it for a superlative of your choice, and everyone will understand what you mean.

The woof bark doggo

Ramachandran touches on this in his TED talk (starts around 21:20) about 'kiki' vs 'bouba'.


The similarities among mother/mom (English), madre (Spanish), ma (Chinese, sorry if the intonation is wrong), and ma/mata (Hindi) have perplexed me.

The earliest sounds that babies make tend to be "ma/muh" sounds, as they're smacking their lips to signal for food. Seems likely for language then to develop that sound into the word for "mother", the nearest object (food source as the case may be).

This pretty much explains away the entire article.

The progression of sounds that a baby learns is well understood, especially the physiological aspects of language development and acquisition.

If we accept as a premise that cultures would tend to assign words most relevant to babies according to their physiological capability, then it becomes obvious why words for mother, father, and even facial features would be strongly similar across cultures. And why the similarity will slowly diminish as the meaning of words becomes less important, functionally, for communication with a child.

And especially for simple words the phonetics would have very little to do with cognitive development (and evolutionarily-dictated language models), but be controlled almost entirely by a baby's physical constraints--e.g. huge tongue in a tiny mouth, poor motor control, etc.

It does explain similarity of words for familial relations, but not for things like "nose". Babies don't randomly say "nose", or something that sounds similar, while pointing at their nose.

No, but they learn the words for various body parts very early in life, usually before the age of 2 and beginning around 1-year-old. A 2-year-old also has significant physiological and neurological constraints on vocalizations.

There are also cognitive development constraints, of course.

The question is, from where do we get the correlation? If it's physiological or neurological, then there's nothing to support fanciful claims of innate language models. It's the cognitive constraints that can support fanciful claims, but you have to first show that it's those constraints, rather than the physiological or neurological constraints, that are controlling word choice.

Yeah, for a very, very, very small subset of words, the very first and most immediate things babies care about, this theory makes some sense.

For everything else, not so much.

100 basic words (the corpus of the study) is such a small subset, especially if you posit very strong correlations among subsets of those 100 words most likely to be used in the first year of life.

And especially considering that the researchers "reduced all sounds to 34 distinct consonants and 7 vowels." (Consider that the more we reduce and simplify the vocalizations the more likely we should expect correlations. For example, if we simplified all sounds to grunts we should expect tremendous overlap.)

The correlation of the corpus of 100 words isn't particularly surprising given what we already know about language development. The null hypothesis was that there'd be no correlation whatsoever (i.e. the sounds would be completely random), which I don't think anybody could reasonably expect.
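The point above about simplification can be made concrete with a small Python sketch. The six-sound inventory and the two ways of merging it are invented for illustration and are not the study's actual coding scheme; the function name is hypothetical, and the calculation assumes each raw sound is equally likely and that two languages choose independently.

```python
from collections import Counter
from fractions import Fraction


def chance_match_probability(class_of_sound: dict) -> Fraction:
    """Probability that two independently, uniformly chosen raw sounds
    fall into the same coarse class after the inventory is simplified."""
    n_raw = len(class_of_sound)
    class_sizes = Counter(class_of_sound.values())
    return sum(Fraction(size, n_raw) ** 2 for size in class_sizes.values())


raw_sounds = ["p", "b", "t", "d", "k", "g"]  # toy inventory, hypothetical

# No simplification: every sound is its own class.
no_merge = {s: s for s in raw_sounds}
# Merge voiced and voiceless pairs (p/b, t/d, k/g).
merge_voicing = {"p": "P", "b": "P", "t": "T", "d": "T", "k": "K", "g": "K"}
# Collapse everything into a single "grunt".
all_grunts = {s: "grunt" for s in raw_sounds}

print(chance_match_probability(no_merge))       # 1/6
print(chance_match_probability(merge_voicing))  # 1/3
print(chance_match_probability(all_grunts))     # 1
```

Each step of merging raises the chance-level overlap, which is the baseline any reported correlation would need to beat.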

Indeed. Especially when the article gives us this: "...the team found a lot more consistency across languages than they had expected." What does that mean? Were their original expectations reasonable, or did they merely re-discover the birthday paradox?
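For anyone who wants to see the birthday paradox in numbers, here is a minimal Python sketch. The function names are hypothetical, and the model assumes every option is equally likely and that choices are independent, which real languages certainly are not. The value 34 is the consonant count quoted earlier in the thread.

```python
def prob_shared(n_items: int, n_slots: int) -> float:
    """Probability that at least two of n_items independent, uniform
    choices among n_slots options coincide (the birthday problem)."""
    if n_items > n_slots:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_distinct = 1.0
    for i in range(n_items):
        p_all_distinct *= (n_slots - i) / n_slots
    return 1.0 - p_all_distinct


def smallest_group(n_slots: int, threshold: float = 0.5) -> int:
    """Smallest group size whose collision probability reaches threshold."""
    n = 1
    while prob_shared(n, n_slots) < threshold:
        n += 1
    return n


print(smallest_group(365))  # classic birthday case: 23
print(smallest_group(34))   # e.g. languages sharing one of 34 initial consonants
```

The takeaway is only that coincidences among many items drawn from few options arrive much sooner than intuition suggests; whether the researchers' expectations already accounted for this is not something the article tells us.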

I've heard that babies make the same 'ma' and 'pa' sounds regardless of context, and that it is vain parents who assume that the baby is addressing them :)

In Georgian, mother is "deda" and father is "mama", which lends further evidence to this :)

Three out of those four are Indo-European, and indeed the similarity between mother, madre & mata is no coincidence.

English: mom, dad

Arabic: ami, ab

Chinese: ma, ba

Zulu: umama, ubaba

It's interesting how they all use the same vowels for the names they use to call mother/father. Is there a reason they use vowels vs. say consonants?

Small children have trouble with pronouncing most consonants. Unformed palate, small tongue, lack of teeth.

The first two and the last are Indo-European languages; it appears that mother in Chinese is "mu".

It's "obaasaan" in Japanese, though. It'd become "baa" if you remove the polite syllables, and thus also ends up becoming a labial consonant (ओष्ट्य for those familiar with the verse in पाणिनी).

Curious indeed.

Note: Mother in Japanese is actually "okaasan" (or "haha"). "obaasan" is grand-mother apparently.

Yeah, mother in Chinese is ma. I have always been curious about this as well.

Link seems to be dead? Redirects me to front page of economist.com.

Thank you! We updated the submission.
