Why is NLP these days equivalent to Natural English Language Processing? There are many more languages in the world.



Short answer would be availability of tagged data, in turn dependent on historical research, which is a laggy history of who's been funding the research.

This is changing, thankfully; check out e.g. the European Language Resources Association's catalogues, http://catalog.elra.info/ or any of the very incomplete list on wikipedia: https://en.wikipedia.org/wiki/List_of_text_corpora or in fact the various international Wikipediae themselves!

As for this particular project being English only ... itch to scratch probably ;-)


> Short answer would be availability of tagged data,

Part-of-speech and syntax-annotated corpora have been available for a long time for many other languages than English. E.g. for German:

http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora... http://www.sfs.uni-tuebingen.de/ascl/ressourcen/corpora/tueb...

Dutch:

https://www.let.rug.nl/vannoord/Lassy/ http://lands.let.ru.nl/cgn/ https://www.let.rug.nl/vannoord/trees/

French:

http://www.llf.cnrs.fr/en/Gens/Abeille/French-Treebank-fr.ph...

I think that one of the problems is that NLP research of English typically has a higher chance of getting accepted at conferences.


On a related note: why are we putting up with other languages anyway? I've been asking myself for a long time why we (as in humanity) don't try to standardize on a language.

The language of the Internet is English for the most part, and since the Internet is the greatest facilitator of communication in human history, that standard language might as well be English.

I firmly believe the world would be a much better place if people from everywhere could communicate with each other without being inhibited by the language barrier.

I understand that this is not something that can be done overnight, but a lot of people learn English as a second language today, so maybe a switch could be made in a generation or two, as people who don't speak English as a first or second language slowly die out.


>On a related note: why are we putting up with other languages anyway? I've been asking myself for a long time why we (as in humanity) don't try to standardize on a language.

Because we value our languages and cultures, thank you very much?

Especially if you happened to be born a native English speaker, and you propose this from that vantage point, it comes across as borderline racist, and reminds me of this excerpt from Bernard Shaw's Caesar and Cleopatra:

"Pardon him, Theodotus: he is a barbarian, and thinks that the customs of his tribe and island are the laws of nature."

English the language of the internet? It is perhaps the language of YOUR internet. The Chinese internet is as big -- and there's the Indian internet, the Spanish speaking internet, and tons more besides.

Why not just get rid of other cultures too, and just keep anglo-saxon culture?

Imagine how much easier things could be then: not only would we have one language, but we could all relate to the same stuff...

Oh, and about English -- its dominance in the last centuries is just a historical accident, caused by British and then US power. Even the very term used to describe such a phenomenon, "lingua franca", is not English. The main language used to be Greek, then Latin, then French and English, etc. And that's mostly in the Western part of the world (Europe, US etc) -- which westerners tend to think is the whole world, but it's hardly 10-20% of the global population. The "lingua franca" could again be very different in 1 or 2 centuries...


For what it's worth, I'm not a native English speaker and I don't live in an English-speaking country. However, I think I am "allowed" to voice this opinion without being labeled a racist, regardless of my native language.

I still think that there should be a standard language.

I really don't like that language is often conflated with having a distinct culture. I acknowledge that there is some connection between language and culture, but I don't think having a distinct culture necessitates a distinct language. There are many distinct cultures that share a language already. Even if that was the case, what would be wrong with cultivating a world wide culture? Also, some languages like Latin died out, while we still benefit from its speakers' cultural achievements.


>I really don't like that language is often conflated with having a distinct culture. I acknowledge that there is some connection between language and culture, but I don't think having a distinct culture necessitates a distinct language.

Having a distinct culture doesn't necessitate having a distinct language, but having an existing distinct culture in a specific language necessitates preserving the language for preserving it -- even just the tons of literature, poetry, song, etc, in that culture, not to mention worldviews and philosophical outlooks expressed in words, etymology, proverbs, shared expressions, etc.

(Also cultures with distinct languages are also more distinct than cultures with shared languages -- except of course if the languages are close cousins of the same family and the cultures have been co-existing for centuries already. An english speaking China will be a very different culture).

And we haven't even mentioned the Sapir-Whorf hypothesis: https://en.wikipedia.org/wiki/Linguistic_relativity

>Even if that was the case, what would be wrong with cultivating a world wide culture?

You mean besides obliterating every existing culture and using it for parts?

Everything that's wrong with monocultures of all kinds and singular ways of thinking and doing things (less evolutionary adaptable, less competition among different viewpoints, poorer breadth of experiences and outlooks, etc)


> Having a distinct culture doesn't necessitate having a distinct language, but having an existing distinct culture in a specific language necessitates preserving the language for preserving it

I'm not saying other languages shouldn't be preserved. I'm saying there should be a standard language. I don't think that one precludes the other.

> An english speaking China will be a very different culture

Maybe, but why would that be worse (or better for that matter)?

> And we haven't even mentioned the Sapir-Whorf hypothesis: https://en.wikipedia.org/wiki/Linguistic_relativity

That's an interesting point, but on the other hand I'm still not convinced that this means that having a lot of distinct languages is "better". While on the other hand, there would be clear benefits to everyone speaking the same language or having at least one common language.

> You mean besides obliterating every existing culture and using it for parts?

Isn't that what always happens with all culture as it evolves?

> Everything that's wrong with monocultures of all kinds and singular ways of thinking and doing things [...]

This is where I strongly disagree. I think that some things are tools for a specific purpose and should be standardized. Language is a tool to convey facts and ideas.

I recognize that culture should not be standardized, of course.


do not forget the fact that different languages have different cultural concepts embedded in them.

weltschmerz. schadenfreude. schmetterling.

you simply can not say this word in english without explaining it. if you do not understand german you will also have a very hard time getting the emotion that is embedded into all of us german speakers when we hear those words. there are tons of examples like this in every language.

i also personally think that learning german, english and russian was an enormous benefit to myself, some thoughts i think in english, some in german and very seldomly i even think in russian. the fact that i also think javascript, php, python and lots of other programming languages helps me even more to better be able to transpile my thoughts into words and actions.

you can not articulate thoughts that you do not have words for. if society has less words, society has less variety of thought. it follows that, if you have less words, you have less variety of thought.

in the end i think that instead of declaring a standard language or creating a new standard language (https://xkcd.com/927/ anyone?) what will happen is that our currently used languages will merge into one language that is influenced heavily by all currently spoken languages.

the situation will stay the same, most people will not be able to communicate efficiently, this is not the fault of language though.


As a casual student of German, is there any context in “der Schmetterling” which is not translated as “the butterfly”?


if you think about the butterfly effect, calling it schmetterlingseffekt just sounds way more badass :p

but you are right, schmetter does not originate from "smasher" (as i and probably most german speakers assumed), but instead from "Schmetten", which means butter or cream.

this also means that this word is translated perfectly, but the german word has a whole different meaning nowadays, i personally have never heard the word schmette.

thanks for the motivation to look at this, feeling a bit smarter now and finally know why the schmetterling is called butterfly. :)


>> the Sapir-Whorf hypothesis: https://en.wikipedia.org/wiki/Linguistic_relativity

> That's an interesting point, but on the other hand I'm still not convinced that this means that having a lot of distinct languages is "better". While on the other hand, there would be clear benefits to everyone speaking the same language or having at least one common language.

Would the computer world be "better" if everyone standardized on one single programming language (e.g. java), or is it "better" that different languages are there (dartmouth basic, javascript, python, lisp, haskell or perl) so you can code differently?

As for the universal language, Esperanto tried: https://en.wikipedia.org/wiki/Esperanto a nice documentary about it: http://esperantodocumentary.com/en/about-the-film


a nice project is the Positive Lexicography Project, an evolving index of 'untranslatable' words related to wellbeing from across the world's languages: http://www.drtimlomas.com/#!positive-lexicography/cm4mi

(previous discussion: https://news.ycombinator.com/item?id=11001695 )


You can have wildly differing viewpoints inside a culture. A Democrat and a Republican can be further apart than an American and a Frenchman. Culture and identity are complex phenomena. You cannot reduce a person's identity to a single label without removing almost every aspect about them. A global culture would be a baseline for mutual understanding, but it would not cause homogeneous identity any more than a person's citizenship does.


From a logical standpoint, standardising language certainly works as an idea, but humans are not so logical! The history of language is that of languages merging and diverging. It kinda depends a lot on who you want to talk to at any given time.

It's not clear that language is inseparable from culture; there are arguments to be made that language shapes thought (see "1984" for an extreme thought experiment!), and culture might well be shaped by the network effects of such frameworks.

As for cultivating a world wide culture, perhaps we can already see it happening! And perhaps global language would follow. A few years back there were great claims made that in a century we'd all be speaking so-called "Panglish", the sort of global pidgin that would emerge from mashing lots of languages into English. Maybe ... maybe not. If there were a "standard language", chances are it would be at least partially based on English, and supplement local languages. But it depends entirely on whether everyone really cares about understanding each other, to the extent of learning a common tongue.


I agree with most of what you have written except this part:

but humans are not so logical!

I don't believe there is anything illogical about the diversity of languages. No different from the existence of multiple types of programming languages.


Perhaps instead of the literal "but humans are not logical" he meant "but languages are not precise". How ironic that would be.


Sometimes I wonder if this kind of notion of what's "logical" is a confusion of short-term instrumental thinking with what is actually successful in the long run. In many biological systems, for example, diverse evolutionary strategies play out in competition and exchange with one another with the end result that "life" more generally persists and grows in complexity on a grand scale. Humans seem to have an inverse tendency towards projecting uniformity wherever possible, with the end result being limited models that subsume as much as possible and then collapse under an unexpected pressure (the structuring of Western economies being the most obvious example of this). I wonder if we'll ever develop some kind of meta-logical approach whereby we intentionally foster diverse, but somewhat isolated, strategies and then experiment with varying levels of exchange to arrive at better outcomes.


Call me a racist too (I am mocking your misuse of the term here) but I never saw the value in having multiple languages. It is not enriching our culture in the same way different cuisines do, you cannot easily savour a poem in a different language. For the absolute majority of people their native language is just a necessity because their adult brains are mostly unable to learn a new language at the same level as their mother tongue (unless you spend many years in that culture). IMHO different languages are equally annoying, useless and impossible to deprecate as different calendars or metric systems. On the other hand it is a little tragedy that English with it's ambiguous spelling and pronunciation become the lingua franca of the day.


> On the other hand it is a little tragedy that English with it's ambiguous spelling and pronunciation become the lingua franca of the day.

*English with its ambiguous spelling

On the other hand, Mandarin is hardly an improvement when it comes to avoiding ambiguity. Any other language would likely not be big enough to make it to lingua franca status anytime soon.

Again, English winning out so far is already a point for the good-enough, and exterminating other languages would mean giving up on better designs in favo(u)r of standardization on a suboptimal thing.

Is it cool that I can code JavaScript / Browser API almost everywhere these days? Hell yeah! Do I want to give up things that other languages offer (types, concurrency, non-web sockets, manual memory management where it makes sense) so we can all speak the same language? Maybe not.

I think efforts should go towards getting better at supporting diversity, rather than trying to water down everything to the lowest common denominator. Sure it's more effort, but it's not like you don't get anything for it in return.


What languages are you familiar with?

There is a lot of culture tied to languages. English happens to be a melting pot of many other languages and cultures so perhaps you don't see it as much with English.

But there is a lot of fascinating history and context tied to other languages.


Let's give the idea a moment's thought.

Somehow in the minds of the world's population, while they are sleeping, you replace their language with a variant of English with culture-specific loan words and extensions (for the closest approximation, think of French/Portuguese? schooling in the colonies, or in France for that matter).

And let's propose we rather brainwash ourselves with 'their' language rather than the other way round, to avoid various accusations.

And let's propose we construct great internet firewalls, employ sharp-eyed border guards and have strict controls over the movement of people into and out of the country to maintain cultural homogeneity (look to history for precedents).

Practically almost any phrase can be translated into another language with absolute equivalence. A great army of translators translates all significant works of literature and media into the new language.

Culture is preserved yet we can now communicate between ourselves better? Any observations? (I'll think about it too.)


I can't see how it would change anything if I happened to use Hindi (for example) in my everyday life instead of English, all other things being the same. Watch Seinfeld in Hindi, surf the internet in Hindi, talk with my friends in Hindi.

I think languages serve as natural barriers to cultural homogenization, which otherwise tends to occur.


So how many languages do you speak? And no, english + some latin language doesn't count as "many".

There are many ideas which are impossible to express in english because it's not only lacking the words but the culture itself. Calling for standardization on a language is calling for the death of cultures you do not understand.

And while that's not "racist" as it's not about race, it's quite literally xenophobic. You're afraid of something because it's alien to you.


>You're afraid of something because it's alien to you.

Ugh, don't make stuff up like this to try to prove some point. This discussion is about getting rid of language barriers and has nothing to do with fear of a language. I want standardized language as well and I wouldn't care if it meant I had to learn a new language. So it's not xenophobic at all.


"IMHO different languages are equally annoying, useless and impossible to deprecate"

Not understanding different cultures, thinking their very existence is "annoying and useless" and thus wanting them gone? Yeah, that's being afraid of something alien. And "afraid" is a nice way to put it.

I speak 4 languages fluently and learn more because it's fun, it's beautiful, it teaches you a lot about people and culture and it even improves your understanding of languages you already speak. It's also a wonderful way to immerse yourself into a country you're moving into for the first time. This thread is filled with people who lack that perspective and think their opinion is informed and considerate because "I know I might have to learn a new language and that's fine". Let me remind you how this thread started: "why are we putting up with other languages anyway?".

We already have a lingua franca in the western world and it is English. It's as good as it can get. But the replies here, unhappy with that, boil down to "let's make it official and declare every other language as dead, then let's force that onto the non-english-speaking world".

I used to think like this, btw. I wondered why we bother with different languages. Why going to another country had such a high artificial barrier of entry. Why we didn't just agree on one and leave languages to "hobbyists". But I was 8 years old and didn't know any better. I expect more from HN.


Stop projecting! Just because you used to be afraid of other languages does not mean people cannot criticize the economic and social barriers languages put up. Calling an idea xenophobic because you disagree with it shows a complete lack of objectivity.

Would you call the creation of the Euro or even bitcoin xenophobic? They both were designed to eliminate barriers imposed by differences between countries as well.


I'm not calling the idea xenophobic. It's misguided but has potential. The reasoning various people are giving for it here is another story however. The lack of any attempt to understand what languages have to offer to humanity, and just passing them off as "useless", is extremely xenophobic.


> There are many ideas which are impossible to express in english because it's not only lacking the words but the culture itself.

This makes me really curious: do you have an example of this or would the explanation of an example in English lose the essence of the explanation?


It will mostly lose it, yes (like explaining a joke).

Because it's not just that a word exists in some language X that doesn't exist in language Y (you can always explain the word's general meaning with a complete phrase or two).

The most important part is rather having that notion in the language X as a handy thing to use -- and thus being able to shape concepts and phrases around it, and sharing an instant common recognition for that notion with others.

That said, there are several very good books on the subtleties of translation and language concepts in general, but one I suggest for the HN crowd would be "Le Ton Beau De Marot: In Praise Of The Music Of Language" by Douglas R. Hofstadter (of "Gödel, Escher, Bach" fame).

http://www.amazon.com/Ton-Beau-Marot-Praise-Language/dp/0465...


I'm not sure if the main point is that things can't be explained, but rather that a good explanation needs to encompass an (impractically) large section of the culture surrounding the term. Just as you can learn a new culture and language by immersion, that immersion can be approximated by language (one of the reasons why reading books is such a great thing!).

I'm not sure if these are examples of things that cannot be translated or explained in a foreign language, but they are a few things that came to mind:

You can't really translate Japanese haiku to English, only approximate them (well, some of them).

The Samic language (and probably others, at least I'd expect some native American/First Nation languages to have a similar term) has a word for the state between being awake and being asleep/dreaming: adjagas.

In Maori, the idea of boiled food is tightly connected to the physical world, so much so that boiled fish/food is considered anti-magic (or so I've been informed by a linguist).

The Norwegian word "koselig"/"å kose seg" is hard to translate to English. It's related to "cozy"[k], but carries a deeper meaning that is deeply connected to Norwegian culture (or so a lot of foreigners seem to think).

In martial arts, the concept of distance can be represented by the Japanese term "maai"[m]. Is this really so different from similar concepts from fencing? I'm not sure, but the meaning is certainly deeply connected to the culture and concepts of martial arts.

You'll note that I've not listed any English concepts that I see as hard to translate or explain in other languages. This is partly because thanks to Hollywood (and other cultural exports, like books), many ideas from (American) English are already part of many cultures of the world - at least those with which I'm mostly familiar: Norway and Japan. I suppose the custom of asking how someone is doing, while not actually wanting to know how they're doing, could be one such example.

[k] http://www.lifeinnorway.net/2015/02/a-visual-guide-to-koseli...

[m] https://en.wikipedia.org/wiki/Maai


> And while that's not "racist" as it's not about race, it's quite literally xenophobic. You're afraid of something because it's alien to you.

    (or (disagrees-with person thing)
        (disinterested-in person thing)
        (dislikes person thing)) => t

    (equal (or (disagrees-with person thing)
               (disinterested-in person thing)
               (dislikes person thing))
           (afraid-of person thing)) => nil

This is the same kind of irrational, manipulative, deceptive talk used to ridicule political opponents.

"You dislike x? So you're xphobic."

"You disagree with x? So you're an xphobe."

"You don't support x? So you're against x rights."

It's hollow and transparent, but sadly, many people fall for it.


So what do you call "wanting the death of cultures alien to you because you don't understand them"?

I see you complaining elsewhere in the thread about the use of the term "racist". Yes, I'm with you, it's not racist. I said it was xenophobic in answer to that. People confuse the terms a lot because often enough they can be used interchangeably. But the very etymology of xenophobia is "fear of what is alien".

This thread is a very, very clear example of that. You don't like it maybe because you don't see it, or you don't understand how inconsiderate it is to take somebody's language - an essential tool of communication, built over thousands of years with immense and incredibly diverse legacies - call it "useless" and want it gone.

I would love to discuss the practicality and side effects of "standardizing on a single language". Why it's not that simple. Why it won't have the effects you'd be looking for. Why languages even exist in the first place. But this thread started rotten and got worse - nothing good can come of it anymore.


English borrows heavily from other languages.

For example Hawaiian Pidgin uses many words that don't have a clear English equivalent. This seems like it is a common occurrence in other places.

American English will keep evolving thanks to immigrants and cultural progress.


> It is not enriching our culture in the same way different cuisines do,

Speak for yourself. Learning foreign languages has enriched my life very much.


I prefer to use my regional language when talking to people in my region, and doing local transactions, conversations and contracts in my region, while using English to browse on the internet and use any kind of software.

I think it is much more of an insult to keep things as they are, where developers everywhere try to translate software and websites to my regional language, and the result is catastrophic. The level of the translations from English to Portuguese I'm forced to read on the internet is a terrible insult to Portuguese, a última flor do Lácio, inculta e bela, it is the horror, the horror.


How is it racist?

> Especially if you happened to be born a native English speaker, and you propose this from that vantage point

This is slightly racist to me.


That word does not mean what you think it means.

In fact, today it doesn't mean anything, because it means whatever the speaker wants it to mean. It's used simply to denigrate whoever the speaker dislikes or disagrees with, without making a rational counter-argument.

    Speaking a language != being a member of a race

    Preferring a language != being prejudiced against members of a race

    "That's racist!" != it being racist


So, a native english speaker proposing all countries drop their languages and adopt english wouldn't be racist to you -- but someone calling that out, would?


I'm saying it shouldn't matter if the person proposing English as a world language is a native English speaker or not.

Aside, this "vantage point" logic seems to be similar to the logic that only some people (white people, for example) can be racist.


Though I wish the world language was Indonesian (high learnability), it's not.

The British empire was the largest spanning empire in world history.

English is the world language, like it or not.

The country with the highest number of English speakers is not even in the Western hemisphere!


>The British empire was the largest spanning empire in world history.

Thus far.


>>English the language of the internet? It is perhaps the language of YOUR internet. The Chinese internet is as big -- and there's the Indian internet, the Spanish speaking internet, and tons more besides.

These other internets that you have mentioned are not very significant outside their own little spheres. But the English internet is the original one and the most important internet out there. English has become the de-facto standard language of modern communication.

I am all for making English "the" standard language for the world. History is full of countless cultures/languages getting destroyed at the hands of "time". I am very happy that unofficially English has already become "the" standard language for the world.

Disclaimer: I am an Asian with English as a second (actually earlier as a third language). But due to English and due to English internet alone I could educate myself to a large extent and could come out of the clutches of the barbaric and irrational religious preachings/scriptures.


>These other internets that you have mentioned are not very significant outside their own little spheres.

Those "little spheres" can be 2-3 times the US market. And the rest of the English internet is of not much use to most people in their day to day lives, outside businessmen and techies exchanging business communications and technical information.

>Disclaimer: I am an Asian with English as a second (actually earlier as a third language). But due to English and due to English internet alone I could educate myself to a large extent and could come out of the clutches of the barbaric and irrational religious preachings/scriptures.

Well, not really representative of Asians at large then. There are always people that adopt a new culture and frown upon their old culture -- there's also whole thing called "internalized racism" -- feeling your ways are inferior to those of the dominant culture.

https://en.wikipedia.org/wiki/Internalized_racism


>>Well, not really representative of Asians at large then.

I didn't say that I am representative of Asians. I mentioned it to emphasize the fact that I am NOT a Western English speaker.

>>There are always people that adopt a new culture and frown upon their old culture -- there's also whole thing called "internalized racism" -- feeling your ways are inferior to those of the dominant culture.

Then I'd say that there are people who like to cling to their cultures/ways, however rotten/backward their cultures/ways/customs might be, just because they have become emotionally attached to them.

Meaningful progress can be made only if we look at our ways (cultures/customs) critically and, if required, throw part or all of them away. Today there is so much knowledge available in English that if a person (or better, any sizable group of people) adopts the English language, he/she gets huge benefits out of it.

>>outside businessmen and techies exchanging business communications and technical information.

Most societal progress happens due to technical innovation/progress and business communications. Here also English dominates the world. Take for example Wikipedia. The English one dwarfs the wiki in any other language.


I agree with you, and English is not my primary language (nor is it an official language of my country).

The medievals had Latin. We have English, and people do have incentive to learn English (while still keeping their regional languages) because almost all knowledge is available in English on the internet.

But we see everywhere efforts to "localize" everything from English to other languages, and this is ridiculous. Wikipedia in languages other than English is awful for almost all topics, websites I visit for the first time keep showing me badly-translated versions of their primary English content, and I can't get MDN documentation to be shown in English, instead of the horrible translations made by volunteers that they automatically show to me.


Wikipedia in French and Wikipedia in German are very good. Each of them has a third as many articles as the English Wikipedia (in 2012, source: http://royal.pingdom.com/2012/06/13/the-biggest-and-busiest-... ). Note also that the German Wikipedia had as many edits as the English one, with fewer articles, which could be a good indicator of the (good) quality of the German Wikipedia.

More generally, there are plenty of very high quality websites in other languages than English on the Internet.


In addition to this, Wikipedia in Swedish is great and has 1m+ articles in a language spoken by <10m inhabitants.


I'm a native English speaker. I don't really know any other language, though I've studied a bit of the romance languages.

I don't think English is a good "standard" language. Despite widespread usage, there are a lot of problems I see with it when compared to other languages.

For example, the word read. Which read did I mean? If I spoke it, and meant the other read, was I talking about the colour red? I know, context and all that, but the inconsistent, broad rules often based on context alone aren't that great. I imagine computers would have a bit easier a time with a language with a more consistent structure.

I think a good parallel lies in programming languages. PHP is widespread on the web. Should we make it the standard? If I called for that, I'd be served a link to how PHP is a fractal of bad design or similar with a resounding no and perhaps a bit of shame to finish things off nicely.

That and in England the whole "St George's day hurkadurr" crap is pretty tedious. I don't want to see memes like "English is the world standard language. Why do we need a Polish shop in ENGLAND?!"


This is a constructed language designed to solve most of those problems:

https://en.wikipedia.org/wiki/Esperanto

People have been pushing for years to make it the standard worldwide language.

Incidentally, you can use reddit in Esperanto: https://eo.reddit.com


The biggest issue with Esperanto is that (from the wp page): "The phonology, grammar, vocabulary, and semantics are based on the Indo-European languages spoken in Europe."

While that puts it in the same boat as English - it strikes me as slanted heavily towards the "European dialects", rather than say, Farsi.

I actually think Japanese might make as good a "universal" language: it's an old language - and has few sounds. I think (but am not certain), that one would be hard pressed to find anyone that wouldn't be able to distinguish all the phonemes of Japanese (while many have issues distinguishing between "tap" and "flap" ("r" and "l" - right vs light)).

A big caveat is that I'm completely unfamiliar with African languages in general, as well as "old" American ones (both things like Navaho and Tupic or Macro Je). For use as a "universal" language, one would probably want to limit the vocabulary and reform the writing system in some way though. Perhaps choosing a subset of the existing Kanji and updating them to be more classic iconographic again.


Esperanto came to mind when reading through the comments here. I first heard about it around 10 years ago but have not thought about it since. This certainly renewed my interest!


I actually took an Esperanto class when I was a kid. I'm old now, so that was a long time ago, and it never went anywhere in all the intervening years. I have now learned a tonal Asian language and I don't see how Esperanto would ever be adopted by people who speak such languages.


Is this comment for real? I'm really confused

If we were to standardise language it would only make sense to standardise to the one spoken by most people in the world and last I checked it was Chinese and Spanish.


Yes, my comment is for real.

I disagree that the language of choice should be Chinese or Spanish, because as far as I can tell more relevant content exists in English.

If however a standard language would be chosen and it was Spanish or Chinese, I would still be in favor of it.


The reason that most relevant content exists in English 'as far as you can tell' is because that is the only content you have looked for.

If I only spoke Chinese, I would be able to make the exact same claim: that more relevant content is in Chinese. It would be all the content I would see (since I would never be searching for English content, and couldn't read the stuff I did find).


I'm a Spanish native speaker and I think that one of the pros of English is that it's simpler. English grammar is much easier. Also English is the main language of scientific publications and the most used language on the Internet.

I think that a language is just a tool and I would evaluate it only by its effectiveness for sharing ideas.


I am not a native speaker of either English or Spanish, but I have learnt both languages. I don't think English grammar is easier than Spanish (but that is debatable). But I think everybody can agree that the pronunciation of Spanish is much easier, as Spanish is pronounced as it is spelled.

Why not use an artificial language specifically built for simplicity, such as Esperanto?


At the end of WW2 Indonesia needed an official language. Instead of picking Javanese, they picked a language spoken by relatively few people because of its simplicity.

Indonesians are all bi or trilingual.

Indonesian is the world's simplest language. It should be the official world language!


There is something delightfully charming about this idea, I love it. I had a friend who spent some time in Indonesia and told me how it was possible to pick up useful language skills in days.


After high school, I saved up and spent 4 months traveling across Indonesia. I was almost fluent in it by the time I made it to Sumatra.


Yes, Spanish is very easy to pronounce, unlike English. I think English grammar is easier because it has simpler verb tenses.

Esperanto or Ido may be the ideal languages. Still, many people are attached to their languages and cultures, and I think the only chance for a standardized world language to be successful would require coordinated efforts and willingness by various governments, which is extremely unlikely to happen.


Why not use an artificial language specifically built for simplicity, such as Esperanto?

There are barely any users and there is only a small amount of content available.

I do agree that it's probably one of the easiest languages to pick up, since it is so regular (largely one suffix per word class, regular inflection, etc.).


I remember trying to look up documentation for nginx in its early days. Almost everything was in Russian, and with links and discussion boards also in Russian. That was an eye-opener for me, realizing how it feels when important software/internet information wasn't available in any of the languages I understand, and also that there really are huge parts of the internet that you normally don't see as a mostly-English reading person. Particularly Russian, Japanese and Chinese.


I'm not sure we can accurately gauge this number. A large part of the internet and usage of technology (especially programming) are all in English. I would be shocked if a majority of the people in the world didn't know at least some English and if more people could speak English.

Most of these statistics only count people's primary language. But I don't think that's a clear indication of how many people can actually speak each language.


The relevant wikipedia page [0] puts it thusly:

> For example, English has about 340 million native speakers but, depending on the criterion chosen, can be said to have as many as 2 billion speakers.

This statistic is based on research[1] from 2008 which summarizes its abstract with this sentence:

> In short, we have moved in 25 years from a fifth to a quarter to a third of the world's population being speakers of English.

So, not quite the majority.

[0] https://en.wikipedia.org/wiki/List_of_languages_by_total_num... [1] http://journals.cambridge.org/action/displayAbstract?fromPag...


Moreover, I am not sure how accurate these numbers are in practice. E.g. I live in Germany. Most Germans (in West-Germany) had English in high school. So, according to these statistics they are probably part of the 2 billion.

Since I am not a native speaker of German, I sometimes try to start a conversation in English. For the vast majority, speaking English is difficult or they don't speak English at all, including people with a higher education (e.g. GPs).

I can fully understand, since I had five years of German in high school, but because I had rarely practiced afterwards, I had similar difficulties speaking German.


They might not be fluent, but most Germans could have a basic conversation in English.

English is the global language. A Korean pilot working for a German carrier landing a plane in Costa Rica will be speaking English.


> Spanish

Are you counting by native speakers only? Because I don't think this is true, both by intuition and by what Wikipedia seems to say.

https://en.m.wikipedia.org/wiki/List_of_languages_by_total_n...


I can't disagree more.

The language of _your_ internet is English. Personally it always bothers me to think of how the largest parts of the internet are essentially walled off by language.

China has essentially its own version of everything we use day to day, so does Japan, Russia, LATAM ...


But that's exactly my point.

Instead of having a bunch of barely connected Internets walled off by language, why not standardize language and have one global venue of communication, where everyone can work together?

I am German and I often come across this with my friends, all of whom speak English as a second language. Yet they consume mostly the German parts of the Internet out of habit and convenience. But as far as I can tell, they are worse off for it, since they don't have access to a lot of the stuff the Internet has to offer.

That is especially obvious when searching for IT/tech/science topics, but also relevant in other contexts.


I couldn't agree more! As a German, all my friends always ask me why my phone is in English and they just won't understand how much more the "English Internet" has to offer.

Just the other day, we had to look up something for our school and I simply summarized the first few lines of the English Wikipedia article and my teacher was amazed how quickly I got that information.

Furthermore, English allows you to see topics from many different views – people from all over the world express their opinions, which is much more than most German journalists find relevant/agree with.


I'm the opposite. I'm a native English speaker but I put my phone / computer in German so I can keep up my language skills. It constantly teaches me new constructs in German just by using my phone, and by this point most of the concepts have become second nature to me.


When the great writers of 1950s envisioned the future of the computers, they depicted a giant centralized repository of all human knowledge, working tirelessly for the benefit of the society, because it was simply unthinkable to them that such a great power would be used for anything less grand.

Instead, (borrowing a popular internet meme) we use the computing power to throw birds at pigs.

I think you're misguided in looking at the Internet as "one global venue of communication." Users of the Internet don't need a network for global cooperation. They don't need a better world. They need a tool where you can talk to your friends and send them a funny cat video (or the latest kernel patch, or whatever they feel like sharing at that moment).


This will be solved as computers become better and better translators. We will be able to read, write, speak, and listen in whatever language we want, and computers will translate between them for us.


Same here. The "Brazilian internet" is terrible, and I see Brazilians who do speak English only browsing the internet in Portuguese, everywhere.


>Instead of having a bunch of barely connected Internets walled off by language, why not standardize language and have one global venue of communication, where everyone can work together?

Because I don't want a huge monoculture, and the end goal is not to have "everyone work together in one way" (which sounds like something from the Third Reich), but to let a thousand flowers bloom.

Ease of doing business is not the be all end all criterion...


First off, I disagree that a common language means a common culture. There are a lot of distinct cultures that share a language.

Personally I don't care that much about the ease of doing business. Rather, I think standardizing on a language would have a real benefit for humanity as a whole.

Apart from practical considerations, think about how often people from other countries are somewhat dehumanized as an "other" or "the enemy" without you being able to get a view of the situation from the other side of the fence because of the language barrier. I'm wondering for instance if the Cold War could have happened in a world where both parties share a language while being connected over the Internet.

Also, congrats on the unnecessary Nazi comparison.


The Nazi reference was unfortunate, but coldtea does bring up a valid point. Languages tend to become dominant because of military conquest or cultural hegemony, not because people want a language other than their own to be the lingua franca. English dominates the West because of the British Empire and because the US was able to exercise overwhelming cultural and economic power after World War 2.

Any discussion of "standardizing" the world on a language has to take into account how often attempts to do so have been employed as cultural genocide by colonizing powers, explicitly to separate native people from their culture.

To attempt to answer your original question, "why are we putting up with other languages anyway," we don't "put up with" them, because languages other than English are not a burden. Other languages persist because no one has forced their extinction yet.

And even if you "preserve" other languages, having a lingua franca means that eventually native languages will die out because they're simply no longer useful. How many Irish people actually speak Irish? How useful is Japanese outside of Japan? When those languages do die out, as they probably will, what will be lost when all of their literature and cultural referents are translated into English?

I'm not saying having a global common language wouldn't be good for a number of reasons, only that that commonality necessarily comes with a price not everyone is willing to pay.


>Also, congrats on the unnecessary Nazi comparison.

Not meant to offend (I didn't even know the parent was German at the time; I thought I was replying to an American), but it wasn't a gratuitous reference either. It's not like someone in an internet forum calling someone a "nazi" because they disagree with them or anything.

I legitimately believe that there are plenty of legitimate lessons to be learned (to avoid, of course) from the Nazis, as WWII and the Holocaust is the single most deadly and morally disastrous event of the modern era.

And these kinds of schemes for "one global government" / "one common language" etc. do have parallels and historical precedents in ideas such as those of the Third Reich. An occupied, German-speaking Europe, if not world, was indeed one of their stated goals.


> Not meant to offend (I didn't even know the parent was German at the time; I thought I was replying to an American), but it wasn't a gratuitous reference either. It's not like someone in an internet forum calling someone a "nazi" because they disagree with them or anything.

The comment you took your quote from already mentioned that I'm German.

That's alright though. I'm not "offended" by the Nazi comparison because of my German heritage. I just wanted to highlight what I perceived to be needless hyperbole that inevitably derails the discussion about the idea at hand by way of Godwin's Law.

> I legitimately believe that there are plenty of legitimate lessons to be learned (to avoid, of course) from the Nazis, as WWII and the Holocaust is the single most deadly and morally disastrous event of the modern era.

Agreed.

> And these kinds of schemes for "one global government" / "one common language" etc. do have parallels and historical precedents in ideas such as those of the Third Reich. An occupied, German-speaking Europe, if not world, was indeed one of their stated goals.

Your argument is basically: "The Nazis did X, hence we should never do X - for all X". I think this is a fallacy. There has to be a difference between subduing Europe/the world and proposing to standardize language as a tool of communication for greater peace and more collaboration. I don't think choosing a standard language is any more sinister than choosing standard units of measurement or base 10 numbers.


>The comment you took your quote from already mentioned that I'm German.

Hmm, you're right, I kind of read past that! I was still thinking I was replying to an American, as in my first comment.

>Your argument is basically: "The Nazis did X, hence we should never do X - for all X". I think this is a fallacy.

That would be a fallacy indeed, as the Nazis also did some good stuff (cheap cars, some good welfare laws IIRC, etc), and also neutral stuff.

But I didn't say that for "all X" -- only for specific X which I, for one, think are part of the Nazis' bad heritage -- "one global monoculture" I'd say is one of these, even if the culture is the English one.

>I don't think choosing a standard language is any more sinister than choosing standard units of measurement or base 10 numbers.

I guess our difference is mostly in how important we see language as part of culture and/or how sensitive we are to mightier cultures/languages taking over others.

For me losing a language would be as bad as losing a country's literary corpus -- that is, a huge part of the culture (where another might consider the whole literary corpus as something to be dispensed with, or not a big deal if it just survives translated or is even forgotten).

On the other hand, something like "units of measurement" I agree are inconsequential -- and could be unified without much impact.


> I guess our difference is mostly in how important we see language as part of culture and/or how sensitive we are to mightier cultures/languages taking over others.

I think you are absolutely right, in that this is the key difference between our respective opinions. I nonetheless enjoyed the surrounding discussion.


Yes, and it was definitely one of the more looney parts of their world domination plan. The end goal of Lebensraum was not making everyone German, it was making everything more accessible to Germans.

It was a racist ideology (in the real, actual sense of racism, not the conspiracy theory academia is being haunted by in the US at the moment). Untermenschen you teach German are still Untermenschen -- they just make better servants because they understand your language.

Saying the idea of one global language is reminiscent of Nazi Germany is either gratuitous or betrays a fundamental lack of understanding of just how dangerous the Nazi mindset was.

I too believe there is much to learn about our history to avoid making the same mistakes again but if you want something to worry about with regard to re-creating a Third Reich situation consider these two things happening in the US right now:

1. A resurgence of jingoism and ultra-nationalism (from dehumanizing civilian casualties in drone strikes all the way to Trump's treatment of Muslims).

2. Racism and sexism becoming socially acceptable under the guise of "intersectionality" (partially as a reaction to actual historic injustice and religious fundamentalist attitudes to gender roles).


It's funny how common a mistake this is. So often people try to distance themselves from the Nazis by using the phrase "let X flowers bloom". Unfortunately, that was a phrase used by Mao to flush out dissidents. I get what you're trying to say though. ;)

http://www.phrases.org.uk/meanings/226950.html


>Unfortunately, that was a phrase used by Mao to flush out dissidents.

Yeah, I know -- there have been other historical breakdowns of the phrase in HN.

That said (and to be the usual pedant I am):

1) Even if it was used to flush out dissidents, it would be the abuse of the phrase for a sinister purpose that would be the bad thing, not the phrase itself.

2) It's not that clear cut that Mao meant it in a cunning way. Even the link alludes to that: "Whether or not it was a deliberate trap isn't clear".

3) It's not even that clear cut that Mao was in the wrong in the first place -- revolutions and changes to whole empires are frequently bloody and rarely judged like normal historical periods, especially when they involve civil war et al. If one man killed 2,000 men, women and children we'd consider them a batshit crazy serial killer. Someone like Truman though can order the killing of 200,000 people in two Japanese towns and still be considered a legitimate leader, not even a war criminal by many.


That's exactly the point.

People should stop wasting their efforts to maintain versions of the internet in other languages.


Cool, let's standardise on Mandarin. What? You can't speak it? Well, stop wasting your time maintaining the English version of the internet.

You see how discriminatory that sounds? That's the reason there's not a single global language for the internet or otherwise.


This isn't about discrimination or other trendy buzzwords, just practicality. English, for better or worse, is the standard language of the wider Internet, not to mention many professional fields.


In the western hemisphere perhaps.


Incorrect. More people speak English in India than any other country in the world.

English is the global language (for now), like it or not.


While your nitpick is technically correct, we were talking about the language of the internet, where Chinese pretty much dwarfs English.


> where Chinese pretty much dwarfs English.

Do you have any evidence to back that claim?

There are lots of native Chinese speakers, but remember that English is the most popular second language in the world.


It's funny, weren't Chinese languages rather fragmented not so long ago[1]? So it would seem that uniting on a single language was rather useful to them.

I know people who work in some indigenous areas in another country, and sometimes people from one village will have trouble understanding those from the next. There is no benefit to them, just loss. For the same reason, when intl orgs come in and push teaching children in their indigenous language (e.g. for math), some push back because they know there's little value in using a language no one else does.

1: Wiki says: Some 54% of speakers of Mandarin dialects could understand the standard language in the early 1950s, rising to 91% in 1984. Nationally, the proportion understanding the standard rose from 41% to 90% over the same period.


To be fair, Mandarin is simply not a very practical language, especially in terms of its writing system. Arguably Korean would be an improvement on Latin as far as writing goes, though.


Hangul is neat and quick to learn and arguably better. Though from a practicality standpoint, I think English is easier to get going on low-tech interfaces. With no IME required, you can do char-by-char typing with zero ambiguity.


As a native English speaker who only speaks a little German, if everyone agreed that the world should switch over to Mandarin and standardize on it, I would be fine with it.

In my opinion, having a standardized language is very important. There are a number of different groups online that I'm interested in, but due to language barriers, I can't be part of them. Could I learn all those different languages? Yes, but in my opinion it wouldn't be worth the effort.


The common language of the world should be Indonesian.

Currently the world language is English though.

Learning other languages is not difficult. There is no reason not to learn a new language.


Languages shape and are shaped by the environment, by the way the speakers experience the world. There's the example by Franz Boas, regarding the many words Inuit might have for different kinds of snow [1]. Their language, in a very utilitarian manner, evolved to that state; and, of course, Inuit people's perception of the world is shaped by the fact that "snow" is neither a single nor a simple concept (they don't see "snow", they see a particular kind of snow with its associated characteristics). Wouldn't using English be a disservice to them, in their environment?

Besides the kind-of evolutionary argument (which might be downplayed in a globalised world), you'd have to consider the history and culture which languages carry. There are e.g. many recipes in my mother tongue which are named from historical and social context; all this would be somewhat lost in translation.

Obviously you have a point and there would be a lot of advantages in standardising on a language. The great issue is then in weighing pros & cons. Or greatest, even: being able to even begin to consider the cons. Can we really have an objective measure of how much we, mankind, would lose by turning all languages except English into dead languages?

Anyway, it doesn't matter much as the biggest obstacle is obviously political, not cultural.

[1] which was said to be a hoax, but after all not so much: https://www.washingtonpost.com/national/health-science/there...


You make a couple of great points and I concede that there would surely be some friction losses.

However, my point is that I think the pros outweigh the cons in this instance and that humanity would be better off if everyone could communicate with everyone else.


I think that with the progress of automatic translation and interpretation (which is not perfect now, but is likely to progress -- I can't imagine all the current progress in machine learning not being also applied to automatic translation) as well, learning a foreign language for practical reasons could also become useless.

No lingua franca anymore...


Yeah but if everything can be effortlessly translated by everyone in a lossless manner, why not just standardize on a language then? Do languages at that point not become an unnecessary hindrance in personal communication without perceivable benefit?


Well, languages are not just about straight communication, but also about culture, literature, history, puns, sounds, and so on.

Not all languages can express the same set of ideas equally easily.

There is this Sapir-Whorf hypothesis that says that language shapes what we think. It sounds terrible to me that we could sacrifice so much diversity, so many ways of thinking.


>Wouldn't using English be a disservice to them

No. If it's true that a group of people come up with words for different types of snow, they'll do it in whatever language they speak. It's not like these words just showed up in their language out of nowhere. Likewise, someone learning Inuit but living in Jamaica probably wouldn't learn all those snow words.


If the internet were used only to publish read-only content, like a magazine, then maybe this would make some sense.

But language shapes thought. It actually limits it. If you can't express something in your language then you are unlikely to even think it. People who speak different languages actually see the world in different ways, with different nuances and details.

And the internet is highly interactive, collaborative and social. You don't have to look beyond HN to see great examples of this.

This means that limiting the internet through a single language means we would be also limiting our ability to form, spread and implement new ideas.

Creativity often comes from finding a curious analogy that crosses domains. If one of those domains has only a poor English representation then it's unlikely that the brains involved in an English-only conversation would discover the novel analogy.

So no, that would not be good for the internet as canvas of collaboration and communication of ideas and thoughts.


> If you can't express something in your language then you are unlikely to even think it.

This is a fairly strong statement of the Sapir-Whorf hypothesis, which is not widely believed.

You can state the hypothesis in various ways, ranging from the strong hypothesis ("you can't think things that aren't in your language", which is clearly false because you can learn new things about language) to weak ("language kinda influences your thoughts").

There is evidence in favor of what I'd call the "extremely weak Sapir-Whorf hypothesis": that language can influence the way you judge things that are established by convention such as colors and directions, in experiments that are specifically designed to emphasize these distinctions, sometimes.


For science? Legal matters? Business maybe? Yes, there is a chance it could possibly work.

For everything else? Politics, literature, journalism, and of course everyday usage? It couldn't possibly work. Let's say you held the whole world population at gunpoint, forced everybody to learn English and use it for ten years. I'm sure that as soon as you relaxed your iron fist, the various locally used dialects would diverge in a few generations and you would be back to where we are now. The world is not as connected as we like to think.


Cue reference to Basic English.

http://ogden.basic-english.org/


Do you also think we should standardise on one programming language and that it would make the world a better place?

It would be an incredibly boring world where communication could only be expressed in some lowest common denominator.


Making English the lingua franca would be like making JavaScript the "universal" programming language.


Good idea Mr. Trump, but how about a standard programming language that everyone agrees on first?


Programming languages are all in English.

_if_ you didn't realize, _do_ think about it _for_ a _while_.


Linotte would like to have a word with you. In French. https://en.wikipedia.org/wiki/Linotte


That would be JavaScript written in Chinese with localized Spanish comments.


I am surprised to see how an off-topic discussion generated many more comments than the discussion on this cool project per se. It seems to be the norm these days on HN...


Surprises me too. I just pointed out that the project at hand hardly even mentions that it is targeted at English, as if NLP is English by definition and that doesn't need any explanation.


I used Stanford NLP in a project last year; apparently it can (with appropriate models) handle German, Chinese (not sure which flavour) and Japanese (I think?).

Unfortunately what I needed to process was likely to be in either English or Indonesian, without forewarning which it was, so we had to use an API (we used Google in the end, but the Microsoft one worked too) to detect + translate the text to English, which has some interesting side effects when dealing with numbers with suffixes, and dates, due to limitations of what Stanford NLP recognised (1 million is recognised as 1000000, 1M is not).

Also, handling informal text can be a nightmare - I'm still wondering if a Mechanical Turk solution wouldn't have been better.
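
For the suffix problem specifically (1M not being recognised), one workaround might be to expand the suffixes yourself before the text reaches the NLP pipeline. A minimal sketch, assuming only the k/M/B suffixes matter; the function name and suffix table are made up for illustration and have nothing to do with Stanford NLP's own API:

```python
import re
from decimal import Decimal

# Hypothetical suffix table; real-world text may need more entries
# (e.g. "bn", or language-specific abbreviations).
SUFFIX_MULTIPLIERS = {"k": Decimal(10**3), "m": Decimal(10**6), "b": Decimal(10**9)}

# A number directly followed by one suffix letter, not glued to other
# word characters ("10km" and "v1.5b" are left alone).
SUFFIXED_NUMBER = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)([kKmMbB])\b")


def expand_number_suffixes(text: str) -> str:
    """Rewrite tokens like '1M' or '2.5k' as plain digits."""

    def _expand(match: re.Match) -> str:
        value = Decimal(match.group(1)) * SUFFIX_MULTIPLIERS[match.group(2).lower()]
        if value == value.to_integral_value():
            return str(int(value))
        # Keep a fractional result without trailing zeros.
        return format(value.normalize(), "f")

    return SUFFIXED_NUMBER.sub(_expand, text)


print(expand_number_suffixes("They raised 1M and have 2.5k users"))
```

This is only a pre-processing pass; it will misfire on things like "5m" meaning five metres, so it would need tuning against the actual input.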


If I were to get back into it, I'd still pick English because English is the lingua franca. But I fully suspect that there are equal efforts in Mandarin, Hindi, Spanish and French; we just don't hear a lot about them as outsiders of the community.


Probably related to English being the third most spoken language in the world and widely represented in international scientific communities, even by non-native speakers. Not defending it.


*many more languages



