
This is important work. And think about African or Southeast Asian languages: they are even more screwed. We need to make sure that AI is multilingual to avoid total English domination of culture.



>We need to make sure that AI is multilingual to avoid total English domination of culture.

Not saying English is an ideal language, but I'm interested in why you think it shouldn't dominate. Wouldn't a universal language be a good thing?


Languages are like frameworks, they (slightly) guide your thinking. Think about the same stuff in different languages and you’ll probably get more ideas about it than in one language only.


Moreover, censorship does not cross languages. It's very hard to build a multilingual censorship system.

Being fluent in two major language systems, I find it glaringly obvious what the taboos and politically correct untruths are in each language, things that seem completely invisible to monolingual speakers.


Orwell demonstrates this concept very powerfully in 1984.


That's the Sapir-Whorf hypothesis. It's decidedly pre-Chomsky and, as Pinker puts it, it is both the best-known and most widely accepted linguistic hypothesis and also almost certainly completely wrong.


It’s something you just know if you’re bilingual or more. It would sound threatening to monolingual speakers but this is just how it works. Languages are our thought construction algorithms.

It’s not a “Turing Completeness isn’t real” hypothesis, more like just “Rust != C”. I think this is where software nerd types predict wrong, as it sounds as if it is trying to disprove TC, which would be a futile attempt. It’s not (the former one, at least).


> It’s something you just know if you’re bilingual or more.

Not in my experience.


It is obvious. Most words in different languages aren't 100% equivalent - they have large or small differences in sets of connotations. Same is true for phrases and sentences. When you translate a thought expressed in one language to another, you may get the primary, leading-order meanings across 100% right, but you'll still lose some lower-order connotations.

For a more direct analogy, I'd compare this to how LLMs process tokens, but I feel most of the community is not ready for it yet, as we're still stuck debating the validity of this comparison in the other direction...


> It is obvious. Most words in different languages aren't 100% equivalent - they have large or small differences in sets of connotations.

This is true, which is why idioms and phrases can be hard to translate. But we're talking about the Sapir-Whorf hypothesis, which is a much stronger claim.

I have never experienced that I need to think in a specific language in order to do something better (well, I only have two choices).


> I have never experienced that I need to think in a specific language in order to do something better (well, I only have two choices).

I found that for me, certain thoughts "flow better" in one language, and others in another. And this makes sense, because thinking is, in large part, exploring the web of associations and connotations, going down the gradient of what "feels right"[0] - and since those associations and connotations have different structures in different languages, the thoughts will drift in different directions and take different paths, even if they end up in the same place.

--

[0] - The obvious parallels between one's inner voice and the workings of an LLM are left as an exercise to the reader.


> I found that for me, certain thoughts "flow better" in one language, and others in another.

For you. So not an absolute truth as in “it is obvious”.

> The obvious parallels between one's inner voice and the workings of an LLM are left as an exercise to the reader.

Another obvious? Hmm.

Perhaps obvious to the techno-animists.


> I think this is where software nerd types predict wrong

Pinker isn't a software nerd, he's an evolutionary psychologist and psycholinguist.


He's indeed precisely one of the most qualified people around to opine on this subject.


"Evolutionary psychology" is not a real discipline. It's like calling yourself a "karmic surgeon" or whatever.


Both Chomsky and Pinker have been decidedly debunked.


A lingua franca is useful and probably inevitable. The downside is language and cultural loss. Works in translation are rarely quite as good, especially humor and wordplay. This is why various countries have "local language quota" rules for media; although derided by English speakers and HN, they're a way to keep the local language, culture and identity alive.


That's assuming a priori that those things are somehow intrinsically valuable. These decisions (e.g. language quotas) are made by armchair intellectuals with career/identity investments in the said language. Language evolution ignores them, but they can be annoying in the interim.


!

Firstly, everything is only valuable in the sense of being valuable to someone or some people; nothing is intrinsically valuable.

Secondly, everyone has a career/identity investment in a language: their first language. The one they work in, went to school in, read in, talk to their family in, consume literature in. (I suppose HN devalues literature as well.)

For monoglot anglophones struggling to understand the concept, imagine if the US declared its official language was now Standard Mandarin. There would be riots. Heck, I've seen Americans get mad at the mere use of Spanish, the country's second language.


>Firstly, everything is only valuable in the sense of being valuable to someone or some people; nothing is intrinsically valuable.

Agreed, so when tech and globalisation are pulling towards unification for practical reasons - your argument reduces to "I'm going to force others to use what I like because I don't like the decisions people are making".

I'm not arguing we should make anything official or force anything.


Ehh, are you as a non-intellectual ready to throw away your first language? Also, think of people whose first language is their only language.


Yes? If everyone suddenly started using English instead of Croatian, it would be a net positive in my book. I'd probably get a lot better at casual English, which is a weakness I notice when talking with non-Croatian speakers outside of work.

I'm not saying we should force adoption of English! Tech and globalisation are pushing in that direction in the "west". The people I see pushing back the most are "intellectual elites" invested in the language, presenting their value judgements as objective arguments.


But everyone wouldn't suddenly start using English; that's not feasible. Is everyone in your family fluent in English (mine is not)? In every country, there are people who couldn't adapt, disproportionately so in vulnerable groups. Suddenly, you would be in a country where you don't know the language. Everyone would be immigrants in their own country.

Protection of national languages is just as much on the agenda of politicians, including right-wing populists. In your view, are the songs, stories, books and plays in Croatian the property of the elites and not of every Croatian?

You say we wouldn't force it, but if we left it to Hollywood etc. they would force it. Leaving issues for the markets to decide freely brushes aside all the negative externalities.


Programmers understand the efficiency and inefficiency of everything... and the value of nothing.


Not from an aesthetic sense. I think it's really cool that we have a lot of languages. I'm personally willing to pay a high price in inconvenience to keep that coolness around, although not everyone would.

However, I also don't think we will have to. Machine translation and language learning are substitute goods -- the better the former gets, the fewer people will feel any desire to pursue the latter, because it just won't be that big of a deal to translate between X and Y anyway.

A universal second language for commerce is a fine middle ground, though.


Given recent developments in AI, building a true Babelfish is more of a hardware challenge than a software challenge these days.


The languages you think in affect your decision making, your creativity, how you perceive the world. If we were restricted to a single language, we’d lose as individuals and as a species.

https://www.theguardian.com/science/2023/sep/17/how-learning...


For some reason it is only monolingual people who ever say this.


Project Aya is one such attempt at a multi-lingual model (targeting 101 languages):

- https://txt.cohere.com/aya-multilingual/

- https://aya.for.ai

I'm a contributor to the project and all data and model will be open-sourced.

We're looking for contributors in many languages!


It is a concern because presumably most people in office jobs are going to need to be able to use these tools, but I am somewhat comforted to know one language that AI systems do not understand well yet because of a lack of texts. However, I think that will be short-lived.


I can speak in my own language with ChatGPT without much issue.


While I can speak in Portuguese without many issues (except that it's hard to get it to stick to European Portuguese), I've noticed that sometimes it uses a clear translation of an English expression that does not feel natural in Portuguese at all.


You can, but it is more likely to get facts wrong than if you converse with it in English.


This will be easy, it will go from mostly English to hyperpolyglot quickly.


If it's trained on the available online corpus, then it will go quickly mostly for English and Mandarin.

Most countries' classic texts and books are still undigitized, sitting in libraries and public archives.

Also, the book publishing market and online publishing are proportional to total population: a smaller country means less content.


The issue raised in the article is that there may not be enough training material in many languages to do this.

I find this very plausible.



