
hey there, I work on this project. We categorize a language as low-resource if there are fewer than 1M publicly available, de-duplicated bitext samples.

also see section 3, table 1 in the paper: https://research.facebook.com/publications/no-language-left-...




hey, this sounds silly but I can't seem to find a list of all 200 languages covered. I've looked at the website and the blog post and neither has a readily available link. Seems like a major oversight. There is of course a drop-down in both, but the languages there are far fewer than 200. I'm particularly interested in a list of the 55 African languages, for example.


We have a full list here (copy pastable): https://github.com/facebookresearch/flores/tree/main/flores2... and Table 1 of our paper (https://research.facebook.com/publications/no-language-left-...) has a complete list as well.


Nice to see Esperanto made the cut — the only artificial language to do so, AFAICT.


I was happy to see that as well!


ha yes, that's correct. If you have thoughts on specific constructed languages where having translation would really help people, let us know!


thank you!


Looking at the list, I see a lack of Native American languages. Did anyone try to contact the tribes during this?


We interviewed speakers of low-resource languages from all over the world to understand the human need for this kind of technology --- what do people actually want, how would they use it, and what's the quality they would find useful? Many low-resource languages lack data online, but are spoken by millions. However, many indigenous languages are spoken by smaller numbers of people, and we are definitely interested in partnering with local communities to co-develop technology and have been actively investigating these collaborations but don't have much to share yet.


I'll take that in good faith, but I will say Facebook has been a particular pain for many tribal folks given its true name policy and banning people who it thinks are using a fake name. Yellow Horse was one that was widely reported, but there are others. Mostly anything that takes the form Adjective Noun. Had a rather painful thread with someone claiming to be a Facebook employee who defended this practice. I haven't heard of anyone reaching out, and Lord knows we could have used the help because COVID has been a particular disaster for language preservation even with an extremely high vaccination rate.

I do admit I'm a bit bitter given another of the big Silicon Valley companies (Apple) claiming they specifically help the TCUs (Tribal Colleges and Universities) when I can find no one who knows about this help other than taking our money for product at the same price as other accredited educational institutions.


I was unfamiliar with this issue. Was their name Yellow Horse in English? Or was it supposed to be written in a language not available so an English translation was used?

I have a feeling that if it was written in the original language it would go through, since many English names also have adjective noun original meanings like ‘beautiful flower’.


> Was their name Yellow Horse in English?

Yes

> I have a feeling that if it was written in the original language it would go through, since many English names also have adjective noun original meanings like ‘beautiful flower’.

The legal name is in English, so anyone expecting it to be written in the original language is expecting too much.



