Poro-34B: Open-Source Multilingual (English, Finnish) & Code LLM (silo.ai)
7 points by reqo 10 months ago | 1 comment



"Poro’s advanced capabilities with European languages like Finnish descend from how it addresses the core challenge for low-resource languages: training LLMs requires enormous amounts of data, but for low-resource languages like Finnish, sufficient data is simply not available. In general, Poro addresses this by cross-training low-resource languages with high-resource languages. This takes advantage of a cross-lingual signal that allows the model to achieve higher performance for the low-resource language than training a monolingual model, and has the further advantage of teaching the model basic translation capability."
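
To make the "cross-training" idea in the quote a bit more concrete, here is a toy sketch of one common way to mix a small corpus with a large one: exponent-smoothed sampling, which upsamples the smaller language. This is only an illustration under my own assumptions. The quote does not say how Poro balances its data, and the token counts, the `alpha` value, and the function names below are all made up.

```python
import random


def mixing_weights(token_counts, alpha=0.5):
    """Return sampling weights proportional to count ** alpha.

    alpha=1.0 samples in proportion to corpus size; smaller alpha
    shifts probability mass toward the smaller corpora.
    """
    scaled = {lang: count ** alpha for lang, count in token_counts.items()}
    total = sum(scaled.values())
    return {lang: value / total for lang, value in scaled.items()}


def sample_languages(weights, k, seed=0):
    """Draw k language labels according to the given weights."""
    rng = random.Random(seed)
    langs = list(weights)
    return rng.choices(langs, weights=[weights[lang] for lang in langs], k=k)


# Hypothetical corpus sizes (arbitrary units), not real figures.
counts = {"en": 900, "fi": 100}
weights = mixing_weights(counts, alpha=0.5)
batch_languages = sample_languages(weights, k=8)
```

With these made-up counts, Finnish is 10% of the raw data but is sampled 25% of the time at `alpha=0.5`. Whether that kind of upsampling helps or hurts the low-resource language is exactly the open question.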

I wonder how this cross-training affects the "overall quality" of the LLM for the low-resource languages. Does anyone know of scientific papers that measure this?



