
I posted on another thread that GPT-4 handles Norwegian just fine (0.1% of GPT-3's training data). Norway also has two official written languages that are mutually intelligible, close enough that some would consider them dialects, and GPT handles Nynorsk, the smaller of the two (Bokmål being the other), just fine as well.

Going one step further, I asked it to "translate" into both "Riksmål", an artificial conservative variant of Bokmål that basically rejects most of the last few decades' worth of language reforms, and the Romerike dialect (a dialect from the Eastern part of Norway)... For the latter it gave me a lecture about how the dialect varies internally within the region (which is correct) and presented a "translation" of a test sentence that is recognisably one of the variants from the Northern part of the region.

Of course, for these variants competency definitely bleeds over: they share an almost identical grammar and a majority of their orthography. But I'm impressed enough that it can handle Norwegian that well at all, much less that it knows the distinctions between the variants.



Yeah, its language skills are through the roof. There's no reason to talk to it in English. From what I can tell, it does a decent job of translating out of even languages like Southern Sami, with ~300 speakers and an utterly negligible training corpus. It seems it knows enough grammar from related languages, and can infer enough from context (and maybe even etymology), that it does an OK job.

I tested it by giving it some news articles from NRK Sápmi and comparing its output with the Norwegian translations they publish.

Edit: Seems I may have gotten lucky that time, it's being a lot more, um, creative in its translation now. Or for all I know it could be changes in the model.


Looking at the basic ChatGPT (not GPT-4): while it can do reasonable translations for smaller languages and answer questions in them, the quality of the answers suffers significantly in my experience. If I ask the same factual question in two languages, I often see the English one get a correct answer while the smaller language gets a coherent hallucination. For big languages (French, Japanese, Spanish, etc.) that's not an issue, but for the smaller ones it clearly is.


> There's no reason to talk to it in English.

Depends what you're doing. I haven't managed to get it to continue after it stopped in the middle of a sentence in Japanese, but giving it the instruction to do so in English works. In some other cases, prompting in English (and asking for an answer in Japanese) can produce better results than giving the same prompt in Japanese.


Replying "続けて" (Japanese for "continue") or "continue" works.

Generating Japanese is slower than English (it's annoying on GPT-4), which is my reason to prefer English sometimes (especially for tech topics). ChatGPT web users don't pay per token, but API users do, so they may make a different decision.
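For API users the per-token billing adds up: tokenizers trained mostly on English text typically split Japanese into more tokens than an English sentence of equivalent meaning. A minimal sketch of the arithmetic, where the token counts and per-1k prices are made-up illustrative numbers, not OpenAI's actual figures:

```python
def api_cost(prompt_tokens: int, completion_tokens: int,
             price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost in dollars for one API call billed per token,
    with separate input and output prices per 1000 tokens."""
    return (prompt_tokens * price_in_per_1k
            + completion_tokens * price_out_per_1k) / 1000

# Hypothetical: assume the same answer takes ~1.5x as many tokens
# in Japanese as in English, at assumed prices of $0.03/1k in, $0.06/1k out.
english_cost = api_cost(200, 400, 0.03, 0.06)    # 0.006 + 0.024 = 0.030
japanese_cost = api_cost(300, 600, 0.03, 0.06)   # 0.009 + 0.036 = 0.045
print(f"English: ${english_cost:.3f}, Japanese: ${japanese_cost:.3f}")
```

Under these assumed numbers the Japanese call costs 50% more for the same content, which is why an API user might prompt (or even answer) in English when they can.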


In my experience, while "continue" can work, "続けて" doesn't. At least not when making it rewrite large texts, which is when I hit the limit. With "continue", it continues rewriting. With "続けて", it tends to make up new text that, yes, is a continuation of what it was writing, but has no connection to the original text it was in the middle of rewriting.



