
Funny how Slovene seems to tick all the complication checkboxes :D We have 4 grammatical numbers (singular, dual, plural for 3 and 4, plural for 5 and above), they repeat mod 100 (so 101 is singular, 102 dual,…), it's an inflectional language with 3 grammatical genders, a sentence should take a different form depending on whether the user is male or female,…



I was just trying to think of how it would work out in Polish....

directory is "katalog" in Polish, and it would be:

1 katalog

2 katalogi

3 katalogi

4 katalogi

5 katalogów

6 katalogów

....(it doesn't change for any number greater than 5)

But if you wanted to say "X files were found in Y directories" then you would have to say:

....1 katalogu

....2 katalogach

....3 katalogach

....1000 katalogów

(for 1001 I'm not even sure whether you should say katalogów or katalogach, both sound correct to me)

...and again, the whole thing changes depending on the speaker being male/female + singular/plural("I have found"/"we have found").

Compared to the grammar of Slavic languages, English is super easy.


> Compared to the grammar of Slavic languages, English is super easy.

And thus is solved the mystery of why all us Slavic speakers have such good English. Especially online. Because nobody had the guts to localize to our languages.

If they did localize to our languages "Omg this looks weird and smells off", we all cried. And switched our interfaces to English. Admit it, you did this in High School or even Middle School because the <insert slavic language> translation just didn't make sense.

And so all of us who are now in a position to localize these interfaces don't because it seems silly and pointless.

We come full circle. Younglings after us use computers in English because no good localization exists.


Yeah. I remember being a kid and HATING Polish translations of English games. Even though I barely knew any English playing any game in Polish just felt wrong. Nowadays I think it just felt more mysterious and interesting if apart from being difficult, the game was in a language I couldn't fully understand. But yeah, it contributed massively to me learning English, so it was definitely a good thing. But I also know what you mean - I wouldn't translate my programs to Polish unless I had to, because it just feels wrong to have them in my native tongue.


I think the poor understanding of English was part of the problem. We didn't understand, we just knew that you found what you need when you click "Insert" or whatever. To us "insert" didn't mean insert, it meant "That menu where you find an Image to put in your document".

In our mind the English word and its native translation have semantically different meanings. I discover this problem a lot now that I'm older. I understand the words and their translations, but they still mean semantically different things to me in different languages.

The funniest part is how when I'm in the US I know all the English words for pots and pans and stuff, but when I'm home in Slovenia and my girlfriend is here, it becomes almost impossible to translate. Because in Slovenia the pots and pans have Slovenian names, in the US they have English names. And to me those are completely different.


> In our mind the English word and its native translation have semantically different meanings.

Because they quite often have (I'd wager that more often than not).

To use your example, English "Insert" cuts through a different part of concept-space than Polish "Wstaw" (which is what usually ends up as a menu label). It's a fine translation up until you ask to insert a DVD - now you should say "Wsuń" or "Włóż".

And I don't think this is a problem. As I grow older I realize that this is how it should be. Personally, I think that the moment you stop mapping words from the new language to words of your native one is when you actually start to be proficient in the language you're learning. Having mappings like "fork" -> "widelec" -> "<that pointy metal thing you use to eat>" in your head is a totally wrong way to use the language. "fork" and "widelec" are two different labels referring to two different areas of concept-space that happen to intersect somewhere in the area where you think about eating utensils.

That's why I keep my internal monologue (and speech, if people I'm talking with don't mind) switching constantly between English and Polish - there are concepts I can express with one language that I can't express in another, and any attempt at translation feels like lossy compression.

(and then I get tons of hate for occasionally saying "robi sens" instead of "ma sens"; I know what the proper translation of "makes sense" is, but the Polish expression emphasizes 'sense' as a property of things while English shows it as something that can be produced, and sometimes the idea I want to express is closer to the English than Polish version)


I agree completely.

The problem arises when your girlfriend (or friend or whatever) visits you in the homeland and you suddenly find yourself struggling to find words because many of the things at home don't have English names. It becomes even worse when you have to act as translator between them and your family who doesn't speak English or doesn't speak it as well.

Yes, I had a lot of fun over the holidays ...

Even something as simple as "Did your mum like me?" becomes difficult to translate because your mum said X and X doesn't quite translate into English with all the connotations preserved.


> Yes, I had a lot of fun over the holidays ...

Yeah, I can totally imagine it :D.


And don't even get me started on the case where you don't know the noun beforehand and need to synthesize the entire numeral phrase at runtime ("You have %d %s.")

Because, in Polish, numerals inflect in gender and case, come in two main variants ("normal" and "collective", and that's not including ordinals), and can (but don't necessarily have to) for some genders undergo case changes that affect the verb part of the sentence under certain circumstances. Add to this the fact that there isn't even a definitive consensus on how many grammatical genders there are in Polish (opinions range from 3 to 9, with some of the theories based on numeral connectivity), and you're all set.

The Polish word "dwa" (two) has at least seventeen distinct grammatical forms, each of which has arcane rules that govern its usage.


> ....(it doesn't change for any number greater than 5)

Not so lucky there either:

....21 katalogów

....22 katalogi

....23 katalogi

....24 katalogi

....25 katalogów

And the same pattern repeats modulo 10, except for the teens: 12, 13 and 14 take "katalogów".
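For what it's worth, the counting pattern fits in a few lines. A minimal Python sketch, assuming only the nominative counting forms from this thread; the function names are made up for illustration, and the check on 12 to 14 reflects the usual Polish rule that those teens take the 5+ form:

```python
def polish_plural(n: int, one: str, few: str, many: str) -> str:
    """Pick the Polish noun form that follows the count n."""
    if n == 1:
        return one
    # 2-4 take the "few" form, repeating modulo 10, except for 12-14.
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return few
    # Everything else (0, 5-21, 25-31, ...) takes the "many" form.
    return many


def count_directories(n: int) -> str:
    """Hypothetical helper: format a directory count in Polish."""
    return f"{n} {polish_plural(n, 'katalog', 'katalogi', 'katalogów')}"
```

With this, count_directories(22) gives "22 katalogi" and count_directories(25) gives "25 katalogów". It says nothing about the other cases ("w ... katalogach" and friends), which would need their own forms.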

EDIT: And as I think about the 1001 part... Actually the following sounds right to me (translating "in X directories"):

....w 1000 katalogów (w tysiącu katalogów)

....w 1001 katalogu (w tysiąc-jednym katalogu)

....w 1002 katalogach (w tysiąc-dwóch katalogach)

etc.


1 imenik

2 imenika

3 imeniki

4 imeniki

5 imenikov

6 imenikov

…

101 imenik

102 imenika

103 imeniki

…

201 imenik

…

1001 imenik

…

Slavic languages are f*d up :D
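The Slovene list above can be sketched the same way. This is a rough Python version assuming the four forms shown and the "repeats at mod 100" rule mentioned earlier in the thread; the function names are hypothetical:

```python
def slovene_plural(n: int, one: str, two: str, few: str, other: str) -> str:
    """Pick the Slovene noun form for the count n (pattern repeats mod 100)."""
    last_two = n % 100
    if last_two == 1:
        return one      # 1, 101, 201, 1001, ...
    if last_two == 2:
        return two      # dual: 2, 102, ...
    if last_two in (3, 4):
        return few      # 3, 4, 103, 104, ...
    return other        # 0, 5-100, 105-200, ...


def count_imenik(n: int) -> str:
    """Hypothetical helper: format a count of 'imenik' in Slovene."""
    return f"{n} {slovene_plural(n, 'imenik', 'imenika', 'imeniki', 'imenikov')}"
```

So count_imenik(102) gives "102 imenika" and count_imenik(1001) gives "1001 imenik".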


Not to mention that I would never realise that "imenik" means directory. An imenik is a phone book. Potentially the Contacts app on my phone. But never a directory.

I think this has to do with vocabulary registers as well. We've learned to use English words for computer things. Using Slovene translations just feels weird unless they're a bastardisation of the English word. Similar to how English uses French words for foods because cuisine was a thing of the elite and they didn't eat pig, they ate porc.


But, a directory is a phone book!


Surely a phone book is a directory, not the other way round.


To you. To me it very much is not, because as a kid I learned that a directory is that thing on the computer where the other files are.

Things like that are difficult to shake off no matter how well you know a language.


ICU's messageformat solves this easily. One project that supports ICU's messageformat is L10ns http://l10ns.org


This looks quite nice and powerful and indeed an elegant way of solving the problem with multiple plural forms in a string. The only concern I have with that syntax is that it's yet another DSL, or markup language and translators need to know it, or could get it wrong. Granted, a program for helping translators might do automatic linting (much as Qt's Linguist already warns if you omit placeholders from the translated phrase that are there in the original).

Another thing is that the mini-language grows complex enough that the resulting text can be quite hard to read and understand:

    {people, plural, offset:1 =0{No one went.} =1{{user1} went.} =2{{user1} and {user2} went.} other{{user1} and # others went.}}
is just a single (or two) placeholder and it takes a while to even parse how it's supposed to work.
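To unpack how that message is meant to be read, here is a hand-rolled Python sketch of the selection order. It is not ICU's API, just an illustration under my own assumptions: explicit "=N" cases are checked against the raw count first, keyword cases are chosen from the count minus the offset, and "#" is replaced by the count minus the offset. The names select_plural, WENT and went are invented for the example.

```python
def english_category(n: int) -> str:
    """English plural category: 'one' for 1, 'other' for everything else."""
    return "one" if n == 1 else "other"


def select_plural(n, cases, offset=0, category=english_category):
    # Explicit "=N" cases are matched against the raw count first.
    template = cases.get(f"={n}")
    if template is None:
        # Keyword cases are chosen from the count minus the offset.
        template = cases.get(category(n - offset), cases["other"])
    # '#' is replaced by the count minus the offset.
    return template.replace("#", str(n - offset))


# The same four cases as the message above, written as plain data.
WENT = {
    "=0": "No one went.",
    "=1": "{user1} went.",
    "=2": "{user1} and {user2} went.",
    "other": "{user1} and # others went.",
}


def went(people: int, user1: str = "", user2: str = "") -> str:
    return select_plural(people, WENT, offset=1).format(user1=user1, user2=user2)
```

For example, went(4, "Ann", "Bob") gives "Ann and 3 others went.", because the offset of 1 accounts for the user who is already named.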


Well, if we're going to introduce that level of complexity into a DSL, why not go full Turing-Complete and write it in code?

    ;; Each clause yields the pieces of the message; a wrapper concatenates them.
    (case (length folks) (0 "No one went.")
                         (1 ((elt folks 0) " went."))
                         (2 ((elt folks 0) " and " (elt folks 1) " went."))
                         (otherwise ((elt folks 0) " and " (1- (length folks)) " others went.")))
You can wrap that in a lambda that concatenates resulting strings and voilà, you have "smart" string tables. And it's not a problem to make it even more DSL-y and translator friendly.
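For readers who don't speak Lisp, roughly the same "smart string table" idea in Python. This is only a sketch of the approach, with hypothetical names (STRINGS, render, who_went): table entries are either plain strings or functions that build the message from pieces.

```python
def who_went(folks):
    """Build the message from pieces, then concatenate them."""
    n = len(folks)
    if n == 0:
        pieces = ["No one went."]
    elif n == 1:
        pieces = [folks[0], " went."]
    elif n == 2:
        pieces = [folks[0], " and ", folks[1], " went."]
    else:
        # The first person is named, so "others" is everyone else.
        pieces = [folks[0], " and ", str(n - 1), " others went."]
    return "".join(pieces)


# A string table whose entries are plain strings or callables.
STRINGS = {
    "greeting": "Hello!",
    "who_went": who_went,
}


def render(key, *args):
    """Look up an entry; call it if it carries logic, else return it as-is."""
    entry = STRINGS[key]
    return entry(*args) if callable(entry) else entry
```

The caller doesn't care which kind of entry it gets: render("greeting") and render("who_went", ["Ann", "Bob", "Cy"]) go through the same lookup, and only the second one runs any logic.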


And then you have the exact same problem as if you'd write that logic in your source code. Just with half a dozen layers of abstraction, a more cumbersome way of displaying strings in your application and another programming language on top. I'd say that's not a net positive.


I disagree. That logic has to go somewhere anyway - you can't skip it because it's inherent in the problem of displaying a proper message. So you could at least write it in an expressive language instead of encoding it into what looks almost as readable as regular expressions.


Now you need to find a translator who knows Lisp


No you don't, in much the same way that you don't need a translator that "knows JSON or XML". Just don't tell them it's Lisp. That's how you do DSLs.

Also, I advocate closer work between translators and developers. Let the translators give the text and explain corner cases to someone who can code up the logic.

BTW. Lisp is only hard for people who acquired this stupid meme that "Lisp is weird/for crazy people". You'd be hard-pressed to find something which is simpler in terms of syntax and readability.


I'd recommend teaching the translator the markup. It's really easy to understand.

The way L10ns tries to mitigate the complexity of the markup language is to provide buttons that paste the correct markup for translators. Also, a developer translates into his own language first, and that example is always visible to translators of other languages for easy reference. So the programmer takes the logic work off the translator.

IMO ICU's messageformat markup is also slowly becoming the standard for localizing strings. It is already used widely by big organizations such as Apple, Google and Yahoo.

L10ns pre-compiles all message strings, so the parsing cost isn't paid at runtime.
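To illustrate what pre-compiling buys you in general (this is not L10ns's actual implementation, just a sketch in Python with a hypothetical compile_message function): the template is parsed once, and the returned function only joins already-split pieces.

```python
from string import Formatter


def compile_message(template: str):
    """Parse the template once and return a function that only fills it in."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    parts = [(literal, field) for literal, field, _spec, _conv in Formatter().parse(template)]

    def render(**values):
        out = []
        for literal, field in parts:
            out.append(literal)
            if field is not None:
                out.append(str(values[field]))
        return "".join(out)

    return render


# Parsing happens here, once; later calls reuse the parsed parts.
found = compile_message("Found {files} files in {dirs} directories.")
```

After that, found(files=3, dirs=2) returns "Found 3 files in 2 directories." without touching the template syntax again.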


Here is more info about the plural format of ICU's messageformat. http://l10ns.org/docs.html#pluralformat.


Just curious... are Slovene web services more likely to ask your gender after you sign up so they can get the grammar right or do they just go with 'male' or something?


Russian speaker here; I don't think it would be relevant, since services ought to be addressing you in second person, which I believe is gender-neutral in just about every Slavic language.


Wikipedia mentions only singular, dual and plural. What is the one for 3 and 4?

http://en.wikipedia.org/wiki/Slovene_grammar


No idea what it's officially called, but I'm not making this up :D

http://localization-guide.readthedocs.org/en/latest/l10n/plu...



It's the normal plural, but 3 and 4 take the nominative, while other numbers take the genitive plural.



