Pluralization rules around the world (developer.mozilla.org)
89 points by lucb1e on May 26, 2018 | 54 comments



This reminds me of an old Russian joke. A Soviet factory has six fireplaces and only one fireplace poker; rather than carrying the poker around every time they need to use it, they decide that they should get five more so that they can keep one at each fireplace.

At this point you need to understand a bit of Russian grammar. As mentioned in the link, Russian has different forms for -ending-in-1 numbers, -ending-in-2-through-4 numbers, and -ending-in-5-through-9 numbers. Also, the third set of plurals is often irregular.

This being a Soviet factory, the requisition for fireplace pokers needs to be sent with all the correct paperwork to the bureaucracy. A debate arises: What is the appropriate plural for kocherga? Send the form out with the wrong word, and the wrath of the Soviet bureaucracy will fall upon them.

After arguing back and forth for a while, an old janitor hears the conversation and proposes a solution: Send in a requisition asking for three kochergi and two more kochergi. A few months later, they get their fireplace pokers along with a note: "Here are the four kochergi and one more kocherga which you requested."


In Japanese (which i don't speak, and only know a tiny bit about), counts of objects are always formed using a number, a noun, and an appropriate "counter". AIUI, a bit like how in English we say "three pieces of paper" or "two pairs of trousers" rather than "three papers" or "two trousers", only it's for almost everything:

https://en.wikipedia.org/wiki/Japanese_counter_word

The counters often apply to classes of things, like "small animals", "long, thin items", "liquids", "flat items", "sitting occasions", etc. For example, i would guess pokers would use 本, 'pon'.

Every noun uses one of the counters, there are lots of them, they all carry a shade of meaning, and some of them are pretty obscure. Hence, there is also humour around them. I remember a Japanese-speaking friend explaining how she'd told someone that she had seen three rabbits on the road, but rather than using the normal counter for small animals, she used the one for flat things, because they were roadkill.


A deck (of playing cards) would translate to (トランプ)一組. For individual cards, the counter would be 枚.


I'm fascinated by this joke, but I have to admit, I don't get it.


The factory workers didn't know what the right word for 5 fireplace pokers was -- only the forms for one or 2-4. So to avoid having to guess (and possibly getting it wrong) they made their request in a convoluted way.

The bureaucrats didn't know what the correct plural form was either, so their reply is similarly odd.


Basically, a lot of Russians have no idea how to say 5 pieces of kocherga. Factory workers think they can't spell it because they are just uneducated, but the reply they get suggests that the management has no idea either. Honestly, I couldn't say it myself before I looked it up when writing this reply, even though I've heard the joke before. The word is untypical.


Is there any significance to the fact that management divided the pokers 4+1 instead of 3+2? If I was trying to gloss over the fact that I couldn't say "5 pokers", I'd be pretty happy to have the excuse to just match what it said on the requisition form.

Something like "You requested 3 kochergi and 2 more kochergi. Here they are."


>Is there any significance to the fact that management divided the pokers 4+1 instead of 3+2?

Yes, that indicates that management processed the two requests as one. If they had answered 3+2, that would mean they had not noticed what was going on.


what are the words? I'm curious but not knowing russian I have no idea how to even search for this


All Russians agree on this:

    1 kocherga = одна кочерга
    2,3,4 kochergas = две,три,четыре кочерги
But then it gets ambiguous:

    5 kochergas = пять кочерёг?
    5 kochergas = пять кочерг?
    5 kochergas = пять кочерыг?
    5 kochergas = пять кочерег?
All of those sound both correct and yet awkward. The first one is officially correct (and I have no idea why).


Just look the word up in a dictionary and find the declension box.

https://en.wiktionary.org/wiki/%D0%BA%D0%BE%D1%87%D0%B5%D1%8...

So kočergá, kočergí, kočerjóg.

English declension is so simple we don't include this information in a typical dictionary entry. ("Poker, pokers, poker's, pokers'")


"Just" looking it up occurred to me too, but that table doesn't help if you don't also point out that for some reason it's entry for the genitive plural that we're looking for.


Numbers ending with 2-4 (except for 12-14) put the noun in singular genitive case in Russian. 5-10, on the other hand, require a plural noun (also in genitive). Since plurality is not regular and plural cases are even less so, some people will have trouble with deriving the correct plural case for some words. Something similar to some English speakers sometimes being confused about plural for Greek and Latin origin words ("forums"/"fora", "viruses"/"virii" etc.).
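
The rule above can be sketched in a few lines of Python. This is only an illustration, not a full treatment of Russian numerals: the function name `kocherga_form` is made up for this example, and the three word forms are the ones quoted elsewhere in this thread (with кочерёг as the genitive plural).

```python
def kocherga_form(n: int) -> str:
    """Return the form of 'кочерга' used after the cardinal number n.

    Hypothetical helper for illustration; assumes n is a non-negative integer.
    """
    last_two = n % 100
    last = n % 10
    # 11-14 are exceptions: they take the genitive plural like 5-10.
    if 11 <= last_two <= 14:
        return "кочерёг"
    if last == 1:
        return "кочерга"   # nominative singular: 1, 21, 31, ...
    if 2 <= last <= 4:
        return "кочерги"   # genitive singular: 2-4, 22-24, ...
    return "кочерёг"       # genitive plural: 0, 5-10, 15-20, 25-30, ...


for n in (1, 3, 5, 12, 21):
    print(n, kocherga_form(n))
```

Under these assumptions the factory's request for five pokers would need the third form, which is exactly the one nobody in the joke is sure about.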


This relates to the topic of declension in Russian. In English, there are only 2 forms - e.g. 1 cookie, but N cookies (where N>=2). Imagine that for 3 cookies you would have something special again, like 3 cookises, and so on. In Russian, the word "kocherga" has quite intuitive forms for 1, 2, 3, 4, but when it comes to 5, our intuition breaks down, because the word is unlike any other, so it's difficult to reason by analogy. (This word is really a special case; normally, it's very easy to find the right ending.) That's why they chose to put it as 3 kochergi+2 kochergi - just to be on the safe side. (BTW, the grammatically correct form would be "5 kochereg", which sounds really funny to a native Russian speaker.)


The source for plural rules (with machine-readable representation) can be found in the Unicode CLDR: http://www.unicode.org/cldr/charts/33/supplemental/language_...


Wow, this is super-awesome!

But I think the Brazilian Portuguese rule in this chart for the cardinal 0 is wrong: I think it should be "other" (0 pontos) rather than "one" (0 ponto), just as in Spanish, English, and European Portuguese.

I've sent a note to a native speaker to check my intuition.


Native speaker here, and my judgement matches yours. I was surprised to see that on the page.


"it's complicated"


If you want to go deeper, there's the Grammatical Framework Resource Grammar Library: https://www.grammaticalframework.org/lib/doc/rgl-publication...


In case anybody needs another, similar rabbit hole to get lost in: http://www.sf.airnet.ne.jp/ts/language/number.html


In case anyone else was wondering about form #14 which is marked "unused": apparently earlier versions of Firefox incorrectly used it for Macedonian. Firefox 59+ uses form #15 instead: https://bugzilla.mozilla.org/show_bug.cgi?id=1415906


Note that that bug was just a documentation issue (see comment #21) - Firefox already used form #15 even before version 59.

Version control shows Firefox has used form #15 for Macedonian at least since 2012: https://hg.mozilla.org/l10n-central/mk/annotate/d13b2b460aeb...


I posted this in a group chat with "#14 will surprise you!" attached. I was wondering indeed!


Did you get there from here? https://factorio.com/blog/post/fff-244


I keep hearing about this game. Is it any good?


You automate a factory, then redo it so the ratios are right, then again so it's more efficient, then add trains. There's also the circuit system for automating further and doing fancy things like this: https://www.youtube.com/watch?v=indN4kcshB0

It's also quite[0] addictive. You may find you burn a lot of hours.

Exceptional value for money if it's your sort of thing.

[0] The Cracktorio nickname is well deserved.


In the interest of giving context, here is the original video¹ which was impressively recreated in Factorio:

https://www.youtube.com/watch?v=SJrY3p9nzVY

1. Well, the original file was copied around the Internet well before YouTube, but that is the original video re-encoded in higher resolution and uploaded to YouTube.

Further reading: http://knowyourmeme.com/memes/holiday-light-show-videos


The game is all about automating a factory. It's basically a survival game with most of the survival and grindy bits ripped out and turned into better crafting mechanics. If you like games like Minecraft or OpenTTD, you'll probably like Factorio.

One of the impressive things they've managed to do is make the multiplayer scale quite nicely--how many other games can say they have 400 concurrent players on one map and still be playable?


PlanetSide had a hard limit of 133 players per team, with three teams, so it would end up as 399 players per map. But all the maps work together and influence the gameplay on all maps, so the max per server was a lot more. Sadly they took it down, and PlanetSide 2 isn't as good.


The website and the sibling comments already gave you an impression of what game it is in general. If that sounds okay, give the demo a try. I liked it, but wasn't convinced to spend money just yet because the demo is quite small. A friend sent me the Linux version (no DRM) and that convinced me soon enough. If you need the full version and solemnly swear you'll buy it if you like it (you will need to for updates anyway), let me know! (Contact info in profile.)


I did indeed!


Why are Latin/Greek listed together under rule #1? They may have shared cultural history but they’re not closely related aside from being in the very large Indo-European family. Latin should be with the Romance languages if anything.

Also, it’s a bit naïve to say that a declined language only has 1-2 forms without taking into account oblique cases, but that’s a hard thing to deal with programmatically.


This is about how many forms of a phrase are needed to cover all cases. You have a list of phrase templates for each phrase where the number gets substituted in, and the number of templates for each phrase depends on the language rule. So for Russian you have 3 templates and if the number is 31, you use template 1, but if it's 0, you use template 3. The relevant word is already declined in the proper case. This isn't about programmatically pluralising a single word, it's just so you can select a phrase template.
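
A rough Python sketch of that template selection, under stated assumptions: `russian_template_index`, `TEMPLATES` and `found_files` are hypothetical names, and the Russian sentences are my own illustrative translations of a "Found N files" message, not strings from any real product. Note that the word before the number changes too, which is why whole phrases are stored rather than single nouns.

```python
def russian_template_index(n: int) -> int:
    """Return 0, 1 or 2: which of the three Russian templates fits n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


# Hypothetical, already-declined templates for "Found N files".
TEMPLATES = [
    "Найден {n} файл",     # template 1: 1, 21, 31, ...
    "Найдено {n} файла",   # template 2: 2-4, 22-24, ...
    "Найдено {n} файлов",  # template 3: 0, 5-20, 25-30, ...
]


def found_files(n: int) -> str:
    """Pick the template for n and substitute the number into it."""
    return TEMPLATES[russian_template_index(n)].format(n=n)


print(found_files(31))  # uses template 1
print(found_files(0))   # uses template 3
```

The code never inflects anything itself. It only chooses between strings a translator has already written.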



> Families: Asian (Chinese, Japanese, Korean), ...

I can't complain, but what an odd name for that "family"!

(...which is not actually a family, by the way.)


I was struck by this too. Chinese and Japanese/Korean are extremely different languages. Japanese is an agglutinative language whereas Chinese is analytic. Japanese is probably organically more similar to English (if it is at all) than Chinese. But one needs to note that Chinese affected a lot of Asian languages, and what we used to call the Altaic language family (Turkic, Mongolic, Japonic languages) was actually a bunch of different agglutinative languages aggressively affected by Chinese (and each other), which confused linguists for decades. Note that the Altaic language family does not exist; those languages are not related at all (which is another mistake on the website).


It says it's about number, but then in the Examples section, Sorbian lists three nominative forms and one genitive. But not all the genitive forms, just the plural form. And no other cases. I'm... confused about how this is supposed to work.


Weird that Hungarian is listed under rule #1, it should be under #0: Nouns preceded by numbers are always in the singular in Hungarian. Does anyone know for the other Finno-Ugric languages listed under #1 (Estonian, Finnish)?


Don't know Hungarian, but pluralization is more than changing the noun after a number.

For example, in full phrases like "Clear history after N days" or "Found N files", some languages would change the part before the number (instead of, or together with, the noun).

Another example is text+button "N items in your cart [Buy them now]". The text may be unaffected by the number at all, but in the button "them" usually has to become "it" when N is 1.

I don't think the article is accurate. Turkish definitely shouldn't be together with Chinese; it needs plural forms in some situations in which Chinese doesn't. I know this because our product was localized for both. Unicode CLDR is the recommended reference.


Despite the title, the page describes a function, not the language grammar as a whole.

Hungarian has plural forms. kutya = dog, kutyák = dogs. They follow the same pattern as English.

The plurals are used way less often, as [roughly] the language doesn't pluralise when it's redundant. But that's outside the scope of PluralForm, and it's up to its users to avoid `PluralForm.get`ting phrases which already tell you the quantity.

[That said, the Chinese example should probably be reworded. The stated reason applies to Hungarian, but it's not the whole reason they're in #0.]


> it's up to its users to avoid `PluralForm.get`ting phrases which already tell you the quantity.

You mean one should not use `PluralForm.get` to generate snippets of the form "0 pages", "1 page", "2 pages"? Because if that's what you mean, you are directly contradicting all of the examples on the page.


All the concrete examples on the page are English — so they use plural forms when English does.

That said, my wording did needlessly assume one of two approaches. The 'downloads' example shows both.

PluralForm is often called at the whole-phrase level, for all languages, rather than at an 'atomic' level, at the points where specific languages pluralise things. This contrast is shown in the 'downloads' example.

In this use pattern, the correct approach is to write the same thing twice, if the 'plural form' of the phrase matches the singular one.

This pops up in English too. For instance, chrome://browser/locale/downloads/downloads.properties contains:

    shortHours=h;h
    shortMinutes=m;m
    shortSeconds=s;s

And a multiplayer game might need

    ready=Are You Ready?;Are You Ready?

As most languages have a singular 'you'.
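
To make the "write the same thing twice" point concrete, here is a small Python sketch of picking a form out of a semicolon-separated string. It is not Mozilla's implementation: `pick_form` and `english_plural_index` are hypothetical names, loosely modelled on the `PluralForm.get` call discussed above, and only the English two-form rule is shown.

```python
def english_plural_index(n: int) -> int:
    """Two-form rule: first form for exactly 1, second for everything else."""
    return 0 if n == 1 else 1


def pick_form(n: int, forms: str, rule=english_plural_index) -> str:
    """Split a 'form1;form2' string and return the form the rule selects."""
    choices = forms.split(";")
    return choices[rule(n)]


print(pick_form(1, "page;pages"))
print(pick_form(3, "page;pages"))
print(pick_form(3, "h;h"))  # both slots hold the same text
```

Because the caller always indexes into the list, a phrase whose forms happen to coincide still has to fill every slot the language rule expects.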


> All the concrete examples on the page are English — so they use plural forms when English does.

... which are all cases where Hungarian doesn't use the plural, which is why Hungarian shouldn't be in the same group as English.

> In this use pattern, the correct approach is to write the same thing twice, if the 'plural form' of the phrase matches the singular one.

... except if the plural form always matches the singular one, in which case the correct approach is to write the thing once. Category #0.

Thanks for your efforts, but you have clarified nothing with regards to Hungarian.


> except if the plural form always matches the singular one

Hungarian has plural forms which are distinct from the singular. https://en.wikipedia.org/wiki/Hungarian_noun_phrase

Otherwise, yeah, they'd be in group 0.


> you have clarified nothing with regards to Hungarian.

Well, except the "are you ready" example. That is indeed one case where it might make sense to use different Hungarian forms.


It's correct for Finnish: 1 tunti, 2 tuntia.


In some cases, I can guess the logic behind the rule, (e.g. "twenty and one second" type idioms) but in other cases, I can't really fathom why it would be so complicated.


While I couldn't comment on how they evolved (and I suspect there are many different answers), I'd point out that English has a very complicated pattern for ordinals:

1, ends in 1 except 11, 111...: -st

2, ends in 2 except 12, 112...: -nd

3, ends in 3 except 13, 113...: -rd

everything else: -th

(also 0, most people say zeroth but nobody really feels comfortable with it.)

However they came to be, there's no mental load in applying them and nobody who speaks English natively ever much thinks about it.
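
For comparison, the ordinal pattern listed above fits in a few lines of Python. This is a minimal sketch; the function name `ordinal` is just a label for this example, and it assumes non-negative integers.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    # 11, 12, 13 (and 111, 112, 113, ...) are the exceptions: always -th.
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


for n in (0, 1, 2, 3, 4, 11, 12, 13, 21, 112):
    print(ordinal(n))
```

The exception check has to come first, which is the same shape as the 11-14 exception in the Slavic cardinal rules.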


Yeah, totally agree that there are plenty of things that don't make sense in English. The grammar is easy, but the sheer number of irregular forms has got to be really difficult to deal with. At least the rules for plurals in some of the more complex examples are regular.


This.

Coming from a country that uses Type 0 (i.e. no plural whatsoever), English is already complicated and makes no sense at first.


The general term for this is grammatical number. English gets by with just singular and plural, but this is by no means universal across different languages. A singular/dual/plural breakdown is quite common as well--Arabic, for example. You can also have a paucal form to represent "few", with or without the dual as well (Arabic has dual and paucal, Russian just has the paucal).

On top of this base strata are some more choices. You have some choices to make for 0 (English says that 0 gets a plural form, French believes it to be singular). You can also choose how the rules differ when you get to systematic naming for large numbers--do you lump it in with many, or do you reuse the rules for the small numbers when they show up in larger numbers? Note that systematic naming doesn't have to be based on base-10 (say, Danish), nor does the start of systematic names need to happen on a clean boundary (English starts it at 13, for example).


> nor does the start of systematic names need to happen on a clean boundary (English starts it at 13, for example)

It'd be hard to make a case for regular number words starting at thirteen. 13, 14, and 15 are all exceptional in that the numbers 3/4/5 are not usually pronounced "thirt", "fort", or "fiff". You could make a case for starting at 16, but even then it isn't normal for the lower place value to precede the higher one. At 21 you see words that are formed in a fully regular way, but the roots for 20, 30, and 50 are still odd, now being "twen", "thir", and "fiff".

From 60 on you can construct number names just by knowing the numbers 1-50 and the place values (hundred / thousand / million / etc.).


The systematic ordinals start at one. The word one becomes first, two becomes second, three becomes third, and everything else has th appended.


This comment ( https://news.ycombinator.com/item?id=17165382 ) accurately slams the extreme complexity of English ordinals specifically, and is a sibling to my parent comment.

In order to call the ordinals perfectly systematic, you need to take the irregular cardinals as a given. If you're having trouble there, you'll have trouble with the ordinals too.



