
Fluent 1.0: a localization system for natural-sounding translations - feross
https://hacks.mozilla.org/2019/04/fluent-1-0-a-localization-system-for-natural-sounding-translations/
======
lewisl9029
One thing I've always struggled with in i18n is having to split messages up
into often less coherent chunks in order to add things like links, tooltips,
or styling elements in the middle of the text, which makes it more difficult
to localize a message as a holistic piece independent of the source
language.

For a slightly contrived example to demonstrate this, let's say you have a
string like this:

"Please click here 7 times to confirm"

Where you want to make the "click here 7 times" look like a link by wrapping
it in an <a> tag, or just styled differently using a styled <span>.

Using something like react-intl, which is what I've used in the past, you'd
have to do something like this:

    
    
      <FormattedMessage
        id="confirm"
        defaultMessage={`Please { confirmLink } to confirm`}
        values={{
          confirmLink: 
            <a>
              <FormattedMessage 
                id="confirm-link"
                defaultMessage={`click here {clickCount, number} {clickCount, plural,
                  one {time}
                  other {times}
                }`}
                values={{ clickCount: 7 }}
              />
            </a>
        }}
      />
    

If some language then requires a completely different sentence structure, in
which the "to confirm" part has to be interleaved somewhere in the middle of
the "click here 7 times" message to sound fluent, this approach cannot
accommodate it.

I'm wondering how people generally deal with this, in React and elsewhere.

~~~
stasm
This is a great point and something that we've seen come up very often in
building UIs. The good practice which we recommend to developers at Mozilla is
to avoid splitting or nesting messages, because it makes it harder for
translators to see the entire translation at once.

We've taken a layered approach to designing Fluent: what we're announcing
today is the 1.0 of the syntax and file format specification. The
implementations are still maturing towards 1.0 quality, but let me quickly
describe what our current thinking is.

For JavaScript, we're working on a low-level library which implements a parser
of Fluent files, and offers an agnostic API for formatting translations. On
top of it we hope to see an ecosystem of glue-code libraries, or bindings,
each satisfying the needs of a different use-case or framework.

I've been working on one such binding library called fluent-react. It's still
in its 0.x days, but it's already used in a number of Mozilla projects (e.g.
in Firefox DevTools). In fluent-react translations can contain limited markup.
During rendering, the markup is sanitized and then matched against props
defined by the developer in the source code, in a way that overlays the
translation onto the source's structure. Hence, this feature is called
Overlays. See [https://github.com/projectfluent/fluent.js/wiki/React-
Overla...](https://github.com/projectfluent/fluent.js/wiki/React-Overlays).

Here's how you could re-implement your example using fluent-react. Note that
the <a>'s href is only defined in the prop to the Localized component.

    
    
        <Localized
            id="confirm"
            $clickCount={7}
            a={<a href="..."></a>}
        >
            {"Please <a>click here {$clickCount ->
                [one] 1 time
               *[other] {$clickCount} times
            }</a> to confirm."}
        </Localized>
    

I'd love to get more feedback on ideas in fluent-react. Please feel free to
reach out if you have more questions!

~~~
drdaeman
> `[one] 1 time *[other] {$clickCount} times`

How does this work for languages that have more complex pluralization rules?

E.g. in Russian it's "1 раз", "2 раза", "11 раз", "12 раз", "22 раза" and "55
раз" - the case depends on the number ending, with exceptions for 11, 12, 13
and 14.

~~~
zbraniecki
That's a great question!

Fluent relies on Unicode Plural Rules [0] which allow us to handle all (as far
as Unicode knows) pluralization rules for cardinal and ordinal (and range)
categories :)

[0] [http://cldr.unicode.org/index/cldr-spec/plural-
rules](http://cldr.unicode.org/index/cldr-spec/plural-rules)
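
A quick way to see these categories in practice, outside Fluent itself, is the standard `Intl.PluralRules` API, which exposes CLDR plural rules in JavaScript engines. This is only a sketch, not Fluent code: the `forms` table and the `times` helper are made up for illustration, and it assumes a runtime that ships full ICU locale data (a recent Node.js or a browser).

```javascript
// Map the CLDR plural categories for Russian to the forms of "раз" from the
// question above. `forms` and `times` are hypothetical names, not Fluent APIs.
const rules = new Intl.PluralRules("ru");

const forms = {
  one: "раз",    // 1, 21, 31, ...
  few: "раза",   // 2-4, 22-24, ... (but not 12-14)
  many: "раз",   // 0, 5-20, 25-30, ...
  other: "раза", // fractions such as 1.5
};

function times(n) {
  // select() returns the category name; the translation supplies the form.
  return `${n} ${forms[rules.select(n)]}`;
}

for (const n of [1, 2, 11, 12, 22, 55]) {
  console.log(times(n));
}
```

The exceptions for 11 to 14 live in the locale data, so the translation only has to provide one form per category.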

------
pcr910303
OMG this is so cool to a person who lives in the CJK world (to be specific,
I’m Korean) where the order of noun/verb/adj is reversed and always gets to
see programs that display text something like ‘Site is news reader HN’,
‘Button press confirm to’.

It’s a pity that the programming world is still super bad at i18n :-(

~~~
chungleong
It won't help much with Chinese localization, where the persistent problem is
that developers assume every language has the words "yes" and "no".

~~~
themacguffinman
Doesn't Chinese have understandable localizations for affirmative/negative
responses like "是" and "没有"? I don't see the problem.

~~~
chungleong
That's roughly like putting [OK] and [Cancel] into a dialog box that asks "Do
you want to save this document?". Perfectly understandable but awkward,
somewhat confusing to non-tech people--in a word, unfluent. Answering such
questions in Chinese (and I imagine, in Welsh) requires the verb that was used
in the question.

------
ngrilly
The comparison with gettext is really interesting:
[https://github.com/projectfluent/fluent/wiki/Fluent-vs-
gette...](https://github.com/projectfluent/fluent/wiki/Fluent-vs-gettext)

Especially the advantages and drawbacks of using the source string as a
message identifier, compared to a developer provided ID.

I'm wondering if fluent has something similar to xgettext, to extract the IDs
from the source code?

Edit: Looks like there is some discussion about extraction here:
[https://github.com/projectfluent/fluent.js/wiki/React-
Bindin...](https://github.com/projectfluent/fluent.js/wiki/React-Bindings-
Discussion)

~~~
olau
Don't take the comparison at face value. It's clear to me that whoever wrote
it either isn't really familiar with gettext or is deliberately talking it
down. Yes, it's sort of ancient, but the problems mentioned can be solved.

And using the source string as ID is a pretty clever trick. Of course, there
are some downsides, but there are certainly also downsides with separate IDs.

Having said that, Fluent looks interesting.

~~~
ngrilly
The downside of using separate IDs is that the developer has to "name" each
string shown in the user interface, instead of just using the source string as
an ID. And as you know, naming things is hard ;-)

~~~
zbraniecki
Yes, naming is hard, but, to quote the previous commenter, "this can be worked
around" - you can `slug` any string if you want to. We prefer to think of the
ID as the base of the social contract between the dev and the localizer. This
enables a lot of fine-tuned control over string invalidation.
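
As a rough sketch of the "slug" idea, here is one way an ID could be derived from an English source string. `slugify` and its `maxWords` parameter are hypothetical, not part of Fluent's tooling.

```javascript
// Derive a message ID from an English source string.
// `slugify` is a hypothetical helper, not something Fluent ships.
function slugify(source, maxWords = 6) {
  return source
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ") // collapse punctuation and whitespace
    .trim()
    .split(" ")
    .slice(0, maxWords)          // keep the ID reasonably short
    .join("-");
}

console.log(slugify("Please click here to confirm"));
// please-click-here-to-confirm
```

Fluent identifiers have to start with a letter, so a source string that begins with a digit would need a prefix. Once generated, the ID stays fixed even if the source wording is later tweaked, which is where the control over invalidation comes from.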

~~~
ngrilly
Considering the translation ID as a "slug" is a good tip to ease choosing the
ID. Thanks!

------
theon144
Nice! This honestly sounds really great. Coincidentally, as a Czech person,
multiple plural forms have been something of a bane of i18n for me: a
surprising number of solutions don't take them into account at all (?!) or
require dumb workarounds.

With Mozilla's experience and adherence to ideals of interoperability and
openness, I can see Fluent as a solid "gold standard" solution for a great
chunk of i18n needs :)

~~~
zoul
Ahoj! :) This is a perfect article that always comes to mind when talking
pluralization:

[https://metacpan.org/pod/distribution/Locale-
Maketext/lib/Lo...](https://metacpan.org/pod/distribution/Locale-
Maketext/lib/Locale/Maketext/TPJ13.pod)

We Czechs have it easy!

~~~
Tomte
Great article! I'm always impressed by the thought going into Perl libraries.

There was a presentation years ago about how Perl handled Unicode right and
every other programming language didn't (with Python 3 pretty close, IIRC).

Does anyone remember the URL?

~~~
arnsholt
I think the presentation you're thinking of is Tom Christiansen's _Unicode:
The Good, the Bad, and the (mostly) Ugly_ :
[https://www.azabani.com/pages/gbu/](https://www.azabani.com/pages/gbu/)

~~~
Tomte
I think that's the one. Thank you!

------
quelltext
In the Czech example, how is it obvious that `few` stands for 2, 3, 4? Is that
just how the concept of "few" (note the English term) is defined and
understood by all Czech speakers and thus this language-specific meaning is
encoded by Mozilla to map to the range 2-4?

My point is that while there might be a concept of "few" that does map
uniquely to that range, I am not sure naming the keyword "few" is the right
name for this.

Quite honestly it would be easier to understand if it were explicitly
referring to the range. After all these strings are provided specific to a
language anyway. As such why not encode the rules in them explicitly instead
of relying on keywords?

Or are those merely user defined abstractions to accomplish reuse? I guess it
would help, but I'm still not sure why this needs a whole new framework.

~~~
stasm
(Author of the blog post here.) Great question, thanks! Unicode defines six
categories of plural forms: zero, one, two, few, many, and other. The names of
these categories always appear in English. Unicode also maintains a collection
of all mappings of numerical rules to these categories, for all languages
supported by the CLDR. See
[http://www.unicode.org/cldr/charts/latest/supplemental/langu...](http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#cs)
for the mapping corresponding to the Czech grammar.
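
To make the mapping concrete, the standard `Intl.PluralRules` API can be asked which category each number falls into for a given locale. This is a sketch outside Fluent; `groupByCategory` is a hypothetical helper, and it assumes a runtime with full ICU locale data.

```javascript
// Group numbers by the CLDR plural category a locale assigns to them.
// `groupByCategory` is a hypothetical helper written for this example.
function groupByCategory(locale, numbers) {
  const rules = new Intl.PluralRules(locale);
  const groups = {};
  for (const n of numbers) {
    const category = rules.select(n);
    if (!groups[category]) {
      groups[category] = [];
    }
    groups[category].push(n);
  }
  return groups;
}

const integers = Array.from({ length: 11 }, (_, i) => i); // 0..10
console.log(groupByCategory("cs", integers));
// For Czech: one is [1], few is [2, 3, 4], other is 0 and 5..10.
console.log(groupByCategory("cs", [1.5]));
// Fractions land in many.
```

So `few` is just a label: the numeric range behind it comes from the locale data rather than from the message, and the same label covers a different range in another language.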

~~~
Zenbit_UX
Care to comment about how it might handle one bold word or a link mid
sentence?

~~~
stasm
When Fluent formats translations, it returns simple strings (in the sense of
primitive computer types). They can include markup which is parsed by a
higher-level abstraction responsible for actually showing the translations
somewhere in the UI. Take a look at
[https://github.com/projectfluent/fluent.js/wiki/DOM-
Overlays](https://github.com/projectfluent/fluent.js/wiki/DOM-Overlays) in the
experimental fluent-dom package, and their React equivalent,
[https://github.com/projectfluent/fluent.js/wiki/React-
Overla...](https://github.com/projectfluent/fluent.js/wiki/React-Overlays).
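
To illustrate the general idea only, here is a toy overlay function. It is not how fluent-dom or fluent-react is implemented; `overlay` and the segment shape it returns are invented for this sketch, and it handles only flat, attribute-free tags.

```javascript
// Toy overlay: split a translated string into text and element segments,
// keeping only the element names the developer explicitly allowed.
// NOT the fluent-dom / fluent-react implementation; all names are hypothetical.
function overlay(translation, allowed) {
  const segments = [];
  const pattern = /<([a-z]+)>(.*?)<\/\1>/g;
  let last = 0;
  for (const match of translation.matchAll(pattern)) {
    const [whole, name, inner] = match;
    if (match.index > last) {
      segments.push({ type: "text", value: translation.slice(last, match.index) });
    }
    if (Object.prototype.hasOwnProperty.call(allowed, name)) {
      // Attributes come from the developer's side, never from the translation.
      segments.push({ type: "element", name, attrs: allowed[name], text: inner });
    } else {
      // Markup the developer did not allow is demoted to plain text.
      segments.push({ type: "text", value: inner });
    }
    last = match.index + whole.length;
  }
  if (last < translation.length) {
    segments.push({ type: "text", value: translation.slice(last) });
  }
  return segments;
}

const result = overlay(
  "Please <a>click here 7 times</a> to confirm.",
  { a: { href: "..." } }
);
console.log(result);
```

Because the translation is one string, a translator can move the `<a>` span anywhere in the sentence without the developer changing the component.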

------
Vinnl
Handing control of inflections etc. over to the translator rather than the
developer is one of those great ideas that make so much sense when you first
see them, that you start to wonder why we didn't do this before.

Great work by Mozilla; it's clear there's a lot of experience in the
organisation feeding into the design of this system, and it's great that
they're sharing it with the world.

------
ktpsns
Isn't it amazing that even with these "apparently solved and very basic"
problems like i18n, there is still so much low-hanging fruit, and an open
source project can do better than many companies?

I'm German and I disabled spell checking almost everywhere, because most
implementations are extremely poor in German. Word lists are a poor solution
to capture different word forms, and I find it surprising that even in 2019,
only very few programs get that right (for instance Microsoft Word; it
understands some but not all grammar rules). This is another thing where I
think a modern (OSS) spell checker could make a difference.

~~~
pluma
I guess my biggest pet peeve with German spellcheckers and autocomplete
solutions other than the nonexistent support for compound words is that most
of them don't understand capitalisation rules.

~~~
wongarsu
Not just spellcheckers, virtual keyboards too. Writing German on a smartphone
can be slightly infuriating

------
megous
Re: the gendered, pluralized example in the article. How will the system deal
with the fact that in some languages (Czech, too):

    
    
        ($count -> Jana added {n} {apples|apple}) ($gender -> to {his|her} profile)
    

the $gender will affect what form the word "added" should take. You're
suddenly dealing with possibilities($count)*possibilities($gender) variants of
the sentence

~~~
tobr
Wouldn’t you just wrap that word in another selector?
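
One way to read that suggestion, sketched in plain JavaScript rather than Fluent syntax: pick the verb by gender and the noun by plural category independently, then compose. The Czech forms and the `addedApples` helper are illustrative assumptions, and the plural lookup assumes a runtime with full ICU locale data.

```javascript
// Two independent selections composed in one sentence: the verb form depends
// on gender, the noun form on the plural category of the count.
// The tables are illustrative Czech forms; `addedApples` is hypothetical.
const plural = new Intl.PluralRules("cs");

const verbByGender = { male: "přidal", female: "přidala" };
const nounByCategory = {
  one: "jablko",   // 1
  few: "jablka",   // 2-4
  many: "jablka",  // fractions
  other: "jablek", // 0, 5 and up
};

function addedApples(name, gender, count) {
  const verb = verbByGender[gender];
  const noun = nounByCategory[plural.select(count)];
  return `${name} ${verb} ${count} ${noun}`;
}

console.log(addedApples("Jana", "female", 1)); // Jana přidala 1 jablko
console.log(addedApples("Jan", "male", 3));    // Jan přidal 3 jablka
```

Here the translator writes 2 + 4 forms instead of 2 × 4 full sentences. That only works while the two choices do not influence each other; once they do, the full product of variants comes back.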

~~~
megous
Will the system still be sanely usable for translators?

~~~
60654
Yeah this is where these things fall apart. I haven't used Mozilla's Fluent
but I used a _very_ similar closed source system at another company some years
back. Some failure modes:

- Gender agreement is not trivial. In French, in "Mary bought it" the verb
needs to match the gender and number of the _object_ not the subject: "Marie
l'a acheté" vs "Marie l'a achetée" vs "Marie les a achetés" depending on the
gender/number of the "it" object. But in most other cases the verb needs to
match the _subject_ in gender and number, in Polish "Maria kupiła" vs "Staś
kupił" vs "Oni kupili".

- In many languages nouns need to agree in case, gender, and number with the
phrase they're in; even in English we see this with pronouns: "this is he" vs
"this is his".

- And not to mention number agreement between pronoun and either subject or
object, depending on context: "this is the button" vs "these are the
buttons" - but also "this hovers over the button" vs "these hover over the
button" etc. Pronouns in general are a world of hurt, as are copulae
(is/are/etc).

- So once we want more complex sentences, simple word tagging like $gender
becomes insufficient, because now there's multi-party agreement to worry
about: we have to worry about $gender_subject and $object_gender_number_case,
etc.

This becomes completely untenable for all but the most technical translators.
Maybe those are easy to find for a world famous project like Firefox.
Unfortunately, not so for a run of the mill commercial project.

------
revskill
This is how to pass variables to a localized React component. The API is
elegant. Bravo to the team!

[https://github.com/projectfluent/fluent.js/blob/master/fluen...](https://github.com/projectfluent/fluent.js/blob/master/fluent-
react/examples/text-input/src/App.js#L29)

------
ajuc
Does it handle the "x of y" in Slavic languages correctly?

For example in Polish:

    
    
        "Page 3 of 4" is "Strona 3 z 4"
        "Page 3 of 100" is "Strona 3 ze 100"

~~~
lifthrasiir
As far as I can tell Fluent has no builtin support for tagging a phonemic
information to messages or arguments (it only supports plural rules and number
formatting largely derived from the CLDR). You can probably specify special
cases with selectors (but it will quickly go absurd, for Polish I guess that
applies to 6, 7, 16, 17, 60..79, 100..199, 600..699 and so on?) or have an
external function.

Context: the Polish preposition "z(e)" is spelled "z" if the following word
starts with s, z, and the like (problematically enough, the exact rule is not
systematic) and "ze" otherwise. Korean has a similar case with the
postpositions "은(는)" and "이(가)": in each pair the first form follows a word
ending with a consonant and the parenthesized form follows a word ending with
a vowel. The ko-KR localization of Firefox seems to completely ignore and/or
sidestep this; the last letter is assumed (e.g. "{-brand-name}는") or a static
word is inserted (e.g. "{$user} 사용자는" instead of "{$user}은(는)").
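
For the Korean case, the "external function" option could look roughly like this. It is a sketch: `hasFinalConsonant` and `topicParticle` are hypothetical names, and it only handles words that end in a precomposed Hangul syllable.

```javascript
// Pick the Korean topic particle by checking whether the last Hangul syllable
// has a final consonant (batchim). Both helpers are hypothetical.
function hasFinalConsonant(word) {
  const code = word.charCodeAt(word.length - 1);
  if (!(code >= 0xac00 && code <= 0xd7a3)) {
    throw new RangeError("last character is not a precomposed Hangul syllable");
  }
  // Each syllable block encodes 28 possible finals; index 0 means "none".
  return (code - 0xac00) % 28 !== 0;
}

function topicParticle(word) {
  return word + (hasFinalConsonant(word) ? "은" : "는");
}

console.log(topicParticle("사용자")); // 사용자는
console.log(topicParticle("사람"));   // 사람은
```

A user name or brand name written in Latin script makes this throw, since the choice then depends on pronunciation rather than spelling. That may be part of why a localization would prefer to insert a fixed word instead.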

~~~
ajuc
Funnily enough "Strona 3 z 6" and "Strona 3 z 7" sounds correct but "Strona 3
z 100" doesn't.

So I think it's only words starting with "s", not "sz" nor "si". And only 0
starts with "z" so it's not a problem (you never have "X out of 0"). So the
only special case is for 100-199, 100 000-199 999, etc.
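
That rule of thumb is small enough to write down. The sketch below encodes only the observation above ("ze" when the spoken number starts with "sto"), not an authoritative grammar rule; `prepositionFor` and `pageOf` are hypothetical names.

```javascript
// Choose "z" or "ze" before a number: "ze" only when the spoken number starts
// with "sto" (100-199, 100 000-199 999, and so on).
// `prepositionFor` and `pageOf` are hypothetical helpers.
function prepositionFor(n) {
  let leading = Math.trunc(Math.abs(n));
  // Drop groups of three digits until only the leading group remains.
  while (leading >= 1000) {
    leading = Math.trunc(leading / 1000);
  }
  return leading >= 100 && leading <= 199 ? "ze" : "z";
}

function pageOf(current, total) {
  return `Strona ${current} ${prepositionFor(total)} ${total}`;
}

console.log(pageOf(3, 4));   // Strona 3 z 4
console.log(pageOf(3, 100)); // Strona 3 ze 100
```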

~~~
lifthrasiir
Yeah, it is clear that I don't speak Polish ;-) What is a common workaround
there? "Strona X z(e) Y"?

~~~
ajuc
> What is a common workaround there?

Ignoring the issue altogether :) Or, if you're pedantic - implementing the
special cases in the source code. But that's unmaintainable if you have lots of
languages.

------
m12k
This looks really good! As someone in ruby-land I wish there was some easy way
to get notified if/when a fluent-ruby gem becomes available.

------
santialbo
Looking at the examples, it looks like something like messageformat
([https://messageformat.github.io/messageformat/](https://messageformat.github.io/messageformat/))
would have been a good solution to them. We've been using this at my job and
we are very happy with the flexibility it provides. The hard part comes when a
third party has to do the translations because someone from tech needs to be
involved.

~~~
Schoolmeister
From the article: "Many key ideas in Fluent have also been inspired by XLIFF
and ICU’s MessageFormat."

~~~
santialbo
Thank you for pointing that out. I found a comparison wiki article. I think
this is an improvement over messageformat. They have tackled many issues with
mf that I didn't even know I had :P

------
revetkn
My take on how to solve the natural-sounding translation problem:
[https://www.lokalized.com/#a-more-complex-
example](https://www.lokalized.com/#a-more-complex-example)

The magic is a tiny expression language which understands plural
cardinalities, ordinals, etc. so a translator can encode all required logic in
a JSON file - the application code can be "dumb".

------
arendtio
Sounds more like a programming language for translations (than like a message
format).

------
okuryu
What's the advantage over the FormatJS suite?
[https://formatjs.io/](https://formatjs.io/)

~~~
zbraniecki
Hi! FormatJS is very similar to some of our bindings, and is powered by
MessageFormat on the lower level.

Here's our take on the differences between MessageFormat and Fluent -
[https://github.com/projectfluent/fluent/wiki/Fluent-and-
ICU-...](https://github.com/projectfluent/fluent/wiki/Fluent-and-ICU-
MessageFormat)

------
indentit
It'd be great to see syntax highlighting for FTL files in the popular text
editors, but I guess they would have to be unofficial unless a member of
Mozilla/Fluent team wants to maintain it...

~~~
indentit
I see that the playground has syntax highlighting and uses `ace`, whose syntax
definitions are defined in JavaScript [0] and look fairly usable. I guess it
could even be converted into a `.sublime-syntax` file without too much trouble
:)

[0]:
[https://github.com/projectfluent/play/blob/a4f49a4a7eeb93535...](https://github.com/projectfluent/play/blob/a4f49a4a7eeb93535a6243be81c7c2a2b553c904/client/editor-
mode-fluent.js#L19-L161)

------
fxfan
I joke that, prima facie, Fluent is to gettext/other i18n frameworks what Rust
is to C/C++: well designed and thoughtful.

------
progx
Hope someday somebody invents .pot files, plurals and gettext.

~~~
pluma
No need to be sarcastic. The documentation explains the advantages over
gettext.

The most notable is that using the source language string as an identifier a)
discourages changes (and improvements) to the source language strings and b)
makes it hard to handle strings that appear the same in the source language
but need different translations.

