Fluent – A localization system for natural-sounding translations (projectfluent.org)
69 points by croes on June 11, 2023 | 37 comments



I have used this and it is a welcome advance on the older techniques such as gettext.

Just be aware that not all the implementations have all the functionality. For example, I learnt that the Rust implementation does not yet support date/time formatting.

I recommend taking it for a spin.


> welcome advance on the older techniques such as gettext.

I disagree. I think it depends on the use-cases. I blogged about that recently: https://slint-ui.com/blog/translation-infrastructure

In short, I much prefer having the original (English) in the source code, as it makes it easier to maintain and removes a level of indirection. Using message identifiers adds boilerplate.


I've never worked on a localized project, but Fluent's explanation for why they use identifiers makes a lot of sense to me. I didn't see anything in your blog post addressing their issues with English source identifiers.

https://github.com/projectfluent/fluent/wiki/Fluent-vs-gette...

> The most important difference between gettext and Fluent is the choice of a message identifier. Gettext approaches the problem by taking the source string (often English). While the choice seem simple, it has long standing consequences in form of two limitations that this choice imposes.

> First of all, it means that any change to the source string invalidate all translations of the string. This severely increases the burden on the developers to never alter messages in the source language as it results in all translations having to be updated.

> Secondly, it makes it harder to introduce multiple messages with the same source string which should be translated differently ...

> Fluent establishes a social contract between the developer and localizers. The developer introduces a unique identifier and provides a set of variables such as number of unread emails or the name of the user, and localizers are using Fluent syntax features to construct the best possible translation for that identifier.


Web2py solves the 2nd problem, by stripping everything after the last hash sign (#) if using the original string, but still using the entire string to look up translations. It’s a joy to use.

Technically, it also solves the first problem - because there is also a “no translation” translation, and you can make the changes there instead of in the source code - no worse than Fluent in terms of work - but a lot more confusing down the line, so I wouldn’t refer to it as a solution.


> Web2py solves the 2nd problem, by stripping everything after the last hash sign (#) if using the original string,

Sounds good.

Since it is the last one, it will also handle something like this:

    Fetch grocery item #{number} # groceries


Yes. If the string contains a #, you must append another # at the end to “escape” it.
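A rough Python sketch of the convention as described in this subthread, not web2py's actual code. The function names and the German strings are made up for illustration, and trimming the trailing space is an assumption:

```python
def display_text(source: str) -> str:
    """Return the text shown when no translation exists.

    Everything from the last '#' onward is treated as a translator
    comment and dropped (assumed behavior, per the description above).
    """
    head, sep, _comment = source.rpartition("#")
    # No '#' at all: the string is shown unchanged.
    return head.rstrip() if sep else source


def translate(source: str, catalog: dict[str, str]) -> str:
    """Look up the FULL string, comment included, so that identical
    English text with different comments can be translated differently."""
    return catalog.get(source, display_text(source))


# Hypothetical catalog: same English word, two different translations.
german = {
    "Banned # filter": "Gesperrte",
    "Banned # status": "Gesperrt",
}

print(translate("Banned # filter", german))
print(translate("Fetch grocery item #{number} # groceries", {}))
```

The second call has no translation, so it falls back to the source text with only the trailing comment removed; the `#` in front of `{number}` survives because only the last hash sign counts.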


I’ve never worked on localization either, but I do spend a lot of time reading other people’s source code, and I always find it mildly annoying when a project uses identifiers because it means I have to grep twice: once for the message (to find the identifier) and once for the identifier (to find the code that uses it). In projects that embed messages directly, I only have to grep once.

I’ve wondered about the possibility of having the best of both worlds: have the source code contain the full English message (and no other identifier), but use a tool that automatically assigns identifiers to source code locations that contain messages. Different locations would get different identifiers even if they use the same string. The tool would have to be history-aware, in order to keep the identifier the same if the code is moved or if the message is edited. It would have to use heuristics to differentiate between “message was edited” and “message was deleted, and an unrelated message was added in roughly the same place”. But in practice this would usually be easy to do.

…Or you could simplify things drastically by having the source code contain both the English message and a unique identifier (perhaps just a number rather than anything descriptive). That’s less fun though.


The latter is roughly what we do on the project I work on:

English string directly in the source, plus a human-readable identifier that helps translators understand the context of the string they’re translating. One script to extract all strings, and another to load all translations.

Works fine really.


A double grep sounds easy to automate.

FWIW I do the same thing in our code.


Exactly, thanks for highlighting this. It always really irked me that in the gettext design the translations are tightly coupled to the source strings.


Message identifiers force you to confront the fact that two messages which sound the same in English might not sound the same in other languages.

I used an app that used Gettext for translation and had this problem. The string "banned" appeared in two places, once as a filter (to only show online / moderator / banned users) and once as a status indicator on the list of users (to indicate that a given user is banned). In English, the word "banned" is the same regardless of whether you're referring to a single user or a group of users, but that isn't true in other languages.

If you're developing with Gettext, you usually get this wrong and then translators have all sorts of issues. If you use something like Fluent, the natural thing to do is to give the first string an identifier like `user_list_filter_banned` and the second an identifier of `user_status_banned`.
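A minimal sketch of that idea in Python, using plain dictionaries rather than Fluent's actual API. The `tr` helper and the German strings are illustrative assumptions; only the two identifiers come from the paragraph above:

```python
# One catalog per locale, keyed by message identifier (not by English text).
CATALOGS = {
    "en": {
        "user_list_filter_banned": "Banned",
        "user_status_banned": "Banned",
    },
    "de": {
        # Illustrative translations: plural form for the filter,
        # singular form for the per-user status.
        "user_list_filter_banned": "Gesperrte",
        "user_status_banned": "Gesperrt",
    },
}


def tr(locale: str, message_id: str) -> str:
    """Look up a message by identifier, falling back to English.

    Raises KeyError if the identifier is missing from the English catalog.
    """
    fallback = CATALOGS["en"]
    return CATALOGS.get(locale, fallback).get(message_id, fallback[message_id])


print(tr("en", "user_list_filter_banned"), tr("en", "user_status_banned"))
print(tr("de", "user_list_filter_banned"), tr("de", "user_status_banned"))
```

The English entries are identical, yet the two usages stay separate messages, so a translator can pick a different form for each without any change to the code.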

As far as I understand, Fluent also lets you pass seemingly extraneous data to translation strings (like the user's gender), which, in some languages, might be necessary to translate something like "%s has just sent you a new message."


Not just gender: it also supports plural rules for numbers, which is great even if you have just one language (for example "no user/one user/2 users"). No more hacky string manipulation in the programming language!
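To show what plural rules buy you, here is a hand-rolled sketch in Python. It is not how Fluent is implemented (Fluent takes its plural categories from CLDR data); the function names and the "tab" messages are assumptions, and the rules below only cover non-negative integers:

```python
def plural_category_en(n: int) -> str:
    # English has two categories for integers.
    return "one" if n == 1 else "other"


def plural_category_pl(n: int) -> str:
    # Polish integer rules: "one" for 1; "few" for numbers ending in 2-4,
    # except those ending in 12-14; "many" for everything else.
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


RULES = {"en": plural_category_en, "pl": plural_category_pl}

# Each locale supplies one variant per category it actually uses.
MESSAGES = {
    "en": {"one": "{n} tab", "other": "{n} tabs"},
    "pl": {"one": "{n} karta", "few": "{n} karty", "many": "{n} kart"},
}


def format_tabs(locale: str, n: int) -> str:
    category = RULES[locale](n)
    return MESSAGES[locale][category].format(n=n)


for count in (1, 2, 5, 12, 22):
    print(format_tabs("en", count), "|", format_tabs("pl", count))
```

The calling code only passes the number; which variant to use is decided per locale, which is the part that otherwise ends up as string manipulation in application code.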


I haven't tried Fluent yet, but—in theory—could you not have both the message identifier and the original string in your code?

Their React example kind of points towards this: https://github.com/projectfluent/fluent.js/wiki/React-Bindin...

They provide a <Localized> component, which has an identifier and wraps a piece of markup containing the original:

  <Localized id="hello">
    <h1>Hello, world!</h1>
  </Localized>

Seems like the best of both worlds.


Per the docs, the wrapped markup is only there to improve readability for devs; it isn't ever used. That seems like the worst of both worlds to me.


> In short, I much prefer having the original (English) in the source code as it makes it easier to maintain and removes level of indirection. Using message identifiers adds boilerplate.

As long as you’re fine with the original being Italian or whatever if you happen on a project which is chiefly aimed at Italians.


The string basically is the identifier, but easier to read, and less easy to mistake for something else. Many people will have a problem with Italian identifiers in code, but there obviously are Italian code bases. IMO, the identifier should be in the same language as the rest of the project.


It has its rough edges but overall it's pretty nice indeed. Nicer than using dumb property files in any case. The only rough edges are in the more advanced features, which you wouldn't have access to otherwise anyway.

We actually built a multiplatform Kotlin library that adapts the JVM implementation and the JS implementation of Project Fluent. The Java library we use is indeed a bit limited for some things. We've been working around some of those issues by doing our own variable processing.

Overall, it's been useful for us and we're using the same localization strings in our Spring server and Kotlin/JS web UI.

https://github.com/formation-res/fluent-kotlin

It's not very widely used and has a few rough edges. But it works for us and it should be fairly straightforward to adapt it for e.g. Android and iOS if you need that.

A neat trick these days is to translate Fluent localization files with ChatGPT, asking it to preserve the structure and identifiers. It actually works. We got it to translate hundreds of strings into a few languages. The translations were good quality and we found very few issues. It only took a few minutes. GPT-4 understands most major and minor languages in this world.


It's going to be hard to have date/time formatting when there is no standardized date/time lib.


This one from the landing page is not correct:

  tabs-close-warning: Zostanie zamkniętych ⁨1⁩ kart. Czy chcesz kontynuować?

should be:

  tabs-close-warning: Karta zostanie zamknięta. Czy chcesz kontynuować?


In the real code, this message is only shown when closing more than one tab.


I kind of despise the pattern where, in order to support translations, you have to turn every string in your program into an abomination of bespoke markup, storing strings in variables, etc.


Do you have any alternative in mind? Assuming we want the user interface to have grammatically correct language.


Use a delimiter that's already in your markup to identify strings, and then translate those. For example, look at `<p>` boundaries and other tags that wrap plain text, and attach an attribute like data-translation-id.


It's one of Mozilla's projects.

I ported fluent.rs to C# after the current FluentDotNet Ftl implementation was abandoned.

Interesting format, very simple compared to MessageFormat2. A bit too simple, to be honest.


> A bit too simple

Why do you say that -- are there particular capabilities it is lacking?


Dynamic message references.

Say you want to make a message like `X skeleton(s) attacked Y dragon(s)` generic over the type of attacker/defender.

Currently the only option is to duplicate it or deal with it in the program.


That doesn't work. You need to know the gender and actual number `X` of `X skeletons` to be able to (correctly ;) translate `attacked` in many languages. To correctly translate `X skeletons`, you need to know `X` and whether they are the attacker or being attacked. Same for `Y dragons`. In the end, you do not gain much but have a _really_ complex template (for such a simple phrase).


Sure it does; you just need a way to pass arguments for case, gender, etc. It's not impossible to make it work. I already implemented a prototype for it, but it requires changes to how Fluent handles references.

See https://github.com/projectfluent/fluent/issues/80 for more details.


See also https://news.ycombinator.com/item?id=16763092 (5 years ago, 10 comments)


tl;dr of that thread:

> I don’t get it. Where’s the localizations [read: non-English]?

> One of the authors: haha, there’s only English so far because it’s not localized yet :)


Seems that this is implemented [1] in:

- JavaScript

- Rust

- Python

[1] https://github.com/projectfluent/fluent/wiki


There are unofficial implementations

- C# https://github.com/Ygg01/Linguini


The idea is nice, but I can't remember needing such complex translation in any app. The few cases that come up could be handled with a simple if/else or switch.


Are you speaking on behalf of the user or the developer here? And, as a native English speaker or not?

I have definitely seen my fair share of quirky texts in Danish UIs. It's not the end of the world but something that can clearly be improved.


Even in Microsoft products, the Swedish translation is sometimes so bad that I can't understand what they are saying. The trend is that translation quality is declining. Fluent seems like a great step forward but the bigger issue is corporations are not ready to put in the work needed.


I'm going to assume you are building mostly in English, or in languages that map simply to English (e.g. with the same number of plural forms). If you have 3 or more plural cases, plus gender (including neutral), you already have a multitude of cases in an if/else.


I'm a Croatian developer who builds apps with multilingual interfaces. Fluent has been great for me.



