I have used this and it is a welcome advance on the older techniques such as gettext.
Just be aware that not all implementations have all the functionality. For example, I learnt that the Rust implementation does not yet support date/time formatting.
In short, I much prefer having the original (English) in the source code, as it makes it easier to maintain and removes a level of indirection. Using message identifiers adds boilerplate.
I've never worked on a localized project, but Fluent's explanation for why they use identifiers makes a lot of sense to me. I didn't see anything in your blog post addressing their issues with English source identifiers.
> The most important difference between gettext and Fluent is the choice of a message identifier. Gettext approaches the problem by taking the source string (often English). While the choice seem simple, it has long standing consequences in form of two limitations that this choice imposes.
> First of all, it means that any change to the source string invalidate all translations of the string. This severely increases the burden on the developers to never alter messages in the source language as it results in all translations having to be updated.
> Secondly, it makes it harder to introduce multiple messages with the same source string which should be translated differently ...
> Fluent establishes a social contract between the developer and localizers. The developer introduces a unique identifier and provides a set of variables such as number of unread emails or the name of the user, and localizers are using Fluent syntax features to construct the best possible translation for that identifier.
Web2py solves the second problem by stripping everything after the last hash sign (#) when it falls back to the original string, while still using the entire string to look up translations. It’s a joy to use.
Technically, it also solves the first problem - because there is also a “no translation” translation, and you can make the changes there instead of in the source code - no worse than Fluent in terms of work - but a lot more confusing down the line, so I wouldn’t refer to it as a solution.
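To make the hash-suffix idea concrete, here is a minimal sketch of that lookup behaviour. It is not web2py's actual code: the `translate` function, the catalog layout, and the German strings are all made up for illustration.

```python
# Hypothetical sketch of a "context after the last #" lookup.
# This is NOT web2py's implementation, only the idea described above.

def translate(message: str, catalog: dict[str, str]) -> str:
    """Look up the full string, including any '#context' suffix."""
    if message in catalog:
        return catalog[message]
    # No translation found: fall back to the source text, dropping
    # everything after the last '#'.
    text, sep, _context = message.rpartition("#")
    return text if sep else message


# Illustrative catalog: the same English word, two different translations.
german = {
    "banned#filter": "Gesperrte",
    "banned#status": "Gesperrt",
}

filter_label = translate("banned#filter", german)
status_label = translate("banned#status", german)
untranslated = translate("banned#status", {})
```

The two call sites share the English word but get distinct lookup keys, and an untranslated string still displays without its context suffix.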
I’ve never worked on localization either, but I do spend a lot of time reading other people’s source code, and I always find it mildly annoying when a project uses identifiers because it means I have to grep twice: once for the message (to find the identifier) and once for the identifier (to find the code that uses it). In projects that embed messages directly, I only have to grep once.
I’ve wondered about the possibility of having the best of both worlds: have the source code contain the full English message (and no other identifier), but use a tool that automatically assigns identifiers to source code locations that contain messages. Different locations would get different identifiers even if they use the same string. The tool would have to be history-aware, in order to keep the identifier the same if the code is moved or if the message is edited. It would have to use heuristics to differentiate between “message was edited” and “message was deleted, and an unrelated message was added in roughly the same place”. But in practice this would usually be easy to do.
…Or you could simplify things drastically by having the source code contain both the English message and a unique identifier (perhaps just a number rather than anything descriptive). That’s less fun though.
The latter is roughly what we do on the project I work on:
English string directly in the source, plus a human-readable identifier that helps translators understand the context of the string they’re translating. One script to extract all strings, and another to load all translations.
Message identifiers force you to confront the fact that two messages which sound the same in English might not sound the same in other languages.
I used an app that used Gettext for translation and had this problem. The string "banned" appeared in two places, once as a filter (to only show online / moderator / banned users) and once as a status indicator on the list of users (to indicate that a given user is banned). In English, the word "banned" is the same regardless of whether you're referring to a single user or a group of users, but that isn't true in other languages.
If you're developing with Gettext, you usually get this wrong and then translators have all sorts of issues. If you use something like Fluent, the natural thing to do is to give the first string an identifier like `user_list_filter_banned` and the second an identifier of `user_status_banned`.
As far as I understand, Fluent also lets you pass seemingly extraneous data to translation strings (like the user's gender), which, in some languages, might be necessary to translate something like "%s has just sent you a new message."
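As a rough illustration of why that extra variable matters, here is a plain-Python sketch (not the Fluent API). The `new_message` helper, the variant keys, and the Polish strings are assumptions made up for the example.

```python
# Hypothetical catalogs: each locale decides which variables it needs.
# English ignores gender; Polish picks a verb form based on it.
NEW_MESSAGE_EN = {
    "other": "{name} has just sent you a new message.",
}
NEW_MESSAGE_PL = {
    "male": "{name} właśnie wysłał Ci nową wiadomość.",
    "female": "{name} właśnie wysłała Ci nową wiadomość.",
    "other": "Nowa wiadomość od użytkownika {name}.",
}


def new_message(name: str, gender: str, variants: dict[str, str]) -> str:
    """Pick the variant for the gender, falling back to 'other'."""
    template = variants.get(gender, variants["other"])
    return template.format(name=name)


english = new_message("Anna", "female", NEW_MESSAGE_EN)
polish = new_message("Anna", "female", NEW_MESSAGE_PL)
```

The developer passes the same variables for every locale, and each translation uses only the ones it needs.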
Not just gender: it also supports plural rules for numbers, which is great even if you have just one language (for example "no user/one user/2 users"). No more hacky string manipulation in the programming language!
> In short, I much prefer having the original (English) in the source code, as it makes it easier to maintain and removes a level of indirection. Using message identifiers adds boilerplate.
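For anyone who hasn't seen plural selection before, a small sketch of the idea in plain Python, loosely modelled on how I understand selectors to work (an exact number wins over a plural category). The function names and variant tables are hypothetical. The Polish rule follows the CLDR cardinal rules for integers.

```python
def plural_category_en(n: int) -> str:
    """English cardinal plural categories for integers."""
    return "one" if n == 1 else "other"


def plural_category_pl(n: int) -> str:
    """Polish cardinal plural categories for integers (per CLDR)."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


def format_count(n: int, variants: dict[str, str], category) -> str:
    """Prefer an exact numeric variant, then the plural category."""
    template = variants.get(str(n)) or variants.get(category(n)) or variants["other"]
    return template.format(n=n)


# Hypothetical variant table for the example from the comment above.
USERS_EN = {"0": "no user", "one": "one user", "other": "{n} users"}

labels = [format_count(n, USERS_EN, plural_category_en) for n in (0, 1, 2)]
```

The calling code only passes the number; which variants exist is entirely up to each translation.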
As long as you’re fine with the original being Italian or whatever if you happen on a project which is chiefly aimed at Italians.
The string basically is the identifier, but easier to read, and less easy to mistake for something else. Many people will have a problem with Italian identifiers in code, but there obviously are Italian code bases. IMO, the identifier should be in the same language as the rest of the project.
It has its rough edges but overall it's pretty nice indeed. Nicer than using dumb property files in any case. The only rough edges are with the more advanced things you'd in any case not have access to otherwise.
We actually built a multiplatform Kotlin library that adapts the JVM implementation and the JS implementation of Project Fluent. The Java library we use is indeed a bit limited for some things. We've been working around some of those issues by doing our own variable processing.
Overall, it's been useful for us and we're using the same localization strings in our Spring server and Kotlin/JS web UI.
It's not very widely used and has a few rough edges. But it works for us and it should be fairly straightforward to adapt it for e.g. Android and iOS if you need that.
A neat trick these days is to translate Fluent localization files with ChatGPT and ask it to preserve the structure and identifiers. It actually works. We got it to translate hundreds of strings into a few languages. The translations were good quality and we found very few issues with them. It only took a few minutes. GPT-4 understands most major and minor languages in this world.
I kind of despise the pattern where, in order to support translations, you have to turn every string in your program into an abomination of bespoke markup, storing strings in variables, etc.
Use a delimiter that's already in your markup to identify strings, and then translate those. For example, look at `<p>` boundaries and other tags that wrap plain text, and attach an attribute like data-translation-id.
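A minimal sketch of that extraction step, using Python's standard `html.parser`. The class name is made up, and it only handles flat text directly inside the tagged element, with no nested tags or arguments.

```python
from html.parser import HTMLParser


class TranslatableText(HTMLParser):
    """Collect text of elements carrying a data-translation-id attribute."""

    def __init__(self) -> None:
        super().__init__()
        self._current_id = None
        self.messages: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        self._current_id = dict(attrs).get("data-translation-id")

    def handle_data(self, data):
        if self._current_id and data.strip():
            self.messages[self._current_id] = data.strip()

    def handle_endtag(self, tag):
        self._current_id = None


parser = TranslatableText()
parser.feed('<p data-translation-id="greeting">Hello there</p><p>not tagged</p>')
found = parser.messages
```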
That doesn't work. You need to know the gender and actual number `X` of `X skeletons` to be able to (correctly ;) translate `attacked` in many languages. To correctly translate `X skeletons`, you need to know `X` and whether they are the attacker or being attacked. Same for `Y dragons`. In the end, you do not gain much but have a _really_ complex template (for such a simple phrase).
Sure it does. You need a way to pass arguments for case, gender, etc. It's not impossible to make it work. I already implemented a prototype for it, but it requires changes to how it handles references.
Even in Microsoft products, the Swedish translation is sometimes so bad that I can't understand what they are saying. The trend is that translation quality is declining. Fluent seems like a great step forward, but the bigger issue is that corporations are not ready to put in the work needed.
I'm going to assume you are building mostly in English, or in languages that map simply to English (e.g. with the same number of plurals). If you have 3 or more plural cases, plus gender (including neutral), you already have a multitude of cases in an if/else.
I recommend taking it for a spin.