
Localizing “Papers, Please” (2014) - wglass
http://dukope.tumblr.com/post/83177288060/localizing-papers-please-papers-please-was
======
teej
For those who aren't aware: Papers, Please is a video game about paperwork.
You play a border agent in a fictional eastern bloc country, checking
passports, visas, and work permits. It's surprising and tense and incredibly
good. It's currently on sale for $3.99 on Steam; I highly recommend it.

~~~
soup10
it's a bore, but maybe politically interesting for some

~~~
teej
I would call the gameplay mundane, but that's the point. Completing a tedious
task with high accuracy and throughput while the complexity and stakes ramp
up. I found it to be a very entertaining challenge.

------
cbanek
One other interesting problem with localization involves the use of printf.
Even if you're looking up strings based on IDs in another file (which is a
good pattern), sometimes you'll need to move things around based on language.
For example, if you're doing right-to-left languages, you might put the number
on one side of the string, and on the other side for left-to-right languages
(so "%d %s" vs "%s %d").

The way that we got around this was adding another level of indirection, and
putting printf format strings also as localized data.
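A minimal C sketch of that extra level of indirection, assuming a hypothetical
two-locale table (the IDs `MSG_DAY_SUMMARY`, `LOC_EN`, `LOC_DE` and the strings
are made up for illustration). Both format strings consume their arguments in
the same order, so one call site serves every locale; if a locale needs the
arguments in a different order, plain `%d`/`%s` is not enough on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message IDs and locales, for illustration only. */
enum msg_id { MSG_DAY_SUMMARY, MSG_COUNT };
enum locale { LOC_EN, LOC_DE, LOC_COUNT };

/* Format strings live in a per-locale table instead of at the call site.
   Both entries consume (int, string) in the same order, so a single call
   works for either locale. */
static const char *const formats[MSG_COUNT][LOC_COUNT] = {
    [MSG_DAY_SUMMARY] = {
        [LOC_EN] = "Day %d: %s",
        [LOC_DE] = "Tag %d: %s",
    },
};

/* Looks up the localized format string for a message ID. */
static const char *localized_format(enum msg_id id, enum locale loc) {
    assert(id < MSG_COUNT && loc < LOC_COUNT);
    return formats[id][loc];
}

/* Formats the day summary for `loc` into buf; returns snprintf's result. */
static int format_day_summary(char *buf, size_t size, enum locale loc,
                              int day, const char *note) {
    return snprintf(buf, size, localized_format(MSG_DAY_SUMMARY, loc),
                    day, note);
}
```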

~~~
int_19h
The format strings have to be localized data in any case, because they usually
contain literal text, not just placeholders. The real problem here is that you
need to change the order of arguments in a printf call - if the string changes
from %s%d to %d%s, the order of arguments in the call must change, as well.

If you're on POSIX, you can use positional arguments for that:


    printf("%1$d %2$s", d, s);
    printf("%2$s %1$d", d, s);

Because it's not standard C, VC++ does not support it directly in printf, but
it offers _printf_p with such support, and you can always #define printf
_printf_p.
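A small sketch of the positional-argument approach, assuming a POSIX libc such
as glibc (on VC++ you would need the `_p` family such as `_printf_p` instead).
The two format strings are hypothetical localized variants; the call passes
`d` and `s` in one fixed order and only the format decides which comes first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two hypothetical localized formats for the same message: number first,
   or name first. POSIX expects that once a format uses %n$, every
   conversion in it does, and that no argument index is skipped. */
static const char *const NUMBER_FIRST = "%1$d %2$s";
static const char *const NAME_FIRST   = "%2$s %1$d";

/* The call site always passes (d, s) in this order; the positional
   conversions pick arguments by index, so the format alone controls
   the output order. */
static int format_pair(char *buf, size_t size, const char *fmt,
                       int d, const char *s) {
    assert(fmt != NULL);
    return snprintf(buf, size, fmt, d, s);
}
```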

~~~
jwilk
Or you can use GNU Gettext, which provides featureful replacements for
printf() functions.

~~~
dotancohen
I came to say this. Gettext _is_ the right answer; this is a solved problem.

------
unsigner
Don't ever use the original string as key in the localization table. That will
force you to translate "high" difficulty the same as "high" resolution, for
example.
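To illustrate, a C sketch of a string table keyed by stable IDs rather than
English text. The ID scheme (`settings.difficulty.high`) and the German strings
are assumptions for illustration; the point is that the two English "High"
labels end up as separate rows that can be translated independently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row: a stable ID (not the English text) plus its translation. */
struct entry {
    const char *id;
    const char *text;
};

/* Hypothetical German table: English says "High" for both settings, but
   keying by ID lets each get its own word. */
static const struct entry de_table[] = {
    { "settings.difficulty.high", "Schwer" },
    { "settings.resolution.high", "Hoch" },
};

/* Linear lookup by ID; returns NULL when the ID has no translation. */
static const char *lookup(const struct entry *table, size_t n,
                          const char *id) {
    assert(table != NULL && id != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].id, id) == 0) {
            return table[i].text;
        }
    }
    return NULL;
}
```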

~~~
hueving
When translating a fixed set of messages, you translate the entire phrase, not
individual words. So the keys would be "high difficulty" and "high
resolution".

~~~
ptaipale
And even the same phrase may get a different translation at a different place,
depending on context.

~~~
hueving
Good point. My only experience with translation is with logging messages in
systems where everything is contained within a single statement.

------
tschwimmer
Awesome article. I'm always impressed by the lengths people will go to for
their passion. Lucas talks about ultimately having to hand-draw Cyrillic
versions of _each_ of the game's ten fonts. Very cool!

~~~
Markoff
How do you know it's passion when he only added more language options after
his PAID game became successful? I think there is a different word for that in
English.

~~~
raverbashing
When he keeps making games even though a regular job would pay him more.

~~~
Markoff
It's called extra income when you do it for money.

~~~
raverbashing
Translators are not free, and neither is personal time.

------
Animats
If you haven't seen the trailer, it's worth watching.[1]

Glory to Arstotzka!

[1]
[https://www.youtube.com/watch?v=_QP5X6fcukM](https://www.youtube.com/watch?v=_QP5X6fcukM)

------
mproud
This should be amended as (2014).

------
revetkn
Localizing well has a lot of complexity - gender, cardinal, ordinal, etc.
rules, and then how to combine them with locale-specific special cases (e.g.
in Spanish, a 15-year-old birthday girl is a quinceañera).
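As a small taste of the ordinal rules, here is a C sketch of English ordinal
suffixes using the CLDR English ordinal categories. Other languages have
entirely different categories and rules, which is what a CLDR-backed library
has to cover; the function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Suffix for an English ordinal, following the CLDR English ordinal
   categories: "one" -> st, "two" -> nd, "few" -> rd, "other" -> th.
   Non-negative integers only in this sketch. */
static const char *english_ordinal_suffix(long n) {
    assert(n >= 0);
    long mod10 = n % 10, mod100 = n % 100;
    if (mod10 == 1 && mod100 != 11) return "st";   /* 1st, 21st, 101st */
    if (mod10 == 2 && mod100 != 12) return "nd";   /* 2nd, 22nd, 102nd */
    if (mod10 == 3 && mod100 != 13) return "rd";   /* 3rd, 23rd */
    return "th";                                   /* 4th, 11th-13th, ... */
}

/* Writes e.g. "21st" into buf; returns snprintf's result. */
static int format_ordinal(char *buf, size_t size, long n) {
    return snprintf(buf, size, "%ld%s", n, english_ordinal_suffix(n));
}
```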

I am attempting to solve this with a small library that offers full CLDR
coverage and a special expression language.

See [https://www.lokalized.com](https://www.lokalized.com)

It's currently for Java 8, but I'm porting it to JS and Python (probably Swift
after those).

------
jdonaldson
Haxe really shines at converting compile-time assets into static types. The
other related trick is to use json as a config object, and access the fully
typed equivalent as a static instance within your code. It's also possible to
do this with database queries.
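Haxe macros don't fit in a few lines, but a rough C analog of "compile-time
assets as static types" is the X-macro pattern: one list generates both an
enum of string IDs and the table behind it, so a mistyped ID is a compile
error rather than a runtime miss. This is a swapped-in C technique, not the
Haxe approach, and the IDs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Single source of truth: each X(ID, English text) line defines one string.
   In a real build this list could be generated from the translation file. */
#define STRING_LIST(X)                      \
    X(STR_GLORY, "Glory to Arstotzka")      \
    X(STR_NEXT, "Next!")                    \
    X(STR_DENIED, "Entry denied")

/* Expand the list once into an enum of IDs... */
#define AS_ENUM(id, text) id,
enum string_id { STRING_LIST(AS_ENUM) STR_COUNT };
#undef AS_ENUM

/* ...and once into a table indexed by that enum. */
#define AS_TEXT(id, text) [id] = text,
static const char *const english[STR_COUNT] = { STRING_LIST(AS_TEXT) };
#undef AS_TEXT

/* Using an ID that isn't in STRING_LIST fails to compile, because the
   enumerator simply doesn't exist. */
static const char *str(enum string_id id) {
    assert(id < STR_COUNT);
    return english[id];
}
```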

I realize other languages provide support for this, but in my experience with
Haxe it's way easier to implement something custom. The macro translation
layer for manipulating the AST is flexible and speedy, and the compiler is
wired directly into autocompletion requests. There's very little impedance
between my fingertips and the desired outcome.

------
surgi
Loosely related to the title: Why not create a complete modular version not
only localised, but also tied to individual countries' flows and processes? So
it could serve as an education material. (mind:blown)

~~~
breakingcups
The game is about a fictional country though. Why should the localized
versions be about a real country?

------
mattmanser
Having just done some l10n for a client, the thing that annoys me is how even
the most powerful editors, such as VS, have such awful tools for l10n. .Net's
actual i18n support is pretty good overall, but the editor support is bad.

I literally had to build my own. At 2,500 different strings totalling 10,000
words, I wouldn't even consider our application that big; it must be a
nightmare in bigger projects. We haven't even done the sales site yet because
the product's being upsold through a partner.

We came up with our own ID naming system, then created an xlsx/resx
importer/exporter that uploaded to GSheets so we could share files with
translators. The ID and comment fields let us add extra metadata, split the
strings into logical sections and sheets, and order them properly. We could
also add links to the page each section of translations appears on, so the
translator could see the context. This additionally let us highlight whether a
translator had missed any lines when we re-imported a sheet, and let
translators add their own questions/comments, etc. Also, as we were using
sendwithus, we used the importer/exporter to import pot files from them to
keep everything in one place.
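The "did the translator miss any lines?" check on re-import can be sketched in
C as a diff between the base table and a translated table. The table layout
and function names here are hypothetical; the idea is just that an ID missing
from the translation, or present with empty text, gets flagged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a string table: a stable ID plus its text in one language. */
struct entry {
    const char *id;
    const char *text;
};

/* 1 if `id` appears in `table` with non-empty text, else 0. */
static int has_translation(const struct entry *table, size_t n,
                           const char *id) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].id, id) == 0)
            return table[i].text != NULL && table[i].text[0] != '\0';
    return 0;
}

/* Compares a translated table against the base (source-language) table.
   Writes up to max_out missing IDs into `out` and returns the total number
   of base IDs that are absent or empty in `translated`. */
static size_t find_missing(const struct entry *base, size_t base_n,
                           const struct entry *translated, size_t tr_n,
                           const char **out, size_t max_out) {
    assert(base != NULL && (out != NULL || max_out == 0));
    size_t missing = 0;
    for (size_t i = 0; i < base_n; i++) {
        if (!has_translation(translated, tr_n, base[i].id)) {
            if (missing < max_out) out[missing] = base[i].id;
            missing++;
        }
    }
    return missing;
}
```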

Then to support those tools, I created a tool to search for phrases used
before, find out the ordering from the meta data, quickly copy ids of strings
we want to re-use, and see missing spreadsheet tabs.

Programmatically, we had to add support for automatically translating enums
into strings (think project status for example), add l10n to our audit logs so
customers could see their audits in the correct language and we'd see them in
English, modify how .Net did l10n of dates because their built-in one is
really odd with en-GB, which is where we are based (shortdate is Jan 01 2025 in
en-US but inexplicably 01 January 2025 in en-GB and all sorts of other
oddities).
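One way to do the enum-to-string translation, sketched in C with a
hypothetical project-status enum and illustrative French strings: the enum
value stays the stored fact and the display text is looked up per locale at
render time, which is also what lets the same audit log entry be shown in
different languages to different viewers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical project-status enum and locales, for illustration only. */
enum project_status { STATUS_DRAFT, STATUS_ACTIVE, STATUS_CLOSED, STATUS_COUNT };
enum locale { LOC_EN_GB, LOC_FR, LOC_COUNT };

/* One row per enum value, one column per locale. The display text is kept
   out of the enum itself; these translations are illustrative. */
static const char *const status_names[STATUS_COUNT][LOC_COUNT] = {
    [STATUS_DRAFT]  = { [LOC_EN_GB] = "Draft",  [LOC_FR] = "Brouillon" },
    [STATUS_ACTIVE] = { [LOC_EN_GB] = "Active", [LOC_FR] = "Actif" },
    [STATUS_CLOSED] = { [LOC_EN_GB] = "Closed", [LOC_FR] = "Fermé" },
};

/* Renders a stored status value in the viewer's locale. */
static const char *status_name(enum project_status s, enum locale loc) {
    assert(s < STATUS_COUNT && loc < LOC_COUNT);
    return status_names[s][loc];
}
```

An audit entry would store `STATUS_CLOSED`, not "Closed", and each reader
would render it with their own locale.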

Then we used a modified version of pseudoizer (thanks John Robbins + Scott
Hanselman![1]) to allow us to easily see untranslated strings while we went
through the whole site without having a finished translation (we used ja-JP
instead of Polish to really see the differences in date strings, currency,
etc.). We ended up modifying it because it goes a bit mental with adding !!!!
for things like tabs.
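For readers who haven't seen pseudo-localization, here is a minimal C sketch of
the general idea; it is not the pseudoizer tool's algorithm, and the vowel
mapping and padding ratio are assumptions. Text is wrapped in brackets (so
truncation is visible), vowels get accents (so hard-coded strings stand out)
and a few `!` characters pad the length, while whitespace-only strings such as
a lone tab are left alone to avoid the `!!!!` problem described above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Accented UTF-8 stand-ins for ASCII vowels (an illustrative mapping). */
static const char *accented(char c) {
    switch (c) {
    case 'a': return "á"; case 'e': return "é"; case 'i': return "í";
    case 'o': return "ó"; case 'u': return "ú";
    case 'A': return "Á"; case 'E': return "É"; case 'I': return "Í";
    case 'O': return "Ó"; case 'U': return "Ú";
    default:  return NULL;
    }
}

/* 1 if s contains only whitespace (tabs, spaces, newlines). */
static int all_space(const char *s) {
    for (; *s; s++)
        if (!isspace((unsigned char)*s)) return 0;
    return 1;
}

/* Appends s to dst (capacity size, current length *used); -1 if no room. */
static int append(char *dst, size_t size, size_t *used, const char *s) {
    size_t n = strlen(s);
    if (*used + n + 1 > size) return -1;
    memcpy(dst + *used, s, n + 1);
    *used += n;
    return 0;
}

/* Pseudo-localizes src into dst as "[" + accented text + padding + "]".
   Returns 0 on success, -1 if dst is too small. */
static int pseudoize(const char *src, char *dst, size_t size) {
    assert(src != NULL && dst != NULL && size > 0);
    size_t len = strlen(src), used = 0;
    dst[0] = '\0';
    if (all_space(src))                       /* tabs etc. pass through */
        return append(dst, size, &used, src);
    if (append(dst, size, &used, "[") != 0) return -1;
    for (size_t i = 0; i < len; i++) {
        const char *rep = accented(src[i]);
        char one[2] = { src[i], '\0' };
        if (append(dst, size, &used, rep ? rep : one) != 0) return -1;
    }
    for (size_t i = 0; i < (len + 2) / 3; i++) /* about a third extra */
        if (append(dst, size, &used, "!") != 0) return -1;
    return append(dst, size, &used, "]");
}
```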

Probably spent a week on those tools, but boy was it worth it.

I've not tried IntelliJ's l10n support (maybe it's better), but VS's is very
lacklustre.

[1] [https://www.hanselman.com/blog/GlobalizationInternationaliza...](https://www.hanselman.com/blog/GlobalizationInternationalizationAndLocalizationInASPNETMVC3JavaScriptAndJQueryPart1.aspx)

------
eropple
I would recommend against one's own XML format and _doubly_ against CSV/some
homegrown delimited format. Instead, consider something like Excel 2003 XML
(one of the easier ones), OpenDocument (also pretty easy in many languages),
or Office OpenXML (easy in .NET, a bit harder elsewhere) to store your
translation data.

Potfiles are another option, but the tooling is pretty clunky and, in games in
particular, people don't seem particularly attuned to their use. And they're
not great for editing, though they might be for storage--when dealing with
tabular stuff, it just makes a lot of sense to use tools that present a
tabular interface. It makes life a lot easier.

~~~
microcolonel
If it's literally a table of strings, why on earth would anyone use ODF/OOXML?
CSV is perfectly fine for editing in any functioning spreadsheet software,
works reasonably well in version control (especially since a given commit
won't touch multiple columns). In his case, he's using the XML format which
Haxe will parse into compile-time-checked references right in his source code;
sounds like a great reason to use this standardized (but Haxe-specific) XML
format.
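If you do keep the table as CSV, the quoting rules (RFC 4180 style) are the
part that usually bites, since translations contain commas and quotes. A
minimal C sketch of splitting one record, assuming single-line records without
a trailing newline; `csv_split` is a hypothetical helper, not a library
function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits one CSV record in place, RFC 4180 style: a field may be wrapped in
   double quotes, and inside a quoted field "" stands for one literal quote.
   Stores up to max_fields pointers into `line` and returns the field count,
   or -1 on malformed quoting or too many fields. */
static int csv_split(char *line, char **fields, int max_fields) {
    assert(line != NULL && fields != NULL && max_fields > 0);
    int count = 0;
    char *r = line;                        /* read cursor */
    for (;;) {
        char *w = r;                       /* write cursor for this field */
        if (count == max_fields) return -1;
        fields[count++] = w;
        if (*r == '"') {                   /* quoted field */
            r++;
            for (;;) {
                if (*r == '\0') return -1; /* unterminated quote */
                if (*r == '"') {
                    if (r[1] == '"') {     /* escaped quote */
                        *w++ = '"';
                        r += 2;
                        continue;
                    }
                    r++;                   /* closing quote */
                    break;
                }
                *w++ = *r++;
            }
            if (*r != ',' && *r != '\0') return -1; /* junk after quote */
        } else {                           /* unquoted: copy to next comma */
            while (*r != ',' && *r != '\0') *w++ = *r++;
        }
        char sep = *r;                     /* save before terminating */
        *w = '\0';
        if (sep == '\0') return count;
        r++;                               /* skip the comma */
    }
}
```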

~~~
et1337
CSV has no formatting. Who wants to resize columns and set up text wrapping
every time they open the file? Could save it as .xlsx and then export to CSV,
but that's another step and it's not hard to parse simple spreadsheets in XML.
Worth it in my book, because it enables fan translators to contribute easily
since everyone has Excel.

~~~
douche
Don't use Excel to edit CSVs. It makes a hash of things at best, and frigs
things up at worst.

~~~
eropple
It's not about you, though. Or me. It's about what the people working with you
are most comfortable with. I can go blurf out JSON or XML or potfiles by hand,
whatever. But normal people are going to go "er, no?" and even if you win that
argument you've lost social capital on something that didn't really matter.

------
haikuginger
This article makes me unreasonably glad to be working in a framework (Django)
with good i18n tooling and few special needs re: textual images.

~~~
raverbashing
Django solves the pluralizing issues (with some limitations) but it won't
(can't) solve gender issues in translation.

See
[https://docs.djangoproject.com/en/dev/ref/templates/builtins...](https://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#pluralize)

It doesn't help when the plural is 1/2/many or something different (example:
Arabic/Icelandic, etc.) [http://docs.translatehouse.org/projects/localization-guide/e...](http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html)
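To make the "not just one vs many" point concrete, a C sketch of plural-form
selection for Arabic, which has six forms. The logic mirrors the Plural-Forms
expression commonly used in Arabic gettext catalogs; the function name is made
up for this example.

```c
#include <assert.h>

/* gettext plural index for Arabic, mirroring
     nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 :
                         n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);
   Indices 0..5 line up with the CLDR categories
   zero, one, two, few, many, other. */
static int arabic_plural_index(unsigned long n) {
    if (n == 0) return 0;
    if (n == 1) return 1;
    if (n == 2) return 2;
    if (n % 100 >= 3 && n % 100 <= 10) return 3;
    if (n % 100 >= 11) return 4;
    return 5;                 /* e.g. 100, 101, 102 */
}
```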

~~~
haikuginger
Django provides ungettext for the pluralizing problem, and pgettext, if
implemented conscientiously, for the gender problem.
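Under the hood, GNU gettext stores a message with context under the key
`msgctxt` + `\004` + `msgid`, which is how one English string can carry
several translations. A hedged C sketch of a pgettext-style lookup for the
gender case, using a hypothetical in-memory table (real code would use
gettext's `pgettext` macro from `gettext.h`, or Django's `pgettext`); the
German salutations are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A translated entry keyed like gettext's context-qualified messages:
   context, then the byte 0x04, then the msgid. */
struct entry {
    const char *key;
    const char *text;
};

/* Builds "ctxt\004msgid" into buf; returns 0, or -1 if it doesn't fit. */
static int context_key(char *buf, size_t size, const char *ctxt,
                       const char *msgid) {
    assert(buf != NULL && ctxt != NULL && msgid != NULL);
    int n = snprintf(buf, size, "%s\004%s", ctxt, msgid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* pgettext-style lookup over an in-memory table. Like gettext, it falls
   back to the untranslated msgid when nothing matches. */
static const char *pgettext_like(const struct entry *table, size_t n,
                                 const char *ctxt, const char *msgid) {
    char key[256];
    if (context_key(key, sizeof key, ctxt, msgid) != 0) return msgid;
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].key, key) == 0) return table[i].text;
    return msgid;
}
```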

------
paines
Is the Steam version localized and available in German?

~~~
masklinn
The second part ("less technical stuff") specifically notes that they used
Steam's private branches to beta-test the localisation and that the supported
languages are Italian, Japanese, Spanish, French, _German_, Russian and
Brazilian Portuguese.

So yes and yes.

~~~
paines
Thank you. I didn't see that there was a 2nd part!

------
rasmafazi
Sometimes you just have to bite the bullet. For interesting subjects, which
always have global reach, the virtual conversations are conducted in English.
There is also a place for the vernacular (it is part of people's cultural
identity), but not in a formal knowledge setting. English is a bit like Latin
used to be: the language of knowledge, technology, and business. If the
subject has global reach, you will miss out on the interesting bits of
knowledge simply because you are trying to do it in the vernacular. Doing
anything in the vernacular will just lock you into a small and uninteresting
national silo. Nothing of any interest is national. But yes, I use the
vernacular. I also speak it with my kids, but I don't read it (unless it is
poetry or literature), and I don't use it in software or in business.

~~~
microcolonel
Sometimes I get i18n fatigue too. I think the world would be a better place if
everyone's languages fit in ASCII.

That said, the cat's kinda out of the bag. UTF-8 is at least well-done, and
the algorithms are widely available. I study Japanese and have started
studying Russian and Chinese; I think maybe the best way to convince people to
learn English is to walk the walk. Who knows, maybe everything will go very
wrong again before we get a chance to standardize.

I'm also working on an engineered language with a test suite/corpus maintained
alongside the language. Maybe in the ashes of the old new world there'll be
room for something like this.

~~~
Symbiote
English doesn't even fit in ASCII.

To write it properly, we need left- and right-facing single and double quotes,
diaereses and accents for words like naïve, façade and café, en- and em-dashes
and the ellipsis.

Longer documents will require symbols like † and ‡, bullets and §. The
currency symbols £, €, ¢ and ₹ are used by countries where English is an
official language.
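The byte/character distinction this implies is easy to demonstrate: in UTF-8,
"naïve" is five code points but six bytes. A minimal C sketch that counts code
points by skipping continuation bytes; it assumes valid UTF-8, and code points
still aren't the same as user-perceived characters once combining marks are
involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts Unicode code points in a valid UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). For non-ASCII text the
   result is smaller than strlen(), which counts bytes. */
static size_t utf8_codepoints(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80) count++;
    return count;
}
```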

~~~
microcolonel
I can't even use symbols like that anyhow (I deal in USD, CAD, and NTD). I end
up using ISO 4217 codes everywhere. You missed ¥ as well, for which I would
use JPY or CNY.

