

CSS text-transform is language-dependant: the Dutch case explained - lillycat
http://firefoxnightly.tumblr.com/post/20267585898/css-text-transform-updated-for-the-dutch-language

======
dasil003
I might be biased because I recently spent a couple days wrestling with it,
but I think the Turkish case is more interesting. There are two vowels that
use the letter I forms:

İ/i and I/ı

So the lowercase and uppercase are essentially split into two letters and then
a new uppercase and lowercase form is created based on growing or shrinking
the originals. I'm sure it seemed quite the elegant solution in the 1920s when
they were working on the latinized Turkish alphabet.

The implication of this is that unlike the Dutch case, this affects all text-
transform actions (uppercase, lowercase, and capitalize).
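
To make the split concrete, here is a minimal Python sketch of Turkish-aware versions of the three transforms. The function names are made up for illustration, it is not the Stack Overflow method linked below or what any browser does, and it only handles the two I pairs before deferring to the default case mapping.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    # Handle the two Turkish-specific pairs first, then let the default
    # mapping take care of every other letter.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: İ -> i and I -> ı."""
    # Replacing before calling lower() avoids the default mapping,
    # which would turn I into i.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


def turkish_capitalize(text: str) -> str:
    """Uppercase only the first letter of each space-separated word."""
    return " ".join(turkish_upper(w[:1]) + w[1:] for w in text.split(" "))
```

With these, `turkish_upper("istanbul")` gives `İSTANBUL`, whereas the plain `"istanbul".upper()` gives `ISTANBUL` with a dotless capital.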

When we were doing our Turkish localized site I started digging into this, and
I was horrified that no browser actually supports proper Turkish
capitalization rules. Firefox in particular had a bug open since 2004 (now
finally fixed as well). Asking around with Turkish web developers I heard of
some crazy hacks (custom fonts!), but I got the feeling that Turkish web
designers just avoid text-transform. This was not an option for us as we rely
heavily on text-transform in our design (<http://tr.mubi.com>).

In the end I was able to piece together a surprisingly robust JavaScript
replacement method with some help from Stack Overflow:

<http://stackoverflow.com/a/8743095/8376>

~~~
lillycat
The Turkic bug was fixed a week ago and the fix will also be in Firefox 14. As
far as I know WebKit still has this bug. I don't know about IE or Opera, if
somebody knows...

------
pennig
The code path for that case must be delightful.

~~~
robin_reala
Here's the diff:

<https://hg.mozilla.org/mozilla-central/rev/bb53aec4a302>

Doesn't look _too_ bad

~~~
masklinn
It's really weird that they just add special cases like that. Though I expect
it's just because they don't have enough special cases yet (went from one —
for the Turkish I — to two).

I'd have expected something like a generic Unicode-aware/y text management
layer, and CSS text transforms would just go through that layer.

~~~
mcpherrinm
The problem is that Unicode doesn't know about language. Unicode is just
characters.

Language-aware bits are more gross, but then language often is. It's not
nicely structured like most of the other things we encounter when transforming
data.

~~~
masklinn
> The problem is that Unicode doesn't know about language. Unicode is just
> characters.

I won't blame you for this, it is a common mistake, but Unicode goes far
beyond merely mapping characters to integers. The Standard Annexes, Technical
Reports and Technical Specifications cover pretty much all things localization
from line breaking [UAX14] to regular expressions [UTS18] through date and
time formatting [UTS35] or sorting [UTS10].

And as it turns out, both uppercasing and titlecasing are covered by [UAX44]
as part of the SpecialCasing.txt file, which provides lower, upper and
titlecasing (along with optional conditions) for characters with non-trivial
mappings (trivial 1:1 mappings are covered in the base UnicodeData.txt file).

[UAX14] <http://www.unicode.org/reports/tr14/>

[UTS18] <http://www.unicode.org/reports/tr18/>

[UTS35] <http://www.unicode.org/reports/tr35/>

[UTS10] <http://www.unicode.org/reports/tr10/>

[UAX44] <http://www.unicode.org/reports/tr44/>
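
A quick way to see some of those non-trivial mappings is through Python's built-in string methods, which, as far as I know, apply the unconditional full case mappings but take no language argument, so the language-conditional entries (such as the Turkish ones) never kick in. A small sketch; `case_forms` is a hypothetical helper, not part of any library:

```python
from typing import Tuple


def case_forms(ch: str) -> Tuple[str, str, str]:
    """Return the (lower, title, upper) forms Python produces for ch."""
    return ch.lower(), ch.title(), ch.upper()


# One-to-many mapping: sharp s (U+00DF) has no single-character
# uppercase form in the default mapping, so upper() returns two letters.
sharp_s = case_forms("\u00df")

# Distinct titlecase form: the dz-with-caron letter (U+01C6) has
# separate lowercase, titlecase and uppercase code points.
dz = case_forms("\u01c6")

# Lowercasing capital dotted I (U+0130) without a Turkish context yields
# i followed by COMBINING DOT ABOVE (U+0307), two code points in total.
dotted_i_lower = "\u0130".lower()

# With no language information, i always uppercases to plain I.
plain_upper = "i".upper()
```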

------
Michiel
Of course, editors should be using the ligature (Unicode character LATIN SMALL
LIGATURE IJ). But having said that: I'm Dutch and a) I have never used that
character and b) I have no idea how to write it on a keyboard.
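
For anyone curious what the difference looks like in code, here is a rough Python sketch. It assumes the Dutch rule that a word-initial "ij" is capitalized as "IJ", and the helper names are invented for the example; it is not the Firefox implementation.

```python
IJ_LOWER = "\u0133"  # LATIN SMALL LIGATURE IJ
IJ_UPPER = "\u0132"  # LATIN CAPITAL LIGATURE IJ


def expand_ij_ligature(text: str) -> str:
    """Replace the single-character ligatures with the two-letter digraph."""
    return text.replace(IJ_LOWER, "ij").replace(IJ_UPPER, "IJ")


def dutch_capitalize_word(word: str) -> str:
    """Capitalize one word, treating a leading ij digraph as one letter."""
    word = expand_ij_ligature(word)
    if word[:2].lower() == "ij":
        # Both letters of the digraph are uppercased together.
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]
```

The ligature itself needs no special rule, since `IJ_LOWER.upper()` already maps to the capital ligature. It is the two-letter spelling that a language-unaware transform gets wrong: `"ijsselmeer".title()` produces `Ijsselmeer`, while `dutch_capitalize_word("ijsselmeer")` produces `IJsselmeer`.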

~~~
Nvn
That's actually discussed in the bug report[1] and apparently its use is
discouraged by Unicode.

[1] <https://bugzilla.mozilla.org/show_bug.cgi?id=740477#c2>

~~~
nemoniac
Unicode has a gazillion code points and it's discouraging us from writing our
own language? Really?

~~~
Someone
<http://unicode.org/faq/ligature_digraph.html>:

 _A: The existing ligatures exist basically for compatibility and round-
tripping with non-Unicode character sets. Their use is discouraged. No more
will be encoded in any circumstances._

------
Navarr
Now we only need "text-transform: katakana" and "text-transform: hiragana" for
emphasizing Japanese text.

Of course, this can't possibly work with Kanji without some special
workaround.
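
A sketch of what such a transform could do, in Python. It relies on the hiragana and katakana blocks being laid out in parallel in Unicode, 0x60 code points apart; the function names are hypothetical and nothing like this exists in CSS. Kanji simply pass through unchanged, which is exactly the limitation mentioned above, since converting them would need a reading dictionary.

```python
KANA_OFFSET = 0x60  # distance between the hiragana and katakana blocks


def to_katakana(text: str) -> str:
    """Shift hiragana U+3041..U+3096 to the matching katakana letters."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in text
    )


def to_hiragana(text: str) -> str:
    """Shift katakana U+30A1..U+30F6 to the matching hiragana letters."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )
```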

------
iamgilesbowkett
there is no such word as "dependant." it's "dependent." really sorry to be
that guy but it would hugely brighten my day if you could fix the spelling in
the title.

~~~
dbuxton
Well, there is such a word, certainly in the UK.

A more constructive comment might be, "the spelling variant you have used is
normally a noun meaning 'a person who depends on another for their upkeep or
care', where here you want the more normal adjectival spelling 'dependent'."

Although fwiw it seems that even as an adjective "dependant" might fly:
<http://www.wordnik.com/words/dependant>

