> Everywhere[1] but in Windows UTF-8 is being used. Further popular counterexamp...

new_realist · on Aug 21, 2019

All decisions made 20 years ago. The future is UTF-8.

wolfgke · on Aug 21, 2019

> All decisions made 20 years ago. The future is UTF-8.

As you can see in the examples, they are all (except perhaps the Joliet file system) still very actively used. I see no reason that they will disappear soon.

Also, the decision to use UTF-8 under GNU/Linux was likewise made about 20 years ago.

naikrovek · on Aug 21, 2019

Honestly, the future is probably UTF-32. Just make all code points 32 bits and use the 11 bits that Unicode says it will never use for flags, so things like formatting can be encoded per character.
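A minimal sketch of that scheme, assuming a hypothetical layout: the code point in the low 21 bits (U+10FFFF needs exactly 21) and flags in the upper 11 bits. The flag names `BOLD` and `ITALIC` are invented for illustration. Conforming UTF-32 only permits values up to 0x10FFFF, so any word with a flag set would no longer be valid UTF-32; this is really a custom 32-bit encoding.

```python
# Hypothetical "UTF-32 plus flags" layout: low 21 bits = code point, high 11 bits = flags.
CODE_POINT_BITS = (0x10FFFF).bit_length()   # 21
CODE_POINT_MASK = (1 << CODE_POINT_BITS) - 1
FLAG_BITS = 32 - CODE_POINT_BITS            # 11 spare bits per 32-bit unit

# Invented flag values for illustration only; Unicode defines nothing like this.
BOLD = 1 << 0
ITALIC = 1 << 1

def pack(ch, flags=0):
    """Pack one code point and its flags into a single 32-bit integer."""
    if not 0 <= flags < (1 << FLAG_BITS):
        raise ValueError("flags must fit in 11 bits")
    return (flags << CODE_POINT_BITS) | ord(ch)

def unpack(word):
    """Split a packed 32-bit integer back into (character, flags)."""
    return chr(word & CODE_POINT_MASK), word >> CODE_POINT_BITS

word = pack("A", BOLD | ITALIC)
print(hex(word))      # 0x600041
print(unpack(word))   # ('A', 3)
```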

targonca · on Aug 22, 2019

That's... a very bad idea.

First of all, graphemes are inherently not one-to-one with code points, e.g. Á can be written as A + U+0301 (combining acute accent). There's simply no Unicode encoding that will let you safely index into an array of user-perceived characters without paying attention to the meaning of the underlying code points. (And no, using NFC won't solve this either, because there are combinations for which there's no composed equivalent.)
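A small standard-library illustration of the point: the decomposed Á (A followed by U+0301 COMBINING ACUTE ACCENT) is two code points, and therefore two UTF-32 units, for one visible character. NFC happens to compose it into U+00C1, but a flag emoji built from two regional indicator symbols has no precomposed form, so it stays two code points after NFC. Python's standard library has no grapheme-cluster segmentation, so counting what a user sees needs more than any encoding choice.

```python
import unicodedata

def describe(s):
    """Return (code points, UTF-32 code units, code points after NFC)."""
    # "utf-32-le" emits exactly 4 bytes per code point, with no BOM.
    utf32_units = len(s.encode("utf-32-le")) // 4
    return len(s), utf32_units, len(unicodedata.normalize("NFC", s))

decomposed_a_acute = "A\u0301"      # A + COMBINING ACUTE ACCENT, renders as Á
flag_de = "\U0001F1E9\U0001F1EA"    # REGIONAL INDICATOR D + E, renders as one flag

print(describe(decomposed_a_acute))  # (2, 2, 1)
print(describe(flag_de))             # (2, 2, 2)
```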

Secondly, general formatting info won't fit into 11 bits (italic, bold, underline, strikethrough - that's already 4 bits, and we haven't talked about color, font weights other than bold, etc.), so why bake a limited, intentionally gimped version of it into your character encoding?
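To make the bit budget concrete, a quick calculation assuming one bit per boolean style and a 24-bit RGB color (the color width is an assumption; the comment above only says "color"). Color alone already exceeds the 11 spare bits.

```python
# Bits left over in a 32-bit unit once the largest code point (U+10FFFF) is stored.
SPARE_BITS = 32 - (0x10FFFF).bit_length()   # 32 - 21 = 11

# One bit each for the boolean styles listed in the comment.
styles = ["italic", "bold", "underline", "strikethrough"]
style_bits = len(styles)                     # 4

# Assumption: a 24-bit RGB foreground color.
color_bits = 24

needed = style_bits + color_bits
print(f"spare={SPARE_BITS}, needed={needed}, fits={needed <= SPARE_BITS}")
# spare=11, needed=28, fits=False
```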

naikrovek · on Aug 23, 2019

It doesn't have to be formatting...

It is not a "very bad idea" it is "an idea you do not like." Those are different things.

The way you're describing UTF-32, it sounds like it can't work at all, and it definitely does work.

Trying to save space by using UTF-8 over UTF-32 seems like a very small gain to me, is all. UTF-32 is simpler, for text created in that encoding.
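To put rough numbers on the trade-off, a Python sketch with arbitrarily chosen sample strings comparing encoded sizes, plus what fixed width does buy: finding the i-th code point in a UTF-32 buffer is a single slice. `sizes` and `code_point_at` are hypothetical helpers for illustration, and note that the slice yields a code point, not a user-perceived character.

```python
def sizes(text):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for a string."""
    # "utf-32-le" avoids the 4-byte BOM that Python's plain "utf-32" codec prepends.
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-32-le"))

def code_point_at(utf32_le, i):
    """Fixed width makes the i-th code point a constant-time slice."""
    return chr(int.from_bytes(utf32_le[4 * i : 4 * i + 4], "little"))

for label, text in [("ASCII", "hello"), ("CJK", "日本語"), ("emoji", "😀")]:
    print(label, sizes(text))
# ASCII (5, 5, 20)
# CJK (3, 9, 12)
# emoji (1, 4, 4)

print(code_point_at("日本語".encode("utf-32-le"), 2))  # 語
```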

targonca · on Aug 23, 2019

There are tons of resources online about why UTF-32 doesn't make sense. I'm not gonna repeat them. Do your own research.

https://news.ycombinator.com/item?id=8195827

https://softwareengineering.stackexchange.com/questions/2361...

https://en.wikipedia.org/wiki/UTF-32#Analysis

http://utf8everywhere.org/#myths