in my opinion utf8 should have been a bigger variable length encoding, today it ...

lifthrasiir · on Oct 28, 2021

First I'd like to introduce you to this: http://ucsx.org/

But no, there is no particular reason to introduce a longer encoding than modern UTF-8 (which was actually shortened from the original one-to-six-byte encoding). The current space of 1,114,112 Unicode code points is sufficient for at least the foreseeable future, because any new assignment requires demonstrable historic or current use. (Emojis are slightly different, but they still require that the underlying concept be widespread and not significantly overlap with existing emojis. See [1].) Han characters are the largest source of new assignments to date, and they have yet to fill two of the 17 planes (that would equate to 2 × 65,536 = 131,072 characters).
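To make the numbers concrete, here is a rough Python sketch of my own (not from any library; the names `utf8_len_original` and `utf8_len_modern` are made up for this example). It assumes the original 31-bit scheme used the byte-length thresholds shown. Modern UTF-8 uses the same thresholds but stops at U+10FFFF, which is why it never needs more than four bytes.

```python
# Illustrative sketch: UTF-8 byte lengths under the original 31-bit design
# (up to six bytes) versus modern UTF-8, capped at U+10FFFF (up to four bytes).

PLANE_SIZE = 0x10000                  # 65,536 code points per plane
NUM_PLANES = 17                       # planes 0 through 16
CODE_SPACE = PLANE_SIZE * NUM_PLANES  # 1,114,112 code points: U+0000..U+10FFFF

# Exclusive upper bounds for 1-, 2-, 3-, 4-, 5- and 6-byte sequences.
_THRESHOLDS = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)


def utf8_len_original(cp: int) -> int:
    """Bytes needed under the original 1-to-6-byte scheme (31-bit values)."""
    if cp < 0 or cp >= _THRESHOLDS[-1]:
        raise ValueError("outside the original 31-bit range")
    for n, limit in enumerate(_THRESHOLDS, start=1):
        if cp < limit:
            return n
    raise AssertionError("unreachable")


def utf8_len_modern(cp: int) -> int:
    """Bytes needed under modern UTF-8, which ends at U+10FFFF."""
    if cp < 0 or cp >= CODE_SPACE:
        raise ValueError("outside the Unicode code space")
    # Same thresholds; the smaller range means the 5- and 6-byte forms never occur.
    return utf8_len_original(cp)


if __name__ == "__main__":
    print(CODE_SPACE)                     # 1114112
    print(2 * PLANE_SIZE)                 # 131072, i.e. two full planes
    print(utf8_len_modern(0x10FFFF))      # 4, the longest modern sequence
    print(utf8_len_original(0x7FFFFFFF))  # 6, the longest original sequence
    # Cross-check against Python's own encoder for a few sample code points.
    for cp in (0x41, 0xE9, 0x4E2D, 0x1F600, 0x10FFFF):
        assert utf8_len_modern(cp) == len(chr(cp).encode("utf-8"))
```

The highest code point Unicode will ever assign, U+10FFFF, still fits in four bytes, so the extra five- and six-byte forms were simply dead weight.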

[1] https://news.ycombinator.com/item?id=26904980