UTF-9 and UTF-18 (wikipedia.org)
109 points by wglb on Feb 5, 2015 | 11 comments

China has actually looked into one of the RFC jokes: http://slashdot.org/story/04/07/03/1324219/china-deploys-ipv...

It's interesting to read the April Fools' Day RFC 1606, which was written a little over 20 years ago, and see references to light switches and lightbulbs with IP addresses.

The official government statement only says:

"IP address urgent needs caused by natural question of how to introduce IPv6, although other new industry solutions IPv9 IP addresses, etc. are also emerging, but IPv6 is still by far the most practical, the most sophisticated methods" [1]

That is very different from the "chinatechnews.com" statement [2]. Just plain poor journalism.

[1] http://www.miit.gov.cn/n11293472/n11293877/n13434815/n136009...

Translated by Google Translate :)

[2] http://www.chinatechnews.com/2004/07/07/1352-chinas-new-gene...

One would hope UTF-7 was also a joke; alas, it is not. I had to implement support for it once. http://en.wikipedia.org/wiki/UTF-7

The worst thing about UTF-7 is that it has multiple valid representations of the same string.
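
A minimal sketch of what that looks like, using Python's built-in `utf-7` codec. The sample byte strings are my own illustration, not from the thread: a plain ASCII letter can be written directly or as a modified-base64 run, and both decode to the same text.

```python
# Two different UTF-7 byte sequences that decode to the same string.
direct = b"a"        # 'a' written as a direct ASCII character
shifted = b"+AGE-"   # 'a' written as a base64 run of UTF-16 (U+0061)

decoded_direct = direct.decode("utf-7")
decoded_shifted = shifted.decode("utf-7")

# Same text, different bytes on the wire.
print(decoded_direct == decoded_shifted, direct == shifted)
```

So comparing or filtering the raw bytes tells you little about the decoded text.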

I'm convinced that the primary use of UTF-7 is hiding malicious input as different characters [1], as a possible exploit against systems that support UTF-7 for no reason.

[1] http://nedbatchelder.com/blog/200704/xss_with_utf7.html
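
A rough sketch of how that hiding works, again with Python's built-in `utf-7` codec. The payload and the `naive_filter_allows` helper are hypothetical, made up for illustration; the point is only that a filter looking for literal angle brackets sees none until something decodes the bytes as UTF-7.

```python
# Hypothetical payload: the angle brackets are hidden inside base64 runs.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def naive_filter_allows(raw: bytes) -> bool:
    """Hypothetical filter that only rejects literal '<' or '>' bytes."""
    return b"<" not in raw and b">" not in raw


# The filter lets the raw bytes through...
allowed = naive_filter_allows(payload)

# ...but anything that interprets them as UTF-7 sees markup.
decoded = payload.decode("utf-7")
print(allowed, decoded)
```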

This is correct, and is why (afaik) most modern browsers no longer support it.

I originally thought that UTF-9 would be perfect for Iain M Banks's Marain language [1].

Damn that continuation bit!

[1]: http://trevor-hopkins.com/banks/a-few-notes-on-marain.html

I was hoping to read about a text format with a parity bit!

That would be a bit of overkill now: 1 parity bit for every 8 bits of information? Also, text is a representation format, not a transfer/storage format, so I reckon it's not really the job of the character encoding to do error detection.

Yes, I know, I'm taking this way too seriously. :)

Actually, with 13 bits you could do in-character single bit error correction.
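
A sketch of one way to read that, assuming "13 bits" means a 9-bit character plus 4 Hamming check bits (the comment doesn't spell out the layout, so that split is my assumption, and the function names are illustrative). Check bits sit at positions 1, 2, 4 and 8; the XOR of the positions of all set bits is zero for a valid codeword, and equals the position of the flipped bit after a single error.

```python
PARITY_POSITIONS = (1, 2, 4, 8)
# The remaining 9 positions (3, 5, 6, 7, 9..13) carry the data bits.
DATA_POSITIONS = tuple(p for p in range(1, 14) if p not in PARITY_POSITIONS)


def hamming_encode(data):
    """Encode a 9-bit value as a 13-bit codeword (list index 0 is unused)."""
    if not 0 <= data < 512:
        raise ValueError("data must fit in 9 bits")
    code = [0] * 14
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    # Each check bit makes the parity of the positions it covers even.
    for p in PARITY_POSITIONS:
        code[p] = sum(code[pos] for pos in range(1, 14) if pos & p and pos != p) % 2
    return code


def syndrome(code):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 14):
        if code[pos]:
            s ^= pos
    return s


def hamming_decode(code):
    """Correct at most one flipped bit and return the 9-bit value."""
    code = list(code)
    s = syndrome(code)
    if s > 13:
        raise ValueError("more than one bit error")
    if s:
        code[s] ^= 1  # the syndrome is the position of the bad bit
    return sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))


word = hamming_encode(0x141)
word[6] ^= 1  # simulate a single-bit error in transit
print(hex(hamming_decode(word)))
```

This only corrects single-bit errors; a double error is either miscorrected or rejected, so detecting those reliably would need one more overall parity bit.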

