
UTF-8 History (2003) - zdw
https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
======
dang
Posted over a dozen times but with surprisingly few comments. The best I
found:

[https://news.ycombinator.com/item?id=6463466](https://news.ycombinator.com/item?id=6463466)
(2013)

[https://news.ycombinator.com/item?id=8648541](https://news.ycombinator.com/item?id=8648541)
(2014)

[https://news.ycombinator.com/item?id=15236856](https://news.ycombinator.com/item?id=15236856)
(2017)

Don't miss this great link from that last thread:
[https://www.flickr.com/photos/ajstarks/sets/7215763147079887...](https://www.flickr.com/photos/ajstarks/sets/72157631470798870)

------
ncmncm
It is tragic that this was not adopted as the Standard encoding for C++98. It
probably could have been, if the right people had known its properties, and we
might have been able to avoid standardizing wstring and all the basic_*
templates.

It still could be the Standard encoding for C++20 if the right people could be
persuaded. In practice, that would mean that whatever other execution
encodings any given Implementation supports, it must also support a Standard
mode in which any program output that is interpreted as text is assumed to be
UTF-8.

It would mean that downstream programs that need to know the encoding, such as
terminal emulators in some Implementations, would need to have a way to
recognize or designate non-Standard programs and their encodings, and
interpose appropriate handling for them.

~~~
makecheck
They’d be better off providing standard free functions that can operate on any
bucket of bytes claiming to be UTF-8 (allowing operations like iterating over
complete composed sequences, creating normalization forms, erasing or
inserting based on ranges of composed character sequences, etc.). I don’t want
to be forced to create a std::string for example. The bytes might be an
incomplete stream as well, which requires processing up to the last _complete_
thing and detecting invalid sequences.
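
A minimal sketch of what one such free function might look like, assuming a plain pointer-and-length interface rather than std::string. The names `scan_utf8`, `Utf8Scan`, and `Utf8Status` are hypothetical and not part of any standard library. It only works at the level of encoded code points (the well-formed byte ranges from the Unicode Standard); composed character sequences and normalization would need Unicode data tables and are not attempted here.

```cpp
#include <cassert>
#include <cstddef>

// Why scanning stopped (hypothetical names, for illustration only).
enum class Utf8Status { Complete, Truncated, Invalid };

struct Utf8Scan {
    std::size_t valid_len;  // bytes that form complete, well-formed sequences
    Utf8Status status;      // state of the bytes after valid_len
};

// Scan a bucket of bytes claiming to be UTF-8. Returns how many leading bytes
// are safe to process now, and whether the remainder is an incomplete
// sequence (wait for more input) or an invalid one.
Utf8Scan scan_utf8(const unsigned char* p, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = p[i];
        if (b < 0x80) { ++i; continue; }  // ASCII

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the 2nd byte
        if (b >= 0xC2 && b <= 0xDF) {
            len = 2;
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3;
            if (b == 0xE0) lo = 0xA0;  // reject overlong forms
            if (b == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4;
            if (b == 0xF0) lo = 0x90;  // reject overlong forms
            if (b == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
        } else {
            // Stray continuation byte, or a lead byte that is never valid.
            return {i, Utf8Status::Invalid};
        }

        for (std::size_t k = 1; k < len; ++k) {
            if (i + k >= n) return {i, Utf8Status::Truncated};
            const unsigned char c = p[i + k];
            const unsigned char min = (k == 1) ? lo : 0x80;
            const unsigned char max = (k == 1) ? hi : 0xBF;
            if (c < min || c > max) return {i, Utf8Status::Invalid};
        }
        i += len;
    }
    return {i, Utf8Status::Complete};
}
```

A streaming caller could process the first `valid_len` bytes, keep the tail in its buffer when the status is `Truncated`, and report an error when it is `Invalid`.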

