
Show HN: Concise Encoding: The friendly data format for human and machine - kstenerud
https://concise-encoding.org/
======
zzo38computer
You might want to use encodings other than Unicode (or need to store strings
containing invalid Unicode for some reason). For implementations which support
this, I suggest a \x escape to escape a single byte; the \u escape would
encode the data as UTF-8. Even when Unicode is used, you might want a separate
escape for astral characters, rather than using surrogates (which are mostly
only relevant for UTF-16 anyway, not UTF-8; the spec says that unpaired surrogates
are not allowed, but surrogates should not be used at all in Unicode text
encoded as UTF-8). This would be application-specific. Additionally, in order
to simplify the implementation, you may wish to disallow non-ASCII characters
entirely when outside of quoted strings, verbatim strings, and comments; the
complication is the rule disallowing Unicode characters that look similar, and
it can be avoided by disallowing all non-ASCII characters wherever any of them
are disallowed. Also, comments should not be required to be valid UTF-8 (although
null bytes should still be disallowed). Furthermore, uppercase hexadecimal
should be allowed. Additionally, I believe the abbreviation for local time is
supposed to be "J" and not "L". Years should perhaps use astronomical year
numbering, in which year zero represents 1 BC. Another thing: in some
programming languages, the string "2000" might clash with the number 2000 as a
map key, while in others it might not (the same is true of integer vs.
float keys), so I believe that is worth noting. Some implementations might treat
UUIDs as URIs (I suppose any one that does will need to convert it back to
UUID when writing it out). Additionally, it says that the programming language
must match a name from the Linguist languages list; however, some people may
use programming languages not listed there, and although PostScript is listed,
it is listed as "markup" rather than "programming", even though it is actually
a programming language too (and not a markup language). Other than that, I
think that it looks good to me.
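
To make a few of these points concrete, here is a rough Python sketch. The
function names are hypothetical and are not part of Concise Encoding; it only
illustrates the suggested \x vs. \u distinction, why surrogates have no place
in UTF-8, astronomical year numbering, and how key clashes depend on the host
language (Python's rules are shown; other languages differ).

```python
# Illustrative sketch only; these helpers are hypothetical, not a spec API.

def escape_x(value: int) -> bytes:
    """Suggested \\x escape: emit exactly one raw byte, whatever the encoding."""
    return bytes([value])  # raises ValueError if value is outside 0..255


def escape_u(codepoint: int) -> bytes:
    """\\u escape: emit the code point as UTF-8.

    Astral code points (above U+FFFF) encode directly as four bytes, so no
    surrogate pair is needed. Surrogate code points (U+D800..U+DFFF) are
    rejected by Python's UTF-8 encoder with UnicodeEncodeError.
    """
    return chr(codepoint).encode("utf-8")


def astronomical_year(year: int, bc: bool) -> int:
    """Astronomical numbering: 1 BC is year 0, 2 BC is year -1, and so on."""
    return 1 - year if bc else year


# Key clashes in Python: a string and an integer are distinct keys, but an
# integer and an equal float collapse into one key.
mixed = {"2000": "string key", 2000: "integer key"}
numeric = {2000: "integer key", 2000.0: "float key"}

print(escape_x(0xE9))              # one byte: b'\xe9'
print(escape_u(0xE9))              # two bytes: b'\xc3\xa9'
print(len(mixed), len(numeric))    # 2 1
```

In a language where map keys are all coerced to strings, the first map would
instead collapse to a single entry, which is the clash worth documenting.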

