Show HN: Concise Encoding: The friendly data format for human and machine (concise-encoding.org)
2 points by kstenerud on July 27, 2020 | 1 comment



You might want to allow encodings other than Unicode, or need to store strings containing invalid Unicode for some reason. For implementations that support this, I suggest a \x escape that encodes a single byte, while the \u escape encodes the code point as UTF-8. Even when Unicode is used, you might want a separate escape for astral characters rather than using surrogates. Surrogates are mostly only relevant to UTF-16 anyway, not UTF-8: the spec says that unpaired surrogates are not allowed, but surrogates should not appear at all in Unicode text encoded as UTF-8. This would be application-specific.

To simplify implementation, you may also wish to disallow non-ASCII characters entirely outside of quoted strings, verbatim strings, and comments. The difficulty is that certain look-alike Unicode characters are disallowed; that can be solved by disallowing all of them in any context where any of them are disallowed. Also, comments should not be required to be valid UTF-8 (although null bytes should still be disallowed).

Furthermore, uppercase hexadecimal should be allowed. I also believe the abbreviation for local time is supposed to be "J", not "L". Years should perhaps use astronomical year numbering, in which year 0 represents 1 BC.

Another thing worth noting: in some programming languages the string "2000" might clash with the number 2000 as a map key, while in others it might not (the same is actually true of integer vs. float keys). Some implementations might treat UUIDs as URIs; I suppose any that does will need to convert them back to UUIDs when writing them out.

Finally, the spec says the programming language must match a name from the Linguist languages list. However, some people may use programming languages not listed there, and although PostScript is listed, it is classified as "markup" rather than "programming", even though it is actually a programming language (and not a markup language).
Other than that, I think it looks good to me.
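To make the escape suggestion concrete, here is a minimal Python sketch of a decoder in which `\x` produces one raw byte and `\u` produces a code point encoded as UTF-8, with a separate escape for astral characters and surrogates rejected outright. The exact syntax (`\xHH`, `\uHHHH`, and a 6-digit `\UHHHHHH` for astral code points) and the function name `decode_escapes` are hypothetical, chosen only for illustration; they are not taken from the Concise Encoding spec.

```python
import re

# Hypothetical escape syntax, assumed for illustration only (not taken from
# the Concise Encoding spec):
#   \\        -> a literal backslash
#   \xHH      -> exactly one raw byte, which need not be valid UTF-8
#   \uHHHH    -> a Basic Multilingual Plane code point, emitted as UTF-8
#   \UHHHHHH  -> any code point, including astral ones, emitted as UTF-8
# Hex digits may be upper- or lowercase. Other escapes pass through unchanged.
_ESCAPE = re.compile(
    r"\\(?:(\\)|x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{6}))"
)


def _code_point_to_utf8(cp: int) -> bytes:
    # Surrogates exist only to pair up in UTF-16; well-formed UTF-8 never contains them.
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code point U+{cp:04X} is not allowed")
    if cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    return chr(cp).encode("utf-8")


def decode_escapes(text: str) -> bytes:
    """Decode the escapes in a string literal's body into raw bytes."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        out += text[pos:m.start()].encode("utf-8")  # unescaped text as UTF-8
        backslash, byte_hex, bmp_hex, astral_hex = m.groups()
        if backslash is not None:
            out += b"\\"
        elif byte_hex is not None:
            out.append(int(byte_hex, 16))  # one raw byte, no Unicode meaning
        else:
            out += _code_point_to_utf8(int(bmp_hex or astral_hex, 16))
        pos = m.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)
```

For example, `decode_escapes(r"caf\u00E9 \xFF")` yields `b'caf\xc3\xa9 \xff'`: the é becomes two UTF-8 bytes, while `\xFF` stays a single byte that is not valid UTF-8 on its own. Because the result is `bytes` rather than `str`, strings in non-Unicode encodings survive intact.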
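The astronomical year numbering suggestion is easy to sketch. Historical numbering jumps from 1 BC straight to AD 1, while astronomical numbering inserts a year 0 (= 1 BC), so 2 BC becomes -1 and, in general, year N BC becomes 1 - N. The helpers below, with the hypothetical names `to_astronomical` and `from_astronomical`, are a minimal illustration and not part of any spec.

```python
def to_astronomical(year: int, era: str) -> int:
    """Convert a historical year ("BC" or "AD") to astronomical numbering.

    Historical numbering has no year zero; astronomical numbering uses
    0 = 1 BC, -1 = 2 BC, and so on. AD years are unchanged.
    """
    if year < 1:
        raise ValueError("historical years start at 1")
    if era == "AD":
        return year
    if era == "BC":
        return 1 - year
    raise ValueError(f"unknown era {era!r}")


def from_astronomical(year: int) -> tuple[int, str]:
    """Convert an astronomical year back to a (year, era) pair."""
    return (year, "AD") if year >= 1 else (1 - year, "BC")
```

With astronomical years, plain subtraction gives the correct interval across the era boundary: from 1 BC (year 0) to AD 1 is 1 - 0 = 1 year, with no special case for the missing year zero.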
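The key-clash point can be shown in Python, where a string key and a number key stay distinct but `2000` and `2000.0` are equal and hash the same, so they collapse into one dict entry. (In JavaScript the situation differs: plain-object property keys are converted to strings, so `2000` and `"2000"` name the same property.) The sketch below assumes a decoder that receives key/value pairs and wants to reject such collisions rather than silently overwrite a value; `build_map` is a hypothetical helper, not an API from any Concise Encoding library.

```python
from typing import Any, Iterable, Tuple


def build_map(pairs: Iterable[Tuple[Any, Any]]) -> dict:
    """Build a dict from decoded key/value pairs, rejecting host-language collisions.

    Keys that are distinct in the document can still be equal in Python:
    2000 == 2000.0 and both hash the same, so they would silently merge.
    """
    result: dict = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"key {key!r} collides with an existing key")
        result[key] = value
    return result


# Python keeps a string and a number apart...
print(len(build_map([("2000", "string"), (2000, "int")])))  # 2
# ...but an integer and a float with the same value collide.
try:
    build_map([(2000, "int"), (2000.0, "float")])
except ValueError as err:
    print("collision:", err)
```

Whether such a collision should be an error or merely a warning is a policy choice for the implementation; the point is that the same document can decode differently depending on the host language's key model.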
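For an implementation that treats UUIDs as URIs, the round trip might look like the following, assuming it uses the `urn:uuid:` form defined in RFC 4122. Python's standard `uuid` module provides that form through `UUID.urn`; the helper names `uuid_to_uri` and `uri_to_uuid` are hypothetical.

```python
import uuid

URN_PREFIX = "urn:uuid:"  # the URN form defined by RFC 4122


def uuid_to_uri(u: uuid.UUID) -> str:
    # UUID.urn returns "urn:uuid:" followed by the lowercase hyphenated form.
    return u.urn


def uri_to_uuid(uri: str) -> uuid.UUID:
    # Recover the UUID so it can be written back out as a UUID value.
    if not uri.lower().startswith(URN_PREFIX):
        raise ValueError(f"not a UUID URN: {uri!r}")
    return uuid.UUID(uri[len(URN_PREFIX):])


original = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
as_uri = uuid_to_uri(original)
print(as_uri)  # urn:uuid:123e4567-e89b-12d3-a456-426614174000
assert uri_to_uuid(as_uri) == original
```

Converting back before encoding keeps the output type identical to the input, so a decode/encode cycle does not silently turn a UUID into a URI.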





