
Hacker News

ninkendo 1 day ago | on: RFC 9839 and Bad Unicode

It seems like most of these are handled by just rejecting invalid UTF-8 byte sequences (ideally, erroring out altogether) when interpreting a string as UTF-8. I mean, unpaired surrogates, or any surrogate for that matter (U+D800 through U+DFFF), are already illegal in a UTF-8 byte sequence. Any competent language that uses UTF-8 for strings should already be returning errors when given such sequences.

The list of problematic code points (non-printing, etc.) is IMO much more useful and nontrivial. But it'd be useful to treat those as a separate concept from plain-old illegal UTF-8 byte sequences.
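A minimal Python sketch of keeping the two concerns separate, assuming Python 3.9+. The built-in UTF-8 codec is strict by default and already rejects ill-formed input, including the three-byte encodings of surrogates, so no extra surrogate check is needed at the decoding step. A separate check (the function names here are hypothetical) then flags code points that decode fine but are often unwanted. The "problematic" set used below (C0/C1 controls other than tab, LF and CR, plus noncharacters) is an illustrative assumption, not necessarily the exact list in RFC 9839.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode bytes as UTF-8, raising UnicodeDecodeError on ill-formed input.

    Python's UTF-8 codec rejects truncated sequences, overlong forms, and
    the encodings of surrogates (U+D800..U+DFFF) with errors="strict",
    which is the default.
    """
    return data.decode("utf-8")


def is_problematic(cp: int) -> bool:
    """Hypothetical policy: valid code points that are often unwanted.

    Assumption: controls (except tab, LF, CR) and noncharacters only.
    """
    if cp < 0x20 and cp not in (0x09, 0x0A, 0x0D):
        return True  # C0 controls other than common whitespace
    if 0x7F <= cp <= 0x9F:
        return True  # DEL and C1 controls
    if 0xFDD0 <= cp <= 0xFDEF:
        return True  # contiguous noncharacter range
    if (cp & 0xFFFE) == 0xFFFE:
        return True  # U+xxFFFE and U+xxFFFF in every plane
    return False


def problematic_code_points(text: str) -> list[str]:
    """List the problematic code points in an already-decoded string."""
    return [f"U+{ord(ch):04X}" for ch in text if is_problematic(ord(ch))]


if __name__ == "__main__":
    # Step 1: byte-level validity. ED A0 80 would be U+D800 if surrogates
    # were allowed in UTF-8; the strict codec refuses it.
    try:
        decode_utf8_strict(b"\xed\xa0\x80")
    except UnicodeDecodeError as err:
        print("rejected at decode time:", err.reason)

    # Step 2: a separate, policy-level check on well-formed text.
    text = decode_utf8_strict(b"bob\x07\xef\xb7\x90")  # "bob" + BEL + U+FDD0
    print(problematic_code_points(text))  # ['U+0007', 'U+FDD0']
```

Keeping the two functions apart mirrors the distinction above: decoding either succeeds or fails, while "problematic" is a policy that different callers may reasonably define differently.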

doug_durham 1 day ago

That seems reasonable. It should be up to the application implementer to make that choice, not a lower-level, more general-purpose library. I haven't run into any JSON parser written only for parsing usernames.
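One way to picture that layering, as a rough Python sketch: the general-purpose `json.loads` accepts any JSON string, including a lone surrogate escape such as `\ud800` (which CPython decodes into a `str` containing U+D800), and the stricter rule lives in application code. The `parse_signup` function, the `username` field, and the specific disallowed set are hypothetical, chosen only for illustration.

```python
import json


def is_disallowed_in_username(cp: int) -> bool:
    """Hypothetical username policy: no surrogates, controls, or noncharacters."""
    return (
        0xD800 <= cp <= 0xDFFF  # lone surrogates can arrive via JSON escapes
        or cp < 0x20  # C0 controls (tab included, for usernames)
        or 0x7F <= cp <= 0x9F  # DEL and C1 controls
        or 0xFDD0 <= cp <= 0xFDEF  # contiguous noncharacter range
        or (cp & 0xFFFE) == 0xFFFE  # U+xxFFFE and U+xxFFFF
    )


def parse_signup(body: str) -> str:
    """Parse a JSON request body, then apply the application's own policy.

    The JSON parser stays general-purpose; only this function knows that
    the field is a username and what a username may contain.
    """
    payload = json.loads(body)
    username = payload.get("username")
    if not isinstance(username, str):
        raise ValueError("username must be a string")
    bad = [f"U+{ord(ch):04X}" for ch in username if is_disallowed_in_username(ord(ch))]
    if bad:
        raise ValueError("username contains disallowed code points: " + ", ".join(bad))
    return username


if __name__ == "__main__":
    print(parse_signup('{"username": "alice"}'))
    try:
        parse_signup('{"username": "\\ud800"}')  # valid JSON, bad username
    except ValueError as err:
        print("rejected by application:", err)
```

A different application reusing the same parser could allow tabs or other characters in free-text fields without touching the library.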
