That explains why gpt-oss wasn't working anywhere near as well for me as other similarly sized and smaller models. gemma3 27b, 12b, and phi4 (14b?) all significantly outperformed it when transforming unstructured data to structured data.
UTF-8 contributors are some of our modern-day unsung heroes. The design is brilliant, but the dedication to encode every single way humans communicate via text into a single standard, and succeed at it, is truly on another level.
Most other standards just do the xkcd thing: "now there's 15 competing standards."