
Would you care to elaborate? I'm not claiming to know Rust, but the link I provided clearly says "[t]hese do inexpensive conversions from and to UTF-8 byte slices".



https://doc.rust-lang.org/std/ffi/struct.OsString.html has it pretty clearly: on Unix it’s a bag of bytes, on Windows it’s 16-bit units that are treated as UTF-16 when valid but can be ill-formed (unpaired surrogates, for example). There’s a trick called WTF-8 that bridges some gaps, though that’s considered an implementation detail: https://simonsapin.github.io/wtf-8/

They’re conversions because they’re not UTF-8 in the first place, that is, they’re not String/str. The conversions are as cheap as we can make them. That language is meant to talk about converting from OsString to String, not from the OS to OsString.
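To make that concrete, here's a rough sketch of the OsString to String direction. The helper names (to_utf8, to_display, non_utf8_name) are made up for illustration, and the non-UTF-8 sample only exists on Unix, where an OsString can be built from raw bytes; elsewhere the sketch just returns None.

```rust
use std::ffi::{OsStr, OsString};

/// Checked conversion (hypothetical helper): no re-encoding, just a validity check.
/// On failure the original OsString is handed back unchanged.
fn to_utf8(name: OsString) -> Result<String, OsString> {
    name.into_string()
}

/// Lossy conversion (hypothetical helper) for showing a name to a person:
/// anything that isn't valid Unicode becomes U+FFFD.
fn to_display(name: &OsStr) -> String {
    name.to_string_lossy().into_owned()
}

/// A name that is not valid UTF-8, built from raw bytes (Unix only).
#[cfg(unix)]
fn non_utf8_name() -> Option<OsString> {
    use std::os::unix::ffi::OsStringExt;
    // 0xFF never appears in well-formed UTF-8.
    Some(OsString::from_vec(vec![b'f', b'o', 0xFF, b'o']))
}

/// On other platforms this sketch has no raw-bytes sample to offer.
#[cfg(not(unix))]
fn non_utf8_name() -> Option<OsString> {
    None
}
```

So to_utf8(OsString::from("hello")) comes back Ok, while the 0xFF name comes back Err with the original OsString inside, and to_display is what you'd reach for when you only need something printable.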


Why do people say "bag" instead of string/sequence/vector/array/list/etc.? Bags are multisets... they're by definition unordered. It's already a niche technical term so it's really weird to see it used differently in a technical context...


I think it feels really evocative. Like, a bag of dice or something. You can’t see what’s inside, you have no idea what’s on them. It reinforces the “no encoding” thing well.


I think it's more accurately stated as "you shouldn't make assumptions about what's inside." Saying "you can't see what's inside" ignores the biggest cause of the conflation: userspace tools let you trivially interpret the bag of bytes as a text string for the purpose of naming it for other people.


Yeah, that might be better, good point.



