
To be extra pedantic, it's 32k code units, so some characters take up two.


Pretty sure all of Microsoft's own stuff is using UTF-16, so all characters take two bytes, regardless.


Indeed. Although technically it's UCS-2, as the low-level stuff doesn't understand modern Unicode; it just treats it all as a stream of wide chars. It's up to applications to interpret it.

According to this[0] the path limit is nearer 32739*2 bytes as a drive like "C:" gets expanded to "\Device\HarddiskVolume1".

[0] https://stackoverflow.com/questions/15262110/what-happens-in...


It's 64 kilobytes, more or less. Some characters are 2 bytes and some are 4. Also, you can use invalid Unicode.

And when first implemented it was UCS-2.
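To illustrate what "invalid Unicode" can mean here, a minimal Python sketch, assuming the case in question is an unpaired surrogate (a 16-bit code unit that is only valid as half of a pair). It only shows how a strict UTF-16 encoder and a permissive one treat such a value; it does not touch the file system, and the variable names are made up for the example.

```python
# Assumption: "invalid Unicode" refers to an unpaired surrogate code unit.
lone = "\ud83d"  # a high surrogate with no low surrogate following it

# A strict UTF-16 encoder rejects the unpaired surrogate.
try:
    lone.encode("utf-16-le")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The "surrogatepass" error handler writes the 16-bit value through as-is,
# which is roughly what "a stream of wide chars" amounts to.
raw = lone.encode("utf-16-le", "surrogatepass")

print("strict encoder accepted it:", strict_ok)
print("raw bytes:", raw.hex(), "length:", len(raw))
```

Software that treats names as opaque 16-bit units never runs the strict check, which is how such sequences can end up in a name.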


Only characters from the Basic Multilingual Plane. If you put an emoji in a path, for example (allowed), it will be 4 bytes.
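A quick way to check the 2-byte vs. 4-byte claim is to encode a few characters as UTF-16 and count the bytes. This is just a sketch in Python; `utf16_bytes` is a hypothetical helper for illustration, not anything Windows exposes.

```python
def utf16_bytes(text: str) -> int:
    """Return the size of text in bytes when encoded as UTF-16 without a BOM."""
    return len(text.encode("utf-16-le"))

samples = {
    "LATIN CAPITAL LETTER A (BMP)": "A",
    "EURO SIGN (BMP)": "\u20ac",
    "GRINNING FACE (outside the BMP)": "\U0001F600",
}

for name, ch in samples.items():
    # BMP characters fit in one 16-bit code unit; everything else needs
    # a surrogate pair, i.e. two code units.
    print(f"{name}: {utf16_bytes(ch)} bytes")
```

So a path made only of BMP characters costs 2 bytes per code point, while each emoji like the one above uses up two of the available code units.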


Except if it's a compound emoji, which takes multiple code points -> more than 4 bytes. So really the takeaway is that "character" is highly ambiguous nowadays and shouldn't be used anymore; use one of these instead:

* UTF-n code unit
* Unicode code point
* Grapheme Cluster
* Extended Grapheme Cluster

For the latter, one also needs to specify the version of Unicode they're talking about, since every new compound emoji changes the definition of "Extended Grapheme Cluster".
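To make the difference between those units concrete, here is a rough Python sketch that measures one compound emoji (man, woman, girl joined by zero-width joiners) in several ways. `measure` is a made-up helper name. The Python standard library has no grapheme cluster segmentation, so that count is left out; a third-party library would be needed for it.

```python
def measure(text: str) -> dict:
    """Count the same string in several different units."""
    return {
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
        "code_points": len(text),  # Python's len() counts code points
    }

# MAN + ZWJ + WOMAN + ZWJ + GIRL: usually rendered as a single family glyph.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for unit, count in measure(family).items():
    print(f"{unit}: {count}")
```

One thing a reader would call a single "character" comes out as 5 code points, 8 UTF-16 code units (16 bytes), or 18 UTF-8 bytes, depending on which unit you pick.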



