
.NET uses UCS-2 because the Windows API uses UCS-2 (so when you use Visual Studio out of the box, you will get UCS-2). ECMAScript (JS) uses UCS-2 because that's all there was when the spec was written.

Other scripting languages I know for certain about:

- PHP doesn't care and treats strings as arrays of bytes. All the str functions operate on these byte arrays and thus happily destroy your strings if they are encoded as anything but the old 8-bit encodings. If you need to support utf-8, you have to use different library functions (mb_*) and a special syntax in their regex support (/u modifier).

- Python < 3 treats strings as byte arrays or UCS-2 depending on whether you use the byte type or the Unicode type. As such, it has all the same issues as all other UCS-2 libraries

- Ruby < 1.9 treats strings as byte arrays. There is some limited UTF-8 support, but it's in additional libraries. The internal API treats strings as byte arrays. Ruby >= 1.9 lets you choose your internal encoding. Most people use UTF-8, but you don't have to.

- Perl I don't know enough about, but I hear it has a UTF-8 mode that is actually well-supported by the language itself and gets almost everything right.

These are the more common scripting languages.
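
To make the "byte array" problem concrete, here is a minimal sketch in Python 3 (not PHP or Ruby; the helper names byte_truncate and is_valid_utf8 are made up for illustration). It shows what happens when a byte-oriented function cuts a UTF-8 string without knowing where characters begin and end:

```python
# Byte-oriented vs. code-point-oriented views of the same text.
text = "na\u00efve"            # "naïve": 5 code points
raw = text.encode("utf-8")     # 6 bytes, because "ï" takes two bytes in UTF-8


def byte_truncate(data: bytes, n: int) -> bytes:
    """Cut after n bytes, the way a byte-oriented string function would."""
    return data[:n]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


cut = byte_truncate(raw, 3)              # splits "ï" between its two bytes
safe = text[:3].encode("utf-8")          # cut on code points, then encode

print(len(text), len(raw))               # code points vs. bytes
print(is_valid_utf8(cut), is_valid_utf8(safe))
```

Cutting on bytes leaves a dangling lead byte, which is the kind of damage the plain str functions can do; cutting on code points first is roughly what the encoding-aware library functions are for.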

Of the compiled languages, I know for certain about Go (UTF-8; good library support), C (OS dependent, but the standard string API treats strings as byte arrays), C++ (ditto) and Delphi (UCS-2 since 2010, byte arrays before that).

There are so many exceptions to the UTF-8 rule that I wouldn't say "most" languages are using UTF-8.




> - Python < 3 treats strings as byte arrays or UCS-2 depending on whether you use the byte type or the Unicode type. As such, it has all the same issues as all other UCS-2 libraries

It's Python < 3.3 (the Flexible String Representation was introduced in 3.3). There's a byte array type (str in P2, bytes in P3) and a string type (unicode in P2, str in P3), which may be UCS-2 ("narrow" builds, the default) or UCS-4 ("wide" builds, set by many Linux distros).
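
A rough way to see the narrow/wide difference from a current interpreter, assuming Python 3.3 or later (the helper utf16_code_units is a made-up name, and counting UTF-16 code units only approximates what a narrow build would have reported as the length):

```python
import sys


def utf16_code_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


emoji = "\U0001F600"             # a code point outside the Basic Multilingual Plane

print(len(emoji))                # length in code points on 3.3+
print(utf16_code_units(emoji))   # length in 16-bit units (surrogate pair)
print(hex(sys.maxunicode))       # highest code point the build can represent
```

On 3.3+ the string length counts code points, so the emoji has length 1, while the 16-bit view needs a surrogate pair. A narrow build exposed that 16-bit view directly, which is where the UCS-2 style issues came from.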



