Windows 10 Insider Preview Build 17035 Supports UTF-8 as ANSI (twitter.com/matarillo)
22 points by matarillo on Nov 16, 2017 | 11 comments

There is a new checkbox in the legacy control panel under Region / Administrative / Change system locale which says something like: "Beta: use UTF-8 to support global languages" (I have the German version so I'm not sure what the English label is.)

I really wonder what that does. This can't affect the Win32 wide API e.g. GetWindowTextW will still return wide characters (UTF-16 / sometimes UCS-2). Probably this sets the system codepage by default to 65001, so that if you request a string "in codepage format" it will return UTF-8.
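
To make the wide-versus-codepage distinction concrete, here is a rough Python sketch (not Win32 code) of the same string as UTF-16, as a legacy Western European code page, and as UTF-8. The sample string and the `encode_ansi` helper are made up for illustration, and `errors="replace"` is only an approximation of how a lossy conversion to a legacy code page behaves.

```python
# Sketch: one string, three byte representations.
# The sample text and helper name are illustrative assumptions.
text = "Grüße, 世界"

# Roughly what a wide (W) API deals in: UTF-16 code units, little-endian.
wide = text.encode("utf-16-le")

# What "ANSI" bytes would look like if the code page were 65001 (UTF-8).
as_utf8 = text.encode("utf-8")


def encode_ansi(s, codepage):
    """Encode into a legacy code page; unrepresentable characters become '?'.

    This approximates a lossy conversion and is not the exact Win32 behavior.
    """
    return s.encode(codepage, errors="replace")


# cp1252 can hold the German letters but not the CJK characters.
as_cp1252 = encode_ansi(text, "cp1252")

print(len(wide), "bytes as UTF-16")
print(as_cp1252)
print(as_utf8.decode("utf-8") == text)
```

The point is only that UTF-8 round-trips the whole string while a legacy code page drops whatever it cannot represent.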

I hope that this means that Microsoft finally made the A functions (like GetWindowTextA) able to work directly with UTF-8. If this is the case and it is possible for a process to say "give me UTF-8 regardless of global codepage settings" then this can help a lot with portability since every other C/C++ GUI uses UTF-8.

It is way too late, but better late than never, I suppose.

> every other C/C++ GUI uses UTF-8

Qt uses UTF-16, too, AFAIK.

> The return value of GetACP() is also 65001

Wow. I think that implies all the ANSI Windows APIs will be able to use UTF-8 as well. Native UTF-8 support is something that Windows developers have wanted for years. I thought it was impossible because of some API limitation where it assumed multibyte characters could only be three bytes long, but maybe this assumption has been removed.

EDIT: So I just tested this. It does seem to affect the ANSI Windows APIs: https://0x0.st/siCS.png

Unfortunately, this doesn't mean the average Win32 program will be able to start using UTF-8 internally, since the ANSI codepage is a system-wide setting, so you won't be able to opt-in to UTF-8 per-process (at least, I don't think this is possible.)
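
For anyone who wants to check the GetACP() value on their own machine, a minimal sketch using Python's ctypes follows. `GetACP` in kernel32 is the real Win32 call; the helper names `get_ansi_codepage` and `ansi_is_utf8` are made up here, and the script simply reports nothing useful when run off Windows.

```python
import ctypes
import sys

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def get_ansi_codepage():
    """Return the active ANSI code page via GetACP(), or None off Windows."""
    if sys.platform != "win32":
        return None
    # ctypes.windll only exists on Windows, hence the guard above.
    return ctypes.windll.kernel32.GetACP()


def ansi_is_utf8(codepage):
    """True if the given code page number is the UTF-8 one."""
    return codepage == CP_UTF8


acp = get_ansi_codepage()
if acp is None:
    print("Not on Windows; GetACP() is unavailable here.")
else:
    print(f"GetACP() = {acp}, UTF-8 as ANSI: {ansi_is_utf8(acp)}")
```

With the beta checkbox enabled, the expectation from the tweet is that this reports 65001; it says nothing about whether a single process can override the setting.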

Will notepad.exe now also support unix line endings?

This would be really neat. Thing is, as far as I remember, notepad.exe uses a standard Windows control to display text, so that change would have to be system-wide as well, which could break backwards compatibility.

This looks like just Notepad.exe supporting UTF-8 without having a byte order mark at the start of the file.
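
As a rough illustration of why the byte order mark matters for an editor, the sketch below decodes the same text with and without a UTF-8 BOM, and then shows what happens if BOM-less UTF-8 is misread as a legacy code page. This is not how Notepad is implemented; `decode_text_file` is a hypothetical helper and cp1252 is just one example of a legacy code page.

```python
import codecs


def decode_text_file(raw):
    """Decode bytes as UTF-8, dropping a leading BOM if one is present.

    Returns the decoded text and whether a BOM was found.
    """
    has_bom = raw.startswith(codecs.BOM_UTF8)
    # The "utf-8-sig" codec strips the BOM when present and is a plain
    # UTF-8 decode otherwise.
    return raw.decode("utf-8-sig"), has_bom


with_bom = codecs.BOM_UTF8 + "naïve".encode("utf-8")
without_bom = "naïve".encode("utf-8")

print(decode_text_file(with_bom))
print(decode_text_file(without_bom))

# Without a BOM, a program that assumes the legacy code page gets mojibake.
mojibake = without_bom.decode("cp1252")
print(mojibake)
```

If the ANSI code page itself is UTF-8, the "assume the legacy code page" fallback and the UTF-8 decode become the same thing, which would explain BOM-less files displaying correctly.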

It should affect more than just Notepad. One of the replies links to more juicy details: https://srad.jp/story/17/11/14/0640253/

So will things like SQL Server output logs in UTF-8 instead of UTF-16 and UTF-16LE?

Probably not. It might just let applications that never bothered to use the Unicode Windows APIs support Unicode. And there are still a lot of them. In the past you could use CP65001 in some places to get sort-of UTF-8 support, e.g. in batch files. However, there have been quite a few problems and bugs with that approach. Maybe they've just fixed those and added an option to use CP65001 as the legacy codepage.

Of course, this will probably break VB6 apps for example and maybe even VBA. Ideally this should have been done when NT 3.5 was released in 1994, but then it might have been limited to 3 byte UTF-8.
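
On the "3 byte UTF-8" remark: assuming it refers to characters in the Basic Multilingual Plane (that is my reading, not something stated above), the sketch below shows that code points up to U+FFFF need at most three UTF-8 bytes, while anything beyond needs four. The helper name `utf8_length` is made up for illustration.

```python
def utf8_length(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


samples = {
    "A": utf8_length("A"),                       # U+0041, ASCII
    "é": utf8_length("é"),                       # U+00E9, Latin-1 range
    "€": utf8_length("€"),                       # U+20AC, inside the BMP
    "U+FFFF": utf8_length(chr(0xFFFF)),          # last BMP code point
    "U+10000": utf8_length(chr(0x10000)),        # first code point past the BMP
    "😀": utf8_length("😀"),                      # U+1F600, emoji
}

for label, size in samples.items():
    print(f"{label}: {size} byte(s)")
```

So a UTF-8 implementation capped at three bytes per character would cover the same range as UCS-2 and nothing beyond it.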
