There is a new checkbox in the legacy control panel under Region / Administrative / Change system locale which says something like: "Beta: use UTF-8 to support global languages" (I have the German version so I'm not sure what the English label is.)
I really wonder what that does. This can't affect the Win32 wide API e.g. GetWindowTextW will still return wide characters (UTF-16 / sometimes UCS-2). Probably this sets the system codepage by default to 65001, so that if you request a string "in codepage format" it will return UTF-8.
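To make the wide vs. codepage distinction concrete, here is a minimal Python sketch. It is not Windows API code: the helper name `encoded_sizes` and the sample string are my own, and it only assumes that W functions hand back UTF-16, that a typical Western legacy ANSI codepage is 1252, and that codepage 65001 means UTF-8.

```python
def encoded_sizes(text):
    """Return the byte length of `text` under three encodings (hypothetical helper)."""
    return {
        "utf-16-le": len(text.encode("utf-16-le")),  # what the W APIs return
        "cp1252": len(text.encode("cp1252")),        # a typical legacy ANSI codepage
        "utf-8": len(text.encode("utf-8")),          # what codepage 65001 would mean
    }

sample = "Gr\u00fc\u00dfe"  # "Grüße", 5 characters
sizes = encoded_sizes(sample)
print(sizes)
```

The same five characters take 10 bytes as UTF-16, 5 bytes in cp1252 and 7 bytes in UTF-8, so a program that sizes its A-function buffers as "one byte per character" would come up short once the codepage is 65001.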
I hope this means that Microsoft finally made the A functions (like GetWindowTextA) work directly with UTF-8. If this is the case and it is possible for a process to say "give me UTF-8 regardless of global codepage settings", then this can help a lot with portability, since every other C/C++ GUI uses UTF-8.
It is kind of way too late, but better late than never, I suppose.
Wow. I think that implies all the ANSI Windows APIs will be able to use UTF-8 as well. Native UTF-8 support is something that Windows developers have wanted for years. I thought it was impossible because of some API limitation where it assumed multibyte characters could only be three bytes long, but maybe this assumption has been removed.
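I can't confirm which API had that limit, but the arithmetic behind a "three bytes" assumption is easy to check: everything in the Basic Multilingual Plane fits in at most three UTF-8 bytes, and only characters beyond it need four. A small Python sketch, with sample characters and the helper name `utf8_len` chosen by me for illustration:

```python
def utf8_len(ch):
    """Number of bytes a single character occupies in UTF-8 (hypothetical helper)."""
    return len(ch.encode("utf-8"))

# ASCII letter, Latin-1 letter, euro sign (BMP), emoji (outside the BMP)
samples = ["A", "\u00fc", "\u20ac", "\U0001F600"]

for ch in samples:
    utf16_units = len(ch.encode("utf-16-le")) // 2  # 2 means a surrogate pair
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} UTF-8 byte(s), {utf16_units} UTF-16 code unit(s)")

lengths = [utf8_len(ch) for ch in samples]
```

The last sample is also where UTF-16 and UCS-2 part ways: it needs a surrogate pair in UTF-16 and cannot be represented in UCS-2 at all.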
EDIT: So I just tested this. It does seem to affect the ANSI Windows APIs: https://0x0.st/siCS.png
Unfortunately, this doesn't mean the average Win32 program will be able to start using UTF-8 internally, since the ANSI codepage is a system-wide setting, so you won't be able to opt in to UTF-8 per process (at least, I don't think this is possible).
This would be really neat. Thing is, as far as I remember, notepad.exe uses a standard Windows control to display text, so that change would have to be system-wide as well, which could break backwards compatibility.
Probably not. It might just make applications that never bothered to use the Unicode Windows APIs support Unicode. And there are still a lot of them. In the past you could use CP65001 in some places to get sort-of UTF-8 support, e.g. in batch files. However, there have been quite a few problems and bugs with that approach. Maybe they've just fixed those and added an option to use CP65001 as the legacy codepage.
Of course, this will probably break VB6 apps, for example, and maybe even VBA. Ideally this should have been done when NT 3.5 was released in 1994, but then it might have been limited to 3-byte UTF-8.
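One way such breakage could show up (an assumption on my part, not something tested against VB6 or VBA) is an app that hard-codes a legacy codepage like 1252 while the system now hands it UTF-8 bytes. A minimal Python sketch of that mismatch, with a sample string of my own:

```python
original = "Gr\u00fc\u00dfe"            # "Grüße"

# What an A function would return if the ANSI codepage were 65001 (UTF-8).
utf8_bytes = original.encode("utf-8")

# What an app sees if it still interprets those bytes as cp1252.
misread = utf8_bytes.decode("cp1252")

print(misread)                           # mojibake instead of the original text
print(len(original), len(utf8_bytes))    # character count vs. byte count differ
```

Every non-ASCII character turns into two unrelated ones, and any "bytes equals characters" bookkeeping is off by the same amount.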