> it's best to keep your text in UTF-8 and convert it to and from UTF-16 when in...

vardump · on June 11, 2019

Spot on. When coding against the "raw" Win32 API (or NT kernel APIs, and perhaps the rarely used native user-mode NT API), using UTF-16 is the only way to keep your sanity. Converting strings back and forth between UTF-8 and UTF-16 in that case is just a senseless waste of CPU cycles.

One API call might take multiple strings and each conversion often means memory allocation and freeing — something you usually try to avoid as much as possible if it's something that's going to run most of the time the system is powered on.
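To make the cost concrete, here is a rough Python sketch of what a UTF-8 wrapper has to do before a call that takes two strings. The helper names `utf8_to_wide` and `wide_args_for_call` are made up for illustration, and Python's allocator only stands in for whatever a native wrapper would use. The point is just that every argument gets its own freshly built UTF-16 buffer on every call.

```python
def utf8_to_wide(text: bytes) -> bytes:
    """Convert UTF-8 bytes to a NUL-terminated UTF-16LE buffer.

    Hypothetical helper: every call builds a brand-new buffer.
    """
    return text.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def wide_args_for_call(*utf8_args: bytes) -> list:
    """Convert each string argument separately, one new buffer per argument."""
    return [utf8_to_wide(arg) for arg in utf8_args]


# Two path arguments, as a rename-style call would take.
src = "C:\\data\\r\u00e9sum\u00e9.txt".encode("utf-8")
dst = "C:\\backup\\r\u00e9sum\u00e9.txt".encode("utf-8")

wide_args = wide_args_for_call(src, dst)
for utf8, wide in zip((src, dst), wide_args):
    print(len(utf8), "UTF-8 bytes ->", len(wide), "UTF-16 bytes (new buffer)")
```

In native code the same pattern typically shows up as a size query, an allocation, the conversion itself, and a free, repeated for each string argument.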

The situation can be different in cross-platform code. In those cases, UTF-8 is a preferable abstraction.

Just don't use it for filenames. Filenames are just bags of bytes, at least on Windows (well, 16-bit WCHARs, but the idea is the same) and Linux, and treating them as anything else is not a great idea.
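A small Python sketch of the "bag of bytes" point on the Linux side. The filename below is a made-up example, and the sketch assumes a filesystem that accepts arbitrary bytes in names (other than `/` and NUL). It shows that such a name may not be valid UTF-8, that a "repairing" decode silently changes it, and that it only survives if the bytes are carried through untouched (here via Python's `surrogateescape` error handler).

```python
# Hypothetical filename: contains byte 0xFF, which never appears in valid UTF-8.
raw_name = b"report-\xff.txt"


def is_valid_utf8(name: bytes) -> bool:
    """Return True only if the bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Treating the name as text and replacing bad bytes yields a different name.
lossy = raw_name.decode("utf-8", errors="replace").encode("utf-8")

# Smuggling the undecodable byte through keeps the original bytes intact.
preserved = raw_name.decode("utf-8", errors="surrogateescape").encode(
    "utf-8", errors="surrogateescape"
)

print("valid UTF-8:", is_valid_utf8(raw_name))
print("lossy copy still matches:", lossy == raw_name)
print("preserved copy still matches:", preserved == raw_name)
```

If the lossy copy were passed back to the filesystem, it would refer to a different file than the one that was listed.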

fwip · on June 11, 2019

"Too" slow depends on a lot of factors.

Const-me · on June 11, 2019

When you’re writing code that you’re 100% sure won’t ever become a performance bottleneck, you still care about development time. Very often, unless it’s throwaway code, you also care about the cost of support.

Writing any code at all when that code is not needed is always too slow, regardless of any technical factors.

fwip · on June 11, 2019

Very little code in this world is needed. Much of it is, however, useful.

The person you replied to obviously isn't advocating for something they find useless.

Perhaps you could have instead asked "Why do you recommend doing this? I don't understand the benefit." But instead, you decided that they're advocating to do something useless for no reason.

Const-me · on June 11, 2019

> you decided that they're advocating to do something useless for no reason.

No, I decided they’re advocating to do something harmful for no reason.

They’re advocating wasting hardware resources (as a developer I don’t like doing that) and wasting development time (as a manager I don’t like when developers do that). But worst of all, keeping UTF-8 on Windows and converting to/from UTF-16 at the WinAPI boundary is a source of bugs: the kernel doesn’t guarantee that the data you get from these APIs is valid UTF-16. Quite the opposite, it guarantees to treat it as an opaque chunk of 16-bit words.
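Here is a minimal Python sketch of that class of bug. It assumes a name made of 16-bit units that contains an unpaired surrogate, which an API treating names as opaque words can hand back; the value and the function names are invented for illustration. A strict conversion to UTF-8 fails outright, and a lossy one produces a name that no longer matches the original.

```python
# Three 16-bit units: a lone high surrogate (0xD800), then "a" and "b".
# Not valid UTF-16, but a legal sequence for an API that treats names as opaque words.
wide_name = (0xD800).to_bytes(2, "little") + "ab".encode("utf-16-le")


def strict_to_utf8(wide: bytes) -> bytes:
    """Strict conversion: raises UnicodeDecodeError on ill-formed UTF-16."""
    return wide.decode("utf-16-le").encode("utf-8")


def lossy_round_trip(wide: bytes) -> bytes:
    """Replace ill-formed units with U+FFFD, go through UTF-8, and convert back."""
    text = wide.decode("utf-16-le", errors="replace")
    return text.encode("utf-8").decode("utf-8").encode("utf-16-le")


try:
    strict_to_utf8(wide_name)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

name_survived = lossy_round_trip(wide_name) == wide_name

print("strict conversion failed:", strict_failed)
print("name survived lossy round trip:", name_survived)
```

Either outcome is a problem at the boundary: the strict path cannot represent the name at all, and the lossy path hands back a name that refers to something else.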

UTF-8 has its place even on Windows, e.g. it makes sense for some network services, and even for in-memory data when you know it’ll be 99% English, so it saves resources, and that data never hits WinAPI. But as soon as you’re consuming WinAPI, COM, UWP, the Windows shell, or any other native stuff, UTF-8 is just not good.