This seems like a very narrow case with very low risk. You would have to have UTF-16 source code of unknown provenance that you decide to load up and convert to UTF-8, and that source code would have to have some hidden exploit in it. How likely is this scenario? I would say close to zero.
You can't fix all of the bugs, nor should you try. You have to balance bug fixing with feature development.
There are multiple issues reported; only the first one appears to be UTF-16 related, and all of them are triggered simply by opening a malicious text file. The referenced conversion presumably happens eagerly so the editor can operate on UTF-8 in memory.
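To make the failure mode concrete, here's a minimal sketch of the kind of UTF-16 → UTF-8 conversion such an editor needs; it's my own illustration, not Notepad++'s actual code. The point is the sizing: a BMP character that takes two bytes in UTF-16 can need three bytes in UTF-8, so an output buffer sized as "same byte count as the input" can be overrun, and unpaired surrogates have to be handled rather than trusted.

    #include <cstddef>
    #include <cstdint>
    #include <string>

    // Illustrative sketch only -- not Notepad++'s actual conversion code.
    // A growable std::string sidesteps the fixed-buffer problem; the comment on
    // reserve() shows the sizing rule a fixed buffer would need.
    std::string utf16_to_utf8(const std::u16string& in) {
        std::string out;
        out.reserve(in.size() * 3);  // worst case: 3 UTF-8 bytes per UTF-16 code unit
        for (std::size_t i = 0; i < in.size(); ++i) {
            std::uint32_t cp = in[i];
            if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
                if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                    ++i;  // consume the paired low surrogate
                } else {
                    cp = 0xFFFD;  // unpaired high surrogate: substitute U+FFFD
                }
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                cp = 0xFFFD;      // stray low surrogate: substitute U+FFFD
            }
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }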
I think that's more severe than you suggest; it means that someone could craft malware, and all it would take to run an exploit is getting someone to view the file in Notepad++.
According to TFA, just opening the file is sufficient to trigger the buffer overflow: “Open the file in Notepad++ to hit out of bounds access with ASAN.”
Say, a *.txt file attached to an email, the opening of which in a text editor is usually considered benign.
Ideally… yes. Consider that Windows APIs use UTF-16, not UTF-8, for wide characters. Microsoft's extensions to ISO 9660 used (big endian!) UCS-2 (the precursor to UTF-16) for long filenames, and NTFS uses proper UTF-16, I believe.
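For reference, the usual way to cross that boundary on Windows is the two-call pattern with WideCharToMultiByte; this is a generic sketch of that pattern, not anything taken from Notepad++. The first call only reports the required buffer size, and WC_ERR_INVALID_CHARS makes the conversion fail on unpaired surrogates instead of substituting silently.

    #include <windows.h>
    #include <string>

    // Generic two-call Win32 pattern for UTF-16 -> UTF-8 (sketch).
    std::string to_utf8(const std::wstring& wide) {
        if (wide.empty()) return std::string();
        // First call: ask how many UTF-8 bytes are needed.
        int needed = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                         wide.data(), static_cast<int>(wide.size()),
                                         nullptr, 0, nullptr, nullptr);
        if (needed <= 0) return std::string();  // ill-formed input or other failure
        std::string out(static_cast<size_t>(needed), '\0');
        // Second call: convert into a buffer of exactly that size.
        WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                            wide.data(), static_cast<int>(wide.size()),
                            &out[0], needed, nullptr, nullptr);
        return out;
    }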
Surrogate pairs are interpreted by client software, so it’s UTF-16 in that sense. The file system just doesn’t ensure that there won’t be unpaired surrogates, or noncharacters. This is similar to strings in .NET, Java, and JavaScript.
Nor should you. Even a well-formed sequence of UTF-16 code units can be utter nonsense; there's approximately no level of abstraction between "sequence of fixed-width code units" and "run it through a full-blown font rendering stack" where it makes sense to assume your input is "well-formed".
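To be clear about how thin that property is: "well-formed" for UTF-16 only means the surrogates pair up. A check like the following (my own sketch, not from any particular codebase) can pass and the text can still be meaningless.

    #include <cstddef>
    #include <string>

    // True iff every high surrogate is immediately followed by a low surrogate
    // and no low surrogate appears on its own. Says nothing about meaning.
    bool is_well_formed_utf16(const std::u16string& s) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            char16_t u = s[i];
            if (u >= 0xD800 && u <= 0xDBFF) {        // high surrogate
                if (i + 1 >= s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                    return false;                    // unpaired high surrogate
                ++i;                                 // skip the paired low surrogate
            } else if (u >= 0xDC00 && u <= 0xDFFF) {
                return false;                        // stray low surrogate
            }
        }
        return true;
    }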
You are right: NTFS exclusively runs on UCS-2, which is now effectively "UTF-16 without validation". Back in the 1980s, "Unicode" referred to a way to represent the ISO standard "Universal Character Set", i.e. UCS-2, instead of being synonymous with the entire character set as it is today. The wisdom of i18n back then was to use the "Unicode" text encoding because it covered the known universe of writing. Qt made a similar choice.