
This seems like a very narrow case with very low risk. You would have to have UTF-16 source code of unknown provenance that you decide to load up and convert to UTF-8, and that source code would have to have some hidden exploit in it. How likely is this scenario? I would say close to zero.

You can't fix all of the bugs, nor should you try. You have to balance bug fixing with feature development.




> You can't fix all of the bugs, nor should you try. You have to balance bug fixing with feature development.

Using that to justify not fixing specific critical bugs is silly.

A person reporting a CVE that allows arbitrary code execution is not saying "you should fix all the bugs", they're saying "This bug is important".

You can and should strive to fix all clearly-reported show-stopper security bugs. Which this is.


There are multiple issues reported. Only the first one appears to be UTF-16 related, and all of them can be triggered by simply opening a malicious text file. The referenced conversion presumably happens eagerly so the editor can operate on UTF-8 in memory.

I think that's more severe than you suggest; it means that someone could craft malware, and all it would take to run the exploit is getting someone to view the file in Notepad++.


According to TFA, just opening the file is sufficient to trigger the buffer overflow: “Open the file in Notepad++ to hit out of bounds access with ASAN.”

Say, a *.txt file attached to an email, the opening of which in a text editor is usually considered benign.


Ideally… yes. Consider that Windows APIs use UTF-16, not UTF-8, for wide characters. Microsoft's extensions to ISO 9660 used (big endian!) UCS-2 (the fixed-width predecessor of UTF-16) for long filenames; NTFS uses proper UTF-16, I believe.


I think NTFS is UCS-2; that is why WTF-8 was invented:

https://simonsapin.github.io/wtf-8/


Surrogate pairs are interpreted by client software, so it’s UTF-16 in that sense. The file system just doesn’t ensure that there won’t be unpaired surrogates, or other noncharacters. This is similar to strings in .NET, Java, and JavaScript.
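
To make the unpaired-surrogate point concrete, here is a minimal Python sketch. It is my own illustration, not something from the article: the byte string is a made-up example and `is_well_formed_utf16le` is a hypothetical helper name. It shows that strict decoding rejects a lone surrogate, while a lenient decode keeps it and re-encodes it in the 3-byte form that WTF-8 also uses for lone surrogates.

```python
# A lone high surrogate (0xD83D) followed by 'A', as UTF-16LE code units.
# This byte string is a made-up example, not taken from the article.
raw = b"\x3d\xd8\x41\x00"


def is_well_formed_utf16le(data: bytes) -> bool:
    """Return True only if `data` decodes as strict UTF-16LE.

    Hypothetical helper: strict decoding rejects unpaired surrogates
    and truncated code units.
    """
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


# "surrogatepass" keeps the lone surrogate instead of raising, which is
# roughly what a layer that does no validation ends up storing.
lenient = raw.decode("utf-16-le", errors="surrogatepass")

# Encoding the same way yields a 3-byte sequence for the lone surrogate,
# which strict UTF-8 forbids.
wtf8_like = lenient.encode("utf-8", errors="surrogatepass")

print(is_well_formed_utf16le(raw), wtf8_like)
```

Anything that receives "UTF-16" from a source that doesn't validate has to pick one of these two behaviors (reject, or pass the surrogate through) on purpose.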


Which unfortunately means you can't rely on it being UTF-16.


Nor should you. Even a well-formed sequence of UTF-16 code points can be utter nonsense; there's approximately no level of abstraction between "sequence of fixed-width code units" and "run it through a full-blown font rendering stack" where it makes sense to assume your input is "well-formed".


You are right: NTFS exclusively runs on UCS-2, now "UTF-16 without validation". Back in the 1980s "Unicode" referred to a way to represent the ISO standard "Universal Character Set", i.e. UCS-2, instead of being synonymous with the entire character set like today. The wisdom of i18n back then was to use the "Unicode" text encoding because it covered the known universe of writing. Qt made a similar choice.


Actually, you can manifest your Windows app as UTF-8 aware and all the Win32 "-A" (formerly ASCII only) APIs switch to handling UTF-8 :3

This is a feature of Windows 10+ https://learn.microsoft.com/en-us/windows/apps/design/global...


Or Notepad++ could just use the WinAPI (WideCharToMultiByte) to convert UTF-16 to UTF-8 instead of hand-writing its own conversion routine.
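
For anyone wondering what a hand-written routine has to get right, here is a rough Python sketch of the logic. To be clear, this is not Notepad++'s code and not how WideCharToMultiByte is implemented; `utf16_units_to_utf8` is a hypothetical name, and replacing unpaired surrogates with U+FFFD is one policy choice among several. The two things it illustrates are sizing the output for the worst case (3 UTF-8 bytes per UTF-16 code unit) and never reading a second surrogate without checking that it exists.

```python
from typing import Sequence


def utf16_units_to_utf8(units: Sequence[int]) -> bytes:
    """Convert UTF-16 code units to UTF-8 with an explicit output bound.

    Hypothetical helper for illustration; unpaired surrogates become U+FFFD.
    """
    # Worst case is 3 UTF-8 bytes per UTF-16 code unit: a BMP unit needs at
    # most 3 bytes, and a surrogate pair (2 units) needs 4.
    out = bytearray(3 * len(units))
    n = 0  # number of bytes written so far
    i = 0
    while i < len(units):
        u = units[i]
        i += 1
        if (0xD800 <= u <= 0xDBFF and i < len(units)
                and 0xDC00 <= units[i] <= 0xDFFF):
            # Valid pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute rather than read past the end.
            cp = 0xFFFD
        else:
            cp = u
        encoded = chr(cp).encode("utf-8")
        # Check before every write; this is the check that is easy to miss
        # in a hand-rolled C routine.
        if n + len(encoded) > len(out):
            raise OverflowError("output buffer too small")
        out[n:n + len(encoded)] = encoded
        n += len(encoded)
    return bytes(out[:n])


print(utf16_units_to_utf8([0x48, 0x69, 0xD83D, 0xDE00]))
```

With WideCharToMultiByte you can skip the worst-case estimate: calling it with an output size of zero returns the required buffer size, so the usual pattern is one call to measure and a second call to convert.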



