On “Trojan Source” Attacks (swtch.com)
75 points by smasher164 on Nov 2, 2021 | 10 comments



I was hit by a similar bug back in Microsoft Visual Studio 6. The source code at the company I worked for had mixed CR/LF line endings throughout (we were switching from IRIX to Windows, and some stubborn folks kept using vi/emacs). In this one instance, the MSVS6 editor (and its syntax highlighting) displayed a comment followed by a line of code... but the MSVS6 compiler saw a single comment, with the "code" just being more characters on that same comment line. Unfortunately, the comment and line of code were something like this:

// don't forget to skip formatting the hard drive format_hard_drive = false;

So, yes, the code in question formatted the user's hard drive. And looking at the code in the Visual Studio editor, you'd swear it was correct.

(The real logic was not that blatantly dangerous, but it amounted to the same thing.)
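
A minimal sketch of the mechanism (my reconstruction, not the original code; the comment text and byte layout are made up): the same bytes split into two lines for a tool that honors a bare CR as a line break, but into one long comment line for a tool that only ends lines on LF.

    # Python sketch: the same bytes look like two lines to a tool that treats a
    # bare CR ("\r") as a line break, but like one long comment line to a tool
    # that only ends lines on LF.  The comment text here is illustrative.
    src = b"// don't forget to skip formatting the hard drive\rformat_hard_drive = false;\r\n"

    # "Editor" view: CR, LF, and CRLF all end a line (bytes.splitlines does this).
    print(src.splitlines())
    # [b"// don't forget to skip formatting the hard drive", b'format_hard_drive = false;']

    # "Compiler" view: only LF ends a line, so the assignment is swallowed by the comment.
    print(src.split(b"\n"))
    # [b"// don't forget to skip formatting the hard drive\rformat_hard_drive = false;\r", b'']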


Way to blame a bug in your editor on someone else's.

Emacs autoconverts between the two line-ending styles on open and save, unless you expressly tell it to do otherwise [1]. You had malicious editors using Emacs. The editor wasn't the problem; the people using it were, and they had to go out of their way to manage it.

1. https://www.emacswiki.org/emacs/EndOfLineTips#:~:text=(The%2....


The editor showed the behavior that we all intended. The compiler was the one that behaved differently from how we all expected. And we all used the same compiler. I was not "blaming someone else's editor," I was describing the situation. If someone created a file in emacs, and then someone else edited it in MSVS6, then apparently you could end up with some mixed CR, CR/LF, LF, LF/CR state, enough to confuse the compiler.

PS - We all benefit when we give each other the benefit of the doubt, and hold off on the snark. And calling my co-workers "malicious" based on your limited understanding of the situation is not appreciated.


I don't know what date "Visual Studio 6" implies, but I'd been using IRIX for a few years before GNU Emacs 20 and its coding conversion existed; I can't speak for anything vi-le. Anyway, it's not uncommon to find mixed line endings even these days (which Emacs indicates).


It's not fair to the researchers to discount their contribution because prior knowledge of their findings exists. Publishing and evangelizing are important too. There are a lot of security weaknesses that simply can't be addressed without changing the social reality, which is what the people who built that slick website did.

For example, no one cared about supply-chain attacks (e.g. depending on thousands of Node packages written by strangers on the Internet) until America's nukes started getting hacked because of it. No one cared about the BGP hack until someone told everyone at DEF CON how the trust inherent in how Internet engineering operates could be exploited. It goes on.


I think the prior knowledge is incidental, which the essay reflects by confining it to a single quick section. What's really relevant is all the other stuff.

At the moment it's academically interesting, not a "critical" vulnerability as so many people are rating it. Exploiting it is riskier and less likely to work than it is being presented as. If I were trying to slip something into a code base today, this would not be at the top of my list. Personally, I'd stick to already-existing, plausibly deniable issues like off-by-one errors, subtle misspellings of string constants, variable shadowing, or any number of other things.

Sure, by all means, take some steps to address this, but you can treat it as a medium- or even low-priority vulnerability. It's not a reason to panic-upgrade or to panic-release half-assed, poorly-thought-out fixes. I'd rather see some carefully thought-out Unicode policies in a couple of months than a hack fix now and people thinking the problem is solved. For instance, it's not hard to write code that looks at the category of each character and points out characters that don't match the category of the characters around them, highlighting homograph attacks. There's time to think this through. The priority is not "critical".
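
A rough sketch of that kind of check (my own illustration, not anything from the essay; it uses the first word of each character's Unicode name as a crude stand-in for the real Script property, and flags identifiers that mix scripts):

    import re
    import unicodedata

    def char_script(ch):
        # Crude script guess: the first word of the Unicode name ("LATIN",
        # "CYRILLIC", "GREEK", ...).  Digits and underscore are treated as neutral.
        if ch in "_0123456789":
            return None
        try:
            return unicodedata.name(ch).split()[0]
        except ValueError:
            return "UNKNOWN"

    def mixed_script_identifiers(source):
        # Yield identifiers whose characters don't all come from one script.
        for ident in re.findall(r"[^\W\d]\w*", source):
            scripts = {s for s in map(char_script, ident) if s is not None}
            if len(scripts) > 1:
                yield ident, scripts

    # 'sсope' below hides a Cyrillic 'с'; in most fonts it looks identical to 'scope'.
    for ident, scripts in mixed_script_identifiers("scope = 1\nsсope = 2\n"):
        print(f"suspicious identifier {ident!r}: mixes {sorted(scripts)}")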


> carefully-thought out Unicode policies

Like https://www.unicode.org/reports/tr39/tr39-24.html ?

The phrase ‘It's not difficult’ is probably rarely applicable in the context of multilingual text — notice the possible homographs!

My simple-minded approach was to take the Unicode TR's list of "confusables" and make an Emacs display table indicating all the possible non-ASCII characters that could occur in a homograph sequence, assuming mainly ASCII text. But that is simple-minded, and you don't always want it on.
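
For what it's worth, the data-collection half of that (minus the display table) fits in a few lines of Python; this assumes a local copy of TR39's confusables.txt and its "source ; target ; type # comment" layout:

    import unicodedata

    def ascii_confusables(path="confusables.txt"):
        # Collect non-ASCII characters that confusables.txt maps to an all-ASCII
        # skeleton, i.e. the ones that could pass for ASCII in mostly-ASCII text.
        suspects = set()
        with open(path, encoding="utf-8-sig") as f:
            for line in f:
                line = line.split("#", 1)[0].strip()  # drop comments and blanks
                if not line:
                    continue
                fields = [field.strip() for field in line.split(";")]
                if len(fields) < 2:
                    continue
                src_cps = fields[0].split()
                if len(src_cps) != 1:
                    continue  # keep it simple: single-code-point sources only
                ch = chr(int(src_cps[0], 16))
                skeleton = "".join(chr(int(cp, 16)) for cp in fields[1].split())
                if ord(ch) > 0x7F and all(ord(c) < 0x80 for c in skeleton):
                    suspects.add(ch)
        return suspects

    def report(text, suspects):
        # Point out characters that could pass for ASCII in mostly-ASCII text.
        for offset, ch in enumerate(text):
            if ch in suspects:
                print(f"offset {offset}: U+{ord(ch):04X} {unicodedata.name(ch, '?')}")

    # e.g. report(open("main.c", encoding="utf-8").read(), ascii_confusables())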

»¿ʇı̣ əsnqɐ ʇ,uɐɔ noʎ ɟı̣ ɓuı̣ɥʇʎuɐ sı̣ pooɓ ʇɐɥM« — anon


Yes.

I don't mean that a full solution is easy. That's AI-hard in the limit. But it's not hard to detect most of the bad things. Most of the homographs are not right next to each other in Unicode.


I think the larger problem is this weird notion that everything needs to be fed through an interpreter - including Unicode. People, please stop creating infrastructure where data can also serve as code.


I agree - and credit to the research team: they certainly got everyone excited and moving on this. Even Atlassian released a "high severity" advisory over it.



