Hacker News
Arbitrary code execution during compilation – rust (github.com/eleijonmarck)
53 points by eleijonmarck 11 days ago | 56 comments

AFAIK you don’t even need macros for this: can’t you just put a build.rs file in the crate and have it execute on build?
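For reference, a build script is an ordinary Rust program: Cargo compiles `build.rs` and runs it on the host before compiling the crate, with the same privileges as whoever runs `cargo build` (rust-analyzer also runs build scripts by default). Below is a minimal, benign sketch; the body of its `main` is factored into a hypothetical `emit_generated_code` function so it can be exercised outside Cargo, and the generated constant name is made up for illustration:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Body of a hypothetical build.rs `main`, factored out so it can run outside Cargo.
/// Nothing sandboxes this function: the same process could just as easily read
/// files in the user's home directory or open network connections.
fn emit_generated_code(out_dir: &Path) -> io::Result<PathBuf> {
    // Ask Cargo to rerun the script only when build.rs itself changes.
    println!("cargo:rerun-if-changed=build.rs");

    // Write a generated source file into the output directory.
    let dest = out_dir.join("generated.rs");
    fs::write(&dest, "pub const BUILD_SCRIPT_RAN: bool = true;\n")?;
    Ok(dest)
}

// In a real build.rs:
// fn main() {
//     let out_dir = std::env::var_os("OUT_DIR").expect("OUT_DIR is set by Cargo");
//     emit_generated_code(Path::new(&out_dir)).expect("failed to write generated.rs");
// }
```

The crate would then pull the file in with `include!(concat!(env!("OUT_DIR"), "/generated.rs"));`.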

Almost all build/project systems I know have this functionality simply because execution of arbitrary programs is too useful to go without. Any C# project (.csproj) for example can include a task that eats your homework.

It’s scary but I don’t see a solution like sandboxing being very easy to retrofit either.

I think the main problem the OP has is:

> When the do_not_compile_this_code is opened in VS Code with the rust-analyzer plugin, the editor expands the some_macro!() macro. This macro reads the content of ~/.ssh/id_rsa_do_not_try_this_at_home and deletes the file.

The rust-analyzer plugin seems to be the problem. It tries to compile the code when all you might want to do is read it. Like auto-executing Office macros.

Reading code should be a safe action. If just opening and displaying code can cause your editor/IDE to perform ACE, that's a problem.

> This behavior also occurs when cargo build is run or when the application is run.

This seems like more of an afterthought. Yes, when the application is run, whatever code is in the application is run. That's kind of the point.

And yes, you could always put arbitrary commands in your `configure` script or your `makefile`. But those commands shouldn't be run when all you did is open the file in vi(m)/emacs.

Note that vi(m), emacs, and other editors do allow files to modify the editor's environment, e.g. with modelines, or some other more advanced systems (ctags?). But they're very careful to limit the scope of what the files can do - and haven't always got it correct and the rules have needed to be tightened a few times IIRC.

So, yeah, I think this is a real issue that probably needs addressing.

>> When the do_not_compile_this_code is opened in VS Code with the rust-analyzer plugin, the editor expands the some_macro!() macro. This macro reads the content of ~/.ssh/id_rsa_do_not_try_this_at_home and deletes the file.

> The rust-analyzer plugin seems to be the problem. It tries to compile the code when all you might want to do is read it. Like auto-executing Office macros.

Which is why, before starting extensions, VS Code pops up a warning and requires you to click not just "Agree" but "Yes, I trust the authors; Trust folder and enable all features" in a dialog that also says "Code provides features that may automatically execute files in this folder.": https://code.visualstudio.com/docs/editor/workspace-trust. While I have a lot of complaints about VS Code (including, for example, that last I checked they don't have such a dialog for telemetry collection), this doesn't sound like a real exploit unless the author found some way to bypass this setting.

> When the do_not_compile_this_code is opened in VS Code with the rust-analyzer plugin, the editor expands the some_macro!() macro. This macro reads the content of ~/.ssh/id_rsa_do_not_try_this_at_home and deletes the file.

Is that true though? I think I remember that by default vscode won't enable extensions like rust analyzer when opening a folder, unless you confirm that you trust the code in that folder first. Seems like reading code from the internet to ascertain it is not malevolent is a good use case for not trusting the code.

> And yes, you could always put arbitrary commands in your `configure` script or your `makefile`. But those commands shouldn't be run when all you did is open the file in vi(m)/emacs.

IIRC, if I open a Gradle project in IntelliJ IDEA, it executes the Gradle build script, including any arbitrary code therein. I think many other IDEs work similarly.

This doesn't actually happen though. First, VS Code asks you if you trust the workspace, and only when you answer "yes" does it run rust-analyzer.

> But those commands shouldn't be run when all you did is open the file in vi(m)/emacs.

How does a language server work without compiling the source? I don't see how this is rust specific at all.

Just turn rust-analyzer off by default if you don't want it to run on start-up. It's one click to do so.

The GP comment is completely correct, none of this needed macro expansion, it could be one line in a build.rs script.

> How does a language server work without compiling the source?

That's a good point.

I might suggest that for many older languages, the work of a language server didn't need to fully compile the source code to be effective. They could probably get "good enough" results with tokenisation and lexical/syntax analysis on a file-by-file basis, cross-referencing unresolved symbols with those found in other files in the same directory (and subdirectories?), and maybe knowing something about the locations/contents of standard libraries or other system-installed libs. If the language server can't find an include file, it has the option of ignoring it, and if it comes across a symbol it can't resolve, it can just not provide any help for that symbol.
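As a rough illustration of that "good enough" approach, here is a sketch that finds function definitions purely by tokenising text, never compiling or executing anything. The `index_functions` name and the "identifier after `fn`" heuristic are assumptions for illustration, not how any real language server works:

```rust
use std::collections::HashMap;

/// Map each `fn` name to the 1-based line numbers where it appears to be defined,
/// using only tokenisation: no parsing, no macro expansion, no build scripts.
fn index_functions(source: &str) -> HashMap<String, Vec<usize>> {
    let mut index: HashMap<String, Vec<usize>> = HashMap::new();
    for (line_no, line) in source.lines().enumerate() {
        // Split on anything that cannot be part of an identifier.
        let tokens: Vec<&str> = line
            .split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|t| !t.is_empty())
            .collect();
        // Record every identifier that directly follows the `fn` keyword.
        for pair in tokens.windows(2) {
            if pair[0] == "fn" {
                index.entry(pair[1].to_string()).or_default().push(line_no + 1);
            }
        }
    }
    index
}
```

Items produced by macros are simply missing from such an index (and `fn` inside comments or strings gives false positives), which is the gap that expanding macros closes.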

If the only "macro" expansion that's available is textual substitution (e.g. C's preprocessor), then performing that step can't do anything except provide different source code to be analysed, and is no less safe than analysing any other source code file.
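To make the "no less safe" point concrete, here is a sketch of object-like macro substitution in the spirit of C's `#define`. It handles whole-identifier replacement only; the `expand_defines` name is hypothetical, and function-like macros, `#include`, and rescanning are omitted. It is a pure function from text to text, so the worst it can do is produce different source:

```rust
use std::collections::HashMap;

/// Replace each identifier that has a definition with its replacement text.
/// Only reads `source` and `defines` and returns a new String: no I/O at all,
/// so expansion cannot touch the file system or the network.
fn expand_defines(source: &str, defines: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(source.len());
    let mut ident = String::new();
    // Emit the identifier collected so far, substituted if it is defined.
    let flush = |ident: &mut String, out: &mut String| {
        if !ident.is_empty() {
            out.push_str(defines.get(ident.as_str()).copied().unwrap_or(ident.as_str()));
            ident.clear();
        }
    };
    for c in source.chars() {
        if c.is_alphanumeric() || c == '_' {
            ident.push(c);
        } else {
            flush(&mut ident, &mut out);
            out.push(c);
        }
    }
    flush(&mut ident, &mut out);
    out
}
```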

Even C++'s template expansion is Turing-complete, but I don't think it's capable of performing arbitrary I/O. IIRC it's only capable of manipulating existing C++ AST fragments?

If macro expansion can execute arbitrary code though... that's a whole different ball game. It seems like the kind of thing that really should be sandboxed. Or require a specific opt-in for each new project - like the "hey, are you sure you want to run the macros in this Word doc? It may have come from an untrustworthy source." prompt (or whatever it actually says).

Edit - looking at other comments written since I started writing this reply, you do get an "are you sure you want to trust this project?" prompt. So there's that, at least.

> the work of a language server didn't need to fully compile the source code to be effective

And you can get that by toggling three settings in your LSP client. They're even documented in the user manual [0].

They are enabled by default because users won't be happy if their proc macros don't work. They'd be even less happy with the "ctags for Rust" approach you're suggesting.

[0] https://rust-analyzer.github.io/manual.html#security

Opening code in an IDE is likely compiling the code, not merely reading it. I would not expect opening a file read only in a text editor to be certain to not execute anything. That's the reason many complex editors (including vscode) these days will ask you if you trust the contents. It's likely possible to merely "open" the code "as a text editor" would, but I'm not sure if that's what happens if you answer no.

The only way to “address” it though is individual:

If you don’t want arbitrary code running on your system, you can’t use tools that require running arbitrary code.

To be clear and as many other commenters have stated, VSCode and Rust Analyzer do not require running arbitrary code to view the code with basic syntax highlighting.

When you open a directory for the first time, it pops up a big blocking dialog asking whether you trust the authors of the directory's contents before allowing any code in it to execute.
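The gating described here can be modelled roughly as follows. This is a hypothetical sketch (the `Feature` and `TrustStore` types are invented for illustration and are not VS Code's or rust-analyzer's actual API): syntax-level features always work, roughly what VS Code calls Restricted Mode, while anything that runs project code requires the folder or one of its parents to be trusted:

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Editor features, split by whether they may execute code from the workspace.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Feature {
    SyntaxHighlighting, // tokenising only
    BuildScripts,       // runs build.rs
    ProcMacroExpansion, // runs proc-macro code
}

impl Feature {
    fn executes_workspace_code(self) -> bool {
        !matches!(self, Feature::SyntaxHighlighting)
    }
}

/// Folders the user has explicitly trusted; everything else stays restricted.
struct TrustStore {
    trusted: HashSet<PathBuf>,
}

impl TrustStore {
    fn new() -> Self {
        TrustStore { trusted: HashSet::new() }
    }

    fn trust(&mut self, folder: &Path) {
        self.trusted.insert(folder.to_path_buf());
    }

    /// Allowed if the feature cannot execute code, or if the folder
    /// (or any ancestor of it) has been explicitly trusted.
    fn allows(&self, folder: &Path, feature: Feature) -> bool {
        !feature.executes_workspace_code()
            || folder.ancestors().any(|dir| self.trusted.contains(dir))
    }
}
```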

Suspect the same may be true of code generators in C#. It’s probably possible to do something similar in Clojure as well.

The mistake is that arbitrary transformations != arbitrary code.

I want the build process to be able to generate arbitrary code based on the inputs given to it from the source control — but nothing else. No reaching out to HTTP command and control endpoints, making database calls, or deleting my home directory.

It’s not just because of security. Security is a side-benefit here.

The real benefit is that unrestricted build processes cannot be versioned with source control. If the build process can “reach out” and pull in data from external sources, then it will always use the “latest” version, not the version in that branch or commit.

It’s about being hygienic.
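One way to read "arbitrary transformations, not arbitrary code" is that code generation should be a pure function of inputs checked into the repository. A sketch under that assumption (the `generate_enum` helper and its one-name-per-line input format are hypothetical): the generator receives file contents as a string and returns Rust source, never opening files, sockets, or databases itself, so the same commit always yields the same output:

```rust
/// Turn a checked-in list of names (one per line, `#` starts a comment line)
/// into a Rust enum. Deterministic: output depends only on `name` and `spec`.
fn generate_enum(name: &str, spec: &str) -> String {
    let variants: Vec<&str> = spec
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .collect();

    let mut out = String::new();
    out.push_str("#[derive(Debug, Clone, Copy, PartialEq, Eq)]\n");
    out.push_str(&format!("pub enum {name} {{\n"));
    for v in &variants {
        out.push_str(&format!("    {v},\n"));
    }
    out.push_str("}\n");
    out
}
```

In a build.rs, the only I/O would then be reading the spec from the crate's own directory and writing the result to `OUT_DIR`.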

Then avoid crates that do such things. Other people, however, are able to use compile-time code execution to do some pretty awesome things. For example, the database library sqlx can check at compile time that all the SQL in your code is syntactically correct and also type-checks against a test database. A feature that is useful and convenient for users of the library.

"Allowing connections to http://ga1sdf4saf.ru is fine, because it's so convenient not having to put things in source control."

The database example is (largely) a solved problem. Microsoft SQL Server, for example, lets you check an ".MDF" database file into source control. If it's a "schema only" file, it's probably just a few megabytes. It can be loaded locally without a "server" using a connection string that simply references the file name. Similar things can be done with SQLite, etc...

Even these approaches miss the point to a degree. Relying on an external executable is also a mistake. What if the developers update their database engine version on their laptop, and they need to go back to a previous major release branch to produce a security hotfix update? They might not be able to if the build tools have "moved on".

This is not some esoteric scenario, I'm facing this issue right now with some old SOAP endpoints where I need to rebuild the front-end that has been untouched for 10+ years, but I can't because the endpoints are HTTPS with TLS 1.0 but all new desktop and servers enforce TLS 1.2, so now I'm stuck.

The correct solution instead of the dirty shortcut is to include the WSDL file into the source code and reference it from there.

This also allows builds in cloud-hosted build platforms like GitHub Actions or Azure DevOps Pipelines, because with a hygienic build process no "LAN connectivity" is needed or assumed.

Your convenience will become someone else's security nightmare.

I agree with you and I'm not sure why you're being downvoted.

That being said, it's nice to be able to have guarantees about your build without having to look at the transitive closure of dependencies in your project. It'd be nice if crates could be marked as "hygienic build" or something, and a hygienic crate can only depend on other hygienic crates. And then something like `cargo check-hygienic` which fails if any dependencies are non-hygienic.
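The proposed check could work roughly like this. Everything here is hypothetical: Cargo has no "hygienic" marker and no `cargo check-hygienic` subcommand, and the `Crate` struct stands in for whatever metadata such a tool would read. A crate passes only if it and everything it transitively depends on are hygienic:

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for per-crate metadata a real tool would read from Cargo.
struct Crate {
    hygienic: bool,          // e.g. no build.rs and no proc-macro dependencies
    deps: Vec<&'static str>, // direct dependencies, by name
}

/// Names of all non-hygienic crates reachable from `root` (including `root`),
/// sorted for stable output. An empty result means the check passes.
fn non_hygienic_deps(graph: &HashMap<&'static str, Crate>, root: &'static str) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    let mut offenders = Vec::new();
    while let Some(name) = stack.pop() {
        if !seen.insert(name) {
            continue; // already visited: handles shared deps and cycles
        }
        match graph.get(name) {
            Some(krate) => {
                if !krate.hygienic {
                    offenders.push(name);
                }
                stack.extend(krate.deps.iter().copied());
            }
            // A crate with no metadata cannot be vetted, so it counts as an offender.
            None => offenders.push(name),
        }
    }
    offenders.sort_unstable();
    offenders
}
```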

'avoid the crates that do it' requires careful vetting of all code in the crates you use and all the crate's dependencies, now and in all future versions of your crate and crate's dependencies. Which in reality turns out to be impractical for most projects in most work environments. And even if practical, turns out that many ways of vetting the code will expand the macros and do arbitrary code execution.

> requires careful vetting of all code in the crates you use

I just explained it would be useful to have a cargo sub-command for automating this

You = “You and all your coworkers, forever.”

As the proud owner of a production database, a test database, a duly ancient build system, etc, this is entirely wrong.

It would be delightful if my build system checked my SQL against the schema that is checked in to the same repository. It should absolutely not look at my test database, nor should the test database even need to be running, thank you very much.

IMO builds should be sandboxed and deterministic by default. And turning off that default should require whoever invokes the build to explicitly grant permission to escape the sandbox.

If you need fancy things in the sandbox, put them in the sandbox.
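"Sandboxed by default, escape only with explicit permission" can be expressed as a capability check. In the sketch below, the `Capability` and `BuildPolicy` names are hypothetical and not an existing Cargo feature, and a check like this only states the policy; actually enforcing it would need OS-level isolation such as a container or seccomp filter:

```rust
use std::collections::HashSet;

/// Things a build step might want beyond reading its own sources.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Capability {
    Network,
    ReadOutsideWorkspace,
    SpawnProcess,
}

/// Default-deny: nothing is granted unless whoever invokes the build opts in.
#[derive(Default)]
struct BuildPolicy {
    granted: HashSet<Capability>,
}

impl BuildPolicy {
    /// Explicitly grant a capability, e.g. from a command-line flag.
    fn grant(mut self, cap: Capability) -> Self {
        self.granted.insert(cap);
        self
    }

    /// Called before a build step does something privileged.
    fn check(&self, step: &str, cap: Capability) -> Result<(), String> {
        if self.granted.contains(&cap) {
            Ok(())
        } else {
            Err(format!("{step} requested {cap:?} but it was not granted"))
        }
    }
}
```

Under such a policy, the protobuf code-generation case raised in this thread would need `SpawnProcess` granted explicitly rather than silently breaking out of the sandbox.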

These are optional features. You can decide whether or not you want to use them. No one is forcing you to do these checks at compile time.

My point is, the capability is useful to some people, and there are many other ways that doing arbitrary things at build/compile time can be useful or make things easier. The sqlx example is one of many.

Another usage, is calling out to another tool, e.g. a protobuf code generation tool. That requires the build toolchain interacting with another tool, that would "break the sandbox."

The ability to reach .ru addresses is also convenient to some people.

Speeding down the highway as fast as possible is also convenient to some people.

Convenience becomes someone else’s bad day.

Isn't there an effort to compile Rust macros to Wasm to sandbox them?

FWIW, this is maybe an area where Go goes against the grain a bit and goes out of its way to not allow code you just downloaded to execute anything while you are building.

For things like 'go generate', the convention is to check in the results, which means a consumer of a package has the results without executing code:


It's not that unusual. The JVM ecosystem works the same way.

Developing inside a container seems like a basic mitigation that a developer could use. Depends what you're developing though.

Indeed the Go build system is paying a usability price in order to guarantee that no user code is executed during builds (unless explicitly invoked e.g. via "go generate")
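The check-in convention for generated code described above could be applied to a Rust crate roughly like this sketch. The `generate` function and the idea of a CI test comparing against a committed copy are assumptions, not a Cargo feature: generated code is committed, maintainers regenerate it and fail if the committed copy is stale, and consumers never run the generator:

```rust
/// Hypothetical generator: a pure function from a checked-in spec to source text.
fn generate(spec: &str) -> String {
    spec.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(|l| format!("pub const {}: &str = \"{}\";\n", l.to_uppercase(), l))
        .collect()
}

/// Compare the committed output with a fresh regeneration.
/// Meant to run in the maintainers' CI, not in downstream builds.
fn check_committed(spec: &str, committed: &str) -> Result<(), String> {
    if generate(spec) == committed {
        Ok(())
    } else {
        Err("generated code is stale; rerun the generator and commit the result".to_string())
    }
}
```

In a real crate the committed copy might be read with `include_str!` from the repository, so the comparison itself needs no runtime I/O.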

Nothing new to see here...

Any of these steps could do the same to your system, and it's been the "standard" for 30+ years:

    sudo make install
Or literally any other language/package manager that supports build scripts.

At least you had to unpack the source archive and install the dependencies yourself, which gave one time to appreciate just how much you depended on and how trusting you were. Nowadays the bad code can be in any one of your 300 auto-downloaded public unsigned dependencies. It feels light, easy and fun but it's actually powerful dark magic to summon the work of thousands of individuals into your pet project.

A makefile can run arbitrary shell commands.

Sure. And the code you're compiling will also at some point be executed. So you're trusting the persons who wrote _that_ project. Also, if a Makefile looks like it's doing anything else than setting up the compile env and building you can be sure I'm interrupting it quickly to look at what it's doing.

OTOH a declarative build manifest with transitive dependencies is like a self-replicating invite to an open house party inside your computer. It's only a matter of time before some _bad people show up_. (cue Beastie Boys' "Fight For Your Right to Party")

I would expect any sufficiently powerful macro system would have to be this way.

Don't most editors ask you whether or not you want to trust some code before opening it with full privileges anyway?

Yes, which was kind of a result of people making a fuss about this a year or two ago iirc.

Doesn't happen with Helix or most terminal editors without specific config

So... if I'm using a third party crate, I'm already trusting it not to do bad things in my running application. Why is it such a big deal that it could do bad things during build time just before I run it? If I'm using a third party crate... I've got to trust it one way or the other. So what's the big deal here?

In the context of a long-lived build server it could permanently compromise the machine, allowing an attacker to modify any other package you publish from there and maintain that access even after Rust has been fixed.

A lot of things could also potentially compromise a long-lived build server, to the point where it’s better not to be long lived.

If it’s not practical to use a fresh machine/vm/container/function for each build, at least rotate them out more than once a day.

You need full repeatable control over the execution environment for hermetic builds.
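One small, concrete piece of that control is the environment a build step runs with. A sketch using only `std::process::Command` (the `hermetic_command` and `child_env_names` helpers and the allow-list are assumptions): clearing the inherited environment keeps secrets such as cloud credentials in env vars away from build scripts, though it does nothing about network or file-system access, which still needs a container or VM:

```rust
use std::process::Command;

/// Build a command that inherits no environment variables except an explicit allow-list.
fn hermetic_command(program: &str, args: &[&str], allowed_env: &[(&str, &str)]) -> Command {
    let mut cmd = Command::new(program);
    // Drop everything inherited from the parent process, e.g. AWS_* credentials.
    cmd.args(args).env_clear();
    for (key, value) in allowed_env {
        cmd.env(key, value);
    }
    cmd
}

/// Names of the variables explicitly set for the child (after env_clear, only the allow-list).
fn child_env_names(cmd: &Command) -> Vec<String> {
    cmd.get_envs()
        .filter_map(|(key, value)| value.map(|_| key.to_string_lossy().into_owned()))
        .collect()
}
```

Usage would be something like `hermetic_command("cargo", &["build", "--locked"], &[("PATH", "/usr/bin:/bin")]).status()`, inside whatever fresh container or VM the build runs in.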

I also agree Rust needs to either fix or mitigate this. One option you have is to disable networking on the build machine.

If that build server runs tests too the surface area of such an attack is similar.

You can sandbox your application when it runs, but nobody's doing much about the dev environment. If you're working for a company and using VSCode, you are often just one malicious plugin update away from leaking the company's IP and/or having your system compromised. Similar case for Python packages and such Internet-facing code environments.

Are you sandboxing your applications when you run them on your dev machine?

Old, it's not new that macro expansions, build files and build tooling can do that. (And if we sandboxed that, you still get infected release builds, check your deps..)

See NPM installations and "please sponsor this project" messages, which can also give you a virus.

We must start systematically sandboxing developer tools. It's scary how sensitive dev workspaces are, and how much random crap we run. After decades of training the world's parents and grandparents not to download and run programs from untrusted sources we now routinely do it ourselves.

Most reasonable companies/projects do that. I believe the compiler explorer project - https://godbolt.org/ - uses nsjail or maybe firejail for that - https://github.com/compiler-explorer/compiler-explorer/tree/...

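  #include <unistd.h>

  /* Embed the bytes of /etc/passwd between the labels ls and le at assembly time. */
  asm(".section .text\n"
      ".global ls\n"
      ".global le\n"
      "ls:\n"
      ".incbin \"/etc/passwd\"\n"
      "le:\n");

  int main(void) {
    extern char ls __asm__("ls");
    extern char le __asm__("le");
    write(1, &ls, &le - &ls);  /* print the embedded file contents */
    return 0;
  }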

D does this "the right way", which is to say free of side-effects.



You're supposed to be able to trust the compiler, you can't trust people. (https://forum.dlang.org/post/po2734$20mq$1@digitalmars.com)

As does C: it even guarantees that the preprocessor terminates, but it's just a tiny bit harder to write programs in the C preprocessor.

C the core language incl. preprocessor _may_ not allow arbitrary code execution during build.

But in the C ecosystem, there are no build systems with fully declarative configuration. Every project is expected to come with build configuration that is both very ad-hoc / unique to the project, and often includes tens of thousands of lines of unreadable auto-generated boilerplate (e.g. if people commit the later stages of auto-tools, which is common practice) which can run arbitrary code. So in practice C is not better at all.

Also, C still has several ways to do file inclusion from arbitrary paths, as well as ways to cause arbitrary long compile times and object size with tiny source code. Compilation time may be guaranteed to be finite, but it is certainly not bounded.

I think this is actually a good case for development containers. That way you're very explicit in what you expose in the container.

It could still read your AWS keys that you pass in through the ENV though and upload those to some server in China / Russia.

Or it could delete all your source code, but that's counter productive.

This is a feature, not a bug: https://youtu.be/MWRPYBoCEaY

Isn't that one of the main selling points of Jai?

POC to demonstrate how to delete files when cargo build runs

I filed an issue on `rust-analyzer` and apparently it is by design - https://github.com/rust-lang/rust-analyzer/issues/14375

I mean it’s fairly obvious. You can do this through build.rs files as well.

There was talk about trying to compile proc macros to WASM and run them sandboxed in the compiler. Not sure what happened to that RFC (by dtolnay?)

Such a safe language, so much better than everything else, that what amounts to a linter has to compile the code.

Haha rust zealots are amazing, stay salty sea dwellers!
