[flagged] Most loved programming language Rust sparks privacy concerns (bleepingcomputer.com)
75 points by rascul 7 days ago | 61 comments

I have mixed feelings about this article.

0. The issue is real and interesting, especially (for me) its implications for reproducible builds.

1. The issue affects many other programming languages besides Rust.

2. The issue has not been ignored by the Rust team.

3. The headline is a clickbait cheap shot. Whether Rust is "most loved" on Stack Overflow has nothing to do with filepaths in binaries. The article author is just trying to inflame passions and get people outraged.

Can you expand on point 2? It seems an issue was opened 4 years ago, was largely responded to with 'ok...soooo', and was then closed when the opener apparently found a workaround?

This isn't meant as a cheap shot at the authors; you're right that other languages have or had this issue. But the fact that it still exists years later makes it feel a bit ignored.

As the person who closed it: when someone opens an issue, and then says "thanks I fixed my issue," the appropriate thing to do is for that ticket to be closed.

Note that since then, other people have opened other issues, which haven't been closed, and we continue to talk about it.

(Also, while I said in the thread and at the time "I am not aware of a flag or anything to turn this off, though I may be wrong.", at least today, that flag does exist.)

Hey thanks for replying. Again, I didn't mean to point a finger at you or any of the team. Thanks for your work.

One person's (resolved) issue does not capture everything that happens with rust.

https://github.com/rust-lang/rust/issues/41555: "Tracking issue for Path Prefix Remapping", stabilized in 2018.

Nothing specific to rust. So do most other compiler and/or build-systems, but then we'd not have such a clickbaity title...

Solution to that and more problems: build in a clean chroot or container with the minimum of dependencies installed.

That's better anyway, not only because of leaked build directory paths, but also because some projects automatically pick up dependencies and enable the corresponding features if they find the libraries and header files on the build system. QEMU does this, for example. It is not the only project to do so, but it is probably where one notices it most, because it has so many damn features! Naturally, one can explicitly disable all unwanted features in the build system, but every release adds three new ones, and it's just better to have a clean environment anyway (it avoids "oops, I installed a newer GCC version to test something, and broke ABI by accident"), so that is a valid solution too in my book.

Hi folks, looks like a lot of people here have already dug into all of this, but as the person who closed the initial issue, I felt like I might as well chime in to reiterate:

1. We care about reproducible builds.

2. There is a flag to re-map these paths, as is pretty standard among other compilers. https://reproducible-builds.org/docs/build-path/

3. Bugs happen sometimes, and if that flag isn't working, or if there are other issues related to being non-reproducible, we'd appreciate folks filing them.

As always, happy to talk about any of this. I'll be around.

Aside from the privacy angle, this means it's harder to create verified builds, since building from source will generate arbitrary differences in the binary.

Since this is rust, I'm even more offended from a software quality perspective than the privacy perspective. It's just sloppy.

Use --remap-path-prefix.
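For anyone who wants to try the flag mentioned above, here is a minimal Python sketch of wiring it into a Cargo build through the `RUSTFLAGS` environment variable. The `--remap-path-prefix=FROM=TO` flag is real; the helper names `remap_rustflags` and `release_build_env` and the replacement prefix `/home/user` are placeholders made up for illustration.

```python
import os


def remap_rustflags(mappings, existing=""):
    """Return a RUSTFLAGS value with one --remap-path-prefix=FROM=TO per mapping.

    RUSTFLAGS is a space-separated list, so paths containing whitespace are
    rejected here instead of silently producing a broken flag.
    """
    flags = existing.split()
    for src, dst in mappings:
        if any(ch.isspace() for ch in src + dst):
            raise ValueError("paths with whitespace cannot be passed via RUSTFLAGS")
        flags.append(f"--remap-path-prefix={src}={dst}")
    return " ".join(flags)


def release_build_env(home, replacement="/home/user"):
    """Copy the current environment and add a remapping for `home`.

    `replacement` is an arbitrary placeholder prefix (an assumption of this
    sketch); any flags already present in RUSTFLAGS are kept.
    """
    env = dict(os.environ)
    env["RUSTFLAGS"] = remap_rustflags(
        [(home, replacement)], env.get("RUSTFLAGS", "")
    )
    return env
```

Something like `subprocess.run(["cargo", "build", "--release"], env=release_build_env(os.path.expanduser("~")), check=True)` would then run the build with the home directory prefix rewritten. Whether every embedded path is covered depends on the project, so it is worth inspecting the resulting binary.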

The problem is you need to know about that, which probably many/most rust programmers didn't. At least until now.

But then the problem with automatic path remapping is that it can break the "automatically navigate to the problematic piece of code by clicking on the path" functionality many people use.

I'm always amazed by the amount of data leakage that happens due to binaries including full paths. I'm shocked whenever I see a crash and the error message mentions `f:/username...`. There are also stories about computer viruses that have been reverse engineered and the information about the author was extracted from the file path. And it's not just small companies. I've seen paths leaked from production software coming out of Microsoft and Google.

It's so bad that even some intelligence agency-developed malware gets busted (in part) due to leaky paths and other strings in binaries' metadata.

Here's a leaked CIA Wiki/CMS thread from 2015 discussing some of the fuckups found by Kaspersky in NSA ("Equation Group") malware binaries: https://wikileaks.org/ciav7p1/cms/page_14588809.html

>This is PDB string, right? The PDB path should ALWAYS be stripped (I speak from experience. Ask me about Blackstone some time.). For Visual Studio user mode stuff, the /DEBUG linker switch should NOT be used. For drivers, it's a bit harder to avoid it, but a post-build step using binplace will strip the path information.

>For other strings generally, yeah, search the binary for them. Don't use internal tool names in your code.

Examples from Kaspersky's report (https://securelist.com/files/2015/02/Equation_group_question...):

>"Timeout waiting for the “canInstallNow” event from the implant-specific EXE!"

"Implant" is NSA lingo for "persistent malware". Probably best not to include routines for helpful error messages in your production-build nation-state malware code.


Conspicuous all-caps NSA codename.


"The codename GROK appears in several documents published by Der Spiegel, where “a keylogger” is mentioned. Our analysis indicates EQUATIONGROUP’s GROK plugin is indeed a keylogger on steroids that can perform many other functions."


Follows NSA's internal username / @nsa.gov pattern.

(The whole thread's an interesting read in general. You can even hover over all the myriad acronyms/initialisms to see what they mean. Also kind of amusing to see the unsurprising rivalry and passive-aggressive shade between CIA's and NSA's cyber operations teams.)

PDFs whose titles are the full path to the docx they were baked from remain an all-time favorite of mine.

Especially in things like legal opinions, company media releases, ...

It's almost universal. Games typically include a ton of local file paths from the developer's computers.

Just a non-rust specific tip:

- If you ever build, pack or otherwise "prepare" any kind of software, always do so in a container "CI-style" (but not necessarily in a CI, for many more reasons than just privacy).

- Always consider stripping debug symbols from binaries you will distribute, using a post-compilation step; potentially also keep the unstripped binary for yourself for certain debugging purposes.

- Generally consider using non-identifying user names on single-user desktop systems; there are tons of programs leaking your user name.

This implies that reproducible builds aren’t supported by Rust. I didn’t realize that.

One step of making a build bit-for-bit reproducible is stripping out the leading parts of paths. I suppose you could achieve this with tooling that always builds in /opt and only links to /opt (or something), but the article says no such tooling exists.
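To make "bit-for-bit reproducible" concrete: two builds of the same source should produce byte-identical artifacts, and an embedded build path is enough to break that. Below is a small Python sketch for comparing two artifacts. The helper names `sha256_hex` and `first_difference` are hypothetical, chosen only for this illustration.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if the inputs are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None
```

Reading both artifacts with `pathlib.Path(...).read_bytes()` and passing them to `first_difference` gives an offset to inspect when the digests differ.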

Rust supports reproducible builds. There are options to remap file prefixes. This is the same situation as gcc and clang.

another thing that ruins reproducible builds is date stamps in the executable. Microsoft's C++ compiler has that issue. I'm not sure how common that is in other compilers

Rust is working toward fully reproducible builds; you can see outstanding issues here: https://github.com/rust-lang/rust/labels/A-reproducibility . The compiler already has options to eliminate many sources of variability. I've had success with reproducible builds for WASM.

The article actually says that you can work around the issue using containers.

I'm kind of curious why this is news today since it pops up from time to time in Rust discussions and seems like no one building software in Rust is that concerned.

Total anecdote and a bit of a strawman, but the only folks I've seen complain are random commenters that say something like "oh well it has privacy concerns so I can't use it." Which is a bit absurd.

The article says "This week, a pseudonymous developer chemsaf3 reached out to BleepingComputer reiterating their concern with this issue."

AFAIK this isn't uniquely a Rust problem; build systems are just extremely leaky and easy to fingerprint. In fact, it's part of the reason why reproducible builds are so difficult to pull off.

If I remember correctly, Stuxnet was identified as being a US/Israeli joint cyberattack by, among other things, common home directory names in debug build strings.

>"If it helps, you including user ids like this violates GDPR... so this should be addressed by the rust team."

This isn't true: GDPR covers entities that collect, store, and use user information ("data controllers"). Rust's developers are not data controllers; they're compiler developers. Maybe you could argue that crates.io has inadvertently become a data controller because they store these packages, but that still seems like a stretch. Regardless, the inadvertent leak of user information is directly tied to the legitimate purpose of allowing the compiler's panic machinery to output a usable backtrace. So I don't think you can scream "GDPR" just to get your Github issue more attention.

> Maybe you could argue that crates.io has inadvertently become a data controller because they store these packages, but that still seems like a stretch.

crates.io distributes packages by source, so I'd imagine it doesn't have any info regarding the username of the system the package was uploaded from (although it does obviously have the username associated with the crates.io account, but there's no reason those have to be the same)

Crates.io does not store built code.

> Maybe you could argue that crates.io has inadvertently become a data controller because they store these packages, but that still seems like a stretch.

Why so? It's no different from any other service where people submit things. It shouldn't matter if it's a binary, text, or a video, and I've heard opinions that the contents users submit need to be monitored for personal information anyway.

Not going to say it's not a problem or that it doesn't require fixing. I acknowledge that it can be an issue for some people.

However, can someone elaborate the actual (practical) implications. I.e. problems that emerged from such an issue in the past and caused significant damage in some way.

I've done pentests where we found a vulnerability that allows writing to arbitrary directories if you have sufficient permissions, and knowing the directory structure would make it so you don't have to guess which directories exist. It would also help if you were trying to bruteforce a login because it can show you what format the username is, like first initial + last name.

It's not really much on its own, but it could definitely be used to make other security issues more effective.

So do MSVC and GCC and clang ...

No. There's existing tooling to get reproducible builds with those compilers, and it's fairly easy to produce output that doesn't include full file paths. Passing relative paths to the compiler is one of the more common ways to use them, and this only leaks relative pathnames which don't reveal information about the system they were compiled on. You can also redact information like __DATE__ quite easily, just by adding e.g. -D__DATE__=\"redacted\" to your command-line. Bazel does this by default because using __DATE__ interferes with reproducible builds. Other projects like Debian have also made efforts to make their builds reproducible. Not 100% of Debian packages are reproducible, but a significant percentage of them are, and using GCC or Clang is not a barrier.

gcc and clang have options to strip paths to support reproducible builds


So does rustc.

I just checked my project (multiplatform C++ server for Windows and Linux). Neither has any identifiable info in release build (Linux one is compiled by clang though)

Are you sure?

I can confirm MSVC. Not sure about clang. Although you can disable this. The default release builds, with optimization, leak paths, names and symbol names.

"you can disable this" is the critical difference.

I guess, but i would rather it be the case that not only the default but the _only_ config not leak this stuff.

Obviously this should be fixed, but as a work-around, does blotting the paths out of the binary with something like this not work?

   perl -pi.bak -e 's|/home/kfairmasterz|xxxxxxxxxxxxxxxxxx|g' a.out
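Here is a rough Python equivalent of that one-liner, as a sketch only. It assumes the paths are stored as plain, uncompressed bytes, and it keeps the replacement exactly as long as the original so that no offsets inside the file shift. `blot_out` is a hypothetical helper name, and the approach will not help with compressed debug sections or with signed binaries, whose signatures it would invalidate.

```python
def blot_out(data: bytes, secret: bytes, fill: bytes = b"x") -> bytes:
    """Replace every occurrence of `secret` with filler of identical length.

    Keeping the length unchanged means the total file size and all offsets
    stay the same; only the matched bytes are overwritten.
    """
    if len(fill) != 1:
        raise ValueError("fill must be exactly one byte")
    if not secret:
        raise ValueError("secret must not be empty")
    return data.replace(secret, fill * len(secret))
```

Applied to a file, this would be `patched = blot_out(path.read_bytes(), b"/home/kfairmasterz")` followed by writing `patched` to a new file, keeping the original as a backup the way `-pi.bak` does.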

Why have paths in a production binary? I mean, for a debug binary it kind of makes sense, but for production it doesn't make any sense at all. As I understand it, even stripping the binary doesn't solve the issue.

This is not only a privacy concern but also an efficiency concern to me. Sure, on a desktop application, who cares, but since Rust also targets embedded development, wasting space on strings like paths from your system doesn't make a lot of sense on firmware where you have a few kB of flash memory.

So there's a few different things here, but the original bug was about paths from the Rust CI machines. That is, while almost all Rust code is compiled on users' machines, the standard library is a bit different; it's distributed pre-compiled, and so it's built in release mode, but with debug info on. So the paths that existed weren't ever user paths, but the ones from our CI.

In embedded you can use various options to reduce binary size, including the ones that get rid of these paths. I commented on this a few days ago over on reddit: https://www.reddit.com/r/rust/comments/m0irjk/opinions_on_ru...

It's one thing for a problem to exist. It's another entirely for a person, a group, or a community to be informed about the problem, only to have the problem be ignored.

The problem wasn't ignored. If you read the issue discussion on GitHub [edit: for the first linked issue], you can see that there's back and forth with the compiler developers about how certain information can easily be stripped and other places are harder to deal with. Eventually the original poster finds a workaround and declares their problem solved.

The Rust GitHub project currently has 6707 open issues. This is just one issue among many. Somewhat obviously, at this scale the Rust team cannot resolve every issue immediately.

the current status is that rustc has flags to remap the file paths, which I think makes the situation with rust the same as the situation with every other build system.

pretty much every build system will by default leak paths and times and whatnot from the builder, and need to be configured to not do that. AFAICT, the open issues are that some people want that to be the default and a few people are upset that changing the default requires an RFC.


I'm pretty sure they have 200000 other problems to care about too.

Why should they start with this?

There is a fairly trivial workaround: just do the release build in a container (or even a plain chroot), which effectively anonymizes the paths. The fact that the workaround is so easy (in many cases) probably partially explains why this hasn't been the highest priority yet.

> "If it helps, you including user ids like this violates GDPR... so this should be addressed by the rust team."

This smells like bullshit to me, but I don't know enough about GDPR to say for sure. Can anyone comment?

edit: more broadly, can free software (ie: rustc), run on individual developer machines, actually run afoul of the GDPR somehow?

It's not a violation at all.

Yeah, there's no way this is a violation of GDPR. There's a trivial workaround, build in a container. The article is being overly sensational and should have fact checked the commenter's statement.

> There's a trivial workaround, build in a container.

The difficulty of a workaround is irrelevant for the nature of the problem. Also as the article mentions this is undocumented behaviour, so unless rustc/cargo builds everything in containers by default this workaround should not be required.

And while I'm not an expert in GDPR details it is still very problematic as the file path to the compiled project could contain an arbitrary selection of very personal information that would then be leaked.

Sure you can build it in a container, you could also run a car assembly line in a cleanroom. It's just ridiculous to have that as requirement to not secretly leak paths.

If the GDPR even applies, your mechanism for requesting that your personal information be removed is to build your code in a container.

But in any case, it wouldn't be rustc distributing your built code.

while it seems like an issue worth fixing, i doubt that GDPR applies here. the Rust foundation is not providing a service to the end users of any of these binaries, and it is at the developers own discretion whether to distribute the binaries rustc produces.

it's equivalent to saying that your self-hosted Postfix servers are violating GDPR because they include information about which devices your message passed through on your local network during delivery in the header.

I really, really enjoy how even after all these years, still almost nobody knows what GDPR actually entails and how to actually be in compliance with it. Great work, Brussels, great work.

TL;DR: Debug information can contain absolute paths which can leak usernames and similar.


More precisely the debug information about dependencies can contain the absolute path to that dependency.

For example:


Note: The github.com-... is not leaking private information.

As you can see this might leak:

- your home directory and with this implicitly your user name.

- the CARGO_HOME path

- the dependency the binary uses, including its version (but there are many ways you can potentially find that out, still a leak).

Especially in case of `path` and `git` dependencies this might leak more information.
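To check what a given binary actually exposes, one option is to scan it for home-directory-style paths. The Python sketch below is a heuristic, not an exhaustive detector: the pattern only covers `/home/<name>`, `/Users/<name>` and `C:\Users\<name>` layouts, and `leaked_usernames` is a name invented here.

```python
import re

# Heuristic pattern for Linux, macOS and Windows home directory prefixes.
# The capture group is the path component right after the prefix.
HOME_DIR = re.compile(rb"(?:/home/|/Users/|[A-Za-z]:\\Users\\)([A-Za-z0-9._-]+)")


def leaked_usernames(data: bytes) -> list:
    """Return the sorted, de-duplicated user names found in home-style paths."""
    return sorted({m.group(1).decode("ascii") for m in HOME_DIR.finditer(data)})
```

Running it over `pathlib.Path("target/release/myapp").read_bytes()` (the binary name is a placeholder) gives a quick before/after check when experimenting with build options.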

BUT for debug builds I would prefer it to be this way, as it allows me to easily navigate to the file which caused the panic. As these are potentially files not relative to the project root, using relative paths wouldn't work!!

But then on release builds I would prefer them to be obfuscated by default (maybe with an option to disable it).

There are various ways in which you can sometimes avoid having this information included, e.g. by changing how panics work, so there is no need to include it. But they don't always seem to work and are not reliable.

Still, let's be honest: building any binaries you plan to distribute outside of CI, or at least a container, is generally a terrible idea, so in practice this shouldn't be too much of an issue, which is also why no one bothered fixing it in 4 years (Rust isn't sold by a company, so if anyone wants fixes they either need to do them themselves or hope someone else happens to fix them because they are interested in it being fixed).

I agree that it can be a problem, but it wouldn't be this bad if they didn't build on a local machine, didn't include symbols or stripped the binary. All standard practice.

This definitely does not violate GDPR.

`strip` the binary?

Some of the paths are included in the binary for printing backtraces and such, which `strip` doesn't remove.

remove unused symbols

Most loved? Seriously?

The metric measured in that survey is "the percentage of people who say that they use the language, who say they want to continue using the language in the future." I agree that calling this "loved" is... complicated.
