Very annoying - the apparent author of the backdoor was in communication with me over several weeks trying to get xz 5.6.x added to Fedora 40 & 41 because of its "great new features". We even worked with him to fix the valgrind issue (which it turns out now was caused by the backdoor he had added). We had to race last night to fix the problem after an inadvertent break of the embargo.
He has been part of the xz project for 2 years, adding all sorts of binary test files, and to be honest with this level of sophistication I would be suspicious of even older versions of xz until proven otherwise.
EDIT: Lasse Collin's account @Larhzu has also been suspended.
EDIT: Github has disabled all Tukaani repositories, including downloads from the releases page.
--
EDIT: Just did a bit of poking. xz-embedded was touched by Jia as well and it appears to be used in the Linux kernel. I did a quick look and it doesn't appear Jia touched anything of interest in there. I also checked the previous mirror at the tukaani project website, and nothing was out of place other than lagging a few commits behind:
EDIT: Just sent an email to the last committer to bring it to their attention.
EDIT: It's been removed.
--
jiatan's Libera info indicates they registered on Dec 12 13:43:12 2022 with no timezone information.
-NickServ- Information on jiatan (account jiatan):
-NickServ- Registered : Dec 12 13:43:12 2022 +0000 (1y 15w 3d ago)
-NickServ- Last seen : (less than two weeks ago)
-NickServ- User seen : (less than two weeks ago)
-NickServ- Flags : HideMail, Private
-NickServ- jiatan has enabled nick protection
-NickServ- *** End of Info ***
/whowas expired not too long ago, unfortunately. If anyone has it I'd love to know.
They are not registered on freenode.
EDIT: Libera has stated they have not received any requests for information from any agencies as of yet (Saturday 30th March 2024 00:39:31 UTC).
EDIT: Jia Tan was using a VPN to connect; that's all I'll be sharing here.
Just for posterity since I can no longer edit: Libera staff has been firm and unrelenting in their position not to disclose anything whatsoever about the account. I obtained the last point on my own. Libera has made it clear they will not budge on this topic, which I applaud and respect. They were not involved whatsoever in ascertaining a VPN was used, and since that fact makes anything else about the connection information moot, there's nothing else to say about it.
I am not LE nor a government official. I did not present a warrant of any kind. I asked in a channel about it. Libera refused to provide information. Libera respecting the privacy of users is of course something I applaud and respect. Why wouldn't I?
Respect not giving out identifying information on individuals whenever someone asks, no matter what company they work for and what job they do? Yes. I respect this.
Good question, though I can imagine they took this action for two reasons:
1. They don't have the ability to freeze repos (i.e. would require some engineering effort to implement it), as I've never seen them do that before.
2. Many distros (and I assume many enterprises) were still linking to the GitHub releases to source the infected tarballs for building. Disabling the repo prevents that.
The infected tarballs and repo are still available elsewhere for researchers to find, too.
They could always archive it. Theoretically (and I mean theoretically only), there's another reason for Microsoft to prevent access to the repo: if a nation state was involved, there may have been back-channel conversations about obfuscating the trail.
Archiving the repo doesn't stop the downloads. They would need to rename it in order to prevent distro CI/CD from continuing to download untrustworthy stuff.
The latest commit is interesting (f9cf4c05edd14, "Fix sabotaged Landlock sandbox check").
It looks like one of Jia Tan's commits (328c52da8a2) added a stray "." character to a piece of C code that was part of a check for sandboxing support, which I guess would cause the code to fail to compile, causing the check to fail, causing the sandboxing to be disabled.
If your project becomes complex enough, eventually you need tests for the configure step. Even without malicious actors, it's easy to miss that a compiler or system change broke some check.
The Alpine patch includes gettext-dev, which is likely also exploited, as the same authors have been pushing gettext to projects where their changes have been questioned.
It's still the wrong way to go about things. Tests are there for a reason, meaning if they fail you should try to understand them to the point where you can fix the problem (broken test or actual bug) instead of just wantonly disabling tests until you get a green light.
Asking this here too: why isn't there an automated diff of the tarball contents against the repo, with an auto-flagged warning on mismatch? Am I missing something here?
The tarballs mismatching from the git tree is a feature, not a bug. Projects that use submodules may want to include these and projects using autoconf may want to generate and include the configure script.
> The release tarballs upstream publishes don't have the same code that GitHub has. This is common in C projects so that downstream consumers don't need to remember how to run autotools and autoconf. The version of build-to-host.m4 in the release tarballs differs wildly from the upstream on GitHub.
Multiple suggestions on that thread on how that's a legacy practice that might be outdated, especially in the current climate of cyber threats.
> Those days are pretty much behind us. Sure, you can compile code and tweak software configurations if you want to--but most of the time, users don't want to. Organizations generally don't want to, they want to rely on certified products that they can vet for their environment and get support for. This is why enterprise open source exists. Users and organizations count on vendors to turn upstreams into coherent downstream products that meet their needs.
> In turn, vendors like Red Hat learn from customer requests and feedback about what features they need and want. That, then, benefits the upstream project in the form of new features and bugfixes, etc., and ultimately finds its way into products and the cycle continues.
"and when the upstream is tainted, everyone drinks poisoned water downstream, simple as that!"
I think this has been in the making for almost a year. The whole ifunc infrastructure was added in June 2023 by Hans Jansen and Jia Tan. The initial patch is "authored by" Lasse Collin in the git metadata, but the code actually came from Hans Jansen: https://github.com/tukaani-project/xz/commit/ee44863ae88e377...
There were a ton of patches by these two subsequently because the ifunc code was breaking with all sorts of build options and obviously caused many problems with various sanitizers. Subsequently the configure script was modified multiple times to detect the use of sanitizers and abort the build unless either the sanitizer was disabled or the use of ifuncs was disabled. That would've masked the payload in many testing and debugging environments.
The hansjans162 Github account was created in 2023 and the only thing it did was add this code to liblzma. The same name later applied to do an NMU at Debian for the vulnerable version. Another "<name><number>" account (which only appears here, once) then pops up and asks for the vulnerable version to be imported: https://www.mail-archive.com/search?l=debian-bugs-dist@lists...
That looks exactly like what you'd want to see to disguise the actual request you want, a number of pointless upstream updates in things that are mostly ignored, and then the one you want.
Jia Tan getting maintainer access looks like it is almost certainly part of the operation. Lasse Collin mentioned multiple times how Jia has helped off-list, and to me it seems like Jia befriended Lasse as well (see how Lasse talks about them in 2023).
Also the pattern of astroturfing dates back to 2022. See for example this thread where Jia, who has helped at this point for a few weeks, posts a patch, and a <name><number>@protonmail (jigarkumar17) user pops up and then bumps the thread three times(!) lamenting the slowness of the project and pushing for Jia to get commit access: https://www.mail-archive.com/xz-devel@tukaani.org/msg00553.h...
Naturally, like in the other instances of this happening, this user only appears once on the internet.
Also seeing this bug. Extra valgrind output causes some failed tests for me. Looks like the new version will resolve it. Would like this new version so I can continue work.
Wow, what a big pile of infrastructure for a non-optimization.
An internal call via ifunc is not magic — it’s just a call via the GOT or PLT, which boils down to function pointers. An internal call through a hidden visibility function pointer (the right way to do this) is also a function pointer.
The even better solution is a plain old if statement, which implements the very very fancy “devirtualization” optimization, and the result will be effectively predicted on most CPUs and is not subject to the whole pile of issues that retpolines are needed to work around.
Right, IFUNCs make sense for library functions where you have the function pointer indirection anyway. They make much less sense for internal functions; the only argument over a regular function pointer would be the pointer being marked RO after it is resolved (if the library was linked with -z relro -z now), but an if avoids even that issue.
It’s certainly a pseudonym just like all the other personas we’ve seen popping up on the mailing list supporting this “Jia Tan” in these couple of years. For all intents and purposes they can be of any nationality until we know more.
PSA: I just noticed homebrew installed the compromised version on my Mac as a dependency of some other package. You may want to check this to see what version you get:
`xz --version`
Homebrew has already taken action, a `brew upgrade` will downgrade back to the last known good version.
GitHub disabled the xz repo, making it a bit more difficult for nix to revert to an older version. They've made a fix, but it will take several more days for the build systems to finish rebuilding the ~220,000 packages that depend on the bootstrap utils.
What should they be relying on instead? Maybe rsync everything to an FTP server? Or Torrents? From your other comments, you seem to think no one should ever use GitHub for anything.
"great" for whom? I've seen enough of the industry to immediately feel suspicious when someone uses that sort of phrasing in an attempt to persuade me. It's no different from claiming a "better experience" or similar.
I don't know how you can be missing the essence of the problem here or that comment's point.
Vague claims are meaningless and valueless and are now even worse than that, they are a red flag.
Please don't tell me that you would accept a pr that didn't explain what it did, and why it did it, and how it did it, with code that actually matched up with the claim, and was all actually something you wanted or agreed was a good change to your project.
Updating to the next version of a library is completely unrelated. When you update a library, you don't know what all the changes were to the library, _but the library's maintainers do_, and you essentially trust that library's maintainers to be doing their job and not accepting random patches that might do anything.
Updating a dependency and trusting a project to be sane is entirely a different prospect from accepting a pr and just trusting that the submitter only did things that are both well intentioned and well executed.
If you don't get this then I for sure will not be using or trusting your library.
// The "-2" is included because the for-loop will
// always increment by 2. In this case, we want to
// skip an extra 2 bytes since we used 4 bytes
// of input.
i += 4 - 2;
I’ve long thought that those “this new version fixes bugs and improves user experience” patch notes that Meta et al copy and paste on every release shouldn’t be permitted.
Tell me about it. I look at all these random updates that get pushed to my mobile phone and they all pretty much have that kind of fluff in the description. Apple/Android should take some steps to improve this or outright ban this practice. In terms of importance to them though I imagine this is pretty low on the list.
I have dreamed about an automated LLM system that can "diff" the changes out of the binary and provide some insight. You know give back a tiny bit of power to the user. I'll keep dreaming.
It's worse: as someone who does try to provide release notes, I'm often cut off by the max length of the field. And even then, Play only shows you the notes for the latest version of the app.
Ugh. That's especially annoying because they're trying to be hip with slang and use a metaphor that requires cultural knowledge that you can't really assume everyone has.
Interesting that one of the commits updating a test file claimed it was regenerated with a fixed random seed for better reproducibility (although how goes unmentioned). In the future, random test data had better be generated as part of the build rather than committed as opaque blobs...
I agree on principle, but sometimes programmatically generating test data is not so easy.
E.g.: I have a specific JPEG committed into a repository because it triggers a specific issue when reading its metadata. It's not just _random_ data, but specific bogus data.
But yeah, if the test blob is purely random, then you can just commit a seed and generate it during tests.
Debian have reverted xz-utils (in unstable) to 5.4.5 – actual version string is “5.6.1+really5.4.5-1”. So presumably that version's safe; we shall see…
Is that version truly vetted? "Jia Tan" has been the official maintainer since 5.4.3, could have pushed code under any other pseudonym, and controls the signing keys. I would have felt better about reverting farther back, xz hasn't had any breaking changes for a long time.
After reading the original post by Andres Freund, https://www.openwall.com/lists/oss-security/2024/03/29/4, his analysis indicates that the RSA_public_decrypt function is being redirected to the malware code. Since RSA_public_decrypt is only used in the context of RSA public key - private key authentication, can we reasonably conclude that the backdoor does not affect username-password authentication?
Yeah. Had a weird problem last week where GitHub was serving old source code from the raw url when using curl, but showing the latest source when coming from a browser.
Super frustrating when trying to develop automation. :(
These shouldn't be suspended, and neither should their repositories. People might want to dig through the source code. It's okay if they add a warning on the repository, but suspending _everything_ is a stupid thing to do.
This can also be handled relatively easily. They can disable the old links and a new one can be added specifically for the disabled repository. Or even just let the repository be browsable through the interface at least.
Simply showing one giant page saying "This repository is disabled" is not helpful in any way.
1. You don't actually know what has been done by whom or why. You don't know if the author intended all of this, or if their account was compromised. You don't know if someone is pretending to be someone else. You don't know if this person was being blackmailed, forced against their will, etc. You don't really know much of anything, except a backdoor was introduced by somebody.
2. Assuming the author did do something maliciously, relying on personal reputation is bad security practice. The majority of successful security attacks come from insiders. You have to trust insiders, because someone has to get work done, and you don't know who's an insider attacker until they are found out. It's therefore a best security practice to limit access, provide audit logs, sign artifacts, etc, so you can trace back where an incursion happened, identify poisoned artifacts, remove them, etc. Just saying "let's ostracize Phil and hope this never happens again" doesn't work.
3. A lot of today's famous and important security researchers were, at one time or another, absolute dirtbags who did bad things. Human beings are fallible. But human beings can also grow and change. Nobody wants to listen to reason or compassion when their blood is up, so nobody wants to hear this right now. But that's why it needs to be said now. If someone is found guilty beyond a reasonable doubt (that's really the important part...), then name and shame, sure, shame can work wonders. But at some point people need to be given another chance.
100% fair -- we don't know if their account was compromised or if they meant to do this intentionally.
If it were me I'd be doing damage control to clear my name if my account was hacked and abused in this manner.
Otherwise if I was doing this knowing full well what would happen then full, complete defederation of me and my ability to contribute to anything ever again should commence -- the open source world is too open to such attacks where things are developed by people who assume good faith actors.
I think I went through all the stages of grief. Now at the stage of acceptance here’s what I hope: I hope justice is done. Whoever is doing this be they a misguided current black hat (hopefully, future white hat) hacker, or just someone or someones that want to see the world burn or something in between that we see justice. And then forgiveness and acceptance and all that can happen later.
Mitnick reformed after he was convicted (whether you think that was warranted or not). Here if these folks are Mitnick’s or bad actors etc let’s get all the facts on the table and figure this out.
What’s clear is that we all need to be ever vigilant: that seemingly innocent patch could be part of a more nefarious thing.
We’ve seen it before with that university sending patches to the kernel to “test” how well the core team was at security and how well that went over.
Anyways. Yeah. Glad you all allowed me to grow. And I learned that I have an emotional connection to open source for better or worse: so much of my life professional and otherwise is enabled by it and so threats to it I guess I take personally.
It is reasonable to consider all commits introduced by the backdoor author untrustworthy. This doesn't mean all of it is backdoored, but if they were capable of introducing this backdoor, their code needs scrutiny. I don't care why they did it, whether it's a state-sponsored attack, a long game that was supposed to end with selling a backdoor for all Linux machines out there for bazillions of dollars, or blackmail — this is a serious incident that should eliminate them from open-source contributions and the xz project.
There is no requirement to use your real name when contributing to open source projects. The name of the backdoor author ("Jia Tan") might be fake. If it isn't, and if somehow they are found to be innocent (which I doubt, looking at the evidence throughout the thread), they can create a new account with a new fake identity.
They might have burnt the reputation built for this particular pseudonym but what is stopping them from doing it again? They were clearly in it for the long run.
I literally said "they"; I know, I know, in English that can also be interpreted as a gender-unspecific singular.
Anyways, yes it is an interesting question whether he/she is alone or they are a group. Conway's law probably applies here as well. And my hunch in general is that these criminal mad minds operate individually / alone. Maybe they are hired by an agency but I don't count that as a group effort.
Every Linux box inside AWS, Azure, and GCP and other cloud providers that retains the default admin sudo-able user (e.g., “ec2”) and is running ssh on port 22.
I bet they intended for their back door to eventually be merged into the base Amazon Linux image.
You don't need a "ec2" user. A backdoor can just allow root login even when that is disabled for people not using the backdoor.
It just requires the SSH port to be reachable unless there is also a callout function (which is risky as people might see the traffic).
And with Debian and Fedora covered and the change eventually making its way into Ubuntu and RHEL pretty much everything would have this backdoor.
My understanding is that any Debian/RPM-based Linux running sshd would become vulnerable in a year or two. The best equivalent of this exploit is the One Ring.
So the really strange thing is why they put so little effort into making this undetectable. All they needed was to make it use less time to check each login attempt.
On the other hand, it was very hard to detect. The slow login time was the only thing that gave it away. It seems more like they were so close to being highly successful. In retrospect, improving the performance would have been the smart play. But that is one part that went wrong compared to the very many that went right.
Distro build hosts and distro package maintainers might not be a bad guess. Depends on whether getting this shipped was the final goal. It might have been just the beginning, part of some bootstrapping.
Not sure why people are downvoting you... it's pretty unlikely that various Chinese IoT companies would just decide it's cool to add a backdoor, which clearly implies that no matter how good their intentions are, they simply might have no other choice.
There are roughly speaking two possibilities here:
1. His machine was compromised, and he wasn't at fault past having less than ideal security (a sin we are all guilty of). His country of origin/residence is of no importance and doxing him isn't fair to him.
2. This account was malicious. There's no reason we should believe that the identity behind wasn't fabricated. The country of origin/residence is likely falsified.
In neither case is trying to investigate who he is on a public forum likely to be productive. In both cases there's risk of aiming an internet mob at some innocent person who was 'set up'.
The back door is in the upstream GitHub tarball. The most obvious way to get stuff there is by compromising an old style GitHub token.
The new style GitHub tokens are much better, but it's somewhat opaque which options you need. Most people also don't use expiring tokens.
The author seems to have a lot of OSS contributions, so probably an easy target to choose.
I think the letters+numbers naming scheme for both the main account and the sockpuppets used to get him access to xz and the versions into distros is a strong hint at (2). Taking over xz maintainership without any history of open source contributions is also suspicious.
But my point is that people living in China might be "forced" to do such things, so we unfortunately can't ignore the country. Of course, practically this is problematic since the country can be faked.