The "everybody has bugs" response is intellectually dishonest. Yes, everybody has bugs, but most people's bugs aren't an intentional feature that a trained monkey ought to have known was a bad idea.
- Someone implemented a YAML parser that executed code. This should have been obviously wrong to them, but it wasn't.
- Thousands of ostensible developers used this parser, saw the fact that it could deserialize more than just data, and never said "Oh dear, that's a massive red flag".
- The bug in the YAML parser was reported and the author of the YAML library genuinely couldn't figure out why this mattered or how it could be bad.
- The issue was reported to RubyGems multiple times and they did nothing.
This isn't the same thing as a complex and accidental bug that even careful engineers have difficulty avoiding, after they've already taken steps to reduce the failure surface of their code through privilege separation, high-level languages/libraries, etc.
This is systemic engineering incompetence that apparently pervades an entire language community, and this is the tipping point where other people start looking for these issues.
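For concreteness, the first bullet's mechanism can be sketched in a few lines of Ruby. This is a minimal, hypothetical example: `SideEffectful` stands in for the real classes that had dangerous deserialization hooks; the point is that the attacker's bytes, not the programmer's code, choose which class gets instantiated.

```ruby
require 'yaml'

$instantiated = false

# Hypothetical stand-in for a class that was never designed to
# receive untrusted input, but has side effects on deserialization.
class SideEffectful
  def init_with(coder)     # Psych calls this hook when reviving the object
    $instantiated = true   # stand-in for something far worse, e.g. eval
  end
end

# The attacker-controlled *data* names the class to instantiate:
payload = "--- !ruby/object:SideEffectful\nx: 1\n"

# Older Rubies did this by default with plain YAML.load; Psych 4
# renamed the permissive path to unsafe_load.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
YAML.public_send(loader, payload)

$instantiated  # true: merely parsing the document ran our code
```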
Any sufficiently advanced serialization standard will let you serialize/deserialize a wide variety of objects. When J2EE SOAP libraries are passing large amounts of XML back and forth over the wire, it is often going to get instantiated via calling a zero-argument constructor and then firing a bunch of setter methods. As J2EE has learned to its sorrow, there are some good choices for objects to allow people to deserialize and some not so good ones.
If J2EE is a boring platform to you, pick your favorite and Google for a few variants. You'll find a serialization vulnerability. It's hard stuff, by nature.
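The zero-argument-constructor-plus-setters pattern described above can be sketched generically in Ruby (the `revive` helper and `Point` class are hypothetical illustrations, not any particular J2EE library). The permitted-class check is exactly what separates the good choices from the not so good ones:

```ruby
# Generic bean-style deserialization: zero-argument constructor, then
# setters named by the data. The permitted-class check is the safety
# valve; omit it and the data picks any class it likes.
def revive(class_name, attrs, permitted)
  unless permitted.include?(class_name)
    raise ArgumentError, "#{class_name} is not a permitted class"
  end
  obj = Object.const_get(class_name).new              # zero-arg constructor
  attrs.each { |k, v| obj.public_send("#{k}=", v) }   # fire the setters
  obj
end

class Point
  attr_accessor :x, :y
end

p = revive("Point", { "x" => 3, "y" => 4 }, ["Point"])
p.x  # => 3

# revive("Kernel", {}, ["Point"])  # raises ArgumentError: not whitelisted
```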
> The bug in the YAML parser was reported and the author of the YAML library genuinely couldn't figure out why this mattered or how it could be bad.
Do you have a citation for this? What particular bug in the parser are you referring to? The behavior which is being exploited is a fairly complicated interaction between the parser and client Rails code -- I banged my head against the wall trying to get code execution with Ruby 1.8.7's parser for over 12 hours, for example, without any luck unless I coded a too-stupid-to-be-real victim class. (It's my understanding that at least one security researcher has a way to make that happen, but that knowledge was hard won.)
> Any sufficiently advanced serialization standard ... it is often going to get instantiated via calling a zero-argument constructor and then firing a bunch of setter methods.
Yes, this is always a bad idea. It's actually in a similar problem space as the constant stream of vulnerabilities in the Java security sandbox (eg, applets); all it takes is one mistake and you lose.
And thus, people have been saying to turn off Java in the browser for 4+ years, and this is also why Spring shouldn't have implemented such code.
> It's hard stuff, by nature.
Which is why deserializing into executable code is a bad idea, by nature. I'd thought this was well established by now, but apparently it is not.
> Do you have a citation for this? What particular bug in the parser are you referring to?
> This is systemic engineering incompetence that apparently pervades an entire language community
The original target of that claim was the Ruby community. Now that this comment concedes the same issue exists in the Java community, are you leveling the same claim against it? Does every severe security issue that goes unnoticed by a community for some time, and is eventually noticed, suggest pervasive engineering incompetence throughout that entire community? Maybe you would be entirely right to make that claim, because any security issue is indicative of incompetence at some level, but I think the closer your definition of incompetence comes to including everybody, the less useful that definition is.
> Which is why deserializing into executable code is a bad idea, by nature. I'd thought this was well established by now, but apparently it is not
I'm not sure that means anything. In an OO language, you are always de-serializing into objects, and objects are always 'executable code'. Hashes and Arrays are executable code too, right?
The problem is actually when you allow de-serializing into _arbitrary_ objects of arbitrary classes, and some of those objects have dangerous side effects _just by being instantiated_, and/or have functionality that can turn into an arbitrary-code-execution vector. (Hopefully Hashes and Arrays don't.)
It is a problem, and it's probably fair to say that you should never have a de-serialization format that takes untrusted input and de-serializes to anything but a small whitelisted set of classes/types. And that many have violated this, and not just in ruby.
But if you can't even describe the problem/guidance clearly yourself, I think that rather undermines your insistence that it's an obvious thing known by the standard competent programmer.
(I am not ashamed to admit it was not obvious to me before these exploits. I think it was not obvious to a bunch of people who are in retrospect _claiming_ it was obvious to them.)
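For what it's worth, Psych did later grow exactly this kind of whitelist, in the form of `safe_load` (added to Psych after these incidents). A sketch of the guidance above in practice:

```ruby
require 'yaml'
require 'date'

doc = "released: 2013-01-28\nname: example\n"

# safe_load revives only a small permitted set of types (plus whatever
# you explicitly allow); everything else raises Psych::DisallowedClass.
data = YAML.safe_load(doc, permitted_classes: [Date])
data["name"]            # => "example"
data["released"].class  # => Date

blocked = false
begin
  YAML.safe_load("--- !ruby/object:OpenStruct {}\n")
rescue Psych::DisallowedClass
  blocked = true        # the data tried to name its own class; refused
end
```

(The `permitted_classes:` keyword is the modern Psych spelling; earlier versions took a positional whitelist.)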
> I'm not sure that means anything. In an OO language, you are always de-serializing into objects, and objects are always 'executable code'. Hashes and Arrays are executable code too, right?
No. You're conflating code and state (which was the problem to begin with!)
Let's disassemble parsing a list of strings:
When you instantiate the individual string objects, you do not 'eval' the data to allow it to direct which string class should be instantiated. You also do not 'eval' the data to determine which fields to set on the string class.
You instantiate a known String type, and you feed it the string representation as an array of non-executable bytes using a method you specified when writing your code -- NOT a method the data specifies.
The data is not executable. It's an array of untrusted bytes. The string code is executable, and it operates on state: the data.
You repeat this process, feeding the string objects into the list object. At no point do you ask the data what class or code you should run to represent it. Your parsing code dictates what classes to instantiate, and the data is interpreted according to those fixed rules, and your data is never executed.
It should never be possible for data to direct the instantiation of types. The relationship must always occur in the opposite direction, whereby known types dictate how to interpret data.
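The walkthrough above in code form, using plain `JSON.parse` as the safe direction (our code fixes the types; the bytes are only state):

```ruby
require 'json'

untrusted = '["alpha", "beta", "gamma"]'

# Our parsing code dictates the types: JSON text can only ever become
# String, Numeric, Array, Hash, true, false, or nil. The bytes are
# state; they never choose a class to allocate or a method to run.
list = JSON.parse(untrusted)

list.class               # => Array
list.map(&:class).uniq   # => [String]

# The data had no say in those classes; the grammar of the format and
# our call site dictated them. Inverting that relationship -- letting
# a tag in the bytes name the class -- is where YAML went wrong.
```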
> I think it was not obvious to a bunch of people who are in retrospect _claiming_ it was obvious to them.
Given the preponderance of prior art, this seems unlikely.
The YAML vulnerability was not from any 'eval' in the YAML library itself, you realize, right?
It was from allowing de-serialization to arbitrary classes, when it turned out that some classes had dangerous side-effects merely from instantiation -- including in some cases, 'eval' behavior, yes, but the eval behavior wasn't in YAML, it was in other classes, where it could be triggered by instantiation.
To use your language, I don't think it's 'intellectually honest' to call allowing de-serialization to data-specified classes "a YAML parser that executed code" -- that's misleading -- or to say that a 'trained monkey should have known it was a bad idea' (allowing de-serialization to arbitrary data-specified classes).
There have been multiple vulnerabilities _just like this_ in other environments, including several in Java (and in major popular Java packages). You could say with all that prior art it ought to have been obvious, but of course you could say that for each of the multiple prior vulnerabilities too. Of course, each time there's even more prior art, and for whatever reason this one finally got enough publicity that maybe this kind of vulnerability will be common knowledge now.
> The YAML vulnerability was not from any 'eval' in the YAML library itself, you realize, right?
> It was from allowing de-serialization to arbitrary classes, when it turned out that some classes had dangerous side-effects merely from instantiation -- including in some cases, 'eval' behavior, yes, but the eval behavior wasn't in YAML, it was in other classes, where it could be triggered by instantiation.
What you are looking for is not "OO language", but "dynamic interpreted language".
In a traditionally compiled OO language like C++, classes cease to exist after compilation; there is no fully generic way to instantiate an object of a class by data determined at runtime. So this whole concept of deserializing to whatever the protocol specifies goes completely out of the door.
So your conclusion is that dynamically interpreted languages are all insecure?
(You can instantiate objects with classes specified by data in Java too, although Java isn't usually considered exactly dynamically interpreted. In fact, there was a very analogous bug in Spring, as mentioned in many places in this comment thread. But anyway, okay, sufficiently dynamically interpreted to allow instantiation of objects with classes chosen at runtime... is the root of the problem, you're suggesting; if everyone just used C++ it would be fine?)
One could argue that since every call goes through a runtime messaging framework, Objective C is really just an interpreted language with pre-JITed function bodies.
Here's the thing. You cannot safely load attacker-influenced YAML. Period. If you do, you have to assume bad things are going to happen. The fact that psych calls []=(key, val) on instantiated objects, in combination with ActionController::Routing::RouteSet::NamedRouteCollection calling eval on the key, made for a particularly easy drive-by attack on a huge range of deployments, but even without the []=, there are still plenty of ways to exploit loading arbitrary YAML, though they may require more custom targeting.
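A simplified stand-in for that interaction (the hypothetical `HelperTable` plays the role of `NamedRouteCollection`, whose real `[]=` eventually passed the key's text to eval while defining route helpers):

```ruby
require 'yaml'

$pwned = false

# Hypothetical stand-in: a collection whose []= splices the key's text
# into code, as NamedRouteCollection effectively did for helper names.
class HelperTable
  def []=(name, value)
    eval("def self.#{name}; end")   # attacker-controlled key reaches eval
  end
end

# A "!ruby/hash:HelperTable" tag makes Psych allocate a HelperTable and
# call []= for each pair -- so the key's embedded Ruby runs during parsing.
# The key below closes the generated method early, then injects a statement:
payload = %(--- !ruby/hash:HelperTable\n"x; end; $pwned = true; def self.y": 1\n)

loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
YAML.public_send(loader, payload)

$pwned  # true: code smuggled inside a hash key executed while "parsing"
```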
In terms of that feature request, I doubt that adding a safe_load option would have stopped the Rails vulnerability. After all, the Rails guys _already knew_ that they should not be loading YAML from the request body; that's why it was not allowed directly. The issue was loading XML, which then allowed YAML to be loaded. Allowing YAML to be loaded there was a mistake; it seems unlikely that someone would make that mistake while at the same time mitigating it by adding safe_load.
You're describing the previous Ruby on Rails vulnerability. The latest one involved them deliberately using the YAML parser to parse untrusted JSON data. Also, the RubyGems compromise was a result of them parsing gem metadata represented using YAML - since the metadata is YAML you pretty much have to use a YAML parser to parse it.
Using YAML to parse JSON was obviously non-optimal, which is (presumably) why Rails stopped doing it in 3.1 (thus the vulnerability you refer to is only present in 3.0 and 2.x).
W.r.t RubyGems, I hear what you're saying, but that doesn't mean there's a bug in psych. Even the feature request of adding a safe_load option strikes me as problematic...either you're limiting the markup to json with comments, or you'd have to name the option something like sort_of_safe_load.
Spring isn't the only widely-used Java framework that's had these problems. The Struts developers put a general-purpose interpreter (OGNL) in their parameter parsing pipeline, and thought they'd kept things safe by blacklisting dangerous syntax.
It would obviously be unfair to claim on this basis, or the recent problems with the Java browser plugin, that the "entire Java language community" has a bad attitude on security matters. Communities are big, each of them has a range of attitudes within it, and most importantly --- regardless of attitude --- sooner or later, everyone screws up.
Parsing is not deserialization. I keep having to say this on all these threads. There should be a giant glowing force field between these two practices.
The security fuckup is a lot more simple than that - they fucked up as soon as they opened the door to this kind of complicated interaction by letting untrusted code instantiate arbitrary classes and pass strings of their choice to them. Doesn't matter that they weren't aware of any way this could be exploited, as soon as they let an attacker pass data to random classes that were never designed to accept untrusted input a security disaster was basically inevitable.
"Yes, everybody has bugs, but most people's bugs aren't an intentional feature that a trained monkey ought to have known was a bad idea."
First, given how many times I've seen a deserialization library "helpfully" allow you to deserialize into arbitrary objects in a language that is sufficiently dynamic to turn this into arbitrary code execution, evidence suggests this is not an accurate summary. I'd like to see "Don't deserialize into arbitrary objects" become General Programming Wisdom, but it is not there yet.
It's not like we live in a world where XSS is rare or anything anyhow. The general level of programming aptitude is low here. That's bad, regrettable, something I'd love to see change and love to help change, but it is also something we have to deal with as a brute fact.
Secondly, there's still the points of A: even if you don't use Ruby on Rails, your life may still be adversely affected by the Severity: Apocalyptic bug, and B: what are you going to do when the Severity: Apocalyptic bug is located in your codebase? And that's putting aside the obvious matters of what to do if you use Ruby on Rails and this was your codebase. The exact details of today's Severity: Apocalyptic bug are less relevant than you may initially think. Go back and read the piece, strike every sentence that contains "YAML". It's still a very important piece.
At which point a re-quoting of my favorite line in the piece is probably called for: "If you believe in karma or capricious supernatural agencies which have an active interest in balancing accounts, chortling about Ruby on Rails developers suffering at the moment would be about as well-advised as a classical Roman cursing the gods during a thunderstorm while tapdancing naked in the pool on top of a temple consecrated to Zeus while holding nothing but a bronze rod used for making obscene gestures towards the heavens." Epic.
No, his point is that it already is, and that we can declare anybody who doesn't know it to be worse than a "trained monkey". I say the evidence clearly shows that it is not widespread knowledge (IMHO "trained monkey" makes it sound like you could ask a normal college sophomore about this and get the correct response, which is just false; a lot of programmers don't even know what the term "serialization" means), and while it is a great goal, we can not act as if it is already true.
Speaking to the first portion of your complaint, I remember coming across that particular feature of YAML years ago, and being surprised at how odd it was that it was present. I shrugged it off because (at the time) I considered YAML to be something that simply would not enter the serialize/deserialize phase of incoming requests. Ignorance of the frameworks' actions on my part, and further ignorance in not thinking through the security effects of that particular feature, certainly. I would presume that I am not the only one in that situation.
You're definitely right that the security reports should be handled better. I hope that this whole situation results in a better security culture in the Ruby community.
Regarding your tone ("intellectually dishonest", "trained monkey", "systemic engineering incompetence pervades an entire language community"), it's a bit of hyperbole and active trolling. You are certainly right in many of your points, and you are certainly coming off as a jerk. It may not be as cathartic for you, but I'd suggest toning it down to "reasonable human being" level in the future.
> It may not be as cathartic for you, but I'd suggest toning it down to "reasonable human being" level in the future.
The Rails community has exhibited such self-assured, self-promotional exuberance for so long (and continues to do so here), it feels necessary to rely on equivalently forceful and bellicose language to have a hope of countering the spin and marketing messaging.
Case in point, the article seriously says, with a straight face:
"They’re being found at breakneck pace right now precisely because they required substantial new security technology to actually exploit, and that new technology has unlocked an exciting new frontier in vulnerability research."
Substantial new security technology? To claim that a well known vulnerability source -- parsers executing code -- involves not only substantial new technology, but is a new frontier in vulnerability research?
This is pure marketing drivel intended to spin responsibility away from Ruby/Rails, because the problems are somehow advanced and new. This is not coming from some unknown corner of the community, but from a well-known entity with a significant voice.
I can understand your frustration with the community. I share this frustration at many times, because I feel that Rails / Ruby tends to value style over substance.
I'll also raise an eyebrow at that particular sentence, though without spending much time looking into what's backing it, I can only add that I too find it slightly hard to believe.
I definitely question your stated intent. Were you to "counter the spin and marketing messaging", would that reduce the number of vulnerable machines? Overall, reduce the number of people that use Ruby/Rails, if that is your intent? Given the number of comments you've made to that effect versus the number of folks using Ruby/Rails, I'd suggest you have a very long battle in front of you.
Put another way, I perceive your tone as an exasperated, reactionary tone to a group that you happen not to like. If you are indeed trying to achieve some greater good here, I believe there's more effective ways you could achieve it.
Otherwise, just tone it down in the future. You had good points, there's no need to insult people from an effectively unassailable position.
> Overall, reduce the number of people that use Ruby/Rails, if that is your intent? Given the number of comments you've made to that effect versus the number of folks using Ruby/Rails, I'd suggest you have a very long battle in front of you.
I'd like it to be 'cool' in the Ruby community to apply serious care towards security, API stability, code maintainability, and all the other things that aren't necessarily fun, but are very much necessary to avoid both huge aggregate overhead over time, and huge expensive failures like this one.
I'd like to see a shift towards an engineering culture where taking the time to consider things seriously is considered 'cooler' than spinning funny project names, promoting swearing in presentations, and posting ironic videos.
It seems increasingly obvious to me that for this to occur, one can succeed in pushing back against emotive marketing with a similar approach, and thus shift the conversation.
> The bug in the YAML parser was reported and the author of the YAML library genuinely couldn't figure out why this mattered or how it could be bad.
Is that seriously what happened? It sounds oddly similar to the Rails issue from about a year ago (the one in which the reporter was able to commit to master on Github), even though I believe that was a separate set of developers altogether.
If so, then that might suggest a larger community/cultural issue, which makes me wonder what other exploits exist but haven't been reported (publicly) yet...
I agree with the sentiment to help and not just throw stones, but I think all the outrage and whatnot is very useful: it makes this kind of stuff less likely to happen.
Rails needs to die. It is super nice to code in (for a certain class of problems, ie. CRUD apps) and the language is awesome but it is too big and insecure to use.
Anything you implement to replace the functionality missed by not using Rails will be, statistically, just as insecure. Arguably even more so, because you will no doubt lack the peer review a large project like Rails benefits from.
I don't think so. Rails has to cover all cases, you just have to code the few cases that you actually use.
And even if you get it wrong, you get it wrong in a different way. That might mean that you are technically more at risk, but so long as the attack is focused on getting as many targets as possible, rather than on you explicitly, that is arguably a great strategy: the cost of adapting an already existing attack to a novel target is going to be astronomically high, versus using an already existing vulnerability. If you are refining nuclear material for Iran, you are going to need all the protection you can get; if you are just another start-up, you just need to not be vulnerable to the latest drive-by exploit.
Can we please try to avoid making generalisations like this? Yes, the ruby community has some very vocal contributors with very questionable social skills. Please don't assume that all ruby developers are egotistical hipster hackers. The creator of Ruby, Matsumoto Yukihiro, is one of the most softly spoken and humble individuals I have encountered in technology. We can all learn by his example.
Some ruby devs do, yes. Sadly, this feudalistic approach is prevalent in our industry which hurts all of us. It is probably the reason we have to keep re-learning the same concepts over-and-over again.
There is no karma here, there is just a race to the bottom for all of us. I thought the point of open source was for us all to group together and find and address these issues?
Wouldn't whitelisting conforming input be a better approach? I realize it may be more difficult, but wouldn't that be more secure?
Edit: I'm genuinely interested - I always try and whitelist things when I'm building software. Although I have next to no background when it comes to security in particular.
I know it was a different vulnerability - I was asking more whether the same developer(s) was/were responsible, since this seems to be a comment pattern that I'm hearing with regards to the initial response to the vulnerabilities.
What is crazy to me is everyone has had this bug, and learned from it, and fixed it. Why has it taken so long for Rails? Java has this bug; you can't deserialize untrusted input without a lot of work. Python has this bug; you can't unpickle untrusted input. Bad Javascript JSON parsers that just call eval() have this bug. It's not a complicated concept; you can't treat untrusted user input as code to execute. How'd the YAML developers miss it?
My analysis: layering. YAML doesn't in itself execute untrusted code, but it has a bunch of semi-experimental features most people don't know about or use, and one of those features (deserializing arbitrary classes) has a side effect that sets values on certain other classes -- not written by the people who wrote the YAML parser -- that then, often later on in their lifecycle, execute this untrusted code. I'm not saying this as an excuse -- I have long mistrusted YAML's blasé complexity, not to mention Rails' anarchic pass-HTTP-params-directly-into-the-DB pattern -- but as an explanation of how they missed it.
Also, more than other communities, Ruby has a cultural gap between the people developing the language and core libraries and the people using it to write web apps and frameworks.
Thanks, the complexity and layering are presumably part of the problem. This reminds me of the old XML External Entity attack that keeps coming back because developers don't realize you can coerce most XML parsers to open arbitrary URLs. That's been affecting products that parse XML for 10 years now and still hasn't stopped and leads to ugly security holes (like in Adobe Reader). The root cause is XML is far too complex and has surprising features, in this case, entity definition by URL.
The initial code for Rubygems was written at Rubyconf '06, if I remember correctly. The Ruby world was very, very different back then. Same with Rails, originally released in '05.
My point is that it's 'taken so long' because all this code is stuff that was written in a totally different time and place. And then was built on top of, after years and years and years.
Now that it _is_ being examined, that's why you see so many advisories. This is a good thing, not a bad one! It's all being looked through and taken care of.
Because it was not obvious that "allowing de-serialization to objects of arbitrary classes specified in the serialized representation" was the same thing as "treating input as code to execute".
And then, as someone else said, because of layering. The next downstream user using YAML might not have even realized that YAML had this feature, on top of not realizing the danger of this feature. And then someone else downstream of THAT library, etc.
Maybe it _should_ have been obvious, but it wasn't, as evidenced, as you say, by all the people who have done it before. After the FIRST time it was discovered, it should have been obvious -- so why did it happen even a second time?
In part, because for whatever reason, none of those exploits got the (negative) publicity that the rails/yaml one is getting. Hopefully it (the dangers of serialization formats allowing arbitrary class/type de-serialization) WILL become obvious to competent developers NOW, but it was not before.
20 years ago, you could write code thinking that giving untrusted user input to it was a _special case_. "Well, I guess, now that you mention it, if you give untrusted input that may have been constructed by an attacker to this function it would be dangerous, but why/how would anyone do that?" Things have changed. There's a lot more code where you should be assuming that passing untrusted input to it will be done, unless you specifically and loudly document not to. But we're still using a lot of code written under the assumptions of 20 years ago -- assumptions that were not necessarily wrong cost/benefit analyses 20 years ago. And yeah, some people are still WRITING code under the security assumptions of 20 years ago too, oops.
At the same time, we have a LOT MORE code _sharing_ than we had 20 years ago. (Internet open source has changed the way software is written, drastically.) And the ruby community is especially 'advanced' at code sharing, using each other's code as dependencies in a complex multi-generation dependency graph. That greatly increases the danger of unexpected interactions of features creating security exploits that would not have been predicted by looking at any part in isolation. But we couldn't accomplish what we have all accomplished without using other people's open source code as more-or-less black box building blocks for our own, and we can't do a full security audit of all of our dependencies (and our dependencies' dependencies, etc).
Presumably they missed it the same way the developers of those other things you listed did. I can only assume they didn't know about these other problems when developing their YAML parsers.
Of course, you could argue that developers should always be thinking about and searching for security related issues in whatever field they're working in, but that doesn't appear to be the norm at the moment.
> Python has this bug; you can't unpickle untrusted input
I thought you could unpickle untrusted input in Python? Sure there's a great big red warning message on the documentation, and hence it's currently rare for people to do it, but it is technically allowed, right?
> Someone implemented a YAML parser that executed code. This should have been obviously wrong to them, but it wasn't.
Someone implemented a YAML parser that could serialize and de-serialize arbitrary objects referenced by class name.
It was not obvious that this meant it 'executed code', let alone that this meant it could execute _arbitrary_ code, so long as there was a predictable class in the load path with certain characteristics, which there was in Rails.
In retrospect it is obvious, but I think you over-estimate the obviousness without hindsight. It's always easy to say everyone should have known what nobody actually did but which everyone now does.
As others have pointed out, an almost identical problem existed in Spring too (de-serializing arbitrary objects leads to arbitrary code execution). It wasn't obvious to them either. Maybe it _should_ have been obvious _after_ that happened -- but that vulnerability didn't get much publicity. Now that the YAML one has, maybe it hopefully WILL be obvious next time!
Anyhow, that lack of obviousness applies to at least your first two points if not first three. It was not in fact obvious to most people that you could execute (arbitrary) code with YAML. If it was obvious to you, I wish you had spent more time trying to 'paul revere' it.
> The issue was reported to RubyGems multiple times and they did nothing.
Now, THAT part, yeah, that's a problem. I think 'multiple times' is 'two' (yeah, that is technically 'multiple'), and only over a week -- but that still indicates irresponsibility on the rubygems.org maintainers' part. A piece of infrastructure that, if compromised, can lead to the compromise of almost all of rubydom -- that is scary, and it needs a lot more responsibility than it got. We're lucky the exploit was in fact publicized rather than kept secret and exploited to inject an attack into the code of any ruby gem an attacker wanted -- except of course, we can't know for sure if it was or not.
"The "everybody has bugs" response is intellectually dishonest."
Indeed. It's the "fallacy of gray". Nothing is black or white, hence everything is gray. Nothing is 100% secure, nothing is 100% insecure, hence everything is "semi-secure": it's bad, but not too bad, because every language / API / server can be attacked.
You've effectively substituted a black/white dichotomy with something even worse: instead of having only two options (black or white), you now only have one: gray.
It is probably one of the most intellectually dishonest logical fallacies of all time, and we keep seeing it more and more.