- Someone implemented a YAML parser that executed code. This should have been obviously wrong to them, but it wasn't.
- Thousands of ostensible developers used this parser, saw the fact that it could deserialize more than just data, and never said "Oh dear, that's a massive red flag".
- The bug in the YAML parser was reported and the author of the YAML library genuinely couldn't figure out why this mattered or how it could be bad.
- The issue was reported to RubyGems multiple times and they did nothing.
This isn't the same thing as a complex and accidental bug that even careful engineers have difficulty avoiding, after they've already taken steps to reduce the failure surface of their code through privilege separation, high-level languages/libraries, etc.
This is systemic engineering incompetence that apparently pervades an entire language community, and this is the tipping point where other people start looking for these issues.
If J2EE is a boring platform to you, pick your favorite and Google for a few variants. You'll find a serialization vulnerability. It's hard stuff, by nature.
> The bug in the YAML parser was reported and the author of the YAML library genuinely couldn't figure out why this mattered or how it could be bad.
Do you have a citation for this? What particular bug in the parser are you referring to? The behavior which is being exploited is a fairly complicated interaction between the parser and client Rails code -- I banged my head against the wall trying to get code execution with Ruby 1.8.7's parser for over 12 hours, for example, without any luck unless I coded a too-stupid-to-be-real victim class. (It's my understanding that at least one security researcher has a way to make that happen, but that knowledge was hard won.)
Yes, this is always a bad idea. It's actually in a similar problem space as the constant stream of vulnerabilities in the Java security sandbox (eg, applets); all it takes is one mistake and you lose.
And thus, people have been saying to turn off Java in the browser for 4+ years, and this is also why Spring shouldn't have implemented such code.
> It's hard stuff, by nature.
Which is why deserializing into executable code is a bad idea, by nature. I'd thought this was well established by now, but apparently it is not.
> Do you have a citation for this? What particular bug in the parser are you referring to?
The original target of that claim was the Ruby community. Now that this comment concedes the same issue exists in the Java community, are you leveling the same claim against it? Does every severe security issue that goes unnoticed by a community for some time before eventually being discovered suggest pervasive engineering incompetence throughout that entire community? Maybe you would be entirely right to make that claim, because any security issue is indicative of incompetence at some level, but I think the closer your definition of incompetence comes to including everybody, the less useful that definition is.
I'm not sure that means anything. In an OO language, you are always de-serializing into objects, and objects are always 'executable code'. Hashes and Arrays are executable code too, right?
The problem is actually when you allow de-serializing into _arbitrary_ objects of arbitrary classes, and some of those objects have dangerous side effects _just by being instantiated_, and/or have functionality that can turn into an arbitrary code execution vector. (Hopefully Hashes and Arrays don't.)
It is a problem, and it's probably fair to say that you should never have a de-serialization format that takes untrusted input and de-serializes to anything but a small whitelisted set of classes/types. And that many have violated this, and not just in Ruby.
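For concreteness, here's a minimal sketch of what "de-serializing to arbitrary data-specified classes" looks like with Ruby's Psych. The class name `AuditLogger` is hypothetical, standing in for anything resolvable on the load path; note that modern Psych (>= 4) renamed this behavior `unsafe_load`:

```ruby
require 'yaml'

# Hypothetical class standing in for anything on the load path.
class AuditLogger
  attr_accessor :path
end

# Attacker-controlled YAML can name any resolvable class with a
# !ruby/object tag; the parser allocates it and sets its ivars.
payload = "--- !ruby/object:AuditLogger\npath: /etc/passwd\n"

obj = if YAML.respond_to?(:unsafe_load)
        YAML.unsafe_load(payload)  # Psych >= 4 renamed the old behavior
      else
        YAML.load(payload)         # on older Psych, plain load was unsafe
      end

obj.class  # => AuditLogger -- the data, not the code, chose the class
obj.path   # => "/etc/passwd"
```

No method on `AuditLogger` was ever called by the caller; the data alone decided which class got instantiated and with which state.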
But if you can't even describe the problem/guidance clearly yourself, I think that rather belies your insistence that it's an obvious thing known by the standard competent programmer.
(I am not ashamed to admit it was not obvious to me before these exploits. I think it was not obvious to a bunch of people who are now, in retrospect, _claiming_ it was obvious to them.)
No. You're conflating code and state (which was the problem to begin with!).
Let's disassemble parsing a list of strings:
When you instantiate the individual string objects, you do not 'eval' the data to allow it to direct which string class should be instantiated. You also do not 'eval' the data to determine which fields to set on the string class.
You instantiate a known String type, and you feed it the string representation as an array of non-executable bytes using a method you specified when writing your code -- NOT a method the data specifies.
The data is not executable. It's an array of untrusted bytes. The string code is executable, and it operates on state: the data.
You repeat this process, feeding the string objects into the list object. At no point do you ask the data what class or code you should run to represent it. Your parsing code dictates what classes to instantiate, and the data is interpreted according to those fixed rules, and your data is never executed.
It should never be possible for data to direct the instantiation of types. The relationship must always occur in the opposite direction, whereby known types dictate how to interpret data.
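The distinction above can be sketched in a few lines. Here's a hypothetical fixed-rules parser for a comma-separated format: the code dictates the types, and the untrusted bytes are only state.

```ruby
# Minimal sketch of a fixed-rules parser for a hypothetical
# comma-separated format: the code dictates the types (String,
# Array); the data never names a class to instantiate.
def parse_string_list(untrusted_bytes)
  untrusted_bytes.split(",").map { |field| field.strip }
end

parse_string_list("alpha, beta, gamma")
# => ["alpha", "beta", "gamma"] -- always Strings, whatever the bytes say
```

No input to `parse_string_list` can change which classes come out the other end; that's the direction the relationship has to run.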
> I think it was not obvious to a bunch of people who are in retrospect _claiming_ it was obvious to.
Given the preponderance of prior art, this seems unlikely.
It was from allowing de-serialization to arbitrary classes, when it turned out that some classes had dangerous side-effects merely from instantiation -- including in some cases, 'eval' behavior, yes, but the eval behavior wasn't in YAML, it was in other classes, where it could be triggered by instantiation.
To use your language, I don't think it's 'intellectually honest' to call allowing de-serialization to data-specified classes "a YAML parser that executed code" -- that's misleading -- or to say that a 'trained monkey should have known it was a bad idea' (allowing de-serialization to arbitrary data-specified classes).
There have been multiple vulnerabilities _just like this_ in other environments, including several in Java (and in major popular Java packages). You could say that with all that prior art it ought to have been obvious, but of course you could say that for each of the multiple prior vulnerabilities too. Each time there's even more prior art, and for whatever reason this one finally got enough publicity that maybe this kind of vulnerability will be common knowledge now.
> It was from allowing de-serialization to arbitrary classes, when it turned out that some classes had dangerous side-effects merely from instantiation -- including in some cases, 'eval' behavior, yes, but the eval behavior wasn't in YAML, it was in other classes, where it could be triggered by instantiation.
That is eval behavior.
In a traditionally compiled OO language like C++, classes cease to exist after compilation; there is no fully generic way to instantiate an object of a class determined by data at runtime. So this whole concept of deserializing to whatever the protocol specifies goes completely out the window.
(You can instantiate objects with classes specified by data in Java too, although Java isn't usually considered exactly dynamically interpreted. In fact, there was a very analogous bug in Spring, as mentioned in many places in this comment thread. But anyway, okay, being 'sufficiently dynamically interpreted to allow instantiation of objects with classes chosen at runtime' is the root of the problem, you're suggesting -- if everyone just used C++ it would be fine?)
In terms of that issue request, I doubt that adding a safe_load option would have stopped the Rails vulnerability. After all, the Rails guys _already knew_ that they should not be loading YAML from the request body; that's why it was not allowed directly. The issue was loading XML, which then allowed YAML to be loaded. Allowing YAML to be loaded there was a mistake; it seems unlikely that someone would make that mistake, while at the same time mitigating it by adding safe_load.
W.r.t. RubyGems, I hear what you're saying, but that doesn't mean there's a bug in Psych. Even the feature request of adding a safe_load option strikes me as problematic...either you're limiting the markup to JSON with comments, or you'd have to name the option something like sort_of_safe_load.
Wackiness ensued: http://blog.o0o.nu/2010/07/cve-2010-1870-struts2xwork-remote...
It would obviously be unfair to claim on this basis, or the recent problems with the Java browser plugin, that the "entire Java language community" has a bad attitude on security matters. Communities are big, each of them has a range of attitudes within it, and most importantly --- regardless of attitude --- sooner or later, everyone screws up.
The particular issue in the YAML parser is explained pretty well here: http://www.insinuator.net/2013/01/rails-yaml/
First, given how many times I've seen a deserialization library "helpfully" allow you to deserialize into arbitrary objects in a language that is sufficiently dynamic to turn this into arbitrary code execution, evidence suggests this is not an accurate summary. I'd like to see "Don't deserialize into arbitrary objects" become General Programming Wisdom, but it is not there yet.
It's not like we live in a world where XSS is rare or anything anyhow. The general level of programming aptitude is low here. That's bad, regrettable, something I'd love to see change and love to help change, but it is also something we have to deal with as a brute fact.
Secondly, there are still the points of A: even if you don't use Ruby on Rails, your life may still be adversely affected by the Severity: Apocalyptic bug, and B: what are you going to do when the Severity: Apocalyptic bug is located in your codebase? And that's putting aside the obvious matter of what to do if you use Ruby on Rails and this was your codebase. The exact details of today's Severity: Apocalyptic bug are less relevant than you may initially think. Go back and read the piece, strike every sentence that contains "YAML". It's still a very important piece.
At which point a re-quoting of my favorite line in the piece is probably called for: "If you believe in karma or capricious supernatural agencies which have an active interest in balancing accounts, chortling about Ruby on Rails developers suffering at the moment would be about as well-advised as a classical Roman cursing the gods during a thunderstorm while tapdancing naked in the pool on top of a temple consecrated to Zeus while holding nothing but a bronze rod used for making obscene gestures towards the heavens." Epic.
I think that's pifflesnort's point.
You're definitely right that the security reports should be handled better. I hope that this whole situation results in a better security culture in the Ruby community.
Regarding your tone ("intellectually dishonest", "trained monkey", "systemic engineering incompetence pervades an entire language community"), it's a bit of hyperbole and active trolling. You are certainly right in many of your points, and you are certainly coming off as a jerk. It may not be as cathartic for you, but I'd suggest toning it down to "reasonable human being" level in the future.
The Rails community has exhibited such self-assured, self-promotional exuberance for so long (and continues to do so here), it feels necessary to rely on equivalently forceful and bellicose language to have a hope of countering the spin and marketing messaging.
Case in point, the article seriously says, with a straight face:
"They’re being found at breakneck pace right now precisely because they required substantial new security technology to actually exploit, and that new technology has unlocked an exciting new frontier in vulnerability research."
Substantial new security technology? To claim that a well known vulnerability source -- parsers executing code -- involves not only substantial new technology, but is a new frontier in vulnerability research?
This is pure marketing drivel intended to spin responsibility away from Ruby/Rails, because the problems are somehow advanced and new. This is not coming from some unknown corner of the community, but from a well-known entity with a significant voice.
I'll also raise an eyebrow at that particular sentence, though without spending much time looking into what's backing it I can only add that I too find it slightly hard to believe.
I definitely question your stated intent. Were you to "counter the spin and marketing messaging", would that reduce the number of vulnerable machines? Overall, reduce the number of people that use Ruby/Rails, if that is your intent? Given the number of comments you've made to that effect versus the number of folks using Ruby/Rails, I'd suggest you have a very long battle in front of you.
Put another way, I perceive your tone as an exasperated, reactionary tone to a group that you happen not to like. If you are indeed trying to achieve some greater good here, I believe there's more effective ways you could achieve it.
Otherwise, just tone it down in the future. You had good points, there's no need to insult people from an effectively unassailable position.
I'd like it to be 'cool' in the Ruby community to apply serious care towards security, API stability, code maintainability, and all the other things that aren't necessarily fun, but are very much necessary to avoid both huge aggregate overhead over time, and huge expensive failures like this one.
I'd like to see a shift towards an engineering culture where taking the time to consider things seriously is considered 'cooler' than spinning funny project names, promoting swearing in presentations, and posting ironic videos.
It seems increasingly obvious to me that for this to occur, one can succeed in pushing back against emotive marketing with a similar approach, and thus shift the conversation.
Is that seriously what happened? It sounds oddly similar to the Rails issue from about a year ago (the one in which the reporter was able to commit to master on Github), even though I believe that was a separate set of developers altogether.
If so, then that might suggest a larger community/cultural issue, which makes me wonder what other exploits exist but haven't been reported (publicly) yet...
And the RubyGems folks are trying to handle this by whitelisting specific classes that the YAML parsing will still be allowed to instantiate.
We can either sit around throwing stones at them or pull up our sleeves and help. I'm not sure what there is to gain with the former.
And even if you get it wrong, you get it wrong in a different way. That might mean that you are technically more at risk, but so long as the attack is focused on getting as many targets as possible, rather than on you explicitly, then that is arguably a great strategy: the cost of adapting an already existing attack to a novel target is going to be astronomically high, versus using an already existing vulnerability. If you are refining nuclear material for Iran, you are going to need all the protection you can get; if you are just another start-up, you just need to not be vulnerable to the latest drive-by exploit.
There is no karma here, there is just a race to the bottom for all of us. I thought the point of open source was for us all to group together to find and address these issues?
You know, kumbaya and all that...
Failure to blacklist non-conforming input.
Really, it is that simple and that complicated.
Edit: I'm genuinely interested - I always try and whitelist things when I'm building software. Although I have next to no background when it comes to security in particular.
Whitelisting is what the RubyGems folks are doing to work around this problem until a better implementation is put in place in the YAML parser.
Generally, it is a better solution, but it is more difficult and can break a lot of dependencies if not implemented correctly.
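Psych did eventually ship exactly this whitelist approach as `safe_load`. A sketch, assuming Psych >= 3.1 for the keyword-argument form:

```ruby
require 'yaml'
require 'date'

doc = "---\nname: rack\nreleased: 2013-01-08\n"

# safe_load permits only a handful of core scalar types by default;
# anything else (here, Date) must be whitelisted explicitly.
data = YAML.safe_load(doc, permitted_classes: [Date])
data["name"]  # => "rack"

# A !ruby/object tag now raises instead of instantiating:
begin
  YAML.safe_load("--- !ruby/object:Object {}\n")
rescue Psych::DisallowedClass => e
  e.message  # names the rejected class
end
```

The breakage risk mentioned above is visible here too: any document relying on a non-whitelisted type raises `Psych::DisallowedClass`, so dependencies that round-trip richer objects through YAML stop working until each class is explicitly permitted.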
Yes, you are absolutely right.
Hey, yes, the YAML bug is _very_ similar. A whitelist is better than no list at all.
Also, more than other communities, Ruby has a cultural gap between the people developing the language and core libraries and the people using it to write web apps and frameworks.
Here are two good technical writeups of the exploit as it applies to Rails apps: http://blog.codeclimate.com/blog/2013/01/10/rails-remote-cod... http://ronin-ruby.github.com/blog/2013/01/09/rails-pocs.html
My point is that it's 'taken so long' because all this code is stuff that was written in a totally different time and place. And then was built on top of, after years and years and years.
Now that it _is_ being examined, that's why you see so many advisories. This is a good thing, not a bad one! It's all being looked through and taken care of.
And then, as someone else said, because of layering. The next downstream user of YAML might not have even realized that YAML had this feature, on top of not realizing the danger of this feature. And then someone else downstream of THAT library, etc.
Maybe it _should_ have been obvious, but it wasn't, as evidenced, as you say, by all the people who have done it before. After the FIRST time it was discovered, it should have been obvious -- so why did it happen even a second time?
In part, because for whatever reason, none of those exploits got the (negative) publicity that the Rails/YAML one is getting. Hopefully the dangers of serialization formats allowing arbitrary class/type de-serialization WILL be obvious to competent developers NOW, but they were not before.
20 years ago, you could write code thinking that giving untrusted user input to it was a _special case_. "Well, I guess, now that you mention it, if you gave this function untrusted input that may have been constructed by an attacker it would be dangerous, but why/how would anyone do that?" Things have changed. There's a lot more code where you should assume that untrusted input will be passed to it, unless you specifically and loudly document not to. But we're still using a lot of code written under the assumptions of 20 years ago -- assumptions that were not necessarily wrong cost/benefit analyses 20 years ago. And yeah, some people are still WRITING code under the security assumptions of 20 years ago too, oops.
At the same time, we have a LOT MORE code _sharing_ than we had 20 years ago. (Internet open source has changed the way software is written, drastically.) And the Ruby community is especially 'advanced' at code sharing, using each other's code as dependencies in a complex multi-generation dependency graph. That greatly increases the danger of unexpected interactions of features creating security exploits that would not have been predicted by looking at any part in isolation. But we couldn't have accomplished what we have all accomplished without using other people's open source code as more-or-less black-box building blocks for our own; we can't do a full security audit of all of our dependencies (and our dependencies' dependencies, etc.).
Of course, you could argue that developers should always be thinking about and searching for security related issues in whatever field they're working in, but that doesn't appear to be the norm at the moment.
I thought you could unpickle untrusted input in Python? Sure there's a great big red warning message on the documentation, and hence it's currently rare for people to do it, but it is technically allowed, right?
This is master level, "Captain Obvious"-style trolling, beyond me how this is the top comment in a place like HN.
Someone implemented a YAML parser that could serialize and de-serialize arbitrary objects referenced by class name.
It was not obvious that this meant it 'executed code', let alone that this meant it could execute _arbitrary_ code, so long as there was a predictable class in the load path with certain characteristics, which there was in Rails.
In retrospect it is obvious, but I think you over-estimate the obviousness without hindsight. It's always easy to say everyone should have known what nobody actually did but which everyone now does.
As others have pointed out, an almost identical problem existed in Spring too (de-serializing arbitrary objects leads to arbitrary code execution). It wasn't obvious to them either. Maybe it _should_ have been obvious _after_ that happened -- but that vulnerability didn't get much publicity. Now that the YAML one has, maybe it hopefully WILL be obvious next time!
Anyhow, that lack of obviousness applies to at least your first two points if not first three. It was not in fact obvious to most people that you could execute (arbitrary) code with YAML. If it was obvious to you, I wish you had spent more time trying to 'paul revere' it.
> The issue was reported to RubyGems multiple times and they did nothing.
Now, THAT part, yeah, that's a problem. I think 'multiple times' is 'two' (yeah, that is technically 'multiple'), and only over a week -- but that still indicates irresponsibility on the rubygems.org maintainers' part. A piece of infrastructure that, if compromised, can lead to the compromise of almost all of rubydom -- that is scary, and it needs a lot more responsibility than it got. We're lucky the exploit was in fact publicized rather than kept secret and exploited to inject an attack into the code of any Ruby gem an attacker wanted -- except, of course, we can't know for sure whether it was or not.
Er, there would have been trouble on that end too ...
Indeed. It's the "fallacy of gray". Nothing is black or white, hence everything is gray. Nothing is 100% secure, nothing is 100% insecure, hence everything is "semi-secure": it's bad, but not too bad, because every language / API / server can be attacked.
You've effectively substituted a black/white dichotomy with something even worse: instead of having only two options (black or white), you now only have one: gray.
It is probably one of the most intellectually dishonest logical fallacies of all time, and we keep seeing it more and more.
It's really concerning.