
On Python security amidst recent Rails/YAML vulnerabilities - whalesalad
http://nedbatchelder.com/blog/201302/war_is_peace.html?
======
patio11
I think load and dangerous_load is a wonderful paradigm for API design. It
bakes security into the core use of your framework for non-security devs. It
_also_ makes the framework more productive for security professionals, since
for an assessment you can grep for dangerous_load and focus your efforts on
making sure those (rare) calls are hermetically sealed from user input rather
than having to audit every possible use of the 100x more common load command.
(That is where the Rails community is right now, and it sucks.)

This is similar to requiring a whitelist for mass assignment (rather than
making a blacklist optional): it calls out in the code "These are our weak
points! Check them carefully!"
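
A minimal sketch of the whitelist idea, for illustration only. The `User` class, its `ASSIGNABLE` set and both method names are hypothetical and not taken from Rails or any other framework:

```python
class User:
    # Hypothetical model: only these attributes may be set from request data.
    ASSIGNABLE = frozenset({"name", "email"})

    def __init__(self):
        self.name = ""
        self.email = ""
        self.is_admin = False

    def update(self, params):
        """Whitelisted mass assignment: any key outside ASSIGNABLE is rejected."""
        rejected = set(params) - self.ASSIGNABLE
        if rejected:
            raise ValueError(f"attributes not assignable: {sorted(rejected)}")
        for key, value in params.items():
            setattr(self, key, value)

    def dangerous_update(self, params):
        """Unfiltered assignment; the name makes every call site easy to grep for."""
        for key, value in params.items():
            setattr(self, key, value)
```

With this shape, an audit only has to look at the calls to `dangerous_update`, since `update` cannot touch `is_admin` whatever the request contains.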

~~~
untog
_I think load and dangerous_load is a wonderful paradigm for API design._

It is, but I think it ignores the reality of API design. Who would ever
_start_ with dangerous_load? If you knew it was dangerous, you'd probably fix
it. No, you start with load(), then find out it's dangerous afterwards. The
fix will be a breaking change, so you make safe_load().

"You should replace load()!", I hear you cry. But that removes backwards
compatibility. And people don't update legacy software. So then old software
becomes vulnerable to who knows how many _other_ security problems because of
its obsolescence.

~~~
vog
_> ... ignores the reality of API design. Who would ever start with
dangerous_load?_

When I wrote the Texcaller library, which simplifies compiling (La)TeX code, I
disabled dangerous features such as "write18" from the very beginning.

If you don't take basic elements such as "secure/unsecure" variants into
account, you're not really doing API _design_. You then just have a
historically grown API, which is not necessarily bad in itself, but doesn't
qualify to be called _API design_.

 _> But that removes backwards compatibility. And people don't update legacy
software. So then old software becomes vulnerable to who knows how many other
security problems because of its obsolescence._

In that case, at least only old hard-to-update legacy software is affected,
and not all the other (good, mostly well-written) software that is written
today and in the future.

Of course, don't forget to increase the major version number, as for every API
change that breaks backward compatibility.

~~~
untog
_doesn't qualify to be called API design._

True. I should have said the reality of the _difficulty of_ API design.

 _In that case, at least only old hard-to-update legacy software is affected,
and not all the other (good, mostly well-written) software that is written
today and in the future._

I like the sentiment, but I can't agree with it. In an ideal world all
critical systems would always be kept up to date, but that doesn't happen.
Has Python 3 usage overtaken 2.x yet?

------
vog
From the article:

 _> PyYAML has a .load() method and a .safe_load() method. Why do
serialization implementers do this? If you must extend the format with
dangerous features, provide them in the non-obvious method. Provide a .load()
method and a .dangerous_load() method instead._

I think this is very good advice that holds in general:

The default should never be the most feature-rich version, but the _most safe
version_. This is also why you should generally prefer a whitelist approach
over a blacklist approach. And this is why templating systems should perform
escaping by default, forcing you to explicitly disable it, at concrete places,
when including raw HTML.
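
A rough sketch of what that naming could look like, using only the standard library. This is not PyYAML's API: the two functions are hypothetical, with `json` standing in for the plain-data format and `pickle` for the "dangerous features":

```python
import json
import pickle


def load(data: bytes):
    """Safe default: parses plain data (dicts, lists, strings, numbers) only."""
    return json.loads(data.decode("utf-8"))


def dangerous_load(data: bytes):
    """Feature-rich variant: can rebuild arbitrary objects and therefore run
    arbitrary code. Only for input that never came from a user."""
    return pickle.loads(data)
```

The obvious name gets the boring behaviour, and anyone who needs object instantiation has to type the word "dangerous" to get it.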

------
gingerlime
+1 for `dangerous_load`. If it was implemented this way, rather than `load`
and `safe_load`, I doubt we would have seen vulnerabilities like the one in
both tastypie and piston, the two leading API libraries for Django.

See
<https://www.djangoproject.com/weblog/2011/nov/01/piston-and-tastypie-security-releases/>

Whilst python might seem safer than the state in Ruby/Rails, it also has its
history of vulnerabilities.

------
grey-area
The comments on this post are really interesting (save the first, which
unfortunately stoops to much the sort of tribal content-free attacks we've
seen on rails vuln news on HN recently). I particularly liked the long one
from Nick Coghlan, and though it does seem there are still some worrying
vulnerabilities in python, they are ahead of ruby in their packaging system at
least in relying on signatures. Keen to see ruby gems step up and take
security equally seriously.

~~~
PommeDeTerre
Yes, Ruby and Ruby on Rails have received a lot of flak lately, but it is
well-deserved and I don't think it is "tribal" in nature.

To many of us, these are just yet another set of tools in our very large
toolbox. It's obvious that some tools are inherently better than others,
however.

When I point out that Ruby on Rails or JavaScript have some serious inherent
problems, it's not because I think that I belong to some Python "tribe" or the
Perl "camp", for instance. It's because I'm doing rational, emotionally-
detached analysis of certain pieces of software, and this analysis shows there
to be serious problems with said software.

I think the same goes for the other people out there who have the courage to
point out flaws with JavaScript, Ruby and related technologies. If anyone is
acting "tribal", it's those who are so emotionally tied to a particular
language or web development framework that they can't stand to hear legitimate
concerns regarding important factors like security, performance,
maintainability and reliability.

~~~
JshWright
In my opinion, the first comment on the post isn't 'tribal' simply because it
blindly attacks Ruby, but because it also blindly disregards the
vulnerabilities in Python (which the author carefully and reasonably
outlined).

~~~
mpyne
Have you ever looked at the pickle docs though? There is like, literally a
'red alert' banner saying how dangerous it is in combination with arbitrary
untrusted input.

PyYAML's warning is not quite so blatant
(<http://pyyaml.org/wiki/PyYAMLDocumentation>) but when you get past
installation blah blah blah onto actually loading YAML, the first thing it
says (in bold) is that using .load is as dangerous as pickle.load, and it
recommends looking at .safe_load instead.

However, it would be better for the tutorial that immediately follows to use
safe_load() _everywhere_ that it can reasonably be used and to only mention
.load as an advanced topic (except to mention at first that it exists but
shouldn't be used).
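
For anyone wondering why the pickle warning is so loud, here is a small standard-library sketch. The `Payload` class is made up for illustration and `eval("6 * 7")` is a harmless stand-in for whatever an attacker would actually run. The restricted unpickler follows the `find_class` override described in the pickle docs:

```python
import io
import pickle


class Payload:
    # __reduce__ tells pickle which callable to invoke when loading.
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for attacker code


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # calls eval("6 * 7") while loading


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so the stream cannot name any callable."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    """Loads plain containers and scalars, rejects anything needing a global."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

Here `result` is whatever `eval` returned, not a `Payload` instance: the byte stream, not the receiving program, chose what to call. `restricted_loads` still accepts plain dicts and lists but raises `pickle.UnpicklingError` on `blob`.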

~~~
Xylakant
You're missing the "real" vulnerability here. Yaml or in general object
instantiation is the attack vector - and admittedly a particularly stupid and
painful one - but the real vulnerability is sharing code via unsigned
repositories. There are more vectors to break into a repository server. So the
more serious problem is "How do we secure code?" and "How do we establish
trust for shared code?"

This issue will follow us around for quite a bit, even after the YAML bugs
have been fixed and gemcutter rebuilt. For ruby that means sign gems, for
python sign pip packages. And that's a point where python is not substantially
better off than ruby; heck, even PHP (packagist), node (npm) and java (Maven)
are in the same boat here. There's something more to learn here than pointing
out that "the pickle docs are better than the psych docs", and for all the pain
this incident brought the ruby developers, I'd be grateful if all other
language communities learned from it - but if you'd rather lean back
in your chair and point out that your community's docs state the danger
clearly[1], then you're welcome. This is exactly the trap that the first
commenter on the blog falls into. He sees this mess as a pure ruby problem and
attacks ruby instead of stepping back and trying to figure out why this
affects him as well.

[1] e.g. like npm, which helpfully states that packages should be inspected
before installing them. How many people do you expect to actually do that?

~~~
mpyne
You're exactly right to bring up the issues with code distribution (e.g. for
CPAN, PyPI), but the YAML usage is a more general/different problem. As far as
I understand it Ruby on Rails would have been vulnerable even if you installed
it from cryptographically-signed tarballs without any additional code from
Rubygems.

But pointing out that other languages don't have super-secure code
distribution systems doesn't change that they at least understand the danger
of deserializers that can run arbitrary code or create arbitrary objects,
_especially_ when Ruby is also weak in this area.

At least for Python I would hope that they had learned their lesson in 2011 when
some popular third-party Django plugins used YAML.load instead of
YAML.safe_load, instead of waiting for this. Of course, RoR devs might have
noticed the same issue at that time, but there's nothing we can do about it
now.

~~~
Xylakant
See, there's a lot more to ruby than rails - I love and use padrino, since it
doesn't include as much magic. It doesn't suffer from the rails
vulnerabilities caused by the yaml usage. There's also the problem that in ruby
yaml is the go-to marshalling and config format; that's what allowed the attack
against rubygems. But now, suddenly all ruby applications were in danger -
even my apps, even though I don't use yaml. Shell-scripts, daemons, everything
that uses ruby - whether it uses yaml.load or safe_yaml.load. Chef and Puppet
were at risk - and those provide root access to hundreds of servers, machines
that don't even run ruby apps.

And that is because gems are not signed. If gems were signed, all of this
would be a major nuisance, but with limited fallout. Signatures would get
checked, approved, done - no matter which attack was used to get to the gem
repo. There will be more attacks, using other vectors - a kernel exploit, a
webserver exploit, a mail account hacked into. And containing that fallout is
way more important. And that's the lesson that needs to be learned by all
language communities: Code distribution needs to be secured since otherwise a
single attack puts the whole community at risk.

So feel free to point at the python docs and pretend that the lack of
insight about yaml is what caused the problem. It's the spark that blasted the
powder keg, but we were sitting on it long before.

------
zobzu
And don't forget to sign your packages, because vulnerabilities will always
happen anyway, and if those compromise a distribution point, it's hard to
authenticate said packages.

~~~
__alexs
If PyPi was compromised like RubyGems I'm not sure they'd be more able to
reliably recover from it quickly either.

PyPi does support package signing (with GPG) but pip doesn't support signature
verification and hardly any packages are actually signed anyway. It's actually
probably more secure to load your python packages off of specific commits on a
public git repo over HTTPS right now. (Except that pip also doesn't validate
HTTPS certificates either...) And if you are lucky enough to be using a
package that is signed, establishing a WOT with the author to validate their
cert might not be easy.

It's not like people aren't working on this stuff though.
<https://www.updateframework.com/> have a 'secure' (the upstream isn't,
obviously) PyPi mirror and the PEP 427 Wheel
<http://wheel.readthedocs.org/en/latest/> format seems to be giving security
more consideration than previous attempts at Python packaging have.

~~~
wyuenho
Please +1 this pip ticket if you feel that supporting TLS cert and GPG
verification should be given the highest priority.

<https://github.com/pypa/pip/issues/425>

I think it's paramount that pip gets this done right now. Installing code
directly from PyPI is extremely scary now and has been for years...

------
reidrac
I'm not sure if we're talking about a language problem here. The tools are
there for you to use them in the proper way and there's always a chance that
someone misunderstands how things work.

Perl Data::Dumper can be used with eval for serialization, but it is a bad
idea. Just like using pickle in Python (most of the time, at least).

No matter what you do, there's always room for someone doing something stupid.
So I'd rather have the tools.

~~~
Argorak
Saying that it is a language problem might be true for that specific case.
YAML#load and #dump do Object marshaling, which is always dangerous when the
marshaled objects come from untrusted sources and are parsed without a template.
Similar techniques exist in almost all other languages and object
instantiation attacks are nothing unheard of. So most of the bile is
unwarranted, unless you are asking for bile back if something like this
happens in your language. Everyone who is ranting about the security bugs of
other projects clearly lacks the humility that Nick Coghlan is asking for in
the comments.

But the reason why this specific one is very widespread is actually a cultural
one: YAML was propagated as a very convenient serialization format despite the
described property, especially as it was in stdlib very early. It turned into
one of Ruby's beloved conventions. E.g. some static website generators like
Jekyll use YAML for meta data, called "front matter" (they use safe_yaml now,
don't try), Rubygems used it to dump their specs, etc. Combine that with the
fact that Rails activated certain parameter parsers without the user's
knowledge (by convention, again) and made everyone vulnerable and you have a
recipe for disaster.

This is hard to fix, but that's the pain of suddenly being under attack. But
instead of all the hate, members of other communities should take away these
learnings and educate everyone they know that uses Ruby about these topics. In
clear words, but without hate.

------
moneypenny
If the original author is reading, those are George Orwell's "pithy maxims",
not Allen Short's, but it's an entertaining way of getting a Big Brother
reference into a security/trust context.

~~~
devinj
George Orwell wrote them, but Allen Short was the person who applied them in
that way to a security context. Much of the post is apparently greatly due to
Allen Short, so he deserves some mention.

~~~
abecedarius
For what it's worth, I applied "freedom is slavery" and "ignorance is
strength" to programming back in the 90s in a rambly post on my website. I
don't know if Allen ever saw it, and security wasn't much on my mind back
then. (We're acquaintances, I admire him, and I'm glad to hear of this talk.)

