Common Mistakes as Python Web Developer

davidhollander · on Dec 24, 2010

Good advice for problems, but one should focus more on the underlying problem mentioned:

chances are, vital files can be accessed with the rights your application has.

When you deploy your webapp, NEVER run your web server or application container server as root. I deploy python using nginx and uWSGI where nginx runs as the "nginx" user and group, and uwsgi runs as the "uwsgi" user and group.

Since uWSGI is spawning all the python instances, if I want to be able access files on disk such as for storing and reading sessions, I must explicitly change the folder owner to the uwsgi group for the webapp to even launch.

If you take only one security precaution for your production python app, make sure none of the instantiating servers run as root and that they must be given explicit permission to access folders.

Locke1689 · on Dec 24, 2010

The author also makes a common mistake: blacklisting as safe string escaping. My standard is to only accept alphabetic characters in any file-based user input system (I find very few filesystem problems require more than this). For example, one of my applications allows people to upload files. They never access the file names themselves, but are allowed to input a "file name." This file name is all alphabetic and is checked for length restrictions and encoded in my own file naming system. The translations are maintained in the application database. This prevents a wide variety of attacks on the filesystem itself.

Moreover, this whole article embodies the statement NEVER, EVER, IN ANY CIRCUMSTANCES TRUST USER INPUT.

the_mitsuhiko · on Dec 25, 2010

> The author also makes a common mistake: blacklisting as safe string escaping.

Which is why it says "at the very least". Werkzeug's secure filename function explicitly only whitelists characters in filenames.

> Moreover, this whole article embodies the statement NEVER, EVER, IN ANY CIRCUMSTANCES TRUST USER INPUT.

The problem is that often you will assume that certain APIs were written with that in mind when they were not. os.path.join for instance looks very innocent. I expect my database abstraction layer also to handle SQL injection projection for me, so I would be very confused if SQLAlchemy turned out to not escape strings passed in as placeholders.

Locke1689 · on Dec 25, 2010

The problem is that often you will assume that certain APIs were written with that in mind when they were not. os.path.join for instance looks very innocent.

Eh. I'm inclined to offer very little sympathy to anyone who believes that the os module in python provides any web injection protection. The os module is named "miscellaneous operating system interfaces" and anyone who got above a C in their OS course in college should know that no UNIX programmer took web injection vulnerabilities into account when designing the os interface.

the_mitsuhiko · on Dec 25, 2010

Sympathy or not, these things happen.

settrans · on Dec 24, 2010

Common mistakes as an experienced developer: over-complicate your system with a complex topology of multiple, independent services with separate maintenance, deployment and configuration concerns when a simple, monolithic script could do.

huxley · on Dec 25, 2010

There's a world of possibilities that exist between over-complicated systems and monolithic scripts.

I think it is fair to take a moment to consider maintenance, deployment and configuration before adding complexity to a project, but a monolithic script is usually only the answer for relatively simple questions.

mcantor · on Dec 25, 2010

Example?

jimmybot · on Dec 24, 2010

Hmm, startswith('/', '../') in is_secure_path doesn't check for paths like this: './../'

Although it would prohibit some legitimate uses, you could just prohibit any usage of '../' anywhere in the path.

tav · on Dec 24, 2010

You've overlooked the critical call in the previous line:

    def is_secure_path(path):
        path = posixpath.normpath(path)
        return not path.startswith(('/', '../'))

The call to `normpath` normalises the path, e.g.

    >>> normpath('./../foo')
    '../foo'

jimmybot · on Dec 26, 2010

Ah, you're right, thanks.

thwarted · on Dec 24, 2010

IIRC, the client should be resolving relative paths (using the provided or calculated base URI), and should be removing current directory (./) and parent directory (../) components before the request is even sent to the server. Server code should only ever see path-canonical requests. So it should be safe to just issue a 400 or 404 if you see ./ or ../ in the request.

Now obviously doesn't apply if you figure out which page to serve based on a submitted GET variable, like the OP seems to suggest doing. I'd see about using PATH_INFO instead (it would also make the Urls easier to read, obscure the implementation a little and potentially be more portable) But it's reasonable, and safer, to just say "the page variable has a specific subdir/subdir/file.html structure and immediately 404 if any of those components don't match ([\w–]), rather than trying to use filesystem-path semantics to resolve to a file and trusting that'll be secure and does what you want.

marcinw · on Dec 24, 2010

What about "..%2F..%2F" or "..\..\" or "..%5C..%5C"? In Java, you also need watch out for NULL (%00) in the path as well since java.io.File disregards anything after the null (most just test to see if filename ends with "ext" where ext is considered "safe", which is a huge error).

Don't ever allow the client to specify a filename; use an indirect reference such as a uuid. If you must, always perform path normalization and/or only allow alphanumeric characters.

the_mitsuhiko · on Dec 24, 2010

posixpath.normpath can never leave ./ in front:

  >>> import posixpath
  >>> posixpath.normpath('./../test')
  '../test'

ggchappell · on Dec 25, 2010

Shouldn't sanitization tools of the kind he is talking about, be in the Python standard library? The os package is supposed to let you deal with OS-specific things in an OS-independent way. Isn't filename sanitization one of those things?

More generally, I think we should have a reasonable expectation that any library/API that gives one access to resources that are dealt with using text (file paths, HTTP requests, SQL stuff, interfaces to a command line, etc.) should include sanitization functionality.

Wouldn't that make a nice general design guideline?

weaksauce · on Dec 24, 2010

It's been a while since I tried it but doesn't this also work to get you up a directory?

ExistingDir/../../parentRestrictedDir/passwords

His code seems to only check for the case where it starts with ../ of course this requires knowing of an existing directory in e current folder but that is not insurmountable for an enterprising hacker.

I'm on an iPad so I cannot check right now.

Edit: Just tried it out on my mac and it works like I remember.

tav · on Dec 24, 2010

I've also explained this in another comment, but basically, the call to `normpath` in his code takes care of this, e.g.

    >>> normpath('ExistingDir/../../parentRestrictedDir/passwords')
    '../parentRestrictedDir/passwords'

Ysx · on Dec 24, 2010

Django avoids dirctory traversal with django.utils._os.safe_join()

http://code.djangoproject.com/browser/django/tags/releases/1...

deno · on Dec 24, 2010

Parsing paths properly is certainly a good idea. But perhaps a more important issue is why you're not running your web application in a secure sandbox in the first place.

I think SELinux is now standard part of the Linux kernel[1], and if you're using Ubuntu there's AppArmor[2].

[1] http://en.wikipedia.org/wiki/Selinux

[2] https://help.ubuntu.com/10.04/serverguide/C/apparmor.html

telemachos · on Dec 24, 2010

> A few weeks ago I had a hatred discussion with a bunch of Python and Open Source people at a local meet-up about the way Python's path joining works. (emphasis mine - "heated"?)

Freudian slip? Or something halfway between a typo and a misheard word?

macco · on Dec 26, 2010

I have a question about the "Mixing up Data with Markup"-Problem. If you don't have markup in your database - how do you structure your content? What is a header, what is a paragraph?

deno · on Dec 27, 2010

If your input is HTML then your data is HTML. Author just warns about converting plain text etc to HTML before storing, thus making assumptions about data presentation.

macco · on Dec 27, 2010

Ah ok, that is something different. My fault. Thanks.

astrofinch · on Dec 25, 2010

I recommend Unipath for dealing with the os.path.join issue:

http://sluggo.scrapping.cc/python/unipath/

bcl · on Dec 24, 2010

os.path.join is evil. Don't use it. Yes, I recently learned this myself (luckily not the hard way).

TomasSedovic · on Dec 25, 2010

Well, that's a pretty broad statement.

Do you mean that it's evil when called with untrusted input by a trusting program or evil in general and not to be used under any or most circumstances?

The article suggests the former but your comment the latter. I would think the function is pretty safe and okay when you use it with a known input. Better than writing your own, anyway.

bcl · on Dec 25, 2010

Evil with un-trusted input. And dangerous when you forget that passing it a new root will wipe the previous paths.

obiefernandez · on Dec 25, 2010

Of course the most common mistake is not using Ruby instead :-p