Hacker News new | past | comments | ask | show | jobs | submit login
Common Mistakes as Python Web Developer (pocoo.org)
114 points by alanthonyc on Dec 24, 2010 | hide | past | web | favorite | 28 comments

Good advice for problems, but one should focus more on the underlying problem mentioned:

chances are, vital files can be accessed with the rights your application has.

When you deploy your webapp, NEVER run your web server or application container server as root. I deploy python using nginx and uWSGI where nginx runs as the "nginx" user and group, and uwsgi runs as the "uwsgi" user and group.

Since uWSGI is spawning all the python instances, if I want to be able access files on disk such as for storing and reading sessions, I must explicitly change the folder owner to the uwsgi group for the webapp to even launch.

If you take only one security precaution for your production python app, make sure none of the instantiating servers run as root and that they must be given explicit permission to access folders.

The author also makes a common mistake: blacklisting as safe string escaping. My standard is to only accept alphabetic characters in any file-based user input system (I find very few filesystem problems require more than this). For example, one of my applications allows people to upload files. They never access the file names themselves, but are allowed to input a "file name." This file name is all alphabetic and is checked for length restrictions and encoded in my own file naming system. The translations are maintained in the application database. This prevents a wide variety of attacks on the filesystem itself.

Moreover, this whole article embodies the statement NEVER, EVER, IN ANY CIRCUMSTANCES TRUST USER INPUT.

> The author also makes a common mistake: blacklisting as safe string escaping.

Which is why it says "at the very least". Werkzeug's secure filename function explicitly only whitelists characters in filenames.

> Moreover, this whole article embodies the statement NEVER, EVER, IN ANY CIRCUMSTANCES TRUST USER INPUT.

The problem is that often you will assume that certain APIs were written with that in mind when they were not. os.path.join for instance looks very innocent. I expect my database abstraction layer also to handle SQL injection projection for me, so I would be very confused if SQLAlchemy turned out to not escape strings passed in as placeholders.

The problem is that often you will assume that certain APIs were written with that in mind when they were not. os.path.join for instance looks very innocent.

Eh. I'm inclined to offer very little sympathy to anyone who believes that the os module in python provides any web injection protection. The os module is named "miscellaneous operating system interfaces" and anyone who got above a C in their OS course in college should know that no UNIX programmer took web injection vulnerabilities into account when designing the os interface.

Sympathy or not, these things happen.

Common mistakes as an experienced developer: over-complicate your system with a complex topology of multiple, independent services with separate maintenance, deployment and configuration concerns when a simple, monolithic script could do.

There's a world of possibilities that exist between over-complicated systems and monolithic scripts.

I think it is fair to take a moment to consider maintenance, deployment and configuration before adding complexity to a project, but a monolithic script is usually only the answer for relatively simple questions.


Hmm, startswith('/', '../') in is_secure_path doesn't check for paths like this: './../'

Although it would prohibit some legitimate uses, you could just prohibit any usage of '../' anywhere in the path.

You've overlooked the critical call in the previous line:

    def is_secure_path(path):
        path = posixpath.normpath(path)
        return not path.startswith(('/', '../'))
The call to `normpath` normalises the path, e.g.

    >>> normpath('./../foo')

Ah, you're right, thanks.

IIRC, the client should be resolving relative paths (using the provided or calculated base URI), and should be removing current directory (./) and parent directory (../) components before the request is even sent to the server. Server code should only ever see path-canonical requests. So it should be safe to just issue a 400 or 404 if you see ./ or ../ in the request.

Now obviously doesn't apply if you figure out which page to serve based on a submitted GET variable, like the OP seems to suggest doing. I'd see about using PATH_INFO instead (it would also make the Urls easier to read, obscure the implementation a little and potentially be more portable) But it's reasonable, and safer, to just say "the page variable has a specific subdir/subdir/file.html structure and immediately 404 if any of those components don't match ([\w–]), rather than trying to use filesystem-path semantics to resolve to a file and trusting that'll be secure and does what you want.

What about "..%2F..%2F" or "..\..\" or "..%5C..%5C"? In Java, you also need watch out for NULL (%00) in the path as well since java.io.File disregards anything after the null (most just test to see if filename ends with "ext" where ext is considered "safe", which is a huge error).

Don't ever allow the client to specify a filename; use an indirect reference such as a uuid. If you must, always perform path normalization and/or only allow alphanumeric characters.

posixpath.normpath can never leave ./ in front:

  >>> import posixpath
  >>> posixpath.normpath('./../test')

Shouldn't sanitization tools of the kind he is talking about, be in the Python standard library? The os package is supposed to let you deal with OS-specific things in an OS-independent way. Isn't filename sanitization one of those things?

More generally, I think we should have a reasonable expectation that any library/API that gives one access to resources that are dealt with using text (file paths, HTTP requests, SQL stuff, interfaces to a command line, etc.) should include sanitization functionality.

Wouldn't that make a nice general design guideline?

It's been a while since I tried it but doesn't this also work to get you up a directory?


His code seems to only check for the case where it starts with ../ of course this requires knowing of an existing directory in e current folder but that is not insurmountable for an enterprising hacker.

I'm on an iPad so I cannot check right now.

Edit: Just tried it out on my mac and it works like I remember.

I've also explained this in another comment, but basically, the call to `normpath` in his code takes care of this, e.g.

    >>> normpath('ExistingDir/../../parentRestrictedDir/passwords')

Django avoids dirctory traversal with django.utils._os.safe_join()


Parsing paths properly is certainly a good idea. But perhaps a more important issue is why you're not running your web application in a secure sandbox in the first place.

I think SELinux is now standard part of the Linux kernel[1], and if you're using Ubuntu there's AppArmor[2].

[1] http://en.wikipedia.org/wiki/Selinux

[2] https://help.ubuntu.com/10.04/serverguide/C/apparmor.html

> A few weeks ago I had a hatred discussion with a bunch of Python and Open Source people at a local meet-up about the way Python's path joining works. (emphasis mine - "heated"?)

Freudian slip? Or something halfway between a typo and a misheard word?

I have a question about the "Mixing up Data with Markup"-Problem. If you don't have markup in your database - how do you structure your content? What is a header, what is a paragraph?

If your input is HTML then your data is HTML. Author just warns about converting plain text etc to HTML before storing, thus making assumptions about data presentation.

Ah ok, that is something different. My fault. Thanks.

I recommend Unipath for dealing with the os.path.join issue:


os.path.join is evil. Don't use it. Yes, I recently learned this myself (luckily not the hard way).

Well, that's a pretty broad statement.

Do you mean that it's evil when called with untrusted input by a trusting program or evil in general and not to be used under any or most circumstances?

The article suggests the former but your comment the latter. I would think the function is pretty safe and okay when you use it with a known input. Better than writing your own, anyway.

Evil with un-trusted input. And dangerous when you forget that passing it a new root will wipe the previous paths.

Of course the most common mistake is not using Ruby instead :-p

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact