Good advice for problems, but one should focus more on the underlying problem mentioned:
chances are, vital files can be accessed with the rights your application has.
When you deploy your webapp, NEVER run your web server or application container server as root. I deploy python using nginx and uWSGI where nginx runs as the "nginx" user and group, and uwsgi runs as the "uwsgi" user and group.
Since uWSGI is spawning all the python instances, if I want to be able access files on disk such as for storing and reading sessions, I must explicitly change the folder owner to the uwsgi group for the webapp to even launch.
If you take only one security precaution for your production python app, make sure none of the instantiating servers run as root and that they must be given explicit permission to access folders.
The author also makes a common mistake: blacklisting as safe string escaping. My standard is to only accept alphabetic characters in any file-based user input system (I find very few filesystem problems require more than this). For example, one of my applications allows people to upload files. They never access the file names themselves, but are allowed to input a "file name." This file name is all alphabetic and is checked for length restrictions and encoded in my own file naming system. The translations are maintained in the application database. This prevents a wide variety of attacks on the filesystem itself.
Moreover, this whole article embodies the statement NEVER, EVER, IN ANY CIRCUMSTANCES TRUST USER INPUT.
> The author also makes a common mistake: blacklisting as safe string escaping.
Which is why it says "at the very least". Werkzeug's secure filename function explicitly only whitelists characters in filenames.
> Moreover, this whole article embodies the statement NEVER, EVER, IN ANY CIRCUMSTANCES TRUST USER INPUT.
The problem is that often you will assume that certain APIs were written with that in mind when they were not. os.path.join for instance looks very innocent. I expect my database abstraction layer also to handle SQL injection projection for me, so I would be very confused if SQLAlchemy turned out to not escape strings passed in as placeholders.
The problem is that often you will assume that certain APIs were written with that in mind when they were not. os.path.join for instance looks very innocent.
Eh. I'm inclined to offer very little sympathy to anyone who believes that the os module in python provides any web injection protection. The os module is named "miscellaneous operating system interfaces" and anyone who got above a C in their OS course in college should know that no UNIX programmer took web injection vulnerabilities into account when designing the os interface.
Common mistakes as an experienced developer: over-complicate your system with a complex topology of multiple, independent services with separate maintenance, deployment and configuration concerns when a simple, monolithic script could do.
There's a world of possibilities that exist between over-complicated systems and monolithic scripts.
I think it is fair to take a moment to consider maintenance, deployment and configuration before adding complexity to a project, but a monolithic script is usually only the answer for relatively simple questions.
IIRC, the client should be resolving relative paths (using the provided or calculated base URI), and should be removing current directory (./) and parent directory (../) components before the request is even sent to the server. Server code should only ever see path-canonical requests. So it should be safe to just issue a 400 or 404 if you see ./ or ../ in the request.
Now obviously doesn't apply if you figure out which page to serve based on a submitted GET variable, like the OP seems to suggest doing. I'd see about using PATH_INFO instead (it would also make the Urls easier to read, obscure the implementation a little and potentially be more portable) But it's reasonable, and safer, to just say "the page variable has a specific subdir/subdir/file.html structure and immediately 404 if any of those components don't match ([\w–]), rather than trying to use filesystem-path semantics to resolve to a file and trusting that'll be secure and does what you want.
What about "..%2F..%2F" or "..\..\" or "..%5C..%5C"? In Java, you also need watch out for NULL (%00) in the path as well since java.io.File disregards anything after the null (most just test to see if filename ends with "ext" where ext is considered "safe", which is a huge error).
Don't ever allow the client to specify a filename; use an indirect reference such as a uuid. If you must, always perform path normalization and/or only allow alphanumeric characters.
Shouldn't sanitization tools of the kind he is talking about, be in the Python standard library? The os package is supposed to let you deal with OS-specific things in an OS-independent way. Isn't filename sanitization one of those things?
More generally, I think we should have a reasonable expectation that any library/API that gives one access to resources that are dealt with using text (file paths, HTTP requests, SQL stuff, interfaces to a command line, etc.) should include sanitization functionality.
Wouldn't that make a nice general design guideline?
It's been a while since I tried it but doesn't this also work to get you up a directory?
ExistingDir/../../parentRestrictedDir/passwords
His code seems to only check for the case where it starts with ../ of course this requires knowing of an existing directory in e current folder but that is not insurmountable for an enterprising hacker.
I'm on an iPad so I cannot check right now.
Edit: Just tried it out on my mac and it works like I remember.
Parsing paths properly is certainly a good idea. But perhaps a more important issue is why you're not running your web application in a secure sandbox in the first place.
I think SELinux is now standard part of the Linux kernel[1], and if you're using Ubuntu there's AppArmor[2].
> A few weeks ago I had a hatred discussion with a bunch of Python and Open Source people at a local meet-up about the way Python's path joining works. (emphasis mine - "heated"?)
Freudian slip? Or something halfway between a typo and a misheard word?
I have a question about the "Mixing up Data with Markup"-Problem. If you don't have markup in your database - how do you structure your content? What is a header, what is a paragraph?
If your input is HTML then your data is HTML. Author just warns about converting plain text etc to HTML before storing, thus making assumptions about data presentation.
Do you mean that it's evil when called with untrusted input by a trusting program or evil in general and not to be used under any or most circumstances?
The article suggests the former but your comment the latter. I would think the function is pretty safe and okay when you use it with a known input. Better than writing your own, anyway.
chances are, vital files can be accessed with the rights your application has.
When you deploy your webapp, NEVER run your web server or application container server as root. I deploy python using nginx and uWSGI where nginx runs as the "nginx" user and group, and uwsgi runs as the "uwsgi" user and group.
Since uWSGI is spawning all the python instances, if I want to be able access files on disk such as for storing and reading sessions, I must explicitly change the folder owner to the uwsgi group for the webapp to even launch.
If you take only one security precaution for your production python app, make sure none of the instantiating servers run as root and that they must be given explicit permission to access folders.