Take binary packages, for example. Sure, eggs did that. Sort of. But they also introduced a weird parallel universe where you had to stop doing normal Python things and start doing egg things. So pip eschewed eggs.
And meanwhile the community banded together to find a way to separate the build and install processes, the result of which is the wheel format.
Similarly, virtual environments are still being improved (with the improvements now being integrated directly into Python itself).
And yes, you can use pip's requirements files as a duplicate way to specify dependencies. And people do that, and it's unfortunate. Because the thing requirements files are really useful for is specifying a particular environment that you want to replicate. That might be a known-good set of stuff you've tested and now want to deploy on, it might be an experimental combination of things you want others to test, etc., but that's something setup.py's dependency system isn't good at. And repeatable/replicable environments are certainly an important thing.
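The split the parent describes comes down to two files. A minimal, hypothetical sketch (package name and version constraints are made up):

```python
# setup.py -- abstract dependencies: what the package needs, loosely pinned
from setuptools import setup

setup(
    name="myapp",
    version="1.0",
    # loose constraints: "anything compatible", resolved at install time
    install_requires=["requests>=1.0"],
)
```

A requirements.txt, by contrast, pins the exact tested set (`requests==1.2.3`, one line per package, typically produced with `pip freeze`), so another machine can replicate that known-good environment with `pip install -r requirements.txt`.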
Also, while I support his offering a full-isolation alternative, realistically not everybody is going to want to develop in a Vagrant environment. It's a great solution if you're willing to run with that sort of overhead, but not everybody is.
Just saying "use full-isolation" doesn't solve the problem where I want to deploy multiple projects in the same unix environment. Heaven forbid I might want two processes from different projects to cooperate, without needing to do everything via APIs and TCP. Overheads there are not just performance, but also development time.
Isn't that not full isolation, then?
suppose you wanted to automate the installation of a python package to a windows machine, but the package has binary dependencies, is irritating to build from source, and is distributed as a pre-built .exe interactive installer (click next, next, ...). you can `wheel convert` the exe installer to get a binary wheel archive, then automate the installation of that wheel archive with pip. hopefully this isn't a common scenario, but the fact that pip + wheel make this kind of thing possible at all is very helpful.
Can someone seriously explain what the point of it is?
As far as I can tell, it's: make life easier for windows python users (which I am one of, and I don't care about it at all; you need VS to build everything anyway; one or two wheel modules makes zero difference; it'll only make a difference when everything uses wheel, which will be never...).
Plain old JARs get this right, Maven gets this right, NuGet gets this right, NPM gets this right. Why is it so complex on Python and Ruby? Some technological aspect of (the flexibility of) the languages that needs you to basically copy over the entire world? Or just legacy of unfortunate bad design choices in earlier days?
I think the sane way to do it is to package your app using setuptools (and list dependencies) and then use pip install to install it in a new virtualenv on the production machines.
Here's how I do it on the project I'm currently on:
The issue I see is that the default tools people turn to hide this simplicity behind a shiny interface, and encourage people to think of the management tools as magic black boxes. The tools then become the only interface people know, and they're inevitably more complicated and fragile than the underlying mechanism.
Let's say you are packaging up a Java and a Python "program". Both print Hello world to stdout, but use a third-party spell checking package. Both the venv and the jar will contain that third-party package.
All python needs to go from venv to jar is a tarball process.
That is roughly where I see us going anyway. PS: anyone with a good understanding of jar files, please jump in; I would love to map out the python parts against the jar parts (i.e. pip install corresponds to what part of the Java setup process?)
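The "tarball process" itself is tiny. A sketch with made-up directory names (a real venv would also need `--relocatable` treatment, or identical paths on the target machine):

```python
import os
import tarfile

# stand-in for a built virtualenv (hypothetical layout)
os.makedirs("myenv/bin", exist_ok=True)
open("myenv/bin/activate", "w").close()

# the "tarball process": one shippable blob, unpacked as-is on the target
with tarfile.open("myenv.tar.gz", "w:gz") as tar:
    tar.add("myenv")

# inspect what got captured
with tarfile.open("myenv.tar.gz") as tar:
    print(sorted(tar.getnames()))
```

That single archive is the closest Python analogue to handing someone a jar: everything the app needs in one file, no rebuild on the other side.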
zip -r mymodule.zip mymodule/
and you have the same portability as you have with jar. For bonus points, you can append that zip to a copy of the python executable (either python.exe or whatever binary your OS uses) and it is available as a module for that instance of python. This is one way standalone Python program executables are distributed.
(this was only tangentially related, but I think it's cool. I may also be wrong)
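The parent's zip trick is easy to demo: Python's zipimport machinery treats a zip archive on sys.path like a directory (module and archive names below are made up):

```python
import sys
import zipfile

# build an archive containing a trivial package
with zipfile.ZipFile("mymodule.zip", "w") as zf:
    zf.writestr("mymodule/__init__.py", "GREETING = 'hello from a zip'\n")

# a zip archive on sys.path is importable -- no unpacking needed
sys.path.insert(0, "mymodule.zip")
import mymodule

print(mymodule.GREETING)  # hello from a zip
```

The append-to-the-executable trick works because the interpreter effectively puts its own file on sys.path, so the same zipimport mechanism kicks in.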
Jar files do not support including third-party libraries. You simply cannot put a jar in a jar. Eclipse/Maven etc. can explode the dependencies' class files inside the jar, but people rarely do that for anything other than distributing binaries of client apps (licensing becomes a massive pain, btw).
war/ear files, on the other hand, do support including dependent jar files.
The idea is to separate the operating system from the application. There is never a perfect dividing point, but at some point we can say the chef/puppet/salt infrastructure will give us the operating system, and the big binary blob (.deb) will give us everything else.
Throw in configuration as a separate function and I am going to have to lie down in a dark room.
Because that ties you to a single system wide python. Why would you want that?
You have to do actual effort to install an npm module globally (namely, provide a switch). The default does the sane thing.
No it doesn't? You have to manually create a node_modules dir in the current directory because otherwise it'll scan up your tree for any parent dir containing node_modules, which is pretty much never what you want.
Until you want to actually deploy to production -- then good luck, write your own tools.
Of course we know it's python level only isolation. We're still running in an OS with a filesystem and such. If we wanted something more we'd use jails or something similar.
>Full methods of isolation make virtualenv redundant
So what? They are too heavy handed, and 99% of the time, I don't want them anyway.
>It is very, very easy to install something as large as a Django application into a prefix. Easier, I would argue, then indirectly driving virtualenv and messing with python shebangs.
You'd argue, but you'd lose the argument.
>You need to preserve this behaviour right down the line if you want to run things in this virtualenv from the outside, like a cron job. You will need to effectively hardcode the path of the virtualenv to run the correct python. This is at least as fiddly as manually setting up your PATH/PYTHONPATH.
Yes, if only they gave you OTHER BENEFITS in exchange. Oh, wait.
In general, move along, nothing to see here...
Done it already, in production. The virtualenv fanboys didn't even notice. It's simple and elegant and works perfectly.
It was that it's "easier" ("Easier, I would argue, then indirectly driving virtualenv and messing with python shebangs").
Also, "virtualenv fanboys"? Please, are we 16 years old?
We use virtualenv and pip extensively here, with virtualenvwrapper.
pip install -r requirements.txt
Still, looking forward to some interesting comments on here.
virtualenv --no-site-packages -p "$PYTHON" "$workdir"
(cd "$my_package_dir" && "$workdir/bin/python" setup.py install)
virtualenv --relocatable "$workdir"
fpm -s dir -t deb -n "$package" -p "$package.deb" -d <system dependencies> ...
Is the author really claiming that it's easier to script a non-virtualenv deployment than a virtualenv one? If so, great, do that - the only reason I deploy with virtualenv is because, guess what, that's easier to script.
Why default to --no-site-packages? Because it helps and it's easy. No, I'm not perfectly isolated from my host system - but then the host system could have a broken libc and then nothing, not even LXC, is going to make your isolation system work. Just because you can't isolate perfectly doesn't mean there's no point isolating as much as you can.
Yes, pip builds from source. That's a lot more reliable than the alternative. The Java guys certainly aren't mocking you if they've ever done the actual equivalent thing, i.e. deployed a library that uses JNI, which is a complete clusterfuck.
(URLs as dependencies are indeed a bad idea; don't do that. The complaint about pip freeze is purely derivative of the other complaints; it's wrong because they're wrong).
I'm glad you mentioned JNI. In Java, native is the exception. In python, it's much closer to the rule. A hell of a lot of python libraries rely on C components which leak out of a virtualenv.
Building from source isn't reliable. It's quite hard, not to mention relatively slow. See the great success of RPM and APT based Linux distributions as proof of this.
pip uninstall psycopg2
pip install --upgrade psycopg2
But I guess with easy_install you can fake it by running with -m and then deleting the errant egg files from lib and bin. That's pretty easy, I guess.
Oh but hey, you know what you can do instead? Setup a virtualenv, easy_install everything and when it gets hopelessly out of date or munged, you can just delete the virtualenv directory and start again.
Snark aside, I would agree with the OP that the "feature" of installing via arbitrary URLs is an anti-pattern and encourages lazy development. Of course, not every package we build can be posted to a public package library, so there's always that issue with easy_install too. Sigh, what a mess we have. Good thing I'm still able to get work done with these tools :)
In short: build your system and its dependencies once and once only, then pass them around through test into live.
We have three competing needs: a reliable deployment process that can move a binary-like blob across multiple test environments
A need for exactly reproducible environments but without dd/ghosting
A desire to keep things simple
Isolation is good - whether through the mostly isolated approach of venv, the almost total isolation of jails/LXC or the vagrant approach. But they focus almost entirely on binary builds - how does one pass around a python environment without rebuilding it and its dependencies each time à la pip?
Well, by taking the running, built python environments and passing them into a package manager like apt and calling that a binary. That might mean tarballing a venv or tarballing /usr/local/python, but in the end what matters is that we pass around the same basic bits.
I am working this out in pyholodeck.mikadosoftware.com and in my head - when I have a good answer I will shout
"For python packages that depend on system libraries, only the python-level part of those packages are isolated."
And there's nothing really bad about it. Well-written python libraries will work with any previous version of the library they're wrapping. They will also report incompatibilities. It's ok to use system libraries - especially if you're advocating getting rid of virtualenv as author does.
"Full methods of isolation make virtualenv redundant"
Well... no. There are times when installing a local version of some library is required and it cannot be installed system-wide, or it will break system's yum for example. You're not only isolating your app from the system, but also the system tools from the app.
"virtualenv’s value lies only in conveniently allowing a user to _interactively_ create a python sandbox"
There's nothing interactive about what `tox` does, for instance, and it's a perfect illustration of why virtualenv is useful. You can have not only a virtualenv for testing your app, but also multiple configurations for different selected extras - all living side by side.
"Clearly virtualenv advocates don’t want any hidden dependencies or incorrect versions leaking into their environment. However their virtualenv will always be on the path first, so there’s little real danger"
Until you want the same package that's available in the system, but your app's version constraint is not consulted when the system's package is upgraded. Or you want different extras selected. Or your deps are incompatible with some system application's deps, but you're calling it via subprocess (this is also where changing the python path in the shebang comes in useful).
Venvs are definitely not perfect, but for testing and installation of apps, they're amazingly useful. Binary libs issue is definitely annoying, but there's a different solution for it and I'm happy to see it used more often - don't compile extensions, but use cffi/ctypes.
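As a taste of the ctypes route the parent suggests: calling into an already-installed shared library with no compile step at all. (This sketch assumes a findable libc, i.e. Linux or OS X.)

```python
import ctypes
import ctypes.util

# locate and load the system C library -- no headers, no compiler
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# declare the signature, then call straight into native code
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-7))  # 7
```

Because nothing is compiled into the virtualenv, there's no build artifact to leak or go stale: the binding is resolved at runtime against whatever library the system provides.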
Regardless, my experience with it so far has been... ideal. It really makes building environments and linking/unlinking packages a breeze. I haven't needed it for building my own packages yet, so we'll see how that goes.
I started using it recently and I see no need for virtualenv anymore.
I have nothing to say about the pip issue though, never had an issue with pip myself.
Generic algorithm of making things better:
0. Give it a go to fix it oneself first. Really.
1. Failing the previous, raise the perceived deficiency with a specific and workable proposed solution.
2. Failing the previous, indicate what's undesirable and how, and what behavior would be desirable.
3. Failing the previous, put a monetary bounty on the feature, fork the project or live with it. Rewriting from scratch has a 99.99% probability of being several times more work than it seems.
--no-site-packages has been default for a while. http://www.virtualenv.org/en/latest/virtualenv.html#the-syst...
I don't really see the argument about compiling against system headers and libs. Generally I do want to isolate my Python modules that are calling into other binary libs, but don't care about isolating those binary libs themselves, because their interface isn't changing when the Python wrapper for them changes. This is unless they are part of what I'm wanting to develop/deploy with, in which case the source will be in the virtualenv and installed into the virtualenv using the config script at worst. A frequent example: Pygame will compile against some system libvideo.so, whose behavior I never change, but many Pygame versions might have their own APIs etc., and so the many compiled versions do have their own use.
Virtualenv is actually pretty noob friendly because one of the mistakes I see far more frequently than the others is that users will install things using pip system-wide that conflict with the system package manager. This can become pretty difficult to unscrew for inexperienced Linux users.
I've been meaning to add some virtualenv docs, because of the frequency with which inexperienced Python and Linux users will waltz in and be unable to compile something because only an old version of Cython etc. is available on Ubuntu 11.blah. And thus we start dragging distribution-specific package managers into the realm Python package management was intended for, and people try to figure out what version of Ubuntu they need instead of figuring out that they can install everything in one place, maintain an entire slew of projects without conflicts, and skip the IRC call when synaptic clobbers things.
I know pip isn't perfect. I know venv isn't perfect. They do work pretty well though. And when you find something that works well for you in your process, use it.
Some valid points (many of which have been on articles featured on HN before). Shame about the tone.
You can use pip and virtualenv as well, perhaps by creating a parallel Python install in /opt or something like that, and then packaging that up as an RPM if needed.
But if you are installing hundreds of binary files, dlls and using requirements.txt as the main way to specify dependencies you are probably going to end up with a mess.
It is much harder if you have multiple OSes to support. Installing cleanly and easily on Windows, RHEL/CentOS, Ubuntu and Mac OS X is hard. But if you target a specific server platform, like say CentOS 6, take a look at RPMs.
I was just saying it is good to be aware of RPMs and APT packages. Even old and crusty setuptool has RPM support.
$ python setup.py bdist_rpm
You seem to need multiple OSes. That is hard, and I haven't found a clean universal solution.
It also depends on the software. We have lots of mixed C/C++/Node.js/Java/Python/big-data files. Using virtualenv/pip, then unpacking tar.gz, make; make install, then Java's (whatever it has), then npm, all to set up a repeatable dev and test environment, would be a horrible mess. That is what RPM packages provide for us.
I think the argument above (boss runs different OS) is a fallacy - you want to deploy to the same target OS, probably in the cloud, so optimise for that first then fiddle with different OS. I guarantee people will prefer deploying a cloud server and logging in "just to see" and be happy with manually bringing things up with `setup.py develop` locally.
You guys really got me excited about this. If I can get any air soon, I'll try to learn more about it!
Some of the recipes I have seen go more into configuration-management-like stuff but it is cool to see a single buildout script deploy nginx, DB, deps, and app in one go, on any linux box.
Would anyone please link to a practical, working example of this? I want to use buildouts, but I learn from example, and there seem to be very few examples of how to deploy a production configuration. How do the pros do it? What are the gotchas? Is there a book I can buy? Will someone please put together a PDF explaining all this, so that I can throw money at you?
EDIT: Arg, that's exactly what I mean... https://github.com/elbart/nginx-buildout is an ok example just to learn the basics of buildout, but making it "production ready" (i.e. extending it to build postgres, etc) is left as an exercise for the reader. I was really hoping to find a production buildout example... (But thank you, rithi! I appreciate you took the time to dig that one up for me.)
An example of how to combine many of these, for deploying a complex stack, could be:
Another somewhat complex example:
You might also find this enlightening: http://glicksoftware.com/blog/using-haproxy-with-zope-via-bu...
But most people just want to be able to write two apps that use different libraries, and for that, virtualenv is fine.
After spending some time with Node.js, I have become spoiled by how well npm works for everything. Admittedly, if you want to mess with node versions you need something like nvm, but it is easy to get the hang of.
Seeing something as simple as npm for python would be awesome.
NPM is a wasteland of abandoned projects and nested node_modules with broken symlinks to or from a bin that'll never work on your VM's shared directory (unless it's VMware, I guess; some kind of secret sauce they use).
Having said that, they're all OK tools that people use to build pretty cool things. I just think there's got to be a better way, somewhere in between an extern and the npm/pip thing, where you store your dependencies somewhere you don't have to worry about a site being down during deploy, or about versioning issues.
It's like you freeze and shrinkwrap, but by actually capturing everything you need once your shit works and putting it somewhere you control, not someone else.
2. Some tools make it easier to detect the mistakes and guide you to a solution. Some present you with unintelligible messages or a (for end-users rather useless) stacktrace. Others describe the problem nicely (in prose), point you to FAQ/documentation or tell you what the most likely solution is.
You're assuming that setting up virtualenv is a heroic, error-prone process rather than a couple of seconds the first time you get a new system – just like node/npm.
Once you've installed virtualenv or npm, the process is identical: you need something, you install it and if something breaks you have to debug that particular package. In both cases, you're going to need to be able to read an error message and in neither case does the challenge usually involve packaging rather than, say, an issue with a shared library or incompatible/unavailable dependencies.
> 2. Some tools make it easier to detect the mistakes and guide you to a solution.
Again, there's no meaningful difference between the two unless you choose to make your environment complicated, which is not a problem specific to a language. I use both node and python on a regular basis and there's no general conclusion to be made about either one – npm installs are slower, python requires me to activate a virtualenv when I open a new window, and none of that really matters because no developer should be spending all day installing packages or opening terminal sessions.
1. If this is truly soul-crushing, it's a solved problem: https://gist.github.com/codysoyland/2198913
If you've got a bunch of project boilerplate, you can start a new, empty project, configure it to your boilerplate scenario, and then save that as a template for future projects.
Then, the next time you start a new Django app, you'd do something like this:
virtualenv project_name; cd project_name; source bin/activate
django-admin.py startproject --template=/Users/username/Django-templates/boilerplate project_name
pip install -r project_name/requirements.txt
$ curl -O https://pypi.python.org/packages/source/v/virtualenv/virtualenv-X.X.tar.gz
$ tar -xvzf virtualenv-X.X.tar.gz
$ python ./virtualenv-X.X/virtualenv.py myEnv
$ source ./myEnv/bin/activate
NPM may be simple to use, but it is not accurate.
No other posts on this 'python rants' blog, nothing else on HN?
IMO this is astroturfing ( http://en.wikipedia.org/wiki/Astroturfing )
And after reading it, I've realized I've been plain wrong in how I've been constructing my python related Docker containers.
Full disclosure: I love pip. I love virtualenv. I use them religiously.
I also work for the Docker team.
Full disclosure: I know two ex-coworkers at Docker; no financial ties.
I don't like virtualenv quite as much, but maybe I haven't juggled enough python versions at the same time (I don't often work on codebases so disparate that virtualenv becomes a real need, and it's a little obtuse for me to set up)...
As a current member of the `pip freeze > reqs.txt` brigade, I'd be interested in seeing a more detailed look at how to do it "right"/better.
Getting a conda package from PyPI can be as simple as
`conda build --build-recipe <pypi-name>`
There’s some history worth considering. LXC requires Linux >= 2.6.24, released 24 January 2008. Virtualenv was released in October 2007. So LXC was hardly an option at the time.
Virtualenv was, I think, a pretty good attempt at a pragmatic solution for purely python-related dependency management issues at that time. I found it a hell of a lot easier and quicker than chrooting or building a whole new python installation (I used to do that) or using (shudder) zc.buildout. System-level virtualization was pretty heavyweight in 2007.
I think maybe virtualenv is showing its age a bit; I agree that the system library isolation issue is a huge hole in the virtualenv approach. But often, it's enough to get work done.
As for pip vs. easy_install, anybody who was around at the time (sorry I can’t tell from the pythonrants blog if that includes "Adam" or not) remembers that life with easy_install was horrifically painful. It was buggy, often failed with completely unhelpful messages, and issues with it (and setuptools more generally) were simply not getting fixed at all. For _years_. (That is finally changing more recently, thankfully.) Pip was intended to route around all that (while still using setuptools internally) by doing less and by having less painful failure modes. As one example: if you tried to easy_install a bunch of packages, or one package with a bunch of dependencies, and one somewhere in the middle failed because it couldn’t find a dependency, you’d end up with a fucked environment that had half of the packages installed and half not. And since there was no easy_uninstall, you had no easy way to clean up the mess. And you wouldn’t even have any easy way to know what actually depended on the dependency that failed to install. Pip took the much nicer approach of downloading everything, resolving all dependencies, trying to tell you what depended on something that couldn’t be found, and building everything before installing any packages at all. So if there was a failure prior to the installation phase, it had no effect on installed libraries at all. It’s hard to overstate how much pain relief this provided.
For another example: setuptools and easy_install allow installing different versions of the same package into the same python environment at the same time. I’m not sure why that was ever considered desirable, because every time it actually happens, it’s been a source of nothing but pain for me and didn’t even seem to work as advertised. Pip took the opinionated approach that only one version should be installed and if you want different versions for some other application, just go build a separate environment for it (virtualenv or no).
I agree that the --no-site-packages vs. --system-site-packages options to virtualenv are problematic, but for me that's largely because it amounts to a binary choice between having to build all C extensions from scratch vs. having to depend on whatever happens to be installed system-wide, which is a pretty poor choice to have to make.
As for the statement that “there’s little real danger” because “their virtualenv will always be on the path first”, he's forgetting (or possibly doesn’t know) that the easy_install.pth file messes with sys.path as well. I had several hair-tearing sessions trying to figure out why on earth the wrong version of some package was getting imported before I realized that. At one point I found that import behavior changed depending on which version of setuptools or distribute I had installed. That was fun.
It’s also true that people often misuse requirements files. I don’t think that’s the fault of the tool. Depending on the current commit of some master or other branch is idiotic regardless of what mechanism you use. Depending on a specific tarball URL is no more or less reliable than depending on a specific package version being available via pypi (hasn't everybody had the experience of somebody yanking away a package version you depended on?). The right thing to do is probably neither, but to host all your dependencies somewhere that you control (whether that’s your own python package index or .deb or .rpm server or whatever).
The .pth files don't do anything until their own paths are added to sys.path. It's safe to consider directories with .pth files as if they contain python modules.
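The point is easy to verify: a .pth file is inert until the directory holding it is processed as a site directory, which `site.addsitedir` does on demand just as `site-packages` is processed at startup. (All paths and names below are made up.)

```python
import os
import site

# a package tucked away where the default sys.path won't find it
os.makedirs("extras/libs/vendored", exist_ok=True)
with open("extras/libs/vendored/__init__.py", "w") as f:
    f.write("NAME = 'vendored'\n")

# a .pth file: each line is a path to append to sys.path
with open("extras/paths.pth", "w") as f:
    f.write(os.path.abspath("extras/libs") + "\n")

# nothing happens until 'extras' itself is treated as a site dir;
# addsitedir() adds it to sys.path and processes its .pth files
site.addsitedir("extras")

import vendored
print(vendored.NAME)  # vendored
```

Before the `addsitedir` call, `import vendored` would raise ImportError: the .pth file sitting on disk does nothing by itself, which is exactly why easy_install.pth only bites when it lives inside an active site directory.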