A non-magical introduction to pip and virtualenv for Python beginners (dabapps.com)
241 points by j4mie on April 18, 2013 | 82 comments



> Python actually has another, more primitive, package manager called easy_install, which is installed automatically when you install Python itself.

It's actually not, it's part of setuptools/distribute, though some Python distributions (actually just brew that I know of) include distribute alongside Python.

Also, while the quick skim of the rest of this looks mostly good, there's some unnecessary advice which complicates things.

virtualenv is not necessary if the crux of the advice is "don't install packages globally" (which is fantastic advice, second probably to the more important advice "don't install stuff with sudo").

What beginners (and even some more experienced programmers) need to learn about is --user. You install packages with `pip install --user thing`, and they become per-user installed (be sure to add `~/.local/bin` to your `$PATH`). This is enough to not require sudo and for 99% of cases is actually sufficient.
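
To see where `--user` installs end up on a given machine, you can ask the interpreter. This is a minimal sketch using the standard `site` module; the `bin` subdirectory is an assumption that holds for typical Linux setups (matching the `~/.local/bin` mentioned above) and may differ on Windows or some macOS builds.

```python
import os
import site

# Base directory for per-user installs (typically ~/.local on Linux).
user_base = site.getuserbase()

# Directory where `pip install --user` places importable packages.
user_site = site.getusersitepackages()

# Assumption: console scripts land in "bin" under the user base (POSIX layout).
user_bin = os.path.join(user_base, "bin")

print("user base:", user_base)
print("user site-packages:", user_site)
print("scripts directory (POSIX assumption):", user_bin)

# Check whether that scripts directory is already on PATH.
on_path = user_bin in os.environ.get("PATH", "").split(os.pathsep)
print("scripts directory on PATH:", on_path)
```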

There is a rare case where you actually have packages that depend on different, conflicting versions of another dependency. This has happened ~1 time to me.

Don't get me wrong, I like virtualenv for other reasons (relating to workflow and maintenance), but if we're teaching people about packaging, there's no particularly great reason to dedicate most of a tutorial to it.


> What beginners (and even some more experienced programmers) need to learn about is --user. You install packages with `pip install --user thing`, and they become per-user installed (be sure to add `~/.local/bin` to your `$PATH`). This is enough to not require sudo and for 99% of cases is actually sufficient.

Is this relevant advice for packages needed by your web server account, e.g. www-data?


Thanks, this is great advice. easy_install comes preinstalled on a Mac, which is where the confusion arose. Will update the article when I get a chance.


As I install/develop my python apps into VMs with a fixed python version, I rarely use virtualenv... don't see the need for the extra complexity.

I've eagerly read pieces like this but haven't yet found out the reason this solution is problematic or that I'm doing it wrong. Just that no one else seems to be recommending it. Anyone have an idea?

Btw, one of the best discussions of the various deployment options I've seen is from the Pylons book: http://pylonsbook.com/en/1.1/deployment.html#choosing-or-set...

One other thing I'm not happy about regarding packaging best practices (and PyPI) is that security updates can't be automated, which leaves vulnerable packages in place.


Virtualenv is about managing multiple/conflicting versions of libraries, not different versions of Python itself. You're accomplishing the same thing by using a different VM for each app, just with higher overhead and isolation of things other than Python as well.


I recently learned the lesson about global packages. I thought it would be so nice to have all packages readily on hand, but now startup time is about 15 seconds of crawling the filesystem looking for files (over NFS).

Go is looking more attractive day by day.....


"It hurts"

"Well, stop doing it then..."

Jests aside - Go is appealing for many reasons but your self-inflicted pain is not necessarily one of them. ;-)


How would someone who has used pip with sudo undo the mess that it has created on his computer?

Uninstall everything and start over?


Just start using virtualenv for everything. By default virtualenv starts you with a clean python setup each time and nothing you've installed globally with pip will affect you.
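
If you are unsure whether the interpreter you are running belongs to a virtualenv or to the global install, you can check from Python. This is a rough sketch: `in_virtualenv` is a hypothetical helper name, and it assumes that older virtualenv releases set `sys.real_prefix` while `venv` and newer virtualenv releases make `sys.base_prefix` differ from `sys.prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases keep the original prefix in base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("inside a virtualenv:", in_virtualenv())
print("packages install under:", sys.prefix)
```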


Yeah. You can use 'pip uninstall'.


I ended up doing the following, for those who are in the same situation:

    pip freeze > pip_list_with_sudo.txt
    sudo pip uninstall -r pip_list_with_sudo.txt
So now let's start doing things right with virtualenv...


Want to make life even easier? Check out virtualenvwrapper.

http://virtualenvwrapper.readthedocs.org/en/latest/

Lots of benefits, but trading 'cd path/to/my/project && source env/bin/activate' for 'workon project_env' (with autocomplete) is alone easily worth the five seconds it takes to check it out.


I find it's really useful for beginners to clearly understand what's actually happening under the covers before they start adding magical stuff on top. They can always add magic later, if they prefer that approach.


> Lots of benefits, but trading 'cd path/to/my/project && source env/bin/activate' for 'workon project_env' (with autocomplete) is alone easily worth the five seconds it takes to check it out.

And/or use the virtualenvwrapper plug-in for oh-my-zsh. Automatically activates virtualenv when you cd to the working directory.


The only problem with virtualenvwrapper is that setting it up can and does cause problems for beginning developers new to the shell. Otherwise it's really awesome. :-)


Definitely worth using as an experienced Python person. But you can skip a lot of confusion with new Python developers by waiting until later to introduce it :)


Man, global system-wide installations that require admin rights by default? That's certainly something! Quite the stark comparison to Node.js and npm, where everything is installed locally into the current directory (under node_modules) by default, and "global" installation is actually a per-user installation. Tricking pip with virtualenv seems to get you pretty close to what you get by default with npm, albeit still somewhat more clunky. But to be fair, most other package managing solutions seem to pale in comparison to npm :-)

Either way, nice article. Now if only most packages weren't still for Python 2... PyPI says it has 30099 packages total, but only around 2104 of them are for Python 3 (according to the number of entries on the "Python 3 Packages"-page[1]).

[1] https://pypi.python.org/pypi?:action=browse&c=533&sh...


npm's default for -g is to install to Node's prefix, which is usually /usr or /usr/local. If you want it to install to your home directory, you can set the prefix to somewhere appropriate in your ~/.npmrc, which gives roughly the same behavior as pip's --user flag.

Edit: perhaps you changed your .npmrc or set the option via npm and forgot about it? I just checked on a fresh user, and 'npm install -g' definitely tries to install to /usr, just like pip.


I use Windows as my main OS, and by default npm -g installs packages to %AppData%, which is user-specific. I guess it's different on *nix, then.


That sounds broken to me. Then it's no longer -g for global. You do run your app under a different account than your user-account, right? Something like "nodeuser"?


>You do run your app under a different account

No, why would I do something like that on a development box? (Or run web stuff on Windows servers for that matter.) And pretty much the only things I install with -g are useful CLI tools - any code I write will have its dependencies installed locally and listed in package.json for 'npm install'.


It wasn't clear (to me) that this was a development box. And it certainly wasn't something npm could know -- so my point still stands. If there's a way to install packages globally, then they should be globally available -- also on windows. But perhaps this is documented somewhere.

As for why you would run stuff on windows, perhaps you were writing an ajax gateway to a legacy system and it made more sense to run the node server on the same machine as the legacy system?

(To be clear, I would pity you if that was the case, but you never know ;-)


Are you trolling?

The primary use case of npm is quite different. No one installs system-wide npm packages.

Virtualenv solves a different problem (creating a complete Python environment inside a directory) so you can replicate the various production setups on your machine and develop. It's not a way to avoid admin privileges when installing system software; for that you can just pip install --user, use homebrew, whatever.


>Virtualenv solves a different problem

Based on the article I'd say the main reason to use it is so that you can have what amounts to local packages instead of having to rely on global packages (be they system-wide or user-specific). This is what npm does by default - packages are installed locally to node_modules.

And for replicating production setups I'd rather take it a step further and use something like Vagrant instead of replicating just one part of the setup (Python).


    pip install --user XXX  
should get you what you want. Not default but not a huge burden either.


...Which would be equivalent to npm install -g XXX, whereas to replicate npm install XXX you'd need virtualenv. I don't think there even is an equivalent to pip install XXX without virtualenv with Node/npm (global system-wide installation for all users that requires admin rights).


If you're in Brighton in the UK and you like this then maybe you'll be interested in the one day workshop run by Jamie (author of the blog post) and myself. This post was actually based on some of the material written by Jamie for the course.

Next one is happening next Thursday and there are still a couple of tickets:

http://dabapps.com/services/training/python-for-programmers


"pip is vastly superior to easy_install for lots of reasons, and so should generally be used instead."

Unless you are using Windows, as pip doesn't support binary packages.


Nice article, but after using leiningen (the clojure solution to a similar problem, based on maven), it's really hard to go back to something like this. I really, really wish there was an equivalent in python (really, every language I use).


He mentions not checking the env directory into git. Why not?

In general what are the best practices for using virtualenv with version control?


Re: "Why not?"

Generally you should strongly avoid putting generated artefacts into version control. This leads to complete pain if ever you find yourself trying to diff or merge when they inevitably change. The problem is that you end up with conflicts which are completely unnecessary - you should always be able to just regenerate the virtualenv at any time.

This is especially true for non-relocatable artefacts (as others have mentioned) such as virtualenvs or compiled binaries.

Another thing is that these generated artefacts can be costly in terms of space consumed in the repository - maybe not so much for a virtualenv with one package in it, but for binaries or larger virtualenvs, these things can become quite large. In addition they're often not so friendly for git's delta compression which is better suited for textual data. You can end up unnecessarily increasing the size of your repository significantly, which is another thing best avoided.


Keep your requirements file checked in, but not the virtualenv. The built env should be seen as disposable, and is both location (i.e., path on disk) and platform (for libs which are not pure Python) specific.

I keep my virtualenvs in ~/.virtualenvs/, away from the project.


I find it best to keep virtual envs completely away from the project (I use http://virtualenvwrapper.readthedocs.org/en/latest/ which puts them by default in ~/.virtualenvs). A virtualenv is completely machine-specific.

If your project is a package itself (i.e. it has a setup.py file), then use that file to specify dependencies. On a new machine I check out a copy, create a virtual env and activate it. Then in the local copy I run "pip install -e .". This installs all the requirements from setup.py in the virtualenv, and links the local copy of my project to it as well. Now your package is available in the virtual env, but fully editable.

If your python project is not a package, you can install its dependencies in a virtual env with pip. Then run "pip freeze" to generate a list of all installed packages. Save that to a text file in your repository, e.g. ``requirements.txt``. On a different machine, or a fresh venv, you can then do "pip install -r requirements.txt" to set everything up in one go.
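
For anyone curious what ends up in that file: `pip freeze` writes one `name==version` line per installed package. Here is a minimal sketch of reading such a file back; `parse_pins` is a hypothetical helper and the sample contents are made up for illustration.

```python
def parse_pins(text):
    """Map package name -> pinned version for 'name==version' lines."""
    pins = {}
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or "==" not in line:
            # Skip blank lines, -e/URL entries, and unpinned names.
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Hypothetical requirements.txt contents.
sample = """\
Django==1.5.1
requests==1.2.0
# comments are ignored
"""

for name, version in sorted(parse_pins(sample).items()):
    print(name, version)
```

On a fresh virtualenv, `pip install -r requirements.txt` then installs exactly those versions.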


Alright, so after I set up the environment using pip and virtualenv, I see it has python in it, etc. If I use pip freeze > requirements.txt, it lists the packages I have installed using pip, but it doesn't list anything for the python version itself. How do I make sure the right python version gets captured if I don't check in the /env/ folder?


> How do I make sure the right python version gets captured if I don't check in the /env/ folder?

Document it in setup.py:

    import sys

    # Refuse to install on interpreters older than Python 2.6.
    if sys.version_info < (2, 6, 0):
        sys.stderr.write("Foo requires Python 2.6 or newer.\n")
        sys.exit(1)
You're using setup.py, right? ;)


Heroku allows specifying the Python version in a file called runtime.txt, which is analogous to requirements.txt:

   https://devcenter.heroku.com/articles/python-runtimes
I think this works well as a convention even if you're not deploying to Heroku. I also like the suggestion to put a guard in setup.py that checks sys.version_info.
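
If you adopt that convention outside Heroku, a small check can compare the file against the running interpreter. This is only a sketch: it assumes the file holds a single line of the form `python-X.Y.Z` (the format Heroku documents), and `runtime_matches` is a hypothetical helper, not part of any tool.

```python
import sys


def runtime_matches(runtime_line, version_info=sys.version_info):
    """Compare a 'python-X.Y.Z' line against an interpreter version tuple."""
    prefix = "python-"
    line = runtime_line.strip()
    if not line.startswith(prefix):
        raise ValueError("expected a line like 'python-2.7.4'")
    wanted = tuple(int(part) for part in line[len(prefix):].split("."))
    # Compare only as many components as the file specifies.
    return tuple(version_info[:len(wanted)]) == wanted


# Example with explicit version tuples, so the result does not depend on
# the interpreter running this sketch.
print(runtime_matches("python-2.7.4", (2, 7, 4)))
print(runtime_matches("python-2.7.4", (3, 3, 0)))
```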


Virtual envs don't relocate well, they tend to be very specific to machine and even install location. Plus you can recreate them from your requirements.txt file so there's no need.

Just add the env directory to your .gitignore


check in your requirements.txt file and run pip install -r requirements.txt on whichever machine you've just cloned the git repo to.


What if you want to perform a security update for one of the libraries on the server?

Also, IIRC the environment will contain a symlink to the python executable and companion files. That symlink will change depending on the environment.


You could have issues with other people not running the same version of python, also you might have different site packages if you are working on an experimental branch that will be pushed or merged later.


I would like to give some different advice regarding creating virtualenvs and installing dependencies:

When you create the virtualenv, the current package you're working on doesn't get added to site-packages, so you're forced to be at the repository root to import the package.

The best approach is to have a proper setup.py file so you can do `python setup.py develop`, which will link the package you're working on into the virtualenv site-packages. This way it acts as if it's installed and you can import it any way you like.

If you define your requirements in setup.py (I think you should), you can even skip the `pip install -r requirements.txt` step.

I've cooked up a package template that can help getting this working:

https://github.com/hcarvalhoalves/python-package-template


I'd like to know what the best practices with regards to security are for using pip, or installing packages in general.

How do you verify package integrity? Do you simply pray that PyPI isn't compromised at the moment, or do you download your packages from Github instead, because the main repositories have more eyeballs on them?

How do you do security updates with pip?

I'm using apt-get at the moment which gives me security updates AFAIK, but my need is growing for more recent versions and certain packages that aren't accessible with apt.


One important note is to use pip>=1.3 (included in virtualenv>=1.9) as prior to this version, pip downloaded from pypi using http and was thus vulnerable to man in the middle attacks.

You might also like to check out wheel, which allows you to compile signed binary distributions that you can install using pip.


    Python actually has another, more primitive, package manager called 
    easy_install, which is installed automatically when you install Python itself. 
    pip is vastly superior to easy_install for lots of reasons, and so should 
    generally be used instead. You can use easy_install to install pip as follows:
I found it quite ironic that the author says pip is "vastly superior" to easy_install and then proceeds to install pip using easy_install.


It's similar to using Internet Explorer to download Chrome or Firefox.


Those are two non-related things, unless your only metric of superiority is that it comes pre-bundled.


Really hope someone could write a similar introduction to buildout (http://www.buildout.org/)


You might enjoy:

http://jacobian.org/writing/django-apps-with-buildout/

Which covers a lot of the same ground, but with buildout.


Thanks for the article! I recently spent some time going through the process of learning VirtualEnv / Pip; after looking at several tutorials, agreed that this is one of the better ones out there. A few other things I saw in other articles that you might like to clarify (though I understand there's a tradeoff with simplicity/clarity).

(1) specifically state that `pip freeze` is how to create a requirements file (as folks have said in comments already)

(2) add a "further reading" link on VirtualEnvWrapper, as it adds some convenience methods to ease use of VirtualEnv

(3) the "yolk" package shows you what's currently installed; it can be helpful to `pip install yolk` then run `yolk -l` to view the different packages installed in each of your envs

(4) when installing a package, you can specify the version, e.g. `pip install mod==1.1`, whereas `pip install mod` will simply install the latest
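
On newer Pythons you can get a similar listing of what is installed in the current environment without any extra package. This is a sketch under the assumption that you are on Python 3.8 or later, where the standard library ships `importlib.metadata`; `installed_packages` is a hypothetical helper name.

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version) pairs for the current environment."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            pairs.add((name, dist.metadata.get("Version")))
    return sorted(pairs, key=lambda pair: pair[0].lower())


# Print in the same name==version shape that `pip freeze` uses.
for name, version in installed_packages():
    print("%s==%s" % (name, version))
```

Run it with the interpreter of each env to compare what the envs contain.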



No. Use homebrew for system packages and use pip to install python packages. It's much more flexible and doesn't rely on package managers keeping up with releases.

In the real world, a typical pip requirements.txt file will have a mix of package names (which pip looks for and downloads from an index like PyPI), git repo URLs (eggs installed directly from a git server, e.g. from GitHub) and bleeding-edge, track-the-latest-changes `-e git-repo#egg=eggname` URLs. Being able to switch between these with ease is important, e.g. to switch to using your fork of some package rather than the last official release.


Consider MacPorts over Homebrew. I'll withhold opinion on any system that turns the only directory on the system that UNIX has sanctioned (since like the 80s) for the site administrator's control ONLY into a Git repository. MacPorts (and basically every other system of this kind) gets it right, unsurprisingly.

I'd never have believed a day would come where something like Homebrew would ever gain the traction it has.


No, Homebrew solves a different problem (compiling and installing software in the global system).

Think of virtualenv as a way to package a complete Python environment together with the package you need or are working on.


a good introduction - I would like to hear more about deployment with virtualenv though - is it expected that you just document any packages with requirements.txt and then you would create the virtualenv in the deployment target and set everything up again? Or can you "package" a virtualenv for deployment?


Just generate your requirements file from your env locally (pip freeze > requirements.txt), deploy all of your files (env folder excluded) to your server however you want and then run 'env/bin/pip install -r requirements.txt' on your server.


If I use pip freeze > requirements.txt, it lists the packages I have installed using pip, but it doesn't list anything for the python version itself. How do I make sure the right python version gets captured if I don't check in the /env/ folder?


Deployment/distribution can be handled by bundling the app. For Python 2.x, there's PyInstaller[0], Py2App[1], Py2Exe[2] all of which do much the same thing: make a single binary out of a python app and all its dependencies including the interpreter. Then you distribute that and don't worry about what the user has or hasn't got.

[0]http://www.pyinstaller.org/ [1]https://pypi.python.org/pypi/py2app/ [2]https://pypi.python.org/pypi/py2exe/0.6.9


Calling env/bin/python directly, as opposed to `source activate` is very handy for things like:

* bash command line

* cron

* or daemon manager like supervisord
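
The same idea works from a Python launcher script. A minimal sketch, assuming the usual layout where the interpreter lives at `env/bin/python` (or `env\Scripts\python.exe` on Windows); `env_python` and `run_in_env` are hypothetical helper names.

```python
import os
import subprocess


def env_python(env_dir):
    """Return the path of the interpreter inside a virtualenv directory."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def run_in_env(env_dir, args):
    """Run the environment's interpreter directly, with no activation step."""
    return subprocess.call([env_python(env_dir)] + list(args))


# Path that a cron entry or supervisord config would point at for ./env.
print(env_python("env"))
```

For example, `run_in_env("env", ["manage.py", "runserver"])` would start a script with that env's packages, assuming the env exists.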


Yep. Years ago I settled on a convention of always installing a virtualenv into a 've' directory in the project so I can just set the shebang line on scripts (e.g., Django's manage.py) to "#!ve/bin/python". My Django project template sets all that up for me automatically. So now I just have muscle memory for typing "./manage.py ..." etc. and I never have to activate virtualenvs, mess around with virtualenvwrapper-type hacks or accidentally run one project's script with a different project's virtualenv activated.


Speaking as someone relatively new to Python (coming from an embedded development background, mostly with C/C++): What's the standard way of distributing client-side programs/libraries? If you only have one script you can just put it in /usr/local/bin/ but otherwise you have to mess with the system site-packages or modify sys.path before importing, right? I've seen a surprisingly large number of distros that didn't check /usr/ and /usr/local/ for packages.

Do you just hand the user an automatic virtualenv script? (Outside of using one of the binary builders out there, obviously.)


Some of the mentioned problems with the traditional method are partially solved with `pip install requests --user` but I understand that the bigger problem/main reason for virtualenv isn't helped by this.

However, I was very surprised that the author didn't mention venv (http://docs.python.org/3.3/library/venv.html) at all since it is basically virtualenv but part of the Python standard library.


I tried pyvenv last week, and it was pretty useless without pip + distribute. So virtualenv still wins IMO.


Excellent, amazingly well-written guide. This should really be put on python.org as an "Intro to Python Package Management" or something.


Virtualenv seems less like a brilliant tool than it does a workaround for an architectural problem in Python and pip.

That said, dependency hell is always tricky, and I've had to deal with some far uglier solutions in other platforms.


It is a brilliant tool in that it's an incredibly easy to use and lightweight solution to an otherwise very annoying problem.

You might call it a workaround because it's not the most elegant solution possible, but it still just works really well, so why complain?


you aren't implying that dependency hell is a problem unique to python/pip are you?

dealing with dependencies is always step 2 for me when learning a new language and dependency hell seems like a universal problem. I could be wrong though.


Honestly, Maven is way easier, more powerful and works identically on Win, Linux or Mac. The key is that Java lets you set the classpath as a command line arg, whereas PYTHONPATH is an environment variable.


IMO, buildout is more like maven for python. but again the tools solve this problem of "dependency hell" which to me is a problem all languages have, not just python


Excellent tutorial, new to python, and I always hated when things just install without letting you know where it's going..and next time there is some upgrade all sorts of weird errors keep coming..Thanks a lot.


Flawless explanation. As somewhat of a beginner to Python, I must give my thanks!


Thanks for this article, it was definitely needed. I've known that I should be using virtualenv for a while now, but actually trying to figure it out has been somewhat daunting until now.


How does it play with OS packages?

Do I package virtualenvs in an RPM or DEB?


It's orthogonal. Create your base image, at deploy time, create a virtualenv and install your specific artifacts.


Can someone explain the historical reasons for this problem even existing in the first place? (the problem that virtual env solves)


It also comes in handy when I'm trying to install something finicky that has a million dependencies, each of which also might be finicky (e.g. numba) and I screw something up and just want to start over clean and not muck around uninstalling things. virtualenv makes this easy—I just make a new environment for experimenting and then delete it if I mess up and want to start over, leaving my system configuration clean and other environments intact.
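
That experiment-then-delete workflow can be sketched with the standard library's `venv` module (Python 3.3+), which behaves much like virtualenv for this purpose. The helper name `with_scratch_env` is hypothetical, and `with_pip=False` is used here only to keep the sketch fast; a real experiment would want pip in the env.

```python
import os
import shutil
import tempfile
import venv


def with_scratch_env(action):
    """Create a throwaway environment, run action(env_dir), then delete it."""
    env_dir = tempfile.mkdtemp(prefix="scratch-env-")
    try:
        # with_pip=False skips bootstrapping pip into the environment.
        venv.create(env_dir, with_pip=False)
        return action(env_dir)
    finally:
        # Starting over is just removing the directory.
        shutil.rmtree(env_dir, ignore_errors=True)


# Every environment created this way contains a pyvenv.cfg marker file.
created = with_scratch_env(
    lambda env_dir: os.path.exists(os.path.join(env_dir, "pyvenv.cfg"))
)
print("environment was created:", created)
```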


setuptools/distribute didn't implement `python setup.py uninstall`.

Unix fragmentation of where the bin and lib directories should reside, i.e. /bin, /usr/bin, /usr/local/bin, ~/bin, ...

Windows doesn't have symlinks and the different packaging tools have tried to implement the functionality in various different ways.

Python doesn't add the path of the "main" executed file to the module lookup path. (edit: actually, I think this is wrong. I meant to say "Python module import lookup is complicated.")


The problem is that different applications might require different versions of the same packages. If you install packages globally then you can only ever have one version available. This is a VERY BAD THING if you have any expectation of running more than one application per machine.
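
To make the conflict concrete, here is a small sketch with two made-up applications that pin different versions of the same package. The names `blog`, `shop`, and `find_conflicts` are hypothetical and exist only for illustration.

```python
def find_conflicts(app_a, app_b):
    """Return packages that two apps pin to different versions."""
    conflicts = {}
    for name in sorted(set(app_a) & set(app_b)):
        if app_a[name] != app_b[name]:
            conflicts[name] = (app_a[name], app_b[name])
    return conflicts


# Hypothetical pins for two applications sharing one global site-packages.
blog = {"django": "1.4.5", "requests": "1.2.0"}
shop = {"django": "1.5.1", "requests": "1.2.0"}

print(find_conflicts(blog, shop))  # prints {'django': ('1.4.5', '1.5.1')}
```

A single global site-packages can hold only one of those django versions, so one of the two apps breaks; with one virtualenv per app, each gets its own copy.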


Did... did you read the article? Separation of environments for projects with different dependencies.


Java does not have virtual envs, and I can have projects with different dependencies. Clearly this is a Python problem. I was asking why it exists.


My mentor did not even give such a good introduction. Kudos to the author, now I have a bright AHHA lightbulb in my head! :D


That is why they say to take advice from multiple sources.


Interesting article.



