A Better Pip Workflow (kennethreitz.org)
171 points by ycnews 421 days ago | 84 comments



I would additionally recommend using:

  pip freeze -r requirements-to-freeze.txt > requirements.txt
instead of just:

  pip freeze > requirements.txt
So you can keep your file structure with comments, nicely separating your dependencies from the dependencies of your dependencies. Actually I only used one requirements.txt file, removing everything below "## The following requirements were added by pip --freeze:" and regenerating it when I changed my dependencies.

And sure, beware of git urls being replaced by egg names in the process.


Another tool you might find useful is pipreqs.

    pipreqs --savepath gen.requirements.txt /
The above will run through all your source code, and generate the requirements that you actually use via imports.

https://github.com/bndr/pipreqs

Disclaimer: I contribute to it.


Does it work well? I can't imagine it would be very accurate for things like a Django project...


It works pretty well for a Django project with the exception of anything that's only called from INSTALLED_APPS.

I have a branch that can handle Django, but I kept it private because it's not really general purpose to introspect Django's settings.py then read out data.


Nice! Didn't know this was even possible!


Duh! That's how I was using it along with the -l option!


This is sort of the same thing setup.py is intended for.

The way I've seen this successful in practice across many projects:

setup.py: specify top-level (i.e. used directly by the application) dependencies. No pinned deps as general practice, but fine to put a hard min/max version on them if it's for-sure known.

requirements.txt: Pin all deps + sub-deps. This is your exactly known valid application state. As mentioned, a hasty deploy to production is not when you want to learn a dependency upgrade has broken your app.

requirements-dev.txt: dev dependencies + include requirements.txt
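
The requirements.txt half of this split can be checked mechanically. A rough sketch, assuming a simple file with no VCS URLs; `unpinned` is a hypothetical helper name, and treating any line that contains `==` as pinned is a simplification (it would also accept wildcards such as `==1.*`):

```python
def unpinned(requirements_text):
    """Return requirement lines that are not pinned with "==".

    Blank lines, comments, and pip options (lines starting with "-")
    are ignored. This is an illustrative check, not a full parser.
    """
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue
        if "==" not in line:
            loose.append(line)
    return loose


# Example: fail a build if anything in requirements.txt is left floating.
# with open("requirements.txt") as f:
#     assert not unpinned(f.read())
```

Running something like this in CI is one way to make sure sub-dependencies do not slip in unpinned.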


This in fact was my intention when creating this feature in pip: setup.py has the abstract dependencies, requirements.txt is a recipe for building something exact. And I quickly found requirements-dev was useful for adding tools (some people have done that with setup.py extras, but the ergonomics on that never seemed great).

The one thing I wish was part of this is that there was a record of conflicts and purportedly successful combinations. That is, you distribute a package that says "I work with foo>=1.0" but you don't (and can't) know if the package works with foo 1.1 or 2.0. Semantic versioning feels like a fantasy that you could know, but you just can't. Understanding how versions work together is a discovery process, but we don't have a shared record of that discovery, and it doesn't belong to any one package version.

This sense that package releases and known good combinations are separate things developed at separate paces is also part of the motivation of requirements.txt, and maybe why people often moved away from setup.py.


A package/system that handled this would be great. Zope and Plone have the notion of "known good sets" -- and it's not really a pleasure to use (but much better than nothing). As far as I can tell, with Plone 5 - the recommended way to install Plone is from the unified installer - leaving known good sets, pinning and buildout to manage plugins:

http://docs.plone.org/manage/installing/installing_addons.ht...

For an example of what bootstrapping a full Plone 4 site via buildout entails, have a look at the (defunct) good-py project:

http://good-py.appspot.com/release/plone/4.2rc2

http://www.martinaspeli.net/articles/hello-good-py-a-known-g...

As long as one is able to manage to keep projects small, and in a virtual-env of its own, managing "known good sets" (in buildout, or for pip) shouldn't really be too hard. But as projects grow, a real system for managing versions will be needed. As far as I know there are no good systems for this... yet. Ideally you'd want a list that people could update as they run into problems, so that if projectA runs fine with foo=1.0, and bar=1.1, maybe projectB discovers a bug in foo<=1.0.4rc5 and can update the requirement.

It's not a trivial thing (see also: All the package managers, apt, yum, etc).


Hi Ian, long time fan of your work.

I'm supporting both using some silly logic to pull in requirements.txt and supply that to setup.

https://github.com/russellballestrini/botoform
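
A minimal sketch of what such logic can look like (an illustration only, not necessarily what botoform does; `parse_requirements` is a hypothetical helper name):

```python
def parse_requirements(text):
    """Turn the contents of a requirements file into a list of requirement strings.

    Blank lines, comments, and pip options such as "-r other.txt" or "-e ."
    are skipped, since setup() does not understand them.
    """
    requirements = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue
        requirements.append(line)
    return requirements


# In setup.py this could be used roughly as:
#
#     from setuptools import setup
#     with open("requirements.txt") as f:
#         setup(name="myproject", install_requires=parse_requirements(f.read()))
#
# "myproject" is a placeholder name.
```

Note that feeding a fully pinned requirements.txt into install_requires pins the package's abstract dependencies too, which is the trade-off discussed elsewhere in this thread.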


This. Using setup.py for specifying min versions (and occasionally max versions) is definitely the way to go.

However, manually maintaining a requirements file in addition to setup.py is quite tedious in the long run. It is much better to freeze requirements during the build process and use the generated requirements file for deployments. But, decent test coverage is key here.


I've bookmarked Donald Stufft's opinion on this for a couple years, which contrasts interestingly with Kenneth's and your own (and which makes a lot of sense): https://caremad.io/2013/07/setup-vs-requirement/

I like the concrete-vs-abstract, library-vs-application ideas there. Most opinions on this overlook the distinction.


I don't at all disagree with that, actually. I do consider that a part of setup.py, but it rarely matters in practice. The python ecosystem doesn't have similar programmatic api specs compared to, say, PHP where they get together and rigidly define these things: http://www.php-fig.org/psr/psr-7/

Not to say the PHP strategy is necessarily ideal (feels restrictive to me), rather just the comparison between the two. That's the sort of ecosystem that would make the abstract part matter more.


No, no, Python very much does. PEP 0249 specifies DB API 2, PEP 0333 specifies WSGI, and PEP 3156 specifies the asyncio interface, for instance. If anything, I can think of more examples of it being done in the Python space than the PHP space.


This is easily the best option, barring how painful it is to write a setup.py.


Filling out setup.py and other package boilerplate used to annoy me as well but that all changed after I started using cookiecutter.

https://github.com/audreyr/cookiecutter


Thanks! I use cookiecutter and I still have to tweak things like MANIFEST.in and adding the package to the path while installing, for automatically-created entry points.


It's really not that bad with setuptools - as an added bonus it becomes easy to package your application up in an RPM.


setuptools? distutils you mean?

I much prefer fpm for creating Linux distro packages from a python distribution, since it can create debs.


No, I mean setuptools - it does everything distutils does but better.

I don't trust random scripts to generate packages for me. Writing an rpmspec or a Debian control file is hardly a challenge, and I'd encourage more people to take the 5 minutes to do so rather than relying on tools like FPM.


… or tools like setuptools?

Also, I assume you're talking about the bdist_rpm target that comes from distutils.


No, I write a proper RPM spec file - only takes a few minutes.


So, what's the added bonus of using setuptools then?


setuptools is just less generally crufty than distutils, and things like entry points make writing installable console scripts feasible. Since it provides essentially the same interface, the default rpmspec created by rpmdev-newspec works with it out of the box.


I was thinking about this the other day when I was helping a friend with his method #1-style requirements.txt, and how I wish there was something similar to composer's "lockfile".

The author's proposed method is basically the same as how php's composer does it, with its composer.json and composer.lock. Specify your application requirements by hand in composer.json, run composer install, and composer.lock is generated. Check both in so you can have consistent deploys. When you want to upgrade to the latest versions within constraints set by hand in composer.json, run composer update to pull latest versions, updating composer.lock. Run tests, and commit the new composer.lock if you are satisfied.


> The author's proposed method is basically the same as how php's composer does it

Composer merely cloned Ruby's Bundler and its Gemfile/Gemfile.lock in that regard. Which is a good thing. It's beyond puzzling that Python has spawned multiple dependency managers, none of which have replicated the same golden path.


That it hasn't been adopted as a core functionality doesn't mean there isn't one. zc.buildout's first stable release predates bundler's 0.3.0 by at least one year:

https://pypi.python.org/pypi/zc.buildout/1.0.1


The lack of separation between requested direct dependencies and pinned, resolved dependencies (including transitive ones) has been one of the great confusions for me learning Python. Having used Bundler, CocoaPods, Composer, and NPM before, which all have this separation built in, pip feels broken.

There are a few projects that try to solve this, but the fact that the Python community has not decided on one cripples any initiative to fix it.


I can second this, I wish every time I interact with pip that it was npm. I never appreciated the KISS approach npm uses of just dumping all the dependencies in local folder exhaustively until I visited Python versioning hell upon myself by not using a virtualenv (which is an ugly hack itself).


Not to mention that NPM can resolve different version dependencies so if A requires v2 and B requires v3 of a module, they both can live separate and happy lives.

The above is possible for Python as well; I sketched out an implementation which patches __import__ to handle dependency resolution by version, but.... I'm afraid it's a bit unpythonic.


How does NPM do it? My requirements.txt looks like this and I've never had any problems:

  foo ~= 1.8.2
  bar ~= 2.4.1
  baz
(These are only requested dependencies, resolved are not specified).


I misspoke a little with regards to NPM. NPM has something called npm-shrinkwrap that allows you to lock resolved dependencies. It's used in many NPM projects and seems to be a standard chosen by the community.

I am not sure what ~= means in requirements.txt, but I'm gonna guess it means something like ~> or ^. With a system like that, if everyone follows semver correctly we are fairly okay. The problem is that not everyone does and you have no guarantee that deploying the same code at two points in time t1 and t2 will produce the same application since one of the dependencies might have released new code.
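
For reference, `~=` is the "compatible release" operator from PEP 440: `foo ~= 1.8.2` is roughly `foo >= 1.8.2, == 1.8.*`. A simplified sketch of that rule, assuming plain numeric versions only (no pre-releases, epochs, or local versions); `parse_version` and `is_compatible` are hypothetical helper names, not pip APIs:

```python
def parse_version(text):
    """Turn "1.8.2" into (1, 8, 2). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(candidate, spec):
    """Approximate `~= spec`: at least `spec`, with everything but the
    last component of `spec` held fixed."""
    cand, base = parse_version(candidate), parse_version(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    prefix = base[:-1]
    return cand >= base and cand[:len(prefix)] == prefix


print(is_compatible("1.8.5", "1.8.2"))  # newer patch release of 1.8
print(is_compatible("1.9.0", "1.8.2"))  # different minor series
```

So a range like this still floats within the 1.8 series, which is why two installs at different times can differ unless the resolved versions are pinned as well.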


Interestingly, the separation was the main problem I encountered deploying a Rails application for the first time. I still prefer pip's approach, as it promotes always using the latest versions.


Promoting the use of newer versions is a little nice, but you will rue the day you deploy to production and it automatically pulls some flaky new package version that you didn't have a chance to test locally.


How does it promote using the latest version?


I've been using pip-tools to manage my requirements. It allows you to specify a top-level requirements file, and from that it builds you a requirements.txt with specific versions and all dependencies. It has really streamlined my pip workflow.

https://github.com/nvie/pip-tools


Yep, I'm surprised no one knows about pip-compile. It does exactly as the OP suggests, but with the ability to specify a range of versions.


If you don't mind committing to Anaconda Python distributions, then you should simply use conda.

You can still pip-install things within a conda environment, and conda can manage more dependencies than just Python dependencies (a common use case is managing R dependencies for a Python statistical workflow).

You can do

    conda list -e > requirements.txt
then

    conda create -n newenv --file requirements.txt
to create a Python environment from the frozen requirements.

I believe that conda makes it easier to selectively update, but even if you don't enjoy those features of conda, the same two-file trick as in this post will work for conda as well, since you can use `conda update --file ...`. Conda's "dry-run" features are more useful than pip's as well.

The perfect feature for conda to add is the ability to specify alternative Python distributions, either by a path to the executable, or by allowing alternative Python distributions to be hosted on Binstar.

I can understand why Continuum wants conda to heavily influence everyone to use only Anaconda, but I think the goodwill of making conda work for any Python distribution would bring more to them than keeping it focused solely on Anaconda. (For example, I know some production environments that still use Python 2.6 and are prevented from updating to 2.7 -- and even if they did update, they'd need to keep around some managed environments for 2.6 for testing and legacy verification work).


I agree. Only used conda on a few projects just recently, so not too battle tested, but I've found it much easier to use, especially with "difficult" libraries.


Nice post as always from Kenneth.

However, this workflow has a little drawback. If you have a dependency not from PyPI, e.g. `pip install git+ssh://github.com/kennethreitz/requests.git@master`, it won't work.


I've given up trying to use vcs links as dependencies with pip. I deploy everything to an internal devpi server and use --extra-index-url


Yes. It's a continual irritation to me that 'pip freeze' doesn't support this.


I would argue that the information in "requirements-to-freeze.txt" could simply be expressed in the "install_requires" list in setup.py. That is, the "general" dependencies (with only vague version information) should be in setup.py, whereas requirements.txt should pin the exact "guaranteed to work" versions. For small/non-commercial projects, requirements.txt may not even be necessary.


So, you use `python setup.py` instead of pip?


No...? AFAIK (and based on my daily experience), pip processes setup.py, so it definitely installs the dependencies listed there.


Ok, so you just skip the requirements file.


Pinto (https://metacpan.org/pod/distribution/Pinto/lib/Pinto/Manual...) nails this problem in the Perl ecosystem:

Pinto has two primary goals. First, Pinto seeks to address the problem of instability in the CPAN mirrors. Distribution archives are constantly added and removed from the CPAN, so if you use it to build a system or application, you may not get the same result twice. Second, Pinto seeks to encourage developers to use the CPAN toolchain for building, testing, and dependency management of their own local software, even if they never plan to release it to the CPAN.

Pinto accomplishes these goals by providing tools for creating and managing your own custom repositories of distribution archives. These repositories can contain any distribution archives you like, and can be used with the standard CPAN toolchain. The tools also support various operations that enable you to deal with common problems that arise during the development process.


The `pip-tools` utilities automate this nicely.


They automate that, plus more. Everyone interested in the original link should really check it out: https://github.com/nvie/pip-tools


... but it's notorious for not working on latest Python and pip.


A limitation you are not bound to, because they are open source.


So, I should learn the inner-workings of pip just to use a basic tool that exists both in Node.js and Ruby land and a dozen others?


Yes. Or, wait until someone else does.

This is how open source works. How do you think Node.js and Ruby got the capability? Do you imagine they sprouted fully formed from hyperbole like "a dozen others"?


I've been waiting, trust me. It's never been working with the latest pip, sorry!


With this workflow, what purpose does requirements.txt serve? Would the file ever be used directly?

Only thing I can think of is you'd track top-level packages in requirements-to-freeze.txt during development, while your deploy would use requirements.txt to get a deterministic environment.


That is indeed the idea.


What does "requests[security]" do? I don't remember ever using the bracket syntax.


Brackets are used to install recommended dependencies. See http://pythonhosted.org/setuptools/setuptools.html#declaring...

In the case of requests[security], it installs some extra packages that allow for more secure SSL. http://stackoverflow.com/questions/31811949/pip-install-requ...


This means "install the `requests` package with the `security` extras".

In this particular case, this installs 'pyOpenSSL>=0.13', 'ndg-httpsclient', 'pyasn1'. See: https://github.com/kennethreitz/requests/blob/46184236dc177f...


From the docs: extras_require A dictionary mapping names of “extras” (optional features of your project) to strings or lists of strings specifying what other distributions must be installed to support those features. See the section below on Declaring Dependencies for details and examples of the format of this argument.
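
A rough sketch of how the bracket syntax maps onto that dictionary. The metadata below is hypothetical (`mypkg` and its `six` dependency are made up); the `security` values are the ones quoted for requests earlier in this thread, and `requirements_for` is an illustrative helper, not a pip or setuptools function:

```python
# Hypothetical project metadata, shaped like the keyword arguments to setuptools.setup().
METADATA = {
    "name": "mypkg",                  # placeholder package name
    "install_requires": ["six"],      # always installed (illustrative)
    "extras_require": {
        # Only installed when the extra is requested, e.g. mypkg[security].
        "security": ["pyOpenSSL>=0.13", "ndg-httpsclient", "pyasn1"],
    },
}


def requirements_for(spec, metadata=METADATA):
    """Expand "name[extra1,extra2]" into the list of requirements to install."""
    _name, _, rest = spec.partition("[")
    extras = [e.strip() for e in rest.rstrip("]").split(",") if e.strip()]
    required = list(metadata["install_requires"])
    for extra in extras:
        required.extend(metadata["extras_require"][extra])
    return required


print(requirements_for("mypkg"))
print(requirements_for("mypkg[security]"))
```

The plain name pulls in only the base dependencies; naming an extra adds that extra's list on top.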


Practically, it is used to prevent newer versions of pip from complaining about "insecure platform" and have cleaner CI build logs.


>While the Method #2 format for requirements.txt is best practice, it is a bit cumbersome. Namely, if I’m working on the codebase, and I want to $ pip install --upgrade some/all of the packages, I am unable to do so easily.

$ pip freeze --local | grep -v '^\-e' | cut -d = -f 1 | xargs ./venv/bin/pip install -U

Ought to be a first class command, though.

After running that the first thing I do is run the tests. Then I freeze, commit and push.

This is usually a good thing to do at the beginning of a new sprint so that more subtle bugs caused by upgrading packages can be teased out before releasing a new build.

For my projects that I release on pypi I don't want to use pinned dependencies, but for them I run periodic tests that download all dependencies and run tests so that I'm (almost) instantaneously notified when a package I depend upon causes a bug in my code (e.g. by changing an API).
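
The upgrade one-liner at the top of this comment can also be sketched in Python, which makes the filtering step easier to read. This is an approximation of the shell pipeline, not a pip feature; `upgradable_names` is a hypothetical helper, and unlike the `grep` version it also skips comment lines:

```python
def upgradable_names(freeze_output):
    """Extract package names from `pip freeze` output.

    Editable installs (lines starting with "-e") are skipped, mirroring
    grep -v '^\\-e'; the name is everything before the first "=",
    mirroring cut -d = -f 1.
    """
    names = []
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line or line.startswith(("-e", "#")):
            continue
        names.append(line.split("=", 1)[0])
    return names


# Possible usage inside an activated virtualenv:
#
#     import subprocess, sys
#     frozen = subprocess.check_output(
#         [sys.executable, "-m", "pip", "freeze", "--local"], text=True)
#     subprocess.check_call(
#         [sys.executable, "-m", "pip", "install", "-U", *upgradable_names(frozen)])
```

As with the shell version, run the tests afterwards and only then re-freeze and commit.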


Another method is to use pip constraints to enforce package versions https://pip.pypa.io/en/stable/user_guide/#constraints-files


I'm probably going to get downvoted to hell on this; but as a Rubyist who has been working on a Python project, I've been finding pip to be really weak against rubygems/bundler.

With RubyGems/Bundler I love the ability to point to github repos, lock versions (or allow minor/patch versions to update), have groups, etc.

requirements.txt and pip just feel awkward and weird. Especially when combined with virtualenv, in comparison to Ruby this is just stiff and strange.

I've had nothing but problems with more complex packages like opencv and opencl as well.


> I've had nothing but problems with more complex packages

I could say the same things about rubygems. With coding a lot of it is what you are familiar with. To me python and pip is clean and simple, ruby and rubygems is overly complex. But that's because I'm familiar with python, so yeah.


>With RubyGems/Bundler I love the ability to point to github repos

You can do this with pip.


Agree, isn't the difference between method #1 and method #2 precisely the reason that Bundler has a distinction between Gemfile and Gemfile.lock? http://bundler.io/v1.3/rationale.html Programmers want to handle and edit method #1, but sane deployment or collaboration requires tracking resolved versions via method #2.

The Gemfile contains the top-level dependencies that the app needs; the Gemfile.lock ensures that everybody developing or deploying is using the same gem versions for top-level and resolved transitive dependencies. Periodically one can `bundle update` to upgrade gem versions: http://bundler.io/man/bundle-update.1.html

It does continue to surprise me that distinctions like these are not handled by Python tooling, which has this absolutely sordid history around packaging, which continues apace in the wheel vs. egg wars... http://lucumr.pocoo.org/2014/1/27/python-on-wheels/


I am fairly positive you can do everything you have described. Just doing a quick google search returns the answers I need.

1. Point to github repos. `pip install git+ssh://git@github.com/echweb/echweb-utils.git` http://stackoverflow.com/questions/4830856/is-it-possible-to...

2. Lock minor versions `Django>=1.3,<1.3.99` http://stackoverflow.com/questions/6047670/pip-specifying-mi...

What is a group? My google search did not turn up anything relevant.


The only bummer is that you can't declare git repos as project dependencies, AFAIK they have to be installed manually as you've described.[0][1] Does ruby handle this well? This is a real PITA when the Cheese Shop / Warehouse either don't have a particular project or only have an outdated version. In such cases you pretty much just have to create your own mirror.

FYI groups allow you to group various dependencies. E.g. you have one group of dependencies for development, another for testing, and a minimal one for production.

[0]: --process-dependency-links will permit you to pull in VCS repos as dependencies locally, but it does you no good when distributing packages for third-party consumption

[1]: https://groups.google.com/forum/#!topic/pypa-dev/tJ6HHPQpyJ4


If you mean project dependencies as dependencies declared in the setup.py file, setuptools supports the dependency_links option, which can include VCS links. So that shouldn't be a problem, albeit I believe it is frowned upon.


You have to pass pip the --process-dependency-links flag at install time (which is pretty shitty if users are expecting to be able to install from PyPI normally) and it will ignore your dependency links if it thinks there is a better package available at PyPI (e.g. some other package with the same name has a higher version or the GitHub repo has some critical bug fixes but still has the same version as the PyPI package). Sometimes you can fool it into working, but it's just been such an unpredictable mess that I've given up on dependency_links for software that's distributed to third-parties.


Same here, it's so weird coming from Perl (with its healthy admiration of Ruby's tooling) and to be stuck with manually tracking my package's (sorry, distribution's) requirements + pinning their versions in a Make target.


I personally use requirements.txt for the source and requirements.txt.lock for the pinned versions, which I find more standard.

Or at least use the pip-tools convention: requirements.in for the source and requirements.txt for the compiled output.


Hey, I wrote Pundle to solve this problem! The idea is simple: we just have a package that installs other packages with frozen versions. Pundle maintains frozen.txt alongside requirements.txt.

What's more, Pundle does not use virtualenv; it installs all packages to the user directory ~/.Pundledir and imports frozen versions on demand.

It has all the nice commands like install, upgrade, info, etc.

Check it out: https://github.com/Deepwalker/pundler


Vaguely related, I hacked together a little tool called Olaf[1] for people who like to pip freeze but also like to have multiple distinct requirements files like requirements-dev.txt – Needs a little work still but I've been using it on projects quite happily.

[1] https://pypi.python.org/pypi/olaf


Instead of doing that manually, use pip-tools https://github.com/nvie/pip-tools/

Some posts on theory and use http://nvie.com/posts/pip-tools-10-released/


He explicitly wrote: "I thought long and hard about building a tool to solve this problem. Others, like pip-tools, already have. But, I don’t want another tool in my toolchain; this should be possible with the tools available." so it's not like he's not aware of pip-tools.


/me has commit access to pip-tools.


We built Doppins (https://doppins.com) to be able to use pinned PyPI dependencies and/or ranges and still keep them up-to-date continuously. Still another tool, but it's quite quick to enable on a repository and doesn't require any maintenance afterwards.


I found myself using tox for managing the general case of this. But the spirit of the blog post is in the same direction.


I still find myself automatically thinking of this PIP.

http://www.shaels.net/index.php/cpm80-22-documents/using-cpm...


I would also suggest using anaconda if you are using python for science: https://www.continuum.io/downloads

It makes managing the python packages and even the python versions quite easy.


Like it or hate it, this is exactly how it's supposed to be done.


That doesn't mean it's the way it should always be done though.



