
Freezing Python’s Dependency Hell - ammaristotle
https://tech.instacart.com/freezing-pythons-dependency-hell-in-2018-f1076d625241
======
abhishekjha
What's wrong with pipenv? I am genuinely curious.

On local :

    
    
        mkdir my_project_directory
        cd my_project_directory
        export PIPENV_VENV_IN_PROJECT=1 (Makes the virtual environment folder deterministic (.venv/); otherwise you get a hash-based directory (my_project_directory-some-hash-value), which might not be suitable for automatic deployments in applications like Docker. I don't know why this is not the default.)
        pipenv --python 3.6 (or any particular version number)
        pipenv install numpy scipy pandas matplotlib requests
        pipenv graph (Gives me a dependency graph)
        git add .
        git commit -a -S -m "init" 
        git push
    

On remote :

    
    
        git clone url/my_project_directory
        cd my_project_directory
        export PIPENV_VENV_IN_PROJECT=1
        pipenv install
        pipenv shell
        pipenv graph
    
    

Is this workflow not enough? I have recently started using pipenv after a lot
of struggle. The only issue I have is that PyCharm doesn't allow native pipenv
initialisation. I always end up creating an environment manually and then
importing the project. PyCharm does detect the environment, though.

~~~
mkobit
Last time I tried, it also required that the target Python version be
installed somewhere on the path. If pipenv used venv instead of virtualenv,
used something like pyenv to retrieve/install Python versions, and was
distributed as a standalone executable (rather than requiring a bootstrapped
Python), I would actually use it.

~~~
maccam94
Install pyenv and add a .python-version file to your project. pipenv will use
the pyenv-installed Python, and will prompt to install it via pyenv if it's
missing.
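
A minimal sketch of that setup (assumes pyenv and pipenv are already installed; the version number is just an example):

    
    
        pyenv install 3.6.6
        echo "3.6.6" > .python-version
        pipenv install          # pipenv resolves the interpreter via pyenv
    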

------
bjpbakker
I feel that all of these language-specific solutions still only solve half
the problem. Your code depends on a lot more than _just_ the python libraries.
And often this is exactly what makes projects break on different systems.

Let me make another suggestion: nixpkgs [0]. It helps you define exactly that
fixed set of dependencies: not just a published version number, but the
actual source code _and_ all its dependencies.

[0] - [https://nixos.org/nixpkgs/](https://nixos.org/nixpkgs/)
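
For a taste of what that looks like (a minimal sketch; the attribute names are examples and vary across nixpkgs revisions):

    
    
        nix-shell -p python36 python36Packages.numpy python36Packages.requests
    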

~~~
abakus
This. A lot of GPU deep learning libs depend on CUDA and cuDNN, which is not
solved by pipenv/virtualenv but _is_ actually handled by conda.

------
ris
Here we go again. The source of the problems with toy package managers (and I
include all language package managers here) is not just the package managers
themselves; it's the "version soup" philosophy they present to the user. Not
daring to risk displeasing the user, they will take orders akin to "I'd like
version 1.2.3 of package a, version 31.4.1q of package b, version 0.271 of
package c, version 141 of package d...", barely giving a thought to inter-
version dependencies of the result.

Unfortunately, software _does not work this way_. You cannot just ask for an
arbitrary combination of versions and rely on it to work. Conflicts and
diamond dependencies lurk everywhere.

 _Sensible_ package systems (see specifically Nix & nixpkgs) have realized
this and follow a "distribution" model where they periodically settle upon a
collection of versions of packages which _generally_ are known to work pretty
well together (nixpkgs in particular tries to ensure packages' test suites
pass in any environment they're going to be installed in). A responsible
package distribution will also take it upon itself to maintain these
versions with (often backported) security fixes, so that there's no worry
about sticking with a selection of versions for ~6 months.

However, I can't say I'm particularly surprised that these systems tend to
lose out in popularity to the seductively "easy" systems that try to promise
the user the moon.

~~~
kalefranz
There's been quite a bit of discussion about Anaconda and conda in this thread
already. Anaconda also takes this distribution approach, and it's targeted
specifically at python.

~~~
ris
Yet it will never be able to solve the system-library dependency problem in
the way that Nix does.

~~~
RayDonnelly
It solves this already and has done for many years (but this depends on what
you mean _exactly_ by "the way that Nix does").

------
zedr
Using a local virtual environment and then building a Docker image removes
most of the headaches. I also bundle a Makefile with simple targets. See this
as an example:
[https://github.com/zedr/cffi_test/blob/master/Makefile](https://github.com/zedr/cffi_test/blob/master/Makefile)
New projects are created from a template using Cookiecutter.

It isn't really so bad in 2018, but I do have a lot of scars from the old
days, most of them caused by zc.buildout.

The secret is using, as the article mentions, a custom virtual env for each
instance of the project. I never found the need for stateful tooling like
Virtualenvwrapper.

~~~
jbergknoff
You can also set a PYTHONUSERBASE environment variable (and `pip install
--user`) to scope the installed packages to the project's directory. This is
effectively the same as a virtualenv, but doesn't have the requirement on bash
or "activation", and it's less magical than virtualenv because these choices
are explicit on each command. The tradeoff is that it can be tedious to be
explicit, remembering to use `--user` and specify the PYTHONUSERBASE. If
you're scripting everything via make, though, then that's not such a burden.
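
For example (a minimal sketch; the `.pydeps` directory name is arbitrary):

    
    
        export PYTHONUSERBASE=$PWD/.pydeps
        pip install --user requests
        python -c "import requests; print(requests.__file__)"  # resolves inside .pydeps
    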

~~~
gvalkov
There is no need to activate a virtualenv to use it. Just call
$VIRTUALENV/bin/python directly. Activating is just a convenience for doing
interactive work.

------
aequitas
"Pipfile looks promising for managing package dependencies, but is under
active development. We may adopt this as an alternative if/when it reaches
maturity, but for the time being we use requirements.txt."

If I were given the choice between the community-supported, in-development
Pipfile/pipenv or the third-party-supported yet-another-package-manager lore to
get those best practices, my money would be on Pipfile/pipenv. I've been using
it for many projects now, and besides some minor annoyances (e.g. the
maintainer's love for colored output where form doesn't follow function) it has
been a great tool.

------
michaelmcmillan
Never had a problem with dependencies in Python. Just keep it simple.

When starting a new project:

    
    
      virtualenv venv -p *path-to-python-version-you-want*
      ./venv/bin/pip install *name-of-package*
    

When running that project:

    
    
      ./venv/bin/python *name-of-python-file*
    

Many people don't realize that venv/bin/ contains all the relevant
binaries with the right library paths out of the box.

~~~
spapas82
Is there a reason you don't "activate" your virtualenv?

That (with the addition of using mkvirtualenv and friends) is the workflow I
use for both dev and prod, and I'm really happy with it!

~~~
crdoconnor
I hate the whole idea of activating virtualenvs. It's a tool that makes it
really easy to end up running a command in the wrong environment and see weird
behavior instead of a clear error message.

I've seen variations on this scenario happen at least 3 times, for instance:

1) Somebody creates a script that activates the virtualenv, runs django, and commits it.

2) A junior runs the script but the virtualenv doesn't get created for some reason.

3) The "warning virtualenv doesn't exist" message appears briefly and gets
missed.

4) The junior gets "import error: cannot import django" or something.

5) They then start installing django in their system environment and... it
sort of works. Except then they get another import error. And a whole bunch of
python packages installed in their system environment. Yech.

Moreover, I'm really not sure what was so wrong with just running
./venv/bin/python in the first place and never having to worry about what
environment you're in.

~~~
spapas82
That's 7 characters more you'll need to write all the time! Also you'll need
to remember to prepend them to all scripts, i.e. pip, fab etc. Well, that seems
to me to be more error-prone for juniors than telling them to always use a
virtualenv (i.e. have (envname) in their prompt)!!

~~~
crdoconnor
It's less error prone. Never had ^^ that scenario since and I've not run into
additional problems either.

Having an extra 7 characters in a ./run.sh script doesn't really bother me.
I'm not a perl developer.

------
pytyper2
I'm not sure why the scientists don't use VMs and simply save the virtual disk
files? That would at the very least allow them to verify the settings at a
later date. Fresh install reproducibility doesn't seem necessary to verify
experimental findings as long as the original vm is available to boot up.

~~~
peatmoss
My guesses are that:

1. Integrating the development environment on their host PC (for example
connecting RStudio in R's case, or connecting their web browser back to a
server running in the VM in the case of Jupyter) is another set of skills to
master.

2. Many data analyses are memory hungry unless you want to resort to coding
practices that optimize for memory consumption. The overhead of running a VM
is a bummer for some scientists.

3. Many scientists are not using Linux top-to-bottom, and therefore don't
have a great way of virtualizing a platform that they are familiar with (e.g.
Windows, macOS)

Can people think of others? I'm sure I'm missing some.

(EDIT: To be clear, I think VMs are a great path, but I do think there are
some practical reasons why some scientists don't use them)

~~~
pytyper2
I think these and other issues can be solved with technical training.

~~~
peatmoss
Sure, they’re all mitigatable, but that technical training is competing with a
lot of other considerations within the limited brainwidth of a scientist.

From the scientist’s perspective, a lot of this can start to feel like yak
shaving. The opportunity costs are real.

~~~
pytyper2
Eh, maybe. VirtualBox is point-and-click at this point, and taken on in
conjunction with their institutional IT departments (as hopefully they do with
all desktop point-and-click software), it's totally doable with 5-10 hours of
training and some typed desk procedures. Learning new tools and workflows seems
to be part of the job. As I typed that I also thought of a different response,
from the perspective of a leader and software engineer; I did not type that
response.

------
nickjj
1. Build Docker image out of requirements.txt

2. Develop application

3. Repeat 1-2 until ready to deploy

4. Run Docker image in production with same dependencies as development

5. ??

6. Profit!

As long as you don't rebuild in between steps 3-4, you'll have the same set of
dependencies down to the exact patch level.
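
In shell terms (a minimal sketch; assumes a Dockerfile that copies requirements.txt and pip-installs it during the build; `myapp` is a placeholder tag):

    
    
        docker build -t myapp .   # dependencies get baked into image layers
        docker run myapp          # production runs those exact layers
    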

~~~
minitech
Doesn't help developers not get different versions of packages. Lockfiles are
necessary regardless of Docker.

~~~
RossM
This is important (though I'm oddly yet to run into this issue with pip; I've
only had conflicts with npm and composer before). Freezing dependency sources
in Docker images and using `pip install --require-hashes -r requirements.txt`
for development seems to cover everything.
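
One way to produce such a hash-pinned file (a sketch; assumes pip-tools, with top-level deps listed in requirements.in):

    
    
        pip-compile --generate-hashes requirements.in  # writes requirements.txt with sha256 pins
        pip install --require-hashes -r requirements.txt
    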

~~~
fernandotakai
yeah, i was going to say the same: i never had that issue in ~7y of python
work.

nowadays, requirements + docker solves 99% of everything i do.

maybe it's because i'm not using numpy and the likes?

------
sandGorgon
genuine question - is nobody using anaconda/conda in production ? I have found
the binary install experience in conda far more pleasant than in anything
else.

Going forward, the trend is going to be pipenv+manylinux
([https://github.com/pypa/manylinux](https://github.com/pypa/manylinux)), but
conda is super pleasant today

~~~
virusduck
What happens when there isn't a conda recipe for some package or inexplicably
some dependency? Do I go back to pip? sudo pip ;) ? Use virtualenv?? Nothing
is ever solved.......

~~~
kalefranz
> What happens when there isn't a conda recipe for some package or
> inexplicably some dependency?

You go contribute it on conda-forge? The conda team is also actively working
on improving some of these problems specifically for python users. When you
create a new conda environment with python in it, we put pip in it for you
too. In a way, we're implicitly encouraging you to use pip along with conda,
and yet it's not a seamless experience. So
[https://github.com/conda/conda/issues/7053](https://github.com/conda/conda/issues/7053)
is a current effort. Eventually, we're working toward conda being able to
install wheels directly (at least pure-python wheels at a minimum), so that
packages available on PyPI that conda-forge or Anaconda haven't built out yet
can still be installed with conda.
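
Illustratively (a minimal sketch; the env name and packages are arbitrary, and `some-pypi-only-package` is hypothetical):

    
    
        conda create -n myenv python=3.6 numpy
        conda activate myenv
        pip install some-pypi-only-package  # pip lives inside the env and installs into it
    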

> Do I go back to pip? sudo pip ;) ?

If you're putting `sudo` in front of `pip`, you're probably doing it wrong ;-)

------
tamatsyk
Interesting reading; I share some of the points in the post. However, one more
dependency manager?

Mostly I've used plain `python -m venv venv` and it always worked well. A
downside - you need to add a few bash scripts to automate typical workflow for
your teammates.

Pipenv sounds great but there are some pitfalls as well. I've been going
through this post recently and got a bit upset about Pipenv:
[https://chriswarrick.com/blog/2018/07/17/pipenv-promises-a-lot-delivers-very-little/](https://chriswarrick.com/blog/2018/07/17/pipenv-promises-a-lot-delivers-very-little/)

Another point is that it does not work well with PyCharm and does not allow
putting all dependencies into the project folder as I used to do with venv
(I just like to keep everything in one folder so it's easy to clean up).

Are there any better practices to make life easier?

~~~
peterwwillis
Actually, I recommend bash scripts for automating team workflows as a best
practice.

You create a wrapper script around your application that calls a dev
environment set-up script, which [if it wasn't done yet] sets up the
environment from scratch for that project or application and loads it before
running your application. This does a couple of things.
First, it removes the need to train anyone on using your best practices. The
process is already enshrined in a version-controlled executable that anyone
can run. You don't even need to 'install lore' or 'install pipenv' - you just
run your app. If you need to add documentation, you add comments to the
script.

Second, there's no need for anyone to set up an environment - the script does
it for you. Either set up your scripts to go through all the hoops to set up a
local environment with all dependencies, or track all your development in a
Docker image or Dockerfile. The environment's state is tracked by committing
both the process scripts and a file with pinned versions of dependencies (as
well as the unpinned versions of the requirements so you can occasionally get
just the latest dependencies).

Third, the pre-rolled dev environment and executable makes your CI-CD
processes seamless. You don't need to "set up" a CI-CD environment to run your
app. Just check out the code and run the application script. This also ensures
your dev environment setup scripts are always working, because if they aren't,
your CI-CD builds fail. Since you version controlled the process, your builds
are now more reproducible.

All this can be language-agnostic and platform-agnostic. You can use a tool
like Pipenv to save some steps, but you do not need to. A bash script that
calls virtualenv and pip, and a file with frozen requires, does 99% of what
most people need. You can also use pyenv to track and use the same python
version.
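
A minimal sketch of such a wrapper (file names like run.sh and app.py are hypothetical):

    
    
        #!/usr/bin/env bash
        # run.sh: bootstrap the environment if needed, then run the app
        set -e
        if [ ! -d venv ]; then
            python3 -m venv venv
            ./venv/bin/pip install -r requirements.txt
        fi
        exec ./venv/bin/python app.py "$@"
    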

~~~
tamatsyk
Completely agree on every bullet point.

Every time I saw simple bash scripts and/or a Makefile used, it did not seem to
be the idiomatic way of doing things in Python, but after using it for a while
it turned out to be one of the best development experiences.

------
jessaustin
I bitch a lot about npm, but then I remember that time when python's package
distribution drove me to learn a new language. I can't help but notice that
TFA and all the comments here are only talking about one end of this: managing
your dev environment. Is there a similar work explaining how to distribute
python packages in a straightforward manner? Is that article compatible with
this one?

------
superbatfish
The author's justifications for using this home-grown tool over miniconda are
weak at best, if not plain incorrect.

Conda really is the tool he wants; he just seems not to understand that.

~~~
cuchoi
How does Conda replace a virtual environment? (honest question)

~~~
kalefranz
Python's virtualenvs target isolation of the site-packages directory. Conda
environments are one step up in abstraction, isolating the "prefix" (in python
world just the output of `sys.prefix`). The target for conda and conda
environments is the management of everything within that prefix, including
python itself. The target for pip, pipenv, virtualenv, and other python-only
package management tools is everything within `lib/pythonX.Y/site-packages`.
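
You can see the distinction from inside any environment (a one-liner sketch):

    
    
        python -c "import sys; print(sys.prefix)"                           # what conda manages
        python -c "import sysconfig; print(sysconfig.get_path('purelib'))"  # what pip/virtualenv manage
    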

The distinction is important especially for people using python's data science
libraries, since those libraries are often just python wrappers around
compiled code and link to shared "system" libraries. Conda manages and
isolates those libraries; pip and virtualenv do not.

The distinction also has security implications, for example when openssl is
statically embedded in wheels. When this happens, there isn't any real
visibility into the openssl versions being used. Because conda has the
flexibility of the step up in abstraction as I described before, conda can
manage a single instance of openssl for the whole environment, and then the
python packages needing openssl need not statically embed it.

------
rbanffy
Version pinning is technical debt and a fool's errand. New versions will
always come out and your new development is confined to what once worked. You
need to keep testing with current versions to see what will break when you
upgrade and fix it as soon as possible so as to minimize the odds of a big
breaking change.

It may keep your environment stable for some time, but that stability is an
illusion because the whole world moves on. You may be able to still keep your
Python 2.2 applications running on Centos 3 forever, but you shouldn't want to
do it.

~~~
erik_seaberg
New versions will always come out, but it's not my job to test _all_ of them.
I'd rather consciously decide when I can afford to pay off the debt.

------
xycco
virtualenv + pip-tools

[https://github.com/jazzband/pip-tools](https://github.com/jazzband/pip-tools)

~~~
Alex3917
Came here to say this. Directly freezing requirements.txt rather than using a
requirements.in file is a mistake imho.
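
The workflow looks roughly like this (a sketch; django is just an example package):

    
    
        echo "django" >> requirements.in
        pip-compile                           # pins django plus every transitive dep
        pip-compile --upgrade-package django  # later: bump one dependency deliberately
    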

------
alanfranzoni
One thing that comes to mind: when I was starting to use Python, I was
eager to mock Java people and their absurd approach (write everything in Java,
specify a full classpath for every dependency, etc.). I pointed out how
easy and quick it was to program in Python rather than in Java.

I did not appreciate that a linear and well-defined (by the
language) approach to dependencies, and a clear API boundary between the system
libraries (java, javax) and the user libraries, actually gives A LOT of value,
even though it's more cumbersome to use.

------
syoc
Why would you do this? Redirect chain:

    
    
      https://tech.instacart.com/freezing-pythons-dependency-hell-in-2018-f1076d625241
      https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Ftech.instacart.com%2Ffreezing-pythons-dependency-hell-in-2018-f1076d625241
      https://tech.instacart.com/freezing-pythons-dependency-hell-in-2018-f1076d625241?gi=85c0588ca374

~~~
pcl
It looks like tech.instacart.com is hosted on Medium. The redirect is part of
the auth flow. If you have a Medium account, you would have logged in to
medium.com, not tech.instacart.com. If you don't have a Medium account, Medium
still will want to add first-party tracking information to your interaction
with tech.instacart.com and all other Medium properties. So this client-side
redirect flow enables them to capture that association.

This is presumably what the `gi=85c0588ca374` query parameter is in the
follow-on redirect. I would guess that `gi` stands for "global identity" or
something.

------
Waterluvian
I ran into a migraine last week: cleaning up requirements.txt.

How do you determine which requirements are no longer needed when you remove
one from your code? In node, your package.json lists only the packages YOU
installed, so removing them cleans up their dependencies. But in Python,
adding one package with pip install might add a dozen entries to your frozen
requirements, none indicating that they're dependencies of other packages.

~~~
tleguijt
At most projects we're using pip-tools, which generates a fully pinned
requirements.txt based on a manually maintained (and clean) requirements.in
that contains only the specific packages you need, without their dependencies.
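
Concretely (a sketch; the package names are examples):

    
    
        # requirements.in holds only direct deps, e.g. "django" and "requests>=2.0"
        pip-compile requirements.in  # emits requirements.txt with every transitive pin
        pip-sync                     # installs/removes packages so the env matches exactly
    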

~~~
Waterluvian
Thanks. I'll investigate this method. It sounds like you hand write
dependencies and their versions into the requirements.in file?

~~~
sciurus
The idea is described at [https://www.kennethreitz.org/essays/a-better-pip-workflow](https://www.kennethreitz.org/essays/a-better-pip-workflow)

------
textmode
Naive question: Why does this url 302 redirect to medium.com and then
medium.com forwards back to the same original url?

Is there some commercial advantage?

Why not just post the medium url?

[https://medium.com/p/f1076d625241](https://medium.com/p/f1076d625241)

This 302 redirects to tech.instacart.com

------
dorfsmay
Anybody played with the brand new XAR from Facebook?

[https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-system-for-self-contained-executables/](https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-system-for-self-contained-executables/)

~~~
handruin
Thanks for the link. That looks interesting; I'll have to give that a try.
When I started reading the link my first thought was Pex from Twitter. I don't
know how comparable XAR is to Pex but it's worth a look to compare the two.

~~~
terrelln
Both PEXs and XARs package a python script and its dependencies in a single
hermetic file.

PEX is a self-extracting zip file which has to be fully extracted before being
run. The extracted files could potentially be modified.

XAR is a self-mounting compressed SquashFS filesystem image. SquashFS will
decompress pages lazily and cache the result in the page cache, so the startup
time is much faster. Since SquashFS is read-only, the files can't be modified.
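
For comparison, building a PEX looks like this (a sketch; assumes the pex tool from PyPI, with requests as an example dependency):

    
    
        pip install pex
        pex requests -o app.pex  # one self-contained executable archive
        ./app.pex                # opens a REPL with requests importable
    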

------
BerislavLopac
Since we're sharing XKCD cartoons, here's one that comes to mind:
[https://xkcd.com/927/](https://xkcd.com/927/)

So not to disappoint, here's another contestant: Poetry [0]
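
For the curious, the basic Poetry flow (a sketch; the project name is a placeholder):

    
    
        poetry new myproject  # scaffolds pyproject.toml
        cd myproject
        poetry add requests   # resolves and pins into poetry.lock
        poetry install        # reproduces the locked environment
    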

That said, in my experience it works best if you don't force any particular
workflow on your developers, but maintain a solid and repeatable process for
testing and deployment. People have different mental models of their
development environments -- I personally use virtualfish (or virtualenvwrapper
if I'm on Bash), while a colleague works with `python -m venv`; and we have
played with pipenv, pyenv, anaconda and poetry in various cases.

As long as your requirements are clearly defined -- requirements.txt works
perfectly well for applications, and setup.py for libraries [1] -- any method
should be good enough to build a development environment. On the other hand,
your integration, testing and deployment process should be universal, and
fully automated if possible, and of course independent of any developer's
environment.

[0] [https://github.com/sdispater/poetry](https://github.com/sdispater/poetry)

[1] [https://caremad.io/posts/2013/07/setup-vs-requirement/](https://caremad.io/posts/2013/07/setup-vs-requirement/)

------
Animats
_Use a fresh virtualenv for each project_

As a form of version pinning, this locks in old versions and creates technical
debt. A few years downstream, you're locked into library modules no longer
supported and years behind in bug fixes.

~~~
yen223
The joy of not having to deal with broken production builds when dependencies
change under your feet is well worth the "technical debt" in my opinion.
Reproducible builds are valuable in their own right.

------
chocks
We recently went through this process at our company and chose to use pipenv
as the dependency management tool. As mentioned in the article, pipenv is
under active development, but it takes care of many things that we previously
had custom scripts for, such as requirement hashes, a built-in dependency
graph, automatic retries of failed dependency installations, automatic
re-ordering of dependency installations, etc. It also has a few quirks: we had
to pick a version that had most commands working, and pipenv install is
painfully slow and didn't seem to have a caching strategy for already-built
virtualenvs.

------
Wheaties466
Doesn't using requirements.txt fail to account for (I forget the official name)
transitive dependencies? Your dependencies in requirements.txt might have a
dependency whose version number may change over time.

This seems like something pip freeze could handle, but a hand-written
requirements.txt doesn't.

------
ausjke
I started using pipenv and it seems everything just works fine, except that I
can't really install wxPython with pipenv, but I can live with that.

------
jrochkind1
ruby practices based around bundler aren't perfect, but they did solve _this_
level of problem ~7 years ago.

It remains a mystery to me why python seems to have won the popularity battle
against ruby. They are very similar languages, but in all the ways they differ,
ruby seems superior to me.

~~~
kalefranz
My theory is that it's because Travis Oliphant wrote numpy for python rather
than ruby.

~~~
kenhwang
And Python is taught in the intro to programming course in just about every
college in the world.

Dumb simple languages make better teaching tools, but unlike Lisp and
Smalltalk, Python was also good enough for widespread professional use.

So almost everyone is exposed to Python, many people never bothered to learn
anything better. Inertia is a hell of a force.

~~~
jrochkind1
> And Python is taught in the intro to programming course in just about every
> college in the world.

Why do you think that ended up being python instead of ruby? Something about
the language or its uses, or just a coincidence of history?

I have no idea myself.

I think ruby and python are at about the same level of both "simpleness"
(neither is very simple, actually, although it depends on what you mean by
'simple') and "good enough for widespread professional use" (both are, and
especially both were ~8 years ago). Or do you disagree and think they differ
there?

~~~
kenhwang
I love Ruby, but I would still advocate for Python as a teaching language.

Ruby's grammar is objectively more complex than Python's. People generally get
stuck on syntax issues when they begin learning programming. Python's
significantly simpler grammar, simpler and fewer basic building blocks, and
historically "only one way to do things" philosophy makes it easier to pick
up.

Ruby's conventional control flow also doesn't translate well to lower-level
languages; Enumerable pretty much replaces all sorts of loops,
functions/methods tend to be chained instead of wrapped (math style), and
implicit returns and perlisms all make Ruby more confusing as a first
language.

------
AstralStorm
Yes, let's add another incompatible tool to the list. /s

Here's to Python 4 actually fixing this mess.

------
Alir3z4
Dependency hell in Python? The only annoying part would be missing some
library to build certain packages, like lxml, etc.

That's all.

We Python developers are fortunate to have amazing tools such as pip,
virtualenv, etc.

------
alanfranzoni
So... current tools miss some functionality. Let's invent a new one. Reminds
me of another xkcd: [https://xkcd.com/927/](https://xkcd.com/927/)

------
avip
Less blogs, more Dockerfiles. That's the solution.

------
kanox
This needs to be posted again: [https://xkcd.com/927/](https://xkcd.com/927/)

