Python's New Package Landscape (andrewsforge.com)
325 points by teajunky 7 months ago | 165 comments



While pipenv has garnered a lot of attention and praise for ease of use, it falls over whenever I integrate it with any serious work. Pipenv lock can take 20-30 minutes on a small flask app (~18 dependencies). And it often mixes up virtualenvs, enabling the wrong one with seemingly no remedy. I see the problems on Windows, macOS and Ubuntu. 2018 is not the year of pipenv, for me. I'm sticking with regular virtualenvs and the manual hell of requirements.txt. I hope it gets better eventually.


I tried a few of the package managers in Python recently (https://www.vincentprouillet.com/blog/overview-package-manag...) and had the same conclusion about Pipenv. It is way too slow, and frankly the UX is not that great either.


I’m disappointed your post did not cover using conda. As the pipenv drama has rolled on, I’ve moved from viewing conda merely as the best user experience in Python environment & package management to instead viewing it as the only serious option for professional scientific computing work (and quite possibly any professional Python work at all).


Agree with you about using conda. And since no one has mentioned Jake Van der Plas' review of conda vs the alternatives, myths, etc., here it is:

https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-mi...


I read that page and looked for the reason I don't use Conda (because I already have virtualenvs and I'm not prepared to burn them all down):

> Myth #5: conda doesn't work with virtualenv, so it's useless for my workflow

> Reality: You actually can install (some) conda packages within a virtualenv, but better is to use Conda's own environment manager: it is fully-compatible with pip and has several advantages over virtualenv.

> [...] the result seems to be fairly brittle – for example, trying to conda update python within the virtualenv fails in a very ungraceful and unrecoverable manner, seemingly related to the symlinks that underly virtualenv's architecture.

Doesn't sound like much of a myth then, if Conda's take on virtualenv is "you can technically do this, but everything will break ungracefully and unrecoverably, so please don't".


He's not saying that you _should_ install conda packages within a virtualenv, but that some have tried with some success.

At the end, one of his conclusions is: "If you want to install Python packages within an Isolated environment, pip+virtualenv and conda+conda-env are mostly interchangeable". So don't change if you don't have to.

But he does give reasons why conda may be superior to virtualenv -- managing different versions of Python, tracking non-Python dependencies, true isolation of environments, etc.


I should probably write another post once I've tried conda a bit more. I've used it very recently for some numpy/pytorch environment and it was quite nice.


Although conda is also getting slower and slower, and now routinely spends 5-20 minutes on dependency resolution even for trivial environments.


This is deeply untrue for conda. Even very complex environments build in less than a minute. I can believe there are corner cases where conda is very slow, but claiming conda takes 5 minutes for trivial environments is flat out wrong. Perhaps it is issues with a firewall, VPN connection or something else, absolutely no chance that is from normally executing conda.


Well, I experience it on a daily basis, and I'm not the only one! It's officially an open issue: https://github.com/conda/conda/issues/7239


The slowest operation in the linked thread is still taking less than 2 minutes...

Edit: correction, there are two examples that take longer, one at 3.5 minutes, one around 8 minutes. I don’t think it changes any takeaways though.


The last one in that thread is 11 minutes. And, I just did `time conda create -n test -y anaconda pytest-cov pytest-xdist coverage sphinx_rtd_theme flake8` on my Macbook, and it clocked in at 36 minutes! Probably, this is affected by also having the conda-forge channel active (which I need), but it's definitely not from some network/VPN issues. Everyone I work with has similar problems. So don't tell me there's "absolutely no chance that is from normally executing conda".


The same command for me, also running on a Macbook, took just over 5 minutes, using conda 4.5.9. I also repeated inside a Ubuntu docker container with only miniconda installed and got the same.

It’s funny to me that you would expect operations involving “conda install anaconda” to be fast though. 11 minutes seems perfectly fast for that and would be comparable to pip or anything else if doing that huge of a set of package installs.

That is not at all a “trivial environment” like you said previously.


Am I weird that I use anaconda instead of virtualenvs? I guess it’s overkill if you aren’t using the other conda features.


It's worth mentioning that the package manager component of anaconda is released as a separate (small) install: miniconda[1]. It includes only Python and the package manager, and not all of the 700+ packages installed as part of a full anaconda installation.

With its ability to install Python and non-python packages (including binaries), conda is my go-to for managing project environments and dependencies. Between the bioconda[2] and conda-forge[3] channels, it meets the needs of many on the computational side of the biological sciences. Being able to describe a full execution environment with a yaml file is huge win for replicable science.

1. https://conda.io/miniconda.html https://conda.io/docs/user-guide/install/index.html

2. http://bioconda.github.io/

3. https://conda-forge.org/


It is overkill for pure-Python packages or packages with simple C extensions. Conda was developed specifically to handle non-Python dependencies, which would be difficult to build in setup.py.

Also, a conda package is not a replacement for a distutils/setuptools package. When building a conda package, one still calls setup.py. So every python conda package has to be a distutils/setuptools package anyway.
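As a sketch of that layering, a minimal conda-build recipe for a hypothetical pure-Python package still delegates the build to the project's own setup.py (all names here are made up):

```yaml
# meta.yaml sketch: the conda package wraps the setuptools package;
# the build step is just running the project's own setup.py.
package:
  name: mypkg
  version: "0.1.0"

source:
  path: .

build:
  script: python setup.py install --single-version-externally-managed --record=record.txt

requirements:
  build:
    - python
    - setuptools
  run:
    - python
```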


Thanks for the caveat. Nonetheless, anaconda makes my life so much easier when working with Python libraries. If anybody has any other reasons to be careful with it, I'm interested!


If you need to work with cutting-edge Python tools (e.g. from GitHub), it’s often easier to use virtualenv to control the versions you need installed.


conda environments support pip and arbitrary pip commands. So if you use pip to, for example, install a specific version of a library directly from GitHub, that information will be stored in your conda environment and reproduced every time you recreate your environments.
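For illustration, here's a hypothetical environment.yml of the kind `conda env export` produces, with a pip subsection recording a GitHub install (package names and the tag are made up):

```yaml
# environment.yml sketch: conda deps plus a pip section that pins a
# library installed straight from GitHub (names/tag are hypothetical).
name: myproject
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - numpy
  - pip
  - pip:
    - git+https://github.com/someuser/somelib.git@v1.2.0
```

Recreating it with `conda env create -f environment.yml` replays both the conda and the pip installs.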


It seems that Anaconda is under-appreciated outside of pydata circles. Before using it I had no idea that it could manage virtual environments, dependencies and different versions of python.

The fact that it's not a community-driven project might be one of the reasons.

Meanwhile, in a galaxy far away, people are also using buildout.


I have been revisiting buildout recently, and I wish there were something that merged the ease of use of pipenv with the buildout concept. Perhaps something similar to Nix thrown into the mix, but more specific to a Python project. I heard that I can do this with Conda, but I never tried.

Being able to define and install external dependencies (e.g. ImageMagick, libsodium, etc.) from a configuration file local to a project is something I missed the most, especially when I'm working on several projects at once.


> Perhaps something similar to Nix throw in the mix, but more specific to a Python project.

Any examples of how Nix itself doesn't do what you need? One example I can think of: Nix doesn't support Windows.

https://nixos.org/nixpkgs/manual/#python


Nix does everything I want, but I find it hard to convince friends and coworkers to try it out. I think this is partly because Nix itself doesn't belong to the Python ecosystem, so the barrier is higher than, say, "Yeah, Pipenv is just Virtualenv+Pip".


I think Conda is the way to go, because:

- there is Miniconda, which doesn't force you to install all the PyData packages.

- virtual envs and needed packages are all defined in a simple yaml file.

- it works well with pip, so if a package isn't in the Conda repository, you can install it from pip. The annoyance here is that you must try conda, fail, and then try pip.

- you can easily clone envs, so you can have some base envs with your usual packages (or one for Python 2 and another for Python 3) and just clone them to start a new project.


I tend to only use anaconda for "data science work" and not for my small side projects.

I "feel" like it is overkill to use anaconda for things unrelated to 'data science' and the likes, but I'm not sure why I feel that way.

It kind of makes sense to use for other projects as well since you don't need to import all the things conda offers.


For one, as a package developer, publishing a source distribution of a package on PyPI is almost trivial. Publishing on Anaconda Cloud requires you to build the binary packages on all the OS's that you want to support (and for all Python versions you want to support) which most people delegate to some CI. So there is a whole new level of complexity involved.


You can use pip-install from within a conda environment. Conda isolates better than virtualenv does, IMHO.


When I have used it, and I have to for a certain project, it is incredibly slow to resolve dependencies. Enough so that I go for a walk or do something else for 15 minutes while it thinks about whatever it's doing.


Yeah, this is becoming a real problem. This didn't use to happen, but now conda is slow to the point of being unusable if you need to create environments a lot (like during testing)


> Pipenv lock can take 20-30 minutes on a small flask app (~18 dependencies)

Do you have scipy/numpy/keras or cython somewhere in the deps? pipenv lock is slow, but not 20-30 mins slow unless there's a very very large download and/or a long compilation somewhere in there.


The web app in question depends on pandas + numpy, so that's definitely part of the toolchain. It wasn't 20-30 minutes from day one. The lock time started fast and then ballooned. Other comments here saying 2-3 minutes per lock are consistent with my general experience.

This wasn't for complex pipenv operations either. A simple command: pipenv run python main.py took progressively longer to execute.


It takes about 2 minutes (feels like 5!) on my 2016 MBP to install 102 dependencies. Doing that in Docker takes about 1.5x the time. I haven't seen it take 20-30 minutes, but 2-3 minutes is still obscenely slow in my view.


A lot of this time may be spent on downloading the dependencies to the cache. If you're doing it in docker, you likely don't have a persistent cache. I've hit this issue before. https://github.com/pypa/pipenv/issues/1785

If you configure the cache properly you might solve it, but yeah it's kinda dumb it has to do that just for locking.


There is no reason for Docker to be slower, it must be some kind of configuration issue. Containers are basically just processes and there is virtually no difference in execution times.


As I said in my other reply, Docker is more likely to have ephemeral storage for the cache. So every single lock it'll re-download the package. Whereas locally, you're likely to still have the packages cached.

This can make a difference of tens of minutes for some packages which have a 1 gigabyte (!!!) download.


I have a docker project with ~20 packages in the Pipfile, the lock step of a new `pipenv install` takes about 3 minutes.


I used to use pipenv, and I found that the hard work of actually properly learning the Python pip/requirements.txt/setup.py/venv landscape well enough that I don't have problems anymore took less work than getting pipenv to work right.


> Pipenv lock can take 20-30 minutes on a small flask app (~18 dependencies)

I've never seen anything like that on a number of fairly large apps – a minute or two, at most. Are some of those dependencies extremely large or self-hosted somewhere other than PyPI?


I got so frustrated using pipenv at work that I created an alternative package manager: https://pypi.org/project/dotlock/. It's not 1.0 yet, but if it suits your needs I'd love it if you tried it out.


Can you elaborate on why using a requirements.txt file is "manual hell"?

I rely on it for pretty much everything and I haven't run into game-breaking problems.


The only problem I know of with requirements.txt is that many people require particular versions there even though later versions work perfectly fine. Every time I clone someone's Python project to work with, I have to manually replace all the ==s with >=s to avoid downloading obsolete versions of the dependencies, and have never encountered a problem.
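The find-and-replace described above can be sketched in a few lines (this is just the commenter's workflow, not a recommendation; the function name is mine):

```python
import re


def loosen_pins(requirements: str) -> str:
    """Turn exact pins (==) into lower bounds (>=), line by line."""
    return re.sub(r"==", ">=", requirements)


print(loosen_pins("flask==1.0.2\nrequests==2.19.1"))
# flask>=1.0.2
# requests>=2.19.1
```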

Anyway, for me the most annoying thing about Python project architecture (and perhaps about Python as a whole) is that you can't split a module into multiple files, so you have to either import everything manually all over the project while trying to avoid circular imports, or just put everything in a single huge source file. I usually choose the latter, and I hate it. The way namespaces and scopes work in C# feels just so much better.


> have never encountered a problem.

Oh, so you weren't around when Requests went 2.0 backward-incompatible (because they changed .json() with .json, or the other way around, can't remember) and half of PyPI, with its happy-go-lucky ">=1.0", broke...?

Since then, most people have learnt that you pin first and ask questions later.


Indeed. I just hate version hell (as well as dealing with old versions of a language, though I happen to love old hardware) so much that I ignored Python entirely until the 3.6 release, waiting for the time when one could use all the Python stuff without bothering to learn anything about Python 2.x. It took 10 years of waiting, but we are finally here, and now I enjoy Python :-)


I just encountered the fun fact that Pip 18.1 broke Pipenv whereas Pip 18.0 worked just fine.


It requires a lot of work to produce reproducible/secure builds; see the original Pipfile design discussion for the gory details:

https://github.com/pypa/pipfile


The problem requirements.txt doesn't solve is "what I want" versus "what I end up with".

There's no concept of explicit versus implicit dependencies. You install one package, and end up with five dependencies locked at exact versions when you do `pip freeze`. Which of those was the one you installed, and which ones are just dependencies-of-dependencies?

If you're consistent and ALWAYS update your requirements.txt first with explicit versions and NEVER use `pip freeze` you might be okay, but it's more painful than most of the alternatives that let you separate those concepts.
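A toy illustration of that gap (package set hypothetical): the freeze output is flat, so the split between what you asked for and what tagged along has to live somewhere else:

```python
# `pip freeze`-style snapshot: every installed package, pinned, flat.
frozen = {
    "flask": "1.0.2",
    "click": "6.7",        # dependency of flask
    "jinja2": "2.10",      # dependency of flask
    "werkzeug": "0.14.1",  # dependency of flask
    "requests": "2.19.1",
}

# The packages you deliberately installed must be tracked separately:
direct = {"flask", "requests"}

transitive = sorted(set(frozen) - direct)
print(transitive)  # ['click', 'jinja2', 'werkzeug']
```

Tools with a two-file model (Pipfile/Pipfile.lock, requirements.in/requirements.txt) keep the `direct` set for you.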


Because if you pin stuff in requirements.txt, it either never gets updated, or you have to go through, check which packages have updated, and manually edit requirements.txt. The combination of Pipfile and Pipfile.lock was designed to solve this in a much better way (briefly: understanding standard deps vs. development deps, and using the Pipfile.lock file for exact pinning/deployment pinning, vs. general compatibility pinning in the Pipfile).


That is not my experience. Until recently, I used to pin versions in requirements.txt, then from time to time I removed the pinned versions, reinstalled everything, tested and added new versions to requirements.txt. Most of the work was testing for incompatibilities, but no package manager will help you there.

Recently I switched to pipenv because zappa insists on having virtualenv (as app dev I never had any need for it - but it seems my case is an exception, as I almost never work on multiple apps in parallel). Pipenv does make version management a bit easier, but it wasn't difficult (for me) to begin with.

From talking with other developers I know my view is somewhat unorthodox, but I haven't encountered the problems they describe, or the pain hasn't been that big for me to embrace all the issues that come with virtualenvs.


Btw, it is possible to use the compatible ~= operator (PEP 440) within requirements.txt.
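For reference, `~=` ("compatible release") expands to a pair of clauses; here is a tiny pure-Python sketch of the expansion PEP 440 defines (the helper name is mine):

```python
def expand_compatible(spec: str) -> str:
    """Expand a PEP 440 compatible-release spec: '~= X.Y.Z' means
    '>= X.Y.Z, == X.Y.*' (only the final component may vary)."""
    version = spec.replace("~=", "").strip()
    parts = version.split(".")
    assert len(parts) >= 2, "~= needs at least two version components"
    return f">= {version}, == {'.'.join(parts[:-1])}.*"


print(expand_compatible("~= 2.20.1"))  # >= 2.20.1, == 2.20.*
print(expand_compatible("~= 1.4"))     # >= 1.4, == 1.*
```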


Or just use pip-tools to automatically update dependency versions.


I thought I was the only one to not get pipenv to work. I tried it out after it was released and it was buggy. Every time I used it afterwards I would get a weird edge case that would make me go back to pip and virtualenv.


Something sounds broken here. I have projects with similar numbers of dependencies, including heavy ones like pandas and numpy, and don't get anywhere near that long to lock. I don't have a specific suggestion for you, though. How long does a regular `pip install -r requirements.txt` take for the same dependencies?


Pipenv is hopelessly slow. It's a shame. Remember when git first came out and it changed the way we worked because it was so quick to commit now? (I fully expect that most git users here don't remember that, actually). There is no going back. I will not use slow tools. My tools need to be at the very least as fast as me.


> Pipenv is hopelessly slow.

Interesting, this has never been a problem for me. I've built some large tools and while it isn't fast, it's always completed in a few minutes.


A few minutes??! That sounds very slow.


To be clear: with few deps it's very fast for me; it's just larger projects with LOTS of non-trivial deps where it can slow down.


What OS?


Mid-2015 MacBook Pro running the newest OS


It is abundantly clear that the pipenv developers use macOS, so I wonder if it's an OS-dependent thing.


My current project is at 16 dependencies atm and... it's really not as bad as you make it sound.

    pipenv lock  5.65s user 0.29s system 77% cpu 7.639 total
I think 7.6 seconds is fine for an operation that you'd rarely do.

It would probably take ages at work, though. Just opening a WSL terminal takes several seconds there, whereas it's predictably instantaneous (<100ms) on Fedora Linux at home.


SSD vs HDD, maybe?


We just went through this cycle. Ultimately we build packages (debs) and Docker images, for deployment within VMs. Our build process, depending on the component, pushes the deb to repos or uses the deb in the Docker image.

After trying to replace pip with Pipenv, we had to stop. The dependency resolution time for 20 declared dependencies (that in turn pull down > 100 components) takes well over 5 minutes. With poetry - it takes less than 33 seconds on a clean system. The times are consistent for both Ubuntu 16.04 and Mac OS X.

Our only goal is to get to the point we're now in - tracking dependencies, and separate dev requirements (like ipython and pdbpp) from our other requirements. Poetry made it fast, simple, and made me an addict.

Over two days, I moved our entire codebase and every single (active) personal project I had to poetry. I don't regret it :)


I can second this... poetry is a seriously good project and matches PEP standards for the pyproject.toml, you will definitely see more projects jumping onto that standard soon.


Sébastien Eustace is a really awesome developer, I've used Pendulum and Orator as well as Poetry for projects, the documentation is beautiful and complete, the APIs are well thought out, and if you find an issue contributing back to the project is simple and straightforward (I'm happy to have a few PRs accepted into Orator).

There's a certain joy working with tools when it's clear that the person making those tools actually cares about the developer and making it work well.


That was the part that made me try it, literally while waiting for Pipenv to resolve.


Slow dependency resolution is a known problem and will be fixed in a newer release. The reason is not entirely pipenv's fault (curse be upon setup.py), and the pipenv maintainers are mimicking the somewhat novel method twine uses.


I was about to rule out Poetry due to pyup not supporting it; however, it turns out Dependabot (which, as a bonus, looks to be more actively maintained than pyup) supports it:

https://dependabot.com/blog/announcing-poetry-support/


I had the same problem, and apart from taking too long for dependency resolution, it felt like it broke at each update: sometimes from a pipenv update, sometimes from a pip update making pipenv break. I migrated to poetry and have never been happier.


Published May 11th, 2018. But it's interesting it's popping up again. It's a good explanation of the landscape as of 2018, though Pipenv has since gone in a weird direction. There's a lot of recommendations for it, but I sometimes get the feeling people don't understand what they're recommending, such as replacing some things that work (setup.cfg) by things that don't do the same thing (Pipfile).

Man, the Python packaging ecosystem is one of those things which really bring me down regarding the state of Python, because there is such an extremely high barrier for breaking backwards compatibility and nothing really works.

The JS ecosystem is far better in this regard. Pipenv was most promising because it followed in Yarn's footsteps, but it didn't go all the way in replacing pip (which it really should have). So now there's still a bunch of stuff handled by pip, which pipenv does not / cannot know about, and this isn't really fixable.

The end result is that instead of telling people about pip + virtualenv, we now have pip, virtualenv and pipenv to talk about. And people who don't understand the full stack, and the exact role of each tool, can't really understand how to properly do the tasks we choose to recommend delegating to each one of them.

There are three separate but related use cases:

- "Installing a library" (npm install; pip install).

- "Publishing a library" (setup.py. Or Twine if you're using a tool. Both use setuptools.).

- "Deploying a Project", local dev or production (pipenv. Well, if it's configured with a pipfile, otherwise virtualenv, and who knows where your dependencies are, maybe requirements.txt. Pipenv does create a virtualenv anyway, so you can use that. Anyway you should be in docker, probably. Make sure you have pip installed systemwide. Yes I know it comes with python, but some distributions remove it from Python. Stop asking why, it's simple. What do you mean this uses Python 3.6 but there's only Python 3.5 available on Debian? Wait, no, don't install pyenv, that's not a good idea! COME BACK!)

The JS ecosystem manages to have two tools, both of which can do all of this. I don't know how we keep messing up when we have good prior work to look at.


> - "Deploying a Project", local dev or production (pipenv. Well, if it's configured with a pipfile, otherwise virtualenv, and who knows where your dependencies are, maybe requirements.txt. Pipenv does create a virtualenv anyway, so you can use that. Anyway you should be in docker, probably. Make sure you have pip installed systemwide. Yes I know it comes with python, but some distributions remove it from Python. Stop asking why, it's simple. What do you mean this uses Python 3.6 but there's only Python 3.5 available on Debian? Wait, no, don't install pyenv, that's not a good idea! COME BACK!)

This makes the situation sound a lot more complex than it actually is by conflating separate layers: the system distribution issue is exactly the same for both Python and JS (if Debian ships an old v8 you either need to install a new one, perhaps using Docker to make that easy and isolated). Similarly, the question of whether you install the app using pip or pipenv is a different layer from whether you're using Docker or not, just as Docker is unrelated to the question of whether you use npm or yarn.

For a new project in 2018, you can simply say “Use pipenv. Deploy in Docker, using pipenv.” and it works as well as the JS world. People sometimes choose to make their projects too complicated or to manage things at the wrong level but that's a social problem which is hard to solve with tooling.


One difference: large swaths of Python developers grew up using the system-provided version of Python.

Most Node developers grew up with

    curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash
or

    brew install node
or using one of the dozen other ways to install node. Distinct versions and per-project packages were the norm from day one. That was not true with Python.


> Use pipenv. Deploy in Docker, using pipenv.

As a developer I definitely see the advantage of this approach and realize that it trivially solves some very hard problems (and causes a whole bunch of different problems if you're running Windows...). As an end user, I'm not super thrilled about the prospect of each tool I use coming in its own Docker container and needing to spin up 20 different containers each time I want to do anything.


> The JS ecosystem manages to have two tools, both of which can do all of this. I don't know how we keep messing up when we have good prior work to look at.

Agreed. You know you really blew it when even PHP does it better, and composer is unquestionably better than anything we've got in Python.


Poetry seems promising


In case this scares any new users, I've used nothing more than pip and virtualenv for several years with no issues of note.


Seconded. I read this article and thought "nope, not touching any of those tools". I'd rather spend my time building products than spending a week researching the landscape of 10 different package management approaches.


Same. pip + virtualenv just works.


There are lots of errors when it comes to reproducing the build on other machines. pip install -r requirements.txt does not guarantee that you will install the same versions of packages on a new machine, and in fact, you will typically not.


I've never had problems with this when requirements.txt contains package versions.

Have you? Or are you not using explicit versions supplied by e.g. pip freeze?


Totally agree. And why not just invest time into pip rather than reinventing it?


Same. I feel people invent problems in order to not use these tools.


Whenever talk in Python-world goes towards packaging, I feel like I have been transported to Javascript-world: it's never clear to me what concrete problems are being solved by the new tools/libraries.

This article seems well-written and well-intentioned. Despite reading it, I don't know why I would not have loose dependencies in setup.py and concrete, pinned dependencies in requirements.txt. It's never felt hard to manage or to sync up - the hard part is wading through all the different tools and recommendations.


> loose dependencies in setup.py

How does that work? How would someone else coming to work on your project use them?

> concrete, pinned dependencies in requirements.txt

How do you maintain that requirements.txt? And while that might work for applications, what do you do for libraries?


I assume that someone working on the project would do:

  pip install -e .
in a virtual environment. I thought this was quite well-established. Is there a problem with it that I'm not aware of?

  pip freeze > requirements.txt
for requirements.txt generation. For libraries just omit this? I'm not sure I understand the question. The article also mentions that several of the new tools aren't appropriate for libraries anyway.


> I assume that someone working on the project would do: pip install -e . in a virtual environment. I thought this was quite well-established. Is there a problem with it that I'm not aware of?

So ignoring your requirements.txt, and potentially working with different versions of dependencies from the ones you were working with and encountering different bugs?

(Also managing your virtual environments "by hand" is tedious and error-prone when you're working on multiple projects).

> pip freeze > requirements.txt for requirements.txt generation.

The problem with this is that it's not reproducible - if two people try to run it they might get different results, and it's not at all obvious who should "win" when the time comes to merge. If you mess up the merge and re-run then maybe you get a different result again, and have to do all your testing etc. over again.

> For libraries just omit this?

Maybe, but then you'll face a lot of bug reports from people who end up running your library against different versions of upstream libraries from the ones that you tested against.


People working on your project have the choice of using the requirements.txt or not. I would think core developers use the loose dependencies, with the aim of testing the latest and fixing the bugs. Someone has to move dependencies forward at some point, and doing this locally for knowledgeable people seems reasonable. CI should definitely - and part time contributors should probably - just use the pinned dependencies.

This is why I would not worry about pip freeze being non-reproducible. It is a manual step: upgrade our dependencies. Testing should happen all the time. If you are happy with the result of testing after upgrading dependencies, commit requirements.txt. I don't see new tools easing the burden of co-ordinating and testing dependency upgrades. Did I misunderstand them in this context?

I don't understand the concern for the library case. Pipenv doesn't address libraries. It seems to be an explicit goal of many people not to pin library dependencies. I'm asking what the new tools are solving - and again I can't see that they are solving this. Nothing is preventing you from pinning your library dependencies if you want (using old tools) but you'll probably get people complaining about being incompatible with other projects.


> I would think core developers use the loose dependencies, with the aim of testing the latest and fixing the bugs. Someone has to move dependencies forward at some point, and doing this locally for knowledgeable people seems reasonable.

Agreed that developers should be moving the dependencies forward, but you want to do that as a deliberate action rather than by accident. E.g. if you want to consult another developer about a bug you're experiencing, you want them to be on the same versions of dependencies as you.

> This is why I would not worry about pip freeze being non-reproducible. It is a manual step: upgrade our dependencies. Testing should happen all the time.

It's a manual step, but you still want to be able to reproduce it. E.g. if a project is in maintenance mode, you want to be able to do an upgrade of one specific dependency without having to move onto new versions of everything else.

I don't work in Python any more so I don't know what the new tools do or don't do, I was just starting from your "I don't know why I would not have loose dependencies in setup.py and concrete, pinned dependencies in requirements.txt." and I know that workflow gave me a number of problems that I simply don't have when working in other languages. So I'm hoping that Python has caught up with the things that are known-working elsewhere, but maybe not.


What would requirements.txt even _do_ in a library? AFAIK that file isn't even read when packaging, distributing, and installing libraries.


Well presumably you want some way to have all the developers working on a library be using the same version of upstream libraries that that library depends on.


Having used pure pip + virtualenv{,wrapper}, pip-tools + virtualenv, poetry and Pipenv for medium to large applications, I'm going to be sticking to pip-tools for the time being for apps. Poetry is fine, but pip-tools is faster and there's less to learn. Pipenv is unbearably slow for large applications and often buggy.

For libraries, I've been using Poetry for molten[1] and pure setuptools for dramatiq[2] and, at least for my needs, pure setuptools seems to be the way to go.

[1]: https://github.com/Bogdanp/molten

[2]: https://github.com/Bogdanp/dramatiq
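For anyone unfamiliar, the pip-tools workflow is roughly this (a sketch; it assumes pip-tools is installed and a requirements.in listing loose dependencies already exists, and "flask" is just an example package):

```shell
pip-compile requirements.in        # resolve loose deps into a pinned requirements.txt
pip-sync requirements.txt          # make the current virtualenv match it exactly
pip-compile --upgrade-package flask requirements.in   # bump one dep deliberately
```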


The problem I ran into with pip-tools (and I assume pipenv has this too) is that the lock files are platform- and Python-version-specific (they evaluate all of the conditionals for the given platform), which is a problem when you do a lot of cross-platform work.
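For reference, the conditionals in question are PEP 508 environment markers; a hand-maintained requirements file can keep them, but the compile step evaluates them away (package names below are just examples):

```shell
# A requirements file with a marker that only applies on Windows:
cat > requirements.txt <<'EOF'
requests==2.19.1
pywin32==223 ; sys_platform == "win32"
EOF
cat requirements.txt
```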


I'm not a programmer by trade, but I dabble, and these issues make it much less fun.

In my limited experience, Clojure's Leiningen is a far more pleasant way to solve these problems. I'm sure there are many other examples in other languages, but in the few I've used, nothing comes close. Each project has versioned dependencies, and they stay in their own little playground. A REPL started from within the project finds everything. Switch directories to a different project, and that all works as expected, too. It's a dream.

[https://leiningen.org/]


I've tried a lot of solutions, but nix-shell hands down is the best I've used. I wrote a little gist detailing how to develop in Python using Nix: https://gist.github.com/CMCDragonkai/b2337658ff40294d251cc79...


Not really packaging but related, my favourite new tool is Pyenv (https://github.com/pyenv/pyenv) it made getting a new laptop setup with various versions of Python so much quicker.

I haven’t used Pipenv yet but it works with pyenv to create virtual envs with a specified Python version as well as all the correct packages.
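The combination looks roughly like this (a sketch; it assumes pyenv and pipenv are already installed, and the version numbers are arbitrary):

```shell
pyenv install 3.6.6       # build and install that interpreter under ~/.pyenv
pipenv --python 3.6       # pipenv discovers the pyenv-built 3.6 for its venv
pipenv install requests   # and installs packages into that environment
```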


Polyglots might also be interested in asdf (https://github.com/asdf-vm/asdf). It's like Pyenv but supports various languages via a plugin system (eg. https://github.com/danhper/asdf-python).


For those wondering (like I did): it doesn't seem like this has any relation to Common Lisp's ASDF.


Imagine I’m just getting started with Python, and I see this article. I think to myself, “Awesome, a primer!”

Then I start reading (these comments)... mayyybe I should try Julia... or anything else, at least while I’m still getting started.


Just use pip and virtualenv. I've been using them on a cluster of related projects for years and they've been rock-solid.
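For newcomers, the whole rock-solid workflow fits in a few lines (stdlib venv shown here; the virtualenv package behaves the same for this purpose):

```shell
python3 -m venv .venv     # create an isolated environment in ./.venv
. .venv/bin/activate      # put its bin/ first on PATH
pip --version             # this pip now installs into .venv only
```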


And you can avoid virtualenvs as well until you need to ship a large app.

"pip install --user pkg" is good enough for beginners and writing libraries with few deps.
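To see where --user installs actually go on your machine:

```shell
# pip install --user drops packages under a per-user prefix rather than
# the system site-packages (so no sudo and no virtualenv needed):
python3 -m site --user-base    # prefix for scripts (its bin/ subdirectory)
python3 -m site --user-site    # where the packages themselves land
```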


Does anyone know why the Python community has seemed to struggle with package mgmt fragmentation/churn so much over the years, compared to other languages? Did Guido just not really care about package mgmt?


Guido wrote Python in 1990. Did you use a package manager back then? I didn't.

Part of the issue is how well Python integrates with non-Python dependencies. Before conda, when I wanted to upgrade some Python projects, I'd get errors complaining about my Fortran compiler. These days, most of the major projects upload precompiled binaries for major platforms to PyPI, but when it was just source code...


Sure, but couldn't it have been given some BFDL/high-level leadership importance in the last five years to rein in the craziness?


The PyPA team has done a lot over the past five years. The changelog for pip (https://pip.pypa.io/en/stable/news/) contains quite a bit, PyPI was migrated to Warehouse, and there have been several PEPs focused on improving the packaging situation. A lot of these ideas come from various people in the community and get formalized as official recommendations or tools, but these things take time, especially accounting for backward compatibility in an ecosystem as large and mature as Python's.

The short answer to "why isn't this solved?" is "it's hard, and there's a lot to do". Development practices change over time, and the tooling continues to evolve with them. It's easy to see a broad survey like this and think that there's too much going on, but taken at a high level, the space is definitely trending in the right direction.

(Note: I'm not part of the PyPA, but I'm interested in this area and try to follow along from the outside.)


Understood, I guess I'm wondering why it hasn't been possible to cull more of the less-successful attempts, or at least make it obvious to newer users what is legacy. As an outsider/newer person to Python, the number of package mgmt options to consider is vast and confusing, it would be helpful if there was one (or a few) more "blessed" solutions :)


> the number of package mgmt options to consider is vast and confusing

Part of the issue is due to the success of Python in very different niches. The likes of Rails or Node can concentrate on specific ecosystems, which account for the bulk of their users and have a limited set of scenarios they have to support; whereas Python users come from sysadmin to data-crunching to web to desktop development to games to to to...

So each packaging tool comes with certain ideas, usually a result of the author's experience; maybe they work very well in this or that scenario, but then they break badly on others and sizeable chunks of the community revolt. So a new tool comes around and the cycle starts again, but now people also want compatibility with the old tool.

I suspect part of the solution will require splits between niches. It already happened with Anaconda, which has basically become the standard in a particular group of users (academia / datascience). Since that came around, lamentations around building C libraries have substantially reduced (to be fair, the arrival of precompiled wheels on PyPI also helped). Some similarly-specialized tool might eventually emerge as standard for other niches.

Python developers are cats and they are pretty hard to herd at the best of times, which is unsurprising -- who would stick around a language that is almost 30 years old and was never promoted by any major vendor? Only hard-headed fools like myself.


There are some "blessed" recommendations at https://packaging.python.org/guides/tool-recommendations/ (the Python Packaging Authority is about as official as you're going to get), but this boils down to it being a large open source community. No one's going to cull other people's efforts, but tools do merge on occasion (e.g. the functionality of https://github.com/erikrose/peep has been merged into pip, so peep is deprecated now).


Problem is like that XKCD comic about standards. People keep reinventing the wheel instead of embracing and evolving the one thing that works well and has existed for a long time.


No, it's dead simple. Requirements separated by environment and a virtualenv. It's awesome compared to JavaScript.
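i.e. the setup is just text files (the names here are only the common convention, and the pins are examples):

```shell
# One requirements file per environment; dev includes base plus extras.
mkdir -p requirements
printf '%s\n' 'flask==1.0.2' > requirements/base.txt
printf '%s\n' '-r base.txt' 'pytest==3.8.0' > requirements/dev.txt
# pip install -r requirements/dev.txt   # dev env = base + test tools
cat requirements/dev.txt
```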


Wait until someone from the python community recites to you "there should be one and only one good way to do something" ....!


I think plenty of people in the Python community will earnestly say that while acknowledging that there isn't universal agreement on what that one good way is. It's an ideal to strive for, not a statement of fact.


This isn’t a primer but more like a survey of packaging options.

Using pip and virtualenv is usually fine or using pipenv .


“Usually” is great until it’s not. I don’t want to get invested in a language only to discover issues down the line. So, if something as basic as packaging is potentially problematic and there are other options available, I might look elsewhere before rolling the dice that this problem is not too problematic.

Perhaps “Primer” was the wrong word, but I believe the sentiment is valid. Reading the comments, there is simply no consensus. If code readability is important because code will be read more than written, packaging is important because in many, many scenarios that count, code will be distributed more than it will be read.

It is simply frustrating that the typical response to comments like my original comment is some form of “it’s not as bad as you think”. Look, we have a problem here. A problem many other languages deem important enough to solve upfront. It’s been a problem for a long long time.


pip and virtualenv have been solving the problem and have been the standard forever.

Other people tried different approaches; pipenv is just that, a separate project trying to solve the same problem.

I don't know why, or who, said that pipenv is the official recommended way; if it is, it should not be, and I hope it is not.


> I don't know why / who said that pipenv is the official recommended way, if it is it should not be and I hope it is not.

The Python Packaging Authority says that pipenv is the first "recommended way" for managing application dependencies: https://packaging.python.org/guides/tool-recommendations/#ap...

Except if pipenv doesn't meet your needs. Then use pip.

Or, if you need cross-platform support, use buildout.

Or, if you are doing scientific computation, don't use any of those, use conda, Hashdist, or Spack.

Or if you need to create a package, use setuptools and twine.

So, no, pip and virtualenv don't solve the problem, because there are a lot of different problems and use-cases. I can say from my experience that conda is _the_ best solution to the problem for scientific work.


The Python community is not monolithic. PyPA has made some decisions that others might disagree with.


Kinda exactly what went through my mind, I've dabbled in python, but only to use other ppl's projects, which sometimes required setting up pip, etc. So I was looking forward to reading about how it'd been simplified :(



I’m looking forward to a future where I no longer have to use languages that require different mechanisms to reference functionality from library code than one uses for one's own source ... all the incidental complexity around custom compilation processes is, in reality, just an enormously non-productive relic of the past.

In the future — you have a set of entry points to your program, these are crawled by the language aware tool chain to identify and assemble all the requirements for the program (including 3rd party functionality). There’s no need for separate tools to manage packages, caches, and virtual environments — let’s just put all this logic into the compiler(s) — where necessary let the application describe the necessary state of the external world and empower language toolchains to ensure that it’s so ... let’s live in the future already ...


The biggest problem I have with python packaging tools is how do I start using them. I'd rather not install all of them in my global site-packages. Do I need to create a virtualenv just to get a tool to manage my virtualenv's?

I have seen poetry is working on their bootstrapping story. I could not get their current solution to work on Ubuntu. Maybe what they are developing towards will work.

https://github.com/sdispater/poetry/issues/342


pip install --user


I stumbled on this (the --user flag) only the other day and it simplified things immensely. Too much information, with various ways of doing things explained in bits and pieces over the decades, makes it all look confusing.

Often there is a very simple way, even in Python packaging and deploying. In my situation the easiest way began along these lines --

1. Install python3 for the local user from the source distribution (make sure you have compilers etc that the configure check lists out)

2. After compiling the sources and finishing with 'make install', make Python available in your local search path

3. And use pip with this magical --user flag as needed. No virtual env, conda, etc etc.

4. Leave HOMEPATH etc alone as this conflicts with the setup of the admin's system wide installs (when you su)

Things can go smoothly with pip alone.
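The one extra step worth knowing for step 2: console scripts installed with --user land in the user base's bin directory, which may not be on PATH yet (the exact path varies by OS; this is the usual Linux/macOS shape):

```shell
# Find the per-user install prefix and put its bin/ on the search path:
export PATH="$(python3 -m site --user-base)/bin:$PATH"
echo "$PATH" | tr ':' '\n' | head -n 1   # first entry is now the user bin dir
```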


Doesn't that just install into a global-to-the-user path? Isn't one of the things we're trying to avoid conflicts between these tools? For example, pip is now more freely breaking compatibility. I now need to ensure that my different packaging tools are compatible with my version of pip, all installed in my user location.


Yes, and that's fine for most folks. The one exception I make is developing a huge app at work, use a virtualenv for that to keep it separate.


pipenv --three && pipenv shell && pipenv install <package_name> and you're set


Except how do I install pipenv in the first place?

Or why do I need to use pipenv to install pipsi so that pipsi can manage my virtualenv environments?


I've migrated to pipenv in most of my projects, it's simple and great for application development but I still write everything to work with pure pip as well so the Pipfile basically lists my application as a dependency and I mainly use it for the lock files.

For library development I target pure pip/setuptools but still use pipenv during development phase. There have been a few cases where pipenv had problems and I had to either remove my virtualenv and reinitialize it or even remove my pip-file/lockfile, but since I still have my setup.py it's not a big deal for me.

As for uploading etc I use twine but I wrap everything in a makefile to make handling easier.

A problem I noticed recently was a case where one of my developers used a tool which was implicitly installed in the testing environment since it was a subdependency of a testing tool but it was not installed into the production image. This resulted in "faulty" code passing the CI/CD and got automatically deployed to the live development environment where it broke (so it never reached staging). Caused a little bit of a headache before I found the cause.


No mention of containers? I haven't written Python code in a while now, but it would have been nice to have a comparison with container technologies, which weren't available at the creation time of PyPI and pip. Containers solve both of the problems from the article: isolation and repeatability, for any language. Are virtualenv tools still needed in the container era?


Containers would seem to solve the isolation part of the problem, but dependency management is not something containers can deal with effectively.


Big monolithic Python projects will face dependency issues for sure. However, software structured into simpler, smaller components, using the right language for the right task, will probably have simpler dependencies for each module.

That's what Go and tools like Bazel allow for: static builds, which force you to modularize the project into smaller independent components.

In case of static builds, the protocol between components is the C ABI, or an RPC protocol, but it could be a mesh of microservices too.

What is currently happening with the explosion of tools with Python is the result (take it with a grain of salt, only my opinion) of people only working with Python and not exploring enough outside of it.


Can you give practical examples of what you are mentioning? Like, how to achieve ""static builds"" with python? Got me interested!


Dropbox does this for their Windows client, I think.


At least on non-Linux systems, containers are far more heavyweight than something like virtualenv.


No mention of Anaconda?! How strange. I recommend using `conda` instead of virtualenv, and instead of pip where possible.

A Python project does not only depend on Python modules, but non-Python modules as well. Beyond Python, conda helps manage your other dependencies, like your database. I use Miniconda instead of Anaconda, to avoid the initial mega-download.
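For the curious, the day-to-day commands look like this (a sketch; it assumes Miniconda is installed, and the environment name and package list are just examples):

```shell
conda create -n myproj python=3.6 numpy   # env with Python plus binary deps
conda activate myproj
conda install -c conda-forge postgresql   # non-Python deps from the same tool
pip install some-pure-python-pkg          # pip still works inside the env
```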


I find conda far and above the best tool to manage python packages and dependencies. Being able to concisely contain all Python and binary dependencies together is invaluable.

I recently wrote about it as a blog post; using conda within containers has solved almost every pain point we had with Python packaging and how to get things into production reliably.


Counterpoint: it's terrible. A lack of a lockfile is a killer, plus it not really fitting well with the general ecosystem (it's not really a python dependency manager). It's an all or nothing tool, which sucks to be honest

Now that most projects have wheels, pip is pretty damn good.

The conda CLI is also just terrible. It's good for ad-hoc research, but for big deployments? No thanks, I've had enough pain using it.


For pure-python library projects, I found Poetry the best option these days (haven't tried Hatch). But it is still heavily under development, so it's not necessarily a black-box solution.

The biggest pain point of Pipenv for me is that it cannot as yet selectively update a single dependency without updating the whole environment.


If you only think in Python then packaging really might be a pain point. But honestly, if we look at other languages, it's not so bad actually; very specifically calling out Golang here, because it claimed exactly this topic as an initial design goal and to this day has basically failed at delivering it.

There are even languages like C++ where the community as a (w)hole has given up on that topic and instead opts for completely building every tool by building the underlying libraries up first manually.

Considering all this, who can actually beat Python at this point? Java maybe? Is Ruby still competing? How is NodeJS doing?

Currently with what I see around me (mostly Go and C++) I don't feel too bad about setuptools+pip+virtualenv anymore.


Ruby's bundler is pretty neat. It's one of the first systems that introduced lock files.


I have been using purely setuptools for all of our open source Python libraries at Contentful, but have found that lately I've been getting deprecation warnings from PyPI not to use `setup.py upload` anymore.

What should the alternative be now?

Edit: I'm reading about twine right now, but I cannot begin to comprehend why it's not bundled directly if this is what they are intending for us to use to upload packages.


Anything PyPI-related has recently gone into the (terrible) habit of recommending very recent (and often half-baked) tools that live entirely outside of stdlib. It seems pretty silly to me, considering Python core developers made significant efforts to bundle and support pip and virtualenv (venv) in the stdlib precisely to avoid having a lot of de-facto essential libraries outside the core distribution.

If the problem is that stdlib cannot move as fast as PyPI-related development requires, maybe that should be fixed, rather than trying to bypass all quality checks and then relying on obscure shared knowledge to navigate the ecosystem. Maybe there should be a system where specific network-sensitive stdlib modules could be updated faster than the rest.


You're mostly right, the problem is also that users don't upgrade their Python distribution very often, so they miss out on new features.

> Maybe there should be a system where specific network-sensitive stdlib modules could be updated faster than the rest.

This is essentially what `setuptools` does, by putting a package on PyPI that monkeypatches/plugs in to the stdlib.


Hello, I'm the person who deprecated `setup.py upload`. The warnings should be telling you that `twine` is the preferred tool for uploading.

The reason for this is that right now, that command comes from `distutils`, which is part of the standard library. There is a huge disadvantage to bundling this functionality with your Python distribution, namely that it can only get upgraded when you upgrade your Python distribution. A lot of folks are still running versions of Python from several years ago, which is fine, but it means that they are missing out on anything new that's been added in the meantime.

For example, earlier this year, we released a new package metadata version which allows people to specify their package descriptions with Markdown. This required a new metadata field, which old versions of `distutils` know nothing about.

Upgrading `distutils` to support it would require that these changes go though the long process of making it into a Python release, and even then they would only be available to folks using the latest release.

Moving this functionality from `distutils` to a tool like `twine` means that new features can be made available nearly immediately (just have to make a release to PyPI) and that they're available to users on any Python distribution (just have to upgrade from PyPI).

The `distutils` standard library module comes from a time when we didn't have PyPI and thus, didn't have a better way to distribute this code to users. We have PyPI now though, so bundling `distutils` with Python is becoming less and less useful.
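Concretely, for anyone hitting the deprecation warning, the replacement workflow looks like this (a sketch; you still build with setuptools, only the upload step moves to twine):

```shell
pip install --user twine
python setup.py sdist bdist_wheel   # build the sdist and wheel locally, as before
twine upload dist/*                 # upload via twine instead of `setup.py upload`
```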


Why not bundle twine like pip? In fact, why not merge the twine functionality into pip?


> Why not bundle twine like pip?

The `pip` package is not actually bundled with your Python distribution, instead the standard library has `ensurepip` which provides a means of bootstrapping a `pip` installation without `pip` itself. See [0].

> In fact, why not merge the twine functionality into pip?

This has been considered and still might happen, see [1], specifically the comment at [2].

[0] https://docs.python.org/3/library/ensurepip.html

[1] https://github.com/pypa/packaging-problems/issues/60

[2] https://github.com/pypa/packaging-problems/issues/60#issueco...
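You can see the bundled-but-not-stdlib relationship directly: the stdlib only knows which pip wheel it would bootstrap, nothing more:

```shell
# ensurepip ships a pinned pip wheel inside the Python distribution;
# this prints the version it would install, without touching anything:
python3 -c "import ensurepip; print(ensurepip.version())"
```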


> The `pip` package is not actually bundled with your Python distribution

It is bundled, as mentioned in the link [0] you posted: "pip is an independent project with its own release cycle, and the latest available stable version is bundled with maintenance and feature releases of the CPython reference interpreter."

> the standard library has `ensurepip`

Ensurepip is for Python distributions, which are supposed to use it automatically to provide the bundled pip. See [3]: "Ensurepip is the mechanism that Python uses to bundle pip with Python." Basically it's the installer of the bundled pip. At least that's how I understand it.

> This has been considered and still might happen, see [1]

Note that while the users there all basically say the same thing (twine should be merged into pip as "pip publish") the (two out of three) PyPA devs say it "would be a major mistake" and they are "against adding pip publish". (Before starting offtopic rants against poetry...) I somehow doubt this will improve soon.

[3] https://mail.python.org/mm3/archives/list/distutils-sig@pyth...


What if you are already on Py3.6, don't need markdown-descriptions (not sure what that is btw), and been happily using setup.py upload for a decade?


Twine seems to be the recommended[1] way. It's pretty straightforward to use, thankfully.

[1]: https://packaging.python.org/tutorials/packaging-projects/#u...


Pipenv is unusable for me, since launching your app only works when your current working directory is the Pipfile directory. If you want to launch an app via a shell script from another directory, you have to first cd to the Pipfile dir and run pipenv shell (maybe you can pass in a second shell script as an argument).

The article mentions Pipsi is designed to make command-line apps globally accessible, and I'll try it out.

Additionally, adding git/src/package/module.py may be fine when you're using an IDE, but when browsing in a file manager, you must navigate 3 directories deep to even see any source files, which seems to be trending towards the inconvenience and pain of Java projects.


I gave conda a shot and found it to be better than pip + virtualenv, but still not amazing.


I did the same and found pip + venv to be much superior.


We tried Pipenv last year, ran into a number of bugs, this one being the most irritating:

https://github.com/pypa/pipenv/issues/786


Why is it that Python is geared towards archiving packages at the site level by default while npm, composer, et al. tend towards including packages in the project's folder? Is it convention from a time when disk space was less plentiful?


"Why" is hard, but Python's packaging system was created when global installation was just how packaging worked. There may have been exceptions, but the first local package installation tool I knew of was workingenv (2006). It was the predecessor of virtualenv, which I think led directly to pipenv.


I rarely even use requirements.txt and never use it in my personal projects.

I just pin the project's direct dependencies in the setup.py file and install the folder directly. I know it might cause bugs with different developers (or the CI) using different versions of the upstream dependencies but I guess I trust the developers who create each library I'm using. The moment I directly import something from what used to be an upstream dependency, I pin it too.

So far this approach hasn't given me trouble, but I'll still take a look at poetry based on what I read in the comments here.
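i.e. something like this minimal setup.py (the project name and version pins are placeholders, not real recommendations):

```shell
cat > setup.py <<'EOF'
from setuptools import setup, find_packages

setup(
    name="myapp",
    version="0.1.0",
    packages=find_packages(),
    # direct dependencies pinned exactly, no requirements.txt needed
    install_requires=[
        "flask==1.0.2",
        "requests==2.19.1",
    ],
)
EOF
grep -c '==' setup.py   # then: pip install .  (or pip install -e .)
```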


Huh, OpenDNS blocks this due to "a security threat that was discovered by the Cisco Umbrella security researchers."


Interestingly this URL gets blocked by my work's security thing. Never saw that before.

edit: I requested an exemption but corp IT staff came back and said there's definitely been malware identified on that site. So... be careful with your clicks.

edit2: Well who knows where the malware alert is coming from, might be an ad or something.


Sadly there are things you can't always install reliably from the Python repository. E.g. you may have to install things like scipy and keras from the OS or a third-party repository (like brew or conda), as pip install would fail in the build process.


I was trying to set up pipenv on a Mac earlier in the week.

Being able to select P2 or P3 environments is great.

Unfortunately it decided all my packages were in /var/mail.

No patience to debug it, so I gave up on it.


(incf confusion)


this incredible complexity is what docker is really good at simplifying


Uh, Docker doesn't help at all with handling dependency upgrades. Good package-manager/version-manager combinations do just that.


Maybe for deployment, but this problem has been solved well by both Yarn and Cargo.


pretty sure this is about python


Who curates the packages to prevent security issues?


This is a glaring issue with Python. Does any other language package repository implement any security? (I honestly don't know the state with other languages.)



