I occasionally ask our principal / sr. Python engineers about this, and their response is always, "These things come and go, virtualenv/wrappers + pip + requirements.txt works fine - no need to look at anything else."

We've got about 15 repos, with the largest repo containing about 1575 files and 34 MB of .py source, and 14 current developers (with about 40 over the last 10 years). They really are quite proficient, but haven't demonstrated any interest in looking at anything outside pip/virtualenv.

Is there a reason to look at poetry if you've got the pip/virtualenv combination working fine?

People who use poetry seem to love it - so I'm interested in whether it provides any new abilities / flexibility that pip doesn't.



I will give it 5-10 years. If they are serious about it, it will last that long. They're about halfway there.

Package management is terrible work. Nobody appreciates it. It's extremely complex, it has to work in enormous numbers of configurations, and very minor errors can have catastrophic impact to security or availability of systems.


How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?

I'm in the same boat as you in that I'd like to keep using pip, but the lack of a lock file is very dangerous because it doesn't guarantee reproducible builds (even if you use Docker).

In Ruby, Elixir and Node the official package managers have the idea of a lock file. That is the only reason I ever look into maybe switching away from pip.

Running a pip freeze to generate a requirements.txt file doesn't work nicely when you use a requirements.txt file to define your top level dependencies.

I've been bitten by issues like this so many times in the past with Python where I forgot to define and pin some inner dependency of a tool. Like werkzeug when using Flask. Or a recent issue with Celery 4.3.0 where they forgot to version lock a dependency of their own and suddenly builds that worked one day started to break the next day. These sets of problems go away with a lock file.


> How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?

`pip-compile` from `pip-tools` is my go-to for this.
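
A rough sketch of that flow (package names illustrative): requirements.in is hand-written with only the top-level deps, and pip-compile generates the fully pinned requirements.txt from it:

    $ cat requirements.in            # hand-written, top-level dependencies only
    flask
    celery==4.3.0
    $ pip-compile requirements.in    # writes requirements.txt with every transitive dep pinned
    $ pip-sync requirements.txt      # makes the virtualenv match the pinned set exactly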


> Running a pip freeze to generate a requirements.txt file doesn't work nicely when you use a requirements.txt file to define your top level dependencies.

Use setup.cfg to define your top level dependencies. Use requirements.txt as your "lock" file. But even then you won't get reproducible builds across different OSes, or with different non-Python things installed on your machines. Use Docker images to guarantee staging and production will be identical.
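
A minimal sketch of that split (names illustrative, and assuming a stub setup.py or pyproject.toml alongside so pip can install the project):

    $ cat setup.cfg
    [metadata]
    name = myapp
    [options]
    install_requires =
        flask>=1.1
        celery>=4.3,<4.4
    $ pip install -e .                   # install the loose top-level specs
    $ pip freeze > requirements.txt      # write exact pins to use as the "lock" file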


There is one! Pip supports, since a few years back, a constraints file (i.e. a lock file) that does just this.

This is a nice guide on how to use it: https://www.lutro.me/posts/better-pip-dependency-management


This isn't quite the same. Yes, I can update my root dependencies in the `requirements.txt` file, run `pip install -r requirements.txt` and then run `pip freeze > requirements.txt`, but that's convoluted and requires me to know exactly what my root dependencies are. Is `astroid` something our tools use directly, or is it just a dependency of `pylint`? It's not clear. A lockfile clears this up.


Yes, and in addition to the requirements file, pip supports a constraints file, which is the lockfile you describe. It's separate from the requirements file. It solves exactly this problem.

Docs: https://pip.pypa.io/en/stable/user_guide/#constraints-files


The docs for this mention:

> Including a package in a constraints file does not trigger installation of the package.

Maybe I'm not following something but how do you get all of this to work like a lock file in other package managers?

Let's use Ruby as a working example:

1. You start a new project and you have a Gemfile.

2. This Gemfile is where you define your top level dependencies very much like a requirements.txt file. You can choose to version lock these dependencies if you'd like (it's a best practice), but that's optional.

3. You run `bundle install`

4. All of your dependencies get resolved and installed

5. A new Gemfile.lock file was created automatically for you. This is machine generated and contains a list of all dependencies (top level and every dependency of every dependency) along with locking them to their exact patch versions at the point of running step 3.

6. The next time you run `bundle install` it will detect that a Gemfile.lock file exists and use that to figure out what to install

7. If you change your Gemfile and run `bundle install` again, a new Gemfile.lock will be generated

8. You commit both the Gemfile and Gemfile.lock to version control and git push it up

At this point you're safe. If another developer clones your repo or CI runs today or 3 months from now everyone will get the same exact versions of everything you had at the time of pushing it.


It should be the same process, except that the constraints file is not automatically created or detected, so step 5 would be "pip freeze > constraints.txt" and step 6 would be "pip install -r requirements.txt -c constraints.txt".

The top level dependencies go in requirements.txt and trigger installation of those packages. Everything else goes in the constraints file, which constrains the version that will be installed if something triggers an installation of the package, but it doesn't by itself trigger the installation - it only locks/constrains the versions.
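
As a concrete sketch (package names real, the exact versions illustrative):

    $ cat requirements.txt    # top-level deps - these trigger installation
    flask
    celery
    $ cat constraints.txt     # output of pip freeze - these only pin versions
    celery==4.3.0
    flask==1.1.2
    kombu==4.6.11
    vine==1.3.0
    # ...and every other transitive dependency, pinned
    $ pip install -r requirements.txt -c constraints.txt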


Wouldn't you also need to run a pip3 install -r requirements.txt before the pip freeze?

Otherwise pip freeze won't find any dependencies.

So you end up having to run something like this:

    pip3 install -r requirements.txt
    pip3 freeze > requirements-lock.txt
    pip3 install -r requirements.txt -c requirements-lock.txt
Mainly because you can't run pip3 install -c requirements-lock.txt on its own it seems. It requires the -r flag.

That is a lot more inconvenient than running `bundle install` and if you use Docker it gets a lot more tricky because a new lock file would get generated on every build which kind of defeats the purpose of it, because ideally you'd want to use the existing lock file in version control, not generate a new one every time you build your image.
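
One way to keep that property with Docker (a sketch, assuming the lock/constraints file is committed to version control rather than regenerated at build time):

    # Dockerfile fragment - requirements-lock.txt comes from version control,
    # so every image build installs exactly the same versions
    COPY requirements.txt requirements-lock.txt ./
    RUN pip3 install -r requirements.txt -c requirements-lock.txt

The freeze step then only happens on a developer machine when dependencies are deliberately changed, and the resulting file gets committed like a Gemfile.lock.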


Nice! I’ll be adding this to my virtualenv + requirements.txt + pip process. Not sure why everyone wants to overcomplicate Python dependency management with pyenv/poetry/etc.


How do you validate versions against the constraint file? I read your original link a few times and didn’t see it.


pip install -r requirements.txt -c constraints.txt

You can also, thanks to the weird way requirements.txt works, put the line "-c constraints.txt" in requirements.txt. In that case you don't have to specify it when you run pip.

That should apply the constraints when installing packages. I don't know if there's also a way to validate what's already installed.
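
As a sketch of that second option (illustrative package names):

    # requirements.txt
    -c constraints.txt    # apply the pins from the frozen constraints file
    flask
    celery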


I'm not sure what you mean by, "How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?"

We use requirements.txt + Docker/k8s to lock in the OS. All of the versions of python modules are defined like:

   six==1.11.0
   sqlalchemy==1.2.7
   squarify==0.3.0
Which locks them to a particular version.

What type of dependencies aren't covered by this? (I genuinely am a novice here, so would love to be informed where this runs into problems.)


> What type of dependencies aren't covered by this? (I genuinely am a novice here, so would love to be informed where this runs into problems.)

The dependencies of the dependencies of what you have listed aren't guaranteed to be locked.

For example, let's say for argument's sake you were using celery.

When you install celery, by default these deps will also be installed: https://github.com/celery/celery/blob/master/requirements/de...

Those aren't very locked down. If you install celery today you might get vine 5.0.0 but in X months from now you might get 5.9.4 which could have backwards compatibility issues with what celery expects.

So now you build your app today and everything works but X months from now you build your same app with the same celery version and things break because celery isn't compatible with that version of vine.

This happened a few months ago. Celery 4.3.0 didn't version lock vine at all and suddenly all celery 4.3.0 versions broke when they worked in the past. That was tracked at https://github.com/celery/celery/issues/3547.

Docker doesn't help you here either because your workflow might be something like this:

- Dev works locally and everything builds nicely when you docker-compose build

- Dev pushes to CI

- CI builds new image based on your requirements.txt

- CI runs tests and probably passes

- PR gets merged into master

- CI kicks in again and builds + tests + pushes the built image to a Docker registry if all is good

- Your apps use this built image

But there's no guarantee what you built in dev ends up in prod. Newer versions of certain deps could have been built in CI. Especially if a PR has been lingering for days before it gets merged.

A lock file prevents this, because if a lock file is present it gets used: if you built and committed a lock file to version control, CI will build exactly what you pushed from dev, and the chain from dev to prod is complete for guaranteeing the versions you want. That is how it works with Ruby, Elixir, Node and other languages too. They have 2 files (a regular file where you put your top level deps and a machine generated lock file). A lock file in Python's world would roughly translate to what pip3 freeze returns.


Thanks very much - your description of our workflow is really good (and is pretty close to exactly what we have!)

I don't understand what dependencies everyone keeps talking about (which seems to be a big deal with Poetry) - when you run: pip freeze

It captures every single Python module, dependencies as well, because everything in the requirements file is listed as aaaaaaa==x.y.z.

So you are guaranteed to have the exact same version.

We have all sorts of turf wars when someone wants to roll forward the version of a module, and, in the case of the big ones (Pandas) we sometimes hold off for 6-9 months before rolling it forward.

But there is something that Poetry is doing that is better than "pip freeze" - I think once I figure that out, I'll have an "aha" moment and start evangelizing it. I just haven't got there yet.


You need a constraints file

https://pip.pypa.io/en/stable/user_guide/#constraints-files

pip install celery==4.3.0 -c constraints.txt

Where constraints.txt defines exact versions for everything


Why not keep the same docker image throughout the lifecycle? E.g. merge to dev branch, trigger CI (build image at this point), maybe deploy to a test environment, run more tests, then deploy to prod. No chance of packages changing since the image isn't rebuilt. Of course, if not using docker, a lock file (i.e. actual dependency resolution) would seem essential for reproducibility.


First, how did you generate that requirements file?

Second, how do you separate dev dependencies from prod dependencies, and how do you update a dependency and ensure all of its transitive dependencies are resolved appropriately?


    pip freeze > requirements.txt
Lists every Python module that's been installed into the virtual environment. So, from my (admittedly new) understanding, that means we guarantee that in the production/devel/docker environment, every Python module will be identical to whatever was installed in the virtual env.

Dependencies and transitive dependencies are guaranteed to be resolved/ensured because we list every one of them out in the requirements.txt file.


Yes, this works as you expect. What it lacks are three big things: making it easy to see what your direct dependencies are, separating dev dependencies from prod dependencies, and an easy way to update a dependency while resolving all transitive dependencies.

There are other shortcomings, but those are the big ones.


If you have a working setup and especially a strategy for updating routinely (e.g. using pip-tools) and pinning dependencies with hashes, there’s less reason to change. The biggest reason I’d give is consistency and ease of adoption, which might be coming down to “do you spend much time on-boarding new developers?” or “are you supporting an open source community where this causes friction?”. If you aren’t spending much time on it, perhaps try it on new projects to see what people think in a low-friction situation - in my experience that’s basically been “I like spending less time on tool support”.


They either haven't experienced the pain or are oblivious to it. Pip's old resolver is borked[0]:

> [The new resolver] will reduce inconsistency: it will no longer install a combination of packages that is mutually inconsistent. At the moment, it is possible for pip to install a package which does not satisfy the declared requirements of another installed package. For example, right now, pip install "six<1.12" "virtualenv==20.0.2" does the wrong thing, “successfully” installing six==1.11, even though virtualenv==20.0.2 requires six>=1.12.0,<2 (defined here). The new resolver would, instead, outright reject installing anything if it got that input.

[0]: https://pyfound.blogspot.com/2020/03/new-pip-resolver-to-rol...


I haven’t found a compelling need to switch to Poetry, but independent experimentation and competition can be good thing. I wouldn’t be surprised if the new pip resolver was partly inspired by Poetry, similar to how some NPM improvements were motivated by Yarn. [1]

[1]: https://javascript.christmas/2019/10


That would probably be caught by the CI, though, wouldn't it? One of the packages would fail to install in a clean venv and the tests would fail.


No, it won't be caught in CI. The old version of pip didn't recognize that as an error condition.


Not if you used the old version of pip. That is hopefully fixed now with the new resolver.


Hm, you're right. I misread it as saying that it would 1) install six, 2) fail to install virtualenv and 3) fail to report an error.

But the old resolver actually installs both packages, even though it should just abort.


I was one of those setup.py + requirements.txt (generated by pip-compile) people.

Though, poetry is actually quite good. There are still some things that I wish it had, like plugin support (for example I really miss setuptools_scm) or being able to use it for C packages.

But if your code is pure python it is great from my experience. The dependency resolver is especially good.


Beware of zealot geeks bearing gifts. If your environment is currently working fine and you are only interested in running one version of Python (and perhaps experimenting with a later one), then venv + pip is all you need, with some wrapper scripts, as you say, to make it ergonomic (to set the PYTHON* environment variables for your project, for example).


The main reason I use Poetry is for its lockfile support. I've written about why lockfiles are good here: https://myers.io/2019/01/13/what-is-the-purpose-of-a-lock-fi...


Pip supports constraints files. https://pip.pypa.io/en/stable/user_guide/#constraints-files :

> Constraints files are requirements files that only control which version of a requirement is installed, not whether it is installed or not. Their syntax and contents is nearly identical to Requirements Files. There is one key difference: Including a package in a constraints file does not trigger installation of the package.

> Use a constraints file like so:

  python -m pip install -c constraints.txt


What does that command do? Does it install dependencies, or simply verify the versions of the installed dependencies?


It just gives you an error:

  ERROR: You must give at least one requirement to install (see "pip help install")
So it seems like a strange choice of usage example. You have to provide both requirements and constraints for it to do anything useful (applying the version constraints to the requirements and their dependencies).


What does this do/have over files generated by "pip freeze" ?


poetry has several advantages which I can no longer live without.

1. Packages are downloaded in parallel. This means dramatically quicker dependency resolution and download times.

2. Packages can be separated for development versus production environments.

3. Poetry only pins the packages you actually care about, unlike a pip freeze. One application I work on has 15 dependencies, which yields a little over 115 downloaded packages. A pip freeze makes it impossible to tell which are my actual dependencies, whereas poetry tracks them directly - and the non-pinned packages are captured in the poetry.lock file.

The rest is nice, but the above is essential.
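
As an illustrative sketch (names and version bounds made up), pyproject.toml only lists what you declared, split into main and dev groups, while poetry.lock pins everything else:

    # pyproject.toml (fragment)
    [tool.poetry.dependencies]
    python = "^3.8"
    requests = "^2.24"
    celery = "^4.4"

    [tool.poetry.dev-dependencies]
    pytest = "^6.1"

`poetry install` then generates and consumes poetry.lock, which pins every transitive package; you commit it but never edit it by hand, and `poetry install --no-dev` skips the dev group in production.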


Hang about. Poetry is SLOW because it does not operate in parallel. It’s the one thing holding me back from a large project.

Unless this has recently changed?


I think this changed a month or two ago. It installs each "level" of the dependency tree in parallel.


Poetry 1.1 introduced parallel downloads. Even before that, it was an order of magnitude faster than its competitor Pipenv.


I was happy with Pip until I spent time in the NPM/Yarn world. Frustrated with Pip, I switched half of our projects to Pipenv. However, I found that it struggled to resolve dependencies. Poetry works like a dream now, and life is so much easier now that we have switched all our projects to it.

The methodology of specifying your core dependencies, but also having locked versions of your dependencies' dependencies, works really well.

AND you can easily export to requirements.txt if you prefer to use that in production.
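
For reference, the export mentioned above is roughly this (flags vary a bit between Poetry versions):

    poetry export -f requirements.txt --output requirements.txt
    # add --dev to also include the development dependencies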


What does NPM/yarn do better than virtualenv/pip?

“Added 17000332 packages (including is-even) 875 vulnerabilities found, have fun with that info. Yours truly, NPM”


While I agree the situation is ridiculous, what prevents anyone from doing the same in Python?

I can publish is-even on PyPI if I want, is that Pip's fault?


It's a difference in the community's engineering values. JavaScript devs pull in a dependency for virtually everything, whereas Python distributes an extensive standard library with the language. It's less important that the same thing is hypothetically possible in both communities and more important that specific communities have chosen to use similar toolkits differently.


> I occasionally ask our principal / sr. Python engineers about this, and their response is always, "These things come and go, virtualenv/wrappers + pip + requirements.txt works fine - no need to look at anything else."

At the macro level, this seems like a bit of a self-fulfilling prophecy: if all the senior and principal engineers using Python don't care to take a look at something else, then it's not too surprising that new solutions don't end up sticking around. That isn't to say that there necessarily needs to be a change in the Python community's choice of package manager, but the rationale for not even considering other options doesn't seem super compelling.


Or the new thing needs a really compelling case


I personally would rather avoid any per-language package managers and opt for distro-level package management (rpm/deb). That's something that has been there pretty much forever & has all the hard & dirty dep-solving issues long solved.

Also, I can use it for all software regardless of what language it's written in, not to mention avoiding having to learn multiple half-baked package managers, one per language!

And lastly, any non-trivial software project will need dependencies outside of the world of its own package manager anyway, so why not go all in properly with rpm/deb & make everything easier for your users & your future self.


python setup.py bdist_rpm is also useful (although it's quite naive, doesn't always do the right thing and you'd usually want to add dependencies).


> Is there a reason to look at poetry if you've got the pip/virtualenv combination working fine?

You probably have a bunch of scripts that do what poetry does (either that, or you repeat the same commands over and over A LOT).

Switching to poetry might have some initial overhead, but a big upside is that you stop using custom, internal tooling, and use something industry-standard. Importantly, it makes it easier for you to understand external projects (since you're familiar with the standard tooling), and faster to onboard newcomers.


I’m curious how they ensure every developer is using the same versions of things, as well as how they manage dev dependencies and transitive dependencies.

pip + venv + requirements.txt doesn’t solve this out of the box while most languages have common tools that do. Either they’ve rolled their own way to manage these things, or they’re rolling the dice every time they deploy.


We have the following:

   $ wc -l requirements.txt
   118 requirements.txt
And every module in it is locked to a particular version:

   alembic==0.8.8
   amqp==2.2.2
   anyjson==0.3.3
   azure-storage==0.36.0
   backports.shutil-get-terminal-size==1.0.0
   billiard==3.5.0.2
etc...

I don't really understand the "dependencies" thing (or the difference between dev dependencies and transitive dependencies) - we literally list every single module in our environment, and its version. It's not clear to me what other dependencies there could be in a Python development environment.

I do note we have three requirements.txt files, a requirements.txt, requirements-test.txt, and a requirements-dev.txt. So, presumably there is a need for different requirements that you've identified that I don't understand. So there's that.


Dev dependencies: a library you need during development, but that isn’t needed in production. I think your -test and -dev are this, but it’s not clear how you are maintaining all of these, and building for prod.

This is the main complaint: most modern languages have a standard set of tools and flows for achieving this. Python doesn't, everyone does it a bit differently, and when starting a new project you have to hand-roll your own flow.

Or you use something like Poetry, but the Python community as a whole doesn't have a commonly used solution.


You can pin versions in requirements.txt, and have a separate requirements-dev.txt.
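
For example (a common pattern; versions illustrative), the dev file can simply include the prod file:

    # requirements-dev.txt
    -r requirements.txt       # everything production needs...
    pytest==6.1.2             # ...plus dev-only tools
    black==20.8b1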


You're losing the ability to easily update using fuzzy specs and a lockfile if you're using vanilla pip without pip-tools or poetry.


To clarify further - lockfile == reproducible builds without having to pip freeze every single dependency in the tree.

Fuzzy specs == effortless upgrades according to your risk tolerance for a given library (major version for boto3, minor version for pandas, something like that).

Poetry gets you the combination of the two: Let your dep versions float, and easily revert back to a previous deterministic build using the version-controlled lockfile if something breaks.
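
A rough sketch of what those fuzzy specs look like in a Poetry pyproject.toml (bounds chosen to match the example above, so purely illustrative):

    [tool.poetry.dependencies]
    boto3 = "^1.0"     # anything >=1.0,<2.0 - picks up new minor/patch releases
    pandas = "~1.1"    # anything >=1.1,<1.2 - picks up patch releases only

`poetry lock` resolves those ranges to exact versions in poetry.lock, and `poetry update pandas` moves just that pin within the allowed range.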


I'm wondering why we wouldn't want to pip freeze every single dependency in the tree. I'm looking at our requirements.txt, and everything is of the form:

   vine==1.1.4
   urllib3==1.25.10
   wcwidth==0.1.7
I know that changing the versions of any of the underlying libraries is always a big conversation. (We're still locked in on pandas==0.23.4)

So, if I understand correctly, with Poetry, we might be able to say, "Keep Pandas at 0.23.4 and sqlalchemy at 1.2.7 but figure out all the other dependencies for us and load them."

Or, even better, "Keep Pandas at 0.23.x and sqlalchemy at 1.x.x but figure out all the other dependencies for us and load them."

The advantage here is security patches in underlying libraries come for free, while we focus on porting code for the really high-level + important Libraries which aren't always backwards compatible (Pandas)

Also - if we want to stick with specific versions, that's also possible with the lockfile - so every library will be exactly the same as the one in a build that worked.

The thing I don't understand is when I do:

   pip install pandas==0.23.4
It does load the dependencies. Indeed, if I create a requirements.txt that just has:

   pandas==1.0.3
   pytz==2020.1
   six==1.14.0
Then pip install -r requirements.txt goes and does:

   $ pip install -r requirements.txt
   Collecting pandas==1.0.3
     Using cached pandas-1.0.3-cp38-cp38-manylinux1_x86_64.whl (10.0 MB)
   Collecting pytz==2020.1
     Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB)
   Collecting six==1.14.0
     Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
   Collecting python-dateutil>=2.6.1
     Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
   Collecting numpy>=1.13.3
     Using cached numpy-1.19.4-cp38-cp38-manylinux2010_x86_64.whl (14.5 MB)
   Installing collected packages: six, python-dateutil, pytz, numpy, pandas
   Successfully installed numpy-1.19.4 pandas-1.0.3 python-dateutil-2.8.1 pytz-2020.1 six-1.14.0
So I'm still at a loss about the advantage of poetry vs pip install, given that pip loads dependencies as well - the advantage of "fuzzy specs" seems minimal given it's such a big deal to upgrade the big packages.


Nothing locks the version of numpy that you got there. If you run the same thing again in a few weeks you might get a completely different version, and have no way to revert to the version you had before.


setup.cfg vs requirements.txt offers this.


Poetry is by no means the only option for this.

Lots of people like pip-tools, it would feel a lot more lightweight and closer to pip than Poetry does.

Pipenv exists but... steer clear for a multitude of reasons.

Personally I like that Poetry centers itself around the pyproject.toml standard. I also think that its usability and the enthusiasm of both the maintainers and the users is going to really carry it into the Python mainstream in the coming years.


Personally I'm very disappointed that we keep inventing new standards like pyproject.toml when other things, like setup.cfg, have existed for an extended period of time, work well, and are supported for reading and writing by the stdlib.


I see pyproject.toml as more like "tox.ini is nice, and it's good that so many tools use it, but it really has nothing to do with tox", bringing those tools (and hopefully the ones still avoiding tox.ini too) and setup.cfg together into one file.


Pip supports this now. It calls them constraints files: https://pip.pypa.io/en/latest/user_guide/#constraints-files


[flagged]


This comment breaks the site guidelines—would you mind reviewing them and sticking to the rules when posting here? Note this one:

"Please don't post insinuations about astroturfing, shilling, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data."

https://news.ycombinator.com/newsguidelines.html

Here's plenty of past explanation of why we have this rule: https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme.... The short version is that internet users are too quick to reach for "astroturfing" etc. as an explanation for what they don't like, and it poisons the ecosystem when people post these accusations without evidence. Someone else having different views—different tastes in package managers, for example—does not clear the bar for evidence.


> The commenter pushing for poetry must have a stake in it

One benefit of open source projects is how easy it is to discount your comment


Notoriety is still a stake; people routinely get job offers for being big contributors to PyTorch.


I assume you are referring to me? I can assure you, I have absolutely zero stake in Poetry. I'm not one of the developers, I'm not even one of the contributors (excluding opening a bug report or two).



