Multiversion Python Thoughts (pocoo.org)
89 points by divbzero 9 days ago | 85 comments





Should this be facilitated? Should this work really be done? I’m thinking not.

If I have a special case and I need to do this today, I’m not blocked from doing so (I’ll vendor one of the dependencies and change the name). It’s certainly a pain, especially if I have to go change imports in a C module as part of the dep, but it’s achievable and not a blocker for a source-available dependency.

However, if this becomes easily possible, well why shouldn’t I use it?

The net result is MORE complexity in Python packaging. More overhead for infra tools to accommodate.


Have you never found yourself in a dependency resolution situation where there is no solution that satisfies all the requirements? You need multiple versions of the same package in such cases.

The alternative is just cheating: ignore the requirements, install the packages, and hope for the best. Alas, hope is not a process.


No, the alternative is to just fix the code so that the conflict is removed. This is why open source is so powerful: you are empowered to fix stuff instead of piling up a tower of workarounds.

You can fix your code, but indirect dependencies (you use A and B, both depend on C, but they require different versions of C) cannot be handled well.

In C, C++, maybe Java, you would at least be able to link A and B with their own private copies of C to avoid conflicts reliably with standard mechanisms rather than unreliably with clever magical tools.


You missed the point. If A, B and C are all open source you can actually fix them too. Just send a PR. Most projects are open to changes that keep compatibility moving forward.

Just send a PR? What are you talking about? A project that requires an old version of a dependency either has technically valid reasons, and is unlikely to be upgraded just because one more user asks nicely, or is maintained at a slow pace and/or with a low effort level, so that even if you do the work your patch is likely to be ignored (at least temporarily).

The usual way is to put conflicting versions in optional-dependencies, and then build one target for each conflicting set of deps. That'd work fine if the code path of one target doesn't touch the others, which is often the case.

You'd obviously need to have tests for both targets, possibly using a flexible test runner like `nox` to set up a separate test env for each target.
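For illustration, a rough noxfile sketch of that setup, assuming two hypothetical extras ("olddep" and "newdep") defined in pyproject.toml, each pinning one side of the conflict:

    # noxfile.py - each session gets its own virtualenv, so the conflicting pins never meet
    import nox

    @nox.session
    @nox.parametrize("extra", ["olddep", "newdep"])
    def tests(session, extra):
        session.install(f".[{extra}]")   # install the project plus one set of pins
        session.run("pytest")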


You're viewing this from the perspective of a dependency resolution engine. From the perspective of a software engineer, the solution is to find better-behaved dependencies.

I've never found myself in a situation like this that couldn't be solved by being thoughtful and taking the time to improve my code.

These issues crop up in dependencies more than your code. Then you have to vendor one of the deps and edit it (and hopefully ship that back upstream and hope the maintainer will merge it).

I think it could be very beneficial for enabling backwards compatibility:

    import version1
    import version2

    try:
      version2.load_data(input)
    except ValidationError:
      version1.load_data(input)


 I'm sure people will abuse it, but the idea doesn't seem terrible on the face of it to me.

Could you not do that today?

    import version2

    try:
      version2.load_data(input)
    except ValidationError:
      import version1
      version1.load_data(input)
(Assuming the version is part of the import name, as in this example: version2, lib2, etc.)

The pseudocode in GP fails to capture the idea. Right now, two different versions of the same third-party library would ordinarily have the same `import` name in the code. Even if you somehow hacked Pip or otherwise managed to install multiple versions of the library side-by-side in the same Python environment, there would be no way, at the level of Python syntax, to specify which one to use. The default import machinery would simply choose a) whatever's cached in `sys.modules`; b) failing that, whatever is found first by the `sys.path` search process. There are many hooks provided to customize this process, but there would be no way to specify the version to use in the `import` syntax, aside from using separate names.
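For illustration, the caching half of that lookup is easy to observe with any module (using the stdlib `json` here only because it's always available):

    import sys

    import json                          # first import: resolved by the sys.path search
    print(sys.modules["json"] is json)   # True - the module object is now cached

    import json as json_again            # any later import just returns the cached object
    print(json_again is json)            # True - there is no syntax to ask for "another version"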

Of course, you can change the import name between versions. That's one of the upsides of not tying the import name to the distribution name, and many real-world projects actually do this as part of a deprecation cycle (for example, `imageio` has been doing it with recent versions, offering "v2" and "v3" APIs). But in the general case, you'd have to change it with every version (since your transitive dependencies might want different minor/patch versions for some obscure reason - semver is only an ideal, after all), which in turn means your users would always have to pin their dependency to the exact version described by the code.


Only if the library is renamed for each new version, which seems a bit impractical.

I guess that’s my central thesis against the need for this work - it’s not impractical to rename an import.

With a tool like rope, I feel fairly confident I can refactor a source-available, pure-Python dependency pretty quickly.

Where I get less comfortable with my idea is that not every dependency has source available (e.g. the db2 database driver).

Another case is deps that have source available but whose Python modules are implemented in C/C++/Rust - e.g. scipy.


This assumes that you will get an actual error thrown and not merely incorrect behavior.

I feel like most of these problems just disappear if people would follow the same naming scheme sqlite3 uses. Just put the major version in the name and most tools just work out of the box with multiple versions.

That’s why I called the library “jinja2”. It was a new major version. But people over the years really did not like it and it did not catch on much.

Once more, the wisdom of "Explicit is better than implicit" shines.

Instead, we jump through hoops with our hair on fire to manage complexity.

People.


Agree. The issue was breaking backwards compatibility by releasing a new major version under the same name.

Really, people should just update their software to use the newer libraries and fix whatever breaks. If you want to use functions from version 2, you should port the rest of the code to version 2.

Of course this comes with the _major_ caveat that the interface and behaviour[1] of the dependency must be absolutely stable for the entirety of the "major version".

I'm an advocate for this style of library/dependency development, unfortunately in my experience the average dependency doesn't have the discipline to pull it off.

[1] https://www.hyrumslaw.com


I don't want to "pull this off", nor do I want to expend the time/energy to do so. We're not Microsoft with effectively infinite budgets, nor are we Elasticsearch/Grafana Orgs with metric oodles of developer-hours, and 50k+ github star mindshares-worth of evangelists behind us to document and tutorialize every tiny little feature in a cookbook or doc website.

Code changes, and it's kinda silly to expect interfaces to be locked in place as that'll stifle development for even small-ish features. Does that mean every minor version or commit will change fundamental or large parts of the codebase? Probably not, but it's a sliding scale and people seriously need to find something better to do than writing Yet Another Python Package Manager.

I use the term "we" loosely here ofc in the context of this mini-rant.


If you are breaking the API it isn't that hard to add a new one instead of butchering the old one.

If it really can't be done then you aren't really shipping a library that is meant to be depended on.


I think this works ok if your library is something like Django or Pandas that people are building their project around. But it makes things exponentially more complex for libraries like pyarrow or fsspec that are normally subdependencies.

Imagine trying to do things like import pyarrow14, then if that failed try pyarrow13, etc. Additionally, Python doesn't have a good way of saying "I need one of the following different libraries as a dependency".
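To make that concrete, the fallback dance would look something like this (pyarrow13/pyarrow14 are hypothetical import names, no such distributions exist today):

    try:
        import pyarrow14 as pyarrow
    except ImportError:
        try:
            import pyarrow13 as pyarrow
        except ImportError as exc:
            raise ImportError("need pyarrow 13 or 14") from exc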


I like the way python handled this situation - python3 for when you care, python for when you don't.

This is Rich Hickey's suggestion too

https://www.youtube.com/watch?v=oyLBGkS5ICk


sqlite3 was released 20 years ago, in 2004. I'm not convinced all code written in 2004 which used sqlite3 would still work today against the latest version.

If stuff was removed, then I would have expected them to bump the ABI? As they haven't done that, I would actually assume it would work absent evidence to the contrary?

This comes up every now and again, and there are two fairly simple examples that I think show the complexities:

One:

Library A takes a callback function, and catches “request.HttpError” when invoking that callback.

The callback throws an exception from a differing version of “request”, which is missing an attribute that the exception handler requires.

What happens? How?

Two:

Library A has a function that returns a “request.Response” object.

Library B has a function that accepts a “request.Response” object, and performs “isinstance”/type equality on the object.

Library A and library B have differing and incompatible dependencies on “request”.

What version of the request object is sent to library B from library A, and how does “isinstance”/“type” interact with it?

Both of these revolve around class identities. In Python they are intrinsically linked to the defining module. Either you break this invariant and have two incompatible/different types have the same identity and introduce all kinds of bugs, or you don’t and also introduce all kinds of bugs - “yes, this is a request.Response object, but this method doesn’t exist on this request.Response object”, or “yes this is someone’s request.Response object, but it’s not your request.Response object”
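A self-contained sketch of the identity problem: load the same module source twice (standing in for the two installed versions of "request") and watch isinstance reject the "same" class:

    import importlib.util
    import pathlib
    import tempfile

    # a stand-in for the third-party "request" module's source
    path = pathlib.Path(tempfile.mkdtemp()) / "request.py"
    path.write_text("class Response:\n    pass\n")

    def load_as(name):
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module

    request_a = load_as("request_a")   # the copy library A would see
    request_b = load_as("request_b")   # the copy library B would see

    obj = request_a.Response()                   # "library A" builds a Response...
    print(isinstance(obj, request_b.Response))   # False - "library B" doesn't recognise it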

Getting different module imports to succeed is more than possible, getting them to work together is another thing entirely.

One solution to this is the concept of visibility, which in Python is famously “not really a thing”. It’s safe to use incompatible versions of a library as long as the types are not visible - i.e. no method returns a request.Response object, so the module is essentially a completely private implementation detail. This is how Rust handles this, I think.

However, this is obviously fucked by exceptions, so it seems pretty intractable.


That is not any different in Python than it is in Rust, Go or JavaScript. Yes: it's a problem if those different dependency versions shine through via public APIs. However there are plenty of cases where the dependency you are using in a specific version is isolated within your own internal API.

I think if Python were to want to go down this path it should be isolated to explicit migration cases for specific libraries that want to opt themselves into multi-version resolution. I think it would enable the move of pretty core libraries in the ecosystem in backwards incompatible ways in a much smoother way than it is today.


The problem with this is exceptions: they easily allow dependencies to escape, be that via uncaught exceptions or wrapped ones.

Go and JavaScript have type systems and idioms far more amenable to this kind of thing (interfaces for Go, no real type identity + reliance on structural typing for JS) and rely a lot less on the kind of reflection common in Python (identity, class, etc).

I guess there are some use cases for this, I just feel that the lack of ability to enforce visibility combined with the “rock and a hard place” identity trade-off limits the practical usefulness.


> The problem with this is exceptions: they easily allow dependencies to escape, be that via uncaught exceptions or wrapped ones.

Sure, but that just means your dependency was not really internal. Errors are API too.


Exceptions are no different from Go's error types (and in general interface types in any language) from this point of view. If moduleA is doing something like `errors.Is(err, ModuleBError)` on an error that was returned from moduleC which uses a different version of moduleB, you'll get the same issue.

That’s interesting - is it common to do this instead of casting to an interface?

It seems a lot more impactful with Python due to type equality being core to how exceptions are handled, even if there are similarities.


Well, the most common is of course `if err != nil`, which is unaffected. But on the very rare occasions that someone is actually handling errors in Go, `errors.Is` and `errors.As` are recommended over plain type assertions since they correctly handle wrapped errors.

Say a function returns `fmt.Errorf("Error while doing intermediate operation: %w", lowerLevelErr)`, where `lowerLevelErr` is `ModuleBError`. Then, if you do `if _, ok := err.(ModuleBError); ok {...}`, `ok` will be false; but if you do `if errors.Is(err, ModuleBError)`, you will get the expected true.

Regardless, the core problem would be the same: if your code can handle moduleB v1.5 errors but it's receiving moduleB v1.7 errors, then it may not be able to handle them. This same thing happens with error values, Exceptions, and in fact any other case of two different implementations returned under the same interface.

You even have this problem with C-style integer error codes: say in version 1.5, whenever you try to open a path that is not recognized, you return the int 404. But in 1.7, you return 404 for a missing file, but 407 if it's a missing dir. Any code that is checking for err > 0 will keep working exactly as well, but code which was checking for code 404 to fix missing dir paths is now broken, even though the types are all exactly the same.


I think the issue is more that in Python you could get confusing runtime failures. In Rust, it will fail to compile if you're trying to mix two different major versions of a dependency like that. I'm fine with the latter, but the former is unacceptable.

It is still painful in Rust. I remember at one stage in a project having a bunch of helper functions to convert structs from one version of nalgebra to another, as my game engine (Amethyst at the time, I think?) used one version and the ncollide library used another, and both exposed nalgebra in their public interfaces.

You can have multiple incompatible dependency versions imported by the one crate; you have to give them different names when declaring them, but it works fine (I just tested it).

It follows the approach of "objects from one version of the library are not compatible with objects from the other version" mentioned above, and results in a compile time error (a potentially confusing type error, although the error message might call out that there are multiple library versions involved).


You can have private and public dependencies. Private dependencies are the ones that don't show up on your interface at all. That is you don't return it, you don't throw it or catch it (other than passing through), you don't take callbacks that have it in their signature, etc... You can use private dependencies for the implementation.

It should be safe to use multiple versions of the same library, as long as they are used as private dependencies of unrelated dependencies. It would require some tooling support to do it safely:

1. Being able to declare dependencies as "private" or "public".

2. Tooling to check that you don't use private dependencies in your interfaces. This requires type annotations to gain some confidence, but even then, exceptions are a problem that is hard to check for (in Python that is).

In compiled languages there are additional complications, like exported symbols. It is solvable in some controlled circumstances, but it's best to just not have this problem.
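As a rough illustration of that private/public split (using `requests` as the stand-in private dependency):

    import requests   # intended to stay a private implementation detail

    def fetch_json(url: str) -> dict:
        # fine: requests never appears in the function's interface
        return requests.get(url, timeout=10).json()

    def fetch_response(url: str) -> requests.Response:
        # leak: requests.Response is now part of the public API, so callers
        # are coupled to this particular version of requests
        return requests.get(url, timeout=10)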


> you don't throw it or catch it

Herein lies the issue: in this context exceptions can be thought of as the same as returns. So you actually need to catch/handle all possible exceptions in order to not leak private types.

Also what does “except requests.HttpError” do in an outer context? It checks the class of an exception - so either it doesn’t catch some other modules version of requests.HttpError (confusion, invariants broken) or it does (confusion, invariants broken).
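A tiny sketch of that failure mode, with two stand-in classes playing the role of the two installed copies of "requests.HttpError":

    class HttpErrorV1(Exception):   # the copy the outer code was written against
        pass

    class HttpErrorV2(Exception):   # the copy the callback's version raises
        pass

    try:
        raise HttpErrorV2("boom")
    except HttpErrorV1:
        print("caught")                          # never happens: different class identities
    except Exception as exc:
        print("leaked:", type(exc).__name__)     # the handler the author wrote misses it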


It's fine as long as you catch all exceptions, and only produce ones that you document. Your users aren't supposed to know that you used `requests` at all.

Sure, but who does this? And the typical pattern is to wrap exceptions, giving you access to the inner exception if you need more context.

The requests HTTP exception contains the request and response object. Wrapping that would be a huge pain and a lot of code.


I have done the following in the past:

1. pip install libfoo==1.x.x

2. pip install libfoo==2.x.x --target ~/libs/libfoo_v2 # vendor libfoo v2

3.

    import os
    import sys

    import libfoo                                    # v1, from the normal install

    original_sys_path = sys.path.copy()
    sys.path.insert(0, os.path.expanduser('~/libs/libfoo_v2'))
    libfoo_v1 = sys.modules.pop('libfoo')            # otherwise the cached v1 would just be returned again

    import libfoo as libfoo_v2                       # resolved from the vendored copy

    sys.modules['libfoo'] = libfoo_v1                # restore the cache
    sys.path = original_sys_path

There are caveats of course, but it works for simple cases.


Another commenter here mentioned "more overhead for infra tools to accommodate." I agree.

I am using python as my main language these days, coming from JS and C++ and a bit of rust. The biggest problem I face is that the tools (editors, mainly) don't support the basic packaging tools.

I use venv for everything, but when I try to use something else, I almost always get bitten.

For example, asdf. I tried to use this tool. So awesome! It works great from the command line. But, when I try to use zed, it cannot figure out what to do, and I cannot find references in the zed github repository on the right way to set up pyproject.toml.

And, emacs. Will uv work within emacs? Each of these packaging tools (and I'm thinking about the long history of nvm (node version manager), brew, and everything else) makes different assumptions about the right way to modify the path variable, or create aliases, or use shims (the definition of which varies with each tool) and I'm sure I'm missing other details.

Does uv do the right thing mostly? I will say my experiences with python and the tooling has been more frustrating than the tools for JS. I use pnpm and it just works, and I understand the benefits. And, I can survive with just npm and yarn. But, to me, it is saying a lot that the python tooling feels more broken than JS. I mean, I lived through the webpack years, and I'm still using JS and have a generally favorable opinion of it.


Yes, uv generally does the right thing.

I hold the same opinion as you. Python packaging is awful. But uv managed to just make it work.


Could you elaborate on the issues you've been seeing in Zed? I've been using asdf as a version manager for Python for quite some time and haven't really had any issues.

You are using asdf and zed successfully? I'm so glad to hear this.

I removed asdf because I could not get it to recognize the pyproject.toml file. This is working in harmony for you?

With Zed and a venv (from python -m venv .venv for example), Zed properly recognized installed packages and provided type hinting and docs, but when I switched to asdf it did not seem to work. But, I was new to asdf and perhaps was using it incorrectly.

I was always assuming that when I'm in the command line, running asdf to use the right python works because the path is correctly established. But, when I run zed, it launches without the path setup step, and things went badly. I'm just speculating, but I could not get type hinting and didn't know how to fix it.


Sorry, I should have clarified: I have not used Zed but I've successfully used asdf (with Python) with many other IDEs and the challenges are usually the same.

> I was always assuming that when I'm in the command line, running asdf to use the right python works because the path is correctly established

In a typical setup asdf is installed by sourcing some shell script in your .bashrc (which will then add shims to your path). It might very well be that Zed didn't execute `python` in an interactive shell, so the shims weren't available. There are various solutions here but the easiest is probably to add the shims to your PATH yourself.

As for venvs, using asdf doesn't mean you should no longer use venvs since all projects using the same Python version (managed by asdf) will still share the same site-packages folder. In other words: I'd still recommend setting up a venv, e.g. through Poetry or uv/Rye. Besides, once .venv/bin/python symlinks the asdf Python shim, Zed might have an easier time finding the right binary, too.


uv and zed are very new—stick to mature tools and you’ll have a better time. Or forget about some editor integration in the meantime.

Does this not fall over in the circumstance of linking against a C library (not specifically a Python extension) as many Python libraries do?

For example, I write “library”, of which v1 depends on somelib.so.1.0.0 and v2 depends on somelib.so.2.0.0.

If somelib has clashing symbol names, this can cause real problems!


They don’t show up in the global symbol namespace usually, so it’s fine. It’s only an issue for some libraries that load globally so that one library can reference another C library.

Python dlopens binary wheels with RTLD_LOCAL on Linux, and I assume it does the equivalent on Windows.

There were issues relatively recently with -ffast-math binary wheels in Python packages, as some versions of gcc generate a global constructor with that option that messes with the floating point environment, affecting the whole process regardless of symbol namespaces. It's mostly just an insanity of this option and gcc behavior though.


Windows has a different design where the symbols are not merged, the .dll name is part of the resolution, so you can have 2 different .dll export the same symbol.

One way is to publish 2 variants of your packages: one with the major version number appended to the package name and one without the version number. Users who need to install multiple versions can use the first variant, while users who just want to follow the latest can use the second.

It is kind of the same problem as for shared libraries. In the GNU universe the cleanest solution is to have multiple soversions. Transferred to jinja it would be jinja.so.1 and jinja.so.2.

Maintaining fine grained symbol versioning is a pain and a massive amount of work for the package maintainer:

https://invisible-island.net/ncurses/ncurses-mapsyms.html

Honestly, multiple installed versions like jinja1 and jinja2 sounds best to me.


What is the difference between this and the approach taken by eggs/buildout (other than the time difference, and so some changes in APIs)? My impression (having never used buildout) was that handling multiple versions made debugging hard, and there were lots of random issues unless you did everything correctly (and that setuptools and its variants switched away from the options to use multi-versioning because it was effectively a foot-gun)?

One of the reasons I made pip-chill is to create an incentive to not bother with version numbers and just make your software always run against the latest versions. If something breaks, fix it. If it's too hard to do it, maybe you depend on too many things. Leftpad feelings here.

Having your software depend on two different versions of a library is just asking for more pain.

BTW, I still need to fix it to run on 3.12+ in a neat way. For now, it runs, but I don't like it.


This is a fun idea. I always do this currently with some non-core dependencies like black, linters, or dependencies I control the versioning of. I can't imagine I'd use this on libraries that provide real functionality in prod code though. If you have an incident popping off and you find out your dependency resolution goblin has reared its little head at the same time, bad day ahead.

You can always replicate the precise configuration running on the server. I assume the software passed all tests before being deployed with those same exact versions.

The absence of any references to how other ecosystems do this (e.g. the oldest widely in use: ELF soversion) strikes me as a massive oversight? There's 40-50 years of history of people trying to do close cousins of this.

Am I the only person here who thinks the easiest thing to do is to download the python version you need, compile, install in your preferred prefix like any other program and then run it?

I think that's quite easy, sure. But it's also irrelevant to the discussion. The issue the article is talking about (despite the title) isn't about managing multiple versions of Python. It's with trying to have multiple versions of a third-party library, within a single Python environment.

Ah you're right, thanks. Admittedly I only skimmed and relied on assumptions of the usual discussion around complaints on the python packaging ecosystem xD

Though in principle, my preferred approach here would be similar. Manually install in a particular prefix, add it to the path (manually at runtime or programmatically via the sys module), and then import multiple versions under different namespaces...


This approach was discussed elsewhere in the thread and you may want to be aware of its limitations (https://news.ycombinator.com/item?id=41706009).

Thanks!

I won't start the same thread here (though thank you for pointing that one out), but when I've come across similar scenarios in my own projects (which, admittedly, while not trivial, were probably not enterprise-scale), the solution generally still involved making sure the paths / global variables were suitably modified in the relevant modules before the relevant calls, so that you're using the namespaces the recipient expects. That may be tedious, but to me it is not absurd; you're just sticking to the contract expected by the recipient. The bigger issue here (for me) is probably whether those contracts are visible/explicit or not, and how/whether the contract is enforced in the recipient library ... but I would hesitate to call this a multiversion / dependency problem.


This would introduce more problems than it solves 99% of the time. The 1% of the time, it could be very handy.

I haven’t used UV, but it says that it manages python as well as packages - I’m guessing like conda, python-venv, and of course nix does.

If the C api is an issue, it sounds like you have control over it if you need it. You manage the python distribution, so could it be patched?

This way it feels like you’d be able to establish not just what is being imported, but what is importing it - then redirect through a package router and grab the one you want.

This may be particularly useful if you’re loading in shared libraries, because that is already a dumpster fire in python, and I imagine loading in different versions of the same thing would be quite awkward as-is.


It's one thing to see if something like this is possible from a technical standpoint, but whether this is desirable for the ecosystem as a whole is a different question. I would argue that allowing multiple versions of packages in the dependency tree is bad. It removes incentives for maintainers to adhere to sane versioning standards like semver, and also the incentive to keep dependencies updated, because resolution will never be impossible for package users. Instead, they will get subtle bugs due to unknowingly relying on multiple versions of the same package in their application that are incompatible with each other.

For lack of a better word, the single package version forces the ecosystem to keep "moving": if you want your package to be continued to be used, you better make sure it works with reasonably recent other packages from the ecosystem.


> It removes incentives for maintainers to adhere to sane versioning standards like semver

Semver does not matter in this way. The issue with having a singular resolution — semver or not — is that you can only move your entire dependency tree at once. If you have a very core library then you are locked in unless you can move the entire ecosystem up which is incredibly hard.


Indeed, in my opinion it is the best way to end up in a cluster mess like the nodejs/npm ecosystem...

And a very real issue is that young developers no longer know how to develop while limiting dependencies to the strict minimum. You have some projects with hundreds of dependencies for no real reason other than laziness or always wanting the new shiny thing.


Indeed, I blame npm for normalizing this kind of thing. It's no surprise that frontend devs wouldn't understand why it's bad, but Python devs should know better.

Pip, Venv, poetry, pipenv, now uv

If you are still struggling with this in 2024, you are missing the actual challenges of the world.


Just like JS has npm, yarn, pnpm, bun...

It seems the more users a language has, the more dev tools get written for it.


Maybe if you don't import stuff like left-pad and actually write some code, you wouldn't have to write a dissertation on package management.

Please try familiarizing yourself with the actual dependency graphs of major large Python projects - and with what these dependencies actually do - and then see if you still want to maintain this facetious tone.

Actually, I prefer spending my time writing right-pad

Maybe we could abstract directions away.

d-pad(pad_character,direction,string)

Of course this would be an internal dependency of both left-pad and right-pad.


Not that multiversion support was one of the goals I had when I built rye. It's also something I want to explore to see if it's possible with uv. Multiversion support is entirely orthogonal to how you are installing packages.

Agreed, but who is struggling with it? Do you think tool development should just stop because it's not an "actual challenge"?

Not OP, but I honestly think that these kinds of devs are just way too opinionated and don't want to accept the 95% valid existing solution. In a healthy ecosystem, people can try and experiment and if something is awesome then people would naturally switch to it over time.

Sadly what happens now is that everyone under the sun tries to evangelise and "create content" for these new tools so much that the natural filter mechanisms don't work. Doubly-so because tools like Google effectively created a new fitness function for peoples' behavior that incentivizes just plain old content creation (whatever weird form it may take, including new libraries being created + promoted).


IMO, there's a direct line of improvement between pipenv, poetry, (briefly) rye and now uv. I think the ecosystem is improving over time and will eventually coalesce around a majority platform. I like uv, but I'm unsure if that will be the final product.

Beyond just tool names, it's also important to realise that there has been a significant movement from the Python development team to standardise aspects of tooling. Tools like Poetry and uv weren't possible a few years ago before there was pyproject.toml to unify a bunch of separate things, for example.


I don't know, I don't care. I just use pip. If I need to virtualize I just do so at the OS or kernel level.

At any rate I take care of all of my python installs by not downloading a gazillion random packages online. If I ever reach the situation where package 517 depends on package 208 and package 598 depends on a different version of package 208, I'll just pull out the Flammenwerfer and trash the whole thing before it reproduces.


So the fact that there are 5 different tools that (attempt to?) fix this problem is a sign that it is not a problem, from your point of view?

It's a sign developers get stuck in paper bags. Same thing with text editors, orms, frameworks.

Imagine 2050, flying cars, talking robots, teleportation, and John Developer is going to be releasing solution 57 to a problem that was solved in the 90s


What is the 90s solution to this problem? autoconf, apt, yum... oops, here we go again with multiple standards.


