Congrats to him for finding something fun to do in retirement - dictators usually end up with a different outcome. ;)
I'm looking forward to seeing the future of Python - I think this move will be great for the whole community, and lets him push boundaries without being bogged down on the management side.
The biggest hurdle for Python right now is the stupid package managers. We need a cargo for Python.
I strongly suspect that devs' satisfaction with Python is strongly correlated with the size of the codebase they're working on. Generally people using Python for one-off projects or self-contained tools tend to be pretty happy. People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.
What I've observed a lot is that many startups or greenfield projects start with Python to get an MVP out the door as fast as possible. Then as the scope of the software expands they feel increasingly bogged down and trapped in the language.
Part of the success of IG's large Python codebase is FB's investment into developer tooling; for example, FB wrote (and open-sourced, FWIW) our own PEP-484 compliant type checker, Pyre, because mypy was too slow for a codebase of our size.
For its age and popularity, the tooling is abysmal.
I have to rewrite too many things that I expected to just be there, given the age of the project.
And some things were only fixed now! Merging dicts with dict | dict only started to work in 3.9!
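A quick illustration of the 3.9 union operators (PEP 584):

    # Python 3.9+: dict union via | and |= (PEP 584)
    defaults = {"host": "localhost", "port": 8080}
    overrides = {"port": 9090}

    merged = defaults | overrides   # {'host': 'localhost', 'port': 9090}
    defaults |= overrides           # in-place variant, like dict.update()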
Frustrated like you, I wrote my own open source type checking library for Python. You might be interested to read about it here: https://github.com/kevinarpe/kevinarpe-rambutan3/blob/master...
I have used that library on multiple projects for my job. It makes the code run about 50% slower, on average, because all the type checking is done at run-time. I am OK with the slow down because I don't use Python when I need speed. My "developer speed" was greatly improved with stricter types.
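Not the library's actual API - just a minimal sketch of the general idea, a decorator that validates annotated arguments at call time, which is exactly where the run-time overhead comes from:

    import functools
    import inspect

    def check_types(func):
        # Raise TypeError when an annotated argument has the wrong type.
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                ann = sig.parameters[name].annotation
                # Only check plain classes; skip unannotated parameters.
                if ann is not inspect.Parameter.empty and isinstance(ann, type):
                    if not isinstance(value, ann):
                        raise TypeError(f"{name}: expected {ann.__name__}, "
                                        f"got {type(value).__name__}")
            return func(*args, **kwargs)
        return wrapper

    @check_types
    def repeat(text: str, times: int) -> str:
        return text * times

    repeat("ab", 3)    # fine
    repeat("ab", "3")  # TypeError raised at the call site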
Finally, this isn't the first time I wrote a type checking library/framework. I did the same for Perl more than 10yrs ago. Unfortunately, that code is proprietary, so not open source. :( The abstract ideas were very similar. Our team was so frustrated with legacy Perl code that I wrote a type checking library, and we slowly applied it to the code base. About two years later, it was much less painful!
> doesn't scale well.
Nothing scales well. Scaling requires lots of effort. It doesn't matter what language you use, you'll rapidly find all its pain points.
> bad packaging when there's a lot of cross-cutting dependencies
Much as I hate it, Docker solves this. Failing that, Poetry, or if you must, venv. (If you're being "clever", statically compile everything and ship the whole environment, including the interpreter.) Its packaging is a joy compared to Node's. Even better, enforce standard environments, which stops all of this: one version of everything. You want to change it? Best upgrade it for everyone else too.
> slow performance
Meh, again it depends on your use case. If you're really into performance then dump out to C/C++ and pybind it. Fronting performance-critical code with Python is a fairly decent way to let non-specialists handle and interface with performance-critical code. It's far cheaper to staff it that way too: standard Python programmers are cheaper than performance experts.
If we are being realistic, most of the time 80% of a Python program's runtime is spent waiting on the network.
Granted, Python is not overly fast, but then most of the time your bottleneck is the developer, not the language.
> no concurrency
Yes, this is a pain. I would really like some non-GIL-based threading. However, it's not really been that much of a problem: multiprocessing Queues are useful here, if limited. Failing that, make more processes and use an RPC system.
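A minimal sketch of the Queue pattern - fan work out to worker processes, collect results, shut down with sentinels:

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        # Consume items until a None sentinel arrives.
        for item in iter(inbox.get, None):
            outbox.put(item * item)

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        workers = [Process(target=worker, args=(inbox, outbox)) for _ in range(4)]
        for w in workers:
            w.start()
        for n in range(100):
            inbox.put(n)
        for _ in workers:   # one sentinel per worker
            inbox.put(None)
        results = [outbox.get() for _ in range(100)]
        for w in workers:
            w.join()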
> typing as second-class citizens
The annotation system is underdeveloped. Being reliant on dataclass libraries to enforce typing is a bit poop.
> People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.
I work with a _massive_ monorepo. Python isn't the problem; it's programmers being "clever" or making needless abstractions of abstractions. None of that is Python's fault. It's egotistical programmers not wanting to read other people's (undocumented) code, and not wanting to spend time making other people's code better.
This is very important. A lot of people think that just using go or rust or whatever other language is new fixes all of this. But with a big enough project, you'll find all the issues. It's just a matter of time.
I love Python's bignum arithmetic when I write small prototypes for public key cryptography. I love Python's extensive standard library when I'm scraping a couple of web pages for easier local reading. But I would never willingly choose it for anything bigger than a few hundred lines. I'm simply not capable of dealing with large dynamically typed programs.
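For the curious, Python's ints are arbitrary precision out of the box, so a toy RSA round-trip needs nothing but the built-in pow(). (The Mersenne primes here are just convenient for a prototype, not a security recommendation.)

    p, q = 2**127 - 1, 2**521 - 1        # convenient known primes
    n, e = p * q, 65537
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)                  # modular inverse, Python 3.8+
    m = 42
    assert pow(pow(m, e, n), d, n) == m  # encrypt, then decrypt, round-trips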
Now if people try Rust or OCaml with the mentality of an early startup's Lisp developer, they're going to get hurt right away ("fighting the language" and "pleasing the compiler" is neither pleasing nor productive), and they're going to get hurt in the long run (once you've worked around the language's annoying checks, you won't reap as much benefit).
If you'll allow the caricature, don't force Coq down Alan Kay's throat, and don't torture Edsger Dijkstra with Tcl.
No, Docker doesn't solve the fact that some packages just won't play nicely together. npm actually does this better than the Python ecosystem, since it will still work with different versions of the same dependency. You get larger bundle sizes, but that's better than the alternative of it just flat-out not working.
As for processing scalability - Python is OK, but it's considerably hampered by the BDFL's own opinions. The result is a few third-party libraries that implement parallelism in their own way. That functionality should be integral to the standard library already. The worst part is the lack of a standard API for data sharing between processes.
Python's packaging issues only start with package management. Setuptools is an unholy mess of a system that literally gave me headaches for its lack of "expected features". I hate it with every single cell in my body.
And then there are systems and libraries, where you literally cannot use docker (Hello PySpark!).
>read other people's (undocumented) code
I lolled! Seriously... We get Python fanboys moaning about how indentation makes everything more readable and what a pleasure it is to write code in Python. Give me a break!
When I have to revisit old code I've written, I occasionally encounter my "cleverness" at the time. I always hate that past version of me. I think I've mostly learned my lesson. I guess I'll know in a few years.
...I feel attacked.
>Nothing scales well. Scaling requires lots of effort.
Sure, just like all PLs have their flaws, and most software has security vulnerabilities. But it's a question of degree and the tendency of the language. Different languages work better in different domains, and fail in others, and what Python is specifically bad at is scaling.
If only for the lack of (strong/static) typing and the relatively underpowered control flow mechanisms (e.g. Python often using exceptions in their stead)... While surely all languages have pain points that show up at scale, Python still has a notable lot of significant ones precisely in this area.
>docker, poetry, venv...
Yes, and this is exactly the point. There's at least three different complex solutions, none of which can really be considered a "go-to" choice. What is Rust doing differently? Hell, what are Linux distros doing differently?
>If you're really into performance then dump out to C/C++ and pybind it.
If you want performance, don't use Python - was the parent's point.
>If we are being realistic, most of the time 80% of a Python program's runtime is spent waiting on the network.
This really, really doesn't apply to all of programming (or even those domains Python is used in). Besides, what argument is that? If it were true for your workload, then it would be so for all other languages too, meaning discussion or caring about performance is practically meaningless.
>Granted, Python is not overly fast, but then most of the time your bottleneck is the developer, not the language.
Once again, this applies to all languages equally, yet, for example, Python web frameworks regularly score near the bottom of all benchmarks. I doubt it is because of the lack of bright programmers working in Python, or the lack of efforts to make the frameworks faster.
>Python isn't the problem; it's programmers being "clever" or making needless abstractions of abstractions.
Just as C isn't the problem, it's the programmer forgetting to check for the size of the buffer, and PHP isn't the problem, it's the programmer not using the correct function for random number generation.
You can always trace any given error to a single individual making an honest mistake, that's really not a useful way to think about this. It's about a programming language (or an environment) leading the programmer into wrong directions, and the lack of safety measures for misguided "egotistical programmers" to do damage. You can blame the programmers all you want, but at the end of the day, the one commonality is the language.
Now Python is still one of my favorite languages, and I think that for a lot of domains, it really is the right choice, and I can't imagine doing my work without it. But performance and large, complex systems, is not one of those domains, and I honestly feel like all you've said in Python's favor is that other languages are like that too, and that it's the fault of the programmers anyway.
There is a point that I think I've failed to get across:
> Just as C isn't the problem, it's the programmer forgetting to check for the size of the buffer, and PHP isn't the problem, it's the programmer not using the correct function for random number generation
I don't think I was arguing that point. Of course all languages have their USPs. The point I wanted to get across is that large Python projects are not inherently hard to manage. That kind of scaling is really not that much of an issue. I've worked on large repos for C, C++, Python, Perl, Node and, as a punishment, PHP. The only language that had an issue with a large codebase was Node, because it was impossible to build and manage security. The "solution" to that was to have thousands of repos hiding in GitHub.
The biggest impediment to growth was people refusing to read code, followed swiftly by pointless abstractions. This led to silly situations where there were 7-12(!) wrappers for S3 functions. None of them had documentation and only one had test coverage.
It's like getting on a roller coaster without a seat belt or a guard rail. It's fun at first, and you will make it around the first few bends OK ... then get ready ...
Of course, with enormous discipline, skill and effort you can overcome all this. But it just leaves the question - really, is this the best tool for the job in the end? Especially when you are paying for it with horrifically bad performance and other limitations.
Not all codebases are equal and maybe I was lucky, but in my experience, using dynamic languages (or, to be exact, any language where the compiler doesn't nag you when there is a potential problem) doesn't scale well.
Large codebases are really hard to reason about without types. I'm glad we now have projects like Pyre that are trying to bring typing to Python.
Large Python codebases _could_ be written with good modularization, clean separation of concerns, and composability. Or they could be written as spaghetti.
Using types _could_ help keep a code base from becoming spaghetti, but it's not the only way. I think the understandability and maintainability of a code base has more to do with the person writing it than the availability of a type system tbh.
The issue is that to have a nice and well architected code base, you have to constantly refactor and improve - sometimes you need to re-arrange and refactor huge parts of the code. Without types _and_ tests, this is just not gonna happen. It will be unproductive and scary, so that people will start to stop touching existing code and work their way around it.
> I think the understandability and maintainability of a code base has more to do with the person writing it than the availability of a type system tbh.
That is the same thing. Because someone who wants great maintainability will also want a great type system (amongst other things).
The quality of the product is down to the skill of the worker either way.
The same with types: they make some kinds of errors much less likely, though there is no silver bullet in general. E.g., I much prefer a general-purpose language such as Python for expressing complex requirements in tests over any type system (even if your type system is Turing complete and you can express any requirement in it, that doesn't mean it is a good idea).
Everything interlocks in such intricate ways that you can't meaningfully choose your own tools, and working around problems only goes so far. And you can't repair your own tools.
Can you explain why? I honestly don't know, because my experience with C++ was during school ~20 years ago, and since then professionally I've used mostly python in relatively small codebases where it's all my own code (mostly for data processing/analysis). Thanks!
(Although I did have to write some C code to glue together data in a very old legacy system that didn't support C++, much less python. It took a lot more effort to do something simple, but it was also strangely a really rewarding experience. Kind of similar to feeling when I work with assembly on hobby projects)
Static typing prevents that by telling you early where the mismatch is happening - some method calls into another with a variable of the wrong type, and that's where the bug is. It also allows tooling to look up the types of variables and quickly get information about their properties.
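A small example of what that looks like with annotations plus a checker like mypy or Pyre (function names here are made up):

    def total_cents(prices: list[float]) -> int:
        return round(sum(prices) * 100)

    def receipt(amount_cents: int) -> str:
        return f"${amount_cents / 100:.2f}"

    # A checker flags the bad list item on this very line, before the
    # program runs, instead of a TypeError several frames away at runtime:
    receipt(total_cents([1.50, "2.50"]))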
Just because you can define methods on an object dynamically in Python doesn't mean that you should. Monkeypatching is not culturally encouraged in Python. It is most often seen in tests; otherwise, it is rare.
Nobody forbids using ABCs to define your custom interfaces or using type hints for readability/IDE support/linting (my order of preference).
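For instance, a custom interface as an ABC - instantiation fails loudly if a method is missing (names made up for illustration):

    from abc import ABC, abstractmethod

    class Storage(ABC):
        @abstractmethod
        def read(self, key: str) -> bytes: ...

        @abstractmethod
        def write(self, key: str, data: bytes) -> None: ...

    class MemoryStorage(Storage):
        def __init__(self):
            self._data = {}

        def read(self, key):
            return self._data[key]

        def write(self, key, data):
            self._data[key] = data

    MemoryStorage()  # fine
    # Storage()      # TypeError: can't instantiate abstract class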
The Scala Spark project I can navigate, understand, test and consider to be average complexity... with some failures, unique to Scala.
The Python Spark project is barely readable.
People who built the Python Spark codebase are "experienced Python devs". While Scala codebase was built by people who used Scala for the first time.
(take this anecdote as evidence of the poor tooling and guidance present in the Python community... and the BDFL's own failures)
Otherwise it's a thousand implementations of the same 100-line piece of code interspersed everywhere.
Which is strange because those same managers may be full adherents to micro tasking projects in a project management system whose purpose is basically to do for the project what code management does for the code itself.
In my workplace, we've recently had leadership that appreciates these things, and the difference is night & day. Simple requests from "stakeholders" (I hate that term) are often filled in days, or same day, instead of weeks. I think it helps tremendously that the primary manager is also a coder herself, and still codes ~25% of her job.
I believe it's Guido that basically said - if you don't like how Python does it, then implement it in C. And that's how you end up with great C-based libraries bound to Python... and Python is often used as a messy orchestrating language.
Or even worse - could you imagine how many lines that would be in C++ ?
I've seen the exact same scenario with other languages. The problem is that in a start-up environment you are likely adding and retiring "features" at a speed that layers on so much complexity that you can no longer reason about what business rules are actually valid any more.
It wouldn't surprise me if many of these issues are self-selecting in the language communities as well.
Dependency management is about as easy as it is going to get. We have problems with our dependencies breaking stuff, but who doesn’t?
People talk as if packaging is a solved problem. It isn’t in any language. And then they complain that Python packaging changes too much. That’s because folks are iterating on a hard problem.
- `curl -L https://app.example.com/install | sh`, which downloads an installer and runs, for instance, `apt/yum install <your-package>`
- in CI environment on a VM: `git checkout` & `pipenv install --deploy`
- `pipx install glances` on a home computer
- just `pip install` e.g., in a docker container [possibly in a virtualenv]. For pure Python packages, it can work even in Pythonista for iOS (iphone/ipad)
- just copy a python module/archive (PyInstaller and the like)
- give a link to a web app (deployed somewhere via e.g., git push)
- for education: there are python in the browser options e.g., brython, repl.it, trinket.io, pythontutor.com
- just write a snippet in my favourite editor for literate devops tasks/research (jupyter-emacs + tramp + Org Babel) or give a link to a Jupyter notebook
- a useful work can be done even in a REPL (e.g., Python as a powerful calculator)
I dare you. Do mention any tool/any language that handles all the above use cases without sacrificing the requirements for each use-case.
No. But if I talked about how I used 9 different word-processing programs, you'd see that as a problem, or at least an indictment of those programs. Deployment isn't that complicated.
> I dare you. Do mention any tool/any language that handles all the above use cases without sacrificing the requirements for each use-case.
I use Maven/Scala and as far as I can see it covers all of them other than "give a link to a web app" which isn't actually deploying at all (and I'd still have used maven to deploy the webapp wherever I was deploying it).
I don't think there's any legitimate case for curl|sh, and I don't think there's any real reason for separate pip/pipenv/pipx (did you make that one up? Have I fallen for an elaborate troll?) - rather pipenv exists to work around only being able to install one version of a library at a time. Nothing's gained by having "just copy a module/archive" be different from what the tool does. Running in browser, notebook, or REPL can and should still use the same dependency management tooling as anything else.
If I want to deploy my code, I use maven. You can use curl (since maven repositories use standard HTTP(S)) or copy files around by hand, if you have a use case where you need to, but I can't think what that would be. If you want to bundle up your app as a single file, you can configure things to do that when publishing, but the dependency resolution, repository infrastructure, and deployment still look the same. Even if you want to build a platform-level executable, it's the same story, all the tooling just works the same. If I want a REPL or worksheet, I can start one from maven (and use the same dependency management etc. as always), or my IDE (where it's still hooked up to my maven configuration). If I want to use a Zeppelin notebook then there's maven integration there too.
Ever wonder why you don't hear endlessly about different ways of doing dependency management in non-Python ecosystems? Because we have tools that actually work, and get on with actually writing programs. It baffles me that Python keeps making new tools and keeps repeating the same mistakes over and over: non-reproducible dependency resolution, excessively tight integration between the language and the build tools, and tools and infrastructure that can't be reused locally.
- system packages (deb/rpm/etc)
- binary wheels (manylinux)
- building from source
To take your examples in order:
1) system packages: almost always out of date for my needs
2) Binary wheels: I actually haven't investigated this much, maybe it will work (and if it does, I'll buy you a drink if we ever meet in person).
3) Building from source: this kinda proves my point about Python having poor dependency management tools if this is a serious response. In general, this would be much further down the rabbit hole than I want to go.
That said, I do run into trouble when I have a dependency that requires compilation on Windows (e.g. the popular turbodbc) because, say, a wheel isn't available for a particular Python version. Any time a compilation is needed, it's a headache. Windows machines don't come with compilers, so one has to download and install a multigigabyte Visual Studio Build Tools package just to compile. Sometimes the compilation fails for various reasons.
Requiring gcc compilation is a headache for installing dependencies inside Docker containers too -- you have to install gcc in order to install Python dependencies and then remove gcc afterwards.
I think requiring local compilation (instead of just delivering the binary) is a UNIX-mindset that is holding back many packaging solutions. I think a lot of pain would be alleviated if we could somehow mandate centralized wheel creation for all Python versions, otherwise the package manager marks a package as broken or unavailable and defaults to the last available wheel.
Also if only we applied some standards like R's CRAN repo does -- i.e., if it doesn't pass error checks or doesn't build on certain architectures (institute a centralized CI/CD build pipeline in the package repo), it doesn't get published -- the Python packaging experience would be much improved.
For those who don't realise, when there's a new version of R, anything that doesn't build without errors/warnings is removed from the archive.
This is really annoying if you want something to keep running, but it prevents the kind of dependency rot common to Python (recently I found a dependency that was four years out of date).
Once a pip install needs to start compiling C, things do go way south very quickly. At that point you can install the union of all common C development tools and kernel headers, and prepare for hours of header hunting.
I've done that too much to like python anymore.
Additionally, they are part of a larger application, which is mostly managed by pip, which means that I need both pip and conda which is where things get really, really hairy.
I actually blame Google and FB here, as neither of them use standard python dependency management tools, and many of their frameworks bring in the world, thus increasing the risk of breakage.
And putting them into a common shared directory.
Try doing that without writing convoluted code in your setup.py.
Deployment for development is just pyenv and virtualenv.
Which is still wrong, of course, but "no in-process (or in-single-runtime-instance) parallelism" would be correct, as would "forking inconvenient parallelism".
To claim that a language "supports parallelism", it has to do something more to facilitate parallel programming. I would say that parallel threads of computation with shared memory and system resources is the bare minimum. You can go the extra mile and support transactional memory or other "nice" abstractions which make parallel programming easier.
Saying that Python supports parallelism because it has a fork() wrapper is like saying that POSIX shell is a strongly typed language because it has strings and string is a type.
Pretty much any app that uses both fork and threads has to jump through many hoops to make the two work together well. And this applies to all the libraries that it uses, directly or indirectly - if any library spawns a thread and does some locking in it, you get all kinds of hard-to-debug deadlocks if you try to fork.
So unless you have very good perf reasons to need fork, I would strongly recommend multiprocessing.set_start_method("spawn") on all platforms. No obscure bugs, and it'll also behave the same everywhere, so things will be more portable. Code using multiprocessing that's written to rely on fork semantics can be very difficult to port later.
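A sketch of the recommended setup - with "spawn", each worker starts from a fresh interpreter instead of inheriting the parent's threads and lock state:

    import multiprocessing as mp

    def work(x):
        return x * x

    if __name__ == "__main__":
        mp.set_start_method("spawn")  # fresh interpreter per worker
        with mp.Pool(4) as pool:
            print(pool.map(work, range(10)))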
This was probably a conscious design decision on the part of the CPython implementers, and perhaps a good one. But we should not claim that Python is something which (actively and by design) it's not.
If you are trying to get performance out of it (which doesn't really hinge on whether it's a million lines of code), then Python might be the wrong choice. But you can always write it in Rust or C and give Python an API to the functionality.
I agree that packaging is a mess. Fixing that mess with modularization in Java took a long time, and most other languages have that problem, too.
Explicitness and naming standards screw up the clarity of any code... Not to mention the complexity when you get into OOP.
This seems to be the case with most languages, especially if good code control isn't practiced, and unfortunately that's not uncommon.
The biggest need for a package manager and its ecosystem is continuity: the stance that new features and paradigms will be gradually shifted toward — without package-ecosystem incompatibilities, without CLI commands just disappearing (but instead, with long deprecation timelines), etc.
In other words, an officially-blessed package manager is one where, when something better-er comes along, it gets absorbed by the existing thing, instead of replacing it.
That is what the Python ecosystem is missing.
EDIT: It was probably misleading to characterize Pipenv as advertising itself as solving all problems; it's probably more correct to say that its significant weaknesses weren't advertised and thus one has to invest considerably before discovering them for oneself.
Just one example: you want your virtualenvs to be created in ~/.virtualenvs so that pipenv is a drop-in replacement for virtualenvwrapper+pip? Tough luck for you, Kenneth Reitz doesn't think that's how it should be done.
At least 3 or 4 times some issue I've wanted resolved I found in the issue tracker with the last message "we'll have to check with kennethreitz42 whether we're allowed to change that" and then silence for a year.
It could still catch up with poetry, but from what I've seen there's a fundamental mindset difference in how change requests are approached between pipenv and poetry.
I disagree. I used to think that that's the problem, but having seen a few more cycles of it, the problem isn't that kind of commitment - after all, the whole python ecosystem enthusiastically jumps into the new thing, and Python people are used to relatively short deprecation cycles. The problems are the actual problems; every Python package manager is just embarrassingly awfully bad as soon as you try to use it for 5 minutes, presumably because they're developed by Python people who've never used a decent package manager and so think that no-one could ever need deterministic dependency resolution, once you've pinned a transitive dependency there surely wouldn't be any reason to ever want to unpin it, having the package manager coupled to the language version is absolutely fine, no-one could ever want a standard way to run tests ...
He just doesn't care about package management.
Last time I tried to use poetry (and this is why it was the last time I tried to use poetry), it ignored global pip settings and had no documented mechanism for its own settings (I believe poetry uses its own implementation of, or captive install of, pip), which made it completely unusable in a corporate environment with annoying SSL interception issues to work around, where pip + venv worked.
Poetry is a much smoother experience when it works, though.
I think this is happening frequently these days. People try to cover all use cases and then end up biting off more than they can chew. It won't work that way. A good set of MINIMALS is easy to maintain, sustain, and extend.
Given the varied use cases for Python, the goal of a single package manager may be misguided.
That was a long time ago, though, when scientific computing was a small niche for Python. It might have been reasonable to say it's not worthwhile to take on all that extra work just to support the needs of a small minority of users. Fast forward the better part of a decade, and it turns out that scientific computing did not stay a small niche. I think that one could make a strong argument that, in retrospect, that brush-off did not end up ultimately serving the best interests of the Python community. It made the community more fragmentary, in a way that divided, and therefore hindered, efforts at addressing what has proven to be one of Python's biggest pain points.
I forgive myself, it's pretty confusing :D.
It is true that PyPI was designed before the author/project naming scheme popularized by GitHub. Other than that I don't see a greater problem with name collisions in Python.
That's why strong leadership in a community, or subcommunities, works well. Python lacked this leadership, which led to millions of half-arsed projects that compete... without moving the whole platform forward. It feels like NIH syndrome has permeated Python. Hopefully that's going to change.
- no way to check if the lock file is up to date with the toml file
- no way to install packages from source if the version number is calculated (this will likely never be fixed as it's a design decision to use static package metadata instead of setup.py, but is an incompatibility with pip)
- no way to handle multiple environments: you get dependencies and dev-dependencies and that's it. You can fake it with extras, but it's a hack
- if you upgrade to a new python minor version you also have to upgrade to the latest poetry version or things just fail (Something to do with the correct selection of vendored dependencies. May have since been fixed -- new python versions don't come out all that often for me to run into it. And in fairness the latest pip is typically bundled with each python so it avoids that issue)
I still use poetry because it's more standard than hand-rolled pip freeze wrapper scripts, and there's definitely progress (though the inability to sync packages was a hard blocker for me and is still not fixed), but it's not quite there yet.
I think it's because they have their own internal build systems, but they never play well with pip/conda et al.
One of my recent breakages was installing the recsim package, which pulled in tensorflow and broke my entire app. There's actually a recsim-no-tf package on PyPI, presumably because this happens to loads of people.
The core problem is that pip will happily overwrite your existing dependencies when you attempt to install a new package.
I migrated the CI/CD of my company to Poetry some time ago, it worked fine for some time until we needed a feature that Poetry didn't support. I submitted a PR adding the feature to Poetry but their sole developer was apparently taking some time off and the project remained without any development for several months.
I migrated the CI/CD to use my own Poetry fork but it was very cumbersome, Poetry has a very weird build system so forking it is not simple.
At this point, I realized that I was just wasting time. There is nothing that Poetry does that the other (old and stable) tools don't do. Poetry was the result of me falling for the shiny toy syndrome.
But a plurality of the people I encounter in the Clojure community came there because leiningen (Clojure's package manager that uses Maven under the covers) "just works" and they got tired of having a tough time reproducing builds consistently on other platforms / OSs with Python; not to mention the performance gains of the JVM.
You don't want that. When companies "sponsor" things they try to take them over, unless it's a pure donation, which is rare. The community then drops out because a company is in control. Later the project is abandoned by the company. It's a slow death spiral.
I would love it if Guido could create a new PEP for extending modules with generic namespaces, a la Perl/CPAN modules.
There aren't 15 different libraries for doing the same thing in Perl, there's 1. You never replace it, you extend it by making a new module in a hierarchical namespace. The same core library's code might not change in years while new extensions can keep popping up. So even if you think Requests sucks, you can make Requests::UserAgent which inherits Requests code and extends it / gives a better interface. And these can be written & packaged by completely different authors.
Then maybe PyPI wouldn't have 5,000 nearly identical yet mostly unusable modules, or modules with nonsense names.
Some of this has been mitigated with virtualenv, but having a project express its packages & have that automatically reflected in the environment would be even better.
Finally, Cargo to my knowledge actually lets multiple versions of a dependency exist (even within the same project!!!) so that you can have dependencies like:
dep1 -- dep3 <= v1.6
dep2 -- dep3 >= v2.0
That's not possible if you don't have the right language hooks because module resolution needs to be aware of the version of the library (i.e. when you go `import numpy`, it actually needs to be aware of the package it's being imported from to resolve that correctly).
Now whether or not it's a good idea to support this kind of dependency stuff can be controversial. In practice though clearly it does cause problems the larger your codebase gets as you're more likely to have some nested dependency chain that's time-consuming to upgrade so you'd rather move faster than make sure you're only running 1 version of the dependency.
I really don't get the fuss about the global dependency management though; maybe I would change my mind if Python shipped a great implementation of it. But I feel like the problem is already solvable in multiple ways with containers, VMs, or virtualenvs, and I don't think yet another abstraction to separate environments would add much value to my day-to-day workflow building Python apps.
And yet, I hear so many more complaints about Python pip and I really don’t understand the disconnect. Perhaps dislike of pip is actually triggered by usability issues? And then people look for other reasons to explain their dislike?
As mentioned elsewhere in this thread, resolving this dependency issue would require a change in the Python language itself.
If you're only working with the standard library, boto3, twisted and redis - you're unlikely to have issues. You get into big issues, when you get to more obscure libraries... or libraries that are C bindings.
There are per-Python-binary, per-user, and per-virtualenv installations (per project, or per whatever you like) that make conflicts less likely.
Sometimes packages "vendor" their dependencies, e.g., there is `pip._vendor.requests` (thus you may have different `requests` versions in the same environment).
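You can see the two copies coexist precisely because the vendored one hides under a different import path (this pokes at pip's private internals and assumes requests is also installed separately, so it's purely illustrative):

    import requests                               # the separately installed copy
    from pip._vendor import requests as vendored  # pip's bundled copy

    # Two distinct module objects, potentially two different versions,
    # because the interpreter keys modules by name and the names differ.
    assert requests is not vendored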
There were setuptools' multi-version installs https://packaging.python.org/guides/multi-version-installs/ (I don't remember ever using it explicitly -- no need).
That's not pip's fault, that's Python's fault. Python's module system has no concept of versioning, so there can only ever be one copy of a module that has a given name.
And this is an interpreter detail that is exposed through the language itself, so it can't be fixed without causing severe pain.
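Concretely, the interpreter caches every import in sys.modules, keyed by the bare module name:

    import sys
    import json

    assert sys.modules["json"] is json  # one 'json' per process, keyed by name

    import json as json_again           # a re-import returns the cached object
    assert json_again is json

There's no slot in that scheme for "json 1.0 over here, json 2.0 over there", which is why versioned side-by-side installs would need language-level support.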
That's a bug, not a feature. It enables sloppy development and the disasters like on NPM
> Now whether or not it's a good idea to support this kind of dependency stuff can be controversial. In practice though clearly it does cause problems the larger your codebase gets as you're more likely to have some nested dependency chain that's time-consuming to upgrade so you'd rather move faster than make sure you're only running 1 version of the dependency.
Consider the view that sometimes it can be sloppy and other times it's not, and it's impossible to distinguish between the two in an automated fashion.
If a package is just using a particular library internally, I don’t see why the package manager should prevent using it with another library that depends on a different version.
I do. The main reason for Linux distributions to exist is to provide a development and running environment where:
- API/ABIs do not change for the whole lifetime of the distribution. No new features, no new bugs, no new vulnerabilities, so that your production code can run reliably for 5+ years.
- Vulnerabilities are fixed with minimally invasive patches.
- Vulnerabilities are fixed in reasonable times even if the upstream development stopped. Patches are well tested against the set of packages in the distribution.
You simply cannot have these 3 features together if a distribution ships 10 different versions of each library.
It's already a ton of work to maintain packages in stable distributions.
For example, my experience with `cargo` (which I mostly use to install command-line utilities such as rg, fd, dust): it is great when it works as written in the instructions,
but sometimes it doesn't (running `cargo` may involve a lot of compiling -- in contrast to `pip` which can use wheels transparently and avoid compiling even for modules with C extensions -- I guess there might be a way to do something similar with `cargo` though not by default).
Perhaps ruby was more "hackable" by bundler. (Bundler has now become part of ruby stdlib, but didn't start out that way, it definitely started hacking around the way the more fundamental stdlib 'rubygems' worked).
Kind of, if you ignore Rubygems, which is also part of stdlib at a lower level than bundler (and also, originally wasn't).
> but bundler dependency manager solves it anyway to give you per-project dependencies not just system-wide dependencies.
It can do that because rubygems manages multiple installed versions of packages and allows per-project ("per call to require", potentially, IIRC) specification of which one to pull from the globally-installed versions (this was originally done by monkey patching require when rubygems was an add-on.) This lets bundler easily live on top of it providing per-project dependencies somewhat more smoothly than Rubygems does without requiring anything like a venv.
> Perhaps ruby was more "hackable" by bundler.
Ruby is ludicrously hackable, yes.
> (Bundler has now become part of ruby stdlib, but didn't start out that way, it definitely started hacking around the way the more fundamental stdlib 'rubygems' worked).
Rubygems also wasn't part of stdlib originally, and started out relying on hacking around the way Kernel#require works.
Oh wow, the default python dependency manager only lets you have one version of each package installed system-wide?
Yeah, that is a limitation. As opposed to rubygems (the first dependency manager although as you say not originally built-in to ruby) which has system-wide install, but always let you have more than one version installed.
Without fixing that one way or another, there's no sensible way, true. virtualenv is certainly one way to fix it. I wonder if there would have been a more rubygems way to fix it.
It’s a directory - you delete and create them at will, fast, and don’t worry or care about the system Python. Having some crazy setup that patches “require” to handle concurrently installed package versions seems insane, especially if you cannot actually use them concurrently in the same Ruby process. So, segmenting them by project (aka virtualenv) seems like the best solution.
Bundler is not a "node_modules" style setup. It does not require dependencies to be in a local path (although they can be, the default is they live in a system-wide location, and this does not limit functionality). It also does not support more than one version of a dependency in the same execution environment (as node_modules does) -- that really would be impossible in ruby too.
It's possible something about python's design would make the bundler approach impossible, I don't know. But it's not "dependencies are installed globally" alone, as that's true of ruby too.
We would probably all benefit in understanding better how these things are handled in other environments. And I include myself here. I think ruby's bundler really set a new standard for best practices here, and many subsequent managers (like cargo) were heavily influenced by it, although many don't realize it. But meanwhile many don't even realize what they are missing or what's possible.
Like the basic idea of having a specification of top-level dependencies (including allowable ranges) separate from a "lockfile" of exact versions in use of ALL dependencies... is just so hugely useful I never want to do without it, and I think is compatible with just about any architecture, and yet somehow JS is still only slowly catching on to it.
But nothing stops you from masking global-libraries with local library versions (similar to node_modules). Why hasn't anyone done this you may ask? I don't know the answer to that.
So yes, Python’s import system is dynamic enough to do crazy things, but I don’t see how we can ever retrofit that into Python.
Regarding dependencies installed into a project-local directory (node_modules): that’s a virtual environment. Just more flexible.
If you're using python, you don't know to check out pyenv/etc until you have a huge mess on your computer due to pip's behavior.
Maven Scala project - create skeleton, add libraries to POM, write app, run app
PIP Venv Python project - create venv, enable venv, create requirements file, write app, run pip to install dependencies (possibly install GCC and extra libraries), run app
(Oh... and god forbid that you forget to deactivate venv)
You're lying when you say that library management is easier in Python. It's just factually untrue.
Instead, simply run the interpreter installed in the environment when you run your app, e.g. "./my_env/bin/python my_app.py", and things will just work. No activation required, no special mode, nothing to forget.
The part about requirements.txt and installing packages could also be simplified if you did it the other way around: install first and create the requirements file from that:
$ python3 -m venv my_env
$ my_env/bin/pip install some-dependency
$ my_env/bin/pip freeze >requirements.txt
$ my_env/bin/python3 my_app.py
That's before you get to package your app...
There's no community consensus - that keeps Python from advancing to where it needs to be.
I said it once and I'll say it again - Python lacks mature tooling.
There are contexts where little delays matter, and you didn't pick one of those.
It's literally a `mvn package` and that's it!
It's really hard with many deps, it's why cabal (for instance) moved away from a global model.
It is a package manager but it lacks features that many other package managers have in Ruby, Node, Elixir, and other languages.
For example there's no concept of a separate lock file with pip.
Sure you can pip freeze your dependencies out to a file but this includes dependencies of dependencies, not just your app's top level dependencies.
The frozen file is good to replicate versions across builds but it's really bad for human readability.
Ideally we should have a file made for humans to define their top level dependencies (with version pinning support) and a lock file that has every dependency with exact pinned versions.
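With pip-tools (a third-party bolt-on, not pip itself), the split looks roughly like this - the package names and pins are just examples:

    # requirements.in -- hand-edited, top-level deps only
    flask>=1.1
    requests

    $ pip-compile requirements.in
    # writes requirements.txt with every transitive dep pinned, e.g.
    #   click==7.1.2, flask==1.1.2, itsdangerous==1.1.0, jinja2==2.11.2, ...
    $ pip-sync   # make the environment match requirements.txt exactly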
* Basically I would have a single bash script that every `.py` entrypoint links to.
* Beside that symlink is a `requirements.in` file that just lists the top-level dependencies I know about.
* There's a `requirements.txt` file generated via pip-tools that lists all the dependencies with explicit version numbers.
* The bash script then makes sure there's a virtual environment in that folder & the installed package list matches exactly the `requirements.txt` file (i.e. any extra packages are uninstalled, any missing/mismatched version packages are installed correctly).
This was great because during development if you want to add a new dependency or change the installed version (i.e. pip-compile -U to update the dependency set), it didn't matter what the build server had & could test any diff independently & inexpensively. When developers pulled a new revision, they didn't have to muck about with the virtualenv - they could just launch the script without thinking about python dependencies. Finally, unrelated pieces of code would have their own dependency chains so there wasn't even a global project-wide set of dependencies (e.g. if 1 tool depends on component A, the other tools don't need to).
I viewed the lack of `setup.py` as a good thing - deploying new versions of tools was a git push away rather than relying on chef or having users install new versions manually.
This was the smoothest setup I've ever used for running python from source without adopting something like Bazel/BUCK (which add a lot of complexity for ingesting new dependencies as you can't leverage pip & they don't support running the python scripts in-place).
Isn't that a good thing?
> no concept of a separate lock file with pip.
setup.py/.cfg vs requirements.txt, no?
Yes, a very good thing.
> setup.py/.cfg vs requirements.txt, no?
A lot of web applications aren't proper packages in the sense that you pip install them.
They end up being applications you run inside of a Python interpreter that happen to have dependencies and you kick things off by running a web app server like gunicorn or uwsgi.
For a Python analogy vs what other languages do, you would end up having a requirements.txt file with your top level dependencies and when you run a pip install, it would auto-generate a separate requirements.lock file with all deps pinned to their exact versions. Then you'd commit both files to version control, but you would only ever modify your requirements.txt by hand. If a lock file is present that gets used during a pip install, otherwise it would use your requirements.txt file.
The above work flow is how Ruby, Elixir and Node's package managers operate out of the box. It seems to work pretty well in practice for ensuring your top level deps are readable and your builds are deterministic.
Currently there's no sane way to replicate that behavior using pip. That's partly why other Python package managers have come into existence over the years.
My method for deploying a web application is to have a Dockerfile which pip-installs the Python package, but I could see someone using a Makefile to pip-install from requirements.txt instead. In fact, I use `make` to run the commands in my Dockerfile.
I am running a pip install -r requirements.txt when I do install new dependencies. I happen to be using Docker too, but I don't think that matters much in the end.
In practice it doesn't tho.
Let's say I'm working on a project without a lock file and commit a change that updates my dependencies. I get distracted by anything and don't push the code for a few hours.
I come back and push the code. CI picks it up and runs a docker-compose build and pushes the image to a container registry, then my production server pulls that image.
With this work flow there's no guarantee that I'm going to get the same dependencies of dependencies in dev vs prod, even with using Docker. During those few hours before I pushed, a dep of a dep could have been updated so now CI is different than dev. Tests will hopefully ensure the app doesn't break because of that, but ultimately it boils down to not being able to depend on version guarantees with Docker alone.
There's also the issue of having multiple developers. Without a lock file, dev A and B could end up with having different local dependency versions when they build their own copy of the image.
I've seen these types of issues happen all the time with Flask development. For example Flask doesn't restrict Werkzeug versions, so you wake up one day and rebuild your image locally because you changed an unrelated dependency and suddenly your app breaks because you had Werkzeug 0.9.x but 1.x was released and you forgot to define and pin Werkzeug in your requirements.txt because you assumed Flask would have. The same can be said with SQLAlchemy because it's easy to forget to define and pin that because you brought in and pinned Flask-SQLAlchemy but Flask-SQLAlchemy doesn't restrict SQLAlchemy versions.
Long story short, a lock file is super important with or without Docker.
if you make pip run 'pip freeze > requirements.txt.lock' after every 'pip install whatever', you almost solve that particular problem if setup.py is configured to parse that (it isn't by default and there's no easy way to do that!)
The setup.py file contains a human-readable designation of requirements, and then `pip-compile` generates a requirements.txt with all deps' (and deps of deps') versions specified.
But, for the other 99% of projects: most of their dependencies won't break compatibility, you'll never uncover a hard version dependency that the package manager can't solve, you'll never need to "freeze" your dependency versions, and you can pretty much just rely on a semi-persistent environment with all your necessary packages installed and semi-regularly updated. Essentially smooth sailing.
"evangelized" a sibling team also to consider switching, they were sceptical but just recently they mentioned they also like it more.
Python is plenty fast for most automation tasks.
Projects like Cython allow you to tweak, without too much effort, the parts of a program that need an extra boost.
Last but not least, there have been discussions in the Python community in recent weeks about ways to considerably speed up the default (CPython) implementation.
Hopefully, all of this will bear some fruit in the next 2-5 years. Guido will probably help from his new position at Microsoft.
This piqued my interest. I found this, by Mark Shannon:
Is that what you mean?
Try building a Unix app, with data in standard Unix locations (bin, share, lib, etc.), and you'll find that you have to write custom code.
And Google searches don't help :(
(Although the maven package ecosystem seems ideal to me, Gradle is just "good enough" - it's mostly the standards around versioning and tooling dealing with versioning and everything being there that makes it good to me).
It should be heavily promoted as the official option and included with the distribution.
You realize that we're potentially just one leadership change away from returning to the 'bad old days', right?
It controlled the client (desktop) and was working on controlling the server, too.
Google, Facebook, Netflix, etc didn't exist. Amazon was much smaller and in a different niche. Apple was trying to not keel over in 2 months.
We're not a leadership change away from anything. The world has changed.
The world has changed, but I didn't mean "the bad old days when Microsoft had a stranglehold over the industry" but just "the bad old days of an evil Microsoft". Their market power isn't relevant to whether they are a bad actor, only to how large their impact is as a bad actor.
I was already in love with Python, but Poetry has definitely made me a much happier Python developer.
The rest is too vocal.
I doubt that even 90% of those professional python developers who believe that python dependencies is a solved problem believe that it is solved with pip and virtualenv; the conda faction has to be bigger than 10%. Plus there's the people that think pip/venv aren't enough, but that tools on top of them plug the gaps (poetry).
But I think that the share of professional developers who see it as a solved problem at all is less than 90%. Obviously, we've all got some way of working with/around the issues, that doesn't mean that we don't feel that they exist.
conda had minor usage in the last one for building a handful of special projects mixing C++ and Python code (highly specific code in the finance industry); after the build, the artifacts could go into the Python repository (internal PyPI) and be usable with pip. Everything was down to pip at the end of the day. As a matter of fact, the guys who used and pushed for conda were also the ones pushing the hardest for pip, because pip is the answer to everything.
Our data scientists like Conda - but our developers don't touch it.
99% of professional python developers think that you've pulled this statistic out of your ass!
> 99% of professional python developers think that you've pulled this statistic out of your ass!
72.6% of all statistics are made up, anyway.
I struggle to see what spending time looking at Poetry will yield in terms of any actual benefit, though I would love to be informed/educated otherwise.
I don’t expect my dev tools to be idiot proof, but they should at least try to be “I’ve been hacking for 18 hours straight and I just need to commit this last line and I can finally go to bed” proof.
That prevents you from ever doing a pip install in your root environment.
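For anyone wondering how to get that behaviour, it's presumably pip's require-virtualenv setting:

    export PIP_REQUIRE_VIRTUALENV=true  # pip refuses to install outside an active venv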
I've got about 20 different pip environments, and virtualenvwrapper (workon xxxx) makes it pretty seamless for me to hop back and forth between sessions. I'm also pretty dedicated to doing all my work in tmux windows - so my state in a half dozen virtualenvs is changed by changing my window (in which I've done a workon).
I guess what I'm really interested in is: "The last three years of my life I've used virtualenv/wrappers + pip, and haven't run into any problems. What can Poetry do for me, and why should I change my work habits?" Genuinely interested in using new and better tools.
I don't think I've even written a setup.py .
Obviously there's a whole world of development and deployment where these things are relevant, but there's also a massive world where nobody even understands what they are missing.
Need something like cabal. And a package index.
Have never used cargo - what can cargo do that conda cannot?
Alternatively, what does cargo do better than conda if they are not feature-for-feature comparable ?
I'm unsure what the core issues are, but in my experience cargo was always pretty quick to use, and if it fails, it fails fast.
Conda on the other hand is slow for the simple cases, and if the graph becomes complex it will just churn for 15 minutes before throwing its hands in the air and giving up with some cryptic error.
I suspect it comes back to the fact that packaging and dependency management was thought about upfront for Rust and the whole ecosystem was built well from the get go?
> This is due to the fact that not all libraries on PyPI have properly declared their metadata and, as such, they are not available via the PyPI JSON API. At this point, Poetry has no choice but downloading the packages and inspect them to get the necessary information. This is an expensive operation, both in bandwidth and time, which is why it seems this is a long process.
Cargo doesn't need to do anything with packages directly to do its job; everything it needs is in the index. This makes it pretty fast.
This sounds like something that could be done server side, either by PyPI or another entity and expose it through a new API endpoint, instead of doing it on every Python developer's machine.
There are more, but those are the big three.
I used conda to manage 2.6, 2.7 and 3.3 side by side, and that was fine. I never locked the patch version (e.g. 2.7.3 vs 2.7.5), though that is definitely possible.
It apparently requires a specific workflow - explicitly editing the requirements.txt file, rather than freezing it - which is no harder, and which I was doing since day 1, but is apparently uncommon.
(And it worked well across a few tens of machines, some running windows and some running Linux, with a mix of os versions and distributions. So I know it worked well already in 2013, and I’m sure it works better now).
Speed is not good, but I never had it take more than a minute for anything. Some people here are reporting 15 minutes resolution times - that is a real problem.
You make sure everyone uses the same Python version by setting it as a dependency. I mentioned I only set the dependencies on minor version (e.g. 2.7) rather than patch version (e.g. 2.7.3), but the latter is supported - for the Python version as well as for any other package.
You make sure the exact versions you want are in use by editing and curating the requirements.txt rather than freezing it. It really is that simple, but somehow that's not a common workflow.
prod vs. dev is the only one I didn't address because I don't have experience with that - but I know people who manage "requirements_prod.txt" vs "requirements_dev.txt" and it seems to work for them.
A better question, what’s your workflow for installing a dependency and 6 months later updating that dependency?
That’s basically the whole idea of conda’s solver, I think. If your list works - fine. If not, it will find a set of package installs that makes it work.
I guess that’s also why my resolution times are way faster than what some people describe.
I treat requirements.txt closer to a manually curated lock file than a wishlist (which most req.txt files in the wild are).
But manually curating the dependencies has been painless and works fine for me and my team for almost a decade now.
I've never used that particular feature, but then I've been using Python since 1.5 and I'm just used to it being a bit behind the times. Stockholm syndrome, you might say, especially after trying out Rust and seeing the work of art cargo is.
The two worst are in the other comments: ensuring sync'd dependencies across multiple environments and developers, and the horrendous resolution times leading to a useless error message when failures occur.
You found a bug report for a problem specifically on fish on MacOS, which can be resolved by "deactivate / reactivate" according to the discussion.
Were you trying to say something?
What are those "workspaces" you refer to?
What is "feature selection"?
At this point, this criticism has become a "thing" that no one really expands on or enumerates as if it's a given. It's not a given, and at the very least is nuanced and complicated.
No doubt that Python is sometimes fast enough (e.g., vanilla CRUD apps that do all of the heavy lifting in Postgres), but sometimes it's not, and you're left with really crummy optimization options. And since we rarely can know with certainty at the outset of a project whether or not the bottlenecks will be amenable to the optimizations afforded by Python, it's a dangerous game. I would even go so far as to say that other languages have become quite good at many of the things that Python is good at (namely pace of development) while being much better at the things that it's not good at (performance, package management, etc), so I actually wouldn't recommend starting new projects in Python except for certain niches, like scientific computing (and who knows if Python will even retain its dominance there).
As for scientific computing domain, I would start a new project in Julia rather than Python.
The XKCD comic on Python package managers/Python environments is not an exaggeration. I've always wanted to get more into Python but every time I attempt to, it's this hurdle that dissuades me.
Edit: Also, I guess Poetry is another thing that came along since my last attempt.
Thanks, I haven't seen it before. It's really quite close to the state on my home machine:
It's not that I couldn't eventually resolve the current state, it's that at the moment a few programs I regularly run work, in spite of the different dependencies they have, and I know that starting "cleaning up" will cost too much time.
> Thanks, I haven't seen it before. It's really quite close to the state on my home machine:
Hmm. I've had machines with similar states, though never with both Anaconda and Homebrew.
Instead, I often had (in addition to the rest of the system-level snarl) isolated per-app Python interpreters that were specified & constructed with Buildout for development testing & deployment.
Though I don't quite get the arrow from Homebrew 2.7 to Python.org 2.6. Is that actually a thing or hyperbole?
It's fiction, of course. Art often has to exaggerate to make a point. I exaggerated too, just to remain in the artistic rather than the technical spirit of the comics. But there are some important bits of truth there, like in every good joke.
I thought C and C++ might have similar issues since there's no unified package management there either.
That is a reasonable way to deploy to someone who is not a programmer or a tech person.
Not sure how much has changed since this was written: http://effbot.org/pyfaq/can-python-be-compiled-to-machine-co...
PyInstaller... well... it works okay with the stdlib... sometimes it does not. It is the Electron solution: add pandas to a project and you get a 600MB install.
As developers we can manage to deal with package managers.
1. Install foo
2. No not that foo!
3. Reinstall python
4. Goto 1
Same customer, another project: I'm experimenting with a deployment system based on git format-patch. It copies the patches to the server (we have only one server) and applies them with patch. Then restarts the web app.
It's fun to learn the internals by rewriting the tooling, but good tooling to start with would be better.
If anything, this further corporate influence on Python development is not something to be applauded. I bet there will be lots of more churn in the next five years, big announcements and no results.
I guess he is going for a Diocletian cabbage farmer style retirement.