Python has a lot of problems that really slow down development, but they are all fixable.
The biggest issue, in my opinion, is in dependency management. Python has a horrible dependency management system, from top-to-bottom.
Why do I need to make a "virtual environment" to have separate dependencies, and then source it in my shell?
Why do I need to manually add version numbers to a file?
Why isn't there any builtin way to automatically define a lock file? (Currently, most Python projects don't even specify indirect dependency versions, and many Python developers probably don't even realize this is an issue!)
Why can't I parallelize dependency installation?
Why isn't there a builtin way to create a redistributable executable with all my dependencies?
Why do I need to have fresh copies of my dependencies, even if they are the same versions, in each virtual environment?
There is so much chaos, I've seen very few projects that actually have reproducible builds. Most people just cross their fingers and hope dependencies don't change, and they just "deal with" the horrible kludge that is a virtual environment.
We need official support for a modern package management system, from the Python org itself. Third party solutions don't cut it, because they just end up being incompatible with each other.
Example: if the Python interpreter knew just a little bit about dependencies, it could pull in the correct version from a global cache - no need to reinstall the same module over and over again, just use the shared copy. Imagine how many CPU cycles would be saved. No more need for special wrapper tools like "tox".
I've always seen it like this: Not everyone builds reproducible software with Python (or in general) and how you handle dependencies can vary. Python leaves it open how you do it: globally installed packages, local packages, or a mix of both.
In the end, it needs to find the import in the PYTHONPATH, so there's no magic involved, and there are multiple robust options to choose from.
So instead of bashing Python for not shoveling down an opinion on you, it's up to the developers to choose which tools they want to use.
If they don't choose one and are unable to freeze their dependencies, it's not a Python problem, but IMO lack of skill and seniority.
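To make the PYTHONPATH point concrete, here is a minimal sketch of asking the import system where it would find a module. It only uses the standard library; `where_is` is a name made up for this illustration, and `json` is just a convenient stdlib module to look up.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a top-level import would load, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# Entries from PYTHONPATH are placed near the front of sys.path,
# and the import system searches these entries in order.
for entry in sys.path:
    print(entry)

print(where_is("json"))
```

Whatever tool you pick (venv, pipenv, poetry), the end result is a different `sys.path`, which is why there is "no magic involved".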
You can have both: provide a sane default for most users and allow people to roll their own.
The reason Python gets extra criticism for this is that it likes to tell people that there should be one obvious way to do it and that it comes with batteries included, yet its dependency management system is just crap and doesn't follow that at all.
Yes :-) It's fair to say Python's approach to dependency management doesn't follow the Zen of Python, but there's a simple way documented in the tutorial:
https://docs.python.org/3/tutorial/venv.html
The fact that there's more than one way to do things in Python is why I've found it so easy and flexible. I have no idea why that goober put this motto in the Zen.
It's a general design guideline, and I like the Zen of Python (PEP 20). Explicit is better than implicit, and most packaging systems in Python are explicit, which I like. I've been using it for over 15 years, after Perl, and have been happy with it.
Nothing to complain about, as every language has its own set of good and bad. This is what makes it interesting: there is always room to improve and make things better.
I think they could learn a lot from Rust, which has a very usable, clearly defined way of listing and making dependencies. You can decide how you want to handle individual dependencies (version number, version range, git commit hash, wildcard, etc). I'm not sure how binary dependencies work (i.e. something from your system's package manager), but I've used projects that use them, so the problem is solvable.
Python has always stood out for me as a particularly odd way of doing it. It feels a bit more like C, but with a package manager that's not quite as nice as other scripting languages have.
It's from the days when Perl was Python's main rival (the late 90s / early 00s). Perl has complex syntax and the "there's more than one way to do it" motto.
Syntactically, and especially in early Python, there were fewer ways of doing things than in Perl and Python people saw that as a positive.
Wait, are you seriously complaining about executing code you downloaded from the internet, that installs a package manager - i.e. a piece of software that downloads executable code from the internet?!
I think what the comment you are replying to is getting at is the fact that installing pip packages from the Internet and importing them in your Python app is not that different from piping code from the Internet into your Python executable. In both cases Python code from the Internet will be executed with your user privileges from within Python. Unless you audit every Python package you consume, you might as well accept a curl https://example.com | python installer too.
Yeah, I hate this trend. Unfortunately, you can't pip install poetry because it needs to manage packages, so I guess a different way was necessary. Still, OS-specific packages would be nice, I guess they just need volunteers.
It’s running over HTTPS from an auditable source. Is that _really_ so much worse than a pip install, and can you explain in detail why you believe that to be true?
I teach my kids to use the right tool for the job, because using the wrong tool for the job can lead to injuries. But I violate this all the time, myself. It's just a good habit to get into.
"curl | bash" is a bad habit to get into. It works under certain circumstances, like making sure it's an SSL connection from a source you trust. But it's just a bad habit for the average person to get into.
Yes, funny, but seriously, where's the threat model where you've analyzed the risks of installing code from GitHub over HTTPS and found it to be less secure?
To be clear, either of these methods can have problems, it's not unique to curl and your shell of choice. Some of the better open source projects will say up front that if you are concerned about this kind of thing, feel free to read the installer script and decide for yourself if everything's kosher.
Yes, my point was that if you're worried about running someone else's code the answer is to audit that code rather than the transport layer. There are valid concerns with HTTP or in scenarios where something could be targeted to a single user, but neither of those are relevant to 99% of the time people raise this complaint.
There's always the risk that the script will fail to completely download and leave your system in a broken state. This can be mitigated against by the script authors by wrapping everything in a function which is called on the last line, but how do you know they've done that without downloading the script and checking first?
Do you believe GitHub has that infrastructure deployed? If not, this is a blind alley to worry about. If so, what other precautions have you taken to avoid compromised tarballs, unauthorized pushes to repos with auto-deployment pipelines, etc.?
The point is that in reality you’re orders of magnitude more likely to be compromised by ads in your browser, an undetected flaw in legitimate code, or a compromised maintainer than GitHub having deployed custom infrastructure to target you. If you’re being targeted by a government, why would they do this instead of using the same TLS exploit to serve you a dodgy Chrome or OS update which is harder to detect and will work against 100% of targets?
So because ads can compromise us we should ignore the security of package managers?
How about this for a reason, where are the checksums when I’m curling and piping? How do I validate in an automated fashion the validity of this file I’m piping into an interpreter? When installing a package it’s quite easy to have redundant copies of an index with checksums pointing to a repository hosting the actual code. The attack surface is much smaller vs a curl | python
This is bad practice, stop promoting it or downplaying its security issues.
HTTPS has checksums, and note that we’re specifically talking about installing from Github, where every change is tracked.
> This is bad practice, stop promoting it or downplaying it’s security issues.
I’m trying to get you to do some security analysis focused on threats which are possible in this model but not the real alternatives (download and install, install from a registry like PyPI or NPM, etc.). So far we have “GitHub could choose to destroy their business”, which seems like an acceptable risk and about the same as “NPM could destroy their business”.
HTTPS doesn’t know if the file changed on the server so that doesn’t count here.
I am doing security analysis. If this file changes and I’m using it in built server images then I have no way of automatically validating the changes are good without doing the checksumming myself and managing this data. What we have is a server that can be hacked and the files are unable to be verified by checksum
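For what it's worth, doing the checksumming yourself is not much code. A minimal sketch, assuming you have already downloaded the installer into bytes and keep the expected SHA-256 digest pinned in your own repo; `sha256_hex` and `verify_script` are hypothetical helper names, not part of any installer.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_script(data: bytes, expected_sha256: str) -> bool:
    """True only if the downloaded bytes match the digest you pinned earlier."""
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


installer = b"print('installing...')\n"  # stand-in for the downloaded script
pinned = sha256_hex(installer)            # in practice, a digest recorded at review time
if verify_script(installer, pinned):
    print("digest matches, safe to hand to the interpreter")
```

This only tells you the file has not changed since you last looked at it; someone still has to read the script once and manage the pinned digest.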
> Also installable via pip, but... "not recommended", and:
If you install it via pip you need to update it via pip, the alternative would be insane. And the reason it's not recommended is that it doesn't let you use multiple Python versions, but if you're only using one version then installing by pip works fine.
They could just as easily add the same code to setup.py, and then pip would run it as soon as you run pip install. There's generally no security difference between curl | python and pip install.
I agree. Most of the issues the parent mentions have been solved with poetry and pipenv.
And if you need "to create a redistributable executable with all your dependencies", you can use either pyinstaller [0] or nuitka [1], both of which are very actively maintained/developed and continually improving.
Pipenv is plagued with problems and issues. It takes half an hour to install dependencies to our project. The `--keep-outdated` flag doesn’t (didn’t?) work, so I don’t know if my pipfile is being modified because the constraints require changing versions or because the package manager is errantly updating versions to latest. There are mixed messages about the kind of quality the project aims for. I would not recommend.
Frankly I’ve been burned enough that I won’t use any new packaging technology for Python because everyone thinks they’ve solved it, but once you’re invested you run into issues.
Anyone considering it for production usage should note that package installs in the current versions are much slower than pip or Pipenv. This might affect your CI/CD.
Could you give some details as to why it's better than other more commonly used tools (pip, venv, ...)?
Looking at the home page it's not immediately obvious to me. For example, the lock file it creates seems to be the equivalent of writing `pip freeze` to the requirements file. I see a quick mention of isolation at the end, it seems to use virtual environments, does it make it more seamless? What's the advantage over using virtualenv for example?
I'm not an expert on the internals, but virtualenv interactions feel more seamless. When you run poetry, it activates the virtualenv before it runs whatever you wanted.
So `poetry add` (its version of pip install) doesn't require you to have the virtualenv active. It will activate it, run the install, and update your dependency specifications in pyproject.toml. You can also do `poetry run` and it will activate the virtualenv before it runs whatever shell command comes after. Or you can do `poetry shell` to run a shell inside the virtualenv.
Python's dependency hell is what made me first look at Julia. I develop on Windows (someone has to :) ), and it was just impossible to get all of the numerical libraries like pydstool, scipy, FEniCS, Daedalus, etc. playing nicely together... so I gave Julia a try. And now the only time I have issues getting a package to run are Julia packages which have a Python dependency. Python is a good language, but having everything in one language and binary-free is just a blessing for getting code to run on someone else's computer.
I've had a good experience with pip-tools (https://github.com/jazzband/pip-tools/) which takes a requirements.in with loosely-pinned dependencies and writes your requirements.txt with the exact versions including transitive dependencies.
Same here. In my team we had immediate dependencies defined in setup.cfg. When a PR was merged, pip-compile was run to generate requirements.txt, which was stored in a central database (in our case Consul, because that was the easiest to get without involving ops).
pip-sync was then called to install it in the given environment. Any promotion from devint -> qa -> staging -> prod was just a matter of copying the requirements.txt from the earlier environment and calling pip-sync.
Take my upvote. This has helped us a ton. So nice that it resolves dependencies. Only issue we're running into is that we don't use it to manage our dependencies for our internal packages (only using it at the application level). I've been advocating we change so that we simply read in the generated requirements.txt/requirements-dev.txt in setup.py
Late to the party, but `pip-tools` also has a flag for its `pip-compile` command: `--generate-hashes`. It generates SHA256 hashes that `pip install` checks.
"If two of your dependencies are demanding overlapping versions of a library, pip will not necessarily install a version of this library that satisfies both requirements" e.g. https://github.com/pypa/pip/issues/2775
This is what I've always done. Develop using a few dependencies, freeze, continue development with reproducible builds. It has always included the sub-dependencies in the list so, as far as I can tell, this works great for that case...
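If you want to check that a frozen file really is fully pinned, a rough sketch like the following can help. It is a simplification I made up for illustration (`unpinned_requirements` is a hypothetical name) and it does not implement pip's full requirements-file grammar: it only treats a line as pinned if it contains `==` before any environment marker.

```python
def unpinned_requirements(text):
    """Return requirement lines that are not pinned with '=='."""
    loose = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue  # blank lines, comments, and options such as -r or --hash
        requirement = line.split(";", 1)[0]  # ignore environment markers
        if "==" not in requirement:
            loose.append(line)
    return loose


sample = """\
# direct dependencies
flask==1.1.1
requests>=2.0
jinja2
markupsafe==1.1.1 ; python_version >= '3'
"""
print(unpinned_requirements(sample))
```

Anything this reports is a version that can change underneath you between two installs.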
> nor does pip know what to do if your transitive dependencies conflict with each another
This is true, but because Python exposes all libraries in a single namespace at runtime, there isn't actually anything reasonable to do if they genuinely conflict. You can't have both, say, MarkupSafe 1.1.1 and MarkupSafe 1.1.0 in PYTHONPATH and expect them to be both accessible. There's no way in an import statement to say which one you want.
However, it's notable that pip runs into trouble in cases where transitive dependencies don't genuinely conflict, too. See https://github.com/pypa/pip/issues/988 - this is a bug / acknowledged deficiency, and there is work in progress towards fixing it.
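The single-namespace point is easy to demonstrate. A small sketch, using a throwaway module name (`demo_markup`, made up here) instead of the real MarkupSafe: two "versions" are placed on the path, and an import by name can only ever give you one of them.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_with_two_versions(name="demo_markup"):
    """Put two 'versions' of one module on sys.path and import it by name."""
    with tempfile.TemporaryDirectory() as tmp:
        dirs = []
        for version in ("1.1.1", "1.1.0"):
            d = Path(tmp) / version
            d.mkdir()
            (d / f"{name}.py").write_text(f"__version__ = {version!r}\n")
            dirs.append(str(d))
        sys.path[:0] = dirs  # both copies are now reachable in principle
        importlib.invalidate_caches()
        try:
            first = importlib.import_module(name)
            second = importlib.import_module(name)  # served from sys.modules
            return first.__version__, first is second
        finally:
            # Clean up so the demo leaves the interpreter state untouched.
            for d in dirs:
                sys.path.remove(d)
            sys.modules.pop(name, None)


print(import_with_two_versions())  # ('1.1.1', True)
```

Whichever directory comes first on the path wins, and the second import is just the cached module; the import statement has no way to ask for the other copy.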
It would change the semantics of the language. You could also write a sys.path hook to interpret the remainder of the file as Ruby and not Python, were pip so inclined....
(Also it's not clear what those changed semantics would be.)
Import system is pluggable, so the semantics are there to be customized. Sure, it could be abused (as many things in Python), but an import hook that checks for a vendored dependency with specific version, seems like a reasonable way to resolve the problem above.
But it changes the semantics of the rest of the language, e.g., if two modules interoperate by passing a type of a third module between themselves, and now there are two copies of that third module, they can't communicate any more.
Getting this right and reliable would be a) a considerable language design project in its own right and b) confusing to users of Python as it is documented, and in particular to people testing their modules locally without pip. It wouldn't be as drastically different a language as Ruby, but it would certainly be a different language.
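A sketch of the interoperability problem described above, assuming a hypothetical hook that loads a private copy of the third module for each dependent. Here `load_copy` and the `Markup` class are made up for the demo; the same source file is simply loaded twice under different names.

```python
import importlib.util
import tempfile
from pathlib import Path

SOURCE = "class Markup(str):\n    pass\n"


def load_copy(path, alias):
    """Load the file at `path` as a separate module object named `alias`."""
    spec = importlib.util.spec_from_file_location(alias, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def duplicated_types_interoperate():
    """Check whether a value built by one copy is recognised by the other."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "third.py"
        path.write_text(SOURCE)
        copy_a = load_copy(path, "third_for_a")
        copy_b = load_copy(path, "third_for_b")
    value = copy_a.Markup("<b>")  # created by module A's copy
    return isinstance(value, copy_b.Markup)  # checked against module B's copy


print(duplicated_types_interoperate())  # False
```

The two classes have identical source but are distinct objects, so `isinstance` checks, exception handling, and anything else keyed on type identity stop working across the boundary.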
>Pip freeze does not resolve transitive dependencies
How? Doesn't pip freeze literally list all packages that are installed in the current environment besides basic tooling such as setuptools (and you could even instruct it to list those as well)?
I'm not sure about conflict resolution, but when I run a pip freeze it adds TONS of dependencies outside of the 2-3 I had in my app because those were the dependencies of my dependencies.
I think that is what you want. Having all your dependencies, including their dependencies, explicitly specified (including name and version) is what gives you reproducible builds.
Ruby does the same thing with Gemfile.lock. npm does the same thing with package-lock.json.
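To see why freeze output includes the dependencies of your dependencies, here is a rough approximation using only the standard library (Python 3.8+). This is not how pip implements `pip freeze`, and `freeze_like` is a name made up here; it just lists every distribution installed in the current environment, with no notion of which ones you asked for directly.

```python
from importlib import metadata


def freeze_like():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in freeze_like():
    print(line)
```

Everything in the environment shows up, transitive or not, which is exactly what makes the result usable as a lock file.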
You can do that with any library. You can issue Django commands by running `python -m django`; that doesn't change the fact that Django is a completely separate project from Python.
It catches way too much. IPython, black and the testing libraries are _not_ a part of my actual dependencies and shouldn't be installed in production. A good UI for a dependency manager at the very least distinguishes between dev and production context, and ideally lets me define custom contexts.
Why do you want to update your dependencies if they work? Isn't the whole point of dependency management to avoid using different versions of dependencies than the ones they have been tested on?
Security fixes, performance enhancements, new features. There are many reasons. But the point is you update in a controlled manner. You don't just push the latest version of everything out on to prod, but you also don't keep pushing the same version that worked a year ago.
saddened to see this poorly constructed comment berating python at the top of this thread. the author seems to have some personal issues with the language given the generally frustrated tone of the comment. the entire comment could have just been 1 line "We need official support for a modern package management system, from the Python org itself." which would be consumed as constructive feedback by all readers with the right context. but somehow the author chooses to "vent" adding unnecessary drama to something that does not get in the way of writing high quality production grade python apps (general purpose, web, ai or otherwise)
there is no language that is devoid of shortcomings - so to any new (<3 yrs exp) python users, please ignore the above comment entirely as it has no bearing on anything practical that you are doing/will do. and all experienced python users know that there are ways to work around the shortcomings listed here and beyond.
> the author seems to have some personal issues with the language given the generally frustrated tone of the comment
What "personal issues" do you think the author has? The frustrated tone comes from the frustrations the author explicitly outlines; unless you think this shouldn't be so, you are turning this into an ad-hom.
> the entire comment could have just been 1 line "We need official support for a modern package management system, from the Python org itself."
Why? Because you don't appreciate the detail on why we need such a thing? These issues certainly get in the way of producing production apps; not in the sense that they make it impossible, but they make the process harder and slower than it needs to be.
I actually quite liked that post. I use python maybe once a year or less, and don't enjoy the experience. That post distinguished some of the details which in my rare usage I see simply as a gloopy mess.
It is funny - half of the real desire/need for containers comes back to these sorts of issue with both node and Python. And then they bring in their own different challenges.
I have been programming with Node for the last 3 years and I have never had any dependency issues with it (at least for 3rd party dependencies). I cannot say the same for Python, which requires some tool, be it Docker or virtualenv, to isolate a project's dependencies from the already installed ones.
Node's dependency managers npm/yarn just copy the versioned dependencies from their cache folder into the local node_modules folder and remove duplicate transitive dependencies when possible by flattening them into node_modules.
Lucky! I wrote a small internal app in node for my company that relied on an IMAP library. 3 months after launch, someone upgraded the library and my app stopped working. Stack traces were incomprehensible. No “how to upgrade” documentation in sight.
So I spent 2 hours and rewrote it in Java 8 with Maven.
Issues all gone. Node has some work to do before I’ll consider touching it again.
Having such a basic part of a programming language be awful is inexcusable. It's not just that it takes a lot of time; even if it took no extra time, you're still wasting extra space on your computer, risking breakage on external updates, and compromising security because you can't even tell what code you're running.
50% of C++ devs don't use a package manager and 27% rely on a system package manager [1]. You don't hear C++ devs complaining about these issues not because they're happy with the state of dependency management in C++ but because there's a very low rate of adoption for package management systems. That, and the state of dependency management in C++ was so bad for so long that it's viewed as a fact of life.
Also with C and C++ your dependencies compile with your code into a single binary, unless you explicitly opt into using a library, and when you do it becomes a package manager's issue not yours.
That's vastly oversimplifying the problem. "DLL hell" is a term for a reason. The vast amount of effort and complexity Microsoft has put into managing this problem is proof that dependency management for C and C++ is not a solved problem.
We definitely care about it. And I don't know why you think C++ developers can just push issues onto packagers when 50% don't use any kind of package management system. Meaning compiling libraries from source.
If a project you want to depend on isn't using a dependency management framework, how would you then make it work in your project? You will have to do extra work to define the transitive dependencies!
What needs to happen is standardization - this has been done in Java because of its maturity. There's almost no Java project that isn't using the standard Maven dependency management (even projects that don't use Maven, such as Gradle projects, would use Maven dependency management, and export themselves as an artifact usable via Maven).
JavaScript has an even worse problem, so Python isn't alone, methinks...
First of all, Poetry locks every dependency (even transitive ones) to the version you know works. This solves the problem of the project not using dependency management properly.
Secondly, setup.py allows you to specify your dependencies there, so most libraries use and specify that, which means that that isn't that much of a problem in Python. Sure, sometimes it is, but I haven't run into that particular problem very often.
Last time I gave it a go, I found it was pretty strongly welded to virtualenv, rather than using Python's own (much less problematic) venv. I came away less than enthused as a result.
(To be fair, modifying it to use venv is... non-trivial).
Pipenv has a good approach to management, similar to npm, but the implementation is buggy. It gained popularity by being recommended very early on in the official python documentation while being misleadingly advertised as production-ready.
If you look at the issues section in its github repo, you'll see that there are some pretty basic bugs there which are very annoying or disruptive. Moreover it seems the author has almost left the boat and a handful of contributors have to tidy things up.
Just to illustrate my point: I think a package manager that takes a few minutes to install a single tiny package, doesn't prevent you from adding non-existent packages (e.g. a spelling mistake), or doesn't let you install a new package without trying to upgrade all other packages isn't really production-ready. These have been known issues since November last year.
Excuse me, but as a long time Python user I have to disagree. I started using Rust two years ago and Rust’s dependency management is easily the best thing I ever saw (keep in mind that I didn’t see everything, so there is a chance there are better things out there).
The project-/dependency-manager Cargo¹ is more “pythonic” than anything Python ever came up with and where others mumbled ”dependency hell is everywhere” the Rust people seem to have thought something like: ”there must be a way to do this properly”.
The whole thing gives me hope for Python, but angers me every time I start a new Python project. Poetry is good, but there should be an official way to do this.
It saddens me to see that some people seem to have just given up on dependency management altogether and declared it unsolvable (which it is not, and probably never has been).
There's a pattern to this. The later the dependency manager was created, the better it is. This is a hard problem space where each new language got to use the lessons learned on the earlier ones.
Cargo, though, has a silver bullet. If it can't find a solution to determine a single version for a package, it simply includes more than one version in the object code. That would take a lot of work to duplicate in Python.
Unfortunately, pip came along and took over, despite lacking support for that. It would have been nice if a better packaging tool had replaced easy_install, but alas.
They definitely gave this some thought while designing the library system (”crates”) for the language. I am not sure if it is feasible to retrofit such a solution to something like Python... Python 4 maybe?
Rust's cargo, JS's yarn, and the granddaddy of them all, Ruby's bundler, address all these issues. Even newer versions of Gradle support a workflow where you specify the versions you know you want and just lock everything else, including transitive dependencies, down.
This is the other frustrating thing: there is this stockholm syndrome effect, because people are so used to dependency management being horrible, they think there are just no good dependency management systems, and they give up.
GitHub is still the host for many of them, but there are Modules, so you get proper versioning and all that even when the place you end up getting them from is GitHub.
I think ruby is alive and well for a lot of startups. I do think it is being squeezed on three sides though.
* From javascript. If you have an app-like front end, you are going to use js. Why not have the whole stack be js and have your developers use only one language?
* From python for anything web + data science. Again, why not have your whole stack be in one language?
* From lack of hype. Rails is still evolving, but a lot of packages are not seeing releases (I have used packages 3-4 years old). This indicates to me that the energy isn't there in the community the way it used to be. I have seen the consultants who are always on the bleeding edge move on to elixir.
That said I have seen plenty of startups using ruby (really rails) and staffing when I hired for ruby wasn't an issue.
I do help run a local ruby meetup and attendance is good but not exceptional (15-40 people every month). So that may skew my viewpoint.
"From JavaScript" also includes another side: When your frontend is in JS, your backend can be a simple REST API. And building a REST API requires much less framework than building a server-side-rendering webapp does, so it's tempting to use Go or Rust or whatever you like.
You’ll need (probably) at least:
-Database connection
-An ORM
-Middleware against attacks / rate limiting
-Caching
-Jobs / workers
-A rendering engine for email and maybe pdf
-Some sort of admin/backend
-Logging
-Validation
I’ve written an API once from scratch. Actually twice. First time in Modena, because it was all the hype, but it was arcane. Then Sinatra, where I ended up creating all of the above. Rails is excellent for APIs.
Rust is nice, but I’m not sure if I’d like it for all of an API. I don’t like go. Crystal seems great, because it’s typed and it’s also super fast.
Agreed. I still think that ruby is great for jamming out an API (far better in terms of development speed than go or rust) but a lot of the great gems that can speed up development assume server side rendering. That plus the fact that go/rust/whatever are probably more "interesting" and faster (at runtime) than ruby is an additional obstacle (for ruby!).
I loved Ruby, but unfortunately it didn't hold on to any kind of "killer app" role after Rails clones showed up in other languages. I've switched to Elixir/Phoenix in that space and not looked back.
With the rise in ML and data science over the past years, Python finally has a killer app that no other scripting languages come close to touching. I migrated completely from Ruby when I started dabbling in ML, Pandas, etc.
So, as someone who spends maybe 20% of their time hiring, it's still a very effective screen. You wouldn't believe how many people can't do it. People at big companies, respected places. It's surprising.
I used a different screen (having people make change based on an arbitrary amount, so if the input was 81, you'd return [25, 25, 25, 5, 1], as we were in the USA) and it was also helpful. I didn't track the number of people that it stymied though.
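For anyone curious, one possible answer to that screen is a greedy pass over the coins, largest first. This is a sketch, not the interviewer's reference solution; `make_change` and its default coin set are my own choices. Greedy happens to be optimal for US coins but is not for arbitrary denominations.

```python
def make_change(amount, coins=(25, 10, 5, 1)):
    """Greedy change-making; correct for US coins, not for arbitrary coin sets."""
    result = []
    for coin in sorted(coins, reverse=True):
        count, amount = divmod(amount, coin)  # how many of this coin fit
        result.extend([coin] * count)
    if amount:
        raise ValueError("amount cannot be made with these coins")
    return result


print(make_change(81))  # [25, 25, 25, 5, 1]
```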
Yah, that's also a good one. I like the variant that asks how many different ways you can make change for a given amount and a given array of currencies.
(I always feel weird talking about interview questions publicly, but honestly anyone who prepares that diligently deserves to go to the next stage. If anyone's reading this because they're preparing for an interview with me and I ask this question, just mention this comment and I'll be impressed.)
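The counting variant mentioned above is usually solved with a small dynamic programming table. A minimal sketch, assuming the denominations are distinct positive integers; `count_ways` is just an illustrative name.

```python
def count_ways(amount, coins):
    """Count the distinct combinations of coins that sum to `amount`."""
    ways = [1] + [0] * amount  # one way to make 0: use no coins
    for coin in coins:
        # Processing one coin at a time counts combinations, not orderings.
        for total in range(coin, amount + 1):
            ways[total] += ways[total - coin]
    return ways[amount]


print(count_ways(10, [1, 5, 10]))  # 4
```

Putting the coin loop on the outside is the design choice that matters: swapping the two loops would count 5+1 and 1+5 as different answers.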
I am trying to find a place in the industry - again, starting from RoR. I absolutely love Ruby. And all this talk of "Ruby dying" makes me feel sad. The rational thing to do is to move on, and learn something popular, like node.js but the more I see Ruby in action, I just can't pull myself away from it.
I had managed to get a job as a Java developer a long time ago, but at that time all I could do was barely write toy apps in Java and I had no exposure to stuff like design patterns. The whole experience left me in a bad place. Now after all these years, Ruby feels like a breath of fresh air, and the texts that I have come across on the subject - Design Patterns in Ruby, Practical OOD in Ruby, Ruby under a Microscope etc. have increased my interest in the language.
But more and more frequent articles on Ruby's decline are pretty disheartening.
Ruby's future may actually not be Ruby itself. Probably the major problem with Ruby is its performance, which is slow even compared to other interpreted languages. While I'm not sure it is really production ready yet, Crystal is very interesting -- it's a native compiled statically typed language that nevertheless feels very much like Ruby. Check it out if you haven't.
I saw Tenderlove's interview on SE Radio, about Ruby internals. He seemed optimistic, but also because he's been working on a performance related project for the past few years now. Anyway, I'm hopeful.
Python is uniquely ill-suited for dependency management compared to many other languages. For some reason dependencies are installed into the interpreter itself (I know what I just said is very imprecise/inaccurate but I think it gets the point across).
In JS, which also has a single interpreter installed across the system (or multiple if you use nvm), the packages aren't installed "directly" into the interpreter, which removes the need for things like virtual-envs, thus making life a lot easier. I wish Python did something like this.
That being said, pipenv is making things easier. However, I think pipenv is a workaround for more fundamental problems.
Thanks for that! I am tired of messing with conda, virtualenv, etc. and since I use simple Makefiles to build and run most of my code, I can easily stick with the standard latest stable version Python installation when using your trick.
EDIT: a question: when I have to use Python, I like to break up my code into small libraries and make wheel files for my own use. How do you handle your own libraries? Do you have one special local directory that you build wheel files to and then reference that library in your requirements.txt files?
> How do you handle your own libraries? Do you have one special local directory that you build wheel files to and then reference that library in your requirements.txt files?
We didn't build wheels. We had a centralized git host (Gitlab, but any of them works) with all our libraries, and just added the git url (git+https://...) to the requirements.txt
It's not a "model," but if you're able to 1) use fewer dependencies 2) use stable dependencies 3) use dependencies with fewer dependencies, it helps with dependency hell. I've even made commits to projects to reduce their dependency count.
I have found that I would rather code my own versions of some libraries so I have control over it. Even if there is some extra long term maintenance and some up front dev costs, it's paid off already a number of times.
A little off topic, but this is why I really like a Common Lisp with Quicklisp: library dependencies are stored in a convenient location locally and the libraries I write can be treated the same way (with a trivial config change to add Quicklisp load paths to my own library project directories).
Indeed. We have some Python modules written in Rust. It needs Rust nightly, because pyo3 requires Rust nightly. The Rust crate relies on libtensorflow. Unit tests for the Python module use Python and pytest. And we use our own build of libtensorflow (optimizations for AVX and FMA).
The dependencies of such projects are easy to specify in Nix. Moreover, it's easy to reproduce the environment across machines by pinning nixpkgs to a specific version.
I installed the Nix operating system on an old laptop earlier this year, and indeed it does solve a lot of development and devops problems. I retired this spring, so I only played with Nix out of curiosity, but if I still had an active career as a developer I would use Nix.
I agree that builtin tools suck for dependency management.
However a lot of the issues that you mentioned (such as lock file and transitive dependencies) can be handled by pipenv, which should be the default package manager
Python was a scripting language. All those problems are caused by people using it like something it isn't. Python has way outlived its usefulness and it's about time we move on to something better.
The virtualenv thing just galls me. Sure, pipenv aped rbenv - appropriately, I might add - but until they supplant virtualenv as the recommended way to have separate environments, I'll pass.
.NET's solution to this was the project file, a configuration file that lists the compiler version, framework version, and dependencies (now including NuGet packages and their versions).
> The biggest issue, in my opinion, is in dependency management. Python has a horrible dependency management system, from top-to-bottom.
I agree, although a lot of it has to do with there being so much misinformation on the web, and many articles recommending bad solutions. This is because Python went through many packaging solutions. IMO the setuptools one is the most common and is available by default. It has a weakness, though: it started with people writing a setup.py file and defining all parameters there. Because setup.py is actually a Python program, it encourages you to write it as a program, and that creates issues. For a while now, though, setuptools has had a declarative way to define packages using a setup.cfg file; you should use that, and your setup.py should contain nothing more than a call to setup().
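To make the declarative idea concrete, here is a minimal sketch. The setup.cfg contents, the project name and the dependency list are all made up for illustration. It only reads the file with the standard-library configparser to show that the metadata is plain data rather than a program; it is not how setuptools itself processes the file.

```python
import configparser

# Hypothetical setup.cfg contents; the project name and dependencies are made up.
SETUP_CFG = """
[metadata]
name = example-app
version = 0.1.0

[options]
packages = find:
install_requires =
    requests>=2.20,<3
    click>=7
"""


def read_install_requires(cfg_text):
    """Return the direct dependencies declared in setup.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    # One requirement per indented line; drop the empty first line.
    return [line.strip() for line in raw.splitlines() if line.strip()]


# The matching setup.py would then contain only:
#     from setuptools import setup
#     setup()

deps = read_install_requires(SETUP_CFG)
print(deps)
```

Because the file is just data, other tools can read the dependency list without executing any project code, which is the main thing a hand-written setup.py program gives up.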
> Why do I need to make a "virtual environment" to have separate dependencies, and then source it my shell?
Because chances are that your application A uses different versions than application B. Yes, this could be solved by allowing Python to keep multiple versions of the same package, but if virtualenv is bothering you, you would presumably want to count on the system package manager to take care of that, and rpm and deb don't offer this functionality by default. So you would once again have to use some kind of virtualenv-like environment that's disconnected from the system packages.
> Why do I need to manually add version numbers to a file?
You don't have to; this is one of the things there's a lot of misinformation about when it comes to packaging an application. You should create setup.py/cfg and declare your immediate dependencies; then you can optionally provide version _ranges_ that are acceptable.
I highly recommend installing pip-tools and using pip-compile to generate requirements.txt; that file then works like a lock file, essentially picking the latest versions within the restrictions in setup.cfg.
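As a rough illustration of what "works like a lock file" means, here is a small sketch that checks whether every entry in a requirements file is pinned to an exact version. It does not call pip-tools; the file contents below are hypothetical and only loosely shaped like pip-compile output, and the helper name is my own.

```python
def unpinned_requirements(requirements_text):
    """Return requirement lines that are not pinned to an exact version with '=='."""
    loose = []
    for line in requirements_text.splitlines():
        # Drop trailing comments such as "# via requests".
        entry = line.split("#", 1)[0].strip()
        # Skip blank lines and option lines such as "--hash=...".
        if not entry or entry.startswith("-"):
            continue
        if "==" not in entry:
            loose.append(entry)
    return loose


# Hypothetical generated file: direct and indirect dependencies, all pinned.
LOCKED = """
certifi==2019.6.16        # via requests
requests==2.22.0
urllib3==1.25.3           # via requests
"""

# Hypothetical hand-written file: only a range for the direct dependency.
RANGES = """
requests>=2.20,<3
"""

locked_loose = unpinned_requirements(LOCKED)
ranges_loose = unpinned_requirements(RANGES)
print(locked_loose)
print(ranges_loose)
```

The point of the split is that the ranges live with the package metadata, while the fully pinned list (including indirect dependencies) is generated and committed for reproducible installs.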
> Why isn't there any builtin way to automatically define a lock file (currently, most Python projects just don't even specify indirect dependency versions, many Python developers probably don't even realize this is an issue!!!!!)?
Because Python is old (it's older than Java), and lock files weren't a thing back then.
> Why can't I parallelize dependency installation?
Not sure I understand this one. yum, apt-get, etc. don't parallelize either, because it's prone to errors? TBH I never thought of this as an issue, because Python packages are relatively small and install quickly. The longest part was always downloading dependencies, but caching solves that.
> Why isn't there a builtin way to create a redistributable executable with all my dependencies?
Some people claim that Python has a kitchen sink and that this made it more complex; you're claiming it should have even more things built in. I don't see a problem: there are several solutions to package it as an executable. Also, it is a difficult problem to solve, because Python works on almost all platforms, including Windows and OS X.
> Why do I need to have fresh copies of my dependencies, even if they are the same versions, in each virtual environment?
You don't; you can install your dependencies in a system directory and configure virtualenv to see these packages as well. I prefer, though, to have it completely isolated from the system.
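For what it's worth, here is a minimal sketch of that option using the standard-library venv module rather than virtualenv itself (virtualenv has an equivalent --system-site-packages switch). The helper name and the temporary directory are just for illustration.

```python
import tempfile
import venv
from pathlib import Path


def make_env(env_dir, share_system_packages=True):
    """Create a virtual environment, optionally exposing system site-packages."""
    builder = venv.EnvBuilder(
        system_site_packages=share_system_packages,
        with_pip=False,  # keeps the sketch fast and offline; real projects usually want pip
    )
    builder.create(env_dir)
    # pyvenv.cfg records the choice so the interpreter honours it at startup.
    return (Path(env_dir) / "pyvenv.cfg").read_text()


with tempfile.TemporaryDirectory() as tmp:
    cfg_text = make_env(Path(tmp) / "env")
    print(cfg_text)
```

With the flag on, packages installed system-wide are importable inside the environment, and anything installed into the environment itself takes precedence. With it off you get the fully isolated setup.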
> There is so much chaos, I've seen very few projects that actually have reproducible builds. Most people just cross their fingers and hope dependencies don't change, and they just "deal with" the horrible kludge that is a virtual environment.
Not sure what to say; it works predictably for me and I actually really like virtualenv.
> We need official support for a modern package management system, from the Python org itself. Third party solutions don't cut it, because they just end up being incompatible with each other.
setuptools with declarative setup.cfg is IMO very close there.
> Example: if the Python interpreter knew just a little bit about dependencies, it could pull in the correct version from a global cache - no need to reinstall the same module over and over again, just use the shared copy. Imagine how many CPU cycles would be saved. No more need for special wrapper tools like "tox".
There is a global cache already, and pip utilizes it even within a virtualenv. I actually never needed to use tox myself. I think most of your problems come from there being a lot of bad information about how to package a Python app. Sadly, even the page from PyPA belongs there.
It's not that bad if you use the right tools. The two main options are an all-in-one solution like poetry or pipenv, and an ensemble of tools like pyenv, virtualenvwrapper, versioneer and pip-tools. I prefer the latter because it feels more like the Unix way.
Why should Python have some "official" method to do this? Flexibility is a strength, not a weakness. Nobody ever suggests that C should have some official package manager. Instead the developers build a build system for their project. After a while every project seems to get its own unique requirements so trying to use a cookie-cutter system seems pointless.
They almost rescinded that in PEP 357 and ultimately did so in PEPs 468/469. PEP -23 updated the standard library to match, but not until 3.9.1.a. Until then, beware the various blog posts you'll find on Google talking about this concept on 1.x.
It was always an ideal to aim for rather than a strict rule. I don't see any of those PEPs changing the balance enough to claim the principle was dead (maybe a bit injured...)
In general, leaving such things open leads to a proliferation of different 'solutions', as multiple people try to solve the issue... leading to the additional confusion and cognitive load of trying to find a single solution which suits your use-case and works, when often none of them are perfect.
Sometimes a 'benign dictator' single approach has benefits...
The virtual env is really the thing that has stopped me from using python. It's a lovely language but the tooling around it needs a lot of help. I'm sure it will get there though. I mean if the js folks can do it, certainly python can.
If it is running on its own computer, for shell scripting.
If it is trying to process ML data, or running in some cloud provider, or deployed in some IoT device supposed to run for years without maintenance, then maybe yes.
Right, but when you're at that point in performance considerations you already have a team of specialists working on multiple angles in performance.
And precisely, for ML code all python libraries run extremely optimized natively compiled code. The language overhead is a minimal consideration. And for business domain code language performance is rarely the limiting factor.
If your team size is 1 then you're not doing yourself any favor thinking about performance beyond basic usability when dev productivity is a far higher priority.
Thanks, that definitely looks like useful data as a starting point.
1. What is the impact of a continuous long-running process? That is, if instead of trying to calculate a result and then shut down, I'm running a web server 24/7, what's the impact of an interpreted language over a compiled language? (Assume requests are few and I'm happy with performance with either.) This models not only web servers but also things like data science workloads, where one wants to conduct as much research as possible, so a faster language will just encourage a researcher to submit more jobs.
2. According to https://www.epa.gov/energy/greenhouse-gases-equivalencies-ca... , 1 megawatt-hour of fossil fuels is 1559 pounds of carbon dioxide. The site you link calculates an excess of 2245 joules for running their test programs, which is approximately .001 pounds of carbon dioxide, or roughly what a human exhales in half a minute. (Put another way, if using the interpreted language saved even one minute of developer time, it was a net win for the carbon emissions of the program.)
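For anyone who wants to check the arithmetic in point 2, here is the conversion spelled out. The only inputs are the two figures quoted above plus the definition of a megawatt-hour; the variable names are mine.

```python
# 1 MWh = 1e6 watts * 3600 seconds = 3.6e9 joules
JOULES_PER_MWH = 1e6 * 3600

LB_CO2_PER_MWH = 1559   # pounds of CO2 per MWh, as quoted above
EXCESS_JOULES = 2245    # excess energy of the test programs, as quoted above

excess_mwh = EXCESS_JOULES / JOULES_PER_MWH
excess_lb_co2 = excess_mwh * LB_CO2_PER_MWH

print(f"{excess_mwh:.3e} MWh")
print(f"{excess_lb_co2:.5f} lb CO2")  # roughly .001 pounds, matching the estimate above
```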
> What is the impact of a continuous long-running process?
OK so you're asking about steady-state electricity consumption of a process that's idling? I would bet that it's still lower for a more energy-efficient language, but let's say purely for the sake of argument that they're both at parity, let's say (e.g.) 0. Now what happens when they both do one unit of work, e.g. one data science job? Suppose you're comparing C and Python. C is indexed at 1 by Table 4, and Python at 75.88. So even ignoring runtime, the Python version is 75 times more power-hungry than the baseline C. And this is for any given job.
> a faster language will just encourage a researcher to submit more jobs.
Sure, that's a behavioural issue. It's not a technical issue so I can't give you a technical solution to that one. Wider roads will lead to more traffic over time. What people will need to realize is that if they're doing science, shooting jobs at the server and 'seeing what sticks' is not a great way to do it. Ideally they should put in place processes that require an experimental design–hypothesis, test criteria, acceptance/rejection level, etc.–to be able to run these kinds of jobs.
> if using the interpreted language saved even one minute of developer time, it was a net win for the carbon emissions of the program
I don't understand, what does a developer's time/carbon emission have to do with the runtime energy efficiency of a program? They are two different things.
> What people will need to realize is that if they're doing science, shooting jobs at the server and 'seeing what sticks' is not a great way to do it. Ideally they should put in place processes that require an experimental design–hypothesis, test criteria, acceptance/rejection level, etc.–to be able to run these kinds of jobs.
Sure, but they don't, and perhaps that's a much bigger issue than interpreted vs. compiled languages - either for research workloads or for commercial workloads. People start startups all the time that end up failing, traveling to attract investors, flying people out to interview them, keeping the lights on all night, heading to an air-conditioned home and getting some sleep as the sun is rising, etc. instead of working quietly at a 40-hour-a-week job. What's the emissions cost of that?
> I don't understand, what does a developer's time/carbon emission have to do with the runtime energy efficiency of a program? They are two different things.
This matters most obviously for research workloads. If the goal of your project is "Figure out whether this protein works in this way" or "Find the correlation between these two stocks" or "See which demographic responded to our ads most often," then the cost of that project (in any sense - time, money, energy emissions) is both the cost of developing the program you're going to run and actually running it. This is probably most obvious with time: it is absolutely not worth switching from an O(n^2) algorithm to an O(n) one if that shaves two hours off the execution time and it takes you three hours to write the better algorithm (assuming the code doesn't get reused, of course, but in many real-world scenarios, the better algorithm takes days or weeks and it shaves seconds or minutes off the execution time). Development time and runtime are two different things - for instance, you can't measure development time in big-O notation in a sensible way - but they're definitely both time.
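A tiny sketch of that trade-off, with made-up numbers chosen to match the example (three extra hours of development to save two hours per run). The function name and the baseline figures are assumptions for illustration only.

```python
def total_hours(dev_hours, run_hours, runs=1):
    """Total project time: write the program once, then run it `runs` times."""
    return dev_hours + run_hours * runs


# Assumed figures: the optimized version takes 3 more hours to write
# and each run finishes 2 hours sooner.
BASE_DEV, BASE_RUN = 1.0, 3.0   # quick version
FAST_DEV, FAST_RUN = 4.0, 1.0   # optimized version

for runs in (1, 2):
    quick = total_hours(BASE_DEV, BASE_RUN, runs)
    optimized = total_hours(FAST_DEV, FAST_RUN, runs)
    print(f"runs={runs}: quick={quick}h optimized={optimized}h")
```

With a single run the quick version wins; once the code is reused the balance flips, which is why the "assuming the code doesn't get reused" caveat matters.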
Correct, and computers continue running. I'm referring to the carbon emissions of the development project itself. The faster the development is done, the sooner you can get on with developing other things.
It's a valid objection to the statement they replied to. Saving developer time does not equate to lower emissions, so it is incorrect to call it a "net win".
Sure, and trains burn fuel even when you aren't using them. But if we look at your carbon footprint, it doesn't seem wise to factor every single train on the planet into your specific account just because they don't all sit still when you aren't using them.
When talking about the footprint of a company or a project, you need to restrict the calculations to the resources they actually use. So if a project uses tools to get a product out quicker, that means they've spent fewer human-hours, which have a CO2 cost associated with them. Then you can weigh the cost of that tool against the human resources, both in a financial sense and with respect to emissions.