Hacker News new | past | comments | ask | show | jobs | submit login
Guido van Rossum joins Microsoft (twitter.com/gvanrossum)
1337 points by 0xmohit on Nov 12, 2020 | hide | past | favorite | 799 comments

So instead of being BDFL for Python, he's going to "make using Python better".

Congrats to him for finding something fun to do in retirement - dictators usually end up with a different outcome. ;)

I'm looking forward to seeing the future of Python - I think this move will be great for the whole community, and lets him push boundaries without being bogged down on the management side.

An official package manager with great dependency resolution would be fantastic. Or take over pipenv or poetry and sponsor it with Microsoft $$$.

The biggest hurdle to python right now is the stupid package managers. We need cargo for Python.

I think in general Python's biggest challenge is that it doesn't scale well. This is an agglomeration of issues around the same theme: bad packaging when there's a lot of cross-cutting dependencies, slow performance, no concurrency, typing as second-class citizens, etc. All of that is barely noticeable when you're just getting started on a small experimental project, but incredibly painful in large production systems.

I strongly suspect that devs' satisfaction with Python is strongly correlated with the size of the codebase they're working on. Generally people using Python for one-off projects or self-contained tools tend to be pretty happy. People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

What I've observed a lot is that many startups or greenfield projects start with Python to get an MVP out the door as fast as possible. Then as the scope of the software expands they feel increasingly bogged down and trapped in the language.

I work at Instagram, which is a O(millions) LOC Python monorepo with massive throughput and a large engineering team. It's actually quite nice — but our code is heavily, heavily typed. It would be miserable without the type system. Some of the older parts of the codebase are more loosely typed (although they're shrinking reasonably quickly), and those sections are indeed a huge pain to ramp up on.

Part of the success of IG's large Python codebase is FB's investment into developer tooling; for example, FB wrote (and open-sourced, FWIW) our own PEP-484 compliant type checker, Pyre [1], because mypy was too slow for a codebase of our size.

1: https://github.com/facebook/pyre-check

That's my major complaint about Python...

For its age and popularity, the tooling is abysmal.

I have to rewrite too many things that I'd expect to just be there, given the age of the project.

And some things are only getting fixed now! Merging dicts with `dict1 | dict2` (PEP 584) only started to work in 3.9!

`dict1 | dict2` is syntactic sugar; you've been able to do `dict1.update(dict2)` since forever.

No, it's completely different: `dict1 | dict2` creates a new object and leaves its inputs unchanged, while `dict1.update(dict2)` modifies dict1 in place.

you've been able to do the splat merge (`{**dict1, **dict2}`) forever! https://dpaste.org/yU7s
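The idioms from this subthread, side by side (a quick sketch, in case the paste link rots):

```python
a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}

merged_splat = {**a, **b}   # works since Python 3.5; right-most dict wins on conflicts
merged_pipe = a | b         # works since Python 3.9 (PEP 584); new dict, inputs untouched

a_copy = dict(a)
a_copy.update(b)            # the classic in-place merge, applied to a copy

assert merged_splat == merged_pipe == a_copy == {"x": 1, "y": 20, "z": 30}
assert a == {"x": 1, "y": 2}  # a itself is unchanged by | and **
```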

Yeah, you could use my hairline to tell when I’m working with Python... I still haven’t figured out a good typed framework for it yet.

I also struggle with Python's dynamic typing, compared to the static typing in C++, Java, or C# -- or even VBA(!).

Frustrated like you, I wrote my own open source type checking library for Python. You might be interested to read about it here: https://github.com/kevinarpe/kevinarpe-rambutan3/blob/master...

I have used that library on multiple projects for my job. It makes the code run about 50% slower, on average, because all the type checking is done at run-time. I am OK with the slow down because I don't use Python when I need speed. My "developer speed" was greatly improved with stricter types.

Finally, this isn't the first time I've written a type checking library/framework. I did the same for Perl more than 10 years ago. Unfortunately, that code is proprietary, so not open source. :( The abstract ideas were very similar. Our team was so frustrated with legacy Perl code that I wrote a type checking library, and we slowly applied it to the code base. About two years later, it was much less painful!

Try using Pyre! It's open-source. I use it daily at IG.

Have you guys published any whitepaper on this subject? These last few years working on moderately large Python codebases with dynamic typing have been less than idyllic.

"devop" here

> doesn't scale well.

Nothing scales well. Scaling requires lots of effort. It doesn't matter what language you use; you'll rapidly find all its pain points.

> bad packaging when there's a lot of cross-cutting dependencies

Much as I hate it, Docker solves this. Failing that, poetry, or if you must, venv. (If you're being "clever", statically compile everything and ship the whole environment, including the interpreter.) Its packaging is a joy compared to node. Even better, enforce standard environments, which stops all of this: one version of everything. You want to change it? Best upgrade it for everyone else too.

> slow performance

Meh, again it depends on your use case. If you're really into performance, then dump out to C/C++ and pybind it. Fronting performance-critical code with Python is a fairly decent way to let non-specialists handle and interface with that code. It's far cheaper to staff it that way too: standard Python programmers are cheaper than performance experts.
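pybind11 is the usual route for wrapping C++, but the same "front native code with Python" idea can be sketched with nothing beyond the stdlib, calling straight into the C math library via ctypes (assuming a Unix-like system where `find_library` can locate libm):

```python
import ctypes
import ctypes.util

# Locate and load the C math library (on Linux this resolves to e.g. libm.so.6).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature: double sqrt(double). Without this, ctypes
# would default to int arguments/returns and silently corrupt values.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

result = libm.sqrt(2.0)  # the actual work happens in compiled C
```

The Python layer stays readable glue, while the hot path runs at native speed.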

If we are being realistic, 80% of Python programs spend most of their time waiting on the network.

Granted, Python is not overly fast, but most of the time your bottleneck is the developer, not the language.

> no concurrency

Yes, this is a pain. I would really like some non-GIL-based threading. However, it's not really been that much of a problem. multiprocessing Queues are useful here, if limited. Failing that, spawn more processes and use an RPC system.
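A minimal sketch of the Queue pattern: a pool of worker processes pulling from an input queue and pushing results back (names are mine):

```python
from multiprocessing import Process, Queue

def square_worker(in_q, out_q):
    # Pull items until we see the None sentinel, push back the squares.
    for n in iter(in_q.get, None):
        out_q.put(n * n)

def parallel_squares(numbers, n_workers=2):
    in_q, out_q = Queue(), Queue()
    workers = [Process(target=square_worker, args=(in_q, out_q))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for n in numbers:
        in_q.put(n)
    for _ in workers:
        in_q.put(None)  # one sentinel per worker so each one shuts down
    results = sorted(out_q.get() for _ in numbers)
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))
```

Each worker is a separate interpreter, so the GIL is no obstacle; the trade-off is that everything crossing a Queue gets pickled.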

> typing as second-class citizens

Annotations are underdeveloped. Being reliant on dataclass libraries to enforce typing is a bit poop.

> People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

I work with a _massive_ monorepo. Python isn't the problem; it's programmers being "clever" or making needless abstractions of abstractions. None of that is Python's issue. It's egotistical programmers not wanting to read other people's (undocumented) code, and not wanting to spend time making other people's code better.

>Nothing scales well. Scaling requires lots of effort. It doesn't matter what language you use; you'll rapidly find all its pain points.

This is very important. A lot of people think that just switching to Go or Rust or whatever other new language fixes all of this. But with a big enough project, you'll find all the issues. It's just a matter of time.

Do not miss that one will find all of *the language's* pain points. I'd wager that a dynamically typed language such as Python has quite a few more pain points at scale than a more principled language such as OCaml.

I love Python's bignum arithmetic when I write small prototypes for public key cryptography. I love Python's extensive standard library when I'm scraping a couple of web pages for easier local reading. But I would never willingly choose it for anything bigger than a few hundred lines. I'm simply not capable of dealing with large dynamically typed programs.

Now if people try Rust or OCaml with the mentality of an early startup's Lisp developer, they're going to get hurt right away ("fighting the language" and "pleasing the compiler" is neither pleasing nor productive), and they're going to get hurt in the long run (once you've worked around the language's annoying checks, you won't reap as much benefit).

If you'll allow the caricature: don't force Coq down Alan Kay's throat, and don't torture Edsger Dijkstra with Tcl.

Though OCaml's tooling pain points hurt at least as much as Python's, even though I adore the language.

This is somewhat true - scaling is hard no matter what - but some things scale much better than others. I have been miserable working with ruby on rails codebases that are much smaller than java codebases I have been content working on. This is despite personally enjoying the ruby language far more than the java language.

> Much as I hate it, Docker solves this. Failing that, poetry, or if you must, venv. (If you're being "clever", statically compile everything and ship the whole environment, including the interpreter.) Its packaging is a joy compared to node. Even better, enforce standard environments, which stops all of this: one version of everything. You want to change it? Best upgrade it for everyone else too.

No, Docker doesn't solve the fact that some packages just won't play nicely together. NPM actually does this better than the Python ecosystem too, since it will still work with different versions of the same dependency. You get larger bundle sizes, but that's better than the alternative of it just flat-out not working.

Scalability is not just runtime, it's also developer time scalability. The larger the project, the more you have to split it up and write interface documentation between libraries - which adds complexity.

As for processing scalability - Python is OK, but it's considerably hampered by the BDFL's own opinions. The result is a few third-party libraries that implement parallelism in their own way. That functionality should be integral to the standard library already. The worst part is the lack of a standard API for data sharing between processes.
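(For what it's worth, Python 3.8 did add a standard, if low-level, primitive for this: multiprocessing.shared_memory. A minimal single-process sketch of the API:)

```python
from multiprocessing import shared_memory

# Create a named block of shared memory; another process could attach to it
# by name with SharedMemory(name=shm.name).
shm = shared_memory.SharedMemory(create=True, size=4)
try:
    shm.buf[:4] = b"ping"
    # Simulate the consumer side: attach by name and read the same bytes.
    view = shared_memory.SharedMemory(name=shm.name)
    data = bytes(view.buf[:4])
    view.close()
finally:
    shm.close()
    shm.unlink()  # free the block once all attachments are closed
```

It's raw bytes only, so anything structured still needs serialization on top - which is arguably the commenter's point.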

> packaging

Python's packaging issues only start with package management. Setuptools is an unholy mess of a system that has literally given me headaches for its lack of "expected features". I hate it with every single cell in my body.

And then there are systems and libraries, where you literally cannot use docker (Hello PySpark!).

>read other people's (un documented) code

I lolled! Seriously... We get Python fanboys moaning about how indentation makes everything more readable and what a pleasure it is to write code in Python. Give me a break!

> it's programmers being "clever"

When I have to revisit old code I've written, I occasionally encounter my "cleverness" at the time. I always hate that past version of me. I think I've mostly learned my lesson. I guess I'll know in a few years.

"When I have to revisit old code I've written, I occasionally encounter my "cleverness" at the time. I always hate that past version of me. I think I've mostly learned my lesson. I guess I'll know in a few years."

...I feel attacked.

I'm sorry, but as a fellow Python "devop" too, this really reads like empty apologism.

>Nothing scales well. Scaling requires lots of effort.

Sure, just like all PLs have their flaws, and most software has security vulnerabilities. But it's a question of degree and the tendency of the language. Different languages work better in different domains, and fail in others, and what Python is specifically bad at is scaling.

If only for the lack of static typing and the relatively underpowered control-flow mechanisms (e.g. Python often using exceptions in their stead)... While surely all languages have pain points that show up at scale, Python still has a notably large number of significant ones precisely in this area.

>docker, poetry, venv...

Yes, and this is exactly the point. There's at least three different complex solutions, none of which can really be considered a "go-to" choice. What is Rust doing differently? Hell, what are Linux distros doing differently?

>If you're really into performance then dump out to C/C++ and pybind it.

If you want performance, don't use Python - was the parent's point.

>If we are being realistic, 80% of Python programs spend most of their time waiting on the network.

This really, really doesn't apply to all of programming (or even those domains Python is used in). Besides, what argument is that? If it were true for your workload, then it would be so for all other languages too, meaning discussion or caring about performance is practically meaningless.

>Granted, python is not overly fast, but then most of the time your bottleneck is the developer not the language.

Once again, this applies to all languages equally, yet, for example, Python web frameworks regularly score near the bottom of all benchmarks. I doubt it is because of the lack of bright programmers working in Python, or the lack of efforts to make the frameworks faster.

>Python isn't the problem, it's programmers being "clever" or making needless abstractions of abstractions.

Just as C isn't the problem, it's the programmer forgetting to check for the size of the buffer, and PHP isn't the problem, it's the programmer not using the correct function for random number generation.

You can always trace any given error to a single individual making an honest mistake, but that's really not a useful way to think about this. It's about a programming language (or an environment) leading the programmer in wrong directions, and the lack of safety measures preventing misguided "egotistical programmers" from doing damage. You can blame the programmers all you want, but at the end of the day, the one commonality is the language.

Now, Python is still one of my favorite languages; I think that for a lot of domains it really is the right choice, and I can't imagine doing my work without it. But performance and large, complex systems are not among those domains, and I honestly feel like all you've said in Python's favor is that other languages are like that too, and that it's the programmers' fault anyway.

I have thought about what you've written, and I broadly agree. I didn't mean for my post to be a "Python is great, really"; it was more to illustrate that all programming languages have drawbacks.

There is a point that I think I've failed to get across:

> Just as C isn't the problem, it's the programmer forgetting to check for the size of the buffer, and PHP isn't the problem, it's the programmer not using the correct function for random number generation

I don't think I was arguing that point. Of course all languages have their USP. The point I wanted to get across is that large Python projects are not inherently hard to manage. That kind of scaling is really not that much of an issue. I've worked on large repos in C, C++, Python, Perl, Node and, as a punishment, PHP. The only language that had an issue with a large codebase was Node, because it was impossible to build and manage security. The "solution" to that was to have thousands of repos hiding in GitHub.

The biggest impediment to growth was people refusing to read code, followed swiftly by pointless abstractions. This led to silly situations where there were 7-12(!) wrappers for S3 functions; none of them had documentation, and only one had test coverage.

Very much agree. I oversee a relatively small Python codebase, but getting good-quality, safe code out of the developers in a controlled way is really hard - there are so many ways in which the language just doesn't have enough power to serve the needs of more complex apps. We have massive amounts of linting, type hinting, and code reviews spotting obvious errors that would simply be invalid code in other languages.

It's like getting on a roller coaster without a seat belt or a guard rail. It's fun at first, and you will make it around the first few bends OK ... then get ready ...

Of course, with enormous discipline, skill and effort you can overcome all this. But it just leaves the question - really, is this the best tool for the job in the end? Especially when you are paying for it with horrifically bad performance and other limitations.

Have you ever seen O(million) lines enterprise codebase that didn't suck?

This is surely anecdotal and very subjective, but I have (in Java and in C++; IIRC the exact versions were Java 7 and C++03), and the level of pain was lower than with a Python codebase that was about one order of magnitude smaller. In the case of C++, the pain was mostly associated with the ancient build system we used; the code itself was relatively manageable. There was almost zero template code, and maybe that helped (although on other occasions I've worked with smaller C++03 codebases that relied heavily on templates, and I didn't find them that bad).

Not all codebases are equal and maybe I was lucky, but in my experience, using dynamic languages (or, to be exact, any language where the compiler doesn't nag you when there is a potential problem) doesn't scale well.

I've worked with an O(100k) line code base in Python that was pure torture. Honestly, I was so desperate for static-typing by the end that I would have preferred if it was all written in C++.

Large codebases are really hard to reason about without types. I'm glad we now have projects like Pyre that are trying to bring typing to Python.

I've worked with Python for more than 15 years, usually with codebases of 50-100k lines per app. The only time I've had real issues with types was a codebase I inherited where the previous developers were using None, (), "", [], and {} all to mean roughly the same thing - sometimes checking for "", sometimes (), etc. I couldn't handle it, so I put asserts everywhere, slowly found out where those things were coming from, and sanitized it to be consistent.
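A hedged sketch of that kind of sanitizing: funnel the zoo of legacy "empty" sentinels through one normalizer so the rest of the code only ever checks one thing (the names are mine, not from any real codebase):

```python
# The legacy sentinels that all meant "no value" in the inherited code.
EMPTYISH = (None, (), "", [], {})

def normalize_empty(value):
    """Map every legacy 'nothing' sentinel onto None."""
    # Tuple membership uses ==, so [] and {} are matched too.
    if value in EMPTYISH:
        return None
    return value

assert normalize_empty("") is None
assert normalize_empty({}) is None
assert normalize_empty(0) == 0        # 0 is real data here, not "empty"
assert normalize_empty([1]) == [1]
```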

There are some confounding issues that often get mixed together, though.

Large Python codebases _could_ be written with good modularization, clean separation of concerns, and composability. Or they could be written as spaghetti.

Using types _could_ help keep a codebase from becoming spaghetti, but it's not the only way. I think the understandability and maintainability of a codebase has more to do with the person writing it than with the availability of a type system, tbh.

No, they can't, at least not with the same amount of effort. Of course you can make anything good by throwing enough time and money at it, but that's not the point.

The issue is that to have a nice and well-architected codebase, you have to constantly refactor and improve - sometimes you need to rearrange and refactor huge parts of the code. Without types _and_ tests, this is just not going to happen. It will be unproductive and scary, so people will stop touching existing code and work their way around it.

> I think the understandability and maintainability of a code base has more to do with the person writing it than the availability of a type system tbh.

That is the same thing. Because someone who wants great maintainability will also want a great type system (amongst other things).

A good carpenter never complains about his tools. He works around their limitations or uses something else.

The quality of the product is down to the skill of the worker either way.

We can assume buffer overflows are less common in Java than in C and I doubt that Java programmers are better craftsmen.

The same with types: they make some kinds of errors much less likely, though there is no silver bullet in general. E.g., I much prefer a general-purpose language such as Python for expressing complex requirements in tests over any type system (even if your type system is Turing-complete and you can express any requirement in it, that doesn't mean it is a good idea).

How often is a carpenter told to use this particular rusty saw or their work won't be compatible with everyone else's?

Everything interlocks in such intricate ways that you can't meaningfully choose your own tools, and working around problems only goes so far. And you can't repair your own tools.

There's also failures of the community to provide good guidance.

> desperate for static-typing

Can you explain why? I honestly don't know, because my experience with C++ was during school ~20 years ago, and since then professionally I've used mostly python in relatively small codebases where it's all my own code (mostly for data processing/analysis). Thanks!

(Although I did have to write some C code to glue together data in a very old legacy system that didn't support C++, much less python. It took a lot more effort to do something simple, but it was also strangely a really rewarding experience. Kind of similar to feeling when I work with assembly on hobby projects)

The main problem with duck typing like Python's is the lack of consistency between the different objects the code has to work on. Different callers may pass objects with different sets of methods into a function and expect it to work. You run into the case where the object that was passed in has subtly mismatched behavior from what your method expects, but you don't know who created it - it was probably stored as a member variable by something 10 call-stack levels and 5 classes distant from what you're currently working on.

Static typing prevents that by telling you early where the mismatch is happening - some method calls into another with a variable of the wrong type, and that's where the bug is. It also allows tooling to look up the types of variables and quickly get information about their properties.
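To make that concrete, here's the kind of mismatch a checker such as mypy or Pyre reports at the call site rather than ten frames deep at runtime (hypothetical names, a sketch only):

```python
class FileHandle:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

def shutdown(resource: FileHandle) -> str:
    # With annotations, the checker knows resource must be a FileHandle.
    resource.close()
    return "closed"

handle = FileHandle()
status = shutdown(handle)   # fine: matches the annotation
# shutdown(42)              # at runtime: AttributeError deep inside shutdown;
#                           # with mypy/Pyre: flagged *here*, at the call site.
```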

Got it, that makes sense. It also makes sense why I've not much been bothered by it in python since my relative small code bases don't have that many layers of abstraction laid on top of each other. I'm generally not working with more than 2,000-3,000 lines, and I can just about keep the basic structure in my head. (Unless it's been a while since I've had to revisit it... then I often hate my past self for getting "clever" in some way)

For these small codebases, static typing is still great (if you are used to it already), but the adverse effects of not having it usually show up much more strongly with a team than with a single person. And yeah, if you can keep the structure in your head, then you are good anyway.

> it was probably stored as a member variable by something 10 callstack levels and 5 classes distant from what you're currently working on

Just because you can define methods on an object dynamically in Python doesn't mean that you should. Monkeypatching is not culturally encouraged in Python. Most often it is seen in tests; otherwise, it is rare.

Nobody forbids using ABCs to define your custom interfaces or using type hints for readability/IDE support/linting (my order of preference).
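For instance, a minimal ABC-based interface sketch - an implementer that forgets a required method fails loudly at instantiation time instead of somewhere deep in a call stack (names are illustrative):

```python
import abc

class Storage(abc.ABC):
    @abc.abstractmethod
    def save(self, key: str, value: bytes) -> None: ...

class MemoryStorage(Storage):
    def __init__(self) -> None:
        self.data: dict = {}

    def save(self, key: str, value: bytes) -> None:
        self.data[key] = value

class BrokenStorage(Storage):
    pass  # forgot save() - instantiating this raises TypeError immediately

store = MemoryStorage()
store.save("k", b"v")

try:
    BrokenStorage()
    failed = False
except TypeError:
    failed = True  # abstract method unimplemented -> caught up front
```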

Funny, I'm working on a project about the same size, and the overly aggressive type and value restrictions are the main problem that I struggle with daily.

I am literally working on two projects that are roughly 100kLOC each.

The Scala Spark project I can navigate, understand, and test, and I consider it to be of average complexity... with some failures unique to Scala.

The Python Spark project is barely readable.

The people who built the Python Spark codebase are "experienced Python devs", while the Scala codebase was built by people who used Scala for the first time.

(Take this anecdote as evidence of the poor tooling and guidance present in the Python community... and the BDFL's own failures.)

I've worked on several separate projects of that size in C++ and Go. None of them ever turned into the kind of mess that Python codebases of one or two dozen thousand lines seem to. OTOH, all the typing developments in Python should have helped? I don't have that much experience with them in an enterprise setting.

I have - and it's not that bad. The key is that you have to have someone coordinating and driving a shared vision for the codebase and its patterns. But it's hard to find people with that sort of passion and the drive to follow through, as it's a multi-year endeavor with politics all over.

Otherwise it's a thousand implementations of the same 100-line piece of code interspersed everywhere.

It seems like quality code management gets passed over by (bad) management because it looks like it doesn't directly move the project forward.

Which is strange because those same managers may be full adherents to micro tasking projects in a project management system whose purpose is basically to do for the project what code management does for the code itself.

In my workplace, we've recently had leadership that appreciates these things, and the difference is night & day. Simple requests from "stakeholders" (I hate that term) are often filled in days, or same day, instead of weeks. I think it helps tremendously that the primary manager is also a coder herself, and still codes ~25% of her job.

That's the problem with some languages - they lack a visionary, that drives the overall understanding of how things should be structured.

I believe it was Guido who basically said: if you don't like how Python does it, implement it in C. And that's how you end up with great C-based libraries bound to Python... with Python often used as a messy orchestration language.


Or even worse - could you imagine how many lines that would be in C++?

And how many of those problems are an artifact of moving fast and getting things done?

I've seen the exact same scenario with other languages. The problem is that in a startup environment you are likely adding and retiring "features" at a speed that layers on so much complexity that you can no longer reason about which business rules are actually valid any more.

I think that's part of it. There is a convention-over-configuration issue as well. A language like Go forces some patterns, like package management and formatting, unless you actively try to subvert it.

It wouldn't surprise me if many of these issues are self-selecting in the language communities as well.

I work in Python every day on a reasonably large codebase and have none of the issues you're talking about. I'm 10x more productive than on similar C or Java projects.

Dependency management is about as easy as it is going to get. We have problems with our dependencies breaking stuff, but who doesn’t?

People talk as if packaging is a solved problem. It isn’t in any language. And then they complain that Python packaging changes too much. That’s because folks are iterating on a hard problem.

Do you handle deployment of this Python application? For me, that's where the pain points arise. I love writing Python, but deploying it does not spark joy at all, at all.

Here's some of the ways to deploy Python code:

- `curl -L https://app.example.com/install | sh` that downloads installer and runs for instance: apt/yum install <your-package>

- in CI environment on a VM: `git checkout` & `pipenv install --deploy`

- `pipx install glances` on a home computer

- just `pip install` e.g., in a docker container [possibly in a virtualenv]. For pure Python packages, it can work even in Pythonista for iOS (iphone/ipad)

- just copy a python module/archive (PyInstaller and the like)

- give a link to a web app (deployed somewhere via e.g., git push)

- for education: there are python in the browser options e.g., brython, repl.it, trinket.io, pythontutor.com

- just write a snippet in my favourite editor for literate devops tasks/research (jupyter-emacs + tramp + Org Babel) or give a link to a Jupyter notebook

- a useful work can be done even in a REPL (e.g., Python as a powerful calculator)

The fact that you have 9 different ways all with their own different problems is exactly the problem here.

Do you use a single program on all of your devices for all possible computer-related tasks? Do you see a fault in such logic? Why do you think the deployment space is any different: do you use kubernetes for everything?

I dare you. Do mention any tool/any language that handles all the above use cases without sacrificing the requirements for each use-case.

> Do you use a single program on all of your devices for all possible computer-related tasks? Do you see a fault in such logic?

No. But if I talked about how I used 9 different word-processing programs, you'd see that as a problem, or at least an indictment of those programs. Deployment isn't that complicated.

> I dare you. Do mention any tool/any language that handles all the above use cases without sacrificing the requirements for each use-case.

I use Maven/Scala and as far as I can see it covers all of them other than "give a link to a web app" which isn't actually deploying at all (and I'd still have used maven to deploy the webapp wherever I was deploying it).

I don't think there's any legitimate case for curl|sh, and I don't think there's any real reason for separate pip/pipenv/pipx (did you make that one up? Have I fallen for an elaborate troll?) - rather pipenv exists to work around only being able to install one version of a library at a time. Nothing's gained by having "just copy a module/archive" be different from what the tool does. Running in browser, notebook, or REPL can and should still use the same dependency management tooling as anything else.

If I want to deploy my code, I use maven. You can use curl (since maven repositories use standard HTTP(S)) or copy files around by hand, if you have a use case where you need to, but I can't think what that would be. If you want to bundle up your app as a single file, you can configure things to do that when publishing, but the dependency resolution, repository infrastructure, and deployment still look the same. Even if you want to build a platform-level executable, it's the same story, all the tooling just works the same. If I want a REPL or worksheet, I can start one from maven (and use the same dependency management etc. as always), or my IDE (where it's still hooked up to my maven configuration). If I want to use a Zeppelin notebook then there's maven integration there too.

Ever wonder why you don't hear endlessly about different ways of doing dependency management in non-Python ecosystems? Because we have tools that actually work, and get on with actually writing programs. It baffles me that Python keeps making new tools and keeps repeating the same mistakes over and over: non-reproducible dependency resolution, excessively tight integration between the language and the build tools, and tools and infrastructure that can't be reused locally.

My core problem is with C/C++ dependencies. Can you describe to me how you handle these when you deploy Python?

  - system packages (deb/rpm/etc)
  - binary wheels (manylinux)
  - building from source
plus some caching if appropriate

God, I wish that would work for me.

To take your examples in order:

1) system packages: almost always out of date for my needs

2) Binary wheels: I actually haven't investigated this much, maybe it will work (and if it does, I'll buy you a drink if we ever meet in person).

3) Building from source: this kinda proves my point about Python having poor dependency management tools if this is a serious response. In general, this would be much further down the rabbit hole than I want to go.

I use Anaconda exclusively and deployments (with virtual environments) have been fairly ok.

That said, I do run into trouble when I have a dependency that requires compilation on Windows (i.e. like the popular turbodbc) because say, a wheel isn't available for a particular Python version. Any time a compilation is needed, it's a headache. Windows machines don't come with compilers, so one has to download and install a multigigabyte Visual Studio Build Essentials package just to compile. Sometimes the compilation fails for various reasons.

Requiring gcc compilation is a headache for installing dependencies inside Docker containers too -- you have to install gcc in order to install the Python dependencies, and then remove it afterwards.
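One common workaround is a multi-stage build: compile wheels with gcc in a throwaway stage, and copy only the built wheels into the final image, which never sees a compiler. A sketch, not a drop-in file - the base image and file names are placeholders:

```dockerfile
# Build stage: has gcc and headers, compiles wheels for every dependency.
FROM python:3.9-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends gcc libc6-dev
COPY requirements.txt .
RUN pip wheel -r requirements.txt -w /wheels

# Final stage: no compiler installed; install only from the prebuilt wheels.
FROM python:3.9-slim
COPY requirements.txt .
COPY --from=build /wheels /wheels
RUN pip install --no-index --find-links=/wheels -r requirements.txt
```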

I think requiring local compilation (instead of just delivering a binary) is a UNIX mindset that is holding back many packaging solutions. A lot of pain would be alleviated if we could somehow mandate centralized wheel creation for all Python versions, with the package manager otherwise marking a package as broken or unavailable and defaulting to the last available wheel.

Also, if only we applied standards like R's CRAN repo does -- i.e. if a package doesn't pass error checks or doesn't build on certain architectures (via a centralized CI/CD build pipeline in the package repo), it doesn't get published -- the Python packaging experience would be much improved.

Yeah, if PyPI was as annoying as CRAN with respect to new versions, then a lot of this pain would go away.

For those who don't realise, when there's a new version of R, anything that doesn't build without errors/warnings is removed from the archive.

This is really annoying if you want something to keep running, but it prevents the kind of dependency rot common to Python (recently I found a dependency that was four years out of date).

Curious to know what issues you have with deploying Python codebases. Out of all of the minor and major gripes I have with Python, deployment is not one of them.

To me, python deployments are painless, as long as you can stick to pure dependencies and possibly wheels.

Once a pip install needs to start compiling C, things do go way south very quickly. At that point you can install the union of all common C development tools, kernel headers and prepare for hours of header hunting.

I've done that too much to like python anymore.

Yup, yup. I deploy statistical models with Python, and these always have C dependencies.

Additionally, they are part of a larger application, which is mostly managed by pip, which means that I need both pip and conda which is where things get really, really hairy.

I actually blame Google and FB here, as neither of them use standard python dependency management tools, and many of their frameworks bring in the world, thus increasing the risk of breakage.

Adding data files via setuptools....

And putting them into a common shared directory.

Try doing that without writing convoluted code in your setup.py.

It is funky, but importlib package resources helps.
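It helps a lot, yeah. A minimal sketch of the idea -- the `mypkg` package and its `config.txt` here are built on the fly purely for illustration, standing in for data files you'd normally ship via package_data:

```python
import importlib.resources as resources
import os
import sys
import tempfile

# Fake a tiny installed package with a bundled data file.
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "mypkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "config.txt"), "w") as f:
    f.write("debug=false")
sys.path.insert(0, tmp)

# importlib.resources resolves the file relative to the package itself,
# so no setup.py path gymnastics are needed at runtime.
text = resources.read_text("mypkg", "config.txt")
print(text)  # debug=false
```

The same call works whether the package lives in site-packages or was installed from a wheel, which is exactly what the convoluted setup.py code tries (and often fails) to handle by hand.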

Production deployment is Docker all the time.

Deployment for development is just pyenv and virtualenv.

No concurrency? asyncio is great for I/O bound network stuff!
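Agreed -- for I/O-bound work it's a good fit. A toy sketch (asyncio.sleep stands in for a real network call):

```python
import asyncio

async def fetch(name, delay):
    # Pretend network I/O: while this task awaits, the event loop
    # runs the other tasks on the same thread.
    await asyncio.sleep(delay)
    return name

async def main():
    # Both "requests" wait concurrently, so total time is ~0.1s, not ~0.2s.
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))

results = asyncio.run(main())
print(results)  # ['a', 'b']
```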

"No parallelism" is probably what was meant.

> "No parallelism" is probably what was meant.

Which is still wrong, of course, but "no in-process (or in-single-runtime-instance) parallelism" would be correct, as would "forking inconvenient parallelism".

Posix fork() doesn't really count, if that's what you mean...

Why doesn't Python's multiprocessing module (which uses fork by default on Unix) count? It literally exists for parallelism.

It's understood that you can have "parallelism" by running two copies of your program using basic system facilities like fork(), or even by buying several computers and running one instance of your program on each of them. That's not what is meant by a language "supporting parallelism". If it was, then every language ever designed supports parallelism and so the term is meaningless.

To claim that a language "supports parallelism", it has to do something more to facilitate parallel programming. I would say that parallel threads of computation with shared memory and system resources is the bare minimum. You can go the extra mile and support transactional memory or other "nice" abstractions which make parallel programming easier.

Saying that Python supports parallelism because it has a fork() wrapper is like saying that Posix shell is a strongly typed language because it has strings and string is a type.

It doesn't use fork() on macOS anymore, because some of Apple's own APIs get broken by its use.

Pretty much any app that uses both fork and threads, has to jump through many hoops to make the two work together well. And this applies to all the libraries that it uses, directly or indirectly - if any library spawns a thread and does some locking in it, you get all kinds of hard-to-debug deadlocks if you try to fork.

So unless you have very good perf reasons to need fork, I would strongly recommend multiprocessing.set_start_method("spawn") on all platforms. No obscure bugs, and it'll also behave the same everywhere, so things will be more portable. Code using multiprocessing that's written to rely on fork semantics can be very difficult to port later.
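For what it's worth, a minimal sketch -- here using `get_context("spawn")` so the choice is scoped to this code rather than process-global like `set_start_method` (the worker/queue names are just for illustration):

```python
import multiprocessing as mp

def worker(q):
    q.put("hello from child")

def run_spawned():
    # "spawn" starts a fresh interpreter instead of fork()ing, so the
    # child doesn't inherit threads or held locks from the parent.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    msg = q.get()
    p.join()
    return msg

if __name__ == "__main__":
    print(run_spawned())  # hello from child
```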

You wouldn't fork() for performance, but for security reasons.

It's not wrong. If running two processes counts as parallelism, then everything does parallelism, and it becomes pointless to talk about it.

Then one should talk about how convenient the related abstractions are. I like the concurrent.futures library.

Concurrency is not the same as parallelism. Python has good concurrency support, I agree. Python (CPython) does not support parallelism, however, due to its Global Interpreter Lock, which actively prevents any parallelism in Python code.

This was probably a conscious design decision on the part of the CPython implementers, and perhaps a good one. But we should not claim that Python is something which (actively and by design) it's not.

I use `concurrent.futures.ProcessPoolExecutor` fairly often. I handle inter-process communication usually through a database or message queue, expecting that someday I'll want to go clustered instead of just single-machine multiprocessing. I've been burned by implementing multithreading and then needing to overhaul to clustered enough times to stop doing it.
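A stripped-down sketch of that pattern (minus the database/message-queue part; `cpu_bound` and the input sizes are made up for illustration):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Pure-Python arithmetic: the GIL prevents threads from running this
    # in parallel, but each pool worker is a separate process with its
    # own interpreter and its own GIL.
    return sum(i * i for i in range(n))

def parallel_sums(sizes):
    with ProcessPoolExecutor(max_workers=4) as pool:
        # map distributes the inputs across the worker processes.
        return list(pool.map(cpu_bound, sizes))

if __name__ == "__main__":
    print(parallel_sums([10, 20, 30]))  # [285, 2470, 8555]
```

Swapping the executor for a message queue later is mostly a matter of replacing `pool.map`, which is why keeping the communication layer abstract pays off.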

At O(million) lines, the problem of wrangling it has more to do with how well it's architected and written than with it being Python. Python is at least easy to read. Its major deficiency is the lack of parameter annotations, and that's something that could now be fixed... but it isn't going to be fixed in that much historical code.

If you are trying to get performance out of it (which doesn't really hinge on whether it's a million lines of code), then Python might be the wrong choice. But you can always write it in Rust or C and give Python an API to the functionality.

I agree that packaging is a mess. Fixing that mess with modularization in Java took a long time, and most other languages have that problem, too.

I disagree. Python is not inherently easy to read.

Explicitness and naming standards screw up the clarity of any code... Not to mention the complexity when you get into OOP.

Also, the lack of switch-case statements doesn't help. (The workaround is either if statements or a dict of functions to be called.)
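E.g. the dict-of-functions version (handler names invented for illustration):

```python
def handle_get():
    return "GET handled"

def handle_post():
    return "POST handled"

# Dispatch table standing in for switch/case.
handlers = {
    "GET": handle_get,
    "POST": handle_post,
}

def dispatch(method):
    # dict.get with a fallback plays the role of the `default:` branch.
    return handlers.get(method, lambda: "405 Method Not Allowed")()

print(dispatch("GET"))  # GET handled
print(dispatch("PUT"))  # 405 Method Not Allowed
```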

>People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

This seems to be the case with most languages, especially if good code control isn't practiced, and unfortunately that's not uncommon.

Is concurrency really an issue? Yes, you do not have truly parallel threads, but you can launch multiple processes. Do you really need shared memory for your concurrency needs? (I think it is much easier to introduce subtle bugs with shared-memory concurrency, i.e. threads.)

Could I ask you what language you use instead then?

This describes it perfectly.

We use poetry for apps in production. At this point I think that's the winning solution and as it continues to grow and improve I think it will overtake all the others in this respect.

People keep saying that about every new solution. But then another one comes along that's even better-er, and the previous one peters out.

The biggest need for a package manager and its ecosystem is continuity: the stance that new features and paradigms will be gradually shifted toward — without package-ecosystem incompatibilities, without CLI commands just disappearing (but instead, with long deprecation timelines), etc.

In other words, an officially-blessed package manager is one where, when something better-er comes along, it gets absorbed by the existing thing, instead of replacing it.

That is what the Python ecosystem is missing.

I don't think it's that another one comes along that's better so much as that the new "better" ends up missing some important corner case. Pipenv advertised itself as solving all problems, but once you try it in practice, you realize it introduces a new problem: every little interaction takes literally 30 minutes for any non-toy project. I've heard mixed things about poetry, but I wouldn't be surprised in the least if it failed to behave as advertised, just because this has been my experience with every package manager I've tried. And it's embarrassing when every other language has a package manager that just works.

EDIT: It was probably misleading to characterize Pipenv as advertising itself as solving all problems; it's probably more correct to say that its significant weaknesses weren't advertised, and thus one has to invest considerably before discovering them for oneself.

Just a heads up to anyone who hasn’t looked recently: pipenv has been very actively worked on since earlier this year and has had four updates that fix a lot of issues. Earlier this year I would have said Poetry is better hands down, but after the updates and after using poetry and seeing some of its quirks, it’s a much closer matchup.

If it wasn't so opinionated it might have been more successful.

Just one example: you want your virtualenvs to be created in ~/.virtualenvs so that pipenv is a drop-in replacement for virtualenvwrapper+pip? Tough luck for you, Kenneth Reitz doesn't think that's how it should be done.

At least 3 or 4 times some issue I've wanted resolved I found in the issue tracker with the last message "we'll have to check with kennethreitz42 whether we're allowed to change that" and then silence for a year.

It could still catch up with poetry, but from what I've seen there's a fundamental mindset difference in how change requests are approached between pipenv and poetry.

Last I checked (3-4 months ago), Pipenv only cares about the situation where you are deploying code to machines (or containers) you have complete control over. If you're writing code to be deployed on machines you don't control, via for example pip install, then pipenv isn't helpful, while poetry supports this out of the box.

Interesting. I haven't used pipenv on any very large projects, but I'm surprised to hear about the slowness. With the (admittedly small) projects I've tried it, I found that it does more or less just work.

As I understand it, the problem is that Pipenv needs to resolve the dependency tree to do just about anything; however, the dependency tree is dynamic -- to determine a package's dependencies, you have to download the package and run its setup.py. To get the whole tree, you have to recursively download and run each package. The cost is proportional to the size of the dependency tree, so it's very plausible that it works fine for the smallest projects.

> The biggest need for a package manager and its ecosystem is continuity: the stance that new features and paradigms will be gradually shifted toward — without package-ecosystem incompatibilities, without CLI commands just disappearing (but instead, with long deprecation timelines), etc.

I disagree. I used to think that that's the problem, but having seen a few more cycles of it, the problem isn't that kind of commitment - after all, the whole python ecosystem enthusiastically jumps into the new thing, and Python people are used to relatively short deprecation cycles. The problems are the actual problems; every Python package manager is just embarrassingly awfully bad as soon as you try to use it for 5 minutes, presumably because they're developed by Python people who've never used a decent package manager and so think that no-one could ever need deterministic dependency resolution, once you've pinned a transitive dependency there surely wouldn't be any reason to ever want to unpin it, having the package manager coupled to the language version is absolutely fine, no-one could ever want a standard way to run tests ...

What the Python ecosystem actually needs is a single opinionated perspective on versioning that is followed by everyone, such as NPM's semantic versioning. In the absence of that I don't see how dependency resolution and thus packaging is ever going to improve in Python.

Guido is very opinionated...

He just doesn't care about package management.

Yes, Poetry should be the blessed package manager.

I only want Poetry to become the be-all-end-all of package managers if it turns out that Python really is never going to fix the core problems that have engendered so many of the hacks upon which Poetry (and its competitors) is precariously balanced. Pyenv and venv, for example.

If you were doing a green field redesign, how would you want Python to fix the core problems?

> Yes, Poetry should be the blessed package manager.

Last time I tried to use poetry (and this is why it was the last time), it ignored global pip settings and had no documented mechanism for its own settings (I believe poetry uses its own implementation or captive install of pip), which made it completely unusable in a corporate environment with annoying SSL-interception issues to work around, where pip + venv worked.

Poetry is a much smoother experience when it works, though.

It will install a virtual env if you don't have one active and it will use the active one if you do. What global pip settings, for example?

Although I generally like it, the two major issues I have with poetry are the abysmal dependency resolution times and the handling of binary wheels.

> People keep saying that about every new solution. But then another one comes along that's even better-er, and the previous one peters out.

I think this happens frequently these days. People try to cover all use cases and end up biting off more than they can chew. It won't work that way. A good set of minimals is easy to maintain, sustain, and extend.

Much of Python's growth has been driven by data science. Here, the conda package manager is pretty ubiquitous. Conda packages system and other non-Python dependencies (such as the CUDA SDK), removing the need for data scientists to resolve these non-trivial dependencies themselves. This is likely unneeded/unwanted for production web app deployments.

Given the varied use cases for Python, the goal of a single package manager may be misguided.

My understanding is that the people who developed Conda would love to have stuck with pip, and originally wanted to see about upgrading pip to support their use cases. And it was GvR himself who told them that that wasn't going to happen.

That was a long time ago, though, when scientific computing was a small niche for Python. It might have been reasonable to say it's not worthwhile to take on all that extra work just to support the needs of a small minority of users. Fast forward the better part of a decade, and it turns out that scientific computing did not stay a small niche. I think that one could make a strong argument that, in retrospect, that brush-off did not end up ultimately serving the best interests of the Python community. It made the community more fragmentary, in a way that divided, and therefore hindered, efforts at addressing what has proven to be one of Python's biggest pain points.

Conda predates pip by perhaps a decade.

Really? The first release of pip was in 2011 [1] and the earliest release of Conda I can find is 1.1.0 in Nov. 2012 [2], and the first public commit (into an empty repo) was a month earlier [3].

[1] https://en.wikipedia.org/wiki/Pip_(package_manager)

[2] https://github.com/conda/conda/tags?after=1.3.0

[3] https://github.com/conda/conda/commit/c9aea053d8619e1754b24b...

May be anaconda that I'm thinking of.

Anaconda was released in 2012, as well. Conda is a tool that is part of anaconda.

This one: https://en.wikipedia.org/wiki/Anaconda_(installer)

I forgive myself, it's pretty confusing :D.

Ah. Name collisions suck.

Especially when they conceptually do the same thing.

And that's another issue with Python ecosystem

Why do you think Python is more susceptible than other platforms?

It is true that PyPi was designed before the author/project naming scheme popularized by github. Other than that I don't see a greater problem with name collisions in Python.

Susceptible - yes, all platforms could fall to this.

That's why a strong leadership in a community, or subcommunities, works well. Python lacked this leadership, that leads to millions of half-arsed projects that compete... without moving the whole platform forward. It feels like NIH syndrome has permeated Python. Hopefully that's going to change

I also use poetry for everything. I have 0 problems; things work on my Mac, my intern's PC, AWS instances -- I don't even see what problem people are having. Before that I was using pipenv, and before that just good old requirements.txt -- there were a few occasional issues, but really not much even then. At this point, I suspect it is more about regurgitating a complaint than a real issue. But I could be lucky and completely wrong...

- until a few months ago no way to sync an environment with a lockfile (remove packages that shouldn't be there)

- no way to check if the lock file is up to date with the toml file

- no way to install packages from source if the version number is calculated (this will likely never be fixed as it's a design decision to use static package metadata instead of setup.py, but is an incompatibility with pip)

- no way to handle multiple environments: you get dependencies and dev-dependencies and that's it. You can fake it with extras, but it's a hack

- if you upgrade to a new python minor version you also have to upgrade to the latest poetry version or things just fail (Something to do with the correct selection of vendored dependencies. May have since been fixed -- new python versions don't come out all that often for me to run into it. And in fairness the latest pip is typically bundled with each python so it avoids that issue)

I still use poetry because it's more standard than hand-rolled pip freeze wrapper scripts, and there's definitely progress (the inability to sync packages was a hard requirement for me, but that is now fixed), but it's not quite there yet.

Interesting, I usually rebuild my env from packages so I don't notice 1, 2, or 3. I guess 2 should be fixable by poetry by including more from the toml in the lock file. Point 4 also didn't bother me as I generally just have the main and dev deps; this seems an easier thing for poetry to fix, though. I actually have encountered 5 when fiddling around with pyenv.

If you don't need c or c++ dependencies it's ok. If you do, it's very very painful. To be fair, most of the DS libraries can be handled by conda, but if you need both conda and pip, then you're going to have a bad time. (Source: this is my life right now).

Oh man, this is my life right now, too. In my case, we're using tensorflow or tensorflow-gpu, depending on the host system and, unfortunately, only Conda offers tensorflow-gpu with built-in CUDA. Add to this that the tensorflow packages themselves are notoriously bad at specifying dependencies and that different versions of tensorflow(-gpu) are available on conda-forge, depending on your OS.

Tensorflow is the worst (along with ReAgent from FB).

I think it's because they have their own internal build systems, but they never play well with pip/conda et al.

One of my recent breakages was installing the recsim package, which pulled in tensorflow and broke my entire app. There's actually a recsim-no-tf package on PyPi, presumably because this happens to loads of people.

I see, I miss a lot of issues as don't use any GPU stuff, mainly flask + scipy and friends.... probably this is what saves me.

It's not even the GPU versions, even the CPU stuff causes issues.

The core problem is that pip will happily overwrite your existing dependencies when you attempt to install a new package.

If you don't know what the problems with pipenv or requirements.txt were, you're really not qualified to judge whether poetry has solved them or not.

You are reading it wrong: it did solve my issues with requirements.txt and pipenv. Also, you certainly aren't qualified to judge my qualifications.

Yes, Poetry is great! I avoided Python for a long time due to its bad package management/environment handling situation, but Poetry solved all my problems there.

I wouldn't advise using anything other than the tools blessed by the PSF for mission-critical stuff. Using Poetry for local development is fine but don't build a huge infrastructure around it and don't use in production.

I migrated the CI/CD of my company to Poetry some time ago, it worked fine for some time until we needed a feature that Poetry didn't support. I submitted a PR adding the feature to Poetry but their sole developer was apparently taking some time off and the project remained without any development for several months.

I migrated the CI/CD to use my own Poetry fork but it was very cumbersome, Poetry has a very weird build system so forking it is not simple.

At this point, I realized that I was just wasting time. There is nothing that Poetry does that the other (old and stable) tools don't do. Poetry was the result of me falling for the shiny toy syndrome.

So I hear Poetry is the way to go these days for python.

But a plurality of the people I encounter in the Clojure community came there because leiningen (Clojure's package manager that uses Maven under the covers) "just works" and they got tired of having a tough time reproducing builds consistently on other platforms / OSs with Python; not to mention the performance gains of the JVM.

Python's package management is light years behind even the much-hated Maven.

But if you fix Python package managers that will remove 50% of the audience for Docker. Think of the children! ;)

poetry feels like the closest equivalent to cargo that I've used. pipenv is better than the previous status quo but is still oddly unstable, with random new issues I encounter with every release. poetry "just works" for me, has better dependency resolution, and IMO has a nicer interface and terminal output to boot.

Could you elaborate on what issues you've had with pipenv? I've only had very good experiences with it, so I'm surprised how many people here seem to prefer poetry.

Regarding not being able to work outside the project root[0]: This is actually one of the things that I love about pipenv! Anaconda, for instance, has environments that are not tied to a directory and are referred to by name (rather than by a directory path) and I've found this to be an absolute nightmare and extremely cumbersome! Not everyone has 10 projects that can share the same environment. (Besides, I would argue they never should.) I, for instance, have 10 projects that all require a slightly different environment and it's much easier to type a generic `pipenv shell` on the command line no matter what project I'm in, rather than trying to remember the Conda environment's name time and again. (Besides, it can be easily automated using .bashrc.)

[0]: https://chriswarrick.com/blog/2018/07/17/pipenv-promises-a-l...

> sponsor it through Microsoft $$$

You don't want that. When companies "sponsor" things they try to take them over, unless it's a pure donation, which is rare. The community then drops out because a company is in control. Later the project is abandoned by the company. It's a slow death spiral.

I would love if Guido could create a new PEP for extending modules with generic namespaces, ala Perl/CPAN modules.

There aren't 15 different libraries for doing the same thing in Perl, there's 1. You never replace it, you extend it by making a new module in a hierarchical namespace. The same core library's code might not change in years while new extensions can keep popping up. So even if you think Requests sucks, you can make Requests::UserAgent which inherits Requests code and extends it / gives a better interface. And these can be written & packaged by completely different authors.

Then maybe Pypi wouldn't have 5,000 nearly identical yet mostly unusable modules, or modules with nonsense names.

I only know a bit about Python - in what sense is pip not a package manager?

It's a package manager in the vein of "old-school" package managers that came from Linux distros and whatnot. It maintains a global dependency chain across your entire machine. This can be good for security fixes in that you only have 1 copy of a package & everyone references it. This is not good for development because it doesn't provide a sandboxed environment for you to do development in (ala cargo as others have mentioned). It also causes issues if you try to install 2 packages but they rely on incompatible versions of a popular package, meaning you have to choose which package you want installed.

Some of this has been mitigated with virtualenv, but there's still no built-in way to have a project express its packages & have that automatically reflected in the environment.

Finally, Cargo to my knowledge actually lets multiple dependencies exist (even within the same project!!!) so that you can have a dependency like:

                             dep1 -- dep3 <= v1.6
                            /
    < my awesome project > + ------ dep3 >= 3.0
                            \
                             dep2 -- dep3 >= 2.0

That's not possible if you don't have the right language hooks because module resolution needs to be aware of the version of the library (i.e. when you go `import numpy`, it actually needs to be aware of the package it's being imported from to resolve that correctly).

Now whether or not it's a good idea to support this kind of dependency stuff can be controversial. In practice though clearly it does cause problems the larger your codebase gets as you're more likely to have some nested dependency chain that's time-consuming to upgrade so you'd rather move faster than make sure you're only running 1 version of the dependency.

For some reason I've run into this type of dependency chain issue many times in JS, but have never run into it in Python despite using both languages pretty heavily. Maybe because the JS ethos is to change things so quickly, if you're not making a major breaking change to your package's API every year or two then you're getting left behind (only kind of joking). Also probably because the standard library is so small in JS (or at least it used to be, and many projects want to be compatible with at least some older browsers) so the average number of dependencies that a typical library has is probably much higher than in Python.

I really don't get the fuss about global dependency management though; maybe I would change my mind if Python shipped a great implementation of it. But I feel like the problem is already solvable in multiple ways with containers, VMs, or virtualenvs, and I don't think yet another abstraction to separate environments would add much value to my day-to-day workflow building Python apps.

This is my experience too. I’ve never actually encountered a global dependency conflict with Python pip though in theory it’s possible. But I have encountered the same version conflict problem in dependency chains that exists with Node npm.

And yet, I hear so many more complaints about Python pip and I really don’t understand the disconnect. Perhaps dislike of pip is actually triggered by usability issues? And then people look for other reasons to explain their dislike?

It happens with the data science/scientific programming stack a lot, at least twice in the last month I've pulled in a small dependency that changed my numpy version which broke everything.

Thanks. I suppose it could be related to the rate of change in the ecosystem. Python’s data science / scientific programming stack definitely changes faster than Python’s web stack which is currently my primary use case.

As mentioned elsewhere in this thread, resolving this dependency issue would require a change in the Python language itself.

Maybe you just don't rely on packages that have different update cycles.

If you're only working with the standard library, boto3, twisted and redis - you're unlikely to have issues. You get into big issues, when you get to more obscure libraries... or libraries that are C bindings.

> It maintains a global dependency chain across your entire machine.

There are per-Python-binary, per-user, and per-virtualenv installations (per project, or per whatever you like) that make conflicts less likely.

Sometime packages "vendor" their dependencies e.g., there is `pip._vendor.requests` (thus you may have different `requests` versions in the same environment).

There were setuptools' multi-version installs https://packaging.python.org/guides/multi-version-installs/ (I don't remember using it explicitly ever -- no need)

> Finally, Cargo to my knowledge actually lets multiple dependencies exist (even within the same project!!!)

That's not pip's fault, that's Python's fault. Python's module system has no concept of versioning, so there can only ever be one copy of a module that has a given name.

And this is an interpreter detail that is exposed through the language itself, so it can't be fixed without causing severe pain.
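You can see the single-copy behavior directly: loaded modules are cached in sys.modules keyed by name alone, with nowhere to put a version:

```python
import sys
import json
import json as json2

# Both imports hit the same sys.modules["json"] cache entry, so there
# is exactly one module object per name per interpreter -- there's no
# way to hold two versions of "json" at once.
assert json is json2
assert sys.modules["json"] is json
print("one copy per name")
```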

> Finally, Cargo to my knowledge actually lets multiple dependencies exist

That's a bug, not a feature. It enables sloppy development and the disasters like on NPM

As I said above:

> Now whether or not it's a good idea to support this kind of dependency stuff can be controversial. In practice though clearly it does cause problems the larger your codebase gets as you're more likely to have some nested dependency chain that's time-consuming to upgrade so you'd rather move faster than make sure you're only running 1 version of the dependency.

Consider the view that sometimes it can be sloppy & other times it's not, & it's impossible to distinguish between the two in an automated fashion.

How so? The version ends up being part of the type, so a type from two different versions of a given package are not compatible, which solves most if not all of the issues.

If a package is just using a particular library internally, I don’t see why the package manager should prevent using it with another library that depends on a different version.

> I don’t see why the package manager should prevent using it with another library that depends on a different version.

I do. The main reason for Linux distributions to exists is to provide a development and running environment where:

- API/ABIs do not change for the whole lifetime of the distribution. No new features, no new bugs, no new vulnerabilities, so that your production code can run reliably for 5+ years.

- Vulnerabilities are fixed with minimally invasive patches.

- Vulnerabilities are fixed in reasonable times even if the upstream development stopped. Patches are well tested against the set of packages in the distribution.

You simply cannot have these 3 features together if a distribution ships 10 different versions of each library.

It's already a ton of work to maintain packages in stable distributions.

I’m confused by your comment. This is about a programming language package manager, not an OS package manager. Or was that just an example?

It does a bad job of dealing with versioning conflicts and multiple projects, so Python developers resort to hacks like virtual environments to get work done. Compared to Cargo or even Go modules, it's not a great solution. It's also missing lots of features that are standard in other package managers.

It may depend on your experience with each language. I have much more experience with Python than Go (and almost none with Rust), and therefore I have a much better time with Python packaging tools (I don't remember a single issue for which I didn't find a satisfactory solution -- as much as possible in the packaging world, with its myriad conflicting use cases).

For example, my experience with `cargo` (which I mostly use to install command-line utilities such as rg, fd, dust): it is great when it works as written in the instructions, but sometimes it doesn't (running `cargo` may involve a lot of compiling -- in contrast to `pip`, which can use wheels transparently and avoid compiling even for modules with C extensions; I guess there might be a way to do something similar with `cargo`, though not by default).

The need for a virtualenv has nothing to do with pip. Python only has “global” dependencies due to the way its import system works.

Ruby works the same way, but bundler dependency manager solves it anyway to give you per-project dependencies not just system-wide dependencies. (I believe other well-liked dependency managers like cargo are largely based on bundler's semantics).

Perhaps ruby was more "hackable" by bundler. (Bundler has now become part of ruby stdlib, but didn't start out that way, it definitely started hacking around the way the more fundamental stdlib 'rubygems' worked).

> Ruby works the same way

Kind of, if you ignore Rubygems, which is also part of stdlib at a lower level than bundler (and also, originally wasn't.)

> but bundler dependency manager solves it anyway to give you per-project dependencies not just system-wide dependencies.

It can do that because rubygems manages multiple installed versions of packages and allows per-project ("per call to require", potentially, IIRC) specification of which one to pull from the globally-installed versions (this was originally done by monkey patching require when rubygems was an add-on.) This lets bundler easily live on top of it providing per-project dependencies somewhat more smoothly than Rubygems does without requiring anything like a venv.

> Perhaps ruby was more "hackable" by bundler.

Ruby is ludicrously hackable, yes.

> (Bundler has now become part of ruby stdlib, but didn't start out that way, it definitely started hacking around the way the more fundamental stdlib 'rubygems' worked).

Rubygems also wasn't part of stdlib originally, and started out relying on hacking around the way Kernel#require works.

> It can do that because rubygems manages multiple installed versions of packages

Oh wow, the default python dependency manager only lets you have one version of each package installed system-wide?

Yeah, that is a limitation. As opposed to rubygems (the first dependency manager although as you say not originally built-in to ruby) which has system-wide install, but always let you have more than one version installed.

Without fixing that one way or another, there's no sensible way, true. virtualenv is certainly one way to fix it. I wonder if there would have been a more rubygems way to fix it.

Honestly I found the multiple versions approach more complex, confusing and more hacky. A virtualenv is just “node_modules” that also contains a Python executable.

It’s a directory - you delete and create them at will, fast, and don’t worry or care about the system Python. Having some crazy setup that patches “require” to handle concurrently installed package versions seems insane, especially if you cannot actually use them concurrently in the same Ruby process. So, segmenting them by project (aka virtualenv) seems like the best solution.

so they’ve just automated the virtual environment creation then. There’s nothing about pip that is global or not global. It unzips files from pypi into a directory. Python, not pip, doesn’t really support a “node_modules” style setup. We use virtual environments (venv/) which is somewhat similar.

Sort of. The equivalent of "virtualenv" would be more "rvm gemsets", which is what people did before bundler. Bundler is doing something different.

Bundler is not a "node_modules" style setup. It does not require dependencies to be in a local path (although they can be, the default is they live in a system-wide location, and this does not limit functionality). It also does not support more than one version of a dependency in the same execution environment (as node_modules does) -- that really would be impossible in ruby too.

It's possible something about python's design would make the bundler approach impossible, I don't know. But it's not "dependencies are installed globally" alone, as that's true of ruby too.

We would probably all benefit in understanding better how these things are handled in other environments. And I include myself here. I think ruby's bundler really set a new standard for best practices here, and many subsequent managers (like cargo) were heavily influenced by it, although many don't realize it. But meanwhile many don't even realize what they are missing or what's possible.

Like the basic idea of having a specification of top-level dependencies (including allowable ranges) separate from a "lockfile" of exact versions in use of ALL dependencies... is just so hugely useful I never want to do without it, and I think is compatible with just about any architecture, and yet somehow JS is still only slowly catching on to it.

Not quite true. Python import mechanics are quite hackable and controllable programmatically and externally via controlling the relevant PATH env variables. It's just that no one bothers with it and instead seems to rely on the set of global folder-path lookup mechanics that are standard.

But nothing stops you from masking global-libraries with local library versions (similar to node_modules). Why hasn't anyone done this you may ask? I don't know the answer to that.
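To make the "import mechanics are hackable" point concrete, here is a minimal sketch of masking a globally installed module with a project-local copy by prepending a directory to `sys.path` (the directory and the module name `shadowed_mod` are invented for the demo):

```python
import os
import sys
import tempfile

# A project-local directory that shadows any globally installed module
# of the same name, node_modules-style.
local_dir = tempfile.mkdtemp()
with open(os.path.join(local_dir, "shadowed_mod.py"), "w") as f:
    f.write('VERSION = "local-0.1"\n')

# Entries earlier in sys.path win, so the local copy masks a global one.
sys.path.insert(0, local_dir)

import shadowed_mod
print(shadowed_mod.VERSION)  # the local copy, not any global install
```

A real tool would do this per project (or via `PYTHONPATH`) before the application imports anything.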

There are a few reasons you can’t have multiple versions of the same module at the same time. Consider a simple enum class defined in a package: two versions now have two different enum objects which may not compare equally. Maybe a function from module A would return Enum.A and pass it to module B, which would then compare A.Enum.A to B.Enum.A, which fails. Super confusing.
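The identity problem above is easy to demonstrate: load the same module source twice under different names (a stand-in for two installed versions -- file and module names here are invented) and compare their enums:

```python
import importlib.util
import os
import tempfile
import textwrap

# One source file standing in for a package that two "versions" would ship.
src = textwrap.dedent("""
    import enum
    class Color(enum.Enum):
        RED = 1
""")
path = os.path.join(tempfile.mkdtemp(), "colors.py")
with open(path, "w") as f:
    f.write(src)

def load_as(name):
    # Execute the same file as a fresh, independent module object.
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

v1 = load_as("colors_v1")
v2 = load_as("colors_v2")

# Each copy defines its own Color class, so members compare unequal.
print(v1.Color.RED == v2.Color.RED)  # False
print(v1.Color.RED == v1.Color.RED)  # True
```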

So yes, Python’s import system is dynamic enough to do crazy things, but I don’t see how we can ever retrofit that into Python.

Regarding dependencies installed into a project-local directory (node_modules): that’s a virtual environment. Just more flexible.

What's wrong with virtual environments? The required tools come bundled with Python nowadays and are super easy to use.

Agreed 100%; in fact, of the languages I work with regularly (Python, Java, C#, JavaScript, Go), Python has the simplest dependency management solution via virtualenv + pypi + pip. Not sure why every Python thread turns into a conversation about the pain of Python dep management... seems overblown.

The biggest piece that is missing is you have to go out of your way to get sane dependency management. I have only used JS and F# (same dependency management as C#) out of your mentions, and it's the official tools that enable the local dependency management with a single command ("npm install X" or "dotnet add package X").

If you're using python, you don't know to check out pyenv/etc until you have a huge mess on your computer due to pip's behavior.

Hm.... Let me see. The number of operations I have to perform in Maven vs Python for independent packages:

Maven Scala project - create skeleton, add libraries to POM, write app, run app

PIP Venv Python project - create venv, enable venv, create requirements file, write app, run pip to install dependencies(possibly install GCC and extra libraries), run app

(Oh... and god forbid that you forget to deactivate venv)

You're lying when you say that library management is easier in Python. It's just factually untrue.

You don't have to activate the environment. I never do; it's a strictly optional convenience (if you think it's convenient).

Instead, simply run the interpreter installed in the environment when you run your app, e.g. "./my_env/bin/python my_app.py", and things will just work. No activation required, no special mode, nothing to forget.

The part about requirements.txt and installing packages could also be simplified if you did it the other way around: install first and create the requirements file from that:

  $ python3 -m venv my_env
  $ my_env/bin/pip install some-dependency
  $ my_env/bin/pip freeze >requirements.txt
  $ my_env/bin/python3 my_app.py
There you go. Setup, install and run in four steps and zero modes.

That's literally one step more than Maven.

That's before you get to package your app...

Most of the steps you're listing take only a few seconds. They're talking about the actual management, not how long it takes you to type `. venv/bin/activate`.

A few seconds here and a few seconds there - it's death by a thousand papercuts.

There's no community consensus - that keeps Python from advancing to where it needs to be.

I said it once and I'll say it again - Python lacks mature tooling.

You can't have death by a thousand papercuts when there are exactly three delay-papercuts adjacent to a step that takes a significant amount of time.

There are contexts where little delays matter, and you didn't pick one of those.

Maven is far more complicated and burdensome when compared to virtualenv and pip... pom.xml, what a righteous mess of XML and overly specified nonsense.

This is just to run your app, to package it - it's a whole different headache in Python.

It's literally a `mvn jar` and that's it!

> so Python developers resort to hacks like virtual environments to get work done

It's really hard with many deps, it's why cabal (for instance) moved away from a global model.

> in what sense is pip not a package manager?

It is a package manager but it lacks features that many other package managers have in Ruby, Node, Elixir, and other languages.

For example there's no concept of a separate lock file with pip.

Sure you can pip freeze your dependencies out to a file but this includes dependencies of dependencies, not just your app's top level dependencies.

The frozen file is good to replicate versions across builds but it's really bad for human readability.

Ideally we should have a file made for humans to define their top level dependencies (with version pinning support) and a lock file that has every dependency with exact pinned versions.
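For a sense of what the lock half would contain, here is a hedged, stdlib-only sketch of deriving exact pins from the current environment (essentially what `pip freeze` does; the "lock" framing and function name are invented):

```python
# Dump every installed distribution as an exact pin, the raw material
# a lock file would be generated from.
from importlib.metadata import distributions

def lock_lines():
    pins = {
        d.metadata["Name"]: d.version
        for d in distributions()
        if d.metadata["Name"]  # skip broken metadata
    }
    return [f"{name}=={version}" for name, version in sorted(pins.items())]

for line in lock_lines():
    print(line)
```

The missing piece pip doesn't provide is the other half: a separate, hand-edited file of top-level deps that this output is regenerated from.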

FWIW I had a lot of success using https://github.com/jazzband/pip-tools to have dependencies automatically managed in a virtualenv.

* Basically I would have a single bash script that every `.py` entrypoint links to.

* Beside that symlink is a `requirements.in` file that just lists the top-level dependencies I know about.

* There's a `requirements.txt` file generated via pip-tools that lists all the dependencies with explicit version numbers.

* The bash script then makes sure there's a virtual environment in that folder & the installed package list matches exactly the `requirements.txt` file (i.e. any extra packages are uninstalled, any missing/mismatched version packages are installed correctly).

This was great because during development if you want to add a new dependency or change the installed version (i.e. pip-compile -U to update the dependency set), it didn't matter what the build server had & could test any diff independently & inexpensively. When developers pulled a new revision, they didn't have to muck about with the virtualenv - they could just launch the script without thinking about python dependencies. Finally, unrelated pieces of code would have their own dependency chains so there wasn't even a global project-wide set of dependencies (e.g. if 1 tool depends on component A, the other tools don't need to).

I viewed the lack of `setup.py` as a good thing - deploying new versions of tools was a git push away rather than relying on chef or having users install new versions manually.

This was the smoothest setup I've ever used for running python from source without adopting something like Bazel/BUCK (which add a lot of complexity for ingesting new dependencies as you can't leverage pip & they don't support running the python scripts in-place).
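The "installed list matches requirements.txt exactly" check in that bash script can be sketched in Python too (function and variable names invented; a real wrapper would shell out to pip for each delta):

```python
# Compare installed packages against a pinned requirements.txt and
# report the drift that would trigger installs/uninstalls.
from importlib.metadata import distributions

def drift(requirements_path):
    pinned = {}
    with open(requirements_path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "==" in line:
                name, version = line.split("==", 1)
                pinned[name.lower()] = version
    installed = {
        d.metadata["Name"].lower(): d.version
        for d in distributions()
        if d.metadata["Name"]
    }
    to_install = {n: v for n, v in pinned.items() if installed.get(n) != v}
    to_remove = sorted(set(installed) - set(pinned))
    return to_install, to_remove
```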

> Sure you can pip freeze your dependencies out to a file but this includes dependencies of dependencies, not just your app's top level dependencies.

Isn't that a good thing?

> no concept of a separate lock file with pip.

setup.py/.cfg vs requirements.txt, no?

> Isn't that a good thing?

Yes, a very good thing.

> setup.py/.cfg vs requirements.txt, no?

A lot of web applications aren't proper packages in the sense that you pip install them.

They end up being applications you run inside of a Python interpreter that happen to have dependencies and you kick things off by running a web app server like gunicorn or uwsgi.

For a Python analogy vs what other languages do, you would end up having a requirements.txt file with your top level dependencies and when you run a pip install, it would auto-generate a separate requirements.lock file with all deps pinned to their exact versions. Then you'd commit both files to version control, but you would only ever modify your requirements.txt by hand. If a lock file is present that gets used during a pip install, otherwise it would use your requirements.txt file.

The above work flow is how Ruby, Elixir and Node's package managers operate out of the box. It seems to work pretty well in practice for ensuring your top level deps are readable and your builds are deterministic.

Currently there's no sane way to replicate that behavior using pip. That's partly why other Python package managers have come into existence over the years.
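For illustration, the two-file layout being described might look like this (package names and versions invented):

```text
# requirements.txt -- edited by hand, top-level deps only
flask>=1.1,<2.0
gunicorn~=20.0

# requirements.lock -- generated, every dep pinned exactly
click==7.1.2
flask==1.1.2
gunicorn==20.0.4
itsdangerous==1.1.0
jinja2==2.11.2
markupsafe==1.1.1
werkzeug==1.0.1
```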

I don't understand the distinction you're making. Are you pip-installing or not? If not, why not?

My method for deploying a web application is to have a Dockerfile which pip-installs the Python package, but I could see someone using a Makefile to pip-install from requirements.txt instead. In fact, I use `make` to run the commands in my Dockerfile.

> Are you pip-installing or not? If not, why not?

I am running a pip install -r requirements.txt when I do install new dependencies. I happen to be using Docker too, but I don't think that matters much in the end.

Docker does matter, because the Docker image should take the place of requirements.txt (your "locked" dependencies) in your deployment process. I suggest you pip-install the package, rather than the package's requirements.txt file.

> Docker does matter, because the Docker image should take the place of requirements.txt (your "locked" dependencies) in your deployment process.

In practice it doesn't tho.

Let's say I'm working on a project without a lock file and commit a change that updates my dependencies. I get distracted by something and don't push the code for a few hours.

I come back and push the code. CI picks it up and runs a docker-compose build and pushes the image to a container registry, then my production server pulls that image.

With this work flow there's no guarantee that I'm going to get the same dependencies of dependencies in dev vs prod, even with using Docker. During those few hours before I pushed, a dep of a dep could have been updated so now CI is different than dev. Tests will hopefully ensure the app doesn't break because of that, but ultimately it boils down to not being able to depend on version guarantees with Docker alone.

There's also the issue of having multiple developers. Without a lock file, dev A and B could end up with having different local dependency versions when they build their own copy of the image.

I've seen these types of issues happen all the time with Flask development. For example Flask doesn't restrict Werkzeug versions, so you wake up one day and rebuild your image locally because you changed an unrelated dependency and suddenly your app breaks because you had Werkzeug 0.9.x but 1.x was released and you forgot to define and pin Werkzeug in your requirements.txt because you assumed Flask would have. The same can be said with SQLAlchemy because it's easy to forget to define and pin that because you brought in and pinned Flask-SQLAlchemy but Flask-SQLAlchemy doesn't restrict SQLAlchemy versions.

Long story short, a lock file is super important with or without Docker.

Use the same method to verify in dev as in staging (Docker image). If you don't know it works in staging, then you didn't know in dev either.

yes, but don't underestimate the power of convention.

if you make pip run 'pip freeze > requirements.txt.lock' after every 'pip install whatever', you almost solve that particular problem if setup.py is configured to parse that (it isn't by default and there's no easy way to do that!)

That's the whole point of distinguishing between logical dependencies and reproducibility dependencies. I use setup.cfg to describe the logical dependencies, I supply a requirements.txt (or environment.yml, or a Dockerfile) to provide the tools necessary to create a deployable build.

Isn't that effectively the result of a typical `setup.py -> pip compile -> requirements.txt` flow?

The setup.py file contains a human readable designation of requirements and then `pip compile` generates a requirements.txt with all deps' (and deps of deps') versions specified.

Honestly - My non-polite, personal impression of all the complaints is that they're borne from a very specific development environment. One that includes lots of dependencies similar to "npm", wants "docker style" local development, and the devs seem to think dependency-management is hard and you need complicated "semantic" versioning and version operators... but really it's just because they're working in a complex ecosystem of microservices.

But for the other 99% of projects: most of their dependencies won't break compatibility, you'll never uncover a hard version dependency that the package manager can't solve, you'll never need to "freeze" your dependency versions, and you can pretty much just rely on a semi-persistent environment with all your necessary packages installed and semi-regularly updated. Essentially smooth sailing.

GP comment was objecting to "stupid" package managers. Maybe they think pip is stupid, because it's unquestionably a package manager.

in the functioning sense.

care to elaborate?

I'd love to see Poetry take off. I'm watching it pretty closely.

Same, we switched from pipenv ~6 months ago, had not had to worry about package/env related stuff since then, "just works".

"evangelized" a sibling team also to consider switching, they were sceptical but just recently they mentioned they also like it more.

Same here. Poetry has been a joy, after many bouts of frustration with pipenv.

Python’s biggest hurdle is and always will be speed. Pip and wheels aren’t great but I would rather stop using golang. That’s a bigger win for me to stop maintaining proficiency and tooling for a language that I honestly consider inferior for the way I think about problems.

Strongly disagree. Python’s adoption is primarily driven by how easy it is to get started (barring some nasties in datetime, typing, and of course the package manager).

Python is plenty fast for most automation tasks.

You are talking past the point I am making. No doubt Python has excellent adoption because the language is incredibly idiomatic and easy to understand -- thus why it is my favorite language. Python is great in all of the ways you just stated but it could be better and I am saying that speed for CPU bound tasks is the way it could be better. I am so tired of binning Python in favor of golang the moment latency becomes important, I would like to use it all the time.

Python is fast enough for many tasks, including number crunching for which it excels (as long as you are using the right libraries).

Projects like Cython allow you to tweak, without too much effort, the parts of a program that need an extra boost.

Last but not least, there have been discussions in the Python community in the last weeks of ways to speed up considerably the default (CPython) implementation.

Hopefully, all of this will bear fruit in the next 2-5 years. Guido will probably help from his new position at Microsoft.

Only because libraries have to use C bindings to make it fast enough. I don't think Python performance is good enough when I have to stop writing in the language that we are benchmarking to get good results.

> discussions in the Python community in the last weeks of ways to speed up CPython

This piqued my interest. I found this, by Mark Shannon: https://mail.python.org/archives/list/python-dev@python.org/...

Is that what you mean?

Python's biggest hurdle is its lack of good tooling.

Try building a Unix app, with data in standard Unix locations (bin, share, lib, etc.), and you'll find that you have to write custom code.

And Google searches don't help :(

I've never used anything I liked better than Maven Central + Gradle. Why can't everyone just copy the Java ecosystem when it comes to package management.

(Although the maven package ecosystem seems ideal to me, Gradle is just "good enough" - it's mostly the standards around versioning and tooling dealing with versioning and everything being there that makes it good to me).

Poetry is very good: https://python-poetry.org/

It should be heavily promoted as the official option and included with the distribution.

Microsoft already bought NPM so there's precedent.

Yeah, but do we REALLY want M$ to control, github, npm, node, rust, python, and ...

M"$"? 1999 called, they want their edginess back...

> M"$"? 1999 called, they want their edginess back...

You realize that we're potentially just one leadership change away from returning to the 'bad old days', right?

You're missing the big picture. In 1999, Microsoft's revenue was at least 10x that of the closest competitor.

It controlled the client (desktop) and was working on controlling the server, too.

Google, Facebook, Netflix, etc didn't exist. Amazon was much smaller and in a different niche. Apple was trying to not keel over in 2 months.

We're not a leadership change away from anything. The world has changed.

> We're not a leadership change away from anything. The world has changed.

The world has changed, but I didn't mean "the bad old days when Microsoft had a stranglehold over the industry" but just "the bad old days of an evil Microsoft". Their market power isn't relevant to whether they are a bad actor, only to how large their impact is as a bad actor.

We still have Idris and ATS.

Considering that half of all prominent functional PL researchers are hired by Microsoft Research I won't be too confident about that.

Cargo for Python… that's pretty much Poetry.

I was already in love with Python, but Poetry has definitely made me a much happier Python developer.

pip is completely fine

It is, unless you want to ship.

Just ship a container to the server.

Hmmm... Let me check if all of my users run a server on their computers.

Sure... Ship that docker to Spark master node and... do what exactly?

pip in virtualenv. It is a solved problem.

Not even pip developers believe "pip in virtualenv" makes python dependencies and package management a "solved problem"...

90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv).

The rest is too vocal.

> 90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv).

I doubt that even 90% of those professional python developers who believe that python dependencies is a solved problem believe that it is solved with pip and virtualenv; the conda faction has to be bigger than 10%. Plus there's the people that think pip/venv aren't enough, but that tools on top of them plug the gaps (poetry).

But I think that the share of professional developers who see it as a solved problem at all is less than 90%. Obviously, we've all got some way of working with/around the issues, that doesn't mean that we don't feel that they exist.

If I can give my anecdote, the last 3 companies I worked at that were heavily using Python and the hundreds of developers in them were all relying on pip and virtualenv. And it worked just fine no matter what the HN crowd would have you believe.

conda had minor usage in the last one for building a handful of special projects mixing C++ and Python code (highly specific code in the finance industry), after build the artifacts could go into the python repository (internal pypi) and be usable with pip. Everything was down to pip at the end of the day. As a matter of fact, the guys who used and pushed for conda were also the ones pushing the hardest for pip because pip is the answer to everything.

Well - you can add one more to your anecdote - my current job is 13 active developers, 200+ megabytes of 1500+ .py files developed over 9 years by 30+ developers. It's all virtualenv/wrappers + pip.

Our data scientists like Conda - but our developers don't touch it.

Well, if the consumers are all devs, then sure.

My pet conspiracy theory is that Docker was created in part because Python code is un-shipable otherwise.

>90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv).

99% of professional python developers think that you've pulled this statistic out of your ass!

> >90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv).

> 99% of professional python developers think that you've pulled this statistic out of your ass!

72.6% of all statistics are made up, anyway.

Spot on!

I'm talking about Susie Q and Joe Sixer. The amount of fiddling with package systems, build, and container systems is an anti-pattern. One guy on a small team sets up and controls the stuff. Individual contributors shouldn't be messing with the env or putting new packages in :)

I often wonder what problem is being solved by Poetry, that pip with virtualenv(+wrappers) doesn't solve perfectly well. requirements.txt ensures you have the right version of everything in production, and virtualenv lets you explore various versions of libraries without risking any destabilization of your development environment.

I struggle to see what spending time looking at Poetry will yield in terms of any actual benefit, though I would love to be informed/educated otherwise.

virtualenv always feels like a hack to me. Too many times I’ve forgotten to activate the virtualenv halfway through a project and now I’m troubleshooting all the problems I just caused with some packages installed in the virtualenv and some globally and oh half of them aren’t in my packages.txt so now I can’t remember which package I needed for this...

I don’t expect my dev tools to be idiot proof, but they should at least try to be “I’ve been hacking for 18 hours straight and I just need to commit this last line and I can finally go to bed” proof.

One thing that helps is:

  # Don't let people touch anything in the root environment
  export PIP_REQUIRE_VIRTUALENV=true

That prevents you from ever doing a pip install in your root environment.

I've got about 20 different pip environments, and virtualenvwrapper ("workon xxxx") makes it pretty seamless for me to hop back and forth between sessions. I'm also pretty dedicated to doing all my work in tmux windows - so my state in a half dozen virtualenvs is changed by changing my window (in which I've done a workon).

I guess what I'm really interested in is: "The last three years of my life I've used virtualenv/wrappers + pip, and haven't run into any problems. What can Poetry do for me, and why should I change my work habits?" Genuinely interested in using new and better tools.

Same here. 20 years on python and I've never done anything more than clone a repo, venv and pip install -r requirements. And then cd && python -m to run the source code, maybe set PYTHONPATH in some cases.

I don't think I've even written a setup.py .

Obviously there's a whole world of development and deployment where these things are relevant, but there's also a massive world where nobody even understands what they are missing.

My current gig has a 9 year python repo, 3526 .py files over 228 megabytes, 13 (current) developers. It's all managed with virtualenv/pip install. So - even larger projects seem to get by okay - would love to read something on Poetry that just says, "Here is why it's a lot better than virtualenv/pip"

Agreed. I don't get the fuss about pip/setuptools/virtualenv. I have shipped and deployed Python code countless times, never encountered an issue a `setup.py` couldn't solve.

> An official package manager with great dependency resolution would be fantastic.

Need something like cabal. And a package index.

Conda works well.

Have never used cargo - what can cargo do that conda cannot?

Conda works slightly better than pip, which is a pretty low bar. Python package management is probably the absolute worst thing about that language.

Still, it worked perfectly for me for Python, and a few more things. So I ask - what problems does conda have? And which of those does cargo not have?

Alternatively, what does cargo do better than conda if they are not feature-for-feature comparable ?

This is probably a hard problem, but the dependency graph resolution in conda is a thing of nightmares. It is so appallingly slow even for relatively menial tasks that updating environments becomes an exercise in frustration.

I'm unsure what are the core issues, but in my experience cargo was always pretty quick to use and if it fails, it fails fast.

Conda on the other hand is slow for the simple cases, and if the graph becomes complex it will just churn for 15 minutes before throwing its hands in the air and giving up with some cryptic error.

I suspect it comes back to the fact that packaging and dependency management was thought about upfront for Rust and the whole ecosystem was built well from the get go?

I've been meaning to write a blog post on this topic. That is, on the various speeds of various package managers. I don't know a ton about Conda, but when I was looking into Poetry, one of the core issues is: https://python-poetry.org/docs/faq/#why-is-the-dependency-re...

> This is due to the fact that not all libraries on PyPI have properly declared their metadata and, as such, they are not available via the PyPI JSON API. At this point, Poetry has no choice but downloading the packages and inspect them to get the necessary information. This is an expensive operation, both in bandwidth and time, which is why it seems this is a long process.

Cargo doesn't need to do anything with packages directly to do its job; everything it needs is in the index. This makes it pretty fast.

>> This is due to the fact that not all libraries on PyPI have properly declared their metadata and, as such, they are not available via the PyPI JSON API. At this point, Poetry has no choice but downloading the packages and inspect them to get the necessary information. This is an expensive operation, both in bandwidth and time, which is why it seems this is a long process.

This sounds like something that could be done server side, either by PyPI or another entity and expose it through a new API endpoint, instead of doing it on every Python developer's machine.

The core issue is that setup.py can be non-deterministic, so them doing it server side may not give the right results.


conda taking 15+ mins to resolve dependencies is by far its biggest weakness.

If it doesn't fail to "solve the environment" after 30 minutes that's already a win.

How do you ensure a team of 30 developers are all using the same version of python and all of the dependencies of your project with conda? How do you distinguish prod dependencies from dev dependencies? How do you update and manage transitive dependencies?

There are more, but those are the big three.

Conda manages Python itself as one of the dependencies - so that’s not actually a problem.

I used conda to manage 2.6, 2.7 and 3.3 side by side, and that was fine. I never locked the patch version (e.g. 2.7.3 vs 2.7.5) though that is definitely possible.

It apparently requires a specific workflow - explicitly editing the requirements.txt file rather than freezing it - which is not harder, and which I have done since day 1, but is apparently uncommon.

(And it worked well across a few tens of machines, some running windows and some running Linux, with a mix of os versions and distributions. So I know it worked well already in 2013, and I’m sure it works better now).

Speed is not good, but I never had it take more than a minute for anything. Some people here are reporting 15 minutes resolution times - that is a real problem.

You didn’t answer the issues I raised.

Actually I did.

You make sure everyone uses the same Python version by setting it as a dependency. I mentioned I only set the dependencies on minor version (e.g. 2.7) rather than patch version (e.g. 2.7.3), but the latter is supported - for the python version as well as for any other package.

You make sure the exact versions you want are in use by editing and curating the requirements.txt rather than freezing it. It really is that simple, but somehow that's not a common workflow.

prod vs. dev is the only one I didn't address because I don't have experience with that - but I know people who manage "requirements_prod.txt" vs "requirements_dev.txt" and it seems to work for them.
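A sketch of that curated file, with illustrative (hypothetical) package names and pins - note that conda treats the interpreter itself as just another pinned package:

```text
# requirements.txt, hand-edited rather than generated by freeze
python=2.7         # pin the interpreter like any other dependency
numpy=1.16.6       # direct dependency, exact pin
requests=2.22.0
libgfortran=3.0    # transitive dependency, pinned explicitly too
```

A file in this format can be fed to conda with `conda install --file requirements.txt`, and the prod vs. dev split mentioned above becomes two such files.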

How do you manage the specific version of transitive dependencies? For example, I install library A and it requires library B, but your code doesn’t use B directly?

A better question, what’s your workflow for installing a dependency and 6 months later updating that dependency?

Just put them in your requirements.txt even if you don’t depend on them directly.

That’s basically the whole idea of conda’s solver, I think. If your list works - fine. If not, it will find a set of package installs that makes it work.

I guess that’s also why my resolution times are way faster than what some people describe.

I treat requirements.txt closer to a manually curated lock file than a wishlist (which most req.txt files in the wild are).

This is a giant pain in the ass compared to every other modern language. Either you’re manually managing precise versions for all direct and transitive dependencies or you aren’t running exactly the same versions of libraries across team members and environments which would be horrifying.

There is a command (not freeze) that dumps the exact versions deployed, similar to a lock file. I never used it, because it doesn’t work well across operating systems (e.g. do it on Windows and try to use it on Linux). This is less of a Conda problem and more of a Python problem - same package name/version has different dependencies on different architectures.

But manually curating the dependencies has been painless and works fine for me and my team for almost a decade now.

I prefer that my tools do the busy time consuming manual work when possible. Every other language I use has better tooling than python for this.

And those were the biggest headaches on my team, bar none. An absolute nightmare to coordinate - and it seemed like even minor things that should be unrelated, e.g., a locally installed system (not python) package slightly off-version, could eat a developer's time trying to resolve.

conda can manage python versions itself as if it was another package.


i've never used that particular feature, but then i've been using python since 1.5 and i'm just used to it being a bit behind the times. stockholm syndrome, you might say, especially after trying out rust and seeing the work of art cargo is.

In addition to the other replies, one I've encountered is the case where conda doesn't provide a build for a package (thus, one must use pip or something else to manage that dependency) causing weird issues with respect to maintenance and updates.

The two worst are in the other comments: ensuring sync'd dependencies across multiple environments and developers, and the horrendous resolution times leading to a useless error message when failures occur.

Just work, instead of not working? https://github.com/conda/conda/issues/9059

Wow, color me impressed /s.

You found a bug report for a problem specifically on fish on MacOS, which can be resolved by "deactivate / reactivate" according to the discussion.

Were you trying to say something?

>> which can be resolved by "deactivate / reactivate"


Feature selection, dependency overriding, workspaces, to name a few.

conda has "environments" which are isolated from each other and can each have specific versions of python and dependencies installed in them (and each has their own "pip" local installation as well, in case something wasn't specifically packaged for conda).

What are those "workspaces" you refer to?

What is "feature selection"?

If only there was a way to find out about such things... (wistfully looks into the distance)

Package management and performance. Microsoft has done an impressive job of making software development better lately, and if any community needs it, it’s Python. Python has a lot of potential, but it’s sorely hampered by poor package management and performance, which are ultimately both symptoms of its commitment to exposing the entire CPython interpreter as a stable public interface. If the Python leadership were willing to commit to a narrower interface that met the goals of the c-extension community and also allowed Python to improve with respect to performance and package management (and I think this is very possible, hpy gets us close afaict), then we could have the best of both worlds, and I’m optimistic that Microsoft could provide this leadership.

What poor performance?

At this point, this criticism has become a "thing" that no one really expands on or enumerates as if it's a given. It's not a given, and at the very least is nuanced and complicated.

I’m not sure where you’re getting your information, but Python itself is quite slow (100-1000X slower than Go or Java) and it’s only “fast” when it’s dispatching to C or Rust or FORTRAN, and even then (to your point about nuance) these optimizations are only feasible sometimes; namely, when the cost of serializing is lower than the efficiencies gained by the optimizations available in the target language. This is all pretty widely discussed, including here on this very forum.

No doubt that Python is sometimes fast enough (e.g., vanilla CRUD apps that do all of the heavy lifting in Postgres), but sometimes it’s not and you’re left with really crummy optimization options. And since we rarely know with certainty at the outset of a project whether or not the bottlenecks will be amenable to the optimizations afforded by Python, it’s a dangerous game. I would even go so far as to say that other languages have become quite good at many of the things that Python is good at (namely pace of development) while being much better at the things that it’s not good at (performance, package management, etc), so I actually wouldn’t recommend starting new projects in Python except for certain niches, like scientific computing (and who knows if Python will even retain its dominance there).

I'm curious about sources for your claim of Python being 100-1000x slower than Go or Java. While the lower bound of the cited range is realistic (but only for certain problems), the majority of the range is highly unlikely. According to my sources, the difference is not that dramatic: depending on a specific type of a problem, it ranges from 1.5x to 100x. For applications using relevant Web frameworks, the composite difference is just 4-5x.

As for scientific computing domain, I would start a new project in Julia rather than Python.

It’s a ballpark estimate from various real world benchmarks I’ve done over the years (I’ve spent a lot of my career optimizing Python, including rewriting things on occasion). In the case of web framework benchmarks, I’m guessing that the fastest Python web frameworks are almost pure-C and the benchmark is probably running very little actual Python code (while the Go benchmarks are pure-Go). This doesn’t extrapolate well to real world applications, where the web server is never the bottleneck and your bottleneck tends to be app code that isn’t easily rewritten in C. Mostly the 100-1000X figure is intended to compare pure-Python code with pure-Go or pure-Java.
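To make the pure-Python vs. dispatch-to-C distinction concrete, here is a minimal, illustrative microbenchmark: the same reduction written as an interpreter-level loop versus the builtin `sum`, whose loop runs in C. The sizes and iteration counts are arbitrary choices, not from any of the cited benchmarks.

```python
import timeit

data = list(range(100_000))

def py_sum(xs):
    # every iteration here goes through the bytecode interpreter
    total = 0
    for x in xs:
        total += x
    return total

t_py = timeit.timeit(lambda: py_sum(data), number=50)
t_c = timeit.timeit(lambda: sum(data), number=50)  # builtin sum: the loop runs in C

print(f"pure Python: {t_py:.3f}s  builtin sum: {t_c:.3f}s  (~{t_py / t_c:.0f}x)")
```

The exact ratio varies by machine and interpreter version, and it widens further as the per-element work gets heavier, since it is the interpreter overhead per iteration - not the arithmetic - that dominates.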

I guess your ballpark estimate is too rough (and/or Python 3 has improved significantly since then). The sources / benchmarks that I used to back up my claims [1-3] imply use of pure Python, not "almost pure-C". That includes Web frameworks (FastAPI, which I compared to Java / Go frameworks, is pure Python; it is in turn based on Starlette, a pure-Python implementation of an ASGI framework).

[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[2] https://github.com/frol/completely-unscientific-benchmarks

[3] https://www.techempower.com/benchmarks/#section=data-r19&hw=...

Nobody would pick Python because it performs well. In many (and as performance of computers in general improves, more) cases, it performs well enough though. For one (admittedly absurd) example, where it just doesn't: a network device driver in user space: https://github.com/ixy-languages/ixy-languages

I like the way npm does. It has a cache of all downloaded versions and it tries to flatten the dependencies when possible. If there are incompatible versions they remain deep in the dependency hierarchy.
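Illustratively (package names hypothetical), the flattening described above looks like this on disk:

```text
node_modules/
├── a/                 # hoisted to the top level
├── b/                 # b@1.x, shared by the app and by a
└── c/
    └── node_modules/
        └── b/         # b@2.x: incompatible with b@1.x, so it stays nested under c
```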

Npm has so many woes though. Love the idea but the execution leaves much to be desired.

Can you go deeper on this, honestly interested. I use NPM all the time and have not run into too many issues.

Like what?

I had a job interview today and said this exact last sentence ahah

Maybe they should just adopt the Nix package manager, and put some of Microsoft’s resources into making it work on Windows.

Just use poetry (my preference) or pipenv. This has been a solved problem for a long time.

Better not move to NuGeT.eXe

Definitely this, but I doubt such a luminary of computer science will be interested in such work.

The XKCD comic on Python package managers/Python environments is not an exaggeration. I've always wanted to get more into Python but every time I attempt to, it's this hurdle that dissuades me.

Edit: Also, I guess Poetry is another thing that came along since my last attempt.

> The XKCD comic on Python package managers/Python environments is not an exaggeration.

Thanks, I haven't seen it before. It's really quite close to the state on my home machine:

https://xkcd.com/1987/

It's not that I couldn't eventually resolve the current state, it's that at the moment a few programs I regularly run work, in spite of the different dependencies they have, and I know that starting "cleaning up" will cost too much time.

> > The XKCD comic on Python package managers/Python environments is not an exaggeration.

> Thanks, I haven't seen it before. It's really quite close to the state on my home machine:

> https://xkcd.com/1987/

Hmm. I've had machines with similar states, though never with both Anaconda and Homebrew.

Instead, I often had (in addition to the rest of the system-level snarl) isolated per-app Python interpreters that were specified & constructed with Buildout for development testing & deployment.

Though I don't quite get the arrow from Homebrew 2.7 to Python.org 2.6. Is that actually a thing or hyperbole?

> Is that actually a thing or hyperbole?

It's fiction, of course. Art often has to exaggerate to make a point. I exaggerated too, just to remain in the artistic rather than the technical spirit of the comics. But there are some important bits of truth there, like in every good joke.

Of course. I just wasn't sure which bin that bit belonged to.

Is python the worst for package management?

I thought C and C++ might have similar issues since there's no unified package management there either.

What’s wrong with virtualenv + requirements.txt + pip?
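For reference, the workflow in question is roughly the following sketch (the `/tmp/demo-venv` path is illustrative):

```shell
set -e
# one virtualenv per project: an isolated interpreter + site-packages
python3 -m venv /tmp/demo-venv

# this pip only sees /tmp/demo-venv, never the system site-packages;
# freezing it snapshots exactly what the environment contains
/tmp/demo-venv/bin/pip freeze > /tmp/demo-venv/requirements.txt

# dependencies would then be installed into the env with:
#   /tmp/demo-venv/bin/pip install -r requirements.txt
```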

What's wrong with poetry right now?

defaulting to installing and finding packages locally would be a big win for me.

I jumped down the black hole of pip + geo packages this week. Your comments are absolutely spot on.

Solve "How to deploy my sweet Python script on grandma's computer" problem.

That is, some reasonable way to deploy to someone who is neither a programmer nor a tech person.

Not sure how much has changed since this was written: http://effbot.org/pyfaq/can-python-be-compiled-to-machine-co...

PyInstaller.. well.. it works okay with the standard library.. sometimes it does not. It is the Electron solution.. add pandas to a project and you get a 600MB install.

As developers we can manage to deal with package managers.

In the education space (where Python is pretty big), the biggest problem we have is that Python code isn't 'shareable' so it's hard for kids to show off their creations to Grandma. Sure, there are some products which work to work around that (like repl.it), but the core problem remains - a student can't easily get their Python program to run on an arbitrary computer. It's why I'm very tempted to move to JS for teaching, despite all of the dragons lurking there.

I agree, it’s one of the reasons I stopped teaching my kids Python and switched to JS. It is so easy to share and so portable, and runs everywhere!

Not just to grandma. I have trouble deploying my scripts to my colleagues.

I had this with a colleague the other day:

1. Install foo

2. No not that foo!

3. Reinstall python

4. Goto 1

I did that to myself so many times...

Such a shame things like that used to be simple - Py2Exe was fine and Windows didn't freak the hell out at the sight of an unsigned binary.

I'm hoping for a Jupyter/Excel hybrid, but that's mostly wishful thinking.

That's arguably the goal of [Pluto.jl](https://github.com/fonsp/Pluto.jl). It's a (Julia-specific) reactive notebook that automatically updates all affected cells and has no hidden state.

That looks so good that I am thinking about learning Julia.

Well then I would like to throw this your way. https://mitmath.github.io/18S191/Fall20/

Thx I will check that.

Ooh that's really good, thank you for sharing!

Please no. Excel is great when it stays within its lane and use case and doesn't try to be everything. Jupyter is okay-ish is some places but in general is way overused. Mixing them together would be a move in the wrong direction and a bit of a mess.

If you mean to retire VB in favor of Python for writing macros, I'm all for it. Not much Jupyter though

There's been a few projects in this space (PySpread is the first that jumps to mind), but also, not too long ago (last year maybe?), MS was investigating making Python a first-class inhabitant of Excel, so might already be in the pipeline.

Oh no.

At the top of my wish list there is a deployment tool for web applications like capistrano / mina for Ruby. I ended up writing my own for the project of a customer. It's been running for a few years now.

Same customer, another project, I'm experimenting with a deployment system based on git format-patch. It copies the patches on the server (we have only one server) and applies them with patch. Then restart the web app.

It's fun to learn the internals by rewriting the tooling, but good tooling to start with would be better.

I'm not sure that Python will profit from this. He has had 50% positions before, and most of the new things he did were precisely in the period in which he did not work for a corporation.

If anything, this further corporate influence on Python development is not something to be applauded. I bet there will be lots of more churn in the next five years, big announcements and no results.

If I remember correctly, Guido resigned as BDFL in 2018.

This is good news. It's sad that I find a lot of interesting possibilities on Azure end up being "windows only." Hopefully having more support for Python will fix some of that.

>Congrats to him for finding something fun to do in retirement - dictators usually end up with a different outcome. ;)

I guess he is going for a Diocletian cabbage farmer style retirement.

Does that mean I will need to switch from mypy to pyright?

I hope they use Microsoft's money to make Python fast (or at least fast enough)

Why dictator? lmao I'm not familiar with the guy beyond the fact that he's PHP's creator

In the off chance this isn't a joke: Python, not PHP. And he wasn't just its creator, he was the head of the project for its whole life until his recent retirement. Typical term in the OSS community for one guy who is in charge of a massive OSS project is Benevolent Dictator For Life, or BDFL. And Guido was definitely one of those.
