“Python's batteries are leaking” (pyfound.blogspot.com)
552 points by narimiran on May 18, 2019 | 409 comments

Note that similar issues were raised with the Ruby stdlib. They are being addressed in part by the “Gemification” of the stdlib: all of it (targeted for 3.0, though the work has been going on since 2.4)[0] is being moved out into externally updatable packages that are still included by default (default and bundled gems). That way Ruby is still “batteries included”, but the batteries are at least replaceable.

Amber's suggestion seems to be in the same direction (though perhaps not as extreme).

[0] https://www.slideshare.net/mobile/hsbt/gemification-for-ruby...

Python cannot be atomized effectively, and the issue is political.

The problem is that I cannot count on being able to install new software in many environments.

If I fight the battle to get centralized IT to install Python, I now have a guaranteed set of standard libraries as well. I'm never going to get permission to install anything other than default. Ever.

Consequently, the standard libraries need to be very complete and very useful.

And, while people seem to love the Rust approach to libraries, I'm not necessarily a fan. Far too many times I have pulled a library that is "obviously" something that a language should consider to be "standard library" and gotten bitten because it was broken. Only VERY core libraries in Rust are guaranteed to work across multiple architectures and operating systems.

I think Rust is probably doing the right thing for Rust as "batteries included" is NOT one of its tenets. However, that doesn't make it right for everybody else.

> If I fight the battle to get centralized IT to install Python, I now have a guaranteed set of standard libraries as well. I'm never going to get permission to install anything other than default. Ever.

Can you explain this more? What kind of place do you work? I've had some experience with large, bureaucratic companies, but nothing ever so far as "you can't install any other libraries."

Not who you asked, but I work for a large international company, a Big 4 professional services firm. I wanted to install Anaconda and Jupyter on my machine (data science is not part of my 'official' job description, but I wanted to see how much of my data exploration workflow I could speed up or automate). I had to go up three separate hierarchy ladders to get sign-off: first my own team, then IT, then our risk and quality team (thanks to our audit practice and a few historical issues with whistleblowers and leaks, the risk guys are pretty much the final arbiter of... everything).

After about 9 weeks of emails, meetings, and pitches, I finally got Anaconda up and running. A week later, I tried to upgrade the third-party packages. No dice: blocked by the corporate VPN. I'd need sign-off every time I wanted to `pip install --upgrade` anything.

Needless to say, I do not bother anymore.

I also work at a Big 4 firm, perhaps the same one as you, and I assure you there is a way. There is always a procedure you can follow; you just need to know how to look it up.

We have our own development team, our own servers, and the freedom to deliver to clients fast without the hassle of the main corporation. How? We talked to the right people.

Big 4?

I can't imagine myself working in a place like this. I understand there should be some level of checks, but this is just crazy...

Hierarchies, and the systems or checks they serve, exist at places like this only to keep some people employed. That has got to suck! Anyone with a rogue or novel idea will get shot down, because getting any decision made carries too much overhead.

I don't think your company deserves you.

Where I work, there is currently a push to get Python onto the computers that manage physical equipment operation. These computers are not allowed to connect to the internet and have extremely limited connectivity to the rest of the business network. Installing anything new on them requires risk assessments like you wouldn't believe, since the consequences of malicious code could easily hit tens of millions of dollars and a nonzero number of lives.

If your risk assessment says that the exact same tkinter outside of Python stdlib is riskier than in Python stdlib, maybe your risk evaluation process needs reevaluation.

I hear this sentiment frequently. Come on, one software engineer cannot steer the huge ship that is BigCo Risk Assessment. At least, not while also doing the job they were actually hired for.

It might be more helpful to think of these kinds of external factors as fixed points that cannot be moved, and just engineer around them.

You'll burn out if you try to boil the ocean on every business process that doesn't seem "logical" from your cursory examination.

On one hand, this is true. On the other hand, this is being put forth as a reason to not make a change in the entire Python ecosystem, and it's not really Python's job to bend over backwards for shops that have bad risk assessment either.

You can't even prove it's the exact same code, given the lack of a code-signing infrastructure for Python packages (wheels can be signed, but that is far from widespread).

And setup.py is a trainwreck: some packages download and compile huge dependencies (e.g. a full Apache httpd...), the default compiler flags may lack mandatory security flags (e.g. for ASLR on Python 2.x), or packages ship their own statically linked copy of OpenSSL and break your FIPS-140 certification that way...

And since setup.py is a Python file, you can't express build-time dependencies properly. pyproject.toml lets you do that, but it's new, nobody knows about it, and older pip clients don't support it.
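Short of real package signing, pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in the requirements file) pins the exact bytes of each artifact. The underlying check is easy to sketch with `hashlib` alone. Note that it only proves a file matches a digest you already trust, not who published it, so it is no substitute for signing; the helper names here are illustrative.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, pinned_hex):
    """True if the file's digest equals the pinned one (constant-time compare)."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())
```

Where the pinned digest comes from (a reviewed lock file, an internal mirror) is the part the organisation has to trust.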

Yes, but it won't get it. And at the end of the day, people need to be able to get work done.

The corporate world is full of stupid things that will never change, or that take years to change.

Where I work, the solution was a proxy for PyPI: basically an internal pip repository (and Docker, npm, Maven, everything else...). All internal apps go through the internal repository, which creates a local copy of each package from PyPI. That gives the security/compliance folks a way to block packages with security issues, etc., while still giving developers the flexibility to get most of what they need.

In a large company this gives the compliance folks a central place to blacklist packages - along with a trail of what systems have downloaded the package to target for upgrades.
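The blocklist check such a proxy performs can be sketched in a few lines. Everything here is hypothetical: the `BLOCKLIST` contents, and the restriction to simple `name==version` pins. A real repository manager implements its own rules. Names are normalized as described in PEP 503, so `Example_Pkg` and `example-pkg` hit the same entry.

```python
import re

# Hypothetical compliance blocklist: package name -> blocked versions
# (an empty set means every version of that package is blocked).
BLOCKLIST = {
    "example-pkg": {"1.0.0"},
    "abandonedpkg": set(),
}


def normalize(name):
    """PEP 503 name normalization: runs of '-', '_' or '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_pins(lines, blocklist=BLOCKLIST):
    """Split 'name==version' lines into (allowed, blocked) lists of (name, version)."""
    rules = {normalize(k): v for k, v in blocklist.items()}
    allowed, blocked = [], []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError("unpinned requirement: %r" % raw)
        key, version = normalize(name.strip()), version.strip()
        bad = rules.get(key)
        if bad is not None and (not bad or version in bad):
            blocked.append((key, version))
        else:
            allowed.append((key, version))
    return allowed, blocked
```

Logging which system requested each package would give the audit trail mentioned above.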

Many technical solutions exist, but the problem is political or organisational.

Agree. At this point it was more a case of executives saying they wanted internal dev teams to use and contribute to open source and supporting orgs to come up with solutions on how that can be possible with a 0-touch approach. That’s what tipped the balance.

I disagree. Tcl/Tk is written in C, and C can be compiled very, very badly indeed (from a security perspective).

Maybe a stupid question but can you ship code on the machine? If you can, what is stopping you from including the source of the library that you're trying to 'install'?
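Vendoring usually means copying a library's source into a directory inside your own project and putting that directory on the import path. A minimal sketch, assuming a hypothetical `_vendor/` directory and a hypothetical single-file module `tinylib`; the demo writes that module into a temporary directory so it runs on its own.

```python
import os
import sys
import tempfile
import textwrap


def add_vendor_dir(vendor_dir):
    """Put a vendored-source directory at the front of the import path."""
    vendor_dir = os.path.abspath(vendor_dir)
    if vendor_dir not in sys.path:
        sys.path.insert(0, vendor_dir)
    return vendor_dir


# Demo: create a hypothetical vendored module "tinylib" in a temp _vendor/ dir.
project = tempfile.mkdtemp()
vendor = os.path.join(project, "_vendor")
os.makedirs(vendor)
with open(os.path.join(vendor, "tinylib.py"), "w") as fh:
    fh.write(textwrap.dedent("""
        __version__ = "0.1"

        def greet(name):
            return "hello, " + name
    """))

add_vendor_dir(vendor)
import tinylib  # resolved from _vendor/, no installer involved

print(tinylib.greet("world"), tinylib.__version__)  # hello, world 0.1
```

Technically nothing is "installed"; whether policy sees it that way is another matter.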

Many do, but you can get fired for it.

>What kind of place do you work?

Not OP, but same. I'm currently debating with myself whether I should attempt to install PuTTY. Given that port 22 is blocked and it's not needed for my core role, it'll be dicey if I get challenged.

Pulling executable code off some repo...no way that is ever officially passing muster. People might do it anyway, but on a personal risk basis.

>Can you explain this more?

Places that are heavy on confidential financial info, basically. Practically everything I touch is confidential client data, so my employer is naturally jumpy about what software is on my laptop.

Ironically, the above comes full circle... I need PuTTY to get onto a VM in the cloud where there are no restrictions and, crucially, no client data. Nobody cares what I do there; hell, they'll even pay for it thanks to MSDN Enterprise.

I'm glad Windows 10 has OpenSSH now:

    Microsoft Windows [Version 10.0.17763.437]
    (c) 2018 Microsoft Corporation. All rights reserved.

    > ssh -V
    OpenSSH_for_Windows_7.7p1, LibreSSL 2.6.5

I've found the built-in Windows 10 ssh client has trouble with tunneling, so I still use the ssh client included with git.

If you can figure out how to reproduce the issue, I'm sure the team would accept a bug report at https://github.com/PowerShell/Win32-OpenSSH

netsh interface portproxy add v4tov4 listenport=8001 connectport=80 connectaddress=

This isn't useful for tunneling through a remote machine (which is what ssh does).

I've been checking, but I can't add the module. Either it's slow to reach the Enterprise edition or it's blocked. Not sure.

Found a workaround, though: Google Cloud Shell, being *nix, works fine for SSHing about the place. It gets me around the port firewall too.

Nice! Too bad our desktops are still on Windows 7 at the office.

I work for a major defense contractor, and while we have vanilla Python, we are categorically not allowed to download software off the internet and install it without authorization, even on the unclassified internet-connected corporate network.

Theoretically there's a process for requesting new software and getting it approved, but actually pushing it through requires getting one of my program architects to care enough to file the request (As a mere level 2 engineer all I can do is write it, can't submit), then potentially weeks of followup, for 1 specific version of 1 specific package.

As for Python packages: Perl and the Perl packages we need are already approved, because a few senior devs got together and pushed them through 10 years ago (that was before my time, but I understand it took quite a bit of arm twisting). It's more time-efficient to just write Perl than to fight for Python.

It's one of the many reasons I intend to get myself another job for Christmas. :)

As for why the system exists: Cost cutting, in the sense of "the less we invest in infrastructure the more we can divert to sexy hardware for the cameras and shareholder dividends. So long as it's theoretically possible for you to do your work, we don't care how many hoops you have to jump through to do it. And our competition is even worse than us, so we don't have to worry about anyone undercutting."

As a result all our infrastructure is centralized. Programs have to jockey with each other for everything from virtual servers to physical workstations and monitors. Hell the only reason my program has our primary test server is because one of our architects literally overheard a hallway conversation about a program that was spinning down and getting rid of some servers, so he jumped on it.

> Theoretically there's a process for requesting new software and getting it approved, but actually pushing it through requires getting one of my program architects to care enough to file the request

I worked for a much smaller government contractor, but before I left they were moving to a system where you needed approval from the customer in order to get new packages. (For those who don't work in this field, that means you are actually making a request to the contracting representative from the particular government agency for each package you want.) So it wasn't just in-house bureaucracy in the way of progress, and I generally just went without or wrote my own instead of trying to deal with it.

If you must work with one hand tied behind your back you are fucked, and your company is even more fucked. Your priority should be supporting your managers who want to get things done in the struggle against centralized IT managers who want to repress aspirations to work.

I'm afraid the standard library has to be aligned with the needs of more normal users who, as already discussed, want to allow libraries to have their own release cycles and to be more "opinionated" and specialized than the standard library would permit.

> the needs of more normal users

I'm afraid users dealing with that sort of bureaucracy are much more normal than you think, if not the norm. They're just usually not the types of folks that are hanging around HN, or they're at least less vocal.

At least in some places, local install permission may be available, but cross company install permission is much more difficult to get.

As in, it is very difficult to get software installed in general images or on multiuser servers.

There are obviously good reasons for it to be conservative about this.

As far as I understand it, it’s something like “You can install whatever libraries you want, but we’ll only ever install default Python on user PCs.”

Last June I opened a ticket for our procurement department to have an open source license reviewed. It is still open. Our purchasing process is similar to [0].

[0] https://training.kalzumeus.com/newsletters/archive/enterpris...

Maybe it's just me, but sometimes after the initial install I find I need some weird non-standard library to do something.

At which point you have to scale the IT wall all over again if you work at a Fortune 500 company.

> At which point you have to scale the IT wall all over again if you work at a Fortune 500 company.

I work at a Fortune 500 company, and the only wall I have to scale when I want to use a library that nobody at our company has ever used before, is to get someone to check and approve the license (typically takes 1-2 days), and import it into our code repos.

I mean I see your point, but not everywhere is as bad as you make it seem.

Putting code into the standard library doesn't magically create developer resources to maintain it. Indeed, Amber Brown is saying that many libraries in the Python standard library aren't properly maintained. So it's not clear that standard library policy is relevant to the Rust issues you have.

> If I fight the battle to get centralized IT to install Python, I now have a guaranteed set of standard libraries as well. I'm never going to get permission to install anything other than default. Ever.

You could install packages with `pip install --user` to have them install under your home directory.
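To see where `pip install --user` would put packages for the current interpreter, the standard `site` module reports the per-user directory. Whether that directory is actually on `sys.path` also depends on `site.ENABLE_USER_SITE`, which is typically off inside a virtual environment without system site packages and when Python runs with `-s`. A small sketch (the helper name is illustrative):

```python
import site
import sys


def user_site_status():
    """Report where per-user packages live and whether they are importable."""
    user_dir = site.getusersitepackages()
    return {
        "user_site_dir": user_dir,
        "enabled": bool(site.ENABLE_USER_SITE),  # may be None/False when disabled
        "on_sys_path": user_dir in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_status().items():
        print(key, "=", value)
```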

> > If I fight the battle to get centralized IT to install Python, I now have a guaranteed set of standard libraries as well. I'm never going to get permission to install anything other than default. Ever.

> You could install packages with `pip install --user` to have them install under your home directory.

You might find that the issue isn't always technical (i.e. permissions) but policy. A lot of places don't let you arbitrarily download software off the internet onto a system. The parent's point is that a full-featured stdlib means you only need to crank through the policy/approval process once, rather than once for every dependency.

> The problem is that I cannot count on being able to install new software in many environments.

The approach Ruby is taking with gemification and default and bundled gems for standard libraries is equivalent to the traditional standard library if you can't install updates, but superior in other cases.

As I understand it, the Ruby model -- which is suggested for Python above -- is to include the libraries by default but to keep them as separate PyPI packages.

In this case, even if you can't upgrade, you still get the same libraries, just perhaps older versions.

This is close to what Rust is doing, and it's working pretty well, apart from shocking newcomers who expect libstd to be useful on its own.

In Rust, libstd is mainly for interfacing with the compiler and providing interoperability between packages (crates). The wider crate ecosystem is the real standard library, since external crates are as easy to use as the standard library.

For example, libstd doesn't even have support for random number generation. There's a rand crate, which is now on its 6th breaking version. That's perfectly fine, because multiple versions can coexist in one program, and every user can upgrade (or not) at their own pace. And the crate was able to refine its interface six times, instead of being stuck with the first try forever.

This is interesting. Go seems to have the complete opposite stance. The stdlib are some of the most useful and well written packages you can use in the Go ecosystem, and then you have the "extended" standard lib which isn't 100% in the language yet, and even further sometimes concepts from useful community packages make it into the std lib.

As to why this is the case, I think it may be enabled by Go's focus on backward compatibility and its encouragement to upgrade early and often, and by the community's habit of using small interfaces, sometimes from the stdlib itself, like io.Reader and io.Writer, or http.Handler. On top of that, most Go users run the latest version of Go, even in production (per the Go experience surveys).

I am sure it also helps that Google pays people to develop, maintain, and improve the stdlib.

No doubt Go's stdlib is useful, and there's plenty of things it got right. However, it's not immune to making mistakes and having to freeze them forever. The more functionality you add, the harder it gets to get it perfect on the first (and only) try. Searching for "deprecated site:https://golang.org/pkg/" turns up issues ranging from cosmetic mistakes to entire packages being deprecated.

    // archive/zip: FileHeader
    CompressedSize     uint32 // Deprecated: Use CompressedSize64 instead.
    CompressedSize64   uint64 // Go 1.1

    // net/http/httptest: ResponseRecorder.HeaderMap
    // Deprecated: HeaderMap exists for historical compatibility
    // and should not be used.

Requirements may change over time, so even getting something perfect now is no guarantee it will last (e.g. pre-UTF-8 languages froze byte-oriented or UCS-2 strings, even though these were good decisions at the time).

Sometimes improvements are not worth the cost of deprecation and replacement, so things are just left as they are. For example, an HTTP interface designed for request-response HTTP/1 works for stream-oriented HTTP/2, but support for push, prioritization and custom frames is bolted on. Packet-oriented HTTP/3 will add even more stuff that will have to be retrofitted somehow to the old model. Libraries can come and go, but std can't just throw away an old interface and start over.
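The Go `CompressedSize`/`CompressedSize64` pair above shows the usual way a frozen interface grows: keep the old name, add a new one, and mark the old one deprecated. A Python analogue of that pattern, using a hypothetical `FileHeader` class (not a real stdlib API) and the standard `warnings` module:

```python
import warnings


class FileHeader:
    """Hypothetical record where a 32-bit field was superseded by a 64-bit one."""

    def __init__(self, compressed_size64):
        self.compressed_size64 = compressed_size64

    @property
    def compressed_size(self):
        """Deprecated 32-bit view, kept only so old callers keep working."""
        warnings.warn(
            "compressed_size is deprecated; use compressed_size64",
            DeprecationWarning,
            stacklevel=2,
        )
        # This sketch saturates at the 32-bit maximum for sizes it cannot represent.
        return min(self.compressed_size64, 0xFFFFFFFF)
```

The old name never goes away; it just becomes a small, permanent maintenance cost.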

This approach is useful for a while, but once something is in a stdlib, its interface is frozen forever.

(I am trying to tie the two concepts from your reply together, so I'm not saying this to come off as combative, but to understand your objection.)

- What would you define as "a while"? I think Go has done well with it, given that it is almost 10 years old now, and in reality was in development internally at Google for years before that.

- Do you feel like an interface being frozen forever is a bad thing in and of itself? What if the interface does a really good job describing the thing, whatever it may be? For example, Go's io.Reader/io.Writer interfaces.

I agree that sometimes an interface being frozen is bad, but, for example, when Go standardized the context package, it simply added "Context" variants of the existing functions that now take a context. In the database/sql package, you have the old Exec, Query and QueryRow functions, and since 1.8 you also have ExecContext, QueryContext and QueryRowContext. Some people may view this as a reason to have method overloading, but I view it as adding clarity.

Go was and is by design a "boring" language. The core designers didn't have much trouble looking at decades of prior art and getting it mostly right.

Rust libraries should be expected to take a few tries to get right, especially earlier in its lifecycle. There's more possibilities and less experience in the language.

You can see a similar effect in Haskell, which has iterated many basic bits of functionality many times over.

The way to answer your two questions is to combine them: the definition of “a while” depends on how good a job the interface does. If the interface is really good, then it can last a really long time (perhaps as long as the language lasts).

I read some discussion the other day about ways in which the Any type in Rust isn’t as flexible as it could be; since it is frozen, the only way to improve it would be to introduce a new name, such as Unknown. Similarly, in C#, along with adding async support, the standard library added async versions of many methods, e.g. Read now also has a ReadAsync partner.

It does seem that having multiple names for basically the same thing adds a small but tolerable level of overhead to a language. At least if as much as possible is moved out, then projects can choose to only use the latest versions, and live in a world as if past versions never existed.

Having incompatible libraries solving the same basic problems is absolutely not fine. Over time, this becomes a huge issue for composability. C++, for instance, is already in this kind of mess, with STL containers vs. Qt containers vs. homegrown special-case (optimized) containers, and heaps of additional libraries building on each. There are good examples in other programming languages as well, mostly older ones.

The random number generator example is a good one, too. There are a lot of applications that require (reproducible!) PRNG sequences, and sometimes you have to share PRNGs between modules. Looking at the rand crate, I see that its PRNG part was recently mucked around with. If I have two third-party modules that must share a PRNG that I control (say, noise generators for procedural textures), I cannot compose them if one uses the old version of the library and the other uses the new one.

When a crate is part of a public interface, that's harder, indeed. Crates solve it in a few ways:

• crates that expect to be used for interoperability are often split into smaller crates (like API and back-end, or low-level API and high-level API), so that they can evolve some parts without breaking others.

• sometimes breaking changes are technically breaking, but easy to upgrade (e.g. methods renamed, args reordered). In that case most users catch up quickly.

• in desperate cases, a new version can import its own old version and re-export old structs and interfaces that haven't changed, so they're compatible across major versions.

• proper sharing and composition should be done via traits, so that you can implement a trait for any number generator, not just one version of one implementation.
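The last point, sharing through a trait, has a rough Python analogue in `typing.Protocol` (Python 3.8+). In this sketch, two hypothetical noise generators accept anything with a `random()` method, so the caller can hand both the same seeded generator and get a reproducible combined sequence, regardless of which library supplied the generator.

```python
import random
from typing import List, Protocol


class RandomSource(Protocol):
    """Anything that can produce a float in [0.0, 1.0)."""

    def random(self) -> float: ...


def white_noise(rng: RandomSource, n: int) -> List[float]:
    """Hypothetical generator A: values in [-1, 1)."""
    return [rng.random() * 2.0 - 1.0 for _ in range(n)]


def value_noise(rng: RandomSource, n: int, scale: float = 0.5) -> List[float]:
    """Hypothetical generator B: values in [0, scale)."""
    return [rng.random() * scale for _ in range(n)]


def texture(seed: int) -> List[float]:
    # One caller-owned PRNG shared by both generators keeps the output reproducible.
    rng = random.Random(seed)
    return white_noise(rng, 3) + value_noise(rng, 3)
```

Because the protocol is structural, neither generator depends on a particular PRNG package or version.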

I've never coded Rust - is there any distinction between a really important crate used by millions of people and something really obscure with 3 users? Are all the crates subject to security audit?

There is no technical distinction. The community is working on a web-of-trust review tool (cargo-crev), but in the meantime you can see who has published a crate and who uses it. The de facto standard crates are maintained by Rust team members or well-known authors.

OK, so it's kind of informal at the moment.

Maybe in the future we'll see more hacking of libraries (people managing to deliberately sneak exploits in) and in response stronger lockdowns on important library code.

Anyone can upload to crates.io without review.

But then you have six versions that may need security fixes; how do you handle that?

You release fixes for older versions according to semver.
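Under semver, releasing fixes for older versions means each supported major line gets its own patched release, so users only have to move within their current major. A minimal sketch with a hypothetical advisory table (major version to first fixed release); it only handles plain `X.Y.Z` versions.

```python
def parse(version):
    """Parse a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical advisory: first patched release on each supported major line.
FIXED_IN = {1: "1.4.2", 2: "2.0.7", 3: "3.1.0"}


def is_patched(installed, fixed_in=FIXED_IN):
    """True if `installed` already contains the fix for its own major line."""
    current = parse(installed)
    fixed = fixed_in.get(current[0])
    if fixed is None:
        # Conservative: a major line with no backport is treated as unfixed.
        return False
    return current >= parse(fixed)
```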

Yes and no. Until the Ruby maintainers give the gem versions of the internal modules different names, you will get bizarro conflicts when using a gem after the "batteries included" version has already been loaded.

I don't know how many hours I've spent battling Psych errors because of this very thing, but it's way too many. Calling the gem something, anything else, would solve the issue.

It's great that they're unbundling a lot of things, but there's still some serious friction between external and internalized versions of these gems.
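Python has a similar first-match-wins rule: an import is resolved by the first `sys.path` entry containing the module, and the result is cached in `sys.modules`, so a second copy installed elsewhere is silently ignored. (In a default install the stdlib directories come before `site-packages`, so a PyPI package sharing a stdlib module's name is normally shadowed.) The sketch creates two hypothetical copies of a module named `battery` in temporary directories to show which one wins.

```python
import importlib
import os
import sys
import tempfile


def make_copy(version):
    """Write a hypothetical module `battery` reporting `version` into a new temp dir."""
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, "battery.py"), "w") as fh:
        fh.write("VERSION = %r\n" % version)
    return directory


bundled = make_copy("bundled-1.0")
upgraded = make_copy("upgraded-2.0")

# The "bundled" copy sits earlier on the path, so it wins the lookup.
sys.path[:0] = [bundled, upgraded]
import battery
print(battery.VERSION)  # bundled-1.0

# Reordering the path alone changes nothing: the module is cached in sys.modules.
sys.path.remove(bundled)
sys.path.append(bundled)
importlib.invalidate_caches()
import battery
print(battery.VERSION)  # bundled-1.0

# Only dropping the cached module lets the next import find the other copy.
del sys.modules["battery"]
import battery
print(battery.VERSION)  # upgraded-2.0
```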

For Ruby, the EventMachine sub-universe is really in bad shape. EventMachine is creaky and old. Event-aware packages are in short supply and are usually woefully out of date or unmaintained.

This works in Perl, a language that almost everybody criticizes or hates. I've never had any issues in upgrading core packages.

If your project has any third-party dependencies, and so (nowadays) you're going to set up requirements.txt and virtualenv and whatever anyway, I can see that you're going to think things like "this XML parser in the standard library is just getting in the way; I can get a better one from PyPI".

But I think a lot of the value of a large standard library is that it makes it possible to write more programs without needing that first third-party dependency.

This is particularly good if you're using Python as a piece of glue inside something that isn't principally a Python project. It's easy to imagine a Python script doing a little bit of code generation in the build system of some larger project that wants to parse an XML file.
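As an illustration of that glue role, here is a sketch of a build step that reads a small XML file and emits a C header using only `xml.etree.ElementTree`. The XML layout (`<config><option name=... value=.../></config>`) and the output format are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<config>
  <option name="max_users" value="64"/>
  <option name="timeout_ms" value="1500"/>
</config>"""


def generate_header(xml_text, guard="GENERATED_CONFIG_H"):
    """Turn <option name=... value=...> elements into #define lines."""
    root = ET.fromstring(xml_text)
    lines = ["#ifndef " + guard, "#define " + guard, ""]
    for option in root.iter("option"):
        name = option.get("name").upper()
        lines.append("#define CONFIG_%s %s" % (name, option.get("value")))
    lines += ["", "#endif /* %s */" % guard, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_header(SAMPLE))
```

No virtualenv or requirements file is needed for the build to run it.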

I think the biggest problem is going from zero third-party dependencies to one and more. Adding that very first one is a huge pain since there are many ways of doing it with many different trade offs. It is also time consuming and tedious. The various tools like you mention are best at adding even more dependencies, but are hurdles for the very first one.

Not true. The more external dependencies you add, the more likely it is that one of them will break. I try to have as few external dependencies as possible, and to pick dependencies that are robust and reliably maintained. There is so much Python code on GitHub that is just broken out of the box. When people try your software and it fails to install because your nth dependency is broken or won't build on their system, you're lucky if they open an issue. Most potential users will just end up looking for an alternative and not even report the problem.

To add one more third party dependency when you already have some is as simple as adding one more to whatever solution you are already using (eg another line in requirements or running a command).

When you have no third-party dependencies, adding the first one requires picking among trade-offs and lots of work. A subset of choices includes using virtualenv, using pip, using higher-layer tools, copying the code into the project, using a Python distribution that includes them, writing code to avoid needing the first dependency...

* You have to document to humans and to the computer which of the approaches is being used

* Compiled extensions are a pain

* You have to consider multiple platforms and operating systems

* You have to consider Python version compatibility (eg third party could support fewer Python versions than the current code base)

* And the version compatibility of the tools used to reference the dependency

* And a way of checking license compatibility

* The dependency may use different test, doc, type checking etc tools so they may have to be added to the project workflow too

* It makes it harder for collaborators, since there is more complexity than "install Python and you are done"

I stand by my claim that the first paragraph (adding another dependency) is way less work, than the rest which is adding the very first one.

They're typically broken out of the box because they don't pin their dependencies. pip-tools[1] or pipenv[2], and tox[3] if it's a lib, should be considered bare minimum necessities - if a project isn't using them, consider abandoning it ASAP, since apparently they don't know what they're doing and haven't paid attention to the ecosystem for years.

[1] https://github.com/jazzband/pip-tools
[2] https://docs.pipenv.org/en/latest/
[3] https://tox.readthedocs.io/en/latest/

It's trickier than just pinning dependencies, because some libraries also need to build C code, etc. Once you bring in external build tools, you have that many more potential points of failure. It's great. Also, what happens if your dependencies don't pin their dependencies? Possibly uploading a package to PyPI should require freezing dependencies, or do it automatically.

Modern Python tooling like pipenv pins the dependencies of your dependencies as well. This is no longer an issue.

I used a requirements.in file to list all the top-level direct dependencies & then used pip-compile from pip-tools to convert that into a frozen list of versioned dependencies. pip-compile is also nice because it doesn't upgrade unless explicitly asked to, which makes collaboration really nice. I then used the requirements.txt & various supporting tooling to auto-create & keep updated a virtualenv (so that my peers didn't need to care about Python details & just running the tool was reliable on any machine).

It was super nice, but there's no existing tooling out there to do anything like that, & it took about a year or two to get the tooling into a nice place. It's surprisingly hard to create Python scripts that work reliably out of the box in everyone's environment without the user having to do something (which, in my experience, always means that something doesn't work right).

C modules were more problematic (needing an Xcode installation on OS X, potentially precompiled external libraries not available via pip but also not installed by default), but I created additional scripts to help bring a new developer's machine to a "good state" to take manual config out of the equation. That works in a managed environment where "clean" machines all share the same known starting state + configs; I don't know how you'd tackle this problem in the wild.

I do think there's a lot of low-hanging fruit where Python could bake something in to auto-setup a virtualenv for a script entrypoint & have the developer just list the top-level dependencies & have the frozen dependency list also version controlled (+ if the virtualenv & frozen version-controlled dependency list disagree, rebuild the virtualenv).
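A rough sketch of the "rebuild the virtualenv when the frozen list changes" idea, using only the stdlib `venv`, `hashlib` and `subprocess` modules. The stamp-file name is an assumption; `ensure_env` recreates the environment and installs the pinned requirements with the environment's own pip, and is only a starting point, not an existing tool.

```python
import hashlib
import os
import subprocess
import sys
import venv

STAMP = ".requirements.sha256"  # hypothetical stamp file kept inside the env


def digest(path):
    """SHA-256 of the frozen requirements file."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def needs_rebuild(env_dir, req_path):
    """True if the env is missing or was built from a different requirements file."""
    try:
        with open(os.path.join(env_dir, STAMP)) as fh:
            return fh.read().strip() != digest(req_path)
    except FileNotFoundError:
        return True


def env_python(env_dir):
    """Interpreter path inside a venv (Scripts/ on Windows, bin/ elsewhere)."""
    sub = ("Scripts", "python.exe") if sys.platform == "win32" else ("bin", "python")
    return os.path.join(env_dir, *sub)


def ensure_env(env_dir, req_path):
    """Recreate the env and reinstall the pins only when requirements changed."""
    if needs_rebuild(env_dir, req_path):
        venv.create(env_dir, clear=True, with_pip=True)
        subprocess.check_call(
            [env_python(env_dir), "-m", "pip", "install", "-r", req_path]
        )
        with open(os.path.join(env_dir, STAMP), "w") as fh:
            fh.write(digest(req_path))
    return env_python(env_dir)
```

A script entrypoint would call `ensure_env` and then re-execute itself with the returned interpreter.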

I don't know if it'd work the same way, but I've had a lot of success with Twitter's Pex files. They package an entire Python project into an archive with autorun functionality. You distribute a Pex file and users run it just like a Python file and it'll build/install dependencies, etc. before running the main script in the package.

I used it to distribute dependencies to YARN workers for PySpark applications, and it worked flawlessly, even with crazy dependencies like TensorFlow. I'm a really big fan of the project; it's well done.
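For comparison, the standard library ships a much simpler relative of Pex: `zipapp` bundles a directory containing a `__main__.py` into a single runnable `.pyz` archive. Unlike Pex, it does not resolve or install dependencies; pure-Python ones have to be copied into the directory first (e.g. with `pip install --target`). A minimal sketch with a hypothetical one-file app:

```python
import os
import subprocess
import sys
import tempfile
import zipapp


def build_pyz(source_dir, target):
    """Package source_dir (which must contain __main__.py) into a .pyz archive."""
    zipapp.create_archive(source_dir, target=target, interpreter="/usr/bin/env python3")
    return target


# Demo with a hypothetical one-file app.
src = tempfile.mkdtemp()
with open(os.path.join(src, "__main__.py"), "w") as fh:
    fh.write('print("hello from a zipapp")\n')

# Write the archive outside src so it does not try to include itself.
pyz = build_pyz(src, os.path.join(tempfile.mkdtemp(), "app.pyz"))
out = subprocess.check_output([sys.executable, pyz]).decode().strip()
print(out)  # hello from a zipapp
```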


Unless your dependency is a C-header file updated by your distro as part of a new version.

Requiring people to "pay attention...for years" is not the way to build long-term robust software.

The problem is it can fall apart quickly. The XML parsing in the standard library is limited and slow, so most people use lxml instead [0]. So it depends on the case. Counterpoint: pathlib being included is great. It was at least inspired by third-party libraries, but its features are relatively stable, its scope is well defined, and it has relatively few dependencies, so moving it into the standard library is a win IMO, not only for import ease but for consistency.

[0] https://pypi.org/project/lxml/

ElementTree is in the stdlib. It isn't slow and has incremental parsing and so on.

It's also a nice API for dealing with XML.
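The incremental parsing referred to here is `ElementTree.iterparse`, which yields elements as their closing tags are parsed; clearing each element after use keeps memory down on large documents. A small sketch over a made-up RSS-like feed held in memory:

```python
import io
import xml.etree.ElementTree as ET

FEED = b"""<rss><channel>
  <item><title>first</title></item>
  <item><title>second</title></item>
  <item><title>third</title></item>
</channel></rss>"""


def iter_titles(source):
    """Yield each <item>'s title as soon as its closing tag has been parsed."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("title")
            elem.clear()  # drop the item's children and text once consumed


if __name__ == "__main__":
    print(list(iter_titles(io.BytesIO(FEED))))  # ['first', 'second', 'third']
```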

> ElementTree is in the stdlib. It isn't slow and has incremental parsing and so on.

I had enough trouble using it efficiently that I went and wrapped Boost property tree[0] and can happily churn out all sorts of data queries (including calling into python for the sorting function from the C++ lib) in almost no time.

I was taking daily(ish) updates of an RSS feed and appending them to a master RSS file, but sorting was pretty slow using list comprehensions, so now I convert it automagically to JSON and append it as is. No more list comprehensions either: just hand it a lambda and it outputs a sorted C++ iterator.

Though I probably should've just thrown the data into a database and learned SQL like a normal person...

[0] https://github.com/eponymous/python3-property_tree
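For comparison, a stdlib-only take on the merge-and-sort step: parse both feeds with ElementTree and sort the `<item>` elements once with a key function, parsing RSS `pubDate` values (RFC 822 style dates) via `email.utils.parsedate_to_datetime`. The `merge_feeds` function is a hypothetical sketch; it assumes every item has a `pubDate` with an explicit numeric offset, and whether it is fast enough depends on feed size.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_feeds(master_xml: str, update_xml: str) -> str:
    """Append new <item>s to the master feed and keep all items newest-first."""
    master = ET.fromstring(master_xml)
    channel = master.find("channel")
    new_items = ET.fromstring(update_xml).find("channel").findall("item")
    items = channel.findall("item") + new_items

    # Detach the existing items; they are re-added below in sorted order.
    for item in channel.findall("item"):
        channel.remove(item)

    # One sort with a key function: each pubDate is parsed once per item.
    items.sort(key=lambda it: parsedate_to_datetime(it.findtext("pubDate")), reverse=True)
    channel.extend(items)
    return ET.tostring(master, encoding="unicode")
```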

> and so (nowadays) you're going to set up requirements.txt and virtualenv and whatever anyway

If only that were the norm amongst long-tail Python users. Heck, I don't do it; I have the Anaconda distribution installed on Windows, and when I need to do a bit of data analysis I hope I have the correct versions of packages installed.

Making this core to the Python workflow (bundling virtualenv? updating all docs to say "Set up a virtualenv first"?) is the first required change, before thinking about unbundling the stdlib.

I've been developing in Python for 10+ years, and I only set up a virtualenv when developing patches for 3rd-party packages (because the other, commercial environment I work with doesn't work with venvs).

And it is a totally miserable experience on Windows, every single time.

The stdlib would still ship with Python I presume, it would be simply updatable. But that doesn't mean it wouldn't work without a "requirements.txt" (use Pipenv which is light-years ahead).

Last week I wrote a Python script relying purely on the stdlib. Basically, a coworker's shell script had to be adjusted to account for newer data files I was processing, and I am not that experienced with shell-script magic. I typed up 15 lines of Python relying only on the stdlib and was happy: it ran on the server's Python 2.x without an issue. This is the primary selling point of a larger standard library.

But, I do wonder whether it needs to grow. Python is in a stage where adoption of new std library features is inherently slow, not only in third-party libraries like twisted, but also in applications, even if they use newer Python versions.

What kills Python here is that it is most commonly bundled with the Linux distribution or the OS (true also on Mac). This drastically lengthens the cycle before new stdlib features can actually be relied on. Compare this to newer language platforms, where people quite regularly install newer versions on older platforms.

Some recent additions would be fine additions at an early stage of language development, but surely not for Python.

> What kills Python here is that it is most commonly bundled with the linux distribution or the OS (true also on mac)

It doesn't kill it; quite the contrary, it makes it ubiquitous. If you want another version you just install virtualenv. It's the same with Perl. We use the Perl version shipped with the distribution (openSuSE) and deploy to that. It's older, but it's stable and it works. On our dev environment (Mac) we have the same version with all of the modules installed in plenv. We also chose a framework with as few dependencies as possible (Mojolicious). It looks like it was a great choice.

What they need is an Apache Commons or Guava for Python. Both are de facto part of the Java standard library.

I try to avoid Guava because they have a habit of making incompatible breaking changes, and because so many libraries depend on it, it's likely to cause version conflicts. The way Apache Commons puts the major version in the package is much better in that regard.

I have not experienced this running Guava 16-23 in various apps. Maybe they're incompatible, but they're good about security patches for old versions. I have never seen a version conflict between Guava releases.

It's very easy to get a Guava version conflict because (a) Guava frequently adds new stuff, and (b) Guava semi-frequently deprecates and removes stuff a couple versions later.

So all you need is one dep that needs Guava version X with method M that is removed in version X+2 (say) and another dep that needs something new introduced in version X+2, and you have a Guava version conflict. That is, Guava releases are not backwards compatible, because classes and methods get removed.

You can sometimes fix this with a technology like shade or OSGi or whatever to allow private copies but it does not always work.
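The conflict described above is just an empty intersection of version constraints. A hypothetical Python model (integer major versions, made-up `dep_a`/`dep_b` requirements) makes the arithmetic explicit; shading sidesteps it by giving each dependent a private copy, so the constraints never have to be intersected at all.

```python
from typing import Dict, Optional, Tuple

# Each requirement is a half-open range [low, high) of acceptable versions.
Range = Tuple[int, Optional[int]]  # high=None means "no upper bound"


def intersect(ranges: Dict[str, Range]) -> Optional[Range]:
    """Return the versions every dependent accepts, or None if they conflict."""
    low = max(lo for lo, _ in ranges.values())
    highs = [hi for _, hi in ranges.values() if hi is not None]
    high = min(highs) if highs else None
    if high is not None and low >= high:
        return None
    return (low, high)


X = 19
requirements = {
    "dep_a": (X, X + 2),     # calls method M, which is removed in X+2
    "dep_b": (X + 2, None),  # needs an API first added in X+2
}
print(intersect(requirements))  # None: no single version satisfies both
```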

Transitive dependencies on Guava 19, 20, 21 can lead to runtime crashes if your dependencies differ in what guava versions they expected when they were compiled:


She seems to be advocating that Python do pretty much what Perl has ended up doing, which is "we have some batteries, but we haven't been adding new ones for a decade or more".

The reasons are similar, it's a constant drag on core compiler development to need to support various batteries included that most core contributors aren't going to care about, so it's easier to tell people "use CPAN".

There was even talk of "distros" for the interpreter. Where the core bits would be similar to what Linux is, and all the batteries would be provided as collections of add-on packages.

Strangely enough these efforts seem to stop at OS distributors. They really seem to like to install just the one "compiler", and wouldn't stand for a project like Perl or Python telling them "we mean for you to distribute the core compiler plus these 100 packages, because that's what forms our 'language'". "Strangely" because you'd think they'd be the best positioned to make easy work of packaging up such a thing, and it shouldn't in principle make a difference if you need to install 100 RPMs / APTs by default.

And Perl solved that perfectly: just let the OS/distro handle the hundreds of packages. And it has been solved, despite your claiming otherwise in your last paragraph.

When did you have to use cpan in a modern system? Compare that to how many times you had to use pip.

Now, if you use a crappy OS or distro (or god forbid, some container built by you have no idea who on top of nobody knows what) then yeah, you are bound to do the leg work yourself, but you will be doing that regardless of the language/subsystem you are trying to use in that case.

Not to mention that it is the only way to do things professionally. For example, if you must have a system that parses XML but by company policy is not allowed to have even the means of performing a network request: with Python you have both XML and an HTTP library and whatever else included, and you will either have to make a special stripped-down Python+XML-only package or get a corporate exception. In other languages, by contrast, you can install only the XML parser component package and your code will run happily and be compliant with company policy.

> When did you have to use cpan in a modern system? Compare that to how many times you had to use pip.

Well yeah. A sizable amount of new software is still being written in Python. But when I use Perl software (besides my custom scripts), it's always stuff that's old enough that the distribution is carrying packages for it.

If you disagree, please name a significant new software written in Perl that was released in the last, say, 5 years.

name one package you missed in those 5 years.

I see your comment as the goal, not the problem.

> if you must have a system that parses XML but by company policy is not allowed to have even the means of performing a network request: with Python you have both XML and an HTTP library and whatever else included, and you will either have to make a special stripped-down Python+XML-only package or get a corporate exception. In other languages, by contrast, you can install only the XML parser component package and your code will run happily and be compliant with company policy.

...wouldn't the company policy involve removing the means of performing a network request from the computer, making the notional capabilities of the software irrelevant?

Python will let you drive network requests through the OS. It's just that you wouldn't normally want to.

+1 for this. Trying to ban programming languages that support network connections is both foolish and impossible. Any Turing complete language that allows any sort of OS interaction can be used to communicate over a network. Even if you have to manually do the syscalls yourself.

most companies i've worked for have special packages for perl/python/php/etc that compile the interpreter without support for system calls, for example.

That alone has probably paid off handsomely over the years, considering all the XSS we patched, which could very well have been full network compromises.

I don't mean OS distributors can't package up CPAN modules. They can do that, no problem, same for the Python equivalents.

I mean that a significant use people get out of Python and Perl is that they aren't bare-bones like say Scheme or Lua where the standard library is really spartan.

It allows you to write useful code that works on the lowest common denominator of "just OS Perl or Python". Whether that's some random version on whatever Linux distro, or *BSD or Solaris or whatever without needing to write your own getopt library or whatever.

Which is why the "let's ship a bare-bones compiler and have people use CPAN or PyPi" is contentious. In theory it shouldn't matter, and for a lot of shops who install hundreds of packages it doesn't, but it does for people who target stdlib-only, which is a big use-case. Particularly since the people who have that use-case are drawn to these languages.

> Which is why the "let's ship a bare-bones compiler and have people use CPAN or PyPi" is contentious.

But you don't have to ship just a bare-bones interpreter to deal with the problem of stdlib staleness, you just need the stdlib libraries to be updatable via package manager, you don't need to not ship a baseline version of them with the interpreter.

That doesn't deal with the bloat issue raised with relatively unused libraries, but if they are relatively unused because they aren't good rather than because the use case is uncommon, upgradability could solve that.

Of course, you don't solve compatibility for versions before the move to upgradable packages, but at the same time if you solve problems going forward you increase the incentive to upgrade.
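One existing pattern in that direction is the PyPI backport: some stdlib modules have an independently released copy that can be upgraded with pip, and code prefers it when present (for example, `importlib_metadata` is the PyPI release of `importlib.metadata`). A small helper, with the hypothetical name `import_first`, that picks the first importable candidate; the stdlib fallback shown needs Python 3.8+.

```python
import importlib
from types import ModuleType


def import_first(*names: str) -> ModuleType:
    """Import and return the first module in `names` that is installed.

    List the upgradable PyPI copy first and the stdlib module last, so the
    stdlib version is the fallback that always ships with the interpreter.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:  # includes ModuleNotFoundError
            continue
    raise ImportError(f"none of {names!r} could be imported")


# Prefer the PyPI backport if it has been pip-installed, else use the stdlib copy.
metadata = import_first("importlib_metadata", "importlib.metadata")
```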

Sure, in Perl these are called "dual-life" modules. It makes things easy for users, but makes the life of the compiler-maintainer worse.

Now not only do they need to ship a stable compiler plus a large stdlib, but they can't even rely on there being a one-to-one version relationship between the two; instead it'll be many-to-many, as users might use multiple library versions with multiple compiler versions.

> Sure, in Perl these are called "dual-life" modules. It makes things easy for users, but makes the life of the compiler-maintainer worse.

Well, yeah, but you're not going to have much of a language at all if you optimize for quality of life of the language maintainer.

I use cpan constantly, though indirectly via carton. I used to use the OS for Perl libs but this starts to fail hard when you have multiple projects that all demand different versions of stuff.

You use CPAN all the time in Perl development.

The idea of "distros" for python is interesting, and to a certain extent has already happened: just look at Anaconda.

I've been using built-in environment isolation tools such as virtualenv for ages but have recently switched over to using miniconda for all things python. Among other things it has amazing support across all three major OS's, and I happen to be dealing with all three at any given time. Whether one uses miniconda, pipenv, virtualenv, or anything else like it, as far as I am concerned the days of ever using the system python are over. I will always create my own personal "distro" on the fly with full control over the python version and every add-on package.

> the days of ever using the system python are over

You don’t have any Python scripts in your bin folder?

As a non-scientific user of pyenv[0], would I benefit from switching to Anaconda/miniconda?

[0]: https://github.com/pyenv/pyenv

The default "Anaconda" install is like 7 GB after it grabs everything. Turnkey if you're using all that stuff anyway, but otherwise not particularly worth it. Miniconda on the other hand I'm finding meets my needs exactly. The base install is standard.

You don’t have to switch. You can install anaconda using pyenv, to get the best of both worlds.

not likely IMO. i've found conda - which is their environment management tool - to be a hassle unless one needs specific numpy/scipy/GPU libs. i'm using pipsi and pew, although i'll look into pyenv.

Former conda dev lead here. Definitely interested in more details regarding what part of the conda experience you found to be a hassle, if you’re willing to share.

That problem does not seem to happen with the Haskell Platform. Maybe it's because GHC has almost no batteries at all, so nobody thinks it's sufficient, or maybe it's because distros get it in a single package, so it does not feel like installing 100 libraries.

> There was even talk of "distros" for the interpreter. Where the core bits would be similar to what Linux is, and all the batteries would be provide as collections of add-on packages.

Not a million miles from the modularisation that Java has been going through.

The Python standard library has been a huge help for me. Evaluating which third party packages to trust and handling updates is a hassle. (Would love a solution for this. Does anyone have a curated version of PyPI?) I’m surprised that people want to slim it down other than for performance on a more constrained system.

As an aside, why doesn’t the Python standard library extend/replace features with code from successful packages like Requests? Tried it and it didn’t work? Too much bloat? Already got too much on the to-do list?

The quote about the stdlib being "where packages go to die" was, for a long time, considered a feature and not a bug. The theory was that once a package is in the stdlib its development should slow to prioritize stability over new features.

This may somewhat explain why lots of successful packages are not in the stdlib: putting them in effectively killed future development until a few years ago. But today that argument is inconsistently applied and I'm not sure if it's a rule worth keeping.

That phrase never had a good connotation. I believe someone skilled in PR respun that meaning.

> Does anyone have a curated version of PyPI?

PyPI has thrown out the download counter: a huge disservice to coders. Like I've got all day to figure out the best libs for ten different features which I only need in passing; my primary concern is just to not pick complete garbage.

So, my solution to that now is to look up Github pages for the libs and choose the one with most stars. As much as I dislike Github for its occasional typical proprietary behavior, Gitlab doesn't help in this case.

Pypi still has a download counter, but it will tend to reflect which libraries are used in ci, so it's a biased estimate.

If only there were a way to filter machines on the internet by some kind of id number, and to subtract numbers from other numbers at scale.

And what about caches and proxies?

Could you please point me to the location of that counter? Because I ain't seeing it anywhere.

afaik it's not in the UI, but the dataset is published and access is provided by other sites: https://pypistats.org/

Perl has stars on MetaCPAN and every version has a test counter. Download counter is unreliable.

Especially download counters for libraries that make http requests!

Requests depends on urllib3, which would also have to go into the stdlib. It also relies on a CA bundle, which the core devs don’t want to ship. It’s also likely that the internal implementation doesn’t follow core dev standards and practices, a common problem with integrating external libs. Finally, there’s a risk it would slow or discourage new feature development by tying it to the core release cycle.

A better approach might be to add a core of basic requests-like features built from the stdlib’s existing resources. That would be beneficial to many users and if they need more then there’s always Requests.
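A minimal sketch of what such a requests-like core could look like on top of `urllib.request`. The `build_request`/`get_json` names and behaviour are assumptions, not an existing stdlib API; request construction is split out so it can be inspected without touching the network.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def build_request(url: str, params: Optional[Dict[str, str]] = None,
                  headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Encode query parameters and headers into a GET request object.

    Assumes `url` has no query string of its own.
    """
    if params:
        url = f"{url}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers=headers or {}, method="GET")


def get_json(url: str, params: Optional[Dict[str, str]] = None,
             headers: Optional[Dict[str, str]] = None, timeout: float = 10.0) -> Any:
    """Hypothetical requests.get(...).json() equivalent built only from the stdlib."""
    req = build_request(url, params, {"Accept": "application/json", **(headers or {})})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))
```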

That makes sense; if it makes any difference, I don’t necessarily mean “take code from X and drop it in” so much as “if the consensus appears to be that the X API is better, then add that to the stdlib”. I like that notion of adding the X API, or parts thereof, and then having a third-party X+ package. Maybe I just like the idea of something I built being “worthy” of the stdlib.

I've come to use packages outside of the standard library very sparingly; been burned too many times to find that development of some package stopped or slowed down and backing out can be a real pita.

What is your argument here? Standard library module development is also extremely slow.

There is one major difference: an abandoned external package may break with newer python versions, whereas you can always count on stdlib packages being updated for new versions.

Can you provide an example?

To the best of my knowledge, minor version updates in Python 3 have been entirely backwards compatible so far and shouldn't have broken any library code. And as for incompatible changes such as the transition to 3, those of course came with interface changes in the standard library too.

exactly. and i'm in academic software. its even worse.

> Does anyone have a curated version of PyPI?

I'm not entirely sure what you're looking for, but have you tried https://www.enthought.com/product/enthought-deployment-manag... ?

Edit: example

    $ edm envs create tester36 --version 3.6
    $ edm shell -e tester36
    (tester36) $ edm install ipython matplotlib pyqt

That’s interesting. Thanks!


> As an aside, why doesn’t the Python standard library extend/replace features with code from successful packages like Requests?

It is possible (e.g. asyncio was a separate package). It is a slow process, though.

I'd bet that absorbing Requests into the standard library, no matter the particular method of absorption proposed, would present too much of a political challenge to overcome.

pathlib is another example.

Curating packages for quality across multiple versions and architectures is hard work. In addition to Enthought (mentioned earlier), Anaconda maintains a curated set.

> Does anyone have a curated version of PyPI?

https://python.libhunt.com/ Not exactly curation but it does have rated libs for lots of categories.

> Already got too much on the to-do list?

That would be my guess. I would also add that getting through the process of adding a third-party set of modules to the standard library can take quite a while.

Hawk Owl is a _fantastic_ dev. She was the main force behind the twisted 2->3 transition. But because she is, she is missing the point of batteries included.

Asyncio is in the stdlib so that we have an official lib and API. The main benefit is that most people now, when looking for async, are not wondering about Twisted or gevent or Tornado. Most just go with asyncio. Most dev effort goes to asyncio. It's the end of the great async war. Is it perfect? No. And I don't care. It's one thing less to worry about. Those who know what they are doing can still choose to pip install Twisted, but most people don't, and that's solved. Before that, just choosing the lib was a nightmare, as it's basically a definitive call. Put it on PyPI, even with a "stdlib" tag, and we go back to the 200X era. And that was not fun.

And having things like xml/sqlite/ssl available without installing anything makes Python very useful in a load of situations where you can't install stuff. Sometimes you are offline. Sometimes you are in a restricted env. Sometimes you are not on your own machine. Sometimes your security protocol is hell. Don't assume people use Python as we do, from our comfortable dev laptops, driven by the knowledge of our craft. Python is used in banks, by scientists, in schools, by kids, by poor people in the third world, by geographers and pentesters. The Python user base is incredibly diverse, and that's why it's so popular: it fits a lot of use cases.

So I see the benefit of having a side version of official modules we can pip install that can move faster. I see the benefit of cleaning the stdlib of old stuff, like the wave module, Template or @static.

But I'm glad I don't have to install anything to generate a UUID or unzip stuff. I'm glad I don't have to worry about Twisted anymore (despite having written a book on the topic!).
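For instance, both of those tasks need nothing beyond the stdlib. A small sketch; the archive contents are made up, and `io.BytesIO` stands in for a real file on disk (`ZipFile.extractall(path)` would unpack to a directory instead).

```python
import io
import uuid
import zipfile


def make_id() -> str:
    """Random (version 4) UUID as a string, no third-party package required."""
    return str(uuid.uuid4())


def unzip_to_dict(data: bytes) -> dict:
    """Read every file in a zip archive into a {name: text} mapping."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name).decode("utf-8") for name in zf.namelist()}


# Build a small archive in memory, then unpack it again.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes.txt", "batteries included")

print(make_id())
print(unzip_to_dict(buf.getvalue()))  # {'notes.txt': 'batteries included'}
```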

Also, pip install is NOT simple when you are learning the language. I have to spend some time in the classroom, even with adult professionals, explaining the various subtleties of site-packages, the import path, py -x on Windows, python-pip on Linux, -m, virtualenv, header files, etc. before my students become autonomous with it. Without a teacher, this turns into months of bad practices and frustration.

You'd have to fix that first, way, way before moving stuff to pypi. I do think it should be high priority actually: it affects way more than pip.
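Many of those subtleties boil down to "which interpreter am I running, and where will it install packages?". A small stdlib-only diagnostic (the function name is made up) that a student can run to answer that. It detects environments created by the stdlib `venv` module via `sys.base_prefix`, and older virtualenv releases via `sys.real_prefix`.

```python
import sys
import sysconfig


def describe_environment() -> dict:
    """Report which interpreter is running and where `pip install` would put packages."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # In a venv, sys.prefix points at the env and differs from sys.base_prefix;
        # older virtualenv releases set sys.real_prefix instead.
        "in_venv": sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix"),
        "site_packages": sysconfig.get_paths()["purelib"],
        # Invoking pip through this exact interpreter avoids the "wrong pip" problem.
        "pip_command": f'"{sys.executable}" -m pip install <package>',
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:14} {value}")
```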

Having a huge standard library also kills analysis paralysis and lets people be more productive.

If you're in the flow and trying to hack something together, the last thing you need is to lose all momentum picking a date-time library. I've had this issue tons of times with Node and Rust, where I'm not up to date with the current meta and my 30-minute hack job is interrupted 5 minutes in by having to google which library I should use to do an HTTP request. (I've actually lost interest in whatever I was doing a few times because of this.)

Python's stdlib is nobody's favourite, but when you start to get to its limits, you're probably past your flow state, you've written most of the logic and you can spend some time to replace http.client with requests because the latter is much better.

On a tangential note, I've been trying to find another scripting language to replace Python because I'm not a fan of it anymore (I won't get into it right now), and considering what I just wrote, there's not much that can replace it, as most languages have a bare-bones standard library, and if you're not up to date with the current best library to do X, you'll never achieve great productivity.

> I've been trying to find another scripting language to replace Python because I'm not a fan of it anymore (I won't get into it right now), and considering what I just wrote, there's not much that can replace it

Have you considered ruby?

I think Go did this well: they provide a very solid toolset that is nothing fancy, but you can forget about it right away and start producing solutions.

Give it 25 years.

You still need to learn to use the library you picked, even if it’s part of stdlib. Python has two HTTP libraries in stdlib, http.client and urllib.request. The former is low-level, and the latter has a fairly complicated API. Learning to use them will take much longer than just giving in and picking Requests. The stdlib docs will tell you to use Requests. Everyone on the Internet will tell you to use Requests. Any questions you might have for http.client/urllib.request will be answered by “use Requests”.

> I see the benefit of cleaning the stdlib of old stuff, like the wave module

Wait... what? No way! Some of us do use Python to process WAV files. If anything, I'd like this module to be updated, not removed.
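For anyone who hasn't used it: the `wave` module reads and writes uncompressed PCM WAV files. A minimal round trip, writing a one-second 440 Hz tone (tone and sample rate are arbitrary choices) as 16-bit mono and reading the header back:

```python
import io
import math
import struct
import wave

RATE = 44100     # samples per second
FREQ = 440.0     # A4, an arbitrary test tone
AMPLITUDE = 0.5  # fraction of full scale, to avoid clipping


def sine_wav(seconds: float = 1.0) -> bytes:
    """Return a complete 16-bit mono WAV file containing a sine tone."""
    n = int(RATE * seconds)
    samples = (
        int(AMPLITUDE * 32767 * math.sin(2 * math.pi * FREQ * i / RATE))
        for i in range(n)
    )
    frames = b"".join(struct.pack("<h", s) for s in samples)  # little-endian int16

    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:  # closing the writer leaves buf open
        w.setnchannels(1)
        w.setsampwidth(2)  # bytes per sample, i.e. 16-bit
        w.setframerate(RATE)
        w.writeframes(frames)
    return buf.getvalue()


def describe(data: bytes) -> tuple:
    """Read back (channels, sample width, frame rate, frame count)."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes()


print(describe(sine_wav()))  # (1, 2, 44100, 44100)
```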

The argument is not that this module should be removed; it's that it's too niche to be included in every Python installation by default and should be installed through a package manager or similar.

Unmaintained modules should be split off and put on PyPI with a big warning, of course.

> Asyncio is in the stdlib so that we have an official lib and API.

It was pitched first as a common low level async loop for other applications like Twisted and Tornado.

Then people started using it directly and the keywords were added.

It's great for the people who think the way asyncio does, others are now forced to use it. I find all of Twisted, Go and Jane Street's Async easier to use.

Perhaps Python is just the wrong language for me.

>She was the main force behind the twisted 2->3 transition.

Explain? I’ve hated the slow adoption of 3.x from 2.x, and generally how terrible it is to have apps that are 2.x on your 3.x system, and would like to know more about how that happened.

Twisted is a lib, she happened to have contributed a lot to the migration effort for it.

It's funny to me that they're making a point that PyPI is better than core, because actually I think PyPI has created a rather crap ecosystem. The non-hierarchical organization of packages, the lack of curation, the lack of inheriting past functionality and extending it as more standard functionality, etc. has resulted in a confusing sprawl of packages with duplicate, incompatible, buggy functionality. It's a bit like Linux internals; it's grown haggard over time, isn't organized well, is badly documented, and so it's difficult to pick it up and use it without stumbling over a decade or more of stale documentation and obsolete software.

Perl has a much better set of modules that extend standard functionality, which, considering how much flak Perl gets for being hard to read, is rather funny. Rather than every new feature being its own independent project, most of the useful modules inherit a parent and follow the same convention, leading to very simple and easy to use extensions. And Perl Core isn't all that great, but it does have some batteries included, and everything else is extended easily and in a more standard manner by CPAN.

Wow, I've had a really opposite experience with CPAN modules. I've overwhelmingly found them to not respect encapsulation (messing with all sorts of global state, not mentioning that they're doing it, and failing to clean up after themselves or even provide the tools to clean up well), be massively inconsistent in their APIs, have messy and hard-to-parse documentation (still better than Python's conventions here, though), and have some really silly hierarchy-related decisions, most of which I suspect stem from inter-maintainer politics and infighting, of which I've observed a large amount.

Sure, I've found some gems on CPAN, but, having worked in Perl, Python, and Java at reasonable scale for a while, I cannot understand all the praise CPAN gets. It's the worst-quality scripting language package ecosystem out there. Even NPM does a better job, and some things about NPM are awful. CPAN might have been the first/only/best package manager for a get-shit-done scripting language at some point, but not any more.

Separately, I agree about modules which extend language functionality (e.g. class systems, async programming, runtime typing) specifically. Perl does pretty well in that area. While many of those language-extension modules really don't play well with any other metaprogramming tools being installed in the project, I don't imagine that any alternatives in other languages do, either. My main beef above is with "simple" (read: not pervasive semantics changes) modules like IPC utilities, HTTP clients, or loggers that don't know how to stay in their lanes.

What functionality would you like to have on PyPI, in addition to curation?

The big "function" I would like is just organizing the packages differently to get people to think about and use them differently.

Search engines are a "cool" technology that have become the de facto way to find what you're looking for. But if there's a lot of content related to what you're looking for, they can suck.

Go to PyPI and search for "semantic version". 10,000+ projects for "semantic version" found. As you go through page after page of different modules related to versioning, the one module you won't find immediately is Versio (https://pypi.org/project/Versio/), a well-documented and useful module which I ended up using. I have no idea how I found this module, but it certainly wasn't from PyPI's search engine.

Now go to CPAN (really metacpan) and search for "semantic version". Yes, you're still looking at thousands of results - but wait! There are only two modules here that look useful: Version::Dotted::Semantic, and SemVer. And the description comes straight from the docs' README, rather than being a short uninformative blurb. The first module, Version::Dotted::Semantic, is inheriting a separate module, Version::Dotted, and adding some extra functionality. Not only does the search page give more information about the module, but the hierarchy makes it easier to find (and later extend) useful modules in an intuitive way. Since the base module's functionality is boring, generic, and simple, it's less likely that people will make 20 different versions of it, so it'll be reused more often and thus remain stable for a long time.
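The base-plus-extension idea translates directly to Python classes. A hypothetical sketch (these class names are illustrative only, not the API of Versio or any other package): a generic `DottedVersion` base and a `SemanticVersion` subclass that only adds SemVer-style parsing and pre-release ordering. Pre-release tags are compared as plain strings and build metadata (`+...`) is not handled, which simplifies the full SemVer 2.0.0 precedence rules.

```python
import re
from functools import total_ordering


@total_ordering
class DottedVersion:
    """Base class: dot-separated integers, compared numerically ("1.10" > "1.2")."""

    def __init__(self, text: str):
        self.text = text
        self.parts = tuple(int(p) for p in text.split("."))

    def _key(self):
        return (self.parts,)

    def __eq__(self, other):
        if not isinstance(other, DottedVersion):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, DottedVersion):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())

    def __repr__(self):
        return f"{type(self).__name__}({self.text!r})"


class SemanticVersion(DottedVersion):
    """Extends the base with the MAJOR.MINOR.PATCH[-prerelease] shape."""

    _RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?")

    def __init__(self, text: str):
        m = self._RE.fullmatch(text)
        if m is None:
            raise ValueError(f"not a semantic version: {text!r}")
        super().__init__(".".join(m.group(1, 2, 3)))  # reuse the base parsing
        self.text = text
        self.prerelease = m.group(4)

    def _key(self):
        # A release sorts after all of its pre-releases (1.0.0-rc.1 < 1.0.0).
        if self.prerelease is None:
            return (self.parts, 1, "")
        return (self.parts, 0, self.prerelease)
```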

A lot of CPAN's module names have sprawled over time and gotten less useful, but there's still a general convention that you name your module as a hierarchy of what it does (even if it's kind of verbose) and make small, reusable modules, rather than giant modules that are hard to extend. Not all modules measure up to this standard, and there's definitely room to improve, but I think Python modules could benefit greatly from a system like this.

As far as curation goes, PyPI is often filled with cruft. While searching for Jenkins packages, you will come across lots of entries like this: https://pypi.org/project/jenkins2api/. The homepage leads to a GitHub 404, it's only ever had one release, and it has no documentation. This project should probably not have been listed on the main search page, or at least sorted well down the list by default with intelligent filters and marked accordingly. (The "date last updated" and "trending" sorting just results in having virtually no Jenkins-related modules in the results at all)
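A rough sketch of such a default ordering, assuming a hypothetical metadata record per project (PyPI's real API and fields are not modelled here): projects showing the warning signs above (a single release, no documentation, a dead homepage) are pushed toward the bottom rather than hidden.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectInfo:
    """Hypothetical per-project metadata a search backend might have."""
    name: str
    relevance: float    # text-match score from the search engine
    release_count: int
    has_docs: bool
    homepage_ok: bool   # e.g. the linked repository does not return 404


def cruft_penalty(p: ProjectInfo) -> int:
    """Count warning signs; 3 means 'one release, no docs, dead homepage'."""
    return (p.release_count <= 1) + (not p.has_docs) + (not p.homepage_ok)


def rank(results: List[ProjectInfo]) -> List[ProjectInfo]:
    """Sort likely cruft toward the end, then by relevance within each tier."""
    return sorted(results, key=lambda p: (cruft_penalty(p), -p.relevance))
```

Demoting instead of filtering keeps such projects findable by exact name while no longer crowding the first page.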

I agree with Amber’s point that more stuff should be moved from the standard library to PyPI. I made my first pull request to CPython during the development sprints this year, and it’s honestly not the best experience. Everything is built from scratch in CI after every commit, even a documentation change. There’s nowhere near enough CI capacity for all the builds and pipelines everything Python supports would need. Pull requests stay open for several months, and there were at least a thousand PRs open when I checked this morning.

I’m not sure if Python’s ideal solution is to reduce stdlib and have endorsed packages in PyPI, but it would be an improvement over the current process.

The story of Python 2 to Python 3 migration, in a nutshell:

> Van Rossum argued instead that if the Twisted team wants the ecosystem to evolve, they should stop supporting older Python versions and force users to upgrade. Brown acknowledged this point, but said half of Twisted users are still on Python 2 and it is difficult to abandon them. The debate at this point became personal for Van Rossum, and he left angrily.

Hopefully the “python foundation” will declare python 2 deprecated soon so that it can be handed over to responsible maintainers.

I don't understand the lowercase letters and scare quotes around python foundation. Is that not its name? (Okay, it's Python Software Foundation.)

The end of life date is already set: January 1, 2020.

It's open source so I don't know what you're looking for in terms of a formal handover. Yes, I bet Red Hat and others will continue to maintain their own versions past that date.

That's happening: https://pythonclock.org/

To ensure things move along: pip has been printing highly-visible "python 2.7 will deprecate soon" warnings for a couple months or so now.

And backing out of it when running on pypy, as that does not deprecate python 2 compatibility...

Sure. Pypy is a separate implementation, they only control CPython. That's a pretty normal arrangement - official moves on, other forks might backport fixes for longer or focus on stability or some other realm of performance or something.

Probably only on a recent pip version. Pip 10's dependency resolution doesn't like our requirements files (we have contradictory versions that work due to the order they are in the file), so we've mostly only gone up to pip 9.

When I first used Python, like 20 years ago, I was blown away by how much functionality was built in. It can be annoying using languages where even the most basic functionality involves downloading 50 packages from the internet, but on the other hand the standard library does seem to be a mess now.

I found that attractive when I first learned Python, but when I (much more recently) started picking up Rust, I was blown away by how easy and normal it is to use external packages: the build tool and package manager are the same thing and shipped with the language, the hello-world-equivalent docs assume you're using it, and even the Rust compiler and standard library themselves can (carefully) depend on external packages. Having run up against limits of the Python standard library several times in years of writing production software in it and not just learning it, I find the batteries-not-included-but-easy-to-install approach better on the whole. (Rust is not the only language that does this - my impression is Node/NPM and Swift, at least, are similar - but it's the one I happen to be familiar with.)

In Python's defense, this was not obvious at the time; my understanding is Rust came to this approach by looking at the experience of Python and other languages. When Python's standard library was first being written, there were no easy package managers for any language, and the normal thing to do for installing dependencies in e.g. C was to grab random tarballs and figure out how to build and deploy them yourself. So avoiding that process made perfect sense.

> Having run up against limits of the Python standard library several times in years of writing production software in it and not just learning it, I find the batteries-not-included-but-easy-to-install approach better on the whole.

Obviously it doesn't apply to everyone, and it certainly doesn't apply to most startups or open source developers, but I spent most of the last 20 years working in environments where you have to get permission for every third party library you bring on to the network. Many networks were essentially "airgapped", so it's not like you could just ignore the rules. The bureaucratic process alone meant that we preferred large bundles like Anaconda or Qt. Trying to use Cargo as it is typically used and documented would be a complete non-starter.

>Obviously it doesn't apply to everyone, and it certainly doesn't apply to most startups or open source developers, but I spent most of the last 20 years working in environments where you have to get permission for every third party library you bring on to the network.

Situations like this will really make you appreciate "batteries included". I think this particular issue is fairly revealing of the attitudes common among programmers of different languages. I think it's a good thing to be skeptical of a program pulling in a bunch of third-party libraries over the Internet. It worries me when I find something on GitHub I want to try and I can't download and compile it without 30 or 100 other libraries that I haven't looked at or decided to trust coming along for the ride. I don't like that way of doing software, and unfortunately it's the norm in node and starting to become a norm in Rust as well. Real security failures have been caused this way, in node's case at least.

Even in cases when Python programs depend on external libraries, I usually don't need to use pip for anything because Python programs will pull in dependencies provided by your distribution just fine. (My distribution doesn't even have any Rust libraries, so even if dynamic linking is possible in Rust not many people are shipping software that way.)

"Download by default" is a worse way of doing things, and it makes me sad to see newer languages like Go and Rust embracing it.

Standard libraries serve several functions. One of them is to bless certain versions of certain libraries as "known good". This can be done outside the standard library too, without incurring the penalties of actually moving things into the standard library. Rust definitely needs more work in this area though.

Yes, my own employer is similar (you don't need permission, but production systems have no internet access and so you need to pre-download all your tarballs etc.), and cargo doesn't work right. But I think there is work on pointing cargo at an internal mirror.

We do have an internal PyPI mirror (with devpi) and we point `pip` at that, and it works pretty well.

Hm, Cargo should absolutely work in this environment; it’s required by the Firefox and Debian build systems, for example.

(Pointing at an internal mirror is now stable. Setting up that mirror is the hard part.)

You can use `conda install rust_osx-64` on macOS and `conda install rust_linux-64` on Linux to use `cargo` with the Anaconda Distribution libraries and tools (including its compilers).

Rust is good at many things, but it's not as attractive as Python for the sort of program you write when a shell script gets a bit too complicated and you want to write it in a proper language.

So Rust can get away more easily without having support for things like command line parsing in the standard library, because it isn't really trying to support situations where it would be inconvenient to give your program its own project directory and Cargo.toml and all.

This is a pretty niche case, imo. If you really want a stand-alone tool, you can forgo Cargo.toml, compile the single file with rustc directly, and use std::env::args to get command-line arguments.

If it really matters, then do it right (in whatever language makes the most sense).

> When Python's standard library was first being written, there were no easy package managers for any language

Wasn't perl's CPAN developed around that time period (mid '90s) ?

Emphasis on easy :-P Using CPAN in even the late '00s was an ordeal.

It just took ages to install and test some huge, popular modules like Catalyst and Moose. However, once installed and tested they would just work on your system. And it wasn't that easy for module creators and core developers either, as Red Hat was notorious for shipping ancient Perl versions (5.10-era).

Node is ostensibly batteries included, but they’re weird, arcane batteries powered by tears. Therefore, to get anything done easily usually requires an external package.

Node doesn't even have a stdlib. You have to npm or yarn anything. And yes most of the time it will end in tears. That's not my definition of batteries included, but rather some batteries might explode, others might only ignite, we wish you the best of luck.

> Node doesn't even have a stdlib.

Yes it does. Perhaps you're thinking of browser JS, which does not.

What's a stdlib then, if Node's isn't?

Yea. I'm using Rust at the moment for fun and the amount of things not in the standard library is crazy to me. What? There is no built-in dictionary? What do I use instead and where is it?

Edit: based on all the replies below, everyone understands the validity of what I'm trying to say, but has also fortunately pointed out my admittedly grievous error of not knowing you can just import HashMap from the standard library. The extra step is pretty minimal and not a problem. I'm hoping this is covered in the Rust book.

Oof, I wasted a good hour+ trying to convert from one to the other. Maybe there's an easy way to do this, but not many people on IRC knew (or were available at the time).

    use std::iter::FromIterator;

Or .into_iter().collect(), I believe. No import needed.

Assuming the keys are Hash + Ord, wouldn't that just be:

  // &src yields (&K, &V) references; iterate `src` by value if dst should own them
  for (k, v) in &src {
      dst.insert(k, v);
  }


Yes, that's what I ended up doing, but the error confused me and I ended up looking for a less hacky way for more than an hour until I said "fuck it" and just did that.

At least the first one seems pretty straightforward and thanks for replying. I only just started and have a lot to learn.

Are they compatible?

I know it's not the whole point, but there is a hash map and sorted (btree) map in the std collections.

That being said, the point still stands; I specifically remember feeling it when I found there was no regex included. Overall I prefer Rust's slim std lib over Python's massive one, though, especially with the community being pretty good about nominating de facto standard packages, like the one for regex.

Yeah Rust is pretty extreme. I was just trying to generate a random number in it and was pretty surprised to find out that at some point it did have this functionality in the standard library but they actually removed it and now it's a separate crate.

That is actually being considered for inclusion at some point. It was mainly excluded due to it not being ready and them not wanting to commit to the existing API. That said, I like the Rust way. Including a library is pretty easy, and it means you get the best API as the main one (rather than a stdlib function that is "good enough", and another library to use if you really care about that functionality).

If you are using dictionary in the Python sense, Rust has that built in too, you just have to import it from the standard library collections module.


You mean a map? It's in the standard library.

Yep. Map/Hash/Dictionary.

I didn't know it was included in stdlib (I really thought it wasn't) and feel idiotic now. It is still odd (having a primarily scripting background and not coming from the systems side) that I have to include what seems to be essentially an import statement at the top. I guess it is a lot more efficient that way though. Thanks for pointing out my error!

Er, doesn't hashmap do pretty much the same thing? https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

When I first touched Python I had only used C. Is your story similar?

Had I been using Java or Visual Basic or even C++ previously, maybe I wouldn't have been so impressed.

I think the mistake Python is making is messing with its simplicity through decorators, half-assed lambdas and stuff. As a newbie you could understand Python code, while e.g. C++ templates were magic. You need to know more Python to understand Python code nowadays.

Decorators were added in 2004, and I think lambdas were an original language feature (like... before 2004).

There is other new stuff that might be confusing to beginners (like our mighty walrus operator), but those two examples have been there for effectively forever.

Ok, it's maybe just my random perception of the language evolving out of sync with my (moderate) skills in it.

I just had the feeling there's more "stuff" that you need to know.

> She thinks that some bugs in the standard library will never be fixed.

This is actually an interesting paradox to be in, and one that Linus Torvalds recently commented on. His focus, like Guido’s, is the user and even fixing a bug can break the user.


This isn't a paradox. Once it's released it's not a bug anymore, it's just behaviour. Document the behaviour, but breaking compatibility with previous versions is a bug. It doesn't matter how obviously wrong the previous behaviour is.

> but breaking compatibility with previous versions is a bug

That's how you get an inconsistent mess that never evolves. There's something called semver: increase the version number and do the fix / refactors / radical redesign / whatever. People will see that you've gone from version 1.0 to 87.3 in one year and they may choose not to use your thing because you're moving too fast for them, but that's life...

1.0 to 87.3 in one year? More realistic would be python 2->3 in 10 years or so and still too fast...

Better than breaking software. ABI stability matters far more for Linux than for Python here, though.

Linux has no stable ABI :-)

For driver developers, sure, but its userspace ABI is very stable.

Perhaps my wording is incorrect, but you’re 100% capturing what I was trying to get at. Thanks for the clarification!

Guido is a good dude, through-and-through, despite his perhaps bad behavior here.

Amber is nothing short of an open source hero, having brought Twisted, one of the best open source projects in the world, to new heights. Her insights are as important as anyone in the python community, and after six consecutive PyCons sprinting at the Twisted table (including literally in a chair with Amber to my left and Glyph to my right earlier this month), I consider Amber's voice to be one of the truest and clearest among the leadership of the language into the future.

Amber and Guido are both beautiful human beings.

In the dispute that is the topic of this blog post, Amber is basically totally right. Moreover, the distinction has less to do with any kind of nagging python 2 holdover than this article suggests. The standard lib's role as a place where code goes to die is a view that is widely held and accurate for many cases.

The following question went unanswered during the Steering Council Q&A:

"Every feature request has a constituency of people who want it. Is there a constituency for conservatism and minimalism?"

...and that's really what this whole thing is about.

I'm only going from this article. But I don't think the "ranting" was constructive and I feel like it obscured her point. A lot of comments here thought the goal was to include more (less crufty) packages in the standard library. Which is the opposite of her intention.

I agree with a lot of her arguments. I never understood why tkinter was included (I could see why it might have been added years ago, but before Python3 came along it felt obsolete and unused). I've had to support old versions of Python along with their bugs, and it sucks.

But languages are about choices. Python's syntax and decision to use whitespace is a strong choice--so is their stdlib. I've been on projects that saw performance problems with etree, moved to lxml, then had to go back to etree because of missing features. I've spent a lot of time looking over arrow, dateutil, and moment because datetime seemed inadequate. But I like that there is a thoughtful default that serves many needs. A lot of these examples, like requests, are built right on top of stdlibs--so both would be needed even if it was shipped along with stdlib. I'm ok if code goes there to die because I would hope that due diligence was taken when it was included.

I kind of agree with Guido for a lot of this. I can't wait to move from Python 2, and I've looked at a lot of the new stdlib and looked for backports. I've looked at Twisted and alternatives and I'm so happy something is built in, even if it's far from perfect.

The problem with rants is it stings and it divides.

When it comes to constructive criticism, I think Amber did a good job with her criticism but could do better on the constructive front. Her problem statement was spot on and I agree that the direction she proposed is a good one.

However, separating the standard library from the core is probably even more dramatic than the Python 2 to 3 migration. Can the community afford that at the moment? What's needed to make the transition? What are the opportunity costs, i.e., what other developments could we do for a bigger impact? What are the pros and cons?

Is "embrace PyPI and move things like asyncio there" not a constructive suggestion, or is she sort of being penalized because the most reasonable solution to the problem can be described in less than half a sentence, so it seems like there's more complaint than solution?

Yup. Totally agree with you on that. And yes this one is a constructive proposal.

My problem is with the "like" part.

Where shall we draw the line and how do we decide? To me this is a far more interesting discussion. (Maybe it has happened. I don’t go to many conferences these days so I might be missing something here. )

She mentioned http.client vs. requests, datetime vs. moment, etc., which also seem quite correct to me. How about the cgi libs? Or pickle? Or collections? Or unittest? Stay or go?

Lastly, the title of the talk could be tempered a bit, no? We all know what a leaking battery means, right? Toxic.

> Where shall we draw the line and how do we decide?

Why does there need to be a line? As long as the package manager is part of the core distribution (even if it is itself an upgradable package), why not move everything into packages, even if some are maintained by the core team and have the stable version at time of distribution release included with the core distribution, but perhaps installed only on demand?

How would you get standards like unittest?

pip install unitest? Oops, I spelled it wrong. I wonder what I just installed?

> How would you get standards like unittest?

“have the stable version at time of distribution release included with the core distribution”

Ruby, for instance has both “default” and “bundled” gems with the core distribution.


pip install pytest

Kidding aside, most of the problem with the fake libraries could be solved if PyPI namespaced them per author, like GitHub does for repos.

> Lastly, the title of the talk could be tempered a bit, no? We all know what a leaking battery means, right? Toxic.

Come on. Python has had the 'batteries' metaphor for decades and this is a straightforward and obvious play on it. You're bringing 'toxic' into this which I suppose is technically and biochemically correct (both, surely, the best kinds of correct) but has far, far more negative connotations than the title warrants. You're having to work really hard to make a generic thing sound dreadful.

Actually I feel like asyncio is the one thing that should be part of the standard library. It even introduces special syntax.

To be honest, I think she might be biased here given that she maintains a competing package.

> To be honest, I think she might be biased here given that she maintains a competing package.

I hope that you take time to reconsider this view.

We're not talking about competition in the same sense as in a capitalist system, between two companies.

Prior to asyncio, Twisted maintained the only viable flow control for serious asynchrony in python. Put another way, python had no standard flow control for serious asynchronous abstractions in the standard language (and standard library) before asyncio (and `await/async def`, etc) landed.

Nobody is saying that the syntactical changes to python belong in a separate package on PyPI (cue Gary Bernhardt's Pretzel Colon). These things are fine.

But `asyncio.Future`? Yeah, I see a very reasonable argument for that stuff (ie, the asyncio namespace) being in a separate package.

But OK - looking again at the "competing package" narrative: now that these things have landed, Twisted has done an amazing job of using them alongside all the other tooling that Twisted also provides, most notably its test infrastructure.

Amber doesn't stand to personally benefit from asyncio failing. To the contrary, having the flow control taken care of so that Twisted doesn't have to be its sole brainparent gives her much less free work to feel obligated to do.

> cue Gary Bernhardt's Pretzel Colon

Could you provide a reference/explanation for this? Thanks in advance!

Not GP, but Gary Bernhardt is the guy who gave the classic "Wat" [0] and "Birth and Death of JavaScript" [1] talks, and some searching turns up "pretzel colon" as the "&:" operator in Ruby [2]. I assume he's mentioned it in a screencast or something, but I wasn't able to find it.

[0] https://www.destroyallsoftware.com/talks/wat

[1] https://www.destroyallsoftware.com/talks/the-birth-and-death...

[2] https://technology.customink.com/blog/2015/06/08/ruby-pretze...

I thought the title was very clever. I associated the leaking with “leaky abstraction” more than anything.

Maybe Python should upgrade from toxic leaking alkaline batteries to explosive overheating lithium batteries!

As mentioned in the article, this has already been explored with the "ensurepip" approach.

And it wasn't the only thing at the language summit that proposed expanding that approach: there was a discussion of carrying time zone updates in the same way, by shipping something with the interpreter that works but allowing updates from PyPI. http://pyfound.blogspot.com/2019/05/paul-ganssle-time-zones-...

So I think the development community / target audience at the language summit already understands the pros and cons of the suggested approach and the technical route to get there. (For an end user, my guess is the experience will be that anything in the Python 3.x standard library today will still be in the Python 3.x standard library, but you'll have to `pip install` a newer version if you want more features, and you get the benefit of being able to `pip install` something from the standard library where you previously couldn't.)

Is there a limit to how constructive the feedback can be?

What do we know? Software configuration management is a heinous problem.

Python gets it more correct than most.

Let us rejoice, be patient, and respect everyone's good-faith efforts.

Aren't conservatism and minimalism pulling in opposite directions here?

Amber wants to remove asyncio from the standard library. That's minimalism but not conservatism.

I think "gracefully correcting mistakes" is in the usual definition of conservatism: political conservatives who say that recently-recognized rights aren't actually rights or who wish to shut down recently-instituted programs that depend on government spending are no less conservative simply because they want a change to the status quo.

Also note that one way of interpreting her proposal is "generalizing the 'ensurepip' model" to keep things already in the standard library in the standard library, but move feature development externally and make it easy to upgrade packages in the standard library. (Then Twisted, which is also installed externally, can simply depend on a fixed version of a standard-library package.)

>Amber and Guido are both beautiful human beings.

Perhaps, but Guido's response was shitty here (and purposefully refusing to get the point), whether one think Amber was right or not.

I deal constantly with the shitshot that is Python packaging (and lack-luster default packages) and I happen to think she is 100% on the spot.

It took me an hour to create a program that tails some logs and alerts when it doesn't receive any logs for a given amount of time. For this task I did not even need to leave the asyncio module. It lets you create subprocesses and execute call_later on the event loop in order to simulate a heartbeat while reading the output of tail at the same time.

Did asyncio module feel bloated? It certainly did. It seems like every module from subprocess to networking to io is crammed into it.

On the other hand, did it get the job done without resorting to any packages or threading? Yep, and that is pretty powerful and rare.
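A minimal sketch of that heartbeat idea using only the standard library's asyncio (Python 3.7+ for `asyncio.run` and `get_running_loop`). This is an illustration, not the commenter's actual program: `watch_output` and its parameters are hypothetical names. Each line read from the subprocess cancels the pending `call_later` alert and arms a fresh one; if nothing arrives within the timeout, the alert callback runs.

```python
import asyncio


async def watch_output(cmd, silence_timeout, on_silence):
    """Run `cmd` and call `on_silence()` if no stdout line arrives within
    `silence_timeout` seconds. The timer is re-armed on every line, so it
    acts as a heartbeat. Returns the lines read once the process closes stdout."""
    loop = asyncio.get_running_loop()
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    # Arm the heartbeat before the first line arrives.
    timer = loop.call_later(silence_timeout, on_silence)
    lines = []
    try:
        while True:
            line = await proc.stdout.readline()
            if not line:  # b"" means EOF: the subprocess closed stdout
                break
            lines.append(line.decode("utf-8", "replace").rstrip("\r\n"))
            # Output arrived: cancel the pending alert and start a new countdown.
            timer.cancel()
            timer = loop.call_later(silence_timeout, on_silence)
    finally:
        timer.cancel()
        await proc.wait()
    return lines
```

Pointed at something like `["tail", "-F", "/var/log/app.log"]` (a placeholder path), the loop keeps running as long as `tail` does. As written, the alert fires once per quiet stretch and is re-armed only when output resumes; a repeating alert would need the callback to schedule itself again.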

Asyncio is an amazing tool that makes me not hate doing async with Python. There was quite a learning curve with the library, a lot of Lego pieces. But I found the handful I needed, and I learn new ones occasionally.

I think bloatedness of the stdlib isn't actually a practical problem. It's just an inelegance that you kind of have to learn to tolerate.

> six is non-optional for writing code for Python 2 and 3

I maintain a Python 2 & 3 compatible project that has no external dependencies.

I do the same as you (maintain 2/3 code without dependencies), but every time I have to do string encoding/decoding it kills me to find a way that even half works, and I don't have a ready solution in mind that doesn't break half the time. How do you handle non-ASCII in a compatible manner? Like Unicode stdio? Unicode file paths? Unicode sys.argv? string_escape/unicode_escape? I feel like Python 3 completely wrecked strings instead of making them better.

For the most part, when you do I/O to some external system, if it's text, you encode/decode at the border to that system. Internally, all text data is `str` (or `unicode` in 2) and all binary data is `bytes` (in both).
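A minimal illustration of decoding at the border, assuming UTF-8 files; `read_text` and `write_text` are hypothetical helpers. `io.open` exists on both Python 2.6+ and 3 and takes an `encoding` argument, so the same code hands back text (`unicode` on 2, `str` on 3) on both.

```python
import io


def read_text(path, encoding="utf-8"):
    # Decode at the boundary: bytes on disk become text inside the program.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def write_text(path, text, encoding="utf-8"):
    # Encode at the boundary on the way out; `text` must be unicode on Python 2.
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)
```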

In some cases of common OS-induced pain, I'd say "do whatever 3 does" in 2, since that'll make migration easier in the long run. (But I understand that can be hard, and I think my responses to your examples below even demonstrate that to be hard.)

To your specific pain points:

> Unicode stdio?

Mostly, `io` should handle this in both 3/2. You might need to help it get the right encoding in 2.
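One way to get an encoded text stream over stdout in 2/3-compatible code, sketched under the assumption that UTF-8 output is wanted: `io.open` accepts an integer file descriptor on both versions, so `sys.stdout.fileno()` can be wrapped directly. The helper name `open_text_fd` is hypothetical.

```python
import io


def open_text_fd(fd, mode="w", encoding="utf-8"):
    """Wrap an OS-level file descriptor (e.g. sys.stdout.fileno()) in a text
    stream with an explicit encoding. io.open accepts an integer fd on both
    Python 2.6+ and 3; closefd=False leaves the descriptor open on close()."""
    return io.open(fd, mode, encoding=encoding, closefd=False)
```

Call `sys.stdout.flush()` before switching to the wrapper, since both write to the same descriptor through separate buffers; on Python 2 the wrapper only accepts `unicode`.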

> Unicode file paths

This is going to be a mess in any language, because file paths really aren't text. On *nix, they're byte strings that don't have nuls in them. Hopefully they're encoded according to LANG, and hopefully LANG is a UTF-8 variant, but it isn't required, and it isn't required that two users on the same system use compatible LANGs, so you get a Tower of Babel. I really wish OSs would just start enforcing a. Unicode filenames, and b. no newlines in filenames; those two alone would make life so much easier.

Hopefully you've seen os.fsencode / fsdecode, but alas those aren't in 2, so I'm not sure they really help you. Often one is not really munging paths that much, and can just pass through whatever value/type you get, but it does happen, of course. (E.g., adding or removing extensions)
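Since `os.fsencode` / `os.fsdecode` only exist on Python 3, a small shim can use them when present and fall back to the reported filesystem encoding on Python 2. These are hypothetical helpers, not a library API, and the Python 2 branch decodes strictly, so it raises on undecodable names where Python 3's `surrogateescape` handling would not.

```python
import os
import sys


def fs_decode(path):
    """Return a text path (unicode on 2, str on 3)."""
    if not isinstance(path, bytes):
        return path  # already text
    if hasattr(os, "fsdecode"):  # Python 3
        return os.fsdecode(path)
    # Python 2 fallback: strict decode with the filesystem encoding.
    return path.decode(sys.getfilesystemencoding() or "utf-8")


def fs_encode(path):
    """Inverse of fs_decode: return a bytes path."""
    if isinstance(path, bytes):
        return path
    if hasattr(os, "fsencode"):  # Python 3
        return os.fsencode(path)
    return path.encode(sys.getfilesystemencoding() or "utf-8")
```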

> Unicode sys.argv

This is also a pain point, since again, the underlying type on *nix is a byte string without nuls. I'd hope it decodes w/ the LANG encoding, but since the user could easily tab-complete a filename, fsencode/decode might be more appropriate. I think I'd say "do whatever 3 does".

1 Jan 2020 is nearly here. Forget about 2 / assume UTF-8 in 2 and don't support anything else?

> I feel like Python 3 completely wrecked strings instead of making them better.

A clear separation of text and binary is needed in the long run, and makes other operations much clearer and saner. The pain you're feeling is introduced from the OS not having the same clarity.

>> Unicode stdio?

> Mostly, `io` should handle this in both 3/2. You might need to help it get the right encoding in 2.

Does io handle stdio? I was referring to standard input and standard error/output. How do you read/write Unicode in a cross-2/3 way from/to standard input/output/error without adding your own translation layer?

>> Unicode file paths

> This is going to be a mess in any language, because file paths really aren't text.

Sorry I need to be more clear. That general mess is not the aspect of it I was referring to. I'm specifically referring to a 2/3 compatibility mess.

I meant that, for example, to have any semblance of Unicode handling, in Python 2 you do os.listdir(u"."), whereas in Python 3 you do os.listdir(b"."). I know how to handle it in both Python 2 and Python 3 in a way that's Good Enough (TM), but how do I even get that with cross-compatible code? I'd need to write a translation layer of sorts for every single I/O function I might use.

>> Unicode sys.argv

> I think I'd say "do whatever 3 does".

Hmm okay thanks, I'll need to try to see what the implications are again. I think the problems I recalled may have just been a result of the other issues, not sure.

I can't "forget about 2" though, it's still on Ubuntu LTS systems and there are still packages in 2 that haven't been ported to 3.

>> I feel like Python 3 completely wrecked strings instead of making them better.

> A clear separation of text and binary is needed in the long run, and makes other operations much clearer and saner. The pain you're feeling is introduced from the OS not having the same clarity.

Again, I think this is "cleaner" in theory, not in practice. What happened to the string_escape/unicode_escape nonsense I pointed out with the new system? Any rebuttal to that one? ;-)

Your suggestion works for stand alone programs, less so for libraries that are expected to keep working in python 2 programs that are not using unicode_literals etc.

Every time I hear somebody claim Python 3 broke strings, I just cringe.

No. No on just so many levels.

It finally fixed it by introducing two (or three if you count bytearray) types for completely different semantics of data.

Python 2 was just a mess whenever you left ascii.

Meanwhile I cringe every time I see people say "No" while ignoring the issues people like me point out.

I'm not saying the transition is easy, or that it's easy to maintain a codebase that handles these issues correctly while supporting Python 3 and 2 at the same time.

But looking solely at Python 3, everything byte- and unicode-related just got so much easier and more testable, with better error messages at better points in your code than in Python 2.

So claiming Python 3 broke stuff just isn't accurate.

Python 2 had a deeply flawed, boundaryless view of unicode vs. bytes, and Python 3 fixed that.
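A small Python 3 illustration of that boundary (the function name is hypothetical): mixing `str` and `bytes` raises `TypeError` at the exact point of mixing, whereas Python 2 would implicitly decode the byte string as ASCII and only fail once a non-ASCII byte happened to show up.

```python
def try_mixing(text, data):
    """Return text + data, or None if Python refuses to combine them.
    On Python 3, str + bytes always raises TypeError, regardless of content."""
    try:
        return text + data
    except TypeError:
        return None
```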

So for example what is your rebuttal to my actual example pointing out that "a".encode('unicode_escape') returns bytes instead of strings? Why/how the heck is it even encoding anything when it never even asked me for an encoding? If for nonsensical reasons it really wants to encode things, shouldn't it at least be asking me for an encoding if it wants a "clean separation" between strings and bytes? You find Python 2's behavior of returning a string to be somehow deeply flawed but Python 3's returning of bytes to be sensible? Really? What problem are they solving here?

Encode and decode always move between str and bytes. Otherwise you could never write a function that takes an encoding as the argument and uses it in the middle of its code (because the types of your locals would vary at runtime in incompatible ways).
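That argument, sketched in Python 3: because `str.encode` always returns `bytes` and `bytes.decode` always returns `str`, a function can take the encoding name as a parameter and its local types stay fixed. `roundtrip` is a hypothetical example; non-text codecs such as base64 are rejected by `str.encode` and need `codecs.encode` instead.

```python
def roundtrip(text, encoding):
    """Encode then decode with any text encoding name. The intermediate value
    is bytes no matter which encoding is passed, including unicode_escape."""
    data = text.encode(encoding)
    assert isinstance(data, bytes)
    return data.decode(encoding)
```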

What is your reason for using this obscure encoding anyway?

> Encode and decode always move between str and bytes.

Yes and my entire point is that this makes zero sense when we're talking about escaping.

> What is your reason for using this obscure encoding anyway?



The encoding `unicode_escape` is not about escaping unicode characters. It's about python source code. It's defined as:

> Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped.

It makes absolutely no sense to have escaped unicode characters as actual unicode string. If you really need that, a version that also works in python 2 would be:
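A sketch of one such version, assuming the escaped form is wanted as a text string (hypothetical helper names; not necessarily the snippet originally posted here). The `unicode_escape` output is pure ASCII, so decoding it as ASCII is lossless, and the same calls work on Python 2 and 3.

```python
def escape_to_text(s):
    """Return the unicode_escape form of `s` as text (unicode on 2, str on 3)."""
    return s.encode("unicode_escape").decode("ascii")


def unescape_text(s):
    """Inverse for ASCII-only escaped text: turn backslash escapes back into
    the characters they denote."""
    return s.encode("ascii").decode("unicode_escape")
```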


> It makes absolutely no sense to have escaped unicode characters as actual unicode string.

Absolutely no sense? So is a basic Python expression evaluator like this complete nonsense to you?

  #!/usr/bin/env python3
  try:  # Python 2
      from Tkinter import Tk, Entry, END
      import tkMessageBox as messagebox
  except ImportError:  # Python 3
      from tkinter import Tk, Entry, END, messagebox

  import ast

  def my_eval(s):
      # assume I've implemented this functionality manually...
      return ast.literal_eval(s)

  root = Tk()
  e = Entry(root)
  e.pack()  # the Entry only appears once a geometry manager places it
  e.insert(END, '"Hell\\xc3\\xb6"')  # assume the user has typed in a Python expression, not me
  root.bind('<Return>', lambda evt: messagebox.showinfo("Result", repr(my_eval(e.get()))))
  root.mainloop()  # start the Tk event loop so the window is shown

I get an escaped string... because that's quite literally what Entry.get() gives me from the text box the user typed into. The simple fact that I got a string containing Python source code with escape characters makes "absolutely no sense" to you?

Note that that wasn't even my choice! That was the choice of the built-in Python GUI toolkit... distributed by the same folks who decided this string/bytes overhaul was a brilliant idea...

You have to invest some time in understanding how strings (byte strings and unicode) work in Python. I put this off for way too long.

After A TON OF PAIN I decided to put the time in to watch the talk "Pragmatic Unicode, or, How do I stop the pain?"[0] by Ned Batchelder and it all just clicked. Now, I make unicode sandwiches like a boss.

[0] https://youtu.be/sgHbC6udIqc

Uhm I think you misunderstood the problem. I already understand Unicode, and I can handle Unicode just fine as far as the facilities are there, in both Python 2 and Python 3 -- individually. The places where I have trouble are (1) where either or both languages do something nonsensical, (2) where the facilities just don't seem to be there (sys.argv?), (3) where there are bugs (try writing binary to stdout with IDLE open), or (4) where there seems no reasonable way to write code that behaves correctly in both languages without making your own little translation layer. Like as an example for #1, "a".encode('unicode_escape') == b'a' makes no sense. Why should escaping a string suddenly turn it into bytes? I never even specified an encoding for those bytes. And similarly why should "a".decode('unicode_escape') be an error? It makes perfect sense. What they did looked nice in theory (no implicit conversions between bytes and strings etc.) but IMHO they practically completely wrecked strings in Python 3.

Also, I think the facilities are lacking for case-insensitive comparisons in Python 2, so I guess this is one thing they improved. Still don't know how to do it "correctly" when writing Python 2-compatible code.
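On Python 3 the improvement is str.casefold() (3.3+), which handles cases like 'ß' vs 'SS' that lower() misses. Here's a rough sketch of a helper that runs on both versions, assuming text (unicode) input. It falls back to lower() where casefold() doesn't exist, so on Python 2 it's only an approximation. The NFD steps follow the Unicode canonical caseless-match definition. The helper names are hypothetical.

```python
import unicodedata


def fold(text):
    """Caseless-comparison key for a unicode string (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # str.casefold() exists on Python 3.3+; Python 2's unicode type lacks it,
    # so fall back to lower(), which misses cases such as 'ß' vs 'SS'.
    folded = getattr(decomposed, "casefold", decomposed.lower)()
    return unicodedata.normalize("NFD", folded)


def equals_ignore_case(a, b):
    return fold(a) == fold(b)


assert equals_ignore_case("Stra\u00dfe", "STRASSE")   # casefold: 'ß' -> 'ss'
assert "Stra\u00dfe".lower() != "STRASSE".lower()     # lower() alone misses it
assert equals_ignore_case("\u00e9", "E\u0301")        # precomposed vs combining accent
```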

> Still don't know how to do it "correctly" when writing Python 2-compatible code.

Does writing python 2 code stop being 'worth it' at some point? If so, where is that point? It sort of sounds like you're already there. There are still people using P2 at least in part because of all the hoops people are jumping through to keep supporting it, no?

I'm not a professional python dev, so it doesn't impact me as much as some of my colleagues, but 'backwards compatible' issues crop up in other platforms/languages as well. WordPress might be the biggest example in PHP. They've kept a minimum target language which is far behind 'current' or even 'currently supported', and it's been a catch-22. Hosts keep supporting PHP 5.4, for example, far later than they 'should' have, because people kept writing new stuff targeting PHP 5.4. WP 5 was, alas, a missed opportunity to target PHP7 as a minimum. :/

> Does writing python 2 code stop being 'worth it' at some point? If so, where is that point? It sort of sounds like you're already there.

How do you know I'm there? If nothing else, I still have my own previous Python 2 code that I've spent time on and that's useful to me! Why would I just throw it all away or waste massive amounts of time rewriting it in Python 3? Is my goal supposed to be to please the masses here?

Honestly that's sufficient reason already. But if you want a more "standard"/politically-correct response: Python 2 is still being supported, Ubuntu 14.04 LTS literally only just reached EOL last month and can be found in the wild, Python 3 support is lagging in a lot of places (e.g. PyPy is still on 3.5, although I'm just using it as an example; I don't use it much), and I still come across Python 2-based tools and packages once in a while.

So I guess the answer to your question is it "stops being worth it" when I stop coming across situations where I'd regret abandoning Python 2.

> There's still people using P2 at least in part because of all the hoops people are jumping through to keep supporting it, no?

"Because" is an odd way to put it... it's true that if they didn't support P2 then people would use P2 less, but people use it because it benefits them, not "because it's supported". I mean, you're also alive "at least in part because" of all the hoops people jump through to grow and bring food within your reach, but I doubt your conclusion is that this should stop being the case...

> They've kept a minimum target language which is far behind 'current' or even 'currently supported', and it's been a catch 22.

What I don't understand is why people are supposed to keep abandoning good software just because someone made something shiny and declared it "unsupported". I hate this "you have to like my updates... or I will force you" attitude that every organization seems to have nowadays. People are trying to solve their own problems, not please the leader they're following.

> Hosts keep supporting PHP 5.4, for example, far later than they 'should' have, because people kept writing new stuff targeting PHP 5.4.

I mean, this isn't even the same situation? I'm not targeting Python 2 or introducing a dependency on it generally. I try hard to keep my code both Python 3 and Python 2 compatible. So the decision as to whether to move on or not is left to the client (which is often myself) and there's no obstacle either way.

> just because someone made something shiny and declared it "unsupported"?

Part of the argument for using much of the stuff out there is because it's "supported" (commercial, community, etc). Without some degree of support, things dwindle/die. The 'support' for language stuff may simply be security patches. If/when those stop being provided, you're starting to do a disservice to people to continue to push that language, even if the core functionality you need is still working and perhaps even unaffected.

> I'm not targeting Python 2 ... I try hard to keep my code both Python 3 and Python 2 compatible.

Given that there are breaking changes between the two, I can't read this any other way than that you're 'targeting' Python 2. If there's a way to do something in 2, and it doesn't work that way any more in 3, and you're making code so that it runs in both (weird syntax, translation layers, etc.), then... you're targeting 2. The same way that if you use 3 syntax, you're 'targeting' 3. That's just how it is.

> but people use it because it benefits them, not "because it's supported".

The benefit is that they don't have to go through whatever hoops it takes to upgrade to have the latest versions. In many cases the later version may not add much directly, but 'community support' and 'security' are two ... emergent properties of a critical mass adopting new versions and dropping support for old versions.

> How do you know I'm there?

Wild-ass guess from the gist of all your earlier posts in this thread?

> But if you want a more "standard"/politically-correct response: Python 2 is still being supported,

I'd say that's partially 'under duress'. My reading of the situation years ago was that the Python community - at least the leadership - wanted to EOL P2 on Jan 1, 2015. There was a gigantic stink/pushback, and support was extended for another 5 years. I will grant that the v3 switchover was pretty bad - I'm not a pro user, but had some colleagues that dealt with a lot, and certainly, 8+ years ago, it was pretty hard to just 'switch' running systems. Even developing new stuff from scratch - there were a lot of 'only works in 2' libraries that were entrenched standards without clear update paths. However, rather than doubling down and making 3 a more attractive proposition (work on upgrade path, faster language, better docs, whatever), the agreement was to support P2 for another 5 years. That's an eternity in the tech world. You could argue it shouldn't be, but it is. Yet, from python colleagues - those who've stayed - the upgrade story isn't significantly better. I'm taking their word for it, but based on your posts, it doesn't seem to be either.

have you written your own 2/3 compatibility layer? I can't imagine writing anything large without six...

fwiw, six can easily be vendored into a project to avoid the technical external dependency. that is how we manage it for kafka-python.

No compatibility layer. It's really not bad.

there are a few modules that are simply at different locations but have the same API

  import sys

  PY3 = sys.version_info >= (3, 0)
  PY2 = sys.version_info < (3, 0)
  PY26 = sys.version_info >= (2, 6) and sys.version_info < (2, 7)

  if PY3:
    from http import client as httplib
  else:
    import httplib
checking whether something is a string

  def is_string(value):
    return isinstance(value, basestring if PY2 else str)  # basestring is only looked up on Python 2
using different classes

  import logging

  # Python 2.6 doesn't properly UTF-8 encode syslog messages, so it needs
  # to be performed in a custom formatter (UnicodeLoggingFormatter, the
  # project's own formatter class, defined elsewhere).
  formatter_class = UnicodeLoggingFormatter if PY26 else logging.Formatter

You don't need all of six in most cases. You just need a couple of different functions.

As the maintainer of WebOb/Pyramid/Waitress, I can say we have a compat module that contains all of the changes/renames/functions needed for Python 2/3 compatibility, and all tests run on both versions.

Are the functions borrowed heavily from six? Yes, but we don't need to vendor all of six.
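A minimal compat module along those lines might look like the following. To be clear, this is a hypothetical sketch, not WebOb's actual module. The names text_type, binary_type and string_types mirror six's aliases of the same names, to_text and to_bytes are made up here, and UTF-8 is assumed as the default encoding.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)
    binary_type = str
    string_types = (basestring,)  # noqa: F821
else:
    text_type = str
    binary_type = bytes
    string_types = (str,)


def to_text(value, encoding="utf-8", errors="strict"):
    """Return `value` as text, decoding bytes if needed."""
    if isinstance(value, binary_type):
        return value.decode(encoding, errors)
    return value


def to_bytes(value, encoding="utf-8", errors="strict"):
    """Return `value` as bytes, encoding text if needed."""
    if isinstance(value, text_type):
        return value.encode(encoding, errors)
    return value
```

With this, to_text(b'caf\xc3\xa9') gives the text 'café' on either version, and the rest of the code base imports from this one file instead of from six.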

Same here, though mine is a small project. I'd guess six probably becomes more useful when you start hitting certain features/areas of the language. I'd be curious about what things make six start being useful.

Seconding this: I contribute to (and recently became a maintainer of) a relatively popular package that supports Python 2 and 3. We have 3 dependencies, and none of them are six.

To be fair, one of our three dependencies does in fact depend on six, but we don't use six anywhere in our code. This dependency of ours is used only in one specific area of the codebase, and in fact I (the person who added the dependency) wish that we didn't have to use it at all.
