Pip has dropped support for Python 2 (pypa.io)
774 points by groodt 41 days ago | hide | past | favorite | 673 comments



This is awesome in terms of avoiding all of the weird things when a person typed pip rather than pip3 and the module didn't seem to get installed anywhere. That said, watching Perl try to kill Perl 5 with Perl 6 (unsuccessfully) and Python try to kill Python 2 with Python 3 (more successfully), it struck me how ridiculous it is that open-source languages have to put up with this. Clearly "major" version numbers are insufficient; the only real answer is to rename the entire freaking language when you make incompatible changes to it.


I think a lot of important lessons got learned in both cases. Clearly perl6 should have had a different name. But I think python2->python3 could've been much less painful if they'd known to prioritize single-codebase compatibility from the very beginning. I think you can see that lesson applied with e.g. Rust editions, which as far as I can tell have been a complete success.


Hindsight might very well be easy in this case, but I cannot help but find the Python developers to have been ridiculously naïve in how they handled this, and foolish to have even begun it in the first place.

The minor improvements of Python 3 did not warrant breaking backwards compatibility, and most could have been handled in a way that would not break it, via opt-in directives.

The very swarms of users that chanted “just upgrade”, as if that did not incur a significant cost, also seemed ridiculously naïve to me, not understanding the real cost that large projects face in having to rewrite very extensive codebases and deal with the potential regressions that might involve.

Everything about the switch, from its very conception to its execution, was handled in a veritably disastrous way by a team that really did not seem to appreciate even a fraction of what is obviously involved with projects that have millions of lines of code and would of course rather not rewrite it all.

This is why many projects such as Linux, Windows, Rust, COBOL, Fortran, C, and C++ take backwards compatibility quite seriously. Serious enterprises do not like to invest in something if it means that 10 years later they will have to rewrite their entire codebase again.

Even on my home computer, I simply do not have the time to rewrite the many Python 2 scripts that I have written over the years that run my computer. It is cumbersome enough that once in a while part of my desktop stops functioning because my distribution removed a Python 2 library which I had relied upon as a system library, and which I now have to install as a user library; hitherto that was quite easily fixed.

Do these men think that time is free?


I have migrated tons of Python codebases from 2 to 3, I guess starting with the release of Python 3.4, which was when Python 3 reached a kind of production readiness (and had also gained enough trust; IIRC it had also reestablished compatibility in some parts).

I think the incompatibilities between Python 2 and Python 3 fell into two categories:

1. Trivial and totally avoidable API changes by the Python developers (like `iteritems()` being renamed to `items()`, with the Python 2 `items()` being removed from the language). The bet on the python-dev side was that `2to3` would take care of that, and here they totally underestimated that libraries couldn't and wouldn't just make a Python 3 migration in lockstep with the primary Python release.

2. The change to Unicode strings by default, with a clear distinction between Unicode strings and byte buffers for all data encoded in any other fashion.
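As a concrete sketch of category 1 on a current Python 3 (illustrative only, not the exact `2to3` rewrite):

```python
# Python 2 had d.items() (a list) and d.iteritems() (a lazy iterator).
# Python 3 dropped iteritems() and made items() return a dynamic view:
d = {"a": 1, "b": 2}
view = d.items()
d["c"] = 3
assert ("c", 3) in view             # the view reflects later mutations
assert not hasattr(d, "iteritems")  # removed; 2to3 rewrote these calls
```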

Most people on Python 3 nowadays won't actually know how beneficial change no. 2 was overall for the health of the Python ecosystem and the stability of their code bases. But it was also the tricky part of the migration for code bases that did a lot of string / file-content plumbing (with Mercurial as a prominent example).

Change no. 1 was a PITA and a lot of it could have been avoided, but it wasn't a huge problem. The huge problem for the ecosystem was the Unicode change, but I don't think anyone questions its usefulness (except for Armin Ronacher maybe, who is the most prominent voice with a dislike for it).
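A minimal sketch of what the category-2 model amounts to in practice (decode at the boundaries, work with text inside, encode on the way out):

```python
# bytes in, str inside, bytes out: the Python 3 model in miniature.
raw = "Åström".encode("utf-8")   # what arrives from a file or socket
text = raw.decode("utf-8")       # a sequence of Unicode code points
assert isinstance(raw, bytes) and isinstance(text, str)
assert len(raw) == 8             # UTF-8 bytes
assert len(text) == 6            # code points

try:
    text + raw                   # Python 2 silently coerced via ASCII here
except TypeError:
    pass                         # Python 3 refuses to mix str and bytes
```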


The string change is (rightly) always brought up, but the one that has always amazed me is that they changed how division works.

Whenever I show this to people they are (rightly) horrified.

  ~ $ python
  Python 2.7.16
  >>> 3/2
  1
  
  ~ $ python3
  Python 3.7.7
  >>> 3/2
  1.5


Well, it's a breaking change, but harmless in most instances; the other way around would have been more harmful. Also, that was available as a `from __future__ import division` option, so you could enable it on Python 2.7 on a per-file basis.
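A sketch of that opt-in (the `__future__` import is real; the snippet also runs unchanged on Python 3, where the import is a no-op):

```python
# Under Python 2.7, this import switched "/" to true division for just this
# file. Under Python 3 it does nothing, so the file behaves the same on both.
from __future__ import division

print(3 / 2)    # 1.5 (true division)
print(3 // 2)   # 1   (floor division, identical on both versions)
```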


I am so thankful for that unicode change. 2.x's unicode was a problem for me.


Well, in general, yes. I had people who had to deploy console apps to non-Unicode terminals and they ran into massive overhead...


> The minor improvements of Python 3 did not warrant breaking backwards compatibility and most could have been handled in a way that would not break it opt-in directives.

There were large changes to fundamental parts of the type system, as well as to core types. Pretending this isn’t the case betrays ignorance, or at the very least cherry-picking.

How would you have handled the string/bytes split in a way that’s backwards compatible? Or the removal of old-style classes?


Let's not pretend that py3's string changes weren't fundamentally wrong and didn't create years of issues from trying to decode, as UTF-8, things that could properly be arbitrary sacks of bytes.

So my answer is that it was a deeply misconceived change that shouldn't have been made at all, let alone been taken as the cornerstone of a "necessary" break in backward compatibility.


The string changes were both necessary and correct. There is a difference between bytes and strings, and treating them as the same led to so many issues. Thank god I’ve not seen a UnicodeDecodeError in years.

And the ecosystem agrees.


What's wrong with strings representing text?

You're not making an argument about backward compatibility here, you're making a strong claim that representing text as a sequence of Unicode code points is fundamentally wrong. I have never heard anyone make this point before, and I am inclined to disagree, but I'm curious what your reasoning is for it.


Indeed, representing text as a sequence of Unicode code points is fundamentally wrong.

There are no operations on sequences of Unicode code points that are more correct than an analogous operation on bytes.

(Everyone's favourite example, length, actually becomes less correct—a byte array's length at least corresponds to the amount of space one might have to allocate for it in a particular encoding. A length in codepoints is absolutely meaningless both technically and linguistically. And this is, for what little it's worth, close to the only operation you can do on a string without imposing additional restrictions about its context.)


> There are no operations on sequences of Unicode code points that are more correct than an analogous operation on bytes.

That’s ridiculous. Uppercasing/lowercasing, slicing, “startswith”, splitting, etc etc.

Your statement is correct if you only care about ASCII.


Uppercasing/lowercasing cannot be done on Unicode code points, because that fails to handle things like ﬁ → FI, where the uppercased version does not consist of the same number of Unicode code points. Slicing and splitting cannot be done on Unicode code points because they may separate a character from a subsequent combining character. "startswith" cannot be done on Unicode code points because some distinct code points need to be treated as equivalent. These are pretty much the same problems you also have when you perform those same operations on bytes. You might encounter those problems in fewer cases when you perform operations on code points rather than on bytes, but you won't have solved the problems entirely.
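Both failure modes are easy to demonstrate with the standard library (assuming current CPython case-mapping behavior):

```python
import unicodedata

# Full Unicode case mapping can change the number of code points, so
# uppercasing is not a per-code-point operation:
lig = "\ufb01"                     # 'ﬁ' LATIN SMALL LIGATURE FI: one code point
assert lig.upper() == "FI"         # two code points after uppercasing
assert len(lig) == 1 and len(lig.upper()) == 2

# And startswith() at the code-point level misses canonically equivalent text:
nfc = unicodedata.normalize("NFC", "Åström")   # precomposed Å
nfd = unicodedata.normalize("NFD", "Åström")   # A + combining ring above
assert not nfd.startswith(nfc[:1])             # same text, different code points
```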


Worse, you'll have pushed the problematic cases out of the realm of obviously wrong and not sensible to do, into subtly wrong and will break down the line in ways that will be hard to recognize and debug.


None of those operations are correct on Unicode codepoints. Your statement is only just barely tenable if you only care about well-edited and normalized formal prose in common Western languages.

Even then, upper/lower-casing is iffy.


> There are no operations on sequences of Unicode code points that are more correct than an analogous operation on bytes.

Wow. I wonder how you arrived at this point. You can't, for example, truncate a UTF-8 byte array without the risk of producing a broken string. But this is only the start. Here are two strings, six letters each, one in NFC, the other in NFD, and their byte-length in UTF-8:

    "Åström" is 8 bytes in UTF-8

    "Åström" is 10 bytes in UTF-8
If your software tells the user that one is eight and the other is 10 letters long, it is not "less correct". It is incorrect. Further, if searching for "Åström" won't find "Åström", your software is less useful than it could be if it knew Unicode. (And it's sad how often software gets this wrong.)
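Those counts can be checked with the standard `unicodedata` module:

```python
import unicodedata

s = "Åström"
nfc = unicodedata.normalize("NFC", s)   # precomposed Å and ö
nfd = unicodedata.normalize("NFD", s)   # base letters + combining marks

assert len(nfc) == 6 and len(nfd) == 8            # code-point counts
assert len(nfc.encode("utf-8")) == 8              # the 8 bytes above
assert len(nfd.encode("utf-8")) == 10             # the 10 bytes above
assert nfc != nfd                                 # a naive search misses one form
assert unicodedata.normalize("NFC", nfd) == nfc   # normalizing reconciles them
```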


> If your software tells the user that one is eight and the other is 10 letters long, it is not "less correct". It is incorrect.

In fact, if the software tells you that either of the strings is either 8 or 10 letters long, then either way the software is incorrect - those are both obviously 6-letter strings.

Now, does Unicode help you discover that they are 6-letter strings better than other representations? There are certainly text-oriented libraries that can do that, but not those that simply count code points - they must have an understanding of all of Unicode. Even worse, the question "how many letters does this string have?" is not generally meaningful - there are plenty of perfectly valid Unicode strings for which this question doesn't have a meaningful answer.

However, the question "how many unicode code points does this string have" is almost never of interest. You either care about some notion of unique glyphs, or you care about byte lengths.


> then either way the software is incorrect - those are both obviously 6 letter strings.

What I wanted to get at is that in Unicode, I have a chance to count letters to some useful degree. Why should I consider starting at byte-arrays?

> there are plenty of perfectly valid Unicode strings for which this question doesn't have a meaningful answer.

I don't get it. Why does the existence of degenerate cases invalidate the usefulness of a Unicode lib? If I want to know how many letters are in a string, I can probably get a useful answer from a Unicode lib. Not for all edge-cases, but I can decide on the trade-offs. If I have a byte-array, I start at a lower level.


> What I wanted to get at is that in Unicode, I have a chance to count letters to some useful degree.

You do not. You merely happen to get the right answer by coincidence in some cases, same as bytes-that-probably-are(n't)-ASCII. To throw your own words back at you:

  "Åström" is 6 code points in Unicode

  "Åström" is 8 code points in Unicode
If your software tells the user that one is six and the other is 8 letters long, it is not "less correct". It is incorrect. Further, if searching for "Åström" won't find "Åström", your software is less useful than it could be if it knew text. (And it's sad how often software gets this wrong.)


You can't truncate a sequence of Unicode codepoints without the risk of producing a broken string, either. What do you get if you truncate "Åström" after the first "o"? What do you get if you truncate 🇨🇦 after the first codepoint?
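Both truncation examples are reproducible at the code-point level:

```python
# Truncating by code point breaks text just as truncating by byte can:
flag = "\U0001F1E8\U0001F1E6"       # 🇨🇦 is two regional-indicator code points
assert len(flag) == 2
assert flag[:1] == "\U0001F1E8"     # 🇨 alone: no longer a Canadian flag

nfd = "o\u0308"                     # "ö" decomposed: o + combining diaeresis
assert nfd[:1] == "o"               # slicing after the "o" drops the accent
```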

Normalization is not a real solution unless you restrict yourself to working with well-edited formal prose in common Western languages.

This is not a claim made from ignorance.


Sorry, we're mixing two layers. Of course, if I truncate a string, it may lose its meaning. And having accents fall off is problematic. But it's not the same as truncating a byte-array, because then an invalid sequence of bytes may result.

Stop treating these cases as equivalent. They're not.


They are equivalent. The only reason you find it problematic that a sequence of bytes is "invalid" (read: can't be decoded in your preferred encoding) is because you've manufactured the problem.

In the end, the only layer at which it really matters whether your byte sequence can be decoded is the font renderer, and just being valid utf-8 isn't good enough for it either.


> In the end, the only layer at which it really matters whether your byte sequence can be decoded is the font renderer

OK, that explains how we ended up here. I'm considering some other common uses! A search index, for example, greatly profits from being able to normalize representations and split words.


Search index use cases probably also benefit from normalizing inputs across encodings, that's no shining example of utf-8-onlyism.

You can still best-effort split words! You can do a pretty good job splitting words without ensuring that the words decode in your preferred encoding.


Here's the thing: I don't want to work in UTF8. I want to work in Unicode. Big difference. Because tracking the encoding of my strings would increase complexity. So at the earliest convenience, I validate my assumptions about encoding and let a lower layer handle it from then on.

I understand you're arguing about some sort of equivalency between byte-arrays and Unicode strings. Sure there are half-baked ways to do word-splitting on a byte-array. But why do you consider that a viable option? Under what circumstances would you do that?


Every circumstance. Why do you consider it unviable? What problems do you think having a Unicode sequence solves?


Convince me. Here's a little library function that turns text into a set of words:

    import re, unicodedata

    def keywords(text):
        return set(filter(None, re.split(r"\W+", unicodedata.normalize("NFKC", text).lower())))
How would this look if strings were byte-arrays? How would `normalize()`, `lower()`, and `split()` know what encoding to use?

The way I see it: If the encoding is implicit, you have global state. If it's explicit, you have to pass the encoding. Both is extra state to worry about. When the passed value is a Unicode string, this question doesn't come up.


It looks pretty much the same, except that you assume the input is already in your library's canonical encoding (probably UTF-8 nowadays).

I realize this sounds like a total cop-out, but when the use-case is destructively best-effort tokenizing an input string using library functions, it doesn't really matter whether your internal encoding is utf-32 or utf-8. I mean, under the hood, normalize still has to map arbitrary-length sequences to arbitrary-length sequences even when working with utf-32 (see: unicodedata.normalize("NFKC", "a\u0301 ffi") == "\xe1 ffi").

So on the happy path, you don't see much of a difference.

The main observable difference is that if you take input without decoding it explicitly, then the always-decode approach has already crashed long before reaching this function, while the assume-the-encoding approach probably spouts gibberish at this point. And sure, there are plenty of plausible scenarios where you'd rather get the crash than subtly broken behaviour. But ... I don't see this reasonably being one of them, considering that you're apparently okay with discarding all \W+.


I agree with you. I wish Python 3 had strings as byte sequences, mainly UTF-8, as Python 2 once had and Go has now. Then things would be kept simple in Japan. Python 3 feels cumbersome. To handle raw input as a string, you must decode it in some encoding first. It is a fragile process. It would be adequate to treat the input bytes transparently and put in an optional stage to convert other encodings to UTF-8 if necessary.


I know this from PHP, where I have to be aware of the encoding the strings are in. I still don't see what the advantage should be of that.


So in one case, the text becomes corrupted and unreadable (i.e. loses its meaning), and in the other, it becomes corrupted and unreadable. What's the difference?

Having "accents fall off" has gotten people murdered [0]. Accents aren't things peppered in for effect, they turn letters into different letters, spelling different words. Analogously, imagine that a bunch of software accidentally turned every "d" into a "c" because some committee halfway around the world decided "d" should be composed of the "c" and "|" glyphs. That's the kind of text corruption that regularly happens in other languages when dealing with text at the code point layer.

[0] https://languagelog.ldc.upenn.edu/nll/?p=73 . Note that this is Turkish, which has the "dotted i" problem, meaning that this was more than likely a .toupper() gone wrong rather than a truncation issue.


The difference is that for truncating, I can work within Unicode to deal with the situation. I can accept the possibility of mutilated letters, I can convert to NFC, I can truncate on word-boundaries, I have choice.

If I have a byte array, I can do none of these things short of implementing a good chunk of Unicode. If I truncate, I risk ending up with an invalid UTF-8 string. End of story.


And what is wrong with an invalid UTF-8 string? Why were you truncating the string in the first place?

Basically, I believe the point here is that a Unicode-aware truncation should be done in a Unicode-aware truncate method. There is no good reason to parse a string as UTF-8 ahead of time - just keep it as a blob of bytes until you need to do something "texty" with it. It is the truncate-at-word-boundaries() method that should interpret the bytes as UTF-8 and fail if they are not valid. Why parse it sooner?
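A sketch of that keep-bytes-until-texty approach; `truncate_at_word` is an invented name, and simple space-splitting stands in for real word-boundary logic:

```python
# Hypothetical sketch: keep data as bytes, and let the one "texty" operation
# do the UTF-8 interpretation (and fail there if the bytes are not valid).
def truncate_at_word(blob: bytes, limit: int) -> str:
    text = blob.decode("utf-8")        # decoding happens here, not earlier
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit)    # crude stand-in for word boundaries
    return text[:cut] if cut > 0 else text[:limit]

assert truncate_at_word("hello wide world".encode("utf-8"), 12) == "hello wide"
```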


> If I have a byte array, I can do none of these things short of implementing a good chunk of Unicode. If I truncate, I risk ending up with an invalid UTF-8 string.

Yes, and? You can have an invalid sequence of Unicode code points too, such as an unpaired surrogate (something Python's text model actually abuses to store "invalid Unicode" in a special, non-standard way).

If you truncate at the byte level, you are just truncating "between code points"; it's a closer granularity than at the code point layer, so you can also convert to NFC, truncate on word boundaries, etc. You just need to ignore the parts of the UTF-8 string that are invalid; which isn't difficult, because UTF-8 is self-synchronizing.
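A sketch of the self-synchronizing property: after an arbitrary byte-level cut, dropping at most three trailing bytes gets back to a clean code-point boundary (`trim_to_valid_utf8` is an invented helper name):

```python
# UTF-8 is self-synchronizing: a multi-byte sequence can only be cut in its
# last 1-3 bytes, so backing up at most 3 bytes recovers a valid string.
def trim_to_valid_utf8(b: bytes) -> bytes:
    for i in range(len(b), max(len(b) - 4, -1), -1):
        try:
            b[:i].decode("utf-8")
            return b[:i]
        except UnicodeDecodeError:
            continue
    return b""

cut = "Åström".encode("utf-8")[:6]       # slices the 2-byte "ö" in half
assert trim_to_valid_utf8(cut).decode("utf-8") == "Åstr"
```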


> How would you have handled the string/bytes split in a way that’s backwards compatible?

A language pragma.

All functions that return `bytes` continue to do so unless specifically opted in on a per file basis, then they return `unicode`.

`str` thus returns `bytes` as it does in 2, unless the pragma ask otherwise.

> Or the removal of old-style classes?

They would obviously not be removed and still be available but depræcated.


> All functions that return `bytes` continue to do so unless specifically opted in on a per file basis, then they return `unicode`.

Nothing in py2 returns bytes. They all return strings. That is the issue. What about subclasses or type wrappers? What about functions that return bytes or utf8 strings? How would you handle code that then calls “.startswith()” on a returned string/bytes value?

A language pragma that fundamentally alters a built in type across all the code you have in a program is never going to work and pushes the burden onto library authors to support a large matrix of different behaviours and types.

It would make the already ridiculous py2 str/bytes situation even more ridiculous.

> They would obviously not be removed and still be available but depræcated.

Having two almost separate object models in the same language is rather silly.


> Nothing in py2 returns bytes. They all return strings. That is the issue.

No, that is not an issue, that is semantics.

What one calls it does not change the behavior. And aside from that, the system could perfectly well be designed so that this pragma changes whether `str` is synonymous with `bytes` or `unicode`, depending on its state.

> What about subclasses or type wrappers? What about functions that return bytes or utf8 strings? How would you handle code that then calls “.startswith()” on a returned string/bytes value?

You would know which is which by using the pragma or not.

Not using the pragma defaults to the old behavior, as said, one only receives the new, breaking behavior, when one opts in.

Python could even support always opting in by a configuration file option for those that really want it and don't want to add the pragma at the top of every file.

> A language pragma that fundamentally alters a built in type across all the code you have in a program is never going to work and pushes the burden onto library authors to support a large matrix of different behaviours and types.

Opposed to the burden they already had of maintaining a 2 and 3 version?

Any new code can of course always return `unicode` rather than `str` which in this scheme is normally `bytes` but becomes `unicode` with the pragma.

> It would make the already ridiculous py2 str/bytes situation even more ridiculous.

> Having two almost separate object models in the same language is rather silly.

Yes, it is, and you will find that most languages are full of such legacy things that no new code uses but are simply for legacy purposes.

“It is silly.” turns out to be a rather small price to pay to achieve “We have not broken backwards compatibility.”


I don’t really have the time or inclination to continue arguing, but I will point out that you say all this as though the approach the team took failed. It worked. The ecosystem is on py3.

You can imagine some world with a crazy context-dependent string/bytes type. Cool. In reality this would have caused endless confusion, especially with beginners and the scientific community, and likely killed the language or at the very least made the language a shadow of what it is now.

They made the right choice given the outcome. Anything else is armchair postulation that was discussed previously and outright rejected for obvious reasons.


> It worked. The ecosystem is on py3.

Because they're doing everything they can to force py2 to go away. It's not that it's dying a natural death out of disuse. Exhibit A is everyone else in this post still wanting to use it.

If you think strings "work" under py3, my guess is you've never had to deal with all the edge cases, especially across all 3 major desktop platforms. Possibly because your applications are limited in scope. (You're definitely not writing general-purpose libraries that guarantee correctness for a wide variety of usage.) Most things Python treats as Unicode text by default (file contents, file paths, command-line arguments, stdio streams, etc.) are not guaranteed to contain only valid Unicode. They can have invalid Unicode mixed into them, either accidentally or intentionally, breaking programs needlessly.

As a small example, try these and compare:

  python2 -c "import sys; print('Your input was:'); print(sys.argv[1])" $'\x80' | xxd
  python3 -c "import sys; print('Your input was:'); print(sys.argv[1])" $'\x80' | xxd
This program is content-agnostic (like `cat`, `printf`, etc.), and hence, with a decent standard library implementation, you would expect it to be able to pass arbitrary data through just fine. But it doesn't, because Python insists on treating arguments as Unicode strings rather than as raw data, and it behaves worse on Python 3 than on Python 2. You really have to go out of your way to make it work correctly, and the solution is often pretty much to just ditch strings in many places and deal with bytes as much as possible... i.e., you realize Unicode strings were the wrong data type. But since you're still forced to deal with them in some ways, you get the worst of both worlds; that increases the complexity dramatically, and it becomes increasingly painful to ensure your program still works correctly as it evolves.
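For what it's worth, the mechanism Python 3 actually uses to smuggle such undecodable argv/filename bytes through `str` is the `surrogateescape` error handler (PEP 383):

```python
# Lone-surrogate code points stand in for raw bytes and round-trip losslessly,
# at the cost of the str no longer being valid Unicode:
smuggled = b"\x80".decode("utf-8", "surrogateescape")
assert smuggled == "\udc80"        # a lone surrogate, not valid Unicode
assert smuggled.encode("utf-8", "surrogateescape") == b"\x80"
```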

I say all this because I've run into these issues and dealt with them, and it's become clear to me that others who love Unicode strings just haven't gone very far in trying to use them. Often this seems to be because they (a) are writing limited-scope programs rather than libraries, (b) confine themselves to nice, sanitized systems & inputs, and/or (c) take an "out-of-sight -> out-of-mind" attitude towards issues that don't immediately crop up on their systems & inputs.


> You're definitely not writing general-purpose libraries that guarantee correctness for a wide variety of usage.

At the risk of sounding like a dick, I’m a member of the Django technical board and have been involved with its development for quite a while. Is that widely used or general purpose enough?

If you want a string then it needs to be a valid string with a known encoding (not necessarily utf8). If you want to pass through any data regardless of its contents then you use bytes. They are two very different things with very different use cases.

If I read a file as utf8 I want it to error if it contains garbage, non-text contents because the decoding failed. Any other way pushes the error down later into your system to places that assume a string contains a string but it’s actually arbitrary bytes. We did this in py2 and it was a nightmare.

I concede that it’s convenient to ignore the difference in some circumstances, but differentiating between bytes/str has a lot of advantages and makes Python code more resilient and easier to read.


> I’m a member of the Django technical board and have been involved with its development for quite a while. Is that widely used or general purpose enough?

That's not quite what I was saying here. Note I said "wide variety of usage", not "widely used". Django is a web development framework—and its purpose is very clear and specific: to build a web app. Crucially, a web framework knows what its encoding constraints are at its boundaries, and it is supposed to enforce them. For example, HTTP headers are known to be ASCII, HTML files have <meta ...> tags to declare encodings, etc. So if a user says (say) "what if I want to output non-ASCII in the headers?", your response is supposed to be "we don't let you do that because that's actually wrong". Contrast this with platform I/O where the library is supposed to work transparently without any knowledge of any encoding (or lack thereof) for the data it deals with, because that's a higher-level concern and you don't expect the library to impose artificial constraints of its own.


"If I read a book as Russian, I want it to error if it contains French, non-Russian contents because the decoding failed. Any other way pushes the error down later into your system to readers that assume a Russian passage contains Russian but it's actually arbitrary text. We did this in War and Peace and it was a nightmare."


“If I expect a delivery of war and peace in English, I want it to error if I actually receive a stone tablet containing Neanderthal cave paintings thrown through my window at night”. They are two very different things, even if they both contain some form of information.


You are engaged in some deep magical thinking about encodings if you believe that knowing the encoding of a so-called string allows you to perform any operations on it more correctly than on a sack of bytes. (Fewer, in fact—at least the length of a byte array has any meaning at all.)

It's an easy but very much confused mistake to make if the text you work with is limited to European languages and Chinese.


> You are engaged in some deep magical thinking about encodings if you believe that knowing the encoding of a so-called string allows you to perform any operations on it more correctly than on a sack of bytes.

Not really. How would “.toupper()” work on a raw set of bytes, which could contain either an MP3 file or UTF-8 encoded text?

Every single operation on a string-that-might-not-be-a-string-really would have to be fallible, which is a terrible interface to have for the happy path.

How would slicing work? I want the first 4 characters of a given string. That’s completely meaningless without an encoding (not that it means much with it).

How would concatenation work? I’m not saying Python does this, but concatenating two graphemes together doesn’t necessarily create a string with len() == 2.

How would “.startswith()” work with regards to grapheme clusters?

Text is different from bytes. There’s extra meaning and information attached to an arbitrary stream of 1s and 0s that allows you to do things you wouldn’t have been able to do if your base type is “just bytes”.

Sure you could make all of these return garbage if your “string” is actually an mp3 file, aka the JavaScript way, but... why?


> Not really. How would “.toupper()” work on a raw set of bytes, which would either contain an MP3 file or UTF8 encoded text?

It doesn't. It doesn't work with Unicode either. No, not "would need giant tables", literally doesn't work—you need to know whether your text is Turkish.

> How would slicing work? I want the first 4 characters of a given string. That’s completely meaningless without an encoding.

It's meaningless with an encoding: what are the first four characters of "áíúéó" (each letter here decomposed as two code points)? Do you expect "áí"? What are the first four characters of "ﷺ"? Trick question: that's one Unicode code point.

At least with bytes you know that your result after slicing four bytes will fit in a 4-byte buffer.
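The ﷺ claim is easy to verify with the standard `unicodedata` module (its NFKC compatibility decomposition expands the single code point into a whole phrase):

```python
import unicodedata

s = "\ufdfa"                                  # "ﷺ": a single code point
expanded = unicodedata.normalize("NFKC", s)   # its compatibility decomposition
assert len(s) == 1
assert len(expanded) > 1                      # many letters (an Arabic phrase)
```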

> How would concatenation work? I’m not saying Python does this, but concatenation two graphemes together doesn’t necessarily create a string with len() == 2.

It doesn't work with Unicode either. I'm sure you've enjoyed the results of concatenating a string with an RTL marker with unsuspecting text.

It gets worse if we remember try to ascribe linguistic meaning to the text. What's the result of concatenating "ranch dips" with "hit singles"?

> How would “.startswith()” work with regards to grapheme clusters?

It doesn't. "🇨" is a prefix of "🇨🇦"; "i" is not a prefix of "ij".

> Text is different from bytes. There’s extra meaning and information attached to an arbitrary stream of 1s and 0s that allows you to do things you wouldn’t have been able to before.

None of the distinctions you're trying to make are tenable.


It is not clear to me whether there is a material difference here. Any text string is a sequence of bytes for which some interpretation is intended, and many meaningful operations on those bytes will not be meaningful unless that interpretation is taken into account.

The problem that you have raised here seems to be one of what alphabet or language is being used, but that issue cannot even arise without taking the interpretation into account. If you want alphabet-aware, language-aware, spelling-aware or grammar-aware operators, these will all have to be layered on top of merely byte-aware operations, and this cannot be done without taking into account the intended interpretation of the bytes sequence.

Note that it is not unusual to embed strings of one language within strings written in another. I do not suppose it would be surprising to see some French in a Russian-language War and Peace.


This implies that you should have types for every intended use of a text string. This is, in fact, a sensible approach, reasonably popular in languages with GADTs, even if a bit cumbersome to apply universally.

A type to specify encoding alone? Totally useless. You can just as well implement those operations on top of a byte string assuming the encoding and language &c., as you can implement those operations on top of a Unicode sequence assuming language and culture &c..


To implement any of the above, while studiously avoiding anything making explicit the fact that the interpretation of the bytes as a sequence of glyphs is an intended, necessary and separable step on the way, would be bizarre and tendentious.

I see you have been editing your post concurrently with my reply:

> You can just as well implement those operations on top of a byte string assuming the encoding and language &c., as you can implement those operations on top of a Unicode sequence assuming language and culture &c..

Of course you can (though maybe not "just as well"), but that does not mean it is the best way to do so, and certainly not that it is "totally useless" to implement the decoding as a separate step. Separation of concerns is a key aspect of software engineering.


> To implement any of the above, while studiously avoiding anything making explicit the fact that the interpretation of the bytes as a sequence of glyphs is an intended, necessary and separable step on the way, would be bizzarre and tendentious.

Codepoints are not glyphs. Nor are any useful operations generally performed on glyphs in the first place. Almost all interpretable operations you might want to do are better conceived of as operating on substrings of arbitrary length, rather than glyphs, and byte substrings do this better than unicode codepoint sequences anyway.

So I contest the position that interpreting bytes as a glyph sequence is a viable step at all.


Fair enough, codepoints, but the issue remains the same: you keep asserting that it is pointless - harmful, actually - to make use of this one particular interpretation from the hierarchy that exists, without offering any valid justification for why this one particular interpretation must be avoided, while both lower-level and higher-level interpretations are useful (necessary, even.)

Going back to the post I originally replied to, how would going down to a bytes view avoid the problems you see?


Let me rephrase. Codepoints are even less useful than abstract glyphs, cf. https://manishearth.github.io/blog/2017/01/14/stop-ascribing... (I don't agree 100% with the write-up, and in particular I would say that working on EGCs is still just punting the problem one more layer without resolving it; see some of my other posts in this thread. But it makes an attempt at clarifying the issue here.)

The choice of the bytes view specifically is just that it's the most popular view from which you can achieve one specific primitive: figuring out how much space a (sub)string occupies in whatever representation you store it in. A byte length achieves this. Of course, a length in bits or in utf-32 code units also achieves this, but I've found it rather uncommon to use utf-32 as a transfer encoding. So we need at least one string type with this property.
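A small Python illustration of that one primitive, byte length versus codepoint count:

```python
s = "héllo"
assert len(s) == 5                       # codepoint count
assert len(s.encode("utf-8")) == 6       # bytes occupied in a UTF-8 buffer ("é" takes 2)
assert len(s.encode("utf-32-le")) == 20  # fixed 4 bytes per codepoint
```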

Other than this one particular niche, a codepoint view doesn't do much worse at most tasks. But it adds a layer of complexity while also not actually solving any of the problems you'd want it to. In fact, it papers over many of them, making it less obvious that the problems are still there to a team of eurocentric developers ... up until emoji suddenly become popular.

Now, I can understand the appeal of making your immediate problems vanish and leaving it for your successors, but I hope we can agree that it's not in good taste.


While all the facts in this post appear correct, they do not seem to me to amount to an argument either for the proposition that an implementation at the utf-8 level is uniquely harmful, or that a bytes-level approach avoids these problems.

For example, working with the utf-8 view does not somehow foreclose on knowing how much memory a (sub)string occupies, and it certainly does not follow that, because this involves regarding the string as a sequence of bytes, this is the only way to regard it.

For another, let's consider a point from the linked article: "One false assumption that’s often made is that code points are a single column wide. They’re not. They sometimes bunch up to form characters that fit in single “columns”. This is often dependent on the font, and if your application relies on this, you should be querying the font." How does taking a bytes view make this any less of a potential problem?
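In Python terms, display width is a property layered above either representation; `unicodedata` exposes the East Asian width property, and neither a bytes view nor a codepoint view answers the column question by itself:

```python
import unicodedata

# Width lives above any representation; neither byte length nor
# codepoint count tells you how many columns a character occupies:
assert unicodedata.east_asian_width("宽") == "W"   # wide: typically 2 columns
assert unicodedata.east_asian_width("a") == "Na"  # narrow: 1 column
assert len("宽") == 1 and len("宽".encode("utf-8")) == 3
```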

Is a team of eurocentric developers likely to do any better working with bytes? Their misconceptions would seem to be at a higher level of abstraction than either bytes or utf-8.

You are claiming that taking a utf-8 view is an additional layer of complexity, but how does it simplify things to do all your operations at the byte level? Using utf-8 is more complex than using ascii, but that is beside the point: we have left ascii behind and replaced it with other, more capable abstractions, and it is a universal principle of software engineering that we should make use of abstractions, because they simplify things. It is also quite widely acknowledged that the use of types reduces the scope for error (every high-level language uses them.)


The burden of proof is on showing that the unicode view is, in your words, a more capable abstraction. My thesis is that it is not. This is not because it necessarily does anything worse (though it does). It must simply do something better. If there were actually anything at all it did better—well, I still wouldn't necessarily want it as a default but it would be a defensible abstraction.

The heart of the matter is that a Unicode codepoint sequence view of a string has no real use case.

There is no "universal principle" that we use abstractions always, regardless of whether they fit the problem; that's cargo-culting. An abstraction that does no work is, ceteris paribus, worse than not having it at all.


> The burden of proof is on showing that the unicode view is, in your words, a more capable abstraction. My thesis is that it is not.

The quote, as you presented it, leaves open the question: more capable than what? Well, there's no doubt about it if you go back to my original post: more capable than ascii. Up until now, as far as I can tell, your thesis has not been that unicode is less capable than ascii, but if that's what your argument hangs on, go ahead - make that case.

What your thesis has been, up to this point, is that manipulating text as bytes is better, to the extent that doing it as unicode is harmful.

> It must simply do something better. If there were actually anything at all it did better...

It is amusing that you mentioned the burden of proof earlier, because what you have completely avoided doing so far is justify your position that manipulating bytes is better - for example, you have not answered any of the questions I posed in my previous post.

> The heart of the matter is that a Unicode codepoint sequence view of a string has no real use case.

Here we have another assertion presented without justification.

> There is no "universal principle" that we use abstractions always, regardless of whether they fit the problem...

It is about as close as anything gets to a universal principle in software engineering, and if you want to disagree on that, go ahead, I'm ready to defend that point of view.

>... that's cargo-culting.

How about presenting an actual argument, instead of this bullshit?

Furthermore, you could take that statement out of my previous post, and it would do nothing to support the thesis you had been pushing up to that point. You seem to be seeking anything in my words that you think you can argue against, without regard to relevance - but in doing so, you might be digging a deeper hole.

> An abstraction that does no work is, ceteris paribus, worse than not having it at all.

Your use of a Latin phrase does not alter the fact that you are still making unsubstantiated claims.


Put it this way: claim a use-case you believe the unicode view does better on than an array of bytes. Since you're making the positive claim, this should be easy.

I guarantee you there will be a quick counterexample to demonstrate that the claimed use-case is incorrect. There always is.

You may review the gish gallop in the other branch of this thread for inspiration.


Now you are attempting a full-on burden-shifting approach, but the unsupported claims here are that a unicode view is "fundamentally wrong" and that the proper approach is to operate on raw bytes. You can start on correcting this omission by answering the questions I posed about your claims a couple of posts ago.

https://news.ycombinator.com/item?id=25895523


If I recall, this is the solution: https://stackoverflow.com/a/27185688

I don't know why there isn't a sys.argvb as there is os.environb.
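For what it's worth, the usual workaround is `os.fsencode`, which recovers each argument's raw bytes by reversing the surrogateescape decoding CPython applies to argv:

```python
import os
import sys

# There's no sys.argvb, but os.fsencode round-trips each decoded
# argument back to the underlying bytes:
argv_bytes = [os.fsencode(arg) for arg in sys.argv]
assert all(isinstance(a, bytes) for a in argv_bytes)
```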


Fwiw, the python3 version didn't run at all for me in Python 3.9.0 on Mac.

    UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed
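A minimal reproduction of where that error comes from: argv is decoded with `errors="surrogateescape"`, so an invalid byte like 0x80 becomes the lone surrogate `'\udc80'`, which a strict UTF-8 encode then refuses:

```python
# Invalid UTF-8 input survives decoding as a lone surrogate...
s = b"\x80".decode("utf-8", errors="surrogateescape")
assert s == "\udc80"
# ...and round-trips back to the original bytes with surrogateescape:
assert s.encode("utf-8", errors="surrogateescape") == b"\x80"
try:
    s.encode("utf-8")  # strict encode raises, as in the error above
except UnicodeEncodeError:
    pass
```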


> It worked. The ecosystem is on py3.

Primarily because they killed py2, not because they won people over with their Unicode approach.


I agree with this, but I would make one important tweak: make the new behavior opt-out, instead of opt-in, with a configuration file option for switching the default.

You're still breaking code by default this way, but no one would have trouble updating.

My concern is that, if you don't make the preferred behavior clear, a lot of people would simply never adopt it. I don't think that Python's userbase in particular is going to spend time reading documentation on best practices.


I do believe that such a trivial change would indeed be fine. If one can go to the effort of installing the new version, one can add one line in a configuration file to depend upon old behavior.


Yeah, even CMake got this bit right. CMake!!


I think some modular approach could have solved the incompatibility issue, such as "from future import ...". Shorthands could have been invented to define everything in a single line.

Perl5 has similar flags ("use strict"), and Racket brings it even further to define the whole fucking language of the rest of the file ("#lang racket/gui"). Having the language be choosable by the user is against the "zen of python", I guess. In other words: such an attempt does not feel "pythonic".
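Python 2 did grow per-file opt-ins in exactly this spirit, and Python 3 accepts them too (as no-ops), so a file using them can straddle both versions:

```python
# Real __future__ opt-ins; under Python 2.6+ these switch the file to
# Python 3 semantics for print, string literals, and division:
from __future__ import print_function, unicode_literals, division

print("print is a function, and 1/2 is", 1 / 2)  # true division gives 0.5
```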


For some syntactical sugar, like print-function, sure. But there are more fundamental changes that couldn’t be papered over.


Fundamental changes shouldn't have been papered over by calling the new language Python, as it was a fundamentally different language at that point.


No, it’s the same language but with different semantics around a specific type. That’s not a different language and code can co-exist with a bit of thought.


Every language goes through this at some point in its development: flaws that limit future development have to be fixed. Should every language rename itself and split its community at that point? That seems like an extreme response to a common problem.


> Should every language rename itself and split its community at that point?

yes. If breaking, fundamental changes are common, that's a problem.


That people can make an initial plan that is self-consistent, logical, and foresees and provides for all future use-cases is a basic tenet of waterfall-style development. The history of software engineering does not uphold that principle. Why would it be different for language designers?


Just because an unforeseen issue arises, doesn't mean you need to introduce a breaking change right away.


The "new language" is called Python 3.


Yes, yes it is. And, like "Perl 6" and to a lesser extent "C++", that name is misleading (and therefore bad), because there is already a different language called "Python" (respectively "Perl", "C"), with significant superficial similarities that it could be confused with.


Please note that the misleading part of Perl 6 has been fixed by renaming it to the Raku Programming Language (https://raku.org using the #rakulang tag on social media).


> How would you have handled the string/bytes split in a way that’s backwards compatible?

My understanding is that the corresponding types are available in both 2 and 3, they're just named differently; the name whose meaning changed is "str". So you could have had some kind of mode directive at the top of the file which controlled which version that file was in, and allow files from 2 and 3 to be run together.


Actually think about it. bytes is str in Python 2. There is no bytes type in py2. How would a per-file directive (of all things) help?

What if one function running in “py2 mode” returned a string-that-is-actually-bytes, how would a function in “py3 mode” consume it? What would the type be? If different, how would it be detected or converted? What if it retuned a utf8 string OR bytes? What if that py3 function then passed it to a py2 function - would it become a string again? Would you have two string types - py2string that accepts anything and py3string that only works with utf8? How would this all work with C modules?


> What if one function running in “py2 mode” returned a string-that-is-actually-bytes, how would a function in “py3 mode” consume it? What would the type be?

It would be bytes. Because py2 string === py3 bytes.

> What if that py3 function then passed it to a py2 function - would it become a string again?

Yes

> Would you have two string types - py2string that accepts anything and py3string that only works with utf8?

Yes. You already have those two types in python3. bytes and string. You'd just alias those as string and utf8 or whatever you want to call it in python2.

> How would this all work with C modules?

They'd have to specify which mode they were working with too.
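A rough sketch of the proposed aliasing, written with Python 3's real types (the names `py2_str`, `py2_unicode`, and `utf8` are invented here just to make the mapping concrete):

```python
# Hypothetical "py2 mode" aliases, in Python 3 terms:
py2_str = bytes      # Python 2's str was a byte string
py2_unicode = str    # Python 2's unicode is Python 3's str
utf8 = str           # the suggested py2-side alias for text strings

assert isinstance(b"raw", py2_str)
assert isinstance("text", py2_unicode)
```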


But all this would require huge rewrites of code and would never be backward compatible. You’re trading “py2 vs py3” with “py2 mode vs py3 mode”.

So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.

Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value[0]) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.

It would become an absolute complete clusterfuck of corner cases that would have killed the language outright.


> You’re trading “py2 vs py3” with “py2 mode vs py3 mode”.

Yes, that's the whole point. Because compatible modes allow for a gradual transition. Which in practice allows for a much faster transition, because you don't have to transition everything at once (which puts some people off transitioning entirely - making things infinitely harder for everyone else).

Languages like Rust (editions) and JavaScript (strict mode) have done this successfully and relatively painlessly.

> So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.

Well yes, you'd still have to upgrade your code. That goes with a major version bump. But it would allow you to do it on a library-by-library basis rather than forcing you to wait until every dependency has a v3 version. Have that one dependency that keeping you stuck on v2: no problem, upgrade everything else and wrap that one lib in conversion code.

> Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value[0]) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.

I'm not sure I understand the problem here. The types themselves are the same between python 2 and 3 (or could have been). It's just the labels that refer to them that are different. A subclass of string in python 2 code would just be a subclass of bytes in python 3 code.


We have lots of existence proofs of languages evolving gracefully and not throwing old code off a cliff.

Python 3 made the wrong trade-off of core developer hours vs. external developer hours.


py2 str == py3 bytes

py2 unicode == py3 str

The problem with this approach is that they wanted to reuse the `str` name, which requires a big "flag day", where it switches meaning and compatibility is effectively impossible across that boundary (without ugly hacks).

What they could have done instead would have been to just rename `str` to `bytes`, but retain a deprecated `str` alias that pointed to `bytes`.

That would keep old scripts running indefinitely, while hopefully spewing enough warnings that any maintained libraries and scripts would make the transition.

Eventually they could remove `str` entirely (though I'd personally be against it), but that would still give an actual transition period where everything would be seamlessly compatible.

Same thing with literals: deprecate bare strings, and transition to having to pick explicitly between `b"foo"` and `u"foo"`. Eventually consider removing bare strings entirely. DO NOT just change the meaning of bare strings while removing the ability to pick the default explicitly (in contrast, 3.0 removed `u"asdf"`, and it was only reintroduced several versions later).
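The explicit prefixes are valid in both Python 2.7 and 3.3+ (PEP 414 restored `u""` in 3.3), which is what this migration path relies on:

```python
# Both prefixes parse on 2.7 and 3.3+, with the same meaning:
b = b"foo"  # byte string on both versions
u = u"foo"  # text string on both versions
assert isinstance(b, bytes) and isinstance(u, str)
```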

What made me personally lose faith in the Python Core team wasn't that Guido made an old mistake a long time ago. It wasn't that they wanted to fix it. It was the absolutely bone-headed way that they prioritized aesthetics over the migration story.


> Would you have two string types - py2string that accepts anything and py3string that only works with utf8?

Yes. A single naive one for py2 and two separate ones for py3 bytes and unicode. All casting between the two would have to be made explicit.

> How would this all work with C modules?

In non-strict mode, you'd be able to use either py2 strings or py3 bytes with these, and gradually move all modules to strict mode which requires bytes.

And then, gradually after a decade or so attempt to get rid of all py2 types.


> How would you have handled the string/bytes split in a way that’s backwards compatible? Or the removal of old-style classes?

I'm not sure it's the best way to handle it, but I would have been fine with:

    from __python2__ import *
for full backward compatibility; or, more explicitly:

    from __python2__ import ascii_strings, old_style_classes, print_statement, ...

As the parent poster mentions, several other popular languages and systems (C++, Java, etc.) have done a pretty decent job preserving backward compatibility, for good reason: it saves millions of hours of human effort. It's embarrassing and disappointing that Python simply blew it with the Python 2 to 3 transition.

Maybe we could still evolve pypi to support a compatibility layer to allow easy mixing of python2 and python3 code, but I get the feeling that Python 3 has poisoned the well.


When I was learning Python 6 years ago I was the only one using Python 3 in my group because I use arch linux. It was very basic code and everyone basically solved the same problem. Everyone else's code didn't work on my machine because print is not a statement in Python 3.

That's just plain stupid. Just print a warning and add a python2 flag that hides the warning. Don't release a major version because of something trivial like this.


They didn’t release A major version because they changed print from a statement to a function.


Python gave everyone 12 years to deal with version 3 being the way forward. There are many fundamental changes.

The fact that people seem to complain exclusively after Python 2's end of life a year ago feels a little telling. Perl's community roflstomped their previous vision for Perl 6. The Python community wasn't vocal about this being a bad change. Rather the opposite: very loud support.

Keep in mind, I dislike Python either way, but I'm not one of the devs that complains about continuing education requirements, or language adding things over each 10 year period. I can work in Python just fine, but that doesn't mean it feels nice & hygienic to use for me personally.


The Python core devs did not have the time or motivation to support the old codepaths in the CPython runtime, and the legacy code was getting in the way of a lot of longtime wants and needs for improving performance, runtime maintainability, language ergonomics, and the standard library. They also specifically increased the major revision number to signal their intent to move on from that legacy.

But you kind of addressed your entire spiel: Hindsight is exceedingly easy. They didn't realize how inadequate their migration tooling was, or how very entrenched Python 2 is in various places. It's hard when you don't know what you don't know and you're highly motivated by hopeful aspirations.


>The Python core devs did not have the time or motivation to support the old codepaths in the CPython runtime, and the legacy code was getting in the way of a lot of longtime wants and needs for improving performance, runtime maintainability, language ergonomics, and the standard library.

They could have fixed most of this legacy code without changing the external user-facing API so much.


> They could have fixed most of this legacy code

They could have, but they didn't want to.

It's an open source project. Is there really much of a difference between "I'm not going to work on this system because it's terrible" and "I'm forking this system and I'm not going to support the previous version"?

In both cases you can say "well someone else will just come along and support it", and for py2 they did, for a bit. In fact I believe you can still pay if you happen to want py2 support.

But if you're not paying, you're saying "hey, this thing you work on and provide to me for free - why are you working on it in the way you want rather than the way I want??"


> In both cases you can say "well someone else will just come along and support it", and for py2 they did, for a bit

Was python-2 handed off to new maintainers? News to me.

> why are you working on it in the way you want rather than the way I want

Is "it" python-2 or python-3?

This isn't users demanding py3 devs support py2 - it's users asking that devs who no longer want to support py2 to hand it off to those that do, rather than blocking it.


When I say "the developers of X could have done Y better" I don't mean that they owe it to me in any way to have done so.

I'm just judging their technical decision making. They are perfectly entitled to delete the whole project and start a new one and I have absolutely no right to say they shouldn't.

But I do have a right to critique their decisions from a technical standpoint.


Yes it's interesting if this is cited as one of the motivations.

That's a problem of a language being oriented around a single implementation. Is it even defined by this implementation?

Compare to eg. C or C++.

Diversity, and interoperability is important as it is a significant contributor to longevity.

I do like that you've used the term "API" as I think that sums it up. To think of "Python" not as a language agreed by multiple implementors, the behaviour here is that of a "library" with an "API".


It would have taken considerable effort, regardless. This cost was offset onto the development teams in companies doing migrations. It was the decision made and if you don't like it, consider using another programming language. Perhaps consider that it's an open source project with a lot of contributors essentially working for free.


This explanation can be used to justify anyone breaking anything. You might as well argue no project ever needs to care about backwards compatibility.


How many downstream, Python dependent companies were funding its development? Everyone is entitled to their opinion. But if they build their system on a platform outside their control then they'll have to roll with the changes, fork it, or move to a different platform / fork.


I don’t think hindsight can be claimed here. It was a decision that was not made from ignorance. The Python developers chose to sacrifice backwards compatibility. Other languages do not typically make such choices and if they do they make updating codebases relatively easy.

Nothing about python versioning is easy. It’s a disaster and the key reason I do not start any projects in python.


> The Python developers chose to sacrifice backwards compatibility.

And it is quite clear that that choice was not based on accurate estimates and insights.

The original e.o.l. was laughably short and then had to be doubled. It was quite clear they based their choice on the assumption that consumers would have all switched to 3 at a time when 2 was still used by 80%.

They made that choice based on what can only be seen as complete ignorance of the cost of rewriting software.

Right now, the biggest reason to drop Python 2 for most serious consumers is not any of the improvements that Python 3 brings, but that it is e.o.l..


> Other languages do not typically make such choices and if they do they make updating codebases relatively easy.

I want to understand what was so hard about porting code from Python 2 to 3. I ported a few tens of thousands of lines of Python 2 code to Python 3 and it was pretty trivial. In my experience the only thing that made porting hard was when a package you depended on was not ported to Python 3 yet. But maybe my experience does not reflect some other cases. Can you elaborate on what was so hard about porting code from Python 2 to 3?


I have a pretty nasty example here:

https://news.ycombinator.com/item?id=25890126

How do I regression test five different pieces of DAQ hardware? My best plan is to pull them from working systems and deal with them missing. I don’t think it’s a good use of resources to buy extra DAQ cards just for a regression test bed.

Regardless of that, moving from python 2.5 to 2.7 is not trivial because not all used libraries were even updated to 2.7 from 2.5. Some that were broke backwards compatibility. How far do I have to bend backward just to get in the right place to update to python 3? I see many comments trivializing the effort needed to update to python 3 because they know of narrow use cases and expect large amounts of resources to maintain code. That isn’t the reality for most users.


The hardship of porting from 2 to 3 very much depends on how critical the software is. Porting 1000 lines of python 2 that deals with files encoded in various ways where it’s impossible to test all edge cases and where a failure might lead to huge liability charges is hard not because it’s hard to run 2to3 and do some random tests, but because you don’t know what you have missed. And still, a 300k lines of code project might be fine to just run 2to3 on and then find the bugs as you go. It’s a matter of context.


And the opposite -- tons of little unimportant scripts sitting around that add a lot of value as a whole, but just aren't worth rewriting on the whim of the Python developer team's poor decision making....

I have Python code dating back to when I was an undergrad. It's sad to see the Python team decide to nuke that. My C code from then (mostly) runs fine still.

The team decided to externalize a massive cost on its community without much benefit. That was sad to see at the time, and it continue to be sad to see.


As someone who does a decent part of development in python, I'd say you are using the wrong language, if you can't test your edge cases and have huge liabilities.

Python code is inherently almost-untestable and fragile. These days, when coding something critical and non-trivial, I choose a memory-safe language with static typing and type inference, ADTs, pattern matching and try to write simple yet pure functional code with well defined semantics, that works almost by definition.


Sure, but how does that help? I mean, absent a time machine, you're technically correct but operationally moot.


If you have a liability situation, maybe you could work to rectify it, instead of blaming 2to3 transition that's shaking up your house of cards?


> If you have a liability situation, maybe you could work to rectify it

Well yes, sure, of course.

And like you said, maybe Python isn't the right language in the first place for mission-critical life-is-on-the-line software.

But if you have already gotten yourself into a position where some piece of your business infrastructure is dependent on an obscure bit of hard-to-port-to-Python-3-and-maintain-exact-behaviour Python code, then it is exactly the "2to3 transition that's shaking up your house of cards", no?

And, furthermore, like you said, if you find yourself in this position, you should be looking at some other language entirely rather than porting to Py3, eh?


Note that I am not against using python in mission-critical code.

I was referring to untestable code with a myriad of edge-cases, in which case you have a problem that will surface sooner or later, be it 2to3 transition or something else.

If the code is truly static, you can ignore the transition and deprecation. Otherwise you should probably work on documentation/testing/refactoring and/or porting to another language.

2to3 transition was handled badly, up to about 2.7 and 3.4 or so, but the pains described here seem mostly self-inflicted, and I don't see it as an argument against the needed changes.


These are exactly the concerns of serious enterprises that the Python developers have missed, which made them seem as though they were hobbyists who had never dealt with software that actually powers infrastructure.


Python was intended for education originally. It's possible that some uses are just too far outside that wheelhouse to expect it to work well forever. Doubtful I'll ever write desktop GUIs in PHP for example, though it appears some have already done it.


The standard library of 2 already came with many facilities that go well beyond that.

They targeted business; it came to be adopted by business; and then they were surprised that business was not enthusiastic about updating currently working code with all the potential regressions and downtime that might come from it.


Could you explain how "Python was intended for education originally."?

As I recall, Python was designed for the Amoeba operating system, and drew on experience from implementing ABC; ABC was definitely designed for education.

But ABC != Python. Checking now, the first Usenet post for Python 0.9 says:

> Python can be used instead of shell, Awk or Perl scripts, to write prototypes of real applications, or as an extension language of large systems, you name it.

See https://groups.google.com/g/comp.sys.sgi/c/7r8kVgQ84j0 .

It doesn't specifically mention using Python for education.


There are certainly some cases where even the smallest backward incompatible change would cause serious problems on some systems. Thanks for giving an example, instead of just downvoting.


The problem was that you could only port once all the libraries you use had ported, but libraries didn't want to commit to abandoning Python 2 quickly.


Agreed, that was also my experience as well, the hardest part was not changing our codebase but if we depended on a package that was not ported to Python 3 yet.


> The Python core devs did not have the time or motivation to support the old codepaths

Then sounds like they didn't want to be python devs anymore, good luck on their new project..

Instead they held onto the reins and drove python into the ground so that their new code could devour the remains of the old.

> They didn't realize how inadequate their migration tooling was

A shame then that they decided that migration was mandatory. They don't need to know either, they just have to encourage users to migrate, rather than force them to. Saying "They didn't realize ... how very entrenched Python 2 is" is basically saying "we didn't think we'd encounter (significant) resistance". Their "hopeful aspirations" were that everybody (that mattered) would be onboard, which is why they didn't bother to ask..


There are a billion blog posts about the python core developers acknowledging their mistakes and saying they would handle future changes much differently.

This post might be true, but it's roughly 10 years late in terms of hitting the intended audience. Everyone gets this now, and "beating a dead horse" might be an understatement


It's in response to something happening today.

Some dead horses need a serious beating every now and then to remind people that they can resurrect if you're not careful. All of the lessons the python team did not put into practice were well known at the time, but they knew better and here we are.

The day after tomorrow someone will make breaking changes to some API, framework, language or OS who still needs to learn this lesson, maybe we'll get to them in time.


The lessons have not arrived at the current Python cabal. They just deprecated unittest and are seriously considering breaking parts of the C-API again.

For the people who work in the correct companies this will generate many billable hours for no gain.

For others it will be a lot of unpaid work again. At this stage Python should be forked.


> At this stage Python should be forked.

I seriously (I mean seriously) thought about forking Py2 (Tauthon is great BTW) but then I found out that PyPy has a Python2 mode and will for the foreseeable future. Just to be clear: PyPy runs Python 2 code, and always will. (As far as I know. Although it occurs to me that I have no idea what it's like if you're trying to work with the C API.)

(Also I got into Prolog, but that's another story.)


Where has unittest been deprecated?


Probably a mixup with distutils, which has just been deprecated.


> Python should be forked

It is pretty much forked.


Apparently not everyone gets it, seeing that many are arguing against it, and every time this subject lands there are many ignorant users that say “Just upgrade your code.” as if that be free.


Python 2.7 still works as a binary. You can vendor all your requirements. The rug is not being pulled out from anyone, we’re looking at 10+ years of this.

pypi is a mostly volunteer-only endeavor, so it’s tough to support stuff forever. And even there older pips still will work!

Python 2 still works! It’s still there! Nobody is taking it away from you in any real sense. But Python developers don’t want to continue developing in that environment so are choosing to not handle it for future stuff.

Python 2 works. You can use it forever if you want. Nobody is forcing you to upgrade... except if you want the free labor from the community. And you have had years and years and years.


Uh, have they acknowledged their mistakes?

One or two of them are in this comment section right now, denying culpability and fanning flamewars.

Not quite the behavior of the repentant.


> Serious enterprises do not like to invest in something if it mean that 10 years later they would have to rewrite their entire codebase again.

Python 2 to Python 3 was nothing like rewriting an entire codebase. Most of the difficulty was if you depended on a package that only supported Python 2, other than that it was pretty easy to port a Python 2 codebase to Python 3. If you have millions of lines of code it might take more time understandably, but still it was nothing like rewriting a whole codebase.
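As a rough illustration of what "pretty easy" meant in practice, most of a typical port was mechanical edits like these (Python 2 forms shown in comments; this is a generic sketch, not any specific project's diff):

```python
data = {"a": 1, "b": 2}

# Python 2: for k, v in data.iteritems(): ...
for k, v in data.items():
    pass

# Python 2: print k, v  (statement)  ->  Python 3: print(k, v)  (function)
print(sorted(data))

# Python 2: 1 / 2 == 0 (integer division); Python 3 uses true division.
assert 1 / 2 == 0.5
assert 1 // 2 == 0

# Python 2: "abc" was a byte string; Python 3 separates text from bytes,
# and crossings must be explicit.
b = "abc".encode("utf-8")
assert isinstance(b, bytes) and b.decode("utf-8") == "abc"
```

Tools like 2to3 automated much of this; the bytes/text separation was the one change that usually needed actual thought.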


> The very swarms of users that chanted “just upgrade” as if that not incur a significant cost also seemed ridiculously naïve to me, not understanding the real cost that large projects have by having to rewrite very extensive codebases and dealing the potential regressions that that might involve.

And yet we haven't heard of this being an actual, real problem, or are there any high profile examples?

I had to migrate multiple small projects (~10k loc) myself. That should be the typical use case for python (power law etc.) The whole thing took about half an hour per 1000 loc, and I had more than 10 years to plan it.


> The minor improvements of Python 3 did not warrant breaking backwards compatibility and most could have been handled in a way that would not break it opt-in directives.

There were serious issues in Python 2 that could not be fixed in any backward compatible way, and would have made further progress forward impossible.

It wasn't done lightly and a lot of smart people thought about it for a long time.

---

And your old Python 2 scripts will continue to work forever, so I'm not quite sure what your beef is.


Progress needs to be made, and sometimes dropping support for stuff you no longer want to spend time supporting makes sense.

That said, I still think the situation was mishandled for this reason: py3 is basically another language, similar to py2. Calling it py3 is an exercise in marketing - instead of creating a new language to compete with py2 (along with all similar languages e.g. Julia) the existing py2 community was leveraged/shamed* into supporting the new thing, and most importantly, py2 was killed by its maintainers (rather than handed off) so it couldn't compete with py3, and so that users would be forced to move somewhere else - py3 being the easiest.

If it had properly been a new language, they could have taken more liberties (compat breaking) to fix issues, like a single official package manager. And migration to py3 would have been more by consent, than by force.

* https://python3wos.appspot.com/


Very much this. It's a separate language that, if it hadn't been pushed by BDFL and co., if it had appeared as an independent project (like e.g. Stackless Python or something), would have had to live or die on its own merits.

- - - -

An additional aspect that I see as an old Python user is the "poisoning of the well" of the inclusive and welcoming spirit of the community. We (I'm speaking as a Pythonista here) have had problems with this in the past (remember how grouchy effbot could be? He's a sweet person IRL though.)

We made great progress and got a lot of acceptance in the educational and academic worlds.

Now just read this very thread and you'll find so many people making curt dismissive comments to folks who aren't on board with Python 3.

I still love and respect GvR (I once, with his permission, gave him a hug!) even though I think he messed up with this 2->3 business (and in any event, the drama around language innovation eventually pushed him to resign, as we all know.) He's a human being. And a pretty good one.

I guess what I'm trying to say is Python 3 won. Let us (all of us) be gracious about it.


> I simply do not have the time to rewrite the many Python 2 scripts that have written over the years that run my computer.

In 12 years? You must have written some very extensive scripts.


While I also find the timeline totally reasonable, I think most "I don't have the time" complaints are probably less about being able to finish it in time, and more about wanting to spend time doing something other than rewriting otherwise finished or stable code to satisfy a backwards-incompatible change.


> Do these men think that time is free?

Written by someone that, for some reason, did not decide to maintain their own fork of python2. If time isn't free, why is it expected from maintainers to support other companies' lifestyle with their own time?


It isn't - just hand it off.

If you don't like the laws, are you a hypocrite for not starting your own country?

Arguments in this thread seem to miss a discrepancy:

    "We don't want to support py2, and so why should we? Our time isn't free and we do what we want!"

    "We know you don't want to migrate your py2 code, but you have to."
Forks aren't easy, especially when you get no support from the "official" python-2 maintainers. At the very least, a fork would not own the name.

Here's a question - why isn't python-3 a fork of python? Answer: because forks are hard, and the devs wanted to keep all the momentum/resources of python-2.


The fork comment is not meant to be a realistic suggestion, it just points out that there is work needed to maintain compatibility. The thing is, you can't both complain about the time it takes to migrate your project _and_ expect maintainers to spend an incommensurate amount of time maintaining stuff for you, free of charge.

I understand that some people and companies are now caught between a rock and a hard place right now. But honestly, that rock has been coming for 12 years now, and the alternative is to put other people in that situation.


Sure, but the assumption is py3 devs do the work in order to dismiss the idea and suggest people are entitled.

py3 devs don't need to do the work, they just need to hand it off.

> The thing is, you can't both..

Yes you can, if "maintaining" is handing it off, as opposed to the straw man of forcing py3 devs to do it. Why do the gatekeepers only allow for themselves to do the work?

> that rock has been coming for 12 years

notice is not consent.

> the alternative is to put other people in that situation

12 years is enough time to hand off to people who are happy to maintain py2. But there was no choice given.


Really nice of them to spend their time supporting your case for free though.

It’s open source, you can fund some program to keep supporting python 2.


> It’s open source, you can fund some program to keep supporting python 2.

No, actually, you can't - last time I checked, they were specifically threatening[0][1][others] to sue anyone who tried to continue developing Python for trademark infringement (despite that they are the ones falsely using the trademark for something other than what it got its reputation from).

0: https://lwn.net/Articles/711092/

1: https://github.com/naftaliharris/placeholder/issues/47


So they chose their own name and released under that.

Here’s the Open Source Definition:

https://opensource.org/osd

> The license may require derived works to carry a different name or version number from the original software.


I've already tried running old C++ projects and every time something breaks, so it's not as clear cut as you make it out to be

Some things in Python 2 were not fixable by keeping it backwards compatible. Print as a statement? Sure. But strings/byte arrays, no way.

Of course they could have made the Py2 implementation less broken and less stupid (yes please do use ASCII as the default, ignore the existence of unicode, be trigger-happy about errors, etc)


The whole string/bytearray disaster could have been prevented if strings would always be UTF-8 encoded. That way strings and bytearrays can continue to be different views on the same data. The great divide between byte- and string-data was completely pointless, especially in 2008 when python3 was started because by that time UTF-8 was already firmly established for at least a decade (it would be an excusable design fault only in the 1990's).
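To make the "great divide" concrete, here is what Python 3 actually chose: str and bytes are distinct types and every crossing is an explicit encode/decode, rather than two views over the same UTF-8 buffer as proposed above. A small sketch:

```python
s = "grüße"                  # text: a sequence of code points
b = s.encode("utf-8")        # bytes: the UTF-8 serialization of that text

assert isinstance(s, str) and isinstance(b, bytes)
assert len(s) == 5           # 5 characters...
assert len(b) == 7           # ...but 7 bytes (ü and ß take 2 bytes each)

# Mixing the two types is a TypeError in Python 3, not an implicit
# coercion as it was in Python 2:
try:
    s + b
except TypeError:
    pass
```

Under the always-UTF-8 design sketched above, the length-5 vs length-7 distinction would be a matter of which view you ask for, not of which type you hold.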


> by that time UTF-8 was already firmly established for at least a decade

For text files, maybe, but various APIs like the Window API and the Java String API still use UTF-16.

UTF-8 dependence is also a major pain for many where the local character set conflicts with UTF-8. For example, there's still a lot of Japanese files out there in SJIS that need to be decoded accordingly. The country of Myanmar officially switched to unicode less than two years ago so if you still need to operate on older data, you're going to need to support their old character set.

UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character set for practically any language outside US English. Instead of breaking compatibility with most libraries, python3 would have broken compatibility with most libraries and a few countries instead.

Just like the rest of the world has to deal with three countries refusing to switch to metric, python3 needed to deal with countries refusing to switch to UTF8.


> UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character set for practically any language outside US English.

Huh? I'm using UTF-8 exclusively for string data for around 20 years in C and C++ and never had to deal with language specifics (also true for non-European languages, we need to deal with various East Asian languages, and Arabic for instance). You need to convert from and to operating system specific encodings when talking to OS APIs (like UTF-16 on Windows), but that's it (and this is not language specific, code pages are an "8-bit pseudo-ASCII" thing that's irrelevant when working with UTF encodings).

When dealing with "vintage" text files with older language-specific encodings, you need to know the encoding/codepage used in those files anyway, and do the conversion from and to UTF-8 while writing or reading such files. Those conversions shouldn't be hardwired into the "string class".
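The convert-at-the-boundary pattern described above is straightforward in any language with codec support; a minimal Python sketch using Shift_JIS as the "vintage" encoding:

```python
# Simulate a legacy file by encoding some Japanese text as Shift_JIS.
legacy_bytes = "日本語".encode("shift_jis")

# Boundary in: decode the known legacy encoding into internal text.
text = legacy_bytes.decode("shift_jis")

# Work with `text` internally; boundary out: write UTF-8.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == "日本語"
```

The point being that knowledge of Shift_JIS lives only at the read/write boundary; nothing in the middle of the program needs to care.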


> UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character set for practically any language outside US English.

From a European perspective, this sounds very unlikely. Sure, you may have to deal with deprecated _encodings_, but I’d like to hear about mainstream languages with writing derived from the Latin alphabet, that aren’t supported by UTF-8


From a Japanese perspective, it also sounds very unlikely. Go with UTF-8 strings is successful nowadays in Japan.


They could have made the Windows version UTF-16 by default for example.

Or they could have fixed setdefaultencoding or give us a way to set the default encoding https://stackoverflow.com/questions/3828723/why-should-we-no...

I don't buy the "discouragement" part there; if anything, they could have made it mandatory or at least set it to UTF-8

> For example, there's still a lot of Japanese files out there in SJIS that need to be decoded accordingly.

Yes but you would have to work on those cases anyway, and ASCII would have made it blow up anyway. But convert it to UTF-8/16 and it works.

EDIT: the reason is apparently that "(setdefaultencoding) will allow these to work for me, but won't necessarily work for people who don't use UTF-8. The default of ASCII ensures that assumptions of encoding are not baked into code"

Really. I can't explain my anger at how this is such an idiotic excuse. Yes, your program will fail if you use Latin-1 encoding, duh. Configure your environment correctly and it will work. Sounds like the kind of pedantry that made Guido quit over the walrus operator


So keep the conversion functions around then. Or just do what Go did.

Either way, it's no big deal. There are no excuses for Python 3. Some people are just stubborn.


Another typically ungrateful and entitled comment about opensource on HN. Color me surprised.

> Do these men think that time is free?

Do you think they have endless time to plan a migration with minute detail for every possible usecase?

Users have about a decade to migrate their codebases and stop writing new projects in Python 2. Do you need another decade? Or are you personally going to take over the maintenance of the python2 runtime?

Does anybody actually pay the core dev team for support? Do you? Does your company? Have they been coordinating all these years with the core devs and are unhappy with the result they paid for? I kinda doubt it.

It would be really nice if people were just thankful for all the free stuff they got and built their enterprises on.


> Do you think they have endless time to plan a migration with minute detail for every possible usecase?

My point is that for these small changes from 2 to 3, there should have never been a migration to begin with.

It's not an accusation of lack of effort; it's an accusation of ignorance on their part.

The migration has not only cost everyone else time and money; it has cost them time and money that was better spent elsewhere.

It has been a net detriment to all parties, including them, because they severely underestimated the cost of rewriting software and dealing with the regressions it might lead to.

I will damned well call a man foolish, for pointing a gun at his foot and getting shot in it, because he underestimated how easily the trigger would go off by accident, instead of being thankfully that he was willing to put in the effort to aim it at his foot.


You may be right or blinded by hindsight. Personally I'd rather see more breaking changes in languages like PHP that have a lot of baggage holding them back.


Agreed, although I see our opinions are being heavily downvoted.


It’s the 21st century. Questions like “do these men think that time is free?” have no place in this century. Why not be accurate and say ‘people’? You chose to use a number of big words when simple ones would suffice. Why not take the same care in addressing people fairly?


For many non-native speakers, 'man' tends to be used as 'person' instead of 'male', probably because the translation of many common idioms involving 'man' uses a neutral word in their language.

For example, when translating between English and Romanian, 'man' often gets translated to 'om', which doesn't imply a gender in modern Romanian.

Even in English, 'man' is sometimes used without a gendered connotation. For example, if I say 'man is evil', I am unlikely to be referring to males, but rather people. Similarly, 'hey, man!' is not reserved for males.


I know that but take a moment to read OP’s finer use of the language and use of more precise language. Every single word is perfectly precise, right down to the tone.

That was a pointlessly gendered comment and has no place in our industry. You can keep defending it but I’m not going to stop calling it out.

Have a nice Sunday!


Is that so? I find I used various open phrases such as “Python developers” without going into the exact semantics of which ones, as I'm sure many objected to it, or “serious enterprises” without naming them, an incomplete list of programming languages, and so forth.

It was certainly an informal statement I made, not a formal specification.

There was nothing gendered about that statement, and most do not seem to have interpreted it as such, nor was it so intended by me.


You don’t have to get defensive, just be aware. The times have changed, the world is different and our default language absolutely has to adapt.

Language adapts or dies. That’s how English got here. You can adapt too - it’s not as hard as you’re making it out to be. Heck, if you spent an eighth of the time thinking about inclusivity as you do about individual words, I wouldn’t have had to say this.

However, again, I’m glad I said something and regardless of what you claim, I’m not going to stop calling these kinds of grammatical monstrosities out. Language is important. Full stop.


My point is that, on an international site where English is only used as a means of communication, you should generally be more sensitive to cultural differences in the use of a language such as English. It is often used as a common language between people who don't speak English natively, and so idioms and nuances from their own languages seep into this common English.

The finer points about the semantics of a word such as 'man'/'men' and when it can be taken to refer to people unambiguously vs when it may accidentally imply you are talking about adult males are likely to be lost on a non-native speaker, especially if they come from a culture/language where this distinction and its implications are not subjects of general interest. Even if they are well-versed in the use of English in general.

So it's better to follow HN guidelines and assume the best intentions where meaning is unclear, instead of calling people out on their use of English.

Now, if you know for a fact that the GP is a native English speaker, and especially if you know that they are American, then what I'm saying is not very relevant.


I get your point completely and I’m glad you shared it. When I was in University, I worked with ESL (English as a second language) students and they were always really happy to hear about idiomatic quirks like that. I should rethink how I approach this online. I don’t want to be patronizing because lots of non-native speakers have better written English than I do, but I’ll think about it and find a new line.


Singular man is more likely to be gender-neutral.

Good example: “man is evil” clearly means people, since one would say “men are evil” if referring to males.


“Men are evil.” can also refer to humans in general.

I just searched for the phrase and it's about half split between either meaning from context inference. Yet, the meaning pertaining to the species comes mostly from discussions by educated philosophers, and the other half are annoying identity politics arguments about why one's North American dating life is disappointing — not exactly the audience I am ever interested in reaching, frankness be.


So basically, you won’t be kind because you can’t find suitable sources?? Okay then.


I'm simply disputing the claim that “Men are evil.” would be construed by English speakers to automatically refer to males.

The reason I'm not what you call “kind” is simply because this is how English works, and how it has always worked and how English speakers would interpret and parse that word.

I see no reason to avoid using a word in a perfectly acceptable, current, and historic use simply because you find that it has a different, secondary use. You call that “not being kind”. I call it “You don't own the English language any more than I do.”

You may speak as you will, I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context. I merely ask that I be allowed the same and speak as I will and use the word in its original meaning, which obviously still sees current use.


> You may speak as you will, I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context.

To be fair, while I consider your original wording to be pretty clear, this is wrong. According to Wikipedia, the word 'man' has adopted the meaning of 'adult male human' as its primary meaning starting with Middle English, when it displaced Old English 'wer'. There are still uses where it retains the much older meaning, but its primary meaning today is 'adult male human', and has been for a good few hundred years.


I do not find that to be the case investigating the Global Web-Based Corpus, which contains modern, global internet-published English:

https://i.imgur.com/EG4zaoU.png [sadly the corpus cannot be easily linked, but one may search in it here: https://www.english-corpora.org/glowbe/]

The way I look at it, the usage therein of the word “man” to specifically discriminate sex is very rare but definitely occurs. What does occur is the use of the word “man” to refer to a specific individual, which would typically be male, but in most cases where the word “man” is used indeterminately to refer to a class, it seems to be used without regard to sex.

Apart from that the most common usage seems to simply be vocatively as address, which is also gender neutral.

I would agree that it is rare, outside of compounds, to use the word “man” in a determinate sense for a female man, such as “that man over there” which would mostly be used in a military context, but in an indeterminate context to speak of “a man in general” or “men in general”, the most common usage from context seems to be sexless to this day.


Reading through the first 100 results, I see it mostly used to refer specifically to adult male individuals, or to "a man" meaning specifically an adult male ("would've flipped out if a weird man said some creepy remarks"). There are some uses where it may or may not be gender neutral ("you are a Spammier man than I" - may refer to a man or a woman, but it is probably used because the author is male; a woman might have written "a Spammier woman than I" instead, while also addressing both men and women).

There are also clear cases where "a man" is used to refer to "a human", such as "wheat growing taller than a man".

Rather more interestingly, if you instead search for "men", you'll see that it is used essentially exclusively to mean "adult males". The only exception I found was "and because the greed of a few men is such that they think it is necessary that they own everything" and even there I'm not sure.


> Reading through the first 100 results, I see it mostly used to refer specifically to adult male individuals, or to "a man" meaning specifically an adult male ("would've flipped out if a weird man said some creepy remarks"). There are some uses where it may or may not be gender neutral ("you are a Spammier man than I" - may refer to a man or a woman, but it is probably used because the author is male; a woman might have written "a Spammier woman than I" instead, while also addressing both men and women).

I disagree; the first uses of “man” in an indeterminate sense are these:

> down the economy, Here is the truth the republicans feel uncomfortable with a black man in the with house and a lot of voters are riding the republicans coat tail

> someday you might ask me to help you move. Or, to kill a man. # Leonard: I'll doubt he'll ask you to kill a man

> say, in 35 years of working I have almost always had at least one man who I felt " wrong " about. (the exception? Disney Studios!

> boyfriend, well husband, but either way would've flipped out if a weird man said some creepy remarks regarding me at a christmas party. To me this says

I have specifically included up till your reference, which was the first of an indeterminate usage of the word “man” that by implication is most likely gendered, whereas all the others are most likely not.

So there are three sexless ones before the first gendered one.


I would argue that the one about 'a black man in the white house' was in fact gendered, though it is somewhat debatable. It was referencing Barack Obama specifically. If there had been a black woman president, the phrase would have definitely been written to specifically say 'a black woman in the whitehouse'. On the other hand, if it had been written before either a black man or a black woman had (tried to) become president, it may have still used 'man' in a genderless way.


> You may speak as you will, I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context.

You mean primary.


You wrote:

“Do these men think that time is free?“

That’s not even the same structure as ‘all men are evil.’ Instead what you wrote is gendered and thus completely inaccurate.

So again, you could have used ‘people’ to be respectful and inclusive but you’re choosing to stick with ‘man’ because that’s what you know.

That’s unkind. You know that this is an issue within our community but you are fully choosing to go against the norms because of ‘your language’?

I’m sorry but I thought we could have a conversation. This many replies in and I realize that you don’t actually have much sympathy, understanding or even basic caring.

Be better. It’s easy.


> That’s not even the same structure as ‘all men are evil.’

Indeed it is not. I merely disputed separately that the statement “All men are evil.” would also by necessity be interpreted as such. Either can be, depending on context, but this is not such a context.

> Instead what you wrote is gendered and thus completely inaccurate.

You seem to be of the minority that has interpreted it as such. I would not quickly use votes for an argument except when they pertain to popular opinion, and this is a matter of which interpretation is more common.

I certainly didn't mean any gendered statement, and I also believe that most readers did not read any gender into it.

> So again, you could have used ‘people’ to be respectful and inclusive but you’re choosing to stick with ‘man’ because that’s what you know.

I could, and you could also change your language to avoid any and all possible ambiguities that would not be a problem in practice due to the power of contextual inference.

You seem to ask that this specific word be given special treatment above all others.

> That’s unkind. You know that this is an issue within our community but you are fully choosing to go against the norms because of ‘your language’?

Such as here, the word “our community” is quite vague. You used the word “our” which is ambiguous in English as it's unclear whether it includes the listener or not, and on top of that also what it includes.

I can however perfectly well infer from context that this is an “our” that includes the listener, and can make a reasonable guess to the extent of the “community” you refer to.

Finally, I do not know that it is “an issue” and I certainly do not know that there are “norms” about this. It very much seems that the majority sides with me on this issue given the votes, at least here. I do not believe I am going against any norms, not that I would consider an argumentum ad populum a strong one, but you were the one that raised it here.

> I’m sorry but I thought we could have a conversation. This many replies in and I realize that you don’t actually have much sympathy, understanding or even basic caring.

Well, frankness be, it seems from your language as though your default expectation is that your arbitrary whims, at least on this particular issue, should be accommodated, and that everyone who disagrees with you is unkind or lacks sympathy.

You call it a conversation, but it seems as though you started it from the assumption that you are right, and everyone who disagrees is wrong.

> Be better. It’s easy.

It is indeed your opinion that this is better. Not everyone has to agree with you on that matter, and not everyone does.


Nobody ever has to agree with me and I’m proud to be a minority of one.

However, you’re a beautiful writer and beautiful writers can cause immeasurable pain. I’ll always speak out in case another minority of one feels pain but is too afraid to speak out.

Seriously, take good care. This has been a wonderful thread and again, you’re a really beautiful writer. :)


Not terribly pertinent, then. One is more likely to fall into conversations about mundane topics with uneducated people than to stumble upon existential conversations with educated philosophers, even though the latter might produce a large corpus.

One would also think that “man is evil” would be preferred by the erudite philosopher to the more ambiguous “men are evil”, although one can never overestimate the fondness that an educated person might have towards pedantry, frankly.


> Not terribly pertinent, then. One is more likely to fall into conversations about mundane topics with uneducated people than to stumble upon existential conversations with educated philosophers, even though the latter might produce a large corpus.

“Mundane people” is an entirely different segment than “raging identity politics aficionados complaining about their romantic life”.

The common man on the street will think nothing ill of the word being used as such, even when he be a blue collar construction worker, and will normally interpret it as intended.

I have never met such a raging identity politics aficionado in real life. I would assume not living in the U.S.A., where most of them seem to be centred, reduces my chances. But even there, it seems to be a rather small segment that is isolated to weblogs, as even newspaper columns do not seem to find it mainstream enough to dedicate segments to it.

I'd gander that if I were to find myself in New York and strike a conversation with a blue collar local and say something such as “A beautiful city isn't it? all these millions of men, working as an organized beehive.”, that he'll not interpret me wrongly or even think much of it.


I said mundane topics... I won’t bother with the rest.


>I'd gander that if I were to find myself in New York and strike a conversation with a blue collar local and say something such as “A beautiful city isn't it? all these millions of men, working as an organized beehive.”, that he'll not interpret me wrongly or even think much of it.

Actually I think there's a very good chance she'll object.

The problem is that in your mind, males are the "default" human, and using sexist language reinforces this. This is not a recent opinion confined to "raging identity politics aficionados" or "weblogs" - at this point it's the wrong side of history for the better part of half a century. Consider this piece of satire by Douglas Hofstadter, written in 1985, which substitutes racist language for sexist language in a precisely analogous way:

https://www.cs.virginia.edu/~evans/cs655/readings/purity.htm...


> Actually I think there's a very good chance she'll object.

If you mean to suggest that this position runs across gender lines, then I very much object and find that a naive, but common, assumption.

It reminds me of a Canadian act that sought to introduce the word “fisherwoman” as a sign of good faith to the female fishermen, but it revealed that, overwhelmingly, the fishermen, male or female, did not like this change and found the word to sound silly.

I have noticed no correlation with the gender as to what position one takes on this, as many females as males seem to either favor, or object to, innovations such as “chairwoman” or “councilwoman”.

> The problem is that in your mind, males are the "default" human

No, that would be in the mind of those that read the word “man” and must compulsively attach a gender to a statement containing it.

I've certainly noticed that those so interested in gender language police invariably seem incapable of abstractly thinking of a person without attaching a gender thereto.

> and using sexist language reinforces this

The sexist turn in the word's history was to take the word that had always simply meant “human” and give it a gendered, ageist meaning; you reverse the history of the word here.

> at this point it's the wrong side of history for the better part of half a century.

What would you mean with “wrong side of history”? It is undeniable that the meaning of the word “man” to mean “human” is the original meaning of the word and that the secondary usage to mean “adult male human” is a later innovation.


No, you missed the point entirely. The point is that you pictured this "blue collar local" as a man, as evidenced by your use of the pronoun "he". Don't tell me that it's about the word "man" and its historical role to mean "human".

>I've certainly noticed that those so interested in gender language police invariably seem incapable of abstractly thinking of a person without attaching a gender thereto.

The irony. Next time say "they" instead of "he".


> No, you missed the point entirely. The point is that you pictured this "blue collar local" as a man, as evidenced by your use of the pronoun "he".

No I didn't. The pronoun “he” in English is also very often used to refer to an indeterminate, hypothetical person of irrelevant and unspecified sex.

I didn't picture him as anything in particular, given that I am partially aphantasic and never draw mental pictures about such scenarios.

> The irony. Next time say "they" instead of "he".

There is no irony here; you infer that he is male because of the pronoun and I find such usage to not be universal at all.

The pronoun “he” has a very long history in English for use with a hypothetical person, from which the listener is not meant to infer any particular gender. It is also true that some use the pronoun “they” in that case, but that is not a universal behavior and either may be encountered.

Use of “she” for such hypothetical persons has also seen recent use, probably as a deliberate innovation; some authors deliberately alternate between the two in even distribution.

All of this is how the English language is used by different speakers. I am not telling you which is better and how you should use it; I am telling you that if you deny that all of these have currency, you are certainly being willfully ignorant, because you do not like the descriptive truth about how English is used by its speakers.


You’re an excellent writer!! Again, while sad, this has been a wonderful thread, full of amazing writing.

(I’m on your side but that’s an aside. You’re a really beautiful writer.)


Well, the other perspective would be that you should stop using the word that refers to human beings and has always done so only for the adult male members thereof.

My perspective is that context is usually sufficient, and that this is not the only word in English that is used as such. I never find such passionate debates about the word “chess”, for instance, which can be used for every game that descended from the Indian game, for the European variant specifically, or simply for any exercise of great tactical planning.

Such interesting objectivity men are awarded when politics not be in play.


I agree with you and the sentiment but your tone really discredits the argument. Instead of putting others down it's better to assume good faith, and educate in an elevating way. This way feminism gets a good reputation.


Ironically it is accurate. Python2 was written entirely by men and so is virtually all of Python3.

Women have better things to do with their time, like studying law or medicine.


>Rust editions, which as far as I can tell have been a complete success.

Rust's commitment to backward compatibility is certainly extremely commendable, but I don't think the language went through anything resembling the switch from Python 2 to Python 3 in terms of breakage.

Some of the changes in Python 3 are very fundamental. Imagine if Rust had shipped without String/str and they were added after the fact: would Rust have managed to avoid splitting the ecosystem? That's an open question as far as I'm concerned.

And I also hope that we never find out. Rust's fundamentals have proven to be very solid so far. Having things like OsString (something missing from most programming languages, including Python 3 AFAIK) shows a great amount of foresight and understanding of the problem space. Contrast that with Go which seems very intent on completely ignoring 30 years of programming language evolution.


Well they had a lot of languages to look at and say "ummm that didn't work out for them so well, let's not do that" such as the Python fiasco.


Single codebase compatibility meaning that you can have python2 and python3 code in the same application? Isn't that significantly harder with an interpreted language or am I missing something?


It's a mostly solved problem with Racket. What they probably should have done was have python 2 code somehow declare itself as python 2 (Racket does this with #lang at the top of files). Then, just have a python 2 compatibility layer that works in two steps. First step is to compile/parse it into a similar form as python 3. Additionally, provide a small python 2 runtime which provides different versions of the functions which changed from 2 to 3. I think the two steps are important because some stuff is easier to solve via compilation like "print" while other stuff may be only possible at runtime like strings being unicode.

You would still have some differences which can't be papered over, but it would have made writing code that works in both python 2 and 3 much easier.


Single-codebase compatibility means that the same code can run under both Python 2 and Python 3.

The initial expectation was that translation tools would solve this problem, but it didn’t really work out that way. Adding language features and library shims to make it possible to write pidgin Python that would run under either version meant that you could migrate libraries and parts of large codebases one at a time until the whole thing ran under Python 3.
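For reference, the shims mentioned above look roughly like this: a file written in "pidgin Python" opts into Python 3 semantics via `__future__` imports, so the same source runs unchanged under 2.7 and 3.x (the imports are harmless no-ops on Python 3):

```python
# "Pidgin Python": one file that runs unchanged under Python 2.7 and 3.x.
# The __future__ imports back-port Python 3 semantics to Python 2.
from __future__ import absolute_import, division, print_function, unicode_literals

import sys

# print is a function under both versions now
print("running under Python", sys.version_info[0])

# / is true division under both: no silent integer truncation on 2.x
assert 3 / 2 == 1.5

# bare string literals are unicode text under both
text = "naïve"
assert isinstance(text, type(u""))
```

Library authors typically combined this with small compatibility helpers (the third-party `six` package being the best known) for the cases `__future__` couldn't cover.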


That's the main working solution I found: code works in both versions.

The problem is that it's way trickier than it should be. Had they made that relatively easy, the Python 2->3 transition would have had a much smoother "normal" upgrade process.


“Pidgin Python” is such an apt term, why am I only seeing it when it is no longer relevant!


“Rust editions, which as far as I can tell have been a complete success.”

Can anyone tell me why statements such as these get the emotions going?


Hello AI!




I often hear criticism of Python's transition, but considering how insanely popular the language has grown, I don't understand why.


Python2 was poised to take over the world and be the next Java. 3 is losing ground to Node, Go, Rust, even Lua. 3 is a really fun and productive language to work in as long as you don't need to think about bytes.


My guess is Python wouldn't have done much better anyway. The GIL and lack of client side support kept it out of a lot of use cases.


Really? I’ve been using Python for 15 years and never heard anyone say “GIL”. How and what is GIL preventing?


It causes issues with running low overhead multithreaded code. I get by with the multiprocessing library, but then again I don't have a lot of threads (10 is the most I've ever needed), some people want to run hundreds of "light threads" depending on the type of programming that you are looking to do.
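A small sketch of the difference described above (the workload size and pool sizes are arbitrary): a CPU-bound function gains nothing from a thread pool under CPython's GIL, while a process pool (as with the multiprocessing approach) gets real parallelism:

```python
# CPU-bound work: CPython's GIL lets only one thread execute Python
# bytecode at a time, so a thread pool gives no speedup here, while a
# process pool (separate interpreters, separate GILs) can use all cores.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy(n):
    # pure-Python loop that holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [1_500_000] * 4

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(busy, work))
    print(f"threads:   {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as pool:
        list(pool.map(busy, work))
    print(f"processes: {time.perf_counter() - start:.2f}s")
```

On a multi-core machine the process pool typically finishes this in a fraction of the threaded time; for I/O-bound work threads would do fine, since the GIL is released during blocking I/O.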



Statistics and probability.


I am still not sold on Rust editions and think in the long run they won't be much different from solutions on other platforms.

There will come the day when something will change semantics, or require different kinds of runtime support across editions, and then the headaches how to link binary crates from different editions will start.

Editions to me appear only to work, provided everything is compiled from source code with the same compiler, aware of all editions that came into use.


You keep repeating FUD.

Read the damn Editions RFC. The community agreed that no semantics-breaking or ABI-breaking changes will land in Rust, EVER.

This is not a lesson from Python, but from C++, which introduces breaking changes every single release; they are much smaller than Python's, but still a pain in million-LOC code bases.

If that ever happens, it was agreed that the result would be a different language, with a different name.

That is, editions don’t have this problem because the FUD that you are trying to spread every single time this issue comes up cannot happen, by design.

Your argument “Rust editions don’t solve this problem because they don’t handle semantic or ABI changes” is false, because in Rust there CANNOT be any semantic or ABI changes, and editions handle this situation just fine.

In the context of Python 2 vs 3 this argument makes even less sense, because editions allow combining libraries from different editions in a forward- and backward-compatible way without issues. Source: I work on multiple >500 kLOC Rust code bases and one >1 million LOC, and they all use crates from all editions, mixing and matching them without any issues, doing LTO across them, using dynamic libraries, and all possible combinations of binaries across 4 major platforms.


The problem is there, the fact you choose to ignore expectations of C and C++ enterprise developers using binary libraries across language versions is another matter.

You call it FUD, I call it hand waving.

I want Rust to succeed and one day be available as official Visual Studio installer language, but apparently unwelcome points of view are always pointed out as FUD and attacks.

When cargo does finally support binary crates, I will be glad to be proven wrong when linking 4 editions together into the same binary, just like I do with DLLs and COM today.


I think you misunderstood. C++ doesn't even have an official ABI, nevermind having a stable one. ABI changes can and do happen in many C++ implementations (and there is no compatibility across implementations - you can't link a library compiled with clang to one compiled with MSVC). You can't generally expect to link together libraries compiled with different major versions of the same toolchain, though this may be supported by some toolchains.

Instead, Rust has defined an ABI and has committed to never breaking that ABI. Editions support API-level changes, but the ABI won't change.


Rust has not defined an ABI. You're misunderstanding how the edition mechanism works. Each compiler knows how to turn source code of any given edition into the same internal IR, but that's purely internal. You still cannot compile an rlib with Rust compiler version 1.X and use it in a program compiled with Rust compiler version 1.Y. You can compile an rlib with Rust compiler version 1.Z that uses edition 2015 and use it in a program compiled with Rust compiler version 1.Z that uses edition 2018.


That is news to me, where is the ABI specified, given that cargo is yet to support binary crates?

> you can't link a library compiled with clang to one compiled with MSVC

You surely can, provided it is made available as DLL or COM.


Rust actually supports multiple ABIs and you can pick which one to use.

The one I use for maximum portability is the C ABI, defined in the ISO C standard and the platform docs (e.g. the ABI specified in the x86-64 psABI document on Linux).


I didn’t choose to ignore that. I compiled a Rust binary library 4 years ago; it still works without recompiling today on a dozen operating systems and toolchains that did not exist back then.

Try doing the same with C++.

I really believe you when you say that you are clueless about how to do this, since you don’t seem to have any idea about what you are talking about, and all your posts start with a disclaimer about that.

But at this point the only thing I have to tell you is RTFM. Doing this is easy. HackerNews isn’t a “Rust for illiterates” support group. Go and read the book.


There's no ABI compatibility between different Rust compiler versions as it is, so I don't see how editions will break a compatibility that doesn't exist.


Which is my point about it being a half solution.


Python is the best example I can point to for how important it is to get the versioning and dependency management story right.

Python is one of the most "accessible" languages in terms of the actual programming experience, but making a project reproducible is a nightmare. There doesn't seem to be a real "right way" to manage dependencies, and getting a project running often starts with figuring out how the author decided to encapsulate or virtualize the environment their project runs in, since changing your system python for one project can break another.

I know it's an older language, so many lessons have been learned, but after working with Rust, or even NPM it seems amazing that developers tolerate this situation.


Re: Dependencies - There are at least two well known, well supported, rock solid ways of managing dependencies that are in very common use in the python deployment world. Both start from a requirements.txt (which lets you dial in, with great precision, each of the libraries you need, and is trivially created in < 50msec with "pip freeze"):

1. Containers - That's it. You control everything in it.

2. virtualenv - Every environment comes with its own version of python and its own set of packages and versions of those packages. Add virtualenvwrapper and you can create/switch trivially between them.

It's been at least 2 years since I've run into a issue with python and dependencies that wasn't solved by both of those approaches.
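A minimal sketch of the virtualenv + requirements.txt combination (the directory and file names are just conventions, not mandated by the tools):

```shell
# create an isolated environment with its own python + pip
python3 -m venv .venv
. .venv/bin/activate

# ...pip install whatever the project needs, then pin exact versions:
python -m pip freeze > requirements.txt

# a teammate (or a container build) reproduces the environment with:
#   python3 -m venv .venv && . .venv/bin/activate
#   python -m pip install -r requirements.txt
deactivate
```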


I mean I think you can get a workable setup, it just seems really clunky to me. Like you cobble together some solution out of pip, docker, venv, and if you're jumping into someone else's project you better hope they documented it (wait I have to call `source` on which file?).

It's a far cry from being able to download any git repo and call `cargo build`/`cargo run` or `npm install`/`npm run` with confidence that it's just going to work.


I guess part of it depends on the teams/people you work with. I agree that it would be nice in the python world if we all just agreed: "virtualenv + requirements.txt - Done." Instead, as you noted, the python ecosystem has split into venv, pyenv, pipenv, Poetry, Conda, ....

Where I work, life is simple. You build your project in a virtualenv so it only has the libraries it needs, generate a clean requirements.txt, and check it into git - everyone can run it, and, because we have day-1 onboarding to teach everyone virtualenv/virtualenvwrapper, the first thing a person does before installing the application is mkvirtualenv.

I see a lot of references to Poetry here - but I've never been able to interest any of our senior developers into looking at it - they are pretty happy with our existing system.


To be honest, I do expect to find a project provided with a pipenv/poetry setup, just like I would expect a haskell codebase to have a cabal/stack setup, and a java codebase to have a maven/gradle setup.

It is true that most recent languages ship with these from day 1, but ecosystems rarely lack this kind of stuff. I mean, even my vim has a package manager nowadays.

As for whether you want to salvage old code that isn't provided with package management, it's up to you. But you would have this kind of problem with any old, unmaintained codebase.


> It's been at least 2 years since I've run into a issue with python and dependencies that wasn't solved by both of those approaches.

The problem is that this leaves us with 2 years of documentation that's reliable and addresses easily-solved problems, and 28 years of everything else that will confuse anyone new to the language. Not ideal when accessibility is one of the language's primary selling points.

One of the major problems with fixing design choices or odd behaviours in software is that all of the old threads and posts don't just disappear, and people are now going to be led down paths that are not only so convoluted and ridiculous that they were eventually changed, but often paths that don't even work any more.

It's very very tough to fix that problem retroactively.


Forget containers. The actual "right" way to manage python dependencies in a project is Poetry. It's very solid and super reliable and uses virtualenvs internally.

Pipenv could have been it but it never got far.


Pipenv shipped an emoji-filled easter egg that broke all installs one day. It definitely had other problems, but that's the one I remember.


A dedicated docker container per python project "solves" this issue, but it's a hacky workaround, not a solution.


There's a scene in Major Payne:

===

[Marine has been wounded]

Marine Private: AHHHH my arm, my arm!

Major Payne: Want me to show you a little trick to take your mind off that arm?

[Marine nods and Payne grabs the private's pinky finger]

Major Payne: Now you might feel a little pressure.

[Major Payne breaks the Marine's pinky]

Marine Private: AUGGGGH! My finger, my finger!

Major Payne: Works every time.

====

That's kind of how I feel about Docker. Before, you had a problem. With Docker, you have a new, bigger problem (and most of your old problem hasn't gone away; it's just been masked for a while).

(And yes, I know I'm in the minority here)


Snap! I feel the same seeing people use docker for dependency management. Now you have two problems.


On a more serious note, most uses of Docker that I've seen push problems back, and have accumulating technical debt (with interest).

* Robust systems shouldn't be tied to pinned versions. If your code works with PostgreSQL 9.6.19, and doesn't work with 9.6.20 or 9.6.18, that's usually the sign of something going very, very wrong.

* In particular, robust systems should always work with the latest versions of libraries. In most cases, they should work with stock versions of libraries too (whatever comes with Ubuntu LTS, Fedora, or similar). It's okay if you have one or two dependencies in a system beyond that, but if it's a messy web, that's a sign of something going very, very wrong.

* Even if that's not happening, as much as I appreciate having decoupled, independent teams, your whole system should work with the same versions of tools and libraries. If one microservice only works with PostgreSQL 11.10, and another with 12.07, that's a sign of something having gone way off the rails.

These aren't hard-and-fast rules -- exceptional circumstances come up (e.g. if you're porting Python 2->Python 3, everything might not land at the same time) -- but these should be rare enough to be individually approved (and usually NOT approved) by your chief architect/architecture council/CTO/however you structure this thing.

For the most part, I've seen Docker act as an enabler of bad practices:

* Each developer can have an identical install, so version dependencies creep in

* Each team has their own container, and it's easy for versions and technologies to diverge

* With per-team setups, you end up with an uncontrollable security perimeter, since you need to apply patches to a half-dozen different versions of the same library (or worse, libraries performing the same function)

The docker/microservices/etc. mode of operating gives a huge short-term productivity boost, but I haven't actually seen a case on teams I've been on where the benefits outweigh the long-term costs. That's not to say they don't exist, but they're in the minority.

For the most part, I use Python virtual environments and similar, but by the time you hit docker, I back away.


What are the issues with using docker to solve this problem?


> What are the issues with using docker to solve this problem?

Docker alone doesn't solve the problem and neither does pip unless you take extra steps.

Here's a common use case to demonstrate the issue:

I open source a web app written in Flask and push it to GitHub today with a requirements.txt file that only has top level dependencies (such as Flask, SQLAlchemy, etc.) included, all pinned down to their exact patch version.

You come in 3 months from now and clone the project and run docker-compose build.

At this point in time you're going to get different versions than I had 3 months ago for many sub-dependencies. This could result in broken builds. It happened multiple times with Celery and its sub-dependency Vine, and with Flask and its sub-dependency Werkzeug.

So the answer is simple right, just pip freeze your requirements.txt file. That works, but now you have 100 dependencies in this file when really only about 8 of them are top level dependencies. It becomes a nightmare to maintain that as a human. You basically need to become a human dependency resolution machine that traces every dependency of every other dependency.

Fortunately pip has an answer to this with the -c flag but for such a big problem it's not very well documented or talked about.

It is a solvable problem though: you can have a separate lock file with pip without using any external tools, and the solution works with and without Docker. I have an example of it in this Docker Flask example repo https://github.com/nickjj/docker-flask-example#updating-depe..., but it'll work without Docker too.
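A sketch of the split described above (the file names and version pins here are illustrative, not necessarily those used in the linked repo): humans maintain only the top-level pins, and the full frozen tree is applied as a constraints file via pip's `-c` flag:

```shell
# requirements.txt: the handful of top-level deps a human maintains
printf '%s\n' 'Flask==1.1.2' 'SQLAlchemy==1.3.22' > requirements.txt

# after one known-good install, freeze the fully resolved tree:
#   pip freeze > requirements-lock.txt

# rebuild later with every sub-dependency (Werkzeug, Vine, ...) pinned,
# without ever listing them in requirements.txt:
#   pip install -r requirements.txt -c requirements-lock.txt
```

Constraints files only pin versions of packages that something else pulls in; they never install anything by themselves, which is what makes them usable as a lock file.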


Throwing more technology at a problem means more complexity and more things that can go wrong. Doing it once or twice is fine, but the complexity increases exponentially.

Also, Docker is not a universal and secure solution. It works great as a "universal server executable format and/or programming environment" on Linux, but less so on Windows, macOS and especially FreeBSD.


Imagine you get a new phone with a new phone number, and you have a problem because some people still contact you on your old number. So instead of getting everyone to use the same number, you hire someone to take both of your phones and forward you the messages from each one.

Yes, at some level it solves your problem, but it adds a lot of complexity which doesn't need to exist. You also now depend on someone in the middle, who takes effort to manage and might not do exactly what you want.


If you are developing software that you run on your own servers, none. It works fine.

If you are developing open source software that people can install on their machines, it's a terrible solution. In that case you should package it correctly and distribute it via pip, so people can easily install it on their systems.


> it struck me how ridiculous it is that open source languages have to put up with this. Clearly "major" numbers are insufficient, the only real answer is to rename the entire freaking language when you make incompatible changes to it.

Perl and Python are the only two examples of this to my knowledge: most open source languages do fine introducing breaking changes in major versions.

The question is why Perl and Python had such problems while for example NodeJS, PHP (comparable webserver scripting languages) have had no such issues.

I wonder is it anything to do with the areas they're used in (Python & Perl are popular local/cli scripting languages in addition to web—has Bash had similar version woes?), or is it purely that the changes they made were more significantly breaking than others'? That's probably true of Perl6/Raku at least.


Perl never had a problem (at least with the Perl 5 and Perl 6 distinction) besides marketing. No one has ever been confused why their perl script doesn't work because their perl was version 6 instead of 5, and they never will be. No one has ever had to worry about writing a perl program to be compatible with both, unlike python, where you never know if "/usr/bin/python" is 2 or 3.


Marketing is important. I did have people ask me if there was any point to learning Perl 5 when Perl 6 was just around the corner, and people buying the butterfly book instead of the llama book for the same reason.

Then the Python people made the same mistake. Beginners learned Python 3 and then had trouble with App Engine or some other platform. The most popular question in Python forums for many years was whether someone should learn 2 or 3. Some probably just went with Go instead.


In some fields, Python is embedded as an interpreter into major binary platforms or commercial apps.

So in many of these cases the end user doesn't have a choice to use Python 3 until it's on offer.

And the vendor has usually integrated Python at a binary level into C code; that's why they provide a Python API.

The answer could even be "Red Hat Enterprise Linux 6"; consider that Python 2 is the default in this OS, which ended official support only at the end of last year. Many enterprises _chose_ this platform for its longevity, along with 3rd party vendors of commercial software.


Likely a key factor is leaving support for the old version. If they had immediately deprecated python 2, users would have quickly updated their packages over to a supported version.


>>> If they had immediately deprecated python 2, users would have quickly updated their packages over to a supported version.

Painfully wrong. They tried to deprecate at least twice, it was so bad a decision that they had to backtrack.

In fact Python 3 is hard proof that you cannot expect users to upgrade when you abandon ship and give them no easy way to upgrade.


Well, there is a way to upgrade. They even provided an automated tool (2to3) that would convert code and warn you about the handful of changes.


Somehow it feels like PHP being a synonym for WordPress, and Node code going through Babel, makes these languages different from perl and python too.


Node 0.12, Java 6 and 8, PHP 4 - there's been plenty of examples of "sticky" versions in the past that had to be pushed away for other languages


Java and PHP I could see as being similar to Python, as they all shared the pain of widespread adoption by vendors that were reluctant to update (Java: internal enterprise software, PHP4: bad cheap webhosts, Python2: everyone?).

With Node 0.12 though I don't see it. IOJS was a pretty momentary internal political issue that many users didn't even register on their radars. It certainly didn't have any long-lived impact on version adoption within the community.

And: the important point, they've all had very successful major bumps since. So even if there are pains, they can be overcome. There's nothing fundamentally un-doable about major version releases for open-source languages.


A better comparison would be to Typescript which is a breaking change from JS but is branded differently. I'd love native typescript in the browser but browser vendors aren't going to go pushing a massive breaking change like that.


PHP had one too; never heard of PHP 6?


PHP6, similar to ES4/ESX, was a language version that ended up in spec. hell. Nothing to do with release woes (none were ever released), so I don't see any real relevance in this discussion?

PHP7 on the other hand was a pretty seamless migration from PHP5, and PHP8 looks likely to be similar.

General point is the original commenter was posing this as some fundamental issue of open source languages: clearly there's plenty of examples of success, so it can't be.


PHP 6 also had a codebase. They tried something similar to Python 3; well, they stopped it and backported stuff like intl.


> This is awesome in terms of avoiding all of the weird things when a person typed pip rather than pip3 and module didn't seem to get installed anywhere

This won't change that at all; pip/pip3 is a distro packaging thing, and any distro that packages legacy python2 as “python” and Python 3 as “python3” will probably continue packaging legacy pip-for-python2 as “pip” and pip-for-python3 as “pip3”.


Most likely, some will rename the command and some will not, causing more pain for everyone :(


I like using `<python_executable> -m pip` to avoid all ambiguity about what python version I'm running pip with/installing things for. Usually `python3 -m pip`, or `python3.8 -m pip`.


I like using pyenv to manage Python versions which will then always symlink "pip" and "python" to whichever version is my system default, directory tree default, shell default, virtualenv etc.


This is already the case. Arch has python for python 3 and python2 for python 2. It's the opposite on Ubuntu and most other distros.


Ubuntu 20.04 has python2 and python3 executables, but no python, so no ambiguity.


This is absolutely my favourite way of doing it. If I need a `python` executable then I'll just make the symlink myself.


`sudo apt-get install python-is-python3` can fix that.


On Fedora:

    $ python --version
    Python 3.9.1


Arch is why there's a PEP saying not to do that.


The PEP [0] has been revised since then, and the current recommendation regarding /usr/bin/python is "equivalent to python2 OR equivalent to python3 OR not available at all OR configurable by user/sysadmin".

[0] https://www.python.org/dev/peps/pep-0394/


`pip` is still going to work - all the weird mistakes are going to keep happening. `pip` is just no longer providing new clients for 2.7 - existing clients will keep working.


Thanks. As someone with python 2 code I was worrying. No probs typing pip2 install if 3 is default.


For the record, Perl6 has been officially renamed to Raku[1]

IIRC the language known as "Perl", version 5, when it eventually bumps its major version, will skip 6 and go straight to Perl 7.

1: https://lwn.net/Articles/802329/


Yes, but for many years it confused people not deeply familiar with the situation into thinking perl6 would be a better perl5 (as perl5 was a better perl4, perl4 a better perl3, etc.), while in reality it was a very different beast.


It would've taken 45 seconds of reading the Wikipedia article to figure out that Perl 5 and Perl 6 are different languages. Java went from 1.4 to 5. With standardized languages like C/C++/etc you have to keep track of not only the toolchain version but also the standard version (like C99/C++11, etc). Programmers routinely keep track of these just fine.


Renaming to a completely different name is not necessary; everyone understands that a major version breaks compatibility. Python 3 is still very close to Python 2, both in syntax and in spirit.

But there was a sort of broken promise from the Python creators: Python 3 was almost like Python 2, yet every library author had to review and repackage their libraries anyway.

At that point, Python 3 should have been unambiguously incompatible with Python 2:

  - the only allowed file extension should be py3

  - all environment variables should have been duplicated with a "3" (it shouldn't read or modify Python 2 env vars)

  - all installation folders should have been duplicated with "3"

  - all tools like pip should be suffixed with "3"

  - and most importantly, it shouldn't try to optimistically run previous Python 2 code or previous v2 tools

The mistake was that you could use "pip" or "python" in bash scripts/shell, and not know if python2 or python3 was going to run.

Still today, you can run "python" in a recent version of Ubuntu or Fedora, and it will be Python 3. Only "python3" should be possible. Distros are repeating the same mistake as with Python 2, and we will struggle again with Python 4, if there ever is a Python 4.

Many headaches wouldn't have happened if "python" was reserved to Python 2.

Pro tip to language and distro maintainers : make the major version part of the language name and executable, from version 1.


> - the only allowed file extension should be py3

That would have made cross-version codebases impossible, and that's what ultimately allowed migrating. One-shot migrations were not convenient, or successful, or even effectively feasible for complex enough projects.

What allowed the migration was community experimentation with cross-version sources, as well as the reintroduction of "compatibility" features into Python 3.


I don't buy this argument at all. Any sufficiently complex Python 2 project does not work with Python 3 without modification; there is _no_ cross-version compatibility.

And if you wanted to try that yourself anyway, changing all .py to .py3 in a directory is one unix command... It could easily have been part of a 2to3 tool


> I don't buy this argument at all. Any sufficiently complex Python 2 project does not work with Python 3 without modification; there is _no_ cross-version compatibility.

I was personally and solely responsible for migrating a >250kLOC project from Python 2 to Python 3, doing so without cross-version compatibility would not have been feasible. We literally picked the earliest P3 version we decided to support based on cross-compatibility features.


The issue is third-party libraries: they need to simultaneously support both versions of Python during the transition period. If you unilaterally migrate to v3, you break lots of existing projects. On the other hand, if you stay v2-only, you're holding up your dependent projects' migration efforts.


I understand the benefit in theory, but in practice, you had two options :

- the codebase had to be modified to work on both versions at the same time

- you had to maintain two versions in different branches for a while

Is there any data to show which option was most often chosen amongst all PyPI packages? I suspect that the second option was more popular for the most important packages of the ecosystem.


The first option was by far the most popular and was used for years (including pip), only recently packages started dropping python 2 support. I'm not even aware of any packages which went with the second.


They made option 1 possible because people complained about option 2.

six was the #2 package on PyPI.[1] It's a library to help with option 1.

[1] https://python3wos.appspot.com/


> I suspect that the second option was more popular for the most important packages of the ecosystem

You're 100% wrong. The few packages which decided on option 2 early on (e.g. dateutil) ended up having to roll back to option 1 because it was such a pain in the ass, both for the maintainer and for downstream users. The migration only really started happening once 2.7 dropped, projects like Six[0] started appearing, and the community started ignoring 2to3 and building up experience with cross-version projects and idioms (e.g. [1], [2])

[0] whose entire point is to help with option 1

[1] https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-p...

[2] https://python-future.org/compatible_idioms.html
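For the curious, the kind of idiom those guides teach looks roughly like this (a generic sketch, not taken from any particular project): the same file runs under both majors by probing for names that exist in only one of them, rather than checking version numbers.

```python
# Runs unchanged on Python 2 and Python 3: probe for the
# Python-2-only name `unicode` instead of branching on sys.version.
try:
    string_types = (str, unicode)  # Python 2: both text types
except NameError:
    string_types = (str,)          # Python 3: str is already Unicode

def is_text(value):
    """Return True if `value` is a text string on either Python."""
    return isinstance(value, string_types)

assert is_text("hello")
```

Six packages up dozens of these probes (`six.string_types`, `six.PY2`, ...) so each project didn't have to reinvent them.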


"the codebase had to be modified to work on both versions at the same time"

This. This is the only option.

I can tell you, in decades of experience with I don't know how many companies and languages, this is ALWAYS the option that is chosen.


That is not an elegant solution. That way you would forever have a 3 in the name, even for future versions of Python (e.g. a python4 that does not break compatibility with old source code, making a .py4 extension pointless).

In your shell you don't run "java8" or "java11", you just run "java", and then it's a matter of which version of the JDK you have in your PATH. The same goes for all other language interpreters and compilers: you don't run gcc9 or node14. Why do something different for Python?

Really, the mistake of Python 3 was to break compatibility with past programs. A lot of the changes could have been made more gradually: first require a __future__ import, then deprecate the old behaviour, and finally remove it, making the new way the default so the __future__ import is no longer needed. And I think this will be the way for the next Python version, so in theory we will never have the same problem again.

Also, to me it was an error for distros to continue packaging python2 as python. Other distributions, like Arch Linux, switched everything to Python 3 as the default a long time ago; it was mainly Ubuntu that continued to ship python2 as python, leading a lot of programmers to rely on it. It would make sense for the command without the version suffix to refer to the latest version, not the legacy one.


There's a certain cognitive overhead and ugliness to having to run `tool3` instead of `tool`, which makes me dislike this solution.


> everyone understands a major version breaks compatibility

That's not the case with C# or Java.


Why try to imagine how it should have worked, when it shouldn't have happened in the first place? Python 3 was invented because Guido felt it looked nicer, and the economic value of all the labor that went into pleasing him is likely equal to that of a small country.


OTOH PHP managed to migrate 4→5→7, and knew when to back out of v6. JS managed to migrate to ES6.

I think Python's failure is unique. It needlessly broke too much back-compat at once, provided too little benefits to make up for it, and let everyone drag their feet for a decade with the upgrade.


Also guaranteed that there will never be a Python 4 or any way to innovate the language.


> JS managed to migrate to ES6.

"We have nearly endless money and therefore we can make any new feature by treating older versions of our langage as bytecode and write transpilers for it".


This is why you always type “python3 -m pip” instead, to avoid ambiguity.


Regarding renaming the framework: it both did and did not work with .NET Framework and .NET Core. There are still a lot of issues there (and their branding decision now is to go ahead with ".NET"... which is also not good).

However, I do not disagree. Renaming is the right thing. A version number is easily omitted.


Perl 6 was never meant to 'kill' perl 5. It's a completely separate language, has been from the beginning, and it's been renamed Raku recently. Unlike with Python, perl devs realized even 20 years ago that there was too much legacy perl 5 code for 'replacement' to be practical. The result is that perl 5 is very backwards compatible, and Raku is at the very least an interesting language worthy of some attention.

Compare this to Python 2/3. It's basically an incompatible fork that doesn't add enough for many projects to consider upgrading, and adds the overhead of having to worry about two versions. All it really accomplished was to guarantee that "Python 4" will never, EVER be a thing.


Perl 6 *was* to be the next version of Perl. That was the intent when the whole effort started in 2000. That it didn't turn out that way, has many causes. Enough has been said about that already. But to say that it was a completely separate language from the beginning, is historically incorrect in my opinion.


I disagree. Perl 6 changed fundamental low-level syntax and semantics in the language. The 4 -> 5 transition, in contrast, was mostly syntax compatible, and in fact Perl 4 scripts are out there in the wild running on the Perl 5 interpreter just fine. Yes, it should've been called "Larry's next crazy experiment language" from the start.

The most that was ever promised about the 5->6 transition was that there'd be ways of using 5 modules in 6 (which more or less works for 'pure-perl' 5 modules, within reason).


Fwiw I agree with both you and Liz.

As Larry put it the day he announced it:[1]

> It is our belief that if Perl culture is designed right, Perl will be able to evolve into the language we need 20 years from now. It’s also our belief that only a radical rethinking of both the Perl language and its implementation can energize the community in the long run.

> (which more or less works for 'pure-perl' 5 modules, within reason)

Are you misunderstanding what has been achieved?

Using the "Best First" view of replies to the 2008 PerlMonks question What defines "Pure Perl"?[2]

> "Pure Perl" refers to not using the C-extensions ("XS") and thus, not requiring a working C compiler setup.

Inline::Perl5 lets Raku code use Perl modules as if they were Raku modules. XS or pure perl.[3]

Not all of course. Some make no sense whatsoever in Raku (eg source filters). Some don't yet work but could if someone cared to deal with issues that arise. But if you're thinking that Rakudo only imports pure perl modules for the above definition of "pure perl", please know that Rakudo is light years ahead of that due to niner's amazing work. And if you mean some other definition of "pure perl" it would help me if you shared it. :)

[1] https://www.perl.com/pub/2000/10/23/soto2000.html/

[2] https://www.perlmonks.org/?node_id=709757

[3] https://raku.land/cpan:NINE/Inline::Perl5


Agree that it did change low level syntax. What was promised initially was that a "use v5" in a lexical scope, would switch to a Perl 5 compatible parser. This project was started, but became pre-empted by the Inline::Perl5 module, which now allows you to use 99.9% of CPAN's Perl 5 modules (even with XS components) seamlessly in Raku. And yes, that is stable enough to be used in production.


The more accurate thing would be to say it ended up killing all of Perl, which is something the Python 2/3 transition hasn't done to Python, warts and all.


Go ahead and run "sudo rm /usr/bin/perl" on all your servers then and tell me what happens.


I'm reasonably sure a big distro recently removed Perl from the base install.


Which one?


What difference could that make? There are lots of languages shipped in various distributions that are no longer 'living' languages in the sense that they have little mindshare and little new development is done in them. Perl wasn't one of those languages and now it is. If you want to comment on that or perhaps dispute it, sure. But the 'delete perl and see what happens' thing is beside-the-point nitpickery.


Perl has "little mindshare" and "little new development" in the same way as Bash. It's there, people who understand unix will reach for it when it's appropriate and it's as indispensable.

I want to know which distro removed perl because that's quite a drastic step and am interested in studying it. Sorry if that offends you.


It doesn't offend me, it just isn't related to the point I was making. You replied to me as if that sort of test is relevant and I don't think it is. The analogy to bash is similarly not suitable for this kind of argument - bash has never had any ambition to be a general purpose programming language, perl very much did. Unfortunately for perl, perl's own efforts in that regard pretty much eliminated the possibility of that ever happening. Python's trajectory hasn't been that, even with the travails of the 2/3 transition.

'look at how perl did things!' is just a really strange approach in a discussion about the Python 2/3 thing. That operation was successful, the patient died.


https://linuxconfig.org/how-to-install-perl-on-redhat-8

Biggest of them all. I can't find the announcement.


The pip/pip3 thing is partly the fault of distros and partly due to lazy training and insufficient tooling for python.

The only distro I'm aware of that tries to protect users from shooting themselves in the foot with pip is Gentoo. Most of the others will happily let you "sudo pip install" stuff and lead people to think that's the correct way to do things.

Unfortunately pipx has come too late. Pipx provides a real way for users to install arbitrary python tools, but too many docs out there tell users to use pip. Then you've got all the users who want to "play around" with python and install libraries. Even things like jupyter have crap support for virtualenvs and make it far too easy for users to have all their projects in a single env. It's a mess.

Regular users should never have been exposed to pip2/pip3. They should never even have been interacting with the OS Python interpreter. Pip should only exist in a "global" context to support bootstrapping virtualenvs and nothing else. Poetry does a lot of this right.


This still won't free us from the "your pip version is out of date" errors.


They should have called the new language python.encode("utf-8").


> open source languages have to put up with this

Not sure what any of this has to do with open source languages, mind clarifying?


omg yes. this needs to become the gospel. by renaming you lose "branding" that you quickly regain, and don't affect compatibility at all.

if you step back and think about it, using the same word for two fundamentally irreconcilable things is mad.


Ruby did it right.


what was their approach?

