The minor improvements of Python 3 did not warrant breaking backwards compatibility, and most could have been handled with opt-in directives in a way that would not break it.
The very swarms of users who chanted “just upgrade”, as if that did not incur a significant cost, also seemed ridiculously naïve to me, not understanding the real cost that large projects bear in having to rewrite very extensive codebases and deal with the potential regressions that might involve.
Everything about the switch, from its very conception to its execution, was handled in a veritably disastrous way by a team that really did not seem to appreciate even a fraction of what is involved in projects that have millions of lines of code and that would, of course, rather not rewrite it all.
This is why many projects such as Linux, Windows, Rust, Cobol, Fortran, C, and C++ take backwards compatibility quite seriously. Serious enterprises do not like to invest in something if it means that 10 years later they will have to rewrite their entire codebase again.
Even on my home computer, I simply do not have the time to rewrite the many Python 2 scripts that I have written over the years to run my computer. It is cumbersome enough that once in a while part of my desktop stops functioning because my distribution removed a Python 2 library which I had relied upon as a system library and which I now have to install as a user library, though hitherto that has been quite easily fixed.
Do these men think that time is free?
I think the incompatibilities between Python 2 and Python 3 fell into two categories:
1. Trivial and totally avoidable API changes by the Python developers (like `iteritems()` being renamed to `items()` and the Python 2 `items()` being removed from the language). The bet on the python-dev side was that `2to3` would take care of that, and here they totally underestimated that libraries couldn't and wouldn't just make a Python 3 migration in lockstep with the primary Python release.
2. The change to Unicode strings by default, with a clear distinction between Unicode strings and byte buffers for all data encoded in any other fashion.
Most people on Python 3 nowadays won't actually know how beneficial change no. 2 was overall for the health of the Python ecosystem and the stability of their code bases. But it also was the tricky part of the migration for code bases that do a lot of string / file content plumbing (Mercurial being a prominent example).
Change no. 1 was a PITA and a lot of it could have been avoided, but it wasn't a huge problem. The huge problem for the ecosystem was the Unicode change, but I don't think anyone questions its usefulness (except maybe Armin Ronacher, who is the most prominent voice with a dislike for it).
Whenever I show this to people they are (rightly) horrified.
~ $ python
~ $ python3
There were large changes to fundamental parts of the type system, as well as to core types. Pretending this isn't the case betrays ignorance, or at the very least cherry-picking.
How would you have handled the string/bytes split in a way that’s backwards compatible? Or the removal of old-style classes?
So my answer is that it was a deeply misconceived change that shouldn't have been made at all, let alone been taken as the cornerstone of a "necessary" break in backward compatibility.
And the ecosystem agrees.
You're not making an argument about backward compatibility here, you're making a strong claim that representing text as a sequence of Unicode code points is fundamentally wrong. I have never heard anyone make this point before, and I am inclined to disagree, but I'm curious what your reasoning is for it.
There are no operations on sequences of Unicode code points that are more correct than an analogous operation on bytes.
(Everyone's favourite example, length, actually becomes less correct—a byte array's length at least corresponds to the amount of space one might have to allocate for it in a particular encoding. A length in codepoints is absolutely meaningless both technically and linguistically. And this is, for what little it's worth, close to the only operation you can do on a string without imposing additional restrictions about its context.)
That’s ridiculous. Uppercasing/lowercasing, slicing, “startswith”, splitting, etc etc.
Your statement is correct if you only care about ascii.
Even then, upper/lower-casing is iffy.
Wow. I wonder how you arrived at this point. You can't, for example, truncate a UTF-8 byte array without the risk of producing a broken string. But this is only the start. Here are two strings, six letters each, one in NFC, the other in NFD, and their byte-length in UTF-8:
"Åström" is 8 bytes in UTF-8
"Åström" is 10 bytes in UTF-8
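In Python terms, a quick sketch (normalizing explicitly, so it doesn't matter which form the literal happens to be in):

```python
import unicodedata

s = "Åström"
nfc = unicodedata.normalize("NFC", s)   # precomposed: Å, ö are single code points
nfd = unicodedata.normalize("NFD", s)   # decomposed: A + combining ring, o + diaeresis

# Same six letters, different byte lengths AND different code point counts:
assert len(nfc.encode("utf-8")) == 8
assert len(nfd.encode("utf-8")) == 10
assert len(nfc) == 6   # code points, not "letters"
assert len(nfd) == 8
```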
In fact, if the software tells you that either of these strings is 8 or 10 letters long, then either way the software is incorrect - those are both obviously 6-letter strings.
Now, does UTF-8 help you discover they are 6-letter strings better than other representations do? There are certainly text-oriented libraries that can do that, but not those that simply count code points - they must have an understanding of all of Unicode. Even worse, the question "how many letters does this string have" is not generally meaningful - there are plenty of perfectly valid Unicode strings for which this question doesn't have a meaningful answer.
However, the question "how many unicode code points does this string have" is almost never of interest. You either care about some notion of unique glyphs, or you care about byte lengths.
What I wanted to get at is that in Unicode, I have a chance to count letters to some useful degree. Why should I consider starting at byte-arrays?
> there are plenty of perfectly valid Unicode strings for which this question doesn't have a meaningful answer.
I don't get it. Why does the existence of degenerate cases invalidate the usefulness of a Unicode lib? If I want to know how many letters are in a string, I can probably get a useful answer from a Unicode lib. Not for all edge-cases, but I can decide on the trade-offs. If I have a byte-array, I start at a lower level.
You do not. You merely happen to get the right answer by coincidence in some cases, same as bytes-that-probably-are(n't)-ASCII. To throw your own words back at you:
"Åström" is 6 code points in Unicode
"Åström" is 8 code points in Unicode
Normalization is not a real solution unless you restrict yourself to working with well-edited formal prose in common Western languages.
This is not a claim made from ignorance.
Stop treating these cases as equivalent. They're not.
In the end, the only layer at which it really matters whether your byte sequence can be decoded is the font renderer, and just being valid utf-8 isn't good enough for it either.
Ok that explains how we ended up here. I'm considering some other common uses! A search-index for example greatly profits from being able to normalize representations and split words.
You can still best-effort split words! You can do a pretty good job splitting words without ensuring that the words decode in your preferred encoding.
I understand you're arguing about some sort of equivalency between byte-arrays and Unicode strings. Sure there are half-baked ways to do word-splitting on a byte-array. But why do you consider that a viable option? Under what circumstances would you do that?
return set(filter(None, re.split(r"\W+", unicodedata.normalize("NFKC", input_str).lower())))  # assumes: import re, unicodedata
The way I see it: if the encoding is implicit, you have global state. If it's explicit, you have to pass the encoding. Both are extra state to worry about. When the passed value is a Unicode string, this question doesn't come up.
I realize this sounds like a total cop-out, but when the use-case is destructively best-effort tokenizing an input string using library functions, it doesn't really matter whether your internal encoding is utf-32 or utf-8. I mean, under the hood, normalize still has to map arbitrary-length sequences to arbitrary-length sequences even when working with utf-32 (see: unicodedata.normalize("NFKC", "a\u0301 ﬃ") == "\xe1 ffi").
So on the happy path, you don't see much of a difference.
The main observable difference is that if you take input without decoding it explicitly, then the always-decode approach has already crashed long before reaching this function, while the assume-the-encoding approach probably spouts gibberish at this point. And sure, there are plenty of plausible scenarios where you'd rather get the crash than subtly broken behaviour. But ... I don't see this reasonably being one of them, considering that you're apparently okay with discarding all \W+.
Having "accents fall off" has gotten people murdered. Accents aren't things peppered in for effect, they turn letters into different letters, spelling different words. Analogously, imagine that a bunch of software accidentally turned every "d" into a "c" because some committee halfway around the world decided "d" should be composed of the "c" and "|" glyphs. That's the kind of text corruption that regularly happens in other languages when dealing with text at the code point layer.
https://languagelog.ldc.upenn.edu/nll/?p=73 . Note that this is Turkish, which has the "dotted i" problem, meaning that this was more than likely a .toupper() gone wrong rather than a truncation issue.
If I have a byte-array, I can do none of these things short of implementing a good chunk of Unicode. If I truncate, I risk ending up with an invalid UTF-8 string. End of story.
Basically, I believe the point here is that a Unicode-aware truncation should be done in a Unicode-aware truncate method. There is no good reason to parse a string as UTF-8 ahead of time - just keep it as a blob of bytes until you need to do something "texty" with it. It is the truncate-at-word-boundaries() method that should interpret the bytes as UTF-8 and fail if they are not valid. Why parse it sooner?
Yes, and? You can have an invalid sequence of Unicode code points too, such as an unpaired surrogate (something Python's text model actually abuses to store "invalid Unicode" in a special, non-standard way).
If you truncate at the byte level, you are just truncating "between code points"; it's a closer granularity than at the code point layer, so you can also convert to NFC, truncate on word boundaries, etc. You just need to ignore the parts of the UTF-8 string that are invalid; which isn't difficult, because UTF-8 is self-synchronizing.
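A minimal sketch of what that looks like, relying only on UTF-8's self-synchronizing property (continuation bytes always match the bit pattern 0b10xxxxxx); `truncate_utf8` is a name I made up:

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Truncate to at most `limit` bytes without cutting a code point
    in half. No decoding needed: just back up over continuation bytes."""
    if len(data) <= limit:
        return data
    end = limit
    while end > 0 and (data[end] & 0xC0) == 0x80:  # 0b10xxxxxx = continuation
        end -= 1
    return data[:end]

s = "Åström".encode("utf-8")                      # 8 bytes
assert truncate_utf8(s, 3) == "Ås".encode("utf-8")    # doesn't split Å
assert truncate_utf8(s, 6) == "Åstr".encode("utf-8")  # doesn't split ö
```

Word-boundary or NFC-aware variants would layer on top of the same idea, decoding only the region they inspect.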
A language pragma.
All functions that return `bytes` continue to do so unless specifically opted in on a per file basis, then they return `unicode`.
`str` thus returns `bytes` as it does in 2, unless the pragma asks otherwise.
> Or the removal of old-style classes?
They would obviously not be removed and still be available but depræcated.
Nothing in py2 returns bytes. They all return strings. That is the issue. What about subclasses or type wrappers? What about functions that return bytes or utf8 strings? How would you handle code that then calls “.startswith()” on a returned string/bytes value?
A language pragma that fundamentally alters a built in type across all the code you have in a program is never going to work and pushes the burden onto library authors to support a large matrix of different behaviours and types.
It would make the already ridiculous py2 str/bytes situation even more ridiculous.
> They would obviously not be removed and still be available but depræcated.
Having two almost separate object models in the same language is rather silly.
No, that is not an issue, that is semantics.
What one calls it does not change the behavior. And besides, the system could perfectly well be designed so that this pragma changes whether `str` is synonymous with `bytes` or with `unicode`, depending on its state.
What about subclasses or type wrappers? What about functions that return bytes or utf8 strings? How would you handle code that then calls “.startswith()” on a returned string/bytes value?
You would know which is which by using the pragma or not.
Not using the pragma defaults to the old behavior, as said, one only receives the new, breaking behavior, when one opts in.
Python could even support always opting in by a configuration file option for those that really want it and don't want to add the pragma at the top of every file.
> A language pragma that fundamentally alters a built in type across all the code you have in a program is never going to work and pushes the burden onto library authors to support a large matrix of different behaviours and types.
Opposed to the burden they already had of maintaining a 2 and 3 version?
Any new code can of course always return `unicode` rather than `str` which in this scheme is normally `bytes` but becomes `unicode` with the pragma.
> Having two almost separate object models in the same language is rather silly.
Yes, it is, and you will find that most languages are full of such legacy things that no new code uses but are simply for legacy purposes.
“It is silly” turns out to be a rather small price to pay to achieve “We have not broken backwards compatibility.”
You can imagine some world with a crazy context-dependent string/bytes type. Cool. In reality this would have caused endless confusion, especially among beginners and the scientific community, and likely killed the language, or at the very least made it a shadow of what it is now.
They made the right choice given the outcome. Anything else is armchair postulation that was discussed previously and outright rejected for obvious reasons.
Because they're doing everything they can to force py2 to go away. It's not that it's dying a natural death out of disuse. Exhibit A is everyone else in this post still wanting to use it.
If you think strings "work" under py3, my guess is you've never had to deal with all the edge cases, especially across all 3 major desktop platforms. Possibly because your applications are limited in scope. (You're definitely not writing general-purpose libraries that guarantee correctness for a wide variety of usage.) Most things Python treats as Unicode text by default (file contents, file paths, command-line arguments, stdio streams, etc.) are not guaranteed to contain only valid Unicode. They can have invalid Unicode mixed into them, either accidentally or intentionally, breaking programs needlessly.
As a small example, try these and compare:
python2 -c "import sys; print('Your input was:'); print(sys.argv)" $'\x80' | xxd
python3 -c "import sys; print('Your input was:'); print(sys.argv)" $'\x80' | xxd
I say all this because I've run into these and dealt with them, and it's become clear to me that others who love Unicode strings just haven't gone very far in trying to use them. Often this seems to be because they (a) are writing limited-scope programs rather than libraries, (b) confine themselves to nice, sanitized systems & inputs, and/or (c) take an "out-of-sight -> out-of-mind" attitude towards issues that don't immediately crop up on their systems & inputs.
At the risk of sounding like a dick, I’m a member of the Django technical board and have been involved with its development for quite a while. Is that widely used or general purpose enough?
If you want a string then it needs to be a valid string with a known encoding (not necessarily utf8). If you want to pass through any data regardless of its contents then you use bytes. They are two very different things with very different use cases.
If I read a file as utf8 I want it to error if it contains garbage, non-text contents because the decoding failed. Any other way pushes the error down later into your system to places that assume a string contains a string but it’s actually arbitrary bytes. We did this in py2 and it was a nightmare.
I concede that it’s convenient to ignore the difference in some circumstances, but differentiating between bytes/str has a lot of advantages and makes Python code more resilient and easier to read.
That's not quite what I was saying here. Note I said "wide variety of usage", not "widely used". Django is a web development framework—and its purpose is very clear and specific: to build a web app. Crucially, a web framework knows what its encoding constraints are at its boundaries, and it is supposed to enforce them. For example, HTTP headers are known to be ASCII, HTML files have <meta ...> tags to declare encodings, etc. So if a user says (say) "what if I want to output non-ASCII in the headers?", your response is supposed to be "we don't let you do that because that's actually wrong". Contrast this with platform I/O where the library is supposed to work transparently without any knowledge of any encoding (or lack thereof) for the data it deals with, because that's a higher-level concern and you don't expect the library to impose artificial constraints of its own.
It's an easy but very much confused mistake to make if the text you work with is limited to European languages and Chinese.
Not really. How would “.toupper()” work on a raw set of bytes, which could contain either an MP3 file or UTF-8 encoded text?
Every single operation on a string-that-might-not-be-a-string-really would have to be fallible, which is a terrible interface to have for the happy path.
How would slicing work? I want the first 4 characters of a given string. That’s completely meaningless without an encoding (not that it means much with it).
How would concatenation work? I’m not saying Python does this, but concatenating two graphemes together doesn’t necessarily create a string with len() == 2.
How would “.startswith()” work with regards to grapheme clusters?
Text is different from bytes. There’s extra meaning and information attached to an arbitrary stream of 1s and 0s that allows you to do things you wouldn’t have been able to do if your base type is “just bytes”.
It doesn't. It doesn't work with Unicode either. No, not "would need giant tables", literally doesn't work—you need to know whether your text is Turkish.
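Easy to demonstrate from Python itself: `str.upper()`/`str.lower()` apply the locale-independent default case mappings, so Turkish rules are never used.

```python
# Locale-independent casing, Unicode strings or not:
assert "i".upper() == "I"   # Turkish orthography expects "İ" (U+0130)
assert "I".lower() == "i"   # Turkish orthography expects "ı" (U+0131)
```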
> How would slicing work? I want the first 4 characters of a given string. That’s completely meaningless without an encoding.
It's meaningless with an encoding: what are the first four characters of "áíúéó"? Do you expect "áí"? What are the first four characters of "ﷺ"? Trick question, that's one unicode codepoint.
At least with bytes you know that your result after slicing four bytes will fit in a 4-byte buffer.
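To make the slicing point concrete, a sketch using the stdlib `unicodedata` (the NFD form is chosen deliberately, since it decomposes each accented letter into two code points):

```python
import unicodedata

s = unicodedata.normalize("NFD", "áíúéó")   # 10 code points for 5 "letters"
first4 = s[:4]
# "The first 4 characters" actually yields two letters, because each
# letter here is base character + combining accent:
assert first4 == unicodedata.normalize("NFD", "áí")

assert len("ﷺ") == 1   # one code point, an entire phrase
```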
> How would concatenation work? I’m not saying Python does this, but concatenating two graphemes together doesn’t necessarily create a string with len() == 2.
It doesn't work with Unicode either. I'm sure you've enjoyed the results of concatenating a string with an RTL marker with unsuspecting text.
It gets worse if we try to ascribe linguistic meaning to the text. What's the result of concatenating "ranch dips" with "hit singles"?
> How would “.startswith()” work with regards to grapheme clusters?
It doesn't. "🇨" is a prefix of "🇨🇦"; "i" is not a prefix of "ĳ".
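Both cases are checkable at a Python prompt; code-point `.startswith()` happily ignores grapheme boundaries:

```python
# "🇨🇦" is two regional-indicator code points; "🇨" alone is not Canada's flag:
assert "🇨🇦".startswith("🇨")
# "ĳ" is the single code point U+0133, so the "i" prefix is invisible:
assert not "ĳ".startswith("i")
```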
> Text is different from bytes. There’s extra meaning and information attached to an arbitrary stream of 1s and 0s that allows you to do things you wouldn’t have been able to before.
None of the distinctions you're trying to make are tenable.
The problem that you have raised here seems to be one of what alphabet or language is being used, but that issue cannot even arise without taking the interpretation into account. If you want alphabet-aware, language-aware, spelling-aware or grammar-aware operators, these will all have to be layered on top of merely byte-aware operations, and this cannot be done without taking into account the intended interpretation of the bytes sequence.
Note that it is not unusual to embed strings of one language within strings written in another. I do not suppose it would be surprising to see some French in a Russian-language War and Peace.
A type to specify encoding alone? Totally useless. You can just as well implement those operations on top of a byte string assuming the encoding and language &c., as you can implement those operations on top of a Unicode sequence assuming language and culture &c..
I see you have been editing your post concurrently with my reply:
> You can just as well implement those operations on top of a byte string assuming the encoding and language &c., as you can implement those operations on top of a Unicode sequence assuming language and culture &c..
Of course you can (though maybe not "just as well"), but that does not mean it is the best way to do so, and certainly not that it is "totally useless" to implement the decoding as a separate step. Separation of concerns is a key aspect of software engineering.
Codepoints are not glyphs. Nor are any useful operations generally performed on glyphs in the first place. Almost all interpretable operations you might want to do are better conceived of as operating as substrings of arbitrary length, rather than glyphs, and byte substrings do this better than unicode codepoint sequences anyway.
So I contest the position that interpreting bytes as a glyph sequence is a viable step at all.
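For instance, plain substring search needs no decode step at all, and the offsets it returns are byte offsets you can actually allocate and slice with (a small sketch):

```python
data = "Åström".encode("utf-8")     # b'\xc3\x85str\xc3\xb6m'
needle = "ström".encode("utf-8")

# Self-synchronization means a valid needle can only match at a
# code point boundary, so no decoding is required:
assert data.find(needle) == 2       # "Å" occupies bytes 0-1
```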
Going back to the post I originally replied to, how would going down to a bytes view avoid the problems you see?
The choice of the bytes view specifically is just that it's the most popular view from which you can achieve one specific primitive: figuring out how much space a (sub)string occupies in whatever representation you store it in. A byte length achieves this. Of course, a length in bits or in utf-32 code units also achieves this, but I've found it rather uncommon to use utf-32 as a transfer encoding. So we need at least one string type with this property.
Other than this one particular niche, a codepoint view doesn't do much worse at most tasks. But it adds a layer of complexity while also not actually solving any of the problems you'd want it to. In fact, it papers over many of them, making it less obvious that the problems are still there to a team of eurocentric developers ... up until emoji suddenly become popular.
Now, I can understand the appeal of making your immediate problems vanish and leaving it for your successors, but I hope we can agree that it's not in good taste.
For example, working with the utf-8 view does not somehow foreclose on knowing how much memory a (sub)string occupies, and it certainly does not follow that, because this involves regarding the string as a sequence of bytes, this is the only way to regard it.
For another, let's consider a point from the linked article: "One false assumption that’s often made is that code points are a single column wide. They’re not. They sometimes bunch up to form characters that fit in single “columns”. This is often dependent on the font, and if your application relies on this, you should be querying the font." How does taking a bytes view make this any less of a potential problem?
Is a team of eurocentric developers likely to do any better working with bytes? Their misconceptions would seem to be at a higher level of abstraction than either bytes or utf-8.
You are claiming that taking a utf-8 view is an additional layer of complexity, but how does it simplify things to do all your operations at the byte level? Using utf-8 is more complex than using ascii, but that is beside the point: we have left ascii behind and replaced it with other, more capable abstractions, and it is a universal principle of software engineering that we should make use of abstractions, because they simplify things. It is also quite widely acknowledged that the use of types reduces the scope for error (every high-level language uses them.)
The heart of the matter is that a Unicode codepoint sequence view of a string has no real use case.
There is no "universal principle" that we use abstractions always, regardless of whether they fit the problem; that's cargo-culting. An abstraction that does no work is, ceteris paribus, worse than not having it at all.
The quote, as you presented it, leaves open the question: more capable than what? Well, there's no doubt about it if you go back to my original post: more capable than ascii. Up until now, as far as I can tell, your thesis has not been that unicode is less capable than ascii, but if that's what your argument hangs on, go ahead - make that case.
What your thesis has been, up to this point, is that manipulating text as bytes is better, to the extent that doing it as unicode is harmful.
> It must simply do something better. If there were actually anything at all it did better...
It is amusing that you mentioned the burden of proof earlier, because what you have completely avoided doing so far is justify your position that manipulating bytes is better - for example, you have not answered any of the questions I posed in my previous post.
> The heart of the matter is that a Unicode codepoint sequence view of a string has no real use case.
Here we have another assertion presented without justification.
> There is no "universal principle" that we use abstractions always, regardless of whether they fit the problem...
It is about as close as anything gets to a universal principle in software engineering, and if you want to disagree on that, go ahead, I'm ready to defend that point of view.
>... that's cargo-culting.
How about presenting an actual argument, instead of this bullshit?
Furthermore, you could take that statement out of my previous post, and it would do nothing to support the thesis you had been pushing up to that point. You seem to be seeking anything in my words that you think you can argue against, without regard to relevance - but in doing so, you might be digging a deeper hole.
> An abstraction that does no work is, ceteris paribus, worse than not having it at all.
Your use of a Latin phrase does not alter the fact that you are still making unsubstantiated claims.
I guarantee you there will be a quick counterexample to demonstrate that the claimed use-case is incorrect. There always is.
You may review the gish gallop in the other branch of this thread for inspiration.
I don't know why there isn't a sys.argvb as there is os.environb
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed
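That `\udc80` comes from the surrogateescape error handler, which is also the way back to the original bytes; `os.fsencode` applies the same handler in reverse (a sketch; on POSIX the round-trip recovers the raw byte):

```python
import os

arg = "\udc80"          # what sys.argv holds for the raw byte 0x80
assert os.fsencode(arg) == b"\x80"   # surrogateescape round-trips on POSIX
```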
Primarily because they killed py2, not because they won people over with their Unicode approach.
You're still breaking code by default this way, but no one would have trouble updating.
My concern is that, if you don't make the preferred behavior clear, a lot of people would simply never adopt it. I don't think that Python's userbase in particular is going to spend time reading documentation on best practices.
Perl 5 has similar flags ("use strict"), and Racket takes it even further, defining the whole fucking language of the rest of the file ("#lang racket/gui"). Having the language be choosable by the user is against the "zen of python", I guess. In other words: such an attempt does not feel "pythonic".
Yes. If breaking, fundamental changes are common, that's a problem.
My understanding is that the corresponding types are available in both 2 and 3; they're just named differently. The one whose meaning differs is "string". So you could have had some kind of mode directive at the top of the file which controlled which version that file was in, and allowed files from 2 and 3 to be run together.
What if one function running in “py2 mode” returned a string-that-is-actually-bytes, how would a function in “py3 mode” consume it? What would the type be? If different, how would it be detected or converted? What if it retuned a utf8 string OR bytes? What if that py3 function then passed it to a py2 function - would it become a string again? Would you have two string types - py2string that accepts anything and py3string that only works with utf8? How would this all work with C modules?
It would be bytes. Because py2 string === py3 bytes.
> What if that py3 function then passed it to a py2 function - would it become a string again?
> Would you have two string types - py2string that accepts anything and py3string that only works with utf8?
Yes. You already have those two types in python3. bytes and string. You'd just alias those as string and utf8 or whatever you want to call it in python2.
> How would this all work with C modules?
They'd have to specify which mode they were working with too.
So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.
Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.
It would become an absolute complete clusterfuck of corner cases that would have killed the language outright.
Yes, that's the whole point. Because compatible modes allow for a gradual transition. Which in practice allows for a much faster transition, because you don't have to transition everything at once (which puts some people off transitioning entirely - making things infinitely harder for everyone else).
> So you’d have some magic code that switches py2str to bytes. Which means every py3 caller has to cast bytes into a string to do anything useful with it, because returning strings is the most common case. Then that code has to be removed when the code it’s calling is updated to py3 mode. Which is basically the blue/green issue you see with async functions but way, way worse.
Well yes, you'd still have to upgrade your code. That goes with a major version bump. But it would allow you to do it on a library-by-library basis rather than forcing you to wait until every dependency has a v3 version. Have that one dependency that's keeping you stuck on v2? No problem: upgrade everything else and wrap that one lib in conversion code.
> Then you’d need to handle subclasses, wrappers of bytes/str, returning collections of strings across py2/py3 boundaries (would these be copies? Different types? How would type(value) work?), ending up with mixed lists/dicts of bytes and strings depending on the function context, etc etc.
I'm not sure I understand the problem here. The types themselves are the same between python 2 and 3 (or could have been). It's just the labels that refer to them that are different. A subclass of string in python 2 code would just be a subclass of bytes in python 3 code.
Python 3 made the wrong trade-off of core developer hours vs. external developer hours.
py2 unicode == py3 str
The problem with this approach is that they wanted to reuse the `str` name, which requires a big "flag day", where it switches meaning and compatibility is effectively impossible across that boundary (without ugly hacks).
What they could have done instead would have been to just rename `str` to `bytes`, but retain a deprecated `str` alias that pointed to `bytes`.
That would keep old scripts running indefinitely, while hopefully spewing enough warnings that any maintained libraries and scripts would make the transition.
Eventually they could remove `str` entirely (though I'd personally be against it), but that would still give an actual transition period where everything would be seamlessly compatible.
Same thing with literals: deprecate bare strings, and transition to having to pick explicitly between `b"foo"` and `u"foo"`. Eventually consider removing bare strings entirely. DO NOT just change the meaning of bare strings while removing the ability to pick the default explicitly (in contrast, 3.0 removed `u"asdf"`, and it was only reintroduced several versions later).
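For what it's worth, the explicit prefixes are exactly the cross-version-safe spelling (legal in 2.6+ and, per the parenthetical above, only in 3.3+ on the Python 3 side):

```python
# Explicit prefixes remove the ambiguity of bare literals:
b = b"foo"   # always a byte string, in both 2.x and 3.x
u = u"foo"   # always a text (unicode) string, in both

assert isinstance(b, bytes)
assert u == "foo"   # on py3, str *is* the unicode type
```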
What made me personally lose faith in the Python Core team wasn't that Guido made an old mistake a long time ago. It wasn't that they wanted to fix it. It was the absolutely bone-headed way that they prioritized aesthetics over the migration story.
> How would this all work with C modules?
In non-strict mode, you'd be able to use either py2 strings or py3 bytes with these, and gradually move all modules to strict mode which requires bytes.
And then, gradually after a decade or so attempt to get rid of all py2 types.
I'm not sure it's the best way to handle it, but I would have been fine with:
`from __python2__ import *`
`from __python2__ import ascii_strings, old_style_classes, print_statement, ...`
Maybe we could still evolve pypi to support a compatibility layer to allow easy mixing of python2 and python3 code, but I get the feeling that Python 3 has poisoned the well.
That's just plain stupid. Just print a warning and add a python2 flag that hides the warning. Don't release a major version because of something trivial like this.
The fact that people seem to complain exclusively after Python 2's end of life a year ago feels a little telling. Perl's community rofl-stomped their previous vision for Perl 6. The Python community wasn't vocal about this being a bad change. Rather the opposite: very loud support.
Keep in mind, I dislike Python either way, but I'm not one of the devs who complains about continuing-education requirements, or about languages adding things over each 10-year period. I can work in Python just fine, but that doesn't mean it feels nice & hygienic to use for me personally.
But you kind of addressed your entire spiel yourself: hindsight is exceedingly easy. They didn't realize how inadequate their migration tooling was, or how very entrenched Python 2 was in various places. It's hard when you don't know what you don't know and you're highly motivated by hopeful aspirations.
They could have fixed most of this legacy code without changing the external user-facing API so much.
They could have, but they didn't want to.
It's an open source project. Is there really much of a difference between "I'm not going to work on this system because it's terrible" and "I'm forking this system and I'm not going to support the previous version"?
In both cases you can say "well someone else will just come along and support it", and for py2 they did, for a bit. In fact I believe you can still pay if you happen to want py2 support.
But if you're not paying, you're saying "hey, this thing you work on and provide to me for free - why are you working on it in the way you want rather than the way I want??"
Was python-2 handed off to new maintainers? News to me.
> why are you working on it in the way you want rather than the way I want
Is "it" python-2 or python-3?
This isn't users demanding py3 devs support py2 - it's users asking that devs who no longer want to support py2 to hand it off to those that do, rather than blocking it.
I'm just judging their technical decision making. They are perfectly entitled to delete the whole project and start a new one and I have absolutely no right to say they shouldn't.
But I do have a right to critique their decisions from a technical standpoint.
That's a problem of a language being oriented around a single implementation. Is the language even defined by anything other than this implementation?
Compare to eg. C or C++.
Diversity and interoperability are important, as they are significant contributors to longevity.
I do like that you've used the term "API", as I think that sums it up. If one thinks of "Python" not as a language agreed upon by multiple implementors, then the behaviour here is that of a "library" with an "API".
Nothing about python versioning is easy. It’s a disaster and the key reason I do not start any projects in python.
And it is quite clear that that choice was not based on accurate estimates and insights.
The original e.o.l. was laughably short and then had to be doubled. It was quite clear they based their choice on the assumption that consumers would all have switched to 3 at a time when 2 was still used by 80%.
They made that choice based on what can only be seen as complete ignorance of the cost of rewriting software.
Right now, the biggest reason to drop Python 2 for most serious consumers is not any of the improvements that Python 3 brings, but that it is e.o.l..
I want to understand what was so hard about porting code from Python 2 to 3. I ported a few tens of thousands of lines of Python 2 code to Python 3 and it was pretty trivial. In my experience the only thing that made porting hard was when a package you depended on had not been ported to Python 3 yet. But maybe my experience does not reflect some other cases. Can you elaborate on what was so hard about porting code from Python 2 to 3?
How do I regression test five different pieces of DAQ hardware? My best plan is to pull them working systems and deal with them missing. I don’t think it’s a good use of resources to buy extra DAQ cards just for a regression test bed.
Regardless of that, moving from python 2.5 to 2.7 is not trivial because not all used libraries were even updated to 2.7 from 2.5. Some that were broke backwards compatibility. How far do I have to bend backward just to get in the right place to update to python 3? I see many comments trivializing the effort needed to update to python 3 because they know of narrow use cases and expect large amounts of resources to maintain code. That isn’t the reality for most users.
I have Python code dating back to when I was an undergrad. It's sad to see the Python team decide to nuke that. My C code from then (mostly) runs fine still.
The team decided to externalize a massive cost onto its community without much benefit. That was sad to see at the time, and it continues to be sad to see.
Python code is inherently almost-untestable and fragile. These days, when coding something critical and non-trivial, I choose a memory-safe language with static typing and type inference, ADTs, pattern matching and try to write simple yet pure functional code with well defined semantics, that works almost by definition.
Well yes, sure, of course.
And like you said, maybe Python isn't the right language in the first place for mission-critical life-is-on-the-line software.
But if you have already gotten yourself into a position where some piece of your business infrastructure is dependent on an obscure bit of hard-to-port-to-Python-3-and-maintain-exact-behaviour Python code, then it is exactly the "2to3 transition that's shaking up your house of cards", no?
And, furthermore, like you said, if you find yourself in this position, you should be looking at some other language entirely rather than porting to Py3, eh?
I was referring to untestable code with a myriad of edge-cases, in which case you have a problem that will surface sooner or later, be it 2to3 transition or something else.
If the code is truly static, you can ignore the transition and deprecation. Otherwise you should probably work on documentation/testing/refactoring and/or porting to another language.
2to3 transition was handled badly, up to about 2.7 and 3.4 or so, but the pains described here seem mostly self-inflicted, and I don't see it as an argument against the needed changes.
They targeted business; it came to be adopted by business; and then they were surprised that business was not enthusiastic about updating currently working code with all the potential regressions and downtime that might come from it.
As I recall, Python was designed for the Amoeba operating system, and drew on experience from implementing ABC; ABC was definitely designed for education.
But ABC != Python. Checking now, the first Usenet post for Python 0.9 says:
> Python can be used instead of shell, Awk or Perl scripts, to write prototypes of real applications, or as an extension language of large systems, you name it.
See https://groups.google.com/g/comp.sys.sgi/c/7r8kVgQ84j0 .
It doesn't specifically mention using Python for education.
Then sounds like they didn't want to be python devs anymore, good luck on their new project..
Instead they held onto the reins and drove python into the ground so that their new code could devour the remains of the old.
> They didn't realize how inadequate their migration tooling was
A shame, then, that they decided that migration was mandatory. They don't need to know either; they just have to encourage users to migrate rather than force them to. Saying "They didn't realize ... how very entrenched Python 2 is" is basically saying "we didn't think we'd encounter (significant) resistance". Their "hopeful aspirations" were that everybody (that mattered) would be on board, which is why they didn't bother to ask.
This post might be true, but it's roughly 10 years late in terms of hitting the intended audience. Everyone gets this now, and "beating a dead horse" might be an understatement.
Some dead horses need a serious beating every now and then to remind people that they can resurrect if you're not careful. All of the lessons the python team did not put into practice were well known at the time, but they knew better and here we are.
The day after tomorrow someone will make breaking changes to some API, framework, language or OS who still needs to learn this lesson, maybe we'll get to them in time.
For the people who work in the correct companies this will generate many billable hours for no gain.
For others it will be a lot of unpaid work again. At this stage Python should be forked.
I seriously (I mean seriously) thought about forking Py2 (Tauthon is great BTW) but then I found out that PyPy has a Python2 mode and will for the foreseeable future. Just to be clear: PyPy runs Python 2 code, and always will. (As far as I know. Although it occurs to me that I have no idea what it's like if you're trying to work with the C API.)
(Also I got into Prolog, but that's another story.)
It is pretty much forked.
PyPI is a mostly volunteer-only endeavor, so it’s tough to support stuff forever. And even there, older pips will still work!
Python 2 still works! It’s still there! Nobody is taking it away from you in any real sense. But Python developers don’t want to continue developing in that environment so are choosing to not handle it for future stuff.
Python 2 works. You can use it forever if you want. Nobody is forcing you to upgrade... except if you want the free labor from the community. And you have had years and years and years.
One or two of them are in this comment section right now, denying culpability and fanning flamewars.
Not quite the behavior of the repentant.
Python 2 to Python 3 was nothing like rewriting an entire codebase. Most of the difficulty was if you depended on a package that only supported Python 2, other than that it was pretty easy to port a Python 2 codebase to Python 3. If you have millions of lines of code it might take more time understandably, but still it was nothing like rewriting a whole codebase.
And yet we haven't heard of this being an actual, real problem, or are there any high profile examples?
I had to migrate multiple small projects (~10k loc) myself. That should be the typical use case for python (power law etc.) The whole thing took about half an hour per 1000 loc, and I had more than 10 years to plan it.
There were serious issues in Python 2 that could not be fixed in any backward compatible way, and would have made further progress forward impossible.
It wasn't done lightly and a lot of smart people thought about it for a long time.
And your old Python 2 scripts will continue to work forever, so I'm not quite sure what your beef is.
That said, I still think the situation was mishandled, for this reason: py3 is basically another language, similar to py2. Calling it py3 is an exercise in marketing - instead of creating a new language to compete with py2 (along with all similar languages, e.g. Julia), the existing py2 community was leveraged/shamed* into supporting the new thing, and most importantly, py2 was killed by its maintainers (rather than handed off) so it couldn't compete with py3, and so that users would be forced to move somewhere else - py3 being the easiest.
If it had properly been a new language, they could have taken more liberties (compat breaking) to fix issues, like a single official package manager. And migration to py3 would have been more by consent, than by force.
- - - -
An additional aspect that I see as an old Python user is the "poisoning of the well" of the inclusive and welcoming spirit of the community. We (I'm speaking as a Pythonista here) have had problems with this in the past (remember how grouchy effbot could be? He's a sweet person IRL though.)
We made great progress and got a lot of acceptance in the educational and academic worlds.
Now just read this very thread and you'll find so many people making curt dismissive comments to folks who aren't on board with Python 3.
I still love and respect GvR (I once, with his permission, gave him a hug!) even though I think he messed up with this 2->3 business (and in any event, the drama around language innovation eventually pushed him to resign, as we all know.) He's a human being. And a pretty good one.
I guess what I'm trying to say is Python 3 won. Let us (all of us) be gracious about it.
In 12 years? You must have written some very extensive scripts.
Written by someone that, for some reason, did not decide to maintain their own fork of python2. If time isn't free, why is it expected from maintainers to support other companies' lifestyle with their own time?
If you don't like the laws, are you a hypocrite for not starting your own country?
Arguments in this thread seem to miss a discrepancy:
"We don't want to support py2, and so why should we? Our time isn't free and we do what we want!"
"We know you don't want to migrate your py2 code, but you have to."
Here's a question - why isn't python-3 a fork of python? Answer: because forks are hard, and the devs wanted to keep all the momentum/resources of python-2.
I understand that some people and companies are now caught between a rock and a hard place right now. But honestly, that rock has been coming for 12 years now, and the alternative is to put other people in that situation.
py3 devs don't need to do the work, they just need to hand it off.
> The thing is, you can't both..
Yes you can, if "maintaining" is handing it off, as opposed to the straw man of forcing py3 devs to do it. Why do the gatekeepers only allow for themselves to do the work?
> that rock has been coming for 12 years
notice is not consent.
> the alternative is to put other people in that situation
12 years is enough time to hand off to people who are happy to maintain py2. But there was no choice given.
It’s open source, you can fund some program to keep supporting python 2.
No, actually, you can't - last time I checked, they were specifically threatening to sue anyone who tried to continue developing Python for trademark infringement (despite the fact that they are the ones falsely using the trademark for something other than what it got its reputation from).
Here’s the Open Source Definition:
> The license may require derived works to carry a different name or version number from the original software.
Some things in Python 2 were not fixable by keeping it backwards compatible. Print as a statement? Sure. But strings/byte arrays, no way.
Of course they could have made the Py2 implementation less broken and less stupid (yes please do use ASCII as the default, ignore the existence of unicode, be trigger-happy about errors, etc)
For text files, maybe, but various APIs like the Windows API and the Java String API still use UTF-16.
UTF-8 dependence is also a major pain for many where the local character set conflicts with UTF-8. For example, there's still a lot of Japanese files out there in SJIS that need to be decoded accordingly. The country of Myanmar officially switched to unicode less than two years ago so if you still need to operate on older data, you're going to need to support their old character set.
UTF-8 as a fixed encoding only works if you manage to write mappers from and to alternative character sets for practically any language outside US English. Instead of just breaking compatibility with most libraries, python3 would have broken compatibility with most libraries and a few countries as well.
Just like the rest of the world has to deal with three countries refusing to switch to metric, python3 needed to deal with countries refusing to switch to UTF8.
Huh? I've been using UTF-8 exclusively for string data for around 20 years in C and C++ and never had to deal with language specifics (also true for non-European languages; we need to deal with various East Asian languages, and Arabic for instance). You need to convert from and to operating-system-specific encodings when talking to OS APIs (like UTF-16 on Windows), but that's it (and this is not language specific; code pages are an "8-bit pseudo-ASCII" thing that's irrelevant when working with UTF encodings).
When dealing with "vintage" text files with older language-specific encodings, you need to know the encoding/codepage used in those files anyway, and do the conversion from and to UTF-8 while writing or reading such files. Those conversions shouldn't be hardwired into the "string class".
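The "convert at the boundary" approach described above is a couple of lines in py3. A sketch, where the temp file stands in for a legacy file on disk and `shift_jis` stands in for whatever codepage the file actually uses:

```python
import os
import tempfile

# Simulate a legacy Shift_JIS file on disk.
legacy_text = "日本語のテキスト"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(legacy_text.encode("shift_jis"))

# Decode at the boundary: bytes on disk -> str in memory.
with open(path, "r", encoding="shift_jis") as f:
    decoded = f.read()

# Re-encode at the other boundary: str in memory -> UTF-8 on disk.
utf8_path = path + ".utf8"
with open(utf8_path, "w", encoding="utf-8") as f:
    f.write(decoded)

with open(utf8_path, "rb") as f:
    assert f.read() == legacy_text.encode("utf-8")

assert decoded == legacy_text
os.remove(path)
os.remove(utf8_path)
```

The encoding lives in the two `open()` calls, not in the string type: in memory there is only one text representation, and only the edges know about SJIS or UTF-8.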
From a European perspective, this sounds very unlikely. Sure, you may have to deal with deprecated _encodings_, but I’d like to hear about mainstream languages with writing derived from the Latin alphabet, that aren’t supported by UTF-8
Or they could have fixed setdefaultencoding or give us a way to set the default encoding https://stackoverflow.com/questions/3828723/why-should-we-no...
I don't buy the "discouragement" part there; if anything they could have made it mandatory, or at least set it to UTF-8.
> For example, there's still a lot of Japanese files out there in SJIS that need to be decoded accordingly.
Yes but you would have to work on those cases anyway and ASCII would have made it blow anyway. But convert it to UTF-8/16 and it works.
EDIT: the reason is apparently that "(setdefaultencoding) will allow these to work for me, but won't necessarily work for people who don't use UTF-8. The default of ASCII ensures that assumptions of encoding are not baked into code"
Really. I can't explain my anger at how this is such an idiotic excuse. Yes, your program will fail if you use Latin-1 encoding, duh. Configure your environment correctly and it will work. Sounds like the kind of pedantry that made Guido quit over the walrus operator
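The point about the ASCII default blowing up on real-world data can be made concrete in py3 (a sketch; the Japanese sample text is made up):

```python
# Legacy bytes as they might arrive from an SJIS-era file or wire format.
sjis_bytes = "文字化け".encode("shift_jis")

# The py2-style implicit ASCII assumption fails on any non-ASCII byte:
try:
    sjis_bytes.decode("ascii")
    raised = False
except UnicodeDecodeError:
    raised = True
assert raised

# Decoding with the actual encoding works, and the result round-trips
# through UTF-8 losslessly:
text = sjis_bytes.decode("shift_jis")
assert text.encode("utf-8").decode("utf-8") == text
```

Which is the whole argument in miniature: the failure comes from assuming an encoding, not from which default you assume; once the correct encoding is named, everything converts cleanly.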
Either way, it's no big deal. There are no excuses for Python 3. Some people are just stubborn.
> Do these men think that time is free?
Do you think they have endless time to plan a migration with minute detail for every possible usecase?
Users had about a decade to migrate their codebases and stop writing new projects in Python 2. Do you need another decade? Or are you personally going to take over the maintenance of the python2 runtime?
Does anybody actually pay the core dev team for support? Do you? Does your company? Have they been coordinating all these years with the core devs and are unhappy with the result they paid for? I kinda doubt it.
It would be really nice if people were just thankful for all the free stuff they got and built their enterprises on.
My point is that for these small changes from 2 to 3, there should have never been a migration to begin with.
It's not an accusation of lack of effort; it's an accusation of ignorance on their part.
The migration has not only cost everyone else time and money; it has cost them time and money that was better spent elsewhere.
It has been a net detriment to all parties, including them, because they severely underestimated the cost of rewriting software and dealing with the regressions it might lead to.
I will damned well call a man foolish for pointing a gun at his foot and getting shot in it because he underestimated how easily the trigger would go off by accident, instead of being thankful that he was willing to put in the effort to aim it at his foot.
For example, when translating between English and Romanian, 'man' often gets translated to 'om', which doesn't imply a gender in modern Romanian.
Even in English, 'man' is sometimes used without a gendered connotation. For example, if I say 'man is evil', I am unlikely to be referring to males, but rather people. Similarly, 'hey, man!' is not reserved for males.
That was a pointlessly gendered comment and has no place in our industry. You can keep defending it but I’m not going to stop calling it out.
Have a nice Sunday!
It was certainly an informal statement I made, not a formal specification.
There was nothing gendered about that statement, and most do not seem to have interpreted it as such, nor was it so intended by me.
Language adapts or dies. That’s how English got here. You can adapt too - it’s not as hard as you’re making it out to be. Heck, if you spent an eighth of the time thinking about inclusivity as you do about individual words, I wouldn’t have had to say this.
However, again, I’m glad I said something and regardless of what you claim, I’m not going to stop calling these kinds of grammatical monstrosities out. Language is important. Full stop.
The finer points about the semantics of a word such as 'man'/'men' and when it can be taken to refer to people unambiguously vs when it may accidentally imply you are talking about adult males are likely to be lost on a non-native speaker, especially if they come from a culture/language where this distinction and its implications are not subjects of general interest. Even if they are well-versed in the use of English in general.
So it's better to follow HN guidelines and assume the best intentions where meaning is unclear, instead of calling people out on their use of English.
Now, if you know for a fact that the GP is a native English speaker, and especially if you know that they are American, then what I'm saying is not very relevant.
Good example: “man is evil” clearly means people, since one would say “men are evil” if referring to males.
I just searched for the phrase and it's about half split between either meaning, going by context inference. Yet the meaning pertaining to the species comes mostly from discussions by educated philosophers, and the other half consists of annoying identity-politics arguments about why one's North American dating life is disappointing — not exactly the audience I am ever interested in reaching, frankness be.
The reason I'm not what you call “kind” is simply because this is how English works, and how it has always worked and how English speakers would interpret and parse that word.
I see no reason to avoid using a word in a perfectly acceptable, current, and historic use simply because you find that it has a different, secondary use. You call that “not being kind”. I call it “You don't own the English language any more than I do.”
You may speak as you will; I do not deny that the current usage of the word “man” has acquired a secondary meaning of “adult male human” opposed to its historical meaning of “human”, and if you wish to use it as such, then I'm confident I can usually discriminate by context. I merely ask that I be allowed the same, to speak as I will and use the word in its original meaning, which obviously still sees current use.
To be fair, while I consider your original wording to be pretty clear, this is wrong. According to Wikipedia, the word 'man' has adopted the meaning of 'adult male human' as its primary meaning starting with Middle English, when it displaced Old English 'wer'. There are still uses where it retains the much older meaning, but its primary meaning today is 'adult male human', and has been for a good few hundred years.
https://i.imgur.com/EG4zaoU.png [sadly the corpus cannot be easily linked, but one may search in it here: https://www.english-corpora.org/glowbe/]
The way I look at it, the usage therein of the word “man” to specifically discriminate sex is very rare but definitely occurs. What does occur is the use of the word “man” to refer to a specific individual, which would typically be male, but in most cases where the word “man” is used indeterminately to refer to a class, it seems to be used without regard to sex.
Apart from that the most common usage seems to simply be vocatively as address, which is also gender neutral.
I would agree that it is rare, outside of compounds, to use the word “man” in a determinate sense for a female man, such as “that man over there” which would mostly be used in a military context, but in an indeterminate context to speak of “a man in general” or “men in general”, the most common usage from context seems to be sexless to this day.
There are also clear cases where "a man" is used to refer to "a human", such as "wheat growing taller than a man".
Rather more interestingly, if you instead search for "men", you'll see that it is used essentially exclusively to mean "adult males". The only exception I found was "and because the greed of a few men is such that they think it is necessary that they own everything", and even there I'm not sure.
I disagree; the first uses of “man” in an indeterminate sense are these:
> down the economy, Here is the truth the republicans feel uncomfortable with a black man in the with house and a lot of voters are riding the republicans coat tail
> someday you might ask me to help you move. Or, to kill a man. # Leonard: I'll doubt he'll ask you to kill a man
> say, in 35 years of working I have almost always had at least one man who I felt " wrong " about. (the exception? Disney Studios!
> boyfriend, well husband, but either way would've flipped out if a weird man said some creepy remarks regarding me at a christmas party. To me this says
I have specifically included everything up to your reference, which was the first indeterminate usage of the word “man” that by implication is most likely gendered, whereas all the others are most likely not.
So there are three sexless ones before the first gendered one.
You mean primary.
“Do these men think that time is free?“
That’s not even the same structure as ‘all men are evil.’ Instead what you wrote is gendered and thus completely inaccurate.
So again, you could have used ‘people’ to be respectful and inclusive but you’re choosing to stick with ‘man’ because that’s what you know.
That’s unkind. You know that this is an issue within our community but you are fully choosing to go against the norms because of ‘your language’?
I’m sorry but I thought we could have a conversation. This many replies in and I realize that you don’t actually have much sympathy, understanding or even basic caring.
Be better. It’s easy.
Indeed it is not. I merely disagreed, separately, that the statement “All men are evil.” would also by necessity be interpreted as such. Either can be, depending on context, but this is not such a context.
> Instead what you wrote is gendered and thus completely inaccurate.
You seem to be in the minority that has interpreted it as such. I would not quickly use votes for an argument except when they pertain to popular opinion, and this is a matter of which interpretation is more common.
I certainly didn't mean any gendered statement, and I also believe that most readers did not read any gender into it.
> So again, you could have used ‘people’ to be respectful and inclusive but you’re choosing to stick with ‘man’ because that’s what you know.
I could, and you could also change your language to avoid any and all possible ambiguities that would not be a problem in practice due to the power of contextual inference.
You seem to ask that this specific word be given special treatment above all others.
> That’s unkind. You know that this is an issue within our community but you are fully choosing to go against the norms because of ‘your language’?
Such as here, the word “our community” is quite vague. You used the word “our” which is ambiguous in English as it's unclear whether it includes the listener or not, and on top of that also what it includes.
I can however perfectly well infer from context that this is an “our” that includes the listener, and can make a reasonable guess to the extent of the “community” you refer to.
Finally, I do not know that it is “an issue” and I certainly do not know that there are “norms” about this. It very much seems that the majority sides with me on this issue given the votes, at least here. I do not believe I am going against any norms; not that I would consider an argumentum ad populum a strong argument, but you were the one who raised it here.
> I’m sorry but I thought we could have a conversation. This many replies in and I realize that you don’t actually have much sympathy, understanding or even basic caring.
Well, frankness be, it seems from your language as though your default expectation is that your arbitrary whims, at least on this particular issue, should be accommodated, and that everyone who disagrees with you is unkind or lacks sympathy.
You call it a conversation, but it seems as though you started it from the assumption that you are right, and everyone who disagrees is wrong.
> Be better. It’s easy.
It is your opinion that this is better, indeed. Not everyone has to agree with you on that matter, and not everyone does.
However, you’re a beautiful writer and beautiful writers can cause immeasurable pain. I’ll always speak out in case another minority of one feels pain but is too ??? to speak out.
Seriously, take good care. This has been a wonderful thread and again, you’re a really beautiful writer. :)
One would also think that “man is evil” would be preferred by the erudite philosopher over the more ambiguous “men are evil”, although one can never overestimate the fondness that an educated person might have for pedantry, frankly.
“Mundane people” is an entirely different segment than “raging identity politics aficionados complaining about their romantic life”.
The common man on the street will think nothing ill of the word being used as such, even when he be a blue collar construction worker, and will normally interpret it as intended.
I have never met such a raging identity politics aficionado in real life. I would assume not living in the U.S.A., where most of them seem to be centred, reduces my chances. But even there, it seems to be a rather small segment that is isolated to weblogs, as even newspaper columns do not seem to find it mainstream enough to dedicate segments to it.
I'd wager that if I were to find myself in New York and strike up a conversation with a blue-collar local and say something such as “A beautiful city, isn't it? All these millions of men, working as an organized beehive.”, he'll not interpret me wrongly or even think much of it.
Actually I think there's a very good chance she'll object.
The problem is that in your mind, males are the "default" human, and using sexist language reinforces this. This is not a recent opinion confined to "raging identity politics aficionados" or "weblogs" - at this point it's the wrong side of history for the better part of half a century. Consider this piece of satire by Douglas Hofstadter, written in 1985, which substitutes racist language for sexist language in a precisely analogous way:
If you mean to suggest that this position runs across gender lines, then I very much object and find that a naive, but common, assumption.
It reminds me of a Canadian act that sought to introduce the word “fisherwoman” as a sign of good faith to the female fishermen, but it revealed that, overwhelmingly, the fishermen, male or female, did not like this change and found the word to sound silly.
I have noticed no correlation with the gender as to what position one takes on this, as many females as males seem to either favor, or object to, innovations such as “chairwoman” or “councilwoman”.
> The problem is that in your mind, males are the "default" human
No, that would be in the mind of those that read the word “man” and must compulsively attach a gender to a statement containing it.
I've certainly noticed that those so interested in gender language police invariably seem incapable of abstractly thinking of a person without attaching a gender thereto.
> and using sexist language reinforces this
The sexist move, historically, was to take the word that had always simply meant “human” and give it a gendered, ageist meaning; you reverse the history of the word here.
> at this point it's the wrong side of history for the better part of half a century.
What do you mean by “wrong side of history”? It is undeniable that the meaning of the word “man” as “human” is the original meaning of the word, and that the secondary usage meaning “adult male human” is a later innovation.
> I've certainly noticed that those so invested in gender language policing invariably seem incapable of abstractly thinking of a person without attaching a gender thereto.
The irony. Next time say "they" instead of "he".
No I didn't. The pronoun “he” in English is also very often used to refer to an indeterminate, hypothetical person of irrelevant and unspecified sex.
I didn't picture him as anything in particular, given that I am partially aphantasic and never draw mental pictures of such scenarios.
> The irony. Next time say "they" instead of "he".
There is no irony here; you infer that he is male because of the pronoun and I find such usage to not be universal at all.
The pronoun “he” has a very long history in English for use with a hypothetical person, from which the listener is not meant to infer any particular gender. It is also true that some use the pronoun “they” in that case, but that is not a universal behavior and either may be encountered.
Use of “she” for such hypothetical persons has also seen recent use, and was probably innovated deliberately; some authors deliberately alternate between the two in even distribution.
All of this is how the English language is used by different speakers. I am not telling you which is better or how you should use it; I am telling you that if you deny that all of these have currency, you are almost certainly being willfully ignorant, because you do not like the descriptive truth about how English is used by its speakers.
(I’m on your side but that’s an aside. You’re a really beautiful writer.)
My perspective is that context is usually sufficient, and that this is not the only word in English used this way. I never see such passionate debates about the word “chess”, for instance, which can refer to any game descended from the Indian original, to the European variant specifically, or simply to any exercise of great tactical planning.
Such interesting objectivity men are afforded when politics is not in play.
Women have better things to do with their time, like studying law or medicine.
Rust's commitment to backward compatibility is certainly extremely commendable, but I don't think the language went through anything resembling the switch from Python 2 to Python 3 in terms of breakage.
Some of the changes in Python 3 are very fundamental. Imagine if Rust had shipped without String/str and they were added after the fact, would Rust manage to avoid splitting the ecosystem? That's an open question as far as I'm concerned.
And I also hope that we never find out. Rust's fundamentals have proven to be very solid so far. Having things like OsString (something missing from most programming languages, including Python 3 AFAIK) shows a great amount of foresight and understanding of the problem space. Contrast that with Go, which seems very intent on completely ignoring 30 years of programming language evolution.
You would still have some differences which can't be papered over, but it would have made writing code that works in both python 2 and 3 much easier.
The initial expectation was that translation tools would solve this problem, but it didn’t really work out that way. Adding language features and library shims to make it possible to write pidgin Python that would run under either version meant that you could migrate libraries and parts of large codebases one at a time until the whole thing ran under Python 3.
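One concrete example of such a shim is PEP 414, which reintroduced the `u''` literal prefix in Python 3.3 specifically so the same source could run under both interpreters; a minimal sketch of such “pidgin” code:

```python
# Runs unchanged on Python 2.7 and on Python 3.3+ (PEP 414 re-allowed
# the u-prefix there precisely to ease cross-version code).
from __future__ import print_function  # Python 3 print() semantics on 2.x

text = u"na\u00efve"           # explicitly a unicode string on both versions
data = text.encode("utf-8")    # explicitly bytes on both versions

print(len(text), len(data))    # 5 code points, 6 bytes
```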
The problem is that it's way trickier than it should be. Had they made that relatively easy, the Python 2→3 transition would have had a much smoother, "normal" upgrade process.
Can anyone tell me why statements such as these get the emotions going?
There will come a day when something changes semantics, or requires different kinds of runtime support across editions, and then the headaches of how to link binary crates from different editions will start.
Editions appear to me to work only provided everything is compiled from source code with the same compiler, one aware of all editions that came into use.
Read the damn Editions RFC. The community agreed that no semantics- or ABI-breaking changes will land in Rust.
This is not a lesson from Python, but from C++, which introduces breaking changes every single release; they are much smaller than Python's, but still a pain in million-LOC codebases.
If that ever happens, it was agreed that the result would be a different language, with a different name.
That is, editions don’t have this problem because the FUD that you are trying to spread every single time this issue comes up cannot happen, by design.
Your argument “Rust editions don’t solve this problem because they don’t handle semantic or ABI changes” is false,
because in Rust there CANNOT be any semantics or ABI changes, and editions handle this situation just fine.
In the context of Python 2 vs 3 this argument makes even less sense, because editions allow combining libraries from different editions in a forward- and backward-compatible way without issues. Source: I work on multiple >500kLOC Rust codebases and one >1 million LOC, and they all use crates from all editions, mixing and matching them without any issues, doing LTO across them, using dynamic libraries, and all possible combinations of binaries across 4 major platforms.
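For what it's worth, the edition is just a per-crate field in Cargo.toml, and cargo compiles each crate under its own declared edition, so a sketch like the following (crate names are hypothetical) mixes editions with no extra ceremony:

```
# my-app/Cargo.toml
[package]
name = "my-app"
version = "0.1.0"
edition = "2018"     # this crate is written in 2018-edition Rust...

[dependencies]
old-lib = "1.0"      # ...while old-lib's own Cargo.toml may declare
                     # edition = "2015"; cargo handles both in one build
```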
You call it FUD, I call it hand waving.
I want Rust to succeed and one day be available as an official language in the Visual Studio installer, but apparently unwelcome points of view are always dismissed as FUD and attacks.
When cargo does finally support binary crates, I will be glad to be proven wrong when linking 4 editions together into the same binary, just like I do with DLLs and COM today.
Instead, Rust has defined an ABI and has committed to never breaking that ABI. Editions support API-level changes, but the ABI won't change.
> you can't link a library compiled with clang to one compiled with MSVC
You surely can, provided it is made available as DLL or COM.
The one I use for maximum portability is the C ABI defined by the ISO C standard and by the platform docs (e.g. the System V ABI specified in the x86-64 psABI document on Linux).
Try doing the same with C++.
I really believe you when you say that you are clueless about how to do this, since you don’t seem to have any idea about what you are talking about, and all your posts start with a disclaimer about that.
But at this point the only thing I have to tell you is RTFM. Doing this is easy. HackerNews isn't a “Rust for illiterates” support group. Go and read the book.
Python is one of the most "accessible" languages in terms of the actual programming experience, but making a project reproducible is a nightmare. There doesn't seem to be a real "right way" to manage dependencies, and getting a project running often starts with figuring out how the author decided to encapsulate or virtualize the environment their project runs in, since changing your system Python for one project can break another.
I know it's an older language, so many lessons have been learned since, but after working with Rust, or even npm, it seems amazing that developers tolerate this situation.
Some combination of requirements.txt (which lets you dial in, with great precision, each of the libraries you need, and is trivially created in under 50 ms with "pip freeze") and:
1. Containers - That's it. You control everything in it.
2. virtualenv - Every environment comes with its own version of Python and its own set of packages, at their own versions. Add virtualenvwrapper and you can create and switch between environments trivially.
It's been at least 2 years since I've run into an issue with Python and dependencies that wasn't solved by one of those two approaches.
It's a far cry from being able to download any git repo and call `cargo build`/`cargo run` or `npm install`/`npm run` with confidence that it's just going to work.
Where I work, life is simple. You build your project in a virtualenv so it only has the libraries it needs, generate a clean requirements.txt, and check it into git; everyone can run it, and, because we have day-1 onboarding to teach everyone virtualenv/virtualenvwrapper, the first thing a person does before installing the application is mkvirtualenv.
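A sketch of that per-project workflow using only the standard library's venv module (virtualenvwrapper's mkvirtualenv is a convenience layer over the same idea):

```shell
python3 -m venv .venv              # one isolated environment per project
. .venv/bin/activate               # its python/pip now shadow the system ones
pip freeze > requirements.txt      # pin exactly what this env contains
deactivate                         # back to the system interpreter
```

Anyone cloning the repo then recreates the environment with `pip install -r requirements.txt` inside their own venv.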
I see a lot of references to Poetry here, but I've never been able to interest any of our senior developers in looking at it; they are pretty happy with our existing system.
It is true that most recent languages ship with these from day 1, but ecosystems rarely lack this kind of stuff. I mean, even my vim has a package manager nowadays.
As for whether you want to salvage old code that isn't provided with package management, it's up to you. But you would have this kind of problem with any old, unmaintained codebase.
The problem is that this leaves us with 2 years of documentation that's reliable and addresses easily-solved problems, and 28 years of everything else that will confuse anyone new to the language. Not ideal when accessibility is one of the language's primary selling points.
One of the major problems with fixing design choices or odd behaviours in software is that all of the old threads and posts don't just disappear, and people are now going to be led down paths that are not only so convoluted and ridiculous that they were eventually changed, but often paths that don't even work any more.
It's very very tough to fix that problem retroactively.
Pipenv could have been it but it never got far.
[Marine has been wounded]
Marine Private: AHHHH my arm, my arm!
Major Payne: Want me to show you a little trick to take your mind off that arm?
[Marine nods and Payne grabs the private's pinky finger]
Major Payne: Now you might feel a little pressure.
[Major Payne breaks the Marine's pinky]
Marine Private: AUGGGGH! My finger, my finger!
Major Payne: Works every time.
That's kind of how I feel about Docker. Before, you had a problem. With Docker, you have a new, bigger problem (and most of your old problem hasn't gone away; it's just been masked for a while).
(And yes, I know I'm in the minority here)
* Robust systems shouldn't be tied to pinned versions. If your code works with PostgreSQL 9.6.19, and doesn't work with 9.6.20 or 9.6.18, that's usually the sign of something going very, very wrong.
* In particular, robust systems should always work with the latest versions of libraries. In most cases, they should work with stock versions of libraries too (whatever comes with Ubuntu LTS, Fedora, or similar). It's okay if you have one or two dependencies in a system beyond that, but if it's a messy web, that's a sign of something going very, very wrong.
* Even if that's not happening, as much as I appreciate having decoupled, independent teams, your whole system should work with the same versions of tools and libraries. If one microservice only works with PostgreSQL 11.10, and another with 12.7, that's a sign of something having gone way off the rails.
These aren't hard-and-fast rules -- exceptional circumstances come up (e.g. if you're porting Python 2->Python 3, everything might not land at the same time) -- but these should be rare enough to be individually approved (and usually NOT approved) by your chief architect/architecture council/CTO/however you structure this thing.
For the most part, I've seen Docker act as an enabler of bad practices:
* Each developer can have an identical install, so version dependencies creep in
* Each team has their own container, and it's easy for versions and technologies to diverge
* With per-team setups, you end up with an uncontrollable security perimeter, since you need to apply patches to a half-dozen different versions of the same library (or worse, libraries performing the same function)
The docker/microservices/etc. mode of operating gives a huge short-term productivity boost, but I haven't actually seen a case on teams I've been on where the benefits outweigh the long-term costs. That's not to say they don't exist, but they're in the minority.
For the most part, I use Python virtual environments and similar, but by the time you hit docker, I back away.
Docker alone doesn't solve the problem and neither does pip unless you take extra steps.
Here's a common use case to demonstrate the issue:
I open source a web app written in Flask and push it to GitHub today with a requirements.txt file that only has top level dependencies (such as Flask, SQLAlchemy, etc.) included, all pinned down to their exact patch version.
You come in 3 months from now and clone the project and run docker-compose build.
At this point in time you're going to get different versions than I had 3 months ago for many sub-dependencies. This could result in broken builds. This happened multiple times with Celery and its sub-dependency of Vine and Flask with its sub-dependency of Werkzeug.
So the answer is simple, right: just pip freeze into your requirements.txt file. That works, but now you have 100 dependencies in the file when really only about 8 of them are top-level. It becomes a nightmare for a human to maintain; you basically have to become a human dependency-resolution machine, tracing how every pinned package relates to every other.
Fortunately pip has an answer to this with the -c flag but for such a big problem it's not very well documented or talked about.
It is a solvable problem, though: you can have a separate lock file with pip without using any external tools, and the solution works with and without Docker. I have an example of it in this Docker Flask example repo https://github.com/nickjj/docker-flask-example#updating-depe..., but it'll work without Docker too.
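The shape of that pip-only approach: keep the hand-maintained file down to the direct dependencies and let a frozen constraints file pin the whole tree (package names and versions below are illustrative, not the linked repo's exact contents):

```
# requirements.txt -- only the ~8 direct dependencies, edited by humans:
-c constraints.txt
flask
celery

# constraints.txt -- full `pip freeze` output, pinning every sub-dependency:
flask==1.1.2
werkzeug==1.0.1
celery==5.0.5
vine==5.0.0
```

`pip install -r requirements.txt` then resolves the short list while being constrained to the frozen versions, so builds stay reproducible without hand-maintaining 100 pins.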
Also, Docker is not a universal and secure solution. It works great as a “universal server executable format and/or programming environment” on Linux, but less so on Windows, macOS, and especially FreeBSD.
Yes, at some level it solves your problem, but it adds a lot of complexity that doesn't need to exist. You also now depend on someone in the middle, who takes effort to manage and might not do exactly what you want.
If you are developing open source software that people install on their machines, Docker is a terrible solution. In that case you should package it correctly and distribute it via pip, so people can easily install it on their systems.
Perl and Python are the only two examples of this to my knowledge: most open source languages do fine introducing breaking changes in major versions.
The question is why Perl and Python had such problems while for example NodeJS, PHP (comparable webserver scripting languages) have had no such issues.
I wonder is it anything to do with the areas they're used in (Python & Perl are popular local/cli scripting languages in addition to web—has Bash had similar version woes?), or is it purely that the changes they made were more significantly breaking than others'? That's probably true of Perl6/Raku at least.
Then the Python people did the same mistake. Beginners learned Python 3 and then had trouble with App Engine or some other platform. The most popular question in Python forums for many years was if someone should learn 2 or 3. Some probably just went with Go instead.
So in many of these cases the end user doesn't have a choice to use Python 3 until it's on offer.
And the vendor has usually integrated Python at a binary level into C code; that's why they provide a Python API.
The answer could even be "Red Hat Enterprise Linux 6"; consider that Python 2 is the default in this OS, which ended official support only at the end of last year. Many enterprises _chose_ this platform for its longevity, along with 3rd party vendors of commercial software.
Painfully wrong. They tried to deprecate it at least twice; it was so bad a decision that they had to backtrack.
In fact Python 3 is hard proof that you cannot expect users to upgrade when you abandon ship and give them no smooth way to upgrade.
With Node 0.12 though I don't see it. IOJS was a pretty momentary internal political issue that many users didn't even register on their radars. It certainly didn't have any long-lived impact on version adoption within the community.
And: the important point, they've all had very successful major bumps since. So even if there are pains, they can be overcome. There's nothing fundamentally un-doable about major version releases for open-source languages.
PHP7 on the other hand was a pretty seamless migration from PHP5, and PHP8 looks likely to be similar.
General point is the original commenter was posing this as some fundamental issue of open source languages: clearly there's plenty of examples of success, so it can't be.
This won't change that at all; pip/pip3 is a distro packaging thing, and any distro that packages legacy python2 as “python" and Python 3 as “python3” will probably continue packaging legacy pip-for-python2 as “pip” and pip-for-python3 as “pip3”.
$ python --version
IIRC the language known as "Perl", version 5, when it gets to the point where it is going to update its major version, will skip 6 and go right to Perl 7.
But there was a sort of broken promise given by the Python creators: Python3 was almost like Python2, but every library author had to review and repackage their libraries, anyway.
At that point, Python 3 should have been unambiguously incompatible with Python 2:
- the only allowed file extension should be py3
- all environment variables should have been duplicated with a "3" (it shouldn't read or modify Python 2 env vars)
- all installation folders should have been duplicated with "3"
- all tools like pip should be suffixed with "3"
- and most importantly, it shouldn't try to optimistically run previous Python2 code or previous v2 tools
Still today, you can run "python" in a recent version of Ubuntu or Fedora, and it will be Python 3. Only "python3" should be possible. Distros are repeating the same mistake as with Python 2, and we will struggle again with Python 4, if there ever is a Python 4.
Many headaches wouldn't have happened if "python" was reserved to Python 2.
Pro tip to language and distro maintainers : make the major version part of the language name and executable, from version 1.
That would have made cross-version codebases impossible, and that's what ultimately allowed migrating. One-shot migrations were not convenient, or successful, or even effectively feasible for complex enough projects.
What allowed the migration was community experiment in cross-version sources, as well as reintroduction of "compatibility" features into Python 3.
And if you wanted to try that yourself anyway, changing all .py to .py3 in a directory is one unix command... It could easily have been part of a 2to3 tool
I was personally and solely responsible for migrating a >250kLOC project from Python 2 to Python 3, doing so without cross-version compatibility would not have been feasible. We literally picked the earliest P3 version we decided to support based on cross-compatibility features.
- the codebase had to be modified to work on both versions at the same time
- you had to maintain two versions in different branches for a while
Is there any data showing which option was most often chosen among all PyPI packages? I suspect that the second option was more popular for the most important packages of the ecosystem.
six was the #2 package on PyPI. It's a library to help with option 1.
You're 100% wrong. The few packages which decided on option 2 early on (e.g. dateutil) ended up having to roll back to option 1 because it was such a pain in the ass, both for the maintainers and for downstream users. The migration only really started happening once 2.7 shipped, projects like Six started appearing, and the community started ignoring 2to3 and building up experience with cross-version projects and idioms.
 whose entire point is to help with option 1
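The cross-version idioms in question mostly look like hand-rolled versions of Six's helpers; this sketch mirrors what six.text_type and six.iteritems provide:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode            # noqa: F821 (name only exists on Python 2)
    def iteritems(d):
        return d.iteritems()       # Python 2 lazy iterator
else:
    text_type = str
    def iteritems(d):
        return iter(d.items())     # Python 3 equivalent

# Call sites are then identical under either interpreter:
for key, value in iteritems({"answer": 42}):
    assert isinstance(key, text_type)
```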
This. This is the only option.
I can tell you, in decades of experience with I don't know how many companies and languages, this is ALWAYS the option that is taken.
In your shell you don't run "java8" or "java11", you just run "java", and then it's a matter of which version of the Java JDK you have in your PATH. The same goes for all other language interpreters and compilers: you don't run gcc9 or node14. Why do something different for Python?
Really, the mistake of Python 3 was to break compatibility with past programs. A lot of the changes could have been made more gradually: at first requiring a __future__ import, then gradually deprecating the old features, and finally removing them completely, making the new way the default and thus no longer requiring the __future__ import. I think that will be the way for the next Python version, so in theory we will never have the same problem again.
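That opt-in mechanism already existed for individual features; `from __future__ import division`, for example, gives a Python 2 module the 3.x division semantics:

```python
from __future__ import division, print_function  # no-ops on Python 3

# With the future import, / is true division even on Python 2:
print(1 / 2)    # 0.5 (bare Python 2 would print 0)
print(1 // 2)   # 0   (floor division is spelled explicitly on both)
```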
Also, to me it was an error for distros to continue packaging Python 2 as python. Other distributions, like Arch Linux, switched everything to Python 3 as the default a long time ago; it's only Ubuntu that continued to ship Python 2 as python, thus leading a lot of programmers to rely on it. It would make sense for the command without a version suffix to refer to the latest version, not the legacy one.
That's not the case with C# or Java.
I think Python's failure is unique. It needlessly broke too much back-compat at once, provided too little benefits to make up for it, and let everyone drag their feet for a decade with the upgrade.
"We have nearly endless money and therefore we can add any new feature by treating older versions of our language as bytecode and writing transpilers for it."
However, I do not disagree. Renaming is the right thing. A version number is easily omitted.
Compare this to Python 2/3. It's basically an incompatible fork that doesn't add enough for many projects to consider upgrading, and it adds the overhead of having to worry about two versions. All it really accomplished was to guarantee that "Python 4" will never, EVER be a thing.
The most that was ever promised about the 5->6 transition was that there'd be ways of using 5 modules in 6 (which more or less works for 'pure-perl' 5 modules, within reason).
As Larry put it the day he announced it:
> It is our belief that if Perl culture is designed right, Perl will be able to evolve into the language we need 20 years from now. It’s also our belief that only a radical rethinking of both the Perl language and its implementation can energize the community in the long run.
> (which more or less works for 'pure-perl' 5 modules, within reason)
Are you misunderstanding what has been achieved?
Using the "Best First" view of replies to the 2008 PerlMonks question What defines "Pure Perl"?
> "Pure Perl" refers to not using the C-extensions ("XS") and thus, not requiring a working C compiler setup.
Inline::Perl5 lets Raku code use Perl modules as if they were Raku modules, XS or pure perl.
Not all of course. Some make no sense whatsoever in Raku (eg source filters). Some don't yet work but could if someone cared to deal with issues that arise. But if you're thinking that Rakudo only imports pure perl modules for the above definition of "pure perl", please know that Rakudo is light years ahead of that due to niner's amazing work. And if you mean some other definition of "pure perl" it would help me if you shared it. :)
I want to know which distro removed perl because that's quite a drastic step and am interested in studying it. Sorry if that offends you.
'look at how perl did things!' is just a really strange approach in a discussion about the Python 2/3 thing. That operation was successful, the patient died.
Biggest of them all. I can't find the announcement.
The only distro I'm aware of which tries to protect users shooting themselves in the foot with pip is Gentoo. Most of the others will happily let you "sudo pip install" stuff and lead people to think that's the correct way to do things.
Unfortunately pipx has come too late. Pipx provides a real way for users to install arbitrary python tools, but too many docs out there tell users to use pip. Then you've got all the users who want to "play around" with python and install libraries. Even things like jupyter have crap support for virtualenvs and make it far too easy for users to have all their projects in a single env. It's a mess.
Regular users should never have been exposed to pip2/pip3. They should never even have been interacting with the OS Python interpreter. Pip should only exist in a "global" context to support bootstrapping virtualenvs, and nothing else. Poetry does a lot of this right.
Not sure what any of this has to do with open source languages, mind clarifying?
if you step back and think about it, using the same word for two fundamentally irreconcilable things is mad.