1. Unicode support was actually an anti-feature for most existing code. If you're writing a simple script, you prefer 'garbage-in, garbage-out' strings to scattering casts everywhere and watching the script randomly explode when an invalid byte sneaks in. If you did have a big user-facing application that cared about unicode, then the conversion was incredibly painful for you, because you were a real user of the old style.
2. Minor nice-to-haves like print-function, float division, and lazy ranges just hide landmines in the conversion while providing minimal benefit.
In the latest py3 versions we've finally gotten some sugar to tempt people over: asyncio, f-strings, dataclasses, and type annotations. Still not exactly compelling, but at least something to encourage the average Joe to put in all the effort.
Actually that's the behavior of Python 2: it works fine until you send invalid characters, and then it blows up.
In Python 3 it always blows up when you mix bytes with text, so you can catch the issue early on.
> In the latest py3 versions we've finally gotten some sugar to tempt people over: asyncio, f-strings, dataclasses, and type annotations. Still not exactly compelling, but at least something to encourage the average Joe to put in all the effort.
That's because until 2015, all Python 2.7 features came from Python 3. Python 2.7 was basically Python 3 without the incompatible changes. After they stopped backporting features in 2015, Python 3 suddenly started looking more attractive.
> In Python 3 it always blows up when you mix bytes with text, so you can catch the issue early on.
Sometimes you don't care about weird characters being printed as weird things. In Python 2 it works fine: you receive garbage, you pass garbage. In Python 3 it shuts down your application with a backtrace.
Dealing with this was one of my first Python experiences, and it was very frustrating, because I realized that simply using #!/usr/bin/python2 would solve my problem, but people wanted python3 just because it was fancier. So we played a lot of whack-a-mole to make it not explode regardless of the input. And the documentation was particularly horrible in that regard; not even experienced Python users knew how to deal with it properly.
You run your Python 2 code on Python 3 and it fails. At that point most people will put an encode() or decode() at the place where the failure occurs, when the correct fix would be to encode/decode at the I/O boundary (writing to files, network, etc.; and in Python 3 even that is not needed if you open files in text mode).
Ironically, Python 2 code that doesn't use unicode is easier to port.
When you program in Python 3 from the start, it's very rare to need to encode/decode strings. You only do that when working at the I/O level.
> And the documentation was particularly horrible in that regard; not even experienced Python users knew how to deal with it properly.
Because it's not really Python-specific knowledge. It's really about understanding what Unicode is, what bytes are, and when to use each.
The general practice is to keep everything as text, and do the conversion only when doing I/O. Think of unicode/text as a representation of text, the way you think of a picture or a sound. Just like images and audio, text can be encoded as bytes. Once it is bytes, it can be transmitted over the network, written to a file, etc. When you read the data back, you need to decode it into text again.
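A minimal sketch of that decode-at-the-edges pattern (the function name is made up for illustration):

```python
def greet_from_request(raw):
    # Decode once at the input boundary; everything inside is text.
    name = raw.decode("utf-8")
    reply = "Hello, " + name.upper()
    # Encode once at the output boundary.
    return reply.encode("utf-8")
```

Everything between the two boundary calls works on str, so no other part of the program ever needs to know an encoding exists.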
This is what Python 3 does:
- by default all strings are of type str, which is unicode
- bytes are meant for binary data
- you can open files in text or binary mode; if you open in text mode, the encoding and decoding happen for you
- for socket communication, you need to convert strings to bytes and back yourself
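A quick illustration of those rules in Python 3:

```python
s = "héllo"            # str: unicode text
b = s.encode("utf-8")  # bytes: binary data

# Mixing the two is a TypeError, so the bug surfaces immediately:
try:
    _ = s + b
except TypeError as exc:
    print("boom:", exc)

# Round-tripping through an explicit encoding is lossless:
assert b.decode("utf-8") == s
```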
Python 2 is a tire fire in this area:
- text is bytes
- text also can be unicode (so two ways to represent the same thing)
- binary data can also be text
- I/O accepts text/bytes, no conversion happening
- a lot of (most? all?) the stdlib actually expects str/bytes as input and output
- the cherry on top is that Python 2 also implicitly converts between unicode and str, so you can do crazy things like my_string.encode().encode() or my_string.decode()
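In Python 3, that double encode dies on the first chained call, because str.encode() returns bytes and bytes has no encode() method to feed an implicit conversion:

```python
my_string = "héllo"
try:
    my_string.encode("utf-8").encode("utf-8")
except AttributeError as exc:
    # bytes objects have no encode() in Python 3
    print("no implicit conversion:", exc)
```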
So now you get Python 2 code where someone wanted to be correct (it is actually quite hard to do, mainly because of the implicit conversion), so the existing code has plenty of encode() and decode() calls, because some functions expect str and some expect unicode.
Different functions might then hand you either bytes or unicode as a string.
Now you take such code and try to move it to Python 3, which no longer has the implicit conversion and throws an error when it expected text and got bytes, or vice versa. str is now unicode, the unicode type no longer exists, and bytes is no longer the same thing as str. So your code blows up.
Most people see an error and add encode() or decode(), often just trying whichever one works (like what you were removing), when the proper fix would actually be removing the encode()s and decode()s in other places in the code.
It's quite a difficult task when your code base is big, which is why Guido put a lot of effort into type annotations and mypy. One of their intended benefits is to help with exactly these issues.
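For instance, annotating a boundary function lets mypy flag a stray bytes value before runtime (a hypothetical sketch; `log_line` is made up for illustration):

```python
def log_line(message: str) -> bytes:
    # mypy rejects log_line(b"raw") statically, catching the
    # str/bytes mix-up before it ever runs.
    return message.encode("utf-8") + b"\n"
```

Running mypy over code that passes bytes here reports a type error, which is exactly the class of bug the 2-to-3 migration exposes at runtime.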
Native English speakers are usually the ones blissfully unaware of it, because ASCII just happens to cover all their usual inputs. But as soon as you have so much as an umlaut, surprise! And there are plenty of ways to end up with a unicode string floating around even in Python 2 - JSON, for example. And then it ends up in some place like a+b, and you get an implicit conversion.
With 2 vs 3, it's easiest to write your code for Python 3 and then, on 2, import everything you can from the __future__ module, including unicode_literals. Even that is not always enough and you might still need extra work. In Python 3, subprocess calls take an encoding argument that can do the conversion for you, which doesn't appear to be available in Python 2, so you probably shouldn't rely on it; treat all input/output as bytes instead (i.e. call encode() when sending data to stdin, and decode() on what you get back from stdout and stderr).
Perhaps that will be enough for your case, although many things are hard to get right in Python 2 even when you know what you should do, because of the implicit conversion.
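A sketch of the bytes-in/bytes-out approach with subprocess (assumes a Unix `tr` on PATH; the `encoding=` keyword mentioned above is the Python 3.6+ alternative that makes the streams text):

```python
import subprocess

# Treat the child's streams as bytes and encode/decode explicitly;
# this style also maps onto Python 2's subprocess.Popen.
result = subprocess.run(
    ["tr", "a-z", "A-Z"],
    input="héllo".encode("utf-8"),
    stdout=subprocess.PIPE,
)
# tr only touches ASCII bytes, so the utf-8 'é' passes through intact.
text = result.stdout.decode("utf-8")
```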
Edit: this also might be useful: https://unicodebook.readthedocs.io/good_practices.html
Also this could help: https://unicodebook.readthedocs.io/programming_languages.htm...
This is definitely the case. I've been wrestling with bytes and strings all the time during the port of a Django application to Python 3 for a customer. I can see myself encoding and decoding response bodies and JSON for the time being. For reasons I didn't investigate, I don't have to do that with projects in Ruby and Elixir. It seems everything is a string there, and yet they work.
Perhaps there’s something about a port that requires encoding/decoding bytes/strings?
Ironically, when your Python 2 app doesn't care about unicode, porting it to Python 3 is actually much easier.
If you write code in Python 3 from the start, you rarely need to use encode() and decode(). Typically what you want is text, not bytes.
The exception might be places where you serialize, i.e. I/O (network or files; although even files are converted on the fly unless you open them in binary mode).
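That on-the-fly conversion for files, sketched with a temp file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode: you hand Python a str; encoding happens for you.
with open(path, "w", encoding="utf-8") as f:
    f.write("naïve")

# Text mode read: the bytes on disk are decoded back into a str.
with open(path, encoding="utf-8") as f:
    assert f.read() == "naïve"

# Binary mode: you see the raw encoded bytes, untouched.
with open(path, "rb") as f:
    assert f.read() == "naïve".encode("utf-8")
```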
Example, I just had to write this
This is the only language where I have to explicitly deal with encodings at such low level. I don't feel like I want to use it for my pet projects.
Of course urllib could have a text() method that would do this conversion for you, but then urllib is not requests. It never was user friendly.
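The decode step urllib makes you write by hand looks roughly like this (`fetch_text` and `decode_body` are hypothetical helpers, not a real urllib API):

```python
from urllib.request import urlopen

def decode_body(body, charset):
    # The part requests' response.text does for you: bytes -> str.
    return body.decode(charset or "utf-8")

def fetch_text(url):
    # What a urllib .text() convenience could look like.
    with urlopen(url) as resp:
        return decode_body(resp.read(), resp.headers.get_content_charset())
```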
Edit: personally I use aiohttp, the interface is much nicer: https://aiohttp.readthedocs.io/en/stable/client_reference.ht... if I can't use asyncio then would use requests.
Not that I've seen.
Example of where Python 3 has rained shit on my parade: I wrote a program that backs up files on Linux. It works fine in Python 2, but in Python 3 you rapidly learn you must treat filenames as bytes, otherwise your backup program blows up on valid Linux filenames. It's not just decoding errors; it's worse. Because unicode doesn't have a unique encoding for each string, the round trip (binary -> string -> binary) is not guaranteed to give you back the same binary. If you make the mistake of taking that route (which Python 3 does by default), then one day Python 3 will tell you it can't open a file you os.listdir()'d microseconds ago and can clearly see is still there.
Later, you get some sort of error when handling one of those filenames, so you sys.stderr.write('%s: this file has an error' % (filename,)). That worked just fine in Python 2, but in Python 3 it generates crappy-looking error messages even for good filenames. And you can't just decode the filename to a string, because that might raise a coding error.
This works: sys.stderr.buffer.write(b'%b: this file has an error' % (filename,)), but then you find you've interpolated other strings into your error messages, and soon the only "sane" thing to do is to convert every string in your program to bytes. Other solutions, like sys.stderr.write('%s: this file has an error' % (filename.decode(errors='ignore'),)), corrupt the filename the user sees, are verbose, and worst of all, if you forget one it isn't caught by unit tests but will still cause your program to blow up in rare instances.
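For what it's worth, Python 3 does ship an escape hatch for this: os.fsdecode()/os.fsencode() use the surrogateescape error handler, so undecodable filename bytes survive the round trip (sketched assuming a typical UTF-8 Linux setup):

```python
import os

# b"\xff" is a legal Linux filename byte but invalid UTF-8.
raw = b"report\xff.txt"

name = os.fsdecode(raw)          # smuggles the bad byte in as a surrogate
assert os.fsencode(name) == raw  # lossless round trip back to bytes
```

It doesn't solve the formatting-for-humans problem the parent describes, but it does make binary -> str -> binary safe for paths.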
I realise that for people who live in a land of clearly delineated text and binary, such as the django user posting here, these issues never arise and the clear delineation between text and bytes is a bonus. But people who use Python 2 as a better bash scripting language than bash don't live in that world. For them Python 2 was a better scripting language than bash, but it is being deprecated in favour of Python 3, which is actually more fragile than bash for their use case. (That's a pretty impressive "accomplishment".) Perhaps they will go back to Perl or something, because as it stands, Python 3 isn't a good replacement.
This! IMO Python 2 has better usability for prototyping and thinking and doing things on the fly. Python 3 also often seems to have deprecated the functions I want to use in favor of those that are more cumbersome and take more keystrokes. More explicit sure, but less fluid.
When Python was created, Unicode didn't even exist.
Anyway, in Python 3 many os functions accept both str and bytes, and behave differently depending on which you pass. For example, if you pass os.walk a path as a byte string, it will output paths as bytes.
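The same bytes-in/bytes-out contract holds for os.listdir:

```python
import os

# str path in -> str names out
assert all(isinstance(n, str) for n in os.listdir("."))

# bytes path in -> bytes names out, no decoding attempted
assert all(isinstance(n, bytes) for n in os.listdir(b"."))
```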
Not always. As far as I can tell, writing garbage bytes to various APIs works fine unless they explicitly try to handle encoding issues. The first time I noticed encoding issues in my code was when writing an XML structure failed on Windows, all because of an umlaut in an error message I couldn't care less about. The solution was to simply strip any non-ASCII character from the string; not a nice or clean solution, but the issue wasn't worth more effort.
That is nice if your job involves dealing with unicode issues. My job doesn't, any time I have to deal with it despite that is time wasted.
"Dealing with unicode" is really just about dealing with it at the input/output boundaries (and even then libraries handle it most of the time). But without the clear delineation that Python 3 provides, when you _do_ hit some issue you probably insert a "fix" in the wrong space. Leading to the classic Py2 "I just call decode 1000 times on the same string because I've lost track"
The text I'm interested in follows company-set naming schemes, which means all English and ASCII. The rest could be random bytes for all I care.
Many formats, like plain text or zip, don't have a fixed encoding, and I am not going to start guessing which one it is for every file I have to read; there is no way to do that correctly. Dealing with that mess is exactly what I want to avoid.
This is a lot of old code, and it's all ASCII, no matter what the locale of the system is. And even if the code was updated, all the messages would still be in some text == bytes encoding, because there's no "user data" involved, and the throughput desired is in many gigabytes of text processed per second.
So yeah, unicode is not "everywhere": it may be everywhere on the public internet, but there is a world beyond this.
So you can throw in your emoji and they might not show up correctly in the XML logging metadata I write, because I don't care. But they will end up in the processed file the same way they came in, instead of as <?> or some random Chinese or Japanese symbol that a guessing algorithm thought appropriate.
Also, there's no guessing happening in this instance: the locale configured in your environment variables is used when you open files in text mode.
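You can see which encoding that is: when no encoding argument is given, open() in text mode defaults to locale.getpreferredencoding(False):

```python
import locale
import os
import tempfile

default = locale.getpreferredencoding(False)

path = os.path.join(tempfile.mkdtemp(), "t.txt")
with open(path, "w") as f:  # no encoding= -> locale-derived default
    assert f.encoding.lower() == default.lower()
```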
The only reasonable scenario I can think of is when you are porting python 2 code to python 3 and play with .decode() and .encode().
We're talking about simple scripts, the solution is to not send in invalid characters.
Personally, asyncio and type annotations are a big turnoff. I know this is a bit contrarian, but I've always favored the greenlet/gevent approach to cooperative multi-tasking. Asyncio (née twisted) had a large number of detractors, but now that the red/blue function approach has been blessed, it seems like many are just swallowing their bile and using it.
Type annotations really chafe because they seem so unpythonic. I like Python for its dynamism and for clean, simple code. Type annotations feel like an alien invader and make code much more tedious to read. If I want static typing, I'll use a statically typed language.
No one wants to spend energy re-programming to stay in place.
Python 3 came out in 2008, so say no backported features after 2009 and no bug fixes after 2012 - all announced in 2008, of course.
Given 4 years to migrate most would have made the jump sooner.
Once again, how can you ask/require users to expend precious limited energy to re-program just to stay in place? It's totally obnoxious and completely unnecessary.
This is exactly backwards of reality. It's as if they were eating at someone's home and had turned a cup of coffee into a week-long stay, during which they rudely complained when the host asked them to please do something about their pile of dishes, trash, laundry, and leavings.
Nobody is, after all, taking away your version of Python 2 or your ability to use and maintain it. It takes active effort to keep fixing bugs in software that may be network facing. If you want to do that maintenance you can, of course, but it seems people aren't going to be doing this for Python 2 forever. If you disagree, either take up the reins or pool your funds to pay someone to do it.
The thing to do back in 2008 was to figure out when you wanted to switch and schedule a bit of time to learn python 3. Anyone who did this by oh 2009 or 2010 would have virtually no work to do now. Any work that has been created since based on something you were told 11 years ago was going away is most assuredly work that you have created for yourself and will be obliged to take up.
Anyone who did this in 2014 would have a decade of runway before they could no longer run their Python 2 apps on RHEL/CentOS. Anyone who switches TODAY, 11 years late to the party, can run Python 2 on Red Hat for another 4 years.
It would be more work to do otherwise. Nobody wants to do that work. You don't and they don't.
1. it's free so it's OK to be user-hostile
2. if you don't like the direction, just fork/fix it
You don't have to fork it to fix it personally. You may also consider putting your money where your mouth is and organizing an effort to fund the change you want to see in the world. If you succeed, the world will have additional value it wouldn't have otherwise, and will owe you kudos. Everyone likes options. If you fail, you ought to move on; you have no basis for complaint. I think this is informative.
Open Source is Not About You
From where do you derive the requirement to graciously work for free to serve your ends?
If anyone can call anything anything, then how is it even possible for the consumer to make intelligent choices? Requiring it to be called something else allows your consumers to make an informed choice about using it, rather than letting you incorrectly trade on the official project's reputation. Of course, YOU might merely want someone to competently maintain Python 2.
Others might opt to do so badly and thus damage the actual Python brand. Worse yet, others might make changes that serve their nefarious needs, like folding in ads or data collection. Without a defining line between official and unofficial, how do we prevent that?
Call it cobra and brand it python's cooler cousin if you like.
- run 2to3
- spend 2h max fixing any failing tests
- shake out any remaining issues in a few days of beta testing, like you'd do for any new release
Then you wouldn't have much to port.
They've been porting hg to Python 3 for the last 10 years and are only now nearing completion.
I've written a bit more about this on Lobsters:
Even taking into account the fact that new features were still being added and not all focus was on porting, this doesn't really seem like a reasonable representation of what's going on; I have a suspicion that "10 years" of porting here does not entail nearly as much work as it seems.
The average few-hundred to few-thousand LOC app, which covers maybe 98% of production code-bases, will almost certainly port with no issue.
In python2, that's trivial. Whatever system you're on would normally be configured so that filename bytes dumped to the terminal display correctly, so you could just treat the strings as bytes and it would be fine.
In python3, it was a nightmare. No, you could not just decode from/encode to UTF-8, even if that was what your system used! Python had its own idea of what the encoding of the terminal was, and if you used the wrong one, it wouldn't let you print. And if you tried to convert from UTF-8 to whatever it thought the terminal was using, that would also break, because not all characters were representable. And your script couldn't just tell Python to treat the terminal as UTF-8, either; you had to start digging into locale settings, and if you tried to fix those, then _everything else_ would break, and nobody had any idea what the right thing to do was, because you were using an obscure OS (the latest macOS at the time).
I assume that it works better now.
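It did get better: since 3.7 there's sys.stdout.reconfigure(encoding="utf-8"), and the same explicit control can be demonstrated by wrapping any byte stream yourself:

```python
import io

buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="utf-8")

# Whatever encoding Python guessed from the locale no longer
# matters: the wrapper encodes with the one you chose.
out.write("héllo\n")
out.flush()
assert buf.getvalue() == "héllo\n".encode("utf-8")
```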
What about codebases with python2 third party dependencies that don't work in python3? Now you have to port that entire library as well, or write it yourself while crossing your fingers that it is well documented and easy to work through.
What about codebases without decent test suites? I'd argue most production codebases don't have good test suites, or at least the most complex of code is usually poorly tested. You'll end up spending most of your time digging for regressions especially if your code creates large amounts of user interfaces.
What about code bases that were written by scientists, mathematicians, or other professionals who may not be as fluent in writing "good" code?
There are almost no relevant 3rd-party libs that have not been ported at this stage. If they haven't been, they have probably been abandoned and the client codebase has bigger issues. Same for untested code bases and 'unprofessional' Python production code. That's hardly Python's fault.
The only real killer feature of Python3 is the async programming model. Unfortunately, the standard library version is numbingly complex. (Curio is far easier to follow, but doesn't appear to have a future.)
On the down side, switching to Unicode strings is a major hurdle. It mostly "just works", but when it doesn't, it can be difficult to see what's going on. Probably most programmers don't really understand all of the ins and outs. And on top of that, you get weird bugs like this one, which apparently is simply never going to be fixed.
The model is similar to Golang in many ways, e.g. communication using channels and cancellation reminiscent of context.WithTimeout, except that in Golang you need to reify the context passing.
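For comparison, the context.WithTimeout analogue in stdlib asyncio is asyncio.wait_for, which cancels the task for you instead of making you thread a context through every call (a rough analogue, not Curio itself):

```python
import asyncio

async def slow_worker():
    await asyncio.sleep(10)
    return "done"

async def main():
    try:
        # Cancellation propagates into slow_worker automatically.
        return await asyncio.wait_for(slow_worker(), timeout=0.01)
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(main())
print(result)
```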
The author has written some insightful commentary on designing async runtimes and is actively developing the library, so I'm optimistic about its future. There were plans to use it for requests v3 until the fundraiser fiasco.
The link to support requests (which is a great piece of software) is here:
Note: This is NOT a charitable donation, it is a gift to an individual. These are not tax deductible under US law.
Njs has a long attacking blog post saying this needed to go through the PSF (huh?) and that they should be getting most of this money, not the person the funds were directed towards (it's not clear how much they've actually contributed to requests over time). This supposedly also may trigger folks who have suffered from "gaslighting".
Supporting the developer of a piece of software does not, as far as I know, require that they sign up to handle it on a charitable basis. A big to-do is made about the "large" amount raised. The amount is $33K. To be frank, that is almost zero in tech land, at least in the Bay Area, and requests is a very highly used project. I was literally expecting something like $300K or even $1M - silly Kickstarter projects raise far more and deliver nothing. Requests has already delivered a lot of utility.
Just a bit of perspective from someone who wasn't familiar with this "fiasco".
The money was raised specifically to support development of requests 3
> [Reitz] announced that work had begun on "Requests 3", that its headline feature would be the native async/await support I was working on, and that he was seeking donations to make this happen.
It's not so much that PSF needed to be used, as that there needed to be some accountability as to how those funds were used.
> [Reitz] chose a fundraiser structure that avoids standard accountability mechanisms he was familiar with. He never had any plan or capability to deliver what he promised. And when I offered a way for him to do it anyway, he gave me some bafflegab about how expensive it is to write docs. Effectively, his public promises about how he would use the Requests 3 money were lies from start to finish, and he hasn't shown any remorse or even understanding that this is a problem.
It sounds like a great deal of the work being done on requests is done by volunteers but the funding only goes to support Reitz
> I think a lot of people don't realize how little Reitz actually has to do with Requests development. For many years now, actual maintenance has been done almost exclusively by other volunteers. If you look at the maintainers list on PyPI, you'll see he doesn't have PyPI rights to his own project, because he kept breaking stuff, so the real maintainers insisted on revoking his access. If you clone the Requests git repo, you can run git log requests/ to see a list of every time someone changed the library's source code, either directly or by merging someone else's pull request. The last time Reitz did either was in May 2017, when he made some whitespace cleanups.
The issue is not so much that money is being made, but the way that it is done and the lack of accountability
> I don't have any objection to trying to make money from open-source. I've written before about how open-source doesn't get nearly enough investment. I do object to exploiting volunteers, driving out community members, and lying to funders and the broader community. Reitz has a consistent history of doing all these things.
NOWHERE that I can see did Reitz say he would hire NJS to do some work. Is Reitz even set up properly to report taxes or withhold taxes on amounts paid to NJS, check paperwork for work eligibility, etc.? Would this even be allowable if requests is not a business, or would deductions be disallowed as non-business (i.e., a payment to NJS potentially subject to tax for both Reitz and NJS)?
If you as a donor want full charitable compliance via the PSF, you would ask to give through them - and perhaps more would have been given if the PSF had been an option.
I don't even see NJS on the requests contributor list:
Finally, requests 3 has a number of features.
> And on a more personal level, I felt his interactions with me were extremely manipulative. I felt like he tried to exploit me, and that he tried to make me complicit in covering up his lies to protect his reputation. I was extremely uncomfortable with the idea of going along with this, but he created a situation where my only other options were to either give up on working on async entirely, or else to go public with the whole story, at potentially serious cost to myself.
> Ultimately, I decided to speak out because I care deeply about the Python community and its members. If one of our community's most prominent members freely lies to donors and harms volunteers, and if we all let that go without saying anything, then that puts everything we've built together at risk. And I'm in a better position than many to speak up.
The intent seems not to be trying to get people to blacklist or dogpile on Reitz, but to simply make people aware of the issues so they won't get caught off guard.
> This is the classic "missing stair" problem. Those in the inner circle quietly work around the toxic person. Outsiders come in blind. I'm pretty well-connected in the Python world, and I came in blind.
> Since this is the internet, I have to say explicitly: Please do not harass or abuse Reitz. That's never appropriate
The takeaway here appears to be "never work for free". If NJS had worked on his own project, controlled by him alone, this wouldn't have happened. If you donate a bunch of work to an open source project, then... well... the source is open.
And is it impossible that KR is doing an order of magnitude more work on the requests project, with a much longer track record, and raised the money directly himself (and so potentially has to pay taxes on it), such that using it to support his work on requests is not unreasonable?
Dropbox invested three years of work, actually hired Python's creator, and is still not done. What did they get out of it that they wouldn't have gotten if Python 2 had simply been maintained?
Who wants to break old SQL? Nobody.
But changing database vendors for a company can be a big deal, as bad as going from Python 2 to 3.
As a language, however, it's a whole lot of nonsense: extremely inconsistent syntax, a ternary logic stuffed into a boolean system, a standard that gets extended arbitrarily, and even the tooling ecosystem is a fair bit pathetic (the lack of formatters particularly annoys me; everyone tries to support SQL generically and ends up missing every extended feature - if it's not a simple select query/DDL, you're not getting decent formatted output).
And it most certainly is a whole lot of fuss to migrate, unless your database is tiny or you didn't actually use the DB as anything but a dumb datastore (e.g. you relied solely on your ORM + indexes). There's a reason no good translator exists, and those that do exist only support a very limited subset of any particular SQL variant, despite programming languages having a whole array of transpilers. It's simply not a simple language, and the variants only superficially look the same.
Would love to see your point in action. For me personally, speaking strictly of writing simple scripts, they translate fairly well. Regarding formatting, are you referring to the output?
Every couple of months there's a new startup / dev site that says "SQL is broken/old/bad, so we reinvented it!". They all sink without trace, but there's a cohort who agrees with them.
It's not clear to me why postgres hasn't simply grown a whole array of frontends..
However, the lack of GUI frontends is also really weird. I don't see why it'd be harder to support than any other DB, and afaik pg has gotten fairly popular..
Yes, it's expensive to upgrade from Python 2 to Python 3, but it's also expensive for the Python project to maintain 2 versions of Python indefinitely. If someone other than the core Python team wants to step up and maintain Python 2, they are free to do so; it's open source. But failing that, expecting the Python team to support the older/less functional version of the code indefinitely is unrealistic. Corporate-owned languages have even shorter lifecycles for exactly this reason.
And the alternative is cargo cult "newer is better".
>Yes, it's expensive to upgrade from Python 2 to Python 3, but it's also expensive for the Python project to maintain 2 versions of Python indefinitely.
On the other hand, they could have progressively enhanced a single backwards-compatible version 2. JS manages to do that just fine, as does Java...
How do you define "just fine"? It's taken us many years to migrate between ECMA versions, only to end up with multiple incompatible runtimes.
And JS "The Good Parts" is like 1/10 of the full language, so often it feels like a lot of pile-on.
>as does Java
How are them generics?
ECMAScript versions are forward-compatible. Any valid ECMAScript 3 code is also valid 5.1, 2015, 2016, etc.
I'm not sure what migration you're talking about. If you mean using new language features before your runtime targets support them, that's kinda on you. Even so, the ecosystem has tons of robust solutions for supporting legacy interpreters. Most notably, Babel does a wonderful job of transpiling to lower language-version targets.
Besides the total domination of the web programming space, which is of course aided by it being the only option:
1) Used by choice even for server and application development (where it was never the only option, and wasn't even preferable/viable before)
2) Fast pace of language development
3) A thriving package ecosystem with millions of packages
4) Adopted by all major companies
5) Increasingly adopted as an embedded scripting language in all kinds of apps
6) A viable gateway into both a native trans-language runtime (WebAssembly) and a typed version of the language (TypeScript)
>How are them generics?
They're doing great. Type erasure is not that big of a deal, and Java might even get reified generics with Valhalla eventually anyway. It's not a "backwards compatibility prevents this" issue (which is our topic here); it's a "no time devoted to it yet" issue.
Not that I'm implying that 'popularity' is a good measure of anything.
Python is popular largely based on the fact that it's so approachable. It is the BASIC/ VB of modern times for whatever that is worth. It does scale up to larger projects and is frequently used for big scale stuff, but I suspect the fact that it's so ubiquitous has more to do with the fact that it's also easy to pick up and for companies to find people with Python dev skills (or train them up).
You are moving the goalposts and ignoring the fact that Python3 still didn't deliver anything for most users.
When I upgrade to a new version of C# ... nothing happens.
Backwards compatibility is what made Microsoft the company it is. I think Python deserves all the crap it gets for 2 vs 3.
The question is rather whether it would have been better to gradually improve support for it in a Python2-esque way, rather than creating a discontinuity and a raft of new problems, some of which linger to this day.
Also, for many purposes, there is wide agreement that ASCII is still the way to go. Even if Americans vanished tomorrow, the majority of remaining programmers in the world would prefer to look at source code in English, which they already know, rather than a host of other languages, most of which they don't.
You do realize that C# was itself Microsoft's replacement for C++, right? And that when C# was released it had its own growing pains and a long roll-out, in spite of having the world's biggest corporation pushing it.
Python as a language is far older than C#, so it has a lot more baggage than C# does.
Of course it's a "cargo cult" when someone disagrees with you.
> On the other hand, they could progressively enhance upon a backwards compatible single 2 version. JS manages to do that just fine, as does Java...
Likewise, Java development is slowly being superseded by Kotlin. Java is a mess, there are often 3-4 ways to do simple things and many of them are just terrible for performance.
No, it's obviously "backwards thinking", right?
CoffeeScript was just adopted (and not that much in the first place) because it brought new syntax/features earlier. Now JS has been getting new syntax itself at a great pace and CoffeeScript just died off.
>Likewise, Java development is slowly being superseded by Kotlin. Java is a mess, there are often 3-4 ways to do simple things and many of them are just terrible for performance.
Python has 10x+ worse performance, and more than 3-4 ways to do simple things (from package management to basic libs), most of which are just terrible for performance.
Compared to that, nobody has had any problem with Java performance for 15+ years...
And Kotlin is still insignificant except in the Android space where it's pushed, so there's that. Java sees an order of magnitude more usage.
Common Lisp has a backwards compatibility which goes into decades, and implementations like SBCL had no difficulties at all to absorb Unicode.
Racket even supports different language standards and completely different languages (such as Scheme and Algol) running on the same runtime. And both SBCL and Racket are compiled languages with a high-end GC which should make such things more difficult than CPython, which is purely interpreted and has a simpler GC.
But the incompatibility between Python 2 and Python 3 is perhaps only a symptom of a larger problem. The Python developers have decided that backwards compatibility is not that important any more. This is not a problem for companies like Dropbox, or small start-ups, 95% of which will not even exist in five years. It is, however, a huge problem for domains like scientific computing, where most code has no maintainers and even for very important code there is no budget or staff for maintenance:
Exactly: and that was a wrong decision for anybody but the developers of Python.
Everybody else prefers having something that works: "The improvements are welcome, but please allow us to run our old programs too, thank you, and allow us to use that new feature only once we need it."
It's an obvious expectation. We would also hate a new version of a word processor which wouldn't open our old documents. Or a new version of Photoshop which wouldn't open our old pictures. Or a new version of the browser where only the newest web pages are visible.
It follows that it was absolutely technically possible to have a new version of Python in which the old programs still work. It's a failure of the developers that they didn't make one.
Compare that decision of theirs with the policy of Linus Torvalds who insists that the old user programs should never break on newer kernels.
"How long have you been a maintainer? And you still haven't learnt the first rule of kernel maintenance? If a change results in user programs breaking, it's a bug in the kernel. We never EVER blame the user programs."
Why do some python developers have to maintain installations of a whole handful of python versions just to ensure that their code is working? Why all the mess with pyenv, virtualenv, and so on? If the python developers, as well as the library developers would support backwards compatibility, this would not be necessary at all.
Exactly. The incompatibility mess continues when using different 3.x versions.
I think that's a tool selection problem, not just confined to the python world. If the language and libraries won't have a supported lifespan that matches with the maintenance budget of the projects using them then the wrong tool was chosen. If a project is expected to have a 10+ year lifespan of little to no maintenance then it needs to be built on languages/libraries that will have supported versions for that long.
You could say then, "well, then Python is perhaps just not a good match for those pesky scientists".
And this brings up two more points:
* A lot of important tools and libraries in the Python ecosystem were developed by scientists. Numarray/Numpy is a good example.
* If the core Python developers don't have the intention to maintain a backwards-compatible language version for more than, say, 15 years, they should perhaps clearly state on the python.org main page something like: "great, as you are a scientist, we welcome your contribution, but Python might not be suitable for tools that support long-term research".
Data science is doing just fine, in fact is leading the migration: https://www.jetbrains.com/research/python-developers-survey-...
"Data science" is a broad term but usually just means the application of numeric, and sometimes scientific, tools to commercial ends. It is almost always done in companies. Typically, between such companies there is no open exchange of tools and methods, no exchange of knowledge, and no long-term use of generated code. This is the reason why data science companies don't have the problems which Hinsen pointed out. But they could become affected by a degrading suitability of Python for computational science, because their tools were initially developed by scientists.
int max = new Max(10, 5).intValue();
In any case, that's not some counter-argument, even if it points to a real wart.
It's "let me throw a random Java wart, as if it means something, and as if other languages don't have their own warts".
I just don't think that's true.
This is a story about language upgrades. Those languages showed how to do it right.
At some point plain users hated Java applets and Java desktop apps, but those are not much of a thing anymore. In the server space, very few of those who use Java hate it, and millions use it.
Lots of people learn and use these languages because:
1 - That's what they're taught at school.
2 - That's where many if not most programming jobs are.
3 - There are a bazillion libraries they can use, compared to other languages.
4 - JS is built in to browsers.
5 - They don't know any better.
6 - Marketing.
Also, many companies want their staff to develop in these languages because of the reasons on this list plus that's what most programmers know, so it's relatively easy to find employees.
> Python's language was not solid (e.g., strings not unicode by default)
From the article:
"Python itself predates the first volume of the Unicode standard which came out in October 1991."
In any case, I think the jury is still out on whether Unicode in the primary string type is a good idea.
Python didn't get Unicode until later, so it had a chance to do it right - and it finally did, even on platforms like Windows where wchar_t is also 16-bit for historical reasons.
No, maintaining two versions of Python is much cheaper: it's only being done in one place, compared to the thousands and thousands of Python 2 code bases you'd have to convert.
It also only needs bug fixes, there are plenty of people/organisations out there that would be perfectly happy for the language to be unchanging.
It's extremely hard to keep compatibility with Python 2; many authors can't wait to drop support next year, and many already have.
Presumably any packages worth maintaining will have far more dependent projects, so it's still far less overall effort.
> It's extremely hard to keep compatibility with Python 2
So don't? I don't think most of the people dragging their feet on the upgrade need or even want new features. A stable python 2 branch with bug fixes and security patches would suffice for most and be ideal for many. Over time the bug fixes should trend to zero and there probably aren't a heap of security issues in python projects anyway.
Only if they name it something completely different from python or py-anything. Guido refuses to allow anyone to just step in to maintain py2.
Tauthon is a project that aims to keep compatibility with py2 while adding whatever features of py3 won't break py2, and to have a maintained py2.
Don't forget that that person/organization would not only have to maintain Python but also all the packages that you use.
I'm quite certain I'm not the only one.
I wasn't commenting on how buggy my own code is. Which version of a language I'm using doesn't really affect that variable.
But I guess you know this, and are OK with the compromises involved. I'll stop here ;)
This is true, and if we were talking about code that is exposed to the world at large, then my stance might be different. However, the projects that I've used Python for are not exposed in that way.
Note that I'm talking about personal projects, not work-related ones. At work, I use whatever is required.
About some of the design decisions in Guido's own words: http://python-history.blogspot.com/2009/02/adding-support-fo...
Sure there is: consistency. If member functions on an object that happens to be a class (i.e., methods) did magic transformations that member functions of other objects did not do, the mental overhead to understand Python code would be higher, and the ability to build custom general abstractions would be weaker.
It would perhaps make the simplest cases microscopically easier, but at the expense of making the already hard things more confusing and difficult to wrap with generalities.
Most statically-typed OOPLs don't have first-class classes that are just normal objects, so this isn't an issue for them: the things that first-class classes enable aren't available in the first place. Other dynamic languages may use models where methods aren't data members of classes. In Ruby, for example, methods are associated with classes, but not as instance members (which, in Ruby, wouldn't be externally accessible anyway); while Ruby classes are objects, the ability to hold methods is a special power of Ruby classes/modules compared to other objects, not just a matter of having instance members. This is one way Ruby's model is more complex than Python's, and it definitely bites you sometimes when seeking general solutions.
It's true that languages are abstractions, but not all abstractions are useful.
member this.foo(x, y) = ...
It's more elegant firstly because it follows use, and secondly because it means that "def foo(x)" has the same meaning both inside and outside of a class declaration - it's just a function, and there's nothing special about its first argument. As it is, we need stuff like @staticmethod and @classmethod. It's especially annoying when you have class attributes that happen to reference a function, because conversion from functions to methods is a runtime thing - so you have to remember to wrap those in staticmethod() as well.
You would need those even with your suggestion (except you could drop @staticmethod if you add another layer of magic so that methods declared without a leading identifier were assumed static; you'd still need @classmethod or some equivalent mechanism to distinguish which non-static methods were class vs instance methods.)
No it doesn't. Currently `def foo(self): pass` is called as `instance.foo()`. You're suggesting that `def self.foo(): pass` would be called as `instance.foo()`, except now it looks like self and instance are syntactically related in ways that they aren't.
> Again, "this" is just an identifier here, and doesn't have any special meaning.
But the grammar is no longer LL(1), and you have weird conditionally valid syntax, like `.` in a function name is valid only in a class block.
> "def foo(x)" has the same meaning both inside and outside of a class declaration
This is a stretch, especially since you're now optimizing for the uncommon case. Staticmethods are rare compared to instance methods (I'll go further and claim that static methods are an antipattern in python, modules are valid namespaces, you can stick a function in a module and couple it to a class also defined in that module and nothing bad will happen. Banning staticmethods entirely doesn't reduce expressiveness). Aligning staticmethods with functions, instead of aligning instance methods with functions (as python does currently) encourages you to do the wrong thing.
Your changes don't affect classmethod at all, if anything they'd make classmethod more of a special case. How do you signal that `self.foo()` takes `self` as the class instead of the instance?
> It's especially annoying when you have class attributes that happen to reference a function, because conversion from functions to methods is a runtime thing - so you have to remember to wrap those in staticmethod() as well.
What do you mean? Like
a = Foo.func()
a = func()
> because conversion from functions to methods is a runtime thing
I'd also quibble with this: it's a binding thing.
a = Foo.foo(None)
Specifically this means that if you can get your hands on the `method` constructor (like with `type(instance.method)`), you can then do silly things like
    class A:
        def foo(self): pass

    instance = A()

    def f(self): return 5

    # type(instance.foo) is the bound-method constructor; bind f to instance manually
    instance.func = type(instance.foo)(f, instance)
    assert instance.func() == 5
But also, there's no reason to make those legal only inside classes. All it needs to do is make "def foo.bar" produce a different type of function, that has the method descriptor-producing behavior that is currently implemented directly on regular functions.
As far as less vs more common case - I think it's more important to optimize for obviousness and consistency. If "def foo" is a function, it should always be a function, and functions should behave the same in all contexts. They currently don't - given class C and its instance I, C.f is not the same object as I.f, and only one of those two is what "def" actually produced.
What I meant by function references inside classes is this:
    class Foo:
        bar = lambda: 123

    foo = Foo()
    foo.bar()  # TypeError: <lambda>() takes 0 positional arguments but 1 was given

    class Foo:
        bar = staticmethod(lambda: 123)

    foo = Foo()
    foo.bar()  # 123
On the other hand, this only applies to objects of type "function", not all callables. So e.g. this is okay:
Foo.bar = functools.partial(lambda x: x, 123)
    import functools

    class Foo:
        def frob(self, x, y): ...
        frob_xyzzy = functools.partial(frob, x=1, y=2)
        frob_whammo = functools.partial(frob, x=3, y=4)

    foo = Foo()
    foo.frob(x=0, y=0)   # okay
    foo.frob_xyzzy()     # TypeError: frob() missing 1 required positional argument: 'self'
    foo.frob_whammo(foo) # okay!
Yes it is. LL(1) Grammars can still be recursive, they just can't change the parsing rules based on distant context.
> As far as less vs more common case - I think it's more important to optimize for obviousness and consistency
Yes, but having the "easiest" thing you do:
The rest of your comment complains about inconsistencies of how python converts various callables to methods. This is a fairly valid and interesting complaint, but has nothing to do with syntax, it is solely a semantic complaint that would be solved by having class creation treat all attributes that are callables as functions. In fact, you could customize class creation yourself this way using __new__, no syntactic changes required.
> as software grows more complex, the uncommon cases become common enough that you have to deal them regularly
While this is true, I think you vastly overestimate how common these constructs are. Like, you're in the realm of "this doesn't appear on github" levels of uncommon.
Personally, again, I think "you can't use partial() to define methods" is a very good thing: if you're doing this, you're into weird metaprogramming land. Its not any harder to, for example, write out
    class Foo:
        def frob(self, x, y): ...
        def frob_xyzzy(self): return self.frob(x=1, y=2)
        def frob_whammo(self): return self.frob(x=3, y=4)
1. You deserve what you get
2. You can invoke deeper magic to solve these problems
My complaint is of course more complicated than the syntax alone. I'm just saying that a distinctive syntax for self-as-receiver could be used to drive other changes that would make things more intuitive and self-consistent overall. And, of course, any discussion about "elegance" is going to be inherently subjective.
With respect to commonality of various constructs - this is all from personal experience writing Python code (for a project that is hosted on GitHub, by the way). It doesn't require particularly fancy code to trip that wire in general - it just requires code that tries to be generic, i.e. not make more assumptions that it needs to about the types of values that flow through it. Python classes break that genericity by treating functions, and only functions, in a special way whenever they flow through class attributes. This is particularly egregious in a language where every object is potentially callable, and non-function callables are very common; so functions really aren't all that special in general - except for that one case.
With partial() specifically, of course you can avoid that in this manner. But why would you, if it works with regular functions? I prefer it over defs, not just because it's more concise and avoids repeating things, but because it's also clearer - when you see partial(), it immediately tells you that it's a simple alias, nothing else.
But regardless, it's a function that has a certain documented behavior, and common sense would dictate that this behavior works the same for methods as well as functions. That it doesn't is not an intentional limitation of partial() - it's an unfortunate quirk of the design of methods themselves. And if you don't know exactly how functions become methods in Python, you wouldn't have any reason to expect the behavior that it exhibits. That's why the docs for partial() have to spell it out explicitly: "partial objects defined in classes behave like static methods and do not transform into bound methods during instance attribute look-up" - because that's not a reasonable default assumption!
The bigger problem is that every library that offers a generic wrapper callable has to add the same clause to its docs, because they're all affected in the same way. And if they don't document it, and you use, say, a decorator from a library - how do you know whether the fact that it returns a function and not some other callable is part of the contract that you can rely on, or an implementation detail? Conversely, whenever you implement a decorator, you have to be cognizant that changing the type of the return value from/to plain function can be a breaking change for your clients - and that is even less obvious.
Let's take the following code snippet as an example:
foo = val
> With partial() specifically, of course you can avoid that in this manner. But why would you, if it works with regular functions?
Simply: because I'd prefer it if functions look like functions. Understanding that `def x` is a callable is easier than trying to discern if `x = foo(other_thing)` results in x being callable or not, where it does for some values of `foo`, but not for others. Which isn't to say that python shouldn't make this change, I think I mostly agree with your complaint, I just probably wouldn't take advantage of it.
> My complaint is of course more complicated than the syntax alone.
To be frank, I don't see any connection between your syntactical suggestions and your semantic ones. They seem to be entirely orthogonal.
Thing.foo(thing_instance, bar, baz)
Not saying Python fell on the wrong side of that line, just that it's an easy line to end up on the wrong side of.
Actually Python is one of my favorite programming languages, probably the language with the closest mapping to how I naturally think about a problem. I really like it. But I'm also willing to admit it has some warts, as does any language.
If I were listing Python warts, I'd point to things like single-element tuples (1,), the `datetime` support (timezone-naive datetimes are an ambiguous disaster), or the cpython Global Interpreter Lock.
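The first two of those warts are easy to demonstrate; a quick sketch (variable names are made up for illustration):

```python
from datetime import datetime, timezone

# Single-element tuples need the trailing comma; parentheses alone don't do it.
t = (1,)
not_a_tuple = (1)
assert isinstance(t, tuple)
assert not_a_tuple == 1

# Naive vs aware datetimes look identical but can't even be compared;
# a naive datetime carries no timezone, so its meaning is ambiguous.
naive = datetime(2020, 1, 1, 12, 0)
aware = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
assert naive.tzinfo is None
assert aware.tzinfo is not None
```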
My editor supports snippets so moderate python boilerplate is not a problem.
This was done intentionally, because "Explicit is better than implicit". It also has some uses, e.g. if you want to do this:
    def method(self):
        def new_bar(self2):
            pass  # you can refer to both 'self' and 'self2' here
        other_object.bar = new_bar
Too bad Python breaks this "commandment" pretty much whenever it wants to.
The second is not true: if you add methods with a double underscore prefix (also a design decision) they do behave like private. The real attribute name is mangled (prefixed with the class name) so you won't get conflicts, and you can still access it for debugging purposes.
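A minimal sketch of that name mangling (the class and method names here are made up for illustration):

```python
class Secretive:
    def __hidden(self):           # double leading underscore: name gets mangled
        return "secret"

    def reveal(self):
        return self.__hidden()    # inside the class, the short name still works

s = Secretive()
# From outside, s.__hidden() raises AttributeError, but the mangled name -
# prefixed with the class name - remains reachable for debugging:
assert s.reveal() == "secret"
assert s._Secretive__hidden() == "secret"
```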
Perhaps more importantly, it makes even more sense in a language like Python, where classes are first-class objects (unlike in many, especially statically-typed, class-based OO languages), to do what Python does, because of the relation of methods to classes. It also, to me, makes unbound/bound methods slightly more intuitive.
Now, I too was initially thrown by it because I'd used a bunch of OO languages that did it the other way first, and for quite a long time.
Separate from that, I think that would be a much bigger breaking change than you think. __getattr__ and __getattribute__ mean that self.y doesn't necessarily refer to a traditional attribute, so without self you could end up with either:
1. `x = y` where the bare expression `y` executes code, or
2. an ambiguity between `x = y` (a plain local) and `x = self.y` (which really means `x = self.__getattribute__('y')`).
That said, I do think the language could use something to reduce the size of __init__() since
self.arg1 = arg1
self.arg2 = arg2
self.arg3 = arg3
can get pretty verbose
I don't understand this. Aside from people seeing an OOP for the first time in their life, are they getting confused about where "this" comes from?
How would an implicit "self" a la "this" make it any less understandable where the data come from?
How is passing self making "navigating an unfamiliar code base much easier"? Aside from total newbs who see an OO codebase with an implicit instance variable for the first time?
So instead of having:
this.x = y
x = y
arg3: int = 5 # default value
And if you need to compute some values at object initialization, then they have a __post_init__() hook you can use.
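A minimal sketch of that dataclass approach (field names are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Config:
    arg1: int
    arg2: int
    arg3: int = 5              # default value

    def __post_init__(self):
        # runs after the generated __init__, for computed values
        self.total = self.arg1 + self.arg2 + self.arg3

c = Config(1, 2)
assert (c.arg1, c.arg2, c.arg3) == (1, 2, 5)
assert c.total == 8
```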
Rust does the same thing - with self, it's an instance method, without, it's a static method.
Any function inside a class declaration becomes an instance method, with its first argument becoming the explicit receiver. You have to use @staticmethod to prevent that (or @classmethod to get the class as the first argument, instead of the instance).
Furthermore, this behavior is not parse-time, but runtime. The "def" statement that defines a function produces a plain function object, regardless of whether it's inside a class or not. Once the class finishes defining, the function is still a function - which is why C.f gives you a plain function that can be called by passing "self" explicitly as an argument.
However, Python allows objects to provide special behavior for themselves whenever they're retrieved via a member of some class - that is, when you write something like x.y, after y is retrieved from x, it gets the opportunity to peek at x, and substitute itself with something else. Function objects in Python (here I mean specifically the type of objects created with "def" and "lambda", not just any callable) use this feature to convert themselves to bound methods. So, when you say x.f, after retrieving f, f itself is asked if it wants to substitute something else in its place - and it returns a new callable object m, such that m(y) calls f(x, y). That's what makes x.f(y) work.
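That binding step can be observed directly; a small sketch (class and method names are made up):

```python
class C:
    def f(self, y):
        return y * 2

x = C()

# C.f is a plain function: the receiver must be passed explicitly.
assert C.f(x, 3) == 6

# Attribute access on the instance triggers the descriptor protocol:
# the function's __get__ returns a bound method m such that m(y) == f(x, y).
m = C.f.__get__(x, C)
assert m(3) == 6
assert x.f(3) == 6   # x.f(3) is sugar for the same binding step
```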
In fact, that's a key difference between Python and many other class-based OO languages.
I'm not sure whether that (and the associated need to make the receiver an explicit parameter in method definitions - conventionally, though not required, named "self") is your problem, or whether your problem is that (unlike in some, though fewer, OO languages) you can't omit explicitly naming the receiver when referring to its own instance variables in an instance method, which makes instance variable references syntactically distinct from local variable references.
How do you define enums? With a superclass.
How do you define data classes? With a class decorator.
And then there's metaclasses too.
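Each of those three mechanisms side by side, as a sketch (all names here are illustrative):

```python
from enum import Enum
from dataclasses import dataclass

class Color(Enum):          # enums: defined by inheriting a superclass
    RED = 1
    GREEN = 2

@dataclass                  # data classes: defined via a class decorator
class Point:
    x: int
    y: int

class Registry(type):       # metaclasses: hook into class creation itself
    classes = []
    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, ns)
        Registry.classes.append(name)
        return cls

class Plugin(metaclass=Registry):
    pass

assert Color.RED.value == 1
assert Point(1, 2).x == 1
assert "Plugin" in Registry.classes
```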
I understand that this is one of the major features, but I personally never saw the appeal, given that gevent exists and in my experience works well most of the time. It also allows me to multiplex IO operations and doesn't rely on new syntax. I'm probably missing something?
- mandatory keyword arguments
- multi-dict splatting
- nicer yield semantics for generators
- Fixing system-specific encoding ambiguities
- inline type annotations
- better metaclass support
- more introspection tooling
- pathlib (for nicer path handling)
- mocking pulled into the standard library in a cleaner way
- stable ABIs for extensions
- secrets handling
- ellipsis instead of pass (yeah who cares but I care)
- lots of standard lib API cleanup
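A few items from that list in one sketch (function and variable names are made up):

```python
from pathlib import Path

# mandatory keyword arguments: everything after the bare * must be named
def copy(src, dst, *, overwrite=False):
    return (src, dst, overwrite)

# multi-dict splatting: later dicts win on key collisions
defaults = {"color": "red", "size": 1}
overrides = {"size": 10}
merged = {**defaults, **overrides}
assert merged == {"color": "red", "size": 10}

# pathlib for nicer path handling
cfg = Path("/etc") / "app" / "config.toml"
assert cfg.suffix == ".toml"

# ellipsis instead of pass
def todo(): ...
```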
All of this is very helpful for making clean applications. But I would say it's _very_ helpful for making good libraries as well. This stuff is about having a strong language foundation to avoid plain weirdness like the click issue.
Obviously it doesn't kill all of them, but there used to be even more of that kind of thing all the time. Library issues would basically get exported to its users, all basically due to language problems.
pd.read_excel(filepath) will read an entire dataset even if it contains unicode characters.
pd.ExcelFile() silently drops(!!) unicode rows. The resulting object will simply skip unicode-containing rows (in ANY column) without even a warning.
For example, if you had an excel file:
then pd.read_excel() would give you a dataframe with 5 rows. ExcelFile() on the other hand would return (silently!) a dataframe with only the first two and the last row.
Maybe this is a pandas issue, not a python issue, but it was really horrendous to debug for such a long time only to realize this was the issue.
I'm not sure how to submit a bug report, to be honest.
I understand why it's the way it is, but when it comes to the typical unixy things I need to do (shuffling files around, tar'ing stuff, etc.), it definitely trips me up more than I'd wish.
The visibility of the errors is a minor point, but I think it more appropriate that it be solved by e.g. the windowing toolkit API.
You need to slap a decode anyway on reads from subprocesses in python3, and files open in Unicode mode by default. Wouldn't that fix the majority of silly UTF-8 compat bugs? Or am I missing a class of bugs that's not avoided automatically by python3 strings?
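A sketch of what that looks like in practice (assuming a POSIX system with an `echo` binary on PATH):

```python
import subprocess

# Pipes return bytes by default in Python 3, so you decode explicitly...
raw = subprocess.run(["echo", "hello"], capture_output=True).stdout
assert raw == b"hello\n"
text = raw.decode("utf-8")

# ...or ask for text mode and let Python decode for you.
text2 = subprocess.run(["echo", "hello"], capture_output=True, text=True).stdout
assert text == text2 == "hello\n"
```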
On the other end, most programs don't actually care what the data encoding is. They just move it.
Well, no, not really. You go read the docs and try to find out. Most of the time, there is a definitive encoding - if there weren't, a lot more things would be broken. Sometimes, it is not guaranteed, even though de facto that is the case - and this highlights broken interface specifications. When it is truly unknown, you explicitly treat it as raw bytes.
And the good thing about Python 3 is that it forces you to think about this. In Python 2, most of the time, data processing code can be hacked together, and it "just works", right until the point the input happens to include something unanticipated. Like, say, the word "naïve".
> On the other end, most programs don't actually care what the data encoding is. They just move it.
It doesn't necessarily mean that they get to dodge the bullet. In Python 2, if you read data from a file, you get raw bytes, but if you read data from parsed JSON, you get Unicode strings - because JSON itself is guaranteed to be Unicode. Guess what happens when the byte string you've read from the file, and the Unicode string you've read from a JSON HTTP response, are concatenated?
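In Python 3 the same mix fails loudly at the point of concatenation, which is the contrast being drawn here; a sketch:

```python
raw = "naïve".encode("utf-8")   # bytes, as if read from a file
parsed = "reply"                # str, as if from parsed JSON

try:
    combined = raw + parsed     # Python 2 would attempt an implicit decode here
except TypeError:
    combined = None             # Python 3: can't concat str to bytes

assert combined is None
```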
Migrating from Python 2 to Python 3 is way worse than that -- code changes are required, and because Python is a dynamic language you may not notice bugs until you actually run the code (or even worse, until after you release it to production and some code branch that is rarely invoked somehow gets called...). In other words, the tooling and the type system are not confidence-inspiring and it's really hard to verify that you migrated without breaking stuff.
At a certain point this sort of compatibility/forward motion of a codebase through big language revisions is something that has to be designed as part of the language in either being able to break it down into small enough chunks to chew through in pieces (updating a submodule with the updated language without affecting anything else), completely transparent to the code being run through it (this happens for compilers for C for different standards), or to have a version to version automated rewriting mechanism that is so reliable the outcome of the automated tool is not in question (tools like Go's gofmt). Python in my opinion only has partial solutions to all of those answers so it turns into a lot of hand work.
So while there are other languages that may do other things better, there is still a class of programs that are very effective to write in Python, and that's plenty enough reason to keep it around. Do not forget that Python 2 was released in 2000 and Python 3 was released almost a decade later. On that time scale many people don't worry about the next release, but those who do start considering other languages, because that's important to them.
If/when the day comes that using Python 2 isn't realistic, I may go with 3, or I may choose a different language, depending on the project. I'll cross that bridge when I come to it.
Besides Java and Python already discussed, another big mess of a transition was from Qt 4 to Qt 5, where all the strings became unicode.