Why Is the Migration to Python 3 Taking So Long? (stackoverflow.blog)
202 points by josep2 66 days ago | 351 comments



The simple reason is that there was no compelling feature to reward you for upgrading. You'd spend a tremendous amount of effort for dubious return and (until recently) a smaller ecosystem.

1. Unicode support was actually an anti-feature for most existing code. If you're writing a simple script you prefer 'garbage-in, garbage-out' unicode rather than scattering casts everywhere to watch it randomly explode when an invalid byte sneaks in. If you did have a big user-facing application that cared about unicode, then the conversion was incredibly painful for you because you were a real user of the old style.

2. Minor nice-to-haves like print-function, float division, and lazy ranges just hide landmines in the conversion while providing minimal benefit.

In the latest py3 versions we've finally gotten some sugar to tempt people over: asyncio, f-strings, dataclasses, and type annotations. Still not exactly compelling, but at least something to encourage the average Joe to put in all the effort.


> Unicode support was actually an anti-feature for most existing code. If you're writing a simple script you prefer 'garbage-in, garbage-out' unicode rather than scattering casts everywhere to watch it randomly explode when an invalid byte sneaks in. If you did have a big user-facing application that cared about unicode, then the conversion was incredibly painful for you because you were a real user of the old style.

Actually that's the behavior of python 2, it works fine, until you send invalid characters then it blows up.

In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.
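A minimal sketch of what that looks like in Python 3 (values invented):

```python
# Python 3 refuses to mix str and bytes, so the bug surfaces immediately.
try:
    "path/" + b"file.txt"
    error = None
except TypeError as exc:
    error = type(exc).__name__

print(error)  # TypeError
```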

> In the latest py3 versions we've finally gotten some sugar to tempt people over: asyncio, f-strings, dataclasses, and type annotations. Still not exactly compelling, but at least something to encourage the average Joe to put in all the effort.

That's because until 2015, all new Python 2.7 features came from Python 3. Python 2.7 was basically Python 3 without the incompatible changes. After they stopped backporting features in 2015, Python 3 suddenly started looking more attractive.


> Actually that's the behavior of python 2, it works fine, until you send invalid characters then it blows up.

> In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.

Sometimes you don't care about weird characters being printed as weird things. In Python 2 it works fine: you receive garbage, you pass garbage. In Python 3 it shuts down your application with a backtrace.

Dealing with this was one of my first Python experiences and it was very frustrating, because I realized that simply using #!/usr/bin/python2 would solve my problem but people wanted python3 just because it was fancier. So we played a lot of whack-a-mole to make it not explode regardless of the input. And the documentation was particularly horrible regarding that, not even the experienced pythoners knew how to deal with it properly.


Those issues are common when you have Python 2 code that uses the unicode datatype and you are tasked with migrating it to Python 3.

You run your Python 2 code on Python 3 and it fails. Most people at that point will place an encode() or decode() at the point of failure, when the correct fix would be to encode/decode at the I/O boundary: writing to files (in Python 3 even that is not needed if you open files in text mode), network connections, etc.

Ironically, Python 2 code that doesn't use unicode is easier to port.

When you program in Python 3 from the start, it's very rare to need to encode/decode strings. You only do that when working at the I/O level.

> And the documentation was particularly horrible regarding that, not even the experienced pythoners knew how to deal with it properly.

Because it's not really Python-specific knowledge. It's really about understanding what Unicode is, what bytes are, and when to use each.

The general practice is to keep everything as text, and do the conversion only when doing I/O. You should think of unicode/text as a representation of text, the way you think of a picture or sound. Just as images and audio can be encoded as bytes, so can text. Once it is bytes, it can be transmitted over the network, written to a file, etc. When you read the data back, you decode it to text.
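A small sketch of that boundary discipline (values invented):

```python
# Keep the value as text everywhere inside the program...
greeting = "café ☕"

# ...encode only at the I/O boundary (before a socket send or file write)...
wire_bytes = greeting.encode("utf-8")
assert isinstance(wire_bytes, bytes)

# ...and decode immediately when reading it back in.
assert wire_bytes.decode("utf-8") == greeting
```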

This is what Python 3 is doing:

- by default every string is of type str, which is unicode

- bytes are meant for binary data

- you can open files in text or binary mode; if you open in text mode, the encoding happens for you

- for socket communication, you need to convert strings to bytes and back
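For example, a sketch of the file-mode point (temp path invented):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode: you hand Python str, and the encoding happens for you.
with open(path, "w", encoding="utf-8") as f:
    f.write("naïve text")

# Binary mode: you get the raw bytes back, no decoding attempted.
with open(path, "rb") as f:
    raw = f.read()

assert raw == "naïve text".encode("utf-8")
```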

Python 2 is a tire fire in this area:

- text is bytes

- text can also be unicode (so two ways to represent the same thing)

- binary data can also be text

- I/O accepts text/bytes, with no conversion happening

- a lot of (most? all?) the stdlib actually expects str/bytes as input and output

- the cherry on top is that Python 2 also implicitly converts between unicode and str, so you can do crazy things like my_string.encode().encode() or my_string.decode()

So now you get Python 2 code where someone wanted to be correct (it is actually quite hard, mainly because of the implicit conversion), so the existing code has plenty of encode() and decode() calls, because some functions expect str and some expect unicode.

Different functions might then receive either bytes or unicode as a string.

Now you take such code and try to move it to Python 3, which no longer does implicit conversion and will throw an error when it expects text and gets bytes, or vice versa. str is now unicode, the unicode type no longer exists, and bytes is no longer the same thing as str. So your code blows up.

Most people see an error and add encode() or decode(), often just trying whichever one works (like what you were removing), when the proper fix would actually be removing encode()s and decode()s elsewhere in the code.

It's quite a difficult task when your code base is big, which is why Guido put a lot of effort into type annotations and mypy. One of their intended benefits is to help with exactly these issues.


The worst part about Unicode in Python 2 isn't even that everything defaults to bytes. It's that the language will "helpfully" implicitly convert bytes/str, using the default encoding that literally makes no sense in practically any context - it's not the locale encoding. It's ASCII!

Native English speakers are usually the ones blissfully unaware of it, because it just happens to cover all their usual inputs. But as soon as you have so much as an umlaut, surprise! And there are plenty of ways to end up with a Unicode string floating around even in Python 2 - JSON, for example. And then it ends up in some place like a+b, and you get an implicit conversion.


I've been struggling with this recently when trying to print stdout from subprocess.communicate with code that runs on both 2 and 3. Such a headache - got any recommended reading around this area?


I don't think this is exactly what you're asking but a good starting point:

https://sunscrapers.com/blog/python-best-practices-what-ever...

With 2-vs-3 code, it's easiest to write your code for Python 3 and then, on 2, import everything you can from the __future__ module, including unicode_literals. That's still not enough and you still might need to do extra work. In Python 3, subprocess has an encoding argument that can do the conversion for you, but it doesn't appear to be available in Python 2. So you probably shouldn't use it, and should instead treat all input/output as bytes (i.e. call encode() when sending data to stdin, and decode() on what you get back from stdout and stderr).

Perhaps that might be enough for your case, although many things are hard to get right in Python 2 even when you know what you should do, because of the implicit conversion.

Edit: this also might be useful: https://unicodebook.readthedocs.io/good_practices.html

Also this could help: https://unicodebook.readthedocs.io/programming_languages.htm...
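Not knowing your exact setup, here is a sketch of the bytes-at-the-boundary approach for subprocess (the command is a placeholder):

```python
import subprocess
import sys

# Keep the pipes as bytes (the default), then decode explicitly at the
# boundary; the same pattern works on both Python 2.7 and Python 3.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()
text = out.decode("utf-8", errors="replace")
print(text.strip())  # hello
```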


> In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.

This is definitely the case. I've been wrestling with bytes and strings all the time during the port of a Django application to Python 3 for a customer. I can see myself encoding and decoding response bodies and JSON for the time being. For reasons I didn't investigate, I don't have to do that with projects in Ruby and Elixir. It seems everything is a string there, and yet they work.


I’ve worked in a variety of Django codebases, and the last time I had trouble with string encoding/decoding was with Python 2. Since moving to Python 3, I have rarely needed to manually encode or decode, and I genuinely can't remember the last time I did.

Perhaps there’s something about a port that requires encoding/decoding bytes/strings?


The encoding/decoding is heavy in codebases that have to run on Python 2 and Python 3 at the same time, where the authors are worried about handling unicode correctly on Python 2.

Ironically when your python 2 app doesn't care about unicode, the porting to python 3 is actually much easier.


You don't have to do these things in Python 3 either. Your problem was that you had Python 2 code that was already broken, and you started adding encode/decode to fix it, typically making the problem worse.

If you write code in Python 3 from the start, you rarely need encode() and decode(). Typically what you want is text, not bytes.

An exception might be places where you serialize, like I/O (network or files, although even files are converted on the fly unless you open them in binary mode).


The problem is external APIs returning whatever they want, no matter what they should return. The world is messy.

Example, I just had to write this:

    return (urllib.request
        .urlopen(url, timeout=60)
        .read()
        .decode("utf-8", errors="backslashreplace"))

Then I use that string in a regexp, etc.

This is the only language where I have to explicitly deal with encodings at such a low level. I don't feel like I want to use it for my pet projects.


Why is that bad? The result returned from a URL is always binary. In certain situations it could be text, but it doesn't have to be. If the result was an image, you would want to convert the data to an image; if it was a sound file, the same. You should think of text as another distinct type.

Of course urllib could have a text() method that would do such a conversion, but urllib is not requests. It never was user friendly.

Edit: personally I use aiohttp, the interface is much nicer: https://aiohttp.readthedocs.io/en/stable/client_reference.ht... if I can't use asyncio then would use requests.


> Actually that's the behavior of python 2, it works fine, until you send invalid characters then it blows up.

Not that I've seen.

Example of where Python 3 has rained shit on my parade: I wrote a program that backs up files for Linux. It works fine in Python 2, but in Python 3 you rapidly learn you must treat filenames as bytes, otherwise your backup program blows up on valid Linux filenames. It's not just decoding errors; it's worse. Because Unicode doesn't have a unique encoding for each string, the round trip (binary -> string -> binary) is not guaranteed to give you back the same binary. If you make the mistake of using that route (which Python 3 does by default), then one day Python 3 will tell you it can't open a file you os.listdir()'d microseconds ago and can clearly see is still there.

Later, you get some sort of error when handling one of those filenames, so you sys.stderr.write('%s: this file has an error' % (filename,)). That worked in Python 2 just fine, but in Python 3 generates crappy looking error messages even for good filenames. You can't try to decode the filename to a string because it might generate a coding error. This works: sys.stderr.buffer.write(b'%b: this file has an error' % (filename,)), but then you find you've inserted other strings into error messages, and soon the only "sane" thing to do is to convert every string in your program to bytes. Other solutions like sys.stderr.write('%s: this file has an error' % (filename.decode(errors='ignore'),)) corrupt the filename the user sees, are verbose, and worst of all, if you forget one it isn't caught by unit tests but will still cause your program to blow up in rare instances.
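For what it's worth, a sketch of the all-bytes route via os.fsencode/os.fsdecode, which round-trip through the surrogateescape handler (the filename is made up):

```python
import os

raw = b"caf\xe9.txt"              # a valid Linux filename that is not valid UTF-8
name = os.fsdecode(raw)           # a str with surrogate escapes, not a crash

# The round trip preserves the original bytes exactly.
assert os.fsencode(name) == raw

# Printing still needs care: surrogates can't hit the terminal directly,
# so render the message losslessly with backslashreplace.
msg = ("%s: this file has an error" % name).encode("utf-8", errors="backslashreplace")
print(msg.decode("ascii", errors="replace"))
```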

I realise that for people who live in a land of clearly delineated text and binary, such as the Django user posting here, these issues never arise and the clear delineation between text and bytes is a bonus. But people who use Python 2 as a better bash scripting language than bash don't live in that world. For them python2 was a better scripting language than bash, but it is being deprecated in favour of Python 3, which is actually more fragile than bash for their use case. (That's a pretty impressive "accomplishment".) Perhaps they will go back to Perl or something, because as it stands Python 3 isn't a good replacement.


>For them python2 was a better scripting language than bash

This! IMO Python 2 has better usability for prototyping and thinking and doing things on the fly. Python 3 also often seems to have deprecated the functions I want to use in favor of ones that are more cumbersome and take more keystrokes. More explicit, sure, but less fluid.


Filenames need to be treated as binary because of bad design decisions decades ago. Rust handles this correctly IMHO, by having a separate type for such strings, OsStr.


Python has pathlib nowadays. But I'm not sure whether that stores them as raw bytes or Unicode internally - the API provides for either.


Rust had the luxury to learn from mistakes of others :)

When Python was created, Unicode didn't even exist.

Anyway, in Python 3 many os functions accept both str and bytes, and behave differently depending on which you pass. For example os.walk, if you pass the path as a byte string, will output paths as bytes.
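A quick sketch (temp directory invented for the example):

```python
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, "x.txt"), "w").close()

# Pass the top path as bytes, and every path os.walk yields comes back as bytes.
for dirpath, dirnames, filenames in os.walk(os.fsencode(d)):
    assert isinstance(dirpath, bytes)
    assert all(isinstance(name, bytes) for name in filenames)
```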


> Actually that's the behavior of python 2, it works fine, until you send invalid characters then it blows up.

Not always. As far as I can tell, writing garbage bytes to various APIs works fine unless they explicitly try to handle encoding issues. The first time I noticed encoding issues in my code was when writing an XML structure failed on Windows, all because of an umlaut in an error message I couldn't care less about. The solution was to simply strip any non-ASCII character from the string; not a nice or clean solution, but the issue wasn't worth more effort.

> In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.

That is nice if your job involves dealing with unicode issues. My job doesn't, any time I have to deal with it despite that is time wasted.


So you don't have to deal with it until user data includes _any non-ascii character_ (including emoji, weird spaces copied from other stuff, or loan words like café)

"Dealing with unicode" is really just about dealing with it at the input/output boundaries (and even then libraries handle it most of the time). But without the clear delineation that Python 3 provides, when you _do_ hit some issue you probably insert a "fix" in the wrong space. Leading to the classic Py2 "I just call decode 1000 times on the same string because I've lost track"


> So you don't have to deal with it until user data includes _any non-ascii character_ (including emoji, weird spaces copied from other stuff, or loan words like café)

Interesting text follows company-set naming schemes, which means all English and ASCII. The rest could be random bytes for all I have to care. Many formats like plain text or zip don't have a fixed encoding, and I am not going to start guessing which one it is for every file I have to read; there is no way to do that correctly. Dealing with that mess is explicitly something I want to avoid.


What kind of text do you have to process at your job, that you never meet any unicode in it? Nowadays unicode is everywhere, especially with emojis. Even a simple IRC bot needs to handle that.


A lot of scientific/numeric work (up until quite recently, it's slowly, slowly changing) involves text processing of inputs and outputs of other programs, using Python as the glue language.

This is a lot of old code, and it's all ASCII, no matter what the locale of the system is. And even if the code was updated, all the messages would still be in some text == bytes encoding, because there's no "user data" involved, and the throughput desired is in many gigabytes of text processed per second.

So yeah, unicode is not "everywhere": it may be everywhere on the public internet, but there is a world beyond this.


I deal with file formats that, like plain text files and zip, do not specify an encoding and have different encodings depending on where they come from. I think the generic approach is to guess, which means trying encodings until one "successfully" converts unknown input to garbage unicode, resulting in output that is both wrong and different from the original input. Most of the time I can just treat the text contents as a byte array, with a few exceptions that are specified to use ASCII-compatible names.

So you can throw in your emoji, and they might not show up correctly in the XML logging metadata I write, because I don't care. But they will end up in the processed file the same way they came in, instead of <?> or some random Chinese or Japanese symbol that the guessing algorithm thought appropriate.


In that case you should be opening files in binary mode ("rb"/"wb"); then you will be operating on bytes.

Also, there's no guessing happening in this instance. The locale configured in your environment variables is used if you open files in text mode.
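In a sketch (file contents invented), binary mode passes arbitrary bytes through untouched:

```python
import os
import tempfile

raw = b"\xff\xfeplain text in some unknown encoding\x80"
path = os.path.join(tempfile.mkdtemp(), "blob.dat")

with open(path, "wb") as f:      # binary mode: bytes in...
    f.write(raw)

with open(path, "rb") as f:      # ...bytes out, no guessing, no conversion
    assert f.read() == raw
```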


It doesn't always blow up. Notably, b"key" and "key" are now distinct dictionary keys, and both can coexist in the same dict. Is the absence of an optional key a fatal error? No: the program runs and just does the wrong thing, or fails to copy the right value to the next stage, or whatever. Fun to debug.
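A sketch of that failure mode (dict contents invented):

```python
config = {b"key": "from-bytes", "key": "from-text"}

# Both keys coexist: str and bytes never compare equal across types.
assert len(config) == 2

# A lookup with the "wrong" flavor silently misses instead of blowing up.
assert config.get(b"key") == "from-bytes"
assert config.get("key") == "from-text"
assert "missing" not in config   # absence is not an error either
```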


To get b'key' and 'key' into the same dictionary in Python 3 you really need to try hard.

The only reasonable scenario I can think of is when you are porting python 2 code to python 3 and play with .decode() and .encode().


>Actually that's the behavior of python 2, it works fine, until you send invalid characters then it blows up.

We're talking about simple scripts, the solution is to not send in invalid characters.


even in very simple scripts you don't get invalid characters until you actually get them.


Solid take. I'd add that performance was worse for a number of releases, and there were significant warts and incompatibilities in versions before 3.4.

Personally, asyncio and type annotations are a big turnoff. I know this is a bit contrarian, but I've always favored the greenlet/gevent approach to doing cooperative multi-tasking. Asyncio (née Twisted) had a large number of detractors, but now that the red/blue approach has been blessed, it seems like many are just swallowing their bile and using it.

Type annotations really chafe because they seem so unpythonic. I like using Python for its dynamism, and for the clean, simple code. Type annotations feel like an alien invader, and make code much more tedious to try to read. If I want static typing, I'll use a statically typed language.


Another problem with python’s type annotations is that false negatives are common in partially annotated code bases: i.e. an annotation which is untrue, but which has no supporting calls/usages that would cause the type checker to reject it. This is pretty pathological in my experience: it means that annotations have the semantic status of comments (i.e. might be true, might not, who knows) while being given the syntactic status of “real code”.
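A sketch of how an untrue annotation can survive checking when `Any` launders the types and the real call sites live in unchecked code (all names invented):

```python
from typing import Any

def load(raw: Any) -> Any:          # Any is compatible with everything
    return raw

def port_of(cfg: bytes) -> int:     # untrue: callers actually pass a dict
    return load(cfg)["port"]

# Stand-in for an unchecked caller elsewhere in the code base; at runtime
# the annotation is ignored entirely, so this "works" without complaint.
print(port_of({"port": 8080}))      # 8080
```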


I’m writing Elixir code currently and find the red/blue approach in JavaScript a pain. Never used asyncio beyond trying a few "hello world" and it was just baffling. In Rust async seems not terrible with the newer syntax, typing, and of course, huge speed improvement making it worthwhile. But in a dynamic VM? Just a pain. Julia’s approach with "tasklets" seems intriguing as well.


I and many others are totally with you when it comes to asyncio vs. gevent.


They really should have used the breaking nature of v3 to drop features that prevented good JIT implementations or speedups in cpython.


I am flabbergasted every time I see a software project eschew backwards-compatibility.

No one wants to spend energy re-programming to stay in place.

Especially APIs.


Yes python 3 was clearly a mistake. There could have been less hostile ways to make improvements in the language.


Probably the mistake was not dropping support much sooner.

Python 3 came out in 2008, so say: no backported features after 2009, no bug fixes after 2012. All announced in 2008, of course.

Given 4 years to migrate, most would have made the jump sooner.


Dropping support is about as user-hostile as it gets.

Once again, how can you ask/require users to expend precious limited energy to re-program just to stay in place? It's totally obnoxious and completely unnecessary.


It's absolutely amazing to me that people who pay nothing for something can frame the provider's choice not to spend even more effort on bug fixes for an old version of their software as effectively taxing them. This is especially true of Python 2, which was supported from the release of Python 3 in 2008 until 2020, and further supported by Red Hat until 2024.

This is exactly backwards of reality. It's as if they were eating at someone's home and had turned a cup of coffee into a week-long stay, during which they rudely complained when the host asked them to please do something about their pile of dishes, trash, laundry, and leavings.

Nobody is, after all, taking away your version of Python 2 or your ability to use and maintain it. It takes active effort to keep fixing bugs in software that may be network facing. If you want to do that maintenance you can, of course, but it seems people aren't going to be doing it for Python 2 forever. If you disagree, either take up the reins or pool your funds to pay someone to do it.

The thing to do back in 2008 was to figure out when you wanted to switch and schedule a bit of time to learn Python 3. Anyone who did this by, oh, 2009 or 2010 would have virtually no work to do now. Any work created since, based on something you were told 11 years ago was going away, is most assuredly work that you have created for yourself and will be obliged to take up.

Anyone who did this in 2014 would have a decade of runway before they can no longer run their Python 2 apps on RHEL/CentOS. Anyone who switches TODAY, 11 years late to the party, can run Python 2 + Red Hat for another 4 years.

>completely unnecessary

It would be more work to do otherwise. Nobody wants to do that work. You don't and they don't.


Two of the worst responses ever:

1. it's free so it's OK to be user-hostile

2. if you don't like the direction, just fork/fix it


If you don't pay anything the project gets to decide to what extent serving your interests is a worthy goal.

You don't have to fork it to fix it personally. You may also consider putting your money where your mouth is and organizing an effort to fund the change you want to see in the world. If you succeed, the world will have additional value it wouldn't have otherwise, and will owe you kudos. Everyone likes options. If you fail, you ought to move on; you have no basis for complaint. I think this is informative:

Open Source is Not About You

https://gist.github.com/richhickey/1563cddea1002958f96e7ba95...


I'm amazed at the lengths people go to justify user-hostility.


Do you regularly go to restaurants you can't afford and declare their desire not to make you a plate hostile?

From where do you derive the requirement to graciously work for free to serve your ends?


> Nobody is after all taking away your version of python 2 or ability to use and maintain it

No, but they are taking away the right of anyone who does this to call the result “python”, and that is user-hostile.


Why do you believe you have a right to call such a work Python? Trademark, when not abused, is the one form of intellectual property that is trivially defensible.

If anyone can call anything anything, then how is it even possible for consumers to make intelligent choices? Having it be called something else allows your users to make an informed choice about using it, rather than letting you incorrectly trade on the official project's reputation. Of course YOU might merely want someone to competently maintain Python 2.

Others might opt to do so badly and thus damage the actual Python brand. Worse yet, others might make changes that serve their nefarious needs, like folding in ads or data collection. Without a defining line between official and unofficial, how do we prevent such things?

Call it Cobra and brand it Python's cooler cousin if you like.


I know it's simple, but it wasn't until I learned about f-strings that I actually switched for good.
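For instance (values invented):

```python
user, count = "ada", 3
# An f-string interpolates expressions directly, no % or .format() dance.
message = f"{user} has {count} unread {'message' if count == 1 else 'messages'}"
print(message)  # ada has 3 unread messages
```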


I thought the reason was that Py2 was still getting new features too for some time. I've only just started learning and using Python, so it isn't my world.


asyncio is actually really nice, and with ThreadPoolExecutor / ProcessPoolExecutor it fits a lot of use cases I had hacked together things for in Python 2. That alone was worth it to me.
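A sketch of that combination (the blocking function is a stand-in):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_work(x):
    # Stand-in for a CPU- or I/O-bound call you can't await directly.
    return x * x

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # Farm blocking calls out to threads, await them like coroutines.
        futures = [loop.run_in_executor(pool, blocking_work, i) for i in range(4)]
        return await asyncio.gather(*futures)

print(asyncio.run(main()))  # [0, 1, 4, 9]
```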


i like the condescending bit at the end of your post. python 3 is for average joes.


Again with the 'tremendous amount of effort' meme. I've done many ports and they were all trivial:

    - run 2to3
    - spend 2h max fixing any failing tests
    - iron out any remaining issues in a few days of beta testing, like you'd do for any new release

Now, no doubt Python 2.7 is an excellent and solid release and will remain so for as long as anyone keeps the bitrot in check, but to keep using it because porting is 'hard' is patent bs.


It's not so much that it's "hard", but that it's time consuming when you have hundreds or even thousands of python scripts to port -- and since those scripts already work and you probably weren't going to have to touch them at all, you're not really gaining anything for all of that porting effort.


Maybe whoever it was should have stopped writing new ones by 2009, a decade ago.

Then you wouldn't have much to port.


If you'd been writing Python a decade ago, you'd know why people couldn't transition immediately to Python 3 even if they wanted to. I no longer work for the company that has hundreds of Python scripts left to migrate, but I don't think all of the libraries needed (including some API libraries from vendors) were ported to Python 3 until a few years ago.


Behold the tremendous amount of effort for Mercurial:

https://www.mercurial-scm.org/repo/hg/log?rev=py3&revcount=2...

They've been porting hg to Python 3 for the last 10 years and are only now nearing completion.

I've written a bit more about this in Lobsters:

https://lobste.rs/s/3vkmm8/why_i_can_t_remove_python_2_from_...


Honest question, how can it possibly take 10 years to port hg to Python 3? If I am to believe the Wikipedia source for the first release of Mercurial[0], it would've been only 4 years old at the start of the 10 year porting process. How on earth does it take 10 years to port 4 year old software?

Even taking into account the fact that new features were still being added and not all focus was on porting, this doesn't really seem like a reasonable representation of what's going on; I have a suspicion that "10 years" of porting here does not entail nearly as much work as it seems.

[0] https://lkml.org/lkml/2005/4/20/45


Please follow the links to answer your questions. That should help.


Yes, of course there will be exceptions. But the vast majority of Python code bases are not Mercurial or Dropbox or Imgur. Just like the vast majority of software-using companies are not Google or Facebook.

The average few-hundred to few-thousand LOC app, which should be 98% of all production code bases, will almost certainly port with no issue.


Maybe now. When python3 came out, anything that touched the filesystem was a hideous mess to port. Let's say you have a simple script that takes a file name as an argument, reads the file, prints some message to stdout (which includes the name of the file), and creates a new output file whose name is based on the name of the first file.

In python2, that's trivial. Whatever system you're on would normally be configured so that filename bytes dumped to the terminal would be displayed correctly, so you could just treat the strings as bytes and it would be fine.

In python3, it was a nightmare. No, you could not just decode from/encode to UTF-8, even if that was what your system used! Python had its own idea of what the encoding of the terminal was, and if you used the wrong one, it wouldn't let you print. And if you tried to convert from UTF-8 to whatever it thought the terminal was using, it would also break, because not all characters were representable. And your script could not just tell Python to treat the terminal as UTF-8, either; you had to start digging into locale settings, and if you tried to fix those, then _everything else_ would break, and nobody had any idea what the right thing to do was, because you were using an obscure OS (the latest macOS at the time).

I assume that it works better now.


You're assuming a lot.

What about codebases with Python 2 third-party dependencies that don't work in Python 3? Now you have to port that entire library as well, or rewrite it yourself while crossing your fingers that it is well documented and easy to work through.

What about codebases without decent test suites? I'd argue most production codebases don't have good test suites, or at least the most complex of code is usually poorly tested. You'll end up spending most of your time digging for regressions especially if your code creates large amounts of user interfaces.

What about code bases that were written by scientists, mathematicians, or other professionals who may not be as fluent in writing "good" code?


Dependency rot is a more general problem. In my work we deal with lots of seldom-used applications, and I've found that reducing external dependencies tends to keep life happier.


It happens... but maybe you are assuming too little.

There are almost no relevant third-party libs that have not been ported at this stage. If they haven't, they have probably been abandoned and the client codebase has bigger issues. Same for uncovered code bases and 'unprofessional' Python production code. That's hardly Python's fault.


No one wants to spend a ton of energy just to remain in place. I don't understand how software providers are so cavalier about eschewing backwards compatibility.


What's the largest codebase you've migrated?


Would you be willing to port my 796,113-line program for two hours of pay at $45.00/hour? Because if so, it would be a bargain to hire you. Last time I tried to plan the conversion by looking over the codebase, it took me two days of concerted effort just to conclude that it wasn't worth the effort.


Because there's been not enough carrot and too much stick.

The only real killer feature of Python 3 is the async programming model. Unfortunately, the standard-library version is numbingly complex. (Curio is far easier to follow, but doesn't appear to have a future.)

On the down side, switching to Unicode strings is a major hurdle. It mostly "just works", but when it doesn't, it can be difficult to see what's going on. Probably most programmers don't really understand all of the ins and outs. And on top of that, you get weird bugs like this one, which apparently is simply never going to be fixed.

https://github.com/pallets/click/issues/1212
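For concreteness, here is roughly the smallest useful shape of the stdlib async model being compared here; the coroutine names (`fetch`, `main`) and the delay values are made up for illustration:

```python
import asyncio

async def fetch(delay: float) -> str:
    # Stand-in for real I/O (a socket read, an HTTP request, ...).
    await asyncio.sleep(delay)
    return "done"

async def main() -> str:
    # Timeouts/cancellation are where asyncio earns its keep,
    # and also where much of the perceived complexity lives.
    try:
        return await asyncio.wait_for(fetch(0.01), timeout=1.0)
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(main())  # Python 3.7+
print(result)
```

The simple cases really are this small; the "numbingly complex" part tends to show up once you mix event loops, executors, and cancellation semantics in larger programs.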


With respect to async, I'm partial to trio [0], which is a spiritual successor to curio [1].

The model is similar to Golang in many ways, e.g. communication using channels [2] and cancellation [3] reminiscent of context.WithTimeout, except that in Golang you need to reify the context passing.

The author has written some insightful commentary on designing async runtimes [4] and is actively developing the library, so I'm optimistic about its future. There were plans to use it for requests v3 until the fundraiser fiasco [5].

[0] https://github.com/python-trio/trio

[1] https://vorpus.org/blog/announcing-trio/

[2] https://trio.readthedocs.io/en/stable/reference-core.html#us...

[3] https://trio.readthedocs.io/en/latest/reference-core.html#ca...

[4] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...

[5] https://vorpus.org/blog/why-im-not-collaborating-with-kennet...


I was curious about [5]

The link to support requests (which is a great piece of software) is here:

https://cash.app/$KennethReitz

Note: This is NOT a charitable donation, it is a gift to an individual. These are not tax deductible under US law.

NJS has a long, attacking blog post saying this needs to go through the PSF (huh?) and that they should be getting most of this money, not the person the funds were directed towards (it's not clear how much they've actually contributed to requests over time). This supposedly also may trigger folks who have suffered from "gaslighting".

Supporting the developer of a piece of software does not, as far as I know, require that they sign up to handle it on a charitable basis. A big to-do is made about the "large" amount raised. The amount is $33K. To be frank, this is almost zero in tech land, at least in the Bay Area, and requests is a very highly used project. I was literally expecting something like $300K or even $1M - silly Kickstarter projects raise far more and deliver nothing. Requests has already delivered a lot of utility.

Just a bit of perspective from someone who wasn't familiar with this "fiasco".


I don't have a horse in the game and I'm not familiar with these people or the "fiasco", but your summary does not seem like an accurate description of the issues raised in [5]. Here are the quotes that give me a very different take on the situation:

The money was raised specifically to support development of requests 3

> [Reitz] announced that work had begun on "Requests 3", that its headline feature would be the native async/await support I was working on, and that he was seeking donations to make this happen.

It's not so much that PSF needed to be used, as that there needed to be some accountability as to how those funds were used.

> [Reitz] chose a fundraiser structure that avoids standard accountability mechanisms he was familiar with. He never had any plan or capability to deliver what he promised. And when I offered a way for him to do it anyway, he gave me some bafflegab about how expensive it is to write docs. Effectively, his public promises about how he would use the Requests 3 money were lies from start to finish, and he hasn't shown any remorse or even understanding that this is a problem.

It sounds like a great deal of the work being done on requests is done by volunteers but the funding only goes to support Reitz

> I think a lot of people don't realize how little Reitz actually has to do with Requests development. For many years now, actual maintenance has been done almost exclusively by other volunteers. If you look at the maintainers list on PyPI, you'll see he doesn't have PyPI rights to his own project, because he kept breaking stuff, so the real maintainers insisted on revoking his access. If you clone the Requests git repo, you can run git log requests/ to see a list of every time someone changed the library's source code, either directly or by merging someone else's pull request. The last time Reitz did either was in May 2017, when he made some whitespace cleanups.

The issue is not so much that money is being made, but the way that it is done and the lack of accountability

> I don't have any objection to trying to make money from open-source. I've written before about how open-source doesn't get nearly enough investment. I do object to exploiting volunteers, driving out community members, and lying to funders and the broader community. Reitz has a consistent history of doing all these things.


The people who can and should complain if they feel misled about how their money was used are the people who donated the money. Have any of them complained? I say this with some seriousness.

NOWHERE that I can see did Reitz say he would hire NJS to do some work. Is Reitz even set up properly to report or withhold taxes on amounts paid to NJS, check paperwork for work eligibility, etc.? Would this even be allowable if requests is not a business, or would deductions be disallowed as non-business (i.e., a payment to NJS potentially subject to tax for both Reitz and NJS)?

If you as a donor want full charitable compliance using PSF you would ask to give through them - and perhaps more would have been given if PSF had been an option.

I don't even see NJS on the requests contributor list:

https://github.com/psf/requests/blob/master/AUTHORS.rst

Finally, requests 3 has a number of features.


My take on it is that NJS was not so much upset/concerned about not getting paid as he was about the way he was treated and how this behavior affects the community.

> And on a more personal level, I felt his interactions with me were extremely manipulative. I felt like he tried to exploit me, and that he tried to make me complicit in covering up his lies to protect his reputation. I was extremely uncomfortable with the idea of going along with this, but he created a situation where my only other options were to either give up on working on async entirely, or else to go public with the whole story, at potentially serious cost to myself.

> Ultimately, I decided to speak out because I care deeply about the Python community and its members. If one of our community's most prominent members freely lies to donors and harms volunteers, and if we all let that go without saying anything, then that puts everything we've built together at risk. And I'm in a better position than many to speak up.

The intent seems not to be trying to get people to blacklist or dogpile on Reitz, but to simply make people aware of the issues so they won't get caught off guard.

> This is the classic "missing stair" problem. Those in the inner circle quietly work around the toxic person. Outsiders come in blind. I'm pretty well-connected in the Python world, and I came in blind.

> Since this is the internet, I have to say explicitly: Please do not harass or abuse Reitz. That's never appropriate


How can the people who donated money complain without NJS posting as he did?


I wouldn’t say 33k is “almost zero.” For me, a single person living in the Bay Area, $33k would pay all of my expenses for a little over 6 months. When we’re talking about this amount of money going to a single individual, I don’t think it’s fair to call it “almost zero.”


I suppose you would have to pay taxes on the 33k you get from a crowdfunding platform.


You would - yes, it's considered income and you may get a 1099-K even if no 1099-MISC in many cases. The 1099-K's have caught a fair number of folks not reporting income they otherwise were getting.


It's always amazing how adding a little money to the mix makes people lose their minds. NJS was working for free. KR raised some money from large businesses, then told NJS to keep working for free. NJS retroactively discovered that working for free is stupid, and tried demanding the money KR raised. This didn't happen, of course, since he had no leverage at all on KR besides complaining online.

The takeaway here appears to be "never work for free". If NJS had worked on his own project, controlled by him alone, this wouldn't have happened. If you donate a bunch of work to an open source project, then... well... the source is open.


Where did KR say he was going to hire NJS or give him any money? NJS is not even on the contributor list for requests, I don't think - so of all the people to hire, is he even at the top of the queue?

And is it impossible that KR is doing an order of magnitude more work on the requests project, with a much longer track record, and raised the money directly to himself (and so potentially has to pay taxes on it), such that using it to support his work on requests is reasonable?


I wonder how many people feel that migrating to Python3 would have been worth doing in the absence of being forced to do so.

Dropbox invested three years of work, actually hired Python's creator, and are still not done. What are they getting out of it that they wouldn't have gotten if Python2 simply had been maintained?


This is such a good point. Take SQL. It has survived because it is well designed, and changes so little and so slowly, and it’s obvious what SQL is and isn’t meant for. Amateur programmers can port SQL queries between database systems semi-painlessly. In the right environment a query can survive with small edits for YEARS.

Who wants to break old SQL? Nobody.


That's something of an illusion. The flaws in SQL often aren't noticed because you usually pick one database vendor and stick with them. There are plenty of differences between vendors, but you don't usually have to support them at the same time, and you don't notice for simple cases.

But changing database vendors for a company can be a big deal, as bad as going from Python 2 to 3.


This is true. But I think he means that syntactically you can make jumps from one DBMS to another without much of a fuss. Obviously there are non-language features, but that is not what is being discussed here.


That's mostly a result of databases (kinda) converging, SQL being a declarative language and it primarily being a wrapper around relational algebra, with a bunch of flags. And of course, it's a very direct interface to the RDBMS, and you're just migrating between systems that try very hard to have the same interface.

As a language however, it's a whole lot of nonsense. Extremely inconsistent syntax, stuffing a trinary logic into a boolean system, a standard that gets extended arbitrarily, and even the tooling ecosystem is a fair bit pathetic (the lack of formatters particularly annoy me; everyone tries to support SQLs generally, and end up missing every extended feature.. if it's not a simple select query/ddl, you're not getting a decent format output)

And it most certainly is a whole lot of fuss to migrate unless your database is tiny, or you didn't actually utilize the DB except as a dumb datastore (e.g. you relied solely on your ORM + indexes). There's a reason no good translator exists, and those that do exist only support a very limited subset of any particular SQL variant, despite programming languages having a whole array of transpilers; it's simply not a simple language, and the variants only superficially look the same.
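The three-valued logic point is easy to demonstrate from Python's stdlib sqlite3 (an in-memory database, so nothing here depends on a particular vendor):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# In SQL's three-valued logic, NULL = NULL is neither true nor false: it is NULL.
null_eq_null = cur.execute("SELECT NULL = NULL").fetchone()[0]
print(null_eq_null)  # None

cur.execute("CREATE TABLE t (x INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])
# WHERE keeps only rows where the predicate is TRUE, so the NULL row vanishes,
# even though x = x "obviously" holds for every row in boolean logic.
matching = cur.execute("SELECT COUNT(*) FROM t WHERE x = x").fetchone()[0]
print(matching)  # 1, not 2
```

This is exactly the kind of thing that looks identical across vendors for simple queries and then bites during a migration.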


"As a language however, it's a whole lot of nonsense. Extremely inconsistent syntax, stuffing a trinary logic into a boolean system, a standard that gets extended arbitrarily, and even the tooling ecosystem is a fair bit pathetic (the lack of formatters particularly annoy me; everyone tries to support SQLs generally, and end up missing every extended feature.. if it's not a simple select query/ddl, you're not getting a decent format output)"

Would love to see your point in action. For me personally, speaking strictly writing simple scripts, they pretty much translate fairly well. Regarding formatting, are you referring to the output?


Dealing with Unicode in SQL has long been a backwards-compatibility nightmare.


In all fairness, dealing with unicode in general has been a pain point for me.


> Who wants to break old SQL? Nobody.

Every couple of months there's a new startup / dev site that says "SQL is broken/old/bad, so we reinvented it!". They all sink without trace, but there's a cohort who agrees with them.


The problem I think is they typically try to go ahead and reinvent the entire RDBMS as well.

It's not clear to me why postgres hasn't simply grown a whole array of frontends..


Agree totally. I had to climb the painful learning curve of psql because none of the front ends worked the way I wanted for some reason or another. Of course, having learned psql, I'm now scathing of anyone wanting a front end... hmm... maybe that's why ;)


I was actually thinking about the lack of SQL alternatives using the postgres engine; for example, why is datalog not simply available as an extension? Or MySQL syntax? PG implements a wide array of alternatives to PL/pgSQL (including standard programming languages eg python), but for whatever reason these SQL-alts never seem to consider being layered on top of the postgres engine.

However, the lack of GUI frontends is also really weird. I don't see why it'd be harder to support than any other DB, and afaik pg has gotten fairly popular..


True - why isn't there a MongoDB alternative that just wraps jsonb on Postgres?


This is backwards thinking.

Yes, it's expensive to upgrade from Python 2 to Python 3, but it's also expensive for the Python project to maintain two versions of Python indefinitely. If someone other than the core Python team wants to step up and maintain Python 2, they are free to do so; it's open source. But failing that, expecting the Python team to support the older/less functional version of the code indefinitely is unrealistic. Corporate-owned languages have even shorter lifecycles for exactly this reason.


>This is backwards thinking.

And the alternative is cargo cult "newer is better".

>Yes, it's expensive to upgrade from Python 2 to Python 3, but it's also expensive for the Python project to maintain 2 versions of Python indefinitely.

On the other hand, they could progressively enhance a single, backwards-compatible 2.x version. JS manages to do that just fine, as does Java...


>JS manages to do that just fine

How do you define "just fine"? It's taken us many years to migrate ECMA versions, only to have multiple incompatible runtimes.

And JS "The Good Parts" is like 1/10 of the full language, so it often feels like a lot piled on top.

>as does Java

How are them generics?


> How do you define "just fine"? It's taken us many years to migrate ECMA versions, only to have multiple incompatible runtimes.

ECMAScript versions are forward-compatible. Any valid ECMAScript 3 code is also valid 5.1, 2015, 2016, etc.

I'm not sure what migration you're talking about. If you mean using new language features before your runtime targets support them, that's kinda on you. Even so, the ecosystem has tons of robust solutions for supporting legacy interpreters. Most notably, Babel does a wonderful job of transpiling to lower language-version targets.


>How do you define just fine?

Besides the total domination of the web programming space, which is of course aided by it being the only option:

1) Used by choice even on the server and application development (where it was never the only option, and wasn't even preferable/viable before)

2) Fast pace of language development

3) A thriving package ecosystem with millions of packages

4) Adopted by all major companies

5) Three best-of-class runtimes (V8, JavaScriptCore, SpiderMonkey) by three major vendors, with performance that smokes any dynamic language that is not LuaJIT

6) Increasingly adopted as an embedded scripting language in all kind of apps

7) With a viable gateway into both a native trans-language runtime (webassembly) and a typed version of the language (typescript).

>How are them generics?

They're doing great. Type erasure isn't that big of a deal, and Java might even get reified generics with Valhalla eventually anyway. It's not a "backwards compatibility prevents this" issue (which is our topic here), it's a "no time devoted to it yet" issue.


And yet Python is eating the world: http://pypl.github.io/PYPL.html

Not that I'm implying that 'popularity' is a good measure of anything.


Javascript is the king of compromises. It's supported by every browser on the planet so of course support is massive. It's not as if a front end web developer can choose to work on the web and not use Javascript in some form or another.

Python is popular largely based on the fact that it's so approachable. It is the BASIC/ VB of modern times for whatever that is worth. It does scale up to larger projects and is frequently used for big scale stuff, but I suspect the fact that it's so ubiquitous has more to do with the fact that it's also easy to pick up and for companies to find people with Python dev skills (or train them up).


Then what are you implying?


Perhaps that listing all the excellent reasons for why some tech is 'better' than another does not translate into said tech being used.


Now do C#.

You are moving the goalposts and ignoring the fact that Python3 still didn't deliver anything for most users.


This is a hugely American point of view. For anyone who has to deal with unicode on a regular basis, the better unicode support alone is a huge improvement. That's without even looking at the advantages of Async support which offers big performance benefits for web developers—roughly 70% of Python users.


Fair criticism. But shouldn't you be even more pissed that the Unicode release was botched and we're still talking about it more than 10 years after?

When I upgrade to a new version of C# ... nothing happens.

Backwards compatibility is what made Microsoft the company it is. I think Python deserves all the crap it gets for 2 vs 3.


They have to be backwards-compatible to persuade people to pay for their proprietary products rather than switch to another vendor.


So you're saying backwards-compatibility is awesome. I agree.


Note that the question is not "Shall we support Unicode?". Clearly we should.

The question is rather whether it would have been better to gradually improve support for it in a Python2-esque way, rather than creating a discontinuity and a raft of new problems, some of which linger to this day.

Also, for many purposes, there is wide agreement that ASCII is still the way to go. Even if Americans vanished tomorrow, the majority of remaining programmers in the world would prefer to look at source code in English, which they already know, rather than a host of other languages, most of which they don't.


There was no way to gradually improve Unicode support in Python without breaking things, because a big part of it was stuff like implicit str/unicode conversions - that were broken because they practically never use the right encoding, but that you can't remove without introducing as much breakage as Python 3 did.
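A quick illustration of the difference (Python 3 shown; the example strings are arbitrary):

```python
# Python 2 would silently coerce via ASCII and explode only when a non-ASCII
# byte sneaked in at runtime; Python 3 refuses to mix bytes and text at all.
try:
    "café" + b" au lait"   # text + bytes
    mixed = True
except TypeError:
    mixed = False          # fails immediately, not on unlucky input

# The encoding has to be named explicitly at the boundary instead:
data = "café".encode("utf-8")     # text -> bytes
roundtrip = data.decode("utf-8")  # bytes -> text
```

Removing the implicit coercion is precisely the breaking change that couldn't be shipped gradually: any code relying on it stops working the moment the coercion goes away.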


You lack imagination here :) There are several solutions that could have been pursued, including introducing a completely new type and effectively duplicating the existing string library for it.


It's not just the string library that is affected. It's literally every API in the stdlib that returns a string. You'd have to fork all of those, because changing any of them to return a completely new type would be a breaking change as well.


Unicode support isn't about supporting unicode in files of source code!


They did try to do it in Python 2 and introduced the unicode type. The problem was that not only did it not help, it is the main reason why migration to Python 3 is so painful.


You are replying out of context here. The comment I was replying to was specifically "Python3 still didn't deliver anything for most users". Which is just not true.


Replying here because the other is nested too deep.

You do realize that C# was itself Microsoft's replacement for C++, right? And that when C# was released it had its own growing pains and a long roll-out, in spite of having the world's biggest corporation pushing it.

Python as a language is far older than C# so it had a lot more baggage than C# does.


> And the alternative is cargo cult "newer is better".

Of course it's a "cargo cult" when someone disagrees with you.

> On the other hand, they could progressively enhance a single, backwards-compatible 2.x version. JS manages to do that just fine, as does Java...

"Just fine"... that explains why so many shops are dropping straight JavaScript and switching to TypeScript (or, before that, CoffeeScript). And why JavaScript is littered with band-aid libraries like Underscore that are needed to turn it into an effective development language.

Likewise, Java development is slowly being superseded by Kotlin. Java is a mess, there are often 3-4 ways to do simple things and many of them are just terrible for performance.


>Of course it's a "cargo cult" when someone disagrees with you.

No, it's obviously "backwards thinking", right?

> "Just fine"... that explains why so many shops are dropping straight JavaScript and switching to TypeScript (or, before that, CoffeeScript).

CoffeeScript was just adopted (and not that much in the first place) because it brought new syntax/features earlier. Now JS has been getting new syntax itself at a great pace and CoffeeScript just died off.

As for TypeScript, this is just JavaScript + type annotations. Kinda like what Python is getting with 3.6 and mypy, but more useful and with actual tooling available. So I'm not sure how "TypeScript or CoffeeScript" proves anything about JS not doing great.

>Likewise, Java development is slowly being superseded by Kotlin. Java is a mess, there are often 3-4 ways to do simple things and many of them are just terrible for performance.

Python has 10x+ worse performance, and more than 3-4 ways to do simple things (from package management to basic libs), most of which are just terrible for performance.

Compared to that, nobody has had any problem with Java performance for 15+ years...

And Kotlin is still insignificant except in the Android space where it's pushed, so there's that. Java sees an order of magnitude more usage.


> On the other hand, they could progressively enhance upon a backwards compatible single 2 version. JS manages to do that just fine, as does Java...

Common Lisp has backwards compatibility that goes back decades, and implementations like SBCL had no difficulty at all absorbing Unicode.

Racket even supports different language standards and completely different languages (such as Scheme and Algol) running on the same runtime. And both SBCL and Racket are compiled languages with a high-end GC which should make such things more difficult than CPython, which is purely interpreted and has a simpler GC.

But the incompatibility between Python 2 and Python 3 is perhaps only a symptom of a larger problem. The Python developers have decided that backwards compatibility is not that important any more. This is not a problem for companies like Dropbox, or small start-ups, of which 95% will not even exist five years on. It is, however, a huge problem for domains like scientific computing, where most code has no maintainers and even for very important code there is no budget or staff for maintenance:

https://blog.khinsen.net/posts/2017/11/16/a-plea-for-stabili...


> But the incompatibility between Python 2 and Python 3 is perhaps only a symptom of a larger problem. The Python developers have decided that backwards compatibility is not that important any more.

Exactly: and that was a wrong decision for anybody but the developers of Python.

Everybody else prefers having something that works: "The improvements are welcome, but please allow us to run our old programs too, thank you, and allow us to use a new feature only once we need it."

It's an obvious expectation. We would also hate a new version of a word processor which wouldn't open our old documents. Or a new version of Photoshop which wouldn't open our old pictures. Or a new version of the browser where only the newest web pages are visible.

It follows that it was absolutely technically possible to have a new version of Python in which the old programs still work. It's the failure of the developers that they haven't made it.

Compare that decision of theirs with the policy of Linus Torvalds who insists that the old user programs should never break on newer kernels.


As an example of Linus's "kernel changes should not break user programs" policy, he was famous for using harsh words to pass the message to those who tried to steer otherwise (these words I don't have to repeat, so I'll just quote his main message):

https://lkml.org/lkml/2012/12/23/75

"How long have you been a maintainer? And you still haven't learnt the first rule of kernel maintenance?

If a change results in user programs breaking, it's a bug in the kernel. We never EVER blame the user programs."


To explain that a bit: a Linux kernel maintainer is somebody who organizes and collects contributions from others, screens them, and, when they are finished, passes them on to Linus to integrate. That's a huge responsibility, and Linus uses such strong expressions only for people who hold such responsibility and who do things that would harm his project. It is not the case that he talks like that to normal contributors (who should be guided by the maintainers).


And the reason why I think there is a deeper problem is that the issue is clearly not solved with the python2/python3 transition:

Why do some Python developers have to maintain installations of a whole handful of Python versions just to ensure that their code keeps working? Why all the mess with pyenv, virtualenv, and so on? If the Python developers, as well as the library developers, supported backwards compatibility, this would not be necessary at all.


> the issue is clearly not solved with the python2/python3 transition

Exactly. The incompatibility mess continues when using different 3.x versions.


Obligatory Reference to Rich Hickeys talk on versioning and interface specs:

https://www.youtube.com/watch?v=oyLBGkS5ICk

Discussion:

https://news.ycombinator.com/item?id=13085952


> It is, however, a huge problem for domains like scientific computing, where most code has no maintainers and even for very important code there is no budget or staff for maintenance

I think that's a tool selection problem, not just confined to the python world. If the language and libraries won't have a supported lifespan that matches with the maintenance budget of the projects using them then the wrong tool was chosen. If a project is expected to have a 10+ year lifespan of little to no maintenance then it needs to be built on languages/libraries that will have supported versions for that long.


Well, the first thing is that most new code in scientific research is developed in PhD projects which last some three or maybe four years. The people who develop this code do not have the resources and time to maintain it. Projects don't have a budget for that. There are projects which are very long-running (think CERN or ESO's VLT), but even there the true duration of the code's usage is seldom planned (AFAIK, ESO VLT has just started to transition from Tk/Tcl to Python).

You could say then, "well, then Python is perhaps just not a good match for those pesky scientists".

And this brings up two more points:

* A lot of important tools and libraries in the Python ecosystem was developed by scientists. Numarray/Numpy is a good example.

* If the core Python developers don't have the intention to maintain a backwards-compatible language version for more than, say, 15 years, they should perhaps clearly state on the python.org main page something like: "great, as you are a scientist, we welcome your contribution, but Python might not be suitable for tools that support long-term research".


"ESO VLT has just started to transition from Tk/Tcl to Python" - they might regret this, new Tcl releases tend to take backward compatibility much more seriously than Python :-)


Yes, I've always thought that Tcl should be promoted as an archival quality programming language.


> It is, however, a huge problem for domains like scientific computing, where most code has no maintainers and even for very important code there is no budget or staff for maintenance:

Data science is doing just fine, in fact is leading the migration: https://www.jetbrains.com/research/python-developers-survey-...


"Data science" and "science" are quite different things. Science is the systematic and collaborative pursuit of knowledge in a long-term endeavour. It is based on sharing and the open exchange of methods and tools. Numerics, mathematics, and computational codes are just important tools for doing that. As Hinsen points out in the blog post I cited above, most important computational code is written in one-off research projects which run for a few years, and the people who develop these codes normally have to move on and work for a different institution, if they manage to keep working in science at all. On the other hand, important codes and algorithms may be used for many, many years.

"Data science" is a broad term but usually just means the application of numeric, and sometimes scientific, tools to commercial ends. It is almost always done in companies. Typically, between such companies there is no open exchange of tools and methods, no exchange of knowledge, and no long-term use of the generated codes. This is the reason why data science companies don't have the problems which Hinsen pointed out. But they could become affected by a degrading suitability of Python for computational science, because their tools were initially developed by scientists.


Discussion of Hinsen's post on HN:

https://news.ycombinator.com/item?id=17058269


Both Java and Javascript are giant messes.


And yet, somehow, Python manages to be a bigger mess than both of them combined.


Java is nothing like a mess, and JS is going from strength to strength.


  int max = new Max(10, 5).intValue();


Not sure what's that supposed to be. What's this "Max" class?

In any case, that's not some counter-argument, even if it points to a real wart.

It's "let me throw a random Java wart, as if it means something, and as if other languages don't have their own warts".


No, 'cargo culting' is when you don't understand the causal link between cause and effect. Of course "newer is better" in this case, and we know exactly why: thousands of man-days have been spent improving underlying libraries, lessons have been learnt, new ideas stolen from other languages, optimization, security improvements, and so on.


Almost everyone hates both Java and Javascript.


> Almost everyone hates both Java and Javascript.

I just don't think that's true.


And I dislike Python, but that's at best a cool story, bro.

This is a story about language upgrades. Those languages showed how to do it right.


Javascript is a hugely popular language, as is Java.

Some hipsters hate "Java and Javascript". The world at large loves them.

At some point plain users hated Java applets and Java desktop apps, but those are not much of a thing anymore. In the server space, very few who use it hate Java, and millions use it.


Just because they're popular doesn't necessarily mean their users like them.

Lots of people learn and use these languages because:

1 - That's what they're taught at school.

2 - That's where many if not most programming jobs are.

3 - There are a bazillion libraries they can use, compared to other languages.

4 - JS is built in to browsers.

5 - They don't know any better.

6 - Marketing.

Also, many companies want their staff to develop in these languages because of the reasons on this list plus that's what most programmers know, so it's relatively easy to find employees.


Ah yes, let's turn this into a good old "your language sucks" thread


JS and Java can do that because they were designed reasonably well from the start. All they tend to add are more features on top of the solid core language. Python's language was not solid (e.g., strings not unicode by default) so they needed a major overhaul.


> JS and Java can do that because they were designed reasonably well from the start.

Javascript was knocked up over a week or so. Sure it implements concepts from Scheme and other languages but it was certainly not "designed reasonably well from the start". Otherwise we wouldn't have needed books such as Crockford's "Javascript The Good Parts" to help us understand areas of the language to avoid/misuse.

> Python's language was not solid (e.g., strings not unicode by default)

From the article:

"Python itself predates the first volume of the Unicode standard which came out in October 1991."


Kind of an odd slam, given that Python (and most other widely used languages) predated Unicode.

In any case, I think the jury is still out on whether Unicode in the primary string type is a good idea.


Didn’t C# come out after Unicode (1990s vs 2000)? And they still didn’t implement it properly. Instead, they went with the horrible UCS-2 because it made interfacing with the Windows API easier (it uses wchar_t (16-bit) and UCS-2).


It's funny that you should bring up Unicode strings as an example. Java strings aren't Unicode by modern standards - their char is fixed as 16 bits, because back in the day, UCS2 was "good enough for everybody". So the moment you have, say, an emoji, stuff like length() and indexing is no longer dealing in actual codepoints. And you can slice in the middle of a surrogate pair, and end up with an instance of String that's not even valid UTF-16.

Python didn't get Unicode until later, so it had a chance to do it right - and it finally did, even on platforms like Windows where wchar_t is also 16-bit for historical reasons.
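For what it's worth, the difference is easy to see from the interpreter (the emoji here is an arbitrary non-BMP character):

```python
s = "a💩b"
# Python 3 strings are sequences of code points: the emoji is one character.
print(len(s))                           # 3
print(s[1])                             # 💩
# Encoded as UTF-16 (what Java's char array holds), the emoji needs a
# surrogate pair, so Java's length() would report 4 here.
print(len(s.encode("utf-16-le")) // 2)  # 4
```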


I wouldn't call Js a reasonably well designed language but at least it supports Unicode strings.


> Yes, it's expensive to upgrade from Python 2 to Python 3, but it's also expensive for the Python project to maintain 2 versions of Python indefinitely.

No maintaining 2 versions of python is much cheaper, it's only being done in one place compared to the thousands and thousands of python 2 code bases you'd have to convert.

It also only needs bug fixes, there are plenty of people/organisations out there that would be perfectly happy for the language to be unchanging.


That's only true if you ignore all packages on PyPi.

It's extremely hard to keep compatibility with Python 2; many authors can't wait to drop support next year, and many already have.


> That's only true if you ignore all packages on PyPi.

Presumably any packages worth maintaining will have far more dependent projects, so it's still far less overall effort.

> It's extremely hard to keep compatibility with Python 2

So don't? I don't think most of the people dragging their feet on the upgrade need or even want new features. A stable python 2 branch with bug fixes and security patches would suffice for most and be ideal for many. Over time the bug fixes should trend to zero and there probably aren't a heap of security issues in python projects anyway.


> If someone other than the core Python team wants to step up and maintain Python 2, they are free to do so, it's open source.

Only if they name it something completely different from python or py-anything. Guido refuses to allow anyone to just step in to maintain py2.

Tauthon is a project that aims to keep compatibility with py2 while adding whatever features of py3 won't break py2, and to have a maintained py2.

https://github.com/naftaliharris/tauthon


I suspect that someone, or a group of people, will step up to unofficially maintain Python 2 for the foreseeable future. It's clear that there are a lot of people using it that either can't easily migrate or are unwilling to do so for the various reasons already discussed in this thread.


I'm sure that for the right money you can find someone.

Don't forget that that person/organization would not only have to maintain Python itself but also all the packages that will be used.


If I encounter Python 2 bugs that matter to me, I can and will fix them myself if needed, and submit or otherwise publish the change.

I'm quite certain I'm not the only one.


Why not just migrate to 3?

genuinely curious...


Because of the amount of work involved in porting my existing Python code. Python 3 doesn't offer any advantages that matter to me, so that's a lot of effort for little gain.


OK, thanks. Good luck with the bug fixing ;)


I'm not really worried about bugs, because the code I have works well, and I doubt I'm going to do any serious new Python 2 development.


I meant the bugs in your dependencies that you'll have to backport fixes to. But hey, if you're writing bug-free code that'll be easy ;)


I understood what you meant. What I'm saying is that my existing code works fine, so whatever bugs are in the dependencies are ones that don't affect me. Should I make a code change that surfaces one, then I have the means to deal with it -- but that is almost certainly going to be a rare event, as my Python projects are stable and aren't going to see much change.

I wasn't commenting on how buggy my own code is. Which version of a language I'm using doesn't really affect that variable.


I get that. But the interesting thing about dependencies is how they surface vulnerabilities that can hurt code that works perfectly well. Your current code probably doesn't have many bugs, but includes an unknown number of vulnerabilities from your dependencies. The bad people probably won't bother examining your code for vulnerabilities, but they will be informed of vulnerabilities in popular libs, and then looking for projects that use those versions of those libs is a lot easier than scanning all those projects individually. So you end up having to backport a bunch of fixes to other people's code because that code was popular and came under intense scrutiny.

But I guess you know this, and are OK with the compromises involved. I'll stop here ;)


> but they will be informed of vulnerabilities in popular libs, and then looking for projects that use those versions of those libs is a lot easier than scanning all those projects individually.

This is true, and if we were talking about code that is exposed to the world at large, then my stance might be different. However, the projects that I've used Python for are not exposed in that way.

Note that I'm talking about personal projects, not work-related ones. At work, I use whatever is required.


Python 3 seems like a bit of a missed opportunity. Since they were introducing breaking changes in the first place, why didn't they go bigger? E.g. make the OOP seem less tacked on, immutable data structures, clean up inconsistencies in the standard library?


Can you describe how OOP feels tacked on? One of the major changes in Python 3 is that new-style classes are the only style of classes.


The nuisance of having to add self as a parameter to every class method, no way of enforcing private methods, and the mix of methods on objects and free-standing functions in the standard library. OO feels more integrated in e.g. Ruby.

About some of the design decisions in Guido's own words: http://python-history.blogspot.com/2009/02/adding-support-fo...


I’ve seen a lot of people taking issue with the self argument. Am I crazy to actually love it as a feature? I pulled out a lot of my hairs when I first started OOP (Java) trying to grok `this`, and I remember thinking “why didn’t everybody just do this” when I saw how Python declares methods. And I still think that’s a good idea now. So much easier to keep my sanity than using JavaScript’s bind.


What's weird is that it's half baked

   def foo(self):
instead of a more logical:

   def self.foo():
to match the calling syntax.


I don't think you understand what is actually going on under the hood here, at all. Methods are member functions of a class, and must be invoked on an object.

  someInstance.foo()
When invoking a method from within the class, you still need a reference to the object the method will be invoked on. In python, the reference is passed implicitly as the first argument, and usually called self. That has nothing to do with the method name or signature itself, and calling the method self.foo() would result in calling it like

  someInstance.self.foo()
for cases when you aren't referencing the function from within the class, which would be even more confusing.


The syntax of a language is an abstraction over what is actually going on under the hood. There's no reason python couldn't have designed their OOP abstraction more elegantly.


> There's no reason python couldn't have designed their OOP abstraction more elegantly.

Sure there is: consistency. If member functions on an object that happens to be a class (i.e., methods) did magic transformations that member functions of other objects did not do, the mental overhead to understand Python code would be higher, and the ability to build custom general abstractions would be weaker.

It would perhaps make the simplest cases microscopically easier, but at the expense of making the already hard things more confusing and difficult to wrap with generalities.

Most statically-typed OOPLs don't have first-class classes that are just normal objects, so this isn't an issue because the things that enables aren't available; other dynamic languages may use models where methods aren't data members of classes (e.g., Ruby where methods are associated with classes, but not as instance members which, in Ruby, wouldn't be externally accessible—while Ruby classes are objects, the ability to hold methods is a special power of Ruby classes/modules compared to other objects, not just having instance members. This is one way Ruby's model is more complex than Python’s, and it definitely bites you in terms of seeking general solutions some times..)


Why is self.foo more elegant? It's very rare that you'll call a given method as self.foo, and there's a slew of complications that come with such a syntax (the declaration grammar is more complex, `self` is now special, etc.)

It's true that languages are abstractions, but not all abstractions are useful.


It doesn't have to treat "self" as a special keyword - it just has to desugar "foo.bar(x, y)" into "bar(foo, x, y)". There are some other languages that do that - e.g. in F#, you write:

   member this.foo(x, y) = ...
Again, "this" is just an identifier here, and doesn't have any special meaning.

It's more elegant firstly because it follow use, and secondly because it means that "def foo(x)" has the same meaning both inside and outside of a class declaration - it's just a function, and there's nothing special about its first argument. As it is, we need stuff like @staticmethod and @classmethod. It's especially annoying when you have class attributes that happen to reference a function, because conversion from functions to methods is a runtime thing - so you have to remember to wrap those in staticmethod() as well.
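A small sketch of the wart being described (the names here are made up for illustration): the same plain function behaves differently depending on whether you reach it through an instance, and wrapping it changes that.

```python
import functools

def helper():
    return 123

class Foo:
    plain = helper                        # plain function: gets method treatment
    wrapped = staticmethod(helper)        # suppresses the method machinery
    via_partial = functools.partial(helper)  # not a function object: left alone

foo = Foo()
print(foo.wrapped())      # 123
print(foo.via_partial())  # 123
try:
    foo.plain()  # blows up: helper() receives foo as an implicit first argument
except TypeError as e:
    print(e)
```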


> As it is, we need stuff like @staticmethod and @classmethod

You would need those even with your suggestion (except you could drop @staticmethod if you added another layer of magic so that methods declared without a leading identifier were assumed static; you'd still need @classmethod or some equivalent mechanism to distinguish which non-static methods were class vs instance methods.)


My suggestion implied removing the descriptor-based "magic" on regular functions that make them behave as methods when accessed using dot-member syntax, and instead making the "def self.foo" syntax produce a special kind of function that would have that behavior. @staticmethod today basically just suppresses that special behavior on regular functions, so it wouldn't be needed in this case. But yeah, we'd still need @classmethod.


> It's more elegant firstly because it follow use

No it doesn't. Currently `def foo(self): pass` is called as `instance.foo()`. You're suggesting that `def self.foo(): pass` would be called as `instance.foo()`, except now it looks like self and instance are syntactically related in ways that they aren't.

> Again, "this" is just an identifier here, and doesn't have any special meaning.

But the grammar is no longer LL(1), and you have weird conditionally valid syntax, like `.` in a function name is valid only in a class block.

> "def foo(x)" has the same meaning both inside and outside of a class declaration

This is a stretch, especially since you're now optimizing for the uncommon case. Staticmethods are rare compared to instance methods (I'll go further and claim that static methods are an antipattern in python, modules are valid namespaces, you can stick a function in a module and couple it to a class also defined in that module and nothing bad will happen. Banning staticmethods entirely doesn't reduce expressiveness). Aligning staticmethods with functions, instead of aligning instance methods with functions (as python does currently) encourages you to do the wrong thing.

> classmethod

Your changes don't affect classmethod at all, if anything they'd make classmethod more of a special case. How do you signal that `self.foo()` takes `self` as the class instead of the instance?

> It's especially annoying when you have class attributes that happen to reference a function, because conversion from functions to methods is a runtime thing - so you have to remember to wrap those in staticmethod() as well.

What do you mean? Like

    class Foo:
        @staticmethod
        def func():
            return 1
        a = func.__func__()
I'll say again: staticmethods are an antipattern in python:

    def func():
        return 1

    class Foo:
        a = func()
works just as well, better in fact. Modules are great namespaces. Classes are more than namespaces, and if all you need is a namespace, you shouldn't use a class.

> because conversion from functions to methods is a runtime thing

I'd also quibble with this: it's a binding thing.

    class Foo:
        def foo(self):
            return 1
        a = foo(None)
will work, and if you check, type(Foo.foo) is still just `function`, its only when you create an instance of Foo that the function `foo` is bound to the instance, and when that is done, the bound `foo` is converted to a method object. This was different in python2, where Foo.foo and instance.foo were both "instancemethod" objects, but in python3, Foo.foo is a plain old function, and instance.foo is a method.

Specifically this means that if you can get your hands on the `method` constructor (like with `type(instance.method)`), you can then do silly things like

    class A():
      def foo(self): pass
    instance = A()
    def f(self): return 5
    instance.func = type(instance.foo)(f, instance)
    assert instance.func() == 5
and this will work. You'll have bound the function to the instance. Of course, if you stick an attribute on `instance` (or `A`), and reference `self.attribute` in the function, this will still work. (this also lets you do things like bind a given instance of a function to a different instance of the class, but that's because the method constructor is essentially just partial with some bookkeeping for class information)


Python grammar is not LL(1) in general; just look at set and dict literals. This is really no different than ":" after an expression being legal inside a dict literal (and how you know that it is a dict literal).

But also, there's no reason to make those legal only inside classes. All it needs to do is make "def foo.bar" produce a different type of function, that has the method descriptor-producing behavior that is currently implemented directly on regular functions.

As far as less vs more common case - I think it's more important to optimize for obviousness and consistency. If "def foo" is a function, it should always be a function, and functions should behave the same in all contexts. They currently don't - given class C and its instance I, C.f is not the same object as I.f, and only one of those two is what "def" actually produced.

What I meant by function references inside classes is this:

   class Foo:
      pass

   Foo.bar = lambda: 123
   foo = Foo()
   print(foo.bar())
This blows up with "TypeError: <lambda>() takes 0 positional arguments but 1 was given", because lambda is of type "function", and it gets the magic treatment when it's read as a member of the instance. So you have to do this:

   Foo.bar = staticmethod(lambda: 123)
and even then this is only possible when you know that the value is going to end up as a class attribute. Sometimes, you do not - you pass a value to some public function somewhere, and it ends up stashed away as a class attribute internally. And it all works great, until you pass a value that just happened to be another function or lambda.

On the other hand, this only applies to objects of type "function", not all callables. So e.g. this is okay:

   Foo.bar = functools.partial(lambda x: x, 123)
because what partial() returns is not a function. Conversely, this means that you can't use partial() to define methods, which can be downright annoying at times. Suppose you have:

   class Foo:
      def frob(self, x, y): ...
and you want to define some helper methods for preset combinations of x and y. You'd think this would work:

   class Foo:
      def frob(self, x, y): ...
      frob_xyzzy = functools.partial(frob, x=1, y=2)
      frob_whammo = functools.partial(frob, x=3, y=4)
except it doesn't - while frob_xyzzy() and frob_whammo() both have the explicit "self" argument, they aren't "proper" functions, and thus that argument doesn't get treated as the implicit receiver:

   foo = Foo()
   foo.frob(x=0, y=0)    # okay
   foo.frob_xyzzy()      # TypeError: frob() missing 1 required positional argument: 'self' 
   foo.frob_whammo(foo)  # okay!
Which is to say, this all is a mess of special cases. You can argue that this all isn't really observable in the "common case" - the problem is that, as software grows more complex, the uncommon cases become common enough that you have to deal them regularly, and then those inconsistencies add even more complexity into the mix that you have to deal with - just when you already thought you had your plate full.


> Python grammar is not LL(1) in general; just look at set and dict literals. This is really no different than ":" after an expression being legal inside a dict literal (and how you know that it is a dict literal).

Yes it is[0]. LL(1) Grammars can still be recursive, they just can't change the parsing rules based on distant context.

> As far as less vs more common case - I think it's more important to optimize for obviousness and consistency

Yes, but having the "easiest" thing you do:

    class Foo:
        def bar():
            pass
silently do a usually unwanted thing (create a staticmethod) instead of an obviously wrong thing (raise an error) isn't obvious. It's building a footgun into the language.

The rest of your comment complains about inconsistencies of how python converts various callables to methods. This is a fairly valid and interesting complaint, but has nothing to do with syntax, it is solely a semantic complaint that would be solved by having class creation treat all attributes that are callables as functions. In fact, you could customize class creation yourself this way using __new__, no syntactic changes required.

> as software grows more complex, the uncommon cases become common enough that you have to deal them regularly

While this is true, I think you vastly overestimate how common these constructs are. Like, you're in the realm of "this doesn't appear on github" levels of uncommon.

Personally, again, I think "you can't use partial() to define methods" is a very good thing: if you're doing this, you're into weird metaprogramming land. It's not any harder to, for example, write out

    class Foo:
      def frob(self, x, y): ...
      def frob_xyzzy(self): return self.frob(x=1, y=2)
      def frob_whammo(self): return self.frob(x=3, y=4)
unless you're doing weird metaprogrammy magic and then, as someone who does a lot of weird metaprogrammy magic

1. You deserve what you get

2. You can invoke deeper magic to solve these problems

[0]: https://discuss.python.org/t/switch-pythons-parsing-tech-to-...


Why is it an unwanted thing? By the same logic that demands "self" to be explicit, if you don't specify "self", then you explicitly don't want the method to be an instance method - seems very straightforward to me. And it's still invokable via instance member access, so the user of the class is none the wiser. Where's the footgun?

My complaint is of course more complicated than the syntax alone. I'm just saying that a distinctive syntax for self-as-receiver could be used to drive other changes that would make things more intuitive and self-consistent overall. And, of course, any discussion about "elegance" is going to be inherently subjective.

With respect to commonality of various constructs - this is all from personal experience writing Python code (for a project that is hosted on GitHub, by the way). It doesn't require particularly fancy code to trip that wire in general - it just requires code that tries to be generic, i.e. not make more assumptions that it needs to about the types of values that flow through it. Python classes break that genericity by treating functions, and only functions, in a special way whenever they flow through class attributes. This is particularly egregious in a language where every object is potentially callable, and non-function callables are very common; so functions really aren't all that special in general - except for that one case.

With partial() specifically, of course you can avoid that in this manner. But why would you, if it works with regular functions? I prefer it over defs, not just because it's more concise and avoids repeating things, but because it's also clearer - when you see partial(), it immediately tells you that it's a simple alias, nothing else.

But regardless, it's a function that has a certain documented behavior, and common sense would dictate that this behavior works the same for methods as well as functions. That it doesn't is not an intentional limitation of partial() - it's an unfortunate quirk of the design of methods themselves. And if you don't know exactly how functions become methods in Python, you wouldn't have any reason to expect the behavior that it exhibits. That's why the docs for partial() have to spell it out explicitly: "partial objects defined in classes behave like static methods and do not transform into bound methods during instance attribute look-up" - because that's not a reasonable default assumption!

The bigger problem is that every library that offers a generic wrapper callable has to add the same clause to its docs, because they're all affected in the same way. And if they don't document it, and you use, say, a decorator from a library - how do you know whether the fact that it returns a function and not some other callable is part of the contract that you can rely on, or an implementation detail? Conversely, whenever you implement a decorator, you have to be cognizant that changing the type of the return value from/to plain function can be a breaking change for your clients - and that is even less obvious.


> Why is it an unwanted thing?

Let's take the following code snippet as an example:

    class Foo:
      def bar(val):
        foo = val
I claim that most of the time this is a mistake, and the author would have preferred `def bar(self, val): self.foo = val`. In current python, this will raise an exception when called. In your proposed python, this will silently do nothing, possibly leaving the instance in an invalid state. This is a footgun. I admit the example is contrived, but forgetting `self` is a thing I've seen happen, and having it fail loudly is preferable to having it do something likely unintended. Again, if someone wants to do the unusual thing, `@staticmethod` is still around.

> With partial() specifically, of course you can avoid that in this manner. But why would you, if it works with regular functions?

Simply: because I'd prefer it if functions look like functions. Understanding that `def x` is a callable is easier than trying to discern if `x = foo(other_thing)` results in x being callable or not, where it does for some values of `foo`, but not for others. Which isn't to say that python shouldn't make this change, I think I mostly agree with your complaint, I just probably wouldn't take advantage of it.

> My complaint is of course more complicated than the syntax alone.

To be frank, I don't see any connection between your syntactical suggestions and your semantic ones. They seem to be entirely orthogonal.


The thing is that that doesn't really match the calling syntax, it matches the syntactic sugar for the calling syntax. Whenever you do

    thing_instance.foo(bar, baz)
you're secretly calling

    Thing.foo(thing_instance, bar, baz)
Which succinctly explains where the self argument comes from.
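Concretely (class and method names invented for illustration):

```python
class Thing:
    def foo(self, bar, baz):
        return (self, bar, baz)

thing_instance = Thing()
# The two spellings do the same thing; the instance simply becomes
# the first positional argument.
assert thing_instance.foo(1, 2) == Thing.foo(thing_instance, 1, 2)
```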


Wow, this is a lot more intuitive. gowld for BDFL!


The first two of those are explicit design decisions: "explicit is better than implicit" and "we're all consenting adults" respectively.


There's a fine line between "explicit is good because it means how things behave is obvious" and "explicit is bad because it means typing the same boilerplate over and over again."

Not saying Python fell on the wrong side of that line, just that it's an easy line to end up on the wrong side of.


You're free to use `_` (or whatever) instead of `self`, if you find `self` to be much repetitive boilerplate. It's only a convention.


Sadly the linters went for `self`.


Most things in a language are explicit design decisions. Doesn't automatically make them good decisions.

Actually Python is one of my favorite programming languages, probably the language with the closest mapping to how I naturally think about a problem. I really like it. But I'm also willing to admit it has some warts, as does any language.


Python has warts, but IMHO it's weird to identify valid design choices, with pros and cons on each side, as "warts".

If I were listing Python warts, I'd point to things like single-element tuples (1,), the `datetime` support (timezone-naive datetimes are an ambiguous disaster), or the cpython Global Interpreter Lock.


Yep, complaining about self is kinda silly when there are real things to complain about.

My editor supports snippets so moderate python boilerplate is not a problem.


> The nuisance of having to add self as parameter to every class method

This was done intentionally, because "Explicit is better than implicit". It also has some uses, eg. if you want to do this:

    class Foo:
        def inject_bar(self):
            def new_bar(self2):
            pass # you can refer to both 'self' and 'self2' here
            other_object.bar = new_bar
It's rare, but it has its uses.


> "Explicit is better than implicit"

Too bad Python breaks this "commandment" pretty much whenever it wants to.


Yeah, The Zen of Python has commandments to justify pretty much anything. eg. "Simple is better than complex." or "Readability counts." when you want something to be implicit


"Practicality beats purity"


FWIW, I wish 'this' were explicit in C++.


You know that double underscore is basically making a method private, right? You can work around it to access the method anyway, but you can do that as well with Ruby.


first is a design decision, felt weird at first but after time I kind of like it, there's less magic behavior happening.

The second is not true: if you add methods with a double-underscore prefix (also a design decision) they do behave like private methods. The real method name is mangled with the class name so you won't get accidental conflicts, and you can still access it for debugging purposes.


I think having the receiver as an explicit parameter of methods, rather than referenced through a special syntactic variable, bothers some people and makes Python look to them like a procedural language playing OO. Personally, though, I think it's the clearest representation of what every class-based OO language actually does operationally, since methods are attached to a class and take a reference to the instance on which they are invoked.


It may in fact be what every class-based OO language actually does operationally. But having the OO language do that for you is, to me, one of the lines that separates them from "a procedural language playing OO".


To me, it makes sense for a class-based OO language with first-class functions and (unlike, e.g., Java) available direct external access to data members to do what Python does because it reinforces the distinction between function members and methods.

Perhaps more importantly, it makes even more sense in a language like Python where classes are (unlike many, especially statically-typed, class-based OO languages) first class objects to do what Python does, because of the relation of methods to classes. It also, to me. makes unbound/bound methods slightly more intuitive.

Now, I too was initially thrown by it because I'd used a bunch of OO languages that did it the other way first, and for quite a long time.


How would you describe OO languages which only have multimethods, then? Say, Dylan?


Don't know enough to say.


I myself, speaking for my self and only my self find that, in regards to myself, the redundant self keyword, in my self's opinion, is somewhat selfish and easy for my self to accidentally omit. Self.


I personally really like the self keyword after spending some time reading code in Java where `this` is optional. Having to reference self makes it more clear where data is coming from and makes navigating an unfamiliar code base much easier. I'm a fan of forcing devs to acknowledge when they are accessing or manipulating mutable object state.

Separate from that, I think that would be a much bigger breaking change than you think. __getattr__ and __getattribute__ mean that self.y doesn't necessarily refer to a traditional attribute, so without self you could end up with either:

1. `x = y` where the expression `y` executes code

or

2. `x = y` where `x = ???` and `x = self.y` where `x = self.__getattribute__('y')`.
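As a sketch of the first case (the `Config` class here is hypothetical): `__getattr__` lets a bare attribute read execute arbitrary code, which is exactly what a mandatory `self.` makes visible at the call site.

```python
class Config:
    def __getattr__(self, name):
        # Called for attributes not found through normal lookup,
        # so reading self.anything can run arbitrary code.
        print(f"computing {name!r}")
        return 42

c = Config()
x = c.timeout   # prints: computing 'timeout'
print(x)        # 42
```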

That said, I do think the language could use something to reduce the size of __init__(), since

    self.arg1 = arg1
    self.arg2 = arg2
    self.arg3 = arg3
    ...

can get pretty verbose


>Having to reference self makes it more clear where data is coming from and makes navigating an unfamiliar code base much easier.

I don't understand this. Aside from people seeing OOP for the first time in their lives, are they getting confused about where "this" comes from?

How would an implicit "self" a la "this" make it any less understandable where the data come from?

How is passing self making "navigating an unfamiliar code base much easier"? Aside from total newbs who see an OO codebase with an implicit instance variable for the first time?


I believe the original commenter was more referring to the lack of 'this' usage in Java.

So instead of having:

    this.x = y
People tend to do:

    x = y
Which isn't always the most clear that it's referencing an instance variable rather than a scoped variable, especially if you see it midway through a method.


Ah, alright then. I'd go with making this mandatory in that case.


To your last point, check out Dataclasses[1].

    @dataclass
    class Foo:
        arg1: str
        arg2: int
        arg3: int = 5  # default value
I use them a lot in my 3.7+ projects. It helps reduce the boilerplate for typical classes. (Where for me, typical classes are a 1:1 mapping of inputs to attributes)

And if you need to compute some values at object initialization, then they have a __post_init__() hook you can use [2].

[1] https://docs.python.org/3/library/dataclasses.html [2] https://docs.python.org/3/library/dataclasses.html#post-init...
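For the computed-values case, a small sketch of the `__post_init__` hook (class and field names invented for illustration):

    from dataclasses import dataclass, field

    @dataclass
    class Rect:
        width: float
        height: float
        area: float = field(init=False)  # not a constructor argument

        def __post_init__(self):
            # runs right after the generated __init__ finishes
            self.area = self.width * self.height

So `Rect(2, 3).area` is `6` without the caller ever passing it.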


Java fields are declared in the same lexical scope as their use sites (except for inherited fields, but Python has the same magic in that case.)


My experience with Python is a little lacking, but doesn't the self keyword mean the difference between an instance method, and a static method?

Rust does the same thing - with self, it's an instance method, without, it's a static method.


Python has an "@staticmethod" decorator you can put on a function inside of a class definition. Then you don't need to pass "self" as a first arg.

https://www.geeksforgeeks.org/class-method-vs-static-method-...
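A quick sketch of the three flavors side by side (class and method names are made up):

    class Greeter:
        default_name = "world"

        def greet(self, name):        # instance method: first arg is the instance
            return f"hello, {name}"

        @staticmethod
        def shout(name):              # static method: no implicit first arg
            return name.upper()

        @classmethod
        def greet_default(cls):       # class method: first arg is the class itself
            return f"hello, {cls.default_name}"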


"self" is not a keyword in Python, just a convention.

Any function inside a class declaration becomes an instance method, with its first argument becoming the explicit receiver. You have to use @staticmethod to prevent that (or @classmethod to get the class as the first argument, instead of the instance).

Furthermore, this behavior is not parse-time, but runtime. The "def" statement that defines a function produces a plain function object, regardless of whether it's inside a class or not. Once the class body finishes executing, the function is still a function - which is why C.f gives you a plain function that can be called by passing "self" explicitly as an argument.

However, Python allows objects to provide special behavior for themselves whenever they're retrieved via a member of some class - that is, when you write something like x.y, after y is retrieved from x, it gets the opportunity to peek at x, and substitute itself with something else. Function objects in Python (here I mean specifically the type of objects created with "def" and "lambda", not just any callable) use this feature to convert themselves to bound methods. So, when you say x.f, after retrieving f, f itself is asked if it wants to substitute something else in its place - and it returns a new callable object m, such that m(y) calls f(x, y). That's what makes x.f(y) work.
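That substitution is directly observable (a minimal sketch):

    class C:
        def f(self, y):
            return (self, y)

    c = C()

    # C.f is still the plain function; c.f is a bound method created on lookup
    assert C.f(c, 1) == c.f(1)
    assert c.f.__func__ is C.f     # the bound method wraps the original function
    assert c.f.__self__ is c       # ...and remembers the instance it was read from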


Self lets you do really cool things though. For instance, you can monkey-patch with `def override_method(arg_that_will_get_self, *args): pass; Class.method = override_method` after Class is defined. Python lets you have all the dynamism of Ruby without letting you go too crazy with DSLs - which is the perfect balance IMO.
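Spelled out as runnable code (class and function names invented for illustration):

    class Dog:
        def speak(self):
            return "woof"

    # defined entirely outside the class, after the fact
    def speak_n_times(the_dog, n):
        return " ".join(the_dog.speak() for _ in range(n))

    Dog.speak_n_times = speak_n_times  # monkey-patch: now an instance method

    # the first parameter receives the instance, whatever it's named
    assert Dog().speak_n_times(2) == "woof woof"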


To be pedantic "self" is not a keyword.


Worse than merely not being a keyword, `self` isn't special at all: it's a naming convention that can be freely broken. Python codebases using `this` instead of `self` exist. Codebases that mix the two depending on who wrote which method in a class exist. I believe I saw `it` used once somewhere as the name for that parameter. I'm sure there are codebases with even crazier horrors for first-argument names in methods out there in the depths of corporate machines.


Python doesn't have a self keyword, redundant or otherwise.

In fact, that's a key difference between Python and many other class-based OO languages.

I'm not sure whether that (and the associated need to make the receiver an explicit parameter in method definitions—conventionally, though not necessarily, named "self") is your problem, or whether your problem is that (unlike some, but fewer, OO languages) you can't omit explicitly naming the receiver when referencing its own instance variables inside an instance method, which makes instance variable references syntactically distinct from local variable references.


Modern Python lacks coherent design:

How do you define enums? with a superclass.

How do you define data classes? With a class decorator.

And then there's metaclasses too.


Enums are technically defined with a metaclass, as are abstract base classes and most other weird special classes (but in all cases you access the metaclass by subclassing something). Dataclasses are the one exception, and they do this for pragmatic reasons: multiple metaclasses are tricky. So a decorator allows the existence of an abstract dataclass, for example.
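The two mechanisms side by side, as a small sketch:

    from dataclasses import dataclass
    from enum import Enum

    class Color(Enum):   # customization via metaclass, reached by subclassing
        RED = 1
        GREEN = 2

    @dataclass           # customization via a plain class decorator
    class Point:
        x: int
        y: int

    # the Enum metaclass is what makes Color iterable, non-instantiable, etc.
    assert type(Color) is not type
    assert [c.name for c in Color] == ["RED", "GREEN"]
    assert Point(1, 2) == Point(1, 2)  # __eq__ generated by the decorator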


> The only real killer feature of Python3 is the async programming model.

I understand that this is one of the major features, but I personally never saw the appeal, given that gevent exists and in my experience works well most of the time. It also allows me to multiplex IO operations and doesn't rely on new syntax. I'm probably missing something?


In gevent, any call may behave like an implicit "goto". Here's why "goto" is bad https://vorpus.org/blog/notes-on-structured-concurrency-or-g... (the link is stolen from @hermod's comment above)


I think the nursery construct described in that link looks a lot like par blocks in Occam, Handel-C and XMOS XC.


I find it much easier to reason about my async Python code when the async yield points are explicit. If every line can potentially yield to another async task, I find myself thinking long and hard about a lot of those lines.


For me the major gain in Py3 is exactly the sane handling of strings, that fixes the major flaw of Py2: weaking the String type to make the migration to UnicodeStrings "easy".


It's funny, because I consider asyncio (and derivatives like curio) to be the worst part of Python 3. There are plenty of other compelling reasons to move over. No, none of them revolutionize the language, but there are fewer warts and cleaner ways of doing many things.


On the topic of not enough carrot, I’m curious how impactful the end of support for python2 will be. How many programs stuck in python2 are encountering bugs in the runtime?


I actually find this aspect of the whole thing exciting, perhaps paradoxically. Now that Python 2 is static, the runtime can become asymptotically bug-free. Meaning that, if you only change it to fix bugs (as opposed to introducing new syntax or semantics), it's going to approach perfection.


You are assuming that fixing bugs introduces less new bugs.


Who will do that though?



I've considered taking it on. If there were enough money in it, I'd very seriously consider it. It's a very good language.


In ten years, there will be serious money in it as enterprise codebases need to be renewed.


Python itself won't be as much affected; what will affect you most are your dependencies. Imagine using a library and running into some kind of bug. You check a newer version, and it looks like that bug was fixed, but the library now only works on Python 3.6+. What will you do?


Backport the fix from 3.6, or just implement a fix yourself?


Yes, that's one option. It will be feasible initially, but it will get exponentially harder.


And the side-effect of an endless trail of shit to deal with. I write primarily ML code so until 3.5 or 3.6 python 3 offered me almost nothing better. It did, however, make managing environments even more of a mess. I quite frankly resented having to deal with code in both, swapping back and forth between python2, python3, pip2, pip3, etc.


- format strings

- mandatory keyword arguments

- multi-dict splatting

- nicer yield semantics for generators

- Fixing system-specific encoding ambiguities

- dataclasses

- inline type annotations

- better metaclass support

- more introspection tooling

- pathlib (for nicer path handling)

- mocking pulled into the standard library in a cleaner way

- stable ABIs for extensions

- secrets handling

- ellipsis instead of pass (yeah who cares but I care)

- lots of standard lib API cleanup

All of this is very helpful for making clean applications. But I would say it's _very_ helpful for making good libraries as well. This stuff is about having a strong language foundation to avoid plain weirdness like the click issue.

Obviously it doesn't kill all of them, but there used to be even more of that kind of thing all the time. Library issues would basically get exported to its users, all basically due to language problems.
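A few of the items above in one hedged sketch (function and variable names invented for illustration):

    # keyword-only arguments: everything after the bare * must be passed by name
    def connect(host, *, port=5432, timeout=10):
        return f"{host}:{port} (timeout={timeout})"   # f-string formatting

    defaults = {"port": 5433}
    overrides = {"timeout": 30}
    # multi-dict splatting: merge several dicts into a single call
    conn = connect("db.local", **{**defaults, **overrides})
    assert conn == "db.local:5433 (timeout=30)"

    # nicer generator semantics: yield from delegates to a sub-generator
    def chain(*iterables):
        for it in iterables:
            yield from it

    assert list(chain([1, 2], [3])) == [1, 2, 3]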


I ran into an issue recently specific to Pandas and python3 with unicode.

pd.read_excel(filepath) will read an entire dataset even if it contains unicode characters.

pd.ExcelFile() silently drops(!!) unicode rows. The resulting object will simply skip unicode-containing rows (in ANY column) without even a warning.

For example, if you had an excel file:

    word
    ---
    "hello"
    "hello"
    你早
    你早
    "hello"

then pd.read_excel() would give you a dataframe with 5 rows. ExcelFile() on the other hand would return (silently!) a dataframe with only the first two and the last row.

Maybe this is a pandas issue, not a python issue, but it was really horrendous to debug for such a long time only to realize this was the issue.


Is there an issue for this in the bug tracker?


I searched, and this is the closest thing (https://github.com/pandas-dev/pandas/issues/11503) but it is not the issue I experienced.

I'm not sure how to submit a bug report, to be honest.


The main change people cite is the UTF-8 "support" but frankly it seems like far more pomp and circumstance than is necessary. Appropriate code point handling could have been provided and everything left as it was. And when you get rid of that, there's not that much left that is breaking.


I find the stronger differentiation between bytes and strings leads to a lot of gotchas, like when I forget to encode bytes or pass a string where bytes are expected.

I understand why it's the way it is, but when it comes to the typical unixy things I need to do (shuffling files around, tar'ing stuff, etc.), it definitely trips me up more than I'd wish.


You at least notice it's wrong. In Python2 sometimes 'it worked' until it broke and you had to figure out why.


See, it just broke when UTF-8 was interpreted as ASCII. It's entirely possible to treat bytes as bytes and leave encoding out of it for the vast majority of programs. If you're dealing with text editing and so on, then you know you need to be UTF-8 aware, and the broken programs would still be broken in either language.

The visibility of the errors is a minor point, but I think it more appropriate that it be solved by e.g. the windowing toolkit API.


> the broken programs would still be broken in either language.

You need to slap a decode anyway on reads from subprocesses in python3, and files open in Unicode mode by default. Wouldn't that fix the majority of silly UTF-8 compat bugs? Or am I missing a class of bugs that's not avoided automatically by python3 strings?


Well, the summary of the argument is that the python3 UTF-8 does not actually solve the fundamental problem of multiple encoding formats existing. Think: Do you know that the process actually returns UTF-8, or that the file is actually encoded in UTF-8? No, you're just guessing. This puts people in the habit of attempting to turn everything into UTF-8 which could happen automatically and not require so much boilerplate.

On the other end, most programs don't actually care what the data encoding is. They just move it.


> Think: Do you know that the process actually returns UTF-8, or that the file is actually encoded in UTF-8? No, you're just guessing.

Well, no, not really. You go read the docs and try to find out. Most of the time, there is a definitive encoding - if there weren't, a lot more things would be broken. Sometimes, it is not guaranteed, even though de facto that is the case - and this highlights broken interface specifications. When it is truly unknown, you explicitly treat it as raw bytes.

And the good thing about Python 3 is that it forces you to think about this. In Python 2, most of the time, data processing code can be hacked together, and it "just works", right until the point the input happens to include something unanticipated. Like, say, the word "naïve".

> On the other end, most programs don't actually care what the data encoding is. They just move it.

It doesn't necessarily mean that they get to dodge the bullet. In Python 2, if you read data from a file, you get raw bytes, but if you read data from parsed JSON, you get Unicode strings - because JSON itself is guaranteed to be Unicode. Guess what happens when the byte string you've read from the file, and the Unicode string you've read from a JSON HTTP response, are concatenated?
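The failure mode is easy to reproduce; in Python 3 the same mix fails loudly up front instead (a minimal sketch):

    import json

    from_file = b"prefix: "                              # bytes, as read from a binary file
    from_json = json.loads('{"w": "na\u00efve"}')["w"]   # str: parsed JSON is always Unicode

    # Python 2 would silently coerce here and explode later on non-ASCII input;
    # Python 3 raises immediately, forcing an explicit encoding decision:
    try:
        result = from_file + from_json
    except TypeError:
        result = from_file + from_json.encode("utf-8")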


I recently went through a fairly large upgrade from JDK8 to JDK 11 and it was a bit of a pain -- lots of dependencies to update, etc. But very few code changes were required, and the static type system made it pretty clear when the codebase was broken -- it just wouldn't build. It still took my team several weeks.

Migrating from Python 2 to Python 3 is way worse than that -- code changes are required, and because Python is a dynamic language you may not notice bugs until you actually run the code (or even worse, until after you release it to production and some code branch that is rarely invoked somehow gets called...). In other words, the tooling and the type system are not confidence-inspiring and it's really hard to verify that you migrated without breaking stuff.


Will this make people less likely to use Python in future for some projects? I'm no developer or manager, but I figure there'd be a thought in the back of my head: "what if we need to rewrite this all again for the next major release?"


I don't think it will make people less likely to use Python overall, but for certain categories of projects there are some things that are just done better in other languages. Both Java and Go for example have specific programming language constructs and contracts to ensure that codebases can move from version to version easily and I do not feel like there is a dynamic/untyped language that really provides that same level of stability. The ones that do get close to that are the ones that are extremely small in scope and are not in the same class of batteries included language, and because of their small scope they have a general limit of change.

At a certain point this sort of compatibility/forward motion of a codebase through big language revisions has to be designed into the language itself. Either you can break the migration into chunks small enough to chew through piecemeal (updating one submodule to the new language without affecting anything else), or it's completely transparent to the code being run (as with C compilers supporting different standards), or there's a version-to-version automated rewriting mechanism so reliable that its output is never in question (tools like Go's gofix). Python, in my opinion, has only partial solutions to each of those, so it turns into a lot of hand work.

So while other languages may do other things better, there is still a class of programs that are very effective to write in Python, and that's plenty enough reason to keep it around. Do not forget that Python 2 was released in 2000 and Python 3 almost a decade later. On that time scale, worrying about the next big release is moot for many people; those who do worry start considering other languages, because stability matters to them.


What I've been seeing is people using Python 3 for new projects but leaving their older projects on Python 2. As a result of that, Python 2 will continue to be supported internally at a lot of companies, even though it is "officially" end of life. Nobody wants to rewrite the stuff they finished years ago that is now in maintenance mode.


I considered doing this myself, but decided that I didn't want the hassle of having two versions of Python hanging around.

If/when the day comes that using Python 2 isn't realistic, I may go with 3, or I may choose a different language, depending on the project. I'll cross that bridge when I come to it.


You're not gonna have to rewrite it in a few years. Languages only go through the mess of retrofitting unicode once.

Besides Java and Python already discussed, another big mess of a transition was from Qt 4 to Qt 5, where all the strings became unicode.


Rewriting your code once every 20 years is a ferrari problem.


Pyflakes, unit tests, and type annotations will find the vast majority of such bugs in a large program. If you used logging instead of print it’s even easier.

