Why I'm Making Python 2.8 (naftaliharris.com)
241 points by cpeterso on Dec 10, 2016 | 379 comments



> And the majority of Python code written does not run under any of the 3.x interpreters. This makes it harder for its users to be productive.

What a load of bollocks. For new projects this only matters if libraries aren't ported, which they are for the most part. For old projects, either you're in a situation where you can spend time porting your code to Python 3, or you don't; but as TFA mentioned pep-404, the writing has been officially on the wall ever since 2011 so at that point you have to admit you did choose to incur tech debt and do nothing about it, so the claimed loss of productivity is on you.

> Unlike 2.7 code, Python 2.8 wouldn't be able to guarantee exact 3.x compatibility, since there are some python scripts that will run under both Python 2.7 and Python 3.x but produce different output, and Python 2.8 chooses the 2.7 behavior in these cases.

What a terrible, terrible situation. Now you'll have "python" code that will neither run on 2.7 nor run compliantly on 3.x. As for the latter, please explain how that will alleviate anything on the following point, since behaviour at runtime will be subtly different:

> adding these remaining Python 3 features would greatly simplify running code targeting Python 3, and allow people to use Python 2.8 to run a mix of Python 2 and 3 code.
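For anyone wondering what "subtly different" looks like in practice, here is a minimal sketch. It is my own illustration, not taken from the article, and the function name `version_sensitive_results` is made up. Each expression is valid under both 2.7 and 3.x, but its value depends on which interpreter runs it:

```python
def version_sensitive_results():
    """Expressions that are valid on both 2.7 and 3.x but evaluate differently."""
    return {
        # Integer division: 0 on Python 2.7, 0.5 on Python 3.x.
        "1 / 2": 1 / 2,
        # Rounding ties: 3.0 on Python 2.7 (away from zero), 2 on 3.x (to even).
        "round(2.5)": round(2.5),
        # map(): a list on Python 2.7, a lazy iterator on Python 3.x.
        "map is a list": isinstance(map(str, [1, 2]), list),
    }


if __name__ == "__main__":
    for expression, value in sorted(version_sensitive_results().items()):
        print(expression, "->", value)
```

An interpreter that keeps the 2.7 result in each of these cases will quietly disagree with any 3.x code that depends on the other result.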

I don't know what recourse the PSF has but maybe they should even go all in and defend the "Python" name so as to prevent confusion and stop a potential community fracture. Just call it anything else but "Python 2.8" is not Python.


> I don't know what recourse the PSF has but maybe they should even go all in and defend the "Python" name so as to prevent confusion and stop a potential community fracture. Just call it anything else but "Python 2.8" is not Python.

This! A thousand times! I love open source and free software. I absolutely love the fact that you can fork the code and adapt it to your needs. If you find others who like it great! But please, don't use the Python name! It will create more confusion than help. This fork with a different name is 100% fair in my opinion.


Not only should the PSF intervene to prevent someone else from using the Python name, they legally must intervene if they wish to keep their trademark ( see http://privacyandip.blogspot.com/2011/10/common-questions-wh... ).


contribute, then, instead of complaining:

https://github.com/naftaliharris/python2.8/issues/47


(Replying to the top-ranked comment so that as many people as possible see it)

While I wish Naftali well in his efforts - I have a private Python-derived language myself! - this is not "Python 2.8." For trademark purposes, "Python" is only what is released or endorsed by the PSF.

We have already reached out to Naftali and asked him to change the name of his project and update this blog post accordingly.

Obviously, though, this is someone who cares a lot about Python, so let's be sure not to rain down on him with a lot of scorn; I admire that he was willing to sit down and 'scratch his own itch.'

Source: I am the General Counsel of the PSF.


"I don't mind renaming this project. Any other suggestions for good names? I personally like "Pythonesque (/usr/bin/pesque)" the best so far, thanks @dbohdan! :-)"

  - The Author (who isn't me)
    https://github.com/naftaliharris/python2.8/issues/47#issuecomment-266240525


Scorn? He's doing the CPython maintainer's job for them. He should be universally praised.


>What a load of bollocks. For new projects this only matters if libraries aren't ported, which they are for the most part. For old projects, either you're in a situation where you can spend time porting your code to Python 3, or you don't; but as TFA mentioned pep-404, the writing has been officially on the wall ever since 2011 so at that point you have to admit you did choose to incur tech debt and do nothing about it, so the claimed loss of productivity is on you.

I call BS (to counter your "bollocks").

Whether the "writing was on the wall" or not, doesn't change the fact that people had to actively port their old code if they wanted it to run on 3.

Sometimes that code could run into the tens of thousands (or even millions for large companies) of lines.

And why would they do it (and at a great cost and time effort)? For the marginal improvements Python 3 brings?

The "writing has been on the wall" is not an excuse, it's mostly blackmail ("port or else you won't run on 3, and we'll stop the 2.x line"). And most people didn't (and shouldn't) fall for that.


This argument is a reasonable one and is why we all support IE6 for web dev.

However, at a certain point it is worth your time to move forward instead of doing nothing, you gain a little time savings now and you run into few moments of "Oh @#$%^!!" later. Real world example: You don't bother updating ssl to deal with weak DHE and suddenly chrome users can't see your payment site.

My approach has always been to try to front load the work instead of doing it in crisis mode later. It sorta sucks but that's just how software is right now.


Except Python 3.x is bringing controversial changes and was slower than Python 2.x for several years.

If IE7 was slower than IE6, you can't blame people for not moving over.


All changes are controversial. In a large enough group, there's no way to make everybody happy. The question is whether the new arrangement makes more people happy long term. Judging by the rate of Python 3 adoption, it took a long time indeed, but it got there.

As for Python 2... well, there are still people signing petitions for Microsoft to bring back VB6. Last one was this year, I think.


There's no controversial changes in python3? Except if you consider print() controversial but that's so silly it's laughable.

There are however non backwards compatible changes, like unicode by default. IE7 was also non backwards compatible, so the comparison still holds (with the exception that IE had a compatibility mode if you sent some magic HTTP headers).


It can be silly, but that was one of the reasons I picked Ruby over Python 5 years ago for a project. I felt at the time, Python is awesome, however they are taking a weird path.


Do you still feel that way or anything changed?


As a different person who made the same choice for the same reason at the same time I think I was very right then to avoid the whole mess.

It seems fine now, but there is nothing, as far as server side application coding is concerned, that would make me want to switch from Ruby/Rails.


> it's mostly blackmail ("port or else you wont run on 3, and we'll stop the 2.x line").

Would you also call the RHEL life cycle a blackmail? I'm using version 5 now and the normal support ends in March 2017. My options now are "port or pay extra for extended life cycle or else my RHEL will be without security fixes". And like Python, major RHEL versions break backwards compatibility.


And you went into that RHEL relationship with full information ahead of time. So, your comparison is terrible.


Anyone who started a Python project in the last TEN YEARS had full information ahead of time - pep 3000 came out in April 2006.

Honestly, that 10 years later we're still having this conversation is ridiculous.


>Would you also call the RHEL life cycle a blackmail?

The RHEL life cycle is based on real business needs (and a real business need to balance between newer releases/features and stable environments).

Not on some decree from above that "you should use this new thing".


Yes, and that's why you pay for it. Yet here is somebody complaining about that.

If you e.g. can't be bothered to do continuous integration or automated testing, then you might consider RHEL with its life cycle to be an acceptable alternative. Which is fine. Just be ready to pay for that service.

Similarly, if you wanted continued Python 2 support, you could have donated time or money towards that goal. I would be surprised if anybody complaining did that. There's just not that much business value in dragging legacy Python further along.


"The RHEL is based on...some decree from above"


We have several large projects that are written in Python. Most of these are production applications that are critical to what we do, and the others are libraries and tools for internal work. We haven't even started thinking about porting these to Python 3. We have so many other things to worry about (bug fixes, new features, etc.) that it's hard to justify the time investment to port these now. I can't imagine we're the only ones in this situation.


You're not, of course. And there are similarly many people running production critical code written in Perl 5 on RedHat 9 or something like that. "If it's not broken, don't touch it" is a wise rule to follow for that kind of stuff.

But to keep it running, you don't really need Python 2.8 with new features, right? You need extended support for Python 2.7 - basically, making sure that it keeps working with updated versions of other software (like OSes), and that bugs are fixed.


>But to keep it running, you don't really need Python 2.8 with new features, right? You need extended support for Python 2.7 - basically, making sure that it keeps working with updated versions of other software (like OSes), and that bugs are fixed.

Those systems are not just sitting there untouched.

Heck, not even 70s COBOL systems are "just sitting there" (they are hooked to newer systems, get new forms, have alterations, etc. all the time), and those Python 2.7 systems were written 10-15 years ago or less.

And they continue to get new subsystems, new features, alterations, etc. In 2.7.

So, yes, people would very much like to get not just "extended support for 2.7" but also the ability to keep running it in newer versions, and be able to take piecemeal adoption of new features to make their life better and eventually organically refactor in their own timeline.


The "writing has been on the wall" is not an excuse, it's mostly blackmail ("port or else you wont run on 3, and we'll stop the 2.x line"). And most people didn't (and shouldn't) fall for that.

So let me get this straight.

1. A bunch of people you've never met and probably have never paid or financially supported,

2. Gave you a high-quality programming language, for free, to use for any purpose you liked,

3. And then when you and they disagreed about the best way forward in a new version, you claimed their refusal to continue supporting and adding new features to the old version for you, for free, essentially forever, constitutes "blackmail" on their part.

Do I have that right?


Let me get this straight:

1) You frame this as some single random individual on HN is the only one that is concerned with the switch.

2) You seem to have missed that companies and individuals that do dislike the switch have contributed to the Python ecosystem, from employing core developers in the past, to creating frameworks, libraries etc that helped Python succeed.

3) You have missed the fact that some (a lot? most?) of the concerned people have actually donated to the PSF through its PayPal donate link (as I've done in the past, and I've used Python since 1998).

4) You seem to think that an open source community project is pretty much "anything goes" and end users be damned. And then the team can complain about "lack of adoption" for the new version.

Do I have those right?


So, do you still think it's "blackmail" when something you were getting for free decides to no longer support the version you like?

Python 3 adoption has been rising for a couple years now as people realize that A) Python 3 is a quite nice language, B) porting to Python 3 is not as hard as people keep claiming it is, and C) Python 2 is going to run out of zero-dollar-cost support one day as the number of people willing to support it without being paid for their trouble diminishes.

If someone does want to commit to supporting Python 2 + backported Python 3 features, they are of course welcome to do so provided they observe the license and trademark terms (not terribly hard to do). But I suspect it won't last very long, at least not as a small-team zero-dollar-cost project. Between Python 3 gaining steam and people staying on 2 in order to avoid work and expense, I just don't think it's going to work out on the kind of decades-long horizon the Python 2 die-hards seem to want.


And this captures the good and the bad of open source all so succinctly.

Here we have a person (the author) who has rejected the path that an open source project has taken, and invested the time and energy to move the source along a path they prefer.

In this particular case, there is a natural constituency of people who share that desire but are unable or unwilling to put in the effort to push the source down the path.

When there is critical mass, that group forks off and begins to bring other people along to the alternate path.

At that point the people who endorsed the change in direction come out in force to yell at these people who aren't doing what they are supposed to and threaten them and implore a higher power to emasculate their effort.

Sometimes that works, sometimes it doesn't. But it always results in massive amounts of confusion when someone new comes to the community and sees these two different paths for the same thing and can't really figure out why they are different.

Further, because there is no mechanism for "righting" the ship as it were, the diverging paths lead to a lot of wasted time and effort on everyone's part. This happens to be a Python fork, but it's happened to window systems, video codecs, graphics libraries, databases, hell even C compilers.

The nice thing about a Cathedral is that the Pope keeps the Cardinals toeing the one and only line.

Reminds me a lot of the Perl 5 / Perl 6 debates.


> I don't know what recourse the PSF has

They own the Python trademark, so they can make him stop using it.

In fact, I really hope that they do. This project does no good.


The Python Software Foundation has an obligation to defend the trade name. If it fails to exercise it, then it can be denied in the future.

I don't expect litigation because that's not how the community rolls. I expect a polite message from gvr to the author asking to change the name.

If that fails, the PSF has no choice. It's use it or lose it.


> I expect a polite message from gvr to the author asking to change the name.

And he has - https://github.com/naftaliharris/python2.8/issues/47#issueco...


If that fails, the PSF has no choice. It's use it or lose it.

Or license it.


I came here to say exactly this ... you're better off upgrading your systems as you can. There's one point missing above - do you really want to use a version of Python that's maintained by one person and of unknown quality? You're better off staying on 2.7 if you can't afford to upgrade.


Yes totally agree. In the end this won't matter because the momentum of the ecosystem is so great at this point but it just baffles the mind that someone would think this is a good idea, especially with the "can't run on 2.7 or 3.x" situation.


> What a load of bollocks. For new projects this only matters if libraries aren't ported, which they are for the most part.

Except when they aren't. And then what?

I've run into this multiple times. Sometimes there's a branch of the project for 3 that's underway, and I sit and wait. Other times it means dropping the project or committing to reimplementing a library.


> the writing has been officially on the wall ever since 2011 so at that point you have to admit you did choose to incur tech debt and do nothing about it, so the claimed loss of productivity is on you.

When Python 3 was released, it offered Python users a trade: In exchange for a productivity loss (porting your Python 2 code), you'd get a productivity gain (new features in Python 3 and removed cruft). Some projects and companies thought this was a good trade and have upgraded over the years; many did not, and haven't. The interpreter I've been working on tries to improve on the terms of that deal for people who have not switched to Python 3.

> What a terrible, terrible situation. Now you'll have "python" code that will neither run on 2.7 nor run compliantly on 3.x.

That's the point, yes. Obviously any interpreter that's backwards compatible with 2.7 but includes new features from 3.x is going to let people write code that doesn't run under 2.7 or 3.x. But what does it matter if your code doesn't run under interpreters that you aren't using and don't intend to use?

> Just call it anything else

I'll change the name.


There's a lot of good names based on Monty Python properties, but I like "Cobra", starting at version 2.8. Ignoring the MP stuff and going for something on the snake theme.


I think after 5 years we should come to the realization that Python 3.x is not the future of Python. You can't blame people for trying to find a solution.


Except it kinda is the current version to lots of us. I moved to Python as a hobbyist from .net languages and loved the freedom of not having an IDE and working with Linux. The first decent book I read was on Python 3 so I learned Python 3. Lots of us 'newcomers' (not so new in my case) learnt on Python 3 and find perfectly good library support in Python 3. In fact the 'old guard' sound a bit like my Dad talking to me about old cars or pre-decimal currency. I just don't find myself hitting problems I can't solve on Python 3 that I could have solved under Python 2. To be fair I have a pretty minimal amount of code in production, but in each case it is not so monolithic that I couldn't have some of it using Python 2 and some using Python 3 or even some other language for that matter.

Python 3 is not only the future but the current version of Python. It is the version kids learn in school (in the UK kids do some CS from the age of 6 or 7, starting on Scratch and then normally Python), and it is the version colleges teach.

However there are lots of reasons enterprise users may want to use a legacy codebase. It is not like Python 2.7 is about to stop working! When a section needs a major re-write, then consider porting it. I don't see how this is different to any obsolescence problem. I know an enterprise software company that wrote a lot of stuff in VB6. Some of it is still in VB6 and they have to manage everything that means (especially around 64 bit architecture problems); when they do major updates they use .net. How can we be in the technology game and not just accept that life moves on?


The fact that there's a huge split in the community over the issue shows just how divisive Python 3 is. That said, 3 > 2 in version number doesn't make it better or more "current" (and I've seen a number of projects where the "latest" version wasn't even the latest; it was often an experimental one). Yes, you could do just about everything you need to in Python 3 that you can do in Python 2, except that it can be much more difficult depending on what you're doing. e.g. this guy -> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

I found it's easy enough to add the line:

# -*- coding: utf-8 -*-

to the top of my Python 2 files so I can get UTF-8. That, and Python 2 has the bindings for GTK (which I like to use). Both versions of the language have their usage, and to each his own. There's no sense in bickering about it.


I do think the split is in use-cases. I've worked in film at both small and very large places. You can get by very easily ignoring things like color management and frame rates when you're doing small, homogenous work. As soon as you need to take it seriously, the only real way to handle these things is to tag and/or convert these things at the perimeter so you can reliably handle things internally in a consistent way, then convert on the way back out. Even knowing this upfront and being highly motivated, it can take companies years to transition with pain in the meantime.

Text handling is the same. The thing is, many people just deal with ascii compatible English so they don't realize this is a problem for other people and aren't motivated to change. The reason both sides can't just do their own thing (i.e. Python2) is because of libraries and shared code makes it miserable for people using other character sets (most of the world or any company growing bigger than a certain size).
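The "convert at the perimeter" idea carries over to text fairly directly. A minimal sketch in Python 3 follows. The function names (`read_names`, `shout`, `write_names`) and the UTF-8 default are hypothetical choices for illustration: bytes are decoded once on the way in, everything internal works on text only, and encoding happens once on the way out.

```python
def read_names(raw, encoding="utf-8"):
    """Input boundary: decode bytes to text exactly once."""
    return raw.decode(encoding).splitlines()


def shout(names):
    """Internal logic: operates on text only and never sees bytes."""
    return [name.upper() for name in names]


def write_names(names, encoding="utf-8"):
    """Output boundary: encode text back to bytes exactly once."""
    return "\n".join(names).encode(encoding)


if __name__ == "__main__":
    incoming = "zo\u00eb\nana".encode("utf-8")
    print(write_names(shout(read_names(incoming))))
```

The point of the structure is that only the two boundary functions need to know which encoding is in play.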


Indeed. One has to wonder, then, about the controversial nature of Python breaking backwards compatibility for the sake of improvement. People are down on "Python 2.8" for changing the name, but the fact that there is no standard set-in-stone led to these problems to begin with. People want a reliable standard, so are the authors at fault for breaking that implied standard and carrying the name with them? Should they have changed the name to PythonU? Or do we always defer to the author's right to change both the VM and the implied standard at will? It's a human-party-pleasing problem because the changes obviously hurt some people (those having to rewrite libraries, which may be easy or very hard and time consuming) for the sake of helping others (who would have time-bomb buggy programs in 2). shrug


> I found it's easy enough to add the line:

For the record, that's not what anyone's talking about when they mention Python 3's unicode support.
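To spell out the distinction with a minimal sketch (my own illustration, run under Python 3; the helper name `can_mix_text_and_bytes` is made up): the coding line only tells the interpreter how to decode the source file, whereas the Python 3 change is that text (`str`) and encoded data (`bytes`) are separate types that are never mixed implicitly.

```python
# -*- coding: utf-8 -*-
# The line above only declares how this source file itself is encoded.

text = "naïve"               # str: five Unicode code points
data = text.encode("utf-8")  # bytes: six bytes, because "ï" takes two in UTF-8


def can_mix_text_and_bytes():
    """Return True if str + bytes is allowed by the running interpreter."""
    try:
        text + data
    except TypeError:
        # Python 3 refuses to guess an encoding and raises instead.
        return False
    return True


if __name__ == "__main__":
    print(len(text), len(data), can_mix_text_and_bytes())
```

On Python 2 the equivalent mix of `str` and `unicode` is attempted through an implicit ASCII decode, so it works for plain English and only fails once non-ASCII data shows up.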


It's worth noting that Microsoft is still keeping VB6 on life support, in a sense. The tooling is not guaranteed to work on modern OSes (and there is a bunch of actual breakage, although community has found workarounds so far).

But the runtime is still supported - in fact, it ships with the OS! If you have any non-ARM version of Windows, up to and including Win10, around, check the file named msvbvm60.dll in C:\Windows\SysWOW64 - that's it ("MS VB VM").

And because it ships as an OS component, the official support policy is the same as the rest of the OS, which is at least 5 years of mainstream support (longer if there's no successor release), and then at least 5 years of extended support. This is even clarified specifically for VB6:

https://msdn.microsoft.com/en-us/vstudio/ms788708.aspx

Since VB6 was first released in 1998, this marks 18 years of continued support to date; and if it's not dropped from the OS within the next 2 years, it has a chance of hitting 30 years...

For what it's worth, PSF also has a fairly generous (especially for a non-commercial OSS project) support policy for Python 2.7 - it had already extended the end-of-life date for it once to 2020:

https://www.python.org/dev/peps/pep-0373/#id2


That's not generosity that's control. They kept maintaining 2.7 because they knew if they didn't someone would definitely have done exactly this.

They are only trying to run out the clock on other people's interest- an effort to kill Python2 so it doesn't evolve.


That's the gamble you took when you chose to go all in on Python3. I personally didn't buy the arguments made and chose to stick with Python2. If I migrate anywhere, it would be to a Python2 compatible fork or something like Go.


> What a terrible, terrible situation. Now you'll have "python" code that will neither run on 2.7 nor run compliantly on 3.x.

I don't say you are wrong, but I am afraid this situation looks "terrible" only to people who do care about 3. If someone doesn't care about it and thinks that he can survive without ever porting to 3 or starting to use it, the situation isn't that terrible. From that perspective, his 2.7 language evolved to the next step, and he knows that new features can be used if he upgrades from 2.7 to 2.8.


> What a load of bollocks. [Snip] For old projects, either you're in a situation where you can spend time porting your code to Python 3

The majority of Python code is old projects, just like every other established language. This might change in the future as I now see people starting new projects in Python 3, but if your company is older than 5 years old, then there is a good chance that you started with Python 2 simply because at the time of creating your codebase a whole lot of libraries weren't ported to Python 3.


I currently work for a client who has decided to shift away from PHP and towards Python. They had a monolithic PHP app with perhaps 250,000 lines of code. Now we are developing a series of Python apps in the microservices style. We've decided to develop everything as Python 2.7. We are not looking at Python 3.x. There are a few reasons. Some libraries that we want are in Python 2.7. And Amazon only supports 2.7. And we are not wild about Python 3.x's attempt to imitate a classical object oriented style.

We would look very closely at a Python 2.8, if it existed.


If I was your client I'd be pissed that you decided to rewrite my code into a legacy version of Python. Make no mistake: Python 3 is the future of Python. There will be no version 2.8 and there is no going back to 2.7.

Also, I don't know what you mean by "Amazon only supports 2.7" because boto (the main client for Python) has supported Python 3 for 2 years now. Perhaps you mean Lambda?


+1. Also it's easy to write code that supports both, so if you really need python 2 support right now (surely AWS Lambda python 3 is coming soon, you can already use it unofficially) that's a much better option than being entrenched in python 2 (mainly the string handling is the issue for buggy code that'll run in 2 but not 3).
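For what "code that supports both" can look like, here is a minimal sketch. It assumes nothing beyond the standard library, and the helper names (`to_text`, `average`) are made up for illustration. The `__future__` imports make 2.7 behave like 3.x for division, printing and string literals, and the one remaining difference (the text type) is isolated in a single place.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import sys

PY2 = sys.version_info[0] == 2
# `unicode` only exists on Python 2; the conditional never evaluates it on 3.
text_type = unicode if PY2 else str  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


def average(numbers):
    """True division on both versions thanks to the __future__ import."""
    return sum(numbers) / len(numbers)


if __name__ == "__main__":
    print(to_text(b"caf\xc3\xa9"), average([1, 2]))
```

Libraries such as six package up the same kind of shims, but the idea is the same: be explicit about bytes versus text and the rest mostly follows.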


He means Lambda.


Could you tell us which libraries you want don't support 3.x?

Genuinely curious, as I thought nearly all of the main ones were ported now.


At this point, if a library I wanted to use didn't support 3.X, I would take that as a giant red flag not to use said library.


Breaking it down to microservices, could you not have some parts in 2.7, like any that need a specific library that does not have Python 3 support, and some in 3? Or for that matter Go, or Rust?


> And we are not wild about Python 3.x's attempt to imitate a classical object oriented style.

Curious. What do you mean by that?


Software typically evolves and grows and changes over decades. There very often isn't any point where you can go "for our next project we choose X". Each new project is a feature using 90% of some existing huge codebase.

"Starting fresh" is something many (most?) companies simply never do, over decades. (And if they attempt it, it is often an all-out disaster.)

2011 is fairly recent in this context, and many popular libraries were not available on Py3 until much more recently than that, even if you have the rare luxury of starting fresh.


Name suggestion: Pythoff.


Pythed-off.


That has been official since 2011, but even today you may need some lib only available in Python 2.x and thus may need to start a new project in 2.7. I started several Python projects since 2011 knowing that it was a dead end, but my hands were tied. At the time there was not even a working WSGI spec, and no web frameworks for 3.x, and the first ones had awful performance (2.7 is bad enough).

Only recently 3.x has become a viable alternative. I for one welcome this 2.8 fork.


well Python 3 itself was a version hijack so it might be a little late to complain about others doing the same thing.


As others have said, maybe this project fixes some actual problems and backports some features from 3, but this isn't "Python". Beyond the fact that Python is a trademark of the Python Software Foundation, Python is more than the language, it's the community and the tools (as with every programming language). So while there are some vocal people that really dislike Python 3 (either in part or wholly), my understanding is that with the planned phase out of Python 2 and Python 2 only receiving bug fixes at this point, much of the industry is transitioning to Python 3 (either currently doing so or planning to) and so it seems relatively fruitless to attempt to build upon Python 2. I personally think the effort put into this would be much better spent making tooling around Python 2 to 3 transformations.

I also think it's pretty irresponsible of the author to call this Python 2.8, because it may cause confusion to developers unfamiliar with the history and come from a tutorial that is still in Python 2 (it does show up on the first page of Google for me). It's also especially irresponsible and hubristic to attempt to make a language that is seemingly compatible with both Python 2 and 3, because 1) I trust that if it was possible Guido and the other developers would have made it, and 2) it can cause significant confusion when code doesn't work when it hits an edge case, and then the whole tooling around it can't be guaranteed to work. The last thing I'd want in my programming language is unaccounted for ambiguity.


> It's also especially irresponsible and hubristic to attempt to make a language that is seemingly compatible with both Python 2 and 3, because 1) I trust that if it was possible Guido and the other developers would have made it

It is possible actually, that's kind of the point! The interpreter I've been working on passes the 2.7 unit tests (i.e. those in Lib/test/), as well as unit tests for the new features that have been backported from Python 3.

Even if you don't believe me, it's interesting to note that, e.g., while Python 3.0 was being developed, function annotations and keyword-only arguments coexisted with tuple unpacking. I built the code and ran it myself, in fact: https://twitter.com/naftaliharris/status/784421498291310592. Tuple unpacking was actually removed later, introducing the backwards incompatibility after the new functionality had been added. Timeline:

Oct 2006, keyword-only arguments.

Dec 2006, function annotations.

Mar 2007, removing tuple unpacking.

There was also a promising backport of keyword only arguments to CPython 2.6 (!) that was never merged, (http://bugs.python.org/issue1745), due to lack of follow-through.
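For readers who haven't used the features in that timeline, here is a minimal Python 3 sketch. It is only an illustration, not code from the fork, and the function names are hypothetical. Keyword-only arguments (PEP 3102) and function annotations (PEP 3107) appear in the signature, and the comment shows the Python 2 tuple parameter form that PEP 3113 removed.

```python
def scale(point, *, factor: float = 1.0) -> tuple:
    """Scale a 2D point; `factor` can only be passed by keyword."""
    # Python 2 allowed `def scale((x, y), factor=1.0)`; Python 3 removed that
    # form, so the tuple is unpacked explicitly in the body instead.
    x, y = point
    return (x * factor, y * factor)


def rejects_positional_factor():
    """Return True if passing `factor` positionally raises TypeError."""
    try:
        scale((1, 2), 3.0)
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(scale((1, 2), factor=2.0), rejects_positional_factor())
```

Nothing in the first two features requires the third to go away, which is the coexistence described above.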


I don't know about that. It forked a Python compiler, and is fully interoperable with 100% of Python 2 code, and much of Python 3 code. It's even compatible with Python C extensions.

Why isn't it a valid Python compiler?

To me, the whole morass about trying to end-of-life Python 2 is a bit silly. People have gotten emotional about the situation.

On one side, people like Zed Shaw are calling the Python maintainers 'evil' and claiming conspiracy.

On the other side, people are calling companies using Python 2, 'lazy' and claim they're a threat to the ecosystem.

Yet elsewhere, C is still being written in all of its various year-specific formats, and people end up using 'old' versions simply because they join pre-existing projects or need to totally interface with something that's written in an 'old' version.

Python is an established language; it's likely that 10 years from now there will still be Python 2 codebases going strong.


That's the thing - people expect core developers (some of them working on the project for free) to keep backporting crap out of the kindness of their heart. It's like asking Microsoft to backport security fixes and .Net features into VB6.


There are some fair points, but unlike C or other languages that have "old" versions, most of the newer versions are compatible with this old code (as in, if you have some C89 code, you can compile it in the newest C compiler. Same with Fortran). This isn't the case with Python 3 (there are breaking changes), and I think it's fair that the Python core developers who, besides Guido, probably work on this for free decide that it's time to end the older, non-compatible version and give ample time for developers to move their codebases, adding bug fixes and security fixes in the mean time until EOL is reached.

My biggest gripe with this project, besides calling it Python, is that its code compatibility is ambiguous. I don't mind ambiguity in programming languages, but generally the ambiguous cases are explicitly defined, with examples to explain them. I don't see anything of that nature here, only a vague statement that where something works in both versions but behaves differently, the Python 2.7 way will be the default. Without those cases defined it's hard to know what could happen in an edge case, and this could introduce bugs that don't present themselves immediately but cause data weirdness later.

In any case, I think that if a company has a really big, maintained code base in Python 2, it's their fault for supporting an older, in-2020-unsupported version of a programming language and the money/developer time spent supporting the codebase could be spent transitioning it to Python 3. I can understand a little more with an open source project because time is more precious and generally that time is donated, but even then most bigger projects (Numpy, Scipy, Django) have moved to Python 2/3 compatibility so unless the project is gargantuan there's no real excuse besides the project is not maintained.


They transitioned with a lot of community support and did so over the course of years. It wasn't easy to support 2/3 out of the box. For a normal company, staying on 2 is like keeping technical debt, and we all know how loath companies are to allow weeks for major refactoring when the gains aren't immediately visible. My company switched to using PyPy and eschewing C extensions before we supported Python 3.


> much of the industry is transitioning to Python 3 (either currently doing so or planning to)

Except it isn't.


As I said, it's my understanding working with Python as well as seeing what others work with in the community and so it could be wrong. Do you have data to back up that people aren't?


Last time I checked, Google's tools (like the Android SDK) require python2.


According to what data?


I make software that people can write plugins for in Python. After months, years of struggle we finally dropped support for Python 2 because our small team could not bear the overhead of maintaining two bindings. We work a lot with researchers in the signal processing domain and we have a hard time as it is getting people to use Python 3. Please, do not put obsolete software on life support.


If you have several large software products rolled out and churning away at hundreds of customer sites, moving from Python 2.x all the way to 2.7 alone is a slow and tedious process of tests and deliberations. And we're still not talking about going all the way to 3.x which breaks things in even more new and exciting ways.

So scoff all you want, but Python 2.x isn't going away that soon.


If moving to a 2010 version of your programming language is slow and tedious you're doing something very wrong.

In the Java world (conservative and slow-moving) JRE 7 (2011) is considered the absolute minimum, and if you're not targeting JRE 8 (2014) you have to have a very good reason.


I think this is an inherent problem with dynamically typed languages. There are no reliable refactoring tools, so even the slightest incompatible change can be lurking anywhere for years. And with things like metaprogramming, monkey patching and relying on private members, even changes that are supposed to be backwards compatible might end up not being so.


Try to move with half a million LoC of dynamically typed code with a handful of developers with not a single update breaking for any customer and I will be impressed.


> If moving to a 2010 version of your programming language is slow and tedious you're doing something very wrong.

You mean like, dealing with strings and Unicode? That's usually the case why people have trouble migrating to Python 3.


The 2010 version he's referring to is Python 2.7.


Obsolete is a funny word to use. In this case, it would mean that Python 2 is in good working order, but is no longer wanted. That's bound for a flame war, because:

- There is a community that wants it (largely enterprise).

- The Python team does not want it.

A less controversial word is deprecated - the Python team is discouraging use of Python 2, but not prohibiting its use or development. That's fair, and if you read this page:

https://wiki.python.org/moin/Python2orPython3

they are not very opinionated about it, largely saying "Use 3, unless you can't, then use 2 and start trying to migrate, unless you can't, then just use 2."

I will say, not to give somebody a bad day, but 2.8 seems like a bad idea. So far Python's development has largely been a straight line, which is good for transitioning, but 2.8 would cause a fork. It would give a lot of people a short-term win for a long-term loss. Better not to tempt people.


Obsolete was a wrong word to use, I admit that. But from an integrator's perspective supporting both versions is a mess. The problem is that the interpreter has the same name (python), the libraries export the same symbols (well, same names, different signatures for extra fun) etc.

Like you said, the Python team sees the 3.x series as the successor AND as a replacement for Python 2.x. They were never meant to exist side by side (or, there was no thought put into this before the release).

From my perspective, giving people the choice between 2 or 3 will only give us problems down the road, which is why I vehemently discourage it.


And yet a transition period is needed.

I wonder what about this made this difficult. Was it because it's a language interpreter? Libraries have this problem sometimes, but not as much. (I never hear of issues with gstreamer between 0.10 and 1.0, for example.) Maybe it was just that a binary called python existed? Maybe we should have just said "screw it, python means python2, end of story."

Don't know. What would you have preferred?


Well, in my ideal world maintainers would have put all possible effort to porting libraries to python 3 and put python 2 versions into legacy mode (e.g.: security updates, fork it if you want to continue on the 2-branch).

In my field what seemed to keep people on python 2 for a long time was numpy or scipy (or both, I do not remember which) which did not get a 3 upgrade for a long time.

Either that, or just call it something different, kind of like perl6. There is no perl6 distribution shipping a perl library or some perl.dll that clashes with perl5.


>Please, do not put obsolete software on life support.

Regarding adoption, it's 3.0 that's obsolete, and 2.7 that's vibrant. Even for new code (they conveniently only count totally greenfield projects, but most new code is written in fact to work with established 2.x codebases under Python 2, not as a totally greenfield project).


You keep saying things like this but the numbers don't bear that out. Since we switched to Python 3.5 for new code, I can barely tolerate working in 2.7 now. It went from feeling "vibrant" to feeling "OMG this is legacy" in about a week. I would never voluntarily go back.


>You keep saying things like this but numbers don't bear that out.

What numbers? All the numbers I've seen (official numbers from PyPI etc.) tell otherwise.



There is a clear selection bias here, which is revealed in the first response (unless that was what you're pointing people towards). IDEs are a lot more common on Windows, which has the best adoption rate. The data from the two sources in that comment point towards a large majority of users still using 2.7, which agrees with my experience (which is in the scientific community). The number one reason is that there is no incentive. The attitude of "if it isn't broken, don't fix it" is common. At this point we need to recognize that Python 2.7 isn't going to die anytime soon, unless there is a drastic change.


While the claim of selection bias in JetBrains' survey may be argued, it is still a valid data source, which is what the gp was asking for.

Also, your anecdotal data is arguably biased as well.

My take is that many sources, including the ones linked to in the tweet's replies point to solid growth in Python 3 adoption. Python3 might not have overtaken Python 2 overall, but it's very far from being "dead".


>While the claim of selection bias in JetBrains' survey may be argued, it is still a valid data source, which is what the gp was asking for.

No, I asked for a representative data source (representative was implied: a biased data source is as good as no data source at all).


>My take is that many sources, including the ones linked to in the tweet's replies point to solid growth in Python 3 adoption. Python3 might not have overtaken Python 2 overall, but it's very far from being "dead".

I wasn't making the point that Python 2 is dead, in fact quite the opposite. I'm saying that with this many users the adoption rate is too slow. There is no real reason for people to switch. Until there is that incentive Python 2 will not die.


Citation please.



> Please, do not put obsolete software on life support.

Downvoting because Python 2 is anything but obsolete. People still love using it.


I am glad that you explain why you downvote, but I disagree.

The fact that people love and use something doesn't mean it cannot be obsolete.

At work, I care about more than 35 years old software. It is obsolete (it's written in mainframe SAS with 3270 green screens and some assembly), but people still love using it, mainly because there is no good alternative and it does the job very well.


What makes something obsolete in your eyes then? Just because some people want A to replace B, that makes B obsolete?

For reference, Oxford dictionaries define (..."define"? are multiple dictionaries involved here?) "obsolete" as:

1. no longer produced or used; out of date.

Clearly Python 2.7 is in widespread use, and version 2.7.12 came out just a few months ago, so it's neither "no longer produced" nor "no longer used" nor "out of date"...


Python 2.7 is outdated by Python 3.5. The fact that there is a bugfix release doesn't change that.

I mean look at other things. You can still program in C 89 or FORTRAN 77 or COBOL 74 (and no doubt there is somebody still supporting compilers and runtimes for those), but they are all obsolete standards.

Addendum: I think for standards like programming language semantics (which in case of Python is directly embodied in the C implementation), "obsolete" means there is a new standard by some official body (say, the developer of the old standard) that addresses shortcomings of the old standard. So "out of date" is the fitting equivalent of "obsolete" from the Oxford definition.


>Python 2.7 is outdated by Python 3.5. The fact that there is a bugfix release doesn't change that.

That's just what the lead project team declared. Not what the user base asked for or wants.

>You can still program in C 89 or FORTRAN 77 or COBOL 74 (and no doubt there is somebody still supporting compilers and runtimes for those), but they are all obsolete standards.

That's because people stopped using them organically. That's not the case with Python 2 -- Python 3 was declared "the new hotness" with a decree from above.

It's like as if the W3C comes out with some incompatible HTML NG on their own and says that HTML 5 is "end of line", giving billions of webpages the middle finger.

Even worse, it's also as if HTML NG only had some marginal improvements over HTML 5, and was otherwise the same.


> That's just what the lead project team declared. Not what the user base asked for or wants.

You think they are doing it just for kicks? There are no issues with Python 2? They are also part of the user base, and they did it for a reason.

> That's because people stopped using them organically. That's not the case with Python 2 -- Python 3 was declared "the new hotness" with a decree from above.

Well, I for instance stopped using Python 2 when Python 3 came out, if it was possible for me to do so (I had all the libraries I needed). I understand that many people can't do it, but people are organically moving from Python 2 to Python 3, not the other way around.

Nobody is really forcing you to not use Python 2, just as nobody is forcing you not to use FORTRAN 77 or COBOL 74. It's just that the language will not evolve anymore, and as far as runtime goes, you will be on your own eventually.

> It's like as if the W3C comes out with some incompatible HTML NG on their own and says that HTML 5 is "end of line", giving billions of webpages the middle finger.

I think I already addressed this in my other comments.


>You think they are doing it just for kicks?

Yes. From a false sense of "we know better than you what's good for you". And also from not being connected to actual business and end user needs.

>There are no issues with Python 2?

That's irrelevant. There are issues with Python 3. Besides, the issues that Python 3 fixed over 2 are marginal at best and most could be retrofitted to 2.x (as this 2.8 release proves). Nothing earth-shattering to make the transition worth it.

>Nobody is really forcing you to not use Python 2, just as nobody is forcing you not to use FORTRAN 77 or COBOL 74. It's just that the language will not evolve anymore, and as far as runtime goes, you will be on your own eventually.

It's more likely that Python will suffer from people moving to other languages (and already, first Rails and then JS have won the server side over Python big time, and JS looks poised to be more general use too), than that anything good will come out of this "you're free to use 2.x, it just won't be updated anymore".


> And also from not being connected to actual business and end user needs.

I am not really personally bothered by Python 3 being incompatible (with that one Jython exception that I already mentioned). But I would like to point out the comment https://news.ycombinator.com/item?id=13146127, I think you're the one who is wrong here.

> It's more likely that Python will suffer from people moving to other languages

Unlikely. Rails is probably going out of fashion. JavaScript is a terrible language whose only saving grace is decent support in browsers. I am not sure for what other reason, choosing a language today, I would choose JavaScript over Python 3.

So people moving from Python 2 are most likely to end up with Python 3, I don't really see compelling alternative for them (unless they are going to something more functional like Clojure or Haskell or Scala, but that's entirely different discussion; for example I like Python a lot but I feel pure functional is where the future is, I find the imperative programming quite annoying these days, I would prefer Haskell, but frankly, I am not nearly as productive in it as I am in Python, because Python's focus on usability is very hard to match by any language).

On the other hand, I think Python 3 will actually gain in science and data analysis thanks to things like @ operator for matrix multiplication.
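For anyone who hasn't seen it, here is a minimal sketch of what the @ operator looks like from the language side. The `Matrix` class below is purely hypothetical and written only for illustration; it assumes Python 3.5 or later, where a class can support `@` by defining `__matmul__`.

```python
class Matrix:
    """Tiny list-of-lists matrix, just enough to show the @ operator (Python 3.5+)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Row-by-column products; assumes the inner dimensions match.
        cols = list(zip(*other.rows))
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in cols]
                       for row in self.rows])


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[0, 1], [1, 0]])  # swaps the two columns of whatever it multiplies
product = (a @ b).rows
print(product)
```

In real numeric code you would use an array library that implements the same hook rather than a hand-rolled class like this one.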


Go and Julia are already reaping the benefits.

I see no future for Python be that 2 or 3. It's not great at anything, but projects a veneer of friendliness (that one should quickly outgrow) on top of a pile of bad implementation decisions and terrible design.

Its popularity is based on superficial attributes rather than solid foundations. Eventually, the entire ecosystem will collapse and the masses will flood to the next attractor.


Julia can be a good competitor to Python, but far in the future. Now it's just not there yet.

Go is interesting, but really a different (and perhaps smaller) use case. What, for instance, do I do in Python? That little one-off script that converts one thing to another or calculates something - I am not sure why I would even bother thinking about Go.

I have no doubt that at some point, Python will be replaced by something. But I don't think it will be any of the languages that are currently in widespread use. Heck, C is also not based on solid foundations (I mean like type theory or something), and it wasn't fully replaced yet.


>C is also not based on solid foundations

C is based on the solidest possible foundation: the actual hardware CPU.


The C abstract machine doesn't have that much in common with any actual hardware. I suppose it sort-of resembled the PDP-11 once.


C does not define an abstract machine. That's the runtime implementation's job.


What's wrong with Julia? 1.0 should be coming out in a bit less than a year.


> You can still program in C 89 or FORTRAN 77 or COBOL 74 (and no doubt there is somebody still supporting compilers and runtimes for those), but they are all obsolete standards.

The situation with C89 and Fortran 77 is completely different than what you see today with Python 2 vs Python 3. For 99.9999% of C89 and Fortran 77 code you can build the old code with new C and Fortran compilers and use it from today's standards. You can take a piece of code written 30 years ago, recompile it and it usually works.


I don't disagree it is different. My argument was that a new release of compiler for an obsolete standard doesn't cause the standard not to be obsolete.

I am pretty sure there is a theoretical way to run Python 2 code alongside Python 3 code, but they simply decided it's not worth the effort.

Edit: I think historically the Python 2/3 divide is more akin to the Maclisp/Common Lisp divide, but in the latter, the situation was even more complicated. I doubt you can just take Maclisp code and run it on a Common Lisp implementation, despite the fact that it was meant as a successor.


Thanks for bringing up Fortran 77. Lots of F77 code is in use today through R and SciPy bindings etc. -- millions of users every day, thousands of compilations every day. Just because new code is not written in it, it is very much in use, nobody wants to rewrite solid code that has stood the test of time.

The word you are looking for is "deprecated", not obsolete.


> Addendum: I think for standards like programming language semantics (which in case of Python is directly embodied in the C implementation), "obsolete" means there is a new standard by some official body (say, the developer of the old standard) that addresses shortcomings of the old standard.

Really? So even if no one ever uses it, it still renders the old one obsolete?!


> So even if no one ever uses it, it still renders the old one obsolete?!

This is a strawman, because I don't think this ever happens (feel free to give an example). There will always be people who try to use new standard; they may abandon it later, but they will at least try to use it.

In any case, this is not really relevant to Python 3, which is used plenty and more and more every day.

And your insistence that Python 2.7 is not dead really reminds me of this sketch: https://www.youtube.com/watch?v=npjOSLCR2hE


>> So even if no one ever uses it, it still renders the old one obsolete?!

> This is a strawman, because I don't think this ever happens (feel free to give an example). There will always be people who try to use new standard; they may abandon it later, but they will at least try to use it.

...I thought it was obvious I didn't mean the case where literally NO ONE was using it, but apparently it wasn't. Sorry. My point was, if it doesn't catch on, then does it still render what came before it obsolete? Is it only about time and whether it fixes some things from before? Not about whether it's actually used, or whether it introduces other problems, or whether the previous technology is still in widespread use, or a million other factors? Really?

As for Python 3 being used and more every day, yes, I never claimed it was obsolete or dead or anything else. I'm just saying Python 2.7 is being used too, and hence it's not obsolete either as you seem to think. You claimed it was, so I asked for your definition of the term. You're rejecting the standard one and you still haven't given me one that you're willing to apply to things other than Python. Not to mention I don't see why software deserves special treatment for the word's definition here.


I don't think this is going to be a productive discussion, so this is my last comment on the matter.

I already gave you a definition of what it means to be obsolete for standards (such as specifications of programming languages).

In general, an old standard will become obsolete once the new standard (that is supposed to replace it) is finalized (for example, in RFCs, they explicitly say that). At that point, there are probably no serious users of the new standard yet, so the actual usage doesn't matter.

Of course a standard can be de facto rejected by people abandoning it instead of accepting it. Then usually there will either be another standard that obsoletes the old one again (as was a case with XHTML and HTML 5), or people will move on entirely to something else; in either case, the old standard will remain obsolete.


You should be aware that trying to badger people with strict adherence to an arbitrarily-chosen definition of a term as a way to avoid countering their arguments does not make you look intelligent, does not make you look well-qualified to argue the topic, and does not make you look like you're winning the argument. Resorting to technical haranguing about the definition of a term typically, in fact, gives the appearance of someone who does not have an argument to make and is searching for any way to try to salvage a declaration of victory.

Just so you know.


I'm not trying to make anything or anyone "look" any particular way. I'm trying to present an argument against the undeserved misuse of a label that carries a negative connotation and that is carelessly thrown around in this industry far too much in order to dismiss technologies people don't consider smoking hot enough for their tastes.

How in the world was my definition "arbitrarily-chosen"? I literally chose definition #1 on Oxford English dictionaries. That's arbitrary?!


[flagged]


> write so well

I disagree. He uses complicated words where simple ones would do and other words that are redundant. For example, read his sentence after removing "typically, in fact,". It conveys the same meaning but sounds less pretentious.


I thought the two additional "o"s that you missed indicated sarcasm.


I'd argue Python 2.7 counts as no longer produced. The 2.7.x releases with their bugfixes are akin to an electronics company still honoring the warranty of tape recorders and still repairing them. That doesn't mean tape recorders are not obsolete, especially since the company is not making them anymore. I consider the parallel 'making software' to be the process of

    feature proposal -> patch -> review -> merge


In your eyes is anything that isn't getting more and more features added every few months necessarily obsolete? Can't something just become mature and fulfill its goals at some point? Do you consider T-shirts to be obsolete too? If they kept adding more and more attachments ("features") to your clothes every few months to prevent them from becoming "obsolete" you'd be walking around in really heavy clothing...


Your comment is particularly apt as the programming language du jour is driven by fashion, not technology. We could still be using COBOL and be just as productive churning our CRUD apps as we are with the latest JS frameworks today... But one's hot and one's not.


The reason COBOL is obsolete is that there are no useful programs which are easiest to express in COBOL anymore, unless you are already using COBOL. It is truly obsolete in a way unrelated to fashion. Even if somebody were to supply the tooling, writing apps in COBOL would not be as productive as using a more modern language. It just lacks the expressiveness.

For comparison, C is not obsolete because a lot of useful programs are easiest to express in C still. I'd argue that C is obsolete for App development too, but there will be people who disagree with that.


Tape recorders are not still being made; verdict: obsolete

T-shirts are still being made; verdict: not obsolete

Python 2 is not still being made; verdict: obsolete


This is just wrong. For example, the "best" recording microphone (to many artists), the U67, can no longer be made because the parts aren't available anymore. Yet it is the most popular mic, and 100% not obsolete.

Similarly, Python 2 might not be made anymore, but it is used everywhere, and people are making new things with it. So...it's also not obsolete.


That's not for the manufacturer to decide. Python 3 is like Coca Cola declaring that the New Coke is all people should drink, and stopping production of classic coke.

Python 2 is still being "worn" by millions of programmers, and is what runs in the biggest installations. This includes new code written for those installations, which is written to run in the same 2.x environment.


> That's not for the manufacturer to decide. Python 3 is like Coca Cola declaring that the New Coke is all people should drink, and stopping production of classic coke.

That is within the rights of a manufacturer though. Coca Cola can continue to produce classic coke because it isn't any more or less complicated to produce than New Coke. The PSF's opinion is the new features they develop are best developed on top of the core changes in Python3, and that adding new features to Python2 is too expensive to maintain in addition to Python3. I feel like the cases are too different to work.


>That is within the rights of a manufacturer though

Sure. Still bad for the consumers who want the classic Coke though.


Python 2 is still being "made" in the sense you consider T-shirts are still being made, though. it's still provided for download, and people are downloading it and using it, and even using it for new things. Heck, it's even getting bugfixes which is a plus. It's just not getting features added, and it happens to be software so reproducing it happens to be trivial compared to "hardware" like clothing.

So, do your comparisons correctly. No matter how much you insist, Python 2 just isn't dead (or obsolete, etc.). Lack of new features doesn't imply obsolete.


You can still download the operating system for an Amiga, yet the Amiga is obsolete (despite diehards still using 31 year old computers).

Python 2 is no longer being actively developed. It receives bug fixes, and even those are scheduled to stop before too much longer. The fact that people are using Python 2 means that it's not dead, but that doesn't mean it isn't obsolete.

Think of it this way. You need a library to solve some problem, and you find one on Github. The project hasn't received any major updates in 3 years. Some people have submitted pull requests, and a few of them have even been merged, but it's clear that the maintainers are focused on other projects these days. Is that an indication that this project is simply mature and no further work is required? Or could it indicate that the maintainers want to work on other things, and this is not a priority for them any longer?

Saying that Python 2 is obsolete isn't an insult. Python 2 is popular, and loved by many, many people. It has been adopted as a teaching language by many schools, has inspired multitudes of people to learn to code, and has achieved prominence in data science, machine learning, and scientific computing. It also happens to be at the end of its lifecycle, development has moved on to Python 3, and the developers don't have much interest in maintaining Python 2 any longer.

That doesn't diminish the accomplishments of Python 2 or the people who love it. However, it does mean that the label fits.


>You can still download the operating system for an Amiga, yet the Amiga is obsolete (despite diehards still using 31 year old computers)

That's because nobody (very few) use the Amiga.

On the contrary, very many, much more than use Python 3, use Python 2.

So, it's not like the Amiga at all.

It's more like as if e.g. Apple suddenly decided to stop producing laptops because "iPads are the future", and forcing these down everybody's throat.


Submitted for your approval: https://vimeo.com/15365268

(No, it's not related to the matter at hand, and certainly should not be taken as an attempt to further complicate this not especially fruitful wrangle. But it's a goodie and you reminded me of it so I figured I'd share.)


That's wrong. It's not if something is being made, it's if something is being used. Nothing is obsolete that's still doing a job.


Well, in some respects T-shirts made in the 80s are obsolete, even though functionally they still work. Fashion changes, materials change, cuts change, etc.

You could still wear them today but you'd be working against today's "protocols".


Actually 80s t-shirts are very much in fashion.


Ok, replace that with 60s or 50s or any other period from which clothing is considered old-fashioned, even for hipsters.

Clothes are "deprecated" or "obsoleted", just like everything else.


People in China were, until just a few years ago, still using Windows XP, too, though.


Yeah, so for them it wasn't obsolete. For other communities it was. I don't see the problem. Obviously obsoleteness (word?) isn't a property of the product; it depends on the context (how much it's used, what else is available, how much the alternatives are used, etc... however you want to weight them).


There is nothing bad in making a newer version of an older language. But I agree they must use a more distinctive name.


> "Please, do not put obsolete software on life support."

The vast majority of current development out there is in python 2.x. A small minority use Python 3.x. How on earth does this make Python 2.x obsolete?

This is like saying Perl 5 is obsolete, just because Perl 6 is out and completely ignore the realities of the real world use of the products.


Please back your claims with data.


There are numerous metrics out there showing that Python 2.7 is far more ubiquitous than 3.x

Here is one recent one: http://www.randalolson.com/2016/09/03/python-2-7-still-reign...

If you have counter statistics showing that Python 3.x is more popular than 2.7 I would very much like to see them.


Nice to break down FIVE 3.x versions vs TWO 2.x versions. Stack them up and let's count again.


I agree, I also wish he would give actual numbers, rather than points on a log chart, which are very hard to estimate by eye.

But to be fair, even with that, I doubt you would get more than 30% of Python 3 users. Which is kinda in line with other surveys, such as the one from JetBrains. (It's probably a good guess that users of Python applications are even more conservative in upgrading than developers of Python applications.)


ok let's do that, using the first graph for Overall Python Downloads:

2.7: 10 million (M)

2.6: 0.5 M

---

3.5: 1 M

3.4: 0.750 M

3.3: 0.05 M

3.2 and 3.1 too low to make a difference.

-------

So totals:

Python 2.x: 10.5 M

Python 3.x: 1.8 M

---

Python 2.x is waaaaay ahead over 3.x
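If anyone wants to redo the sums, a quick sketch follows. The figures are just the eyeballed values listed above (read off a log chart, so treat them as rough), and the `downloads` dict and `total_for` helper are names made up for this example.

```python
# Download counts in millions, as eyeballed from the chart (rough values).
downloads = {"2.7": 10.0, "2.6": 0.5, "3.5": 1.0, "3.4": 0.75, "3.3": 0.05}


def total_for(major):
    """Sum the downloads of every version whose number starts with `major`."""
    return sum(n for version, n in downloads.items()
               if version.startswith(major + "."))


py2 = total_for("2")
py3 = total_for("3")
share3 = py3 / (py2 + py3)

print(f"Python 2.x: {py2:.1f} M")
print(f"Python 3.x: {py3:.1f} M")
print(f"Python 3 share: {share3:.1%}")
```

On these estimates Python 3 comes out at a bit under 15% of the combined total, with 3.2 and 3.1 left out as negligible.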


The worst thing about open source is that people can do stupid stuff with your software.

If you're going to create this abomination, at least do us all a favour and DON'T call it Python. Call it Retardython or something. I don't want to imagine people coming into the official support channels and claiming they are using "Python 2.8", then other people lecturing them about what that software really is, etc. Sounds like a horrible waste of time. (Source: I spend many hours a week helping fellow Python users.)


Python is trademarked by the Python Software foundation. IANAL but pretty sure that means he doesn't get to call it Python 2.8


Only if the PSF does something about it...


If they don't protect their trademark they open themselves up to what is essentially loss of the trademark in the US.

They could license it to Python 2.8 for free if they want to, though it seems unlikely.


Very good points, but you should try to keep a more civil tone.


We can do without the tone policing.


"DON'T call it Python" They shouldn't have called it 'Python' 3 in the first place when it's practically backwards incompatible


Hence the major version number bump...


Python programmers and companies with Python code should spend their time and effort moving to Python 3 instead of backporting stuff to Python 2, because Python 2 is deprecated and the future is Python 3. I think people and businesses with Python 2 code would be better off moving their code bases to Python 3 instead of doing things like this.


Companies with Python code are probably better off keeping their working, tested code than switching to an incompatible interpreter and set of libraries which among other things will print "b'Hello',b'World'" into their mission critical CSV files.

Yes, the built in csv module really does that in Python 3.


> Yes, the built in csv module really does that in Python 3.

If you pass it bytes, yes, it does. If you pass it strings, no, it doesn't.

If what you pass to the built-in CSV writer is not a string, the CSV writer will call str() to get a string representation it can write out. The string representation of a bytes object includes the 'b' prefix.

Meanwhile, you discovered your bug: you were treating bytes as text, which is likely to blow up on you sooner or later, and thanks to how Python now handles text, it blew up on you immediately as a way to remind you not to treat bytes as text.

What you probably think you want is for the CSV writer to realize it got a bytes object and, instead of calling str(), call its decode() method to get text it can write. But that is once again a dangerous operation, and sort of the whole point of Python 3's text changes is it won't let you get away with that stuff anymore.
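
A minimal sketch of the behaviour being argued about, using only the standard library. The `render` helper is just an illustrative name, and the ASCII encoding is an assumption made for this toy data.

```python
import csv
import io

def render(row):
    """Write one row with the default csv dialect and return the text produced."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()

# Bytes fields go through str(), so the b'' prefix ends up in the output.
as_bytes = render([b"Hello", b"World"])

# Decoding first (assuming the data really is ASCII) gives the expected text.
as_text = render([b"Hello".decode("ascii"), b"World".decode("ascii")])

print(repr(as_bytes))
print(repr(as_text))
```

No exception is raised in either case, which is exactly the point of contention.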


I had this problem.

It doesn't "blow up". If it blew up and retired, I would have seen the problem. The problem, like several python 2/3 incompatibilities, is that Python 3 merrily did something different, without telling anyone, until eventually we track down what has changed. I spent quite a while on this very bug myself, and it, along with others, persuaded me to switch to a different language (serious I know, but I was just getting annoyed with python's general loose dynamic nature, in combination with the python 2/3 changes.)


It's unreasonable to say "merrily did something different, without telling anyone" when fixing the string implementation was a significant reason for creating backwards-incompatible Python3 in the first place.

It's not a bug, it's a fix for an architectural error in Python 2, and it was quite well announced at the time: https://docs.python.org/3.0/whatsnew/3.0.html


Fixing the string mess in Py2 was a good thing.

But the fact that the same code now silently does the always wrong thing in Py3 wrt CSV is clearly a bug.

Actually, the design defect here is calling str() on everything, and assuming that the output is sensible for CSV. It may be a decent rule of thumb, but it clearly does not apply to bytes. Given the likelihood that someone might mistakenly use bytes as a string (for example, because they're porting a legacy Py2 codebase), this should be a hard error, immediately reported as such, and not just a silent behavior change.


Except if you make an exception for bytes, what about other types that might get passed into a CSV writer, whose __str__ is something "wrong" for CSV purposes? Do they also get auto-detected? Do we add a new __csv__() method just for when outputting to CSV (since it might not be "wrong" for other output formats)? Or do we ditch str()-ifying altogether, but then add back in a bunch of special cases for numeric types and other things where str() is "the right thing"?

Or do we say "CSV outputs strings, whatever is the string representation of what you passed in is what gets written out", and trust people to figure out when they're working with something that has a "wrong" string representation for their use case?

Because remember: the whole underlying cause of this was treating a dangerously non-string value as a string. Those bytes objects should have been decoded to strings long before reaching the CSV writer. Python 3 does raise more and louder exceptions when you pass bytes to things that expect strings, but the CSV writer isn't a thing that expects strings; it expects things that have a string representation, and several common use cases get much more difficult if you change that to force every user to explicitly do throwaway casts to string in the name of protecting people who keep insisting on writing dangerous "I'll treat bytes as string until it breaks, and then complain that the language did the wrong thing, not me" code.


`bytes` is plainly special case for historical reasons here - it's something that is not a string, but that so many people assume to be a string.

So yeah, I would be fine with making an exception for it (and providing some kind of option to disable that exception, for that incredibly rare case where someone really does need b"foo" in their CSV output).

And then in 5 years, flip the default of that switch, and deprecate it. In another 5, remove it entirely.

Also, note that raising an error in this case is not placating the people who insist on using bytes as strings. Quite the opposite - it very loudly and unambiguously tells them that they're wrong, and how exactly they're wrong.


The entire problem, though, is people assuming bytes and strings are interchangeable. Anything which allows that assumption to go unquestioned, or without program-wrecking consequences, leads right back to where we were. And the "phase it out" model doesn't work; you proposed a ten-year phase-out, but in ten years people are just going to say "we never updated our code, we're not ready, keep it this way another ten years and we'll think about fixing our code". The only thing that works is actively breaking people's programs when they try to intermix bytes and strings.


> The only thing that works is actively breaking people's programs when they try to intermix bytes and strings.

Um, this is exactly what I proposed above!

"this should be a hard error, immediately reported as such, and not just a silent behavior change."

What I'm asking for is that csv writer raises an exception if it sees bytes anywhere by default. The problem is that right now, it doesn't! It just gives you "incorrect" output, that might go undetected for a long time.


If it's a bug, why not submit a fix?

I think the Python maintainers fundamentally disagree with you on that point, but a fix submission would settle it unambiguously.


But people who have that csv issue have deeper lurking encoding issues they're probably not aware of.

Sure, don't change the stuff that works and cause yourself unnecessary pain, but don't blame python 3 for your misencoded data.


Oh we are perfectly well aware of our encoding issues. In NumPy, one of the most important libraries in Python 2 or 3, strings always take one byte per character. And it's not going to change. So the built in csv module needs to support a reasonable behavior. Which it does not.

If a company tries Python 3 and discovers basic things like CSV produce utter gibberish, they would do well to opt out. And they do--in droves.

My data is not misencoded, you see. It's just misunderstood.


>In NumPy, one of the most important libraries in Python 2 or 3, strings always take one byte per character.

So I won't be able to use NumPy with either of my two native languages. That sounds like a bit of a shortcoming for the majority of the world.


Numpy isn't really used like that though. It's for numerical computation. There might be cases for putting text in there but you can always keep it locally and map it to an int that you use in numpy for mapping (I do that in places).


> So the built in csv module needs to support a reasonable behavior. Which it does not.

It could be argued that the csv module's behaviour is reasonable, and NumPy's isn't. (I'm not 100% sure about all the details of this issue.) Hopefully, NumPy will change its behaviour to match Python 3, but if not you could still use the NumPy CSV routines like `loadtxt` or `genfromtxt` [0]. So then this becomes a documentation change to add some warnings to both modules.

> they would do well to opt out. And they do--in droves.

This is simply not true. They would do well to handle strings properly and so avoid bugs in future - something Python 3 actively encourages, and Python 2 obscures. And while I can't speak for every company, our metrics show that our Python 3 code has far fewer customer issues than Python 2, Perl, or Ruby. Now that's business value. (Edit: I mean it's hard to make the comparison - the Perl code is e.g. older - but we're writing code now, and when the interns add new stuff to the Python 3 codebase, it breaks less. All of them are still actively developed, and the Ruby one is about as old as the Python 3 one).

[0] https://docs.scipy.org/doc/numpy/reference/generated/numpy.l...


I say this as someone who uses the latest version of Python available in every new project or script.

Text encoding issues are absolute garbage in Python 3.x

I fucking hate the way that csv module works with text encodings.

As soon as I can figure out a reliable way to take latin-1 and save it as UTF-8 without breaking everything, I will try to shoehorn in a PR.

Right now, it's fucking awful. My ETL pipeline hates it, I hate it, my boss hates it, and my internal constituents hate it. Because it sucks.

A file I can read in one encoding and write as another should be readable with the encoding I wrote it in. That is not currently the case with the latest version of Python.

And it makes me hate the world.


    with open('some latin-1 file', 'rb') as f:
      text = f.read().decode('latin-1')
    with open('some utf8 file', 'wb') as f:
      f.write(text.encode('utf-8'))
Python 3's string encoding support is super good. I've said it before and I'll say it again: if you use bytes as a string you are Doing It Wrong.

If you use bytes as a string you are Doing It Wrong.

If you use bytes as a string you are Doing It Wrong.


Allow me to rephrase.

I do that operation on a file I get from an API. I know for a fact that the encoding I'm receiving is latin-1.

I run exactly that operation on the file that you wrote out in code.

When I try to read that file back in as UTF-8, I get encoding errors. That does not make for "super good." That makes me want to scream.

I do not have this problem when I use Python 2.7.x


You're definitely making a mistake somewhere, because I just tested it for myself and it worked perfectly fine. I made a latin-1 file, applied the above code with it, and got a correct utf-8 file out. Are you reading the final file back as latin-1? You have to read it as utf-8 of course.
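
Roughly what such a test could look like, as a sketch: the sample text and file names are made up, and it uses a temporary directory so nothing is left behind. It follows the same decode-then-encode steps quoted earlier in the thread and then reads the result back both ways.

```python
import os
import tempfile

original = "café, naïve, Ærø"  # every character here exists in latin-1

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "latin1.txt")
    dst = os.path.join(tmp, "utf8.txt")

    # Create the latin-1 input file.
    with open(src, "wb") as f:
        f.write(original.encode("latin-1"))

    # Decode as latin-1, then encode as utf-8.
    with open(src, "rb") as f:
        text = f.read().decode("latin-1")
    with open(dst, "wb") as f:
        f.write(text.encode("utf-8"))

    with open(dst, "rb") as f:
        raw = f.read()

read_as_utf8 = raw.decode("utf-8")      # matches the original text
read_as_latin1 = raw.decode("latin-1")  # mojibake: wrong codec for these bytes

print(read_as_utf8 == original)
print(read_as_latin1)
```

If the file reads back wrong, the usual suspect is the second case: the bytes are fine, but the reader is using the wrong codec.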


And what happens when you don't get a choice about what strings you are digesting?


I don't follow. What do you mean?

To be perfectly clear: bytes (b'') is not a string. Again: bytes is NOT a string. It is an array of octets, aka bytes, aka unsigned 8 bit integers. NOT characters. NOT a string.

If you are dealing with bytes that are encoded representations of a string, then you have to know what encoding they use to decode them and treat them as strings.


I'm not sure what you mean. If you don't know what the encoding of the input file is, you have a problem. As far as I know there are libraries to guess the encoding, but it cannot be determined with complete accuracy.


I don't see the problem; reading and writing different encoding works fine. The CSV module makes no problems either:

  import csv

  # newline='' lets the csv module handle line endings itself
  with open("in.csv", 'r', encoding='latin-1', newline='') as in_file, \
       open("out.csv", 'w', encoding='utf-8', newline='') as out_file:
      csv.writer(out_file).writerows(csv.reader(in_file))


I have a persistent problem that does almost exactly that.

The resulting file is not readable, and it makes me want to kick puppies and punch kittens.


> It could be argued that the csv module's behaviour is reasonable

I don't see how silently printing a binary literal, if that is indeed what it does, is reasonable. Simply put, b"foo" is not meaningful CSV.

What it should do is 1) raise an exception by default, informing the user that they need to be supplying strings and not bytes, and 2) provide an explicit switch to treat binary data as pass-thru, which would be useful in scenarios where you're just reading a file and dumping it elsewhere, and don't want to spend time decoding and then encoding everything.
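
To make the proposal concrete, here is a rough sketch of that behaviour as a wrapper around the existing writer. `write_row_strict` and its `bytes_encoding` parameter are hypothetical names invented for illustration, not part of the csv module, and the opt-in switch is approximated here by decoding with a caller-supplied encoding rather than true pass-through.

```python
import csv
import io

def write_row_strict(writer, row, bytes_encoding=None):
    """Hypothetical helper: refuse bytes fields unless the caller opts in."""
    cleaned = []
    for field in row:
        if isinstance(field, (bytes, bytearray)):
            if bytes_encoding is None:
                raise TypeError(
                    f"bytes field {field!r}: decode it first or pass bytes_encoding"
                )
            # Explicit opt-in: the caller states how the bytes should be read.
            field = field.decode(bytes_encoding)
        cleaned.append(field)
    writer.writerow(cleaned)

buf = io.StringIO()
writer = csv.writer(buf)
write_row_strict(writer, ["id", 1])                                    # plain values pass
write_row_strict(writer, [b"Hello", "World"], bytes_encoding="utf-8")  # explicit opt-in

try:
    write_row_strict(writer, [b"oops"])  # default: loud failure
    raised = False
except TypeError:
    raised = True

print(buf.getvalue())
print(raised)
```

Numbers and other non-bytes objects still go through str() as before; only bytes get the special treatment.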


The docs say that "[a] row must be an iterable of strings or numbers" [0]. So I guess an exception could be raised. However, the docs do tell a lie; non-strings are accepted and get converted to strings. You can pass any object in which has a string representation - including a byte array. It actually wouldn't be too hard to introduce a check for a bytes field, https://hg.python.org/cpython/file/3.6/Modules/_csv.c#l1227

    + if (PyBytes_Check(field)) {
    +     append_ok = FALSE;
    +     Py_DECREF(field);
    +     PyErr_SetString(PyExc_TypeError, "Field is bytes");
    + }
    else {
This would then raise a TypeError.

I don't think this is the right solution. It seems weird to have a special case because people aren't watching what they're putting in. Garbage in, garbage out, consenting adults and all that.

[0] https://docs.python.org/3/library/csv.html


Yes, "special cases aren't special enough to break the rules".

But "practicality beats purity".

And "errors should never pass silently"!

It's the same kind of thing that leads to safety warning stickers put on products. You may read it and think that it's something so obvious that consenting adults should know better. But then you look at the statistics about how many people did not, and realize that, yeah, a sticker along the lines of "don't stick your finger into a food processor" is actually a good idea. Especially given how cheap it is, and how expensive reattaching fingers is...

Basically, products should be designed around known human weaknesses, and that includes entrenched modes of thinking by past products. It doesn't mean that new products should accommodate those entrenched modes, especially when they lead to other problems. But they should try to detect them, and issue clear and explicit warnings, to guide the person to the proper way of doing things.


I moved to Python and got back into coding specifically to work with csv files. And from the first time I used code copy and pasted from SO or wherever to import and export CSV's with Python 3 I have not found that I am plagued by unreliable behavior and mischievous encoding. Yes there seems to be a type conversion necessary with NumPy, but it simply is not true that Python 3 has an endemic problem with CSV files!

In fact, the reason I chose Python was because I was able to dive so quickly into real problems like this with no problems whatsoever.

This is a minor issue for someone porting from 2 to 3, it is not a problem with 3


Companies using the built in csv module would be better off moving to something like csvparser.

Python's a superlative language but it has a pretty terrible set of included libraries. urllib2 isn't the only library with a superior alternative on pypi. Pretty much all of them do.


"b'Hello',b'World'" this (and 1001 similar)


> Python programmers and companies with python code should spend the time and effort to move to python 3 instead of spending that time and effort to backport stuff to python 2 because python 2 is deprecated and the future is python 3.

I hate Python 3's removal of the (lambda (key, value): blah) tuple unpacking syntax, and the forcing of parentheses for print statements. They might seem minor but they aren't for me. So I'm not at all eager to move to version 3 and don't really see any benefit. Not sure if those who aren't migrating feel the same way, but I wouldn't be surprised if some of them do.

(Edit to address comment below: There are more issues I have with Python 3. It allows more bugs to slip through, for instance. I actually particularly like a comment I just wrote, so I'll link to it here: https://news.ycombinator.com/item?id=13145299 Do note that this was added after the reply below.)


Benefits? Unicode. Async. Extended library. Required keywords. Syntax improvements (lots of them, many more than just the removal of the print statement). Type hinting.


> Benefits?

I meant I don't see any benefits for me, not benefits for other people. I assumed that was clear; sorry if it wasn't.

> Unicode.

Yeah, but some people have still been living without the changes, and it's hardly enough of a reason on its own (for me anyway) when there's other things I hate about the language.

> Async.

It's a nice feature, yeah. I can live without it, as people have for many years. Maybe if I was used to having it around I wouldn't want to go back, but I'm not.

> Extended library.

Cool! I'm not sure what exactly falls under this that I'm supposed to be missing, but pip install has sure been taking care of everything in the blink of an eye in version 2.

> Required keywords.

Cool! I need it about as much as I need a donut.

> Syntax improvements (lots of them, many more than just the removal of the print statement).

nonlocal is literally the only positive one I can think of right now that I'd actually care about. But then again, it comes up maybe 50x less often than the parentheses I have to write for print, or the tuple unpacking that I have to do. So yeah, it's hardly a reason to migrate.

> Type hinting.

Nice to have. I'm living just fine without it. Maybe I'd have migrated if it actually optimized things or did something more useful.


Well, things change, especially in tech. For better or worse, but most of the time for the better.

You should read some changelogs of past Python 3 releases. 3.6, for example, has ordered dicts by default, which is quite convenient when you need to write a test that checks a small dict with two items, for example.

I like driving an old muscle car; most of them look beautiful and take me everywhere I want. But damn, those new cars changed a lot and are much more comfortable. (But they do break as much ;))


Pretty sure they say NOT to rely on the ordered nature of the new dicts. So definitely not something you want to put in your tests.


The intention is to make the order guaranteed in 3.7 or 3.8, AIUI. There was some desire to prove the new implementation before guaranteeing its behaviours (i.e., in the worst case, if it turned out to be broken, they could revert to the 3.5 code and it would be valid).


No it is in 3.6 already! It used to be not guaranteed.

https://mail.python.org/pipermail/python-dev/2016-September/...

But it is insertion order indeed, not sorting order.


The behaviour is there in CPython, yes. But the language documentation deliberately doesn't guarantee it.
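
For reference, a small sketch of the insertion-order versus sorted-order distinction mentioned upthread. The keys are made up. Insertion order became a documented language guarantee in Python 3.7; on 3.6 the same result is an implementation detail of CPython.

```python
d = {}
d["zebra"] = 1
d["apple"] = 2
d["mango"] = 3

insertion_order = list(d)   # keys come back in the order they were added
sorted_order = sorted(d)    # sorting is a separate, explicit step

print(insertion_order)
print(sorted_order)
```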


People often bring up the unicode thing but tons of Python code is backend glue code that really doesn't care at all about unicode.


So you _hate_ Python 3 because of two changes in syntactic sugar? I would understand if you hated it because of the real breaking changes, but no...

I think you just don't comprehend the multitude of problems that Python 3 fixes by handling strings correctly... Maybe you've never handled Unicode before.


> So you _hate_ Python 3 because of two changes in syntactic sugar?

First of all, the tuple unpacking one is a HUGE readability AND maintainability issue; it's not just syntactic sugar. var[0][1][2] is not only far less readable than the unpacking notation, it doesn't even have the same semantics (doesn't enforce the structure of the tuple).

That means Python 2.7 helped me catch more bugs. Think about that!
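
A sketch of that semantic difference, written with a plain Python 3 assignment since parameter unpacking is gone. The `record` value is invented for illustration: it has one element too many, which indexing ignores and unpacking rejects.

```python
record = ("Ada", "Lovelace", "unexpected extra field")

# Indexing only looks at the positions you ask for, so a malformed
# record goes unnoticed.
first, last = record[0], record[1]

# Unpacking checks the shape: two names, three values, so it fails loudly.
try:
    first, last = record
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(first, last)
print(unpack_failed)
```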

Second, no, I just listed the two that irritated me the most every time I tried to switch, because they were the first things that came up by far the earliest and most frequently. Small inconveniences can be amplified through their frequencies. There are lots of things I don't like about it though... division becoming floating point division, having to say list(foo.items()) or list(map(...)) instead of just foo.items(), etc... again, more verbosity and typing for common cases where I really didn't mind the old way. If I wanted imap(), I could've just used imap; they could've just moved that to __builtin__ and made my life easier that way.

By the way -- the lazy nature of map(), etc. also means you catch fewer bugs now. Again, think about that! Just because it looks more efficient, that doesn't mean it's actually better. If there's anything I've learned, it's that even the smallest things tend to come with non-obvious tradeoffs.
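
A minimal illustration of the laziness point, with made-up input. In Python 3 the bad value is not touched when `map()` is called, only when something consumes the result.

```python
values = ["1", "2", "oops"]

# Python 3: this line succeeds, because nothing has been converted yet.
lazy = map(int, values)

# The error only shows up once the iterator is consumed, which may be
# far away from the line that introduced the problem.
try:
    list(lazy)
    failed_on_consume = False
except ValueError:
    failed_on_consume = True

print(failed_on_consume)
```

In Python 2 the same `map(int, values)` call raises immediately, because it builds the whole list up front.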

Finally, regarding strings: if you look at my earlier comments, yes, I already acknowledged the Unicode changes were for the better. Awesome. I agree. Cool? OK, but there are other things in the language besides Unicode though, and they're not as awesome. I don't spend my entire programming life dealing with Unicode strings, so I care about other things too, and they make my life harder. Simple as that.


If your needs are for static analysis and type checking at compile-time, you are using the wrong language.

I'm a huge lover of Python, but I drop into C# when I need stuff like that. Or Go. Or Rust. Or something.

You're just making bad decisions here.

If you're looking to catch bugs in your code before you test or deploy it, don't use a dynamic language.

Python has never been and probably never will be a language with declarative types and the checking that allows.

Pick a different hammer if that's the nail you need to hit. Don't complain about the hammer you want to use not being a screwdriver.


Regarding tuple unpacking, instead of writing, say:

    def foo((birthname,surname)):
        ...
you can write in Python 3:

    def foo(name):
        birthname, surname = name
        ...
It's not less readable. I also missed it in the beginning, but it's not really a big deal. (I think they couldn't keep the feature because of how '*' is used, but I am not sure.)

Edit: If you have problem with this in lambda expression, just create a named inner function. It's a feature/shortcoming (depends on POV) of Python that you cannot bind variables in an expression. I hope you understand that they couldn't keep the feature in lambdas if they didn't keep it in proper functions.

I am sure if you think about other things, there are good reasons to do it the way Python 3 does it, usually there is a hidden case where things need to be disambiguated (like your list() examples).


I wish I could downvote. I specifically said I hate the tuple unpacking syntax change in lambdas. That's where I used it so much to begin with, not in defs! I obviously can't put statements like that in lambdas, and until now I didn't need to do that to make my code readable. Now I have to name all of my lambdas just to make this syntax work, which is nonsense. It used to be there and it worked perfectly fine.


Why would you want to downvote somebody trying to help you?

In any case, I think I see your problem. You are not the sole user of the Python language. There are features that other people like (such as using '*' in unpacking), and so features you like are weighed against their use cases, and a reasonable compromise is made.

And frankly, I think if you like to use lambdas that much, you really want to program in a language where everything is an expression, such as Lisp or Haskell.


> In any case, I think I see your problem. You are not the sole user of Python language.

I'm glad I'm not. Otherwise I probably wouldn't be using it either. Not sure how that is my "problem".

> There are features that other people like (such as using [asterisk] in unpacking), and so features you like are weighted against their use cases, and a reasonable compromise is made.

I like that [asterisk] syntax too.

> And frankly, I think if you like to use lambdas that much, you really want to program in a language where everything is an expression, such as Lisp or Haskell.

Or I could just keep using Python 2.7 which works just fine, and not move to version 3 where I'm not welcome.


> Not sure how that is my "problem".

It's your problem when, for example, you want a list returned by default where Python 3 returns an iterator by default. Why that is useful for many people was already explained.

> I like that [asterisk] syntax too.

Funny, AFAIK it is Python 3 only.. https://www.python.org/dev/peps/pep-3132/
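
For readers who haven't seen it, a short sketch of the PEP 3132 syntax in question (the values are arbitrary):

```python
# Starred assignment targets collect "the rest" into a list.
first, *rest = [1, 2, 3, 4]
head, *middle, tail = "abcde"

print(first, rest)
print(head, middle, tail)
```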

> Or I could just keep using Python 2.7 which works just fine, and not move to version 3 where I'm not welcome.

You are welcome to use Python 3, but - suit yourself. :-)


> Why would you want to downvote somebody trying to help you?

Misreading comments is not helping.


That's why I used the word "trying". Maybe you should read more carefully before you accuse others of misreading something. ;-)

I think it's unfair to say that I misread his comment - he doesn't explicitly mention he is aware of the workaround I outlined for the functions, and that he is bothered with lack of tuple unpacking in lambda expressions only, not in ordinary functions.

Regardless, I still think it's quite impolite to downvote somebody who wants to help you and misunderstands you, if they are not e.g. factually incorrect. If you don't actually tell me where I am wrong, I cannot improve my answer. Also, this is not Stackoverflow, where that could be marginally acceptable (I am very strongly against downvoting without explanation).


> Regardless, I still think it's quite impolite to downvote somebody who wants to help you and misunderstands you

Well, I don't consider it a "misunderstanding" when there are literally just 2 things to note in my comment that you're replying to ("lambda" and "tuple unpacking") and you still somehow miss 1 of them. I think it totally deserves a downvote, because it makes me look stupid when you present a reasonable solution to a non-problem and make readers assume I was saying something other than I was, and on top of that I have to waste some 5-10 minutes of my time replying. That's not something I appreciate.

That said, like I said, I never actually downvoted that comment (because I obviously couldn't). So you don't need to worry about the internet points.


If you weren't busy taking things so personally, you could note that I already hinted in my comment on why I made it - I missed the feature of unpacking within function arguments myself at first too, until I realized that unpacking within the body isn't really less readable. (And please - do not waste time replying.)

I admit I don't use lambdas that much; since generator expressions arrived (which was around Python 2.4), they aren't really needed too frequently. And in most cases you're better off using a function anyway, because in Python statements are not expressions, as I already stated. For example, I use print() for debugging frequently and this is tough to insert into a lambda. (And even in Haskell I prefer naming subexpressions to lambda syntax.)

> That said, like I said, I never actually downvoted that comment (because I obviously couldn't). So you don't need to worry about the internet points.

I am not worried about internet points (I actually got about 80 of them on this discussion alone, which is frankly ridiculously too much, and in practice, I find that comments I personally find to be the most insightful only rarely get most points), I am just really annoyed when somebody downvotes my comments without any explanation, because I am a very curious person and in most cases it's just an honest misunderstanding, which could be cleared up with, I don't know, actual communication?

And at least two or three other people actually downvoted my original comment, so I would like to use this opportunity to invite them to come forward with an explanation what they found so wrong about it.


Lambdas are not really pythonic these days anyway

  list(map(lambda (some, thing): some + thing, everything))
  # better
  list(some + thing for (some, thing) in everything)
Or, as the parent suggests, just create helper function, preferably one your python environment doesn't need to set up every time your outer function is called

  # okish, "verbose lambda"
  def compute(everything):
    def magic(elem):
      some, thing = elem
      return some + thing
    return list(map(magic, everything))

  # probably better
  def magic(some, thing):
    return some + thing
  def compute(everything):
    return list(magic(some, thing) for (some, thing) in everything)


> Lambdas are not really pythonic these days anyway

Too much nonsense in your comment. Really now? How about you give a realistic example where syntactic sugar doesn't substitute for it? Like what am I supposed to pass to sort(key)? And incidentally, this tuple unpacking problem comes up when sorting frequently... and that's the prime example on their web page (and a realistic one at that) for where you're supposed to use lambdas: https://docs.python.org/3/tutorial/controlflow.html#lambda-e... If you're telling me this isn't Pythonic, you're really just not being sensible.


Yes, Really Now! Your disdain for the answers people are giving you, and your abrasive demeanor tell me I can spend my time better than to discuss this further with you.


> Like what am I supposed to pass to sort(key)? And incidentally, this tuple unpacking problem comes up when sorting frequently...

I suggest you look at namedtuple: https://docs.python.org/3/library/collections.html#collectio...

I think you should watch some of Raymond Hettinger's talks; he discusses many little things like this.
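
A sketch of what sort keys can look like in Python 3 without parameter unpacking. The `pairs` data and the `Stock` type are invented for illustration; `itemgetter` and `attrgetter` are from the standard `operator` module.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter

pairs = [("pear", 3), ("apple", 7), ("fig", 1)]

# Plain tuples: index inside the lambda, or use itemgetter.
by_count_lambda = sorted(pairs, key=lambda kv: kv[1])
by_count = sorted(pairs, key=itemgetter(1))

# namedtuple: fields get names, so the key function reads like the
# old unpacking version did.
Stock = namedtuple("Stock", ["name", "count"])
stocks = [Stock(*p) for p in pairs]
by_name = sorted(stocks, key=attrgetter("name"))

print(by_count)
print([s.name for s in by_name])
```

This does not restore the shape check that unpacking gave; it only addresses the readability of `kv[1]`.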


Regarding laziness of map() - lazy is a good default, because you can always make eager out of lazy, but not the other way around. Lazy is also more general, because it can handle both lazy and eager inputs, while eager will always force a lazy input.

This has been the general trend in mainstream languages lately, not just in Python. E.g. in C#, all LINQ operations are lazy. In Java, the new stream API, to be used with lambdas, is lazy.
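
A small sketch of why lazy is the more general default, using an endless input from `itertools.count`. An eager `map` could never finish on it, while the lazy one can be cut off or forced as needed.

```python
from itertools import count, islice

# count(1) never ends, so an eager map over it would never return.
squares = map(lambda n: n * n, count(1))

# Lazy lets the caller decide how much to compute.
first_five = list(islice(squares, 5))

# And forcing eagerness on a finite input is a single list() call.
eager = list(map(str.upper, ["a", "b"]))

print(first_five)
print(eager)
```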


Notice in said languages they added lazy APIs. They did not remove eager APIs. Python already had imap, ifilter, izip, etc... I already said this and I'll repeat: I would've been just fine if they made those easier to use (e.g. no import). There was no need to change the behavior of existing APIs.


There were generally no map/filter/fold APIs in those languages, eager or lazy.

In cases where the APIs were there, they were generally not as easily accessible (i.e. they were the equivalent of imap etc, with some hoops to jump before you could use them). The new APIs are more straightforward to use.

The reason to change the behavior of an existing API is because the default (i.e. most obvious) API should also be the most flexible, and do the right thing in as many cases as possible. This was not the case with map etc in Py2.

The disadvantage of changing an existing API like that is that it breaks code. But Py3 broke code anyway, so it was a good time to introduce breaks like that for the sake of better defaults.


> There were generally no map/filter/fold APIs in those languages, eager or lazy.

Array.FindAll, Array.ConvertAll, etc. all existed in C# beforehand. Though maybe this is what you meant in the next sentence.

> In cases where the APIs were there, they were generally not as easily accessible (i.e. they were the equivalent of imap etc, with some hoops to jump before you could use them).

This is going on a tangent but LINQ still has hoops to jump through. You have to say "using System.Linq;" at the top if you want to use the new syntax. That's like saying "from itertools import *" and then using imap, which you could've always done.


> Array.FindAll, Array.Convert, etc. all existed in C# beforehand. Though maybe this is what you meant in the next sentence.

Yes, it's what I had in mind. I have to admit that I completely forgot about ConvertAll (and so assumed there was no map).

I think the biggest reason why those weren't all that commonly used in practice, is because in .NET you often deal with opaque collection types (like ICollection<T>, or ReadOnlyCollection<T>, or even custom-made collections pre-generics) that are usually exposed on properties of objects. Since the concrete type is not known, you can't do List.ConvertAll etc.

This, by the way, is another point in favor of lazy implementations - they don't care about input type, because the output type is always "lazy sequence". Of course, you can have an eager map similarly not care about input, but then what should be the type of its output collection by default? No matter what type you choose, someone will complain that they wanted something else. Given Python's preference for explicitness, such design would warrant several functions like map_to_list, map_to_tuple, map_to_set etc. But, of course, if you have a lazy map, you might as well just write list(map(...)) etc.

> You have to say "using System.Linq;" at the top if you want to use the new syntax. That's like saying "from itertools import *" and then using imap, which you could've always done.

It's a bit different, though. When you import itertools, it brings all those functions into your global namespace. But when you import System.Linq, it only brings one static class into your global namespace; the actual functions are extension methods that only show up on the types to which they are applicable. So the resulting namespace pollution is far less in C#.

There's also the issue of import * being generally frowned upon in idiomatic Python, largely because of the way conflict resolution works there (silent override). In C#, if you happen to have clashing identifiers from usings, it'll prevent you from using them unqualified, so there's no good reason to avoid it.
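The silent override with star-imports can be sketched like this. The two modules `mod_a` and `mod_b` are hypothetical and built in memory purely so the example is self-contained.

```python
import sys
import types

# Two hypothetical modules that both export a function called `chain`.
mod_a = types.ModuleType("mod_a")
mod_a.chain = lambda: "from mod_a"
mod_b = types.ModuleType("mod_b")
mod_b.chain = lambda: "from mod_b"
sys.modules["mod_a"] = mod_a
sys.modules["mod_b"] = mod_b

# Run the star-imports in a scratch namespace, as if at the top of a module.
namespace = {}
exec("from mod_a import *\nfrom mod_b import *", namespace)

# No error and no warning: the later import replaced the earlier one.
winner = namespace["chain"]()
```

Here `winner` comes from `mod_b`, and nothing tells you that `mod_a.chain` was shadowed.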


> and the forcing of parentheses for print statements

You mean print functions. I love the new change because you can pass "print" around like any other function now, letting you write code like:

  def my_map(data, func):
      for item in data:
          func(item)

  my_map(dataset, insert_into_database)

  # For testing
  my_map(dataset, print)


That seems kinda petty. Surely you can work around that. If they really were killer features that many devs used I doubt they'd have been deprecated and removed in python 3. I do agree, though, that I wish lambdas could be more than one-liners.


Python programmers and companies with python code should spend their time and effort delivering new features for their customers instead of spending that time and effort forward porting stuff to python 3 because python 3 isn't delivering enough additional value to justify switching. I think people and businesses with python 2 code would be better off continuing to improve their code bases with new features instead of doing things like this.


What about not spending time and money and just keep using python 2 for software that already uses it. And only use python 3 for new code bases.


This is how you end up still running fixed-format Fortran code in 2016.

If your software is being actively maintained, it's time to move to Python 3.


This is how you keep using those efficient, numerically-stable subroutines written by a smart guy who retired 20 years ago in your new code. Unlike Python, Fortran has managed to add significant new features without breaking old code.


Unlike Python, Fortran hasn't seen an increase in usage and hasn't brought the joy(?) of programming to thousands of new programmers in the last 10 years. So sticking to Fortran or backwards compatibility blindly doesn't solve all the problems, either.

Maintenance is key. Most people don't stick around for 20 years anymore either. I know I'm going to have an easier time finding a new hire for a Python codebase. And he's going to have a far better chance at understanding said codebase. Code which nobody knows how to maintain will either hurt us with a fiendish bug or limit our growth. So for me, slowly moving away from legacy stuff is good business value in the long run.

Remember, you can never be sure that Fortran code is 100% bug free. The test of time is as good as any other test, but not perfect.


Fair enough. But having worked with Fortran code that was designed in pen on yellow legal pads, typed in, and run on a suite of standard test problems chosen to expose subtle bugs, I have learned to respect an archaic engineering style of which few people are capable nowadays. Sometimes code is essentially "finished;" you'd be amazed at how rarely it has bugs. It should be changed only with care and for good reasons, not because the programming language changed to conform to some new fad.

(FWIW, I'd probably write most new numerical code in Julia rather than Fortran 20xx, and either call into existing Fortran via FFI, or drive it from the command line with some scripting language.)


> Unlike Python, Fortran hasn't seen an increase in usage and hasn't brought the joy(?) of programming to thousands of new programmers in the last 10 years.

SciPy has Fortran code under the hood: https://github.com/scipy/scipy/search?l=FORTRAN&utf8=%E2%9C%...

Both SciPy and NumPy use LAPACK - a Fortran library. SciPy also uses BLAS - another Fortran library.

So every time you praise Python for being useful for scientific work, you're actually praising Fortran and C libraries/modules wrapped in Python.


And the differences between F77 and Fortran 2008 are monumental. It's like they're 2 different languages.


I should add that in my mind this is akin to working around a design flaw instead of refactoring and modernizing and upgrading. The python devs have provided numerous tools, one of which is the 2to3 tool for simple things.


People don't seem to be considering the possibility that a stagnant python 2.7 may actually be a reason to like that version of the language. I must admit it is nice to not have those oh-so-keen python developers messing with my favorite language.

However, I recently had a gig working in py3. Apart from screwing up every single print statement for a long time, it was entirely drama-free, and actually pretty great. There really is a difference between living and dead languages I think. 2.7 is the latin of python.


Oh, people are considering that possibility very much. But the notion of Python 2.8, with new features backported from 3.x, is kinda the opposite of that approach, and is very much "developers messing with my favorite language".


why can't print be both a function and a statement...


The print statement has a lot of special magic like '>>' and trailing commas that uglifies the parser.
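For reference, a sketch of how those two bits of statement syntax map onto keyword arguments of the Python 3 function. The Python 2 forms are shown only in comments; `buf` is just an in-memory stand-in for a file or stderr.

```python
import io

buf = io.StringIO()

# Python 2: print >>buf, "warning"
print("warning", file=buf)

# Python 2: print "no newline",   (trailing comma suppresses the newline)
print("no newline", end="", file=buf)

captured = buf.getvalue()
```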


I don't get this. There's a tiny fraction of Python users who care about the complexity of the language parser. But many hate the print-tax.


Author here. Imagine my surprise when I got back from a day of sightseeing (I'm on vacation in Spain) and saw that this had blown up. I had intended to "release" this project after New Years, after I'd gotten back and a week or two after 3.6 is released [1], and didn't expect this to get picked up since the project has been on Github for over a year (although inactive for much of that time) and since my blog usually doesn't get much traffic.

A lot of people here have strong opinions about the name "Python 2.8". I don't mind changing it, and intend to do so, (https://github.com/naftaliharris/python2.8/issues/47). I picked it initially since when talking with friends about this project it conveyed pretty darn immediately what the project is and does. I'd be very keen to hear people's suggestions for alternate names!

For those of you with 2.7 codebases or projects, I'd be extremely interested in hearing about whether you were able to get this interpreter to run your code. Personally, the biggest challenges I've had so far are with dependencies that check for `sys.version_info[:2] == (2, 7)` as opposed to something like `sys.version_info[0] < 3`. But I'd be very interested in other people's experiences, particularly with larger codebases.
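A small sketch of the difference between the two checks. The function names are made up for illustration, and the `(2, 8, 0)` tuple stands in for what a 2.8-style interpreter might report.

```python
import sys

def is_py27_exact(version_info):
    """Brittle: true only for 2.7, so a hypothetical 2.8 fails the check."""
    return tuple(version_info[:2]) == (2, 7)

def is_py2_family(version_info):
    """Looser: true for any 2.x interpreter."""
    return version_info[0] < 3

hypothetical_28 = (2, 8, 0)
exact_check = is_py27_exact(hypothetical_28)    # False
family_check = is_py2_family(hypothetical_28)   # True

# The same helpers work on the running interpreter.
running_py2 = is_py2_family(sys.version_info)
```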

[1] A minor and somewhat pedantic point: The interpreter I've been working on includes PEP 515 (underscores in numeric literals), which is new in 3.6. I didn't think it was right for me to "take credit" for this new feature before it was even out in Python 3.6. Obviously, the real credit for this feature existing (in 3.6 or in any interpreter) goes to the CPython core devs, and especially Georg Brandl.
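For anyone who hasn't seen PEP 515 yet, a tiny sketch of what it allows (needs an interpreter with the feature, i.e. CPython 3.6 or later):

```python
# Underscores are purely visual separators; the values are unchanged.
million = 1_000_000
mask = 0xFF_FF
price = 1_000.5
```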


Regardless of the detractors, I think this is brilliant work. All of the negativity around your project is over politics or subjective opinions based off of individual experience where they are lucky enough to work in environments on the bleeding edge.

Keep up the good work, this could help a lot of people!


Reading a thread from the last time something like that was proposed (although the only change in that proposal was that 2.8 would, on Windows, use the C runtime from a more recent MSVC compiler), I found a relevant argument (https://mail.python.org/pipermail/python-dev/2013-November/1...) against this "Python 2.8" idea:

What if more than one person did that? (paraphrasing Raymond Chen's "What if two programs did this?")

There can be only one official Python 2.8, but since it will never exist (see PEP 404), there can be many unofficial mutually incompatible "Python" 2.8 implementations. Therefore, calling it "Python 2.8" is a bad idea.


I see many people vigorously defending Py3 but I wonder how many of these have a paying-the-bills kind of job. Where you would look at the cost of porting a large project to Py3 and get an answer like half a million USD (easily). Do you go "of course we do that, that money is easily recouped with the added programmer productivity of Py3"? No chance.

So the question is do you want to basically light that money on fire, or just keep your perfectly fine Py2.7 code running and maintained another few years.

More discussions of the monetary value of programming languages please. What is "correct" or "right" isn't all that interesting to many, for good reasons.

Kudos to this project and hope it can set us on a saner migration path to Py3. (Should totally change the name though.)


You can keep your py2.x code running, absolutely; but you cannot expect other people working for free to help you pay your bills.

Honestly, the level of self-entitlement through all 2vs3 threads is staggering.


I don't want the CPython core devs to do anything different and am not angry with them at all. It is their pet and they can do what they want. I fully agree with what you say in that sense.

But I do mind people saying in these discussions "you should all move to Py3 now or you are stupid/evil".

No. There are legitimate reasons for staying with Py2.7 and embracing it.

So I hope "Python 2.8" gets a cool name, perhaps even some funding from a company who wants to keep their Py2.7 code alive and invigorated, and the community part as friends.

Is that self-entitlement in any sense?

All I want is for people who I think have a huge blind spot to stop calling me ignorant for decisions I make about MY code. I totally don't expect core CPython devs to help me out though.


> I don't want the CPython core devs to do anything different

It's not just CPython, it's the Python specification of which CPython is the reference implementation. The specification couldn't move forward in significant ways without making some of the changes that came with Python 3.

> All I want is for people who I think have a huge blind spot to stop calling me ignorant

The level of vitriol in 2vs3 threads has been way too high from the start, because people always hate change. You are obviously free to do what you want, you always were. Don't mind the haters, but please don't be one either.


Thanks for that, well said.


> So I hope "Python 2.8" gets a cool name, perhaps even some funding from a company who wants to keep their Py2.7 code alive and invigorated, and the community part as friends.

As other people have already explained elsewhere in this discussion, I would strongly advise against that. If you for whatever reason have to stay on 2.7, then make sure your new code is 2.7 only (and best if it works with 3 without changes if you decide to change your mind later).

Consider what's more likely in the far future (after PSF will give up on 2.7 support in 2020). That somebody will support 2.7 as it is, or that this guy will support his Python 2.8 hybrid?

Also consider what happens (in the far future) when some library you use will drop Python 2 support. It's not likely it will be easy to run on this Python 2.8 hybrid, either.

And if you for any reason must use Python 3 features in your code base, just bite the bullet and port it.


What I was hoping for was a sane migration path to prevent the split you talk about in 2020...somebody made a superset of both py2 and py3 that lets one move gradually. If support ends/project dies one would bite the bullet and move all the way to py3 I guess.

My py2 code uses unicode properly which may color my view a bit...

But I don't really disagree with what you say.


Clearly, "sane" is up for argument. My understanding is that 2.6 was the last version to get major features. Python 2.7's goal was to be a bridge between 2x and 3x[1]--it was that superset where code can run in both. It back ported many of the popular Python3 features (at the time). But that was 6 years ago and Python 3 has had new features since then.

(Looking at this from the perspective of the "Python Community" or someone whose goal is to adopt Python3) His focus is to back port newer Python 3 features developed since then. Does this help people move to Python 3?

Good on him for digging into cpython. While there's __future__ and the backports module, he seems to have focused on features that aren't just new libraries (which is cool). A few years ago I was trying to backport Python 3's Namespace Packages for my company since our internal import tools effectively do the same thing (except ours had bugs).

[1] https://docs.python.org/dev/whatsnew/2.7.html#the-future-for...


If the code is "perfectly running" on python 2.7, then why do you need python 2.8?


> I've been working on Python 2.8 (not an official Python release)

It is very dishonest to call it Python 2.8 then.


I suggest Monty 2.8


It's too bad the python 3.x fans can't see this as feedback about how difficult it is for users of 2.x to upgrade to the latest and greatest. Many 2.x folks have sprawling code bases and complex operational needs. The 3.x advocates seem to consistently ignore that.

Shoot the 2.8 messenger all you want for choosing to call it Python 2.8, but don't dismiss the issue that drives thoughtful people to get value out of this strategy.


The python maintainers and the 3.x fans do see this, and know this, and decided to do it. They knew there would be a cost in community, and decided it was worth it. They haven't been oblivious, and they've made several large concessions in 3.4 and 3.5 to increase the ease of migrating. Stop denigrating them.


Christian Tismer tried this a couple of years ago [1].

I guess the intention was a bit different: he wanted to have stackless features in python. It's not clear to me the reason he decided to back down, whether because of licensing issues or just because the other python developers didn't like it.

[1] http://www.stackless.com/pipermail/stackless/2014-January/00...


What another load of crap. Call it something else but this isn't Python. I'd never use this because it's not official. Who knows if or how long it'd be supported for or if any backdoors would/could be introduced.

I've scheduled time this year for my team's project to update to Python 3. It's expensive in the short term but in the long term we get continued support and new features which is a huge win.


Do you trust PyPy? It's not official either.


I can't fault a single thing in his justification. This should have been the approach to modernising python all along.


> This should have been the approach to modernising python all along.

The core driver for the Python 3 break was the fix to the text model; this is what allowed literally everything else, as it completely broke existing code.

And I, for one, think it's one of the most important improvements of Python 3, the text model of Python 2 is a giant mess and makes it very hard to correctly deal with non-ascii text for any non-trivial software, especially in large teams where not everybody will carefully evaluate the text-ness of their code.


> the text model of Python 2 is a giant mess and makes it very hard to correctly deal with non-ascii text for any non-trivial software

There are counter-arguments to this. Armin Ronacher, author of (among other software) the excellent Flask web framework, thinks that Python 2's system of codecs and byte streams is better in practice [1][2]. Reasons include: You can do byte -> byte conversions with codecs that are no longer possible. You can better handle text encodings besides UTF-8 (and here he describes several embarrassing failures of Python 3 to handle OS paths correctly). You can write single APIs that handle byte streams like gzip and text encodings like UTF-8.

[1]: http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/ [2]: http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
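On the byte -> byte point, a hedged sketch of where things stand in recent Python 3 (as far as I know this holds from 3.4 on): the `str.encode`/`bytes.decode` methods only accept text encodings, but the module-level functions in `codecs` still take the bytes-to-bytes codecs.

```python
import codecs

payload = b"hello"

# bytes -> bytes transforms go through the codecs module functions.
hexed = codecs.encode(payload, "hex_codec")
packed = codecs.encode(payload, "zlib_codec")
restored = codecs.decode(packed, "zlib_codec")

# The method form is restricted to text encodings.
try:
    payload.decode("hex_codec")
    method_form_allowed = True
except LookupError:
    method_form_allowed = False
```

So the capability is not gone, but it is no longer reachable through the same uniform `.encode()`/`.decode()` spelling that Python 2 had.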


> Armin Ronacher, author of (among other software) the excellent Flask web framework, thinks that Python 2's system of codecs and byte streams is better in practice.

Armin Ronacher works in a very specific context of having to deal with byte/text interfaces in pretty much all his projects, and while I can see where he comes from I work at a different level and at the level at which I work the P2 model is a giant pain in the ass.

> [1] http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/

http://lucumr.pocoo.org/2016/11/5/be-careful-about-what-you-...

Armin is no foe of Python 3. And as noted in the essay, Python 3 has undergone several improvements and feature reintroductions, e.g. PEP 461 reintroduced C-style formatting to bytestrings, making generating binary data (especially ascii-based formats) significantly more convenient than it was between 3.0 and 3.4.
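A quick sketch of what PEP 461 gives you (Python 3.5+). The HTTP-ish lines are just an illustrative ASCII-based format, not taken from any particular library.

```python
# Requires Python 3.5+, where PEP 461 restored %-formatting for bytes.
method = b"GET"
path = b"/index.html"
length = 42

request_line = b"%s %s HTTP/1.1\r\n" % (method, path)
header = b"Content-Length: %d\r\n" % length
```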

Also note that Armin has repeatedly praised Rust's text model, which is much more similar to P3's than P2's (except with static types and no messy legacy).

> and here he describes several embarrassing failures of Python 3 to handle OS paths correctly

And (fucking surprise) the issue with that is the text model of FS paths is an embarrassing pile of garbage, Python 2 is convenient because it doesn't try to touch that mess at all and just hands the flaming bag of shit to whoever comes next.


> Also note that Armin has repeatedly praised Rust's text model, which is much more similar to P3's than P2's (except with static types and no messy legacy).

That is incorrect. Rust's text model has (almost) free (and copyless) transmutes from bytes to strings. Python does not. The text model of rust is much closer to Python 2 than 3 in many ways.


> That is incorrect. […] The text model of rust is much closer to Python 2 than 3 in many ways.

Rust's text model strictly separates proper strings and bytestrings, defaults to proper strings and requires that strings be properly formed (so much so that it has additional completely separated platform-dependent types for dealing with OS-originated "stuff").

The one "difference" (which is more in the realm of implementation detail than language text model) is that Rust leverages its ownership system to make UTF8 "encoding" and "decoding" free (literally for the former, essentially for the latter). The encoding and decoding are still there and explicit operations though.

> Rust's text model has (almost) free (and copyless) transmutes from bytes to strings.

Only for the specific case of input bytes already in the language's internal encoding (which granted will be common as most inputs would be ascii or utf-8) and with the same ownership constraints as the input, and that's mostly enabled by Rust's ownership model.

> Python does not.

Python doesn't generally do no-alloc/0-copy operations so that's not overly surprising.


> Only for the specific case of input bytes already in the language's internal encoding (which granted will be common as most inputs would be ascii or utf-8) and with the same ownership constraints as the input, and that's mostly enabled by Rust's ownership model.

Except of course on operating systems where text I/O is done entirely in UTF-16. Say, Windows.

Since Python strings have no fixed encoding, but choose "the most efficient one" (heuristically) when decoding, they can cope better than a fixed UTF-8 encoding in these cases.

>> Python does not.

> Python doesn't generally do no-alloc/0-copy operations so that's not overly surprising.

Indeed. Even when the encoding is not changed, the string will always be copied. One could think of an API that does that, though, to optimize all those cases where memory is already owned by a shim in the runtime.


> Since Python strings have no fixed encoding, but choose "the most efficient one" (heuristically) when decoding, they can cope better than a fixed UTF-8 encoding in these cases.

That is wrong. Python can never pick the most efficient encoding unless you decode from latin1.


PEP 393


That's why I said it can use the most appropriate encoding in the latin1 case. Before that it would never have the right encoding.


Rust having strings that are utf-8 is a guarantee and as such allows you to do very efficient operations on them. Python gives you a vague guarantee that it gives you O(1) access to something like a glyph.

These are very different and incompatible text models.

At no point is Python's text model fast or overly useful.


Armin has backed off of this stance since then. And for good reason.

As someone who works with Python text processing extensively, I can tell you that the Python 2.7 text model is broken and dangerous, due to the silent bytes-unicode coercion and misguided use of ascii instead of UTF-8 as the default text encoding. Many people don't realize this and will argue that it's not broken, because they have never fed non-ascii text through their app to watch it blow up! And once they realize that they have a problem, they then have to deal with a rat's nest of silent bytes-unicode coercions happening implicitly all over their app, sometimes impossible to deal with due to library code outside their control.
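To make the contrast concrete, here is how Python 3 behaves when the two types meet. In Python 2 the equivalent `str + unicode` expression would implicitly decode the byte string as ASCII, so it only failed once non-ASCII data showed up.

```python
text = "caf\u00e9"
raw = text.encode("utf-8")   # b'caf\xc3\xa9'

# Python 3 refuses to mix the two types, regardless of the data.
try:
    combined = raw + text
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# The conversion has to be spelled out, with an explicit encoding.
combined = raw.decode("utf-8") + text
```

The error shows up on the first test run with plain ASCII input too, rather than in production when the first accented name arrives.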

There is a good discussion to be had on whether a language should prioritize bytes or unicode strings as the main data type, but there is no excuse for the "ticking timebomb" string data type design that pre-3 Python has with strings and the default encoding.

For this reason alone I'm very happy that 2.7 is starting to lose its grip. Its continued support is a problem, and I have no love for people who are trying to hold on to it.

There are many other features in 3 that I can no longer live without - most of them now available through backports modules - but types and asyncio can't be easily backported either, and people are starting to use them extensively.


Yeah, but if you are dealing only with a subset of the English Language in the U.S., and your API endpoint that you are scraping wants to serve to all peoples in all locales in all situations, you are fucked if you want to use Python3 and its csv module.

You genuinely are better off using Python 2.7.x and its naive approach to text.


I don't understand what you mean by "your API endpoint that you are scraping wants to serve to all peoples in all locales in all situations".

That would mean to me that the API endpoint could be sending me Unicode, in which case Python 3's Unicode-aware CSV is going to work great, and Python 2's csv is fucked. The limitations of Python 2's csv module was one of the key points that moved my company to Python 3.

On Python 3, if you want to be naive about text (not sure why you're celebrating only working in a subset of English, but you have this option), you could open the file as Latin-1 and get the same results as Python 2.
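A sketch of why Latin-1 works as the "naive" option: it is a one-to-one mapping between byte values and the first 256 code points, so decoding never fails and the bytes survive a round trip.

```python
# Every possible byte value maps to exactly one code point under Latin-1,
# so decoding can never fail and re-encoding restores the original bytes.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_tripped = as_text.encode("latin-1")

# UTF-8 input read this way looks like mojibake, but nothing is lost.
mojibake = "\u00e9".encode("utf-8").decode("latin-1")
```

With a file, the same idea is `open(path, encoding="latin-1")`.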

Many CSVs are made with Excel. Excel's only form of Unicode CSV is tab-separated UTF-16. Python 2's csv can't parse those at all, can it?


> Python 2's csv can't parse those at all, can it?

Nope, not without re-encoding to UTF-8 before parsing (learned that the hard way and found out it's easier to just take excel files as input).

P2's CSV module works byte-based, and basically only handles ASCII-compatible supersets, assuming your special characters (quote chars, field and record separators) are straight ASCII.
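For contrast, a minimal sketch of the Python 3 side, where csv works on decoded text. The sample data is invented and built in memory to stand in for a tab-separated UTF-16 export.

```python
import csv
import io

# Stand-in for a tab-separated UTF-16 export (the encoder adds a BOM).
exported = "name\tcity\nJos\u00e9\tM\u00e1laga\n".encode("utf-16")

# In real code this would be open(path, encoding="utf-16", newline="").
with io.TextIOWrapper(io.BytesIO(exported), encoding="utf-16", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
```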


I don't think it's honest to post this without context, and without mentioning that in the five years that passed most of these things were remedied, and indeed, some things were already remedied at the time of his writing.

Some points Armin makes are valid and remain valid for Linux-ish systems, but have been shown and refuted countless times for other operating systems; Python is not a Linux-only show. I won't re-iterate all that here.


You should be aware Armin now has a more-or-less followup post telling people not to do what you just did (i.e., reference his 2011 post as an authoritative "Python 3 is bad" explanation, because both Python 3 and his own opinions have evolved since he wrote that post).


This approach could not have worked for modernizing python. The whole point of the Python 3 thing was to be able to remove warts in the language that could not have been fixed without breaking backwards compatibility. One core part of this is unicode support -- Python had a horrible story for international text before this.

The fact that there are some parts of "modern" Python which could have been implemented in Python 2.7 backwards-compatibly is irrelevant.

Python 3 is not the language designers worrying about minor subjective issues in python like the print keyword or the design of iterators and deciding that they want to change it all. It is the language designers worrying about major issues like international text, realizing that they regrettably will have to break backwards compatibility to fix those, and then just taking the opportunity to revamp things like printing and iteration since they're breaking backcompat in some pretty major ways anyway.


> Python had a horrible story for international text before this.

No, Python had a horrible story for text, and people who worked in limited/sheltered domains didn't realize it. I personally lost all kinds of valuable hours of my life fighting with Python 2's "pretend everything is ASCII until it isn't, then fall over dead" model, because I -- a US citizen, working at US companies, and for quite a while dealing only with English-language content -- still ran into non-ASCII characters with regularity.

And here I'm being charitable; I simply refuse to believe that the overwhelming majority of people who used Python 2 never once had to deal with someone copy/pasting text out of Word or another program that used "smart quotes".
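The smart-quotes case is easy to reproduce. The bytes below are what curly quotes look like in Windows-1252, which is an assumption about where the pasted text came from.

```python
# Curly quotes around "hello", as Windows-1252 bytes.
pasted = b"\x93hello\x94"

# Treating them as ASCII fails on the very first byte.
try:
    pasted.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the right codec yields U+201C and U+201D.
decoded = pasted.decode("cp1252")
```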


Yeah, I agree, my bad on the phrasing.


It's super-sad that "Unicode" is a prominent stated motivation for Python 3.

Unicode in Python 2 was fundamentally broken in that whether it had UTF-16 semantics or UTF-32 semantics depended on how the interpreter was compiled. That's a terrible, terrible idea. However, they could have fixed it by sticking to one option: UTF-16 (which provided compatibility with some interesting things that Python interoperated with like Cocoa and, via Jython, Java).

UTF-16 is a sad legacy mistake, but APIs providing Unicode operations of any kind can be built on top. So UTF-16 is a mistake to begin with, but it's not a blocker for supporting all of Unicode and features targeted at the needs of all writing systems and languages. Java, Windows, the Web Platform (including JS) show that proper i18n can be built on top of the bad but backward-compatible 16-bit code unit foundation.

Now, the _even_ sadder part of Python 3 is that if you decide that UTF-16 is a mistake and want to fix it, UTF-32 is the naive and wrong solution. When a Unicode newbie is told about surrogates, they think that UTF-32 is the answer. But then they waste memory and cache line space (and, if dynamically omitting leading zeros on a per-string basis, the compute and copy cost of promoting to different unit width when adding one emoji). And once the damage is done, someone points out that grapheme clusters are a thing, so they still didn't get O(1) indexing to user-perceived units.
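The width promotion is observable from Python itself. This is a sketch that relies on a CPython implementation detail (the PEP 393 representation in 3.3+); exact sizes vary by version, so only the rough growth is meaningful.

```python
import sys

# CPython implementation detail (PEP 393): each string uses 1, 2 or 4 bytes
# per code point, chosen from the widest character it contains.
ascii_text = "a" * 1000
with_emoji = ascii_text + "\U0001F600"

size_ascii = sys.getsizeof(ascii_text)
size_emoji = sys.getsizeof(with_emoji)

# One astral character means the whole new string is stored at 4 bytes each.
growth = size_emoji - size_ascii
```

On CPython, `growth` comes out at roughly 3000 bytes for a single added character.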

The enlightened thing, of course, is to do what Rust does: use UTF-8 and use iterators on top for accessing pieces larger than a code unit (code point, grapheme cluster). (To my taste, Swift strings are too magic and DWIM-y. At least back when I read the Swift book, it didn't even explain the underlying representation. With Rust, the representation is very explicitly known.)

"UTF-16 sucks" is what Python 3 got right. That UTF-32 (with dynamic leading zero omission on a per-string basis) is the answer is what Python 3 got very, very wrong. The correct answers are either UTF-8 (for a new language like Rust) or holding the nose and making stuff work on top of UTF-16 (Java, JavaScript) without breaking old programs.


To be clear, I don't agree with py3's Unicode model. I think it sucks for the same reasons you do.

I also think that default Unicode is a major improvement over py2 and is enough to justify breaking the language because modern languages should at least have that.


Python 3 fixes no fundamental issues with python 2 and introduced far more warts than it removed. GIL is still there, crummy runtime is still there and unicode is now an even greater mess. I really wonder how many people who bang on about unicode actually have a good grasp of unicode and text processing because python3's unicode design is obviously terrible. I can now access or count code points in O(1) (neither of which is in any way useful) at the cost of tremendously increasing space and time overhead for any basic text operation on non ascii text and having some bizarre hacks to deal with the pretense that stdin, stdout and sys.argv are always text.
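One such mechanism is the `surrogateescape` error handler from PEP 383, which Python 3 uses on POSIX for things like filenames and command-line arguments. A sketch of what it does, with an invented filename:

```python
# A filename that is not valid UTF-8 (a lone Latin-1 byte for e-acute).
raw_name = b"caf\xe9.txt"

# surrogateescape smuggles the undecodable byte through as a lone
# surrogate code point, U+DCE9, instead of raising.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = as_text.encode("utf-8", errors="surrogateescape")

# A strict encode (e.g. writing the name to a UTF-8 stream) fails.
try:
    as_text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The resulting `str` round-trips back to the OS fine, but it is not a well-formed Unicode string, which is where the impedance mismatch shows.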


It most certainly fixes support for Unicode on Windows in terms of filesystems paths, OS function boundaries and the console. Some of these fixes have even taken until 3.6 to get implemented.

As someone who writes cross-platform code, Python 3 was a breath of fresh air after fumbling around in the dark with Python 2.


I can definitely believe that – but windows has basically lost[1] and has a much worse text model to boot. Like Java and unlike python 3, they at least have the excellent excuse that this was not obvious at the time. And under unix the impedance mismatch has definitely increased. Not a good trade.

[1] I wouldn't count them out, but they're definitely on the back foot as bash inclusion shows.


> I can now access or count code points in O(1) (neither of which is in any way useful)

Oh, yeah, I agree that Python's unicode model isn't great. I like Ruby's, and Swift has its own cool thing going where it's very explicit about the uselessness of code points.
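A short sketch of why counting code points is of limited use: `len()` on a Python 3 `str` reports code points, which need not match what a user would call a character.

```python
import unicodedata

# An accented "e" written as a base letter plus a combining acute accent.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# One user-perceived character, two different code point counts.
len_decomposed = len(decomposed)   # 2
len_composed = len(composed)       # 1

# A flag emoji is two regional-indicator code points with no composed form.
flag = "\U0001F1EA\U0001F1F8"
len_flag = len(flag)               # 2
```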

However, I think that Python 3 having some form of default unicode support is way better than what Python 2 had. It could be improved (backwards-compatibly too!), but it passes my minimum bar for a "modern" language's text story.


In all seriousness, I'd much rather python3 had kept str, phased out the unicode type altogether, got rid of all the harebrained locale crap (sys.{get,set}defaultencoding etc) and just provided tooling (collation, regexp, denormalization etc.) for working with utf-8 encoded byte-'str's. This would probably have been a much smoother transition and ended up with a vastly superior result.

I'm pretty sure the people complaining here about how python2 str only supports ascii and they couldn't paste their smartquotes were bitten either by windows or by unnecessarily bad unicode/str interactions due to python not just hardcoding utf-8 auto-conversion. That is the only sane thing to do (Your locale isn't *.UTF-8? Well sucks to be you. By now even the Japanese and Chinese seem to slowly have come around to the utf-8 bandwagon, and they had better reasons than most).

I might be wrong, but I can see basically 3 non-idiotic ways to do text in a programming language:

1. arrays of utf-8 bytes (Rust, Go). Python was close to that already and then messed it up. Indexing indexes into bytes in O(1).

Upsides:

- efficient: most text you're going to get is already utf-8 and the rest should be converted on ingress/egress; html/css/most code will be represented fairly efficiently even if the body text is mostly say, Chinese; you can do a lot of text processing by just working on the ascii range (e.g. CSV parsing).

- sane: no BOM, no 32 bit encoding of 21 bit quantities etc; unix-compatible

Downsides:

- can't efficiently access individual logical characters or know the fixed-font width of the text

- normalization is kinda nasty (concatenation etc.), in practice people just tend to ignore that

- hard to constrain to only valid utf-8 without significant downsides

- maybe not that beginner friendly
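
A rough sketch of option 1 using Python 3's bytes type as a stand-in for a utf-8 byte array. The sample line is invented for illustration. Splitting on an ASCII comma is safe because every byte of a multi-byte utf-8 sequence is >= 0x80, so it can never be mistaken for an ASCII delimiter.

```python
# A CSV-ish line with non-ascii fields (hypothetical sample data).
text = "名前,Zo\u00eb,東京"
line = text.encode("utf-8")

# Work purely on bytes: an ASCII delimiter never occurs inside a
# multi-byte utf-8 sequence, so this split cannot cut a character in half.
fields = line.split(b",")
decoded = [f.decode("utf-8") for f in fields]

# The flip side: O(1) indexing is by byte, so an arbitrary slice can
# land in the middle of a character and stop being valid utf-8.
try:
    line[:1].decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False

byte_count = len(line)        # number of utf-8 bytes
code_point_count = len(text)  # number of code points
```

Here the byte count is twice the code point count, and the one-byte slice fails to decode, which is the "can't efficiently access individual logical characters" downside in practice.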

2. use some non-array type that doesn't allow for indexing (e.g. ropes), probably using (mostly) utf-8 for internal encoding.

3. arrays of logical characters. That means you need to make up fake characters to handle graphemes that are not directly representable as a single pre-composed code point in unicode. The upside is that this has beginner friendly semantics in a sense and allows indexing on what's meaningful in the domain (graphemes). The downside is that I can't see how to do this without a lot of complexity and some nasty gotchas. This seems to be what perl6 does https://design.perl6.org/S15.html#NFG
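
To make the code point vs. grapheme gap concrete, a small sketch with the stdlib unicodedata module. The two sample strings are my own picks; the point is only that normalization collapses some graphemes to one code point but not all of them.

```python
import unicodedata

# "e" followed by a combining acute accent: one grapheme, two code points.
decomposed = "e\u0301"
# NFC finds the pre-composed form, so this one collapses to a single code point.
composed = unicodedata.normalize("NFC", decomposed)

# A flag emoji is a pair of regional indicator code points. There is no
# pre-composed form, so NFC leaves it as two code points for one grapheme.
flag = "\U0001F1E9\U0001F1EA"
flag_nfc = unicodedata.normalize("NFC", flag)

lengths = (len(decomposed), len(composed), len(flag_nfc))
```

len() counts code points, so the flag still reports 2 after normalization. An "array of logical characters" design would need its own synthetic representation for cases like that.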


as mentioned in a comment above, the choice to take 2.7 behaviour when 3 behaves differently means this wannabe-python '2.8' is neither backward nor forward compatible
