As someone who writes a _lot_ of Python code (mostly in 3 but with occasional switches to 2), who maintains several libraries which work in both 2 and 3, and as someone who uses a wide range of libraries in stats, machine learning, networking, web development, etc. in my python 3 work...this piece seems totally disconnected from reality. The problems it describes are relentlessly overblown when they're not simply manufactured from whole cloth.
Put another way: the differences between 2 and 3 are not great, the vast majority of libraries you'd want to use (i.e. are actively maintained or of at least good quality) work in 3, and while I don't doubt there are great piles of Python 2 code moldering away in big "enterprise" apps and those sorts of places, it's ever been thus in that space, no matter the language, and it doesn't pose any sort of existential threat to Python.
(edit: I used to recommend "Learn Python The Hard Way" to newcomers, and have just kept reflexively doing that over the years because I wasn't aware of a better resource. But if this article has accomplished one thing, it's that it's spurred me to look for a replacement)
I found it really helpful.
From Py1.5.2 on, I've done the same.
I wish every language was as well documented (Perl6 is pretty good) as Py, especially the changes between versions. Rough order of reading: What's New, Tutorial, PEPs, Library references. Why is it good? It is canonical, pushed out as a release is done, comes with the install (cf: $ pydoc foo) and is free.
Python is also rather outstanding in the sense that there are tons of good recordings of PyCon talks that explain almost every language feature in very approachable ways.
The quality of those talks is usually great too, it seems that PyCon consistently hosts very good speakers.
I'm only starting to realize, as I branch in to other things, that this might not be the norm.
This (+ no longer allowing "encoding" of bytes or "decoding" of unicode) means that you have a much higher chance of writing correct code when it comes to handling encodings.
I stopped there.
Then I read more for laughs.
I kind of feel like I could fix py2to3 if I had a sed and awk layer in there... maybe a fun project.
I like what you said about self-obsessed project. It's clear that you have a very well reasoned position, not just opinion. What you say makes irrefutable sense.
At least that's the conclusion that is repeatedly forced upon C, no matter what the argument is.
I'll say this: at PyCon 2016 I attended some dev sprints/hackathons with the Python 3 developers. Python 2 is nowhere on their radar; it's dead. They have moved on. There will be no additional compatibility layers or any of that. All of the mainline libraries that everyone uses (Django, SQLAlchemy, etc.) have moved on to Python 3. If you haven't, you should too.
No. The rest of the community is much better. In fact, his behavior is an aberration.
py3.3 support is at alpha level. py3.5 support is in the pipeline and already available for testing. It's unlikely they'll do any effort that's targeting py2 specifically anymore.
It's extremely badly conveyed, but that's what I got out of it by reading on.
I do disagree with him on all the rest though, especially strings. It didn't "just work" before, it failed silently and who knows how much disaster he or his readers have caused because of it. Now at least you can't be wrong anymore. For a language heavily used on the web, it's hugely important to understand where your strings are coming from and where they're going to, and how.
Besides, Python is not a frigging "beginners language". Just because Python (and admittedly especially Python 2) is generally easy to grasp as a first language doesn't mean that's its purpose, nor should it constrain itself toward that goal. It's used in a million different areas, including pretty sensitive ones. It's now become a more mature language, reaching for exactitude and consistency. Just because Zed no longer has an easy toy language to point script kiddies to so they can "learn to program" by reading one website doesn't mean Python is to blame.
But people talking about Turing completeness in a real programming language (usually in the form of "x is Turing complete, therefore you couldn't ask for more") almost always haven't got a clue.
There is a fun quote about that...
"There are those who tell us that any choice from among theoretically-equivalent alternatives is merely a question of taste. These are the people who bring up the Strong Church-Turing Thesis in discussions of programming languages meant for use by humans. They are malicious idiots. The only punishment which could stand a chance at reforming these miscreants into decent people would be a year or two at hard labor. And not just any kind of hard labor: specifically, carrying out long division using Roman numerals. A merciful tyrant would give these wretches the option of a firing squad. Those among these criminals against mathematics who prove unrepentant in their final hours would be asked to prove the Turing-equivalence of a spoon to a shovel as they dig their graves."
-- Stanislav Datskovskiy
As such, I think it represents one of the most enlightening esoteric languages out there together with Brainfuck.
"Garbage collection is simulating a computer with an infinite amount of memory".
We both agree, I'm just using this as a pretext to share this intriguing piece of knowledge. I find the concept much more intuitive than "getting back unused memory".
Especially when you look at the 90% memory usage on your OS, it still makes sense with the "infinite memory simulation" definition.
I am unsure when the idea of python3 (first as Python 3000) became general knowledge, but it must have been some time between 2000 and 2004. Python3 was then released in 2008.
So the time between obsoleting the old version, before the new version was available, was notably shorter with python.
There's also a good chance I'm confusing perl and parrot.
> Python 3 has been purposefully crippled to prevent Python 2's execution alongside Python 3 for someone's professional or ideological gain.
I can't tell if Zed's referring to python3 doing a fork()/exec() of a python2 not working correctly or if he wants/expects some kind of inter-language import or "linking" among files written for respective language versions. What's he getting at?
Is there really something that prevents you from executing python2 at the same time as python3? (I tried a simple program "os.system('python3 -c \'print ("hello")\'')" and it worked just fine.)
N.B.: I haven't worked with Python so if you can actually do this somehow let me just say that that's not the impression the article left me with.
If I were a library maintainer on py2 I would have felt betrayed by py3. Suddenly print is a function? 'yield from' won't be available on the py2 branch? Yikes.
C++ is an example of the serious negative consequences of language change. The language has been in flux for decades and the 3rd-party code shows it. Some fraction of good libraries rely on c++14 features. Many others duplicate in library code features that have been available in the language since c++11 or earlier. Boost, the 'missing standard library' for cpp, is something you either love or hate.
The end result for c++ in 2016 is that large shops either (a) don't use it or (b) don't use the new features and have a slow acceptance process for 3rd-party libs.
You can argue that the C community is healthier than C++ -- a ton of important system libraries are written in C and have bindings to tons of languages. Yes, they have problems (openssl cough), yes nobody understands unspecified behavior across compilers, but C has done a good job of supporting lots of platforms and staying relatively stable.
Betraying the community can kill a language. Perl learned this lesson the hard way. Let's hope python can be saved.
from __future__ import print_function
Now you've got the print function in python2.
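For example, a minimal Python 2 (2.6+) session:

    from __future__ import print_function

    # print is now a real function in this module, with py3 semantics
    print("hello", "world", sep=", ")  # -> hello, world
    print("no newline here", end="")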
> 'yield from' won't be available on the py2 branch?
And 2.6 doesn't get set literals. And 2.5 doesn't get "with" statements. And ... There's got to be a cutoff somewhere. Or we'd just have an eternal 1.0 with all the features backported to it.
Which is exactly the opposite for c++11 - it doesn't break backwards compatibility, but I really want those new features. I think even if it broke backwards compatibility, I'd still switch to the new c++.
At work we use py2 and c++11...
The harder part is the string handling. In Python 2, you have one class, string, which is performing two different jobs. It is acting both as a holder for text, and as a holder for binary data. This sort of works, because ASCII looks like binary data if you squint at it. If your users are English-only, then this probably sounds like a reasonable thing. Once you need to start supporting Unicode, this ambiguity causes a world of pain.
On Python 3, this ambiguity is removed. You have bytes, which are binary data, and strings, which are text. No longer is there one class trying in vain to represent both concepts. This is also why automatic translation doesn't work: the translator doesn't know which concept you were trying to use when you used a Python 2 string. This is a rather low-level change, which is why it took libraries so long to update, and why it couldn't be done without breaking backwards compatibility.
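A minimal sketch of the split in Python 3:

    text = "héllo"                # str: a sequence of Unicode characters
    data = text.encode("utf-8")   # bytes: b'h\xc3\xa9llo'
    assert data.decode("utf-8") == text  # conversions are always explicit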
What? Have you used Python 2? No: in Python 2, you have two classes; one is called "str" and represents a sequence of bytes, and one is called "unicode" and represents text. The former is used while interacting with network protocols and files, and the latter is what you use internally in your program: at the boundaries you use .decode and .encode to convert using character encodings. (BTW: Python 3 actually got this wrong for years, and even now I think it is only fixing the problem with a mitigation :/ filenames do not have an encoding: they should be of type "bytes", not of type "unicode", and Python 3 seriously shipped with an implementation of "list files in directory" which returned names as "unicode" and any names which failed to convert were just skipped.)
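If it helps, that boundary pattern as a minimal Python 2 sketch (assuming a UTF-8-encoded in.txt, made up for the example):

    # -*- coding: utf-8 -*-
    # bytes at the boundaries, unicode inside
    with open('in.txt', 'rb') as f:
        raw = f.read()                # str (bytes) from the outside world
    text = raw.decode('utf-8')        # unicode for all internal logic
    with open('out.txt', 'wb') as f:
        f.write(text.upper().encode('utf-8'))  # bytes again on the way out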
Then, any text in your program should use the u"" syntax to indicate it is of type unicode. Sure: there are some really annoying implicit conversions in place that allow mistakes to be made, but WTF: they absolutely didn't need to rename "str" to "bytes" and "unicode" to "str" while simultaneously changing tons of other things in the language (the ability to pick and choose is amazing for porting) and making pointless related changes like removing the u"" syntax (which yes, I realize they added back later... years later... many many years too late). They should have just given me a way to "poison" str/unicode objects so they refuse to participate in implicit conversions, which would have let me find the corner cases in the libraries I use that are doing things wrong so I can report them to their maintainers and get them patched; after a couple years of people throwing .poison in a bunch of places, users would have been ready to flip a VM-level "just don't convert them anymore".
Seriously: I've been programming in Python (2, not 3000) since 2007 as one of my primary programming languages; I do all of my web development in Python, for a website which is used around the world by millions of people and is not only itself translated into tons of languages (by both professionals and volunteers) but deals with tons of user-generated content that is accessed via tons of different channels (APIs from various websites, native network protocols, and tons of file formats, as all the content indexing is also written by me, in Python), and dealing with these issues in Python 2 is simply a non-issue: in fact, I would say it is almost trivial (and in fact, the ways it tends to break in Python 2 are often quite similar to the way things break in Python 3, as it is easy to set up situations where the implicit conversions essentially fail).
I agree that there were some changes that were unnecessary, like the removal of the u"" syntax. That said, I think that the fundamental change, which was to have the thing named "string" be a string, was the right choice. Python has always been about having the obvious choice be the correct choice. Telling everybody that "quoted text" is not text runs counter to this.
> ... one is called "str" and represents a sequence of bytes ... The former is used while interacting with network protocols and files ... Python 3 actually got this wrong for years ...
Actually, some Japanese developers have suffered from this design mistake in Python 3.
If you read Japanese, please see
Yes it does:
As do most corporations. If you want a job in Python, you had best know 2.7.
Yes I do multilingual string processing a LOT. I worked in SMT for about 3 years. Python 2.7 was the only practical option.
Python 3 does not fix the GIL. That would be enough to get me interested again.
A language that changes will alienate people, and lose people. Perl saw this. Python is seeing this. The way to guard against this is to keep backwards compatibility. That generally doesn't allow enough change in languages that provide enough constraints to make them popular for large engineering projects, whether open source or commercial. Lisp doesn't need to change much, it's malleable enough in many respects that you can implement whatever you need, but good luck getting large engineering projects done and maintained using it. Perl has the same problem.
The only solution I see to stay relevant is to allow alienating users, but make sure those changes that alienate users are good enough to draw substitutes, enough to keep steady state, or preferably still grow slowly. Maybe prior users will even swing back in occasionally, and without the false sense of betrayal, they'll actually find they like the state of the language a few years later. For a while, at least.
This seems like the end of the world to Python community members, and it seemed like the end of the world to Perl community members because of the prior status those languages held, where they are or were at the top of their respective niches. I doubt the Scala community worries about exactly the same things.
Put another way, did you expect to still be writing Python the same way in 20 years, with only the popular libraries changed? If you did, did you think about what that would mean for the language, and what the community would look like at that time? I'll tell you. Perl. That's not necessarily a bad thing. I write Perl professionally, every day. I love it. It's not the worst thing, being stable and reliable. But you don't get to be top language in your niche and sit on your laurels at the same time.
Change is good. Embrace the change. If you're lucky, you'll use many languages to program in your life and be happy. The only way I see anyone being happy using any single language forever is to bury their head in the sand and ignore everything else going on around them.
But is there any C11 compiler out there that will refuse to compile C99?
C is not a good language. It was a good language, but that was a few decades ago. We've progressed past the point where C's shortcomings are excused by there not being alternatives that address those shortcomings while offering comparable features. At this point, we're just finally edging past the local maximum created by C's exceptional popularity and the knock-on effects of that popularity, such as most operating systems (and all popular open source ones) being written in C.
Nim can definitely produce C-compatible libraries, seeing how it compiles to C itself.
Go 1.5 added support for this.
It is cheap to deal with the ambiguity in the compiler using compile options. You could go the Java way and have a separate class path (e.g. in PyPy, as CPython 2.x is dead).
But this does not allow you to mix and match code for the older version anyway. Java was designed from the start to allow this, and it is why it's so stagnant.
Why do you single out Scala as being particularly different to Python and Perl?
It's an industrial tool, with an ecosystem provided by an industry, used to solve industrial problems.
C++ can't change. It's a feature. Unless the code uses some non-standard extension it is guaranteed that the code you write today will compile tomorrow.
Refactoring? Rewriting? We are talking of decades old codebases with millions of lines, filled with bugfixes to handle odd-but-critical cornercases.
Software that is sold for hundreds of millions, that run businesses that execute billion dollar projects.
Stability, utility, man.
Not so graceful, perhaps, or cuddly. But it works.
Yeah, the language is still a bitch but it's at least a stable one with support from several industrial vendors.
From cognitive or software design point of view it's a disaster though - people should be thoroughly vetted in some other more sane language before allowed to write C++ :)
Can you explain why Python would need to be saved? I mean, every week on r/MachineLearning there is at least one new Python deep learning framework being launched; I wonder if you can mention any other language that is that healthy.
About the only thing that is dead like that seems to be the Enthought project for drawing and creating UIs for graphs.
C++ is so unconcerned with modules they've been kicking the feature down the road spec by spec. I think it's now 'post-17'. (could be wrong).
Rust is a language that's in the perf class of c++ but has built-in lifetime support and built in modules. It's ten times easier to do database or, say, SDL interaction in rust. And rust is barely 1.0!
I don't know anyone who programs who doesn't use a ton of libraries. C++ just hasn't prioritized this part of programming.
Nobody is having discussions about C++14 vs C++98; the former is better, period. A C++14 compiler will compile C++98 code just fine in most cases. If it doesn't, you get a compile-time error.
People are complaining about the opposite, that C++ is keeping compatibility for too long and that it has too many features. Both strategies have their pros and cons, but breaking seems to have more negative effects.
C has done a great job of supporting buffer overflows on lots of platforms and injecting safety errors in all software that uses those libraries.
From the outside looking in (as a non-C++ developer), for me the bigger problem is honestly that they left in the warts rather than just paving around them. Those who have followed C++ for its entire life cycle have probably been able to mostly keep up with what language features were mistakes and what the best way to do things is, but I can't imagine myself ever picking up C++ now and hoping to have any clue at all what to avoid without spending years learning the hard way (no pun intended).
As a very basic example from C# (which I do use professionally), the fact that untyped collections still exist (and things like IEnumerable with explicit casts instead of IEnumerable<T>) is just pointless cruft that we'd be better off without. As far as I'm concerned, no code should ever depend on them, and any code that does should be forcibly broken to force people to fix it. We treat security flaws seriously by forcing people to fix things, why should we not do the same with features that are essentially bug-magnets?
I mean this is an age-old argument but MS comes down very strongly on the other side because, like... what if nobody is working on that C# 3.0 code anymore but you need it for whatever reason? Well, I guess you're out of luck in the world where they just break ArrayList.
... Not only is this blatantly false, but the author acknowledges it as such and seems to think that it is still OK to write this sort of sentence. Backing a pure falsehood with anecdotal support from "actual Python project developers" does nothing to change its veracity. I can't respect anything else in this article after seeing this kind of sensationalism.
Perhaps the author doesn't understand Turing-completeness (I'll give the author the benefit of the doubt), but even if so it's inexcusable to throw around this sort of technical language this casually.
The author has this exactly backwards. It is Python2 that makes the weird distinction between the string and the unicode type. Strings (and bytes) in Python3 are done (almost) right. (I say "almost" because Python bytes are constrained to be 8-bits wide, but that is not a serious problem.)
He also clearly doesn't understand Turing-completeness.
The first time I was working on a project which required me to work with characters beyond ASCII, I was so confused. To this day, I'm still not sure I can actually explain what str.encode and str.decode do in Python 2.
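For the record, here's the sketch I eventually pieced together (Python 2):

    b = 'caf\xc3\xa9'              # str: raw UTF-8 bytes
    u = b.decode('utf-8')          # unicode: u'caf\xe9'
    assert u.encode('utf-8') == b  # and back again
    # the confusing part: str.encode and unicode.decode also exist, and
    # implicitly round-trip through ASCII first - a classic source of
    # surprise UnicodeDecodeErrors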
Admittedly I've never used Python 3 for a project where I had to deal with international charsets - and his point about the concatenation of unicode and bytearrays certainly seems a valid point - but it still seems infinitely better than what Python 2 did.
This is one of those subjects where I feel like being a non-English programmer has given me surprisingly valuable experience.
The operation of concatenating a text value (string) and binary data (byte array) simply does not make sense, because there are two plausible interpretations. What do you expect the result to be?
1. Binary data with the text appended. Then you need to specify which encoding you want the text to have. Once you do (a.encode('utf-8')), all will be fine. You could argue that a conversion to utf-8 should be implicit since that's almost always the correct choice, but you'd be surprised how fast you run into trouble with other systems.
2. Text with the binary data appended after interpreting it as text. Again, which encoding? You'll get very confusing and terrible results if you do not specify this. If you specify this (b.decode('utf-8')) everything will be fine.
The only other alternative I accept is a type error. I'm guessing zed would be even angrier if that was the case. I'm not quite sure what Python 3 does, but at least it somewhat communicates that whatever interpretation it does is internal and arbitrary. The Python 2 result is ridiculous.
Python 3.5.2 (default, Nov 7 2016, 11:31:36)
[GCC 6.2.1 20160830] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> u'' + b''
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't convert 'bytes' object to str implicitly
>>> u'' + bytearray()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't convert 'bytearray' object to str implicitly
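And the explicit versions do what you'd expect:

    >>> u'x' + b'y'.decode('utf-8')
    'xy'
    >>> u'x'.encode('utf-8') + b'y'
    b'xy'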
In zed's defense, it still does seem like a rather daunting error for a beginner programmer - although like you, I have no good viable alternative to suggest.
Aren't all bytes 8 bits wide?
I think in the context of the original answer there's also the problem that char types aren't necessarily 8 bits wide, e.g. wchar etc.
EDIT: Downvotes? Seriously?
I didn't realize that this concept was referred to by the term "byte". But this clarifies how you are using that term.
Given that usage, I'm not sure why 8-bit bytes are an issue at all for a language like Python that is supposed to be hardware-agnostic; the whole point is that programs should not care about how the data is physically represented in the hardware. Python 3 got rid of the distinction between int and long for the same reason. The only reason the "bytes" object exists at all is to represent binary data, for example data that gets read from/written to files, network sockets, and other I/O, and AFAIK bytes in such data are always defined to be 8-bit chunks of data. (If you really insist on having Python representations of hardware data chunks, there's the ctypes module.)
It is common to see "byte" used to mean 8-bit-byte, and also to see the term used for wider units of binary data. If you want to be unambiguous, the word "octet" is commonly employed to mean 8-bit byte.
> a language like Python that is supposed to be hardware-agnostic
That is exactly why you want wide bytes. If you want to represent an integer >255 as fixed-width binary data and you don't have wide bytes, then you have to choose an endianness convention. If you have wide bytes then you can be endianness-agnostic.
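Python's own struct module shows the choice you're forced to make:

    import struct
    # 1000 doesn't fit in one 8-bit byte, so a byte order must be chosen
    print(struct.pack('<H', 1000))  # b'\xe8\x03' (little-endian 16-bit)
    print(struct.pack('>H', 1000))  # b'\x03\xe8' (big-endian 16-bit)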
> AFAIK bytes in such data are always defined to be 8-bit chunks of data
This is usually true, but not always. For example, UTF-16 and UTF-32 data is specified as 16-bit and 32-bit bytes. This is why you need a BOM.
> there's the ctypes module
Yes, this is why I said it's not a serious problem.
But Python 3 doesn't represent integers that way, because Python 3 doesn't put any size limit on integers. There's no way to represent integers of potentially unbounded size as fixed-width binary data.
This is one of the things I meant by "hardware agnostic"--I didn't mean "able to represent the different register sizes of different hardware"; I meant "having a representation that doesn't have any concept of register size at all".
(Python 2 does have a separate int type, which IIRC is either 32 or 64 bits wide depending on the platform. AFAIK this is just a wrapper around the underlying C int, so it isn't affected by any 8-bit byte restriction.)
> UTF-16 and UTF-32 data is specified as 16-bit and 32-bit bytes. This is why you need a BOM.
Um, no. UTF-16 and UTF-32 data is specified as 16-bit and 32-bit code points. You need a Byte Order Mark (BOM) to specify in which order the 8-bit bytes that make up a 16-bit or 32-bit code point appear.
From the Unicode FAQ:
"Q: What is a UTF?
"A: A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point (except surrogate code points) to a unique byte sequence. The ISO/IEC 10646 standard uses the term “UCS transformation format” for UTF; the two terms are merely synonyms for the same concept."
Further down, under the table listing the UTFs and their properties:
"In the table <BOM> indicates that the byte order is determined by a byte order mark"
And in the answer to the next question after that:
"UTF-16 and UTF-32 use code units that are two and four bytes long respectively."
Yes, obviously. Variable-length bytes is a feature that Python does not have. (AFAIK the only language that has variable-length bytes is Common Lisp.)
> There's no way to represent integers of potentially unbounded size as fixed-width binary data.
Yes, obviously. To be excruciatingly precise I should have said: if you want to represent a binary data type with N instances where N>255 (e.g. an integer in a range from 0 to N where N>255, or from -N to N where N>127) then you either need bytes wider than 8 bits, or you need to worry about endianness.
> UTF-16 and UTF-32 data is specified as 16-bit and 32-bit code points.
No. Unicode characters are specified as code points, which are just numbers with no specific representation. UTF-16 and UTF-32 are encodings of unicode which use 16-bit-wide and 32-bit-wide bytes respectively to encode those code points. It is only when you serialize those encodings to octets that you need a BOM. When UTF-16 and UTF-32 are used internally to one machine you don't need a BOM.
There are also, as I pointed out before, many algorithms (particularly in cryptography) that are specifically designed to operate on fixed-width binary data wider than 8 bits.
That wasn't my point. My point was that, as far as integer objects are concerned, there is no concept of "byte" at all, not even a fixed length "byte" of 8 bits. There is no concept of "chunk of data operated on as a single unit by the hardware", and there is no concept of "unit of data of the underlying storage for this object". There are just objects representing integers of any value you like.
The only kind of object in Python for which a concept of "byte" in your sense exists at all (if we leave out modules like the ctypes module) is the byte string (str in Python 2, bytes in Python 3). For those objects, you are correct that "byte" is always 8 bits in Python, and there is no way to vary it. But that has nothing to do with the underlying hardware; it is just the data model that Python has chosen for binary data.
> if you want to represent a binary data type with N instances where N>255 (e.g. an integer in a range from 0 to N where N>255, or from -N to N where N>127) then you either need bytes wider than 8 bits, or you need to worry about endianness.
Only if these things are exposed to you at all. In Python, you don't have to worry about this for integer objects because you don't see the underlying storage at all; things like what endianness is used to store integers > 255 are implementation details of the interpreter. Frankly, I find that to be a good thing: if I wanted to worry about stuff like that, I'd be programming in C, not Python. But of course different people will have different needs and preferences.
> UTF-16 and UTF-32 are encodings of unicode which use 16-bit-wide and 32-bit-wide bytes respectively to encode those code points.
I understand that you prefer to use this terminology; but my point in that particular comment was that it is not the terminology that is used in the Unicode documentation, which is the closest thing we have to an "official" terminology for Unicode. As is shown by what I quoted, that documentation clearly uses the term "byte" to mean 8 bits; it refers to UTF-16 and UTF-32 as encodings that use 2 or 4 "bytes" to encode a code point. It does not use the term "byte" to refer to 16 or 32 bit chunks of data. So it seems misleading, or at least potentially misleading, to use the term "byte" to refer to anything other than 8 bits when talking about Unicode encodings, since the Unicode documentation does not give the term "byte" that meaning.
Let me try to re-state the point I was trying to make: the only reason there is anything special about 8 bits as a unit of data is because hardware is built to operate on units of 8 bits at a time. Whether you call these things "bytes" or "words" doesn't matter. You can call them florbs for all I care.
The fact that florbs are 8 bits wide is mainly for historical reasons. There is no particular reason to make the magic number be 8 (except that it's a power of 2), and indeed there have been historical examples of hardware with florbs of other sizes. The Intel 4004 had 4-bit florbs. The PDP-8 had 12-bit florbs.
Modern hardware is built to natively handle florbs of multiple sizes, typically 8, 16, 32 and 64 bits, but sometimes other sizes as well (e.g. 80 bits for extended-precision floats, or 128 for SSE instructions). Because of this, many algorithms are designed to operate on binary data in these sizes, and because of that it can be helpful to have binary data in these sizes exposed in your programming language. Python exposes 8-bit florbs, but not other sizes of florbs. Common Lisp has arbitrarily sized florbs. If you're doing certain things (like crypto) having access to florbs larger than 8 bits can be useful. That's all.
More precisely, hardware at the time that "the natural unit of data" was becoming a standardized convention was built to operate on units of 8 bits at a time. As you say, the 8-bit convention is mainly for historical reasons, and hardware today can handle larger chunk sizes.
> Python exposes 8-bit florbs, but not other sizes of florbs
To the extent it exposes florbs at all, yes. But that does not mean you are stuck with 8-bit operations in Python in all cases. For example, operations on Python floats use the IEEE standard data chunks, which, as you note, are not 8-bit. Operations on Python integers are not restricted to using 8-bit CPU instructions; they will use whatever register sizes the hardware allows (more precisely, the hardware the Python interpreter was compiled for--if you run a 32-bit Python interpreter on a 64-bit machine, you won't be using 64-bit register operations). It is true that the programmer has no control over the data chunk sizes that Python uses; they are hard-coded into the interpreter.
> Common Lisp has arbitrarily sized florbs.
Can you give an example of how this capability in CL is used?
Why not just use C? Because the C language spec has "features" that allow a C compiler to optimize away parts of your code that you really don't want optimized away in crypto code. For example, if you do this:
for (int i=0; i<SIZE; i++) secret_key[i]=0; // Clear secret key so attacker can't read it
clear_memory(secret_key, sizeof secret_key);
The current translation unit must really call clear_memory and really pass it the pointer to the secret_key, whose contents have to be settled. The writes performed by clear_memory really have to take place, because the caller depends on it; clear_memory has no idea that the object is dead (having no next use) in the calling translation unit.
The main problem is not getting the clearing not to be elided, but with stray copies of the data being elsewhere. The C programmer doesn't have visibility and control over all the storage areas where a datum may end up. If secret_key is really cleared, is that enough?
Common Lisp for example can extract/set a byte given its width and its position:
CL-USER 145 > (write (ldb (byte 11 2) #b1100101011010110) :base 2)
1010110101
693
Historically this comes from the DEC PDP-10 instruction 'load byte'.
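The same operation is a one-liner to sketch in Python; it just isn't built in:

    def ldb(width, pos, n):
        # extract a "byte" of arbitrary width starting at bit position pos
        return (n >> pos) & ((1 << width) - 1)

    print(bin(ldb(11, 2, 0b1100101011010110)))  # 0b1010110101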
I upvoted you to compensate. :-)
>> Currently you cannot run Python 2 inside the Python 3 virtual machine. Since I cannot, that means Python 3 is not Turing Complete and should not be used by anyone.
> I stopped there.
The author is - quite obviously - saying the Python 3 VM should run Python 2 code, in the same way the JVM and CLR run other languages. That the Python 3 VM doesn't run Python 2 code - and that the Python maintainers apparently say it can't - means the Python 3 VM isn't Turing complete.
This is nonsense obviously, but it's a way of pointing out that the claim that the Python 3 VM can't be made to run Python 2 code is also nonsense.
This is very, very obvious. Yet HN have gone nuts and rather than discussing the points brought up - why doesn't the Python 3 VM run Python 2 code? - you instead make the laughable claim that the author doesn't know what Turing completeness is.
The author even explicitly states they've already created a Python 3 version of their work and HN accuses the author of ulterior motives: "having an out of date work".
It's like HN have gone out of their way to cut out quotes from the article and not actually bother to read it.
The author is a better programmer than I am, and also than most in this thread. They're also above the programming mob mentality that sometimes appears on HN, and I can very much see why, given the responses to this article.
To be clear, Turing completeness is Zed Shaw's baseless launching point for discussing the incompetence or political nefariousness of the Python language devs.
Turing completeness does not mean a perfect translation between all languages exists. Genuine ambiguity can exist. If your language goes from 1 to 2 types, a perfect solution may be <permanently> out of the question.
> This is very, very obvious.
Um...what? That in itself is nonsense.
The reason he brought up Turing completeness is to make it seem like Python 3 breaks some fundamental programming 'law' that all programming languages should adhere to, in order to further his point.
Sure, maybe the Python 3 VM should run Python 2 code, but he doesn't need to puff up his argument like this. It just makes him look uneducated.
There is no "CPython VM": no version of Python guarantees bytecode compatibility between versions. The bytecode is just an optimisation.
For a Python interpreter to support both Python 2 and Python 3, it would need to have some source-level way of telling the two apart. Bytecode doesn't even come into this.
> That the Python 3 VM doesn't run Python 2 code - and that the Python maintainers apparently say it can't - means the Python 3 VM isn't Turing complete.
No it doesn't. That's not what Turing Complete means.
Here's a pretty trivial way to demonstrate both Python 2 and Python 3 are Turing Complete: implement a Brainfuck interpreter in both. Brainfuck is Turing Complete (memory limits aside), and thus because you can implement a Turing Complete language in both Python 2 and Python 3, both are trivially proven as Turing Complete.
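For the doubters, a minimal sketch of such an interpreter (no ',' input; output collected into a string) that runs unchanged under both 2 and 3:

    def bf(program, tape_len=30000):
        tape, ptr, pc, out = [0] * tape_len, 0, 0, []
        jumps, stack = {}, []
        for i, c in enumerate(program):       # pre-match the brackets
            if c == '[':
                stack.append(i)
            elif c == ']':
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        while pc < len(program):
            c = program[pc]
            if c == '>': ptr += 1
            elif c == '<': ptr -= 1
            elif c == '+': tape[ptr] = (tape[ptr] + 1) % 256
            elif c == '-': tape[ptr] = (tape[ptr] - 1) % 256
            elif c == '.': out.append(chr(tape[ptr]))
            elif c == '[' and tape[ptr] == 0: pc = jumps[pc]
            elif c == ']' and tape[ptr] != 0: pc = jumps[pc]
            pc += 1
        return ''.join(out)

    print(bf('++++++++[>++++++++<-]>+.'))  # prints 'A'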
Zed's argument is like this: ARM processors can't run AMD64 code, thus ARM processors are not Turing Complete.
That argument is nonsense.
> No it doesn't.
You do realise the next four words in the post you're replying to is:
> > This is nonsense obviously
And then a reason why the claim is being made?
To continue your CPU analogy (I actually read your post), it's like Intel saying x64 can't run x86 binaries, then someone else saying "why not, isn't it Turing complete?" as a joking way of pointing out that it should.
It's perfectly reasonable to read Zed's post and argue that the Python 3 VM should not be able to run Python 2 binaries. But not even bothering to read or comprehend the post (or this thread) and claiming he doesn't understand Turing machines is a waste of everyone's time.
* We know AMD invented x64, and x64 does run x86 binaries, and the IA64/Itanium story, but anyway.
> but it's a way of pointing out that the claim that the Python 3 VM can't be made to run Python 2 code is also nonsense.
And that's why I wrote:
> For a Python interpreter to support both Python 2 and Python 3, it would need to have some source level way of telling the two apart.
[And now I have no way to double check what you wrote originally because you overwrote your original comment.]
As for "Machiavellian Python maintainers", that's real. There have been explicit attempts to apply pain to Python 2.7 users. This has decreased since von Rossum got stuck maintaining Python 2.7 code at Dropbox as his day job.
I don't think you are lying, but it seems extraordinary to me, so I'm interested in seeing some context and I'm not sure how to find it.
Disregarding the trolling about "Turing Completeness", there are some points he raises that are interesting.
On some (most, really) items there, the train has left the station. But some are good to listen to:
Better error messages. Rust has been making progress there, so maybe it's a good place for inspiration. Error codes and links to additional info online have been done (maybe overdone), but I think there is room for more: variable names in messages, more info about the context ("looks like you're trying to do X, maybe Y might be better here..." kind of thing).
On the lack of typing and such, MyPy might be a good project to focus on.
Standard library updates would be good, but they are tricky. Wouldn't it be nice to have requests in the standard library? Yes, but changes are still being made to it; once included, it might not be updated for a long time. As they say, modules go to the standard library to die.
Things I'd like to see: better performance (close working with Pyston or PyPy) and better packaging. On packaging there was talk of Pipfile just today. That was a good start. I've been using Python for many years, and I still don't know off the top of my head the basic pip flags and options, or how to start the venv, or what goes into setup.py vs requirements.txt vs setup.cfg. The story got better in the last few years, but there is more room for improvement there, I think. I often end up with a build.sh or run.sh script which just does the building and venv-ing and such. Can that be done by default, for usability's sake? Even if it is not super-consistent and doesn't work for 1% of use cases... Probably.
In fact, the plan is for there to be no further new features in requests:
I think the requests module is one of the nicest assets in Python; having it in the standard library would be a good move.
Zed's Rails article did lead to some improvements in the Ruby community (e.g. the 'pickaxe' had meta-programming added in the next version). Zed likes to stir things up, but rather than dismiss him completely over some nitpicky issues, eat the meat and spit out the bones.
To the "I stopped there" folks in this thread: why do you think Python 3 is still seeing such poor adoption?
Since those tools are in place, pretty much every major Python lib has Python3 support. Many minor libs as well, though it's easy to have one or two dependencies missing if your project is big enough.
Application developers (overwhelming majority of Python users) had to wait for the libraries to get over to Python 3. This has basically happened, so a lot of application developers are now porting.
Python 3's rollout was awfully managed, but the community has learned its lesson, and is working hard to fix things.
Another issue with this article is that its recommendations simply won't work.
- If you try to make "text + bytes" work, you are letting people write code that will fail. You can't implicitly convert bytes to text. You have to know the encoding! Python 2's mechanism (system locale-implicit conversion) guarantees bugs in situations with multiple encodings floating about. (See the sketch after this list.)
- Running Python 2 libs in Python 3 would mean that "text + bytes" style failures in the Python 2 code would bubble into Python 3, destroying all assurances you're supposed to have in Python 3.
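To make the first point concrete, a rough sketch of what Python 3 makes you write instead of an implicit conversion:

    # you must name the encoding; nothing is guessed from the locale
    name = "Zoë"                               # str (text)
    payload = b"name=" + name.encode("utf-8")  # b'name=Zo\xc3\xab'
    print(payload.split(b"=")[1].decode("utf-8"))  # back to 'Zoë'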
And, in the meantime, they were stuck with a language ecosystem which entirely stagnated; while there are some benefits to working with a "stale" language, this was even worse: it isn't like "the C language is no longer going to change, but gcc gets better every year, with better error messages and faster compilation and improved code generation", but "the developers of Python have decided to spend years working on something which you can't port to yet because you are blocked on libraries, and libraries are blocked on tooling and language fixes, and in the meantime they have gotten pretty damned hostile towards users of Python 2 and constantly claim that whatever the most recent release of 2.7 is is going to be the last one ever, so people had better start moving"... and a lot of the people I knew who had been using Python have thereby moved alright, but they moved to Go, Rust, Clojure, or Elixir, and they aren't coming back to Python given that the language is really only slightly better than it was five years ago while these other languages have been going in really interesting directions. And yes: there are some people using Python 3 now, but interestingly it seems to be an entirely different group of people, such as data scientists, than the people who were using Python 2: when was the last time you heard about a big website being developed using Python? You don't: that era ended when Python 3000 was announced.
Language communities evolve. Guido is working on a static type checker.
The Python 3k transition was bad, but I haven't really met many people who are against the current state of the language. Relitigating the transition over and over doesn't accomplish much. Everyone knows how bad it was now.
The reality is that given what beginning programmers learn, they hardly notice the difference between them. Adding parentheses to print() and some explicit conversions to lists are all that is necessary to convert most code that beginners see. The last I looked at Zed's book, the same is true for his coding examples.
Overall, I believe for newcomers it is more useful to learn Python 3. Given this background, no employer would reasonably believe you cannot learn Python 2 quickly. They will likely even be impressed you are future-proofed.
Python 3 is a non-issue in my eyes, the only reason I don't use it more is because there's only one other developer on my team that knows Python well enough to maintain anything if I get hit by a bus - Python is "weird" for everyone else because we're mostly a .Net shop, so unless I'm writing an ansible module I use Kotlin and Spring-Boot or ASP.NET so my team can easily pick it up in the "snuxoll gets hit by a bus" scenario.
The author has a point here. I've been programming Python for about 10 years now. Really concerned about the future of the language. Languages like Node and Go are progressing rapidly, but what's happening to Python? I used to complain about Node, but with the latest ES6/ES7 improvements it's actually pretty decent. The difference in usability between npm and pip is also huge.
Python is still my favorite language, but they really need to fix this part of the ecosystem. Maybe it's time to release Python 4 that has a clean upgrade path from 2 and enough language improvements to encourage users to adopt it?
Great things, IMHO. Now that the painful changes of 2 to 3 are done (or nearly so), recent releases have focused on incremental and steady progress. Take a look at the "What's New" documents:
There is a lot of great stuff there without ripping up the world and breaking all your programs. For example, the new compact dict implementation in 3.6 is a great improvement. All Python programs can potentially benefit from the memory use reduction. Bug fixes and improvements to library code are great.
The transition from 2.x to 3.x was not handled well. The core Python team should have focused much more on making transitioning code easier. Allowing code that can easily run in both 2.x and 3.x (with suitable shims like 'six') should have been a focus early on. Disallowing 'u' prefix on strings in 3.x is an example of a serious mistake. The argument of keeping 3.x pure was wrong. It is much more important to make transition easier rather than making 3.x extremely pure. That's something that C++ has got right, although maybe they went too far the other way.
My small contribution to transitioning 2.x to 3.x is "ppython". See my github repo:
Now that I've been programming in Python 3 for a few months, I find it pleasant. Writing programs that handle Unicode text correctly is easier. For small programs, it is easy to make them run under either Python 2 or 3.
I've done Python about as long as you but also like Golang for things Python is awful at.
The OP was basically talking about the JVM as if it was a language.
numpy, scipy and matplotlib are really powerful. Go and Node don't have anything close to their capabilities.
- The concept of Turing completeness
- What static typing is
- Why strings are not the same thing as byte arrays
- That the distinction between strings/bytes is a fix for a problem that exists in Python 2, not something that's "broken" in Python 3.
I understand why people might still use Python 2 if they have a legacy codebase to maintain, or need certain libraries that for whatever reason have not yet been ported to Python 3. However, if you're just beginning Python programming, or starting a completely new project in the language then none of that matters - you might as well just start with latest version.
The one point he gets right is that they should have maintained backwards compatibility by allowing Python 2 modules to run on the Python 3 VM. That was a major fuckup, the implications of which should be a lesson to every language designer of what to never do. It's sad to see the developers of Angular make essentially the same mistake.
Plus native code depends on particulars of CPython VM quite often.
Python 3 was released in 2008. Eight years ago.
* two caveats:
- we have Python3 listed on the languages page whereas you'd have to click around or search to get to Python2.
- we have slightly more features on Python3 like live pylint and ability to use modules
Regarding 2to3 not being flawless: agree, could be better.
Regarding core libraries: agree, but the same problem exists on Python 2 (different core libraries have different levels of compatibility for taking string or unicode as arguments), so I don't see how this can be a recommendation against 3 specifically.
Regarding new string types and handling, this has been discussed over and over again everywhere on the web, and boils down to people freaking out at unicode by default. This one does a good job explaining everything: http://www.diveintopython3.net/strings.html and this one does a good job at showing what you get w/ Py3 (not limited to new string types): https://speakerdeck.com/pyconslides/python-3-dot-3-trust-me-...
Regarding complaints about how many string interpolation methods there are: the `f` method relies on local scope (which is prone to introducing bugs), so it's actually pretty gimmicky compared to `.format` (which takes an explicit data structure as parameter). I would do the opposite and only teach `.format` to beginners since it leads to better code.
Regarding Turing completeness: fun.
The thing to remember, though, is that 2to3 will never be flawless. Since the underlying representation of some things has been split into different types, it is simply impossible to automatically fix some code.
For example, a function returning the first letter read from a file object is going to be completely different and more explicit in py3 than it was in py2. And the end result is better. You have to fix some code now - tough.
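A sketch of that example (assuming a UTF-8 file named data.txt, made up for the illustration) - py3 makes you say which of the two meanings you want:

    with open("data.txt", "rb") as f:
        first_byte = f.read(1)   # bytes: one byte, possibly half a character
    with open("data.txt", encoding="utf-8") as f:
        first_char = f.read(1)   # str: one decoded character, maybe several bytes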
You're essentially giving up great features like the async keyword and proper encoding handling, getting nothing in return. The few libraries that are still Python 2 only should be ignored.
What does this even mean?
But who knows; his actual argument is an absolute mess.
for a, b in zip(generatorA(), generatorB()):
In Python 2 you'd have to import itertools and use izip to get this lazily (py2's zip builds the whole list up front). I bring this up because I find examples like this all the time in Python 3, where core, modern language features like generators are more tightly integrated and easier to use. It's really a better version of the language, and if you're starting fresh on a Python project you should do yourself a favor and use it.
Try this in py2: https://repl.it/E5vd/0
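A self-contained version of the snippet above (Python 3 - py2's zip would try to build an infinite list here):

    from itertools import count

    def squares():
        for n in count():
            yield n * n

    for a, b in zip(count(), squares()):  # lazy pairing of two infinite generators
        if a > 3:
            break
        print(a, b)  # 0 0, 1 1, 2 4, 3 9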
But I must admit, it is nice having all those python developers distracted by working on another language and leaving 2.7 in its current state of perfection. I wonder if this has actually helped increase the use of 2.7, not having endless "improvements" tacked on. It sounds like this could be one of those weird laws of the computer age. I dub thee "Shaw's Law".
There's a case study in Christensen's Innovator's Dilemma where a company tries to improve its milkshake by making it more chocolaty, by putting whipped cream on top, etc., but for some reason this doesn't increase sales. Then some guy goes into the restaurant to figure out “What job do people hire a milkshake to do for them?” and it turns out it's not the job the people making the milkshake thought.
I wish the core python developers would do the same and ask themselves not how to make the best milkshake, but why do people buy their milkshake in the first place?
I hope to continue leveraging my investment in learning python by using it for a long time.
So far I haven't been forced to use Python 3.
I've tried a few times but always found it easier to just go back to python (2.7).
I figure eventually I have to use Python 3 but at this point, my life has been fine without upgrading to python3.
I switched around 3.1, though granted I was still a novice programmer then and didn't have to port any legacy code. I had so many issues with Unicode in Python 2 (some may have been my own ignorance at the time, but with 3 a lot of those issues magically disappeared). Since text processing / NLP is my main focus, it was an easy decision to switch.
Wow - python 2.5.1. He hasn't even updated to 2.7.
Last login: Sat Apr 24 00:56:54 on ttys001
~ $ python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
I'm confused by this statement. I can start a python2 interpreter, background it, then start a python3 interpreter. Is there some other area that they inhibit each other in?
This REEKS of wrong answers. Turing completeness? Py2 in Py3? The argument that bytes are too confusing?
"So there were 7 applicants and I rejected 7 because they sent in their demo assigments in Python 2.7.
Come on people, its nearly 2017!"
The primary and possibly only reason Python 3 has not taken off as much as some would like is because 2.7 is that good.
Another problem in the tone: no respect for his readers.
"if *I* struggle to use Python's strings then you don't have a chance."
"I’ll add one more thing to the people reading this: I mean business when I say I’ll take anyone on who wants to fight me. You think you can take me, I’ll pay to rent a boxing ring and beat your fucking ass legally."
> Not In Your Best Interests
I say what's in my best interest and py3 fixed a lot of issues for me. I'm happy with the upgrade and dropped py2 for new projects this year.
> No Working Translator
A flawless translator is simply not possible; 2to3 is a best-effort tool. At some point you'll run into `def f(x): return x`, and without doing full-program static analysis you won't be able to say what the right translation is. You have to do it manually where needed.
> Difficult To Use Strings
I understand that py2's "just works" strings are easier. But at the same time, having a non-ASCII name, I know that most applications "just work" simply because nobody actually tested them outside of the ASCII set. Py3 doesn't explode on encoding for fun - it basically says: you made an assumption that kind of worked so far by accident, but now you need to say explicitly what you want to do. I think it's a good thing, even if it takes people time to adjust.
> Core Libraries Not Updated
I don't even know what he means. This really needs an example, or it's just a meaningless rant.
> Purposefully Crippled 2to3 Translator
As above. Some things are just not possible to translate without knowing what the programmer meant. Some things in py2 strings happened to work a bit "by accident" and simply won't work in py3 for good reasons. For example py2 code:
In : "abc".encode('ascii').encode('ascii')
The string/byte confusion is an issue for people who like to pretend that they're the same. They're not. A string can be represented by different byte streams depending on the encoding. That by itself tells you that they are not the same and should not be treated as the same thing. But we have many decades of conflating the two, and old habits die hard.
> You Should Be Able to Run 2 and 3
That makes no sense. The whole point of versions is that they are different. Maybe he means 3 should be backwards compatible. That is potentially valid criticism. That said, AFAICT, the breaking changes are pretty reasonable. Also, I'm not sure what world he is living in, but interop between Java and C/C++ through the JNI is not easy. As a math major I'm not sure what the "solid math" here is. I think it is ironic that the CLR is referenced here so extensively - it has made some pretty big breaking changes in the past (unlike Java, it decided to break backwards compatibility to have reified generics).
Translation is tough, _especially_ in languages like Python that don't have good static guarantees. The real problem here is probably that the code you want to produce with a translator should look as much like the original code when possible. That _is_ tough. The point is it isn't just what the program does in terms of inputs and outputs that you want to preserve. You still want it to look more or less the same.
This is not an incorrect point. I think Python 3 wanted to have both the performance of byte strings and utf8-by-default strings (finally!). The intention behind that seems reasonable. As a proponent of strong static typing, the error message he shows seems quite benign. That said, I understand how one might be unhappy about this.
This may be the case, I'll defer to someone else. That said, this is yet another claim the author makes without much backing (this alone could be the subject of a blog post given the right backing). The quip about the Python community liking bad design seems a bit gratuitous.
Uhhh..? A more trendy language? Are we really saying that Elixir is more stable than Python?
That said, claiming that the language isn't Turing complete because it doesn't include switches for compatibility is a bit of a stretch.
A port to Python 3 is almost never a rewrite.
from __future__ import unicode_literals, print_function, division, absolute_import
Funny how many haven't seen it and are arguing pointlessly.
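A quick sketch of what that line buys you in a Python 2 module:

    # -*- coding: utf-8 -*-
    from __future__ import (unicode_literals, print_function,
                            division, absolute_import)

    print(1 / 2)    # 0.5: true division, as in py3 (use // for floor)
    s = "héllo"     # a unicode object, not bytes, as in py3
    print(type(s))  # <type 'unicode'>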
I just checked his twitter to make sure I'm right. He's clearly joking.
Mandatory utf-8 strings would have been a reasonably nice solution, I think.
As for the rest, I'll pile on:
- python3 is eight years old, not decades, and this matters: Perl4 => 5 was a similarly long transition and for similar reasons.
- python3 is obviously Turing complete. Moreover, I'm not sure that its lacking whatever the author thinks TC means actually matters in the question of whether python3 is worth choosing to learn or build systems with.
- very few languages can be used "side by side" with each other, and I wasn't aware that this was a common use case. If you have python2 code, there are various ways to invoke it without porting, e.g. subprocess, web service, etc. Obviously, there are various costs, e.g. performance, system mgmt, debugging, etc.
- high probability of failure? Strange, but I see the exact opposite, with more and more projects supporting ONLY python3, i.e. the tide is turning. Anyway, this is scaremongering and #fakenews so YMMV.
> This document serves as a collection of reasons why beginners should avoid Python 3
> The Most Important Reason
Alleged 30% adoption. No sources for this really, just a number out of the blue. In my experience, most companies have migrated their internal software to python3, or are doing so. A great deal of desktop apps are already there too.
> You Should Be Able to Run 2 and 3
Like so many other languages/libraries, a major version breaks compatibility. But hey, guess what? Python actually allows you to install and run python2 and python3 at the same time - most distros actually do this!
> No Working Translator
Newbies have no legacy codebase to migrate. This is really off-topic. New devs can use code that is compatible with both, or just plain python3.
> Difficult To Use Strings
Only if you're coming from python2, but definitely not for new users, who have no background as to "how it was before".
I got bored of refuting the invalid points one by one, but none of them have any validity, especially for beginners.
Also, I work at a university where we teach python to new students. They've generally had a much easier time learning algorithms and THEN migrating to C in the second semester (and they actually tend to understand everything more clearly) than using C as a starter. I've also never seen any of the above-mentioned items be an issue.
We've modified a couple of libraries ourselves, typically stuff that was written a long time ago and abandoned. Other than that, we've never really run into a problem where the choice of one version of Python over the other was an obstacle.
Go figure. It's an enduring debate with strong opinions on opposing sides.
The whole topic feels a little flame-baity to me.
This seems unlikely. Python 2 is a very popular language. There is currently more than one implementation. So no matter how weird the Python 3 thing gets, Python 2 should be available in some form.
Python 3 is actually helping Python 2 here. It attracts those who want to do cool new language things, leaving Python 2 as a stable target.
Just taking a look at TIOBE: in July 2016 it hit #4, its highest spot ever.
Python 3 is a ghetto...
does this mean he tried to run python 2 using the python 3 interpreter? no shit it doesn't work ._. that's not how turing machines work