Incrementally migrating over one million lines of code from Python 2 to Python 3 (dropbox.com)
91 points by el_duderino 10 days ago | 114 comments

It's worth noting that Guido van Rossum, creator of Python, is working at Dropbox on this project, and that Mypy is funded primarily by Dropbox (I think).

I'm glad that they are doing it, because it's contributed hugely to the Python ecosystem -- especially making it easier for other companies to make the switch to Python 3.

I highly recommend mypy for new and current python3 users. It's one of the biggest and best reasons to be using python3 (not that you should need any more :) )

I've never used mypy and I used Python a lot until I recently got my first job.

It kind of looks like a band-aid. Could you elaborate on why I would use, say, mypy as opposed to Golang if I don't need the benefits of a scripting language?

edit: I can understand why mypy would be useful to refactor an existing (large) project to bring forth type safety guarantees as you go forwards, but surely for new projects there are other languages of choice if type safety is what you want?

I believe the general idea with gradual typing is that

1) It's faster to write the initial proof of concept with a dynamic language

2) It's common that the proof of concept becomes sufficiently big/stable that it turns into the final codebase

3) Gradual typing makes the path from PoC to real codebase smoother, instead of re-implementing in another language.
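A minimal sketch of what that path can look like, assuming a made-up two-function module (the names `parse_scores` and `average` are hypothetical). The first function stays as unannotated proof-of-concept code; the second has been annotated later. By default mypy does not check the body of a function that has no annotations, so the two styles can live side by side.

```python
from typing import List


def parse_scores(raw):
    # Unannotated proof-of-concept code: with default settings mypy
    # skips the body of a function that has no annotations at all.
    return [int(part) for part in raw.split(",")]


def average(scores: List[int]) -> float:
    # Annotated later, once this part of the code started to matter.
    # mypy checks this body and calls to it from other annotated code.
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(average(parse_scores("90,80,70")))
```

Nothing changes at runtime; the annotations only matter to the checker.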

Point 1 is contentious of course, but that argument isn't relevant here. It is believed, and that belief becomes practice.

An example of point 2 might be OSH of the oil shell, which doesn't use mypy I believe, but was originally written in python with the intent to rewrite in C++, and eventually decided it was too costly to rewrite (somehow he decided that ripping apart the python VM would be the more fruitful path... and apparently he's making good progress there).

So basically, the use case for mypy is when you're deciding between continuing with Python and rewriting. Sometimes rewriting was part of the strategy from the start, other times a script just grew into a full-fledged program over time, but mypy's primary purpose wouldn't be to be introduced from the start.

Another use case I've personally had is having the typing available to me from the start, but not actually making use of it until I reached a particularly nasty section of code. And so that particular chunk becomes (reasonably) statically typed, while the rest of the program remains standard Python.

Although in that case I used a library that enforced typechecking at runtime rather than mypy (the library was typeguard iirc), which I only enabled on test runs (it slowed down the program significantly).

If you want gradual typing for a smoother ride and performance then just choose Julia. Seriously.

Let's wait a couple of years until Julia stabilizes before suggesting it is ready for production

Stabilization and proliferation of libraries; afaik its ecosystem is currently only really fulfilled for numpy/scipy/ml workloads

It seems the main objective of Julia developers and evangelists is to "kill Python", am I wrong?

They can have whatever goals they wish, but regardless, having an ecosystem like .net or python requires a critical mass of users; and once you have that, the language goals don’t even matter; they’ll just have libraries for everything, whether they like it or not.

Sound advice. But I am sure it's fine to go head first

If python offers type safety, and language x offers type safety, what reason do I have to pick language x over python?

A 5x (probably more) reduction in server costs because of lower memory/CPU requirements, if you choose Go for example.

Ah, so your reasons to pick a different language are based not on type safety, but on performance.

Indeed there may be a plethora of valid reasons to pick another language over python for some project. But that wasn't the question.

Type safety of Go is leaps and bounds better than python's will ever be, with mypy or not.

That's just factually untrue.

Mypy supports generics, protocols, (via extensions) dependent types, and user-defined covariance rules. It's leaps and bounds more powerful than Go's type system.
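For readers who haven't seen those features, here is a small sketch of generics and protocols, assuming Python 3.8+ where `Protocol` lives in `typing` (earlier versions get it from the `typing_extensions` package). `SupportsClose`, `Resource`, `first` and `close_all` are hypothetical names made up for illustration.

```python
from typing import Iterable, List, Protocol, TypeVar

T = TypeVar("T")


class SupportsClose(Protocol):
    # Structural type: anything with a matching close() method qualifies,
    # no inheritance required (statically checked duck typing).
    def close(self) -> None: ...


class Resource:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def first(items: List[T]) -> T:
    # Generic: a checker infers first([1, 2]) as int and first(["a"]) as str.
    return items[0]


def close_all(things: Iterable[SupportsClose]) -> None:
    for thing in things:
        thing.close()
```

`Resource` never mentions `SupportsClose`, yet a checker accepts it wherever the protocol is expected.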

Pytype adds type inference to that list. You end up with a type system that's closer in power to Scala's or Haskell's than Java or Go's.

Mypy's various bugs are just indicative of how much of a band-aid it is.

If you truly believe that a dynamic language like python can have type safety, then imagine what a language like Go can achieve.

Python already has deviated too much from where it started and mypy offers too little to justify the verbosity.

So I guess we'll see what the future holds for them.

>If you truly believe that a dynamic language like python can have type safety, then imagine what a language like Go can achieve.

Literally exactly the same thing, but no more.

Mypy and Go's compiler have the exact same information available at compile time. Python stops being a dynamically typed language when you add Mypy. It becomes statically typed. Calling it "dynamic" is like calling C++ "dynamic" because C++ doesn't have access to run-time type information (usually).

Actually that's not true, due to the increased expressiveness of Mypy's type system, it has more information available. Mypy has more information. It can do stronger analysis. Go has less information, it can't do as much. Go can make language (syntax) changes to improve this. Python has been very smart, and the type system is implemented as normal objects using the normal syntax. Adding new types, even "special" ones (like Union or Interface) doesn't require modifying the language itself, but can be done only with modifications to a fairly nimble standard library module.
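To illustrate the "normal objects" point, a short sketch using only the standard `typing` module; the `lookup` function is a hypothetical example. Annotations are stored on the function as ordinary Python objects, and types like `Optional` and `Union` can be compared and inspected like any other value.

```python
from typing import Dict, Optional, Union, get_type_hints


def lookup(table: Dict[str, int], key: str) -> Optional[int]:
    return table.get(key)


# Annotations are plain objects attached to the function.
raw_annotations = lookup.__annotations__
hints = get_type_hints(lookup)

# Optional[int] is just shorthand for Union[int, None].
optional_is_union = Optional[int] == Union[int, None]
```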

This allows python's type system to evolve much more quickly than Go's has shown itself capable of.

>Python already has deviated too much from where it started and mypy offers too little to justify the verbosity.

But python's type system offers more than Go's, and isn't particularly more verbose, so what are you saying here about Go's type system?

> Go can make language (syntax) changes to improve this. Python has been very smart, and the type system is implemented as normal objects using the normal syntax

I will leave this tidbit here:

i, found = 0, False # type: int, bool

In any case, I find these new "typing" features very awkward, overly verbose and not the reason that I chose python in the first place. If I am to write all this non-pythonic stuff to be 10x+ slower, less memory efficient, awful at concurrency and still not having solved the packaging/deployment situation... I will just use Go. Or any other language in that league, that offers so much more nowadays.

> i, found = 0, False # type: int, bool

That's actually

    i, found: Tuple[int, bool] = 0, False
in python3. Which as I said, uses normal objects and normal syntax.
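A caveat worth checking before copying that line: as far as I know, variable annotations (PEP 526) only accept a single target, so the one-line tuple form above is rejected by the parser. The sketch below shows the accepted spelling and uses `ast.parse` to test both forms; `is_valid_python` is a hypothetical helper name.

```python
import ast

# Accepted form: one annotated target per statement.
i: int = 0
found: bool = False


def is_valid_python(source: str) -> bool:
    # True if the running interpreter can parse the source text.
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


tuple_target_ok = is_valid_python("i, found: Tuple[int, bool] = 0, False")
single_targets_ok = is_valid_python("i: int = 0\nfound: bool = False")
```

The comment-based form `i, found = 0, False  # type: int, bool` is still valid Python, since to the interpreter it is only a comment.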

As for the rest of your comment, I think that's been well discussed. There are a variety of non-typing reasons to use go over python. But if you are about type safety you're probably better off with modern python than modern go.

> normal syntax.

Right. Ok we have a problem with definitions. No this is obviously not normal syntax.

Unless we take it to mean "whatever new syntax python introduces is normal syntax, since it supports it!", which is tautological and not very useful.

I'm gonna say something, strictly so you can understand where I'm coming from, not to offend you.

You sound very young. And by "young", I mean too young to remember that Python was always about "duck typing" and not about strict types. I still have books on my bookshelf that tried to dissuade you from using "isinstance(foo, sometype)". So no, it's not python or normal syntax.

Yes, Python now is safer than older Python. But I had 99 problems with Python and that wasn't one of them :(

Speaking as if Python is progressing so fast while Go is standing still feels disingenuous. Go fixed their dep/packaging woes in a few years, Python is not even close. Go is adding generics. Go becomes faster in almost every release.

Changing Python into something that doesn't resemble python, its values, or the reasons that people chose it in the first place doesn't make sense, nor does it make it "Python".

I don't want Python to become Haskell, Rust or Go. No one ever chose python for its typing system! Rather its lack of it.

Anyway, there's no accounting for taste and to each their own.


> Mypy's various bugs are just indicative of how much of a band-aid it is.

I think you're confusing an idea with an implementation. Bugs can be fixed. If mypy is useful, it will remain useful and slightly more stable.

Go has to be better than that. Java beats Python by a factor of 10, and C beats Java by a factor of 10. C is beating Python by a factor of 100. I would expect Go to be a bit faster than Java.

Because python can't do anything else at this point.

It can't be faster. It can't consume less memory. It can't stay still.

So they add auxiliary features. No matter how much they downvote you, you're right.

For a new project it doesn't make sense to not use Go, or something similar, if you're going down that road.

You're wrong on every point. Congratulations!

Cpython (every edition gets optimizations)/PyPy/Numba/Cython/... -> faster. Dict optimizations -> less memory.

At work, we have much smaller codebases that have fractional FTEs allocated to their ongoing maintenance. In spite of the difference in scale, my experience is similar to what was described in the article. Because we had focused on getting the unicode to work right in those old code bases, we had good test coverage for those features as a result.

The other common legacy problems to address were:

- Other implicit encode/decode behaviors in py2 that need to be explicit in py3

- Old 'print' and 'except' statements not valid in py3, easily rewritten

- Implicitly relative 'import' statements not valid in py3, rewritten with a little care

- Arithmetic needing a change to the '//' operator for integer division

- Waiting for py3 support in all third-party dependencies

- Dealing with restructuring in standard lib packages and in upgraded third-party libs w/ py3 support
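For anyone who hasn't done such a port, here is a rough sketch of what several items in the list above look like once rewritten for Python 3. The function names are hypothetical, and the relative import appears only as a comment because it needs a real package to run.

```python
import sys

# Inside a package, implicit relative imports must become explicit:
#     import helpers          (py2 only, implicit relative)
#     from . import helpers   (py3, explicit relative)


def describe_division(a: int, b: int) -> str:
    # In Python 3, "/" is true division; "//" is integer (floor) division.
    return f"{a} / {b} = {a / b}, {a} // {b} = {a // b}"


def parse_port(text: str) -> int:
    try:
        return int(text)
    except ValueError as exc:  # py2 also accepted "except ValueError, exc"
        # print is a function in py3, so it takes keyword arguments.
        print("bad port:", exc, file=sys.stderr)
        return -1


def roundtrip(text: str) -> str:
    # py3 never converts between str and bytes implicitly.
    data = text.encode("utf-8")
    return data.decode("utf-8")
```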

As I reviewed the techniques for straddling py2 and py3, I was displeased with how many seemed to involve a third dialect which was not really idiomatic py2 nor py3, particularly for the unicode/bytes handling. Many third-party libraries and frameworks also made different choices for how they handled this. Trying to integrate those approaches looked to produce even uglier code.

Also, some of our code had evolved since the Python 2.2 days and had accumulated cruft to import and wrap multiple generations of older standard lib and add-on packages which we have not cared about for 5+ years. The additional package restructuring in py3 would have made this even more bizarre. I wanted to see the code reset to use standard libs where possible and cull these legacy third-party dependencies. I also wanted the code to become more idiomatically py3, so whoever visited it for future maintenance would not need to work so hard to understand it.

So, we chose a clean break where we finally clean up and modernize the code to py3-only without the added burden of supporting py2 deployments from the same code. The declared 2020 deadline helped this decision. We branched our repos and worked on py3 ports and integration testing in parallel while continuing to run py2 in production. We declared a feature freeze on the py2 code, so we would not have a merging nightmare later, and so that we could use that as pressure to prevent procrastination on scheduling the flag day where we merge PRs and convert all our repos to a py3-only worldview.

Garret Walker, from Bank of America, spoke at PyGotham last year and touched on this topic at the end of his talk: https://2018.pygotham.org/talks/seventeen-million-lines-of-p...

Disclosure: I work for Bank of America.

Why does the desktop client have one million lines of code?

Not to be facetious, but why wouldn't it?

It has to find and diff files, coordinate a very reliable file upload of many files, as fast as possible. It has to understand enough about the content of those files to be able to do useful things with them. It has to reliably update itself again and again, from possibly very old versions, it has to communicate with an ever changing API. It has to have enough analytics in it to support product development, error reporting, understanding how users use it, how the product needs to evolve over time. It needs to integrate deeply with all major OSes.

...I'm sure I've missed some things. 1 million lines of code sounds like the right ballpark to me though.

Plus the UI, the file manager integration, the daemon, the authentication, the cache... All that cross platform.

None of those should be written in Python, except for perhaps the cache handling code. The rest should be platform-specific.

Python is platform specific.


You can write platform specific code in Python. You can get the current operating system and you can write an if statement comparing against operating systems you want to write specific code for.
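A minimal sketch of that if statement, using `sys.platform` from the standard library. The function name is hypothetical, and the returned strings are just the names of each platform's native file-watching API, used here as labels rather than as working integrations.

```python
import sys


def native_watch_api(platform: str = sys.platform) -> str:
    # Pick a label for the OS-specific file-watching mechanism.
    if platform.startswith("win"):
        return "ReadDirectoryChangesW"
    if platform == "darwin":
        return "FSEvents"
    if platform.startswith("linux"):
        return "inotify"
    return "polling"
```

In a real program each branch would import and call a platform-specific module; the shared logic around it can stay in one place.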

> You can write platform specific code in Python.

It's clearly not designed for doing anything advanced, though: it's not like you can get access to platform-specific APIs outside of the basics without doing a lot of work.

Quite the contrary, it is easy to call system APIs from Python. That's why we have so many Python wrappers for C code.

You have the stdlib ctypes of course, but also pypy's cffi for an even better story.

We even have specialized ready-to-use tools for those, such as pywin32 to manipulate the Windows API and registry, pyqt to leverage the C++ Qt lib and all its tooling, or watchdog to monitor file changes, using the native API when possible.

In fact, macOS and Linux distributions implement a lot of their native OS tools in Python for this reason.

There is also a reason it's one of the few languages allowed in prod at google. It's the best at nothing, but it's damn good at most things.

> You have the stdlib ctypes of course, but also pypy's cffi for an even better story.

Not something I would call seamless.

> pyqt to leverage the C++ Qt lib and all its tooling

This isn't native.

> macOS and Linux distributions implement a lot of their native OS tools in Python for this reason

Generally command line scripts, which is something that Python is good at.

> There is also a reason it's one of the few languages allowed in prod at google.

What Google allows in production has only an oblique relevance to whether it should be used for a specific task.

ctypes is a thing, and gives you access to the world of runtime-loadable libraries on your platform. Beyond that, C modules for Python by hand, SWIG, Cython (can take Python 3.x annotations now), etc., are not bad at all.
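As a concrete taste of the ctypes route, here is a small sketch that calls the C library's `strlen` directly. It assumes a POSIX system (Linux or macOS); on Windows the library lookup would need to be different.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"dropbox")
```

Declaring `argtypes` and `restype` is optional for simple cases, but it lets ctypes catch wrong argument types instead of passing garbage to C.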

Can you explain how platform-specific stuff takes "a lot of work" with Python, and wouldn't with some other language that is not Python?

> Can you explain how platform-specific stuff takes "a lot of work" with Python, and wouldn't with some other language that is not Python?

Sure, just look at a modern Cocoa app written in Swift, or Android one in Java, and compare it to the amount of work you'd have to do to shoehorn Python in there.

That doesn't look bad. I might actually want to write apps now.

Missing /s. Yes - I meant there are nice options.

viraptor makes a very compelling case for at least 1 option for mobile apps in Python.

About 15.5 years ago while I was still a grad student, I released a (horribly designed, mostly self-experimentation) editor in Python + wxPython that I initially wrote to run on Windows. I didn't run it regularly in Linux or OS X until 2008, and it all just worked. OS X was a little laggy with my editor, but hey, write once run on 3 platforms.

Python is really solid on multi-platform, because there are so many Python users already using every platform that want to bring Python there.


Because they're platform specific. For example, on macOS, the UI and file manager integration should be done in Swift or Objective-C(++), the authentication in some C-compatible language to match the security frameworks, and the daemon in some language that isn't single threaded or garbage collected.

You’re talking about that it uses platform specific APIs. You can use these APIs from Python, and then there is all the logic which decides how to use the APIs and that can be in Python as well.

> You can use these APIs from Python

You can do it hand-written assembly too, but usually it's more trouble than it's worth.

I think what we've worked out is that you mean that if you were doing it, you'd choose to do it in Objective-C etc. That's not quite the same as saying anyone else 'should' do it.

Not only can you do all that comfortably and productively in Python, but you should.

It's cross-platform, the task has no bottleneck for this VM, and it takes complexity down to one language. One language that has the very specific characteristic of being the best at nothing, but pretty good at everything, which in such heterogeneous software is what you want.

> the task has no bottleneck for this VM

Which is why Dropbox uses 100% of the CPU while syncing? It's pretty clear that trying to handle thousands of files is not a trivial operation. Sure, Python can do the job, but something native will probably do the job faster.

> in such heterogeneous software is what you want

No, not really. I'd rather the software work well, and I'd rate Dropbox as "passable" rather than "great".

If dropbox is passable, please point at the alternative with the same features and better perfs.

I'd say it's just a hard problem.

After all, the fs routines in python are all written in c, file watching uses the os api such as inotify, so python speed has little impact on those operations. Same for the gui. Same for the network since you only have a few connections to one server.

Dropbox doesn't even index the file content, so basically not many string or numerical operations. I've got 15 years of Python behind me, and I'm pretty good at guessing what workloads it's going to be slow at. This is not one of those.

Many people tried to compete with dropbox. None of them won, because eventually, their software either can't match the features, the perfs, the availability, the price, or the ease of use.

I should know, I tried to escape Dropbox, as I don't like the company's ethical stance, and tried many alternatives.

Dropbox is just very, very good.

> If dropbox is passable, please point at the alternative with the same features and better perfs.

iCloud Drive works pretty well for me.

> After all, the fs routines in python are all written in c, file watching uses the os api such as inotify, so python speed has little impact on those operations. Same for the gui. Same for the network since you only have a few connections to one server.

You'll notice that these aren't the things I mentioned as things you shouldn't be doing in Python, aside from the GUI, which I will maintain should not be done in Python. Writing parts of Dropbox in Python is fine, but there are places where they really shouldn't.

iCloud is not available on every single OS.

It's available on Windows (though, it's not great because it uses the wrong tools on this platform), macOS, and iOS, which is a sizable portion of the market.

I wonder if their code is over 1M, or if they're doing what you see in a lot of marketing, which is count all the lines from included packages.

Who markets on the basis of LOC?

Dropbox, right now. These blogs are always with an eye to recruitment.

Why would a massive legacy codebase be a recruitment turn-on?

Because then you can put on your resume that you have experience working on a massive codebase.

million lines isn't really that much

That "learn python the hard way" guy says don't use python 3. As someone who is at the "Hello World" stage, should I use 2 or 3?

It is a good question, and I am glad you asked as many new python developers hear this advice from Zed.

I think in 2019 there is no question that you should be using python 3. Zed seems to be personally offended by both how the transition was handled and some of the design choices in python 3, but I think he is doing his readers a disservice if he is still recommending python 2 today.

> That "learn python the hard way" guy says don't use python 3

They are wrong. There is no good reason to start with Python 2 for new projects. The reasons people stay on 2.x are institutional inertia and legacy code. A hello world program has none of these problems.

The previous poster should have written "said" not "says". If you look at https://www.learnpythonthehardway.org/ you'll see the first paragraph starts "Newly updated for Python 3".

The work started around 2 years ago - https://zedshaw.com/2017/04/22/learn-more-python-rough-draft... .

Shaw doesn't "agree with the direction of Python 3", and regards it as "being a complete waste of human energy" - https://zedshaw.com/2017/08/26/what-if-it-worked/ . But that's different than saying that new projects, in 2019, should not start in Python 3.

Shaw also declared that Python 3 is not Turing complete so I would take his Python 2/3 opinions with a hefty grain of sea salt

Covered in https://learnpythonthehardway.org/book/nopython3.html .

If I understand the trolling logic, "if it's impossible for Python 3 to support Python 2 then Python 3 cannot be Turing complete." This hinges on 'impossible' being used as a technical objection.

The argument should be that it's economically infeasible for Python 3 to support Python 2, not that it's technically impossible.

Even then he's wrong. It's perfectly economically feasible to write a python2 interpreter in python3-the-language. Take an rpython based py2 interpreter and make sure the rpython code is py3 compatible.

What he suggests is that since the python3 interpreter cannot run python2 code, that somehow this implies that python3 the language is not turing complete. This makes about as much sense as saying that since gcc won't compile my Java code, C++ isn't turing complete. In fact it's the same statement.

And there are, in fact, valid reasons why the python3 interpreter cannot run both 2 and 3 code intertwined (mostly because things are ambiguous in such a situation). Does `{}.keys()` return a list or an iterator? How do you decide? Well, you need to know if the code you're running is py2 or py3 in advance and you don't.
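To make the `keys()` ambiguity concrete, this is what Python 3 does with it. Strictly speaking py3 returns a view object rather than a list or a plain iterator, and that view tracks later changes to the dict, which a py2 list never did.

```python
d = {"a": 1}
keys = d.keys()

# Python 3 hands back a dict_keys view, not a list.
is_list = isinstance(keys, list)
type_name = type(keys).__name__

# The view is live: it reflects changes made after it was created.
d["b"] = 2
live_keys = sorted(keys)

# The usual porting fix when list behaviour is really needed.
snapshot = list(d.keys())
```

Code written for py2 that calls `.sort()` or indexes into the result of `keys()` only works on the `snapshot` form.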

I include his original statement here: [1].

[1]: https://www.reddit.com/r/shittyprogramming/comments/5ejbr9/p...

I made the statement about economically feasible, not Shaw. Based on many readings of the Python developer comments, there wasn't enough people-time + money to make that happen for mainline Python, that is, CPython.

Your statement is perhaps true about Python-the-language. CPython has additional constraints, including specific details of reference counting and extension APIs, as well as the desires of the current developers. You'll note all the work that PyPy has had to do for that level of compatibility, and it's still not perfect.

My comment was with respect to "Shaw also declared". Shaw's posting is, in my reading, obvious mockery and not a declaration of a truly held belief.

I don't see how a flamebait posting in a Reddit group titled "shittyprogramming" means much about the contents of his book or the topic of this thread.

Isn't there something more substantial about his opinions to criticize?

(His comment was quoted in shittyprogramming, not originally made there. It was originally made in the blog post where "NOTE I was trolling" now appears; I just found the first quote in shittyprogramming.)

My comment has nothing to do with CPython. Python 2, the language, and Python 3, the language, are ambiguous. The broader context for Shaw's comments was that the py3 interpreter should just transparently run py2 code. Without a flag, that's not possible. Not like hard, or a time investment, or something like that. But truly, provably impossible. `print(b'b'=='b')` will give a different result in 2 vs 3. An interpreter can't transparently run both. It needs to know which is which.
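Spelling that example out under Python 3, assuming the interpreter is run without the `-b` flag (which turns this comparison into a warning or an error). In Python 2, `b'b'` and `'b'` are the same `str` type, so the first comparison would be true there.

```python
# Python 3: bytes and str are different types and never compare equal.
same = (b"b" == "b")

# Equality only holds after an explicit conversion in one direction.
same_after_encode = (b"b" == "b".encode("ascii"))
same_after_decode = (b"b".decode("ascii") == "b")
```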

As to your last question, not really. Very few of his opinions on py3 made any sense to people who were at all informed. The problem was, his audience is uninformed people.

(I only have access to Reddit through a text browser, which makes it hard to figure out what's going on. I deliberately blocked Reddit at my firewall since I found the noise level entirely too high, and don't think this is an important enough topic to unblock.)

shrug I'm not here to really support or defend Shaw, only to say that kalimoxto's comment seems like a frivolous reason to disagree with Shaw's opinions on Python 2/3. Why not pick one of his more substantive statements?

What URL do you mean for the "blog post where NOTE I was trolling now appears"? https://learnpythonthehardway.org/book/nopython3.html ? The earliest archive.org entry for the page has the note:

> Yes, that is kind of funny way of saying that there's no reason why Python 2 and Python 3 can't coexist other than the Python project's incompetence and arrogance. Obviously it's theoretically possible to run Python 2 in Python 3, but until they do it then they have decided to say that Python 3 cannot run one other Turing complete language so logically Python 3 is not Turing complete.

Now, I happen to believe he's wrong - lack of staff and funding are perfectly reasonable justifications.

I listen to The Changelog podcast, and in episode #300 Shaw himself has argued that open source projects (including Python, though he didn't say that directly) are underfunded. "the strategy is if you keep the cost and the amount of money that these developers make down, then it’s easier to take their project and use it, and they can’t fight you back, they can’t sue you in court if you violate the GPL, all these things, and then you commoditize your complement, and off to the races".

As for Python 2/Python 3 co-existence, I'll note that Python/Tcl co-exist as part of the standard distribution. (Python calling Tcl for Tk calling back to Python.) I've had Python/R coexist. Shaw argues that the existence of something like .Net shows that general co-existence is possible.

I don't see him making the argument that it must be 'transparent'. But since I don't know the original source, I could have missed it.

FWIW, I considered writing a Python 2 tool to modify the AST and generate new code that would check for run-time behavior that was inconsistent with Python 3. For example, if x.values() where x is a dict was only used in an iter context, and x is never modified, then it doesn't need a list(x.values()) during the conversion.
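A very rough sketch of the static half of such a tool, written for Python 3's `ast` module rather than the Python 2 tool described above. It only finds the call sites that would need attention; it cannot tell whether the object really is a dict, which is why the original idea involved run-time checks. All names here are hypothetical.

```python
import ast
from typing import List, Tuple

VIEW_METHODS = {"keys", "values", "items"}


def find_dict_view_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line, method) for each zero-argument .keys()/.values()/.items() call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in VIEW_METHODS
            and not node.args
            and not node.keywords
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


SAMPLE = """\
totals = {}
for v in totals.values():
    print(v)
names = totals.keys()
names.sort()
"""

calls = find_dict_view_calls(SAMPLE)
```

In the sample, the `values()` call on line 2 is only iterated and needs no change, while the `keys()` result on line 4 is sorted in place and would need a `list(...)` wrapper in Python 3. Telling those two cases apart is the hard part the comment is pointing at.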

In essence this would be a Python 3 implementation for Python 2, used to help identify problems that might come up in the 2/3 transition without jumping immediately to Python 3.

I couldn't get the business model to come anywhere near working. And since it's not hard for Python 3 code to create an AST for Python 2 and modify it, but no one has written a Python 2 in Python 3 implementation, even with the much greater demand, I again argue that it's an expensive project to do.

So anyway, I bit the bullet. Here are more substantive points:

> The end result of this defect is that most people will not bother switching to Python 3 as rewriting Python 2 code is the same switching costs as just using a totally different language completely.

This is not true. My switch from Python 2 to Python 2+3 took 2 months. It took several years to develop the application in the first place. The switching costs would have been far higher to switch to another language, even ignoring that I depend on third-party components which were already in Python 2+3 that I would need to replace.

> The strings in Python 3 are very difficult to use for beginners.

So far I've only taught 2 beginning Python courses for Python 3.6, vs about 20 for Python 2. I didn't notice much difference. Python 2 strings are also difficult. FWIW, I taught f-strings from the get-go, which worked for everyone. It's annoying that I have to also teach the two older forms. (Really, I point out they exist then ignore them.)
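For reference, the three forms side by side; the variable names are made up. All three produce the same string, and f-strings need Python 3.6+.

```python
name, count = "Ada", 3

# Oldest form: printf-style % formatting.
percent_style = "%s has %d files" % (name, count)

# Second form: str.format().
format_style = "{} has {} files".format(name, count)

# f-string: expressions are written inline.
f_string = f"{name} has {count} files"
```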

> The choice to not have Python 3 run legacy code will be the main reason it does not get adopted.

Since Python 3 is about par with Python 2 in use, Shaw's conclusion is wrong.

> "members of the Python project have actually told me this is impossible"

Names or citations/context would be nice. Shaw makes many statements concerning anonymous people. What did they mean by 'impossible' and did Shaw misunderstand them?

> There is no technical reason for the Python 3 version to differ from the behavior of the Python 2 version, they just decided that you should have to handle all the type problems related to Unicode and bytes with no help.

Python 2 didn't handle all the type problems either.

  >>> addstring("\xff", u"\u012c")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 2, in addstring
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

while in Python 3:

  >>> addstring(b"\xff", u"\u012c")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 2, in addstring
  TypeError: can't concat bytes to str

> even the error message is macho and terse

Though the Python 3 error message in my example is easier to understand in this case than the Python 2 one.
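The body of `addstring` is never shown in this thread, so the sketch below assumes it simply concatenates its two arguments. Under that assumption, Python 3 raises a `TypeError` for the mixed call (the exact wording of the message varies between 3.x versions), and the fix is to decode the bytes explicitly; `latin-1` is an arbitrary choice here because it can decode any byte.

```python
def addstring(a, b):
    # Assumed body: the thread only shows the tracebacks.
    return a + b


error = None
try:
    addstring(b"\xff", "\u012c")
except TypeError as exc:
    # Python 3 refuses to guess an encoding for the bytes operand.
    error = exc

# Explicit decoding makes the intent clear and the call succeed.
fixed = addstring(b"\xff".decode("latin-1"), "\u012c")
```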

Hmmm. Does the traceback let you know which expression caused the failure? I thought it was only to the line number of the expression/statement. That makes it difficult for anything to do better than simply report which variables were active in that line, like what the Django debugger does.

> If I understand the trolling logic

Zed trolls as a didactic technique. Don't pay attention to what he says, pay attention to what he's actually trying to teach you.

This works if you have a small target group which understands it without explanation. Zed writes for the internet. That approach makes him a troll, not a clever teacher.

Wow, you are completely misinterpreting what he said here. Either that or you're repeating what other idiots on the internet said. Any CS undergrad would understand that he was making a joke when he said that.

he still recommends starting with python 2

Not at https://www.learnpythonthehardway.org/ . Where are you getting the "still"?

FWIW, the latest version of LPTHW is focused on learning Python 3.

Beyond that, I personally don't find the arguments presented in versions of the book that recommend beginners learn Python 2 to be all that compelling. They read to me as one complaint about string handling that's wildly at odds with my own experience, nestled in a great big bowl of sour grapes about the fact that the breaking changes are happening in the first place.

The library support issue is a consideration, but depends heavily on what you want to do. My own sense has been that it's tools that are primarily used by devops and sysadmins that have the poorest Python 3 support, whereas the rest of the ecosystem is more-or-less 100% migrated at this point. Many have already started to discontinue Python 2 support. So, if you're saying that you're at the "Hello World" stage, I'm guessing that you'll hit a lot fewer speed bumps if you just start with Python 3 and don't look back.

Python 2 support is sunsetting in 2020 after over a decade of python 2 and python 3 support. I'd recommend starting with Python 3. Many external libraries you may need will only be issued for python 3 and many external libraries will choose to not even issue bug fixes for python 2.

He's 100% wrong. Use Python 3. Don't spend a second thinking about Python 2.

His complaints are bizarre. He mostly claims that Python 3 sucks because he can't figure out how to convert his old Python 2 code to work on Python 3 easily so therefore a beginner couldn't possibly learn the language from scratch. But all of his examples are edge cases of him intentionally fighting against the changes in Python 3 by doing things that don't make any sense like trying to concatenate unicode strings with raw binary data. It's just the weirdest argument ever.
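For anyone who hasn't hit this, a small sketch of the behaviour in question (the variable and function names here are just for illustration, not taken from his book): Python 3 refuses to concatenate `str` and `bytes`, and the fix is to decode the bytes explicitly.

```python
text = "caf\u00e9"                 # str: a sequence of Unicode code points
raw = " au lait".encode("utf-8")   # bytes: raw binary data


def join_text(prefix, payload, encoding="utf-8"):
    """Decode the bytes explicitly, then concatenate two str values."""
    return prefix + payload.decode(encoding)


try:
    text + raw                     # mixing str and bytes raises in Python 3
    mixed_ok = True
except TypeError:
    mixed_ok = False

combined = join_text(text, raw)
```

The point is that the encoding has to be named somewhere. Python 2 guessed ASCII for you and failed later on non-ASCII input.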

Python 3 makes almost everything slightly simpler and more logical - especially if you ever have to deal with unicode text. And since emojis are unicode and people love emojis, that means literally everyone.

Overall, he comes off to me like people who spend a lot of time ranting about USB-C ports when in reality USB-C is mostly great and honestly just isn't that big of a deal anyway. It's a strange hill to die on.

The only good historic argument for using Python 2 was that it used to be that some popular libraries weren't updated for Python 3 yet. But that hasn't been true for years. Nowadays you are more likely to run into libraries dropping Python 2 support than you are to find them not supporting Python 3. I can't think of a single remotely popular library that still doesn't support Python 3.

I don't write a lot of Python (use it mostly when deep learning calls for it), but my understanding is that it's time to start with Python 3. I try to write Python 3 whenever I have to write Python.


The bottlenecks to Python 3 acceptance were popular package and cloud environment support.

Both have been resolved.

One oft-overlooked externality is the time taken to migrate. If I maintained a Python 2 codebase and could use PyPy as a drop-in replacement after 2020, I'd certainly be tempted to do so.

Correct, but the original comment was asking about new code.

Python 2 will be EOL'd on the 1st of January 2020. So you should definitely go with python 3.

It's worth noting that only the reference implementation (CPython) is slated for deprecation. To my knowledge, PyPy and others have no such plan.

Today, that advice is uncalled for

Use Python 3

If this is your genuine question, I'm afraid you are ill-advised and your Google-fu is rusty.

A great warrior with the most powerful Google-fu can simply ask a question and command his unwitting legions to look it up for him.

I came here for good advice and I got it. I trust hn comments more than Google (maybe this is wrong of me...)

python 3 all day


I can't downvote, so instead I'll call you out publicly. This is neither a productive nor appropriate response.

We can discuss the merits of Python 3 over 2 (almost all of which I would wholeheartedly agree with!) without resorting to name-calling of "the other camp".

Twelve years after v3 came out, and with the annual Python survey showing 84% on v3, I think it's really disrespectful for anyone to advise v2.

But "retard" is not a convincing argument, even if I'm tempted to use it too.

They are quite different languages with similar names, like java and javascript

I saw an interesting article about how a million line production codebase was able to run simultaneously in both python 2 and 3, so I don't see how you could call them "different languages with similar names."

Python 2 and 3 are nearly identical. Yes, there are some backward-incompatible changes. It's nothing like the gulf between Java and JavaScript, which are only related in that they use C-like syntax. Java and C# have more in common than Java and JavaScript.

That is not correct at all. Python 2 and Python 3 have relatively few syntax differences. You could teach someone the key differences in like 10 minutes.

In fact, I would even say that the difference between Python 2 and Python 3 syntax is far less than the syntax difference between traditional javascript and the latest revisions to javascript (ES6+).

Nice write up.

At Thread we did a similar thing. Admittedly our codebase is ~10% as big, but incrementally adding linters for incompatible code, and keeping the CI green, helped loads. We only had about 2-3 days of engineering time to ship the final version, the rest was done in 10% time.

Is python finally getting past the 2/3 jump?

It has for 2 years now. 3.6 was the tipping point, such a fantastic release.



For 2018 stats

Agreed. I have been very pleased with the cleaner asyncio API in 3.7 though. Much simpler for people writing async code for the first time.

I suggested including something like asyncio.run() much earlier than that, but python-ideas is a dead end for actually getting things into Python. Eventually Yury saved us all because he could demonstrate the usefulness of things in uvloop first.
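For readers who haven't compared the two, a rough sketch of the difference. `fetch_answer` and `run_pre_37` are hypothetical names, and the manual version is only an approximation of the boilerplate people wrote by hand, not what `asyncio.run()` does internally.

```python
import asyncio


async def fetch_answer(delay=0.01):
    """Hypothetical coroutine standing in for real I/O."""
    await asyncio.sleep(delay)
    return 42


def run_pre_37(coro):
    """Roughly what callers wrote by hand before asyncio.run() existed."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


old_style = run_pre_37(fetch_answer())
new_style = asyncio.run(fetch_answer())  # available from Python 3.7
```

Both calls return the same result. The 3.7 entry point just hides the loop creation and cleanup.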

Can confirm, 3.6 is where my office made the leap too.

It's already been since 2015. Every major Python dependency has been Python3 compatible for several years, and many have already gone Python3 only.

If you have a good test suite it's pretty easy to migrate, you can easily do a couple thousand lines per day. There were a few third-party testing tools that used to be fairly popular that are no longer well maintained, so if your tests are written in those then you might be looking at re-writing the app from scratch. But otherwise it's fairly straightforward to port.

Even if you did your string encoding just using guess-and-check, which most people did, it still doesn't take all that long to upgrade as long as there are tests.

I would say yes. I'm pretty sure most people are writing all new projects exclusively in python3, except if you need them to run on old distros without any dependencies (lots of distributions shipped for years with python2 but not python3 installed by default). The ecosystem of 3 is very mature - I've been running my servers' production code for 5+ years in python3 and had no major issues with that choice. The only thing left is "legacy codebases".

IIRC OS X still ships only Python 2 with the OS.

Python 2 EOL on January 1, 2020 is pushing a lot of folks to seriously look at this. This is a big thing we're looking at right now at Khan Academy.

While adding Python3 support for numpy at a time when I worked for a large Ad-serving company, we managed to uncover a bug in python 2/3's import code that had been latent (no crash) in Python 2 for 15+ years. The problem was unique to people who had made it possible to import the same library from two filesystem locations.

In Python 2 it was silent; in Python 3 it was a segfault. There was really only one person in the company truly qualified to understand and fix the bug.

I've finally moved over to Python 3, but boy, was that an unwanted transition.

What a disaster of lost productivity

Python 2 to 3 migration is straightforward for the most part but in the end it has some nooks and crannies

The trickiest one we tripped over: strings in Python 3 have the __iter__ method.
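One way this can bite (an illustrative sketch with made-up function names, not necessarily the exact case above): code that uses `hasattr(x, "__iter__")` to tell containers from scalars. On Python 2, `str` had no `__iter__`, so strings counted as scalars. On Python 3 they get split into characters.

```python
def flatten_py2_style(items):
    """Flatten one level, using hasattr(x, '__iter__') to spot containers.

    On Python 3, str has __iter__, so strings are expanded into characters.
    """
    out = []
    for item in items:
        if hasattr(item, "__iter__"):
            out.extend(item)
        else:
            out.append(item)
    return out


def flatten_safe(items):
    """Same idea, but treat str and bytes as scalars explicitly."""
    out = []
    for item in items:
        if hasattr(item, "__iter__") and not isinstance(item, (str, bytes)):
            out.extend(item)
        else:
            out.append(item)
    return out


data = ["ab", [1, 2], 3]
broken = flatten_py2_style(data)  # on Python 3: ['a', 'b', 1, 2, 3]
fixed = flatten_safe(data)        # ['ab', 1, 2, 3]
```

Nothing raises here, which is what makes it nasty: the output is silently different.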

mypy is a joke of a tool. I have actually used it, and 80% of its messages are useless junk or just plain wrong. Granted, the other 20% can be on point. All things considered, it's better to be with a tool like it, than without.

> and removed Python 2 from our application binary. This marked the end of our Python version migration!

No. :-) Now you need to remove all compatibility code, modernize your syntax and gradually start using new features.

I've noticed a trend that migrating from Python 2 to Python 3 includes adding mypy annotations and going async.

At that point, why not Go? You're trying to correct for a language that was designed for scripting, not application software. Go already has a type system, goroutines are a simpler model of concurrency than async (and Go can actually use multithreading), you don't have to choose between writing an async or non-async library, there's a built-in formatter (whereas Black is still experimental), and the code will run 10x faster.

The crucial lesson from the Python 2 -> 3 transition (which this article is also mainly about) is that incrementality is super valuable. In retrospect, a single codebase supporting 2 and 3 is the obvious best option, but that wasn't at all clear in the beginning. (For example, the u"" string syntax, which is crucial for compatibility, wasn't even legal until 3.3!) Moving to Python 3 incrementally is painful, but it's possible. Moving to a totally different programming language incrementally isn't possible, other than when you're starting brand new projects.
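A tiny sketch of what "a single codebase supporting 2 and 3" tends to look like in practice. The names `PY2`, `text_type` and `to_bytes` are my own for illustration (libraries like six provide similar helpers). Real codebases usually also add `from __future__ import ...` lines at the top of each module, omitted here to keep the snippet short.

```python
import sys

PY2 = sys.version_info[0] == 2

# The text type is `unicode` on Python 2 and `str` on Python 3.
# The conditional only evaluates `unicode` when running on Python 2.
text_type = unicode if PY2 else str  # noqa: F821

greeting = u"h\u00e9llo"  # u"" is accepted by 2.x and by 3.3+


def to_bytes(value, encoding="utf-8"):
    """Encode text to bytes; pass bytes through unchanged."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


encoded = to_bytes(greeting)
```

The same file runs unmodified on both interpreters, which is what makes module-by-module migration possible.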

I see your point, but I think when migrating over a million loc, it's a question of practicality. Incrementally migrating a project from Python 2 to 3, between which at least large-scale architectural patterns are more-or-less identical, is a completely different story from migrating to Golang, which uses drastically different paradigms to structure code and think about data flow.

Reminds me of some of the points made here: https://www.joelonsoftware.com/2000/04/06/things-you-should-....

Are most people migrating over a million loc? And if they are, is it one million loc Python service, or a bunch of small services that could all be rewritten independently? Chances are the worst case scenario is very rare.

Golang really isn't that different from Python. They are both incredibly imperative languages with a one way mentality, and most people who learned CS in college will easily recognize both. The major difference is that Go has much less to learn, and very few patterns to speak of.

>You're trying to correct for a language that was designed for scripting, not application software.

Well, that's not true. Are you basing this solely on the lack of a type system?

As an historical observation, Python 0.9p1's README said (emphasis mine):

> This is Python, an extensible interpreted programming language that combines remarkable power with very clear syntax.

> This is version 0.9 (the first beta release), patchlevel 1.

> Python can be used instead of shell, Awk or Perl scripts, to write prototypes of real applications, or as an extension language of large systems, you name it.

Note the emphasis on "prototypes of real applications." That would suggest that version 0.9 was not really designed for "real applications".

However, by the 1.0 release this timid statement had disappeared. For example, by the 1.0.3 release the README said "For a quick summary of what Python can mean for a UNIX/C programmer, read Misc/BLURB.LUTZ." That file quotes an email dated Thu, 14 Oct 1993 17:10:37 GMT which said:

> It's central goal is to provide the best of both worlds: the dynamic nature of scripting languages like Perl/TCL/REXX, but also support for general programming found in the more traditional languages like Icon, C, Modula,...

> As such, it can function as a scripting/extension language, as a rapid prototyping language, and as a serious software development language. Python is suitable for fast development of large programs, but also does well at throw-away shell coding.

Thus, in a very narrow and irrelevantly technical sense, ilovecaching is correct - the very earliest versions of Python were not described as being for full-blown application software. Despite not being designed for 'real applications', Python turned out to be useful for 'serious software development', and its evolution over the last 27 or so years has kept that goal in mind.
