
Porting a historic Python2 module to Python3 - luch
http://lucasg.github.io/2017/07/21/Porting-an-historic-Python2-module-into-Python3/
======
radarsat1
In almost every Python-related project I work on, we're developing in an
exciting new language called "Python 2-compatible Python 3 subset". Twice as
much testing, twice as much fun!

~~~
metafunctor
Are all these projects some sort of libraries? Is that why you need to support
both Python 2 and Python 3?

All the projects I've worked on moved to Python 3 and never looked back.

~~~
daveFNbuck
If it's a large project, you can't necessarily move everything to Python 3 all
at once. You can't break Python 2 compatibility until everything works in
Python 3.

~~~
radarsat1
Partly that, and partly dependencies. For example in one project we use VTK
for visualization. The Python VTK wrapper until recently only supported Python
2. So the rest of the project is Python2/3 compatible, but to run the
visualization we have to run the Python 2 version.

VTK itself has recently gained Python 3 support, but it's not in Debian/Ubuntu yet.
And we'd have to upgrade our code to VTK 7. (Now 8, I believe, seems we've
skipped a version.) So... lots of inertia all around, basically.

The result is that in the meantime (meantime being like, 2 years), we compile
Python 2 and Python 3 versions of everything.

Not to mention that it's hard to justify breaking users' code that we aren't
in control of just to tell them to fix their print statements -- though
obviously we'll get there eventually.

~~~
metafunctor
In the meantime, we just kept with Python 2.

When all dependencies were there, we made the switch.

------
keenerd
I've been doing this for ten years. Why wouldn't you use 2to3? In my
experience, 2to3 is all you need 95% of the time. It is really that good.

The only time it doesn't work is with Python that is bad to an unusual
degree. For example, if you make a variable named "list", overriding the
built-in list type. 2to3 freaks right out. The other type of "bad" python is
stuff that abuses the "open kimono" nature of the language and heavily
monkeypatches python internals. Or things that depend on version specific
binary data, like pickle files.

~~~
jwilk
> 2to3 is all you need 95% of the time

This is absolutely not the case in my experience.

2to3 tends to produce something that appears to work at first glance, but it
actually doesn't.
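
A minimal sketch of the kind of thing that can slip through, assuming Python 3 as the target. The function and variable names are made up for illustration. Both lines are valid syntax on either version, so a purely syntactic rewrite has no reason to flag them, yet the results differ:

```python
# Illustrative only: runs on Python 3; Python 2 behaviour is noted in comments.

def midpoint(lo, hi):
    # Python 2: int / int floors, so midpoint(0, 3) == 1.
    # Python 3: "/" is true division, so the same call gives 1.5.
    return (lo + hi) / 2

def midpoint_floor(lo, hi):
    # "//" floors on both versions; use it when an integer index is wanted.
    return (lo + hi) // 2

raw = b"PDB"
# Python 2: str(raw) == "PDB". Python 3: str() of bytes returns its repr.
as_text_wrong = str(raw)
# Decoding explicitly gives the intended text on both versions.
as_text_right = raw.decode("ascii")
```

Neither case raises an exception, which is why this sort of bug tends to show up as wrong output rather than as a crash.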

~~~
aaossa
You have any examples? I was planning to use 2to3 in a project, can you
explain in which case it doesn't work? Thanks!

~~~
masklinn
First, I'd recommend python-future rather than 2to3, it has additional fixers
and replaces some with better versions.

Second, those only do the fairly straightforward changes. Fixers are simple
AST transformations; they don't do complex type analysis or anything. You
should use them as a starting point, but don't expect the work to be over by
then: the 2->3 transition has a number of API and semantic changes which
these tools can't handle. Text model changes are the biggest one[0], but
they're not the only one by far.

If the project is non-trivial, having a good test coverage is absolutely
crucial.

Basically, 2to3 handles the first 90% of the work, which is mostly drudgery
and syntactic fixes; you're still on the hook for the second 90%, which is
more subtle.

Source: finalising the conversion of a ~200k SLOC codebase to cross-version
compatibility.

[0] not just the strict separation of text and bytes, but APIs being set to
one or the other, so you have places where you want to ensure bytes, others
where you want to ensure text, and yet others where you need "native
strings". Some APIs (csv) are also incompatibly altered.
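
The three cases in [0] could be sketched with small helpers like the ones below. The helper names (`ensure_bytes`, `ensure_text`, `ensure_native`) are hypothetical, chosen for this example, and UTF-8 as the default encoding is an assumption:

```python
import sys

PY2 = sys.version_info[0] == 2
# "unicode" only exists on Python 2; the conditional keeps Python 3 from evaluating it.
text_type = unicode if PY2 else str  # noqa: F821

def ensure_bytes(value, encoding="utf-8"):
    # For APIs that want raw bytes (sockets, binary file formats).
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value

def ensure_text(value, encoding="utf-8"):
    # For APIs that want text (unicode on 2.x, str on 3.x).
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def ensure_native(value, encoding="utf-8"):
    # "Native string": plain str on whichever interpreter is running,
    # i.e. bytes on 2.x and text on 3.x.
    if PY2:
        return ensure_bytes(value, encoding)
    return ensure_text(value, encoding)
```

The point of naming the three cases separately is that each call site has to say which one it needs; no tool can infer that from the Python 2 source.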

------
seanwilson
> Usually, a dev writing a Python 2.7 library will not discriminate between
> bytes-like buffers and ASCII strings (since the language doesn’t either)
> which means to have to code review everything str operations and ask
> yourself if there is a bytes operation implied or is it really a string one.

And...that's exactly why you want static typing. Code review to catch stuff
like this is poorly and primitively doing a job a computer can do for you
perfectly. Dynamic typing is fine for small projects but when you're wanting
to port over all Python 2 code that's ever been written to Python 3 types
would have made the task much less daunting.

~~~
tyingq
In this case, a new type was introduced...bytes didn't exist. So I don't know
that static typing would have helped much, at least for dealing with existing
code. You still have to decide what migrates from type1 to type2.

The "bytes" that was put into 2.6 was just an alias to str as hinting for the
2to3 utility.

~~~
seanwilson
If Python had static typing though, wouldn't that have introduced a compile
time error instead of a runtime error? At the least it would alert you to all
the places you had to make the decision instead of waiting for a runtime
crash.

~~~
tyingq
Introducing a new type to a statically typed language still requires that
decision point. The compiler can't know what was string and now should be
byte. So there wouldn't be errors, other than maybe for types you're passing
into 3rd party libraries that changed themselves from 2 to 3. Yes, it would
help after that decision.

And as mentioned, the byte "backported (sort of)" to 2.x acts and works
differently than the byte in 3.x. So there are actually 3 types in play, plus
bytearray too.
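
A short sketch of those differences, with an arbitrary two-byte buffer standing in for real data. It runs on either interpreter and the comments note where the results diverge:

```python
import sys

PY2 = sys.version_info[0] == 2

# On 2.6/2.7 the name "bytes" is the very same object as str; on 3.x it is not.
bytes_is_str = bytes is str

header = b"\x4d\x5a"
first = header[0]          # 2.x: "M" (a length-1 str); 3.x: 77 (an int)
first_slice = header[:1]   # b"M" on both, so slicing is the portable spelling

buf = bytearray(header)    # mutable; indexing yields an int on both versions
buf[0] = 0x50
```

Slicing instead of indexing is one common way to get the same result from `bytes` on both sides.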

~~~
seanwilson
Yeah, I'm not saying static typing would make the decision for you but it
would let you know every line in your code where the decision had to be made
instead of having to rely on runtime crashes or debugging weird behaviour. The
latter is really horrible as well as it's hard to pinpoint where the problem
originates from and you sometimes have to delve into the innards of libraries
you're using to understand what's going on.

I just think it's a good example of how strong static typing makes refactoring
large projects significantly easier compared to strong dynamic typing. Getting
everyone to move from Python 2 to 3 is basically a massive refactoring task.
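
As a rough illustration of the point, here is a hypothetical annotated function (the name `stream_name` and the NUL-padded ASCII format are assumptions for this example). With Python 3 annotations, a checker such as mypy would report a call that passes `str` where `bytes` is declared, before the code ever runs:

```python
def stream_name(raw: bytes) -> str:
    # Strip NUL padding from a raw buffer, then decode it to text.
    return raw.rstrip(b"\x00").decode("ascii")

# Fine: the argument matches the declared type.
name = stream_name(b"/names\x00")

# A static checker would flag the call below at check time.
# Without one, it only fails when this line is actually executed.
try:
    stream_name("/names\x00")
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True
```

This doesn't make the text-or-bytes decision for you, but it does list the call sites where the decision is still missing.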

------
opportune
I once spent a week porting some python2 researchware into python3. It was
only at the end of that week that I discovered that it was the actual logic of
the researchware that was broken, and not the way I was porting it. I now
distrust all FOSS except for widely used and tested ones. You would think that
a graduate student would debug (or at least fix to the point of being
interpretable) the work that they spent months on before publishing it as
finished and adding it to pip.

And yes, 2to3 fixes 90% of porting problems. But in my experience, it was
especially bad at handling, of all things, import statements. Definitely not a
silver bullet.

~~~
ofek
I'd be wary of conflating OSS with what you refer to as "researchware". Code
coming from academia is notorious for poor reproducibility.

~~~
opportune
I'm not conflating them, but clearly software can be both researchware _and_
open source. I'm not going to start distrusting tensorflow or scikit-learn,
but I'm going to be wary before I invest time in a promising niche software
package, regardless of whether it was actually from academia or not.

------
moyix
Well, this isn't exactly how I expected my PDB parsing library to end up on HN
;)

The bytes/string changeover is probably the biggest pain point for RE tools
written in Python2. Indeed, despite lucasg's excellent work porting things
over, there was still at least one byte/string issue that needed to be fixed:

[https://github.com/moyix/pdbparse/issues/39](https://github.com/moyix/pdbparse/issues/39)

Edit: Also, I have one small correction – pdbparse is actually _10 years
old_, not 5! I started writing it my first year out of college, which also
explains (in part) why the code is a bit crap. As evidence I offer the first
article I wrote on the PDB format:

[http://moyix.blogspot.com/2007/08/pdb-stream-decomposition.html](http://moyix.blogspot.com/2007/08/pdb-stream-decomposition.html)

------
echion
> I’m kinda bummed out that Python devs decided not to backport Python type
> hints in Python2.7.

Can't comment on the site itself, so: PEP 484 - Type hints (
[https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code](https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code)
) does suggest Python 2.7-compatible type hints, and PyCharm, and probably
other IDEs, do support them (e.g.,
[https://www.jetbrains.com/help/pycharm/type-hinting-in-pycharm.html](https://www.jetbrains.com/help/pycharm/type-hinting-in-pycharm.html)
).
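
For reference, the PEP 484 comment syntax looks roughly like this. The function itself is a made-up example. Since the signature lives in a comment, the file still parses on Python 2.7:

```python
def read_stream(data, offset=0):
    # type: (bytes, int) -> bytes
    """Return the tail of a buffer, starting at offset."""
    return data[offset:]
```

The interpreter ignores the comment entirely; only a type checker or an IDE reads it.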

------
breatheoften
How about a type checked python variant ala typescript that compiles to
python2 or python3?

Do: introduce and require the compile step for a while, flesh out the language
and build it up, get buy-in and an ecosystem while also leveraging
compatibility with python2/3 ecosystems -- then eventually release a python4
that runs this new typed python language directly!

Bring us the beauty!

~~~
Rotareti
Have you tried the typing [0] module? I now use it for every project I'm
working on and I find the experience working with it similar to the experience
I have working with TypeScript. At least if you use a good IDE that can handle
the type annotations.

[0]
[https://docs.python.org/3/library/typing.html](https://docs.python.org/3/library/typing.html)
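
A small sketch of what that looks like in practice, using generic types from the module. The function names and the stream-index idea are invented for this example:

```python
from typing import Dict, List, Optional

def index_streams(names: List[str]) -> Dict[str, int]:
    # Map each stream name to its position in the list.
    return {name: i for i, name in enumerate(names)}

def lookup(index: Dict[str, int], name: str) -> Optional[int]:
    # Optional[int] tells the checker that callers must handle None.
    return index.get(name)
```

The annotations have no effect at runtime, so the IDE or an external checker does all of the work here.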

------
jwilk
[https://github.com/Microsoft/microsoft-pdb/pull/27](https://github.com/Microsoft/microsoft-pdb/pull/27)
is labelled
cla-not-required, so it's probably not the best example for "need to sign
NDA/CLA for sending PR".

But some other projects are so hardcore about CLA enforcement, they won't even
accept one-letter typo-fixes without CLA signed.

------
jxramos
I really like the suggestion about focusing on PR for PR, that is pull request
public relations. PR diplomacy, very sound advice.

+1 for articulating this bit of funny business in the Python Windows
ecosystem... "Moreover on Windows the building process was historically so bad
that you usually end up downloading binaries from some random guy on the
Internet."

------
carapace
Oh dear God why do you hate my eyes. Tiny sans-serif gray body text is of the
Devil and should be grounds for irate and overbearing truculent curmudgeons to
whine about on online forums.

------
_eht
Maybe this is an unpopular opinion coming from someone new to Python (2 years)
but the way this versioning was done is definitely a pain point. Working with
new code and looking up docs to achieve X, working with existing code and
trying to apply the same solution. God it's annoying.

Having said that, I know the decisions were not made lightly, and were backed
by logical analysis. If nothing else, it's a good case study for the evolution
of future languages (looking at you golang 2.0).

------
yegle
There's a patch from pytype that backported the type annotation syntax from
python3 to python27:
[https://github.com/google/pytype/blob/master/pytype/patches/...](https://github.com/google/pytype/blob/master/pytype/patches/python_2_7_type_annotations.diff)

~~~
sametmax
You don't really need it, Python 2.7 can already use type comments.

~~~
luch
(author here) by type comments you mean this:
[https://www.python.org/dev/peps/pep-0484/#type-comments](https://www.python.org/dev/peps/pep-0484/#type-comments) ?

I haven't seen them used anywhere, but that's an okay solution I guess.

NB : by the way, love your blog :p

~~~
carapace
I was experimenting with them a little while ago, along with MyPy [1] but I
immediately ran into an issue where the type system couldn't describe the type
I was using, so I had to let it go (reluctantly.)

[1] [http://mypy-lang.org/](http://mypy-lang.org/)

~~~
sametmax
Same. Mypy was very limited at the beginning, and has been much improved
since. I give it a try regularly. It is indeed way better now, although I'm
still missing structural typing and taking duck typing into consideration.

------
jcolella
Great article, especially the sections on coverage and testing. Also,
specifying the process so another can easily reproduce it. Very nice!

------
pinpeliponni
Just tell the Python 2 users frankly to go fuck themselves.

~~~
melling
Swift is 3 years old. Most people who use it will be on the latest version
Swift 4 within a few months. Apple won’t let developers stay behind.

While it was unrealistic for Python to be that aggressive given its larger
community, not forcing the issue created a lot of unnecessary work for the
community. A benevolent dictator should have moved developers along faster.
Having the language fragmented for this long is extremely unproductive.

~~~
kbenson
Isn't Swift compiled? I'm not sure it's relevant to the specific problems
faced here, unless a library written and compiled in earlier versions of Swift
won't be usable in a Swift 4 project.

~~~
ori_b
That's the case. Swift is currently not abi compatible, so libraries compiled
with one Swift version only work with that version.

~~~
kbenson
Ah, I see. I assume you could compile to a C compatible library, but then you
would have to deal with marshaling costs where the types mismatch, correct?

------
timcosgrove
(100% OT of linked article: "a historic", not "an historic", unless you
pronounce the word "istoric".)

[https://en.oxforddictionaries.com/usage/a-historic-event-or-an-historic-event](https://en.oxforddictionaries.com/usage/a-historic-event-or-an-historic-event)

~~~
fnord123
>unless you pronounce the word "istoric"

Which we do (in Britain and the rest of the English speaking world outside
North America), so "an historic" it is.

e.g. from BBC:

[http://www.bbc.co.uk/programmes/p055vr37](http://www.bbc.co.uk/programmes/p055vr37)

~~~
robotmay
I'm British and I have never personally pronounced it "istoric", and to my
knowledge I have never heard anyone pronounce it like that either. Is it a
regional thing? I can only recall it being pronounced like that in US TV shows
with comedy fake English accents.

"An hour" on the other hand makes more sense, as that is truly a soft H.

