
Tauthon: Fork of Python 2.7 with new syntax, builtins, libraries from Python 3 - albertzeyer
https://github.com/naftaliharris/tauthon
======
naftaliharris
Original author here. I built this a few years ago, and the main motivation at
the time was that I'd heard people say that adding the new features in Python3
required breaking backwards compatibility, which I didn't believe. IMO the
only feature that really required breaking backwards compatibility was the
str/unicode consolidation and refactoring. This project was my way of proving
that we could have gotten the other features that people tend to be most
excited about (async/await, function annotations, new super(), etc) without
breaking existing Python2 code. I think it was successful at that as a proof
of concept.
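For readers who haven't used them, here is a small Python 3 (3.7+) sketch of the three features named above: function annotations, zero-argument `super()`, and `async`/`await`. The class and function names are made up for illustration. I haven't checked that every piece runs unchanged on Tauthon; in particular `asyncio.run` is a 3.7 library addition rather than syntax, so it may not be available there.

```python
import asyncio


class Greeter:
    # Function annotations: stored on the function, not enforced at runtime.
    def greet(self, name: str) -> str:
        return "hello " + name


class LoudGreeter(Greeter):
    def greet(self, name: str) -> str:
        # Zero-argument super(): Python 2 requires super(LoudGreeter, self).
        return super().greet(name).upper()


async def greet_later(name: str) -> str:
    # async/await: suspend once so the event loop could run other tasks.
    await asyncio.sleep(0)
    return LoudGreeter().greet(name)


if __name__ == "__main__":
    print(asyncio.run(greet_later("tauthon")))  # HELLO TAUTHON
    print(LoudGreeter.greet.__annotations__)
```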

It was a fun project; I learned a lot about how the CPython implementation
works and have a lot of respect for the people that built it. It was
surprisingly easy to implement Tauthon based off the work the core dev team
did on Python3:
[https://www.naftaliharris.com/blog/nonlocal/](https://www.naftaliharris.com/blog/nonlocal/)

For what it's worth, I do believe that Python3 is a better language than
Python2. We use Python 3.7 at my work (SentiLink) and we've had a good
experience with it. (If you're starting a new project or can migrate, I'd
recommend it). But I do think that the ~10 year saga of upgrading to Python3
from Python2 wasn't necessary when the main benefit was really the unicode
refactoring.

I no longer maintain Tauthon personally but there are others who are excited
about the project who occasionally add new features or bugfixes.

~~~
toyg
I feel a bit like you’re trying to rewrite history here.

When you launched it, you called it “Python 2.8”. You posted it everywhere to
gain traction, and didn’t rename it until the PSF and Guido got you by the
ear, so to speak. There was no mention of “everything but str” or whatever, as
far as I recall.

It was an outright (and hostile) attempt to fork - something that, going by
your words here, I guess you now recognise as a mistake. I guess saying “I
screwed up” is hard.

~~~
loeg
You're coloring the history with a lot more hostility than the reality. And
the condescension is really unnecessary.

This matches the pattern of Python.org developers and python 3 aficionados
being unnecessarily hostile and condescending to the concerns of Python 2
language users. You saw that in 2010; you saw it again in 2015; and you can
see it in these threads today.

~~~
strenholme
I saw a lot of hostility right here in 2019 when I pointed out that some Python 2
programs are next to impossible to port to Python 3:
[https://news.ycombinator.com/item?id=21258527](https://news.ycombinator.com/item?id=21258527)

~~~
joshuamorton
None of the things you mentioned are particularly difficult, though. In fact,
you're still in the realm of changes that can be trivially automated with
[https://python-modernize.readthedocs.io/en/latest/fixers.htm...](https://python-modernize.readthedocs.io/en/latest/fixers.html):
the three issues you describe are handled by the print, xrange_six, and
classic_division fixers.

It's certainly possible that there are parts of the migration that would be
tricky. A quick skim of the file didn't turn up any obvious ones, but it's
also huge and hard to read, so I could very well have missed something.

Most of the truly challenging things to migrate involved some combination of
extension modules, heavy metaprogramming (eval/exec), and APIs that changed
significantly between 2 and 3 (most of which are string-related, but some
libraries also decided to do backwards-incompatible things).
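One small example of the metaprogramming point: `exec` was a statement in Python 2 and is a function in Python 3, and in Python 3 it can no longer rebind a function's local variables. Passing an explicit namespace dict, as in `run_snippet` below, is the style that, as far as I know, behaves the same on 2.7 (which accepts the `exec(code, ns)` tuple spelling for compatibility) and on 3. Both helper names are hypothetical.

```python
def run_snippet(code, **inputs):
    """Execute `code` in a fresh namespace seeded with `inputs`; return it."""
    namespace = dict(inputs)
    # Python 2 statement form: exec code in namespace
    exec(code, namespace)
    # exec inserts __builtins__; drop it so callers only see their own names.
    namespace.pop("__builtins__", None)
    return namespace


def local_rebind_attempt():
    x = 1
    # On Python 3 this writes to a temporary locals mapping, not to the
    # function's real local variable, so x is still 1 afterwards.
    exec("x = 2")
    return x
```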

------
berti
If you're looking to continue running old codebases without testing and
patching this isn't it. There are various incompatibilities with CPython 2.7,
some noted in the readme, some noted in issues e.g. [0], and no doubt others
unknown. It also doesn't appear to be actively maintained.

[0]
[https://github.com/naftaliharris/tauthon/issues/22](https://github.com/naftaliharris/tauthon/issues/22)

------
TedDoesntTalk
Why would I choose this over python 3? Is it for 2.x legacy code that can’t be
upgraded to 3.x?

~~~
coliveira
Many people dislike Python 3 syntax. I think this is a great project, and
given the mishandling of Python2/3, I wish we had a more systematic way to
introduce new features. I hope this succeeds.

~~~
ebg13
> _Many people dislike Python 3 syntax_

Which aspects?

~~~
coliveira
The aspect that breaks existing code. They could very well, for example,
have created a print function that didn't break the existing print statement. I
think it was a childish move.

~~~
acdha
It's a change which takes less time to fix than you've spent commenting in
this thread:

    python-modernize --write --no-six PROJECT_DIR

([https://python-modernize.readthedocs.io/en/latest/](https://python-modernize.readthedocs.io/en/latest/))

    futurize --write PROJECT_DIR

([https://python-future.org/quickstart.html](https://python-future.org/quickstart.html))

~~~
htfy96
This isn't ideal, as a version upgrade may also change the parameter types of
a third-party library, and these tools cannot detect that:

old.py:

    bs = raw_input()
    # Call a third-party function that originally accepted a bytestring but
    # now accepts a text string (some_3rd_party_function is a placeholder)
    some_3rd_party_function(bs)

After python-modernize:

    from six.moves import input
    bs = input()
    # Your code using bs as a bytestring still works, but this call fails at
    # runtime: the library now accepts a text string, so you have to fix your
    # code to deal with TypeErrors popping up from everywhere
    some_3rd_party_function(bs)

~~~
acdha
Your second case is why I had --no-six in my example: to prevent what you
showed.

In both cases, as shown in the linked documentation, these tools are designed
to support staged migrations for exactly this reason.

------
mattbillenstein
I hope this is the last Python2/3 thread we have to endure - I fear it is not.

~~~
gjvc
It is possible that the transition from Python 2 to Python 3 is the worst-
handled of its kind in the history of open-source so far. Is it any wonder,
then, that things like this spring up in their wake?

~~~
cesarb
I believe the transition from Perl 5 to Perl 6 was worse. (A good overview of
it: [http://blogs.perl.org/users/ovid/2019/08/is-perl-6-being-ren...](http://blogs.perl.org/users/ovid/2019/08/is-perl-6-being-renamed.html))

~~~
paganel
Not as well-known, but still in Python land: I'd also nominate the Zope 2 to
Zope 3 transition [1] as another failed one.

[1]
[https://en.wikipedia.org/wiki/Zope#BlueBream](https://en.wikipedia.org/wiki/Zope#BlueBream)

------
KMnO4
I couldn’t care less about support for libraries and builtins. If my boss came
up to me and said “Hey, I think we should use this Python 3.x library”, my
response would be “Great, let’s use Python 3”.

That said, I understand there are certain applications that are not compatible
with 3.x and the company does not have the resources to dedicate to rewriting
it. So let’s suppose there is a valid reason someone is forced to use Python
2.7. In that case, the number one priority of this fork should be backporting
the security fixes.

You can live without async, type-hinting, and f-strings. But please be
responsible when it comes to security vulnerabilities.

------
MatthewWilkes
Please, if you read this and decide it's a great idea, don't start bugging
open source maintainers to officially support Tauthon in their projects. The
most disruptive voices I've encountered in the last 6 months have been doing
this.

------
dochtman
Last commit from 5 months ago. Maybe not a great choice.

~~~
loeg
_Shrug_. The last commit on the Python.org 2.7 branch was 3 months ago, per
the release notes from the other thread today. It wasn't especially active
before that. By their own plan, Python.org 2.7's "last commit age" will
continue increasing forever.

This isn't a great metric for evaluating Python 2 forks, that's all.

~~~
berti
The Python.org 2.7 branch is unmaintained so that’s hardly a strong point.
That last commit, barring some exceptional circumstance, will be the last ever
received by that branch.

------
habosa
Is there a simple guide to understanding the str/unicode changes from Python 2
to 3? My string mistakes have broken every single piece of code I've upgraded
from 2 to 3 unless I littered the code with defensive six calls... and even
then some things still broke.
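Not a full guide, but the core of the Python 3 model fits in a few lines: `str` is text, `bytes` is binary data, you convert between them only with explicit `.encode()`/`.decode()`, and mixing the two raises `TypeError` instead of silently coercing through ASCII as Python 2 did. UTF-8 is an assumption here; use whatever encoding your data actually uses.

```python
text = "naïve café"            # str: a sequence of Unicode code points
data = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text

print(len(text), len(data))    # 10 12 (ï and é take 2 bytes each in UTF-8)
print(data.decode("utf-8") == text)  # True: decoding reverses encoding

try:
    text + data                # Python 3 refuses to guess an encoding
except TypeError as exc:
    print("TypeError:", exc)
```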

------
tyingq
Interesting. Is there anything here to deal with the three states[1] of
"bytes" and "bytearray" vs "strings"? For me, that was the hardest part of
releasing an extension that was 2-vs-3 agnostic. I resorted to some ugly
hacks with version detection, bytearray detection, etc. to make a
2/3-compatible extension.

[1] The three states being 2.x before it recognized bytes, 2.x sort of
recognizing them, and 3.x strictly recognizing them.
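A minimal sketch of the kind of version detection described above, assuming Python 2.6+ on the 2.x side (where `bytes` is just an alias for `str`). The names `PY2`, `binary_types` and `to_bytes` are my own, loosely modeled on what the `six` library provides; they are not taken from any particular extension.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821  (this name exists only on Python 2)
    # memoryview is left out: bytes(memoryview) on Python 2 gives its repr.
    binary_types = (str, bytearray)
else:
    text_type = str
    binary_types = (bytes, bytearray, memoryview)


def to_bytes(value, encoding="utf-8"):
    """Return `value` as an immutable byte string on either major version."""
    if isinstance(value, binary_types):
        # bytes(...) copies bytearray/memoryview into an immutable byte string.
        return bytes(value)
    if isinstance(value, text_type):
        return value.encode(encoding)
    raise TypeError("expected text or binary data, got %r" % type(value))
```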

------
joshuamorton
According to the open issues, it doesn't currently support `async for`, nor is
anything from 3.7 or 3.8 included (dataclasses, f-strings, breakpoint(), a
number of performance improvements).

There hasn't been a main branch commit since python 2.7.17 was merged in ~6
months ago.

------
Mave83
please get rid of the old python...

~~~
nomel
Justifying the conversion of a legacy project that's still being maintained
is very difficult, since without refactoring the conversion doesn't
necessarily bring anything tangible, especially if you have fixed
dependencies. And, until Python 3.7, the conversion would just make
everything slower. That's something you never want on your justification
list.

------
hartator
I still don’t fully get why we can’t have ‘print x’ in Python 3. We would have
avoided 10 years of schism for something that was soooo backward incompatible.

~~~
maest
There are ample discussions about the change in `print x` syntax in Python all
over the web.

~~~
hartator
Well everyone understands the academic reasons. (Print is a function and now
you can do ‘my_print_function = print’) But really it was not necessary.

What code nowadays takes advantage of print being an ordinary function?

[EDIT] I am genuinely curious.
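One place it pays off: because `print` is an ordinary function, it can be passed around like any other callable, for example as a default output sink that callers or tests can swap out. The `report` function below is a hypothetical illustration; a Python 2 print statement could not be passed this way.

```python
import functools
import sys


def report(rows, emit=print):
    """Send one line per (name, count) pair to `emit` (print by default)."""
    for name, count in rows:
        emit("%s: %d" % (name, count))


rows = [("requests", 3), ("errors", 1)]

report(rows)                                                   # to stdout
report(rows, emit=functools.partial(print, file=sys.stderr))   # to stderr

captured = []
report(rows, emit=captured.append)  # collect the lines in a list instead
```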

~~~
int_19h
It's the people who are learning the language from scratch taking advantage of
this simplification, because they don't need to learn the special syntax of
print with all its warts (like trailing comma).
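For comparison, the Python 2 special forms map onto ordinary keyword arguments of the Python 3 function. The mapping in the comments is approximate: Python 2's trailing comma used a "soft space" rule that `end=" "` only roughly imitates.

```python
import io

out = io.StringIO()

# Python 2: print "a", "b",        (trailing comma: no newline)
print("a", "b", end=" ", file=out)
# Python 2: print >>out, "done"    (>> redirects to a file-like object)
print("done", file=out)
# No print-statement equivalent: change the separator between items
print("2020", "04", "20", sep="-", file=out)

print(repr(out.getvalue()))  # 'a b done\n2020-04-20\n'
```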

------
bigdict
Is this the most unnecessary project ever?

Edit: no, the author built it as a proof of concept.

------
qwerty456127
> Matrix Multiplication Operator

Does this use SIMD instructions?

~~~
bigdict
Yes, if you are using NumPy. But `@` is just syntax. An operator by itself
doesn’t “use instructions”.
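To illustrate that point: `a @ b` just calls `a.__matmul__(b)` (PEP 465), so whether anything is vectorized depends entirely on the type implementing it. The toy `Matrix` class below is a hypothetical, loop-based implementation with no SIMD involved; NumPy arrays implement the same hook with optimized native code.

```python
class Matrix:
    """Toy row-major matrix; @ is implemented with ordinary Python loops."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        # Transpose `other` so each of its columns can be zipped with a row.
        columns = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in columns]
             for row in self.rows]
        )


product = Matrix([[1, 2], [3, 4]]) @ Matrix([[5, 6], [7, 8]])
print(product.rows)  # [[19, 22], [43, 50]]
```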

------
wnoise
This just sounds like Python 3 with extra steps.

------
OptionX
Let Python 2 die, people. Move on. Isn't 10 years of tech debt enough?

------
ltbarcly3
It seems like a lot of effort to avoid the parens with print()

~~~
ATsch
The bigger porting problem is bytes vs unicode. That can't be done
automatically and is a lot of work.

~~~
ltbarcly3
I guess it seems hard, but if you understand the basics it's just a matter of
finding IO and encoding/decoding things appropriately. Python 2 str, unicode
and Python 3 str are almost exactly the same as they ever were, and if you are
doing binary IO, bytes should work as expected.

It's not that big of a project, even for a moderately large codebase. If you
think you can't get it done in a reasonable period of time feel free to hire
me as a contractor and I'll knock it out.

~~~
int_19h
The real problem in practice is all the places that relied on implicit bytes
<-> unicode conversions. These are also likely to be broken on unusual inputs
due to ASCII being the default encoding, rather than user locale... but that
kind of thing is depressingly common even so.

~~~
ltbarcly3
There isn't an implicit bytes/unicode conversion in Python 2.

Do you mean calling unicode('some non unicode string')? That just uses the
system default encoding (sys.setdefaultencoding()). Just find and replace them
and slap in a .decode('UTF-8') or whatever your default encoding was in Python 2.

Grep for the strings encode, decode, unicode and just mechanically fix them
one at a time by making the old implicit behavior explicit. How many times
could you be doing that anyway? A few hundred? A thousand? You could even
script this pretty reliably and just page through the diff you end up with to
eyeball them one at a time.
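The grep step could be scripted along these lines. This is a rough sketch meant to run on Python 3 over Python 2 sources: the regular expression is my own guess at the call sites worth reviewing (`unicode(`, `.encode(`, `.decode(`), and `find_suspects`/`scan_files` are hypothetical helpers, so expect both false positives and misses (an implicit conversion such as `'str' + u'text'` won't show up).

```python
import re

# Call sites that often hide an implicit or default-encoding conversion.
SUSPECT = re.compile(r"\bunicode\(|\.(?:en|de)code\(")


def find_suspects(source, filename="<string>"):
    """Return (filename, line number, stripped line) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SUSPECT.search(line):
            hits.append((filename, lineno, line.strip()))
    return hits


def scan_files(paths):
    """Print file:line: text for every suspicious line in the given files."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for hit in find_suspects(handle.read(), path):
                print("%s:%d: %s" % hit)
```

Something like `scan_files(glob.glob("src/**/*.py", recursive=True))` then gives a list to page through one hit at a time.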

I guess you might mean 'str' + u'unicodestr' or something, but again you can
find these pretty easily by rooting out where the non-unicode strings are
being produced and fixing the problem there. They are either literals or they
are coming from IO or calls to str, right? Anyway, I've done this quite a few
times and the main concern I've always had was trying to get the patch in
place before people commit too much stuff for me to be able to merge the fixed
up branch.

Of course, you could just do it little by little, taking out the places where you
rely on the system default encoding by monkeypatching the default decoding
function in sys to log tracebacks whenever it is used, and then whacking them
as they come up so you end up with properly handled and explicit unicode
decoding before you move away from python2. I bet you could find and fix 95%
of the cases in a day of effort.

~~~
int_19h
('str' + u'unicodestr') involves an implicit conversion of the bytes operand
on the left side to unicode, yes, so clearly Python 2 has it. And it's not the
only such case: the fundamental problem is that the Python/C API for Python 2
implements this for the standard argument parsing functions. So basically any
function implemented in native code that invokes PyArg_ParseTuple("s"), will
also accept Unicode objects, and will implicitly encode them with ASCII. IIRC
the reverse is true for "u". So those conversions happen every time you cross
from Python into native - and all built-in data types, operators, and
functions are native, as are large swaths of the standard library.

And yeah, u"" literals aren't hard to find, but the problem is when you get
data flowing from different sources, so both sides are variables. For example,
one is read from a text file, and another one comes from parsed JSON - so the
former is raw bytes, and the latter is Unicode - and you need to combine them
together. Like you said, the proper way to do this is to ensure that as soon
as data crosses the I/O boundary, it should be of the correct type (i.e.
unicode rather than bytes) - which, ironically, is exactly what Python 3
encourages with its changes. But it can be hard to find all such places - you
have to actually audit every use of I/O one by one, because in Python 2, the
code by itself doesn't always reflect whether it's supposed to be dealing with
text or binary data.
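Here is a minimal Python 3 sketch of that "decode at the I/O boundary" rule, using in-memory byte streams in place of a real text file and JSON source. The function names are hypothetical and UTF-8 is an assumed encoding; the point is that after `read_text` both values are `str`, so combining them never depends on an implicit conversion.

```python
import io
import json


def read_text(raw_stream, encoding="utf-8"):
    """Read raw bytes and decode them once, right at the I/O boundary."""
    return raw_stream.read().decode(encoding)


def combine(name_stream, json_stream):
    """Join a name from a text file with a field from a JSON document."""
    name = read_text(name_stream).strip()
    payload = json.loads(read_text(json_stream))
    return name + ": " + payload["greeting"]


name_file = io.BytesIO("Zoë\n".encode("utf-8"))
json_file = io.BytesIO(b'{"greeting": "hello"}')
print(combine(name_file, json_file))  # Zoë: hello
```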

~~~
ltbarcly3
So I'm back to my idea of monkeypatching the standard libraries to add logging
to a file wherever it resorts to the default encoding or an implicit decode,
and one by one fix them until the log file stops having stuff in it. After a
couple of months of not seeing any, you just assume you found them all. You
can safely bet that you'll find the last couple in a couple more months, but
by then you will already have moved to Python 3.

------
sevanteri
But why?

~~~
ur-whale
print with brackets

~~~
michaelmior
I assume this is sarcasm, but you could do this in Python 2.x already :)

~~~
ehutch79
They mean having to use brackets

------
ehutch79
It's too bad the python 3 transition was super short notice and no one had
very much time to port their code at all...

~~~
hackingthenews
I think we need to recognize that most people/companies who stick with Python 2
don't do it because they are being stubborn. There is a real judgment call
involved. It might look strange from the outside, but maybe it really is
daunting to migrate some code bases to Python 3.

(I got the sarcasm)

~~~
oliwarner
Daunting? Sure, but it's also daunting jumping from branch to branch trying to
find somebody to support your rusty old Python 2 installations. Or deciding to
actively run on unsupported Python.

Modern software isn't fire and forget. Developers who have grown up around
security exploits understand that. Higher management and FORTRAN77 programmers
don't. To them it's developer busywork.

The real judgement being made is money and resources.

~~~
TylerE
Rusty old python 2?

[citation needed]

I have encountered exactly zero bugs in Python 2 in production. Code doesn’t
magically stop working.

~~~
oliwarner
Leaving aside the bugfixes they've kept publishing for Python 2.7 (and its
rather massive standard library), how is your stack of ~100 external
dependencies holding up? In my experience, most libraries are dropping Python
2 support too. Many dropped it a while ago.

Your locally running, toy projects _will_ continue to run under Python 2
indefinitely. Large [esp network-facing] projects will start to fall apart.

~~~
dfox
Many large Python 2 projects come from an age when it was not exactly practical
to have ~100 external dependencies, and thus do not have them. It was a time
when one really thought about whether an external dependency for some
particular shiny thing was worth the deployment hassle involved.

~~~
TylerE
Indeed. The main project I work on at my day gig is old enough that it's
built on an in-house framework that predates the first public release of
Django.

------
erwinh
Why tho

------
mistrial9
I will try this - thank you !

------
tengbretson
When are we going to start treating projects that are stuck on Python 2 the
same way that we treated IE6 or Adobe Flash? I'm kind of sick of hearing the
"well some people have to make technical tradeoffs in what they choose to
support" argument. We're all trying to make better software here. Catch up.

