
Having written a bunch of Python 2 and ported it to 3, where I deal with unknown encodings (FTP servers), I can't help but disagree with Armin on most of his Python 3 posts.

The crux of his argument in this article is "unix is bytes, you are making me deal with pain to treat it like Unicode." Python 2 just allowed you to take crap in and spit crap out. Python 3 requires you to do something more complicated when crap comes in. In my situation, I am regularly putting data into a database (PostgreSQL with UTF-8 encoding) or working with Sublime Text (on all three platforms). You try to pass crap along to those and they explode. You HAVE to deal with crappy input.

In my experience, Python 2 explodes at run time when you get weird crappily-encoded data. And only your end users see it, and it is a huge pain to reproduce and handle. Python 3 forces you to write code that can handle the decoding at the get go. By porting my Python 2 to 3, I uncovered a bunch of places where I was just passing the buck on encoding issues. Python 3 forced me to address the issues.
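A minimal sketch of what "forces you to handle the decoding" looks like in Python 3. The byte string is a made-up filename of the sort an FTP server might send, assumed here to be Latin-1 encoded:

```python
# Hypothetical raw filename from a server; assumed to be Latin-1, not UTF-8.
raw = b"caf\xe9.txt"

# Python 3 refuses to glue bytes onto text implicitly.
try:
    label = "Downloading: " + raw
except TypeError:
    label = None  # the mistake surfaces at the boundary, not deep in the app

# Strict UTF-8 decoding also fails right where the data comes in.
try:
    name = raw.decode("utf-8")
except UnicodeDecodeError:
    # Fall back to the encoding we assume the server really used.
    name = raw.decode("latin-1")

print("Downloading: " + name)
```

In Python 2 the equivalent concatenation of a byte string and a unicode string attempts an implicit ASCII decode, so it works for ASCII data and only fails once a non-ASCII byte shows up.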

I'm sure there are bugs and annoyances along the way with Python 3. Oh well. Dealing with text input in any language is a pain. Having worked with Python, C, Ruby and PHP and dealt with properly handling "input" for things like FTP, IMAP, SMTP, HTTP, etc, yeah, it sucks. Transliterating, converting between encodings, wide chars, Windows APIs. Fun stuff. It isn't really Python 3 that is the problem, it is undefined input.

Unfortunately, it seems Armin happens to play in areas where people play fast and loose (or are completely oblivious to encodings). There is probably more pain generally there than dealing with transporting data from native UI widgets to databases. Sorry dude.

Anyway, I never write Python 2 anymore because I hate having things randomly explode for end users and having to try and trace down the path of text through thousands of lines of code. Python 3 makes it easy for me because I can't just pass bytes along as if they were Unicode, I have to deal with crappy input and ask the user what to do.

Python 2 is a dead end with all sorts of issues. The SSL support in Python 2 is a joke compared to 3. You can't re-use SSL contexts without installing the cryptography package, which requires cffi, pycparser and a bunch of other crap. Python 2 SSL verification didn't exist unless you rolled your own or used Requests. Except Requests didn't even support HTTPS proxies until less than a year ago.
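For comparison, a rough sketch of a reusable, verifying SSL context using only the Python 3 standard library. The host name in the comments is a placeholder, not something from this thread:

```python
import ssl

# Create one context up front; the defaults verify certificates and hostnames.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)

# The same context object can then be handed to several clients, e.g.:
#   http.client.HTTPSConnection("example.org", context=context)
#   ftplib.FTP_TLS(context=context)
```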

Good riddance Python 2.




> Python 3 requires you to do something more complicated when crap comes in.

Or in most cases: Python 3 falls flat on the floor with all kinds of errors because you did not handle unicode with one of the many ways you need to handle it.

On Python 2 you decoded and encoded. On Python 3 you have so many different mental models you constantly need to juggle (is it unicode, is it latin1 transfer encoded unicode, does it contain surrogates), and then for each of them you need to start thinking about where you are writing it to. Is it a bytes-based stream? Then surrogate errors can be escaped and might result in encoding garbage, same as in Python 2. Is it a text stream? Then that no longer works, and you can either crash or write different garbage. If it's latin1 transfer encoded, then most people don't even know that they have garbage. I filed lots of bugs against that in WSGI libs.
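To make those cases concrete, here is a small sketch with made-up input bytes. It only illustrates the three situations named above; it is not taken from any of the WSGI libraries mentioned:

```python
import io

raw = b"caf\xe9"  # hypothetical input that is not valid UTF-8

# Case 1: surrogateescape smuggles the bad byte through as a lone surrogate.
text = raw.decode("utf-8", errors="surrogateescape")

# Writing to a bytes-based sink: escaping back reproduces the same garbage.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# Case 2: a strict text stream cannot encode the lone surrogate.
stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
try:
    stream.write(text)
    stream.flush()
    text_stream_failed = False
except UnicodeEncodeError:
    text_stream_failed = True

# Case 3: "latin1 transfer encoded" text, as WSGI does for environ strings.
wire = "caf\u00e9".encode("utf-8")           # UTF-8 bytes on the wire
transfer = wire.decode("latin-1")            # looks like text, is mojibake
repaired = transfer.encode("latin-1").decode("utf-8")

print(round_tripped == raw, text_stream_failed, repaired)
```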

If you write error-free Python 3 unicode code, then teach me. (Or show me your repo and I'll show you all the bugs you now have)


> (Or show me your repo and I'll show you all the bugs you now have)

This would be great. Show me! I'd love to know:

https://bitbucket.org/zzzeek/sqlalchemy/

https://bitbucket.org/zzzeek/mako/

https://bitbucket.org/zzzeek/alembic/

I'm guessing you'd go for Mako first since it has the most unicode intense stuff going on (and it uses lots of your code).


As an example: the mako CLI. You can call this an error or not, but with the C locale your cmdline will die with UnicodeErrors when you open a nonexistent file with a unicode filename on Python 3, but not on Python 2, where it will do the correct thing. It will also die with unicode errors in the same situation when your template renders any unicode characters. Again, something that probably works fine, and correctly, on Python 2.
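A rough stand-in for that failure mode, assuming stdout ends up with an ASCII encoding under the C locale (how Python picks the encoding depends on the version and environment). `render_to` is a hypothetical helper, not part of mako:

```python
import io


def render_to(encoding, text):
    """Write text through a text stream with the given encoding, return the bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


rendered = "na\u00efve template output"

# A UTF-8 stream copes with the non-ASCII character.
utf8_bytes = render_to("utf-8", rendered)

# An ASCII stream (our stand-in for stdout under the C locale) does not.
try:
    render_to("ascii", rendered)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_bytes, ascii_failed)
```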

Or if you put unicode characters into your README.rst, you could no longer safely install mako. Again, Python 3 only.

These are just two things I found on github.

Another easy one: alembic READMEs can no longer safely contain unicode. They would break on Python 3, but work just fine on Python 2 because of the code in list_templates.


The cmdline template runner isn't doing unicode in Py2K at the moment either; it crashes there too.


I would not be surprised if you can construct contrived examples of how Python 3 can be broken. In my experience, writing real life code, I ship more stable software writing in Python 3 than Python 2.

I mostly work with subprocesses or directly reading data from socket connections, and I run all of my bytes through strict mode. If something doesn't decode properly, an error is returned. Currently I am working on an interactive way (inside of Sublime Text) to present to the user a way to see text in different encodings so they can help debug the issue on their own.
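Something along these lines, as a sketch only. `decode_strict` and the list of candidate encodings are hypothetical, not the code from the closed-source project:

```python
def decode_strict(data, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in strict mode.

    Returns (text, encoding) for the first one that decodes cleanly,
    or (None, None) so the caller can report the problem to the user.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding, errors="strict"), encoding
        except UnicodeDecodeError:
            continue
    return None, None


print(decode_strict(b"caf\xc3\xa9"))  # valid UTF-8
print(decode_strict(b"caf\xe9"))      # not UTF-8, but valid cp1252
print(decode_strict(b"\x81"))         # valid in neither candidate
```

Returning a sentinel rather than guessing keeps the decision about what to do with undecodable input at the edge of the program, where the user can be asked.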

So, yes, you need to write helper functions and have an interface to deal with properly handling encodings. This has been my experience in every language I've ever worked in. I can't imagine there is a way around it. Is this a reason Python 3 sucks compared to 2? Not in my experience. I had far more issues in Python 2 with encodings and not being sure what other libraries and packages had done with regard to handling unicode data. Hmm, so ftplib accepts unicode for filenames. Does it encode it? What encoding does it use? Oh, look at that, it has just been coercing to ascii because it can.

So yeah, writing a simple little toy command line app needs more boilerplate to deal with unicode. Any real app is going to need that and a ton more. And you are going to have to decide how to error with encodings, and how to let users identify encodings. And you are going to need to write a global exception handler for Python to capture unexpected exceptions and log them to a file so users can send crash reports. Yay, sys.excepthook!
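A minimal sketch of that kind of global handler. `install_crash_logger`, the logger name and the log path are all made up for illustration:

```python
import logging
import sys


def install_crash_logger(path):
    """Log any uncaught exception to a file, then fall back to the default hook."""
    logger = logging.getLogger("crash")
    logger.addHandler(logging.FileHandler(path, encoding="utf-8"))

    def hook(exc_type, exc_value, exc_tb):
        # Record the full traceback so users can attach the file to a report.
        logger.error("Unhandled exception", exc_info=(exc_type, exc_value, exc_tb))
        # Still print the traceback to stderr as Python normally would.
        sys.__excepthook__(exc_type, exc_value, exc_tb)

    sys.excepthook = hook
    return hook
```

Calling `install_crash_logger("crash.log")` once at startup is enough; anything that escapes to the top level of the main thread afterwards ends up in the file.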

But anyway, I think it all comes back to the fact that I know what I am dealing with far more quickly with Python 3 than with 2. Again, maybe because I don't write apps that deal with local file paths (except abstracted through a subprocess).

Unfortunately, most of the code where I deal with crappy encodings from FTP servers and SVN is closed source. The open source stuff is at https://github.com/wbond.


> In my experience, writing real life code, I ship more stable software writing in Python 3 than Python 2.

Real life code is not the same for everybody.



