
The only problem I have with Python 3's strings/bytes handling is that some standard library functions accept byte strings in Py2 (regular "" strings) and Unicode strings in Py3 (again, regular "" strings).

This has led developers to conflate two distinctly different concepts and to write APIs that accept both while behaving differently for each.

A simple solution is there in plain sight: use b"" and u"" strings exclusively for any code you wish to work in both Py2 and Py3, and forget about "". Any library that supports both should be using those exclusively. Python 3-only code should be using b"" and "" instead.
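Roughly what I mean, as a minimal sketch (the names GREETING, PAYLOAD, to_wire and from_wire are made up for illustration, not from any library):

```python
# Sketch only: every name here is hypothetical.
# The point is that no bare "" literal appears anywhere.

GREETING = u"caf\u00e9"      # text on both: `unicode` in Py2, `str` in Py3
PAYLOAD = b"caf\xc3\xa9"     # bytes on both: `str` in Py2, `bytes` in Py3


def to_wire(text, encoding="utf-8"):
    """Encode text to bytes at the I/O boundary."""
    return text.encode(encoding)


def from_wire(data, encoding="utf-8"):
    """Decode bytes back to text at the I/O boundary."""
    return data.decode(encoding)
```

The two literals have different types on both versions, so mixing them up fails early instead of silently depending on the interpreter.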

One could consider this a design oversight in Python 3: the fact that the syntax is so similar elsewhere makes people want to run the same code in both, yet a core type is basically incompatible.




u"" is a syntax error in python3 (or at least it was for a while; apparently it's not anymore. That said...). The correct cross-platform solution is to do

    from __future__ import unicode_literals
which makes python2 string literals unicode unless declared bytes. Then "" strings are always unicode and b"" strings are always bytes, no matter the language version.
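A small sketch of the effect, assuming the import sits at the top of the module (the variable names are only for illustration):

```python
from __future__ import unicode_literals  # has to come before any other code in the module

plain = "abc"    # text on both versions once the import is active
raw = b"abc"     # bytes on both versions

text_type = type(plain)   # `unicode` in Py2, `str` in Py3
bytes_type = type(raw)    # `str` in Py2, `bytes` in Py3
```

On Python 3 the import is accepted and does nothing, since "" literals are already text there.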


> u"" is a syntax error in python3

This has not been the case since 2012. The last release of Python 3 for which this was the case reached end of life in February 2016. Please stop misinforming people.


While u"" is accepted in current Python 3, for some reason they ignored the raw unicode string ur"", which is still a syntax error in Python 3. So, unicode_literals is definitely preferable.
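You can check which prefixes your interpreter accepts without crashing the whole file, by compiling the literals from source text. A quick sketch (the helper name `compiles` is made up):

```python
def compiles(source):
    """Return True if `source` parses as an expression on this interpreter."""
    try:
        compile(source, "<literal>", "eval")
        return True
    except SyntaxError:
        return False


# Map each literal's source text to whether it is accepted.
results = {lit: compiles(lit) for lit in ('u"a"', 'r"a"', 'br"a"', 'ur"a"')}
```

On Python 3.3 and later I'd expect only the `ur"a"` entry to come back False. With unicode_literals active in Py2, a plain r"" is already a raw text literal, which is why the prefix isn't needed there.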


> The correct cross-platform solution is to do

It's absolutely not correct, because there are many APIs which take "native strings" across versions, i.e. they take a `str` (byte string) in Python 2 and a `str` (unicode string) in Python 3. unicode_literals causes significantly more problems than it solves.

The correct cross-platform solution (and the very reason why the `u` prefix was reintroduced after having initially been removed from Python 3) is in fact to use b"" for known byte strings, u"" for known text, and "" for "native strings" to interact with APIs which specifically need that.
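One concrete case, as a sketch: the class name passed to the three-argument form of `type()` has to be a native `str` on both versions (the helper `accepts_as_class_name` is a made-up name, and the Py2 behaviour in the comments is from memory, so treat it as an assumption):

```python
def accepts_as_class_name(name):
    """Return True if type() accepts `name` as a class name on this interpreter."""
    try:
        type(name, (object,), {})
        return True
    except TypeError:
        return False


# A bare "" literal is the native `str` on both versions, so this works everywhere.
Point = type("Point", (object,), {"x": 0})

native_ok = accepts_as_class_name("Point")
# Py3 rejects bytes here; as far as I recall, Py2 rejects u"Point" instead,
# which is what a "" literal turns into under unicode_literals.
bytes_ok = accepts_as_class_name(b"Point")
```

On Python 3, `native_ok` is True and `bytes_ok` is False. Neither b"" nor u"" gives you something that is right on both versions here, which is the whole reason for keeping "" around.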



