Hacker News new | past | comments | ask | show | jobs | submit login

Yep, I'm always surprised by the number of people of people here who dismiss the usefulness of unicode. Not "dismissing" the hard way, but simply saying it's not a problem. I understand that we may have a lot of western/english people here but, unicode for me is a life saver. For example, I work on stuff for french speaking people, and I need the euro sign. In that very simple case, unicode already solve many issues : I don't have to convert between ISO-8859-1, ISO-8859-15, CP-1252 anymore, whatever my database uses, I just enforce unicode. Moreover, I can have french comment in source code (that's very important because sometimes we're talking about laws and some of our technical terms in those laws are not translated in english).

(I understand that in this particular case, I could have enforced ISO-8859-15 as well, but the crux of the matter is that with unicode built in python3, I don't have to think about it anymore)

And now my customer is copy/pasting additional information from documents coming from other countries as well...




You do realize unicode has been "built into Python" pretty much since the beginning (Python 2.0, 17 years ago), right?

The main difference is that unicode has a more efficient internal storage since Python 3.3+ (a neat technical detail), and that mixing bytestrings and unicode will fail with an explicit error since 3.0+ (a good idea IMO).

But that Python 2.7 didn't support unicode is simply FUD.


I have moved away from Python 2.7 because unicode support was not good enough.

Not good enough means I had to prefix every single string with "u" to make sure it's unicode. It was especially painful with code like this :

   logger.debug("Name of the person I'm debugging {}".format(name))
forgetting the "u" simply lead to various crashes if the name had characters that could not be encoded in whatever the debug was piped to. Always thinking about the "u" was nothing I had the time to.

Just an example.


Well now you have to prefix b"" instead for bytestrings. The argument works both ways -- there's no magical improvement that happened in Python 3 (except that nice storage optimization in 3.3 that I mentioned above).

It's actually good practice to be explicit about the type, and write b"" and u"" always (easier on the person reading the code). The u'' literal prefix was re-introduced in 3.3 for this reason.


The argument works both ways

It doesn't because strings and text are a lot more common than bytes. Yours is a really weird line of argument - that the 3.x changes and what's in 2.7 are fundamentally equivalent and thus the changes are outright unnecessary and that the people who made them just got it wrong and did it for no apparent reason or benefit. I get that someone might not like what they did or how they did it but your take should give you pause just by its general improbability.


You're mixing up two things: separating string types in the language, and using prefix literals. Completely orthogonal concerns.

As I said, u'' literals were re-introduced by the Python developers themselves, in Python 3.3.


I don't think I am. In this thread you're repeatedly making the point that 2.7 supported Unicode and the difference is mostly technical details of things like internal representation and/or a matter of prefixes or whatnot. This just isn't true. The fundamental change is - in Python 2, strings are bags of bytes and in Python 3 strings are collections of Unicode codepoints and you need to go through an encoding to convert to and from bytes. This is a big (and necessary) deal. No amount of debating the finer points of implementation, feature retention or reintroduction, etc is going to make that difference not matter.


What I said:

1. BOTH Python 2 and Python 3 come with built-in support for BOTH bytestring and unicode (contrary to OP's claims I responded to)

2. That mixing bytestrings and unicode will fail with an explicit error since 3.0+ (a good idea IMO)

3. Unicode has a more efficient internal storage since Python 3.3+ (a neat technical detail)

4. It's good practice to be explicit about the type of literals, and write b"" and u"" always

5. That Python 2.7 doesn't support unicode is simply FUD.

Can you articulate which point you're actually contesting? I'll be happy to clarify, but I'm honestly unsure what you're responding to.


I think almost all of these are wrong.

The person you replied to didn't claim Python 2 doesn't support unicode. 'Bytestrings' has what is wrong with Python 2 neatly summarized in a single word (and this, incidentally, is a term the Python documentation avoids these days because it's bad). 3 is true but not really related to the topic at hand. 4 is, I think, outright wrong. As to 5, I'm not sure why you would even want to defend that. It's not what the poster said and even if they had said it, they'd be just wrong - it's not 'FUD'. That is just you being grumpy and rude.


u"" was reintroduced to avoid people of fixing all their strings (see PEP 414 rationale).

that was my point, by moving to python 3, I removed all my "u", the thing other developers not wanted (see PEP-414 again); I loved the "purity".

but removing u was tedious (at best).

In my use case, strings are "string" and binary are "bytes". Which I think is much safer.


I don't think anyone claimed it didn't support Unicode. Only that it allowed mixing bytes / strings and the default type most people used from the beginning was str. That's a trap that they'll regret the moment they actually need to handle something outside of Latin-1.

Lots of python 2 code out there fails because the default option was simple but bad. I know, because my name broke the CI pipelines in a few different projects.


Did you actually read the thread you're replying to, or are you on auto-pilot?


Let's be honest, A lot of peoples experience with Python is restricted to the North America.

For these folks encountering anything other than ASCII is pretty uncommon.

Personally, I've worked on a Python 2.x project deployed in heavy industry across the globe including Japan and the number of times we had Python 2 unicode nightmare issues was too many to mention.


For these folks encountering anything other than ASCII is pretty uncommon.

Hmmm... [THINKING FACE (U+1F914)]




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: