
I'm not saying the transition is easy, or that it's easy to maintain a codebase that handles these issues correctly while supporting Python 2 and 3 at the same time.

But looking solely at Python 3, everything bytes- and unicode-related got so much easier and more testable, with better error messages at better points in your code than in Python 2.

So claiming Python 3 broke stuff is just not the case.

Python 2 had a deeply flawed, boundaryless view of unicode vs. bytes, and Python 3 fixed that.




So for example what is your rebuttal to my actual example pointing out that "a".encode('unicode_escape') returns bytes instead of strings? Why/how the heck is it even encoding anything when it never even asked me for an encoding? If for nonsensical reasons it really wants to encode things, shouldn't it at least be asking me for an encoding if it wants a "clean separation" between strings and bytes? You find Python 2's behavior of returning a string to be somehow deeply flawed but Python 3's returning of bytes to be sensible? Really? What problem are they solving here?


Encode and decode always move between str and bytes. Otherwise you could never write a function that takes an encoding as the argument and uses it in the middle of its code (because the types of your locals would vary at runtime in incompatible ways).
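A minimal sketch of the point about locals, assuming a hypothetical helper `byte_length` (not something from this thread): because `str.encode` returns `bytes` for every codec in Python 3, the local `data` has the same type whichever encoding the caller passes in.

```python
def byte_length(text, encoding):
    """Hypothetical helper: how many bytes `text` occupies in `encoding`."""
    data = text.encode(encoding)  # always bytes in Python 3, whatever the codec
    assert isinstance(data, bytes)
    return len(data)


# The same function body works for ordinary text encodings and for unicode_escape.
lengths = {enc: byte_length("a", enc) for enc in ("utf-8", "utf-16-le", "unicode_escape")}
print(lengths)
```

If `encode` returned `str` for some codecs and `bytes` for others, the type of `data` would depend on a runtime argument.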

What is your reason for using this obscure encoding anyway?


> Encode and decode always move between str and bytes.

Yes and my entire point is that this makes zero sense when we're talking about escaping.

> What is your reason for using this obscure encoding anyway?

Seriously?


https://docs.python.org/3/library/codecs.html

The encoding `unicode_escape` is not about escaping unicode characters. It's about Python source code. It's defined as:

> Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped.

It makes absolutely no sense to have escaped unicode characters as actual unicode string. If you really need that, a version that also works in python 2 would be:

     u'hellö'.encode('unicode_escape').decode('ascii')
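Spelled out step by step, the one-liner above might look like the following sketch. The variable names are made up for illustration, and the text is written as `u'hell\u00f6'` only to keep the snippet ASCII-only; it is the same string as `u'hellö'`.

```python
original = u'hell\u00f6'  # same text as u'hellö'

# Step 1: str -> bytes. On Python 3 this yields a bytes object holding only ASCII characters.
escaped_bytes = original.encode('unicode_escape')

# Step 2: bytes -> str. Decoding as ASCII is safe because the codec emitted ASCII only.
escaped_text = escaped_bytes.decode('ascii')

print(repr(escaped_bytes))
print(repr(escaped_text))

# Decoding the bytes with the same codec gets the original text back.
print(escaped_bytes.decode('unicode_escape') == original)
```

The extra `.decode('ascii')` is the price of getting escaped text as a `str` rather than `bytes` on Python 3.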


> It makes absolutely no sense to have escaped unicode characters as actual unicode string.

Absolutely no sense? So is a basic Python expression evaluator like this complete nonsense to you?

  #!/usr/bin/env python3

  try:  # Python 2
      from Tkinter import Tk, Entry, END
      import tkMessageBox as messagebox
  except ImportError:  # Python 3
      from tkinter import Tk, Entry, END, messagebox

  import ast

  def my_eval(s):
      # assume I've implemented this functionality manually...
      return ast.literal_eval(s)

  root = Tk()
  e = Entry(root)
  e.insert(END, '"Hell\\xc3\\xb6"')  # Assume the user has typed in a Python expression, not me
  e.pack()
  root.bind('<Return>', lambda evt: messagebox.showinfo("Result", repr(my_eval(e.get()))))
  root.mainloop()

I get an escaped string... because that's quite literally what Entry.get() gives me from the text box the user typed into. The simple fact that I got a string containing Python source code with escape characters makes "absolutely no sense" to you?

Note that that wasn't even my choice! That was the choice of the built-in Python GUI toolkit... distributed by the same folks who decided this string/bytes overhaul was a brilliant idea...
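For anyone who wants to try this without a GUI, here is a rough sketch under Python 3 that starts from the same text the example inserts into the Entry. The names `typed`, `inner` and `unescaped` are made up for illustration, and stripping the quotes by slicing is an assumption that only holds for this particular input; it is not a general parser.

```python
import ast

typed = '"Hell\\xc3\\xb6"'  # the same text the example inserts into the Entry

# What the stand-in evaluator returns: a str, with each \xNN turned into one character.
result = ast.literal_eval(typed)
print(type(result).__name__, len(result))

# Undoing the escapes by hand, str -> str, takes a detour through bytes in Python 3.
inner = typed[1:-1]  # drop the surrounding quotes (fine here, not in general)
unescaped = inner.encode('ascii').decode('unicode_escape')
print(unescaped == result)
```

The `.encode('ascii')` step only works because this input is pure ASCII; text containing non-ASCII characters would need different handling.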



