

Making Sense of Python Unicode - leecho0
http://lobstertech.com/2009/jun/07/python_unicode_tutorial/

======
jmillikin
"But UTF-8 has a dark side, a single character can take up anywhere between
one to six bytes to represent in binary."

What? No! UTF-8 takes, _at most_, 4 bytes per code point. (The original design
allowed sequences of up to six bytes, but RFC 3629 caps it at four.)
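
This is easy to check yourself. A minimal sketch, assuming Python 3 (where `chr` covers the full Unicode range); the sample code points are my own picks, namely the highest code point in each UTF-8 length class:

```python
# Highest code point that fits in a 1-, 2-, 3- and 4-byte UTF-8 sequence.
boundaries = [0x7F, 0x7FF, 0xFFFF, 0x10FFFF]

# Encode each code point and count the resulting bytes.
utf8_lengths = [len(chr(cp).encode("utf-8")) for cp in boundaries]

for cp, n in zip(boundaries, utf8_lengths):
    print("U+%04X -> %d byte(s)" % (cp, n))
```

U+10FFFF is the last valid code point, so nothing Python will let you build encodes to more than four bytes.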

"But UTF-8 isn't very efficient at storing Asian symbols, taking a whole three
bytes. The eastern masses revolted at the prospect of having to buy bigger
hard drives and made their own encodings."

Many Asian users object to UTF-8/Unicode because of Han unification, and
because many characters supported in other character sets are not present in
Unicode. Size of the binary encoding has nothing to do with it -- in fact,
East Asian characters outside the Basic Multilingual Plane take 4 bytes in
UTF-16 too.
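
The byte counts being argued about can be measured directly. A rough sketch, assuming Python 3; the three sample characters and their labels are my own choices for illustration (one ASCII letter, one CJK ideograph inside the Basic Multilingual Plane, one from CJK Extension B outside it):

```python
samples = [
    ("LATIN A", "A"),
    ("CJK U+4E2D (BMP)", "\u4e2d"),
    ("CJK U+20000 (Extension B)", "\U00020000"),
]

sizes = {}
for label, ch in samples:
    # utf-16-le is used so the 2-byte byte-order mark is not counted.
    sizes[label] = (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    print("%-28s utf-8: %d  utf-16: %d" % ((label,) + sizes[label]))
```

For these samples, the BMP ideograph costs 3 bytes in UTF-8 and 2 in UTF-16, while the Extension B one costs 4 bytes in both.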

"American programmers: In your day to day grind, it's superfluous to put a 'u'
in front of every single string."

American programmers _who aren't morons_: Use 'u', or the first time somebody
tries to run an accent through your code, it'll come out looking like line
noise.
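
For anyone who hasn't seen the "line noise" in person, here is a small sketch of the general failure. It assumes Python 3, where literals are already Unicode, so it simulates the bug by decoding UTF-8 bytes with the wrong codec rather than reproducing the exact Python 2 code path; the variable names are mine:

```python
text = "caf\u00e9"                  # 'café' as Unicode text
utf8_bytes = text.encode("utf-8")   # b'caf\xc3\xa9', the accent is two bytes

# Wrong: treat every byte as one Latin-1 character.
garbled = utf8_bytes.decode("latin-1")

# Right: decode with the encoding the bytes were actually written in.
restored = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(repr(restored))
```

The garbled version comes out one character longer than the original, because the two bytes of the accented letter are read as two separate characters.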

~~~
nas
In Python 2.6 you can use "from __future__ import unicode_literals". After
that, use b'...' to get a str() instance instead of a unicode() object.
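
A minimal sketch of what that looks like in a module. The variable names are made up for illustration; the import has to be the first statement in the file, and on Python 3 it is accepted but does nothing, since plain literals are already Unicode there:

```python
from __future__ import unicode_literals

plain = "caf\u00e9"     # unprefixed literal: Unicode text (unicode on 2.x, str on 3.x)
raw = b"caf\xc3\xa9"    # b'' literal: a byte string (str on 2.x, bytes on 3.x)

print(type(plain).__name__, len(plain))
print(type(raw).__name__, len(raw))

# The two only compare equal after an explicit decode.
same = raw.decode("utf-8") == plain
```

The text has four characters and the byte string has five bytes, which is a handy reminder that the two types count different things.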

~~~
mshafrir
Some gotchas:
<http://stackoverflow.com/questions/809796/any-gotchas-using-unicode-literals-in-python-2-6>

------
qw

      Lobstertech wrote:
      > American programmers: In your day to day grind,
      > it's superfluous to put a 'u' in front of every single
      > string.
    
      Good idea, who cares about internationalization? You can
      always just pay someone in India to go over all of your
      code the day you notice the rest of the world.
    
      Regards,
      European developer
    
      (... who doesn't want more competition)

------
s-phi-nl
A good tutorial on Python Unicode is
<http://diveintopython3.org/strings.html>. It's also my favorite explanation
of Unicode in general.

------
leecho0
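Bonus tip:

Don't forget to add this on the first or second line of your source file, so
Python knows how the file itself is encoded: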

    
    
      # -*- coding: utf-8 -*-
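
In context, that looks something like the sketch below. The `greeting` variable is just an illustration. Python only honors the declaration on line 1 or 2 of the file (PEP 263), and Python 3 already assumes UTF-8 source, so the comment mostly matters on Python 2:

```python
# -*- coding: utf-8 -*-
# With the declaration above, non-ASCII characters may appear in literals.
greeting = u"héllo"

# The literal holds five characters; encoded as UTF-8 it is six bytes.
print(len(greeting), len(greeting.encode("utf-8")))
```

The file must of course actually be saved as UTF-8 for the declaration to be true.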
    

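Also, if you're using vim, make sure both 'encoding' and 'fileencoding' are
set correctly. They're different options: 'encoding' is what vim uses
internally, and 'fileencoding' is the encoding of the file being edited: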

    
    
      set encoding=utf-8
      set fileencoding=utf-8

~~~
baq
> set encoding=utf-8

This will cause problems on Windows.

