> Conceptually, the strings are sequences of Unicode characters, and it helps to think of them that way.
They're not. In Python 3.3+ a string is a sequence of Unicode code points; in 3.0-3.2 it's a sequence of either 16-bit or 32-bit Unicode code units, depending on how the interpreter was built. That distinction is important to make. (Hint: on 16-bit builds, `re.compile("[\U00010000-\U0010FFFF]")` doesn't create a regexp that matches what you think it does!)
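To make the distinction concrete, here's a rough sketch, assuming Python 3.3 or later. The variable names are just for illustration. It shows that `len()` counts code points rather than user-perceived characters, and that the same string takes two code units in UTF-16, which is roughly how a 16-bit build would have stored it.

```python
import re
import unicodedata

# One code point outside the Basic Multilingual Plane (U+1F600).
astral = "\U0001F600"
code_points = len(astral)  # 1 on Python 3.3+

# The same string as UTF-16 code units (2 bytes each): a surrogate pair.
utf16_units = len(astral.encode("utf-16-le")) // 2  # 2

# Code points are not "characters" either: 'e' + COMBINING ACUTE ACCENT
# displays as a single character but is two code points.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)  # "\u00e9"

# On 3.3+ this class really is "any code point above the BMP".
astral_re = re.compile("[\U00010000-\U0010FFFF]")
matches_astral = astral_re.fullmatch(astral) is not None
matches_ascii = astral_re.search("abc") is not None

print(code_points, utf16_units, len(decomposed), len(composed))
print(matches_astral, matches_ascii)
```

On a 16-bit build of 3.0-3.2 each `\U` escape in that pattern becomes a surrogate pair, so the character class is built from individual surrogates and no longer describes a range of single code points.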