
Needless pedantry:

> Conceptually, the strings are sequences of Unicode characters, and it helps to think of them that way.

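They're not: they're a sequence of Unicode code points in Python 3.3+, and a sequence of either 16-bit or 32-bit code units in 3.0-3.2 (16-bit UTF-16 on "narrow" builds, 32-bit UCS-4 on "wide" builds, depending on how the interpreter was compiled). That distinction is important to make. (Hint: `re.compile("[\U00010000-\U0010FFFF]")` doesn't do what you think it does on narrow 16-bit builds, because each astral endpoint is stored as a surrogate pair!)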
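A quick sketch of the difference, meant to be run on Python 3.3+. The helper `to_surrogates` is hypothetical, written here only to mimic how a narrow build stored astral characters. It simulates the narrow-build view of a string and is not the old interpreter itself.

```python
import re

# U+1F600 GRINNING FACE lies outside the Basic Multilingual Plane.
s = "\U0001F600"

# Python 3.3+ (PEP 393): a str is a sequence of code points.
print(len(s))  # 1

# In UTF-16 the same character takes two 16-bit code units.
print(len(s.encode("utf-16-le")) // 2)  # 2


def to_surrogates(text: str) -> str:
    """Hypothetical helper: expand each astral code point into its UTF-16
    surrogate pair, mimicking how a narrow 3.0-3.2 build stored strings."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            high, low = divmod(cp - 0x10000, 0x400)
            out.append(chr(0xD800 + high) + chr(0xDC00 + low))
        else:
            out.append(ch)
    return "".join(out)


print([hex(ord(c)) for c in to_surrogates(s)])  # ['0xd83d', '0xde00']

# On 3.3+ the astral character class matches one code point, as intended.
astral = re.compile("[\U00010000-\U0010FFFF]")
print(bool(astral.fullmatch(s)))  # True

# The simulated narrow-build pattern is "[\ud800\udc00-\udbff\udfff]":
# the class is built from surrogates, and its middle "range" runs backwards.
narrow_pattern = to_surrogates("[\U00010000-\U0010FFFF]")
try:
    re.compile(narrow_pattern)
    narrow_ok = True
except re.error as exc:
    narrow_ok = False
    # ascii() escapes the lone surrogates so printing can't fail on encoding.
    print(ascii(exc.msg))  # a "bad character range" error
```

The same pattern that works on 3.3+ turns into a character class over surrogate code units once astral characters are split into pairs. In this simulation, that class is rejected outright instead of matching astral characters.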



