
Sure, but if you are using a language without Unicode support, and you don't use a dedicated library (which would already be using a kind of lexer, wouldn't it?), you also have to parse these Unicode characters yourself.

A unicode_next_character function is very simple to write, whether or not your language supports Unicode.

I usually write it as a small (256-byte) lookup table where each entry tells you how many bytes to skip to reach the next character. Without a lookup table, it's four single-line `if` statements; with one, it's a one-liner, plus however many lines the table takes.
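For anyone who wants to see it spelled out, here is a rough sketch of both variants in C. It assumes the text is UTF-8 (the comment above doesn't name an encoding), and the names `utf8_seq_len`, `utf8_skip`, and `utf8_init_skip_table` are made up for illustration. It only looks at the lead byte and does no validation, so treat it as a starting point rather than a finished decoder.

```c
#include <stddef.h>

/* Byte length of a UTF-8 sequence, judged from its lead byte alone.
   No validation: stray continuation bytes (0x80-0xBF) and other
   unexpected bytes return 1 so a scanner always makes progress. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;                             /* continuation or invalid byte */
}

/* The same information as a 256-byte table, one entry per lead byte.
   Filled at startup here; it could just as well be written out as a
   static initializer. */
static unsigned char utf8_skip[256];

static void utf8_init_skip_table(void)
{
    for (int b = 0; b < 256; b++)
        utf8_skip[b] = (unsigned char)utf8_seq_len((unsigned char)b);
}

/* Table-driven version. Requires p < end. Returns a pointer to the
   start of the next character, clamped so it never runs past end
   when the input is truncated mid-sequence. */
static const char *unicode_next_character(const char *p, const char *end)
{
    size_t n = utf8_skip[(unsigned char)*p];
    return ((size_t)(end - p) < n) ? end : p + n;
}
```

Call `utf8_init_skip_table()` once, then loop with `p = unicode_next_character(p, end)` until `p == end`. Returning 1 for unexpected bytes is a design choice made here so that malformed input cannot stall the loop; a stricter version would report an error instead.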

There's free source code for basic Unicode operations all over the internet.
