

Unicode Alphabet Detection for Python - AstroChimpHam
https://github.com/EliFinkelshteyn/alphabet-detector

======
miketuritzin
Does something like this exist for Javascript? In particular I am looking for
a function that tests whether a character is CJK or not.

~~~
nness
CJK characters are stored in the ranges 3400-4DBF and 3400-4DB5 (could be
others, but those are what Google comes up with first).

You can use JavaScript regular expressions to check for Unicode ranges. The
challenge is that JavaScript uses UCS-2 internally, and ECMAScript 5 regular
expressions don't handle surrogate pairs.

Thankfully, there's
[https://mths.be/regenerate/](https://mths.be/regenerate/).

~~~
sanxiyn
3400-4DBF is CJK Unified Ideographs Extension A. The main block is 4E00-9FFF
CJK Unified Ideographs.

See
[http://en.wikipedia.org/wiki/Unicode_block](http://en.wikipedia.org/wiki/Unicode_block)
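
The two blocks above translate into a plain code point range check in any
language. Here is a minimal sketch in Python (the language of the linked
library), assuming only those two blocks matter; the names `CJK_RANGES` and
`is_cjk_ideograph` are made up for illustration, and the other CJK extension
blocks are deliberately left out:

```python
# Code point ranges named in this thread (not an exhaustive CJK list).
CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (main block)
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
)


def is_cjk_ideograph(ch: str) -> bool:
    """Return True if the single character ch falls in one of CJK_RANGES."""
    cp = ord(ch)  # raises TypeError if ch is not exactly one character
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)
```

Both blocks sit in the Basic Multilingual Plane, so the same comparison would
work on UTF-16 code units without any surrogate pair handling.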

------
nness
Looks like this is a nice abstraction of Python's Unicode Database,
"unicodedata."

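For anyone curious what the raw `unicodedata` module gives you, here is a
rough sketch that reads the first word of a character's Unicode name. This is
only an illustration of the standard library call, not necessarily how
alphabet-detector works internally; `script_prefix` is a hypothetical helper
name:

```python
import unicodedata


def script_prefix(ch: str) -> str:
    """Return the first word of ch's Unicode name, or "" if it has no name."""
    # unicodedata.name() accepts a default that is returned for unnamed
    # characters (e.g. control characters) instead of raising ValueError.
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else ""
```

The first word is often a script name such as LATIN or CYRILLIC, but not
always, which is the kind of corner case a dedicated library has to handle.
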
~~~
AstroChimpHam
Yup. It does all of the stuff I had trouble finding and figuring out by
myself. I'm sure more annoying corner cases will come up later on (as they
always do with alphabets and Unicode), and I'll keep adding to it.

