The library we have published is first and foremost a finite-state machine manipulation library, developed to support linguistic applications (large numbers of entries, Unicode, compile once / use many times, etc.). Tokenization is one of its applications; we needed to start somewhere. In our Deep Learning era not everything we have created is relevant, but tokenization is. What we might add to the project: BPE or a BPE variant, support for East Asian languages, and multi-word expressions ("New York" as one token).
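(If BPE is unfamiliar: it builds a token vocabulary by repeatedly merging the most frequent adjacent pair of symbols in a corpus. Below is a minimal, generic Python sketch of that merge loop; it is only an illustration of the idea, not code from our library, and the helper names are made up.)

    from collections import Counter

    def pair_counts(words):
        # Count adjacent symbol pairs; `words` maps a symbol tuple to its frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return pairs

    def merge_pair(words, pair):
        # Replace every occurrence of `pair` with one merged symbol.
        a, b = pair
        merged = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                    out.append(a + b)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += freq
        return merged

    # Toy corpus: each word split into characters, with word frequencies.
    words = Counter({tuple("lower"): 2, tuple("lowest"): 1})
    for _ in range(3):                      # learn 3 merges
        counts = pair_counts(words)
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        words = merge_pair(words, best)
        print("merged", best, "->", dict(words))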
fefe23, sorry we did not put all this information into the GitHub readme; we will add more documentation to the doc folder soon, and I hope it will answer some of your questions. To specifically answer your questions:
Regular expressions follow roughly the early POSIX standard... they lack many of the features that NFA-based regular expression engines, like the ones in C#/Python or PCRE, have.
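For instance, lookahead and non-greedy quantifiers are routine in Python/PCRE-style engines but have no counterpart in early POSIX regular expressions:

    import re

    # Lookahead: match "foo" only when "bar" follows, without consuming it.
    print(re.search(r"foo(?=bar)", "foobar").group())   # -> foo
    # Non-greedy quantifier: shortest match instead of longest.
    print(re.search(r"<.+?>", "<a><b>").group())        # -> <a>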
Machines are easy to create, but right now this is all done via command-line tools, so if you want to create them programmatically you will have to write that code yourself.
It does not have a JIT.
Machines operate on ints (int32); input weight maps and variable-length coding are used in places.
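(Variable-length coding here means the general varint-style technique: small, frequent values take fewer bytes than a full int32. A generic Python sketch of the idea follows; this is not our library's actual on-disk format.)

    def encode_varint(n: int) -> bytes:
        # 7 payload bits per byte; the high bit means "more bytes follow".
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    def decode_varint(data: bytes) -> int:
        # Reassemble 7-bit groups, least significant group first.
        result, shift = 0, 0
        for byte in data:
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        return result

    assert decode_varint(encode_varint(300)) == 300
    assert len(encode_varint(5)) == 1   # 1 byte instead of 4 for a small int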