I don't understand how you can say the code size has nothing to do with GLR and then explain that it is due to the formatting of the parse tables. The point is that table-driven parsers are almost always huge compared to their recursive descent alternatives (and, yes, recursive descent has its own problems). The size of the final binary is not all that matters, because the huge generated C files cause insanely slow compile times that make development incredibly painful. So that statement only holds if the time of grammar writers doesn't matter.
I was too vague on unicode. I meant unicode character class matching. Last I checked, tree-sitter seemed to use a binary search on the code point to check whether a character was in a unicode character class like \p{Ll}, rather than a table lookup. This led to a noticeable constant slowdown at runtime, roughly 3x if I recall correctly, for the grammar I was working on compared to an ASCII set like [a-z].
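To make the difference concrete, here's a minimal C sketch of the two approaches. This is not tree-sitter's actual generated code, and the range table below is only a tiny illustrative slice of \p{Ll}, but it shows why the character class check costs more per character than the ASCII set:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative excerpt of sorted, inclusive code point ranges for \p{Ll}.
   The real table would have hundreds of entries. */
typedef struct { uint32_t lo, hi; } Range;

static const Range LL_RANGES[] = {
  {0x0061, 0x007A},  /* a-z */
  {0x00DF, 0x00F6},  /* ß-ö */
  {0x00F8, 0x00FF},  /* ø-ÿ */
  /* ... */
};

/* Binary search over the ranges: O(log n) branchy comparisons
   on every single character the lexer consumes. */
static bool is_ll_binary_search(uint32_t cp) {
  size_t lo = 0, hi = sizeof(LL_RANGES) / sizeof(LL_RANGES[0]);
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (cp < LL_RANGES[mid].lo)      hi = mid;
    else if (cp > LL_RANGES[mid].hi) lo = mid + 1;
    else return true;
  }
  return false;
}

/* By contrast, an ASCII set like [a-z] compiles down to a single
   range comparison. */
static bool is_ascii_lower(uint32_t cp) {
  return cp >= 'a' && cp <= 'z';
}
```

A flat lookup table (or a two-stage trie indexed by code point) would turn the O(log n) search into an O(1) load, which is consistent with the kind of constant-factor gap I was seeing.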