They are also very similar (basically the Go version was a direct port of the Rust version) so the performance should be very comparable.
Sure, but different approaches are going to be more optimal for different languages.
I assume by zero-copy you mean that identifiers in the AST are slices of the input file instead of copies?
Yes. From the README:
zero-copy: if a parser returns a subset of its input data, it will return a slice of that input, without copying
Geal also makes claims that nom is faster than hand-written C parsers.
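To make the zero-copy point concrete, here is a minimal hand-written sketch (not nom's actual API) of a scanner that returns the identifier as a slice of its input. The names `is_ident_char` and `take_identifier` are hypothetical, and the identifier rule is deliberately simplified to ASCII letters, digits, `_` and `$`. The return order (remaining input, then the matched slice) only imitates the convention used by parser combinators.

```rust
/// Simplified identifier rule (an assumption for illustration, ASCII only):
/// the first character must be a letter, `_` or `$`; later ones may also be digits.
fn is_ident_char(index: usize, c: char) -> bool {
    c == '_'
        || c == '$'
        || if index == 0 {
            c.is_ascii_alphabetic()
        } else {
            c.is_ascii_alphanumeric()
        }
}

/// Hypothetical scanner: returns `(rest, identifier)`, both borrowed from `input`.
/// Nothing is copied; the identifier is just a view into the original buffer.
fn take_identifier(input: &str) -> Option<(&str, &str)> {
    // Byte offset of the first character that cannot be part of the identifier.
    let end = input
        .char_indices()
        .find(|&(i, c)| !is_ident_char(i, c))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    if end == 0 {
        return None; // no identifier at the start of the input
    }
    Some((&input[end..], &input[..end]))
}
```

Since both returned slices borrow from `input`, the compiler ties their lifetimes to the source buffer, so the AST cannot outlive the file contents.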
It's somewhat complicated because some JavaScript identifiers can technically have escape sequences (e.g. "\u0061bc" is the identifier "abc"), which require dynamic memory allocation anyway.
Nom comes with 'escaped' and 'escaped_transform' combinators. In theory it should be possible, with relative ease, to return a slice if there are no escape characters and an allocated string if expansion is required. Presumably you'd have to use a Cow<str> though.
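Roughly what I have in mind with Cow<str>, as a hand-rolled sketch rather than anything built on nom's combinators: borrow the input when there is no backslash, and allocate only when an escape has to be expanded. The function name `decode_identifier` is hypothetical, and it only handles the four-digit `\uXXXX` form (no `\u{...}`, no surrogate pairs), which is an assumption made to keep the example short.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decode `\uXXXX` escapes in an identifier.
/// Returns `Cow::Borrowed` (no allocation) when the input has no escapes,
/// `Cow::Owned` when an escape had to be expanded, and `None` on a malformed escape.
fn decode_identifier(raw: &str) -> Option<Cow<'_, str>> {
    if !raw.contains('\\') {
        return Some(Cow::Borrowed(raw)); // fast path: plain slice of the input
    }
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(pos) = rest.find('\\') {
        out.push_str(&rest[..pos]);
        // Expect `\u` followed by exactly four hex digits.
        let hex = rest.get(pos + 2..pos + 6)?;
        if !rest[pos + 1..].starts_with('u') || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        let code = u32::from_str_radix(hex, 16).ok()?;
        // `char::from_u32` rejects surrogates, so those yield `None` in this sketch.
        out.push(char::from_u32(code)?);
        rest = &rest[pos + 6..];
    }
    out.push_str(rest);
    Some(Cow::Owned(out))
}
```

With this, `decode_identifier("abc")` stays a borrow of the input, while `decode_identifier("\\u0061bc")` allocates a new string containing "abc".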
Note that strings aren't slices of the input file because JavaScript strings are UTF-16, not UTF-8, and can have unpaired surrogates. So I represent string contents as arrays of 16-bit integers instead of 8-bit slices (in both Go and Rust).
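For anyone wondering why a plain Rust `String` doesn't work here, a small sketch: a lone surrogate is a valid JavaScript string value but not valid UTF-8, so it only fits in a buffer of 16-bit code units. The `JsString` type and its method names below are hypothetical and not taken from either parser.

```rust
/// Hypothetical representation of JavaScript string contents as UTF-16 code units.
#[derive(Debug, Clone, PartialEq, Eq)]
struct JsString(Vec<u16>);

impl JsString {
    /// Build from ordinary Rust (UTF-8) text by re-encoding it as UTF-16.
    fn from_utf8_str(s: &str) -> Self {
        JsString(s.encode_utf16().collect())
    }

    /// Convert back to a Rust `String`; fails if there is an unpaired surrogate.
    fn to_string_strict(&self) -> Option<String> {
        String::from_utf16(&self.0).ok()
    }

    /// Convert back, replacing unpaired surrogates with U+FFFD.
    fn to_string_lossy(&self) -> String {
        String::from_utf16_lossy(&self.0)
    }
}
```

A value such as `JsString(vec![0x0061, 0xD800])` (the JavaScript string "a\uD800") round-trips through the `Vec<u16>` untouched, but `to_string_strict` returns `None` for it, which is the case a UTF-8 slice of the input cannot represent.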
Of course it is. My opinion (which is worth what you've paid for it) is that I'd just go for UTF-8 support. I can't remember the last time I've seen UTF-16 in the wild (thankfully).
Performance-wise, the other thing I'd keep in mind with Rust is that string handling is painfully slow in debug builds, so benchmark with a release build.
Edit: here's the URL for nom: https://github.com/Geal/nom