Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Nice shortcut if you know the input is correct, but seems like it might be prone to failure with carefully constructed input strings. If I can find an input token that produces the same hash as a normal keyword, a program using this might misbehave in "interesting" fashions.


I think all of these techniques check whether the input string is correct. For example see here https://github.com/WojciechMula/toys/blob/master/lookup-in-s...


If I was parsing the input, then parse-time is the time to assign a token-id (I'd use an enum). Then you have a unique identifier and wouldn't need to use the fancy hash function. It's a nice function, takes good advantage of the instruction set, but seems brittle or redundant.


When parsing, you probably use a hash table from tokens to token IDs, which is where you'd use the fancy hash function.


This is exactly what Postgres does in its SQL parser.

The key question here is whether your lexing code handles each keyword in a separate path or uses a single path for anything that looks like a keyword or identifier. Most handwritten lexing code does latter, but if you are generating code with something like re2c or a parsing expression grammar, you may be doing the prior.

In the one-path-per-keyword case, you just assign the right ID in the right code path - no need for a hash table or unique hash function. The downside is that this makes it more difficult to emit generic error messages like "blah is not a valid keyword" or to have context-sensitive keywords where you might fall back to interpreting a keyword as an identifier.


When I learned to use them, it was common advice for users of lexer generators to use a single-path and disambiguate later. If you use separate paths for each keyword, you get potentially very complex regular expressions that make a branchy mess and can't compete with [a-zA-Z_]+ and a hash table (even without perfect hashing).


How would you convert the string to the enum?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: