Hacker News new | comments | show | ask | jobs | submit login

I think I've done that already. My first program parsed a network protocol. http://botdom.com/wiki/Dante (The code is nasty but it works)

So research how to make it less nasty. Check out the Lexer Hack article on wikipedia.

It also helps to familiarize yourself with classical pathological syntax issues for languages like ALGOL-68 (requires 'semantic actions' to parse correctly) and early versions of FORTRAN (identifiers could have intermediate whitespace, which meant that a typo could corrupt the meaning of the program) and figure out how to do parse a language with an Offside Rule (and not all offside rules are created equal; Haskell 98 is very complex). Then look at techniques like scannerless/lexerless parsing (SGLR, PEG, etc.), combinatorial parsing, etc. Understand what mathematical properties of a parser algorithm and a parser generator you should look for to ease your job. Understand why naive user of a parser generator tends to result in slower lexing/parsing than hand-coded ones. Understand why for some applications we might be able to not care. Understand how to embed good error messages in a grammar-based approach vs. hand-coded solution. Understand the notion of unparsers, which, IIRC, are not covered in any of the Dragon Book at all.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact