I'm in the same boat. I have a degree but I don't know anything about compilers. I'm convinced that being a remotely intelligent programmer means you are going to be feeling stupid most of the time.

Yup, which is why if you want to be a good programmer, you have to commit yourself to constant tinkering and learning.

One of the most useful things you could do at this stage is go write a lexer to split some input file up into lexemes / tokens.

As an example, think of splitting a CSV file into numbers, strings, separators and newlines. If you get this right, think how easy it would then be to parse the data into a simple data structure and generate SQL to load it into a db.
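A minimal sketch of that exercise in Python, assuming a simple CSV dialect: double-quoted strings with `""` as an escaped quote, unquoted text treated as a string, and no whitespace around fields. The names `lex`, `parse`, `to_sql` and the `people` table are made up for illustration. The SQL is built by quote-doubling only to show the idea; real code should use the database driver's parameterized queries.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional, Union

class Token(NamedTuple):
    kind: str   # "NUMBER", "STRING", "SEP" or "NEWLINE"
    value: str

# Alternatives are tried in order at each position; the first that matches wins.
TOKEN_RE = re.compile(r'''
      (?P<NUMBER>-?\d+(?:\.\d+)?(?=[,\r\n]|$))  # number only if the field ends here
    | (?P<STRING>"(?:[^"]|"")*")                # quoted string, "" escapes a quote
    | (?P<BARE>[^,"\r\n]+)                      # unquoted text, treated as a string
    | (?P<SEP>,)
    | (?P<NEWLINE>\r?\n)
''', re.VERBOSE)

def lex(text: str) -> Iterator[Token]:
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        if kind == "STRING":
            value = value[1:-1].replace('""', '"')  # strip the quotes, unescape
        elif kind == "BARE":
            kind = "STRING"
        yield Token(kind, value)
        pos = m.end()

Cell = Optional[Union[int, float, str]]  # None stands for an empty field

def parse(tokens: Iterable[Token]) -> List[List[Cell]]:
    rows: List[List[Cell]] = []
    row: List[Cell] = []
    have_value = False  # has the current field been filled yet?
    # A sentinel NEWLINE flushes a final row that lacks a trailing newline.
    for tok in [*tokens, Token("NEWLINE", "\n")]:
        if tok.kind in ("NUMBER", "STRING"):
            if have_value:
                raise SyntaxError(f"missing separator before {tok.value!r}")
            if tok.kind == "NUMBER":
                row.append(float(tok.value) if "." in tok.value else int(tok.value))
            else:
                row.append(tok.value)
            have_value = True
        elif tok.kind == "SEP":
            if not have_value:
                row.append(None)  # empty field, e.g. "a,,b"
            have_value = False
        else:  # NEWLINE ends a record
            if row and not have_value:
                row.append(None)  # trailing empty field, e.g. "a,b,"
            if row:
                rows.append(row)  # blank lines produce no row
            row, have_value = [], False
    return rows

def sql_literal(v: Cell) -> str:
    if v is None:
        return "NULL"
    if isinstance(v, (int, float)):
        return repr(v)
    return "'" + v.replace("'", "''") + "'"  # standard SQL quote doubling

def to_sql(table: str, rows: List[List[Cell]]) -> List[str]:
    # `table` is interpolated as-is, so it must be a trusted identifier.
    return [f"INSERT INTO {table} VALUES ({', '.join(sql_literal(v) for v in row)});"
            for row in rows]

if __name__ == "__main__":
    csv_text = '1,"O\'Brien, Pat",3.5\n2,Ann,\n'
    for stmt in to_sql("people", parse(lex(csv_text))):
        print(stmt)
    # INSERT INTO people VALUES (1, 'O''Brien, Pat', 3.5);
    # INSERT INTO people VALUES (2, 'Ann', NULL);
```

Note how little the parser has to do once the lexer has already decided what is a number, what is a string and where the quotes end.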

Right there, in that one lexing module, you've probably taken the single largest step towards accomplishing the tasks that Steve Yegge talks about.

Of course there are tools to generate lexers, but if you're learning something, not much beats getting your hands dirty with real code.

I think I've done that already. My first program parsed a network protocol. http://botdom.com/wiki/Dante (The code is nasty but it works)

So research how to make it less nasty. Check out the "lexer hack" article on Wikipedia.
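The lexer hack exists because in C a statement like `a * b;` is either a declaration of a pointer `b` or a multiplication, depending on whether `a` was declared with `typedef`. The lexer resolves this by consulting the parser's symbol table when it produces an identifier. Below is a toy sketch of that feedback loop in Python; the tiny "grammar" (classifying `;`-terminated statements) and all names are hypothetical. Real C is messier: the parser's lookahead token may already have been lexed before a typedef is recorded, and a typedef name can be shadowed by an ordinary identifier in an inner scope.

```python
import re
from typing import Iterator, List, NamedTuple, Set

class Tok(NamedTuple):
    kind: str  # "KEYWORD", "TYPE_NAME", "IDENT", "NUMBER" or "PUNCT"
    text: str

KEYWORDS = {"typedef", "int", "char"}
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(\S))")

def lex(src: str, typedef_names: Set[str]) -> Iterator[Tok]:
    """Lazily tokenize; identifiers are classified against the *live* typedef table."""
    pos = 0
    while (m := TOKEN_RE.match(src, pos)) is not None:
        word, number, punct = m.groups()
        pos = m.end()
        if word is not None:
            if word in KEYWORDS:
                yield Tok("KEYWORD", word)
            elif word in typedef_names:  # the "hack": the lexer reads parser state
                yield Tok("TYPE_NAME", word)
            else:
                yield Tok("IDENT", word)
        elif number is not None:
            yield Tok("NUMBER", number)
        else:
            yield Tok("PUNCT", punct)

def classify(src: str) -> List[str]:
    """Toy 'parser': decides what each ';'-terminated statement is."""
    typedef_names: Set[str] = set()
    results: List[str] = []
    stmt: List[Tok] = []
    for tok in lex(src, typedef_names):
        if tok.text != ";":
            stmt.append(tok)
            continue
        if stmt and stmt[0].text == "typedef":
            typedef_names.add(stmt[-1].text)  # feedback from parser to lexer
            results.append(f"typedef {stmt[-1].text}")
        elif stmt and stmt[0].kind in ("KEYWORD", "TYPE_NAME"):
            results.append(f"declaration of {stmt[-1].text}")
        else:
            results.append("expression: " + " ".join(t.text for t in stmt))
        stmt = []
    return results

if __name__ == "__main__":
    print(classify("typedef int T; T * x; a * b;"))
    # ['typedef T', 'declaration of x', 'expression: a * b']
```

Because `lex` is a generator, each identifier is classified only when the parser asks for it, so the `T` in `T * x;` is seen after the typedef has been recorded.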

It also helps to familiarize yourself with classic pathological syntax issues in languages like ALGOL-68 (which requires 'semantic actions' to parse correctly) and early versions of FORTRAN (where blanks were insignificant, so identifiers could contain embedded whitespace and a simple typo could silently change the meaning of a program), and figure out how to parse a language with an offside rule (and not all offside rules are created equal; Haskell 98's is very complex). Then look at techniques like scannerless/lexerless parsing (SGLR, PEG, etc.), combinator parsing, and so on. Understand which mathematical properties of a parsing algorithm and a parser generator you should look for to make your job easier. Understand why naive use of a parser generator tends to result in slower lexing/parsing than a hand-coded solution, and why for some applications we might not care. Understand how to produce good error messages in a grammar-based approach vs. a hand-coded one. Understand the notion of unparsers, which, IIRC, are not covered anywhere in the Dragon Book.
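A common way to handle an offside rule is to have the lexer turn changes in indentation into explicit INDENT/DEDENT tokens using a stack of open indentation levels, which is roughly what Python's tokenizer does; the parser then sees ordinary block delimiters. A minimal sketch, assuming spaces-only indentation and one logical line per physical line. This is the simple Python-style rule, not Haskell 98's layout algorithm, which also has to cope with explicit braces and its parse-error(t) side condition. `layout_tokens` is a hypothetical name.

```python
from typing import Iterator, Tuple

def layout_tokens(src: str) -> Iterator[Tuple[str, str]]:
    """Turn indentation into explicit INDENT/DEDENT tokens (Python-style offside rule)."""
    stack = [0]  # indentation widths of the currently open blocks
    for lineno, line in enumerate(src.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped.strip():  # blank lines don't affect layout
            continue
        width = len(line) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", "")
        else:
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", "")
            if width != stack[-1]:
                raise IndentationError(f"line {lineno}: dedent to unknown level {width}")
        yield ("LINE", stripped)
    while len(stack) > 1:  # close any blocks still open at end of input
        stack.pop()
        yield ("DEDENT", "")

if __name__ == "__main__":
    src = "if x:\n    a\n    if y:\n        b\nc\n"
    print([kind for kind, _ in layout_tokens(src)])
    # ['LINE', 'INDENT', 'LINE', 'LINE', 'INDENT', 'LINE', 'DEDENT', 'DEDENT', 'LINE']
```

The stack is what lets a single dedent close several blocks at once, and it also catches a dedent to a column that no enclosing block started at.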

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact