Hacker News
Ask HN: Which one is better for writing a SQL parser?
10 points by liuliu on May 30, 2009 | 10 comments
It seems that there are two choices: Lemon Parser Generator and YACC.

PostgreSQL uses YACC and SQLite uses Lemon.

My use case is a multithreaded, embedded parser for a much simpler SQL syntax (a SQL subset).

Which one is better for this?




Can you just snag one of those parsers? It depends on just how "sub" your "subset" is, but for something like this, there's nothing like a snatch-and-grab if you can get away with it. (I don't recall SQLite's licence, but PostgreSQL's is very unlikely to be a problem for you.)

Even if it gives you something a bit too elaborate, it can be easier to filter your way back down to what you are looking for than to rewrite something up from the bottom. (One easy strategy is to just scream "SYNTAX ERROR!" when you see something that the grammar knows, but you don't understand. Since you'll be attaching your own object model/structs to the parser anyhow, this is virtually trivial.)
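A minimal sketch of that filtering step in C, assuming the borrowed grammar's actions fill in a small statement record as they reduce. The names stmt, stmt_kind, the FEAT_* flags and check_subset() are hypothetical illustrations, not anything from the PostgreSQL or SQLite sources:

```c
#include <stddef.h>

/* Hypothetical statement kinds a borrowed, full-SQL grammar might produce. */
typedef enum {
    STMT_SELECT, STMT_INSERT, STMT_UPDATE, STMT_DELETE, STMT_CREATE_TRIGGER
} stmt_kind;

/* Hypothetical feature flags set by the grammar actions while parsing. */
enum {
    FEAT_JOIN     = 1u << 0,
    FEAT_SUBQUERY = 1u << 1,
    FEAT_GROUP_BY = 1u << 2
};

typedef struct {
    stmt_kind kind;
    unsigned features; /* bitmask of FEAT_* seen in this statement */
} stmt;

/* Returns NULL if the statement is inside our subset, otherwise an error
   message: the grammar understood it, but we choose not to support it. */
const char *check_subset(const stmt *s) {
    switch (s->kind) {
    case STMT_SELECT:
        /* Plain SELECTs (including GROUP BY) are fine; joins and subqueries are not. */
        if (s->features & (FEAT_JOIN | FEAT_SUBQUERY))
            return "SYNTAX ERROR!";
        return NULL;
    case STMT_INSERT:
    case STMT_DELETE:
        return NULL;
    default:
        /* UPDATE, CREATE TRIGGER, ...: known to the grammar, not to us. */
        return "SYNTAX ERROR!";
    }
}
```

The point of the design is that the borrowed grammar stays untouched; narrowing the language happens in one small function you own.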


A few people (including me :-) ) have written SQL parsers in ANTLR. I'm quite satisfied with the result and have never heard anyone complain that ANTLR wasn't suitable for the task...


I'd second ANTLR. Get the book, too, if only for its educational value.

http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-...


Last time I wrote a parser, I used yacc/bison. It wasn't particularly difficult to use, although I never stress-tested the language I put together.

Anyway, the field of parser generators seems to have grown recently. I had never heard of ANTLR before, nor Lemon. Thanks for the pointers in this thread; I'll be sure to investigate them. There is also Parsec (for Haskell), and ocamllex and ocamlyacc (for OCaml). If you use C++, look into Boost.Spirit. I use tinyjson, which is built on Boost.Spirit, in production, and it works quite well.


I've used all of those at different times, and for simple stuff they will all work. I'd recommend picking whichever of YACC, Lemon or ANTLR builds easily on your platform and is the least displeasing to others you expect may need to trace through the code.

Oh and if for some reason you end up using flex somewhere, don't forget to specify the "-8" option to ensure it generates an 8-bit clean scanner.


One thing that may stop me from using YACC: can YACC generate a function like query_t parse(const char* str) and make the parsing thread-safe?


Recent YACCs, such as Bison, support pure (reentrant) parsers. Here's a reference: http://dinosaur.compilertools.net/bison/bison_6.html#SEC56
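What "pure" buys you is that no parser or scanner state lives in globals: in a Bison pure parser, yylval is passed to yylex through a pointer instead of being a global variable, and flex has a matching %option reentrant for the scanner side. The same discipline, sketched by hand in C under stated assumptions (token kinds, the lexer struct and lexer_next() are hypothetical names for illustration, not Bison or flex API):

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for a hypothetical SQL subset. */
typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_COMMA, TOK_STAR, TOK_ERROR } tok_kind;

typedef struct {
    tok_kind kind;
    const char *start; /* points into the caller's input; not NUL-terminated */
    size_t len;
} token;

/* All scanner state lives here, owned by the caller. There are no globals
   and no static locals, so threads scanning different strings share nothing. */
typedef struct {
    const char *p;
} lexer;

void lexer_init(lexer *lx, const char *input) { lx->p = input; }

/* Returns the next token and advances the caller's lexer. */
token lexer_next(lexer *lx) {
    token t = { TOK_EOF, NULL, 0 };
    while (isspace((unsigned char)*lx->p)) lx->p++;
    t.start = lx->p;
    if (*lx->p == '\0') return t;

    if (isalpha((unsigned char)*lx->p) || *lx->p == '_') {
        while (isalnum((unsigned char)*lx->p) || *lx->p == '_') lx->p++;
        t.kind = TOK_IDENT; /* keywords are classified later by the parser */
    } else if (isdigit((unsigned char)*lx->p)) {
        while (isdigit((unsigned char)*lx->p)) lx->p++;
        t.kind = TOK_NUMBER;
    } else if (*lx->p == ',') {
        lx->p++;
        t.kind = TOK_COMMA;
    } else if (*lx->p == '*') {
        lx->p++;
        t.kind = TOK_STAR;
    } else {
        lx->p++;
        t.kind = TOK_ERROR; /* unknown character */
    }
    t.len = (size_t)(lx->p - t.start);
    return t;
}
```

Each thread declares its own lexer on its stack, so concurrent calls on different strings never touch shared state.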


Make sure you check the licences: different licences apply to yacc- and Bison-generated parsers.


Hang on, is your SQL subset LL? If so, writing a top-down parser by hand is not too difficult, especially if you use flex for your scanner.
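A minimal recursive-descent sketch, assuming a subset of the form SELECT * | col[, col...] FROM table. It hand-rolls the scanning rather than using flex, and it exposes the query_t parse(const char* str) shape asked about earlier in the thread; the query_t layout, MAX_COLS, MAX_NAME and the helper functions are hypothetical. All parser state lives in a local struct, so the function is reentrant:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_COLS 16
#define MAX_NAME 64

/* Hypothetical result type for: SELECT ( '*' | col [, col]* ) FROM table [;] */
typedef struct {
    int ok;                        /* 1 on success, 0 on syntax error */
    int ncols;                     /* 0 means SELECT * */
    char cols[MAX_COLS][MAX_NAME];
    char table[MAX_NAME];
    const char *error;             /* static message when ok == 0 */
} query_t;

/* Parser state lives on the caller's stack: no globals, hence reentrant. */
typedef struct {
    const char *p;
} parser;

static void skip_ws(parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Consume the upper-case keyword kw, case-insensitively, as a whole word. */
static int accept_kw(parser *ps, const char *kw) {
    skip_ws(ps);
    size_t n = strlen(kw);
    for (size_t i = 0; i < n; i++)
        if (toupper((unsigned char)ps->p[i]) != kw[i]) return 0;
    if (isalnum((unsigned char)ps->p[n]) || ps->p[n] == '_') return 0;
    ps->p += n;
    return 1;
}

/* ident := [A-Za-z_][A-Za-z0-9_]*, copied into buf. Keywords are not
   reserved in this sketch; over-long names are treated as errors. */
static int ident(parser *ps, char *buf) {
    skip_ws(ps);
    const char *s = ps->p;
    if (!isalpha((unsigned char)*s) && *s != '_') return 0;
    while (isalnum((unsigned char)*ps->p) || *ps->p == '_') ps->p++;
    size_t len = (size_t)(ps->p - s);
    if (len >= MAX_NAME) return 0;
    memcpy(buf, s, len);
    buf[len] = '\0';
    return 1;
}

static query_t fail(const char *msg) {
    query_t q = {0};
    q.error = msg;
    return q;
}

query_t parse(const char *str) {
    parser ps = { str };
    query_t q = {0};

    if (!accept_kw(&ps, "SELECT")) return fail("expected SELECT");
    skip_ws(&ps);
    if (*ps.p == '*') {
        ps.p++;                    /* SELECT *: ncols stays 0 */
    } else {
        for (;;) {                 /* col [, col]* */
            if (q.ncols == MAX_COLS) return fail("too many columns");
            if (!ident(&ps, q.cols[q.ncols])) return fail("expected column name");
            q.ncols++;
            skip_ws(&ps);
            if (*ps.p != ',') break;
            ps.p++;                /* consume ',' and read the next column */
        }
    }
    if (!accept_kw(&ps, "FROM")) return fail("expected FROM");
    if (!ident(&ps, q.table)) return fail("expected table name");
    skip_ws(&ps);
    if (*ps.p == ';') ps.p++;
    skip_ws(&ps);
    if (*ps.p != '\0') return fail("unexpected trailing input");
    q.ok = 1;
    return q;
}
```

Each grammar rule maps to one function or loop, and a single character of lookahead decides every branch, which is what makes an LL grammar pleasant to write by hand. Returning query_t by value also keeps callers free of shared buffers.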


The LLVM Kaleidoscope example might be interesting to you.



