Hacker News
Show HN: ShivyC – Hobby C compiler created in Python (github.com)
120 points by ssarodia 4 months ago | 11 comments

The parser has a somewhat unusual structure:


It's been a while since I've had to use the phrase "exception-oriented programming", but it fits that code well. While that might be a useful or even necessary pattern to parse something like C++ which can have almost unbounded ambiguities, AFAIK C can be parsed solely by branching on the next token in the stream except for one tiny case (typedefs).

That's not an uncommon practice in Python based on its (relatively) fast handling of exceptions. Maybe you could call it "break things and move fast."

Using exceptions is totally common, but this seems to raise/catch/log 6 errors each time it tries to parse a `for` statement. I'm a noob when it comes to compilers, but maybe a token -> function dictionary would be a better approach?
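To make the "token -> function dictionary" idea concrete, here is a minimal sketch. It is not ShivyC's code: the token representation (plain strings), the handler names, and the return shape are all assumptions made up for illustration.

```python
# Hypothetical sketch: none of these names come from ShivyC.
# Tokens are plain strings; each handler returns (node_kind, next_index).

def parse_if(tokens, index):
    return ("if", index + 1)

def parse_while(tokens, index):
    return ("while", index + 1)

def parse_for(tokens, index):
    return ("for", index + 1)

def parse_expression_statement(tokens, index):
    return ("expr", index + 1)

# Map the leading keyword to the function that parses that statement.
DISPATCH = {
    "if": parse_if,
    "while": parse_while,
    "for": parse_for,
}

def parse_statement(tokens, index):
    # A single dictionary lookup picks the handler, so no failed
    # attempts (and no raised exceptions) happen before the right one runs.
    handler = DISPATCH.get(tokens[index], parse_expression_statement)
    return handler(tokens, index)
```

With this shape, a `for` statement costs one lookup, and anything that does not start with a known keyword falls through to the expression-statement handler.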

That does sound a bit extreme. (I freely admit I went by this description and never looked at the parser code myself, though some other portions I did read were pretty normal Python.)

Python's iterator protocol signals exhaustion by raising StopIteration (StopAsyncIteration for async iterators), and a generator is closed by raising GeneratorExit inside it. Exception handling is one of the core means of control flow in Python, along with function calls and if statements.
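As a small illustration of that protocol, the sketch below defines a hypothetical `Countdown` iterator and a `drain` helper that does by hand roughly what a `for` loop does: call `next()` until StopIteration is raised. Both names are invented for this example.

```python
class Countdown:
    """Hypothetical iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            # Exhaustion is reported by raising, not by a return value.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def drain(iterator):
    # Roughly what a for loop does: call next() until StopIteration.
    items = []
    while True:
        try:
            items.append(next(iterator))
        except StopIteration:
            break
    return items

result = drain(Countdown(3))
```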

Also, because Python lacks gotos and switch statements, "exception-oriented" code is a good way to implement the efficient finite state machines common in compiler code.

It looks like that's what it is doing: https://github.com/ShivamSarodia/ShivyC/blob/156a71bce7f340d...

The exceptions seem to be how it signals failure if the token doesn't match, but they aren't being used for unbounded lookahead.

Thanks for your feedback! :)

The main downside I see to branching on the next token within the parse_statement function itself is that doing so would split logic between the parse_* functions called by parse_statement (parse_if_statement, parse_while_statement, etc) and the parse_statement function itself.

For example, suppose parse_statement was modified to only call `parse_if_statement` if the next token was an `if`, thus avoiding the try-except pattern. Then, while parse_statement would be responsible for checking if the next token is `if`, parse_if_statement would be responsible for parsing the `(` token, the expression that follows, the closing `)`, and so on. I figured it was easier to keep the entire identity of an if-statement within the parse_if_statement function itself rather than splitting it between the parse_statement and parse_if_statement functions.
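A rough sketch of the try-except shape described here, for readers who have not opened the repo. This is not the actual ShivyC source: `ParseError`, `expect`, `parse_keyword_statement`, the string tokens, and the tuple return values are all assumptions, and only the function names parse_statement, parse_if_statement, and parse_while_statement come from the comment above.

```python
class ParseError(Exception):
    """Hypothetical error raised when a parse_* function does not match."""

def expect(tokens, index, kind):
    # Consume one token of the given kind or signal a mismatch.
    if index < len(tokens) and tokens[index] == kind:
        return index + 1
    raise ParseError(f"expected {kind!r} at position {index}")

def parse_keyword_statement(keyword, tokens, index):
    # Parses `keyword ( condition )`; a single token stands in for
    # a real expression parser.
    index = expect(tokens, index, keyword)
    index = expect(tokens, index, "(")
    if index >= len(tokens):
        raise ParseError("expected a condition")
    condition = tokens[index]
    index = expect(tokens, index + 1, ")")
    return (keyword, condition), index

def parse_if_statement(tokens, index):
    # The whole identity of an if-statement, keyword included, lives here.
    return parse_keyword_statement("if", tokens, index)

def parse_while_statement(tokens, index):
    return parse_keyword_statement("while", tokens, index)

def parse_statement(tokens, index):
    # Try each alternative in turn; a ParseError means "not this one".
    for attempt in (parse_if_statement, parse_while_statement):
        try:
            return attempt(tokens, index)
        except ParseError:
            continue
    raise ParseError(f"no statement matches at position {index}")
```

The trade-off is visible in parse_statement: it knows nothing about which keyword starts which statement, at the price of one raised exception per alternative that fails before the matching one.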

EAFP (easier to ask for forgiveness than permission) is a popular idiom in Pythonic code. It contrasts with the style more usual in languages like C++ or Java, where you check a condition first and then proceed (which can introduce race conditions in multithreaded code if done improperly).
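A tiny side-by-side of the two styles, using a dictionary lookup. The function names are made up for this example.

```python
def lookup_check_first(table, key, default=None):
    # Check first, then act. If another thread removes the key between
    # the test and the read, the read can still fail.
    if key in table:
        return table[key]
    return default

def lookup_eafp(table, key, default=None):
    # EAFP: just try the operation and handle the failure if it happens.
    try:
        return table[key]
    except KeyError:
        return default
```

For a single-threaded lookup the two behave the same; the difference only shows up when the state can change between the check and the use.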

So, is it an issue? If so, please report it to the developer on GitHub[0].

Or do you know how to fix it? Then just open a pull request[1] against 'master' ;-)

[0] https://github.com/ShivamSarodia/ShivyC/issues

[1] https://github.com/ShivamSarodia/ShivyC/pulls

Now, compile CPython with it and you're fully bootstrapped.

Cool!!!!!! Really cool! Did you try it?
