
Parsing S-expressions in C (2019) - azhenley
https://benpaulhanna.com/writing-a-game-in-c-parsing-s-expressions.html
======
codemonkey-zeta
Whoa this thread is way more harsh than I would have expected. Coding isn't
always about maximizing development speed or avoiding arbitrary dogmas. Some
things are just fun to code, and parsing S-expressions is one of those things.
I'm glad the author shared their experience and actually plans to use it in a
game. Better than writing the parser for fun and then throwing it away.

Also, the author admits they didn't go to University for CS. You know what you
do as part of your CS degree? You write a parser for S-expressions. This is a
curious and driven programmer learning CS and sharing their discoveries. No
need to berate them for not embedding a scheme....

~~~
dbtc
I think it is responding to a project as if it were a product.

~~~
platinumrad
Even if it were a product there is simply no justification for embedding a
second language when all you want to do is read some config files.

~~~
dang
There's plenty of justification; for example, because you felt like it.

On HN, people don't need to justify their interests or their pleasures. It is
the scolds who are at fault.

~~~
platinumrad
Sure, I was imprecise in my language. There is no justification for insisting
that one is misguided if they don't embed a full language, as many of the
early comments suggested.

------
platinumrad
Someone: Here is how I put a sink in my bathroom.

The geniuses of news dot ycombinator dot com: I am scratching my head. If you
had simply put a kitchen in your bathroom you would have gotten a sink for
free. As a bonus, you would now be able to cook your dinner there.

~~~
dependenttypes
What outworlder suggested is not that extreme though. Certain scheme
implementations are extremely small and you get a lot of advantages from using
one for your config files, just look at emacs for example and compare what you
can do with it to the notepad++ config. Surely you can see the advantage of
being able to do (bind "C-x" (lambda (state) ...)) compared to (bind "C-x"
predefinedaction), or the ability to use functions rather than have to
duplicate code all over.

Another positive is that it will make modding/shipping custom levels extremely
easy. Just ask those who tried to mod (or configure) Return to Castle
Wolfenstein or Wolfenstein: Enemy Territory.

~~~
platinumrad
Yeah embedding a scripting language is not a crazy idea in the context of
building a full game but what bothered me enough to make a snarky post was the
multiple condescending replies ("I am scratching my head", Greenspun's Tenth
Rule, etc.) in response to a small piece about writing a config file parser.

------
unoti
This is fun and interesting, and I'm not sure what to think of it.

There's an old saying in game dev, which is often good advice for business as
well: "Build the game, not a library"[1].

Or as applied to this case, that might mean trying to shoot straight for what
you need in the game. Arguably, S-expressions might be the smallest possible
thing you can do.

But another approach that is often used: Have your game process a binary file
format, and build tools that read text files and compile them into the binary
format. It's very common in games to have a tools pipeline that prepares
assets for production use. Those tools can safely and easily take whatever
dependencies help you get the job done, such as a JSON or YAML parser, or even
real LISP if that feels right. This kind of approach also fulfills the
laudable goal of keeping the dependencies down (for the production game).

The output of the tool chain could be binary files. Or, depending on your
situation, the toolchain could produce code which is then run through your
compiler, and then it's ready for direct execution, skipping a parsing step
entirely.

[1] [https://geometrian.com/programming/tutorials/write-games-not...](https://geometrian.com/programming/tutorials/write-games-not-engines/)
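
As a rough sketch of the "tool writes binary, game reads binary" split described above: the record layout, the `SPRITE_MAGIC` value, and the names `BakedSprite`, `bake_sprite`, and `load_sprite` are all made up for illustration and do not come from the post. The field names borrow from the sprite example elsewhere in this thread. It dumps a raw struct, so it assumes the tool and the game are built for the same platform (same endianness and padding).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SPRITE_MAGIC 0x31525053u /* bytes "SPR1" on little-endian; hypothetical */

/* Hypothetical baked record: fixed size, no pointers, so it can be
   written and read back in a single call. */
typedef struct {
    uint32_t magic;
    int32_t  x, y;
    int32_t  frame_width, frame_height;
    char     texture[32];
} BakedSprite;

/* Tool side: runs at build time, after any text format (S-expressions,
   JSON, ...) has already been parsed into `s`. Returns 0 on success. */
int bake_sprite(const char *path, const BakedSprite *s) {
    assert(path != NULL && s != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t written = fwrite(s, sizeof *s, 1, f);
    return (fclose(f) == 0 && written == 1) ? 0 : -1;
}

/* Game side: no parser at all, just a read and a sanity check.
   Returns 0 on success; on failure `out` is zeroed. */
int load_sprite(const char *path, BakedSprite *out) {
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) { memset(out, 0, sizeof *out); return -1; }
    size_t got = fread(out, sizeof *out, 1, f);
    fclose(f);
    if (got != 1 || out->magic != SPRITE_MAGIC) {
        memset(out, 0, sizeof *out);
        return -1;
    }
    out->texture[sizeof out->texture - 1] = '\0'; /* never trust file data */
    return 0;
}
```

The trade-off is that the game binary carries no parsing dependency, while the tool is free to use whatever parser is convenient.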

~~~
f00zz
For what it's worth, I once wrote a game that had a configuration file parser
written in bison/flex (I actually finished that game).

------
outworlder
I am scratching my head.

If we are parsing S-expressions... why not embed Scheme already? This way, not
only do you get an S-expression parser for 'free', you can easily manipulate
S-expressions, you can add macros and so on. And you also get your scripting
language out of this deal!

Something like Racket or Chicken (and possibly others) would work nicely.
Chicken specifically has a very nice FFI - which you don't even have to use.
As an experiment, I have asked Chicken's compiler to stop at the C code
generation, added the generated code to my XCode project, and deployed the
whole package to iPhone. Not even cross compilation shenanigans were required,
since this was handled by XCode.

As a bonus*2, with a few lines of code, I opened a TCP listener, and fed
straight to the REPL. Which meant that I could telnet and have a REPL, which
could be used to replace functions on the fly and observe the results
immediately, while the app was running.

They would have to rename the project at this point, though :)

~~~
svvcfb1212
...and this is how we get unnecessarily bloated software with ungodly
dependency chains and complex build systems.

Sometimes a purpose-built module that addresses the need and maintains
simplicity is preferable over complexity in the interest of some future
potential feature that nobody has asked for yet, and may never ask for.

~~~
outworlder
Those Scheme interpreters are tiny!

I can guarantee you that a game will need a scripting language. Better get
that done already.

That's if you prefer Scheme. You could use Lua too.

~~~
learc83
>I can guarantee you that a game will need a scripting language. Better get
that done already.

Myself and thousands of people before and after me have written games without
a scripting language.

~~~
outworlder
You _can_ write a game without any scripting language support whatsoever. You
could also write the whole game in assembly if you so desired.

However, if every single time you want to change any piece of logic in your
game, you have to do the edit/compile cycle, your development will be
extremely slow. Especially if you compare with the ability to change code
while the game is running.

Not having all the game logic in C or what have you makes the game easier to
mod too.

~~~
learc83
>However, if every single time you want to change any piece of logic in your
game, you have to do the edit/compile cycle, your development will be
extremely slow. Especially if you compare with the ability to change code
while the game is running.

1\. I'm working on a game in Unity right now that takes about 10 seconds to
compile and reload the assemblies for the scripting language (only about 2
seconds of that is actual compilation, the rest is time that would be
applicable to an interpreted scripting language).

2\. I've built games in C where the entire project took 4 seconds to compile.
Incremental compilation and modular architecture can get that down to near
instantaneous for most changes.

>Especially if you compare with the ability to change code while the game is
running.

Take a look at handmade hero for an example of an architecture that allows you
to hot-reload C code while the game is running.

>Not having all the game logic in C or what have you makes the game easier to
mod too.

That's completely architecturally dependent. You can make a game easy to mod
for non programmers by storing data in a human readable/editable format. If
you want to make it easy for players to write code there's a lot more work
that needs to be done than just using a scripting language. For most games
worrying about allowing modders to easily edit code is premature optimization.

~~~
svvcfb1212
> For most games worrying about allowing modders to easily edit code is
> premature optimization.

I was involved in a game project that died extremely quickly a few years ago.
The project lead made every decision re: architecture, tooling, etc. somehow
involve worrying about modding. As a result, we ended up with nothing to show
since everything got derailed by a bizarre obsession by one person with
modding before we even had anything to mod. I believe that person doesn't take
lead roles on projects any longer...

------
neilv
For people not already very familiar with S-expressions, a slightly different
way to formulate and format the example from the post is:

    
    
        (sprites (sprite :x 32
                         :y 104
                         (animation :texture      "assets/idle.bmp"
                                    :frame-length 2
                                    :frame-width  24
                                    :frame-height 24
                                    :frame-span   5
                                    :loop         0)))
    

A benefit of formatting this way comes when it's not only a nested chain, like
in this contrived example, but you have more varied tree structure, like in
real-world code and data, and want to be able to grasp the nesting structure
visually. And without using too many vertical lines of screen space.

~~~
benrbray
what is the advantage over something more standard like JSON?

~~~
lokedhs
I'd say there are a few major benefits:

1\. Distinction between symbols and strings.

2\. A well-specified number format (integers are integers). In JSON the
precision of a number is unspecified, not something you want from an
interchange format.

3\. It has comments.

4\. It's more concise.

5\. Easier to parse.
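
Points 1 and 2 can be sketched in a few lines of C. This is only an illustration: the names `AtomKind`, `classify_atom`, and `parse_integer` are hypothetical, and it assumes a tokenizer has already split out a single atom as a NUL-terminated string. An atom that starts with a double quote is a string, one that parses completely as a base-10 integer is an integer, and anything else is a symbol.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { ATOM_INTEGER, ATOM_STRING, ATOM_SYMBOL } AtomKind;

/* True only if the whole token is a base-10 integer that fits in a long.
   strtol would skip leading whitespace, so that case is rejected first. */
static bool parse_integer(const char *tok, long *out) {
    char *end;
    if (*tok == '\0' || isspace((unsigned char)*tok)) return false;
    errno = 0;
    long value = strtol(tok, &end, 10);
    if (end == tok || *end != '\0' || errno == ERANGE) return false;
    *out = value;
    return true;
}

/* Classify one atom. *int_value is written only for ATOM_INTEGER. */
AtomKind classify_atom(const char *tok, long *int_value) {
    assert(tok != NULL && int_value != NULL);
    if (tok[0] == '"') return ATOM_STRING;               /* "32" stays a string */
    if (parse_integer(tok, int_value)) return ATOM_INTEGER;
    return ATOM_SYMBOL;                                  /* e.g. frame-length, - */
}
```

With this, `32` and `"32"` are different kinds of value, and `frame-length` is a symbol rather than a quoted key.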

------
csharptwdec19
Expressions are fun.

In the .NET world we have an Expression library that behaves -almost- like
S-Expressions. I'd say some things (i.e. currying) are simplified but for the
most part it's the same concept. You can look at every part of an expression
and do whatever you want based on it. We often see it used either for advanced
reflection, or, for Dynamic Method Generation.

But, having written a few different parsers around it (and writing more LISP
than I'd care to admit) I think anyone who is poo-pooing this has just never
had the literal _pleasure_ of writing a meaningful expression parser.

EDI transforms and XSLTs don't count, folks. S-Expressions are dead _simple_

------
lispm
While you follow Greenspun's tenth rule of programming (Any sufficiently
complicated C or Fortran program contains an ad hoc, informally-specified,
bug-ridden, slow implementation of half of Common Lisp) you could embed a Lisp
system, like ECL -> Embeddable Common Lisp

[https://common-lisp.net/project/ecl/](https://common-lisp.net/project/ecl/)

There are a bunch of applications which include some kind of Lisp
implementation in C. From GNU Emacs, Audacity, AutoCAD (and its various
clones), and various others.

------
mark-probst
I wrote a Lisp reader for C a long time ago:
[https://github.com/schani/lispreader](https://github.com/schani/lispreader)

An interesting problem that came up was freeing deeply nested conses without
overflowing the stack. I got a bug report about that from a user. They must
have generated the structure in memory, because if they were reading it in,
the reader would probably also have overflowed the stack. In any case, the
simple solution was to rotate the structure to be freed so that it can be
freed without stack overflow or using an additional queue or stack:

[https://github.com/schani/lispreader/blob/master/lispreader....](https://github.com/schani/lispreader/blob/master/lispreader.c#L566)
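
For readers who have not seen the rotation trick, here is a minimal sketch of the general idea. It is not the code from the linked repository: the `Node` layout, the `NULL`-as-nil convention, and the names `free_tree`, `make_atom`, `make_cons`, and `nodes_freed` are assumptions made for this example. Whenever the car of the current cell is itself a cons, the cell is rotated to the right so the nesting moves onto the cdr chain, which can then be freed in a plain loop.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_ATOM, NODE_CONS } NodeType;
typedef struct Node Node;
struct Node {
    NodeType type;
    Node *car, *cdr;   /* meaningful only for NODE_CONS; NULL stands for nil */
};

static size_t nodes_freed;   /* instrumentation for this sketch only */

Node *make_atom(void) {
    Node *n = malloc(sizeof *n);
    if (n != NULL) { n->type = NODE_ATOM; n->car = n->cdr = NULL; }
    return n;
}

Node *make_cons(Node *car, Node *cdr) {
    Node *n = malloc(sizeof *n);
    if (n != NULL) { n->type = NODE_CONS; n->car = car; n->cdr = cdr; }
    return n;
}

/* Free a whole structure with no recursion and no auxiliary stack. */
void free_tree(Node *n) {
    while (n != NULL) {
        if (n->type == NODE_CONS && n->car != NULL && n->car->type == NODE_CONS) {
            /* Rotate right: the car cell becomes the new root and the old
               root hangs off its cdr. No node is lost, and the nesting on
               the car side shrinks by one level. */
            Node *left = n->car;
            n->car = left->cdr;
            left->cdr = n;
            n = left;
        } else {
            Node *next = NULL;
            if (n->type == NODE_CONS) {
                if (n->car != NULL) {
                    assert(n->car->type == NODE_ATOM); /* conses were rotated away */
                    free(n->car);
                    nodes_freed++;
                }
                next = n->cdr;   /* continue down the cdr chain */
            }
            free(n);
            nodes_freed++;
            n = next;
        }
    }
}
```

Each rotation moves one cell onto the cdr chain and each other step frees at least one node, so the loop runs in time linear in the number of nodes.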

------
aidenn0
Not a criticism of TFA, but for those who want to parse s-expressions, the
algorithm used for common lisp is easy to implement, extensible, and works
while only inspecting a single character at a time (no need to mess with
formatted input tools like scanf):

[http://www.lispworks.com/documentation/HyperSpec/Body/02_b.h...](http://www.lispworks.com/documentation/HyperSpec/Body/02_b.htm)
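
To give a feel for reading one character at a time, here is a much-reduced sketch. It is not the HyperSpec reader algorithm (there is no readtable and no user-extensible macro characters), and the names `Reader`, `read_form`, and `sexp_normalize` are invented for this example. It assumes `;` starts a line comment and `"` delimits strings with backslash escapes. Rather than building cons cells, it copies the first form to an output buffer with comments stripped and whitespace normalized, and reports whether the input was exactly one well-formed form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *in;   /* next unread input character */
    char *out;        /* next free output byte */
    char *out_end;    /* last byte, reserved for the terminating NUL */
} Reader;

static bool emit(Reader *r, char c) {
    if (r->out >= r->out_end) return false;   /* output buffer full */
    *r->out++ = c;
    return true;
}

static bool is_space(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

/* Characters that end a symbol or number token. */
static bool is_terminator(char c) {
    return c == '\0' || is_space(c) || strchr("()\";", c) != NULL;
}

static void skip_blanks(Reader *r) {          /* whitespace and ; comments */
    for (;;) {
        if (is_space(*r->in)) r->in++;
        else if (*r->in == ';') while (*r->in != '\0' && *r->in != '\n') r->in++;
        else return;
    }
}

static bool read_form(Reader *r);

static bool read_list(Reader *r) {            /* the '(' is already consumed */
    bool first = true;
    if (!emit(r, '(')) return false;
    for (;;) {
        skip_blanks(r);
        if (*r->in == '\0') return false;     /* missing ')' */
        if (*r->in == ')') { r->in++; return emit(r, ')'); }
        if (!first && !emit(r, ' ')) return false;
        first = false;
        if (!read_form(r)) return false;      /* recursion: depth is stack-limited */
    }
}

static bool read_string(Reader *r) {          /* the opening '"' is already consumed */
    if (!emit(r, '"')) return false;
    while (*r->in != '"') {
        if (*r->in == '\0') return false;     /* unterminated string */
        if (*r->in == '\\') {                 /* copy the backslash, then the escaped char */
            if (!emit(r, *r->in++)) return false;
            if (*r->in == '\0') return false;
        }
        if (!emit(r, *r->in++)) return false;
    }
    r->in++;
    return emit(r, '"');
}

/* Dispatch on a single character, then hand off to the matching reader. */
static bool read_form(Reader *r) {
    skip_blanks(r);
    char c = *r->in;
    if (c == '\0' || c == ')') return false;
    if (c == '(') { r->in++; return read_list(r); }
    if (c == '"') { r->in++; return read_string(r); }
    while (!is_terminator(*r->in))            /* plain token: symbol or number */
        if (!emit(r, *r->in++)) return false;
    return true;
}

/* True if src holds exactly one well-formed form that fits in dst. */
bool sexp_normalize(const char *src, char *dst, size_t dst_size) {
    assert(src != NULL && dst != NULL);
    if (dst_size == 0) return false;
    Reader r = { src, dst, dst + dst_size - 1 };
    bool ok = read_form(&r);
    if (ok) { skip_blanks(&r); ok = (*r.in == '\0'); }
    *r.out = '\0';
    return ok;
}
```

Because `read_list` recurses, very deep nesting can still exhaust the stack, which is the same hazard mentioned for the reader in the comment about freeing conses.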

------
lokl
Why not make parsing S-expressions the game? HN crowd would love to play.

------
bigdict
Tree is a more natural data structure for s-expressions.

~~~
moonchild
Usually it's called a cons cell in this context, not a tree. The base lisp
structure looks something like this:

    
    
      data Obj =
      | Cons(Obj car * Obj cdr)
      | Symbol(string)
      | Nil
    

(Nil is important, cannot be elided.)

In c:

    
    
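      /* Needs C11: the struct inside the union is anonymous. */
      typedef enum { Cons, Sym, Nil } NodeType;
      typedef struct Node Node;
      struct Node {
              NodeType type;                        /* selects the active member */
              union {
                      struct { Node *car, *cdr; };  /* type == Cons */
                      const char *symbol;           /* type == Sym */
              };                                    /* type == Nil uses neither */
      };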
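
To make the structure above concrete, here is a small sketch that builds values out of that `Node` type and walks the cdr chain. The helpers `cons`, `sym`, `list_length`, and the shared `NIL` object are assumptions added for this example, not part of the comment above. A proper list is a chain of cons cells linked through `cdr` and terminated by `Nil`; a bare pair whose `cdr` is a symbol is a valid cons cell but not a proper list.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { Cons, Sym, Nil } NodeType;
typedef struct Node Node;
struct Node {
    NodeType type;
    union {
        struct { Node *car, *cdr; };   /* type == Cons (anonymous struct, C11) */
        const char *symbol;            /* type == Sym */
    };
};

/* One shared nil object (hypothetical convention for this sketch). */
static Node nil_node = { .type = Nil };
Node *const NIL = &nil_node;

Node *cons(Node *car, Node *cdr) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();            /* keep the sketch short: no recovery */
    n->type = Cons;
    n->car = car;
    n->cdr = cdr;
    return n;
}

Node *sym(const char *name) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->type = Sym;
    n->symbol = name;                  /* borrowed pointer, not copied */
    return n;
}

/* Number of elements in a proper list, or -1 if the cdr chain does not
   end in Nil (for example a dotted pair such as (x . y)). */
long list_length(const Node *n) {
    assert(n != NULL);
    long len = 0;
    while (n->type == Cons) {
        len++;
        n = n->cdr;
    }
    return n->type == Nil ? len : -1;
}
```

Here `cons(sym("a"), cons(sym("b"), NIL))` is the list `(a b)`, while `cons(sym("x"), sym("y"))` is just a pair. Nothing is freed in this sketch.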

~~~
rumanator
You're being needlessly pedantic. A cons cell is basically a node of a binary
tree. It has fancy names for the left and right child node because of its
relation with lisp, but it's nonetheless a tree node.

~~~
dependenttypes
It is a list node. It's just that you can implement binary trees trivially out
of lists.

~~~
rumanator
You got it entirely backwards: a cons cell is a tree node which, as a corner
case, can represent branches of a binary tree where the left child node (car)
points to a child node value and the right child node (cdr) points to another
branch (cons cell or nil).

~~~
dependenttypes
I indeed got it incorrect but a cons cell is still not a tree node (it can be
one though). It is rather just a pair, like (x, y) in haskell or std::pair in
C++.

~~~
rumanator
If you want to insist and keep pushing a patently false take then you should
at least be aware that you're trying to push a personal definition for a
well-established concept that contradicts its definition, origin, use, and
purpose.

[https://wiki.c2.com/?ConsCell](https://wiki.c2.com/?ConsCell)

~~~
dependenttypes
(cons 1 2)

Yep, definitely looks like a tree rather than a pair... not. What was the
point in linking to this c2 topic even if it does not agree with you?

~~~
rumanator
> Yep, definitely looks like a tree rather than a pair...

Node of a binary tree. Either you have problems keeping track of context or
you're trolling. Either way I see no point in continuing this conversation. If
you have any interest in learning what a cons cell is then there are plenty of
resources to go around. If instead you prefer to push personal misconceptions
formed in total ignorance then it's fine by me as well.

~~~
dependenttypes
> Node of a binary tree

std::pair is the node of a binary tree, yeah man. Listen, it is okay to be
wrong, just admit it and move on. Pressing on it makes you look like an idiot.

"oh look, a line segment!"

"It's not a line segment, it's a triangle/polygon node"

You can surely see how ridiculous this sounds, right?

------
ColinWright
The site seems to be down ... here is the referenced page courtesy of the
Wayback Machine:

[https://web.archive.org/web/20200821000640/https://benpaulha...](https://web.archive.org/web/20200821000640/https://benpaulhanna.com/writing-a-game-in-c-parsing-s-expressions.html)

------
ejanus
The site is not opening for me. I have tried several times. Is anyone
experiencing such?

------
vinkelhake
That calloc usage and string copying looks hella broken in the first example.

------
krapp
No mention of the game anywhere else on the blog, and this post is from last
year. I wonder what happened?

------
ravenide
“Even though I'm never expecting my game to read an empty string in from an
s-expression, I didn't want that to be a limitation of the parser.”

This thing is never going to ship. Coming from someone who has written an IDE
and wasted a bunch of time on the parser because it was the cool theoretical
portion.

That’s fine, I assume learning rather than shipping is the goal here. A fine
goal it is.

By the way, the problem is NOT that he wrote his own parser and other homemade
tools. This is actually a great thing to do —third party tools usually end up
being more trouble than they’re worth. The problem is that he’s ratholing on
tiny details that don’t matter.

~~~
mlatu
> ratholing on tiny details that don’t matter

That's part of some people's journeys though: Learning how to work
efficiently. Run into traps and find out again.

I applaud OP with full heart. Especially for stepping up and showing it off.

I don't think I could handle all the negativity you can read here. And so,
I'll keep my private projects for myself until I'm done with them. Of course,
when I'll be ready to release, nobody will care because I couldn't show it off
because of all the negativity.

But on the other hand, I'm not doing this for any of you, only for myself, so
I couldn't care less to be frank about what any of you think. I just hope the
site being down doesn't mean OP took their server offline just because of all
the flak he got for handrolling yet another parser.

------
tobyhinloopen
And 0 value was added to the actual game. This is exactly how my side projects
are never finished

~~~
emmanueloga_
Writing a custom configuration format seems like some prime quality yak
shaving :-) Probably flatbuffers would be a good choice for the author's needs.

~~~
wahern
One of the reasons service configurations on OpenBSD (e.g. pf.conf,
ipsec.conf, etc) are so convenient and expressive is because they don't shy
away from defining and implementing custom configuration syntax using yacc.

At some point _somebody_ needs to implement the configuration interface for
the human. Punting things by preferring boilerplate, machinable formats and
libraries doesn't actually solve the problem.

------
rsecora
Nice, and descriptive. I love the explanation.

It's also an example of Greenspun's tenth rule [0]

"Any sufficiently complicated C or Fortran program contains an ad hoc,
informally-specified, bug-ridden, slow implementation of half of Common Lisp."

[0]
[https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule](https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule)

~~~
platinumrad
A config file parser is not anywhere close to half of Common Lisp.

------
CoolGuySteve
It's fine for a personal project, but if you're thinking of writing a custom
parser and grammar for configuration for a commercial project: Don't.

There's a lot of flexibility that comes from being able to programmatically
generate configs in your scripting language of choice and then ingest them in
C++ with little effort. For example, being able to serialize a Python dict as
json/ini/yaml/sqlite/whatever and then ingest it in C++ with some library.

It turns your complicated C++ program into a function with a dictionary
argument.

I've worked on a few large programs now that have used custom config formats
and it sucks. It's a waste of time trying to output correct syntax while
fighting through idiosyncratic syntax choices and poorly explained parsing
errors in the custom code.

It doesn't create value, it's just a thing you have to do to get to the
automation you're trying to accomplish. Avoid it by not making the mistake in
the first place.

~~~
setzer22
To be fair, S-expressions are not a made up format and in fact predate most
other formats you mention.

~~~
CoolGuySteve
You're missing the point, S-expression parsers aren't "batteries included" in
most scripting languages. The age of the format is irrelevant.

