S-expressions

manaskarekar · on Feb 4, 2013

A highly recommended read from defmacro.

http://www.defmacro.org/ramblings/lisp.html

Other articles on the site are very nice as well: http://www.defmacro.org/

ScottBurson · on Feb 4, 2013

Yes, it is a great article, but it saddens me that we have to invoke XML to explain S-expressions rather than the other way around. S-expressions have been around since 1958! They are maybe the best reason that everyone should learn Lisp even if they don't wind up using it much, just so they know there are problems it already solved long ago, and they don't need to reinvent the wheel.

chimeracoder · on Feb 4, 2013

> Side note: Wouldn't it be awesome if composer supported composer.sexpr files natively, so that we would no longer have to write JSON? No, not really.

Why the quick dismissal? I find s-expressions much easier to work with than JSON.

igorw2 · on Feb 4, 2013

Let me clarify the reason for this.

Many people asked for yaml support in composer, and it was shot down. Here's why: Once you start supporting many formats you lose interoperability, as any tooling now needs to support all formats. That is the main reason why composer will not support sexpr.

octo_t · on Feb 4, 2013

Parsing S-expressions is easy: Dijkstra solved it in a really nice way with the Shunting Yard Algorithm.

slurgfest · on Feb 4, 2013

Thank you for this post, because it made me curious enough to look up Dijkstra's views on LISP. (It seems he credited it for some innovations but had a lot of complaints about its early design and docs, and favored ALGOL family design decisions)

krickle · on Feb 4, 2013

I thought that was for infix expressions with order of operations?
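
For reference, the shunting-yard algorithm targets infix notation with operator precedence; fully parenthesized S-expressions need neither, so a short recursive reader is enough. A minimal sketch in Python, assuming atoms are whitespace-separated and that there are no string literals or comments (the names `tokenize`, `read`, `atom` and `parse` are made up for this illustration):

```python
def tokenize(text):
    """Split S-expression text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def atom(token):
    """Integers become int; anything else stays a string standing in for a symbol."""
    try:
        return int(token)
    except ValueError:
        return token


def read(tokens):
    """Consume one expression from the front of the token list."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))  # recurse for nested lists
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing parenthesis
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return atom(token)


def parse(text):
    """Parse the first expression found in text into nested Python lists."""
    return read(tokenize(text))
```

With these definitions, `parse("(+ 1 (* 2 3))")` yields the nested list `["+", 1, ["*", 2, 3]]`. No operator stack or precedence table is involved.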

splawn · on Feb 4, 2013

Was the use of dict and list necessary? This is an actual question, not a rhetorical criticism. I have recently been trying to learn common lisp.

rdtsc · on Feb 4, 2013

It is if you want to differentiate between a list of lists and a map. Maps are not natively supported in s-expressions.

So there is a need to create a sub-language to handle special data structures.

I have done this as well. But I use a special namespace '#dict' for example. So it would be

    (#dict (k1 v1) (k2 v2))

This way there is less of a chance of collision between data and this sub-language. It also allows for extension, so for example instead of dict you can use #classname to, say, persist a particular object of a given class name (JSON doesn't allow this; you would have to add a '__classname__' key to the object, for example).

Another way of doing it is using a flat list and just using an alternation with keys being prefixed by some special character:

   (:k1 v1 :k2 v2)

I think Clojure and some lisps do that...
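
Either encoding can be decoded mechanically once the text has been parsed into nested lists. A rough sketch in Python, assuming symbols arrive as plain strings; `decode_tagged_dict` and `decode_flat_pairs` are hypothetical names, not part of any library:

```python
def decode_tagged_dict(expr):
    """Decode ['#dict', [k1, v1], [k2, v2]] into a Python dict."""
    if not expr or expr[0] != "#dict":
        raise ValueError("expected a list starting with #dict")
    return {key: value for key, value in expr[1:]}


def decode_flat_pairs(expr):
    """Decode [':k1', v1, ':k2', v2] into a Python dict, dropping the ':' prefix."""
    if len(expr) % 2 != 0:
        raise ValueError("expected an even number of items")
    result = {}
    # Even positions hold keys, odd positions hold the matching values.
    for key, value in zip(expr[0::2], expr[1::2]):
        if not (isinstance(key, str) and key.startswith(":")):
            raise ValueError(f"key {key!r} lacks the ':' prefix")
        result[key[1:]] = value
    return result
```

The tagged form can tell a map apart from a list of two-element lists; the flat form relies on the reader knowing in advance that the list is a map.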

groovy2shoes · on Feb 4, 2013

Clojure s-expressions actually have syntax for vectors, maps, and sets (in addition to the ones mentioned in TFA):

    {:k1 v1, :k2 v2}   ; curly braces for maps
    [v1 v2 v3]         ; square brackets for vectors
    #{v1 v2 v3}        ; pound-curly for sets

Clojure's reader also provides some mechanisms for adding readable values, which is awesome for serialization and interchange. There's more info here: http://clojure.org/reader

rdtsc · on Feb 4, 2013

You are right. My bad, I should have just looked it up. Thought I could wing it from memory.

pnathan · on Feb 4, 2013

That's a nice idea, using #<datastructure> as a straight-up constructor. I've always found hash-tables to be a colossal pain to create in Common Lisp. I tried to create a special JSON-esque syntax in CL for them, but it wasn't a smooth semantic drop-in.

rdtsc · on Feb 4, 2013

Yeah I think it is pretty neat too. I had a couple of design ideas. This way I can persist some other structures (from Python for ex):

For set():

   (#set v1 v2 ...)

For circular object graphs, I can encode an object reference to avoid serializing the same object twice:

   (#ref oid)

Then of course there are custom class names I can pass in to serializer and deserializer:

    (#MyClass (attr1 val1) (attr2 val2) ...)

This might not map to all languages but it worked for me.
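
A serializer along these lines could look roughly like the sketch below. It is an illustration only, not rdtsc's actual code: `to_sexpr` and the `Point` class are hypothetical, the `#dict` and `#set` tags follow the comments above, and the `(#ref oid)` form is left out because it needs an object-id scheme that the comments do not specify.

```python
import json


class Point:
    """Hypothetical example class; any object with a __dict__ would do."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def _form(head, parts):
    """Join a tag and its rendered parts into one parenthesized form."""
    return "(" + " ".join([head, *parts]) + ")"


def to_sexpr(obj):
    """Serialize a Python value to text using '#'-prefixed tags."""
    if isinstance(obj, (int, float)):
        return repr(obj)
    if isinstance(obj, str):
        return json.dumps(obj)  # double-quoted, with escapes
    if isinstance(obj, dict):
        return _form("#dict", [f"({to_sexpr(k)} {to_sexpr(v)})" for k, v in obj.items()])
    if isinstance(obj, (set, frozenset)):
        # Sort the rendered items so the output is deterministic.
        return _form("#set", sorted(to_sexpr(item) for item in obj))
    if isinstance(obj, (list, tuple)):
        return "(" + " ".join(to_sexpr(item) for item in obj) + ")"
    if hasattr(obj, "__dict__"):
        # Custom classes: the tag is the class name, followed by (attr value) pairs.
        attrs = [f"({name} {to_sexpr(value)})" for name, value in vars(obj).items()]
        return _form("#" + type(obj).__name__, attrs)
    raise TypeError(f"cannot serialize {type(obj).__name__}")
```

For instance, `to_sexpr(Point(1, 2))` gives `(#Point (x 1) (y 2))`. A matching deserializer would need a table from tag names to constructors.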

pnathan · on Feb 4, 2013

CL uses # as a special character; I rewrote the idea into the (CL-standard idiom) `make-map` nomenclature: https://gist.github.com/4709844

But yea, I hadn't thought of particularly using this idea. It's a good idea from Python (or maybe it's older than that, IDK). :-)

aerique · on Feb 5, 2013

Isn't calling it a "colossal pain" a bit of hyperbole? A simple wrapper function should take away most of the pain.

DasIch · on Feb 4, 2013

Using dict and list here preserves the dynamic nature of JSON. If you have some sort of schema that implies dict and list (which in reality you probably would have, even using JSON), you wouldn't need to use those.

enduser · on Feb 4, 2013

dict is not a function or macro in common lisp, so the example is fictitious. in CL one would construct an A-list or a P-list (both cons-based ways of expressing a small map). see http://www.gigamonkeys.com/book/beyond-lists-other-uses-for-... for more info on that.

the need for (list ...) would depend on the definition of dict.

brudgers · on Feb 4, 2013

"Stay tuned for follow-up posts."

For those interested, there was follow-through on the promise to follow-up:

https://igor.io/

sea6ear · on Feb 4, 2013

The follow-up posts look really interesting.

Looks like they go through the complete implementation of a lisp system including macros using PHP as the host language.

ecmendenhall · on Feb 4, 2013

> Side note: Wouldn't it be awesome if composer supported composer.sexpr files natively, so that we would no longer have to write JSON?

If you, too, think this would be awesome, check out extensible data notation: https://github.com/edn-format/edn.

billpg · on Feb 4, 2013

You could write code in XML and JSON too, you just need to define meanings.

[add][item]1[/item][item]2[/item][/add]
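
To make "define meanings" concrete, here is a small sketch in Python that evaluates the example above, rewritten with angle brackets so a standard XML parser accepts it. The semantics are an assumption chosen for illustration: `item` holds an integer and `add` sums its children. `evaluate_xml` and `run` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def evaluate_xml(element):
    """Evaluate a tiny XML 'program' made of <add> and <item> elements."""
    if element.tag == "item":
        return int(element.text)  # leaf: an integer literal
    if element.tag == "add":
        return sum(evaluate_xml(child) for child in element)  # sum all children
    raise ValueError(f"unknown element: {element.tag}")


def run(source):
    """Parse XML source text and evaluate its root element."""
    return evaluate_xml(ET.fromstring(source))
```

Here `run("<add><item>1</item><item>2</item></add>")` returns 3, the same tree that `(+ 1 2)` expresses as an S-expression.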

TeMPOraL · on Feb 4, 2013

> You could write code in XML and JSON too

And many do (e.g. Ant), but doesn't that sadly look like reinventing the wheel?

mwexler · on Feb 4, 2013

What's great about this is not only that it's a good quick summary, but it left me wanting more. That means I actually feel like I was getting somewhere with it, instead of it being instantly forgotten.

lttlrck · on Feb 4, 2013

Thanks! 30 seconds of reading and I have at least a basic understanding.

japaget · on Feb 4, 2013

Near the front of the main article is "(sexpr lexer reader eval forms special-forms macros walker)". Each word in the above parenthesized expression links to another article in the series.

MichaelGG · on Feb 4, 2013

Why are the keywords written with a quote?

> (keywords '(useless microframework academic swag))

Shouldn't it be a list with strings or something? What if the keywords were complex structures not just strings?

ScottBurson · on Feb 4, 2013

You're asking two questions here. First, about the quote. I'd say it's a mistake. The author hasn't been completely clear (and I think is possibly a little confused) over the difference between s-expressions in general, which are simply a tree-structured representation, and Lisp code, which is written as s-expressions but is subject to additional rules.

In this case, it's pretty clear from the surrounding discussion that these s-expressions are not intended to be executable code. That being the case, there's no reason to use the quote. This line should have been

> (keywords (list useless microframework academic swag))

Secondly, as others have mentioned, there is an important difference between symbols and strings that is not necessarily easy to explain -- indeed, it took the Lisp community a few decades to get clear on it. Fundamentally: strings are data; symbols are names. Notice, in this example, that all of the keys in the dict, as well as "dict" itself, are represented by symbols. That is because each of these names something in the domain of the program that manipulates the representation: an object type, an attribute of an object type, or, in the case you are asking about, a member of a set of keywords.

If you've used enumeration types in a language that has them (C/C++, Java, C#, etc.), there's an analogy: symbols are like literals of a single, pre-existing enumeration type containing all possible names.
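
One way to picture the difference is interning: every occurrence of a given name maps to one shared object, so symbols can be compared by identity rather than character by character. A minimal sketch in Python, where the `Symbol` class is a hypothetical stand-in and not how any particular Lisp implements symbols:

```python
class Symbol:
    """A name, interned so that equal names are the very same object."""

    _table = {}  # maps each name to its unique Symbol instance

    def __new__(cls, name):
        existing = cls._table.get(name)
        if existing is None:
            existing = super().__new__(cls)
            existing.name = name
            cls._table[name] = existing
        return existing

    def __repr__(self):
        return self.name
```

With this class, `Symbol("swag") is Symbol("swag")` holds, while two equal strings built at runtime are not guaranteed to be the same object.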

RaphiePS · on Feb 4, 2013

The only Lisp I know is Racket (basically Scheme), so take the following with a grain of salt.

'(foo bar baz) is almost the same thing as (list foo bar baz). However, ' also makes the contents of the list into symbols, not variable references. Basically, '(foo bar baz) returns (list 'foo 'bar 'baz), not a list containing the values of the variables foo, bar, and baz.

Symbols aren't much like strings. They're immutable, and actually just refer to a unique number, not characters. I hope that sort of makes sense!

nivertech · on Feb 4, 2013

    '(foo bar baz)

is sugar for

    (quote (foo bar baz))

The list function, unlike quote, will evaluate its arguments, i.e.

    user=> (quote (a b c))
    (a b c)
    user=> '(a b c)
    (a b c)
    user=> (list a b c)
    CompilerException java.lang.RuntimeException: Unable to resolve symbol: a in this context, compiling:(NO_SOURCE_PATH:1)
    user=> (list 'a 'b 'c)
    (a b c)

randallsquared · on Feb 4, 2013

> Symbols aren't much like strings. They're immutable, and actually just refer to a unique number, not characters.

Depending on language, that could make them exactly like strings. :)

derleth · on Feb 4, 2013

If you have immutable strings and a smart compiler, yes, you could indeed have strings and symbols that are functionally identical. In fact, in Old Lisp, there were no strings and symbols were used for everything we use strings for now; that led to weak string handling and it's why Common Lisp and all modern Lisp variants have a string type.

nialo · on Feb 4, 2013

if it's not quoted the list will be treated as an expression, and evaluate "Useless", which is probably not what you want.

MichaelGG · on Feb 4, 2013

Right so instead you'd write: keywords (list "a" "b") ? Which would let you also do complex things like (list (person "bob") ...)

RaphiePS · on Feb 4, 2013

See my comment above. The core difference is that (list ...) evaluates what's inside. So (list (+ 1 2)) returns (list 3). ' doesn't evaluate anything. So '((+ 1 2)) returns '((+ 1 2)).
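
The difference between quote and list can be shown with a toy evaluator. This is a sketch in Python, not a real Lisp: expressions are nested Python lists, symbols are plain strings, and the environment `ENV` only knows `list` and a two-argument `+` (all names here are made up for the illustration).

```python
import operator


def evaluate(expr, env):
    """Evaluate a nested-list expression in the given environment."""
    if isinstance(expr, str):  # a symbol: look up its value
        if expr not in env:
            raise NameError(f"unable to resolve symbol: {expr}")
        return env[expr]
    if not isinstance(expr, list):  # a number evaluates to itself
        return expr
    head, *args = expr
    if head == "quote":  # special form: return the argument unevaluated
        return args[0]
    function = evaluate(head, env)  # ordinary call: evaluate head and arguments
    return function(*[evaluate(arg, env) for arg in args])


ENV = {
    "list": lambda *items: list(items),
    "+": operator.add,  # two-argument addition only, for brevity
}
```

Evaluating `["quote", ["a", "b", "c"]]` returns the list untouched, while `["list", "a", "b", "c"]` raises `NameError` because `a` is looked up as a variable, mirroring the REPL session above.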

informatimago · on Feb 5, 2013

See also Rivest's Sexps: http://people.csail.mit.edu/rivest/Sexp.txt

juan_juarez · on Feb 4, 2013

> I want to know how to parse them in PHP.

Step one : Stop using PHP.

jhrobert · on Feb 4, 2013

If you love the expressiveness of S-expressions but hate the super noisy parentheses, check the sugar version, called "sweet-expressions". http://readable.sourceforge.net/

chc · on Feb 4, 2013

I find that alternative syntaxes like this for S-expressions tend to get really awkward when confronted with real-world Lisp code. For example, the following code sample†:

  (define (pointless-function a z)
    (let* ((numbers (map (lambda (n) (* n 2)) (range a z)))
           (multiple-of-three? (lambda (n) (= 0 (mod n 3))))
           (multiples-of-three (filter multiple-of-three? numbers)))
      (printf "Your numbers are: ~s~%" multiples-of-three)))

AFAIK, this code only gets noisier when you introduce a paren-free syntax, as you need to introduce hacks to get around the simultaneous necessity and lack of parentheses.

† OK, I'll grant you, this function is not at all realistic. But I think the structure of the code is pretty lifelike. I didn't have any Lisp code at hand to use as an example, so I just banged out something that used let and lambda to illustrate that the structure of S-expressions can be pretty intricate.

qu4z-2 · on Feb 5, 2013

My ideal s-expression syntax would allow you to replace some brackets with whitespace. For instance, your function would become:

  define (pointless-function a z)
    let*
        numbers (map (lambda (n) (* n 2)) (range a z))
        multiple-of-three? (lambda (n) (= 0 (mod n 3)))
        multiples-of-three (filter multiple-of-three? numbers)
      printf "Your numbers are: ~s~%" multiples-of-three

Obviously there are some down-sides, but the main idea is that the mapping should have no knowledge of the language details, and be a pure s-expression transform. I really should explore the idea more. I have no idea whether the above example can even be uniquely parsed.

praptak · on Feb 4, 2013

Clojure managed to keep the Lisp expressiveness while keeping the parentheses nesting low. I don't think I can list every Clojure feature that aids this goal but here's a few:

* bindings (let forms & similar) make implicit pairs, so it's (let [a 1 b 2] ...) instead of (let ((a 1) (b 2)) ...)

* square brackets as shorthand for vectors - the nesting level might be the same, but the second kind of brackets somehow helps the brain find the way in the parenthesis jungle.

* lots of helpful macros (e.g. -> and ->>) and other helpers that help write terse code with low nesting, such as #(+ 5 %) as shorthand for (fn [x] (+ 5 x))

Jach · on Feb 4, 2013

Clojure's way has been generalized into a nice standard called edn (extensible data notation): https://github.com/edn-format/edn I wish this was as commonplace as json and xml...

martinced · on Feb 4, 2013

You have to love the intellectual dishonesty of that site you linked to: they quote Paul Graham, who's not just a big Lisp dialects advocate but also probably partly responsible for the resurgence of interest in Lisp dialects...

Yet they quote pg as if he had ever said that Lisp source code was ugly.

Regarding the "noisiness", I really think that the one and only place that you can really criticize is the closing of the outermost method.

And any text editor worth its salt can be programmed to automagically collapse these closing parentheses, hence preserving the eyes of non-lispers.

Also, when you're using something like paredit or subpar you hardly ever have any issue of non-matching parentheses.

I'm really surprised that people are still whining about that instead of trying to focus on the bigger picture: the benefits that homoiconicity brings to the table.

orthecreedence · on Feb 4, 2013

Agreed. You don't hear people bitching about all the {}'s in C and Java, but when lisp comes up you'd think parens were responsible for the plague.

Get a real editor and deal with it. Once you write your first macro, you'll realize how stupid all the whining was and why homoiconicity is in many aspects superior to other syntaxes without sacrificing much at all.

chc · on Feb 5, 2013

Well, a lot of it is that Lisp uses parens for everything while most other languages use a variety of syntactic tools.

For example, declaring a bunch of variables in Lisp:

  (let ((a 1)
        (b 2)
        (c 3))
    (format t "They are ~D, ~D and ~D~%" a b c))

The same thing in C

  int a = 1;
  int b = 2;
  int c = 3;
  printf("They are %d, %d and %d\n", a, b, c);

Lisp uses parens around the whole construct, as well as around the assignment list, around each individual assignment and to do the printing. C completely eschews some of these parens and uses = and ; to fill essentially the same role as the others. Only `printf()` keeps the parens. In short, C has lots of different syntactic symbols, while Lisp has relatively few.

To illustrate further, Clojure gets the "OMG parens" complaint a lot less than most Lisps because it consciously shies away from the traditional parens-everywhere look of Lisp. For example, the above code in Clojure:

  (let [a 1
        b 2
        c 3]
    (printf "They are %d, %d and %d\n" a b c))

Still more parens than C, but the difference is marked — the assignment block is clearly set apart by syntax, and the assignments are paired by juxtaposition rather than by grouping into lists. Parens are used relatively sparingly.

This isn't to say that this is really a problem, but the idea that parens in Lisp are equivalent to brackets in C isn't really right either. Parens in Lisp are used in place of nearly every bit of syntax in C, from braces to equals signs to semicolons.