
Some thoughts on JSON vs. S-expressions (2012) - wglb
https://eli.thegreenplace.net/2012/03/04/some-thoughts-on-json-vs-s-expressions
======
tikhonj
Funny to see this article—I was interning at a company that used S-expressions
instead of JSON internally right around the time this post was written.
(Seeing it for the first time now though.)

All I can say is that _in practice_, s-expressions worked _way_ better than
JSON. Easy to read as a human, more flexible than JSON, reasonable syntax for
working with alternatives (i.e. variants or sum types) and, of course, comments.

Here's one thing that stood out to me in particular: you could encode
formatted text in a way that was pretty easy to read even _without_
formatting:

(Here is a sentence with (i some words) (b emphasized) in (i (b different))
ways.)

I forget if this was _exactly_ the syntax they supported, but it was something
like that. You could do something similar with JSON, but it would be way more
verbose and harder to parse visually.
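As a sketch of why this kind of markup parses so easily (the `i`/`b` tags and the exact grammar here are my assumptions, not necessarily what that company used), a tiny reader plus an HTML renderer fits in a few lines of Python:

```python
def tokenize(text):
    """Split an s-expression string into '(' ')' and word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Read one expression: a word, or a parenthesized list."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ')'
        return items
    return token

TAGS = {"i", "b"}  # hypothetical markup tags, just for this sketch

def render(node):
    """Render parsed markup to HTML; lists headed by a tag become elements."""
    if isinstance(node, str):
        return node
    if node and node[0] in TAGS:
        tag = node[0]
        return "<%s>%s</%s>" % (tag, " ".join(render(n) for n in node[1:]), tag)
    return " ".join(render(n) for n in node)

markup = "(Here is a sentence with (i some words) (b emphasized) in (i (b different)) ways.)"
print(render(parse(tokenize(markup))))
# Here is a sentence with <i>some words</i> <b>emphasized</b> in <i><b>different</b></i> ways.
```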

Another massive advantage: structural editing in Emacs with Paredit is
_incredible_. Once you get the hang of it, it's way faster _and_ less prone to
typos than "normal" text editing.

If I ever start my own company, I'd definitely be tempted to standardize on
S-expressions as our human-readable format for serialization and config data.

~~~
ardy42
> Here's one thing that stood out to me in particular: you could encode
> formatted text in a way that was pretty easy to read even without
> formatting:

> (Here is a sentence with (i some words) (b emphasized) in (i (b different))
> ways.)

That feels a little like XML, just without attributes and untyped end-tags.

I've never really understood the hate for XML and the preference for stuff
like JSON. I'm a little biased towards XML because my company uses it heavily
and I got very familiar with it early in my career, but it seems to make good
trade-offs for most use-cases where you'd want to use structured text.

~~~
fao_
> That feels a little like XML, just without attributes

Attributes:

    
    
        (p :class 'center'
           (div ...))
    

> and untyped end-tags.

The end tags are implied by the start tag. Why do you need an end tag? It's
just another way to mess up. The following situation is not possible using
S-Expressions:

    
    
        ...
        <Start>
            <Center>500</Center>
            <Left>10</Left>
            ...
        </End>

~~~
ardy42
> The end tags are implied by the start tag. Why do you need an end tag? It's
> just another way to mess up.

It helps with legibility and error-checking of hand-authored documents (at
least in documents with varied tags). It doesn't look like fun to figure out
where to insert something in a pages-long document with sections that look
like ))))))))))))).

It's not another way to mess up, it's another way to make sure you wrote what
you meant.

~~~
wglb
This is a fallacy rooted in SGML. With a proper editor, finding where to
insert or delete new stuff.

~~~
em-bee
... is easy.

~~~
wglb
Haha--yes, Thanks.

------
lilactown
I still believe that EDN[0] is one of the best general purpose data
serialization formats - definitely a huge upgrade from JSON.

It addresses most of the author's problems while _also having s-expressions_,
e.g.:

    
    
        (foo bar baz) ;; this is valid EDN
    
        {:foo [bar baz]} ;; this is also valid EDN
    

There are parsers for many popular languages, and a language already entirely
based on it: Clojure.

[0] [https://github.com/edn-format/edn](https://github.com/edn-format/edn)

------
tlb
I disagree. The difference between a dict and a list of key-value pairs should
be left as an implementation detail in the receiver, not specified in the
protocol.

S-expressions get this right. If the receiver knows they'll be doing a lot of
lookup, they can build an indexed structure. If not, they can use something
like assoc (which does a linear scan). The linear scan is probably faster with
less than 5 entries anyway.
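For illustration, here is roughly what that receiver-side choice looks like in Python (`assoc` here is a hand-rolled stand-in for Lisp's assoc, not a library function):

```python
# The same list of key-value pairs can be scanned linearly (assoc-style)
# or indexed up front; the wire format doesn't need to decide.

pairs = [("oranges", 2), ("apples", 6), ("pears", 5)]

def assoc(key, pairs):
    """Linear scan, like Lisp's assoc; fine for a handful of entries."""
    for k, v in pairs:
        if k == key:
            return v
    return None

# A receiver expecting many lookups builds an index instead:
index = dict(pairs)

print(assoc("apples", pairs))  # 6
print(index["apples"])         # 6
```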

~~~
codeflo
I think we mostly agree, but would put it a bit differently.

First, I think performance is a bit of a red herring in this discussion. If a
linear scan is really faster for five entries, a good dictionary
implementation could simply switch to that.

Second, and more to the point, there’s obviously a semantic difference
between a list of pairs and a dictionary. That’s always going to be part of
the “protocol” - whether your serialization format is capable of expressing
that or not.

However, and I think this is where we agree, there are lots of semantic
distinctions that JSON can’t express (dates vs. strings, to pick an example I
encounter often). So in many cases, you’re going to have a deserialization
step anyway that validates the JSON values and converts them into the objects
you actually want to use in your application. Why, among all the semantic
distinctions that JSON doesn’t capture, is the list/dictionary distinction so
special that it needs this kind of syntactic support in a serialization
format?

(BTW, for JSON, the answer is obviously rooted in JavaScript’s type system.
But while that makes it particularly nice to use JSON to serialize
_JavaScript_ objects, the context here is its use as a general-purpose
serialization format.)

~~~
HelloNurse
There's a difference between a list of _pairs_, which is the one that is
equivalent to a dictionary, and a list of _lists_ that all happen to have
length 2; and between an unordered multiset and an ordered list.

------
dan-robertson
I’ve used both json and a cut-down version of sexps (there are no types like
numbers or symbols, all atoms are strings so "foo" and foo are the same sexp)
a lot.

I felt like sexps were much better for humans to write as you don’t need to
quote everything. Maybe this could be fixed in a json extension which allows
words to be read as strings of themselves but if you extend json you lose any
interoperability. Json also doesn’t have comments.

There are some arguments for or against alists/plists instead of json
“dictionaries”. An obvious point is that only one of these allows for non-
string keys. A weak concern I have is about using “the wrong type”, eg if you
use an array you sort of imply an ordering, and an object an unordered
mapping, so should you put unordered data like a set in an array or should you
have a map full of null values?

In recent years, languages with tagged unions like haskell or rust have become
more popular. I feel like there’s no satisfying way to write such a thing in
json (relying on field names to disambiguate is only sometimes possible and
feels tricky to read; having an object with a tag field or an array of [tag,
value] feels unnatural). With a sexp you can just write it as a list of tag
and args, and because everything is lists, it doesn’t feel so unnatural.

This isn’t related to json vs sexps, but having a difference between "123" and
123 in your text format kind of sucks. Either the parser will reject one, or
go to the trouble of parsing both, or there’s a semantic difference between
them, which feels worse. A case where it matters in json is that if your
number isn’t a double, you may want to put it in a string so other
json-reading programs don’t turn it into a double and round it to a different
value.
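A quick Python illustration of that rounding hazard (Python's own json module keeps ints exact, so the float conversion below stands in for readers that parse every number as a double):

```python
import json

# 2**53 + 1 survives as a Python int, but not as an IEEE-754 double,
# which is what most JSON readers use for every number.
big = 2**53 + 1                # 9007199254740993
as_double = float(big)         # rounds to 9007199254740992.0
print(int(as_double) == big)   # False: the double lost the last bit

# Quoting the number as a string sidesteps lossy readers:
doc = json.dumps({"id": str(big)})
print(json.loads(doc)["id"])   # '9007199254740993', exact
```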

~~~
MrManatee
I don’t know how universal this is, but at least in my mind there is a clear
semantic difference between 123 and "123". 123 is a number in the mathematical
sense, maybe counting or measuring something. It probably makes sense to do
arithmetic on it (addition, multiplication, etc.). "123" is a number in the
other sense: some kind of identifier that happens to consist of numerical
digits; something like a phone number, zip code, serial number, etc. It
doesn’t make sense to perform arithmetic with it. And if it happens to begin
with a "+" or a "0" then it’s not ok to just drop that character.

------
fmakunbound

        #S(HASH-TABLE :TEST FASTHASH-EQL (ORANGES . 2) (APPLES . 6) (PEARS . 5))
    

Could have been an ALIST:

    
    
        ((ORANGES . 2) (APPLES . 6) (PEARS . 5))
    

or even a PLIST:

    
    
        (ORANGES 2 APPLES 6 PEARS 5)
    

Those allow for ordered, and even duplicate, keys, unlike the hash-table
approach. No idea if that's applicable to JSON reading though...
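A small Python sketch of that difference, using lists of pairs and dicts as stand-ins for alists/plists and hash-tables:

```python
# A list of pairs keeps order and duplicates; converting to a dict
# (hash-table) silently drops the earlier duplicate entries.

alist = [("oranges", 2), ("apples", 6), ("apples", 99), ("pears", 5)]
plist = ["oranges", 2, "apples", 6, "apples", 99, "pears", 5]

# plist -> alist: pair up alternating keys and values
paired = list(zip(plist[::2], plist[1::2]))
print(paired == alist)  # True

# alist -> dict: the last duplicate wins, the earlier "apples" entry is gone
table = dict(alist)
print(table)            # {'oranges': 2, 'apples': 99, 'pears': 5}
```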

------
waterhouse
For what it's worth, Racket has a pretty terse syntax for literal hash tables:

    
    
      > (let ((x (make-hash))) (hash-set! x 1 3) (hash-set! x 'a 5) x)
      '#hash((1 . 3) (a . 5))
    

They can be read in:

    
    
      > (list (read) 30)
      #hash((2 . 4) (a . b))
      '(#hash((a . b) (2 . 4)) 30)

------
bjoli
Just because the standard lisp reader is readily available doesn't mean you
should use it for anything other than code or small chunks of trusted data.
Was there ever a question of using regular s-expressions in the same way as
json? I don't think I have ever heard people suggest that. Most people either
talk about using (read ...) to read trusted config files (which is fine) or
about defining an s-expression format to address specifically the complaints
of the article.

As others have pointed out, there is nothing stopping a lisp reader from
reading an alist as a hash-table. That is an implementation detail...

~~~
tlb
Indeed, feeding network-supplied input to a lisp reader is asking for trouble.
In principle you can configure a reader to ignore read macros, but every lisp
dialect has different footguns.

But they're great for application data storage. HN stores everything as
s-expressions.
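There's a rough Python analog of that safe-reader idea: eval is the full reader with the footguns, while ast.literal_eval is a reader restricted to literal data only.

```python
import ast

# literal_eval parses data structures but refuses anything executable.
payload = "[1, 2, {'a': 3}]"
print(ast.literal_eval(payload))   # [1, 2, {'a': 3}]

hostile = "__import__('os').getcwd()"
try:
    ast.literal_eval(hostile)      # a Call node is not a literal
except ValueError:
    print("rejected")              # eval() would have run it instead
```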

------
gumby
Did the author know what an alist is? (Hint: check out the assoc function.) In
any case though, there’s no huge difference between the two. The graph
structure semantics are assigned by the generator and consumer code.
------
thelazydogsback
He wants EDN...

------
wglb
I do use s-expressions converted from JSON for some API work that I do. While
I am not suggesting replacing json with s-expressions, I find it easier to
manipulate them.

For those interested, here is a detailed comparison of Lisp JSON conversion
libraries: [https://sabracrolleton.github.io/json-review](https://sabracrolleton.github.io/json-review).

------
andrepd
This completely misses the point imo. S-expressions are a simple and fast way
to serialise data, _which is then deserialised according to some schema_. It's
not up to s-expressions (or json) to represent the end object being
deserialised.

~~~
kzrdude
XML seems similar to S-expressions, in that way.

------
em-bee
if you want to experiment with using s-expressions and your language doesn't
already have a parser/generator, you can find a number of simple
parsers/generators on rosettacode:

[http://rosettacode.org/wiki/S-Expressions](http://rosettacode.org/wiki/S-Expressions)

there is also a good discussion on what issues need to be considered on the
talk page:

[http://rosettacode.org/wiki/Talk:S-Expressions](http://rosettacode.org/wiki/Talk:S-Expressions)

------
jchook
Why would you need key: value representation when (key value) works just as
well?

------
cryptica
JSON is the best interchange format ever invented. I don't understand why
anyone would want to change it. It's:

\- Simple

\- Human and machine readable

\- Compatible

\- Searchable

\- Debuggable

\- Relatively fast encoding/decoding

\- Small size

I think Protocol Buffers is yet another misguided attempt to reinvent the
wheel. If you did a pros and cons analysis on Protocol Buffers, you'd quickly
realize that you lose in simplicity, lose on human readability, lose on
compatibility (since it can introduce a reliance on specific type systems
which different services will have to agree on), it's less searchable, less
debuggable... The only benefit of Protocol Buffers over JSON is that in some
languages it's a bit faster to encode and decode... In some environments the
speed difference is negligible.

I can't think of any use case where ProtoBuf would be superior. I still don't
understand how it could have become so popular. I guess it has Google's name
behind it; that must be why.

~~~
ktpsns
JSON is far from being a universal exchange format. There are plenty of use
cases where JSON is not well suited, such as:

\- regular/tabular data, where JSON fails at processing speed and/or
compressibility. One of the most efficient ways to store, for instance, a
table/array of numbers is packing them as a binary IEEE 754 stream.

\- encapsulation of other code. Ever put HTML in a string within JSON? It's
terrible for a human to read and maintain due to all the escaping. If you have
ever embedded SVG into XML/HTML, or XML/HTML in JSX, you know how easy it can
be when you don't have to worry about escaping.

\- Obviously there is a need for binary JSON (a lot of proposals/standards
exist). That's quite similar to Protocol Buffers. The reason is always
something about performance.

\- Also, people have been arguing for decades about JSON vs. XML vs. YAML vs.
TOML vs .... There are a lot of different opinions, and it is hard to say one
is superior to another.

~~~
cryptica
Trying to standardize binary as a flexible, general-purpose interchange
format is unwise IMO. UDP and TCP (and other protocols at that level) are
about as far as we can go in standardizing binary formats - they are as
specific as they can be whilst still being generally applicable to a wide
range of use cases.

My problem with ProtoBuf is that you can't impose a more rigid structure and
gain more compatibility at the same time. A more rigid (statically typed)
structure is inherently less compatible/interoperable: it requires more
integration effort, not less; it's harder to debug, not human-readable, not
searchable, etc. More integration effort = less compatible, less
interoperable.

