
Rich Hickey: extensible data notation - Peteris
https://github.com/richhickey/edn
======
jacobolus
Nice advantages over JSON: more compact, easier to pretty print, includes an
integer type, non-string map keys, has a nice built-in extension mechanism
(which is much more elegant than any ad-hoc thing that JSON can support).

Things that probably make sense coming from Clojure, but seem somewhat
unnecessary for a general purpose data interchange format: explicit character
type (as compared to length 1 strings (which could optionally use the
extension mechanism if necessary)), separate types for vectors and lists
(seems like the extension mechanism could handle this if it’s ever necessary;
to some extent this criticism holds for sets too, but those are also more
independently useful).

One type not included that I find useful: some kind of "raw" string wherein
backslashes are interpreted literally, and double escapes aren’t required all
over the place.

Possible point of confusion that should be spelled out more explicitly: by the
grammar provided, a floating point number requires an explicit leading digit.
That is, '0.5' cannot be spelled '.5'. (Should an implementation accept '.5'
as a number, or reject it as badly formed?)
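
A minimal sketch of that reading of the grammar, as a Python regular expression. The pattern is an assumption drawn from the grammar as described in this comment (an integer part with a mandatory digit, then an optional fraction, exponent, and suffix). It is not taken from any reference implementation, and `is_edn_number` is a hypothetical helper name.

```python
import re

# Assumed shape of an edn number: optional sign, an integer part with at
# least one digit and no leading zeros, then either an N suffix (integers
# only) or an optional fraction, optional exponent and optional M suffix.
EDN_NUMBER = re.compile(
    r"[+-]?(?:0|[1-9][0-9]*)"      # int: a leading digit is mandatory
    r"(?:N"                        # arbitrary-precision integer suffix
    r"|(?:\.[0-9]+)?"              # frac
    r"(?:[eE][+-]?[0-9]+)?"        # exp
    r"M?)"                         # exact-precision suffix
)

def is_edn_number(token: str) -> bool:
    """Return True if the whole token matches the assumed number grammar."""
    return EDN_NUMBER.fullmatch(token) is not None

for token in ["0.5", ".5", "42N", "1.5e10M"]:
    print(token, is_edn_number(token))
```

Under this reading a strict reader would reject '.5' as a number, which is the case the spec could state outright.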

Also, does _"a floating-point number may have the suffix M to indicate that
exact precision is desired"_ mean that it should be interpreted as a decimal
number? Might be worth saying that directly.

It would be nice to see a bit more guidance about _"Symbols are used to
represent identifiers, and should map to something other than strings, if
possible."_ Perhaps this could include examples of what Rich Hickey & al.
think would be useful interpretations in JavaScript and Python (to pick two
obvious popular examples).

Most of all, it would be nice to see a clear explicit treatment of Unicode for
strings/symbols (I’d recommend utf-8 here), including possible ways of
escaping code points in strings. Confusions about Unicode are one of the main
points of incompatibility between JSON implementations, and the JSON spec has
more to say about the subject than this current spec does.

One nice built-in tag to add: base64 raw data, using one of the typical base64
encodings, which should be described in the spec to avoid any confusion.

Question: if a tagged element is considered a unit, can another tag be put in
front of one? That is, something along the lines of '#outer_tag #inner_tag
"built-in element"', where in the code which interprets the file, whatever
object is produced by the inner_tag's extension is sent as input to the
outer_tag's? It’s worth clarifying this so that implementors make sure to add
the case to their test suites.

~~~
alexchamberlain
.5 cannot be a symbol, so it would make sense to add it to the grammar.

~~~
dsantiago
Actually, .5 does not parse as a number in Clojure, and does parse as a
symbol.

    
    
      user> .5
      CompilerException java.lang.RuntimeException: Unable to resolve symbol: .5 in this context, compiling:(NO_SOURCE_PATH:1) 
      user> (type '.5)
      clojure.lang.Symbol

~~~
alexchamberlain
Interesting, it's not a symbol according to EDN.

~~~
dsantiago
Oh yes, there are a few differences. For example, Clojure also has a radix
notation for integers (as in 16rFF = 255) that doesn't seem to be a part of
EDN. I had assumed that Clojure would cast the deciding vote in these sorts of
things, but now that I think about it, maybe not.

------
falcolas
So, XML, YAML, JSON, and the dozen plus other markup notations were not
suitable because they used C style delimiters instead of Lisp style
delimiters?

Snark aside, what does this buy us that these existing markup languages don't?

I appreciate all that Rich Hickey does for programmers, but aside from his
name, this just adds to the noise that already exists in this domain.

~~~
mattdeboard
I suspect it's just a tool, and the Datomic folks thought they'd make it
available. No one said it was going to change the world, right?

What does it buy "us"? Well, it doesn't buy "us" anything. They needed a data
interchange format and I assume it made more sense to use Clojure's primitives
than parsing up JSON. Why would a project written using a homoiconic language
use anything but that language to exchange data between its components?

edit: Said best by this tweet by fogus:

"Clojure devs have been using #Clojure data as an interchange format all
along. ..."

<https://twitter.com/fogus/status/243913831242399744>

~~~
lloeki
> They needed a data interchange format and I assume it made more sense to use
> Clojure's primitives than parsing up JSON.

Even in JavaScript it's better to parse JSON instead of eval'ing, because you
don't want to execute stuff that would "happen" to be contained in the _data_
interchange format.

~~~
Tuna-Fish
The word REPL comes from Lisp and means Read Eval Print Loop. The significance
is that unlike in, say, Python, Read and Eval are separate. Read reads a
string and turns it into an in-memory data structure, Eval takes a data
structure (not a string!) and evaluates it.

Using Read in a Lisp is typically safe, so long as you turn off reader macros
and the like.

~~~
lloeki
> _The significance is that unlike in, say, Python, Read and Eval are
> separate._

They are separate:

    
    
        * http://docs.python.org/library/ast.html#ast.parse
        * http://docs.python.org/library/ast.html#ast.literal_eval
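
A short sketch of the two functions linked above, using only the standard `ast` module. `ast.parse` turns source text into a syntax tree without running it, and `ast.literal_eval` builds a value from a literal and refuses anything else. The helper name `safe_read` and the sample strings are made up for illustration.

```python
import ast

# "Read" only: the call to print is parsed into a tree, never executed.
tree = ast.parse("print('hi')")
print(type(tree).__name__)

# Literal data is turned into ordinary Python values.
data = ast.literal_eval("{'name': 'Fred', 'tags': ['a', 'b'], 'age': 42}")
print(data["age"])

def safe_read(source):
    """Return the literal value in source, or None if it is not a plain literal."""
    try:
        return ast.literal_eval(source)
    except (ValueError, SyntaxError):
        return None

# A function call is not a literal, so it is rejected instead of being run.
print(safe_read("__import__('os').getcwd()"))
```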

------
snprbob86
I added a spot on the wiki for links to implementations:

<https://github.com/richhickey/edn/wiki/Implementations>

A few months ago, I started doing an implementation of a Clojure reader in C
and a Ruby extension, but I ran out of time to work on it. If anyone wants to
use my code as a starting point, please do! I'd love to see more usage of
Clojure forms... errr... "EDN" in the wild!

------
ChuckMcM
<http://tools.ietf.org/html/rfc4506>

That is 'XDR' (eXtensible Data Representation) which has similar goals and is
reasonably mature. Of course JSON (<http://www.ietf.org/rfc/rfc4627.txt>) does
this as well but using character code points. Not to mention XML and ASN.1. In
Perl there is YAML (<http://search.cpan.org/dist/YAML/>) too.

I'm wondering what this one brings to the table. The readme file doesn't say.

~~~
andrewflnr
Your link says "external", not "extensible", and in fact seems to make no
mention of extensibility.

~~~
ChuckMcM
Absolutely correct, brain fart. The description language was designed to
represent, in XDR, any data structure you could represent in C, but more
importantly to ensure that moving those structures across a network between
disparate architectures would deliver them in the natively 'correct' format
when received.

------
michaelsbradley
It seems one immediate application is in Datomic's new REST API:

 _Datomic gets a REST API_

<http://news.ycombinator.com/item?id=4487467>

I'm developing with Clojure on a daily basis at present, and it's cool to see
such a slick data notation take on new life outside of my REPLs and .clj
files.

Now, what would be really interesting is if _edn_ (or an official superset of
it) could be formalized with "hypermedia controls".

See: _Hypermedia-Oriented Design_

<http://amundsen.com/articles/hypermedia-oriented-design/>

One of the shortcomings of JSON is that as a standardized hypermedia type, it
doesn't offer any hypermedia controls. There are efforts to standardize JSON-
derived types which do provide such controls:

 _HAL_

<http://tools.ietf.org/html/draft-kelly-json-hal-03>

 _JSON-LD_

<http://json-ld.org/>

 _Collection+JSON_

<http://www.amundsen.com/media-types/collection/>

It would be great to see _edn_ take on that challenge in its early days as an
extra-clojure, general purpose data notation. I'm convinced that Fielding is
right about REST implying HATEOAS (others argue for "practical REST"), but you
can't robustly implement REST/HATEOAS APIs with a media type that outright
lacks hypermedia controls.

~~~
jacobolus
Would you mind suggesting some possible formats for this, and explaining a bit
what’s necessary for “hypermedia controls”? Or is there a link somewhere that
defines this more clearly? The hypermedia-oriented design link was kind of
abstract.

~~~
icebraining
Michael can correct me if I'm wrong, but from what I read from Fielding it's
essentially having links and information about the relationship between "this"
and the linked resource as first-class types in the data format.

A concrete example would be the <link> tag in HTML, which with the 'rel' and
'href' attributes can make explicit a relationship between that HTML page and
e.g. the page of its author.

~~~
michaelsbradley
Yes, that's the basic idea.

------
skrebbel
I think that one of the reasons why JSON got so popular is that it was
artificially restricted to a smaller language than the actual object notation
in JavaScript. This meant that it was easy to write a parser for it, which, in
turn, happened a lot.

I see lots of optional extras here that might make humans a little happier but
may increase the chance that different implementations are incompatible.

YAML had this problem, too.

------
dons
As with JSON, no support for algebraic data types (so tagged unions or
recursive data types)?

~~~
richhickey
edn is a mechanism for the conveyance of values. It is not a system for
defining their semantics, types nor schemas.

That said, there should be no problem using edn to convey the values of types
whose definitions are algebraic or recursive.

In particular:

"A tag may specify more than one format for the tagged element, e.g. both a
string and a vector representation."

admits unions.

And the system itself nests arbitrarily, i.e. a tagged element can be defined
in terms of others.

Values of this type:

    
    
        data Tree = Empty
                  | Leaf Int
                  | Node Tree Tree
    

could be conveyed like this:

    
    
        #tree/Tree Empty
        #tree/Tree [Leaf 42]
        #tree/Tree [Node #tree/Tree Empty, #tree/Tree [Leaf 42]]

~~~
dons
Thanks for the info. Having written wire serialization from typed to untyped
formats many times over the years, the limited expressiveness of such formats
has been an ongoing source of annoyance.

------
6ren
Extreme expressiveness makes a data-format harder to understand at a glance;
and data is generally dumb enough not to need it.

Examples are helpful. An example of an example: <http://www.sinatrarb.com/>

~~~
josteink
_Extreme expressiveness makes a data-format harder to understand_

I would argue that extreme expressiveness through the most simplistic rules
makes a data-format easier to understand than some data-format which is only
mediocre-ly expressive through more complex and inconsistent rules.

For instance, Lisp/S-expressions typically have a much simpler _data-format_
than C and are much easier to learn from end to end.

The complexities associated with Lisp-code cannot and should not be
attributed to its syntax, but rather to the fact that most Lisp-code is
written to be purely functional and non-procedural.

While procedural and/or stateful C/C++/Java/C#-code might be easier to
understand for a C/C++/Java/C#-programmer, I don't think you would find any of
those programmers arguing that S-expression _syntax_ is harder to grasp and
master than the complex syntax of C-based languages.

------
Flow
I get that it's a richer format than JSON, with the introduction of sets,
dicts and so on, but what I felt wasn't explained was the "extensible" part.
Can someone give an example of this?

~~~
yusefnapora
I think it refers to the use of tags, which allow you to register tag handlers
to add custom semantics to whatever data structure follows the tag.

~~~
Flow
Ok, as a trigger. But isn't it backwards to have the generator suggest where
to put trigger points instead of the consumer?

~~~
richhickey
> But isn't it backwards to have the generator suggest where to put trigger
> points instead of the consumer?

No. edn is self-describing, and all descriptions bottom out on built-in types.
Having each date/instant or extended type proclaim it is an #inst or #whatever
means a single handler will ensure they all become proper types on the
consumer side, with no knowledge of the application or document structure at
all. And if no specific handler is installed, a generic handler can at least
ensure that the value returned keeps track that the tagged element was tagged,
and how, rather than just silently yielding a string with some encoded cruft
in it.

The world you are describing is that 'context-sensitive' world mentioned in
the edn rationale, where the consuming app must know that the cruft inside the
dob: string inside a particular map in a particular context is actually a date
or :id is a uuid or whatever, and also know how they are represented. Ditto the
lastEdited: field, startDate: and foo: fields etc. Are they all the same?
Handling any particular document means having complete knowledge of such
details.

Context-sensitivity greatly complicates applications and thwarts generic
processing.
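
A rough sketch of the handler idea in Python, not taken from any edn implementation. It assumes a hypothetical reader has already split each tagged element into a tag name and the element that follows it. The names `Tagged`, `HANDLERS`, and `apply_tag` are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict
import uuid

@dataclass(frozen=True)
class Tagged:
    """Generic fallback: keeps the tag and the untouched element together."""
    tag: str
    value: Any

# One handler per tag, installed once, independent of document structure.
HANDLERS: Dict[str, Callable[[Any], Any]] = {
    # Assumes an RFC 3339 style string ending in "Z".
    "inst": lambda s: datetime.fromisoformat(s.replace("Z", "+00:00")),
    "uuid": uuid.UUID,
}

def apply_tag(tag: str, value: Any, handlers=HANDLERS) -> Any:
    """Convert a tagged element, or wrap it if no handler is installed."""
    handler = handlers.get(tag)
    return handler(value) if handler else Tagged(tag, value)

print(apply_tag("inst", "2012-09-07T12:00:00Z"))
print(apply_tag("uuid", "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
print(apply_tag("myapp/person", {"first": "Fred", "last": "Mertz"}))
```

The consumer never needs to know which map key holds a date. Every `#inst` goes through the same handler wherever it appears, and an unknown tag still comes back marked as tagged, not as a bare string.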

------
shaunxcode
<https://github.com/shaunxcode/jsedn>

This is my attempt at a js version. parse is working - now working on encode.

There is really nothing stopping it from working in browser other than I am
currently working on it as an npm/node js package.

I can successfully load datomic schemas and .dtm files which I think is a
pretty good test.

Need a lot more tests around int/float wrt the arbitrary precision. I would
love to have some pull requests if anyone wants to write valid tests which
break it.

------
Groxx
<redacted>

I'll stick with JSON for one simple reason: you can have _anything_ as a key
(as long as it's escaped). The fact that this can't means you either a) can't
use it if you have more-complex keys for some reason, or b) you have to use
some non-standardized packing format to encode your illegal keys.

</redacted>

edit: thanks repliers, I missed part of the spec. Ignore!

~~~
lmkg
False. You can use anything as a key.

> _Note that keys and values can be elements of any type._

The sample code also uses a vector as a key.

~~~
Groxx
Ah, I missed that part. I was going off the keyword == symbol rules, which are
waaaaay more restrictive. Thanks!

------
andrewflnr
I kind of like the tagging syntax. It almost seems like they should really be
lists with the first tag in function position, but I don't think that
would interop with real Clojure. I tried to come up with something similar,
but somehow couldn't generalize to tagging any random object. When in doubt,
generalize...

PS: I'm just learning Clojure.

~~~
threedaymonk
If you mean something like this, it would interoperate just fine:

    
    
        (list 1 2 3)
        (vector 1 2 3)
        (hash-set :a :b :c)
        (hash-map :a 1 :b 2)
    

I suspect that typing hash-map all over the place might get old fast, though.

~~~
andrewflnr
I mean `(#myapp/person {:first "Fred" :last "Mertz"})' or `(#seconds 330)'. In
this case it's probably the extra parentheses that get old fast. Even so,
having semantically connected items not be stuck together makes me a bit
queasy.

------
icrbow
<https://bitbucket.org/dpwiz/hedn>

My stab at a Haskell parser/encoder and type converter. Would be glad for a
friendly hug^W code review and packaging suggestions.

------
tree_of_item
Could the blessed days of the s-expression interchange format finally be here?

