Nice advantages over JSON: more compact, easier to pretty print, includes an integer type, non-string map keys, has a nice built-in extension mechanism (which is much more elegant than any ad-hoc thing that JSON can support).
Things that probably make sense coming from Clojure, but seem somewhat unnecessary for a general purpose data interchange format: explicit character type (as compared to length 1 strings (which could optionally use the extension mechanism if necessary)), separate types for vectors and lists (seems like the extension mechanism could handle this if it’s ever necessary; to some extent this criticism holds for sets too, but those are also more independently useful).
One type not included that I find useful: some kind of "raw" string wherein backslashes are interpreted literally, and double escapes aren’t required all over the place.
Possible point of confusion that should be spelled out more explicitly: by the grammar provided, a floating point number requires an explicit leading digit. That is, '0.5' cannot be spelled '.5'. (Should an implementation accept '.5' as a number, or reject it as badly formed?)
Also, does "a floating-point number may have the suffix M to indicate that exact precision is desired" mean that it should be interpreted as a decimal number? Might be worth saying that directly.
It would be nice to see a bit more guidance about "Symbols are used to represent identifiers, and should map to something other than strings, if possible." Perhaps this could include examples of what Rich Hickey & al. think would be useful interpretations in JavaScript and Python (to pick two obvious popular examples).
Most of all, it would be nice to see a clear explicit treatment of Unicode for strings/symbols (I’d recommend utf-8 here), including possible ways of escaping code points in strings. Confusions about Unicode are one of the main points of incompatibility between JSON implementations, and the JSON spec has more to say about the subject than this current spec does.
One nice built-in tag to add: base64 raw data, using one of the typical base64 encodings, which should be described in the spec to avoid any confusion.
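For instance (a hypothetical #base64 tag, not something the spec currently defines):
#base64 "SGVsbG8sIHdvcmxkIQ=="   ; hypothetical tag wrapping the base64 of "Hello, world!"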
Question: if a tagged element is considered a unit, can another tag be put in front of one? That is, something along the lines of '#outer_tag #inner_tag "built-in element"', where in the code which interprets the file, whatever object is produced by the inner_tag's extension is sent as input to the outer_tag's? It’s worth clarifying this so that implementors make sure to add the case to their test suites.
Oh yes, there are a few differences. For example, Clojure also has a radix notation for integers (as in 16rFF = 255) that doesn't seem to be a part of EDN. I had assumed that Clojure would cast the deciding vote in these sorts of things, but now that I think about it, maybe not.
So, XML, YAML, JSON, and the dozen plus other markup notations were not suitable because they used C style delimiters instead of Lisp style delimiters?
Snark aside, what does this buy us that these existing markup languages don't?
I appreciate all that Rich Hickey does for programmers, but aside from his name, this just adds to the noise that already exists in this domain.
I suspect it's just a tool, and the Datomic folks thought they'd make it available. No one said it was going to change the world, right?
What does it buy "us"? Well, it doesn't buy "us" anything. They needed a data interchange format and I assume it made more sense to use Clojure's primitives than parsing up JSON. Why would a project written using a homoiconic language use anything but that language to exchange data between its components?
edit: Said best by this tweet by fogus:
"Clojure devs have been using #Clojure data as an interchange format all along. ..."
> They needed a data interchange format and I assume it made more sense to use Clojure's primitives than parsing up JSON.
Even in JavaScript it's better to parse JSON instead of eval'ing, because you don't want to execute stuff that would "happen" to be contained in the data interchange format.
The word REPL comes from Lisp and means Read Eval Print Loop. The significance is that, unlike in, say, Python, Read and Eval are separate: Read reads a string and turns it into an in-memory data structure; Eval takes a data structure (not a string!) and evaluates it.
Using Read in a Lisp is typically safe, so long as you turn off reader macros and the like.
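A rough Clojure sketch of that distinction:
(binding [*read-eval* false]               ;; disable the #= reader-eval macro
  (read-string "{:name \"Fred\" :age 42}"))
;; => {:name "Fred", :age 42}  (plain data; nothing was evaluated)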
How so? Prefix notation reads left-to-right nicely when you are dealing with verbs. Boolean logic is less natural, but also makes up much less of my programs (especially when using functional patterns like predicates, map, reduce, filter, etc.).
I also think that while Boolean expressions in Lisp are less natural than "this AND that OR another", they are clearer when they get large (with good formatting of course).
In English we say 1 + 2 * 3 as "one plus two times three":
1 + 2 * 3
one plus two times three
But in Lisp, when read naïvely left-to-right, this becomes "times add one two three", which is nonsense in English.
(* (+ 1 2) 3)
times add one two three
Most people cite "lots of brackets" as their reason for disliking s-expressions, but I think the unfamiliar ordering is a less superficial criticism. Of course, for those of us manipulating ASTs they're very natural, because we see language as trees.
People always mention the infix operators, but really, how often do you come across them in your code? Unless you're working on something highly mathematical, it really isn't that big of a deal. Otherwise, s-exprs are visually just a matter of rearranging parentheses for basic function calls.
Of course 1 + 2 * 3 is more familiar that way, vs the Lisp way. But you have the Lisp way wrong (unless your example was Smalltalk)... it would be:
(+ 1 (* 2 3))
Which could be read left-to-right as "sum 1 and the product of 2 and 3".
Also, most math functions in Lisp are variadic. So if you have this in infix:
x + y + z + 1
// x plus y plus z plus one
You could do it in Lisp as:
(+ x y z 1)
;; sum x y z and one
There are plenty of cases where an algorithm is naturally expressed as a map or a reduce, where infix or prefix has nothing to do with it. I don't find myself using a lot of the kind of math that would be better in infix.
It might even be useful to give math operators new names in Lisp, because of the way that things like < > = == are a bit awkward when read as their equivalents in infix. They are all variadic, after all.
Compare:
(x == 10 && y == 10) && (x < y && y < z && z < 100)
Vs.
(and (== x y 10) (< x y z 100))
That's already an improvement, but the reading is strange. So what if we alias those operators?
(all-true? (same? x y 10) (ascending? x y z 100))
I don't know if I'd actually use those names, and I don't think I would do this in code, but you get the point. You can even tell how a list of numbers is sorted in Lisp by simply doing:
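Something like:
(apply < [1 3 4 9])   ;; => true, the numbers are strictly ascending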
Lisp disappointed a lot of people in the early 2000s. It wasn't ready for a close-up examination by people who cut their teeth on Perl, Python, and Java. All the little inconveniences, the lack of software repos, the lack of organized open-source projects, the attitude that people shouldn't expect to get work done right away, the "why do you need a library for X when you can write it in twenty lines of code?" The Lisp community was pretty satisfied with the way it did things (which I think was more a reasonable attitude than it appeared from the outside) but they were not prepared to explain themselves to a mob of curious people with entirely different expectations for a programming language community.
Erasing that initial bad impression may be a generational thing. Even heroin goes through cycles where a new generation comes along that hasn't seen anyone die of heroin, they make heroin cool for a while, and then the horrifying results inoculate the culture against the idea that heroin is cool for another fifteen or twenty years. Lisp will get another chance, and it can afford to wait.
Also, I think next time around it won't be Lisp, the universal solution. It will be a Lisp. It might not even be called Lisp. It might be called "Clojure" or something odd like that ;-)
I can't get hold of a blog article by an old Lisper expressing his lack of understanding of the so-called Lisp library problem. It was a clear summary of the fallacy.
More and more I hear people say "they've been doing this in Lisp all along", so my naive guess is that in the future Lisp's genes will reappear in more favorable settings and won't get dismissed as they were before. It's already happening with GC and closures, right?
Above JSON: Sets, keywords, integer type, hashmaps in which the key doesn't have to be a string, extensible data types (most of which jacobolus points out as well)
JSON is a lowest common denominator. The features that you just specified are not directly available in JavaScript. Therefore data represented in this new format will be very, very clunky to access from JavaScript.
You say "lowest common denominator," I say "subset of JavaScript." In fact, JSON objects are clunky to access in some languages, like C. JavaScript's lack of proper sets and maps is a weakness of the language's standard libraries and is in fact being corrected. It seems quite unreasonable to me to use JavaScript as some sort of Platonic ideal of a programming language, where anything that has a different feature set is automatically bad.
It's kind of a big leap to go from "This is not a strict subset of JavaScript" to "You're ignoring the web." If you really want to deal only with JavaScript-native data structures, I can't see any reason why you couldn't deserialize EDN into those. If you really like the added data structures in EDN and don't want to lose those in your JavaScript program, you could just use Mori or something similar.
If improvement in JavaScript, which sits in browsers that developers have no control over, is a prerequisite for the success of a protocol, that protocol will be irrelevant in the web world.
In general when you're trying to achieve interoperability, any features and distinctions supported in your data format that are available in one language and not in another will be causes of interoperability problems.
For instance with JSON, the format allows you to differentiate between values that are numbers and values that are strings. Languages like PHP, Perl and JavaScript do not particularly care about that distinction, and therefore coding values consistently is hard in those languages. (You can guess, but sometimes guess wrong - your zip codes are usually represented as numbers and sometimes strings.) Languages like Ruby and Python do care about the distinction, and so applications can have difficulty accepting JSON from the previous languages.
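A concrete illustration (made-up values): a US zip code keeps its leading zero as a string but loses it as a number, so the same record can come out as either of these depending on the producer:
{"zip": "02138"}
{"zip": 2138}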
This new format allows more places for this kind of issue to exist. We have distinction between different types of integers. We have symbols (2 kinds if you have them). We have sets. We have 2 different kinds of lists. And we have a plethora of possible extensions, every one of which can behave differently in different implementations.
When both sides speak Clojure, this is going to be a very convenient choice. When they don't, every one of these obvious features is going to be a source of headaches down the road.
That is why data exchange formats should be in a least common denominator between the languages in question.
If I were to agree with some of your points, it would have to be that the distinction between a list and a vector seems rather silly.
Having those as distinct types in a data-exchange format seems awfully (Lisp/functional-language) implementation-specific. IMO, conversion from "generic list" to "whatever type of list makes sense in your domain" should really be done outside of a data-interchange format. A data-exchange format should not need to meddle with any list besides "generic list".
I do think sets (as opposed to lists) brings valuable semantics, though I can see how that is up for debate.
Otherwise, you bring up fair points, but I don't mind a format with ambition. Merely presenting a new JSON would not be very exciting.
There is some redundancy between vectors and lists, admittedly.
However, edn is not without legacy. Part of that legacy is Lisp and s-expressions, and deserves some respect and accommodation. If that means readers map the [] and () to the same code path, well, that's not much work. Writers can always write only [] if they prefer.
Having the distinction has proven tremendously useful to Clojure, and might to other applications as well. In particular, consider DSLs written in edn, which, like Clojure, might need to distinguish calls or grouping from list/vector data. For all the bitching that goes on about parens, few people would want to work in a language without them :)
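Datomic's query language is one example of such a DSL: data patterns are vectors, while calls show up as lists inside them, e.g.
[:find ?e :where [?e :person/age ?age] [(> ?age 21)]]
(:person/age here is just an attribute name made up for illustration.)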
Maps with non-string keys are handled differently even in languages that do support them. Are two instances of an object used as a key the same, for example? That may vary depending on the type of the object.
> The features that you just specified are not directly available in JavaScript.
JavaScript has hashmaps with non-string keys, they're just not exposed by JSON. Also a set can be treated as an array in a non-clunky manner. Likewise for treating an integer as a Number.
I'm not sure the design goal of edn is to actually be consumed from JavaScript though?
Stuart Halloway mentioned in one of his talks that SOAP, iirc, did way too much and that as a reaction to that, JSON went too far in the bare bones direction. Looks like edn is the balance Rich Hickey and associates have struck. From a Clojure programmer point of view, I like what I see.
From what I can tell, not much, but it adds another form of extension (via tags) which you can arguably say you also had in XML (via namespaces).
Other than that, the major difference seems to be that EDN does not have a root-element or requirement for such, and as mentioned in the introduction can be used for streaming and real-time content etc.
A few months ago, I started doing an implementation of a Clojure reader in C and a Ruby extension, but I ran out of time to work on it. If anyone wants to use my code as a starting point, please do! I'd love to see more usage of Clojure forms... errr... "EDN" in the wild!
That is 'XDR' (eXtensible Data Representation), which has similar goals and is reasonably mature. Of course JSON (http://www.ietf.org/rfc/rfc4627.txt) does this as well, but using character code points. Not to mention XML and ASN.1. In Perl there is YAML (http://search.cpan.org/dist/YAML/) too.
I'm wondering what this one brings to the table. The readme file doesn't say.
JSON is not extensible and syntactically more verbose. However, the syntax corresponds exactly to many modern dynamic languages and is now nearly ubiquitous.
I hope Rich will define "alphanumeric" as Unicode letter and number classes. Clojure itself is ambiguous on valid symbol identifiers. http://clojure.org/reader has the same "alphanumeric plus some punctuation" rule, but the actual reader implementation [1] is extremely permissive:
YAML is not just limited for use in Perl, nor is it as closely related to Perl as JSON is to Javascript. One of its creators, Ingy döt Net, was (and still is) an active member of the Perl community, so Perl had excellent support for YAML from the very beginning.
The founding members of YAML are Ingy döt Net (author of the Perl module Data::Denter), Clark Evans, and Oren Ben-Kiki. YAML emerged from the union of two efforts. The first was Ingy döt Net's need for a serialization format for Inline, which resulted in his Data::Denter module. The second was the joint work of Oren Ben-Kiki and Clark Evans on simplifying XML within the sml-dev group. YAML was first publicized with a <?xmlhack?> article on 12 May 2001. Oren and Clark's vision for YAML was very similar to Ingy's Data::Denter, and vice versa, so a few days later they teamed up and YAML was born.
Absolutely correct, brain fart. The description language was designed to represent, in XDR, any data structure you could represent in C, but more importantly to ensure that moving those structures across a network between disparate architectures would return them in the natively 'correct' format when received.
I'm developing with Clojure on a daily basis at present, and it's cool to see such a slick data notation take on new life outside of my REPLs and .clj files.
Now, what would be really interesting is if edn (or an official superset of it) could be formalized with "hypermedia controls".
One of the shortcomings of JSON is that as a standardized hypermedia type, it doesn't offer any hypermedia controls. There are efforts to standardize JSON-derived types which do provide such controls:
It would be great to see edn take on that challenge in its early days as an extra-clojure, general purpose data notation. I'm convinced that Fielding is right about REST implying HATEOAS (others argue for "practical REST"), but you can't robustly implement REST/HATEOAS APIs with a media type that outright lacks hypermedia controls.
Would you mind suggesting some possible formats for this, and explaining a bit what’s necessary for “hypermedia controls”? Or is there a link somewhere that defines this more clearly? The hypermedia-oriented design link was kind of abstract.
Michael can correct me if I'm wrong, but from what I read from Fielding it's essentially having links and information about the relationship between "this" and the linked resource as first-class types in the data format.
A concrete example would be the <link> tag in HTML, which with the 'rel' and 'href' attributes can make explicit a relationship between that HTML page and e.g. the page of its author.
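In edn, such a control might look something like this (just a sketch, nothing standardized):
{:rel :author :href "http://example.com/people/mike"}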
I'm not sure about a format just yet, but I'm fairly certain it would be a superset of edn (as opposed to imposing it on "plain old edn"). The Document Format spec for Collection+JSON is probably not a bad starting point:
Seems like it could readily be adapted to edn syntax. But I find myself drawn to the RDF concepts underlying JSON-LD, and wonder if the C+J and LD ideas could be melded together.
A solid exploration of the larger concepts involved in hypermedia controls can be found in Mike Amundsen's recent book:
The basic idea is that, per Fielding, HATEOAS is an aspect of REST that shouldn't be ignored. Standardized media types with well-defined hypermedia controls allow clients that understand those standards to make state transitions without having to "know other rules" that aren't contained in the representations themselves.
Our web browsers do this all the time when we click on links and submit forms that we find in web pages. Based on the X/HTML standard, our browsers (hypermedia clients) know what to do with links and forms, i.e. how to initiate the appropriate state transitions (build and run GET/POST requests) based on the markup semantics.
Now, if a client is driven programmatically (i.e. not by a human reasoning about the appearance and purpose of a form-button labeled "login", etc.) then you need a bit more besides links and forms. This is where attributes like "rel" (for `a` and `link`) become important. There's been a lot of work in that area, e.g. Microformats and RDFa, among others.
I think that one of the reasons JSON got so popular is that it was artificially restricted to a smaller language than the actual object notation in JavaScript. This meant that it was easy to write a parser for it, which, in turn, happened a lot.
I see lots of optional extras here that might make humans a little happier but may increase the chance that different implementations are incompatible.
Thanks for the info. Having written wire serialization from typed to untyped formats many times over the years, I've found the limited expressiveness of such formats to be an ongoing source of annoyance.
Is it only me who has a strong feeling that this vector is an ugly construct here: [Node #tree/Tree Empty, #tree/Tree [Leaf 42]]? A traditional S-expression, which just describes an abstract structure (a list, not a particular data structure such as a vector), seems like a much better way.
And [Leaf 42]? What is this? Vector? Of elements of arbitrary type?
But a list of arbitrary elements is a quite natural and human-mind-friendly concept, and the Lisp notation, which intentionally omits any description of underlying representation, is a great idea.
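For comparison, the same structure written as a plain S-expression might be something like:
(Node (Tree Empty) (Tree (Leaf 42)))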
> Extreme expressiveness makes a data-format harder to understand
I would argue that extreme expressiveness through the most simplistic rules makes a data-format easier to understand than a data-format that is only moderately expressive through more complex and inconsistent rules.
For instance, Lisp/S-expressions typically have a much simpler data-format than C and are much easier to learn from end to end.
The complexities associated with Lisp code cannot and should not be attributed to its syntax, but rather to the fact that most Lisp code is written to be purely functional and non-procedural.
While procedural and/or stateful C/C++/Java/C# code might be easier to understand for a C/C++/Java/C# programmer, I don't think you would find any of those programmers arguing that S-expression syntax is harder to grasp and master than the complex syntax of C-based languages.
At a glance, is the order of follows relevant? Should it be preserved? Is "anon" the same kind of thing as "ben" or "billy-jo"? Is the key in message-counts another name, or some collection thereof?
I get that it's a richer format than JSON, with the introduction of sets, dicts and so on, but what I felt wasn't explained was the "extensible" part. Can someone give an example of this?
> But isn't it backwards to have the generator suggest where to put trigger points instead of the consumer?
No. edn is self-describing, and all descriptions bottom out on built-in types. Having each date/instant or extended type proclaim it is an #inst or #whatever means a single handler will ensure they all become proper types on the consumer side, with no knowledge of the application or document structure at all. And if no specific handler is installed, a generic handler can at least ensure that the value returned keeps track that the tagged element was tagged, and how, rather than just silently yielding a string with some encoded cruft in it.
The world you are describing is that 'context-sensitive' world mentioned in the edn rationale, where the consuming app must know that the cruft inside the dob: string inside a particular map in a particular context is actually a date or :id is a uuid or whatever, and also know how they represented. Ditto the lastEdited: field, startDate: and foo: fields etc. Are they all the same? Handling any particular document means having complete knowledge of such details.
Context-sensitivity greatly complicates applications and thwarts generic processing.
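For example, with the built-in #inst tag:
{:dob #inst "1985-04-12T23:20:50.52Z"}   ;; self-describing: any reader knows this is an instant
{:dob "1985-04-12T23:20:50.52Z"}         ;; context-sensitive: the consumer must just "know" that :dob holds a date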
This is my attempt at a js version. parse is working - now working on encode.
There is really nothing stopping it from working in browser other than I am currently working on it as an npm/node js package.
I can successfully load datomic schemas and .dtm files which I think is a pretty good test.
Need a lot more tests around int/float wrt the arbitrary precision. I would love to have some pull requests if anyone wants to write valid tests which break it.
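If anyone wants a starting point, values exercising the spec's precision suffixes are where I'd expect breakage, e.g.:
42N        ;; arbitrary-precision integer
3.14159M   ;; exact-precision decimal
1.0e308    ;; near the edge of double range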
I'll stick with JSON for one simple reason: you can have anything as a key (as long as it's escaped). The fact that this format can't means you either a) can't use it if you have more-complex keys for some reason, or b) have to use some non-standardized packing format to encode your illegal keys.
edit: thanks repliers, I missed part of the spec. Ignore!
Well, if by "anything" you mean strings, that is very limited.
> 2.2. Objects
> An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.
I kind of like the tagging syntax. It almost seems like they should really be lists with the first tag in function position, but I don't think that would interop with real Clojure. I tried to come up with something similar, but somehow couldn't generalize to tagging any random object. When in doubt, generalize...
I mean `(#myapp/person {:first "Fred" :last "Mertz"})' or `(#seconds 330)'. In this case it's probably the extra parentheses that get old fast. Even so, having semantically connected items not be stuck together makes me a bit queasy.