
Mark – A simple and unified notation for both object and markup data - henryluo
https://github.com/henry-luo/mark
======
TeMPOraL
So this is _literally Lisp_, just with curly braces instead of parentheses
:).

I don't understand this part of the readme:

> _The advantage of Mark over S-expressions is that it is more modern, and can
> directly run in browser and node.js environments._

Does this mean I'm so out of date with JS that this syntax is actually legit
JS? Or does Mark run its own parsers, at which point it's just like sexps,
except it uses the "more modern" curly brace instead of "less modern"
parentheses?

~~~
dragonwriter
> So this is literally Lisp

No, it's close to S-expressions, but it's not a programming language like
Lisp.

> just with curly braces instead of parentheses

Well, and more syntax than S-expressions: it's got both objects and arrays as
fundamental structures instead of just lists, and it has commas as noise
characters.

~~~
zeveb
> Well, and more syntax than S-expressions: it's got both objects and arrays
> as fundamental structures instead of just lists

Well, Common Lisp has both of those as fundamental atoms:

    
    
        #S(foo :bar 3)
    
        #(1 2 3)
    

The former creates an instance of a FOO struct with its BAR slot set to 3; the
latter is a 3-item array.

~~~
nanny
And then CLOS objects, hash tables, regular expressions, etc. can be added
with reader macros.

~~~
TeMPOraL
It's cool and I love it, but it's irrelevant in the context of a universal
data exchange language. In such case, you'd want to have those primitives
defined in the exchange language spec itself - even if you'd end up
implementing them as reader macros in CL (which I don't recommend - reader
macros turn your READs into EVALs, which you obviously shouldn't do on
untrusted input).

~~~
zeveb
> you'd want to have those primitives defined in the exchange language spec
> itself

I agree with this: certain things need to be in the spec.

> even if you'd end up implementing them as reader macros in CL (which I don't
> recommend - reader macros turn your READs into EVALs, which you obviously
> shouldn't do on untrusted input).

This I don't agree with, because technically #\( & #\" are reader macros …
they're just very well-defined reader macros. Presumably a spec which defined
hash tables, regular expressions or whatever would define them as well as the
Lisp spec defines lists and strings (and if not, well — it's a bad spec!).

~~~
TeMPOraL
> _I agree with this: certain things need to be in the spec._

That was my main point. In the second part of the comment I didn't mean to
discourage use of reader macros - it was more of an aside that the general
facility of CL-style reader macros literally makes READ "shell out" into EVAL,
so you need to (diligently) disable it for untrusted input (or reimplement a
limited READ by hand). So we can't say "oh, but S-exps in Common Lisp can have
anything through reader macros". Presumably _if_ hash table literals were
specified as a part of basic syntax, we could depend on it being standard
_and_ part of the safe subset of READ's duties; as it is however, we can't
depend on it for arbitrary inputs.

------
tonyg
One major weakness of JSON is lack of a corresponding "infoset"; that is, an
equivalence predicate. When are two JSON blobs "the same"? There's no sign of
anything like this here.
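
A minimal Python sketch of the ambiguity, using only the standard `json` module. The pairs below are illustrative; whether each pair should count as "the same" is exactly the question an infoset would have to answer, and the predicate shown is just one possible choice (Python's own value equality), not anything a JSON or Mark spec defines.

```python
import json

# Pairs of JSON texts that differ bytewise. Whether they are "the same" is a
# policy decision that JSON itself does not make.
pairs = [
    ('{"a": 1, "b": 2}', '{"b": 2, "a": 1}'),  # key order
    ('1', '1.0'),                               # integer vs. float spelling
    ('"\\u0041"', '"A"'),                       # escaped vs. literal character
    ('{"a": 1, "a": 2}', '{"a": 2}'),           # duplicate keys
]

def same_after_parsing(x: str, y: str) -> bool:
    """One possible equivalence: compare the values Python's parser produces."""
    return json.loads(x) == json.loads(y)

results = [same_after_parsing(x, y) for x, y in pairs]
print(results)
```

Under this particular predicate every pair compares equal, but a parser in another language could reasonably distinguish `1` from `1.0` or reject duplicate keys outright, which is why an explicit definition matters.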

Another is the lack of support for binary data. There's no sign of support for
binary data here.

Finally, there's this claim:

> _The advantage of Mark over S-expressions is that it is more modern, and can
> directly run in browser and node.js environments._

Is it more modern? I don't think I care.

Can it directly run in browser and node.js environments? What does that mean?
It seems to need a parser. But then, S-expression parsers certainly directly
run in browser and node.js environments.

---

IMO, SPKI SEXPs are _much_ more sensible than this design and many, many other
designs:

[https://people.csail.mit.edu/rivest/Sexp.txt](https://people.csail.mit.edu/rivest/Sexp.txt)

~~~
zeveb
> IMO, SPKI SEXPs are much more sensible than this design and many, many other
> designs

Yes, yes, ten thousand times _yes_! I really don't understand why, over two
decades later, the world has stuck with XPKI & ASN.1, and has invented XML &
JSON, when SPKI solved the PKI problem for good & canonical S-expressions
solved the flexible- and human-readable–data-exchange problems for good.

~~~
Groxx
Since you both seem to know the spec: how would you encode key/value pairs? Or
would you have to have a list of nested lists, like

    
    
        (my_dict (key value) (key value) (key value))
    

Un-ordered qualities for data can be useful (e.g. they allow you to reorder
data to stream "important" stuff first), but I don't see it anywhere in here.

~~~
zeveb
With canonical S-expressions, unordered sets are a problem because part of the
point is to be able to have a single canonical sequence of bytes, which can be
hashed or compared bytewise for equality.

In general, I'd resist specifying data as arbitrary key-value pairs, but _if_
I decided that I indeed needed them, I'd do exactly as you suggest — and I'd
mandate that they be sorted lexicographically by their keys.
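
As a rough Python sketch of that approach: atoms are written in the length-prefixed canonical form (`<length>:<bytes>`), and a dictionary becomes a named list of `(key value)` pairs sorted by key bytes. The helper names `csexp` and `csexp_dict` are made up for this illustration, and only string keys and values are handled.

```python
def csexp(value) -> bytes:
    """Encode nested lists of atoms as a canonical S-expression.

    Atoms (str or bytes) become <length>:<bytes>; lists become ( ... ).
    """
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, (list, tuple)):
        return b"(" + b"".join(csexp(v) for v in value) + b")"
    raise TypeError(f"cannot encode {type(value).__name__}")

def csexp_dict(name: str, mapping: dict) -> bytes:
    """Encode a dict as (name (key value) ...), pairs sorted by key bytes."""
    pairs = sorted(mapping.items(), key=lambda kv: kv[0].encode("utf-8"))
    return csexp([name] + [[k, v] for k, v in pairs])

# Insertion order differs, but the encoded bytes do not.
a = csexp_dict("my_dict", {"b": "2", "a": "1"})
b = csexp_dict("my_dict", {"a": "1", "b": "2"})
print(a)  # b'(7:my_dict(1:a1:1)(1:b1:2))'
```

Because the byte sequence no longer depends on insertion order, it can be hashed or compared bytewise.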

------
olavk
Each existing format has advantages and disadvantages for particular
purposes.

Benefit of HTML: You can actually write it by hand and easily see where each
element begins and ends, even when the document is longer than a screenful.
Mark has the "}}}}}" problem with larger documents, so it is not as suitable
for human-written markup.

It is not clear to me how mixed content like <cite>Hello <i>world</i></cite>
is expressed in Mark. I expect it will be pretty convoluted.

Benefits of JSON: Maps directly to simple data structures: lists, dictionaries
and simple values. Similar data structures are supported in almost any
language. Mark has "type names" and anonymous text content, which complicate
serialization and deserialization a lot, and are sure to cause interoperability
(and perhaps security) problems.

So - worst of both worlds? Instead of trying to be an overall worse alternative
to all the formats, they should rather focus on a specific niche where Mark
can be a better alternative.

Take configuration files, for example. They don't have large amounts of textual
content like HTML, and they don't need to be transferred between disparate
systems.

    
    
       {size width:100 height:100}
    

vs

    
    
       <size width="100" height="100"></size>
    

vs

    
    
      {"size": {"width":100,"height":100 }}
    

In this case, the Mark syntax is simpler and cleaner. Mixed content is not
needed, which would make the format simpler. Yeah it is basically the same as
S-expressions, but that is not a bad thing.

~~~
phoe-krk
_Mark has the "}}}}}" problem with larger documents, so it is not as suitable
for human-written markup._

And HTML has a problem of
</span></li></ul></div></div></div></div></body></html>, all spread over nine
different lines, one tag per line.

Take a look at
[https://github.com/keithj/alexandria/blob/master/definitions...](https://github.com/keithj/alexandria/blob/master/definitions.lisp#L24)
which is Lisp code styled in a standard manner. I don't see any problem there.

~~~
olavk
The HTML example is _much_ better than "}}}}}" though, since you can e.g. add a
new item at the end of the list without needing a specialized editor to locate
the right position. This is one of the reasons for the redundancy in repeating
the tag name in the end-tag. In theory Lisp should have the same problem, but
code (hopefully) rarely has nested blocks larger than a screen, so it
is not a big issue in practice, even if )))) looks ugly. Bottom line is code
has a different structure than typical hypertext documents, so just because a
notation is suitable for one does not mean it is suitable for the other.

~~~
henryluo
But when an S-expression is used to represent a document rather than a program,
deeply nested content is no easier to refactor. So S-expressions are no better
than XML/HTML/JSON/Mark when it comes to deeply nested content.

~~~
olavk
It is exactly the same as Mark, but I am arguing it is _worse_ than XML/HTML
for these scenarios.

------
nwellnhof
Comparing {mark} to XML, it doesn't seem to support namespaces which makes the
claim to be extensible somewhat dubious. How am I supposed to add custom
objects without risking name clashes? Namespaces also make XML kind of fully
typed without being tied to a single programming language.

Another strength of XML is support for mixed content which seems rather
awkward in {mark}. The following

    
    
        <p>Some <b>bold</b> text</p>
    

apparently needs to be written as

    
    
        {p 'Some' {b 'bold'} 'text'}
    

It would be more honest to mark support for mixed content as "verbose" in the
feature table.
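
To see what the mixed-content example actually contains, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The nested-list shape `[tag, attrs, child, ...]` is only an illustration chosen for this sketch, not Mark's data model.

```python
import xml.etree.ElementTree as ET

def to_nested(elem):
    """Return [tag, attrs, child, ...]; each child is a str or a nested list."""
    node = [elem.tag, dict(elem.attrib)]
    if elem.text:
        node.append(elem.text)          # text before the first child element
    for child in elem:
        node.append(to_nested(child))
        if child.tail:
            node.append(child.tail)     # text following this child element
    return node

tree = to_nested(ET.fromstring("<p>Some <b>bold</b> text</p>"))
print(tree)  # ['p', {}, 'Some ', ['b', {}, 'bold'], ' text']
```

Note that the text runs are `'Some '` and `' text'`, with their spaces, so any quoted notation has to carry that whitespace inside the quotes to round-trip the paragraph.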

Besides, the name {mark} seems like a bad idea. How could you find relevant
results when searching for {mark} using a search engine?

~~~
henryluo
Current Mark design does not enforce a namespace standard. Namespace can be
easily captured in Mark, e.g. {'ns:elmt' 'xml:attr':'value' ...}

XML Namespaces seem to have a lot of issues, so Mark does not want to enforce
something that follows them exactly.

Namespaces in Mark are currently left up to the application user to define.

We might be able to come up with a better way to define namespace.

As for the name, you can just use Mark. I use '{mark}' as an alternative name,
to make it more graphical, more impressive.

~~~
specialist
Please don't.

XML Namespaces is syntactic vinegar.

Less is more.

------
castis
Needs a "Why was Mark created?" section because this appears more 'neat' than
'useful'.

~~~
henryluo
Yes, I'll do that.

------
jasonjayr
> The advantage of Mark over S-expressions is that it is more modern, and can
> directly run in browser and node.js environments.

There seems to be a ton of s-expression parsers in npm already, that can run
in browser and in node.js:
[https://www.npmjs.com/search?q=s-expression](https://www.npmjs.com/search?q=s-expression)

Besides being able to run in js environments, what else does {mark} bring over
s-expressions?

------
meddlepal
"Whoever does not understand Lisp is doomed to reinvent it"

- A wise man on the Internet once said

------
lolive
Defining the type of objects is a must when you want to exchange things in a
strongly typed environment (Java on the server, TypeScript on the client, for
ex). So +1 for {mark}. Do you handle multiple typing? (We use that a lot in
Neo4J, and we think it is really neat)

Another comment: Coming from a Semantic Web background, and using N3 as the
exchange format and N3.parse() as my client-side lib, I would advise to have a
UID parameter to uniquely identify objects, and a refId syntax, so any
parameter can reference some other objects of the data structure. That helps
when you want to transmit a graph [1].

My humble 2 cents.

[1]: I would add that it is also useful when you retrieve some refIds that are
not defined in the current data structure. You can then ask the server to
dereference these refIds, and send another (portion of the) graph, that you
can connect with the existing data structure.
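
A rough Python sketch of that idea, on top of plain JSON. The field names `uid`, `refId` and `types` are assumptions made for this illustration; they are not part of Mark, N3, or any existing library.

```python
import json

# Hypothetical wire format: every object carries "uid"; a reference to another
# object is written as {"refId": <uid>}. "types" is a list, so an object can
# have several types at once.
payload = json.loads("""
[
  {"uid": "p1", "types": ["Student", "MartialArtist"], "name": "Ann",
   "teacher": {"refId": "p2"}},
  {"uid": "p2", "types": ["Person"], "name": "Bob",
   "employer": {"refId": "org9"}}
]
""")

def link(objects):
    """Replace resolvable {"refId": ...} values in place.

    Returns the uid index and the set of ids that were not in this payload.
    """
    by_uid = {obj["uid"]: obj for obj in objects}
    missing = set()
    for obj in objects:
        for key, value in obj.items():
            if isinstance(value, dict) and set(value) == {"refId"}:
                target = by_uid.get(value["refId"])
                if target is None:
                    missing.add(value["refId"])   # ask the server for these later
                else:
                    obj[key] = target             # existing key, so size is unchanged
    return by_uid, missing

by_uid, missing = link(payload)
print(by_uid["p1"]["teacher"]["name"], sorted(missing))  # Bob ['org9']
```

The `missing` set is the list of refIds a client would send back to the server to fetch the next portion of the graph.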

~~~
jsd1982
Can you clarify what is meant by "multiple typing"?

~~~
lolive
Let's say you transmit an object of type Person, that is also a Student and a
MartialArtist. Your inheritance graph may define that a Student is also a
Person. So not sending the Person type could be fine. But would you define a
common subtype for Student+MartialArtist, just because your data serialization
handles only one single type per object? Obviously no! You want to send your
object with types "Student" and "MartialArtist". I.e., multiple types.

------
brianolson
If you want a better JSON, try binary-json "Concise Binary Object
Representation" RFC 7049 [http://cbor.io/](http://cbor.io/)

~~~
imtringued
You either go with JSON because everything talks JSON or you go with something
that doesn't have an explicit parsing step like flatbuffers or capnproto.

If you don't care about parsing CPU efficiency then gzipped JSON beats
protobuffers, CBOR, etc when you care about bytes sent over the wire.

If you care about CPU efficiency then protobuffers, CBOR, etc are worse than
flatbuffers or capnproto.

There is not a lot of space for a new standard between these two existing
categories.

~~~
kentonv
> gzipped JSON beats protobuffers, CBOR, etc when you care about bytes sent
> over the wire.

Gzipped JSON does _not_ beat gzipped Protobufs in message size. Comparing
gzipped JSON to uncompressed Protobuf doesn't make sense.

------
rco8786
Interesting project but a little overboard with the self back-patting in the
README.

------
rdtsc
> The advantage of Mark over S-expressions is that it is more modern

Is it more modern because it is newer? There is mention of how adoption is
limited, but wouldn't the adoption of a completely new syntax be even more
limited :-)

There is even a canonical representation using length prefixes:
[https://en.wikipedia.org/wiki/Canonical_S-expressions](https://en.wikipedia.org/wiki/Canonical_S-expressions)

~~~
henryluo
Being 'more modern' means Mark takes a JS-first or web-first approach in its
design. Whether we like it or not, JS has dominated the web. JSON is
successful, partly because it takes a JS-first approach. Mark inherits this
approach.

------
djsumdog
The biggest problem with these ideas is that json is already supported in the
browser.

There might be a use case where your data is better represented in LDIF
because it's hierarchical, but there's no built in LDIF support, so now you're
importing a ton-o-javascript just to parse some new format.

At this point, we should realize json isn't meant to be human readable anyway.
If you need to hunt through it, you put it into some type of json viewer so
you can see the tree and query it. It's an interchange format, that's more
compact than XML.

If you're shipping data between non-browser things like backend services,
there are already binary formats like protobuff that have typing and can be
optimized for small payloads.

------
chowes
Looks awfully similar to Clojure's EDN

~~~
icefox
I rarely code in Clojure, but I do use EDN and whenever I see new standards I
always compare it to EDN.

~~~
zcam
Actually EDN is much better by default (sets, extensibility, etc)

------
majewsky
Besides the nonsensical "advantage over S-expressions" statement in the
README, the biggest issue I have with this is that Mark maps only to
JavaScript, not to other languages where dicts/maps/hashes and
arrays/slices/lists are two different things. Makes me wonder if it just has
not occurred to the author that there are languages != JS.

~~~
henryluo
If all other languages have no problem supporting XML, they'll have no problem
supporting Mark.

It's just that languages like JS and Lua, where an object can be a map and a
list at the same time, have the convenience of mapping a Mark object
into just one object, instead of many.

~~~
henryluo
Another way to support Mark in other languages is just to use a map for both
properties and contents. E.g. in Java, the keys in a map can be integers. Of
course, the performance will not be as good as primitive array. But it can be
one man's quick-and-dirty solution.

General JS arrays (not those TypedArrays) are actually maps indeed.
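
A quick-and-dirty version of the single-map idea, sketched in Python rather than Java. The `$type` key and the helper names are invented for this illustration and are not Mark's API.

```python
def element(type_name, props=None, contents=()):
    """Build one dict holding both named properties and indexed contents."""
    node = {"$type": type_name}          # "$type" is an illustrative key only
    node.update(props or {})
    for i, item in enumerate(contents):
        node[i] = item                   # integer keys hold the contents
    return node

def properties(node):
    """Named properties: the string keys, minus the type marker."""
    return {k: v for k, v in node.items() if isinstance(k, str) and k != "$type"}

def contents(node):
    """Indexed contents, in order: the values stored under integer keys."""
    return [node[k] for k in sorted(k for k in node if isinstance(k, int))]

# Roughly {p class:'note' 'Some ' {b 'bold'} ' text'}
p = element("p", {"class": "note"},
            ["Some ", element("b", contents=["bold"]), " text"])
print(properties(p))  # {'class': 'note'}
```

In Python the integer key `0` and the string key `"0"` are distinct, so properties and contents cannot collide here; JS property keys are strings, so there `obj[0]` and `obj["0"]` name the same slot.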

------
henryluo
Thanks for several comments pointing out the unclearness of what's being "more
modern".

I've updated the README to be: "The advantage of Mark over S-expressions is
that it takes a more modern, JS-first approach in its design, and can be more
conveniently used in web and node.js environments."

Hope it's clearer now.

------
patrickmay
Anyone who doesn't know Greenspun's 10th Rule is doomed to rhyme with it.

------
agentgt
_The nice thing about standards is that you have so many to choose from._

- Andrew Tanenbaum, Computer Networks, 2nd ed., p. 254.

I think all developers go through some experience where they want to just
"unify" everything because that will supposedly make it easier for them and
other developers.

Over time, as you become more experienced (or, I guess, jaded), you realize that
a "GUT" technology platform or programming language is a pipe
dream, and that getting people to use said new format/language/tech takes
more effort than what you get in return.

Anyway to be short about it I think most should just pick the best tool for
the job and stop rebuilding things that don't need to. And if you do please
make sure you have a plan for how you are going to replace all the old working
stuff.

~~~
nfoz
> I think most should just pick the best tool for the job and stop rebuilding
> things that don't need to.

I think you just contradicted yourself. Sometimes the best tool for the job is
something new, something improved over what already exists.

I don't think the author intends to "replace all the old working stuff". But
if this tool is better for new projects, then why not? I don't get all the
negativity... do people here really love XML/JSON/YAML that much? There's a
whole lot to complain about in all of those!!

~~~
agentgt
I am not averse to new formats. I am averse to formats that try to “unify”.

And yeah I don’t have a problem with XML or JSON. Those two combined with some
flatbuffer or other binary protocols cover most of my use cases... like
really what’s with all the XML negativity.

------
ZenoArrow
"XML...Fully Typed: No."

XSDs don't count then?
[https://en.wikipedia.org/wiki/XML_schema](https://en.wikipedia.org/wiki/XML_schema)

~~~
henryluo
XML is only semi-structured/typed without schema. JSON and Mark are always
typed.

Full formal schema definition, as in XML, is often a burden to ad-hoc
scripting, which is common in JS. JSON/Mark provides sufficient type info for
these adhoc usages.

~~~
cakoose
JSON is not "fully typed". It just happens to have different syntax for
strings, numbers, and booleans. But the application code still needs to come
up with a way to distinguish between timestamps, enums, different object
types, etc.

XML uses the same syntax for strings, integers, and booleans, but it has
mature schema/typing tools that make it easy to apply more precise typing,
which you'd want to do anyway to identify timestamps, enums, and different
object types.

------
drofmij
Pros: At least it's not JSONx :D

Cons: Not seeing any advantage over JSON. If you want a type for objects just
add a type field and have your code read it. Then you can use any of the
existing parsers.

------
nailer
> It has clean syntax

You could remove every '{' with 0 loss of meaning.

~~~
drofmij
I was thinking this looked like YAML with {}

~~~
henryluo
YAML does not have good support for mixed content.

------
smizell
I made my own little language called Geneva [0] for similar ideas but it acts
as code and can be parsed as JSON. I also came up with a spec for doing this
for HTML [1] (but no code to do this yet).

[0] [https://github.com/smizell/geneva](https://github.com/smizell/geneva)

[1] [https://github.com/smizell/janeml](https://github.com/smizell/janeml)

------
Jeaye
Please just use EDN: [https://github.com/edn-format/edn](https://github.com/edn-format/edn)

------
rurban
What he forgot to add:

Some disadvantages of Mark, comparing to JSON would be:

* Mark is insecure, JSON is secure.

* Mark is slower than JSON

Passing types directly to object.constructor is of course entirely insecure.
[https://github.com/rurban/Cpanel-JSON-XS/blob/master/XS.pm#L...](https://github.com/rurban/Cpanel-JSON-XS/blob/master/XS.pm#L2052)
(i.e. CVE-2015-1592)

~~~
henryluo
Thanks for feedback on the security aspect. It is something that Mark
definitely needs to consider carefully.

Current Mark implementation does not call arbitrary constructor during
parsing. The constructors are created from scratch. But application users
might want Mark to call their custom class constructor. I'm thinking of passing
in a callback function to Mark.parse().

~~~
rurban
What YAML did in this respect was to provide a whitelist of allowed class names.
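
The allowlist idea can be sketched in Python on top of plain JSON with a `type` field. The field name, the `Size` class and the `ALLOWED_TYPES` table are assumptions for this illustration; they are not part of YAML, Mark, or any parser's API. Only `json.loads` and its `object_hook` parameter come from the standard library.

```python
import json
from dataclasses import dataclass

@dataclass
class Size:
    width: int
    height: int

# Only types listed here may ever be constructed from input.
ALLOWED_TYPES = {"Size": Size}

def typed_hook(obj: dict):
    """object_hook for json.loads: build an allowlisted class from "type"."""
    type_name = obj.get("type")
    if type_name is None:
        return obj                       # plain object, leave it as a dict
    cls = ALLOWED_TYPES.get(type_name)
    if cls is None:
        raise ValueError(f"type {type_name!r} is not allowed")
    fields = {k: v for k, v in obj.items() if k != "type"}
    return cls(**fields)

size = json.loads('{"type": "Size", "width": 100, "height": 100}',
                  object_hook=typed_hook)
print(size)  # Size(width=100, height=100)
```

The type name from the input is only ever used as a lookup key, never resolved to a class or constructor directly, which is the property that keeps untrusted input from choosing what gets instantiated.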

------
staticelf
I like the idea but I don't think the benefits outweigh the negative
implications of implementing it.

I mean JSON as a data format for API stuff is just enough as it is, and you'd
need some serious reason to change from JSON, and these reasons just
don't cut it.

------
zeveb
> The advantage of Mark over S-expressions is that it is more modern, and can
> directly run in browser and node.js environments.

… with the right translator to JavaScript, which also happens to be true of
S-expressions.

His table is incorrect, incidentally: S-expressions support mixed content (if
I understand what he means) and are also fully generic.

He doesn't have a good example of the benefits of his proposal over
S-expressions: 'more modern' just means 'undiscovered bugs.'

I respect his enthusiasm and hard work, but I believe what the world needs is
hard work on existing things rather than hard work reïnventing the wheel.

------
jl6
> Mark utilizes a novel feature in JavaScript that an plain JS object is
> actually array-like, it can contain both named properties and indexed
> properties.

Where can I read more about this feature of JavaScript?

~~~
nsm
JS objects are tables, with Arrays being a syntactic convenience with some
extra properties. JS engines do heavily optimize for the array case with dense
layout.
[https://docs.microsoft.com/en-us/scripting/javascript/object...](https://docs.microsoft.com/en-us/scripting/javascript/objects-and-arrays-javascript)

Calling it novel to JS is a stretch. Lua does this too and I’m sure there are
other languages.

~~~
fiddlerwoaroof
php does too

------
etu
So it's basically just XML with curly braces or sexps...

And why bring YAML in the mix? YAML isn't used for transfer I hope? In that
case it should be compared to TOML as well, which seems a lot better than YAML,
especially for configs:
[https://github.com/toml-lang/toml](https://github.com/toml-lang/toml)

Or msgpack? Which also seems useful. Why not protobuf? Or just s-exps which is
basically what this is.

------
adrianratnapala
If I understand the grammar properly, all plain-text elements have to be
quoted. That makes sense for object data, but isn't really markup-friendly.

------
geraldbauer
Great initiative. I'd say why not improve the leading format, that is, JSON
;-) ? I'm collecting all data markup flavors and extensions (HJSON, JSON 1.1,
JSONX, SON, etc.) in the Awesome JSON - What's Next page @
[https://github.com/json-next/awesome-json-next](https://github.com/json-next/awesome-json-next)
Cheers.

~~~
henryluo
JSON has a strong selling-point that it is compatible with JS in syntax.

It is very hard to make a major extension to JSON and still be compatible with
JS syntax. Minor changes are possible, like in JSON5.

Once it breaks JS compatibility, I don't think people will think it is JSON
next any more.

~~~
geraldbauer
Good point. If it's not JSON next but a completely new format, then you will have
to compete with JSON, all JSON next formats and all other alternative formats.
Good luck.

------
gregman1
Reinventing LISP is a nice thing.

I like this project.

~~~
alvis
I like this syntax also for this reason, though I have to add that the hell of
parentheses would prevent it from being widely adopted.

~~~
nfoz
Is that really worse than the "hell" of angle-brackets in XML/HTML? Because
those are pretty widely adopted.

------
LoSboccacc
> Mark utilizes a novel feature in JavaScript that an plain JS object is
> actually array-like, it can contain both named properties and indexed
> properties.

wouldn't that make introspecting objects very annoying?

~~~
henryluo
In Mark implementation, care has been taken so that indexed contents are not
enumerable. So e.g. when you run a for ... in loop on a Mark object, you'll
only see properties, not the contents.

This is one of the differences between a Mark object and an array. Array contents
are enumerable by default.

------
gaius
Doesn’t have a native date/time, so no advance over JSON really

~~~
tlocke
You might be interested in Zish:

[https://github.com/tlocke/zish](https://github.com/tlocke/zish)

It's a data serialization format with timestamp, bytes and decimal data types.

~~~
gaius
Heh, check out
[https://www.obj-sys.com/asn1tutorial/node15.html](https://www.obj-sys.com/asn1tutorial/node15.html)

ASN.1 dates from 1984. So much for Mark being “modern”!

------
korpiq
What would you use for schema validation? Obviously
[http://json-schema.org/](http://json-schema.org/) won't cut it.

~~~
henryluo
A new Mark-specific schema will need to be developed, based on the prior art
of XML Schema, JSON Schema, etc.

------
couchand
This should probably be titled "Show HN:".

~~~
henryluo
I'll do that next time.

------
jdlyga
I really like protocol buffers. If you've never used them, they're really worth
checking out.

------
jchw
Not sure how many more xkcd 927s this thread will have but personally what
bugs me the most is the thought that being slightly more versatile than JSON
means that Mark is worth the trouble of adopting. Data structures are well
represented by JSON, markup is mostly well represented with XML. I rarely
really want to mix the two. Additionally, this doesn't feel like a language I
would want to write documents in any more than I do XML.

I think we're fine with separate languages for data and markup.

~~~
henryluo
Data and markup/document separation might not always be that clear cut.

The latest trend in CMS systems, piloted by the latest content editors like
Quill, Draft.js, ProseMirror, and Slate.js, is to use JSON to represent the
content, instead of using HTML or Markdown. Using object notation gives rise to
a cleaner API and data model.

So the wall between data and markup, JSON | XML may collapse one day.

------
CodeSheikh
I wish the start of comment can still be made less verbose-y “{!-- comment
--}”

~~~
henryluo
Comment can just be {!comment} or {#comment} in Mark, if you like.

In the README example, I deliberately made it resemble HTML comment, so as to
make it easier for people to correlate.

~~~
jfoutz
huh. when i read the spec, i thought comments were c like

    
    
        begin_sl_comment ::= '//'
        begin_ml_comment ::= '/*'
        end_ml_comment   ::= '*/'
    

maybe that ebnf is out of date.

~~~
henryluo
In Mark, there are two types of 'comment'. // comment and /* comment */ are
lexical comments, which are stripped during parsing.

Then there's Mark pragma, like {!pragma}, which are preserved in the data
model. HTML comment is supported as Mark pragma, not Mark comment.
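
A very rough Python sketch of that distinction: lexical comments are dropped before parsing, while `{!pragma}` text passes through untouched for the parser to keep. This is not Mark's actual lexer; it assumes strings are single- or double-quoted with backslash escapes, and the function name is made up for the illustration.

```python
import re

# Match strings first so that comment markers inside them are left alone.
_TOKEN = re.compile(
    r"""
      '(?:\\.|[^'\\])*'     # single-quoted string: keep
    | "(?:\\.|[^"\\])*"     # double-quoted string: keep
    | //[^\n]*              # line comment: drop
    | /\*.*?\*/             # block comment: drop
    """,
    re.VERBOSE | re.DOTALL,
)

def strip_lexical_comments(src: str) -> str:
    """Remove // and /* */ comments; strings and {!pragma} blocks survive."""
    def keep_or_drop(match):
        token = match.group(0)
        return "" if token.startswith("/") else token
    return _TOKEN.sub(keep_or_drop, src)

src = "{div /* note */ {!pragma} 'a // b' // tail\n}"
out = strip_lexical_comments(src)
print(repr(out))
```

Here the block comment and the trailing line comment disappear, while the pragma and the `//` inside the quoted string are preserved.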

~~~
nfoz
Oh that's a really interesting distinction.

------
jerf
I would strongly suggest a renaming. Mark is so generic that you can already
tell you want {mark}, which has a pronounceability issue, and the name is too
similar to the big player that is markdown (like a bright star making it
impossible to see a dimmer one right next to it), making "mark" sound to me
more likely to be a markdown renderer than a JSON replacement. Given that both
mark and markdown are markup languages of various sorts, the names are just
too close to each other, i.e., it would be different if markdown was what it
was and you were proposing "mark" as the name of a library that lets you put
marks based on geographical criteria on a map or something totally different.

Mark reserving all number-only keys is statistically likely to become a
problem as a project grows larger. I'd suggest finding a different way to get
out-of-band data to be _fully_ out-of-band, rather than trying to carve out a
chunk of keys.

Somewhat similarly, defining a "pragma" as "something surrounded with braces
that isn't a legal object" means that if you ever want to change the
definition of an object in the future, you can't, because you will turn things
that used to be pragmas into objects, or less likely (because you'll try to
avoid this going forward) but still possible, vice versa. You need to
concretely specify what a pragma is unambiguously, in a way that you can
evolve either without affecting the other. It also means errors in generation
become legal pragmas instead of errors, which will cause surprises, and on the
flip side, errors in parsing objects can turn them into legal pragmas rather
than parse errors.

I would reserve saying "Mark is a superset of JSON" for the case when you
really can feed any JSON to a Mark parser and get a (roughly) equivalent
structure. Alternatively, go through the documentation with a text find option
and make sure every time you say "superset" it is qualified as a " _feature_
superset". Especially in light of "(Mark does not support hexadecimal integer.
This is a feature that Mark omits from JSON5.)" The word superset should
either be qualified every time or mean a _strict_ superset; "Mark is a nearly-
feature-superset of JSON" would be more accurate.

In general, a review of
[http://seriot.ch/parsing_json.php](http://seriot.ch/parsing_json.php) may be
appropriate; mark addresses only one serious issue, and the other fixes are
ultimately fairly superficial (the trailing comma issue, for instance, is
almost never a problem for me because ninety-nine-point-I-don't-know-how-many-
nines percent of the time, JSON is a thing my tools generate; the cases where
that is a _serious_ issue have generally already moved on to another format
like YAML, same for comments). Also, per my comment about parse errors turning
objects into pragmas, if you expect this to become a big cross-language
standard it is worth reviewing a snapshot of the variability in JSON parsers,
which is a simpler format. A more complicated format should expect to see
_even more_ subtle divergences in its multiple implementations and things like
"misreading an object as a pragma" to become even more likely at scale.

~~~
henryluo
Hexadecimal integer is not part of JSON. It is a new syntax introduced in
JSON5 (not JSON version 5). I don't think this feature is that useful, so I
did not incorporate it into Mark.

~~~
jerf
(I can't downvote direct replies, so it wasn't me.) I was not suggesting that
you incorporate it. I was suggesting modifying your marketing copy to
incorporate the fact that it will not be a superset anymore. Superset is a
word we should guard and not let it become "sort of superset-ish, maybe,
mostly", but should mean _superset_. If you don't have _every feature_ , you
should not say it's a superset. Since not only is there nothing wrong with
feature elimination, but when done well is a downright good thing, it's not
like this is some sort of major problem for the marketing or something; just
say you used some taste in what you brought over.

And again let me emphasize, since you seem to be saying it again in some other
replies, that "{mark} is a superset of JSON", if you mean that _syntactically_
(as opposed to features wise), MUST mean that _every valid JSON document will
produce a valid {mark} parse_. Nothing less than that qualifies it as a
superset. Given that you reserve numeric keys I don't think that is the case;
whether the grammar is a superset is harder to determine so I haven't tried.
That would be something best served by taking a very complete JSON parser test
suite from someone and validating that all their corner cases that are
supposed to parse in JSON, parse in {mark}. Based on my own experience in the
world of parsing, the odds of you passing that first try are very low; if you
manage, major kudos to you as that would be a very difficult test. (Though I
would imagine that since the grammar largely came from JSON a lot of the
surprises would be the ways in which your parser turns out to deviate from the
grammar rather than grammar errors.)

------
JetSpiegel
Looks like a worse HAML...

------
singularity2001
tangential: I love json-5 which is like json, but allows comments and doesn't
need to quote the keys, so it's just 'dumped js'.

------
jlebrech
nope, data shouldn't include structural information (except maybe in the
header) and markup shouldn't include style information.

------
didibus
Not a fan. I'd rather JS adopted EDN instead.

------
velebak
EDN FTW.

------
therealmarv
waiting for mark 2. No... seriously, it's enough already
[https://xkcd.com/927/](https://xkcd.com/927/)

------
Faaak
I don't know if it's useful to have a universal format; each format (YAML,
XML) is suited for a specific purpose (human readable, completeness).

The headline reminds me of: [https://xkcd.com/927/](https://xkcd.com/927/)

~~~
jrimbault
Maybe we should stop when we reach the YA prefix.

------
Ericson2314
This doesn't have a grammar? When will people realize this is completely
unacceptable.

~~~
henryluo
Grammar details in section 4 of
[https://mark.js.org/mark-syntax.html](https://mark.js.org/mark-syntax.html)

~~~
greggirwin
Henry, did you use Gunther Rademacher's RR diagram generator?

Best of luck with {mark}.

~~~
henryluo
I used GrammKit
([https://github.com/dundalek/GrammKit](https://github.com/dundalek/GrammKit)),
which in turn uses the railroad-diagrams library
([https://github.com/tabatkins/railroad-diagrams](https://github.com/tabatkins/railroad-diagrams))
to generate the RR diagrams. I don't know if they are related to Gunther
Rademacher's RR diagram generator.

~~~
greggirwin
Cool. I'll check those out.

