
Ren: a lightweight data-exchange text format - bananicorn
https://pointillistic.com/ren/
======
greggirwin
Author of the Ren page being discussed here. Didn't expect to see it on HN.
:^)

The Ren site (ren-data.org) redirects there, is quite old now, and was a
playground and experimental area. Hence the empty links and such.

In addition to Bolek's Humanistic repo, I had set up
[https://github.com/Ren-data/Ren](https://github.com/Ren-data/Ren) to discuss
ideas. Ren is,
effectively, Redbol (Red+Rebol). One of the initial goals was to define a
subset of values and normalize the syntax (Rebol never formalized its format
spec), which could be shared across Redbol langs as they evolved and went in
different directions. And also as a bridge for loaders in other languages.
JSON has taught us that a small spec is important. The balance between
simplicity and expressive value types is key.

It may come back to life at some point, but my time was better spent elsewhere
for a while. I'm focused on Red now (red-lang.org). It has a native bridge
feature for embedding in other langs
([https://doc.red-lang.org/en/libred.html](https://doc.red-lang.org/en/libred.html)),
along with
a lot more. I believe there is still value in formalizing the grammar so
others can create their own implementations, but it's not a priority at this
time. In the meantime, you can find the active Red community at
[https://gitter.im/red/red](https://gitter.im/red/red) to get more information
and examples of what it looks like in use.

Cheers.

~~~
pdfernhout
For base types, rather than a set of specific data types which is limited, how
about having a standard way to indicate a type and data, like "number:12" and
"rational:12/17" and "string:foo" and "date:2017-10-20" and "bitmap-
hex:cafebabea2b9c0..."? Then let the reader parse it however they want given
their desired interpretation of the type tag.

There could be a base set of standard types and ideally an organizational
process to add more standard types -- like mimetypes. Mimetypes might also be
considered included by default (perhaps like "application/json;
charset=utf-8:[1, 2, 3]").

It is more characters to include a type for each primitive, but it is more
expandable. Likely these files will mostly be generated and read by code
anyway, with humans just looking at them now and then for debugging.

If that string version seems too cluttery, another option is something like:
rational:12/17 and string:foo and string:"this has spaces".

Or given possible confusion with maps and colons, another option is
rational/12/17 and string/foo and string/"this has spaces" and complex/−1+3i
and real/30.564 and "application/json; charset=utf-8"/[1,2,3] and maybe even
xml/<foo>bar\ baz</foo> and javascript/console.log("hello") and so on.

Or maybe pipes? Like: rational|12/17 and string|foo and string|"this has
spaces" and complex|−1+3i and real|30.564 and "application/json;
charset=utf-8"|[1,2,3] and maybe even xml|<foo>bar\ baz</foo> and
javascript|console.log("hello") and so on.
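
A minimal sketch of how a reader might handle the first variant ("number:12", "rational:12/17"), assuming Python as the host language. The `READERS` table and `parse_tagged` are hypothetical names for illustration, not part of Ren or any existing spec. Unknown tags are handed back untouched so the caller can interpret them however they want.

```python
from datetime import date
from fractions import Fraction

# Hypothetical registry: type tag -> function that reads the raw text.
READERS = {
    "number": int,
    "rational": Fraction,          # Fraction accepts strings like "12/17"
    "string": str,
    "date": date.fromisoformat,    # expects YYYY-MM-DD
}

def parse_tagged(text, readers=READERS):
    """Split 'tag:data' at the first colon and pass data to the tag's reader."""
    tag, sep, data = text.partition(":")
    if not sep:
        raise ValueError(f"missing type tag in {text!r}")
    reader = readers.get(tag)
    if reader is None:
        # Unknown tag: keep the tag and the raw text so nothing is lost.
        return (tag, data)
    return reader(data)

print(parse_tagged("rational:12/17"))       # 12/17
print(parse_tagged("bitmap-hex:cafebabe"))  # ('bitmap-hex', 'cafebabe')
```

Splitting at the first colon is the simplest rule that works for these tags; it is also where the confusion with maps and colons mentioned above would show up.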

~~~
greggirwin
That would be a very different format. A big part of the format is that the
lexical forms allow you to write things as you normally would, to another
human. That does impose limits, but is part of the fundamental design.

------
iainmerrick
The very first entry in the FAQ ("Why is none used instead of null?") isn't
very convincing! It says of null that "it isn't friendly to normal people who
might be given configuration or message files to edit," and gives the example:

    
    
      Children: none
      Opinions: none
    

That certainly looks pleasant and human-readable, but a bit of a nightmare to
interpret! Couldn't it just as well be "Children: 0" or "Children: []"? If the
idea is to let non-technical people edit configuration files, the code reading
the file will have to be very flexible and forgiving.

_(Edit to add: maybe that's something you mostly get for free in REBOL? It
would be a major headache in most other languages though)_

~~~
hood_syntax
Valid points about the ambiguity in that example, but I do believe 'none' is
easier to parse than 'null' for the average person. I'm assuming that none is
meant to represent, like null, "no input was given", not "an input of zero/an
empty list was given".
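
To make the distinction concrete, here is a rough Python sketch, with Python's `None` standing in for Ren's `none` (`describe` is a made-up helper, not anything from the Ren spec). All three candidates are falsy, which is why a reader that only checks truthiness cannot tell them apart.

```python
def describe(value):
    """Classify a loaded value; None stands in for Ren's `none` here."""
    if value is None:
        return "no input was given"
    if isinstance(value, list) and not value:
        return "an empty list was given"
    if value == 0:
        return "zero was given"
    return "a value was given"

# All three are falsy, so `if not value:` alone would lump them together.
candidates = [None, 0, []]
all_falsy = all(not v for v in candidates)
labels = [describe(v) for v in candidates]

print(all_falsy)  # True
print(labels)
```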

~~~
mh-cx
But in that case, wouldn't "unknown" be even better? I'm not a native English
speaker, though, so I may miss some fine nuances in the meaning of "none".

~~~
hood_syntax
Disclaimer: these are just my personal views, so I may be missing something
here.

'unknown' is almost like saying "We don't know what should be here", while
'none' is closer to saying "We didn't get passed any information here".

Ultimately, I feel like 'unknown' is more specific, it conveys intent beyond
that of 'none'. Such as, it could imply we actually don't know what kind of
data 'Opinions' should hold, rather than that there merely happens to be no
data to put there.

~~~
greggirwin
We can throw out options for a long time, as there is no perfect choice. Ren
went with `none` as it is standard Rebol and Rebol's designer thought about it
long and hard.

------
21
I'm annoyed when a text format doesn't support comments. Especially a Human
one :)

Sometimes you want to comment a section out of a JSON without deleting it.
Other times you want to annotate some generated JSON.

And because this is a common need, you have ad-hoc unofficial solutions which
are not supported by all parsers.
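
One such ad-hoc approach, sketched in Python under the assumption that the annotation is a single `//` line at the top of generated JSON. `dump_annotated` and `load_annotated` are hypothetical helpers; the standard `json` module has no comment support, which is exactly the portability problem.

```python
import json

def dump_annotated(data, note):
    """Return JSON text preceded by a '//' comment line (not valid strict JSON)."""
    return f"// {note}\n" + json.dumps(data)

def load_annotated(text):
    """Drop leading '//' lines, then parse what is left as ordinary JSON."""
    lines = text.splitlines()
    while lines and lines[0].lstrip().startswith("//"):
        lines.pop(0)
    return json.loads("\n".join(lines))

text = dump_annotated({"retries": 3}, "generated by a build script")

try:
    json.loads(text)          # a strict parser rejects the comment line
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

print(load_annotated(text))   # {'retries': 3}
print(strict_ok)              # False
```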

~~~
taeric
In the book Programming Pearls, one of the sections is on "provenance". One of
the tips shared was about someone who put, at the head of every generated file,
the command that was used to generate it. I absolutely love the idea and have
wanted to do it in places I generate files before. JSON completely kills this,
though. I don't want to encode the bloody command, just put a little comment
"this file generated on DATE by COMMAND". Grr...

~~~
alecthomas
Ironically, that is exactly why JSON doesn't have comments:
[https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...](https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaGSr)

~~~
taeric
Strictly, that is different. He was against parser directives. And offered a
solution for the case I gave.

Quite frankly, I think this is an area he made the wrong choice on. Which is
fine, but still annoying. Like checked exceptions. Logic for the choice was
sound, if misguided. End result sucks.

------
bananicorn
In case anyone wonders - red-lang[0] and rebol[1] both use this notation.

[0][http://www.red-lang.org/](http://www.red-lang.org/)
[1][http://www.rebol.com/](http://www.rebol.com/)

~~~
9214
mirror site: [http://red.github.io/](http://red.github.io/)

------
protonfish
I wish this (and JSON) weren't ambiguous about number type (integers vs.
floating point). These are very different types of data in my opinion.

~~~
jbob2000
I disagree. For 99% of applications, a number is a number is a number.
Integers and floating points are a leaky abstraction, I expect the
language/compiler to handle these.

~~~
dmm
Try passing around 64-bit integers in JSON. The majority of JSON
implementations only implement double float numbers and will mangle your 64-bit
ints. The usual solution is passing int64s as strings, nasty.

~~~
colanderman
Those implementations are as broken as a database that stores ZIP codes as
integers.

~~~
dmm
Hit F12 in Chrome and enter the following:

    
    
        JSON.parse('[9223372036854775805]')[0] === JSON.parse('[9223372036854775806]')[0]
    

You should get:

    
    
        <- true
    

Is Chrome's implementation broken? Those are both numbers that a signed int64
can store precisely but a double float cannot.
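
The same collapse can be reproduced outside the browser. A small Python sketch: Python's own `json` module keeps integers exact, so `parse_int=float` is used here only to imitate a parser that has nothing but doubles. The string workaround at the end is the one mentioned upthread.

```python
import json

a = 9223372036854775805
b = 9223372036854775806

# As IEEE 754 doubles both values round to 2**63, so they compare equal.
same_as_double = float(a) == float(b)

# parse_int=float makes json.loads behave like a double-only parser.
as_doubles = json.loads(
    "[9223372036854775805, 9223372036854775806]", parse_int=float
)

# Workaround: ship the int64 as a string and convert on the way out.
payload = json.dumps({"id": str(a)})
restored = int(json.loads(payload)["id"])

print(same_as_double)                  # True
print(as_doubles[0] == as_doubles[1])  # True
print(restored == a)                   # True
```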

~~~
colanderman
Yes, the JavaScript JSON parser is broken (in design), in that it silently
loses data/precision of JSON numbers. JSON numbers are not "int64s" or "double
floats", they are arbitrary-precision decimal values. [1]

Postgres is the only popular system with built-in JSON functionality that I
know of to correctly round-trip JSON numeric data. Python comes close but
fails for any number with a decimal point or in E-notation.

One can argue that the JSON spec is too permissive (and I would disagree,
though I'm somewhat a purist). Or a pedant could note that the JSON spec
doesn't actually say whether _any_ of the digits in a number are considered to
carry information (and does in fact note that many parsers are faulty). But
it's unfortunately true that most popular JSON parsers fail to round-trip
valid JSON data due to flawed design.

[1] [http://json.org/](http://json.org/)

~~~
dragonwriter
> Or a pedant could note that the JSON spec doesn't actually say whether any
> of the digits in a number are considered to carry information

RFC 7159 not only specifies that all of the digits carry information, it
specified exactly what information they carry.

However, it also expressly permits implementations to limit the range and
precision of numbers accepted, recommending (but not requiring) range and
precision at least equivalent to IEEE 754 float64 be supported.

~~~
colanderman
> RFC 7159 not only specifies that all of the digits carry information, it
> specified exactly what information they carry.

Can you quote? I don't see where it does, except by reference to common
knowledge.

But I should clarify. There are several inefficiencies in JSON numeric
representation which may or may not be considered significant by an
application. The ones I can think of, in order from "obviously not" to "well,
maybe":

1. the case of "e" vs. "E"

2. the optional "+" sign after "e" or "E"

3. presence/absence of decimal point and/or E-notation (_shouldn't_ make a
difference, but _does_ in many parsers, such as Python's)

4. the value of the exponent itself (e.g. 3.14 vs. 314e-2)

5. "-" sign in front of any number with a value of 0 (IEEE floats and
ones-complement integers do have a negative zero)

6. excess trailing 0 digits after the decimal place (may be used to represent
significant figures in scientific applications)

7. digits of lesser significance (obviously the most contentious)

JavaScript ignores all but #5. Python ignores all but #3, and due to #3,
sometimes #5 and #7. PostgreSQL `json` ignores none; `jsonb` ignores all but
#6 and #7. Personally I would draw the line between #4 and #5. But neither the
RFC nor the ECMA spec tell us.
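
A quick way to check the Python claims, assuming CPython's standard `json` module (`roundtrip` is just a helper defined here). Passing `parse_float=Decimal` is one way to keep the digits on input, though `json.dumps` will not serialize a `Decimal` back without extra work.

```python
import json
from decimal import Decimal

def roundtrip(text):
    """Parse JSON text and serialize it again with the default settings."""
    return json.dumps(json.loads(text))

# Plain integers survive, even beyond 64 bits.
print(roundtrip("12345678901234567890"))  # 12345678901234567890

# A decimal point or exponent means float: #4 and #6 are lost.
print(roundtrip("314e-2"))                # 3.14
print(roundtrip("3.140"))                 # 3.14

# Negative zero (#5) survives only when #3 forces a float.
print(roundtrip("-0"))                    # 0
print(roundtrip("-0.0"))                  # -0.0

# Decimal keeps the digits exactly as written.
exact = json.loads("3.140", parse_float=Decimal)
print(exact)                              # 3.140
```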

------
grabcocque
It looks fairly similar to EDN used by Clojure too. Similar, but not, of
course, the same.

[https://learnxinyminutes.com/docs/edn/](https://learnxinyminutes.com/docs/edn/)

~~~
greggirwin
Some similarities, but different in that EDN has a tagging approach for some
types, and a more limited set of literal value formats.

------
4lch3m1st
The syntax itself reminds me a lot of Lisp (Scheme, specifically) but this is
probably on purpose. I wonder if it can be used on Lisps the same way JSON is
used in JS.

~~~
9214
check out Red and Rebol languages:

[http://red-lang.org/](http://red-lang.org/)

[http://red.github.io/](http://red.github.io/)

[http://rebol.com/](http://rebol.com/)

I think the first version of the Rebol interpreter was written in Scheme ;)

------
faitswulff
A different kind of pedantry: which is it, 仁 or 人 ? Both appear on the page
and the former seems more fitting.

~~~
rebolek
We agreed on 人, not sure where 仁 came from.

~~~
imron
> not sure where 仁 came from

It means humaneness, benevolence, kindness, and is pronounced exactly the same
as 人 which means person/people.

------
vesinisa
Easiest way to understand the notation would be to see some examples, but the
"Test files" link leads to nowhere.

~~~
draegtun
Here's a link to another example -
[https://github.com/humanistic/REN](https://github.com/humanistic/REN)

------
mamcx
When I click the resource links, they all open blank pages?

~~~
perryprog
Not for me.

~~~
doodpants
The links in the first column under References all have "http://" as their
destination. The links in the other columns seem to be fine.

Also, the links under Implementations all result in a 404 page.

------
jepler
It's cute that the spec believes money is denoted by "$".

~~~
evv
I'm confused.. why is this cute?

Of course "$" is used as the symbol for several currencies, but it's not clear
if you have a better symbol in mind for the generic concept of money.

[https://en.wikipedia.org/wiki/$_(disambiguation)](https://en.wikipedia.org/wiki/$_\(disambiguation\))

~~~
jepler
Sarcastically cute. It appears very parochial (in the sense of "limited or
narrow outlook") to think that having a single "currency type" is useful, or
that using a "$" to denote it is appropriate. While "$" is used by at least
one important world currency, I am pretty sure that more people use ¥ than $
by a long shot, and users of € are about as plentiful as people who spend in
USD.

(The idea that in Rebol one might write EUR$1.00 to denote the value that
would usually be written 1.00€ is also pretty horrible)

~~~
greggirwin
@jepler,

1) Do you agree that a notation should support currency values? That is, they
are useful to identify in data, as atomic units?

2) If so, do you agree there should be a _single_ standard symbol and lexical
form used to identify them? Because if we don't do that, we have to support
_every_ localized notation, correct?

3) If you agree to both of those questions, what symbol do you suggest? ¤ is
generic, but not on any keyboard layout I know of. Also, see Chris's note
about ASCII priority.

~~~
jepler
No, I can honestly say I've never worked on a software project that dealt
directly with currency values. Personally, in the kinds of projects I do,
explicit support for units of measurement would be more beneficial: you'd like
to use the type system to detect where units from different systems are mixed
(e.g., km+mi) and behave appropriately by performing a conversion; or to
detect where inappropriate units are mixed (e.g., kg+hz) and signal an error
(at project build time if possible!)

With that background in mind, I imagine the scenario here you have two data in
a Ren document which are both the literal $1.00, but one is actually 1.00USD
and the other is actually 1.00EUR: it doesn't prevent errors (for instance,
when you want to perform an operation like + on the two data), because you
still don't know what the data means. You have gained very little over just
using the literal 1.00 instead.

So if I were making a proposal I'd be tempted to suggest a syntax like [1.00
USD], and maybe even giving up one of the remaining sigils ^[1.00 USD] if it
is important to raise to being a special element in the syntax of a Ren file.
Now that you're saying what you mean, you can use the same syntax for all
units: ^[1.00 kg m -2] (1 kilogram per square meter), ^[1.00 V hz -.5] (1 volt
per square-root-hertz, a typical units specification of noise in opamps).
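
To illustrate the kind of checking described at the top of this comment, here is a runtime sketch in Python (a real design would want the error at build time, as noted). The `UNITS` table and the `Quantity` class are assumptions made up for the example, not anything Ren or Rebol defines.

```python
from dataclasses import dataclass

# Hypothetical unit table: name -> (dimension, factor to the base unit).
UNITS = {
    "km": ("length", 1000.0),
    "mi": ("length", 1609.344),
    "kg": ("mass", 1.0),
    "hz": ("frequency", 1.0),
}

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def __add__(self, other):
        dim_a, scale_a = UNITS[self.unit]
        dim_b, scale_b = UNITS[other.unit]
        if dim_a != dim_b:
            # Inappropriate mix such as kg + hz: signal an error.
            raise TypeError(f"cannot add {self.unit} and {other.unit}")
        # Compatible mix such as km + mi: convert into the left operand's unit.
        converted = other.value * scale_b / scale_a
        return Quantity(self.value + converted, self.unit)

total = Quantity(1.0, "km") + Quantity(1.0, "mi")
print(total.unit)  # km
```

Adding `Quantity(1.0, "kg")` to `Quantity(1.0, "hz")` raises `TypeError` instead of silently producing a number.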

~~~
greggirwin
No need for special syntax. ^ is already the escape character. Just use
blocks. Though a `unit` syntax has been brought up. The notation would use
path syntax, but start with a number instead of a word. Frink, of course, is
the king of languages in this regard.

And while this may be more beneficial in your work, a lot of software does
have to deal with money, where it's important _not_ to use floating point, but
BCD or something else.
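
A tiny illustration of the floating-point problem with money, using Python's standard `decimal` module as the "something else" (it is decimal arithmetic rather than BCD).

```python
from decimal import Decimal

# Binary floats cannot represent 0.10 or 0.20 exactly, so the sum drifts.
float_total = 0.10 + 0.20

# Decimal works on the written digits and keeps the cents exact.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total)     # 0.30000000000000004
print(decimal_total)   # 0.30
```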

------
pnathan
Another go-round, trying to solve the same problem as TOML.

~~~
greggirwin
Not so. From the TOML page: "TOML aims to be a minimal configuration file
format."

Ren is intended to be a general purpose data exchange format.

~~~
pnathan
welp, I'm wrong. So it goes. Better reading in the future. :-/

I'll just leave my comment above to not destroy the comment chain.

