

RJSON: compress JSON to JSON - dogada
http://www.cliws.com/e/06pogA9VwXylo_GknPEeFA/

======
ZenPsycho
I see a pretty major problem with this. It seems to depend on the order of
key/value pairs in object literals being well-defined. That order is not
defined by the JSON or ECMAScript standards, so you can't really depend on the
order of keys in a JSON object unless you explicitly define some order
(alphabetical, for instance). I like the basic concept of compressing JSON to
JSON, but this is not a particularly good way to do it, since the order of
those keys may not be preserved in round trips through JSON encoders and
decoders in various languages.
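
For example (a quick sketch; the exact reordering depends on the engine or
language, which is exactly the problem):

    // Quick sketch: key order need not survive a round trip. In V8, for
    // example, integer-like keys are enumerated in ascending numeric order
    // regardless of the order in which they appeared in the source text.
    var src = '{"10": "ten", "2": "two", "first": "Homer"}';
    console.log(JSON.stringify(JSON.parse(src)));
    // -> {"2":"two","10":"ten","first":"Homer"} in V8-based engines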

~~~
whalesalad
I saw the same thing right away. Seems like the first item in the array would
have to serve as a hash of key/index pairs for the items that follow. I guess
this would only be useful with large sets of data; the small examples make it
look kind of silly =)

So this:

    
    
        "users": [
            {"first": "Homer", "last": "Simpson"},
            {"first": "Hank", "last": "Hill"},
            {"first": "Peter", "last": "Griffin"}
        ],
    

Becomes:

    
    
        "users": [
            ["first", "last"],
            ["Homer", "Simpson"],
            ["Hank", "Hill"],
            ["Peter", "Griffin"]
        ],
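
Something like this rough sketch of the header-row idea (just an illustration
of the simplified scheme above, not the RJSON algorithm itself):

    // Rough sketch of the header-row idea: the first array lists the keys,
    // every following array lists the values in the same order.
    function packRows(objects, keys) {
        var rows = [keys];
        objects.forEach(function (obj) {
            rows.push(keys.map(function (k) { return obj[k]; }));
        });
        return rows;
    }

    function unpackRows(rows) {
        var keys = rows[0];
        return rows.slice(1).map(function (values) {
            var obj = {};
            keys.forEach(function (k, i) { obj[k] = values[i]; });
            return obj;
        });
    }

    // packRows(users, ["first", "last"]) produces the second form above.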

~~~
danieldk
Doesn't this introduce ambiguity? How do you represent a list of 'tuple lists'?

~~~
dogada
You can see it in the unit tests:
[https://github.com/dogada/RJSON/blob/master/test/tests.js#L4...](https://github.com/dogada/RJSON/blob/master/test/tests.js#L43)

------
zippie
Would like to emphasize that this is only really useful in environments where
gzip is not available (as the OP notes). Some tests using the demo JSON
(minified):

test.json = 285 bytes

test.rjson = 233 bytes (18%)

test.json.gz = 205 bytes (27%)

If you are able to bundle an RJSON parser, why not just bundle an existing,
well-understood and well-tested compression scheme such as <http://stuartk.com/jszip/>
or
[https://github.com/olle/lz77-kit/blob/master/src/main/js/lz7...](https://github.com/olle/lz77-kit/blob/master/src/main/js/lz77.js)
instead?

~~~
ZenPsycho
An arithmetic coding scheme whose model is based on the probabilities found
in JSON abstract syntax trees would significantly improve on most typically
used generic compression schemes. Arithmetic coding schemes have largely been
avoided thus far due to patents, which have recently expired, if I remember
correctly.

Using the order-2 precise model on this page I get 190 bytes, and that is
still a generic, non-JSON model: <http://nerget.com/compression/>

~~~
zippie
This. JSON-specific compression schemes aren't going to yield gains over
AST-friendly schemes unless the JSON serialization specification changes
significantly.

Along these lines, shipping a schema with the data payload is Avro-like,
which is also questionable in terms of efficiency when compared with gzip/LZO.

~~~
ZenPsycho
Hey look, I found this: <http://research.microsoft.com/en-us/projects/jszap/>

~~~
zippie
They are using gzip compression level 1. Bogus.

~~~
ZenPsycho
Are you referring to the graph, in which they set the gzip compression as "1"
in order to clearly show the ratio of compression improvement that their
technique has over gzip?

------
jimktrains2
> reduce JSON data size and network traffic when gzip isn't available. For
> example, in-browser 3D-modeling tools like Mydeco 3D-planner may process and
> send to server megabytes of JSON-data;

`Content-Encoding: gzip` anyone?

~~~
mhansen
I don't believe that browsers let uploads be gzipped, since the browser can't
be sure that the server supports gzipped requests.

~~~
jimktrains2
Good call. I would think there should be a way to specify it... Something to
look into, I guess.

EDIT: or, as a comment above states, compress (gzip/deflate) it yourself. Not
the most elegant, but it works if space is an issue.
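
For instance, something like this rough sketch (it uses the pako library,
which isn't one of the ones linked above; `payload` stands in for whatever
object you're sending, and the server has to be set up to decompress gzipped
request bodies):

    // Sketch: gzip the JSON body in the browser before uploading.
    // The server must be configured to accept gzipped request bodies.
    var body = pako.gzip(JSON.stringify(payload)); // returns a Uint8Array
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/api/save');
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.setRequestHeader('Content-Encoding', 'gzip');
    xhr.send(body);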

------
Too
Good idea, but I think for those extreme cases where you really have a use for
this you might as well go with a specialized protocol, which can be based on
JSON if you need it.

I had to do this for an application which streamed several hundred data
points per second to a browser client. Both data size on the network and
parsing time in the browser were my biggest issues. Since it was an HTML app I
had to use either JSON.parse or custom parsing written in JavaScript, the
second option being too slow to be viable. I ended up with something based on
almost only JSON arrays, where the client knows what the different items in
the array mean. With his example it would look something like this:
[7, ["programming", "javascript"], ["Peter", "Griffin", "Homer", "Simpson",
"Hank", "Hill"]]

So in other words it's just enough overhead to make it parsable by JSON.parse,
but otherwise you only transfer the actual data.
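
Client-side decoding then stays trivial, roughly like this (a sketch only;
the meaning I've assigned to each slot here, an id, tags and flat first/last
name pairs, is just a guess for illustration):

    // Rough sketch of client-side decoding for a format like this; what
    // each slot means is something the client simply has to know in advance.
    function decodeMessage(msg) {
        var names = msg[2], users = [];
        for (var i = 0; i < names.length; i += 2) {
            users.push({ first: names[i], last: names[i + 1] });
        }
        return { id: msg[0], tags: msg[1], users: users };
    }

    var parsed = decodeMessage(JSON.parse(text)); // text: the raw wire data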

Note that I wouldn't recommend this option unless you really hit a wall where
you realize that this in fact is a bottleneck.

~~~
dogada
RJSON not only compresses data of any structure but keeps it in JSON format,
which is a huge advantage over gzip/7z and other binary compression
algorithms. You can compress 1 MB of JSON to 300 KB of gzipped JSON, but then
you will still have to parse 1 MB of JSON on the client anyway. With RJSON you
can compress 1 MB of JSON to 500 KB of RJSON, then gzip it down to 200 KB, and
parse only 500 KB of JSON. You can then analyze some core properties of the
parsed RJSON document and unpack it on demand. For example, you may send the
client a collection of 20 documents, parse each one and show its title, then
fully unpack only the document the client selects.
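
Something along these lines, for example (a rough sketch of the on-demand
idea, assuming each document is packed separately on the server; RJSON.pack
and RJSON.unpack are assumed entry-point names):

    // Server conceptually sends:
    //   { "titles": [...], "docs": [RJSON.pack(doc1), RJSON.pack(doc2), ...] }
    var response = JSON.parse(text); // cheap: the documents stay packed
    showTitles(response.titles);     // hypothetical UI helper
    function openDocument(index) {   // called when the user picks a title
        var doc = RJSON.unpack(response.docs[index]); // unpack just this one
        showDocument(doc);           // hypothetical UI helper
    }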

I agree with Too that a specialized protocol will always win, but IMO RJSON
decreases structural entropy almost to zero without the need to debug your
own protocol.

------
uncle-ezra
If your HTTP connection is going to be gzip compressed, then manual
compression of this kind is not guaranteed to reduce the size of the final
result, and may in fact hurt it.

~~~
lsb
If your HTTP connection is going to be gzip-compressed, then this will only
give you monetary savings that compensate for the additional code complexity
if you have medium-to-big data, at which point storing it in JSON is a
questionable-at-best proposition.

------
perfunctory
How would it compress and then decompress the following document?

    
    
      {"users": [
        {"first": "Homer", "last": "Simpson"},
        {"first": "Hank", "last": "Hill"},
        [2, "Peter", "Griffin"]
      }

~~~
asdfaoeu

        {
            "users": [
                {
                    "first": "Homer",
                    "last": "Simpson"
                },
                [
                    2,
                    "Hank",
                    "Hill"
                ],
                [
                    0,
                    2,
                    "Peter",
                    "Griffin"
                ]
            ]
        }
    

There was a tester linked from the site. Looks like he added the [0, to handle
that case.

~~~
dogada
Yeah, schema index 0 is reserved for arrays whose first value is numeric, so
they can't be confused with schema references. You can see more examples in
the unit tests: <https://github.com/dogada/RJSON/blob/master/test/tests.js>

~~~
jerf
Unit test suggestion: testPackAndUnpack should also check that
double-compressing and double-decompressing round-trips identically; that will
tend to find any remaining issues of this kind, if any.
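
Something along these lines, I suppose (a sketch only, not the project's
actual test code; RJSON.pack and RJSON.unpack are assumed entry-point names):

    // Sketch of the idea: packing twice and then unpacking twice should
    // give back the original data unchanged.
    function assertDoubleRoundTrip(data) {
        var twicePacked = RJSON.pack(RJSON.pack(data));
        var restored = RJSON.unpack(RJSON.unpack(twicePacked));
        console.assert(JSON.stringify(restored) === JSON.stringify(data),
                       'double pack/unpack round trip changed the data');
    }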

~~~
dogada
Thanks, jerf, done:
[https://github.com/dogada/RJSON/commit/0a7fc5dd9e142477ebda9...](https://github.com/dogada/RJSON/commit/0a7fc5dd9e142477ebda93f3197ea346a7aff13a)

------
gfosco
I wondered if it was April Fools' Day... Really, JSON is lightweight enough
that it doesn't need compressing. We're not dealing with XML here! This just
adds another point of failure.

------
chubbard
If you just use GZIP, supported by most browsers, it compresses without the
need for special software on the client side to rebuild the original JSON
response. And it does a better job at compression, I bet. I thought of doing
something like this to rebuild cycles in object graphs, but I didn't, because
it requires special parsing logic on the client.

------
leeoniya
Similar to JSONH, which I use to pass recordsets back and forth. There's no
recursion for nested stuff, though.

<https://github.com/WebReflection/JSONH>

------
reedhedges
dogada, do you have any results showing both size reduction, and speed of
compression/decompression vs. other methods?

~~~
dogada
At the algorithmic level the compression is mostly identical to JSONH's. I
haven't done performance tests yet; I still have ideas for how to improve
RJSON.

------
gcb
It's all fun until someone loses an eye.

What if I look at your API output and assume it's plain JSON (because the
sample I see happens to contain only unique items) when it's actually RJSON?
Or the other way around?

The most important thing when adding another layer to a protocol is
identification.

So, please, put the whole altered object into an "rjson" root node so it's
clear what we're dealing with.

~~~
dogada
I agree that identification is an important issue, but IMO it's a
protocol-level issue. RJSON is not a protocol, it's an algorithm. Some will
prefer to wrap everything in {"rjson": ...}, others will like
{"format": "rjson", "data": ...} and so on. I believe the algorithm itself
should impose as few limitations as possible.

~~~
rubyruy
The MIME type is allowed to add information about the nature of the data
encoded, on top of the encoding itself. This is why we have
application/xhtml+xml. It means "this is encoded using XML, and by the way,
the parsed XML structure is XHTML".

The ideal place to handle this is (as pointed out above) content negotiation,
specifically with something like 'application/rjson+json'.
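
Roughly like this on the client (a sketch; 'application/rjson+json' is the
suggestion above, not a registered type, RJSON.unpack is an assumed entry
point, and render() is a hypothetical helper):

    // Ask for packed RJSON if available, fall back to plain JSON, and only
    // unpack when the response actually says it is RJSON.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/api/users');
    xhr.setRequestHeader('Accept', 'application/rjson+json, application/json');
    xhr.onload = function () {
        var type = xhr.getResponseHeader('Content-Type') || '';
        var data = JSON.parse(xhr.responseText);
        if (type.indexOf('application/rjson+json') !== -1) {
            data = RJSON.unpack(data);
        }
        render(data);
    };
    xhr.send();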

