
JSONCrush – Compress JSON into URI Friendly Strings - KilledByAPixel
https://github.com/KilledByAPixel/JSONCrush
======
wolfgang42
This looks interesting, but I wish there was an explanation of how it works;
I'm particularly curious about how it extracts repeated substrings. I tried
reading the code, but it starts off with:

    
    
      let X, B, O, m, i, c, e, N, M, o, t, j, x, R;
    

and doesn't get any more readable from there. (And that's the unminified
version!)

There aren't any tests, either, so if you're using this and find a bug I guess
you just have to hope that nothing breaks when you change a line like

    
    
      for(M=N=e=c=0,i=Q.length;!c&&--i;)!~s.indexOf(Q[i])&&(c=Q[i]);

~~~
KilledByAPixel
Sorry, that part of the code is from JSCrush, which I did not write. I don't
fully understand it, but here's an explanation someone wrote...

[https://nikhilism.com/post/2012/demystifying-jscrush](https://nikhilism.com/post/2012/demystifying-jscrush)

If you go to the live demo, it runs a built-in check: it crushes/encodeURIs
the string you pass in, then uncrushes/decodeURIs it, and verifies the result
matches the original. Maybe I should write some automated tests.
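Something as small as this sketch would already catch regressions. (The
`crush`/`uncrush` stand-ins below use encodeURIComponent/decodeURIComponent
only so the sketch runs on its own; a real test would import the library's
actual functions.)

```javascript
// Stand-ins only -- swap in the library's real crush/uncrush pair.
const crush = s => encodeURIComponent(s);
const uncrush = s => decodeURIComponent(s);

// a few representative JSON strings, including non-ASCII
const cases = ['{"a":1}', '{"nested":{"x":[1,2,3]}}', '{"accent":"é"}'];
for (const c of cases) {
  if (uncrush(crush(c)) !== c) throw new Error("round trip failed for " + c);
}
console.log("all cases round-tripped");
```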

~~~
Waterluvian
Did someone actually write that or did it come from a minified output?

~~~
KilledByAPixel
I assume they minified it, but I don't have access to the unminified source.
I'd like to clean it up or find a cleaner version, but for now it works well
enough.

~~~
yorwba
The article you linked above contains a partially unminified version.

EDIT: On second thought, you'd probably be better off using a completely
different compression algorithm that doesn't sacrifice performance for
golfability.

~~~
slowenough
No way, that compression algorithm is amazing. But I'd also like someone to
explain really simply how it works.

I mean, it's doing significantly better than LZ. Anyone want to provide some
simple intuition on these algorithms and where they come from?

~~~
yorwba
The compression works by identifying repeating subsequences in the input data
and picking the one that gives the best savings if replaced by a single byte
not yet part of the input. This step requires time quadratic in the length of
the input. After replacing the subsequence and tacking it onto the end so it
can be recovered, the process continues until no repetition can be found or
all possible bytes have been used.

It can beat LZ's compression ratio for two reasons:

1. The input consists only of bytes that are valid in a URI, so it doesn't
need an additional encoding step.

2. It does an exhaustive search for the best substitution on every pass, which
is also why it gets very slow very quickly for larger inputs.
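In code, the scheme described above might look roughly like this. To be clear,
this is a simplified sketch of my own, not the actual JSCrush source; the
placeholder pool and the `{crushed, used}` return shape are invented here:

```javascript
// URI-safe characters assumed absent from typical JSON input
const POOL = "~!*'()";

function crush(input) {
  let s = input;
  const used = []; // placeholders actually applied, in order
  for (const ch of POOL) {
    if (s.includes(ch)) continue; // placeholder must not appear in the data
    let best = null, bestSaving = 0;
    // brute-force scan over all substring lengths: this is the slow part
    for (let len = 2; len <= s.length >> 1; len++) {
      const counts = new Map();
      for (let i = 0; i + len <= s.length; i++) {
        const sub = s.slice(i, i + len);
        counts.set(sub, (counts.get(sub) || 0) + 1); // over-counts overlaps; fine for a sketch
      }
      for (const [sub, n] of counts) {
        if (n < 2) continue;
        // n copies shrink to 1 char each; "ch + sub" is appended for recovery
        const saving = n * sub.length - n - (sub.length + 1);
        if (saving > bestSaving) { bestSaving = saving; best = sub; }
      }
    }
    if (!best) break; // no repetition worth replacing
    s = s.split(best).join(ch) + ch + best; // replace, then tack dictionary entry on the end
    used.push(ch);
  }
  return { crushed: s, used };
}

function uncrush(s, used) {
  // undo substitutions in reverse order; each entry "ch + sub" sits at the
  // end of the string as of the pass that created it
  for (let i = used.length - 1; i >= 0; i--) {
    const ch = used[i];
    const k = s.lastIndexOf(ch);
    const sub = s.slice(k + 1);
    s = s.slice(0, k).split(ch).join(sub);
  }
  return s;
}
```

Later passes can freely replace substrings that span earlier dictionary
entries, which is why decoding must undo them strictly in reverse order.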

~~~
KilledByAPixel
Also, LZ beats it for longer strings, though not by much for strings in the
target range (~5000 characters).

------
roberto
Rison ([https://github.com/Nanonid/rison](https://github.com/Nanonid/rison))
is a nice alternative that seems to be more readable IMHO.

~~~
Guillaume86
Yes, I use this one. It's not as compact, but it stays readable/editable;
notably, Kibana uses it as well.

------
dillondoyle
I think a more 'popular' pattern is to base64-encode the JSON. That's how we
do it (tracking pixels/GET requests), and a lot of analytics/ad tech uses that
pattern as well. But I'm not sure about the compression/size.

~~~
KilledByAPixel
Encoding to base64 increases the size by at least 33%, since every 3 bytes
become 4 characters. The point of this is to make links small enough to share
in places like Twitter, Discord, etc., where the max length is around 4000
bytes.

~~~
stock_toaster
If you compress the json first, it isn't quite so bad.

Example #1 (short string):

    
    
      input: 103 bytes
      gzip(input): 87 bytes
      base64(gzip(input)): 117 bytes
    

Example #2 (long string):

    
    
      input: 3122 bytes
      gzip(input): 840 bytes
      base64(gzip(input)): 1121 bytes

~~~
KilledByAPixel
Thanks, that is a good reference. It makes sense that gzip would beat
JSONCrush for longer strings. Just as another point of comparison, how about
encodeURIComponent(gzip(input))?

------
donohoe
Hold my beer!

...

[https://donohoe.dev/project/jspng-encoder/](https://donohoe.dev/project/jspng-encoder/)

Life is too short to do something useful, so why not encode all your site's
JavaScript into a PNG image and then decode it on demand?

(This is a terrible idea. Don’t do it. Just for fun)

~~~
santa_boy
This is quite cool. How do you convert the file to a PNG?

Why is this a terrible idea?

I'm thinking I can Whatsapp friends a huge letter as an image and they could
use my toy app to decode it ;-)

~~~
donohoe
A PHP script does the conversion; basically it takes the code and converts it
to a base64-encoded PNG.

------
emj
The code is a bit dense for my taste; these are extracts from the unminified
version:

for(M=N=e=c=0,i=Q.length;!c&&--i;)!~s.indexOf(Q[i])&&(c=Q[i]);

RegExp(`${(g[2]?g[2]:'')+g[0]}|${(g[3]?g[3]:'')+g[1]}`,'g');

~~~
KilledByAPixel
This is the JSCrush algorithm, which I did not write; it is kind of magic. The
decompressor is crazy small. You can read more about it here...

[https://nikhilism.com/post/2012/demystifying-jscrush](https://nikhilism.com/post/2012/demystifying-jscrush)

------
jsd1982
The zzart format looks ridiculously wordy. A lot of compression could be
achieved by shortening the property names. I wonder how much of a size
advantage BSON would provide, but it'd have to be encoded into something
amenable to a URI.
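As a rough illustration of the key-shortening idea (the key table here is
invented; a real one would come from the app's schema, be shared by both ends,
and avoid colliding with keys already in use):

```javascript
// Invented verbose-to-short key table and its inverse
const SHORT = { backgroundColor: "b", strokeWidth: "w", children: "c" };
const LONG = Object.fromEntries(Object.entries(SHORT).map(([k, v]) => [v, k]));

// recursively rename object keys using the given table; unknown keys pass through
function mapKeys(value, table) {
  if (Array.isArray(value)) return value.map(v => mapKeys(v, table));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [table[k] ?? k, mapKeys(v, table)])
    );
  }
  return value;
}

const doc = { backgroundColor: "red", children: [{ strokeWidth: 2 }] };
const small = JSON.stringify(mapKeys(doc, SHORT)); // shorter on the wire
const restored = mapKeys(JSON.parse(small), LONG); // back to verbose keys
```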

------
otabdeveloper4
Here's another approach that compresses even better while retaining (limited)
human readability:
[https://bitbucket.org/tkatchev/muson/](https://bitbucket.org/tkatchev/muson/)

(Though it's mostly meant for making things smoother in statically-typed
languages.)

~~~
anentropic
are they usable in a URI though?

~~~
otabdeveloper4
No, you'll need to percent-encode them first.

------
mofosyne
Maybe further savings could be had by converting to CBOR before URL-encoding.

------
cfv
Idea: just put the thing in localStorage? It's cheaper, less fiddly, you can
encrypt it with something like TEA if you so desire, and doesn't make urls
unshareable by existing

~~~
Semaphor
> and doesn't make urls unshareable by existing

But when you put JSON in the URL, you normally do want exactly that: making
the URL shareable.

~~~
KilledByAPixel
Yes, localStorage is great, but this is for making URLs short enough to share
on Twitter, Discord, etc. The max Twitter URL length is ~4000 characters, I
think.

------
wensley
I was looking for something like this last week so was interested to try this.
It's just crashed in Firefox with the fans also kicking up to max. It worked
in Chrome with a 31% reduction in size, but took a while, and the fans kicked
in again.

~~~
KilledByAPixel
I had some issues with long strings, like 5000 or greater. What length string
were you using?

~~~
wensley
22728 characters! Maybe that's way bigger than this is meant for.

~~~
KilledByAPixel
Yes, I'm interested in trying to make it work with longer strings. Worst case,
if it gets a long string it could just split it up into chunks of a length it
can handle. The speed seems to decrease rapidly with string length, eventually
hitting a brick wall; the sweet spot seems to be around 1000-5000 characters.
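The split-it-up idea could be wrapped roughly like this (a sketch of my own;
`crushPart` stands in for whatever compresses one chunk):

```javascript
// Hypothetical chunked wrapper: compress fixed-size slices independently.
// Repetition across chunk boundaries is lost, so the ratio gets worse,
// but each call stays inside the fast range.
const CHUNK = 4000;

function crushInChunks(s, crushPart) {
  const parts = [];
  for (let i = 0; i < s.length; i += CHUNK) {
    parts.push(crushPart(s.slice(i, i + CHUNK)));
  }
  return parts; // decode each part, then concatenate
}
```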

~~~
m712
That means the algorithm's complexity is non-linear. I currently suspect the
JSCrush code, but it could be somewhere else as well.

~~~
KilledByAPixel
It is the JSCrush code. I don't fully understand it, but it's a brute-force
approach to finding the longest repeated substrings.

------
zer0faith
Why?

~~~
anderspitman
One use case where I've put (URI-encoded) JSON in the query string is
parameterizing GET requests with more structured data, since plain
query-string parameters are fairly limited and GET requests don't have a body
like POSTs do.
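For illustration, that pattern might look like this (the endpoint URL and
parameter names here are made up):

```javascript
// Hypothetical endpoint and parameters -- just showing the encoding pattern
const params = { filter: { status: ["open", "stale"], tags: ["ux"] }, page: 2 };

const url = "https://api.example.com/search?q=" +
  encodeURIComponent(JSON.stringify(params));

// the server side reverses it with:
//   JSON.parse(decodeURIComponent(queryValue))
```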

~~~
zer0faith
Can't you just encode it with URL-safe base64 and get the same result? Aren't
URL strings limited in their length as well?

~~~
KilledByAPixel
As far as I know, base64 is not guaranteed to be URI-safe, though most of the
time you should be fine. More importantly, though, converting to base64
automatically increases the size by 33%.

[https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding](https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding)

~~~
CodesInChaos
The parent poster was referring to the "base64url" variant of Base64 which
uses `-` and `_` as the two special characters and leaves out the useless
padding, making it url safe. (The 33% expansion still applies though)

[https://tools.ietf.org/html/rfc4648#page-7](https://tools.ietf.org/html/rfc4648#page-7)
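A sketch of that variant (the helper names are mine; recent Node versions also
accept "base64url" directly as a Buffer encoding):

```javascript
// Convert raw bytes to the RFC 4648 base64url alphabet: swap '+' -> '-'
// and '/' -> '_', and drop the '=' padding.
function toBase64Url(buf) {
  return buf.toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

function fromBase64Url(s) {
  // Node's base64 decoder tolerates missing padding
  return Buffer.from(s.replace(/-/g, "+").replace(/_/g, "/"), "base64");
}
```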

------
jermaustin1
I like this as a compression method, but I'm not sure how I feel about
sticking it in a URL.

~~~
KilledByAPixel
It worked great for my use case [http://zzart.3d2k.com](http://zzart.3d2k.com)

It reduces share URLs by about 75% for these very repetitive JSON strings.

