
Ask HN: Is there a way to compress repeated JSON objects? - lonelycoder2
Let's say you have a JSON document:

    {
      "items": [
      {
        "foo": "foo bar",
        "bar": {
          "desc": "some really long data",
          "other": "lots more data"
        }
      },
      {
        "foo": "not foobar",
        "bar": {
          "desc": "some really long data",
          "other": "lots more data"
        }
      }]
    }

Where 'bar' is usually the same in every item. Is there a system for compressing JSON that uses pointers or something, so that the 'bar' objects aren't repeated and each item only holds a reference to the data?
======
viraptor
It depends what you mean by compressing. If you pass this through
deflate/gzip/some other standard compression, `bar` will compress very well.
So if you can use standard compression either in transport or when saving the
file, that will give you the best result.
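To see how well standard compression handles this kind of repetition, here is a quick sketch using Python's stdlib `gzip` on a payload shaped like the one in the question (the item count and field values are just illustrative):

```python
import gzip
import json

# A payload in the shape of the question: many items sharing
# an identical "bar" object.
payload = {
    "items": [
        {
            "foo": f"item {i}",
            "bar": {"desc": "some really long data",
                    "other": "lots more data"},
        }
        for i in range(100)
    ]
}

raw = json.dumps(payload).encode("utf-8")
compressed = gzip.compress(raw)

# The repeated "bar" objects are exactly the kind of redundancy
# DEFLATE eliminates, so the ratio is dramatic.
print(f"{len(raw)} bytes raw, {len(compressed)} bytes gzipped")
```

Because every repeat of `bar` after the first becomes a cheap back-reference in the DEFLATE stream, the output stays small no matter how many items repeat it.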

If you want to do references within json, you'd have to implement something
yourself. There are lots of different solutions to this already. If you use
only references, you could simply say this field includes the id of the
structure you need, like:

    
    
        {
          "_refs": {
              "abc": {"desc": ..., "other": ...}
          },
          "items": [
              {"foo": "foo bar", "bar": "abc"},
              ...
          ]
        }
    

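A minimal sketch of what the receiving side might do to expand such references — the `_refs` key and field names follow the example above; the exact scheme is whatever you define yourself:

```python
import json

def expand_refs(doc):
    """Replace string-valued 'bar' fields with the shared object from _refs."""
    refs = doc.pop("_refs", {})
    for item in doc.get("items", []):
        if isinstance(item.get("bar"), str):
            item["bar"] = refs[item["bar"]]
    return doc

doc = json.loads("""
{
  "_refs": {
    "abc": {"desc": "some really long data", "other": "lots more data"}
  },
  "items": [
    {"foo": "foo bar", "bar": "abc"},
    {"foo": "not foobar", "bar": "abc"}
  ]
}
""")
doc = expand_refs(doc)
```

Note that after expansion all items point at the same in-memory object, which also gives you the memory saving on the consuming side (but means a mutation through one item is visible through the others).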
If you want some system where you can use either the text or id, have a look
at something like AWS CF format. There, values that look like `{"Ref":
"SomeName"}` are replaced with values of the reference, but you can just have
"foo" in place of that object and it will still work.

If you want to keep that value a string for some reason, you can also use
namespacing like AWS does: `{"Fn::Base64": {...}}` for example gets
interpreted as a value of that function rather than a standard object. Using
"ref::..." as a prefix for "this is not a standard value" could work for you.
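A sketch of that prefix convention in Python — the `ref::` prefix and the `REFS` lookup table are just the assumed names from the suggestion above:

```python
# Hypothetical table of shared structures, keyed by id.
REFS = {"abc": {"desc": "some really long data", "other": "lots more data"}}

def resolve(value):
    """Recursively replace 'ref::<id>' strings with the referenced object;
    any other string passes through unchanged."""
    if isinstance(value, str) and value.startswith("ref::"):
        return REFS[value[len("ref::"):]]
    if isinstance(value, dict):
        return {k: resolve(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v) for v in value]
    return value

doc = {"items": [{"foo": "foo bar", "bar": "ref::abc"},
                 {"foo": "not foobar", "bar": "ref::abc"}]}
expanded = resolve(doc)
```

The nice property is exactly the one described: a plain string like `"foo"` doesn't match the prefix, so it still works unmodified.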

------
jimsmart
What is your goal?

If your goal is to reduce the amount of data that goes across the wire, just
make sure you have gzip enabled on your server - the compression algorithm
will take care of that for you.

Otherwise (if you are trying to reduce memory usage), you probably want to
split them out, as say "sub-items", and refer to them by name - but there's no
magic / hidden feature that will do that for you. Of course doing this will
make both the client and the server code more complex.

In the majority of cases, I suspect most coders (myself included) would prefer
the simpler code resulting from the former option - but it totally depends on
your goal.

Have you actually measured/profiled the code/data in question? I ask because
this kinda smells of being a premature optimisation (although obviously one
cannot truly say, without more context).

------
Herostwist
If this is for a public API I would simply enable gzip and call it a day.

If this is for internal use you have a few options. Although there is no
automatic way of reusing or referencing objects within JSON, you can construct
your own model using references.

One technique I have used in the past for a smaller memory footprint is to
have a data array and then reference index positions within it.

e.g.

    
    
      {
        "items": [
        {
          "foo": 0,
          "bar": {
            "desc": 1,
            "other": 2
          }
        },
        {
          "foo": 3,
          "bar": {
            "desc": 1,
            "other": 2
          }
        }],
        "data": [
          "foo bar",
          "some really long data",
          "lots more data",
          "not foobar"
        ]
      }
    

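The technique above can be sketched as an encode/decode pair. This is a simplified version that assumes all leaf values are strings (as in the example); the function names and the top-level `data` key are just the conventions used here:

```python
def intern_strings(doc):
    """Replace every string value with an index into a shared 'data' table."""
    table, index = [], {}

    def encode(value):
        if isinstance(value, str):
            if value not in index:
                index[value] = len(table)
                table.append(value)
            return index[value]
        if isinstance(value, dict):
            return {k: encode(v) for k, v in value.items()}
        if isinstance(value, list):
            return [encode(v) for v in value]
        return value

    out = encode(doc)
    out["data"] = table
    return out

def restore_strings(doc):
    """Inverse of intern_strings: look indices back up in the 'data' table."""
    table = doc.pop("data")

    def decode(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return table[value]
        if isinstance(value, dict):
            return {k: decode(v) for k, v in value.items()}
        if isinstance(value, list):
            return [decode(v) for v in value]
        return value

    return decode(doc)

original = {
    "items": [
        {"foo": "foo bar",
         "bar": {"desc": "some really long data", "other": "lots more data"}},
        {"foo": "not foobar",
         "bar": {"desc": "some really long data", "other": "lots more data"}},
    ]
}
roundtrip = restore_strings(intern_strings(original))
```

The big caveat is that decoding treats every integer as a table index, so this only round-trips cleanly when the original values are all strings; a real implementation would need to distinguish indices from genuine numbers.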
If you are free to use alternative serialisation methods, I would recommend
looking at Protocol Buffers.

[https://developers.google.com/protocol-buffers/](https://developers.google.com/protocol-buffers/)

------
EspadaV9
JSON-API [0] does something like this but would require your JSON to be
formatted according to their specification and would then need parsing at the
receiving end. You would end up having some JSON like this

    
    
        {
            "data": [{
                "type": "item",
                "id": "123",
                "attributes": {
                    "foo": "foo bar"
                },
                "relationships": {
                    "bar": {
                        "data": {
                            "type": "bar",
                            "id": "abc"
                        }
                    }
                }
            }, {
                "type": "item",
                "id": "456",
                "attributes": {
                    "foo": "not foo bar"
                },
                "relationships": {
                    "bar": {
                        "data": {
                            "type": "bar",
                            "id": "abc"
                        }
                    }
                }
            }],
            "included": [{
                "type": "bar",
                "id": "abc",
                "attributes": {
                    "desc": "some really long data",
                    "other": "lots more data"
                }
            }]
        }
    

This will initially increase the size of the JSON but depending on the number
of relationships and the amount of data within them it could end up saving
data.
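The parsing needed at the receiving end can be sketched like this — not a full JSON:API client, just the join between each item's `relationships` and the matching entry in `included`, using a shortened version of the document above:

```python
def resolve_included(doc):
    """Merge each item's relationships with the matching 'included' resource."""
    lookup = {(r["type"], r["id"]): r["attributes"]
              for r in doc.get("included", [])}
    items = []
    for res in doc["data"]:
        item = dict(res["attributes"])
        for name, rel in res.get("relationships", {}).items():
            ref = rel["data"]
            item[name] = lookup[(ref["type"], ref["id"])]
        items.append(item)
    return items

doc = {
    "data": [
        {"type": "item", "id": "123",
         "attributes": {"foo": "foo bar"},
         "relationships": {"bar": {"data": {"type": "bar", "id": "abc"}}}},
        {"type": "item", "id": "456",
         "attributes": {"foo": "not foo bar"},
         "relationships": {"bar": {"data": {"type": "bar", "id": "abc"}}}},
    ],
    "included": [
        {"type": "bar", "id": "abc",
         "attributes": {"desc": "some really long data",
                        "other": "lots more data"}},
    ],
}
items = resolve_included(doc)
```

Since `included` carries each shared resource once, this is where the deduplication happens: both items resolve to the same `bar` attributes.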

As others have said, you could just enable gzip and leave it at that.

0: [http://jsonapi.org/](http://jsonapi.org/)

------
borplk
Firstly as others noted you can use a transport-layer level compression
mechanism like gzip.

The other bit sort of comes down to application level logic.

You can filter down to the unique items and store them in a JSON array (or
object); then, in the other places, instead of repeating the item, use a value
that references it (like an index into the array or a key in the object).
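One way to sketch the key-in-an-object variant is to derive the key from the content itself — the `bars` table name and the hash-based key scheme here are just one possible choice:

```python
import hashlib
import json

def dedupe_bars(items):
    """Move duplicate 'bar' objects into a shared table keyed by content hash."""
    shared, out = {}, []
    for item in items:
        bar = item["bar"]
        # A stable key: hash of the canonical (sorted-key) JSON encoding.
        key = hashlib.sha1(
            json.dumps(bar, sort_keys=True).encode("utf-8")
        ).hexdigest()[:8]
        shared.setdefault(key, bar)
        out.append({**item, "bar": key})
    return {"bars": shared, "items": out}

result = dedupe_bars([
    {"foo": "foo bar",
     "bar": {"desc": "some really long data", "other": "lots more data"}},
    {"foo": "not foobar",
     "bar": {"desc": "some really long data", "other": "lots more data"}},
])
```

Hashing the canonical encoding means identical objects collapse to one table entry even if they were built independently, which is exactly the filter-to-unique step described above.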

------
wuezz
Assuming you deliver the JSON over HTTP(S), you can achieve good compression
on structures like the one in your example by enabling gzip.

------
cocotino
Just enable gzip and call it a day.

