

Ask HN: Indexing JSON data: thoughts - wilsonfiifi

Are there any technical drawbacks if one should index JSON data in the following manner for storage in a SQL&#x2F;NoSQL backend?<p>JSON:<p><pre><code>  {&quot;user&quot;:&quot;Jason&quot;, &quot;skills&quot;:[&quot;combat&quot;,&quot;evasion&quot;]}
</code></pre>
Index:<p><pre><code>  depth | path       | value                | type
  --------------------------------------------------
  0     | user       | &quot;Jason&quot;              | string
  0     | skills     | [&quot;combat&quot;,&quot;evasion&quot;] | array
  1     | skills.[0] | &quot;combat&quot;             | string
  1     | skills.[1] | &quot;evasion&quot;            | string
</code></pre>
Here&#x27;s a sample code implementation in Python:
https:&#x2F;&#x2F;gist.github.com&#x2F;johnwilson&#x2F;9009797
======
lutusp
> Are there any technical drawbacks if one should index JSON data in the
> following manner for storage in a SQL/NoSQL backend?

It depends on whether the record you chose is typical. It's a good idea to
scan all the JSON records and detect any surprises before committing to a
given table design.

And there's no reason to store the array's contents twice. A properly designed
table should allow the fields to be accessible regardless of how they're
stored. It's up to you whether to store them as separate (and separately
named) fields or as an array. But some databases, MySQL included, don't have
an array datatype, so if you decide to use MySQL you would want to break out
each array member as a separate string field.

Finally, you need to add a primary key field to your design, for multiple
reasons.

~~~
wilsonfiifi
The array is stored in full and it's contents are also parsed and stored
separately to enable the following queries (pseudo code) to be performed:

Get user's skills:

    
    
      Select row where path == 'skills' and 'objected' column == (uuid)
    

Get user's skill at index 0:

    
    
      Select row where path = 'skills.[0]' and 'objected' column == (uuid)
    

Get all users whose skills contain 'combat':

    
    
      Select rows where path starts with 'skills.[*]' and 'combat' in 'values' column

~~~
lutusp
Yes, that's fine, unless you plan to use a database engine like MySQL that
doesn't support arrays. You have the option of using an SQL statement that
doesn't require the fields to be in the same array.

In other words, don't let your logic be defined by something that databases
don't support.

------
wilsonfiifi
I should probably emphasize that the above table is a representation of the
index in memory before submitting to the database persistence function.

------
gcb0
this is so off topic that i cant resist to reply.

. add a primary key.

. change depth to a parent schema

. count depth on the business logic. all the data layer should care is
immediate hierarchy

