
Unstructured Datatypes in Postgres – Hstore vs. JSON vs. JSONB - craigkerstiens
https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/
======
Eridrus
I really wish someone would write an article about when to use regular columns
instead of JSONB and whether you should split your data into multiple JSONB
columns, etc.

So far the only thing I've figured out is that updating pieces of a JSONB
structure seems like a bad idea at the moment since it will not always succeed
(notably you can't add entries to an array once it has 999 elements already).
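
For what it's worth, the usual way to do a partial update in 9.5+ is
`jsonb_set` (or the `||` merge operator), and either one rewrites the whole
value on disk anyway; a sketch, assuming a hypothetical `events` table with a
`data` jsonb column:

    -- replace one key inside the document
    UPDATE events
       SET data = jsonb_set(data, '{status}', '"done"')
     WHERE id = 1;

    -- merge in new top-level keys with the concatenation operator
    UPDATE events
       SET data = data || '{"retries": 3}'::jsonb
     WHERE id = 1;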

In my case I've given each analysis tool I run, and each API I query, its own
JSONB column, so each only ever needs to be written once (or, if it changes,
gets overwritten entirely). I also keep a few normal columns that contain some
metadata, but I'm not sure if this is a good approach.

~~~
hobs
Your idea of a mixed system is perfectly fine: it has the best of both
approaches, keeping stored state (which may be hard to compute, and/or
complicated to figure out how to store) in the JSONB column, while data that
is often updated lives not inside the JSONB column but related to it, either
via a foreign key or another column in your table.

The thing you want to avoid in that type of system is having data in both
locations that you need to keep in sync; that's a bad sign, and it probably
means you need to move some data out of the JSONB column.

Also, I would almost never choose anything but normal row data unless I
absolutely had to, and would produce the JSON at the end of the pipeline.
However, your situation involves a special type of work you can save, and it
would be silly to redo it all over again on each request.
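
A sketch of that kind of mixed layout (all names here are made up):

    -- Hot, frequently-updated fields stay relational; expensive-to-recompute
    -- tool output lives in one JSONB blob per tool, overwritten wholesale.
    CREATE TABLE analyses (
        id          bigserial PRIMARY KEY,
        subject_id  bigint NOT NULL,    -- FK to the thing being analyzed
        status      text NOT NULL,      -- frequently updated: keep it relational
        updated_at  timestamptz NOT NULL DEFAULT now(),
        tool_a_out  jsonb,              -- written once or replaced entirely
        tool_b_out  jsonb
    );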

------
b0ti
Having to work with unstructured datatypes in Postgres and other RDBMS
products was quite hard a few years back, now with hstore and JSONB in
Postgres this has gotten much better but it still feels a bit unnatural. When
we started the Raijin database project
([http://raijindb.com](http://raijindb.com)) one of the goals was to make it
possible to treat these extra attributes (that you would shove into JSONB with
postgres) as normal "columns" so you can do this:

    
    
       CREATE TABLE tbl(eventtime datetime);
       INSERT INTO tbl {"eventtime":"2016-07-20 10:37:12","foo":"bar","whatever":"xx"};
       INSERT INTO tbl {"eventtime":"2016-07-20 10:38:22","foo":"bar2","intfield":42};
       SELECT foo, intfield FROM tbl;
       --
       {"foo":"bar","intfield":null}
       {"foo":"bar2","intfield":42}
    

It won't complain about the missing 'columns'.

Anyway, glad to see that Postgres is steadily making progress in this area.

~~~
happyslobro
Why does PG create fields with null values? Intfield was undefined when it was
inserted for the `"foo": "bar"` row. Does this mean that if you insert 10K
JSON rows, and each one has a unique field/key, then a SELECT * will return
10K rows of 10K fields each, most of which are null?

Having the DB automatically insert null fields also makes it harder to use
Javascript's `Object.assign` to populate undefined fields with default values:

        Object.assign({a: "default-a", b: "default-b"}, {a: "db-a"})
        // {a: "db-a", b: "default-b"}

VS

        Object.assign({a: "default-a", b: "default-b"}, {a: "db-a", b: null})
        // {a: "db-a", b: null}

null and undefined are two different concepts in JS. It is like the difference
between "there is nothing here" and "I don't know if there is anything here".

~~~
b0ti
Note that this is not PG; Raijin is a new implementation and is not built on
Postgres. Otherwise the question is valid. Raijin does not store NULL values,
though. In the above example a specific field (column) was selected, and that's
why it appears as null. Perhaps the output would be more appropriate this way
(a modification for us to consider):

    
    
      {"foo":"bar"}
      {"foo":"bar2","intfield":42}
    

Note that when you select the full record with the star it does not return
null values:

    
    
      select * from tbl;
      --
      {"_id":1,"eventtime":"2016-07-20 10:37:12","foo":"bar","whatever":"xx"}
      {"_id":2,"eventtime":"2016-07-20 10:38:22","foo":"bar2","intfield":42}
    

The undef/exists thing can get a bit confusing when you are mapping this into
SQL.

------
gdeglin
We store key/value pairs in Hstore, since JSONB wasn't available when we
started doing this.

We've since updated to PostgreSQL 9.5. Does anyone know if we would see a
performance advantage switching this to JSONB (either in selects or updates)?

My sense is no, but I haven't been able to find any good answers one way or
the other. If it's yes, then HStore doesn't really serve any purpose at all
anymore.

~~~
_pmf_
> Does anyone know if we would see a performance advantage switching this to
> JSONB (either in selects or updates).

I don't actually know anything from practical usage, but I think JSONB makes
some kinds of membership tests more efficient (at the price of more expensive
inserts). See 8.14.3. of [https://www.postgresql.org/docs/9.4/static/datatype-
json.htm...](https://www.postgresql.org/docs/9.4/static/datatype-json.html)
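
Concretely, the tests in question are the containment and existence operators,
which a GIN index on the column can serve; a sketch:

    CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);
    CREATE INDEX docs_body_idx ON docs USING GIN (body);

    -- containment: documents that contain this key/value pair
    SELECT id FROM docs WHERE body @> '{"email": "foo@bar.com"}';

    -- existence: documents whose top-level object has the key at all
    SELECT id FROM docs WHERE body ? 'email';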

------
thinkingkong
Summary: use jsonb unless you need non-json. Then use Hstore.

~~~
drob
Or if you need GiST indexing, can't wait for someone to write the bindings for
JSONB, and have non-nested data, in which case hstore is probably the right
choice.

------
chrismorgan
I’ve wondered before: why does hstore still exist? I get that JSON adds
something over JSONB, but as far as I can tell (without ever having used
either of them) hstore functionality is strictly inferior to JSONB; if that is
the case, why would hstore not be deprecated and marked for eventual removal?

~~~
ht85
hstore has some stuff that's missing from JSON.

My favorite hstore feature is that you can produce a diff simply by using the
minus operator. For auditing a table, in an ON UPDATE trigger you can simply
do

        previous_values = hstore(OLD) - hstore(NEW)

previous_values will only contain columns that have changed, with their old
values.

You can then restore the row to its previous state by doing

        PREVIOUS_ROW = CURRENT_ROW #= previous_values
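
Spelled out as a full trigger, the pattern might look something like this (the
`audit_log` table, `my_table`, and the function name are all made up):

    -- assumes CREATE EXTENSION hstore;
    CREATE TABLE audit_log (
        tbl        text,
        changed_at timestamptz DEFAULT now(),
        old_values hstore
    );

    CREATE OR REPLACE FUNCTION audit_changes() RETURNS trigger AS $$
    BEGIN
        -- keys whose values changed, with their OLD values
        INSERT INTO audit_log (tbl, old_values)
        VALUES (TG_TABLE_NAME, hstore(OLD) - hstore(NEW));
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER my_table_audit
        AFTER UPDATE ON my_table
        FOR EACH ROW EXECUTE PROCEDURE audit_changes();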

~~~
chrismorgan
Any real reason why JSONB shouldn’t support those operators? (Does it do
something else with them?)

~~~
ht85
hstore is very different from JSON in the sense that it is a flat key/value
structure, which is exactly what a record/row in the database is. This is why
you have operators to convert and combine hstore values and actual records.

The ability to diff might come at some point, but interest in it is probably
low because of the purpose of JSONB. Diffing works best for similar
structures, and if you're storing things with similar structures in Postgres,
I doubt you're going to use JSONB for them. Diffing deep trees is also very
costly.

------
TylerE
Examples would make this a lot more interesting. Don't just say "you can query
for events with email foo@bar.com", show me the operators!

~~~
craigkerstiens
Good feedback. In general we thought there are enough articles out there about
the various operators that you can use with JSONB, such as
[https://hasura.io/blog/the-unofficial-guide-to-jsonb-
operato...](https://hasura.io/blog/the-unofficial-guide-to-jsonb-operators-in-
postgres/). We may take another pass through, though, to see if we can add
some more complex examples to highlight what's possible.
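
For the curious, the most common operators look like this (against a
hypothetical `events` table with a `data` jsonb column):

    -- -> returns jsonb, ->> returns text, #>> walks a path to text
    SELECT data->'user'          FROM events;
    SELECT data->>'email'        FROM events;
    SELECT data#>>'{user,email}' FROM events;

    -- filtering; the @> (containment) form can use a GIN index
    SELECT * FROM events WHERE data->>'email' = 'foo@bar.com';
    SELECT * FROM events WHERE data @> '{"email": "foo@bar.com"}';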

------
darklajid
Would it be possible to store (and query) date/time values as well?

I have a use case where either HStore or JSONB might be neat (The most basic
requirement is K/V, so HStore might be good enough. I could use nested
structures as well though). Unfortunately some of the values I need to store
are datetime values and I'd need to query based on those:

"Give me all documents where the datetime is from this year"

The article claims HStore is basically just strings (might work for bool as
well, but not numbers/datetimes). JSON can express strings, numbers, booleans
- but not datetime values.

Has anyone done something similar? Is that even possible?

~~~
ZenPsycho
why not make the date values a real column? does it need to be _inside_ the
JSON?

~~~
darklajid
The use case I'm looking at is a legacy application that allows you to define
various 'dynamic' fields. Think document, extracted data and you can expose
fieldA, fieldB and fieldC, while another user wants D, E and F.

Right now that application stores that in a rather messy way. Exposing this as
a table field would mean that configuration changes (New field/a different
field) causes a structural change (alter table). Or you have to go down the
'Date1, Date2, Date3, Number1, Number2' ... path.

In short: I see no way to expose these pieces of information in fields without
ugly hacks, hence my interest in HStore/JSON(B) - but datetime based filters
would be quite nice to have..

~~~
s_kilk
Re datetime filters:

If the datetimes are ISO formatted strings, would regex matching not work for
basic filtering?

    
    
        WHERE json_field->>'datetime' SIMILAR TO '2016-07%'
    

Of course, it'll probably be slow-tastic, and less convenient than native
datetime manipulation, but hey, maybe it's enough to get unblocked?
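
Another option, assuming the values really are ISO-formatted strings: pull
them out as text and cast, which makes proper range queries possible (and,
wrapped in an IMMUTABLE function, even an expression index). Here `docs` and
`body` are placeholder names:

    SELECT * FROM docs
     WHERE (body->>'datetime')::timestamptz >= '2016-01-01'
       AND (body->>'datetime')::timestamptz <  '2017-01-01';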

------
mynameislegion
Does jsonb work with CitusDB? Like can I have a sharded and indexed cluster of
postgres dbs? That'd be really awesome if they work together.

~~~
craigkerstiens
Yep, JSONB works great with Citus. It's a model we've seen a few customers
follow, and the examples towards the end of the article are the two common use
cases where we see it most: event streams where data doesn't always exist
across all columns, and cases where schemas vary a bit but you still need a
shard key and a data model that can be shared across them.

------
koolba
One other (big) difference between json and jsonb is that the former preserves
extra white space. The exact text string of the json is stored. If that
matters to you, say you're taking a signature of the original value, then you
should use non-b json. Otherwise, stick to jsonb.
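
A quick way to see the difference from psql:

    SELECT '{ "a":   1 }'::json;   -- { "a":   1 }  (input text preserved)
    SELECT '{ "a":   1 }'::jsonb;  -- {"a": 1}      (normalized form)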

~~~
dexwiz
Why does whitespace (outside of string values) ever matter in JSON? Most
servers will strip as much whitespace as possible to reduce response size.
Also stripping before signing is pretty trivial.

~~~
angryasian
just throwing something out, but if you're allowing users to enter in their
own json.

~~~
dexwiz
Why would you want to maintain extra whitespace (again outside of text
strings) after a save? Any sort of JSON editor is going to have some sort of
formatting function that could restore it on load.

~~~
yeukhon
One example: naively implementing a text editor like GitHub's Gist.

You can try to be clever and encode the incoming data to reduce whitespace,
but the business requirement is that whatever the user entered is what the
user expects to get back. So your best bet is to save the entire text.

~~~
rvense
But in that case, should you really be storing the JSON as JSON? Isn't it just
a string as far as you're concerned? You're not going to be querying the
user's JSON, are you?

~~~
yeukhon
You are correct, and it depends on how you want to save your data (for better
or worse!). The article does use logs as an example; perhaps that would be a
more convincing case.

------
cobbzilla
My admittedly anecdotal experience working with JSONB in Postgres: coding is
generally easy and performance is great; it even plays nice with ORMs like
Hibernate (with just a bit more work). The downside is the query syntax: for
complex queries where you need to search within the JSON, it's hard to write
correct SQL. It's a trial-and-error process, and the error messages aren't
that great. Once you get it working you're fine, but it can be hard to get
there sometimes.
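
As an example of the kind of query that takes some trial and error, searching
nested structure usually ends up mixing path operators with casts (table and
field names here are made up):

    -- ->> and #>> yield text, so numeric comparisons need an explicit cast
    SELECT id
      FROM events
     WHERE data #>> '{user,email}' = 'foo@bar.com'
       AND (data->>'score')::numeric > 0.5;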

------
tmaly
I had used postgresql on a previous project. Now with my current side project,
I developed the initial version using just Redis and some BoltDB.

After figuring out exactly what data I need to store, I decided to move back
to PostgreSQL. I may still use Redis, as I like the geo-location features that
come right out of the box. Recompiling the extensions for PostgreSQL is not my
favorite thing.

