
Postgres 9.4 feature highlight: Indexing JSON data with jsonb data type - ghosh
http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-indexing-jsonb/
======
hliyan
This is terrific. I spent all of September looking at almost every NoSQL DB in
town but nothing could handle the loads we were looking at. Then I found the
Postgres JSON data types, but was sad when I saw it was missing the one thing
that would make such a data type worthwhile -- indexing. Now we have it!

Edit: or should I say: we will have it soon?

~~~
ddorian43
I think you can index even plain json columns; it will just be slower than
jsonb.

~~~
pilif
No. You can't create an index on a json column that will be usable for
arbitrary containment queries, which is the awesome thing provided by jsonb.

What you can already do, however, is create a functional index if you always
search for the same elements in your JSON bodies.

This will then run at the exact same speed as any other btree index (which is
still faster than the gin indexes for jsonb), but at a considerable loss of
flexibility (or a ton of storage usage).
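
That functional-index approach can be sketched like this (the `products`
table and `attrs` column are hypothetical names for illustration):

```sql
-- Hypothetical table with a plain json column.
CREATE TABLE products (
    id    serial PRIMARY KEY,
    attrs json
);

-- Expression (functional) btree index over one specific key.
-- Note the double parentheses required for an expression index.
CREATE INDEX products_screen_size_idx
    ON products ((attrs ->> 'screen_size'));

-- Only queries using the exact same expression can use the index:
SELECT * FROM products WHERE attrs ->> 'screen_size' = '37';
```

Searching on any other key (say, `capacity`) would need its own index.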

~~~
ddorian43
Can you tell me any use cases where you want to index every key in the json?

~~~
pilif
Not _every_.

But assume you have a json field with, let's say, product attributes which
might vary greatly between products based on the category they are in.

Hard drives for example might have a "capacity" field, whereas monitors might
have a "screen size" field. Also assume that you have 10000s of products, 100s
with completely different attributes, and 1000s of distinct attributes across
all products.

Now you want to add a search feature that allows searching for products based
on these attributes (like "give me all products where screen_size=37 inches").

If you don't have any traditional columns with indexes you can use to limit
the number of rows, that query will either cause a sequential scan over all
products, or you will have to create a functional index over the screen_size
field in your JSON document, which then only covers screen_size, but not, say,
the disk_size.

This means you would have to create 1000s of indexes (one for every custom
attribute), which also means that you lose a lot of flexibility: adding a new
custom attribute now means that you have to add another index (or you'll
suddenly have to deal with unexpected sequential scans).

Using jsonb and a gin index, you need exactly one index to find arbitrary
key/value combinations in your json document, giving you the flexibility of
trivially adding new attributes by just storing them in the json document.
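
A sketch of the jsonb/GIN approach, assuming a hypothetical `products` table
with a `jsonb` column named `attrs`:

```sql
CREATE TABLE products (
    id    serial PRIMARY KEY,
    attrs jsonb
);

-- One GIN index covers containment queries on any key.
CREATE INDEX products_attrs_idx ON products USING gin (attrs);

-- Both of these can use the same index, via the @> containment operator:
SELECT * FROM products WHERE attrs @> '{"screen_size": "37"}';
SELECT * FROM products WHERE attrs @> '{"capacity": "2TB"}';
```

New attributes need no schema or index changes; they become searchable as
soon as they appear in a document.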

~~~
mml
I think I ran your exact application (large retailer catalog). We just used
Solr as a sort of external index. Worked like a charm, though it obviously
introduced additional complexity.

~~~
pilif
Yeah. I'm also pondering the use of Elasticsearch in my case.

But the additional complexity is scaring me a bit, especially as the main
strength of the product in question is _not_ showing most of the products it
knows about, so none of the Elasticsearch results could be displayed directly,
but would have to go through additional processing.

Being able to do all of this directly in Postgres will be much nicer, so I'm
really looking forward to 9.4.

~~~
Xylakant
For any kind of product search I'd go for an external index (I'm strongly
biased towards ES, but SOLR is certainly on par feature-wise). Postgres is my
go-to relational database and I do love its json/hstore features, but it does
not offer features such as stemming, aggregation, proper weighting or any
other analyzer-based features.

Adding ES as a secondary datastore just for search is relatively easy in terms
of technical complexity, especially if you already have json data. The more
complex part is deciding on a relevancy model, but you'd have to do that for
every technology you choose.

------
pilif
Now if only we'd finally get that release :-)

Over the last few years, September was always early Christmas for me because
in September we would be getting the new release. And despite this being a
database, their x.y.0 releases were always rock solid (with the exception of
9.2.0 and an issue with IN queries and indexes), so I usually upgrade very
early.

This year, it looks like the release is a bit late, partly caused by some
on-disk storage issues for jsonb. Of course I'd like them to spend all the
time they need, but I'm still very much looking forward to playing with jsonb
as this will provide a nice way of solving some issues I'm having.

I'm not using the beta releases for anything bigger than quickly seeing
whether clients still mostly work because updating between beta releases is a
PITA due to it requiring a full restore most of the time.

The database I'd like to use jsonb with is 600GB in size and restoring a dump
takes 2-6 hours depending on the machine.

~~~
crucialfelix
Does it not work to use the same data dir and use pg_upgrade ?

I'm just about to upgrade from 9.1 to 9.3 and am wondering if that method is
advisable.

~~~
pilif
Yes. That works fine between official releases and I've been using it to
upgrade for the last 3 years.

However, pg_upgrade does not support upgrading between beta releases, which is
one of the reasons why I stick to released versions.
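
For reference, a pg_upgrade run between two installed major versions looks
roughly like this (the paths are hypothetical and depend on your
distribution's layout):

```shell
# Stop both clusters first, then run pg_upgrade as the postgres user.
# -b/-B point at the old and new bin directories,
# -d/-D at the old and new data directories.
pg_upgrade \
    -b /usr/lib/postgresql/9.1/bin \
    -B /usr/lib/postgresql/9.3/bin \
    -d /var/lib/postgresql/9.1/main \
    -D /var/lib/postgresql/9.3/main \
    --link   # optional: hard-link files instead of copying them
```

With --link the upgrade avoids copying the data files, which matters for
databases in the hundreds-of-gigabytes range, but the old cluster can no
longer be started afterwards.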

------
harel
I'm patiently waiting for this release like I might wait for a good new film
or a game. I don't think I was ever this 'excited' (if I can use that word)
about a database point release.

I needed a document database/RDBMS hybrid combo for a piece of work I want to
do, and was going to default to MongoDB but once I heard of 9.4 I decided to
wait and see how it pans out.

------
nl
Does anyone know where to find the 9.4 JSON update[1] features? I was sure
they were coming, but I can't see them in the docs anywhere[2][3]?

[1] [http://stackoverflow.com/questions/18209625/how-do-i-
modify-...](http://stackoverflow.com/questions/18209625/how-do-i-modify-
fields-inside-the-new-postgresql-json-datatype)

[2] [http://www.postgresql.org/docs/9.4/static/datatype-
json.html](http://www.postgresql.org/docs/9.4/static/datatype-json.html)

[3] [http://www.postgresql.org/docs/9.4/static/functions-
json.htm...](http://www.postgresql.org/docs/9.4/static/functions-json.html)

~~~
gdulli
Every reference to jsonb in those docs is the new 9.4 feature/type.

~~~
nl
No, how do you update a field within the JSON doc? That is missing in 9.3.

~~~
gdulli
That's not in 9.4.

------
misiti3780
Does anyone have a link comparing postgres 9.4 jsonb and mongodb ?

~~~
Kassandry
Right here. =) [http://blogs.enterprisedb.com/2014/09/24/postgres-
outperform...](http://blogs.enterprisedb.com/2014/09/24/postgres-outperforms-
mongodb-and-ushers-in-new-developer-reality/)

~~~
tracker1
Now if only I had a baked-in solution in PostgreSQL for a replica set, with
hot master failover and command forwarding, like I get with MongoDB.

Quirky bolt-on solutions, and those requiring a contract with EnterpriseDB or
others need not apply.

I really don't mean for this to be as snarky as it sounds... I'd really love
these features in PostgreSQL along with PLv8, and I'd rather use that than
MongoDB... unfortunately, MongoDB having a good in-the-box solution where
PostgreSQL doesn't is a bigger breaking point in my use case than its lack of
joins.

~~~
moe
_with hot master failover and command forwarding [...] like I get with
MongoDB._

Failover is documented to not work very well in MongoDB (data loss):
[http://aphyr.com/posts/284-call-me-maybe-
mongodb](http://aphyr.com/posts/284-call-me-maybe-mongodb)

On the postgres side RepMgr makes dealing with replication quite easy:
[https://github.com/2ndQuadrant/repmgr](https://github.com/2ndQuadrant/repmgr)

See also this recent thread on the PostgreSQL mailing list:
[http://postgresql.nabble.com/Fully-automatic-streaming-
repli...](http://postgresql.nabble.com/Fully-automatic-streaming-replication-
failover-when-master-dies-td5788393.html)

~~~
Xylakant
I'd be careful quoting the aphyr post in that context. PG also had some
failures, even in the non-replicated setup: [http://aphyr.com/posts/282-call-
me-maybe-postgres](http://aphyr.com/posts/282-call-me-maybe-postgres)

Failures are just a property of distributed systems; some fail in ways that
are better for you, some fail in ways that are worse, but sooner or later they
all fail. Which one you choose is a tradeoff.

~~~
moe
_PG also had some failures_

MongoDB loses _acknowledged writes_ in its supposed "highest consistency"
mode.

PostgreSQL does not.

That's a pretty fundamental difference, don't you think?

~~~
Xylakant
MongoDB was tested in a replicated setup, PG was not; that's a pretty
fundamental difference. Losing acked writes in a master-slave replication with
failover would be expected with a sufficiently large replication lag, even for
PG.

MongoDB is about the last database I'd pick for any given scenario, but I
acknowledge that there are people who have a use case where it might be a good
fit because the tradeoffs work out for them.

~~~
moe
_Losing acked writes in a master-slave replication with failover would be
expected with a sufficiently large replication lag even for PG._

Bullshit.

------
mau
Many people here are comparing this feature of Postgres with MongoDB or other
document-oriented NoSQL dbs. I think the comparison is just wrong and unfair.

Even if I'm amazed by the performance of Postgres for this particular task
(considering also that it is a relatively new feature), I don't think
performance is the reason why people are using NoSQL dbs. The problem that
NoSQL dbs help with is scaling, and I don't see that as a priority for an
RDBMS such as Postgres. Take for instance MongoDB (just because it was named
by many of the other comments here, but I guess the same applies to Couch or
others): it's relatively simple to deploy a cluster with automatic sharding,
replicas, failover, etc., because these are all builtin features.

~~~
andrewstuart2
SQL DBs scale just as well as NoSQL and in the exact same ways: read-only
replicas, sharding, federation, etc. YouTube runs on MySQL, in case you're
worried about scaling. It may be a tad easier with something like Mongo, but
it's not hard with MySQL.

IMO, NoSQL is attractive mostly because it's schemaless. You need to change up
your model all the time, or handle arbitrary datasets but still query them?
NoSQL may be the right tool for the job.

~~~
BenoitP
> handle arbitrary datasets

I was wondering: doesn't the postgres json type fulfill all of MongoDB's
current feature set?

------
therealunreal
It'd be interesting to see how this compares to hstore on flat data.

