This is pretty much the reason we're using Postgres for our document storage. Admittedly, our doc store isn't really extensive and it's mostly for internal use (with limited client access), but we just didn't see why it was necessary to go with a new database when the existing stack worked well.

YMMV, but we've yet to come across an instance where a pure document DB was necessary or desired, even to store documents (which, from my perspective, seem like nothing more than a flattened EAV model).




Do you have any thoughts on expression indexes? I assumed I would have to pull fields out of the document to index reasonably on pgsql, but this presentation at least shows that simple expression indexes perform well. I wonder about compound, nested, etc.
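
For anyone following along, the simple case presumably looks something like this (hypothetical table/column names, assuming 9.3-era json operators):

    -- btree expression index on a single json field
    CREATE INDEX docs_author_idx ON docs ((body ->> 'author'));

    -- compound: multiple extracted fields in one index
    CREATE INDEX docs_author_year_idx
        ON docs ((body ->> 'author'), (body ->> 'year'));

    -- nested: drill down with -> before the final ->>
    CREATE INDEX docs_city_idx
        ON docs ((body -> 'address' ->> 'city'));

    -- the planner uses these when a query repeats the indexed expression:
    SELECT * FROM docs WHERE body ->> 'author' = 'jdoe';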


I'm not too familiar with the current schema, but I do know we make extensive use of partial indexes (WHERE (vnnk IS NOT NULL), WHERE (fhs > 0), etc.) and compound btree indexes. GIN, for example, doesn't support this, but GIN itself works reasonably well for partial matches (its keys are stored in a btree internally), so it's not a big deal. We use a combination of hstore, GIN, conventional fields and multiple indexes, with the schema abstracted away in our app, to provide a document storage and search engine, with a separate app for client login. I believe some metadata is stored in conventional fields (varchar, integer, etc.).
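
To make that concrete, the shape of it is roughly this (names are made up, not our actual schema):

    -- partial btree index: only rows matching the predicate are indexed
    CREATE INDEX orders_open_idx
        ON orders (customer_id, created_at)
        WHERE cancelled_at IS NULL;

    -- GIN over an hstore column for key/value containment queries
    CREATE EXTENSION IF NOT EXISTS hstore;
    CREATE INDEX docs_attrs_idx ON docs USING gin (attrs);

    -- answered via the GIN index:
    SELECT * FROM docs WHERE attrs @> 'status => open';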

Performance is of course noticeably worse than the conventional schema, but not unacceptably so for a few thousand docs and growing (compared to hundreds of millions of rows on the conventional schema).

But such is the sacrifice for reliability. When NASA hit a spate of failed Mars missions (explosions, collisions, miscalculations, etc.), people were joking about the creed of the era: "Faster, Better, Cheaper; pick two."

This kind of engineering is a turn-off for a lot of folks, so I can certainly see the appeal of a document DB. I've also seen people fall into the trap of "I don't know what kind of data I'll be getting, so I'll go with a doc store." Well, there has to be some hint of what kind of data could be arriving, so engineering for that early, with some wiggle room, is a good idea.

You shouldn't take my word or the slides at face value, though. Really think it through, experiment, throw stones at it, etc., and see if it works for you.


Reliability is gold.

Thanks for the pointers. I will read up on GIN and hstore.

If nothing else, duplicating an element from a doc column to a first-class field seems reasonable. Also, even in Mongo one must define indexes, which is a schema of sorts and can be non-trivial to change.
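
i.e., something as simple as this (hypothetical names), plus app code that keeps the copy in sync on writes:

    -- promote a frequently queried json element to a first-class column
    ALTER TABLE docs ADD COLUMN author text;
    UPDATE docs SET author = body ->> 'author';
    CREATE INDEX docs_author_col_idx ON docs (author);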


In addition to this, we have found that the combination of JSON fields, PLV8 JavaScript user-defined functions and function-based indexes works and performs really well. As the parent said, it's not as fast as a fully relational schema, but it's entirely acceptable. We try to stick to relational where speed and/or referential integrity is crucial and just use JSON fields for everything else.
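
A sketch of the pattern, for anyone curious (invented names; note that depending on the plv8 version, a json argument may arrive in JavaScript as a string rather than an object):

    CREATE EXTENSION IF NOT EXISTS plv8;

    -- must be IMMUTABLE to back an index; digs a nested value out of the doc
    CREATE FUNCTION doc_total(body json) RETURNS numeric AS $$
        var doc = (typeof body === 'string') ? JSON.parse(body) : body;
        return (doc.order && doc.order.total) || null;
    $$ LANGUAGE plv8 IMMUTABLE STRICT;

    CREATE INDEX docs_total_idx ON docs (doc_total(body));

    -- hits the function-based index:
    SELECT * FROM docs WHERE doc_total(body) > 100;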


One approach to expression indexes on JSON would be a multicolumn index: column A is the path, e.g. "foo.bar[2].baz", and column B is the value at that path. Of course you'd need to use GIN (which, as mentioned, is slow to build), since one row would map to multiple index entries. But this ought to be a fairly workable approach.
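
One hedged way to approximate this with what exists today: flatten each doc into an array of 'path=value' strings and GIN-index the array (plv8 again; names invented, and array elements come out dotted, e.g. foo.bar.2.baz, rather than foo.bar[2].baz):

    -- walk the document and emit one 'path=value' string per leaf
    CREATE FUNCTION doc_paths(body json) RETURNS text[] AS $$
        var out = [];
        function walk(prefix, v) {
            if (v !== null && typeof v === 'object') {
                for (var k in v)  // arrays yield numeric keys
                    walk(prefix ? prefix + '.' + k : k, v[k]);
            } else {
                out.push(prefix + '=' + v);
            }
        }
        walk('', (typeof body === 'string') ? JSON.parse(body) : body);
        return out;
    $$ LANGUAGE plv8 IMMUTABLE STRICT;

    CREATE INDEX docs_paths_idx ON docs USING gin (doc_paths(body));

    -- containment query against the flattened paths:
    SELECT * FROM docs WHERE doc_paths(body) @> ARRAY['foo.bar.2.baz=42'];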



