

Why HStore2/jsonb is the most important patch of PostgreSQL 9.4 - andreypopp
http://www.databasesoup.com/2014/02/why-hstore2jsonb-is-most-important.html

======
jeremyjh
Chasing flavor-of-the-month seems like a pretty lost cause. I would like to
see the patch in as well, but this seems like a pretty terrible reason for it.

~~~
wisty
AJAX was flavour of the month in 2004. JSON has been flavour of the month
since ~2006.

While that makes it about 10 years younger than Postgres and MySQL, it's still
not a flash in the pan.

~~~
jeremyjh
You misunderstand me. I am not commenting on the nature of the feature, but
rather the motivation cited in the SP for holding the release open to get it
in. That reason could be summarized as "node.js and mongo!!".

------
jmnicolas
PostgreSQL is the best open source relational database and I hope it remains
that way.

While I don't mind seeing some non relational additions like JSON, it
shouldn't be detrimental to Postgres relational core.

It's not a pissing contest : if MongoDB is more popular than Postgres for non
relational (NoSql) data so what ?

It's not like Postgres is a commercial product and need wider adoption to keep
the parent company afloat.

~~~
masklinn
> While I don't mind seeing some non relational additions like JSON, it
> shouldn't be detrimental to Postgres relational core.

The patchset in question is an improvement of hstore[0] (typed hstore, nested
hstore) and unification of json and hstore (by using the binary hstore format
for both). There is no more detriment to the relational core than when hstore
or XML were originally added. Also, next-GIN[1] by the same people[2], not
sure it's part of the patch mentioned.

[0] [PDF warning] [http://www.sai.msu.su/~megera/postgres/talks/hstore-
dublin-2...](http://www.sai.msu.su/~megera/postgres/talks/hstore-
dublin-2013.pdf)

[1] [PDF warning]
[http://www.sai.msu.su/~megera/postgres/talks/Next%20generati...](http://www.sai.msu.su/~megera/postgres/talks/Next%20generation%20of%20GIN.pdf)

[2] Who previously contributed GiST and GIN, FTS, ltree or intarray

~~~
jeltz
The GIN improvements were submitted as a set of separate patches and several
of them have been merged into 9.4.

~~~
masklinn
Good to know, thanks a lot for the info.

~~~
jeltz
I know that the GIN compression patch has been committed and some of the patch
for speeding up searches by skipping posting list entires, but I have not seen
anything about the third part, speeding up FTS by adding ordering info to the
indexes.

[https://commitfest.postgresql.org/action/patch_view?id=1357](https://commitfest.postgresql.org/action/patch_view?id=1357)
[https://commitfest.postgresql.org/action/patch_view?id=1382](https://commitfest.postgresql.org/action/patch_view?id=1382)

------
einhverfr
I am interested in it for another reason.

I am working on a not-yet-released (needs test cases!) CPAN module called
PGObject::Type::JSON.[1] This module would let you effectively grab JSON types
from the db, serialize and deserialize on the db query so you don't have to
worry about it.

The serialization and deserialization is cool, but the question is what you
can do with it. With the json functions if they become more functional, we
could use JSON as an input to stored procedures, allowing arbitrarily complex
data types to be fed into or out of stored procedures easily without having to
serialize in tuple/array format.

A single, cannonical, well supported format would be huge in that it would
allow you to do things with the database with extraordinary ease that are not
trivial to do today. For example, really good json support + composite types
as tuple elements (including arrays of composite types) + really good indexing
support should give you an ability to approximate the power (for at least some
workloads) of full custom type development, without needing to go to C.

[1] [https://github.com/einhverfr/PGObject-Type-
JSON](https://github.com/einhverfr/PGObject-Type-JSON)

------
dorfsmay
I don't tkink people care so much for storing json, rather than "talk" json to
the database. A different interface with simple insert and update with json is
typically what people are after. A sort of ORM that simplifies inserts and
update (and retrieval per primary key even) where data is provided as json
would be a much better tool to attract the newer generation of programmers.

And like some of the comments said, PLEASE auto-partitionning!

~~~
einhverfr
Agreed. How about something like:

"Here is a financial transaction. Please make sure it is balanced and post
it."

This is not trivial to do right now. Top-notch JSON support would make it a
breeze.

~~~
jeltz
Just having a JSON type should fix this without the need for a JSON based
protocol. Actually the current JSON support in 9.3 should be enough for this.
Just call a stored procedure with the JSON object describing the transactions.

~~~
einhverfr
It's reasonably easy to wrap the SQL involved in calling stored procedures.[1]

The JSON in 9.3 is _almost_ good enough for this. It doesn't handle nested
data structures to sql nested complex types properly, and so it won't quite
get you there but otherwise agreed.

I.e. to make it work currently you have to slice and dice the JSON in some fun
ways first. I find it less painful to just handle the complex type
serialization/deserialization on the client side.

[1] See [https://github.com/einhverfr/PGObject-Simple-
Role](https://github.com/einhverfr/PGObject-Simple-Role)

~~~
dorfsmay
Yes, it is easy, but people using Mongo, CouchDB etc... will argue that not
having to do it at all is better, and will use that argument to justify their
choice.

Same idea as ORMs... Making any CRUD operation is easy in SQL, why did we need
ORMs? Yet, there is a fragment of the population who wouldn't use a database
they couldn't talk to through an ORM. Today, there is a segment of programers
that won't use a database if they can't PUT({"name": "Big Bird", "address":
"123 Sesame street"}), because, you know, that's the modern way to do things.

~~~
einhverfr
It loses the power of SQL which may also seem to be the modern way to do
things ;-)

~~~
dorfsmay
No. Adding support for a json NoSQL like interface doesn't remove the SQL
access. You could let the front end guys do Create Update Retrieve by primary
key using the json interface, while the backend analysis guys use SQL through
the normal SQL driver.

~~~
einhverfr
I think you misunderstood my point. The "modern" way seems to be to avoid SQL
using either NoSQL or something like an ORM.

Personally I prefer to tackle this from the other side with stored procedures
and service locators so my application code is free of SQL (aside from the
service locator), and all my SQL is in .sql files.....

------
chrisfarms
I'd also add that SPA[1]/client-side applications are also an important
consumer of databases with decent JSON support.

The "backend" for these types of applications are often relatively thin API
wrappers around a database, with the bulk of application logic on the client-
side.

Postgres' JSON improvements over the last couple of years have made it a
pretty good choice already for these sorts of things, but the JSON indexing
improving would make this a no-brainer.

[1] [http://en.wikipedia.org/wiki/Single-
page_application](http://en.wikipedia.org/wiki/Single-page_application)

------
pjmlp
Who cares about node.js? I am yet to see any Fortune 500 deployment on our
consulting projects.

Following fashion usually doesn't lead to good results.

~~~
spikels
Walmart, eBay/PayPal, Yahoo, Amazon/Zappos, LinkedIn, CBS, Time Warner, GM,
DowJones, Microsoft....

And those a just the ones I have happened to hear about with major node.js
projects.

~~~
pjmlp
Those are Fortune 50, not Fortune 500. Note the extra digit.

------
justincormack
Postgres has never been popular. Why the sudden race for popularity now?
MongoDB is replacing MySQL not Postgres. I would be interested to know what
proportion of Postgres users use the JSON features, I suspect it is still
small, even amongst the Node users. The popularity of Postgres is growing at a
manageable rate, not an exploding rate and that seems fine to me.

~~~
venus
> Postgres has never been popular. [..] MongoDB is replacing MySQL not
> Postgres

I don't know what you are basing that on, but it's the complete opposite of
what I am seeing. Postgres is now the default open source database as far as I
can see, especially for Rails. MongoDB enjoyed a brief hype-fuelled day in the
sun but is now viewed as a specialised tool whose choice over a more general
RDBMS would require considerable justification.

I suspect the question of whether its popularity is "exploding" or not depends
on your definition of the word. I have certainly noticed a very marked uptick
in the last few years. Not exploding perhaps, but certainly ballooning.

~~~
justincormack
Thats what I meant by "growing at a manageable rate".

Well maybe Postgres and Mongo are replacing MySQL. I haven't seen Postgres
being the default, but who knows, I mostly hang around with people who have
been long term postgres users anyway.

~~~
venus
I suppose my main experience is with rails people, and as of rails 4 postgres
is pretty clearly the "default" DB, emphatically displacing MySQL.

The author appears to come from much more of a node.js angle and so probably
what he sees is different. I'm probably underestimating the number of node.js
projects so the author likely has a good point. Still, I'd raise an eyebrow
pretty highly at anyone who considered MongoDB a solid default choice for a
general purpose application. Most companies I know who opted for MongoDB
regret it, for reasons that have been rehashed here any number of times.

------
pfraze
The node-pg driver by brianc is in good shape; I'm not sure why OP sees it as
a major issue. Feel free to improve it, though.

------
freyr
Could someone fill me in about what's lacking with the node pg driver by
brianc?

~~~
spikels
My understanding is that it will need to be updated to handle all the new
hstore2 features.

------
saosebastiao
I think this addition is extremely valuable, but courting node users is a weak
argument for it. For me, it is more about the growing importance of semi-
structured data, as well as other forms of data that are difficult to model
outside of row-tuples and joins. Hstore was a hack...really useful, but a hack
nonetheless. Hstore2 is a legitimate kv-store. Now if we could just do the
same with a graph-store.

------
sheff
The linked post says : "I think we shouldn't release 9.4 unless it goes in."

I haven't been following the PG developers mailing list that closely recently
and the article doesn't go into any details as to why it may be delayed.

From the link that @masklinn posted (Thanks, very interesting!) to a hstore
presentation, it looks like the hstore side was developed by November last
year, so I'm wondering why it wouldn't be ready for release by this September
or whether its just the integration of hstore/JSON side of things that is the
hold up.

For me, the new PG feature - although its an outside project - which I am most
looking forward to in the 9.4 time frame is a newer JDBC driver for Postgres (
[https://github.com/impossibl/pgjdbc-ng](https://github.com/impossibl/pgjdbc-
ng) ).

~~~
masklinn
[http://postgresql.1045698.n5.nabble.com/jsonb-and-nested-
hst...](http://postgresql.1045698.n5.nabble.com/jsonb-and-nested-hstore-
td5788891.html) seems to be the relevant threads. Apparently there are issues
over wire format, and jsonb being in core (and unless I'm misreading things
jsonb is to be a new type separate from json, which is unwelcome news, I
didn't know about that although it does make sense from a "binary copy of
database files" sense)

------
sandGorgon
I would like to suggest a modification to the author's thought about an ORM.
What would be very cool would be a PG extension that wraps a mongodb
compatible Api on top of postgresql.

That should completely short circuit the whole value proposition discussion of
pg vs mongo.

~~~
twic
This could be done fairly easily, perhaps more easily, as a standalone
program, i think. Something that exposes a MongoDB-compatible API, and
translates requests into libpq to go to a PostgreSQL backend.

You could run it either on the PG host, or on the same host as the apps
wanting to talk to PG.

~~~
brannon
My understanding of all of this is beginner level at best, but isn't that more
or less what waterline does? [https://github.com/balderdashy/waterline-
docs](https://github.com/balderdashy/waterline-docs)

~~~
twic
Based on a quick look at the documentation, i think the API Waterline exposes
to clients is not an imitation of the MongoDB API, which would allow
PostgreSQL to be used as a drop-in replacement for MongoDB. Rather, it's its
own thing.

Waterline does support both MongoDB and PostgreSQL, though, so you could use
it as part of a migration strategy: start with MongoDB, install Waterline
backed by MongoDB, migrate your app to use Waterline rather than using MongoDB
directly, replace MongoDB with PostgreSQL.

------
rdtsc
Just by observing from the outside, Node.js and MongoDB, don't know what it is
about those two platforms, but the communities formed around them seem to
attract a large number of people I wouldn't want to interact with. There is a
lot of immaturity and drama. Maybe it is the unreasonable and over-hyped
marketing, not sure, can't quite put my figure on it.

Go is also a new language and framework that is popular but it has a different
feel, a better one. Python so far, has one of the largest and most pleasant
community (given its size) I've seen.

Json/KV friendly features are welcome, but perhaps looking to impress Node.js
and MongoDB users is just barking at the wrong tree.

------
collyw
OK, I now know why it is the most important patch. Is there anywhere I can
read what it is what it does exactly?

~~~
spikels
Maybe try searching for "Postgres 9.4 hstore". Just a thought but I'm to lazy
to try. ;)

