
PostgreSQL: Jsonb has committed - r4um
http://obartunov.livejournal.com/177247.html
======
SixSigma
If, like me a moment ago, you have no idea what jsonb is; see here for a full
explanation [http://www.postgresql.org/message-id/E1WRpmB-0002et-
MT@gemul...](http://www.postgresql.org/message-id/E1WRpmB-0002et-
MT@gemulon.postgresql.org)

tl;dr storing json in a way that doesn't mean repeatedly parsing it to make
updates

~~~
atesti
Thanks for that link: It explains that insignificant whitespace, duplicate
keys and key order are not preserved.

Does someone have a link to the storage format? I'm always curious and want to
learn efficient encodings. Thanks!

~~~
andrewvc
It's sad, but many developers actually count on JSON key ordering. Why, I
don't know, but they do. Why people like to code to the implementation, not
the spec, I do not know.

~~~
wiredfool
I've seen cases where the actual api is XML based, and there's a simple
mechanical translation from JSON to XML.

Standards People.

~~~
rspeer
A prominent example is Solr's JSON update endpoint [1], which is just the XML
update endpoint in disguise.

It expects input of the form:

    
    
        {
           "add": {document 1 goes here},
           "add": {document 2 goes here},
           ...
           "commit": {}
        }
    

And of course all the "add" values are different and the "commit" has to come
at the end.

[1]
[https://wiki.apache.org/solr/UpdateJSON](https://wiki.apache.org/solr/UpdateJSON)

------
fvt
PostgreSQL was already able to compete with Oracle's RDBMS and Microsoft's SQL
Server but could soon supplant Mongo for most jobs.

It's great to know that the only required storage components nowadays could be
PG and ElasticSearch (as PG's full-text search can't compete with ES), and
that the former is a no-brainer to setup (on top of AWS, Rackspace, etc.) or
cheap to acquire (with Heroku Postgres for example).

Good job !

~~~
arethuza
Out of interest, what would you say the weaknesses are of the full-text search
in PostgreSQL?

NB I've been using PostgreSQL for a few months on a side project and I've been
hugely impressed. I wanted to add full text searching at some point and rather
than using Lucene or Solr (or similar) I thought I would use PostgreSQL's own
search capabilities - which certainly makes some things a _lot_ simpler than
using a separate search engine.

~~~
pilif
I can't speak for the original poster, but I have three issues with the
postgres offering:

1) it has suboptimal support for handling compound words (like finding the
"wurst" in "bratwurst"). If the body you're searching is in a language that
uses compounds (like german), then you have to use ispell dictionaries which
have rudimentary support for compounds and which aren't maintained any more in
many cases because ispell has been more or less replaced by hunspell which has
far superior compound support which in turn is not supported by postgres.

2) If you use a dictionary for FTS (which you have to if you need to support
compounds), the dictionary has to be loaded once per connection. Loading a
20MB dictionary takes about 0.5 seconds, so if you use Postgres FTS, you
practically have to use persistent connections or some kind of proxy (like
pgbouncer). Not a huge issue, but more infrastructure to keep in mind.

3) It's really hard to do google-suggest like query suggestions. In the end I
had to resort to a bad hack in my case.

Nothing unsolvable, but not-quite-elastic search either.

~~~
karavelov
1) I do not have experience with languages with heavy use of compound words so
no suggestion from me.

2) There is a shared dict extension that load the dict only once. See:
[http://pgxn.org/dist/shared_ispell/](http://pgxn.org/dist/shared_ispell/)

3) About the suggestions: you should look at pg_trgm contrib extension.

~~~
pilif
Thank you very much for 2) - this will allow me to considerably lessen the
amount of infrastructure I'm having to take care of (pgbouncer can go now).

About 3: That's what I was using before moving to tsearch, but pg_trgm based
suggestions sometimes were really senseless and confusing to users. Also,
using pg_trgm for suggestion will cause suggestions to be shown that then
don't return results when I later run the real search using tsearch.

I'm happy with my current hack though, so I just wanted to give a heads-up.

------
pilif
For those interested, the online documentation for the development version of
PostgreSQL has been rebuilt and contains documentation about the new feature:

[http://www.postgresql.org/docs/devel/static/datatype-
json.ht...](http://www.postgresql.org/docs/devel/static/datatype-json.html)

------
nailer
Pardon if this is a little ignorant - I haven't done a lot of SQL before, so
[http://www.postgresql.org/docs/9.3/static/functions-
json.htm...](http://www.postgresql.org/docs/9.3/static/functions-json.html)
isn't that obvious to me - but:

Does this mean I can do Mongo-style queries, retrieving a set of documents
which match particular key: value criteria, using PostgreSQL?

~~~
arethuza
Yes, you can do extract values from JSON, filter based on those values and
even create indices - not sure if that counts as "Mongo-style queries" but
I've used it and it works pretty well:

[http://clarkdave.net/2013/06/what-can-you-do-with-
postgresql...](http://clarkdave.net/2013/06/what-can-you-do-with-postgresql-
and-json/)

The only thing to note about current JSON support in PostgreSQL is that, as
far as I know, there is no way to update parts of a JSON document - you have
to rewrite the entire document each time. I've not found this to be a big
headache YMMV.

~~~
marcosdumay
> The only thing to note about current JSON support in PostgreSQL is that, as
> far as I know, there is no way to update parts of a JSON document

That's what jsonb fixes.

~~~
magnush
No, jsonb will still update the whole value when you edit it. For now at
least.

~~~
roller
Isn't this the case for postgresql data in general? ie, on updates, new data
rows are written containing all the field values, whether or not they have
changed.

~~~
arethuza
It's more they case that when you want to reach into an existing stored
document and change a single value you have to re-write the entire document
from your application.

On the reading side there are functions and operators that allow you to reach
into stored JSON and extract the parts you want. What would be nice would be
to be able to do something similar for updates - although this is clearly more
complex than reading, so I can see why it has been done this way.

Edit: I guess the most general solution would be to directly support something
like JSON Patch:

[https://tools.ietf.org/html/rfc6902](https://tools.ietf.org/html/rfc6902)

------
skrause
_> Jsonb has several missing useful features (from nested hstore), but that
could be added later as an extension._

Which are those missing useful features?

~~~
craigkerstiens
All of the equivalent operators don't exist, though those are only slightly
interesting. Perhaps most interesting would be all the index types, right now
currently GIN works, but GiST didn't quite make it. GIN was definitely showing
the better performance on JSONB but there are still valuable cases for GiST
which often appears smaller on disk.

~~~
magnush
Another important part about gist is that GIN has much higher overhead for
tables that are frequently updated.

------
saltvedt
Slightly off-topic, does anyone know if there is progress being made on the
proposed HTTP API for
PostgreSQL?([http://wiki.postgresql.org/wiki/HTTP_API](http://wiki.postgresql.org/wiki/HTTP_API))

------
twic
What does this binary format actually look like on disk? Is this documented
anywhere other than in the code that reads and writes it?

I do remember a description of it in some slides a few months back, but i
can't find them now.

~~~
rch
Maybe this is what you are looking for:

[http://www.sai.msu.su/~megera/postgres/talks/hstore-
dublin-2...](http://www.sai.msu.su/~megera/postgres/talks/hstore-
dublin-2013.pdf)

How the ideas migrated from hstore to jsonb is briefly described here:

[http://obartunov.livejournal.com/177247.html](http://obartunov.livejournal.com/177247.html)

[http://www.postgresql.org/message-
id/53304439.8000602@dunsla...](http://www.postgresql.org/message-
id/53304439.8000602@dunslane.net)

~~~
twic
Yes, that's it, thanks!

I suppose the definitive description of the format is the function
JsonbToCString, which is what writes the in-memory structures out as bytes:

[http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=...](http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/jsonb.c)

------
illuminated
I only hope it won't be too long before various "drivers" and frameworks start
supporting it, otherwise the acceptance rate will be too low. And Jsonb
deserves to be accepted as fast as possible.

~~~
guptaneil
Considering Postgres is the first relational database to support jsonb,
portability (which is the main benefit of drivers) doesn't really apply. Any
decent driver or framework should let you write native psql commands, so you
should be able to start using this as soon as it's released.

~~~
illuminated
Being able to use Jsonb through a framework and fully utilize it (through it)
are not the same things. I will be definitely use it soon, but unless it's
fully supported in frameworks (...) it won't be widely accepted.

~~~
lmb
Seems like there is room for a PostgreSQL-only ORM that specifically makes all
these nice features available. Anyone heard of such a thing?

~~~
masklinn
Most "frameworks"/ORM can provide database-specific features, and it's
especially easy for field types. e.g.
[http://docs.sqlalchemy.org/en/rel_0_9/dialects/postgresql.ht...](http://docs.sqlalchemy.org/en/rel_0_9/dialects/postgresql.html#postgresql-
data-types) (data types local to SQL Alchemy's Postgres dialect) or
[https://github.com/djangonauts/django-
hstore](https://github.com/djangonauts/django-hstore) (hstore in Django)

------
luikore
What's the difference between jsonb and bson? They look so similar to each
other...

~~~
MartinMond
BSON actually was considered as the JSON storage format for PostgreSQL, but
was discarded once people figured out that BSON stores ["a", "b", "c"] as {0:
"a", 1: "b", 2: "c"} which is just silly.

~~~
cies
wow.. thanks for that info, i came here to see the diference and and a little
shocked by the silliness of BSON (and the team that created it, apparently
Mongo); compared quality devotion of the postgres community..

well done postgres!

------
andybak
Can anyone with some familiarity with PostgreSQL's release schedule comment on
how soon this will be in an official release?

~~~
pilif
This will go into 9.4 (or maybe they'll call it 10.0) which will be released
September-ish barring no bigger issues.

~~~
jeltz
I think they will reserve 10.0 for when logical replication is done.

------
optimiz3
Can anyone comment on the storage overhead of JSONB key names? One of the
worst things about MongoDB's BSON format is that 8M instances of the following
record:

{ "user": "test", "isValid": true }

will at a minimum cost 8M * (4 + 7) simply due to field name overhead. MongoDB
isn't smart enough to alias common field names, and this has remained as one
of its biggest problems.

[https://jira.mongodb.org/browse/SERVER-863](https://jira.mongodb.org/browse/SERVER-863)

~~~
subleq
If only there were a way to store data where each record has the same
structure...

The JSON support in postgres allows a hybrid approach. Put the common data in
a tabular format, with a JSON column that stores all the extra, variable data.

~~~
optimiz3
Internal pseudo-schematized data storage and JSON support aren't mutually-
exclusive. Nice to have a checkbox feature like this but the implementation
isn't particularly exciting for large datasets.

------
userbinator
Is there a trend of making verbs reflexive among developers/managers? I've
seen "product X has released", "patch Y has applied", and a lot of similar
phrases which I'm still having trouble parsing. When I saw the title I thought
"has committed what?"

~~~
Intermernet
As others have pointed out, this seems to be due to the writer not learning
English as their first language.

I love this, and it reminds me of Steven Pinker's "The Language Instinct"
where he mentions some stats about the most common dialects of English world-
wide are _not_ actually in "native" English speaking countries, but in regions
in south-east Asia, as well as India and, increasingly, China.

It's highly likely that, in two or three generations, all of the English
speaking population of the world will be "reflexifying" their verbs... Please
excuse my mangling of my native tongue :-)

~~~
portmanteaufu
I've also noticed how very common it has become to find questions in the
format "How to do [task]?" rather than "How do I do [task]?"

I strongly suspect that this has arisen from Google searching, where people
naturally write their queries to complete the sentence "I would like some
information about ________."

~~~
nl
_" How to do [task]?"_

I've seen that most often from people of Indian origin. Not so much from
Pakastanis though.

Around 5-10 years ago I used to commonly see "I have doubt on how to
<XXX>"[1]. I don't see it quite as often now days.

[1]
[https://www.google.com.au/search?q="I+have+doubt+on+how+to+*...](https://www.google.com.au/search?q="I+have+doubt+on+how+to+*")

------
booleanbetrayal
This is fantastic news. Hats off to the PostgreSQL team! Hoping this lands on
AWS RDS shortly after release.

------
abstrct
Does the JSON type still have a set of use cases as well or is it just
obsolete now? If I was writing large quantities of json documents to a table,
and I didn't need tag indexing or targeted updating, would using the JSON type
be faster than JSONB?

~~~
magnush
Probably the most important difference is that the old JSON types preserves
whitespace and attribute ordering, whereas the new one doesn't. If you don't
care about those, the old one is also slightly smaller on disk - but I would
say not worth it based on the simple fact that you might want to search it in
the future some time...

------
piratebroadcast
Someone explain like I'm 5?

~~~
reactor
As of ver 9.4, PostgreSQL support storing data in JSON format. Which it is
nothing but 'text' datatype with additional syntax check built-in. You can't
do a query or do update partial/selected node inside the JSON doc. But with
this new jsonb (JSON Binary) format, you can do simple query as well as update
certain node (if it can reach through the query).

Think it like a very basic "MongoDB" implementation :)

Take a look at their devel doc
[http://www.postgresql.org/docs/devel/static/datatype-
json.ht...](http://www.postgresql.org/docs/devel/static/datatype-json.html)
for more details.

~~~
pilif
just nitpicking, but the json data type was added with 9.2 where it was not
much more than syntax checked text plus one or two functions to convert from
records and arrays into json.

In 9.3 we got many more useful functions that allow to query into json
documents, though if you wanted to select rows based on the content of a json
document, you'd have to resort to functional indexes.

9.4 (which is what the above post is talking about) will provide index support
for arbitrary queries and a much more efficient format to store JSON data
which will not involve reparsing it whenever you use any of the above
functions.

~~~
hackerboos
Am I right in thinking this is only good for reads though?

Do writes still require loading the field into memory, parsing, updating and
the writing back?

~~~
pilif
Yes. You are right. At the moment there's no exposed functions for altering
that binary data structure.

Because of the way how Postgres works though, the row will always have to be
rewritten in the future (all updates to a row will cause a new copy to be
written - rows are immutable in Postgres). What we might gain in the future is
a way to skip the parsing process, but the document will always have to be
rewritten.

------
rch
I'd love to manage HDF5 this way too. Anyone know of an effort already
underway?

------
_pmf_
First comment: > I believe that this is one of the most significant
breakthroughs in Postgres history. Congratulations.

Yeah, right.

~~~
avenger123
I don't follow Postgres too much and I saw the tweet by Peter van Hardenberg
first. It's great to see the enthusiasm for the project.

It's likely that 9.5 or even 9.6 will actually see the real significance of
this new feature. I don't know how solid the implementation is but once its in
the wild and these missing features are added to bring it to parity with
MongoDB and others, and the bugs have been fixed, it will be a great solution
for a NoSQL database instead of MongoDB and others.

That is significant and worthy of celebrating. This will also give MongoDB and
others another yardstick to stay on top of their game.

------
capkutay
Yeah but MemSQL can do this a bajizillion times faster!

~~~
capkutay
I don't know if I got downvoted for sarcasm (which is a valid reason for
downvoting). But just incase no one noticed, this was sarcasm.

