
PostgreSQL as Schemaless Database [pdf] - xenator
https://wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf
======
neya
Where is taligent when we need him? MongoDB is pretty impressive considering
how recently it came to market, but I'd still go with relational PostgreSQL
any day. Not because of the benchmarks, but because in general most use cases
are covered well enough by relational DBs like MySQL and PostgreSQL. The most
expensive mistake one could make is using a NoSQL DB where an SQL DB fits
perfectly. I admittedly made this mistake and learnt a LOT. Now, these
benchmarks are just some extra spice for sticking with my lovely PostgreSQL :)

~~~
taligent
Why do you need me? I am just that crazy troll who believes that no database
is perfect for all use cases and that making blind generalisations like "all
NoSQL is bad" is stupid. Especially since MongoDB is as different from
Cassandra or Riak as it is from Spanner, and as Spanner is from the other
120-odd NoSQL databases.

And if benchmarks impress you I suggest you take a look at Cassandra sometime.

<http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html>

~~~
jchrisa
I keep hearing Cassandra touted as fast, but I just don't see it. Instaclustr
brags about 17ms at the 99th percentile being fast, but where I come from that
gets you thrown out of the running.[1] The Netflix benchmark linked above sees
10,000 ops/sec/node. Again, not impressive.

Skip to slide 22 of this comparative benchmark with Couchbase Server[2] and
you'll see that <5ms latencies at the 99th percentile are totally achievable.
We can't take too much credit for these stats; most of the credit goes to the
fact that we use memcached at the front end, which has been optimized for
consistently low latency for a long time.

I don't want to make this a speed contest, as there are a lot of good things
about Cassandra. It just strikes me as odd that people sell Cassandra based on
low latency. If you want consistently high performance, don't use a database
that runs on a language virtual machine. There are lots of great reasons to
use Cassandra; standout benchmarks aren't one of them.

[1] <https://www.instaclustr.com/solution#tco>

[2] <http://www.slideshare.net/renatko/couchbase-performance-benchmarking>

~~~
3amOpsGuy
What else would you say is good about Cassandra?

I find the data model inflexible and often hard to work with for real-world
use cases. I think the tooling is immature, and the Hector API leaves me
feeling depressed (Pycassa and Astyanax are quite a lot better).

All that said, I find Cassandra to be fast, incredibly fast. The 99th
percentile on our 9 TB time series store is around 20ms on below-par hardware.

Another thing I like is that we've never had downtime, not even scheduled.
We've had nodes fail, we've had datacentres isolated during disaster recovery
tests, yet our cluster has carried on regardless (2 sub-clusters per region).
Whichever way you stack it up, that's impressive. Remember, with Cassandra
there are no load balancers or other shenanigans involved.

Garbage collection and virtual machine concerns are a red herring due to the
mechanics of a Cassandra query (certainly when the consistency level is below
the replication factor, i.e. CL < RF).

~~~
jchrisa
I'm glad you are happy with 20ms; to me it is slow...

Probably the best thing about Cassandra is that it is written in Java, so it
is easy to fork and add custom behavior. Of course this is orthogonal to
performance.

------
sargun
So, the problem that I have with PostgreSQL isn't that it doesn't have every
datastore feature under the sun, but rather the lack of automated distribution
and fault-tolerance.

This is not a hard thing to build, though (bear with me). I don't mean that
it's an easy task, but the semantics by which you could build a
fault-tolerant, distributed database on top of PostgreSQL are pretty
straightforward.

1. Partitioning: PostgreSQL has multiple notions of partitioning. Ideally this
would be done at the database level in Postgres, but the idea is that instead
of having your entire database fail at once, a transparent layer in which you
SET PARTITION_KEY=X before querying would make this pretty straightforward. It
would be nice if there were a semi-transparent proxy layer that did this (see
the sketch at the end of this comment).

2. Replication / fault-tolerance: PostgreSQL today has built-in synchronous
master/slave replication. It would be nice if someone built an automated
failover system on top of this. Alternatively, it would be really interesting
if someone ran multiple PostgreSQL instances, built a WAL per partition, and
performed three-phase commit (3PC) between N replicas; you could have a fully
peer-to-peer Postgres. I imagine these transactions would either need to be
PL/pgSQL, or fully written-out, staged serializable SQL queries.

3. Co-processors: One of the biggest benefits a distributed, fault-tolerant
PostgreSQL setup would give you is the ability to colocate your application
logic with the data. Instead of having to do read, modify, write over the
wire, you can write complex business logic that runs on whichever node is the
master for a given partition, so the data doesn't have to go over the network.
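
A minimal sketch of that idea in stock PostgreSQL using PL/pgSQL (table and
column names here are invented for illustration):

    -- Read-modify-write entirely on the server: the row data never
    -- crosses the network, only the function call and its result do.
    CREATE FUNCTION apply_discount(acct integer, pct numeric)
    RETURNS numeric AS $$
    DECLARE
      new_balance numeric;
    BEGIN
      UPDATE accounts
         SET balance = balance - (balance * pct / 100)
       WHERE account_id = acct
      RETURNING balance INTO new_balance;
      RETURN new_balance;
    END;
    $$ LANGUAGE plpgsql;

    SELECT apply_discount(42, 5);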

I feel that by introducing some basic wrappers around Postgres to support
partitioned master-slave topologies, and perhaps a slightly different query
interface to make this transparent, PostgreSQL could effectively replace most
modern NoSQL databases.
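
And a rough sketch of the partition-key idea from point 1, using a plain
custom GUC (all names invented; since 9.2 a two-part setting name can be SET
without any declaration):

    -- One-time setup: route queries through a view that filters on a
    -- session-level setting (a custom GUC).
    CREATE VIEW my_docs AS
      SELECT *
        FROM all_docs
       WHERE shard_id = current_setting('app.partition_key')::integer;

    -- Per session: the app (or a proxy) tags the connection...
    SET app.partition_key = '7';

    -- ...and everything it reads is scoped to that partition.
    SELECT count(*) FROM my_docs;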

~~~
bbromhead
Hot Standby in Postgres is pretty damn good for replication (like you said,
making failover happen properly takes a little bit of effort, but it's not too
hard).

Also, pgpool-II helps a lot with partitioning and parallel queries.

For a relational DB, Postgres does a pretty good job at distribution and fault
tolerance.
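
For reference, the master side of synchronous replication is mostly a handful
of postgresql.conf settings (the standby name below is invented; the standby
needs the usual recovery.conf/hot_standby setup on top):

    # postgresql.conf on the master (9.1+)
    wal_level = hot_standby                   # WAL detailed enough for a standby
    max_wal_senders = 3                       # allow streaming connections
    synchronous_standby_names = 'standby1'    # COMMIT waits for this standby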

~~~
drdaeman
Postgres won't let you run sufficiently long (read-only) queries on slaves.
For example, I was unable to run pg_dumpall on a slave to take DB snapshots.

This is [well explained](http://www.postgresql.org/message-id/201102272005.00234.jens@wilke.org),
but still inconvenient. This way, slaves are only useful for failover and
possibly for offloading very short read-only transactions, which is pretty
limited.

~~~
joevandyk
This used to be the case, but it isn't anymore. You can run long-running
queries, pg_dump, etc. on slaves.

~~~
moe
Do you have a reference for which PG version this got fixed in? We still
pg_xlog_replay_pause() our slaves; it would be nice to drop that.

~~~
bbromhead
<http://www.postgresql.org/docs/9.2/static/runtime-config-replication.html#GUC-MAX-STANDBY-ARCHIVE-DELAY>
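
The short version of those settings, for anyone who just wants long queries
to win over replay (at the price of the standby falling behind):

    # postgresql.conf on the standby; -1 means "never cancel a query
    # just so WAL replay can proceed"
    max_standby_archive_delay = -1
    max_standby_streaming_delay = -1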

------
eksith
This is pretty much the reason we're using Postgres for our document storage.
Admittedly, our doc store isn't really extensive and it's mostly for internal
use (with limited client access), but we just didn't see why it was necessary
to go with a new database when the existing stack worked well.

YMMV, but we've yet to come across an instance where a pure document DB was
necessary or desired, even to store documents (which, from my perspective,
seem like nothing more than a flattened EAV model).

~~~
aaronyo
Do you have any thoughts on expression indexes? I assumed I would have to pull
fields out of the document to index reasonably in pgsql, but this presentation
at least shows that simple expression indexes perform well. I wonder about
compound, nested, etc.
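
By expression index I mean roughly this (hstore shown; table and key names
invented):

    -- Index one field pulled out of the document by an expression:
    CREATE INDEX docs_customer_idx ON docs ((doc -> 'customer_id'));

    -- The planner can use it when a query repeats the same expression:
    SELECT * FROM docs WHERE doc -> 'customer_id' = '42';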

~~~
eksith
I'm not too familiar with the current schema, but I do know we make extensive
use of partial indexes ( _WHERE (vnnk IS NOT NULL)_ , _WHERE (fhs > 0)_ ,
etc.) and compound indexes in btree. With GIN, for example, that isn't
possible, but GIN itself works reasonably well for partial matches (since it's
btree-based internally), so it's not too much of a problem. We use a
combination of hstore, GIN, conventional fields and multiple indexes, with the
schema abstracted out in our app, to provide a document storage and search
engine with a separate app for client login. I believe some metadata is stored
in conventional fields (varchar, integer, etc.).
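
Sketched out, the combination looks roughly like this (the two quoted
predicates, plus a GIN index over the hstore column; the table name is
invented):

    -- Partial btree indexes on sparse conventional fields:
    CREATE INDEX docs_vnnk_idx ON docs (vnnk) WHERE vnnk IS NOT NULL;
    CREATE INDEX docs_fhs_idx  ON docs (fhs)  WHERE fhs > 0;

    -- GIN over the hstore column for containment/key queries:
    CREATE INDEX docs_doc_gin ON docs USING gin (doc);
    SELECT * FROM docs WHERE doc @> 'status=>active'::hstore;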

Performance compared to the conventional schema is of course noticeably
slower, but not unacceptably so for a few thousand docs and growing (this is
compared to hundreds of millions of rows on the conventional schema).

But such is the sacrifice for reliability. When NASA hit a spate of failed
missions to Mars (explosions, collisions, miscalculations, etc.), people were
joking about the creed at the time: "Faster, Better, Cheaper; pick 2".

This kind of engineering is a turn-off for a lot of folks, so I can certainly
see the appeal of a document DB. I've also seen people fall into the trap of
"I don't know what kind of data I'll be getting, so I'll go with a doc store".
Well, there has to be _some_ hint of what kind of data could be arriving, so
engineering for that with some wiggle room early on is a good idea.

You shouldn't take my word or the slides at face value, though. Really think
it through, experiment, throw stones at it, etc., and see if it works for you.

~~~
aaronyo
Reliability is gold.

Thanks for the pointers. I will read up on GIN and hstore.

If nothing else, duplicating an element from a doc column into a first-class
field seems reasonable. Also, even in Mongo one must define indexes, which is
a schema of sorts and can be non-trivial to change.

------
jtchang
Really great presentation/slidedeck.

For me, MongoDB still feels like a semi-niche solution. I'd rather go with a
standard SQL database unless there is a specific reason you need NoSQL.

~~~
inopinatus
I've tried both MongoDB+Mongoid and PostgreSQL+hstore as a way to do better
inheritance than STI in ActiveRecord.

Both have deficiencies: Mongo is not mature enough for an old lag like me to
trust (and is incompatible with some Rails extensions), while hstore had
problems with key deletion and cannot do object references, or indeed types
other than strings.

~~~
shawn-butler
Can you elaborate on what you mean by key deletion problems?

------
karteek
The first thing I remembered after reading the title is
<http://pgsnake.blogspot.in/2010/04/postgres-91-release-theme.html>

~~~
mh-
Haha, thanks for posting that. I enjoyed:

 _"It's time for us to switch to something fresher. I personally would have
preferred XSLT, but QUEL is almost as good."_

------
hannibalhorn
The new JSON functions have landed in the development version, for anyone
interested in seeing the future syntax:
<http://www.postgresql.org/docs/devel/static/functions-json.html>
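
A taste of that syntax, per the devel docs at the time of writing (9.3-to-be,
so subject to change; table and key names invented):

    -- Operators for digging into a document:
    SELECT data -> 'author' AS author_json,   -- extract as json
           data ->> 'title' AS title_text     -- extract as text
      FROM books;

    -- Unnest a JSON array into a set of rows:
    SELECT json_array_elements('[1, 2, 3]'::json);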

------
adrianhoward
Damn. The "XML. It seemed a good idea at the time" slide caused me to spray
coffee on my laptop.

~~~
Evbn
See also "XML literals as a native syntax in Scala."

------
xenator
I should point out that the conclusion of the presentation can be interpreted
the other way around: MongoDB is almost as fast as PostgreSQL. Also, notice
that the comparison was done with Mongo 2.2; in 2.4 the JS engine was changed
to V8, so there are a lot of speed improvements. And compare PG's 18 years of
development with MongoDB's 5. So what am I trying to say? The difference is
not so big after all.

~~~
jacques_chester
PostgreSQL hasn't spent 18 years pitching itself as a pure performance play,
though. It started as a research project into the object-relational model and
was adapted to be a safe and featuresome database first and performant second.

Mongo's performance will converge downwards as people demand more features and
safety guarantees.

If, however, you want the original premise -- a memory-only, schemaless, auto-
sharded document database -- Mongo is still the easier pick.

It's a cliche, but it's easy to:

1. overestimate the size of your data (especially since relational models
_by design_ squeeze out all duplicated statements of data),

2. overestimate how much traffic you need to serve,

3. underestimate what can be achieved by stock RDBMSes on modern hardware,

4. underestimate the deferred development costs of schemaless data, and

5. underestimate the costs of avoiding durable storage.

It's a really hard field to make generalisations about right at the moment.
The algo-economics of hardware (especially SSDs) and software have been
changing so fast that any generalisation formed even 2 years ago is
unreliable.

Speaking only for myself, I need a lot of talking down from my data-
safety/data-integrity tree. _I don't trust myself_. That's why I use tools
that slow me down.

~~~
jfb
"1. overestimate the size of your data, (especially since relational models
_by design_ squeeze out all duplicated statements of data)"

Of course, this is opt-in in a SQL world, more's the pity. I've seen some
nightmarish ActiveRecord spawned Postgres monstrosities.

I wonder how much of the attraction of schema-less DBs comes from the lack of
friction between the largely guarantee-free idea of data storage precipitated
by Rails misfeatures -- there's much less distance between a MySQL or Postgres
database as "designed" by ActiveRecord and a NoSQL instance. Having already
opted out of your database's particular guarantees, there's very little
keeping you there.

~~~
mvc
Here's a question for people who prefer to put their guarantees (i.e.
unique/check constraints, foreign keys, etc.) in their database. How do you
present these constraints to the user? How do you show when they've been
violated? Just present them with the PG error message? Maintain a mapping of
PG errors to more domain-specific messages? Bite the bullet and just code them
twice (once for the DB, and once for the user)?

~~~
jfb
One could build out from the database schema -- Postgres certainly provides
sufficient type information to build consistency double-checks in other layers
of the application. As an added bonus, other consumers of the data won't be
flying blind.

~~~
mvc
Yeah, I'm trying to build something like that right now. Nice to know others
have similar ideas.

------
aaronyo
It's only one of several interesting comparisons in the slides, but I think we
need more data points on disk usage to make sense of the numbers. Does it grow
linearly? Probably, but is there a constant +x in there? Mongo does some big
upfront disk allocations. Were those accounted for somehow?

------
thomasfl
If the benchmark shows Postgres can do the same as Mongo, only faster, why
aren't we seeing more hybrid solutions? Online retailers would be better off
using relational storage for monetary transactions and time series, and JSON
storage for document-like data.

~~~
ProblemFactory
The great thing about PostgreSQL is that it _is_ hybrid.

With <http://www.postgresql.org/docs/9.1/static/hstore.html> you can put
schemaless documents into the same database, even into the same table row as
your usual relational data.

You could start with a schema that's only a primary key and hstore for all
other data (like a fully schemaless database), but over time extract important
fields from the hstore into columns to add indexes, constraints or joins.
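
A sketch of that migration path (all names invented):

    -- Day one: fully schemaless apart from the key.
    CREATE TABLE things (
        id   serial PRIMARY KEY,
        data hstore
    );

    -- Later, when a field turns out to matter, promote it to a real
    -- column where it can get a type, an index, and constraints.
    ALTER TABLE things ADD COLUMN price numeric;
    UPDATE things SET price = (data -> 'price')::numeric,
                      data  = delete(data, 'price');
    CREATE INDEX things_price_idx ON things (price);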

~~~
rurounijones
That is... I am not sure, either madness or genius.

But an intriguing idea either way.

------
guilloche
I always wonder why NoSQL is supposed to have better performance than SQL
given the same simple use case (no joins, etc.).

If the data is well structured and no joins are needed, can I assume that a
relational database should have better performance than NoSQL?

~~~
lucian1900
Depends on what you mean by "NoSQL". MongoDB is pretty terrible overall, so
don't even bother.

Systems like Dynamo/Riak/Cassandra sacrifice consistency (most of the time)
for performance, largely because a distributed setup (sharding in particular)
is easier when you don't guarantee consistency. Note that none of them
sacrifice durability.

It is possible to have a distributed _and_ consistent database, but you either
have a bottleneck (master nodes in BigTable/HBase), require significant
infrastructure (Google Spanner), or make schema changes very hard (HyperDex).
It's a hard problem.

~~~
guilloche
Thanks for your reply.

But I have one more question: if I need to manage bank accounts, then the
database must be consistent. Let's say I just need to keep a balance and a
social security number for each account, and only need to query the balance.
Can a NoSQL store like Redis have better performance than a relational
database (assuming the same logging and transactions)?

~~~
lucian1900
Redis is consistent; it's just less durable than Postgres. A query cache is a
very good use for it. It's also likely to be faster than Postgres for the
simple operations it is capable of.

The only downside is that (for now, at least) it is single-node, with optional
replica slave(s).

------
cinbun8
There is still some way to go to support better indexing of JSON fields, but
I'm pretty excited about these developments. In the future you might see
schemaless and schema-driven data living side by side in Postgres.

------
fpp
Would be interesting to know which version of MongoDB was used (2.4 also uses
V8).

The original slides are at:
<https://wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf>

There is also another presentation on the use of PL/V8 (on Heroku) at the
PostgreSQL wiki - from last week's NYC talks:
<http://plv8-talk.herokuapp.com/#1>

~~~
disbelief
One slide says he used MongoDB 2.2 on his Mac.

------
lucian1900
Sadly it's not a fair fight; of course Postgres will beat Mongo (even though
Postgres is ACID).

I would be more interested in comparing the biggest machine you can get on AWS
running Postgres with a cluster of smaller machines running Cassandra. I
expect Cassandra to win, but of course it's not consistent.

------
lifeisstillgood
Well, that's Monday afternoon decided!

------
forgotAgain
Almost 6,000 synchronous commits per second to a 7,200 rpm disk seems high to
me.

Is anyone seeing those numbers for any database that synchronously writes its
log to disk?

~~~
sendob
<http://pgeoghegan.blogspot.com/2012/06/towards-14000-write-transactions-on-my.html>
may interest you

~~~
forgotAgain
While very impressive, I don't see how your post is applicable in this case.
The OP at the very least implies that the transaction are done end to end and
not batched. I find 6,000 distinct disk I/O operations per second, to append
the log, very very high for a 7.2k disk.

~~~
sendob
That post also discusses using a 7,200 rpm disk.

I think where you may be confused is in treating them as distinct I/O
operations, versus seeing each commit as a point in the log that must have
been flushed past (sequentially).

~~~
forgotAgain
No, I don't think I'm confused. The OP says PostgreSQL is used in its default
configuration:

 _Stock PostgreSQL 9.2.2, from source. No changes to postgresql.conf_.

Given that statement, either the writes are serialized or PostgreSQL is not
ACID-compliant in its default configuration. I'm not an expert on PostgreSQL,
but I assume that it is ACID-compliant by default. Hence my skepticism about
the 6K writes per second.

~~~
sendob
from the link: "In Postgres 9.2, this improvement automatically becomes
available without any further configuration."

"Essentially, it accomplishes this by reducing the lock contention surrounding
an internal lock called WALWriteLock. When an individual backend/connection
holds this lock, it is empowered to write WAL into wal_buffers, an area of
shared memory that temporarily holds WAL until it is written, and ultimately
flushed to persistent storage."

"With this patch, we don’t have the backends queue up for the WALWriteLock to
write their WAL as before. Rather, they either immediately obtain the
WALWriteLock, or else queue up for it. However, when the lock becomes
available, no waiting backend actually immediately acquires the lock. Rather,
each backend once again checks if WAL has been flushed up to the LSN that the
transaction being committed needs to be flushed up to. Oftentimes, they will
find that this has happened, and will be able to simply fastpath out of the
function that ensures that WAL is flushed (a call to that function is required
to honour transactional semantics). In fact, it is expected that only a small
minority of backends (one at a time, dubbed “the leader”) will actually ever
go through with flushing WAL. In this manner, we batch commits, resulting in a
really large increase in throughput..."

I'm sorry to cut and paste so much of the article, but I hope this is helpful.

~~~
forgotAgain
Looking at your linked post more closely, I do not believe it applies to this
situation, for two reasons. Again, I'm not an expert on PostgreSQL, but:

1) It appears to me that the technique discussed in your linked post is about
ganging unrelated transactions together into a single flush to disk. I do not
see that being the case in the OP: since all of the writes are going to the
same table, they are related.

2) Looking at the second graph in the linked post ( _pgbench transactions/sec,
insert.sql_ ), the number of clients is high. I got the impression from the OP
that there was only a single client. Indeed, if there were more than a few
clients, the benchmark would have been subject to the deficiencies of the
client libraries used.

~~~
sendob
You are right to be suspicious of any benchmark (as even the original slides
note). I don't think we really have any information about the transaction
boundaries used, nor the clients:

From the slides: "Scripts read a CSV file, parse it into the appropriate
format, INSERT it into the database. • We measure total load time, including
parsing time. • (COPY will be much much much faster.) • mongoimport too, most
likely."

Postgres does its best to use intelligent defaults, but it is only a part of
the system; it is generally up to practitioners to be wary. Tools like
<http://www.postgresql.org/docs/current/static/pgtestfsync.html> assist in
this, but you generally have to watch out for things like RAID controllers
that are not battery-backed, etc. (depending on your environment).

There is no direct mention of the number of clients used.

I was simply trying to highlight that with the (little) information available,
it is very possible.

It is also possible that the session(s) themselves chose to pursue an
asynchronous commit strategy (on a session level; see
<http://www.postgresql.org/docs/9.2/static/wal-async-commit.html>), which
would also not require modifying the configuration. I do not know, as I have
not seen the scripts, but it is similar to how a library could interact with
<http://docs.mongodb.org/manual/reference/command/getLastError/>.
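
For comparison, the Postgres session-level opt-out is a one-liner (sketch):

    -- Commits now return before the WAL flush reaches disk; a crash
    -- can lose the last few transactions, but never corrupt the DB.
    SET synchronous_commit TO OFF;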

Thanks.

~~~
forgotAgain
Nice discussion, thank you.

------
antimora
The problem with this scheme is that Postgres does not support JSON indexing
or partial updates, which are important for high-transaction databases.
------
apunic
When I checked hstore's functionality the last time (~6 months ago) it was
rather limited compared to Mongo's full-blown API. Any new observations?

------
ogdr
MongoDB's native storage format is not JSON but BSON. One should know this
before wading into a deep river trying to argue against it.

------
sandGorgon
Would it be possible to build an extension that exposes a Mongo-compatible
interface in Postgres? Essentially, being able to use all the ORMs and code
built around Mongo with Postgres, seamlessly.

~~~
lucian1900
Yes, but why bother? There are already so many good SQL clients out there,
like SQLAlchemy.

~~~
sandGorgon
There are so many apps built around MongoDB that it would be interesting to do
a drop-in swap with Postgres (assuming protocol compatibility).

Hell, I'd personally pay a lot for a MySQL-compliant interface just for
WordPress!

~~~
lucian1900
I see; that motivation makes more sense to me :-)

I guess it should be possible to write a wrapper that executes at isolation
level "single".

------
meritt
Oh. This is legit?

I skipped this story because I assumed it was an April Fools' prank.

------
sylvinus
What I get from these slides is that MongoDB has better average performance
(both when using indexes and when doing full scans) compared to all the
schemaless PostgreSQL options.

------
weisk
Sorry, but the first thing I read was "PostgreSQL the shemales database".
Confusing...

~~~
xkcdfanboy
Same here, don't feel bad.

And to the downvoters, the correct word is aschematic or schema-less with a
dash.

If a word ends in a vowel, you must use a dash.

Hatless is fine, but scrupleless should be scruple-less. The more you know,
you scruple-less bastards!

~~~
eru
> If a word ends in a vowel, you must use a dash.

Feeling prescriptive these days?

