
Don’t Assume PostgreSQL Is Slow - craigkerstiens
http://thebuild.com/blog/2015/10/30/dont-assume-postgresql-is-slow/
======
fishtoaster
My fun postgres story is the time I spent a week learning and using CouchDB to
do some operations across five to eight million tweets. I mean, _surely_ a
relational db is ill-suited to something like that, right? After no small
amount of frustration, I just threw a cleaned-up version of each tweet into
postgres and did a `LIKE` query across all of them. It returned in, iirc, less
than a second (vs nearly a minute for couch).
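A minimal sketch of that approach, using SQLite (bundled with Python) as a stand-in for Postgres and made-up tweet data - the query itself would be the same on Postgres: dump a cleaned-up text column into a table and run a plain `LIKE` scan over every row.

```python
import sqlite3

# Hypothetical stand-in for the story above: SQLite instead of Postgres,
# three rows instead of millions, but the same shape of query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO tweets (body) VALUES (?)",
    [("just setting up my twttr",), ("postgres is fast",), ("hello world",)],
)

# A plain LIKE scan: no index, no special setup, the engine just walks rows.
rows = conn.execute(
    "SELECT id, body FROM tweets WHERE body LIKE ?", ("%postgres%",)
).fetchall()
print(rows)
```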

The lesson, of course, is not that couch is bad and postgres is great, but
rather as the OP said: "Human intuition about what might be slow is almost
certainly wrong."

~~~
astine
I once built a location aware application on MongoDB. After months of Mongo
being the bottleneck in the application and periodically crashing, I switched
to a naive implementation of the application with PostgreSQL and the
application sped up five-fold. I've learned my lesson on Mongo. Postgres,
however, has always been pretty awesome.

~~~
DrJosiah
Postgres with its geo extensions is also pretty solid.

------
malkia
FYI: The source code to the nextval() function:

[https://github.com/postgres/postgres/blob/71fc49dfe1d99bd83c...](https://github.com/postgres/postgres/blob/71fc49dfe1d99bd83cd99c2e7a39f93e07d19310/src/backend/commands/sequence.c#L498)

------
bigdubs
I'm not sure where this mantra that pgsql is slow comes from.

My sense is it comes more from the monolithic nature of a typical install,
which makes it difficult to scale horizontally to handle extreme amounts of
load.

For most development use cases, and even most production use cases, pgsql is
plenty fast for almost everything if you design your schema well and leverage
things like indexes.

~~~
chubot
It's because SQL has no performance model. You can't tell by looking at the
code (the query) what the performance will be like. For example, you can speed
up or slow down code by orders of magnitude by changing a line in a completely
different part of the code base (e.g. adding an index). And it's also not an
orthogonal change -- increasing your read speed decreases your write speed.
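That index effect can be shown concretely. A sketch using SQLite (it ships with Python; Postgres's `EXPLAIN` behaves analogously), with a hypothetical `orders` table: the query text never changes, but creating an index somewhere else flips the plan from a full scan to an index search.

```python
import sqlite3

# Same query before and after an index is added elsewhere in the schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

query = "SELECT * FROM orders WHERE customer_id = 42"

# Plan without the index: a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Plan after: the optimizer silently switches to the index.
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before[0][-1])  # a SCAN of the table
print(after[0][-1])   # a SEARCH using idx_orders_customer
```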

The _common_ case for business apps is suboptimal schemas and missing indices.

Postgres is fast enough, but it's not trivial to use well. On the other hand,
anyone who has programmed in Python or Ruby pretty much has an intuitive feel
for how well Redis operations perform. With SQL you get these wild swings,
depending on what the query optimizer does.

I'm not saying you should use Redis -- it's worth it to learn how to use
Postgres. But there's just a lower barrier to entry for non-SQL solutions.

~~~
acdha
> On the other hand, anyone who has programmed in Python or Ruby pretty much
> has an intuitive feel for how well Redis operations perform.

… have you ever seen that not lead to wild performance swings as people
reinvent SQL in their favorite abstraction layer? The same kind of developers
who used to make an ORM demand-load a million objects to calculate a number
are now doing that with Redis, Mongo, DynamoDB, etc. and telling themselves
that it's okay for it to take 5 minutes to sum a million records because it's
Big Data™️.

I would posit that you're more likely to see good performance from a database
engine with many thousands of engineer-hours poured into optimizations than
what the average developer is going to produce reinventing a reporting
language in Ruby/Python. Those same developers who can't add an index on a SQL
database are not suddenly going to figure out how to maintain denormalized
records in Redis.
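The demand-load anti-pattern above is easy to sketch. Using SQLite as a stand-in for any SQL engine and a made-up `sales` table: shipping every row to the application to compute one number, versus letting the database's aggregate do it in a single round trip.

```python
import sqlite3

# Hypothetical table; SQLite stands in for whatever engine is behind the ORM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (amount) VALUES (?)", [(i,) for i in range(1000)]
)

# The ORM-demand-load style: materialize every row, then loop in the client.
slow_total = sum(row[0] for row in conn.execute("SELECT amount FROM sales"))

# The database's own aggregate: one round trip, no row shipping.
fast_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

assert slow_total == fast_total
print(fast_total)
```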

~~~
crdoconnor
> telling themselves that it's okay for it to take 5 minutes to sum a million
> records because it's Big Data

If it's the kind of query you run every week, it probably _is_ okay.

I've left tons of queries like that unoptimized before - and prevented other
developers from optimizing them - not because it's "big data", but because
the risk introduced by changing the code, and the effort involved in
mitigating that risk, didn't make the optimization worthwhile.

In most cases I never would have built it _myself_ from the ground up in such
a way that it would take 5 minutes (e.g. without indexes), but somebody else's
dumb system that's running in production and solving real human problems in an
entirely acceptable time frame is still code that's solving people's problems.
And changes always come with a higher cost and risk than most developers
imagine - _especially_ in big systems.

------
Herald_MJ
Postgres isn't slow. But this is a very specific example that doesn't actually
prove anything either way.

------
jamiesonbecker
It sounds like this was more of a tip to always test your assumptions rather
than a serious attempt at a benchmark. It does its job in that respect, but
there's so much more to both of these servers than the test case.

It sounds like OP wrote a _for_ loop in Python but was surprised that the
Redis one was only marginally faster. Nearly all of the speed loss would have
been in that loop, not in either server. Both servers are so fast that all of
the latency was in the mechanics of a sequential _for_ loop.
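That loop tax can be sketched without either server. Using SQLite (bundled with Python) purely as a fast backend and hypothetical data: issuing one call per row in a sequential _for_ loop versus handing the whole batch over in a single call - the per-iteration client overhead, not the engine, is what the loop version measures.

```python
import sqlite3
import time

# SQLite stands in for any fast server; the comparison is loop vs. batch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER, v INTEGER)")

n = 10_000
rows = [(i, i * i) for i in range(n)]

start = time.perf_counter()
for row in rows:  # one call per row: pays the sequential-loop overhead n times
    conn.execute("INSERT INTO kv VALUES (?, ?)", row)
looped = time.perf_counter() - start

start = time.perf_counter()
conn.executemany("INSERT INTO kv VALUES (?, ?)", rows)  # one batched call
batched = time.perf_counter() - start

print(f"looped: {looped:.4f}s, batched: {batched:.4f}s")
```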

PostgreSQL is awesome for so many things, and certainly so is Redis, but those
are really two distinct sets with very little intersection. Knowing when to
use each is something that a good architect intuitively grasps. (These two in
particular complement each other very well.)

Always test your assumptions within your specific use case, and don't embark
on performance optimizations until you've solved all of your other
requirements first.

------
herf
Redis can do a lot more than 12k operations/sec. The bottleneck here is not
the backend at all.

~~~
arielweisberg
I am guessing this is a single threaded benchmark.

Performance concerns in practice are much more complicated than "What is the
average response time for a single thread on one operation?"

------
mbell
> Not really (test on an Amazon i2-2xlarge instance, client over local
> sockets, Python client)

This is a flawed benchmark setup with little connection to the real world.

~~~
imaginenore
No, it's the right way to do it. This way you don't pollute your benchmark
with random things like connection speed, connection pool performance,
networking issues, etc.

------
andy_ppp
There are valid reasons to distribute load between redis and your database,
per query performance ranks pretty low compared to your overall database load,
for example.

------
imaginenore
The biggest problem with this benchmark is that it's single-threaded.

Redo it with 32 or 256 concurrent (local) requests and see what happens. I bet
Redis will end up being quite a bit quicker.
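A sketch of the harness that re-run would need, with a stand-in operation since neither server is wired up here - `one_op` just simulates a ~1ms round trip, so the shape of the concurrent benchmark is the point, not the numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def one_op(_):
    """Stand-in for a single client request to Redis or Postgres."""
    time.sleep(0.001)  # pretend round-trip latency
    return 1

def bench(concurrency, total_ops=256):
    """Run total_ops requests with the given number of concurrent workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        done = sum(pool.map(one_op, range(total_ops)))
    elapsed = time.perf_counter() - start
    return done, total_ops / elapsed  # ops completed, ops/sec

for c in (1, 32):
    done, rate = bench(c)
    print(f"concurrency={c}: {done} ops, {rate:,.0f} ops/sec")
```

With a latency-bound workload like this, throughput scales with concurrency, which is exactly the effect a single-threaded benchmark hides.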

