
PostgreSQL Scalability: Towards Millions TPS - lneves
http://akorotkov.github.io/blog/2016/05/09/scalability-towards-millions-tps/
======
gjolund
Postgres has been my DB of choice for nearly a decade. The only times I wind
up working with another db are because:

(1) it is a better technical fit for a very specific problem

(2) there is already a legacy db in place

I have been voted down at a couple of startups that wanted to run a "MEAN"
stack; invariably, all of those startups either moved off MongoDB or shut down.

The only time I will advocate for anything other than Postgres is when
Wordpress is involved. If the data model is simple enough then MySQL is more
than up for the task, and it avoids an additional database dependency.

Thankfully all the ORMs that are worth using support MySQL and Postgres, so
using both is very doable.

### Useful Postgres (or SQL in general) tools/libraries:

Bookshelf ORM [http://bookshelfjs.org/](http://bookshelfjs.org/)

PostgREST automated REST API
[https://github.com/begriffs/postgrest](https://github.com/begriffs/postgrest)

Sqitch git-style migration management [http://sqitch.org/](http://sqitch.org/)

~~~
manigandham
> invariably all of those startups moved from MongoDB

Why? Especially after point #1 and assuming the document-store was a good fit
for the data model.

~~~
takeda
Because NoSQL is a hype.

Much of NoSQL essentially takes us back to the 60s, before Codd came up with
the relational model[1]. These ideas tend to come back once in a while [2][3],
but so far nothing has proven better than the relational model.

NoSQL still makes sense in many cases (generally when your specific use case
does not need all the guarantees of ACID); in return you can get higher
performance or horizontal scalability.

MongoDB aims to be a generic database but rides on the wave of NoSQL
"coolness". It was created by people who had no experience in databases and
are learning as they go[4]. As they started adding things that are essential
for a database, they realized it's not that simple.

Currently MongoDB is outperformed by Postgres. In fact there is an application
called ToroDB, which runs on top of Postgres and emulates the MongoDB
protocol, and even that outperforms Mongo. Mongo also doesn't scale well
horizontally, so essentially you don't really get any significant advantage.

[1]
[https://en.wikipedia.org/wiki/Hierarchical_database_model](https://en.wikipedia.org/wiki/Hierarchical_database_model)

[2]
[https://en.wikipedia.org/wiki/Object_database](https://en.wikipedia.org/wiki/Object_database)

[3]
[https://en.wikipedia.org/wiki/XML_database](https://en.wikipedia.org/wiki/XML_database)

[4] For example, they started with mmap'ed memory regions to store the data
and did not initially use fsync() to make sure data is saved on disk. In a way
it reminds me of MySQL several years ago. It's much better now than it was,
but it has a lot of warts left over from the past.

~~~
stickfigure
_Currently MongoDB is outperformed by Postgres_

I hate hearing absolutist dogma like this. Some things are faster in Postgres,
some things are faster in Mongo. We are migrating our analytics db from Mongo
to PG but we hit a wall because SELECT COUNT(DISTINCT x) performs abysmally
in Postgres. Mongo's aggregation framework is an immature PITA but it performs
this little trick well enough that we're pretty much stuck keeping Mongo
around unless we want to use MSSQL or Oracle or something else with a better
(and much more expensive) query planner.

We still love (and prefer) Postgres but it is not a pareto improvement. There
are always tradeoffs, and this kind of fanboyism just speaks to inexperience.

~~~
sitharus
There's no reason for SELECT COUNT(DISTINCT x) to perform badly in Postgres,
as long as you have an appropriate table design and indexes.

Also MSSQL's query planner isn't better than Postgres', I work with both.
Postgres does have its quirks though, especially with the MVCC row expiry.

~~~
anarazel
> There's no reason for SELECT COUNT(DISTINCT x) to perform badly in
> Postgres, as long as you have an appropriate table design and indexes.

Meh. Postgres' planner doesn't know how to generate a skip-scan/loose index
scan for DISTINCT. You can write it yourself, but it's a bit painful:
[https://wiki.postgresql.org/wiki/Loose_indexscan](https://wiki.postgresql.org/wiki/Loose_indexscan)

If you have low cardinality, that can be a _huge_ efficiency difference.
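
Roughly, the workaround from that wiki page looks like the sketch below. Table
"tbl" and column "x" are hypothetical, and it assumes a btree index on x; the
idea is to jump from one distinct value to the next instead of reading every
duplicate:

        -- Emulated skip scan / loose index scan: each step of the recursive
        -- CTE does one index lookup to find the next distinct value of x.
        WITH RECURSIVE t AS (
            (SELECT x FROM tbl ORDER BY x LIMIT 1)
            UNION ALL
            SELECT (SELECT x FROM tbl WHERE x > t.x ORDER BY x LIMIT 1)
            FROM t
            WHERE t.x IS NOT NULL
        )
        SELECT count(x) AS distinct_x FROM t;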

~~~
merb
Actually that's wrong:

    
    
        EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                              QUERY PLAN                                                           
        -------------------------------------------------------------------------------------------------------------------------------
         GroupAggregate  (cost=10713.04..11381.06 rows=10 width=5) (actual time=1010.383..1073.263 rows=11 loops=1)
           Group Key: calc
           ->  Sort  (cost=10713.04..10935.68 rows=89056 width=5) (actual time=1010.321..1049.189 rows=89041 loops=1)
                 Sort Key: calc
                 Sort Method: external merge  Disk: 1392kB
                 ->  Seq Scan on price_history  (cost=0.00..3391.56 rows=89056 width=5) (actual time=0.007..20.516 rows=89041 loops=1)
         Planning time: 0.074 ms
         Execution time: 1076.521 ms
        (8 rows)
    
    

With Index:

    
    
        EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                                             QUERY PLAN                                                                          
        -------------------------------------------------------------------------------------------------------------------------------------------------------------
         GroupAggregate  (cost=0.29..2804.82 rows=10 width=5) (actual time=0.117..47.381 rows=11 loops=1)
           Group Key: calc
           ->  Index Only Scan using price_history_calc_idx on price_history  (cost=0.29..2359.52 rows=89041 width=5) (actual time=0.054..18.579 rows=89041 loops=1)
                 Heap Fetches: 83
         Planning time: 0.208 ms
         Execution time: 47.416 ms
        (6 rows)
    
    

That is called an index-only scan, and it happens when the data you need is
entirely inside your index. Which means if you need an aggregate you could
try to index everything it touches. Usually an aggregate only needs a few
columns of a row, so an index is mostly not a problem.
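
For reference, the index behind the second plan would have been created with
something along these lines (only the calc column needs to be in it for the
index-only scan above):

        CREATE INDEX price_history_calc_idx ON price_history (calc);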

~~~
anarazel
I didn't say an index couldn't be used at all, just that it won't actually
make the query fast. This will fetch all the duplicates for a value from the
index before going to the next value. If you have a couple thousand or more of
each value to be counted, that'll make the query rather slow.

------
jasonmp85
Andres is a great coworker to have at Citus Data, though I first ran into him
on the mailing lists shortly after starting at Citus myself. I was tasked with
figuring out "why do certain read-only workloads fail miserably under high
concurrency?"

I had never touched PostgreSQL before, nor any Linux performance tools, but I
noticed that replacing certain buffer eviction locks with atomic
implementations could drastically help this particular case. I emailed the
list about it and Andres was someone who chimed in with helpful advice. I
wrote up what I'd discovered in my deep dive here: [http://tiny.cc/postgres-
concurrency](http://tiny.cc/postgres-concurrency)

Turns out Andres was already working on a "better atomics" patch to provide
easier methods of using atomic operations within PostgreSQL's code base (my
patch was a quick hack probably only valid on x86, if that). It's been useful
in removing several performance bottlenecks and—two years in—it looks like
it's still paying off.

~~~
AbacusAvenger
I wonder, does Intel's TSX/HLE help with these workloads? If it's read-only
then I'd expect that it'd be able to elide a lot of the locking (assuming the
Intel-designed heuristics do the job).

~~~
anarazel
I'd bought one of the first Haswell notebooks to play around with tsx. Before
I had time to do so, Intel found the tsx bug...

I hope to have time to play around with it once I have new hardware (I refuse
to do performance development on virtual machines).

But honestly, most remaining performance/scalability problems in pg are more
algorithmic in nature. So micro-optimizations, and that's what I'd call
tsx/hle, aren't likely to be the biggest bottleneck.

~~~
ziedaniel1
Wasn't TSX only enabled on Xeons to begin with?

~~~
anarazel
No, the higher end i7 mobile processors had it as well (i7 48xxhq or something
IIRC).

------
snhkicker
I don't know much, but to me it seems PostgreSQL has one of the most open and
supportive communities; maybe this is the reason a lot of new faces are
looking at it, including me.

~~~
morgante
It's also a great example of a technology which is mature and well-tested yet
actively growing and improving. Most open source projects have a lot to learn
from Postgres.

~~~
movedx
What would you say those learning points are? What key factors would you
introduce into another OSS project that PostgreSQL currently employs?

~~~
stubish
One of the weirdest things is that there is no bug tracker. When things are found to
be broken, they tend to get fixed immediately and there is nothing to track.
Feature and incremental improvement work seems an arduous process, where you
need to own your work and deal with extensive reviewing (no tossing it over
the wall for others to maintain).

------
graffitici
Are there any best practices for using PostgreSQL for storing time series
data? Would it be comparable in performance to some of the NoSQL solutions
(like Cassandra) for reasonable loads?

~~~
thom_nic
I decided to tee my timeseries data into InfluxDB. It's purpose-built for the
task and has built-in support for rollups/aggregation/retention policies/gap
filling. Admittedly I have not put Influx under much stress or scalability
testing, since my use case is based more on utility than performance.

Unless PG has some timeseries-specific extensions, I have assumed a
TS-specific database would be the more appropriate fit. Also curious to try Riak TS.

~~~
siculars
/plug/ Riak TS, which does just this, was just released as open source. I
wrote a post on it: [http://basho.com/posts/technical/time-series-the-new-
shiny/](http://basho.com/posts/technical/time-series-the-new-shiny/) .

Disclaimer: I work for Basho, makers of Riak TS.

------
bnchrch
PostgreSQL has continued to be one of the best open source endeavours so far.
An amazingly smart and welcoming community that turns out arguably the best-in-
class relational database. Kudos, and keep innovating, team PG.

------
devy

         In partnership with IBM we researched PostgreSQL scalability on modern Power8 servers. 
    

That statement and the linked Russian-language blog post[1] make it seem like
this work is Power8-specific, and Power8 is "a massively multithreaded
chip"[2]. I wonder how far off the results would be on x86-64?

[1]
[https://habrahabr.ru/company/postgrespro/blog/270827/](https://habrahabr.ru/company/postgrespro/blog/270827/)

[2]
[https://en.wikipedia.org/wiki/POWER8](https://en.wikipedia.org/wiki/POWER8)

~~~
dimfeld
The article says a few lines down, "The optimization #1 appears to give huge
benefit on big Intel servers as well, while optimization #2 is Power-specific.
After long rounds of optimization, cleaning and testing #1 was finally
committed by Andres Freund."

Also, the chart indicates that the benchmarks shown were run on Xeon chips.

------
jrcii
Slightly tangential but I'm genuinely curious, does anyone have a theory as to
why nearly every RDBMS post on Hacker News is about Postgres and almost never
MySQL or MariaDB? Considering the relative obscurity of the former it seems
somewhat inexplicable.

~~~
combatentropy
> why nearly every RDBMS post on Hacker News is about Postgres and almost
> never MySQL

You'll see the same phenomenon on Slashdot.

MySQL is popular among a subset of programmers: web developers. In
corporations, Microsoft SQL and Oracle are more popular. Further, MySQL is
popular among a subset of web developers: those who use PHP. Among web
developers who use Python, Postgres seems more popular.

My suspicion is that MySQL's popularity is tied to those web-hosting plans (1
GB of storage! 1 TB of bandwidth! 10 databases! $10/month!). They were almost
always MySQL databases. Web hosts that offered Postgres databases were few and
far between. This article corroborates my theory:
[http://rodner.blogspot.com/2008/01/what-makes-mysql-so-
popul...](http://rodner.blogspot.com/2008/01/what-makes-mysql-so-
popular.html). MySQL's company (which at the time was "MySQL" not Oracle)
pushed it in at a critical time in the development of the web.

I have written applications that use: Postgres, MySQL, SQLite, Microsoft SQL,
and Oracle. My favorite by far is Postgres. The strangest thing to me is not
that MySQL is more popular than Postgres, but that anyone uses Microsoft or
Oracle at all. Not only do they cost a lot, but from a purely technical
standpoint they are worse.

~~~
manigandham
> Postgres, but that anyone uses Microsoft or Oracle at all. Not only do they
> cost a lot, but from a purely technical standpoint they are worse.

Based on what? Both commercial databases have incredible features, tooling and
extensions that leave postgres behind. Postgres today doesn't even have a
solid scale-up or scale-out strategy.

It does have nice SQL support and makes developer lives a little easier, but
this isn't anywhere close to making it the absolute winner technically.

~~~
jeltz
The very article you are commenting on is about scaling up with PostgreSQL.
PostgreSQL is excellent at scaling up with many small queries, but has
problems when the workload is a few large ones.

~~~
manigandham
This is a brand new commit; it's not available today. There's been some good
progress recently, but Postgres historically hasn't been the best at scaling
up, especially compared to the commercial engines.

------
tormeh
How does PostgreSQL compare to VoltDB?

I'm trying to get a handle on the different databases, and VoltDB sounds
exciting, but everyone's talking about PostgreSQL. Then there's Mnesia which I
hear is, as all things Erlang, excellent, though it's kinda tied to Erlang.

I know it's hard to say what's best, but what would you say is the best DB for
a completely new multilingual project that needs throughput but prioritizes
low latency, for example?

Also, VoltDB is licensed under AGPL. Does this mean that it can't be used in
commercial projects? Or is it OK as long as the other components are on
different servers or similar?

~~~
Alex3917
> How does PostgreSQL compare to VoltDB?

If you don't know the difference, you probably want Postgres.

VoltDB is a specialty database for things like high frequency trading. It
wouldn't make sense to use for, say, a consumer app or web startup.

~~~
SEJeff
Specifically, it is a column store, which is advantageous for things like
real-time analytics over millions of data points from streaming market data.
This has uses for HFT, but also for anyone who wants to do their own day
trading.

~~~
Jweb_Guru
VoltDB isn't a column store. It's designed for serializable OLTP workloads
with really fast index updates, neither of which characterize column stores.
You may be thinking of one of Stonebraker's other projects, Vertica.

~~~
SEJeff
Gah, you nailed it. Sorry about that. Right guy, wrong db project that starts
with a V.

------
Ono-Sendai
Sounds like the padding stuff is a false sharing issue. They might want to
look into putting 128 bytes of padding between the data structures as well:
[http://www.forwardscattering.org/post/29](http://www.forwardscattering.org/post/29)

~~~
anarazel
That's what the discussed padding patch does (except it only pads to 64 bytes
on most platforms). There's a downside though - at low concurrency the padding
reduces the cache hit ratio enough to cause a slowdown.

Given we're in code freeze anyway, I've not spent a lot of thought on that
yet; but I suspect that rearchitecting things so the lines are dirtied fewer
times is the better fix, with less potential for regressions at lower client counts.

~~~
Jweb_Guru
Would it be possible to dynamically choose the padding at server start time?
Given that cache line sizes vary that seems like it might be a prudent choice.

~~~
anarazel
> Would it be possible to dynamically choose the padding at server start time?

I doubt it. Allowing the compiler to generate accesses with lea et al. is
quite beneficial, and that'd likely be gone if this were no longer a
compile-time constant.

It'd also end up being a tuning knob that very few would know how to tune...

> Given that cache line sizes vary that seems like it might be a prudent
> choice.

They usually only vary between architectures. Netburst IIRC was the last time
x86 cache line sizes varied. If there's any doubt it's usually ok to just use
the larger (128 byte) line size; the "unused" padding cache line will never be
accessed and thus won't occupy cache space.

~~~
Ono-Sendai
Cache line size is 64 B on recent Intel CPUs.

~~~
anarazel
That's why I said that we'd use a smaller cache-line size for most
architectures. PPC uses 128 bytes though...

~~~
Ono-Sendai
ok.

------
jlgaddis
Man, how I wish WordPress had originally chosen to use PostgreSQL instead of
MySQL back in the day.

~~~
the-dude
[http://www.hawkix.net/pgsql-for-wordpress/](http://www.hawkix.net/pgsql-for-
wordpress/)

[https://wordpress.org/plugins/postgresql-for-
wordpress/](https://wordpress.org/plugins/postgresql-for-wordpress/)

~~~
nisa

        Compatible up to: 3.4.2
        Last Updated: 2 years ago
        Active Installs: 400+
    

WordPress 4.5 is current. Anyone doing WordPress will just skip this if they
don't have developers who can fix the issues. Also, while the base may work, a
lot of popular plugins don't (or didn't used to?) use WP_Query or whatever
else WordPress offers.

It's also probably easier to add indexes or rewrite the logic. Mostly it's bad
database code that sucks on MySQL and would likely suck the same way on
PostgreSQL.

Besides that, if you use an object cache like memcache or redis you can avoid
a lot of database accesses, and InnoDB seems to be able to deal with
concurrent access to tables like wp_comment.

Would be cool to see in WordPress itself but I doubt it.

------
hinkley
Always nice to hear about throughput improvements in Postgres.

How do these changes affect more heterogeneous workloads of mixed reads and
writes? A little better? A lot better? A little worse?

~~~
anarazel
Very dependent on the workload. The optimization isn't specific to reads or
writes, but in many cases your bottleneck when writing will be elsewhere.

------
hbrid
Postgres still goes through the motions of a transaction for every query you
issue, even if nothing but that transaction is happening on the server. So
obviously if you add extra load in the form of writes you may slow your reads
down, but this was not a full benchmark; it was a comparison of the same
workload running against multiple versions of Postgres.

~~~
Tostino
Huh, that's funny to see a comment of mine copied word for word from Reddit:
[https://www.reddit.com/r/programming/comments/4in70l/postgre...](https://www.reddit.com/r/programming/comments/4in70l/postgresql_scalability_towards_millions_tps/d2zo1kl?context=3)

------
znpy
Dumb question: in this context, "TPS" means... ?

T<what?> Per Second?

~~~
mrjsson
Transactions Per Second

~~~
znpy
Thanks :)

------
ionheart
Big thanks to the community for the hard work! I wonder, if we set "sync=off"
in the test, would the numbers be way higher than the OP's results?

~~~
jeltz
Since this is a read-only benchmark it won't be affected by either
synchronous_commit=off or fsync=off (do not turn off fsync for any data you
care about, it can be silently corrupted on a crash!).
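
For context, a rough sketch of where those two settings live (the parameter
names are the real GUCs; the comments are just my summary):

        -- Can be set per session or per transaction; a crash may lose the
        -- most recent commits, but the data on disk stays consistent.
        SET synchronous_commit = off;

        -- fsync can only be changed in postgresql.conf (followed by a reload);
        -- with fsync = off a crash can silently corrupt data.
        -- fsync = off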

------
malloryerik
Any opinions about AWS' SQL database, Aurora?

~~~
pritambarhate
These two articles on Aurora by Vadim Tkachenko, the CTO and co-founder of
Percona, are very informative:
[https://www.percona.com/blog/2015/11/16/amazon-aurora-
lookin...](https://www.percona.com/blog/2015/11/16/amazon-aurora-looking-
deeper/) and [https://www.percona.com/blog/2015/12/03/amazon-aurora-
sysben...](https://www.percona.com/blog/2015/12/03/amazon-aurora-sysbench-
bencmarks/)

------
gnarbarian
The more efficient we can be at completing TPS reports the better. I must
spend upwards of 40% of my time on them.

~~~
gaius
Did you get the memo?

~~~
juhq
Ah! Yeah. It's just we're putting new coversheets on all the TPS reports
before they go out now. So if you could go ahead and try to remember to do
that from now on, that'd be great. All right!

