
Lessons learned scaling a PostgreSQL database to 1.2bn records/month - willvarfar
https://medium.com/@gajus/lessons-learned-scaling-postgresql-database-to-1-2bn-records-month-edc5449b3067
======
sandGorgon
> _The primary reason we migrated away from Google SQL for PostgreSQL is
> because we discovered a bug that was causing data corruption. This was a
> known bug that is fixed in newer PostgreSQL versions. However, Google SQL
> for PostgreSQL is several versions behind. The lack of response from the
> support acknowledging the issue was a big enough red-flag to move on. I am
> glad we did move on, because it has been 8 months since we have raised the
> issue, and the version of PostgreSQL has not been updated to-date_

We have had similar experiences with various DB-related issues. AWS are miles
ahead on anything to do with storage and databases.

~~~
honkycat
Google Cloud support is PATHETIC.

We had a production Kubernetes outage caused by a bug in Google Cloud. We were
paying for gold level support.

They de-escalated and re-escalated our ticket 3 times.

In the end they closed the ticket and sent me a link to the "Architecting
distributed applications" docs. I think they were trying to be condescending.
That idiot should have been canned.

Eventually I threw a big fit in the Google Cloud Slack channel and thockin
ended up looking at the issue and finding the bug.

~~~
thockingoog
LOL. Thanks for the shout out. I am sorry you had a bad experience. Please be
assured that we are working on it, and that we take this seriously.

~~~
honkycat
If I may be so bold, where should I direct this complaint so that people hear
it:

When you delete a CloudSQL instance, it also deletes the backups associated
with that instance.

There is also no way to mark a CloudSQL instance as "protected", so one bad
keypress can lose you your production database and all its backups.

This is... cartoonishly dumb. Clownish.

- CloudSQL instance deletion protection should exist

- Those backups should be preserved somewhere when I delete the instance.

~~~
sandGorgon
Agreed - backups have nothing to do with an instance. I should be able to spin
up and down PostgreSQL instances, but backups should be stored (and billed) at
cloud storage rates.

------
lettergram
So my PostgreSQL deployment processes quite a bit more than 1.2bn records /
month (closer to 10bn / month).

What I will say for AWS (which I use) is that backups are why I use it. The
ease of backing up, rolling back, connecting additional services, and scaling
makes it killer.

That being said, I've seen around 5x-10x speed improvements for the exact same
cost utilizing bare-metal servers. I've been using online.net/scaleway and
similar services. However, the real savings are limited, as you need to do a
lot more work and often pay more for backups. Overall, real savings might be a
50% cost reduction, with some additional risk. Need more space? Launch a new
server, copy, check, switch over, pray. With AWS it's one click.

The real question is whether you want to manage the database yourself or not,
and how much time you have.

~~~
rattray
The author mentioned using
[https://www.pgbarman.org](https://www.pgbarman.org) for this, with a
freelance DBA.

Does anyone else have experience comparing the two options?

~~~
jsiepkes
I can attest that pgbarman works great. _cough_ I accidentally nuked a
production server once. However, thanks to barman, the downtime was only tens
of minutes.

It works by making a full backup at fixed intervals (say, weekly) and then
"pretending" to be a replica, which makes it receive all the change logs in
realtime (basically, it receives and archives WALs). This makes point-in-time
recovery possible.
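
For reference, a minimal sketch of what a barman server definition looks like
with the streaming setup (hostnames, user names and the retention window are
placeholder values; a cron entry running "barman backup pg" handles the
periodic full backups):

    ; /etc/barman.d/pg.conf - one section per server being backed up
    [pg]
    description = "Main production PostgreSQL"
    conninfo = host=pg.example.com user=barman dbname=postgres
    streaming_conninfo = host=pg.example.com user=streaming_barman
    backup_method = postgres        ; full backups via pg_basebackup
    streaming_archiver = on         ; WALs streamed via pg_receivewal
    slot_name = barman              ; replication slot on the primary
    retention_policy = RECOVERY WINDOW OF 4 WEEKS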

Highly recommended.

------
yeahitslikethat
The article confirms that the best cost-to-performance ratio is colocated
hardware you build yourself.

Next is managed server.

Last: any of the cloud database providers.

The key piece is knowing what you are doing. The clouds don't care about your
tiny problems. They won't fix things for you.

You must fix them yourself.

~~~
Heliosmaster
Exactly my experience. And not only with databases.

At a previous job I was able to cut our infrastructure expenses by a lot (70%)
by essentially reducing our infra to 1 load balancer, 2 beefy application
servers and 1 (+1 failover) PostgreSQL database. Each of the application
servers costs ~300 euros p/m for 24 cores and 256GB of RAM. I have not
experienced the same cost-to-perf ratio anywhere else.

~~~
griffordson
What was your salary, and the salary of the additional people needed to raise
the bus factor to an acceptable level? There is a lot more than raw perf
numbers that goes into these calculations for any responsible business owner.

~~~
Shish2k
FWIW I'm in a similar situation, running a moderately sized site (10 servers,
~2gbps) on leaseweb hosts, costing ~$1,500/mo - every year or so I look at
cloud offerings, and see that the bandwidth alone would be something like
$40,000/mo.

I also have ~10 smaller hobby projects (<100 monthly active users) which could
easily fit inside the smallest cloud services, but the smallest on offer seems
to be $5/mo, so $50/mo in total. For $25/mo I can get a pair of bare metal
servers which host all of them with failover and a ton of capacity to spare.

Then my actual day-job is at the opposite end of the scale, where we build our
own datacenters...

I am very confused by the movement towards using the cloud for everything,
because I work at all sorts of scales, and can't find a cloud host who works
well for any of my use-cases :S

~~~
foggyToads
Do you mind if I ask what service you're using to host your hobby projects?

~~~
Shish2k
Kimsufi - AFAIK they get their low prices by using second-hand hardware, but
I've not found that to be an issue (hard drive failures are a little more
common than the leaseweb servers, but "more common" is still only "one failure
per server per 5 years" - and that's easily mitigated by setting up a pair of
servers with failover).

------
willvarfar
And I have built a very similar, perhaps even slightly larger, system, only in
MySQL with TokuDB :)

These free old-fashioned RDBMSs really can go fairly big and stay really fast;
much bigger and faster than the noSQL stuff I also use.

One thing that pushes people towards noSQL is that it is easier to get
started; but I think that I've saved buckets of time in the long run by going
with a conventional relational database.

YMMV, but it's always worth thinking 'is this problem too big for a classic
free RDBMS?' and the answer is almost always 'hell, no!'

------
gomezjdaniel
Thanks for the read! A couple of questions:

> Now we are renting our own hardware and maintain the database. We have a lot
> better hardware than any of the cloud service providers could offer

Which provider are you using?

> That 30% we can use to hire a freelance DBA to check in on the servers once
> a day.

Where did you find this freelancer?

~~~
emddudley
> hire a freelance DBA to check in on the servers once a day

Not sure whether you were just interested in the author's specific freelancer
but--

The term for this is "Remote DBA" and there are lots of them. I suppose the
difficult part is confirming trustworthiness and skills before giving them
direct access to your database.

------
ngrilly
Great example of a job queue correctly implemented inside a transactional
relational database (with SKIP LOCKED and a table dedicated to the job queue),
instead of using a specialized tool.
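
For anyone curious, a minimal sketch of the pattern (table and column names
invented for illustration, not taken from the article):

    -- Each worker claims one pending job; SKIP LOCKED makes concurrent
    -- workers skip rows that are already claimed instead of blocking.
    BEGIN;

    SELECT id, payload
    FROM job_queue
    WHERE completed_at IS NULL
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;

    -- ...do the work, then mark the job done in the same transaction,
    -- so a crashed worker releases its lock and the job gets retried:
    UPDATE job_queue SET completed_at = now()
    WHERE id = 42;  -- the id returned by the SELECT above

    COMMIT;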

~~~
ddorian43
When you don't need insane scalability, having a transactional queue is a
godsend.

~~~
ngrilly
For the record, it's possible to have scalability and transactional enqueuing
at the same time, with something like Google Cloud Datastore:
[https://cloud.google.com/appengine/docs/standard/python/data...](https://cloud.google.com/appengine/docs/standard/python/datastore/transactions#transactional_task_enqueuing)

~~~
ddorian43
Yeah, with sharding. And global transactions, which are costly - fine as long
as you can pass the cost on to the customer, but if you're doing a lot, then
you can't use them.

------
hestefisk
Very interesting read. Having worked on large Postgres databases myself, I
would have loved to know about these tuning opportunities. It's disappointing
to see Google providing such poor support for Postgres in Google Cloud, though.
Even Microsoft provides first-class support. Google needs to up its game.

------
elken
Very interesting read. I'm not convinced that Postgres should be used for
every aspect of a data processing pipeline yet, but the author has certainly
proved it is possible. You also seem to get a lot of features for free that
people using Kafka can only dream of.

It will be interesting to see how people will take advantage of new Postgres
features in the future.

------
manigandham
What is Aiven supposed to do about a bug in Timescale that is still open?

Aiven is great if you don't need superuser access. They will run some bigger
plans on local SSDs, and they let you create replicas and migrate across
clouds and regions easily. It's a great way to run multi-cloud without
shipping the data yourself. True, they could use more alerting features around
resource usage.

GCP's managed services for open-source products are highly unrecommended for
this exact reason: they're very behind or even obsolete, with very limited
customization and control. Stick with BigTable, BigQuery and Dataflow, and run
the rest yourself on GCE/GKE.

~~~
oskari
Unfortunately we were not able to offer a concrete workaround in this case,
but as mentioned to the author of the post, we were and are in contact with
our friends at Timescale to resolve this, and an update with the fix will be
available in Aiven shortly.

We are not the upstream for any of the engines running in Aiven, so we work
with the upstream developers and communities to resolve any issues found.
Often our involvement in fixing the issues is through providing logs, stack
traces and other debugging information to the original authors of the code,
but in some cases we also fix the issues ourselves and always contribute such
fixes back upstream, see e.g. this commit for a PostgreSQL bug that was found
by our customers and fixed by us:
[https://github.com/postgres/postgres/commit/07ef035129586ca2...](https://github.com/postgres/postgres/commit/07ef035129586ca26a713c4cd15e550dfe35e643)

~~~
sandGorgon
> _In general, I didn’t understand what added value Aiven.io provides – we
> weren’t even warned when the database was running out of storage._

We have been considering using Aiven on GCP. Could you comment on the
statement above? Because that's a little worrying.

~~~
htn
It's impossible to comment on the specific circumstances of that report, but
we do generally monitor and warn our users about running low on resources - be
it storage, CPU or memory. I admit that the alerting is by no means foolproof;
we may fail to react to rapid or sudden changes in usage patterns, but the
alerting works quite well for more steady and common workloads and usage
patterns.

Should the worst case scenario happen and the storage run out, there's always
an option to upgrade to the next resource tier size. This will restore the DB
state to the latest successfully recorded transaction.

------
datenhorst
> _Column order matters. We have tables with 60+ columns. Ordering columns to
> avoid padding saved 20%+ storage_

I need to look into this.

~~~
jihadjihad
This was an informative read: [https://blog.2ndquadrant.com/on-rocks-and-
sand/](https://blog.2ndquadrant.com/on-rocks-and-sand/)
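
The gist, as a quick sketch (table and column names made up for illustration):
PostgreSQL pads each column out to the alignment boundary of the next one, so
ordering fixed-width columns from largest alignment to smallest avoids wasted
bytes.

    -- Wasteful ordering: each narrow column is padded out to the
    -- 8-byte alignment of the bigint/timestamptz that follows it.
    CREATE TABLE events_padded (
        flag    boolean,      -- 1 byte + 7 padding bytes
        user_id bigint,
        kind    smallint,     -- 2 bytes + 6 padding bytes
        seen_at timestamptz
    );

    -- Packed ordering: fixed-width types in descending alignment order,
    -- variable-length types (text, jsonb, ...) at the end.
    CREATE TABLE events_packed (
        user_id bigint,
        seen_at timestamptz,
        kind    smallint,
        flag    boolean,
        note    text
    );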

------
ggm
I will never complain about my minor pg issues again. Whatever else the beast
running here may be, it tells me my problems are small.

I'm interested in learning more about the sharded design, time series, and
JSON data, if anyone else reading this or the OP wants to write more.

------
garyclarke27
Thanks for the writeup. Yes, PostgreSQL is an incredible database except for
its mat views; IMO this is its biggest weakness vs Oracle (SQL Server mat
views are also poor, with umpteen restrictions). With PG any view can be a mat
view; the problem is they don't have incremental refresh or auto refresh, so
they're not very useful. I roll my own, using custom tables and transaction
ids - not too difficult since all are auto-generated, but it would be nice if
Postgres added incremental refresh. I think this is their most urgent missing
capability and it should be top of the list for v12.
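
For the curious, one common roll-your-own pattern (a sketch with invented
table names, not necessarily the exact scheme described above) is a
trigger-maintained summary table standing in for an incrementally refreshed
mat view:

    -- Aggregate kept up to date on every write, instead of
    -- periodically re-running REFRESH MATERIALIZED VIEW.
    CREATE TABLE account_totals (
        account_id bigint PRIMARY KEY,
        total      numeric NOT NULL DEFAULT 0
    );

    CREATE FUNCTION bump_account_total() RETURNS trigger AS $$
    BEGIN
        INSERT INTO account_totals AS t (account_id, total)
        VALUES (NEW.account_id, NEW.amount)
        ON CONFLICT (account_id)
        DO UPDATE SET total = t.total + EXCLUDED.total;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER payments_bump_total
        AFTER INSERT ON payments
        FOR EACH ROW EXECUTE PROCEDURE bump_account_total();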

------
patwolf
_> The primary reason we migrated away from Google SQL for PostgreSQL is
because we discovered a bug that was causing data corruption_

Is anyone familiar with the details of the data corruption bug? I'm using
Cloud SQL and would like to understand the risk.

~~~
bluecheese
My Cloud SQL instance is currently on 9.6.6; I just got an email from GCP
saying they are upgrading to 9.6.11 next week. There is a data corruption bug
mentioned in the 9.6.7 release notes - not sure if it's the one the author is
referring to.

[https://www.postgresql.org/docs/9.6/release-9-6-7.html](https://www.postgresql.org/docs/9.6/release-9-6-7.html)

"Fix vacuuming of tuples that were updated while key-share locked (Andres
Freund, Álvaro Herrera)

In some cases VACUUM would fail to remove such tuples even though they are now
dead, leading to assorted data corruption scenarios."

------
threeseed
> The takeaway here is that Google and Amazon prioritise their proprietary
> solutions (Google BigQuery, AWS Redshift)

Except this is nonsense and there is no evidence to back it up.

AWS Redshift is a replacement for an enterprise data warehouse. It is not
suitable for a website, OLTP, or many analytics use cases, and AWS has never
claimed it to be. That's why they offer a suite of databases with different
paradigms, e.g. Aurora, DocumentDB, DynamoDB, Neptune and Redshift: because no
one database fits every use case.

Also using a database as a job queue really is a strange thing to do when you
are running on the cloud and there are dedicated services available
specifically for this use case.

~~~
lr4444lr
I don't think that's what he's implying. I think he means that the cloud
providers are not very friendly to devs who will take the time to customize
their RDBMS to squeeze more performance out of it, instead of switching over
to a more expensive solution capable of handling the kind of high-volume OLAP
he had to deal with.

------
Rafuino
You ended up co-locating your own hardware, saying "We have a lot better
hardware than any of the cloud service providers could offer." Can you clarify
then what kind of hardware you're using? All SSDs, HDDs, how much capacity,
are you going for all in-memory transactions, etc., etc.?

------
eatonphil
I'm confused why hosting PostgreSQL themselves on GCP or AWS was not
considered an option?

~~~
shoo
My understanding was that they were trying to outsource as much of the ops &
infrastructure side of things as possible, hence trying a third service
provider after GCP & AWS before giving up and doing it themselves.

~~~
eatonphil
I got that. :) But they went from managed-db, managed-host to unmanaged-db,
unmanaged-host, when the clear middle ground is unmanaged-db, managed-host.

Maybe I missed it.

~~~
gajus
Hi, author here scanning through the comments.

There are quite a few things that I had to leave out as the post was getting
really long. I am writing a follow up article now titled "How Kubernetes and
twelve-factor methodology saved us USD 300k in 24 months" where I am going to
touch on this.

The short version is that we never intended to stay with a cloud provider due
to the costs of operating a large cluster (we continue to use GCP for critical
components of our infrastructure, such as monitoring and alerting). It was
convenient while we could afford it thanks to their startup programs, but
migrating our data aggregation workers from cloud providers to dedicated
servers cut costs by 40% (which adds up to a lot, as we are using ±0.5k vCPUs
and _a lot_ of RAM).

~~~
justinclift
> Hi, author here scanning through the comments.

Small note, there's a typo in the heading of "Google Cloud SQL for
PosetgreSQL". ;)

PosetgreSQL -> PostgreSQL

------
LaserToy
Very interesting design. It looks like they are capturing events and trying to
predict things. I wonder why they went with an uber-database and didn't use
cheaper S3 for storage, some processing for prediction, and something simple
and cheap for serving.

------
agnivade
For the large tables, what about using timestamp-based table partitioning to
split the data into smaller units, and then materializing each of them one by
one?

That way you get to keep the materialized data and a single logical table.
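
Something like this should work with declarative partitioning (PostgreSQL
10+; table and column names invented):

    -- The parent stays a single logical table; rows are routed to
    -- per-month partitions by the timestamp.
    CREATE TABLE events (
        created_at timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2019_01 PARTITION OF events
        FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');

    CREATE TABLE events_2019_02 PARTITION OF events
        FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');

Old partitions that no longer receive writes could then be aggregated or
materialized one at a time.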

------
hstaab
GCP beats out AWS by a factor of 2? Interesting.

------
ziont
1.2bn/s - try that, it's far more challenging.

~~~
reallydude
That's the kind of problem I work on (millions per sec at peak). Everything is
a queue: we stream from servers via Kafka, into processors, and into Redshift,
more or less. It's free once it's moving around within AWS, but Redshift and
the processors get expensive, and I'm interested in reliable ad hoc
solutions... but haven't seen one.

~~~
scapecast
Can you elaborate on the process of copying data into Redshift? How often do
you run COPY commands? On a continuous basis? Asking since Redshift, as an
OLAP database, isn't really optimized for heavy writes, so if you've figured
out a way to do it, please bottle that magic!

