

Now available: auto-scaling PostgreSQL deployments - mrkurt
https://blog.compose.io/new-year-new-database-postgresql-on-compose/

======
akurilin
On this note, what is people's favorite way of scaling out Postgres? I'm told
Slony + master/slaves configs are good for scaling out read-heavy situations,
but it seems that there are a ton of alternatives on the market.

Is there a comprehensive guide to the various options and tradeoffs for this
kind of thing? Good resources one could use to learn more about it?

~~~
bmurphy1976
Ugh, Slony is awful. You use Slony when you have to do a live migration from
one version to another (say 9.3 to 9.4) without any downtime. WAL replication
is the way to go most of the time. I've used Londiste; it's better than Slony
if you need partial replication, but still messy to use compared to WAL.

Personally I haven't used any of the others (i.e. Bucardo).

~~~
akurilin
Would you be so kind as to provide context on why WAL replication is superior
to logical replication? I'm very interested in learning more about it.

Also, what specifically did you hate about Slony?

Any resources you'd recommend on this?

~~~
rpedela
WAL replication is currently the official and best supported mechanism for
replication. If WAL replication does not suit your particular use case, then
that is when to start looking at Slony, Bucardo, etc.
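For context on the above, streaming WAL replication in 9.4-era Postgres is configured with a handful of settings; a minimal sketch, with hosts, users, and values as placeholders (vanilla Postgres, not any particular provider's setup):

```
# primary: postgresql.conf
wal_level = hot_standby     # write enough WAL for a queryable standby
max_wal_senders = 3         # allow replication connections
wal_keep_segments = 64      # retain WAL so a lagging standby can catch up

# primary: pg_hba.conf -- allow the standby to connect for replication
# host  replication  replicator  10.0.0.2/32  md5

# standby: recovery.conf
# standby_mode = 'on'
# primary_conninfo = 'host=10.0.0.1 user=replicator application_name=standby1'
```

`pg_basebackup` is the usual way to seed the standby's data directory before starting it.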

------
buro9
This is great, but two things:

1) Where are you physically located? As I would want to move my API servers
very close to the database for performance. Knowing which city and datacenter
would help.

2) Please fix the logo in the top left of your blog so it links to your main
website. It's a UI fail when a logo doesn't link to the main website, even
when the blog is on a sub-domain.

~~~
mrkurt
Postgres is currently in Ashburn VA (near AWS us-east-1) and Dublin, Ireland
(eu-west-1).

Good catch on the logo, that irks me too.

~~~
buro9
It would be great if you had more locations that favour non-Amazon customers
though I understand why you would prioritise that first.

I mostly use Linode
[https://www.linode.com/speedtest](https://www.linode.com/speedtest) and the
locations of AWS are generally good for Amazon but bad for other services. For
example, in London nearly all smaller hosting providers, as well as major
peering connectivity, are based around the LINX locations such as Telehouse.

Latency to the database is key, but without moving to AWS (which I wouldn't
want to do for price and performance reasons) I couldn't achieve a low enough
latency here to consider it.

Another thing... "sign-up for free" quickly followed by "enter payment
method". Which is it? I wanted to sign up, in part just to receive future
notifications and also to get a sense as to the qualitative feel of your
dashboard tools. For reasons outlined above, I'm not going to buy a service
today.

------
mrmondo
Great to see, but I'd like some real numbers on the performance. In my
experience, AWS's poor storage performance has always been a showstopper for
large databases, especially when trying to scale them if integrity is crucial
to your platform.

~~~
mrkurt
We don't use AWS's storage — at least not EBS. We run on our own hardware and
on the i2 instances on AWS that have high performance ephemeral SSD arrays.
Our benchmarks of the i2s show their IO performance is quite good, as you'd
expect of local SSDs.

~~~
hydrogen18
How do you get persistence if all your storage is ephemeral?

~~~
mrkurt
It's not all ephemeral; the physical servers we run are old-school redundant
(RAID-10, two power supplies, etc). Even ephemeral storage is persistent; it's
just something that'll go away if you migrate your instance to new host
hardware.

Deployments are on two servers and the write-ahead log is streamed to both the
slave and offsite secondary storage. Replication is async so there _is_
potentially a small window of data loss if a whole server goes. We are
considering letting people opt in to synchronous replication, but don't have
it available yet.
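In stock Postgres, the synchronous opt-in described above is a couple of settings on the primary; a minimal sketch using vanilla 9.4 configuration (the standby name is a placeholder, and this is not Compose's actual setup):

```
# postgresql.conf on the primary
synchronous_standby_names = 'standby1'  # must match application_name in the
                                        # standby's primary_conninfo
synchronous_commit = on                 # commits wait for standby confirmation
```

With these set, a commit doesn't return until the named standby confirms receipt of the WAL, which closes the data-loss window at the cost of write latency.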

~~~
hydrogen18
Ephemeral storage is just that - ephemeral. Amazon can decide to retire your
instance at any time.

So your persistence plan is RAID 10 it sounds like.

------
gabrtv
Not many PostgreSQL DBaaS offerings out there. This is great news.

------
sciurus
Why should I choose this over Amazon's RDS?

------
halayli
If you aren't going to tell me how the scaling is implemented, how it impacts
ACID, and describe your edge cases well, then it's going to be hard to trust
the solution.

~~~
mrkurt
We don't have enough content about this yet, but we run a very vanilla
Postgres setup on big, beefy servers and scale resources vertically, similar
to how we start with MongoDB: [https://blog.compose.io/how-we-scale-mongodb/](https://blog.compose.io/how-we-scale-mongodb/)

Scaling Postgres horizontally is not something most of our customers need
right now. When we do release scale-out, it'll be obvious to customers how we
do it — we might just make something from our good buddies at Citus Data
available, for instance.

~~~
halayli
Got it. Thanks for the clarification.

------
marbemac
First of all, this is great news - I use Compose for Mongo and have been
waiting for a PostgreSQL option. However, is the RAM allotment similar to that
given in the Mongo deployments - 1/10th of storage? So, about $125/month for
1GB RAM and 10GB storage. Making the obvious comparison to Heroku (which,
granted, doesn't offer the autoscaling feature), Compose looks quite
expensive. At a glance, it seems that on Heroku one gets the same amount of
RAM and 6 times the storage for less than half the cost of Compose.

~~~
Thaxll
People are paying $125/month for 1GB memory and 10GB space? I see how AWS is
making so much money, those prices are just insane.

~~~
wiz21
I sense you've never worked in a suit & tie enterprise...

If you have backups, upgrades, access to monitoring tools, etc., $125/month is
at most 2 hours of developer time.

That's a _bargain_ and your standard Oracle admin should feel threatened
:-) (provided the company is willing to put its data on a server far away from
its control room, which, I guess, doesn't happen so often :-))

so a bargain if you actually can make use of it...

~~~
Thaxll
People that use Oracle don't even read HN, it's another league.

~~~
sanderjd
Out of curiosity, what do they read?

~~~
mrkurt
It's probably more accurate to say that people who buy Oracle don't read HN.
Developers that use Oracle might read HN, and these are the people that are
driving the future of databases. There's a reason most new DBs have an open
source business model. :)

People who buy Oracle are likely reading management publications, and possibly
Gigaom. They're not reading technical discussions of databases.

------
gasping
PostgreSQL wants to be web scale but it will never touch MongoDB scale.
MongoDB is true web scale with full cloud compatibility and horizontal scaling
like how clouds spread out in the real world. MongoDB mimics physics because
physics is green technology. MongoDB is truly efficient with zero carbon
footprint unlike PostgreSQL which is like diesel exhaust clogging up your
network pipes when you try to shard upwards and outwards into the virtual
scalable cloud atmosphere. PostgreSQL chokes your environment and doesn't
support 10gen's new invention the MAPREDUCE which is the successor to outdated
SQL. If you use PostgreSQL with auto-scale you will never support big data
but MongoDB can scale up to even 100GB of big rich object data without
relations so the data truly represents your client's needs.

~~~
julien_c
This joke got old quite a long time ago.

(And no I promise it's not because I'm using MongoDB.)

------
rpedela
Cool, but why is it so expensive? $12 per GB per month? It is hosted on AWS,
so what does "high-performance" mean?

~~~
fizx
Presumably they don't also charge separately for requests, CPU cycles, or
bandwidth, so the storage costs include that.

~~~
mrkurt
Correct. Storage is just the easiest way to define a service that's sold as a
usage based utility. The price includes all the hardware resources, our
support staff, DBA tools, etc.

------
rubiquity
I'm trying to understand why this is being upvoted so much.

~~~
brlewis
I upvoted in the hope that I'll find out that "auto-scaling" means it
automatically scales to large numbers of reads and/or writes, which would
amount to one awesome service. Since pricing is based on storage, I'm inclined
to think it only auto-scales to high storage needs. I'm hoping to be wrong.
The part where they say the 9.4 release brought what they wanted to do within
technical reach gives me some hope that I'm wrong.

~~~
mrkurt
It automatically scales all resources based on data size. This means
increasing IOPs, RAM allotments, and CPU capacity on the fly as the data
grows. At 1TB of data, for instance, the DB would have access to 100GB of RAM,
about 60,000 random IOPs, and 12 full CPU cores.
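The figures in that answer read as a simple linear rule; a sketch of the arithmetic, assuming resources scale linearly with data size (the 1TB figures are from the comment above, but the linear interpolation between sizes is my assumption, not Compose's actual formula):

```python
def resources_for(data_gb):
    """Return (ram_gb, iops, cores) scaled linearly from data size in GB.

    Anchored to the stated 1 TB point: 100 GB RAM, ~60,000 IOPs, 12 cores.
    """
    ram_gb = data_gb / 10        # 1/10 RAM-to-storage ratio
    iops = data_gb * 60          # 60,000 IOPs at 1,000 GB
    cores = data_gb * 12 / 1000  # 12 cores at 1 TB
    return ram_gb, iops, cores

# A 100 GB deployment under this rule gets 10 GB RAM, 6,000 IOPs, 1.2 cores.
print(resources_for(100))
```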

~~~
splitrocket
Is dataset size the only autoscaling criterion? E.g., my dataset is relatively
small but has a very large transaction volume.

Additionally, do you have standard postgres modules installed? Specifically,
at least for my use case, PostGIS?

~~~
mrkurt
PostGIS is coming soon; the contrib extensions are all available to be turned
on for DBs.

Autoscaling is currently data size only. You can scale deployments up
manually, however, for DBs where our 1/10 ratio isn't quite right.

~~~
splitrocket
Thanks!

