
PostgreSQL: What happened to Hot Standby? - jawngee
http://it.toolbox.com/blogs/database-soup/what-happened-to-hot-standby-30391?rss=1
======
gfodor
As frustrating as it is, PostgreSQL users have always benefited from the lack
of defects in the software due to hard choices made like this.

------
aditya
Hmm. PgSQL is rock solid as an RDBMS, but the lack of dependable replication
options is quite annoying.

I'm curious to know if anyone here has actually used {londiste, pgpool2,
slony} to do master/slave replication and made it work well?

~~~
mdasen
It is a great database, but replication is lacking. Slony-I is the best of the
bunch currently, but it works by using triggers, requires you to set it up for
every table and key, scales quadratically, and just generally has a lot of
overhead (when compared to MySQL's replication). Mammoth Replicator is the
most interesting project to me since it operates a lot more like MySQL's
replication in that it works off log shipping. Currently they only have a beta
for download off their site
(<https://projects.commandprompt.com/public/replicator>), but I'm hopeful for
it.
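
Here is a toy model of the difference described above, in plain Python. It is not how Slony-I or Mammoth Replicator are actually implemented; `ToyPrimary`, `add_trigger` and `replay` are hypothetical names. It only shows why trigger-based capture needs per-table setup while shipping a single write log does not:

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrimary:
    """Hypothetical in-memory primary; not how Slony-I or Mammoth work internally."""
    tables: dict = field(default_factory=dict)       # table name -> list of rows
    triggered: set = field(default_factory=set)      # tables an admin installed a capture trigger on
    trigger_log: list = field(default_factory=list)  # changes captured by per-table triggers
    shipped_log: list = field(default_factory=list)  # every write, as a log-shipping system sees it

    def add_trigger(self, table: str) -> None:
        # Trigger-based replication: each table must be registered by hand.
        self.triggered.add(table)

    def insert(self, table: str, row: dict) -> None:
        self.tables.setdefault(table, []).append(row)
        # Log shipping: the write lands in one shared log regardless of table.
        self.shipped_log.append((table, row))
        # Trigger-based: only tables with a trigger produce a change record.
        if table in self.triggered:
            self.trigger_log.append((table, row))


def replay(log: list) -> dict:
    """Apply a change log to an empty replica and return the replica's tables."""
    replica: dict = {}
    for table, row in log:
        replica.setdefault(table, []).append(row)
    return replica


primary = ToyPrimary()
primary.add_trigger("orders")              # someone remembered this table...
primary.insert("orders", {"id": 1})
primary.insert("customers", {"id": 7})     # ...but forgot this one

print(sorted(replay(primary.trigger_log)))  # ['orders']
print(sorted(replay(primary.shipped_log)))  # ['customers', 'orders']
```

Forgetting a single `add_trigger` call silently leaves a table off the replica, which is part of why trigger-based setups are tedious. A shipped log picks up every table by construction.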

Replication is a really hard problem to solve, given all the little
inconsistencies that can creep in, but it's one of the areas holding
PostgreSQL back the most. Hot failover and scalable read-only replicas are a big
selling point for MySQL. I really hope that PostgreSQL nails this area in 2010,
because its query planner is much better than MySQL's at handling complex
queries (really, even something as simple as a sub-query).

But you're right, it is annoying that PostgreSQL doesn't have a nicer
replication option.

EDIT: If performance isn't your biggest concern and you don't mind a highly
tedious setup, Slony-I is reliable. However, it's not suitable for larger
clusters of machines since communication costs grow quadratically.

~~~
neilc
_it's not suitable for larger clusters of machines since communication costs
grow quadratically._

Quadratically? Why?

~~~
mdasen
[http://www.pgadmin.org/docs/1.4/slony/slonylistenercosts.htm...](http://www.pgadmin.org/docs/1.4/slony/slonylistenercosts.html)
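
Here is a back-of-the-envelope sketch of why the cost is quadratic. It rests on the simplifying assumption that every node keeps a listen path to every other node, which is an approximation and not a summary of the linked page:

```python
def listen_paths(nodes: int) -> int:
    """Directed listen paths, assuming every node listens to every other node."""
    return nodes * (nodes - 1)


for n in (2, 4, 8, 16):
    print(n, listen_paths(n))
# 2 2
# 4 12
# 8 56
# 16 240
```

Doubling the cluster from 8 to 16 nodes roughly quadruples the paths, from 56 to 240.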

------
patrickg-zill
For all the noise about replication, the reality is that buying a server with
a) lots of RAM, b) dual power supplies, each going to a separate UPS or power
feed, and c) dual or quad CPUs will get you 95% of the way there.

If you really care, get an external storage solution and buy two identical
servers; then you can switch the storage over to the working machine if the
first one fails.
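
The standard formula for redundant components makes the reasoning behind separate power feeds concrete. This is a minimal sketch that assumes failures are independent. The 99% figure is purely illustrative and was not measured:

```python
def redundant_availability(single: float, copies: int = 2) -> float:
    """Chance that at least one of `copies` components is up, assuming independent failures."""
    return 1 - (1 - single) ** copies


# Illustrative figure only: a power feed that is up 99% of the time.
print(round(redundant_availability(0.99, copies=1), 6))  # 0.99
print(round(redundant_availability(0.99, copies=2), 6))  # 0.9999
```

The independence assumption is the whole point of wiring each supply to a separate UPS or feed. Two supplies on the same feed fail together, and then the formula no longer applies.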

So many people I know have sites that do not require fancy-pants replication:
if they just shelled out the $20K for a big 128GB RAM system and learned how
to do decent backups, they would save on hosting charges and have a site that
is just as reliable.

~~~
tobi
Maybe for some use cases, but most people on this site are concerned about
scaling web sites, where scaling up (which is what you suggest) isn't feasible
because of the obvious ceiling on that approach and the fact that you can't even
add your own hardware to cloud/virtual hosting, which is quickly becoming the
norm.

In general, every problem that can be solved in software should be solved in
software.

Postgres can be the second coming of Jesus, but it's still utterly infeasible
for high-traffic web applications because of the replication issue. Even the
current hacks that add replication will not work, because they cannot deal with
schema changes without downtime. At Shopify we add an average of 12 columns
and 2 tables to the database every month, and we had a grand total of 58
minutes of downtime in 2008.
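
For scale, 58 minutes of downtime over all of 2008 (a leap year) works out to about 99.989% uptime. A quick check:

```python
import calendar


def uptime_percent(downtime_minutes: float, year: int) -> float:
    """Share of the calendar year a service was up, as a percentage."""
    days = 366 if calendar.isleap(year) else 365  # 2008 was a leap year
    total_minutes = days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


print(f"{uptime_percent(58, 2008):.3f}%")  # 99.989%
```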

~~~
moe
_but it's still utterly infeasible for high traffic web applications because
of the replication issue_

Sorry, but what nonsense.

High traffic sites scale by caching, sharding and non-relational databases, in
that order.
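
Of those three, sharding has the most mechanism worth sketching. Below is a minimal hash-based router that assumes a fixed number of shards. `shard_for` and the in-memory `shards` list are hypothetical stand-ins for separate database servers:

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key; crc32 is stable across processes, unlike hash()."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


# Four hypothetical shard databases, modeled as dictionaries.
shards = [{} for _ in range(4)]
for user_id in ("alice", "bob", "carol"):
    shards[shard_for(user_id, len(shards))][user_id] = {"name": user_id}
```

Plain modulo hashing moves most keys when the shard count changes. That is why real deployments often use consistent hashing or a lookup table that maps keys to shards.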

Replication can, at best, be an intermediate kludge for read-heavy websites
that haven't learned about caching ( _wink_ ) or insist on abusing their RDBMS
as a fulltext search engine.

It was also commonly used as an incremental backup solution until filesystem
snapshots became commonplace.

~~~
aditya
I think having multiple read slaves (and even write masters) is a perfectly
acceptable way of distributing load, and in turn, scaling.

To completely ignore that sounds foolish.

EDIT: xzilla is right, I wasn't really recommending blindly adding write
masters since that wouldn't work, but just that read slaves are not a bad
idea. :)
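
Here is a minimal sketch of what distributing reads across slaves looks like at the application layer. `ReadWriteRouter` and the host names are hypothetical. Classifying statements by their first keyword is a deliberate simplification:

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the master, reads rotate across read slaves."""

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        # Fall back to the master when no replicas are configured.
        self._reads = itertools.cycle(replicas or [master])

    def route(self, sql: str) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()
        return next(self._reads) if verb == "SELECT" else self.master


router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM orders"))           # db-replica-1
print(router.route("select count(*) from orders"))    # db-replica-2
print(router.route("UPDATE orders SET paid = true"))  # db-master
```

Slaves lag behind the master, so a read issued right after a write can return stale data. Routers commonly pin a session's reads to the master for a short while after it writes, which this sketch does not do.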

~~~
moe
Well, I have never seen a read-slave setup in a webapp for the purpose of
scalability that would've made a lot of sense.

It's sometimes a stopgap measure when money is cheaper than time, but it will
come back to haunt you later.

When your reads are starving in a webapp, you should look into fragment
caching and content generation, but not much further. If that doesn't solve
your problem, then you're very likely doing something fundamentally wrong
and had better go ask someone smarter.
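
Here is a minimal sketch of fragment caching, in which an in-process dictionary stands in for something like memcached. `FragmentCache`, `fetch` and `render_sidebar` are hypothetical names. A cached fragment is served until its TTL expires, so the queries behind it run once per TTL instead of once per request:

```python
import time
from typing import Callable


class FragmentCache:
    """Hypothetical in-process fragment cache with a per-entry TTL (stand-in for memcached)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (expires_at, html)

    def fetch(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh: skip the database entirely
        html = render()  # miss or stale: run the expensive queries once
        self._entries[key] = (now + self.ttl, html)
        return html


calls = 0


def render_sidebar() -> str:
    global calls
    calls += 1  # stands in for the database reads behind this fragment
    return "<ul><li>top posts</li></ul>"


cache = FragmentCache(ttl_seconds=60)
for _ in range(1000):
    cache.fetch("sidebar", render_sidebar)
print(calls)  # 1
```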

------
chiffonade
> As expected, the NTT Open Source team working on Synch Standby ran into some
> unexpected issues

Is that sort of like a known unknown? Or an unknown known?

~~~
patio11
I think people who mock those phrases mercilessly probably never heard them
used in context. They're government contractor-speak that is probably popular
around the DoD, but they're not meaningless. Here, let me try.

Known known: There is a bug open in the tracker which reports that there is a
race condition under certain circumstances. You have someone assigned to look
into it.

Unknown known: One of the programmers noticed a race condition under certain
circumstances, but he did not put it in the tracker, and as a result the
information is compartmentalized and the issue is _not_ being dealt with.

Known unknown: In preparation for your release you've got a load balancer,
four application servers, and memcached set up, but you're not sure whether it
will be able to handle a Slashdot on release day. You resolve to test this,
when the schedule permits.

Unknown unknown: Your password recovery form permits a SQL injection attack
and nobody in your organization has a clue.

The manager's takeaway: it is far better to have known unknowns than
unknown unknowns, because you can take steps to mitigate the risk. Unknown
knowns are almost useless to you as a manager, because they prevent you from
mitigating the risk even when a solution is known at some level of the
organization.

~~~
chiffonade
I think anyone who can figure out what an adjective put in front of a noun
means understands what these phrases mean, the point is that they're just so
fucking ridiculous as to be laughable.

------
discojesus
"Hot Standby"?

Sounds like a porno movie set in an airport terminal.

