Part of the issue is that there is no such thing as "simple" replication unless one means "unreliable" replication. Simple replication is just shooting off the same SQL insert/update query on two servers. In the best case, that means a query that uses a time function is likely to store two different results. In the worst case, the query fails on one server and the data gets out of sync.
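To make that concrete, here's a minimal sketch (table and column names are invented) of statements that diverge if you simply replay them on two servers:

-- Replayed independently on two servers, now() is evaluated twice,
-- so the stored timestamps will almost certainly differ:
INSERT INTO audit_log (event, logged_at)
VALUES ('user_login', now());

-- Same problem with anything else non-deterministic:
UPDATE sessions SET token = md5(random()::text) WHERE id = 42;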
Really, it's not that PostgreSQL needs more replication or built-in replication. It needs better replication than any of the current solutions. pgpool, while great as a connection pooler, is a terrible replication solution. Slony-I is based on triggers and its communication costs grow quadratically with the number of nodes (O(n^2) - yuck!). Plus, Slony-I requires an immense amount of setup for every table and every key. Mammoth Replicator, which seems to be the closest to the right track, has a website that doesn't show much life and only a beta release for download. Plus, it still looks like there's a good amount of setup.
What we all really want is for PostgreSQL to implement a log-shipping based replication system where we can say "replicate this database" or "replicate these tables" to another server and have it just work. But there's nothing simple about that.
I completely agree there's nothing simple about that, but ever since we switched from Slony-I to Londiste at Justin.tv, I couldn't be happier. It's been completely bulletproof, recovers from errors easily, and makes schema upgrades a breeze.
Postgres already has an asynchronous log-shipping replication system similar to MySQL's, but the rub is that you can't read from the slave, so it's only good as a "warm standby" solution.
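For reference, that warm-standby setup is just WAL archiving plus a standby stuck in recovery. A rough sketch (the archive path here is a placeholder):

# primary's postgresql.conf: ship each completed WAL segment to the standby
archive_mode = on
archive_command = 'cp %p /mnt/standby/archive/%f'

# standby's recovery.conf: replay segments as they arrive
# (pg_standby ships in contrib)
restore_command = 'pg_standby /mnt/standby/archive %f %p'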
Work was under way to make it possible to read from the slave, and it was slated for 8.4. However, there were some issues and it got pulled from the release.
Hopefully we'll see it in 8.5. This is the only real way MySQL is beating Postgres at the moment.
It's a shame that PostgreSQL still lacks meaningful XML support. That's one critical area where the commercial relational database vendors are way ahead.
Why on earth would you store XML data in a database? XML is for when you have structured data but no database. Any structure you can store in XML is already catered for in conventional RDBMS design. And if you just want to store the string, there is nothing stopping you from doing that.
There are a lot of applications that benefit from native XML storage in a database. Here are some examples: http://www-01.ibm.com/software/data/db2/customers/ . Of course those customer statements are somewhat biased coming through IBM, but still fairly accurate and realistic.
Storing XML as a string is really not very useful since you can't query into the document contents. You also can't create indexes on particular elements or attributes for fast access.
Microsoft has a good explanation of why it isn't necessarily appropriate to store XML structures in a conventional RDBMS. I'll quote it here:
"The XML data model has characteristics that make it very hard if not practically impossible to map to the relational data model. XML data has a hierarchical structure that may be recursive; relational databases provide weak support for hierarchical data (modeled as foreign key relationships). Document order is an inherent property of XML instances and must be preserved in query results. This is in contrast with relational data, which is unordered; order must be enforced with additional ordering columns. Re-assembling the result during querying is costly for realistic XML schemas that decompose the XML data into a large number of tables."
Ok, seriously? You're asking an RDBMS to index into XML data you've shoved into a column?
Quoting Microsoft & IBM on database design is like quoting GM & Chrysler on vehicle design. All are years behind and focused on an under-informed and declining customer base.
2 things:
1. Postgres supports both regular expressions and indexing on expressions (see the sketch after this list). If you can't index your table on a regular expression, you are probably doing something wrong.
2. Reconstructing full XML documents from a single column based on a query is a lazy anti-pattern. Either a) store the document so that it can be retrieved on demand, or b) structure the data and the query in a way that performs well. Jamming a verbose textual representation of data into a single column of a table and then demanding performance is a fundamental misunderstanding and misuse of an RDBMS.
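On point 1, here's a sketch of what that looks like (table and column names invented):

-- Index the <title> text pulled out of a text column with a regex,
-- so lookups on it don't have to re-parse every row:
CREATE INDEX docs_title_idx
ON docs ((substring(body from '<title>([^<]+)</title>')));

-- This WHERE clause can now use the index:
SELECT id FROM docs
WHERE substring(body from '<title>([^<]+)</title>') = 'Annual Report';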
Apparently you don't understand how it actually works. At least in the case of DB2, the database engine doesn't store a "verbose textual representation" of XML documents. It breaks them down into a more efficient internal binary format that still preserves the structure.
I deal with complex XML documents that go a dozen or more levels deep. It would certainly be possible to represent that data in a relational structure. But then to query based on the value of a deeply nested element would require a large number of table joins. Keeping everything in a single XML column is much simpler and faster, since the specific inner element values can be indexed.
Hierarchical databases have been around for years, and actually predate relational databases. XML databases are just a specialization of hierarchical databases. Some applications are a better fit for the relational model and some are a better fit for the hierarchical model. With hybrid databases you get the best of both worlds and can choose the right tool for the job.
"Apparently you don't understand how it actually works."
Man, nope. No disrespect, but if you are regularly querying 12 levels deep into an XML file you have stored in your database, then IMO you are almost certainly doing something horribly, horribly wrong.
Because some database designs can be vastly simplified if they don't need to contain tables to map to every possible type of XML document your system handles. You can extract some of the data into tables and store the whole document in a column to refer to later.
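A sketch of that hybrid layout (all names here are invented):

-- Pull out the fields you actually query on; keep the original document whole.
CREATE TABLE invoices (
    id         serial PRIMARY KEY,
    customer   text NOT NULL,   -- extracted from the document
    issued_on  date NOT NULL,   -- extracted from the document
    doc        xml  NOT NULL    -- the full original, kept for later reference
);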
MS SQL Server, since version 2005, has had the ability to do XPath queries on columns containing XML data, in an optimised manner. That's very handy in the average enterprise system, which contains mounds of XML data in different schemas - you'd have an explosion of tables if you tried to store all that using conventional RDBMS design.
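Roughly, a query against an xml column there looks like this (table and schema invented) - the xml type's value() method takes an XQuery expression plus a SQL type, and XML indexes exist specifically to make such lookups fast:

-- T-SQL, assuming a table orders(id int, doc xml):
SELECT id
FROM orders
WHERE doc.value('(/order/@status)[1]', 'varchar(20)') = 'shipped';

-- the index that speeds these lookups up:
CREATE PRIMARY XML INDEX ix_orders_doc ON orders (doc);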
PostgreSQL 8.3 (maybe earlier versions too, I haven't checked) has an xml data type. I haven't used it personally, but if I read the documentation correctly, you can do something like this:
SELECT id,
       xpath('//node[@type=''title'']', doc) AS title,
       (xpath('/*/@version', doc))[1]::text AS version
FROM documents
WHERE (xpath('/*/@version', doc))[1]::text = '2';
(assuming your "documents" table has a "doc" field of type "xml"; note that xpath() returns an array of xml values, hence the [1] and the cast to text, and the WHERE clause has to repeat the expression because it can't reference the SELECT alias)
The trouble is that PostgreSQL still doesn't support indexing into XML fields based on XPath statements. So if you need to look for some particular XML element value in the WHERE clause then it's going to be really slow.
According to the documentation, you can't index on XML directly because there are no XML comparison operators (I suspect this is to avoid hair-pulling debates over whether `<foo x='y'>` should compare equal to `<foo x="y"></foo>`). But you can serialize the XML, or a fragment of XML retrieved by XPath, cast it to text, and build an index on the result of that serialization. (PostgreSQL has supported indexing on expressions for a long time.)
I think the place you lose is if you have some arbitrary XPath expression for which you haven't constructed a DB index yet, and you want to search on that expression.
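Concretely, an (untested) sketch against the "documents" table from upthread:

-- Index the serialized result of one specific XPath expression:
CREATE INDEX documents_title_idx
ON documents (((xpath('//node[@type=''title'']/text()', doc))[1]::text));

-- Queries that repeat that exact expression can then use the index:
SELECT id FROM documents
WHERE (xpath('//node[@type=''title'']/text()', doc))[1]::text = 'Home';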
For the same reason you would store JSON in there: it's for when you want a schemaless structure. I chose XML because the platform the app uses prefers it.
I am salivating over the new WITH RECURSIVE ... SELECT ... syntax (http://www.postgresql.org/docs/8.4/static/sql-select.html). There's a very hairy piece of C++ code in my group's codebase that grinds over a certain schema to perform a certain transitive closure, and if I can redo that operation in SQL, we might be able to change its status from "only godlike programmers may dare touch this code" to "only demigodlike programmers may dare touch this code".
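For the curious, the general shape of a transitive closure with the new syntax, over an invented edges(parent, child) table:

-- All descendants of node 1; UNION (not UNION ALL) dedupes,
-- which also keeps cycles from recursing forever:
WITH RECURSIVE reachable(node) AS (
    SELECT child FROM edges WHERE parent = 1
  UNION
    SELECT e.child
    FROM edges e
    JOIN reachable r ON e.parent = r.node
)
SELECT node FROM reachable;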
You should use pg_dumpall rather than pg_dump, typically. Also, remember to use the pg_dump implementation from the new version of PostgreSQL (e.g. 8.4), not the old one (e.g. 8.3).
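Roughly, with ports and binary paths as placeholders for your own layout:

# dump the whole 8.3 cluster using the 8.4 pg_dumpall binary
/usr/lib/postgresql/8.4/bin/pg_dumpall -p 5432 > cluster.sql
# restore into the freshly initdb'd 8.4 cluster
psql -p 5433 -f cluster.sql postgres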
I was hoping to see simple built-in replication in 8.4 (didn't make it into 8.3, see here: http://it.toolbox.com/blogs/database-soup/postgresql-develop...), so I guess it's on the 8.5 wishlist.