But I'm starting to wish they had in-place upgrades. The application I develop is getting to the point of "too large to make a copy of the whole db whenever I want to upgrade Postgres".
I can postpone things for a little while by replacing all our large objects with external file storage -- but the tables themselves are growing quickly.
That'd be my favorite "big data" feature.
(you can partition a table into chunks that together are larger than 32TB)
Depending on the nature of your project, you might consider a column store engine for postgres, such as
In the meantime...
self-plug: http://parpsql.com (free, open source)
Yes, but there's a little more to it.
1. My understanding is that the parallel query work they want to add to the core of postgres is about introducing parallel algorithms for scans, sorts, etc. The advantage is that once it lands it will benefit everyone, a) regardless of how you use postgres (psql, API, etc) and b) transparently from the user's perspective.
2. par_psql, on the other hand, is two things. Mainly it's a cute piece of syntactic sugar for psql users that makes it trivial to run multiple queries in parallel (as you observe), but it also synchronises them automatically as they finish, which is important.
I've also provided some guides on how to use this feature to substantially accelerate single queries without much work or refactoring. It's generally a good tool for SQL workflows that might otherwise be managed by a combination of bash and SQL, or for situations where you have one epic-sized query that is naturally easy to parallelise (see the sketch below).
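To make the decomposition idea concrete, here's a minimal sketch. All the names (big_table, id, category, summary_parts) are hypothetical placeholders, and summary_parts is assumed to already exist with (category, cnt) columns; the point is just that each chunk query is independent, so each can be run in its own psql session at the same time and the partial results merged afterwards:

    -- One big aggregate over the whole table runs on a single core:
    --   SELECT category, count(*) FROM big_table GROUP BY category;
    -- The same work split by id range; each statement is independent,
    -- so each can run in a separate psql session in parallel:
    INSERT INTO summary_parts
        SELECT category, count(*) AS cnt FROM big_table
        WHERE id BETWEEN 1 AND 1000000 GROUP BY category;
    INSERT INTO summary_parts
        SELECT category, count(*) AS cnt FROM big_table
        WHERE id BETWEEN 1000001 AND 2000000 GROUP BY category;
    -- ...one statement per chunk...
    -- Merge the partial counts once every chunk has finished:
    SELECT category, sum(cnt) FROM summary_parts GROUP BY category;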
This talk I gave at FOSS4G Europe (a GIS conference) offers some hints about identifying situations where queries or workflows are trivially decomposable into parallel small queries, allowing a huge speedup:
Hope this is of interest and use to you.
If you choose random data pages (which are fixed size), a page holds more small rows than big rows, so the number of rows you get back depends on row size and the rows come back clustered by page rather than being picked individually at random.
"In the query above, SYSTEM is the name of the chosen sampling algorithm. The SYSTEM algorithm chooses a set of pseudo-random data pages, and then returns all rows on those pages, and has the advantage in running in constant time regardless of the size of the table. PostgreSQL 9.5 will also ship with the BERNOULLI sampling method, which is more rigorously random, but will take longer the larger the table is."
pg_shard extends PostgreSQL without forking it, and it's worth a look: https://github.com/citusdata/pg_shard
pg_shard v2.0 will target linear scaling and higher HA through metadata replication. If you have any questions or feedback for the project, we'd be happy to hear from you.
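If I remember the pg_shard README correctly, getting started looks roughly like the following; the table name, partition column, shard count and replication factor are just example values:

    CREATE EXTENSION pg_shard;
    -- Distribute an existing table across the configured workers by customer_id
    SELECT master_create_distributed_table('customer_reviews', 'customer_id');
    -- Create 16 shards, each stored on 2 worker nodes
    SELECT master_create_worker_shards('customer_reviews', 16, 2);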
If you're on AWS, you can already get linear scale and full HA. This post shows a proof-of-concept: https://goo.gl/3c2GYc
I'm currently experimenting with Phoenix, because it's a relatively thin layer above HBase, but it doesn't provide many of the features you'd expect from a relational database, like transactions. SELECT support is quite complete: it can do JOINs, subqueries, etc.
I have recently discovered Trafodion, which looks like a much more complete relational database, but at the expense of no longer being just an HBase client.
My use case can pretty well be summed up as "things an ORM can do for you out of the box." As a developer living in the shallow end of the RDBMS pool, switching a hobby project over is a matter of hours. If your usage is similar, learning it well enough to get by could just be a matter of:
apt-get install postgresql
Compared to MySQL, the presence of schemas is a big win for me (same as 'user' in Oracle). Also, I can't move away from Postgres because it's the only DBMS which supports transactional DDL, i.e. the ability to roll back an "add column" as part of a transaction.
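As a concrete illustration of what that buys you (table and column names are just placeholders):

    BEGIN;
    ALTER TABLE accounts ADD COLUMN referral_code text;
    -- ...run the rest of the migration, notice something is wrong...
    ROLLBACK;  -- the new column is gone, as if nothing had happened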
To really go out on a limb, I would posit that MySQL's view of RDBMS functionality influences how it gets used, and I bet that doesn't square with more traditional systems. Any time there's a shift in world view like that, the code becomes hard to rewrite while still behaving the same way.
I don't think it has something like TABLESAMPLE BERNOULLI.
If PostgreSQL can't trivially scale horizontally like Cassandra or Riak, or integrate nicely with Spark/Hadoop like MongoDB, then it isn't particularly useful in typical big data roles.
Found this detailed example: http://michael.otacoo.com/postgresql-2/postgres-9-5-feature-...
Maybe that's not the trendy NoSQL way of doing things, but as someone more used to PostgreSQL this looks pretty neat to me. I think that, being the old and reliable project it is, it will come to big data at its own pace and quietly... but it's already moving in that direction, and FDW was the first step.
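For anyone who hasn't used foreign data wrappers, the basic shape is below, using postgres_fdw as the example (there are FDWs for Hadoop, files, other databases, etc.). The server name, host, credentials and the foreign table definition are all made-up example values:

    CREATE EXTENSION postgres_fdw;
    CREATE SERVER analytics_srv FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'analytics.example.com', dbname 'warehouse');
    CREATE USER MAPPING FOR CURRENT_USER SERVER analytics_srv
        OPTIONS (user 'reader', password 'secret');
    CREATE FOREIGN TABLE events (
        id       bigint,
        happened timestamptz,
        payload  jsonb
    ) SERVER analytics_srv OPTIONS (schema_name 'public', table_name 'events');
    -- The remote table can now be queried (and joined) like a local one:
    SELECT count(*) FROM events WHERE happened > now() - interval '1 day';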