
“Big data” features coming in PostgreSQL 9.5 - snaga
http://lwn.net/Articles/653411/
======
warbiscuit
I love PG, and don't plan to ever move away from it.

But I'm starting to wish they had in-place upgrades. Application I develop is
getting to the point of "too large to make a copy of the whole db whenever I
want to upgrade Postgres".

I can postpone things for a little while by replacing all our large objects
with a external file storage -- but the tables themselves are growing quickly.

That'd be my favorite "big data" feature.

~~~
tsuraan
You can run pg_upgrade with the --link option to have it hardlink most of the
tables into the new cluster's location. It'll still re-write some things, but
I believe that most of the actual table data remains unaltered.

~~~
jeltz
Yeah, unless you have huge system catalogs (where the table definitions are
stored) pg_upgrade should be almost instant if you sue hardlinks.

------
brianolson
"Big" is apparently still limited by the 32TB table size limit in PostegreSQL.
My last job was running into this. Sadly we didn't get rigorous metrics, but
the database 'felt' slower when it got over .5 TB. The process pumping data
into the database was several percent slower in items per second than when it
started. Still, PostgreSQL is great and I'm using it right now on the new
project and I hope that someday I am blessed with tables too big for it.

~~~
nabla9
If 32TB table size limit and degrading performance worries you, I suggest that
you delve into Postgress documentation and discover table partitioning.

(you can partition table into chunks that are together larger than 32TB)

------
moreoverlords
"Unfortunately, due to the technical challenges being more difficult than
expected by the team led by EnterpriseDB programmer Robert Haas, no parallel
query features have yet been committed."

In the meantime... self-plug: [http://parpsql.com](http://parpsql.com) (free,
open source)

~~~
ddorian43
So the parallel-query-features help to run 1 query in parallel, while your
program executes multiple queries in parallel.

~~~
moreoverlords
Hi,

Yes, but there's a little more to it.

1\. My understanding is that the parallel query stuff they want to add into
the core of postgres is about introducing parallel algorithms for scans,
sorts, etc. The advantage is that when it is working there will be an
opportunity to help everyone, a) regardless of how you use postgres (psql,
API, etc) and b) transparently from the user perspective.

2\. par_psql on the other hand is two things. Mainly it's a cute piece of
syntactic sugar for psql users that makes it trivial to run multiple queries
in parallel (as you observe) but also synchronises them automatically as they
end, which is important.

I've also provided some guides about how to use this feature to substantially
accelerate single queries without much work or refactoring. It's generally a
good tool for SQL workflows that might otherwise be managed by a combination
of BASH and SQL, or for situations where you have one epic-sized query that is
naturally easy to parallelise.

This talk I gave at FOSS4G Europe (a GIS conference) offers some hints about
identifying situations where queries or workflows are trivially decomposable
into parallel small queries, allowing a huge speedup:

[http://graemebell.net/foss4gcomo.pdf](http://graemebell.net/foss4gcomo.pdf)

Hope this is of interest and use to you.

------
chubot
So, reading between the lines, the SYSTEM TABLESAMPLE algorithm appears to be
biased toward small values (in bytes, or whatever the on-disk encoding of
Postgres values is).

If you choose random data pages (of fixed size), then it can fit more rows of
small size than of big size.

"In the query above, SYSTEM is the name of the chosen sampling algorithm. The
SYSTEM algorithm chooses a set of pseudo-random data pages, and then returns
all rows on those pages, and has the advantage in running in constant time
regardless of the size of the table. PostgreSQL 9.5 will also ship with the
BERNOULLI sampling method, which is more rigorously random, but will take
longer the larger the table is."

------
no1youknowz
Cannot wait for 9.5. Migrated from MySQL. So happy I did.

------
kodablah
I would love to go back to relational, but I can't get over the linear scaling
and HA ease of Cassandra and I'm willing to write my applications to work
around the data model limitations. Is there any open source SQL database (i.e.
"newsql") that can linearly scale with Cassandra's ease right now? I look
forward to CockroachDB but it isn't done yet.

~~~
ericfrenkiel
It's commercial, but MemSQL Community Edition scales horizontally. It's free
and there are no capacity or scaling limits.

~~~
ddorian43
Community edition doesn't have HA

------
Joyfield
I should really learn some PG and move away from MySQL. But at 40 years old i
have a problem learning how to sit properly. _sigh_

~~~
chucksmash
It's certainly possible to use Postgres productively without ever doing
anything advanced.

My use case can pretty well be summed up as "things an ORM can do for you out
of the box." As a developer living in the shallow end of the RDBMS pool,
switching a hobby project over is a matter of hours. If your usage is similar,
learning it well enough to get by could just be a matter of:

    
    
      apt-get install postgres

~~~
smacktoward
For longtime MySQL users (like myself, ahem), I've found the biggest initial
hurdle to be the different syntax the stock MySQL and Postgres clients use.
When dropping into a Postgres environment I have to constantly look up what
the Postgres commands are to do simple stuff like list all tables or describe
a table's structure. It's not that the Postgres commands are particularly
abstruse or anything, just that they're non-obvious if you're unfortunate
enough to have spent 99% of your life marinating in MySQL.

~~~
tajen
Postgres is often closer to the SQL standard. For example escaping column
names is specified by SQL-99 with double-quotes (") whereas MySQL by default
refuses this and requires backticks (`, same as Oracle).

Compared to MySQL, the presence of schemas is a big win for me (same as 'user'
in Oracle). Also, I can't move away from Postgres because it's the only DBMS
which supports transactional DDL, which the ability to roll back an "add
column" as part of a transaction.

[http://stackoverflow.com/questions/2901453/sql-standard-
to-e...](http://stackoverflow.com/questions/2901453/sql-standard-to-escape-
column-names)

~~~
McGlockenshire
Parent seems to be discussing the cli toolset, where many commands are
composed of a backslash and one or two letters, whereas the mysql cli expects
verbose commands, like SHOW TABLES and DESCRIBE tablename.

~~~
tajen
It's true that the lack of SHOW and DESCRIBE in Postgres is annoying. Makes it
difficult to get the same result from a SQL client.

------
Qantourisc
I just hope they retrain the option/default to strcoll for the final sort. For
internal parts it doesn't matter.

------
polskibus
Does anyone know about TABLESAMPLE equivalents in other RDBMS, for example in
MS SQL?

~~~
pfg
MS SQL has TABLESAMPLE support: [https://technet.microsoft.com/en-
us/library/ms189108(v=sql.1...](https://technet.microsoft.com/en-
us/library/ms189108\(v=sql.105\).aspx)

I don't think it has something like TABLESAMPLE BERNOULLI.

------
bmh100
It's nice to PostgreSQL adding all of these analytics and business
intelligence features. BI platforms have had these sorts of features for a
least a decade.

------
curiousjorge
just seems like postgresql is constantly coming up with new things that just
sort of works on top of a solid and reliable platform.

------
threeseed
These aren't "big data" features but rather EDW/BI features.

If PostgreSQL can't trivially scale horizontally like Cassandra or Riak or
integrate nicely with Spark/Hadoop like MongoDB then it isn't particularly
useful in typical big data roles.

~~~
Twisell
Well something not mentioned in this article is the that 9.5 should also
introduce inheritance from/to Foreign Tables which is one step closer to nice
and easy horizontal scaling.

Found this detailed example :
[http://michael.otacoo.com/postgresql-2/postgres-9-5-feature-...](http://michael.otacoo.com/postgresql-2/postgres-9-5-feature-
foreign-table-inheritance/)

Maybe thats not a No-SQL trendy way of doing things, but for I am more used to
PostgreSQL this look pretty neat. I think as the old and reliable project it
is it will come to BigData at its own pace and quietly... but its already
moving toward this and FDW was the first step.

