
PostgreSQL Columnar Store for Analytic Workloads - tadruj
http://www.citusdata.com/blog/76-postgresql-columnar-store-for-analytics
======
chrisfarms
I've never really dug into column-oriented storage, so had a quick skim...
Would the below excerpts/example be a fair note of pros/cons of the general
idea?

> Column-oriented organizations are more efficient when an aggregate needs to
> be computed over many rows but only for a notably smaller subset of all
> columns of data, because reading that smaller subset of data can be faster
> than reading all data.

Example: SELECT sum(a) FROM things;

> Column-oriented organizations are more efficient when new values of a column
> are supplied for all rows at once, because that column data can be written
> efficiently and replace old column data without touching any other columns
> for the rows.

Example: UPDATE things SET a = a+1;

> Row-oriented organizations are more efficient when many columns of a single
> row are required at the same time, and when row-size is relatively small, as
> the entire row can be retrieved with a single disk seek.

Example: SELECT * FROM things;

> Row-oriented organizations are more efficient when writing a new row if all
> of the row data is supplied at the same time, as the entire row can be
> written with a single disk seek.

Example: INSERT INTO things (a,b,c,d,e,f,g) VALUES (1,2,3,4,5,6,7);
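The four cases above come down to how the bytes are laid out. A minimal in-memory sketch (illustrative Python, not how any real engine stores data; the table and column names are made up) of why a single-column aggregate touches less data in a columnar layout:

```python
# Hypothetical sketch: the same "things" table stored row-wise and
# column-wise, to show why SELECT sum(a) reads less in the columnar layout.
rows = [(1, "x", 10.0), (2, "y", 20.0), (3, "z", 30.0)]  # row-oriented

columns = {  # column-oriented: one contiguous array per column
    "a":     [1, 2, 3],
    "label": ["x", "y", "z"],
    "value": [10.0, 20.0, 30.0],
}

# SELECT sum(a) FROM things;
# Row store: every tuple is visited, dragging "label" and "value" along.
row_sum = sum(r[0] for r in rows)

# Column store: only the "a" array is scanned; other columns stay on disk.
col_sum = sum(columns["a"])

assert row_sum == col_sum == 6
```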

~~~
natebrennand
That's pretty accurate. Column-stores also make data compression significantly
more effective because they are storing many values of the same type together
[1].

The improvements for your first two points are typically 1-2 orders of magnitude with column stores [2].

[1] [http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf](http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf)

[2] [http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf](http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf)

------
rusanu
Good work, and I'm impressed that the PostgreSQL foreign data wrapper API is powerful enough to allow such ease of implementation.

The .proto defined for cstore_fdw differs from the ORC format as defined for Hive. At a quick glance I can't find references to dictionary encoding or statistics, and the datatypes used are apparently the native PostgreSQL data types. From what I can tell this implementation leverages reduced IO (fetch only columns of interest from disk), segment elimination (use min/max info to skip over entire row groups), and pglz compression for the data stream. I couldn't find references to run-length encoding (RLE) or dictionary encoding. I'm sure these shortcomings will be addressed in future iterations, especially better encoding schemes, which would result in better compression.
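The segment-elimination idea mentioned above can be sketched in a few lines (a toy illustration in Python, not the cstore_fdw implementation; the group sizes and predicate are made up): each row group carries min/max metadata for a column, and any group whose range cannot satisfy the predicate is skipped without reading its data.

```python
# Toy sketch of segment elimination via per-row-group min/max metadata.
row_groups = [
    {"min": 1,   "max": 100, "values": list(range(1, 101))},
    {"min": 101, "max": 200, "values": list(range(101, 201))},
    {"min": 201, "max": 300, "values": list(range(201, 301))},
]

def scan_greater_than(groups, threshold):
    """Return matching values, counting how many groups were actually read."""
    groups_read = 0
    matches = []
    for g in groups:
        if g["max"] <= threshold:   # whole group eliminated by metadata alone
            continue
        groups_read += 1            # only now do we "read" the group's data
        matches.extend(v for v in g["values"] if v > threshold)
    return matches, groups_read

matches, groups_read = scan_greater_than(row_groups, 250)
assert groups_read == 1             # first two groups skipped via min/max
assert matches == list(range(251, 301))
```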

But I'm a bit disappointed that the ORC format used is not the same as the one originally used in Hive, Pig and the rest of the Java/Hadoop ecosystem. Had it shared the actual file format, it would have enabled a number of very interesting scenarios. Think about Hive/Pig serving as ETL pipelines to produce ORC files that are attached directly to PostgreSQL (add today's partition as FOREIGN DATA WRAPPER ... OPTIONS (filename '.../4_4_2014.cstore')) and then do interactive analytics driven by PostgreSQL. It would reduce the cost of ingress significantly (fast attach, no need for expensive INSERT and re-compression). I realize data type translation would have been thorny, to say the least (datetime and decimal as primitives, probably many of the structured types too).

- github.com/citusdata/cstore_fdw/blob/master/cstore.proto
- github.com/apache/hive/blob/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto

------
tkyjonathan
I think the emphasis should be on the fact that Postgres now has a free and
open source columnar store. For many years (probably decades) there have been
companies that have developed "big data"/analytics systems with Postgres as the
base, but have not contributed back to the Postgres ecosystem.

While this new columnar store is not as speedy as ones that have been around
since the early 2000s, it does give a platform for CitusData and other
companies to build on and share solutions.

Having a DB that can hold both the transactional data and data for fast
analytical purposes is very advantageous, as you have fewer moving parts and
much less ETL work.

What I am looking forward to now is a few startups similar to CitusData that
solve different "Big Data" problems, and for them to work together to disrupt
the multi-billion dollar data warehouse/analytics vendors.

------
lucian1900
This is potentially very interesting, it's the only open source SQL columnar
store I'm aware of. Bonus points for being part of Postgres.

Basically ParAccel (as used by Amazon Redshift) at reasonable cost.

~~~
rhelmer
Don't you still need to purchase citus DB to use this?

~~~
ozgune
Nope, you don't need to purchase CitusDB.

We open sourced cstore_fdw with the Apache license. You can just build and
load the extension, and start using it with your PostgreSQL 9.3+ database.

If you later need to scale out your PostgreSQL cluster, just please let us
know. :)

~~~
spaznode
I know some statisticians / analyst-minded people who will be extremely happy
about your generous contribution. Myself as well, of course.

Thank you very very much in advance. This is a really huge deal. Even for
general things like academic research. We shall see.

------
monstrado
Any particular reason why ORC was chosen as the columnar store format over
Parquet ([https://github.com/Parquet/parquet-format](https://github.com/Parquet/parquet-format))?
Reason I ask is because Parquet seems to have its own development cycle,
roadmap, and is pretty continuously updated with enhancements.

------
mason55
I'd be interested to see more benchmarks. The improvements in this post are
not anywhere close to what we've seen going from PG to Infobright for our
reporting queries - we get speedups from 10x - 1000x, the one speed benchmark
they have here is only 2x.

~~~
alecco
Because it still uses the PostgreSQL engine, which is very slow for this.

There is a great talk by Stonebraker about this. (He created PostgreSQL about
3 decades ago and moved on to newer database engines).

~~~
Alex3917
Is this talk online?

~~~
garenp
Probably referring to this:

[http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-the-traditional-rdbms-wisdom-is-all-wrong/](http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-the-traditional-rdbms-wisdom-is-all-wrong/)

------
alecco
How is this different from the many other columnar SQL databases and
extensions?

Columnar querying is typical for OLAP, while the PostgreSQL engine is aimed at
OLTP. This doesn't look like a good idea. Like adding side floats to a car and
paddles to use it like a boat.

This goes against using the right tool for the right job.

~~~
alecco
Downvotes? Lovely.

Also, I bet VoltDB, a modern open source OLTP database, can beat this thing
hands down. Also in-memory and clusters. Complex stored procedures precompiled
and many other goodies.

Commercial column stores like Vertica should be orders of magnitude faster.

~~~
nemothekid
Don't know why you are being downvoted, but you should understand it's about
tradeoffs.

If I want to increase my database performance I can either:

1.) Build this plugin and integrate it into my already working ecosystem

2.) Spend time researching, testing, and deploying VoltDB.

Given the popularity of Postgres, and the relatively low friction of option
(1.), it's clear why this could be an adequate solution. Sure, you won't be as
fast as VoltDB, but as outside engineers we don't know the potential
customer's requirements, or whether being as fast as VoltDB actually matters.

------
noelherrick
This is really exciting! Columnar storage is something that the big boys like
Microsoft and Oracle charge an arm and a leg for. You can currently get
column-based SQL open-source databases, but this new FDW allows you to mix
workloads on the same machine.

------
mimighost
I am curious about how this thing compares to something like Amazon Redshift.

Briefly skimming, it looks pretty similar, except for the data compression
part, which uses RCFile. It also supports more data types. If this gets
adopted by Redshift or something else, I will be thrilled.

------
capkutay
So with Cassandra you have a pretty nice, scalable, columnar DB with a SQL
interface[0]. Not to mention, it's free and apache licensed so you can
distribute it as part of your own software. I guess I've only looked at
cassandra from the scope of a developer. Would a DBA prefer using a columnar
version of PostgreSQL rather than using cassandra for free?

0:[http://cassandra.apache.org/doc/cql/CQL.html](http://cassandra.apache.org/doc/cql/CQL.html)

Edit: I didn't realize Citus Data was making the columnar postgres offering
open source...that's great!

~~~
arielweisberg
Cassandra is a Bigtable style column store not an analytic column store.

It's an unfortunate naming collision.

~~~
capkutay
I think Cassandra is pretty good for analytical workloads... for instance,
"Give me all the customers from Pittsburgh" is a faster scan when the city
column is laid out contiguously on disk.

~~~
arielweisberg
Cassandra doesn't store columns from different rows together, to my knowledge.
That means it is in no way like an analytic column store, where columns from
all rows are stored together. This is not a bad thing; it just wasn't intended
to be an analytic column store and makes design choices suited for other things.

See
[http://wiki.apache.org/cassandra/ArchitectureSSTable](http://wiki.apache.org/cassandra/ArchitectureSSTable)

Analytic column stores can exploit storing columns separately by using various
encoding schemes to compress data that are far superior to generalized
compression and they can use the sort to zero on relevant ranges for each
column.
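The encoding-scheme point above can be sketched with a toy run-length codec (illustrative Python, not any real column store's format; the city values are made up): contiguous runs of identical column values collapse into (value, count) pairs, which is one reason columnar layouts compress so much better than row-interleaved data.

```python
# Minimal run-length encoding (RLE) sketch for a sorted column.
def rle_encode(values):
    """Collapse consecutive duplicates into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]

city_column = ["pittsburgh"] * 1000 + ["boston"] * 500
runs = rle_encode(city_column)
assert runs == [["pittsburgh", 1000], ["boston", 500]]  # 1500 values -> 2 runs
assert rle_decode(runs) == city_column
```

In a row store the city values would be interleaved with every other column, so no such runs exist to exploit.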

~~~
capkutay
Ah, ok. I understand why a columnar database would be better for analytical
workloads, but I didn't realize that HBase and Cassandra aren't actually laid
out in a columnar fashion on disk. I was confused by the 'column store'
terminology. This makes Citus Data's release even more compelling.

Thanks for clearing that up!

------
hfmuehleisen
I have written a short blog post on how CitusDB's column store compares with
purpose-built systems such as MonetDB. Repeatability ftw. Disclosure: I am
part of the MonetDB development team.

CitusDB vs. MonetDB TPC-H Shootout: [http://www.monetdb.org/content/citusdb-postgresql-column-store-vs-monetdb-tpc-h-shootout](http://www.monetdb.org/content/citusdb-postgresql-column-store-vs-monetdb-tpc-h-shootout)

------
rpedela
I am curious, why are foreign tables necessary?

~~~
ozgune
(Ozgun from Citus Data here.)

Foreign tables provide a nice abstraction to separate the storage layer from
the rest of the database. They are similar to MySQL storage engines in a
sense. With them, users don't need to make any changes to their existing
PostgreSQL databases, and instead just Create Extension and get rolling.

You can also use both foreign and regular tables in the same query:
[http://citusdata.github.io/cstore_fdw/#toc_4](http://citusdata.github.io/cstore_fdw/#toc_4)

~~~
saosebastiao
My question is what do you lose by doing so? FDW has always felt like a little
bit of a hack, but I guess if the optimizer and execution engines can use it
as if they were native tables, then there isn't much lost. But the fact that
ProtoBuf is used makes me think that there is some overhead that doesn't occur
in native tables.

~~~
Jweb_Guru
I could be a bit off here, but from what I remember about working with FDWs I
think the biggest drawback is that you can't assume native access to system
tables on a remote server, which means you can't do things like CREATE INDEX
on a foreign table and aren't going to have, e.g., statistics about those
tables. However, it does have pretty much all the information the optimizer
does about the actual query, and if you're implementing your own storage
engine like Citus rather than trying to hook into someone else's you can
probably get around those problems.

~~~
jasonmp85
You're correct that foreign tables do not allow explicit index creation
(though cstore_fdw does use skip indexes internally), but ANALYZE can ask them
for statistics: [http://www.postgresql.org/docs/current/static/sql-analyze.html#AEN68136](http://www.postgresql.org/docs/current/static/sql-analyze.html#AEN68136)

cstore_fdw supports ANALYZE so these statistics can indeed be used by the
planner; however, the autovacuum daemon doesn't do this automatically for
foreign tables, so it's up to the user to decide how often to run ANALYZE on
them.

~~~
Jweb_Guru
That's pretty cool, I had no idea that was supported :)

------
hans_castorp
This sounds very interesting. Are there any pre-built Windows binaries
available for this extension?

------
joevandyk
You'd have to manage backups of the storage file separately from backups of
PostgreSQL, right?

------
yazun
Any chance it will support INSERT some day?

