> Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
Example: SELECT sum(a) FROM things;
> Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
Example: UPDATE things SET a = a+1;
> Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
Example: SELECT * FROM things;
> Row-oriented organizations are more efficient when writing a new row if all of the row data is supplied at the same time, as the entire row can be written with a single disk seek.
Example: INSERT INTO things (a,b,c,d,e,f,g) VALUES (1,2,3,4,5,6,7);
For your first two points, column stores are typically one to two orders of magnitude faster.
From a workload perspective, row stores are predominant in the OLTP (transactional insert/update) domain. They are also used in OLAP and data warehousing. Still, column stores have benefits when the underlying tables have many columns and the user is running analytic queries over a small subset of those columns.
I consider this different from your first point because sometimes it's not the aggregation that is needed, but the drill-down.
The .proto defined for cstore_fdw differs from ORC as defined for Hive. At a quick glance I can't find references to dictionary encoding or statistics, and the data types used are apparently the native PostgreSQL data types. From what I can tell, this implementation leverages reduced IO (fetch only the columns of interest from disk), segment elimination (use min/max info to skip over entire row groups), and pglz compression for the data stream. I couldn't find references to run-length encoding (RLE) or dictionary encoding. I'm sure these shortcomings will be addressed in future iterations, especially better encoding schemes, which would result in better compression.
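As a rough illustration of that segment elimination (the table and column names here are made up): with min/max metadata kept per row group, a scan like

    -- row groups whose recorded_at range ends before 2014-01-01 can be
    -- skipped entirely, without reading their column data from disk
    SELECT count(*) FROM measurements WHERE recorded_at >= DATE '2014-01-01';

never touches blocks that the metadata proves irrelevant.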
But I'm a bit disappointed that the ORC format used is not the same as the one originally used in Hive, Pig and the rest of the Java/Hadoop ecosystem. Had it shared the actual file format, it would have enabled a number of very interesting scenarios. Think about Hive/Pig serving as ETL pipelines that produce ORC files attached directly to PostgreSQL (add today's partition as FOREIGN DATA WRAPPER ... OPTIONS (filename '.../4_4_2014.cstore')) and then do interactive analytics driven by PostgreSQL. It would reduce the cost of ingress significantly (fast attach, no need for expensive INSERT and re-compression). I realize data type translation would have been thorny, to say the least (datetime and decimal as primitives, probably many of the structured types too).
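A rough sketch of that attach scenario with cstore_fdw as it exists today (server, table, columns, and path are all invented, and it assumes the file is in cstore_fdw's own format rather than Hive's ORC):

    -- assumes the cstore_fdw extension and a cstore server already exist
    CREATE FOREIGN TABLE events_2014_04_04 (
        event_time TIMESTAMP,
        user_id    BIGINT,
        amount     NUMERIC
    )
    SERVER cstore_server
    OPTIONS (filename '/data/partitions/4_4_2014.cstore');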
While this new columnar store is not as speedy as ones that have been around since the early 2000s, it does give a platform for CitusData and other companies to build on and share solutions.
Having a DB that can hold both the transactional data and the data for fast analytical purposes is very advantageous, as you have fewer moving parts and much less ETL work.
What I am looking forward to now is a few startups similar to CitusData that solve different "Big Data" problems, and for them to work together to disrupt the multi-billion dollar data warehouse/analytics vendors.
Basically ParAccel (as used by Amazon Redshift) at a reasonable cost.
I use it with great success.
We open-sourced cstore_fdw under the Apache license. You can just build and load the extension, and start using it with your PostgreSQL 9.3+ database.
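For anyone curious, a minimal setup sketch (the table, columns, and CSV path are placeholders; see the README for the full walkthrough):

    CREATE EXTENSION cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

    CREATE FOREIGN TABLE customer_reviews (
        customer_id   TEXT,
        review_date   DATE,
        review_rating INTEGER
    )
    SERVER cstore_server
    OPTIONS (compression 'pglz');

    -- load data through COPY, which cstore_fdw supports for its tables
    COPY customer_reviews FROM '/tmp/customer_reviews.csv' WITH CSV;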
If you later need to scale out your PostgreSQL cluster, please just let us know. :)
Thank you very very much in advance. This is a really huge deal. Even for general things like academic research. We shall see.
We're going to publish more comprehensive results in a few weeks. In our initial tests, we found the speedup to depend a lot on the underlying data and the type of queries.
For example, we found that when compression shrank the working set enough to move from disk to memory, there was a significant jump in performance.
Also, we're looking to do optimizations on the cost estimation side -- these will notably help with queries that join multiple tables together, a common scenario for the TPC-H benchmark.
There is a great talk by Stonebraker about this. (He created Postgres, the predecessor of PostgreSQL, about three decades ago and has since moved on to newer database engines.)
Columnar querying is typical for OLAP; the PostgreSQL engine is aimed at OLTP. This doesn't look like a good idea -- like adding side floats to a car and paddles so you can use it like a boat.
This goes against using the right tool for the right job.
Also, I bet VoltDB, a modern open-source OLTP database, can beat this thing hands down. It's also in-memory and clustered, with precompiled complex stored procedures and many other goodies.
Commercial column stores like Vertica should be orders of magnitude faster.
If I want to increase my database performance I can either
1.) Build this plugin and integrate it into my already working ecosystem
2.) Spend time researching, testing, and deploying VoltDB.
Given the popularity of Postgres, and the relatively low-friction solution of (1.), it's clear why this could be an adequate solution. Sure, you won't be as fast as VoltDB, but as outside engineers we don't know the potential customer's requirements, or whether being as fast as VoltDB actually matters.
Briefly skimming, it looks pretty similar, except for the data compression part, which uses RCFile. It also supports more data types. If this gets adopted by Redshift or something else, I will be thrilled.
Edit: I didn't realize Citus Data was making the columnar postgres offering open source...that's great!
Not to say there aren't tools to get SQL on top of Cassandra (DataStax has a Hadoop/MR driver that you can probably put Pig/Hive/Presto(?) on top of).
You can find a brief description of why and when you want actual columnar formats in the introduction section of this blog post: https://blog.twitter.com/2013/dremel-made-simple-with-parque... .
You can read more about columnar databases here: https://en.wikipedia.org/wiki/Column-oriented_DBMS
A columnar storage format is only half the battle for a database engine that doesn't have operations specifically tuned for vectorization and for data being cheaply available a column chunk at a time instead of a row chunk at a time. You can get massive perf wins on top of the work CitusDB did by doing things like operating on compressed data / delaying decompression, avoiding deserialization of column values based on a filtering scan of another column, pipelining operations on column values to fit better into your CPU architecture, etc.
But just having the IO wins is nice, too.
It's an unfortunate naming collision.
Analytic column stores can exploit storing columns separately by using various encoding schemes that compress data far better than generalized compression, and they can use the sort order to zero in on relevant ranges for each column.
Thanks for clearing that up!
An analytic column store like, say, Vertica has a schema like a regular SQL database. I don't know what their flexible-schema story is right now.
Instead of storing the columns of a row together, an analytic column store stores columns from many rows together in sorted runs. When you do a scan, the disk reads only the columns you have selected. The column storage format is optimized for specific types and uses type-specific compression, so ratios of 10-50x are claimed; this further improves the IO situation. They can also zero in on relevant ranges of data for each column, because the columns are indexed, and this further reduces the IO requirements.
Where other databases are bound on seeks or sequential throughput, an analytic column store will be bound on CPU, especially CPU for the non-parallel portions of every query.
Obviously a column store will have a hard time selecting individual rows: because a row's data is not stored together, it is expensive to materialize. They also have trouble with updates/deletes to already-inserted data, in some cases requiring the data to be reloaded because updates have dragged everything down.
CitusDB vs. MonetDB TPC-H Shootout
Foreign tables provide a nice abstraction to separate the storage layer from the rest of the database. They are similar to MySQL storage engines in a sense. With them, users don't need to make any changes to their existing PostgreSQL databases; they can just CREATE EXTENSION and get rolling.
You can also use both foreign and regular tables in the same query: http://citusdata.github.io/cstore_fdw/#toc_4
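For example, a join between a regular heap table and a cstore foreign table works like any other join (both table names here are hypothetical):

    SELECT o.customer_id, count(*)
    FROM orders o                                -- regular PostgreSQL table
    JOIN order_events e ON e.order_id = o.id    -- cstore_fdw foreign table
    GROUP BY o.customer_id;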
cstore_fdw supports ANALYZE so these statistics can indeed be used by the planner; however, the autovacuum daemon doesn't do this automatically for foreign tables, so it's up to the user to decide how often to run ANALYZE on them.
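So after a bulk load, you would run something like this by hand or from cron (table name hypothetical):

    ANALYZE order_events;  -- autovacuum won't do this for foreign tables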
Similar to how Infobright is just MySQL -- but there you still can't mix columnar and regular tables in the same instance; you have to run IB separately.
As Ozgun notes above, the Foreign Data Wrapper APIs just provide a powerful abstraction for what this extension is doing.
Though the DDL to set up a foreign table requires first creating a "server" with CREATE SERVER, this is merely a formality: as in file_fdw, the server doesn't actually represent an external process. All I/O occurs within the Postgres backend process servicing the query.