

PostgreSQL columnar store benchmarks on SSDs
http://citusdata.com/blog/78-postgresql-columnar-store-benchmarks-on-ssds

======
danmaz74
This looks so good that a question arises: where's the catch? In other words,
in which situations is a columnar DB a bad solution?

~~~
EdwardDiego
We use Vertica for our analytics, and basically, INSERT, UPDATE and DELETE are
performance risks, as is trying to SELECT large numbers of whole entities.

As other commenters have said, Vertica, like other columnar DBs, is optimised
for reads, and it does so basically through materialized views (called
projections) across multiple nodes - so when underlying data is updated, those
updates have to be replicated across the system. It's not a huge concern in a
reporting / analysis environment though.

Retrieving whole entities is not particularly performant, but if you want to
group and aggregate columns, it's blisteringly fast.

I ran some numbers - our largest stats table in Postgres is 64GB, 379 million
rows, and summing a column can take up to 4.5 minutes. Our largest table in
Vertica is 500GB, 2.7 billion rows, and summing a column can take up to 100
milliseconds.
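
The queries in question are nothing exotic - mostly single-column aggregates,
roughly like this (table and column names invented, purely to show the shape):

    -- the kind of scan-and-sum a columnar layout is built for
    SELECT sum(impressions)
    FROM ad_stats
    WHERE day >= '2014-01-01';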

I'd have to say that the worst thing about Vertica is that it's not free as in
beer or as in speech. Licencing is done on a raw data basis, and the high
costs of upgrading licences often drive our technical decisions. And likewise,
trying to use the Vertica JDBC library to stream entities from Postgres to
Vertica was an interesting exercise (in fairness to Vertica, it was caused by
poor implementation of some code in the Postgres JDBC library), because I had
only the compiled JARs to work with, and so debugging was difficult.

Other catches - constraints are not enforced - a performance decision. The
other issue is that projection design is a very fine art and really determines
how much performance you'll get. That said, Vertica ships with a projection
designer that designs projections to answer specified queries fast, and the
designer is suitable for about 98% of your needs. For the last 2%, you can
often find yourself trying to interpret the mind of the query planner when you
want it to use your new projection.
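
For a flavour of what a projection definition looks like, here's a rough
sketch from memory (names invented; check the Vertica docs for exact syntax):

    CREATE PROJECTION clicks_by_day (day, publisher_id, clicks)
    AS SELECT day, publisher_id, clicks
    FROM click_events
    ORDER BY day, publisher_id
    SEGMENTED BY HASH(publisher_id) ALL NODES;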

...This leads to another catch - as projection structure is strongly tied to
query structure, queries that are changed and ad-hoc queries may suffer
performance issues, and they do so silently. Vertica does make excellent
records of query metrics, but nothing will blow up if an existing projection
can no longer answer a query.

The last caveat is that importing data into Vertica requires some thought, and
how you do it is typically determined by Vertica's design constraints. The
documentation is good, and the training is worthwhile, but there's still a lot
of thinking specific to the implementation required.
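
Bulk loads mostly go through Vertica's COPY statement, something along these
lines (path and options are illustrative, not from our actual setup):

    -- DIRECT bypasses the in-memory staging area and writes straight to disk
    COPY click_events FROM '/data/clicks_2014_04.csv' DELIMITER ',' DIRECT;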

~~~
jeffdavis
"basically through materialized views... summing a column can take up to 100
milliseconds"

That's a key point worth expanding on. This isn't so much a columnar/row
distinction as it is a point about materializing results (or partial results)
in a clever way that allows for fast processing for those kinds of queries.

That moves the problem: queries are faster (as long as they are in the sweet
spot of queries that you expected), but you have extra cost at
insert/update/delete time and extra effort when designing the "projections"
(as you say).
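
In Postgres terms it's much the same trade as maintaining a pre-aggregated
materialized view (a minimal sketch, names made up):

    CREATE MATERIALIZED VIEW daily_sums AS
      SELECT day, publisher_id, sum(clicks) AS clicks
      FROM raw_events
      GROUP BY day, publisher_id;

    -- reads against daily_sums are cheap; the cost moves to refresh/load time
    REFRESH MATERIALIZED VIEW daily_sums;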

Columnar makes a similar trade in that writes are slower and a class of
"typical" grouping and aggregation queries are generally faster.

By combining the two strategies (materialization and columnar), Vertica is
essentially doubling-down on that trade. So queries that match what you
designed the projections for, and that follow typical patterns of grouping and
aggregation, are blazingly fast (or so I've heard). But it's more restrictive
about the kinds of queries that will work well, and it's sensitive to new
kinds of queries that you might not have planned for when designing the
projections (again: or so I've heard). In other words, it's more specialized.

Personally, I don't find ultra-specialized databases very interesting
(speaking generally here, not necessarily saying that Vertica is ultra-
specialized). I basically see two reasons to use something specialized: (1)
replicate data into it and run queries (i.e. an accelerator); and (2) put lots
of low value data in it, that you'd otherwise throw away, and try to extract
some signal from it. (There are a couple other cases, like real-time systems,
but that is outside of what I ordinarily deal with.)

Disclaimer: I work for a big data company that is not Vertica. Our products
offer row or columnar as storage options, but that is not central to the
products' value propositions. Opinions above are my own.

~~~
EdwardDiego
> Personally, I don't find ultra-specialized databases very interesting
> (speaking generally here, not necessarily saying that Vertica is ultra-
> specialized). I basically see two reasons to use something specialized: (1)
> replicate data into it and run queries (i.e. an accelerator); and (2) put
> lots of low value data in it, that you'd otherwise throw away, and try to
> extract some signal from it. (There are a couple other cases, like real-time
> systems, but that is outside of what I ordinarily deal with.)

Pretty much. We had a specific business requirement of near real-time
statistics and historical statistics available at an entity / day level of
granularity for advertisers and publishers using our platform.

Vertica occupies a very sweet spot for us in that regard, but it requires a
lot of specialised approaches. I've managed to break its tuple mover a couple
of times when loading historical raw data in large quantities.

------
ddorian43
A long way to go:

https://www.monetdb.org/content/citusdb-postgresql-column-store-vs-monetdb-tpc-h-shootout

~~~
techscruggs
I wonder how these benchmarks would compare with citusdb + compression. Very
interesting nonetheless.

------
Alex3917
Any comparisons against Vertica or other DBs that were designed to be columnar
from the ground up?

~~~
nchammas
Published benchmarks comparing commercial databases are difficult to come by
due to the DeWitt Clause [1] in many commercial database licenses.

[1]
[http://en.wikipedia.org/wiki/David_DeWitt](http://en.wikipedia.org/wiki/David_DeWitt)

------
techscruggs
If you are not familiar with Foreign Data Wrappers, they allow you to connect
to other datastores and represent that data as tables in your database.
[http://wiki.postgresql.org/wiki/Foreign_data_wrappers](http://wiki.postgresql.org/wiki/Foreign_data_wrappers)
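
For example, with postgres_fdw the setup looks roughly like this (server names
and credentials made up, just to show the shape):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER reporting_db FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'reporting-host', dbname 'analytics');

    CREATE USER MAPPING FOR CURRENT_USER SERVER reporting_db
      OPTIONS (user 'readonly', password 'secret');

    CREATE FOREIGN TABLE remote_events (id bigint, day date, clicks int)
      SERVER reporting_db OPTIONS (table_name 'events');

    -- queried like any local table
    SELECT day, sum(clicks) FROM remote_events GROUP BY day;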

------
fletchowns
Does this support JOINs? Or do you use a giant WHERE IN () clause?

My use case is essentially a cross-database JOIN that I've been using MySQL &
temp tables to accomplish. For example, give me the sum of column x if column
y is any one of these 50,000 values from a separate system. So load the 50,000
values into a temp table and then do a JOIN. Performance isn't that great and
it uses a ton of disk space so I wanted to try using a columnar store.
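
In Postgres terms the pattern is roughly this (names made up):

    CREATE TEMPORARY TABLE wanted (y bigint PRIMARY KEY);
    -- ...load the ~50,000 values from the other system into wanted...

    SELECT sum(s.x)
    FROM stats s
    JOIN wanted w ON s.y = w.y;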

~~~
techscruggs
Seems like you might want to consider getting all of your data in the same
place. A Foreign Data Wrapper feeding a Materialized View might be a neat way
to solve your problem: http://www.postgresql.org/docs/9.3/interactive/rules-materializedviews.html
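
Schematically (a sketch with invented names, assuming the other system's ids
are exposed through a foreign table):

    CREATE MATERIALIZED VIEW other_system_ids AS
      SELECT y FROM remote_ids;  -- remote_ids being the foreign table

    -- the join then runs entirely against local data
    SELECT sum(s.x)
    FROM stats s
    JOIN other_system_ids o ON s.y = o.y;

    -- refresh whenever the other system's values change
    REFRESH MATERIALIZED VIEW other_system_ids;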

~~~
fletchowns
> Seems like you might want to consider getting all of your data in the same
> place

Unfortunately that's not possible

------
dharbin
I'm very excited about this! Add a mechanism to distribute data and queries
across a cluster, and this could be the makings of an open-source Amazon
Redshift.

~~~
bredman
Actually CitusDB, which we also produce, does distribute data and queries
across a PostgreSQL cluster. You can see some performance comparisons of using
cstore_fdw on CitusDB in the blog post and the performance improvements are
even more significant there.

------
rustyconover
It would be interesting to compare these benchmarks against the performance of
Amazon's Redshift.

My first question would be whether the benchmark can be run on Redshift
without changes. Redshift has some interesting differences beyond just being a
columnar database that speaks the PostgreSQL protocol. But if it's possible,
I'd be very interested to see the results.

------
klreierson
Do the benchmarks for Postgres utilize the in-memory columnar store (IMCS)?
What is the difference between Postgres IMCS and Citus cstore_fdw?
http://www.postgresql.org/message-id/52C59858.9090500@garret.ru

~~~
bredman
The benchmarks use cstore_fdw which is a columnar store that is accessed via
the PostgreSQL foreign data wrapper system.

IMCS is a different implementation of a columnar store for PostgreSQL. I don't
know it extremely well, but my understanding is that it uses the PostgreSQL
shared memory system to allocate memory for a columnar store. That store is
not queried via standard SQL commands but rather through a combination of a
custom query language and some user-defined functions.
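
For reference, a cstore_fdw table is defined and queried like any other
foreign table - something along these lines (column names invented; see the
README for the full set of options):

    CREATE EXTENSION cstore_fdw;

    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

    CREATE FOREIGN TABLE events_cstore (day date, publisher_id int, clicks int)
      SERVER cstore_server
      OPTIONS (compression 'pglz');

    -- queried with plain SQL
    SELECT day, sum(clicks) FROM events_cstore GROUP BY day;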

------
mixologic
Isn't the assumed tradeoff SSD storage for CPU usage? How much more CPU time
goes into compressing/decompressing? And what's the unit cost of that extra
CPU compared to the cost of the disk space saved on 'expensive' SSDs?

~~~
rosser
CPU time is so _ridiculously_ cheap compared to disk IO — even on SSD — that
it's pretty much always a win, sometimes massively so.

My comment during a previous instance of this discussion:

[https://news.ycombinator.com/item?id=5592341](https://news.ycombinator.com/item?id=5592341)

~~~
mixologic
I think you're calling it cheap in terms of time; I'm talking about resource
utilization and saturation. It would be clearer to show what the exact
tradeoff is in terms of cost. In an arbitrary scenario where you have $100
worth of SSD and $100 worth of CPU, and this strategy saves 75% of the SSD
space at the cost of 1% more CPU utilization, then yeah, it's a total win. But
isn't your example showing that it takes 5 times as much CPU time compressed
vs uncompressed?

~~~
rosser
Just using my own example, it's roughly a 5x increase in CPU time for a > 10x
_reduction_ in disk space (and commensurate reduction in disk IO — including
writes, which is an important consideration when you're using SSD). Let's just
say 10x though, for sake of easier math.

Assume $1/gbyte storage costs. We're not talking about consumer-grade drives
here, right? It's probably worse than that anyway, because you're using some
form of RAID, too. Aren't you?

Assume further a $1000 8-core CPU, as that's what the machine in my example
uses (it's actually slightly more expensive, but again: easier math), and a 3
year depreciation schedule. That's roughly 1000 days, or $1/day,
$0.125/core-day, or about $0.0000014/core-second.

Storing that log file uncompressed costs you $8.817 in storage, plus a small
fraction of a cent in CPU time, if the process uses 100% of a core for the
duration. Storing it compressed costs you $0.828 in disk space and about
$0.00006 in CPU time — again, if you're burning an entire core for all 44s.

I think $0.828 is less than $8.817, but feel free to check my math...

EDIT: Yes, I know I'm conflating the ongoing cost of storage and the
incidental cost of CPU time. I'm also ignoring the cost of power and cooling,
leaving entirely aside the difference between storage _milliseconds_ and CPU
_nanoseconds_ , and, and, and. I guess, if it's not obvious that compression
is pretty much unequivocally a win with modern CPUs, then I don't know what
else to say.

~~~
aquadrop
That math only stands if your server has spare CPU capacity.

~~~
rosser
I disagree. Being IO starved is far more likely, and has a vastly greater
impact on the utility and capability of a host, than being CPU starved.

~~~
aquadrop
Sure, IO starvation is more likely, but on an analytics DB (which is the
intended use of the tech shown here) you'll probably be running complex
queries with joins, filtering, aggregation, etc. That can load the CPU pretty
heavily. So you always need to check where the bottleneck is.

------
dougmccune
I couldn't find documentation about what subset of SQL you can use. I saw
mention of "all supported Postgres data types", but not anything about what
features work. Any links?

~~~
bredman
If you use cstore_fdw with PostgreSQL you can use the full PostgreSQL SQL
dialect for querying your table. However, we don't currently support writing
to your cstore_fdw table using SQL commands like INSERT/UPDATE/DELETE.
PostgreSQL added foreign data wrapper support for these commands in its 9.3
release, and it's something we're considering adding to cstore_fdw in the
future.

