Yes, in fact, it's specifically designed for data that doesn't fit in RAM. Their storage engine optimizes data locality for fast retrieval from non-flash storage.
Has anyone achieved a dramatic speedup using this extension? Last time I tried it, I got barely a 30% speedup in a generic aggregation across a few fairly normalised tables with ~10M rows. I don't know if I'm doing something wrong or if this is all I'm getting. Thanks!
I've tried it a couple times but didn't see as dramatic a speedup as I'd hoped either. What has worked for me is storing a full array in a single row of a regular Postgres table. Then if you have to load e.g. 1 million FLOATs, you can just slurp an 8MB chunk off disk instead of looking in 1 million different places. Postgres doesn't have a lot of functions for working on that kind of data though, so I wrote an extension to provide a few at [0]. I will probably add some more functions soon around masking out values so you can work "R/Pandas-style" on these arrays.
My approach breaks down if you are heavily updating your arrays though. On the mailing list Tom Lane suggested I store that kind of data outside the database.[1] But of course all software development is an attempt to have your cake and eat it too, so I wrote an extension that is a bit like cstore_fdw, but maybe more extreme: you can store/load/append a float array as an ordinary file.[2] It has similar operational disadvantages to cstore_fdw (no replication, no pg_dump, etc.) but performs faster. On the other hand it is definitely not battle-tested and still likely to change a little. (I need to improve the crash story and may add compression.) If you decide to use it, let me know!
Oh another limitation of keeping everything in an array (whether with floatfile or not) is that Postgres has a 1 GB limit on most RAM allocations. For example you could try running array_agg on a giant table and you might get a message about `Exception invalid memory alloc request size 1073741824`. I don't think cstore_fdw has that problem.
https://clickhouse.yandex/
It's a recently open sourced database by Yandex. It powers their web analytics backend. CloudFlare is already using it for their DNS analytics.