Have you tested running it on a FreeBSD box with ZFS? It has lz4 compression by default and makes a great storage solution for PG. You get compression, snapshots, replication (not quite realtime but close), self-healing, etc. in a battle-hardened and easy-to-manage filesystem and storage manager. I've found you can't beat ZFS and PG for most applications. Edge cases exist everywhere, of course.

Compression works well with column stores for different reasons - since all values in a column have the same data type, there are way more opportunities for very lightweight and highly effective compression, like run length encoding (a,a,a,a,a,a -> 6a) which helps a lot with sparse or duplicate columns, range compression (1,2,3,4,5 -> 1-5), dictionary compression (hippopotamus,potato,potato,hippopotamus -> 1,2,2,1 + a dictionary like {1:hippopotamus,2:potato}), value scale (203,207,202,201 -> 3,7,2,1 + base:200), bit arrays for booleans and bit vectors for low-cardinality columns, etc.
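To make three of those encodings concrete, here is a minimal Python sketch. The function names are made up for illustration and don't come from any particular column store; real engines work on packed binary buffers rather than Python lists, and the base for value scale is simply passed in here rather than chosen automatically.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def dictionary_encode(values):
    """Replace each value with a small integer code, plus a code -> value table."""
    code_for = {}
    encoded = []
    for value in values:
        # New values get the next free code, starting at 1.
        code = code_for.setdefault(value, len(code_for) + 1)
        encoded.append(code)
    dictionary = {code: value for value, code in code_for.items()}
    return encoded, dictionary


def value_scale_encode(values, base):
    """Store small offsets from a shared base instead of the full values."""
    return base, [value - base for value in values]


print(run_length_encode(list("aaaaaa")))
print(dictionary_encode(["hippopotamus", "potato", "potato", "hippopotamus"]))
print(value_scale_encode([203, 207, 202, 201], base=200))
```

Each of these only pays off because a column holds values of one type with a lot of repetition or a narrow range, which is the point about column stores above.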

This saves on space but also often improves query latency, provided your execution engine supports queries on compressed columns - less data means more of it fits into cache, meaning fewer cache misses (i.e. expensive disk reads).

edit: So what I mean is that compressing entire files or disk blocks with LZ might not take advantage of the same properties and might not get you the same deal. In a column store the in-memory data layout is often compressed too, and queries execute efficiently on the compressed data, whereas with a blindly compressed filesystem you'd presumably always have to decompress first.
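A rough Python sketch of what "queries on compressed data" can look like, assuming a run-length-encoded column stored as (value, count) pairs and a dictionary-encoded column stored as integer codes plus a code -> value table. The function names are hypothetical and this is not how any specific engine does it:

```python
def count_equal_rle(runs, target):
    """COUNT(*) WHERE col = target, touching one entry per run, not per row."""
    return sum(count for value, count in runs if value == target)


def sum_rle(runs):
    """SUM(col) over a run-length-encoded numeric column, without expanding it."""
    return sum(value * count for value, count in runs)


def filter_dictionary_encoded(codes, dictionary, target):
    """Row positions WHERE col = target, comparing integer codes only."""
    # Look the target up in the dictionary once, then never touch strings again.
    target_code = next(
        (code for code, value in dictionary.items() if value == target), None
    )
    if target_code is None:
        return []  # value not in the dictionary, so no row can match
    return [row for row, code in enumerate(codes) if code == target_code]


runs = [("a", 6), ("b", 2), ("a", 1)]
print(count_equal_rle(runs, "a"))
print(sum_rle([(5, 3), (2, 4)]))
print(filter_dictionary_encoded([1, 2, 2, 1], {1: "hippopotamus", 2: "potato"}, "potato"))
```

With block-level LZ compression underneath a filesystem, none of these shortcuts are available: the block has to be decompressed back into rows before the query can look at it.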

True. Actually, this execution efficiency is usually a far more important benefit of column stores than just saving disk space.

Sadly, PostgreSQL does not support that now, but hopefully it's not that far in the future. It's becoming a fairly regular topic in discussions with both customers and devs at conferences.

Thanks for the great suggestions.

We've certainly discussed running on ZFS internally, but haven't evaluated it yet. We're a bit uncomfortable dictating file system requirements to users, so we're looking to ultimately provide many of these things in the database instead.

Would welcome any testing/benchmarks, though! :)