
Running PostgreSQL on Compression-enabled ZFS - cwsteinbach
http://citusdata.com/blog/64-zfs-compression
======
old-gregg
Can this simply be an artifact of terrible disk I/O on AWS, or an overall
difference between ZFS and ext3?

Do you think the results would have been similar if you were to use
uncompressed ZFS instead of ext3 on proper database hardware?

Basically trying to figure out if the low performance of the uncompressed
dataset is specific to AWS/ext3. Thanks.

~~~
mbell
Another issue is that ZFS is extremely aggressive about caching data in RAM
(the ARC). That can eat up memory you'd rather give to the database heap, and
it also tends to skew benchmarks.

~~~
wiredfool
Yes, but Postgres' design in that area actually helps: Postgres relies on the
OS to cache the data tables for the most part. There is some caching in shared
buffers, but generally that's not a huge portion of your database server's
memory (10-25%, and not more than a few gigs).
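
For a 16GB box that might look something like the snippet below in
postgresql.conf; the numbers are just made-up examples, not figures from the
article:

    # example values only: give Postgres a modest slice and leave the rest
    # of RAM to the OS page cache
    shared_buffers = 2GB
    # planner hint that the OS cache will hold most of the working set
    effective_cache_size = 12GB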

------
ars
I wouldn't recommend doing benchmarking on a virtual server.

You have no idea how busy the real server is (noisy neighbors, etc.), so it's
impossible to get comparable results from benchmark to benchmark.

~~~
jaytaylor
FWIW, if you use one of the largest instance types (4xlarge or whatever), the
VM will probably be on its own host, which would mean you're unlikely to have
neighbors ;)

~~~
skeletonjelly
When benchmarking, it's best to remove assumptions based on "probably" though,
right?

~~~
marshray
It's cloud.

------
danbruc
The result doesn't really surprise me - many operations are bound by the
available I/O or memory bandwidth. There is even a compressor named Blosc [1]
that speeds up operations by moving compressed data between main memory and
the L1 cache and (de)compressing it there, instead of moving the uncompressed
data around.

[1] <http://blosc.pytables.org/>

------
fsiefken
The Btrfs and Reiser4 filesystems also support transparent compression and
might currently be a better alternative for speeding up PostgreSQL queries.
Btrfs is in the mainline Linux kernel and supports zlib (gzip) and LZO
compression, with LZ4 and Snappy support proposed in patches; Reiser4 is still
maintained, available as a patch against Linux 3.8.5 (latest is 3.8.8), and
supports LZO and gzip. (Alternatively, there are also the embedded NAND-flash
filesystems F2FS and UBIFS, which both improve on the JFFS2 filesystem and its
transparent compression.) For I/O-bound queries, SSD drives (in your preferred
RAID configuration) will also speed up the system. Btrfs already has built-in
TRIM/SSD support; Reiser4 TRIM/SSD support is being discussed among the
remaining developers.
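
If you want to try Btrfs compression, it's just a mount option; a rough sketch
(the device and mount point are made-up examples, not from the article):

    # made-up device/mount point; compress=lzo and compress=zlib are the
    # options available in the mainline kernel
    mount -o compress=lzo /dev/sdb1 /var/lib/postgresql
    # compress-force compresses even data the heuristic would otherwise skip
    mount -o compress-force=zlib /dev/sdb1 /var/lib/postgresql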

~~~
laumars
Reiser4 is in a weird place after the conviction of Hans. I'm not sure I'd
want to trust a production system to it. And I've been less than impressed
with Btrfs on the test systems I've run it on (though I'm aware there are
others who swear by it - I'm only talking about my experiences).

ZFS is a fantastic file system, but I can't help wondering if part of the
issue is the fact that the benchmarking was conducted on a virtual machine.
ZFS is better suited to raw disks than virtual devices (again, just my
anecdotal evidence; I've never run benchmarks myself).

~~~
iso8859-1
Btrfs is unstable too. Source: <https://news.ycombinator.com/item?id=5460449>

------
GalacticDomin8r
This isn't the first time benchmarks like this have been done and these
results are consistent with the earlier ones.

It shouldn't surprise most people that enabling transparent compression gives
these benefits. Why, you ask? Well, what is the largest bottleneck in a
system? Disk I/O - by far. So all ZFS is doing is shifting workload from the
subsystem you have the least of (disk I/O and latency) to one you likely have
plenty of (CPU).
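
As a rough sketch of making that trade on ZFS (the pool/dataset names here are
made up, not taken from the article):

    # enable compression on a hypothetical dataset; only blocks written
    # after this point are compressed
    zfs set compression=lzjb tank/pgdata
    # gzip trades more CPU for a better ratio
    zfs set compression=gzip tank/pgdata
    # later, see how much space it actually bought you
    zfs get compressratio tank/pgdata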

------
jamhan
Is it just me or is "Compression Ratio" a poor label for the graph in that
article? Normally, when one uses "Compression Ratio", it is the opposite of
those numbers, i.e. EXT3 storage would be 1:1, ZFS-LZJB would be 2:1 (not
0.5), and ZFS-gzip would be 3.33:1 (not 0.3). It's a small thing I know but it
turns convention on its head in its current form. A better label would be
perhaps "Storage Size Ratio".

~~~
marshray
I don't see a problem with them expressing the ratio as a decimal, since it
becomes a simple multiplier of the original file size: 38GB x 0.3.

But it's downright misleading to run the vertical axis over anything other
than 0.0 to 1.0 when comparing ratios. They start it at 0.2. In reality, LZJB
is saving 50% of the space whereas gzip saves 70%, but a naive glance at the
graph makes gzip look roughly 3 times smaller/better than LZJB.

Classic "How to Lie with Statistics" stuff.* I would have expected better from
an "analytics" database.

* Not saying they intend to lie here but it's representative of the classic text <https://en.wikipedia.org/wiki/How_to_Lie_with_Statistics>

~~~
cwsteinbach
Author here. Believe it or not I originally had the compression ratio graph
rotated 90 degrees, and had manually modified it to run from 0.00 to 1.00.
Google docs for some god awful reason insists on starting at 0.2 by default.
Anyway, when my colleagues reviewed a draft of this post they requested that I
rotate the graph back, and in the process I forgot to reset the scale. Sorry
for the confusion. It's fixed now. As for the definition of "compression
ratio", I looked this up and went with the definition found here:
<http://en.wikipedia.org/wiki/Data_compression_ratio>

I agree that it's kind of counterintuitive.

~~~
marshray
Perhaps "file size on disk" would be an unambiguous way to put it.

------
jacob019
Would love to see these performance metrics on a powerful system with PCIe or
RAIDed SSDs. It would be interesting to find the tipping point where the extra
CPU time outweighs the I/O reduction. Even if the DB layer performs better,
total application response time could suffer for CPU-intensive workloads, as
the compression steals cycles from the application layer.

------
petsos
Can someone give us an overview of the state of ZFS on Linux? Last time I
checked it was implemented over FUSE. Has this changed?

~~~
cdjk
There are kernel modules here, which is what I assume they're using:

<http://zfsonlinux.org>

The licensing problems only apply to distributing CDDL and GPL code that has
been compiled into the same binary, not to running a CDDL-licensed module in a
GPL kernel - I think. My experience with ZFS (which is awesome, btw) comes
from FreeBSD.
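
For anyone who hasn't used the native (non-FUSE) modules, the basic workflow
is roughly the following; the device and pool/dataset names are hypothetical:

    # load the out-of-tree kernel module and build a pool on a made-up device
    modprobe zfs
    zpool create tank /dev/xvdf
    # create a dataset for the database with compression enabled from the start
    zfs create -o compression=lzjb tank/pgdata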

------
nemothekid
If I'm reading this right, with ZFS compression enabled I am seeing 1/3rd the
disk usage and a 3x speedup in query times just from switching the filesystem.
Stats like that make me very skeptical. Does this mean that I can get a 3x
increase in speed while cutting my disk usage to a third just by switching to
ZFS? If so, why isn't everyone doing this?

~~~
lwat
The way I make sense of this is that you need fewer (slow) disk reads to get
the same amount of data into RAM, so that might explain the speedup?

I agree that it sounds too good to be true though.

~~~
rosser
Your read is correct. Once CPU time spent in decompression became less than
disk wait time for the same data uncompressed, the reduced IO with compression
started to win — sometimes massively. As powerful as processors are these
days, results like these aren't impossible, or even terribly unlikely.

Consider the analogous (if simplified) case of logfile parsing, from my
production syslog environment, with full query logging enabled:

    
    
      # ls -lrt
      ...
      -rw------- 1 root root  828096521 Apr 22 04:07 postgresql-query.log-20130421.gz
      -rw------- 1 root root 8817070769 Apr 22 04:09 postgresql-query.log-20130422
      # time zgrep -c duration postgresql-query.log-20130421.gz
      19130676
    
      real	0m43.818s
      user	0m44.060s
      sys	0m6.874s
      # time grep -c duration postgresql-query.log-20130422
      18634420
    
      real	4m7.008s
      user	0m9.826s
      sys	0m3.843s
    

EDIT: I'm not sure why time(1) is reporting more "user" time than "real" time
in the compressed case.

~~~
caf
zgrep runs grep and gzip as two separate subprocesses, so if you have multiple
CPUs then the entire job can accumulate more CPU time than wallclock time (so
it's just showing you that you exploited some parallelism, with grep and gzip
running simultaneously for part of the time).
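
In other words, the compressed run behaves roughly like this hand-rolled
pipeline (same file as in the listing above), where decompression and matching
are two processes free to sit on different CPUs:

    # roughly what zgrep does: one process decompresses, another matches
    gzip -dc postgresql-query.log-20130421.gz | grep -c duration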

------
atoponce
I'm curious about the rest of the architecture. Each benchmark needs to be
tested separately, as ZFS is likely caching the reads in the ARC. We also need
a benchmark of ZFS without compression enabled.

However, we're not just showing how bad ext3 is; the end result still shows
stellar performance, compression or not.

~~~
cwsteinbach
> as ZFS is likely caching the reads in the ARC

Each of the seven queries we used in our benchmark required a sequential scan
of the 32GB dataset. It's unlikely that the ARC had any impact on the results
since the EC2 instance had only 7GiB of memory.
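
For what it's worth, anyone who wants to take the ARC out of the equation
entirely can cap it with a ZFS on Linux module parameter; a sketch, with the
1GB cap purely an example value, not something used in the benchmark:

    # /etc/modprobe.d/zfs.conf - cap the ARC at 1GB (example value)
    options zfs zfs_arc_max=1073741824
    # current ARC usage is visible in /proc/spl/kstat/zfs/arcstats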

------
cafard
I tried running Oracle on ZFS for a while, with fairly terrible results. A bit
of examination showed that ZFS was fine for table scans but had bad
performance with indexes. It may be possible to tune one's way around this,
but I simply dumped ZFS in favor of Automatic Storage Management (ASM).

