
Design and Implementation of the Wave Transactional Filesystem - rkwasny
http://arxiv.org/abs/1509.07821
======
nl
We were just joking at work today about how Cloudera's new Kudu "fast storage
system"[1] compares itself to "fast" HBase. At least there they were talking
about random access (something HBase is reasonably OK at), and put HDFS on the
"slow" end of the spectrum when talking about storage.

This system claims:

 _Experiments show that WTF can qualitatively outperform the industry-standard
HDFS distributed filesystem, up to a factor of four in a sorting benchmark, by
reducing I/O costs_

They benchmark sorting 300GB of data. You know what's fast at sorting 300GB of
data? My laptop.
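
As a back-of-the-envelope illustration of why 300GB is single-machine
territory: you just split the input into RAM-sized runs, sort each run, and
stream-merge them off disk. A minimal external merge sort sketch in Python
(paths, chunk size, and line-oriented keys are illustrative assumptions, not
anything from the paper):

    # Minimal external merge sort: sort RAM-sized chunks, spill them as runs,
    # then k-way merge the runs. Chunk size and paths are illustrative only.
    import heapq, os, tempfile

    CHUNK_BYTES = 1 << 30  # ~1GB of lines per in-memory run; tune to RAM

    def external_sort(in_path, out_path):
        runs = []
        with open(in_path) as f:
            while True:
                chunk = f.readlines(CHUNK_BYTES)  # read roughly CHUNK_BYTES of lines
                if not chunk:
                    break
                chunk.sort()
                run = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
                run.writelines(chunk)
                run.close()
                runs.append(run.name)
        handles = [open(r) for r in runs]
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*handles))  # streaming merge of sorted runs
        for h in handles:
            h.close()
        for r in runs:
            os.unlink(r)

(GNU sort does essentially this, in parallel, which is why one machine with a
decent SSD is even in the running at this scale.)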

Sort a PB[2] much faster than Spark+HDFS and then it might be worth looking
at. Even 100TB[3] is an interesting size.

Maybe I'm being too harsh? What am I missing here?

[1] http://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/

[2] https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html (note that
this was on disk with HDFS, not Spark in memory).

[3] http://sortbenchmark.org/

------
notacoward
Looks very interesting. A more thorough performance comparison - both more
alternatives and more varied workloads - would certainly be nice, though. HDFS
isn't even close to full POSIX semantics, so comparing against it doesn't tell
us much about the viability of the WTF approach for more general use. Maybe if
I have some spare time I'll take it for a spin myself.

Disclaimer: I'm a developer on the Gluster project.

------
binarymax
The paper itself looks like a good quality read, notwithstanding the terrible
acronym they picked for the name.

