
Apache Arrow – Powering Columnar In-Memory Analytics - bertzzie
https://arrow.apache.org/
======
PDoyle
Oops... The first sentence in the "Fast" section says "SIMD (Single input
multiple data)". It should be "Single instruction, multiple data".

------
filereaper
Asking the stupid question here, but why create a new Apache project for this?

Apache Arrow seems to be targeting the use of SIMD, which is a very
JVM/runtime-dependent feature. If the runtime can't detect this out of the box,
then create a recognized method or some sort of intrinsic to coax the runtime
into SIMD-izing the operation.

I understand the performance gains of this, but why not add this functionality
to existing projects like Parquet or HTable, etc.?

This just comes to mind: [https://xkcd.com/927/](https://xkcd.com/927/)

~~~
rz2k
I don't know the answer, but in this case does a columnar store imply that it
is a collection of arrays, perhaps for a scientific database, and a bit
different from HBase?
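For what "a collection of arrays" means in practice, here is a minimal sketch of row layout vs. column layout. It uses only Python's standard `array` module, not Arrow itself; the field names and data are made up for illustration:

```python
from array import array

# Row-oriented: each record keeps all of its fields together
# (array-of-structs style). Hypothetical example data.
rows = [
    (1, "alice", 34.5),
    (2, "bob", 12.0),
    (3, "carol", 7.25),
]

# Column-oriented: one homogeneously typed array per field
# (struct-of-arrays style). Numeric columns are contiguous typed buffers.
columns = {
    "id": array("q", [r[0] for r in rows]),      # 64-bit signed ints
    "name": [r[1] for r in rows],                 # strings kept as a plain list here
    "amount": array("d", [r[2] for r in rows]),  # 64-bit floats
}

def total_amount_rows(rows):
    # Has to visit every record and pick one field out of each.
    return sum(r[2] for r in rows)

def total_amount_columns(columns):
    # Scans only the one buffer it needs; the values sit back to back,
    # which is the kind of access pattern that SIMD loops exploit.
    return sum(columns["amount"])

print(total_amount_rows(rows), total_amount_columns(columns))
```

Both give the same answer; the difference is that the columnar scan never touches the `id` or `name` data at all.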

Here's someone else's blog post from 2010 on different categories of columnar
store DBs:

[http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-m...](http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html)

~~~
infinite8s
That "someone else" is Daniel Abadi, one of the researchers who re-popularized
the idea of column stores during his graduate work at MIT (in addition to
researchers at CWI).

------
ljoshua
Is this similar to how QlikView's in-memory engine works?

~~~
jandrewrogers
Yes, almost all BI cache database engines are designed this way.

------
threeseed
It really is a confusing title for the project. It's more of a high-speed
interchange format, e.g. for sending data from Spark or Storm to Cassandra.

It's not something end users will ever really need to know about.

------
axman6
I'm confused, is this just Structure of Arrays as a service for columnar data?
It's not clear to me what this actually does.

~~~
tveita
It's not really a service at all; it is an in-memory data format intended to
be shareable between processes. The project also includes libraries for C++,
Java and Python.
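To make "in-memory data format" a bit more concrete: a nullable string column in Arrow is, roughly, three flat buffers: a validity bitmap (one bit per slot, least-significant bit first, 1 = not null), an int32 offsets buffer with n + 1 entries, and a UTF-8 data buffer. Below is a simplified Python sketch of that idea. `encode_string_column` and `read_value` are hypothetical helpers, not the pyarrow API, and the real format also specifies buffer alignment and padding, which this ignores:

```python
import struct

def encode_string_column(values):
    """Hypothetical encoder: list of str-or-None -> (validity, offsets, data)."""
    n = len(values)
    validity = bytearray((n + 7) // 8)   # one bit per slot
    offsets = [0]                        # n + 1 entries
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)   # LSB-first; 1 means "valid"
            data += v.encode("utf-8")
        offsets.append(len(data))              # a null slot has zero length
    # Offsets are packed as little-endian int32 values.
    return bytes(validity), struct.pack(f"<{n + 1}i", *offsets), bytes(data)

def read_value(validity, offsets, data, i):
    """Hypothetical reader: random access to slot i straight from raw bytes."""
    bit = (validity[i // 8] >> (i % 8)) & 1
    if not bit:
        return None
    start, end = struct.unpack_from("<2i", offsets, 4 * i)
    return data[start:end].decode("utf-8")

validity, offsets, data = encode_string_column(["arrow", None, "parquet"])
print(read_value(validity, offsets, data, 2))
```

The point of the design is that the buffers are plain bytes with a fixed, documented layout, so another process or another language can read any slot directly instead of parsing a serialized stream value by value.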

This post explains the intention better than the project webpage:

[http://blog.cloudera.com/blog/2016/02/introducing-apache-arr...](http://blog.cloudera.com/blog/2016/02/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/)

