
SnappyData: A Hybrid Transactional Analytical Store Built on Spark - plamb
http://dl.acm.org/citation.cfm?id=2899408
======
placeybordeaux
Ah, I was really hoping this was just a library usable from within Spark
that implements probabilistic data stores. This still looks interesting, but
it looks like it would take a lot more work to spin up.

~~~
plamb
It takes about as long as it takes to start up a Spark cluster, and you can
interact with it entirely through Spark APIs if that's what you're comfortable
with. It can also be used in "Split Cluster Mode" so you can use your existing
Spark build instead of what's embedded within SnappyData if you prefer.
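For illustration, here is a minimal sketch of driving SnappyData through ordinary Spark APIs. The `SnappyContext` class and `USING column` table clause follow the SnappyData 0.x docs, but treat the exact names as assumptions; the app name, table name, and data are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SnappyContext

object SnappySketch {
  def main(args: Array[String]): Unit = {
    // Ordinary Spark setup; SnappyData is pulled in as a library dependency.
    val conf = new SparkConf().setAppName("snappy-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // SnappyContext extends Spark's SQLContext, so all the usual
    // DataFrame and SQL calls keep working unchanged.
    val snc = SnappyContext(sc)

    // Create a SnappyData-managed column table and query it with plain SQL.
    snc.sql("CREATE TABLE readings (id INT, value DOUBLE) USING column")
    snc.sql("INSERT INTO readings VALUES (1, 0.5), (2, 1.5)")
    snc.sql("SELECT count(*) FROM readings").show()
  }
}
```

Running this requires the SnappyData jars on the classpath (e.g. via `--packages` with `spark-submit`); it is a sketch of the "library" usage mode described above, not a verified build recipe.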

~~~
placeybordeaux
Right now it is easy for me to spin up a spark cluster on YARN, but hard for
me to acquire a number of machines to run arbitrary code.

------
plamb
Github repo:
[https://github.com/SnappyDataInc/snappydata](https://github.com/SnappyDataInc/snappydata)

~~~
frak_your_couch
So, you guys forked gemfire, spark and spark-jobserver. Are you planning on
going back into mainline or will you be maintaining the forks to stay current
at some interval?

~~~
jagsr123
Just to be clear, we do support SnappyData as a library (unfortunately, the
docs don't make this clear) that can work with your distribution (up to 1.6
today). Our releases will support the latest Spark version in a staggered
manner.

Everything we have done is an extension to Spark. None of the existing
functionality is lost. In a sense, we have made Spark work more like a
database. So, I don't think we can merge it back into the "mainline" (assuming
that is what you meant).

------
ddispaltro
How much of this is still relevant with the upcoming 2.0 structured streams?

~~~
jagsr123
Structured Streaming really makes Spark much more competitive for streaming
use cases: streams can be seen as DataFrames, which makes it super simple to
run continuous SQL queries over them. But often you also need to work with
historical data in your analytic query, continuously mutate data (which may
require transactional semantics), make sure the data itself is HA (not just
fault tolerant; important for low-latency apps), etc. What we do is fuse a
modern in-memory DB with Spark (i.e., the DB cluster nodes run Spark
executors), so there is no need to couple Spark with a separate data
management cluster. That said, some of the APIs we introduced for SQL stream
processing will be replaced by the new Structured Streaming APIs.
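For context, the "streams as DataFrames" model mentioned above can be sketched with Spark 2.0's Structured Streaming API: an unbounded stream is registered as a view and queried with ordinary SQL. The socket source, host, and port below are placeholder assumptions, and running this needs a Spark 2.x runtime.

```scala
import org.apache.spark.sql.SparkSession

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-sql-sketch")
      .master("local[*]")
      .getOrCreate()

    // An unbounded DataFrame backed by a socket source
    // (assumes something is writing lines to localhost:9999).
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Register the stream as a view so it can be queried like a table.
    lines.createOrReplaceTempView("events")

    // A continuous SQL query over the stream; the result updates as data arrives.
    val counts = spark.sql(
      "SELECT value, count(*) AS n FROM events GROUP BY value")

    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

This covers the pure-streaming side; the comment's point is that joining such a stream against mutable historical data, with transactions and HA, is what SnappyData layers on top.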

