
SnappyData: OLTP and OLAP Database Built on Apache Spark - plamb
https://github.com/SnappyDataInc/snappydata
======
eranation
Interesting and much needed!

A few questions, if the maintainers / owners read this:

1\. what is the license? it seems to be not commercial friendly at the moment,
but I couldn't tell if it's GPL / AGPL or something else

2\. what are your plans for supporting network/graph data structures, graph
analytics etc.

3\. visualisation plans?

4\. is the target to become a fully fledged BI tool? compete with the likes of
Tableau / Spotfire only on top of Big Data? (if so I'm very interested!)

~~~
sudsmenon
1> Most of it is straight up Apache 2.0 license. The only thing that is not
Apache licensed is the approximate query processing piece, which is closed
source right now. We intend to build out a community, and that requires that
much of what we do be open source.

2> We have spliced a database into Spark such that Spark and Snappy share the
same memory space, so everything that Spark supports will be supported. The
data can be accessed as DataFrames using the Spark API (and we will be 100%
compatible).

3> Absolutely. In upcoming releases. Both from a data standpoint and from a
management, monitoring, and security standpoint.

4> I think becoming a full-fledged BI tool is not on the cards. Think of this
as an enterprise-class, real-time operational analytics platform. But we intend
to integrate and work closely with the leading tool vendors (the AQP piece
especially is great for aggregate-class visualizations).

------
salsakran
Looks very interesting!

Would you be interested in helping us write a SnappyData connector for
Metabase? github.com/metabase/metabase

~~~
jagsr123
Metabase looks very interesting. We will go through it in more detail. Can you
point to any doc on what it takes to create a connector? Note that SnappyData
has two straightforward entry points: SQL (e.g. a JDBC connector) or the Spark
APIs. Looks like the SQL connector route is the simplest?

------
hsshah
This can be great if it lives up to its promise.

I could not find any performance numbers from real-life scale deployments.
Will have to wait before giving a verdict.

~~~
plamb
Hi hsshah, we talk about some benchmarking we did in our technical paper (page
10, "Experiments"):
[http://www.snappydata.io/snappy-industrial](http://www.snappydata.io/snappy-industrial).
This is not a real-life scale deployment, but may be useful.

------
twistedpair
Why add Spark? We run Spark in prod and have to actively patch the code base
because there are so many ways to make it crash and burn. LDAP is a solved
problem, why add Spark and instability to the mix?

~~~
sudsmenon
Fundamentally, any enterprise today has to deal with OLTP data, OLAP data
(transactional plus other sources), streaming and finally machine learning.
Our premise is that you can either use four different platforms, one for each,
or move to a unified platform. Spark offers streaming, it offers Spark SQL, and
there are a bunch of machine learning libraries available on Spark. Also,
everyone who has anything to do with data has a connector to Spark, so it
becomes a good data integration platform. The API is uniform across these, so
it forms a good substrate for what we are trying to do. As for bugs, it is a
platform that is growing rapidly and going through some growing pains, and over
a period of time it will mature. We believe that the core capabilities
(Spark SQL, Streaming etc.) will become powerful over time. It is a somewhat
opinionated choice, but one that we think will pan out.

~~~
plamb
Also (correct me if I'm wrong), the stability of SnappyData will depend more
on GemFireXD (related to Apache Geode), the in-memory database that has been
integrated with Spark to form SnappyData, than it will depend on Spark.
GemFire has been in development for over a decade and has a multitude of
production use cases.

------
superkk
I wonder if they added Spark into the mix just for marketing purposes. Since
it's born out of Pivotal, I guess they'll end up being much more GemFire than
Spark.

~~~
jagsr123
@superkk Ironically, we debated this extensively and concluded the reverse
(primarily because we believe the Spark API is succinct and integration with
the ecosystem will be easy): much more like Spark than GemFire. In fact, the
API is pretty much Spark. For instance, you don't care about updating data,
you just use the Spark DataFrame API. (
[https://github.com/SnappyDataInc/snappydata#quick-start-with...](https://github.com/SnappyDataInc/snappydata#quick-start-with-scalasparksnappy-programming)
)

------
vishalzone2002
Is the "snappy" in SnappyData inspired by, or a reference to, "snappy compression"?

~~~
plamb
No, we were not referring to snappy compression when we came up with the name.

~~~
zootm
I think we're running out of good names for things, especially since Spark
uses Snappy (compression) itself.

It is time to build a tool on top of SnappyData called Spark.

------
dkoch
Looks very promising!

Any plans to expose a Python API in addition to the Scala API?

~~~
jagsr123
Yes. Hopefully soon.

