SnappyData: OLTP and OLAP Database Built on Apache Spark (github.com/snappydatainc)
81 points by plamb on Feb 3, 2016 | 20 comments



Interesting and much needed!

A few questions, if the maintainers/owners read this:

1. What is the license? It doesn't seem commercial-friendly at the moment, but I couldn't tell if it's GPL/AGPL or something else.

2. What are your plans for supporting network/graph data structures, graph analytics, etc.?

3. Visualisation plans?

4. Is the target to become a fully fledged BI tool? To compete with the likes of Tableau/Spotfire, only on top of Big Data? (If so, I'm very interested!)


1> Most of it is straight-up Apache 2.0 licensed. The only piece that is not Apache licensed is the approximate query processing component, which is closed source right now. We intend to build out a community, and that means much of what we do has to be open source.

2> We have spliced a database into Spark such that Spark and Snappy share the same memory space, so everything that Spark supports will be supported. The data can be accessed as DataFrames through the Spark API (and we will be 100% compatible).
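If the integration works as described, a Snappy-managed table should behave like any other Spark DataFrame. A minimal sketch of what that could look like (the `snappyContext` value, the "trades" table, and its columns are illustrative assumptions, not taken from the project docs):

```scala
// Hypothetical: query a table managed by Snappy through the ordinary
// Spark 1.x DataFrame API, no separate driver or data copy involved.
val trades = snappyContext.table("trades")      // Snappy-managed table

trades
  .filter(trades("price") > 100)                // standard DataFrame ops
  .groupBy("symbol")
  .count()
  .show()
```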

3> Absolutely, in upcoming releases: both from a data standpoint and from a management, monitoring, and security standpoint.

4> I think becoming a full-fledged BI tool is not on the cards. Think of this as an enterprise-class, real-time operational analytics platform. But we intend to integrate and work closely with the leading tool vendors (the AQP piece in particular is great for aggregate-class visualizations).


I'm an employee of SnappyData; just letting you know I'm going to make sure these questions get answered.


Looks very interesting!

Would you be interested in helping us write a SnappyData connector for Metabase? github.com/metabase/metabase


Metabase looks very interesting. We will go through it in more detail. Can you point to any docs on what it takes to create a connector? Note that SnappyData has two straightforward entry points: SQL (e.g. a JDBC connector) or the Spark APIs. It looks like the SQL connector route is the simplest?
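For the SQL route, a connector would presumably just talk to the cluster over JDBC. A rough sketch, assuming a GemFireXD-style driver; the URL format, port, and table name here are illustrative guesses, not documented values:

```scala
import java.sql.DriverManager

// Hypothetical: open a JDBC connection to a SnappyData node and run
// plain SQL against it. Any JDBC-based tool could use the same path.
val conn = DriverManager.getConnection("jdbc:snappydata://localhost:1527/")
val stmt = conn.createStatement()
val rs   = stmt.executeQuery("SELECT COUNT(*) FROM trades")
while (rs.next()) println(rs.getInt(1))
conn.close()
```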


This can be great if it lives up to its promise.

I could not find any performance numbers from real-life scale deployments. Will have to wait before giving a verdict.


Hi hsshah, we talk about some benchmarking we did in our technical paper (page 10, "Experiments") http://www.snappydata.io/snappy-industrial. This is not a real-life scale deployment, but may be useful.


Why add Spark? We run Spark in prod and have to actively patch the codebase because there are so many ways to make it crash and burn. LDAP is a solved problem, why add Spark and instability to the mix?


Fundamentally, any enterprise today has to deal with OLTP data, OLAP data (transactional plus other sources), streaming, and finally machine learning. Our premise is that you can choose to use four different platforms, one for each, or move to a unified platform. Spark offers streaming, it offers Spark SQL, and there are a bunch of machine learning libraries available on Spark. Also, everyone who has anything to do with data has a connector to Spark, so it becomes a good data integration platform. The API is uniform across all of these, so it forms a good substrate for what we are trying to do. As for bugs: it is a platform that is growing rapidly and going through some growing pains, and over a period of time it will mature. We believe the core capabilities (Spark SQL, Streaming, etc.) will become more powerful over time. It is a somewhat opinionated choice, but one that we think will pan out.
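The "uniform API" claim is that the same DataFrame moves between SQL, the functional DataFrame API, and MLlib without format conversion. A small Spark 1.x sketch (the file path, table name, and columns are assumptions for illustration):

```scala
// One DataFrame flows through SQL and the DataFrame API in the same
// program; the result could feed an MLlib pipeline directly.
val events = sqlContext.read.json("events.json")    // batch or stream sink
events.registerTempTable("events")

val hourly = sqlContext.sql(
  "SELECT hour, COUNT(*) AS n FROM events GROUP BY hour")  // SQL entry point
val busy = hourly.filter(hourly("n") > 1000)               // DataFrame ops
// `busy` is still an ordinary DataFrame, usable by any Spark library.
```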


Also (correct me if I'm wrong), the stability of SnappyData will depend more on GemFireXD (related to Apache Geode), the in-memory database that has been integrated with Spark to form SnappyData, than it will depend on Spark. GemFire has been in development for over a decade and has a multitude of production use cases.


LDAP and OLAP are different things.


In what way is LDAP related to this project?

(We run Spark in production and don't have the issues you are seeing at all)


I wonder if they added Spark into the mix just for marketing purposes. Since it's born out of Pivotal, I guess they'll end up leaning on GemFire much more than Spark.


@superkk Ironically, we debated this extensively and concluded the reverse (primarily because we believe the Spark API is succinct and integration with the ecosystem will be easy): much more like Spark than GemFire. In fact, the API is pretty much Spark. For instance, to update data you just use the Spark DataFrame API. ( https://github.com/SnappyDataInc/snappydata#quick-start-with... )
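Concretely, that presumably means writes go through Spark's generic data source API rather than a GemFire-specific one. A hedged sketch (the "column" format name, table, and path are illustrative assumptions based on the quick start, not verified against the repo):

```scala
// Hypothetical: append new rows to a Snappy-managed columnar table
// using the standard Spark DataFrameWriter, not a GemFire API.
val updates = sqlContext.read.parquet("/data/new_trades")

updates.write
  .format("column")        // assumed name of the columnar table source
  .mode("append")
  .saveAsTable("trades")
```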


The beauty of open source is that we do not have to guess. Take a look at the source. GemFire offers some amazing capabilities and it would be foolish not to use them, but the integration of Spark into the platform is fairly deep, and we intend to contribute things back to Spark over time. But marketing is not complaining :)


Is the "snappy" in SnappyData inspired by, or a reference to, Snappy compression?


No, we were not referring to Snappy compression when we came up with the name.


I think we're running out of good names for things, especially since Spark uses Snappy (compression) itself.

It is time to build a tool on top of SnappyData called Spark.


Looks very promising!

Any plans to expose a Python API in addition to the Scala API?


Yes. Hopefully soon.

