
Apache Zeppelin – Interactive analytics on Spark - sejun
http://zeppelin.incubator.apache.org/
======
rkrzr
Interesting. I wonder how this compares to an IPython notebook?

While I haven't tried it, I imagine that you can use PySpark from an IPython
notebook and have much of the same functionality as with Apache Zeppelin.
Makes me wonder if it wouldn't have been better to just extend IPython
notebooks instead of starting a new project.

Edit: How to run PySpark from an IPython notebook[1]

[1] [http://ramhiser.com/2015/02/01/configuring-ipython-notebook-...](http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/)

~~~
leemoonsoo
I'm one of the committers of Apache Zeppelin (incubating). Zeppelin is
inspired by the IPython notebook and many other amazing pieces of software that
have a notebook interface.

I know the IPython notebook has a long history and a large community, and I
really like it. Zeppelin is a young, new project compared to the IPython
notebook.

Both Zeppelin and the IPython notebook are open source. The IPython notebook is
led by the IPython Development Team. Zeppelin is under the Apache Software
Foundation and is being developed the Apache way, from copyright, development
process, and decision making to community development.

Zeppelin is focused on providing an analytical environment on top of the
Hadoop ecosystem. I'm not sure about IPython's direction, but I don't think
it's the same as Zeppelin's.

I see many projects that have a notebook interface. Not only IPython and
Zeppelin, but also Databricks Cloud, Spark Notebook, Beaker and many others.
I'm sure they all have their own advantages. I hope to see all of them beloved
by users.

Cheers

~~~
sbenario
I find it concerning when project developers don't seem to know exactly WHY
the product exists or how it compares to competition on the market.

If I'm spending a lot of time working on a product that has a clear
competitor(s), I'd expect to know exactly why my product is going to be better
than those competitors (i.e., in which ways), and what their weaknesses are.

~~~
wmf
A lot of open source developers go to great lengths to avoid admitting that
anything competes with anything else. I don't know if it's just conflict
avoidance, rationalizing failure, or what.

~~~
leemoonsoo
I could spend hours and hours explaining why Zeppelin is good and which
features differ from the others.

For example, Zeppelin has a pluggable architecture, so it supports not only
Scala but also Python, with built-in Spark integration. Zeppelin supports not
only data exchange between the Scala and Python environments but also
SparkContext sharing, for better Spark cluster resource utilization. It has the
ability to create rich interactive analytics GUIs inside the notebook. It has a
customizable layout system, and so on.

But I wanted to address a more fundamental difference. Most competitors are
open source projects. And for an open source project, how the community works
and the project's direction are the most important things. That will make a
huge difference in the end. And I think I explained and compared them.

------
po84
The size and growth of the Jupyter / IPython ecosystem is its biggest draw for
me. For example, there are roughly 125,000 IPython notebooks on GitHub as of
today from which I can learn.

[http://nbviewer.ipython.org/gist/parente/facb555dfbae28e817e...](http://nbviewer.ipython.org/gist/parente/facb555dfbae28e817e0)

~~~
spot
Beaker is mostly compatible, and has an import feature, so you can open and
run an IPython notebook as a Beaker Notebook.

~~~
bourbe
Hi spot,

Is it possible to run Spark/Scala in Beaker?

Regards

------
mbrzusto
Notebooks were all the rage in the '90s. I spent quite a few years (~15) with
Mathematica and its "notebook". Of course, Maple, Mathcad and Matlab all had
the same thing. At some point, I wanted to write more readable and modular
code that others could use (and that I could reuse), so I switched to using an
IDE for Python, Java and C. My workflow enables exploratory data analysis as
well as algorithm development. The huge advantage of IDEs vs. REPLs is that
in the end, it's easier to write code of the quality that is ready to be
shipped off to a production server. I have watched the "Return of the
Notebook" with some trepidation: yes, it allows quick iteration and a lower
barrier to entry, but ultimately if your goal is to create a software product,
learning good coding skills in an IDE is a much better path.

~~~
takluyver
> ultimately if your goal is to create a software product, learning good
> coding skills in an IDE is a much better path

IPython developer here. A lot of the use cases for notebooks are where you're
not creating a software product. It's useful where the product you're really
after is a scientific result, some plots, or the like, along with a
description of how you got that. And where the product is a presentation, a
demo, or documentation.

Of course, as you're doing that kind of thing, code that starts out in a
notebook often becomes something you want to reuse, and you therefore move it
out to an importable module. We are interested in ways to make that process
more fluid.
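As a minimal sketch of that process (the module name `analysis_utils.py` and the functions `clean` and `summarize` are hypothetical, for illustration only), cell code can be moved into plain functions in a module, with a `__main__` guard so the same file also runs as a script:

```python
"""analysis_utils.py: hypothetical module extracted from notebook cells."""

from statistics import mean


def clean(values):
    """Drop missing entries (None) that a notebook cell might have filtered inline."""
    return [v for v in values if v is not None]


def summarize(values):
    """Return (count, mean) of the cleaned values."""
    cleaned = clean(values)
    return len(cleaned), mean(cleaned)


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    print(summarize([1.0, None, 3.0]))
```

In the notebook, `from analysis_utils import summarize` then replaces the original cells. IPython's `autoreload` extension (`%load_ext autoreload`, then `%autoreload 2`) re-imports the module after it is edited, so the notebook and the module can evolve side by side.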

~~~
pgroth
This is one thing I struggle with. I usually start in a notebook and then
eventually want/need to port to a module environment. It would be nice if
there were some guidelines or support for this sort of transition. Any
pointers?

------
thebouv
I must admit I now need to go figure out what a "notebook" is in this context.
I don't really get what Zeppelin -is-.

But I'm curious.

~~~
ujjwal_wadhawan
For a good demo, watch
[https://www.youtube.com/watch?v=lO7LhVZrNwA](https://www.youtube.com/watch?v=lO7LhVZrNwA).
This was a real-time demo at Spark Summit last year.

Note: The demo gets pretty interesting at
[https://youtu.be/lO7LhVZrNwA?t=36m50s](https://youtu.be/lO7LhVZrNwA?t=36m50s).
The presenter handles the situation very well afterwards!

~~~
nchammas
Just to be clear, though that video shows an example of what a notebook is in
this context, that isn't a demo of Zeppelin. (It wasn't clear which you were
referencing in your comment.)

That's a demo of Databricks Cloud [0], which is Databricks's product offering.

Among the many things it offers is an interactive notebook that looks similar
to open source alternatives such as Zeppelin.

[0] [https://databricks.com/product/databricks-cloud](https://databricks.com/product/databricks-cloud)

------
leemoonsoo
For anyone interested, here are some videos of Apache Zeppelin
(incubating):

[https://www.youtube.com/watch?v=QdjZyOkcG_w](https://www.youtube.com/watch?v=QdjZyOkcG_w)

[https://www.youtube.com/watch?v=_PQbVH_aO5E&feature=youtu.be](https://www.youtube.com/watch?v=_PQbVH_aO5E&feature=youtu.be)

[https://www.youtube.com/watch?v=cWuWUvWVLx4](https://www.youtube.com/watch?v=cWuWUvWVLx4)

[https://www.youtube.com/watch?v=xU5TBS_MsAs](https://www.youtube.com/watch?v=xU5TBS_MsAs)

~~~
viklas
The addition of AngularJS is going to power some really nice dashboards.
Thanks for the feature and the detailed demos (and comments on github).
Looking forward to playing with this a lot!

------
spot
Notebook interfaces are proliferating.

Here's the one I work on:
[http://BeakerNotebook.com](http://BeakerNotebook.com)

Note that we are hiring, especially front-end engineers. Do open source
full-time in NYC.

~~~
mhuffman
This is really very nice! For me, it seems like it might scratch an itch for
which I have had to rely on RStudio (which is also awesome for iterative
exploration and display, but a "heavy" context switch if R is not your
favorite language).

------
wcbeard10
I have no idea about Zeppelin's lineage, but it looks like there's also Spark
Notebook [https://github.com/andypetrella/spark-notebook](https://github.com/andypetrella/spark-notebook),
which more closely resembles the IPython notebook. I'd love to hear an
explanation of the differences between all of these notebooks.

Can't wait to see if the Jupyter split will contribute to a consolidation or
proliferation in the notebook-verse...

------
tperrigo
I've been following Zeppelin for a while and have been actively using it for a
couple of weeks (using Scala and SparkSQL/Dataframes), and although there are
a few rough edges, it has been a godsend for data exploration, analysis, and
feature extraction. If you're working with Spark, I highly recommend giving it
a try.

------
bourbe
Hi all of you,

I want to have a notebook where I can run:

- Spark/Scala (like in Zeppelin)
- Python (like in IPython/Jupyter)
- R

How can I do this?

~~~
dagw
I've yet to get around to trying it, but there is beakernotebook.com which
claims to be basically iPython, but with support for multiple languages within
the same notebook. It looks really neat.

------
__database__
Looks a lot like the Databricks Cloud notebook. Looking forward to taking it
for a spin.

~~~
jamesblonde
Let's just say that this was out before Databricks Cloud, it was open source,
and elements of it were very similar to the demo shown at Spark Summit in July
2014.

