Apache Zeppelin – Interactive analytics on Spark (apache.org)
183 points by sejun on Apr 30, 2015 | 33 comments

Interesting. I wonder how this compares to an IPython notebook?

While I haven't tried it, I imagine that you can use pyspark from an IPython notebook and have much of the same functionality as with Apache Zeppelin. Makes me wonder if it wouldn't have been better to just extend IPython notebooks instead of starting a new project.

Edit: How to run PySpark from an IPython notebook[1]

[1] http://ramhiser.com/2015/02/01/configuring-ipython-notebook-...
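For reference, a common approach at the time (the `SPARK_HOME` path is hypothetical; adjust to your installation) was to point PySpark's driver at IPython via environment variables:

```shell
# Tell the pyspark launcher to start the driver inside an IPython notebook
export SPARK_HOME=/opt/spark
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# Launching pyspark now opens a notebook server with `sc` predefined
$SPARK_HOME/bin/pyspark
```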

Also, IPython is moving towards splitting into the notebook part (Jupyter) and the Python part (IPython), so you can run any language in the notebook (like IHaskell).

Details are in IPython changelog:

> 3.x will be the last monolithic release of IPython, as the next release cycle will see the growing project split into its Python-specific and language-agnostic components. Language-agnostic projects (notebook, qtconsole, etc.) will move under the umbrella of the new Project Jupyter name



I'm so excited for this. The power of org-babel is the common 'org table' format that can be used to interchange data between different languages.

Has IPython thought about how to send data/state between steps without Python?

I'm one of the committers of Apache Zeppelin (incubating). Zeppelin is inspired by the IPython notebook and many other amazing pieces of software that have a notebook interface.

I know the IPython notebook has a long history and a large community, and I really like it. Zeppelin is a young, new project compared to the IPython notebook.

Both Zeppelin and the IPython notebook are open source. The IPython notebook is led by the IPython Development Team. Zeppelin is under the Apache Software Foundation and is being developed in the Apache way, from copyright and the development process to decision making and community development.

Zeppelin is focused on providing an analytical environment on top of the Hadoop ecosystem. I'm not sure about IPython's direction, but I don't think it's the same as Zeppelin's.

I see many projects that have a notebook interface: not only IPython and Zeppelin, but also Databricks Cloud, Spark Notebook, Beaker, and many others. I'm sure they all have their own advantages. I hope all of them end up beloved by users.


No one seems to mention that they're all imitating Mathematica, which is still going strong since 1988 with an amazing notebook facility.

I find it concerning when project developers don't seem to know exactly WHY the product exists or how it compares to competition on the market.

If I'm spending a lot of time working on a product that has clear competitors, I'd expect to know exactly why my product is going to be better than those competitors (i.e., in which ways), and what their weaknesses are.

A lot of open source developers go to great lengths to avoid admitting that anything competes with anything else. I don't know if it's just conflict avoidance, rationalizing failure, or what.

I could spend hours and hours explaining why Zeppelin is good and how its features differ from the others'.

For example, Zeppelin has a pluggable architecture, so its built-in Spark integration supports not only Scala but also Python. Zeppelin supports not only data exchange between the Scala and Python environments but also SparkContext sharing for better Spark cluster resource utilization. It has the ability to create rich, interactive analytics GUIs inside a notebook, a customizable layout system, and so on.
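A sketch of what that Scala/Python data exchange looks like inside one note, assuming Zeppelin's shared `ZeppelinContext` object (exposed as `z` in both interpreters; the values here are illustrative):

```
%spark
// Scala paragraph: compute something and publish it to the shared context
val total = sc.parallelize(1 to 100).count()
z.put("total", total)

%pyspark
# Python paragraph in the same note, sharing the same SparkContext:
print(z.get("total"))
```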

But I wanted to address a more fundamental difference. Most competitors are open source projects, and for an open source project, how the community works and what direction the project takes are the most important things. That will make a huge difference in the end. And I think I explained and compared those.

Hi leemoonsoo,

Great job!! This is exactly what I was looking for. How do I add Python and R to this notebook? Thanks in advance for your answer.

Yes, you can use IPython with Spark via PySpark. Spark Notebook and Zeppelin let you use Scala with Spark and support all of the other Spark libraries. There are other projects like spark-kernel (https://github.com/ibm-et/spark-kernel) that, I believe, let you use IPython (Jupyter) with Spark's Scala API.

Hilarious that I've tried this one but not the IPython one.

It's actually quite nice and very promising, but still a little raw and rough around the edges. It's just great to see an open source project competing with the commercial ones that seem to be popping up all over the place. And it's very modular, so it could be used for more than just Spark-based jobs.

The size and growth of the Jupyter / IPython ecosystem is its biggest draw for me. For example, there are roughly 125,000 IPython notebooks on GitHub as of today from which I can learn.


Beaker is mostly compatible, and has an import feature, so you can open and run an IPython notebook as a Beaker Notebook.

Hi spot,

Is it possible to run Spark/Scala in Beaker?

Notebooks were all the rage in the '90s. I spent quite a few years (~15) with Mathematica and its "notebook". Of course, Maple, Mathcad, and Matlab all had the same thing. At some point, I wanted to write more readable and modular code that others could use (and that I could reuse), so I switched to using an IDE for Python, Java, and C. My workflow enables exploratory data analysis as well as algorithm development. The huge advantage of IDEs vs. REPLs is that in the end, it's easier to write code of a quality that is ready to be shipped off to a production server. I have watched the "Return of the Notebook" with some trepidation: yes, it allows quick iteration and a lower barrier to entry, but ultimately, if your goal is to create a software product, learning good coding skills in an IDE is a much better path.

> ultimately if your goal is to create a software product, learning good coding skills in an IDE is a much better path

IPython developer here. A lot of the use cases for notebooks are where you're not creating a software product. It's useful where the product you're really after is a scientific result, some plots, or the like, along with a description of how you got that. And where the product is a presentation, a demo, or documentation.

Of course, as you're doing that kind of thing, code that starts out in a notebook often becomes something you want to reuse, and you therefore move it out to an importable module. We are interested in ways to make that process more fluid.

This is one thing I struggle with. I usually start in a notebook and then eventually want/need to port to a module environment. It would be nice if there were some guidelines or support for this sort of transition. Any pointers?
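A minimal sketch of that notebook-to-module transition (the file and function names here are hypothetical): the stabilized cell body becomes a plain function in a module alongside the notebook, and the cell shrinks to an import plus a call.

```python
# analysis_utils.py -- logic lifted out of a notebook cell into a module
def normalize(values):
    """Scale a sequence of numbers into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# The notebook cell that used to hold this logic now just does:
#   from analysis_utils import normalize
#   normalize([3, 6, 9])
```

The payoff is that the function is now importable, testable, and reusable outside the notebook, while the notebook keeps only the narrative and the call.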

I must admit I now need to go figure out what a "notebook" is in this context. I don't really get what Zeppelin -is-.

But I'm curious.

Think of it as an interactive notebook where you can not only write down notes, but also enter data and code and interact with it. You can then draw nice graphs of the data and explore it interactively.

The nice thing about it is that it is all repeatable and that you can share it with other people/researchers.

A good demo would be to watch https://www.youtube.com/watch?v=lO7LhVZrNwA. This was a real time demo at Spark Summit last year.

Note: The demo gets pretty interesting at https://youtu.be/lO7LhVZrNwA?t=36m50s. The presenter handles the situation very well afterwards!

Just to be clear, though that video shows an example of what a notebook is in this context, that isn't a demo of Zeppelin. (It wasn't clear which you were referencing in your comment.)

That's a demo of Databricks Cloud [0], which is Databricks's product offering.

Among the many things it offers is an interactive notebook similar to open source alternatives like Zeppelin.

[0] https://databricks.com/product/databricks-cloud

A notebook in this context is much like the notebook concept in IPython.

Basically, it is a literate program that allows you to intersperse code, documentation and visualizations in the same "file" called a notebook.

If you are familiar with how Mathematica works, it's basically the same thing.

If you do data science at all, or even get tired of working in Excel, I recommend picking up IPython and matplotlib.

The addition of angularjs is going to power some really nice dashboards. Thanks for the feature and the detailed demos (and comments on github). Looking forward to playing with this a lot!

Notebook interfaces are proliferating.

Here's the one I work on: http://BeakerNotebook.com

Note that we are hiring, especially front-end engineers. Work on open source full-time in NYC.

This is really very nice! For me, it seems like it might scratch an itch for which I have had to rely on RStudio (which is also awesome for iterative exploration and display, but a "heavy" context switch if R is not your favorite language).

Wow, now this looks useful! The ability to pass data between Python and R seamlessly? I'm putting this on my list to check out. Thanks!

I have no idea about Zeppelin's lineage, but looks like there's also Spark Notebook https://github.com/andypetrella/spark-notebook which more closely resembles the IPython notebook. I'd love to hear an explanation of the differences between all of these notebooks.

Can't wait to see if the Jupyter split will contribute to a consolidation or proliferation in the notebook-verse...

I've been following Zeppelin for a while and have been actively using it for a couple of weeks (using Scala and SparkSQL/Dataframes), and although there are a few rough edges, it has been a godsend for data exploration, analysis, and feature extraction. If you're working with Spark, I highly recommend giving it a try.

Hi all of you,

I want to have a notebook where I can run

   - Spark/Scala (like in Zeppelin)
   - Python (like in IPython/Jupyter)
   - R

How can I do this?

I've yet to get around to trying it, but there is beakernotebook.com, which claims to be basically IPython but with support for multiple languages within the same notebook. It looks really neat.

Looks a lot like the Databricks Cloud notebook. Looking forward to taking it for a spin.

Let's just say that this was out before Databricks Cloud, it was open source, and elements of it were very similar to the demo shown at Spark Summit in July 2014.
