While I haven't tried it, I imagine that you can use pyspark from an IPython notebook and have much of the same functionality as with Apache Zeppelin. Makes me wonder if it wouldn't have been better to just extend IPython notebooks instead of starting a new project.
Edit: How to run PySpark from an IPython notebook
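A minimal sketch of one common way to do this, assuming a local Spark install with `SPARK_HOME` set. The helper names `pyspark_paths` and `enable_pyspark` are hypothetical (mine, not part of Spark or IPython), and I have not run this against a real cluster:

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the entries that need to be on sys.path for `import pyspark`."""
    python_dir = os.path.join(spark_home, "python")
    # Spark ships Py4J as a source zip under python/lib; the version in the
    # file name differs between Spark releases, so match it with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def enable_pyspark(spark_home=None):
    """Put Spark's Python bindings on sys.path (hypothetical helper)."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")
    for path in pyspark_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)


# In a notebook cell, on a machine with Spark installed, something like:
#   enable_pyspark()
#   from pyspark import SparkContext
#   sc = SparkContext("local[2]", "notebook")
#   sc.parallelize(range(10)).sum()
```

The point is only that the notebook kernel is a normal Python process, so once `pyspark` is importable a `SparkContext` can be created in a cell like anywhere else.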
Details are in the IPython changelog:
> 3.x will be the last monolithic release of IPython, as the next release cycle will see the growing project split into its Python-specific and language-agnostic components. Language-agnostic projects (notebook, qtconsole, etc.) will move under the umbrella of the new Project Jupyter name
Has IPython thought about how to send data / state between steps without Python?
I know IPython notebook has a long history and a large community, and I really like it. Zeppelin is a young, new project compared to IPython notebook.
Zeppelin and IPython notebook are both open source. IPython notebook is led by the IPython Development Team. Zeppelin is under the Apache Software Foundation and is being developed the Apache way, from copyrights, development process, and decision-making process to community development.
Zeppelin focuses on providing an analytical environment on top of the Hadoop ecosystem. I'm not sure about IPython's direction, but I don't think it's the same as Zeppelin's.
I see many projects that have a notebook interface: not only IPython and Zeppelin, but also Databricks Cloud, Spark Notebook, Beaker and many others.
I'm sure they all have their own advantages. I hope to see all of them loved by their users.
If I'm spending a lot of time working on a product that has clear competitors, I'd expect to know exactly why my product is going to be better than those competitors (i.e., in which ways), and what their weaknesses are.
For example, Zeppelin has a pluggable architecture, so it supports not only Scala but also Python, with built-in Spark integration. Zeppelin supports not only data exchange between the Scala and Python environments but also SparkContext sharing for Spark cluster resource utilization. It can create rich interactive analytics GUIs inside the notebook. It has a customizable layout system, and so on.
But I wanted to address a more fundamental difference.
Most competitors are open source projects. And for an open source project, how the community works and the project's direction are the most important things. That will make a huge difference in the end. And I think I explained and compared them.
Great job! This is exactly what I was looking for. How do I add Python and R to this notebook?
It's actually quite nice and very promising but still a little raw and rough around the edges. It's just great to see an open source project to compete with the commercial ones that seem to be popping up all over the place. And it's very modular so it could be used for more than just Spark based jobs.
Is it possible to run Spark/Scala in Beaker?
IPython developer here. A lot of the use cases for notebooks are where you're not creating a software product. It's useful where the product you're really after is a scientific result, some plots, or the like, along with a description of how you got that. And where the product is a presentation, a demo, or documentation.
Of course, as you're doing that kind of thing, code that starts out in a notebook often becomes something you want to reuse, and you therefore move it out to an importable module. We are interested in ways to make that process more fluid.
But I'm curious.
The nice thing about it is that it is all repeatable and that you can share it with other people/researchers.
Note: The demo gets pretty interesting at https://youtu.be/lO7LhVZrNwA?t=36m50s. The presenter handles the situation very well afterwards!
That's a demo of Databricks Cloud, which is Databricks's product offering.
Among the many things it offers is an interactive notebook that resembles open source alternatives such as Zeppelin.
Basically, it is a literate program that allows you to intersperse code, documentation and visualizations in the same "file" called a notebook.
If you are familiar with how Mathematica works, it's basically the same thing.
If you do data science at all, or even get tired of working in Excel, I recommend picking up IPython and matplotlib.
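To make the "same file" idea concrete, here is a rough sketch of what such a notebook looks like on disk, assuming the version 4 JSON layout used by IPython 3 / Jupyter (field names written from memory, so treat them as an assumption; other tools such as Zeppelin or Beaker store notebooks differently). The helpers `markdown_cell` and `code_cell` are hypothetical names for illustration:

```python
import json


def markdown_cell(text):
    """A documentation cell: prose rendered as Markdown."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A code cell: source plus the outputs (text, plots) produced by running it."""
    return {
        "cell_type": "code",
        "execution_count": None,  # not executed yet
        "metadata": {},
        "outputs": [],            # filled in by the kernel when the cell runs
        "source": source,
    }


# Documentation and code are interleaved as an ordered list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        markdown_cell("# Sine wave\nPlot one period of sin(x)."),
        code_cell(
            "import numpy as np\n"
            "import matplotlib.pyplot as plt\n"
            "x = np.linspace(0, 2 * np.pi, 100)\n"
            "plt.plot(x, np.sin(x))"
        ),
    ],
}

# The whole notebook is just one JSON document.
serialized = json.dumps(notebook, indent=1)
```

Saving `serialized` to a file with an `.ipynb` extension should give something the notebook server can open, with the plot stored in the code cell's `outputs` after it is run.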
Here's the one I work on: http://BeakerNotebook.com
Note that we are hiring, especially front-end engineers. Do open source full-time in NYC.
Can't wait to see if the Jupyter split will contribute to a consolidation or proliferation in the notebook-verse...
I want to have a notebook where I can run
- Spark/Scala (like in Zeppelin)
- Python (like in IPython/Jupyter)