Show HN: SoS Notebook – Using multiple kernels in one Jupyter notebook (vatlab.github.io)
41 points by bpeng2000 10 months ago | 11 comments

Just a bit more background. We are bioinformaticians who routinely analyze large datasets using tools and libraries in many different languages. Jupyter Notebook supports a large number of kernels but it does not allow us to use multiple kernels in one notebook. As you can imagine, using multiple notebooks for an analysis has caused a lot of trouble in the book-keeping, sharing, and reproduction of our analyses.

SoS Notebook relaxes this restriction and allows us to use a different kernel for each cell of the notebook so that we can use the most appropriate language (tool, library, etc.) for each step of the analysis. More importantly, SoS Notebook provides a mechanism to transfer variables among live Jupyter kernels so that we can, for example, clean data in Python, analyze it in R or MATLAB, and plot the results in JS.

We have also tried to improve the Jupyter frontend to create a more comprehensive work environment for interactive data analysis. For example, SoS Notebook provides a side panel that allows you to execute cell content line by line using the shortcut Ctrl-Shift-Enter. It also provides magics to, for example, render output from any kernel in Markdown or HTML, and clear non-informative output after the execution of the cells. We are very excited about our work and would really love to get your feedback.

Nice work!

Maybe you should look at collaborating with BeakerX.


I personally prefer the legacy "beaker notebook" which did similar things in (I think) a nicer interface than Jupyter.

We are well aware of the Beaker Notebook and BeakerX and appreciate the great work they are doing. However, we decided to develop our own tool for a few specific reasons: we needed support for MATLAB and SAS, we needed a more powerful data exchange model for (almost) arbitrary data types, and, most importantly, we were creating an interactive data analysis environment backed by a powerful workflow engine, which is well beyond the scope of Beaker. The combination of SoS (workflow engine) and SoS Notebook is what makes SoS Notebook a powerful environment for (bioinformatics) data analysis.

I agree that all the other notebooks (e.g. BeakerX, Zeppelin, R Notebook) have nicer interfaces than Jupyter, but Jupyter excels in its simplicity, and JupyterLab (to which SoS Notebook will be ported eventually) is making great progress there. With the frontend enhancements that SoS Notebook provides, especially the line-by-line execution feature, we are pretty satisfied with the frontend and do not really miss the fancy frontends of other notebooks.

That's a really interesting idea about backing it with a workflow engine. I can't recall anything that did that before, though there are obviously plenty of workflow engines for Python and other languages. Definitely interesting to look at. Good luck!

Yes, multi-language notebooks solve the "multi-language" but not the "large-scale" problem of bioinformatics (or data science) data analysis. However powerful other notebook environments can be, they are rather limited if they can only execute notebooks on a single machine. However powerful other workflow systems can be, they are counterproductive if they require you to develop workflows in another environment and in another language. Backing SoS Notebook with the SoS workflow engine provides a single environment for both interactive data analysis and the development and execution of workflows. This topic is definitely worth a separate blog post, so I will just list a few features that SoS enables here:

1. The SoS language is extended from Python 3.6, which makes it an easy yet powerful workflow language.

2. Embedding workflows in SoS Notebooks allows you to annotate the workflows with detailed descriptions (markdown cells) and results (of demo runs).

3. Supports both forward (sequentially numbered) and makefile-style (pattern matching) workflows.

4. Execution signatures to avoid re-execution of long steps.

5. Magics to execute workflows in SoS Notebooks so you can, for example, execute cells of a notebook conditionally and repeatedly.

6. A task system that sends parts of workflows to a remote host (e.g. a more powerful workstation), a cluster (PBS/Torque/LSF/Slurm systems), or an RQ task queue, even if the remote systems are on different file systems (SoS automatically maps paths and synchronizes files).
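To make item 4 a bit more concrete, here is a minimal sketch of how an execution signature could work. This is not SoS's actual implementation: the function names `step_signature` and `run_step`, the in-memory `store` dictionary, and the choice of SHA-256 over the script text plus input file contents are all assumptions made for illustration.

```python
import hashlib


def step_signature(script, input_files):
    """Hash the step's script text together with the content of its input files."""
    digest = hashlib.sha256(script.encode("utf-8"))
    for path in sorted(input_files):
        # Include the path so that renaming an input also changes the signature.
        digest.update(path.encode("utf-8"))
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


def run_step(name, script, input_files, action, store):
    """Run `action` unless `store` already holds a matching signature for `name`.

    Returns True if the step was executed and False if it was skipped.
    """
    signature = step_signature(script, input_files)
    if store.get(name) == signature:
        return False  # nothing changed since the last successful run
    action()
    store[name] = signature  # recorded only after the action succeeded
    return True
```

Calling `run_step` twice with an unchanged script and unchanged inputs runs the action once; editing either the script or an input file changes the hash and triggers a re-run. A real system would persist the signatures to disk and would probably track outputs as well.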

As a newbie Jupyter user, I don't even understand all these complaints. I still think it is really cool to have a scriptable document. My biggest complaint is that when I need to tweak a graphic, I have to scroll up to see my modifications instead of getting direct feedback.

We are in the same boat, and that is why SoS Notebook allows you to execute parts of scripts in the side panel. Basically, you will need to select the graphic-generation part of the script and press Ctrl-Shift-Enter to see the output in the side panel.

SoS Notebook is not such a good name.

However, the idea of multi-kernel is useful.

I am going to have to try this out.

Can you explain the architecture of the SoS kernel?

Thank you for your comment on the usefulness of the multi-kernel setup of SoS Notebook. Please feel free to try SoS Notebook on our live SoS server (click the rocket button on the top right corner of the SoS homepage).

The architecture of the SoS kernel, if we ignore the SoS workflow engine part, is just a Python 3 kernel that sits between Jupyter and all the other kernels. The SoS kernel starts the subkernels, collects user inputs (interpolating them if needed), sends them to the subkernels, and collects and displays the outputs from the subkernels. Data exchange between subkernels is implemented by executing hidden statements in SoS and the subkernels, with assistance from SoS language modules for the supported languages. This simple setup introduces minimal changes for Jupyter users: they can use multiple languages in a SoS notebook, they can use the SoS kernel as a wrapper around their own kernel for an improved frontend, and they can enjoy all the tools that Jupyter provides (e.g. JupyterHub, templates, conversion tools) with SoS Notebook.
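As a rough illustration of that "kernel in the middle" design, the toy sketch below routes each cell to a lazily started subkernel. Everything in it is hypothetical: `EchoSubkernel` is a stand-in that only echoes code instead of talking to a real Jupyter kernel, `DispatchingKernel` and its methods are invented names, and interpolation with `str.format` is just one simple way to fill in variables, not necessarily what SoS does.

```python
class EchoSubkernel:
    """Stand-in for a real Jupyter kernel; it only records and echoes code."""

    def __init__(self, language):
        self.language = language
        self.received = []

    def execute(self, code):
        self.received.append(code)
        return f"[{self.language}] {code}"


class DispatchingKernel:
    """Sits between the frontend and the subkernels and forwards cell content."""

    def __init__(self, factory=EchoSubkernel):
        self.factory = factory   # how to start a subkernel for a language
        self.subkernels = {}     # language name -> running subkernel
        self.variables = {}      # variables available for interpolation

    def subkernel(self, language):
        # Start the subkernel on first use and reuse it afterwards.
        if language not in self.subkernels:
            self.subkernels[language] = self.factory(language)
        return self.subkernels[language]

    def run_cell(self, language, code, interpolate=False):
        # Optionally fill {name} placeholders before sending the code on.
        if interpolate:
            code = code.format(**self.variables)
        return self.subkernel(language).execute(code)
```

For example, after `kernel.variables["n"] = 3`, calling `kernel.run_cell("R", "rnorm({n})", interpolate=True)` forwards `rnorm(3)` to the R stand-in. A real implementation would exchange messages with actual kernels over the Jupyter messaging protocol rather than call a method directly.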

Finally, the name SoS Notebook comes from the fact that it is a frontend to the SoS (Script of Scripts) workflow engine. We know it does not sound great and places SoS well behind the real "SOS" in a Google search, but let us just hope that some day SoS will appear on the first page when you search for SoS. :-)

1) Is data exchange accomplished by packages such as `rpy` or `R.matlab`?

2) Will it support `JupyterLab`?

1). No. As explained in the post, the data "exchange" is accomplished by creating another variable of a similar type in the destination kernel, so the resulting variable is independent of the variable of the same name in the sending kernel. No third-party modules such as rpy or R.matlab are used.
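One way to picture "creating another variable of similar type in the destination kernel" is to turn a Python value into the text of an R assignment and run that text in the R kernel. The sketch below does only that much. The helper names `to_r_literal` and `r_assignment` are made up for this example, the type mapping (every Python list becomes an R `list(...)`) is a simplifying assumption, and the real SoS language modules are certainly more thorough.

```python
import json
import math


def to_r_literal(value):
    """Render a small subset of Python values as R source text."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return repr(value)
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("inf and nan are not handled in this sketch")
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # double-quoted string with backslash escapes
    if isinstance(value, (list, tuple)):
        return "list(" + ", ".join(to_r_literal(v) for v in value) + ")"
    if isinstance(value, dict):
        items = ", ".join(
            f"{json.dumps(str(k))} = {to_r_literal(v)}" for k, v in value.items()
        )
        return "list(" + items + ")"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def r_assignment(name, value):
    """Build the hidden statement that recreates `value` as `name` in R."""
    return f"{name} <- {to_r_literal(value)}"
```

The string returned by `r_assignment("params", {"k": 2.5})` would be sent to the R subkernel as a hidden statement; the R variable it creates is a copy, so later changes on the Python side do not affect it.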

2). That is certainly on our radar after JupyterLab matures and provides a stable API. The core of SoS Notebook (namely the data exchange model) should be easy to port but the entire frontend would need to be rewritten.
