
Show HN: SoS Notebook – Using multiple kernels in one Jupyter notebook - bpeng2000
https://vatlab.github.io/blog/post/sos-notebook/
======
bpeng2000
Just a bit more background. We are bioinformaticians who routinely analyze
large datasets using tools and libraries in many different languages. Jupyter
Notebook supports a large number of kernels but it does not allow us to use
multiple kernels in one notebook. As you can imagine, using multiple notebooks
for an analysis has caused a lot of trouble in the book-keeping, sharing, and
reproduction of our analyses.

SoS Notebook relaxes this restriction and allows us to use a different kernel
for each cell of the notebook so that we can use the most appropriate language
(tool, library etc) for each step of the analysis. More importantly, SoS
Notebook provides a mechanism to transfer variables among live Jupyter kernels
so that we can, for example, clean data in Python, analyze them in R or
MATLAB, and plot the results in JS.

We have also tried to improve the Jupyter frontend to create a more
comprehensive work environment for interactive data analysis. For example, SoS
Notebook provides a side panel that allows you to execute cell content line-
by-line using shortcut Ctrl-Shift-Enter. It also provides magics to, for
example, render output from any kernel in Markdown or HTML, and clear non-
informative output after the execution of the cells. We are very excited about
our work and would really love to get your feedback.

------
zmmmmm
Nice work!

Maybe you should look at collaborating with BeakerX:

[http://beakerx.com/](http://beakerx.com/)

I personally prefer the legacy "beaker notebook" which did similar things in
(I think) a nicer interface than Jupyter.

~~~
bpeng2000
We are well aware of the Beaker Notebook and BeakerX and appreciate the great
work they are doing. However, we decided to develop our own tool for a few
specific reasons: we needed support for MATLAB and SAS, we needed a
more powerful data exchange model for (almost) arbitrary data types, and most
importantly, we were creating an interactive data analysis environment backed
by a powerful workflow engine, which is well beyond the scope of Beaker. The
combination of SoS (workflow engine) and SoS Notebook is what makes SoS
Notebook a powerful environment for (bioinformatics) data analysis.

I agree that all other notebooks (e.g. BeakerX, Zeppelin, R Notebook) have
nicer interfaces than Jupyter, but Jupyter excels in its simplicity and
JupyterLab (to which SoS Notebook will be ported eventually) is making great
progress there. With the frontend enhancements that SoS Notebook provides,
especially the line-by-line execution feature, we are pretty satisfied with
the frontend and do not really miss the fancy frontend of other notebooks.

~~~
zmmmmm
That's a really interesting idea about backing it with a workflow engine. I
can't recall something that did that before - though there are obviously
plenty of workflow engines for python and other languages. Definitely
interesting to look at. Good luck!

~~~
bpeng2000
Yes, multi-language notebooks solve the "multi-language" but not the "large-
scale" problems with bioinformatics (or data science) data analysis. However
powerful other notebook environments can be, they are rather limited if they
can only execute the notebooks on a single machine. However powerful other
workflow systems can be, they are counterproductive if they require you to
develop workflows in another environment and in another language. Backing
SoS Notebook with the SoS workflow engine provides a single environment for
both interactive data analysis and the development and execution of workflows.
This topic definitely deserves a separate blog post, so I will just list a few
features that SoS enables here:

1\. Extended from Python 3.6 to make SoS an easy and yet powerful workflow
language.

2\. Embedding workflows in SoS Notebooks allows you to annotate the workflows
with detailed descriptions (markdown cells) and results (of demo runs).

3\. Supports both forward (sequentially numbered) and makefile-style (pattern
matching) workflows.

4\. Execution signatures to avoid re-execution of long steps.

5\. Magics to execute workflows in SoS Notebooks so you can, for example,
execute cells of a notebook conditionally and repeatedly.

6\. A task system that sends parts of workflows to a remote host (a more
powerful workstation), a cluster (PBS/Torque/LSF/Slurm systems), or an RQ task
queue, even if the remote systems are on different file systems (SoS
automatically maps paths and synchronizes files).
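
To illustrate item 4 above, here is a minimal sketch of how an execution signature could let a workflow skip a step that has already run with the same script and inputs. It is an assumption-laden illustration, not SoS's actual implementation: the names `step_signature` and `run_step` and the JSON signature store are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def step_signature(script, input_files):
    """Hash the step's script together with the names and content of its inputs."""
    digest = hashlib.sha256(script.encode("utf-8"))
    for name in sorted(input_files):
        digest.update(name.encode("utf-8"))
        digest.update(Path(name).read_bytes())
    return digest.hexdigest()


def run_step(script, input_files, action, store):
    """Run `action` unless an identical signature is recorded; return True if it ran.

    `store` is a Path to a JSON file holding the signatures seen so far
    (a hypothetical format chosen only for this sketch).
    """
    signature = step_signature(script, input_files)
    seen = set(json.loads(store.read_text())) if store.exists() else set()
    if signature in seen:
        return False  # same script and same inputs: skip the long step
    action()
    seen.add(signature)
    store.write_text(json.dumps(sorted(seen)))
    return True
```

Hashing file content rather than modification times makes the check robust to files being copied between machines, at the cost of reading every input on each run.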

------
neves
As a newbie Jupyter user, I don't even understand all these complaints. I still
think it is really cool to have a scriptable document. My greatest complaint
is just that when I need to tweak a graphic, I have to scroll up to see my
modifications instead of having direct feedback.

~~~
bpeng2000
We are in the same boat, and that is why SoS Notebook allows you to execute
parts of scripts in the side panel. Basically, you will need to select the
graphic-generation part of the script and press Ctrl-Shift-Enter to see the
output in the side panel.

------
SimplyUseless
SoS Notebook as a name is not so good.

However the idea of multi-kernel is useful.

I am going to have to try this out.

Can you explain the architecture of the SoS kernel?

~~~
bpeng2000
Thank you for your comment on the usefulness of the multi-kernel setup of SoS
Notebook. Please feel free to try SoS Notebook on our live SoS server (click
the rocket button on the top right corner of the SoS homepage).

The architecture of the SoS kernel, if we ignore the SoS workflow engine part,
is just a Python3 kernel that sits between Jupyter and all other kernels. The
SoS kernel starts the subkernels, collects user inputs (interpolating them if
needed), sends them to the subkernels, and collects and displays outputs from
the subkernels. Data exchange between subkernels is implemented by executing
hidden statements in SoS and subkernels, with assistance from SoS language
modules for supported languages. This simple setup introduces minimal changes
to Jupyter users: they can use multiple languages in a SoS notebook but can
also use the SoS kernel as a wrapper to their kernel for an improved frontend,
and they can enjoy all the tools that Jupyter provides (e.g. JupyterHub,
template, conversion tools) with SoS Notebook.
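
The "kernel that sits between Jupyter and all other kernels" idea can be sketched roughly as below. This is a toy model under stated assumptions, not the SoS kernel's code: `DispatchingKernel` is a hypothetical class, subkernels are stood in for by plain Python callables rather than real Jupyter kernels, and `$name` placeholders (via `string.Template`) are used only to illustrate interpolation.

```python
from string import Template


class DispatchingKernel:
    """Toy stand-in for a kernel that forwards each cell to a named subkernel."""

    def __init__(self):
        self._launchers = {}  # subkernel name -> function that starts it
        self._running = {}    # subkernel name -> callable that executes code
        self.namespace = {}   # variables available for interpolation

    def register(self, name, launcher):
        """Remember how to start a subkernel; it is started lazily on first use."""
        self._launchers[name] = launcher

    def execute(self, kernel_name, cell):
        """Start the subkernel if needed, interpolate the cell, and run it."""
        if kernel_name not in self._running:
            self._running[kernel_name] = self._launchers[kernel_name]()
        # Replace $name placeholders; unknown placeholders are left untouched.
        code = Template(cell).safe_substitute(self.namespace)
        return self._running[kernel_name](code)


# Usage with a fake "R" subkernel that only echoes the code it receives.
kernel = DispatchingKernel()
kernel.register("R", lambda: (lambda code: "[R] " + code))
kernel.namespace["n"] = 10
print(kernel.execute("R", "rnorm($n)"))
```

A real implementation would talk to each subkernel over the Jupyter messaging protocol and relay its output messages back to the frontend; the callables here only mark where that communication would happen.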

Finally, the name SoS Notebook comes from the fact that it is a frontend to
the SoS (Script of Scripts) workflow engine. We know it does not sound great
and places SoS well behind the real "SOS" in a Google search, but let us just
hope that some day SoS will appear on the first page when you search for SoS.
:-)

------
yhat
1) Is data exchange accomplished by packages such as `rpy` or `R.matlab`?

2) Will it support `JupyterLab`?

~~~
bpeng2000
1). No. As explained in the post, the data "exchange" is accomplished by
creating another variable of similar type in the destination kernel so the
resulting variable is independent of the homonymous variable in the sending
kernel. No third-party modules such as rpy or R.matlab are used.
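
One way to picture "creating another variable of similar type in the destination kernel" is to render the value as source code in the target language and evaluate that as a hidden statement. The sketch below does this for a few Python types going to R. It is only an illustration under assumptions: `to_r_expr` and `transfer_statement` are hypothetical names, and SoS's real language modules cover far more types than this.

```python
def to_r_expr(value):
    """Render a Python value as R source text that rebuilds a similar value.

    Limitations of this sketch: nan/inf are not handled, nested lists are
    flattened by R's c(), and dict keys are assumed to be valid R names.
    """
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(value, (list, tuple)):
        return "c(" + ", ".join(to_r_expr(v) for v in value) + ")"
    if isinstance(value, dict):
        items = ", ".join(f"{k} = {to_r_expr(v)}" for k, v in value.items())
        return "list(" + items + ")"
    raise TypeError(f"no R representation for {type(value).__name__}")


def transfer_statement(name, value):
    """Build the hidden assignment that a destination R kernel would evaluate."""
    return f"{name} <- {to_r_expr(value)}"
```

Because the destination kernel evaluates a fresh assignment, the new variable is a copy: later changes on either side do not affect the other, which matches the independence described in the comment above.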

2). That is certainly on our radar after JupyterLab matures and provides a
stable API. The core of SoS Notebook (namely the data exchange model) should
be easy to port but the entire frontend would need to be rewritten.

