
Why Jupyter is data scientists’ computational notebook of choice - sohkamyung
https://www.nature.com/articles/d41586-018-07196-1
======
makmanalp
Most of the complaints I hear about notebooks come, I think, from a
misunderstanding of what they're supposed to be. A notebook is a mashup between
a scientific paper and a REPL, so it's useful for a bit of both:

a) Just like with a paper, you can present scientific or mathematical ideas
with accompanying visualizations or simulations. From the REPL side, as a
bonus, you get interactivity, and the reader can pause and experiment with the
examples you're giving to improve their understanding or test their
hypotheses. If I change this variable, how will the system react? You can just
try it!

b) Just like with a REPL, you can type in and execute commands step by step,
viewing the output of the previous command instead of running the whole thing
at once. From the document side, as a bonus, you get nicer presentation
(charts, interactivity, nice and wide sortable tables, etc) than you would in
a shell, which comes in handy when doing things like data exploration or
mathematical simulation.

It's decidedly NOT there for you to type all your code in like an editor and
make a huge mess. It's apples and oranges compared to, and a poor substitute
for, something like PyCharm or VS Code or vim. It is there for you to a) try
things out yourself, so that whatever you discover hopefully eventually makes
it into proper Python modules, and b) make interesting ideas presentable and
explorable for others. That's all!

When I see stuff like "out-of-order execution is confusing", I don't disagree,
but it does make me wonder how long and convoluted these people's notebooks
are - probably ripe candidates for refactoring stuff out into Python modules
as functions. When I see stuff around notebooks for "reproducibility", I'm a
bit confused, in that notebooks often don't specify any guidance on
installation and dependencies, let alone things like the arguments and options
that a regular old script would. In that regard I think they're barely an
improvement over .py files lying around. When I hear "how do I import a
notebook like a Python module", I'm very, very scared.

Granted, I've seen huge notebooks that are a mess, so I understand the
frustration, but it's not like we all haven't seen the single file of code
with 5000 lines and 10 nested layers of conditionals at some point in our
lives.

~~~
jonnycomputer
If you have ever used an R Notebook written in R Markdown, then it's pretty
easy to see why Jupyter Notebooks putting everything in JSON is just...
infuriatingly wrong-headed. In an R Notebook, I can see my code, I can see my
text, everything is exceedingly simple to understand, and I can edit it in any
of the fantastic text editors out there (Jupyter's editor is not among them).

~~~
kortex
The main reason for json, I believe, is that the Jupyter client is separate
from the backend. It's actually pretty trivial to run the engine on a beefy
box while interacting on a light laptop (on the same subnet). With Jupyter Lab
and some fiddling, you can put the server anywhere.

It's also trivial to export notebooks to .py files.
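The standard tool for this is `jupyter nbconvert --to script notebook.ipynb`. As a rough sketch of what that does (a hypothetical stdlib-only helper, not nbconvert's actual implementation), the .ipynb JSON can be flattened like this:

```python
import json

def notebook_to_script(ipynb_path, py_path):
    """Rough sketch of `jupyter nbconvert --to script`: concatenate code
    cells into one .py file, turning markdown cells into comments."""
    with open(ipynb_path) as f:
        nb = json.load(f)
    chunks = []
    for cell in nb["cells"]:
        src = "".join(cell["source"])
        if cell["cell_type"] == "code":
            chunks.append(src)
        elif cell["cell_type"] == "markdown":
            chunks.append("\n".join("# " + line for line in src.splitlines()))
    with open(py_path, "w") as f:
        f.write("\n\n".join(chunks) + "\n")
```

Note this drops outputs and metadata entirely, which is exactly why the exported .py diffs so much better than the notebook itself.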

That said, my goodness do notebooks wreak havoc on git. I hope this in
particular gets fixed as popularity grows.

~~~
PurpleRamen
Having a proper server available is even more reason to use a proper file
format. The client doesn't care what the server handles, and the server
doesn't need to send raw data structures directly from storage.

Actually, fixing the file-format mess should be very simple. Just change the
file load/save functions. Use a folder structure with every cell being a
separate file. Or switch to XML. Or make a generic interface and allow saving
in whatever format people want. Saving notebooks in MongoDB or some SQL
database seems like a good goal for dedicated services.
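As a sketch of the folder-per-notebook idea (a hypothetical helper, not anything Jupyter ships), splitting the existing JSON into one file per cell only takes a few lines:

```python
import json
import os

def explode_notebook(ipynb_path, out_dir):
    """Write each cell of an .ipynb to its own numbered file, so that
    diffs, merges, and text editors operate on plain source."""
    with open(ipynb_path) as f:
        nb = json.load(f)
    os.makedirs(out_dir, exist_ok=True)
    for i, cell in enumerate(nb["cells"]):
        ext = "py" if cell["cell_type"] == "code" else "md"
        path = os.path.join(out_dir, "cell_%03d.%s" % (i, ext))
        with open(path, "w") as f:
            f.write("".join(cell["source"]))
```

The inverse (reassembling the JSON from the folder on load) would be the other half of the load/save swap being proposed.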

------
amirathi
Version control for Jupyter notebooks was one of the biggest complaints I had.
Specifically, diff and merge with the JSON files (.ipynb) are ugly.

I built ReviewNb[1] to solve one of those problems (diff). Note that there is
nbdime[2], which works well for local diff/merge. The idea behind ReviewNb is
to have much tighter integration with GitHub etc.

[1] [https://reviewnb.com](https://reviewnb.com)

[2]
[https://nbdime.readthedocs.io/en/latest/](https://nbdime.readthedocs.io/en/latest/)

~~~
dev_dull
I’m glad I’m not the only one. When I inherited some “production notebooks”
(if that’s a thing) I couldn’t believe it was nearly impossible to do basic
things such as test and review changes (via version control).

~~~
bitL
You don't use Jupyter notebooks in production; they are super useful for
pitching ideas to clients/bosses and doing some early prototyping. I feel
sorry for anyone that has to work with "pure data scientists" that have no
clue about software engineering practices...

~~~
brylie
FWIW, Netflix uses Jupyter notebooks in production, using nteract UI:

[https://medium.com/netflix-techblog/notebook-
innovation-591e...](https://medium.com/netflix-techblog/notebook-
innovation-591ee3221233)

[https://nteract.io/](https://nteract.io/)

This approach seems promising, particularly as it facilitates cross-
disciplinary collaboration.

------
zmmmmm
As with so many things Python-related (including Python itself), I am
perplexed by how willing people seem to be to fall in love with solutions that
have so many limitations and problems. I find Jupyter just barely usable. I
constantly have issues: editing in the cells, diagrams not sizing correctly,
cells accidentally displaying huge amounts of data and freezing my browser,
complete failure of autocompletion in many languages, a very awkward security
model involving manually cutting and pasting auth tokens around, and the near
impossibility of rendering a notebook into something reasonable like PDF (yes,
there are attempts at solutions; they are full of problems). Many limitations
derive directly from the architecture: the kernels are limited in what they
can do because language-specific parts have to be interpreted in the browser.

From my perspective, it's a dumpster fire - in 2018 there should be something
so much better than this. RStudio is a thousand times better but only does R.
I used to like Beaker Notebook but it gave up due to Jupyter's popularity and
converted itself into a bunch of Jupyter extensions which now have all of
Jupyter's limitations.

Yet despite all this I can see that there's this enormous community that loves
this and keeps developing and contributing to it.

~~~
gaius
_it's a dumpster fire - in 2018 there should be something so much better than
this_

In 1998 I was using a tool called MathCAD that provided a notebook interface
running as a plugin to MS Word. In 2018, Jupyter is still not as good as that.
Some things are just not meant to be webpages.

~~~
narwally
> Some things are just not meant to be webpages.

This is how I feel about most of the single page apps I've worked on.

------
rsivapr
Link to the deck from Joel Grus's talk that is mentioned in the article:
[https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-
dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1)

~~~
amirathi
Really good talk. Here's the video:
[https://www.youtube.com/watch?v=7jiPeIFXb6U](https://www.youtube.com/watch?v=7jiPeIFXb6U)

And all JupyterCon 2018 talks if anyone is interested:
[https://www.youtube.com/playlist?list=PL055Epbe6d5b572IRmYAH...](https://www.youtube.com/playlist?list=PL055Epbe6d5b572IRmYAHkUgcq3y6K3Ae)

------
azag0
I love Jupyter Notebook for experimenting and rapid creation of reports, but
dislike not being able to use my editor, and the intermingling of inputs and
outputs in a single file. So I'm working on an alternative frontend to Jupyter
kernels, heavily inspired by KnitR:
[https://github.com/azag0/knitj](https://github.com/azag0/knitj). It is still
being developed, but it's functional and I use it every day.

~~~
bodkan
This looks incredible! Does your project already support language kernels
other than the Python kernel?

I use R for 90% of my work, but most of it has been happening in Jupyter
notebooks (which I'm not a huge fan of, despite practically living in them for
the past 4 years of my life).

Thanks for sharing!

~~~
azag0
I have not tested it with anything other than the Python kernel, but it uses
Jupyter Client to communicate with the kernel, which is kernel agnostic. So
you should be able to just do “knitj -k <kernel name> ...”.

------
dangirsh
I've found ob-ipython [1] within Org Mode to be one of the best options for
interfacing with Jupyter. If you're sick of the limitations of working in a
browser, it's worth checking out.

Scimax version: [https://github.com/jkitchin/scimax/blob/master/scimax-
ipytho...](https://github.com/jkitchin/scimax/blob/master/scimax-ipython.org)

Video of Scimax version:
[https://www.youtube.com/watch?v=dMira3QsUdg](https://www.youtube.com/watch?v=dMira3QsUdg)

Previous HN discussion highlighting key features:
[https://news.ycombinator.com/item?id=17839926](https://news.ycombinator.com/item?id=17839926)

Some relevant blog posts:

\- [https://vxlabs.com/2017/11/24/getting-ob-ipython-to-show-
doc...](https://vxlabs.com/2017/11/24/getting-ob-ipython-to-show-
documentation-during-company-completion/)

\- [https://vxlabs.com/2017/11/30/run-code-on-remote-ipython-
ker...](https://vxlabs.com/2017/11/30/run-code-on-remote-ipython-kernels-with-
emacs-and-orgmode/)

\- [https://kozikow.com/2016/05/21/very-powerful-data-
analysis-e...](https://kozikow.com/2016/05/21/very-powerful-data-analysis-
environment-org-mode-with-ob-ipython/)

[1] [https://github.com/gregsexton/ob-
ipython](https://github.com/gregsexton/ob-ipython)

------
projectramo
Here are the issues with Jupyter, and most other flavors of notebook:

1\. variables have to be explicitly output

The most important tool for programming, for me, is the window that shows you
the current state of all the variables. When I step through a program, I look
at the state. 90% of my debugging solutions come from seeing that a variable
doesn't have the right state.

2\. Intellisense

For the love of god, I do not want to remember if it is len(), length(),
.len(), .length(), .size(), size(1) or whatever.

That's it. But those two are so big that I have to code and debug in Spyder
and then paste the code into the notebook. I feel sorry for newcomers who
think that all the debugging is happening in the notebook.

~~~
pavanagrawal123
Hey there! I'm trying to solve the IntelliSense issue: I'm building/improving
Jupyter Notebooks inside VSCode:
[https://github.com/pavanagrawal123/VSNotebooks](https://github.com/pavanagrawal123/VSNotebooks).
It's a fork of another extension somebody already built, but all activity
there is dead, so I'm starting up dev on an active fork. I'd love to hear any
feedback y'all have! :)

Also planning to add some nice debug features, plus hopefully integration into
the inbuilt VSCode debugger!

~~~
yodon
How do you see the idea of a vs code notebook comparing to or being different
from the goals of the hydrogen editor?

~~~
kylebarron
To me, my favorite part of the design of Hydrogen is that it's entirely
language agnostic, and can be used with _any_ Jupyter kernel.

~~~
pavanagrawal123
I'm planning on adding better language support in a couple of weeks! Don't
want to be limited to Python and R.

~~~
awake
One thing you may want to be aware of is that the Python language server used
by VSCode runs pylint, which performs static analysis on the code. Jupyter
Notebook, however, does autocomplete by actually introspecting the variables
as they are defined. This creates large differences when doing things such as
selecting a column in a pandas dataframe. In Jupyter, if you press tab on the
column name, it can autocomplete, and it also assumes you are getting a
Series, which leads to autocomplete on things like .min, .max, etc. With
pylint you don't get any of this, since it cannot statically determine the
column names, so you lose the IntelliSense.
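The difference can be sketched in plain Python (a toy stand-in, not pandas itself): runtime completion asks the live object what attributes it has, which is information a static analyzer reading only the class definition never sees:

```python
class ToyFrame:
    """Toy stand-in for a DataFrame: column names become attributes
    only at runtime, when the data is loaded."""

    def __init__(self, columns):
        for name, values in columns.items():
            setattr(self, name, values)

    def __dir__(self):
        # Jupyter's tab completion introspects the live object (via
        # dir() and friends), so dynamically created attributes show up.
        return sorted(set(super().__dir__()) | set(self.__dict__))

df = ToyFrame({"price": [1.0, 2.5], "name": ["a", "b"]})
# Runtime introspection sees the columns...
assert "price" in dir(df) and "name" in dir(df)
# ...but nothing in the class body mentions 'price', which is all a
# static analyzer like pylint gets to look at.
```

This is the gap any static-analysis-based notebook IntelliSense has to bridge, typically by falling back to runtime inspection of the kernel.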

~~~
pavanagrawal123
yep! this is something I will be focussing on in VSNotebooks. I noticed this
as a huge drawback to the current implementation, so I will be fixing it :)

------
cphoover
For people who prefer to code in JS, there is a similar application called
Observable notebooks that has recently come out:
[https://beta.observablehq.com/](https://beta.observablehq.com/)

It offers some nifty things including, _well_, observables, where cells of the
scratchpad can automatically update by observing changes from other cells.

~~~
th0ma5
More of an online service, though.

~~~
cphoover
true true.

------
wanderfowl
I like R for many things, but Python just keeps getting more compelling,
particularly given the excellent machine learning packages. As these sorts of
toolchain elements get better and better, and as more people realize that
there's a benefit to simultaneously training researchers to run code _as well
as_ stats, I suspect we'll start to see an exodus from pure R solutions.

The real question is when (and whether) new social scientist stats courses
will start teaching Python stats toolchains, rather than R. That seemed to be
an inflection point for R (as folks moved away from SAS), and could be for
stats-centric Python too.

~~~
thousandautumns
I'm not sure what about Jupyter makes Python more compelling in comparison to
R. R is entirely usable in Jupyter Notebooks, and R Notebooks are, in my
opinion, possibly superior to Jupyter notebooks in many ways.

> and as more people realize that there's a benefit to simultaneously training
> researchers to run code as well as stats, I suspect we'll start to see an
> exodus from pure R solutions

I'm not sure what you are saying here.

I would actually argue that most of the Python data science toolchain is years
behind what is available in R.

~~~
achompas
> I would actually argue that most of the Python data science toolchain is
> years behind what is available in R.

I do not want to litigate this on HN, but the problem with R is the toolchain
_around_ your data science work.

You've fit a model in R, and that's great! Now how do you get it into a real-
time system? Or how do you test the software you wrote to train the model?

------
glup
It's easy to grade student assignments in notebooks with
[https://github.com/jupyter/nbgrader](https://github.com/jupyter/nbgrader),
which also makes it great for teaching.

~~~
taeric
Meh. As long as you have defined deliverables between grader and student,
grading programming-based assignments is relatively easy. Coursera has been
around longer than Jupyter has been popular, after all. (And they aren't all
just multiple choice.)

Being interactive is what makes it good for teaching. But there are plenty of
interactive options. And for a certain class of teaching, it is not "on rails"
enough, so people will need a ramp-up period on Jupyter before they can really
get into their topic.

------
agibsonccc
The only thing that stops me from being able to use notebooks full time is
that their IntelliSense is horrible compared to what IDEs offer. I like being
able to use them for demos/presentations, but I can't imagine trying to code
primarily within one. Especially when it comes to tracking results.

How do people cope with this? Do you supplement it with other tools? I spend a
lot of my time in an IDE and then just paste some of the code into cells. That
seems easier.

~~~
zimablue
I do the opposite, my job is kind of bad data engineer/scientist/etl minion so
it's a lot of dataframes.

Work (and often debug) in jupyter -> open the notebook from pycharm when it's
got some completed thoughts and write into a python module + test module,
tidying up and adding type annotations.

Sometimes doing that multiple times so that the notebook is importing from
modules which were originally pulled out of the notebook.

It sucks having to use two tools but I don't think there's any one tool that
can do both as well as pycharm/jupyter, short of me getting a lot better at
emacs or writing a lot of custom Atom extensions (I think).

~~~
bunderbunder
I am very hopeful that JupyterLab will get support for the Language Server
Protocol sometime soon. That would make all the difference in the world for
me. I'd still have to use a terminal to build and run tests, but I wouldn't be
surprised if a test runner comes along fairly quickly after that.

(Relevant issue:
[https://github.com/jupyterlab/jupyterlab/issues/2163](https://github.com/jupyterlab/jupyterlab/issues/2163))

------
glup
I've switched largely to Jupyter / Python for computational linguistics /
psycholinguistics because of the pandas / numpy / numba stack, decent off-the-
shelf NLP (spaCy and gensim), and the ease of moving data into an R kernel for
specific analyses and plots. It's also nice that any reasonably sized notebook
will render on GitHub (and access can be controlled through the accounts
system until something is ready to be public).

One thing I haven't figured out how to do is generate fully styled LaTeX
manuscripts from notebooks (like papaja for RStudio). Is there a way to do
this with pandoc?

~~~
Quenty
Yes! Jupyter Notebook has an export-to-.tex option in its export menu. If you
install the right stuff, you can render the output on the server.

However, the output of this isn't nearly as well formed as a hand-written
LaTeX document.

~~~
macawfish
Yeah but I believe you can customize the export template, right?

------
ChuckMcM
I see Jupyter notebooks as the next step after spreadsheets, with a stronger
code foundation and support for different media. For me the interface doesn't
work as well as I would like, but I can see the potential.

My brother-in-law wants something like this for structural analysis reports,
where the code, data and report are all one thing that can be pulled out and
examined.

Watching the new iPad announcement today, I think this would make an excellent
iPad app as well.

~~~
huac
yeah, I have a use case now where we pull data from a database, manipulate it,
and then have a final table/csv/dataframe/whatever. The problem is then how to
share this with non-technical users. In an ideal world, this would get
inserted into a Google Sheet, and that sheet would just update daily after new
data is loaded into the database.

I'm pretty sure this is a use case which others have, and I'm curious what
people use to solve it. I've heard, variously, that some options are to use
Tableau or similar, or to email a CSV and ask the end user to import it into
Google Sheets/Excel.

~~~
jononor
Google Sheets has an API. So if putting the output there is the ideal, just
use that API from Python?

~~~
huac
yeah, the issue is that if you're inputting more than ~500 rows, you'll be
rate limited :|

------
johnminter
RStudio with RMarkdown is also popular. Both workflows are language agnostic.

~~~
cwyers
Putting aside the R vs Python question (as noted in this thread, you can use R
in a Jupyter notebook and Python in an RMarkdown notebook), I much prefer
RMarkdown notebooks. RMarkdown notebooks are plain text, so you can read them
easily in any text editor (which also means they play well with git, unlike
Jupyter notebooks).

And it's meant to work with the RStudio IDE, so I get a much more seamless
experience going between regular code and notebooks (although this is
admittedly a more R-centric benefit, at least until and unless RStudio adds
Python support outside of notebooks).

~~~
jonnycomputer
Using markdown for python notebooks makes so much sense. What was the thinking
behind encoding everything inside JSON?

------
blululu
IMO, Jupyter is nice for presenting the final results of research (like
LaTeX), but it is often not the right tool to get there. It's good for
professors who teach and publish, but bad for students who are learning and
researching.

Frankly, I find that all programming environments for scientific computing are
deficient in one way or another. If you look at the set of features in Visual
Studio, RStudio and Jupyter notebooks, you will see that the union of useful
features is large, and the intersection is almost empty.

------
betolink
IMO Jupyter notebooks are popular because they're open source, they're
convenient, and they help a lot in illustrating an idea.

The results of a notebook can be shared more easily than a plain repository
(via nbviewer or Binder) and, more importantly, the science there is
reproducible.

------
trop
Question/idea: Could a notebook model supplant bespoke photo-processing
software such as the "darkroom" mode of Lightroom (or darktable)?
The extant programs essentially take a lot of data (camera's raw output) and
apply a configurable recipe to produce intelligible output (an image). Each
recipe (stored as an XMP sidecar) is essentially a list of math operations
(increase brightness, wavelet decompose, change color model, etc.) and their
parameters.

Obviously a great part of why we use Lightroom/darktable is because of the
speed with which the recipe-processing occurs. Plus a smooth UI, a catalog-
viewing feature, and a well vetted choice of image operations. The appeal of
moving this work to a notebook would be that an actively maintained Jupyter
ecosystem could supplant lock-in to a specific software, and open up the
underlying math magic.

At the very least, this could be an interesting platform for experimenting
with image processing methods. And the reordering of cells could become a
virtue, to run an image processing pipeline out of the standard order.

I'm curious if anyone has already worked along these lines. From a quick web
search I find that people are doing some image processing, but more in the
face-detection or ML-for-medical-imaging areas. As a basic toolkit,
[http://scikit-image.org/docs/dev/auto_examples/](http://scikit-image.org/docs/dev/auto_examples/)
is something, though it doesn't cover the whole
range of operations needed for, say, fine art image tuning.

~~~
dimatura
Yes and no. I do computer vision and like photography as well. I use Jupyter
notebooks extensively for computer vision, and they work pretty well for
(semi-)interactive manipulation of image data with code. But as a general-
purpose tool they're too clunky for anything more than prototyping. I don't
see them replacing darktable/Lightroom anytime soon.

------
mindcrime
Zeppelin[1] is another great tool of a similar nature. It leans a little bit
more towards the Scala / Spark world, for people who like that stack. That
said, you can use Python, R, etc. with Zeppelin as well.

[1]: [https://zeppelin.apache.org/](https://zeppelin.apache.org/)

------
reilly3000
Does anybody know of a good hosted solution for JupyterHub? I made a neat
notebook that I needed to share with my non-technical team; it was using
ipywidgets to do some interactive modeling, but they each needed to be able to
use it independently. It has private data, so I couldn't use Binder. I've been
following Zepl.com for a long time, but couldn't use them here because
Zeppelin doesn't support ipywidgets. Pretty soon I found myself installing
helm and trying to follow along with a tutorial on how to deploy JupyterHub on
a Kubernetes cluster. That started to add an unmanageable level of complexity
to own, especially to share a simple notebook. And while spinning up a GKE
node per user is the whole point of Kubernetes, it got expensive quickly in my
test. We cannot spend $75K a year on Domino. Any other options?

~~~
SiempreViernes
Do you have some server to host it on?

If so, I’d run the notebook on the remote server and just teach them whatever
command they need to make an SSH tunnel there. Something like what is described
here: [https://techtalktone.wordpress.com/2017/03/28/running-
jupyte...](https://techtalktone.wordpress.com/2017/03/28/running-jupyter-
notebooks-on-a-remote-server-via-ssh/)

So they would utter the unknowable incantation and then point their browser at
localhost:8000 or whatever and then use their version of the notebook.

~~~
reilly3000
I think that would be within reason for most of our users, but we have a
couple of Windows users. I'm not particularly keen on telling them to install
putty or WSL.

~~~
SiempreViernes
Ah, well they can always borrow their less disadvantaged coworkers computer
until _they_ get the sysadmins to install and configure a shortcut on their
desktops via active directory I guess.

------
MrPowers
Notebooks are great for invoking existing functions and exploring data.

Notebooks aren't ideal for creating functions (standard text editor features
are lacking and testing is impossible).

Notebooks encourage an "order dependent variable assignment" programming style
without abstractions. Here's what you'll commonly see in a notebook:

val df = spark.read.csv("some_data")

val df2 = df.withColumn("clean_name", trim(col("name")))

val df3 = df2.filter(col("clean_name") === "Mark")

I've found that notebooks are very useful if you write all the complicated
code in separate GitHub repos and attach binary executables to the cluster. If
you try to write all your logic in notebooks, you'll quickly struggle with
order dependent, messy code.
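The suggested refactor is just to name the chained steps. In plain Python terms (a sketch, with a list of dicts standing in for the dataframe), the three cells collapse into one testable function:

```python
def clean_and_filter(records, name="Mark"):
    """One named, importable step replacing the df -> df2 -> df3 chain:
    trim the name column, then keep the matching rows."""
    cleaned = [dict(r, clean_name=r["name"].strip()) for r in records]
    return [r for r in cleaned if r["clean_name"] == name]
```

The notebook then just calls `clean_and_filter(...)`, and the logic lives in a module where it can be unit tested and reused without caring about cell order.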

~~~
edparcell
I had the same trouble with order dependence as notebooks got to a certain
size, so my team and I created and open-sourced a library, Loman [1], to help
with that. It allows you to interactively create a graph where nodes represent
inputs or functions, and then keeps track of state as you change or add inputs
and intermediate functions and request recalculations. Our experience has been
broadly positive with this way of working.

As graphs get larger, it's easy to lift them into code files in libraries,
while continuing to modify or extend them in notebooks. The graph structure
and visualization make it easy to return to Loman graphs with up to low
hundreds of nodes, which would make for a fearsome notebook otherwise. It also
makes it easy to bolt Qt or Bokeh UIs onto them for interactive dashboards:
just bind UI widgets and events to the inputs, and widgets to the outputs.

They can also be serialized, which is useful for tracking exceptions in
intermediate calculations when we put them in Airflow to run periodically, as
you can see all the inputs to the failing calculation, and its upstreams.

[1] GitHub:
[https://github.com/janushendersonassetallocation/loman](https://github.com/janushendersonassetallocation/loman)

[2] Quickstart/Docs:
[https://loman.readthedocs.io/en/latest/user/quickstart.html](https://loman.readthedocs.io/en/latest/user/quickstart.html)
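The core idea can be sketched in plain Python (an illustration of the technique, NOT Loman's actual API): a graph where changing an input marks every downstream node stale, and values are recomputed lazily on request:

```python
class Graph:
    """Minimal dependency-graph sketch: nodes are inputs or functions of
    other nodes; changing an input marks downstream nodes stale."""

    def __init__(self):
        self.funcs = {}     # node name -> function
        self.deps = {}      # node name -> list of dependency names
        self.values = {}    # cached values
        self.stale = set()  # nodes needing recomputation

    def add_input(self, name, value):
        self.values[name] = value
        self._invalidate(name)

    def add_node(self, name, func, deps):
        self.funcs[name] = func
        self.deps[name] = deps
        self.stale.add(name)

    def _invalidate(self, name):
        # Mark every node that (transitively) depends on `name` stale.
        for node, deps in self.deps.items():
            if name in deps and node not in self.stale:
                self.stale.add(node)
                self._invalidate(node)

    def value(self, name):
        # Lazily recompute stale nodes from their dependencies.
        if name in self.stale:
            args = [self.value(d) for d in self.deps[name]]
            self.values[name] = self.funcs[name](*args)
            self.stale.discard(name)
        return self.values[name]
```

Usage: after `g.add_input("x", 2)` and `g.add_node("y", lambda x: x + 1, ["x"])`, asking for `g.value("y")` computes it once; updating `x` later automatically marks `y` stale, which is exactly the bookkeeping out-of-order notebook cells make you do in your head.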

------
tcpekin
Having spent a decent amount of time learning to be a programmer while doing
scientific image analysis in Matlab (shudders from the real programmers), and
with a decent amount of time spent in Mathematica as well, I just can't seem
to buy into the Jupyter/notebook-based programming enthusiasm. The talk linked
in the article explains it better than I ever could, but for me, when I am
leaving data in memory, it is much more convenient to have a completely linear
history, ordered by command execution time.

In Python I have found the best way to do this is writing standard Python
functions and scripts, and running them in an IPython environment with the
%run magic. You have the linear history, git works well on standard .py
files, and you can interactively work with the data at the IPython prompt
without worrying that something is proceeding nonlinearly. What I find works
best is to explore the data at the live prompt, which gives you
interactivity, and then slowly build up a master collection of functions and
commands that, when run with a single command, can reproduce the results you
got while exploring. Then, to come back to the data at a later point in time,
you only have to run one script file on the raw data.

Of course, this is kind of the point of the Jupyter notebook, but I find that
when I want to change parameters, the ability to jump around and redefine
things often means I do. By moving from IPython to a script/functions I run, I
ensure that everything progresses linearly. Idk, just my two cents.
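The workflow described might look like this hypothetical `analysis.py` (the file name and contents are invented for illustration): pure functions plus a main entry point, so that `%run analysis.py raw.csv` in IPython reproduces the whole result in one linear pass:

```python
# analysis.py -- rerunnable end to end, e.g. `%run analysis.py raw.csv`
import sys

def load(path):
    """Read raw data; stubbed here as parsing 'label,value' lines."""
    with open(path) as f:
        return [line.strip().split(",") for line in f if line.strip()]

def summarize(rows):
    """The 'master collection' of analysis steps, runnable end to end."""
    return sum(float(value) for _label, value in rows)

def main(path):
    rows = load(path)
    total = summarize(rows)
    print("total:", total)
    return total

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

After `%run`, the functions and results are available at the IPython prompt for interactive exploration, but the record of how they were produced is a single linear script under version control.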

~~~
jackbravo
Which talk? I didn't see any link to a video.

~~~
tcpekin
Here they are - they're definitely irreverent, but I find myself strongly
agreeing with everything he says.

[https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-
dkAIsUXP-AL4ffI/edit#slide=id.g3d168d2fd3_0_130)

[https://www.youtube.com/watch?v=7jiPeIFXb6U&feature=youtu.be](https://www.youtube.com/watch?v=7jiPeIFXb6U&feature=youtu.be)

------
ontouchstart
Many comments here are about implementation details, such as Jupyter's JSON
format, or compare the user experience with an IDE or a shell, and fail to see
the fundamental difference between Emacs and Jupyter, which is captured in
this quote from the article:

“In many cases, it’s much easier to move the computer to the data than the
data to the computer,” says Pérez of Jupyter’s cloud-based capabilities. “What
this architecture helps to do is to say, you tell me where your data is, and
I’ll give you a computer right there.”

------
thomasfedb
Jupyter is lovely (and JupyterLab looks _delicious_), but the setup required
to achieve a reproducible local server with the R kernel and versioned R
packages is 100% not.

Installing R packages through Anaconda is like pulling teeth, and the Docker
images for my Jupyter notebooks push past 6GB and take multiple cups of tea to
build.

Is there a good solution I'm missing? A good hosted solution perhaps?

~~~
kvlr
I run [https://nextjournal.com](https://nextjournal.com)

We achieve full-stack reproducibility by allowing you to install arbitrary
software and by versioning these environments using Docker. You can reuse
these environments in other articles, or pull them and use them locally.
`xoxo` is a signup code you can use if you want to give it a try.

~~~
thomasfedb
Looks cool, do you have an article that explains the stack from a technical
perspective? Is this based on Jupyter?

~~~
kvlr
Not yet. It isn't based on Jupyter but it's all written in Clojure. Been
meaning to do a writeup on our stack for some time…

In addition to our own runtime protocol, we support Jupyter kernels and you
can import Jupyter and (R)markdown documents.

------
GChevalier
I recently wrote on how to build/grow clean software out of Jupyter notebooks
and on pitfalls to avoid when coding like that: [https://github.com/guillaume-
chevalier/How-to-Grow-Neat-Soft...](https://github.com/guillaume-
chevalier/How-to-Grow-Neat-Software-Architecture-out-of-Jupyter-Notebooks)

~~~
GChevalier
I submitted it to HN too, why not! Here:
[https://news.ycombinator.com/item?id=18339703](https://news.ycombinator.com/item?id=18339703)

------
lixtra
What was the earliest of these tools? Mathcad? Mathematica? Maple?

~~~
tony_cannistra
There’s a relatively esoteric paradigm known as “literate programming” which
has been around since Knuth (he wrote the book [0]), and it has some software
tools associated with it, of which Jupyter is a particularly web-age example.

[0]:
[https://en.m.wikipedia.org/wiki/Literate_programming](https://en.m.wikipedia.org/wiki/Literate_programming)

~~~
tmalsburg2
Literate programming _was_ esoteric, true, but the concept saw a huge
renaissance in academia and data science with the advent of RMarkdown¹ which
for many of my colleagues is the default way of preparing technical documents.
Another area in which literate programming has become hugely popular is Emacs'
Org-mode ecosystem which has fantastic support in the form of Org Babel². I
use literate programming for almost everything. Research papers, tech reports,
notes, experiments, teaching materials, letters, student evaluations, and so
on. It's completely ridiculous how useful it is once you get the hang of it
and make it your default document type.

[1] [https://rmarkdown.rstudio.com/](https://rmarkdown.rstudio.com/) [2]
[https://orgmode.org/worg/org-contrib/babel/](https://orgmode.org/worg/org-
contrib/babel/)

~~~
TeMPOraL
I'm still not sold on writing complete programs this way (I did try, with
various levels of success), but even a partially-literate approach is
ridiculously convenient if you happen to live in Emacs.

I do my task management and note-taking in Org-mode, and recently I found
myself doing things like jotting in the middle of my notes[0]:

    
    
      #+BEGIN_SRC http
        GET address.to.api:123/sth
      #+END_SRC
    

and tapping CTRL+C twice, to get the actual response of the API I was
debugging.

Or, the other day I was making notes about gravity batteries, and was
wondering how efficient one startup's solution is. I briefly thought about
firing up Jupyter, but then simply wrote the following[1]:

    
    
       these guys power a LED (or three?) with a 0.1W, generated through dropping
       a 12kg weight down 1.8 meters over 20 minutes.
    
       Doing some basic math on that:
       #+BEGIN_SRC elisp
         (let* ((m 12)
                (g 9.81)
                (h 1.8)
                (_t (* 20 60))
                (E (* m g h))                    ; E = m*g*h
                (P (/ E _t))                     ; P = E/t
                (efficiency (/ 0.1 P))           ; efficiency = Pout/Pin
                )
           `("ideal power [W]" ,P
             "efficiency [1]" ,efficiency))
       #+END_SRC
    

Typing CTRL+C twice, out pops:

    
    
      #+RESULTS:
      | ideal power [W] | 0.17658000000000001 | efficiency [1] | 0.5663155510250312 |
    

(which is automatically rendered as an org-mode table I can operate on, or
even reference in other code snippets).

Point being, note-taking in org mode makes it ridiculously easy to invoke any
programming language you hooked up to Emacs without breaking your flow, and
you get to edit the code in the mode specific to that programming language -
so everything from autocomplete to linters work.

I know Emacs is niche, but I can't recommend it enough.

\--

[0] - BEGIN/END_SRC block is under convenient autocomplete of "<s TAB".

[1] - this is a real note, so if I got the physics wrong, I just made a fool
of myself publicly -.-

------
zimablue
I kind of find Jupyter an indictment of other coding tools really, it's 2018
and they're normally kind of weak or kind of unprogrammable. Feel like we're
waiting for someone to really reinvent Emacs, preferably using web tech.

Most editors can't open a terminal that you can use VIM keybindings on to
search/navigate history and treat like any other buffer.

VSCode -> not currently possible, because they wrote it in a restrictive way,
with Panel as a special case very different from the code window.

Atom -> probably possible, but I don't think terminal-plus is quite it.

Any IDE I've tried -> not possible.

Emacs -> possible.

Not that this is the be all end all feature but it is useful as hell and kind
of a litmus test for whether you can program your environment.

edit: LightTable seemed kind of cool but became abandonware like the author's
other projects

~~~
gnulinux
I use Jupyter in emacs: ein-mode. The whole concept of programming in a
browser sounds bizarre to me. I have a tool that's designed for programming
(emacs) and a tool that's designed for streaming cat videos (firefox) and I
use the latter for programming? Thanks but no thanks. In my experience emacs
works pretty perfectly with jupyter too; there is no need to use firefox for
something it's not designed to do for my full-time job.

~~~
zimablue
I have also tried ein-mode, and sometimes use it. I haven't used it enough to
have a super educated opinion but it didn't blow me away.

I don't think it supports cell folding (?), as an example of a missing
feature.

Minor point, but it also can't/shouldn't support widgets, which we use at
work. Any extension to Jupyter is going to be written in JavaScript, so to
some extent I'd be locking myself out of the ecosystem.

I didn't mean this as a criticism of emacs, my post said that the best thing I
know is emacs, just that it's (probably) not the future of editors imo so I'm
reluctant to throw 100 hours into it.

~~~
TeMPOraL
> _I don't think it supports cell folding (?), as an example of a missing
feature_

If I understand what you mean correctly, that would be handled by built-in
outline-minor-mode, or by a third-party Emacs module like yafolding or fold-
this.el. Emacs packages tend to be made to compose well with other packages
(it's a requirement given how everyone's Emacs is a special snowflake, unlike
any other Emacs).

As for widgets/Jupyter extensions, then yes. Emacs can't really help you there
AFAIK.

------
jxramos
"Two additional tools have enhanced Jupyter’s usability. One is JupyterHub, a
service that allows institutions to provide Jupyter notebooks to large pools
of users. The IT team at the University of California, Berkeley, where Pérez
is a faculty member, has deployed one such hub, which Pérez uses to ensure
that all students on his data-science course have identical computing
environments. “We cannot possibly manage IT support for 800 students, helping
them debug why the installation on their laptop is not working; that’s simply
infeasible,” he says."

I think this result is a real winner; I recall the problems of setting up
university student labs. A good win for reducing teaching friction.

------
stephengillie
I recently got a Jupyter Notebook, and found it's a large JSON document, with
some sections in markdown and some in Python. A browser could omit the Python,
and an interpreter could omit the markdown.

Would this work for other languages? Maybe JavaScript or Powershell instead of
Python?
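That observation matches the on-disk format exactly: a notebook is one JSON
document whose cells are tagged as either markdown or code, so splitting the
two views is a few lines of stdlib Python. A minimal sketch (the cell contents
here are made up for illustration; a real .ipynb also carries metadata and
outputs per cell):

```python
import json

# A minimal .ipynb as it would appear on disk (trimmed for brevity).
raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis\n"]},
        {"cell_type": "code", "source": ["print('hello')\n"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
})

nb = json.loads(raw)
# A "browser" view keeps only the markdown; an "interpreter" view
# keeps only the code, just as the comment above suggests.
prose = ["".join(c["source"]) for c in nb["cells"] if c["cell_type"] == "markdown"]
code = ["".join(c["source"]) for c in nb["cells"] if c["cell_type"] == "code"]
```

Since the language of each code cell is declared in notebook metadata rather
than hardcoded, the same split works whatever kernel the notebook targets.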

~~~
closed
Can and does! Jupyter has a flexible architecture for running almost any
language, via pluggable backends called "kernels".

Notable examples...

R:
[https://github.com/IRkernel/IRkernel](https://github.com/IRkernel/IRkernel)

node: [https://github.com/notablemind/jupyter-
nodejs](https://github.com/notablemind/jupyter-nodejs)

In my mind, one of the big advantages of Jupyter is its extensibility. I was
able to quickly modify notebooks to run unit tests, so we could use them for
projects at DataCamp:
[https://www.datacamp.com/projects](https://www.datacamp.com/projects).
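To make the "notebooks as unit tests" idea concrete, here is a naive
stdlib-only sketch: execute each code cell in order in one shared namespace,
the way a kernel session would, and collect failures like a test runner. (The
helper name and the in-memory notebook are mine for illustration; real setups
would use something like nbconvert's execution machinery rather than a bare
`exec`.)

```python
def run_notebook_cells(nb):
    """Execute each code cell in order in one shared namespace,
    mimicking a kernel session; collect failures like a test runner."""
    ns, failures = {}, []
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue  # markdown cells are prose, not executed
        try:
            exec("".join(cell["source"]), ns)
        except Exception as exc:
            failures.append((i, exc))
    return ns, failures

# A tiny in-memory notebook: the second code cell asserts on state
# produced by the first, which is the essence of notebook-as-unit-test.
nb = {"cells": [
    {"cell_type": "markdown", "source": ["# Demo\n"]},
    {"cell_type": "code", "source": ["x = 2 + 2\n"]},
    {"cell_type": "code", "source": ["assert x == 4\n"]},
]}
ns, failures = run_notebook_cells(nb)
```

An empty `failures` list means every cell ran clean, which is exactly the
pass/fail signal a CI job over a directory of notebooks needs.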

------
bravura
Have people been finding that AWS SageMaker disconnects you and you have to
restart after a couple hours?

I am curious about people's thoughts on using Jupyter for long-running code.
Having a totally self-contained experiment in one notebook, even if it's
long-running, is very useful for reproducibility. It works fine on my local
laptop and a remote server, but not with SageMaker.

------
ianamartin
The problem with Jupyter isn't with what it does. It's the people who use it.

My experience as a data engineer/architect/application developer attached to
data science teams for a while now is that most really good data scientists
are very good at what they do, write somewhat competent code, and do not--in
any way--care about writing good software or good application code.

Jupyter is a bane of my existence because people who use it want to use it for
everything. Oh, it can have a web interface? Okay. The app is done. DEPLOY TO
WEB USERS! NOW!!

It's a great tool. A lot of the people who use it are not software engineers,
and they don't want to be. For a lot of people it's the straight line from
point a to point b.

But in my experience, legit data scientists are pretty smart and are willing
to learn a little if you're willing to give a little. This is a good exercise
because they are typically skeptical about everything. So you have to be
really secure about why you want certain things done certain ways, and why you
definitely don't want things done other ways.

It's a good exercise for everyone involved if you have the right team dynamic
and mutual, healthy respect for each other.

If you don't . . . well, then Jupyter notebooks completely suck.

------
enriquto
I love notebooks as a way to present information, data, code and computations.

However, I cannot stand typing _any_ text into a web browser window. Is there
any way to edit a jupyter notebook with a text editor and then run it in the
browser? The native json is not really human-editable.

~~~
dimatura
I have felt the same way in the past. There are some ways to do this, but
none is great. Unfortunately, using an external text editor to fill in the
notebook's text areas is not straightforward because of security features in
modern browsers. Since Jupyter is actually a server (usually running locally),
it's possible to communicate directly with it from a sufficiently advanced
editor, but I haven't seen any good execution of that idea. There's also the ipymd
([https://github.com/rossant/ipymd](https://github.com/rossant/ipymd)) format,
which is just markdown, and seems more or less what you want, but you lose the
interactivity and display of images/plots/HTML/etc. Personally, I've found the
"jupyter-vim-binding" ([https://github.com/lambdalisue/jupyter-vim-
binding](https://github.com/lambdalisue/jupyter-vim-binding)) to be a
relatively acceptable emulation of vim keybindings for the code editing.

~~~
enriquto
This is not what I mean. Mine is a problem of file formats, not of
interactivity. I want to edit a text file alone, without needing any web
browser on my computer. Then I push the notebook to git, and somewhere else it
is opened by the browser.

This would be possible today if the notebook file were Python code with
comments, for example, instead of barely-editable JSON.
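That python-with-comments idea is essentially what Jupytext's "percent" script
format does: markdown cells become comments, code cells stay runnable Python,
and cells are delimited by `# %%` markers. A stdlib-only sketch of the
conversion (the helper name and cell contents are mine for illustration, not
Jupytext's actual API):

```python
import json

def notebook_to_percent_py(nb_json):
    """Render a notebook as plain Python with "# %%" cell markers,
    roughly the "percent" format that tools like Jupytext use.
    Markdown becomes comments; code cells stay runnable Python."""
    lines = []
    for cell in json.loads(nb_json)["cells"]:
        src = "".join(cell["source"]).rstrip("\n")
        if cell["cell_type"] == "markdown":
            lines.append("# %% [markdown]")
            lines.extend("# " + line for line in src.splitlines())
        else:
            lines.append("# %%")
            lines.append(src)
        lines.append("")  # blank line between cells
    return "\n".join(lines)

# A two-cell notebook, as JSON on disk.
nb = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["Compute a sum\n"]},
    {"cell_type": "code", "source": ["total = sum(range(5))\n"]},
]})
script = notebook_to_percent_py(nb)
```

Because markdown is commented out, the result is a valid Python file you can
edit anywhere, diff cleanly in git, and even run directly.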

~~~
dpwm
This is quite a neat idea. A couple of people have mentioned Jupytext around
here. I found a guide with animations that looks like it might do what you
want [0].

I personally look forward to trying this out, as it means that I can use
Jupyter in a way that doesn't mean adapting my workflow to the tool so much.

[0] [https://towardsdatascience.com/introducing-
jupytext-9234fdff...](https://towardsdatascience.com/introducing-
jupytext-9234fdff6c57)

------
mike_ivanov
There is no "why", because it is not. An awkward JSON format, plus the
inability to survive network interruptions: a WiFi router goes down during a
long computation, and voilà -- the results are lost, which is unimaginable
with RStudio.

------
ralmidani
Out of curiosity, what other options are there?

~~~
jonnycomputer
for python, org-mode in emacs will do it. but that ties you down to emacs.

~~~
tmalsburg2
Org-mode was a total game changer for my work life. I use it for almost
everything. Yes, it ties me to Emacs. I don't see this as a problem.

------
HaHa31
Link is dead?

------
throwaway487548
Jupyter could be viewed as a modern reminiscence of Lisp Machine UI, without
the elegance of homoiconicity, of course.

Python is a good "glue" for the optimized C++ or Fortran libraries which form
the core of things like TensorFlow or NumPy. Everything fits together nicely.

~~~
lispm
> Jupyter could be viewed as a modern reminiscence of Lisp Machine UI

maybe some part of it, but the Lisp Machine UI has a full window system, many
different applications based on it with different UIs (font editor, file
system browser, process overview, chat program, terminal, Zmacs editor,
debugger, documentation browser, documentation editor, drawing program, ...)

~~~
dual_basis
That is what Jupyter Lab is becoming.

~~~
lispm
great, but how is it Lisp Machine UI like?

------
RandomInteger4
I hate Jupyter notebooks; I seriously do. Maybe that's an irrational
sentiment, but I find them nauseating, in the same vein that I find country
music nauseating.

