As someone who switched from Notebooks to script style coding for all of my data science, I couldn't a) agree more with most of his points and b) be happier that I made the switch.
Just two days ago, working in RStudio, I thought my script had run fine, but the data-loading step had actually failed. RStudio charges on despite errors, so it ran the rest of the script against data that was still in memory from another script I had run the day before!
But I don't have a satisfactory solution for the case where Step A takes a few minutes or longer to complete, so I don't want to keep re-running it while coding Step B. I resort to copy-and-pasting Part B into IPython, which amounts to a poor-man's notebook. Obviously the "correct" answer is to write the results of A out to disk, but that can be a real pain while you're developing! (E.g. loading an Excel sheet in pandas is slow, while the equivalent tab-separated sheet is very fast, but I don't really want a separate step pre-processing all Excel sheets into TSV files.) Also, pandas' to_csv/read_csv functions aren't even inverses of each other (they lose the index), so writing to disk requires extra steps. (And from_csv is deprecated.)
What I'd rather have is "half" a notebook where I can checkpoint a computation part way through and restart from there. But only ever proceed linearly through the code to avoid notebook out-of-order confusion.
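For what it's worth, a poor-man's checkpoint is just a tiny caching helper that pickles the result of the slow step to disk. A rough sketch (cache_step, step_a and big_sheet.xlsx are made-up names, not any real library):

    import os
    import pickle
    import pandas as pd

    def cache_step(path):
        """Run the wrapped function once, pickle the result to `path`,
        and load the pickle on later runs instead of recomputing."""
        def decorator(func):
            def wrapper(*args, **kwargs):
                if os.path.exists(path):
                    with open(path, "rb") as f:
                        return pickle.load(f)
                result = func(*args, **kwargs)
                with open(path, "wb") as f:
                    pickle.dump(result, f)
                return result
            return wrapper
        return decorator

    @cache_step("step_a.pkl")
    def step_a():
        # the slow part: only ever runs once
        return pd.read_excel("big_sheet.xlsx")

    df = step_a()  # near-instant on every run after the first
    # ...iterate on Step B below without re-running Step A...

Delete the pickle whenever you actually want Step A to rerun. joblib.Memory does the same thing more robustly (it hashes the function's arguments for you) if pulling in a dependency is acceptable.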
While the to_csv method and read_csv function aren't inverses by default, they can be with one keyword argument. Design mistake, but it's not terrible.
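Concretely (if I'm remembering the keyword right), it's index_col on the read side, or index=False on the write side:

    import pandas as pd

    df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

    # Default round-trip loses the index unless you tell read_csv
    # that the first column *is* the index:
    df.to_csv("tmp.csv")
    df2 = pd.read_csv("tmp.csv", index_col=0)
    assert df.equals(df2)

    # Or don't write the index at all if you don't need it:
    df.to_csv("tmp_noindex.csv", index=False)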
Same here. I work in the systematic trading business and I gave notebooks an honest shot, but in the end using PyCharm and scripts along with the IPython console is the best workflow I've found. I can easily send blocks of code to the console with a single keypress inside PyCharm, making interactive work a breeze. Meanwhile, all my code is in nice idiomatic Python files that can be imported, refactored and easily version controlled. The same workflow can be had in Vim, Emacs or pretty much any other editor with an IPython console integration.
Yeah, I've made stuff with notebooks for clients who wanted them, but I've never been sold on it. It can be useful for playing around with new libraries before I start using them, but when I want to write actual code I want to keep, I will always do it as a packaged library driven by scripts. You can't deploy notebooks.
Whenever I've had to use notebooks, if I make a change, I just run it top-to-bottom as if it were a script, and try to push as much of the important code as possible into a packaged library. And all of this experience has been with JupyterLab, which doesn't seem to address any of the concerns in this slideshow.
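If it helps anyone, that "run it top-to-bottom" step can itself be scripted; a minimal sketch using nbformat/nbconvert (analysis.ipynb is a made-up filename):

    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read("analysis.ipynb", as_version=4)

    # Execute every cell in order, top to bottom, with a fresh kernel --
    # effectively the same as "Restart & Run All" in the UI.
    ExecutePreprocessor(timeout=600).preprocess(nb, {"metadata": {"path": "."}})

    nbformat.write(nb, "analysis.executed.ipynb")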
The simple fact that I can't F12 to go to the definition of a function makes doing anything serious with notebooks a chore compared to a real IDE. Missing static analysis, Tab-based indentation control, moving lines with Shift+Up/Down, and quick refactoring tools are further nuisances that leave no reason to use notebooks over a real editor like PyCharm.
Crazy powerful tools have different communities that use them. If you describe Jupyter as a mash-up of Word and Excel, every non-IT pro who works at BigCo will have a pretty good idea of what it is and why it's a good idea for them.
Alternatively, think about who you are writing code _for_. If it's for yourself or other people, mixing code, graphics and text is a really good idea. If you are writing code for computers, you will prefer other workflows. Complaining that a workflow that is useful for one community doesn't work for your community is pointless - try going 1 or 2 levels up in your abstractions.
It is good to be aware of the alternatives. Yet I am super suspicious of data scientists who don't use Jupyter Notebook at all (collaboration with most of them was a pain; e.g. they didn't explore the data enough, or had a beautiful pipeline of garbage-in, garbage-out data).
While it has its pain points (it shouldn't be the only environment!), show me a better alternative for interactive data exploration and for sharing it.
I've been trying to get lots of disparate teams that are experimenting with Python to use notebooks for shared, reusable tasks. It's been really hard and I think I'll give up. IntelliSense is the big one; modern IDEs are really good at this. Modularity is next: shared modules are really important to keep the high-level scripts from getting too complex.
That slide #143 though is perfect. I'll have to send this around.
While I still use notebooks, I increasingly configure matplotlib to render plots right in my shell window (requires iTerm2 on a Mac), so code from a Jupyter notebook can be run in a bash shell and I still get to see plots displayed inline.
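For anyone wanting to replicate this: the comment doesn't say which backend, but one way I know of is the itermplot matplotlib backend, roughly:

    # pip install itermplot, then either `export MPLBACKEND=module://itermplot`
    # in your shell profile, or pick the backend before importing pyplot:
    import matplotlib
    matplotlib.use("module://itermplot")
    import matplotlib.pyplot as plt

    plt.plot([1, 2, 3], [1, 4, 9])
    plt.show()  # the figure is drawn inline in the iTerm2 window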
The worst problem is the social one: people justifying idiotic principles (incoherent state is fine because "it's not software engineering").
That said, notebooks are probably way better than the previous state of things, when people were writing god knows what, with unreasonable state pushed onto the filesystem and whatever else.
IDE-style notebooks are a godsend for projects over a certain size threshold. They encourage people who aren't typically used to "programming in the large" to start folding unwieldy chunks of code into files, which in turn encourages them to experiment with the abstractions and structure that software engineers use to handle scale. Notebooks got this wrong, JupyterLab (and RStudio) get it right, and it's a huge step forward!
That said, JupyterLab has growing pains. Things like ipywidgets that used to Just Work now Just Don't -- if you want to use widgets as seen in the screenshots, you have to separately install npm, learn to grok the JupyterLab extension manager, and surf the version compatibility matrix between the ipywidgets backend package and the frontend extension you have installed. Sigh. I'm sure it'll iron out in time, but every time I see the screenshot with widgets in it I have to clear my throat.
I write libraries in an IDE and give examples in a notebook. Working in a cross-platform environment is painful when GPU stuff doesn't work out of the box for everyone, or when you need even a crude way to spread work across a compute cluster instead of using your local machine.
Actually, I hate Python now because of this. I would kill for a proper tool for managing requirements.txt, like package.json (Yarn) or pom.xml (Maven), that worked in Jupyter by defining a base image, similar to Docker. Pip + virtualenv just seems like a half-attempt.
You should check out BinderHub - it uses repo2docker to generate a Dockerfile containing all the dependencies listed in requirements.txt in a git repository. Then it spins up a Jupyter Notebook server running in a Docker container with all of your dependencies installed.
If I describe this as an IDE in the browser that runs in debug/breakpoint/interactive execution mode by default, how wrong am I?
The interface also reminds me of RStudio somehow.
My own preference is to use notebooks only as an exploratory tool, a debugging tool, or to make presentations. For any longer code or serious project, I'll stick to my traditional editor (Emacs/Vim/etc.). Call me old-fashioned?