As someone who switched from Notebooks to script style coding for all of my data science, I couldn't a) agree more with most of his points and b) be happier that I made the switch.
Just two days ago, working in RStudio, I thought my script had run fine, but the data-loading step had actually failed. RStudio charges on despite errors, so it ran the rest of the script against data that was still in memory from another script I had run the day before!
But I don't have a satisfactory solution for the case where Step A takes a few minutes or longer to complete, so I don't want to keep re-running it while coding Step B. I resort to copy-and-pasting Part B into IPython, which amounts to a poor-man's notebook. Obviously the "correct" answer is to write the results of A out to disk, but that can be a real pain while you're developing! (E.g. loading an Excel sheet in pandas is slow, while the equivalent tab-separated sheet is very fast, but I don't really want a separate step pre-processing all Excel sheets into TSV files.) Also, pandas' to_csv/read_csv functions aren't even inverses of each other (they lose the index), so writing to disk requires extra steps. (And from_csv is deprecated.)
What I'd rather have is "half" a notebook where I can checkpoint a computation part way through and restart from there. But only ever proceed linearly through the code to avoid notebook out-of-order confusion.
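For what it's worth, a poor-man's checkpoint is just a tiny caching helper that pickles the result of the slow step to disk. A rough sketch (cache_step, step_a and big_sheet.xlsx are made-up names, not any real library):

    import os
    import pickle
    import pandas as pd

    def cache_step(path):
        """Run the wrapped function once, pickle the result to `path`,
        and load the pickle on later runs instead of recomputing."""
        def decorator(func):
            def wrapper(*args, **kwargs):
                if os.path.exists(path):
                    with open(path, "rb") as f:
                        return pickle.load(f)
                result = func(*args, **kwargs)
                with open(path, "wb") as f:
                    pickle.dump(result, f)
                return result
            return wrapper
        return decorator

    @cache_step("step_a.pkl")
    def step_a():
        # the slow part: only ever runs once
        return pd.read_excel("big_sheet.xlsx")

    df = step_a()  # near-instant on every run after the first
    # ...iterate on Step B below without re-running Step A...

Delete the pickle whenever you actually want Step A to rerun. joblib.Memory does the same thing more robustly (it hashes the function's arguments for you) if pulling in a dependency is acceptable.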
While the to_csv method and read_csv function aren't inverses by default, they can be with one keyword argument. Design mistake, but it's not terrible.
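Concretely (if I'm remembering the keyword right), it's index_col on the read side, or index=False on the write side:

    import pandas as pd

    df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

    # Default round-trip loses the index unless you tell read_csv
    # that the first column *is* the index:
    df.to_csv("tmp.csv")
    df2 = pd.read_csv("tmp.csv", index_col=0)
    assert df.equals(df2)

    # Or don't write the index at all if you don't need it:
    df.to_csv("tmp_noindex.csv", index=False)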
Same here. I work in the systematic trading business and I gave notebooks an honest shot, but in the end using PyCharm and scripts along with the IPython console is the best workflow I've found. I can easily send blocks of code to the console with a single keypress inside PyCharm, making interactive work a breeze. Meanwhile, all my code is in nice idiomatic Python files that can be imported, refactored and easily version controlled. The same workflow can be had in Vim, Emacs or pretty much any other editor with an IPython console integration.
Yeah, I've made stuff with notebooks for clients who wanted them, but I've never been sold on it. It can be useful for playing around with new libraries before I start using them, but when I want to write actual code I want to keep, I will always do it as a packaged library driven by scripts. You can't deploy notebooks.
Whenever I've had to use notebooks, if I make a change, I just run it top-to-bottom as if it were a script, and try to push as much of the important code as possible into a packaged library. And all of this experience has been with JupyterLab, which doesn't seem to address any of the concerns in this slideshow.
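If it helps anyone, that "run it top-to-bottom" step can itself be scripted; a minimal sketch using nbformat/nbconvert (analysis.ipynb is a made-up filename):

    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read("analysis.ipynb", as_version=4)

    # Execute every cell in order, top to bottom, with a fresh kernel --
    # effectively the same as "Restart & Run All" in the UI.
    ExecutePreprocessor(timeout=600).preprocess(nb, {"metadata": {"path": "."}})

    nbformat.write(nb, "analysis.executed.ipynb")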
The simple fact that I can't F12 to go to the definition of a function makes doing anything serious with notebooks a chore compared to a real IDE. Missing static analysis, Tab-based indentation control, moving lines with Shift+Up/Down, and quick refactoring tools are further nuisances that leave no reason to use notebooks over a real editor like PyCharm.
Crazy powerful tools have different communities that use them. If you describe Jupyter as a mash-up of Word and Excel, every non-IT pro who works at BigCo will have a pretty good idea of what it is and why it's a good idea for them.
Alternatively, think about who you are writing code _for_. If it's for yourself or other people, mixing code, graphics and text is a really good idea. If you are writing code for computers, you will prefer other workflows. Complaining that a workflow that is useful for one community doesn't work for your community is pointless - try going 1 or 2 levels up in your abstractions.
It is good to be aware of the alternatives. Yet I am super suspicious of data scientists who don't use Jupyter Notebook at all (collaboration with most of them was a pain; e.g. they didn't explore the data enough, or had a beautiful pipeline of garbage-in, garbage-out data).
While it has its pain points (it shouldn't be the only environment!), show me a better alternative for interactive data exploration and for sharing it.
I've been trying to get lots of disparate teams that are experimenting with Python to use notebooks for shared, reusable tasks. It's been really hard and I think I'll give up. IntelliSense is the big one; modern IDEs are really good at this. Modularity is next: shared modules are really important to keep the high-level scripts from getting too complex.
That slide #143 though is perfect. I'll have to send this around.
While I still use notebooks, I increasingly configure matplotlib to render plots right in my shell window (requires iTerm2 on a Mac), so code from a Jupyter notebook can be run in a bash shell and I still get to see plots displayed inline.
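For anyone wanting to replicate this: the comment doesn't say which backend, but one way I know of is the itermplot matplotlib backend, roughly:

    # pip install itermplot, then either `export MPLBACKEND=module://itermplot`
    # in your shell profile, or pick the backend before importing pyplot:
    import matplotlib
    matplotlib.use("module://itermplot")
    import matplotlib.pyplot as plt

    plt.plot([1, 2, 3], [1, 4, 9])
    plt.show()  # the figure is drawn inline in the iTerm2 window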
The worst problem is the social one: people justifying idiotic principles (incoherent state is fine because "it's not software engineering").
That said, notebooks are probably way better than the previous state of things, when people were writing god knows what, with unreasonable state pushed onto the filesystem and whatever else.
IDE-style notebooks are a godsend for projects over a certain size threshold. They encourage people who aren't typically used to "programming in the large" to start folding unwieldy chunks of code into files, which in turn encourages them to experiment with the abstractions and structure that software engineers use to handle scale. Notebooks got this wrong, JupyterLab (and RStudio) get it right, and it's a huge step forward!
That said, JupyterLab has growing pains. Things like ipywidgets that used to Just Work now Just Don't -- if you want to use widgets as seen in the screenshots, you have to separately install npm, learn to grok the JupyterLab extension manager, and surf the version compatibility matrix between the ipywidgets backend package and the frontend extension you have installed. Sigh. I'm sure it'll iron out in time, but every time I see the screenshot with widgets in it I have to clear my throat.
I write libraries in an IDE and give examples in a notebook. Working in a cross-platform environment is painful when GPU stuff doesn't work out of the box for everyone, or when you need even a crude way to spread work across a compute cluster instead of using your local machine.
Actually, I hate Python now because of this. I would kill for a proper tool for managing requirements.txt, like package.json (Yarn) or pom.xml (Maven), that worked in Jupyter by defining a base image, similar to Docker. Pip + virtualenv just seems like a half-attempt.
You should check out BinderHub - it uses repo2docker to generate a Dockerfile containing all the dependencies listed in requirements.txt in a git repository. Then it spins up a Jupyter Notebook server running in a Docker container with all of your dependencies installed.
If I describe this as an IDE in the browser that runs in debug/breakpoint/interactive execution mode by default, how wrong am I?
The interface also reminds me of RStudio somehow.
My own preference is to use notebooks only as an exploratory tool, a debugging tool, or to make presentations. For any longer code or serious project, I'll stick to my traditional editor (Emacs/Vim/etc.). Call me old-fashioned?