In addition to stripping the output, I will advertise a Jupyter trick that has helped me many times: When I'm ready to walk away from a project for a while, or even if I just need a coffee break, I will do "restart and run all cells." This ensures that there is no hidden state that I've forgotten about, and that someone else could run the same notebook without mishap.
If you use CI, you can add such checks there so that any commit that skipped them fails the build. I do the same with the `pre-commit` utility; it's very handy for running checks repeatably.
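As a rough illustration of such a check, here is a minimal sketch that flags notebooks which still carry output. It assumes the nbformat 4 layout (a top-level `cells` list), and the names `has_output` and `main` are made up for this example, not part of any tool.

```python
import json
import sys


def has_output(nb):
    """Return True if any code cell still carries outputs or an execution count."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            return True
    return False


def main(paths):
    """Return 1 if any notebook in paths is not stripped, otherwise 0."""
    dirty = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            if has_output(json.load(fh)):
                dirty.append(path)
    for path in dirty:
        print(f"{path}: contains output, strip it before committing", file=sys.stderr)
    return 1 if dirty else 0
```

To use it as a script, end the file with `sys.exit(main(sys.argv[1:]))` and pass it the notebook paths; `pre-commit` hands the staged file names to a hook as arguments, so the same script should work in both places.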
I wonder whether it would be easier (it should at least be possible) to just write a jq filter that strips outputs, prompt_number and execution_count from each cell.
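The same idea can be sketched in plain Python instead of jq, since a notebook is just JSON. This is only a sketch under assumptions: `strip_notebook` is a made-up name, the top-level `cells` list is the nbformat 4 layout, and the `worksheets` nesting is my understanding of the older nbformat 3 layout.

```python
import json


def strip_notebook(nb):
    """Return a copy of a notebook dict with outputs and execution counters cleared."""
    nb = json.loads(json.dumps(nb))  # cheap deep copy of JSON-compatible data
    # nbformat 4 keeps cells at the top level; nbformat 3 nests them in worksheets
    cells = list(nb.get("cells", []))
    for sheet in nb.get("worksheets", []):
        cells.extend(sheet.get("cells", []))
    for cell in cells:
        if cell.get("cell_type") != "code":
            continue
        cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None  # keep the key, null the value
        cell.pop("prompt_number", None)  # older notebooks use this key instead
    return nb


# Tiny in-memory notebook, hypothetical content for illustration only
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {
            "cell_type": "code",
            "source": ["1 + 1"],
            "execution_count": 7,
            "outputs": [{"output_type": "execute_result", "data": {"text/plain": ["2"]}}],
        },
    ]
}
clean = strip_notebook(demo)
print(json.dumps(clean["cells"][1]))
```

The source of each cell is left untouched, so the diff that reaches version control only reflects changes to code and prose.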
I haven't really followed Jupyter's development lately, so maybe this is already happening, but I think what you really need is some concept of a workspace, rather than a single file.
The problem with notebooks is that they get unwieldy: you want to keep a bunch of code around that's only useful in certain cases, or the notebook just starts doing "too much".
Sure, you can factor this code out into a library/function, but there's nothing that makes that easy, and once you've made it into a library, there's nothing that helps you easily make changes to that library in a different notebook.
Am I the only one who looks at this and thinks: "wtf, no, versioning notebooks should not be this tedious?" instead of suggesting other horrendous ways of versioning them?
As a data scientist, I disagree strongly on this. Writing "typical" application code, sure, Jupyter is (probably) overkill. But for CV, NLP, data sanitizing, etc., you are constantly iterating over algorithms and visually inspecting the output. Multi-stage pipelines just require rerunning a cell.
Caching to disk is cumbersome for data that's usually junk.
Cells and integrated visualization are such a massive leap forward that using plain old text feels like banging rocks together.
Pretty much this. As a quant / data scientist, I quite often have notebooks just hanging there for weeks with a few hundred GB of ready-to-use data preloaded and preprocessed in the kernel, which makes experimenting with it incredibly ergonomic.
Being able to quickly check the output while iterating on an algorithm, or to visualise intermediate results, is irreplaceable.
With the VSCode Python extension you can directly create cells with #%% in a similar way to Hydrogen. There is also Neuron which allows you to see outputs in a separate pane.
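For anyone who hasn't seen the cell markers, a plain .py file along these lines is split into runnable cells at each `# %%` comment. The data and variable names are hypothetical, just for illustration.

```python
# %%
# Load some data (hypothetical values)
samples = [3, 1, 4, 1, 5, 9, 2, 6]

# %%
# Compute a summary; re-run only this cell while iterating
mean = sum(samples) / len(samples)
print(f"mean = {mean:.3f}")

# %%
# Inspect an intermediate result
largest = sorted(samples, reverse=True)[:3]
print(largest)
```

Because the markers are ordinary comments, the file still runs top to bottom as a normal script and diffs cleanly under version control.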
I'm still struggling to find a setup in which cells are auto-generated (or unnecessary, like in RStudio) and the autocomplete works as well as in JupyterLab. If I could reliably see all methods/submodules/inline documentation + path autocomplete quickly and for all packages, I would switch to VSCode. (There's a good chance that this is just due to me not being fully aware of what's available in VSCode.)
edit: Atom IDE (which this package links to) was deprecated by Facebook last week or so; I'm not sure what dependencies packages like the above have on atom-ide-ui.
I have never programmed in R before, but why do you say that there is no need for cells?
I use cells/notebooks in Python, so I can keep my code organized and run computationally intensive things once... Is this something that is not needed in R?
So firstly, you can use R in Jupyter in the exact same way you use Python (ju-pyt-er stands for Julia, Python, R).
Then R also has R Markdown, which lets you have notebooks with executable cells (code chunks), and these play much nicer with version control than .ipynb files.
What I was referring to in my previous post is working with a .R file (which is plain text) in RStudio. If my cursor is on a single line which is also one statement, ctrl/cmd + enter executes that statement and shows me the output in the console or in a separate pane for plots. If the cursor is within a multi-line expression such as a plot declaration, beginning of a loop, function declaration, then the interpreter figures out that I want to run multiple lines and executes the whole loop/declares function/creates plot. Or I can also select some code and run it.
Ideally, this is the kind of behaviour that I'd like to replicate with a .py file. It's a nice interactive workflow and also solves the problems that jupyter has with version control.
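To make the "figure out which lines belong together" part concrete, here is a minimal sketch of how an editor could pick the whole top-level statement around the cursor in a .py file using the standard `ast` module. It is not how RStudio or any existing extension does it; `statement_at` is a made-up helper, and it needs Python 3.8+ for `end_lineno`.

```python
import ast


def statement_at(source, line):
    """Return the source of the top-level statement covering a 1-indexed line."""
    tree = ast.parse(source)
    lines = source.splitlines()
    for node in tree.body:
        # Decorators sit above node.lineno, so include them in the span
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        if start <= line <= node.end_lineno:
            return "\n".join(lines[start - 1 : node.end_lineno])
    return ""  # cursor is on a blank line or a comment between statements


demo = """x = 1
for i in range(3):
    x += i
print(x)
"""
# Cursor on line 3, inside the loop body: the whole loop is selected
print(statement_at(demo, 3))
```

An editor command could send the returned text to a running interpreter, which is roughly the ctrl/cmd + enter experience described above.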
Interesting... I'm currently working on VSNotebooks (extension for VScode), which is a fork from Neuron...
I would love to get some ideas that could help bring notebooks into the future, so thanks for your reply!
I imagine it's because this is a web jupyter book or something. It's definitely extremely slow loading the page on mobile even after downloading the assets, so it's probably super unoptimized.
Right, we didn’t get around to splitting the js bundles up yet but will do so soon, thanks for the reminder. Currently the main bundle is the same for the page and the editor which you can try at https://nextjournal.com/try
Write your functions in regular Python modules and call them from the notebook. This works well for collaboration: team members can just fork the notebook, and the merges are mostly trivial.
This makes the notebook just a convenient way to visualize or share with non team members.
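A minimal sketch of that split, assuming a hypothetical module called `features.py` with a made-up `normalize` function:

```python
# features.py (hypothetical module name), kept under normal version control
def normalize(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# In the notebook, a cell then only imports and calls it:
#   from features import normalize
#   normalize([2, 4, 6])
```

If you edit the module while the kernel is running, IPython's `%load_ext autoreload` followed by `%autoreload 2` reloads it before each cell executes, so the notebook picks up changes without a restart.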
At my university I took a data science course where we had to work on Jupyter notebooks in groups, and merging was horrible. When we raised this with the course team, they recommended that we use Google Drive. I still think that the Jupyter notebook file format wasn't designed for collaboration.
R Markdown? I'm not sure I understand, the text between chunks is plain ol' Markdown. It can be overridden with LaTeX as needed. What parts feel hacky?
My point was that I don't want to work with markdown at all. Ideally, I'd like to work with a .py file in the same way that I can execute parts of a .R file in RStudio.
$ pip install --upgrade nbstripout # install nbstripout bin
$ nbstripout --install # install Git hook in current repo
Then, any .ipynb files that you check in will have their output stripped in the index (without affecting your working copy).
(Surprised it's not mentioned in the article.)
[1] https://github.com/kynan/nbstripout