
How to Version-Control Jupyter Notebooks - tosh
https://nextjournal.com/schmudde/how-to-version-control-jupyter
======
KerrickStaley
nbstripout [1] is my favorite tool for this. Installing it in your Git repo is
2 lines:

$ pip install --upgrade nbstripout # install nbstripout bin

$ nbstripout --install # install Git hook in current repo

Then, any .ipynb files that you check in will have their output stripped in
the index (without affecting your working copy).

(Surprised it's not mentioned in the article.)

[1] [https://github.com/kynan/nbstripout](https://github.com/kynan/nbstripout)
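
To make "output stripped" concrete, here is a minimal Python sketch of the idea. It is not nbstripout's actual implementation; it assumes an nbformat 4 notebook loaded as a plain dict, and the `strip_outputs` helper and the `example` notebook are made up for illustration.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    stripped = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Hypothetical two-cell notebook, trimmed to the fields that matter here.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

clean = strip_outputs(example)
print(json.dumps(clean["cells"][1], indent=1))
```

Only the source of each cell survives, which is why diffs of the committed file stay small.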

~~~
potatohead00
This is my current go-to solution. It does require collaborators to have it
installed as well.

~~~
StavrosK
If you use CI you can add such things there so any commits that didn't run
them fail CI. I do the same with the `pre-commit` utility, it's very very
handy for running checks repeatably.
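
A check of this kind could look roughly like the Python sketch below. It is an illustration only, not part of `pre-commit` or nbstripout: the function names are hypothetical, and it assumes the CI job or hook passes the notebook paths as arguments.

```python
import json


def has_outputs(notebook: dict) -> bool:
    """True if any code cell still carries outputs or an execution count."""
    return any(
        cell.get("cell_type") == "code"
        and bool(cell.get("outputs") or cell.get("execution_count") is not None)
        for cell in notebook.get("cells", [])
    )


def find_unstripped(paths):
    """Return the notebook paths whose outputs were not stripped."""
    dirty = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            if has_outputs(json.load(fh)):
                dirty.append(str(path))
    return dirty


def main(argv):
    """Print offending notebooks and return a process exit code."""
    dirty = find_unstripped(argv)
    for path in dirty:
        print(f"outputs not stripped: {path}")
    return 1 if dirty else 0


# In a real script the entry point would be: sys.exit(main(sys.argv[1:]))
```

A non-zero exit code is what makes the CI job or the hook fail.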

------
Eridrus
I haven't really followed Jupyter's development lately, so maybe this is
already happening, but I think what you really need is some concept of a
workspace, rather than a single file.

The problem with notebooks is that they get unwieldy: you want to keep a
bunch of code around that's only useful in certain cases, or the notebook just
starts doing "too much".

Sure, you can factor this code out into a library/function, but there's
nothing that makes that easy, and once you've made it into a library, there's
nothing that helps you easily make changes to that library in a different
notebook.

~~~
hadsed
Perhaps notebooks should have variables accessible as if they were modules.
This would solve my personal problem of building libraries.
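
As a rough illustration of what "notebook as module" could mean, the Python sketch below runs a notebook's code cells inside a fresh module object. This is an assumption about one possible approach, not an existing Jupyter feature: `module_from_notebook` and the `nb` dict are hypothetical, and cells containing IPython magics would not run this way.

```python
import types


def module_from_notebook(notebook: dict, name: str) -> types.ModuleType:
    """Execute every code cell of a notebook dict inside a new module."""
    module = types.ModuleType(name)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        # Plain Python only: IPython magics such as %matplotlib would fail here.
        exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


# Hypothetical notebook with one markdown cell and one code cell.
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# helpers"]},
        {
            "cell_type": "code",
            "source": ["SCALE = 10\n", "def scaled(x):\n", "    return x * SCALE\n"],
        },
    ]
}

helpers = module_from_notebook(nb, "helpers")
print(helpers.scaled(4))
```

The variables and functions defined in the cells end up as attributes of `helpers`, much like names in an imported module.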

Jupyter has great potential to be a new kind of IDE, it just needs more
resources.

------
kvm
Am I the only one who looks at this and thinks: "wtf, no, versioning notebooks
should not be this tedious?" instead of suggesting other horrendous ways of
versioning them?

~~~
schmudde
Absolutely. Point #4 offers a more sane alternative.

------
ezoe
Don't use Jupyter Notebooks for anything you want to version control.
They're like one-line Perl scripts: write, run, and delete.

If you need more than that, use plain-text source code.

Actually, just forget Jupyter Notebooks and use good old plain-text source
code like the rest of the programmers.

~~~
kortex
As a data scientist, I disagree strongly on this. Writing "typical"
application code, sure, jupyter is (probably) overkill. But for cv, nlp, data
sanitizing, etc, you are _constantly_ iterating over algorithms and visually
viewing the output. Multi-stage pipelines just require rerunning a cell.

Caching to disk is cumbersome for data that's usually junk.

Cells and integrated vis are such a massive leap forward that using plain old
text feels like banging rocks together.

~~~
aldanor
Pretty much this. As a quant / data scientist, I quite often have notebooks
just hanging there for weeks with a few hundred GB of ready-to-use data
preloaded and preprocessed in the kernel, which makes experimenting with it
incredibly ergonomic.

Being able to quickly check the output while iterating on an algorithm, or
to visualise intermediate results, is irreplaceable.

------
imbiased
It’s missing that one can use Hydrogen and avoid this problem:
[https://nteract.io/](https://nteract.io/).

Export isn’t great atm but can be combined with pweave:
[http://mpastell.com/pweave/docs.html](http://mpastell.com/pweave/docs.html)

I think VSCode has something similar.

This gives another advantage of using a proper editor and its entire
ecosystem.

~~~
altairiumblue
With the VSCode Python extension you can directly create cells with #%% in a
similar way to Hydrogen. There is also Neuron which allows you to see outputs
in a separate pane.

I'm still struggling to find a setup in which cells are auto-generated (or
unnecessary like in RStudio) and the autocomplete works as well as in
JupyterLab. If I could reliably see all methods/submodules/inline
documentation + path autocomplete quickly and for all packages, I would switch
to VSCode. (There's a good chance that this is just due to me not being fully
aware of what's available in VSCode.)
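
For readers who haven't seen the cell-marker style mentioned above, a plain .py file using it might look like the sketch below. The data and the cell titles are made up for illustration, and the file also runs top to bottom as an ordinary script.

```python
# %% Load data (a hypothetical in-memory list standing in for a real dataset)
values = [3, 1, 4, 1, 5, 9, 2, 6]

# %% Summarise
mean = sum(values) / len(values)
print(f"mean = {mean}")

# %% Transform: re-run only this cell while iterating on the threshold
threshold = 4
above = [v for v in values if v > threshold]
print(above)
```

Because the markers are just comments, the file diffs and merges like any other Python source.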

~~~
pavanagrawal123
I have never programmed in R before, but why do you say that there is no need
for cells?

I use cells/notebooks in Python, so I can keep my code organized and run
computationally intensive things once... Is this something that is not needed
in R?

~~~
altairiumblue
So firstly, you can use R in Jupyter in exactly the same way you use Python
(Ju-pyt-er stands for Julia, Python, R).

R also has R Markdown, which lets you write notebooks with executable cells
(code chunks), and these play much nicer with version control than .ipynb
files.

What I was referring to in my previous post is working with a .R file (which
is plain text) in RStudio. If my cursor is on a single line that is also one
statement, ctrl/cmd + enter executes that statement and shows me the output in
the console, or in a separate pane for plots. If the cursor is within a
multi-line expression such as a plot declaration, the beginning of a loop, or a
function declaration, the interpreter figures out that I want to run multiple
lines and executes the whole loop, declares the function, or creates the plot.
I can also select some code and run it.

Ideally, this is the kind of behaviour that I'd like to replicate with a .py
file. It's a nice interactive workflow and also solves the problems that
jupyter has with version control.

~~~
pavanagrawal123
Interesting... I'm currently working on VSNotebooks (extension for VScode),
which is a fork from Neuron... I would love to get some ideas that could help
bring notebooks into the future, so thanks for your reply!

------
westurner
Mentioned in the article: manual nbconvert, nbdime, ReviewNB (currently GitHub
only), jupytext.

Jupytext includes a bit of YAML in the e.g. Python/R/Julia/Markdown header.
[https://github.com/mwouts/jupytext](https://github.com/mwouts/jupytext)
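
For illustration, the Python sketch below embeds a script with a header of roughly that shape and separates the header from the code. The header fields shown are an approximation (real files typically carry more, such as version fields, and the details vary by Jupytext version), and `split_header` is a hypothetical helper, not part of Jupytext.

```python
# A text notebook as it might appear on disk; the header lines are ordinary
# Python comments wrapping YAML. The field values are illustrative only.
SCRIPT = '''\
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# %%
print("hello")
'''


def split_header(text):
    """Split a script into (YAML header lines, remaining body lines)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "# ---":
        return [], lines  # no header present
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "# ---":
            # Drop the leading "# " so the header reads as plain YAML.
            header = [
                l[2:] if l.startswith("# ") else l.lstrip("#")
                for l in lines[1:i]
            ]
            return header, lines[i + 1:]
    return [], lines  # opening marker without a closing one


header, body = split_header(SCRIPT)
print("\n".join(header))
```

Since the metadata lives in comments, the file remains a valid script that any Python tool can read.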

~~~
nerdponx
Huge +1s for both nbdime and jupytext. Excellent tools both.

------
betatim
nbdime -
[https://nbdime.readthedocs.io/en/latest/](https://nbdime.readthedocs.io/en/latest/)
works very well for me, including in the terminal with `git diff`
([https://nbdime.readthedocs.io/en/latest/vcs.html#usage](https://nbdime.readthedocs.io/en/latest/vcs.html#usage)).
I wanted to highlight that it integrates well with `git diff`, which is my
favourite part of nbdime but is skipped in the article.
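
To show why a notebook-aware diff reads better than a raw JSON diff, here is a small Python sketch that compares only the cell sources of two notebook dicts. It is not nbdime's algorithm, just a stdlib illustration with `difflib`; the helper names and the sample notebooks are hypothetical.

```python
import difflib


def cell_sources(notebook: dict) -> list:
    """Flatten a notebook dict into source lines, one marker line per cell."""
    lines = []
    for idx, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        lines.append(f"# cell {idx} ({cell.get('cell_type')})")
        lines.extend(source.splitlines())
    return lines


def source_diff(old: dict, new: dict) -> list:
    """Unified diff of cell sources only; outputs and metadata are ignored."""
    return list(
        difflib.unified_diff(
            cell_sources(old),
            cell_sources(new),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


old_nb = {"cells": [{"cell_type": "code", "source": ["x = 1\n"], "outputs": ["a"]}]}
new_nb = {"cells": [{"cell_type": "code", "source": ["x = 2\n"], "outputs": ["b"]}]}

print("\n".join(source_diff(old_nb, new_nb)))
```

Two notebooks that differ only in their outputs produce an empty diff here, which is the kind of noise reduction a dedicated tool provides in a far more complete way.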

------
abakus
Just write python scripts instead in the first place. Import to Jupyter if you
really, really need the notebook UI.

~~~
prepend
Write your methods in Python files and call them from the notebook. This works
well for collaboration: team members can just fork the notebook, and merges
become fairly trivial.

This makes the notebook just a convenient way to visualize results or share
them with non-team members.
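
A minimal sketch of that split, with hypothetical names: the function would live in a normal module (called `analysis.py` here purely for illustration), and the notebook cell would only import and call it. Both halves are shown in one block so the example runs as-is.

```python
# --- analysis.py (hypothetical module, reviewed and merged like any code) ---
def normalise(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid dividing by zero on constant input
    return [(v - lo) / (hi - lo) for v in values]


# --- notebook cell ---
# In a real project this cell would start with: from analysis import normalise
result = normalise([2, 4, 6])
print(result)
```

While iterating on the module, IPython's autoreload extension (`%load_ext autoreload` followed by `%autoreload 2`) can pick up edits without restarting the kernel.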

------
andrethegiant
Holy moly, the JS main bundle on this site is 16MB.

~~~
devwastaken
I imagine it's because this is a web jupyter book or something. It's
definitely extremely slow loading the page on mobile even after downloading
the assets, so it's probably super unoptimized.

~~~
kvlr
Right, we didn’t get around to splitting the js bundles up yet but will do so
soon, thanks for the reminder. Currently the main bundle is the same for the
page and the editor which you can try at
[https://nextjournal.com/try](https://nextjournal.com/try)

------
ognarb
At my university I took a data science course where we had to work on Jupyter
notebooks in groups, and merging was horrible. When we raised the concern with
the course team, they recommended that we use Google Drive. I still think the
Jupyter notebook file format wasn't designed for collaboration.

------
altairiumblue
Do we already have something with the quality and workflow of RStudio but for
Python?

Spyder and Rodeo don't even come close at this point. Does PyCharm allow
something similar?

~~~
RA_Fisher
RStudio and R Notebooks work great with Python:
[https://cran.r-project.org/web/packages/reticulate/vignettes...](https://cran.r-project.org/web/packages/reticulate/vignettes/r_markdown.html)

~~~
altairiumblue
Yes, but it still feels hacky and only supports R markdown.

~~~
RA_Fisher
R markdown? I'm not sure I understand; the text between chunks is plain ol'
markdown. It can be overridden with LaTeX as needed. What parts feel hacky?

~~~
altairiumblue
My point was that I don't want to work with markdown at all. Ideally, I'd like
to work with a .py file in the same way that I can execute parts of a .R file
in RStudio.

~~~
RA_Fisher
I hear ya. I think you can do that in RStudio, but I've only tried with .R
files.

~~~
RA_Fisher
Yep, looks possible with code completion in the next iteration:
[https://twitter.com/grahamimac/status/1076881510194651136?s=...](https://twitter.com/grahamimac/status/1076881510194651136?s=21)

------
jimhefferon
I have a TOC that covers the right third of the page. Do other folks have
that? Does the document author not see it?

~~~
rasteau
It's an issue with the responsive layout. If the window is wide enough, the
overlap goes away; if it is narrow enough, the TOC goes away.

