
I can change parameters in a script. What's the advantage?



Remember back in university, we used to do course projects (for science/engineering courses) and at the end we would have to write a report. The report would typically include, among other things, two parts.

(1) The computational methods we used, which could be either a set of mathematical equations or a description of the algorithms. (2) The results of evaluating those equations/algorithms for different parameter values, usually as graphs, with some discussion of their meaning.

A Jupyter notebook is designed to replicate that process, but makes it easier because the figures are produced by the code right there. Personally, all my notebooks include a discussion in the markdown cells of what I am doing, and why. That includes discussion of the code, and, directly from the code, some graphs or numbers with a discussion attached.

With the script workflow, I would have two different files: one with the code, and one with the results pasted in. That's annoying when my primary goal is to develop and test the algorithms under discussion. The best thing is, if done right, my work is completely replicable. Just run the notebook again.

Just because some people misuse the tool doesn't mean the tool isn't useful.


Sometimes you want to quickly iterate on a portion of code doing some sort of analysis, or to tweak a plot. If your data pipeline takes a while to run, then rerunning the whole script is really awful. Notebooks make it easy to cache, rerun, and tweak chunks of the code.


You can't embed graphs in a script, and plotting is an important part of systematic research. Also, it is easier to have an obvious sequential set of experiments over months in a notebook rather than a bunch of scripts. It's the same reason scientists use lab notebooks to keep track of things rather than just a bunch of loose papers.


Yes, I have a lab notebook and I work in a lab. However, I use folder structures to organize my scripts, and as for graph outputs, I save them as files. In my line of work I generate hundreds of graphs for data-analysis verification, and a notebook isn't set up to handle things like that as far as I can tell.


You are asking what the advantage of a notebook is. Can you truly not see a job / workflow in which inline plotting would be advantageous? I find that very hard to believe.


My workflow normally starts in the shell (SQL, R/rush/ggplot, csvtk, etc.), moves to Jupyter (F# in .NET Interactive, with some SQL and R/ggplot inlined), and then to a Makefile when I’m ready to make some deliverables.

I’ve got an ever growing tool set of F# data related functions that I’ve moved to a personal lib in my dotfiles and use in scripts, notebooks, etc.


There’s also a hybrid version that I like: write proper scripts (which are easier to e.g. push to HPC servers or deploy later), and explore them using something like Quarto (née RMarkdown, renamed now that it supports Python).

Then you still get a digital lab notebook that ties together scripts, plots, and documentation, but the scripts remain usable standalone.


Another advantage is that when you have very slow code, you can use cells as caches, essentially without having to worry about serialization to disk. This often makes it much easier to interactively explore/develop downstream methods without needing to re-run earlier upstream dependencies.

This is especially useful with large datasets. Even if serialization is straightforward, if you have enough data (or the data is remotely hosted), loading it might take anywhere from 2s to multiple minutes, and even 2s is enough to get you out of the flow if you are working rapidly and want quick feedback.
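
A minimal sketch of the pattern (the slow step is simulated with a sleep here; in a real notebook it would be the multi-gigabyte read or remote fetch):

    # Cell 1: the slow part -- run once; the result stays in kernel memory
    import time
    import pandas as pd

    time.sleep(5)  # stand-in for loading a large or remotely hosted dataset
    df = pd.DataFrame({"x": range(1_000_000), "y": range(1_000_000)})

    # Cell 2: downstream work -- edit and re-run freely without paying the load cost again
    print(df.describe())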


This.

It's also useful when one of your cells goes and queries a slow API for a bunch of data — I do this all the time with Datadog.


You can rerun something halfway down a script without re-running the whole thing.

If your script requires loading 12+ GB of ML models into a GPU before running anything at all, this is the difference between a few seconds and a minute to see a change. Also, if the output isn't text, you can see the image or chart result inline with that code.


I am guessing you don’t do data science or data forensics? Inline styled tabular outputs and graphical plots while exploring data are very handy!

Edit: jinx!


They’re a REPL where you can go back and edit and re-run earlier parts much more easily than in a normal REPL.


I think the workflow improvement happens but it's not because notebooks allow you to do something that you can't do otherwise. They just improve ergonomics.

For example, there are a lot of cases where my team uses notebooks for proofs of concept: we make a large, expensive call to load a big chunk of data, slice off a small piece of it, iteratively try to reprocess the piece until the reprocessing comes out the desired way, validate that it reprocessed correctly, and then extend the reprocessing to the rest of the data set. That can all be done after only making one expensive call. Furthermore, if the last cell evaluation fails, it just resets you back to the line before and you can retry it.

Can you do this with a script? Absolutely. You can write a script to download the data, a script to process the data, and sub-scripts for the individual steps. But that's not the path of least resistance; with a script, the path of least resistance involves editing a piece, recompiling everything, and resetting the entry point. Avoiding that really makes it easier to brute-force your way to the desired state ASAP.
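
Roughly the shape of it, as a sketch; the expensive call is faked with a sleep, and the "reprocessing" is just a made-up transformation:

    # Cell 1: the one expensive call -- made once per session
    import random
    import time

    def expensive_load():
        time.sleep(2)  # stand-in for a multi-minute query or download
        return [{"id": i, "value": random.random()} for i in range(1_000_000)]

    rows = expensive_load()

    # Cell 2: slice off a small piece and iterate on the reprocessing
    sample = rows[:1000]
    processed = [{"id": r["id"], "score": r["value"] * 10} for r in sample]  # tweak and re-run just this cell

    # Cell 3: once it validates, extend to the full data set without repeating cell 1
    processed_all = [{"id": r["id"], "score": r["value"] * 10} for r in rows]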


Immediate(ish) visualization and a whole lot of tooling to make presentation palatable for some datasets/types


You would have to re-run the script from the beginning. This is not productive in scientific experiments, where you need to re-run certain parts of your code, tune/change parameters, and try different things.

If your calculation is long-running, you would not be as productive as you could be in a notebook.


An alternative here is to make a script with `# %%` code cells, so you can send one cell at a time to a REPL while developing the script.
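
For anyone who hasn't seen it, it looks something like this (a plain .py file; editors such as VS Code or Spyder treat each `# %%` as a cell you can send to the kernel on its own):

    # analysis.py -- an ordinary script that also works cell-by-cell

    # %%
    import numpy as np
    import matplotlib.pyplot as plt

    data = np.random.default_rng(0).normal(size=10_000)  # stand-in for a slow load

    # %%
    # Re-run just this cell while tweaking the plot; `data` above stays in memory.
    plt.hist(data, bins=50)
    plt.title("Distribution of values")
    plt.show()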


I think you've just reinvented the notebook.


On the contrary, it’s a backport of the (IMO) most useful notebook feature to work also for regular scripts. It’s been ported to most code editors by now (VSCode “Interactive Python”, Vim “hydrogen”, Emacs “code-cells.el”, Sublime “Send Code” extension, Spyder has it built-in, etc.), so a lot of people have found that feature useful.

In contrast to Jupyter, you’re still working with plain text files and not JSON, and you don’t end up saving the cached data in the same file as the script.

In contrast to Quarto, Jupytext, etc., this is still just a code file and not a Markdown file with code blocks. Not all editors have fully working “go to definition” etc. in Markdown code blocks, and in any case, many people need a standalone script that can be placed in an HPC job queue after initial local testing is done.


It’s a REPL, for starters.


REPL stands for Read–eval–print loop https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93prin...

A REPL, by its name, is a very narrow version of the broader paradigm of interactive computer programming environments. But notebooks are not REPLs, unless you use "REPL" to mean "interactive programming environment" rather than a REPL in the strict sense. Notebooks are much broader than a REPL! In a notebook, you can go back and edit and run individual lines of the notebook without re-running the whole notebook from the start and without re-computing everything that depends on what you just edited. Behavior like this makes it super hard to track the actual state, and super easy to lose track of how things got the way they are. That's pretty terrible!

The parent article links this great talk that goes into more detail than the parent post and is much easier to understand: https://www.youtube.com/watch?v=7jiPeIFXb6U


"In a notebook, you can go back and edit and run individual lines of the notebook without re-running the whole notebook from the start and without re-computing everything that depends on what you just edited."

Isn't this a standard on REPLs as well? You can select the code you wish to run, and press Ctrl+Enter or whatever. I must admit, I've programmed Python for about 10 years in Spyder and VS Code now, but I haven't used notebooks at any point. Just either ad-hoc scripts or actual source files.

My definition of a "notebook" is an ad-hoc script, split into individual "cells" which are typically run as a whole. In my workflow, I just select the code I wish to run. Sometimes it is one expression, one line, 100 lines, or 1000 lines, depending on what I've changed in the script.


> Isn't this a standard on REPLs as well? You can select the code you wish to run, and press Ctrl+Enter or whatever.

Not usually, no. Type `python` at the command prompt - what you get is a REPL. Type `clisp` at the command prompt, or `wish`, or `psql`, or `perl` or even `bash` - those are all REPLs.

Very different to a program that presents an editor, and then lets the user selectively choose which lines/expressions in that editor to run next. For example, type `emacs somefile.sql` in the command prompt. The application that opens is most definitely not a READ-EVAL-PRINT-LOOP.


Why would adding fancy select or cut-and-paste features to a REPL make it not a REPL? Selectively choosing which lines to run is just a convenience to let you not have to type the whole line or set of lines again, it doesn’t really change the base interaction with the interpreter.


> Why would adding fancy select or cut-and-paste features to a REPL make it not a REPL?

For the same reason that adding (working) wings to a car makes it not a car anymore.[1]

I mean, to my mind, when something is satisfying a different primary use-case, then that thing is a different thing.

I'm sure there's some fuzziness in the distinction between "This is a REPL/car and this is a Notebook/plane".

Usually it's very easy to see the distinction - the REPL is waiting for the next command and the next command only while the notebook takes whatever input you give it, determines whether it got a command or content, and reacts appropriately.

[1] Tons of examples, TBH. I don't refer to my computer as my calculator, even though the computer does everything a fancy calculator can do. People don't call motorcycles 'bicycles', even though the motorcycle can go anywhere that a legal bicycle can go. More telling is how people don't call their computer monitor 'TV' and don't call the TV a 'Monitor' even when the same actual item is used for both (i.e. I repurposed an old monitor as a small-screen netflix-box, and now an item that used to be called 'monitor' by wife and kids is called 'TV' by wife and kids).


A flying car with wings is still a car; that's the whole point of it. It can drive you to the airport and drive like a car. I don't care what you call your computer, it can still do math. People who have their TV hooked up to their computer would more readily refer to it as a monitor. Idk, I just think REPLs are kinda shit for interacting with the present state of a kernel (as Jupyter calls them). Jupyter's better, but still kinda shit, because it could automatically infer the important variables in scope and keep a watch list like a debugger does, and then suggest things to do with them, since it's a REPL and not an IDE. But the thing is, fundamentally they're Read Edit Print Loop interfaces to the computer and its current working state.


Ugh, this is a gish gallop of broken straw-man analogies. Being able to select and evaluate a single line in a notebook is nothing like adding wings to a car. Fundamentally, selecting a line to evaluate is no different from typing that line again. It's a shortcut and nothing more; the interaction is still read-eval-print. Note that REPL doesn't even refer to where the input comes from; the point is simply that it handles your input, processes it, displays the result, and then waits for more input. This is as opposed to executing a file, where the interpreter exits once execution is completed and the return value is not automatically printed.

Jupyter Notebook absolutely is a REPL; see my sibling comment above for the WP link describing it as such. It waits for input, then evals the input, then prints the return value, and then loops.


I’m not sure what distinction you’re trying to make. Maybe you can give some examples of notebooks that are not REPLs, since some of them definitely are. For example, Wikipedia says Jupyter Notebook is a REPL. The bare Python REPL (and the command line REPLs in any language, for that matter) has the exact same issue with tracking state, because what you describe is a problem with all REPLs, and all notebooks that are REPLs. That isn’t generally a serious problem with command line REPLs, because those REPLs aren’t meant or used for large system programming, they’re for trying small experiments. The parent article is pure opinion and seems a bit confused about the idea of using the right tool for the job, because command line REPLs and notebooks both have their place, as do IDE projects with lots of files.

“A Jupyter Notebook application is a browser-based REPL containing an ordered list of input/output cells which can contain code, text (using Github Flavored Markdown), mathematics, plots and rich media.”

https://en.m.wikipedia.org/wiki/Project_Jupyter


In modern parlance "REPL" means a lot more than that and typically incorporates at least some kind of editable history.


Formatted markdown, embedded images, graphs, plots, etc.

Can't really do that in a script unless you're running TempleOS



