Jupyter Receives the ACM Software System Award (jupyter.org)
714 points by williamstein on May 2, 2018 | 133 comments

The Jupyter team deserves every accolade they get and more. The console, notebook, and now JupyterLab are some of the key reasons why Python's data ecosystem thrives.

I think Jupyter notebooks are quite useful as "rich display" shells. I often use them to set up simple interactive demos or tutorials to show folks or keep notes or scratch for myself.

That being said, I do think the "reproducibility" aspect of the notebook is overblown for the reasons other comments cite. Notebooks are hard to version control and diff, and are easy to "corrupt." I often see Jupyter notebooks described as "literate programs," and I really don't think that's an apt description. The notebook is basically the IPython shell exposed to the browser where you can display rich output.

This is where I think the R ecosystem's approach to the problem is better (a bit like org-mode & org-babel). For them, there is a literate program in plain text. Code blocks can be executed interactively and results displayed inline by a "viewer" on the document (like that provided by RStudio), but executing code doesn't change the source code of the program, and diffs/versions are only created by editing the source. At any point, the file can be "compiled" or processed into a static output document like HTML or PDF.

This is essentially literate programming but with an intermediate "interactive" feature facilitated by an external program. RMarkdown source doesn't know it's being interacted with or executed, and you can edit it like any other literate program.
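To make the "plain-text source plus external executor" idea concrete, here is a minimal sketch in Python. It is not how knitr or RStudio work internally; the `weave` function, the fence handling, and the "Output:" layout are all assumptions made up for illustration. The point it shows is that the source text is only read, never rewritten, and the rendered document is a separate artifact.

```python
import contextlib
import io

FENCE = "`" * 3  # built at runtime so the sample document needs no literal fence


def weave(source: str) -> str:
    """Return a copy of `source` with each python chunk's stdout appended after it.

    The input text is never modified; chunks share one namespace, in order.
    """
    namespace = {}
    out_lines = []
    chunk = None  # list of code lines while inside a python chunk, else None
    for line in source.splitlines():
        if chunk is None:
            out_lines.append(line)
            if line.strip() == FENCE + "python":
                chunk = []
        elif line.strip() == FENCE:
            out_lines.append(line)
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)
            result = buffer.getvalue().rstrip()
            if result:
                out_lines.extend(["", "Output:", ""])
                out_lines.extend("    " + r for r in result.splitlines())
            chunk = None
        else:
            out_lines.append(line)
            chunk.append(line)
    return "\n".join(out_lines)


# A hypothetical plain-text source document with one executable chunk.
document = "\n".join([
    "The mean of the sample is computed below.",
    "",
    FENCE + "python",
    "data = [1, 2, 3, 4]",
    "print(sum(data) / len(data))",
    FENCE,
])

woven = weave(document)
```

Only `document` would live in version control; `woven` can be regenerated at any time, which is why diffs stay limited to deliberate edits of the source.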

Interaction, reproducibility, and publication have fundamental tensions with each other. Jupyter notebooks are trying to do all three in the same software/format, and my sense is that they're starting to strain against those tensions.

Notebooks can be reproducible; they just aren’t automatically so. It requires a little bit of effort and discipline, if reproducibility is a goal. https://www.svds.com/jupyter-notebook-best-practices-for-dat... is an excellent starting point. Personally, I use notebooks to keep a record of large computational pipelines. The key is to cache all results to disk. This allows for an iterative process where I modify the notebook, kill the kernel, and rerun everything. Only new calculations will be executed; everything previously calculated will simply be loaded from disk. In the end, I have a reproducible record of the entire project (and rerunning the notebook is fast). This kind of make-like functionality is implemented through the doit Python package (http://pydoit.org). An example workflow for this is http://clusterjob.readthedocs.io/en/latest/pydoit_pipeline.h...
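As a rough illustration of the "cache all results to disk" idea, here is a minimal sketch using only the Python standard library. It is not the doit API and not the workflow from the links above; the `cached` decorator, `CACHE_DIR`, and `expensive_step` are hypothetical names. A temporary directory is used so the sketch has no lasting side effects; a real project would presumably use a fixed directory so the cache survives a kernel restart.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp(prefix="nb_cache_"))  # hypothetical location
call_log = []  # records which steps actually ran, for demonstration only


def cached(func):
    """Cache a function's return value on disk, keyed by name and arguments."""
    @functools.wraps(func)
    def wrapper(*args):
        key = hashlib.sha256(repr((func.__name__, args)).encode()).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():
            return pickle.loads(path.read_bytes())  # load instead of recomputing
        result = func(*args)
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper


@cached
def expensive_step(n):
    call_log.append(n)
    return sum(i * i for i in range(n))


first = expensive_step(1000)   # computed, then written to disk
second = expensive_step(1000)  # loaded from disk; the function body is skipped
```

Note that this sketch keys the cache on the function name and arguments only, so it will not notice when the function body changes. Tracking that kind of dependency is what a make-like tool adds.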

So basically you write a script instead of a notebook? If you save data on disk, is it still displayed with the rich formatting of Jupyter?

Well, it also contains markdown comments (often with LaTeX formulas, so the graphical rendering is appreciated), and, most importantly, plots of the results (which are typically fast to generate, so they are not cached).

I agree, 120%.

I like the R approach so much more.

I mean, as a medium for interactive exploration where you might want graphs and widgets or other rich/dynamic output, I still think the notebook is superior. But as a medium for developing complete, share-able, reproducible data analyses, I do think R has the upper hand.

Graphs, widgets, and other rich/dynamic output are also possible with the R approach.


Additionally, RStudio is an incredibly powerful IDE for data analysis.

EDIT: Interestingly, however, I still use ESS https://ess.r-project.org/ but that's because I love Emacs too much :D

I understand. I believe I pointed all that out in my comment above. I wasn't saying that I find notebooks superior because they allow for rich & dynamic output, but that I find them superior to RStudio when all you want is a quick exploratory REPL capable of rich/dynamic output. I simply find it easier to fire up a notebook and start noodling around than to write an RMarkdown notebook. That really only holds if I'm not overly concerned with keeping or sharing the notebook. Otherwise, I believe RMarkdown is the better option.

I also tend to gravitate towards ESS, and probably split my R development time between Emacs and RStudio. I've even written a very kludgy Rmd notebook mode that uses overlays to show evaluation results from code chunks. But RStudio is very well-designed and ESS just doesn't compare feature-wise, sadly.

I just like python pandas better than R.

Not me. I'd take dplyr and related libraries over pandas any day. I've been using pandas for 6 years and I'm still regularly tripped up by parts of its API.

So well deserved. Jupyter is critical infrastructure, helping the scientific community address the reproducibility crisis. It's so great that it's free and open source, and can be shared, used and contributed to by scientists all around the world.

Not only that, it's great for tutorial material as well!

Trivia: Ju-py-ter = Julia Python Terminal

Although it's gone far beyond just Julia and Python now.

Edit: Ahurmazda is right.

...the core programming languages supported by Jupyter are Julia, Python and R. While the name Jupyter is not a direct acronym for these languages, it nods its head in those directions. In particular, the "y" in the middle of Jupyter was chosen to honor our Python heritage.


Also worth noting: Hydrogen, the Jupyter plugin for Atom[1][2].

Hydrogen is a fantastically well-executed and useful piece of software. (As is Jupyter.) I’ve used it in my own programming, and also in teaching introductory software development, where it’s a helpful transition from Jupyter notebooks (which are used in the early part of the course, for problem sets / reading journals) to text files and code editors (which are introduced later, and used for team projects, and for projects that use Flask or PyGame).

But also — relevant to this sub-thread — “Hydrogen” (it relates “Jupyter” to “Atom”) has to be one of the best project names ever. It’s right up there with “Pyramid Scheme”[3].

[1] https://nteract.io/atom

[2] https://atom.io/packages/hydrogen

[3] http://www.michaelburge.us/2017/11/28/write-your-next-ethere...

ahurmazda pointed out that you're thinking of 'Jupyter Python R'. And this is a reference, not an 'equals'.

The name is also in homage to Galileo's notebooks recording the discovery of the moons of Jupiter (the four we now call the 'Galilean moons'), and also to a bar called Jupiter in Berkeley, which the core team has visited quite often. The last one's more like a funny coincidence, though.

That Quasar beer at Jupiter... Honestly, all the beer at Jupiter. Mmmmm.

XKCD Galilean moons. https://xkcd.com/1300/

Congrats BTW.

Julia Python R

Is it pronounced jupeeter or jupieter?

And is NumPy "Numb Pie" or does it rhyme with "lumpy"? (My apologies to everyone who will picture something lumpy next time they're doing a bunch of math).

I've always heard "Numb Pie"

I think it's pronounced numpy...

That said, I've always rhymed it with lumpy, haha

In conference talks, most people (including NumPy's creators) pronounce it "numb pie".

I pronounce it like the planet, and everyone else I've met in Jupyter leadership (including Fernando and Brian) also pronounces it like the planet.

I just pronounce it the same as the planet Jupiter.

Oh, I say the Py part as in Python... Is it wrong?

Yes. The creators pronounce it exactly like "Jupiter", where the unstressed middle "i" vowel becomes a schwa.

Fernando Perez says "Jupiter". It's his baby, "Jupiter" it is.

Like the planet

Jupyter is awesome, well deserved award. Our CTO is working to integrate JupyterLab and JupyterHub directly into GitLab.

Getting a sensible git-diff would be great!

nbdime can help! NoteBook DIff and MErge. https://github.com/jupyter/nbdime

Thanks, we love GitLab as well! We would appreciate your feedback on how to make integration easier, and a productive and constructive discussion. Will we see GitLab at JupyterCon this August in NY?

We probably won't be at JupyterCon, but my CTO would love to have a call. Is it OK if Eliran, responsible for partnerships, reaches out to you for a video call?

Sure. We have team members across the bay in Berkeley; we can also cross over to see you or vice versa.

Great, we'll make it happen. It will probably be a video call since our CTO and co-founder is living in the Ukraine.

fyi - Why Ukrainians Hate When You Say 'The Ukraine' http://www.newsweek.com/why-ukrainians-hate-when-you-say-ukr...

My bad, thanks for the hint.

Does this mean Git and Jupyter notebook (or lab in the future) will play nice? Separation of code and results? Or... what does it mean that Jupyter integrates into Gitlab? FWIW I love both Jupyter and Gitlab and thus by extension, you guys.

Thanks for asking. As a first step we want to make it easy to deploy JupyterHub from GitLab to any attached Kubernetes cluster. The other thing Dmitriy will try to do is make sure the deployed Jupyter will integrate simply with the GitLab repo.

Is there a way to associate and store large data sets, as well? I don't necessarily want them in git (hundreds of GB, for example), but I do want my notebooks to see them...

Can you talk about what this means? Is it just a SAAS offering with SSO? Or will it do binderhub type functions?


Oh wow. Oh wow. Oh wowowow!

Is that a simple wow or a reference to the last words of Steve Jobs? https://www.theguardian.com/technology/2011/oct/31/steve-job...

I started using Jupyter + Python recently, can't say enough good things about the project.

Sometimes you want to present data, graphics and have a bit of interactivity. The notebooks make it easy to share your code/data/graphics. And it beats a PowerPoint any day (for this use case anyway).

Thanks Jupyter Team!

Am I the only person in the universe who doesn't like Jupyter? I much prefer tools like Rmarkdown and Sweave.

I've been using Python for 15 years, and I also haven't gotten into Jupyter. I worked with some research scientists who used it, but even they ported their programs to plain text files in order to get version control with textual diffs (which is IMO better for collaboration.)

One reason I don't use it is that I started doing data science before the Python ecosystem was viable -- before Pandas existed. I use R for data science (which I generally find superior due to Hadley Wickham's libraries).

I know Jupyter supports R now, but I already had a terminal/web-based workflow by the time that happened.


More importantly, I think this recent blog post finally crystallized why I don't program in REPLs: Because they encourage global variables! I naturally structure my code into functions from the outset.

I don't like the persistence because it can lead to "wrong" programs. I prefer to test my programs with a "clean slate", i.e. by starting a new process.


Jupyter’s structure of delimited code cells enables a programming style where each can be treated like an atomic unit, where if it completes, then its effects are persisted in memory for other code cells to process.

However, this style of programming with Jupyter has its limits. For example, Jupyter penalizes abstraction by removing this interactive debuggability.

In other words, if you put all your code in functions like I do, then Jupyter doesn't add anything. It doesn't let you "step through" the function like a debugger does.

Though, I think that I should somehow try to get over this because there are a lot of benefits to something like Jupyter, like having graphics inline.

Or maybe Jupyter just needs an integrated debugger? And maybe the ability to clear state or tree-walk definitions? I don't like having unused definitions laying around in my workspace.

Also, does Jupyter have any notion of data flow? I don't think it does, because Python doesn't. I think Observable might address some of my gripes, but I haven't tried it yet:


Your complaints resonate strongly with me.

I'd really love an ideally typed expression-oriented language to take notebook programming to the next level ... This is my dream for what I want from swift notebooks ...

Something that:

- memoized all (non-loop?) values created in the notebook file scope

- automatically invalidated memoized entries after source change using control flow analysis and code coverage data from previous runs

- provides an approximation of the conceptual model of re-running the whole notebook 'from the beginning' when the notebook is executed -- but pulls memoized values from the memoized cache when present to make execution fast and to avoid repeating side-effects

- allowed for easy to issue interactive 'invalidate all memoized entries before/after here' operations in the notebook file ...

- involved editing code normally in a regular source file in an IDE (maybe with a different extension to imply the different execution semantics)

- allowed for execution of code in debugger when desired without having to change anything ...

- supported inline graphical representations

- supported wiring custom graphical ui models into the notebook's execution context ...
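A very rough sketch of the first few items on this wish list, in Python rather than a typed language: run the notebook "from the beginning" every time, but pull results from a cache when a cell and everything before it are unchanged. All names here (`run_notebook`, `cache`, `executed`) are hypothetical, and the invalidation is far cruder than the control-flow analysis described above: it simply chains hashes of the cell sources, so editing a cell invalidates that cell and every later one. It also assumes cells rebind variables rather than mutate objects in place.

```python
import hashlib

cache = {}     # chained source hash -> variables that the cell (re)bound
executed = []  # indices of cells that really ran, for demonstration only


def run_notebook(cells):
    """Run cells top to bottom, reusing cached results for unchanged prefixes."""
    namespace = {}
    key = ""
    for index, source in enumerate(cells):
        # Chain hashes: editing a cell invalidates it and every later cell.
        key = hashlib.sha256((key + source).encode()).hexdigest()
        if key in cache:
            namespace.update(cache[key])  # replay the stored bindings
            continue
        before = dict(namespace)
        exec(source, namespace)
        executed.append(index)
        cache[key] = {
            name: value
            for name, value in namespace.items()
            if name != "__builtins__"
            and (name not in before or before[name] is not value)
        }
    namespace.pop("__builtins__", None)
    return namespace


cells = ["x = 10", "y = x * 2", "z = y + 1"]
first = run_notebook(cells)   # runs cells 0, 1 and 2
cells[1] = "y = x * 3"        # edit the middle cell
second = run_notebook(cells)  # reuses cell 0, reruns cells 1 and 2
```

Because cached cells are never re-executed, their side effects are not repeated, which matches the "avoid repeating side-effects" point in the list.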

Fantastic list!

I'd encourage you to give https://beta.observablehq.com a try. From all of your points — aside from a statically typed language and editing in your normal IDE — we try to hit that target on the nose.

Every cell is only reevaluated when any of its inputs changes, inline and custom graphical representations can render your live data — and even be used as values to be passed themselves as inputs to other cells. For a very simple example, see: https://beta.observablehq.com/@mbostock/d3-brushable-scatter...

Yes, in short, I think we both want the "dataflow model" and not the "stateful update" model.

If anything should use the dataflow model, it's data analysis!!!

And yes that's why I mentioned Observable, and I'm glad jashkenas also responded. As far as I understand, it's like a spreadsheet, so when you update your inputs, the outputs become consistent automatically.
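For anyone who hasn't seen the spreadsheet-style model, here is a toy sketch of it in Python. It is not how Observable is implemented; the `Dataflow` class and its method names are invented for illustration. It assumes inputs are set before the formulas that use them, and it does no topological sorting, so a cell with two changed ancestors may recompute more than once and a cycle would recurse forever.

```python
class Dataflow:
    """A tiny spreadsheet-like graph: cells recompute when an input changes."""

    def __init__(self):
        self.values = {}
        self.formulas = {}  # name -> (function, list of input names)
        self.runs = []      # names recomputed, in order, for demonstration

    def set(self, name, value):
        """Assign a plain input value and update everything downstream."""
        self.values[name] = value
        self._propagate(name)

    def define(self, name, func, inputs):
        """Register a derived cell and compute it once immediately."""
        self.formulas[name] = (func, inputs)
        self._compute(name)

    def _compute(self, name):
        func, inputs = self.formulas[name]
        self.values[name] = func(*(self.values[i] for i in inputs))
        self.runs.append(name)
        self._propagate(name)

    def _propagate(self, changed):
        # Recompute only the cells that list `changed` among their inputs.
        for name, (_, inputs) in self.formulas.items():
            if changed in inputs:
                self._compute(name)


graph = Dataflow()
graph.set("a", 2)
graph.set("b", 5)
graph.define("total", lambda a, b: a + b, ["a", "b"])
graph.define("doubled", lambda t: t * 2, ["total"])
graph.set("a", 10)  # "total" and "doubled" update; nothing else is touched
```

The contrast with the stateful model is that there is no way to leave `doubled` stale: changing `a` makes the dependent values consistent again without the user choosing which cells to rerun.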

It's sort of like Make (or perhaps Make in reverse). Dataflow also allows your code to be parallelized. Some scientists don't care about this, but engineers do. It's an eye-opening experience to speed up naive data analysis by 1000x or more with shell scripts and a little C++.

I agree on the global variables. At least you can define functions in blocks of the same notebook file.


[1] a = 2

[2] def square(x):
        return x ** 2
    print(square(a))
> 4

[3] b = 5
    print(square(b))
> 25

This is something that you can't do in MATLAB, which I still use primarily. In MATLAB you have to create a new file for each function, which ruins the readability of the script (notebook) when you publish it, because the source of the functions is not included. So, if I write a MATLAB notebook that I want to publish and share, I end up avoiding creating functions for as long as possible and use copy-paste instead ...

Currently, the main things that keep me with MATLAB regardless are: a) It feels more responsive, probably because it's a native app and not running in the browser. b) I don't like to work in the browser. It's distracting. c) I like the profiler, debugger and workspace (constant visual inspection of global variables in a separate window) of MATLAB, which come right out of the box. For Python/Jupyter I have to set this up manually.

Note: MATLAB now has an 'interactive script' function that is similar to Jupyter, but it is so slow for > 100 loc that it's completely useless. Even the MathWorks developers admit this. Instead I use %% to separate my MATLAB scripts into executable blocks (ctrl + enter) and then use the 'publish' function to create a LaTeX file which I can then compile to PDF. This creates vector-based math formulas and figures (which 'publish to html / pdf' doesn't).

Hm that limitation sounds severe! Functions are essential. I actually don't remember that, but I last used MATLAB like 15 years ago! I've done a lot more data science than linear algebra in the last decade.

It sounds like RStudio might be a better model for what we want:

https://en.wikipedia.org/wiki/RStudio (click on the screenshot)

It only works with R, but a lot of people say great things about it.

I saw another commenter say that Jupyter is more like Mathematica notebooks, and RStudio is more like MATLAB. I think this sounds right.

In the former, the interactive experience is more central. In the latter, you are developing a program, and the IDE helps you do it interactively. But the program is central. (At least this is true for R, it sounds like it might not be as true for MATLAB. But I know that pretty large programs are written in MATLAB.)

For now I'm still sticking with my highly-custom shell-based workflow. But I do want to make interactive graphics less painful. Right now I juggle a web browser, a terminal, a text editor, and an R REPL!

If you compare RStudio to matlab, then is Spyder for python more what you're interested in, an IDE with interactive code execution, variable inspection, and debugging? https://en.wikipedia.org/wiki/Spyder_%28software%29

You can define functions in a MATLAB script. That was added in R2016b (September 2016).

Take a look at Cauldron, aka the unnotebook - http://www.unnotebook.com The author discussed it on Podcast.__init__ episode 111 - https://www.podcastinit.com/episode-111-cauldron-notebook-wi...

Thanks, I hadn't heard of it! Looks interesting.

We would love to hear what you like about Rmarkdown and Sweave. Jupyter tooling is always improving, and we are very interested in engaging with users about their needs, and helping grow the ecosystem to be able to address those.

My biggest frustrations with Jupyter are (see #4 for comments on Sweave etc):

1. The default front-end is a weak platform for getting work done.

It's a JavaScript code editor. It will never be as good as my personal text editor configuration. It will never be as good as an IDE like RStudio, Spyder, or PyCharm. It's good that there are keyboard shortcuts for doing things like adding cells, and extensions for things like folding cells and adding a table of contents. But it still isn't terribly comfortable to use all day. Also I personally hate doing everything in a browser. Apart from some useful notebook extensions, there are no viable alternative front ends yet.

2. Running a remote kernel is a pain in the ass (cat a config file then manually tunnel 4 ports over SSH), and I can't seem to get it to work on Windows at all.

This is an issue at my company because we do a lot of work on remote servers that can be accessed only through SSH or JupyterHub. Individual users do not have control over the latter, so we are stuck with the inadequate default experience I just described above.

3. No kernel other than IPython is mature.

IRKernel is getting there. Everything else is at best a beta-quality product.

4. Notebooks are not a plain text file format.

Hand editing a notebook is messy. They do not play well with version control systems and diff tools. RMarkdown and Knitr/Sweave are just preprocessors for established plain text formats (Markdown and LaTeX with some extra syntax). With those formats you can take advantage of a wealth of existing tooling, as well as having the freedom to edit the file in a normal text editor without having to rely on a special front end. Ironically, having everything formatted as JSON should make it easier to write those special front ends, but I have not seen any good ones yet.
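One partial workaround for the diff problem follows from the notebook being JSON: outputs and execution counts can be dropped before committing, so that only source changes show up. The sketch below uses only the standard library and is an illustration, not a replacement for a dedicated tool such as nbdime. The `strip_outputs` name and the sample notebook are made up, and it assumes the version 4 layout in which code cells carry `outputs` and `execution_count` fields.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return notebook JSON with outputs and execution counts removed."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep line-based diffs small.
    return json.dumps(notebook, indent=1, sort_keys=True) + "\n"


# A hypothetical two-cell notebook, written inline instead of read from disk.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
})

cleaned = json.loads(strip_outputs(example))
```

This makes diffs readable but still leaves JSON on disk, so it does not address the hand-editing complaint.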

Wow this was really useful. I was feeling guilty for not trying Jupyter, after using Python for 15 years and doing data science for much of that time.

I hear so many good things about it. I wrote this comment about it:


But ANY of those four is a dealbreaker for me. I want to use languages other than Python, with remote kernels, and I want version control. And I like my text editor to be really fast.

I think it comes down to a scientific background vs. a software background. I've memorized a boatload of tools and weird shell incantations, but the result is that I have a more solid workflow than Jupyter provides. Solid in the sense that it is likely to produce reliable results, not that it's "easier".

But if you don't have that software engineering background then I understand that Jupyter makes a whole bunch of things easier. It's not optimal in my view, but it's easier.

Great set of complaints per Jupyter. I think the project is an excellent idea, but it needs to become more mature. I've used it for some prototyping, but it isn't the polished experience you'd like. The web browser is good for looking at the results, but pretty terrible for editing and developing. I'm hoping that with time there will be:

- better front end integration - e.g. a separate vim process connecting/editing cells of a running notebook and updating the browser view on each change

- Fewer bugs and more parity between the python kernel and non-python kernels

To address “hand editing” problem I wrote a little vim plugin that wraps around notedown (https://github.com/aaren/notedown) to edit notebooks on the fly in a markdown format. It’s not perfect, but it goes a long way for quickly editing notebooks “as a whole”: https://github.com/goerz/ipynb_notedown.vim

Other than that, I try to put any lengthy code in functions that are in a module alongside the notebook, so that the notebook mostly contains one-line commands to kick off a calculation or to generate a plot. I also have a shortcut that copies the content of the current browser text field (notebook cell) into MacVim, and pastes it back automatically as soon as I close the editor.

1. Jupyter Lab (note: NOT Jupyter Notebook) is an attempt to make the interface more IDE-like. It's still not RStudio due to Jupyter's notebook nature, but it's close enough for me.

I do prefer RStudio's REPL approach of being able to run code by line or by blocks (likely inspired by MATLAB's IDE), rather than Jupyter's approach of executing code by cell (which was inspired by Mathematica). They both let you try stuff out easily while maintaining state, but the former is far easier to productionize.

2. Remote kernels over SSH aren't that hard -- I do this all the time via SSH tunnels. I start Jupyter Lab in an SSH console (usually on a cloud-based VM), and create a tunnel to port 8888 (the default) using my Windows SSH app (Bitvise). 1 port. That's it.

3. No comment - I only use the Python kernel.

4. Correct. Notebooks do present challenges for version control.

> Remote kernels over SSH aren't that hard -- I do this all the time via SSH tunnels. I start Jupyter Lab in an SSH console (usually on a cloud-based VM), and create a tunnel to port 8888 (the default) using my Windows SSH app (Bitvise). 1 port. That's it.

I want the opposite. I want to use a remote kernel with a local client.

Umm, yes, in my case, the kernel is running remotely on a cloud VM. My client is a local browser (Chrome) which connects to localhost:8888, which is a tunnel set up to connect to the remote machine on port 8888.

This lets me run computationally heavy Jupyter calculations on a beefy remote backend in the cloud. My local browser merely talks to that backend via a tunnel.

Here's something on the web that describes this [1] -- except with Bitvise on Windows, you don't have to enter any SSH commands. The tunnel setup etc. is all done via a GUI. This is a pretty standard SSH tunnel technique. You can use this for more than just Jupyter.

[1] http://www.vickyfu.com/2017/04/using-jupyter-notebook-remote...

Again, that's not what I mean. I want to run Jupyter (or some other front-end) on my laptop and have it talk to a kernel running on a server. You're describing running both Jupyter and the kernel on the server.

Oh I see now. You want to run the raw kernel with no front-end on the remote machine and communicate with it via the 0MQ/JSON transport layer. I'm curious, what is the advantage of doing this vs. simply running an instance of Jupyter on a remote machine?

I don't necessarily want to use Jupyter as the front end. This way lets me use e.g. PyCharm with the kernel running in a console.

BTW I managed to get it to work. I think I had missed a port the first time I tried.
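Since missing a port is an easy mistake, here is a small sketch that reads a kernel connection file and builds the matching `ssh` port-forwarding command. The connection files I am aware of list five port fields (shell, iopub, stdin, control, heartbeat), but treat that list as an assumption and check it against your own file. The `tunnel_command` helper, the host name, and the port numbers are hypothetical.

```python
import json

# Port fields expected in the connection file (assumed, verify locally).
PORT_KEYS = ["shell_port", "iopub_port", "stdin_port", "control_port", "hb_port"]


def tunnel_command(connection_json: str, ssh_host: str) -> list:
    """Build an ssh command that forwards every kernel port to localhost."""
    info = json.loads(connection_json)
    command = ["ssh", "-N"]  # -N: forward ports only, run no remote command
    for key in PORT_KEYS:
        port = info[key]
        command += ["-L", f"{port}:127.0.0.1:{port}"]
    command.append(ssh_host)
    return command


# Made-up connection file contents; a real file also carries a signing key.
example = json.dumps({
    "ip": "127.0.0.1",
    "transport": "tcp",
    "shell_port": 50001,
    "iopub_port": 50002,
    "stdin_port": 50003,
    "control_port": 50004,
    "hb_port": 50005,
})

command = tunnel_command(example, "user@server.example.com")
printable = " ".join(command)
```

With the tunnel up and a local copy of the connection file, a local client can attach to the kernel, for example with `jupyter console --existing` followed by the file name.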

Remote kernels: I'm working on some infrastructure that should make this easier, but it's still some way off being ready.

Kernel maturity: the Julia and Haskell kernels are pretty well supported, I understand, though I haven't used them myself.

Alternative frontends: Emacs IPython Notebook is pretty well maintained, if that's to your taste.

See whether you'd like Org-mode + Babel. Example: https://youtu.be/dljNabciEGg

Using a web browser as an IDE just seems like a solution in search of a problem. Also, maybe I just never spent enough time working with Jupyter, but it seemed to me that it encourages a sort of exploratory workflow, really well suited to teaching programming and data science. It was less clear to me how it could be used well in production environments.

I use Rmarkdown and Sweave to write homeworks for my students in a very Jupyter way. I also use them to generate data driven static webpages, procedurally generate production quality and easily formatted PDF and HTML reports. I also use them as a templating system for auto-generated model diagnostic emails. Perhaps I need to return to Jupyter to see what I'm missing, but I don't really know what purpose it would serve, or what kind of work it would make easier.

Doing the interface in the web browser has its ups and downs: it makes some things more awkward locally, but it's easy to deliver the same interface remotely - e.g. a university can run a JupyterHub instance for a course, and students visit a URL and login. nteract is an attempt to make a notebook interface as a local application.

For sure, I totally see how it's useful as a teaching tool. I don't see how it fits into a proper production system.

Different systems and requirements need different tools. If you prefer a text editor, then depending on your preference look at Emacs IPython Notebook, the Jupyter VS Code extension, or Atom Hydrogen. Those will basically let you select a chunk of code and execute it in a kernel. You do not have to use the notebook format or the browser-based editor.

JupyterLab also allows an Rmarkdown-like workflow where code blocks in a markdown document can be executed to display graphs.

I believe the important part is to allow interoperability between the different ways people want to work. You can't have one size fits all, and there is still a lot of work that can be done to cover some use cases.

This may depend on what you mean by 'proper production system'. It's definitely meant to complement, not replace Python modules, scripts and so on. I wouldn't write a web app in a notebook. But the LIGO team that discovered gravitational waves published a notebook demonstrating their data analysis.

I see it as useful where illustrating and explaining some computational steps is at least as important as executing them. Teaching is one obvious use case, but it's also valuable for sharing scientific methods, documenting a library with runnable examples, or presentations at programming conferences.

Rmarkdown has the better version control story.

The number one feature request by far at this point would be collaborative features in the notebook. Not really because I actually want to collaboratively edit a notebook with others, but because I would like to open the same notebook in two separate browser windows (side-by-side), and edit a cell at the top of the notebook in one, and a cell at the bottom in the other, and have things get merged automatically. Or, not having to worry that I left a (remote) notebook open on my work computer when I connect to the same server from home, causing parts of the notebook to be accidentally overwritten.

JupyterLab has the ability to open multiple "views" on the same file. http://jupyterlab.readthedocs.io/en/latest/user/notebook.htm...

That's amazing! I'll have to try out JupyterLab sometime very soon. I was holding out because I'm using a whole bunch of plugins for the classic notebook from https://github.com/ipython-contrib/jupyter_contrib_nbextensi... that I think haven't been ported yet.

I guess it would be nice to make it run within an electron environment, without extra console and browser tabs to run it in. Just jupyter running as an electron app with maybe tabs for each notebook to give it a more native desktop app feel.

Check out https://nteract.io/. It's based on Jupyter standards (protocol, kernels, notebook format), written in JS, and runs on your desktop.

I don't use it because I'm used to other things, but I really love the concept and I am sure I would be a user if it had existed some years ago. More importantly, I know it is doing a great job helping many scientists. Other tools like the ones you mention may improve certain aspects or fit better in some workflows, but Jupyter is the clear leader in the field. It is the one that changed the state of the art, even though some similar tools already existed and new (and maybe better) ones will come. I think the award is well deserved.

I think the workflow is great for the web tutorials I see all over the place, and I'm sure it's a pretty good fit for exploratory data analysis and model building. On the other hand, I see a lot of people pushing it as the must have tool for data science broadly, and I just don't see it fitting into most production workflows.

When a model and the associated data pipelines hit production, they need to be version controlled - plain text files can't be beat for that. The idea that the same tool should be both IDE and report is also very strange to me - I can see how it lowers the barrier to entry in some cases, but it doesn't seem optimal for most uses.

I do agree that Jupyter has helped a lot to get reproducibility on people's radars, and that's a positive thing.

You are definitely not alone. I personally use pweave, which is essentially a thin Markdown/LaTeX parser wrapping Jupyter kernels for injecting the output of code chunks.

I like being able to use my existing tools (text editor, make, version control, etc). I also like being able to write clean functions and run unit tests while still ultimately being able to generate a clean final document.

I agree. My big issue with Jupyter is that it tries to replace the command line instead of augmenting it. I want a web app that is a full-fledged Linux command line plus some APIs for visualizing information; I don't think that exists yet. I just cannot work in any environment that doesn't support Vim + tmux + Linux command line utilities.

I'm with you (although you should check out knitr on the R side).

I see two use cases for this sort of notebook thing. One is reproducible research, where I find that a woven solution is far preferable over a notebook. I can use version control to assist in iteration, and use an editor of my choice. The other is exploratory analysis, which Rmarkdown/Sweave/knitr is not really a good substitute for.

For exploratory analysis, I've found that using the terminal integration in Vim together with the REPL works well. It lets me save half-working stuff and record some of my thought process; in a Jupyter notebook the only things I can leave in are working code snippets.

I use the REPL for exploration, and my notebook for the final product. I do the same in MATLAB, but just with code.

>Am I the only person in the universe who doesn't like Jupyter?

Yes ;]

I would like something like make for notebook cells inside a repl.

I would write the notebook in my text editor (not in a web form), execute "reload" in the REPL, execute the cells that have been changed, and render the webpage. The output should be a complete self-contained HTML file with everything else embedded and of course not a single line of JavaScript.

If I understand you correctly, a combination of http://ipython.readthedocs.io/en/stable/interactive/magics.h... and http://nbconvert.readthedocs.io/en/latest/execute_api.html might get you there. Post if you build something like this!

BTW I'm pretty sure that this is exactly what Sweave is. You write in a text editor, and then invoke a batch process to generate an HTML file with mixed code and data.

Wikipedia says it does incremental rebuilding, which was news to me. I've never used it, but I know a lot of R users who use it.


Nope... you're not the only one

IPython's interactive shell is way more convenient than using the default python command line when trying things out. But I just don't get why one would need the whole Jupyter thing on top of that, except for maybe making presentations.

One thing that I think is missing from IPython, though, is the ability to save a given state of the interpreter, with all the variables in it, so that one could perform time-consuming data loading/parsing once and restart from that point if some variables get messed up. Jupyter can't do that either, afaik.

Anything graphical or interactive: maths, science, data, web, machine learning, etc. Interactive includes interactive widgets, but also almost any kind of exploratory programming.

Even stuff that's technically plain text is easier when you can display tables and other formatted text. E.g. I have a tiny little notebook that generates LaTeX code for a normal distribution table; it's a notebook because then I can display an HTML preview in a few lines of code.

Jupyter can't save interpreter state - that seems essentially impossible without adding state-saving and -loading code to every single library and dependency.
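
A partial workaround for the simple cases is to pickle whatever variables can be pickled. This is only a sketch, assuming the state worth keeping is plain data; `save_namespace` and `load_namespace` are hypothetical helper names, and anything holding an open file, socket, or generator is skipped rather than saved, which is exactly the limitation described above.

```python
import pickle


def save_namespace(namespace, path):
    """Pickle every picklable, non-private variable; return the names skipped."""
    saved, skipped = {}, []
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # ignore private and interpreter-internal names
        try:
            pickle.dumps(value)  # probe: can this object be serialized at all?
        except Exception:
            skipped.append(name)
        else:
            saved[name] = value
    with open(path, "wb") as fh:
        pickle.dump(saved, fh)
    return skipped


def load_namespace(path):
    """Load the dict of variables written by save_namespace."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

In a session this might be called as `save_namespace(globals(), "state.pkl")`, with `globals().update(load_namespace("state.pkl"))` after a restart. The skipped names show how much of the state could not be captured.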

>>Anything graphical or interactive

But IPython is already well integrated with matplotlib in the --pylab mode, and is pretty interactive. That's how I use it.

Re: saving state - it's technically a nightmare, I agree. I thought it might be possible at the OS level - just dumping the whole process. There would still be issues of what to do with open files, network connections, etc., but somehow it seems that in many use cases that would be enough.

Yeah, I used matplotlib like that for years (at times all day every day), but the notebook interface makes it much easier to keep track of lots of figures and where they actually came from, so I switched to that 4-5 years ago. The figures are also much more lasting, as they save in the notebook right next to the code, so it's a lot easier to go back and make sense of old work, especially with markdown cells documenting things right there too.

I think even saving state by dumping the whole process is unfeasible. What happens if some dependency gets upgraded, e.g. for a critical security hole? The problems seem unavoidable, so I think we're stuck.

You get a 'narrative' with explanatory text, charting and code together, interactively building until you get something you like, as a single document. That document can then be exported to HTML or PDF to share with others.

Ever wish your laptop had 128 cores and 4TB RAM? With Jupyter, it can! Well, almost. Run the Jupyter server on some monster you rent on EC2, then laugh when someone says they have a big data problem.

Cell based code testing.

For transplants from Matlab or Mathematica it's wonderful.

I have to say that Matlab folks might be happier transitioning over to Spyder; Jupyter is much more like Mathematica, though. Those are very different interfaces (and each has its strengths / use cases).

Agreed. Spyder is decent. I'm a huge fan of JupyterLab these days.

Thoroughly deserved - Jupyter is helping push science forwards.

A huge congrats from Azure Notebooks which is entirely built around Jupyter. We actually started way before that by connecting the visual studio REPL to Jupyter and the whole experience (technically and people) has been delightful.

Wow, this is awesome! It got the award quickly, too; I think Jupyter hasn't been around as long as other recipients.

As we say in the blog post, the Jupyter name has been around only since 2014, but the work started in 2001. So 17 years is a good chunk of time !

Though the part that people mostly identify with the project, the Jupyter Notebook, was introduced only in 2011. I remember that vividly as I was writing my theses at that time and spending quite some time playing around with it :)

Though the current notebook is the 6th prototype, so there had been quite a lot of work before it was actually made "public". Hope the notebook didn't distract you too much from your PhD !

If open source didn't distract from PhDs, we wouldn't have had IPython! ;)

overnight successes are usually 10-15 years in the making; so much goes on before the public release

I'm excited for this award, it's certainly well deserved.

In my heart, I wish the "data community" would show more care for security (and, related, privacy), with deeper focus on features that simplify access control, and guidelines on how to enforce "reasonable defaults".

I fear that Jupyter in many companies is becoming the next Jenkins, with unconstrained access to all data vs all infra, and this will lead to more and more incidents and leaks.

I very much hope that recognitions like this one will foster not only better tools and support, but also best practice and security considerations.

But, back to the focus of the post, congrats on this success!

Thanks for your comment. One of our next focus areas, where we are looking for funding and help, is making sure the right restrictions and permissions are in place.

We will probably put that in the context of GDPR/HIPAA/FERPA and follow these guidelines to make Jupyter "ready" for these frameworks. We can't say that Jupyter is itself compliant, as you need to see in which context it is deployed, but we want to make it as easy as possible for a team of researchers with a low budget, or a company with 1000+ users, to deploy a secure, auditable and safe Jupyter environment.

a CTO said recently .. we think of the base community edition as the place where we enable people to do things, and the corporate enterprise edition as where we prevent people from doing things. All the hardened security goes into the Enterprise Edition, for which they can pay.

Well deserved. There are few pieces of software that totally amazed me when I first used them. (I used Jupyter for the first time last week).

I was like "Did they actually achieve this?"

I've been using Jupyter for some time now since it was strongly recommended/almost-required in my school's classes. One thing I think the Jupyter team achieved is the reliability of a common interface. Like, regardless of what my data is (white noise, music, image, matrix, human face...), I know I can easily output it and get some sane representation.

I really like jupyter and I am looking forward to jupyter lab and where it takes computing.

It is really great for solving one off problems or learning.

The jupyter code itself while verbose is pretty extensible also. I just put together something that lets me connect to my spark kubernetes pod.

I think being able to customize jupyter and add new kernels (languages) is where it becomes really powerful and awesome.

Jupyterlab is here. I've been using it basically full-time since August, but it recently went to Beta, so it's ready for use.

The major downside is that it disables JS by default within notebooks, so if you're using Bokeh you'll have to install the jupyterlab extension, but the innovations around files, downloads, views on notebooks, etc. are worth the price of admission.

Tools like Jupyter (and Mathematica) don't really match my mental model. I'm fine using them as a REPL (in which case they're like a bloated ipython or ROOT prompt), but as soon as I go back and change something, I get confused about the internal state.

Internal state is often a problem. Look at projects like Stitch Fix's Nodebook and dataflow kernels; they make things a bit easier by re-executing cells.

One of the issues is that you always have internal state as soon as you interact with a data source or sink. If you read/write from an API, then rest is stateful. Your file system is stateful... etc.

It's an interesting but hard problem, one we'd be happy to have more help with.

https://github.com/dataflownb/dfkernel https://multithreaded.stitchfix.com/blog/2017/07/26/nodebook...

Problem with that is, sometimes some cell in the middle is computationally intensive and I don't wanna run it again. Just going back and changing one function shouldn't run the whole computation.

Thanks for the links, those are interesting solutions.

I don't know if it's exactly the same as what you're getting at, but I'm a heavy Jupyter user, and find that as notebooks get more elaborate, I have to work with great care to avoid problems with everything having global scope. So I have to refactor my code once in a while, get things into subroutines, etc. All of my good and bad coding habits are laid bare in Jupyter.

Also, this is not for the faint of heart, but before closing a session, I do a "restart kernel and run all." This ensures that my notebook is running the way I'd think if I open it up later and try to re-run it.

It's one of those difficult things. You have to restart and run all to make sure your code is working and your state hasn't gone fudgy, but you also don't want to re-run that cell that took 3 hours to run unless you have to.

My personal best practice is to 'restart and run all' to run the entire thing before I commit it to github. Jupyter git integration is a whole separate bag of worms, as the combination of presentation/code violates some core git assumptions.
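
One common way to ease the git friction is to strip outputs before committing, so diffs only show source changes. A minimal sketch, assuming the nbformat 4 JSON layout in which code cells carry `outputs` and `execution_count` fields; `strip_outputs` is a hypothetical helper, and dedicated tools such as nbstripout handle this more thoroughly.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a v4 notebook dict with outputs and execution counts removed."""
    cleaned = json.loads(json.dumps(notebook))  # cheap deep copy of JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned
```

The input would come from `json.load` on the `.ipynb` file, and the result would be written back with `json.dump` before `git add`. The cost is that the committed notebook no longer shows its results, so this suits code review better than sharing finished analyses.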

Any 3-hour calculation should be cached to disk, so that when you re-run the notebook you get the result instantaneously.
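
A minimal sketch of such a cache using only the standard library, assuming the function's arguments and return value can be pickled. The decorator name `disk_cache` is hypothetical. Note that the cache key ignores the function body, so the cache directory has to be cleared by hand after the calculation itself is edited.

```python
import functools
import hashlib
import os
import pickle


def disk_cache(cache_dir):
    """Decorator: store a function's pickled result on disk, keyed by its arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            os.makedirs(cache_dir, exist_ok=True)
            # Key on the function name plus arguments; kwargs are sorted for stability.
            key_bytes = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha256(key_bytes).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)  # cache hit: skip the computation
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator
```

Decorating the expensive function with `@disk_cache("cache")` means a "restart kernel and run all" only pays for the calculation the first time.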

Congratulations! Jupyter is such a gem. It has become such a critical piece in the whole data analytics/ML landscape that I can't think of living without.

This is great. Without Jupyter, I doubt I would have started learning data science a few years ago.

Do you have a favorite tutorial or simple use case that helped you grok it?

I just started using Jupyter a couple of days ago as I'm starting to learn ML. I'm really impressed by it. Would love something similar for Ruby.

Good news ! It exists ! It's called Jupyter ! Just install one of the non-python kernels[1], for example the Ruby one [2], and create a new Ruby Notebook !

1: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels 2: https://github.com/SciRuby/iruby

There are Jupyter kernels for dozens of languages. Search for them on GitHub.

Wish they would add variable inspection to JupyterLab!

I am a non-programmer business guy who 'excels' a lot. Been teaching myself pandas through the notebook. Awesome combination.

Cannot imagine liking data science nearly as much as I do if Jupyter didn't make it so easy to quickly test new ideas. Well-deserved.

The other "innovations" that ACM lists alongside were real innovations: Unix, TeX, S (R’s predecessor), the Web, Mosaic, Java, INGRES. Now they are handing out awards for "copy commercial software but make it free" projects. It's funny how copying is a bad thing in an essay but applauded in software (as long as it has the right license).
