Hacker News
Nbdev: A literate programming environment that democratizes best practices (github.blog)
210 points by pbowyer on Nov 20, 2020 | 86 comments



IMO no one has done more to make deep learning accessible than Jeremy + fast.ai team. Thanks for the amazing work!

My question is about the coding style - @jph00 I've read your fast.ai style guide and have worked with APL-family languages like q/kdb+ (written by Arthur Whitney, whom you cite).

My experience is that brevity is great, until you need to collaborate or have individuals working on small parts. That was also my experience when trying to write an extension to the fast.ai code, where I had to read large amounts of source to understand how to implement a small change.

Given that a key motivator for literate programming is collaboration/communication, how do you think about this?


I fully agree here. I am working through Deep Learning for Coders now. I have created several errors as I am implementing chapters, but the coding style of fastai makes it impenetrable to debug. It's an incredible book and a great library when it's working as expected, but the number of times I have run into some variant of `method takes N parameters but M were given` is pretty frustrating. Looking through the stack, these are not patterns that would have been accepted in a code review from me, for the same reason you mentioned. Making small changes and debugging are both hampered by the style.


I have used Nbdev and I am not a fan. It creates friction when one wants to contribute (in my case, to fast.ai) and forces you to write your code in notebooks, which is the point, but also not great when you are writing code rather than producing a display that mixes text and pictures. Plus, while notebooks should favour documentation in theory, you can also end up with notebooks full of blobs of code with transition text that does not help you understand what's going on.

Case in point, here is a random notebook from the fastai repository; a Python file would be simpler to read and shorter: https://github.com/fastai/fastai/blob/master/nbs/09b_vision....


It seems like there are a lot of friction points with this model and perhaps a bit too much world building. In general, I have trouble understanding the allure of this approach.

Another example (and bonus points if you can spot the rogue semicolon).

https://github.com/fastai/fastcore/blob/master/nbs/02_founda...

https://github.com/fastai/fastcore/blob/master/fastcore/foun...

https://fastcore.fast.ai/foundation.html


So I use this in production at my company. It's an awesome tool. Personally when I'm coding in python I like to prototype in jupyter, copy code over, and then reimport anyway. Nbdev streamlines everything so I can write docs, tests, and code all in one place. And since the docs are just a jekyll site I can copy it to our documentation aws bucket in continuous integration. And with one command I can run all the notebook tests in CI as well.

The packaging is also really well thought out. I don't have to stress out about connecting setup.py with whatever publishing system we have. The settings.ini makes things sane and I can bump the version whenever I want.

I get a lot of skeptical looks when I say the source code is in notebooks, but that's just syntactic sugar for the raw source code. You still get to edit the raw code files and with one command sync everything with the notebooks. From my point of view it is close to a Pareto improvement over traditional Python library development.
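
To make that concrete, an nbdev notebook cell is still plain Python; comment directives tell nbdev what to export, and ordinary cells double as usage examples and tests. A minimal sketch (written from memory, so directive names may be slightly off, and myutils/add are made-up names):

    # default_exp myutils
    # (first cell: exported code below ends up in myutils.py)

    # export
    def add(a, b):
        "Add two numbers; the docstring shows up in the generated docs."
        return a + b

    # A plain, unexported cell acts as both a usage example and a test:
    assert add(2, 3) == 5

Then, if I remember the nbdev v1 command names right, nbdev_build_lib regenerates the .py modules, nbdev_build_docs builds the Jekyll site, and nbdev_test_nbs is the "one command" that runs every notebook as the test suite in CI.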


Really interesting! Do you mind sharing what your company is? (I am the author of the blog post)


I work for Lyft's self driving car division, Level 5! Nbdev has been great. I use it a lot. Thank you for all of the work you've put into it!


I also use this for a work project. My experience has been incredibly similar, particularly with the wall of skepticism I get from people regarding their opinions on notebooks. It's the main barrier I have in getting others on board. That presentation a few years ago hating on notebooks has really penetrated.


Sigh. I really think that the only reason notebooks are so popular is that Python never had a popular IDE with a high-quality REPL where people can learn to work interactively while writing code in plain text.

R illustrates this very well - as an R user you can have your pick between RStudio, Jupyter and Rmarkdown and the overwhelming majority of users pick RStudio and notebooks are reserved only for a niche set of use cases. It also speaks volumes that almost no one writes R in Jupyter even though it's supported very well - R users just have better options available to them.


No that's not the reason - or at least, not for everyone. I created nbdev. I've been coding for over 30 years, and spent over 10 years using R and S-PLUS. I've used Delphi, Visual Studio, Emacs, vim, vscode, and many other editors and IDEs, including many with integrated line-oriented REPLs.

There's a big difference between a line-oriented plain text REPL, and a Mathematica/Jupyter-style notebook REPL, especially when you want to mix and match your charts, image outputs, rich table outputs, interactive JS outputs, and so forth. Also, for experimentation, where you want to go back and change things to see what happens (e.g. very common in data science) I find it much easier and more understandable in a notebook.

I have a video where I show the difference between these styles of working in some detail: https://www.youtube.com/watch?v=9Q6sLbz37gk


Yeah, I don't really get what REPLs have to do with what notebooks offer. They are similar-ish, obviously, but they accommodate different workflows and use cases.


People mean different things by REPL - the nicer Lisps had richer reader prompts that were not totally text based, could show you graphical stuff and accept commands other than just source code, had interactive features, etc. - see e.g. https://upload.wikimedia.org/wikipedia/commons/c/c6/Listener... or some youtube videos of Lisp machines.


That's getting to my point though - they should accommodate different workflows and use cases. But what's happening instead is that people are overusing notebooks in cases where plain text + REPL is more appropriate.


That makes sense as a possible situation. I'm just not sure that it's one that's familiar to me. I can't honestly say I have my finger on the pulse of where people are using notebooks vs REPLs, but the former seem really great for 1) step by step examples 2) scripts-in-progress where certain steps are more in flux than others.

Though, I think I get what you mean as I reflect on my own dev experience. As a mainly C# dev with very, very limited REPL experience (I would and should say no REPL experience, but someone will yell at me about csi.exe or dotnet-script), I have seen people using notebooks for want of a good REPL, but I'm curious why anyone writing Python would.


You bring up RStudio but not the R Notebooks which it supports natively (https://bookdown.org/yihui/rmarkdown/notebook.html), which IMO is a far-superior way of handling notebooks than Jupyter. (namely, the files are plain text so you can actually commit to Git without fuss)

I wrote a detailed blog post about the differences between Jupyter and R Notebooks years ago: https://minimaxir.com/2017/06/r-notebooks/


You can do the same thing with the jupytext extension. But sometimes it is helpful to have the rendered results in version control, e.g. internally we use it to discuss data science findings on GitLab.
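
For anyone who hasn't used it: jupytext pairs an .ipynb with a plain-text script, so the text version is what you diff and commit. Roughly (file names invented; it also has a CLI, jupytext --sync, which is what most people use):

    import jupytext

    # Read the plain-text "percent"-format script and write it out as a notebook;
    # jupytext keeps the two representations paired.
    nb = jupytext.read("analysis.py")
    jupytext.write(nb, "analysis.ipynb")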


Yes, what I meant by Rmarkdown is these notebooks.


This is an interesting view in light of the fact that Jupyter is actually a direct evolution of the most popular bells-and-whistles REPL in Python land (IPython) - .ipynb files were just saved IPython REPL sessions.

(try "sudo apt-get install ipython && ipython" on your Ubuntu/Debian system to try it out)


I don't think this is the best comparison. People use notebooks in Python to work with data and code, not for lack of a popular IDE.

Comparing R and Python also could be done better. Python is a general purpose language. R is for statistics and science. Even then, Python is extremely popular for these very domains with Jupyter notebooks.

This is why I shrug looking at "Jupyter killer" notebook alternatives tackling machine learning: in my experience delivering machine learning products to large, paying clients, the bottleneck was never slicker stylesheets or cool animations. It was the nuts and bolts of things.


> Sigh. I really think that the only reason notebooks are so popular is that Python never had a popular IDE with a high-quality REPL where people can learn to work interactively while writing code in plain text.

I would not call IDLE unpopular, but it certainly had limits to its growth.

But notebooks mostly come from the science corner of Python. People there used notebook-like tools and workflows for decades, and some brought that over to the IPython project. I remember, 15(?) years ago, when the project started focusing more and more on the cluster aspect of their shell, they brought up many different tools for this. One of them was notebook-like and later became Jupyter. It quickly became popular in certain groups for those reasons.


There are good IDEs for Python development, such as PyCharm (from JetBrains). You can launch an IPython console in PyCharm and get a REPL. And besides, many languages either don't have REPLs at all or only gained them recently (C# in 2015, Java in 2017), and people were still able to work fine with those.


I love democracy, but somehow there is something cloying about the use of "democratize" in this context.


For some strange reason, everyone in the ML community refers to 'increasing adoption' as 'democratizing'. It's my pet peeve.


There are two lead definitions for "democratize" in the Oxford English Dictionary. One of them is:

"make (something) accessible to everyone"

So the usage here is entirely consistent with standard English usage. It is also consistent with the French etymology (démocratiser), which has as a dictionary definition "Rendre démocratique, populaire" (i.e. to make popular).


I think that definition is a part of parent's pet peeve.

dēmo- (people) -kratía (rule) has only indirect relation to popularity.


Thanks! I have also been confused by this use of “democratise”, so learning about the shift from “rule by the masses” to “accessible by the masses” is valuable.


If I let everyone on my street borrow my bike, can I say "I've democratized my bike"?


Did your bike get liberated from your dictatorship?

I don't know. How good did it feel to eat Freedom Fries after 9/11?


If you make lots of bikes available for rent then yes.

https://medium.com/the-fourth-wave/the-great-democratization...


I would say, only if they all get to vote on how/when it's used and by whom.


While I am also irked by shifting meanings, I think trying to resist these changes is very Canute-like.

https://cdn.digg.com/images/e160ad4bb9c845f894155145539af3df...


"Democratizing" makes more sense when the industry evolved from AI/ML frameworks which required a Ph.D to use to allowing anyone to train a model (e.g. Theano -> TensorFlow -> Keras/fast.ai), and allowing people to train models on GPUs without a grant-funded supercomputer cluster, for very little cost (spot/preemptible cloud GPUs, Google Colab)

In this case, I agree it's not equitable.


Yeah, my first guess was that the tool allowed everybody to have an input on what best practices should look like and enable distributing the final consensus to everybody. So if today best practice is 4 space indent, this would be the default, but if enough people changed it to one tab, that would change everybody's default and reformat their code to conform with the new best practice.


It implies that the state of ML research is undemocratic and therefore unfair or somehow opposing the values of our time. It's a ridiculous idea because with such an immense amount of quality information freely available, the only real barrier is intelligence. We can't democratize intelligence, so the sentiment smells a bit anti-meritocratic, rather than democratic (I think these two are often confused). It's really no different from saying that mathematics is undemocratic. Furthermore, it's patronizing to the wider audience by suggesting that they can participate in the field only after it's been brought down to their level of understanding (by the democratizing gatekeepers of course).


I mean, increased adoption is a result of democratization (e.g. more accessible). I think the usage here is fine because Jupyter notebooks are definitely more accessible to those new to programming than a typical Python environment and this expands Jupyter to be used for more development purposes, like building libraries.


Increased adoption in ML comes with more open implementations and more freedom, which, in contrast to FAANG being the only ones to employ advanced ML, can indeed be seen as a form of democratization.


It suggests you've stolen fire from the gods (or the arcane halls of wizards/academics) and fed the starving masses.

Pretty good for an IDE.


I'm still struggling to find the link with democracy. Sure you can't have democracy if the source code is inscrutable but that's a somewhat tenuous link.

The word 'democratize' here doesn't seem to add any meaning that 'literate programming' doesn't already cover.


The author writes

> we decided to assist fastai in their development of a new, literate programming environment for Python, called nbdev.

but this is followed by:

> nbdev builds on top of Jupyter notebooks to fill these gaps and provides the following features

is it a new environment, or is it an extended Jupyter Notebook? It looks like Jupyter Notebook to me. Why not Jupyter Lab?

> JupyterLab: Jupyter’s Next-Generation Notebook Interface https://jupyter.org


For background, the original author of nbdev has a good post outlining exactly how it relates to Jupyter -

https://www.fast.ai/2019/12/02/nbdev/


Strange... "Nbdev is a system for something that we call exploratory programming." but no citation... the phrasing suggests this is something they believe they've come up with a name for?



I'm aware, what I'm confused about is why they've phrased it as if it's something they call exploratory programming, as if they've coined the term.


I coined the term - and it turns out someone else did too, for something else. So be it. If someone else can think of a better term that's never been used before, then I'll happily use that instead.

The earlier usage mentioned in Wikipedia is entirely uncited there however, and seems to have only been used in one academic project AFAICT.


How are you sure you didn't just read it somewhere and then forgot? It has been mentioned many times in related literature (even in publication titles) https://scholar.google.com/scholar?q=%22exploratory+programm...

Here's one from 1988 https://dl.acm.org/doi/abs/10.1145/51607.51614

Are these unrelated? Is Nbdev not only a "new programming environment", but also a new concept that needs a new name?


"In some cases the estimates may be obvious. Perhaps the story is similar to others that have already been completed. In other cases the story may be very difficult to estimate and may require exploratory programming."

Kent Beck and Martin Fowler

http://index-of.es/Java/Planning%20Extreme%20Programming.pdf

It's a commonly understood term AFAIK


> It's a commonly understood term AFAIK

I agree - it's what made me raise an eyebrow... However, based on their comment above, the lead author of Nbdev believes he coined the term.

My opinion is that due diligence and attribution are important. If I believed I'd coined a new term, I'd check first. Mistakes are easy to make, but when highlighted, perhaps corrections are more appropriate than negotiating with the person highlighting them:

From the lead author (jph00): ... If someone else can think of a better term that's never been used before, then I'll happily use that instead.


Why are people nitpicking about this? So the term/phrase he came up with was so descriptive that other people had also thought of the same term previously. He didn't steal anyone's research on the topic, or steal code for the project, or deny credit to a developer working on a project. It's 2 simple words that are extremely common in the English language. Of course lots of people have happened to put them together before.

This is my second time on hacker news, and I don't think I will be back. Why not offer to help the project, or show support? Why try to find something to fight about? It's just demoralizing to see the lack of kindness from people.


People who “coin terms” without spending the two seconds required to Google them, and who then spread around statements like “I coined it”, are self-aggrandizing.

It’s worth calling them out and discouraging this behavior because it leads to missing entire fields of previous work for both the author and people who build on the work.


If you knew him at all or had bothered to spend the time to look at his work and projects in any level of depth, you would know he is not a self-aggrandizing person.

And if the argument is that this is a gateway action that leads to other bad things, that's not an argument I put any stock in. It's used a lot in many places and many discussions, but just focus on calling out the actual bad behavior. It isn't his responsibility to ensure that some person building off his work at some point in the future does the level of research appropriate for their project. Assuming that an action is bad because you think someday down the road it might lead someone else to do something, and that that something will be a bad thing, is just ridiculous.


> you would know he is not a self-aggrandizing person.

But he is, he just claimed he coined “exploratory programming” FFS. It takes a shocking amount of hubris to announce that you are on the cutting edge of a field where you get to coin terms, without doing the trivial amount of searching to verify it first.


This is going off-topic. I highlighted something factual - this is not personal.

Whether they choose to update the materials and reference existing work is up to them.


Thank you. It just comes off very elitist and distasteful, not something you like to see in people doing big things. We want to look up to these people, not cringe when they "coin" a common phrase and dig their heels in when confronted.


It's certainly possible. Smalltalk is absolutely an important inspiration for nbdev, and that's a really great reference that you pointed out.


I'd probably call that analytics or analysis.


In electronics, we call it Breadboarding, and the results are expected to be temporary by all involved. Breadboarded circuits generally are not stable over time or movement.


Guessing at plausible answers:

- Nbdev started right around the 1.0 release of jupyterlab, and it might not have been on their radar

- Nbdev came out of fast.ai, and I wouldn't be surprised if they were using a ton of jupyter-specific features already which weren't supported by jupyterlab


It works fine in lab too, although I prefer using notebooks on the whole.

(I'm the lead author of nbdev).


Awesome! Thanks for chiming in :)


>is it a new environment, or is it an extended Jupyter Notebook? It looks like Jupyter Notebook to me. Why not Jupyter Lab?

Neither? It doesn't change the features of Jupyter notebooks, and it's not an improved/expanded UI like JupyterLab (you could use nbdev with JupyterLab). It's utilities and automation to make package/library development a better experience if Jupyter is where you write your code.

From https://github.com/fastai/nbdev:

"nbdev is a library that allows you to develop a python library in Jupyter Notebooks, putting all your code, tests and documentation in one place."


Sure, but the OP link that I'm commenting on says it's a "new literate programming environment". Based on what you're quoting, the OP article is incorrect and needs correcting?


I have a love-hate relationship with fast.ai. It's great for breaking into ML, and that is awesome. But it's somewhat funny to me that this is being promoted as a way to democratize best practices, since it also throws many traditional Python best practices out the window. I am sure it's better if you have spent a lot of time with it, but trying to dig into the code base really breaks my brain. Too many import *, and one line functions / if statements.


nbdev looks promising. I'm wondering if it solves what I see as the biggest pain of using notebooks when I'm doing data science work.

When a notebook gets large, it can be difficult to keep track of dependencies between cells: workflows in which you have to run cells n_1, n_2, ..., n_k before running cell n.

I try to organize my cells so that if I run them from first to last, all dependencies are covered (e.g. "Restart kernel and run all cells").

Unfortunately, this doesn't help when I discover a bug in cell n_2 and don't want to run ALL cells n_2 + 1, ..., n-1, n because some of them carry out expensive operations.

When working in my editor, the way I resolve this is to make a light CLI wrapper around my program (if __name__ == "__main__": import argparse; ...) and my CLI commands encode all this dependency information.
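
A toy sketch of that pattern (hypothetical step names, not anyone's real code): each expensive step caches its result to disk, and later subcommands read the cache instead of re-running their upstream dependencies.

    import argparse
    import json
    import pathlib

    CACHE = pathlib.Path("cache")
    CACHE.mkdir(exist_ok=True)

    def load_data(args):
        # Expensive step: persist its result so later steps can reuse it.
        (CACHE / "data.json").write_text(json.dumps(list(range(10))))

    def train(args):
        # Depends on load_data's cached output rather than re-running it.
        data = json.loads((CACHE / "data.json").read_text())
        (CACHE / "model.json").write_text(json.dumps(sum(data)))

    def evaluate(args):
        model = json.loads((CACHE / "model.json").read_text())
        print("score:", model)

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        sub = parser.add_subparsers(dest="step", required=True)
        for name, fn in [("load-data", load_data), ("train", train), ("evaluate", evaluate)]:
            sub.add_parser(name).set_defaults(func=fn)
        args = parser.parse_args()
        args.func(args)

Running the "train" subcommand then re-runs only that step, because its dependency on load_data is encoded as "read its cache file" rather than "execute its cell".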

Is it possible to get this kind of experience in a Jupyter notebook without building a custom plugin (I think a frontend plugin would suffice)?


>Unfortunately, this doesn't help when I discover a bug in cell n_2 and don't want to run ALL cells n_2 + 1, ..., n-1, n because some of them carry out expensive operations.

Running all cells has its disadvantages, but what I found out was that often, there are bugs elsewhere than in cell n_2. I treat that as unit tests: run them all because even though I think only this is breaking, fixing it could have broken some other part.

Many other "Jupyter killers", as sensationalist blog post titles call them, claim they have done away with Jupyter's dirty "hidden state", but then you read further about how the "Jupyter killer" avoids re-doing heavy computation by "caching" compute-intensive results, and you want to tell them "get your mind right, which is it?".

The way we do it is to schedule the notebook[0] to make sure everything works.

- [0]: https://iko.ai/docs/notebook/#long-running-notebooks


I used nbdev when it was first released. Some things must have improved since then, but I was already amazed by the experience. I think having code + docs + tests in the same document makes a huge difference in the effort needed to get those 3 done properly.


This is really exciting! I've been building a data engineering practice around jupyter notebooks, Netflix's papermill, and k8s cronjobs for scheduling, and it's been great...except for code review, weird dependency/virtualenv glitches, tests, and documentation.

At first glance, this seems like it would address all of my pain points? Will be interesting to try it out.
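
For anyone curious about the papermill piece of that stack: the scheduled job is usually just a few lines of Python executing a parameterized notebook and saving the filled-in copy somewhere inspectable. A rough sketch (paths and parameters invented):

    import papermill as pm

    # Run the ETL notebook with this run's parameters; the executed copy,
    # outputs included, is kept so it can be reviewed or debugged later.
    pm.execute_notebook(
        "etl_job.ipynb",
        "runs/etl_job_2020-11-20.ipynb",
        parameters={"run_date": "2020-11-20", "env": "prod"},
    )

A k8s CronJob then just invokes this script (or the papermill CLI directly) on a schedule.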


Has anyone from an SE background used this? I come from data science, so my peers and I use Jupyter notebooks for everything. I've never used a proper IDE so I wouldn't have anything to compare it to. But I need to start wrangling in people's notebooks and am hoping nbdev does the trick.


I will use it soon. We want to support fast.ai[0] courses on our machine learning platform[1], and wanted a way to easily test the notebooks. I asked one of the people behind fast.ai, and they told me they use nbdev to test their notebooks.

- [0]: https://www.fast.ai/

- [1]: https://iko.ai


I've done both. I haven't used nbdev, but it looks like it addresses many of the drawbacks notebooks have, from a software-development perspective.


I really enjoy using nbdev. I am a fastai user but also a researcher, and I have been using nbdev for my research projects. Because everything is in Jupyter Notebooks, it's much easier to document your work as you are working. Nbdev will build docs and run tests based on the jupyter notebooks and everything is quite flexible!

Here is a small project of mine highlighting some of the capabilities of nbdev: https://github.com/tmabraham/UPIT


So you can put these in a GH action to ensure they are in working order across deps / dataset updates etc? Seems like exactly the missing piece for using notebooks for serious work.


Have only had a quick look at the examples and docs, but there doesn't seem to be any support for reordered chunks? (i.e. there's no tangling involved.)


Enough with all these orgmode rip offs already. When will the community actually appreciate older ideas and code without shiny marketing and packaging?


Shh! Don't tell anyone :)


I'm wondering how one would incorporate important practices such as TDD into this development methodology.


Not meaning to be a grump, but "democratise" really doesn't mean what people in tech think it means. Now is a pretty sensitive moment with regards to that.

Maybe we should say "ubiquitise" instead?


Am I reading this and the quickstart right - this is still complicated to use on self-hosted infrastructure, and for python packages not published via pypi?


This is not literate programming.

Literate programming involves having a meta language that is extended into the target source code through nested macros.

The killer feature was the ability to see everywhere a piece of code was used on dead paper by looking at the auto-generated index, with the chunks being logical rather than language driven. In a literate program you wouldn't care that something was a class or a function, you would just have it be described by what it does, not how it does it.
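
To make the chunk idea concrete, a noweb/WEB-style program looks roughly like this (toy example, chunk names invented): chunks are named by intent, can be written in any order, and the tangle step expands references to produce the compilable source.

    <<main program>>=
    import json
    <<read the configuration>>
    print(config["greeting"])
    @

    <<read the configuration>>=
    with open("config.json") as f:
        config = json.load(f)
    @

The woven documentation also gets an index of every place each chunk is used, which is the "on dead paper" cross-referencing described above.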

This is marginally better documentation for python notebooks.


No true Scotsman.

It's more literate programming than not and chasing the promise of the ideal literate programming environment is what created the novel notebook environment in the first place.


1). Notebooks were invented by Mathematica in the 80s.

2). Words have meanings and literate programming is defined extremely well by Knuth in his 1983 paper. This is not what was described there any more than the WWW is Xanadu.

3). That is not what No True Scotsman means.


> That is not what No True Scotsman means.

Setting aside whether you are or are not right, can we just appreciate the irony in that statement for a moment?


Sorry, using NSA/Microsoft Github is not a "best practice". This project should be dead in the water if that's their starting point.


Willing to be more specific with a comment like this?


While the grandparent's tone is not appropriate and has been duly downvoted, they do point out that nbdev is currently closely tied to pushing code and docs to GitHub. This is something which threw me at first, but it isn't a requirement. You can set it up to work only locally.

The GitHub flavor is likely just because that is what the author was familiar with and what they were using.

If there are enough people interested we could get together to make PRs to add other remote version control systems and other static site hosts. I know an integration into the Atlassian world would really help me at work as that's my employer's chosen code repo and doc manager.


They have gitlab options as well.


Don't use NSA/Microsoft's computer to store your data. I'm not sure how much more clear cut this can be. NSA/Microsoft is a persistent, global adversary and should be treated as one, not as some kind of standard. Treating NSA/Microsoft as a 'best practice' means sleepwalking towards losing the right to read[1], the right to investigate and understand the environment around us, and all human rights worth considering.

[1] https://www.gnu.org/philosophy/right-to-read.en.html



