
I understand why they became popular, but as a software engineer considering how they work, I am just full of disappointment. We're going to spend the next ten years re-inventing every single software engineering best practice for Jupyter's weirdo environment.



You reason from a downside (more layman programmers without a proper workflow). I reason from an upside: much better flow than Excel or Excel/VBA and hence fewer errors, better accountability, etc.


In the mid-90s I was a Mech Eng undergrad using a program called MathCAD, which provided a "notebook" environment (interactive computation mixed with text) by running as a Word plugin. In 2018 I use Jupyter and it's not clear where the progress has been. There are too many compromises trying to make it work in a web browser. For what Jupyter is for, I find that RStudio or Spyder are infinitely superior. I do interactive exploring and turn that into both documents and reusable, deployable code, all in the one tool.


I too prefer RStudio/RMarkdown. Now that we have the Reticulate package, which allows running Python inside RMarkdown, it's way better than Jupyter for my purposes. It's just plain text, so my editor, VCS and everything else play nicely.

But perhaps the biggest strength is that you go through Pandoc, so you can just click a button and get the output as a Word file for sending to someone who's not a software dev, or you can get it as a LaTeX source file and extend it into a proper formal document like a journal paper, or you can do it as HTML where embedding Bokeh scripts and other interactive things Just Works.

I've even made presentations with it going through the reveal.js framework, where you can put an interactive plot on one of your slides, and show people live "what happens when we change this parameter". That's still semi-witchcraft in 2018, but it's going to become a common thing (hopefully).


Unfortunately RStudio also makes compromises to make it work in a web browser. A native tool could be even better.


RStudio is mainly a native tool; the web version is quite inferior. There are native versions for Linux, Windows and OSX.


One has to stretch the term quite a lot to say that RStudio is a "native" GUI. The interface is just a browser window; the only native part is the menu bar.

https://imgur.com/yszRlMk


This is definitely one of my concerns too. Ad hoc code inside these notebooks is almost completely unmanageable from any reasonable software maintenance perspective, and refactoring code out of them is prohibitively difficult as well. I really want something to emerge that combines the best of both worlds of an IDE and notebook development, but there isn't anything close currently.


ob-ipython enables IDE-like editing features within the code cells. It's embedded in a polyglot, git-friendly, literate programming environment called Org-mode. I use it every day and love it. Other goodies:

- easily manage multiple kernels (in different languages / machines) in one file

- tree-based organization manages complexity better than linear notebooks

- no browser in sight (unless you need interactive widgets)

- highly exportable, including to ipynb via ox-ipynb

Downsides:

- Emacs-only

- small user base / limited docs

- can't easily import from ipynb

- async cell execution support is early-stage

See Scimax ipython for examples


Org-mode is strictly speaking the most powerful notebook programming environment out there.

I’m also much more excited by R Notebooks as implemented by RStudio than I am Jupyter. R Notebooks take the same basic approach as org-mode, implementing a smaller set of functionality built around a more mainstream-palatable Markdown format.

I don’t like R Notebooks as much as org-mode (why reinvent the wheel!?), but at least the general approach plays nicely with git.

On the other hand, the lisp-addled part of me starts to think that the Jupyter format being JSON might actually be a step up from a poorly specified markup format that has to be parsed into data structures.

Maybe the real issue is that git is an insufficient revision control system, and that we need revision control systems that can revision data structures, rather than simple text diffing.
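To make "revisioning data structures rather than text" a bit more concrete, here is a minimal sketch in Python. It assumes a notebook is a JSON object with a `cells` list whose entries carry a `source` field (a string or a list of lines), as in the ipynb format. The helper names `cell_sources` and `changed_cells` are made up for illustration, and this is not how git or any existing notebook tool works.

```python
import json
from typing import List


def cell_sources(notebook_json: str) -> List[str]:
    """Return the source text of each cell, ignoring outputs and metadata."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # Source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


def changed_cells(old_json: str, new_json: str) -> List[int]:
    """Indices of cells whose source differs, compared position by position."""
    old, new = cell_sources(old_json), cell_sources(new_json)
    changed = []
    for i in range(max(len(old), len(new))):
        before = old[i] if i < len(old) else None
        after = new[i] if i < len(new) else None
        if before != after:
            changed.append(i)
    return changed


# Two versions that differ only in outputs and execution count.
old = json.dumps({"cells": [{"cell_type": "code", "source": ["x = 1\n"],
                             "outputs": [], "execution_count": 1}]})
new = json.dumps({"cells": [{"cell_type": "code", "source": "x = 1\n",
                             "outputs": [{"text": "1"}], "execution_count": 7}]})
print(changed_cells(old, new))  # prints []
```

A line-based text diff would flag both versions as different, while the structural comparison reports no change to the code. The position-by-position comparison is deliberately naive: inserting a cell in the middle marks every later cell as changed.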


Could you please elaborate on how you use orgmode for interactive programming?


Interesting. I use emacs for hacking Python but I've never heard of this tool. Do you actually use this for software development or is it more of a data-science-type exploration tool?


I use Org for software development whenever I can, which is currently every day.

Usually, new code starts in cells with some Org-managed context (e.g. a Jupyter kernel in a remote container with some DB/service access). This is done using the :session code cell keyword, which works per subtree. Managing remote sessions like this generally keeps me away from terminals.

Surrounding the cell are various mini-dashboards with useful docs / links / commands for that part of the project. Since Org supports embedding elisp and shell commands in clickable links [1], these mini-dashboards can be made very quickly.

Org lets me edit the code using the proper Emacs mode for its language, while pulling dynamic completion / docs from the Jupyter kernel. Just like Jupyter notebooks, I can view rich outputs from the cells in-line. I can then name the outputs and make them inputs to other cells, including ones in different languages / kernels. AFAIK that's an Org-only trick.

Most code eventually finds its way to normal source files (see Org's "tangle" feature). This feels more natural than moving code from notebooks since, again, the cell editing mode is the same as the one for source files.
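For readers who haven't seen tangling, the idea can be sketched in a few lines of Python. This is a simplified illustration, not Org's implementation: it assumes code lives between `#+BEGIN_SRC <language>` and `#+END_SRC` lines, ignores header arguments such as `:tangle` and `:session`, and simply concatenates every block of one language. The function name `tangle` is just borrowed for the example.

```python
import re
from typing import List, Optional

# Org source blocks start with "#+BEGIN_SRC <language>" and end with "#+END_SRC"
# (case-insensitive). Header arguments after the language are ignored here.
BEGIN = re.compile(r"^\s*#\+begin_src\s+(\S+)", re.IGNORECASE)
END = re.compile(r"^\s*#\+end_src\s*$", re.IGNORECASE)


def tangle(org_text: str, language: str) -> str:
    """Concatenate all source blocks of `language`, in document order."""
    blocks: List[str] = []
    current: Optional[List[str]] = None  # lines of the open block, None outside one
    for line in org_text.splitlines():
        if current is None:
            match = BEGIN.match(line)
            if match and match.group(1).lower() == language.lower():
                current = []
        elif END.match(line):
            blocks.append("\n".join(current))
            current = None
        else:
            current.append(line)
    return ("\n\n".join(blocks) + "\n") if blocks else ""


doc = """* Load data
Some notes about the data.
#+BEGIN_SRC python :session main
x = 1
#+END_SRC
* Report
#+BEGIN_SRC python
print(x)
#+END_SRC
"""
print(tangle(doc, "python"))
```

The prose and headings stay in the Org file; only the code ends up in the generated source.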

Org's tree-manipulation capabilities + support for multiple sessions means that (so far) I've only ever needed 1 Org file per project. I track this in git, which is simple since Org is just plain-text. To share with non-Org users, I usually export to ipynb [2] or, for static docs, HTML [3].

[1]: https://orgmode.org/manual/External-links.html [2]: https://github.com/jkitchin/ox-ipynb [3]: https://github.com/fniessen/org-html-themes


Thanks.


Related: "literate devops in emacs" https://youtu.be/dljNabciEGg


What? What are you using Jupyter notebooks for where you want maintenance? They should be records of data analysis/ procedures, not code that runs in production or something.


Did you read the article?

"A Netflix engineer described how they have replaced Bash scripts with Jupyter notebooks for ETL pipelines and cron jobs."


Maybe read the article. They are experimenting with putting the notebooks directly into production, à la Bash scripts. I don’t think this is a great idea either.


Yeah, I read it after, silly of me to comment first.


Exactly. I use it for quick data exploration and experimentation, but it remains at that level.


People do that in Excel too and the next thing you know a spreadsheet is managing a portfolio or being used as the basis for published science!


Everyone knows that Excel is a drawing program https://www.thisiscolossal.com/2017/12/tatsuo-horiuchi-excel...


I can’t tell if this is sarcasm or not... I’ll assume it is and upvote :-)


It's not the fault of the tools if users don't know any better.


So what would you suggest a financial modeler, who has no experience of any programming environment, use instead?

The value of Excel is that it is a zero-config tool, available everywhere as a 'standard' business installation, allows very quick iteration with visual output in a certain range of tasks, is battle-tested on millions of computers... etc. And everyone else is using it too.

For a programmer it's easy to suggest just picking some good language. Once you know two programming languages you can become fluent in a third pretty quickly.

But gaining fluency in a first language? That's hard. You can't just suggest using Python or such. You need to provide a tool that holds hands, has lots of tutorials available, etc.

Visual programming is not necessarily the answer either. I understand that without discipline LabVIEW programs become actual visual spaghetti pretty quickly.


Of course it's hard. Just like it's hard to learn a new language that is far from your native tongue. But we should not be scared of learning something difficult, because the returns are well worth the investment. There is only so much you can do in Excel, and if you want to advance your career you need to move past it even if you have no programming background. Additionally, it's never been easier than in 2018 to find free resources to learn. The only things you need are awareness, time, and will.


It's also not the fault of the tools if there are no better alternatives.


Some simpler notebook-like environments stay closer to source code in that they basically are source code with interleaved results (e.g. as comments). IMHO they hit a sweet spot between REPLs and those notebook environments inspired by Mathematica, Maple and similar more mathematically oriented software products.


That's just a REPL with basic editor integration (eval at point, paste result). Surprisingly unpopular outside Emacs/Lisp land.
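A rough sketch of the "eval, paste result" idea in Python, for anyone who hasn't used it. The function name `annotate` and the `# =>` marker are arbitrary choices for this example, and the code is a toy: it runs each top-level statement in order and pastes the value of every bare expression below it as a comment. Blank lines and comments between statements are dropped, and the input is executed as-is, so only feed it code you trust.

```python
import ast
from typing import List


def annotate(source: str) -> str:
    """Run `source` top to bottom, pasting each expression's value as a comment."""
    namespace: dict = {}
    lines = source.splitlines()
    out: List[str] = []
    for node in ast.parse(source).body:
        # Copy the statement's own source lines through unchanged.
        out.extend(lines[node.lineno - 1 : node.end_lineno])
        if isinstance(node, ast.Expr):
            # A bare expression: evaluate it and paste the result below it.
            code = compile(ast.Expression(node.value), "<cell>", "eval")
            out.append(f"# => {eval(code, namespace)!r}")
        else:
            # Any other statement: execute it for its side effects.
            code = compile(ast.Module([node], type_ignores=[]), "<cell>", "exec")
            exec(code, namespace)
    return "\n".join(out)


print(annotate("x = 2\nx * 21"))
# x = 2
# x * 21
# # => 42
```

The result is still plain source code, so it diffs and versions like any other file.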


Well yes, but these notebooks are little more, aren't they?


People are already following bad programming practices at large scale with MATLAB; I guess Jupyter is a step up from that.


On the other hand, notebooks encourage many good practices: literate programming, purely functional code with immutable outputs, visibility of state, and reproducibility.


Testing, style checking, code coverage...



