Item #1 Writing a notebook is foremost an exercise in expository writing. Making sure the writing is high quality should be the first objective when writing a notebook. This is Knuth's literate programming idea, where prose takes precedence over code; that's the reverse of the way we usually program: code first, comments second.
Item #2 Don't use notebooks for general-purpose programming. Notebooks are supposed to have an audience and to clearly explain something.
Item #3 Keep code cells simple and clear. If a code cell needs a comment, consider putting that verbiage in a markdown cell instead and elaborating on the idea the notebook is trying to convey (see the sketch after this list).
Item #4 Don't make notebooks a long series of extended code cells or, even worse, just one long cell. Explain what is going on, or see Item #2.
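To make Item #3 concrete, here's a minimal sketch (the data and names are hypothetical). Instead of a code cell that buries the explanation in comments:

    import numpy as np
    readings = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]])  # toy sensor data

    # Each sensor has a different baseline, so we subtract the
    # per-sensor mean before comparing them.
    normalized = readings - readings.mean(axis=0)

move the explanation into a markdown cell ("Each sensor has a different baseline, so we subtract the per-sensor mean before comparing them") and keep the code cell bare:

    normalized = readings - readings.mean(axis=0)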
While your points are valid for presentations, I believe that notebooks should first and foremost be used for exploratory computing.
Notebooks are my go-to tool whenever I need to do something with a computer and any of the following apply:
* I'm not quite sure what or how.
* It will likely be a one-off.
* I need it now.
* Someone is watching me, to learn how I do it.
I would go a step further and say that notebooks should probably not be shared directly most of the time, and that you should wrap up functionality into modules/packages for other people to use in their own notebooks.
Edit: Just thought of a corny pun to make my point: They are notebooks, not textbooks. Notebooks are personal, and while it may be useful/insightful to compare notes, it's not the primary function of a notebook.
And regarding sharing -- with notebooks in particular, you should be really cautious about sharing and opening them. The security model is not exactly bulletproof.
Nitpick, but I hope you meant "young", not "early". Interactive programming is as old as (and today still primarily featured in) Lisp - i.e. twice as old as most of us here on HN.
I believe that there is an audience for Jupyter beyond just "data scientist", or "teacher". I think that what Jupyter encourages is _experimentation_, and there are lots of folks who could benefit by using Jupyter to experiment with ideas. If you consider programmer as another audience, we all at different points in time experiment with things. Think about all of those random console applications that we've all created at one point or another to learn a new API, or to try out an idea. Jupyter can (and does) excel in these scenarios and handles the mundane task of "writing things down". This is why adding a Jupyter experience to a text editor is something that we're experimenting with as well.
I think a lot comes down to organising and formatting the notebooks, and the %run command works a treat for me when breaking a project into multiple notebooks.
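For anyone who hasn't tried it, %run executes another script or notebook in the current namespace, so shared setup can live in its own notebook. A minimal sketch, with hypothetical filenames:

    # in 02_analysis.ipynb: run the shared setup notebook first
    %run ./01_load_and_clean.ipynb

    # anything it defined (say, a cleaned DataFrame `df`) is now in scope
    df.head()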
Take the time to learn some good programming practices. If you're lucky and fall in love with scientific programming, your programs will grow to a point where they are unmanageable without some techniques borrowed from computer science and software engineering.
Have someone show you the "out of order execution" and "hidden state" failures.
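If no one is around to show you, here's the gist of both failures in a few lines (names are illustrative):

    # Cell 1
    prices = [10, 20, 30]
    total = sum(prices)

    # Cell 2
    print(total)  # 60

    # Now change Cell 1 to `prices = [10, 20]` and re-run ONLY Cell 2:
    # it still prints 60, because `total` is hidden state from the old run.
    # Delete Cell 1 entirely and Cell 2 keeps "working" until the kernel
    # restarts, at which point it raises NameError.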
Since Xie's post discusses a "meta" layer above Joel Grus, I recommend people read it before JG. This actually helps one understand the underlying thinking in JG's slides.
I prefer pipenv to Conda, and I don't like having Jupyter(Lab) installed in each venv separately, so instead I only add `ipykernel` to each venv and then use my system-level JupyterLab to access per-project kernels; seems like that wouldn't work here?
I'm not sure if I understand your kernel question, but VSCode's Python extension has everything built-in. As soon as you add a #%% comment it considers what follows to be a notebook cell and automatically gives you a "run cell" button that uses your chosen Python interpreter.
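For reference, the marker is just a structured comment in an ordinary .py file; this sketch assumes a scratch file of your own:

    # %%
    import math
    area = math.pi * 2.0 ** 2  # the extension puts a "Run Cell" link above each marker

    # %% [markdown]
    # Cells marked like this render as markdown, so prose and code
    # can interleave in a plain, diff-friendly .py file.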
I tell pipenv to store venv stuff in the project folder (`export PIPENV_VENV_IN_PROJECT=1`), and then do the following to start a new project:
pipenv --python 3.7
pipenv install ipykernel
pipenv run python -m ipykernel install --user --name="$(basename "$(dirname "$(pipenv --venv)")")"
Then in the list of kernels, in addition to the usual suspects I'll have one named for the folder I ran the above in. If I started a notebook before all that, I'll have to change its kernel. Doing more `pipenv install` at the prompt makes new packages immediately available in the running notebook.
Cannot wait to give it a try.
I submitted this a week ago, but too bad nobody cared. https://news.ycombinator.com/item?id=19794865
It never caught on outside of the R community, but the format itself is language-agnostic.
R Markdown and its derivatives are such great tools. I would also suggest xaringan if you want to present your work on a big screen.
There are a lot of examples of people analyzing public code on GitHub efficiently for patterns and usages with BigQuery and getting pretty accurate data out of it. https://medium.com/google-cloud/analyzing-go-code-with-bigqu...
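As a rough sketch of what those analyses look like with the Python BigQuery client (the table is the public bigquery-public-data.github_repos.files dataset; the query shape is illustrative and assumes GCP credentials are configured):

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT COUNT(*) AS notebook_count
        FROM `bigquery-public-data.github_repos.files`
        WHERE path LIKE '%.ipynb'
    """
    for row in client.query(query).result():
        print(row.notebook_count)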
If you use GitHub on a daily basis, you are unlucky enough to know that web search sadly can't even find words that exist in your repository.
Possibly something in my config, but I've recently been getting a lot of
"Sorry, something went wrong. Reload?"
I have used this as an alternative: https://nbviewer.jupyter.org/
If Github is not working well for your notebooks, you can try Kyso: sign up and import your notebook from Github directly on this page: https://kyso.io/github, or upload it using this page: https://kyso.io/create/study
Disclaimer - I am OP and founder of Kyso
(Obviously this would be for shits 'n giggles, not any serious work).
If you're on emacs and like Jupyter, there's https://github.com/dzop/emacs-jupyter , which is pretty nice. I've been using it for a few days with Julia, and it works really well. It also allows you to use different kernels from the same org-mode file, though I haven't tried to pass data between them yet (it should be possible, though; at least it works in plain org-mode).
Interestingly, for initial use, we increasingly start teams on their existing internal NB servers, and for new ones, they either start on Jupyter included in their Graphistry AMI or use Google Colab. So, very little outside of our quick start notebook skeletons hits GitHub.
So... How many notebooks are actually out there? Probably an even more interesting growth curve...!
Feel free to ping in a ~couple weeks, happy to chat.
The linked post is actually a Jupyter notebook itself - analysing the number of notebooks on Github.
A key element with Kyso is that the code is hidden by default to make it readable to non-technical people but you can click on the "code hidden" button on the top right to see the code in full.
If you want to give Kyso a go - sign up and import from Github directly on this page: https://kyso.io/github, or upload using this page: https://kyso.io/create/study
Imagine you had a notebook to analyse sales data and needed to present the results to your CEO (who perhaps cannot code) - this feature lets you present the notebook as is, without needing to prepare a report in some other format.
Like an internal wiki for a company's data science, where the technical people can communicate their work to the non-technical people with a pretty seamless experience.
You effectively say to them "I want you to be proactive and go ahead and grab this food which I prepared for you and placed here and here and over there, and by the way plates and utensils are in that corner, help yourself" instead of "I thought you might need this, here it is."
The way you do it works in some orgs, but generally it doesn't.
Upload directly, import from Github or use our plugin to publish from JupyterLab itself - all those methods are outlined on our docs page: https://kyso.io/docs
1. Adding better Jupyter support to GitLab 12.0 https://gitlab.com/gitlab-org/gitlab-ce/issues/47138 as suggested by my co-founder.
2. Making it easier to do the entire data lifecycle with Meltano https://meltano.com/ which plans to include JupyterHub
I was under the impression that FB Prophet was optimal for significantly seasonal time series data.
Honestly, given the fickle nature of these kinds of growth patterns beyond the very near term, an ARIMA with a flat vol or a simple eyeball extrapolation would, in my experience as a quant, likely generate just as reasonable/reliable results.
While I understand this is likely intended as a standalone project, it would be interesting to run a comparison of ARIMA vs FB Prophet on out of sample trending Github tools/file types, as well as the general performance of these predictions beyond a one year time frame (especially vs the reported confidence intervals in Prophet).
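For what it's worth, a hedged sketch of what that head-to-head could look like (toy data stands in for the real notebook counts; the statsmodels import path assumes a recent release, and fbprophet is packaged as plain `prophet` in later versions):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA
    from fbprophet import Prophet

    # toy daily series standing in for the notebook counts
    idx = pd.date_range("2015-01-01", periods=1000, freq="D")
    y = pd.Series(np.arange(1000) + np.random.randn(1000).cumsum(), index=idx)

    # ARIMA: fit on the raw series, forecast a year ahead
    arima_fcst = ARIMA(y, order=(1, 1, 1)).fit().forecast(steps=365)

    # Prophet: expects a DataFrame with `ds` (dates) and `y` columns
    m = Prophet().fit(pd.DataFrame({"ds": y.index, "y": y.values}))
    prophet_fcst = m.predict(m.make_future_dataframe(periods=365))

    # compare the two on a held-out window to judge out-of-sample error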
I am not that familiar with how Prophet works, so I am absolutely open to being humbled and corrected. I have a project myself that has a varying seasonal component and I am looking forward to diving into Prophet for a deeper understanding. I am attempting to model an Asian 2 asset spread option with a volume weighted average index price setting mechanism where the underlying exhibits seasonality in the volume traded over the trading time window. I am currently running a Monte Carlo on the valuation with a simple average settlement assumption, as opposed to a volume weighted average assumption, and I was thinking Prophet could help.
Does anyone have experience in financial time series analysis and option valuation who would care to chime in?
Also, what are everyone's thoughts on using Prophet on non-seasonal, vol-clustering time series?
I will likely publish something on my project using stale data given I work in a trading environment. The theory should be the same though. One of these days I’d like to actually write a solid white paper level research study and get published! One can dream!
Would love to hear some feedback.
How do you define inactivity? If I do
$ nohup ./computational_intense_and_runs_for_100_hours.py &
Do you just kill the process (or stop the container)? In essence Jupyter is a graphical, rich shell, so you're providing free *nix machines - don't underestimate how this feature can be exploited (e.g. CoCalc limits at least internet access for free instances).
Second, yes. The container will be killed after 10 minutes unless we keep detecting activity from your user on the platform. So, basically the rule is: if we don't detect user activity for 10 minutes, we kill all containers for that user.
You could hack this by doing periodic requests to the API to simulate activity, but at some point your JWT will expire and requests will start failing.
In any case, other students won't be affected at all by that kind of usage, and we will end up banning your account at some point once we detect it.
We also limit the number of containers a user can run in parallel, so one user can't spin up unlimited containers at the same time.
Do you see any drawbacks in this implementation? Happy to hear about possible improvements.
Oh no, you make it sound like this is a good thing. It only means I can't take you seriously.
Don't make it free. That is not a feature for a computation environment; it will only cause headaches on your side, and people will get the wrong (bad) impression of the performance (assuming free accounts get limited shared instances).
I'd rather pay a monthly fee for a good application than use a pseudo-free instance with limited resources. Do the credits last forever?
* We added a way for students to present their projects / analysis in a static, blog-looking version of their notebooks. Here's an example: https://notebooks.ai/martinzugnoni/how-to-trade-bitcoin-with...
They can share that version with classmates, and classmates can "fork" the project and do their own changes to it.
* Also, we added a custom JupyterLab extension called "solutions" that allows the teacher to mark certain parts of a notebook as the assignment solution; those parts stay hidden from the student until they decide to reveal them. Here's how it works:
We now have the ability to keep adding education-related features to it, without depending on Google or Microsoft, who are not focused on this space.
So the fairer comparison would be to compare .rmd vs. .ipynb instead of looking at the kernelspecs in .ipynbs to see the distribution of R users. In my experience (I own the Azure Notebooks service at Microsoft), I see very little R usage - on the order of 1% of Python's, similar to what other commenters on this thread have seen.
We wouldn't need special viewers in our source control frontend, or git hooks + tools like jupytext, or in-between editor plugins (VSCode)! By not storing state with input and forcing top-to-bottom linear runs, I think the entire field of machine learning would be significantly more reproducible than it is now.
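In the meantime, jupytext's Python API gets part of the way there; a minimal sketch, with hypothetical filenames:

    import jupytext

    # read the .ipynb and write a plain-text twin: no stored outputs,
    # clean diffs, and a natural nudge toward fresh top-to-bottom runs
    nb = jupytext.read("analysis.ipynb")
    jupytext.write(nb, "analysis.py", fmt="py:percent")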
I do believe that we have room for more than one option. The work that we've done in interactive programming in the Python VS Code extension is one possible approach that folks have seemed to like, and that aligns with what you're saying.
We're going to continue to explore that direction in addition to the Jupyter work that we're doing. Stay tuned!
R users are mostly using RStudio.
You get most of the same experience and you can even customise many of the steps, right?
We are VM-based for now but are moving to Kubernetes to make sharing better. Our initial market is classrooms.
There are expensive ways to deal with this today, e.g., running each user isolated in a separate VM. Hopefully we will have better solutions in the near future.
Like I can't add a project. It says "To create project
Please wait till hard drive button turn green." I don't know what that means. You've lost my attention.
Can you go into a little more depth about this statement?
Past HN discussion: https://news.ycombinator.com/item?id=17856700
Disclaimer: I designed this experience in VS Code.
We also need to work on the discoverability of this feature. Lots of existing users of our extension had no idea it was there ... suggestions welcome!
I tend to do a lot of exploratory work in Jupyter, but find the whole process really annoying and cumbersome to set up - I've just been playing around with this VS Code extension and it seems really neat!
I've been using nteract a lot recently, but I'm gonna switch to VS Code now; at least that means one less Electron app eating up my memory.
If I view any fastai notebooks taken from their repo, there are a million imports (having a bunch of `import *` statements in the fastai lib doesn't help things) and methods and classes keep popping up out of nowhere. Good luck making sense out of them in a notebook. At least in Pycharm or VS Code, it's one Ctrl+Click away from viewing the relevant source code and having a somewhat better idea of what is going on.
Debugging is woefully inadequate compared to the Pycharm experience (VS Code is slowly catching up with Pycharm on that front).
I've only found 2 good uses for Jupyter Notebooks:
1. As a scratch-pad to try out things without any plans to utilize directly or share the code with anyone
2. As a means to write well-documented examples/code with a bunch of markdown...especially when I'm modeling things that have a lot of equations, where the LaTeX support is a boon and makes the documentation in the notebook far superior to what you could achieve in a regular .py script (see the example just below).
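For example, a markdown cell containing the following renders as typeset math (the formula is just an illustration):

    $$ \hat{\beta} = (X^\top X)^{-1} X^\top y $$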
In almost every other instance, you are better off using a full-blown IDE with a far superior development environment, ability to have venvs, superior debugging, superior code management/refactoring and most importantly, much better reproducibility.
All that being said, I think the direction in VSC is definitely a great step in the right direction. I honestly love VSC but can't give up PyCharm yet as it is still a long way ahead of VSC when it comes to Python features (much better linting, much better management of venvs and run configs, more powerful debugging experience... though VSC is getting pretty good, much better refactoring support and PEP-8 reformatting/linting). I really hope there is more of a push within MS to continue improving the Python experience in VS Code. Nothing would make me happier than to consolidate my work in VSCode! Keep up the awesome work!
I'm currently thinking about a model of notebooks / interactive programming where the default assumption is that you're using it as a scratchpad, i.e., you won't need to explicitly name the file in order to get the benefits of auto-save, but yet the file won't pollute your filesystem / project namespace until you choose to "keep" it. Hopefully this helps reduce the friction in the exploratory programming realm; my goal is to eliminate the "ConsoleApplicationX" directories (I'm a VS guy from way back so ...) and I think it helps with your scenario 1) above.
The Python VS Code extension team is well aware of the gaps that you list as well, and are working hard to narrow them with each release. We are definitely serious about improving the Python experience in VS Code. How times have changed :)
Thanks for the support and encouragement. And as always, if you find issues that aren't already in our github, feel free to add some more and keep the feedback coming!