Estimating Number of Jupyter Notebooks on Github (kyso.io)
275 points by eoinmurray92 43 days ago | 130 comments



In the same spirit as “Effective Java” and “Effective C++” we need to have a book entitled “Effective Jupyter Notebooks”. Here are some of my items below. Maybe this sub-thread can come up with an outline for this book.

Item #1 Writing a notebook is foremost an exercise in expository writing. Making sure the writing is high quality is the first objective when writing a notebook. This is Knuth's literate programming idea, where prose takes precedence over code - the reverse of the way we usually program: code first, comments second.

Item #2 Don't use notebooks for general purpose programming. Notebooks are supposed to have an audience and clearly explain something.

Item #3 Keep code cells simple and clear. If you need a comment in a code block, consider putting that verbiage in a markdown cell instead and elaborating on the idea the notebook is trying to convey.

Item #4 Don't make notebooks a long series of extended code cells, or even worse, just one long cell. Explain what is going on or see Item #2.


I disagree.

While your points are valid for presentations, I believe that notebooks should first and foremost be used for exploratory computing.

Notebooks are my goto tool for whenever I need to do something with a computer and any of the following apply:

* I'm not quite sure what or how.

* It will likely be a one-off

* I need it now.

* Someone is watching me, to learn how I do it.

I would go a step further and say that notebooks should probably not be shared directly most of the time, and that you should wrap up functionality into modules/packages for other people to use in their own notebooks.

Edit: Just thought of a corny pun to make my point: They are notebooks, not textbooks. Notebooks are personal, and while it may be useful/insightful to compare notes, it's not the primary function of a notebook.
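To make the modules/packages point concrete (all names here are hypothetical): once a piece of exploratory code stabilizes, move it out of the cells and into an importable file:

    # sales_utils.py - stable code extracted from exploratory notebook cells
    import pandas as pd

    def load_clean_sales(path):
        """Load a sales CSV and apply the cleanup steps worked out in the notebook."""
        df = pd.read_csv(path)
        return df.dropna(subset=["amount"])

Then anyone's notebook (including your own next one) reduces to `from sales_utils import load_clean_sales`.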


Agreed. I've been increasingly doing that at work with org-mode. Sometimes I'll just open up a source block under a TODO item, other times I'll create a new heading or a new file for some stream of thoughts mixed with code blocks in various languages. It feels better than doing it in a pure source code file, with my thoughts noted as comments (though I do plenty of that too).


Indeed, when the alternative is the interpreter, the structure notebooks offer and the (loose) process they encourage make for a vastly better solution for recording and capitalizing on your work.


I agree with your disagreement. I've always approached notebooks as a prototyping tool first and a presentation tool second. I see Jupyter more as an example of an early Interactive programming tool that just happens to be useful for presentation and teaching purposes. (https://en.wikipedia.org/wiki/Interactive_programming)

And regarding sharing -- particularly with notebooks, you should be really cautious about sharing and opening them. The security model is not exactly bulletproof.


> I see Jupyter more as an example of an early Interactive programming tool

Nitpick, but I hope you meant "young", not "early". Interactive programming is as old as (and today still primarily featured in) Lisp - i.e. twice as old as most of us here on HN.


While I generally agree with everything you say for a particular audience, i.e., someone who is writing something to explain an idea to someone else, I think that at the same time this scoping limits the utility of Jupyter too much.

I believe that there is an audience for Jupyter beyond just "data scientist", or "teacher". I think that what Jupyter encourages is _experimentation_, and there are lots of folks who could benefit by using Jupyter to experiment with ideas. If you consider programmer as another audience, we all at different points in time experiment with things. Think about all of those random console applications that we've all created at one point or another to learn a new API, or to try out an idea. Jupyter can (and does) excel in these scenarios and handles the mundane task of "writing things down". This is why adding a Jupyter experience to a text editor is something that we're experimenting with as well.


I disagree massively with this. To me, Jupyter is a REPL + development environment + rapid prototyping tool for ETL/visualisation. There's no rule that notebooks have to be used for exposition, or any writing at all. I don't know why you think this is a rule, or what you'd use as an alternative data REPL.


For example Peter Norvig does an exemplary job with jupyter notebooks for expository writings: https://github.com/norvig/pytudes


See also "computational essay"[1]. It is written with Mathematica notebooks in mind, but many of the ideas carry over to Jupyter notebooks.

[1] https://blog.stephenwolfram.com/2017/11/what-is-a-computatio...


and a more jupyter-centric agree/reply by Tony Hirst:

https://blog.ouseful.info/2017/11/15/programming-meh-lets-te...


For #2, I have used notebooks for personal data analysis. I am my own audience and need to remember what I did and why.


This is a really good idea, and I've been planning to write said guide for a while.

I think there is a lot to do with organising and formatting the notebooks, and the %run command works a treat for me when breaking a project into multiple notebooks
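To illustrate (file names hypothetical): %run executes another file and makes its top-level definitions available in the current notebook's namespace, so a project can be split into stages:

    # top-level notebook cell; %run also accepts .ipynb files
    %run ./01-load-data.ipynb    # hypothetical: defines `df`
    %run ./02-helpers.py         # hypothetical: defines `plot_trend`
    plot_trend(df)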


Check out: Ten Simple Rules for Reproducible Research in Jupyter Notebooks, https://arxiv.org/abs/1810.08055 and https://github.com/jupyter-guide/ten-rules-jupyter


While I don't follow #2, I suggest two more items:

Take the time to learn some good programming practices. If you're lucky and fall in love with scientific programming, your programs will grow to a point where they are unmanageable without some techniques borrowed from computer science and software engineering.

Have someone show you the "out of order execution" and "hidden state" failures.
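The minimal demonstration of both failures, for anyone who hasn't been bitten yet (cell numbers illustrative):

    # In [1]:
    x = 1

    # In [2]:
    y = x + 1    # y == 2

    # Now edit In [1] to `x = 10` and re-run only that cell: the screen
    # shows x = 10, but y is still 2 - hidden state. Run the cells out of
    # order in a fresh kernel and In [2] raises NameError instead.
    # "Restart and run all" flushes out both problems.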


Obligatory Joel Grus: slides from his 'I don't like notebooks' talk. He has good follow-up talks too.

https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...


...and a well-written response to that by Yihui Xie: https://yihui.name/en/2018/09/notebook-war/

Since Xie's post discusses a "meta" layer above Joel Grus's talk, I recommend people read it before JG. It actually helps one understand the underlying thinking in JG's slides.


If you ever put notebooks in source control, you owe it to yourself to try the text-based notebooks supported in Visual Studio Code[1]. They're round-trippable with real (i.e. browser-based) notebooks, yet are much better for collaboration, diffing, and editing.

[1] https://code.visualstudio.com/docs/python/jupyter-support


How does this compare to Jupytext?

I prefer pipenv to Conda, and I don't like having Jupyter(Lab) installed in each venv separately, so instead I only add `Ipykernel` to each venv and then use my system-level JupyterLab to access per-project kernels; seems like that wouldn't work here?


Thanks for mentioning that project. I wasn't familiar with it, but the difference seems to be that Jupytext is a tool for converting notebooks between ipynb and text formats, while VS Code supports the same format (#%% cell delimiters) but lets you edit text-based notebooks much as you would in the browser: keyboard shortcuts to execute individual cells, rendered markdown, and so on.

I'm not sure I understand your kernel question, but VS Code's Python extension has everything built in. As soon as you add a #%% comment, it considers what follows to be a notebook cell and automatically gives you a "run cell" button that uses your chosen Python interpreter.
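For reference, the format is nothing more than comment markers in an ordinary .py file (contents here are illustrative; Jupytext's "percent" format uses the same convention):

    #%% Load data
    import pandas as pd
    df = pd.read_csv("data.csv")   # hypothetical file

    #%% [markdown]
    # Text under a [markdown] marker is rendered as prose, not executed.

    #%% Plot
    df.plot()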


I've noticed that if I don't install Jupyter into each venv, then the Python path defaults to the Jupyter install. As a result, the Jupyter notebook cannot find the packages in the venv (for example, Caffe: module-not-found errors).


huh, I've never used Caffe but wrapping my venvs with pipenv has never caused me this kind of grief.

I tell pipenv to store venv stuff in the project folder (`export PIPENV_VENV_IN_PROJECT=1`), and then do the following to start a new project:

    pipenv --python 3.7
    pipenv install ipykernel
    pipenv run python -m ipykernel install --user --name="$(basename "$(dirname "$(pipenv --venv)")")"
    jupyter lab
(all the magic is in that third command)

Then in the list of kernels, in addition to the usual suspects I'll have one named for the folder I ran the above in. If I started a notebook before all that, I'll have to change its kernel. Doing more `pipenv install` at the prompt makes new packages immediately available in the running notebook.


Exactly. Conda is the worst of the Python world - installing gigabytes of unnecessary garbage every time.


But the purpose of Anaconda is to be an easily installable set of frequently used tools for a variety of data science tasks. It's not meant to be a minimalist package.


Conda can create environments with literally nothing in them. If you’re putting the full-blown anaconda metapackage in every environment, then it sounds like you’re using conda wrong for your use case. Just tell conda to install the packages you actually need.


Oh man, that's basically org-babel [0] but with working jupyter server support!

0: https://orgmode.org/manual/Working-with-Source-Code.html#Wor...


This sounds so good! I just read that LaTeX support is also included. If there is also an extension to handle references, I can move out of emacs / org-mode / org-ref, because the maintenance cost is too high.

Cannot wait to give it a try.



The jupyter support in VSCode is working very well.

I submitted this a week ago, but too bad nobody cared. https://news.ycombinator.com/item?id=19794865


The R ecosystem also has a text-based notebook format that they call R Markdown, with a long lineage going back to the TeX days.

It never caught on outside of the R community, but the format itself is language-agnostic.

https://rmarkdown.rstudio.com/


I believe people underestimate what can be done in R/RStudio these days because they haven't been exposed to it.

R Markdown and its derivative are such great tools. I would also suggest xaringan if you want to present your work on a big screen.


PyCharm supports the same thing, with debugging, but frustratingly doesn't allow running all cells.


Why doesn't this just use the GitHub public dataset available on Google BigQuery to have much more accurate data rather than "scraping GitHub web search results"? https://cloud.google.com/bigquery/public-data/

There are a lot of examples of people analyzing public code on GitHub efficiently for patterns and usages with BigQuery and getting pretty accurate data out of it. https://medium.com/google-cloud/analyzing-go-code-with-bigqu...

If you use GitHub on a daily basis, you are unlucky enough to know that web search sadly can't even find words that exist in your repository.


One limitation of the BigQuery dataset is that it only covers repos with a license on them[1]; the scraping approach can look at all public repos.

[1] https://github.blog/2017-01-19-github-data-ready-for-you-to-...


Off topic:

Possibly something in my config, but I've recently got a lot of

    "Sorry, something went wrong. Reload?"
when trying to view Jupyter notebooks on github itself. Seems to be working right now.

I have used this as an alternate: https://nbviewer.jupyter.org/


This is what we made Kyso for - the linked post is actually a Jupyter notebook itself. The code is hidden by default to make it readable to non-technical people, but you can click on the "code hidden" button on the top right to see the code in full.

If GitHub is not working well for your notebooks, you can try Kyso by signing up and importing your notebook from Github directly on this page: https://kyso.io/github, or upload it using this page: https://kyso.io/create/study

Disclaimer - I am OP and founder of Kyso


You made Kyso for making GitHub-hosted Jupyter Notebooks render when accessed with a web browser on github.com?


Well, it's one of the ways to post to Kyso, but yeah, you can synchronize your Github repositories to Kyso and choose the notebooks you want to have rendered. When you push changes to the repo, the sister post on Kyso will be automatically updated.


I get it a lot, so I set up a pre-commit hook to render all my notebooks to HTML. With GitHub Pages, I can access them easily.
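A minimal sketch of such a hook (assuming `jupyter nbconvert` is installed; the real one may well differ):

    #!/usr/bin/env python
    # .git/hooks/pre-commit - render every notebook to HTML and stage the result
    import pathlib
    import subprocess

    for nb in pathlib.Path(".").rglob("*.ipynb"):
        if ".ipynb_checkpoints" in nb.parts:
            continue
        subprocess.run(["jupyter", "nbconvert", "--to", "html", str(nb)], check=True)
        subprocess.run(["git", "add", str(nb.with_suffix(".html"))], check=True)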


Sounds useful, care to share that hook?


I've gotten this as well after uploading notebooks to GitHub. It usually goes away ~10 minutes after I've uploaded the .ipynb. I'll check this out though, thanks!


Thanks for linking that - I tried making some Ruby notebooks and it works quite well: https://nbviewer.jupyter.org/github/localhostdotdev/notebook... (very similar to the GitHub interface though)


I don't think it's just you; I've been getting them too. It's probably GitHub timing out while trying to render the notebook.


Re: this - it would be fun to set up a Beowulf cluster of Jupyter notebooks, if you will, and just upload them to Github for free parallel computations.

(Obviously this would be for shits 'n giggles, not any serious work).


GitHub and nbviewer don't actually execute the notebooks, they simply render the markdown, code, and saved output to HTML.


If only more people would use org-babel...

If you're on emacs and like Jupyter, there's https://github.com/dzop/emacs-jupyter , which is pretty nice. I've been using it for a few days with Julia, and it works really nicely. It also allows you to use different kernels from the same org-mode file, though I haven't tried to pass data between them yet (it should be possible, though - at least it works in plain org-mode).


I agree. emacs-jupyter is great but I'm eagerly waiting for jupyter notebook server support since my work is on remote clusters now. For local use, it's pretty great.


Emacs jupyter notebook mode (ein-mode) is pretty excellent too. That's my go-to way to connect to jupyter.


Also, when using iTerm2 (on Mac) or a Linux terminal, you can configure matplotlib's backend to draw plots right in an SSH shell, term window, etc. I think this would work fine if you were running emacs with the -nw (no window) option in a terminal.
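For example, with the third-party itermplot backend (one of several such terminal backends - an assumption, since the parent doesn't name one):

    # sketch assuming `pip install itermplot` (renders figures inline in iTerm2)
    import matplotlib
    matplotlib.use("module://itermplot")   # must be set before importing pyplot
    import matplotlib.pyplot as plt

    plt.plot([1, 2, 4, 8])
    plt.show()                             # the figure appears in the terminal itself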


We use notebooks heavily for onboarding devs & data scientists to Graphistry, and I only see that increasing.

Interestingly, for initial use, we increasingly start teams on their existing internal NB servers, and for new ones, they either start on Jupyter included in their Graphistry AMI or use Google Colab. So, very little outside of our quick start notebook skeletons hits GitHub.

So... How many notebooks are actually out there? Probably an even more interesting growth curve...!


How do you share and collaborate on the notebooks internally? I'd love to get your thoughts on our Kyso for Teams system [1], if you would be willing to chat.

[1] https://kyso.io/for-teams


Google Colab has solved the 90% case for us. I wish it had context sharing across users and better default folder management, and probably other things, but free + usable + sharing has been amazing within and across partners.

Feel free to ping in a ~couple weeks, happy to chat.


OP and Founder of Kyso here - we built Kyso to make it easier to blog your notebooks to the public and also to make them easier to share in teams.

The linked post is actually a Jupyter notebook itself - analysing the number of notebooks on Github.

A key element with Kyso is that the code is hidden by default to make it readable to non-technical people but you can click on the "code hidden" button on the top right to see the code in full.

If you want to give Kyso a go - sign up and import from Github directly on this page: https://kyso.io/github, or upload using this page: https://kyso.io/create/study


In my opinion, hiding code is an anti-feature: exposing the code by default gives you more incentive to write clean, understandable, self-documenting code, since it's always visible. By hiding it, people might be more likely to paste in big blobs of ugly code that would be much better put into a reusable function than left in a notebook snippet.


That's true if you're sharing with someone who also understands the code - we think the feature lets you share the notebooks with a completely non-technical audience.

Imagine you had a notebook to analyse sales data and you needed to present the results to your CEO (who perhaps cannot code) - this feature lets you present the notebook as is, without needing to prepare a report in some other format.


Why would you share a notebook with somebody who doesn't understand the code?? What are they supposed to do with them anyways? Reading? Isn't that what PDF is for? Just give them PDFs, C*O people have got piles of other things to worry about besides shared notebooks.


Mostly so you don't need to convert to PDF, and so that you can host the reports in a central place where everyone, technical or not, can read them.

Like an internal wiki for a company's data science work, where the technical people can communicate their work to the non-technical people with a pretty seamless experience.


Let me elaborate. Delivering information is a job. Either you do this by yourself -- by translating from your datascience-speak (aka notebooks) to the business-speak (aka concise, targeted PDFs) -- or you make stakeholders do that job instead of you.

You effectively say to them "I want you to be proactive and go ahead and grab this food which I prepared for you and placed here and here and over there, and by the way plates and utensils are in that corner, help yourself" instead of "I thought you might need this, here it is."

The way you do it works in some orgs, but generally it doesn't.


This is not the experience non-technical senior leadership people are looking for, unless you are a 10-people startup.


I'm not sure - we have large teams using Kyso as a knowledge base for data science work, and there's also Airbnb's knowledge-repo, which originally inspired us. So from my point of view there is decent evidence of the need for this.


Ah, that makes sense. So, basically you replaced a dashboard effort with a whole bunch of readonly notebooks, thus distributing the information delivery job among the peers outside of your DS team. Clever.


We actually have a few ways to publish your notebooks.

Upload directly, import from Github or use our plugin to publish from Jupyterlab itself - all those methods are outlined on our docs page: https://kyso.io/docs


I should note that anyone can contact me at eoin [at] kyso.io


We're seeing an explosion of Jupyter use on GitLab as well. GitLab already makes Jupyter easier to install on a Kubernetes cluster: https://docs.gitlab.com/ee/user/project/clusters/#installing... In response to the growing demand, we're doing two things:

1. Adding better Jupyter support to GitLab 12.0 https://gitlab.com/gitlab-org/gitlab-ce/issues/47138 as suggested by my co-founder.

2. Making it easier to do the entire data lifecycle with Meltano https://meltano.com/ which plans to include JupyterHub


Python notebooks will be the Perl of the 2010s - write once, pretty much impossible to maintain long term.


Very cool work here. This is a pretty epic post, so please do not take this the wrong way.

I was under the impression that FB Prophet was optimal for significantly seasonal time series data.

Honestly, given the fickle nature of these kinds of growth patterns beyond the very near term, an ARIMA with a flat vol or a simple eyeball extrapolation would, in my experience as a quant, likely generate just as reasonable/reliable results.

While I understand this is likely intended as a standalone project, it would be interesting to run a comparison of ARIMA vs FB Prophet on out of sample trending Github tools/file types, as well as the general performance of these predictions beyond a one year time frame (especially vs the reported confidence intervals in Prophet).

I am not that familiar with how Prophet works, so I am absolutely open to being humbled and corrected. I have a project myself that has a varying seasonal component, and I am looking forward to diving into Prophet for a deeper understanding. I am attempting to model an Asian two-asset spread option with a volume-weighted average index price-setting mechanism, where the underlying exhibits seasonality in the volume traded over the trading time window. I am currently running a Monte Carlo on the valuation with a simple-average settlement assumption, as opposed to a volume-weighted average assumption, and I was thinking Prophet could help.

Does anyone have experience in financial time series analysis and option valuation who would care to chime in?

Also, what are everyone's thoughts on using Prophet on non-seasonal, vol-clustering time series?


Hey, I posted the notebook by the OP. Thank you for your feedback! You're correct in saying that FB Prophet is for forecasting time series with strong seasonal effects. FB Prophet was the model used in the original script I found, and the main point here was simply to make the notebook more readable on Kyso, which has quite a few non-technical readers. I've worked a lot with ARIMAs before for financial/economic data, and I like the idea of comparing the results between the two, and maybe even extending the time frame. So: 1. I think that'll be my next project, and 2. if your project is public I'd love to give it a read when published.
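The rough shape of that comparison would be something like this (the file name and ARIMA order are placeholders; the `ds`/`y` column names are Prophet's required schema):

    import pandas as pd
    from fbprophet import Prophet   # packaged as `prophet` in newer releases
    from statsmodels.tsa.arima.model import ARIMA

    # hypothetical file: columns ds (date) and y (notebook count)
    df = pd.read_csv("notebook_counts.csv", parse_dates=["ds"])

    # Prophet: trend + seasonality handled automatically
    m = Prophet()
    m.fit(df)
    future = m.make_future_dataframe(periods=365)
    prophet_fc = m.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]

    # ARIMA baseline on the same series (order is not tuned)
    arima = ARIMA(df.set_index("ds")["y"], order=(1, 1, 1)).fit()
    arima_fc = arima.forecast(steps=365)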


Did you try running the forecast with a log ceiling to control the trajectory a bit? Or would that only be a concern if you had to forecast past a couple of years? I find that when I use Prophet to forecast down to the day, I end up creating initial forecasts with heavy log ceilings to prevent unreasonable estimates of the future, and then removing the ceiling once enough history is established to provide a reasonable baseline.
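(The ceiling described here is Prophet's logistic-growth mode; the cap value below is an arbitrary placeholder:)

    import pandas as pd
    from fbprophet import Prophet

    df = pd.read_csv("notebook_counts.csv", parse_dates=["ds"])  # hypothetical: columns ds, y
    df["cap"] = 10_000_000          # placeholder ceiling on the trajectory

    m = Prophet(growth="logistic")
    m.fit(df)

    future = m.make_future_dataframe(periods=365)
    future["cap"] = 10_000_000      # the cap must also be supplied for future rows
    forecast = m.predict(future)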


Ah okay. Great work with plotly on visualization then. This is a great skill to have and something I need to work on. I’ve added Kyso to my favorite list. I look forward to your future posts.

I will likely publish something on my project using stale data given I work in a trading environment. The theory should be the same though. One of these days I’d like to actually write a solid white paper level research study and get published! One can dream!


My two cents: we've recently been working on a FREE hosted version of JupyterLab, mainly intended for education. Feel free to check it out.

https://notebooks.ai/

Would love to hear some feedback.


Wondering how you plan to keep it free. Also wondering whether you would consider shifting to SageMath/CoCalc as a service.


We got support from the local university in my city, and we got a bunch of free credits on AWS. Costs are very low and we want to keep it that way, so we can support as many students as we can with free access.


What happens when your funding and AWS credits run out?


We might add bigger paid tiers later if we decide to support business usage. For now it's educational only, and we can cover the costs even without the credits. The containers used are small and get shut down on inactivity, so we only need to care about concurrent users. Hope that makes sense.


> Containers used are small and get shut down on inactivity.

How do you define inactivity? If I do

$ nohup ./computational_intense_and_runs_for_100_hours.py &

Do you just kill the process (or stop the container)? In essence, Jupyter is a graphically rich shell, so you're providing free *nix machines - don't underestimate how this can be exploited (e.g. CoCalc limits at least internet access for free instances).


First, that will use 100% of the CPU quota assigned to your user, which is really small.

Second, yes. The container will be killed after 10 minutes unless we keep detecting activity from your user in the platform. So basically the rule is: if we don't detect user activity for 10 minutes, we kill all containers for that user. You could hack this by making periodic requests to the API to simulate activity, but at some point your JWT will expire and requests will start failing.

In any case, other students won't be affected at all by inappropriate usage, and we will end up banning your account at some point when we detect it.

We also limit the number of containers running in parallel, to avoid unlimited containers running at the same time.

Do you see any drawbacks on this implementation? Happy to hear about possible improvements.


> We've been recently working in a FREE hosted version [...]

Oh no, you make it sound like this is a good thing. It only means I can't take you seriously.

Don't make it free. That is not a feature for a computation environment; it will only cause headaches on your side, and people will get the wrong (bad) impression about the performance (assuming free accounts get limited shared instances).

I'd rather pay a monthly fee for a good application than use a pseudo-free instance where you get limited resources. Do you have your credits forever?


That's a very good point. We wanted to make it super accessible for students using Jupyter. Also, we only support very small containers, which of course don't have GPUs. It's not primarily intended for business use, but for learning purposes.


What would be the difference between using this and something like Google Colab?


We were using Colab and Azure Notebooks before, but we found a few requirements they didn't cover. For example:

* We added a way for students to present their projects/analyses in a static, blog-like version of their notebooks. Here's an example: https://notebooks.ai/martinzugnoni/how-to-trade-bitcoin-with...

They can share that version with classmates, and classmates can "fork" the project and make their own changes to it.

* Also, we added a custom JupyterLab extension called "solutions" that allows the teacher to mark certain parts of a notebook as the assignment solution; those parts stay hidden from the student until they decide to reveal them. Here's how it works:

(teacher) https://user-images.githubusercontent.com/7065401/50402147-1...

(student) https://user-images.githubusercontent.com/7065401/50402146-1...

We now have the ability to keep adding education-related features, without depending on Google or Microsoft, who are not focused on this space.


I think it would be cool to run the same analysis on the number of R Notebooks on Github and compare the two.


The R community has built an incredible competing tool in RStudio and R Markdown. You could argue that the progress of Jupyter is being held back by its ties to JSON as the serialization format for notebooks (which enables, among other things, serializing notebook execution output into the .ipynb, something R Markdown does not do).

So the fairer comparison would be .rmd vs. .ipynb, instead of looking at the kernelspecs in .ipynbs to see the distribution of R users. In my experience (I own the Azure Notebooks service at Microsoft), I see very little R usage - on the order of 1% of Python, similar to what other commenters on this thread have seen.


Can you imagine how much headache would have been saved if Jupyter had basically been R Markdown with a less R-specific extension and a more general backend? After years with R, I've been doing a lot of Python work lately, and UGH, it feels like JupyterLab is a decade behind R Markdown.

We wouldn't need special viewers in our source-control frontends, or git hooks plus tools like Jupytext, or in-between editor plugins (VS Code)! By not storing state with input, and by forcing top-to-bottom linear runs, I think the entire field of machine learning would be significantly more reproducible than it is now.


I totally agree with your sentiment. However, I also think that the popularity of Jupyter is largely due to the fact that it does appeal to a certain type of user.

I do believe that we have room for more than one option. The work that we've done in interactive programming in the Python VS Code extension is one possible approach that folks have seemed to like, and that aligns with what you're saying.

We're going to continue to explore that direction in addition to the Jupyter work that we're doing. Stay tuned!


I did this a while ago on a sampling of the dataset from https://bigquery.cloud.google.com/table/fh-bigquery:github_e...; Python usage is ~100x that of R.

R users are mostly using RStudio.


Aren't Jupyter notebooks R notebooks? Jupyter stands for "Julia, Python, R" I believe


Yeah, this is true, but sometimes "R notebooks" also refers to RStudio notebooks - https://rmarkdown.rstudio.com/lesson-10.html


I think the file extension scraped on Github is only .ipynb, which is only Python notebooks, right?


I think most people save under this extension even if they are using a different kernel (i.e. they are running R, Julia, Matlab, etc. code in the notebook).
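The kernel is actually recorded in the file's metadata, so the language can be recovered even though the extension is the same:

    import json

    with open("example.ipynb") as f:          # hypothetical notebook
        nb = json.load(f)

    spec = nb["metadata"]["kernelspec"]
    print(spec["name"], spec["language"])     # e.g. "ir R" or "julia-1.1 julia"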


Yeah, so it's mostly a split between Jupyter and RStudio - but Jupyter can mean different languages.


But the user chooses a Jupyter kernel upon creation, and the code is run on one of the available kernels. No mixing of languages within one notebook at this point.


I thought you could with apache toree? Maybe I misunderstand the project


On the topic of Jupyter notebooks: is there something similar to a paid version of Google's Colab? Colab is so awesome for creating prototypes, and even better since it's free. However, there is no paid alternative that I have seen. I do not want to have to deal with setting up my own VM or server. The way Colab works is perfect for what I need.


You can use a VM with Colab[1]; also, Paperspace[2] and FloydHub[3] are other companies providing cloud GPUs with notebooks.

[1] https://colab.research.google.com/drive/1Xh0slMoD1dfi8iTQJbO...

[2] https://www.paperspace.com/ml

[3] https://www.floydhub.com/


The OP post is a Jupyter notebook itself, and if you sign up to Kyso you can actually run JupyterLab on our cloud and then post the notebooks to the web, or make them private on the paid plan - is that what you're looking for?


Yeah but unfortunately there is no GPU support. I wish there was!


Ah yeah, ok - we are not planning GPU support soon. What if we created one of those one-click deploy-a-VM-to-AWS/DigitalOcean buttons, and from there, if you wanted to post to Kyso, you could do it with git or our JupyterLab plugin?

You'd get most of the same experience, and you could even customise various of the steps.


Interesting! That would be super awesome. Like a one-click deploy with auto-shutdown after the end of a run to save money. I would definitely pay for a service like that!


Google Cloud Datalab seems like it fits the bill: https://cloud.google.com/datalab/



Gryd and CoCalc


So I just learned they're not laptops.


This comment made my day!


There's also the GitHub extracts table available in BigQuery which allows analysis of the contents of the notebooks themselves: https://bigquery.cloud.google.com/table/fh-bigquery:github_e...


We built a runnable jupyter notebook website. Would someone be able to take a look and give us some feedback?

https://datacabinet.systems

We are VM-based for now but are moving to be Kubernetes-based to make sharing better. Our initial market is classrooms.


Something to be aware of (and a general comment about k8s) is that k8s is not suitable for use in hostile multi-tenant scenarios like the one that you're describing. Once an attacker escapes from the container (see the HN archives for lots of examples of this), they can p0wn the entire cluster. Jessie Frazelle has a great post on this: https://blog.jessfraz.com/post/hard-multi-tenancy-in-kuberne...

There are expensive ways to deal with this today, e.g., running each user isolated in a separate VM. Hopefully we will have better solutions in the near future.


We were starting to work on disabling kubernetes cluster access. We will try the steps in the post.


There are a few basic UI problems I ran into immediately. You should focus on that instead of rearchitecting things. It needs to look sleeker, too. Pay some designer to create a new design for you.

Like I can't add a project. It says "To create project Please wait till hard drive button turn green." I don't know what that means. You've lost my attention.


Thanks. We are making some enhancements. Here is the new initial screen: https://mondaytest.datacabinet.systems/ - but the pages inside are still the same. We have an elaborate design done.


Right now, for VMs, we make a hard drive (mkfs, install things) for every user, which takes some initial time. We will fix that in the Kubernetes version.


Especially odd because they are so unsuitable for use with git. Someone needs to find a way to fix this.


> because they are so unsuitable for use with git

Can you go into a little more depth about this statement?


Diffs, primarily, I'm guessing: (a) it's kinda hard to parse the JSON you see when you look at a notebook in raw text; (b) every time I execute a cell, it shows up in the diff as a change. That being said, there are plugins and tools that deal with these issues quite well - check out https://nbdime.readthedocs.io/en/latest/


I have not had good luck with nbdiff: minutes-long runtimes and huge memory consumption on fairly standard .ipynb's.


Isn't the prediction too low? My (unsupported) prediction, from fitting a smooth curve to the graphic, is https://imgur.com/a/ykeIxPm


This seems like an incredibly complicated way to fit an exponential to data...
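Something like this gets you most of the way (the data here is made up):

    import numpy as np
    from scipy.optimize import curve_fit

    y = np.array([65.0, 80, 101, 130, 158, 200, 251, 312])  # hypothetical monthly counts
    t = np.arange(len(y))

    def exp_model(t, a, b):
        return a * np.exp(b * t)

    (a, b), _ = curve_fit(exp_model, t, y, p0=(60.0, 0.2))
    forecast = exp_model(np.arange(len(y) + 12), a, b)       # extrapolate a year ahead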


Maybe it's time to be able to run them implicitly on Azure cloud?


nice marketing, kyso.io team


I hate jupyter notebooks. Joel Grus puts it perfectly: https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...

past hn discussion: https://news.ycombinator.com/item?id=17856700


What specifically do you hate about Jupyter? Is it out-of-order execution exacerbating the "hidden state" problem? If so, and if you already use VS Code, I encourage you to try out our Python VS Code extension. We have an "Interactive Python Window" mode https://code.visualstudio.com/docs/python/jupyter-support that we showed to a lot of folks at PyCon last week, and even among the "I don't like Jupyter" crowd it was quite well received. The key thing about our experience is that it is an editor-focused interactive programming experience vs. trying to just replicate Jupyter functionality in an editor (though we are also doing work there, because we believe folks want an experience where they can move seamlessly back and forth between an editor and a notebook).

Disclaimer: I designed this experience in VS Code.


Please allow me to thank you for designing an experience that hits the sweet spot between maintaining a good history of work through git and allowing for interactivity and ease of exploration. I started using VS Code on seeing a video of the Jupyter support within the latest release of the Python extension. This has eased and sped up my work tremendously!


Thanks for the kind words! If you find things that you would like to see improved, please do open an issue on our Github - https://github.com/Microsoft/vscode-python/issues.

We also need to work on the discoverability of this feature. Lots of existing users of our extension had no idea it was there ... suggestions welcome!


This is really cool.

I tend to do a lot of exploratory work in Jupyter, but find the whole process really annoying and cumbersome to set up - I've just been playing around with this VS Code extension and it seems really neat!

I've been using nteract a lot recently, but I'm gonna switch to VS Code now; at least that's one less Electron app eating up my memory.


Yay for less electrons! We're also using nteract for a future release of Azure Notebooks, and we're already using nteract in the VS Code extension. BTW, by nteract I mean nteract as a suite of components for building Jupyter things vs. nteract the electron app. We recently hired the awesome Safia Abdalla to help us contribute back to the nteract community and things have been going great!


Nice work on this! I think that, apart from the very valid points raised by Joel Grus in his presentation slides, Jupyter notebooks are woefully inadequate for exploring and understanding code that utilizes libraries. I'll take the fastai library as an example, since Jeremy Howard is the one who took Joel to task for criticizing notebooks:

If I view any fastai notebooks taken from their repo, there are a million imports (having a bunch of `import *` statements in the fastai lib doesn't help things) and methods and classes keep popping up out of nowhere. Good luck making sense of them in a notebook. At least in PyCharm or VS Code, the relevant source code is one Ctrl+Click away, which gives you a somewhat better idea of what is going on.

Debugging is woefully inadequate compared to the PyCharm experience (VS Code is slowly catching up with PyCharm on that front).

I've only found 2 good uses for Jupyter Notebooks:

1. As a scratch pad to try things out, without any plans to use the code directly or share it with anyone

2. As a means to write well-documented examples/code with a bunch of markdown... especially when I'm modeling things that have a lot of equations, where the LaTeX support is a boon and makes the documentation in the notebook far superior to what you could achieve in a regular .py script.

In almost every other instance, you are better off using a full-blown IDE with a far superior development environment, ability to have venvs, superior debugging, superior code management/refactoring and most importantly, much better reproducibility.

All that being said, I think VS Code is definitely taking a great step in the right direction. I honestly love VS Code, but I can't give up PyCharm yet, as it is still a long way ahead when it comes to Python features: much better linting, much better management of venvs and run configs, a more powerful debugging experience (though VS Code is getting pretty good), much better refactoring support and PEP-8 reformatting/linting. I really hope there is more of a push within MS to continue improving the Python experience in VS Code. Nothing would make me happier than to consolidate my work in VS Code! Keep up the awesome work!


Thanks for the kind words!

I'm currently thinking about a model of notebooks / interactive programming where the default assumption is that you're using it as a scratchpad, i.e., you won't need to explicitly name the file in order to get the benefits of auto-save, yet the file won't pollute your filesystem / project namespace until you choose to "keep" it. Hopefully this helps reduce the friction in the exploratory programming realm; my goal is to eliminate the "ConsoleApplicationX" directories (I'm a VS guy from way back, so...), and I think it helps with your scenario 1) above.

The Python VS Code extension team is well aware of the gaps that you list as well, and are working hard to narrow them with each release. We are definitely serious about improving the Python experience in VS Code. How times have changed :)

Thanks for the support and encouragement. And as always, if you find issues that aren't already in our github, feel free to add some more and keep the feedback coming!


Awesome! Wishing you guys the best. As someone who has made a few Electron apps in the past couple of years (I am not a front-end/back-end guy), I have immense respect for the VS Code team and the quality of the product. They pretty much showed the whole world what a quality Electron app can look like, and that most of the black eye Electron gets for "performance" can be chalked up to poor programming practices and software design in a lot of other Electron apps.


That surely is a step in the right direction, thank you.


exponentially?




