
Hosting Jupyter Notebooks for 100k Users [video] - jbredeche
https://www.youtube.com/watch?v=omxY1YsPGUc
======
ssanderson11235
Speaker here. If you want to follow along with the slides from the talk, you
can find them at
[https://speakerdeck.com/ssanderson/hosting-notebooks-for-100...](https://speakerdeck.com/ssanderson/hosting-notebooks-for-100-000-users).

Also, happy to answer any questions that people might have.

~~~
cbanek
I'm working on building an educational environment with Jupyter, and I'm
interested in the multiple hubs.

A few basic questions: Why multiple hubs? Was there some point of scale where
you needed this? Did multiple hubs allow you to have better migrations (where
you drain one and move it over to the other)?

Totally agree that state is the enemy of scale, so having a separate service
backing your storage independent of what hub you're on seems like a big win.

Thanks for the great talk!

~~~
ssanderson11235
> Why multiple hubs?

A few reasons for this, most of which are related to points you mentioned:

1\. Having multiple hubs makes it much easier to do zero-downtime deploys.

2\. Having multiple hubs makes us more resilient to transient machine
failures.

3\. We were worried that having a single proxy for all our notebook traffic
might become a system-wide bottleneck. Notebooks with a lot of images can get
pretty large, and at the time we were rolling this out JupyterHub was pretty
new. We weren't sure how well it was going to scale (the target audience for
the JupyterHub team at the time was small labs and research teams), so it
seemed safest to aim for horizontal scalability from the start. The JupyterHub
team has since done a lot of awesome performance work to support the huge data
science classes being taught at UC Berkeley, so it's possible that a single
hub with the Kubernetes spawner could handle our traffic today, but given
points (1) and (2) plus the fact that we already have a working system, I
don't have much incentive to find out :).
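
A minimal sketch of how points (1) and (2) can play out with several hubs behind a router. This is an illustration only, not the setup described in the talk: `Hub`, `HubRouter`, `route`, and `drain` are hypothetical names, and the sketch assumes user state lives in a separate storage service, so a user can land on any hub.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Hub:
    name: str
    draining: bool = False  # a draining hub accepts no new sessions


@dataclass
class HubRouter:
    """Hypothetical router that spreads users across several hubs."""

    hubs: list[Hub]
    sessions: dict[str, str] = field(default_factory=dict)  # user -> hub name

    def _pick(self, user: str) -> Hub:
        # Hash the user name so assignment is deterministic for a given hub set.
        live = [h for h in self.hubs if not h.draining]
        if not live:
            raise RuntimeError("no hub is accepting new sessions")
        digest = hashlib.sha256(user.encode()).digest()
        return live[int.from_bytes(digest[:8], "big") % len(live)]

    def route(self, user: str) -> str:
        # Keep a user on their current hub unless that hub is draining.
        by_name = {h.name: h for h in self.hubs}
        current = self.sessions.get(user)
        if current is not None and not by_name[current].draining:
            return current
        hub = self._pick(user)
        self.sessions[user] = hub.name
        return hub.name

    def drain(self, name: str) -> None:
        # Mark a hub as draining, e.g. before deploying a new version to it.
        for hub in self.hubs:
            if hub.name == name:
                hub.draining = True


router = HubRouter([Hub("hub-a"), Hub("hub-b")])
first = router.route("alice")
router.drain(first)              # take alice's hub out of rotation
second = router.route("alice")   # alice is reassigned to the remaining hub
```

Draining one hub at a time is what makes a rolling, zero-downtime deploy possible in this sketch; the same reassignment path covers a hub that has failed outright.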

~~~
cbanek
That's great, thanks! I was also curious if you hit scale issues on just one
hub. I agree, it's best practice to not have all your eggs in one basket. I'd
love to see an HA hub where this would all be taken care of for me; hopefully
by the time we go live we'll have this.

------
sandGorgon
There is Dash
([https://plot.ly/products/dash/](https://plot.ly/products/dash/)) and there
is Jupyter.

I wish there was some abstraction to generate a Dash-like output from Jupyter.
There are a lot of people who would pay serious money for that.

Even Airbnb built a framework to extract code from Jupyter notebooks and push
them into a machine learning pipeline
([https://medium.com/airbnb-engineering/using-machine-learning...](https://medium.com/airbnb-engineering/using-machine-learning-to-predict-value-of-homes-on-airbnb-9272d3d4739d)).
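
For a sense of what "extracting code from a notebook" involves at its simplest, here is a rough sketch using only the standard library. It is not Airbnb's framework; `extract_code` is a hypothetical helper, and it relies only on the fact that an `.ipynb` file is JSON with a `cells` list whose entries carry a `cell_type` and a `source`. Dropping lines that start with `%` or `!` is a simplifying assumption about how magics and shell escapes appear.

```python
import json


def extract_code(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook into one script (hypothetical helper)."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not plain Python.
        lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        if lines:
            chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["import math\n", "x = math.sqrt(16)\n"]},
        {"cell_type": "code", "source": "%matplotlib inline\nprint(x)"},
    ]
})
script = extract_code(example)
```

The resulting `script` string could then be written to a `.py` file and handed to whatever pipeline runs it.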

Jupyter could be so much more if it moved closer to fitting within a
production pipeline, rather than just competing against RStudio.

~~~
rb808
Also, I always thought notebooks would be a great devops tool, kind of like a
super command line that has easily observable steps grouped in chunks and
graphical feedback. No one else seems to think so, though, so maybe I'm wrong.

~~~
existencebox
As a disclaimer/context: I'm a dev on the Azure-hosted Jupyter Notebooks product.

You're not wrong! (Or at the least, it's a topic that has reached our ears
before, and is something I certainly agree with.) Obviously I probably
shouldn't go off spouting all the pipe dreams I have in this space, but given
that I got my start doing Ops work and have tried to keep an eye out for
things I might have liked back then, I can assure you you're not alone.

I always saw notebooks foremost as a direct upgrade to the
"runbooks"/"firefighting/deploy checklists" that crop up all too often.

~~~
alexeldeib
Another Azure dev checking in :)

Have you seen Application Insights Workbooks [0]? Basically you can have
interactive notebooks and run analytics queries against your telemetry,
generate charts, add text cells, etc. It's picking up usage for investigating
outages: e.g., you can have a Workbook with a query that looks at your
dependency calls, determines which service is failing, and produces a
visualization.

Workbooks don't actually execute any external actions, though. It's solely an
analysis tool. Runbooks skew the other direction; they are for executing
scripts (more or less).

Jupyter/python seems to fit in a nice gap where this could be bridged,
especially with the level of existing python support from azure sdk + cli.
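
As a rough illustration of the outage-investigation step described above (look at dependency calls, work out which service is failing, show it visually), here is a plain-Python sketch of what such a notebook cell might do. The record shape (`target`, `success`) and the helper names are assumptions made for the example; this does not use the Application Insights query API or any Azure SDK call.

```python
from collections import defaultdict


def failure_rates(calls):
    """Return {target: fraction of failed calls} for a list of call records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for call in calls:
        totals[call["target"]] += 1
        if not call["success"]:
            failures[call["target"]] += 1
    return {target: failures[target] / totals[target] for target in totals}


def worst_dependency(calls):
    """Name of the dependency with the highest failure rate, or None if no calls."""
    rates = failure_rates(calls)
    return max(rates, key=rates.get) if rates else None


def text_chart(rates, width=20):
    """Render failure rates as text bars, worst first."""
    rows = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return "\n".join(
        f"{target:<12} {'#' * round(rate * width)} {rate:.0%}"
        for target, rate in rows
    )


# Hypothetical telemetry records, standing in for the result of an analytics query.
sample_calls = [
    {"target": "sql-db", "success": True},
    {"target": "sql-db", "success": False},
    {"target": "blob-store", "success": True},
    {"target": "blob-store", "success": True},
    {"target": "auth-api", "success": False},
    {"target": "auth-api", "success": False},
]
print(text_chart(failure_rates(sample_calls)))
print("most suspect:", worst_dependency(sample_calls))
```

In a notebook, the next cell could then act on the result (restart a service, page someone), which is the runbook half that a pure analysis tool leaves out.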

PS: a dev from Workbooks has seen Azure Notebooks, and was curious a while
back about how he could integrate the functionality [1]

[0]: [https://docs.microsoft.com/en-us/azure/application-insights/...](https://docs.microsoft.com/en-us/azure/application-insights/app-insights-usage-workbooks)

[1]: [http://blog.my-is300.com/2017/06/what-i-work-on-application-...](http://blog.my-is300.com/2017/06/what-i-work-on-application-insights-workbooks/)

~~~
gardnerjr
wow, a link to my blog (that second link) made hackernews? that's exciting!

Anyway, yeah, workbooks in appinsights is almost like notebooks for non-
programmers? kinda? you string together markdown, parameters, and analytics
queries (and very soon metrics across more of azure) into reports. But the
parameters stuff lets you do more interactive things to hide/show sections
now. i really need to do a new blog post about all the new stuff that's in
there that wasn't last june!

i've prototyped some stuff to export an AI workbook to an azure/jupyter
notebook, as there's some support for querying analytics already from a python
package. there just hasn't been enough demand for it so far (not as much as we
expected, anyway?)

