
JupyterHub 1.0 - jonbaer
https://blog.jupyter.org/announcing-jupyterhub-1-0-8fff78acad7f
======
acbart
I love JupyterHub, but we've hit some real headaches in having it scale. At
Virginia Tech, we have an introductory course for non-computing majors where
students are using Jupyter through JH. At around the 70-student mark, we have
performance issues. Considering that the course is eventually meant to scale
much farther (hundreds of students), we're not really sure how we can make
further progress with our current resources. I hope this new version has some
performance enhancements (though I don't see any in the changelog). Last I
talked with anyone about this was in the NBGrader project[1], where other
schools were hitting walls with scaling.

[1]
[https://github.com/jupyter/nbgrader/issues/530#issuecomment-375693819](https://github.com/jupyter/nbgrader/issues/530#issuecomment-375693819)
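A rough way to see where a single server tops out is a memory budget. If every active student may use up to a fixed amount of RAM, the number of servers that fit is the usable RAM divided by that amount. The sketch below assumes memory is the binding constraint (CPU or disk I/O may well hit first) and uses made-up numbers; `max_concurrent_servers` and its parameters are hypothetical.

```python
import math


def max_concurrent_servers(total_ram_gib: float, per_user_gib: float,
                           reserved_gib: float = 4.0) -> int:
    """Estimate how many single-user servers fit in a machine's RAM.

    Assumes each active user may use up to `per_user_gib` and that
    `reserved_gib` is held back for the OS, the Hub and its proxy.
    """
    if per_user_gib <= 0:
        raise ValueError("per_user_gib must be positive")
    usable = max(total_ram_gib - reserved_gib, 0.0)
    return math.floor(usable / per_user_gib)


# Illustrative numbers only: a 64 GiB box with a 1 GiB cap per student.
print(max_concurrent_servers(64, 1))
```

With these illustrative numbers, a 64 GiB machine holding back 4 GiB fits 60 servers at 1 GiB each, which is fewer than a 70-student section if everyone is active at once.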

~~~
julienchastang
Give us some details about your setup. Are you running JH in the cloud
somewhere, or on a single box at VT? We’ve got JH running on a Kubernetes
cluster with persistent storage on the NSF Jetstream cloud [1] (for which you
could qualify for free resources, being at VT). This setup should, in theory,
address the scalability concerns you are having. See the work Andrea Zonca and
I have been doing [2, 3, 4, 5]. Contact me for additional details.

[1] [https://jetstream-cloud.org/](https://jetstream-cloud.org/)

[2] [https://zonca.github.io/2018/09/kubernetes-jetstream-kubespray.html](https://zonca.github.io/2018/09/kubernetes-jetstream-kubespray.html)

[3] [https://zonca.github.io/2018/09/kubernetes-jetstream-kubespray-explore.html](https://zonca.github.io/2018/09/kubernetes-jetstream-kubespray-explore.html)

[4] [https://zonca.github.io/2018/09/kubernetes-jetstream-kubespray-jupyterhub.html](https://zonca.github.io/2018/09/kubernetes-jetstream-kubespray-jupyterhub.html)

[5] [https://github.com/Unidata/xsede-jetstream/blob/master/vms/jupyter/readme.md](https://github.com/Unidata/xsede-jetstream/blob/master/vms/jupyter/readme.md)

~~~
acbart
I'm not as up on the details as you'd probably want, but my understanding is
that we're running it locally at VT on a virtual server with a nice chunk of
RAM and CPU. I don't think we want to rely on external servers at all: this is
FERPA-protected data. Plus, the long-term goal was to find a solution that
other schools could adopt without being an R1.
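On a single local server, two standard JupyterHub settings help stretch capacity: per-user resource limits and an idle-server culler. A sketch of a `jupyterhub_config.py`, assuming a spawner that actually enforces the limits and the separately installed `jupyterhub-idle-culler` package; the values are placeholders, not recommendations.

```python
# jupyterhub_config.py (sketch; values are placeholders)
import sys

c = get_config()  # noqa: F821  (provided by JupyterHub when it loads this file)

# Per-user caps. Only spawners that support them enforce these
# (e.g. DockerSpawner, KubeSpawner); the default LocalProcessSpawner does not.
c.Spawner.mem_limit = "1G"
c.Spawner.cpu_limit = 1.0

# Shut down servers idle for more than an hour so their memory is freed.
# Assumes `pip install jupyterhub-idle-culler`. "admin": True is the
# JupyterHub 1.x style; 2.0 and later grant permissions via roles instead.
c.JupyterHub.services = [
    {
        "name": "idle-culler",
        "admin": True,
        "command": [sys.executable, "-m", "jupyterhub_idle_culler", "--timeout=3600"],
    }
]
```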

------
coleifer
When would someone use JupyterHub? I've been running my own notebook server
for years, but it's single-user, on a single machine. Is the Hub for providing
separate JupyterLab instances to a bunch of different users, or across
different machines?

~~~
rhizome31
Yes, exactly. At my work when a new scientist joins us we just create an
account and she can get started on her research within minutes. Each user gets
a contained environment in which we mount a disk of shared data.
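One common way to give each user a contained environment with a shared data disk is DockerSpawner. A sketch, assuming DockerSpawner is installed; the image, host path and volume names are hypothetical.

```python
# jupyterhub_config.py (sketch; image and paths are hypothetical)
c = get_config()  # noqa: F821  (provided by JupyterHub when it loads this file)

# Each user gets their own container started from the same image.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "jupyter/scipy-notebook"

# A per-user Docker volume for work files; DockerSpawner expands {username}.
c.DockerSpawner.volumes = {"jupyterhub-user-{username}": "/home/jovyan/work"}

# The shared data disk on the host, mounted read-only into every container.
c.DockerSpawner.read_only_volumes = {"/srv/shared-data": "/home/jovyan/shared"}
```

A full deployment also needs the Hub to be reachable from the containers (for example via `c.JupyterHub.hub_ip`), which is omitted here.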

~~~
erikgaas
I just did this in my university lab as well. Most people aren't savvy with
Linux, so having normal accounts with Jupyter port forwarding is out of the
question. JupyterHub is about the lowest-friction way I can find to introduce
the Python data science stack to non-data scientists.

~~~
amrrs
IMO, Jupyter Notebook is the closest thing Python has to what RStudio is for
R. While PyCharm and VS Code are preferred by some Python data scientists,
JupyterHub offers almost everything a typical IDE would, along with the
traditional notebook environment that a lot of beginners start with these
days. So there's much less friction getting started.

~~~
prakhar987
I would be really hesitant to compare Jupyter Notebook to an IDE. One example
is the debugger: the only visual debugger I have come across for Jupyter is
PixieDebugger, which is miles behind the debugger in an IDE like PyCharm.
There is a huge list of features Jupyter needs before you can compare it to an
IDE.

~~~
d0mine
It is an interactive environment (not much use for a debugger).

~~~
erikgaas
FWIW I use the %debug magic command in Jupyter and it has been a great
experience. I'm pretty ignorant of the enterprise debugging tools so take that
with a grain of salt.
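Called with no arguments, `%debug` opens a pdb-style debugger on the most recent traceback, so you land in the frame that raised and can print its local variables. The standard library exposes the same information; this sketch walks a traceback to the innermost frame and reads its locals. `mean_ratio` and `innermost_locals` are made-up example functions.

```python
import sys


def mean_ratio(values, totals):
    ratios = []
    for v, t in zip(values, totals):
        ratios.append(v / t)  # raises ZeroDivisionError when t == 0
    return sum(ratios) / len(ratios)


def innermost_locals(tb):
    """Follow a traceback to its last frame and copy that frame's locals."""
    while tb.tb_next is not None:
        tb = tb.tb_next
    return dict(tb.tb_frame.f_locals)


try:
    mean_ratio([1, 2, 3], [1, 1, 0])
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    frame_locals = innermost_locals(tb)

print(frame_locals["v"], frame_locals["t"], frame_locals["ratios"])  # 3 0 [1.0, 2.0]
```

Interactively, `pdb.post_mortem(tb)` gives a similar prompt positioned in that failing frame.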

~~~
y4mi
Debuggers are only really useful if you're trying to figure out why some
object in your server doesn't do what you want it to.

I'd wager that almost no data scientists write object-oriented code; it's
probably mostly done one calculation at a time, executed in the notebook's
REPL. So the value you get from IDE debuggers is tiny, as you're already doing
everything one step at a time.

~~~
bonoboTP
You still write functions and may want to inspect variable state in the middle
of function execution.

~~~
applecrazy
Correct. RStudio has this feature, where variable values can be inspected in a
sidebar. This would be a really useful feature for Jupyter, especially when
running a Python kernel.

~~~
timdumol
There is a JupyterLab extension for that:
[https://github.com/lckr/jupyterlab-variableInspector](https://github.com/lckr/jupyterlab-variableInspector)

~~~
bonoboTP
Does it work with variables that are local to a function? I don't mean
inspecting global variables _after_ having executed a cell, but local
variables in the middle of a function execution.
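Seeing locals partway through a function means observing that function's frame while it runs, which is what line debuggers do; pdb (through the standard `bdb` module) is built on `sys.settrace`. A toy sketch of that mechanism, recording a copy of the target function's locals before each of its lines executes; `snapshot_locals` and `accumulate` are hypothetical names.

```python
import sys


def snapshot_locals(func, *args, **kwargs):
    """Call func, recording (line number, copy of locals) before each of its lines runs."""
    snapshots = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # do not trace lines of other functions
        if event == "line":
            snapshots.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, snapshots


def accumulate(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, snaps = snapshot_locals(accumulate, 3)
for lineno, local_vars in snaps:
    print(lineno, local_vars)
```

Each printed entry shows the function's locals (`n`, `total`, `i`) as they were just before that line ran, i.e. the mid-execution state the question asks about.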

------
prepend
This is really cool and I’m impressed by the Jupyter team. My favorite part is
that it’s such a good product that it beats the commercial products, I think
because it’s hard to come up with a commercial model that supports this wide a
range of collaborators (from people who view once a month to people who author
every day).

I was trying to find out whether JupyterHub is included in RStudio Connect, or
if they are competing products.

~~~
javierluraschi
You can use RStudio Connect to publish Jupyter notebooks, see
[https://blog.rstudio.com/2019/01/17/announcing-rstudio-connect-1-7-0/](https://blog.rstudio.com/2019/01/17/announcing-rstudio-connect-1-7-0/)

~~~
prepend
Thanks. That’s why I’m trying to figure out whether it’s actually JupyterHub
under the covers, in which case they’ll get these new features, or whether
they are competing, in which case I’ll have to check RStudio’s dev schedule
for something similar.

------
rhizome31
Congratulations! JupyterHub is a great project with high-quality code and
docs. I'm looking forward to trying the named servers feature, as I run a
JupyterHub instance that spawns servers inside containers based on a single
image, which inevitably tends to grow as I add libraries. Being able to manage
multiple servers should allow me to split the image into smaller specialized
images.
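With named servers enabled (`c.JupyterHub.allow_named_servers = True`), each user can run several servers side by side, and the Hub's REST API can start one with `POST /hub/api/users/{name}/servers/{server_name}`. A sketch that only builds that request with the standard library; the hub URL, user, server name and token are placeholders, and how each named server maps to a smaller image depends on your spawner setup and isn't shown.

```python
import json
import urllib.request
from urllib.parse import quote


def start_named_server_request(hub_url, user, server_name, token):
    """Build (but do not send) a request asking the Hub to start a named server."""
    url = (
        f"{hub_url.rstrip('/')}/hub/api/users/{quote(user, safe='')}"
        f"/servers/{quote(server_name, safe='')}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode(),  # empty user_options
        method="POST",
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
    )


req = start_named_server_request("http://localhost:8000/", "alice", "gpu", "SECRET")
print(req.get_method(), req.full_url)
```

Sending it would be `urllib.request.urlopen(req)` against a running Hub with a valid API token.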

------
victornomad
Any recommendations on how to set up an environment with Jupyter and
NVIDIA/CUDA? Is it worth using a Docker container, or should it be installed
system-wide?

~~~
patall
At work we use conda, which works great, though the NVIDIA libraries are
usually installed system-wide by the admins. But for anything Python-related,
conda is perfect.

~~~
victornomad
Thanks for the answer :) What are the advantages of using conda vs. installing
the packages by hand? I tried to read up on it a bit before, but I still don't
know the real advantages on a modern Linux system.

~~~
ianbooker
Conda is "just" a distribution and package management tool. As a beginner it
is a good idea to just go with it, but installing all packages using pip is
just as good, in the end. For reference:
[https://www.xkcd.com/1987/](https://www.xkcd.com/1987/)

~~~
jhbadger
In bioinformatics there is a trend for systems like QIIME 2 to be basically
impossible to install via pip, leaving conda and Docker as the only options.
In part this is because bioinformatics pipelines rarely rely only on Python;
they depend on a multitude of existing programs that also need to be
installed.

------
anonu
When will Jupyter have highlight-to-execute (running just the selected code)?

~~~
rhizome31
The Script of Scripts project adds this feature:
[https://vatlab.github.io/sos-docs/doc/user_guide/multi_kernel_notebook.html#User-Interfaces-](https://vatlab.github.io/sos-docs/doc/user_guide/multi_kernel_notebook.html#User-Interfaces-)

------
darsnack
Great news! Question for fellow JupyterHub users: how do I expose users’
Conda environments to them?

~~~
moonbug
Install the ipykernel or IRkernel packages in the environments you want
JupyterLab to know about.

~~~
rhizome31
You'll also need to register the kernel with Jupyter.

For Python (run with the environment activated):

    
    
        $ python -m ipykernel install --user --name myenv --display-name "My environment"
    

For R (from an R session in that environment):

    
    
        > IRkernel::installspec()
    

As mentioned previously, nb_conda_kernels can automate this step.
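Registering a kernel amounts to writing a `kernel.json` kernel spec into the `kernels` subdirectory of one of Jupyter's data directories (`jupyter --paths` lists them). A sketch of the minimal spec that `python -m ipykernel install` produces for a Python environment, written into a temporary directory here; `write_kernel_spec` is a hypothetical helper, and the real installer also copies logo files and may add extra fields.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_kernel_spec(kernels_dir, name, display_name, python=sys.executable):
    """Write <kernels_dir>/<name>/kernel.json for an ipykernel-based kernel.

    `python` should be the interpreter of the environment to expose,
    e.g. the python binary inside a conda env.
    """
    spec = {
        # Jupyter substitutes {connection_file} when it launches the kernel.
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    spec_path = kernel_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=1))
    return spec_path


# Demo in a throwaway directory rather than a real Jupyter data dir.
spec_path = write_kernel_spec(tempfile.mkdtemp(), "myenv", "My environment")
print(spec_path.read_text())
```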

------
smortaz
Congrats to the team! This is a major productivity milestone for teams using
Jupyter.

~~~
ianbooker
Jupyter Notebooks / JupyterHub and BinderHub are, in my humble opinion, highly
relevant to the future of education: as a tool for teaching, for science, as a
format for replication, and for everything in between, such as the data
sciences. And even more!

------
jcims
How would JupyterHub compare with something like AWS SageMaker or EMR
notebooks?

~~~
hogu
First of all, AWS SageMaker is really an ML system that happens to include
Jupyter notebooks as a component. But if you are talking about just the
Jupyter notebook part, then I would say - you could use JupyterHub to build
your own implementation of SageMaker (you would want to use kubespawner and
some deployment of kubernetes if you wanted to scale to multiple nodes). For
example, I run [https://www.saturncloud.io/](https://www.saturncloud.io/), and
we orchestrate JupyterHub to do just that.

JupyterHub is more flexible - for example, you could deploy JupyterHub to one
beefy server and have Jupyter deployed for many users, which could all read
data from a shared filesystem. That kind of thing is not easy to do with
SageMaker, since everything runs on a separate EC2 instance.

I can't comment on EMR notebooks.

------
LeicaLatte
Congrats on a great product!

