
How Powerful Are Microsoft Azure’s Free Jupyter Notebooks? - ptoniato
http://www.walkingrandomly.com/?p=6351
======
smortaz
Nice to see this on HN! For folks that don't know what this is... it's a free
hosted Jupyter service, provided as a "thank you" to the Python, Jupyter, and
general OSS community by the Microsoft Python team.

Regarding the perf numbers: just a heads up that you were probably on that VM
all by yourself :). That said, once enough people sign in or CPU usage crosses
a certain threshold, new VMs are allocated.

PS On a related note, we are wrapping up at PyCon in Portland, and one of the
most popular pieces of swag was the "Jupyter Notebook Notebook" :). See pics here:

[https://goo.gl/photos/L9C4fq6AsPxU7bfq5](https://goo.gl/photos/L9C4fq6AsPxU7bfq5)

/disclaimer: team lead/

~~~
marmaduke
The benchmark here was probably calling into Intel's MKL, which automatically
determines the number of cores to use based on matrix size and CPU type.
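
To see what a given instance exposes, here is a minimal sketch. The helper name `thread_hints` is made up for illustration, and it only reads the core count plus the `MKL_NUM_THREADS` and `OMP_NUM_THREADS` environment variables; it does not ask MKL how many threads it actually chose.

```python
import os


def thread_hints(env=None):
    """Collect what the process can see about cores and thread caps.

    Hypothetical helper: it reads environment variables only and does not
    query MKL for the thread count it actually decided to use.
    """
    env = os.environ if env is None else env
    return {
        "visible_cpus": os.cpu_count(),
        "MKL_NUM_THREADS": env.get("MKL_NUM_THREADS"),
        "OMP_NUM_THREADS": env.get("OMP_NUM_THREADS"),
    }


if __name__ == "__main__":
    for key, value in thread_hints().items():
        print(f"{key}: {value if value is not None else 'not set'}")
```

If neither variable is set, the library is free to pick the thread count on its own, which is the behavior described above.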

How well does it behave when you've multiple VMs running on the same hardware?

Also, is it possible to set up prebuilt environments on your platform?

~~~
smortaz
Hi -

* It's possible - Anaconda comes with an MKL enhanced version of its math libs.

* There are multiple VMs, and each VM holds multiple Docker containers, one for each Library/user (library == collection of notebooks)

* Right now, no, but we are working on it. You can have a "prep" notebook that readies your environment with !pip, !wget, etc., followed by your actual work notebooks. We'll soon have an initial "install.sh" that is run on start to execute any prep steps you might have.
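
A rough sketch of what such a prep step could look like in plain Python, roughly equivalent to a `!pip install` cell. The function names and the example package names are assumptions for illustration, not part of the service.

```python
import subprocess
import sys


def pip_install_command(packages):
    """Build the pip command for the interpreter running this notebook."""
    if not packages:
        raise ValueError("need at least one package name")
    return [sys.executable, "-m", "pip", "install", *packages]


def prepare_environment(packages, runner=subprocess.check_call):
    """Install packages; `runner` is injectable so the step can be dry-run."""
    cmd = pip_install_command(packages)
    runner(cmd)
    return cmd
```

Running `prepare_environment(["requests"])` in the first cell of the prep notebook would install into the same interpreter the kernel uses; passing `runner=print` shows the command without executing it.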

thanks!

------
highd
Sort of off-topic, but for me Jupyter is 90% of the way there to dominating my
data analysis / ML work-flow, but that last 10% kills it for me:

- Weird behavior when disconnecting / reconnecting to sessions, especially
from multiple computers.

- Tendency to flake out on long-running jobs, i.e. 2 hours into a 4 hour
algorithm something dies and I have to restart or run from terminal.

Unfortunately this relegates it to exploratory viz for me, but maybe that's
the intended use case anyway. But when I've wanted to build semi-persistent
dashboards or check in on running jobs I've had better luck with ssh+screen
and then dumping pdfs of results with matplotlib to files that I serve from a
webserver with a little auto-refresh javascript wrapper.
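
For the curious, the wrapper page can be generated along these lines. This is a sketch under assumptions: the function name, the 60 second default, and the file name `results.pdf` are all made up for illustration.

```python
from html import escape


def render_refresh_page(pdf_name, interval_s=60):
    """Return an HTML page that embeds a PDF and reloads itself periodically."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    safe_name = escape(pdf_name, quote=True)
    delay_ms = int(interval_s * 1000)
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{safe_name}</title></head>
<body>
<embed src="{safe_name}" type="application/pdf" width="100%" height="900">
<script>
  // Reload the whole page so a newer PDF on disk can be picked up.
  setTimeout(function () {{ location.reload(); }}, {delay_ms});
</script>
</body>
</html>
"""
```

Writing the result to `index.html` next to the PDF and serving the directory (for example with `python -m http.server`) is enough for a quick look; the long-running job just keeps overwriting the PDF.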

~~~
marmaduke
What versions are you running? Jupyter or at least its predecessor IPython
notebook have always been rock solid for me, running for days and days.

That said, I always made sure to save before disconnect and refresh on
reconnect.

------
partycoder
Jupyter Notebooks are great since you can put your experimental data in them
and share it in an interactive format. They have the potential to become the
de facto standard for scientific collaboration.

But what if you add a library that can only load data from Azure ML Studio?
Then you cannot share your notebook anymore. Your notebook got tainted with
proprietary stuff from a specific vendor...

Science is about being able to universally reproduce experiments, and vendor
lock-in prevents that. We already have enough problems in scientific
publishing with journals.

So if you like Jupyter, and you want it to become the standard science needs,
avoid proprietary extensions. Let's not go back to sharing stuff on paper or
its modern equivalent, PDFs.

~~~
smortaz
Your concern is valid. One easy way to avoid depending on a particular cloud's
APIs is to either wrap them with your own, or just grab data from other
neutral locations, such as github using !wget, requests, etc. There are
pluses/minuses to each approach.
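
As a minimal sketch of the "wrap them with your own" option: the notebook calls one small loader, and the backend behind it can be swapped. The function names are hypothetical and the CSV format is just an assumed example.

```python
import csv
import io
from urllib.request import urlopen


def fetch_text(url, opener=urlopen):
    """Download a text resource; `opener` can be swapped for any backend."""
    with opener(url) as response:
        return response.read().decode("utf-8")


def load_rows(source, fetch=fetch_text):
    """Return CSV rows as dicts, regardless of where the bytes come from."""
    return list(csv.DictReader(io.StringIO(fetch(source))))
```

A notebook that only ever calls `load_rows(...)` stays shareable: someone without access to a given cloud can pass a different `fetch` function that reads from a neutral location instead.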

------
em500
You can actually run shell commands with '!' in the IPython notebooks on Azure
and do a dmesg or lscpu.
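
Outside of the `!` shortcut, the same check can be done from plain Python. A small sketch, with a made-up helper name, that returns nothing if the tool is not installed:

```python
import shutil
import subprocess


def run_if_available(command):
    """Run a command and return its stdout, or None if the tool is missing."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


# In a notebook cell the shortcut would simply be:  !lscpu
if __name__ == "__main__":
    print(run_if_available(["lscpu"]) or "lscpu not found on this machine")
```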

~~~
smortaz
Also, click on the Jupyter logo, then select New, then Terminal, and you get a
full bash prompt into your instance.

------
lph
I wonder how Microsoft treats the GPL3 licensing of ZeroMQ (used by Jupyter
for client-server communication)?

In my workplace, Jupyter needs special approval and can only be installed in
limited environments because the company is afraid some developer will
inadvertently do something that contaminates our code with GPL3.

~~~
koolba
Why would the GPL apply to works created with Jupyter? I understand if you're
redistributing a commercial version of it, but using it would be akin to using
GCC no?

~~~
lph
My understanding is that works created with Jupyter wouldn't have a problem,
but that the admins are just trying to minimize the footprint of GPL 3
libraries on our systems. It seems pretty knee-jerk to me, but then I'm not an
expert.

~~~
gregmac
In a technical audit during acquisition, for example, any use of GPL warrants
extra scrutiny. Even when you're complying it is an extra bunch of work, often
with lawyers reviewing, so many people would rather avoid that if possible.

------
Spacemolte
[http://webcache.googleusercontent.com/search?q=cache:fBwfhmg...](http://webcache.googleusercontent.com/search?q=cache:fBwfhmgfm-EJ:www.walkingrandomly.com/%3Fp%3D6351+&cd=1&hl=da&ct=clnk&gl=dk)

~~~
myhrvold
Thanks! Figured there was too much traffic when the main link was unavailable
a few minutes earlier. Was wondering whether it was a direct view of a notebook,
e.g. via nbviewer (in which case it would be ironic).

