

Introducing Wakari - Scientific Python in the cloud - paddy_m
http://continuum.io/blog/introducing-wakari

======
mileswu
I can see this being useful for teaching environments, but I'm not so sure how
useful this would be for actual scientific research.

Most universities tend to have their own computing cluster, which will make
storing the (potentially large) datasets easier, and furthermore use of the
CPU time is free!

Javascript plotting for the next set of graphs I need to produce for my
research, though, is something I'm actually considering. It can be
interactive, which can be really helpful.

~~~
draven
Useful for heavy computation on smallish datasets, but not for moderate
computation on big datasets. Here at work the datasets are several gigabytes
so this would not be a solution.

After reading the blog post this looks like an enhanced IPython notebook with
some additional packages. It could be a good platform to test-drive their
technology though.

(I hope this comment does not come across as being too negative --- the tech
is cool --- but I really wonder what the use case for this tool is.)

~~~
pwang
_> Useful for heavy computation on smallish datasets, but not for moderate
computation on big datasets. Here at work the datasets are several gigabytes
so this would not be a solution._

This is just a closed beta for the very first version of the product. We
absolutely intend to make it a powerful platform for computation on big
datasets.

Many people are storing data in S3, and many datasets are sourced from
locations on the web or remote data providers, anyway. With Wakari, you can
have extremely efficient access to those datasets via IOPro
(<http://continuum.io/iopro>), which is adding indexing on S3 data in the next
version.

We will also be adding sharing, collaboration, and publishing features in the
next version. I completely understand that it might not fit every situation
(or your particular situation right now), but I hope that eventually you'll
find these other features fairly compelling.

_> After reading the blog post this looks like an enhanced IPython notebook
with some additional packages._

Not the case at all. We provide a fully sandboxed Linux environment,
accessible via the browser, with a complete installation of Anaconda Pro. The
IPython notebook is an _additional_ feature of Wakari, and not vice versa.

One thing we haven't showcased very much is the fact that you have concurrent
access to many different Python environments. Python 2.6 + Numpy 1.5? No
problem. Want to try out Numpy 1.7 on Python 3.3? Just change the drop-down,
and you get another shell with that. Want to play around with Travis
Oliphant's Numba compiler for Python
(<https://store.continuum.io/cshop/numbapro>), but don't want to go through
the hassle of installing LLVM, LLVM-Py, and Numba on your own system? You're
just one button click away.

Like I said, I can understand that Wakari isn't going to solve everyone's
problems right now, but it's fairly cool even in this initial version, and
it'll only improve over time. :-)

~~~
mileswu
_> We absolutely intend to make it a powerful platform for computation on big
datasets._

The datasets I run on tend to be several terabytes, and all the clusters that
I use rely on various distributed filesystems (e.g. Lustre [1], Dcache [2], or
Xrootd [3]) to store these files across local storage on all the nodes of the
cluster.

I think for big datasets some kind of support for these distributed
filesystems will be necessary, as most private (scientific) computing clusters
use them. The complication is of course that to use these filesystems, one
needs to build support for each access protocol into a Python module (off the
top of my head I'm not sure if such modules exist yet; perhaps they do). While
most of these filesystems do offer standard POSIX 'mounting', this is usually
not recommended for performance reasons.

Perhaps I'm outside of the intended use case anyway. To a certain extent this
problem is solved for my field of research (of course nothing ever works all
the time heh), as all LHC/OSG Computing Grid centers run the same Scientific
Linux distribution with the theory being that any code that works on one
should work on another.

But this sounds great for teaching classes and labs! I look forward to using
it.

[1] <http://wiki.lustre.org/index.php/Main_Page>
[2] <http://www.dcache.org/>
[3] <http://xrootd.slac.stanford.edu/>

~~~
pwang
Thanks for the feedback!

You're right that for scientific use cases, the data access story defines the
mode and methods of computation. Your use case of having a large dataset
stored across many files on a distributed FS is one of the cornerstone
motivations for building Blaze, our next-generation distributed Numpy:

<https://speakerdeck.com/sdiehl/blaze-next-generation-numpy>

<http://vimeopro.com/continuumanalytics/pydata-nyc-2012>

Rather than traditional "load all the data into memory" sort of approaches,
Blaze is inherently out-of-core, and allows the user to define mappings of an
index space onto local or remote files, and then manipulate that structure at
the Python prompt as easily as if it were a small matrix stored in memory.
There are existing PGAS approaches that are similar in spirit, but they tend
to invoke heavyweight MPI machinery or make assumptions about the regularity
or structure of the dataset that is being distributed across the cluster. Our
guidance in designing and implementing Blaze is to make simple distributed
things easy, and hard things possible; we are not trying to solve decades of
distributed computing and linear algebra problems as a start-up. :-)
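
As a rough illustration of the out-of-core idea described above (this is not
Blaze's API; `FileBackedVector` and `write_demo_files` are hypothetical names
made up for this sketch, and it uses only the Python standard library), a
global index space can be mapped onto several local files so that element
lookups and reductions never load the whole dataset into memory:

```python
import os
import struct
import tempfile
from bisect import bisect_right

ITEM = struct.Struct("<d")  # one little-endian float64 per element


class FileBackedVector:
    """Hypothetical helper: presents several binary files as one long vector."""

    def __init__(self, paths):
        self.paths = list(paths)
        # starts[k] is the global index of the first element stored in file k.
        self.starts = []
        total = 0
        for path in self.paths:
            self.starts.append(total)
            total += os.path.getsize(path) // ITEM.size
        self.length = total

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Find which file holds this global index, then seek to the element.
        k = bisect_right(self.starts, index) - 1
        with open(self.paths[k], "rb") as fh:
            fh.seek((index - self.starts[k]) * ITEM.size)
            return ITEM.unpack(fh.read(ITEM.size))[0]

    def sum(self, chunk_items=1024):
        """Reduce over all files while holding at most one chunk in memory."""
        total = 0.0
        for path in self.paths:
            with open(path, "rb") as fh:
                while True:
                    buf = fh.read(chunk_items * ITEM.size)
                    if not buf:
                        break
                    total += sum(v for (v,) in ITEM.iter_unpack(buf))
        return total


def write_demo_files(directory, parts):
    """Write each list of floats in `parts` to its own file; return the paths."""
    paths = []
    for n, values in enumerate(parts):
        path = os.path.join(directory, "part%d.bin" % n)
        with open(path, "wb") as fh:
            for v in values:
                fh.write(ITEM.pack(v))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vec = FileBackedVector(write_demo_files(tmp, [[1.0, 2.0, 3.0], [4.0, 5.0]]))
        print(len(vec), vec[3], vec.sum(chunk_items=2))
```

A real system would presumably add remote files, caching, and
multi-dimensional indexing on top of this kind of mapping; the sketch only
shows the index-to-file translation and a chunked reduction.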

So, hopefully you can see that with Blaze as your data access mechanism, and
Wakari as a web-based front-end, you should be able to do large-scale
computation from within a web browser.

------
w1ntermute
And so continues the trend of software with Japanese names. Wakari, or 「わかり」,
means "understanding" or "comprehension".

~~~
tkf
Actually it's not a complete word. So, it's like "understan" or "comprehen".
Probably because it's still beta.

~~~
w1ntermute
Nope, it actually is a complete word:
<http://tangorin.com/general/%E3%82%8F%E3%81%8B%E3%82%8A>

You probably came to that conclusion from the fact that the word 「分かります」 means
"to understand". Actually, 「分かります」 is made up of two parts, 「分かり」 (the same
word that started this whole discussion), the nominalized form of the verb
「分かる」 ("to understand"), and the polite suffix 「ます」. 「分かり」 (or 「わかり」, as it's
often written when nominalized) can of course be used by itself as well.

> Probably because it's still beta.

From a software branding perspective, that would be a really stupid thing to
do. It would severely damage your SEO, and you'd be stuck renaming all kinds
of resources and files after you go out of beta.

~~~
wesm
Whoa there, cowboy. I'm pretty sure that tkf (if he's the same one who's
submitted pull requests to my projects) is from the land of the rising sun.
And his second statement was a joke. And funny, IMHO.

~~~
w1ntermute
> I'm pretty sure that tkf (if he's the same one who's submitted pull requests
> to my projects) is from the land of the rising sun.

Him being Japanese doesn't automatically make him an authority on Japanese
grammar. The small Midwestern town I grew up in is chock full of white-as-
bleach European Americans whose ancestors have exclusively spoken English for
several generations. Most of them couldn't tell you the difference between an
adjective and an adverb if their lives depended on it.

The same goes for many of the foreign (read: white) English "teachers" in
Japan. Most of them barely graduated from college, couldn't find any other
work, and so they decided to go to Japan to teach English. In actuality,
_they_ should be going back to America and spending a couple years in middle
school remedial English classes.

In fact, just to confirm, I got out my copy of Koujien (an authoritative
Japanese dictionary comparable to Merriam-Webster's Collegiate Dictionary or
the Oxford English Dictionary) to look up 「わかり」. And sure enough, there it is:

> わかり【分り】

> わかること｡さとること｡のみこみ｡会得えとく｡了解。

> And his second statement was a joke. And funny IMHO

Well, that went over my head. If so, then his first statement was probably
meant as a joke as well.

------
protonormal
I'm a big fan of the concept, but this is hardly new (although the interactive
graphs are great).

For anyone who wants something like this, check out the IPython notebook and
Sage:

<http://ipython.org/> <http://sagemath.org/>

~~~
pwang
This is our MVP, so there are a lot of shared features with some other
existing projects. One of our "deep tech" features I don't think anyone else
has is the ability to quickly and dynamically switch between different Python
environments, meaning that you can easily try out new versions of libraries or
test your code with new versions of Python.
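
When hopping between environments like this, it can help to confirm what a
given shell is actually running. A minimal sketch, assuming Python 3.8+ for
`importlib.metadata`; `describe_environment` is a hypothetical helper, not
part of Wakari or Anaconda:

```python
import platform
from importlib import metadata


def describe_environment(packages=("numpy", "scipy")):
    """Return the interpreter version and the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(name, version or "not installed")
```

Running the same script in each environment gives a quick record of which
interpreter and library versions a result was produced with.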

We are also iterating on the overall UI and user workflow, to really
facilitate the data exploration & analysis process with Python.

~~~
burcin
lmonade (<http://www.lmona.de>) is another (free) scientific software
distribution that tries to address these problems. The technology underlying
lmonade, basically Gentoo Linux, can handle switching between Python
interpreters:

    
    
    $ eselect python list
    Available Python interpreters:
      [1]   python2.6
      [2]   python2.7 *
      [3]   python3.1
      [4]   python3.2

~~~
pwang
Thanks for the pointer! However, note that we offer all of the Scientific
Python stack built against each of these interpreter versions (and against
different Numpy/Scipy versions) as well. :-)

------
jderick
I notice this seems to be a web-based portal to a Linux environment. If so,
would anyone care to explain the advantages over, say, just offering a VNC
server to the same Linux environment?

~~~
paddy_m
Great question. This is our first release, and we will be fleshing out
features, but here are a couple of advantages to our approach. We will be able
to have much finer-grained sharing than is possible with VNC. Eventually you
will be able to publish a single interactive plot, or a single environment to
collaborate on. For many new users, VNC is a non-starter because of the setup
involved. VNC works really well for controlling a single computer, but it
doesn't help you much with managing clusters, Python execution environments,
sharing, or many of the other things that we are doing with Wakari.

------
akoumjian
I'm very excited to give this a try. Congrats on the MVP release.

~~~
pwang
Thank you!

