
Python data tools just keep getting better - datascientist
http://strata.oreilly.com/2013/03/python-data-tools-just-keep-getting-better.html
======
Homunculiheaded
As someone who uses R for just about all of my ML/data analysis needs, I'm
surprised not to see Theano[0] mentioned. SciPy, scikit-learn, pandas, etc. are
great and all, but they don't offer much that's really different from what you
get with R (except of course having it all in a general-purpose language). But
Theano (plus its related deep learning tools) really stands out for me as
something the R tool chain can't compete with.

I feel like eventually I should become as fluent with
SciPy/scikit-learn/pandas as I am with R, but learning Theano well is much
higher on my list.

0\. <http://deeplearning.net/software/theano/>

~~~
lightcatcher
If you're interested in Theano primarily because of deep learning, I highly
recommend you check out pylearn2 (not too much documentation, but docs here:
<http://deeplearning.net/software/pylearn2/> and source on GitHub here:
github.com/lisa-lab/pylearn2 ).

Pylearn2 is a set of deep learning algorithms implemented with Theano. The
LISA (deep learning) group at the University of Montreal (the same group that
created Theano) maintains this library and puts a lot of the code they use for
their papers in pylearn2. Pylearn2 thus makes it quite easy to use a lot of
state-of-the-art algorithms, such as maxout.

~~~
Homunculiheaded
Thanks! This is exactly the sort of thing I've been looking for. I really want
to experiment with some deep learning techniques for some problems I have, but
the startup cost of "oh yeah, you have to build all the tools yourself right
now" keeps putting me off.

------
piqufoh
Whilst O'Reilly books just keep getting worse. Seriously, "Scipy and Numpy" is
typo-ridden and so simplistic it's irrelevant, and "Python for Data Analysis"
should really be called "Mainly Pandas 'cos I wrote it, plus a chapter about
IPython".

~~~
wesm
This is really a mischaracterization. The point of the book was to address
data tooling topics (and the bare essentials: IPython and NumPy and
matplotlib); for most data tasks (especially Chapters 7, 9, and 10) I would
challenge you to replicate all of the data work without using pandas and
_then_ come back and snark on HN about how I'm self-promoting, or whatever.
The truth is, it's the only game in town for complex structured data
manipulation in Python, unless you want pages and pages of spaghetti nested
dicts and lists.
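
To make the nested-dicts comparison concrete, here is a minimal sketch of one small aggregation done both ways. The records, column names, and variable names are made up for illustration and are not taken from the book.

```python
from collections import defaultdict

import pandas as pd

# Made-up records, purely for illustration.
records = [
    {"city": "Oslo", "year": 2012, "sales": 10.0},
    {"city": "Oslo", "year": 2013, "sales": 12.0},
    {"city": "Lima", "year": 2012, "sales": 7.0},
    {"city": "Lima", "year": 2012, "sales": 3.0},
]

# Plain-Python version: nested dicts keyed by city, then by year.
totals = defaultdict(lambda: defaultdict(float))
for row in records:
    totals[row["city"]][row["year"]] += row["sales"]

# pandas version: a single groupby over the same records.
frame = pd.DataFrame(records)
grouped = frame.groupby(["city", "year"])["sales"].sum()
```

Both give the same totals here. The difference shows up once you add more grouping keys, joins, or missing values, where the dict version needs another level of nesting for each.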

The sales numbers already show that the book was timely and relevant to a huge
(> 10,000 so far) number of people.

~~~
piqufoh
Wes, there needs to be a great book about Pandas as it's an amazing tool, and
you've written it. But Data Analysis is a HUGE subject and passively implying
that it can only be tackled with Pandas is misleading. My post was intended to
be a snark at O'Reilly's generally falling quality - I'm sorry if I put your
nose out of joint.

~~~
darkxanthos
I do data analysis and pandas is a huge part of that. He can't usefully write
about the whole universe of data analysis. The part he wrote about is
definitely quite relevant.

------
peterjs
As for the integrated stack problem, I can wholeheartedly recommend the
Enthought Python Distribution [1]. I've used it on both Windows and Mac. It
includes all the important libraries for linear algebra, matrix computation,
and visualization (SciPy, NumPy, matplotlib, etc.). So it is a great
replacement for Matlab, which probably falls short of any programming language
known to mankind.

[1] <http://www.enthought.com/>

~~~
hogu
That would have been my answer a year ago as well. However, we (Continuum
Analytics) have now put out our own distribution, which gives you much more -
all for free.

<https://store.continuum.io/cshop/anaconda>

~~~
cridal
How does this distribution play with the outside world? Will I be able to
install other third party libraries into the distribution? Does it have its
own 'installation procedure'?

~~~
hogu
So, we ship Anaconda with our own package management system, called conda.
Conda is open source and was created because for scientific Python, we need to
manage versions of non-Python libraries (BLAS/ATLAS/MKL, libhdf5, etc.). But
you don't need to use conda if you don't want to. Conda is just what we use to
install stuff. You can use pip on top of Anaconda if you would like, and we
even have functions to turn whatever you did to your Anaconda environment into
a conda package if you want to.

------
niggler
Does anyone have experience with f2py (the Fortran to Python translator)? How
does it compare to translating to native C?

~~~
paddy_m
What is your use case?

Have you looked at Numba? There are guarantees about memory that can be made
with Fortran that can't be expressed with C. Numba is trying to bring the same
type of optimizations to NumPy Python code, because it does know some of the
memory constraints.

Note: I work for Continuum Analytics.

~~~
tocomment
So I watched the Numba talk at PyCon. What I still don't understand is whether
it speeds up any Python code or only code that uses NumPy. How does it know if
you're using NumPy?

~~~
hogu
It's a NumPy-aware compiler: you tell the jit decorator the types of the
arguments the function will be called with, and that information is used in
the compilation. It doesn't have to be NumPy arrays, but the type declaration
mechanism does know about them and can optimize around that. Similar to how
providing type information allows Cython to generate efficient C code,
providing type information on the decorator allows Numba to generate
efficient LLVM IR.

There is also an autojit decorator that watches what you're calling the
function with, and compiles it for the given type signature.
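
The two modes described above can be sketched roughly as follows. This is a minimal illustration, not code from the Numba docs: the function names are made up, and the no-op fallback is only there so the snippet still runs where Numba isn't installed. Note that autojit was the decorator name at the time of this thread; as far as I know, later Numba releases dropped it and a bare jit with no signature does the same lazy, per-type compilation.

```python
try:
    from numba import jit
except ImportError:
    # Assumption for this sketch only: without Numba, fall back to a
    # no-op decorator so the functions still run as plain Python.
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


# Explicit signature: the argument and return types are declared up front.
@jit("int64(int64)", nopython=True)
def sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


# No signature: compiled on first call, once per argument-type combination.
@jit(nopython=True)
def scaled_sum(n, step):
    total = 0 * step  # start with the same numeric type as step
    for i in range(n):
        total += i * step
    return total


int_result = scaled_sum(4, 2)      # called with an integer step
float_result = scaled_sum(4, 0.5)  # called with a float step
```

Neither function touches NumPy, which matches the point that the arguments don't have to be arrays.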

------
peterjs
Are there any similar tools available for Ruby? When I checked a few years
ago, there were some projects at very early stages, hardly comparable to
their Python counterparts.

~~~
synparb
There's <http://sciruby.com/>, but Ruby is really missing any sort of
scientific ecosystem that's at all comparable to Python.

------
tvst
This is the first time I've heard of bokeh and bokeh.js. They sound super
interesting.

Does anyone know where I can find a demo of bokeh.js plots?

~~~
hogu
In the bokehjs repo, we have some unit tests which demonstrate the
capabilities. Thus far we've focused mainly on getting things ready for demos,
and we've SORELY neglected documentation and making it easy for others to get
set up. But now that we have some downtime after PyCon, we'll be working on
this very soon. So I guess what I'm saying is, wait a week, and those demos
will be available. If you're really eager, you could clone the repo, but it's
quite hard to get up and running right now.
<https://github.com/ContinuumIO/bokehjs>,
<https://github.com/ContinuumIO/Bokeh>

------
aet
What about something like Mahout?

