

Passing the torch of NumPy and moving on to Blaze
http://technicaldiscovery.blogspot.ca/2012/12/passing-torch-of-numpy-and-moving-on-to.html

======
carterschonwald
Congrats to Travis and the rest of the Continuum analytics team on the Darpa
XDATA funding!

As someone working to build tools in the same space as Continuum (and perhaps
as a competitor), having your competitors (Continuum) be intelligent, nice,
interesting folks who really understand the problem domain is pretty darn
great.

Point being: the numerical computing / data analysis landscape is going to be
seeing a lot of great tools emerge and/or mature over the next year, and I have
no doubt that 30-50% of them will be coming from Continuum Analytics. [edit:
to the substantial enrichment of high-level tools for numerical computing
within Python / and likely generally!]

I can only hope that I execute my tool building work at WellPosed well enough
that I can call them a competitor for years to come!

------
xaa
As someone who has dabbled in using Python for numerical computing in several
small projects, I wonder: what would be the motivation for further investment
in Python as a numerical platform, considering all of Python's problems with
concurrency?

Real threads will never come to Python. MPI is a real pain unless you are
running very large computations. This will only become more true as time
progresses. Am I missing something?

~~~
dagw
Is concurrency really that important for numeric work? Surely parallelism is
what you care about.

Many numpy primitives are already parallel, since numpy basically just hands
off to your BLAS library. Beyond that there is numexpr, which is really good
at parallel evaluation of large array expressions. If your problem isn't
solved by any of these, there are other powerful solutions like IPython,
Parallel Python, and even multiprocessing from the standard library.

If you need even more performance, Cython has some support for semi-automated
parallelization, and if all else fails you can drop down to C and use OpenMP
or whatever else you like.

So while concurrency is a problem in Python, numeric parallelism is an area
where many good solutions exist.

~~~
hippyloopy
Why should I have to resort to a C library in order to do anything in Python?
What's the point of using Python if, every time I want to do something in
parallel, I have to write a C library?

Python people have their heads in the sand! If we have hundred-core
processors, running Python on a single core is not going to be a tractable
solution to any problem. Your BLAS may be parallel, but any time you go back
into the Python driver code it suddenly becomes a massive bottleneck.

Distributed Python is a messy hack that wastes all the amazingly tuned shared
memory support in the processor. Writing C extensions goes against the whole
point of using Python.

"if all else fails" The problem with Python is that the moment you want to do
something in parallel, which in the next decade will be everyone, "all else
fails" is your starting point!

~~~
StefanKarpinski
In my view, this is the strongest evidence that Matlab, SciPy, R, etc. haven't
found the right abstraction level for numerical computing. The high-level
language is supposed to be the abstraction, yet in these systems you
continually need to break through that abstraction and code in C for
performance and scale. That's not a very good abstraction. This problem is
precisely what Travis Oliphant and his team are tackling with Numba and Blaze,
but it remains to be seen if they can produce a better abstraction.

If you're willing to try another language altogether, Julia
[<http://julialang.org>] is a general purpose language with enough performance
and expressiveness to be an effective abstraction layer for numerical
programming – you never have to dip into C for speed, scale or control. In
developing the language, we haven't allowed ourselves to resort to C –
instead, we've worked at making Julia itself fast enough to implement things
like I/O, Dicts, Strings, BitArrays (packed 8 bits-per-byte boolean arrays),
etc. – all in pure Julia code while getting C-like performance.

~~~
srean
Indeed. I have pretty much stopped engaging with the standard dialogue,
repeated ad infinitum, that goes along the lines of "code the bottleneck in
C" and "the GIL is a non-issue, just use parallel processes".

For some workloads the latter is actually good advice, but it does not help
my typical use case: tight-ish loops wrapped around a fork-join. Shared-memory
handling can be quite clunky in numpy, and if you want to do message passing,
the overheads bleed off any advantage that parallelism ought to have given
you. I don't mind the message-passing abstraction, just that the overhead of
doing it in Python/numpy is too high. As for the former, one major motivation
for using numpy et al. was to avoid C with its explicit indexing over arrays;
that is both verbose and error-prone.
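To illustrate that clunkiness, here is a sketch of the kind of ceremony shared memory involves with numpy and multiprocessing: a `multiprocessing.Array` has to be allocated up front and re-wrapped as a numpy array inside each worker (the names here are illustrative, and this is just one of several ways to do it):

```python
import ctypes
from multiprocessing import Array, Process

import numpy as np

def square_in_place(shared, n):
    # Re-wrap the raw shared buffer as a numpy view -- no copy is made,
    # so writes here are visible to the parent process.
    arr = np.frombuffer(shared.get_obj(), dtype=np.float64, count=n)
    np.square(arr, out=arr)

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    shared = Array(ctypes.c_double, data)  # backed by shared memory
    p = Process(target=square_in_place, args=(shared, len(data)))
    p.start()
    p.join()
    result = list(shared.get_obj())
    assert result == [1.0, 4.0, 9.0, 16.0]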

It is never pleasant to drop into a different language, though it is much
better than it could be, thanks to SWIG, Cython, and Weave. Contrary to
common wisdom I prefer Weave because of its more succinct syntax. In Cython I
am back to writing C again, just with a different syntax. That is not a
criticism of Cython; it is an excellent tool, and it is much more pleasant to
parallelize from Cython than from numpy/Python.

Julia looks pretty good. I have one suggestion: the best way to get speed out
of Julia is not to write vectorized expressions but to write out explicit
loops. That is a little unfortunate, because although vectorization
constructs evolved out of the necessity to avoid loops (which were slow in
the older languages), they had an excellent byproduct: succinct code. Ideally
I would like to retain that.

