

Blaze: Next Generation NumPy - freyrs3
https://speakerdeck.com/sdiehl/blaze-next-generation-numpy

======
bravura
Aside: What is the appropriate website to host _video_ or _audio_ of talks
with slides?

What I don't want: Just slides no audio. I also don't want video of the talk
uploaded to youtube, because sometimes the slides aren't visible in the video.

My talks involve slides that make no sense without the audio. If there is
video capture of the talk, it's nice to include some of the video too. But I
guess I would be content with just audio + slides.

I do not want to have to do complicated video editing; I want an existing
solution for this problem.

[edit: clarifications]

~~~
lucian1900
Infoq do this, but I'm not sure if they accept arbitrary content.

~~~
riffraff
E.g. [http://www.infoq.com/presentations/Taming-Effect-Simon-
Peyto...](http://www.infoq.com/presentations/Taming-Effect-Simon-Peyton-Jones)

I love this, and every time I am pointed to slideshare or speakerdeck I hope
they have implemented something like this :(

------
iskander
Very cool! I'm impressed at all the energy and fast pace of development coming
out of Travis Oliphant's new company. Also, I'm curious about how this will
interact with numba. Blaze seems to have a much richer type system and lazily
constructs ASTs. Numba, on the other hand (as far as I can tell), does
function-at-a-time specialization using much simpler array & scalar types. How
will these two different schemes interact?

~~~
pwang
Numba actually uses the Python AST, so it's not too different. Currently it's
also using Mark Florisson's Minivect internally to reason about Numpy arrays
and array expressions. This will be much easier when Numba encounters a Blaze
array, since the value transformations will be expressed in the AST, and the
index space transformations are more explicitly represented in the Blaze
expression graph.

Both projects are designed to work very well with each other, although both
have applicability outside the region of integration. Numba will compile
regular Python functions that consume Numpy arrays, and will not require
Blaze. Blaze will produce lazily-evaluated expression graphs and dataflows on
disjoint and out-of-core arrays, and can evaluate these efficiently without
Numba. (This is what the Array VM is all about; you can think of it as an
improved Numexpr.)
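The lazy-evaluation idea described above (the Array VM as "an improved Numexpr") can be sketched in plain Python. This is purely illustrative -- the class names and API here are invented for the sketch, not Blaze's actual design:

```python
# Illustrative sketch of a lazily-evaluated expression graph: operators
# build graph nodes instead of computing, and evaluation is a single
# fused traversal with no intermediate temporaries.

class Expr:
    """A node in a lazy expression graph over flat numeric arrays."""
    def __add__(self, other):
        return BinOp('+', self, other)
    def __mul__(self, other):
        return BinOp('*', self, other)

class Leaf(Expr):
    """A concrete array: the only node that actually holds data."""
    def __init__(self, data):
        self.data = list(data)
    def eval_at(self, i):
        return self.data[i]
    def __len__(self):
        return len(self.data)

class BinOp(Expr):
    """A deferred elementwise operation on two sub-expressions."""
    ops = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs
    def eval_at(self, i):
        return self.ops[self.op](self.lhs.eval_at(i), self.rhs.eval_at(i))
    def __len__(self):
        return len(self.lhs)

def evaluate(expr):
    # One pass over the index space: (x + y) * z allocates no temporaries.
    return [expr.eval_at(i) for i in range(len(expr))]

x, y, z = Leaf([1, 2, 3]), Leaf([4, 5, 6]), Leaf([2, 2, 2])
graph = (x + y) * z          # builds the graph; nothing is computed yet
print(evaluate(graph))       # -> [10, 14, 18]
```

Numexpr fuses such elementwise expressions the same way; the graph form is what lets a backend (an interpreter, or a compiler like Numba) decide how to evaluate it.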

------
symmetricsaurus
I don't get it. Am I supposed to just look at the slides from a presentation
and get what it's about? If the presentation is any good, that will be quite
hard.

------
mjw
This is really interesting although a little dense/cryptic as slides. Does
anyone have a link either to a slightly more leisurely exposition of the
theory behind it, or perhaps a more user-focused summary of what kind of
features and optimisations this new magic might enable?

As I understand it from a quick non-expert skim (please correct!): the idea is
to use a fairly flexible and expressive metadata language under the hood to
specify the physical memory layout of a multi-dimensional array -- more
expressive than numpy's existing support for strides. Sounds like this will
allow arrays to be reshaped and joined together without copying, support
arrays larger than core memory, etc.

I wonder if the language is too powerful, if this could make memory management
(e.g.: can I free this contiguous chunk of memory?) quite hard to reason
about.

Also: seems like this makes it easier to push the cost of array layout
manipulations in a somewhat lazy fashion from the producer onto the consumer
of results. At which point does one decide that traversal is now excessively
complicated and the thing should be copied into a flat layout? Especially
since this might affect one's ability to use certain BLAS routines, take
advantage of certain hardware optimisations etc. Is the idea that a compiler
makes this decision for you, or you make the decision explicitly or
implicitly?

Interesting stuff anyway, look forward to hearing more about it!

~~~
pwang
> Does anyone have a link either to a slightly more leisurely exposition of
> the theory behind it, or perhaps a more user-focused summary of what kind of
> features and optimisations this new magic might enable?

No, but those will be forthcoming as we build out more of it. We are targeting
an end-of-November preview release, as the slides indicate, and it will give
people a much more concrete idea of the things Blaze can do.

> the idea is to use a fairly flexible and expressive metadata language under
> the hood to specify the physical memory layout of a multi-dimensional array

Yep, that's exactly right. The challenge is to constrain this so that it is
not too general, i.e. doesn't wander into PhD research land, but is still
useful for a large number of use cases.

One of the core ideas is to more coherently represent the metadata about
location, locality/compute affinity, and index spaces. Numpy has the
beginnings of some of these ideas in its various flags and View object
semantics, but it's all stuck in its very rigid, contiguous-memory origins.
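As a rough illustration of "coherently represented metadata" (this is a guess at the shape of the idea, not Blaze's actual design), one could imagine a single descriptor bundling the index space, layout, location, and compute affinity, rather than NumPy's scattered flags:

```python
# Hypothetical sketch: one descriptor object carrying the metadata the
# comment enumerates. All field names here are invented.

from dataclasses import dataclass

@dataclass
class ArrayDescriptor:
    shape: tuple            # logical index space
    strides: tuple          # mapping from index space to linear offsets
    location: str = "ram"   # e.g. "ram", "disk", "remote://host"
    affinity: str = "cpu"   # preferred compute device

# An out-of-core chunk: same index-space description, different location.
desc = ArrayDescriptor(shape=(1000, 1000), strides=(8000, 8),
                       location="disk", affinity="cpu")
print(desc.location)  # "disk"
```

The point of making this first-class is that a scheduler or compiler can then reason about where data lives and where to run the computation, instead of assuming one contiguous in-memory buffer.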

> At which point does one decide that traversal is now excessively complicated
> and the thing should be copied into a flat layout?

This would be a great Master's research project. :-) The hope and the belief
is that one can punt on resolving this in generality, and still solve a number
of interesting, concrete problems in an efficient way.

~~~
mjw
Nice one. Sounds like you guys have thought hard about the right balance
between expressivity and allowing these structures to become excessively hard
to reason about or traverse.

Incidentally I found the use of topological manifold terminology (atlas of
coordinate charts, homeomorphisms, ...) in the slides quite intriguing: are
there topological connections here or is that more just an analogy?

~~~
freyrs3
There is a connection: we can view mappings from linear memory spaces to
higher array constructs as a series of coordinate transformations, in the same
way we deal with transformations between charts on manifolds. Except in our
case the charts are necessarily disjoint, so the boundary conditions on charts
are trivial.
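A toy version of the disjoint-charts idea, with invented names: a 1-D array split across two separate memory blocks, where a "chart" is the coordinate transformation from the global index space to a (block, local offset) pair:

```python
# Illustrative sketch: each chunk is a separate memory block covering a
# disjoint slice of the global index space, and the chart routes a
# global index to the right block. Not Blaze's actual API.

class ChunkedArray:
    def __init__(self, chunks):
        self.chunks = chunks            # disjoint memory blocks
        self.bounds = []                # global index range of each chunk
        start = 0
        for c in chunks:
            self.bounds.append((start, start + len(c)))
            start += len(c)

    def chart(self, i):
        """Coordinate transformation: global index -> (chunk, local index)."""
        for k, (lo, hi) in enumerate(self.bounds):
            if lo <= i < hi:
                return k, i - lo
        raise IndexError(i)

    def __getitem__(self, i):
        k, local = self.chart(i)
        return self.chunks[k][local]

a = ChunkedArray([[10, 11, 12], [13, 14]])
print(a[4])   # -> 14: the chart routes index 4 to chunk 1, offset 1
```

Because the chunks cover disjoint index ranges, no index ever lies "on the boundary" of two charts, which is the triviality the comment mentions.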

------
33a
This is so cool! I think that something like this could really extend the
power and flexibility of NumPy, and having someone like Travis Oliphant
behind this project gives it a lot of credibility. I am excited to see what
comes out of this next!

------
tomrod
I sometimes think I know Python/NumPy, but then I see Travis' company projects
and I'm blown away.

Good work, Continuum Analytics!

------
darkarmani
This is from his PyData talk. All of the talks are being recorded, but I'm not
sure when they will be released: <http://nyc2012.pydata.org/abstracts/>

~~~
omni
One of the Continuum guys said to expect "a week or so."

------
hogu
Hey all - apologies, most of the Continuum people are busy travelling from
PyData and dealing with Hurricane Sandy travel plans; we'll be commenting on
these discussions shortly.

------
dbcooper
Any benchmarks for this? (I didn't find any via google.)

~~~
hogu
The implementation isn't done yet; it's more about the proper mental model
for generalizing NumPy.

~~~
carterschonwald
A lot of the design ideas are pulled from some great research:
[http://justtesting.org/repa-3-more-control-over-array-repres...](http://justtesting.org/repa-3-more-control-over-array-representation)
(that paper is the most current relevant one, though there are a few others).

I chatted with Stephen about Repa and HPC code engineering the evening before
the talk; it's all Haskell ideas ported to Python land under the covers :)

(as the slide deck admits!)

~~~
pwang
> it's all Haskell ideas ported to Python land under the covers :)

Well, that's not _entirely_ true. It's based on many of the same ideas that
underlie some of the good Haskell work, and those ideas have a natural
expression in Haskell because it's a type algebra on index spaces.

An incomplete list of the inspirations: Paralation Lisp, Merrimac, APL / J /
K, Manticore, FORTH, Chapel & ZPL (and VCODE), SISAL, OLAP & MDX, LINQ,
various PGAS approaches, etc., etc.

------
wglb
Cert says opendns. Problem?

~~~
seiji
Speakerdeck had wobbly DNS for a while. Sounds like you're using OpenDNS and
it hijacked a failed lookup?

~~~
wglb
Not using opendns that I know of.

