
Introducing HipHip (Array): Fast and flexible numerical computation in Clojure - trevoragilbert
http://blog.getprismatic.com/blog/2013/7/10/introducing-hiphip-array-fast-and-flexible-numerical-computation-in-clojure
======
adrianm
I do not believe this is the right approach to the problem, but I do
appreciate the problem you're trying to solve here. However, in my opinion -
Clojure libraries shouldn't be trying to reinvent the wheel. If your goal is
to expose a better interface for vector arithmetic in Clojure - write a
library that does that really well.

But if your primary concern is performance, please don't roll your own vector
or matrix "native" interface. You will certainly never come close in speed to
what has come before (BLAS implementations galore, et al.). It's also just a
lot of work that is basically keeping you from working on the higher-order
problems out there that we desperately need to tackle.

If your goal is more "Clojurey" syntax then just spend a day or two wrapping
the functions you want over a tried and tested numerics implementation.
Additionally, there is likely a pre-existing Java wrapper which does just that
for whatever you need considering that Java is still beloved by university
professors, a key demographic for fast math libraries.

On the other hand, I think Vertigo ( github:
[https://github.com/ztellman/vertigo](https://github.com/ztellman/vertigo) )
is taking a very interesting approach to the Clojure->Native problem, which I
believe might be of use to any library wanting to bring performant numerics to
Clojure. Unfortunately, ztellman has deprecated his OpenGL and OpenCL
libraries, but I think that Vertigo in combination with OpenCL and the kernels
courtesy of clMAGMA would be fantastic.

~~~
w01fe
> If your goal is more "Clojurey" syntax then just spend a day or two
> wrapping the functions you want over a tried and tested numerics
> implementation.

This is exactly what we're trying to do: provide some Clojure macros that give
nicer syntax for interacting with Java arrays with high performance. We're
explicitly not introducing a new vector type.

Most of the work here wasn't in the wrapping -- hiphip itself consists of very
little code -- but in figuring out what's fast and what's not, documenting
this, and making it easy to do things the fast way.

~~~
jamesjporter
I think what the GP is arguing is that if you want to be really fast you
should forget using Java arrays and just wrap BLAS, LAPACK, etc., which are
written in "close to the metal" languages, optimized within an inch of their
life, have been around for decades, and are used by others with similar goals
(numpy/scipy, etc.). As the GP says, Java libraries that already do this are
probably available, so this may be a pretty trivial task.

~~~
rplevy
I don't know why no one has answered this (it's brought up in a few places in
this thread), but if I had to guess why they didn't want to go this route, I'd
say it's the trade-off of not having your data be native. They presumably have
a fairly involved pipeline/topology of computations that data flows through.
In the interest of readable and maintainable code, having a nice declarative
data representation is a big plus, and doing the computation with native Java
data structures is apparently fast enough for their needs.

------
vemv

        All arithmetic operations on these boxed objects are
        significantly slower than on their primitive counterparts.
        This implementation also creates an unnecessary intermediate sequence
        (the result of the map), rather than just summing
        the numbers directly.
    

Clojure's Reducers framework might address the described issues in the future
when, in Rich's words, "those IFn.LLL, DDD etc primitive-taking function
interfaces spring to life". For now, it only solves the intermediate-
collections part of the problem.
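To make the quoted point concrete, here is a rough Java sketch of the two routes (illustrative only; the method names are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Contrast: summing via boxed Doubles with an intermediate collection
// (analogous to mapping then summing a seq) vs. summing primitives
// directly in a single pass.
public class BoxedVsPrimitive {
    // Boxed route: every element is a Double object, and `mapped` is
    // an extra intermediate collection.
    static double boxedSum(List<Double> xs) {
        List<Double> mapped = new ArrayList<>();
        for (Double x : xs) mapped.add(x * 2);  // unbox, multiply, re-box
        double sum = 0.0;
        for (Double x : mapped) sum += x;       // unbox again
        return sum;
    }

    // Primitive route: one pass over double[], no allocation per element.
    static double primitiveSum(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x * 2;
        return sum;
    }
}
```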

~~~
w01fe
We're also anxiously awaiting this -- it seems with gvecs and reducers and
primitive fns the pieces are all there, we just need the glue to put them all
together. Unfortunately, for now I think we're stuck with arrays, and we're
trying to make the most of it :)

------
w01fe
One of the authors here. We're excited to hear your feedback on hiphip, and
will be around all day to read feedback and answer questions.

~~~
stevoski
Love the name. That sort of pun brings a smile to my face.

~~~
w01fe
Wish I could take credit, but I think all praise (and groans) must be directed
at @aria42

------
peatmoss
Looks awesome! One data issue I've seen go relatively unaddressed in the
Clojure community is the serialization of big matrices and arrays.

There's a start on a clojure hdf5 (hdf5 is a container format common in
scientific circles) implementation, but it's a long ways from done.
[https://github.com/clojure-numerics/clj-hdf5](https://github.com/clojure-numerics/clj-hdf5)
I'm not the author, but I am the negligent steward.

I'd love it if someone smarter / better at Clojure than me was interested in
helping to think about useful, idiomatic high-level abstractions on top of
this high-performance data store.

PyTables does a great job of making gobs of hdf5 data easy to work with for
analysts--I'm just too novice at Clojure/FP to know what is a reasonable
analogue for Clojure.

~~~
prospero
Without knowing anything about hdf5 specifically, Vertigo [1] will let you
treat a memory-mapped file (or a piece of one) as a normal Clojure data
structure, as long as the element types are fixed-layout.

[1] [https://github.com/ztellman/vertigo](https://github.com/ztellman/vertigo)

------
51Cards
Love the project and all; this would be very helpful!

I have to comment on the name as well... brilliant. Kudos for something
creative that has already stuck firmly in my mind.

~~~
aria
Have any icon ideas? Was thinking ["hip","hip"] or hiphip[]

~~~
Historiopode
Something along this idea[1], but properly drawn by a designer in more than
three minutes?

[1] [http://i.imgur.com/DR2w9Jl.png](http://i.imgur.com/DR2w9Jl.png)

~~~
k4st
This seems like one of the 10x productivity / mastery cases where your three
minutes and creativity have produced something that would take me ages, even
if you set me to the task of duplicating your design.

Are you willing to spend another 3 minutes producing a logo for an unrelated
programming project? :-P

------
netshade
Cool library; I can imagine how moving away from boxing/unboxing could be a
huge boost for them.

I've been looking for something that gives SIMD intrinsics to Java programmers
-- does anyone know if such a thing exists? It could be a nice addition to
this lib.

~~~
fiatmoney
You can't, unless you write it as native code, put your data in direct NIO
buffers, and go through the JNI dance.
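A minimal sketch of the Java side of that dance (the native SIMD half, reached via JNI, is omitted here):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Allocate an off-heap (direct) buffer and fill it. Because the memory
// lives outside the GC-managed heap at a stable address, native code
// called through JNI can read/write it without copying.
public class DirectBufferExample {
    static DoubleBuffer fill(int n) {
        DoubleBuffer buf = ByteBuffer
            .allocateDirect(n * Double.BYTES)  // off-heap allocation
            .order(ByteOrder.nativeOrder())    // match the CPU's byte order
            .asDoubleBuffer();
        for (int i = 0; i < n; i++) buf.put(i, i * 1.5);
        return buf;
    }
}
```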

~~~
netshade
Ah well, had hoped there might be an already made thing out there. Thanks!

------
Historiopode
What brought you to develop this library rather than relying on Incanter/Colt?
The scope of HipHip seems different, of course, but there is enough of an
overlap to warrant the question.

~~~
aria
I could be wrong, but I don't think Incanter has any Clojure-native means of
generic operations over arrays at all.

~~~
adrianm
Incanter's default Matrix implementation is now Clatrix as well, which is a
Clojure-friendli(er) wrapper over jBLAS matrices.
[https://github.com/tel/clatrix](https://github.com/tel/clatrix) Check out the
source.

------
mjw
Did you guys look at the core.matrix API?

[https://github.com/mikera/matrix-api](https://github.com/mikera/matrix-api)

~~~
w01fe
We did, and we've been talking to the developers about a potential future
collaboration. Our goals are really complementary; hiphip is about getting
_your_ code into the inner loop of Java bytecode (not just a set of canned
operations), whereas core.matrix is about abstractions for a fixed set of
operations across different matrix types. There may eventually be overlap, if
core.matrix gets into compiling expressions into new operation types, which
sounds like something they're interested in.

~~~
Mikera
core.matrix developer here :-) we're definitely looking at expression
compilation. hiphip could also be very useful for writing fast core.matrix
implementations of the standard API. So there's definitely room for
collaboration.

------
fiatmoney
One thing I've found is that with macros, it can actually be easier to write
performant primitive-reliant code. It's still not up to Common Lisp standards,
but much better than, e.g., having to use a scripting language to generate all
the primitive specializations of your data structure, like Trove and Fastutil
do.

~~~
aria
Indeed, the core logic of HipHip is the same for all primitive types and
macros generate type-hinted versions for each primitive type.

------
tick113
Having written my own naive Clojure dot product, I can definitely appreciate
what you guys have done!

Any plans to attack sparse vectors? Performance on the sparse vector
operations I wrote was poor, but being new to Clojure it wasn't a great
implementation.

~~~
w01fe
We have sparse vector code built on hiphip that's slated for open-source
release down the road (once we get the resources to polish it) -- stay tuned!

------
bryansum
FYI: your link to the GitHub project halfway down the article is broken.

~~~
w01fe
Oops, thanks for letting us know! Fixed now.

------
wavesounds
I just got it 'Hip Hip Hurray' ... haha :-)

------
zinxq
As an interesting aside, you can nearly double the speed of your Java loop by
unrolling it a few times. (at least it did that for me in JDK 7)
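For concreteness, a manual 4x unrolling of a summation loop might look like the sketch below (whether it actually beats the JIT's own unrolling depends on the JVM version and the loop body, so measure before adopting):

```java
// 4x manually unrolled sum. The four independent accumulators also
// break the serial dependency chain, which can help instruction-level
// parallelism; a scalar tail loop handles leftover elements.
public class UnrolledSum {
    static double sum(double[] xs) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        int limit = xs.length - 3;
        for (; i < limit; i += 4) {              // main unrolled body
            s0 += xs[i];
            s1 += xs[i + 1];
            s2 += xs[i + 2];
            s3 += xs[i + 3];
        }
        for (; i < xs.length; i++) s0 += xs[i];  // tail
        return s0 + s1 + s2 + s3;
    }
}
```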

~~~
w01fe
Huh, cool! I kinda assumed the JIT already took care of this sort of low-
hanging fruit; we'll test this out, and if it works, include it in the next
version of hiphip.

~~~
pkolaczk
The JIT in fact does pretty aggressive loop unrolling; however, there might be
some edge cases where it doesn't.

