Grenade: Deep Learning in Haskell

sndean · on May 24, 2017

In the discussion of HLearn [0] the author said this [1]:

"This is a bit awkward for me as I've paused the development of HLearn and emphatically do not recommend anyone use it. The main problem is that Haskell (which I otherwise love) has poor support for numerical computing."

Can anyone expand on this comment, and does that also apply to Grenade?

[0] https://news.ycombinator.com/item?id=14402378

[1] https://news.ycombinator.com/item?id=14409595

nanolith · on May 24, 2017

Haskell has a mature FFI. If there are any computational hurdles, it is possible to drop down to C/C++ to manage these. A good library can easily encapsulate this.

Of course, there is the point where such a library becomes a thin wrapper around an efficient numerical library written in C, but there are other advantages to Haskell that might make such a wrapper still worth the trade: an amazing type system, libraries that take advantage of this type system, and a compiler that generates fairly efficient code. If, in exchange for that, I have to marshal numerical data into and out of Haskell's FFI, I consider it a fair trade.

SomeStupidPoint · on May 24, 2017

Just to elaborate:

Even if all the computation were implmented in C (eg, DAG nodes in a TF style setup), being able to compose the network that forms the computation via Haskell would be a big win for me because it lets me use the type system and various powerful libraries to design my network. (And opens a straightforward path to GPU acceleration of Haskell designed code by using the C GPU libraries.)

_ibu9 · on May 24, 2017

Yeah this is basically how Python is used in ML/data science work. Except Haskell lets you apply solid engineering all the way from the prototype to the production system.

ferdterguson · on May 25, 2017

I've used Haskell a little. I understand the benefits of the type system from the perspective of something like a web app or libraries with abstractions. However, as someone who spends their days working on numerical code in Cython for research software, how would I actually leverage the type system in my numerical code? Would I ever leverage the power of the type system and define my own types?

It also seems like a terrible idea to use recursion to solve multidimensional array problems typically associated with loops and the indexing of those arrays.

axman6 · on May 25, 2017

Recursion is only a bad idea in languages with poor support for recursion, it's not difficult to write efficient Haskell code which produces good assembly with fairly obvious recursive algorithms. Optimisations like stream fusion and libraries like foldl also contribute to really good code generation (not guaranteed and also not generally applicable optimisations, but certainly helpful)

mr_overalls · on May 25, 2017

Does Haskell provide tail-call optimization or similar optimization to take away the stack-space costs of recursive function calls?

Terribledactyl · on May 25, 2017

GHC can, if you want it to. It's hard to answer in a general way because of the laziness, and you can always force evaluation. But then you rob GHC of potentially better optimizations, like never executing it in the first place.

This SO does a good job of explaining the trade offs. https://stackoverflow.com/questions/13042353/does-haskell-ha...

SomeStupidPoint · on May 25, 2017

You generally use the Haskell type system to define the DAG associated with the computation you want to run, and the algorithms in terms of functions applied to those DAG nodes. Since the whole DAG is naturally thought of as a big function composed of many small functions applied to slices of (intermediate) data, you can leverage higher order functions in the construction of the DAG around implementations of the numerical code written in, say, C or FORTRAN. This is actually how a lot of numerics in Python is done (or at least was, the last time I looked in to it heavily).

You'd use the FFI to call out to something like C to actually execute the tensor operation, much like the TF design, where you're constructing a DAG in Python of the dataflow through tensors, which have their implementations in another language.

The type system just gives you a powerful way to reason about the structure of your DAG, define higher level abstractions (eg, autoimplementing certain kinds of networks against a certain input type), or discuss the feedback of the system during training, since training is just a higher order function that takes the entire DAG function as input and returns a new DAG. (Okay, for performance reasons you generally update values, but you can model that easily enough in Haskell.)

So really, basically the same way Python for numerics -- calling C for number crunching and doing algorithm designing in the higher level language.

lallysingh · on May 24, 2017

You don't have to marshal. You can use the inline-c package to wrap all that up for you.

platz · on May 24, 2017

The author of HLearn is actually more interested in discovering a novel way of encoding machine learning problems using new type system features, as opposed to choosing a more familiar encoding that exists in other languages, that Haskell would otherwise have no problems with.

This locates the constraint on the authors ambitions relative to the desired abstraction level, rather than the performance level available

spott · on May 24, 2017

> has poor support for numerical computing.

The issue comes from being lazy by default.

When you are writing lazy functional code, you essentially build up a system of thunks[0] that collapse to a value when you need it. This works great for abstractions like infinite lists, or essentially only paying for the value you need. Unfortunately, they are kind of expensive (I believe they are usually implemented as a tagged union, where they are either a value that is returned immediately or a pointer to another function or thunk) when you don't need the laziness. In any sort of numerical computing, you are usually dealing with enough values that you lose a lot of performance when dealing with these thunks. Working around this is doable with a bunch of strict annotations (so that these thunks are immediately collapsed to values), but that is only half the battle: you still need to reduce the immutable function applications to a mutable version so you don't lose out by copying your values back and forth in memory, and you don't do a bunch of unnecessary calculations. All this requires "a sufficiently intelligent compiler" or a lot of compiler hints and peering into Core: the intermediate language of GHC. It isn't for the faint of heart.

There have been a bunch of work on avoiding some of this: hmatrix[1] is a lapack and blas library for finally doing the calculations once you have reduced the graph, "subhask"[2] is a standard library for Haskell that attempts to add some support for basic numerical work and "numerical"[3] is a library that I believe is trying to make a well typed linear algebra library (I'm not sure if it is going to call into hmatrix/blas/lapack or not). The issue with a lot of this is that until there are good tools for this kind of thing, no-one does serious numerical programming in Haskell, and there are only a few people working on the tooling because there aren't that many people trying to do serious numerical programming in Haskell.

(note: I'm not an expert in this, so don't depend on any of this for love or money. I'm sure someone will come along and tell me how I'm wrong.)

[0] https://en.wikipedia.org/wiki/Thunk, https://wiki.haskell.org/Thunk [1] http://hackage.haskell.org/package/hmatrix [2] https://github.com/mikeizbicki/subhask [3] https://github.com/wellposed/numerical

grumpyprole · on May 25, 2017

Laziness never prevented high-performance array libraries like Vector being implemented. GHC Haskell also now has a module-level pragma that turns on strict-by-default. Python typically doesn't implement any high-performance primitives anyway, it simply delegates to C/Fortran. Something Haskell could do equally well.

aseipp · on May 24, 2017

I think the problem the author of HLearn and SubHask faced isn't really so much of a problem with thunking or whatever. Compiler optimizations and whatnot are always great and avoiding indirection is good (it's also not super hard either to do it manually for 99% of cases). But rather, I believe the problem they were hinting at is that the current Haskell numeric hierarchy is insufficient to capture the patterns we want for the vast kinds of numerical computing we do today. But even then, the tools we need to design the hierarchy we do want are also a bit out of reach, and implementing them will take time.

SubHask for example uses hmatrix and other LAPACK/BLAS libraries under the hood, which is good of course -- but almost all of the actual work in SubHask, then, is around designing a reusable hierarchy that works for all the 'weird' cases you find with this domain. Even 'non weird' things like rank/dimension polymorphic products, etc, require a lot of type-level infrastructure to accomplish today, and the features to support that are constantly evolving. (An example of APL-style rank-polymorphic stuff I typed up recently can give you an idea of how much work it is to do some 'seemingly trivial' things, though the examples at the end are rather nice[1]) SubHask also has an extremely fine grained hierarchy, with lots of little classes to handle cases like "weird containers" with appropriate laws -- and figuring that out is a lot of work too.

Haskell's type system is getting more expressive all the time, so hopefully this will become less tiresome in the future, and iterating and honing the design will be far simpler. But getting the design right is much more important and the other stuff (better underlying numeric libraries, improved performance, etc) can come after.

I think you're spot on that the lack of attention here creatives a negative feedback loop, as well. Another example of this I think that was bad was that very little code took into account things like numerical stability of algorithms. The SubHask/HLearn author also tried tackling this in an interesting way, too[2].

[1] https://gist.github.com/thoughtpolice/c408773f43972e48f93377... [2] https://github.com/mikeizbicki/HerbiePlugin

spott · on May 24, 2017

>But rather, I believe the problem they were hinting at is that the current Haskell numeric hierarchy is insufficient to capture the patterns we want for the vast kinds of numerical computing we do today. But even then, the tools we need to design the hierarchy we do want are also a bit out of reach, and implementing them will take time.

I think part of this issue boils down to avoiding the laziness and immutability of the language. The "patterns we want for [...] numerical computing" not infrequently boil down to (a) build a matrix, (b) modify that matrix, (c) apply that matrix, or parts of that matrix, on vectors, (d) modify vectors, (e) partition matrices into submatrices to do work on, etc. Notice the "modify" part of so many of these.

These aren't really hard problems in imperative languages. The naive algorithm for Arnoldi iteration[0] is inherently iterative and really isn't that hard to write with ok performance in an imperative language... but try writing it naively in a functional language! So, tools and abstractions need to be built to make these kinds of algorithms more natural to write functionally without costing too much in performance. In other words: the tools and abstractions are only necessary because of the limitations of a language that leans so hard on the compiler for performance.

I think this is one of the underlying things that differentiate Haskell and other languages: In Haskell, abstractions can be absurdly reusable, allowing for really elegant code to do things... if you know what abstraction to use. If you don't, then things can get difficult to do. In imperative languages, abstractions can't quite reach the level of reusability that they can in Haskell... but you don't have to rely on the abstractions as much to solve the problems that you want to solve.

Think about the Arnoldi iteration. The iterative version in a number of languages is 10 lines, but the functional version needs some way of drilling down into an array in a mutable way (a monad) in order to make it match the iterative in performance...

[0]https://en.wikipedia.org/wiki/Arnoldi_iteration#The_Arnoldi_...

platz · on May 25, 2017

Of course, insisting on an immutable API that allows data sharing is going to produce different results than a mutable API.

There exist mutable counterparts such as Data.Vector.Unboxed.Mutable and everything in the ST monad.

The question is if you are allowing yourself to utilize them.

bgamari · on May 25, 2017

> The iterative version in a number of languages is 10 lines, but the functional version needs some way of drilling down into an array in a mutable way (a monad) in order to make it match the iterative in performance

Sure, but IMHO the monadic code necessary to implement something like Arnoldi is neither harder to write nor uglier than the same implementation in an imperative language.

If anything, this highlights a strength of the language: we can capture an imperative idea in a functional setting, and even have the ability to nicely isolate the mutability from the rest of our functional program (with `ST`).

zygomega · on May 25, 2017

The author addresses the potential benefits of immutability and laziness in a note on cover trees[0]. Last time I checked, this is still the world's fastest NN algorithm.

Haskell could shine in numerical computing. I think the key is polymorphism: a vector is a matrix is a tensor is a number, and they can all be multiplied in the same way, once you define what multiply is.

Poor support is more about the lack of haskell coders doing ML. It's hard to compete without the numbers that other platforms enjoy.

[0] https://izbicki.me/blog/fast-nearest-neighbor-queries-in-has...

eli_gottlieb · on May 25, 2017

I think it's more about ML experts not learning Haskell. "Haskell programmers" and "PL experts" are tightly overlapping groups, but almost no other major cluster of experts overlaps with those.

w4rh4wk5 · on May 24, 2017

May I ask, do you have any useful references on how to write strict, efficient Haskell code. Or at least some Do/Don'ts?

spott · on May 24, 2017

I don't.

I learned all this while trying my own hand[0] at writing a sparse matrix library in Haskell and running into some of these issues. Unfortunately, I mostly gave up.

[0]https://xkcd.com/1831/

cs702 · on May 24, 2017

Looks really neat: composable, "smartly typed," easy-to-specify neural nets.

The main issue that keeps me from giving this a try is the lack of GPU support out-of-the-box, necessary for tackling problems of more interesting size :-(

lpage · on May 24, 2017

Thought of the day: bridge Grenade with like CλaSH [1] and realize network topologies directly on an FPGA.

[1] http://www.clash-lang.org/

eli_gottlieb · on May 24, 2017

Ah, the wondrous things we could all do with Haskell, if only we neither had nor needed day-jobs.

lr4444lr · on May 24, 2017

I can't upvote this enough.

mark_l_watson · on May 25, 2017

Great project! It was easy to build and run the tests, and, the examples are very readable. Good job. I can't wait to have time to really kick the tires.

dannycastonguay · on May 24, 2017

Thanks Huw! I'm sure many of us have been itching to do it for some time.