
Functional Programming for Array-Based Parallelism - kooskoos
https://www.infoq.com/presentations/functional-programming-parallelism/
======
lmeyerov
One of the most elegant paper series I've ever read is her "More Types for
Nested Data Parallelism"
([https://dl.acm.org/doi/abs/10.1145/351240.351249](https://dl.acm.org/doi/abs/10.1145/351240.351249))
+ the preceding NESL / segmented scan papers by Guy Blelloch
([https://www.cs.cmu.edu/~guyb/papers/Nesl2.0.pdf](https://www.cs.cmu.edu/~guyb/papers/Nesl2.0.pdf))
. They helped lead us to the GPU journey of starting Graphistry, and to our bet
on GPU dataframes (e.g., RAPIDS.ai) eventually overtaking Spark and friends,
even before Databricks was a company :) If you do pandas/tensorflow/numpy/sql/etc.,
this stuff is amazing.

Fun to see it come full circle here!

~~~
flafla2
Seconded! I had the pleasure of taking Guy Blelloch's parallel programming
class at CMU as a sophomore. A large majority of the homework assignments
involved leveraging the segmented scan operator to solve all sorts of
classical parallel programming problems.

Really eye-opening stuff! I am still impressed that prefix sum (generalized
over _all_ associative operators, not just +) can be done in O(log n) span.
And the algorithm is not too complicated either; I was able to get the gist of
it even as an undergrad.
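A minimal sequential sketch of that algorithm, for anyone curious: the classic Blelloch upsweep/downsweep exclusive scan. (Function name, the power-of-two length restriction, and the sequential simulation are my simplifications; on real hardware, every iteration of each inner `for` loop runs simultaneously, giving the O(log n) span, since there are only log n levels in each phase.)

```python
def blelloch_scan(xs, op, identity):
    """Exclusive prefix scan for any associative op; len(xs) a power of two."""
    a = list(xs)
    n = len(a)
    # Upsweep: build a reduction tree, log n levels.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):      # iterations are independent -> parallel
            a[i + 2 * d - 1] = op(a[i + d - 1], a[i + 2 * d - 1])
        d *= 2
    # Downsweep: push partial results back down, another log n levels.
    a[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):      # also independent per iteration
            left = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] = op(left, a[i + 2 * d - 1])
        d //= 2
    return a

blelloch_scan([3, 1, 7, 0, 4, 1, 6, 3], lambda x, y: x + y, 0)
# → [0, 3, 4, 11, 11, 15, 16, 22]

# Same code, different associative operator:
blelloch_scan([2, 8, 1, 4], max, float("-inf"))
# → [-inf, 2, 8, 8]
```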

~~~
throwaway_pdp09
I read that the prefix scan is great, so I read somebody's thesis. I a) didn't
really get it and b) felt it was doing some complicated stuff for little
value. Doubtless I'm wrong (computer scientists tend to be smarter than me),
so can you give me some insight into why it's so valuable?

~~~
lmeyerov
There are two really cool things going on here. Also, keep in mind NESL was
published in 1992 (and thus made in ~1990), not long after the Connection
Machine and Crays got hyped and largely failed to bring parallel computing
mainstream.

1. Data parallelism, esp. on super power-efficient parallel hardware (SIMD,
GPUs, all the stuff that lets you go 10-100X over Spark for the same budget),
is hard for computing over "irregular" data like trees and graphs. Prefix scan
showed a basic way to do a bunch of basic math over these. A lot of the
90s / early 2000s became research into covering all sorts of weird cases. This
shows data parallelism is broadly applicable, so it's worth pushing forward on.
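To make the irregular-data point concrete, here's a minimal sketch of a segmented scan (names and the flattened representation are illustrative, not NESL's actual internals): nested data like `[[3, 1], [7], [0, 4, 1]]` is flattened into one values array plus a flags array marking segment starts, so one flat data-parallel pass replaces a loop over ragged sub-arrays. I've written the scan itself sequentially for clarity; the whole thing can be done with a single parallel scan by pairing each `(flag, value)` under a modified associative operator.

```python
def segmented_scan(values, flags, op, identity):
    """Inclusive per-segment prefix scan; flags[i] == 1 marks a segment start."""
    out = []
    acc = identity
    for v, f in zip(values, flags):
        acc = v if f else op(acc, v)   # restart the scan at each segment boundary
        out.append(acc)
    return out

# Nested input [[3, 1], [7], [0, 4, 1]] flattened into values + segment flags:
values = [3, 1, 7, 0, 4, 1]
flags  = [1, 0, 1, 1, 0, 0]
segmented_scan(values, flags, lambda x, y: x + y, 0)
# → [3, 4, 7, 0, 4, 5]   (each sub-array scanned independently)
```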

2. Guy B, and then Manuel C & Gabriele K, then showed you don't have to be a
weird HPC person hand-rolling Fortran, nor rely on weird one-off special-case
libraries, to use the above. We can build libraries of high-level primitives --
think map/reduce -- and even go to the level where compilers make it look like
"normal" Haskell. Most things can compile down to prefix scan. This is similar
to Impala/Spark figuring out RDDs/dataframes and doing SQL on top, except
again, for 10-100X better performance, and you don't have to contort
everything into 1970s SQL. Re: prefix scan in particular, we don't _have_ to
implement all the algorithms as prefix scan, but it's a reliable base case, and
where possible (most cases!), we nowadays instead swap in more efficient hand-
written CUDA library calls.
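As one illustration of "compiles down to prefix scan" (my own sketch, not Blelloch's or Keller's actual compiler output): a parallel filter is just map + scan + scatter. Flag each keeper, exclusive-scan the flags to get each keeper's output index, then scatter. The map and scatter steps are trivially parallel, and the scan is O(log n) span, so the whole filter is too.

```python
def exclusive_scan_sum(flags):
    """Sequential stand-in for a parallel exclusive +-scan; returns (indices, total)."""
    acc, idx = 0, []
    for f in flags:
        idx.append(acc)
        acc += f
    return idx, acc

def parallel_filter(xs, pred):
    flags = [1 if pred(x) else 0 for x in xs]   # map: mark keepers (parallel)
    idx, total = exclusive_scan_sum(flags)      # scan: compute output slots
    out = [None] * total
    for i, x in enumerate(xs):                  # scatter: independent writes (parallel)
        if flags[i]:
            out[idx[i]] = x
    return out

parallel_filter([5, 2, 9, 1, 7], lambda x: x > 4)
# → [5, 9, 7]
```

This is the same stream-compaction pattern the hand-written CUDA libraries implement, just with the scan done well.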

Nowadays -- when GPUs are in the cloud, Moore's Law has stalled out for CPUs
but is still going strong for GPUs, and most big data systems are realizing
their latency and data shuttling/serialization suck -- all of the above is a
big deal. See the 10yr AWS price chart I calculated:
[https://twitter.com/lmeyerov/status/1247382156487213057/phot...](https://twitter.com/lmeyerov/status/1247382156487213057/photo/1)
. You can just write your Python dataframe code with
[https://rapids.ai/](https://rapids.ai/) and not really think about it, just
knowing you're doing better than the Hadoop stuff underneath, and without
managing a dirty horde. The existence of NESL and the research area that came
after means we're still at the beginning of swapping out crappy CPU software
for better data-parallel stuff, and _it's doable_.

~~~
throwaway_pdp09
Thanks for a thorough answer!

------
munro
Great slides near the end, and this project looks really fun -- it's cool to
see parallelism applied to simple vanilla code, rather than having to build
the transformation through some eDSL. That said, I hate to be a downer, but
I've really moved away from Haskell, so I don't see myself playing with this.
Lately it's PyTorch or PySpark, with an eye on Rust, and mildly on Julia.

~~~
vmchale
> but I've really moved away from Haskell

Eh? Some sort of ML (or maybe Lisp) is going to be at the cutting edge in
something like this, where you have JIT compilers and such.

> Lately it's PyTorch or PySpark

There's some GPU stuff you can do in Python via Futhark. It's very cool.

------
tombert
I saw her give this talk as a keynote at Lambda Days, and it was easily my
favorite talk there this year (except mine, of course :) ).

I find it kind of weird that, outside of academic circles and the like,
Haskell is largely dismissed as "not practical"; I think the "cabal hell" days
before Stack came out really hurt the language and its adoption. Libraries
like Accelerate really demonstrate how powerful something like Haskell can be
when applied correctly.

------
vmchale
I'd love for this to get more mainstream attention! J, Futhark, Accelerate all
have very neat stuff under the hood.

~~~
eggy
I second J and Futhark. I love J, and I wish there were a way to use the GPU
with J. I have looked briefly at the APL -> TAIL -> Futhark pipeline, but I
don't know enough to do something useful with that particular toolchain, or to
wow myself enough to keep going. More studying...

~~~
smabie
How about co-dfns? CUDA accelerated APL!

[https://github.com/Co-dfns/Co-dfns](https://github.com/Co-dfns/Co-dfns)

~~~
eggy
Yes, that is Aaron Hsu's work for APL on the GPU. I was hoping for a
J-on-the-GPU toolchain.

------
jiggunjer
I'm no expert on parallel algorithms but, based on some of the examples I've
seen, I think MATLAB might be suitable.

At least when I program in MATLAB instead of python I think very differently.

------
auggierose
She is also one of the inventors of associated types.

------
angel_j
Would it give me academic credibility to have written parallelizable
conditionals without 'if' statements?

