
EasyLambda – parallel data processing with modern C++ - eicossa
https://github.com/haptork/easylambda
======
hellofunk
Using line breaks between member function calls is an interesting syntax style
I haven't seen before:

    ezl::rise(readFile<string>(argv[1]).rowSeparator('s').colSeparator(""))
        .reduce<1>(ezl::count(), 0).dump()
        .run();

~~~
lorenzhs
It's very common when chaining lots of operations, each of which transforms
the result of the previous one (functional style). I've seen tons of Scala
code do it. Writing it all in one line would be quite the unreadable mess.
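
To make the style concrete, here is a minimal C++ sketch of one-call-per-line
chaining. `Pipeline` and `sumOfEvenSquares` are hypothetical names made up for
this illustration; they are not part of easyLambda or any other library.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical type for illustration only; not easyLambda's API.
class Pipeline {
public:
    explicit Pipeline(std::vector<int> data) : data_(std::move(data)) {}

    // Each step returns a new Pipeline, which is what makes chaining possible.
    template <class F>
    Pipeline map(F f) const {
        std::vector<int> out;
        out.reserve(data_.size());
        std::transform(data_.begin(), data_.end(), std::back_inserter(out), f);
        return Pipeline(std::move(out));
    }

    template <class P>
    Pipeline filter(P keep) const {
        std::vector<int> out;
        std::copy_if(data_.begin(), data_.end(), std::back_inserter(out), keep);
        return Pipeline(std::move(out));
    }

    // Terminal step: collapses the pipeline into a single value.
    int sum() const { return std::accumulate(data_.begin(), data_.end(), 0); }

private:
    std::vector<int> data_;
};

// One call per line: each line transforms the result of the previous one.
inline int sumOfEvenSquares(const std::vector<int>& xs) {
    return Pipeline(xs)
        .map([](int x) { return x * x; })
        .filter([](int x) { return x % 2 == 0; })
        .sum();
}
```

Written on a single line, the same chain is still valid C++, but the individual
steps are much harder to pick out.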

~~~
skeuomorf
Yeah, it's pretty common with code that uses the builder pattern as well. I
also like to use it with rust combinators e.g.
[https://github.com/skeuomorf/conshash/blob/master/src/lib.rs...](https://github.com/skeuomorf/conshash/blob/master/src/lib.rs#L90)

------
polskibus
How does this differ from Threading Building Blocks?
[https://www.threadingbuildingblocks.org/](https://www.threadingbuildingblocks.org/)

~~~
haptork
One can still use TBB inside the functions to make use of it alongside getting
MPI parallelism from easyLambda.

Moreover, easyLambda has data flows, map and reduce, which in a way improve the
way programs are written and composed. There are no classes to be declared for
anything. It has a lot of syntactic sugar for column selection, control over
parallelism, I/O, etc. I personally like the style of coding in it for various
problems, parallelism aside.

~~~
srean
Since it supports distributed computation, does it have any abstraction for
dealing with failures, for example network problems? Support for
checkpointing or some other abstraction that deals with these kinds of errors
would be very useful. In fact, it is absolutely essential for anything of
non-trivial scale.

BTW not complaining, far, far from it, just a useful feature addition you may
consider in case you have the bandwidth to invest.

~~~
haptork
Yeah, you are right, fault tolerance is a needed feature. Although I'm not
quite sure that it needs to be added to easyLambda itself, since MPI 3.0 is
expected to support fault tolerance by default. But instead of waiting for MPI
3.0 fault tolerance support, I'm thinking of running it on top of HPX or some
other system that provides good runtime support on top of MPI. Also, since
most of the parallelism scheme is confined to MPIBridge, which is just a unit
in the flow, these different implementations can coexist.

------
joe_the_user
So this is a bunch of constructs for creating function-pointers and the like
which then would be sent to mpi or other parallel libraries to do the actual
interfacing with the parallel mechanism.

Any idea how this would work with (random stuff I'm thinking about) CUDA or
TensorFlow?

~~~
haptork
Thanks for your interest. There are indeed no function pointers, no void
pointers, no preprocessor hacks, no wild type-casting involved anywhere in the
library. It is completely type-safe at compile time. Still, the kinds of things
you can do are a far cry from what you get with all those hacks and shoddy
pointers. There are a lot of features, like column manipulation, passing a
vector of values for multiple rows and just a value for a single row, etc.,
that make composability and ease of use way better. All of these features are
implemented using template metaprogramming and the traits available in modern
C++. The language has become quite impressive compared to what we used to think
of it. You must check it out.

map, reduce and rise each take a template parameter that can be of any class
type, and the library has nothing to do with what a function object or function
does in its body. The function just needs to take in the arguments that the
prior unit passes to it and return something. I don't think there will be any
problem if one uses any other library to perform computation inside a function.
Even if some library does not work out of the box, I'm pretty positive it can
be done with a little tinkering. I hope this makes things clear.
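
The general idea of taking the callable as a template parameter can be sketched
as follows. This is only an illustration under my own assumptions: `mapRows`
and `Scale` are hypothetical names, and this is not how easyLambda's units are
actually implemented.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper, not easyLambda's API. The callable's type F is a
// template parameter, so a lambda or a function object can be passed directly
// and the result type is worked out at compile time.
template <class Row, class F>
auto mapRows(const std::vector<Row>& rows, F f) {
    // Whatever f returns for a Row becomes the element type of the output.
    using Out = std::decay_t<decltype(f(std::declval<const Row&>()))>;
    std::vector<Out> out;
    out.reserve(rows.size());
    for (const Row& r : rows) {
        out.push_back(f(r));  // the body of f is opaque to mapRows
    }
    return out;
}

// A function object with state works just as well as a lambda.
struct Scale {
    double factor;
    double operator()(int x) const { return factor * x; }
};
```

A call such as `mapRows(rows, Scale{0.5})` yields a `std::vector<double>`, and a
lambda returning `std::string` yields a `std::vector<std::string>`. Passing a
callable that cannot accept a `Row` fails to compile rather than failing at run
time.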

~~~
joe_the_user
Thanks for the reply!

I'm probably not as up on the most modern C++ approaches as I should be, but
I'm quite interested. I have downloaded the code and I'm looking through it.

Basically, it uses template metaprogramming to go from the higher-level
map/reduce/rise semantics to lower level for-loops and calls.

Beyond the particulars, the question/challenge I'd have would be: "how would
you use this to dynamically generate map/reduce/rise calls?"

~~~
haptork
If you can give some idea of the client code / API that you are expecting, I
might be able to give some view on that.

