
Data flow syntax: The missing piece for the success of functional programming? - samuell
http://bionics.it/posts/fbp-data-flow-syntax
======
ff_
Reading the title I was expecting some Lisp/Haskell...

While you can do that, Python doesn't really make your life easier if you want
to go functional. There's only map/reduce/filter, one-line lambdas (which get
really messy really fast), no currying, etc. (comprehensions are fine,
however).
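To be fair, the standard library does offer partial application via functools.partial as a stand-in for true currying - a minimal sketch (the names here are illustrative, not from the article):

```python
from functools import partial, reduce

# Python 3 moved reduce() into functools; map/filter return lazy iterators.
add = lambda x, y: x + y
add_five = partial(add, 5)  # partial application in place of true currying

print(add_five(3))                             # 8
print(reduce(add, map(add_five, [1, 2, 3])))   # 6 + 7 + 8 = 21
```

It works, but compared to languages with curried-by-default functions it stays noticeably more verbose.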

I appreciate the OP's effort in questioning existing dataflow implementations,
and I suggest he further explore the world of functional languages - it's huge
and rich in resources, and the quality of the written content is surprisingly
high on average - he may even find a working implementation of the problem he
was trying to solve (that happens to me way too often).

BTW, a quick google brought me to this [1] - I'm not sure if it's exactly what
we are talking about, but seems related.

[1]:
[https://github.com/kennytilton/cells](https://github.com/kennytilton/cells)

~~~
samuell
Well, I've done a tiny bit of Lisp/Scheme too, but I felt that lazy-evaluated
generators (which I happen to know best in Python) are a bit more "state
of the art" FP than the call/return semantics I've seen in Lisp/Scheme
(Clojure has core.async though, which is a whole other story, much more data
flow like, AFAIK).
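For readers unfamiliar with the pattern, a minimal sketch of the kind of lazy generator pipeline being referred to (hypothetical names, not from the article):

```python
def numbers():
    # A source generator: produces values one at a time, on demand.
    for i in range(5):
        yield i

def squared(gen):
    # A transforming generator: pulls from upstream lazily.
    for x in gen:
        yield x * x

# Nothing executes until the pipeline is consumed.
pipeline = squared(numbers())
print(list(pipeline))  # [0, 1, 4, 9, 16]
```

Each value flows through the whole chain before the next one is produced, which is what gives it the dataflow feel.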

Thanks for the cells link, will have to check!

------
scott_s
I work on IBM Streams, which has its own language, SPL (Streams Processing
Language), which solves the same problem. You can check out a small SPL
application here:
[https://github.com/IBMStreams/streamsx.demo.logwatch](https://github.com/IBMStreams/streamsx.demo.logwatch)

And a direct link to the SPL source code for the application:
[https://github.com/IBMStreams/streamsx.demo.logwatch/blob/ma...](https://github.com/IBMStreams/streamsx.demo.logwatch/blob/master/language/com.ibm.streamsx.demo.logwatch.language/LogWatch.spl)

------
strmpnk
I'd say Elm goes a long ways toward practical dataflow syntax and operation in
its rather small language. Making a series of comprehensions lazy doesn't
quite go far enough to make composition natural. While it seems like there are
ways to improve it, the Signal.Mailbox record type and the Elm architecture
have surprisingly good composition properties.

~~~
samuell
That's interesting! I've been watching Elm a small bit, but hadn't seen that
they also have the concept of "ports", which is so useful in FBP! (The
"example.ports.prices.send(42);" part in
[http://elm-lang.org/docs/syntax](http://elm-lang.org/docs/syntax) )

------
ninjakeyboard
Have you looked at reactive streams? There are some interesting semantics for
splitting and combining data and only describing the different flows
independently that handle the data as it flows through.

[http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-graphs.html](http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-graphs.html)

There are split semantics where the tuples are simply fed into other flows.
And then merge semantics where flows recombine into a single flow for further
processing.
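In plain Python, the split/merge idea can be mimicked with itertools (a rough sketch of the concept, not Akka's API):

```python
import itertools

source = (i for i in range(6))

# Split: tee duplicates one stream into two independent iterators.
left, right = itertools.tee(source)
evens = (x for x in left if x % 2 == 0)
doubled_odds = (x * 2 for x in right if x % 2 == 1)

# Merge: chain recombines the branch results into a single flow.
merged = list(itertools.chain(evens, doubled_odds))
print(merged)  # [0, 2, 4, 2, 6, 10]
```

Note that tee buffers whatever one branch has read ahead of the other, so real streaming libraries handle backpressure more carefully than this.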

~~~
amelius
What if the program is constantly altering the plumbing beneath the streams?
Doesn't this kind of programming become a bit "messy" then?

~~~
ninjakeyboard
I don't know - maybe I don't see the argument clearly. If you want to keep
state, then look at the actor model. If you want to talk about functional
programming, then we can assume things are immutable.

If you want to deal with multiple streams of data, I think combinators do just
fine. If you want to choose which function to pass into higher order functions
at runtime, then indeed you can define functions and just reference them. If
you want to split data up, you can use all of the nice function composition
and pattern matching available in languages like scala and haskell.

To split a string and send half to upper and half to lower, you can use
tuples, or zip and unzip, partition, etc. Stuff like that. None of it seems
messy, but I'm probably not understanding this article correctly.
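The string example the thread keeps coming back to is indeed a one-liner with tuples in plain Python (a sketch; the function name is made up):

```python
def split_two_halves(s):
    # Return the string as a (first_half, second_half) tuple.
    mid = len(s) // 2
    return s[:mid], s[mid:]

first, second = split_two_halves("functional")
print(first.upper(), second.lower())  # FUNCT ional
```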

------
dwenzek
The post is interesting and highlights a real difficulty: it is not always
easy to chain map/filter/reduce transformations when the processed values are
compound and call for processing specific to each part.

Nonetheless, the proposed solution lives more in the component/object realm
than in the functional world.

As evidence, note the use of _nouns_ rather than _verbs_, as in:

    
    
        splitter.input = hi_generator.his
        lowercaser.input = splitter.left_part
        uppercaser.input = splitter.right_part
    

This is not a judgement, just a remark. A functional solution would rather use
transformation combinators to propagate a transformation to a specific part of
a value. For instance:

    
    
        input |> map split_two_halves
              |> map_fst to_lower
              |> map_snd to_upper
    

Or even better, use lenses as a generic way to unpack a specific part of a
data structure and to repack an updated value:

    
    
        input |> map split_two_halves
              |> map_lens fst to_lower
              |> map_lens snd to_upper

------
harlowja
A (Python) library that does a lot of this (the parallelism, data-flow, flow
construction from task units, remote execution, resumable execution...). I'm
one of the authors; it's used in various OpenStack projects (and more). This
article was very much along the same lines, so I thought it useful to share.

\-
[http://docs.openstack.org/developer/taskflow/](http://docs.openstack.org/developer/taskflow/)

\-
[https://wiki.openstack.org/wiki/TaskFlow](https://wiki.openstack.org/wiki/TaskFlow)

We recently had a brown-bag talk (with a community member from HP, and myself
from Yahoo) that is shareable and maybe useful to folks @
[https://docs.google.com/presentation/d/1EZoY4FE2SDjfCqMCgBRr...](https://docs.google.com/presentation/d/1EZoY4FE2SDjfCqMCgBRrwo7ovHF4vYZFGKjjvFvWHIE/)

------
iovrthoughtthis
This is a problem I've experienced (and partially solved for myself) when
working with streams of events in UIs in JS.

e.g.

    
    
      applyStoreFilterStream.recv(function () {
          // do some stuff
        })
        // when either removing a filter or adding a filter
        .compose.serial(removeStoreFilterStream, applyTypeFilterStream)
          .recv(removeAllElements)
          .recv(pager.reset.bind(pager))
          .recv(requestProductsStream.send);
    

This was using an in-house streams library (before I'd found RxJS).

I did find that, because there is only one line at a time within which to
express a dependency (from one stream to another), it was quite difficult to
conceptualize the graph structure being created. I think it would be
interesting to have editors that can interpret this sort of code into a
visible graph structure and, conversely, compile a graph structure down into
this form.

------
m_mueller
Please excuse my ignorance, but maybe someone can convince me that I'm wrong.

This article illustrates a problem I have with FP in general, coming from
procedural / OOP codebases: for the generator example, how can I be sure that
the initial generation of strings isn't getting executed twice? Will Python
analyse that I'm only ever reading from "left" or "right" within the gen_uc /
gen_lc loops, thus optimizing away the other side in the upstream generator?
So I'm basically trusting the runtime to optimize for me? Do we have
benchmarks showing the overhead of this runtime analysis? Does it work
consistently with JITed Pythons like PyPy?

So far I'm just always too skeptical that I could program myself into a
performance hole that requires complete rewrites to get out of, so I stick
with code where I know what will happen on the machine. Python for me is just
a toy language to try these concepts before diving into actually performant
ones, and not finding any significant Haskell/F#/... benchmarks that perform
as well as good C or Fortran code hasn't helped reduce my skepticism. IMO the
thinking of "correctness now, performance later" is a fallacy, in that the
choices I make in the design phase can lead to complete rewrites to get
something performant - in the worst case, switching languages.

~~~
icebraining
You can be sure it doesn't execute twice because the generation is lazy/on-
demand. When you create a generator, nothing actually gets done; it just sits
there waiting for you to pull elements from it, at which point it'll actually
generate one element (and wait for you to request another).

This is in contrast with list comprehensions, which have a similar syntax but
generate a full list immediately upon being declared.
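A quick illustration of that difference (a sketch; the record helper is made up to make the evaluation order visible):

```python
log = []

def record(x):
    # Side effect lets us observe when evaluation actually happens.
    log.append(x)
    return x

gen = (record(i) for i in range(3))   # generator expression: nothing runs yet
assert log == []

lst = [record(i) for i in range(3)]   # list comprehension: runs immediately
assert log == [0, 1, 2]

next(gen)                             # pulling one element runs one step
assert log == [0, 1, 2, 0]
```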

~~~
m_mueller
I know that this is true in general, but this case seems tricky to me. Look at
the following line:

> yield (s[len(s)//2:], s[:len(s)//2])

I'm aware that the generator is called lazily, but here each yield is actually
consumed _twice_ downstream - once with the left side thrown away, once with
the right side. Now, are you still sure that left and right are generated only
once?

~~~
icebraining
Oh, right, I missed that. No, it actually generates both every time. CPython
can't eliminate one of the sides, since the calls (either to len() or to the
slicing method of 's') might have side effects.

In this case, I think it'd make more sense to divide gen_splitted_his into two
functions (or one that accepts a parameter), each generating one side or the
other.
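A sketch of that parameterized variant (the function and the half-selection are reconstructed from this discussion, so treat the names as assumptions):

```python
def gen_half_his(strings, which):
    # which='left' yields the first half of each string, 'right' the second.
    # Only the requested half is computed, so nothing is generated and thrown away.
    for s in strings:
        mid = len(s) // 2
        yield s[:mid] if which == 'left' else s[mid:]

his = ['hi for the %d:th time!' % i for i in range(2)]
left_halves = list(gen_half_his(his, 'left'))
right_halves = list(gen_half_his(his, 'right'))
```

The trade-off is that feeding both halves to two consumers now means iterating the source twice, which is exactly the issue raised below.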

~~~
m_mueller
I have been toying around with this second Python example for a bit.

Here's my conclusion: not only is it inefficient, it doesn't even work. The
generator cannot be rewound after it has been iterated over for lowercase -
the uppercase version will never execute. You essentially run into the problem
described here:
[http://stackoverflow.com/questions/1271320/reseting-generator-object-in-python](http://stackoverflow.com/questions/1271320/reseting-generator-object-in-python)

It only works once you actually generate twice:

> lc_his = gen_lc_his(gen_splitted_his(('hi for the %d:th time!' % i for i in
> xrange(10))))

> uc_his = gen_uc_his(gen_splitted_his(('hi for the %d:th time!' % i for i in
> xrange(10))))

It seems to me for this toy example, generators are simply not the right tool.
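For completeness: itertools.tee is the standard workaround when one generator needs to feed two consumers, at the cost of buffering whatever one side has read ahead of the other (a sketch with hypothetical stand-ins for the article's functions):

```python
import itertools

def gen_splitted(strings):
    # Hypothetical stand-in for the article's gen_splitted_his.
    for s in strings:
        mid = len(s) // 2
        yield s[:mid], s[mid:]

splitted = gen_splitted('hi for the %d:th time!' % i for i in range(3))

# tee gives two independent iterators over the same underlying stream,
# so the source is only generated once.
for_lower, for_upper = itertools.tee(splitted)
lc = [left.lower() for left, _ in for_lower]
uc = [right.upper() for _, right in for_upper]
```

Consuming one branch fully before the other (as above) makes tee buffer the entire stream, so this only avoids recomputation, not memory use.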

~~~
icebraining
Yes, but I think the author realizes that, which is why (s)he wrote above:
_"this will produce an error, since running a generator function twice is not
allowed, and this would not re-use the same items in memory anyway, which is
kind of the main point with stream processing"_.

The solution being proposed at the end is actually not using generators at
all.

~~~
m_mueller
Oh I see, I missed that this "limitation" also applies to his DPI example
that I was toying with.

------
stuartaxelowen
If you're interested in this kind of stuff in python, you should check out
datastreams:
[https://github.com/StuartAxelOwen/datastreams](https://github.com/StuartAxelOwen/datastreams)

~~~
msellout
Why is `datastreams` better than using generators and comprehensions, plus the
itertools module in the standard library?

~~~
stuartaxelowen
It makes working with iterators easy - this way you're always talking about
one thing at a time, instead of the whole collection/iterator, making
efficient computation easy.

Also, the abstraction allows for optimizations like delegating to Apache Spark
or other processes/threads.

------
dudul
How is functional programming not already successful? All new languages claim
to be functional, and old languages are adding functional features. FP is seen
as the best way to use multicore hardware and keep improving software.

How is it not successful already?

~~~
lkrubner
I think they meant "not successful" in the sense of not showing up on popular
indexes, such as:

[http://www.tiobe.com/index.php/content/paperinfo/tpci/index....](http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html)

I'm only pointing to TIOBE because it is popular, not because it is accurate,
but you could use any other popular index and you would get similar results: a
whole lot of imperative languages, but none of the officially functional
languages.

------
choudanu4
Considering the philosophy of Python, how likely is it that this idea of
return/parameter independence would be introduced into the language (via a PEP
or whatnot)?

