Data flow syntax: The missing piece for the success of functional programming?

ff_ · on July 16, 2015

Reading the title I was expecting some Lisp/Haskell...

While you can do that, Python doesn't make your life easier if you want to go functional, really. There's only map/reduce/filter, one-line lambdas (that get really messy really fast), no currying, etc. (however, comprehensions are fine)

I appreciate the effort of the OP in questioning the fact of dataflow implementations, and I suggest him to further explore the world of functional languages - it's huge and resourceful, while the quality of the content written is surprisingly high on average - as he may even find a working implementation for the problem he was trying to solve (happens way too often to me).

BTW, a quick google brought me to this [1] - I'm not sure if it's exactly what we are talking about, but seems related.

[1]: https://github.com/kennytilton/cells

samuell · on July 16, 2015

Well, I've done a tiny bit of lisp/scheme too, but felt that lazy-evaluated generators (which I happen to know the best in python) are a bit more "state of the art" FP than the call/return semantics I've seen in lisp/scheme (Clojure has async.io though, which is a whole other story, much more data flow like, AFAIK).

Thanks for the cells link, will have to check!

scott_s · on July 17, 2015

I work on IBM Streams, which has its own language, SPL (Streams Processing Language), which solves the same problem. You can check out a small SPL application here: https://github.com/IBMStreams/streamsx.demo.logwatch

And a direct link to the SPL source code for the application: https://github.com/IBMStreams/streamsx.demo.logwatch/blob/ma...

strmpnk · on July 16, 2015

I'd say Elm goes a long ways toward practical dataflow syntax and operation in its rather small language. Making a series of comprehensions lazy doesn't quite go far enough to make composition natural. While it seems like there are ways to improve it, the Signal.Mailbox record type and the Elm architecture have surprisingly good composition properties.

samuell · on July 16, 2015

That's interesting! I've been watching Elm a small bit, but hadn't seen that they also have the concept of "ports", which is so useful in FBP! (The "example.ports.prices.send(42);" part in http://elm-lang.org/docs/syntax )

ninjakeyboard · on July 16, 2015

Have you looked at reactive streams? There are some interesting semantics for splitting and combining data and only describing the different flows independently that handle the data as it flows through.

http://doc.akka.io/docs/akka-stream-and-http-experimental/1....

There are split semantics where the tuples are simply fed into other flows. And then merge semantics where flows recombine into a single flow for further processing.

samuell · on July 17, 2015

Yes, I have been checking reactive streams, but they seem to take a similar approach as most other functional languages supporting chained lazy-evaluated functions.

Even though they have specialized functions and constructs for branching out and merging, I'd argue that being limited to chaining functions, IMO it quickly grows unmaneagable when dealing with any more complex data flow networks. AFAIS, you'll get super-long chains of function calls, that are really hard to change, or inspect.

Compare that with the syntax proposed in the post, or some other FBP syntax like the one in GoFlow [1], where you define each connection between an outport and an inport on one line. IMO this gets exponentially more manageable, readable and maintainable.

[1] https://github.com/trustmaster/goflow#basic-example

amelius · on July 16, 2015

What if the program is constantly altering the plumbing beneath the streams? Isn't this kind of programming not becoming a bit "messy" then?

ninjakeyboard · on July 16, 2015

I don't know - maybe I don't see the argument clearly. If you want to keep state, then look at the actor model. If you want to talk about functional programming, then we can assume things are immutable.

If you want to deal with multiple streams of data, I think combinators do just fine. If you want to choose which function to pass into higher order functions at runtime, then indeed you can define functions and just reference them. If you want to split data up, you can use all of the nice function composition and pattern matching available in languages like scala and haskell.

To split a string, take half to upper and half to lower, you can use tuples, or zip and unzip, partition etc. Stuff like that. None of it seems messy but I'm probably not understanding this article correctly.

nightski · on July 16, 2015

Instead of creating a mutable set of plumbing, you can create an equal design that has static plumbing with altering flows based on data. Maintaining the latter is going to be a lot easier than the former anyways.

dwenzek · on July 17, 2015

The post is interesting and highlights a true fact : it is not always easy to chain map/filter/reduce transformations when the processed values are compound and call for processing that are specific to each part.

Nonetheless, the proposed solution is most in the component/object realm rather in the functional world.

As an evidence the use of nouns rather verbs as in :

    splitter.input = hi_generator.his
    lowercaser.input = splitter.left_part
    uppercaser.input = splitter.right_part

This is not a judgement, just a remark. A functional solution would rather use transformation combinators to propagate a transformation on a specific value part. For instance :

    input |> map split_two_halves
          |> map_fst to_lower
          |> map_snd to_upper

Or even better, use lenses as a generic way to unpack a specific part of a data structure and to repack an updated value:

    input |> map split_two_halves
          |> map_lens fst to_lower
          |> map_lens snd to_upper

harlowja · on July 17, 2015

A (python) library that does alot of this (the parallelism, data-flow, flow construction from task units, remote execution, resumable execution...). I'm one of the authors, it's used in various openstack projects (and more)... This article was very much along the ideas of it, so thought it useful to share.

- http://docs.openstack.org/developer/taskflow/

- https://wiki.openstack.org/wiki/TaskFlow

We recently had a brown-bag (with a community member from HP and myself from yahoo) that is shareable and maybe useful to folks @ https://docs.google.com/presentation/d/1EZoY4FE2SDjfCqMCgBRr...

iovrthoughtthis · on July 17, 2015

This is a problem I've experienced (and partially solved for my self) when working with streams of events on UI's in js.

e.g.

  applyStoreFilterStream.recv(function () {
      // do some stuff
    })
    // when either removing a filter or adding a filter
    .compose.serial(removeStoreFilterStream, applyTypeFilterStream)
      .recv(removeAllElements)
      .recv(pager.reset.bind(pager))
      .recv(requestProductsStream.send);

This was using an in house streams library (before i'd found Rxjs).

I did find that, because there is only one line at a time within which to express a dependency (from one stream to another) it was quite difficult to conceptualize the graph structure being created. I think perhaps having editors that can interpret this sort of code into a visible graph structure and conversely, compile a graph structure down into this form, would be interesting.

m_mueller · on July 17, 2015

Please excuse my ignorance, but maybe someone can help me convince that I'm wrong.

This article illustrates a problem I have with FP in general, coming from procedural / OOP codebases: For the generator example, how can I be sure that the initial generation of strings isn't getting executed twice? Will python analyse that I'm only ever reading from "left" or "right" within the gen_uc / gen_lc loops, thus optimizing away the the other side in the upstream generator? So I'm basically trusting the runtime to optimize for me? Do we have benchmark showing the overhead for this runtime analysis? Does it work consistently with JITed pythons like pypy?

So far I'm just always too skeptical that I could program myself into a performance hole that require complete rewrites to get out of, so I stick with code where I know what will happen on the machine. Python for me is just the toy language to try these concepts before diving into actually performant ones, but not finding any significant Haskell/F#/.. benchmarks that can perform as well as good C or Fortran code, hasn't helped in reducing my skepticism. IMO the thinking of "correctness now, performance later" is a fallacy in that the choices I'm making in the design phase can lead to complete rewrites in order to get something performant - in the worst case switching between languages.

icebraining · on July 17, 2015

You can be sure it doesn't execute twice because the generation is lazy/on-demand. When you create a generator, nothing gets actually done, it just sits there waiting for you to pull elements from it, and which point it'll actually generate one element (and wait for you to request another).

This is in contrast with list comprehensions, which have a similar syntax, but which generate a full list immediately after being declared.

m_mueller · on July 17, 2015

I know that this is true in general, but this case here seems tricky to me. Look at the following line:

> yield (s[len(s)//2:], s[:len(s)//2])

I'm aware that the generator is being called lazily, but here we're actually calling each yield twice - once the left side is thrown away, once the right side downstream. Now, are you still sure that left and right is generated only once?

icebraining · on July 17, 2015

Oh, right, I missed that. No, it's actually generating both every time. CPython can't eliminate one of the sides, since the calls (either to len() or to the slicing method of 's') might have side effects.

In this case, I think it'd make more sense to divide gen_splitted_his into two functions (or one which accepted a parameter), for generating one or the other.

m_mueller · on July 17, 2015

I have been toying around now with this second python example for a bit.

Here's my conclusion: Not only is it not efficient, it doesn't even work. The generator cannot be rewound after it has been iterated over for lowercase - the uppercase version will never be executed. You essentially run into the problem described here: http://stackoverflow.com/questions/1271320/reseting-generato... .

It only works once you actually generate twice

> lc_his = gen_lc_his(gen_splitted_his(('hi for the %d:th time!' % i for i in xrange(10))))

> uc_his = gen_uc_his(gen_splitted_his(('hi for the %d:th time!' % i for i in xrange(10))))

It seems to me for this toy example, generators are simply not the right tool.

icebraining · on July 17, 2015

Yes, but I think the author realizes that, which is why (s)he wrote above "this will produce an error, since running a generator function twice is not allowed, and this would not re-use the same items in memory anyway, which is kind of the main point with stream processing".

The solution being proposed at the end is actually not using generators at all.

m_mueller · on July 17, 2015

Oh I see, I missed that this "limitation" also applies for his DPI example that I was toying with.

stuartaxelowen · on July 17, 2015

If you're interested in this kind of stuff in python, you should check out datastreams: https://github.com/StuartAxelOwen/datastreams

msellout · on July 17, 2015

Why is `datastreams` better than using generators and comprehensions, plus the itertools module in the standard library?

stuartaxelowen · on July 17, 2015

It makes working with iterators easy - this way you're always talking about one thing at a time, instead of the whole collection/iterator, making efficient computation easy.

Also, the abstraction allows for optimizations like delegating to Apache Spark or other processes/threads.

dudul · on July 17, 2015

How is functional programming not already successful? All new languages claim to be functional, old languages add functional features. FP is seen as the best way to use multicore and keep improving software.

How is it not successful already?

lkrubner · on July 17, 2015

I think they meant "not successful" in the sense of not showing up on popular indexes, such as:

http://www.tiobe.com/index.php/content/paperinfo/tpci/index....

I'm only pointing to Tiobe because it is popular, not because it is accurate, but you could use any other popular index and you would get similar results: a whole lot of imperative languages, but none of the officially Functional languages.

choudanu4 · on July 17, 2015

Considering the philosophy of Python, how likely is it that this idea of return/parameter independence be introduced into the language (via a PEP or whatnot)?