
What advantage does a dataflow language have over a functional language? And how is laziness handled (i.e., can a dataflow language end up sending tokens unnecessarily)? Is it easy to memoize in a dataflow language, to avoid performing the same computation twice?



I don't want to speak for all of dataflow-dom, but the main difference that I see (and exploit) is that data-parallel dataflow languages isolate control flow into small independent regions, leaving the larger computation data-driven. This does make things eager rather than lazy, but it also makes them much easier to parallelize (because of that independence) and much easier to incrementalize.[0]

[0]: https://github.com/frankmcsherry/differential-dataflow
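To make "data-driven, with control flow isolated inside operators" concrete, here is a minimal Python sketch of a push-based dataflow graph. All names and the wiring API are invented for illustration; real systems like differential-dataflow are far more sophisticated (timestamps, incremental updates, parallel workers).

```python
# Minimal sketch of a push-based, data-driven dataflow graph.
# Each operator owns its own tiny piece of control flow; the overall
# computation is driven by whatever records arrive at the edges.
# All names here are illustrative, not from any particular system.

class Operator:
    def __init__(self, fn, downstream=None):
        self.fn = fn                  # per-record logic, isolated in this operator
        self.downstream = downstream  # next operator, or None at the graph edge
        self.received = []            # terminal operators just collect records

    def push(self, record):
        for out in self.fn(record):   # fn yields zero or more output records
            if self.downstream:
                self.downstream.push(out)
            else:
                self.received.append(out)

# Build a small pipeline: keep evens -> double -> collect.
sink = Operator(lambda x: [x])
double = Operator(lambda x: [x * 2], sink)
evens = Operator(lambda x: [x] if x % 2 == 0 else [], double)

for token in [1, 2, 3, 4]:  # data arrival drives the whole computation
    evens.push(token)

print(sink.received)  # -> [4, 8]
```

Because each operator's logic is independent and per-record, nothing here would change if records arrived on different workers or in a different order, which is what makes this style easy to parallelize.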


A dataflow language overlaps with functional programming in many respects but has one key difference - components are defined to allow some form of many-in, many-out, rather than a mathematical function's many-in, one-out. This change in syntax - which comes with design concerns around how many-out manifests (e.g. a static number of outputs vs. automatic copying) - has large effects on programming style. It limits how cleanly one can compose dataflow modules and multiplies the opportunities for type mismatches, but it also affords a richer definition of module responsibilities and relationships.
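A small sketch of the many-in, many-out shape, assuming an invented component API: a splitter receives records on one input port and routes each record to one of two named output ports, something a many-in, one-out function can't express directly without bundling the outputs into a single return value.

```python
# Sketch of a dataflow component with named output ports.
# Port names ("accept"/"reject") and the wiring are invented
# for illustration.

class Splitter:
    """One input port, two output ports: 'accept' and 'reject'."""
    def __init__(self, predicate):
        self.predicate = predicate
        # Lists stand in for downstream channels wired to each port.
        self.outputs = {"accept": [], "reject": []}

    def receive(self, record):
        port = "accept" if self.predicate(record) else "reject"
        self.outputs[port].append(record)

s = Splitter(lambda x: x > 0)
for r in [3, -1, 7, 0]:
    s.receive(r)

print(s.outputs["accept"])  # -> [3, 7]
print(s.outputs["reject"])  # -> [-1, 0]
```

Note the composition concern from above: any module wired to the "accept" port must agree on the record type, and the same check has to hold per-port, which is where type mismatches multiply.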

Memoization is one of several possible optimizations. Lazy evaluation can appear, after a fashion, when a module blocks on processing an input from one channel until its other input channels also have data, but in e.g. the J. Paul Morrison definition, there's a finite dataset, and data neither gets dropped on the floor nor stored in components. Eager is the default. If component implementations have broad control over when they process I/O, though, they can do some fairly substantial optimizations internally.
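One of those internal optimizations can be memoization: a sketch, assuming an invented component that caches its results keyed by the input token, so repeated tokens are answered from the cache instead of recomputed.

```python
# Sketch: memoization inside a dataflow component. The component
# caches results keyed by the incoming token; a repeated token is
# served from the cache rather than recomputed. Names are invented.

class MemoComponent:
    def __init__(self, fn):
        self.fn = fn
        self.cache = {}
        self.computed = 0  # count real evaluations, to make cache hits visible

    def receive(self, token):
        if token not in self.cache:
            self.cache[token] = self.fn(token)
            self.computed += 1
        return self.cache[token]

m = MemoComponent(lambda x: x * x)
results = [m.receive(t) for t in [2, 3, 2, 2]]
print(results)     # -> [4, 9, 4, 4]
print(m.computed)  # -> 2 (only 2 and 3 were actually evaluated)
```

This only works when the component's logic is a pure function of the token, which is exactly the property dataflow components tend to have inside their small, isolated regions of control flow.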

Dataflow works best as a mid-level system, controlled by and controlling other code in other styles. It can model the computing universe ground-up as a kind of electronics hardware in abstracted form (pure logic gates, no physical constraints), but this is a misuse of it. There are languages that express control flow and arithmetic more succinctly, and the complexity of component-relationship graphs in dataflow precludes using it for boilerplate of that type. It works best, instead, when the components are large enough to perform substantial functionality but need some form of dynamic configuration.

Analogizing again to electronics, modular synthesis is the canonical example of dataflow in action. A modular synthesizer contains a set of building blocks - the modules - each of which turns some number of input voltages into output voltages, sometimes with physical knobs and switches on the panel. Patch cables connect the modules together, allowing a great deal of flexibility over the final signal path. In software, by comparison, a modular system operates on chunks of digital samples instead of voltages over time. The principles are the same for the end-user, while the implementation may be completely different underneath.
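The software version can be sketched like this: each "module" maps blocks of samples to blocks of samples, and a "patch" is just how the modules are wired together. The block size, sample rate, and module names below are arbitrary choices for the sketch.

```python
# Sketch of software synth "modules" processing blocks of samples,
# wired together like patch cables. All names and constants are
# illustrative.

import math

BLOCK = 8      # samples per block
RATE = 8000    # sample rate in Hz, arbitrary for the sketch

def osc(freq, start):
    """Oscillator module: produce one block of a sine wave."""
    return [math.sin(2 * math.pi * freq * (start + n) / RATE)
            for n in range(BLOCK)]

def gain(block, amount):
    """Gain module: scale every sample in the block."""
    return [s * amount for s in block]

def mix(a, b):
    """Mixer module: sum two blocks sample by sample."""
    return [x + y for x, y in zip(a, b)]

# "Patch": two oscillators into a mixer, then attenuate the result.
out = gain(mix(osc(440, 0), osc(660, 0)), 0.5)
print(len(out))  # -> 8
```

Rewiring the patch is just changing which module's output feeds which module's input - the dataflow graph is the program, exactly as with patch cables on the panel.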

One should also note that analog synthesizers started incorporating digital CPUs as soon as it was affordable, for their most state-heavy functionality: tracking keypresses over time, and storing and recalling patch data. This gives some idea of where dataflow techniques trail off and an imperative style starts to shine - hence why I call it a "mid-level" technique, not something that describes the whole system.



