
Timely Dataflow in Fifteen Minutes [video] - mpweiher
https://www.youtube.com/watch?v=yOnPmVf4YWo
======
andyferris
This video is really cool. I’ve been following dataflow approaches for a
while, including some of Frank McSherry’s (usually enjoyable) articles. None
of the comments mention [https://materialize.io](https://materialize.io) so I
may as well (an open source commercial offering based on these concepts).

Watching this explanation I’m slightly curious whether things like Materialize
and Noria are a bit limited, in that this could be a paradigm for an actual
functional reactive _programming language_ rather than specifically a “data”
thing. It appears to have the structure of nested contexts (loops, scopes,
etc.) advocated by structured programming (i.e. “goto considered harmful”). It
can reliably calculate an answer at each point in time for each state of
input, concurrently and with parallelism, even if there are multiple inputs
with their own notion of time (not covered in the video). That’s, like, the
holy grail of PLT these days, isn’t it? Or am I missing something?

~~~
pritambaral
> ... an open source commercial offering ...

I just looked at their license[1] and it doesn't appear to be open source at
the moment.

[1] [https://github.com/MaterializeInc/materialize/blob/master/LI...](https://github.com/MaterializeInc/materialize/blob/master/LICENSE)

~~~
benesch
Yes, we consider ourselves an “open core” company. Timely and differential,
the core compute engine, are fully open source projects, but the Materialize
layer atop is licensed under the “Business Source License” (BSL).

We think the BSL strikes a good balance between giving back to the
community—four years after every release, the code is automatically relicensed
under Apache 2—and ensuring we can build a viable business. And you’re free to
use Materialize for any purpose in a non-distributed (i.e., single node)
deployment without paying for an enterprise license.

~~~
pritambaral
Wasn't complaining about your model; just correcting the parent's usage of
the terms.

------
dmos62
There are a few relevant repositories here
[https://github.com/TimelyDataflow](https://github.com/TimelyDataflow)
including two Rust implementations.

~~~
FridgeSeal
The readme for the Abomonation repo makes me laugh every time.

------
rustybolt
In my opinion dataflow is the only true representation of computations. Unlike
normal code, it represents dependencies and parallel computation perfectly.
Because of this, it is also a great basis for a hardware implementation.

~~~
BubRoss
Graphs are good for seeing dependencies and ordering, but not great for
branching and looping.

~~~
DonaldFisk
Although in conventional languages with control flow, we're more used to how
they're done, both branching and looping can be done straightforwardly in
dataflow without introducing any non-dataflow constructs - just graphs with
vertices connected by edges.

Branching:
[http://www.fmjlang.co.uk/fmj/tutorials/Conditional.html](http://www.fmjlang.co.uk/fmj/tutorials/Conditional.html)

Looping:
[http://www.fmjlang.co.uk/fmj/tutorials/Iteration.html](http://www.fmjlang.co.uk/fmj/tutorials/Iteration.html)

Emit and collect:
[http://www.fmjlang.co.uk/fmj/tutorials/Macros.html](http://www.fmjlang.co.uk/fmj/tutorials/Macros.html)

These show the basic idea. There are many other examples throughout the
tutorials
([http://www.fmjlang.co.uk/fmj/tutorials/TOC.html](http://www.fmjlang.co.uk/fmj/tutorials/TOC.html)).
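
For readers who prefer text to pictures, the same idea can be sketched as a toy token-passing graph. This is only an illustration under stated assumptions: it is not FMJ's notation or semantics, and the `run` helper and the vertex names (`switch`, `body`, `exit`, `out`) are hypothetical. Branching is a vertex that forwards a token along exactly one outgoing edge; looping is an edge that feeds back into an earlier vertex.

```python
from collections import deque


def run(graph, source, token):
    """Push tokens along edges until none are left in flight.

    graph maps a vertex name to a function that takes a token and returns a
    list of (next_vertex, token) pairs. Tokens sent to "out" are collected.
    """
    results = []
    in_flight = deque([(source, token)])
    while in_flight:
        vertex, tok = in_flight.popleft()
        if vertex == "out":
            results.append(tok)
            continue
        in_flight.extend(graph[vertex](tok))
    return results


# Branching: the switch vertex routes each token down exactly one edge.
branch_graph = {
    "switch": lambda n: [("even", n)] if n % 2 == 0 else [("odd", n)],
    "even": lambda n: [("out", f"{n} is even")],
    "odd": lambda n: [("out", f"{n} is odd")],
}

# Looping: the body vertex has a feedback edge to the switch.
# The token is (remaining, accumulator); the graph computes a factorial.
loop_graph = {
    "switch": lambda t: [("body", t)] if t[0] > 1 else [("exit", t)],
    "body": lambda t: [("switch", (t[0] - 1, t[1] * t[0]))],
    "exit": lambda t: [("out", t[1])],
}

print(run(branch_graph, "switch", 7))     # ['7 is odd']
print(run(loop_graph, "switch", (5, 1)))  # [120]
```

Nothing outside vertices and edges is needed here: the only "control flow" is which edge a token is placed on.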

BTW, in case anyone's wondering why there have been no recent updates to the
pages on my visual dataflow language, it's because the many improvements I've
been making, particularly big changes to the type system, have required a lot
more work than I expected. I haven't abandoned work on it, but it will still
be some time before it's ready for release.

~~~
BubRoss
Those show that it's possible, not necessarily that it's a good interface for
building those parts of programs. Houdini's shader language has had branching
done in a dataflow graph for a long time. TouchDesigner has a kind of looping
construct too. You might want to take a look at these domain-specific
interfaces if you are doing your own graph; they are well done.

Fundamentally though, the density of text expressions exceeds a data flow
graph by a huge margin. If what is being done isn't fundamentally a directed
acyclic graph, visualizing it with a graph becomes more difficult to absorb
than the expressions as text.

------
throwaway8291
I looked at the dataflow paradigm a couple of years ago. Back then I thought
that the difference from just "ordinary" functions is not that big, and for
performance (which is important for my data work), you do not want to deviate
from the traditional way too much.

Has anyone felt the same, or can anyone provide a real-world problem where
dataflow actually works better than other solutions?

~~~
FridgeSeal
There’s this: [https://github.com/mit-pdos/noria](https://github.com/mit-pdos/noria)

It’s like a cache, except it keeps itself in sync with the database
automatically. It generates “materialised views” using dataflows based on the
queries that get asked of it, and will automatically generate a new one if
someone makes a query it doesn’t already have a dataflow for. Parts of
dataflows can also be shared across views.
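
As a rough illustration of the "cache that keeps itself in sync" idea, here is a minimal sketch of a view that is updated by applying each write as a delta rather than by re-running the query. It is a toy under stated assumptions, not Noria's API (Noria is written in Rust): the `Table` and `CountView` classes and the vote-count example are hypothetical, and the sketch leaves out query-driven dataflow generation and sharing between views.

```python
from collections import defaultdict


class CountView:
    """Hypothetical materialised view: number of votes per article."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_insert(self, row):
        # Apply only the delta instead of re-running the whole query.
        self.counts[row["article"]] += 1

    def on_delete(self, row):
        self.counts[row["article"]] -= 1

    def read(self, article):
        # Reads are a dictionary lookup; no scan of the base table.
        return self.counts[article]


class Table:
    """Hypothetical base table that forwards every change to its views."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, row):
        self.rows.append(row)
        for view in self.views:
            view.on_insert(row)

    def delete(self, row):
        self.rows.remove(row)
        for view in self.views:
            view.on_delete(row)


votes = Table()
view = CountView()
votes.views.append(view)

votes.insert({"article": "a", "user": 1})
votes.insert({"article": "a", "user": 2})
votes.insert({"article": "b", "user": 1})
votes.delete({"article": "a", "user": 1})

print(view.read("a"), view.read("b"))  # 1 1
```

The point of the design is that the cost of a write is proportional to the change, while a read never touches the base table.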

The paper linked from the GitHub repo goes into detail about the performance
gains, but it easily outperforms straight database calls and caching setups.

------
thereyougo
So many talented teachers out there. I'm glad you shared this video. This guy
deserves more views on his videos.

~~~
FridgeSeal
His papers are fascinating as well. The COST paper especially changed how I
thought about a lot of problems.

------
arendtio
Somehow the 'Hello World' example reminds me of

    $ printf 'Hello World' | awk '{print $2}' | tr '[A-Z]' '[a-z]' | wc -w | cat

Just kidding ;-)

