
An Old Conjecture on Stream Transducers - mr_tyzic
https://www.pvk.ca/Blog/2018/06/24/an-old-conjecture-on-stream-transducers/
======
Cieplak
Reminds me of a stream-processing library that Michael Snoyman recently
started working on:

[https://github.com/snoyberg/zerem](https://github.com/snoyberg/zerem)

For background, he authored Conduit, one of the most popular Haskell stream-
processing libraries:

[https://github.com/snoyberg/conduit](https://github.com/snoyberg/conduit)

~~~
fjsolwmv
So does conduit fall short?

------
mrmyers
oleg and friends claim to have disproven the 'pick two' thing. See their
'Stream Fusion, to Completeness' paper [1], and the associated Strymonas
library [2].

[1] [https://yanniss.github.io/streams-
popl17.pdf](https://yanniss.github.io/streams-popl17.pdf)

[2] [https://strymonas.github.io/](https://strymonas.github.io/)

~~~
Veedrac
That's not what I recall. Having a quick reskim, I don't believe they can
handle the shared mutability in `F(S(X), X)`, so would end up duplicating the
work to generate the elements of `X`. This problem looks unavoidable to me; if
you are computing `f(x[n], x[2*n])` you need both points in memory
simultaneously.

------
junke
> This issue would also appear if we forced stream transducers (processing
> nodes) to output a fixed number of value at each invocation: it suffices to
> let S repeat each value of X twice, i.e., interleave X with X.

I don't understand how the issue would appear in that case. Buffering one
value seems to be sufficient to upsample S to the "clock" of X, ie. project S
to the instants where X is defined (flow Z below) while filling missing
instants with the previously known value:

    
    
       X :  X0     X1    X2    X3   ...
       S :  S0           S1    
    
       Z :  S0     S0    S1    S1
    

That would look as follows in Signal:

    
    
        process test = (? integer x ! integer y)
          (|   c := not (c$ init false)
           |   s := (x when c) cell c
           |   y := x + s
           |)
           where
             boolean c;
             integer s;
           end;
    

C is the clock which alternates between false and true ($ refers to the last
value), "X when C" is filtering, "E cell C" is projection of values of E to
the instant of C, defaulting to the previous known value at logical instants
where E is not present but C is true.

The clock calculus in Signal (etc.) is way to assess if a program can be
generated to use constant (static) memory allocation and constant time. There
are programs which fail to do so, but you then need to refine them to add
explicit operations
([https://www.irisa.fr/prive/talpin/papers/date08.pdf](https://www.irisa.fr/prive/talpin/papers/date08.pdf))

~~~
pkhuong
This would definitely work if S didn't produce its output from X. Given that
it does, I don't see how the strategy above can implement

    
    
        f(x[0], x[0])
        f(x[0], x[1])
        f(x[1], x[2])
        f(x[1], x[3])
        f(x[2], x[4])
        f(x[2], x[5])
        ...
    

with a single pass over x.

~~~
junke
Thanks for the explanation. I see now that X is read at different rates from
different consumers. You need to have a buffer that might grows infinitely in
the general case (I remember reading about CSDF, cyclo-static dataflow graphs,
where the buffer size is (can be?) bounded).

------
phaedrus
This is interesting because I have (or had) been working on a C++ logic
programming library. Many if not most similar libraries are implemented using
streams. (See minikanren, LC++, etc.) Naturally as a C++ library I'm
interested in if it can be done without or with a minimum of dynamic
allocation, and had come to the conclusion "probably yes, but only if I give
up on some subset of the things I wanted it to be able to do." So I concur
with the article's opening, "choose 2 of 3."

------
carapace
Can anyone think up a realistic scenario where you might actually have this
problem? The example seems awfully contrived.

