Hacker News new | comments | show | ask | jobs | submit login
An Old Conjecture on Stream Transducers (pvk.ca)
36 points by mr_tyzic 36 days ago | hide | past | web | favorite | 9 comments



Reminds me of a stream-processing library that Michael Snoyman recently started working on:

https://github.com/snoyberg/zerem

For background, he authored Conduit, one of the most popular Haskell stream-processing libraries:

https://github.com/snoyberg/conduit


So does conduit fall short? Apparently it wastes a lot of CPU?

Do Zerem and Conduit have compatible APIs?


oleg and friends claim to have disproven the 'pick two' thing. See their 'Stream Fusion, to Completeness' paper [1], and the associated Strymonas library [2].

[1] https://yanniss.github.io/streams-popl17.pdf

[2] https://strymonas.github.io/


That's not what I recall. Having a quick reskim, I don't believe they can handle the shared mutability in `F(S(X), X)`, so would end up duplicating the work to generate the elements of `X`. This problem looks unavoidable to me; if you are computing `f(x[n], x[2*n])` you need both points in memory simultaneously.


> This issue would also appear if we forced stream transducers (processing nodes) to output a fixed number of value at each invocation: it suffices to let S repeat each value of X twice, i.e., interleave X with X.

I don't understand how the issue would appear in that case. Buffering one value seems to be sufficient to upsample S to the "clock" of X, ie. project S to the instants where X is defined (flow Z below) while filling missing instants with the previously known value:

   X :  X0     X1    X2    X3   ...
   S :  S0           S1    

   Z :  S0     S0    S1    S1
That would look as follows in Signal:

    process test = (? integer x ! integer y)
      (|   c := not (c$ init false)
       |   s := (x when c) cell c
       |   y := x + s
       |)
       where
         boolean c;
         integer s;
       end;
C is the clock which alternates between false and true ($ refers to the last value), "X when C" is filtering, "E cell C" is projection of values of E to the instant of C, defaulting to the previous known value at logical instants where E is not present but C is true.

The clock calculus in Signal (etc.) is way to assess if a program can be generated to use constant (static) memory allocation and constant time. There are programs which fail to do so, but you then need to refine them to add explicit operations (https://www.irisa.fr/prive/talpin/papers/date08.pdf)


This would definitely work if S didn't produce its output from X. Given that it does, I don't see how the strategy above can implement

    f(x[0], x[0])
    f(x[0], x[1])
    f(x[1], x[2])
    f(x[1], x[3])
    f(x[2], x[4])
    f(x[2], x[5])
    ...
with a single pass over x.


Thanks for the explanation. I see now that X is read at different rates from different consumers. You need to have a buffer that might grows infinitely in the general case (I remember reading about CSDF, cyclo-static dataflow graphs, where the buffer size is (can be?) bounded).


This is interesting because I have (or had) been working on a C++ logic programming library. Many if not most similar libraries are implemented using streams. (See minikanren, LC++, etc.) Naturally as a C++ library I'm interested in if it can be done without or with a minimum of dynamic allocation, and had come to the conclusion "probably yes, but only if I give up on some subset of the things I wanted it to be able to do." So I concur with the article's opening, "choose 2 of 3."


Can anyone think up a realistic scenario where you might actually have this problem? The example seems awfully contrived.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: