Rich Hickey – Inside Transducers [video] (youtube.com)
112 points by tsm on Nov 21, 2014 | 49 comments



Even if you forget entirely about Clojure for a second, Rich has a very rare gift for taking a complicated subject and making it easy for the audience to understand.


This is a double-edged sword. Forgive me a small anecdote: I once had a wonderful tutor who could explain any advanced concept in highly intuitive terms and it just made sense... until I got out of the classroom. At which point I'd just have this feeling of "Hang on, whaaaaaa...?". The first time I attempted the exam in this particular subject matter, I failed miserably (rightly so). Having learned my lesson, I went back to study the actual source material more thoroughly instead of mostly just listening to the probably-best-educator on the subject. That time I actually understood the material and passed with flying colours. (I still think the particular educator had a major role in me passing at all, but I digress.)

Take that for what you will, but please be aware that a "gift for simplification" is sometimes a double-edged sword and can leave the audience thinking that they've understood when, actually, they haven't.

I'm not saying that's the case here, just something to be wary of.


The difference here is that Rich Hickey is explaining a fundamental computing mechanism that many programmers have encountered but few think about enough to give a name: the finite state transducer [see Wikipedia].

The name "transducers" is not an invention, it's basic computer science. The idea of using transducers to abstract over an input is not really controversial in the sense that it's the essence of what turns source into executable code on digital computers. Sure, there are tradeoffs relative to which flavor of Turing machine underpins any abstraction, but at least this is an area where we can achieve clarity.

This is something Hickey implemented, not invented. Like all of automata, understanding the concept is likely non-trivial for some pretty smart people since it requires thinking about first principles of computing.


I've had similar experiences many times, and I don't know what's best: staring at an abstraction for weeks until you get the "ohh, they just meant this" moment, or knowing in advance that it's not that obscure and then working to imprint said abstraction more deeply.


Ever since that experience, I've tried to practice working with "X" in some concrete way, even though it might feel like a classic "math problem" situation. Personally, I find it helps me get a grasp on "X", whatever it might be. (Mind you, this is just personal experience, so YMMV.)


For those not familiar with Clojure, here's a great demonstration of the concept done up in JavaScript: http://jlongster.com/Transducers.js--A-JavaScript-Library-fo...


From that link:

"The reduce function is the base transformation; any other transformation can be expressed in terms of it (map, filter, etc)."

This seems so obvious in retrospect -- I can't believe I had never made that connection before.
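A quick sketch of the quoted claim in JavaScript (names here are illustrative, not from the linked library): map and filter written as nothing but reductions.

```javascript
// map expressed purely in terms of reduce: transform each element,
// accumulating the results into a fresh array.
function mapViaReduce(f, xs) {
  return xs.reduce((acc, x) => {
    acc.push(f(x));
    return acc;
  }, []);
}

// filter expressed purely in terms of reduce: keep only the elements
// that satisfy the predicate.
function filterViaReduce(pred, xs) {
  return xs.reduce((acc, x) => {
    if (pred(x)) acc.push(x);
    return acc;
  }, []);
}

console.log(mapViaReduce(x => x + 1, [1, 2, 3]));             // → [2, 3, 4]
console.log(filterViaReduce(x => x % 2 === 0, [1, 2, 3, 4])); // → [2, 4]
```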


Yeah. That's actually how you implement lists in lambda calculus, as opaque functions that accept a "visitor". There are two different ways of doing it:

    -- Mogensen-Scott encoding
    data ScottList a = ScottList (forall r. (a -> ScottList a -> r) -> r -> r)

    -- Boehm-Berarducci encoding
    data ChurchList a = ChurchList (forall r. (a -> r -> r) -> r -> r)
Roughly, the first encoding gives you pattern matching, and the second gives you foldr (reduce). Either of these operations is sufficient to do anything with the list.

Also note that both of these are encodings of lazy (potentially infinite) lists. To encode strict (guaranteed finite) lists, you really need algebraic data types like in ML, the visitor pattern can't do that.
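For the curious, the second (Boehm-Berarducci) encoding translates almost directly into JavaScript. A toy sketch: the list is represented by its own foldr, so "reducing" is just calling the list.

```javascript
// Boehm-Berarducci (Church) encoding: a list *is* its own foldr.
// nil folds to the seed; cons folds the head with the folded tail.
const nil = (step, seed) => seed;
const cons = (head, tail) => (step, seed) => step(head, tail(step, seed));

const xs = cons(1, cons(2, cons(3, nil)));

// Reducing is simply function application:
const sum = xs((x, acc) => x + acc, 0);                   // → 6
const toArray = list => list((x, acc) => [x, ...acc], []);
console.log(sum, toArray(xs));
```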


I had a similar epiphany while learning about regular expressions in Perl. That connection with text processing is what helped me understand list comprehensions. They seem strikingly similar.

For JS, I personally prefer lo-dash for this kind of work, or dropping in polyfills from MDN.

For some context, here's a benchmark comparing .map implementations on normal arrays: http://jsperf.com/native-vs-array-js-vs-underscore/54

I've been looking for similar libraries that work on typed-arrays because they are so much more efficient when working with web-workers or with raw canvas data. My attempts at hacking it in feel like they are just bad ideas: http://jsperf.com/float32array-map/2


Most people use underscore/lodash for this stuff. The difference is that the js transducers libraries don't create intermediate arrays, only do enough work to produce the requested output, and work on top of anything that can be coerced into the iteration protocol. I've seen demos of using Facebook's Immutable JS, CSP.js [1], and I don't see why you couldn't put them on top of Typed Arrays or a FRP library like Kefir.

[1] http://jlongster.com/s/nationjs-slides/


Yeah, the overhead of creating intermediate typed arrays is exactly what I'm trying to avoid with this.


Lo-Dash 3.0 will have support for lazy evaluation in its chaining syntax that supports shortcut fusion and avoids intermediate arrays as well.


Do you know if they work with typed arrays?


Graham Hutton has a very nice tutorial [0] on the universality of fold, which shows this elegantly (in Haskell, though).

[0] Graham Hutton, "A tutorial on the universality and expressiveness of fold", J. Functional Programming 9(4): 355–372, July 1999. http://www.cs.nott.ac.uk/~gmh/fold.pdf


For me the "epiphany" was when I realized reduce doesn't need to be (T, T) -> T, but can be (T1, T2) -> T1.


They can also be (T1,T2)->T3


Not really?

When you reduce with a function f(T1, T2) -> T3, the result of the previous iteration becomes the first argument of the next iteration, so T1 and T3 must be the same type.

Or am I missing something?


Ah, no, you're right.

I was thinking that mapping can be f(T1) -> T2 and that reducing can change types too, but I guess I wasn't paying enough attention: the reduce signature is f(accumulator, input) and the output is the accumulator, so yeah, you're absolutely right: f(T1, T2) -> T1.
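A tiny JavaScript illustration of that signature: the accumulator (a string) and the inputs (numbers) have different types, and the reducer must return the accumulator's type.

```javascript
// f(accumulator: string, input: number) -> string, i.e. f(T1, T2) -> T1
const result = [1, 2, 3].reduce((s, n) => s + String(n), '');
console.log(result); // → '123'
```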


There is also Transducers Explained [1] if you are familiar with JavaScript.

(Shameless plug in the hope it may be helpful to someone)

[1]: http://simplectic.com/blog/2014/transducers-explained-1/


For JavaScripters, there's also the 'Like Underscore, but lazier' Lazy.js http://danieltao.com/lazy.js/


Here's another video by Rich Hickey about transducers: https://www.youtube.com/watch?v=6mTbuzafcII

- it was referenced in this great post detailing the transducer notion with Clojure, Scala, and Haskell: http://blog.podsnap.com/ducers2.html


So transducers generalize the usage of Enumerable functions such as map, reduce, filter, flatMap (...).

Could someone tell me if this is conceptually different from Ruby's Enumerable module, which only requires the class it's included in to implement `each` (so anything can be enumerable)? Or is it similar, just translated to the FP world?


Yes, there are differences.

1. Transducers are parallel under the covers. Since the expectation is that the predicate or mapping functions you pass into map, filter and reduce are pure (no variables are changed, no state is modified, just a calculation that depends only on its arguments), the parallelism is hidden away from you, but it's there. Ruby's Enumerable can't do parallelism.

2. When you compose transducers, there are no intermediate sequences generated. The simplest example is this:

In Ruby, [1, 2, 3].map {|x| x + 1}.map {|x| x + 2} will generate an intermediate array [2, 3, 4] after the first map, and then [4, 5, 6] when it has evaluated the whole expression.

In Clojure, (transduce (comp (map inc) (map #(+ 2 %))) conj [] [1 2 3]) will build a single composed step function that first increments and then adds 2 to its argument, and will traverse the input only once using that composed function.
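The fusion being described can be sketched in plain JavaScript (illustrative, not any particular library's API): compose the two functions into one, then traverse once.

```javascript
const inc = x => x + 1;
const addTwo = x => x + 2;
// comp(f, g) applies g first, then f, mirroring mathematical composition.
const comp = (f, g) => x => f(g(x));

// One traversal; the intermediate [2, 3, 4] array is never built:
const out = [1, 2, 3].map(comp(addTwo, inc));
console.log(out); // → [4, 5, 6]
```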


Transducers _can be_ parallelized, but some transducer implementations do contain state, like `take`, and therefore cannot be parallelized.

The big thing is no intermediate results.


If you watch the first Rich Hickey transducers talk, he shows that some transducers (the ones that rely on the underlying reduce implementation that uses fork/join and assumes that reducing the collection is associative) are already parallelized. I agree with you regarding `take`.


IIRC from yesterday's talk, he was explaining that these will be easy to parallelize, and it's slated for 1.7. I had thought they were already parallel myself until I saw the talk; possibly some of them are, but the item is still Open.

http://dev.clojure.org/jira/browse/CLJ-1553


I was just talking about the "reduce fold" being already parallel.


The big thing is not even that; the big thing is that you can separate your logic from the data that you want to process. So instead of (map inc [1 2 3 4]) you can do (def inc-transducer (map inc)) and then use that with any data structures, channels, or whatever other transducer contexts people come up with. To be sure, you wouldn't bother for (map inc) alone, but if you have more logic, you could.
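A sketch of that separation in JavaScript (names are illustrative): define the transformation once, with no data source in sight, then reuse it against different reducing contexts.

```javascript
// A transducer: takes the "next" reducing function, returns a new one.
const mapping = f => next => (acc, x) => next(acc, f(x));

// The transformation, defined once, independent of any data:
const incT = mapping(x => x + 1);

// Two different reducing contexts for the same transformation:
const append = (acc, x) => { acc.push(x); return acc; };
const add = (acc, x) => acc + x;

console.log([1, 2, 3].reduce(incT(append), [])); // → [2, 3, 4]
console.log([1, 2, 3].reduce(incT(add), 0));     // → 9
```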


So transducers are like .Net/LINQ enumerables?


Nah, Clojure seqs are more like .NET/LINQ enumerables.

Functions that work over seqs/enumerables return a new seq/enumerable. From what I understand, and I'm sure I am wrong, a transducer receives and transforms a value, and may call the next transducer with the transformed value.

The nice thing is that a transducer does not create intermediate results (a seq/enumerable), and that it doesn't make any assumptions about the underlying data structure. So you can, for instance, send a value through a transducer and place it directly in a list, instead of creating a chain of enumerables and then calling .ToList on it. Since transducers can work with any value, you also aren't tied to enumerables. So transducers should be a little more universal and performant than enumerables.
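That description can be made concrete with a minimal sketch (illustrative names, not a specific library): each transducer wraps the next reducing step, so values flow through the whole pipeline one at a time, with no intermediate collection.

```javascript
// Transducers as reducing-function transformers:
const mapT = f => next => (acc, x) => next(acc, f(x));
const filterT = pred => next => (acc, x) => (pred(x) ? next(acc, x) : acc);

// Final step: place each surviving value directly into the result array.
const intoArray = (acc, x) => { acc.push(x); return acc; };

// Double everything, then keep values greater than 2 — one pass:
const xform = mapT(x => x * 2)(filterT(x => x > 2)(intoArray));
const res = [1, 2, 3].reduce(xform, []);
console.log(res); // → [4, 6]
```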


A contrived example in C#/LINQ

  Enumerable.Range(1, 100)
    .Where(n => n % 2 == 0)
    .Select(n => n * 2)
    .ToList();
And as a transducer it could look something like this?

  sequence(
    Enumerable.Range(1, 100),
    compose(
      filter(n => n % 2 == 0),
      map(n => n * 2)
    )
  ).ToList();

On the LINQ side it would create only 2 enumerable objects (one each for Where and Select), and ToList would copy the object pointer as it passed through each enumerable. For the transducer example, rather than having 2 enumerables, we would have 2 transducer objects?

I absolutely understand the case where a language's map/filter/reduce/etc. results in a whole new list, but for the above case the performance/memory improvement doesn't seem that huge, really rather minor. The fact that it removes the overhead of having to support IEnumerable and, as you mentioned, only deals with functions is what seems like a win. At first I was thinking that plain old LINQ is also more readable, but my contrived C# transducer example doesn't look that bad in the end. Am I incorrect on any of this?


With Linq you can't really create an object/function that describes the transformation that should happen. I start with an enumerable, describe how to transform /that/ enumerable, and in the end have something that describes a transformation applied to particular initial data. I can't do anything with it but lazily read the transformed values of the original input.

With transducers I can describe a transformation that should happen to /some/ reducible input, and in the end have something that represents just the transformation. I can then apply that to any reducible input I have.

[edit]

Your example becomes:

  var incrementedEvens = compose(
      filter(n => n % 2 == 0),
      map(x => x + 1)
  );
  transduce(Enumerable.Range(0, 100), incrementedEvens);

  var lowerAlpha = compose(
      filter(x => Character.isAlpha(x)),
      map(x => x.ToLowercase())
  );
  System.in.setTransducer(lowerAlpha);

You can't really do that in Linq.


Thanks for the answer. Indeed, Ruby in general has the limitations inherent to its "non-functional" nature. However, regarding 2), enumeration can now (in Ruby 2.0) be lazy and thus take each item one after the other through the whole chain.

(1..Float::INFINITY).lazy.map{|a| p a; a*10}.map{|a| p a+1} # => 1 11 2 21 etc...

So it can work with streams & so on... But I know (from a Jessica Kerr presentation) it also has limitations there.


Transducers are a different kind of thing, not just a generalized usage of enumerable functions.

Clojure transducers are isomorphic to functions of type `a -> [b]`, that is, functions which take some thing and return lists of other things. You can implement map and filter in this language:

    tmap f a = [f a]
    tfilter p a = if p a then [a] else []
You can also compose two such functions:

    concat :: [[x]] -> [x]
    concat [] = []
    concat (x : xs) = x ++ concat xs
    
    concatMap f list = concat (map f list)
    
    compose lb_a lc_b = \a -> concatMap lc_b (lb_a a)
One reduction function (`foldr`) turns out to just be the identity function [a] -> [a], as I recall.

The interesting thing about Clojure transducers is that they have a strange continuation-passing form, such that this function `compose` that I wrote above for the type `a -> [b]` is actually in Clojure just function composition. That is, instead of `a -> [b]` we see `forall r. (b -> r -> r) -> (a -> r -> r)`, which composes with normal function composition.
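The `a -> [b]` view above translates straightforwardly to JavaScript — a toy sketch using `Array.prototype.flatMap` as `concatMap`:

```javascript
// Each step maps one input to zero-or-more outputs: a -> [b]
const tmap = f => a => [f(a)];
const tfilter = p => a => (p(a) ? [a] : []);

// Composition of such steps is concatMap, i.e. flatMap:
const composeSteps = (f, g) => a => f(a).flatMap(g);

// Keep the even numbers, then multiply them by ten:
const step = composeSteps(tfilter(x => x % 2 === 0), tmap(x => x * 10));
const out = [1, 2, 3, 4].flatMap(step);
console.log(out); // → [20, 40]
```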


What you're thinking of is more like ISeq in Clojure, or Foldable in Haskell. The more interesting generalisation is not between arrays, vectors and hashsets, but between data structures and event streams.


Again a more technical talk. I really liked it. Especially the end bit about pipelines; that seems like it would be really useful.

Transducers overall are quite interesting. I am excited to see what other transducer contexts people come up with.


It's not that clear there will be one. Transducers have three steps: begin, during and end. In practice, begin is only used for reduce operations, and outside of reduce operations end can only be used for its side effects.

There's also definitional issues e.g. can you have an async transducer?


You can indeed use transducers in asynchronous contexts. In fact, that is what intrigues me most about the abstraction.

Transducers are defined such that you can abstract over the context of input, output and iteration, and focus on the transformation of each element from an independent source individually. Each transducer wraps the next transformation in a "pipeline" (eventually the output sink) and accepts an input during an external iteration process. The implementation decides what to do with it (map transforms values with a function, filter drops values that fail a predicate, etc.). You can also define transducers that send multiple values for each input (think string.split) or some that do not send any until completion (think buffered results).

Since the iteration source and sink are abstracted from the transformation, you can use the same transformation in other contexts (Promises, event streams, IO streams, CSP, etc.)
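A small sketch of that context-independence (illustrative, with no real event source): the same step function that would drive a reduce can instead be driven push-style, one value per "event".

```javascript
// The same reducing-function transformer used in pull-based reduce:
const mapT = f => next => (acc, x) => next(acc, f(x));
const collect = (acc, x) => { acc.push(x); return acc; };

// The transformation, independent of any source:
const upper = mapT(s => s.toUpperCase());

// Driven one value at a time, as if events were arriving:
const seen = [];
['a', 'b', 'c'].forEach(ev => upper(collect)(seen, ev));
console.log(seen); // → ['A', 'B', 'C']
```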

I've been experimenting with transducers in asynchronous contexts in JavaScript if anyone is interested, for example:

[1]: http://simplectic.com/projects/underscore-transducer/ [2]: http://simplectic.com/projects/underarm/ [3]: https://github.com/transduce/transduce-async [4]: https://github.com/kevinbeaty/transduce-stream [5]: https://github.com/transduce/transduce-push [6]: https://github.com/transduce/transduce-then


Yes, you can have async and blocking transducers. Skip forward to the point in the video about channels for a discussion.


But if you use an async transducer on a channel, it fails. (Actual tested code, sadly.)

Which might be okay, but my basic point that it's definitionally fuzzy still holds.


These aren't materially different from itertools in Python.


For those who know Russian, here's an excellent critique of "transducers": http://thedeemon.livejournal.com/87320.html


Could you try to give a summary for those who don't?


The summary: "Hickey did not invent FP, how dares he?"


People keep attacking him whenever he introduces something. Why? Every time he releases something, he also shows what research it originated from. Go back to the first transducer talk and you will see the references.


Personally, I don't mind him too much even though I'm a typeful-programming weenie. That said, he does have a tendency to somewhat unfoundedly say "... and you couldn't do this in a typed language" (or at least allude to it), whereas people again and again show that, yes, it could be done in a statically typed language. In fact, this is trivially true in a sense, as shown by Bob Harper[1], another person who is undoubtedly hugely influential, but whom I have trouble with because of his obvious bias and trollish ways.

(I think Rich is absolutely spot on about many of the more abstract things about our industry/craft, but this is just one of those details that irks me.)

[1] http://existentialtype.wordpress.com/2011/03/19/dynamic-lang...


In the strangeloop talk his argument seemed to be that you couldn't use types to describe the entire and exact contract of transducers. Which is true (and true of almost anything), but you can at least describe a significant enough chunk of it to reduce errors.

As always, it's a trade-off. As you try to capture more of the contract you massively increase complexity, sometimes to the point of reducing usability (e.g. staring at an obtuse type error, or trying to work out how to do basic things with a function just from a Scala type).

It's my interpretation that "you can't type transducers" is an invitation to think more deeply about types and the limitations in what they can express, mixed in with some "get off my back about creating a dynamically typed language".


Isn't the problem with transducer typing that transducers aren't very functional? In one of your posts you wrote:

   System.in.setTransducer(lowerAlpha);
In the LINQ/functional origami world, you could only produce new streams, not transform an existing stream in place. Because a new stream is computed, a new type can be computed. However, from what I've seen with transducers, they would be limited to basically Func<T, Predicate<T>, T> signatures.


That's part of it, but of course you can embed monadic types/effects within the appropriate combinators. (I think "tel" showed an example of this in the original "transducers" post)



