
Processing documents with Clojure transducers - jonpither
http://blog.juxt.pro/posts/xpath-in-transducers.html
======
kenko
"Zippers are complicated animals to work with, take quite a lot of setting up
and require that the document fits in memory."

"xml/parse parses the file into a tree structure like this"

So aren't we fitting the whole document into memory anyway?

~~~
loevborg
If I'm reading the source correctly, `clojure.xml/parse` consumes the entire
XML document. However, its successor `clojure.data.xml/parse` is lazy. The
techniques described in the blog post should work with a lazy sequence (lazy
tree), so you can stream in data that doesn't fit into memory. See
[http://stackoverflow.com/a/11215430/239678](http://stackoverflow.com/a/11215430/239678)
and
[http://clojure.github.io/data.xml/#clojure.data.xml/parse](http://clojure.github.io/data.xml/#clojure.data.xml/parse).
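
For illustration, a minimal sketch of that streaming usage, assuming a large
file "big.xml" whose root element contains `<item>` children (both names are
hypothetical):

    ;; clojure.data.xml/parse returns a lazy tree; as long as we don't
    ;; hold on to the head, elements can be GC'd as we walk them.
    (require '[clojure.data.xml :as dxml]
             '[clojure.java.io :as io])

    (with-open [rdr (io/reader "big.xml")]
      (->> (dxml/parse rdr)
           :content
           (filter #(= :item (:tag %)))
           count))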

------
dukerutledge
Transducers seem really nice, but wouldn't they be unnecessary in a language
whose compiler could perform list fusion (and the like)?

~~~
brandonbloom
You're right that fusing loops and eliminating temporaries at compile time
would substantially reduce (heh) the perf motivation for transducers. However,
perf is not all there is to transducers.

For one thing, transducers can be used in alternative "reducing contexts", for
example, a core.async channel. If you define map/mapcat/filter/etc in terms of
a concrete data structure (such as lists), you can't reuse them as readily.
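
A quick sketch of that reuse; the same transducer drives both `into` and a
core.async channel:

    (require '[clojure.core.async :as async])

    (def xf (comp (filter odd?) (map inc)))

    ;; Reducing context 1: building a collection.
    (into [] xf (range 10))          ;=> [2 4 6 8 10]

    ;; Reducing context 2: a channel applies xf to every value put on it.
    (def ch (async/chan 10 xf))
    (async/onto-chan ch (range 10))  ; closes ch when the seq is exhausted
    (async/<!! (async/into [] ch))   ;=> [2 4 6 8 10]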

Another perf-ish reason for transducers is separate compilation. It's
dramatically easier to fuse loops for a high-level symbolic representation,
but can get much trickier once you only have byte-code left. All Clojure
functions are compiled immediately upon creation and the source code is
discarded. Because transducers are built out of function calls, package A can
define a transducer and package B can compose it with another transducer,
without having to perform inter-module optimizations. And the JIT will still
inline across modules at runtime!
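
Concretely (package boundaries simulated with comments, names hypothetical):

    ;; Shipped, already compiled, by package A:
    (def keep-evens (filter even?))

    ;; Composed in package B without ever seeing A's source:
    (def pipeline (comp keep-evens (map #(* % %))))

    (into [] pipeline (range 6)) ;=> [0 4 16]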

Having said that, there's an alternative approach that can be made to work
too: yield. Not shallow yield; delimited-continuation / monadic yield. Scala's
collections approximate this idea with effectful Traversable and such, but it
doesn't quite get either performance or flexibility right.

So yes, transducers are a bit of a hack to accommodate the host, but no, they
are not totally without novelty or intrinsic value.

~~~
moomin
I don't really like the "streaming" story. In reduction, you have to implement
three arities; in streaming, only two are used. Moreover, the "reducing value"
is opaque in the streaming case. All of this basically amounts to saying that
the two cases are very different. So a streaming transducer can be used in a
reducing context, but not really vice versa.
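
The three arities in question, sketched as a hand-rolled `map`-like transducer
(a standard skeleton, not tied to any particular library):

    (defn my-map [f]
      (fn [rf]
        (fn
          ([] (rf))                ; init arity
          ([result] (rf result))   ; completion arity: finalize/flush
          ([result input]          ; step arity: process one value
           (rf result (f input))))))

    (transduce (my-map inc) conj [] [1 2 3]) ;=> [2 3 4]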

~~~
brandonbloom
Seems perfectly reasonable to me, since streaming implies that there is not
necessarily a notion of a beginning or an end. Reducing places stronger
requirements on its operations.

I will admit that the multiple arities is a bit awkward when an interface (or
two) could have done the trick. I'm not 100% sure I understand why Rich chose
to do it the way he did. I suspect it was so that `comp` would work.
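
For what it's worth, `comp` works because a transducer is just a function from
reducing function to reducing function, so ordinary composition applies (with
the leftmost transformation seeing inputs first):

    (def xf (comp (filter odd?) (map inc)))
    (into [] xf (range 5)) ;=> [2 4]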

I probably would have just defined ITransducer or something and supplied a
custom composition function, but then again, the Haskell lens package takes
the same approach, preferring composition via `.` on functions instead of
extending a hypothetical "Composable" or "Pipelinable" type-class upon which
`comp`/`.` could be built.

