If I'm reading the source correctly, `clojure.xml/parse` consumes the entire XML document. However, its successor `clojure.data.xml/parse` is lazy. The techniques described in the blog post should work with a lazy sequence (lazy tree), so you can stream in data that doesn't fit into memory. See http://stackoverflow.com/a/11215430/239678 and http://clojure.github.io/data.xml/#clojure.data.xml/parse.
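A minimal sketch of the lazy approach (the file name and tag are hypothetical):

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])

;; clojure.data.xml/parse returns a lazily-realized element tree, so
;; as long as we traverse it lazily and stay inside with-open, the
;; whole document never has to be in memory at once.
(with-open [rdr (io/reader "huge.xml")]      ; hypothetical file
  (->> (xml/parse rdr)
       :content                              ; lazy seq of children
       (filter #(= :entry (:tag %)))         ; hypothetical tag
       count))                               ; forces traversal here
```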
You're right that fusing loops and eliminating temporaries at compile time would substantially reduce (heh) the perf motivation for transducers. However, perf is not all there is to transducers.
For one thing, transducers can be used in alternative "reducing contexts", for example, a core.async channel. If you define map/mapcat/filter/etc in terms of a concrete data structure (such as lists), you can't reuse them as readily.
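A small sketch of that reuse (names are mine; assumes core.async is on the classpath):

```clojure
(require '[clojure.core.async :as async])

;; one transducer...
(def keep-odd-inc (comp (filter odd?) (map inc)))

;; ...used in a collection-reducing context:
(into [] keep-odd-inc (range 10))
;;=> [2 4 6 8 10]

;; ...and reused, unchanged, on a core.async channel:
(def ch (async/chan 10 keep-odd-inc))
(async/onto-chan! ch (range 10))   ; closes ch when done (core.async >= 1.2)
(async/<!! (async/into [] ch))
;;=> [2 4 6 8 10]
```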
Another perf-ish reason for transducers is separate compilation. It's dramatically easier to fuse loops over a high-level symbolic representation, but it gets much trickier once all you have left is bytecode. All Clojure functions are compiled immediately upon creation and the source code is discarded. Because transducers are built out of function calls, package A can define a transducer and package B can compose it with another transducer, without having to perform inter-module optimizations. And the JIT will still inline across modules at runtime!
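Here's roughly what that looks like (package names are hypothetical; the point is that composition is just a function call):

```clojure
(require '[clojure.string :as str])

;; "package A" ships this transducer as already-compiled bytecode:
(def strip-blanks (remove str/blank?))

;; "package B" composes it with its own transducer; no inter-module
;; optimization pass is needed, and the JIT can still inline across
;; the boundary at runtime:
(def normalize (comp strip-blanks (map str/upper-case)))

(into [] normalize ["foo" "" "bar"])
;;=> ["FOO" "BAR"]
```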
Having said that, there's an alternative approach that can be made to work too: yield. Not shallow yield; delimited-continuation / monadic yield. Scala's collections approximate this idea with effectful Traversable and such, but it falls short on both performance and flexibility.
So yes, Transducers are a bit of a hack to accommodate the host, but no, they are not totally without novelty or intrinsic value.
I don't really like the "streaming" story. In reduction, you have to implement three arities; in streaming, only two are used. Moreover, the "reducing value" is opaque in the streaming scenario. All of this basically amounts to the two cases being very different. So a streaming transducer can be used in a reducing context, but not really vice versa.
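For reference, the three arities in question, in a hand-written `map` transducer (following the shape of clojure.core's own):

```clojure
(defn my-map
  "A map transducer spelled out to show all three arities."
  [f]
  (fn [rf]
    (fn
      ([] (rf))                        ; 0-arity: init
      ([result] (rf result))           ; 1-arity: completion/flush
      ([result input]                  ; 2-arity: the actual step
       (rf result (f input))))))

(into [] (my-map inc) [1 2 3])
;;=> [2 3 4]
```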
Seems perfectly reasonable to me, since streaming implies that there is not necessarily a notion of a beginning or an end. Reducing places stronger requirements on its operations.
I will admit that the multiple arities are a bit awkward when an interface (or two) could have done the trick. I'm not 100% sure I understand why Rich chose to do it the way he did. I suspect it was so that `comp` would work.
I probably would have just defined ITransducer or something and supplied a custom composition function, but then again, the Haskell lens package takes the same approach, preferring composition via `.` on functions instead of extending a hypothetical "Composable" or "Pipelinable" type-class upon which `comp`/`.` could be built.
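For concreteness, a sketch of that hypothetical alternative (nothing like this exists in core Clojure; names are made up):

```clojure
(defprotocol ITransducer
  (apply-xf [this rf] "Wrap the reducing function rf."))

(defn ->xf
  "Adapt an ordinary transducer into the hypothetical type."
  [plain-xf]
  (reify ITransducer
    (apply-xf [_ rf] (plain-xf rf))))

(defn compose-xf
  "Custom composition, standing in for plain `comp`."
  [outer inner]
  (reify ITransducer
    (apply-xf [_ rf]
      (apply-xf outer (apply-xf inner rf)))))

(reduce (apply-xf (compose-xf (->xf (filter odd?))
                              (->xf (map inc)))
                  conj)
        [] [1 2 3])
;;=> [2 4]
```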
"xml/parse parses the file into a tree structure like this"
So aren't we fitting the whole document into memory anyway?