
Just wondering out loud... I'm interested in how this relates to the existing implementations in core.

For example, if we went back in time, would all of the core functions have been implemented this way? Would this be a possible drop-in replacement in the future? Could future versions of Clojure integrate these ideas more deeply? If so, what are the backwards-compatibility concerns?




> For example, if we went back in time, would all of the core functions have been implemented this way?

I don't think so - the existing implementations work on the higher-level abstraction of sequences. Reducers are optimized parallel versions that work on collections. While parallelism is extremely useful in some parts of your code, there is overhead and I don't think you would want either the overhead or the restriction of working below the sequence abstraction in the general case.
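Roughly, the contrast looks like this (just a sketch; it assumes a vector source, since fold only parallelizes over foldable collections like vectors and maps):

    (require '[clojure.core.reducers :as r])
    (def v (vec (range 1000000)))

    ;; seq-based: builds an intermediate (chunked) seq, always sequential
    (reduce + (map inc v))

    ;; reducer-based: no intermediate seq; fold may split the vector and
    ;; reduce the chunks in parallel via fork/join
    (r/fold + (r/map inc v))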

Some of the same choices seem to be showing up in the new Java 8 collections and parallel operations work. That is, it is up to the developer when to "go parallel".

For an entirely different approach, check out Guy Steele's Fortress language which eagerly executes most things in parallel by default (like all iterations of a loop, arguments to a function, etc) and you have to tell it not to do that.

Guy's Strange Loop 2010 talk is an interesting complement to this work: http://www.infoq.com/presentations/Thinking-Parallel-Program...


Well, besides the implicit parallelism of fold, wouldn't this be generally useful for reducing intermediate list creation? Or do lazy seqs already solve that problem?


Yes, reducers provide that benefit to sequential reduce, independent of the parallelism of fold.
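For instance (a sketch), a sequential reduce over stacked reducers runs in a single pass with no intermediate sequences between the steps:

    (require '[clojure.core.reducers :as r])
    ;; map and filter are fused into the one reduce; nothing is allocated in between
    (reduce + (r/filter even? (r/map inc [1 2 3 4 5])))
    ;; => 12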


So then, what downsides, if any (I'm sure there are), would there be to moving this reducers model to being the "default"?


When you call a reducer like r/map or r/filter, the result is reducible but not seqable. So, if you have an incremental algorithm, or are calling first etc., the seq-based fns are a better fit. Also, the lazy fns automatically cache their results, whereas reducers recalc on every reduce until you save the realized result. They are genuinely complementary in many ways.
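A small illustration of both points (sketch):

    (require '[clojure.core.reducers :as r])
    (def xs (r/map inc [1 2 3]))

    (first xs)      ;; throws - the result of r/map is reducible, not seqable
    (reduce + xs)   ;; => 9, the mapping runs now
    (reduce + xs)   ;; => 9, and it runs again - nothing is cached
    (into [] xs)    ;; realize the result once if you want to reuse it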


1) What is the benefit of recalculating on every reduce? Is this so you can use side effects?

2) If seq or first is called on a reducible, wouldn't it be easy to just implicitly realize the reducible into a sequence first?


1) The benefit is that you don't have to cache the results in a data structure, which really slows it down. Suppose you map the function (fn [x] (+ x 1)) over a reducible, and then you sum it by reducing it with +. With reducibles, there is no intermediate allocation, and it will run really fast especially if the functions can be inlined. Compare this with mapping and then folding a lazy seq: map builds an intermediate data structure, and reduce immediately consumes it.
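Concretely, that example looks something like this (sketch):

    (require '[clojure.core.reducers :as r])
    (def v (vec (range 1000000)))

    ;; lazy-seq version: map allocates an intermediate seq that reduce then consumes
    (reduce + (map (fn [x] (+ x 1)) v))

    ;; reducer version: the mapping fn is fused into the reducing step, no allocation
    (reduce + (r/map (fn [x] (+ x 1)) v))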

2) That's possible, but it makes it too easy to write code with abysmal performance because of (1). The common case is that you call both first and rest on the reducible. If both turn the reducible into a seq first, then both will take O(n) time in the best case (might be much worse depending on how the reducible was built up). Combine that with the fact that most times, you're going to recurse on the rest, and you've got an O(n^2) algorithm where you expected O(n), if everything is based on reducibles. Additionally, it's impossible to take the first or rest of an infinite reducible (well, perhaps you could do it with exceptions -- in general you can turn a reducible into a seq with continuations).
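To make the cost concrete, a naive seq view over a reducible (hypothetical helper names, purely for illustration) would look like:

    ;; hypothetical helpers, just to show the cost
    (defn naive-first [xs] (first (into [] xs)))   ;; realizes everything: O(n) for one element
    (defn naive-rest  [xs] (rest  (into [] xs)))   ;; realizes everything again: O(n)
    ;; recursing on naive-rest re-realizes the remainder at every step,
    ;; so a "linear" traversal becomes O(n^2)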


In the reducible model as described, it's not possible to implement some operations efficiently, like map over multiple collections (this is called zipWith in Haskell). Also, if you filter/map/etc a function over a reducible and then use the resulting reducible twice, the filter/map/etc will be performed twice. This can lead to performance surprises; for example, it can turn a linear-time algorithm into an exponential-time one in some cases. Thus this is not a drop-in replacement for seqs, although it can probably cover most uses more efficiently.
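The double-evaluation point is easy to see with a counter (sketch):

    (require '[clojure.core.reducers :as r])
    (def calls (atom 0))
    (def ys (r/map (fn [x] (swap! calls inc) (* x x)) [1 2 3 4]))

    (reduce + ys)   ;; => 30
    (reduce + ys)   ;; => 30
    @calls          ;; => 8 - the mapping fn ran over the input twice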


Fold is implicitly sequential. See Guy Steele's ICFP 09 presentation on conc lists, previously discussed here as http://news.ycombinator.com/item?id=814632.


foldl and foldr are sequential, but Clojure's fold is potentially parallel.
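A sketch of what that looks like: r/fold takes a combining fn and a reducing fn, splits a foldable collection (e.g. a vector) into chunks, reduces the chunks (potentially in parallel via fork/join), and combines the partial results.

    (require '[clojure.core.reducers :as r])
    (r/fold
      +    ;; combinef: merges partial results; called with no args for the identity, (+) => 0
      +    ;; reducef: reduces within each chunk
      (vec (range 1000000)))
    ;; => 499999500000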



