ICFP '09: "Get rid of cons!" Guy Steele on parallel algorithms & data structures (sun.com)
99 points by mquander on Sept 10, 2009 | 30 comments



I think the way parallelism changes the established trade-offs in code "optimization" is summarised by these lines of his:

Don’t split a problem into “the first” and “the rest.” Instead, split a problem into roughly equal pieces; recursively solve subproblems, then combine subsolutions.

A tree can be processed breadth-first (parallel) or depth-first (sequential), or both.
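
A tiny Clojure sketch of the contrast (my own code, not from the slides): the first/rest version is a chain of dependent steps, while the split version's two recursive calls are independent and could in principle run in parallel.

    ;; Sequential: each step depends on the previous one.
    (defn seq-sum [xs]
      (if (empty? xs)
        0
        (+ (first xs) (seq-sum (rest xs)))))

    ;; Divide and conquer over a vector: the two recursive calls are
    ;; independent, so a runtime could evaluate them in parallel.
    (defn split-sum [v]
      (let [n (count v)]
        (if (<= n 1)
          (if (zero? n) 0 (nth v 0))
          (let [mid (quot n 2)]
            (+ (split-sum (subvec v 0 mid))
               (split-sum (subvec v mid)))))))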


That would be OK if Lisp wasn't list-based. In Lisp you always start at the beginning of the list: any processing requires going through that list somehow.

Prolog allows some workaround to this problem by using difference lists.


Did you not read the talk? You can provide the car/cdr interface on top of tree-based implementations. Yes, you pay a cost to use that interface...but you pay a different (Guy Steele would argue more expensive) cost if you write your algorithms using car/cdr instead of, say, empty/singleton/conc.

Moreover, as is pointed out on the slides, a lot of processing can happen without ever having to talk about car/cdr, e.g., using higher-order functions like map and filter.
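
For illustration, here's roughly what that looks like in Clojure-flavored code (the representation and names are mine, not the talk's Fortress code):

    ;; A conc list is nil (empty), [:leaf x] (singleton), or the
    ;; concatenation [:conc left right] of two conc lists.
    (defn singleton [x] [:leaf x])

    (defn conc [a b]
      (cond (nil? a) b
            (nil? b) a
            :else    [:conc a b]))

    ;; car/cdr layered on top; note that conc-rest rebuilds part of the
    ;; tree, which is the extra cost of keeping the old interface.
    (defn conc-first [t]
      (case (first t)
        :leaf (second t)
        :conc (conc-first (second t))))

    (defn conc-rest [t]
      (case (first t)
        :leaf nil
        :conc (conc (conc-rest (second t)) (nth t 2))))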


This aspect of the talk is a big justification for more abstraction in a lisp (namely interfaces / type classes / similar).

This choice of representation should not actually affect written code, other than maybe a type declaration or similar to say "this is a singly linked list" or "this is a tree list". Even if you ignore performance, the fact that car really means "get the first element of this two-element cons cell" rather than "get the first element of this sequence" is one of the most obvious limitations / warts of (classical) Lisp (and Scheme) compared to more modern languages designed with data abstraction in mind.

Of course, this is "modern" as in OO decades ago (Java, C++, etc.). Taking it a step further, you could omit the type declaration entirely and infer at compile time or run time (via profiling + JIT) which list implementation is best.
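
Clojure (which comes up further down the thread) is one existence proof: the same sequence operations work whether the concrete type is a cons-style list or a tree-backed vector, so code written against the abstraction doesn't change when the representation does. For example:

    (first '(1 2 3))    ;=> 1, cons-based list
    (first [1 2 3])     ;=> 1, tree-backed vector
    (map inc '(1 2 3))  ;=> (2 3 4)
    (map inc [1 2 3])   ;=> (2 3 4), same code, different representation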


My point was that Prolog works out of the box; not so with Lisp.


The language (and the features) presented in this presentation are from Fortress, which Guy and other Sun researchers have worked on for some time. On the project page one can download Fortress 1.0 and also check out the Fortress 1.0 specification: http://projectfortress.sun.com/Projects/Community/

It's a pretty ambitious project and it looks like they are going to succeed. The focus of the language is to create a new Fortran, but personally, I think the scope and the ambition of Fortress are bigger. Viva la Guy Steele :)


From Fortress and a few other languages, like Scheme. In the talk at ICFP he said he did this because he likes them all.

The video should come online soon. I hope ACM won't put it behind a pay wall.


The video is now online: http://www.vimeo.com/6624203


On p73: Use tree branching factors larger than 2. (Example: Rich Hickey’s Clojure is a JVM-based Lisp that represents lists as 64-ary trees.)

This is awesome and I had no idea!


Take a look at http://blog.higher-order.net/2009/02/01/understanding-clojur... for details of Clojure's persistent vector implementation.
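
The short version of that post, sketched below: an "update" copies only the path from the root to the changed leaf (around log32 n nodes), so old versions stay valid and cheap to keep around.

    (def v  (vec (range 1000)))
    (def v2 (assoc v 500 :changed))  ; copies only a handful of tree nodes

    (nth v 500)   ;=> 500       the original is untouched
    (nth v2 500)  ;=> :changed  the new version shares most structure with v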



Clojure tries to address this problem. First and rest are still fundamental concepts in its data structures, but under the hood it's properly written concurrent Java code. Hickey wrote it once, and now you don't have to.

I highly recommend the Clojure videos, available at: http://clojure.blip.tv/. Watch them online or download them via iTunes. There are two videos on Clojure data structures, and a set of introductions for both Lisp and Java programmers.


This is confusing the difference between concurrency and parallelism, the latter of which Clojure doesn't yet address.

If your program is divided into threads, having a language like Clojure that makes concurrency easy and lock-free is a wonderful thing, and of course you get some benefit from running on multiple cores, though I don't think that's entirely the point.

If, on the other hand, you're looking for a faster way to map a function across eight million lines in a flat file, using Clojure's map function won't magically make your program run faster (and pmap might not run at all -- you may run out of memory, since pmap isn't truly lazy).
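
To make the map/pmap point concrete (expensive-fn and lines are placeholder names, not real code):

    (map  expensive-fn lines)  ; lazy and sequential
    (pmap expensive-fn lines)  ; parallel, but only semi-lazy: it keeps a
                               ; window of computations running ahead of
                               ; the consumer rather than one at a time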

The fact that a Clojure PersistentVector (not a PersistentList) is implemented with a 32-way tree (not 64-way) does not imply that Clojure has abandoned the car/cdr paradigm in favor of conc lists -- it still uses cons, in a more generic form, but still with the same first/rest concept. The trees allow vectors to be inexpensively persistent, so you can have actual practical immutable vectors.


Ah yes, a silly mistake on my part. I am hoping that eventually Clojure will help the programmer write parallel algorithms, but you are correct in that such constructs don't exist in the language at this moment.



Sweet. Another incf for Clojure.

Is there auto tree balancing?


Just saw your reply. That's kickass.


Hickey wrote it once, and now you don't have to.

But Steele is talking about introducing fundamental new constructs all the way up to the application level. This is not a problem that can be abstracted away by better language implementations, and certainly not by better implementations of familiar constructs.


> and certainly not by better implementations of familiar constructs.

Why not? I sort of took the slides as saying just that. For a good chunk of the slides he talks about the simple singly linked list found in Lisp, then shows that, implemented as a tree (a conc list), the familiar operations still exist, but we get improvements by making operations more parallelizable.

Slide 24 starts showing how `first`, `rest`, `append` can be implemented with conc lists, and 31 shows `map`, `reduce`, `mapreduce` as well. Then, he goes on to show `length`, `filter` and `reverse` in terms of `mapreduce`.
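
Here's a self-contained sketch of that progression in Clojure, using my own encoding of a conc tree as nil, [:leaf x], or [:conc left right] rather than the slides' Fortress syntax:

    ;; mapreduce: apply f to each leaf, combine results with the
    ;; associative g, using id for the empty tree. The two recursive
    ;; calls are independent, which is where the parallelism comes from.
    (defn mapreduce [f g id t]
      (cond (nil? t)            id
            (= :leaf (first t)) (f (second t))
            :else               (g (mapreduce f g id (second t))
                                   (mapreduce f g id (nth t 2)))))

    ;; length and filter in terms of mapreduce, as on the later slides:
    (defn conc-length [t]
      (mapreduce (constantly 1) + 0 t))

    (defn conc-filter [p t]
      (mapreduce (fn [x] (when (p x) [:leaf x]))
                 (fn [a b] (cond (nil? a) b
                                 (nil? b) a
                                 :else [:conc a b]))
                 nil
                 t))

    ;; e.g. (conc-length [:conc [:leaf 1] [:conc [:leaf 2] [:leaf 3]]]) ;=> 3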


Why not?

Well, if first and rest are unsuitable for parallelism as Steele says, a programming edifice built on top of better implementations of those might be beside the point.

My understanding of the slides is that we don't need better language runtimes to exploit parallelism for us while we continue to write the same kind of code as before; what we need is a new set of primitives (modeled by conc lists in this talk) that would allow us to compose programs differently, with more parallelism. And that far from being insulated from this, programmers will have to be thinking about it throughout.

It's possible that Clojure's implementation is already such that these primitives, and a programming style built on them, are feasible today. If so, that's awesome. But this is a different thing than providing better implementations of cons-based abstractions.

Edit: another thing - the transformation from cons to conc is nontrivial in the Lisp context because Lisp's code=data is so tied to cons. It might be interesting to think about what programs would look like if you applied code=data rigorously in a conc model. Would a language whose expressions are represented as concs look or feel significantly different than a cons-based one?


> It might be interesting to think about what programs would look like if you applied code=data rigorously in a conc model. Would a language whose expressions are represented as concs look or feel significantly different than a cons-based one?

Now that is food for thought.


Concatenative / point-free style? Guy already suggests this on p.68. Paraphrased, if data can be chopped in pieces, maybe code can too?


> My understanding of the slides is that we don't need better language runtimes to exploit parallelism for us while we continue to write the same kind of code as before;

Yes, I agree he was saying that, but that doesn't mean we can't change the implementations of what we use now to actually make it easier to accomplish this.

> another thing - the transformation from cons to conc is nontrivial in the Lisp context because Lisp's code=data is so tied to cons. It might be interesting to think about what programs would look like if you applied code=data rigorously in a conc model. Would a language whose expressions are represented as concs look or feel significantly different than a cons-based one?

But you could still implement car and cdr in terms of conc, so this is not really an issue. It might not be the most efficient thing, but that's sort of the benefit of macros: expand at compile time, and avoid the overhead. So, in summary, write macros in terms of an emulated cons cell, and use conc-optimized primitives elsewhere.


But you could still implement car and cdr in terms of conc, so this is not really an issue.

It might still be an issue. The more distance there is between how code is represented and how data and lower levels of code are represented, the less fluid programming in Lisp becomes. Yes, you can translate one to the other. But the sweet spot of Lisp is the space where no such mapping is needed or it's trivial; the magic drops off pretty sharply as you leave that space. It's not obvious (to me anyway) what the effect of what we're talking about would be. Small changes can have surprisingly big impacts, and one has to include the psychology of this.


I don't think so. Lisp code is already a tree.


It's a tree in the sense that a list is a tree.


No, it's really a tree structure. In fact lisp code is much like the abstract syntax tree that you get when you parse other languages.
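
A quick illustration (Clojure syntax, but any Lisp reads the same way): the reader hands you nested lists, which is essentially the AST other languages only get after parsing.

    (def form '(* (+ 1 2) 3))
    (second form)          ;=> (+ 1 2), a whole subtree, not an atom
    (first (second form))  ;=> +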


Bring back SETL. Higher order operations on abstract sets can hide the complexity of parallelism. Under the hood you can implement sets using whatever data structure fits your hardware (single core, multiple cores, GPUs, weird memory hierarchies).


I'd like to see somebody one-up this and say "and lambda!" What I mean is restricting explicit recursion/loops to combinators that respect algebraic laws (as in Backus's FP/FL), in order to get a better grip on the program transformations needed to reach an efficient implementation. Guy already hints at this by stressing the importance of MapReduce.
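
An example of the kind of law meant here, sketched in Clojure and assuming pure functions: two maps fuse into one, a rewrite that a compiler (or programmer) can apply mechanically without changing meaning.

    (def xs (range 10))

    ;; map fusion: map f after map g equals map of (f after g)
    (= (map inc (map #(* 2 %) xs))
       (map (comp inc #(* 2 %)) xs))   ;=> true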


I'm glad someone is working on this, but I think parallel 'map' and 'filter' (which are 'easy' for compiler writers) will be plenty for the next five years. And how sure is anyone about the future of hardware beyond that?

Still this is really interesting. Can't wait to see the video.



