
Harlan, new Lispy language for GPU programming - mpweiher
http://blog.theincredibleholk.org/blog/2013/06/28/announcing-the-release-of-harlan/
======
swannodette
Pretty cool: region inference is handled via miniKanren, a relational /
constraint logic programming language embedded in Scheme (also available in
Clojure as core.logic):

[http://github.com/eholk/harlan/blob/master/harlan/middle/inf...](http://github.com/eholk/harlan/blob/master/harlan/middle/infer-regions.scm)
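
If you haven't seen miniKanren before, here's a tiny toy query (nothing to do
with Harlan's actual inference rules) that gives the flavor of the relational
style: run* enumerates every value of q that satisfies the constraints, the
same way an inferencer can enumerate all consistent assignments.

    ;; assumes a miniKanren implementation (e.g. mk.scm) is already loaded
    (run* (q)
      (fresh (t)
        (conde
          ((== t 'int)   (== q (list 'vec t)))
          ((== t 'float) (== q (list 'vec t))))))
    ;; => ((vec int) (vec float)), i.e. both consistent assignments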

~~~
eholk
We did start out with a miniKanren type inferencer and region inferencer, but
I replaced them a couple of months back with a unified type and region
inferencer.
Region inference is definitely an interesting challenge for miniKanren, since
a lot of the constraints it introduces are soft. There are many legal region
assignments, but deciding the best one requires some heuristics.

~~~
octo_t
Have you looked at GPUVerify[1] from Imperial College to verify the
correctness of the code produced?

[1] -
[http://multicore.doc.ic.ac.uk/tools/GPUVerify/](http://multicore.doc.ic.ac.uk/tools/GPUVerify/)

------
BruceIV
I read through the user guide in the git repo, but it hasn't been updated in
a year and doesn't tell me what I'm most interested in (namely, what kinds of
optimizations the compiler does, and what the win is for this language over
CUDA, besides being able to write kernels in Lisp instead of C++). Does
anyone have any insight?

~~~
eholk
I really should update the user guide, since the language has changed a lot in
the last year.

I have an earlier post that discusses a couple of the optimizations that
Harlan does: [http://blog.theincredibleholk.org/blog/2013/06/10/some-simpl...](http://blog.theincredibleholk.org/blog/2013/06/10/some-simple-gpu-optimizations/)

To me, the win for Harlan over CUDA is its region system, which lets you work
with more intricate pointer structures on the GPU. For example, one of the
test cases is an interpreter for the lambda calculus, which would be much
harder to do in straight CUDA:
[https://github.com/eholk/harlan/blob/master/test/lambda3.kfc](https://github.com/eholk/harlan/blob/master/test/lambda3.kfc)
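
To give a feel for what that involves, here is a plain Scheme sketch (not
Harlan syntax, and far simpler than the linked test) of the kind of
recursive, pointer-heavy structure such an interpreter builds: tagged trees
for terms and closures, which the region system lets Harlan allocate and
traverse on the GPU.

    ;; Terms use de Bruijn indices; each constructor is a tagged list.
    (define (make-var n)    (list 'var n))
    (define (make-lam body) (list 'lam body))
    (define (make-app f a)  (list 'app f a))

    ;; A tiny evaluator: environments are lists, closures capture them.
    (define (eval-term t env)
      (case (car t)
        ((var) (list-ref env (cadr t)))
        ((lam) (list 'closure (cadr t) env))
        ((app) (let ((f (eval-term (cadr t) env))
                     (a (eval-term (caddr t) env)))
                 (eval-term (cadr f) (cons a (caddr f)))))))

    ;; ((lambda (x) x) (lambda (y) y)) evaluates to a closure.
    (eval-term (make-app (make-lam (make-var 0))
                         (make-lam (make-var 0)))
               '())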

Very soon, Harlan will have support for higher-order procedures, which CUDA
also doesn't offer.

~~~
iskander
How do you support higher-order procedures? Do you do control-flow analysis
and defunctionalization, or are these "real" higher-order functions?

~~~
eholk
It's basically defunctionalization. It's hard to get access to function
pointers on the GPU to do "real" higher-order functions.
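
In sketch form (this is just the textbook transformation, much simplified
compared to what the compiler actually does): each lambda in the source
becomes a tagged record, and a single first-order apply procedure dispatches
on the tag, so on the GPU the dispatch can compile to a plain switch rather
than an indirect call.

    ;; Hypothetical source:  (map (lambda (x) (* x x)) xs)
    ;;                       (map (lambda (x) (+ x k)) xs)

    ;; After defunctionalization the two lambdas become tagged data...
    (define (make-square)  (list 'square))
    (define (make-add-k k) (list 'add-k k))

    ;; ...and one first-order apply replaces every indirect call.
    (define (apply-fn f x)
      (case (car f)
        ((square) (* x x))
        ((add-k)  (+ x (cadr f)))))

    (define (map-fn f xs)
      (if (null? xs)
          '()
          (cons (apply-fn f (car xs)) (map-fn f (cdr xs)))))

    (map-fn (make-square) '(1 2 3))    ;; => (1 4 9)
    (map-fn (make-add-k 10) '(1 2 3))  ;; => (11 12 13)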

~~~
seanmcdirmid
What version of CUDA are you talking about? I thought the latest version
supported indirect function pointers (finally).

~~~
iskander
That's good news! Did that change with Kepler? Any links to more info?

------
z3phyr
Eric is also working on running the Rust language on the GPU:
[https://github.com/eholk/rust/tree/nvptx](https://github.com/eholk/rust/tree/nvptx)

------
Ihmahr
This is very much needed to make GPU programming more mainstream.

~~~
phren0logy
I'm not sure if you're joking (given that it's a Lisp, which is notoriously
not mainstream), but my limited experience of GPU programming with the
existing tools has been pretty painful. I, for one, welcome this development.

~~~
bajsejohannes
Yes, anything to make it easier to get started. I've tried before, but got
lost in figuring out which SDK to download. Too bad this toolkit doesn't help
in that process.

Happily, while writing this answer I discovered that OS X 10.7+ ships with
OpenCL already! I can't believe I missed that last time.

For anyone else wanting to try it, the first example is pretty
straightforward:

[https://developer.apple.com/library/mac/#documentation/Perfo...](https://developer.apple.com/library/mac/#documentation/Performance/Conceptual/OpenCL_MacProgGuide/XCodeHelloWorld/XCodeHelloWorld.html)

(You'll also want the code examples at
[https://developer.apple.com/library/mac/#documentation/Perfo...](https://developer.apple.com/library/mac/#documentation/Performance/Conceptual/OpenCL_MacProgGuide/ExampleHelloWorld/Example_HelloWorld.html).)

~~~
reeses
The OpenCL implementation provided by Apple is less than ideal. It lacks a
number of optimizations that are provided by, for example, AMD's OpenCL driver
packages for Linux and Windows.

That, coupled with the wretched GPU options available on the Mac Pro, makes
the current stack not really cost-effective. (I think a 5770 is still $250.)

However, it is better than CPU-only.

I've found slaving a cheap Linux box with expensive GPUs to be a much more
gratifying experience.

~~~
eholk
I've done most of my development on a Mac, but I keep finding bugs in Apple's
OpenCL implementation. Intel's seems to follow the standard most closely,
which is useful for keeping me honest.

------
s-phi-nl
This reminds me of the Chapel programming language
([http://chapel.cray.com/](http://chapel.cray.com/)), which seeks to be a
relatively high-level language for all parallel computation.

They differ in that Harlan targets just GPUs, while Chapel aims to cover all
forms of parallel computation. Also, Harlan uses Lisp syntax, while Chapel's
syntax is C-based.

Any comments on the similarities and differences between them from those who
know either one better than I do?

------
SeanDav
A pity this is not available under Windows (unless I missed something).

~~~
eholk
It's not, although I don't think it should be too hard to port. I haven't done
it yet because I don't have ready access to a Windows machine with an OpenCL-
capable GPU.

