
Lift – Lisp Flavoured Tensor - antman
https://github.com/bhuztez/lift
======
rcarmo
This is begging to use [http://hylang.org](http://hylang.org) instead of
vanilla Python.

------
eschaton
This isn't so much "Lisp" as "S-expressions via Python."

I was expecting it to actually be implemented in Lisp.
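To make the distinction concrete: "S-expressions via Python" usually means spelling expression trees as nested Python tuples and walking them with a small evaluator. A hedged sketch of that style (this is illustrative only, not Lift's actual representation or API):

```python
# Minimal sketch of S-expressions embedded in Python: nested tuples
# whose head names an operator. Illustrates the style, not Lift itself.
import operator

OPS = {"+": operator.add, "*": operator.mul, "-": operator.sub}

def evaluate(expr):
    """Recursively evaluate a tuple-encoded S-expression."""
    if not isinstance(expr, tuple):
        return expr  # atom: a plain number
    op, *args = expr
    vals = [evaluate(a) for a in args]
    result = vals[0]
    for v in vals[1:]:
        result = OPS[op](result, v)
    return result

# ("+", 1, ("*", 2, 3)) encodes the S-expression (+ 1 (* 2 3))
```

You get Lisp-shaped syntax trees, but none of the rest of a Lisp (macros, reader, evaluation model), which is the point being made above.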

~~~
gnarbarian
It's a step in the right direction. I can't wait to see more accessible
languages like Lisp built on top of Vulkan and OpenCL.

Programming for GPGPU and heterogeneous computing is difficult, unwieldy, and
feels primitive (partly by design). With better languages, more people will be
able to write powerful abstractions that are also highly concurrent when the
resources are available.

PyOpenCL is also a step in the right direction. It lets you descend into
OpenCL for the heavy lifting while remaining in the Python ecosystem for
everything else.
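That workflow, roughly: the kernel stays in OpenCL C, and Python handles setup, buffers, and data movement. A minimal sketch, assuming a working OpenCL platform (the kernel and function names here are illustrative, not from Lift or any specific project):

```python
# Sketch of the PyOpenCL pattern: OpenCL C for the heavy lifting,
# Python for everything around it.
KERNEL_SRC = """
__kernel void double_vec(__global const float *a, __global float *out) {
    int i = get_global_id(0);
    out[i] = 2.0f * a[i];
}
"""

def double_on_gpu(host_a):
    """Run the kernel on a float32 numpy array and return the result.
    Imports are lazy so the module loads without an OpenCL runtime."""
    import numpy as np
    import pyopencl as cl
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, host_a.nbytes)
    prog = cl.Program(ctx, KERNEL_SRC).build()
    prog.double_vec(queue, host_a.shape, None, a_buf, out_buf)
    result = np.empty_like(host_a)
    cl.enqueue_copy(queue, result, out_buf)
    return result
```

The division of labor is the whole appeal: numerics live in one short kernel string, and Python stays the glue language.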

~~~
Athas
What about something like Futhark[0] (which I am working on)? It also has
Python interop[1], although not quite as elegant as Lift.

[0]: [http://futhark-lang.org](http://futhark-lang.org)

[1]: [http://futhark-lang.org/blog/2016-04-15-futhark-and-pyopencl.html](http://futhark-lang.org/blog/2016-04-15-futhark-and-pyopencl.html)

~~~
gnarbarian
Can I write code in Futhark that shares memory with objects being rendered in
OpenGL? In other words, can it operate directly on objects being rendered?

I'd like to be able to write physics simulations that are visualized in 3D
without having to copy the points in and out of Futhark's memory (which kills
performance).

~~~
Athas
Not yet. But you can already write code that shares memory with objects that
you also access from hand-written OpenCL code, and since OpenCL can interop
with OpenGL, what you are asking for should eventually be possible.

~~~
gnarbarian
Ahh yes. I have done something like that before with PyOpenCL, but it was not
pretty:

[https://www.youtube.com/watch?v=lnOmy1ly6M0&list=PLCN-Ml6vUJ-JTiqdvcHd0vZVSXDpw_RcW](https://www.youtube.com/watch?v=lnOmy1ly6M0&list=PLCN-Ml6vUJ-JTiqdvcHd0vZVSXDpw_RcW)

~~~
Athas
I have exactly that example here: [https://github.com/HIPERFIT/futhark-benchmarks/tree/master/accelerate/nbody](https://github.com/HIPERFIT/futhark-benchmarks/tree/master/accelerate/nbody)

All simulation and rendering is done in Futhark, with Python+Pygame for gluing
things together. All the particle information stays on the GPU at all times.
The only thing being copied back to the CPU is the rendered bitmap, which is
then immediately moved back with a Pygame blit operation...
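For context on what stays GPU-resident: the core of such a simulation is just the gravitational update. A pure-Python sketch of one naive O(n²) step (illustrative only; Futhark's benchmark code is structured differently and runs entirely on the GPU):

```python
import math

def nbody_step(pos, vel, mass, dt=0.01, eps=1e-3):
    """One naive O(n^2) gravitational step in 2D, softened by eps to
    avoid division by zero. pos and vel are lists of (x, y) tuples."""
    n = len(pos)
    acc = []
    for i in range(n):
        ax = ay = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += mass[j] * dx * inv_r3
            ay += mass[j] * dy * inv_r3
        acc.append((ax, ay))
    vel = [(vx + ax * dt, vy + ay * dt)
           for (vx, vy), (ax, ay) in zip(vel, acc)]
    pos = [(x + vx * dt, y + vy * dt)
           for (x, y), (vx, vy) in zip(pos, vel)]
    return pos, vel
```

Doing this per frame on the CPU is exactly the copy-in/copy-out pattern being avoided above: the GPU version keeps `pos` and `vel` resident and only ships the rendered bitmap back.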

------
mlhulk
After doing some benchmarks or reading the source, you'll find that the
author was just lying about its performance: he lacks basic knowledge of
GPGPU optimization techniques and misuses isl to generate low-quality but
obfuscated OpenCL kernel code, which is hard to see through at first. He also
mistakes "S-expressions" for Lisp, which is ridiculous.

------
Athas
Can someone show how the generated code for matrix multiply looks?

~~~
bhuztez
The OpenCL generation is mostly stolen from ppcg
([http://ppcg.gforge.inria.fr/](http://ppcg.gforge.inria.fr/)). Unlike ppcg,
local memory is not properly handled right now; you can see that there are
some complicated expressions here.

    __kernel void kernel0( __global float v0[8][8], __global float v2[8][8]){
        __local float local_v0[2][2][16];
        float private_v2[2][2];
        int b0 = get_group_id(0);
        int b1 = get_group_id(1);
        int t0 = get_local_id(0);
        int t1 = get_local_id(1);

        for(int c2 = 0; (c2 <= 15); c2 = c2 + 1){
            if(((((((((30 * t0) + (31 * t1)) + (16 * b0)) + (28 * c2)) + 31) % 32) >= 16) || (b1 == t0))){
                local_v0[t0][t1][c2] = (v0[((((2 * t0) + t1) + (4 * c2)) / 8)][((((2 * t0) + t1) + (4 * c2)) % 8)]);
            }
        }

        barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
        for(int c0 = (2 * b0); (c0 <= 7); c0 = c0 + 4){
            for(int c1 = (2 * b1); (c1 <= 7); c1 = c1 + 4){
                private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)] = 0.000000;
                for(int c2 = 0; (c2 <= 3); c2 = c2 + 1){
                    for(int c5 = (2 * c2); (c5 <= ((2 * c2) + 1)); c5 = c5 + 1){
                        private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)] = ((private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)]) + ((local_v0[(c2 % 2)][((-2 * c2) + c5)][(((2 * t0) + (2 * c0)) + (c2 / 2))]) * (local_v0[b1][t1][((((-2 * b1) + c1) / 4) + (2 * c5))])));
                    }
                }
                private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)] = (private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)]);
            }
        }

        for(int c0 = 0; (c0 <= 1); c0 = c0 + 1){
            for(int c1 = 0; (c1 <= 1); c1 = c1 + 1){
                v2[(((2 * b0) + t0) + (4 * c0))][(((2 * b1) + t1) + (4 * c1))] = (private_v2[c0][c1]);
            }
        }

        barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
    }
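For comparison (editor's sketch, not part of the original comment): the computation this kernel is generated for is an 8×8 matrix product, which in its straightforward form is just a triple loop. A reference version in Python, useful for checking what the tiled kernel above is supposed to produce:

```python
def matmul(a, b):
    """Naive triple-loop matrix product: out[i][j] = sum_k a[i][k] * b[k][j].
    Reference only; the generated kernel tiles this across work-groups."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                out[i][j] += a[i][k] * b[k][j]
    return out
```

Everything else in the generated kernel (the `local_v0` staging, `private_v2` accumulators, the index arithmetic) is tiling and memory-placement machinery around these three loops.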

~~~
gnarbarian
Man, I think it dropped one of these: )

------
Chris2048
Title feels like it was Markov-chain generated :-)

------
TylerE
Might want to consider a name change...

[http://www.liftweb.net/](http://www.liftweb.net/) is a well-established
project dating back years.

~~~
andrewchambers
The thing you need to understand is that the Simpsons already did everything
years ago, and that's OK.

