Am I right that this language compiles to GPU code?
I'm thinking of a similar project myself, and I'm curious what considerations besides loop vectorization go into something like this. In particular, what about caches and memory access issues? (OK, I ask the same question about any project like this.)
Also, isn't one factor in sparse representations that if you aren't careful, the data becomes un-sparse and slows down a lot?
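To make the "un-sparse" worry concrete, here is a toy sketch in plain Python. It is not Taichi code: the dict-based grid and the helper names `scale` and `add_constant` are made up for illustration. The point is only that an operation which visits stored entries keeps the structure sparse, while one that touches every index materializes every cell.

```python
# Toy sparse 1-D grid: a dict that stores only the active (non-zero) cells.
# Everything here is hypothetical illustration, not any library's API.

def scale(grid, c):
    # Visits only the stored entries, so the sparsity pattern is preserved.
    return {i: v * c for i, v in grid.items()}

def add_constant(grid, n, c):
    # Visits every index in [0, n), so every cell becomes an explicit entry.
    return {i: grid.get(i, 0.0) + c for i in range(n)}

n = 1000
grid = {3: 1.0, 250: 1.0, 999: 1.0}   # 3 active cells out of 1000

scaled = scale(grid, 2.0)             # still 3 stored cells
shifted = add_constant(grid, n, 0.5)  # now 1000 stored cells

print(len(grid), len(scaled), len(shifted))
```

After the second operation the "sparse" container is doing dense work plus the bookkeeping overhead, which is presumably where the slowdown comes from.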
Today is your lucky day! Taichi is also a C++ library with first-class Python bindings: https://github.com/yuanming-hu/taichi
However, as a C guy, I really don't want to give up control of my data structures.