
GPU LSM: A Dynamic Dictionary Data Structure for the GPU [pdf] - espeed
https://arxiv.org/abs/1707.05354
======
tmostak
I was about to share this with the team at OmniSci (GPU analytics platform)
and realized the authors included our very own Saman Ashkiani and adviser John
Owens.

Having fast dynamic data structures on the GPU is of huge utility. People
think that you can't make these sorts of things efficient due to thread
divergence, but if you do it right, the massive flops and memory bandwidth of
a GPU can really work in your favor.
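
To make the "do it right" point concrete: the paper's LSM approach amortizes updates by taking them in sorted batches and merging full levels upward, so the GPU only ever runs bulk sorts and merges rather than per-element pointer chasing. Here is a minimal CPU-side sketch of that shape (class and parameter names are mine, not the paper's; the real GPU LSM is far more involved):

```python
import bisect

class TinyLSM:
    """Minimal sketch of the log-structured-merge idea: updates arrive in
    fixed-size sorted batches; a full level of size b*2^i merges into the
    next level, like carry propagation in binary addition. Lookups
    binary-search a handful of sorted runs. (Deletions and duplicate keys
    are ignored here; the GPU version replaces sorted() with massively
    parallel sort/merge primitives.)"""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.levels = []   # levels[i] is a sorted run of (key, value) pairs

    def insert_batch(self, pairs):
        run = sorted(pairs)                     # GPU analogue: parallel sort
        assert len(run) == self.batch_size
        i = 0
        while i < len(self.levels) and self.levels[i]:
            run = sorted(run + self.levels[i])  # GPU analogue: parallel merge
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append([])
        self.levels[i] = run

    def lookup(self, key):
        for level in self.levels:   # smaller levels hold newer batches
            j = bisect.bisect_left(level, (key,))
            if j < len(level) and level[j][0] == key:
                return level[j][1]
        return None
```

Every operation above is a sort, a merge, or a binary search over contiguous arrays, which is exactly the kind of bulk, divergence-free work where GPU bandwidth pays off.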

~~~
espeed
Hi Todd - Has your team looked at GraphBLAS [1] yet for MapD [2] now that Tim
Davis has released a 2.0 reference C implementation (included in SuiteSparse
5.3) [3]? There is a GraphBLAS 2.0 CPU implementation in RedisGraph now [4]
and official GraphBLAS 2.x GPU implementations are in the works.

[1] GraphBLAS [http://graphblas.org](http://graphblas.org)

[2] OmniSci MapD DB [https://github.com/omnisci/mapd-core](https://github.com/omnisci/mapd-core)

[3] SuiteSparse::GraphBLAS
[http://faculty.cse.tamu.edu/davis/suitesparse.html](http://faculty.cse.tamu.edu/davis/suitesparse.html)

[4] Previous Discussion
[https://news.ycombinator.com/item?id=18099520](https://news.ycombinator.com/item?id=18099520)

~~~
tmostak
I've definitely been hearing about the project but didn't know it was being
ported to GPU, that's great news. How does this compare to things like
Gunrock?

~~~
ctchocula
Hi Todd!

Our recent work implements a subset of GraphBLAS operations for the GPU and
compares them to Gunrock on breadth-first search [1]. Our implementation is
comparable to Gunrock's performance on power-law graphs, but worse on mesh
graphs. Gunrock uses a different load balancer in Advance for those graphs,
and the load balancer we use in the analogous operation (matrix-vector
multiplication) isn't as well optimized for mesh graphs. We definitely want to
collect data on more applications than just BFS, so we're working on that now.
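
The matrix-vector view of BFS can be sketched in a few lines: one BFS level is one multiplication of the frontier vector by the transposed adjacency matrix over a boolean (OR, AND) semiring, with already-visited vertices masked out. A plain-Python sketch of that formulation (the function name and adjacency-list encoding are mine, not from the paper):

```python
def bfs_spmv(adj, source):
    """Breadth-first search phrased as iterated sparse matrix-vector
    multiplication: each round computes y = A^T x over the boolean
    (OR, AND) semiring, where x is the current frontier, masked by the
    set of already-visited vertices. `adj` maps a vertex to its
    out-neighbours, i.e. the adjacency matrix in sparse row form."""
    depth = {source: 0}   # visited set, with BFS level per vertex
    frontier = {source}   # the sparse vector x
    level = 0
    while frontier:
        level += 1
        nxt = set()
        for u in frontier:            # for every nonzero of x ...
            for v in adj.get(u, ()):  # ... OR in row u of A^T
                if v not in depth:    # mask out visited vertices
                    depth[v] = level
                    nxt.add(v)
        frontier = nxt                # y becomes next round's x
    return depth
```

The load-balancing question in this subthread lives in the inner double loop: rows of wildly varying length (power-law graphs) and uniformly short rows (meshes) want different work-distribution strategies on a GPU.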

The code is open-source, so feel free to check it out! [2]

[1] [https://arxiv.org/pdf/1804.03327.pdf](https://arxiv.org/pdf/1804.03327.pdf)

[2] [https://github.com/owensgroup/push-pull](https://github.com/owensgroup/push-pull)

------
vanderZwan
> _We believe that our GPU LSM is the first dynamic general-purpose dictionary
> data structure for the GPU._

And now we wait and see if this is true, or if we'll see Cunningham's Law in
action[0]. Either way we'll learn something.

Anyway, a general-purpose dictionary data structure for the GPU sounds like it
should unlock a whole lot of new possible use-cases, no? Are there any
scenarios where this would be immediately beneficial?

[0]
[https://meta.wikimedia.org/wiki/Cunningham's_Law](https://meta.wikimedia.org/wiki/Cunningham's_Law)

~~~
The_rationalist
I may actually use it for my natural language logic checker.

(This experimental program analyses human text and detects whether it contains
a reasoning error and, if so, which one.) It already has a 100% success rate on
syllogisms. But in the future I will be hit by combinatorial explosion, and GPU
acceleration will help me, even though dictionaries are not my main data
structure.

~~~
martindevans
That sounds incredible. Is it publicly available somewhere?

~~~
The_rationalist
Thank you :) No, it's still at too early an alpha stage.

I won't describe my algorithms, but I will say that I don't use overhyped
neural networks (which are not well designed for true NLU); instead I use a
conventional program, like OpenCog, which allows top-down inferences.

~~~
vanderZwan
Speaking of which, have you seen this? Maybe this alternative strategy fits
your problem better than the standard _"overhyped neural networks"_ ;)

Good luck with your research!

[https://www.quantamagazine.org/new-ai-strategy-mimics-how-br...](https://www.quantamagazine.org/new-ai-strategy-mimics-how-brains-learn-to-smell-20180918/)

------
amelius
I was under the impression that GPUs were mainly optimized for data locality,
while other access patterns are better left to the CPU with its sophisticated
(in comparison) cache hierarchy. It seems this data structure works against
that principle.

~~~
The_rationalist
As a side note I believe the whole purpose of HSA is to diminish gpu data
access latency by being more synchronised with the cpu.

[http://www.hsafoundation.com/](http://www.hsafoundation.com/)

~~~
monocasa
No, it's to eliminate manual translation of data between the CPU and the GPU
by having their virtual memory maps match. Then you can simply share pointers
back and forth between the CPU and GPU.

