
Towards Yinyang K-means on GPU - tanoku
http://blog.sourced.tech/post/towards_kmeans_on_gpu/
======
nl
This is pretty interesting. Is the author here? I have something to point them
at. @nlothian on Twitter, or email in my profile.

~~~
vmarkovtsev
Here. Please don't bite :)

~~~
fnord123
Do you have any interest in implementing other clustering algorithms on GPU?
e.g. HDBSCAN? Or is it not as parallelizable?

~~~
cs702
Agree on HDBSCAN/DBSCAN, which are able to find the number of clusters in a
large class of problems (unlike K-means, which requires that the number of
clusters/centroids be provided as a hyperparameter, or found via some kind of
search).

Otherwise, I just want to say to vmarkovtsev: thank you for this -- I will add
it to my arsenal of tools, and many others will surely do so as well.

~~~
vmarkovtsev
Thanks. Actually, I like DBSCAN a lot and use it often, though I am not very
familiar with its internals. It looks like it is iterative and thus does not
map very well onto a GPU. The only way I see is to pick several seed points at
the start...
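
To make the "iterative" point concrete, here is a minimal textbook-style
DBSCAN sketch in plain Python. It is not taken from the article or from any GPU
implementation; the names `region_query` and `dbscan` and the brute-force
neighbor search are assumptions made for illustration. The neighbor queries are
independent of each other and look easy to parallelize, while the cluster
expansion loop depends on labels assigned in earlier steps, which is probably
the part that maps less naturally onto a GPU.

```python
from collections import deque
from math import dist  # Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, O(n))."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster
        frontier = deque(neighbors)
        while frontier:  # sequential expansion, one point at a time
            j = frontier.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                frontier.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Note that the number of clusters is never passed in: it falls out of `eps` and
`min_pts`, which is the contrast with K-means mentioned upthread.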

~~~
cs702
A Google search reveals this paper:
https://arxiv.org/abs/1506.02226

This paper claims a "97x improvement" over traditional (non-parallelized)
DBSCAN algorithms, but that's not a very helpful claim, because it does not
indicate what the computational costs are as a function of, say, the number of
data points or dimensions.

~~~
vmarkovtsev
97x improvement is actually very suspicious. Thanks for the article!

