
A gentle introduction to HDBSCAN and density-based clustering - polm23
https://towardsdatascience.com/a-gentle-introduction-to-hdbscan-and-density-based-clustering-5fd79329c1e8
======
fluffet
Good article.

I used HDBSCAN in my master's thesis. It works well with high-dimensional
data. If you're using it on high-dimensional data, I would recommend Uniform
Manifold Approximation and Projection (UMAP) for visualisation. I think it is
made by the same author as HDBSCAN.

I wish they had also talked about Density-Based Clustering Validation (DBCV),
which can be used to calculate the mathematical stability of the clusters
(for tuning hyperparameters), apart from just looking at hierarchies.

~~~
bbischof
This is sorta true, but not quite. Leland McInnes and John Healy (the creators
of UMAP) do in fact have an amazing paper on HDBSCAN, but they did not invent
it. In their paper,
[https://arxiv.org/pdf/1705.07321.pdf](https://arxiv.org/pdf/1705.07321.pdf),
they introduce AHDBSCAN, which is a great extension of HDBSCAN that
dramatically improves its performance.

Their work is great, but I just wanted to save people a google in case they
were interested.

~~~
belval
Never heard of AHDBSCAN, thank you for sharing!

------
syntaxing
Is there a difference between HDBSCAN and DBSCAN (commonly used for point
cloud processing)? The idea seems the same, but I guess I never heard it
called HDBSCAN.

~~~
cwyers
HDBSCAN == Hierarchical DBSCAN, an extension of DBSCAN that produces
hierarchical clusters.
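
To make the "hierarchical" part concrete, here is a rough sketch, not the
actual HDBSCAN algorithm. It assumes 1-D points and plain single-linkage
grouping, and it ignores min_samples, core points and noise handling. The
function name `components_at` and the sample data are made up for
illustration. A single distance threshold (roughly what a fixed DBSCAN eps
gives you) yields one flat clustering, while sweeping the threshold shows the
nested levels that a hierarchical method keeps track of.

```python
def components_at(points, eps):
    """Group 1-D points: neighbours no farther apart than eps share a cluster.

    Simplified single-linkage cut; no min_samples or noise handling.
    """
    pts = sorted(points)
    clusters = [[pts[0]]]
    for prev, cur in zip(pts, pts[1:]):
        if cur - prev <= eps:
            clusters[-1].append(cur)   # close enough: extend current cluster
        else:
            clusters.append([cur])     # gap too large: start a new cluster
    return clusters


# Two dense groups near 0 and 1, plus three sparse points further out.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 10.0, 12.0, 14.0]

for eps in (0.5, 2.5, 9.0):
    print(eps, components_at(points, eps))
```

With the small threshold the two dense groups are separate and the sparse
points are each alone; with the middle one the dense groups merge and the
sparse points form their own cluster; with the large one everything is a
single cluster. No single threshold captures both the dense and the sparse
structure, which is the motivation for keeping the whole hierarchy.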

------
bra-ket
Any article on clustering should mention ANN benchmarks and the state of the art
[http://ann-benchmarks.com/](http://ann-benchmarks.com/)

~~~
Der_Einzige
Uhh, why? They're not really related to each other...

~~~
aflam
ANN is the building block of density estimation. If I recall correctly, that's
the bottleneck for density-based clustering, as its various algorithms then
take advantage of even faster algorithms (sort, union-find, spanning
trees...). While very interesting, I am not sure this gentle introduction is
the best place to discuss ANN.
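
A minimal sketch of where the nearest-neighbour step sits, assuming the usual
HDBSCAN-style definitions: core distance is the distance to the k-th nearest
neighbour, and mutual reachability is max(core(a), core(b), d(a, b)). The
function names and the convention that k excludes the point itself are my
assumptions, and the neighbour search here is brute force. That brute-force
loop is the part an ANN index would replace; the remaining steps are a sort
and a union-find pass to build the spanning tree.

```python
import math
from itertools import combinations


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def mutual_reachability_mst(points, k):
    """Kruskal's algorithm on the mutual reachability graph.

    Returns the MST edges as (weight, i, j), in increasing weight order.
    """
    core = core_distances(points, k)
    # Sort all pairwise edges by mutual reachability distance.
    edges = sorted(
        (max(core[i], core[j], math.dist(points[i], points[j])), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # edge joins two components: keep it
            parent[ri] = rj
            mst.append((weight, i, j))
    return mst


# Two tight groups of three points, far apart from each other.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
mst = mutual_reachability_mst(points, k=2)
for weight, i, j in mst:
    print(f"{i} -- {j}: {weight:.2f}")
```

The last edge printed is the long bridge between the two groups; removing the
heaviest edges first is what splits the tree into clusters.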

