"Accelerated Hierarchical Density Clustering"
https://arxiv.org/pdf/1705.07321
from HDBSCAN
This paper proposes a method to speed up the HDBSCAN* algorithm. The main points are:
The paper describes the HDBSCAN* algorithm from three different perspectives: statistical, computational, and topological.
Each of the three explanations illuminates a different aspect of the same algorithm.
Because all three approaches arrive at the same algorithm, combining perspectives from different research areas gives a deeper understanding of it.
The terms that appear in the paper are explained below:
cluster tree
Robust Single Linkage Algorithm
An improved version of conventional single linkage that is more robust against noise. It uses two parameters, k and α. Clustering becomes more stable because the k-neighborhood around each point is taken into account.
core point
An important concept in density-based clustering. A point that has at least k points within its neighborhood of radius ε. It represents the "core" of a cluster.
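The definition of a core point can be sketched in a few lines of Python. This is only an illustration: the function names `neighbors_within` and `core_points` are made up for this note, and counting the point itself as part of its own neighborhood is an assumption (conventions differ between implementations).

```python
import math

def neighbors_within(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def core_points(points, k, eps):
    """Indices of points whose eps-neighborhood contains at least k points."""
    return [i for i in range(len(points))
            if len(neighbors_within(points, i, eps)) >= k]

# Four points on a unit square plus one far-away outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
core = core_points(points, k=3, eps=1.0)
print(core)  # the square corners are core points, the outlier is not
```

With k=3 and ε=1.0, every corner of the square sees itself and its two adjacent corners, so it qualifies; the outlier sees only itself.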
epsilon reachability
A relation between two core points. If each point lies within the ε-neighborhood of the other, the two points are "ε-reachable". Used as a criterion for cluster formation.
mutual reachability distance
A new distance metric between two points. It combines the core distance (the distance to the k-th nearest point) with the actual distance between the points. A distance measure that takes differences in density between clusters into account.
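As a rough sketch, the mutual reachability distance is usually written as max(core_k(a), core_k(b), d(a, b)). The Python below follows that form. The function names are hypothetical, and excluding the point itself when finding the k-th nearest point is an assumption made here for simplicity.

```python
import math

def core_distance(points, i, k):
    """Distance from points[i] to its k-th nearest other point."""
    dists = sorted(math.dist(points[i], q)
                   for j, q in enumerate(points) if j != i)
    return dists[k - 1]

def mutual_reachability(points, i, j, k):
    """max of both core distances and the actual distance between the two points."""
    return max(core_distance(points, i, k),
               core_distance(points, j, k),
               math.dist(points[i], points[j]))

# Three points close together and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 0.0)]
d_dense = mutual_reachability(points, 0, 1, k=2)   # both points in the dense part
d_sparse = mutual_reachability(points, 2, 3, k=2)  # one point is isolated
```

In the dense pair the result stays close to the actual distance. For the pair involving the isolated point, the large core distance of that point dominates, which pushes sparse points away from everything else.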
single-linkage clustering
A hierarchical clustering method based on the distance between the closest pair of points. Clusters are merged in order of minimum distance. Characterized by the chaining effect (elongated clusters form easily).
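The chaining effect is easy to reproduce. The sketch below is not the paper's implementation; it is a minimal single-linkage cut using a union-find structure, with a hypothetical function name `single_linkage` and a fixed distance threshold chosen for the example.

```python
import math
from itertools import combinations

def single_linkage(points, threshold):
    """Merge clusters by closest pair until the next pair is farther than threshold."""
    parent = list(range(len(points)))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise distances, processed from shortest to longest.
    edges = sorted((math.dist(points[a], points[b]), a, b)
                   for a, b in combinations(range(len(points)), 2))
    for d, a, b in edges:
        if d > threshold:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters
    return [find(i) for i in range(len(points))]

# Six points in a row, spaced 1 apart, plus one distant point.
chain = [(float(x), 0.0) for x in range(6)] + [(20.0, 0.0)]
labels = single_linkage(chain, threshold=1.0)
```

The two ends of the row are 5 apart, far more than the threshold, yet they receive the same label because each neighboring pair is within distance 1. That is the chaining effect.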
topological data analysis
Mathematical methods for studying the topological properties of data. They analyze the shape and structure of data in terms of topological spaces, and extract features that are invariant under continuous deformation.
persistent homology
A method for measuring the "lifetime" of topological features. It quantifies the range of scales over which a feature is present. Helps to discern the essential structure of the data.
Lesnick complex
A type of density-based simplicial complex. Based on the Vietoris-Rips complex, with density information incorporated. A computationally efficient structure.
simplex
A single point, line segment, triangle, and so on. The basic building blocks used in topological data analysis. A means of representing the topological structure of data.
sheaf theory
A mathematical theory dealing with continuously varying sets on topological spaces. Used to describe continuous changes in cluster structure. Provides a more general mathematical framework.
persistence score
An indicator of the importance of a cluster. It quantifies the range of density levels over which the cluster continues to exist. Used as a criterion for cluster extraction.
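This note does not give the formula, so the following is an assumption: it uses the "stability" form from the original HDBSCAN* work, where λ = 1/distance and a cluster's score is the sum over its points of (λ at which the point leaves the cluster minus λ at which the cluster was born). The function name and the numbers are made up for illustration.

```python
def persistence_score(lambda_birth, lambda_leave):
    """Sum over points of (lambda when the point leaves - lambda when the cluster appears)."""
    return sum(lam - lambda_birth for lam in lambda_leave)

# Hypothetical cluster: it appears at distance 2.0, and its three points
# drop out of it at distances 1.0, 0.5 and 0.5.
lambda_birth = 1 / 2.0
lambda_leave = [1 / 1.0, 1 / 0.5, 1 / 0.5]
score = persistence_score(lambda_birth, lambda_leave)
```

A cluster whose points stay together over a wide range of density levels gets a high score, which is why such a score can serve as a criterion when choosing which clusters to extract.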
This page is auto-translated from [/nishio/Accelerated Hierarchical Density Clustering](https://scrapbox.io/nishio/Accelerated Hierarchical Density Clustering) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.