2024-11-30
Scripts that find and explain dense chunks of data, rather than sorting the entire dataset into clusters
background
Explanation of usage and implementation
*.ipynb) to demonstrate what results can be obtained, but you can also cut out the Python code parts and use them as Python scripts.
After In [4] runs, the output is: Num dense clusters 48
hdb = HDBSCAN(min_cluster_size=5, max_cluster_size=30, min_samples=2)
min_samples=2: the larger this value is, the more smoothing occurs in the density calculation, so the output is more likely to be "roughly speaking, it's all one lump".
min_cluster_size=5 means "extract the places where 5 or more items are densely clustered".
max_cluster_size=30 was added with the intention of splitting up large clusters of 30 or more entries and looking at them in detail, but it may be better not to use it, since it splits large clusters into several clusters with similar contents.
cluster_selection_method="leaf" is not specified here.
In [6], an AI commentary is generated from In [7].
The prompt contains this part: "Output the interestingness of the nameplate on a scale of 100 points. Give 0 points to those that are commonplace and 100 points to those that you noticed new things about."
After seeing the results this time, I'm thinking that this part was not so valid.
orthographical variants
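The HDBSCAN settings described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes that HDBSCAN refers to scikit-learn's sklearn.cluster.HDBSCAN (version 1.3 or later; the notebook may instead use the separate hdbscan package), and the names group_dense_clusters, extract_dense_clusters and embeddings are hypothetical, introduced only for illustration. Points that HDBSCAN labels as noise are dropped, so only the dense chunks remain.

```python
from collections import defaultdict


def group_dense_clusters(labels):
    """Group item indices by cluster label, skipping noise (negative labels)."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        label = int(label)  # also accepts NumPy integer labels
        if label < 0:  # HDBSCAN marks points outside any dense cluster as -1
            continue
        groups[label].append(index)
    return dict(groups)


def extract_dense_clusters(embeddings):
    """Run HDBSCAN with the settings quoted in the text and keep only dense clusters.

    Assumption: scikit-learn >= 1.3, whose HDBSCAN accepts max_cluster_size.
    `embeddings` is a hypothetical (n_samples, n_features) array.
    """
    # Imported here so that group_dense_clusters has no third-party dependency.
    from sklearn.cluster import HDBSCAN

    hdb = HDBSCAN(min_cluster_size=5, max_cluster_size=30, min_samples=2)
    labels = hdb.fit_predict(embeddings)
    return group_dense_clusters(labels)
```

Under these assumptions, len(extract_dense_clusters(embeddings)) would correspond to the "Num dense clusters" figure, and each value in the returned dict lists the item indices that belong to one dense cluster.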
This page is auto-translated from /nishio/濃いクラスタ抽出 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.