Improve Threshold Range of Canopy Clustering Using Optimization Algorithms

Zhang, Ru (2024) Improve Threshold Range of Canopy Clustering Using Optimization Algorithms. Asian Journal of Research in Computer Science, 17 (12). pp. 148-164. ISSN 2581-8260

Zhang17122024AJRCOS128026.pdf - Published Version

Abstract

Canopy clustering determines the number of clusters dynamically, without requiring a predefined cluster count, which makes it particularly suitable for large and complex datasets. However, its performance depends heavily on manual tuning of the threshold parameters T1 and T2, which is time-consuming and inefficient. This study enhances the Canopy clustering algorithm by automating the selection of these thresholds with intelligent optimization algorithms. We propose a novel framework that integrates Simulated Annealing (SA), Particle Swarm Optimization (PSO), and Snake Optimization (SO) to automatically determine the optimal values of T1 and T2. To address the complexity of high-dimensional data, we also apply dimensionality reduction techniques: SNE, t-SNE, and Kernel Principal Component Analysis (KPCA). The silhouette coefficient serves as the fitness function for evaluating clustering performance. Experiments on the Wine, Iris, and MNIST Subset datasets show that the proposed optimization-based Canopy clustering framework improves clustering quality, as measured by the silhouette coefficient, by up to 21% on the Wine dataset and 19% on the Iris dataset compared to traditional methods. On the Wine dataset, the optimized Canopy clustering achieved a silhouette coefficient of 0.63, a 21% improvement over the original 0.52. On the Iris dataset, the optimized method achieved a silhouette coefficient of 0.62, compared with 0.52 for k-means and 0.55 for manually tuned Canopy clustering. These results highlight the effectiveness of intelligent optimization algorithms in enhancing clustering adaptability and efficiency.
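The idea described in the abstract, searching for T1 and T2 with an optimizer that scores each candidate by its silhouette coefficient, can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the synthetic two-blob data, the search bounds, and the annealing schedule are all assumptions made for illustration, and only Simulated Annealing is shown (PSO and SO are omitted). The abstract does not say how canopies are turned into hard labels. Here each point is assigned to its nearest canopy center, so in this sketch the fitness depends only on T2, while T1 controls the looser, possibly overlapping canopy membership.

```python
import math
import random

def canopy(points, t1, t2):
    """Return a list of (center, member_indices), with t1 >= t2 > 0.
    Points within t1 of a center join its canopy; points within t2 are
    removed from the candidate pool and can no longer become centers."""
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        c = points[remaining[0]]  # assumption: deterministic pick of the first candidate
        members = [i for i in range(len(points)) if math.dist(points[i], c) < t1]
        canopies.append((c, members))
        remaining = [i for i in remaining if math.dist(points[i], c) >= t2]
    return canopies

def nearest_center_labels(points, centers):
    """Hard assignment (an assumption of this sketch): nearest canopy center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]

def silhouette(points, labels):
    """Mean silhouette coefficient; -1.0 when fewer than 2 clusters or all singletons."""
    n = len(points)
    if not 2 <= len(set(labels)) < n:
        return -1.0
    total = 0.0
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j != i:
                d = math.dist(points[i], points[j])
                sums[labels[j]] = sums.get(labels[j], 0.0) + d
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:  # singleton cluster contributes 0 by convention
            continue
        a = sums[own] / counts[own]
        b = min(sums[k] / counts[k] for k in sums if k != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def fitness(points, t1, t2):
    centers = [c for c, _ in canopy(points, t1, t2)]
    return silhouette(points, nearest_center_labels(points, centers))

def anneal(points, lo, hi, steps=300, temp=1.0, cooling=0.98, seed=0):
    """Simulated annealing over (t1, t2) in [lo, hi], keeping t1 >= t2."""
    rng = random.Random(seed)
    t2 = rng.uniform(lo, hi)
    t1 = rng.uniform(t2, hi)
    cur = fitness(points, t1, t2)
    best = (cur, t1, t2)
    step = 0.1 * (hi - lo)
    for _ in range(steps):
        n2 = min(hi, max(lo, t2 + rng.gauss(0, step)))
        n1 = min(hi, max(n2, t1 + rng.gauss(0, step)))
        cand = fitness(points, n1, n2)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand >= cur or rng.random() < math.exp((cand - cur) / temp):
            t1, t2, cur = n1, n2, cand
            if cur > best[0]:
                best = (cur, t1, t2)
        temp *= cooling
    return best

def blob(rng, cx, cy, n=20, sigma=0.3):
    return [(cx + rng.gauss(0, sigma), cy + rng.gauss(0, sigma)) for _ in range(n)]

data_rng = random.Random(1)
points = blob(data_rng, 0, 0) + blob(data_rng, 5, 5)  # synthetic data, not from the paper
score, t1, t2 = anneal(points, lo=0.1, hi=8.0)
print(f"best silhouette={score:.3f} at T1={t1:.2f}, T2={t2:.2f}")
```

Swapping `anneal` for a different optimizer only requires another routine that proposes `(t1, t2)` pairs and calls `fitness`. A dimensionality reduction step such as the ones named in the abstract would be applied to `points` before the search.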

Item Type: Article
Subjects: STM Digital Press > Computer Science
Depositing User: Unnamed user with email support@stmdigipress.com
Date Deposited: 08 Jan 2025 05:17
Last Modified: 08 Jan 2025 05:17
URI: http://digitallibrary.eprintscholarlibrary.in/id/eprint/1582
