A Comprehensive R Package for Clusterability Testing

Open Access
|May 2026

References

  1. Xu R, Wunsch DC. Survey of clustering algorithms. IEEE Transactions on Neural Networks; 2005. DOI: 10.1109/TNN.2005.845141
  2. Ackerman M, Dasgupta S. Incremental clustering: The case for extra clusters. In: Advances in Neural Information Processing Systems; 2014. pp. 307–315.
  3. Ackerman M, Ben-David S, Brânzei S, Loker D. Weighted clustering. Proceedings of the AAAI Conference on Artificial Intelligence. 2012;26.
  4. Adolfsson A, Ackerman M, Brownstein NC. To cluster, or not to cluster: An analysis of clusterability methods. Pattern Recognition. 2019;88:13–26. DOI: 10.1016/j.patcog.2018.10.026
  5. Zhang L, Seo B, Lin L, Li J. OTclust: Mean Partition, Uncertainty Assessment, Cluster Validation and Visualization Selection for Cluster Analysis. R package version 1.0.6; 2023. https://CRAN.R-project.org/package=OTclust.
  6. Wiroonsri N, Preedasawakul O. UniversalCVI: Hard and Soft Cluster Validity Indices. R package version 1.3.0; 2025a. https://CRAN.R-project.org/package=UniversalCVI. DOI: 10.32614/CRAN.package.UniversalCVI
  7. Wiroonsri N, Preedasawakul O. BayesCVI: Bayesian Cluster Validity Index. R package version 1.0.2; 2025b. https://CRAN.R-project.org/package=BayesCVI. DOI: 10.32614/CRAN.package.BayesCVI
  8. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2001;63(2):411–423. DOI: 10.1111/1467-9868.00293
  9. Kłopotek M. An aposteriorical clusterability criterion for k-means++ and simplicity of clustering. SN Computer Science. 2020;1(2):80. DOI: 10.1007/s42979-020-0079-8
  10. Zhang X, You Q. Clusterability analysis and incremental sampling for Nyström extension based spectral clustering. In: 2011 IEEE 11th International Conference on Data Mining. IEEE; 2011. pp. 942–951. DOI: 10.1109/ICDM.2011.35
  11. Maechler M, Rousseeuw P, Struyf A, Hubert M. cluster: “Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al. R package version 2.1.3; 2022. https://CRAN.R-project.org/package=cluster.
  12. Fraley C, Raftery AE, Scrucca L. mclust: Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation. R package version 6.1; 2024. https://CRAN.R-project.org/package=mclust.
  13. Ackerman M, Ben-David S. Clusterability: A theoretical study. In: Artificial Intelligence and Statistics; 2009. pp. 1–8.
  14. Diallo AF, Patras P. Deciphering clusters with a deterministic measure of clustering tendency. IEEE Transactions on Knowledge and Data Engineering. 2023;36(4):1489–1501. DOI: 10.1109/TKDE.2023.3306024
  15. Laborde J, Stewart PA, Chen Z, Chen YA, Brownstein NC. Sparse clusterability: testing for cluster structure in high dimensions. BMC Bioinformatics. 2023;24:125. DOI: 10.1186/s12859-023-05210-6
  16. Epter S, Krishnamoorthy M, Zaki M. Clusterability detection and cluster initialization. In: Proceedings of the Workshop on Clustering High Dimensional Data and its Applications at the 2nd SIAM International Conference on Data Mining; 2002. pp. 47–58.
  17. Bezdek JC, Hathaway RJ. VAT: A tool for visual assessment of (cluster) tendency. In: Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN’02 (Cat. No. 02CH37290), volume 3. IEEE; 2002. pp. 2225–2230. DOI: 10.1109/IJCNN.2002.1007487
  18. Wang L, Nguyen UTV, Bezdek JC, Leckie CA, Ramamohanarao K. iVAT and aVAT: enhanced visual analysis for cluster tendency assessment. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer; 2010. pp. 16–27. DOI: 10.1007/978-3-642-13657-3_5
  19. Zahn CT. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers. 1971;100(1):68–86. DOI: 10.1109/T-C.1971.223083
  20. Simovici D, Hua K. Data ultrametricity and clusterability. In: Journal of Physics: Conference Series, volume 1334. IOP Publishing; 2019. pp. 012002. DOI: 10.1088/1742-6596/1334/1/012002
  21. Nowakowska E, Koronacki J, Lipovetsky S. Clusterability assessment for Gaussian mixture models. Applied Mathematics and Computation. 2015;256:591–601. DOI: 10.1016/j.amc.2014.12.038
  22. Mircea M, Hochane M, Fan X, Chuva de Sousa Lopes SM, Garlaschelli D, Semrau S. Phiclust: a clusterability measure for single-cell transcriptomics reveals phenotypic subpopulations. Genome Biology. 2022;23(1):18. DOI: 10.1186/s13059-021-02590-x
  23. Kaminski G, Odonkor P. The building clusterability index: A scalable framework for spatially coordinated DER deployment in urban building stocks. Energy and Buildings; 2025. p. 116708. DOI: 10.2139/ssrn.5352566
  24. Hopkins B, Skellam JG. A new method for determining the type of distribution of plant individuals. Annals of Botany. 1954;18(2):213–227. ISSN 0305-7364. DOI: 10.1093/oxfordjournals.aob.a083391
  25. Hastie T, Stuetzle W. Principal curves. Journal of the American Statistical Association. 1989;84(406):502–516. DOI: 10.1080/01621459.1989.10478797
  26. Helgeson ES, Vock DM, Bair E. Nonparametric cluster significance testing with reference to a unimodal null distribution. Biometrics. 2021;77(4):1215–1226. DOI: 10.1111/biom.13376
  27. Wright K. Will the real Hopkins statistic please stand up? R Journal. 2022;14(3). DOI: 10.32614/RJ-2022-055
  28. Hu L, Dong J, Jiang M, Liu Y, He Z. Clusterability test for categorical data. Knowledge and Information Systems. 2025;67(5):4113–4138. DOI: 10.1007/s10115-024-02317-x
  29. Rakic T, Gadasina L. Difficulties of cluster analysis on mixed-type variables. In: 2025 International Russian Smart Industry Conference (SmartIndustryCon). IEEE; 2025. pp. 411–417. DOI: 10.1109/SmartIndustryCon65166.2025.10985999
  30. Gao P, Zhang P. Testing clusterability on labeled graphs. Tsinghua Science and Technology. 2025. DOI: 10.26599/TST.2025.9010180
  31. Li Y, Yang D, Li J. Testing higher-order clusterability on graphs. Journal of Combinatorial Optimization. 2025;49(3):51. DOI: 10.1007/s10878-025-01262-x
  32. Miasnikof P, Shestopaloff AY, Raigorodskii A. Statistical power, accuracy, reproducibility and robustness of a graph clusterability test. International Journal of Data Science and Analytics. 2023;15(4):379–390. DOI: 10.1007/s41060-023-00389-6
  33. Chiplunkar A, Kapralov M, Khanna S, Mousavifar A, Peres Y. Testing graph clusterability: Algorithms and lower bounds. In: 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS). IEEE; 2018. pp. 497–508. DOI: 10.1109/FOCS.2018.00054
  34. He Z, Li X, Hu L, Jiang M, Liu Y. Community structure testing by counting frequent common neighbor sets. Information Sciences. 2025;691:121649. DOI: 10.1016/j.ins.2024.121649
  35. Hartigan JA, Hartigan PM. The dip test of unimodality. The Annals of Statistics. 1985;70–84. DOI: 10.1214/aos/1176346577
  36. Silverman BW. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society. Series B (Methodological). 1981;97–99. DOI: 10.1111/j.2517-6161.1981.tb01155.x
  37. Jolliffe IT. Principal component analysis, Second edition. Springer; 2002.
  38. Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. Journal of Computational and Graphical Statistics. 2006;15(2):265–286. DOI: 10.1198/106186006X113430
  39. Erichson NB, Zheng P, Manohar K, Brunton SL, Kutz NJ, Aravkin AY. Sparse principal component analysis via variable projection. SIAM Journal on Applied Mathematics. 2020;80(2):977–1002. DOI: 10.1137/18M1211350
  40. Cheng M-Y, Hall P. Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 1998;60(3):579–589. DOI: 10.1111/1467-9868.00141
  41. Neville Z, Brownstein NC. Macros to conduct tests of multimodality in SAS. Journal of Statistical Computation and Simulation. 2018;88(17):3269–3290. DOI: 10.1080/00949655.2018.1509979
  42. Brownstein NC, Adolfsson A, Ackerman M. Descriptive statistics and visualization of data from the R datasets package with implications for clusterability. Data in Brief. 2019;25:104004. DOI: 10.1016/j.dib.2019.104004
  43. Schwaiger F, Holzmann H. Package which implements the silvermantest; 2013. https://www.mathematik.uni-marburg.de/~stochastik/R_packages/.
  44. Hall P, York M. On the calibration of Silverman’s test for multimodality. Statistica Sinica. 2001;515–536.
  45. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria; 2025. https://www.R-project.org/.
  46. Wickham H. testthat: Get started with testing. The R Journal. 2011;3:5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf. DOI: 10.32614/RJ-2011-002
  47. Sievert C. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC; 2020. ISBN 9781138331457. https://plotly-r.com. DOI: 10.1201/9780429447273
  48. Neville Z, Adolfsson A, Ackerman M, Brownstein N. Clusterability macros for the SAS system. Unpublished; 2021+.
  49. Zou H, Hastie T. elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA. R package version 1.3; 2020. https://CRAN.R-project.org/package=elasticnet
  50. Fisher RA. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 1936;7(2):179–188. DOI: 10.1111/j.1469-1809.1936.tb02137.x
  51. Ben-Hur A, Horn D, Siegelmann HT, Vapnik V. Support vector clustering. Journal of Machine Learning Research. 2001;2(Dec):125–137.
  52. Ezekiel M, Fox KA. Methods of correlation and regression analysis, linear and curvilinear. Wiley; 1930.
  53. Hester J, Vaughan D. bench: High Precision Timing of R Expressions. R package version 1.1.4; 2025. https://CRAN.R-project.org/package=bench
  54. Wickham H. R packages: organize, test, document, and share your code. O’Reilly Media, Inc.; 2015.
  55. Wickham H. Advanced R. Chapman and Hall/CRC; 2019.
  56. Maechler M, Ringach D. diptest: Hartigan’s Dip Test Statistic for Unimodality – Corrected. R package version 0.77-2; 2025. https://CRAN.R-project.org/package=diptest
  57. Erichson NB, Zheng P, Aravkin S. sparsepca: Sparse Principal Component Analysis (SPCA). R package version 0.1.2; 2018. https://CRAN.R-project.org/package=sparsepca
  58. Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K. cluster: Cluster Analysis Basics and Extensions. R package version 2.1.8.2; 2026. https://CRAN.R-project.org/package=cluster
DOI: https://doi.org/10.5334/jors.389 | Journal eISSN: 2049-9647
Language: English
Page range: 37 - 37
Submitted on: Aug 24, 2021
Accepted on: Apr 1, 2026
Published on: May 20, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Zachariah Neville, Margareta Ackerman, Andreas Adolfsson, Naomi C. Brownstein, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.