Clustimpute: k-means Clustering with Built-in Missing Data Imputation

By: Oliver Pfaffel  
Open Access | Aug 2025

Abstract

This article introduces a novel k-means clustering methodology and implementation designed to handle missing values efficiently. The method supports multivariate missingness and is computationally efficient because it uses the current cluster assignments to define plausible distributions for the missing values in each sample. Our experiments show that its runtime scales with increasing dataset size comparably to simple random imputation. In terms of clustering performance, measured by the Rand Index against ground-truth labels, the method is competitive with state-of-the-art approaches such as MICE and Amelia, especially when the proportion of missing values is moderate or imputation runtime is a constraint.
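The general idea in the abstract (k-means in which missing values are re-imputed from distributions defined by the current cluster assignments) can be illustrated with a minimal Python sketch. It assumes, as a deliberate simplification, that each missing entry is redrawn at every iteration from the observed values of the same feature within the sample's current cluster, after an initial simple random imputation. It does not reproduce the published implementation. The function names `kmeans_with_cluster_imputation` and `rand_index` are hypothetical, and every feature is assumed to have at least one observed value. The Rand Index helper computes the fraction of sample pairs on which two labelings agree (same cluster in both, or different clusters in both).

```python
import numpy as np


def kmeans_with_cluster_imputation(X, k, n_iter=20, seed=0):
    """Hypothetical sketch: k-means where NaN entries are redrawn each
    iteration from observed values of the same feature in the sample's
    current cluster. A simplification, not the published algorithm."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, p = X.shape
    Xi = X.copy()

    # Initial step: simple random imputation from each feature's observed values
    # (assumes every column has at least one observed value).
    for j in range(p):
        observed = X[~missing[:, j], j]
        Xi[missing[:, j], j] = rng.choice(observed, size=missing[:, j].sum())

    # Initialize centroids with k distinct randomly chosen (imputed) rows.
    centroids = Xi[rng.choice(n, size=k, replace=False)]
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dist = ((Xi[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)

        # Update step: centroid = mean of member rows (kept if cluster is empty).
        for c in range(k):
            members = labels == c
            if members.any():
                centroids[c] = Xi[members].mean(axis=0)

        # Re-imputation step: draw missing values from the current cluster's
        # observed values; fall back to the whole column if the cluster has none.
        for j in range(p):
            for c in range(k):
                rows = missing[:, j] & (labels == c)
                if not rows.any():
                    continue
                pool = X[~missing[:, j] & (labels == c), j]
                if pool.size == 0:
                    pool = X[~missing[:, j], j]
                Xi[rows, j] = rng.choice(pool, size=rows.sum())

    return labels, Xi, centroids


def rand_index(a, b):
    """Fraction of unordered sample pairs on which labelings a and b agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float((same_a == same_b)[upper].mean())
```

In this sketch, observed entries are never modified; only the missing entries change as cluster memberships evolve. The label permutation does not matter for the Rand Index: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is `1.0`.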

DOI: https://doi.org/10.5334/jors.345 | Journal eISSN: 2049-9647
Language: English
Submitted on: Aug 22, 2020 | Accepted on: Aug 7, 2025 | Published on: Aug 18, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Oliver Pfaffel, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.