Lookahead selective sampling for incomplete data

Abstract

Missing values are common in real-world data, and several methods exist for dealing with them. In this paper we present lookahead selective sampling (LSS) algorithms for datasets with missing values. We developed two versions of selective sampling. The first integrates a distance function that can measure the similarity between pairs of incomplete points into the framework of the LSS algorithm. The second uses ensemble clustering to represent the data in a cluster matrix without missing values and then runs the LSS algorithm on the ensemble clustering instance space (LSS-EC). To construct the cluster matrix, we use the k-means and mean shift clustering algorithms, specially modified to deal with incomplete datasets. We tested our algorithms on six standard numerical datasets from different fields. On these datasets we simulated missing values and compared the performance of the LSS and LSS-EC algorithms for incomplete data with that of two other basic methods. Our experiments show that the suggested selective sampling algorithms outperform the other methods.
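The two ideas in the abstract, a distance between incomplete points and a cluster matrix free of missing values, can be illustrated with a minimal sketch. Neither function below is the authors' method: `partial_distance` is a generic partial-distance heuristic (Euclidean distance over the coordinates observed in both points, rescaled to the full dimension), and `cluster_matrix` simply stacks label vectors that are assumed to come from clusterings run elsewhere. All names, and the use of NaN to mark a missing value, are assumptions made for illustration.

```python
import math


def partial_distance(x, y):
    """Illustrative distance between two incomplete points (NaN = missing).

    Assumption: uses only coordinates observed in BOTH points and rescales
    the squared sum by (total dims / shared dims). This is a generic
    heuristic, not the distance function defined in the paper.
    """
    shared = [(a, b) for a, b in zip(x, y)
              if not (math.isnan(a) or math.isnan(b))]
    if not shared:
        return math.nan  # no common observed coordinate: distance undefined
    squared = sum((a - b) ** 2 for a, b in shared)
    return math.sqrt(squared * len(x) / len(shared))


def cluster_matrix(label_vectors):
    """Build a points-by-clusterings matrix from several label vectors.

    Row i holds the cluster label of point i in each clustering, so the
    representation contains no missing values even if the raw data did.
    """
    return [list(row) for row in zip(*label_vectors)]


def hamming(row_a, row_b):
    """Fraction of clusterings in which two points get different labels."""
    return sum(a != b for a, b in zip(row_a, row_b)) / len(row_a)


if __name__ == "__main__":
    nan = math.nan
    print(partial_distance([0.0, nan], [3.0, 4.0]))
    # Hypothetical labels from three clusterings of four points.
    matrix = cluster_matrix([[0, 0, 1, 1], [0, 1, 1, 1], [2, 2, 0, 0]])
    print(matrix[0], matrix[1], hamming(matrix[0], matrix[1]))
```

In a sketch like this, a selective sampling procedure would query labels using `partial_distance` on the raw points in the first variant, or a distance such as `hamming` on the rows of the cluster matrix in the second.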

DOI: https://doi.org/10.1515/amcs-2016-0062 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 871 - 884
Submitted on: Oct 15, 2015
Accepted on: Jul 19, 2016
Published on: Dec 30, 2016
Published by: University of Zielona Góra
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2016 Loai Abdallah, Ilan Shimshoni, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.