Optimal estimator of hypothesis probability for data mining problems with small samples

Open Access | Sep 2012

Abstract

The paper presents a new (to the best of the authors’ knowledge) estimator of probability, called the “Epₕ√2 completeness estimator”, along with a theoretical derivation of its optimality. The estimator is especially suitable for samples with a small number of items, a feature of many real problems characterized by insufficient data. The control parameter of the estimator is not assumed a priori in a subjective way, but is determined on the basis of an optimization criterion (least absolute error). The estimator was compared for accuracy with the universally used frequency estimator of probability and with Cestnik’s m-estimator. The comparison was carried out both theoretically and experimentally. The results show the superiority of the Epₕ√2 completeness estimator over the frequency estimator for the probability interval pₕ ∈ (0.1, 0.9). The frequency estimator is better for pₕ ∈ [0, 0.1] and pₕ ∈ [0.9, 1].
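
The abstract does not state any formulas, so the following Python sketch is only an illustration of the kind of comparison it describes. It uses the relative frequency n_h / n and the standard form of Cestnik’s m-estimate, (n_h + m · prior) / (n + m), and computes the exact expected absolute error of each over all binomial outcomes of a small sample. The choice m = √2 with prior 0.5 is an assumption suggested only by the estimator’s name and is not confirmed by the text above. All function names are hypothetical.

```python
import math


def frequency_estimate(n_h, n):
    """Relative frequency: confirmations n_h out of n sample items."""
    if n == 0:
        raise ValueError("frequency estimate is undefined for an empty sample")
    return n_h / n


def m_estimate(n_h, n, m, prior):
    """Cestnik's m-estimate in its standard form: (n_h + m * prior) / (n + m)."""
    return (n_h + m * prior) / (n + m)


def smoothed_estimate(n_h, n):
    """ASSUMPTION: m = sqrt(2), prior = 0.5 (guessed from the name, not from the abstract)."""
    return m_estimate(n_h, n, math.sqrt(2), 0.5)


def expected_abs_error(estimator, p_true, n):
    """Exact mean absolute error, averaged over the Binomial(n, p_true) outcomes."""
    return sum(
        math.comb(n, k) * p_true**k * (1 - p_true) ** (n - k)
        * abs(estimator(k, n) - p_true)
        for k in range(n + 1)
    )


if __name__ == "__main__":
    n = 3  # a deliberately small sample
    for p in (0.05, 0.5):
        freq_err = expected_abs_error(frequency_estimate, p, n)
        smooth_err = expected_abs_error(smoothed_estimate, p, n)
        print(f"p={p}: frequency={freq_err:.4f}, smoothed={smooth_err:.4f}")
```

Under these assumptions, with n = 3 the smoothed estimate has the lower expected absolute error at p = 0.5, while the frequency estimate has the lower error at p = 0.05. This matches the qualitative pattern reported in the abstract, but it does not reproduce the paper's own derivation or experiments.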

DOI: https://doi.org/10.2478/v10006-012-0048-z | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 629 - 645
Published on: Sep 28, 2012
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2012 Andrzej Piegat, Marek Landowski, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.