Revisiting the Optimal Probability Estimator from Small Samples for Data Mining

By: Bojan Cestnik
Open Access | Dec 2019

Abstract

Estimation of probabilities from empirical data samples has drawn close attention in the scientific community and has been identified as a crucial phase in many machine learning and knowledge discovery research projects and applications. In addition to the trivial and straightforward estimation with relative frequency, more elaborate methods for estimating probabilities from small samples have been proposed and applied in practice (e.g., Laplace’s rule, the m-estimate). Piegat and Landowski (2012) proposed a novel method for estimating probabilities from small samples, Eph√2, that is optimal according to the mean absolute error of the estimation result. In this paper we show that, even though the articulation of Piegat’s formula seems different, it is in fact a special case of the m-estimate, where p_a = 1/2 and m = √2. In the context of an experimental framework, we present an in-depth analysis of several probability estimation methods with respect to their mean absolute errors and demonstrate their potential advantages and disadvantages. We extend the analysis from single-instance samples to samples with a moderate number of instances. We define small samples for the purpose of estimating probabilities as samples containing either fewer than four successes or fewer than four failures, and justify the definition by analysing probability estimation errors on various sample sizes.
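
The estimators named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the standard form of the m-estimate, (s + m·p_a) / (n + m), for s successes in n trials, and the standard form of Laplace's rule, (s + 1) / (n + 2). The function names, and the reading of Eph√2 as the m-estimate with p_a = 1/2 and m = √2, are taken from the abstract's claim rather than from the original formulas.

```python
import math


def relative_frequency(s: int, n: int) -> float:
    """Plain relative frequency s / n (undefined for n == 0)."""
    if n <= 0:
        raise ValueError("relative frequency needs at least one trial")
    return s / n


def laplace(s: int, n: int) -> float:
    """Laplace's rule of succession for a two-outcome event."""
    return (s + 1) / (n + 2)


def m_estimate(s: int, n: int, p_a: float, m: float) -> float:
    """m-estimate with prior probability p_a and weight m (assumed standard form)."""
    return (s + m * p_a) / (n + m)


def eph_sqrt2(s: int, n: int) -> float:
    """Eph√2 expressed as the m-estimate with p_a = 1/2 and m = √2,
    following the equivalence stated in the abstract."""
    return m_estimate(s, n, p_a=0.5, m=math.sqrt(2))


def is_small_sample(s: int, n: int) -> bool:
    """Small sample per the abstract: fewer than four successes
    or fewer than four failures."""
    return s < 4 or (n - s) < 4


if __name__ == "__main__":
    # One trial, one success: the three estimators disagree markedly.
    for name, estimate in [
        ("relative frequency", relative_frequency(1, 1)),
        ("Laplace", laplace(1, 1)),
        ("Eph sqrt(2)", eph_sqrt2(1, 1)),
    ]:
        print(f"{name}: {estimate:.4f}")
```

Under these assumed forms, Laplace's rule is itself the m-estimate with p_a = 1/2 and m = 2, so the two corrected estimators differ only in how strongly the prior is weighted.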

DOI: https://doi.org/10.2478/amcs-2019-0058 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 783 - 796
Submitted on: Dec 15, 2018
Accepted on: Apr 23, 2019
Published on: Dec 31, 2019
Published by: University of Zielona Góra
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2019 Bojan Cestnik, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.