Comparison of some correlation measures for continuous and categorical data

Open Access
Dec 2019

Abstract

The literature contains a wide collection of correlation and association coefficients intended for different data structures. Conventionally, some correlation coefficients are used for continuous data and others for categorical or ordinal observations. The aim of this paper is to verify the performance of various approaches to estimating the correlation coefficient for several types of observations. Both simulated and real data were analysed. For continuous variables, Pearson's r² and the maximal information coefficient (MIC) were determined, whereas for categorized data three approaches were compared: Cramér's V, Joe's estimator, and the regression-based estimator. Two methods of discretizing continuous data were used. The following conclusions were drawn: the regression-based approach yielded the best results for data with the highest assumed r² coefficient, whereas Joe's estimator was the better approximation of the true correlation when the assumed r² was small; and the MIC estimator detected the maximal level of dependency for data with a quadratic relation. Moreover, discretizing data with a non-linear dependency can cause a loss of information about that dependency. The calculations were supported by the R packages arules and minerva.
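To make the contrast between a continuous-data measure and a categorical one concrete, the following Python sketch computes Pearson's r² on simulated continuous data and Cramér's V on the same data after discretization. It is an illustration under stated assumptions, not the authors' code (they worked in R with arules and minerva). The abstract does not name its two discretization methods, so equal-width binning into 5 intervals is our assumption, as are the sample size and the simulated linear and quadratic relations. The helper names `pearson_r2`, `equal_width_bins`, and `cramers_v` are hypothetical. Joe's estimator, the regression-based estimator, and MIC are not reproduced here.

```python
import math
import random
from collections import Counter


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals, labelled 0..k-1.

    Assumed discretization scheme, chosen for illustration only.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot discretize a constant sequence")
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value in the last interval
    return [min(int((v - lo) / width), k - 1) for v in values]


def cramers_v(a, b):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))) for two label sequences."""
    n = len(a)
    cells = Counter(zip(a, b))  # observed contingency-table counts
    row_tot, col_tot = Counter(a), Counter(b)
    if min(len(row_tot), len(col_tot)) < 2:
        raise ValueError("both variables need at least two categories")
    chi2 = 0.0
    for r, nr in row_tot.items():
        for c, nc in col_tot.items():
            expected = nr * nc / n  # count expected under independence
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(row_tot), len(col_tot)) - 1)))


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0, 1) for _ in range(1000)]
    relations = {
        "linear": [xi + rng.gauss(0, 1) for xi in x],
        "quadratic": [xi * xi + rng.gauss(0, 0.3) for xi in x],
    }
    for name, y in relations.items():
        r2 = pearson_r2(x, y)
        v = cramers_v(equal_width_bins(x, 5), equal_width_bins(y, 5))
        print(f"{name}: r^2 = {r2:.3f}, Cramér's V (5 bins) = {v:.3f}")
```

Because a symmetric quadratic relation has almost no linear component, r² for the quadratic case should come out near zero, while Cramér's V on the binned data can still register the dependency. How much of it survives depends on the number and placement of the bins, which is the kind of information loss from discretization that the abstract points to.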

DOI: https://doi.org/10.2478/bile-2019-0015 | Journal eISSN: 2199-577X | Journal ISSN: 1896-3811
Language: English
Page range: 253 - 261
Published on: Dec 16, 2019
Published by: Polish Biometric Society
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2019 Ewa Skotarczak, Anita Dobek, Krzysztof Moliński, published by Polish Biometric Society
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.