A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression

Data Science Journal

Volume 22 (2023): Issue 1

By: Yannick Gerstorfer, Max Hahn-Klimroth and Lena Krieg

Open Access

Nov 2023

Figures & Tables

Each occurrence of feature F splits the dataset into two parts. In the example, F₁ creates partition classes L₁ = {2, 4, 7, 3} and R₁ = {8, 12, 4, 6}. The split at F₂ creates classes L₂ = {7} and R₂ = {3}, whereas the split at F₃ defines L₃ = {8, 12} and R₃ = {4, 6}. The model is agnostic to any features other than F.
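
The partition in this example can be written down directly. The sketch below is a minimal illustration, assuming the listed numbers are the target values of the samples reaching each split (the caption does not say so explicitly). The helper `mean_shift` is a hypothetical name and is not claimed to be the paper's trend estimator; it only shows one simple quantity that can be read off the left and right classes.

```python
from statistics import mean

# Partition classes copied from the caption: each split at feature F
# sends the samples it sees to a left class L and a right class R.
splits = {
    "F1": ([2, 4, 7, 3], [8, 12, 4, 6]),
    "F2": ([7], [3]),
    "F3": ([8, 12], [4, 6]),
}


def mean_shift(left, right):
    """Difference of class means (right minus left); illustrative only."""
    return mean(right) - mean(left)


# One value per occurrence of F in the tree.
shifts = {name: mean_shift(left, right) for name, (left, right) in splits.items()}

for name, value in shifts.items():
    print(name, value)
```

With these numbers the difference is positive at F₁ and negative at F₂ and F₃, so the sign of a per-split quantity can vary between occurrences of the same feature.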

Mean and 95% confidence interval for the different trend estimators on SYN1(a) and SYN1(b) for 250 independent trials each. The x-axis reports the proportion of noise. Features 1–3 are informative, whereas features 4–10 are non-informative.

Pairplot of the fish market dataset features used (weight, height and width) and the predicted variable (length).

Comparison of the trend estimators for FISH. We report the mean and the standard deviation of the different trend estimators over 100 bootstrap iterations, each containing 70% of the data. Relative absolute SHAP values show the absolute sum of the SHAP values for each run, divided by the highest respective sum.
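
The normalisation behind the relative absolute SHAP values can be sketched as follows. This is one reading of the caption and not a confirmed implementation: it assumes that "absolute sum" means the sum of absolute SHAP values per feature within a run, and that "highest respective sum" is the largest of these per-feature sums. The function name `relative_absolute_shap` and the input numbers are hypothetical.

```python
def relative_absolute_shap(shap_rows):
    """Per-feature sum of |SHAP value|, scaled so the largest sum equals 1.

    shap_rows: list of rows, one per sample, each holding one SHAP value
    per feature (hypothetical input layout).
    """
    n_features = len(shap_rows[0])
    totals = [sum(abs(row[j]) for row in shap_rows) for j in range(n_features)]
    top = max(totals)
    if top == 0:
        raise ValueError("all SHAP values are zero; cannot normalise")
    return [total / top for total in totals]


# Made-up SHAP values for two samples and three features.
example = [
    [0.5, -0.2, 0.1],
    [-1.5, 0.2, -0.1],
]
relative = relative_absolute_shap(example)
print(relative)
```

Under this reading the most influential feature always receives the value 1, and every other feature is reported as a fraction of it.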

Mean and 95% confidence interval over 100 independent iterations for each noise level on FISH. The x-axis reports the proportion of noise mixed into the real data.

Comparison of the trend estimators on HOUSING. The linear model assigns a negative coefficient to the total number of rooms feature, even though the feature itself is positively correlated with the target.

Comparison of the six different notions of feature importance on synthetic data. Figures A and B show results for SYN2(a) and SYN2(b). Here, the labels are generated as $Y = 4 \cdot X_{0}^{1.5}$, and the features $\{A_{i}\}$ are given by $X_{0} + W_{i}$, where $W_{i}$ is Gaussian noise of varying strength (SYN2(a)) or white noise (SYN2(b)). Figures C and D show results for SYN3(a) (Gaussian noise) and SYN3(b) (white noise). Here, the labels are generated as $Y = 4 \cdot X_{0}^{1.5} + 2 \cdot X_{1} + 0.5 \cdot X_{2}^{2}$, so two additional (weakly) informative features are present.
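
The label formulas above translate directly into code. The sketch below is a minimal illustration of the Gaussian variant only. The caption does not state the distribution of $X_{0}$, the noise strengths, or the sample size, so the uniform draw on [0, 1), the `sigmas` argument, and the function names are assumptions made for this example.

```python
import random


def labels_syn2(x0):
    """Label for SYN2: Y = 4 * X0^1.5 (x0 must be non-negative)."""
    return 4.0 * x0 ** 1.5


def labels_syn3(x0, x1, x2):
    """Label for SYN3: Y = 4 * X0^1.5 + 2 * X1 + 0.5 * X2^2."""
    return 4.0 * x0 ** 1.5 + 2.0 * x1 + 0.5 * x2 ** 2


def make_syn2a(n_samples, sigmas, seed=0):
    """Draw X0, build noisy copies A_i = X0 + W_i, and compute the labels.

    Assumptions: X0 is uniform on [0, 1) and W_i is zero-mean Gaussian with
    standard deviation sigmas[i]; neither is specified in the caption.
    """
    rng = random.Random(seed)
    x0 = [rng.random() for _ in range(n_samples)]
    y = [labels_syn2(value) for value in x0]
    # One noisy feature column per noise strength.
    noisy = [[value + rng.gauss(0.0, sigma) for value in x0] for sigma in sigmas]
    return x0, noisy, y


x0, noisy_features, y = make_syn2a(n_samples=5, sigmas=[0.0, 0.1, 0.5], seed=1)
print(len(noisy_features), len(y))
```

Because every $A_{i}$ is a noisy copy of the same $X_{0}$, the features are strongly correlated with each other, which is the setting in which the different importance notions are compared.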

Comparison of the six different notions of feature importance on real-world instances. The l.h.s. reports the feature importance scores on the FISH dataset (mean and standard deviation over 400 independent runs), the r.h.s. on HOUSING (mean and standard deviation over 100 independent runs).

DOI: https://doi.org/10.5334/dsj-2023-042 | Journal eISSN: 1683-1470

Language: English

Submitted on: May 26, 2023

Accepted on: Sep 27, 2023

Published on: Nov 3, 2023

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: Explainable Artificial Intelligence, Feature Importance, Gram-Schmidt Decorrelation, Random Forest Regression, Trend Estimation

© 2023 Yannick Gerstorfer, Max Hahn-Klimroth, Lena Krieg, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.