A Novel Method for Drift Detection in Streaming Data Based on Measurement of Changes in Feature Ranks

Open Access | Feb 2025

References

  1. Husheng Guo, Hai Li, Qiaoyan Ren, and Wenjian Wang. Concept drift type identification based on multi-sliding windows. Information Sciences, 585:1–23, 2022.
  2. Piotr Porwik and Rafal Doroz. Adaptation of the idea of concept drift to some behavioral biometrics: Preliminary studies. Engineering Applications of Artificial Intelligence, 99:104135, 2021.
  3. Thomas Bartz-Beielstein and Lukas Hans. Drift detection and handling. In Eva Bartz and Thomas Bartz-Beielstein, editors, Online Machine Learning: A Practical Guide with Examples in Python, pages 23–39, Singapore, 2024. Springer Nature Singapore.
  4. João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and A. Bouchachia. A survey on concept drift adaptation. ACM Computing Surveys, 46:1–37, 2014.
  5. Supriya Agrahari and Anil Kumar Singh. Adaptive PCA-based feature drift detection using statistical measure. Cluster Computing, 25(6):4481–4494, 2022.
  6. Paulo M. Gonçalves, Silas G.T. de Carvalho Santos, Roberto S.M. Barros, and Davi C.L. Vieira. A comparative study on concept drift detectors. Expert Systems with Applications, 41(18):8144–8156, 2014.
  7. Ruba Abu Khurma, Ibrahim Aljarah, Ahmad Sharieh, Mohamed Abd Elaziz, Robertas Damaševičius, and Tomas Krilavičius. A review of the modification strategies of the nature inspired algorithms for feature selection problem. Mathematics, 10(3):464, 2022.
  8. Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
  9. Geoffrey I. Webb, Roy Hyde, Hong Cao, Hai Long Nguyen, and Francois Petitjean. Characterizing concept drift. Data Mining and Knowledge Discovery, 30(4):964–994, 2016.
  10. Hang Yu, Qingyong Zhang, Tianyu Liu, Jie Lu, Yimin Wen, and Guangquan Zhang. Meta-ADD: A meta-learning based pre-trained model for concept drift active detection. Information Sciences, 608:996–1009, 2022.
  11. Lei Yu and Huan Liu. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5:1205–1224, 2004.
  12. Jan Niklas Adams, Sebastiaan J. van Zelst, Thomas Rose, and Wil M.P. van der Aalst. Explainable concept drift in process mining. Information Systems, 114:102177, 2023.
  13. Hang Yu, Weixu Liu, Jie Lu, Yimin Wen, Xiangfeng Luo, and Guangquan Zhang. Detecting group concept drift from multiple data streams. Pattern Recognition, 134:109113, 2023.
  14. Supriya Agrahari and Anil Kumar Singh. Concept drift detection in data stream mining: A literature review. Journal of King Saud University - Computer and Information Sciences, 34(10, Part B):9523–9540, 2022.
  15. Mahmood Karimian and Hamid Beigy. Concept drift handling: A domain adaptation perspective. Expert Systems with Applications, 224:119946, 2023.
  16. Andrés L. Suárez-Cetrulo, David Quintana, and Alejandro Cervantes. A survey on machine learning for recurring concept drifting data streams. Expert Systems with Applications, 213:118934, 2023.
  17. Firas Bayram, Bestoun S. Ahmed, and Andreas Kassler. From concept drift to model degradation: An overview on performance-aware drift detectors. Knowledge-Based Systems, 245:108632, 2022.
  18. Lin Sun, Tianxiang Wang, Weiping Ding, Jiucheng Xu, and Yaojin Lin. Feature selection using Fisher score and multilabel neighborhood rough sets for multilabel classification. Information Sciences, 578:887–912, 2021.
  19. Frank S. Corotto. Chapter nine - The two-sample t test and the importance of pooled variance. In Frank S. Corotto, editor, Wise Use of Null Hypothesis Tests, pages 95–98. Academic Press, 2023.
  20. Piotr Porwik and Benjamin Mensah Dadzie. Detection of data drift in a two-dimensional stream using the Kolmogorov-Smirnov test. Procedia Computer Science, 207:168–175, 2022. Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 26th International Conference KES2022.
  21. Toshiyuki Sueyoshi and Shingo Aoki. A use of a nonparametric statistic for DEA frontier shift: the Kruskal and Wallis rank test. Omega, 29(1):1–18, 2001.
  22. Baoshuang Zhang, Yanying Li, and Zheng Chai. A novel random multi-subspace based ReliefF for feature selection. Knowledge-Based Systems, 252:109400, 2022.
  23. Jacob Goldberger, Sam Roweis, Geoff Hinton, and Ruslan Salakhutdinov. Neighbourhood components analysis. In Proceedings of the 17th International Conference on Neural Information Processing Systems, NIPS’04, pages 513–520, Cambridge, MA, USA, 2004. MIT Press.
  24. Xue-wen Chen and Jong Cheol Jeong. Enhanced recursive feature elimination. In Sixth International Conference on Machine Learning and Applications (ICMLA 2007), pages 429–435, 2007.
  25. Nickolay Trendafilov and Michele Gallo. PCA and other dimensionality-reduction techniques. In Robert J Tierney, Fazal Rizvi, and Kadriye Ercikan, editors, International Encyclopedia of Education (Fourth Edition), pages 590–599. Elsevier, Oxford, fourth edition, 2023.
  26. Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
  27. Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–451, 2004.
  28. Yvan Saeys, Iñaki Inza, and Pedro Larrañaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517, 2007.
  29. Qiuming Zhu. On the performance of Matthews correlation coefficient (MCC) for imbalanced dataset. Pattern Recognition Letters, 136:71–80, 2020.
  30. Davide Chicco, Niklas Tötsch, and Giuseppe Jurman. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining, 14, 2021.
  31. Vinícius M. A. de Souza, Denis Moreira dos Reis, André Gustavo Maletzke, and Gustavo E. A. P. A. Batista. Challenges in benchmarking stream learning algorithms with real-world data. Data Mining and Knowledge Discovery, 34:1805–1858, 2020.
  32. Yaohui Zeng and Patrick Breheny. The biglasso package: A memory- and computation-efficient solver for lasso model fitting with big data in R. The R Journal, 12, 2017.
Language: English
Page range: 147 - 166
Submitted on: Oct 2, 2024
Accepted on: Dec 1, 2024
Published on: Feb 5, 2025
Published by: SAN University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Piotr Porwik, Tomasz Orczyk, Krzysztof Wrobel, Benjamin Mensah Dadzie, published by SAN University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.