References
- 1. D. Wu, C. M. Rice, and X. Wang, Cancer bioinformatics: A new approach to systems clinical medicine, 2012.
- 2. S. Zheng, L. Yang, Y. Dai, L. Jiang, Y. Wei, H. Wen, and Y. Xu, Screening and survival analysis of hub genes in gastric cancer based on bioinformatics, Journal of Computational Biology, vol. 26, no. 11, pp. 1316–1325, 2019.
- 3. C. Zhang, M. Berndt-Paetz, and J. Neuhaus, Identification of key biomarkers in bladder cancer: Evidence from a bioinformatics analysis, Diagnostics, vol. 10, no. 2, p. 66, 2020.10.3390/diagnostics10020066716892331991631
- 4. P. Kutwin, T. Konecki, M. Cichocki, P. Falkowski, and Z. Jabłonowski, Photodynamic diagnosis and narrow-band imaging in the management of bladder cancer: a review, Photomedicine and Laser Surgery, vol. 35, no. 9, pp. 459–464, 2017.10.1089/pho.2016.421728537820
- 5. I. Erb and C. Notredame, How should we measure proportionality on relative gene expression data?, Theory in Biosciences, vol. 135, no. 1-2, pp. 21–36, 2016.10.1007/s12064-015-0220-8487031026762323
- 6. C. A. Gallo, R. L. Cecchini, J. A. Carballido, S. Micheletto, and I. Ponzoni, Discretization of gene expression data revised, Briefings in bioinformatics, vol. 17, no. 5, pp. 758–770, 2016.10.1093/bib/bbv07426438418
- 7. P. Domingos, The role of occam’s razor in knowledge discovery, Data mining and knowledge discovery, vol. 3, no. 4, pp. 409–425, 1999.10.1023/A:1009868929893
- 8. C. Zhang, M. Berndt-Paetz, and J. Neuhaus, Bioinformatics analysis identifying key biomarkers in bladder cancer, Data, vol. 5, no. 2, p. 38, 2020.10.3390/data5020038
- 9. S. v. Buuren and K. Groothuis-Oudshoorn, mice: Multivariate imputation by chained equations in r, Journal of statistical software, pp. 1–68, 2010.10.18637/jss.v045.i03
- 10. B. V. Church, H. T. Williams, and J. C. Mar, Investigating skewness to understand gene expression heterogeneity in large patient cohorts, BMC bioinformatics, vol. 20, no. 24, pp. 1–14, 2019.10.1186/s12859-019-3252-0692388331861976
- 11. Y. Chen, S. Tu, and L. Xu, The prognostic role of genes with skewed expression distribution in lung adenocarcinoma, in International Conference on Intelligent Science and Big Data Engineering, pp. 631–640, Springer International Publishing, 2017.10.1007/978-3-319-67777-4_57
- 12. J. R. Holland, J. D. Baeder, and K. Duraisamy, Towards integrated field inversion and machine learning with embedded neural networks for rans modeling, in AIAA Scitech 2019 Forum, p. 1884, American Institute of Aeronautics and Astronautics, 2019.
- 13. D. George and M. Mallery, Using SPSS for Windows step by step: a simple guide and reference. Boston, MA: Allyn & Bacon, 2003.
- 14. T. Speed, Always log spot intensities and ratios, Speed Group Microarray Page, at http://www.stat.berkeley.edu/users/terry/zarray/Html/log.html, 2000.
- 15. C. Cheadle, M. P. Vawter, W. J. Freed, and K. G. Becker, Analysis of microarray data using z score transformation, The Journal of molecular diagnostics, vol. 5, no. 2, pp. 73–81, 2003.10.1016/S1525-1578(10)60455-2190732212707371
- 16. R. D’Agostino and E. S. Pearson, Tests for departure from normality. empirical results for the distributions of b2 and b \sqrt b , Biometrika, vol. 60, no. 3, pp. 613–622, 1973.10.1093/biomet/60.3.613
- 17. C. Leys, C. Ley, O. Klein, P. Bernard, and L. Licata, Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median, Journal of Experimental Social Psychology, vol. 49, no. 4, pp. 764–766, 2013.10.1016/j.jesp.2013.03.013
- 18. F. E. Harrell and C. Davis, A new distribution-free quantile estimator, Biometrika, vol. 69, no. 3, pp. 635–640, 1982.10.1093/biomet/69.3.635
- 19. Z. Gu, L. Gu, R. Eils, M. Schlesner, and B. Brors, circlize implements and enhances circular visualization in r, Bioinformatics, vol. 30, no. 19, pp. 2811–2812, 2014.
- 20. G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, Lightgbm: A highly efficient gradient boosting decision tree, in Advances in neural information processing systems (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), vol. 30, Curran Associates, Inc., 2017.
- 21. M. Beleut, R. Soeldner, M. Egorov, R. Guenther, S. Dehler, C. Morys-Wortmann, H. Moch, K. Henco, and P. Schraml, Discretization of gene expression data unmasks molecular subgroups recurring in different human cancer types, PloS one, vol. 11, no. 8, p. e0161514, 2016.10.1371/journal.pone.0161514499032727537329
- 22. S. Kotsiantis and D. Kanellopoulos, Discretization techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, vol. 32, no. 1, pp. 47–58, 2006.
- 23. L. Peng, W. Qing, and G. Yujia, Study on comparison of discretization methods, in 2009 International Conference on Artificial Intelligence and Computational Intelligence, vol. 4, pp. 380–384, IEEE, 2009.10.1109/AICI.2009.385
- 24. L. A. Kurgan and K. J. Cios, Caim discretization algorithm, IEEE transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 145–153, 2004.10.1109/TKDE.2004.1269594
- 25. C.-J. Tsai, C.-I. Lee, and W.-P. Yang, A discretization algorithm based on class-attribute contingency coefficient, Information Sciences, vol. 178, no. 3, pp. 714–731, 2008.10.1016/j.ins.2007.09.004
- 26. L. Gonzalez-Abril, F. J. Cuberos, F. Velasco, and J. A. Ortega, Ameva: An autonomous discretization algorithm, Expert Systems with Applications, vol. 36, no. 3, pp. 5327–5332, 2009.
- 27. U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, in Proceedings of the 13th international joint conference on artificial intelligence, pp. 1022–1027, IJCAI, 1993.
- 28. R. Kerber, Chimerge: Discretization of numeric attributes, in Proceedings of the tenth national conference on Artificial intelligence, pp. 123–128, AAAI Press, 1992.
- 29. F. E. Tay and L. Shen, A modified chi2 algorithm for discretization, IEEE Transactions on knowledge and data engineering, vol. 14, no. 3, pp. 666–670, 2002.10.1109/TKDE.2002.1000349
- 30. C.-T. Su and J.-H. Hsu, An extended chi2 algorithm for discretization of real value attributes, IEEE transactions on knowledge and data engineering, vol. 17, no. 3, pp. 437–441, 2005.10.1109/TKDE.2005.39
- 31. L. Reiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression trees (Belmont, California: Wadsworth Ind. Group). Wadsworth Ind. Group, 1984.
- 32. T. Chen and C. Guestrin, Xgboost: A scalable tree boosting system, in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785–794, New York, NY, USA: Association for Computing Machinery, 2016.
- 33. C. Ding and H. Peng, Minimum redundancy feature selection from microarray gene expression data, Journal of bioinformatics and computational biology, vol. 3, no. 02, pp. 185–205, 2005.10.1142/S021972000500100415852500
- 34. G. Figueroa, Y.-S. Chen, N. Avila, and C.-C. Chu, Improved practices in machine learning algorithms for ntl detection with imbalanced data, in 2017 IEEE Power & Energy Society General Meeting, pp. 1–5, IEEE, 2017.10.1109/PESGM.2017.8273852
- 35. A. Martino, A. Rizzi, and F. M. F. Mascioli, Supervised approaches for protein function prediction by topological data analysis, in 2018 International joint conference on neural networks (IJCNN), pp. 1–8, IEEE, 2018.10.1109/IJCNN.2018.8489307
- 36. G. Demiröz and H. A. Güvenir, Classification by voting feature intervals, in European Conference on Machine Learning, pp. 85–92, Springer, 1997.10.1007/3-540-62858-4_74
- 37. F. Ali and M. Hayat, Classification of membrane protein types using voting feature interval in combination with chou pseudo amino acid composition, Journal of theoretical biology, vol. 384, pp. 78–83, 2015.10.1016/j.jtbi.2015.07.03426297889
- 38. L. v. d. Maaten and G. Hinton, Visualizing data using t-sne, Journal of machine learning research, vol. 9, no. Nov, pp. 2579–2605, 2008.
- 39. H. Je reys, An invariant form for the prior probability in estimation problems, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, vol. 186, no. 1007, pp. 453–461, 1946.
- 40. E. Purdom and S. P. Holmes, Error distribution for gene expression data, Statistical applications in genetics and molecular biology, vol. 4, no. 1, 2005.10.2202/1544-6115.107016646833
- 41. Z. Fang, R. Du, and X. Cui, Uniform approximation is more appropriate for wilcoxon rank-sum test in gene set analysis, Plos One, vol. 7, no. 2, p. e31505, 2012.10.1371/journal.pone.0031505327453622347488
- 42. M. C. Whitlock and D. Schluter, The analysis of biological data. Roberts and Company Publishers, 2009.
- 43. G. Navas-Palencia, Optimal binning: mathematical programming formulation, arXiv preprint arXiv:2001.08025, 2020.
- 44. R. Anderson, The credit scoring toolkit: theory and practice for retail credit risk management and decision automation. Oxford University Press, 2007.
- 45. G. L. Libralon, A. C. P. de Leon Ferreira, A. C. Lorena, et al., Pre-processing for noise detection in gene expression classification data, Journal of the Brazilian Computer Society, vol. 15, no. 1, pp. 3–11, 2009.10.1007/BF03192573