Double-stage discretization approaches for biomarker-based bladder cancer survival modeling

Open Access | Aug 2021

Language: English
Page range: 29 - 47
Submitted on: Feb 2, 2021
Accepted on: Jul 6, 2021
Published on: Aug 10, 2021
Published by: Italian Society for Applied and Industrial Mathematics
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Mauro Nascimben, Manolo Venturin, Lia Rimondini, published by Italian Society for Applied and Industrial Mathematics
This work is licensed under the Creative Commons Attribution 4.0 License.