
Implementation of Enzyme Family Classification by using Autoencoders in a Study Case with Imbalanced and Underrepresented Classes

Open Access | Mar 2025

References

  1. N. Buton, F. Coste, and Y. Le Cunff, “Predicting Enzymatic Function of Protein Sequences With Attention,” Bioinformatics, vol. 39, no. 10, Oct. 2023, doi: 10.1093/bioinformatics/btad620.
  2. Y. González Valle, D. Galpert, and R. Molina-Ruiz, “Integración De Rasgos Y Aprendizaje Semi-Supervisado Para La Clasificación Funcional De Enzimas Utilizando KMeans De Spark,” Revista Cubana de Ciencias Informáticas, vol. 14, no. 4, 2020.
  3. Y. González Valle, D. Galpert, and R. Molina-Ruiz, “Agrupamiento Funcional De Enzimas GH-70 Utilizando Aprendizaje Semi-Supervisado Y Apache Spark,” Revista Cubana de Transformación Digital, pp. 14–32, 2021.
  4. H. Chehili, S. E. Aliouane, A. Bendahmane, and M. A. Hamidechi, “DeepEnz: Prediction Of Enzyme Classification By Deep Learning,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 22, no. 2, 2021, doi: 10.11591/ijeecs.v22.i2.pp1108-1115.
  5. Z. Tao, B. Dong, Z. Teng, and Y. Zhao, “The Classification of Enzymes by Deep Learning,” IEEE Access, vol. 8, 2020, doi: 10.1109/ACCESS.2020.2992468.
  6. N. Ibtehaz and D. Kihara, “Application of Sequence Embedding in Protein Sequence-Based Predictions,” in Machine Learning in Bioinformatics of Protein Sequences: Algorithms, Databases and Resources for Modern Protein Bioinformatics, 2022. doi: 10.1142/9789811258589_0002.
  7. K. K. Yang, Z. Wu, C. N. Bedbrook, and F. H. Arnold, “Learned Protein Embeddings For Machine Learning,” Bioinformatics, vol. 34, no. 15, pp. 2642–2648, Aug. 2018, doi: 10.1093/bioinformatics/bty178.
  8. C. Marquet et al., “Embeddings From Protein Language Models Predict Conservation And Variant Effects,” Hum Genet, vol. 141, no. 10, 2022, doi: 10.1007/s00439-021-02411-y.
  9. M. M. Moya and D. R. Hush, “Network Constraints And Multi-Objective Optimization For One-Class Classification,” Neural Networks, vol. 9, no. 3, 1996, doi: 10.1016/0893-6080(95)00120-4.
  10. M. Sakurada and T. Yairi, “Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction,” in ACM International Conference Proceeding Series, 2014. doi: 10.1145/2689746.2689747.
  11. K. Pawar and V. Attar, “Deep Learning Model Based on Cascaded Autoencoders and One-Class Learning For Detection And Localization Of Anomalies From Surveillance Videos,” IET Biom, vol. 11, no. 4, 2022, doi: 10.1049/bme2.12064.
  12. L. López, N. Acosta-Mendoza, and A. Gago-Alonso, “Detección De Anomalías Basada En Aprendizaje Profundo,” Revista de Ciencias Informáticas, vol. 13, no. 3, 2020.
  13. M. V. Nallapareddy and R. Dwivedula, “ABLE: Attention Based Learning For Enzyme Classification,” Comput Biol Chem, vol. 94, p. 107558, 2021, doi: 10.1016/j.compbiolchem.2021.107558.
  14. R. Atienza, Advanced Deep Learning with Keras. 2018.
  15. L. Wang, H. Zhang, W. Xu, Z. Xue, and Y. Wang, “Deciphering The Protein Landscape With Protflash, A Lightweight Language Model,” Cell Rep Phys Sci, vol. 4, no. 10, p. 101600, 2023, doi: 10.1016/j.xcrp.2023.101600.
  16. K. Cabello-Solorzano, I. Ortigosa de Araujo, M. Peña, L. Correia, and A. J. Tallón-Ballesteros, “The Impact of Data Normalization on the Accuracy of Machine Learning Algorithms: A Comparative Analysis,” 2023. doi: 10.1007/978-3-031-42536-3_33.
  17. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-Sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, 2002, doi: 10.1613/jair.953.
  18. G. Douzas, F. Bacao, and F. Last, “Improving Imbalanced Learning Through A Heuristic Oversampling Method Based On K-Means And SMOTE,” Inf Sci (N Y), vol. 465, 2018, doi: 10.1016/j.ins.2018.06.056.
  19. H. Han, W. Y. Wang, and B. H. Mao, “Borderline-SMOTE: A New Over-Sampling Method In Imbalanced Data Sets Learning,” in Lecture Notes in Computer Science, 2005. doi: 10.1007/11538059_91.
  20. A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, And Techniques to Build Intelligent Systems. 2022.
  21. D. P. Kingma and J. L. Ba, “Adam: A Method For Stochastic Optimization,” in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015.
  22. R. Dhanuka, A. Tripathi, and J. P. Singh, “A Semi-Supervised Autoencoder-Based Approach for Protein Function Prediction,” IEEE J Biomed Health Inform, vol. 26, no. 10, pp. 4957–4965, Oct. 2022, doi: 10.1109/JBHI.2022.3163150.
  23. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way To Prevent Neural Networks From Overfitting,” Journal of Machine Learning Research, vol. 15, 2014.
  24. T. Dozat, “Incorporating Nesterov Momentum into Adam,” ICLR Workshop, no. 1, 2016.
  25. H. Bin Shen and K. C. Chou, “EzyPred: A Top-Down Approach For Predicting Enzyme Functional Classes And Subclasses,” Biochem Biophys Res Commun, vol. 364, no. 1, pp. 53–59, Dec. 2007, doi: 10.1016/J.BBRC.2007.09.098.
  26. A. Dalkiran, A. S. Rifaioglu, M. J. Martin, R. Cetin-Atalay, V. Atalay, and T. Doğan, “ECPred: A Tool For The Prediction Of The Enzymatic Functions Of Protein Sequences Based On The EC Nomenclature,” BMC Bioinformatics, vol. 19, no. 1, Sep. 2018, doi: 10.1186/s12859-018-2368-y.
  27. T. Sanderson, M. L. Bileschi, D. Belanger, and L. J. Colwell, “ProteInfer, Deep Neural Networks for Protein Functional Inference,” Elife, vol. 12, 2023, doi: 10.7554/eLife.80942.
DOI: https://doi.org/10.14313/jamris-2025-005 | Journal eISSN: 2080-2145 | Journal ISSN: 1897-8649
Language: English
Page range: 42 - 48
Submitted on: Apr 15, 2024
Accepted on: May 20, 2024
Published on: Mar 31, 2025
Published by: Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2025 Darian Fernández Gutiérrez, Ariadna Arbolaez Espinosa, Deborah Raquel Galpert Cañizares, María Matilde García Lorenzo, published by Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.