Implementation of Enzyme Family Classification by using Autoencoders in a Study Case with Imbalanced and Underrepresented Classes

Gutiérrez, Darian Fernández; Espinosa, Ariadna Arbolaez; Cañizares, Deborah Raquel Galpert; Lorenzo, María Matilde García

doi:10.14313/jamris-2025-005

References

N. Buton, F. Coste, and Y. Le Cunff, “Predicting Enzymatic Function of Protein Sequences With Attention,” Bioinformatics, vol. 39, no. 10, Oct. 2023, doi: 10.1093/bioinformatics/btad620.
Y. González Valle, D. Galpert, and R. Molina-Ruiz, “Integración De Rasgos Y Aprendizaje Semi-Supervisado Para La Clasificación Funcional De Enzimas Utilizando KMeans De Spark,” Revista Cubana de Ciencias Informáticas, vol. 14, no. 4, 2020.
Y. González Valle, D. Galpert, and R. Molina-Ruiz, “Agrupamiento Funcional De Enzimas GH-70 Utilizando Aprendizaje Semi-Supervisado Y Apache Spark,” Revista Cubana de Transformación Digital, pp. 14–32, 2021.
H. Chehili, S. E. Aliouane, A. Bendahmane, and M. A. Hamidechi, “DeepEnz: Prediction Of Enzyme Classification By Deep Learning,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 22, no. 2, 2021, doi: 10.11591/ijeecs.v22.i2.pp1108-1115.
Z. Tao, B. Dong, Z. Teng, and Y. Zhao, “The Classification of Enzymes by Deep Learning,” IEEE Access, vol. 8, 2020, doi: 10.1109/ACCESS.2020.2992468.
N. Ibtehaz and D. Kihara, “Application of Sequence Embedding in Protein Sequence-Based Predictions,” in Machine Learning in Bioinformatics of Protein Sequences: Algorithms, Databases and Resources for Modern Protein Bioinformatics, 2022. doi: 10.1142/9789811258589_0002.
K. K. Yang, Z. Wu, C. N. Bedbrook, and F. H. Arnold, “Learned Protein Embeddings For Machine Learning,” Bioinformatics, vol. 34, no. 15, pp. 2642–2648, Aug. 2018, doi: 10.1093/bioinformatics/bty178.
C. Marquet et al., “Embeddings From Protein Language Models Predict Conservation And Variant Effects,” Hum Genet, vol. 141, no. 10, 2022, doi: 10.1007/s00439-021-02411-y.
M. M. Moya and D. R. Hush, “Network Constraints And Multi-Objective Optimization For One-Class Classification,” Neural Networks, vol. 9, no. 3, 1996, doi: 10.1016/0893-6080(95)00120-4.
M. Sakurada and T. Yairi, “Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction,” in ACM International Conference Proceeding Series, 2014. doi: 10.1145/2689746.2689747.
K. Pawar and V. Attar, “Deep Learning Model Based on Cascaded Autoencoders and One-Class Learning For Detection And Localization Of Anomalies From Surveillance Videos,” IET Biom, vol. 11, no. 4, 2022, doi: 10.1049/bme2.12064.
L. López, N. Acosta-Mendoza, and A. Gago-Alonso, “Detección De Anomalías Basada En Aprendizaje Profundo,” Revista de Ciencias Informáticas, vol. 13, no. 3, 2020.
M. V. Nallapareddy and R. Dwivedula, “ABLE: Attention Based Learning For Enzyme Classification,” Comput Biol Chem, vol. 94, p. 107558, 2021, doi: 10.1016/j.compbiolchem.2021.107558.
R. Atienza, Advanced Deep Learning with Keras. 2018.
L. Wang, H. Zhang, W. Xu, Z. Xue, and Y. Wang, “Deciphering The Protein Landscape With Protflash, A Lightweight Language Model,” Cell Rep Phys Sci, vol. 4, no. 10, p. 101600, 2023, doi: 10.1016/j.xcrp.2023.101600.
K. Cabello-Solorzano, I. Ortigosa de Araujo, M. Peña, L. Correia, and A. J. Tallón-Ballesteros, “The Impact of Data Normalization on the Accuracy of Machine Learning Algorithms: A Comparative Analysis,” 2023. doi: 10.1007/978-3-031-42536-3_33.
N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-Sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, 2002, doi: 10.1613/jair.953.
G. Douzas, F. Bacao, and F. Last, “Improving Imbalanced Learning Through A Heuristic Oversampling Method Based On K-Means And SMOTE,” Inf Sci (N Y), vol. 465, 2018, doi: 10.1016/j.ins.2018.06.056.
H. Han, W. Y. Wang, and B. H. Mao, “Borderline-SMOTE: A New Over-Sampling Method In Imbalanced Data Sets Learning,” in Lecture Notes in Computer Science, 2005. doi: 10.1007/11538059_91.
A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, And Techniques to Build Intelligent Systems. 2022.
D. P. Kingma and J. L. Ba, “Adam: A Method For Stochastic Optimization,” in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015.
R. Dhanuka, A. Tripathi, and J. P. Singh, “A Semi-Supervised Autoencoder-Based Approach for Protein Function Prediction,” IEEE J Biomed Health Inform, vol. 26, no. 10, pp. 4957–4965, Oct. 2022, doi: 10.1109/JBHI.2022.3163150.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way To Prevent Neural Networks From Overfitting,” Journal of Machine Learning Research, vol. 15, 2014.
T. Dozat, “Incorporating Nesterov Momentum into Adam,” ICLR Workshop, no. 1, 2016.
H. Bin Shen and K. C. Chou, “EzyPred: A Top-Down Approach For Predicting Enzyme Functional Classes And Subclasses,” Biochem Biophys Res Commun, vol. 364, no. 1, pp. 53–59, Dec. 2007, doi: 10.1016/J.BBRC.2007.09.098.
A. Dalkiran, A. S. Rifaioglu, M. J. Martin, R. Cetin-Atalay, V. Atalay, and T. Doğan, “ECPred: A Tool For The Prediction Of The Enzymatic Functions Of Protein Sequences Based On The EC Nomenclature,” BMC Bioinformatics, vol. 19, no. 1, Sep. 2018, doi: 10.1186/s12859-018-2368-y.
T. Sanderson, M. L. Bileschi, D. Belanger, and L. J. Colwell, “ProteInfer, Deep Neural Networks for Protein Functional Inference,” Elife, vol. 12, 2023, doi: 10.7554/eLife.80942.
