
Implementation of Enzyme Family Classification by using Autoencoders in a Study Case with Imbalanced and Underrepresented Classes

Open Access | Mar 2025

Figures & Tables

Figure 1.

Autoencoder-based two-level classifier

Figure 2.

Loss function (MSE) of the first level classifier using SMOTE

Figure 3.

Workflow of the second level of the classifier. The processes AE1, AE2, …, AEn represent the autoencoders of the corresponding enzyme families F1, F2, …, Fn

Figure 4.

Architecture of the two-level classifier (the enzyme family classifier) once implemented using the TensorFlow library

Figure 5.

Loss function (categorical cross-entropy) of the second level classifier using SMOTE
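
Figures 2 and 5 plot two different training losses: mean squared error (MSE) for the first level and categorical cross-entropy for the second. As a reminder of what each curve measures, here is a minimal plain-Python sketch of the standard definitions. The input vectors are made up, the function names are illustrative rather than taken from the authors' TensorFlow code, and treating the first-level MSE as a reconstruction error is an assumption based on the usual way autoencoders are trained.

```python
import math

def mse(target, output):
    """Mean squared error between two equal-length vectors."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

# Made-up examples: an input/reconstruction pair and a three-class prediction.
reconstruction_loss = mse([1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 0.8, 0.2])
class_loss = categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])
print(round(reconstruction_loss, 4), round(class_loss, 4))
```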

Figure 6.

Comparison of the accuracy of different software tools when classifying sequences as enzyme or non-enzyme

Results of the family classification

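| | Precision | Recall | F1-Score |
|---|---|---|---|
| GH18 | 0.90 | 0.90 | 0.90 |
| GH19 | 0.91 | 1.00 | 0.95 |
| No Enzyme | 0.89 | 0.80 | 0.84 |
| accuracy | | | 0.90 |
| macro avg | 0.90 | 0.90 | 0.90 |
| weighted avg | 0.90 | 0.90 | 0.90 |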
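
The macro-average row can be cross-checked from the per-class rows, since it is the unweighted mean of the three class scores. A minimal sketch in Python, with the values copied from the results above; the weighted average is not recomputed because the per-class test support is not listed here. The names `per_class` and `macro_avg` are illustrative.

```python
# Per-class scores copied from the family classification results.
per_class = {
    "GH18": {"precision": 0.90, "recall": 0.90, "f1": 0.90},
    "GH19": {"precision": 0.91, "recall": 1.00, "f1": 0.95},
    "No Enzyme": {"precision": 0.89, "recall": 0.80, "f1": 0.84},
}

def macro_avg(scores, metric):
    """Unweighted mean of one metric over all classes."""
    values = [row[metric] for row in scores.values()]
    return sum(values) / len(values)

for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_avg(per_class, metric), 2))
```

Rounded to two decimals, each of the three means agrees with the 0.90 reported in the macro avg row.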

Number of enzymes per family

| Family | Number of enzymes |
|---|---|
| GH18 | 356 |
| GH19 | 83 |
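
With 356 GH18 sequences against 83 for GH19, the majority family is roughly 4.3 times larger. The sketch below illustrates the interpolation idea behind SMOTE in plain Python: a synthetic sample is placed on the segment between a minority sample and one of its nearest minority neighbours. It is an illustration under stated assumptions, not the authors' pipeline: the feature vectors are made up, the helper `smote_like` is hypothetical, and oversampling up to full parity is an assumption that the source does not confirm.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base` by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

counts = {"GH18": 356, "GH19": 83}  # from the table of enzymes per family
n_needed = counts["GH18"] - counts["GH19"]  # extra samples for parity (assumption)

# Toy 2-D feature vectors standing in for encoded GH19 sequences (made up).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(toy_minority, n_new=5)
print(n_needed, len(new_points))
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region already covered by the minority class.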

Comparison of the training runs of the first level

| | Loss Function | Loss Function (Validation) |
|---|---|---|
| Without SMOTE | 0.0202 | 0.0223 |
| SMOTE | 0.0127 | 0.0162 |
| Hyperparameter optimization (SMOTE) | 0.0025 | 0.0074 |

Comparison of the training runs of the second level

| | Loss Function | Loss Function (Validation) |
|---|---|---|
| Without SMOTE | 0.0328 | 0.1163 |
| SMOTE | 0.0518 | 0.0330 |
| Hyperparameter optimization (SMOTE) | 0.0392 | 0.0350 |

Comparison of different software tools for the classification of sequences into enzymes or non-enzymes (precision)

| | EzyPred | ECPred | Proteinfer | AE |
|---|---|---|---|---|
| Not Enzyme | 0.59 | 0.57 | 0.47 | 0.91 |
| Enzyme | 1.00 | 0.82 | 0.95 | 1.00 |

Comparison of different software tools for the classification of sequences into enzymes or non-enzymes (F1-score)

| | EzyPred | ECPred | Proteinfer | AE |
|---|---|---|---|---|
| Not Enzyme | 0.74 | 0.47 | 0.62 | 0.95 |
| Enzyme | 0.87 | 0.86 | 0.78 | 0.98 |

Comparison of different software tools for the classification of sequences into enzymes or non-enzymes (recall)

| | EzyPred | ECPred | Proteinfer | AE |
|---|---|---|---|---|
| Not Enzyme | 1.00 | 0.40 | 0.90 | 1.00 |
| Enzyme | 0.77 | 0.90 | 0.67 | 0.97 |

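
The three comparison tables are linked by the definition of the F1-score as the harmonic mean of precision and recall. A short Python sketch of that relation, using the "Not Enzyme" precision and recall values copied from the tables; the function name `f1_score` and the dictionary layout are illustrative choices.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) for the "Not Enzyme" class, copied from the tables.
not_enzyme = {
    "EzyPred": (0.59, 1.00),
    "ECPred": (0.57, 0.40),
    "Proteinfer": (0.47, 0.90),
    "AE": (0.91, 1.00),
}

for tool, (p, r) in not_enzyme.items():
    print(tool, round(f1_score(p, r), 2))
```

Rounded to two decimals, the recomputed values (0.74, 0.47, 0.62, 0.95) agree with the "Not Enzyme" row of the F1-score table.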
DOI: https://doi.org/10.14313/jamris-2025-005 | Journal eISSN: 2080-2145 | Journal ISSN: 1897-8649
Language: English
Page range: 42 - 48
Submitted on: Apr 15, 2024
Accepted on: May 20, 2024
Published on: Mar 31, 2025
Published by: Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2025 Darian Fernández Gutiérrez, Ariadna Arbolaez Espinosa, Deborah Raquel Galpert Cañizares, María Matilde García Lorenzo, published by Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.