
Figure 1
Example of mid-infrared spectra obtained from two insect species.

Figure 2
Class distribution across the training and test sets for the species prediction (top) and age prediction (bottom) tasks. For species prediction, AA denotes the species Anopheles arabiensis and AG denotes Anopheles gambiae.

Figure 3
MIRS data after the preprocessing steps of dimensionality reduction and normalization.

Figure 4
General process of the feature-based approach.

Figure 5
Wavenumbers selected as features to train supervised machine learning algorithms.

Figure 6
Example of promising and unpromising intervals on MIRS data of Anopheles gambiae.
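
Interval-based classifiers such as TSF exploit these intervals by summarising each one with simple statistics (typically the mean, standard deviation, and slope) and training a tree ensemble on those summaries. Below is a minimal NumPy sketch of that extraction step; the spectrum and interval bounds are illustrative placeholders, not the study's data.

```python
import numpy as np

def interval_features(series, start, end):
    """Summarise one interval of a spectrum with the statistics used by
    interval-based classifiers such as TSF: mean, standard deviation, slope."""
    segment = series[start:end]
    positions = np.arange(len(segment))
    slope = np.polyfit(positions, segment, deg=1)[0]   # least-squares slope of the interval
    return np.array([segment.mean(), segment.std(), slope])

# Illustrative spectrum and interval bounds (placeholders, not the study's data)
rng = np.random.default_rng(0)
spectrum = rng.normal(size=1700)
features = interval_features(spectrum, start=200, end=350)
```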

Figure 7
Process of kernel convolution for feature extraction, in which two features (MAX and PPV) are extracted from the transformed time series (or feature map). Rocket repeats this process for 10,000 random kernels, generating 20,000 features for training a linear classifier.
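
A minimal NumPy sketch of this convolution-and-pooling step for a single kernel, assuming unit stride; in Rocket the length, weights, bias, dilation, and padding of each of the 10,000 kernels are drawn at random.

```python
import numpy as np

def kernel_features(series, weights, bias, dilation):
    """Convolve a series with one random kernel (with dilation) and pool the
    feature map into the two values Rocket keeps per kernel: MAX and PPV."""
    span = (len(weights) - 1) * dilation
    feature_map = np.array([
        np.dot(series[i:i + span + 1:dilation], weights) + bias
        for i in range(len(series) - span)
    ])
    ppv = np.mean(feature_map > 0)          # proportion of positive values
    return feature_map.max(), ppv

# Illustrative inputs: Rocket draws kernel weights from N(0, 1) and mean-centres them
rng = np.random.default_rng(0)
weights = rng.normal(size=9)
weights -= weights.mean()
series = rng.normal(size=1700)              # placeholder spectrum
max_value, ppv = kernel_features(series, weights, bias=rng.uniform(-1, 1), dilation=2)
```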

Figure 8
Residual Network (ResNet) architecture for time-series classification (Lima & Souza 2023).
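
A minimal Keras sketch of one residual block in this architecture, assuming the usual time-series ResNet layout (stacked Conv1D/BatchNorm stages plus a 1x1-convolution shortcut); the number of blocks, filter counts, and kernel sizes follow the parameter table below, while the input length and class count are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, n_filters, kernel_sizes=(8, 5, 3)):
    """One residual block: three Conv1D/BatchNorm stages (ReLU on the first two)
    plus a 1x1-convolution shortcut, added together and passed through ReLU."""
    shortcut = layers.BatchNormalization()(layers.Conv1D(n_filters, 1, padding="same")(x))
    for i, k in enumerate(kernel_sizes):
        x = layers.Conv1D(n_filters, k, padding="same")(x)
        x = layers.BatchNormalization()(x)
        if i < len(kernel_sizes) - 1:
            x = layers.Activation("relu")(x)
    return layers.Activation("relu")(layers.Add()([shortcut, x]))

# Illustrative model: three residual blocks, global pooling, softmax output
inputs = layers.Input(shape=(1700, 1))               # spectrum length is a placeholder
x = residual_block(inputs, 128)
x = residual_block(x, 64)
x = residual_block(x, 64)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(2, activation="softmax")(x)   # e.g. the two species classes
model = tf.keras.Model(inputs, outputs)
```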
| APPROACH | ALGORITHM | PARAMETERS |
|---|---|---|
| Feature-based | K-Nearest Neighbors (KNN) | k = 1, Distance: Manhattan |
| Feature-based | Logistic Regression (LR) | C = 5, Penalty: L1, Solver: linear |
| Feature-based | Support Vector Machines (SVM) | C = 5, Kernel: linear |
| Feature-based | Random Forest (RF) | Estimators: 300, Criterion: entropy |
| Feature-based | XGBoost (XGB) | Estimators: 300, Learning rate: 0.1, Gamma: 0.1, Max. depth: 7 |
| Interval-based | Time Series Forest (TSF) | Estimators: 200, Intervals: |
| Interval-based | Canonical Interval Forest (CIF) | Estimators: 200, Intervals: |
| Interval-based | Diverse Representation CIF (DrCIF) | Estimators: 200, Intervals: |
| Convolution-based | Random Convolutional Kernel Transform (Rocket) | Kernels: 10000 |
| Convolution-based | Minimally Random Convolutional Kernel Transform (MiniRocket) | Kernels: 10000, Max. dilations per kernel: 32, Features per kernel: 4 |
| Deep learning-based | Residual Network (ResNet) | Residual blocks: 3, Conv. per residual block: 3, Filters: [128,64,64], Kernel size: [8,5,3], Padding: same, Activation: ReLU, Epochs: 2000 |
| Deep learning-based | InceptionTime | Classifiers: 5, Depth: 6, Filters: 32, Conv. per layer: 3, Kernel size: 40, Padding: same, Activation: ReLU, Epochs: 1500 |
| Deep learning-based | Fully Convolutional Network (FCN) | Layers: 3, Kernel size: [8,5,3], Filters: [128,256,128], Avg. pool size: 3, Padding: same, Activation: ReLU, Epochs: 2000 |
| Deep learning-based | Time Convolutional Neural Network (Time-CNN) | Layers: 2, Kernel size: 7, Filters: [6,12], Avg. pool size: 3, Padding: valid, Activation: sigmoid, Epochs: 2000 |
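
For reference, the feature-based configurations listed above map directly onto scikit-learn and XGBoost estimators. A hedged sketch of that mapping (the logistic regression "Solver: linear" entry is read here as liblinear, the usual solver for an L1 penalty):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Feature-based classifiers configured with the hyperparameters listed in the table above
classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=1, metric="manhattan"),
    "LR": LogisticRegression(C=5, penalty="l1", solver="liblinear"),  # "liblinear" is assumed
    "SVM": SVC(C=5, kernel="linear"),
    "RF": RandomForestClassifier(n_estimators=300, criterion="entropy"),
    "XGB": XGBClassifier(n_estimators=300, learning_rate=0.1, gamma=0.1, max_depth=7),
}
```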

Figure 9
Overview of the different approaches, feature sets, and supervised learning algorithms covered in the experimental evaluation.

Figure 10
Accuracy of feature-based classifiers for species prediction.

Figure 11
Accuracy of interval-based and convolution-based classifiers for species prediction.

Figure 12
Accuracy of deep learning classifiers for species prediction.

Table 1
Ranking of algorithms from different categories for the task of species prediction.
| ALGORITHM | APPROACH | ACCURACY |
|---|---|---|
| InceptionTime | Deep learning | 0.97 |
| ResNet | Deep learning | 0.96 |
| FCN | Deep learning | 0.94 |
| Rocket | Convolution-based | 0.93 |
| LR (raw data) | Feature-based | 0.93 |
| MiniRocket | Convolution-based | 0.92 |
| Time-CNN | Deep learning | 0.92 |
| SVM (raw data) | Feature-based | 0.92 |
| CIF | Interval-based | 0.90 |
| XGB (raw data) | Feature-based | 0.90 |
| TSF | Interval-based | 0.86 |
| RF (raw data) | Feature-based | 0.86 |
| DrCIF | Interval-based | 0.85 |
| KNN (raw data) | Feature-based | 0.82 |
| RF (Catch-22 + wavenumbers) | Feature-based | 0.81 |

Figure 13
Confusion matrix obtained by the best classifier of each approach (i.e., feature-based, interval-based, convolution-based, and deep learning-based) for species prediction.

Figure 14
Accuracy of feature-based classifiers for age prediction.

Figure 15
Accuracy of interval-based and convolution-based classifiers for age prediction.

Figure 16
Accuracy of deep learning classifiers for age prediction.

Table 2
Ranking of algorithms from different categories for the task of age prediction.
| ALGORITHM | APPROACH | ACCURACY |
|---|---|---|
| InceptionTime | Deep learning | 0.83 |
| CIF | Interval-based | 0.76 |
| Rocket | Convolution-based | 0.75 |
| XGB (raw data) | Feature-based | 0.75 |
| MiniRocket | Convolution-based | 0.74 |
| TSF | Interval-based | 0.74 |
| ResNet | Deep learning | 0.73 |
| RF (raw data) | Feature-based | 0.72 |
| FCN | Deep learning | 0.67 |
| RF (Wavenumbers) | Feature-based | 0.67 |
| Time-CNN | Deep learning | 0.66 |
| DrCIF | Interval-based | 0.66 |
| XGB (Catch-22 + wavenumbers) | Feature-based | 0.66 |
| RF (Catch-22 + wavenumbers) | Feature-based | 0.64 |
| KNN (raw data) | Feature-based | 0.63 |

Figure 17
Confusion matrix obtained by the best classifier of each approach (i.e., feature-based, interval-based, convolution-based, and deep learning-based) for age prediction.
