Time Series Mining Approaches for Malaria Vector Prediction on Mid-Infrared Spectroscopy Data

Open Access | May 2024

Figures & Tables

Figure 1

Example of mid-infrared spectra obtained from two insect species.

Figure 2

Class distribution in the training and test sets for species prediction (top) and age prediction (bottom). For species prediction, AA denotes Anopheles arabiensis and AG denotes Anopheles gambiae.

Figure 3

MIRS data after the preprocessing steps of dimensionality reduction and normalization.

Figure 4

General process of the feature-based approach.

Figure 5

Wavenumbers selected as features to train supervised machine learning algorithms.

Figure 6

Example of promising and unpromising intervals on MIRS data of Anopheles gambiae.
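
To make the idea of interval-based features concrete, the following is a minimal sketch of how summary statistics can be computed over intervals of a spectrum. It assumes the mean, standard deviation, and slope statistics commonly associated with Time Series Forest; the function names (`interval_features`, `random_intervals`) and the `min_length` parameter are hypothetical, and the sketch does not reproduce how the paper decides whether an interval is promising.

```python
import math
import random


def interval_features(series, start, end):
    """Return (mean, standard deviation, least-squares slope) of series[start:end]."""
    window = series[start:end]
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    # Slope of the least-squares line fitted against positions 0..n-1.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        return mean, std, 0.0
    slope = sum((t - t_mean) * (v - mean) for t, v in enumerate(window)) / denom
    return mean, std, slope


def random_intervals(series_length, n_intervals, min_length=3, seed=0):
    """Draw random (start, end) pairs; min_length is an illustrative assumption."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        length = rng.randint(min_length, series_length)
        start = rng.randint(0, series_length - length)
        intervals.append((start, start + length))
    return intervals


def extract(series, intervals):
    """Concatenate the three statistics of every interval into one feature vector."""
    features = []
    for start, end in intervals:
        features.extend(interval_features(series, start, end))
    return features
```

A forest of decision trees would then be trained on vectors produced by `extract`, typically with a different set of random intervals per tree.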

Figure 7

Kernel convolution for feature extraction: two features, the maximum value (MAX) and the proportion of positive values (PPV), are extracted from the transformed time series (the feature map). Rocket repeats this process for 10,000 random kernels, generating 20,000 features for training a linear classifier.
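
The convolution step described in this caption can be sketched in a few lines of plain Python. This is an illustrative simplification, not the Rocket implementation: it omits padding, assumes the series has at least 11 points, and the function names (`convolve_valid`, `max_and_ppv`, `random_kernel`, `transform`) are hypothetical. Kernel lengths, mean-centred Gaussian weights, uniform bias, and exponentially sampled dilation follow the general recipe of the Rocket method.

```python
import math
import random


def convolve_valid(series, weights, bias, dilation):
    """Slide a dilated kernel over the series without padding; return the feature map."""
    span = (len(weights) - 1) * dilation
    return [
        bias + sum(w * series[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(series) - span)
    ]


def max_and_ppv(feature_map):
    """MAX is the largest activation; PPV is the fraction of activations above zero."""
    ppv = sum(1 for v in feature_map if v > 0) / len(feature_map)
    return max(feature_map), ppv


def random_kernel(rng, series_length):
    """Sample one kernel: length, mean-centred weights, bias, and dilation."""
    length = rng.choice((7, 9, 11))
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]
    bias = rng.uniform(-1.0, 1.0)
    # The dilation is capped so that the dilated kernel still fits inside the series.
    max_exp = math.log2((series_length - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, max_exp))
    return weights, bias, dilation


def transform(series, n_kernels=100, seed=0):
    """Return 2 * n_kernels features (MAX and PPV per kernel) for one series."""
    rng = random.Random(seed)
    features = []
    for _ in range(n_kernels):
        weights, bias, dilation = random_kernel(rng, len(series))
        features.extend(max_and_ppv(convolve_valid(series, weights, bias, dilation)))
    return features
```

With `n_kernels=10000` the same loop would yield the 20,000 features mentioned in the caption; in practice the same kernels must be reused for every series, which the fixed `seed` ensures here.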

Figure 8

Residual Network (ResNet) architecture for time-series classification (Lima & Souza 2023).

| Approach | Algorithm | Parameters |
|---|---|---|
| Feature-based | K-Nearest Neighbors (KNN) | k = 1, Distance: Manhattan |
| Feature-based | Logistic Regression (LR) | C = 5, Penalty: L1, Solver: linear |
| Feature-based | Support Vector Machines (SVM) | C = 5, Kernel: linear |
| Feature-based | Random Forest (RF) | Estimators: 300, Criterion: entropy |
| Feature-based | XGBoost (XGB) | Estimators: 300, Learning rate: 0.1, Gamma: 0.1, Max. depth: 7 |
| Interval-based | Time Series Forest (TSF) | Estimators: 200, Intervals: m |
| Interval-based | Canonical Interval Forest (CIF) | Estimators: 200, Intervals: m |
| Interval-based | Diverse Representation CIF (DrCIF) | Estimators: 200, Intervals: m |
| Convolution-based | Random Convolutional Kernel Transform (Rocket) | Kernels: 10000 |
| Convolution-based | Minimally Rocket (MiniRocket) | Kernels: 10000, Max. dilations per kernel: 32, Features per kernel: 4 |
| Deep learning-based | Residual Network (ResNet) | Residual blocks: 3, Conv. per residual block: 3, Filters: [128,64,64], Kernel size: [8,5,3], Padding: same, Activation: ReLU, Epochs: 2000 |
| Deep learning-based | InceptionTime | Classifiers: 5, Depth: 6, Filters: 32, Conv. per layer: 3, Kernel size: 40, Padding: same, Activation: ReLU, Epochs: 1500 |
| Deep learning-based | Fully Convolutional Network (FCN) | Layers: 3, Kernel size: [8,5,3], Filters: [128,256,128], Avg. pool size: 3, Padding: same, Activation: ReLU, Epochs: 2000 |
| Deep learning-based | Time Convolutional Neural Network (Time-CNN) | Layers: 2, Kernel size: 7, Filters: [6,12], Avg. pool size: 3, Padding: valid, Activation: sigmoid, Epochs: 2000 |
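
As an illustration of the simplest configuration listed above (KNN with k = 1 and Manhattan distance), the following sketch classifies a query by copying the label of its closest training example. The function names and the two-point toy data are hypothetical and are not taken from the paper; the AA and AG labels only echo the species codes used in Figure 2.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_1nn(train_X, train_y, query):
    """Return the label of the training example closest to the query (k = 1)."""
    best = min(range(len(train_X)), key=lambda i: manhattan(train_X[i], query))
    return train_y[best]


# Hypothetical toy data: each row stands in for a preprocessed spectrum.
train_X = [[0.0, 0.0], [1.0, 1.0]]
train_y = ["AA", "AG"]
label = predict_1nn(train_X, train_y, [0.9, 0.7])
```

Because k = 1 involves no voting, ties are resolved here by keeping the first training example with the minimum distance.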

Figure 9

Overview of the approaches, feature sets, and supervised learning algorithms covered in the experimental evaluation.

Figure 10

Accuracy of feature-based classifiers for species prediction.

Figure 11

Accuracy of interval-based and convolution-based classifiers for species prediction.

Figure 12

Accuracy of deep learning classifiers for species prediction.

Table 1

Ranking of algorithms from different categories for the task of species prediction.

| Algorithm | Approach | Accuracy |
|---|---|---|
| InceptionTime | Deep learning | 0.97 |
| ResNet | Deep learning | 0.96 |
| FCN | Deep learning | 0.94 |
| Rocket | Convolution-based | 0.93 |
| LR (raw data) | Feature-based | 0.93 |
| MiniRocket | Convolution-based | 0.92 |
| Time-CNN | Deep learning | 0.92 |
| SVM (raw data) | Feature-based | 0.92 |
| CIF | Interval-based | 0.90 |
| XGB (raw data) | Feature-based | 0.90 |
| TSF | Interval-based | 0.86 |
| RF (raw data) | Feature-based | 0.86 |
| DrCIF | Interval-based | 0.85 |
| KNN (raw data) | Feature-based | 0.82 |
| RF (Catch-22 + wavenumbers) | Feature-based | 0.81 |

Figure 13

Confusion matrix obtained by the best classifier of each approach (i.e., feature-based, interval-based, convolution-based, and deep learning) for species classification.
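
The accuracy values reported in the rankings relate to a confusion matrix in a simple way: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total count. The sketch below shows that arithmetic; the function names and the 2 x 2 counts are hypothetical placeholders, not values from the paper.

```python
def accuracy(confusion):
    """Fraction of correct predictions: diagonal sum over total count."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Recall per true class, assuming rows hold true labels and columns predictions."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical counts for a two-class problem (rows: true AA, true AG).
example = [[48, 2], [3, 47]]
overall = accuracy(example)
recalls = per_class_recall(example)
```

Per-class recall is worth reading alongside overall accuracy because a single accuracy figure can hide one class being misclassified more often than the other.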

Figure 14

Accuracy of feature-based classifiers for age prediction.

Figure 15

Accuracy of interval-based and convolution-based classifiers for age prediction.

Figure 16

Accuracy of deep learning classifiers for age prediction.

Table 2

Ranking of algorithms from different categories for the task of age prediction.

| Algorithm | Approach | Accuracy |
|---|---|---|
| InceptionTime | Deep learning | 0.83 |
| CIF | Interval-based | 0.76 |
| Rocket | Convolution-based | 0.75 |
| XGB (raw data) | Feature-based | 0.75 |
| MiniRocket | Convolution-based | 0.74 |
| TSF | Interval-based | 0.74 |
| ResNet | Deep learning | 0.73 |
| RF (raw data) | Feature-based | 0.72 |
| FCN | Deep learning | 0.67 |
| RF (Wavenumbers) | Feature-based | 0.67 |
| Time-CNN | Deep learning | 0.66 |
| DrCIF | Interval-based | 0.66 |
| XGB (Catch-22 + wavenumbers) | Feature-based | 0.66 |
| RF (Catch-22 + wavenumbers) | Feature-based | 0.64 |
| KNN (raw data) | Feature-based | 0.63 |

Figure 17

Confusion matrix obtained by the best classifier of each approach (i.e., feature-based, interval-based, convolution-based, and deep learning-based) for age classification.

Language: English
Submitted on: Feb 17, 2024 | Accepted on: Apr 13, 2024 | Published on: May 1, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Lucas G. M. Castro, Henrique V. Costa, Vinicius M. A. Souza, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.