
Optimized Feature-Level Fusion Model For Multimodal Biometric System

By: R. Bharathi and  M. B. Anandaraju  
Open Access | Mar 2026


I.
Introduction

Biometrics identifies a person from their unique traits. These traits, known as "biometric traits," fall into two groups: physiological and behavioral [1]. Because biometric traits are difficult to lose, copy, or forge, they make authentication more secure, more convenient, and harder to spoof [2]. The iris and the finger vein are among the most accurate modalities for identifying individuals [3]. Because finger veins lie hidden inside the finger, they are hard to counterfeit [4]. Likewise, the eye has unique features that are difficult to copy or alter. Depending on how many biological traits are used, biometric systems are either unimodal or multimodal [5] (Figure 1). Multimodal biometrics requires a method for combining, or fusing, the different traits, and the chosen traits must complement one another for the system to work well [6]. Fusion can be performed at the decision, score, feature, and image (sensor) levels. The closer fusion happens to the raw data, the more information is retained, but the harder the traits become to combine [7]. Score-level fusion is the most popular method because it strikes a good balance between ease of fusion and providing enough information to distinguish a genuine user from an impostor [8].

Figure 1:

Various biometric traits.

Most biometric systems are unimodal, authenticating with only one source of data [9]. However, unimodal systems face challenges such as noisy sensed data, inter- and intra-class variation, a lack of universality, and susceptibility to spoofing attacks. Multimodal biometrics can address these difficulties [10]. Multimodal systems combine several biometric techniques, offering multiple sources of information for more secure authentication [11]. Accordingly, this study proposes an optimal fusion model that integrates iris, fingerprint, and ear biometrics at the feature level. Local binary patterns (LBPs) are used for feature extraction, while concatenation-based feature-level fusion improves the system's discriminatory capability. The Bayesian Optimized SVM Feature Fusion (BSFF) technique then ensures efficient and accurate classification. Section II reviews existing multimodal recognition and fusion approaches, Section III describes the proposed method and its use of Bayesian optimization (BO), Section IV presents the experimental validation, and Section V concludes with the results and future directions.

II.
Literature Review

Multimodal biometric systems enhance authentication by integrating different traits [12]. This review focuses on multimodal biometric systems for various levels of fusion.

Chen et al. [13] improved data security and privacy by introducing a multimodal biometric recognition technique that uses federated learning. The method ensured decentralized data storage while maintaining a modest identification accuracy across modalities, such as voice, fingerprint, and face.

Kurdthongmee et al. [14] propose a fast and accurate pupil estimation method using a fine-tuned semantic segmentation model with a shallow convolutional neural network (CNN) backbone. It handles varying lighting and camera conditions effectively, enabling real-time performance with high precision—ideal for applications in vision, biometrics, and medical diagnostics.

Kuklin et al. [15] address data drift in biometric systems using a Bayes–Minkowski-based feature correlation model and a neuroimmune approach. It employs adaptive learning to handle limited data and improves reliability, showing low equal error rate (EER) and strong resistance to adversarial attacks and information variability.

Adebayo et al. [16] state that Malay, as an under-resourced language, faces challenges in POS tagging due to limited annotated data. This study compares deep learning models and finds that a BERT + Bi-LSTM model achieves the highest performance, with 98.82% accuracy, outperforming traditional CRF and HMM approaches. The model excels in handling both known and unknown words, making it highly effective for POS tagging in Malay.

Bowyer et al. [17] presented an overview of 3D biometrics, stressing its superiority over 2D systems in handling occlusion and illumination challenges. They employed iterative closest point (ICP) algorithms for 3D face recognition, demonstrating improved recognition rates under varying environmental conditions.

Poh & Bengio [18] explored multimodal biometric fusion techniques, including voice, fingerprint, and iris integration. Using Bayesian fusion techniques, they achieved a 30% reduction in error rates compared with individual modalities.

Ross & Jain [19] provided a thorough assessment of multimodal biometrics, highlighting the integration of complementary traits such as the face and iris. Their work utilized feature-level fusion with support vector machines (SVM), achieving better recognition accuracy.

Zhang et al. [20] studied palmprint-based biometric systems. They achieved a decent recognition rate on controlled datasets by using principal component analysis (PCA) and Gabor filters for feature extraction. They also highlighted the robustness of palmprint recognition when combined with other modalities.

Li & Jain examined the advantages of 3D biometrics over 2D systems, focusing on 3D face and ear recognition. Using geometric feature extraction algorithms, they demonstrated higher robustness against occlusion and age-related changes [21].

Chen et al. reviewed advancements in 3D ear biometrics, demonstrating its effectiveness in handling occlusion and providing stable features over time. They applied wavelet transform and 3D surface matching, on large-scale datasets [22].

Nguyen et al. [23] explored the integration of 3D biometrics in multimodal systems, combining face and ear recognition. By employing CNNs, they achieved an 88% improvement in robustness under varying environmental conditions.

While the primary focus of our work is on developing an accurate fusion model, several aspects of the proposed system support real-world deployment. First, texture-based LBP features are computationally lightweight and offer robustness to sensor inconsistencies, such as changes in resolution, lighting, and imaging quality. Second, by combining three distinct biometric modalities, the system is inherently more secure against spoofing attempts, as attacking all modalities simultaneously is considerably more difficult. Finally, the use of widely accepted, non-contact biometrics (iris, fingerprint, and ear) makes the system more practical and user-friendly for high-security applications, where user cooperation and comfort are essential.

This research presents a novel multimodal biometric authentication system that combines iris, fingerprint, and ear traits using feature-level fusion. LBPs are applied uniformly for texture feature extraction, and a Bayesian-optimized SVM fusion technique (BSFF) is introduced to enhance classification accuracy. The approach leverages the rich discriminative power of fused features while ensuring robustness, scalability, and superior performance.

III.
Materials and Methods

The proposed feature-level fusion model for a multimodal biometric system combines iris, fingerprint, and ear data, extracting texture with LBPs. The acquired features from each modality are concatenated to form a strong feature vector, as illustrated in Figure 2. Finally, an SVM classifier with a Bayesian optimizer is used to achieve efficient and accurate recognition.

Figure 2:

Multimodal Biometric System.

a.
Dataset

Ear images from the IIT Delhi Ear Database v1.0, fingerprint images from CASIA-FingerprintV5, and iris images from CASIA with different resolutions were collected. The required dataset was prepared through preprocessing techniques, including resizing, data augmentation, normalization, and noise reduction, to ensure uniformity and enhance recognition performance.

b.
LBPs

LBPs are employed for feature extraction from iris, ear, and fingerprint images due to their effectiveness in capturing texture details [24]. Iris-LBP extracts fine-grained texture patterns from iris images, enhancing feature robustness against illumination and scale variations [25]. Here, LBP captures the distinct structural and textural characteristics of the traits, improving their discriminative capability for biometric identification. Fingerprint-LBP encodes the ridge patterns, preserving local texture information crucial for biometric recognition, as shown in Figure 3.

Figure 3:

LBP. LBP, local binary pattern.

The LBP operator encodes texture by comparing a pixel's intensity with those of its neighbors. Each neighbor is assigned a binary value: 1 if its intensity is greater than or equal to that of the central pixel, otherwise 0 [26]. These values form a binary code defining the LBP for the pixel, as expressed in Eqs (1) and (2):

(1) $\mathrm{LBP}(S,t) = \sum_{j=0}^{n-1} I(S_j - S_m)\,2^j$

(2) $I(S) = \begin{cases} 1, & S \ge 0 \\ 0, & S < 0 \end{cases}$

where $S_j$ denotes the intensity of the $j$-th neighbor and $S_m$ that of the central pixel.
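As an illustrative sketch of Eqs (1) and (2), the following Python function computes the basic 8-neighbor LBP code with NumPy. The clockwise neighbor ordering used here is one common convention, not necessarily the one used by the authors:

```python
import numpy as np

def lbp_code(patch):
    """8-neighbour LBP code for the centre pixel of a 3x3 patch,
    following Eqs (1)-(2): neighbours >= centre contribute 2^j."""
    center = patch[1, 1]
    # clockwise neighbour order starting at the top-left corner
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(int(n >= center) << j for j, n in enumerate(neighbors))

def lbp_image(img):
    """LBP map over the interior pixels of a grayscale image."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r - 1, c - 1] = lbp_code(img[r - 1:r + 2, c - 1:c + 2])
    return out
```

A histogram of the resulting codes over an image (or over image blocks) would then serve as the texture feature vector for each trait.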

We acknowledge that feature-level concatenation can lead to high-dimensional vectors. To address this, we applied LBPs for texture feature extraction, which inherently produces compact and discriminative feature sets, while no additional dimensionality reduction (e.g., PCA) was applied in this study.

c.
Fusion of fingerprint, ear, and iris

The extracted LBP features from all three modalities are then fused and classified for improved recognition accuracy [27]. Feature-level fusion combines the extracted features into a richer representation and merges all the features for efficient decision-making [28]. Feature-level fusion by union was employed, combining the left (L) and right (R) features of the three traits, represented in Eqs (3)–(5) for ear, iris, and fingerprint, respectively. Finally, these features are fused [29] into a single vector as in Eq. (6):

(3) $\mathrm{FE}_j = [\{R_{e1}, R_{e2}, \ldots, R_{en}\} \cup \{L_{e1}, L_{e2}, \ldots, L_{en}\}]$

(4) $\mathrm{FI}_j = [\{R_{i1}, R_{i2}, \ldots, R_{in}\} \cup \{L_{i1}, L_{i2}, \ldots, L_{in}\}]$

(5) $\mathrm{FFP}_j = [\{R_{fp1}, R_{fp2}, \ldots, R_{fpn}\} \cup \{L_{fp1}, L_{fp2}, \ldots, L_{fpn}\}]$

(6) $F = [\mathrm{FE}_j,\ \mathrm{FI}_j,\ \mathrm{FFP}_j]$
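The concatenation fusion of Eqs (3)–(6) can be sketched in a few lines of NumPy. The 256-bin LBP histograms per side of each trait are illustrative assumptions (random placeholders), not the paper's actual feature dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 256-bin LBP histograms for the right and left samples of
# each trait; the sizes are illustrative, not taken from the paper.
FE = np.concatenate([rng.random(256), rng.random(256)])    # ear, Eq. (3)
FI = np.concatenate([rng.random(256), rng.random(256)])    # iris, Eq. (4)
FFP = np.concatenate([rng.random(256), rng.random(256)])   # fingerprint, Eq. (5)

# Eq. (6): the single fused feature vector by concatenation
F = np.concatenate([FE, FI, FFP])
```

The fused vector simply stacks the per-modality features end to end, so each modality's contribution remains recoverable by slicing.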

d.
Optimization and matching

The proposed work uses the data $\{(x_i, y_i)\}_{i=1}^N$ to train a machine learning model that learns parameters $p$ under hyperparameters $\lambda$ [30]. In essence, the machine learning problem aims to minimize [31] an objective function $f$ using Eq. (7):

(7) $\min_{p,\lambda} f\left(p, \lambda \mid \{(x_i, y_i)\}_{i=1}^N\right)$

Eq. (8) is implemented to evaluate the proposed system with a linear kernel:

(8) $k_{\mathrm{linear}}(x, x') = \langle x, x' \rangle$

where $x$ and $x'$ are LBP feature vectors.

In summary, the hyperparameter set $\lambda$ of the linear SVM is as in Eq. (9):

(9) $\lambda_{\mathrm{LinearSVM}} = \{C\}$

Furthermore, the hyperparameter $\gamma$ is optimized by the Bayesian optimizer [32], giving the optimized hyperparameter set of Eq. (10):

(10) $\lambda_{\mathrm{LinearSVM}} = \{C, \gamma\}$

where $\gamma$ is the kernel width and $C$ is the regularization hyperparameter [33]. BSFF multiplies the optimized Bayesian weights by the SVM prediction scores and sums them, using the result for matching via Eq. (11):

(11) $F_s = \sum_{i=1}^{n} w_i\,\mathrm{score}(x_t^i)$

where $w_i$ denotes the weight of the $i$-th score and $F_s$ represents the fused score.
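The weighted score fusion of Eq. (11) reduces to a dot product between the optimized weights and the per-modality SVM scores. A minimal sketch, where both the scores and the weight values are placeholders rather than the optimized quantities from the paper:

```python
import numpy as np

def bsff_fuse(scores, weights):
    """Eq. (11): fused score F_s as the weighted sum of per-modality scores."""
    return float(np.dot(np.asarray(weights, float), np.asarray(scores, float)))

# hypothetical SVM scores for iris, fingerprint, and ear, with example weights
fused = bsff_fuse([0.9, 0.8, 0.7], [0.5, 0.3, 0.2])
```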

The score between samples is evaluated with a Euclidean distance matcher using Eq. (12):

(12) $d(x, x') = \sqrt{\sum_{i=1}^{D} (x_i - x_i')^2}$
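Eq. (12) is the standard Euclidean distance between two $D$-dimensional feature vectors; as a quick sketch:

```python
import numpy as np

def euclidean_distance(x, x_prime):
    """Eq. (12): Euclidean distance between feature vectors x and x'."""
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    return float(np.sqrt(np.sum((x - x_prime) ** 2)))
```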

e.
BO

BO is a type of sequential model-based optimization (SMBO) [34] that starts with sampled hyperparameters to create a regression model [35]. An acquisition function selects new samples, using the model [36] as a cost-effective proxy for the objective function in BSFF. Each sample $x_i$ yields an observation of the objective function $y_i = f(x_i)$. The new sample is added to the set of all samples $Q = \{(x_1, y_1), \ldots, (x_i, y_i)\}$ and is later used to update the model [37]. SMBO optimizes the LBP and SVM hyperparameters to achieve better results and faster performance, as implemented in Algorithm 1. The BSFF algorithm then optimizes the LBP hyperparameters, the SVM hyperparameters, and the feature-level fusion weights using Algorithm 2.

Algorithm 1:

SMBO

Data: f, X, A, M
// data initialization
Q ← InitSamples(f, X);
// run up to N steps
for i ← |Q| to N do
// model training
p(y | x, Q) ← FitModel(M, Q);
// selection of the most promising hyperparameters
x_i ← argmax_{x ∈ X} A(x, p(y | x, Q));
// evaluate the objective at the chosen hyperparameters
y_i ← f(x_i);
// append new data
Q ← Q ∪ {(x_i, y_i)};
end
Algorithm 2:

BSFF Objective Function

Input: F, y, λ_LBP, λ_SVM
Output: statistical parameters (SP)
// hyperparameter trials from BO
λ_LBP ← trial from BO;
λ_SVM ← trial from BO;
// run for each trait: ear, iris, fingerprint
for j ← 1 to n do
// LBP feature extraction
FE_j ← GetLBP(x_j, λ_LBP); // ear
FI_j ← GetLBP(x_j, λ_LBP); // iris
FFP_j ← GetLBP(x_j, λ_LBP); // fingerprint
// concatenate the features into a fused feature vector
F ← {FE_j, FI_j, FFP_j};
// split the fused features (F) into train and test data
x_train, x_test, y_train, y_test ← TrainTestSplit(F, y);
// train an SVM model with BO
M ← SVMFit(x_train, y_train, λ_SVM);
// use the trained model M to predict the test data
y_pred ← SVMPredict(M, x_test);
// evaluate the statistical parameters (SP)
SP ← GetSP(y_test, y_pred);
end
return SP;
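The structure of Algorithm 1 can be sketched as a short SMBO loop over a discrete hyperparameter candidate set. As a loudly-labeled simplification, the surrogate model M here is an inverse-distance-weighted average of observed objective values and the acquisition A adds a small exploration bonus; this is an illustrative stand-in for a full Gaussian-process Bayesian optimizer, not the authors' implementation, and the toy objective is hypothetical:

```python
import numpy as np

def smbo(f, candidates, n_steps=10):
    """Minimal sketch of Algorithm 1 (SMBO) over a discrete candidate set.

    Surrogate: inverse-distance-weighted mean of observed objectives.
    Acquisition: predicted value minus an exploration bonus for points
    far from all previous samples; sampled points are excluded.
    """
    candidates = np.asarray(candidates, dtype=float)
    # Q <- initial samples: endpoints and midpoint of the search range
    sampled = {0, len(candidates) // 2, len(candidates) - 1}
    X = [candidates[i] for i in sorted(sampled)]
    Y = [f(x) for x in X]
    for _ in range(n_steps):
        best_i, best_acq = None, np.inf
        for i, c in enumerate(candidates):
            if i in sampled:
                continue
            d = np.abs(np.array(X) - c)
            w = 1.0 / (d + 1e-12)
            mean = float(np.dot(w, Y) / w.sum())   # surrogate prediction
            acq = mean - 0.1 * d.min()             # acquisition value
            if acq < best_acq:
                best_i, best_acq = i, acq
        sampled.add(best_i)
        X.append(candidates[best_i])               # Q <- Q ∪ {(x_i, y_i)}
        Y.append(f(candidates[best_i]))
    k = int(np.argmin(Y))
    return X[k], Y[k]

# toy objective with its minimum at x = 0.3 (a hypothetical hyperparameter)
best_x, best_y = smbo(lambda x: (x - 0.3) ** 2, np.linspace(0.0, 1.0, 101))
```

In BSFF, the evaluated objective would be the cross-validated error of the SVM for a given (Box Constraint, Kernel Scale) trial, as in Algorithm 2.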

The BSFF approach innovates beyond traditional SVM-based multimodal fusion by performing early (feature-level) fusion of biometric traits, applying uniform LBP-based texture feature extraction, and incorporating a BO framework to automatically tune classifier parameters—resulting in a more robust, discriminative, and accurate biometric recognition system.

f.
Performance evaluation

By utilizing feature-level fusion [38], an optimized multimodal biometric system was developed to improve resilience and accuracy through multimodal recognition [39]. Statistical parameters [40] are evaluated using the established Eqs (13)–(18):

(13) $\mathrm{Accuracy} = \dfrac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FN} + \mathrm{FP}}$

(14) $\mathrm{Recall} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$

(15) $\mathrm{F1\ Score} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

(16) $\mathrm{Precision} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$

(17) $\mathrm{EER} = 0.5 \times \left[\dfrac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} + \dfrac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}}\right]$

(18) $\mathrm{NRMSE} = \dfrac{\sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - y_i')^2}}{y_{\max} - y_{\min}}$

The actual and predicted values are represented by $y_i$ and $y_i'$, respectively. The number of observations is denoted by $N$, and the maximum and minimum values of the actual data are $y_{\max}$ and $y_{\min}$.
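The confusion-matrix metrics of Eqs (13)–(17) follow directly from the four counts; a small sketch (the EER here is the single-threshold mean of FPR and FNR, matching Eq. (17), and the counts in the usage line are made up for illustration):

```python
def stats(tp, tn, fp, fn):
    """Statistical parameters from Eqs (13)-(17) given confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fn + fp)           # Eq. (13)
    recall = tp / (tp + fn)                              # Eq. (14)
    precision = tp / (tp + fp)                           # Eq. (16)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (15)
    eer = 0.5 * (fp / (fp + tn) + fn / (fn + tp))        # Eq. (17)
    return accuracy, recall, precision, f1, eer

# hypothetical counts: 50 true positives, 40 true negatives, 10 false positives
acc, rec, prec, f1, eer = stats(50, 40, 10, 0)
```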

IV.
Results

In this research, computation was performed on an Intel Core i5 (7th generation) CPU with 8 GB RAM. Datasets were collected from the Resource Collaboratory, and preprocessing was carried out before feature extraction for all three traits. In total, 3,600 images were processed, with data augmentation including mirroring, cropping, and resizing. Although the dataset comprises around 3,600 images, it is balanced across all modalities and classes, with each subject contributing iris, fingerprint, and ear samples. To address generalizability concerns, we used fivefold cross-validation, LBPs for compact feature extraction, and a Bayesian-optimized SVM to reduce overfitting. Despite the dataset size, the model demonstrates strong performance and scalability potential for larger systems. Statistical parameters, including the true positive rate (TPR), false positive rate (FPR), and accuracy, were calculated for each of the five folds. Training and testing data were preprocessed and kept separate. Figure 4 depicts sample images used in the proposed system. Table 1 presents the optimization process over 30 iterations, covering the objective function, runtime, best-obtained value, and hyperparameters such as Box Constraint and Kernel Scale. The objective value decreased from 0.096 at iteration 1 to 0.069 at iteration 7 and then stabilized, showing early convergence. Function evaluation times varied, indicating computational complexity driven by hyperparameter selection. Lower Box Constraints (≤0.001) and Kernel Scales (≤0.010) yielded the best results, emphasizing the importance of fine-tuning in this range for efficiency.

Figure 4:

Sample Images used in Training process.

Table 1:

Optimization analysis of the parameters

Iter | Eval | Objective | Objective runtime (s) | Best so far (observed) | Best so far (estim.) | Box constraint | Kernel scale
1 | B | 0.096 | 13.135 | 0.096 | 0.096 | 0.337 | 0.002
2 | B | 0.079 | 1.576 | 0.079 | 0.082 | 871.77 | 0.010
3 | A | 0.331 | 91.062 | 0.079 | 0.092 | 431.78 | 0.029
4 | A | 0.833 | 2.125 | 0.079 | 0.079 | 0.002 | 26.961
5 | A | 0.079 | 3.016 | 0.079 | 0.079 | 0.074 | 0.002
6 | A | 0.902 | 0.735 | 0.079 | 0.079 | 0.041 | 998.72
7 | B | 0.069 | 12.351 | 0.069 | 0.069 | 0.001 | 0.001
8 | A | 0.077 | 2.731 | 0.069 | 0.069 | 0.065 | 0.004
9 | A | 0.223 | 46.893 | 0.069 | 0.069 | 152.81 | 0.001
10 | A | 0.167 | 0.636 | 0.069 | 0.079 | 0.001 | 0.004
11 | A | 0.079 | 5.504 | 0.069 | 0.078 | 964.37 | 0.001
12 | A | 0.079 | 2.760 | 0.069 | 0.069 | 984.19 | 0.003
13 | A | 0.079 | 2.050 | 0.069 | 0.069 | 7.319 | 0.004
14 | A | 0.098 | 20.025 | 0.069 | 0.069 | 0.059 | 0.001
15 | A | 0.763 | 1.779 | 0.069 | 0.069 | 0.001 | 0.039
16 | A | 0.894 | 242.51 | 0.069 | 0.069 | 8.479 | 984.37
17 | A | 0.079 | 2.163 | 0.069 | 0.068 | 0.455 | 0.005
18 | A | 0.398 | 0.693 | 0.069 | 0.069 | 0.001 | 0.748
19 | A | 0.079 | 7.152 | 0.069 | 0.068 | 90.169 | 0.001
20 | A | 0.098 | 6.362 | 0.069 | 0.069 | 0.001 | 0.002
21 | A | 0.079 | 2.122 | 0.069 | 0.069 | 984.74 | 0.007
22 | A | 0.196 | 55.945 | 0.069 | 0.069 | 863.67 | 0.005
23 | A | 0.073 | 5.081 | 0.069 | 0.069 | 0.001 | 0.001
24 | A | 0.831 | 195.47 | 0.069 | 0.069 | 965.59 | 8.748
25 | A | 0.094 | 1.953 | 0.069 | 0.069 | 0.001 | 0.002
26 | A | 0.079 | 4.070 | 0.069 | 0.069 | 840.91 | 0.002
27 | A | 0.079 | 7.255 | 0.069 | 0.069 | 0.062 | 0.001
28 | A | 0.079 | 2.759 | 0.069 | 0.069 | 0.195 | 0.003
29 | A | 0.240 | 0.550 | 0.069 | 0.069 | 0.001 | 0.135
30 | A | 0.388 | 149.79 | 0.069 | 0.069 | 920.34 | 0.236

In the proposed work, the BO algorithm performed 30 evaluations, yielding an observed objective function value of 0.06875 and an estimated value of 0.069326, indicating close agreement between the observed and predicted objectives (Table 2). The best function evaluation took 12.3512 s. The Box Constraint of 0.0013385 promotes flexibility and reduces misclassification penalties, whereas the Kernel Scale of 0.0010021 produces highly localized decision boundaries and improves accuracy.

Table 2:

Hyperparameter optimization process using the Bayesian approach

BO process | Hyperparameter tuning
Total function evaluations | 30
Observed objective function value | 0.06875
Estimated objective function value | 0.069326
Function evaluation time (s) | 12.3512
Box constraint | 0.0013385
Kernel scale | 0.0010021

BO, Bayesian optimization.

For effective multimodal analysis, the objective value should decline quickly at first, followed by steady convergence with minimum oscillations, as in Figure 5.

Figure 5:

Acquisition function used in BO. BO, Bayesian optimization.

A parallel coordinates diagram is depicted in Figure 6, which illustrates the hyperparameters. The image offers a comprehensive view of the hyperparameters that produce optimal results, illustrating all optimization trials that the Bayesian optimizer has conducted across a variety of digits, datasets, and matchers. It is important to note that the majority of orientation and cell hyperparameter values are within the optimal range.

Figure 6:

Parallel coordinates plot of the Hyperparameters.

The effective selection of optimal parameters is facilitated by the convergence of hyperparameter optimization, thereby enhancing the performance of the model. In Figure 7, the hyperparameter optimization chart indicates a value of 0.0854 at Iteration 29, while the EER is 0.1478 as shown in Figure 8. This suggests a more refined equilibrium between the rates of false acceptance and false rejection. The improved EER serves as an illustration of the utility of feature-level fusion, which leads to a more precise and resilient multimodal biometric system.

Figure 7:

Convergence of hyper parameter optimization.

Figure 8:

Optimized EER. EER, equal error rate; FPR, false positive rate; TPR, true positive rate.

Table 3 presents the statistical analysis using fivefold validation, while Figure 9 illustrates the performance of the optimized multimodal system. The model's accuracy across the five iterations is consistently high, ranging from 0.9101 to 0.9501 with an average of 0.9343. Iteration 4 achieves the highest accuracy of 0.9501, while Iteration 2 shows a slight drop to 0.9101. Similarly, the macro AUC ranges from 0.8845 to 0.9644, indicating excellent classification performance, with Iteration 4 yielding the highest AUC. Macro recall follows a similar pattern, peaking at 0.9559 in Iteration 4 and demonstrating strong sensitivity to positive examples. Iteration 5 achieves the highest macro F1-score of 0.9570, showing balanced precision and recall. The EER averages 0.1674, with Iteration 4 having the lowest EER of 0.1478, indicating better classification threshold optimization. Finally, the NRMSE, which quantifies error, remains low across iterations, with a minimum of 0.0864 in Iteration 2, indicating good model generalization.

Table 3:

Statistical parameter analysis using fivefold cross-validation

Iteration | Accuracy | Macro AUC | Macro Recall | Macro F1-score | EER | NRMSE
1 | 0.9480 | 0.9133 | 0.8853 | 0.9131 | 0.1491 | 0.0897
2 | 0.9101 | 0.8946 | 0.8728 | 0.8937 | 0.2325 | 0.0864
3 | 0.9328 | 0.8845 | 0.9269 | 0.8740 | 0.1497 | 0.0879
4 | 0.9501 | 0.9644 | 0.9559 | 0.9494 | 0.1478 | 0.0983
5 | 0.9306 | 0.9161 | 0.9087 | 0.9570 | 0.1581 | 0.0941
Average | 0.9343 | 0.9146 | 0.9099 | 0.9174 | 0.1674 | 0.0913
Figure 9:

Performance of optimized multimodal system. EER, equal error rate.

Figures 10 and 11 depict the confusion matrix and ROC plot detailing the model's classification performance. The plotted curve of ROC lies consistently above the diagonal line, indicating the effectiveness of the proposed model in classifying different classes.

Figure 10.

Confusion matrix in fourth iteration.

Figure 11.

ROC curve in fourth iteration. FPR, false positive rate.

Table 4 presents a comparison between the proposed model and recent state-of-the-art approaches, highlighting a slight but meaningful improvement in accuracy over current methods.

Table 4:

Comparison of the proposed model with state-of-the-art models

No. | Authors | Accuracy (%)
1 | Kumarmohanta et al. [41] | 85.20
2 | Soleymani et al. [42] | 88.12
3 | Alshardan et al. [6] | 90.01
4 | Proposed model | 95.01

Key challenges in real-world deployment include variations in acquisition conditions (e.g., lighting, sensor quality) and spoofing attempts. Our system mitigates these by using LBP, which are robust to illumination and texture changes, and by employing multimodal fusion (iris, fingerprint, ear), making it significantly harder for attackers to spoof all modalities simultaneously. This combination enhances resilience, security, and reliability in uncontrolled environments.

V.
Conclusion

The proposed optimized feature-level fusion model enhances biometric recognition by integrating iris, fingerprint, and ear features using LBPs and a Bayesian-optimized SVM. The experimental results demonstrate its effectiveness in improving classification accuracy, making it a robust solution for multimodal biometric systems. After 30 evaluations, BO obtained an objective function value of 0.06875 and refined the decision boundaries through an optimized Box Constraint of 0.0013 and Kernel Scale of 0.0010. Iteration 4 obtained the highest accuracy of 0.9501 and macro AUC of 0.9644, guaranteeing effective classification, while Iteration 2 yielded the lowest NRMSE. The model exhibited stability, with a balanced recall of 0.9559 and an F1-score of 0.9570, and the lowest EER of 0.1478 mitigated misclassification risks. These findings validate the model's effectiveness and highlight its potential for biometric applications, including secure authentication and identity verification. Future work will benchmark the method against deep learning-based fusion approaches and conduct cross-dataset validation to assess the tradeoffs in accuracy, computational complexity, deployment feasibility, and generalizability across diverse real-world conditions.

Language: English
Submitted on: Jun 18, 2025
|
Published on: Mar 5, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 R. Bharathi, M. B. Anandaraju, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.