Abstract
Reliable epileptic seizure detection remains challenging due to the heterogeneity of signal modalities and the poor interpretability of existing models. To address these issues, this paper proposes NeuroFusionNet, a unified multi-modal framework that jointly leverages electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) signals through modality-specific graph encoders and a Cross-Modal Graph Transformer (CMGT). The CMGT architecture captures both temporal and spatial-functional dynamics, enabling robust feature learning across modalities. A modality-wise contrastive alignment objective enforces consistency in the shared latent space, and an evidential uncertainty head produces calibrated confidence estimates that support the assessment of clinical reliability. The model generalizes well across the CHB-MIT, resting-state (rs)-fMRI (UW–Madison), and 7 T fMRI datasets. NeuroFusionNet achieves 99.22% accuracy, 99.89% precision, and 99.85% recall, outperforming the existing TriSeizureDualNet model. These results indicate that NeuroFusionNet provides interpretable and trustworthy seizure detection.
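To make the role of the evidential uncertainty head concrete, the following is a minimal stdlib-only sketch of the standard evidential deep learning formulation (Dirichlet evidence), which the abstract's "calibrated confidence" claim presumably builds on; the paper's exact head, layer sizes, and loss are not specified here, so the function below is illustrative rather than the authors' implementation.

```python
import math

def evidential_confidence(logits):
    """Map per-class evidence logits to Dirichlet belief masses and a
    scalar uncertainty, following the common evidential deep learning
    recipe (illustrative sketch; the paper's actual head may differ)."""
    # Softplus keeps the evidence non-negative.
    evidence = [math.log1p(math.exp(x)) for x in logits]
    alpha = [e + 1.0 for e in evidence]   # Dirichlet concentration parameters
    strength = sum(alpha)                 # total Dirichlet strength S
    belief = [e / strength for e in evidence]   # per-class belief mass
    uncertainty = len(logits) / strength        # residual (epistemic) uncertainty
    return belief, uncertainty
```

By construction the belief masses and the uncertainty sum to one, so a prediction with weak evidence for every class yields high uncertainty rather than an overconfident label, which is the behavior a clinically deployed detector needs.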
