Lightweight inception-UNet with attention mechanisms for semantic segmentation
Abstract
Semantic segmentation is a pivotal step in extracting regions of interest from images to improve scene understanding. The task is challenging for complex images, where occlusions, varying lighting conditions, diverse viewpoints, and dynamic human movement introduce substantial obstacles. For efficient segmentation, this paper presents a lightweight inception-UNet with an attention mechanism that enhances the model's ability to discern crucial information in the input image. Specifically, the inception module in the presented UNet captures features at multiple levels in the encoder phase. To identify the spatial features of an image effectively, the decoder phase incorporates an attention module for dense prediction. Qualitative and quantitative results show that the proposed model outperforms the compared methods, achieving the highest Intersection over Union (IoU) scores of 0.8768, 0.9283, 0.8768, 0.9768, and 0.9650 across datasets. An ablation study, a statistical analysis, and a computational complexity analysis of the proposed model are also reported.
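For readers unfamiliar with the metric reported above, IoU is the ratio of the overlap between the predicted and ground-truth masks to the area of their union. The following is a minimal sketch for binary masks, not the authors' evaluation code; the function name `iou`, the nested-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def iou(pred, target):
    """Intersection over Union for two binary masks (nested lists of 0/1).

    Illustrative helper only; the paper's own evaluation code is not shown
    in the source, so names and conventions here are assumptions.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel is foreground in both masks
            if p or t:
                union += 1  # pixel is foreground in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return intersection / union


# Toy 2x3 masks: 2 overlapping foreground pixels, 4 in the union.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)
print(score)  # 0.5
```

In multi-class settings the same ratio is typically computed per class and then averaged; whether the scores above are per-class, mean, or binary IoU is not stated in the abstract.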
© 2026 Twinkle Tiwari, Mukesh Saraswat, published by Macquarie University, Australia
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.