Lightweight inception-UNet with attention mechanisms for semantic segmentation

Twinkle Tiwari; Mukesh Saraswat

doi:10.2478/ijssis-2026-0014



International Journal on Smart Sensing and Intelligent Systems

Volume 19 (2026): Issue 1 (January 2026)

By: Twinkle Tiwari and Mukesh Saraswat

Open Access

Apr 2026

Abstract

Semantic segmentation is a pivotal step in extracting regions of interest from images to enhance scene understanding. However, the task is challenging for complex images, where occlusions, varying lighting conditions, diverse viewpoints, and dynamic human movement introduce substantial obstacles. For efficient segmentation, this paper presents a lightweight inception-UNet with an attention mechanism that enhances the model's ability to discern crucial information in the input image. Specifically, the inception module in the presented UNet captures features at multiple levels in the encoder phase. To identify the spatial features of an image effectively, the decoder phase incorporates the attention module for dense prediction. Both qualitative and quantitative results demonstrate the superiority of the proposed model, which achieves the highest Intersection over Union (IoU) scores of 0.8768, 0.9283, 0.8768, 0.9768, and 0.9650 across five datasets. An ablation study of the proposed model is conducted, along with a statistical analysis and a computational complexity analysis.
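
The IoU scores quoted above measure the overlap between a predicted mask and its ground-truth mask. The abstract does not describe the evaluation code, so the following is only a minimal sketch of the standard IoU definition for binary masks. The function name `iou`, the nested-list mask format, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration and are not taken from the paper.

```python
def iou(pred, target):
    """Intersection over Union of two binary masks.

    Masks are nested lists of 0/1 with identical shapes (an assumed,
    illustrative format; real pipelines typically use arrays or tensors).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # A pixel is in the intersection if both masks mark it,
            # and in the union if at least one mask marks it.
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are marked in total.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 0.5
```

In this toy example the masks share 2 pixels and jointly cover 4, giving an IoU of 0.5. A score such as 0.9768 therefore means that the predicted and ground-truth regions overlap almost completely.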



DOI: https://doi.org/10.2478/ijssis-2026-0014 | Journal eISSN: 1178-5608


Language: English

Submitted on: Jul 11, 2025

Published on: Apr 10, 2026

Published by: Macquarie University, Australia

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: semantic segmentation, UNet, attention module

Related subjects: Engineering; Introductions and overviews; Engineering, other

© 2026 Twinkle Tiwari, Mukesh Saraswat, published by Macquarie University, Australia
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
