Lightweight inception-UNet with attention mechanisms for semantic segmentation
By: Twinkle Tiwari and Mukesh Saraswat
References
- S. Hao, Y. Zhou, Y. Guo, A brief survey on semantic segmentation with deep learning, Neurocomputing 406 (2020) 302–321.
- A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, P. Martinez-Gonzalez, J. Garcia-Rodriguez, A survey on deep learning techniques for image and video semantic segmentation, Applied Soft Computing 70 (2018) 41–65.
- X. Yuan, J. Shi, L. Gu, A review of deep learning methods for semantic segmentation of remote sensing imagery, Expert Systems with Applications 169 (2021) 114417.
- G. Lampropoulos, E. Keramopoulos, K. Diamantaras, Enhancing the functionality of augmented reality using deep learning, semantic web and knowledge graphs: A review, Visual Informatics 4 (1) (2020) 32–42.
- H. T. Nguyen, N. N. Truong, L. T. T. Pham, N. H. Pham, An approach using skeleton-based representations and neural networks for yoga pose recognition, Applied Computer Systems 30 (1) (2025) 75–84.
- K. Thyagharajan, G. Kalaiarasi, A review on near-duplicate detection of images using computer vision techniques, Archives of Computational Methods in Engineering 28 (2021) 897–916.
- R. Dhir, M. Ashok, S. Gite, et al., An overview of advances in image colorization using computer vision and deep learning techniques, Review of Computer Engineering Research 7 (2) (2020) 86–95.
- N. T. K. Son, N. H. Quynh, B. T. Minh, Refining graduation classification accuracy with synergistic deep learning models, Cybernetics and Information Technologies 25 (2) (2025).
- M. Grupac, G. Lăzăroiu, Image processing computational algorithms, sensory data mining techniques, and predictive customer analytics in the metaverse economy, Review of Contemporary Philosophy 21 (2022) 205–222.
- Y. Zhu, C. Yao, X. Bai, Scene text detection and recognition: Recent advances and future trends, Frontiers of Computer Science 10 (2016) 19–36.
- H. Greenspan, B. Van Ginneken, R. M. Summers, Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique, IEEE transactions on medical imaging 35 (5) (2016) 1153–1159.
- V. Naosekpam, N. Sahu, Text detection, recognition, and script identification in natural scene images: A review, International Journal of Multimedia Information Retrieval 11 (3) (2022) 291–314.
- O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18, Springer, 2015, pp. 234–241.
- A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, Enet: A deep neural network architecture for real-time semantic segmentation, arXiv preprint arXiv:1606.02147 (2016).
- V. Badrinarayanan, A. Kendall, R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE transactions on pattern analysis and machine intelligence 39 (12) (2017) 2481–2495.
- C. Peng, T. Tian, C. Chen, X. Guo, J. Ma, Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation, Neural Networks 137 (2021) 188–199.
- A. Abedalla, M. Abdullah, M. Al-Ayyoub, E. Benkhelifa, The 2st-unet for pneumothorax segmentation in chest x-rays using resnet34 as a backbone for u-net, arXiv preprint arXiv:2009.02805 (2020).
- Autorickshaw detection challenge, https://cvit.iiit.ac.in/autorickshaw_detection/ (Accessed on 01/10/2024).
- Autonue challenge 2019, https://cvit.iiit.ac.in/autonue2019/challenge/overview.php (Accessed on 12/21/2023).
- Ct liver, https://www.kaggle.com/datasets/zxcv2022/digital-medical-images-for-download-resource/ (Accessed on 12/21/2023).
- W. Zhang, Z. Liu, L. Zhou, H. Leung, A. B. Chan, Martial arts, dancing and sports dataset: A challenging stereo and multi-view dataset for 3d human pose estimation, Image and Vision Computing 61 (2017) 22–39.
- Visual geometry group - university of oxford, https://www.robots.ox.ac.uk/~vgg/data/pets/ (Accessed on 12/21/2023).
- F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, arXiv preprint arXiv:1511.07122 (2015).
- W. Liu, A. Rabinovich, A. C. Berg, Parsenet: Looking wider to see better, arXiv preprint arXiv:1506.04579 (2015).
- G. Lin, C. Shen, A. Van Den Hengel, I. Reid, Efficient piecewise training of deep structured models for semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3194–3203.
- H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, A. Agrawal, Context encoding for semantic segmentation, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 7151–7160. doi:10.1109/CVPR.2018.00747.
- Y. Zhuang, F. Yang, L. Tao, C. Ma, Z. Zhang, Y. Li, H. Jia, X. Xie, W. Gao, Dense relation network: Learning consistent and context-aware representation for semantic image segmentation, in: 2018 25th IEEE international conference on image processing (ICIP), IEEE, 2018, pp. 3698–3702.
- H. Zhang, H. Zhang, C. Wang, J. Xie, Co-occurrent features in semantic segmentation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 548–557.
- H. Ding, X. Jiang, B. Shuai, A. Q. Liu, G. Wang, Semantic correlation promoted shape-variant context for segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8885–8894.
- J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.
- Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, J. Liang, Unet++: A nested u-net architecture for medical image segmentation, in: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4, Springer, 2018, pp. 3–11.
- Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3d u-net: learning dense volumetric segmentation from sparse annotation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17–21, 2016, Proceedings, Part II 19, Springer, 2016, pp. 424–432.
- X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, P.-A. Heng, H-denseunet: hybrid densely connected unet for liver and tumor segmentation from ct volumes, IEEE transactions on medical imaging 37 (12) (2018) 2663–2674.
- S. Shah, P. Ghosh, L. S. Davis, T. Goldstein, Stacked u-nets: a no-frills approach to natural image segmentation, arXiv preprint arXiv:1804.10343 (2018).
- T. M. Quan, D. G. C. Hildebrand, W.-K. Jeong, Fusionnet: A deep fully residual convolutional neural network for image segmentation in connectomics, Frontiers in Computer Science 3 (2021) 613981.
- B. Hariharan, P. Arbeláez, R. Girshick, J. Malik, Hypercolumns for object segmentation and fine-grained localization, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 447–456.
- G. Lin, A. Milan, C. Shen, I. Reid, Refinenet: Multi-path refinement networks for high-resolution semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1925–1934.
- V. Nekrasov, C. Shen, I. Reid, Light-weight refinenet for real-time semantic segmentation, arXiv preprint arXiv:1810.03272 (2018).
- T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4151–4160.
- H. Sak, A. Senior, F. Beaufays, Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition, arXiv preprint arXiv:1402.1128 (2014).
- R. Messina, J. Louradour, Segmentation-free handwritten chinese text recognition with LSTM-RNN, in: 2015 13th International conference on document analysis and recognition (icdar), IEEE, 2015, pp. 171–175.
- P. Pinheiro, R. Collobert, Recurrent convolutional neural networks for scene labeling, in: International conference on machine learning, PMLR, 2014, pp. 82–90.
- R. P. Poudel, P. Lamata, G. Montana, Recurrent fully convolutional neural networks for multi-slice mri cardiac segmentation, in: Reconstruction, Segmentation, and Analysis of Medical Images: First International Workshops, RAMBO 2016 and HVSMR 2016, Held in Conjunction with MICCAI 2016, Athens, Greece, October 17, 2016, Revised Selected Papers 1, Springer, 2017, pp. 83–94.
- W. Byeon, T. M. Breuel, F. Raue, M. Liwicki, Scene labeling with LSTM recurrent neural networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3547–3555.
- F. Visin, M. Ciccone, A. Romero, K. Kastner, K. Cho, Y. Bengio, M. Matteucci, A. Courville, Reseg: A recurrent neural network-based model for semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2016, pp. 41–48.
- B. Shuai, Z. Zuo, B. Wang, G. Wang, Dag-recurrent neural networks for scene labeling, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3620–3629.
- H. Fan, H. Ling, Dense recurrent neural networks for scene labeling, arXiv preprint arXiv:1801.06831 (2018).
- Q. Zhao, J. Liu, Y. Li, H. Zhang, Semantic segmentation with attention mechanism for remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 60 (2021) 1–13.
- H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1520–1528.
- V. Badrinarayanan, A. Kendall, R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, arXiv preprint arXiv:1511.00561 (2015).
- D. Fourure, R. Emonet, E. Fromont, D. Muselet, A. Tremeau, C. Wolf, Residual conv-deconv grid network for semantic segmentation, arXiv preprint arXiv:1707.07958 (2017).
- A. Kendall, V. Badrinarayanan, R. Cipolla, Bayesian segnet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding, arXiv preprint arXiv:1511.02680 (2015).
- J. Fu, J. Liu, Y. Wang, J. Zhou, C. Wang, H. Lu, Stacked deconvolutional network for semantic segmentation, IEEE Transactions on Image Processing (2019).
- S. M. Sam, K. Kamardin, N. N. A. Sjarif, N. Mohamed, et al., Offline signature verification using deep learning convolutional neural network (CNN) architectures googlenet inception-v1 and inception-v3, Procedia Computer Science 161 (2019) 475–483.
- O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al., Attention u-net: Learning where to look for the pancreas, arXiv preprint arXiv:1804.03999 (2018).
- J. Lee, J. Choi, J. Mok, S. Yoon, Reducing information bottleneck for weakly supervised semantic segmentation, Advances in Neural Information Processing Systems 34 (2021) 27408–27421.
- Y. Kim, Y. Lee, M. Jeon, Imbalanced image classification with complement cross entropy, Pattern Recognition Letters 151 (2021) 33–40.
- Z. Zhang, M. Sabuncu, Generalized cross entropy loss for training deep neural networks with noisy labels, Advances in neural information processing systems 31 (2018).
- C. K. Dewa, et al., Suitable CNN weight initialization and activation function for Javanese vowels classification, Procedia computer science 144 (2018) 124–132.
- U. M. Khaire, R. Dhanalakshmi, High-dimensional microarray dataset classification using an improved adam optimizer (iadam), Journal of Ambient Intelligence and Humanized Computing 11 (11) (2020) 5187–5204.
- R. Meyes, M. Lu, C. W. de Puiseau, T. Meisen, Ablation studies in artificial neural networks, arXiv preprint arXiv:1901.08644 (2019).
- R. F. Woolson, Wilcoxon signed-rank test, Wiley encyclopedia of clinical trials (2007) 1–3.
DOI: https://doi.org/10.2478/ijssis-2026-0014 | Journal eISSN: 1178-5608
Language: English
Submitted on: Jul 11, 2025
Published on: Apr 10, 2026
Published by: Macquarie University, Australia
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year
© 2026 Twinkle Tiwari, Mukesh Saraswat, published by Macquarie University, Australia
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.