An Improved Yolo Algorithm Based on Concise Decoupled Head for Real-Time Object Detection in Night Scenarios

Yanhua Ma; Ke Lv; Li-Juan Liu; Hamid Reza Karimi

doi:10.2478/jaiscr-2026-0011

Journal of Artificial Intelligence and Soft Computing Research

Volume 16 (2026): Issue 3 (June 2026)

Open Access | Feb 2026

Abstract

This paper proposes CDH-YOLO, an efficient real-time pedestrian detection model for nighttime RGB images. Built on YOLOv5, CDH-YOLO uses structural reparameterization to optimize the backbone network and integrates the convolutional block attention module (CBAM) to enhance feature representation. Transposed convolution replaces nearest-neighbor interpolation for upsampling in order to preserve semantic information. A lightweight decoupled head addresses the spatial misalignment between the classification and regression tasks, while the SIoU loss improves training convergence and localization accuracy. Experiments on the KAIST dataset show that CDH-YOLO significantly outperforms existing methods in nighttime pedestrian detection accuracy while maintaining real-time performance.
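The abstract does not specify how CDH-YOLO's reparameterized backbone blocks are built. As a minimal sketch, assuming a RepVGG-style block (a 3×3 conv + BN branch, a 1×1 conv + BN branch, and an identity BN branch, following Ding et al.'s RepVGG), the pure-Python code below shows for a single channel why such a block can be collapsed into one 3×3 convolution at inference time: each BN is folded into its convolution, the 1×1 and identity branches are embedded at the centre of a 3×3 kernel, and the kernels and biases are summed. All function names and numeric values are illustrative assumptions, not the authors' code.

```python
import math

# Hypothetical helpers: single-channel, pure-Python illustration only.
# An inference-mode BN layer is described by (gamma, beta, running_mean, running_var, eps).

def conv2d_same(x, k, bias=0.0):
    """Stride-1 cross-correlation with zero padding so the output keeps x's shape.
    k is a square kernel with odd side length."""
    h, w, n = len(x), len(x[0]), len(k)
    p = n // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = bias
            for a in range(n):
                for b in range(n):
                    ii, jj = i + a - p, j + b - p
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[a][b] * x[ii][jj]
            out[i][j] = s
    return out

def batch_norm(z, bn):
    """Inference-mode BN: gamma * (z - mean) / sqrt(var + eps) + beta."""
    gamma, beta, mean, var, eps = bn
    scale = gamma / math.sqrt(var + eps)
    return [[scale * (v - mean) + beta for v in row] for row in z]

def fuse_conv_bn(k, bn):
    """Fold BN into the preceding bias-free conv; returns (kernel, bias)."""
    gamma, beta, mean, var, eps = bn
    scale = gamma / math.sqrt(var + eps)
    return [[scale * v for v in row] for row in k], beta - scale * mean

def pad_to_3x3(center):
    """Place a 1x1 weight at the centre of a 3x3 kernel (same output with padding 1)."""
    return [[0.0, 0.0, 0.0], [0.0, center, 0.0], [0.0, 0.0, 0.0]]

def multi_branch_forward(x, k3, bn3, k1, bn1, bn_id):
    """Training-time topology: BN(conv3x3(x)) + BN(conv1x1(x)) + BN(x)."""
    y3 = batch_norm(conv2d_same(x, k3), bn3)
    y1 = batch_norm(conv2d_same(x, [[k1]]), bn1)
    y0 = batch_norm(x, bn_id)
    return [[a + b + c for a, b, c in zip(r3, r1, r0)]
            for r3, r1, r0 in zip(y3, y1, y0)]

def reparameterize(k3, bn3, k1, bn1, bn_id):
    """Collapse the three branches into a single 3x3 kernel plus a bias."""
    w3, b3 = fuse_conv_bn(k3, bn3)
    w1, b1 = fuse_conv_bn(pad_to_3x3(k1), bn1)
    w0, b0 = fuse_conv_bn(pad_to_3x3(1.0), bn_id)  # identity = 3x3 kernel with centre 1
    kernel = [[w3[a][b] + w1[a][b] + w0[a][b] for b in range(3)] for a in range(3)]
    return kernel, b3 + b1 + b0

# Usage: the single fused conv reproduces the multi-branch output.
x = [[float((3 * i + 5 * j) % 7 - 3) for j in range(5)] for i in range(4)]
k3 = [[0.2, -0.1, 0.0], [0.5, 1.0, -0.3], [0.1, 0.0, 0.4]]
bn3 = (1.2, 0.1, 0.05, 0.9, 1e-5)
k1, bn1 = 0.7, (0.8, -0.2, 0.1, 1.1, 1e-5)
bn_id = (1.0, 0.0, 0.3, 2.0, 1e-5)

multi = multi_branch_forward(x, k3, bn3, k1, bn1, bn_id)
kernel, bias = reparameterize(k3, bn3, k1, bn1, bn_id)
fused = conv2d_same(x, kernel, bias)
max_err = max(abs(a - b) for ra, rb in zip(multi, fused) for a, b in zip(ra, rb))
print(f"max |multi - fused| = {max_err:.2e}")
```

Because convolution is linear in its kernel and inference-mode BN is an affine map, summing the fused kernels and biases gives the same output as summing the branch outputs (up to floating-point round-off). This is what allows a multi-branch training topology to run as a plain stack of 3×3 convolutions at inference.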
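The abstract credits the SIoU loss with faster convergence and better localization but does not restate it. The sketch below follows the formulation in Gevorgyan's SIoU paper: the IoU term is augmented with an angle cost Λ, a distance cost whose penalty rate γ = 2 - Λ couples it to that angle, and a shape cost with exponent θ. It handles a single box pair in (cx, cy, w, h) format in pure Python; θ = 4 and the eps guard are assumed defaults, not settings reported for CDH-YOLO, and the function names are hypothetical. A training implementation would vectorize this over tensors.

```python
import math

def angle_cost(dx, dy, eps=1e-7):
    """SIoU angle cost Lambda = 1 - 2*sin^2(arcsin(sin_alpha) - pi/4).
    About 0 for purely horizontal or vertical centre offsets, 1 at 45 degrees."""
    sigma = math.hypot(dx, dy)
    if sigma < eps:
        return 0.0  # coincident centres: no direction to align (assumed convention)
    sin_alpha = min(1.0, abs(dy) / sigma)
    return 1.0 - 2.0 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """SIoU loss for one pair of boxes in (cx, cy, w, h) format:
    1 - IoU + (distance_cost + shape_cost) / 2."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target

    # Corners of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

    # Plain IoU.
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both.
    cw = max(p_x2, t_x2) - min(p_x1, t_x1)
    ch = max(p_y2, t_y2) - min(p_y1, t_y1)

    # Distance cost, with its rate gamma = 2 - Lambda set by the angle cost.
    dx, dy = tx - px, ty - py
    gamma = 2.0 - angle_cost(dx, dy, eps)
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: relative width/height mismatch, sharpened by theta.
    omega_w = abs(pw - tw) / max(pw, tw, eps)
    omega_h = abs(ph - th) / max(ph, th, eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1.0 - iou + (distance_cost + shape_cost) / 2.0

pred = (0.0, 0.0, 1.0, 1.0)
print(siou_loss(pred, (0.0, 0.0, 1.0, 1.0)))  # near 0: identical boxes
print(siou_loss(pred, (2.0, 0.0, 1.0, 1.0)))  # no overlap, so the loss exceeds 1
print(siou_loss(pred, (5.0, 0.0, 1.0, 1.0)))  # farther away, larger loss
```

Unlike plain 1 - IoU, which is flat at 1 for every non-overlapping pair, the distance term still grows as the boxes move apart, so disjoint predictions keep receiving a useful gradient.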

Language: English

Page range: 219–235

Submitted on: Jun 30, 2025

Accepted on: Dec 27, 2025

Published on: Feb 25, 2026

Published by: SAN University

In partnership with: Paradigm Publishing Services

Keywords: Convolutional block attention module (CBAM), decoupled head, structure reparameterization, YOLO

Related subjects: Computer sciences, Databases and data mining, Artificial intelligence

© 2026 Yanhua Ma, Ke Lv, Li-Juan Liu, Hamid Reza Karimi, published by SAN University
This work is licensed under the Creative Commons Attribution 4.0 License.
