An Improved RT-DETR-Based Object Detection Algorithm for UAV Aerial Photography

Yingying Long; Liuhua Di; Yanfang Fu; Shifeng Zhao; Xiaojun Bai

doi:10.2478/ijanmc-2025-0036

Abstract

Object detection and recognition in drone aerial images have broad application value, but they also pose challenges such as large variations in object scale, difficulty in detecting small objects, and occlusion in dense scenes. To address these issues, this paper proposes an improved object detection algorithm based on RT-DETR. First, a Spatial-Channel Collaborative Attention (SCSA) module is introduced into the PResNet backbone network to enhance feature representation and improve detection accuracy. Second, the Content-Aware ReAssembly of Features (CARAFE) upsampling method is adopted in the Hybrid Encoder, which preserves more detailed information about small objects while reducing model complexity, further boosting detection performance. Finally, a modified MFRC3 module incorporating the Biphasic Feature Aggregation Module (BFAM) and a boundary attention mechanism is proposed to replace the original CSPRepLayer. This enhances multi-scale feature fusion and improves the retention of fine-grained and textural features. Experimental results on the VisDrone2019 dataset show that the improved algorithm achieves an mAP@0.5 of 51.1%, which is 3.1% higher than the baseline RT-DETR model.
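The reported metric, mAP@0.5, counts a predicted box as correct only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion, not the evaluation code used in the paper. The function names, the corner-format boxes, and the simplified greedy matching (which skips ground-truth boxes that are already matched) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(
    preds: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5
) -> int:
    """Count predictions that match a ground-truth box at IoU >= thr.

    Predictions are (score, box) pairs visited in descending score order.
    Each ground-truth box can be matched at most once (simplified greedy rule).
    """
    matched = set()
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            value = iou(box, gt)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    return tp


# Toy data, invented for this example only.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [
    (0.9, (1, 1, 10, 10)),     # overlaps the first ground-truth box
    (0.8, (0, 0, 4, 4)),       # small box; first ground truth already taken
    (0.7, (21, 21, 31, 31)),   # shifted copy of the second ground-truth box
]
tp = count_true_positives(preds, gts, thr=0.5)
precision = tp / len(preds)
recall = tp / len(gts)
```

A full mAP@0.5 computation would additionally sweep the confidence threshold to build a precision-recall curve per class, take the area under that curve, and average over classes. The sketch only shows the IoU test that decides which detections count as true positives.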

