Abstract
Object detection and recognition in drone aerial images have broad application value, but they face challenges such as large variations in object scale, difficulty in detecting small objects, and occlusion in dense scenes. To address these issues, this paper proposes an improved object detection algorithm based on RT-DETR. First, a Spatial-Channel Collaborative Attention (SCSA) module is introduced into the PResNet backbone network to enhance feature representation and improve detection accuracy. Second, the Content-Aware ReAssembly of Features (CARAFE) upsampling method is adopted in the Hybrid Encoder; it preserves more of the detail of small objects while reducing model complexity, further boosting detection performance. Finally, a modified MFRC3 module, which incorporates a Biphasic Feature Aggregation Module (BFAM) and a boundary attention mechanism, is proposed to replace the original CSPRepLayer. This replacement enhances multi-scale feature fusion and improves the retention of fine-grained and textural features. Experimental results on the VisDrone2019 dataset show that the improved algorithm achieves an mAP@0.5 of 51.1%, which is 3.1 percentage points higher than the baseline RT-DETR model.
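For readers unfamiliar with the reported metric, the following is a minimal sketch of the matching criterion behind mAP@0.5: a predicted box is counted as a true positive when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The function names `iou` and `is_true_positive` and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not taken from the paper or from any evaluation toolkit, and the sketch omits the per-class precision-recall averaging that the full metric requires.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: does the prediction match at the given IoU threshold?"""
    return iou(pred_box, gt_box) >= threshold
```

Because small objects occupy only a few pixels, a localization error of one or two pixels can already push the IoU below 0.5, which is one reason small-object detection in aerial images is sensitive to the loss of fine detail.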