Abstract
Small object detection remains difficult because conventional detectors such as SSD have two limitations: shallow feature maps carry weak semantic information, and the receptive field does not match the actual size of small targets. To address both, this paper introduces Lite-RFB SSD, an architecture that integrates a lightweight Receptive Field Block (RFB) module into the SSD framework. The module is rebuilt with depthwise separable convolutions and channel pruning, which reduces its parameter count by 62%. Embedding the optimized module in the shallow conv4_3 layer preserves the high-resolution features needed for small object detection while lowering computational cost. Experiments on the PASCAL VOC dataset show that Lite-RFB SSD achieves an average precision for small objects (APs) of 22.9%, an improvement of 4.2% over the original SSD. It also runs at 28 FPS on edge devices, giving a better balance between accuracy and efficiency than competing methods such as the standard RFB and MobileNet-SSD.
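To illustrate why depthwise separable convolutions shrink a module, the following minimal sketch counts the weights of a standard k x k convolution and of its depthwise separable replacement. The channel widths (128 in, 128 out) and the function names are hypothetical choices for illustration, not values from this paper; the reported 62% reduction also depends on channel pruning and on which layers are replaced, so this sketch is not expected to reproduce it.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weight count of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=False):
    """Weight count of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def reduction(c_in, c_out, k):
    """Fraction of weights removed by the separable factorization (no bias)."""
    std = conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    return 1.0 - sep / std


if __name__ == "__main__":
    # Hypothetical layer shape, assumed only for this example.
    c_in, c_out, k = 128, 128, 3
    print("standard :", conv_params(c_in, c_out, k))
    print("separable:", depthwise_separable_params(c_in, c_out, k))
    print("reduction: {:.1%}".format(reduction(c_in, c_out, k)))
```

Without bias terms, the ratio of separable to standard weights is 1/c_out + 1/k^2, so the saving for a single layer grows with the kernel size and the output width. A whole module typically saves less than a single layer does, since 1 x 1 convolutions and any layers left unfactorized are unaffected.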