Abstract
Finding the right level of data augmentation intensity remains one of the most challenging aspects of training deep learning models on small-scale datasets, a problem that is particularly relevant for resource-constrained robotics and automation systems. Data augmentation is widely recognized as essential for preventing overfitting and improving generalization, yet excessive augmentation can harm model performance by introducing too much variability into the training data. This research investigates the “sweet spot” of augmentation intensity through a controlled comparison of six augmentation strategies on CIFAR-10, a representative small-scale image classification benchmark commonly used in mobile robotics applications. The six strategies are: No Augmentation (baseline), Basic torchvision transforms, Light Advanced albumentations, Moderate Advanced geometric-photometric combinations, Strong Advanced with noise injection, and AutoAugment Style with complex transformations. Our findings reveal a clear relationship between augmentation intensity and model performance, with peak performance at a moderate intensity level (Basic strategy, intensity score [IS] 0.49). The Basic strategy achieved 79.84% validation accuracy, outperforming minimal augmentation (77.49%) by 2.35 percentage points and excessive augmentation (71.64%) by 8.20 percentage points. Correlation analysis (Pearson r = -0.759, p = 0.080; Spearman ρ = -0.714, p = 0.111) shows a negative trend between intensity and accuracy that does not reach significance at the 0.05 level. Overall, the results suggest that the “sweet spot” lies in balanced augmentation that provides regularization benefits without overwhelming the learning process.
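As a quick sanity check on the reported correlation statistics, the sketch below recomputes two-sided p-values from the correlation coefficients alone. It rests on two assumptions that the abstract does not state: that each correlation is computed over n = 6 points (one per augmentation strategy), and that the p-values come from the usual t approximation with n - 2 degrees of freedom. The function names `t_cdf_df4` and `two_sided_p` are illustrative, not taken from the study's code.

```python
import math


def t_cdf_df4(t: float) -> float:
    """Closed-form CDF of Student's t distribution with exactly 4 degrees of freedom."""
    scale = 1.0 + t * t / 4.0
    u = t * t / scale
    return 0.5 + 0.375 * (t / math.sqrt(scale)) * (1.0 - u / 12.0)


def two_sided_p(r: float, n: int = 6) -> float:
    """Two-sided p-value for a correlation r over n points (t approximation).

    Assumption: n = 6, so df = n - 2 = 4, which is the only case
    the closed-form CDF above covers.
    """
    if n != 6:
        raise ValueError("this sketch only supports n = 6 (df = 4)")
    t = abs(r) * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return 2.0 * (1.0 - t_cdf_df4(t))


if __name__ == "__main__":
    print("Pearson  p:", round(two_sided_p(-0.759), 3))
    print("Spearman p:", round(two_sided_p(-0.714), 3))
```

Under these assumptions, both values land close to the reported p = 0.080 and p = 0.111, which is consistent with a correlation computed across the six strategies rather than across individual training runs.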
