Abstract
To address the low reconstruction accuracy and insufficient model generalization of image super-resolution (ISR) under complex degradation scenarios, this paper proposes an improved method that integrates generative adversarial networks (GANs) and vision transformers (ViTs). In the generator of Real-ESRGAN, some residual-in-residual dense blocks (RRDBs) are replaced with ViT modules, which use the self-attention mechanism to strengthen global feature modeling. This enables the model to better capture global information while preserving local details in complex scenes. Experimental results show that the improved model achieves PSNR gains of 0.59 dB/0.45 dB and SSIM improvements of 0.018/0.056 in ×2/×4 upscaling tasks on the Urban100 dataset, and also performs well on other benchmark datasets such as Set14. The method improves image reconstruction quality under complex degradation conditions and offers an effective technical solution for practical applications such as security surveillance, remote sensing monitoring, and target reconnaissance.
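For readers unfamiliar with the PSNR metric quoted above, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). The function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration; the abstract does not state the evaluation protocol (for example, whether PSNR is measured on the Y channel or with border cropping), so this should not be read as the authors' evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    Both images are given as flat sequences of pixel intensities of equal
    length (an illustrative simplification; real pipelines use arrays).
    """
    if len(reference) != len(restored):
        raise ValueError("images must contain the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a fixed gain in dB corresponds to a fixed ratio of mean squared errors: a gain of g dB means the MSE is reduced by a factor of 10^(g/10), independent of the starting PSNR.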