Chest X-ray analysis is vital for clinical screening, diagnosis, and treatment planning. The increasing workload on radiologists calls for robust automated solutions that generate accurate and standardized reports. Conventional report generation models often struggle to detect rare and anomalous diseases, particularly when trained on imbalanced datasets, which can compromise diagnostic accuracy. To address these limitations, we propose ChestXGen, a novel multimodal framework for automated radiology report generation. The model is built on a fully Transformer-based encoder-decoder architecture that integrates Memory Augmented Transformer (MAT) blocks with a Context-Aware Bi-Gate (CABG) mechanism, enabling it to capture long-range dependencies, fuse visual and textual features effectively, and better handle underrepresented conditions. Visual features are extracted with a ResNet-101-V2 backbone and refined through a shared memory module that continuously reinforces cross-modal associations. This integrated approach facilitates the generation of comprehensive, accurate, and contextually coherent reports. Extensive evaluation on the large-scale MIMIC-CXR dataset, comprising 377,110 images and corresponding free-text reports, demonstrates that ChestXGen outperforms previous models on the BLEU-1, BLEU-2, BLEU-3, and METEOR metrics. These results highlight the potential of Transformer-based models to substantially reduce radiologists' reporting burden while enhancing the precision and reliability of diagnostic interpretations.
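To make the gated cross-modal fusion concrete, the following minimal sketch illustrates the general idea behind a bi-gate mechanism such as CABG: two learned sigmoid gates weigh the visual and textual streams before combining them. The abstract does not specify the exact CABG equations, so the class name `ContextAwareBiGate`, the two-linear-layer design, and all dimensions below are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class ContextAwareBiGate(nn.Module):
    """Hypothetical bi-gate fusion sketch: two sigmoid gates, each conditioned
    on the concatenated cross-modal context, weigh the visual and textual
    feature streams before they are summed. Illustrative only; not the
    published CABG formulation."""

    def __init__(self, d_model: int):
        super().__init__()
        self.visual_gate = nn.Linear(2 * d_model, d_model)
        self.text_gate = nn.Linear(2 * d_model, d_model)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual, text: (batch, seq_len, d_model), assumed already aligned
        context = torch.cat([visual, text], dim=-1)       # shared cross-modal context
        g_v = torch.sigmoid(self.visual_gate(context))     # gate for the visual stream
        g_t = torch.sigmoid(self.text_gate(context))       # gate for the textual stream
        return g_v * visual + g_t * text                   # gated fusion of both streams


# Example: fuse 49 visual patch features with 49 aligned textual features
fused = ContextAwareBiGate(d_model=512)(
    torch.randn(2, 49, 512), torch.randn(2, 49, 512)
)
print(fused.shape)  # torch.Size([2, 49, 512])
```

In this sketch each modality receives its own gate, so the model can suppress one stream and amplify the other per position, which is one common way such gated fusion handles noisy or weakly informative features.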