The Review of Image Inpainting

Figures & Tables

Figure 1. Mask datasets

Figure 2. Image datasets

Figure 3. Image inpainting methods framework

Comparison of methods

Progressive image inpainting
- Core principle: multi-stage processing, e.g., structure recovery followed by detail refinement, as in EdgeConnect's edge-prediction-and-filling stages
- Key strengths: high structural integrity; natural texture transitions (e.g., RFR-Net's recursive feature refinement)
- Key weaknesses: high computational complexity; multi-stage training challenges (e.g., PRVS's convergence issues)
- Typical use cases: complex structural restoration (e.g., crack repair in cultural artifacts)
- Computational efficiency: moderate (requires multiple forward passes)
- Training data needs: moderate (requires structural annotations such as edge maps)
- Representative methods: EdgeConnect, RFR-Net

CNN
- Core principle: local feature extraction via convolutional kernels, e.g., Partial Conv's mask-aware convolution (a code sketch of this operation follows the comparison)
- Key strengths: strong local feature extraction; high computational efficiency (e.g., DeepFill's real-time performance)
- Key weaknesses: limited receptive field; poor long-range dependency modeling (e.g., structural discontinuities in traditional CNNs)
- Typical use cases: small-area fast restoration (e.g., watermark removal from phone photos)
- Computational efficiency: high (parallelizable computations)
- Training data needs: moderate (millions of images)
- Representative methods: Partial Conv, DeepFill

Transformer
- Core principle: global dependency modeling via self-attention (e.g., MAT's long-range reasoning)
- Key strengths: robust global semantics; effective for large missing regions (e.g., Swin Transformer's multi-scale fusion)
- Key weaknesses: high resource consumption; overfitting risk with small datasets (e.g., ViT's billion-scale pre-training requirement)
- Typical use cases: large-area semantic restoration (e.g., street-view occlusion removal)
- Computational efficiency: low (quadratic attention complexity)
- Training data needs: very high (billion-scale pretraining)
- Representative methods: MAT, SwinIR

Diffusion models
- Core principle: iterative denoising for image generation, e.g., RePaint's stepwise restoration (a sketch of the mask-guided denoising step also follows the comparison)
- Key strengths: highest generation quality; exceptional detail recovery (e.g., DiffBIR's realistic textures)
- Key weaknesses: slow inference; high memory usage (e.g., DDPM's 1,000-step iterations)
- Typical use cases: high-fidelity detail generation (e.g., medical image super-resolution)
- Computational efficiency: very low (hundreds of denoising steps)
- Training data needs: very high (massive high-quality datasets)
- Representative methods: RePaint, DiffBIR
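
The mask-aware convolution named in the CNN entry can be summarized in a few lines. The sketch below is a minimal PyTorch illustration of the idea behind Partial Conv, not the authors' implementation: the convolution sees only valid pixels, its output is renormalized by the fraction of valid pixels under each kernel window, and the mask is updated so holes shrink layer by layer. The class and variable names (PartialConv2d, hole_mask) are illustrative.

```python
# Minimal sketch of mask-aware (partial) convolution, assuming a PyTorch setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=True)
        # Fixed all-ones kernel used only to count valid pixels per window.
        self.register_buffer("ones_kernel", torch.ones(1, 1, kernel_size, kernel_size))
        self.window_size = kernel_size * kernel_size
        self.stride, self.padding = stride, padding

    def forward(self, x, mask):
        # x: (B, C, H, W) features; mask: (B, 1, H, W), 1 = valid pixel, 0 = hole.
        with torch.no_grad():
            valid_count = F.conv2d(mask, self.ones_kernel,
                                   stride=self.stride, padding=self.padding)
        # Convolve only the valid pixels, then renormalize by the valid fraction.
        out = self.conv(x * mask)
        bias = self.conv.bias.view(1, -1, 1, 1)
        scale = self.window_size / valid_count.clamp(min=1.0)
        out = (out - bias) * scale + bias
        # Updated mask: a window containing any valid pixel produces a valid output.
        new_mask = (valid_count > 0).float()
        out = out * new_mask  # zero positions whose window saw no valid pixels
        return out, new_mask

# Usage: stack such layers; the hole region of the mask shrinks at every layer.
layer = PartialConv2d(3, 64)
img = torch.randn(1, 3, 256, 256)
hole_mask = torch.ones(1, 1, 256, 256)
hole_mask[:, :, 96:160, 96:160] = 0.0  # a 64x64 missing square
features, updated_mask = layer(img, hole_mask)
```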
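
Likewise, the mask-guided sampling behind diffusion-based inpainting reduces to a simple combine rule at each reverse step. The following is a hedged sketch of a RePaint-style step, where denoise_step and alphas_cumprod are placeholders standing in for a trained diffusion model and its noise schedule; it is not the paper's exact code.

```python
# Minimal sketch of a RePaint-style reverse diffusion step (assumed simplification).
import torch

def repaint_step(x_t, x_orig, mask, t, alphas_cumprod, denoise_step):
    """One reverse step; mask is 1 on known pixels, 0 inside the hole."""
    a_bar = alphas_cumprod[t]
    # Known region: noise the original image forward to roughly the noise level
    # of the step being produced, so the true content is preserved.
    noise = torch.randn_like(x_orig)
    x_known = a_bar.sqrt() * x_orig + (1.0 - a_bar).sqrt() * noise
    # Unknown region: one learned reverse (denoising) step on the current x_t.
    x_unknown = denoise_step(x_t, t)
    # Merge: keep true content where known, generated content inside the hole.
    return mask * x_known + (1.0 - mask) * x_unknown
```

Running hundreds of such steps (plus RePaint's resampling of earlier steps) is what the "very low" computational-efficiency entry for diffusion models above refers to.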
Language: English
Page range: 54 - 71
Published on: Sep 30, 2025
Published by: Xi’an Technological University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Tongyang Zhu, Li Zhao, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.