Single-image generation models produce high-quality and diverse images by learning the internal distribution of patches within a single image, which mitigates data scarcity and has attracted increasing attention. However, existing methods perform poorly on images with strong global structure, such as animal images. To address this issue, we propose Semantic fusion and Structure-guided global generation from a Single image with Diffusion models (S3Diff). Specifically, during training, a semantic extractor extracts high-level semantic features from the training image, and the proposed semantic fusion block fuses these semantic features with the image features, enhancing the model's understanding of image semantics and improving the quality of the generated images. During sampling, we apply a manifold constrained gradient based on image structure that steers the generation path back toward the manifold of the original image, preserving a reasonable global structure. Extensive experiments on public datasets, including a thorough exploration of hyperparameters, ablations of the key design choices, and quantitative and qualitative comparisons against baseline methods, show that our method preserves reasonable semantic and structural relationships and generates high-quality, diverse images, significantly improving the model's global generation capability.
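To make the two mechanisms named above concrete, the sketch below gives one plausible PyTorch realization: a semantic fusion block implemented as cross-attention from image features to semantic tokens, and a single manifold-constrained-gradient correction applied at a sampling step. This is a minimal illustration, not the authors' released code; the module names, the `denoiser(x_t, t)` signature, and the structure function `structure_fn` are all assumptions.

```python
# Hypothetical sketch of S3Diff's two components; names and signatures
# are illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticFusionBlock(nn.Module):
    """Fuses high-level semantic features into image features.
    Here realized as cross-attention with a residual connection;
    the paper's exact block design may differ."""
    def __init__(self, img_dim, sem_dim, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(img_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=img_dim, num_heads=num_heads,
            kdim=sem_dim, vdim=sem_dim, batch_first=True)

    def forward(self, img_feat, sem_feat):
        # img_feat: (B, N, img_dim) flattened patch features
        # sem_feat: (B, M, sem_dim) tokens from the semantic extractor
        fused, _ = self.attn(self.norm(img_feat), sem_feat, sem_feat)
        return img_feat + fused  # residual keeps the original features

def mcg_step(x_t, t, denoiser, alpha_bar_t,
             structure_ref, structure_fn, scale=1.0):
    """One manifold constrained gradient correction during sampling:
    nudge x_t along the negative gradient of a structural loss between
    the predicted clean image and the original image's structure, so
    the trajectory regresses toward the original image's manifold."""
    x_t = x_t.detach().requires_grad_(True)
    eps = denoiser(x_t, t)  # assumed noise-prediction network
    # Standard DDPM posterior-mean estimate of the clean image x0
    x0_hat = (x_t - (1 - alpha_bar_t) ** 0.5 * eps) / alpha_bar_t ** 0.5
    # structure_fn maps an image to its structure representation
    # (e.g. an edge map); its exact form is an assumption here
    loss = F.mse_loss(structure_fn(x0_hat), structure_ref)
    grad = torch.autograd.grad(loss, x_t)[0]
    return (x_t - scale * grad).detach()
```

In this reading, the fusion block conditions the denoiser on semantics during training, while `mcg_step` is interleaved with the usual reverse-diffusion updates at inference time; `scale` trades structural fidelity against sample diversity.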
© 2025 Xianjie Zhang, Yusen Zhang, Yujie He, Min Li, published by SAN University
This work is licensed under the Creative Commons Attribution 4.0 License.