Text-to-Image Generation Method Combining Transformer and DF-GAN
Source: China Scientific Data | Updated: 2026-02-09 | Indexed: 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0069611
Resource summary:
In text-to-image generation, the text encoder often fails to mine textual information in depth, which leads to semantic inconsistency between the text and the generated images. To address this, DXC-GAN, a text-to-image generation method, is proposed. The method replaces the original text encoder with the Xtra Long Network (XLNet), a pretrained model from the Transformer family, so that prior knowledge learned from large text corpora can be used to mine contextual information in depth. A Convolutional Block Attention Module (CBAM) is added to the generator to strengthen its focus on important image information, which addresses incomplete image details and incorrect spatial structure. In the discriminator, a contrastive loss is introduced and combined with the model's match-aware gradient penalty and one-way (unidirectional) output. This pulls images with the same semantics closer together and pushes images with different semantics further apart, improving the semantic consistency between the text and the generated images. Experiments show that, compared with DF-GAN, the proposed model improves the Inception Score (IS) by 4.42% and the Fréchet Inception Distance (FID) by 17.96% on the CUB dataset (lower FID is better, so the FID improvement is a reduction). On the Oxford-102 dataset, it achieves an IS of 3.97 and an FID of 37.82. Qualitatively, compared with DF-GAN, DXC-GAN avoids deformities such as multiple heads and missing feet in generated bird images, and it substantially reduces quality problems such as missing petals in generated flower images. It also improves the alignment between text and images, making the generated images noticeably more complete and of higher quality.
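The match-aware gradient penalty (MA-GP) and one-way discriminator output that DXC-GAN keeps come from DF-GAN. As we understand the DF-GAN paper, its hinge-style discriminator and generator losses take roughly the form below. Here e is the sentence embedding, ê is a mismatched sentence embedding paired with a real image, P_r, P_g, and P_mis are the real, generated, and mismatched-pair distributions, and k, p are penalty hyperparameters. This is a reference sketch only. The abstract does not say whether DXC-GAN changes these terms or their weights, or how the contrastive loss is combined with them.

```latex
\begin{aligned}
L_D ={}& -\mathbb{E}_{x\sim P_r}\left[\min(0,\,-1+D(x,e))\right] \\
&-\tfrac{1}{2}\,\mathbb{E}_{G(z)\sim P_g}\left[\min(0,\,-1-D(G(z),e))\right] \\
&-\tfrac{1}{2}\,\mathbb{E}_{(x,\hat{e})\sim P_{mis}}\left[\min(0,\,-1-D(x,\hat{e}))\right] \\
&+k\,\mathbb{E}_{x\sim P_r}\left[\left(\lVert\nabla_x D(x,e)\rVert+\lVert\nabla_e D(x,e)\rVert\right)^{p}\right] \\
L_G ={}& -\mathbb{E}_{G(z)\sim P_g}\left[D(G(z),e)\right]
\end{aligned}
```

The last term of L_D is the match-aware part. It penalizes the discriminator's gradients with respect to both the image and the sentence embedding, but only at real, text-matching pairs. This smooths the discriminator around real, matched data so that the generator receives more useful gradients.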
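The abstract says the discriminator's contrastive loss pulls same-semantics pairs together and pushes different-semantics pairs apart, but it does not give the loss formula. The sketch below assumes a standard symmetric InfoNCE-style loss over a batch of image and text embeddings. This is a common choice in contrastive text-image learning, swapped in here for illustration, and is not necessarily DXC-GAN's exact loss. The function names `cosine` and `info_nce`, the plain-list embeddings, and the temperature of 0.1 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (guarded against zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / max(nu * nv, 1e-12)


def _mean_row_cross_entropy(matrix):
    """Mean cross-entropy where row i's correct 'class' is column i (the matched pair)."""
    total = 0.0
    for i, row in enumerate(matrix):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / len(matrix)


def info_nce(img_emb, txt_emb, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss.

    img_emb[i] and txt_emb[i] are assumed to describe the same content
    (positive pair); every other image-text pair in the batch is a negative.
    """
    n = len(img_emb)
    # sims[i][j]: scaled similarity between image i and text j
    sims = [[cosine(img_emb[i], txt_emb[j]) / temperature for j in range(n)]
            for i in range(n)]
    sims_t = [list(col) for col in zip(*sims)]  # text-to-image direction
    return 0.5 * (_mean_row_cross_entropy(sims) + _mean_row_cross_entropy(sims_t))


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print("matched:", info_nce(images, matched_texts))
    print("swapped:", info_nce(images, swapped_texts))
```

Matched pairs sit on the diagonal of the similarity matrix. Minimizing the loss raises the diagonal similarities relative to the off-diagonal ones, which is the "closer together / further apart" behavior the abstract describes. A practical version would operate on batched tensors in a deep learning framework rather than on Python lists.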
Created:
2026-02-09



