Multi-modal Synergy: Bridging Chinese Culture Expression and Teaching Interaction in Art English Textbooks via Self-supervised Learning

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/z7j7fdpj5k

Description:

Existing art English textbooks often present Chinese cultural content with weak semantic coherence and limited interactivity. To address this, we propose a methodology that integrates self-supervised learning with the LXMERT model. CLIP first aligns text, image, and video data in a shared semantic space. LXMERT's two-stream Transformer architecture then extracts and fuses multimodal features, improving fusion consistency. A Chinese cultural knowledge graph strengthens the semantic expression of cultural concepts and improves the structure of textbook content. A two-way interactive model and a cross-cultural learning path optimization strategy further improve teaching adaptability and communication effectiveness. In experiments, the approach achieves text-image semantic matching above 0.88, multimodal feature fusion consistency of 0.895, and concept matching accuracy of up to 0.94, and raises learner interaction frequency to an average of 35 times per day. These gains improve the textbooks' multimodal consistency, cross-cultural adaptability, and teaching interactivity, offering methodological support for improving art English textbooks and strengthening students' cross-cultural competence.
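
The description does not say how the text-image semantic matching score is computed. As a minimal sketch, assuming the score is a cosine similarity between text and image vectors in the shared space (the usual way CLIP embeddings are compared), matching could look like the following. The vectors are toy values made up for illustration, not taken from the dataset, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for a zero vector: no similarity
    return dot / (norm_u * norm_v)


def best_matches(text_embeddings, image_embeddings):
    """For each text vector, return (index of closest image, similarity)."""
    results = []
    for text_vec in text_embeddings:
        scores = [cosine_similarity(text_vec, img) for img in image_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append((best, scores[best]))
    return results


# Toy 3-dimensional embeddings (assumed values, for illustration only).
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
image_embeddings = [[0.0, 2.0, 2.0], [3.0, 0.0, 0.0]]

matches = best_matches(text_embeddings, image_embeddings)
for text_idx, (image_idx, score) in enumerate(matches):
    print(f"text {text_idx} -> image {image_idx} (similarity {score:.3f})")
```

Under this assumption, a reported matching score above 0.88 would mean that paired text and image vectors point in nearly the same direction in the shared space.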

Created:

2025-04-25