Multi-modal Synergy: Bridging Chinese Culture Expression and Teaching Interaction in Art English Textbooks via Self-supervised Learning

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

https://data.mendeley.com/datasets/z7j7fdpj5k

Description:

Existing art English textbooks often present Chinese cultural content with inadequate semantic coherence and limited interactivity. To address this, we propose a methodology that integrates self-supervised learning with the LXMERT model. First, the CLIP model aligns text, image, and video data in a shared semantic space. LXMERT's two-stream Transformer architecture then extracts and fuses the multimodal features, improving fusion consistency. A Chinese cultural knowledge graph strengthens the semantic expression of cultural concepts and improves the structure of the textbook content. A two-way interactive model and a cross-cultural learning path optimization strategy further improve teaching adaptability and communication effectiveness. Experiments show that the approach achieves text-image semantic matching scores above 0.88, a multimodal feature fusion consistency of 0.895, and a concept matching accuracy of up to 0.94, while raising learner interaction frequency to an average of 35 times per day. These optimizations improve the multimodal consistency, cross-cultural adaptability, and teaching interactivity of the textbooks, offering methodological support for enhancing art English textbooks and building students' cross-cultural competence.

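The text-image matching step described above can be illustrated with a minimal sketch. It assumes that text and image embeddings have already been projected into a shared space (for example by a CLIP-style encoder) and scores a match by cosine similarity. The function names, the toy 3-dimensional vectors, and the use of cosine similarity as the matching score are assumptions made for illustration; the dataset description does not specify how the reported scores were computed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def best_match(text_vec, image_vecs):
    """Return (name, score) of the image embedding closest to text_vec."""
    scored = {
        name: cosine_similarity(text_vec, vec)
        for name, vec in image_vecs.items()
    }
    name = max(scored, key=scored.get)
    return name, scored[name]


# Hypothetical toy embeddings, invented for illustration only.
# Real encoders produce vectors with hundreds of dimensions.
text_embedding = [0.9, 0.1, 0.2]
image_embeddings = {
    "ink_painting": [0.8, 0.2, 0.1],
    "oil_portrait": [0.1, 0.9, 0.3],
}

match, score = best_match(text_embedding, image_embeddings)
print(match, round(score, 3))
```

With these toy vectors the caption embedding pairs with the `ink_painting` entry, since that vector points in nearly the same direction as the text vector.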

Created:

2025-04-25