BIGVIDEO

arXiv | Updated 2023-07-03 | Indexed 2024-06-21
Download link:
https://github.com/DeepLearnXMU/BigVideoVMT
Resource overview:
BIGVIDEO is a large-scale video subtitle translation dataset built to advance research in multimodal machine translation. It contains 155,000 videos, totaling 4.5 million high-quality English-Chinese parallel sentence pairs and 9,981 hours of video, collected from YouTube and Xigua Video. BIGVIDEO includes two purpose-built test sets, AMBIGUOUS and UNAMBIGUOUS, for verifying whether visual information is necessary for translation. AMBIGUOUS contains sentences that need the video to resolve their ambiguity, while UNAMBIGUOUS contains sentences whose text alone is sufficient for translation. The dataset is suited to studying the role of video in machine translation, particularly in resolving semantic ambiguity.
Provided by:
School of Informatics, Xiamen University
Created:
2023-05-23