MSVD (Microsoft Research Video Description Corpus)

Name: MSVD (Microsoft Research Video Description Corpus)
Creator: OpenDataLab
Published: 2026-05-24 04:30:22
License: No description available

OpenDataLab | Updated: 2026-05-24 | Indexed: 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/MSVD


Resource overview:

Traditional methods for collecting translation and paraphrase data can be prohibitively expensive, which makes it difficult to build large new corpora. Crowdsourcing offers a low-cost alternative, but quality control and scalability can become problems. This project introduces a novel annotation task that uses short video clips (typically under 10 seconds) as stimuli to elicit parallel linguistic responses from annotators. Descriptions of the same video in the same language can serve as paraphrases of one another, while descriptions in different languages can serve as translations of one another.

This data collection method has several advantages:

1. Only monolingual speakers are needed to create translation data.
2. The paraphrases are more natural, because annotators are not influenced by a source sentence.
3. Since there is no source sentence to translate, cheating such as using online translation services is discouraged.

Over a two-month period between July and September 2010, 85,000 English descriptions were collected for 2,089 video clips, along with more than a thousand descriptions in each of about a dozen other languages. In addition to providing training and testing data for paraphrase and translation engines, the corpus offers natural language descriptions for a large volume of video data. The video clips typically depict a single, unambiguous action or event.
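The pairing rule described above (same video and same language gives paraphrases; same video and different languages gives translations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(video_id, language, description)` record shape, the `build_pairs` helper, and the toy sentences are all hypothetical and do not reflect the corpus's actual file layout or contents.

```python
from collections import defaultdict
from itertools import combinations, product

# Invented toy records in an assumed (video_id, language, description) shape.
records = [
    ("vid1", "en", "A man is slicing a tomato."),
    ("vid1", "en", "Someone cuts a tomato."),
    ("vid1", "de", "Ein Mann schneidet eine Tomate."),
    ("vid2", "en", "A dog is running."),
]


def build_pairs(records):
    """Derive paraphrase and translation pairs from video descriptions."""
    # Group descriptions by video, then by language.
    by_video = defaultdict(lambda: defaultdict(list))
    for video_id, lang, text in records:
        by_video[video_id][lang].append(text)

    paraphrases, translations = [], []
    for langs in by_video.values():
        # Same video, same language: every pair of descriptions is a paraphrase pair.
        for texts in langs.values():
            paraphrases.extend(combinations(texts, 2))
        # Same video, different languages: every cross-language pair is a translation pair.
        for (_, texts_a), (_, texts_b) in combinations(sorted(langs.items()), 2):
            translations.extend(product(texts_a, texts_b))
    return paraphrases, translations


paraphrases, translations = build_pairs(records)
```

With the toy records above, `vid1` contributes one English paraphrase pair and two German-English translation pairs, while `vid2` contributes nothing because it has only one description. Note that the number of paraphrase pairs grows quadratically with the number of descriptions per video, so a real pipeline would probably sample pairs rather than enumerate them all.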

Provider:

OpenDataLab

Created:

2022-08-16

Collection summary

Dataset introduction