Indonesian Dataset Expansion of Microsoft Research Video Description Corpus and Its Similarity Analysis

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/d7vx5cc92y

Description:

The Microsoft Research Video Description Corpus is an open dataset containing about 120K sentences. The sentences form a set of roughly parallel descriptions of more than 2,000 video snippets in 35 languages. Both paraphrase and bilingual relations are available, but the dataset does not include Indonesian descriptions. This dataset is an Indonesian expansion of the Microsoft Research Video Description Corpus. The collection consists of 43,753 description texts for 1,959 short videos, parallel with Microsoft’s dataset. To add further value, similarity metrics were computed over the texts. The metrics are cosine, Jaccard, Euclidean, and Manhattan, with average results of 0.22, 0.33, 2.38, and 6.08, respectively.
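
The four metrics named above can be illustrated with a minimal sketch. The description does not say how the texts were tokenized or vectorized, so this example assumes lowercase whitespace tokens and raw bag-of-words counts; the function names and the two Indonesian sentences are hypothetical and are not taken from the dataset. Note that cosine and Jaccard are similarities (higher means more alike), while Euclidean and Manhattan are distances (lower means more alike).

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the dataset's actual
    # preprocessing is not described in the source.
    return text.lower().split()


def similarity_metrics(text_a, text_b):
    """Compute cosine, Jaccard, Euclidean, and Manhattan for two texts.

    Vectors are raw term counts over the joint vocabulary (an assumption).
    """
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)

    # Align both texts on the same vocabulary so the vectors are comparable.
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[w] for w in vocab]
    vec_b = [counts_b[w] for w in vocab]

    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Jaccard similarity: shared distinct tokens over all distinct tokens.
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 0.0

    # Euclidean (L2) and Manhattan (L1) distances between the count vectors.
    euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(vec_a, vec_b)))
    manhattan = sum(abs(x - y) for x, y in zip(vec_a, vec_b))

    return {
        "cosine": cosine,
        "jaccard": jaccard,
        "euclidean": euclidean,
        "manhattan": manhattan,
    }


if __name__ == "__main__":
    # Hypothetical pair of descriptions of the same video.
    a = "seorang pria bermain gitar"
    b = "seorang pria memainkan gitar"
    for name, value in similarity_metrics(a, b).items():
        print(f"{name}: {value:.2f}")
```

Averaging these values over every pair of descriptions belonging to the same video would give corpus-level figures of the kind reported above, although the exact numbers depend on the preprocessing and vectorization actually used.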

Created:

2018-08-14
