
Indonesian Dataset Expansion of Microsoft Research Video Description Corpus and Its Similarity Analysis

Mendeley Data, indexed 2026-04-18
Download link:
https://data.mendeley.com/datasets/d7vx5cc92y
Resource description:
The Microsoft Research Video Description Corpus is an open dataset containing about 120K sentences. The sentences are roughly parallel descriptions of more than 2,000 video snippets in 35 languages. Both paraphrase and bilingual relations are available, but Indonesian descriptions are not included. This dataset is an Indonesian expansion of the Microsoft Research Video Description Corpus. The collection consists of 43,753 description texts for 1,959 short videos, parallel with Microsoft’s dataset. To add value to the dataset, similarity metrics were calculated for the texts. The metrics are cosine, Jaccard, Euclidean, and Manhattan, with average results of 0.22, 0.33, 2.38, and 6.08, respectively.
Created:
2018-08-14
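
The four metrics named in the description can be illustrated with a minimal sketch for a single pair of description texts. The description does not state how the texts were tokenized or vectorized, so this sketch assumes lowercase whitespace tokenization and raw term-count vectors. The function names `tokenize` and `similarity_metrics` and the two sample sentences are hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization.
    # The dataset's actual preprocessing is not stated.
    return text.lower().split()


def similarity_metrics(text_a, text_b):
    """Return cosine, Jaccard, Euclidean, and Manhattan values for two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Build aligned term-count vectors over the shared vocabulary.
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[word] for word in vocab]
    vec_b = [counts_b[word] for word in vocab]

    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Jaccard similarity: shared unique tokens over all unique tokens.
    set_a, set_b = set(counts_a), set(counts_b)
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 0.0

    # Euclidean (L2) and Manhattan (L1) distances between the count vectors.
    euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(vec_a, vec_b)))
    manhattan = sum(abs(x - y) for x, y in zip(vec_a, vec_b))

    return {
        "cosine": cosine,
        "jaccard": jaccard,
        "euclidean": euclidean,
        "manhattan": manhattan,
    }


if __name__ == "__main__":
    # Hypothetical Indonesian descriptions of the same video clip.
    metrics = similarity_metrics(
        "seorang pria bermain gitar",
        "seorang pria memainkan gitar",
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Cosine and Jaccard are similarities, where higher values mean more alike, while Euclidean and Manhattan are distances, where lower values mean more alike. The four reported averages are therefore not on a common scale.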