MSVD-Indonesian
Source: arXiv | Updated: 2023-06-20 | Indexed: 2024-06-21
Download link:
https://github.com/willyfh/msvd-indonesian
Description:
MSVD-Indonesian is the first publicly available Indonesian video-text dataset, created by independent researcher Willy Fitra Hendria. It was built by translating the sentences of the English MSVD dataset into Indonesian with the Google Translate API. The dataset contains 1,970 videos and about 80,000 Indonesian sentences, and supports research on multimodal video-text tasks such as text-to-video retrieval, video-to-text retrieval, and video captioning. Although the machine translation introduces some inaccuracies, the dataset has been used to train neural network models, which were evaluated on these three tasks. It is intended to fill the research gap for such tasks in Indonesian.
Provider:
Independent researcher
Created:
2023-06-20



