dinhanhx/coco-2017-vi

Name: dinhanhx/coco-2017-vi
Creator: dinhanhx
Published: 2023-11-09 09:03:38
License: not specified

Hugging Face, updated 2023-11-09, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/dinhanhx/coco-2017-vi

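Resource description:

This dataset was first introduced in [dinhanhx/VisualRoBERTa](https://github.com/dinhanhx/VisualRoBERTa/tree/main). The [COCO 2017 image captions](https://cocodataset.org/#download) (2017 train/val annotations) were translated from English to Vietnamese using VinAI tools, and the [UIT-ViIC](https://arxiv.org/abs/2002.00175) dataset was then merged in. The dataset contains both the original English version and the Vietnamese version (including UIT-ViIC). Note: the UIT-ViIC splits originate from `en/captions_train2017.json`, so all UIT-ViIC splits were first merged together and then merged into `vi/captions_train2017_trans.json`, yielding `captions_train2017_trans_plus.json`. `vi/captions_train2017_trans.json` and `vi/captions_val2017_trans.json` are VinAI-translated from the files in `en/`.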
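The merge step described above can be sketched as follows. This is only an illustration, not the author's actual script: it assumes every file follows the standard COCO caption layout (an `images` list and an `annotations` list whose entries carry `id`, `image_id`, and `caption`), and the function name `merge_caption_sets` and the toy data are hypothetical. Re-numbering incoming annotation ids is a choice made for this sketch; the source does not say how id collisions were handled.

```python
from typing import Any, Dict, Iterable, List

Coco = Dict[str, List[Dict[str, Any]]]


def merge_caption_sets(base: Coco, extras: Iterable[Coco]) -> Coco:
    """Merge COCO-style caption sets into one, keeping annotation ids unique."""
    # Index images by id so an image shared between sets is kept only once.
    images = {img["id"]: img for img in base["images"]}
    annotations = [dict(ann) for ann in base["annotations"]]
    next_id = max((ann["id"] for ann in annotations), default=0) + 1

    for extra in extras:
        for img in extra["images"]:
            images.setdefault(img["id"], img)
        for ann in extra["annotations"]:
            # Re-number incoming annotations to avoid id collisions
            # (an assumption of this sketch, not a documented behaviour).
            annotations.append({**ann, "id": next_id})
            next_id += 1

    return {"images": list(images.values()), "annotations": annotations}


# Toy stand-ins for vi/captions_train2017_trans.json and two UIT-ViIC splits.
translated = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "Một con mèo nằm trên ghế."},
        {"id": 11, "image_id": 2, "caption": "Hai người đang chơi bóng."},
    ],
}
uit_train = {
    "images": [{"id": 2}],
    "annotations": [{"id": 10, "image_id": 2, "caption": "Hai cầu thủ trên sân."}],
}
uit_val = {
    "images": [{"id": 3}],
    "annotations": [{"id": 1, "image_id": 3, "caption": "Một người đang đánh tennis."}],
}

merged = merge_caption_sets(translated, [uit_train, uit_val])
print(len(merged["images"]), len(merged["annotations"]))
```

Because the UIT-ViIC images come from the COCO train split, deduplicating by image id keeps the merged `images` list free of repeats while every caption is retained.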

Provided by:

dinhanhx

Summary of original information

Dataset overview

Dataset name

COCO 2017 image captions in Vietnamese

Languages

Vietnamese (vi)
English (en)

Source dataset

MS COCO

License

Unknown

Task category

Image-to-text

Task ID

Image captioning

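Dataset contents

Provides the original English version and the Vietnamese version (including UIT-ViIC).
The Vietnamese version comprises `vi/captions_train2017_trans.json` and `vi/captions_val2017_trans.json`, which are VinAI-translated from the English version.
The UIT-ViIC dataset was merged in, producing `captions_train2017_trans_plus.json`.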
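Since the English and Vietnamese files sit side by side, a common first step is to line up the captions for each image. The sketch below shows one way to do that, assuming both files use the standard COCO caption layout and share the same `image_id` values; neither assumption is confirmed by the source. The helper names `captions_by_image` and `pair_captions` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def captions_by_image(coco: Dict[str, Any]) -> Dict[int, List[str]]:
    """Group caption strings by the image they describe."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


def pair_captions(en: Dict[str, Any], vi: Dict[str, Any]) -> Dict[int, Dict[str, List[str]]]:
    """Return {image_id: {"en": [...], "vi": [...]}} for images present in both sets."""
    en_caps = captions_by_image(en)
    vi_caps = captions_by_image(vi)
    shared_ids = sorted(en_caps.keys() & vi_caps.keys())
    return {i: {"en": en_caps[i], "vi": vi_caps[i]} for i in shared_ids}


# Toy stand-ins for an English file and its Vietnamese counterpart.
english = {
    "annotations": [
        {"id": 1, "image_id": 7, "caption": "A cat sits on a chair."},
        {"id": 2, "image_id": 8, "caption": "Two people play ball."},
    ]
}
vietnamese = {
    "annotations": [
        {"id": 1, "image_id": 7, "caption": "Một con mèo ngồi trên ghế."},
    ]
}

pairs = pair_captions(english, vietnamese)
print(pairs)
```

With the real data, the two toy dictionaries would be replaced by the result of `json.load` on, for example, `en/captions_train2017.json` and `vi/captions_train2017_trans.json`. Pairing by image rather than by annotation id avoids relying on the translated files preserving annotation ids.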