MMInstruction/M3IT-80

Name: MMInstruction/M3IT-80
Creator: MMInstruction
Published: 2023-06-20 12:43:25
License: No description available

Source: Hugging Face. Updated 2023-06-20; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/MMInstruction/M3IT-80


Resource description:

---
license: other
task_categories:
- image-to-text
- image-classification
size_categories:
- 0.5M<n<1M
---

# Dataset Card for M3IT-80

Project Page: [https://m3-it.github.io/](https://m3-it.github.io/)

## Dataset Description

- **Homepage:** https://huggingface.co/datasets/MMInstruction/M3IT-80
- **Repository:** https://huggingface.co/datasets/MMInstruction/M3IT-80
- **Paper:** https://huggingface.co/papers/2306.04387
- **Leaderboard:**
- **Point of Contact:**

### Languages

80 languages translated from English.

## Dataset Metainfo

The [M3IT](https://huggingface.co/datasets/MMInstruction/M3IT) dataset compiles a diverse set of classical vision-language tasks, including captioning, visual question answering (VQA), visually conditioned generation, reasoning, and classification.

**M3IT-80** is the 80-language translated version of M3IT.

### Languages

```python
_LAN_CODES = [
    "af", "am", "ar", "as", "ast", "be", "bg", "bn", "bs", "ca",
    "ceb", "cs", "cy", "da", "de", "el", "es", "et", "fi", "fr",
    "fuv", "gl", "gu", "ha", "he", "hi", "hr", "hu", "hy", "id",
    "ig", "is", "it", "ja", "jv", "ka", "kk", "km", "kn", "ko",
    "ky", "lb", "lg", "lij", "li", "ln", "lo", "lt", "lv", "mi",
    "mk", "ml", "mr", "mt", "my", "nl", "ny", "oc", "pa", "pl",
    "pt", "ro", "ru", "sd", "sk", "sn", "so", "sr", "sv", "ta",
    "te", "tg", "th", "tl", "tr", "uk", "ur", "vi", "wo", "zh",
]
```

### Dataset Statistics

We report the number of training, validation, and test instances in each dataset, per language.

| Task                      | Dataset      | #Train | #Val | #Test |
|---------------------------|--------------|--------|------|-------|
| Classification            | `imagenet`   | 500    | 500  | 0     |
| Visual Question Answering | `vqa-v2`     | 500    | 500  | 0     |
| Knowledgeable Visual QA   | `okvqa`      | 500    | 500  | 0     |
| Reasoning                 | `winoground` | 0      | 0    | 800   |
| Generation                | `vist`       | 500    | 500  | 500   |
| Video                     | `msrvtt`     | 500    | 500  | 0     |
|                           | `msrvtt-qa`  | 500    | 500  | 0     |

### Source Data

Source language: English

| Task                      | Dataset [Citation] | Source                                                                             |
|---------------------------|--------------------|------------------------------------------------------------------------------------|
| Classification            | `imagenet` [1]     | [Source](https://www.image-net.org/)                                               |
| Visual Question Answering | `vqa-v2` [2]       | [Source](https://visualqa.org/)                                                    |
| Knowledgeable Visual QA   | `okvqa` [3]        | [Source](https://okvqa.allenai.org/)                                               |
| Reasoning                 | `winoground` [4]   | [Source](https://huggingface.co/datasets/facebook/winoground)                      |
| Generation                | `vist` [5]         | [Source](https://visionandlanguage.net/VIST/)                                      |
| Video                     | `msrvtt` [6]       | [Source](https://paperswithcode.com/dataset/msr-vtt)                               |
|                           | `msrvtt-qa` [7]    | [Source](https://paperswithcode.com/sota/visual-question-answering-on-msrvtt-qa-1) |

### Translation

We use the free tier of [Alibaba Translate](https://www.alibabacloud.com/product/machine-translation), a neural machine translation (NMT) system, to perform the translation.
## Dataset Structure

### HuggingFace Login (Optional)

```python
# OR run huggingface-cli login
from huggingface_hub import login

hf_token = "hf_xxx"  # TODO: set a valid HuggingFace access token for loading datasets/models
login(token=hf_token)
```

### Data Loading

```python
from datasets import load_dataset

ds_name = "okvqa-zh"  # change the dataset name here
dataset = load_dataset("MMInstruction/M3IT-80", ds_name)
```

### Data Splits

```python
from datasets import load_dataset

ds_name = "okvqa-zh"  # change the dataset name here
dataset = load_dataset("MMInstruction/M3IT-80", ds_name)
# Split sizes vary by dataset; see the Dataset Statistics table.
train_set = dataset["train"]
validation_set = dataset["validation"]
test_set = dataset["test"]
```

### Data Instances

```python
from datasets import load_dataset
from io import BytesIO
from base64 import b64decode
from PIL import Image

ds_name = "okvqa-zh"  # change the dataset name here
dataset = load_dataset("MMInstruction/M3IT-80", ds_name)
train_set = dataset["train"]

for train_instance in train_set:
    instruction = train_instance["instruction"]  # str
    inputs = train_instance["inputs"]  # str
    outputs = train_instance["outputs"]  # str
    image_base64_str_list = train_instance["image_base64_str"]  # list of base64-encoded str
    # Decode the first image of the instance.
    image_0 = Image.open(BytesIO(b64decode(image_base64_str_list[0])))
```

### Data Fields

```python
import datasets

features = datasets.Features(
    {
        "instruction": datasets.Value("string"),
        "inputs": datasets.Value("string"),
        "image_base64_str": [datasets.Value("string")],
        "outputs": datasets.Value("string"),
    }
)
```

### Licensing Information

The content of each original dataset follows its original license. For tasks with an Unknown/Custom license, we suggest that users check the original project or contact the dataset owner for detailed license information.

Our annotated instruction data is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
### Citation Information

```bibtex
@article{li2023m3it,
  title={M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning},
  author={Lei Li and Yuwei Yin and Shicheng Li and Liang Chen and Peiyi Wang and Shuhuai Ren and Mukai Li and Yazheng Yang and Jingjing Xu and Xu Sun and Lingpeng Kong and Qi Liu},
  journal={arXiv preprint arXiv:2306.04387},
  year={2023}
}
```

### Contributions

M3IT-80 is the translated version of M3IT, an open-source, large-scale Multi-modal, Multilingual Instruction Tuning dataset, designed to enable the development of general-purpose multi-modal agents.

## References

- [1] ImageNet Large Scale Visual Recognition Challenge
- [2] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
- [3] OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
- [4] Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
- [5] Visual Storytelling
- [6] MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
- [7] Video Question Answering via Gradually Refined Attention over Appearance and Motion

Provider:

MMInstruction

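Summary of Original Information

Dataset Overview

Dataset name: M3IT-80

Dataset description: M3IT-80 is the 80-language translated version of the M3IT dataset. It covers a range of vision-language tasks, including captioning, visual question answering (VQA), visually conditioned generation, reasoning, and classification.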

Languages: The dataset covers 80 languages. The language codes are listed below:

```python
_LAN_CODES = [
    "af", "am", "ar", "as", "ast", "be", "bg", "bn", "bs", "ca",
    "ceb", "cs", "cy", "da", "de", "el", "es", "et", "fi", "fr",
    "fuv", "gl", "gu", "ha", "he", "hi", "hr", "hu", "hy", "id",
    "ig", "is", "it", "ja", "jv", "ka", "kk", "km", "kn", "ko",
    "ky", "lb", "lg", "lij", "li", "ln", "lo", "lt", "lv", "mi",
    "mk", "ml", "mr", "mt", "my", "nl", "ny", "oc", "pa", "pl",
    "pt", "ro", "ru", "sd", "sk", "sn", "so", "sr", "sv", "ta",
    "te", "tg", "th", "tl", "tr", "uk", "ur", "vi", "wo", "zh",
]
```

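Dataset statistics: The number of training, validation, and test instances per language is as follows:

| Task                      | Dataset      | #Train | #Val | #Test |
|---------------------------|--------------|--------|------|-------|
| Classification            | `imagenet`   | 500    | 500  | 0     |
| Visual Question Answering | `vqa-v2`     | 500    | 500  | 0     |
| Knowledgeable Visual QA   | `okvqa`      | 500    | 500  | 0     |
| Reasoning                 | `winoground` | 0      | 0    | 800   |
| Generation                | `vist`       | 500    | 500  | 500   |
| Video                     | `msrvtt`     | 500    | 500  | 0     |
|                           | `msrvtt-qa`  | 500    | 500  | 0     |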
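The per-language counts can be totalled to check them against the `0.5M<n<1M` size category stated in the card. The following is a minimal sketch: `SPLIT_COUNTS` simply transcribes the statistics table, and `NUM_LANGUAGES = 80` comes from the language list. It assumes that every dataset is available in every language, which the card implies but does not state outright.

```python
# Per-language split sizes, transcribed from the statistics table above.
SPLIT_COUNTS = {
    "imagenet":   {"train": 500, "validation": 500, "test": 0},
    "vqa-v2":     {"train": 500, "validation": 500, "test": 0},
    "okvqa":      {"train": 500, "validation": 500, "test": 0},
    "winoground": {"train": 0,   "validation": 0,   "test": 800},
    "vist":       {"train": 500, "validation": 500, "test": 500},
    "msrvtt":     {"train": 500, "validation": 500, "test": 0},
    "msrvtt-qa":  {"train": 500, "validation": 500, "test": 0},
}
NUM_LANGUAGES = 80  # length of the language code list


def per_language_totals(counts):
    """Sum each split over all datasets for a single language."""
    totals = {"train": 0, "validation": 0, "test": 0}
    for splits in counts.values():
        for split, n in splits.items():
            totals[split] += n
    return totals


totals = per_language_totals(SPLIT_COUNTS)
per_language = sum(totals.values())
overall = per_language * NUM_LANGUAGES

print(totals)        # {'train': 3000, 'validation': 3000, 'test': 1300}
print(per_language)  # 7300
print(overall)       # 584000
```

Under that assumption the total is 584,000 instances, which falls inside the stated 0.5M to 1M range.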

Source data: The source language is English; translations were produced with the Alibaba Translate service.

Dataset structure: The dataset can be loaded through HuggingFace as follows:

```python
from datasets import load_dataset

ds_name = "okvqa-zh"  # change the dataset name here
dataset = load_dataset("MMInstruction/M3IT-80", ds_name)
```
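The loading example uses the configuration name `okvqa-zh`, which suggests a `<dataset>-<language code>` naming pattern. The sketch below builds names on that assumption. The pattern is inferred from this single example and is not documented in the card, and `config_name` and `load_languages` are hypothetical helpers, not part of the dataset's tooling.

```python
# Dataset names taken from the statistics table.
DATASETS = ["imagenet", "vqa-v2", "okvqa", "winoground", "vist", "msrvtt", "msrvtt-qa"]


def config_name(dataset: str, lang: str) -> str:
    """Build a configuration name using the ASSUMED '<dataset>-<lang>' pattern."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return f"{dataset}-{lang}"


def load_languages(dataset: str, langs: list) -> dict:
    """Load one dataset in several languages, keyed by language code."""
    # Imported lazily so that config_name can be used without the `datasets` package.
    from datasets import load_dataset

    return {
        lang: load_dataset("MMInstruction/M3IT-80", config_name(dataset, lang))
        for lang in langs
    }


print(config_name("okvqa", "zh"))  # okvqa-zh
```

A call such as `load_languages("okvqa", ["zh", "fr"])` would then download both configurations, provided the assumed names exist on the Hub.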

Data fields: The dataset contains the following fields:

```python
import datasets

features = datasets.Features(
    {
        "instruction": datasets.Value("string"),
        "inputs": datasets.Value("string"),
        "image_base64_str": [datasets.Value("string")],
        "outputs": datasets.Value("string"),
    }
)
```
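Because `image_base64_str` is a list of base64-encoded strings, each record has to be decoded before its images can be used. The following is a minimal sketch using only the standard library. `decode_images` is a hypothetical helper, and the sample record is synthetic: its payload is placeholder bytes, not a real image.

```python
from base64 import b64decode, b64encode

# Field names as declared in the dataset's feature schema.
REQUIRED_FIELDS = ("instruction", "inputs", "image_base64_str", "outputs")


def decode_images(record: dict) -> list:
    """Check that a record has all schema fields, then decode its images to raw bytes."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return [b64decode(encoded) for encoded in record["image_base64_str"]]


# Synthetic record for illustration only.
sample = {
    "instruction": "Answer the question about the image.",
    "inputs": "What is shown?",
    "image_base64_str": [b64encode(b"not-a-real-image").decode("ascii")],
    "outputs": "A placeholder.",
}

images = decode_images(sample)
print(len(images), images[0])
```

With real data, the decoded bytes can be passed to an image library, for example `Image.open(BytesIO(data))` with Pillow, as the dataset card does.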

Licensing information: Each original dataset follows its original license. The annotated instruction data is licensed under CC BY 4.0.

Citation information:

```bibtex
@article{li2023m3it,
  title={M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning},
  author={Lei Li and Yuwei Yin and Shicheng Li and Liang Chen and Peiyi Wang and Shuhuai Ren and Mukai Li and Yazheng Yang and Jingjing Xu and Xu Sun and Lingpeng Kong and Qi Liu},
  journal={arXiv preprint arXiv:2306.04387},
  year={2023}
}
```

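Contributions: M3IT-80 is an open-source, large-scale, multi-modal, multilingual instruction tuning dataset, designed to support the development of general-purpose multi-modal agents.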
