M3IT

arXiv | Updated 2023-06-08 | Indexed 2024-06-21

Download link:

https://huggingface.co/datasets/MMInstruction/M3IT

Description:

M3IT is a large-scale multimodal, multilingual instruction-tuning dataset designed to improve how well vision-language models (VLMs) align with human instructions. Created jointly by The University of Hong Kong, Peking University, and Shanghai AI Laboratory, it covers 40 distinct tasks, totaling 2.4 million instances and 400 manually written task instructions. Key tasks have been translated into 80 languages to broaden usability. The dataset spans vision-language tasks such as image classification, visual question answering, and image captioning, as well as video tasks such as video question answering, and Chinese vision-language tasks. It was built in four stages: manual instruction writing, data preprocessing, careful quality checking, and multilingual dataset construction. M3IT targets the difficulties VLMs face in understanding and carrying out diverse tasks, and aims to advance the development of multimodal intelligent agents.
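As a rough illustration of how an instruction-tuning instance of this kind might be turned into a prompt/target pair, here is a minimal sketch. The field names (`instruction`, `inputs`, `image_base64_str`, `outputs`) and the sample record are assumptions made for illustration; check the dataset card at the download link for the actual schema before relying on them.

```python
import base64


def format_example(record: dict) -> dict:
    """Turn one instruction-tuning record into a prompt/target pair.

    The record layout used here is an assumption, not a confirmed schema.
    """
    instruction = record["instruction"].strip()
    inputs = record.get("inputs", "").strip()
    # Some tasks (e.g. captioning) may carry no extra input text.
    prompt = instruction if not inputs else f"{instruction}\n{inputs}"
    # Images are assumed to be stored as base64 strings; decode to raw bytes.
    image_bytes = base64.b64decode(record["image_base64_str"])
    return {
        "prompt": prompt,
        "target": record["outputs"].strip(),
        "image_bytes": image_bytes,
    }


# Hypothetical record, built locally so the sketch runs without a download.
record = {
    "instruction": "Answer the question about the image.",
    "inputs": "What color is the bus?",
    "image_base64_str": base64.b64encode(b"fake-image-bytes").decode("ascii"),
    "outputs": "Red",
}

example = format_example(record)
print(example["prompt"])
print(example["target"])
```

In practice the decoded bytes would be passed to an image library and the prompt to the model's tokenizer; both steps are omitted here to keep the sketch dependency-free.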

Provided by:

The University of Hong Kong, Peking University, Shanghai AI Laboratory

Created:

2023-06-07
