MMDU

Name: MMDU
Creator: maas
Published: 2025-12-05 16:25:42
License: No description provided

ModelScope community. Updated: 2025-12-05. Indexed: 2025-03-08.

Download link:

https://modelscope.cn/datasets/ViRFT/MMDU

Resource description:

### 📢 News

- [06/13/2024] 🚀 We release our MMDU benchmark and MMDU-45k instruct tuning data to Hugging Face.

### 💎 MMDU Benchmark

To evaluate the multi-image, multi-turn dialogue capabilities of existing models, we have developed the MMDU Benchmark. Our benchmark comprises **110 high-quality multi-image multi-turn dialogues with more than 1600 questions**, each accompanied by a detailed long-form answer. Previous benchmarks typically involved only a single image or a small number of images, with fewer rounds of questions and short-form answers. MMDU significantly increases the number of images, the number of question-and-answer rounds, and the in-context length of the Q&A. The questions in MMDU **involve 2 to 20 images**, with an average image-and-text length of **8.2k tokens** and a maximum of **18k tokens**, presenting significant challenges to existing multimodal large models.

### 🎆 MMDU-45k Instruct Tuning Dataset

In MMDU-45k, we construct a total of **45k instruct tuning conversations**. Each sample features an ultra-long context, with an average image-and-text length of **5k tokens** and a maximum of **17k tokens**. Each dialogue contains an average of **9 turns of Q&A**, with a maximum of **27 turns**. Additionally, each sample includes content from **2-5 images**.

The dataset is constructed in a well-designed format that scales easily: samples can be combined to generate more and longer multi-image, multi-turn dialogues. **The image-text length and the number of turns in MMDU-45k significantly surpass those of all existing instruct tuning datasets.** Training on this data greatly improves a model's capabilities in multi-image recognition and understanding, as well as its ability to handle long-context dialogues.
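The idea of expanding the data "through combinations" can be illustrated with a minimal sketch. The record layout here (`Dialogue`, `Turn`, and the `combine` helper) is a hypothetical stand-in, not the actual MMDU-45k schema, which this page does not describe; consult the GitHub repository for the real format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    """One question-and-answer round (hypothetical layout)."""
    question: str
    answer: str


@dataclass
class Dialogue:
    """A multi-image, multi-turn sample (hypothetical layout)."""
    images: list[str]   # image paths or IDs
    turns: list[Turn]


def combine(dialogues: list[Dialogue]) -> Dialogue:
    """Concatenate several dialogues into one longer dialogue.

    Images and turns are appended in order. Any per-image references
    inside the question text (e.g. "image 1") are NOT renumbered here;
    a real pipeline would need to handle that.
    """
    images: list[str] = []
    turns: list[Turn] = []
    for d in dialogues:
        images.extend(d.images)
        turns.extend(d.turns)
    return Dialogue(images=images, turns=turns)


# Usage: merge a 2-image sample and a 1-image sample into one dialogue.
a = Dialogue(["a.jpg", "b.jpg"], [Turn("q1", "a1")])
b = Dialogue(["c.jpg"], [Turn("q2", "a2"), Turn("q3", "a3")])
merged = combine([a, b])
```

Under these assumptions, `merged` carries three images and three Q&A turns, which is how short samples with 2-5 images each could be composed into longer dialogues.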
License: Attribution-NonCommercial 4.0 International. Use of the data should also abide by the OpenAI terms of use: https://openai.com/policies/terms-of-use

For more information, please refer to our 💻[Github](https://github.com/Liuziyu77/MMDU/), 🏠[Homepage](https://liuziyu77.github.io/MMDU/), or 📖[Paper](https://arxiv.org/abs/2406.11833).

Provider:

maas

Created:

2025-03-05
