MMDU
Source: ModelScope community · Updated 2025-12-05 · Indexed 2025-03-08
Download link:
https://modelscope.cn/datasets/ViRFT/MMDU
Description:
### 📢 News
- [06/13/2024] 🚀 We released the MMDU benchmark and the MMDU-45k instruct tuning data on Hugging Face.
### 💎 MMDU Benchmark
To evaluate the multi-image, multi-turn dialogue capabilities of existing models, we developed the MMDU Benchmark. It comprises **110 high-quality multi-image multi-turn dialogues with more than 1600 questions**, each accompanied by a detailed long-form answer. Previous benchmarks typically involve a single image or a small number of images, with fewer rounds of questions and short-form answers. In contrast, MMDU significantly increases the number of images, the number of question-and-answer rounds, and the in-context length of the Q&A. The questions in MMDU **involve 2 to 20 images**, with an average image-and-text length of **8.2k tokens** and a maximum image-and-text length of **18k tokens**, which poses a significant challenge to existing multimodal large models.
### 🎆 MMDU-45k Instruct Tuning Dataset
MMDU-45k contains a total of **45k instruct tuning conversations**. Each sample features an ultra-long context, with an average image-and-text length of **5k tokens** and a maximum of **17k tokens**. Each dialogue contains an average of **9 turns of Q&A**, with a maximum of **27 turns**, and includes content from **2-5 images**. The dataset uses a format designed for scalability: samples can be combined to generate a larger number of longer multi-image, multi-turn dialogues. **The image-text length and the number of turns in MMDU-45k significantly surpass those of all existing instruct tuning datasets.** Fine-tuning on MMDU-45k is reported to improve a model's multi-image recognition and understanding, as well as its ability to handle long-context dialogues.
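The idea of expanding the dataset "through combinations" can be illustrated with a minimal sketch. The `Dialogue` schema, the `combine` and `expand` helpers, and the file names below are all hypothetical and are not the actual MMDU-45k file format or the authors' construction pipeline; the sketch only shows how pairing short dialogues yields longer multi-image, multi-turn ones.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Dialogue:
    # Hypothetical schema for illustration; not the real MMDU-45k format.
    images: list[str]
    turns: list[tuple[str, str]] = field(default_factory=list)  # (question, answer) pairs


def combine(a: Dialogue, b: Dialogue) -> Dialogue:
    """Concatenate two dialogues into one longer multi-image dialogue."""
    return Dialogue(images=a.images + b.images, turns=a.turns + b.turns)


def expand(dialogues: list[Dialogue]) -> list[Dialogue]:
    """Build one combined dialogue for every unordered pair of inputs."""
    return [combine(a, b) for a, b in combinations(dialogues, 2)]


# Toy data (made up): three short dialogues with 2-3 images each.
base = [
    Dialogue(["a1.jpg", "a2.jpg"], [("q1", "a1"), ("q2", "a2")]),
    Dialogue(["b1.jpg", "b2.jpg", "b3.jpg"], [("q3", "a3")]),
    Dialogue(["c1.jpg", "c2.jpg"], [("q4", "a4"), ("q5", "a5"), ("q6", "a6")]),
]
expanded = expand(base)

for d in expanded:
    print(len(d.images), "images,", len(d.turns), "turns")
```

Under these assumptions, each combined dialogue carries the images and Q&A turns of both parents, so both the image count and the context length grow with every combination step.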
License: Attribution-NonCommercial 4.0 International. Use of the data should also abide by the OpenAI policy: https://openai.com/policies/terms-of-use
For more information, please refer to our 💻[Github](https://github.com/Liuziyu77/MMDU/), 🏠[Homepage](https://liuziyu77.github.io/MMDU/), or 📖[Paper](https://arxiv.org/abs/2406.11833).
Provider:
maas
Created:
2025-03-05



