MMD
收藏魔搭社区2025-08-23 更新2025-03-08 收录
下载链接:
https://modelscope.cn/datasets/RealmSky/MMD
下载链接
链接失效反馈官方服务:
资源简介:
### 📢 News
- [06/13/2024] 🚀 We release our MMDU benchmark and MMDU-45k instruct tunning data to huggingface.
### 💎 MMDU Benchmark
To evaluate the multi-image multi-turn dialogue capabilities of existing models, we have developed the MMDU Benchmark. Our benchmark comprises **110 high-quality multi-image multi-turn dialogues with more than 1600 questions**, each accompanied by detailed long-form answers. Previous benchmarks typically involved only single images or a small number of images, with fewer rounds of questions and short-form answers. However, MMDU significantly increases the number of images, the number of question-and-answer rounds, and the in-context length of the Q&A. The questions in MMUD **involve 2 to 20 images**, with an average image&text token length of **8.2k tokens**, and a maximum image&text length reaching **18K tokens**, presenting significant challenges to existing multimodal large models.
### 🎆 MMDU-45k Instruct Tuning Dataset
In the MMDU-45k, we construct a total of **45k instruct tuning data conversations**. Each data in our MMDU-45k dataset features an ultra-long context, with an average image&text token length of **5k** and a maximum image&text token length of **17k tokens**. Each dialogue contains an average of **9 turns of Q&A**, with a maximum of **27 turns**. Additionally, each data includes content from **2-5 images**. The dataset is constructed in a well-designed format, providing excellent scalability. It can be expanded to generate a larger number and longer multi-image, multi-turn dialogues through combinations. **The image-text length and the number of turns in MMDU-45k significantly surpass those of all existing instruct tuning datasets.** This enhancement greatly improves the model's capabilities in multi-image recognition and understanding, as well as its ability to handle long-context dialogues.
License: Attribution-NonCommercial 4.0 International It should abide by the policy of OpenAI: https://openai.com/policies/terms-of-use
For more information, please refer to our 💻[Github](https://github.com/Liuziyu77/MMDU/), 🏠[Homepage](https://liuziyu77.github.io/MMDU/), or 📖[Paper](https://arxiv.org/abs/2406.11833).
### 📢 消息公告
- [2024年6月13日] 🚀 我们已将MMDU基准测试集与MMDU-45k指令微调数据集发布至Hugging Face平台。
### 💎 MMDU基准测试集
为评估现有模型的多图像多轮对话能力,我们构建了MMDU基准测试集。该基准集包含**110组高质量多图像多轮对话,涵盖1600余道问题**,每组对话均配有详实的长文本回答。此前的基准测试集通常仅支持单图像或少量图像输入,且对话轮次较少、回答多为短文本。而MMDU基准集则大幅提升了图像数量、问答轮次与问答上下文长度。MMDU的单组问答涉及**2至20张图像**,平均图像与文本总Token长度达**8.2千个Token**,最大上下文长度可达**18千个Token**,对现有多模态大模型构成了显著挑战。
### 🎆 MMDU-45k指令微调数据集
在MMDU-45k数据集中,我们共构建了**4.5万组指令微调对话数据**。该数据集的每组样本均具备超长上下文特性,平均图像与文本总Token长度为**5千个**,最大长度可达**17千个Token**。每组对话平均包含**9轮问答**,最多可达**27轮**。此外,每组样本均包含**2至5张图像**。该数据集采用精心设计的格式构建,具备极佳的可扩展性,可通过组合方式生成更多数量、更长长度的多图像多轮对话数据。MMDU-45k的图文上下文长度与对话轮次,均显著优于当前所有公开的指令微调数据集。该特性可有效提升模型的多图像识别与理解能力,以及长上下文对话处理能力。
许可证:署名-非商业性使用4.0国际版(CC BY-NC 4.0),需遵守OpenAI相关政策:https://openai.com/policies/terms-of-use
如需了解更多信息,请访问我们的💻[GitHub仓库](https://github.com/Liuziyu77/MMDU/)、🏠[项目主页](https://liuziyu77.github.io/MMDU/)或📖[研究论文](https://arxiv.org/abs/2406.11833)。
提供机构:
maas
创建时间:
2025-03-06



