MileBench

Name: MileBench
Creator: maas
Published: 2025-12-04 16:21:16
License: No description available

ModelScope community. Updated: 2025-12-04. Indexed: 2025-01-25.

Download link:

https://modelscope.cn/datasets/FreedomIntelligence/MileBench


Resource description:

# MileBench

## Introduction

We introduce MileBench, a pioneering benchmark designed to test the **M**ult**I**modal **L**ong-cont**E**xt capabilities of MLLMs. This benchmark comprises not only multimodal long contexts, but also multiple tasks requiring both comprehension and generation. We establish two distinct evaluation sets, diagnostic and realistic, to systematically assess MLLMs' long-context adaptation capacity and their ability to complete tasks in long-context scenarios.

<img src="./images/MileBench.png" width="600" alt="MileBench" align="center" />

To construct our evaluation sets, we gather 6,440 multimodal long-context samples from 21 pre-existing or self-constructed datasets, with an average of 15.2 images and 422.3 words each, as depicted in the figure, and we categorize them into their respective subsets.

<center class="half">
<img src="./images/stat2.png" width="300" alt="stat2"/><img src="./images/stat1.png" width="300" alt="stat1"/>
</center>

## How to use?

Please download the `MileBench_part*.tar.gz` archives and extract them using the following command.

```bash
for file in MileBench_part*.tar.gz
do
  tar -xzvf "$file"
done
```

Then please refer to [Code for MileBench](https://github.com/MileBench/MileBench?tab=readme-ov-file#-dataset-preparation) to evaluate.

## Links

- **Homepage:** [MileBench Homepage](https://milebench.github.io/)
- **Repository:** [MileBench GitHub](https://github.com/MileBench/MileBench)
- **Paper:** [Arxiv](https://arxiv.org/abs/2404.18532)
- **Point of Contact:** [Dingjie Song](mailto:bbsngg@outlook.com)

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{song2024milebench,
  title={MileBench: Benchmarking MLLMs in Long Context},
  author={Song, Dingjie and Chen, Shunian and Chen, Guiming Hardy and Yu, Fei and Wan, Xiang and Wang, Benyou},
  journal={arXiv preprint arXiv:2404.18532},
  year={2024}
}
```
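As a supplement to the extraction command under "How to use?", the following is a minimal sketch of a slightly more defensive variant. It assumes you may want to extract into a separate directory and get an error when no archives are present. The function name `extract_milebench_parts` and its two arguments are hypothetical and are not part of the MileBench repository; only the archive naming pattern `MileBench_part*.tar.gz` comes from the README.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MileBench): extract every
# MileBench_part*.tar.gz found in src_dir into dest_dir.
extract_milebench_parts() {
  local src_dir="${1:-.}"    # where the downloaded archives live (assumed)
  local dest_dir="${2:-.}"   # where to unpack them (assumed)
  local found=0
  local file

  mkdir -p "$dest_dir" || return 1

  for file in "$src_dir"/MileBench_part*.tar.gz; do
    # With no matching files the glob stays literal, so skip non-files.
    [ -f "$file" ] || continue
    found=1
    # -C extracts into dest_dir instead of the current directory.
    tar -xzf "$file" -C "$dest_dir" || return 1
  done

  if [ "$found" -eq 0 ]; then
    echo "no MileBench_part*.tar.gz archives found in $src_dir" >&2
    return 1
  fi
}
```

For example, `extract_milebench_parts ./downloads ./data` would unpack all parts from `./downloads` into `./data`. Both directory names are placeholders; check the linked dataset-preparation instructions for the directory layout the evaluation code expects.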


Provider:

maas

Created:

2025-01-20
