MMLongCite
Source: ModelScope community (updated 2025-11-14, indexed 2025-11-15)
Download link: https://modelscope.cn/datasets/Jonas123/MMLongCite
# MMLongCite: A Benchmark for Evaluating Fidelity of Long-Context Vision-Language Models
[Paper](https://arxiv.org/abs/2510.13276)   [GitHub](https://github.com/bytedance/MMLongCite/tree/main)
## Benchmark Overview
MMLongCite is a comprehensive benchmark designed to evaluate the **fidelity** of long-context vision-language models (LVLMs) through **citation**. It covers **4 task categories**, including Single-Source Visual Reasoning, Multi-Source Visual Reasoning, Vision Grounding, and Video Understanding, encompassing **8 distinct long-context tasks**. These tasks incorporate diverse modalities such as **images, text, and videos**, with context lengths ranging from **8K to 48K**.
## Data Format
All data in MMLongCite follows the format below:
- id: A unique identifier for the data sample.
- context: A list containing all the contextual information (e.g., images, text) needed to answer the question.
- question: A list containing the specific question to be answered, which may include text and multiple-choice options.
- ground_truth: The correct answer for the question.
- task: A label that specifies the subtask category of the data sample.
- text_length: A metadata field indicating the length of text content within the context.
- mm_length: A metadata field quantifying the multi-modal content within the context (e.g., number of images).
Here is an example:
```
{
  "id": 1,
  "context": [
    {
      "type": "image",
      "image": "image/mmlongcite/longdocurl/4027862_72.png"
    },
    ...
  ],
  "question": [
    {
      "type": "text",
      "text": "What was difference value between the quantity of total consumption and total import for rice production in 2020?\n(A). 30517 metric tons\n(B). 34082 metric tons\n(C). 3565 metric tons\n(D). 64599 metric tons\nChoose the letter name in front of the right option from A, B, C, D."
    }
  ],
  "ground_truth": "C",
  "task": ["SP_Figure_Reasoning"],
  "text_length": 0,
  "mm_length": 4620
}
```
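To make the format concrete, here is a minimal Python sketch that checks a sample against the fields listed above and pulls the multiple-choice options out of the question text. The helper names (`validate_sample`, `count_context_types`, `parse_options`) are hypothetical and not part of the MMLongCite codebase. The option pattern `(A). ...` is an assumption based only on the single example shown, and the sample is trimmed to one context item for brevity.

```python
import re
from collections import Counter

# Field names taken from the "Data Format" list above.
REQUIRED_FIELDS = {
    "id", "context", "question", "ground_truth",
    "task", "text_length", "mm_length",
}


def validate_sample(sample: dict) -> None:
    """Raise ValueError if a documented field is missing or a list field is not a list."""
    missing = REQUIRED_FIELDS - sample.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for key in ("context", "question"):
        if not isinstance(sample[key], list):
            raise ValueError(f"{key} must be a list")


def count_context_types(sample: dict) -> Counter:
    """Count context items by their "type" value (e.g. "image", "text")."""
    return Counter(item["type"] for item in sample["context"])


def parse_options(sample: dict) -> dict:
    """Map option letters to option text, assuming lines shaped like "(A). ..."."""
    text = "\n".join(
        part["text"] for part in sample["question"] if part["type"] == "text"
    )
    return dict(re.findall(r"^\(([A-D])\)\.\s*(.+)$", text, flags=re.MULTILINE))


# Sample from the example above, trimmed to a single context item.
sample = {
    "id": 1,
    "context": [
        {"type": "image", "image": "image/mmlongcite/longdocurl/4027862_72.png"},
    ],
    "question": [
        {
            "type": "text",
            "text": (
                "What was difference value between the quantity of total "
                "consumption and total import for rice production in 2020?\n"
                "(A). 30517 metric tons\n(B). 34082 metric tons\n"
                "(C). 3565 metric tons\n(D). 64599 metric tons\n"
                "Choose the letter name in front of the right option from A, B, C, D."
            ),
        }
    ],
    "ground_truth": "C",
    "task": ["SP_Figure_Reasoning"],
    "text_length": 0,
    "mm_length": 4620,
}

validate_sample(sample)
options = parse_options(sample)
print(count_context_types(sample))       # Counter({'image': 1})
print(options[sample["ground_truth"]])   # 3565 metric tons
```

Samples from other tasks may phrase their options differently, so the regular expression should be treated as a starting point and not as a specification of the dataset.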
## Citation
If you find our work helpful, please cite our paper:
```
@article{zhou2025mmlongcite,
title={MMLongCite: A Benchmark for Evaluating Fidelity of Long-Context Vision-Language Models},
author={Zhou, Keyan and Tang, Zecheng and Ming, Lingfeng and Zhou, Guanghao and Chen, Qiguang and Qiao, Dan and Yang, Zheming and Qin, Libo and Qiu, Minghui and Li, Juntao and others},
journal={arXiv preprint arXiv:2510.13276},
year={2025}
}
```
Provider: maas
Created: 2025-11-14



