lightonai/MMLBD-C

Name: lightonai/MMLBD-C
Creator: lightonai
Published: 2026-02-18 02:01:13
License: apache-2.0 (per the dataset card metadata)

Source: Hugging Face. Updated 2026-02-18; indexed 2026-04-05.

Download link:

https://hf-mirror.com/datasets/lightonai/MMLBD-C


Description:

---
license: apache-2.0
task_categories:
- visual-question-answering
pretty_name: MMLBD-C
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: "mmlbd-c.json"
  default: true
---

# MMLBD-C

## Dataset summary

**MMLBD-C** is a manually corrected and quality-filtered evaluation variant of **MMLongBench-Doc** designed to reduce noise from erroneous or low-quality examples when benchmarking **long-document visual question answering**.

This release focuses on fixing issues such as:

- incorrect question–document pairing,
- ambiguous / underspecified wording,
- typos,
- incorrect answers.

It also improves “Not answerable” handling by accepting equivalent responses where appropriate.

In our [paper](https://arxiv.org/abs/2602.15257), we flag 342 examples for review, **modify 251**, and **remove 16** from the benchmark.

> We hope this release helps the community better push the frontier of **long document understanding**.

## What’s included in this repo

- **Corrected annotations** for MMLBD-C (relative to the upstream MMLongBench-Doc benchmark). This includes the flagging pipeline's remarks and the pages it marked relevant, along with our modifications and final actions.
- A **TSV file** in the format used by **VLMEvalKit**, so you can evaluate easily.

This dataset is intended primarily for **evaluation** (benchmarking).

## Corrections made

We construct MMLBD-C by flagging and correcting issues in MMLongBench-Doc, including incorrect question-document pairing, ambiguous or misleading wording, typos, and answer errors. Flagged items are manually reviewed and one of the following actions is taken: **leave as is**, **modify** (question and/or answer), or **remove**.

### Categories of fixes

- **Document mismatch**
  - Example: “List all the PM health effects that increse by more than 35% in India and Thailand.” was paired with an unrelated document about digital marketing.
  - Action: remove 9/10 affected questions and convert the remaining one to “Not answerable”.
- **Underspecified**
  - Example: “List all the sections that discuss about the experiment setup?”
  - Answer: "['Section 4.1', 'Section 4.2', 'Section 4.3', 'Appendix A']"
  - Issue: the question is underspecified for the given answer, which excludes clearly relevant sections (see image below).
- **Typo**
  - Example: “How do Amazon recognize **least** cost?” should read “**lease** cost”.
  - Issue: “least” is plausible in context and can legitimately confuse models.
- **Incorrect answer**
  - Example: "How many percentage respondents in this survey access to internet more than two times per month?"
  - Answer: "Not answerable"
  - Issue: explicit evidence exists in the document (see image below).
- **Answer expansion**
  - For “Not answerable” questions, we also accept equivalent responses (e.g., “None”, “0”, “No one”) where appropriate.

| Document mismatch | Underspecified |
|---|---|
| <img src="images/question_doc_mismatch.jpeg" width="360"> | <img src="images/section_3_methodology_experiment_setup_question.jpeg" width="360"> |

| Typo (“least” → “lease”) | Incorrect “Not answerable” |
|---|---|
| <img src="images/lease_costs.jpeg" width="360"> | <img src="images/access_to_internet.jpeg" width="360"> |

## Data format

This repo includes a JSON file for easy use and browsing, along with a **TSV** export for drop-in **VLMEvalKit** compatibility.

## Intended use

- Benchmarking/evaluating long-context VLMs on long-document VQA.

## Notes on licensing

MMLBD-C is a derivative/correction layer over the upstream MMLongBench-Doc benchmark. Please follow the licensing and usage terms of the upstream dataset and associated documents.
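As a way to browse the JSON file mentioned under "Data format", the following is a minimal sketch that counts records and lists which fields appear in them. It assumes that a local copy of `mmlbd-c.json` holds a top-level JSON array of objects; the card does not document the schema, so the helper (a hypothetical name, `summarize_records`) only reports whatever field names it finds rather than relying on specific ones.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_records(path):
    """Return (record_count, Counter of field names) for a JSON array of objects.

    Assumption: the file is a top-level JSON array whose items are objects.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")

    field_counts = Counter()
    for record in records:
        if isinstance(record, dict):
            # Count how many records carry each field name.
            field_counts.update(record.keys())
    return len(records), field_counts
```

Calling `summarize_records("mmlbd-c.json")` on a downloaded copy would show which fields are present in every record and which are sparse, a reasonable first check before writing evaluation code against the annotations.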
## Citation

If you use this dataset, please cite our work:

```bibtex
@misc{orion_longdoc_vlm_2026,
  title={How to Train Your Long-Context Visual Document Model},
  author={Austin Veselka},
  year={2026},
  eprint={2602.15257},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.15257},
}

@misc{ma2024mmlongbenchdocbenchmarkinglongcontextdocument,
  title={MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations},
  author={Yubo Ma and Yuhang Zang and Liangyu Chen and Meiqi Chen and Yizhu Jiao and Xinze Li and Xinyuan Lu and Ziyu Liu and Yan Ma and Xiaoyi Dong and Pan Zhang and Liangming Pan and Yu-Gang Jiang and Jiaqi Wang and Yixin Cao and Aixin Sun},
  year={2024},
  eprint={2407.01523},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.01523},
}
```
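The "Answer expansion" rule described under "Categories of fixes" can be illustrated with a small matching helper. This is a sketch under stated assumptions, not the scoring logic used by the authors or by VLMEvalKit: the set of equivalents contains only the examples the card names ("None", "0", "No one"), and the normalization steps and the function names are hypothetical.

```python
import re

# Hypothetical default set: "Not answerable" plus the example equivalents
# named in the dataset card. The card accepts equivalents only "where
# appropriate", so callers may pass a per-question set instead.
NOT_ANSWERABLE_EQUIVALENTS = frozenset({"not answerable", "none", "0", "no one"})


def normalize(text):
    """Lowercase, trim, collapse whitespace, and drop trailing punctuation."""
    text = re.sub(r"\s+", " ", text.strip().lower())
    return text.rstrip(".!?")


def matches_not_answerable(prediction, accepted=NOT_ANSWERABLE_EQUIVALENTS):
    """Return True if the prediction is one of the accepted equivalents."""
    return normalize(prediction) in accepted
```

Because an answer like "0" is a legitimate numeric answer for other questions, a check like this would only apply to items whose reference answer is "Not answerable", and the `accepted` argument lets the caller narrow the set for a given question.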


Provider:

lightonai
