FreedomIntelligence/MileBench

Hugging Face | Updated 2024-05-19 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/FreedomIntelligence/MileBench
Resource description:
---
license: cc-by-2.0
dataset_info:
  features:
  - name: sample_id
    dtype: int32
  - name: task_instruction
    dtype: string
  - name: task_instance
    struct:
    - name: context
      dtype: string
    - name: images_path
      sequence: string
    - name: choice_list
      sequence: string
    - name: combined_1_images
      sequence: string
  - name: response
    dtype: string
  splits:
  - name: ActionLocalization_test
    num_bytes: 291199
    num_examples: 200
  - name: ActionLocalization_adv
    num_bytes: 291199
    num_examples: 200
  - name: ActionPrediction_test
    num_bytes: 255687
    num_examples: 200
  - name: ActionPrediction_adv
    num_bytes: 255687
    num_examples: 200
  - name: ActionSequence_test
    num_bytes: 262234
    num_examples: 200
  - name: ActionSequence_adv
    num_bytes: 262234
    num_examples: 200
  - name: ALFRED_test
    num_bytes: 112715
    num_examples: 200
  - name: ALFRED_adv
    num_bytes: 112715
    num_examples: 200
  - name: CharacterOrder_test
    num_bytes: 274821
    num_examples: 200
  - name: CharacterOrder_adv
    num_bytes: 274821
    num_examples: 200
  - name: CLEVR_Change_test
    num_bytes: 114792
    num_examples: 200
  - name: CLEVR_Change_adv
    num_bytes: 114792
    num_examples: 200
  - name: CounterfactualInference_test
    num_bytes: 129074
    num_examples: 200
  - name: CounterfactualInference_adv
    num_bytes: 129074
    num_examples: 200
  - name: DocVQA_test
    num_bytes: 76660
    num_examples: 200
  - name: DocVQA_adv
    num_bytes: 76660
    num_examples: 200
  - name: EgocentricNavigation_test
    num_bytes: 559193
    num_examples: 200
  - name: EgocentricNavigation_adv
    num_bytes: 559193
    num_examples: 200
  - name: GPR1200_test
    num_bytes: 579624
    num_examples: 600
  - name: IEdit_test
    num_bytes: 50907
    num_examples: 200
  - name: IEdit_adv
    num_bytes: 50907
    num_examples: 200
  - name: ImageNeedleInAHaystack_test
    num_bytes: 303423
    num_examples: 320
  - name: MMCoQA_test
    num_bytes: 344623
    num_examples: 200
  - name: MMCoQA_adv
    num_bytes: 344623
    num_examples: 200
  - name: MovingAttribute_test
    num_bytes: 97299
    num_examples: 200
  - name: MovingAttribute_adv
    num_bytes: 97299
    num_examples: 200
  - name: MovingDirection_test
    num_bytes: 115832
    num_examples: 200
  - name: MovingDirection_adv
    num_bytes: 115832
    num_examples: 200
  - name: MultiModalQA_test
    num_bytes: 87978
    num_examples: 200
  - name: MultiModalQA_adv
    num_bytes: 87978
    num_examples: 200
  - name: nuscenes_test
    num_bytes: 87282
    num_examples: 200
  - name: nuscenes_adv
    num_bytes: 87282
    num_examples: 200
  - name: ObjectExistence_test
    num_bytes: 94139
    num_examples: 200
  - name: ObjectExistence_adv
    num_bytes: 94139
    num_examples: 200
  - name: ObjectInteraction_test
    num_bytes: 264032
    num_examples: 200
  - name: ObjectInteraction_adv
    num_bytes: 264032
    num_examples: 200
  - name: ObjectShuffle_test
    num_bytes: 289186
    num_examples: 200
  - name: ObjectShuffle_adv
    num_bytes: 289186
    num_examples: 200
  - name: OCR_VQA_test
    num_bytes: 80940
    num_examples: 200
  - name: OCR_VQA_adv
    num_bytes: 80940
    num_examples: 200
  - name: SceneTransition_test
    num_bytes: 266203
    num_examples: 200
  - name: SceneTransition_adv
    num_bytes: 266203
    num_examples: 200
  - name: SlideVQA_test
    num_bytes: 89462
    num_examples: 200
  - name: SlideVQA_adv
    num_bytes: 89462
    num_examples: 200
  - name: Spot_the_Diff_test
    num_bytes: 47823
    num_examples: 200
  - name: Spot_the_Diff_adv
    num_bytes: 47823
    num_examples: 200
  - name: StateChange_test
    num_bytes: 286783
    num_examples: 200
  - name: StateChange_adv
    num_bytes: 286783
    num_examples: 200
  - name: TextNeedleInAHaystack_test
    num_bytes: 11140730
    num_examples: 320
  - name: TQA_test
    num_bytes: 92861
    num_examples: 200
  - name: TQA_adv
    num_bytes: 92861
    num_examples: 200
  - name: WebQA_test
    num_bytes: 202682
    num_examples: 200
  - name: WebQA_adv
    num_bytes: 202682
    num_examples: 200
  - name: WikiVQA_test
    num_bytes: 2557847
    num_examples: 200
  - name: WikiVQA_adv
    num_bytes: 2557847
    num_examples: 200
  download_size: 12035444
  dataset_size: 26288285
configs:
- config_name: default
  data_files:
  - split: ActionLocalization_test
    path: preview/ActionLocalization_test-*
  - split: ActionLocalization_adv
    path: preview/ActionLocalization_adv-*
  - split: ActionPrediction_test
    path: preview/ActionPrediction_test-*
  - split: ActionPrediction_adv
    path: preview/ActionPrediction_adv-*
  - split: ActionSequence_test
    path: preview/ActionSequence_test-*
  - split: ActionSequence_adv
    path: preview/ActionSequence_adv-*
  - split: ALFRED_test
    path: preview/ALFRED_test-*
  - split: ALFRED_adv
    path: preview/ALFRED_adv-*
  - split: CharacterOrder_test
    path: preview/CharacterOrder_test-*
  - split: CharacterOrder_adv
    path: preview/CharacterOrder_adv-*
  - split: CLEVR_Change_test
    path: preview/CLEVR_Change_test-*
  - split: CLEVR_Change_adv
    path: preview/CLEVR_Change_adv-*
  - split: CounterfactualInference_test
    path: preview/CounterfactualInference_test-*
  - split: CounterfactualInference_adv
    path: preview/CounterfactualInference_adv-*
  - split: DocVQA_test
    path: preview/DocVQA_test-*
  - split: DocVQA_adv
    path: preview/DocVQA_adv-*
  - split: EgocentricNavigation_test
    path: preview/EgocentricNavigation_test-*
  - split: EgocentricNavigation_adv
    path: preview/EgocentricNavigation_adv-*
  - split: GPR1200_test
    path: preview/GPR1200_test-*
  - split: IEdit_test
    path: preview/IEdit_test-*
  - split: IEdit_adv
    path: preview/IEdit_adv-*
  - split: ImageNeedleInAHaystack_test
    path: preview/ImageNeedleInAHaystack_test-*
  - split: MMCoQA_test
    path: preview/MMCoQA_test-*
  - split: MMCoQA_adv
    path: preview/MMCoQA_adv-*
  - split: MovingAttribute_test
    path: preview/MovingAttribute_test-*
  - split: MovingAttribute_adv
    path: preview/MovingAttribute_adv-*
  - split: MovingDirection_test
    path: preview/MovingDirection_test-*
  - split: MovingDirection_adv
    path: preview/MovingDirection_adv-*
  - split: MultiModalQA_test
    path: preview/MultiModalQA_test-*
  - split: MultiModalQA_adv
    path: preview/MultiModalQA_adv-*
  - split: nuscenes_test
    path: preview/nuscenes_test-*
  - split: nuscenes_adv
    path: preview/nuscenes_adv-*
  - split: ObjectExistence_test
    path: preview/ObjectExistence_test-*
  - split: ObjectExistence_adv
    path: preview/ObjectExistence_adv-*
  - split: ObjectInteraction_test
    path: preview/ObjectInteraction_test-*
  - split: ObjectInteraction_adv
    path: preview/ObjectInteraction_adv-*
  - split: ObjectShuffle_test
    path: preview/ObjectShuffle_test-*
  - split: ObjectShuffle_adv
    path: preview/ObjectShuffle_adv-*
  - split: OCR_VQA_test
    path: preview/OCR_VQA_test-*
  - split: OCR_VQA_adv
    path: preview/OCR_VQA_adv-*
  - split: SceneTransition_test
    path: preview/SceneTransition_test-*
  - split: SceneTransition_adv
    path: preview/SceneTransition_adv-*
  - split: SlideVQA_test
    path: preview/SlideVQA_test-*
  - split: SlideVQA_adv
    path: preview/SlideVQA_adv-*
  - split: Spot_the_Diff_test
    path: preview/Spot_the_Diff_test-*
  - split: Spot_the_Diff_adv
    path: preview/Spot_the_Diff_adv-*
  - split: StateChange_test
    path: preview/StateChange_test-*
  - split: StateChange_adv
    path: preview/StateChange_adv-*
  - split: TextNeedleInAHaystack_test
    path: preview/TextNeedleInAHaystack_test-*
  - split: TQA_test
    path: preview/TQA_test-*
  - split: TQA_adv
    path: preview/TQA_adv-*
  - split: WebQA_test
    path: preview/WebQA_test-*
  - split: WebQA_adv
    path: preview/WebQA_adv-*
  - split: WikiVQA_test
    path: preview/WikiVQA_test-*
  - split: WikiVQA_adv
    path: preview/WikiVQA_adv-*
task_categories:
- visual-question-answering
- question-answering
- text-generation
- image-to-text
- video-classification
language:
- en
tags:
- Long-context
- MLLM
- VLM
- LLM
- Benchmark
pretty_name: MileBench
size_categories:
- 1K<n<10K
---

# MileBench

## Introduction

We introduce MileBench, a pioneering benchmark designed to test the **M**ult**I**modal **L**ong-cont**E**xt capabilities of MLLMs. This benchmark comprises not only multimodal long contexts, but also multiple tasks requiring both comprehension and generation.
We establish two distinct evaluation sets, diagnostic and realistic, to systematically assess MLLMs’ long-context adaptation capacity and their ability to complete tasks in long-context scenarios.

<img src="./images/MileBench.png" width="600" alt="MileBench" align="center" />

To construct our evaluation sets, we gather 6,440 multimodal long-context samples from 21 pre-existing or self-constructed datasets, averaging 15.2 images and 422.3 words per sample, as depicted in the figure, and we categorize them into their respective subsets.

<center class="half">
<img src="./images/stat2.png" width="300" alt="stat2"/><img src="./images/stat1.png" width="300" alt="stat1"/>
</center>

## How to use

Download the `MileBench_part*.tar.gz` archives and extract them with the following command:

```bash
for file in MileBench_part*.tar.gz
do
  tar -xzvf "$file"
done
```

Then follow [Code for MileBench](https://github.com/MileBench/MileBench?tab=readme-ov-file#-dataset-preparation) to run the evaluation.

## Links

- **Homepage:** [MileBench Homepage](https://milebench.github.io/)
- **Repository:** [MileBench GitHub](https://github.com/MileBench/MileBench)
- **Paper:** [arXiv](https://arxiv.org/abs/2404.18532)
- **Point of Contact:** [Dingjie Song](mailto:bbsngg@outlook.com)

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{song2024milebench,
  title={MileBench: Benchmarking MLLMs in Long Context},
  author={Song, Dingjie and Chen, Shunian and Chen, Guiming Hardy and Yu, Fei and Wan, Xiang and Wang, Benyou},
  journal={arXiv preprint arXiv:2404.18532},
  year={2024}
}
```
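If no file matches the pattern, the extraction loop under How to use hands the literal string MileBench_part*.tar.gz to tar, and tar then fails on a file name that does not exist. The POSIX shell sketch below is a slightly more defensive variant. The function name extract_parts and its optional directory argument are hypothetical additions and are not part of the official instructions.

```bash
#!/bin/sh
# extract_parts (hypothetical helper): extract every MileBench_part*.tar.gz
# archive found in a directory (default: the current one) into that directory.
extract_parts() {
  dir="${1:-.}"
  found=0
  for archive in "$dir"/MileBench_part*.tar.gz; do
    # With no match, the shell leaves the pattern unexpanded; skip it.
    [ -e "$archive" ] || continue
    found=1
    # Stop at the first archive that fails to extract (e.g. a truncated download).
    tar -xzf "$archive" -C "$dir" || return 1
  done
  if [ "$found" -eq 0 ]; then
    echo "extract_parts: no MileBench_part*.tar.gz archives in $dir" >&2
    return 1
  fi
}

# Usage: extract_parts /path/to/downloads
```

Unlike the original loop, this variant drops tar's -v flag to keep output quiet. It also returns a non-zero status, which scripts can check, when there are no archives to extract or when an archive is corrupt.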
Provided by:
FreedomIntelligence
Summary of original information

Dataset overview

Dataset information

  • License: CC-BY-2.0

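Dataset features

  • Sample ID (sample_id): integer (int32)
  • Task instruction (task_instruction): string
  • Task instance (task_instance): struct with the following sub-fields:
    • Context (context): string
    • Image paths (images_path): sequence of strings
    • Choice list (choice_list): sequence of strings
    • Combined images (combined_1_images): sequence of strings
  • Response (response): string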

Dataset splits

  • ActionLocalization_test: 200 examples, 291199 bytes
  • ActionLocalization_adv: 200 examples, 291199 bytes
  • ActionPrediction_test: 200 examples, 255687 bytes
  • ActionPrediction_adv: 200 examples, 255687 bytes
  • ActionSequence_test: 200 examples, 262234 bytes
  • ActionSequence_adv: 200 examples, 262234 bytes
  • ALFRED_test: 200 examples, 112715 bytes
  • ALFRED_adv: 200 examples, 112715 bytes
  • CharacterOrder_test: 200 examples, 274821 bytes
  • CharacterOrder_adv: 200 examples, 274821 bytes
  • CLEVR_Change_test: 200 examples, 114792 bytes
  • CLEVR_Change_adv: 200 examples, 114792 bytes
  • CounterfactualInference_test: 200 examples, 129074 bytes
  • CounterfactualInference_adv: 200 examples, 129074 bytes
  • DocVQA_test: 200 examples, 76660 bytes
  • DocVQA_adv: 200 examples, 76660 bytes
  • EgocentricNavigation_test: 200 examples, 559193 bytes
  • EgocentricNavigation_adv: 200 examples, 559193 bytes
  • GPR1200_test: 600 examples, 579624 bytes
  • IEdit_test: 200 examples, 50907 bytes
  • IEdit_adv: 200 examples, 50907 bytes
  • ImageNeedleInAHaystack_test: 320 examples, 303423 bytes
  • MMCoQA_test: 200 examples, 344623 bytes
  • MMCoQA_adv: 200 examples, 344623 bytes
  • MovingAttribute_test: 200 examples, 97299 bytes
  • MovingAttribute_adv: 200 examples, 97299 bytes
  • MovingDirection_test: 200 examples, 115832 bytes
  • MovingDirection_adv: 200 examples, 115832 bytes
  • MultiModalQA_test: 200 examples, 87978 bytes
  • MultiModalQA_adv: 200 examples, 87978 bytes
  • nuscenes_test: 200 examples, 87282 bytes
  • nuscenes_adv: 200 examples, 87282 bytes
  • ObjectExistence_test: 200 examples, 94139 bytes
  • ObjectExistence_adv: 200 examples, 94139 bytes
  • ObjectInteraction_test: 200 examples, 264032 bytes
  • ObjectInteraction_adv: 200 examples, 264032 bytes
  • ObjectShuffle_test: 200 examples, 289186 bytes
  • ObjectShuffle_adv: 200 examples, 289186 bytes
  • OCR_VQA_test: 200 examples, 80940 bytes
  • OCR_VQA_adv: 200 examples, 80940 bytes
  • SceneTransition_test: 200 examples, 266203 bytes
  • SceneTransition_adv: 200 examples, 266203 bytes
  • SlideVQA_test: 200 examples, 89462 bytes
  • SlideVQA_adv: 200 examples, 89462 bytes
  • Spot_the_Diff_test: 200 examples, 47823 bytes
  • Spot_the_Diff_adv: 200 examples, 47823 bytes
  • StateChange_test: 200 examples, 286783 bytes
  • StateChange_adv: 200 examples, 286783 bytes
  • TextNeedleInAHaystack_test: 320 examples, 11140730 bytes
  • TQA_test: 200 examples, 92861 bytes
  • TQA_adv: 200 examples, 92861 bytes
  • WebQA_test: 200 examples, 202682 bytes
  • WebQA_adv: 200 examples, 202682 bytes
  • WikiVQA_test: 200 examples, 2557847 bytes
  • WikiVQA_adv: 200 examples, 2557847 bytes

Dataset size

  • Download size: 12035444 bytes
  • Dataset size: 26288285 bytes
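The split list and the size figures agree with each other. The 29 *_test splits hold 6,440 examples, which matches the sample count given in the README. The byte counts of all 55 splits add up to the reported dataset size. The awk sketch below recomputes those totals. It assumes the split list has been copied into whitespace-separated "name examples bytes" lines; both that input format and the function name summarize_splits are hypothetical.

```bash
#!/bin/sh
# summarize_splits (hypothetical helper): read "name num_examples num_bytes"
# lines on stdin; print example totals for *_test and *_adv splits and the
# byte total over every split.
summarize_splits() {
  awk '
    NF == 3 {
      if ($1 ~ /_test$/) {
        n_test += $2          # standard evaluation splits
      } else if ($1 ~ /_adv$/) {
        n_adv += $2           # adversarial variants
      }
      total_bytes += $3       # every split counts toward the dataset size
    }
    END { printf "test_examples=%d adv_examples=%d total_bytes=%d\n", n_test, n_adv, total_bytes }
  '
}

# Example with three of the splits listed above:
summarize_splits <<'EOF'
GPR1200_test 600 579624
IEdit_test 200 50907
IEdit_adv 200 50907
EOF
# → test_examples=800 adv_examples=200 total_bytes=681438
```

Given all 55 splits listed above, it prints test_examples=6440 adv_examples=5200 total_bytes=26288285, and the last value equals the reported dataset size.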

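Configuration

  • Default configuration (default): maps each split to its data files under preview/ using the pattern preview/<split name>-* (for example, preview/ActionLocalization_test-*).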