VLLMs/MIRB

Name: VLLMs/MIRB
Creator: VLLMs
Published: 2024-06-28 16:31:30
License: No description available

Source: Hugging Face. Updated 2024-06-28. Indexed 2024-06-22.

Download link:

https://hf-mirror.com/datasets/VLLMs/MIRB

Resource summary:

This dataset is used for benchmarking multi-image understanding in vision and language models, encompassing perception, knowledge, reasoning, and multi-hop reasoning. It consists of various JSON files and images, with each JSON file containing questions, answers, and references to images. The dataset is suitable for question-answering tasks, primarily in English, and ranges in size from 1,000 to 10,000 entries.

Provided by:

VLLMs

Summary of original information

Dataset overview

Basic information

License: MIT
Task category: question answering
Language: English
Dataset size: 1K<n<10K

File structure

├── MIR
│   ├── analogy.json
│   ├── codeu.json
│   ├── dataset_namex.json
│   └── Images
│       ├── analogy
│       │   └── image_x.jpg
│       └── codeu
│           └── image_x.jpg
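
As a rough illustration of the layout above, the per-task JSON files could be discovered with a small helper. This is only a sketch: the function name `list_task_files` is hypothetical, and it assumes the task files sit directly under the dataset root directory, as the tree suggests.

```python
from pathlib import Path


def list_task_files(root):
    """Map each task name (the file stem) to its JSON path under the root.

    Assumption: task files such as analogy.json and codeu.json sit
    directly under `root`; subdirectories (e.g. Images) are ignored.
    """
    return {p.stem: p for p in sorted(Path(root).glob("*.json"))}
```

Calling `list_task_files("MIR")` on a local copy would then return a dictionary keyed by task name, for example `analogy` and `codeu`.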

JSON structure

{
    "questions": " What is the expected kurtosis of the sequence created by create_number_sequence(-10, 10)?\n\n-1.5\n-1.2002400240024003\n0\n1. 2\n\n",
    "answers": 2,
    "images": [
        "images/codeu/example_53_main.png",
        "images/codeu/example_53_enhanced_operations.png"
    ]
}

The images field is a list in which each element has the format images/{dataset_name}/image_name, so each image can be located directly from that path.
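
A minimal sketch of how a record's image references might be resolved, assuming the dataset has been downloaded locally. The helper names `load_records` and `resolve_images` are hypothetical, and the page does not say whether each JSON file holds a single record or a list of records, so the loader accepts both. Note that the file tree spells the folder `Images` while the paths use `images`; on a case-sensitive filesystem the root or folder name may need adjusting.

```python
import json
from pathlib import Path


def load_records(json_path):
    """Load one task file and always return a list of records.

    Assumption: the file contains either one record (a JSON object)
    or a JSON array of such records.
    """
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


def resolve_images(record, root):
    """Join each relative entry in record["images"] onto the dataset root."""
    return [Path(root) / rel for rel in record["images"]]
```

For example, `resolve_images(record, "MIR")` would turn `images/codeu/example_53_main.png` into a path under the local `MIR` directory, which can then be opened with any image library.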

Citation

@article{zhao2024mirb,
  author  = {Bingchen Zhao and Yongshuo Zong and Letian Zhang and Timothy Hospedales},
  title   = {Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning},
  journal = {arXiv preprint},
  year    = {2024},
}

Paper link: arxiv.org/abs/2406.12742
