VLLMs/MIRB
Source: Hugging Face. Updated 2024-06-28; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/VLLMs/MIRB
Description:
This dataset benchmarks multi-image understanding in vision and language models across four areas: perception, knowledge, reasoning, and multi-hop reasoning. It consists of JSON files and images; each JSON file contains questions, answers, and references to images. The dataset is intended for question-answering tasks, is primarily in English, and contains between 1,000 and 10,000 entries.
Provider:
VLLMs
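Summary of Original Information
Dataset Overview
Basic Information
- License: MIT
- Task category: Question answering
- Language: English
- Dataset size: 1K<n<10K
File Structure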
├── MIR
    ├── analogy.json
    ├── codeu.json
    ├── dataset_namex.json
    └── Images
        ├── analogy
        │   └── image_x.jpg
        └── codeu
            └── image_x.jpg
JSON Structure
{
  "questions": "What is the expected kurtosis of the sequence created by create_number_sequence(-10, 10)?\n- -1.5\n- -1.2002400240024003\n- 0\n- \n- 2\n",
  "answers": 2,
  "images": [
    "images/codeu/example_53_main.png",
    "images/codeu/example_53_enhanced_operations.png"
  ]
}
The images field is a list in which each element has the format images/{dataset_name}/image_name, so each image can be indexed directly from that path.
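The record layout described here can be read with a few lines of Python. This is a minimal sketch, not the dataset authors' loader: the helper names load_records and resolve_images are hypothetical, the dataset root directory (MIR here) is whatever folder the files were downloaded into, and it is an assumption that each task file holds either a list of records or a single record.

```python
import json
from pathlib import Path
from typing import Any


def load_records(json_path: Path) -> list[dict[str, Any]]:
    """Load one task file (hypothetical helper).

    Assumption: the top level is a list of records; a single record
    object is wrapped in a list so callers can always iterate.
    """
    with json_path.open(encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


def resolve_images(record: dict[str, Any], root: Path) -> list[Path]:
    """Join each relative entry of the images field onto the dataset root."""
    return [root / rel for rel in record["images"]]


# In-memory record mirroring the images field of the example above.
EXAMPLE: dict[str, Any] = {
    "answers": 2,
    "images": [
        "images/codeu/example_53_main.png",
        "images/codeu/example_53_enhanced_operations.png",
    ],
}

# "MIR" is an assumed local root directory; adjust to the download location.
paths = resolve_images(EXAMPLE, Path("MIR"))
for p in paths:
    print(p.as_posix())
```

Because the images entries are relative paths, the only piece a caller has to supply is the root directory. Note that the file tree lists the folder as Images while the example record uses images, so the folder name may need adjusting on case-sensitive file systems.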
Citation
@article{zhao2024mirb,
  author  = {Bingchen Zhao and Yongshuo Zong and Letian Zhang and Timothy Hospedales},
  title   = {Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning},
  journal = {arXiv preprint},
  year    = {2024},
}
Paper link: arxiv.org/abs/2406.12742



