
VLLMs/MIRB

Source: Hugging Face. Updated 2024-06-28. Indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/VLLMs/MIRB
Summary:
This dataset benchmarks multi-image understanding in vision and language models across four areas: perception, knowledge, reasoning, and multi-hop reasoning. It consists of several JSON files and images; each JSON file contains questions, answers, and references to images. The dataset is intended for question-answering tasks, is primarily in English, and contains between 1,000 and 10,000 entries.
Provider:
VLLMs
Summary of original information

Dataset overview

Basic information

  • License: MIT
  • Task category: question answering
  • Language: English
  • Dataset size: 1K<n<10K

File structure

├── MIR
|── analogy.json
│── codeu.json
|── dataset_namex.json
└── Images
    ├── analogy
    │   └── image_x.jpg
    └── codeu
        └── image_x.jpg

JSON structure

{ "questions": "What is the expected kurtosis of the sequence created by create_number_sequence(-10, 10)?

  1. -1.5
  2. -1.2002400240024003
  3. 0
    1. 2

", "answers": 2, "images": [ "images/codeu/example_53_main.png", "images/codeu/example_53_enhanced_operations.png" ] }

  • The images field is a list. Each element has the form images/{dataset_name}/image_name, so an image can be indexed directly from that path.
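To illustrate how the images field can be used, here is a minimal Python sketch that loads one task file and joins each relative image path onto a local dataset root. The helper names (load_records, resolve_images) and the local root directory are hypothetical and not part of the dataset. The sketch also assumes that a JSON file holds either a single record like the one above or a list of such records, which the description does not state explicitly.

```python
import json
from pathlib import Path
from typing import Any


def load_records(json_path: Path) -> list[dict[str, Any]]:
    """Load one task file such as codeu.json.

    Assumption: the file holds either one record (a JSON object) or a
    list of records; a single object is wrapped in a list.
    """
    with json_path.open(encoding="utf-8") as f:
        data = json.load(f)
    return [data] if isinstance(data, dict) else list(data)


def resolve_images(record: dict[str, Any], root: Path) -> list[Path]:
    """Join each relative entry of the "images" field onto the dataset root.

    Entries are expected to look like images/{dataset_name}/image_name.
    """
    return [root / rel for rel in record.get("images", [])]
```

With a local copy of the dataset in a directory such as ./MIRB (a hypothetical location), calling resolve_images(record, Path("MIRB")) for each record returned by load_records(Path("MIRB/codeu.json")) yields paths that can be passed to any image loader. Note that the file tree above spells the image directory as Images while the JSON paths use images, so the directory name may need adjusting on case-sensitive file systems.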

Citation

@article{zhao2024mirb,
  author = {Bingchen Zhao and Yongshuo Zong and Letian Zhang and Timothy Hospedales},
  title = {Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning},
  journal = {arXiv preprint},
  year = {2024},
}

Paper link: arxiv.org/abs/2406.12742
