mlfoundations/VisIT-Bench

Name: mlfoundations/VisIT-Bench
Creator: mlfoundations
Published: 2024-01-23 08:48:48
License: No description provided

Hugging Face · Updated 2024-01-23 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/mlfoundations/VisIT-Bench

Resource overview:

VisIT-Bench is a dataset and benchmark for vision-and-language instruction following. It consists of image-instruction pairs with corresponding example outputs, covering tasks that range from simple object recognition to complex reasoning. The dataset gives a broad view of chatbot capabilities. Reported results indicate that state-of-the-art models such as GPT-4 and BLIP2 achieve high success rates, yet room for improvement remains.

Provider:

mlfoundations

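Summary of original information

Dataset overview

Basic information

Name: VisIT-Bench
Description: A dataset and benchmark for vision-and-language instruction following, containing image-instruction pairs and corresponding example outputs, covering a wide range from simple object recognition to complex reasoning tasks.
Language: English (en)
Creators: created via crowdsourcing
Dataset size: 10K<n<100K
Source: original data
License: CC-BY-4.0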

Data structure

Data fields:
- instruction_category (string)
- image_url (string)
- image (image)
- visual (string)
- instruction (string)
- instruction_conditioned_caption (string)
- reference_output (string)
- human_ratings_gpt4_correct (boolean)
- human_ratings_problem_in_caption (boolean)
- human_ratings_problem_in_gpt4 (boolean)
- public_images_metadata (dict)
Data splits: currently only a test split (TEST) is available; more splits will be provided in the future.
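
To make the boolean rating fields above more concrete, here is a minimal sketch that groups rows by `instruction_category` and computes the share of rows where `human_ratings_gpt4_correct` is true. The helper name `gpt4_correct_rate_by_category`, the category labels, and the sample rows are all hypothetical and made up for illustration; only the two field names come from the field list.

```python
from collections import defaultdict


def gpt4_correct_rate_by_category(records):
    """Map each instruction_category to the fraction of its rows
    whose human_ratings_gpt4_correct flag is True."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in records:
        category = row["instruction_category"]
        totals[category] += 1
        if row["human_ratings_gpt4_correct"]:
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}


# Hypothetical rows (not taken from the dataset), using two of the listed fields.
sample = [
    {"instruction_category": "recognition", "human_ratings_gpt4_correct": True},
    {"instruction_category": "recognition", "human_ratings_gpt4_correct": False},
    {"instruction_category": "reasoning", "human_ratings_gpt4_correct": True},
]

rates = gpt4_correct_rate_by_category(sample)
print(rates)
```

The function only needs an iterable of dict-like rows, so the same logic should carry over to rows read from the real test split.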

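Loading the data

    from datasets import load_dataset

    # The repository id must be a quoted string; replace the placeholder with your own token.
    # Recent releases of `datasets` take `token=` in place of the deprecated `use_auth_token=`.
    examples = load_dataset("mlfoundations/visit-bench", use_auth_token="<YOUR USER ACCESS TOKEN>")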
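
Once rows are available, one plausible first step is to set aside rows that raters flagged as problematic. The sketch below assumes that a `True` value in `human_ratings_problem_in_caption` or `human_ratings_problem_in_gpt4` means a problem was reported; the source does not spell out that semantics. The predicate name `is_clean` and the sample rows are hypothetical.

```python
def is_clean(row):
    """Return True when neither the caption nor the GPT-4 output was flagged.

    Assumption: a True flag means human raters reported a problem.
    """
    return not (
        row["human_ratings_problem_in_caption"]
        or row["human_ratings_problem_in_gpt4"]
    )


# Hypothetical rows (not taken from the dataset), carrying only the two flags.
rows = [
    {"human_ratings_problem_in_caption": False, "human_ratings_problem_in_gpt4": False},
    {"human_ratings_problem_in_caption": True, "human_ratings_problem_in_gpt4": False},
    {"human_ratings_problem_in_caption": False, "human_ratings_problem_in_gpt4": True},
]

clean_rows = [row for row in rows if is_clean(row)]
print(len(clean_rows), "of", len(rows), "rows have no reported problem")
```

With a dataset loaded through `load_dataset`, the same predicate could likely be passed to `Dataset.filter`, for example `examples["test"].filter(is_clean)`, assuming the split is exposed under the name `test`.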

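License information

New contributions of the dataset (such as instructions, reference outputs, and model ranking annotations) are licensed under CC BY 4.0.
All images used are publicly licensed; the specific license information can be found in the dataset's "public_images_metadata" field.
Commercial-use restriction: the dataset may be used as a test set; using it as a training set is prohibited.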

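Annotations

Annotations were produced by crowd workers on Amazon Mechanical Turk.
The annotation process follows detailed steps to produce the instructions, reference outputs, and model ranking annotations.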

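Considerations for using the data

Social impact: intended to advance research on the ability of AI models to understand and follow natural-language instructions together with visual input.
Limitations: may not cover all types of instructions, particularly those requiring complex reasoning or advanced knowledge.
Privacy: uses publicly available images; the specific image sources are not disclosed, which protects the privacy of the image creators.
Selection rationale: provides a wide range of instruction types and difficulty levels that challenge current AI capabilities.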

Citation information

    @misc{bitton2023visitbench,
      title={VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use},
      author={Yonatan Bitton and Hritik Bansal and Jack Hessel and Rulin Shao and Wanrong Zhu and Anas Awadalla and Josh Gardner and Rohan Taori and Ludwig Schmidt},
      year={2023},
      eprint={2308.06595},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
    }

Collected summary

Dataset introduction

Background and challenges

Background overview

VisIT-Bench is a multimodal dataset designed to assess vision-and-language instruction-following abilities in AI models. It contains 574 image-instruction pairs with human annotations, GPT-4 responses, and task performance metrics, covering diverse scenarios from object recognition to complex reasoning. The dataset serves as a testbed for evaluating model performance in real-world interactive tasks.

The content above was collected and summarized by 遇见数据集.
