cvbench

Name: cvbench
Creator: maas
Published: 2026-01-09 20:02:31
License: No description provided

ModelScope Community, updated 2026-01-09, added 2025-11-08

Download link:

https://modelscope.cn/datasets/comefly/cvbench


Resource description:

<a href="https://arxiv.org/abs/2406.16860" target="_blank" style="display: inline-block; margin-right: 10px;"> <img alt="arXiv" src="https://img.shields.io/badge/arXiv-Cambrian--1-red?logo=arxiv" /> </a> <a href="https://cambrian-mllm.github.io/" target="_blank" style="display: inline-block; margin-right: 10px;"> <img alt="Website" src="https://img.shields.io/badge/🌎_Website-cambrian--mllm.github.io-blue.svg" /> </a> <a href="https://github.com/cambrian-mllm/cambrian" target="_blank" style="display: inline-block; margin-right: 10px;"> <img alt="GitHub Code" src="https://img.shields.io/badge/Code-cambrian--mllm/cambrian-white?&logo=github&logoColor=white" /> </a> <a href="https://huggingface.co/collections/nyu-visionx/cambrian-1-models-666fa7116d5420e514b0f23c" target="_blank" style="display: inline-block; margin-right: 10px;"> <img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Model-Cambrian--1-ffc107?color=ffc107&logoColor=white" /> </a> <a href="https://huggingface.co/collections/nyu-visionx/cambrian-data-6667ce801e179b4fbe774e11" target="_blank" style="display: inline-block; margin-right: 10px;"> <img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Data-Cambrian--10M-ffc107?color=ffc107&logoColor=white" /> </a>

# Cambrian Vision-Centric Benchmark (CV-Bench)

This repository contains the Cambrian Vision-Centric Benchmark (CV-Bench), introduced in [Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs](https://arxiv.org/pdf/2406.16860).

## Files

The `test*.parquet` files contain the dataset annotations and images pre-loaded for processing with HF Datasets.
These can be loaded in three different configurations using `datasets`:

```python
from datasets import load_dataset

# default: both 2D and 3D tasks
cv_bench = load_dataset("nyu-visionx/CV-Bench")

# 2D tasks only
cv_bench_2d = load_dataset("nyu-visionx/CV-Bench", "2D")

# 3D tasks only
cv_bench_3d = load_dataset("nyu-visionx/CV-Bench", "3D")
```

Additionally, we provide the raw images and annotations separately:

- `test_2d.jsonl`: 2D text annotations
- `test_3d.jsonl`: 3D text annotations
- `img/` dir: images corresponding to the `filename` field in the annotations

## Dataset Description

CV-Bench addresses the limited size of existing vision-centric benchmarks, containing `2638` *manually-inspected* examples. By repurposing standard vision benchmarks (`ADE20K`, `COCO`, and `Omni3D`), we assess models on classic vision tasks within a multimodal context. Leveraging the rich ground-truth annotations of these benchmarks, we formulate natural-language questions that probe the models' fundamental 2D and 3D understanding. CV-Bench evaluates 2D understanding via spatial relationships and object counting, and 3D understanding via depth order and relative distance.
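For readers working from the raw `test_2d.jsonl` / `test_3d.jsonl` files listed under Files rather than the parquet files, the following is a minimal standard-library sketch for reading the annotations and checking how examples are spread across the 2D and 3D tasks. It assumes each non-empty line is one JSON object carrying at least the `type` and `task` fields from the field table; the helper names `parse_jsonl`, `count_by_type_and_task`, and `group_by_task` are hypothetical and not part of any released CV-Bench tooling.

```python
import json
from collections import Counter, defaultdict


def parse_jsonl(lines):
    """Parse JSON Lines into dicts.

    `lines` can be an open file object or any iterable of strings;
    blank lines are skipped.
    """
    return [json.loads(line) for line in lines if line.strip()]


def count_by_type_and_task(records):
    """Count examples per (type, task) pair, using the `type` and `task` fields."""
    return Counter((r["type"], r["task"]) for r in records)


def group_by_task(records):
    """Group records by their `task` field, keeping file order within each group."""
    groups = defaultdict(list)
    for r in records:
        groups[r["task"]].append(r)
    return dict(groups)


# Example usage (assumes the raw files are in the working directory):
# records = []
# for path in ("test_2d.jsonl", "test_3d.jsonl"):
#     with open(path, encoding="utf-8") as f:
#         records.extend(parse_jsonl(f))
# for (kind, task), n in sorted(count_by_type_and_task(records).items()):
#     print(f"{kind} {task}: {n}")
```

Taking any iterable of lines rather than a path keeps the parser usable on an open file, a decompressed stream, or an in-memory list.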
The dataset contains the following fields:

| Field Name | Description |
| :--------- | :---------- |
| `idx` | Global index of the entry in the dataset |
| `type` | Type of task: `2D` or `3D` |
| `task` | The task associated with the entry |
| `image` | Image object |
| `question` | Question asked about the image |
| `choices` | Answer choices for the question |
| `answer` | Correct answer to the question |
| `prompt` | Prompt with question and choices pre-formatted |
| `filename` | Path to the image in the `img/` directory |
| `source` | Source of the image: `ADE20K`, `COCO`, or `Omni3D` |
| `source_dataset` | More detailed source of the image |
| `source_filename` | Filename of the image in the source dataset |
| `target_class` | Target class of the image (only for `COCO` images) |
| `target_size` | Target size of the image (only for `COCO` images) |
| `bbox` | Bounding box of the image (only for `Omni3D` images) |

## Accuracy

We calculate the accuracy separately for each source benchmark and combine the results using the following formula:

$$\text{CV-Bench Accuracy} = \frac{1}{2} \left( \frac{\text{accuracy}_{2D_{ade}} + \text{accuracy}_{2D_{coco}}}{2} + \text{accuracy}_{3D_{omni}} \right)$$

The two 2D sources are averaged first, so the 2D and 3D halves carry equal weight regardless of how many examples each contains.

### Example Code

```python
import pandas as pd

# Load the CSV file into a DataFrame
# (expects one row per example with at least 'source' and 'result' columns)
df = pd.read_csv('cv_bench_results.csv')

# Define a function to calculate accuracy for a given source
def calculate_accuracy(df, source):
    source_df = df[df['source'] == source]
    # Assuming 'result' is 1 for correct and 0 for incorrect
    accuracy = source_df['result'].mean()
    return accuracy

# Calculate accuracy for each source
accuracy_2d_ade = calculate_accuracy(df, 'ADE20K')
accuracy_2d_coco = calculate_accuracy(df, 'COCO')
accuracy_3d_omni = calculate_accuracy(df, 'Omni3D')

# Calculate the accuracy for each type
accuracy_2d = (accuracy_2d_ade + accuracy_2d_coco) / 2
accuracy_3d = accuracy_3d_omni

# Compute the combined accuracy as specified
combined_accuracy = (accuracy_2d + accuracy_3d) / 2

# Print the results
print(f"CV-Bench Accuracy: {combined_accuracy:.4f}")
print()
print("Type Accuracies:")
print(f"2D Accuracy: {accuracy_2d:.4f}")
print(f"3D Accuracy: {accuracy_3d:.4f}")
print()
print("Source Accuracies:")
print(f"ADE20K Accuracy: {accuracy_2d_ade:.4f}")
print(f"COCO Accuracy: {accuracy_2d_coco:.4f}")
print(f"Omni3D Accuracy: {accuracy_3d_omni:.4f}")
```

## Citation

```bibtex
@misc{tong2024cambrian1,
  title={Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs},
  author={Shengbang Tong and Ellis Brown and Penghao Wu and Sanghyun Woo and Manoj Middepogu and Sai Charitha Akula and Jihan Yang and Shusheng Yang and Adithya Iyer and Xichen Pan and Austin Wang and Rob Fergus and Yann LeCun and Saining Xie},
  year={2024},
  eprint={2406.16860},
}
```
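
The pandas example in the Accuracy section assumes a results CSV with precomputed `source` and `result` columns. As a dependency-free sketch of the same formula, the functions below score raw predictions and combine the per-source accuracies. Assumptions: each record is a dict holding the dataset's `source` and `answer` fields plus a hypothetical `prediction` string produced by your model, and the hypothetical `normalize` helper treats answers as letter options such as `(A)`, so check it against the actual `answer` values before relying on it.

```python
from collections import defaultdict


def normalize(ans):
    """Hypothetical normalization: trim whitespace and parentheses, upper-case.

    Assumes letter-style options such as "(A)"; adapt for other answer formats.
    """
    return ans.strip().strip("()").strip().upper()


def per_source_accuracy(records):
    """Fraction of correct predictions per `source`.

    Each record is a dict with 'source', 'answer', and a model 'prediction'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["source"]] += 1
        if normalize(r["prediction"]) == normalize(r["answer"]):
            correct[r["source"]] += 1
    return {src: correct[src] / total[src] for src in total}


def cv_bench_accuracy(acc):
    """Combine per-source accuracies: 0.5 * ((ADE20K + COCO) / 2 + Omni3D).

    Raises KeyError if any of the three sources is missing from `acc`.
    """
    acc_2d = (acc["ADE20K"] + acc["COCO"]) / 2
    acc_3d = acc["Omni3D"]
    return (acc_2d + acc_3d) / 2
```

Expanding the formula shows that each 2D source contributes 1/4 of the final score and Omni3D contributes 1/2. For example, per-source accuracies of 0.75 (ADE20K), 0.50 (COCO), and 0.25 (Omni3D) give a 2D accuracy of 0.625 and a combined CV-Bench accuracy of (0.625 + 0.25) / 2 = 0.4375.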


Provider:

maas

Created:

2025-11-03

Collection Summary

Dataset Overview