omlab/OVDEval

Name: omlab/OVDEval
Creator: omlab
Published: 2023-12-26 06:32:15
License: openrail

Hugging Face | Updated 2023-12-26 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/omlab/OVDEval

Resource description:

---
license: openrail
---

<h1 align="center"> OVDEval </h1>
<h2 align="center"> A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection</h2>

<p align="center">
  <a href="https://arxiv.org/abs/2308.13177"><strong> [Paper 📄] </strong></a>
</p>

## Dataset Description

**OVDEval** is a new benchmark for open-vocabulary detection (OVD) models. It includes 9 sub-tasks and introduces evaluations of commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge a model's true understanding of visual and linguistic input. Additionally, we identify a problem with the popular Average Precision (AP) metric when benchmarking models on these fine-grained label datasets and propose a new metric called **Non-Maximum Suppression Average Precision (NMS-AP)** to address this issue.

## Data Details

![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a2e94991d8e7fb24f7688/ngOkek9wJdppyxPB0xZ8Q.png)

## Dataset Structure

```python
{
    "categories": [
        {"supercategory": "object", "id": 0, "name": "computer without screen on"},
        {"supercategory": "object", "id": 1, "name": "computer with screen on"}
    ],
    "annotations": [
        {
            "id": 0,
            "bbox": [111, 117, 99, 75],
            "category_id": 0,
            "image_id": 0,
            "iscrowd": 0,
            "area": 7523
        }
    ],
    "images": [
        {
            "file_name": "64d22c6fe4b011b0db94b993.jpg",
            "id": 0,
            "height": 254,
            "width": 340,
            # "text" represents the annotated positive labels of this image.
            "text": ["computer without screen on"],
            # "neg_text" contains fine-grained hard negative labels, which are
            # generated according to specific sub-tasks.
            "neg_text": ["computer with screen on"]
        }
    ]
}
```

## How to use it

Reference https://github.com/om-ai-lab/OVDEval

## Languages

The dataset contains questions in English and code solutions in Python.

## Citation Information

If you find our data or code helpful, please cite the original paper:

```
@article{yao2023evaluate,
  title={How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection},
  author={Yao, Yiyang and Liu, Peng and Zhao, Tiancheng and Zhang, Qianqian and Liao, Jiajia and Fang, Chunxin and Lee, Kyusong and Wang, Qing},
  journal={arXiv preprint arXiv:2308.13177},
  year={2023}
}
```

Provider:

omlab

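Summary of the original information

OVDEval dataset overview

Dataset description

OVDEval is a new benchmark for open-vocabulary detection (OVD) models. It comprises 9 sub-tasks and introduces evaluations of commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is carefully designed to provide hard negatives that challenge a model's true understanding of visual and linguistic input. In addition, the authors identify a problem with the popular Average Precision (AP) metric when evaluating on these fine-grained label datasets, and propose a new metric, Non-Maximum Suppression Average Precision (NMS-AP), to address it.

Data details

The dataset is structured as follows:

```python
{
    "categories": [
        {"supercategory": "object", "id": 0, "name": "computer without screen on"},
        {"supercategory": "object", "id": 1, "name": "computer with screen on"}
    ],
    "annotations": [
        {
            "id": 0,
            "bbox": [111, 117, 99, 75],
            "category_id": 0,
            "image_id": 0,
            "iscrowd": 0,
            "area": 7523
        }
    ],
    "images": [
        {
            "file_name": "64d22c6fe4b011b0db94b993.jpg",
            "id": 0,
            "height": 254,
            "width": 340,
            # "text" holds the annotated positive labels of this image.
            "text": ["computer without screen on"],
            # "neg_text" holds fine-grained hard negative labels generated
            # according to specific sub-tasks.
            "neg_text": ["computer with screen on"]
        }
    ]
}
```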
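As a rough illustration of how the three top-level lists relate, the sketch below parses the sample record and groups annotations by image, resolving each `category_id` to its name. The helper name `index_annotations` and the output layout are assumptions made for this example; they are not part of the dataset or its official tooling.

```python
import json
from collections import defaultdict

# The sample record from the structure above, as plain JSON (comments removed).
# A real annotation file would be read with json.load(); its path is not given here.
SAMPLE_JSON = """
{
  "categories": [
    {"supercategory": "object", "id": 0, "name": "computer without screen on"},
    {"supercategory": "object", "id": 1, "name": "computer with screen on"}
  ],
  "annotations": [
    {"id": 0, "bbox": [111, 117, 99, 75], "category_id": 0,
     "image_id": 0, "iscrowd": 0, "area": 7523}
  ],
  "images": [
    {"file_name": "64d22c6fe4b011b0db94b993.jpg", "id": 0,
     "height": 254, "width": 340,
     "text": ["computer without screen on"],
     "neg_text": ["computer with screen on"]}
  ]
}
"""


def index_annotations(data):
    """Group annotations by image id and attach the category name (hypothetical helper)."""
    names = {cat["id"]: cat["name"] for cat in data["categories"]}
    by_image = defaultdict(list)
    for ann in data["annotations"]:
        by_image[ann["image_id"]].append(
            {"bbox": ann["bbox"], "label": names[ann["category_id"]]}
        )
    return dict(by_image)


data = json.loads(SAMPLE_JSON)
indexed = index_annotations(data)
print(indexed[0])
```

Keying by `image_id` makes it straightforward to look up the ground truth for one image when scoring predictions image by image.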

Languages

The dataset contains questions in English and code solutions in Python.

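Compiled summary

Dataset introduction

Construction

In open-vocabulary detection, the construction of OVDEval reflects careful attention to model generalization. Its nine sub-tasks systematically cover commonsense knowledge, attribute understanding, position awareness, and object relations. At its core are challenging hard negatives, generated from fine-grained labels, that test whether a model truly understands the visual and linguistic input. The annotation structure comprises categories, bounding boxes, and positive and negative text descriptions, which keeps the evaluation tasks complex and diverse.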
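The listing does not state the bounding-box convention. The sketch below assumes the COCO-style `[x, y, width, height]` layout, which is at least consistent with the sample annotation `[111, 117, 99, 75]` on a 340 x 254 image (a corner format would need the third value to exceed the first). Both function names are hypothetical.

```python
def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x1, y1, x2, y2] (assumed COCO-style input)."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def fits_in_image(bbox, width, height):
    """Check that a non-empty [x, y, w, h] box lies inside the image bounds."""
    x1, y1, x2, y2 = xywh_to_xyxy(bbox)
    return 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height


# Values taken from the sample annotation and image record.
corners = xywh_to_xyxy([111, 117, 99, 75])
inside = fits_in_image([111, 117, 99, 75], width=340, height=254)
print(corners, inside)
```

A check like this is a cheap way to catch a wrong format assumption before running an evaluation.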

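Features

OVDEval's distinguishing feature is its comprehensive, fine-grained evaluation framework. It goes beyond traditional detection datasets by using hard negatives and fine-grained labels to make model testing harder, and it proposes a new metric, Non-Maximum Suppression Average Precision (NMS-AP), to correct the measurement bias of existing evaluation methods on fine-grained datasets. The dataset is clearly structured, with images, annotations, and positive and negative text fields, and supports systematic evaluation of open-vocabulary detection models across multiple dimensions of understanding.

Usage

To use OVDEval, researchers evaluate open-vocabulary detection models against its structured data. The dataset is organized as JSON containing categories, images, and annotations; the "text" and "neg_text" fields hold each image's positive labels and hard negative labels, respectively. A detailed usage guide is available in the official GitHub repository (https://github.com/om-ai-lab/OVDEval). Combined with the proposed NMS-AP metric, the data lets users quantify model performance on the nine sub-tasks and probe both generalization and its limits in open-vocabulary settings.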
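One plausible way to use the "text" and "neg_text" fields is to merge them into a single candidate label list per image, so that the detector must choose between the positive label and its hard negatives. This is a sketch under that assumption; the function `candidate_labels` is hypothetical, and the official repository remains the reference for the actual evaluation procedure.

```python
def candidate_labels(image_record):
    """Return (all candidate labels, set of positive labels) for one image record."""
    positives = list(image_record.get("text", []))
    # Skip any negative that duplicates a positive label.
    negatives = [t for t in image_record.get("neg_text", []) if t not in positives]
    return positives + negatives, set(positives)


# Image record copied from the sample structure.
image = {
    "file_name": "64d22c6fe4b011b0db94b993.jpg",
    "id": 0,
    "height": 254,
    "width": 340,
    "text": ["computer without screen on"],
    "neg_text": ["computer with screen on"],
}

labels, positive_set = candidate_labels(image)
print(labels)
```

Any detection whose label is in `labels` but not in `positive_set` can then be counted as a false positive for that image.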

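Background and challenges

Background overview

In computer vision, open-vocabulary detection (OVD) aims to let models recognize categories not seen during training, moving vision systems toward more general-purpose AI. The OVDEval benchmark was proposed in 2023; its central research question is how to comprehensively evaluate a model's generalization in fine-grained vision-language understanding. Through nine carefully designed sub-tasks covering commonsense knowledge, attribute understanding, position awareness, and object relations, the dataset challenges a model's deeper understanding of visual and linguistic input. It also proposes an improvement that addresses the limitations of existing evaluation metrics, with the aim of advancing the open-vocabulary detection field.

Current challenges

The domain challenge OVDEval targets is the weakness of open-vocabulary detection models at fine-grained category discrimination and complex scene understanding. For example, a model must distinguish "computer with screen on" from "computer without screen on" while also handling higher-level semantics such as object attributes and spatial relations. On the construction side, the dataset had to generate high-quality hard negatives that closely resemble the positives visually or linguistically, so that they effectively test real understanding. In addition, the traditional Average Precision metric is biased when evaluating fine-grained labels, which led the authors to develop the NMS-AP metric to measure model performance more accurately.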
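The listing does not spell out how NMS-AP is computed. A minimal sketch, assuming the idea is to apply class-agnostic non-maximum suppression to a model's predictions before standard AP is computed, so that a model cannot score by predicting the same box for both a positive label and its hard negative. The 0.5 IoU threshold, the prediction layout, and the function names are illustrative assumptions, not the authors' implementation.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(preds, iou_thresh=0.5):
    """Keep the highest-scoring box among overlapping ones, ignoring labels."""
    kept = []
    for pred in sorted(preds, key=lambda p: p["score"], reverse=True):
        if all(iou(pred["box"], k["box"]) <= iou_thresh for k in kept):
            kept.append(pred)
    return kept


# Hypothetical predictions: the first two cover nearly the same region
# with contradictory labels; the third is a separate, lower-scoring box.
preds = [
    {"box": [111, 117, 210, 192], "label": "computer without screen on", "score": 0.62},
    {"box": [112, 118, 209, 191], "label": "computer with screen on", "score": 0.58},
    {"box": [10, 10, 50, 50], "label": "computer with screen on", "score": 0.30},
]

kept = class_agnostic_nms(preds)
print([(p["label"], p["score"]) for p in kept])
```

After suppression, only one label survives per region, so the subsequent AP computation rewards the model only if its most confident label for that region is the correct one.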

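Common scenarios

Typical use cases

In computer vision, open-vocabulary detection (OVD) aims to remove a detector's dependence on a fixed category set, so that objects described by arbitrary text can be recognized and localized. As a comprehensive evaluation benchmark for this field, OVDEval is typically used to systematically assess how well a model generalizes in understanding fine-grained visual concepts. Using the dataset's nine sub-tasks, such as commonsense reasoning, attribute understanding, and position relation parsing, researchers can analyze how a model behaves when faced with carefully designed hard negatives, revealing how deeply it actually understands the combined visual and linguistic information.

Derived and related work

As a benchmark, OVDEval's evaluation framework and NMS-AP metric have influenced subsequent research. The work has prompted the community to re-examine how open-vocabulary detection models are evaluated and has motivated follow-up directions such as fine-grained attribute understanding, visual commonsense reasoning, and hard-negative construction. Analyses based on the benchmark have led to new methods aimed at improving performance on specific sub-tasks, as well as work on the evaluation metrics themselves (for example, handling long-tailed distributions and label ambiguity). Together, this work has deepened the community's understanding of the challenges of open-world visual perception.

Recent research on the dataset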