---
license: openrail
---
<h1 align="center"> OVDEval </h1>
<h2 align="center"> A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection</h2>
<p align="center">
<a href="https://arxiv.org/abs/2308.13177"><strong> [Paper 📄] </strong></a>
</p>

## Dataset Description
**OVDEval** is a new benchmark for open-vocabulary detection (OVD) models. It comprises 9 sub-tasks and adds evaluations of commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is carefully constructed to provide hard negatives that test whether a model truly understands its visual and linguistic input. We also identify a problem with the popular Average Precision (AP) metric when benchmarking models on datasets with fine-grained labels, and propose a new metric, **Non-Maximum Suppression Average Precision (NMS-AP)**, to address it.
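
The description above does not spell out how NMS-AP is computed. As a rough sketch, assuming it amounts to running class-agnostic NMS over a model's predictions before the usual AP computation (see the paper for the exact definition), the snippet below shows why that matters for fine-grained labels: a model that outputs the same box for both a positive label and its hard negative would otherwise be credited for the positive one. The function names, the prediction dict layout, and the 0.5 IoU threshold are illustrative assumptions, not taken from the OVDEval code.

```python
def iou(a, b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(preds, iou_threshold=0.5):
    """Keep the highest-scoring box among overlapping ones, ignoring labels.

    The threshold value is an assumption chosen for illustration.
    """
    kept = []
    for pred in sorted(preds, key=lambda p: p["score"], reverse=True):
        if all(iou(pred["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(pred)
    return kept


# Hypothetical predictions: the first two cover the same object with
# contradictory labels; the third is a separate object.
preds = [
    {"box": [10, 10, 110, 110], "label": "computer with screen on", "score": 0.9},
    {"box": [12, 11, 108, 112], "label": "computer without screen on", "score": 0.8},
    {"box": [200, 200, 260, 260], "label": "computer without screen on", "score": 0.7},
]
kept = class_agnostic_nms(preds)
for pred in kept:
    print(pred["label"], pred["score"])
```

After suppression, only one label survives per object, so a model can no longer score on a positive label while also asserting its hard negative for the same region.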
## Data Details

## Dataset Structure
```python
{
    "categories": [
        {
            "supercategory": "object",
            "id": 0,
            "name": "computer without screen on"
        },
        {
            "supercategory": "object",
            "id": 1,
            "name": "computer with screen on"
        }
    ],
    "annotations": [
        {
            "id": 0,
            "bbox": [
                111,
                117,
                99,
                75
            ],
            "category_id": 0,
            "image_id": 0,
            "iscrowd": 0,
            "area": 7523
        }
    ],
    "images": [
        {
            "file_name": "64d22c6fe4b011b0db94b993.jpg",
            "id": 0,
            "height": 254,
            "width": 340,
            "text": [
                "computer without screen on"  # "text" holds the annotated positive labels for this image.
            ],
            "neg_text": [
                "computer with screen on"  # "neg_text" holds fine-grained hard negative labels, generated according to the specific sub-task.
            ]
        }
    ]
}
```
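
The structure above resembles COCO-style detection annotations with two extra per-image fields. As a minimal sketch, assuming `bbox` follows the COCO `[x, y, width, height]` convention (the card does not state this explicitly), the following groups annotations by image and builds the list of candidate labels, positives plus hard negatives, for each image. The helper names `xywh_to_xyxy` and `index_by_image` are illustrative and not part of the OVDEval code.

```python
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max].

    Assumes the COCO box layout; this is not confirmed by the dataset card.
    """
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def index_by_image(data):
    """Group annotations per image and attach positive and hard-negative labels."""
    id_to_name = {cat["id"]: cat["name"] for cat in data["categories"]}
    boxes = defaultdict(list)
    for ann in data["annotations"]:
        boxes[ann["image_id"]].append(
            {"label": id_to_name[ann["category_id"]], "xyxy": xywh_to_xyxy(ann["bbox"])}
        )
    records = {}
    for img in data["images"]:
        records[img["id"]] = {
            "file_name": img["file_name"],
            # Candidate labels: positives first, then hard negatives.
            "candidates": list(img["text"]) + list(img.get("neg_text", [])),
            "boxes": boxes[img["id"]],
        }
    return records


# The sample record from the structure above, reduced to the fields used here.
sample = {
    "categories": [
        {"supercategory": "object", "id": 0, "name": "computer without screen on"},
        {"supercategory": "object", "id": 1, "name": "computer with screen on"},
    ],
    "annotations": [
        {"id": 0, "bbox": [111, 117, 99, 75], "category_id": 0, "image_id": 0},
    ],
    "images": [
        {
            "file_name": "64d22c6fe4b011b0db94b993.jpg",
            "id": 0,
            "text": ["computer without screen on"],
            "neg_text": ["computer with screen on"],
        }
    ],
}

records = index_by_image(sample)
print(records[0]["candidates"])
print(records[0]["boxes"])
```

In practice the dict would come from `json.load` on an annotation file; a detector is then queried with every entry in `candidates`, and any box it returns for a `neg_text` label counts against it.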
## How to use it
See https://github.com/om-ai-lab/OVDEval for usage instructions.
## Languages
The text labels in this dataset (category names, `text`, and `neg_text`) are in English.
## Citation Information
If you find our data or code helpful, please cite the original paper:
```
@article{yao2023evaluate,
title={How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection},
author={Yao, Yiyang and Liu, Peng and Zhao, Tiancheng and Zhang, Qianqian and Liao, Jiajia and Fang, Chunxin and Lee, Kyusong and Wang, Qing},
journal={arXiv preprint arXiv:2308.13177},
year={2023}
}
```