maelic/GQA200-coco-format
Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/maelic/GQA200-coco-format
---
license: mit
task_categories:
- object-detection
tags:
- scene-graph-generation
- visual-relationship-detection
- gqa
- coco-format
language:
- en
pretty_name: GQA — General Question Answering (COCO format)
size_categories:
- 100K<n<1M
---
# GQA — General Question Answering (COCO format)
This dataset is the **GQA200** split of
[the GQA dataset](https://cs.stanford.edu/people/dorarad/gqa/about.html)
(Hudson and Manning, 2019), converted to the standard COCO JSON format.
GQA200 keeps the 200 most frequent object categories
and the 100 most frequent relations of the original GQA dataset, as selected in the
[Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation
paper](https://arxiv.org/abs/2203.09811). There is no official test split: GQA was built
for question answering rather than scene graph generation, and its test images have no scene graph annotations.
This version in COCO format was produced as part of the
[SGG-Benchmark](https://github.com/Maelic/SGG-Benchmark) framework and used to train
the models described in the **REACT++** paper
([Neau et al., 2026](https://arxiv.org/abs/2603.06386)).
---
## Annotation overview
Each image comes with:
- **Object bounding boxes** — 200 GQA object categories.
- **Scene-graph relations** — 100 predicate categories connecting pairs of objects as
directed `(subject, predicate, object)` triplets.

*Four random validation images with bounding boxes (coloured by category) and
relation arrows (yellow, labelled with the predicate name).*
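
The directed triplets described above can be resolved into readable names with a small helper. This is a minimal sketch, not part of the dataset tooling: `resolve_triplets`, the toy sample, and the toy label maps are hypothetical, and only the field names (`objects`, `relations`, `subject_id`, `object_id`, `predicate_id`, `category_id`) come from this card.

```python
from typing import Dict, List, Tuple


def resolve_triplets(
    sample: dict,
    cat_id2name: Dict[int, str],
    pred_id2name: Dict[int, str],
) -> List[Tuple[str, str, str]]:
    """Turn a row's relations into (subject, predicate, object) name triplets."""
    # Relations point at objects through their `id` field, so index objects
    # by id instead of assuming ids match list positions.
    objects_by_id = {obj["id"]: obj for obj in sample["objects"]}
    triplets = []
    for rel in sample["relations"]:
        subj = objects_by_id[rel["subject_id"]]
        obj = objects_by_id[rel["object_id"]]
        triplets.append(
            (
                cat_id2name[subj["category_id"]],
                pred_id2name[rel["predicate_id"]],
                cat_id2name[obj["category_id"]],
            )
        )
    return triplets


# Toy data for illustration only; these ids and names are made up.
toy_cat_id2name = {1: "man", 2: "horse"}
toy_pred_id2name = {7: "riding"}
toy_sample = {
    "objects": [
        {"id": 10, "category_id": 1, "bbox": [5.0, 5.0, 40.0, 80.0]},
        {"id": 11, "category_id": 2, "bbox": [0.0, 30.0, 100.0, 90.0]},
    ],
    "relations": [
        {"id": 0, "subject_id": 10, "object_id": 11, "predicate_id": 7},
    ],
}

triplets = resolve_triplets(toy_sample, toy_cat_id2name, toy_pred_id2name)
print(triplets)
```

The direction matters: `subject_id` is the source of the arrow and `object_id` its target, so swapping them would describe a different relation.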
---
## Dataset statistics
| Split | Images | Object annotations | Relations |
|-------|--------:|-------------------:|-----------:|
| train | 57 623 | 775 744 | 238 720 |
| val | 8 209 | 110 030 | 33 487 |
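
As a quick sanity check, the table implies roughly 13.5 objects and about 4.1 relations per image in both splits. The sketch below only redoes that arithmetic from the counts listed above; the `stats` and `per_image` names are illustrative.

```python
# Counts copied from the statistics table in this card.
stats = {
    "train": {"images": 57_623, "objects": 775_744, "relations": 238_720},
    "val": {"images": 8_209, "objects": 110_030, "relations": 33_487},
}

# Average number of annotations per image, rounded to two decimals.
per_image = {
    split: {
        "objects": round(s["objects"] / s["images"], 2),
        "relations": round(s["relations"] / s["images"], 2),
    }
    for split, s in stats.items()
}

for split, avg in per_image.items():
    print(f"{split}: {avg['objects']} objects/image, {avg['relations']} relations/image")
```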
---
## Object categories (200)
The 200 most frequent GQA object categories, as used by the standard SGG split. The full list
is embedded as JSON in `dataset_info.description`, under the `categories` key.
## Predicate categories (100)
The 100 most frequent GQA predicate categories, as used by the standard SGG split. The full list
is embedded as JSON in `dataset_info.description`, under the `rel_categories` key.
---
## Dataset structure
```python
DatasetDict({
    train: Dataset({
        features: ['image', 'image_id', 'width', 'height', 'file_name',
                   'objects', 'relations'],
        num_rows: 57623
    }),
    val: Dataset({
        features: ['image', 'image_id', 'width', 'height', 'file_name',
                   'objects', 'relations'],
        num_rows: 8209
    }),
})
```
Each row contains:
| Field | Type | Description |
|-------|------|-------------|
| `image` | `Image` | PIL image |
| `image_id` | `int` | Original GQA200 image id |
| `width` / `height` | `int` | Image dimensions |
| `file_name` | `str` | Original filename |
| `objects` | `List[dict]` | `{id, category_id, bbox (xywh), area, iscrowd, segmentation}` |
| `relations` | `List[dict]` | `{id, subject_id, object_id, predicate_id}` — ids refer to `objects[*].id` |
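
Boxes are stored in COCO `xywh` order (top-left corner plus width and height), while many detection and drawing utilities expect corner coordinates. A minimal conversion sketch follows; the helper name `xywh_to_xyxy` is hypothetical and not part of the dataset or of SGG-Benchmark.

```python
from typing import List, Sequence


def xywh_to_xyxy(bbox: Sequence[float]) -> List[float]:
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Toy box for illustration only.
box_xyxy = xywh_to_xyxy([10.0, 20.0, 30.0, 40.0])
print(box_xyxy)
```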
---
## Usage
```python
import json

from datasets import load_dataset

ds = load_dataset("maelic/GQA200-coco-format")

# Recover label maps from the embedded metadata
meta = json.loads(ds["train"].info.description)
cat_id2name = {c["id"]: c["name"] for c in meta["categories"]}
pred_id2name = {c["id"]: c["name"] for c in meta["rel_categories"]}

sample = ds["train"][0]
image = sample["image"]  # PIL Image

# Print each object's category name and its xywh bounding box
for obj in sample["objects"]:
    print(cat_id2name[obj["category_id"]], obj["bbox"])

# Print each relation as: subject id -- predicate name -> object id
for rel in sample["relations"]:
    print(rel["subject_id"], "--", pred_id2name[rel["predicate_id"]], "->", rel["object_id"])
```
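
Building on the row layout used above, predicate frequencies can be tallied to inspect how relations are distributed across a split. This is a sketch on toy rows: `predicate_counts`, `toy_rows`, and `toy_pred_id2name` are hypothetical, and the rows are assumed to be lists of dicts as in the usage snippet.

```python
from collections import Counter
from typing import Dict, Iterable


def predicate_counts(rows: Iterable[dict], pred_id2name: Dict[int, str]) -> Counter:
    """Count how often each predicate name appears across the given rows."""
    counts: Counter = Counter()
    for row in rows:
        for rel in row["relations"]:
            counts[pred_id2name[rel["predicate_id"]]] += 1
    return counts


# Toy rows for illustration only; ids and names are made up.
toy_pred_id2name = {0: "on", 1: "wearing"}
toy_rows = [
    {"relations": [{"id": 0, "subject_id": 1, "object_id": 2, "predicate_id": 0}]},
    {
        "relations": [
            {"id": 0, "subject_id": 3, "object_id": 4, "predicate_id": 0},
            {"id": 1, "subject_id": 3, "object_id": 5, "predicate_id": 1},
        ]
    },
]

counts = predicate_counts(toy_rows, toy_pred_id2name)
print(counts.most_common())
```

On the real data, passing `ds["train"].select_columns(["relations"])` as `rows` (available in recent versions of `datasets`) should avoid decoding every image while counting.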
---
## Citation
If you use this dataset, please cite GQA:
```bibtex
@inproceedings{hudson2019gqa,
title={Gqa: A new dataset for real-world visual reasoning and compositional question answering},
author={Hudson, Drew A and Manning, Christopher D},
booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
pages={6700--6709},
year={2019}
}
```
Please also cite the paper that established the GQA200 split:
```bibtex
@inproceedings{dong2022stacked,
title={Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation},
author={Dong, Xingning and Gan, Tian and Song, Xuemeng and Wu, Jianlong and Cheng, Yuan and Nie, Liqiang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={19427--19436},
year={2022}
}
```
And the REACT paper if you use the SGG-Benchmark models:
```bibtex
@inproceedings{Neau_2025_BMVC,
author = {Ma\"elic Neau and Paulo Eduardo Santos and Anne-Gwenn Bosser
and Akihiro Sugimoto and Cedric Buche},
title = {REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs
in Scene Graph Generation},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025,
Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year = {2025},
url = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_239/paper.pdf},
}
```
---
## License
The GQA images and annotations are released under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) license.
Provider:
maelic



