maelic/GQA200-coco-format

Name: maelic/GQA200-coco-format
Creator: maelic
Published: 2026-03-23 18:32:12
License: No description provided

Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/maelic/GQA200-coco-format

Resource overview:

---
license: mit
task_categories:
- object-detection
tags:
- scene-graph-generation
- visual-relationship-detection
- gqa
- coco-format
language:
- en
pretty_name: GQA — General Question Answering (COCO format)
size_categories:
- 100K<n<1M
---

# GQA: General Question Answering (COCO format)

This dataset is the **GQA200** split of [the GQA dataset](https://cs.stanford.edu/people/dorarad/gqa/about.html) (Hudson et al., 2019), reformatted in standard COCO-JSON format. GQA200 contains the top 200 object categories and 100 relations from the original GQA dataset, selected by frequency in the [Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation paper](https://arxiv.org/abs/2203.09811). This dataset has no official test split: GQA was built for question answering rather than scene graph generation, and its test images have no scene-graph annotations.

This version in COCO format was produced as part of the [SGG-Benchmark](https://github.com/Maelic/SGG-Benchmark) framework and used to train the models described in the **REACT++** paper ([Neau et al., 2026](https://arxiv.org/abs/2603.06386)).

---

## Annotation overview

Each image comes with:

- **Object bounding boxes**: 200 GQA object categories.
- **Scene-graph relations**: 100 predicate categories connecting pairs of objects as directed `(subject, predicate, object)` triplets.

![Annotation example: val split](gqa200_samples_val.png)

*Four random validation images with bounding boxes (coloured by category) and relation arrows (yellow, labelled with the predicate name).*

---

## Dataset statistics

| Split | Images | Object annotations | Relations |
|-------|-------:|-------------------:|----------:|
| train | 57,623 | 775,744 | 238,720 |
| val | 8,209 | 110,030 | 33,487 |

---

## Object categories (200)

Top-200 GQA object vocabulary used by the standard SGG split. The full list is embedded in `dataset_info.description`.
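As a rough illustration of annotation density, the counts in the statistics table can be turned into per-image averages. This is only a sketch: the numbers are copied by hand from the table, and the names `SPLITS` and `per_image_density` are made up for this example rather than part of the dataset or of SGG-Benchmark.

```python
# Split sizes copied by hand from the "Dataset statistics" table.
SPLITS = {
    "train": {"images": 57_623, "objects": 775_744, "relations": 238_720},
    "val": {"images": 8_209, "objects": 110_030, "relations": 33_487},
}


def per_image_density(stats):
    """Return (objects per image, relations per image) for one split."""
    images = stats["images"]
    return stats["objects"] / images, stats["relations"] / images


# Hypothetical helper output: one (objects, relations) pair per split.
densities = {name: per_image_density(stats) for name, stats in SPLITS.items()}

for name, (obj_per_img, rel_per_img) in densities.items():
    print(f"{name}: {obj_per_img:.2f} objects/image, {rel_per_img:.2f} relations/image")
```

Both splits work out to roughly 13 boxes and 4 relations per image, so most object pairs in an image carry no relation label.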
## Predicate categories (100)

Top 100 GQA predicate vocabulary used by the standard SGG split. Full list embedded in `dataset_info.description`.

---

## Dataset structure

```python
DatasetDict({
    train: Dataset({
        features: ['image', 'image_id', 'width', 'height', 'file_name', 'objects', 'relations'],
        num_rows: 57623
    }),
    val: Dataset({
        features: ['image', 'image_id', 'width', 'height', 'file_name', 'objects', 'relations'],
        num_rows: 8209
    }),
})
```

Each row contains:

| Field | Type | Description |
|-------|------|-------------|
| `image` | `Image` | PIL image |
| `image_id` | `int` | Original GQA200 image id |
| `width` / `height` | `int` | Image dimensions |
| `file_name` | `str` | Original filename |
| `objects` | `List[dict]` | `{id, category_id, bbox (xywh), area, iscrowd, segmentation}` |
| `relations` | `List[dict]` | `{id, subject_id, object_id, predicate_id}`; ids refer to `objects[*].id` |

---

## Usage

```python
from datasets import load_dataset
import json

ds = load_dataset("maelic/GQA200-coco-format")

# Recover label maps from the embedded metadata
meta = json.loads(ds["train"].info.description)
cat_id2name = {c["id"]: c["name"] for c in meta["categories"]}
pred_id2name = {c["id"]: c["name"] for c in meta["rel_categories"]}

sample = ds["train"][0]
image = sample["image"]  # PIL Image

for obj in sample["objects"]:
    print(cat_id2name[obj["category_id"]], obj["bbox"])

for rel in sample["relations"]:
    print(rel["subject_id"], "--", pred_id2name[rel["predicate_id"]], "->", rel["object_id"])
```

---

## Citation

If you use this dataset, please cite GQA:

```bibtex
@inproceedings{hudson2019gqa,
  title={Gqa: A new dataset for real-world visual reasoning and compositional question answering},
  author={Hudson, Drew A and Manning, Christopher D},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={6700--6709},
  year={2019}
}
```

And also the paper that established the GQA-200 split:

```bibtex
@inproceedings{dong2022stacked,
  title={Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation},
  author={Dong, Xingning and Gan, Tian and Song, Xuemeng and Wu, Jianlong and Cheng, Yuan and Nie, Liqiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19427--19436},
  year={2022}
}
```

And the REACT paper if you use the SGG-Benchmark models:

```bibtex
@inproceedings{Neau_2025_BMVC,
  author = {Ma\"elic Neau and Paulo Eduardo Santos and Anne-Gwenn Bosser and Akihiro Sugimoto and Cedric Buche},
  title = {REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation},
  booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
  publisher = {BMVA},
  year = {2025},
  url = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_239/paper.pdf},
}
```

---

## License

The GQA images and annotations are released under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) license.
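The usage snippet prints the raw `subject_id` and `object_id` of each relation. Since those ids refer to `objects[*].id`, a small lookup turns them into named triplets with corner-format boxes. The sketch below assumes rows follow the list-of-dicts layout described in the field table and that `bbox` is `[x, y, width, height]`; the helper names `xywh_to_xyxy` and `resolve_triplets`, the toy sample, and its category names are hypothetical and only there so the example runs without downloading the dataset.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO-style [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def resolve_triplets(sample, cat_id2name, pred_id2name):
    """Map each relation's object ids to category names and corner boxes."""
    # Index objects by their annotation id, which relations point to.
    objects_by_id = {obj["id"]: obj for obj in sample["objects"]}
    triplets = []
    for rel in sample["relations"]:
        subj = objects_by_id[rel["subject_id"]]
        obj = objects_by_id[rel["object_id"]]
        triplets.append({
            "subject": cat_id2name[subj["category_id"]],
            "predicate": pred_id2name[rel["predicate_id"]],
            "object": cat_id2name[obj["category_id"]],
            "subject_box": xywh_to_xyxy(subj["bbox"]),
            "object_box": xywh_to_xyxy(obj["bbox"]),
        })
    return triplets


# Toy row with made-up ids and names (not taken from the dataset).
toy_sample = {
    "objects": [
        {"id": 10, "category_id": 1, "bbox": [20, 30, 100, 200]},
        {"id": 11, "category_id": 2, "bbox": [0, 180, 300, 60]},
    ],
    "relations": [
        {"id": 0, "subject_id": 10, "object_id": 11, "predicate_id": 5},
    ],
}
toy_cat_id2name = {1: "man", 2: "bench"}
toy_pred_id2name = {5: "sitting on"}

triplets = resolve_triplets(toy_sample, toy_cat_id2name, toy_pred_id2name)
print(triplets[0])
```

With the real data, the same function should accept `sample`, `cat_id2name`, and `pred_id2name` from the usage snippet, provided the rows match the assumed layout.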

Provider:

maelic
