maelic/GQA200-coco-format

Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/maelic/GQA200-coco-format
Description:
---
license: mit
task_categories:
- object-detection
tags:
- scene-graph-generation
- visual-relationship-detection
- gqa
- coco-format
language:
- en
pretty_name: GQA — General Question Answering (COCO format)
size_categories:
- 100K<n<1M
---

# GQA — General Question Answering (COCO format)

This dataset is the **GQA200** split of [the GQA dataset](https://cs.stanford.edu/people/dorarad/gqa/about.html) (Hudson and Manning, 2019), reformatted in standard COCO-JSON format. GQA200 keeps the 200 most frequent object categories and the 100 most frequent relations from the original GQA dataset, a selection introduced in the paper [Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation](https://arxiv.org/abs/2203.09811).

There is no official test split: GQA was built for question answering rather than scene graph generation, and its test images have no scene graph annotations.

This COCO-format version was produced as part of the [SGG-Benchmark](https://github.com/Maelic/SGG-Benchmark) framework and used to train the models described in the **REACT++** paper ([Neau et al., 2026](https://arxiv.org/abs/2603.06386)).

---

## Annotation overview

Each image comes with:

- **Object bounding boxes**: 200 GQA object categories.
- **Scene-graph relations**: 100 predicate categories connecting pairs of objects as directed `(subject, predicate, object)` triplets.

![Annotation example, val split](gqa200_samples_val.png)

*Four random validation images with bounding boxes (coloured by category) and relation arrows (yellow, labelled with the predicate name).*

---

## Dataset statistics

| Split | Images | Object annotations | Relations |
|-------|--------:|-------------------:|-----------:|
| train | 57 623 | 775 744 | 238 720 |
| val | 8 209 | 110 030 | 33 487 |

---

## Object categories (200)

The top-200 GQA object vocabulary used by the standard SGG split. The full list is embedded in `dataset_info.description`.
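As a quick sanity check on the statistics table above, the annotation density per image can be derived with simple arithmetic. The sketch below hard-codes the counts from that table; the `STATS` dictionary and the `per_image_averages` helper are illustrative names, not part of the dataset or of SGG-Benchmark.

```python
# Counts copied from the "Dataset statistics" table.
STATS = {
    "train": {"images": 57_623, "objects": 775_744, "relations": 238_720},
    "val": {"images": 8_209, "objects": 110_030, "relations": 33_487},
}


def per_image_averages(stats):
    """Return mean objects and mean relations per image for each split."""
    return {
        split: {
            "objects_per_image": s["objects"] / s["images"],
            "relations_per_image": s["relations"] / s["images"],
        }
        for split, s in stats.items()
    }


averages = per_image_averages(STATS)
for split, avg in averages.items():
    print(
        f"{split}: {avg['objects_per_image']:.2f} objects/image, "
        f"{avg['relations_per_image']:.2f} relations/image"
    )
```

Both splits come out at roughly 13 object annotations and 4 relations per image, which suggests the train and val splits have similar annotation density.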
## Predicate categories (100)

The top-100 GQA predicate vocabulary used by the standard SGG split. The full list is embedded in `dataset_info.description`.

---

## Dataset structure

```python
DatasetDict({
    train: Dataset({
        features: ['image', 'image_id', 'width', 'height', 'file_name', 'objects', 'relations'],
        num_rows: 57623
    }),
    val: Dataset({
        features: ['image', 'image_id', 'width', 'height', 'file_name', 'objects', 'relations'],
        num_rows: 8209
    }),
})
```

Each row contains:

| Field | Type | Description |
|-------|------|-------------|
| `image` | `Image` | PIL image |
| `image_id` | `int` | Original GQA200 image id |
| `width` / `height` | `int` | Image dimensions |
| `file_name` | `str` | Original filename |
| `objects` | `List[dict]` | `{id, category_id, bbox (xywh), area, iscrowd, segmentation}` |
| `relations` | `List[dict]` | `{id, subject_id, object_id, predicate_id}`; ids refer to `objects[*].id` |

---

## Usage

```python
import json

from datasets import load_dataset

ds = load_dataset("maelic/GQA200-coco-format")

# Recover label maps from the metadata embedded in the dataset description
meta = json.loads(ds["train"].info.description)
cat_id2name = {c["id"]: c["name"] for c in meta["categories"]}
pred_id2name = {c["id"]: c["name"] for c in meta["rel_categories"]}

sample = ds["train"][0]
image = sample["image"]  # PIL Image

# Print each object's category name and its xywh bounding box
for obj in sample["objects"]:
    print(cat_id2name[obj["category_id"]], obj["bbox"])

# Print each relation as: subject id -- predicate name -> object id
for rel in sample["relations"]:
    print(rel["subject_id"], "--", pred_id2name[rel["predicate_id"]], "->", rel["object_id"])
```

---

## Citation

If you use this dataset, please cite GQA:

```bibtex
@inproceedings{hudson2019gqa,
  title={Gqa: A new dataset for real-world visual reasoning and compositional question answering},
  author={Hudson, Drew A and Manning, Christopher D},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={6700--6709},
  year={2019}
}
```

Please also cite the paper that established the GQA200 split:

```bibtex
@inproceedings{dong2022stacked,
  title={Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation},
  author={Dong, Xingning and Gan, Tian and Song, Xuemeng and Wu, Jianlong and Cheng, Yuan and Nie, Liqiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19427--19436},
  year={2022}
}
```

If you use the SGG-Benchmark models, cite the REACT paper as well:

```bibtex
@inproceedings{Neau_2025_BMVC,
  author = {Ma\"elic Neau and Paulo Eduardo Santos and Anne-Gwenn Bosser and Akihiro Sugimoto and Cedric Buche},
  title = {REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation},
  booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
  publisher = {BMVA},
  year = {2025},
  url = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_239/paper.pdf},
}
```

---

## License

The GQA images and annotations are released under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) license.
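Returning to the row schema in the Dataset structure section: because `subject_id` and `object_id` refer to `objects[*].id`, a relation only becomes readable once those ids are resolved to objects. The following is a minimal sketch of that lookup, plus a conversion of the `xywh` boxes to corner format. It assumes each row exposes `objects` and `relations` as lists of dicts, as the field table states. The helpers `resolve_triplets` and `xywh_to_xyxy`, the toy row, and the toy label names are all hypothetical and are not taken from the dataset.

```python
def resolve_triplets(row, cat_id2name, pred_id2name):
    """Turn a row's relations into (subject, predicate, object) name triplets."""
    # Relation ids point at objects[*].id, so index the objects by id first.
    obj_by_id = {obj["id"]: obj for obj in row["objects"]}
    triplets = []
    for rel in row["relations"]:
        subj = obj_by_id[rel["subject_id"]]
        obj = obj_by_id[rel["object_id"]]
        triplets.append((
            cat_id2name[subj["category_id"]],
            pred_id2name[rel["predicate_id"]],
            cat_id2name[obj["category_id"]],
        ))
    return triplets


def xywh_to_xyxy(bbox):
    """Convert a COCO-style [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Hypothetical toy row following the documented schema (not real data).
toy_row = {
    "objects": [
        {"id": 10, "category_id": 1, "bbox": [5, 10, 20, 40]},
        {"id": 11, "category_id": 2, "bbox": [0, 30, 100, 50]},
    ],
    "relations": [
        {"id": 0, "subject_id": 10, "object_id": 11, "predicate_id": 3},
    ],
}
toy_cats = {1: "man", 2: "bench"}   # made-up label names
toy_preds = {3: "sitting on"}       # made-up predicate name

triplets = resolve_triplets(toy_row, toy_cats, toy_preds)
boxes = [xywh_to_xyxy(o["bbox"]) for o in toy_row["objects"]]
print(triplets)
print(boxes)
```

With the real dataset, the same helper would be called with a row such as `ds["train"][0]` and the `cat_id2name` and `pred_id2name` maps built in the Usage section. If the loader returns `objects` as a dict of lists rather than a list of dicts, the indexing step would need adapting.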
Provider:
maelic