# GQA-ru
This is a Russian translation of the original [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html) dataset,
stored in a format supported by the [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval) pipeline.
To build this dataset, we:
1. Translated the original dataset with `gpt-4-turbo`
2. Filtered out unsuccessful translations, i.e. those where the model's content protection was triggered
3. Manually validated the most common errors
## Dataset Structure
The dataset includes train and test splits, translated from the original `train_balanced` and `testdev_balanced` splits.
The train split contains 27519 images with 40000 questions about them, and the test split contains 398 images with 12216 questions about them.
The storage format is similar to [`lmms-lab/GQA`](https://huggingface.co/datasets/lmms-lab/GQA). Key fields:
* `id`: ID of a question
* `imageId`: ID of an image (images stored in a separate table)
* `question`: text of a question
* `answer`: one-word answer
* `fullAnswer`: detailed answer
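
Because images live in a separate table, questions have to be matched to their image through `imageId`. The snippet below is a minimal sketch of that join on in-memory rows. The sample values, the image-table keys (`id`, `image`), and the helper names are assumptions made for illustration; they are not taken from the dataset itself.

```python
from collections import defaultdict

# Hypothetical question rows using the fields listed above (values are invented).
questions = [
    {"id": "q1", "imageId": "img1", "question": "Какого цвета машина?",
     "answer": "красный", "fullAnswer": "Машина красная."},
    {"id": "q2", "imageId": "img1", "question": "Есть ли на картинке собака?",
     "answer": "да", "fullAnswer": "Да, на картинке есть собака."},
    {"id": "q3", "imageId": "img2", "question": "Это улица?",
     "answer": "нет", "fullAnswer": "Нет, это комната."},
]

# Hypothetical image table; the key names "id" and "image" are assumptions.
images = [
    {"id": "img1", "image": "<image data 1>"},
    {"id": "img2", "image": "<image data 2>"},
]


def group_questions_by_image(rows):
    """Map each imageId to the list of question rows that refer to it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["imageId"]].append(row)
    return dict(grouped)


def attach_images(rows, image_rows):
    """Return copies of the question rows with the matching image attached."""
    image_by_id = {img["id"]: img["image"] for img in image_rows}
    return [{**row, "image": image_by_id[row["imageId"]]} for row in rows]


grouped = group_questions_by_image(questions)
joined = attach_images(questions, images)
print({image_id: len(rows) for image_id, rows in grouped.items()})
```

Building the `imageId` lookup once keeps the join linear in the number of questions, which matters here because many questions share one image.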
## Usage
The easiest way to evaluate a model on `GQA-ru` is through [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval).
For example, to evaluate [`deepvk/llava-saiga-8b`](https://huggingface.co/deepvk/llava-saiga-8b):
```bash
accelerate launch -m lmms_eval --model llava_hf \
--model_args pretrained="deepvk/llava-saiga-8b" \
--tasks gqa-ru --batch_size 1 \
--log_samples --log_samples_suffix llava-saiga-8b --output_path ./logs/
```
This prints a table of results. The main metric for this task is `ExactMatch` on the one-word answer: whether the generated word is identical to the ground truth.
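
To make the metric concrete, here is a minimal sketch of an exact-match score over one-word answers. The normalization step (trimming whitespace, lowercasing, dropping a trailing period) is an assumption made for this example; the `lmms-eval` implementation may normalize differently, and the function names are hypothetical.

```python
def normalize(text: str) -> str:
    """Assumed normalization: trim whitespace, lowercase, drop trailing periods."""
    return text.strip().lower().rstrip(".")


def exact_match(predictions, references) -> float:
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Invented model outputs and ground-truth answers, for illustration only.
predictions = ["Да.", "нет", "красный"]
references = ["да", "да", "красный"]
score = exact_match(predictions, references)
print(f"ExactMatch: {score:.3f}")
```

In this toy input two of the three answers match, so the score is 2/3. A stricter variant would skip normalization and compare the raw strings.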
## Citation
```
@inproceedings{hudson2019gqa,
title={Gqa: A new dataset for real-world visual reasoning and compositional question answering},
author={Hudson, Drew A and Manning, Christopher D},
booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
pages={6700--6709},
year={2019}
}
```
```
@misc{deepvk2024gqa_ru,
title={GQA-ru},
author={Belopolskih, Daniil and Spirin, Egor},
url={https://huggingface.co/datasets/deepvk/GQA-ru},
publisher={Hugging Face},
year={2024}
}
```