Siluni/sinhala-vqa-dataset
Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Siluni/sinhala-vqa-dataset
Resource description:
---
license: cc-by-4.0
language:
- si
task_categories:
- visual-question-answering
pretty_name: Sinhala VQA
size_categories:
- 10K<n<100K
tags:
- sinhala
- vqa
- low-resource
- multimodal
- visual-genome
---
# Sinhala VQA Dataset
A Sinhala-language Visual Question Answering dataset of 37,318 QA pairs, constructed by translating [Visual Genome](https://visualgenome.org/) QA annotations into Sinhala using gemini-3-flash-preview. This dataset was developed as part of research on benchmarking and adapting compact multimodal models for Sinhala VQA under low-resource conditions.
## Dataset Summary
| Split | Samples |
|------------|---------|
| Train | 33,409 |
| Validation | 2,909 |
| Test | 1,000 |
| **Total** | 37,318 |
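
As a quick sanity check on the table above, the following sketch sums the split sizes and prints each split's share of the total. The dictionary keys are only labels chosen for this example; the card does not state the split names used in the hosted files.

```python
# Split sizes copied from the table above; the key names are illustrative labels.
SPLIT_SIZES = {"train": 33_409, "validation": 2_909, "test": 1_000}


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of samples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

for name, share in fractions.items():
    print(f"{name:<10} {SPLIT_SIZES[name]:>6}  {share:.1%}")
print(f"{'total':<10} {total:>6}")
```

The three splits add up to the stated 37,318 pairs, with roughly 89.5% in train, 7.8% in validation, and 2.7% in test.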
## Schema
Each row contains:
| Field | Type | Description |
|------------|--------|----------------------------------------------------------------|
| `qa_id` | int64 | QA pair ID — directly corresponds to the Visual Genome QA ID |
| `image_id` | int64 | Image ID — directly corresponds to the Visual Genome image ID |
| `question` | string | Question in Sinhala |
| `answer` | string | Answer in Sinhala |
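
The schema above can be checked row by row with a small validator. This is a minimal sketch using only the standard library: `validate_row` and `EXPECTED_FIELDS` are hypothetical helper names, and the example row uses placeholder IDs and text rather than real dataset content.

```python
from typing import Any

# Field names and Python types implied by the schema table
# (int64 maps to int, string maps to str).
EXPECTED_FIELDS = {"qa_id": int, "image_id": int, "question": str, "answer": str}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one row (empty if the row looks valid)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(row[name], bool) or not isinstance(row[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, got {type(row[name]).__name__}"
            )
    # Text fields should not be blank.
    for name in ("question", "answer"):
        if isinstance(row.get(name), str) and not row[name].strip():
            problems.append(f"{name}: empty string")
    return problems


# Hypothetical row: placeholder IDs and text, not taken from the dataset.
example_row = {"qa_id": 1, "image_id": 1, "question": "මෙය කුමක්ද?", "answer": "බල්ලෙක්"}
print(validate_row(example_row))
```

Rows could be obtained, for example, with the Hugging Face `datasets` library via `load_dataset("Siluni/sinhala-vqa-dataset")`; the split names it returns are not stated in this card.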
## Images
**Images are not included** in this dataset. They must be downloaded separately from Visual Genome:
- **Version**: Visual Genome Version 1.2 (completed August 29, 2016)
- **Download**: https://homes.cs.washington.edu/~ranjay/visualgenome/api.html
- The `image_id` field in each row maps directly to the corresponding image in Visual Genome v1.2.
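
Once the images are downloaded, each row's `image_id` can be resolved to a local file. The sketch below assumes the usual layout of the Visual Genome image archives, with two folders named `VG_100K` and `VG_100K_2` containing files named `<image_id>.jpg`; this layout is an assumption and is not stated in this card, so adjust `IMAGE_DIRS` if your copy differs. `find_image` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

# Assumption: images were unpacked into these two folders under a common root,
# with one file per image named "<image_id>.jpg".
IMAGE_DIRS = ("VG_100K", "VG_100K_2")


def find_image(image_id: int, root: Path) -> Optional[Path]:
    """Return the path of the image for image_id, or None if it is not found."""
    for folder in IMAGE_DIRS:
        candidate = root / folder / f"{image_id}.jpg"
        if candidate.is_file():
            return candidate
    return None
```

For a row `row`, `find_image(row["image_id"], Path("path/to/visual_genome"))` would then return the image path, or `None` if the image is missing locally; the root path here is a placeholder.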
## Construction
The source data is the Visual Genome v1.2 QA subset (`question_answers.json`). Its QA pairs were translated from English to Sinhala using the gemini-3-flash-preview API.
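
Because `qa_id` corresponds to the Visual Genome QA ID, the English source of each translated pair can be recovered by joining against `question_answers.json`. The sketch below assumes that file parses to a list of per-image entries, each with a `qas` list whose items carry `qa_id`, `question`, and `answer`; verify this against your copy. The helper names and the `question_en` / `answer_en` keys are hypothetical, and the sample values are placeholders rather than real data.

```python
from typing import Any


def index_english_qas(vg_entries: list[dict[str, Any]]) -> dict[int, dict[str, str]]:
    """Map qa_id -> English question/answer from parsed question_answers.json."""
    index: dict[int, dict[str, str]] = {}
    for entry in vg_entries:
        for qa in entry.get("qas", []):
            index[qa["qa_id"]] = {"question": qa["question"], "answer": qa["answer"]}
    return index


def attach_english(row: dict[str, Any], index: dict[int, dict[str, str]]) -> dict[str, Any]:
    """Return a copy of a Sinhala row with the English source added (None if absent)."""
    source = index.get(row["qa_id"], {})
    return {
        **row,
        "question_en": source.get("question"),
        "answer_en": source.get("answer"),
    }


# Placeholder stand-in for the parsed contents of question_answers.json.
vg_entries = [
    {"id": 1, "qas": [{"qa_id": 10, "image_id": 1,
                       "question": "What is this?", "answer": "A dog."}]}
]
# Hypothetical Sinhala row with matching placeholder IDs.
sinhala_row = {"qa_id": 10, "image_id": 1, "question": "මෙය කුමක්ද?", "answer": "බල්ලෙක්"}

merged = attach_english(sinhala_row, index_english_qas(vg_entries))
print(merged["question_en"], "|", merged["answer_en"])
```

Keeping the English text alongside the Sinhala text makes it easier to spot-check translation quality on a sample of rows.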
## License
The annotations in this dataset (questions and answers) are released under **CC-BY 4.0**.
The underlying images are sourced from Visual Genome and remain under [Visual Genome's own license](https://homes.cs.washington.edu/~ranjay/visualgenome/api.html).
## Citation
If you use this dataset, please cite:
```bibtex
@misc{keerthiratne2025sinhalavqa,
title = {Benchmarking and Adapting Compact Multimodal Models for Sinhala Visual Question Answering},
author = {Keerthiratne, Siluni and Weerasinghe, Ruvan and Sumanathilaka, Deshan},
year = {2025},
institution = {Informatics Institute of Technology / Robert Gordon University},
note = {Dataset available at https://huggingface.co/datasets/Siluni/sinhala-vqa-dataset}
}
```
## Contact
Siluni Keerthiratne — Informatics Institute of Technology, Sri Lanka
Provider:
Siluni



