Siluni/sinhala-vqa-dataset

Name: Siluni/sinhala-vqa-dataset
Creator: Siluni
Published: 2026-04-05 16:36:34
License: No description available

Hugging Face | Updated 2026-04-05 | Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/Siluni/sinhala-vqa-dataset


Resource description:

---
license: cc-by-4.0
language:
- si
task_categories:
- visual-question-answering
pretty_name: Sinhala VQA
size_categories:
- 10K<n<100K
tags:
- sinhala
- vqa
- low-resource
- multimodal
- visual-genome
---

# Sinhala VQA Dataset

A Sinhala-language Visual Question Answering dataset of 37,318 QA pairs, constructed by translating [Visual Genome](https://visualgenome.org/) QA annotations into Sinhala using gemini-3-flash-preview.

This dataset was developed as part of research on benchmarking and adapting compact multimodal models for Sinhala VQA under low-resource conditions.

## Dataset Summary

| Split      | Samples |
|------------|---------|
| Train      | 33,409  |
| Validation | 2,909   |
| Test       | 1,000   |
| **Total**  | 37,318  |

## Schema

Each row contains:

| Field      | Type   | Description                                                  |
|------------|--------|--------------------------------------------------------------|
| `qa_id`    | int64  | QA pair ID; directly corresponds to the Visual Genome QA ID  |
| `image_id` | int64  | Image ID; directly corresponds to the Visual Genome image ID |
| `question` | string | Question in Sinhala                                          |
| `answer`   | string | Answer in Sinhala                                            |

## Images

**Images are not included** in this dataset. They must be downloaded separately from Visual Genome:

- **Version**: Visual Genome Version 1.2 (completed August 29, 2016)
- **Download**: https://homes.cs.washington.edu/~ranjay/visualgenome/api.html
- The `image_id` field in each row maps directly to the corresponding image in Visual Genome v1.2.

## Construction

QA pairs from Visual Genome v1.2 were translated from English to Sinhala using the gemini-3-flash-preview API. The source dataset is the Visual Genome QA subset (`question_answers.json`).

## License

The annotations in this dataset (questions and answers) are released under **CC-BY 4.0**. The underlying images are sourced from Visual Genome and remain under [Visual Genome's own license](https://homes.cs.washington.edu/~ranjay/visualgenome/api.html).
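
As a rough illustration of how the pieces described in this card fit together, the sketch below checks a row against the documented schema, confirms the split arithmetic, and looks up the image for a given `image_id`. It is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names (`validate_row`, `find_image`, `load_splits`) are hypothetical, and the folder names `VG_100K` / `VG_100K_2` with files named `<image_id>.jpg` are an assumption about how the separately downloaded Visual Genome archives are laid out on disk.

```python
from pathlib import Path
from typing import Optional

# Field names and Python-level types taken from the Schema table in this card.
EXPECTED_FIELDS = {"qa_id": int, "image_id": int, "question": str, "answer": str}

# Split sizes taken from the Dataset Summary table in this card.
SPLIT_SIZES = {"train": 33_409, "validation": 2_909, "test": 1_000}


def validate_row(row: dict) -> bool:
    """Return True if the row has exactly the documented fields and types."""
    if set(row) != set(EXPECTED_FIELDS):
        return False
    return all(isinstance(row[name], kind) for name, kind in EXPECTED_FIELDS.items())


def find_image(image_id: int, image_root: Path) -> Optional[Path]:
    """Look for <image_id>.jpg under the assumed Visual Genome folders."""
    # Assumption: the image archives were extracted into these two folders.
    for folder in ("VG_100K", "VG_100K_2"):
        candidate = image_root / folder / f"{image_id}.jpg"
        if candidate.exists():
            return candidate
    return None


def load_splits(repo_id: str = "Siluni/sinhala-vqa-dataset"):
    """Load the QA pairs from the Hub (needs the `datasets` package and network)."""
    # Imported lazily so the offline helpers above work without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id)


# The three split sizes add up to the stated total of 37,318 QA pairs.
assert sum(SPLIT_SIZES.values()) == 37_318
```

A typical use would be to call `load_splits()`, iterate over a split, and pass each row's `image_id` to `find_image` together with the local Visual Genome directory. The split keys returned by the Hub (for example `train`, `validation`, `test`) are assumed here and should be checked against the loaded object.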
## Citation

If you use this dataset, please cite:

```bibtex
@misc{keerthiratne2025sinhalavqa,
  title       = {Benchmarking and Adapting Compact Multimodal Models for Sinhala Visual Question Answering},
  author      = {Keerthiratne, Siluni and Weerasinghe, Ruvan and Sumanathilaka, Deshan},
  year        = {2025},
  institution = {Informatics Institute of Technology / Robert Gordon University},
  note        = {Dataset available at https://huggingface.co/datasets/Siluni/sinhala-vqa-dataset}
}
```

## Contact

Siluni Keerthiratne, Informatics Institute of Technology, Sri Lanka

Provided by:

Siluni
