JA-VG-VQA-500

Name: JA-VG-VQA-500
Creator: maas
Published: 2025-12-26 16:20:49
License: No description provided

ModelScope Community. Updated 2025-12-26; indexed 2025-01-18.

Download link:

https://modelscope.cn/datasets/SakanaAI/JA-VG-VQA-500

Description:

# JA-VG-VQA-500

## Dataset Description

**JA-VG-VQA-500** is a 500-sample subset of the [Japanese Visual Genome VQA dataset](https://github.com/yahoojapan/ja-vg-vqa). This dataset was used in the evaluation of [EvoVLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B). Please refer to our [report](https://arxiv.org/abs/2403.13187) and [blog](https://sakana.ai/evolutionary-model-merge/) for more details.

We are grateful to the developers for making the dataset available under the [Creative Commons Attribution 4.0 License](https://creativecommons.org/licenses/by/4.0/legalcode).

- [Visual Genome](https://homes.cs.washington.edu/~ranjay/visualgenome/index.html)
- [Japanese Visual Genome VQA dataset](https://github.com/yahoojapan/ja-vg-vqa)

## Usage

Use the code below to get started with the dataset.

```python
from datasets import load_dataset

dataset = load_dataset("SakanaAI/JA-VG-VQA-500", split="test")
```

See [our GitHub repository](https://github.com/SakanaAI/evolutionary-model-merge) to evaluate Japanese VLMs.

## Acknowledgement

We would like to thank the developers of the source datasets for their contributions and for making their work available.

## Citation

```bibtex
@article{Krishna2016VisualGC,
  title   = {Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations},
  author  = {Ranjay Krishna and Yuke Zhu and Oliver Groth and Justin Johnson and Kenji Hata and Joshua Kravitz and Stephanie Chen and Yannis Kalantidis and Li-Jia Li and David A. Shamma and Michael S. Bernstein and Li Fei-Fei},
  journal = {International Journal of Computer Vision},
  year    = {2017},
  volume  = {123},
  pages   = {32-73},
  URL     = {https://doi.org/10.1007/s11263-016-0981-7},
  doi     = {10.1007/s11263-016-0981-7}
}
```

```bibtex
@InProceedings{C18-1163,
  author    = "Shimizu, Nobuyuki and Rong, Na and Miyazaki, Takashi",
  title     = "Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps",
  booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
  year      = "2018",
  publisher = "Association for Computational Linguistics",
  pages     = "1918--1928",
  location  = "Santa Fe, New Mexico, USA",
  url       = "http://aclweb.org/anthology/C18-1163"
}
```
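The card above does not list the dataset's column names, so a reasonable first step after loading is to check which fields the records actually carry. The following is a minimal sketch of that check, not part of the original card: `summarize_records` and `load_and_summarize` are hypothetical helper names introduced here for illustration, and the only library call used is the `load_dataset` call already shown in the Usage section.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def summarize_records(records: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Count the records and tally how often each field name appears.

    Hypothetical helper: works on any iterable of dict-like rows.
    """
    field_counts: Counter = Counter()
    total = 0
    for record in records:
        total += 1
        field_counts.update(record.keys())
    return {"num_records": total, "fields": dict(field_counts)}


def load_and_summarize(
    repo_id: str = "SakanaAI/JA-VG-VQA-500", split: str = "test"
) -> Dict[str, Any]:
    """Load the split named in the card and summarize its rows.

    Needs the `datasets` package and network access; imported lazily so
    that `summarize_records` stays usable without either.
    """
    from datasets import load_dataset

    dataset = load_dataset(repo_id, split=split)
    # Iterating a loaded split yields one dict per row.
    return summarize_records(dataset)
```

Calling `load_and_summarize()` downloads the split and returns the record count together with the field names found; `summarize_records` can also be run on a small hand-made list of dicts to see the output shape before touching the network.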

Provider:

maas

Created:

2025-01-17
