bongdong/VisDoTQA

Name: bongdong/VisDoTQA
Creator: bongdong
Published: 2026-04-17 07:16:30
License: No description available

Hugging Face | Updated 2026-04-17 | Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/bongdong/VisDoTQA

Resource overview:

---
pretty_name: VisDoTQA
language:
- en
license: unknown
task_categories:
- visual-question-answering
- question-answering
size_categories:
- 1K<n<10K
source_datasets:
- original
annotations_creators:
- machine-generated
language_creators:
- machine-generated
multilinguality:
- monolingual
tags:
- chart
- chart-understanding
- multimodal
- vision-language
- reasoning
- synthetic
- benchmark
---

# VisDoTQA: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought

This repository releases **VisDoTQA**, the public benchmark introduced in our paper [*VisDoT : Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought*](https://aclanthology.org/2026.findings-eacl.30/).

The canonical publication record is the **ACL Anthology** page for **Findings of the Association for Computational Linguistics: EACL 2026**, and an additional mirror is available on **arXiv:2603.11631**.

See our [GitHub repository](https://github.com/bongdong22/VisDoTQA) for the synchronized source release and updates.

See the paper on [ACL Anthology](https://aclanthology.org/2026.findings-eacl.30/), on [arXiv:2603.11631](https://arxiv.org/abs/2603.11631), or via [DOI](https://doi.org/10.18653/v1/2026.findings-eacl.30).

## Highlights

- We release **VisDoTQA**, a public benchmark for evaluating **visual grounding** and **compositional reasoning** on chart images.
- The benchmark contains **1,120 QA pairs** built from **609 held-out charts**.
- VisDoTQA covers four perceptual task families: **Position**, **Length**, **Pattern**, and **Extract**.
- This Hugging Face repository releases the **public benchmark test split only**. The full research dataset described in the paper contains **331,969 QA pairs** and is not included here.

## Dataset Structure

- **Split**: `test`
- **Images**: `test/images/`
- **Metadata**: `test/metadata.jsonl`

Each example contains:

- `file_name`: relative path to the chart image
- `imgname`: image filename
- `query`: benchmark question
- `label`: ground-truth answer
- `source`: VisDoTQA task category (`Position`, `Length`, `Pattern`, `Extract`)

## Links

- [Paper (ACL Anthology, canonical)](https://aclanthology.org/2026.findings-eacl.30/)
- [Paper (arXiv:2603.11631 mirror)](https://arxiv.org/abs/2603.11631)
- [DOI](https://doi.org/10.18653/v1/2026.findings-eacl.30)
- [GitHub Repository](https://github.com/bongdong22/VisDoTQA)

## Contact

If you have questions about this dataset release, please use the [GitHub repository](https://github.com/bongdong22/VisDoTQA).

## Citation

```bibtex
@inproceedings{lee2026visdot,
  title={VisDoT : Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought},
  author={Lee, Eunsoo and Lee, Jeongwoo and Hong, Minki and Choi, Jangho and Kim, Jihie},
  booktitle={Findings of the Association for Computational Linguistics: EACL 2026},
  pages={610--640},
  year={2026},
  doi={10.18653/v1/2026.findings-eacl.30},
  url={https://aclanthology.org/2026.findings-eacl.30/}
}
```
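
## Example: reading the metadata

The following is a minimal sketch of how the `test/metadata.jsonl` file described under Dataset Structure might be read, assuming a local copy of the repository and one JSON object per line with the five documented fields. The helper names (`parse_metadata`, `load_metadata`, `count_by_task`) are hypothetical and not part of the release; the value types of `label` and the exact form of `file_name` are not specified in this card, so the code treats them as opaque.

```python
import json
from collections import Counter
from pathlib import Path

# Field names as listed in the Dataset Structure section of this card.
FIELDS = ("file_name", "imgname", "query", "label", "source")


def parse_metadata(lines):
    """Parse JSONL lines into dicts and check that the documented fields exist.

    Hypothetical helper: blank lines are skipped, and a ValueError is raised
    for any record that lacks one of the documented fields.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [name for name in FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(record)
    return records


def load_metadata(path="test/metadata.jsonl"):
    """Read the metadata file from a local copy of the repository."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_metadata(handle)


def count_by_task(records):
    """Count examples per task category, using the `source` field."""
    return Counter(record["source"] for record in records)
```

With a local download in place, `count_by_task(load_metadata())` would give the number of questions per task family (`Position`, `Length`, `Pattern`, `Extract`), which is a quick way to check that the copy is complete.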

Provider:

bongdong
