
bongdong/VisDoTQA

Hugging Face · Updated 2026-04-17 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/bongdong/VisDoTQA
Description:
---
pretty_name: VisDoTQA
language:
- en
license: unknown
task_categories:
- visual-question-answering
- question-answering
size_categories:
- 1K<n<10K
source_datasets:
- original
annotations_creators:
- machine-generated
language_creators:
- machine-generated
multilinguality:
- monolingual
tags:
- chart
- chart-understanding
- multimodal
- vision-language
- reasoning
- synthetic
- benchmark
---

# VisDoTQA: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought

This repository releases **VisDoTQA**, the public benchmark introduced in our paper [*VisDoT : Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought*](https://aclanthology.org/2026.findings-eacl.30/). The canonical publication record is the **ACL Anthology** page for **Findings of the Association for Computational Linguistics: EACL 2026**, and an additional mirror is available on **arXiv:2603.11631**. See our [GitHub repository](https://github.com/bongdong22/VisDoTQA) for the synchronized source release and updates.

See the paper on [ACL Anthology](https://aclanthology.org/2026.findings-eacl.30/), on [arXiv:2603.11631](https://arxiv.org/abs/2603.11631), or via [DOI](https://doi.org/10.18653/v1/2026.findings-eacl.30).

## Highlights

- We release **VisDoTQA**, a public benchmark for evaluating **visual grounding** and **compositional reasoning** on chart images.
- The benchmark contains **1,120 QA pairs** built from **609 held-out charts**.
- VisDoTQA covers four perceptual task families: **Position**, **Length**, **Pattern**, and **Extract**.
- This Hugging Face repository releases the **public benchmark test split only**. The full research dataset described in the paper contains **331,969 QA pairs** and is not included here.
## Dataset Structure

- **Split**: `test`
- **Images**: `test/images/`
- **Metadata**: `test/metadata.jsonl`

Each example contains:

- `file_name`: relative path to the chart image
- `imgname`: image filename
- `query`: benchmark question
- `label`: ground-truth answer
- `source`: VisDoTQA task category (`Position`, `Length`, `Pattern`, `Extract`)

## Links

- [Paper (ACL Anthology, canonical)](https://aclanthology.org/2026.findings-eacl.30/)
- [Paper (arXiv:2603.11631 mirror)](https://arxiv.org/abs/2603.11631)
- [DOI](https://doi.org/10.18653/v1/2026.findings-eacl.30)
- [GitHub Repository](https://github.com/bongdong22/VisDoTQA)

## Contact

If you have questions about this dataset release, please reach out via the [GitHub repository](https://github.com/bongdong22/VisDoTQA).

## Citation

```bibtex
@inproceedings{lee2026visdot,
  title={VisDoT : Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought},
  author={Lee, Eunsoo and Lee, Jeongwoo and Hong, Minki and Choi, Jangho and Kim, Jihie},
  booktitle={Findings of the Association for Computational Linguistics: EACL 2026},
  pages={610--640},
  year={2026},
  doi={10.18653/v1/2026.findings-eacl.30},
  url={https://aclanthology.org/2026.findings-eacl.30/}
}
```
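The Dataset Structure section describes the test split as an image folder plus a JSON Lines metadata file with five fields per example. The following is a minimal, standard-library-only sketch for reading and sanity-checking that metadata. It assumes a local copy of the repository with `test/metadata.jsonl` at the listed path, and that `file_name` is relative to the directory holding `metadata.jsonl` (the Hugging Face image-folder convention). The helper names (`parse_metadata_lines`, `load_split`, `image_path`, `count_by_category`) and the `VisDoTQAExample` type are hypothetical and are not an official VisDoTQA API.

```python
import json
from collections import Counter
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List

# Task categories listed for the `source` field in the dataset card.
TASK_CATEGORIES = {"Position", "Length", "Pattern", "Extract"}
# Per-example fields listed in the dataset card.
REQUIRED_FIELDS = ("file_name", "imgname", "query", "label", "source")


@dataclass
class VisDoTQAExample:
    """One benchmark record (hypothetical container type, not an official API)."""
    file_name: str  # relative path to the chart image
    imgname: str    # image filename
    query: str      # benchmark question
    label: str      # ground-truth answer (coerced to str; assumes a scalar JSON value)
    source: str     # task category: Position, Length, Pattern, or Extract


def parse_metadata_lines(lines: Iterable[str]) -> List[VisDoTQAExample]:
    """Parse JSON Lines records, skipping blank lines and validating fields."""
    examples: List[VisDoTQAExample] = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(raw)
        missing = [k for k in REQUIRED_FIELDS if k not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        if record["source"] not in TASK_CATEGORIES:
            raise ValueError(f"line {lineno}: unknown source {record['source']!r}")
        examples.append(VisDoTQAExample(**{k: str(record[k]) for k in REQUIRED_FIELDS}))
    return examples


def load_split(split_dir: str = "test") -> List[VisDoTQAExample]:
    """Read <split_dir>/metadata.jsonl from a local copy of the repository."""
    path = Path(split_dir) / "metadata.jsonl"
    with path.open(encoding="utf-8") as f:
        return parse_metadata_lines(f)


def image_path(example: VisDoTQAExample, split_dir: str = "test") -> Path:
    """Resolve the chart image path.

    ASSUMPTION: `file_name` is relative to the directory holding metadata.jsonl
    (the Hugging Face image-folder convention).
    """
    return Path(split_dir) / example.file_name


def count_by_category(examples: Iterable[VisDoTQAExample]) -> Counter:
    """Number of QA pairs per VisDoTQA task category."""
    return Counter(ex.source for ex in examples)
```

For example, running `count_by_category(load_split("test"))` from the repository root gives a per-category breakdown. If the local copy matches the card, the counts should sum to 1,120. Validating `source` against the four documented categories is a deliberate choice: it surfaces a mismatched or partially downloaded file early instead of silently skewing per-category results.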
Provided by:
bongdong