NusaBharat/INDOTABVQA
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/NusaBharat/INDOTABVQA
Description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: type
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train_id
    num_bytes: 219033103
    num_examples: 500
  - name: test_id
    num_bytes: 454218656
    num_examples: 1043
  - name: val_id
    num_bytes: 19063004
    num_examples: 50
  - name: train_en
    num_bytes: 219034049
    num_examples: 500
  - name: test_en
    num_bytes: 453862218
    num_examples: 1043
  - name: val_en
    num_bytes: 19063077
    num_examples: 50
  - name: train_hi
    num_bytes: 219076727
    num_examples: 500
  - name: test_hi
    num_bytes: 453954425
    num_examples: 1043
  - name: val_hi
    num_bytes: 19067265
    num_examples: 50
  - name: train_ar
    num_bytes: 219048465
    num_examples: 500
  - name: test_ar
    num_bytes: 453895425
    num_examples: 1043
  - name: val_ar
    num_bytes: 19064285
    num_examples: 50
  download_size: 2769417141
  dataset_size: 2768380699
configs:
- config_name: default
  data_files:
  - split: train_id
    path: data/train_id-*
  - split: test_id
    path: data/test_id-*
  - split: val_id
    path: data/val_id-*
  - split: train_en
    path: data/train_en-*
  - split: test_en
    path: data/test_en-*
  - split: val_en
    path: data/val_en-*
  - split: train_hi
    path: data/train_hi-*
  - split: test_hi
    path: data/test_hi-*
  - split: val_hi
    path: data/val_hi-*
  - split: train_ar
    path: data/train_ar-*
  - split: test_ar
    path: data/test_ar-*
  - split: val_ar
    path: data/val_ar-*
---
# INDOTABVQA 📊
## Cross-Lingual Table Visual Question Answering Benchmark for Document Images
This repository contains the dataset for the paper:
## 📄 <span style="color:#2E86C1;">INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents </span>(<span style="color:#E74C3C;"><b>ACL 2026 Findings</b></span>)
## INDOTABVQA evaluates Vision-Language Models (VLMs) on:
- 🌐 Cross-lingual understanding
- 🔢 Numerical & structural reasoning over tables
- 📄 Document-level table comprehension
The dataset focuses on real-world document images in Bahasa Indonesia, with multilingual QA pairs enabling both monolingual and cross-lingual evaluation.
Languages:
- Bahasa Indonesia (ID)
- English (EN)
- Hindi (HI)
- Arabic (AR)
Table Types:
- Bordered
- Borderless
- Colorful
Domains: Government, Finance, Education, Health
Each document contains one or more tables, reflecting real-world complexity and layout diversity.
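
The table types and domains listed above suggest reporting accuracy per group rather than only overall. The sketch below shows one way to do that. It assumes each evaluated record is a plain dictionary carrying the dataset's `type` or `category` column plus a boolean `correct` flag that your own scoring step has filled in. The `correct` field and the `accuracy_by` helper are hypothetical, and the card does not document which values `type` and `category` actually take.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group value: accuracy in percent} for scored records.

    Each record must contain `key` and a boolean "correct" flag
    (a hypothetical field added by your own scoring step).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        hits[group] += bool(record["correct"])
    return {group: 100.0 * hits[group] / totals[group] for group in totals}
```

Calling `accuracy_by(records, "type")` and `accuracy_by(records, "category")` on the same scored records gives two complementary breakdowns.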
# Evaluation Settings
INDOTABVQA supports three evaluation scenarios:
1. Zero-Shot
   - No task-specific training
   - Tests out-of-the-box VLM capability
2. Fine-Tuned
   - Models trained on the INDOTABVQA training split
   - Evaluates domain adaptation
3. Fine-Tuned + Spatial Priors
   - Adds table bounding boxes (from detectors like YOLOv9)
   - Improves localization and reasoning
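
The card does not specify how table bounding boxes are passed to the model in the third setting. As a rough illustration only, the sketch below serializes detected boxes into the text prompt. The `TableBox` type, the `build_prompt` helper, the pixel-coordinate convention, and the prompt wording are all assumptions made for this example, not the authors' format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableBox:
    """Pixel-coordinate bounding box of one detected table (hypothetical type)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def build_prompt(question: str, boxes: list[TableBox]) -> str:
    """Prefix the question with table locations; the wording is illustrative."""
    if not boxes:
        # No detections: fall back to the plain question.
        return question
    lines = [
        f"Table {i}: ({b.x_min}, {b.y_min}, {b.x_max}, {b.y_max})"
        for i, b in enumerate(boxes, start=1)
    ]
    header = "Table regions (x_min, y_min, x_max, y_max):"
    return header + "\n" + "\n".join(lines) + "\n\nQuestion: " + question
```

Keeping the boxes in the text channel means the same image can be reused unchanged across the fine-tuned and spatial settings; drawing the boxes onto the image would be an equally plausible alternative.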
# Leaderboard (Test Set)
Metric: In-Match Accuracy (%) ↑
| Model | #Params | ID | EN | HI | AR | Avg |
|-------------------------|---------|------|------|------|------|------|
| Donut | — | 10.5 | 5.5 | 4.7 | 4.4 | 6.2 |
| Qwen2.5-VL | 3B | 37.8 | 28.7 | 4.1 | 16.4 | 21.9 |
| Gemma-3 | 12B | 40.9 | 27.4 | 19.5 | 17.4 | 26.1 |
| Qwen2.5-VL | 7B | 54.8 | 36.2 | 17.3 | 23.0 | 32.9 |
| LLaMA-3.2-V | 11B | 57.4 | 30.8 | 15.5 | 19.4 | 30.7 |
| GPT-4o | — | 72.2 | 44.6 | 26.0 | 21.4 | 41.1 |
| INDOTABVQA (fine-tuned) | 3B | 66.4 | 46.1 | 22.1 | 25.8 | 39.7 |
| INDOTABVQA (fine-tuned) | 7B | 71.9 | 51.6 | 26.2 | 28.1 | 44.5 |
| INDOTABVQA + Spatial | 3B | 73.1 | 54.8 | 27.2 | 31.1 | 46.6 |
| INDOTABVQA + Spatial | 7B | 78.3 | 58.4 | 29.4 | 32.8 | 48.5 |
- ID: Bahasa Indonesia, EN: English, HI: Hindi, AR: Arabic
- Spatial = Table bounding boxes provided as additional input
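
The card names the metric but does not define it. The sketch below assumes that "In-Match" means a prediction is counted as correct when the normalized gold answer appears as a substring of the normalized prediction. That reading, the normalization steps, and the function names are assumptions; consult the paper for the exact definition before comparing against the numbers above.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Apply NFKC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def in_match(prediction: str, answer: str) -> bool:
    """ASSUMPTION: correct if the prediction contains the gold answer."""
    gold = normalize(answer)
    return bool(gold) and gold in normalize(prediction)


def in_match_accuracy(predictions, answers) -> float:
    """Percentage of predictions that contain their gold answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    hits = sum(in_match(p, a) for p, a in zip(predictions, answers))
    return 100.0 * hits / len(answers)
```

Substring matching is lenient toward verbose model outputs, but it can also reward a prediction that merely lists many candidate values, so it is worth inspecting a sample of matches by hand.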
## Why INDOTABVQA?
Unlike prior datasets, INDOTABVQA:
- ✅ Focuses on low-resource languages (Bahasa Indonesia)
- ✅ Supports true cross-lingual VQA
- ✅ Emphasizes table-specific reasoning
- ✅ Includes layout diversity + spatial annotations
Access:
📂 Dataset: https://huggingface.co/datasets/NusaBharat/INDOTABVQA
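
The metadata at the top of this card defines twelve splits named `<partition>_<language>`, such as `train_id` or `test_hi`. A minimal sketch of loading one of them with the Hugging Face `datasets` library follows. The helpers `split_name` and `load_split` are illustrative names introduced here, and the download step assumes network access and an installed `datasets` package.

```python
REPO_ID = "NusaBharat/INDOTABVQA"
PARTITIONS = ("train", "val", "test")
LANGUAGES = ("id", "en", "hi", "ar")


def split_name(partition: str, language: str) -> str:
    """Build a split name such as 'test_hi' from the card's naming scheme."""
    if partition not in PARTITIONS:
        raise ValueError(f"unknown partition: {partition!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    return f"{partition}_{language}"


def load_split(partition: str, language: str):
    """Load one split from the Hub (requires `pip install datasets`)."""
    # Imported lazily so split_name stays usable without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split_name(partition, language))
```

For example, `load_split("val", "en")` should return the 50-example English validation split, whose rows expose the `id`, `image`, `question`, `answer`, `type`, and `category` fields listed in the metadata.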
Contact:
For queries, please contact:
- Somraj Gautam (somrajbg9@gmail.com)
- Anathapindika Dravichi (dravichijan@gmail.com)
Provider:
NusaBharat



