bharatgenai/TORQUE
Hugging Face dataset, updated 2025-11-22, indexed 2026-02-07
Download link: https://hf-mirror.com/datasets/bharatgenai/TORQUE
Resource description:
---
dataset_info:
  features:
  - name: image_name
    dtype: string
  - name: image
    dtype: image
  - name: post_corrected_html
    dtype: string
  - name: question_1
    dtype: string
  - name: answer_1
    dtype: string
  - name: question_2
    dtype: string
  - name: answer_2
    dtype: string
  - name: question_3
    dtype: string
  - name: answer_3
    dtype: string
license: cc-by-4.0
task_categories:
- table-question-answering
- visual-question-answering
language:
- en
- hi
pretty_name: 'TORQUE: Table Oriented Reconstruction and Question-answering Upon dEvanagari'
size_categories:
- 1K<n<10K
---
# TORQUE: Table Oriented Reconstruction and Question-answering Upon dEvanagari
## Curated Benchmark
**TORQUE** (*Table Oriented Reconstruction and Question-answering Upon dEvanagari*) is a **curated Hindi benchmark** for evaluating two tasks:
- **Hindi Table Reconstruction**
- **Hindi Tabular Visual Question Answering (Hindi TabVQA)**
It serves as a testbed for **Vision-Language Models (VLMs)** and **OCR-based pipelines** on **Devanagari-script** tables. The dataset includes both **scanned** tables, taken from government circulars and official reports, and **digital-born** tables, taken from spiritual texts and other printed sources.
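The card does not say how Hindi TabVQA answers are scored. A minimal sketch, assuming exact match after light normalization (Unicode NFC, Devanagari digits mapped to ASCII, trailing decimal zeros dropped, whitespace collapsed), could look like this; `normalize_answer` and `exact_match` are hypothetical helpers, not part of TORQUE.

```python
import re
import unicodedata

# Map Devanagari digits (U+0966..U+096F) to ASCII 0-9.
_DEVANAGARI_DIGITS = str.maketrans("०१२३४५६७८९", "0123456789")
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def _drop_trailing_zeros(match: re.Match) -> str:
    # "15.10" -> "15.1", "100.00" -> "100"; integers are left untouched.
    text = match.group(0)
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


def normalize_answer(answer: str) -> str:
    """Hypothetical normalizer for comparing Hindi TabVQA answers."""
    text = unicodedata.normalize("NFC", answer)
    text = text.translate(_DEVANAGARI_DIGITS)
    text = _NUMBER.sub(_drop_trailing_zeros, text)
    text = re.sub(r"\s*%", "%", text)  # "15.1 %" -> "15.1%"
    text = " ".join(text.split())      # collapse runs of whitespace
    return text.lower()                # no-op for Devanagari, helps Latin text


def exact_match(prediction: str, reference: str) -> bool:
    """True when both answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(reference)
```

Under these rules `exact_match("१५.१०%", "15.1%")` is `True`. Looser matching (for example, ignoring units) would inflate scores, so whatever normalization is used should be reported alongside results.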
---
## Dataset Composition
| Component | Description | Quantity / Details |
|------------|--------------|--------------------|
| **Scanned tables** | From government circulars and official reports | **109** |
| **Digital-born tables** | From spiritual texts and printed sources (MUSTARD subset) | **101** |
| **Structure Type** | Simple / Complex distribution | **149 Simple**, **61 Complex** |
| **QA Pairs** | Hindi question–answer pairs, generated with GPT-oss-20B and manually verified | **422** |
| **HTML Representations** | Generated with ChatGPT-4o | Manually post-corrected |
| **Images** | Cropped table images (`.jpg`) | `torque_images.zip` |
| **Tables** | Total tables | **210** |
---
## Dataset Files
| File | Description |
|------|--------------|
| `torque_images.zip` | All 210 cropped table images in `.jpg` format. |
| `torque_html.csv` | Model-generated raw HTML outputs and the manually post-corrected HTML. |
| `torque_qa_pairs.csv` | Hindi QA pairs mapped to corresponding table images. |
**CSV Columns:**
- `image_name`: unique identifier of the table image
- `Table HTML` / `Manually Post corrected HTML`: raw model-generated and manually post-corrected table HTML, respectively
- `question`, `answer`: Hindi TabVQA question–answer pairs
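The card's metadata schema gives each row up to three QA slots (`question_1`/`answer_1` through `question_3`/`answer_3`), while `torque_qa_pairs.csv` appears to hold one pair per line. The sketch below brings both into the same `(question, answer)` shape. It assumes unused slots are empty or `None` and that the QA CSV has exactly the `image_name`, `question`, and `answer` headers listed above; `qa_pairs` and `group_qa_by_image` are hypothetical helper names.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

QA_SLOTS = 3  # question_1..question_3 / answer_1..answer_3 in the card's schema


def qa_pairs(row: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Flatten one dataset row into (question, answer) pairs.

    Assumption: unused slots are None or blank strings; the card does not say.
    """
    pairs: List[Tuple[str, str]] = []
    for i in range(1, QA_SLOTS + 1):
        question = (row.get(f"question_{i}") or "").strip()
        answer = (row.get(f"answer_{i}") or "").strip()
        if question and answer:  # skip empty or half-filled slots
            pairs.append((question, answer))
    return pairs


def group_qa_by_image(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group a QA CSV with image_name, question, answer columns by image.

    Assumption: these three headers exist exactly as listed on the card.
    """
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for record in csv.DictReader(lines):
        groups[record["image_name"]].append((record["question"], record["answer"]))
    return dict(groups)
```

To read the CSV, pass an open file object: `with open("torque_qa_pairs.csv", newline="", encoding="utf-8") as f: groups = group_qa_by_image(f)`. Rows obtained by iterating a Hugging Face `datasets` split are plain dicts, so `qa_pairs` can be applied to them directly.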
---
## Example Entry
```json
[
  {
    "Image_name": "hin-1.png",
    "Post_corrected_HTML": "<table><tr><td>संकेतक</td><td>औसत*</td><td>सबसे अच्छा प्रदर्शन</td><td>सबसे खराब प्रदर्शन</td></tr><tr><td>स्टंटेड बच्चे ( पांच वर्ष तक की आयु )</td><td>38.7%</td><td>केरल: 19.4%<br>गोवा: 21.3%<br>तमिलनाडू: 23.3%</td><td>उत्तर प्रदेश: 50.4%<br>बिहार: 49.4%<br>झारखंड: 47.4%</td></tr><tr><td>वेस्टेड बच्चे ( पांच वर्ष तक की आयु )</td><td>15.1%</td><td>सिक्किम: 5.1% <br>मणिपुर:7.1%<br>जम्मू और कश्मीर: 8.1</td><td>आंध्र प्रदेश: 19.0%<br>तमिलनाडु: 19.0%<br>गुजरात: 18.7%</td></tr><tr><td>अंडरवेट बच्चे ( पांच वर्ष तक की आयु )</td><td>29.4%</td><td>मणिपुर: 14.1%<br>मिजोरम: 14.8%<br>जम्मू और कश्मीर: 15.6%</td><td>झारखंड: 42.1%<br>बिहार: 37.1%<br>मध्य प्रदेश: 36.1%</td></tr><tr><td colspan=\"4\">\n नोट: *संबंधित जनसंख्या का प्रतिशत\n स्रोत: रैपिड सर्वे ऑफ चिल्ड्रन (आरएसओसी), 2014\n </td></tr></table>",
    "question": "वेस्टेड बच्चों का औसत प्रतिशत क्या है?",
    "answer": "15.10%"
  }
]
```
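The card does not state how reconstructed HTML is compared against the post-corrected reference. As a deliberately simple, hypothetical sketch, the code below uses Python's standard `html.parser` to turn a `Post_corrected_HTML` string like the one above into a rectangular grid of cell texts (expanding `colspan`/`rowspan`, treating `<br>` as a space) and scores a prediction by position-wise cell accuracy. `html_table_to_grid` and `cell_accuracy` are illustrative names, not TORQUE's official metric.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class _CellCollector(HTMLParser):
    """Collects <td>/<th> text per <tr>, recording rowspan/colspan."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Tuple[str, int, int]]] = []  # (text, rowspan, colspan)
        self._cell: Optional[List[str]] = None
        self._span = (1, 1)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell before any <tr>
                self.rows.append([])
            self._cell = []
            self._span = (int(attr.get("rowspan") or 1), int(attr.get("colspan") or 1))
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")  # treat <br> as a word boundary

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self.rows[-1].append((text, *self._span))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_grid(html: str) -> List[List[str]]:
    """Parse one HTML table into a rectangular grid of cell texts."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    slots: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in slots:  # skip slots taken by an earlier rowspan
                c += 1
            # Copy the text into every slot the cell spans.
            for dr in range(rowspan):
                for dc in range(colspan):
                    slots[(r + dr, c + dc)] = text
            c += colspan
    if not slots:
        return []
    n_rows = max(r for r, _ in slots) + 1
    n_cols = max(c for _, c in slots) + 1
    return [[slots.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def cell_accuracy(pred_html: str, gold_html: str) -> float:
    """Share of gold grid positions whose text the prediction reproduces exactly."""
    pred, gold = html_table_to_grid(pred_html), html_table_to_grid(gold_html)
    total = sum(len(row) for row in gold)
    if total == 0:
        return 0.0
    hits = sum(
        1
        for r, row in enumerate(gold)
        for c, cell in enumerate(row)
        if r < len(pred) and c < len(pred[r]) and pred[r][c] == cell
    )
    return hits / total
```

Copying a spanned cell's text into every slot it covers keeps the grid rectangular, so two tables can be compared position by position. The trade-off is that a single mis-predicted span shifts every later cell in its row, which this metric penalizes heavily; structure-aware metrics are less sensitive to that.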
Provided by: bharatgenai