
bharatgenai/TORQUE

Hugging Face | Updated: 2025-11-22 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/bharatgenai/TORQUE
Resource description:
---
dataset_info:
  features:
  - name: image_name
    dtype: string
  - name: image
    dtype: image
  - name: post_corrected_html
    dtype: string
  - name: question_1
    dtype: string
  - name: answer_1
    dtype: string
  - name: question_2
    dtype: string
  - name: answer_2
    dtype: string
  - name: question_3
    dtype: string
  - name: answer_3
    dtype: string
license: cc-by-4.0
task_categories:
- table-question-answering
- visual-question-answering
language:
- en
- hi
pretty_name: 'TORQUE: Table Oriented Reconstruction and Question-answering Upon dEvanagari'
size_categories:
- 1K<n<10K
---

# TORQUE: Table Oriented Reconstruction and Question-answering Upon dEvanagari

## Curated Benchmark

**TORQUE** (*Table Oriented Reconstruction and Question-answering Upon dEvanagari*) is a **curated Hindi benchmark** for evaluating:

- **Hindi Table Reconstruction**, and
- **Hindi Tabular Visual Question Answering (Hindi TabVQA)**.

It provides a testbed for evaluating **Vision-Language Models (VLMs)** and **OCR-based pipelines** on **Devanagari-script** tables. The dataset includes both **scanned** and **digital-born** tables, covering a range of real-world document types.

---

## Dataset Composition

| Component | Description | Quantity / Details |
|-----------|-------------|--------------------|
| **Scanned tables** | From government circulars and official reports | **109** |
| **Digital-born tables** | From spiritual texts and printed sources (MUSTARD subset) | **101** |
| **Structure type** | Simple vs. complex table structure | **149 simple**, **61 complex** |
| **QA pairs** | Hindi question-answer pairs generated with GPT-oss-20B and manually verified | **422** |
| **HTML representations** | Generated with ChatGPT-4o | Manually post-corrected |
| **Images** | Cropped table images (`.jpg`) | `images.zip` |
| **Tables** | Total tables | **210** |

---

## Dataset Files

| File | Description |
|------|-------------|
| `torque_images.zip` | All 210 cropped table images in `.jpg` format. |
| `torquw_html.csv` | Raw model-generated HTML outputs alongside the manually post-corrected HTML. |
| `torque_qa_pairs.csv` | Hindi QA pairs mapped to the corresponding table images. |

**CSV Columns:**

- `image_name`: unique identifier
- `Table HTML` / `Manually Post corrected HTML`: raw model-generated and manually post-corrected table HTML, respectively
- `question`, `answer`: TabVQA pairs in Hindi

---

## Example Entry

```json
[
  {
    "Image_name": "hin-1.png",
    "Post_corrected_HTML": "<table><tr><td>संकेतक</td><td>औसत*</td><td>सबसे अच्छा प्रदर्शन</td><td>सबसे खराब प्रदर्शन</td></tr><tr><td>स्टंटेड बच्चे ( पांच वर्ष तक की आयु )</td><td>38.7%</td><td>केरल: 19.4%<br>गोवा: 21.3%<br>तमिलनाडू: 23.3%</td><td>उत्तर प्रदेश: 50.4%<br>बिहार: 49.4%<br>झारखंड: 47.4%</td></tr><tr><td>वेस्टेड बच्चे ( पांच वर्ष तक की आयु )</td><td>15.1%</td><td>सिक्किम: 5.1% <br>मणिपुर:7.1%<br>जम्मू और कश्मीर: 8.1</td><td>आंध्र प्रदेश: 19.0%<br>तमिलनाडु: 19.0%<br>गुजरात: 18.7%</td></tr><tr><td>अंडरवेट बच्चे ( पांच वर्ष तक की आयु )</td><td>29.4%</td><td>मणिपुर: 14.1%<br>मिजोरम: 14.8%<br>जम्मू और कश्मीर: 15.6%</td><td>झारखंड: 42.1%<br>बिहार: 37.1%<br>मध्य प्रदेश: 36.1%</td></tr><tr><td colspan=\"4\">\n नोट: *संबंधित जनसंख्या का प्रतिशत\n स्रोत: रैपिड सर्वे ऑफ चिल्ड्रन (आरएसओसी), 2014\n </td></tr></table>",
    "question": "वेस्टेड बच्चों का औसत प्रतिशत क्या है?",
    "answer": "15.10%"
  }
]
```
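The `dataset_info` features in the card store up to three QA pairs per table in wide columns (`question_1`/`answer_1` through `question_3`/`answer_3`), while the table above reports 422 QA pairs for 210 tables, so not every slot is necessarily filled. The sketch below turns such rows into one record per question. It assumes each row is a plain Python dict keyed by the feature names (as rows of a Hugging Face `datasets` split are) and that unused slots hold `None` or an empty string; both are assumptions, and `flatten_qa` is a hypothetical helper, not part of the dataset's tooling.

```python
from typing import Any, Dict, Iterable, Iterator

# Number of wide QA slots in the dataset_info schema (question_1..question_3).
NUM_QA_SLOTS = 3


def flatten_qa(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, str]]:
    """Yield one {image_name, question, answer} record per non-empty QA slot.

    Hypothetical helper: assumes unused slots are None, empty, or missing.
    """
    for row in rows:
        for i in range(1, NUM_QA_SLOTS + 1):
            question = (row.get(f"question_{i}") or "").strip()
            answer = (row.get(f"answer_{i}") or "").strip()
            if question:  # skip empty slots
                yield {
                    "image_name": row["image_name"],
                    "question": question,
                    "answer": answer,
                }


# Made-up row; the question/answer text is taken from the example entry.
sample = {
    "image_name": "hin-1.png",
    "question_1": "वेस्टेड बच्चों का औसत प्रतिशत क्या है?",
    "answer_1": "15.10%",
    "question_2": "",
    "answer_2": "",
    "question_3": None,
    "answer_3": None,
}
records = list(flatten_qa([sample]))
print(len(records))  # 1
```

A long format like this maps directly onto the `question`, `answer` columns described for the CSV files, which makes it easy to score every question the same way.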
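For the table-reconstruction task, the post-corrected HTML in the example entry is the reference structure. One way to inspect or compare such HTML is to parse it into a grid of cell strings. The sketch below does this with only Python's standard `html.parser`, treating `<br>` as a line break and repeating a cell across its `colspan` (as in the example's note row). It is a simplification under stated assumptions: `rowspan` and nested tables are ignored, and `TableGridParser` / `html_to_grid` are hypothetical names, not the benchmark's evaluation code.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect <td>/<th> text into rows, repeating a cell across its colspan.

    Simplifications (assumptions): rowspan and nested tables are not handled.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None
        self._colspan = 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
            span = dict(attrs).get("colspan") or "1"
            self._colspan = int(span) if span.isdigit() else 1
        elif tag == "br" and self._cell is not None:
            self._cell.append("\n")  # treat <br> as a line break inside the cell

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse spaces within each line and drop blank lines; both <br>
            # and literal newlines in the source count as line breaks.
            lines = (" ".join(p.split()) for p in "".join(self._cell).split("\n"))
            cell = "\n".join(line for line in lines if line)
            self._row.extend([cell] * self._colspan)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_grid(html: str) -> list[list[str]]:
    """Parse one HTML table string into a list of rows of cell strings."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Excerpt of the example entry's HTML: header, one data row, and the note row.
html = (
    "<table><tr><td>संकेतक</td><td>औसत*</td><td>सबसे अच्छा प्रदर्शन</td>"
    "<td>सबसे खराब प्रदर्शन</td></tr>"
    "<tr><td>वेस्टेड बच्चे ( पांच वर्ष तक की आयु )</td><td>15.1%</td>"
    "<td>सिक्किम: 5.1% <br>मणिपुर:7.1%<br>जम्मू और कश्मीर: 8.1</td>"
    "<td>आंध्र प्रदेश: 19.0%<br>तमिलनाडु: 19.0%<br>गुजरात: 18.7%</td></tr>"
    '<tr><td colspan="4">\n नोट: *संबंधित जनसंख्या का प्रतिशत\n'
    " स्रोत: रैपिड सर्वे ऑफ चिल्ड्रन (आरएसओसी), 2014\n </td></tr></table>"
)
grid = html_to_grid(html)
print(len(grid), [len(row) for row in grid])  # 3 [4, 4, 4]
print(grid[1][1])  # 15.1%
```

Expanding `colspan` gives every row the same width, so a predicted grid and a reference grid can be compared cell by cell; a structure-aware metric would be needed to credit partially correct spans.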
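In the example entry, the reference answer is written `15.10%` while the table cell reads `15.1%`, so a strict string comparison would reject an otherwise correct prediction. The card does not state which TabVQA metric is used; the sketch below shows one possible choice, a normalized exact match that maps Devanagari digits to ASCII, collapses whitespace, and drops trailing fractional zeros. The helper names and normalization rules are assumptions for illustration, not the benchmark's official scoring.

```python
import re

# Devanagari digits U+0966..U+096F mapped to ASCII 0..9.
_DEVANAGARI_TO_ASCII = str.maketrans(
    "".join(chr(0x0966 + i) for i in range(10)), "0123456789"
)
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def _canonical_number(match: re.Match) -> str:
    """Drop trailing fractional zeros: '15.10' -> '15.1', '19.0' -> '19'."""
    number = match.group()
    if "." in number:
        number = number.rstrip("0").rstrip(".")
    return number


def normalize_answer(text: str) -> str:
    """Map Devanagari digits to ASCII, lower-case, collapse spaces, canonicalize numbers."""
    text = " ".join(text.translate(_DEVANAGARI_TO_ASCII).lower().split())
    return _NUMBER.sub(_canonical_number, text)


def answers_match(prediction: str, reference: str) -> bool:
    """Normalized exact match (hypothetical metric, not the official one)."""
    return normalize_answer(prediction) == normalize_answer(reference)


print(answers_match("15.1%", "15.10%"))  # True
print(answers_match("१५.१%", "15.10%"))  # True
print(answers_match("15.1%", "19.0%"))  # False
```

Comparing numbers by their written digits rather than converting to `float` avoids rounding surprises on large values; thousands separators (including Indian-style grouping) are not handled.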
Provided by:
bharatgenai