
krutrim-ai-labs/IndicVisionBench

Source: Hugging Face. Updated 2026-03-11; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/krutrim-ai-labs/IndicVisionBench
Resource description:
---
dataset_info:
- config_name: mmt
  features:
  - name: id
    dtype: string
  - name: image
    dtype: image
  - name: topic
    dtype: string
  - name: State/UT
    dtype: string
  - name: English
    dtype: string
  - name: Hindi
    dtype: string
  - name: Bengali
    dtype: string
  - name: Gujarati
    dtype: string
  - name: Kannada
    dtype: string
  - name: Malayalam
    dtype: string
  - name: Marathi
    dtype: string
  - name: Odia
    dtype: string
  - name: Punjabi
    dtype: string
  - name: Tamil
    dtype: string
  - name: Telugu
    dtype: string
  - name: source_url
    dtype: string
  splits:
  - name: test
    num_bytes: 14424797
    num_examples: 106
  download_size: 13255747
  dataset_size: 14424797
- config_name: ocr
  features:
  - name: id
    dtype: string
  - name: image
    dtype: image
  - name: text
    dtype: string
  - name: language
    dtype: string
  - name: page_url
    dtype: string
  splits:
  - name: test
    num_bytes: 614014454
    num_examples: 876
  download_size: 612223184
  dataset_size: 614014454
- config_name: vqa_en
  features:
  - name: id
    dtype: string
  - name: image
    dtype: image
  - name: topic
    dtype: string
  - name: State/UT
    dtype: string
  - name: language
    dtype: string
  - name: short_q1
    dtype: string
  - name: short_a1
    dtype: string
  - name: short_q2
    dtype: string
  - name: short_a2
    dtype: string
  - name: mcq
    dtype: string
  - name: mcq_a
    dtype: string
  - name: mcq_opt1
    dtype: string
  - name: mcq_opt2
    dtype: string
  - name: mcq_opt3
    dtype: string
  - name: mcq_opt4
    dtype: string
  - name: true_false_q
    dtype: string
  - name: true_false_a
    dtype: string
  - name: long_q
    dtype: string
  - name: long_a
    dtype: string
  - name: adversarial_question
    dtype: string
  - name: adversarial_answer
    dtype: string
  - name: source_url
    dtype: string
  splits:
  - name: test
    num_bytes: 1131332865
    num_examples: 4117
  download_size: 1127187152
  dataset_size: 1131332865
- config_name: vqa_indic
  features:
  - name: id
    dtype: string
  - name: image
    dtype: image
  - name: topic
    dtype: string
  - name: State/UT
    dtype: string
  - name: language
    dtype: string
  - name: short_q1
    dtype: string
  - name: short_a1
    dtype: string
  - name: short_q2
    dtype: string
  - name: short_a2
    dtype: string
  - name: mcq
    dtype: string
  - name: mcq_a
    dtype: string
  - name: mcq_opt1
    dtype: string
  - name: mcq_opt2
    dtype: string
  - name: mcq_opt3
    dtype: string
  - name: mcq_opt4
    dtype: string
  - name: true_false_q
    dtype: string
  - name: true_false_a
    dtype: string
  - name: long_q
    dtype: string
  - name: long_a
    dtype: string
  - name: adversarial_question
    dtype: string
  - name: adversarial_answer
    dtype: string
  - name: source_url
    dtype: string
  splits:
  - name: test
    num_bytes: 276711951
    num_examples: 1007
  download_size: 273419974
  dataset_size: 276711951
- config_name: vqa_parallel
  features:
  - name: id
    dtype: string
  - name: image
    dtype: image
  - name: topic
    dtype: string
  - name: State/UT
    dtype: string
  - name: language
    dtype: string
  - name: short_q1
    dtype: string
  - name: short_a1
    dtype: string
  - name: short_q2
    dtype: string
  - name: short_a2
    dtype: string
  - name: mcq
    dtype: string
  - name: mcq_a
    dtype: string
  - name: mcq_opt1
    dtype: string
  - name: mcq_opt2
    dtype: string
  - name: mcq_opt3
    dtype: string
  - name: mcq_opt4
    dtype: string
  - name: true_false_q
    dtype: string
  - name: true_false_a
    dtype: string
  - name: long_q
    dtype: string
  - name: long_a
    dtype: string
  - name: adversarial_question
    dtype: string
  - name: adversarial_answer
    dtype: string
  - name: source_url
    dtype: string
  splits:
  - name: test
    num_bytes: 324650384
    num_examples: 1166
  download_size: 321701661
  dataset_size: 324650384
configs:
- config_name: mmt
  data_files:
  - split: test
    path: mmt/test-*
- config_name: ocr
  data_files:
  - split: test
    path: ocr/test-*
- config_name: vqa_en
  data_files:
  - split: test
    path: vqa_en/test-*
- config_name: vqa_indic
  data_files:
  - split: test
    path: vqa_indic/test-*
- config_name: vqa_parallel
  data_files:
  - split: test
    path: vqa_parallel/test-*
task_categories:
- visual-question-answering
language:
- en
- hi
- ta
- te
- ml
- mr
- gu
- pa
- or
- kn
- bn
tags:
- vision
- ocr
- vqa
- indic
- benchmark
- cultural
- mmt
- multimodal
size_categories:
- 10K<n<100K
---

# IndicVisionBench

[![ICLR 2026](https://img.shields.io/badge/ICLR-2026-blue)](https://openreview.net/forum?id=LmJoLn04iL)
[![arXiv](https://img.shields.io/badge/arXiv-2511.04727-b31b1b.svg)](https://arxiv.org/abs/2511.04727)
[![IndicVisionBench-Github](https://img.shields.io/badge/Github-IndicVisionBench-green?logo=github)](https://github.com/ola-krutrim/IndicVisionBench)

This repository contains the dataset for **IndicVisionBench**, introduced in
**“IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs”**

📄 [arXiv:2511.04727](https://arxiv.org/abs/2511.04727)
🏛️ Accepted at **ICLR 2026**
🔗 OpenReview: https://openreview.net/forum?id=LmJoLn04iL

IndicVisionBench is a **culturally grounded, multilingual vision-language benchmark** designed to evaluate Vision–Language Models (VLMs) on visual understanding tasks in the Indian context.

The benchmark focuses on:

- Multilingual Visual Question Answering (VQA)
- Culturally-aware reasoning
- Adversarial robustness
- Parallel cross-lingual consistency
- Optical Character Recognition (OCR) in Indic scripts
- Multimodal Machine Translation (MMT)

Unlike generic VQA datasets, IndicVisionBench emphasizes **Indian cultural context, regional diversity, and Indic language coverage**, enabling systematic evaluation of multilingual and culturally-aware VLMs.
---

## Languages Covered

- English
- Hindi
- Tamil
- Telugu
- Malayalam
- Marathi
- Gujarati
- Punjabi
- Odia
- Kannada
- Bengali

---

## Benchmark Overview

IndicVisionBench consists of five main configurations:

| Config | Task | #Images | Description |
|--------|------|-----------|-------------|
| `mmt` | Multimodal Machine Translation | 106 | Image-grounded translations across Indic languages |
| `ocr` | Optical Character Recognition | 876 | OCR in multiple Indic scripts |
| `vqa_en` | Visual Question Answering | 4,117 | Culturally grounded VQA in English |
| `vqa_indic` | Visual Question Answering | 1,007 | Culturally grounded VQA in Indic languages |
| `vqa_parallel` | Visual Question Answering | 1,166 | Same QA pairs across multiple languages for cross-lingual consistency |

- **Total images across all configs:** 4,993
- **Total questions across VQA En, Indic and Parallel:** (4,117 + 1,007 + 1,166) × 6 = 37,740

---

## Subset Descriptions

### 1️⃣ Multimodal Machine Translation (`mmt`)

Image-grounded translation benchmark with aligned captions across multiple Indic languages.

**Features:**

- `image`
- `topic`
- `State/UT`
- Parallel captions in 11 languages
- `source_url`

This subset evaluates:

- Cultural terminology consistency
- Visual grounding in translation

### 2️⃣ Optical Character Recognition (`ocr`)

OCR dataset consisting of scanned pages in Indic scripts from Wikisource.

**Features:**

- `image`
- `text`
- `language`
- `page_url`

This subset evaluates OCR capabilities on Indic scripts/languages.

### 3️⃣ English VQA (`vqa_en`)

Culturally grounded VQA in English.
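For a sense of scale, the question total quoted in the Benchmark Overview can be re-derived from the per-config record counts and the six questions that each VQA record carries. The snippet below is only a sanity check of that arithmetic. The variable names are illustrative and are not part of the dataset or its codebase.

```python
# Record counts copied from the Benchmark Overview table.
VQA_RECORDS = {"vqa_en": 4117, "vqa_indic": 1007, "vqa_parallel": 1166}

# Each VQA record has 2 short-answer, 1 MCQ, 1 true/false,
# 1 long-form and 1 adversarial question.
QUESTIONS_PER_RECORD = 2 + 1 + 1 + 1 + 1

# Questions contributed by each VQA config, then the overall total.
per_config = {name: n * QUESTIONS_PER_RECORD for name, n in VQA_RECORDS.items()}
total_questions = sum(per_config.values())

print(per_config)
print(total_questions)
```

Under these counts `vqa_en` alone contributes 24,702 questions, and the three VQA configs together give the 37,740 stated in the overview.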
Each example includes:

- 2 short-answer questions
- 1 multiple-choice question (4 options)
- 1 true/false question
- 1 long-form reasoning question
- 1 adversarial question
- Metadata: `topic`, `language`, `State/UT`, `source_url`

This subset evaluates:

- Object & scene understanding
- Cultural knowledge
- Fine-grained attribute recognition
- Robustness to false assumptions in the adversarial questions

### 4️⃣ Indic VQA (`vqa_indic`)

Same VQA format as in `vqa_en`, but in Indic languages.

This subset evaluates:

- Multilingual reasoning
- Cultural alignment in local languages

### 5️⃣ Parallel VQA (`vqa_parallel`)

Same VQA format as in `vqa_en`, with parallel multilingual QA pairs for the same image.

This subset enables the study of:

- Cross-lingual performance of VLMs across 11 languages (English and 10 Indic languages)
- Region-specific strengths or biases

## Usage

All configurations can be loaded using `datasets`:

```python
from datasets import load_dataset

# Example: load the test split of the English VQA config
ds = load_dataset("krutrim-ai-labs/IndicVisionBench", "vqa_en")["test"]
print(ds[0])
```

The dataset has five configurations, each with a single `test` split:

- mmt
- ocr
- vqa_en
- vqa_indic
- vqa_parallel

Images are stored directly within the dataset and loaded automatically by 🤗 Datasets.

## Evaluation Dimensions

IndicVisionBench is designed to measure:

- Scene & contextual understanding
- Attribute detection
- Cultural understanding
- Bias & adversarial robustness
- Cross-lingual consistency
- OCR performance
- Image-grounded translation capability

## Code & Evaluation

The official inference and evaluation codebase for IndicVisionBench is available on GitHub.

**GitHub Repository:** [https://github.com/ola-krutrim/IndicVisionBench](https://github.com/ola-krutrim/IndicVisionBench)

The repository provides the complete pipeline for running inference and reproducing benchmark results across all evaluation tracks.
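As a small illustration that is independent of the official codebase, the six-question record layout used by the VQA configs can be flattened into individual question/answer items before being passed to a model. This is a minimal sketch: `QA_FIELDS`, `flatten_record`, and the sample record are hypothetical names and placeholder values introduced here, and only the column names are taken from the dataset card.

```python
from typing import Dict, List

# (question column, answer column, question type), using the column names
# listed in the dataset card for vqa_en, vqa_indic and vqa_parallel.
QA_FIELDS = [
    ("short_q1", "short_a1", "short"),
    ("short_q2", "short_a2", "short"),
    ("mcq", "mcq_a", "mcq"),
    ("true_false_q", "true_false_a", "true_false"),
    ("long_q", "long_a", "long"),
    ("adversarial_question", "adversarial_answer", "adversarial"),
]


def flatten_record(record: Dict[str, str]) -> List[Dict[str, object]]:
    """Turn one VQA record into six question/answer items."""
    items = []
    for q_col, a_col, q_type in QA_FIELDS:
        item = {
            "id": record["id"],
            "type": q_type,
            "question": record[q_col],
            "answer": record[a_col],
        }
        if q_type == "mcq":
            # The four answer options live in mcq_opt1 .. mcq_opt4.
            item["options"] = [record[f"mcq_opt{i}"] for i in range(1, 5)]
        items.append(item)
    return items


# Placeholder record for illustration only (not real benchmark data).
sample = {"id": "demo-0"}
for q_col, a_col, _ in QA_FIELDS:
    sample[q_col] = f"<{q_col}>"
    sample[a_col] = f"<{a_col}>"
for i in range(1, 5):
    sample[f"mcq_opt{i}"] = f"<option {i}>"

for item in flatten_record(sample):
    print(item["type"], item["question"], item["answer"])
```

The same function could be applied to rows returned by `load_dataset`, since those rows are dictionaries keyed by the column names above. The `image` column is left untouched here and would need to be carried along separately.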
The codebase includes:

- End-to-end inference pipelines for **Vision-Language Models (VLMs)** and **OCR systems**
- Modular wrappers enabling easy integration of **API-based models** and **open-source models**
- Evaluation pipelines for all benchmark tasks:
  - **OCR evaluation**
  - **Visual Question Answering (VQA)**
    - Structured questions (MCQ, True/False)
    - Open-ended questions (short answer, long answer, adversarial)
  - **Multimodal Machine Translation (MMT)**
- **LLM-as-a-judge evaluation** for open-ended VQA responses
- Data generation scripts for constructing a similar multimodal benchmark.

### Citation

If you use this dataset, please cite:

```bibtex
@inproceedings{faraz2026indicvisionbench,
  title={IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs},
  author={Ali Faraz and Akash and Shaharukh Khan and Raja Kolla and Akshat Patidar and Suranjan Goswami and Abhinav Ravi and Chandra Khatri and Shubham Agarwal},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026},
  url={https://openreview.net/forum?id=LmJoLn04iL}
}
```
Provided by:
krutrim-ai-labs