large-traversaal/openbookqa_urdu_final

Name: large-traversaal/openbookqa_urdu_final
Creator: large-traversaal
Published: 2026-03-05 17:38:36
License: No description provided

Source: Hugging Face (updated 2026-03-05, indexed 2026-04-05)

Download link:

https://hf-mirror.com/datasets/large-traversaal/openbookqa_urdu_final

Description:

# Dataset Card: OpenBookQA Urdu

## Dataset Summary

`openbookqa_urdu_cleaned` is a cleaned Urdu translation of the **OpenBookQA** dataset, a multiple-choice question answering benchmark designed to test **elementary science understanding combined with commonsense reasoning**. Each example consists of a question and four answer options, exactly one of which is correct.

The dataset provides Urdu translations of the questions and answer choices, enabling training and evaluation of **Urdu and multilingual language models** on scientific reasoning in a low-resource language setting.

## Dataset Details

* **Dataset Name:** openbookqa_urdu_cleaned
* **Maintained by:** large-traversaal (Traversaal.ai)
* **Task Type:** Multiple-choice question answering
* **Domain:** Elementary science and commonsense reasoning
* **Languages:** Urdu (primary), English (where original fields are retained)
* **Format:** Parquet
* **Answer Choices:** 4 per question

## Dataset Structure

Each record typically contains the following fields:

* `id`: Unique example identifier
* `question`: Urdu translation of the question
* `choices`: Answer options (four choices, labeled A–D)
* `answerKey`: Label of the correct answer (A, B, C, or D)
* `english_question` (optional): Original English question
* `english_choices` (optional): Original English answer options

Exact field names may vary slightly depending on the split and preprocessing version.

## Intended Uses

This dataset is intended for:

* Training and evaluating Urdu and multilingual QA models
* Benchmarking reasoning performance on science-based questions
* Cross-lingual transfer learning from English to Urdu
* Research in low-resource language understanding and reasoning

## Loading the Dataset

```python
from datasets import load_dataset

# Repository ID of this dataset page on the Hugging Face Hub
ds = load_dataset("large-traversaal/openbookqa_urdu_final")
```

## Licensing and Usage

Licensing follows the terms of the original OpenBookQA dataset. Users should verify the license details on the Hugging Face dataset page before redistribution or commercial use.

## 📄 Citation

If you use this dataset in your research, please cite the **UrduBench paper**:

```bibtex
@misc{shafique2026urdubenchurdureasoningbenchmark,
  title={UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop},
  author={Muhammad Ali Shafique and Areej Mehboob and Layba Fiaz and Muhammad Usman Qadeer and Hamza Farooq},
  year={2026},
  eprint={2601.21000},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.21000}
}
```
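As a rough illustration of the multiple-choice evaluation use case, the sketch below turns a record into a lettered prompt, looks up the gold answer text from `answerKey`, and scores predicted labels by accuracy. It assumes that `choices` follows the layout of the original English OpenBookQA on the Hub (a dict with parallel `text` and `label` lists); the card does not confirm this, so check it against real records after loading. The helper names `format_prompt`, `gold_text`, and `accuracy`, the `Answer:` prompt suffix, and the sample record are all hypothetical and chosen for illustration only.

```python
from typing import Dict, Sequence

# Assumption: `choices` mirrors the original OpenBookQA layout, i.e.
# {"text": [...], "label": ["A", "B", "C", "D"]}. Verify against the actual
# records, since field names may differ between splits or versions.


def format_prompt(record: Dict) -> str:
    """Render one record as a lettered multiple-choice prompt (hypothetical format)."""
    choices = record["choices"]
    lines = [record["question"]]
    for label, text in zip(choices["label"], choices["text"]):
        lines.append(f"{label}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


def gold_text(record: Dict) -> str:
    """Return the text of the option whose label matches `answerKey`."""
    choices = record["choices"]
    idx = choices["label"].index(record["answerKey"])
    return choices["text"][idx]


def accuracy(records: Sequence[Dict], predictions: Sequence[str]) -> float:
    """Fraction of predicted labels (A-D, case-insensitive) equal to `answerKey`."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    if not records:
        return 0.0
    correct = sum(
        pred.strip().upper() == rec["answerKey"]
        for rec, pred in zip(records, predictions)
    )
    return correct / len(records)


if __name__ == "__main__":
    # Hypothetical placeholder record, not taken from the dataset.
    sample = {
        "id": "example-1",
        "question": "<Urdu question text>",
        "choices": {
            "text": ["<option 1>", "<option 2>", "<option 3>", "<option 4>"],
            "label": ["A", "B", "C", "D"],
        },
        "answerKey": "C",
    }
    print(format_prompt(sample))
    print(gold_text(sample))  # <option 3>
    print(accuracy([sample, sample], ["c", "A"]))  # 0.5
```

With a loaded split, the same helpers can be applied to `list(split)` together with one predicted label per record. Scoring on labels rather than option text keeps the metric independent of how a model phrases its answer in Urdu.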

Provider:

large-traversaal