
large-traversaal/openbookqa_urdu_final

Source: Hugging Face. Updated: 2026-03-05. Indexed: 2026-04-05.
Download link:
https://hf-mirror.com/datasets/large-traversaal/openbookqa_urdu_final
Resource description:
# Dataset Card: OpenBookQA Urdu

## Dataset Summary

`openbookqa_urdu_cleaned` is a cleaned Urdu translation of the **OpenBookQA** dataset, a multiple-choice question answering benchmark designed to test **elementary science understanding combined with commonsense reasoning**. Each example consists of a question and four answer options, with exactly one correct answer.

The dataset provides Urdu translations of questions and answer choices, enabling evaluation and training of **Urdu and multilingual language models** on scientific reasoning tasks in a low-resource language setting.

## Dataset Details

* **Dataset Name:** openbookqa_urdu_cleaned
* **Maintained by:** large-traversaal (Traversaal.ai)
* **Task Type:** Multiple-choice question answering
* **Domain:** Elementary science and commonsense reasoning
* **Languages:** Urdu (primary), English (where original fields are retained)
* **Format:** Parquet
* **Answer Choices:** 4 per question

## Dataset Structure

Each record typically contains the following fields:

* `id`: Unique example identifier
* `question`: Urdu translation of the question
* `choices`: Answer options (four choices, labeled A–D)
* `answerKey`: Correct answer label (A, B, C, or D)
* `english_question` (optional): Original English question
* `english_choices` (optional): Original English answer options

Exact field names may vary slightly depending on split and preprocessing version.

## Intended Uses

This dataset is intended for:

* Training and evaluating Urdu and multilingual QA models
* Benchmarking reasoning performance on science-based questions
* Cross-lingual transfer learning from English to Urdu
* Research in low-resource language understanding and reasoning

## Loading the Dataset

```python
from datasets import load_dataset

ds = load_dataset("large-traversaal/openbookqa_urdu_cleaned")
```

## Licensing and Usage

Licensing follows the terms of the original OpenBookQA dataset. Users should verify license details on the Hugging Face dataset page before redistribution or commercial use.

## 📄 Citation

If you use this dataset in your research, please cite the **UrduBench paper**:

```bibtex
@misc{shafique2026urdubenchurdureasoningbenchmark,
  title={UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop},
  author={Muhammad Ali Shafique and Areej Mehboob and Layba Fiaz and Muhammad Usman Qadeer and Hamza Farooq},
  year={2026},
  eprint={2601.21000},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.21000}
}
```
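
## Example: Formatting a Record

The fields listed under Dataset Structure could be turned into a multiple-choice prompt and scored roughly as sketched below. This is an illustrative sketch, not part of the dataset's tooling. The helper names (`normalize_choices`, `format_prompt`, `accuracy`) are hypothetical. The card does not specify the layout of `choices`, so the code assumes it is either a `{"label": [...], "text": [...]}` mapping (as in the English OpenBookQA release) or a plain list of four strings. Since the card notes that field names may vary, inspect a real record before relying on it.

```python
from typing import Any, Dict, Sequence

# Answer labels as described in the card (A, B, C, or D).
LABELS = ["A", "B", "C", "D"]


def normalize_choices(choices: Any) -> Dict[str, str]:
    """Return a {label: text} mapping for one record's answer options."""
    # Assumed layout 1: {"label": [...], "text": [...]}.
    if isinstance(choices, dict):
        return dict(zip(choices["label"], choices["text"]))
    # Assumed layout 2: a plain list of strings, labeled A-D by position.
    return dict(zip(LABELS, choices))


def format_prompt(record: Dict[str, Any]) -> str:
    """Render the question followed by one 'LABEL. text' line per option."""
    options = normalize_choices(record["choices"])
    lines = [record["question"]]
    lines += [f"{label}. {text}" for label, text in options.items()]
    return "\n".join(lines)


def accuracy(predictions: Sequence[str], records: Sequence[Dict[str, Any]]) -> float:
    """Fraction of predicted labels that match each record's answerKey."""
    if not records:
        return 0.0
    correct = sum(p == r["answerKey"] for p, r in zip(predictions, records))
    return correct / len(records)
```

With a loaded split, `format_prompt(ds[split_name][0])` would produce the prompt text for the first record, where `split_name` is whichever split the repository actually exposes.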
Provided by:
large-traversaal