boolq-indic

Name: boolq-indic
Creator: maas
Published: 2025-11-27 16:35:00
License: No description provided

ModelScope community · Updated 2025-11-27 · Indexed 2025-05-31

Download link:

https://modelscope.cn/datasets/sarvamai/boolq-indic


Resource description:

# Indic BoolQ Dataset

A multilingual version of the [BoolQ](https://huggingface.co/datasets/google/boolq) (Boolean Questions) dataset, translated from English into 10 Indian languages. It is a question-answering dataset for yes/no questions containing ~12k naturally occurring questions.

### Languages Covered

The dataset includes translations in the following languages:

- Bengali (bn)
- Gujarati (gu)
- Hindi (hi)
- Kannada (kn)
- Marathi (mr)
- Malayalam (ml)
- Oriya (or)
- Punjabi (pa)
- Tamil (ta)
- Telugu (te)

### Dataset Format

Each example contains:

- `question`: A yes/no question in the target language
- `passage`: A passage providing context for the question
- `answer`: Yes/No
- `label`: 1 for 'yes' and 0 for 'no'
- `language`: ISO 639-1 language code

## Dataset Statistics

- Total number of examples: ~140k
- Split sizes match the original BoolQ dataset:
  - Training: 9,427 examples per language
  - Validation: 3,270 examples per language

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("sarvamai/boolq-indic")
```

## License

This dataset follows the same license as the original BoolQ dataset.

## Acknowledgments

- Original BoolQ dataset creators
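
## Example: working with the documented fields

The field list and split sizes above can be exercised without downloading anything. The sketch below is illustrative only: `SAMPLE_ROWS` holds hypothetical placeholder rows rather than real dataset content, the helper names are invented for this example, and it assumes `answer` is stored as a "Yes"/"No" string (the card does not specify the exact encoding). It shows how one might select a single language, check that `label` agrees with `answer`, and compute the example count implied by the per-language split sizes.

```python
from collections import Counter

# Split sizes per language, as stated in the dataset card.
TRAIN_PER_LANGUAGE = 9427
VALIDATION_PER_LANGUAGE = 3270

# Hypothetical placeholder rows that mimic the documented fields.
# Real rows would come from load_dataset("sarvamai/boolq-indic").
SAMPLE_ROWS = [
    {"question": "<Hindi question 1>", "passage": "<Hindi passage 1>",
     "answer": "Yes", "label": 1, "language": "hi"},
    {"question": "<Hindi question 2>", "passage": "<Hindi passage 2>",
     "answer": "No", "label": 0, "language": "hi"},
    {"question": "<Tamil question 1>", "passage": "<Tamil passage 1>",
     "answer": "Yes", "label": 1, "language": "ta"},
]


def filter_by_language(rows, code):
    """Keep only rows whose `language` field equals the given ISO 639-1 code."""
    return [row for row in rows if row["language"] == code]


def label_matches_answer(row):
    """Check that `label` is 1 for a 'yes' answer and 0 for a 'no' answer.

    Assumption: `answer` is a "Yes"/"No" string (or a boolean); adjust if the
    real dataset encodes it differently.
    """
    is_yes = str(row["answer"]).strip().lower() in ("yes", "true")
    return row["label"] == (1 if is_yes else 0)


def expected_total(num_languages):
    """Total examples implied by the per-language train and validation sizes."""
    return num_languages * (TRAIN_PER_LANGUAGE + VALIDATION_PER_LANGUAGE)


if __name__ == "__main__":
    hindi_rows = filter_by_language(SAMPLE_ROWS, "hi")
    print(len(hindi_rows))                                   # 2
    print(Counter(row["language"] for row in SAMPLE_ROWS))   # rows per language
    print(all(label_matches_answer(r) for r in SAMPLE_ROWS)) # True
    print(expected_total(10))                                # 126970
```

Note that 10 languages × (9,427 + 3,270) gives 126,970 examples, while 11 subsets would give 139,667, which is closer to the stated ~140k. The card does not explain the difference; one possible reading is that an additional subset (for instance the English original) is included, but that is an assumption worth verifying against the loaded data.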


Provider:

maas

Created:

2025-05-26

Collected summary

Dataset introduction