FoundryAILabs/k12-indian-curriculum-4.9m
Source: Hugging Face. Updated 2026-03-30; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/FoundryAILabs/k12-indian-curriculum-4.9m
Description:
---
language:
- en
- hi
- bn
- te
- ta
- kn
- ml
- mr
- gu
- or
- pa
- ur
license: apache-2.0
task_categories:
- question-answering
- text-generation
tags:
- education
- k12
- indian-languages
- cbse
- ncert
- bharatllm
size_categories:
- 1M<n<10M
---
# BharatLLM K-12 Indian Curriculum Dataset (4.9M)
**4,904,936 question-answer pairs** covering **CBSE/NCERT K-12 curriculum** across **12 Indian languages**.
| Language | Script | Entries |
|----------|--------|---------|
| English | Latin | ~594K |
| Hindi | Devanagari | ~449K |
| Bengali | Bengali | ~408K |
| Telugu | Telugu | ~408K |
| Tamil | Tamil | ~408K |
| Kannada | Kannada | ~408K |
| Malayalam | Malayalam | ~408K |
| Marathi | Devanagari | ~408K |
| Gujarati | Gujarati | ~408K |
| Odia | Odia | ~408K |
| Punjabi | Gurmukhi | ~408K |
| Urdu | Perso-Arabic (Nastaliq) | ~374K |
## Format
```json
{
"instruction": "What is the Pythagorean theorem?",
"response": "The Pythagorean theorem states that...",
"language": "english",
"subject": "Mathematics",
"class": "10"
}
```
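Records in this shape can be checked and flattened into a single training string before fine-tuning. The following is a minimal sketch, not part of the dataset itself: the helper names `validate_record` and `to_prompt`, the prompt template, and the assumption that every field is a non-empty string are all illustrative.

```python
# Hypothetical helpers for illustration; names and template are assumptions.
REQUIRED_FIELDS = ("instruction", "response", "language", "subject", "class")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Assumption: every field is stored as a non-empty string.
        if not isinstance(value, str) or not value.strip():
            problems.append("missing or empty field: " + field)
    return problems


def to_prompt(record):
    """Format one record as a plain-text question/answer pair."""
    header = "[{} | Class {} | {}]".format(
        record["subject"], record["class"], record["language"]
    )
    return "{}\nQuestion: {}\nAnswer: {}".format(
        header, record["instruction"], record["response"]
    )


# The example record from the Format section above.
sample = {
    "instruction": "What is the Pythagorean theorem?",
    "response": "The Pythagorean theorem states that...",
    "language": "english",
    "subject": "Mathematics",
    "class": "10",
}

if not validate_record(sample):
    print(to_prompt(sample))
```

Validating before formatting keeps malformed rows out of the training text. Adjust the template to whatever prompt format your model expects.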
## Usage
```python
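from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (returns all available splits).
dataset = load_dataset("FoundryAILabs/k12-indian-curriculum-4.9m")

# Keep only the Hindi entries.
hindi = dataset.filter(lambda x: x["language"] == "hindi")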
```
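To filter on several fields at once (for example language and class), a reusable predicate can replace the inline lambda. The sketch below runs on plain dictionaries so it can be tried offline. The helper name `make_filter` is hypothetical, and because the card only shows `"hindi"`, `"english"`, `"Mathematics"`, and `"10"` as example values, the other values and the case-insensitive comparison are assumptions.

```python
def make_filter(language=None, subject=None, grade=None):
    """Build a predicate that keeps records matching every criterion given."""
    wanted = {"language": language, "subject": subject, "class": grade}
    # Drop criteria the caller did not specify.
    wanted = {key: str(value).lower() for key, value in wanted.items()
              if value is not None}

    def keep(record):
        # Compare case-insensitively, since exact casing is an assumption.
        return all(
            str(record.get(key, "")).lower() == value
            for key, value in wanted.items()
        )

    return keep


# Small in-memory stand-in for dataset rows (made-up values for illustration).
records = [
    {"language": "hindi", "subject": "Science", "class": "8"},
    {"language": "hindi", "subject": "Mathematics", "class": "10"},
    {"language": "english", "subject": "Mathematics", "class": "10"},
]

keep_hindi_8 = make_filter(language="hindi", grade="8")
hindi_class_8 = [r for r in records if keep_hindi_8(r)]
print(len(hindi_class_8))
```

The same predicate should also work with the loaded dataset, e.g. `dataset.filter(make_filter(language="hindi", grade="8"))`, since `filter` accepts any function that takes one example and returns a boolean.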
**Website**: [foundryailabs.io](https://foundryailabs.io)
Provider:
FoundryAILabs



