
LingoIITGN/Gurukul

Hugging Face · Updated 2026-03-19 · Indexed 2025-07-05
Download link:
https://hf-mirror.com/datasets/LingoIITGN/Gurukul
Resource description:
---
license: cc-by-nd-4.0
language:
- en
size_categories:
- 10K<n<100K
---

# Gurukul

**Gurukul** is an educational question-answering dataset aligned with the **Indian school curriculum**, building on the original Gurukul series. It contains high-quality QA pairs derived from class-level textbooks (primarily English prose and related subjects), designed to support reading comprehension, vocabulary building, inference, and curriculum-based language understanding in educational AI applications.

## Overview

Gurukul provides structured question-answer pairs extracted from Indian NCERT textbooks, each paired with a rich contextual passage. It targets school-level content (mainly secondary education) to enable training and evaluation of models for:

- Educational question answering
- Curriculum-aligned reading comprehension
- Vocabulary, idiom, antonym/synonym, and inference tasks
- Development of AI tutors / assistants for Indian students

Key features:

- Aligned with the Indian education system (NCERT-style content)
- Focus on English language learning in a school context
- High-quality, human-curated or refined examples

## Languages

- **English** (primary language of questions, answers, and contexts)

### Covered Subjects and Classes

Gurukul draws from NCERT-aligned textbooks, supporting multiple core subjects across secondary and higher secondary levels:

| Subject | Classes Covered | Focus Areas / Example Topics |
|-----------------|-----------------|------------------------------|
| **English** | Class 9 – 12 | Prose, poetry, comprehension, vocabulary, grammar, literature (e.g., biographies, stories, idioms) |
| **Mathematics** | Class 9 – 12 | Algebra, geometry, trigonometry, calculus basics, number systems, statistics, coordinate geometry |
| **Science** | Class 9 – 12 | Physics (motion, force, electricity), Chemistry (atoms, reactions, acids/bases), Biology (life processes, heredity, ecology) |

- Questions are curriculum-aligned, often chapter-specific.

## Supported Tasks

- **Question Answering** (abstractive / extractive from given context)
- **Reading Comprehension**
- **Vocabulary & Language Understanding** (definitions, antonyms, idioms)
- **Educational NLP** (school-level question and explanation generation)

## Dataset Structure

- **Size**: ~10K–20K examples
- **Core Columns**:

| Column | Type | Description |
|------------|--------|-------------|
| `question` | string | The comprehension or knowledge question |
| `answer` | string | Reference answer (detailed or concise) |
| `context` | string | Relevant textbook passage or expanded explanation |
| `chapter` | string | Chapter identifier (e.g., prose chapter codes) |
| `class` | string | School level (e.g., Class 9, Class 10) |
| `subject` | string | Subject area (primarily English; possibly others in extensions) |

### Dataset Description

- **Curated by:** [Lingo Research Group at IIT Gandhinagar](https://lingo.iitgn.ac.in/)
- **Licensed by:** cc-by-4.0

## Contact Us

✉️ [Lingo Research Group at IIT Gandhinagar, India](https://labs.iitgn.ac.in/lingo/) <br>
Mail at: [lingo@iitgn.ac.in](mailto:lingo@iitgn.ac.in)
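
The core columns listed under Dataset Structure can be exercised with a small sketch. The helper names (`validate_record`, `filter_records`, `build_prompt`), the prompt template, and the sample rows below are all hypothetical and are not part of the dataset or its tooling. The sketch only assumes that each row is a dictionary carrying the six documented string columns.

```python
from typing import Iterable

# Column names from the "Core Columns" table; all are documented as strings.
CORE_COLUMNS = ("question", "answer", "context", "chapter", "class", "subject")


def validate_record(record: dict) -> bool:
    """Return True if every core column is present and holds a string."""
    return all(isinstance(record.get(col), str) for col in CORE_COLUMNS)


def filter_records(records: Iterable[dict], subject: str, school_class: str) -> list[dict]:
    """Keep valid records matching a subject and class (case-insensitive)."""
    subject_key = subject.strip().lower()
    class_key = school_class.strip().lower()
    return [
        r for r in records
        if validate_record(r)
        and r["subject"].strip().lower() == subject_key
        and r["class"].strip().lower() == class_key
    ]


def build_prompt(record: dict) -> str:
    """Format one record as a context-grounded QA prompt (hypothetical template)."""
    return (
        f"Context: {record['context']}\n"
        f"Question: {record['question']}\n"
        "Answer:"
    )


# Made-up rows for illustration only; they are NOT taken from the dataset.
SAMPLE_RECORDS = [
    {
        "question": "What is the antonym of 'generous'?",
        "answer": "Stingy",
        "context": "The merchant was generous to every traveller he met.",
        "chapter": "hypothetical-chapter-01",
        "class": "Class 9",
        "subject": "English",
    },
    {
        "question": "What is the SI unit of force?",
        "answer": "The newton",
        "context": "Force is measured in newtons in the SI system.",
        "chapter": "hypothetical-chapter-02",
        "class": "Class 9",
        "subject": "Science",
    },
]

if __name__ == "__main__":
    for row in filter_records(SAMPLE_RECORDS, "English", "Class 9"):
        print(build_prompt(row))
```

With the Hugging Face `datasets` library installed, rows obtained from `datasets.load_dataset("LingoIITGN/Gurukul")` could presumably be passed through the same helpers. The card does not state which splits are available, so the split name would need to be checked first.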

Provided by:
LingoIITGN