"NEP-MultiSent: A large-scale multilingual dataset on National Education Policy (NEP) 2020"
Source: DataCite Commons | Updated: 2025-12-12 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/nep-multisent-large-scale-multilingual-dataset-national-education-policy-nep-2020
Description:
"This dataset comprises 1,840,710 records of educational content related to Artificial Intelligence (AI) in education, collected from diverse sources across all Indian states and union territories. The dataset is structured to support research on AI integration in educational systems, with particular emphasis on alignment with India's National Education Policy (NEP) 2020. Each record contains five attributes: a unique identifier (ID), language of content (Language), full text content (Content), publication timestamp (Date Published), and geographic location (Place/Location). The temporal coverage spans nine years, from 2016 to 2025, capturing the critical period of NEP 2020 adoption and implementation in Indian education. The dataset enables multiple research applications, including: (1) Natural Language Processing (NLP) analysis of educational content; (2) analysis of geographic disparities in technology adoption in education; (3) temporal trend analysis of AI integration in curricula; (4) multilingual education content analysis; (5) policy impact assessment of NEP 2020; and (6) regional comparison studies across Indian administrative divisions. The large scale (1.84 million records) and comprehensive geographic coverage (all states/UTs) make this dataset particularly valuable for training machine learning models, conducting longitudinal studies, and informing evidence-based educational policy decisions. The dataset supports research aligned with NEP 2020's focus areas, including technology integration, multilingual education, and equitable access to quality education."
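The five-attribute record layout described above can be sketched in Python. This is only an illustration under stated assumptions: the listing does not specify the file format, so CSV is assumed; the column headers are taken from the attribute names in the description; the dates are assumed to be ISO formatted (YYYY-MM-DD); and the `Record` class, the helper functions, and the sample rows are hypothetical placeholders, not part of the dataset.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One dataset row; field names are hypothetical, columns follow the description."""
    record_id: str
    language: str
    content: str
    date_published: date
    location: str


def read_records(handle):
    """Yield Record objects from a CSV handle (CSV format is an assumption)."""
    for row in csv.DictReader(handle):
        yield Record(
            record_id=row["ID"],
            language=row["Language"],
            content=row["Content"],
            # Assumes ISO dates; adjust the parser if the real files differ.
            date_published=date.fromisoformat(row["Date Published"]),
            location=row["Place/Location"],
        )


def counts_by_language_and_year(records):
    """Count records per (language, publication year) pair."""
    return Counter((r.language, r.date_published.year) for r in records)


# Made-up placeholder rows, used only to show the assumed layout.
SAMPLE = """ID,Language,Content,Date Published,Place/Location
1,Hindi,placeholder text,2019-03-14,Bihar
2,English,placeholder text,2021-08-02,Kerala
3,Hindi,placeholder text,2021-11-20,Bihar
"""

records = list(read_records(io.StringIO(SAMPLE)))
counts = counts_by_language_and_year(records)
print(counts[("Hindi", 2021)])
```

Grouping by language and year in this way is a starting point for the multilingual and temporal trend analyses the description mentions; grouping by `location` instead would support the regional comparisons.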
Provider:
IEEE DataPort
Created:
2025-12-12



