SSC-BanglaTutor: A Curriculum-Aligned Bengali Dataset for Intelligent Tutoring Systems
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/krn9bzypsn
Description:
This dataset comprises a Bengali-language educational corpus specifically curated to support the fine-tuning and evaluation of AI-driven, hint-based tutoring systems aligned with the Secondary School Certificate (SSC) science curriculum of Bangladesh. It contains a total of 11,286 structured question–answer–hint entries, distributed across three core science subjects:
- Biology: 4,859 entries (14 chapters)
- Chemistry: 3,034 entries (12 chapters)
- Physics: 3,393 entries (14 chapters)
Each entry includes:
- A question written in Bengali
- Five progressively ranked hints guiding learners from general to specific concepts
- A convergence metric estimating the probability of a correct response at each hint
- Correct and distractor answers based on common student misconceptions
- Curriculum-aligned topic tags mapped to the SSC syllabus
All data are encoded in UTF-8 JSON Lines (.jsonl) format, ensuring compatibility with Bengali NLP tools and large-scale AI training pipelines. The dataset’s structured design supports personalized feedback, enabling adaptive learning, retrieval-augmented generation (RAG), and fine-tuning of large language models (LLMs) for education in low-resource languages.
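As a rough illustration of how the UTF-8 JSON Lines files might be read, the sketch below parses one entry per line and checks that it carries five hints with one convergence value each. The key names (`question`, `hints`, `convergence`, `answer`, `distractors`, `topic`) and the sample record are assumptions made for this example; the dataset's actual schema is not stated here and should be checked against the downloaded files.

```python
import io
import json

# Placeholder record. All key names and values are hypothetical,
# not taken from the actual dataset files.
SAMPLE_LINE = json.dumps({
    "question": "<question text in Bengali>",
    "hints": ["<hint 1>", "<hint 2>", "<hint 3>", "<hint 4>", "<hint 5>"],
    "convergence": [0.2, 0.4, 0.6, 0.8, 0.95],
    "answer": "<correct answer>",
    "distractors": ["<distractor 1>", "<distractor 2>"],
    "topic": "<SSC syllabus topic tag>",
}, ensure_ascii=False) + "\n"


def read_entries(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def has_five_hints(entry):
    """Check for five hints and one convergence value per hint (assumed layout)."""
    hints = entry.get("hints", [])
    convergence = entry.get("convergence", [])
    return len(hints) == 5 and len(convergence) == len(hints)


# An in-memory stream stands in for a real file in this sketch.
entries = list(read_entries(io.StringIO(SAMPLE_LINE)))
valid = [e for e in entries if has_five_hints(e)]
print(f"{len(valid)} of {len(entries)} entries passed the check")
```

For a real file, the same generator can be fed from `open(path, encoding="utf-8")`, where `path` points at one of the downloaded `.jsonl` files. Passing the encoding explicitly avoids platform defaults mangling the Bengali text.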
Created:
2025-10-27



