DataOrigin/k12-video-solutions-india
Source: Hugging Face. Updated 2026-04-06; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/DataOrigin/k12-video-solutions-india
Resource description:
---
license: other
task_categories:
- video-classification
- question-answering
language:
- hi
- en
- ta
- te
- kn
- ml
- bn
- mr
- gu
- or
- pa
- as
tags:
- education
- k12
- india
- multilingual
- mathematics
- science
- curriculum
- indic
pretty_name: K-12 Video Solutions India
size_categories:
- 1M<n<10M
---
# K-12 Video Solutions India
## Dataset Description
A large-scale collection of step-by-step video solutions to K-12 curriculum questions,
solved by subject matter experts. Produced by Prepp, India's largest K-12 learning
platform, operated by Collegedunia Web Private Limited.
## Dataset Summary
- **Total videos:** 3.8 million (38 lakh)
- **Content type:** Step-by-step question-answer video solutions
- **Subjects:** Mathematics, Science, Social Studies, English, Regional Languages
- **Grades:** Class 1 through Class 12
- **Boards:** 33 Indian educational boards including CBSE, ICSE, and all major
state boards
- **Languages:** Hindi, English, Tamil, Telugu, Kannada, Malayalam, Marathi,
Bengali, Gujarati, Odia, Punjabi, Assamese
- **Format:** Video (MP4) with audio narration and visual problem-solving
## Sample Data
Three sample videos are available in this repository. They demonstrate:
- Sample 1: Mathematics problem solving (Class 10, CBSE)
- Sample 2: Science concept explanation (Class 8, CBSE)
- Sample 3: Regional language content (Hindi medium)
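The samples could be fetched with the `huggingface_hub` client along the lines below. This is a minimal sketch, not an official loader: the card does not list the sample file names, so the code assumes the samples are the `.mp4` files in the repository and selects them by extension. `select_videos`, `download_samples`, and the `samples` directory are hypothetical names, and access may require accepting the repository's terms first.

```python
from pathlib import PurePosixPath

REPO_ID = "DataOrigin/k12-video-solutions-india"  # repository name from this card


def select_videos(filenames):
    """Keep only MP4 files (case-insensitive extension), sorted by path."""
    return sorted(f for f in filenames if PurePosixPath(f).suffix.lower() == ".mp4")


def download_samples(local_dir="samples"):
    """Download every MP4 in the repository and return the local file paths."""
    # Imported lazily so select_videos works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    files = list_repo_files(REPO_ID, repo_type="dataset")
    return [
        hf_hub_download(REPO_ID, filename=name, repo_type="dataset", local_dir=local_dir)
        for name in select_videos(files)
    ]
```

Calling `download_samples()` needs network access and the `huggingface_hub` package; `select_videos` can be checked offline against any list of file names.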
## Key Features
- **Human-verified:** All solutions reviewed by qualified subject experts
- **Curriculum-mapped:** Each video tagged to specific board, grade, subject,
chapter, and topic
- **Multimodal:** Rich combination of audio narration, handwritten working,
diagrams, and step-by-step visual explanation
- **Low-resource languages:** Includes Odia, Assamese, and Punjabi, which are
low-resource languages for AI training
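To make the curriculum mapping concrete, the sketch below models one video's tags as a record and filters a list of records by tag. The card does not publish a metadata schema, so the class name, field names, and the two example rows are illustrative assumptions only, not actual dataset entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoRecord:
    # Hypothetical schema: fields mirror the tags named in this card
    # (board, grade, subject, chapter, topic) plus an assumed id and language.
    video_id: str
    board: str
    grade: int
    subject: str
    chapter: str
    topic: str
    language: str


def filter_records(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [r for r in records if all(getattr(r, k) == v for k, v in criteria.items())]


# Made-up placeholder rows, for illustration only.
records = [
    VideoRecord("v1", "CBSE", 10, "Mathematics", "Quadratic Equations", "Factorisation", "en"),
    VideoRecord("v2", "CBSE", 8, "Science", "Force and Pressure", "Friction", "hi"),
]

print([r.video_id for r in filter_records(records, board="CBSE", grade=10)])  # prints ['v1']
```

Keeping the tags as plain typed fields means the same filter works for any combination, for example `filter_records(records, language="hi", subject="Science")`.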
## Intended Uses
- Training multimodal AI models for educational reasoning
- STEM problem-solving model development
- Indic language speech and audio model training
- Video understanding and question-answering model fine-tuning
- Multilingual educational AI applications
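For the speech and audio use case, the narration track usually has to be separated from the video first. The sketch below builds an `ffmpeg` command for that step. It assumes `ffmpeg` is installed, and the 16 kHz mono WAV target is a common choice for speech models rather than anything this card specifies. `audio_extract_command` and the `audio` directory are hypothetical names.

```python
from pathlib import Path


def audio_extract_command(video_path, out_dir="audio", sample_rate=16000):
    """Build an ffmpeg command that writes a mono WAV track for one video."""
    out_path = Path(out_dir) / (Path(video_path).stem + ".wav")
    return [
        "ffmpeg",
        "-i", str(video_path),    # input video file
        "-vn",                    # drop the video stream
        "-ac", "1",               # mix down to a single channel
        "-ar", str(sample_rate),  # resample to the target rate in Hz
        str(out_path),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists. Building the command separately from running it keeps the step easy to inspect and test.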
## Data Collection and Rights
All content is proprietary, produced by Prepp's in-house content team and
contracted subject matter experts under work-for-hire agreements. Content is
ethically sourced and curriculum-mapped. Full dataset licensing is available
for commercial AI training purposes.
## Licensing and Commercial Access
This repository contains sample data only. The full dataset of 3.8 million
videos is available for commercial AI training licensing.
**For licensing inquiries contact:**
Ankit Dubey — Head of AI Data Partnerships, Collegedunia
ankit.dubey@collegedunia.com
## Dataset Curator
[Collegedunia Web Private Limited](https://collegedunia.com) |
[Prepp](https://prepp.in)
Gurugram, Haryana, India
Provider:
DataOrigin



