
DataOrigin/ncert-lectures-india

Source: Hugging Face. Updated 2026-04-06; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/DataOrigin/ncert-lectures-india
Resource description:
---
license: other
task_categories:
- audio-classification
- automatic-speech-recognition
language:
- en
- hi
- bn
- ta
- te
- ml
- mr
- or
- as
- pa
tags:
- ncert
- education
- india
- upsc
- humanities
- recorded-lectures
- government-exams
- science
- long-form-audio
pretty_name: NCERT Lectures India
size_categories:
- 1K<n<10K
---

# NCERT Lectures India

## Dataset Description

A large-scale collection of recorded lectures covering the NCERT curriculum across the Science and Humanities streams. It is designed specifically for government exam preparation, including the UPSC, SSC, and State PSC examinations.

The lectures were produced by Prepp, India's largest government exam preparation platform, which is operated by Collegedunia Web Private Limited.

## Dataset Summary

- **Total duration:** 5,100 hours of recorded lectures
- **Content type:** Structured curriculum lectures covering NCERT Science and Humanities
- **Exam relevance:** UPSC Civil Services, SSC, State PSC, Railways, Banking, and all major government competitive examinations
- **Curriculum:** Full NCERT coverage of Classes 6 through 12, Science and Humanities streams
- **Chapters available in this repository:** History Ch-1, Ch-2, Ch-3 (samples)
- **Languages:** Hindi and English (primary); regional-language variants available
- **Format:** Audio/video, structured for chapter-by-chapter delivery

## Sample Data

Three sample lectures are available in this repository:

- History Chapter 1: The Early Societies
- History Chapter 2: Early Economies and Empires
- History Chapter 3: Political and Economic History

## Key Features

- **NCERT-aligned:** Follows the official NCERT chapter structure exactly, a high-value signal for Indian education AI models
- **Optimised for government exams:** Content is structured specifically for competitive exam preparation. It includes emphasis patterns, important facts, and exam-relevant framing that general educational content lacks
- **Long-form audio:** 5,100 hours of continuous, structured lecture audio, one of the largest Indic long-form educational audio datasets available
- **Subject breadth:** Covers History, Geography, Polity, Economics, Science, and Environment across multiple NCERT grades
- **High information density:** Unlike casual educational content, government exam lectures are dense with factual content, which makes them valuable for knowledge-intensive LLM training

## Intended Uses

- Training automatic speech recognition (ASR) models for Hindi and other Indic languages in the educational domain
- Developing long-form audio understanding models
- Training knowledge-intensive question answering models
- Fine-tuning domain models for Indian history, geography, and polity
- Developing AI for government exam preparation
- Educational AI research using curriculum-structured audio

## Why This Dataset Is Unique

Government exam preparation content is structurally different from general educational content. It is optimised for information retention, fact density, and conceptual clarity under exam conditions. No comparable large-scale, structured audio dataset of Indian government exam preparation content exists.

## Data Collection and Rights

All content is proprietary. It was produced by Prepp's in-house faculty of government exam subject-matter experts. The content is mapped to the NCERT chapter structure and ethically sourced under work-for-hire agreements.

## Licensing and Commercial Access

This repository contains sample data only. The full dataset of 5,100 hours of NCERT lecture recordings is available for commercial AI training under license.

**For licensing inquiries, contact:** Ankit Dubey, Head of AI Data Partnerships, Collegedunia (ankit.dubey@collegedunia.com)

## Dataset Curator

[Collegedunia Web Private Limited](https://collegedunia.com) | [Prepp](https://prepp.in)
Gurugram, Haryana, India
Provider:
DataOrigin