gam30/Nepali-asr-train-val

Name: gam30/Nepali-asr-train-val
Creator: gam30
Published: 2026-04-05 16:02:54
License: No description available

Hugging Face, updated 2026-04-05, indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/gam30/Nepali-asr-train-val


Description:

---
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: duration
    dtype: float64
  splits:
  - name: train
    num_bytes: 1808020899
    num_examples: 15820
  - name: val
    num_bytes: 436296937
    num_examples: 3955
  download_size: 2238681396
  dataset_size: 2244317836
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: val
    path: data/val-*
---

# Nepali ASR Train and Validation Set (Clean & Noisy)

This dataset contains approximately **19.41 hours** of Nepali speech sourced from OpenSLR, organized into an 80:20 train and validation split. To improve model robustness in real-world scenarios, 40% of the samples in each split have been augmented with background noise (crowd, traffic, construction, and wind). The remaining 60% are clean audio.

## Dataset Overview

| Property | Value |
|----------|-------|
| **Language** | Nepali |
| **Source** | OpenSLR |
| **Total Samples** | 19,775 |
| **Noise Type** | Synthetic environmental noise (crowd, traffic, construction, wind) |
| **Noise Coverage** | 40% augmented, 60% clean |
| **Audio Format** | WAV |
| **Sample Rate** | 16,000 Hz (16 kHz) |

## Dataset Features

Each sample in the dataset contains:

- **`audio`** (Audio): The audio waveform data
  - Sample Rate: **16,000 Hz** (16 kHz)
  - Channels: 1 (Mono)
- **`text`** (String): Full Nepali transcription of the speech
- **`duration`** (Float): Audio duration in seconds

## Noise Characteristics

40% of the audio samples contain synthetic environmental noise mixed with the clean Nepali speech:

- **Noise Sources**:
  - 🏢 Crowd noise (background conversations, ambient chatter)
  - 🚗 Traffic noise (vehicle engines, horns, road sounds)
  - 🏗️ Construction noise (machinery, tools, equipment)
  - 💨 Wind noise (outdoor wind, air movements)

## Loading the Dataset

### Using HuggingFace `datasets` library

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("gam30/Nepali-asr-train-val")

# Access samples from the train split
sample = dataset['train'][0]
print(f"Transcription: {sample['text']}")
print(f"Duration: {sample['duration']}s")

# Iterate through dataset
for sample in dataset['train']:
    text = sample['text']
    duration = sample['duration']
    print(f"{duration}s - {text[:50]}...")
```

### With Audio Feature

```python
from datasets import load_dataset, Audio

dataset = load_dataset("gam30/Nepali-asr-train-val")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

# Access audio data
sample = dataset['train'][0]
print(f"Array shape: {sample['audio']['array'].shape}")
print(f"Sampling rate: {sample['audio']['sampling_rate']} Hz")  # Will be 16000
```

### Working with Audio Files

```python
from datasets import load_dataset
import librosa

dataset = load_dataset("gam30/Nepali-asr-train-val")

# Process the first 5 samples. Slicing with [:5] returns a dict of columns,
# so select() is used here to iterate over rows instead.
for sample in dataset['train'].select(range(5)):
    audio = sample['audio']
    # Use the decoded waveform directly; 'path' may be None when the
    # audio bytes are embedded in the Parquet shards.
    y, sr = audio['array'], audio['sampling_rate']
    print(f"Sampling rate: {sr} Hz")
    print(f"Duration: {librosa.get_duration(y=y, sr=sr):.2f}s")
    print(f"Text: {sample['text'][:60]}...")
    print()
```

## Use Cases

This dataset is suitable for:

1. **Robust ASR Model Training** - Training models on noisy speech
2. **Noise Robustness Testing** - Evaluating ASR systems under noisy conditions
3. **Domain Adaptation** - Fine-tuning pre-trained models on Nepali
4. **Speech Enhancement Research** - Testing denoising techniques

## Dataset Statistics

- **Total Samples**: 19,775
- **Total Audio Duration**: ~19.41 hours
- **Train Split**: 15,820 samples (~15.59 hours)
- **Validation Split**: 3,955 samples (~3.82 hours)
- **Sample Rate**: 16,000 Hz (16 kHz Mono) - standard for ASR tasks

## Citation

If you use this dataset in your research, please cite:

```bibtex
@dataset{nepali_asr_noisy_2024,
  title={Nepali ASR Train and Validation noisy set},
  author={sangam},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/gam30/Nepali-asr-train-val}
}
```

## Quality Assurance

- ✓ All transcriptions in UTF-8 Unicode format
- ✓ Duration metadata computed and validated
- ✓ Audio verified at 16,000 Hz mono

## Dataset Structure

```text
gam30/Nepali-asr-train-val
├── train/
│   ├── audio (Audio)
│   ├── text (String)
│   └── duration (Float)
└── val/
    ├── audio (Audio)
    ├── text (String)
    └── duration (Float)
```

## Support & Issues

For questions or issues with the dataset:

1. Check the Hugging Face community discussions
2. Open an issue on the dataset repository

---

**Dataset ID**: `gam30/Nepali-asr-train-val`

**Last Updated**: 2026
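The figures quoted in the card (split sizes, durations, byte counts) can be cross-checked with a little arithmetic. The sketch below uses only numbers stated in the card. The 16-bit PCM rate it compares against is an assumption, since the card gives the sample rate and channel count but not the bit depth.

```python
# Figures copied from the dataset card.
TRAIN_EXAMPLES, VAL_EXAMPLES = 15_820, 3_955
TRAIN_HOURS, VAL_HOURS = 15.59, 3.82      # approximate, as stated in the card
DATASET_SIZE_BYTES = 2_244_317_836        # dataset_size from the metadata
SAMPLE_RATE_HZ = 16_000                   # mono

total_examples = TRAIN_EXAMPLES + VAL_EXAMPLES
total_hours = TRAIN_HOURS + VAL_HOURS
total_seconds = total_hours * 3600

# Share of examples in the train split (the card describes an 80:20 split).
train_fraction = TRAIN_EXAMPLES / total_examples

# Average clip length and average stored bytes per second of audio.
mean_clip_seconds = total_seconds / total_examples
bytes_per_second = DATASET_SIZE_BYTES / total_seconds

# ASSUMPTION: 16-bit (2-byte) PCM samples; the card does not state the bit depth.
assumed_pcm_rate = SAMPLE_RATE_HZ * 2

print(f"Total examples:   {total_examples}")
print(f"Total hours:      {total_hours:.2f}")
print(f"Train fraction:   {train_fraction:.2f}")
print(f"Mean clip length: {mean_clip_seconds:.2f} s")
print(f"Stored bytes/s:   {bytes_per_second:.0f} "
      f"(16-bit mono PCM would be {assumed_pcm_rate})")
```

A stored rate close to the assumed PCM rate would be consistent with uncompressed 16-bit mono audio plus a small overhead for transcripts and file headers, though it does not confirm the encoding on its own.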

Provider:

gam30
