
manhpv26/khmer-yt-voice-dataset

Source: Hugging Face · Updated 2026-04-19 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/manhpv26/khmer-yt-voice-dataset
Description:
---
language:
- km
license: cc-by-sa-4.0
task_categories:
- automatic-speech-recognition
- audio-classification
size_categories:
- 100K<n<1M
---

# Khmer YouTube Voice Dataset

Full-length audio plus speaker-diarization metadata from Khmer-language YouTube videos.

| Stat | Value |
|------|-------|
| Videos | 3,945 |
| Total turns | 16,021 |
| Total duration | 1,033.4 hours |
| Format | WAV (original sample rate) |

## Structure

```
data/
  shard_0000.tar.gz
  shard_0001.tar.gz
  ...
manifest.jsonl
```

Each shard contains about 20 videos. Each video directory contains:

```
video_id/
  full.wav        # original full-length audio
  metadata.json   # video info + turns (speaker, start, end, transcript)
```

## manifest.jsonl

| Field | Description |
|-------|-------------|
| `video_id` | YouTube video ID |
| `shard` | Shard containing the video |
| `duration_sec` | Video duration (seconds) |
| `num_turns` | Number of turns (speaker segments) |
| `title` | Video title |
| `channel` | YouTube channel |
| `source_url` | YouTube URL |
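A minimal standard-library sketch of how the layout above might be consumed: parse `manifest.jsonl`, group videos by shard, pull the turns out of one video's `metadata.json` inside a `.tar.gz` shard, and cut a single turn out of `full.wav`. The card names the turn fields (speaker, start, end, transcript) but not the exact JSON layout, so this assumes a top-level `"turns"` list with keys `speaker`, `start`, `end` (seconds) and `transcript`; check one real file before relying on it. All function names here (`load_manifest`, `videos_by_shard`, `read_turns`, `speaker_durations`, `cut_turn`) are hypothetical helpers, not part of the dataset.

```python
import json
import tarfile
import wave
from collections import defaultdict
from pathlib import PurePosixPath


def load_manifest(path):
    """Parse manifest.jsonl: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def videos_by_shard(records):
    """Group video IDs under the shard that contains them, in manifest order."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["shard"]].append(rec["video_id"])
    return dict(groups)


def read_turns(shard_path, video_id):
    """Return the turn list from <video_id>/metadata.json inside a .tar.gz shard.

    ASSUMPTION: turns live under a top-level "turns" key; each turn has
    "speaker", "start", "end" (seconds) and "transcript".
    """
    with tarfile.open(shard_path, "r:gz") as tar:
        for member in tar:
            # Match the last two path components so "./vid/..." or
            # "shard_xxxx/vid/..." prefixes inside the archive also work.
            parts = PurePosixPath(member.name).parts[-2:]
            if member.isfile() and parts == (video_id, "metadata.json"):
                return json.load(tar.extractfile(member))["turns"]
    raise KeyError(f"{video_id}/metadata.json not found in {shard_path}")


def speaker_durations(turns):
    """Total speaking time per speaker label, in seconds."""
    totals = defaultdict(float)
    for turn in turns:
        totals[turn["speaker"]] += turn["end"] - turn["start"]
    return dict(totals)


def cut_turn(wav_path, start, end, out_path):
    """Copy the [start, end) second range of a PCM WAV file to out_path.

    Uses the file's own frame rate, since the audio keeps its original
    sample rate. The stdlib wave module only handles PCM-encoded WAV.
    """
    with wave.open(wav_path, "rb") as src:
        rate, total = src.getframerate(), src.getnframes()
        first = min(int(start * rate), total)
        last = min(int(end * rate), total)
        src.setpos(first)
        frames = src.readframes(max(last - first, 0))
        params = src.getparams()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # wave patches the header frame count after writing
        dst.writeframes(frames)
```

For example, `read_turns("data/shard_0000.tar.gz", video_id)` would be used for a video the manifest lists under that shard; the card does not say whether the `shard` value already includes the `.tar.gz` suffix, so the mapping from `shard` to a file path is left to the caller. Matching archive members by their last two path components, rather than opening one fixed name, is a deliberate hedge because the internal prefix of the tarballs is not documented.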
Provider:
manhpv26