ufal/parczech4speech-segmented

Source: Hugging Face. Updated 2025-06-16; indexed 2025-07-05.
Download link:
https://hf-mirror.com/datasets/ufal/parczech4speech-segmented
Description:
ParCzech4Speech (Sentence-Segmented Variant) is a large-scale Czech speech dataset built from parliamentary recordings and their official transcripts, intended for speech recognition and speech synthesis tasks. It is derived from the ParCzech 4.0 corpus and the AudioPSP 24.01 audio collection. Audio and text were aligned automatically with WhisperX and Wav2Vec 2.0, which yields clean audio-text pairs with reliable segment boundaries. Rich metadata is provided for filtering and quality control.
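
As a rough illustration of how the metadata might be used for filtering, the sketch below streams the dataset with the Hugging Face `datasets` library and keeps only segments that pass a simple check. It is not taken from the dataset card: the field names `text` and `duration`, the split name `train`, and the 30-second threshold are all assumptions made for illustration, so check the actual column names before relying on it.

```python
from typing import Any, Dict, Iterable, Iterator

# Repository name as given in the listing above.
DATASET_ID = "ufal/parczech4speech-segmented"


def keep_segment(record: Dict[str, Any], max_seconds: float = 30.0) -> bool:
    """Return True for records with a non-empty transcript and a bounded duration.

    NOTE: "text" and "duration" are hypothetical field names; the real
    dataset may name its columns differently.
    """
    text = str(record.get("text", "")).strip()
    duration = record.get("duration")
    if not text or not isinstance(duration, (int, float)):
        return False
    return 0 < duration <= max_seconds


def filter_segments(
    records: Iterable[Dict[str, Any]], max_seconds: float = 30.0
) -> Iterator[Dict[str, Any]]:
    """Lazily yield only the records accepted by keep_segment."""
    for record in records:
        if keep_segment(record, max_seconds):
            yield record


def stream_filtered(
    split: str = "train", max_seconds: float = 30.0
) -> Iterator[Dict[str, Any]]:
    """Stream the dataset from the Hub and filter it on the fly.

    Requires network access and `pip install datasets`. The split name
    is an assumption, and the dataset may also require a configuration name.
    """
    from datasets import load_dataset  # imported here so the helpers work offline

    stream = load_dataset(DATASET_ID, split=split, streaming=True)
    return filter_segments(stream, max_seconds)
```

Streaming avoids downloading the full audio collection before inspecting it. The filtering helpers work on any iterable of dictionaries, so they can be tried on a handful of hand-written records first.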
Provider:
ufal