ufal/parczech4speech-segmented
Source: Hugging Face; updated 2025-06-16; indexed 2025-07-05
Download link:
https://hf-mirror.com/datasets/ufal/parczech4speech-segmented
Description:
ParCzech4Speech (sentence-segmented variant) is a large-scale Czech speech dataset built from parliamentary recordings and their official transcripts, intended for speech recognition and speech synthesis tasks. It is derived from the ParCzech 4.0 corpus and the AudioPSP 24.01 audio collection. Automatic alignment with WhisperX and Wav2Vec 2.0 yields high-quality segments with clean audio-text alignment and reliable segment boundaries, and each segment comes with rich metadata for filtering and quality control.
Provided by:
ufal



