AdoCleanCode/german-multi-mfa
Hugging Face. Updated: 2025-12-16. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/AdoCleanCode/german-multi-mfa
Description:
This is a speech dataset of audio paired with text transcriptions, annotated at both the word and phoneme level. Each sample includes a unique identifier (id), a speaker ID (speaker_id), audio sampled at 16 kHz, the transcription text, word-level annotations (each word with its start and end times), and phoneme-level annotations (each phoneme with its start and end times). The dataset contains only a training split, with 476,805 samples and a total size of approximately 231 GB.
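The record layout described above can be sketched in Python to show how the word and phoneme timings relate to the 16 kHz audio. This is only an illustration under stated assumptions: `id` and `speaker_id` come from the description, while the other field names (`audio`, `text`, `words`, `phonemes`), the use of seconds as the time unit, and the example values are hypothetical and may not match the actual dataset schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

SAMPLE_RATE = 16_000  # 16 kHz, as stated in the description


@dataclass
class Interval:
    """One aligned unit (a word or a phoneme) with its time span."""
    label: str
    start: float  # assumed to be in seconds
    end: float    # assumed to be in seconds


@dataclass
class Sample:
    """Hypothetical in-memory view of one record; real column names may differ."""
    id: str
    speaker_id: str
    audio: List[float]         # mono waveform at SAMPLE_RATE
    text: str                  # transcription
    words: List[Interval]      # word-level annotations
    phonemes: List[Interval]   # phoneme-level annotations


def to_sample_range(iv: Interval, sample_rate: int = SAMPLE_RATE) -> Tuple[int, int]:
    """Convert an interval's start/end times into waveform sample indices."""
    return round(iv.start * sample_rate), round(iv.end * sample_rate)


def phonemes_in_word(word: Interval, phonemes: List[Interval],
                     tol: float = 1e-6) -> List[Interval]:
    """Return the phonemes whose time span lies inside the word's span."""
    return [p for p in phonemes
            if p.start >= word.start - tol and p.end <= word.end + tol]


# Made-up record: one second of silence with a single aligned word.
example = Sample(
    id="utt-0001",
    speaker_id="spk-01",
    audio=[0.0] * SAMPLE_RATE,
    text="ja",
    words=[Interval("ja", 0.25, 0.75)],
    phonemes=[Interval("j", 0.25, 0.5), Interval("a", 0.5, 0.75)],
)

start, end = to_sample_range(example.words[0])
word_audio = example.audio[start:end]  # waveform slice covering the word
```

Since the full dataset is roughly 231 GB, inspecting a few real records with the `datasets` library in streaming mode, for example `load_dataset("AdoCleanCode/german-multi-mfa", split="train", streaming=True)`, is likely more practical than a full download, assuming the repository is publicly accessible. The actual column names should be checked against the dataset card before adapting the sketch.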
Provider:
AdoCleanCode



