AdoCleanCode/german-multi-mfa

Name: AdoCleanCode/german-multi-mfa
Creator: AdoCleanCode
Published: 2025-12-16 03:28:51
License: No description available

Source: Hugging Face. Updated: 2025-12-16. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/AdoCleanCode/german-multi-mfa

Resource description:

This speech dataset pairs audio with text transcriptions and provides word-level and phoneme-level annotations. Each sample includes a unique identifier (id), a speaker ID (speaker_id), audio sampled at 16 kHz, the transcription text, word-level annotations (word content with start and end times), and phoneme-level annotations (phoneme content with start and end times). The dataset contains only a training split, with 476,805 samples and a total size of approximately 231 GB.
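
The per-sample layout described above could be consumed roughly as in the following Python sketch. It is only an illustration under stated assumptions: the listing does not give the column names of the annotation fields, so the keys `text`, `start`, and `end` are hypothetical defaults, the times are assumed to be in seconds, and `Segment`, `to_segments`, `phonemes_in_word`, and `stream_samples` are illustrative names that are not part of the dataset. Streaming mode from the Hugging Face `datasets` library is used so the roughly 231 GB are not downloaded up front.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    """One aligned unit: a word or a phoneme with its time span."""
    label: str    # word or phoneme content
    start: float  # start time (assumed to be seconds)
    end: float    # end time (assumed to be seconds)


def to_segments(items: Iterable[dict], label_key: str = "text",
                start_key: str = "start", end_key: str = "end") -> List[Segment]:
    """Convert annotation dicts to Segment tuples.

    The key names are assumptions; adjust them to the real column schema.
    """
    return [
        Segment(str(it[label_key]), float(it[start_key]), float(it[end_key]))
        for it in items
    ]


def phonemes_in_word(word: Segment, phonemes: Iterable[Segment],
                     tol: float = 1e-3) -> List[Segment]:
    """Return the phonemes whose time span lies inside the word's span.

    A small tolerance absorbs rounding at the word boundaries.
    """
    return [
        p for p in phonemes
        if p.start >= word.start - tol and p.end <= word.end + tol
    ]


def stream_samples(repo_id: str = "AdoCleanCode/german-multi-mfa"):
    """Lazily iterate over the training split instead of downloading it all."""
    # Imported here so the helpers above stay usable without the library.
    from datasets import load_dataset
    return load_dataset(repo_id, split="train", streaming=True)
```

With the real schema in hand, one would take a sample from `stream_samples()`, inspect its keys, pass the word and phoneme lists through `to_segments` with the matching key names, and then group phonemes under each word with `phonemes_in_word`.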

Provided by:

AdoCleanCode
