humairawan/giga_mfa_correct_
Source: Hugging Face. Updated: 2025-12-16. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/humairawan/giga_mfa_correct_
Description:
This is a speech dataset of audio recordings paired with transcriptions and time-aligned word-level and phoneme-level annotations. Each entry has a unique id and a speaker id, audio sampled at 16,000 Hz, a transcription, and the start and end time of every word and phoneme. The dataset contains only a training split, with 300,574 samples and a total size of 38,147,390,728.25 bytes (about 38.1 GB).
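As a rough illustration of how the start and end times relate to the 16,000 Hz audio, the sketch below maps an aligned segment onto sample indices and runs a basic sanity check on a sequence of segments. The `Segment` class, its field names, the helper functions, and the assumption that times are given in seconds are all hypothetical; the listing does not state the dataset's actual column names or time units, and the example alignment is made up.

```python
from __future__ import annotations

from dataclasses import dataclass

SAMPLE_RATE = 16_000  # Hz, as stated in the dataset description


@dataclass
class Segment:
    """One time-aligned word or phoneme (hypothetical field names)."""
    label: str
    start: float  # assumed to be in seconds
    end: float    # assumed to be in seconds


def to_sample_range(seg: Segment, sample_rate: int = SAMPLE_RATE) -> tuple[int, int]:
    """Convert a segment's start/end times to (start, end) sample indices."""
    return round(seg.start * sample_rate), round(seg.end * sample_rate)


def is_well_formed(segments: list[Segment]) -> bool:
    """Check that every segment has positive duration and does not
    start before the previous segment ends."""
    prev_end = 0.0
    for seg in segments:
        if seg.start < prev_end or seg.end <= seg.start:
            return False
        prev_end = seg.end
    return True


# Made-up alignment for illustration only; not taken from the dataset.
words = [Segment("hello", 0.10, 0.45), Segment("world", 0.50, 0.90)]
ranges = [to_sample_range(w) for w in words]
# ranges == [(1600, 7200), (8000, 14400)]
```

With a decoded waveform array in hand, slicing it with one of these index pairs would yield the audio for that word or phoneme, provided the real annotations follow the same assumptions.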
Provider:
humairawan



