AdoCleanCode/giga_mfa_correct_

Name: AdoCleanCode/giga_mfa_correct_
Creator: AdoCleanCode
Published: 2025-12-15 13:01:03
License: No description available

Hugging Face. Updated: 2025-12-15. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/AdoCleanCode/giga_mfa_correct_

Description:

This is a speech dataset of audio files paired with transcripts and with word-level and phoneme-level annotations. Each sample has the following features: id (unique identifier), speaker_id (speaker identifier), audio (sampled at 16,000 Hz), transcript, a words list (each word with its start and end times), and a phonemes list (each phoneme with its start and end times). The dataset contains only a training split, with 300,574 samples and a total size of approximately 38.15 GB.
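As a rough illustration of how the timing annotations relate to the 16,000 Hz audio, the sketch below converts word start and end times into sample indices. The description does not give the exact layout of the words list, so the inner keys `word`, `start`, and `end` (in seconds), the top-level key `words`, and the helper names `to_sample_span` and `word_spans` are assumptions made for this example. The record is made up and not taken from the dataset.

```python
SAMPLE_RATE = 16_000  # Hz, as stated in the dataset description


def to_sample_span(start_s, end_s, sample_rate=SAMPLE_RATE):
    """Convert a (start, end) time in seconds to integer sample indices."""
    return round(start_s * sample_rate), round(end_s * sample_rate)


def word_spans(record, sample_rate=SAMPLE_RATE):
    """Return (word, start_sample, end_sample) for each word annotation.

    Assumption: record["words"] is a list of dicts with the hypothetical
    keys "word", "start", and "end", where times are in seconds.
    """
    spans = []
    for entry in record["words"]:
        start, end = to_sample_span(entry["start"], entry["end"], sample_rate)
        spans.append((entry["word"], start, end))
    return spans


# Made-up record for illustration only; field layout is assumed.
example = {
    "id": "utt-0001",
    "speaker_id": "spk-01",
    "transcript": "hello world",
    "words": [
        {"word": "hello", "start": 0.25, "end": 0.5},
        {"word": "world", "start": 0.75, "end": 1.0},
    ],
}

for word, start, end in word_spans(example):
    print(word, start, end)
```

With sample indices in hand, the audio array for a sample could be sliced as `samples[start:end]` to isolate one word. The same conversion would apply to the phonemes list if it uses a comparable layout, which should be checked against the actual dataset schema before relying on it.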

Provider:

AdoCleanCode
