AdoCleanCode/portuguese-multi-mfa

Name: AdoCleanCode/portuguese-multi-mfa
Creator: AdoCleanCode
Published: 2025-12-14 21:16:35
License: No description available

Source: Hugging Face. Updated: 2025-12-14. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/AdoCleanCode/portuguese-multi-mfa

Description:

This speech dataset pairs audio with text transcripts and detailed word-level and phoneme-level time alignments. Each sample has a unique id, a speaker id, audio sampled at 16,000 Hz, and the corresponding transcript. The alignments give the start and end time of every word and phoneme. The dataset contains only a training split, with 39,230 samples and a total size of approximately 19.56 GB.
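As a rough illustration of how start and end times relate to 16,000 Hz audio, the sketch below converts a time span into sample indices and cuts the matching slice out of a waveform. The field names (`label`, `start`, `end`), the assumption that times are in seconds, and the example values are all hypothetical; the listing does not give the dataset's actual schema, so check it before adapting this.

```python
SAMPLE_RATE = 16_000  # sampling rate stated in the dataset description


def segment_to_samples(start_s, end_s, sample_rate=SAMPLE_RATE):
    """Convert a (start, end) span in seconds to (start, end) sample indices."""
    if end_s < start_s:
        raise ValueError("end time precedes start time")
    return round(start_s * sample_rate), round(end_s * sample_rate)


def slice_segments(waveform, segments, sample_rate=SAMPLE_RATE):
    """Return (label, samples) pairs, one per aligned word or phoneme.

    `segments` is a list of dicts with hypothetical keys
    "label", "start", and "end" (times assumed to be in seconds).
    """
    clips = []
    for seg in segments:
        lo, hi = segment_to_samples(seg["start"], seg["end"], sample_rate)
        clips.append((seg["label"], waveform[lo:hi]))
    return clips


# Made-up example: one second of silence and two invented word alignments.
waveform = [0.0] * SAMPLE_RATE
words = [
    {"label": "olá", "start": 0.10, "end": 0.45},
    {"label": "mundo", "start": 0.50, "end": 0.95},
]
clips = slice_segments(waveform, words)
```

The same helper would apply to phoneme-level segments, since they are described as carrying the same start and end information.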

Provider:

AdoCleanCode
