mpanda27/voxpopuli_it_pseudo_labelled
Hugging Face · Updated 2024-11-30 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/mpanda27/voxpopuli_it_pseudo_labelled
Description:
This dataset pairs Italian audio with text and is intended primarily for speech recognition. Each example contains an audio ID, the audio data, the normalized text, a condition sequence indicating whether the sample is conditioned on the previous one, and a Whisper transcript. The data is divided into training, validation, and test splits, each with its own size in bytes and number of examples: 12306 training, 708 validation, and 688 test examples.
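The following is a minimal sketch, not an official loader, for pulling the dataset with the Hugging Face `datasets` library and checking the row counts against the split sizes listed above. Two points are assumptions: the split keys are assumed to be `train`, `validation`, and `test`, and the helper names `split_proportions` and `load_and_check` are hypothetical. Loading requires `pip install datasets` and network access to the Hub. The `HF_ENDPOINT` environment variable can point `huggingface_hub` at a mirror such as the one linked above.

```python
# Split sizes as documented on this page (split key names are assumed).
EXPECTED_SPLITS = {"train": 12306, "validation": 708, "test": 688}


def split_proportions(counts):
    """Return each split's share of the total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def load_and_check(repo_id="mpanda27/voxpopuli_it_pseudo_labelled"):
    """Hypothetical helper: load every split and report count mismatches.

    Needs the `datasets` package and access to the Hugging Face Hub
    (or a mirror configured through the HF_ENDPOINT environment variable).
    """
    # Imported lazily so split_proportions() works without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(repo_id)  # a DatasetDict keyed by split name
    for name, expected in EXPECTED_SPLITS.items():
        actual = ds[name].num_rows
        if actual != expected:
            print(f"{name}: expected {expected} rows, found {actual}")
    return ds


if __name__ == "__main__":
    print("total examples:", sum(EXPECTED_SPLITS.values()))
    for name, share in split_proportions(EXPECTED_SPLITS).items():
        print(f"{name}: {share:.1%}")
```

With the documented counts, the splits total 13702 examples: roughly 89.8% training, 5.2% validation, and 5.0% test.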
Provided by:
mpanda27



