mesolitica/IMDA-STT

Name: mesolitica/IMDA-STT
Creator: mesolitica
Published: 2023-12-28 06:55:01
License: No description available

Source: Hugging Face. Updated: 2023-12-28. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/mesolitica/IMDA-STT


Description:

The IMDA National Speech Corpus (NSC) speech-to-text dataset is sourced from IMDA, Singapore. It is divided into six parts, each containing between 1052 and 2162 hours of speech. The audio is stored as MP3 files and packed in 7z archives. Because the HuggingFace datasets format is inefficient for datasets of this size, the MP3 files are read directly instead.
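As a rough illustration of reading the MP3 files directly, the sketch below lazily walks a directory tree and yields every MP3 path it finds. It assumes the 7z archives have already been extracted locally (for example with the `7z x` command); the function name `iter_mp3_files` and the directory name `extracted/` are hypothetical and are not taken from the dataset. Decoding the audio would need a third-party library and is not shown.

```python
from pathlib import Path
from typing import Iterator


def iter_mp3_files(root: str) -> Iterator[Path]:
    """Yield MP3 file paths found under `root`, in sorted order.

    Assumption: `root` is a local directory holding the extracted
    contents of the dataset's 7z archives.
    """
    for path in sorted(Path(root).rglob("*.mp3")):
        # Skip anything that merely looks like an MP3, such as a directory.
        if path.is_file():
            yield path


if __name__ == "__main__":
    # "extracted/" is a placeholder for wherever the archives were unpacked.
    for mp3_path in iter_mp3_files("extracted"):
        print(mp3_path)
```

Yielding paths instead of loading audio up front keeps memory use flat, which matters when each part holds more than a thousand hours of speech.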

Provider:

mesolitica

Summary of original information