DATAI-By-ML-Data-Products/SouthAfrican_Accented_English_SpeechData
Source: Hugging Face | Updated: 2025-12-19 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/DATAI-By-ML-Data-Products/SouthAfrican_Accented_English_SpeechData
Description:
This dataset contains 1 hour of South African–accented English speech paired with human-annotated transcripts. It is a public sample that demonstrates the structure, audio quality, and transcription style of the full commercial version. Audio files are in high-quality .mp3 format and are accompanied by a metadata file containing an ID, file name, and transcript for each recording. Transcripts were produced by an initial Automatic Speech Recognition (ASR) pass followed by human review, and they reflect real-world noise and imperfections. The dataset is suitable for dataset inspection, accent analysis, ASR experimentation, and commercial assessment, but the sample is small and the transcripts have an estimated error rate of approximately 15%. The full 300-hour dataset is available by contacting DATAI.
Provider:
DATAI-By-ML-Data-Products
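
The description quotes an estimated transcription error rate of about 15% but does not say how that figure is measured. As a rough illustration only, the sketch below computes a word-level error rate (word edit distance divided by the number of reference words), which is one common way to compare a transcript against a corrected version. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: the error rate is counted per word; the dataset listing
    does not state the metric actually used.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences, not dataset content.
    corrected = "the cat sat on the mat"
    provided = "the cat sit on mat"
    print(f"WER: {word_error_rate(corrected, provided):.2f}")
```

Applied to a handful of manually corrected transcripts, a check like this gives a quick sense of whether the sample's transcript quality is acceptable for a given ASR experiment.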



