Avalinguo

Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.co/datasets/papasega/Avalinguo-Audio-Dataset-splitted
Summary:
Avalinguo contains 1,424 audio recordings of non-native English speakers, each annotated with a low, medium, or high fluency level. After excluding 36 files that contain very little speech, 1,388 recordings remain. The recordings are sampled at 16 kHz, total roughly 2 hours, and cover conversational spoken English. Within each recording, the speech region lasts between 0.5 and 4.9 seconds, depending on the number of words spoken. A standardized version with transcriptions and statistical analyses is also available online. Dataset size: 1,388 recordings; task: fluency assessment.
Provided by:
Hugging Face
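The figures in the description (16 kHz sampling, about 2 hours in total, speech regions of 0.5 to 4.9 s, three fluency levels) can be sanity-checked once the audio is loaded. The sketch below is a minimal, offline illustration under stated assumptions: the `Recording` layout, its field names, and the lowercase label strings are hypothetical, since the card does not document the dataset's columns, and the per-recording speech duration is assumed to have been measured beforehand by some separate means. It counts recordings per fluency level, converts sample counts to total hours, and flags recordings whose speech region falls outside the reported range.

```python
from collections import Counter
from dataclasses import dataclass

SAMPLE_RATE_HZ = 16_000        # sampling rate stated on the dataset card
SPEECH_RANGE_S = (0.5, 4.9)    # reported span of speech-region durations
FLUENCY_LEVELS = ("low", "medium", "high")  # hypothetical label strings


@dataclass
class Recording:
    # Hypothetical record layout: the card does not document column names.
    file_id: str
    fluency: str           # one of FLUENCY_LEVELS
    num_samples: int       # length of the 16 kHz waveform in samples
    speech_seconds: float  # duration of the speech region, measured elsewhere


def duration_seconds(rec: Recording) -> float:
    """Convert a waveform length in samples to seconds at 16 kHz."""
    return rec.num_samples / SAMPLE_RATE_HZ


def summarize(recordings):
    """Return per-level counts, total hours, and the IDs of recordings whose
    speech region lies outside the reported 0.5-4.9 s span."""
    for rec in recordings:
        if rec.fluency not in FLUENCY_LEVELS:
            raise ValueError(f"unknown fluency label: {rec.fluency!r}")
    counts = Counter(rec.fluency for rec in recordings)
    total_hours = sum(duration_seconds(r) for r in recordings) / 3600
    lo, hi = SPEECH_RANGE_S
    out_of_range = [r.file_id for r in recordings
                    if not lo <= r.speech_seconds <= hi]
    return counts, total_hours, out_of_range


# Arithmetic from the card: 1,424 recordings minus 36 near-silent files.
assert 1_424 - 36 == 1_388

if __name__ == "__main__":
    # Synthetic records for illustration only; not real Avalinguo data.
    demo = [
        Recording("a", "low", 80_000, 1.2),
        Recording("b", "high", 96_000, 4.9),
        Recording("c", "medium", 16_000, 0.2),
    ]
    counts, hours, flagged = summarize(demo)
    print(dict(counts), round(hours, 4), flagged)
```

Applied to the full set, a total near 2 hours over 1,388 recordings would correspond to an average of roughly 5.2 s per recording; any flagged IDs would be worth inspecting before training a fluency classifier.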