High-Quality Global Multi-Accent English Speech Dataset

Name: High-Quality Global Multi-Accent English Speech Dataset (全球多口音英语高质量语音数据集)
Creator: 北京海天瑞声科技股份有限公司
Published: 2026-04-10 19:03:05
License: No description available

Source: National Dataset Management Service Platform (国家数据集管理服务平台). Updated 2026-04-10; indexed 2026-04-29.

Download link:

https://www.ndsms.cn/dataRetrieval/datasetDetail/?id=cb2401722550924daba5f4424bde0dd6


Resource description:

This dataset combines English podcast audio with accented English speech from 65 countries, recorded by a professional team through targeted collection. It totals 1.24 million hours of audio and 125 TB of data, and contains more than 42,000 distinct voices (timbres). It covers standard English, regional dialects, non-native accents, and other speech varieties, and supports research in speech recognition, linguistics, and cross-cultural communication. Over the past three years the dataset has served 44 well-known global enterprises, with application scenarios in commerce and trade, intelligent driving, smart finance, and education and research.
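The stated figures (1.24 million hours, 125 TB, more than 42,000 voices) allow a rough sanity check of the implied average storage rate. The sketch below is illustrative only: it assumes decimal terabytes (10^12 bytes) and that the full 125 TB is audio, neither of which the listing states, and the function names are hypothetical.

```python
# Figures taken from the dataset description.
TOTAL_HOURS = 1_240_000   # 1.24 million hours
TOTAL_TERABYTES = 125     # 125 TB
NUM_VOICES = 42_000       # "more than 42,000", so a lower bound

# Assumption (not stated in the listing): 1 TB = 10**12 bytes.
BYTES_PER_TB = 10**12


def average_bitrate_kbps(total_tb, total_hours, bytes_per_tb=BYTES_PER_TB):
    """Average bitrate in kbit/s if the whole volume were audio."""
    total_bits = total_tb * bytes_per_tb * 8
    total_seconds = total_hours * 3600
    return total_bits / total_seconds / 1000


def megabytes_per_hour(total_tb, total_hours, bytes_per_tb=BYTES_PER_TB):
    """Average storage per hour of audio, in decimal megabytes."""
    return total_tb * bytes_per_tb / total_hours / 1e6


def hours_per_voice(total_hours, num_voices):
    """Average hours per voice; an upper bound, since num_voices is a lower bound."""
    return total_hours / num_voices


if __name__ == "__main__":
    print(f"{average_bitrate_kbps(TOTAL_TERABYTES, TOTAL_HOURS):.1f} kbit/s")
    print(f"{megabytes_per_hour(TOTAL_TERABYTES, TOTAL_HOURS):.1f} MB per hour")
    print(f"{hours_per_voice(TOTAL_HOURS, NUM_VOICES):.1f} hours per voice (at most)")
```

Under these assumptions the average comes to roughly 224 kbit/s, the same order of magnitude as uncompressed 16 kHz, 16-bit mono PCM (256 kbit/s). The listing does not state the actual audio formats, so this is only a plausibility check.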

Provider:

北京海天瑞声科技股份有限公司

Created:

2026-04-10

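Collected summary

Dataset introduction

Background and challenges

Background overview

This is a very large, high-quality corpus of multi-accent English speech. It combines English podcast audio with accent data from 65 countries collected by a professional team, totaling 1.24 million hours and 125 TB, with more than 42,000 distinct voices. It covers standard English, regional dialects, and non-native accents, supports work in speech recognition, linguistic research, and cross-cultural communication, and has served 44 well-known global enterprises in scenarios spanning commerce and trade, intelligent driving, smart finance, and education and research.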

The above content was collected and summarized by 遇见数据集.