Common Voice

Source: arXiv
Updated: 2020-03-06
Indexed: 2024-06-21
Download link:
http://voice.mozilla.org
Description:
Common Voice is a large-scale multilingual speech dataset created by Mozilla to support research and development in speech technology. Its data is collected and validated through crowdsourcing; the dataset currently covers 38 languages, with more than 2,500 hours of audio in total. The amount of data per language ranges from hundreds to tens of thousands of entries, most of them recordings contributed by community members. Using tools and platforms provided by Mozilla, such as Pontoon and Sentence Collector, community members translate the interface, submit text, record audio, and validate the collected data through a voting system. Common Voice is used mainly for automatic speech recognition (ASR) research, particularly in support of under-resourced languages.
Provider:
Mozilla
Created:
2019-12-14