
ASR-GigaSpeech-MulDomESC: A Multi-domain English Speech Corpus

MagicHub Open Source Community. Updated 2021-07-29. Indexed 2024-06-08.
Download link:
https://magichub.com/datasets/giga-speech/
Resource overview:
GigaSpeech, prepared and released by SpeechColab, is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training and 33,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 33,000 hours of transcribed audio is first collected from audiobooks, podcasts, and YouTube, covering both read and spontaneous speaking styles and a variety of topics such as arts, science, and sports. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training and to filter out segments with low-quality transcription.
Created: 2021-07-29