
AISHELL-1

Source: DataCite Commons. Updated 2021-07-01; indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2018S14
Description:
Introduction

AISHELL-1 was developed by Beijing Shell Shell Technology Co., Ltd. (http://www.aishelltech.com/sy). It contains approximately 520 hours of Mandarin Chinese speech from 400 speakers, recorded simultaneously on three different devices, with associated transcripts.

The goal of the collection was to support speech recognition system development in 11 domains, five of which are included in this corpus: Finance, Science & Technology, Sports, Entertainment, and News. Participants read 500 sentences covering these domains; sentences were chosen for their speech and phonetic characteristics.

Speakers were recruited from different accent areas across China, including the North, South, and Yue-Gui-Min regions. There were 214 female and 186 male speakers, constituting 53% and 47% of the database, respectively. Additional demographic information about the participants is included in this release.

Data

Speech was recorded in a quiet indoor environment on a high-fidelity microphone and two mobile phones (Android and iOS). All speech is presented as 16-bit FLAC-compressed WAV files; the microphone speech sample rate is 44.1 kHz and the phone speech sample rate is 16 kHz. Each speech file is approximately 1 to 14 seconds long.

Transcripts are stored as UTF-8 encoded plain text files and are not time-aligned.

Samples

Please view the following samples:
- Microphone: desc/addenda/LDC2018S14.mic.flac
- Android: desc/addenda/LDC2018S14.and.flac
- iOS: desc/addenda/LDC2018S14.ios.flac
- Transcript: desc/addenda/LDC2018S14.txt

Updates

None at this time.

Portions © 2018 Beijing Shell Shell Technology Co., Ltd., © 2018 Trustees of the University of Pennsylvania
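The Data section fixes enough parameters to size the audio before working with it: 16-bit samples, 44.1 kHz for the microphone channel, 16 kHz for the phone channels, and files of roughly 1 to 14 seconds. The Python sketch below is illustrative only. It assumes single-channel (mono) files, which the description does not state, and the helper names (`num_samples`, `pcm_bytes`, `resample_ratio`) are hypothetical. It computes decoded sample counts and uncompressed PCM sizes (FLAC files on disk will be smaller, by a content-dependent amount). It also computes the reduced rational factor for converting microphone audio to the phones' 16 kHz rate, a typical step if the three devices are to be compared or pooled.

```python
from fractions import Fraction

# Parameters taken from the corpus description.
MIC_RATE_HZ = 44_100    # high-fidelity microphone channel
PHONE_RATE_HZ = 16_000  # Android and iOS channels
BYTES_PER_SAMPLE = 2    # 16-bit samples
# ASSUMPTION: one audio channel per file; the description does not say.
CHANNELS = 1


def num_samples(duration_s: float, rate_hz: int) -> int:
    """Samples per channel in a clip of the given duration."""
    return round(duration_s * rate_hz)


def pcm_bytes(duration_s: float, rate_hz: int) -> int:
    """Size of the decoded (uncompressed) 16-bit PCM audio, not the FLAC file."""
    return num_samples(duration_s, rate_hz) * BYTES_PER_SAMPLE * CHANNELS


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Reduced factor for src_hz -> dst_hz: up = numerator, down = denominator."""
    return Fraction(dst_hz, src_hz)


if __name__ == "__main__":
    for rate in (MIC_RATE_HZ, PHONE_RATE_HZ):
        for dur in (1, 14):  # approximate shortest and longest files
            print(f"{rate} Hz, {dur:>2} s: {num_samples(dur, rate):>7} samples, "
                  f"{pcm_bytes(dur, rate):>9} bytes decoded")
    ratio = resample_ratio(MIC_RATE_HZ, PHONE_RATE_HZ)
    print(f"mic -> phone: up by {ratio.numerator}, down by {ratio.denominator}")
```

The reduced factor works out to 160/441, so a polyphase resampler such as `scipy.signal.resample_poly(x, 160, 441)` would bring microphone audio to 16 kHz. Which resampling library to use is left open here; the arithmetic does not depend on it.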
Provider:
Linguistic Data Consortium
Created:
2020-11-30