
Wikipedia Spanish Speech and Transcripts

Source: DataCite Commons. Updated: 2021-08-12. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC2021S07
Description:
Introduction

Wikipedia Spanish Speech and Transcripts consists of approximately 25 hours of Spanish read speech and transcripts. The read text was taken from the Spanish version of WikiProject Spoken Wikipedia (https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spoken_Wikipedia), referred to as Wikipedia Grabada (https://es.wikipedia.org/wiki/Wikiproyecto:Wikipedia_grabada). The transcripts were developed for this release.

Data

The audio consists of short recordings of Wikipedia articles read by 193 speakers (150 male, 43 female). The audio was segmented and transcribed by native Spanish speakers.

Audio is presented as 16 kHz, 16-bit, single-channel FLAC files. When uncompressed, they produce PCM WAV files.

Transcripts are contained in a single plain text file encoded as UTF-8. Speaker metadata is also provided.

Samples

Please view the following samples:

- Female Speech (desc/addenda/LDC2021S07.f.flac)
- Female Transcript (desc/addenda/LDC2021S07.f.txt)
- Male Speech (desc/addenda/LDC2021S07.m.flac)
- Male Transcript (desc/addenda/LDC2021S07.m.txt)

Acknowledgements

The authors thank Alberto Templos Carbajal, Elena Vera and Angélica Gutiérrez for their support of the social service program "Desarrollo de Tecnologías del Habla" at the Facultad de Ingeniería (FI) of the Universidad Nacional Autónoma de México (UNAM), and also thank the social service students for all their hard work.

Updates

None at this time.

Portions © 2021 Carlos Daniel Hernández Mena, © 2021 Trustees of the University of Pennsylvania
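The audio format stated in the description (16 kHz, 16-bit, single channel, PCM WAV after decompression) can be checked with Python's standard `wave` module. The following is a minimal sketch, not part of the release: the function names are hypothetical, and it assumes each FLAC file has already been decoded to WAV (for example with the reference `flac -d` tool). It also shows the arithmetic for the approximate uncompressed size of the corpus.

```python
import wave

# Expected format, taken from the corpus description.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # single channel


def describe_wav(path):
    """Return (sample_rate, sample_width_bytes, channels, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, width, channels, duration


def matches_expected_format(path):
    """True if the WAV file is 16 kHz, 16-bit, single channel."""
    rate, width, channels, _ = describe_wav(path)
    expected = (EXPECTED_RATE_HZ, EXPECTED_SAMPLE_WIDTH_BYTES, EXPECTED_CHANNELS)
    return (rate, width, channels) == expected


def uncompressed_pcm_bytes(hours):
    """Raw PCM size in bytes for `hours` of audio in the expected format (headers excluded)."""
    bytes_per_second = EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    return int(hours * 3600 * bytes_per_second)
```

At 32,000 bytes per second, the stated 25 hours of speech correspond to roughly 2.88 GB of raw PCM data, so `uncompressed_pcm_bytes(25)` gives a rough lower bound on the disk space needed after decoding.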
Provider:
Linguistic Data Consortium
Created:
2021-08-09