CRPIH_UVigo-GL-Voices: Galician TTS dataset

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/8027724
Description:
CRPIH_UVigo-GL-Voices is a Galician multi-speaker TTS dataset containing audio recordings from four speakers (two female and two male voices). The characteristics of each voice are detailed in the table below:

| Voice Name | Gender | Speaker | Recording | # Utts | Duration | Sampling Rate | Format |
|---|---|---|---|---|---|---|---|
| Sabela | Female | Professional radio broadcaster | Professional studio | 9,999 | 14h 28m | 16 kHz / 44 kHz | 16-bit PCM |
| Icía | Female | Amateur | Semi-professional studio | 2,950 | 4h 5m | 16 kHz / 96 kHz | 16/24-bit PCM |
| Iago | Male | Amateur | Radio studio | 1,316 | 1h 13m | 16 kHz / 48 kHz | 16-bit PCM |
| Paulo | Male | Amateur | Radio studio | 1,316 | 1h 15m | 16 kHz | 16-bit PCM |

Each speaker recorded a subset of utterances from a text corpus of 10,000 sentences, each between 1 and 44 words long. The corpus consists mainly of press excerpts, plus a small subset of manually designed sentences. The press excerpts were extracted from newspapers published before 2010 ("O Correo Galego", "Galicia Hoxe" and "Vieiros"), whereas the hand-crafted sentences were created at the CRPIH in 1999.

The data is organized into folders, one per speaker. Each speaker's folder contains the following subdirectories:

txt → Audio transcripts encoded in ISO 8859-1.
fon → Phoneme-level forced alignment between phonemic transcriptions (provided by Cotovía) and audio recordings.
wav_[bit depth]bits_[sampling rate]kHz → WAV format audio files at [sampling-rate]kHz [bit-depth]-bit (see table above).

The file naming convention is as follows: two lowercase elements indicating the creators of the dataset (“crpih_uvigo”), the ISO code for the Galician language (“gl”), the name of the voice (e.g., “sabela”), and a 5-digit number identifying the utterance. All components are separated by underscores (e.g., “crpih_uvigo_gl_sabela_00001.txt”). For some of the speakers, there is an additional element identifying the dataset split (e.g., “crpih_uvigo_gl_iago_a_00001.txt” and “crpih_uvigo_gl_iago_b_00001.txt”).

Acknowledgements
We would like to thank the speakers for recording and donating their voices.
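The file naming convention and transcript encoding described above can be sketched in Python as follows. This is a minimal illustration, not tooling shipped with the dataset: the names `FILENAME_RE`, `UtteranceFile`, `parse_filename` and `read_transcript` are hypothetical, and the pattern assumes that voice names appear in plain lowercase ASCII and that the split marker is a single lowercase letter, as in the "a" and "b" examples.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Pattern inferred from the description: fixed "crpih_uvigo_gl" prefix,
# voice name, optional split marker, 5-digit utterance number, extension.
# ASSUMPTION: voice names are lowercase ASCII and the split marker is one
# lowercase letter (only "a" and "b" are shown in the description).
FILENAME_RE = re.compile(
    r"^crpih_uvigo_gl_(?P<voice>[a-z]+)"
    r"(?:_(?P<split>[a-z]))?"
    r"_(?P<utt>\d{5})\.(?P<ext>[a-z0-9]+)$"
)


class UtteranceFile(NamedTuple):
    voice: str
    split: Optional[str]  # None when the file name has no split element
    utterance: int
    extension: str


def parse_filename(name: str) -> UtteranceFile:
    """Split a dataset file name into its components."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return UtteranceFile(
        voice=match["voice"],
        split=match["split"],
        utterance=int(match["utt"]),
        extension=match["ext"],
    )


def read_transcript(path: Path) -> str:
    """Read a transcript from a txt subdirectory (ISO 8859-1 encoded)."""
    return path.read_text(encoding="iso-8859-1").strip()


if __name__ == "__main__":
    print(parse_filename("crpih_uvigo_gl_sabela_00001.txt"))
    print(parse_filename("crpih_uvigo_gl_iago_a_00001.txt"))
```

Passing the encoding explicitly matters here: opening the transcripts with a UTF-8 default would either fail or garble accented Galician characters.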
Created:
2024-03-05