Central Kurdish (CK) Speech Corpus for Text-to-Speech
Source: DataCite Commons. Updated: 2025-05-12. Indexed: 2025-05-17
Download link:
https://data.mendeley.com/datasets/jmtn248cc9/2
Description:
This dataset is a resource for advancing Kurdish text-to-speech (TTS) systems. Text-to-speech conversion is an important topic in the design and construction of multimedia systems, human-machine communication, and information and communication technology. Together with speech recognition, its purpose is to let humans and machines communicate in the most basic and natural form: spoken language.
For the text corpus, we collected 6,565 training sentences from texts in various categories: Kurdish books, news, sport, health, interrogative and exclamatory sentences, science, general information, politics, education and literature, stories, and tourism. We thoroughly reviewed and normalized the texts, which were then recorded by a male speaker. The audio was recorded in a recording studio at 44,100 Hz, and all audio files were downsampled to 22,050 Hz for our modeling process. Individual recordings range from 3 to 36 seconds in length. The resulting speech corpus contains about 6,565 text-audio pairs, totaling around 19 hours. Audio files are saved in WAV format, and the texts are saved in text files in the corresponding sub-folders. In addition, for model training, all of the audio files are gathered in a single folder. Each line in the transcript files is formatted as WAVS | audio file’s name.wav| transcript, where the audio file name includes the extension and the transcript is the text of the utterance.
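The transcript line format described above can be read with a few lines of Python. This is a minimal sketch, not the authors' tooling: the names `Entry` and `parse_line` are hypothetical, the first field is assumed to be a folder label, and the example file name and transcript are placeholders. Whitespace around the separators is stripped because the stated format is not explicit about spacing.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    folder: str    # first field, e.g. "WAVS" (assumed to be a folder label)
    wav_name: str  # audio file name, including the .wav extension
    text: str      # transcript of the utterance


def parse_line(line: str) -> Optional[Entry]:
    """Parse one 'WAVS | name.wav | transcript' line; return None if malformed."""
    # Split on the first two pipes only, so a pipe inside the transcript survives.
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3 or not parts[1].lower().endswith(".wav"):
        return None
    return Entry(*parts)


# Placeholder line for illustration; real file names and text will differ.
entry = parse_line("WAVS | 0001.wav| example transcript")
print(entry)
```

Returning `None` for malformed lines, rather than raising, lets a caller count and inspect bad rows before training.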
The audio recording and editing process took 90 days and produced over 6,565 WAV files and over 19 hours of recorded speech. The dataset lets researchers begin improving Kurdish TTS sooner, reducing the time that such development would otherwise take.
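As a quick consistency check on the stated figures, the mean clip length can be estimated from the total duration and file count. The sketch below uses the rounded values from the description (19 hours, 6,565 files), so the result is approximate; `mean_clip_seconds` is a hypothetical helper name.

```python
def mean_clip_seconds(total_hours: float, n_clips: int) -> float:
    """Average clip duration in seconds, given total hours and clip count."""
    return total_hours * 3600 / n_clips


avg = mean_clip_seconds(19, 6565)
print(f"{avg:.1f} s")  # prints "10.4 s"
```

An average of roughly 10 seconds falls inside the stated 3 to 36 second range of clip lengths.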
Acknowledgments:
We would like to express our sincere gratitude to Ayoub Mohammadzadeh for his invaluable support in recording the corpus.
Provider:
Mendeley Data
Created:
2025-05-12



