YembaTones: An Annotated Dataset for Tonal and Syllabic Analysis of the Yemba Language

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://data.mendeley.com/datasets/cx268tmrwn
Description:
YembaTones is an annotated dataset of tonal and syllabic variation in the Yemba language. It was created to support automatic tone detection and to expand the resources available for speech recognition and synthesis in this tonal language. The dataset is derived from a dictionary of 344 Yemba/French words selected from commonly used phrases in the language. The words are grouped according to the tonal differences in their spelling. Audio recordings of these words were made by 11 native Yemba speakers, most of them linguistics specialists with a strong command of the language's sounds. The recordings were made in various locations, including speakers' homes, university campuses, and workplaces, and were then cleaned and segmented in Audacity into individual audio files, one per isolated word pronunciation. The resulting 3420 high-quality audio files are annotated at the syllabic and tonal levels using Praat. Beyond training and evaluating automatic tone detection models, the dataset can support automatic speech recognition and speech synthesis for tonal and low-resource languages, as well as research in prosody, Yemba phonetics, speech acoustics, and phonetic linguistics. By addressing the scarcity of resources in this domain, it provides a foundation for more accurate and effective speech processing applications for tonal languages.
Created:
2023-10-25
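
The description says the syllabic and tonal annotations were made in Praat, which normally stores annotations as TextGrid files. The sketch below shows one way such an annotation could be read in plain Python, assuming the long (verbose) TextGrid text format with interval tiers. The tier names `syllable` and `tone`, the labels, and the timings in the sample are invented placeholders; the actual file layout and tier names of YembaTones are not stated here and should be checked against the downloaded files.

```python
import re

# One interval in Praat's long TextGrid format: xmin, xmax, then the label.
# Doubled quotes ("") inside a label are Praat's escape for a literal quote.
# Simplification: only plain non-negative decimal times are recognised.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*xmin = ([\d.]+)\s*xmax = ([\d.]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)
TIER_NAME_RE = re.compile(r'name = "([^"]*)"')


def parse_textgrid(text):
    """Return {tier_name: [(start, end, label), ...]} for interval tiers."""
    tiers = {}
    # Each tier starts with "item [n]:"; the chunk before the first one
    # is the file header and is skipped.
    for chunk in re.split(r'item \[\d+\]:', text)[1:]:
        name = TIER_NAME_RE.search(chunk)
        if name is None:
            continue
        tiers[name.group(1)] = [
            (float(start), float(end), label.replace('""', '"'))
            for start, end, label in INTERVAL_RE.findall(chunk)
        ]
    return tiers


# Invented sample: tier names, labels, and times are placeholders only.
SAMPLE_TEXTGRID = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.8
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "syllable"
        xmin = 0
        xmax = 0.8
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.4
            text = "s1"
        intervals [2]:
            xmin = 0.4
            xmax = 0.8
            text = "s2"
    item [2]:
        class = "IntervalTier"
        name = "tone"
        xmin = 0
        xmax = 0.8
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.4
            text = "H"
        intervals [2]:
            xmin = 0.4
            xmax = 0.8
            text = "L"
'''

if __name__ == "__main__":
    parsed = parse_textgrid(SAMPLE_TEXTGRID)
    # Pair each syllable with the tone interval at the same position.
    for (start, end, syl), (_, _, tone) in zip(parsed["syllable"], parsed["tone"]):
        print(f"{syl}: tone {tone}, {end - start:.2f} s")
```

Pairing by position only works when both tiers share the same interval boundaries. If the real annotations do not, matching on overlapping time ranges would be the safer choice.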