Killkan
Source: arXiv · Updated 2024-04-24 · Indexed 2024-06-21
Download link:
https://github.com/ctaguchi/killkan
Description:
Killkan is the first automatic speech recognition (ASR) dataset for the Kichwa language, created jointly by the University of Notre Dame and the Pontificia Universidad Católica del Ecuador. It contains about 4 hours of audio, paired with transcriptions, Spanish translations, and morphosyntactic annotations. The audio comes from publicly available Kichwa-language radio broadcasts, and the dataset pays particular attention to two features of the data: Kichwa's agglutinative morphology and frequent code-switching between Kichwa and Spanish. Killkan made it possible to build the first ASR system for Kichwa, and it shows that building resources for a low-resource language can yield practical applications. Its broader aim is to help address language endangerment and the scarcity of language technology resources.
Provided by:
University of Notre Dame
Created:
2024-04-24



