Automatic Catalan KWS Database for Projecte AINA
Source: DataCite Commons. Updated: 2025-06-10. Indexed: 2024-07-13.
Download link:
https://dataverse.csuc.cat/citation?persistentId=doi:10.34810/data1400
Description:
A database of Catalan words extracted automatically by forced alignment (Montreal Forced Aligner, MFA) from transcribed speech corpora, specifically Mozilla Common Voice, ParlamentParla, and OpenSLR-69. It can be used to train keyword spotting models for home automation.
MFA time-aligns each speech signal with its transcription down to the phoneme level.
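As a rough illustration of how time-aligned words can become individual keyword clips, the sketch below converts word intervals into sample ranges and slices an audio buffer. It assumes the alignment has already been parsed into (start_seconds, end_seconds, word) tuples; the function names, the tuple layout, and the example words are hypothetical and are not taken from the dataset's own tooling.

```python
def word_sample_ranges(intervals, sample_rate, padding_s=0.0):
    """Convert aligned word intervals (in seconds) to sample index ranges.

    `intervals` is a hypothetical list of (start_s, end_s, word) tuples.
    Intervals with an empty label (silence) are skipped.
    """
    ranges = []
    for start_s, end_s, word in intervals:
        if not word.strip():
            continue  # silence or unlabeled segment
        start = max(0, int(round((start_s - padding_s) * sample_rate)))
        end = int(round((end_s + padding_s) * sample_rate))
        if end > start:
            ranges.append((word, start, end))
    return ranges


def cut_words(samples, intervals, sample_rate):
    """Return (word, clip) pairs, where each clip is a slice of `samples`."""
    return [
        (word, samples[start:end])
        for word, start, end in word_sample_ranges(intervals, sample_rate)
    ]


# Illustrative input: one second of silence-like samples at 16 kHz.
example_intervals = [(0.0, 0.25, ""), (0.25, 0.75, "obre")]
example_clips = cut_words([0] * 16000, example_intervals, 16000)
```

The optional padding_s argument is a design choice for this sketch only: a small margin around each word can keep onsets and offsets from being clipped.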
Two versions of the database have been created:
general: This version encompasses all data, providing a comprehensive dataset for various analyses and applications.
split: This version is divided into train, dev, and test sets to ease the task of training a keyword spotting model. The split is made by speaker: 80%, 10%, and 10%, respectively.
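A speaker-wise split of this kind can be sketched as follows. This is a minimal example under stated assumptions, not the procedure used to build the dataset: it assumes each clip carries a speaker identifier, that the 80/10/10 proportions apply to the number of speakers, and that no speaker appears in more than one subset. The function name, the (speaker_id, clip) layout, and the seed are hypothetical.

```python
import random
from collections import defaultdict


def split_by_speaker(clips, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (speaker_id, clip) pairs into train/dev/test by speaker.

    All clips from one speaker land in the same subset, so the subsets
    share no speakers. Proportions are applied to the speaker count.
    """
    by_speaker = defaultdict(list)
    for speaker_id, clip in clips:
        by_speaker[speaker_id].append(clip)

    # Sort first so the shuffle is reproducible for a given seed.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_train = int(len(speakers) * ratios[0])
    n_dev = int(len(speakers) * ratios[1])
    groups = {
        "train": speakers[:n_train],
        "dev": speakers[n_train:n_train + n_dev],
        "test": speakers[n_train + n_dev:],  # remainder goes to test
    }
    return {
        name: [(s, c) for s in ids for c in by_speaker[s]]
        for name, ids in groups.items()
    }


# Illustrative input: 10 hypothetical speakers with 2 clips each.
example = [(f"spk{i}", f"clip{i}_{j}") for i in range(10) for j in range(2)]
subsets = split_by_speaker(example)
```

Splitting on speakers rather than on clips means the clip counts per subset only approximate 80/10/10 when speakers contribute different amounts of audio.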
Provider:
CORA. Repositori de Dades de Recerca
Created:
2024-06-04



