five

Donate Speech Corpus: Training data (100h)

收藏
Mendeley Data2024-01-31 更新2024-06-27 收录
下载链接:
https://etsin.fairdata.fi/dataset/aa707a22-77bd-49cf-9a25-440f7e20a152
下载链接
链接失效反馈
官方服务:
资源简介:
This resource is available for download in Kielipankki - The Language Bank of Finland as part of "Donate Speech: Selected dataset", http://urn.fi/urn:nbn:fi:lb-2022060127. The resource contains a subset of 100 hours of transcribed speech that was selected from the Donate Speech Corpus and used for training an ASR system at Aalto University. The training data includes speech from 1129 different speakers (according to the metadata accompanying the original recordings). Note that the training dataset has just over 20% of male speakers, whereas the puhelahjat-test and puhelahjat-dev sets contain 40% of male speakers. For speech technology development purposes, the training dataset can be used together with the puhelahjat-test and puhelahjat-dev datasets. There is no overlap of speakers between these three sets.
创建时间:
2024-01-31
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作