Donate Speech Corpus: Training data (100h)

Mendeley Data · Updated 2024-01-31 · Indexed 2024-06-27

Download link:

https://etsin.fairdata.fi/dataset/aa707a22-77bd-49cf-9a25-440f7e20a152

Resource description:

This resource is available for download from Kielipankki - The Language Bank of Finland as part of "Donate Speech: Selected dataset", http://urn.fi/urn:nbn:fi:lb-2022060127. It contains a 100-hour subset of transcribed speech that was selected from the Donate Speech Corpus and used to train an ASR system at Aalto University. The training data includes speech from 1129 different speakers, according to the metadata accompanying the original recordings. Note that just over 20% of the speakers in the training set are male, whereas male speakers make up 40% of the puhelahjat-test and puhelahjat-dev sets. For speech technology development, the training set can be used together with the puhelahjat-test and puhelahjat-dev sets. No speaker appears in more than one of these three sets.
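When combining the training set with puhelahjat-dev and puhelahjat-test, it can be useful to confirm the stated speaker disjointness and to check the male-speaker share of each split yourself. The following is a minimal sketch only. The speaker IDs and gender labels are hypothetical toy values, and the functions `check_speaker_disjoint` and `male_share` are illustrative helpers. The actual metadata format of the corpus is not described here and would need to be parsed into these structures first.

```python
from itertools import combinations


def check_speaker_disjoint(splits: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of splits that shares at least one speaker ID.

    An empty list means the splits are speaker-disjoint.
    """
    overlaps = []
    for (name_a, spk_a), (name_b, spk_b) in combinations(splits.items(), 2):
        shared = spk_a & spk_b  # speakers present in both splits
        if shared:
            overlaps.append((name_a, name_b, shared))
    return overlaps


def male_share(genders: dict[str, str]) -> float:
    """Fraction of speakers whose (hypothetical) gender label is 'male'."""
    if not genders:
        return 0.0
    return sum(g == "male" for g in genders.values()) / len(genders)


# Hypothetical toy data; real speaker IDs must come from the corpus metadata.
splits = {
    "train": {"spk001", "spk002", "spk003"},
    "puhelahjat-dev": {"spk101", "spk102"},
    "puhelahjat-test": {"spk201", "spk202"},
}
print(check_speaker_disjoint(splits))  # []

train_genders = {"spk001": "male", "spk002": "female", "spk003": "female",
                 "spk004": "female", "spk005": "female"}
print(male_share(train_genders))  # 0.2
```

Using sets keeps each pairwise overlap check to a single intersection. Computing the male share per split makes the gender imbalance between the training and evaluation sets easy to report.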

Created:

2024-01-31