XBMU-bo-Lhasa31: A Speech Recognition Dataset for the Lhasa Dialect of Tibetan
Source: DataCite Commons | Updated: 2025-07-01 | Indexed: 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=bdd9e8849a584d7d9152163022e58c6c
Description:
The dataset consists of audio files, text files, and description files. (1) wav is the audio folder, divided into 51 subfolders by speaker. It contains 24,289 speech samples with a total duration of 31.61 hours (4.68 seconds each on average) and a total size of 2.68 GB. (2) The text in the transcript file corresponds one-to-one with the audio. All text comes from the news domain, and symbols in the text that are not pronounced have been normalized. (3) readme.txt contains basic information about the dataset. (4) resource_lexicon.txt is the pronunciation lexicon file.
Provider:
Science Data Bank
Created:
2025-07-01
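
The folder layout given in the description (a wav folder with one subfolder per speaker) can be sanity-checked after download against the published counts. The following is a minimal sketch, not part of the dataset's own tooling: the function name `summarize_corpus`, the `*.wav` file extension inside each speaker folder, and the assumption that the files are PCM WAV readable by Python's standard `wave` module are all illustrative guesses.

```python
import wave
from pathlib import Path


def summarize_corpus(wav_root):
    """Count speakers and utterances and sum audio duration under wav_root.

    Assumes the layout wav_root/<speaker>/<utterance>.wav. The one-folder-
    per-speaker structure comes from the dataset description; the file
    naming and the PCM WAV encoding are assumptions.
    """
    root = Path(wav_root)
    speakers = 0
    utterances = 0
    total_seconds = 0.0

    # Each immediate subdirectory is treated as one speaker.
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        speakers += 1
        for wav_path in sorted(speaker_dir.glob("*.wav")):
            with wave.open(str(wav_path), "rb") as wav_file:
                # Duration = number of frames / frames per second.
                total_seconds += wav_file.getnframes() / wav_file.getframerate()
            utterances += 1

    mean_seconds = total_seconds / utterances if utterances else 0.0
    return {
        "speakers": speakers,
        "utterances": utterances,
        "total_hours": total_seconds / 3600.0,
        "mean_seconds": mean_seconds,
    }
```

Calling `summarize_corpus("wav")` on an intact copy should report values close to the published ones: 51 speakers, 24,289 utterances, and about 31.61 hours. The published figures are also internally consistent, since 31.61 h × 3600 s/h ÷ 24,289 samples is roughly 4.685 s per sample, in line with the stated average of 4.68 seconds.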



