
XBMU-bo-Lhasa31: A Speech Recognition Dataset for the Lhasa Dialect of Tibetan

Source: DataCite Commons. Updated: 2025-07-01. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=bdd9e8849a584d7d9152163022e58c6c
Resource description:
The dataset consists of audio files, a transcript file, and description files: (1) The wav folder holds the audio, divided into 51 subfolders by speaker. It contains 24,289 speech samples totaling 31.61 hours (2.68 GB), with an average duration of 4.68 seconds per sample. (2) The transcript file corresponds to the audio one-to-one. All text is drawn from the news domain, and symbols in the text that are not pronounced are normalized. (3) readme.txt contains basic information about the dataset. (4) resource_lexicon.txt is the pronunciation lexicon.
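The layout described above can be sanity-checked after download. The following is a minimal sketch, not a script shipped with the dataset: it assumes the audio is stored as standard PCM files with a `.wav` extension directly under `wav/<speaker>/`, and `summarize_corpus` is a hypothetical helper name.

```python
import wave
from pathlib import Path


def summarize_corpus(wav_root):
    """Return (speaker_count, utterance_count, total_seconds).

    Assumes a wav_root/<speaker>/*.wav layout (an assumption based on the
    description, not confirmed against the actual archive).
    """
    root = Path(wav_root)
    # Each subfolder of the audio root is treated as one speaker.
    speakers = [d for d in sorted(root.iterdir()) if d.is_dir()]
    utterances = 0
    total_seconds = 0.0
    for speaker_dir in speakers:
        for wav_path in sorted(speaker_dir.glob("*.wav")):
            with wave.open(str(wav_path), "rb") as wav_file:
                # Duration = number of frames / sampling rate.
                total_seconds += wav_file.getnframes() / wav_file.getframerate()
            utterances += 1
    return len(speakers), utterances, total_seconds
```

If the assumptions hold, calling `summarize_corpus("wav")` on the extracted dataset should report figures close to those in the description: 51 speakers, 24,289 samples, and about 31.61 hours (roughly 113,800 seconds) of audio.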
Provider:
Science Data Bank
Created:
2025-07-01