A speech dataset of three ethnic languages of Bangladesh: Chakma, Garo, and Marma.
Source: Mendeley Data. Indexed: 2026-04-09
Download link:
https://data.mendeley.com/datasets/yjhybztwf4/1
Description:
This is a managed speech corpus of three ethnic languages of Bangladesh: Chakma, Marma, and Garo. It comprises 2321 short audio recordings (each 1 to 4 seconds long) from 11 native speakers (aged 20 to 26) reading 211 predefined sentences in the Bengali language. The recordings were made on smartphones in various acoustic environments and are accompanied by standardized metadata: speaker ID, age, gender, ethnicity, recording environment, recording device, and file length. The data can be used for automatic speech recognition, speaker and language recognition, acoustic modeling, and other low-resource speech processing tasks.
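As an illustration of how the metadata fields listed in the description might be represented and sanity-checked, here is a minimal Python sketch. The `Recording` class, its field names, the `is_plausible` helper, and the sample values are all hypothetical; the actual file layout and column names in the Mendeley archive are not given here and may differ.

```python
from dataclasses import dataclass

# Hypothetical record layout: the fields mirror the metadata named in the
# description, but the real column names in the dataset may differ.
@dataclass
class Recording:
    speaker_id: str
    age: int
    gender: str
    ethnicity: str      # assumed to be one of the three communities below
    environment: str
    device: str
    duration_s: float   # file length in seconds

LANGUAGES = {"Chakma", "Garo", "Marma"}

def is_plausible(rec: Recording) -> bool:
    """Check a record against the ranges stated in the description."""
    return (
        rec.ethnicity in LANGUAGES
        and 20 <= rec.age <= 26            # speakers are 20 to 26 years old
        and 1.0 <= rec.duration_s <= 4.0   # clips are 1 to 4 seconds long
    )

# Made-up sample values, for illustration only.
sample = Recording("S01", 22, "female", "Chakma", "indoor", "smartphone", 2.5)
print(is_plausible(sample))
```

A check like this could flag rows that fall outside the stated age or duration ranges before the recordings are used for training.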
Provider:
Daffodil International University



