NDD
arXiv, indexed 2025-09-30
Download link:
https://github.com/EncryptedBinary/BanglaDialecto
Summary:
This dataset contains dialectal speech recordings for fine-tuning Automatic Speech Recognition (ASR) models and Large Language Models (LLMs). The goal is to transcribe dialectal speech into dialectal text and then translate that text into Standard Bengali. Each annotated audio clip is paired with its dialectal transcription and the corresponding Standard Bengali translation. The dataset has 7,200 samples in total: 6,270 for training, 810 for validation, and 120 for testing. The supported tasks are Automatic Speech Recognition (ASR) and Machine Translation (MT).
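The description implies that every sample aligns three fields, so a single record can feed both supported tasks. The following is a minimal Python sketch that assumes a hypothetical record type. `DialectSample`, its field names, and the helper functions are illustrative only and do not reflect the repository's actual schema. The sketch shows the ASR and MT pairings and checks the stated split arithmetic.

```python
from dataclasses import dataclass


# Hypothetical record layout: field names are illustrative, NOT the repository's schema.
@dataclass(frozen=True)
class DialectSample:
    audio_path: str     # dialectal speech clip (ASR input)
    dialect_text: str   # dialectal transcription (ASR target, MT source)
    standard_text: str  # Standard Bengali translation (MT target)


def asr_pair(s: DialectSample) -> tuple[str, str]:
    """ASR task: audio -> dialectal transcription."""
    return (s.audio_path, s.dialect_text)


def mt_pair(s: DialectSample) -> tuple[str, str]:
    """MT task: dialectal text -> Standard Bengali text."""
    return (s.dialect_text, s.standard_text)


# Split sizes as stated in the dataset description.
SPLITS = {"train": 6270, "validation": 810, "test": 120}
TOTAL = 7200


def split_fractions(splits: dict[str, int], total: int) -> dict[str, float]:
    """Return each split's share of the total, after checking the splits add up."""
    counted = sum(splits.values())
    if counted != total:
        raise ValueError(f"splits sum to {counted}, expected {total}")
    return {name: count / total for name, count in splits.items()}


if __name__ == "__main__":
    for name, frac in split_fractions(SPLITS, TOTAL).items():
        print(f"{name:>10}: {SPLITS[name]:>5} samples ({frac:.2%})")
```

Running the script prints each split's share, roughly 87.08% training, 11.25% validation, and 1.67% test. The test split is therefore small relative to the other two.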



