BanglaRegionalTextCorpus: A Curated Dataset for Four Regional Bangla Dialects
Source: Mendeley Data (indexed 2026-04-09)
Download link:
https://data.mendeley.com/datasets/92r62h4k5k/3
Description:
The BanglaRegionalTextCorpus is a manually curated dataset of 4,653 Bangla sentences from four regional dialects (Rangpur, Barisal, Narail, and Khulna), each paired with its Standard Bangla and English translations. The data were collected through community interactions, field recordings, and online sources, and were then linguistically validated by native speakers. The corpus captures regional lexical, phonetic, and syntactic variation, making it a useful resource for dialect identification, translation, sociolinguistic analysis, and the development of more inclusive NLP models.
Providing institutions:
Comilla University; Daffodil International University
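The description frames each record as a regional-dialect sentence aligned with a Standard Bangla and an English translation. The following is a minimal Python sketch of how such parallel records might be loaded and turned into inputs for two of the tasks named above: dialect identification and dialect-to-standard translation. It assumes a CSV export with the hypothetical columns `dialect`, `regional_text`, `standard_bangla`, and `english`. This listing does not document the actual file layout, so these names, and the helper functions, are illustrative placeholders to adapt after downloading the data.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# Hypothetical column names: the real file layout is not documented in this
# listing, so adjust these to match the downloaded files.
DIALECT_COL = "dialect"
REGIONAL_COL = "regional_text"
STANDARD_COL = "standard_bangla"
ENGLISH_COL = "english"

# The four regional dialects named in the corpus description.
DIALECTS = {"Rangpur", "Barisal", "Narail", "Khulna"}


@dataclass(frozen=True)
class ParallelEntry:
    """One regional sentence aligned with its Standard Bangla and English translations."""
    dialect: str
    regional_text: str
    standard_bangla: str
    english: str


def load_entries(csv_text: str) -> list[ParallelEntry]:
    """Parse CSV text into ParallelEntry records, skipping unknown dialect labels."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        dialect = row[DIALECT_COL].strip()
        if dialect not in DIALECTS:
            continue  # ignore rows outside the four documented dialects
        entries.append(ParallelEntry(
            dialect=dialect,
            regional_text=row[REGIONAL_COL].strip(),
            standard_bangla=row[STANDARD_COL].strip(),
            english=row[ENGLISH_COL].strip(),
        ))
    return entries


def dialect_id_pairs(entries: list[ParallelEntry]) -> list[tuple[str, str]]:
    """(text, label) pairs for training a dialect-identification classifier."""
    return [(e.regional_text, e.dialect) for e in entries]


def translation_pairs(entries: list[ParallelEntry]) -> list[tuple[str, str]]:
    """(regional, Standard Bangla) pairs for dialect-to-standard translation."""
    return [(e.regional_text, e.standard_bangla) for e in entries]


def dialect_counts(entries: list[ParallelEntry]) -> Counter:
    """Number of sentences per dialect, useful for checking class balance."""
    return Counter(e.dialect for e in entries)
```

Keeping the dialect as an explicit label on every aligned record lets the same loaded data feed both a classification task and a translation task without re-reading the files.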



