BanglaRegionalTextCorpus: A Curated Dataset for Four Regional Bangla Dialects
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/92r62h4k5k
Description:
The BanglaRegionalTextCorpus is a manually curated dataset of 4,653 Bangla sentences representing four regional dialects (Rangpur, Barisal, Narail, and Khulna), along with their Standard Bangla and English translations. The data were collected through community interactions, field recordings, and online sources, then validated linguistically by native speakers. The corpus highlights regional lexical, phonetic, and syntactic variation, providing a valuable resource for dialect identification, translation, sociolinguistic analysis, and inclusive NLP model development.
Created:
2026-01-27



