
ONUBAD: An Extensive Dataset for Automated Translation of Bangla Regional Dialects into Standard Bangla Language

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/6ft99kf89b
Resource description:
1. Although extensive research has been conducted on the Bangla language in natural language processing (NLP), a substantial resource gap exists for its regional dialects, including those spoken in Chittagong, Sylhet, and Barisal.

2. Linguists even classify these dialects as separate languages. To address this gap, we introduce ONUBAD, an extensive, open-access dataset for the automated translation of the Chittagong, Sylhet, and Barisal dialects into Standard Bangla.

3. Translating regional dialects into Standard Bangla can enhance communication between local farmers and agricultural extension services, help preserve cultural identity and heritage, and provide a valuable resource for NLP research.

4. The data was collected from various Facebook pages, websites, and people from these regions of Bangladesh. It was collected selectively to ensure balanced representation across the data labels, and it has been annotated by native experts in Bangla regional dialects.

5. The dataset captures the most frequently used regional words, clauses, and sentences: in total 6160 words, 520 clauses, and 3920 sentences across Chittagong, Barisal, Sylhet, and Standard Bangla. The dataset details are as follows:

Barisal:
---------
Words: 1540
Clauses: 130
Sentences: 980

Sylhet:
--------
Words: 1540
Clauses: 130
Sentences: 980

Chittagong:
-------------
Words: 1540
Clauses: 130
Sentences: 980

Standard Bangla:
-------------------
Words: 1540
Clauses: 130
Sentences: 980

English Translation:
-------------------
Words: 1540
Clauses: 130
Sentences: 980
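The stated totals (6160 words, 520 clauses, 3920 sentences) equal the sum over the four Bangla varieties, which suggests the English translation entries are not counted in them. The short Python sketch below checks that arithmetic. The dictionary layout and the names `COUNTS`, `BANGLA_VARIETIES`, and `sum_counts` are illustrative assumptions and do not reflect the dataset's actual file structure.

```python
# Per-variety counts copied from the description; the layout is hypothetical.
PER_VARIETY = {"words": 1540, "clauses": 130, "sentences": 980}

COUNTS = {
    "Barisal": dict(PER_VARIETY),
    "Sylhet": dict(PER_VARIETY),
    "Chittagong": dict(PER_VARIETY),
    "Standard Bangla": dict(PER_VARIETY),
    "English Translation": dict(PER_VARIETY),
}

# Assumption: the stated totals cover only these four Bangla varieties.
BANGLA_VARIETIES = ["Barisal", "Sylhet", "Chittagong", "Standard Bangla"]


def sum_counts(varieties):
    """Add up words, clauses, and sentences over the given varieties."""
    totals = {"words": 0, "clauses": 0, "sentences": 0}
    for name in varieties:
        for unit, n in COUNTS[name].items():
            totals[unit] += n
    return totals


bangla_totals = sum_counts(BANGLA_VARIETIES)
all_totals = sum_counts(list(COUNTS))

print("Four Bangla varieties:", bangla_totals)
print("Including English translation:", all_totals)
```

Under this reading, each of the 1540 words, 130 clauses, and 980 sentences would appear once per variety, with the English translation supplied alongside as a fifth column.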
Created:
2024-12-09