ONUBAD: An Extensive Dataset for Automated Translation of Bangla Regional Dialects into Standard Bangla Language
Source: DataCite Commons | Updated: 2025-04-01 | Indexed: 2025-04-16
Download link:
https://data.mendeley.com/datasets/6ft99kf89b
Description:
1. Although Bangla has been studied extensively in natural language processing (NLP), a substantial resource gap remains for its regional dialects, including those spoken in Chittagong, Sylhet, and Barisal.
2. Some linguists even classify these dialects as separate languages. To address this gap, we introduce ONUBAD, an extensive, open-access dataset for automatically translating the Chittagong, Sylhet, and Barisal dialects into Standard Bangla.
3. Translating regional dialects into Standard Bangla can improve communication between local farmers and agricultural extension services, help preserve cultural identity and heritage, and provide a valuable resource for NLP research.
4. The data were collected from various Facebook pages, websites, and speakers of the regional dialects in Bangladesh, and were selected to ensure balanced representation across the data labels. The data were annotated by native-speaker experts in the Bangla regional dialects.
5. The dataset captures the most frequently used regional words, clauses, and sentences: 6160 words, 520 clauses, and 3920 sentences in total across Chittagong, Barisal, Sylhet, and Standard Bangla. The per-variety counts, along with the English translations, are as follows:
Variety              Words  Clauses  Sentences
-------------------  -----  -------  ---------
Barisal               1540      130        980
Sylhet                1540      130        980
Chittagong            1540      130        980
Standard Bangla       1540      130        980
English Translation   1540      130        980
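The stated totals follow from the per-variety counts: each of the four Bangla varieties contributes 1540 words, 130 clauses, and 980 sentences, so the totals are four times these figures. Including the English translations would give 7700 words instead, so the totals appear to cover only the four Bangla varieties. The short Python check below reproduces this arithmetic using only the counts listed above; the names `COUNTS`, `BANGLA_VARIETIES`, and `totals` are illustrative and not part of the dataset.

```python
# Per-variety counts as listed in the dataset description.
COUNTS = {
    "Barisal": {"words": 1540, "clauses": 130, "sentences": 980},
    "Sylhet": {"words": 1540, "clauses": 130, "sentences": 980},
    "Chittagong": {"words": 1540, "clauses": 130, "sentences": 980},
    "Standard Bangla": {"words": 1540, "clauses": 130, "sentences": 980},
    "English Translation": {"words": 1540, "clauses": 130, "sentences": 980},
}

# Assumption: the stated totals (6160 / 520 / 3920) cover the four Bangla
# varieties only; the English translations are listed separately.
BANGLA_VARIETIES = ["Barisal", "Sylhet", "Chittagong", "Standard Bangla"]


def totals(varieties):
    """Sum word, clause, and sentence counts over the given varieties."""
    result = {"words": 0, "clauses": 0, "sentences": 0}
    for name in varieties:
        for unit, n in COUNTS[name].items():
            result[unit] += n
    return result


if __name__ == "__main__":
    print(totals(BANGLA_VARIETIES))
    # -> {'words': 6160, 'clauses': 520, 'sentences': 3920}
```

Passing all five entries of `COUNTS` to `totals` instead yields 7700 words, 650 clauses, and 4900 sentences, which does not match the stated totals.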
Provider: Mendeley Data
Created: 2024-10-17



