Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/bj5jgk878b
Resource description:
The Vashantor dataset consists of 32,500 sentences covering five regional dialects of Bangla: Chittagong, Noakhali, Sylhet, Barishal, and Mymensingh. The data is provided in two language formats, "Bangla" and "Banglish"; the core data listed below also includes English. Each region and language combination is split into training, test, and validation samples, with the following counts:
Specifics of the Core Data:
---------------------------
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
English: Train 1875, Test 375, Validation 250 (Total 2500)
Specifics of the Regional Data:
-------------------------------
Chittagong:
-----------
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Noakhali:
---------
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Sylhet:
-------
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Barishal:
---------
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Mymensingh:
-----------
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
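
The counts above can be cross-checked against the stated total of 32,500 sentences. The following is a minimal sketch in Python that only restates the numbers from this listing; the variable names and the "Core" label are illustrative assumptions and do not reflect the dataset's actual file or folder names.

```python
# Split sizes per subset, as stated in the listing above.
SPLIT = {"train": 1875, "test": 375, "validation": 250}

# Subset layout from the listing; labels are illustrative, not real file names.
CORE_FORMATS = ["Bangla", "Banglish", "English"]
REGIONS = ["Chittagong", "Noakhali", "Sylhet", "Barishal", "Mymensingh"]
REGIONAL_FORMATS = ["Bangla", "Banglish"]

# One (group, format) pair per 2,500-sentence subset.
subsets = [("Core", fmt) for fmt in CORE_FORMATS]
subsets += [(region, fmt) for region in REGIONS for fmt in REGIONAL_FORMATS]

per_subset = sum(SPLIT.values())          # sentences in one subset
total = per_subset * len(subsets)         # sentences across all subsets
ratios = {name: count / per_subset for name, count in SPLIT.items()}

print(per_subset)    # 2500
print(len(subsets))  # 13
print(total)         # 32500
print(ratios)
```

Under these numbers, each subset follows a 75% / 15% / 10% train / test / validation split, and the 3 core subsets plus 10 regional subsets account for the full 32,500 sentences.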
Created:
2024-01-15



