Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language
Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-27.
Download link:
https://data.mendeley.com/datasets/bj5jgk878b
Resource description:
The Vashantor dataset consists of 32,500 sentences. The regional data cover five regions (Chittagong, Noakhali, Sylhet, Barishal, and Mymensingh), each provided in two language formats, "Bangla" and "Banglish"; the core data are provided in Bangla, Banglish, and English. Every region and language combination is split into 1875 training, 375 test, and 250 validation samples (2500 in total, a 75/15/10 split), which gives 7,500 core sentences and 25,000 regional sentences. The details are as follows:

Specifics of the Core Data:
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
English: Train 1875, Test 375, Validation 250 (Total 2500)

Specifics of the Regional Data:
Chittagong:
  Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
  Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Noakhali:
  Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
  Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Sylhet:
  Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
  Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Barishal:
  Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
  Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Mymensingh:
  Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
  Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
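The published counts can be cross-checked with a short Python sketch. It only re-encodes the numbers listed in the description; it does not read the released files, and the names SPLIT, build_breakdown, and totals are hypothetical helpers for illustration, not part of the dataset.

```python
# Hypothetical re-encoding of the split sizes listed in the description
# (train, test, validation); these are the published counts, not file contents.
SPLIT = {"train": 1875, "test": 375, "validation": 250}

CORE_FORMATS = ["Bangla", "Banglish", "English"]
REGIONS = ["Chittagong", "Noakhali", "Sylhet", "Barishal", "Mymensingh"]
REGIONAL_FORMATS = ["Bangla", "Banglish"]


def build_breakdown():
    """Return {(section, format): split sizes} for every listed combination."""
    breakdown = {}
    for fmt in CORE_FORMATS:
        breakdown[("Core", fmt)] = dict(SPLIT)
    for region in REGIONS:
        for fmt in REGIONAL_FORMATS:
            breakdown[(region, fmt)] = dict(SPLIT)
    return breakdown


def totals(breakdown):
    """Sum sentences in the core section, the regional sections, and overall."""
    core = sum(sum(s.values()) for (sec, _), s in breakdown.items() if sec == "Core")
    regional = sum(sum(s.values()) for (sec, _), s in breakdown.items() if sec != "Core")
    return core, regional, core + regional


if __name__ == "__main__":
    core, regional, overall = totals(build_breakdown())
    print(f"core={core} regional={regional} total={overall}")
    # Share of each split within one 2500-sentence combination.
    per_combination = sum(SPLIT.values())
    for name, n in SPLIT.items():
        print(f"{name}: {n / per_combination:.0%}")
```

Running it prints core=7500 regional=25000 total=32500, followed by the 75%, 15%, and 10% shares for the train, test, and validation splits, which matches the 32,500-sentence total stated above.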
Created:
2024-01-23



