
Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/bj5jgk878b
Description:
The Vashantor dataset consists of 32,500 sentences from different regions, including Chittagong, Noakhali, Sylhet, Barishal, and Mymensingh. It is categorized into two language formats, "Bangla" and "Banglish"; the core data additionally lists English. Each region and language combination has a fixed number of training, testing, and validation samples, as detailed below.

Core data:
  Bangla:   Train 1875, Test 375, Validation 250 (Total 2500)
  Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  English:  Train 1875, Test 375, Validation 250 (Total 2500)

Regional data:
  Chittagong
    Bangla:   Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  Noakhali
    Bangla:   Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  Sylhet
    Bangla:   Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  Barishal
    Bangla:   Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  Mymensingh
    Bangla:   Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
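
The split sizes quoted in the description can be cross-checked with a little arithmetic. The sketch below is only a consistency check on the numbers stated above, not a loader for the dataset; the variable names are illustrative, and it assumes the listed subsets (three core formats plus two formats for each of five regions) are the complete set.

```python
# Per-subset split sizes, as quoted in the description.
SPLIT = {"train": 1875, "test": 375, "validation": 250}

# Subsets as listed in the description (assumed to be the complete set).
CORE_FORMATS = ["Bangla", "Banglish", "English"]
REGIONS = ["Chittagong", "Noakhali", "Sylhet", "Barishal", "Mymensingh"]
REGIONAL_FORMATS = ["Bangla", "Banglish"]

# Sentences in one region/format subset.
per_subset = sum(SPLIT.values())

# 3 core subsets + 5 regions x 2 formats.
n_subsets = len(CORE_FORMATS) + len(REGIONS) * len(REGIONAL_FORMATS)

# Total sentences across all subsets.
total = per_subset * n_subsets

# Share of each split within a subset.
fractions = {name: count / per_subset for name, count in SPLIT.items()}

print(f"per subset: {per_subset}, subsets: {n_subsets}, total: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.0%}")
```

Under these assumptions the per-subset total matches the stated 2500, the 13 subsets add up to the stated 32,500 sentences, and each subset follows a 75/15/10 train/test/validation split.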
Created:
2024-01-15