A Large Multi-Target Dataset of Common Bengali Handwritten Graphemes
Source: arXiv · Updated: 2021-01-14 · Indexed: 2024-07-31
Download link:
http://www.kaggle.com/c/bengaliai-cv19
Description:
This dataset, *A Large Multi-Target Dataset of Common Bengali Handwritten Graphemes*, was created by Bengali.AI. It contains 411,882 curated samples covering 1,295 Bengali handwritten graphemes that are common in everyday use. The test set additionally includes 900 uncommon graphemes, used to evaluate how well a model recognizes graphemes it never saw during training. The dataset is intended as a benchmark for vision algorithms that perform multi-target grapheme classification. Graphemes were selected as common based on their frequency in the Google Bengali automatic speech recognition (ASR) corpus. Over the course of the associated Kaggle competition, deep learning methods were observed to generalize to a wide range of graphemes absent from training, which supports the dataset's effectiveness as a benchmark.
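The following is a minimal sketch of why a multi-target formulation can, in principle, cover graphemes that never appear in training. It assumes each grapheme is labeled by three component targets (a root, a vowel diacritic, and a consonant diacritic); that decomposition, the `Grapheme` type, the `is_composable` helper, and the placeholder labels such as `"r1"` are illustrative assumptions, not details taken from the description above.

```python
from typing import Iterable, NamedTuple


class Grapheme(NamedTuple):
    """Hypothetical multi-target label: one class per component."""
    root: str
    vowel: str
    consonant: str


def is_composable(g: Grapheme, train: Iterable[Grapheme]) -> bool:
    """Return True if every component of g occurs somewhere in train.

    A model with one output head per component could then, in principle,
    predict g even if the full combination never appeared in training.
    """
    train = list(train)
    roots = {t.root for t in train}
    vowels = {t.vowel for t in train}
    consonants = {t.consonant for t in train}
    return g.root in roots and g.vowel in vowels and g.consonant in consonants


# Placeholder labels, not real dataset classes.
train = [
    Grapheme("r1", "v0", "c0"),
    Grapheme("r2", "v1", "c0"),
    Grapheme("r1", "v1", "c1"),
]

# This combination is absent from train, yet each component was seen.
unseen = Grapheme("r2", "v0", "c1")
print(unseen in train, is_composable(unseen, train))
```

Whether a trained model actually generalizes this way is an empirical question; the sketch only shows that per-component targets make such predictions expressible.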
Provider:
Bengali.AI
Created:
2020-10-01



