Bengali.AI Handwritten Graphemes

Name: Bengali.AI Handwritten Graphemes
Creator: OpenDataLab
Published: 2026-05-24 13:30:31
License: Not specified

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/Bengali_AI_Handwritten_Graphemes

Description:

This dataset contains images of individual handwritten Bengali characters. A Bengali character (grapheme) is written by combining three components: a grapheme root, a vowel diacritic, and a consonant diacritic. The task is to classify the components of the grapheme in each image. There are roughly 10,000 possible graphemes, of which about 1,000 are represented in the training set. The test set includes some graphemes that do not appear in the training set, but no new grapheme components. Producing a useful amount of real handwriting data requires many volunteers to fill out collection forms; framing the problem as component classification rather than whole-grapheme recognition should make it possible to build a Bengali OCR system without handwritten samples of all 10,000 graphemes.
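The point about unseen graphemes can be illustrated with a small sketch. It is not code from the dataset or its maintainers: the `GraphemeLabel` type, the `is_covered` helper, and the integer component ids are hypothetical names chosen for illustration. The sketch shows that a grapheme absent from training is still predictable by a component-wise classifier as long as each of its three components appears somewhere in the training labels.

```python
from typing import Iterable, NamedTuple


class GraphemeLabel(NamedTuple):
    """Hypothetical label: one integer id per component of a grapheme."""
    root: int       # grapheme root id
    vowel: int      # vowel diacritic id
    consonant: int  # consonant diacritic id


def is_covered(label: GraphemeLabel, train: Iterable[GraphemeLabel]) -> bool:
    """Return True if every component of `label` occurs in some training label.

    The full (root, vowel, consonant) combination does not need to occur.
    """
    train = list(train)
    roots = {t.root for t in train}
    vowels = {t.vowel for t in train}
    consonants = {t.consonant for t in train}
    return (
        label.root in roots
        and label.vowel in vowels
        and label.consonant in consonants
    )


# Toy training labels (made-up ids, not taken from the dataset).
train_labels = [GraphemeLabel(0, 1, 0), GraphemeLabel(1, 0, 1)]

# A combination never seen in training, built only from seen components.
unseen = GraphemeLabel(0, 0, 1)

# A grapheme that uses a root id absent from training.
new_root = GraphemeLabel(2, 0, 0)

print(unseen in train_labels, is_covered(unseen, train_labels))
print(is_covered(new_root, train_labels))
```

Under the description above, test-set graphemes resemble `unseen` here: new as whole combinations, but composed of components the training set already contains.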

Provider:

OpenDataLab

Created:

2022-09-01

Collection summary

Dataset introduction