Synthetic Printed Words and Test Protocols Data for Bangla OCR

Name: Synthetic Printed Words and Test Protocols Data for Bangla OCR
Creator: figshare
Published: 2023-06-13 09:28:30
License: No description available

Source: DataCite Commons
Updated: 2023-06-13
Indexed: 2024-07-29

Download link: https://figshare.com/articles/dataset/Synthetic_Printed_Words_and_Test_Protocols_Data_for_Bangla_OCR/20186825

Description:

This repository holds the synthetic printed word image data and the test-protocol word image data for the paper "A Multifaceted Evaluation of Representation of Graphemes for Practically Effective Bangla OCR." In the paper, we use the popular Convolutional Recurrent Neural Network (CRNN) architecture and apply our grapheme representation strategies to design the model's final labels. Because no large-scale, word-level printed Bangla dataset was available, we generated a synthetic Bangla corpus of 2 million samples that is representative and sufficiently varied in fonts, domain, and vocabulary size to train our Bangla OCR model. To test various aspects of the model, we also created 6 test protocols. Finally, to establish the generalizability of our grapheme representation methods, we trained and tested on external handwriting datasets.

Update (10 June 2023): The paper has been accepted for publication in the International Journal on Document Analysis and Recognition (IJDAR).
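As a rough illustration of what grapheme-level labels for a Bangla word might look like, the sketch below groups Unicode code points into clusters. It is not the paper's representation strategy: the function name `split_graphemes` and the grouping rule (attach combining marks to the preceding character, and attach a consonant that follows a hasanta to form a conjunct) are assumptions made for this example only, and the rule ignores zero-width joiners and other edge cases.

```python
import unicodedata

# Bengali hasanta (virama), which links consonants into a conjunct.
HASANTA = "\u09cd"


def split_graphemes(word: str) -> list[str]:
    """Naively group code points into grapheme-like clusters (illustrative only)."""
    clusters: list[str] = []
    for ch in word:
        # Vowel signs, anusvara, hasanta, etc. are combining marks (Mn or Mc).
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # A character right after a hasanta continues the same conjunct.
        joins_prev = bool(clusters) and clusters[-1].endswith(HASANTA)
        if clusters and (is_mark or joins_prev):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Hypothetical sample word, written with escapes: sha, i-sign, ka, hasanta, ssa, aa-sign.
sample = "\u09b6\u09bf\u0995\u09cd\u09b7\u09be"
print(split_graphemes(sample))
```

Under this rule the sample word splits into two clusters: the first consonant with its vowel sign, and the conjunct with its vowel sign. A model's label set could then be built from such clusters rather than from individual code points, though the actual label design used in the paper may differ.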

Provider: figshare

Created: 2022-06-29
