Synthetic Printed Words and Test Protocols Data for Bangla OCR

Source: DataCite Commons. Updated 2023-06-13; indexed 2024-07-29.
Download link:
https://figshare.com/articles/dataset/Synthetic_Printed_Words_and_Test_Protocols_Data_for_Bangla_OCR/20186825
Description:
Synthetic printed word images and test-protocol word images for the paper "A Multifaceted Evaluation of Representation of Graphemes for Practically Effective Bangla OCR." In the paper, we use the popular Convolutional Recurrent Neural Network (CRNN) architecture and apply our grapheme representation strategies to design the model's final labels. Because no large-scale, word-level printed Bangla dataset was available, we generated a synthetic Bangla corpus of 2 million samples that is representative and sufficiently varied in fonts, domain, and vocabulary size to train our Bangla OCR model. To test various aspects of the model, we also created 6 test protocols. Finally, to establish the generalizability of our grapheme representation methods, we trained and tested on external handwriting datasets.
Updates: 10 June 2023: The paper has been accepted for publication in the International Journal on Document Analysis and Recognition (IJDAR).
Provider:
figshare
Created:
2022-06-29