Synthetic Printed Words and Test Protocols Data for Bangla OCR
Source: Figshare. Updated: 2022-06-29. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Synthetic_Printed_Words_and_Test_Protocols_Data_for_Bangla_OCR/20186825
Description:
Synthetic printed word-image data and test-protocol word-image data for the paper "A Multifaceted Evaluation of Representation of Graphemes for Practically Effective Bangla OCR." In the paper, we use the popular Convolutional Recurrent Neural Network (CRNN) architecture and apply our grapheme representation strategies to design the model's final labels. Because no large-scale, word-level printed Bangla dataset was available, we synthetically generated a Bangla corpus of 2 million samples that is representative and sufficiently varied in fonts, domain, and vocabulary size to train our Bangla OCR model. To test various aspects of the model, we also created 6 test protocols. Finally, to establish the generalizability of our grapheme representation methods, we trained and tested on external handwriting datasets.
Updates: 10 June 2023: The paper has been accepted for publication in the International Journal on Document Analysis and Recognition (IJDAR).
Created:
2022-06-29



