BN-HTRd: A Benchmark Dataset for Document Level Offline Bangla Handwritten Text Recognition (HTR)

Source: Mendeley Data. Updated: 2021-06-25. Indexed: 2026-04-09.
Download link:
https://data.mendeley.com/datasets/743k6dm543/1
Description:
We introduce a new dataset for offline Handwritten Text Recognition (HTR) from images of Bangla script, with word-, line-, and document-level annotations. The BN-HTRd dataset is based on the BBC Bangla News corpus, which served as the ground-truth text for the handwriting. The dataset contains 788 full-page images collected from 150 different writers. With 108,147 instances of handwritten words, distributed over 13,867 lines and covering 23,115 unique words, this is currently the 'largest and most comprehensive dataset' in this field. We also provide YOLO annotations for lines and ground-truth annotations for both full text and words, along with the segmented images and their positions. The content is drawn from diverse news categories and was written by annotators of different ages, genders, and backgrounds, which gives variability in writing styles. The BN-HTRd dataset can be adopted as a basis for various handwriting classification tasks such as end-to-end document recognition, word-spotting, word/line segmentation, and so on.

The statistics of the dataset are given below:
-------------------------------------------------
Number of writers = 150
Total number of images = 788
Total number of lines = 13,867
Total number of words = 108,147
Total number of unique words = 23,115
Total number of punctuation marks = 7,446
Total number of characters = 574,203
-------------------------------------------------
Created: 2021-06-25
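
The description mentions YOLO annotations for text lines but does not spell out the file layout. The sketch below assumes the common YOLO text convention, in which each row is `class x_center y_center width height` with coordinates normalized to the image size; whether BN-HTRd follows exactly this convention should be checked against the downloaded files. The names `PixelBox`, `parse_yolo_line`, and `parse_yolo_text` are hypothetical helpers, not part of the dataset.

```python
from typing import List, NamedTuple


class PixelBox(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    class_id: int
    left: float
    top: float
    right: float
    bottom: float


def parse_yolo_line(line: str, image_width: int, image_height: int) -> PixelBox:
    """Convert one 'class x_center y_center width height' row to pixel corners.

    Assumes the common YOLO convention of values normalized to [0, 1].
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:])
    return PixelBox(
        class_id=class_id,
        left=(x_center - width / 2) * image_width,
        top=(y_center - height / 2) * image_height,
        right=(x_center + width / 2) * image_width,
        bottom=(y_center + height / 2) * image_height,
    )


def parse_yolo_text(text: str, image_width: int, image_height: int) -> List[PixelBox]:
    """Parse every non-empty row and order the boxes top to bottom."""
    boxes = [
        parse_yolo_line(row, image_width, image_height)
        for row in text.splitlines()
        if row.strip()
    ]
    return sorted(boxes, key=lambda box: box.top)
```

Sorting by the top edge gives an approximate reading order for the lines on a page, which is convenient when pairing cropped line images with the full-text ground truth; skewed or overlapping lines may need a more careful ordering.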