PromitoLipi: A versatile offline dataset of handwritten Bangla words and paragraphs
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://data.mendeley.com/datasets/fnw59h7y89
Description:
The PromitoLipi dataset comprises two sub-datasets, PromitoLipi1.1 and PromitoLipi2.1. PromitoLipi1.1 contains 80 single-page handwritten paragraphs, 7231 words in total, written by individuals of different personalities and ages. Most of the paragraphs are coherent and can be used for handwritten line/word segmentation, paragraph recognition, and multimodal natural language processing tasks such as document summarization, sentiment analysis, and content extraction. Some paragraphs are incoherent (random, unrelated words written one after another) and are primarily useful for segmentation-based tasks.
PromitoLipi2.1, on the other hand, contains 9830 open-vocabulary word images comprising 24050 consonant/vowel/diacritic/number/conjunct/punctuation instances, together with their corresponding annotation files. This dataset can be used for handwritten character segmentation and word recognition. For 70% of the words, writers of different ages from different areas were asked to write random words on paper. The remaining 30% were collected from different CMATERdb datasets to add variety. The word images were then segmented and binarized (background pixels in white, foreground/text pixels in black) so that each image contains nothing besides the word text, which consists of one or more classes.
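The binarization convention described above (white background, black text, nothing else in the image) can be illustrated with a minimal sketch. The description does not say which thresholding method the authors used, so the fixed threshold of 128 below is an assumption, and the function names `binarize` and `crop_to_text` are hypothetical. Images are represented as plain nested lists of 0-255 grayscale values to keep the example dependency-free.

```python
def binarize(gray, threshold=128):
    """Map grayscale pixels to 0 (black text) or 255 (white background).

    The fixed threshold is an assumption; the dataset description does not
    state how the original images were binarized.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def crop_to_text(binary):
    """Trim all-white border rows and columns so only the word remains."""
    # Rows that contain at least one black (text) pixel.
    rows = [i for i, row in enumerate(binary) if 0 in row]
    if not rows:
        return []  # no text pixels at all
    # Columns that contain at least one black (text) pixel.
    cols = [j for j in range(len(binary[0]))
            if any(row[j] == 0 for row in binary)]
    return [row[cols[0]:cols[-1] + 1]
            for row in binary[rows[0]:rows[-1] + 1]]


# A tiny 4x4 "scan": light paper with three dark ink pixels.
gray = [
    [250, 250, 250, 250],
    [250,  30,  40, 250],
    [250, 250,  20, 250],
    [250, 250, 250, 250],
]
word = crop_to_text(binarize(gray))
```

Here `word` keeps only the 2x2 region that contains ink, matching the stated goal that each image holds no information other than the word itself.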
Created:
2024-02-05



