PromitoLipi: A versatile offline dataset of handwritten Bangla words and paragraphs

Source: NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://data.mendeley.com/datasets/fnw59h7y89

Resource description:

The PromitoLipi dataset consists of two sub-datasets, PromitoLipi1.1 and PromitoLipi2.1.

PromitoLipi1.1 contains 80 single-page handwritten paragraphs, totaling 7231 words, written by individuals of different personalities and ages. Most of the paragraphs are coherent text and can be used for handwritten line/word segmentation, paragraph recognition, and multimodal natural language processing tasks such as document summarization, sentiment analysis, and content extraction. Some paragraphs are incoherent (random, unrelated words written one after another) and are primarily useful for segmentation-based tasks.

PromitoLipi2.1 contains 9830 open-vocabulary word images comprising 24050 consonant/vowel/diacritic/number/conjunct/punctuation instances, along with their corresponding annotation files. It can be used for handwritten character segmentation and word recognition. For 70% of the words, writers of different ages from different areas were asked to write random words on paper. The remaining 30% were collected from different CMATERdb datasets to add variety. The word images were then segmented and binarized (background pixels in white, foreground/text pixels in black) so that each image contains nothing besides the word text, which consists of one or more classes.
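The description does not say which binarization or cropping method was used, so the following is only a minimal sketch of the stated convention (white background, black text, nothing but the word in the image). It assumes a grayscale image held as nested lists and a fixed global threshold; the function names `binarize` and `crop_to_text` and the threshold of 128 are hypothetical choices for illustration, not part of the dataset's tooling.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values in the range 0-255


def binarize(image: Gray, threshold: int = 128) -> Gray:
    """Map pixels darker than `threshold` to 0 (text) and the rest to 255 (background).

    A fixed global threshold is an assumption; the dataset authors do not
    state which method they used.
    """
    return [[0 if px < threshold else 255 for px in row] for row in image]


def crop_to_text(binary: Gray) -> Gray:
    """Trim all-white border rows and columns so only the word region remains."""
    rows = [r for r, row in enumerate(binary) if 0 in row]
    if not rows:
        return []  # no text pixels at all
    width = len(binary[0])
    cols = [c for c in range(width) if any(row[c] == 0 for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


# Tiny synthetic "scan": light paper with a few dark ink pixels.
sample = [
    [250, 250, 250, 250],
    [250, 30, 200, 250],
    [250, 40, 20, 250],
    [250, 250, 250, 250],
]
word = crop_to_text(binarize(sample))
```

On real scans an adaptive or histogram-based threshold would likely be more robust than a fixed value, since ink darkness and paper tone vary between writers.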

Created:

2024-02-05
