
PromitoLipi: A versatile offline dataset of handwritten Bangla words and paragraphs

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://data.mendeley.com/datasets/fnw59h7y89
Description:
The PromitoLipi Dataset comprises two datasets, PromitoLipi1.1 and PromitoLipi2.1. PromitoLipi1.1 contains 80 single-page handwritten paragraphs, totaling 7231 words, written by individuals of different personalities and ages. Most of these paragraphs are coherent and can be used for handwritten line/word segmentation, paragraph recognition, and multimodal natural language processing tasks such as document summarization, sentiment analysis, and content extraction. Some paragraphs are incoherent (random, unassociated words written one after another) and are primarily useful for segmentation-based tasks. PromitoLipi2.1 contains 9830 open-vocabulary word images comprising 24050 consonant/vowel/diacritic/number/conjunct/punctuation instances, together with their corresponding annotation files. This dataset can be used for handwritten character segmentation and word recognition. For 70% of the words, writers of different ages from different areas were asked to write random words on paper. The remaining 30% were collected from different CMATERdb datasets to add variety. The word images were then segmented and binarized (background pixels in white, foreground/text pixels in black) so that each image contains only the word text, which may span one or more character classes.
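The binarization convention described above (white background, black text) can be illustrated with a minimal sketch. The description does not say which binarization or segmentation method was used, so the fixed threshold of 128, the nested-list image representation, and the helper names `binarize` and `crop_to_ink` are all assumptions made for illustration only.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to black text on white.

    The fixed threshold is an assumption; the dataset description does
    not state how the original images were binarized.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def crop_to_ink(binary):
    """Trim all-white border rows and columns around the black pixels."""
    rows = [r for r, row in enumerate(binary) if 0 in row]
    if not rows:
        return []  # no foreground pixels at all
    cols = [c for c in range(len(binary[0]))
            if any(row[c] == 0 for row in binary)]
    return [row[cols[0]:cols[-1] + 1]
            for row in binary[rows[0]:rows[-1] + 1]]


# Hypothetical 4x4 grayscale patch: dark strokes on a light page.
gray = [
    [250, 250, 250, 250],
    [250,  30, 200, 250],
    [250,  40,  20, 250],
    [250, 250, 250, 250],
]
word = crop_to_ink(binarize(gray))
```

Cropping to the inked region mirrors the stated goal that each word image carries no information besides the word text itself. A real pipeline would more likely use an adaptive threshold on scanned pages than a fixed cutoff.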
Created:
2024-02-05