Dataset for Bangla Text Detection and Recognition

Name: Dataset for Bangla Text Detection and Recognition
Creator: Redwan Ahmed Rizvee
License: No description available

Source: Mendeley Data (indexed 2026-04-09)

Download link:

https://data.mendeley.com/datasets/n57phs3k4t

Description:

This dataset can be used for Bangla text detection and recognition. It consists of two folders and one (.xlsx) file. "Image Folder" contains the images of all the text documents. "Word Folder" contains the text of each image in (.docx) format, stored under the same relative path. So, for a text image "Image Folder/a/b.jpg", there is a corresponding text file "Word Folder/a/b.docx". In total, there are 1166 parallel documents of images and corresponding texts. The images are taken from PDFs containing typed Bangla text collected from various sources: novels, stories, educational books, etc. The "Typing List.xlsx" file lists the names of the parallel jpg and docx files. When the docx files were prepared, the alignment of the text was kept similar to that of the corresponding image. The goal of this data collection is to gather sufficient data to train machine learning models that can scan Bangla documents and extract their text on the fly while preserving spaces and alignment.
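The image-to-text pairing described above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset: the function names `docx_path_for` and `find_pairs` are hypothetical, and the sketch assumes that the extracted archive keeps the folder names "Image Folder" and "Word Folder" exactly as written in the description and that images use the `.jpg` extension.

```python
from pathlib import Path, PurePosixPath

# Folder names as given in the dataset description (assumed unchanged on disk).
IMAGE_ROOT = "Image Folder"
WORD_ROOT = "Word Folder"


def docx_path_for(image_path: str) -> str:
    """Map 'Image Folder/a/b.jpg' to 'Word Folder/a/b.docx' (hypothetical helper)."""
    parts = PurePosixPath(image_path).parts
    if len(parts) < 2 or parts[0] != IMAGE_ROOT:
        raise ValueError(f"expected a file path under {IMAGE_ROOT!r}: {image_path!r}")
    # Keep the relative path, swap the top-level folder and the extension.
    return str(PurePosixPath(WORD_ROOT, *parts[1:]).with_suffix(".docx"))


def find_pairs(dataset_root: str) -> list[tuple[Path, Path]]:
    """Return (image, docx) pairs that both exist under dataset_root."""
    root = Path(dataset_root)
    pairs = []
    for image in sorted((root / IMAGE_ROOT).rglob("*.jpg")):
        relative = image.relative_to(root).as_posix()
        docx = root / docx_path_for(relative)
        if docx.exists():  # skip images whose transcription is missing
            pairs.append((image, docx))
    return pairs
```

Calling `docx_path_for("Image Folder/a/b.jpg")` yields the path from the description's own example, `"Word Folder/a/b.docx"`. If the description's count holds for a given download, `find_pairs` would be expected to return 1166 pairs; the sketch does not consult "Typing List.xlsx", whose column layout is not specified here.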

Provided by:

Redwan Ahmed Rizvee
