
Dataset for Bangla Text Detection and Recognition

Source: Mendeley Data, indexed 2026-04-09
Download link:
https://data.mendeley.com/datasets/n57phs3k4t
Description:
This dataset can be used for Bangla text detection and recognition. It contains two folders and one .xlsx file. "Image Folder" contains the images of all the text documents. "Word Folder" contains the text of each image as a parallel .docx file, so for a text image "Image Folder/a/b.jpg" there is a corresponding file "Word Folder/a/b.docx". In total there are 1166 parallel documents of images and corresponding texts. The images are from PDFs containing typed Bangla text collected from various sources, such as novels, stories, and educational books. The "Typing List.xlsx" file lists the names of the parallel .jpg and .docx files. While preparing the .docx files, the alignment was kept similar to that of the text in the corresponding image. The goal of this data collection is to gather enough data to train machine learning models that can scan Bangla documents and extract their text on the fly while preserving spacing and alignment.
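The image-to-text pairing described above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset: the helper names `docx_path_for` and `iter_pairs` are hypothetical, and it assumes the archive is extracted so that "Image Folder" and "Word Folder" sit directly under one root directory and that image files use a lowercase `.jpg` extension.

```python
from pathlib import Path, PurePosixPath

# Folder names as given in the dataset description.
IMAGE_ROOT = "Image Folder"
WORD_ROOT = "Word Folder"


def docx_path_for(image_path: str) -> PurePosixPath:
    """Map 'Image Folder/a/b.jpg' to 'Word Folder/a/b.docx' (hypothetical helper)."""
    p = PurePosixPath(image_path)
    if not p.parts or p.parts[0] != IMAGE_ROOT:
        raise ValueError(f"expected a path under {IMAGE_ROOT!r}: {image_path!r}")
    # Swap the top-level folder and the file extension; keep the rest of the path.
    return PurePosixPath(WORD_ROOT, *p.parts[1:]).with_suffix(".docx")


def iter_pairs(dataset_root):
    """Yield (image, docx) paths for every image whose parallel .docx file exists.

    Assumes both folders sit directly under dataset_root.
    """
    root = Path(dataset_root)
    for image in sorted((root / IMAGE_ROOT).rglob("*.jpg")):
        rel = image.relative_to(root).as_posix()
        doc = root.joinpath(*docx_path_for(rel).parts)
        if doc.is_file():
            yield image, doc
```

For example, `docx_path_for("Image Folder/a/b.jpg")` returns the path `Word Folder/a/b.docx`. Images without a matching .docx file are skipped by `iter_pairs` rather than raising an error, which is a design choice of this sketch.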
Provided by:
Redwan Ahmed Rizvee