MDIW-13 MultiScript Document Database

IEEE | Updated 2019-10-25 | Indexed 2026-04-17

Download link:

https://ieee-dataport.org/open-access/mdiw-13-multiscript-document-database

Description:

A wide variety of scripts is used to write languages throughout the world. In a multiscript, multi-language environment, the script used in each part of a document must be known before the appropriate document analysis algorithm can be applied. Consequently, several approaches to automatic script identification have been proposed in the literature. They fall broadly into two categories: structure- and visual-appearance-based techniques, and deep-learning-based techniques. However, because most existing techniques have been tested on different datasets and script combinations, a fair comparison between them is difficult. To alleviate this drawback, this paper introduces a multiscript database containing both printed and handwritten documents in a wide variety of scripts: Arabic, Bengali, Gujarati, Gurmukhi, Devanagari, Japanese, Kannada, Malayalam, Oriya, Roman, Tamil, Telugu and Thai. The dataset consists of 1137 documents scanned from local newspapers, as well as handwritten letters and notes. These documents are further segmented into lines and words, for a total of 13,983 lines and 86,675 words.

Created:

2019-10-25
