Handwritten Devanagari Characters Dataset –(Vowels, Consonants and Numerals) of 44,000 images for Devanagari CAPTCHA Generation and Recognition.

Name: Handwritten Devanagari Characters Dataset –(Vowels, Consonants and Numerals) of 44,000 images for Devanagari CAPTCHA Generation and Recognition.
Creator: IEEE DataPort
Published: 2022-10-06 15:41:27
License: No description available

DataCite Commons | Updated 2022-10-06 | Indexed 2025-04-16

Download link:

https://ieee-dataport.org/documents/handwritten-devanagari-characters-dataset-–vowels-consonants-and-numerals-44000-images

Resource description:

Devanagari is a phonetic script that originated from ancient Brahmi. It is the foundation of various Indian languages. According to data from 2022, Hindi, which is written in Devanagari, is spoken by over 342 million people worldwide and ranks third among the top 45 languages. The Devanagari script has approximately 11 vowels, 33 consonants, and 10 numerals. It has no upper- or lower-case letters and is written from left to right.

The dataset covers 44 distinct handwritten Devanagari characters (4 vowels, 30 consonants, and 10 numerals) drawn from a set of 63 Devanagari characters. The other 19 characters were excluded to avoid confusing human users and to maintain usability. The 44 characters are:

Numerals (10): ० १ २ ३ ४ ५ ६ ७ ८ ९
Vowels (4): अ इ उ ए
Consonants (30): क ख ग घ च छ ज झ ट ड ढ ण त थ द ध न प ब भ म य र ल व श ष स ह ळ

The data was gathered on a canvas created in Python. The canvas code was distributed to more than one hundred (100+) native speakers of languages written in Devanagari, of all ages and including both left- and right-handed computer users. Each user wrote 440 characters (44 characters × 10 repetitions) on the canvas and saved them on their own computer. All user data was then compiled. Each character is drawn in black on a white background, and one benefit of using a canvas is that the images contain no noise. The total number of character images collected was 44,000 (forty-four thousand).

The data was then pre-processed, scaled, and stored in a publicly accessible location. After occluded images and scribbles were removed, the final dataset contains 44,000 digitized images: 10,000 numerals (10 numerals × 1,000 each), 4,000 vowels (4 vowels × 1,000 each), and 30,000 consonants (30 consonants × 1,000 each). Each image is grayscale, stored in .jpeg format, has a resolution of 65 by 65 pixels, and requires about 1.5 KB of storage. As a result, although the image data itself amounts to only about 50 MB, it occupies 162 MB of disk space. The images were manually organized into the appropriate folders. CSV (comma-separated values) files with training sets (70%) and testing sets (30%) are also available, as is a .zip file containing the entire set of images.
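
The class counts and the 70/30 split described above can be checked with a short Python sketch. This is only an illustration under stated assumptions: the description does not say how the published CSV files were split, so the per-class (stratified) shuffle below is an assumed scheme, and the names `stratified_split`, `IMAGES_PER_CLASS`, and the `(label, index)` pairs are hypothetical stand-ins for the real image files.

```python
import random

# Character classes as listed in the dataset description.
NUMERALS = list("०१२३४५६७८९")
VOWELS = list("अइउए")
CONSONANTS = list("कखगघचछजझटडढणतथदधनपबभमयरलवशषसहळ")
ALL_LABELS = NUMERALS + VOWELS + CONSONANTS

IMAGES_PER_CLASS = 1000   # stated in the description
TRAIN_FRACTION = 0.7      # 70% training, 30% testing


def stratified_split(labels, per_class, train_fraction, seed=0):
    """Split (label, index) pairs class by class.

    Assumption: the real CSV files may have been produced differently;
    this only shows one way to obtain a 70/30 split per class.
    """
    rng = random.Random(seed)
    cut = round(per_class * train_fraction)
    train, test = [], []
    for label in labels:
        indices = list(range(per_class))  # hypothetical image indices
        rng.shuffle(indices)
        train += [(label, i) for i in indices[:cut]]
        test += [(label, i) for i in indices[cut:]]
    return train, test


if __name__ == "__main__":
    train, test = stratified_split(ALL_LABELS, IMAGES_PER_CLASS, TRAIN_FRACTION)
    print("classes:", len(ALL_LABELS))
    print("total images:", len(ALL_LABELS) * IMAGES_PER_CLASS)
    print("train / test:", len(train), "/", len(test))
```

Under these assumptions the sketch yields 44 classes, 44,000 images, 30,800 training samples, and 13,200 testing samples, with each class contributing 700 training and 300 testing images.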

Provider:

IEEE DataPort

Created:

2022-10-06