Handwritten Devanagari Characters Dataset –(Vowels, Consonants and Numerals) of 44,000 images for Devanagari CAPTCHA Generation and Recognition.

Name: Handwritten Devanagari Characters Dataset –(Vowels, Consonants and Numerals) of 44,000 images for Devanagari CAPTCHA Generation and Recognition.
Creator: IEEE Dataport
License: No description available

Source: ieee-dataport.org (indexed 2025-03-21)

Download link:

https://ieee-dataport.org/documents/handwritten-devanagari-characters-dataset-%E2%80%93vowels-consonants-and-numerals-44000-images

Description:

Devanagari is a phonetic script that originated from Ancient Brahmi. It is the foundation of various Indian languages. According to data from 2022, Hindi, written in the Devanagari script, is spoken by over 342 million people worldwide and ranks third among the top 45 languages. The Devanagari script has approximately 11 vowels, 33 consonants, and 10 numerals. It has no upper- or lower-case letters and is written from left to right.

The data set covers 44 handwritten Devanagari characters (4 vowels, 30 consonants, and 10 numerals) selected from a set of 63 Devanagari characters. The other 19 characters were excluded to avoid confusing human users and to maintain usability. The 44 distinct characters are:

Numerals (10):   ० १ २ ३ ४ ५ ६ ७ ८ ९
Vowels (04):     अ इ उ ए
Consonants (30): क ख ग घ च छ ज झ ट ड ढ ण त थ द ध न प ब भ म य र ल व श ष स ह ळ

The data was gathered on a canvas created in Python. The canvas code was distributed to more than one hundred (100+) native speakers of Devanagari-script languages of all ages, including both left- and right-handed computer users. Each user wrote 440 characters (44 characters multiplied by 10) on the canvas and saved them on their own computer. All user data was then compiled. Each character is drawn in black on a white background. A benefit of using a canvas is that the images contain no noise. The total number of character images collected was 44,000 (forty-four thousand).

The data was then pre-processed, scaled, and stored in a publicly accessible location. After the occluded images and scribbles were removed, the final data set contains a total of 44,000 digitized images: 10,000 numerals (10 numerals * 1000 each), 4,000 vowels (4 vowels * 1000 each), and 30,000 consonants (30 consonants * 1000 each). Each image is grayscale, in .jpeg format, has a resolution of 65 by 65 pixels, and requires 1.5 KB of storage. As a result, although the data itself amounts to just 50 MB, it occupies 162 MB of disk space.

The data was manually organized into the appropriate folders. In addition, .CSV (Comma Separated Values) files with training sets (70%) and testing sets (30%) are available for the data set. A .zip file containing the entire data set of images is also available.
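
The class counts and the 70/30 split described above can be checked with a short Python sketch. It is only an illustration, not the data set authors' tooling: the function names, the fixed random seed, and the choice to split each class separately are assumptions, since the description does not say how the published CSV split was produced. The (label, index) pairs stand in for image files, whose actual names are not given here.

```python
import random

# The 44 characters listed in the data set description.
NUMERALS = list("०१२३४५६७८९")
VOWELS = list("अइउए")
CONSONANTS = list("कखगघचछजझटडढणतथदधनपबभमयरलवशषसहळ")

IMAGES_PER_CLASS = 1000   # stated in the description
TRAIN_FRACTION = 0.7      # 70% training, 30% testing


def class_labels():
    """Return all 44 class labels: numerals, then vowels, then consonants."""
    return NUMERALS + VOWELS + CONSONANTS


def composition():
    """Return the number of images per character group."""
    return {
        "numerals": len(NUMERALS) * IMAGES_PER_CLASS,
        "vowels": len(VOWELS) * IMAGES_PER_CLASS,
        "consonants": len(CONSONANTS) * IMAGES_PER_CLASS,
    }


def split_per_class(labels, per_class=IMAGES_PER_CLASS,
                    train_fraction=TRAIN_FRACTION, seed=0):
    """Hypothetical per-class split: shuffle the sample indices of each
    class and send the first 70% to training and the rest to testing."""
    rng = random.Random(seed)
    n_train = round(per_class * train_fraction)
    train, test = [], []
    for label in labels:
        indices = list(range(per_class))
        rng.shuffle(indices)
        train += [(label, i) for i in indices[:n_train]]
        test += [(label, i) for i in indices[n_train:]]
    return train, test


if __name__ == "__main__":
    counts = composition()
    print(len(class_labels()), counts, sum(counts.values()))
    train, test = split_per_class(class_labels())
    print(len(train), len(test))
```

Under these assumptions the sketch gives 44 classes and 44,000 images in total, with 30,800 training samples and 13,200 testing samples. Splitting within each class keeps every character equally represented in both sets; a plain random split over all images would give the same totals but slightly uneven class counts.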

Provided by:

IEEE Dataport