
Devanagari Handwritten CAPTCHA - Dataset of 90 K Images : A Challenge Test

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/v9wwkvdjmm
Description:
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It is a test that humans can complete but current computer systems cannot, and it is applied in several contexts to distinguish humans from machines. The most common kind found on websites is the text-based CAPTCHA.

A text-based CAPTCHA is made up of a sequence of letters or digits rendered as an image in a certain order. The image is then distorted with random lines, blocks, grids, rotations, and other kinds of noise.

Because most text-based CAPTCHAs use English letters, they are difficult to pass for rural residents who only know their local languages. Machine recognition of Devanagari characters is significantly more challenging than recognition of English letters and numerals, because Devanagari characters are more complex. The vast majority of official Indian websites provide content exclusively in Devanagari, yet they do not employ CAPTCHAs in Devanagari. For this reason, we have developed a new text-based CAPTCHA using Devanagari script.

A drawing canvas was created using Python. The canvas code was distributed to more than one hundred (100+) native users of the Devanagari script, of all ages and including both left- and right-handed computer users. Each user wrote 440 characters (44 characters, 10 samples each) on the canvas and saved them on their own computer. All user data was then gathered and compiled. Each character is black on a white background; a benefit of using a canvas is that the images contain no noise. The final dataset contains a total of 44,000 digitized images: 10,000 numerals, 4,000 vowels, and 30,000 consonants.
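The counts in the description can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the split of the 44 characters into 10 numerals, 4 vowels, and 30 consonants is inferred from the stated totals, and "100+" writers is taken as exactly 100, which is an assumption.

```python
# Character groups inferred from the stated totals (10,000 / 4,000 / 30,000).
CHARACTERS = {"numerals": 10, "vowels": 4, "consonants": 30}
SAMPLES_PER_CHARACTER = 10  # each writer draws every character 10 times
WRITERS = 100               # assumption: "100+" taken as exactly 100


def images_per_writer():
    """Number of images a single writer contributes."""
    return sum(CHARACTERS.values()) * SAMPLES_PER_CHARACTER


def images_per_group(writers=WRITERS):
    """Number of images in each character group across all writers."""
    return {
        name: count * SAMPLES_PER_CHARACTER * writers
        for name, count in CHARACTERS.items()
    }


def total_images(writers=WRITERS):
    """Total number of character images in the compiled dataset."""
    return sum(images_per_group(writers).values())
```

Under these assumptions, `images_per_writer()` gives 440 and `total_images()` gives 44,000, matching the figures quoted above.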
This character dataset was published for research scholars, for recognition and other applications, on Mendeley Data (DOI: 10.17632/yb9rmfjzc2.1, dated October 5, 2022) and IEEE DataPort (DOI: 10.21227/9zpv-3194, dated October 6, 2022).

We designed our own algorithm to generate the handwritten Devanagari CAPTCHAs from the handwritten character set described above. Following general CAPTCHA generation principles, noise is added to each image using digital image processing techniques, so that the CAPTCHA is harder to recognize and not easily broken. Each CAPTCHA image is 250 x 90 pixels. Three (03) types of character sets are used: handwritten alphabets, handwritten digits, and handwritten alphabets and digits combined. With 9 classes of 10,000 images each, a Devanagari CAPTCHA dataset of 90,000 images was created using Python. All images are stored in CSV format for easy use by researchers.

Passing a test that requires identifying Devanagari alphabets is difficult. The dataset is useful to researchers investigating CAPTCHA recognition in this area, in particular for designing OCR systems that recognize and break Devanagari CAPTCHAs. If you are able to successfully bypass the CAPTCHA, please acknowledge us by sending an email to sanjayepate@gmail.com.
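The description does not give the generation algorithm or the CSV layout, so the following is only a minimal sketch of the general idea: place character bitmaps on a 250 x 90 white canvas, add line and speckle noise, and flatten the result into one CSV row. The function names, the noise parameters, and the "label followed by pixel values" row layout are all hypothetical and are not taken from the dataset itself.

```python
import csv
import io
import random

WIDTH, HEIGHT = 250, 90  # CAPTCHA size stated in the description
WHITE, BLACK = 255, 0


def paste_glyph(canvas, glyph, left, top):
    """Copy the black pixels of a glyph bitmap onto the canvas."""
    for dy, row in enumerate(glyph):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            if value == BLACK and 0 <= y < HEIGHT and 0 <= x < WIDTH:
                canvas[y][x] = BLACK


def add_line_noise(canvas, rng, n_lines=4):
    """Draw random straight lines from the left edge to the right edge."""
    for _ in range(n_lines):
        y0, y1 = rng.randrange(HEIGHT), rng.randrange(HEIGHT)
        for x in range(WIDTH):
            y = y0 + (y1 - y0) * x // (WIDTH - 1)
            canvas[y][x] = BLACK


def add_speckle_noise(canvas, rng, fraction=0.02):
    """Set a small fraction of randomly chosen pixels to black."""
    for _ in range(int(fraction * WIDTH * HEIGHT)):
        canvas[rng.randrange(HEIGHT)][rng.randrange(WIDTH)] = BLACK


def make_captcha(glyphs, seed=0):
    """Compose glyph bitmaps side by side, then add noise (hypothetical scheme)."""
    rng = random.Random(seed)
    canvas = [[WHITE] * WIDTH for _ in range(HEIGHT)]
    slot = WIDTH // len(glyphs)  # horizontal space reserved per character
    for i, glyph in enumerate(glyphs):
        glyph_h, glyph_w = len(glyph), len(glyph[0])
        left = i * slot + rng.randint(0, max(0, slot - glyph_w))
        top = rng.randint(0, max(0, HEIGHT - glyph_h))
        paste_glyph(canvas, glyph, left, top)
    add_line_noise(canvas, rng)
    add_speckle_noise(canvas, rng)
    return canvas


def to_csv_row(canvas, label):
    """Flatten an image into one row: label first, then pixels row by row."""
    return [label] + [pixel for row in canvas for pixel in row]


def dump_csv(rows):
    """Serialize rows to CSV text (assumed layout, not the dataset's own)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

In this sketch a glyph is a list of rows of 0/255 values, such as a cropped image from the handwritten character set. A call like `dump_csv([to_csv_row(make_captcha(glyphs), "label")])` would then produce one line with 1 + 250 * 90 fields. A real generator would likely also apply rotations and other distortions, which are omitted here.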
Created: 2023-11-30