Vulnerable Captchas

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14970240

Resource overview:

This dataset focuses on an example of weak CAPTCHA implementations, highlighting potential security vulnerabilities in systems that rely on simple alphanumeric CAPTCHAs. CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are widely used to protect websites from bots and automated scripts. However, not all CAPTCHA implementations are equally secure, and some are prone to exploitation through automated processes.

Context

The inspiration for this dataset came from a personal experience while accessing a website I frequently use, which I will refer to as “System” for privacy reasons. I wanted to automate a repetitive task on the site using a Python script, but I was initially blocked by a CAPTCHA that was required to complete the login process. CAPTCHAs are generally effective in stopping bots, especially those like Google’s reCAPTCHA, which are difficult to bypass with machine learning models due to their sophisticated design. In this case, however, the CAPTCHA images were simple, consisting only of clearly readable alphanumeric characters. The challenge intrigued me, and as I was simultaneously reading “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” by Aurélien Géron, I decided to use this scenario as an opportunity to apply my newly acquired knowledge in machine learning.

Problem and Approach

The dataset captures images of these vulnerable CAPTCHA challenges and provides annotations for each. While automating the CAPTCHA resolution, I learned that the system did not rely on the image alone. Upon inspecting the HTML, I found that the CAPTCHA content was hashed and stored inside a hidden form field, which could easily be manipulated to bypass the verification entirely.

Key Learnings

CAPTCHA Design Matters: Not all CAPTCHAs are created equal. Simpler alphanumeric CAPTCHAs can often be defeated by image recognition models or form manipulation.
Image Classification: This dataset offers a collection of labeled CAPTCHA images that could be used to train image classification models aimed at recognizing and solving CAPTCHAs automatically.
Security Implications: The project sheds light on the importance of implementing proper security mechanisms beyond just CAPTCHA images, such as encryption, hashing, and verification strategies that prevent easy manipulation.
Practical Approach: Sometimes, simpler solutions such as analyzing the webpage structure and finding security loopholes can be more efficient than complex machine learning models.

This dataset is sourced from Kaggle.
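
The hidden-field weakness described in the summary can be illustrated with a minimal sketch. The source does not say which hash function or field layout the site used, so everything here is an assumption made for illustration: SHA-256 as the hash, and the hypothetical helpers `weak_token`, `weak_verify`, `forge_submission`, `keyed_token`, and `keyed_verify`. The sketch shows why an unkeyed hash in a client-controlled field can be replaced by the client, and how keying the digest with a server-side secret (HMAC) would block that particular substitution.

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: the weak scheme stores an unkeyed SHA-256 hash of the answer
# in a hidden form field. The real site's hash function is not documented.
def weak_token(answer: str) -> str:
    return hashlib.sha256(answer.encode()).hexdigest()

def weak_verify(submitted_answer: str, hidden_field: str) -> bool:
    # The server trusts a value that the client is free to rewrite.
    return hmac.compare_digest(weak_token(submitted_answer), hidden_field)

def forge_submission(chosen_answer: str) -> tuple[str, str]:
    # Client-side forgery: ignore the image, pick any answer, and
    # overwrite the hidden field with the hash of that answer.
    return chosen_answer, weak_token(chosen_answer)

# Hypothetical hardening: key the digest with a secret only the server knows.
SERVER_KEY = secrets.token_bytes(32)

def keyed_token(answer: str, key: bytes = SERVER_KEY) -> str:
    return hmac.new(key, answer.encode(), hashlib.sha256).hexdigest()

def keyed_verify(submitted_answer: str, hidden_field: str,
                 key: bytes = SERVER_KEY) -> bool:
    return hmac.compare_digest(keyed_token(submitted_answer, key), hidden_field)

if __name__ == "__main__":
    answer, field = forge_submission("AAAA")
    print("weak scheme accepts forgery:", weak_verify(answer, field))
    print("keyed scheme accepts forgery:", keyed_verify(answer, field))
```

Even the keyed variant is only a partial fix: without a per-challenge nonce and an expiry, a valid answer and token pair could be replayed, so keeping the expected answer in server-side session state is usually the simpler design.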

Created:

2025-03-05
