Screenshots and metadata for 214 reCAPTCHA challenges encountered between September 2022 - September 2023

Name: Screenshots and metadata for 214 reCAPTCHA challenges encountered between September 2022 - September 2023
Creator: Dryad
Published: 2026-03-15 22:17:48
License: No description available

DataCite Commons | Updated 2026-03-15 | Indexed 2025-04-10

Download link:

https://datadryad.org/dataset/doi:10.5061/dryad.h70rxwdsr

Description:

In Chapter 3 of my dissertation (tentatively titled "Becoming Users: Layers of People, Technology, and Power on the Internet"), I describe how online user activities are datafied and monetized in subtle and often obfuscated ways. The chapter focuses on Google’s reCAPTCHA, a popular implementation of a CAPTCHA challenge. A CAPTCHA, or “Completely Automated Public Turing test to tell Computers and Humans Apart,” is a simple task or challenge intended to differentiate between genuine human users and those who may be using software or other automated means to interact maliciously with a website, such as for spam, mass data scraping, or denial-of-service attacks. reCAPTCHA challenges are increasingly hidden from the user's direct view and instead assess mouse movements, browsing patterns, and other data to evaluate the likelihood that we are “authentic” users. These hidden challenges raise the stakes of understanding our own construction as Users because they obfuscate practices of surveillance and the ways that our activities as users are commodified by large corporations (Pettis, 2023). By studying the specifics of how such data collection works (that is, how we are called upon and situated as Users), we can make more informed decisions about how we engage with the contemporary internet.

This data set contains metadata for the 214 reCAPTCHA elements that I encountered during my personal use of the Web over one year (September 2022 through September 2023). Of these reCAPTCHAs, 137 were visible challenges, meaning that there was some indication of the presence of a reCAPTCHA challenge on the page. The remaining 77 were entirely hidden on the page; had I not been running my browser extension, I would likely never have been aware that a reCAPTCHA was in use. The data set also includes screenshots for 174 of the reCAPTCHAs. Screenshots that contain sensitive or private information have been excluded from public access. Researchers can request access to these additional files by contacting Ben Pettis <bpettis@wisc.edu>. A browsable and searchable version of the data is also available at https://capturingcaptcha.com

Provider:

Dryad

Created:

2024-06-19