Screenshots and metadata for 214 reCAPTCHA challenges encountered between September 2022 - September 2023
Source: DataCite Commons. Updated: 2026-03-15. Indexed: 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.h70rxwdsr
Description:
In Chapter 3 of my dissertation (tentatively titled "Becoming Users:
Layers of People, Technology, and Power on the Internet"), I
describe how online user activities are datafied and monetized in subtle
and often obfuscated ways. The chapter focuses on Google’s reCAPTCHA, a
popular implementation of a CAPTCHA challenge. A CAPTCHA, or “Completely
Automated Public Turing test to tell Computers and Humans Apart,” is a simple
task or challenge which is intended to differentiate between genuine human
users and those who may be using software or other automated means to
interact maliciously with a website, such as for spam, mass data scraping,
or denial-of-service attacks.

reCAPTCHA challenges are increasingly hidden from the user's direct view;
instead, they assess our mouse movements, browsing patterns, and other data
to evaluate the likelihood that we are “authentic” users. These hidden
challenges raise the stakes of
understanding our own construction as Users because they obfuscate
practices of surveillance and the ways that our activities as users are
commodified by large corporations (Pettis, 2023). By studying the
specifics of how such data collection works—that is, how we’re called upon
and situated as Users—we can make more informed decisions about how we
engage with the contemporary internet.

This data set contains metadata for
the 214 reCAPTCHA elements that I encountered during my personal use of
the Web for the period of one year (September 2022 through September
2023). Of these reCAPTCHAs, 137 were visible challenges—meaning that there
was some indication of the presence of a reCAPTCHA challenge. The
remaining 77 reCAPTCHAs were entirely hidden on the page. If I had not
been running my browser extension, I would likely never have been aware of
the use of a reCAPTCHA on the page.

The data set also includes screenshots
for 174 of the reCAPTCHAs. Screenshots that contain sensitive or private
information have been excluded from public access. Researchers can request
access to these additional files by contacting Ben Pettis
<bpettis@wisc.edu>. A browsable and searchable version of
the data is also available at https://capturingcaptcha.com
Provider:
Dryad
Created:
2024-06-19



