
Methods to Detect Low Quality Data and Its Implication for Psychological Research

Source: osf.io | Updated: 2021-06-14 | Indexed: 2025-03-25
Download link:
https://osf.io/x6t8a
Resource description:
Web-based data collection platforms such as Amazon's Mechanical Turk (AMT) are an appealing way to recruit participants quickly and cheaply for psychological research. Although concerns about data quality have emerged with AMT, several studies have shown that data collected via AMT are as reliable as data from traditional college samples and are often more diverse and more representative of noncollege populations. Methods to screen for low quality data, however, have been less explored. Excluding participants on the basis of a single simple screening method, such as response time or attention checks, may not identify them adequately, because such methods cannot distinguish high effort from low effort participants. Problematic survey responses may also arise from survey automation tools such as survey bots or automated form fillers. The current project developed methods for detecting low quality data that overcome these screening limitations. Multiple checks were employed: page response times, the distribution of survey responses, the number of scale options used out of the available range, click counts, and manipulation checks. The method was tested on a survey completed by an easily available plug-in survey bot and compared with data from human participants who provided either high effort answers or randomized (low effort) answers. Identified cases can then be examined in sensitivity analyses to determine whether their exclusion from further analyses is warranted. This algorithm is a promising tool for identifying low quality or automated data collected via AMT or other online data collection platforms.
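
The idea of combining several checks instead of relying on one can be illustrated with a minimal Python sketch. This is not the project's algorithm: the `Response` layout, the function names, every threshold, and the use of a low standard deviation as a stand-in for the "distribution of survey responses" check are assumptions made for illustration only.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Response:
    """One participant's data for one survey page (hypothetical layout)."""
    answers: list               # Likert-type answers, e.g. integers 1..7
    page_seconds: float         # time spent on the page
    clicks: int                 # clicks recorded on the page
    passed_manipulation: bool   # manipulation check answered correctly


def quality_flags(r, min_seconds_per_item=2.0, min_options_used=2, min_sd=0.5):
    """Return one boolean per check; True means the check looks suspicious.

    All thresholds are illustrative assumptions, not values from the project.
    """
    n_items = len(r.answers)
    return {
        # Page response time: faster than a plausible reading speed.
        "too_fast": r.page_seconds < min_seconds_per_item * n_items,
        # Number of distinct scale options actually used.
        "few_options_used": len(set(r.answers)) < min_options_used,
        # Crude distribution check: almost no spread across answers.
        "flat_distribution": pstdev(r.answers) < min_sd,
        # Click count: fewer clicks than items answered.
        "too_few_clicks": r.clicks < n_items,
        # Manipulation check failed.
        "failed_manipulation": not r.passed_manipulation,
    }


def is_suspect(r, min_flags=2, **thresholds):
    """Flag a case only when several independent checks agree."""
    return sum(quality_flags(r, **thresholds).values()) >= min_flags


careful = Response([2, 5, 3, 6, 4, 1, 5, 3], page_seconds=40.0,
                   clicks=9, passed_manipulation=True)
straightliner = Response([4] * 8, page_seconds=5.0,
                         clicks=8, passed_manipulation=False)

print(is_suspect(careful))        # False
print(is_suspect(straightliner))  # True
```

Requiring at least two flags reflects the abstract's point that a single screen in isolation may misclassify participants. Flagged cases would feed a sensitivity analysis, not be dropped automatically.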

Provider:
osf.io