Replication Data for: Measuring and Answering the Challenge of Spurious Correlations in Big Search Data

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://doi.org/10.7910/DVN/UW1UYR
Description:
Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies for variables following gamma and spatially autocorrelated distributions, and random walks. Quantifying these spurious correlations and their likely magnitude for various distributions has value for several reasons. First, analysts can make progress towards accurate inference. Second, they can avoid unwarranted credulity. Third, they can demand appropriate disclosure from study authors.
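
The random-walk claim in the description can be illustrated with a small simulation. The sketch below is not the authors' replication code. It is a minimal illustration under assumed settings: the function names, series length (200), number of trials (500), and the |r| > 0.5 cutoff are all hypothetical choices made here. It compares how often two independently generated series show a large Pearson correlation when both are random walks versus when both are white noise.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def random_walk(rng, n):
    """Cumulative sum of independent standard normal steps."""
    pos, path = 0.0, []
    for _ in range(n):
        pos += rng.gauss(0.0, 1.0)
        path.append(pos)
    return path


def white_noise(rng, n):
    """Independent standard normal draws (no accumulation)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def share_large_correlations(make_series, trials=500, n=200,
                             threshold=0.5, seed=0):
    """Fraction of independent pairs whose |r| exceeds the threshold.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The two series are generated independently, so any
        # correlation between them is spurious by construction.
        r = pearson(make_series(rng, n), make_series(rng, n))
        if abs(r) > threshold:
            hits += 1
    return hits / trials


walk_share = share_large_correlations(random_walk)
noise_share = share_large_correlations(white_noise)
print(f"random walks: {walk_share:.2f}, white noise: {noise_share:.2f}")
```

Because both series in each pair are generated independently, every large correlation counted here is spurious. Comparing the two printed shares gives a rough sense of how much more often such correlations appear for random walks than for white noise under these assumed settings.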
Created:
2023-02-17