Replication Data for: Measuring and Answering the Challenge of Spurious Correlations in Big Search Data
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://doi.org/10.7910/DVN/UW1UYR
Description:
Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies for variables following gamma and spatially autocorrelated distributions, and random walks. Quantifying these spurious correlations and their likely magnitude for various distributions has value for several reasons. First, analysts can make progress towards accurate inference. Second, they can avoid unwarranted credulity. Third, they can demand appropriate disclosure from study authors.
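Illustration (not part of the replication archive): the random-walk case mentioned in the description can be sketched with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method. The function names, the series length of 200, the 500 simulated pairs, and the |r| > 0.5 threshold are all hypothetical choices made for illustration. It draws pairs of independent series and reports how often their Pearson correlation looks large, comparing white noise with random walks.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def white_noise(n, rng):
    """Independent standard normal draws (no dependence over time)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Cumulative sum of independent standard normal steps."""
    pos, path = 0.0, []
    for _ in range(n):
        pos += rng.gauss(0.0, 1.0)
        path.append(pos)
    return path


def share_large_correlations(make_series, n_pairs=500, length=200,
                             threshold=0.5, seed=0):
    """Fraction of independent pairs whose |r| exceeds the threshold.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        r = pearson(make_series(length, rng), make_series(length, rng))
        if abs(r) > threshold:
            hits += 1
    return hits / n_pairs


if __name__ == "__main__":
    print("white noise :", share_large_correlations(white_noise))
    print("random walks:", share_large_correlations(random_walk))
```

Every pair here is independent by construction, so any large correlation is spurious. Under these assumptions the random-walk share should come out far higher than the white-noise share, which is the qualitative pattern the description reports; the exact figures depend on the seed and parameters chosen.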
Created:
2023-02-17



