
Bias and High-Dimensional Adjustment in Observational Studies of Peer Effects

Source: DataCite Commons. Updated 2021-04-01; indexed 2024-07-28.
Download link:
https://tandf.figshare.com/articles/dataset/Bias_and_high-dimensional_adjustment_in_observational_studies_of_peer_effects/12851349/4
Description:
Peer effects, in which an individual’s behavior is affected by peers’ behavior, are posited by multiple theories in the social sciences. Randomized field experiments that identify peer effects, however, are often expensive or infeasible, so many studies of peer effects use observational data, which is expected to suffer from confounding. Here we show, in the context of information and media diffusion, that high-dimensional adjustment of a nonexperimental control group (660 million observations) using propensity score models produces estimates of peer effects statistically indistinguishable from those using a large randomized experiment (215 million observations). Compared with the experiment, naive observational estimators overstate peer effects by over 300% and commonly available variables (e.g., demographics) offer little bias reduction. Adjusting for a measure of prior behaviors closely related to the focal behavior reduces this bias by 91%, while models adjusting for over 3700 past behaviors provide additional bias reduction, reducing bias by over 97%, which is statistically indistinguishable from unbiasedness. This demonstrates how detailed records of behavior can improve studies of social influence, information diffusion, and imitation; these results are encouraging for the credibility of some studies but also cautionary for studies of peer effects in rare or new behaviors. More generally, these results show how large, high-dimensional datasets and statistical learning can be used to improve causal inference. Supplementary materials for this article are available online.
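The abstract's central comparison can be illustrated with a small simulation: a naive exposed-vs-unexposed contrast and a propensity-score-adjusted one, both judged against a known true effect. This is a toy sketch, not the authors' code or data. It assumes a single binary prior-behavior covariate `x` that raises both the chance of exposure (a peer adopted) and the chance of adoption, plus a hypothetical true peer effect `TRUE_EFFECT`. It estimates the propensity score as the exposure rate within each stratum of `x`, a saturated model standing in for the high-dimensional models described above. The bias-reduction formula `1 - |adjusted bias| / |naive bias|` is likewise an assumed definition.

```python
import random
from collections import defaultdict

# Hypothetical true peer effect on adoption probability (toy value, not from the study).
TRUE_EFFECT = 0.03


def simulate(n, seed=0):
    """Toy observational data: a prior-behavior indicator x confounds
    exposure d (a peer adopted) and outcome y (the user adopted)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.3 else 0
        # Users with the prior behavior are far more likely to be exposed.
        d = 1 if rng.random() < (0.6 if x else 0.1) else 0
        # The prior behavior also raises adoption, independently of exposure.
        p_y = 0.02 + 0.20 * x + TRUE_EFFECT * d
        y = 1 if rng.random() < p_y else 0
        rows.append((x, d, y))
    return rows


def naive_estimate(rows):
    """Difference in adoption rates between exposed and unexposed users."""
    exposed = [y for _, d, y in rows if d == 1]
    unexposed = [y for _, d, y in rows if d == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def ipw_estimate(rows):
    """Inverse-propensity-weighted estimate of the effect. The propensity
    score P(d=1 | x) is estimated as the exposure rate within each stratum
    of x, so every stratum must contain exposed and unexposed users."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_exposed, n_total]
    for x, d, _ in rows:
        counts[x][0] += d
        counts[x][1] += 1
    propensity = {x: n1 / n_x for x, (n1, n_x) in counts.items()}
    treated = sum(d * y / propensity[x] for x, d, y in rows)
    control = sum((1 - d) * y / (1 - propensity[x]) for x, d, y in rows)
    return (treated - control) / len(rows)


def bias_reduction(naive, adjusted, truth):
    """Share of the naive estimator's bias removed by adjustment."""
    return 1 - abs(adjusted - truth) / abs(naive - truth)


if __name__ == "__main__":
    data = simulate(100_000)
    naive = naive_estimate(data)
    adjusted = ipw_estimate(data)
    print(f"naive:          {naive:.4f}")
    print(f"adjusted:       {adjusted:.4f}")
    print(f"bias reduction: {bias_reduction(naive, adjusted, TRUE_EFFECT):.0%}")
```

With these made-up parameters the naive contrast is inflated because exposed users are disproportionately those with the prior behavior. Weighting by the inverse propensity score compares like with like within each stratum. When the confounders are many and continuous, a regularized logistic regression or another statistical-learning model would take the place of the per-stratum rates.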

Provider: Taylor & Francis
Created: 2020-09-25