
Replication Data for: The Prevalence and Severity of Underreporting Bias in Machine and Human Coded Data

Indexed from NIAID Data Ecosystem on 2026-03-10
Download link:
https://doi.org/10.7910/DVN/CFCZ2L
Description:
Textual data are plagued by underreporting bias. For example, news sources often fail to report human rights violations. Cook et al. (2017) propose a multi-source estimator to gauge, and to account for, the underreporting of state repression events within human codings of news texts produced by Agence France-Presse (AFP) and the Associated Press (AP). We evaluate this estimator with Monte Carlo experiments, and then use it to compare the prevalence and seriousness of underreporting when comparable texts are machine coded and recorded in the World-Integrated Crisis Early Warning System (ICEWS) Dataset. We replicate Cook et al.’s investigation of human-coded state repression events with our machine-coded events, and validate both models against an external measure of human rights protections in Africa. We then use the Cook et al. estimator to gauge the seriousness and prevalence of underreporting in machine- and human-coded event data on human rights violations in Colombia. We find in both applications that machine-coded data are as valid as human-coded data.
Created:
2018-01-17