Replication Data for: Minmaxing of Bayesian Improved Surname and Geography Level Ups in Predicting Race
Source: DataONE | Updated: 2022-09-29 | Indexed: 2024-06-08
Download link:
https://search.dataone.org/view/sha256:e0cde9a2d23f496ac0ad3a56234749ebafc98a4e968abd934655bdfe05aee10d
Description:
Racial identification is a critical factor in understanding a multitude of important outcomes in many fields. However, inferring an individual’s race from ecological data is prone to bias and error. This process was improved only recently by Bayesian Improved Surname Geocoding (BISG). With surname and geography-based demographic data, it is possible to estimate individual racial identification more accurately than ever before. However, the level of geography used in this process varies widely. Whereas some existing work makes use of geocoding to place individuals in precise census blocks, a substantial portion either skips geocoding altogether or relies on estimation using surname or county-level analyses. Presently, the tradeoffs of such variation are unknown. In this letter, we quantify those tradeoffs through a validation of BISG on Georgia’s voter file using both geocoded and non-geocoded processes, and we introduce ZIP codes as a new level of geography for this method. We find that when estimating the racial identification of White and Black voters, non-geocoded ZIP code-based estimates are acceptable alternatives. However, census blocks provide the most accurate estimations when imputing racial identification for Asian and Hispanic voters. Our results document the most efficient means to sequentially conduct BISG analysis to maximize the accuracy of racial identification estimates while simultaneously minimizing data missingness and bias.
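To make the mechanics concrete, the Bayesian update at the core of BISG can be sketched in a few lines of Python. This is a minimal illustration, not the authors' replication code: the function name `bisg_posterior` is hypothetical, and every probability below is invented for demonstration rather than drawn from Census tables or the Georgia voter file. The sketch assumes the standard BISG form, in which the surname-based prior P(race | surname) is multiplied by the share of each group's population living in the geographic unit, P(geography | race), and the product is renormalized.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine a surname prior with a geographic likelihood via Bayes' rule.

    p_race_given_surname: dict mapping race -> P(race | surname)
    p_geo_given_race:     dict mapping race -> P(geography | race), i.e. the
                          share of each group's population living in the unit
                          (census block, ZIP code, county, ...)
    Returns a dict mapping race -> P(race | surname, geography).
    """
    # Unnormalized posterior: prior times likelihood for each group.
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No group has positive probability for this input.")
    # Normalize so the probabilities sum to one.
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs (made-up numbers, for illustration only).
surname_prior = {"white": 0.60, "black": 0.30, "hispanic": 0.05, "asian": 0.05}
zip_likelihood = {"white": 0.001, "black": 0.004, "hispanic": 0.002, "asian": 0.001}

posterior = bisg_posterior(surname_prior, zip_likelihood)
most_likely = max(posterior, key=posterior.get)
```

Swapping `zip_likelihood` for block-level or county-level shares changes only the second argument, which is why the level of geography can be varied while the rest of the procedure stays fixed.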
Created:
2023-11-19



