
Race and ethnicity data for first, middle, and last names

Source: DataONE. Updated: 2023-04-11. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:4e4ba43af2df6aad8cbb93cd087d9f344c4ccc2efadd78acc3773e74c04a677e
Description:
We provide datasets that estimate the racial distributions associated with first, middle, and last names in the United States. The datasets cover five racial categories: White, Black, Hispanic, Asian, and Other. The data are computed from the voter files of six Southern states (Alabama, Florida, Georgia, Louisiana, North Carolina, and South Carolina) that collect race and ethnicity data upon registration. We include seven voter files per state, sourced between 2018 and 2021 from L2, Inc. Together, these states have approximately 36 million individuals who provide self-reported race and ethnicity. The last name dataset includes 338K surnames, the middle name dataset includes 126K middle names, and the first name dataset includes 136K first names. For each type of name, we provide one dataset of P(race | name) probabilities and one of P(name | race) probabilities. We include only names that appear at least 25 times across the 42 (= 7 voter files * 6 states) voter files in our dataset. These data are closely related to the dataset "Name Dictionaries for "wru" R Package", https://doi.org/10.7910/DVN/7TRYAC. These are the probabilities used in the latest iteration of the "WRU" package (Khanna et al., 2022) to make probabilistic predictions about the race of individuals, given their names and geolocations.
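To make the two probability tables concrete, the sketch below derives P(race | name) and P(name | race) from raw (name, race) counts and applies the 25-occurrence cutoff mentioned above. It is only an illustration under stated assumptions: the function name `name_race_tables`, the toy records, the upper-casing of names, and the choice to compute per-race totals after filtering are all hypothetical and are not taken from the dataset's actual construction.

```python
from collections import Counter, defaultdict

# The five categories named in the description.
RACES = ("white", "black", "hispanic", "asian", "other")
MIN_COUNT = 25  # cutoff stated in the description


def name_race_tables(records, min_count=MIN_COUNT):
    """Build P(race | name) and P(name | race) from (name, race) pairs.

    Hypothetical helper: the real pipeline may normalize names and
    compute totals differently.
    """
    counts = defaultdict(Counter)
    for name, race in records:
        counts[name.upper()][race] += 1  # assumption: case-insensitive names

    # Keep only names seen at least `min_count` times overall.
    kept = {n: c for n, c in counts.items() if sum(c.values()) >= min_count}

    # Assumption: per-race totals are taken over the kept names only.
    race_totals = Counter()
    for c in kept.values():
        race_totals.update(c)

    p_race_given_name = {
        n: {r: c[r] / sum(c.values()) for r in RACES} for n, c in kept.items()
    }
    p_name_given_race = {
        n: {r: (c[r] / race_totals[r] if race_totals[r] else 0.0) for r in RACES}
        for n, c in kept.items()
    }
    return p_race_given_name, p_name_given_race


# Toy records, invented purely for illustration.
records = (
    [("Smith", "white")] * 30
    + [("Smith", "black")] * 10
    + [("Garcia", "hispanic")] * 25
    + [("Rare", "asian")] * 3  # below the cutoff, so it is dropped
)
p_rgn, p_ngr = name_race_tables(records)
print(p_rgn["SMITH"])
```

Each row of the P(race | name) table sums to 1 across the five categories, while each column of the P(name | race) table sums to 1 across the retained names for any category that has at least one record.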
Created:
2023-11-08