Using First Name Information to Improve Race and Ethnicity Classification

Source: figshare.com. Updated 2023-05-31; indexed 2025-01-22.
Download link:
https://figshare.com/articles/dataset/Using_First_Name_Information_to_Improve_Race_and_Ethnicity_Classification/5813859/2
Description:
This article uses a recent first name list to develop an improvement to an existing Bayesian classifier, namely the Bayesian Improved Surname Geocoding (BISG) method, which combines surname and geography information to impute missing race/ethnicity. The new Bayesian Improved First Name Surname Geocoding (BIFSG) method is validated using a large sample of mortgage applicants who self-report their race/ethnicity. BIFSG outperforms BISG, in terms of accuracy and coverage, for all major racial/ethnic categories. Although the overall magnitude of improvement is somewhat small, the largest improvements occur for non-Hispanic Blacks, a group for which the BISG performance is weakest. When estimating the race/ethnicity effects on mortgage pricing and underwriting decisions with regression models, estimation biases from both BIFSG and BISG are very small, with BIFSG generally having smaller biases, and the maximum a posteriori classifier resulting in smaller biases than through use of estimated probabilities. Robustness checks using voter registration data confirm BIFSG's improved performance vis-a-vis BISG and illustrate BIFSG's applicability to areas other than mortgage lending. Finally, I demonstrate an application of the BIFSG to the imputation of missing race/ethnicity in the Home Mortgage Disclosure Act data, and in the process, offer novel evidence that the incidence of missing race/ethnicity information is correlated with race/ethnicity.
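
The Bayesian update that the description refers to can be sketched in a few lines. This is a minimal illustration, not the article's implementation: it assumes the usual naive Bayes form, in which a surname-based prior P(race | surname) is multiplied by likelihoods P(geography | race) and, for BIFSG, P(first name | race), then renormalized. The category set, the function names `posterior` and `map_class`, and every probability below are hypothetical values invented for the example.

```python
from typing import Dict

# Hypothetical, reduced category set used only for this sketch.
RACES = ("white", "black", "hispanic", "api")


def posterior(prior: Dict[str, float], *likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Multiply a prior over races by one or more likelihood tables and renormalize.

    Assumes the likelihood sources are conditionally independent given race.
    """
    unnormalized = {}
    for race in RACES:
        value = prior[race]
        for table in likelihoods:
            value *= table[race]
        unnormalized[race] = value
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all categories have zero probability; no coverage for this record")
    return {race: value / total for race, value in unnormalized.items()}


def map_class(post: Dict[str, float]) -> str:
    """Maximum a posteriori classifier: pick the most probable category."""
    return max(post, key=post.get)


# Made-up inputs for a single record (not taken from any real name or census list).
p_race_given_surname = {"white": 0.60, "black": 0.30, "hispanic": 0.05, "api": 0.05}
p_geo_given_race = {"white": 0.0002, "black": 0.0006, "hispanic": 0.0001, "api": 0.0001}
p_first_given_race = {"white": 0.001, "black": 0.004, "hispanic": 0.0005, "api": 0.0005}

# BISG-style update: surname prior plus geography.
bisg = posterior(p_race_given_surname, p_geo_given_race)

# BIFSG-style update: surname prior plus geography plus first name.
bifsg = posterior(p_race_given_surname, p_geo_given_race, p_first_given_race)

print("BISG :", bisg, "->", map_class(bisg))
print("BIFSG:", bifsg, "->", map_class(bifsg))
```

With these invented inputs, the first name table sharpens the posterior toward the category that the surname and geography already favor. Downstream analyses can use either the full probability vector or the single category returned by `map_class`, which are the two options the description compares.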
Provider:
figshare.com