
fairness_bias.pdf

Source: DataCite Commons; updated 2024-04-11; indexed 2024-07-13
Download link:
https://hra.figshare.com/articles/dataset/fairness_bias_pdf/25587981/1
Description:
Objective: The study investigates whether machine learning (ML)-based predictive models for cardiovascular disease (CVD) risk assessment perform equivalently across demographic groups (such as race and gender), and whether bias mitigation methods can reduce any bias present in the models. This matters because systematic bias may be introduced when health data are collected and preprocessed, which could degrade model performance on certain demographic sub-cohorts. The study examines this question using electronic health records (EHR) data and several ML models.

Methods: The study used a large de-identified EHR dataset from Vanderbilt University Medical Center (VUMC). ML algorithms, including logistic regression, random forest, gradient-boosting trees, and long short-term memory (LSTM) networks, were applied to build multiple predictive models. Model bias and fairness were evaluated using equal opportunity difference (EOD; 0 indicates fairness) and disparate impact (DI; 1 indicates fairness). We also evaluated the fairness of a non-ML baseline model, the American Heart Association (AHA) Pooled Cohort Risk Equations (PCEs). Finally, we compared three de-biasing methods: removing protected attributes (e.g., race and gender), resampling the imbalanced training dataset by sample size, and resampling by the proportion of people with CVD outcomes.

Results: The study cohort included 109,490 individuals (mean [SD] age 47.4 [14.7] years; 64.5% female; 86.3% White; 13.7% Black). The experimental results suggested that most ML models had smaller EOD and DI than the PCEs. For the ML models, mean EOD ranged from -0.001 to 0.018 and mean DI ranged from 1.037 to 1.094 across race groups. EOD and DI were larger across gender groups, with EOD ranging from 0.131 to 0.136 and DI ranging from 1.535 to 1.587. Among the debiasing methods, removing protected attributes did not significantly reduce bias for most ML models, and resampling by sample size did not consistently decrease bias. Resampling by case proportion reduced EOD and DI across gender groups but slightly lowered accuracy in many cases.

Conclusions: In the VUMC cohort, both the PCEs and the ML models were biased against women, suggesting the need to investigate and correct gender disparities.
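The two fairness metrics named in the abstract can be computed from binary predictions roughly as follows. This is a minimal sketch that assumes the common definitions (the convention used by IBM's AIF360 toolkit): EOD = TPR(unprivileged) − TPR(privileged) and DI = P(ŷ=1 | unprivileged) / P(ŷ=1 | privileged). The abstract does not say which group the study treats as privileged, how risk scores were thresholded, or whether it used these exact formulas. The function names, group labels, and toy data below are therefore hypothetical and included for illustration only.

```python
from typing import Hashable, Sequence


def _rate(values: Sequence[int]) -> float:
    """Return the fraction of 1s in a list of 0/1 values."""
    if len(values) == 0:
        raise ValueError("group has no samples for this rate")
    return sum(values) / len(values)


def equal_opportunity_difference(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[Hashable],
    unprivileged: Hashable,
    privileged: Hashable,
) -> float:
    """TPR(unprivileged) - TPR(privileged); 0 means equal true positive rates."""

    def tpr(g: Hashable) -> float:
        # Predictions for members of group g whose true label is positive.
        preds = [p for t, p, a in zip(y_true, y_pred, group) if a == g and t == 1]
        return _rate(preds)

    return tpr(unprivileged) - tpr(privileged)


def disparate_impact(
    y_pred: Sequence[int],
    group: Sequence[Hashable],
    unprivileged: Hashable,
    privileged: Hashable,
) -> float:
    """P(pred=1 | unprivileged) / P(pred=1 | privileged); 1 means parity."""

    def positive_rate(g: Hashable) -> float:
        # Share of group g that the model flags as positive.
        return _rate([p for p, a in zip(y_pred, group) if a == g])

    denominator = positive_rate(privileged)
    if denominator == 0:
        raise ZeroDivisionError("privileged group has no positive predictions")
    return positive_rate(unprivileged) / denominator


if __name__ == "__main__":
    # Hypothetical toy data: 1 = observed/predicted CVD event.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
    group = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(equal_opportunity_difference(y_true, y_pred, group, "F", "M"))  # -0.5
    print(disparate_impact(y_pred, group, "F", "M"))  # 0.3333333333333333
```

In practice `y_pred` would come from thresholding a model's risk scores. The threshold value is another assumption that the abstract leaves unspecified. Swapping which group is passed as `privileged` flips the sign of EOD and inverts DI, so reported values are only comparable when that choice is stated.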
Provider: Health Research Alliance
Created: 2024-04-11