Supporting data for "How to remove or control confounds in predictive models, with applications to brain biomarkers"

Name: Supporting data for "How to remove or control confounds in predictive models, with applications to brain biomarkers"
Creator: GigaScience Database
Published: 2025-05-26 17:15:28
License: No description available

Source: DataCite Commons (updated 2025-05-26, indexed 2025-04-15)

Download link:

http://gigadb.org/dataset/100980

Description:

With increasing data sizes and more easily available computational methods, the neurosciences rely more and more on predictive modeling with machine learning, e.g., to extract biomarkers of pathologies. Yet a successful prediction may capture a confounding effect correlated with the outcome instead of brain features specific to the outcome of interest, e.g., the pathology. For instance, as patients tend to move more in the scanner than controls, imaging biomarkers of a pathology may mostly reflect head motion, leading to inefficient use of resources and wrong interpretation of the biomarkers. Here we study how to adapt statistical methods that control for confounds to predictive modeling settings. We review how to train predictors that are not driven by such spurious effects. We also show how to measure the unbiased predictive accuracy of these biomarkers, based on a confounded dataset. For this purpose, cross-validation must be modified to account for the nuisance effect. To guide understanding and practical recommendations, we apply various strategies to assess predictive models in the presence of confounds on simulated data and population brain imaging settings. Theoretical and empirical studies show that deconfounding should not be applied to the train and test data jointly: modeling the effect of confounds, on the train data only, should instead be decoupled from removing confounds. Cross-validation that isolates nuisance effects gives an additional piece of information: confound-free prediction accuracy.
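
The recommendation to model confounds on the train data only, and then remove them from train and test data separately, can be illustrated with a minimal sketch. This is not the authors' code and is not taken from the dataset: it assumes a simple linear confound model fitted by least squares, synthetic data, and hypothetical helper names (`fit_confound_model`, `remove_confound`).

```python
import numpy as np

def fit_confound_model(X_train, z_train):
    """Least-squares fit of every feature on the confounds (plus an
    intercept), using the train data only. Hypothetical helper."""
    Z = np.column_stack([np.ones(len(z_train)), z_train])
    coef, *_ = np.linalg.lstsq(Z, X_train, rcond=None)
    return coef

def remove_confound(X, z, coef):
    """Subtract the confound effect predicted by an already-fitted model."""
    Z = np.column_stack([np.ones(len(z)), z])
    return X - Z @ coef

# Synthetic example (assumption): one confound, e.g. head motion,
# that linearly drives five features.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 1))
X = 2.0 * z + rng.normal(size=(n, 5))
train, test = np.arange(0, 150), np.arange(150, n)

# Model the confound effect on the train split only ...
coef = fit_confound_model(X[train], z[train])

# ... then apply the same fitted model to each split.
X_train_clean = remove_confound(X[train], z[train], coef)
X_test_clean = remove_confound(X[test], z[test], coef)
```

Fitting the confound model on the pooled train and test data would let information from the test split shape the features seen during training, which is the joint deconfounding the description warns against.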

Provider:

GigaScience Database

Created:

2022-01-18