Scaling Neuroscience Research with Federated Learning
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://doi.org/10.7910/DVN/2RKAQP
Resource description:
This repository contains the .csv files used to load the UK Biobank (UKBB) MRI scans from the local filesystem and train the 5-CNN model for the BrainAge prediction task. The files do not contain the raw scan data, but they document how the data was split into training and testing sets for every experiment presented in the original work.

The test dataset used to evaluate the different policies was kept the same across all experimental setups. The training data assignment, in contrast, differed for each of the three distributions across the learning sites (8 in total): Uniform & IID, Uniform & Non-IID, and Skewed & Non-IID.

To create each data distribution, we processed all the available training data of the centralized model. Then, for every learning site, we defined the amount of data to assign (the partition size) along with the mean and standard deviation of the scans' age distribution.

For completeness, we provide the split_centralized_training.py script that we used to split the centralized training data (centralized_train.csv). To replicate the age distribution of any experiment that we conducted, uncomment the lines of the SIZES, MEAN, and STD lists in the Python script for the corresponding case (lines 45-58).
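The per-site assignment described above (a partition size plus a target age mean and standard deviation for each site) can be illustrated with a minimal sketch. This is not the logic of split_centralized_training.py, which is not reproduced here. The function name partition_by_age, the Gaussian-weighted sampling without replacement, and all numeric values below are assumptions chosen only for illustration; the actual SIZES, MEAN, and STD values are the ones listed in the script.

```python
import math
import random


def partition_by_age(ages, sizes, means, stds, seed=0):
    """Assign sample indices to learning sites (hypothetical helper).

    For each site, draw `size` samples without replacement from the
    remaining pool, weighting every sample by the Gaussian density of
    its age under that site's target mean and standard deviation.
    """
    if not (len(sizes) == len(means) == len(stds)):
        raise ValueError("sizes, means and stds must have equal length")
    if sum(sizes) > len(ages):
        raise ValueError("requested more samples than are available")

    rng = random.Random(seed)
    remaining = list(range(len(ages)))  # indices not yet assigned
    partitions = []
    for size, mean, std in zip(sizes, means, stds):
        chosen = []
        for _ in range(size):
            # Small floor keeps the total weight positive even when
            # every remaining age is far from the target mean.
            weights = [
                math.exp(-0.5 * ((ages[i] - mean) / std) ** 2) + 1e-12
                for i in remaining
            ]
            pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
            chosen.append(remaining.pop(pick))
        partitions.append(chosen)
    return partitions


# Illustrative inputs only: synthetic ages and made-up targets for 8 sites.
demo_rng = random.Random(42)
demo_ages = [demo_rng.uniform(45, 80) for _ in range(400)]
SIZES = [40] * 8                          # equal partition sizes
MEAN = [50, 54, 58, 62, 66, 70, 74, 78]   # a different target mean per site
STD = [3.0] * 8

site_indices = partition_by_age(demo_ages, SIZES, MEAN, STD)
for site, idx in enumerate(site_indices):
    site_mean = sum(demo_ages[i] for i in idx) / len(idx)
    print(f"site {site}: n={len(idx)}, mean age={site_mean:.1f}")
```

In this sketch, giving every site the same size, mean, and standard deviation yields partitions with similar age distributions, while varying the means or the sizes yields differing ones. Because sites are filled in order from a shrinking pool, later sites may deviate more from their targets than earlier ones.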
Created:
2021-02-07



