Crohn's Disease Treatment Prediction Model
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/y2hhsygy49
Description:
A database for machine learning built from baseline clinical data. It is used to predict the medium-term efficacy of biologic therapies in patients with Crohn's Disease.
1. Data Collection Sources
- Electronic Health Records (EHR)
- Clinical trials and studies
- Genetic data
- Patient-reported outcomes
- Medical imaging
Types of Data
- Demographic information
- Clinical data (symptoms, disease severity, treatment history)
- Genetic data (SNPs, mutations)
- Lab results (CRP levels, fecal calprotectin)
- Imaging data (MRI, endoscopy)
- Lifestyle data (diet, smoking status)
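As an illustration of how the data types listed above might sit together in one baseline record, here is a minimal Python sketch. Every field name and unit in it is a hypothetical placeholder chosen for the example; the dataset's actual column names are not given in this description.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BaselineRecord:
    """One patient at baseline. All field names are hypothetical."""

    # Demographic information
    age_years: int
    sex: str
    # Clinical data (symptoms, disease severity, treatment history)
    disease_duration_years: Optional[float] = None
    prior_biologic: Optional[bool] = None
    # Genetic data (illustrative single flag; real data may hold many SNPs)
    nod2_variant: Optional[bool] = None
    # Lab results
    crp_mg_per_l: Optional[float] = None
    fecal_calprotectin_ug_per_g: Optional[float] = None
    # Imaging data, reduced here to a free-text finding
    endoscopy_finding: Optional[str] = None
    # Lifestyle data
    smoking_status: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were not recorded (None)."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    record = BaselineRecord(age_years=34, sex="F", crp_mg_per_l=12.0,
                            smoking_status="never")
    print(record.missing_fields())
```

Making the optional fields default to None keeps missing measurements explicit, which is useful input for the cleaning step described in the next section.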
2. Data Preprocessing Steps
- Data Cleaning: Handle missing values, remove duplicates, correct errors.
- Data Normalization/Standardization: Normalize lab results, standardize imaging data.
- Feature Engineering: Create new features from existing data, e.g., calculate disease activity scores.
- Encoding Categorical Data: Convert categorical variables to numerical ones using one-hot encoding or label encoding.
- Data Splitting: Split data into training, validation, and test sets.
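The preprocessing steps above can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration, not the dataset authors' pipeline: the records, column names, and split ratios are invented for the example; median imputation is just one way to handle missing values; and the Harvey-Bradshaw Index is used as one possible disease activity score. The scaling statistics are computed on the training split only, which is a design choice of this sketch to avoid leaking information from the validation and test sets.

```python
import random
import statistics

# Hypothetical baseline records; names and values are illustrative only.
RAW = [
    {"id": 1, "crp": 12.0, "smoker": "yes", "wellbeing": 2, "pain": 1,
     "liquid_stools": 4, "mass": 0, "complications": 1},
    {"id": 2, "crp": None, "smoker": "no", "wellbeing": 1, "pain": 0,
     "liquid_stools": 2, "mass": 0, "complications": 0},
    {"id": 2, "crp": None, "smoker": "no", "wellbeing": 1, "pain": 0,
     "liquid_stools": 2, "mass": 0, "complications": 0},  # duplicate row
    {"id": 3, "crp": 3.5, "smoker": "former", "wellbeing": 0, "pain": 1,
     "liquid_stools": 1, "mass": 0, "complications": 0},
    {"id": 4, "crp": 25.0, "smoker": "no", "wellbeing": 3, "pain": 2,
     "liquid_stools": 6, "mass": 1, "complications": 2},
    {"id": 5, "crp": 8.0, "smoker": "yes", "wellbeing": 1, "pain": 1,
     "liquid_stools": 3, "mass": 0, "complications": 0},
]


def clean(rows):
    """Data cleaning: drop duplicate ids, fill missing CRP with the median."""
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))  # copy so the input is not mutated
    median_crp = statistics.median(
        r["crp"] for r in unique if r["crp"] is not None)
    for row in unique:
        if row["crp"] is None:
            row["crp"] = median_crp
    return unique


def add_hbi(rows):
    """Feature engineering: Harvey-Bradshaw Index as a sum of its items."""
    for r in rows:
        r["hbi"] = (r["wellbeing"] + r["pain"] + r["liquid_stools"]
                    + r["mass"] + r["complications"])


def one_hot(rows, key):
    """Encoding: add one 0/1 column per category observed in `key`."""
    categories = sorted({r[key] for r in rows})
    for r in rows:
        for c in categories:
            r[f"{key}_{c}"] = 1 if r[key] == c else 0


def split(rows, train=0.6, val=0.2, seed=0):
    """Data splitting: shuffle reproducibly, then cut into three parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def standardize(parts, key):
    """Standardization: z-score `key` using training-set statistics only."""
    train = parts[0]
    mean = statistics.mean(r[key] for r in train)
    sd = statistics.pstdev(r[key] for r in train)
    for part in parts:
        for r in part:
            r[key + "_z"] = (r[key] - mean) / sd if sd else 0.0


def preprocess(rows):
    rows = clean(rows)
    add_hbi(rows)
    one_hot(rows, "smoker")
    parts = split(rows)
    standardize(parts, "crp")
    return parts


if __name__ == "__main__":
    train, val, test = preprocess(RAW)
    print(len(train), len(val), len(test))
```

With real data, a library such as pandas or scikit-learn would normally replace these hand-written helpers; the plain-Python version is used here only to keep each step visible.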
Created:
2024-07-12



