Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records

Source: NIAID Data Ecosystem (indexed 2026-03-07)
Download link:
https://figshare.com/articles/dataset/_Automatic_Prediction_of_Rheumatoid_Arthritis_Disease_Activity_from_the_Electronic_Medical_Records_/775125
Description:
### Objective

We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record.

### Materials and Methods

The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values.

### Results

Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers.

### Conclusion

Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.
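The feature space and the evaluation metric described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the placeholder concept identifiers, the lab names (`CRP`, `ESR`), and the choice to encode a missing lab value as 0.0 are all assumptions made for illustration. The first function concatenates per-note concept counts with lab values into one vector; the second computes AUC as the fraction of positive/negative pairs that a classifier ranks correctly.

```python
from collections import Counter


def build_features(cuis, labs, cui_vocab, lab_names):
    """Concatenate concept counts and lab values into one numeric vector.

    cuis: concept identifiers extracted from one note (hypothetical input).
    labs: dict of lab name -> value; a missing lab is encoded as 0.0,
          which is an assumption of this sketch, not the source's method.
    """
    counts = Counter(cuis)
    cui_part = [float(counts[c]) for c in cui_vocab]
    lab_part = [float(labs.get(name, 0.0)) for name in lab_names]
    return cui_part + lab_part


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 for the positive class, 0 for the negative class.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Placeholder identifiers "C1".."C3" stand in for real concept identifiers.
vector = build_features(["C1", "C2", "C1"], {"CRP": 12.5},
                        cui_vocab=["C1", "C2", "C3"],
                        lab_names=["CRP", "ESR"])
score = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In a fuller pipeline, vectors like `vector` would be passed to a linear-kernel classifier, and its decision scores would be passed to `auc`. The pairwise loop is quadratic in the number of notes, which is acceptable for test sets of the size reported here.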

Created:
2013-08-16