Natural language processing for automated quantification of bone metastases reported in free-text bone scintigraphy reports

Name: Natural language processing for automated quantification of bone metastases reported in free-text bone scintigraphy reports
Creator: Taylor & Francis
Published: 2021-05-08 12:40:18
License: No description available

Source: DataCite Commons | Updated: 2021-05-08 | Indexed: 2024-07-28

Download link:

https://tandf.figshare.com/articles/dataset/Natural_language_processing_for_automated_quantification_of_bone_metastases_reported_in_free-text_bone_scintigraphy_reports/12948537/1

Description:

The widespread use of electronic patient-generated health data has led to unprecedented opportunities for automated extraction of clinical features from free-text medical notes. However, processing this rich resource of data for clinical and research purposes depends on labor-intensive and potentially error-prone manual review. The aim of this study was to develop a natural language processing (NLP) algorithm for binary classification (single metastasis versus two or more metastases) in bone scintigraphy reports of patients undergoing surgery for bone metastases. To establish a ground truth, each report was labeled by three independent reviewers using the same binary classification. A stratified 80:20 split was used to develop and test an extreme gradient boosting supervised machine learning NLP algorithm. A total of 704 free-text bone scintigraphy reports from 704 patients were included in this study, and 617 (88%) had multiple bone metastases. In the independent test set (n = 141) not used for model development, the NLP algorithm achieved an AUC-ROC of 0.97 (95% confidence interval [CI], 0.92–0.99) for classification of multiple bone metastases and an AUC-PRC of 0.99 (95% CI, 0.99–0.99). At a threshold of 0.90, the NLP algorithm correctly identified multiple bone metastases in 117 of the 124 patients in the testing cohort who had them (sensitivity 0.94) and yielded 3 false positives among the remaining 17 patients (specificity 0.82). At the same threshold, the NLP algorithm had a positive predictive value of 0.97 and an F1-score of 0.96. NLP has the potential to automate clinical data extraction from free-text radiology notes in orthopedics, thereby optimizing the speed, accuracy, and consistency of clinical chart review. Pending external validation, the NLP algorithm developed in this study may be implemented as a means to aid researchers in tackling large amounts of data.
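The threshold-level figures in the description can be checked from the counts it states. The following is a minimal sketch, not the authors' code: the function name `confusion_metrics` and the variable names are illustrative assumptions, and the false-negative and true-negative counts are derived from the reported totals (141 test reports, 124 with multiple metastases, 117 correctly identified, 3 false positives) rather than stated directly in the source.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),        # true positive rate (recall)
        "specificity": tn / (tn + fp),        # true negative rate
        "ppv": tp / (tp + fp),                # positive predictive value (precision)
        "f1": 2 * tp / (2 * tp + fp + fn),    # harmonic mean of precision and recall
    }


# Counts taken from the description (threshold 0.90, test set).
n_test = 141   # reports in the independent test set
n_pos = 124    # reports with multiple bone metastases
tp = 117       # multiple metastases correctly identified
fp = 3         # false positives

# Derived counts (an inference from the totals above, not stated in the source).
fn = n_pos - tp              # positives the algorithm missed
tn = (n_test - n_pos) - fp   # single-metastasis reports correctly identified

metrics = confusion_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the derived counts are 7 false negatives and 14 true negatives, and the computed values agree with the reported sensitivity (0.94), specificity (0.82), positive predictive value (about 0.97), and F1-score (0.96) up to rounding.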

Provider:

Taylor & Francis

Created:

2020-09-12
