Tasks 1 and 3 from Progress Note Understanding Suite of Tasks: SOAP Note Tagging and Problem List Summarization
Source: physionet.org (indexed 2025-03-25)
Download link:
https://physionet.org/content/task-1-3-soap-note-tag/1.0.0/
Description:
Applying natural language processing methods to electronic health record (EHR) data is a growing field. Existing corpora and annotations focus on modelling textual features and relation prediction [1]. However, there is a paucity of annotated corpora built to model clinical diagnostic reasoning, a process that involves text understanding, domain knowledge abstraction and reasoning, and clinical text generation. The datasets here support a hierarchical annotation schema with three stages, two of which are available here, to address clinical text understanding and text generation. The datasets provided are for individual tasks in Stages 1 and 3. The task for Stage 2 was previously accepted as part of the National NLP Clinical Challenges (n2c2) and may be retrieved from the n2c2 challenge website.
The annotated corpus is based on an extensive collection of Intensive Care Unit progress notes, a type of EHR documentation that is collected in time series in a problem-oriented format. The progress notes were sourced from MIMIC-III. A progress note conventionally follows the Subjective, Objective, Assessment and Plan (SOAP) headings. The novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization. The ultimate goal of these datasets is to advance the development and evaluation of NLP models for clinical applications that lead to AI-assisted clinical decision support and reduce medical errors.
Provider:
physionet.org



