
MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization

Source: physionet.org (indexed 2025-01-15)
Download link:
https://physionet.org/content/labelled-notes-hospital-course/1.1.0/
Description:
This dataset is a curated collection of preprocessed, labeled clinical notes derived from the MIMIC-IV-Note database. Its primary aim is to support the development and training of machine learning (ML) models that generate brief hospital course (BHC) summaries from clinical discharge notes. It contains 270,033 cleaned and standardized clinical notes with an average length of 2,267 tokens. Each note is paired with its corresponding BHC summary, which makes the dataset suitable for supervised learning. The preprocessing pipeline uses regular expressions to address common problems in the raw clinical text, such as special characters, extraneous whitespace, inconsistent formatting, and irrelevant text, and it separates the sections of each note under appropriate headings. By offering this resource, we aim to help healthcare professionals and researchers improve patient care by automating BHC summarization. The dataset can be used to explore NLP techniques, develop predictive models, and improve the efficiency and accuracy of clinical documentation. We invite the research community to use it to advance medical informatics and contribute to better health outcomes.
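The regular-expression cleaning and section splitting described above could look roughly like the following Python sketch. It is an illustration only, not the dataset authors' pipeline: the function names `clean_note` and `split_sections`, the specific patterns, and the heading list are all assumptions made for this example.

```python
import re

# Hypothetical heading list for illustration; the headings actually used
# in the dataset are not specified in this description.
SECTION_HEADINGS = [
    "Chief Complaint",
    "History of Present Illness",
    "Brief Hospital Course",
]


def clean_note(text: str) -> str:
    """Apply a few illustrative regex clean-up steps to raw note text."""
    # Replace runs of underscores (a common placeholder pattern) with a space.
    text = re.sub(r"_{2,}", " ", text)
    # Replace characters other than word characters, whitespace, and
    # common punctuation with a space.
    text = re.sub(r"[^\w\s.,:;/()%+\-]", " ", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove spaces that surround line breaks.
    text = re.sub(r" *\n *", "\n", text)
    # Collapse repeated blank lines into a single line break.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()


def split_sections(text: str, headings=SECTION_HEADINGS) -> dict:
    """Split a cleaned note into {heading: body} using 'Heading:' markers."""
    canonical = {h.lower(): h for h in headings}
    pattern = re.compile(
        r"^(%s):[ \t]*" % "|".join(re.escape(h) for h in headings),
        flags=re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section body runs until the next heading or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        key = canonical[match.group(1).lower()]
        sections[key] = text[match.end():end].strip()
    return sections


if __name__ == "__main__":
    raw = (
        "Chief Complaint:   chest pain ***\n\n\n\n"
        "Brief Hospital Course:\nMr. ___ was admitted\t\tfor   observation.\n"
    )
    sections = split_sections(clean_note(raw))
    # For supervised training, the BHC section would serve as the target
    # summary and the remaining sections as the model input.
    target = sections.pop("Brief Hospital Course")
    print(sections)
    print(target)
```

Keeping the cleaning and the section splitting as separate functions makes it easier to adjust the patterns for a different note format without touching the pairing logic.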

Provider:
physionet.org