MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization
Source: DataCite Commons. Updated: 2025-02-03. Indexed: 2025-04-16.
Download link:
https://physionet.org/content/labelled-notes-hospital-course/
Description:
This dataset presents a curated collection of preprocessed and labeled
clinical notes derived from the MIMIC-IV-Note database. The primary aim of
this resource is to facilitate the development and training of machine
learning models focused on summarizing brief hospital courses (BHC) from
clinical discharge notes.
The dataset contains 270,033 cleaned and standardized clinical notes with an
average length of 2,267 tokens, making it suitable for machine learning (ML)
applications. Each clinical note is paired with a corresponding BHC summary,
providing a foundation for supervised learning tasks. The preprocessing
pipeline uses regular expressions to address common issues in the raw clinical
text, such as special characters, extraneous whitespace, inconsistent
formatting, and irrelevant text. The result is a structured dataset in which
clinical note sections are separated by appropriate headings.
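To make the kind of regex-based cleaning described above concrete, here is a minimal Python sketch. It is not the dataset authors' pipeline: the function name `clean_note`, the heading list, and every pattern below are assumptions chosen for illustration only.

```python
import re

# Hypothetical heading list; the headings actually used in the dataset
# are not specified in this description.
HEADINGS = ("Chief Complaint", "History of Present Illness", "Brief Hospital Course")
_CANONICAL = {h.lower(): h for h in HEADINGS}
_HEADING_RE = re.compile(
    r"\s*\b(" + "|".join(re.escape(h) for h in HEADINGS) + r")\s*:\s*",
    flags=re.IGNORECASE,
)


def clean_note(raw: str) -> str:
    """Illustrative cleaning pass over one raw clinical note."""
    # Replace characters that are neither printable ASCII nor whitespace.
    text = re.sub(r"[^\x20-\x7E\s]", " ", raw)
    # Remove decorative rules such as "=====" or "-----".
    text = re.sub(r"[=*_~-]{3,}", " ", text)
    # Collapse every run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Put each known heading on its own line, in a canonical spelling.
    text = _HEADING_RE.sub(
        lambda m: "\n\n" + _CANONICAL[m.group(1).lower()] + ":\n", text
    )
    return text.strip()


if __name__ == "__main__":
    sample = "Chief Complaint:\tchest\x00pain\n=====\nBRIEF HOSPITAL COURSE:  Admitted   for observation. "
    print(clean_note(sample))
```

Collapsing whitespace before matching headings keeps the heading pattern simple, at the cost of discarding the original line breaks inside each section.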
By offering this resource, we aim to support healthcare professionals and
researchers in their efforts to enhance patient care through the automation of
BHC summarization. This dataset is ideal for exploring various NLP techniques,
developing predictive models, and improving the efficiency and accuracy of
clinical documentation practices. We invite the research community to utilize
this dataset to advance the field of medical informatics and contribute to
better health outcomes.
Provider:
PhysioNet
Created:
2024-09-07